2023-01-11T21:14:16.8543400Z Requested labels: linux.g5.4xlarge.nvidia.gpu
2023-01-11T21:14:16.8543511Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627
2023-01-11T21:14:16.8543690Z Reusable workflow chain:
2023-01-11T21:14:16.8543725Z pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:16.8543773Z -> pytorch/pytorch/.github/workflows/_linux-test.yml@refs/tags/ciflow/trunk/91627 (8419ddda87c8a47eacc63b54bc7ec98c1f27c26e)
2023-01-11T21:14:16.8543803Z Waiting for a runner to pick up this job...
2023-01-11T21:14:17.1266156Z Job is about to start running on the runner: i-016718a172a944ca0 (organization)
2023-01-11T21:14:21.5010212Z Current runner version: '2.300.2'
2023-01-11T21:14:21.5015494Z Runner name: 'i-016718a172a944ca0'
2023-01-11T21:14:21.5016078Z Runner group name: 'Default'
2023-01-11T21:14:21.5016758Z Machine name: 'ip-10-0-2-196'
2023-01-11T21:14:21.5018981Z ##[group]GITHUB_TOKEN Permissions
2023-01-11T21:14:21.5019738Z Actions: write
2023-01-11T21:14:21.5020092Z Checks: write
2023-01-11T21:14:21.5020463Z Contents: write
2023-01-11T21:14:21.5020874Z Deployments: write
2023-01-11T21:14:21.5021235Z Discussions: write
2023-01-11T21:14:21.5021599Z Issues: write
2023-01-11T21:14:21.5021999Z Metadata: read
2023-01-11T21:14:21.5022357Z Packages: write
2023-01-11T21:14:21.5022737Z Pages: write
2023-01-11T21:14:21.5023116Z PullRequests: write
2023-01-11T21:14:21.5023520Z RepositoryProjects: write
2023-01-11T21:14:21.5023943Z SecurityEvents: write
2023-01-11T21:14:21.5024340Z Statuses: write
2023-01-11T21:14:21.5024688Z ##[endgroup]
2023-01-11T21:14:21.5027649Z Secret source: Actions
2023-01-11T21:14:21.5028355Z Prepare workflow directory
2023-01-11T21:14:21.7575859Z Prepare all required actions
2023-01-11T21:14:21.7752585Z Getting action download info
2023-01-11T21:14:22.0855347Z Download action repository 'pytorch/test-infra@main' (SHA:2c225610d00fb13c04fcd60389d3e4d8326167c3)
2023-01-11T21:14:22.3833061Z Download action repository 'pytorch/pytorch@master' (SHA:c5836153f5332ca83d5cacde38f2829a4d54793e)
2023-01-11T21:14:24.8242351Z Download action repository 'seemethere/upload-artifact-s3@v5' (SHA:baba72d0712b404f646cebe0730933554ebce96a)
2023-01-11T21:14:25.1588624Z Getting action download info
2023-01-11T21:14:25.5985061Z Download action repository 'malfet/checkout@silent-checkout' (SHA:c7b8fef48edfe1bca0044a44b1f7f7c4318a3076)
2023-01-11T21:14:27.0832770Z Getting action download info
2023-01-11T21:14:27.4893178Z Download action repository 'nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482' (SHA:3e91a01664abd3c5cd539100d10d33b9c5b68482)
2023-01-11T21:14:27.6264289Z Uses: pytorch/pytorch/.github/workflows/_linux-test.yml
2023-01-11T21:14:27.6265701Z ##[group] Inputs
2023-01-11T21:14:27.6266007Z build-environment: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:14:27.6266854Z test-matrix: { include: [ { config: "default", shard: 1, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 2, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 3, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "default", shard: 4, num_shards: 4, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "slow", shard: 1, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "slow", shard: 2, num_shards: 2, runner: "linux.g5.4xlarge.nvidia.gpu" }, { config: "functorch", shard: 1, num_shards: 1, runner: "linux.g5.4xlarge.nvidia.gpu" }, ]}
2023-01-11T21:14:27.6267762Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd
2023-01-11T21:14:27.6268143Z sync-tag:
2023-01-11T21:14:27.6268787Z timeout-minutes: 240
2023-01-11T21:14:27.6268995Z use-gha:
2023-01-11T21:14:27.6269195Z ##[endgroup]
2023-01-11T21:14:27.6269706Z Complete job name: cuda11.6-py3.10-gcc7-sm86 / test (default, 2, 4, linux.g5.4xlarge.nvidia.gpu)
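Note: the test-matrix input above is a JSON matrix that the reusable workflow expands into one job per include entry; this job is the (default, 2, 4) entry, i.e. shard 2 of 4 of the "default" config. As a rough illustration only (the workflow itself expands the matrix with GitHub Actions' fromJSON expression), this job's entry could be pulled out with jq, assuming the matrix were saved as strict JSON in a file matrix.json:

    # Illustrative only: select this job's entry from the matrix.
    jq '.include[] | select(.config == "default" and .shard == 2)' matrix.json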
2023-01-11T21:14:27.6926993Z ##[group]Run pytorch/test-infra/.github/actions/setup-ssh@main
2023-01-11T21:14:27.6927275Z with:
2023-01-11T21:14:27.6927726Z github-secret: ***
2023-01-11T21:14:27.6928076Z instructions: All testing is done inside the container, to start an interactive session run: docker exec -it $(docker container ps --format '{{.ID}}') bash
2023-01-11T21:14:27.6928433Z activate-with-label: false
2023-01-11T21:14:27.6928644Z label: with-ssh
2023-01-11T21:14:27.6928851Z remove-existing-keys: true
2023-01-11T21:14:27.6929046Z env:
2023-01-11T21:14:27.6929232Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:27.6929436Z ##[endgroup]
2023-01-11T21:14:27.7688201Z ciflow reference detected, attempting to extract PR number
2023-01-11T21:14:28.3871962Z Grabbing public ssh keys from https://github.com/pytorch-bot[bot].keys
2023-01-11T21:14:28.4638580Z No SSH keys found for user pytorch-bot[bot]
2023-01-11T21:14:28.4639305Z Grabbing public ssh keys from https://github.com/LucaLumetti.keys
2023-01-11T21:14:28.5427293Z ~/.ssh/authorized_keys file found on node, removing ~/.ssh and starting fresh
2023-01-11T21:14:28.5441034Z Public keys pulled and installed to /home/ec2-user/.ssh/authorized_keys
2023-01-11T21:14:28.5464501Z Login using: ssh ec2-user@ec2-18-208-138-83.compute-1.amazonaws.com
2023-01-11T21:14:28.5464865Z All testing is done inside the container, to start an interactive session run:
2023-01-11T21:14:28.5465240Z docker exec -it $(docker container ps --format '{{.ID}}') bash
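Note: the setup-ssh step above works by fetching each authorized user's public keys from GitHub's .keys endpoint and installing them on the runner. A minimal sketch of that provisioning, with USER as a placeholder (the real action also handles [bot] accounts, which have no keys, and the remove-existing-keys option seen above):

    USER=LucaLumetti                       # placeholder; any GitHub username
    mkdir -p ~/.ssh && chmod 700 ~/.ssh
    curl -fsSL "https://github.com/${USER}.keys" >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys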
2023-01-11T21:14:28.5682039Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2023-01-11T21:14:28.5682322Z with:
2023-01-11T21:14:28.5682506Z submodules: recursive
2023-01-11T21:14:28.5682714Z fetch-depth: 0
2023-01-11T21:14:28.5682907Z env:
2023-01-11T21:14:28.5683089Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:28.5683298Z ##[endgroup]
2023-01-11T21:14:28.5887396Z ##[group]Run retry () {
2023-01-11T21:14:28.5887657Z retry () {
2023-01-11T21:14:28.5887901Z  $* || (sleep 1 && $*) || (sleep 2 && $*) || (sleep 4 && $*) || (sleep 8 && $*)
2023-01-11T21:14:28.5888123Z }
2023-01-11T21:14:28.5888325Z echo "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5888559Z if [ -z "${NO_SUDO}" ]; then
2023-01-11T21:14:28.5888801Z  retry sudo rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5889029Z else
2023-01-11T21:14:28.5889248Z  retry rm -rf "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5889455Z fi
2023-01-11T21:14:28.5889706Z mkdir "${GITHUB_WORKSPACE}"
2023-01-11T21:14:28.5904334Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:14:28.5904583Z env:
2023-01-11T21:14:28.5904777Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:28.5904986Z NO_SUDO:
2023-01-11T21:14:28.5905164Z ##[endgroup]
2023-01-11T21:14:28.5995605Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
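Note: the retry helper in the step above runs a command and, on failure, retries it up to four more times with exponentially growing sleeps (1, 2, 4, 8 seconds) before giving up. A standalone sketch of the same pattern; "$@" is substituted for the workflow's $* so that quoted arguments survive word splitting (an illustrative refinement, not the workflow's exact code):

    retry () {
      "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") || (sleep 4 && "$@") || (sleep 8 && "$@")
    }
    retry rm -rf "/path/with spaces"       # example invocation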
2023-01-11T21:14:30.5287398Z ##[group]Run malfet/checkout@silent-checkout
2023-01-11T21:14:30.5287667Z with:
2023-01-11T21:14:30.5287882Z ref: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:14:30.5288103Z fetch-depth: 0
2023-01-11T21:14:30.5288299Z submodules: recursive
2023-01-11T21:14:30.5288500Z quiet-checkout: true
2023-01-11T21:14:30.5288706Z repository: pytorch/pytorch
2023-01-11T21:14:30.5289021Z token: ***
2023-01-11T21:14:30.5289210Z ssh-strict: true
2023-01-11T21:14:30.5289414Z persist-credentials: true
2023-01-11T21:14:30.5289626Z clean: true
2023-01-11T21:14:30.5289820Z lfs: false
2023-01-11T21:14:30.5290008Z set-safe-directory: true
2023-01-11T21:14:30.5290203Z env:
2023-01-11T21:14:30.5290389Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:14:30.5290581Z ##[endgroup]
2023-01-11T21:14:30.6382665Z Syncing repository: pytorch/pytorch
2023-01-11T21:14:30.6383967Z ##[group]Getting Git version info
2023-01-11T21:14:30.6384431Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:30.6384876Z [command]/usr/bin/git version
2023-01-11T21:14:30.6385087Z git version 2.38.1
2023-01-11T21:14:30.6385646Z ##[endgroup]
2023-01-11T21:14:30.6396410Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/72d2e0db-dd5b-43c1-a658-5ef95a1e4abf' before making global git config changes
2023-01-11T21:14:30.6396842Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T21:14:30.6399052Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:30.6434701Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2023-01-11T21:14:30.6439274Z ##[group]Initializing the repository
2023-01-11T21:14:30.6441825Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T21:14:30.6463266Z hint: Using 'master' as the name for the initial branch. This default branch name
2023-01-11T21:14:30.6463631Z hint: is subject to change. To configure the initial branch name to use in all
2023-01-11T21:14:30.6464000Z hint: of your new repositories, which will suppress this warning, call:
2023-01-11T21:14:30.6464248Z hint:
2023-01-11T21:14:30.6464541Z hint: git config --global init.defaultBranch <name>
2023-01-11T21:14:30.6464783Z hint:
2023-01-11T21:14:30.6465086Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2023-01-11T21:14:30.6465464Z hint: 'development'. The just-created branch can be renamed via this command:
2023-01-11T21:14:30.6465715Z hint:
2023-01-11T21:14:30.6466036Z hint: git branch -m <name>
2023-01-11T21:14:30.6466415Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2023-01-11T21:14:30.6471634Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2023-01-11T21:14:30.6491580Z ##[endgroup]
2023-01-11T21:14:30.6491976Z ##[group]Disabling automatic garbage collection
2023-01-11T21:14:30.6495060Z [command]/usr/bin/git config --local gc.auto 0
2023-01-11T21:14:30.6513309Z ##[endgroup]
2023-01-11T21:14:30.6513652Z ##[group]Setting up auth
2023-01-11T21:14:30.6519393Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T21:14:30.6540889Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T21:14:30.6765500Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T21:14:30.6788288Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T21:14:30.6993426Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:14:30.7032940Z ##[endgroup]
2023-01-11T21:14:30.7033311Z ##[group]Fetching the repository
2023-01-11T21:14:30.7038728Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2023-01-11T21:15:22.4358794Z [command]/usr/bin/git rev-parse --verify --quiet 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e^{object}
2023-01-11T21:15:22.4379435Z 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:22.4383202Z ##[endgroup]
2023-01-11T21:15:22.4384143Z ##[group]Determining the checkout info
2023-01-11T21:15:22.4384835Z ##[endgroup]
2023-01-11T21:15:22.4385241Z ##[group]Checking out the ref
2023-01-11T21:15:22.4388055Z [command]/usr/bin/git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:15:23.6204460Z ##[endgroup]
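Note: the checkout above is a pinned, detached checkout: initialize an empty repository, add the remote, fetch all branch and tag refs (fetch-depth: 0), then force-checkout the exact commit SHA. A minimal sketch of the same sequence, with the auth headers and submodule wiring omitted:

    git init pytorch && cd pytorch
    git remote add origin https://github.com/pytorch/pytorch
    git config --local gc.auto 0           # no automatic gc during CI
    git -c protocol.version=2 fetch --prune --no-recurse-submodules origin \
      '+refs/heads/*:refs/remotes/origin/*' '+refs/tags/*:refs/tags/*'
    git checkout --quiet --force 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e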
2023-01-11T21:15:23.6205773Z ##[group]Setting up auth for fetching submodules
2023-01-11T21:15:23.6211245Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2023-01-11T21:15:23.6259196Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2023-01-11T21:15:23.6280030Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2023-01-11T21:15:23.6309561Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2023-01-11T21:15:23.6339410Z ##[endgroup]
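Note: the two url.<base>.insteadOf entries above make git transparently rewrite SSH-style submodule URLs (git@github.com:owner/repo) to HTTPS, so the AUTHORIZATION extraheader set just before can authenticate every submodule fetch with the same token. A sketch of the rewrite plus a way to inspect it (the inspection command is illustrative, not from this log):

    git config --global --add url.https://github.com/.insteadOf git@github.com:
    git config --get-regexp 'url\..*\.insteadof'   # show active URL rewrites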
2023-01-11T21:15:23.6340414Z ##[group]Fetching submodules
2023-01-11T21:15:23.6342664Z [command]/usr/bin/git submodule sync --recursive
2023-01-11T21:15:23.6586105Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2023-01-11T21:15:23.6796782Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2023-01-11T21:15:23.6798441Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2023-01-11T21:15:23.6799122Z Submodule 'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv'
2023-01-11T21:15:23.6800071Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK'
2023-01-11T21:15:23.6802014Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK'
2023-01-11T21:15:23.6804922Z Submodule 'third_party/VulkanMemoryAllocator' (https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator.git) registered for path 'third_party/VulkanMemoryAllocator'
2023-01-11T21:15:23.6806373Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK'
2023-01-11T21:15:23.6807953Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark'
2023-01-11T21:15:23.6809966Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo'
2023-01-11T21:15:23.6812391Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub'
2023-01-11T21:15:23.6816034Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend'
2023-01-11T21:15:23.6816838Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'third_party/cutlass'
2023-01-11T21:15:23.6819020Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen'
2023-01-11T21:15:23.6821463Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm'
2023-01-11T21:15:23.6823753Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers'
2023-01-11T21:15:23.6826131Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt'
2023-01-11T21:15:23.6828758Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi'
2023-01-11T21:15:23.6831282Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:15:23.6833861Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo'
2023-01-11T21:15:23.6836643Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest'
2023-01-11T21:15:23.6839447Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep'
2023-01-11T21:15:23.6842330Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake'
2023-01-11T21:15:23.6845216Z Submodule 'third_party/ittapi' (https://github.com/intel/ittapi.git) registered for path 'third_party/ittapi'
2023-01-11T21:15:23.6848159Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto'
2023-01-11T21:15:23.6851108Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl'
2023-01-11T21:15:23.6855358Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse'
2023-01-11T21:15:23.6858544Z Submodule 'third_party/nlohmann' (https://github.com/nlohmann/json.git) registered for path 'third_party/nlohmann'
2023-01-11T21:15:23.6861673Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx'
2023-01-11T21:15:23.6864955Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt'
2023-01-11T21:15:23.6868301Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft'
2023-01-11T21:15:23.6871683Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) registered for path 'third_party/protobuf'
2023-01-11T21:15:23.6875071Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd'
2023-01-11T21:15:23.6878747Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool'
2023-01-11T21:15:23.6882216Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11'
2023-01-11T21:15:23.6885926Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum'
2023-01-11T21:15:23.6889613Z Submodule 'third_party/python-peachpy' (https://github.com/malfet/PeachPy.git) registered for path 'third_party/python-peachpy'
2023-01-11T21:15:23.6893543Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six'
2023-01-11T21:15:23.6897617Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef'
2023-01-11T21:15:23.6901434Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb'
2023-01-11T21:15:23.6905863Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe'
2023-01-11T21:15:23.6909839Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd'
2023-01-11T21:15:23.6928781Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'...
2023-01-11T21:15:23.9752383Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'...
2023-01-11T21:15:24.2009871Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'...
2023-01-11T21:15:24.5309618Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'...
2023-01-11T21:15:24.8230216Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'...
2023-01-11T21:15:25.0948223Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/VulkanMemoryAllocator'...
2023-01-11T21:15:27.1538539Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
2023-01-11T21:15:32.6805819Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'...
2023-01-11T21:15:33.0813352Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'...
2023-01-11T21:15:33.5801525Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'...
2023-01-11T21:15:35.0340374Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'...
2023-01-11T21:15:36.1281197Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
2023-01-11T21:15:37.2759124Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'...
2023-01-11T21:15:42.8814259Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
2023-01-11T21:15:43.6265726Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
2023-01-11T21:15:45.0714241Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'...
2023-01-11T21:15:46.2243279Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'...
2023-01-11T21:15:46.4328511Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'...
2023-01-11T21:15:47.1412260Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'...
2023-01-11T21:15:47.4848878Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'...
2023-01-11T21:15:48.3635735Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'...
2023-01-11T21:15:48.7759329Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'...
2023-01-11T21:15:48.9891807Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ittapi'...
2023-01-11T21:15:49.2689774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
2023-01-11T21:15:50.5765284Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'...
2023-01-11T21:15:50.9170666Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'...
2023-01-11T21:15:51.2838485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
2023-01-11T21:15:57.2794826Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
2023-01-11T21:15:58.9512136Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'...
2023-01-11T21:15:59.4773267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'...
2023-01-11T21:15:59.7145657Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
2023-01-11T21:16:05.0217394Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'...
2023-01-11T21:16:06.9220841Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'...
2023-01-11T21:16:07.2444530Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
2023-01-11T21:16:08.8290383Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'...
2023-01-11T21:16:09.4136793Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
2023-01-11T21:16:09.7346831Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'...
2023-01-11T21:16:10.1870774Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
2023-01-11T21:16:11.2655769Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'...
2023-01-11T21:16:13.0874356Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
2023-01-11T21:16:13.5666565Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'...
2023-01-11T21:16:15.6246257Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f'
2023-01-11T21:16:15.6320287Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3'
2023-01-11T21:16:15.6376911Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1'
2023-01-11T21:16:15.6554233Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73'
2023-01-11T21:16:15.6727244Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c'
2023-01-11T21:16:15.7002886Z Submodule path 'third_party/VulkanMemoryAllocator': checked out 'a6bfc237255a6bac1513f7c1ebde6d8aed6b5191'
2023-01-11T21:16:16.1929576Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba'
2023-01-11T21:16:16.8923166Z Submodule path 'third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:16:16.9769618Z Submodule path 'third_party/cpuinfo': checked out '8ec7bd91ad0470e61cf38f618cc1f270dede599c'
2023-01-11T21:16:17.0053773Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4'
2023-01-11T21:16:17.2387339Z Submodule path 'third_party/cudnn_frontend': checked out '171a7a986f7fbd9ed71bd0cf3c7ad4f55843d6b3'
2023-01-11T21:16:17.5744422Z Submodule path 'third_party/cutlass': checked out 'b72cbf957df8cf84a6d0ff91c190ad51a9c1d24a'
2023-01-11T21:16:17.7768696Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c'
2023-01-11T21:16:17.8133415Z Submodule path 'third_party/fbgemm': checked out '80d64206c07879fd4683be66873de7cefa1a0a71'
2023-01-11T21:16:17.8144783Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:17.8147013Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:17.8149954Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:17.8152793Z Submodule 'third_party/hipify_torch' (https://github.com/ROCmSoftwarePlatform/hipify_torch.git) registered for path 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:17.8179735Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'...
2023-01-11T21:16:18.7758155Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'...
2023-01-11T21:16:19.3122006Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'...
2023-01-11T21:16:20.2260036Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/hipify_torch'...
2023-01-11T21:16:20.6023525Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out 'd3fbf7c9bc7c1d1365a94a45614b91c5a3706b81'
2023-01-11T21:16:20.6879811Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3'
2023-01-11T21:16:20.7375893Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796'
2023-01-11T21:16:20.7458268Z Submodule path 'third_party/fbgemm/third_party/hipify_torch': checked out '1840658c184f3eeba787dae0f06c45756c1daaf5'
2023-01-11T21:16:20.8154152Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa'
2023-01-11T21:16:20.8455251Z Submodule path 'third_party/fmt': checked out '7bdf0628b1276379886c7f6dda2cef2b3b374f0b'
2023-01-11T21:16:20.8520589Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad'
2023-01-11T21:16:20.8827236Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350'
2023-01-11T21:16:20.9017704Z Submodule path 'third_party/gloo': checked out '4a5e339b764261d20fc409071dc7a8b8989aa195'
2023-01-11T21:16:20.9386456Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929'
2023-01-11T21:16:20.9477515Z Submodule path 'third_party/ideep': checked out 'e533c771a1e75a1c225c14b2261eefa62681d9e6'
2023-01-11T21:16:20.9488230Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:20.9505938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'...
2023-01-11T21:16:29.0382523Z Submodule path 'third_party/ideep/mkl-dnn': checked out '404ad76ee633c939d705eb583ffe50a806969d5e'
2023-01-11T21:16:29.0397569Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:29.0421171Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'...
2023-01-11T21:16:37.0542495Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out 'fbec3e25a559ee252022ae066817b204e106a6ba'
2023-01-11T21:16:37.0626089Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724'
2023-01-11T21:16:37.0741591Z Submodule path 'third_party/ittapi': checked out '5b8a7d7422611c3a0d799fb5fc5dd4abfae35b42'
2023-01-11T21:16:37.1518534Z Submodule path 'third_party/kineto': checked out '6c1629809068efd78a8d56b4aa479c7ec49ae562'
2023-01-11T21:16:37.1529987Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:37.1531079Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:37.1552163Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'...
2023-01-11T21:16:38.2297343Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'...
2023-01-11T21:16:39.1691767Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd'
2023-01-11T21:16:39.2139328Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347'
2023-01-11T21:16:39.2308033Z Submodule path 'third_party/nccl/nccl': checked out 'f89fd4777d2ef9229c039ff750ae21da01626f52'
2023-01-11T21:16:39.2416403Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a'
2023-01-11T21:16:39.3259864Z Submodule path 'third_party/nlohmann': checked out '87cda1d6646592ac5866dc703c8e1839046a6806'
2023-01-11T21:16:39.5228244Z Submodule path 'third_party/onnx': checked out 'f7ee1ac60d06abe8e26c9b6bbe1e3db5286b614b'
2023-01-11T21:16:39.5250380Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:39.5251444Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:39.5270820Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'...
2023-01-11T21:16:39.9297544Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'...
2023-01-11T21:16:40.7851843Z Submodule path 'third_party/onnx/third_party/benchmark': checked out '0d98dba29d66e93259db7daa53a9327df767a415'
2023-01-11T21:16:40.8109083Z Submodule path 'third_party/onnx/third_party/pybind11': checked out 'ffa346860b306c9bbfb341aed9c14c067751feb8'
2023-01-11T21:16:40.8231559Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f'
2023-01-11T21:16:40.8242000Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:40.8258137Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'...
2023-01-11T21:16:42.5039262Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8'
2023-01-11T21:16:42.5052394Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:42.5055600Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:42.5072632Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'...
2023-01-11T21:16:42.9010409Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'...
2023-01-11T21:16:43.7220276Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508'
2023-01-11T21:16:43.7810738Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c'
2023-01-11T21:16:43.7822244Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:43.7840021Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'...
2023-01-11T21:16:43.9979410Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2023-01-11T21:16:44.0045463Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a'
2023-01-11T21:16:44.2159850Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a'
2023-01-11T21:16:44.2177634Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:44.2179273Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:44.2196058Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'...
2023-01-11T21:16:44.5980605Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'...
2023-01-11T21:16:45.4999493Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8'
2023-01-11T21:16:45.5566207Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081'
2023-01-11T21:16:45.5622594Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900'
2023-01-11T21:16:45.5697391Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413'
2023-01-11T21:16:45.5961624Z Submodule path 'third_party/pybind11': checked out '80dc998efced8ceb2be59756668a7e90e8bef917'
2023-01-11T21:16:45.6021257Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7'
2023-01-11T21:16:45.6234421Z Submodule path 'third_party/python-peachpy': checked out 'f45429b087dd7d5bc78bb40dc7cf06425c252d67'
2023-01-11T21:16:45.6298239Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a'
2023-01-11T21:16:45.6640872Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff'
2023-01-11T21:16:45.7531369Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9'
2023-01-11T21:16:45.7727572Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e'
2023-01-11T21:16:45.7737897Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:45.7738869Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:45.7739944Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:45.7742080Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:45.7758031Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'...
2023-01-11T21:16:46.8790898Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'...
2023-01-11T21:16:47.1627435Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'...
2023-01-11T21:16:48.2546743Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'...
2023-01-11T21:16:49.2448386Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e'
2023-01-11T21:16:49.2559008Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281'
2023-01-11T21:16:49.3080362Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242'
2023-01-11T21:16:49.3323395Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef'
2023-01-11T21:16:49.3336297Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.3353642Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'...
2023-01-11T21:16:49.6150955Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5'
2023-01-11T21:16:49.7332728Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8'
2023-01-11T21:16:49.7358580Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0
2023-01-11T21:16:49.7558175Z Entering 'android/libs/fbjni'
2023-01-11T21:16:49.7584202Z Entering 'third_party/FP16'
2023-01-11T21:16:49.7609691Z Entering 'third_party/FXdiv'
2023-01-11T21:16:49.7636243Z Entering 'third_party/NNPACK'
2023-01-11T21:16:49.7667340Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:49.7693118Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:49.7723853Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:49.7762474Z Entering 'third_party/benchmark'
2023-01-11T21:16:49.7791265Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:49.7826473Z Entering 'third_party/cub'
2023-01-11T21:16:49.7854986Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:49.7884711Z Entering 'third_party/cutlass'
2023-01-11T21:16:49.7915500Z Entering 'third_party/eigen'
2023-01-11T21:16:49.7950127Z Entering 'third_party/fbgemm'
2023-01-11T21:16:49.7982160Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:49.8009725Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:49.8038431Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:49.8068300Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:49.8096590Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:49.8126282Z Entering 'third_party/fmt'
2023-01-11T21:16:49.8154495Z Entering 'third_party/foxi'
2023-01-11T21:16:49.8183489Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:49.8210424Z Entering 'third_party/gloo'
2023-01-11T21:16:49.8237224Z Entering 'third_party/googletest'
2023-01-11T21:16:49.8265011Z Entering 'third_party/ideep'
2023-01-11T21:16:49.8289980Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:49.8317847Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:49.8349464Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:49.8376240Z Entering 'third_party/ittapi'
2023-01-11T21:16:49.8401282Z Entering 'third_party/kineto'
2023-01-11T21:16:49.8427102Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:49.8455228Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:49.8481348Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:49.8506721Z Entering 'third_party/neon2sse'
2023-01-11T21:16:49.8532506Z Entering 'third_party/nlohmann'
2023-01-11T21:16:49.8558984Z Entering 'third_party/onnx'
2023-01-11T21:16:49.8594246Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:49.8620887Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:49.8648135Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:49.8674321Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:49.8709458Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:49.8736066Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:49.8762013Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.8795874Z Entering 'third_party/pocketfft'
2023-01-11T21:16:49.8823241Z Entering 'third_party/protobuf'
2023-01-11T21:16:49.8850819Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:49.8877830Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:49.8905982Z Entering 'third_party/psimd'
2023-01-11T21:16:49.8931128Z Entering 'third_party/pthreadpool'
2023-01-11T21:16:49.8958191Z Entering 'third_party/pybind11'
2023-01-11T21:16:49.8985138Z Entering 'third_party/python-enum'
2023-01-11T21:16:49.9010639Z Entering 'third_party/python-peachpy'
2023-01-11T21:16:49.9037522Z Entering 'third_party/python-six'
2023-01-11T21:16:49.9063469Z Entering 'third_party/sleef'
2023-01-11T21:16:49.9088238Z Entering 'third_party/tbb'
2023-01-11T21:16:49.9116590Z Entering 'third_party/tensorpipe'
2023-01-11T21:16:49.9142438Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:49.9167193Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:49.9193195Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:49.9219420Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:49.9244682Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:49.9272630Z Entering 'third_party/zstd'
2023-01-11T21:16:49.9306988Z ##[endgroup]
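Note: git submodule foreach --recursive, used above to set gc.auto 0 in every submodule (and repeatedly below for credential handling), runs an arbitrary shell command in each submodule working tree, printing the Entering lines seen here. The same pattern works for any per-submodule maintenance, e.g. (illustrative, not from this log):

    # Show the checked-out commit of every submodule, recursively.
    git submodule foreach --recursive 'git log -1 --oneline'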
2023-01-11T21:16:49.9307426Z ##[group]Persisting credentials for submodules
2023-01-11T21:16:49.9312790Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || :
2023-01-11T21:16:49.9524322Z Entering 'android/libs/fbjni'
2023-01-11T21:16:49.9554405Z Entering 'third_party/FP16'
2023-01-11T21:16:49.9581071Z Entering 'third_party/FXdiv'
2023-01-11T21:16:49.9605638Z Entering 'third_party/NNPACK'
2023-01-11T21:16:49.9633026Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:49.9661091Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:49.9694019Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:49.9729957Z Entering 'third_party/benchmark'
2023-01-11T21:16:49.9756555Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:49.9785822Z Entering 'third_party/cub'
2023-01-11T21:16:49.9811968Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:49.9846800Z Entering 'third_party/cutlass'
2023-01-11T21:16:49.9882157Z Entering 'third_party/eigen'
2023-01-11T21:16:49.9909795Z Entering 'third_party/fbgemm'
2023-01-11T21:16:49.9938752Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:49.9965470Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:49.9994006Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:50.0024567Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:50.0052912Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:50.0081095Z Entering 'third_party/fmt'
2023-01-11T21:16:50.0105902Z Entering 'third_party/foxi'
2023-01-11T21:16:50.0131223Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:50.0156075Z Entering 'third_party/gloo'
2023-01-11T21:16:50.0183455Z Entering 'third_party/googletest'
2023-01-11T21:16:50.0207943Z Entering 'third_party/ideep'
2023-01-11T21:16:50.0233477Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:50.0263889Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:50.0295616Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:50.0321401Z Entering 'third_party/ittapi'
2023-01-11T21:16:50.0347495Z Entering 'third_party/kineto'
2023-01-11T21:16:50.0372916Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:50.0398976Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:50.0426032Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:50.0451060Z Entering 'third_party/neon2sse'
2023-01-11T21:16:50.0476548Z Entering 'third_party/nlohmann'
2023-01-11T21:16:50.0502436Z Entering 'third_party/onnx'
2023-01-11T21:16:50.0536574Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.0561757Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.0589115Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:50.0616046Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:50.0645602Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.0672325Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.0697879Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.0726826Z Entering 'third_party/pocketfft'
2023-01-11T21:16:50.0753286Z Entering 'third_party/protobuf'
2023-01-11T21:16:50.0783302Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T21:16:50.0809823Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T21:16:50.0838202Z Entering 'third_party/psimd'
2023-01-11T21:16:50.0864768Z Entering 'third_party/pthreadpool'
2023-01-11T21:16:50.0895668Z Entering 'third_party/pybind11'
2023-01-11T21:16:50.0924518Z Entering 'third_party/python-enum'
2023-01-11T21:16:50.0951419Z Entering 'third_party/python-peachpy'
2023-01-11T21:16:50.0977859Z Entering 'third_party/python-six'
2023-01-11T21:16:50.1003643Z Entering 'third_party/sleef'
2023-01-11T21:16:50.1030062Z Entering 'third_party/tbb'
2023-01-11T21:16:50.1057024Z Entering 'third_party/tensorpipe'
2023-01-11T21:16:50.1082602Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T21:16:50.1108382Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T21:16:50.1138512Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T21:16:50.1166627Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T21:16:50.1192717Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.1220764Z Entering 'third_party/zstd'
2023-01-11T21:16:50.1256752Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url
2023-01-11T21:16:50.1465988Z Entering 'android/libs/fbjni'
2023-01-11T21:16:50.1489153Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url
2023-01-11T21:16:50.1500948Z Entering 'third_party/FP16'
2023-01-11T21:16:50.1524037Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url
2023-01-11T21:16:50.1535154Z Entering 'third_party/FXdiv'
2023-01-11T21:16:50.1558492Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url
2023-01-11T21:16:50.1568712Z Entering 'third_party/NNPACK'
2023-01-11T21:16:50.1592476Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url
2023-01-11T21:16:50.1603337Z Entering 'third_party/QNNPACK'
2023-01-11T21:16:50.1627250Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url
2023-01-11T21:16:50.1637913Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T21:16:50.1662678Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/VulkanMemoryAllocator/config remote.origin.url
2023-01-11T21:16:50.1673315Z Entering 'third_party/XNNPACK'
2023-01-11T21:16:50.1696803Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url
2023-01-11T21:16:50.1713905Z Entering 'third_party/benchmark'
2023-01-11T21:16:50.1739519Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.1752686Z Entering 'third_party/cpuinfo'
2023-01-11T21:16:50.1776674Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url
2023-01-11T21:16:50.1787589Z Entering 'third_party/cub'
2023-01-11T21:16:50.1811461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url
2023-01-11T21:16:50.1824650Z Entering 'third_party/cudnn_frontend'
2023-01-11T21:16:50.1848923Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url
2023-01-11T21:16:50.1863675Z Entering 'third_party/cutlass'
2023-01-11T21:16:50.1887348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cutlass/config remote.origin.url
2023-01-11T21:16:50.1902865Z Entering 'third_party/eigen'
2023-01-11T21:16:50.1926461Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url
2023-01-11T21:16:50.1939091Z Entering 'third_party/fbgemm'
2023-01-11T21:16:50.1962920Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url
2023-01-11T21:16:50.1975767Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T21:16:50.2000632Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url
2023-01-11T21:16:50.2013466Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T21:16:50.2040289Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url
2023-01-11T21:16:50.2054918Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T21:16:50.2079574Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2091239Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T21:16:50.2119894Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/hipify_torch/config remote.origin.url
2023-01-11T21:16:50.2132725Z Entering 'third_party/flatbuffers'
2023-01-11T21:16:50.2158803Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url
2023-01-11T21:16:50.2171672Z Entering 'third_party/fmt'
2023-01-11T21:16:50.2195956Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url
2023-01-11T21:16:50.2206864Z Entering 'third_party/foxi'
2023-01-11T21:16:50.2231398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url
2023-01-11T21:16:50.2242171Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T21:16:50.2267371Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url
2023-01-11T21:16:50.2277994Z Entering 'third_party/gloo'
2023-01-11T21:16:50.2303523Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url
2023-01-11T21:16:50.2314110Z Entering 'third_party/googletest'
2023-01-11T21:16:50.2338208Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2349326Z Entering 'third_party/ideep'
2023-01-11T21:16:50.2376132Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url
2023-01-11T21:16:50.2387432Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T21:16:50.2413142Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url
2023-01-11T21:16:50.2428572Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T21:16:50.2456533Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url
2023-01-11T21:16:50.2474157Z Entering 'third_party/ios-cmake'
2023-01-11T21:16:50.2500687Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url
2023-01-11T21:16:50.2512871Z Entering 'third_party/ittapi'
2023-01-11T21:16:50.2542102Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ittapi/config remote.origin.url
2023-01-11T21:16:50.2554343Z Entering 'third_party/kineto'
2023-01-11T21:16:50.2580495Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url
2023-01-11T21:16:50.2591008Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T21:16:50.2618653Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url
2023-01-11T21:16:50.2631175Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T21:16:50.2656323Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url
2023-01-11T21:16:50.2668375Z Entering 'third_party/nccl/nccl'
2023-01-11T21:16:50.2692167Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url
2023-01-11T21:16:50.2704345Z Entering 'third_party/neon2sse'
2023-01-11T21:16:50.2727949Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url
2023-01-11T21:16:50.2739095Z Entering 'third_party/nlohmann'
2023-01-11T21:16:50.2763197Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nlohmann/config remote.origin.url
2023-01-11T21:16:50.2777785Z Entering 'third_party/onnx'
2023-01-11T21:16:50.2802502Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url
2023-01-11T21:16:50.2823794Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.2848575Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.2860874Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.2888348Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2023-01-11T21:16:50.2903214Z Entering 'third_party/onnx-tensorrt'
2023-01-11T21:16:50.2928398Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url
2023-01-11T21:16:50.2939049Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T21:16:50.2963449Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url
2023-01-11T21:16:50.2979344Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T21:16:50.3006357Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url
2023-01-11T21:16:50.3019662Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T21:16:50.3044745Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url
2023-01-11T21:16:50.3056213Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T21:16:50.3080728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url
2023-01-11T21:16:50.3096294Z Entering 'third_party/pocketfft'
2023-01-11T21:16:50.3120178Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2023-01-11T21:16:50.3130554Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.3154657Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2023-01-11T21:16:50.3167434Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.3191846Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2023-01-11T21:16:50.3203186Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.3227018Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:16:50.3239235Z Entering 'third_party/psimd' 2023-01-11T21:16:50.3263259Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2023-01-11T21:16:50.3274416Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.3299294Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2023-01-11T21:16:50.3310202Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.3335428Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:16:50.3345416Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.3368894Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2023-01-11T21:16:50.3380321Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.3404392Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2023-01-11T21:16:50.3416035Z Entering 'third_party/python-six' 2023-01-11T21:16:50.3439591Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2023-01-11T21:16:50.3450564Z Entering 'third_party/sleef' 2023-01-11T21:16:50.3474616Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2023-01-11T21:16:50.3485104Z Entering 'third_party/tbb' 2023-01-11T21:16:50.3508297Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config remote.origin.url 2023-01-11T21:16:50.3521056Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.3549264Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2023-01-11T21:16:50.3561112Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.3588137Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2023-01-11T21:16:50.3599037Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.3622979Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2023-01-11T21:16:50.3633990Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.3659751Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2023-01-11T21:16:50.3672844Z Entering 
'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.3698100Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2023-01-11T21:16:50.3708066Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.3731694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2023-01-11T21:16:50.3745317Z Entering 'third_party/zstd' 2023-01-11T21:16:50.3768728Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2023-01-11T21:16:50.4961820Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2023-01-11T21:16:50.5167849Z Entering 'android/libs/fbjni' 2023-01-11T21:16:50.5194687Z Entering 'third_party/FP16' 2023-01-11T21:16:50.5224065Z Entering 'third_party/FXdiv' 2023-01-11T21:16:50.5252851Z Entering 'third_party/NNPACK' 2023-01-11T21:16:50.5280959Z Entering 'third_party/QNNPACK' 2023-01-11T21:16:50.5311816Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:16:50.5340737Z Entering 'third_party/XNNPACK' 2023-01-11T21:16:50.5378416Z Entering 'third_party/benchmark' 2023-01-11T21:16:50.5406656Z Entering 'third_party/cpuinfo' 2023-01-11T21:16:50.5444384Z Entering 'third_party/cub' 2023-01-11T21:16:50.5473299Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:16:50.5505187Z Entering 'third_party/cutlass' 2023-01-11T21:16:50.5539057Z Entering 'third_party/eigen' 2023-01-11T21:16:50.5568347Z Entering 'third_party/fbgemm' 2023-01-11T21:16:50.5598341Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:16:50.5628446Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:16:50.5655774Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:16:50.5684150Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:16:50.5712692Z Entering 'third_party/flatbuffers' 2023-01-11T21:16:50.5742401Z Entering 'third_party/fmt' 2023-01-11T21:16:50.5770480Z Entering 'third_party/foxi' 2023-01-11T21:16:50.5797848Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:16:50.5827012Z Entering 'third_party/gloo' 2023-01-11T21:16:50.5855835Z Entering 'third_party/googletest' 2023-01-11T21:16:50.5886605Z Entering 'third_party/ideep' 2023-01-11T21:16:50.5914639Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:16:50.6034387Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2023-01-11T21:16:50.6066317Z Entering 'third_party/ios-cmake' 2023-01-11T21:16:50.6096791Z Entering 'third_party/ittapi' 2023-01-11T21:16:50.6122403Z Entering 'third_party/kineto' 2023-01-11T21:16:50.6150592Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:16:50.6177643Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:16:50.6203779Z Entering 'third_party/nccl/nccl' 2023-01-11T21:16:50.6234659Z Entering 'third_party/neon2sse' 2023-01-11T21:16:50.6261299Z Entering 'third_party/nlohmann' 2023-01-11T21:16:50.6287390Z Entering 'third_party/onnx' 2023-01-11T21:16:50.6325934Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.6353398Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.6383930Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:16:50.6411080Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:16:50.6441032Z 
Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.6468160Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.6495532Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.6526176Z Entering 'third_party/pocketfft' 2023-01-11T21:16:50.6552118Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.6581083Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.6606654Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.6638207Z Entering 'third_party/psimd' 2023-01-11T21:16:50.6664424Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.6691979Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.6721368Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.6747345Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.6775772Z Entering 'third_party/python-six' 2023-01-11T21:16:50.6801515Z Entering 'third_party/sleef' 2023-01-11T21:16:50.6831325Z Entering 'third_party/tbb' 2023-01-11T21:16:50.6858839Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.6884917Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.6913394Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.6939841Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.6968100Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.6993870Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.7024666Z Entering 'third_party/zstd' 2023-01-11T21:16:50.7060614Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2023-01-11T21:16:50.7269625Z Entering 'android/libs/fbjni' 2023-01-11T21:16:50.7298664Z Entering 'third_party/FP16' 2023-01-11T21:16:50.7327201Z Entering 'third_party/FXdiv' 2023-01-11T21:16:50.7356240Z Entering 'third_party/NNPACK' 2023-01-11T21:16:50.7385244Z Entering 'third_party/QNNPACK' 2023-01-11T21:16:50.7413931Z Entering 'third_party/VulkanMemoryAllocator' 2023-01-11T21:16:50.7445351Z Entering 'third_party/XNNPACK' 2023-01-11T21:16:50.7479147Z Entering 'third_party/benchmark' 2023-01-11T21:16:50.7508680Z Entering 'third_party/cpuinfo' 2023-01-11T21:16:50.7537664Z Entering 'third_party/cub' 2023-01-11T21:16:50.7565020Z Entering 'third_party/cudnn_frontend' 2023-01-11T21:16:50.7598869Z Entering 'third_party/cutlass' 2023-01-11T21:16:50.7633409Z Entering 'third_party/eigen' 2023-01-11T21:16:50.7664941Z Entering 'third_party/fbgemm' 2023-01-11T21:16:50.7692960Z Entering 'third_party/fbgemm/third_party/asmjit' 2023-01-11T21:16:50.7721622Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2023-01-11T21:16:50.7749743Z Entering 'third_party/fbgemm/third_party/googletest' 2023-01-11T21:16:50.7776072Z Entering 'third_party/fbgemm/third_party/hipify_torch' 2023-01-11T21:16:50.7805658Z Entering 'third_party/flatbuffers' 2023-01-11T21:16:50.7838582Z Entering 'third_party/fmt' 2023-01-11T21:16:50.7866577Z Entering 'third_party/foxi' 2023-01-11T21:16:50.7895915Z Entering 'third_party/gemmlowp/gemmlowp' 2023-01-11T21:16:50.7924604Z Entering 'third_party/gloo' 2023-01-11T21:16:50.7954518Z Entering 'third_party/googletest' 2023-01-11T21:16:50.7983163Z Entering 'third_party/ideep' 2023-01-11T21:16:50.8010564Z Entering 'third_party/ideep/mkl-dnn' 2023-01-11T21:16:50.8040656Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 
2023-01-11T21:16:50.8073796Z Entering 'third_party/ios-cmake' 2023-01-11T21:16:50.8101117Z Entering 'third_party/ittapi' 2023-01-11T21:16:50.8130154Z Entering 'third_party/kineto' 2023-01-11T21:16:50.8160127Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2023-01-11T21:16:50.8187580Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2023-01-11T21:16:50.8216935Z Entering 'third_party/nccl/nccl' 2023-01-11T21:16:50.8247322Z Entering 'third_party/neon2sse' 2023-01-11T21:16:50.8277120Z Entering 'third_party/nlohmann' 2023-01-11T21:16:50.8309023Z Entering 'third_party/onnx' 2023-01-11T21:16:50.8345382Z Entering 'third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.8375904Z Entering 'third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.8405273Z Entering 'third_party/onnx-tensorrt' 2023-01-11T21:16:50.8433322Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2023-01-11T21:16:50.8466152Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2023-01-11T21:16:50.8496053Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2023-01-11T21:16:50.8524715Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.8557051Z Entering 'third_party/pocketfft' 2023-01-11T21:16:50.8583714Z Entering 'third_party/protobuf' 2023-01-11T21:16:50.8614852Z Entering 'third_party/protobuf/third_party/benchmark' 2023-01-11T21:16:50.8643913Z Entering 'third_party/protobuf/third_party/googletest' 2023-01-11T21:16:50.8673922Z Entering 'third_party/psimd' 2023-01-11T21:16:50.8702621Z Entering 'third_party/pthreadpool' 2023-01-11T21:16:50.8730614Z Entering 'third_party/pybind11' 2023-01-11T21:16:50.8761015Z Entering 'third_party/python-enum' 2023-01-11T21:16:50.8787742Z Entering 'third_party/python-peachpy' 2023-01-11T21:16:50.8815437Z Entering 'third_party/python-six' 2023-01-11T21:16:50.8844054Z Entering 'third_party/sleef' 2023-01-11T21:16:50.8872032Z Entering 'third_party/tbb' 2023-01-11T21:16:50.8900878Z Entering 'third_party/tensorpipe' 2023-01-11T21:16:50.8930041Z Entering 'third_party/tensorpipe/third_party/googletest' 2023-01-11T21:16:50.8960001Z Entering 'third_party/tensorpipe/third_party/libnop' 2023-01-11T21:16:50.8989616Z Entering 'third_party/tensorpipe/third_party/libuv' 2023-01-11T21:16:50.9016250Z Entering 'third_party/tensorpipe/third_party/pybind11' 2023-01-11T21:16:50.9043704Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2023-01-11T21:16:50.9075801Z Entering 'third_party/zstd' 2023-01-11T21:16:50.9109560Z ##[endgroup] 2023-01-11T21:16:50.9139535Z [command]/usr/bin/git log -1 --format='%H' 2023-01-11T21:16:50.9161649Z '8419ddda87c8a47eacc63b54bc7ec98c1f27c26e' 2023-01-11T21:16:50.9274811Z Prepare all required actions 2023-01-11T21:16:50.9300769Z ##[group]Run ./.github/actions/setup-linux 2023-01-11T21:16:50.9301018Z env: 2023-01-11T21:16:50.9301245Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9301456Z ##[endgroup] 2023-01-11T21:16:50.9337541Z ##[group]Run set -euo pipefail 2023-01-11T21:16:50.9337802Z set -euo pipefail 2023-01-11T21:16:50.9338020Z function get_ec2_metadata() { 2023-01-11T21:16:50.9338286Z  # Pulled from instance metadata endpoint for EC2 2023-01-11T21:16:50.9338650Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2023-01-11T21:16:50.9338959Z  category=$1 2023-01-11T21:16:50.9339218Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2023-01-11T21:16:50.9339488Z } 
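# NOTE (editorial sketch, not part of the logged step): the get_ec2_metadata helper
# defined above queries the EC2 Instance Metadata Service at the fixed link-local
# address 169.254.169.254; each category argument maps to one metadata path, e.g.:
#   get_ec2_metadata instance-type   # prints "g5.4xlarge" on this runner (see output below)
#   get_ec2_metadata ami-id          # prints the AMI ID, "ami-096198a0bccc6bad4" below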
2023-01-11T21:16:50.9339735Z echo "ami-id: $(get_ec2_metadata ami-id)" 2023-01-11T21:16:50.9340200Z echo "instance-id: $(get_ec2_metadata instance-id)" 2023-01-11T21:16:50.9340504Z echo "instance-type: $(get_ec2_metadata instance-type)" 2023-01-11T21:16:50.9340766Z echo "system info $(uname -a)" 2023-01-11T21:16:50.9351589Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9351826Z env: 2023-01-11T21:16:50.9352010Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9352212Z ##[endgroup] 2023-01-11T21:16:50.9421223Z ami-id: ami-096198a0bccc6bad4 2023-01-11T21:16:50.9462306Z instance-id: i-016718a172a944ca0 2023-01-11T21:16:50.9502944Z instance-type: g5.4xlarge 2023-01-11T21:16:50.9508475Z system info Linux ip-10-0-2-196.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2023-01-11T21:16:50.9523866Z ##[group]Run if systemctl is-active --quiet docker; then 2023-01-11T21:16:50.9524165Z if systemctl is-active --quiet docker; then 2023-01-11T21:16:50.9524440Z  echo "Docker daemon is running..."; 2023-01-11T21:16:50.9524678Z else 2023-01-11T21:16:50.9524925Z  echo "Starting docker daemon..." && sudo systemctl start docker; 2023-01-11T21:16:50.9525174Z fi 2023-01-11T21:16:50.9536476Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9536715Z env: 2023-01-11T21:16:50.9536918Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9537130Z ##[endgroup] 2023-01-11T21:16:50.9574676Z Docker daemon is running... 2023-01-11T21:16:50.9588207Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:16:50.9588580Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2023-01-11T21:16:50.9588875Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:50.9589278Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2023-01-11T21:16:50.9589646Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2023-01-11T21:16:50.9598308Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:50.9598538Z env: 2023-01-11T21:16:50.9598730Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:50.9598946Z AWS_RETRY_MODE: standard 2023-01-11T21:16:50.9599147Z AWS_MAX_ATTEMPTS: 5 2023-01-11T21:16:50.9599440Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T21:16:50.9599653Z ##[endgroup] 2023-01-11T21:16:51.7735285Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2023-01-11T21:16:51.7736208Z Configure a credential helper to remove this warning.
See 2023-01-11T21:16:51.7737203Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2023-01-11T21:16:51.7737735Z 2023-01-11T21:16:51.7737967Z Login Succeeded 2023-01-11T21:16:51.7769173Z ##[group]Run env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7769726Z env | grep '^GITHUB' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7770131Z env | grep '^CI' >> "/tmp/github_env_${GITHUB_RUN_ID}" 2023-01-11T21:16:51.7781354Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:51.7781585Z env: 2023-01-11T21:16:51.7781772Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7781975Z ##[endgroup] 2023-01-11T21:16:51.7848687Z ##[group]Run pytorch/test-infra/.github/actions/pull-docker-image@main 2023-01-11T21:16:51.7848951Z with: 2023-01-11T21:16:51.7849330Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:16:51.7849694Z env: 2023-01-11T21:16:51.7849882Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7850087Z ##[endgroup] 2023-01-11T21:16:51.7861466Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:51.7861750Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2023-01-11T21:16:51.7862034Z # ignore output since only exit code is used for conditional 2023-01-11T21:16:51.7862336Z # only pull docker image if it's not available locally 2023-01-11T21:16:51.7862650Z if ! docker inspect --type=image "${DOCKER_IMAGE}" >/dev/null 2>/dev/null; then 2023-01-11T21:16:51.7862958Z  retry docker pull "${DOCKER_IMAGE}" 2023-01-11T21:16:51.7863176Z fi 2023-01-11T21:16:51.7871835Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T21:16:51.7872067Z env: 2023-01-11T21:16:51.7872259Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:16:51.7872657Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:16:51.7873028Z ##[endgroup] 2023-01-11T21:16:52.0040315Z fd224c2e6c79d7fdec6408da598bf52bc5b201dd: Pulling from pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7 2023-01-11T21:16:52.0040819Z fb668870d8a7: Pulling fs layer 2023-01-11T21:16:52.0041161Z 3dc32ed140fb: Pulling fs layer 2023-01-11T21:16:52.0041479Z 54a1df240516: Pulling fs layer 2023-01-11T21:16:52.0041693Z cf378b3cb3c7: Pulling fs layer 2023-01-11T21:16:52.0041909Z 9b4412378859: Pulling fs layer 2023-01-11T21:16:52.0045820Z 502253a1be21: Pulling fs layer 2023-01-11T21:16:52.0046226Z 5c7dd67e5809: Pulling fs layer 2023-01-11T21:16:52.0046684Z bdfd23ed3f48: Pulling fs layer 2023-01-11T21:16:52.0047049Z aee1dd761bdd: Pulling fs layer 2023-01-11T21:16:52.0047311Z 5feda9af2542: Pulling fs layer 2023-01-11T21:16:52.0047671Z f8371ecb849a: Pulling fs layer 2023-01-11T21:16:52.0048102Z ce4a87d45645: Pulling fs layer 2023-01-11T21:16:52.0048513Z 39629f7269f9: Pulling fs layer 2023-01-11T21:16:52.0048894Z cf378b3cb3c7: Waiting 2023-01-11T21:16:52.0049285Z 87d0ffa55850: Pulling fs layer 2023-01-11T21:16:52.0063658Z 70702f8b5bc4: Pulling fs layer 2023-01-11T21:16:52.0064075Z 0c06be5c20e0: Pulling fs layer 2023-01-11T21:16:52.0064374Z b372c2a3bc3f: Pulling fs layer 2023-01-11T21:16:52.0064663Z 582d081a59fa: Pulling fs layer 2023-01-11T21:16:52.0064883Z e1c655e7ec0e: Pulling fs layer 2023-01-11T21:16:52.0065108Z c7726d39d806: Pulling fs layer 2023-01-11T21:16:52.0065314Z 9b4412378859: Waiting 
2023-01-11T21:16:52.0065512Z 1c22f2f8c01b: Pulling fs layer 2023-01-11T21:16:52.0065722Z bdfd23ed3f48: Waiting 2023-01-11T21:16:52.0070259Z b8f759fd0191: Pulling fs layer 2023-01-11T21:16:52.0070826Z 502253a1be21: Waiting 2023-01-11T21:16:52.0071110Z e28e73a4bddd: Pulling fs layer 2023-01-11T21:16:52.0071395Z 90d8f9bbe048: Pulling fs layer 2023-01-11T21:16:52.0071663Z b34bd39d0461: Pulling fs layer 2023-01-11T21:16:52.0071937Z 2f2308643d60: Pulling fs layer 2023-01-11T21:16:52.0072209Z 8e3432e5a569: Pulling fs layer 2023-01-11T21:16:52.0072457Z aee1dd761bdd: Waiting 2023-01-11T21:16:52.0073707Z 9ea746919509: Pulling fs layer 2023-01-11T21:16:52.0074068Z 5c7dd67e5809: Waiting 2023-01-11T21:16:52.0074301Z 39629f7269f9: Waiting 2023-01-11T21:16:52.0074543Z 70702f8b5bc4: Waiting 2023-01-11T21:16:52.0074785Z 0c06be5c20e0: Waiting 2023-01-11T21:16:52.0075033Z 1a2fd7b216d7: Pulling fs layer 2023-01-11T21:16:52.0075297Z 5feda9af2542: Waiting 2023-01-11T21:16:52.0075542Z f8371ecb849a: Waiting 2023-01-11T21:16:52.0075828Z 19fde6a723a0: Pulling fs layer 2023-01-11T21:16:52.0076100Z 06369252d749: Pulling fs layer 2023-01-11T21:16:52.0076483Z c7726d39d806: Waiting 2023-01-11T21:16:52.0076746Z b372c2a3bc3f: Waiting 2023-01-11T21:16:52.0077004Z ea4bfeaa0fc7: Pulling fs layer 2023-01-11T21:16:52.0077246Z a1d16b6a5070: Pulling fs layer 2023-01-11T21:16:52.0077463Z f550b7ff2470: Pulling fs layer 2023-01-11T21:16:52.0077657Z e1c655e7ec0e: Waiting 2023-01-11T21:16:52.0077858Z 12ddc57b99eb: Pulling fs layer 2023-01-11T21:16:52.0078070Z 8345085fb0a0: Pulling fs layer 2023-01-11T21:16:52.0078264Z b34bd39d0461: Waiting 2023-01-11T21:16:52.0078465Z 4cc94dbec031: Pulling fs layer 2023-01-11T21:16:52.0078665Z 90d8f9bbe048: Waiting 2023-01-11T21:16:52.0078860Z 29a7c0d5fa4c: Pulling fs layer 2023-01-11T21:16:52.0079069Z 25571655d0e1: Pulling fs layer 2023-01-11T21:16:52.0079278Z bdf297d7f88c: Pulling fs layer 2023-01-11T21:16:52.0079478Z 1c22f2f8c01b: Waiting 2023-01-11T21:16:52.0079712Z 0b3950af8ae1: Pulling fs layer 2023-01-11T21:16:52.0079944Z 6d68f7da8baa: Pulling fs layer 2023-01-11T21:16:52.0080139Z 9ea746919509: Waiting 2023-01-11T21:16:52.0080331Z 4cc94dbec031: Waiting 2023-01-11T21:16:52.0080526Z b8f759fd0191: Waiting 2023-01-11T21:16:52.0080719Z cca768f96df4: Pulling fs layer 2023-01-11T21:16:52.0080923Z f550b7ff2470: Waiting 2023-01-11T21:16:52.0081123Z 8c3cf3d5e1c5: Pulling fs layer 2023-01-11T21:16:52.0081331Z 61eecfa8b34e: Pulling fs layer 2023-01-11T21:16:52.0081531Z 06369252d749: Waiting 2023-01-11T21:16:52.0081715Z 8345085fb0a0: Waiting 2023-01-11T21:16:52.0081895Z 29a7c0d5fa4c: Waiting 2023-01-11T21:16:52.0082083Z 0b3950af8ae1: Waiting 2023-01-11T21:16:52.0082264Z 8e3432e5a569: Waiting 2023-01-11T21:16:52.0082441Z 12ddc57b99eb: Waiting 2023-01-11T21:16:52.0082629Z 582d081a59fa: Waiting 2023-01-11T21:16:52.0082818Z e28e73a4bddd: Waiting 2023-01-11T21:16:52.0082998Z a1d16b6a5070: Waiting 2023-01-11T21:16:52.0083191Z 6d68f7da8baa: Waiting 2023-01-11T21:16:52.0083374Z 25571655d0e1: Waiting 2023-01-11T21:16:52.0083558Z ea4bfeaa0fc7: Waiting 2023-01-11T21:16:52.0083749Z 8c3cf3d5e1c5: Waiting 2023-01-11T21:16:52.0083938Z bdf297d7f88c: Waiting 2023-01-11T21:16:52.0084124Z cca768f96df4: Waiting 2023-01-11T21:16:52.0084326Z 95c1ac011645: Pulling fs layer 2023-01-11T21:16:52.0084525Z 61eecfa8b34e: Waiting 2023-01-11T21:16:52.0084717Z 3046cc00c4ca: Pulling fs layer 2023-01-11T21:16:52.0084930Z 195d560d8cf6: Pulling fs layer 2023-01-11T21:16:52.0085141Z 77250abd5ca4: Pulling fs layer 2023-01-11T21:16:52.0085355Z 
881b24daf9c5: Pulling fs layer 2023-01-11T21:16:52.0085559Z 9fbf0a18619e: Pulling fs layer 2023-01-11T21:16:52.0085769Z 02048a597c22: Pulling fs layer 2023-01-11T21:16:52.0085963Z 95c1ac011645: Waiting 2023-01-11T21:16:52.0086152Z 859052a25d95: Pulling fs layer 2023-01-11T21:16:52.0086350Z 195d560d8cf6: Waiting 2023-01-11T21:16:52.0086538Z 3046cc00c4ca: Waiting 2023-01-11T21:16:52.0086719Z 77250abd5ca4: Waiting 2023-01-11T21:16:52.0086920Z 3e03143da3c2: Pulling fs layer 2023-01-11T21:16:52.0087125Z 881b24daf9c5: Waiting 2023-01-11T21:16:52.0087307Z 9fbf0a18619e: Waiting 2023-01-11T21:16:52.0087490Z 859052a25d95: Waiting 2023-01-11T21:16:52.0087676Z 3e03143da3c2: Waiting 2023-01-11T21:16:52.0087906Z 02048a597c22: Waiting 2023-01-11T21:16:52.1265075Z 3dc32ed140fb: Verifying Checksum 2023-01-11T21:16:52.1265478Z 3dc32ed140fb: Download complete 2023-01-11T21:16:52.2135529Z cf378b3cb3c7: Verifying Checksum 2023-01-11T21:16:52.2136049Z cf378b3cb3c7: Download complete 2023-01-11T21:16:52.2715343Z 9b4412378859: Download complete 2023-01-11T21:16:52.3085283Z 54a1df240516: Verifying Checksum 2023-01-11T21:16:52.3085585Z 54a1df240516: Download complete 2023-01-11T21:16:52.3372911Z fb668870d8a7: Download complete 2023-01-11T21:16:52.4196997Z 5c7dd67e5809: Verifying Checksum 2023-01-11T21:16:52.4197353Z 5c7dd67e5809: Download complete 2023-01-11T21:16:52.4585949Z bdfd23ed3f48: Download complete 2023-01-11T21:16:52.5073386Z aee1dd761bdd: Verifying Checksum 2023-01-11T21:16:52.5073725Z aee1dd761bdd: Download complete 2023-01-11T21:16:52.6015657Z f8371ecb849a: Download complete 2023-01-11T21:16:52.6929645Z ce4a87d45645: Download complete 2023-01-11T21:16:52.9619150Z fb668870d8a7: Pull complete 2023-01-11T21:16:53.2066467Z 3dc32ed140fb: Pull complete 2023-01-11T21:16:53.6638793Z 54a1df240516: Pull complete 2023-01-11T21:16:53.7736960Z cf378b3cb3c7: Pull complete 2023-01-11T21:16:53.8705145Z 9b4412378859: Pull complete 2023-01-11T21:16:54.8283262Z 39629f7269f9: Verifying Checksum 2023-01-11T21:16:54.8283603Z 39629f7269f9: Download complete 2023-01-11T21:16:54.9209092Z 87d0ffa55850: Verifying Checksum 2023-01-11T21:16:54.9209403Z 87d0ffa55850: Download complete 2023-01-11T21:16:54.9989753Z 70702f8b5bc4: Verifying Checksum 2023-01-11T21:16:54.9990144Z 70702f8b5bc4: Download complete 2023-01-11T21:16:55.0959173Z 0c06be5c20e0: Download complete 2023-01-11T21:16:57.0643542Z b372c2a3bc3f: Verifying Checksum 2023-01-11T21:16:57.0643940Z b372c2a3bc3f: Download complete 2023-01-11T21:16:57.1319682Z 582d081a59fa: Verifying Checksum 2023-01-11T21:16:57.1319930Z 582d081a59fa: Download complete 2023-01-11T21:16:57.2164460Z e1c655e7ec0e: Download complete 2023-01-11T21:17:03.4948380Z 502253a1be21: Download complete 2023-01-11T21:17:03.5609436Z 1c22f2f8c01b: Verifying Checksum 2023-01-11T21:17:03.5609718Z 1c22f2f8c01b: Download complete 2023-01-11T21:17:03.6409552Z b8f759fd0191: Verifying Checksum 2023-01-11T21:17:03.6409795Z b8f759fd0191: Download complete 2023-01-11T21:17:03.7388181Z e28e73a4bddd: Verifying Checksum 2023-01-11T21:17:03.7388513Z e28e73a4bddd: Download complete 2023-01-11T21:17:03.8498961Z 90d8f9bbe048: Verifying Checksum 2023-01-11T21:17:03.8499326Z 90d8f9bbe048: Download complete 2023-01-11T21:17:03.9535712Z b34bd39d0461: Download complete 2023-01-11T21:17:04.0445951Z 2f2308643d60: Download complete 2023-01-11T21:17:05.0018394Z 8e3432e5a569: Verifying Checksum 2023-01-11T21:17:05.0018797Z 8e3432e5a569: Download complete 2023-01-11T21:17:05.0930538Z 9ea746919509: Verifying Checksum 
2023-01-11T21:17:05.0930864Z 9ea746919509: Download complete 2023-01-11T21:17:05.1789841Z 1a2fd7b216d7: Download complete 2023-01-11T21:17:05.2745643Z 19fde6a723a0: Verifying Checksum 2023-01-11T21:17:05.2745982Z 19fde6a723a0: Download complete 2023-01-11T21:17:05.4105633Z 06369252d749: Verifying Checksum 2023-01-11T21:17:05.4105961Z 06369252d749: Download complete 2023-01-11T21:17:05.4979334Z ea4bfeaa0fc7: Verifying Checksum 2023-01-11T21:17:05.4979672Z ea4bfeaa0fc7: Download complete 2023-01-11T21:17:06.8218954Z 5feda9af2542: Verifying Checksum 2023-01-11T21:17:06.8219240Z 5feda9af2542: Download complete 2023-01-11T21:17:06.8935626Z f550b7ff2470: Download complete 2023-01-11T21:17:06.9923285Z 12ddc57b99eb: Download complete 2023-01-11T21:17:07.5484597Z 8345085fb0a0: Verifying Checksum 2023-01-11T21:17:07.5485014Z 8345085fb0a0: Download complete 2023-01-11T21:17:07.6299910Z 4cc94dbec031: Verifying Checksum 2023-01-11T21:17:07.6300265Z 4cc94dbec031: Download complete 2023-01-11T21:17:07.7165028Z 29a7c0d5fa4c: Verifying Checksum 2023-01-11T21:17:07.7165465Z 29a7c0d5fa4c: Download complete 2023-01-11T21:17:07.9740510Z 25571655d0e1: Verifying Checksum 2023-01-11T21:17:07.9740870Z 25571655d0e1: Download complete 2023-01-11T21:17:08.0771718Z bdf297d7f88c: Verifying Checksum 2023-01-11T21:17:08.0772544Z bdf297d7f88c: Download complete 2023-01-11T21:17:08.5529328Z 0b3950af8ae1: Verifying Checksum 2023-01-11T21:17:08.5529614Z 0b3950af8ae1: Download complete 2023-01-11T21:17:08.6380071Z 6d68f7da8baa: Verifying Checksum 2023-01-11T21:17:08.6380346Z 6d68f7da8baa: Download complete 2023-01-11T21:17:08.7241864Z cca768f96df4: Verifying Checksum 2023-01-11T21:17:08.7242246Z cca768f96df4: Download complete 2023-01-11T21:17:08.8119701Z a1d16b6a5070: Verifying Checksum 2023-01-11T21:17:08.8120085Z a1d16b6a5070: Download complete 2023-01-11T21:17:08.8903387Z 61eecfa8b34e: Download complete 2023-01-11T21:17:08.9962720Z 95c1ac011645: Verifying Checksum 2023-01-11T21:17:08.9963131Z 95c1ac011645: Download complete 2023-01-11T21:17:09.0869092Z 3046cc00c4ca: Verifying Checksum 2023-01-11T21:17:09.0869446Z 3046cc00c4ca: Download complete 2023-01-11T21:17:09.1828423Z 195d560d8cf6: Verifying Checksum 2023-01-11T21:17:09.1828909Z 195d560d8cf6: Download complete 2023-01-11T21:17:09.7412014Z 77250abd5ca4: Download complete 2023-01-11T21:17:09.8175811Z 881b24daf9c5: Verifying Checksum 2023-01-11T21:17:09.8176099Z 881b24daf9c5: Download complete 2023-01-11T21:17:11.6815169Z 9fbf0a18619e: Verifying Checksum 2023-01-11T21:17:11.6815892Z 9fbf0a18619e: Download complete 2023-01-11T21:17:11.7740456Z 02048a597c22: Verifying Checksum 2023-01-11T21:17:11.7740750Z 02048a597c22: Download complete 2023-01-11T21:17:14.5226181Z 8c3cf3d5e1c5: Verifying Checksum 2023-01-11T21:17:14.5226623Z 8c3cf3d5e1c5: Download complete 2023-01-11T21:17:14.6376493Z 3e03143da3c2: Verifying Checksum 2023-01-11T21:17:14.6376861Z 3e03143da3c2: Download complete 2023-01-11T21:17:16.9096376Z 502253a1be21: Pull complete 2023-01-11T21:17:17.0151345Z 5c7dd67e5809: Pull complete 2023-01-11T21:17:17.4191688Z bdfd23ed3f48: Pull complete 2023-01-11T21:17:17.5058748Z aee1dd761bdd: Pull complete 2023-01-11T21:17:34.2100689Z 5feda9af2542: Pull complete 2023-01-11T21:17:34.4356769Z f8371ecb849a: Pull complete 2023-01-11T21:17:34.6477249Z ce4a87d45645: Pull complete 2023-01-11T21:17:41.3057440Z 39629f7269f9: Pull complete 2023-01-11T21:17:41.5398081Z 87d0ffa55850: Pull complete 2023-01-11T21:17:41.7727691Z 70702f8b5bc4: Pull complete 2023-01-11T21:17:42.0103081Z 
0c06be5c20e0: Pull complete 2023-01-11T21:17:43.6598807Z b372c2a3bc3f: Pull complete 2023-01-11T21:17:43.8978189Z 582d081a59fa: Pull complete 2023-01-11T21:17:44.1283032Z e1c655e7ec0e: Pull complete 2023-01-11T21:17:50.5485327Z c7726d39d806: Verifying Checksum 2023-01-11T21:17:50.5485689Z c7726d39d806: Download complete 2023-01-11T21:18:17.8189679Z c7726d39d806: Pull complete 2023-01-11T21:18:18.0420798Z 1c22f2f8c01b: Pull complete 2023-01-11T21:18:18.2671148Z b8f759fd0191: Pull complete 2023-01-11T21:18:18.5010183Z e28e73a4bddd: Pull complete 2023-01-11T21:18:18.7377991Z 90d8f9bbe048: Pull complete 2023-01-11T21:18:18.9452208Z b34bd39d0461: Pull complete 2023-01-11T21:18:19.1798187Z 2f2308643d60: Pull complete 2023-01-11T21:18:21.2984376Z 8e3432e5a569: Pull complete 2023-01-11T21:18:21.5271471Z 9ea746919509: Pull complete 2023-01-11T21:18:21.7629329Z 1a2fd7b216d7: Pull complete 2023-01-11T21:18:22.0478763Z 19fde6a723a0: Pull complete 2023-01-11T21:18:22.2905751Z 06369252d749: Pull complete 2023-01-11T21:18:22.5065710Z ea4bfeaa0fc7: Pull complete 2023-01-11T21:18:27.5597403Z a1d16b6a5070: Pull complete 2023-01-11T21:18:27.7702298Z f550b7ff2470: Pull complete 2023-01-11T21:18:27.9964697Z 12ddc57b99eb: Pull complete 2023-01-11T21:18:28.9288887Z 8345085fb0a0: Pull complete 2023-01-11T21:18:29.1704693Z 4cc94dbec031: Pull complete 2023-01-11T21:18:29.3908414Z 29a7c0d5fa4c: Pull complete 2023-01-11T21:18:29.8058272Z 25571655d0e1: Pull complete 2023-01-11T21:18:30.0037983Z bdf297d7f88c: Pull complete 2023-01-11T21:18:31.3530941Z 0b3950af8ae1: Pull complete 2023-01-11T21:18:31.5620912Z 6d68f7da8baa: Pull complete 2023-01-11T21:18:31.7921054Z cca768f96df4: Pull complete 2023-01-11T21:18:37.0105557Z 8c3cf3d5e1c5: Pull complete 2023-01-11T21:18:37.1395298Z 61eecfa8b34e: Pull complete 2023-01-11T21:18:37.5365422Z 95c1ac011645: Pull complete 2023-01-11T21:18:37.7201520Z 3046cc00c4ca: Pull complete 2023-01-11T21:18:37.9430793Z 195d560d8cf6: Pull complete 2023-01-11T21:18:38.7511629Z 77250abd5ca4: Pull complete 2023-01-11T21:18:38.9897698Z 881b24daf9c5: Pull complete 2023-01-11T21:18:40.7336732Z 9fbf0a18619e: Pull complete 2023-01-11T21:18:40.9960786Z 02048a597c22: Pull complete 2023-01-11T21:18:42.9949275Z 859052a25d95: Verifying Checksum 2023-01-11T21:18:42.9951467Z 859052a25d95: Download complete 2023-01-11T21:19:16.9733211Z 859052a25d95: Pull complete 2023-01-11T21:19:17.0700095Z 3e03143da3c2: Pull complete 2023-01-11T21:19:17.0808014Z Digest: sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T21:19:17.0837386Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:19:17.0861012Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:19:17.0938699Z ##[group]Run pytorch/test-infra/.github/actions/setup-nvidia@main 2023-01-11T21:19:17.0938952Z with: 2023-01-11T21:19:17.0939143Z driver-version: 515.76 2023-01-11T21:19:17.0939337Z env: 2023-01-11T21:19:17.0939515Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:19:17.0939718Z ##[endgroup] 2023-01-11T21:19:17.0964127Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T21:19:17.0964367Z with: 2023-01-11T21:19:17.0964549Z timeout_minutes: 10 2023-01-11T21:19:17.0964743Z max_attempts: 3 2023-01-11T21:19:17.0969789Z command: # Is it disgusting to have a full shell script 
here in this github action? Sure
# But is it the best way to make it so that this action relies on nothing else? Absolutely
set -eou pipefail

DISTRIBUTION=$(. /etc/os-release;echo $ID$VERSION_ID)
DRIVER_FN="NVIDIA-Linux-x86_64-${DRIVER_VERSION}.run"
YUM_REPO_URL="https://nvidia.github.io/nvidia-docker/${DISTRIBUTION}/nvidia-docker.repo"

install_nvidia_docker2_amzn2() {
  (
    set -x
    # Needed for yum-config-manager
    sudo yum install -y yum-utils
    sudo yum-config-manager --add-repo "${YUM_REPO_URL}"
    sudo yum install -y nvidia-docker2
    sudo systemctl restart docker
  )
}

install_nvidia_docker2_ubuntu20() {
  (
    set -x
    sudo apt-get install -y nvidia-docker2
    sudo systemctl restart docker
  )
}

pre_install_nvidia_driver_amzn2() {
  (
    # Purge any nvidia driver installed from RHEL repo
    sudo yum remove -y nvidia-driver-latest-dkms
  )
}

install_nvidia_driver_common() {
  (
    # Try to gather more information about the runner and its existing NVIDIA driver if any
    echo "Before installing NVIDIA driver"
    lspci
    lsmod
    modinfo nvidia || true

    HAS_NVIDIA_DRIVER=0
    # Check if NVIDIA driver has already been installed
    if [ -x "$(command -v nvidia-smi)" ]; then
      set +e
      # The driver exists, check its version next. Also check only the first GPU if there are more than one of them
      # so that the same driver version is not printed over multiple lines
      INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
      NVIDIA_SMI_STATUS=$?
      if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
        echo "Failed to get NVIDIA driver version ($INSTALLED_DRIVER_VERSION). Continuing"
      elif [ "$INSTALLED_DRIVER_VERSION" != "$DRIVER_VERSION" ]; then
        echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has been installed, but we expect to have $DRIVER_VERSION instead. Continuing"
      else
        HAS_NVIDIA_DRIVER=1
        echo "NVIDIA driver ($INSTALLED_DRIVER_VERSION) has already been installed. Skipping NVIDIA driver installation"
      fi
      set -e
    fi

    if [ "$HAS_NVIDIA_DRIVER" -eq 0 ]; then
      # CAUTION: this may need to be updated in future
      if [ "${DISTRIBUTION}" != ubuntu20.04 ]; then
        sudo yum groupinstall -y "Development Tools"
        # ensure our kernel install is the same as our underlying kernel,
        # groupinstall "Development Tools" has a habit of mismatching kernel headers
        sudo yum install -y "kernel-devel-uname-r == $(uname -r)"
        sudo modprobe backlight
      fi
      sudo curl -fsL -o /tmp/nvidia_driver "https://s3.amazonaws.com/ossci-linux/nvidia_driver/$DRIVER_FN"

      set +e
      sudo /bin/bash /tmp/nvidia_driver -s --no-drm
      NVIDIA_INSTALLATION_STATUS=$?

      RESET_GPU=0
      if [ "$NVIDIA_INSTALLATION_STATUS" -ne 0 ]; then
        sudo cat /var/log/nvidia-installer.log
        # Failed to install NVIDIA driver, try to reset the GPU
        RESET_GPU=1
      elif [ -x "$(command -v nvidia-smi)" ]; then
        # Check again if nvidia-smi works even if the driver installation completes successfully
        INSTALLED_DRIVER_VERSION=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0)
        NVIDIA_SMI_STATUS=$?
        if [ "$NVIDIA_SMI_STATUS" -ne 0 ] && [ "$NVIDIA_SMI_STATUS" -ne 14 ]; then
          RESET_GPU=1
        fi
      fi

      if [ "$RESET_GPU" -eq 1 ]; then
        NVIDIA_DEVICES=$(lspci -D | grep -i NVIDIA | cut -d' ' -f1)
        # The GPU can get stuck in a failure state if somehow the test crashes the GPU microcode. When this
        # happens, we'll try to reset all NVIDIA devices https://github.com/pytorch/pytorch/issues/88388
        for PCI_ID in $NVIDIA_DEVICES; do
          DEVICE_ENABLED=$(cat /sys/bus/pci/devices/$PCI_ID/enable)
          echo "Resetting $PCI_ID (enabled state: $DEVICE_ENABLED)"
          # This requires sudo permission of course
          echo "1" | sudo tee /sys/bus/pci/devices/$PCI_ID/reset
          sleep 1
        done
      fi

      sudo rm -fv /tmp/nvidia_driver
      set -e
    fi
  )
}

post_install_nvidia_driver_common() {
  (
    sudo modprobe nvidia || true
    echo "After installing NVIDIA driver"
    lspci
    lsmod
    modinfo nvidia || true
    (
      set +e
      nvidia-smi
      NVIDIA_SMI_STATUS=$?
      # Allowable exit statuses for nvidia-smi, see: https://github.com/NVIDIA/gpu-operator/issues/285
      if [ "$NVIDIA_SMI_STATUS" -eq 0 ] || [ "$NVIDIA_SMI_STATUS" -eq 14 ]; then
        echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
      else
        echo "ERROR: nvidia-smi exited with unresolved status ${NVIDIA_SMI_STATUS}"
        exit ${NVIDIA_SMI_STATUS}
      fi
      set -e
    )
  )
}

install_nvidia_driver_amzn2() {
  (
    set -x
    pre_install_nvidia_driver_amzn2
    install_nvidia_driver_common
    post_install_nvidia_driver_common
  )
}

install_nvidia_driver_ubuntu20() {
  (
    set -x
    install_nvidia_driver_common
    post_install_nvidia_driver_common
  )
}

echo "== Installing nvidia driver ${DRIVER_FN} =="
case "${DISTRIBUTION}" in
  amzn*)
    install_nvidia_driver_amzn2
    ;;
  ubuntu20.04)
    install_nvidia_driver_ubuntu20
    ;;
  *)
    echo "ERROR: Unknown distribution ${DISTRIBUTION}"
    exit 1
    ;;
esac

# Install container toolkit based on distribution
echo "== Installing nvidia container toolkit for ${DISTRIBUTION} =="
case "${DISTRIBUTION}" in
  amzn*)
    install_nvidia_docker2_amzn2
    ;;
  ubuntu20.04)
    install_nvidia_docker2_ubuntu20
    ;;
  *)
    echo "ERROR: Unknown distribution ${DISTRIBUTION}"
    exit 1
    ;;
esac

echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}"
2023-01-11T21:19:17.0975437Z retry_wait_seconds: 10 2023-01-11T21:19:17.0975664Z polling_interval_seconds: 1 2023-01-11T21:19:17.0975883Z warning_on_retry: true 2023-01-11T21:19:17.0976082Z continue_on_error: false 2023-01-11T21:19:17.0976277Z env: 2023-01-11T21:19:17.0976465Z GIT_DEFAULT_BRANCH: master 2023-01-11T21:19:17.0976668Z DRIVER_VERSION: 515.76 2023-01-11T21:19:17.0976866Z ##[endgroup] 2023-01-11T21:19:17.1445560Z == Installing nvidia driver NVIDIA-Linux-x86_64-515.76.run == 2023-01-11T21:19:17.1446685Z + pre_install_nvidia_driver_amzn2 2023-01-11T21:19:17.1447389Z + sudo yum remove -y nvidia-driver-latest-dkms 2023-01-11T21:19:17.4230335Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2023-01-11T21:19:17.4619120Z No Match for argument: nvidia-driver-latest-dkms 2023-01-11T21:19:17.4862616Z No Packages marked for removal 2023-01-11T21:19:17.4993762Z + install_nvidia_driver_common 2023-01-11T21:19:17.4994441Z + echo 'Before installing NVIDIA driver' 2023-01-11T21:19:17.4994682Z + lspci 2023-01-11T21:19:17.4999119Z Before installing NVIDIA driver 2023-01-11T21:19:17.5082312Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2023-01-11T21:19:17.5083299Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:19:17.5084244Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2023-01-11T21:19:17.5084875Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2023-01-11T21:19:17.5086304Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061 2023-01-11T21:19:17.5086931Z 00:05.0 Ethernet controller: Amazon.com, Inc.
Elastic Network Adapter (ENA) 2023-01-11T21:19:17.5087512Z 00:1e.0 3D controller: NVIDIA Corporation Device 2237 (rev a1) 2023-01-11T21:19:17.5088228Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2023-01-11T21:19:17.5088708Z + lsmod 2023-01-11T21:19:17.5098621Z Module Size Used by 2023-01-11T21:19:17.5098983Z nvidia_modeset 1142784 0 2023-01-11T21:19:17.5099298Z nvidia_uvm 1269760 0 2023-01-11T21:19:17.5099534Z veth 16384 0 2023-01-11T21:19:17.5099759Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:19:17.5102169Z drm 425984 1 nvidia 2023-01-11T21:19:17.5102688Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:19:17.5103079Z backlight 16384 1 nvidia_modeset 2023-01-11T21:19:17.5103405Z xt_conntrack 16384 1 2023-01-11T21:19:17.5103709Z ipt_MASQUERADE 16384 1 2023-01-11T21:19:17.5104035Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:19:17.5104382Z nf_conntrack_netlink 49152 0 2023-01-11T21:19:17.5104731Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:19:17.5105039Z xfrm_user 45056 1 2023-01-11T21:19:17.5105344Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:19:17.5105648Z xt_addrtype 16384 2 2023-01-11T21:19:17.5105931Z iptable_filter 16384 1 2023-01-11T21:19:17.5106171Z iptable_nat 16384 1 2023-01-11T21:19:17.5106486Z nf_conntrack_ipv4 16384 3 2023-01-11T21:19:17.5106847Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:19:17.5107254Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:19:17.5107732Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:19:17.5108269Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:19:17.5108668Z br_netfilter 24576 0 2023-01-11T21:19:17.5109048Z bridge 172032 1 br_netfilter 2023-01-11T21:19:17.5109520Z stp 16384 1 bridge 2023-01-11T21:19:17.5109883Z llc 16384 2 bridge,stp 2023-01-11T21:19:17.5110386Z overlay 86016 0 2023-01-11T21:19:17.5110746Z sunrpc 393216 1 2023-01-11T21:19:17.5111664Z dm_mirror 28672 0 2023-01-11T21:19:17.5112220Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:19:17.5112547Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:19:17.5112938Z dm_mod 143360 2 dm_log,dm_mirror 2023-01-11T21:19:17.5113269Z dax 69632 1 dm_mod 2023-01-11T21:19:17.5113552Z pcc_cpufreq 16384 0 2023-01-11T21:19:17.5113925Z crc32_pclmul 16384 0 2023-01-11T21:19:17.5114277Z ghash_clmulni_intel 16384 0 2023-01-11T21:19:17.5114614Z pcbc 16384 0 2023-01-11T21:19:17.5114934Z aesni_intel 188416 0 2023-01-11T21:19:17.5115260Z aes_x86_64 20480 1 aesni_intel 2023-01-11T21:19:17.5115603Z crypto_simd 16384 1 aesni_intel 2023-01-11T21:19:17.5115989Z glue_helper 16384 1 aesni_intel 2023-01-11T21:19:17.5116324Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel 2023-01-11T21:19:17.5116625Z mousedev 24576 0 2023-01-11T21:19:17.5116916Z psmouse 32768 0 2023-01-11T21:19:17.5117156Z evdev 20480 3 2023-01-11T21:19:17.5117489Z button 16384 0 2023-01-11T21:19:17.5117732Z ena 114688 0 2023-01-11T21:19:17.5117945Z crc32c_intel 24576 0 2023-01-11T21:19:17.5118201Z autofs4 49152 2 2023-01-11T21:19:17.5118462Z + modinfo nvidia 2023-01-11T21:19:17.5118860Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko 2023-01-11T21:19:17.5119179Z firmware: nvidia/515.76/gsp.bin 2023-01-11T21:19:17.5119537Z alias: char-major-195-* 2023-01-11T21:19:17.5119772Z version: 515.76 2023-01-11T21:19:17.5120032Z supported: external 2023-01-11T21:19:17.5120286Z license: NVIDIA 2023-01-11T21:19:17.5120530Z srcversion: 
51FD9DD90150B35351AFFBB 2023-01-11T21:19:17.5120826Z alias: pci:v000010DEd*sv*sd*bc06sc80i00* 2023-01-11T21:19:17.5121123Z alias: pci:v000010DEd*sv*sd*bc03sc02i00* 2023-01-11T21:19:17.5121433Z alias: pci:v000010DEd*sv*sd*bc03sc00i00* 2023-01-11T21:19:17.5121725Z depends: i2c-core,drm 2023-01-11T21:19:17.5122022Z retpoline: Y 2023-01-11T21:19:17.5122269Z name: nvidia 2023-01-11T21:19:17.5122601Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions 2023-01-11T21:19:17.5157210Z parm: NvSwitchRegDwords:NvSwitch regkey (charp) 2023-01-11T21:19:17.5157657Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp) 2023-01-11T21:19:17.5158037Z parm: NVreg_ResmanDebugLevel:int 2023-01-11T21:19:17.5158330Z parm: NVreg_RmLogonRC:int 2023-01-11T21:19:17.5158629Z parm: NVreg_ModifyDeviceFiles:int 2023-01-11T21:19:17.5158873Z parm: NVreg_DeviceFileUID:int 2023-01-11T21:19:17.5159093Z parm: NVreg_DeviceFileGID:int 2023-01-11T21:19:17.5159322Z parm: NVreg_DeviceFileMode:int 2023-01-11T21:19:17.5159599Z parm: NVreg_InitializeSystemMemoryAllocations:int 2023-01-11T21:19:17.5159885Z parm: NVreg_UsePageAttributeTable:int 2023-01-11T21:19:17.5160131Z parm: NVreg_EnablePCIeGen3:int 2023-01-11T21:19:17.5160352Z parm: NVreg_EnableMSI:int 2023-01-11T21:19:17.5160568Z parm: NVreg_TCEBypassMode:int 2023-01-11T21:19:17.5160805Z parm: NVreg_EnableStreamMemOPs:int 2023-01-11T21:19:17.5161076Z parm: NVreg_RestrictProfilingToAdminUsers:int 2023-01-11T21:19:17.5161368Z parm: NVreg_PreserveVideoMemoryAllocations:int 2023-01-11T21:19:17.5161856Z parm: NVreg_EnableS0ixPowerManagement:int 2023-01-11T21:19:17.5162166Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int 2023-01-11T21:19:17.5162464Z parm: NVreg_DynamicPowerManagement:int 2023-01-11T21:19:17.5162776Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int 2023-01-11T21:19:17.5163076Z parm: NVreg_EnableGpuFirmware:int 2023-01-11T21:19:17.5163327Z parm: NVreg_EnableGpuFirmwareLogs:int 2023-01-11T21:19:17.5163711Z parm: NVreg_OpenRmEnableUnsupportedGpus:int 2023-01-11T21:19:17.5163993Z parm: NVreg_EnableUserNUMAManagement:int 2023-01-11T21:19:17.5164254Z parm: NVreg_MemoryPoolSize:int 2023-01-11T21:19:17.5164491Z parm: NVreg_KMallocHeapMaxSize:int 2023-01-11T21:19:17.5164746Z parm: NVreg_VMallocHeapMaxSize:int 2023-01-11T21:19:17.5164986Z parm: NVreg_IgnoreMMIOCheck:int 2023-01-11T21:19:17.5165215Z parm: NVreg_NvLinkDisable:int 2023-01-11T21:19:17.5165486Z parm: NVreg_EnablePCIERelaxedOrderingMode:int 2023-01-11T21:19:17.5165764Z parm: NVreg_RegisterPCIDriver:int 2023-01-11T21:19:17.5166018Z parm: NVreg_EnableDbgBreakpoint:int 2023-01-11T21:19:17.5166262Z parm: NVreg_RegistryDwords:charp 2023-01-11T21:19:17.5166521Z parm: NVreg_RegistryDwordsPerDevice:charp 2023-01-11T21:19:17.5166772Z parm: NVreg_RmMsg:charp 2023-01-11T21:19:17.5166999Z parm: NVreg_GpuBlacklist:charp 2023-01-11T21:19:17.5167306Z parm: NVreg_TemporaryFilePath:charp 2023-01-11T21:19:17.5167560Z parm: NVreg_ExcludedGpus:charp 2023-01-11T21:19:17.5167801Z parm: NVreg_DmaRemapPeerMmio:int 2023-01-11T21:19:17.5168046Z parm: rm_firmware_active:charp 2023-01-11T21:19:17.5168267Z + HAS_NVIDIA_DRIVER=0 2023-01-11T21:19:17.5168534Z ++ command -v nvidia-smi 2023-01-11T21:19:17.5168788Z + '[' -x /usr/bin/nvidia-smi ']' 2023-01-11T21:19:17.5168994Z + set +e 2023-01-11T21:19:17.5169304Z ++ nvidia-smi --query-gpu=driver_version --format=csv,noheader --id=0 2023-01-11T21:19:20.2800290Z + INSTALLED_DRIVER_VERSION=515.76 2023-01-11T21:19:20.2800965Z + NVIDIA_SMI_STATUS=0 
2023-01-11T21:19:20.2801478Z + '[' 0 -ne 0 ']' 2023-01-11T21:19:20.2801760Z + '[' 515.76 '!=' 515.76 ']' 2023-01-11T21:19:20.2802002Z + HAS_NVIDIA_DRIVER=1 2023-01-11T21:19:20.2802364Z + echo 'NVIDIA driver (515.76) has already been installed. Skipping NVIDIA driver installation' 2023-01-11T21:19:20.2802665Z + set -e 2023-01-11T21:19:20.2802873Z + '[' 1 -eq 0 ']' 2023-01-11T21:19:20.2803148Z NVIDIA driver (515.76) has already been installed. Skipping NVIDIA driver installation 2023-01-11T21:19:20.2803437Z + post_install_nvidia_driver_common 2023-01-11T21:19:20.2805138Z + sudo modprobe nvidia 2023-01-11T21:19:20.2913770Z + echo 'After installing NVIDIA driver' 2023-01-11T21:19:20.2914149Z + lspci 2023-01-11T21:19:20.2914426Z After installing NVIDIA driver 2023-01-11T21:19:20.2999784Z 00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] 2023-01-11T21:19:20.3000697Z 00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] 2023-01-11T21:19:20.3001905Z 00:01.3 Non-VGA unclassified device: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08) 2023-01-11T21:19:20.3002886Z 00:03.0 VGA compatible controller: Amazon.com, Inc. Device 1111 2023-01-11T21:19:20.3003948Z 00:04.0 Non-Volatile memory controller: Amazon.com, Inc. Device 8061 2023-01-11T21:19:20.3004836Z 00:05.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA) 2023-01-11T21:19:20.3005780Z 00:1e.0 3D controller: NVIDIA Corporation Device 2237 (rev a1) 2023-01-11T21:19:20.3006985Z 00:1f.0 Non-Volatile memory controller: Amazon.com, Inc. NVMe SSD Controller 2023-01-11T21:19:20.3007756Z + lsmod 2023-01-11T21:19:20.3019348Z Module Size Used by 2023-01-11T21:19:20.3019799Z nvidia_modeset 1142784 0 2023-01-11T21:19:20.3020173Z nvidia_uvm 1269760 0 2023-01-11T21:19:20.3020431Z veth 16384 0 2023-01-11T21:19:20.3020730Z nvidia 40808448 2 nvidia_uvm,nvidia_modeset 2023-01-11T21:19:20.3021020Z drm 425984 1 nvidia 2023-01-11T21:19:20.3021330Z i2c_core 77824 2 nvidia,drm 2023-01-11T21:19:20.3021823Z backlight 16384 1 nvidia_modeset 2023-01-11T21:19:20.3022899Z xt_conntrack 16384 1 2023-01-11T21:19:20.3023245Z ipt_MASQUERADE 16384 1 2023-01-11T21:19:20.3023608Z nf_nat_masquerade_ipv4 16384 1 ipt_MASQUERADE 2023-01-11T21:19:20.3024117Z nf_conntrack_netlink 49152 0 2023-01-11T21:19:20.3024359Z nfnetlink 16384 2 nf_conntrack_netlink 2023-01-11T21:19:20.3024580Z xfrm_user 45056 1 2023-01-11T21:19:20.3024783Z xfrm_algo 16384 1 xfrm_user 2023-01-11T21:19:20.3024994Z xt_addrtype 16384 2 2023-01-11T21:19:20.3025201Z iptable_filter 16384 1 2023-01-11T21:19:20.3025396Z iptable_nat 16384 1 2023-01-11T21:19:20.3025604Z nf_conntrack_ipv4 16384 3 2023-01-11T21:19:20.3025832Z nf_defrag_ipv4 16384 1 nf_conntrack_ipv4 2023-01-11T21:19:20.3026073Z nf_nat_ipv4 16384 1 iptable_nat 2023-01-11T21:19:20.3026316Z nf_nat 36864 2 nf_nat_masquerade_ipv4,nf_nat_ipv4 2023-01-11T21:19:20.3026675Z nf_conntrack 155648 7 xt_conntrack,nf_nat_masquerade_ipv4,nf_conntrack_ipv4,nf_nat,ipt_MASQUERADE,nf_nat_ipv4,nf_conntrack_netlink 2023-01-11T21:19:20.3027110Z br_netfilter 24576 0 2023-01-11T21:19:20.3027424Z bridge 172032 1 br_netfilter 2023-01-11T21:19:20.3027855Z stp 16384 1 bridge 2023-01-11T21:19:20.3028170Z llc 16384 2 bridge,stp 2023-01-11T21:19:20.3028482Z overlay 86016 0 2023-01-11T21:19:20.3028796Z sunrpc 393216 1 2023-01-11T21:19:20.3029103Z dm_mirror 28672 0 2023-01-11T21:19:20.3029411Z dm_region_hash 20480 1 dm_mirror 2023-01-11T21:19:20.3029658Z dm_log 20480 2 dm_region_hash,dm_mirror 2023-01-11T21:19:20.3029899Z dm_mod 143360 2 
dm_log,dm_mirror
2023-01-11T21:19:20.3030115Z dax 69632 1 dm_mod
2023-01-11T21:19:20.3030314Z pcc_cpufreq 16384 0
2023-01-11T21:19:20.3030520Z crc32_pclmul 16384 0
2023-01-11T21:19:20.3030729Z ghash_clmulni_intel 16384 0
2023-01-11T21:19:20.3030952Z pcbc 16384 0
2023-01-11T21:19:20.3031171Z aesni_intel 188416 0
2023-01-11T21:19:20.3031383Z aes_x86_64 20480 1 aesni_intel
2023-01-11T21:19:20.3031601Z crypto_simd 16384 1 aesni_intel
2023-01-11T21:19:20.3031830Z glue_helper 16384 1 aesni_intel
2023-01-11T21:19:20.3032091Z cryptd 28672 3 crypto_simd,ghash_clmulni_intel,aesni_intel
2023-01-11T21:19:20.3032333Z mousedev 24576 0
2023-01-11T21:19:20.3032531Z psmouse 32768 0
2023-01-11T21:19:20.3032732Z evdev 20480 3
2023-01-11T21:19:20.3032916Z button 16384 0
2023-01-11T21:19:20.3033105Z ena 114688 0
2023-01-11T21:19:20.3033301Z crc32c_intel 24576 0
2023-01-11T21:19:20.3033489Z autofs4 49152 2
2023-01-11T21:19:20.3033678Z + modinfo nvidia
2023-01-11T21:19:20.3034076Z filename: /lib/modules/4.14.252-195.483.amzn2.x86_64/kernel/drivers/video/nvidia.ko
2023-01-11T21:19:20.3034350Z firmware: nvidia/515.76/gsp.bin
2023-01-11T21:19:20.3034602Z alias: char-major-195-*
2023-01-11T21:19:20.3034817Z version: 515.76
2023-01-11T21:19:20.3035022Z supported: external
2023-01-11T21:19:20.3035213Z license: NVIDIA
2023-01-11T21:19:20.3035430Z srcversion: 51FD9DD90150B35351AFFBB
2023-01-11T21:19:20.3035671Z alias: pci:v000010DEd*sv*sd*bc06sc80i00*
2023-01-11T21:19:20.3035906Z alias: pci:v000010DEd*sv*sd*bc03sc02i00*
2023-01-11T21:19:20.3036146Z alias: pci:v000010DEd*sv*sd*bc03sc00i00*
2023-01-11T21:19:20.3036407Z depends: i2c-core,drm
2023-01-11T21:19:20.3036605Z retpoline: Y
2023-01-11T21:19:20.3036797Z name: nvidia
2023-01-11T21:19:20.3037103Z vermagic: 4.14.252-195.483.amzn2.x86_64 SMP mod_unload modversions
2023-01-11T21:19:20.3037396Z parm: NvSwitchRegDwords:NvSwitch regkey (charp)
2023-01-11T21:19:20.3037695Z parm: NvSwitchBlacklist:NvSwitchBlacklist=uuid[,uuid...] (charp)
2023-01-11T21:19:20.3037980Z parm: NVreg_ResmanDebugLevel:int
2023-01-11T21:19:20.3038213Z parm: NVreg_RmLogonRC:int
2023-01-11T21:19:20.3038539Z parm: NVreg_ModifyDeviceFiles:int
2023-01-11T21:19:20.3038783Z parm: NVreg_DeviceFileUID:int
2023-01-11T21:19:20.3039019Z parm: NVreg_DeviceFileGID:int
2023-01-11T21:19:20.3039250Z parm: NVreg_DeviceFileMode:int
2023-01-11T21:19:20.3039535Z parm: NVreg_InitializeSystemMemoryAllocations:int
2023-01-11T21:19:20.3039830Z parm: NVreg_UsePageAttributeTable:int
2023-01-11T21:19:20.3040075Z parm: NVreg_EnablePCIeGen3:int
2023-01-11T21:19:20.3040309Z parm: NVreg_EnableMSI:int
2023-01-11T21:19:20.3040539Z parm: NVreg_TCEBypassMode:int
2023-01-11T21:19:20.3040779Z parm: NVreg_EnableStreamMemOPs:int
2023-01-11T21:19:20.3041061Z parm: NVreg_RestrictProfilingToAdminUsers:int
2023-01-11T21:19:20.3041365Z parm: NVreg_PreserveVideoMemoryAllocations:int
2023-01-11T21:19:20.3041660Z parm: NVreg_EnableS0ixPowerManagement:int
2023-01-11T21:19:20.3041972Z parm: NVreg_S0ixPowerManagementVideoMemoryThreshold:int
2023-01-11T21:19:20.3042324Z parm: NVreg_DynamicPowerManagement:int
2023-01-11T21:19:20.3042647Z parm: NVreg_DynamicPowerManagementVideoMemoryThreshold:int
2023-01-11T21:19:20.3042948Z parm: NVreg_EnableGpuFirmware:int
2023-01-11T21:19:20.3043208Z parm: NVreg_EnableGpuFirmwareLogs:int
2023-01-11T21:19:20.3043490Z parm: NVreg_OpenRmEnableUnsupportedGpus:int
2023-01-11T21:19:20.3043770Z parm: NVreg_EnableUserNUMAManagement:int
2023-01-11T21:19:20.3044027Z parm: NVreg_MemoryPoolSize:int
2023-01-11T21:19:20.3044277Z parm: NVreg_KMallocHeapMaxSize:int
2023-01-11T21:19:20.3044523Z parm: NVreg_VMallocHeapMaxSize:int
2023-01-11T21:19:20.3044766Z parm: NVreg_IgnoreMMIOCheck:int
2023-01-11T21:19:20.3045008Z parm: NVreg_NvLinkDisable:int
2023-01-11T21:19:20.3045280Z parm: NVreg_EnablePCIERelaxedOrderingMode:int
2023-01-11T21:19:20.3045549Z parm: NVreg_RegisterPCIDriver:int
2023-01-11T21:19:20.3045807Z parm: NVreg_EnableDbgBreakpoint:int
2023-01-11T21:19:20.3046058Z parm: NVreg_RegistryDwords:charp
2023-01-11T21:19:20.3046315Z parm: NVreg_RegistryDwordsPerDevice:charp
2023-01-11T21:19:20.3046565Z parm: NVreg_RmMsg:charp
2023-01-11T21:19:20.3046793Z parm: NVreg_GpuBlacklist:charp
2023-01-11T21:19:20.3047038Z parm: NVreg_TemporaryFilePath:charp
2023-01-11T21:19:20.3047283Z parm: NVreg_ExcludedGpus:charp
2023-01-11T21:19:20.3047526Z parm: NVreg_DmaRemapPeerMmio:int
2023-01-11T21:19:20.3047759Z parm: rm_firmware_active:charp
2023-01-11T21:19:20.3047966Z + set +e
2023-01-11T21:19:20.3048171Z + nvidia-smi
2023-01-11T21:19:22.7497174Z Wed Jan 11 21:19:22 2023
2023-01-11T21:19:22.7497866Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:22.7498364Z | NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
2023-01-11T21:19:22.7498832Z |-------------------------------+----------------------+----------------------+
2023-01-11T21:19:22.7499416Z | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2023-01-11T21:19:22.7499817Z | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2023-01-11T21:19:22.7500106Z |                               |                      |               MIG M. |
2023-01-11T21:19:22.7500342Z |===============================+======================+======================|
2023-01-11T21:19:22.7727814Z |   0  NVIDIA A10G         Off  | 00000000:00:1E.0 Off |                    0 |
2023-01-11T21:19:22.7728391Z |  0%   27C    P0    58W / 300W |      0MiB / 23028MiB |      2%      Default |
2023-01-11T21:19:22.7728869Z |                               |                      |                  N/A |
2023-01-11T21:19:22.7729627Z +-------------------------------+----------------------+----------------------+
2023-01-11T21:19:22.7730461Z
2023-01-11T21:19:22.7731094Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:22.7731626Z | Processes:                                                                  |
2023-01-11T21:19:22.7732100Z |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2023-01-11T21:19:22.7732496Z |        ID   ID                                                   Usage      |
2023-01-11T21:19:22.7732738Z |=============================================================================|
2023-01-11T21:19:22.7733058Z |  No running processes found                                                 |
2023-01-11T21:19:22.7733418Z +-----------------------------------------------------------------------------+
2023-01-11T21:19:23.2077874Z + NVIDIA_SMI_STATUS=0
2023-01-11T21:19:23.2078648Z + '[' 0 -eq 0 ']'
2023-01-11T21:19:23.2079343Z + echo 'INFO: Ignoring allowed status 0'
2023-01-11T21:19:23.2079859Z + set -e
2023-01-11T21:19:23.2080550Z INFO: Ignoring allowed status 0
2023-01-11T21:19:23.2083105Z == Installing nvidia container toolkit for amzn2 ==
2023-01-11T21:19:23.2085300Z + sudo yum install -y yum-utils
2023-01-11T21:19:23.4794043Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:24.6724949Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version
2023-01-11T21:19:24.6725765Z Nothing to do
2023-01-11T21:19:24.7600171Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo
2023-01-11T21:19:27.1796107Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:27.2119373Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo
2023-01-11T21:19:27.2120417Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo
2023-01-11T21:19:27.2121216Z repo saved to /etc/yum.repos.d/nvidia-docker.repo
2023-01-11T21:19:27.2268416Z + sudo yum install -y nvidia-docker2
2023-01-11T21:19:27.7173281Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
2023-01-11T21:19:28.9089518Z Package nvidia-docker2-2.11.0-1.noarch already installed and latest version
2023-01-11T21:19:28.9090113Z Nothing to do
2023-01-11T21:19:28.9926948Z + sudo systemctl restart docker
2023-01-11T21:20:08.1967807Z Command completed after 1 attempt(s).
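The trace above shows the driver check tolerating an expected exit status: errexit is switched off just for nvidia-smi, the status is captured, compared against an allowed value, and errexit is restored. A minimal sketch of that idiom, reconstructed from the trace (only status 0 appears in this log, so the failure branch shown here is an assumption):

    set +e                    # allow nvidia-smi to fail without killing the script
    nvidia-smi
    NVIDIA_SMI_STATUS=$?      # capture the exit status immediately
    set -e                    # restore errexit for the rest of the script

    if [ "$NVIDIA_SMI_STATUS" -eq 0 ]; then
        echo "INFO: Ignoring allowed status ${NVIDIA_SMI_STATUS}"
    else
        # assumed handling: surface any other status as a hard failure
        echo "ERROR: nvidia-smi returned unexpected status ${NVIDIA_SMI_STATUS}"
        exit "$NVIDIA_SMI_STATUS"
    fi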
2023-01-11T21:20:08.2015757Z ##[group]Run python3 -m pip install psutil==5.9.1
2023-01-11T21:20:08.2016050Z python3 -m pip install psutil==5.9.1
2023-01-11T21:20:08.2016303Z python3 -m pip install pynvml==11.4.1
2023-01-11T21:20:08.2016591Z python3 -m tools.stats.monitor > usage_log.txt 2>&1 &
2023-01-11T21:20:08.2016895Z echo "monitor-script-pid=${!}" >> "${GITHUB_OUTPUT}"
2023-01-11T21:20:08.2027518Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:08.2027755Z env:
2023-01-11T21:20:08.2027949Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:08.2028163Z GPU_FLAG: --gpus all
2023-01-11T21:20:08.2028352Z ##[endgroup]
2023-01-11T21:20:08.4160563Z Defaulting to user installation because normal site-packages is not writeable
2023-01-11T21:20:08.4345308Z Requirement already satisfied: psutil==5.9.1 in /home/ec2-user/.local/lib/python3.7/site-packages (5.9.1)
2023-01-11T21:20:08.8605360Z Defaulting to user installation because normal site-packages is not writeable
2023-01-11T21:20:08.8776688Z Requirement already satisfied: pynvml==11.4.1 in /home/ec2-user/.local/lib/python3.7/site-packages (11.4.1)
2023-01-11T21:20:09.0943401Z Prepare all required actions
2023-01-11T21:20:09.0943670Z Getting action download info
2023-01-11T21:20:09.2781202Z Download action repository 'seemethere/download-artifact-s3@v4' (SHA:4a8bfae15cc25cc0785c1603ee87a9da8fd442ea)
2023-01-11T21:20:09.4579393Z Download action repository 'actions/download-artifact@v3' (SHA:9bc31d5ccc31df68ecc42ccf4149144866c47d8a)
2023-01-11T21:20:09.7752947Z ##[group]Run ./.github/actions/download-build-artifacts
2023-01-11T21:20:09.7753193Z with:
2023-01-11T21:20:09.7753416Z name: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:09.7753642Z env:
2023-01-11T21:20:09.7753831Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:09.7754034Z GPU_FLAG: --gpus all
2023-01-11T21:20:09.7754230Z ##[endgroup]
2023-01-11T21:20:09.7777521Z ##[group]Run seemethere/download-artifact-s3@v4
2023-01-11T21:20:09.7777746Z with:
2023-01-11T21:20:09.7777978Z name: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:09.7778234Z s3-bucket: gha-artifacts
2023-01-11T21:20:09.7778482Z region: us-east-1
2023-01-11T21:20:09.7778673Z env:
2023-01-11T21:20:09.7778870Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:09.7779076Z GPU_FLAG: --gpus all
2023-01-11T21:20:09.7779275Z ##[endgroup]
2023-01-11T21:20:10.1734048Z Found 1 objects with prefix pytorch/pytorch/3896346758/linux-bionic-cuda11.6-py3.10-gcc7-sm86/
2023-01-11T21:20:10.1734970Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2023-01-11T21:20:18.0544660Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip
2023-01-11T21:20:18.0545229Z
2023-01-11T21:20:18.0557928Z ##[warning]The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/
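The ##[warning] above comes from a step that still emits the retired ::set-output workflow command; the replacement is the environment-file protocol already used by the monitor step, which appends key=value lines to the file named by GITHUB_OUTPUT. A minimal before/after sketch (the output name my-value is hypothetical):

    # deprecated: workflow command printed to stdout
    echo "::set-output name=my-value::42"

    # current: append to the step-output environment file, exactly as the
    # monitor step does above with monitor-script-pid
    echo "my-value=42" >> "${GITHUB_OUTPUT}"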
2023-01-11T21:20:18.0565204Z Artifact download has finished successfully
2023-01-11T21:20:18.0676138Z ##[group]Run unzip -o artifacts.zip
2023-01-11T21:20:18.0676394Z unzip -o artifacts.zip
2023-01-11T21:20:18.0687106Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:18.0687362Z env:
2023-01-11T21:20:18.0687586Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:18.0687795Z GPU_FLAG: --gpus all
2023-01-11T21:20:18.0687994Z ##[endgroup]
2023-01-11T21:20:18.0720534Z Archive: artifacts.zip
2023-01-11T21:20:18.0722549Z creating: dist/
2023-01-11T21:20:19.6675485Z inflating: dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl
2023-01-11T21:20:19.6675826Z creating: build/custom_test_artifacts/
2023-01-11T21:20:19.6676164Z creating: build/custom_test_artifacts/custom-op-build/
2023-01-11T21:20:19.6676530Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/
2023-01-11T21:20:19.6682367Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log
2023-01-11T21:20:19.6683593Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/
2023-01-11T21:20:19.6684439Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeSystem.cmake
2023-01-11T21:20:19.6685296Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/
2023-01-11T21:20:19.6686139Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/tmp/
2023-01-11T21:20:19.6687060Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c
2023-01-11T21:20:19.6687960Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdC/a.out
2023-01-11T21:20:19.6688549Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/
2023-01-11T21:20:19.6688988Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/
2023-01-11T21:20:19.6689458Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp
2023-01-11T21:20:19.6690077Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out
2023-01-11T21:20:19.6690556Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin
2023-01-11T21:20:19.6691028Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake
2023-01-11T21:20:19.6691608Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin
2023-01-11T21:20:19.6692080Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake
2023-01-11T21:20:19.6692524Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/
2023-01-11T21:20:19.6692957Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/
2023-01-11T21:20:19.6736586Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii
2023-01-11T21:20:19.6737906Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c
2023-01-11T21:20:19.6738802Z inflating:
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.6739624Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.6740192Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.6740711Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.6741239Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.6741766Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.6742307Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.6774300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.6806736Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.6808300Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.6809578Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.6810558Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.6811519Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.6812481Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.6813568Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.6814992Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.6866520Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.6936504Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.6937851Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.6938582Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.6939409Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.6939996Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2023-01-11T21:20:19.6940605Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2023-01-11T21:20:19.6941342Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.ts 2023-01-11T21:20:19.6942143Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/compiler_depend.make 2023-01-11T21:20:19.6942905Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2023-01-11T21:20:19.6943371Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2023-01-11T21:20:19.6943809Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2023-01-11T21:20:19.6944259Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2023-01-11T21:20:19.6944705Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2023-01-11T21:20:19.6945154Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2023-01-11T21:20:19.6945597Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2023-01-11T21:20:19.6963787Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o.d 2023-01-11T21:20:19.7068823Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2023-01-11T21:20:19.7069559Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2023-01-11T21:20:19.7070308Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.ts 2023-01-11T21:20:19.7070842Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/compiler_depend.make 2023-01-11T21:20:19.7071664Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2023-01-11T21:20:19.7072277Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2023-01-11T21:20:19.7072739Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2023-01-11T21:20:19.7073204Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2023-01-11T21:20:19.7073676Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2023-01-11T21:20:19.7074146Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2023-01-11T21:20:19.7074605Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2023-01-11T21:20:19.7089797Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o.d 2023-01-11T21:20:19.7157805Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2023-01-11T21:20:19.7158983Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.7160212Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:20:19.7161380Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.7162479Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.7163571Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.7164453Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2023-01-11T21:20:19.7165410Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2023-01-11T21:20:19.7166158Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2023-01-11T21:20:19.7166906Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2023-01-11T21:20:19.7238517Z inflating: 
build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2023-01-11T21:20:19.7289662Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2023-01-11T21:20:19.7290195Z creating: build/custom_test_artifacts/jit-hook-build/ 2023-01-11T21:20:19.7290596Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2023-01-11T21:20:19.7297284Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:20:19.7297830Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/ 2023-01-11T21:20:19.7298257Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:20:19.7298686Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:20:19.7299119Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:20:19.7299575Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:20:19.7300008Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:20:19.7300437Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:20:19.7300857Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:20:19.7301604Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:20:19.7302242Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:20:19.7303463Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:20:19.7303937Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:20:19.7325725Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:20:19.7326652Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:20:19.7327136Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:20:19.7327575Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:20:19.7350110Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:20:19.7351022Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:20:19.7351819Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.7352677Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.7353244Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.7353776Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.7354312Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.7354836Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.7355495Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.7386528Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.7419769Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.7420680Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.7421417Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.7422040Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.7422515Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.7423198Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.7423759Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.7424254Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.7481003Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.7539057Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.7539684Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.7540165Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.7540574Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.7540985Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2023-01-11T21:20:19.7541400Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2023-01-11T21:20:19.7541856Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.ts 2023-01-11T21:20:19.7542334Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/compiler_depend.make 2023-01-11T21:20:19.7542789Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2023-01-11T21:20:19.7543235Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2023-01-11T21:20:19.7543692Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2023-01-11T21:20:19.7544142Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2023-01-11T21:20:19.7544602Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2023-01-11T21:20:19.7545063Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2023-01-11T21:20:19.7545514Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2023-01-11T21:20:19.7560698Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o.d 2023-01-11T21:20:19.7616195Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2023-01-11T21:20:19.7617192Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.7617868Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2023-01-11T21:20:19.7618302Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.7618832Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.7619248Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.7619656Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2023-01-11T21:20:19.7620039Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2023-01-11T21:20:19.7620448Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2023-01-11T21:20:19.7620830Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2023-01-11T21:20:19.7659713Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2023-01-11T21:20:19.7660098Z creating: build/custom_test_artifacts/custom-backend-build/ 2023-01-11T21:20:19.7660486Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2023-01-11T21:20:19.7664953Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2023-01-11T21:20:19.7665491Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/ 2023-01-11T21:20:19.7665938Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeSystem.cmake 2023-01-11T21:20:19.7666388Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/ 2023-01-11T21:20:19.7666830Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/tmp/ 2023-01-11T21:20:19.7667557Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/CMakeCCompilerId.c 2023-01-11T21:20:19.7668760Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdC/a.out 2023-01-11T21:20:19.7669299Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/ 2023-01-11T21:20:19.7669751Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/tmp/ 2023-01-11T21:20:19.7672207Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/CMakeCXXCompilerId.cpp 2023-01-11T21:20:19.7672737Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCXX/a.out 2023-01-11T21:20:19.7673361Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_C.bin 2023-01-11T21:20:19.7673862Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCCompiler.cmake 2023-01-11T21:20:19.7675319Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CXX.bin 2023-01-11T21:20:19.7675909Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCXXCompiler.cmake 2023-01-11T21:20:19.7676380Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/ 2023-01-11T21:20:19.7676838Z creating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/ 2023-01-11T21:20:19.7720174Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2023-01-11T21:20:19.7721869Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2023-01-11T21:20:19.7723366Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2023-01-11T21:20:19.7724548Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2023-01-11T21:20:19.7725667Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2023-01-11T21:20:19.7726753Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2023-01-11T21:20:19.7728013Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2023-01-11T21:20:19.7728609Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2023-01-11T21:20:19.7729149Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2023-01-11T21:20:19.7756612Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2023-01-11T21:20:19.7789449Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2023-01-11T21:20:19.7790286Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2023-01-11T21:20:19.7790953Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2023-01-11T21:20:19.7791635Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.reg.c 2023-01-11T21:20:19.7792339Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin 2023-01-11T21:20:19.7792995Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2023-01-11T21:20:19.7793577Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/tmp/a_dlink.o 2023-01-11T21:20:19.7794077Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/CMakeCUDACompilerId.cu 2023-01-11T21:20:19.7851118Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CompilerIdCUDA/a.out 2023-01-11T21:20:19.7909314Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeDetermineCompilerABI_CUDA.bin 2023-01-11T21:20:19.7909943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.22.1/CMakeCUDACompiler.cmake 2023-01-11T21:20:19.7910469Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2023-01-11T21:20:19.7910957Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2023-01-11T21:20:19.7911462Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 
2023-01-11T21:20:19.7911969Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2023-01-11T21:20:19.7912519Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.ts 2023-01-11T21:20:19.7913097Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/compiler_depend.make 2023-01-11T21:20:19.7913650Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2023-01-11T21:20:19.7914191Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2023-01-11T21:20:19.7914744Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2023-01-11T21:20:19.7915300Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2023-01-11T21:20:19.7915778Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2023-01-11T21:20:19.7916257Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2023-01-11T21:20:19.7916742Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2023-01-11T21:20:19.7917495Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o.d 2023-01-11T21:20:19.8039258Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2023-01-11T21:20:19.8040275Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2023-01-11T21:20:19.8041257Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.ts 2023-01-11T21:20:19.8042356Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/compiler_depend.make 2023-01-11T21:20:19.8043327Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2023-01-11T21:20:19.8044283Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2023-01-11T21:20:19.8045269Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2023-01-11T21:20:19.8046249Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2023-01-11T21:20:19.8047218Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2023-01-11T21:20:19.8047943Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2023-01-11T21:20:19.8048461Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2023-01-11T21:20:19.8058032Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o.d 2023-01-11T21:20:19.8105777Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2023-01-11T21:20:19.8106301Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2023-01-11T21:20:19.8106789Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 
2023-01-11T21:20:19.8107241Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2023-01-11T21:20:19.8107672Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 2023-01-11T21:20:19.8108102Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2023-01-11T21:20:19.8108734Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2023-01-11T21:20:19.8110327Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2023-01-11T21:20:19.8110741Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2023-01-11T21:20:19.8111202Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2023-01-11T21:20:19.8208400Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2023-01-11T21:20:19.8245876Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2023-01-11T21:20:19.8246467Z creating: build/lib/ 2023-01-11T21:20:19.8246902Z inflating: build/lib/libclog.a 2023-01-11T21:20:19.8300662Z inflating: build/lib/libgtest.a 2023-01-11T21:20:19.8308587Z inflating: build/lib/libpthreadpool.a 2023-01-11T21:20:19.8392315Z inflating: build/lib/libprotobuf-lite.a 2023-01-11T21:20:19.8399083Z inflating: build/lib/libittnotify.a 2023-01-11T21:20:19.8474915Z inflating: build/lib/libbenchmark.a 2023-01-11T21:20:19.8500547Z inflating: build/lib/libtensorpipe_uv.a 2023-01-11T21:20:19.8560381Z inflating: build/lib/libasmjit.a 2023-01-11T21:20:19.8984325Z inflating: build/lib/libprotobuf.a 2023-01-11T21:20:19.9096303Z inflating: build/lib/libgloo.a 2023-01-11T21:20:19.9122015Z inflating: build/lib/libfmt.a 2023-01-11T21:20:19.9122763Z inflating: build/lib/libfoxi_loader.a 2023-01-11T21:20:19.9123273Z inflating: build/lib/libcaffe2_nvrtc.so 2023-01-11T21:20:19.9188009Z inflating: build/lib/libc10.so 2023-01-11T21:20:19.9188304Z inflating: build/lib/libtorch_global_deps.so 2023-01-11T21:20:19.9196331Z inflating: build/lib/libcpuinfo.a 2023-01-11T21:20:19.9202737Z inflating: build/lib/libcpuinfo_internals.a 2023-01-11T21:20:19.9215850Z inflating: build/lib/libqnnpack.a 2023-01-11T21:20:19.9234612Z inflating: build/lib/libpytorch_qnnpack.a 2023-01-11T21:20:19.9686503Z inflating: build/lib/libprotoc.a 2023-01-11T21:20:19.9687305Z inflating: build/lib/libnnpack_reference_layers.a 2023-01-11T21:20:19.9701364Z inflating: build/lib/libgmock.a 2023-01-11T21:20:19.9701866Z inflating: build/lib/libgtest_main.a 2023-01-11T21:20:19.9702213Z inflating: build/lib/libbenchmark_main.a 2023-01-11T21:20:19.9719670Z inflating: build/lib/libnnpack.a 2023-01-11T21:20:20.0239882Z inflating: build/lib/libtensorpipe.a 2023-01-11T21:20:20.8002153Z inflating: build/lib/libdnnl.a 2023-01-11T21:20:20.8112797Z inflating: build/lib/libXNNPACK.a 2023-01-11T21:20:20.8155878Z inflating: build/lib/libc10_cuda.so 2023-01-11T21:20:20.8156369Z inflating: build/lib/libgmock_main.a 2023-01-11T21:20:20.9383742Z inflating: build/lib/libfbgemm.a 2023-01-11T21:20:21.0298368Z inflating: build/lib/libdnnl_graph.a 2023-01-11T21:20:21.0705882Z inflating: build/lib/libkineto.a 2023-01-11T21:20:21.0935900Z inflating: build/lib/libtensorpipe_cuda.a 2023-01-11T21:20:21.0970841Z inflating: build/lib/libcaffe2_protos.a 2023-01-11T21:20:21.1009185Z inflating: build/lib/libonnx_proto.a 2023-01-11T21:20:21.1543929Z inflating: build/lib/libonnx.a 2023-01-11T21:20:21.1883703Z inflating: build/lib/libgloo_cuda.a 2023-01-11T21:20:23.0706738Z inflating: 
build/lib/libtorch_cpu.so 2023-01-11T21:20:23.0715507Z inflating: build/lib/libunbox_lib.a 2023-01-11T21:20:24.4922402Z inflating: build/lib/libtorch_cuda.so 2023-01-11T21:20:24.4922760Z inflating: build/lib/libtorch.so 2023-01-11T21:20:25.2609609Z inflating: build/lib/libtorch_cuda_linalg.so 2023-01-11T21:20:25.2610184Z inflating: build/lib/libc10d_cuda_test.so 2023-01-11T21:20:25.2659915Z inflating: build/lib/libtorchbind_test.so 2023-01-11T21:20:25.2677879Z inflating: build/lib/libjitbackend_test.so 2023-01-11T21:20:25.2702322Z inflating: build/lib/libbackend_with_compiler.so 2023-01-11T21:20:25.2705793Z inflating: build/lib/libshm.so 2023-01-11T21:20:25.4167718Z inflating: build/lib/libtorch_python.so 2023-01-11T21:20:25.4197959Z inflating: build/lib/libnnapi_backend.so 2023-01-11T21:20:25.4198437Z creating: build/bin/ 2023-01-11T21:20:25.4242313Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2023-01-11T21:20:25.4287369Z inflating: build/bin/c10_DeviceGuard_test 2023-01-11T21:20:25.4330039Z inflating: build/bin/c10_Device_test 2023-01-11T21:20:25.4380806Z inflating: build/bin/c10_DispatchKeySet_test 2023-01-11T21:20:25.4422698Z inflating: build/bin/c10_StreamGuard_test 2023-01-11T21:20:25.4465948Z inflating: build/bin/c10_SymInt_test 2023-01-11T21:20:25.4515493Z inflating: build/bin/c10_InlineDeviceGuard_test 2023-01-11T21:20:25.4564024Z inflating: build/bin/c10_InlineStreamGuard_test 2023-01-11T21:20:25.4613657Z inflating: build/bin/c10_SizesAndStrides_test 2023-01-11T21:20:25.4656521Z inflating: build/bin/c10_Array_test 2023-01-11T21:20:25.4700915Z inflating: build/bin/c10_Bitset_test 2023-01-11T21:20:25.4762807Z inflating: build/bin/c10_C++17_test 2023-01-11T21:20:25.4804932Z inflating: build/bin/c10_ConstexprCrc_test 2023-01-11T21:20:25.4848024Z inflating: build/bin/c10_DeadlockDetection_test 2023-01-11T21:20:25.4891285Z inflating: build/bin/c10_Half_test 2023-01-11T21:20:25.4941443Z inflating: build/bin/c10_LeftRight_test 2023-01-11T21:20:25.4995972Z inflating: build/bin/c10_Metaprogramming_test 2023-01-11T21:20:25.5123146Z inflating: build/bin/c10_SmallVectorTest 2023-01-11T21:20:25.5167351Z inflating: build/bin/c10_Synchronized_test 2023-01-11T21:20:25.5213725Z inflating: build/bin/c10_TypeIndex_test 2023-01-11T21:20:25.5264237Z inflating: build/bin/c10_ThreadLocal_test 2023-01-11T21:20:25.5308458Z inflating: build/bin/c10_TypeList_test 2023-01-11T21:20:25.5350943Z inflating: build/bin/c10_TypeTraits_test 2023-01-11T21:20:25.5397517Z inflating: build/bin/c10_accumulate_test 2023-01-11T21:20:25.5446660Z inflating: build/bin/c10_bfloat16_test 2023-01-11T21:20:25.5493957Z inflating: build/bin/c10_complex_math_test 2023-01-11T21:20:25.5542721Z inflating: build/bin/c10_complex_test 2023-01-11T21:20:25.5637044Z inflating: build/bin/c10_either_test 2023-01-11T21:20:25.5684367Z inflating: build/bin/c10_exception_test 2023-01-11T21:20:25.5728093Z inflating: build/bin/c10_flags_test 2023-01-11T21:20:25.5873662Z inflating: build/bin/c10_intrusive_ptr_test 2023-01-11T21:20:25.5918824Z inflating: build/bin/c10_irange_test 2023-01-11T21:20:25.5968812Z inflating: build/bin/c10_logging_test 2023-01-11T21:20:25.6032023Z inflating: build/bin/c10_optional_test 2023-01-11T21:20:25.6087196Z inflating: build/bin/c10_ordered_preserving_dict_test 2023-01-11T21:20:25.6134150Z inflating: build/bin/c10_registry_test 2023-01-11T21:20:25.6185311Z inflating: build/bin/c10_string_view_test 2023-01-11T21:20:25.6232117Z inflating: build/bin/c10_tempfile_test 2023-01-11T21:20:25.6281588Z inflating: 
build/bin/c10_typeid_test 2023-01-11T21:20:25.6329823Z inflating: build/bin/c10_intrusive_ptr_benchmark 2023-01-11T21:20:25.6740571Z inflating: build/bin/protoc-3.13.0.0 2023-01-11T21:20:25.7152961Z inflating: build/bin/protoc 2023-01-11T21:20:25.7199792Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_1_var_test 2023-01-11T21:20:25.7247889Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_stream 2023-01-11T21:20:25.7293243Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_catches_thread_and_block_and_device 2023-01-11T21:20:25.7339361Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_from_2_processes 2023-01-11T21:20:25.7385925Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_blocks_and_threads 2023-01-11T21:20:25.7434098Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_multiple_blocks 2023-01-11T21:20:25.7476406Z inflating: build/bin/c10_cuda_CUDATest 2023-01-11T21:20:25.7522929Z inflating: build/bin/c10_cuda_CUDAAssertionsTest_multiple_writes_from_same_block 2023-01-11T21:20:25.7779427Z inflating: build/bin/vec_test_all_types_DEFAULT 2023-01-11T21:20:25.8066438Z inflating: build/bin/vec_test_all_types_AVX2 2023-01-11T21:20:25.8114248Z inflating: build/bin/HashStoreTest 2023-01-11T21:20:25.8161496Z inflating: build/bin/FileStoreTest 2023-01-11T21:20:25.8213871Z inflating: build/bin/TCPStoreTest 2023-01-11T21:20:25.8226656Z inflating: build/bin/ProcessGroupMPITest 2023-01-11T21:20:25.8276920Z inflating: build/bin/test_edge_op_registration 2023-01-11T21:20:25.8279634Z inflating: build/bin/example_allreduce 2023-01-11T21:20:25.8327249Z inflating: build/bin/Dimname_test 2023-01-11T21:20:25.8390780Z inflating: build/bin/Dict_test 2023-01-11T21:20:25.8447404Z inflating: build/bin/MaybeOwned_test 2023-01-11T21:20:25.8497363Z inflating: build/bin/NamedTensor_test 2023-01-11T21:20:25.8549827Z inflating: build/bin/apply_utils_test 2023-01-11T21:20:25.8601760Z inflating: build/bin/atest 2023-01-11T21:20:25.8655720Z inflating: build/bin/basic 2023-01-11T21:20:25.8702584Z inflating: build/bin/broadcast_test 2023-01-11T21:20:25.8753254Z inflating: build/bin/cpu_generator_test 2023-01-11T21:20:25.8799753Z inflating: build/bin/cpu_profiling_allocator_test 2023-01-11T21:20:25.8843617Z inflating: build/bin/dispatch_key_set_test 2023-01-11T21:20:25.8920191Z inflating: build/bin/cpu_rng_test 2023-01-11T21:20:25.8963642Z inflating: build/bin/dlconvertor_test 2023-01-11T21:20:25.9015803Z inflating: build/bin/extension_backend_test 2023-01-11T21:20:25.9063395Z inflating: build/bin/half_test 2023-01-11T21:20:25.9106571Z inflating: build/bin/lazy_tensor_test 2023-01-11T21:20:25.9188644Z inflating: build/bin/ivalue_test 2023-01-11T21:20:25.9236652Z inflating: build/bin/math_kernel_test 2023-01-11T21:20:25.9285644Z inflating: build/bin/memory_format_test 2023-01-11T21:20:25.9331844Z inflating: build/bin/memory_overlapping_test 2023-01-11T21:20:25.9376584Z inflating: build/bin/operator_name_test 2023-01-11T21:20:25.9424867Z inflating: build/bin/native_test 2023-01-11T21:20:25.9471799Z inflating: build/bin/mobile_memory_cleanup 2023-01-11T21:20:25.9515687Z inflating: build/bin/operators_test 2023-01-11T21:20:25.9561938Z inflating: build/bin/packedtensoraccessor_test 2023-01-11T21:20:25.9612118Z inflating: build/bin/quantized_test 2023-01-11T21:20:25.9668708Z inflating: build/bin/pow_test 2023-01-11T21:20:25.9712590Z inflating: build/bin/reduce_ops_test 2023-01-11T21:20:25.9757660Z inflating: build/bin/reportMemoryUsage_test 2023-01-11T21:20:25.9806769Z inflating: 
build/bin/scalar_tensor_test 2023-01-11T21:20:25.9855595Z inflating: build/bin/scalar_test 2023-01-11T21:20:25.9901586Z inflating: build/bin/stride_properties_test 2023-01-11T21:20:25.9970334Z inflating: build/bin/tensor_iterator_test 2023-01-11T21:20:25.9972639Z inflating: build/bin/thread_init_test 2023-01-11T21:20:26.0020746Z inflating: build/bin/type_ptr_test 2023-01-11T21:20:26.0068890Z inflating: build/bin/test_parallel 2023-01-11T21:20:26.0113490Z inflating: build/bin/variant_test 2023-01-11T21:20:26.0166739Z inflating: build/bin/type_test 2023-01-11T21:20:26.0211521Z inflating: build/bin/undefined_tensor_test 2023-01-11T21:20:26.0212652Z inflating: build/bin/verify_api_visibility 2023-01-11T21:20:26.0274066Z inflating: build/bin/legacy_vmap_test 2023-01-11T21:20:26.0318877Z inflating: build/bin/weakref_test 2023-01-11T21:20:26.0371640Z inflating: build/bin/IListRef_test 2023-01-11T21:20:26.0465253Z inflating: build/bin/List_test 2023-01-11T21:20:26.0510005Z inflating: build/bin/wrapdim_test 2023-01-11T21:20:26.0553634Z inflating: build/bin/xla_tensor_test 2023-01-11T21:20:26.0658875Z inflating: build/bin/kernel_function_legacy_test 2023-01-11T21:20:26.0715416Z inflating: build/bin/KernelFunction_test 2023-01-11T21:20:26.0798394Z inflating: build/bin/kernel_function_test 2023-01-11T21:20:26.0908904Z inflating: build/bin/kernel_lambda_legacy_test 2023-01-11T21:20:26.0999333Z inflating: build/bin/kernel_lambda_test 2023-01-11T21:20:26.1051856Z inflating: build/bin/kernel_stackbased_test 2023-01-11T21:20:26.1135879Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2023-01-11T21:20:26.1179622Z inflating: build/bin/CppSignature_test 2023-01-11T21:20:26.1220359Z inflating: build/bin/op_allowlist_test 2023-01-11T21:20:26.1267360Z inflating: build/bin/inline_container_test 2023-01-11T21:20:26.1317918Z inflating: build/bin/backend_fallback_test 2023-01-11T21:20:26.1566870Z inflating: build/bin/op_registration_test 2023-01-11T21:20:26.1611978Z inflating: build/bin/cuda_apply_test 2023-01-11T21:20:26.1659559Z inflating: build/bin/cuda_caching_host_allocator_test 2023-01-11T21:20:26.1701956Z inflating: build/bin/cuda_device_test 2023-01-11T21:20:26.1764314Z inflating: build/bin/cuda_complex_math_test 2023-01-11T21:20:26.1807877Z inflating: build/bin/cuda_dlconvertor_test 2023-01-11T21:20:26.1857582Z inflating: build/bin/cuda_complex_test 2023-01-11T21:20:26.1909826Z inflating: build/bin/cuda_atomic_ops_test 2023-01-11T21:20:26.1962971Z inflating: build/bin/cuda_cub_test 2023-01-11T21:20:26.2007082Z inflating: build/bin/cuda_integer_divider_test 2023-01-11T21:20:26.2052055Z inflating: build/bin/cuda_reportMemoryUsage_test 2023-01-11T21:20:26.2109622Z inflating: build/bin/cuda_distributions_test 2023-01-11T21:20:26.2163255Z inflating: build/bin/cuda_stream_test 2023-01-11T21:20:26.2206239Z inflating: build/bin/cuda_half_test 2023-01-11T21:20:26.2247553Z inflating: build/bin/cuda_cudnn_test 2023-01-11T21:20:26.2298551Z inflating: build/bin/cuda_generator_test 2023-01-11T21:20:26.2312467Z inflating: build/bin/tutorial_tensorexpr 2023-01-11T21:20:26.2369393Z inflating: build/bin/ProcessGroupGlooTest 2023-01-11T21:20:26.2420033Z inflating: build/bin/ProcessGroupGlooAsyncTest 2023-01-11T21:20:26.2474660Z inflating: build/bin/ProcessGroupNCCLTest 2023-01-11T21:20:26.2525238Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2023-01-11T21:20:26.2567648Z inflating: build/bin/cuda_optional_test 2023-01-11T21:20:26.2613281Z inflating: build/bin/ProcessGroupUCCTest 2023-01-11T21:20:26.2660352Z 
inflating: build/bin/test_dist_autograd
2023-01-11T21:20:26.2705097Z inflating: build/bin/cuda_packedtensoraccessor_test
2023-01-11T21:20:26.2766711Z inflating: build/bin/test_cpp_rpc
2023-01-11T21:20:26.2775127Z inflating: build/bin/aot_model_compiler_test
2023-01-11T21:20:26.2776497Z inflating: build/bin/parallel_benchmark
2023-01-11T21:20:26.2837619Z inflating: build/bin/test_mobile_nnc
2023-01-11T21:20:26.3552462Z inflating: build/bin/test_tensorexpr
2023-01-11T21:20:26.3858065Z inflating: build/bin/test_lazy
2023-01-11T21:20:26.3903049Z inflating: build/bin/cuda_vectorized_test
2023-01-11T21:20:26.3907098Z inflating: build/bin/torch_shm_manager
2023-01-11T21:20:26.4943987Z inflating: build/bin/test_api
2023-01-11T21:20:26.5891923Z inflating: build/bin/test_jit
2023-01-11T21:20:26.5916575Z ##[group]Run df -H
2023-01-11T21:20:26.5916777Z df -H
2023-01-11T21:20:26.5927711Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2023-01-11T21:20:26.5927943Z env:
2023-01-11T21:20:26.5928134Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.5928351Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.5928541Z ##[endgroup]
2023-01-11T21:20:26.5955889Z Filesystem      Size  Used Avail Use% Mounted on
2023-01-11T21:20:26.5956161Z devtmpfs         34G     0   34G   0% /dev
2023-01-11T21:20:26.5956443Z tmpfs            34G     0   34G   0% /dev/shm
2023-01-11T21:20:26.5956938Z tmpfs            34G  476k   34G   1% /run
2023-01-11T21:20:26.5957185Z tmpfs            34G     0   34G   0% /sys/fs/cgroup
2023-01-11T21:20:26.5957419Z /dev/nvme0n1p1  162G   29G  133G  18% /
2023-01-11T21:20:26.5974020Z ##[group]Run .github/scripts/parse_ref.py
2023-01-11T21:20:26.5974279Z .github/scripts/parse_ref.py
2023-01-11T21:20:26.5983079Z shell: /usr/bin/bash -e {0}
2023-01-11T21:20:26.5983279Z env:
2023-01-11T21:20:26.5983463Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.5983676Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.5983874Z ##[endgroup]
2023-01-11T21:20:26.6198136Z ##[group]Run set -x
2023-01-11T21:20:26.6198464Z set -x
2023-01-11T21:20:26.6198703Z 
2023-01-11T21:20:26.6198993Z if [[ $TEST_CONFIG == 'multigpu' ]]; then
2023-01-11T21:20:26.6199274Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh
2023-01-11T21:20:26.6199565Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then
2023-01-11T21:20:26.6199861Z  TEST_COMMAND=.jenkins/onnx/test.sh
2023-01-11T21:20:26.6200083Z else
2023-01-11T21:20:26.6200300Z  TEST_COMMAND=.jenkins/pytorch/test.sh
2023-01-11T21:20:26.6200521Z fi
2023-01-11T21:20:26.6200707Z 
2023-01-11T21:20:26.6200958Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}")
2023-01-11T21:20:26.6201232Z 
2023-01-11T21:20:26.6201484Z # sanitize the input commit message and PR body here:
2023-01-11T21:20:26.6201734Z #
2023-01-11T21:20:26.6202053Z # trim all new lines from commit messages + PR_BODY to avoid issues with batch environment
2023-01-11T21:20:26.6202439Z # variable copying. see https://github.com/pytorch/pytorch/pull/80043#issuecomment-1167796028
2023-01-11T21:20:26.6202773Z COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"
2023-01-11T21:20:26.6203027Z PR_BODY="${PR_BODY//[$'\n\r']}"
2023-01-11T21:20:26.6203238Z 
2023-01-11T21:20:26.6203517Z # then trim all special characters like single and double quotes to avoid unescaped inputs to
2023-01-11T21:20:26.6203811Z # wreak havoc internally
2023-01-11T21:20:26.6204072Z export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"
2023-01-11T21:20:26.6204414Z export PR_BODY="${PR_BODY//[\'\"]}"
2023-01-11T21:20:26.6204625Z 
2023-01-11T21:20:26.6204869Z # detached container should get cleaned up by teardown_ec2_linux
2023-01-11T21:20:26.6205175Z # TODO: Stop building test binaries as part of the build phase
2023-01-11T21:20:26.6205474Z # Used for GPU_FLAG since that doesn't play nice
2023-01-11T21:20:26.6205734Z # shellcheck disable=SC2086,SC2090
2023-01-11T21:20:26.6205972Z container_name=$(docker run \
2023-01-11T21:20:26.6206185Z  ${GPU_FLAG:-} \
2023-01-11T21:20:26.6206402Z  -e BUILD_ENVIRONMENT \
2023-01-11T21:20:26.6206624Z  -e PR_NUMBER \
2023-01-11T21:20:26.6206830Z  -e GITHUB_ACTIONS \
2023-01-11T21:20:26.6207040Z  -e BASE_SHA \
2023-01-11T21:20:26.6207239Z  -e BRANCH \
2023-01-11T21:20:26.6207428Z  -e SHA1 \
2023-01-11T21:20:26.6207639Z  -e AWS_DEFAULT_REGION \
2023-01-11T21:20:26.6207854Z  -e IN_WHEEL_TEST \
2023-01-11T21:20:26.6208054Z  -e SHARD_NUMBER \
2023-01-11T21:20:26.6208262Z  -e TEST_CONFIG \
2023-01-11T21:20:26.6208476Z  -e NUM_TEST_SHARDS \
2023-01-11T21:20:26.6208683Z  -e PR_BODY \
2023-01-11T21:20:26.6208887Z  -e COMMIT_MESSAGES \
2023-01-11T21:20:26.6209111Z  -e CONTINUE_THROUGH_ERROR \
2023-01-11T21:20:26.6209349Z  -e PYTORCH_RETRY_TEST_CASES \
2023-01-11T21:20:26.6209591Z  -e PYTORCH_OVERRIDE_FLAKY_SIGNAL \
2023-01-11T21:20:26.6209822Z  -e PR_LABELS \
2023-01-11T21:20:26.6210053Z  -e MAX_JOBS="$(nproc --ignore=2)" \
2023-01-11T21:20:26.6210280Z  -e SCCACHE_BUCKET \
2023-01-11T21:20:26.6210502Z  -e SCCACHE_S3_KEY_PREFIX \
2023-01-11T21:20:26.6210717Z  -e XLA_CUDA \
2023-01-11T21:20:26.6210934Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \
2023-01-11T21:20:26.6211190Z  -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK \
2023-01-11T21:20:26.6211451Z  -e PYTORCH_TEST_RERUN_DISABLED_TESTS \
2023-01-11T21:20:26.6211729Z  --env-file="/tmp/github_env_${GITHUB_RUN_ID}" \
2023-01-11T21:20:26.6211976Z  --ulimit stack=10485760:83886080 \
2023-01-11T21:20:26.6212261Z  --security-opt seccomp=unconfined \
2023-01-11T21:20:26.6212503Z  --cap-add=SYS_PTRACE \
2023-01-11T21:20:26.6212711Z  --ipc=host \
2023-01-11T21:20:26.6212929Z  --shm-size="${SHM_SIZE}" \
2023-01-11T21:20:26.6213212Z  --tty \
2023-01-11T21:20:26.6213398Z  --detach \
2023-01-11T21:20:26.6213612Z  --name="${container_name}" \
2023-01-11T21:20:26.6213830Z  --user jenkins \
2023-01-11T21:20:26.6214077Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \
2023-01-11T21:20:26.6214344Z  -w /var/lib/jenkins/workspace \
2023-01-11T21:20:26.6214698Z  "${DOCKER_IMAGE}"
2023-01-11T21:20:26.6214891Z )
2023-01-11T21:20:26.6215125Z echo "DOCKER_CONTAINER_ID=${container_name}" >> "${GITHUB_ENV}"
2023-01-11T21:20:26.6215476Z docker exec -t "${container_name}" sh -c "pip install $(echo dist/*.whl)[opt-einsum] && ${TEST_COMMAND}"
2023-01-11T21:20:26.6226107Z shell: /usr/bin/bash -e {0}
2023-01-11T21:20:26.6226304Z env:
2023-01-11T21:20:26.6226501Z GIT_DEFAULT_BRANCH: master
2023-01-11T21:20:26.6226719Z GPU_FLAG: --gpus all
2023-01-11T21:20:26.6226989Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7-sm86
2023-01-11T21:20:26.6227245Z PR_NUMBER: 
2023-01-11T21:20:26.6227430Z BRANCH: 
2023-01-11T21:20:26.6227645Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:20:26.6227916Z BASE_SHA: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e
2023-01-11T21:20:26.6228166Z PYTORCH_RETRY_TEST_CASES: 1
2023-01-11T21:20:26.6228386Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1
2023-01-11T21:20:26.6228608Z TEST_CONFIG: default
2023-01-11T21:20:26.6228852Z SHARD_NUMBER: 2
2023-01-11T21:20:26.6229040Z NUM_TEST_SHARDS: 4
2023-01-11T21:20:26.6229249Z PR_BODY: 
2023-01-11T21:20:26.6229483Z CONTINUE_THROUGH_ERROR: False
2023-01-11T21:20:26.6229756Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2
2023-01-11T21:20:26.6230008Z SCCACHE_S3_KEY_PREFIX: trunk
2023-01-11T21:20:26.6230212Z SHM_SIZE: 2g
2023-01-11T21:20:26.6230594Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd
2023-01-11T21:20:26.6230955Z XLA_CUDA: 
2023-01-11T21:20:26.6231227Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla
2023-01-11T21:20:26.6231521Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK: 0
2023-01-11T21:20:26.6231753Z PYTORCH_TEST_RERUN_DISABLED_TESTS: 0
2023-01-11T21:20:26.6231965Z ##[endgroup]
2023-01-11T21:20:26.6255899Z + [[ default == \m\u\l\t\i\g\p\u ]]
2023-01-11T21:20:26.6257096Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *onnx* ]]
2023-01-11T21:20:26.6257694Z + TEST_COMMAND=.jenkins/pytorch/test.sh
2023-01-11T21:20:26.6258559Z ++ git cherry -v origin/master
2023-01-11T21:20:26.6669375Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''
2023-01-11T21:20:26.6669843Z + 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch
2023-01-11T21:20:26.6670301Z + 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation'
2023-01-11T21:20:26.6672086Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into '\''input'\''+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e '\''other'\'' instead of '\''output'\'' in documentation'
2023-01-11T21:20:26.6673034Z + PR_BODY=
2023-01-11T21:20:26.6675987Z + export 'COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation'
2023-01-11T21:20:26.6678082Z + COMMIT_MESSAGES='+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation'
2023-01-11T21:20:26.6679021Z + export PR_BODY=
2023-01-11T21:20:26.6679295Z + PR_BODY=
2023-01-11T21:20:26.6683916Z +++ nproc --ignore=2
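The trace above shows the sanitization doing its job: the three-line git cherry output collapses to a single line once the newlines are stripped, and the escaped quotes around 'input' and 'output' disappear after the quote-stripping expansion. A standalone sketch of those two bash substitutions, using the same parameter expansions as the script (the sample value is invented for illustration):

    # sample multi-line value with quotes (invented)
    COMMIT_MESSAGES=$'+ abc123 first subject\n+ def456 second "quoted" subject'

    # drop newlines and carriage returns so the value survives env-file copying
    COMMIT_MESSAGES="${COMMIT_MESSAGES//[$'\n\r']}"

    # drop single and double quotes so the value cannot break later shell quoting
    export COMMIT_MESSAGES="${COMMIT_MESSAGES//[\'\"]}"

    echo "$COMMIT_MESSAGES"
    # -> + abc123 first subject+ def456 second quoted subject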
XLA_CLANG_CACHE_S3_BUCKET_NAME -e PYTORCH_TEST_CUDA_MEM_LEAK_CHECK -e PYTORCH_TEST_RERUN_DISABLED_TESTS --env-file=/tmp/github_env_3896346758 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T21:20:40.4350268Z + container_name=b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T21:20:40.4350706Z + echo DOCKER_CONTAINER_ID=b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T21:20:40.4353337Z ++ echo dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:20:40.4354475Z + docker exec -t b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 sh -c 'pip install dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl[opt-einsum] && .jenkins/pytorch/test.sh' 2023-01-11T21:20:40.8166662Z Processing ./dist/torch-2.0.0a0+git8419ddd-cp310-cp310-linux_x86_64.whl 2023-01-11T21:20:41.0777824Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (4.4.0) 2023-01-11T21:20:41.0779647Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (1.11.1) 2023-01-11T21:20:41.0780648Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (2.6.3) 2023-01-11T21:20:41.0790397Z Requirement already satisfied: opt-einsum>=3.3 in /opt/conda/lib/python3.10/site-packages (from torch==2.0.0a0+git8419ddd) (3.3.0) 2023-01-11T21:20:41.0845204Z Requirement already satisfied: numpy>=1.7 in /opt/conda/lib/python3.10/site-packages (from opt-einsum>=3.3->torch==2.0.0a0+git8419ddd) (1.21.2) 2023-01-11T21:20:41.0990474Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch==2.0.0a0+git8419ddd) (1.2.1) 2023-01-11T21:20:41.7507138Z Installing collected packages: torch 2023-01-11T21:20:48.2537998Z Successfully installed torch-2.0.0a0+git8419ddd 2023-01-11T21:20:48.4025660Z + echo 'Environment variables:' 2023-01-11T21:20:48.4025928Z Environment variables: 2023-01-11T21:20:48.4026124Z + env 2023-01-11T21:20:48.4031541Z SHARD_NUMBER=2 2023-01-11T21:20:48.4032062Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4032395Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:20:48.4032683Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:20:48.4033009Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4034351Z UCC_HOME=/usr 2023-01-11T21:20:48.4034884Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T21:20:48.4035190Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:20:48.4035601Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:20:48.4035979Z INSTALLED_DB=yes 2023-01-11T21:20:48.4036391Z HOSTNAME=b465a1e11c77 2023-01-11T21:20:48.4036619Z GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:20:48.4036892Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:20:48.4037235Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:20:48.4037717Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:20:48.4038043Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:20:48.4044501Z 
GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4045177Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:20:48.4045756Z GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:20:48.4046082Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:20:48.4046287Z TEST_CONFIG=default 2023-01-11T21:20:48.4046548Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4046917Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:20:48.4047174Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:20:48.4047428Z GITHUB_ACTIONS=true 2023-01-11T21:20:48.4047686Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:20:48.4047927Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:20:48.4048169Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4048573Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4048861Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:20:48.4049071Z CI=true 2023-01-11T21:20:48.4049337Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:20:48.4049653Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:20:48.4049886Z BRANCH= 2023-01-11T21:20:48.4050146Z GITHUB_HEAD_REF= 2023-01-11T21:20:48.4050393Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:20:48.4050673Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4051011Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:20:48.4051384Z GITHUB_ACTION_REF= 2023-01-11T21:20:48.4051628Z NCCL_VERSION=2.12.10-1 2023-01-11T21:20:48.4051897Z GITHUB_ACTION=__self 2023-01-11T21:20:48.4052109Z GITHUB_REF_PROTECTED=false 2023-01-11T21:20:48.4052469Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:20:48.4052860Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:20:48.4056417Z *** 2023-01-11T21:20:48.4056728Z INSTALLED_VISION=yes 2023-01-11T21:20:48.4056927Z NVARCH=x86_64 2023-01-11T21:20:48.4057297Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4057533Z HOME=/var/lib/jenkins 2023-01-11T21:20:48.4057953Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4058288Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:20:48.4058514Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:20:48.4058714Z GITHUB_REF_TYPE=tag 2023-01-11T21:20:48.4058952Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4059189Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:20:48.4059491Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:20:48.4059808Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4060236Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4060560Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:20:48.4060827Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:20:48.4061073Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4061322Z NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:20:48.4061566Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4061810Z INSTALLED_PROTOBUF=yes 2023-01-11T21:20:48.4062025Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:20:48.4062228Z GITHUB_RUN_ID=3896346758 2023-01-11T21:20:48.4062503Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:20:48.4062740Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:20:48.4062970Z 
LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:20:48.4063217Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:20:48.4063432Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:20:48.4063668Z GITHUB_SERVER_URL=https://github.com 2023-01-11T21:20:48.4063887Z MAX_JOBS=14 2023-01-11T21:20:48.4064080Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:20:48.4064306Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4064801Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:20:48.4065183Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:20:48.4065453Z UCX_HOME=/usr 2023-01-11T21:20:48.4065643Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:20:48.4065900Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:20:48.4066179Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4066441Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4066652Z PR_BODY= 2023-01-11T21:20:48.4066832Z GITHUB_BASE_REF= 2023-01-11T21:20:48.4067008Z TERM=xterm 2023-01-11T21:20:48.4067185Z XLA_CUDA= 2023-01-11T21:20:48.4067396Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4067604Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:20:48.4067809Z CUDA_VERSION=11.6.2 2023-01-11T21:20:48.4068073Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:20:48.4068303Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:20:48.4068725Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4069027Z GITHUB_JOB=test 2023-01-11T21:20:48.4069221Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:20:48.4069687Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:20:48.4070231Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:20:48.4070463Z NUM_TEST_SHARDS=4 2023-01-11T21:20:48.4070641Z PR_NUMBER= 2023-01-11T21:20:48.4071164Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4071464Z SHLVL=1 2023-01-11T21:20:48.4071737Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:20:48.4071998Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:20:48.4072614Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:20:48.4073190Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4073443Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4073672Z GITHUB_EVENT_NAME=push 2023-01-11T21:20:48.4073916Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:20:48.4074197Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:20:48.4074421Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:20:48.4074627Z GITHUB_WORKFLOW=trunk 2023-01-11T21:20:48.4074964Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:48.4075310Z 
NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4075589Z GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4075964Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:20:48.4076296Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4076513Z _=/usr/bin/env 2023-01-11T21:20:48.4076810Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T21:20:48.4178294Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T21:20:48.4178756Z + TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T21:20:48.4179147Z + TORCH_LIB_DIR=/opt/conda/lib/python3.10/site-packages/torch/lib 2023-01-11T21:20:48.4180787Z + TORCH_TEST_DIR=/opt/conda/lib/python3.10/site-packages/torch/test 2023-01-11T21:20:48.4181309Z + BUILD_DIR=build 2023-01-11T21:20:48.4181721Z + BUILD_RENAMED_DIR=build_renamed 2023-01-11T21:20:48.4182352Z + BUILD_BIN_DIR=build/bin 2023-01-11T21:20:48.4182753Z + export VALGRIND=ON 2023-01-11T21:20:48.4183151Z + VALGRIND=ON 2023-01-11T21:20:48.4183568Z + export TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4184017Z + TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4184671Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *clang9* ]] 2023-01-11T21:20:48.4185360Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *bazel* ]] 2023-01-11T21:20:48.4185625Z ++ realpath build/custom_test_artifacts 2023-01-11T21:20:48.4187748Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2023-01-11T21:20:48.4190481Z ++ dirname .jenkins/pytorch/test.sh 2023-01-11T21:20:48.4196340Z + source .jenkins/pytorch/common.sh 2023-01-11T21:20:48.4199076Z +++ dirname .jenkins/pytorch/common.sh 2023-01-11T21:20:48.4205377Z ++ source .jenkins/pytorch/common_utils.sh 2023-01-11T21:20:48.4207783Z +++ declare -f -t trap_add 2023-01-11T21:20:48.4212939Z ++ set -ex 2023-01-11T21:20:48.4213592Z ++ [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *rocm* ]] 2023-01-11T21:20:48.4213905Z ++ BUILD_TEST_LIBTORCH=0 2023-01-11T21:20:48.4214272Z + echo 'Environment variables' 2023-01-11T21:20:48.4214679Z Environment variables 2023-01-11T21:20:48.4214887Z + env 2023-01-11T21:20:48.4218852Z SHARD_NUMBER=2 2023-01-11T21:20:48.4219195Z NV_LIBCUBLAS_DEV_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4219564Z NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-6 2023-01-11T21:20:48.4221545Z LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 2023-01-11T21:20:48.4222062Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4222377Z UCC_HOME=/usr 2023-01-11T21:20:48.4222824Z BUILD_ENVIRONMENT=linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T21:20:48.4223430Z PYTORCH_TEST_CUDA_MEM_LEAK_CHECK=0 2023-01-11T21:20:48.4223876Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-11-6=11.6.3.124-1 2023-01-11T21:20:48.4224167Z INSTALLED_DB=yes 2023-01-11T21:20:48.4224375Z HOSTNAME=b465a1e11c77 2023-01-11T21:20:48.4224596Z GITHUB_REF_NAME=ciflow/trunk/91627 2023-01-11T21:20:48.4224861Z GITHUB_API_URL=https://api.github.com 2023-01-11T21:20:48.4225107Z GITHUB_REPOSITORY_OWNER_ID=21003710 2023-01-11T21:20:48.4225340Z OPENSSL_DIR=/opt/openssl 2023-01-11T21:20:48.4225588Z UCC_COMMIT=1c7a7127186e7836f73aafbd7697bbc274a77eee 2023-01-11T21:20:48.4226054Z GITHUB_STEP_SUMMARY=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/step_summary_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4226386Z CUDA_PATH=/usr/local/cuda 2023-01-11T21:20:48.4226776Z 
GITHUB_ACTION_PATH=/home/ec2-user/actions-runner/_work/pytorch/pytorch/./.github/actions/setup-linux 2023-01-11T21:20:48.4227067Z GITHUB_RUN_ATTEMPT=1 2023-01-11T21:20:48.4227274Z TEST_CONFIG=default 2023-01-11T21:20:48.4227503Z NV_LIBNPP_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4227796Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-11-6=11.6.124-1 2023-01-11T21:20:48.4228059Z GITHUB_REPOSITORY_OWNER=pytorch 2023-01-11T21:20:48.4228278Z GITHUB_ACTIONS=true 2023-01-11T21:20:48.4228503Z NVIDIA_VISIBLE_DEVICES=all 2023-01-11T21:20:48.4228782Z NV_NVPROF_VERSION=11.6.124-1 2023-01-11T21:20:48.4229096Z NV_LIBCUSPARSE_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4229500Z GITHUB_WORKFLOW_REF=pytorch/pytorch/.github/workflows/trunk.yml@refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4229872Z NVIDIA_PRODUCT_NAME=CUDA 2023-01-11T21:20:48.4230118Z CI=true 2023-01-11T21:20:48.4230319Z PYTORCH_OVERRIDE_FLAKY_SIGNAL=1 2023-01-11T21:20:48.4230634Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-11-6=11.9.2.110-1 2023-01-11T21:20:48.4230870Z BRANCH= 2023-01-11T21:20:48.4231046Z GITHUB_HEAD_REF= 2023-01-11T21:20:48.4231283Z UCX_COMMIT=31e74cac7bee0ef66bef2af72e7d86d9c282e5ab 2023-01-11T21:20:48.4231577Z GITHUB_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4231832Z CMAKE_CUDA_COMPILER_LAUNCHER=/opt/cache/bin/sccache 2023-01-11T21:20:48.4232055Z GITHUB_ACTION_REF= 2023-01-11T21:20:48.4232271Z NCCL_VERSION=2.12.10-1 2023-01-11T21:20:48.4232475Z GITHUB_ACTION=__self 2023-01-11T21:20:48.4232660Z VALGRIND=ON 2023-01-11T21:20:48.4232858Z GITHUB_REF_PROTECTED=false 2023-01-11T21:20:48.4233328Z XLA_CLANG_CACHE_S3_BUCKET_NAME=ossci-compiler-clang-cache-circleci-xla 2023-01-11T21:20:48.4233638Z PYTORCH_TEST_RERUN_DISABLED_TESTS=0 2023-01-11T21:20:48.4233970Z *** 2023-01-11T21:20:48.4234148Z INSTALLED_VISION=yes 2023-01-11T21:20:48.4234350Z NVARCH=x86_64 2023-01-11T21:20:48.4234590Z NV_LIBCUSPARSE_DEV_VERSION=11.7.2.124-1 2023-01-11T21:20:48.4234808Z HOME=/var/lib/jenkins 2023-01-11T21:20:48.4235225Z GITHUB_STATE=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/save_state_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4235602Z CARGO_NET_GIT_FETCH_WITH_CLI=true 2023-01-11T21:20:48.4235822Z GITHUB_ACTION_REPOSITORY= 2023-01-11T21:20:48.4236038Z GITHUB_REF_TYPE=tag 2023-01-11T21:20:48.4236279Z NV_LIBNCCL_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4236509Z GITHUB_RETENTION_DAYS=90 2023-01-11T21:20:48.4236805Z SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2 2023-01-11T21:20:48.4237131Z NV_LIBNCCL_PACKAGE=libnccl2=2.12.10-1+cuda11.6 2023-01-11T21:20:48.4237564Z GITHUB_ENV=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_env_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4237875Z DEBIAN_FRONTEND=noninteractive 2023-01-11T21:20:48.4238155Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2023-01-11T21:20:48.4238402Z GITHUB_REF=refs/tags/ciflow/trunk/91627 2023-01-11T21:20:48.4238643Z NV_CUDA_LIB_VERSION=11.6.2-1 2023-01-11T21:20:48.4238900Z GITHUB_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4239146Z INSTALLED_PROTOBUF=yes 2023-01-11T21:20:48.4239357Z GITHUB_REPOSITORY_ID=65600975 2023-01-11T21:20:48.4239576Z GITHUB_RUN_ID=3896346758 2023-01-11T21:20:48.4239850Z NV_LIBNPP_PACKAGE=libnpp-11-6=11.6.3.124-1 2023-01-11T21:20:48.4240132Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2023-01-11T21:20:48.4240374Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2023-01-11T21:20:48.4240627Z NV_NVTX_VERSION=11.6.124-1 2023-01-11T21:20:48.4240840Z CONTINUE_THROUGH_ERROR=False 2023-01-11T21:20:48.4241083Z 
GITHUB_SERVER_URL=https://github.com 2023-01-11T21:20:48.4241313Z MAX_JOBS=14 2023-01-11T21:20:48.4241499Z GITHUB_ACTOR_ID=54816060 2023-01-11T21:20:48.4241738Z NV_LIBCUBLAS_VERSION=11.9.2.110-1 2023-01-11T21:20:48.4242039Z NV_LIBCUBLAS_PACKAGE=libcublas-11-6=11.9.2.110-1 2023-01-11T21:20:48.4242419Z GITHUB_EVENT_PATH=/home/ec2-user/actions-runner/_work/_temp/_github_workflow/event.json 2023-01-11T21:20:48.4242691Z UCX_HOME=/usr 2023-01-11T21:20:48.4242897Z PYTORCH_RETRY_TEST_CASES=1 2023-01-11T21:20:48.4243161Z GITHUB_GRAPHQL_URL=https://api.github.com/graphql 2023-01-11T21:20:48.4243441Z BASE_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4243720Z NV_CUDA_CUDART_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4243936Z PR_BODY= 2023-01-11T21:20:48.4244114Z GITHUB_BASE_REF= 2023-01-11T21:20:48.4244305Z TERM=xterm 2023-01-11T21:20:48.4244511Z TORCH_INDUCTOR_INSTALL_GXX=ON 2023-01-11T21:20:48.4244709Z XLA_CUDA= 2023-01-11T21:20:48.4244923Z NV_NVML_DEV_VERSION=11.6.55-1 2023-01-11T21:20:48.4245144Z TORCH_CUDA_ARCH_LIST=Maxwell 2023-01-11T21:20:48.4245347Z CUDA_VERSION=11.6.2 2023-01-11T21:20:48.4245614Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-6 2023-01-11T21:20:48.4245854Z OPENSSL_ROOT_DIR=/opt/openssl 2023-01-11T21:20:48.4246303Z GITHUB_PATH=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/add_path_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4246620Z GITHUB_JOB=test 2023-01-11T21:20:48.4246826Z SCCACHE_S3_KEY_PREFIX=trunk 2023-01-11T21:20:48.4247296Z COMMIT_MESSAGES=+ 52a16ce42647731c772e14e7175afa40fda07b3d make torchgen rename also Number arguments into input+ 87db01a53ecb702267ec36787654e418a52f8e93 fix torch.where signature mismatch+ 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e other instead of output in documentation 2023-01-11T21:20:48.4247766Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2023-01-11T21:20:48.4247996Z NUM_TEST_SHARDS=4 2023-01-11T21:20:48.4248187Z PR_NUMBER= 2023-01-11T21:20:48.4248584Z GITHUB_OUTPUT=/home/ec2-user/actions-runner/_work/_temp/_runner_file_commands/set_output_1df51260-fb3e-4787-8fe3-b0d112ea072e 2023-01-11T21:20:48.4248924Z SHLVL=1 2023-01-11T21:20:48.4249199Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-11-6 2023-01-11T21:20:48.4249455Z GITHUB_REPOSITORY=pytorch/pytorch 2023-01-11T21:20:48.4250060Z NVIDIA_REQUIRE_CUDA=cuda>=11.6 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451 brand=tesla,driver>=470,driver<471 brand=unknown,driver>=470,driver<471 brand=nvidia,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=geforce,driver>=470,driver<471 brand=geforcertx,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=titan,driver>=470,driver<471 brand=titanrtx,driver>=470,driver<471 2023-01-11T21:20:48.4250651Z NV_LIBNPP_DEV_VERSION=11.6.3.124-1 2023-01-11T21:20:48.4250906Z SHA1=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T21:20:48.4251137Z GITHUB_EVENT_NAME=push 2023-01-11T21:20:48.4251379Z NV_CUDA_CUDART_VERSION=11.6.55-1 2023-01-11T21:20:48.4251654Z TORCH_NVCC_FLAGS=-Xfatbin -compress-all 2023-01-11T21:20:48.4251919Z GITHUB_RUN_NUMBER=22986 2023-01-11T21:20:48.4252121Z GITHUB_WORKFLOW=trunk 2023-01-11T21:20:48.4252451Z PATH=/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:48.4252806Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.12.10-1 2023-01-11T21:20:48.4253076Z GITHUB_WORKFLOW_SHA=8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 
2023-01-11T21:20:48.4253547Z GITHUB_WORKSPACE=/home/ec2-user/actions-runner/_work/pytorch/pytorch 2023-01-11T21:20:48.4253878Z GITHUB_TRIGGERING_ACTOR=pytorch-bot[bot] 2023-01-11T21:20:48.4254091Z _=/usr/bin/env 2023-01-11T21:20:48.4254317Z + echo 'Testing pytorch' 2023-01-11T21:20:48.4254764Z Testing pytorch 2023-01-11T21:20:48.4255002Z + export LANG=C.UTF-8 2023-01-11T21:20:48.4255247Z + LANG=C.UTF-8 2023-01-11T21:20:48.4255493Z + PR_NUMBER= 2023-01-11T21:20:48.4255696Z + [[ default == \d\e\f\a\u\l\t ]] 2023-01-11T21:20:48.4255907Z + export CUDA_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256126Z + CUDA_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256338Z + export HIP_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256543Z + HIP_VISIBLE_DEVICES=0 2023-01-11T21:20:48.4256759Z + [[ default == \d\i\s\t\r\i\b\u\t\e\d ]] 2023-01-11T21:20:48.4256976Z + [[ default == \s\l\o\w ]] 2023-01-11T21:20:48.4257304Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *slow-gradcheck* ]] 2023-01-11T21:20:48.4257679Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]] 2023-01-11T21:20:48.4257960Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:20:48.4258202Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2023-01-11T21:20:48.4258429Z + [[ default == *crossref* ]] 2023-01-11T21:20:48.4258640Z + [[ default == *dynamo* ]] 2023-01-11T21:20:48.4258838Z + [[ default == *inductor* ]] 2023-01-11T21:20:48.4259150Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *rocm* ]] 2023-01-11T21:20:48.4259513Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-bazel-* ]] 2023-01-11T21:20:48.4259812Z + pip_install --user ninja==1.10.2 2023-01-11T21:20:48.4260126Z + pip install --progress-bar off --user ninja==1.10.2 2023-01-11T21:20:48.7914126Z Collecting ninja==1.10.2 2023-01-11T21:20:48.8073931Z Downloading ninja-1.10.2-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2023-01-11T21:20:49.4199468Z Installing collected packages: ninja 2023-01-11T21:20:49.4267972Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2023-01-11T21:20:49.4268500Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location. 
2023-01-11T21:20:49.4317182Z Successfully installed ninja-1.10.2 2023-01-11T21:20:49.4966198Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:49.4967045Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2023-01-11T21:20:49.4967636Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *asan* ]] 2023-01-11T21:20:49.4968001Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-tsan* ]] 2023-01-11T21:20:49.4968272Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2023-01-11T21:20:49.4968519Z + [[ default == \n\o\g\p\u\_\A\V\X\5\1\2 ]] 2023-01-11T21:20:49.4975788Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *tbb* ]] 2023-01-11T21:20:49.4985745Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *libtorch* ]] 2023-01-11T21:20:49.4986138Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-bazel-* ]] 2023-01-11T21:20:49.4986500Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *-tsan* ]] 2023-01-11T21:20:49.4988293Z + cd test 2023-01-11T21:20:49.4988727Z + python -c 'import torch; print(torch.__config__.show())' 2023-01-11T21:20:50.7017088Z PyTorch built with: 2023-01-11T21:20:50.7017594Z - GCC 7.5 2023-01-11T21:20:50.7017933Z - C++ Version: 201703 2023-01-11T21:20:50.7018510Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:20:50.7019103Z - Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:20:50.7019535Z - OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:20:50.7019934Z - LAPACK is enabled (usually provided by MKL) 2023-01-11T21:20:50.7020265Z - NNPACK is enabled 2023-01-11T21:20:50.7020582Z - CPU capability usage: AVX2 2023-01-11T21:20:50.7020825Z - CUDA Runtime 11.6 2023-01-11T21:20:50.7021134Z - NVCC architecture flags: -gencode;arch=compute_86,code=sm_86 2023-01-11T21:20:50.7021437Z - CuDNN 8.3.2 (built against CUDA 11.5) 2023-01-11T21:20:50.7021998Z - Magma 2.6.1 2023-01-11T21:20:50.7024325Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2023-01-11T21:20:50.7026038Z 
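The python -c probe above (and the parallel_info probe that follows) is the job's quick sanity check that the freshly installed wheel was built the way the job name promises: GCC 7, CUDA 11.6 runtime, sm_86 NVCC flags, MKL, and so on. A minimal sketch of running the same check against any local PyTorch install, mirroring the commands traced in this log:

    # Print the compile-time build configuration (compiler, CUDA/cuDNN
    # versions, NVCC arch flags, CMake build settings).
    python -c 'import torch; print(torch.__config__.show())'
    # Print the threading backends and thread counts (shown further below).
    python -c 'import torch; print(torch.__config__.parallel_info())'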
2023-01-11T21:20:50.9726584Z + cd test 2023-01-11T21:20:50.9727132Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2023-01-11T21:20:52.0695883Z ATen/Parallel: 2023-01-11T21:20:52.0707702Z at::get_num_threads() : 8 2023-01-11T21:20:52.0707954Z at::get_num_interop_threads() : 8 2023-01-11T21:20:52.0708189Z OpenMP 201511 (a.k.a. OpenMP 4.5) 2023-01-11T21:20:52.0708400Z omp_get_max_threads() : 8 2023-01-11T21:20:52.0708933Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2023-01-11T21:20:52.0709252Z mkl_get_max_threads() : 8 2023-01-11T21:20:52.0709603Z Intel(R) MKL-DNN v2.7.2 (Git Hash fbec3e25a559ee252022ae066817b204e106a6ba) 2023-01-11T21:20:52.0709893Z std::thread::hardware_concurrency() : 16 2023-01-11T21:20:52.0710124Z Environment variables: 2023-01-11T21:20:52.0710328Z OMP_NUM_THREADS : [not set] 2023-01-11T21:20:52.0710538Z MKL_NUM_THREADS : [not set] 2023-01-11T21:20:52.0711099Z ATen parallel backend: OpenMP 2023-01-11T21:20:52.0711251Z 2023-01-11T21:20:52.3144835Z + [[ default == *backward* ]] 2023-01-11T21:20:52.3145485Z + [[ default == *xla* ]] 2023-01-11T21:20:52.3146062Z + [[ default == \j\i\t\_\l\e\g\a\c\y ]] 2023-01-11T21:20:52.3147072Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *libtorch* ]] 2023-01-11T21:20:52.3147464Z + [[ default == distributed ]] 2023-01-11T21:20:52.3147858Z + [[ default == deploy ]] 2023-01-11T21:20:52.3148875Z + [[ default == *inductor_distributed* ]] 2023-01-11T21:20:52.3149708Z + [[ default == *dynamo* ]] 2023-01-11T21:20:52.3150165Z + [[ default == *dynamo* ]] 2023-01-11T21:20:52.3150618Z + [[ default == *inductor_huggingface* ]] 2023-01-11T21:20:52.3150983Z + [[ default == *inductor_timm* ]] 2023-01-11T21:20:52.3151319Z + [[ default == *inductor_torchbench* ]] 2023-01-11T21:20:52.3151626Z + [[ default == *inductor* ]] 2023-01-11T21:20:52.3151834Z + [[ 2 == 1 ]] 2023-01-11T21:20:52.3152051Z + [[ 2 == 2 ]] 2023-01-11T21:20:52.3152330Z + [[ 4 -gt 1 ]] 2023-01-11T21:20:52.3152537Z + install_torchvision 2023-01-11T21:20:52.3152773Z + local commit 2023-01-11T21:20:52.3153057Z ++ get_pinned_commit vision 2023-01-11T21:20:52.3153319Z ++ cat .github/ci_commit_pins/vision.txt 2023-01-11T21:20:52.3159820Z + commit=32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.3160604Z + pip_install --no-use-pep517 --user git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.3161435Z + pip install --progress-bar off --no-use-pep517 --user git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.6303510Z Collecting git+https://github.com/pytorch/vision.git@32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:52.6306352Z Cloning https://github.com/pytorch/vision.git (to revision 32d254bbfcf14975f846765775584e61ef25a5bc) to /tmp/pip-req-build-2lgu94ct 2023-01-11T21:20:52.6321892Z Running command git clone --filter=blob:none --quiet https://github.com/pytorch/vision.git /tmp/pip-req-build-2lgu94ct 2023-01-11T21:20:54.8556672Z Running command git rev-parse -q --verify 'sha^32d254bbfcf14975f846765775584e61ef25a5bc' 2023-01-11T21:20:54.8576357Z Running command git fetch -q https://github.com/pytorch/vision.git 32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:55.7900132Z Running command git checkout -q 32d254bbfcf14975f846765775584e61ef25a5bc 2023-01-11T21:20:56.3710976Z Resolved https://github.com/pytorch/vision.git to commit 32d254bbfcf14975f846765775584e61ef25a5bc 
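The install_torchvision trace above shows the pinned-commit pattern the CI scripts use for sibling repos: the exact revision is read from a file under .github/ci_commit_pins/ and handed to pip as a git URL, so every shard tests against the same torchvision commit. A sketch of what the get_pinned_commit and pip_install helpers sourced from .jenkins/pytorch/common_utils.sh reduce to here (an assumed simplification for illustration, not their actual definitions):

    # Read the pinned torchvision revision and install exactly that commit.
    commit="$(cat .github/ci_commit_pins/vision.txt)"
    pip install --progress-bar off --no-use-pep517 --user \
      "git+https://github.com/pytorch/vision.git@${commit}"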
2023-01-11T21:20:58.3718106Z Preparing metadata (setup.py) ... done 2023-01-11T21:20:58.3773502Z Requirement already satisfied: typing_extensions in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (4.4.0) 2023-01-11T21:20:58.3776158Z Requirement already satisfied: numpy in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (1.21.2) 2023-01-11T21:20:58.3779229Z Requirement already satisfied: requests in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (2.28.1) 2023-01-11T21:20:58.3781945Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (2.0.0a0+git8419ddd) 2023-01-11T21:20:58.3787893Z Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/conda/lib/python3.10/site-packages (from torchvision==0.15.0a0+32d254b) (9.3.0) 2023-01-11T21:20:58.3946762Z Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (2022.12.7) 2023-01-11T21:20:58.3952445Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (1.26.13) 2023-01-11T21:20:58.3957478Z Requirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (3.4) 2023-01-11T21:20:58.3964292Z Requirement already satisfied: charset-normalizer<3,>=2 in /opt/conda/lib/python3.10/site-packages (from requests->torchvision==0.15.0a0+32d254b) (2.0.4) 2023-01-11T21:20:58.4005417Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->torchvision==0.15.0a0+32d254b) (2.6.3) 2023-01-11T21:20:58.4007053Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->torchvision==0.15.0a0+32d254b) (1.11.1) 2023-01-11T21:20:58.4304796Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->torchvision==0.15.0a0+32d254b) (1.2.1) 2023-01-11T21:20:58.4358703Z Building wheels for collected packages: torchvision 2023-01-11T21:21:55.7272141Z Building wheel for torchvision (setup.py) ... 
done 2023-01-11T21:21:55.7299683Z Created wheel for torchvision: filename=torchvision-0.15.0a0+32d254b-cp310-cp310-linux_x86_64.whl size=1856205 sha256=7af400b8a3a64568380d22371ca34ed36a606a82d5c8659da96b47376d3eb0f5 2023-01-11T21:21:55.7300251Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/ca/33/ae/1f7c8972d058d079236e7ca0a30b53b050afb405820b9ed787 2023-01-11T21:21:55.7331282Z Successfully built torchvision 2023-01-11T21:21:56.2912231Z Installing collected packages: torchvision 2023-01-11T21:21:56.6628056Z Successfully installed torchvision-0.15.0a0+32d254b 2023-01-11T21:21:56.7589618Z + install_triton 2023-01-11T21:21:56.7589918Z + local commit 2023-01-11T21:21:56.7590152Z + [[ default == *rocm* ]] 2023-01-11T21:21:56.7591678Z ++ get_pinned_commit triton 2023-01-11T21:21:56.7592262Z ++ cat .github/ci_commit_pins/triton.txt 2023-01-11T21:21:56.7602806Z + commit=0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:56.7603571Z + pip_install --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:56.7604137Z + pip install --progress-bar off --user git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:57.0746437Z Collecting git+https://github.com/openai/triton@0d7e7532279e45672555e344646f5c19c3972331#subdirectory=python 2023-01-11T21:21:57.0747654Z Cloning https://github.com/openai/triton (to revision 0d7e7532279e45672555e344646f5c19c3972331) to /tmp/pip-req-build-phj8u5v_ 2023-01-11T21:21:57.0764019Z Running command git clone --filter=blob:none --quiet https://github.com/openai/triton /tmp/pip-req-build-phj8u5v_ 2023-01-11T21:21:57.7144111Z Running command git rev-parse -q --verify 'sha^0d7e7532279e45672555e344646f5c19c3972331' 2023-01-11T21:21:57.7157676Z Running command git fetch -q https://github.com/openai/triton 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.0417228Z Running command git checkout -q 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.3784551Z Resolved https://github.com/openai/triton to commit 0d7e7532279e45672555e344646f5c19c3972331 2023-01-11T21:21:58.3785406Z Running command git submodule update --init --recursive -q 2023-01-11T21:21:58.8241585Z Preparing metadata (setup.py) ... 
done 2023-01-11T21:21:58.9990950Z Collecting cmake 2023-01-11T21:21:59.0181191Z Downloading cmake-3.25.0-py2.py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (23.7 MB) 2023-01-11T21:21:59.2816510Z Collecting filelock 2023-01-11T21:21:59.2901667Z Downloading filelock-3.9.0-py3-none-any.whl (9.7 kB) 2023-01-11T21:21:59.2934271Z Requirement already satisfied: torch in /opt/conda/lib/python3.10/site-packages (from triton==2.0.0) (2.0.0a0+git8419ddd) 2023-01-11T21:21:59.3114806Z Requirement already satisfied: sympy in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (1.11.1) 2023-01-11T21:21:59.3118622Z Requirement already satisfied: networkx in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (2.6.3) 2023-01-11T21:21:59.3122833Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.10/site-packages (from torch->triton==2.0.0) (4.4.0) 2023-01-11T21:21:59.3267540Z Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.10/site-packages (from sympy->torch->triton==2.0.0) (1.2.1) 2023-01-11T21:21:59.3316824Z Building wheels for collected packages: triton 2023-01-11T21:22:39.5302469Z Building wheel for triton (setup.py) ... done 2023-01-11T21:22:39.5424816Z Created wheel for triton: filename=triton-2.0.0-cp310-cp310-linux_x86_64.whl size=15377935 sha256=fd8e6bb61136a085b66646350c4b898753e24c3d426a3caab7b52ea9562bb401 2023-01-11T21:22:39.5425759Z Stored in directory: /var/lib/jenkins/.cache/pip/wheels/3f/1d/23/1c2bc47d618a44f9c949aea4b7e355e737a1f1ed208f009295 2023-01-11T21:22:39.5436473Z Successfully built triton 2023-01-11T21:22:40.1617777Z Installing collected packages: cmake, filelock, triton 2023-01-11T21:22:41.1710154Z Successfully installed cmake-3.25.0 filelock-3.9.0 triton-2.0.0 2023-01-11T21:22:41.2616768Z + pip_install --user jinja2 2023-01-11T21:22:41.2617155Z + pip install --progress-bar off --user jinja2 2023-01-11T21:22:41.6130468Z Collecting jinja2 2023-01-11T21:22:41.6302107Z Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB) 2023-01-11T21:22:41.6423896Z Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.10/site-packages (from jinja2) (2.1.1) 2023-01-11T21:22:42.2619233Z Installing collected packages: jinja2 2023-01-11T21:22:42.3386537Z Successfully installed jinja2-3.1.2 2023-01-11T21:22:42.4030984Z + test_python_shard 2 2023-01-11T21:22:42.4031370Z + [[ -z 4 ]] 2023-01-11T21:22:42.4031926Z + python test/run_test.py --exclude-jit-executor --exclude-distributed-tests --shard 2 4 --verbose 2023-01-11T21:22:44.0298710Z Ignoring disabled issues: [] 2023-01-11T21:22:44.0549325Z /var/lib/jenkins/workspace/test/run_test.py:1169: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 2023-01-11T21:22:44.0549869Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) >= "11.6": 2023-01-11T21:22:44.0615344Z ##[warning] Gathered no stats from artifacts. Proceeding with default sharding plan. 
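The run_test.py invocation above is where this job becomes "shard 2 of 4": test_python_shard passes the shard index and shard count straight through, and because no per-test timing stats could be fetched (the ##[warning] line), the runner falls back to its default plan for splitting the selected files across shards. A sketch of the equivalent manual invocation, using the SHARD_NUMBER and NUM_TEST_SHARDS values from the env dump earlier in this log; the files selected for this shard are listed right below:

    # Run this job's slice of the test suite; the same command with
    # --shard 1 4 through --shard 4 4 reproduces the other "default" shards.
    python test/run_test.py \
      --exclude-jit-executor \
      --exclude-distributed-tests \
      --shard "$SHARD_NUMBER" "$NUM_TEST_SHARDS" \
      --verbose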
2023-01-11T21:22:44.0616080Z Selected tests: 2023-01-11T21:22:44.0616294Z benchmark_utils/test_benchmark_utils 2023-01-11T21:22:44.0616516Z dynamo/test_comptime 2023-01-11T21:22:44.0616716Z dynamo/test_functions 2023-01-11T21:22:44.0616905Z dynamo/test_misc 2023-01-11T21:22:44.0617106Z dynamo/test_optimizations 2023-01-11T21:22:44.0617328Z dynamo/test_replay_record 2023-01-11T21:22:44.0617567Z dynamo/test_torchxla_integration 2023-01-11T21:22:44.0617816Z dynamo/test_verify_correctness 2023-01-11T21:22:44.0618046Z inductor/test_torchinductor 2023-01-11T21:22:44.0618264Z lazy/test_extract_compiled_graph 2023-01-11T21:22:44.0618478Z lazy/test_ts_opinfo 2023-01-11T21:22:44.0618669Z nn/test_init 2023-01-11T21:22:44.0618854Z nn/test_packed_sequence 2023-01-11T21:22:44.0619074Z profiler/test_memory_profiler 2023-01-11T21:22:44.0619284Z test_autocast 2023-01-11T21:22:44.0619469Z test_comparison_utils 2023-01-11T21:22:44.0619687Z test_cpp_extensions_aot_no_ninja 2023-01-11T21:22:44.0619908Z test_cuda_nvml_based_avail 2023-01-11T21:22:44.0620102Z test_dataloader 2023-01-11T21:22:44.0620290Z test_dispatch 2023-01-11T21:22:44.0620480Z test_fake_tensor 2023-01-11T21:22:44.0620670Z test_functional_optim 2023-01-11T21:22:44.0620876Z test_fx_experimental 2023-01-11T21:22:44.0621073Z test_import_stats 2023-01-11T21:22:44.0621256Z test_jit_autocast 2023-01-11T21:22:44.0621450Z test_jit_llga_fuser 2023-01-11T21:22:44.0621639Z test_linalg 2023-01-11T21:22:44.0621817Z test_matmul_cuda 2023-01-11T21:22:44.0622014Z test_mkldnn_fusion 2023-01-11T21:22:44.0622206Z test_module_init 2023-01-11T21:22:44.0622403Z test_multiprocessing_spawn 2023-01-11T21:22:44.0622613Z test_native_mha 2023-01-11T21:22:44.0622807Z test_numpy_interop 2023-01-11T21:22:44.0622986Z test_ops 2023-01-11T21:22:44.0623162Z test_optim 2023-01-11T21:22:44.0623345Z test_prims 2023-01-11T21:22:44.0623785Z test_python_dispatch 2023-01-11T21:22:44.0624001Z test_scatter_gather_ops 2023-01-11T21:22:44.0624209Z test_shape_ops 2023-01-11T21:22:44.0624398Z test_sparse_csr 2023-01-11T21:22:44.0624603Z test_tensor_creation_ops 2023-01-11T21:22:44.0624808Z test_testing 2023-01-11T21:22:44.0624989Z test_type_info 2023-01-11T21:22:44.0625182Z test_view_ops 2023-01-11T21:22:44.0625370Z doctests 2023-01-11T21:22:44.0725716Z Prioritized test from test file changes. 
2023-01-11T21:22:44.0727020Z reordering tests for PR: 2023-01-11T21:22:44.0728267Z prioritized: ['dynamo/test_misc', 'dynamo/test_optimizations', 'dynamo/test_torchxla_integration', 'inductor/test_torchinductor', 'test_fake_tensor', 'test_python_dispatch', 'test_scatter_gather_ops', 'test_sparse_csr', 'test_testing'] 2023-01-11T21:22:44.0729921Z the rest: ['benchmark_utils/test_benchmark_utils', 'dynamo/test_comptime', 'dynamo/test_functions', 'dynamo/test_replay_record', 'dynamo/test_verify_correctness', 'lazy/test_extract_compiled_graph', 'lazy/test_ts_opinfo', 'nn/test_init', 'nn/test_packed_sequence', 'profiler/test_memory_profiler', 'test_autocast', 'test_comparison_utils', 'test_cpp_extensions_aot_no_ninja', 'test_cuda_nvml_based_avail', 'test_dataloader', 'test_dispatch', 'test_functional_optim', 'test_fx_experimental', 'test_import_stats', 'test_jit_autocast', 'test_jit_llga_fuser', 'test_linalg', 'test_matmul_cuda', 'test_mkldnn_fusion', 'test_module_init', 'test_multiprocessing_spawn', 'test_native_mha', 'test_numpy_interop', 'test_ops', 'test_optim', 'test_prims', 'test_shape_ops', 'test_tensor_creation_ops', 'test_type_info', 'test_view_ops', 'doctests'] 2023-01-11T21:22:44.0730956Z 2023-01-11T21:22:44.0731359Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/slow-tests.json to /var/lib/jenkins/workspace/test/.pytorch-slow-tests.json 2023-01-11T21:22:44.1070547Z Downloading https://raw.githubusercontent.com/pytorch/test-infra/generated-stats/stats/disabled-tests-condensed.json to /var/lib/jenkins/workspace/test/.pytorch-disabled-tests.json 2023-01-11T21:22:44.1285685Z parallel (file granularity) tests: 2023-01-11T21:22:44.1286552Z dynamo/test_misc 2023-01-11T21:22:44.1287150Z dynamo/test_optimizations 2023-01-11T21:22:44.1287646Z dynamo/test_torchxla_integration 2023-01-11T21:22:44.1288094Z inductor/test_torchinductor 2023-01-11T21:22:44.1288434Z test_python_dispatch 2023-01-11T21:22:44.1288826Z test_scatter_gather_ops 2023-01-11T21:22:44.1289116Z test_testing 2023-01-11T21:22:44.1289321Z benchmark_utils/test_benchmark_utils 2023-01-11T21:22:44.1289545Z dynamo/test_comptime 2023-01-11T21:22:44.1289750Z dynamo/test_functions 2023-01-11T21:22:44.1289962Z dynamo/test_replay_record 2023-01-11T21:22:44.1290198Z dynamo/test_verify_correctness 2023-01-11T21:22:44.1290431Z lazy/test_extract_compiled_graph 2023-01-11T21:22:44.1290646Z lazy/test_ts_opinfo 2023-01-11T21:22:44.1290831Z nn/test_init 2023-01-11T21:22:44.1291028Z nn/test_packed_sequence 2023-01-11T21:22:44.1291261Z profiler/test_memory_profiler 2023-01-11T21:22:44.1291462Z test_autocast 2023-01-11T21:22:44.1291659Z test_comparison_utils 2023-01-11T21:22:44.1291859Z test_dataloader 2023-01-11T21:22:44.1292049Z test_functional_optim 2023-01-11T21:22:44.1292252Z test_fx_experimental 2023-01-11T21:22:44.1292451Z test_import_stats 2023-01-11T21:22:44.1292634Z test_jit_autocast 2023-01-11T21:22:44.1292826Z test_jit_llga_fuser 2023-01-11T21:22:44.1293018Z test_matmul_cuda 2023-01-11T21:22:44.1293200Z test_mkldnn_fusion 2023-01-11T21:22:44.1293391Z test_module_init 2023-01-11T21:22:44.1293661Z test_native_mha 2023-01-11T21:22:44.1293875Z test_numpy_interop 2023-01-11T21:22:44.1294080Z test_optim 2023-01-11T21:22:44.1294267Z test_shape_ops 2023-01-11T21:22:44.1294449Z test_type_info 2023-01-11T21:22:44.1294787Z test_view_ops 2023-01-11T21:22:44.1295006Z serial (file granularity) tests: 2023-01-11T21:22:44.1295213Z test_fake_tensor 2023-01-11T21:22:44.1295406Z test_sparse_csr 
2023-01-11T21:22:44.1295749Z test_cpp_extensions_aot_no_ninja 2023-01-11T21:22:44.1295968Z test_cuda_nvml_based_avail 2023-01-11T21:22:44.1296171Z test_dispatch 2023-01-11T21:22:44.1296358Z test_linalg 2023-01-11T21:22:44.1296557Z test_multiprocessing_spawn 2023-01-11T21:22:44.1296761Z test_ops 2023-01-11T21:22:44.1296939Z test_prims 2023-01-11T21:22:44.1297127Z test_tensor_creation_ops 2023-01-11T21:22:44.1297324Z doctests 2023-01-11T21:22:45.7012141Z Ignoring disabled issues: [] 2023-01-11T21:22:45.7241718Z Running dynamo/test_misc ... [2023-01-11 21:22:45.723623] 2023-01-11T21:22:45.7244408Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:45.723969] 2023-01-11T21:22:45.7299912Z Ignoring disabled issues: [] 2023-01-11T21:22:45.7532702Z Running dynamo/test_optimizations ... [2023-01-11 21:22:45.752655] 2023-01-11T21:22:45.7537581Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_optimizations.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:45.753017] 2023-01-11T21:22:49.1701897Z 2023-01-11T21:22:49.1703476Z Expand the folded group to see the log file of dynamo/test_optimizations 2023-01-11T21:22:49.1711296Z ##[group]PRINTING LOG FILE of dynamo/test_optimizations (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_optimizations__p8vut7u) 2023-01-11T21:22:49.1711568Z 2023-01-11T21:22:49.1711657Z Running tests... 2023-01-11T21:22:49.1712089Z ---------------------------------------------------------------------- 2023-01-11T21:22:49.1712498Z Test results will be stored in test-reports/python-unittest/dynamo.test_optimizations 2023-01-11T21:22:49.1713081Z test_inplace_normalize (__main__.NormalizeIRTests) ... ok (1.197s) 2023-01-11T21:22:49.1713422Z frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1713746Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1714052Z aot_autograd [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1714479Z test_example_inputs (__main__.TestOptimizations) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1714763Z ok (0.091s) 2023-01-11T21:22:49.1715191Z test_example_inputs_runtime_use (__main__.TestOptimizations) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1715496Z ok (0.007s) 2023-01-11T21:22:49.1715743Z test_has_mutation (__main__.TestOptimizations) ... ok (0.015s) 2023-01-11T21:22:49.1716125Z test_has_mutation_factory (__main__.TestOptimizations) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1716513Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:49.1716764Z ok (0.014s) 2023-01-11T21:22:49.1717096Z test_inplacifier (__main__.TestOptimizations) ... optimizations [('out', 1), ('inplace', 1)] 2023-01-11T21:22:49.1717365Z ok (0.014s) 2023-01-11T21:22:49.1717625Z test_ipex_bf16 (__main__.TestOptimizations) ... skip: requires ipex (0.001s) 2023-01-11T21:22:49.1717955Z test_ipex_fp32 (__main__.TestOptimizations) ... skip: requires ipex (0.001s) 2023-01-11T21:22:49.1718339Z test_log_conv_args (__main__.TestOptimizations) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:49.1718719Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:49.1718964Z ok (0.140s) 2023-01-11T21:22:49.1719075Z 2023-01-11T21:22:49.1719267Z ---------------------------------------------------------------------- 2023-01-11T21:22:49.1719527Z Ran 9 tests in 1.479s 2023-01-11T21:22:49.1719651Z 2023-01-11T21:22:49.1719732Z OK (skipped=2) 2023-01-11T21:22:49.1719847Z 2023-01-11T21:22:49.1719943Z Generating XML reports... 2023-01-11T21:22:49.1720376Z Generated XML report: test-reports/python-unittest/dynamo.test_optimizations/TEST-NormalizeIRTests-20230111212247.xml 2023-01-11T21:22:49.1720936Z Generated XML report: test-reports/python-unittest/dynamo.test_optimizations/TEST-TestOptimizations-20230111212247.xml 2023-01-11T21:22:49.1721189Z 2023-01-11T21:22:49.1721548Z ##[endgroup] 2023-01-11T21:22:49.1721975Z FINISHED PRINTING LOG FILE of dynamo/test_optimizations (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_optimizations__p8vut7u) 2023-01-11T21:22:49.1722224Z 2023-01-11T21:22:51.1020398Z Ignoring disabled issues: [] 2023-01-11T21:22:51.1249002Z Running dynamo/test_torchxla_integration ... [2023-01-11 21:22:51.124462] 2023-01-11T21:22:51.1250925Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_torchxla_integration.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:51.124786] 2023-01-11T21:22:52.7102858Z 2023-01-11T21:22:52.7103388Z Expand the folded group to see the log file of dynamo/test_misc 2023-01-11T21:22:52.7104937Z ##[group]PRINTING LOG FILE of dynamo/test_misc (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_misc_mzyasd93) 2023-01-11T21:22:52.7105494Z 2023-01-11T21:22:52.7105620Z Running tests... 2023-01-11T21:22:52.7106360Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.7109828Z Test results will be stored in test-reports/python-unittest/dynamo.test_misc 2023-01-11T21:22:52.7110305Z test_allow_in_graph (__main__.MiscTests) ... ok (1.192s) 2023-01-11T21:22:52.7110838Z test_autocast (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7111365Z stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7111914Z stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7112246Z ok (0.668s) 2023-01-11T21:22:52.7112734Z test_autocast_cpu (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7113331Z ok (0.027s) 2023-01-11T21:22:52.7113858Z test_autocast_device (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7114234Z ok (0.026s) 2023-01-11T21:22:52.7116521Z test_autocast_float64 (__main__.MiscTests) ... stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7116886Z ok (0.025s) 2023-01-11T21:22:52.7117220Z test_autograd_function_equivalence (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7117753Z stats [('calls_captured', 4), ('unique_graphs', 4), ('fusions_possible', 0)] 2023-01-11T21:22:52.7118094Z ok (0.100s) 2023-01-11T21:22:52.7118674Z test_autograd_profiler (__main__.MiscTests) ... 
STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:22:52.7119341Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:22:52.7124357Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:22:52.7124952Z [2023-01-11 21:22:49,540] torch._dynamo.variables.torch: [WARNING] Profiler will be ignored 2023-01-11T21:22:52.7125361Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7125626Z unimplemented [] 2023-01-11T21:22:52.7125946Z graph_break [('Tensor.tolist', 1)] 2023-01-11T21:22:52.7126330Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7126636Z ok (0.296s) 2023-01-11T21:22:52.7127149Z test_autograd_profiler_enabled (__main__.MiscTests) ... STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:22:52.7127726Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:22:52.7128340Z STAGE:2023-01-11 21:22:49 2193:2193 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:22:52.7128760Z frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7128963Z unimplemented [] 2023-01-11T21:22:52.7129289Z graph_break [('torch.autograd._profiler_enabled not supported yet', 1)] 2023-01-11T21:22:52.7129668Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7130051Z ok (0.014s) 2023-01-11T21:22:52.7130434Z test_boolarg (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7130736Z ok (0.010s) 2023-01-11T21:22:52.7130973Z test_build_tuple_unpack (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7131324Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7131566Z ok (0.013s) 2023-01-11T21:22:52.7131896Z test_builder_for_class_with_metaclass (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7132268Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7132509Z ok (0.005s) 2023-01-11T21:22:52.7132888Z test_builtin_isinstance (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7133171Z ok (0.011s) 2023-01-11T21:22:52.7133424Z test_builtin_subclasses_as_method_on_class_type (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7133876Z test_builtin_subclasses_as_method_on_var (__main__.MiscTests) ... ok (0.004s) 2023-01-11T21:22:52.7134209Z test_call_parent_non_class_methods_from_child (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7134801Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7135048Z ok (0.008s) 2023-01-11T21:22:52.7135415Z test_callpacked (__main__.MiscTests) ... stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7135684Z ok (0.010s) 2023-01-11T21:22:52.7135990Z test_cell_output1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7136432Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7136672Z ok (0.005s) 2023-01-11T21:22:52.7136965Z test_cell_output2 (__main__.MiscTests) ... 
frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7137218Z unimplemented [] 2023-01-11T21:22:52.7137618Z graph_break [('call_function UserDefinedObjectVariable(unsupported) [TensorVariable(), TensorVariable()] {}', 1)] 2023-01-11T21:22:52.7138040Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7138284Z ok (0.008s) 2023-01-11T21:22:52.7138986Z test_change_backends (__main__.MiscTests) ... /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`. 2023-01-11T21:22:52.7139599Z warnings.warn("The TorchScript type system doesn't support " 2023-01-11T21:22:52.7139965Z stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7140262Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7140462Z ok (0.058s) 2023-01-11T21:22:52.7140753Z test_cond (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7140985Z inline_call [] 2023-01-11T21:22:52.7141300Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7141548Z ok (0.012s) 2023-01-11T21:22:52.7141772Z test_cond_export (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7142122Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7142365Z ok (0.016s) 2023-01-11T21:22:52.7142600Z test_cond_export_single_arg (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7142967Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7143209Z ok (0.009s) 2023-01-11T21:22:52.7143501Z test_cond_nested (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7143741Z inline_call [] 2023-01-11T21:22:52.7144047Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7144280Z ok (0.014s) 2023-01-11T21:22:52.7144589Z test_cond_side_effects (__main__.MiscTests) ... expected failure (0.001s) 2023-01-11T21:22:52.7144970Z test_config_getattr_default (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7145345Z stats [('calls_captured', 21), ('fusions_possible', 18), ('unique_graphs', 3)] 2023-01-11T21:22:52.7145582Z ok (0.033s) 2023-01-11T21:22:52.7145888Z test_config_log_level (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7146255Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7146487Z ok (0.005s) 2023-01-11T21:22:52.7146778Z test_config_obj (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7147140Z stats [('calls_captured', 8), ('fusions_possible', 4), ('unique_graphs', 4)] 2023-01-11T21:22:52.7147368Z ok (0.022s) 2023-01-11T21:22:52.7147613Z test_const_dict_variable_python_type (__main__.MiscTests) ... ok (0.001s) 2023-01-11T21:22:52.7148212Z test_cross_entropy_loss_fancy_ctor (__main__.MiscTests) ... /opt/conda/lib/python3.10/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead. 2023-01-11T21:22:52.7148624Z warnings.warn(warning.format(ret)) 2023-01-11T21:22:52.7148826Z ok (0.002s) 2023-01-11T21:22:52.7149069Z test_cross_entropy_loss_simple_ctor (__main__.MiscTests) ... 
ok (0.001s) 2023-01-11T21:22:52.7149441Z test_dataclass_fields (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7149679Z inline_call [] 2023-01-11T21:22:52.7150010Z stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7150279Z ok (0.034s) 2023-01-11T21:22:52.7150673Z test_dict_mutation_side_effect (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7151046Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7151286Z ok (0.005s) 2023-01-11T21:22:52.7151631Z test_dict_reconstruct_keeps_original_order (__main__.MiscTests) ... frames [('total', 13), ('ok', 12)] 2023-01-11T21:22:52.7152089Z unimplemented [("Guard setup for uninitialized class ", 1)] 2023-01-11T21:22:52.7152990Z graph_break [('UnspecializedNNModuleVariable missing add_module', 3), ('construct nn.Module: ReLU', 1), ('call_function in skip_files /opt/conda/lib/python3.10/collections/__init__.py', 1), ('construct nn.Module: ModuleDict', 1), ('Patched init cannot be inlined.', 1), ('construct nn.Module: Linear', 1), ('construct nn.Module: Sigmoid', 1), ('call_method ConstDictVariable() update [TupleVariable()] {}', 1)] 2023-01-11T21:22:52.7153654Z inline_call [('inline __setitem__', 2), ('Patched init cannot be inlined.', 1)] 2023-01-11T21:22:52.7153906Z ok (0.036s) 2023-01-11T21:22:52.7154194Z test_dictcomp (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7154435Z inline_call [] 2023-01-11T21:22:52.7154744Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7154978Z ok (0.007s) 2023-01-11T21:22:52.7155208Z test_disable_flag (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7155479Z test_disable_optimize (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7155837Z test_disallow_in_graph (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7156080Z unimplemented [] 2023-01-11T21:22:52.7156466Z graph_break [('call_function UserDefinedObjectVariable(sub) [TensorVariable(), ConstantVariable(int)] {}', 1)] 2023-01-11T21:22:52.7156888Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7157117Z ok (0.010s) 2023-01-11T21:22:52.7157353Z test_dunder_methods (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7157704Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7157935Z ok (0.020s) 2023-01-11T21:22:52.7158177Z test_duplicate_graph_break_warning (__main__.MiscTests) ... break 2023-01-11T21:22:52.7158479Z break 2023-01-11T21:22:52.7158693Z frames [('total', 9), ('ok', 9)] 2023-01-11T21:22:52.7159038Z inline_call [('call_function BuiltinVariable(print) [ConstantVariable(str)] {}', 2)] 2023-01-11T21:22:52.7159309Z unimplemented [] 2023-01-11T21:22:52.7159647Z graph_break [('call_function BuiltinVariable(print) [ConstantVariable(str)] {}', 4)] 2023-01-11T21:22:52.7160025Z stats [('calls_captured', 6), ('unique_graphs', 4), ('fusions_possible', 2)] 2023-01-11T21:22:52.7160262Z ok (0.026s) 2023-01-11T21:22:52.7160507Z test_dynamo_min_operator_with_shape (__main__.MiscTests) ... ok (0.002s) 2023-01-11T21:22:52.7160859Z test_empty_list (__main__.MiscTests) ... 
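
The size_average/reduce deprecation printed above under test_cross_entropy_loss_fancy_ctor folds the two legacy flags into the single reduction argument. A quick sketch of the migration (this particular legacy combination maps to "sum"):

    import torch
    import torch.nn.functional as F

    x, y = torch.randn(3, 5), torch.tensor([0, 2, 4])
    # legacy style flagged by the warning:
    #   F.cross_entropy(x, y, size_average=False, reduce=True)
    loss = F.cross_entropy(x, y, reduction="sum")  # current equivalent
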
frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7161218Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7161461Z ok (0.008s) 2023-01-11T21:22:52.7161830Z test_enum_no_graphbreaks (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7162114Z ok (0.010s) 2023-01-11T21:22:52.7162428Z test_error_on_nested_fx_trace (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7162802Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7163032Z ok (0.004s) 2023-01-11T21:22:52.7163384Z test_fold (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7163649Z ok (0.004s) 2023-01-11T21:22:52.7164032Z test_frozenset_torch_func_contains (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7164363Z ok (0.009s) 2023-01-11T21:22:52.7164746Z test_function_annotation (__main__.MiscTests) ... stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7165019Z ok (0.008s) 2023-01-11T21:22:52.7165378Z test_generate_tensor_from_list_of_numpy_primitive_type (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7165665Z unimplemented [] 2023-01-11T21:22:52.7165904Z graph_break [('numpy', 1)] 2023-01-11T21:22:52.7166218Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7166457Z ok (0.007s) 2023-01-11T21:22:52.7166821Z test_get_device (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7167086Z ok (0.005s) 2023-01-11T21:22:52.7167372Z test_grad (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7167613Z unimplemented [] 2023-01-11T21:22:52.7167859Z graph_break [('Tensor.backward', 1)] 2023-01-11T21:22:52.7168200Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7168442Z ok (0.012s) 2023-01-11T21:22:52.7168736Z test_grad_mode_guard (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7168991Z unimplemented [] 2023-01-11T21:22:52.7169243Z graph_break [('Tensor.tolist', 1)] 2023-01-11T21:22:52.7169570Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7169808Z ok (0.011s) 2023-01-11T21:22:52.7170108Z test_graph_break (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7170419Z unimplemented [] 2023-01-11T21:22:52.7170799Z graph_break [('call_function in skip_files /opt/conda/lib/python3.10/site-packages/torch/_dynamo/__init__.py', 2)] 2023-01-11T21:22:52.7171198Z stats [('calls_captured', 6), ('fusions_possible', 3), ('unique_graphs', 3)] 2023-01-11T21:22:52.7171436Z ok (0.015s) 2023-01-11T21:22:52.7171809Z test_guard_failure_fn (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7172082Z ok (0.012s) 2023-01-11T21:22:52.7172456Z test_guard_failure_fn2 (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7172737Z ok (0.011s) 2023-01-11T21:22:52.7173138Z test_id_of_nn_module (__main__.MiscTests) ... 
stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7173421Z ok (0.012s) 2023-01-11T21:22:52.7173895Z test_if_cond_nn_mod (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7174171Z ok (0.013s) 2023-01-11T21:22:52.7174468Z test_inference_mode (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7175134Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7175376Z ok (0.006s) 2023-01-11T21:22:52.7175678Z test_inline_dict_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7175930Z inline_call [] 2023-01-11T21:22:52.7176255Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7176567Z ok (0.020s) 2023-01-11T21:22:52.7177030Z test_inline_func_jump_on_tensor_condition (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7177514Z inline_call [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7177829Z unimplemented [] 2023-01-11T21:22:52.7178195Z graph_break [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7178659Z stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7178970Z ok (0.016s) 2023-01-11T21:22:52.7179371Z test_inline_list_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7179689Z inline_call [] 2023-01-11T21:22:52.7181557Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7181840Z ok (0.018s) 2023-01-11T21:22:52.7182345Z test_inplace (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7182623Z ok (0.012s) 2023-01-11T21:22:52.7183063Z test_inplace_param_update (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7183452Z ok (0.006s) 2023-01-11T21:22:52.7183847Z test_is_compiling (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7184339Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7184641Z ok (0.009s) 2023-01-11T21:22:52.7185139Z test_is_floating_point (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7185513Z ok (0.006s) 2023-01-11T21:22:52.7185958Z test_is_floating_point2 (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7186244Z ok (0.006s) 2023-01-11T21:22:52.7186632Z test_is_tensor (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7186973Z ok (0.006s) 2023-01-11T21:22:52.7187320Z test_is_tensor2 (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7187731Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7188018Z ok (0.012s) 2023-01-11T21:22:52.7188384Z test_is_tensor_like (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 2), ('fusions_possible', 1)] 2023-01-11T21:22:52.7188707Z ok (0.012s) 2023-01-11T21:22:52.7189016Z test_is_tensor_like2 (__main__.MiscTests) ... 
frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7189261Z unimplemented [] 2023-01-11T21:22:52.7189647Z graph_break [('call_function args: UserDefinedObjectVariable(MyTensor) ', 1)] 2023-01-11T21:22:52.7190067Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7190369Z ok (0.015s) 2023-01-11T21:22:52.7190845Z test_item (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7191164Z ok (0.016s) 2023-01-11T21:22:52.7191832Z test_item_changes (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7192110Z ok (0.031s) 2023-01-11T21:22:52.7192498Z test_item_changes_new_shape (__main__.MiscTests) ... stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7192784Z ok (0.031s) 2023-01-11T21:22:52.7193010Z test_large_reduction_list (__main__.MiscTests) ... ok (0.008s) 2023-01-11T21:22:52.7193299Z test_linetable_writer (__main__.MiscTests) ... ok (0.001s) 2023-01-11T21:22:52.7193666Z test_list_append_return_none (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7194034Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7194280Z ok (0.005s) 2023-01-11T21:22:52.7194569Z test_list_mul (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7194800Z ok (0.002s) 2023-01-11T21:22:52.7195084Z test_listcomp (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7195317Z inline_call [] 2023-01-11T21:22:52.7195631Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7195865Z ok (0.009s) 2023-01-11T21:22:52.7196128Z test_lnotab_writer (__main__.MiscTests) ... skip: use lnotab when python < 3.10 (0.000s) 2023-01-11T21:22:52.7196571Z test_manual_seed (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7196838Z ok (0.005s) 2023-01-11T21:22:52.7197197Z test_matmul1 (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7197468Z ok (0.004s) 2023-01-11T21:22:52.7197701Z test_module_complex_iter (__main__.MiscTests) ... ok (0.007s) 2023-01-11T21:22:52.7198110Z test_module_deepcopy (__main__.MiscTests) ... frames [('total', 6), ('ok', 6)] 2023-01-11T21:22:52.7198366Z unimplemented [] 2023-01-11T21:22:52.7198709Z graph_break [('call_function in skip_files /opt/conda/lib/python3.10/copy.py', 2)] 2023-01-11T21:22:52.7198964Z inline_call [] 2023-01-11T21:22:52.7199282Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7199532Z ok (0.043s) 2023-01-11T21:22:52.7199762Z test_named_parameters (__main__.MiscTests) ... ok (0.017s) 2023-01-11T21:22:52.7200150Z test_namedtuple1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7200541Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7200776Z ok (0.006s) 2023-01-11T21:22:52.7201081Z test_namedtuple2 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7201446Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7201692Z ok (0.007s) 2023-01-11T21:22:52.7202064Z test_namedtuple3 (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7202429Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7202669Z ok (0.005s) 2023-01-11T21:22:52.7202962Z test_nan (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7203323Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7203560Z ok (0.004s) 2023-01-11T21:22:52.7203799Z test_nested_closure (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7204165Z stats [('calls_captured', 9), ('fusions_possible', 7), ('unique_graphs', 2)] 2023-01-11T21:22:52.7204396Z ok (0.031s) 2023-01-11T21:22:52.7204643Z test_nested_closure_mutation (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7205017Z stats [('calls_captured', 11), ('fusions_possible', 9), ('unique_graphs', 2)] 2023-01-11T21:22:52.7205257Z ok (0.023s) 2023-01-11T21:22:52.7205751Z test_nested_disable_decorator (__main__.MiscTests) ... [2023-01-11 21:22:50,520] torch._dynamo.convert_frame: [ERROR] WON'T CONVERT fn3 /var/lib/jenkins/workspace/test/dynamo/test_misc.py line 1197 2023-01-11T21:22:52.7206101Z due to: 2023-01-11T21:22:52.7206351Z Traceback (most recent call last): 2023-01-11T21:22:52.7206730Z File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/exc.py", line 67, in unimplemented 2023-01-11T21:22:52.7207025Z raise Unsupported(msg) 2023-01-11T21:22:52.7207406Z torch._dynamo.exc.Unsupported: call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50> 2023-01-11T21:22:52.7207674Z 2023-01-11T21:22:52.7207746Z from user code: 2023-01-11T21:22:52.7208011Z File "/var/lib/jenkins/workspace/test/dynamo/test_misc.py", line 1199, in fn3 2023-01-11T21:22:52.7208270Z return fn2(x) 2023-01-11T21:22:52.7208527Z File "/var/lib/jenkins/workspace/test/dynamo/test_misc.py", line 1192, in fn2 2023-01-11T21:22:52.7208796Z x = fn1(x) # graph break 2023-01-11T21:22:52.7208925Z 2023-01-11T21:22:52.7209067Z Set torch._dynamo.config.verbose=True for more information 2023-01-11T21:22:52.7209238Z 2023-01-11T21:22:52.7209243Z 2023-01-11T21:22:52.7209372Z frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7209580Z unimplemented [] 2023-01-11T21:22:52.7210020Z graph_break [('call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50>', 1)] 2023-01-11T21:22:52.7210522Z stats [('calls_captured', 4), ('fusions_possible', 2), ('unique_graphs', 2)] 2023-01-11T21:22:52.7211005Z inline_call [('call torch._dynamo.disable() wrapped function .fn1 at 0x7f78e333fb50>', 1)] 2023-01-11T21:22:52.7211315Z ok (0.012s) 2023-01-11T21:22:52.7211701Z test_nested_optimize (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7212028Z ok (0.014s) 2023-01-11T21:22:52.7212274Z test_nested_optimize_decorator (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7212648Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:22:52.7212896Z ok (0.008s) 2023-01-11T21:22:52.7213274Z test_nested_optimize_run (__main__.MiscTests) ... stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7213559Z ok (0.012s) 2023-01-11T21:22:52.7214065Z test_nn_functional_reduction (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7214360Z ok (0.005s) 2023-01-11T21:22:52.7214767Z test_nn_sequential_invocation (__main__.MiscTests) ... 
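
The WON'T CONVERT trace above is Dynamo refusing to inline a callee wrapped with torch._dynamo.disable, which surfaces as a graph break in the caller. A minimal sketch of the pattern, assuming the eager backend (fn1/fn2 mirror the names in the trace, not the test's exact bodies):

    import torch
    import torch._dynamo

    @torch._dynamo.disable
    def fn1(x):
        return torch.sin(x)  # never traced; Dynamo falls back to eager here

    def fn2(x):
        x = fn1(x)  # graph break, matching the inline_call/graph_break entries above
        return torch.cos(x)

    torch.compile(fn2, backend="eager")(torch.randn(3))
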
inline_call [] 2023-01-11T21:22:52.7215165Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7215502Z ok (0.021s) 2023-01-11T21:22:52.7215810Z test_nn_sequential_invocation_reposition_indices (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7216192Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7216431Z ok (0.016s) 2023-01-11T21:22:52.7216749Z test_no_error_on_nested_fx_trace (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7217116Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7217357Z ok (0.005s) 2023-01-11T21:22:52.7217724Z test_no_grad (__main__.MiscTests) ... stats [('calls_captured', 40), ('fusions_possible', 32), ('unique_graphs', 8)] 2023-01-11T21:22:52.7217996Z ok (0.048s) 2023-01-11T21:22:52.7218297Z test_not_dynamic_scope (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7218547Z inline_call [] 2023-01-11T21:22:52.7218852Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7219085Z ok (0.005s) 2023-01-11T21:22:52.7219440Z test_numel (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7219712Z ok (0.006s) 2023-01-11T21:22:52.7220014Z test_numpy_int_constant (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7220385Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7220760Z ok (0.006s) 2023-01-11T21:22:52.7221074Z test_numpy_variable_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7221450Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7221689Z ok (0.004s) 2023-01-11T21:22:52.7221926Z test_object_classmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7222275Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7222515Z ok (0.010s) 2023-01-11T21:22:52.7222751Z test_object_staticmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7223106Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7223346Z ok (0.010s) 2023-01-11T21:22:52.7223728Z test_onnx_shape_as_tensor (__main__.MiscTests) ... stats [('calls_captured', 15), ('fusions_possible', 10), ('unique_graphs', 5)] 2023-01-11T21:22:52.7224005Z ok (0.071s) 2023-01-11T21:22:52.7224389Z test_optimize_on_module (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7224671Z ok (0.005s) 2023-01-11T21:22:52.7225026Z test_pair (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7225287Z ok (0.019s) 2023-01-11T21:22:52.7225587Z test_python_slice (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7225823Z ok (0.006s) 2023-01-11T21:22:52.7226118Z test_raise_on_backend_error (__main__.MiscTests) ... frames [('total', 1)] 2023-01-11T21:22:52.7226450Z stats [('calls_captured', 3), ('fusions_possible', 2)] 2023-01-11T21:22:52.7226734Z ok (0.005s) 2023-01-11T21:22:52.7227018Z test_raises (__main__.MiscTests) ... 
frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7227260Z unimplemented [] 2023-01-11T21:22:52.7227587Z graph_break [('call_function BuiltinVariable(str) [TensorVariable()] {}', 1)] 2023-01-11T21:22:52.7227961Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7228197Z ok (0.008s) 2023-01-11T21:22:52.7228550Z test_rand (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7228818Z ok (0.009s) 2023-01-11T21:22:52.7229038Z test_range_input (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7229388Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7229629Z ok (0.007s) 2023-01-11T21:22:52.7229958Z test_recursive_inline_list_mutation (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7230260Z inline_call [] 2023-01-11T21:22:52.7230569Z stats [('calls_captured', 7), ('fusions_possible', 6), ('unique_graphs', 1)] 2023-01-11T21:22:52.7230800Z ok (0.013s) 2023-01-11T21:22:52.7231112Z test_release_input_memory (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7231486Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7231718Z ok (0.003s) 2023-01-11T21:22:52.7232029Z test_release_module_memory (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7232401Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7232639Z ok (0.008s) 2023-01-11T21:22:52.7233039Z test_repro_graph_breaks_in__get_item_by_idx (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7233338Z ok (0.009s) 2023-01-11T21:22:52.7233649Z test_restore_graphstate (__main__.MiscTests) ... frames [('total', 4), ('ok', 4)] 2023-01-11T21:22:52.7233984Z inline_call [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7234218Z unimplemented [] 2023-01-11T21:22:52.7234495Z graph_break [('generic_jump TensorVariable()', 1)] 2023-01-11T21:22:52.7234839Z stats [('calls_captured', 6), ('unique_graphs', 4), ('fusions_possible', 2)] 2023-01-11T21:22:52.7235079Z ok (0.019s) 2023-01-11T21:22:52.7235513Z test_restore_graphstate_internals (__main__.MiscTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7235809Z ok (0.007s) 2023-01-11T21:22:52.7236118Z test_return_nested_function (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7236491Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 2)] 2023-01-11T21:22:52.7236729Z ok (0.013s) 2023-01-11T21:22:52.7236947Z test_sample_input (__main__.MiscTests) ... ok (0.516s) 2023-01-11T21:22:52.7237297Z test_setattr_mutation1 (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7237754Z unimplemented [('call_method UserDefinedObjectVariable(member_descriptor) __mul__ [ConstantVariable(int)] {}', 1)] 2023-01-11T21:22:52.7238276Z graph_break [("isinstance called on UserDefinedClass UserDefinedObjectVariable(member_descriptor) ", 1)] 2023-01-11T21:22:52.7238641Z frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7238970Z stats [('calls_captured', 12), ('fusions_possible', 11), ('unique_graphs', 1)] 2023-01-11T21:22:52.7239216Z ok (0.017s) 2023-01-11T21:22:52.7239516Z test_setattr_mutation2 (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7239763Z inline_call [] 2023-01-11T21:22:52.7240073Z stats [('calls_captured', 9), ('fusions_possible', 8), ('unique_graphs', 1)] 2023-01-11T21:22:52.7240325Z ok (0.013s) 2023-01-11T21:22:52.7240663Z test_setattr_mutation3 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7240908Z inline_call [] 2023-01-11T21:22:52.7241211Z stats [('calls_captured', 9), ('fusions_possible', 8), ('unique_graphs', 1)] 2023-01-11T21:22:52.7241503Z ok (0.014s) 2023-01-11T21:22:52.7241808Z test_shape_unpack (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7242167Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7242414Z ok (0.004s) 2023-01-11T21:22:52.7242760Z test_side_effects_codegen_update_mutated (__main__.MiscTests) ... frames [('total', 6), ('ok', 6)] 2023-01-11T21:22:52.7243040Z unimplemented [] 2023-01-11T21:22:52.7243281Z graph_break [('Tensor.item', 4)] 2023-01-11T21:22:52.7243612Z stats [('calls_captured', 8), ('fusions_possible', 4), ('unique_graphs', 4)] 2023-01-11T21:22:52.7243855Z ok (0.026s) 2023-01-11T21:22:52.7244147Z test_size_input (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7244513Z stats [('calls_captured', 2), ('unique_graphs', 2), ('fusions_possible', 0)] 2023-01-11T21:22:52.7244757Z ok (0.007s) 2023-01-11T21:22:52.7245121Z test_slice_input (__main__.MiscTests) ... stats [('calls_captured', 3), ('unique_graphs', 3), ('fusions_possible', 0)] 2023-01-11T21:22:52.7245399Z ok (0.031s) 2023-01-11T21:22:52.7245724Z test_tensor_build_list_unpack (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7246104Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7246340Z ok (0.011s) 2023-01-11T21:22:52.7246712Z test_tensor_data (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7246992Z ok (0.014s) 2023-01-11T21:22:52.7247288Z test_tensor_dict1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7247655Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7247901Z ok (0.005s) 2023-01-11T21:22:52.7248195Z test_tensor_dict2 (__main__.MiscTests) ... frames [('total', 3), ('ok', 3)] 2023-01-11T21:22:52.7248565Z stats [('calls_captured', 9), ('fusions_possible', 6), ('unique_graphs', 3)] 2023-01-11T21:22:52.7248812Z ok (0.031s) 2023-01-11T21:22:52.7249135Z test_tensor_dot_grad_no_graph_break (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7249400Z unimplemented [] 2023-01-11T21:22:52.7249663Z graph_break [('Tensor.backward', 1)] 2023-01-11T21:22:52.7250049Z stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7250285Z ok (0.014s) 2023-01-11T21:22:52.7250602Z test_tensor_is_contiguous (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7250978Z stats [('calls_captured', 8), ('fusions_possible', 6), ('unique_graphs', 2)] 2023-01-11T21:22:52.7251216Z ok (0.166s) 2023-01-11T21:22:52.7251531Z test_tensor_item_capture (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7251907Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7252140Z ok (0.006s) 2023-01-11T21:22:52.7252464Z test_tensor_item_no_capture (__main__.MiscTests) ... 
frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7252724Z unimplemented [] 2023-01-11T21:22:52.7252963Z graph_break [('Tensor.item', 1)] 2023-01-11T21:22:52.7253295Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7253537Z ok (0.005s) 2023-01-11T21:22:52.7254035Z test_tensor_layout (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7254303Z ok (0.010s) 2023-01-11T21:22:52.7254886Z test_tensor_types (__main__.MiscTests) ... frames [('total', 10), ('ok', 10)] 2023-01-11T21:22:52.7255299Z stats [('calls_captured', 10), ('unique_graphs', 10), ('fusions_possible', 0)] 2023-01-11T21:22:52.7255660Z ok (0.076s) 2023-01-11T21:22:52.7256055Z test_top_package_import (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7256343Z ok (0.005s) 2023-01-11T21:22:52.7256926Z test_torch_cuda_is_available (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7257219Z ok (0.004s) 2023-01-11T21:22:52.7257613Z test_torch_cudnn_is_acceptable (__main__.MiscTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7257913Z ok (0.005s) 2023-01-11T21:22:52.7258168Z test_torch_cudnn_is_acceptable_bad_inputs (__main__.MiscTests) ... ok (0.005s) 2023-01-11T21:22:52.7258573Z test_torch_nn_parameter_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7258960Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7259199Z ok (0.011s) 2023-01-11T21:22:52.7259937Z test_torch_profiler (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91868 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7260980Z test_torch_seed (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91867 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7261973Z test_torch_size (__main__.MiscTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91866 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:22:52.7262574Z test_type_copy (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7262942Z stats [('calls_captured', 6), ('fusions_possible', 4), ('unique_graphs', 2)] 2023-01-11T21:22:52.7263181Z ok (0.013s) 2023-01-11T21:22:52.7263510Z test_typing_variable_isinstance (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7263894Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7264181Z ok (0.005s) 2023-01-11T21:22:52.7264549Z test_unpack4 (__main__.MiscTests) ... 
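
test_tensor_item_capture and test_tensor_item_no_capture above differ in whether Dynamo may trace Tensor.item(); by default a data-dependent scalar forces a graph break. A sketch, assuming torch._dynamo.config.capture_scalar_outputs is the switch those tests flip:

    import torch
    import torch._dynamo

    def f(x):
        return x + x.sum().item()  # graph break by default, as in the log above

    torch.compile(f, backend="eager")(torch.randn(4))

    torch._dynamo.config.capture_scalar_outputs = True  # now .item() can be captured
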
stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7264827Z ok (0.015s) 2023-01-11T21:22:52.7265177Z test_unpack5 (__main__.MiscTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:22:52.7265451Z ok (0.015s) 2023-01-11T21:22:52.7265802Z test_update_locals_and_stack_uses_shared_cache (__main__.MiscTests) ... frames [('total', 2), ('ok', 2)] 2023-01-11T21:22:52.7266076Z inline_call [] 2023-01-11T21:22:52.7266263Z unimplemented [] 2023-01-11T21:22:52.7266601Z graph_break [('call_method ListVariable() extend [ListIteratorVariable()] {}', 1)] 2023-01-11T21:22:52.7266867Z ok (0.007s) 2023-01-11T21:22:52.7267245Z test_user_defined_class_name (__main__.MiscTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7267537Z ok (0.008s) 2023-01-11T21:22:52.7267812Z test_user_function_variable_supports_enum_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7268067Z ok (0.004s) 2023-01-11T21:22:52.7268343Z test_user_function_variable_supports_function_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7268734Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7268973Z ok (0.005s) 2023-01-11T21:22:52.7269253Z test_user_function_variable_supports_type_abcmeta_argument (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7269650Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:22:52.7269899Z ok (0.005s) 2023-01-11T21:22:52.7270283Z test_user_getattr1 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7270528Z inline_call [] 2023-01-11T21:22:52.7270841Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7271078Z ok (0.006s) 2023-01-11T21:22:52.7271391Z test_user_getattr2 (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7271643Z inline_call [] 2023-01-11T21:22:52.7271948Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7272189Z ok (0.007s) 2023-01-11T21:22:52.7272493Z test_user_property (__main__.MiscTests) ... frames [('total', 1), ('ok', 1)] 2023-01-11T21:22:52.7272734Z inline_call [] 2023-01-11T21:22:52.7273044Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7273288Z ok (0.005s) 2023-01-11T21:22:52.7273521Z test_usr_cls_classmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7273888Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7274132Z ok (0.007s) 2023-01-11T21:22:52.7274376Z test_usr_cls_staticmethod (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7274732Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:22:52.7274978Z ok (0.007s) 2023-01-11T21:22:52.7275205Z test_version_ci (__main__.MiscTests) ... ok (0.000s) 2023-01-11T21:22:52.7275487Z test_write_to_closures_in_inlining (__main__.MiscTests) ... inline_call [] 2023-01-11T21:22:52.7275863Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:22:52.7276103Z ok (0.008s) 2023-01-11T21:22:52.7276318Z test_jit_save (__main__.TestTracer) ... 
ok (0.039s) 2023-01-11T21:22:52.7276472Z 2023-01-11T21:22:52.7276678Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.7276947Z Ran 161 tests in 4.834s 2023-01-11T21:22:52.7277072Z 2023-01-11T21:22:52.7277182Z OK (skipped=4, expected failures=1) 2023-01-11T21:22:52.7277317Z 2023-01-11T21:22:52.7277411Z Generating XML reports... 2023-01-11T21:22:52.7277818Z Generated XML report: test-reports/python-unittest/dynamo.test_misc/TEST-MiscTests-20230111212247.xml 2023-01-11T21:22:52.7278363Z Generated XML report: test-reports/python-unittest/dynamo.test_misc/TEST-TestTracer-20230111212247.xml 2023-01-11T21:22:52.7278593Z 2023-01-11T21:22:52.7279003Z ##[endgroup] 2023-01-11T21:22:52.7279402Z FINISHED PRINTING LOG FILE of dynamo/test_misc (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_misc_mzyasd93) 2023-01-11T21:22:52.7279631Z 2023-01-11T21:22:52.8191671Z 2023-01-11T21:22:52.8192521Z Expand the folded group to see the log file of dynamo/test_torchxla_integration 2023-01-11T21:22:52.8193555Z ##[group]PRINTING LOG FILE of dynamo/test_torchxla_integration (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_torchxla_integration_xt1a9m96) 2023-01-11T21:22:52.8193901Z 2023-01-11T21:22:52.8194007Z Running tests... 2023-01-11T21:22:52.8194553Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.8195091Z Test results will be stored in test-reports/python-unittest/dynamo.test_torchxla_integration 2023-01-11T21:22:52.8195637Z test_basic (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8196231Z test_inplace_update (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8196838Z test_linear (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8197463Z test_matmul (__main__.TorchXLAReuseGraphTest) ... skip: Skip the tests since torch_xla is not available or XLA devices are not specified (0.001s) 2023-01-11T21:22:52.8197813Z 2023-01-11T21:22:52.8198088Z ---------------------------------------------------------------------- 2023-01-11T21:22:52.8198808Z Ran 4 tests in 0.003s 2023-01-11T21:22:52.8198976Z 2023-01-11T21:22:52.8199086Z OK (skipped=4) 2023-01-11T21:22:52.8199251Z 2023-01-11T21:22:52.8199378Z Generating XML reports... 2023-01-11T21:22:52.8200048Z Generated XML report: test-reports/python-unittest/dynamo.test_torchxla_integration/TEST-TorchXLAReuseGraphTest-20230111212252.xml 2023-01-11T21:22:52.8200423Z 2023-01-11T21:22:52.8200761Z ##[endgroup] 2023-01-11T21:22:52.8201403Z FINISHED PRINTING LOG FILE of dynamo/test_torchxla_integration (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_torchxla_integration_xt1a9m96) 2023-01-11T21:22:52.8201752Z 2023-01-11T21:22:54.6923975Z Ignoring disabled issues: [] 2023-01-11T21:22:54.7169966Z Running inductor/test_torchinductor ... [2023-01-11 21:22:54.716491] 2023-01-11T21:22:54.7172674Z Executing ['/opt/conda/bin/python', '-bb', 'inductor/test_torchinductor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:54.716859] 2023-01-11T21:22:54.7352883Z Ignoring disabled issues: [] 2023-01-11T21:22:54.7601159Z Running test_python_dispatch ... 
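
In the dynamo/test_torchxla_integration group above, every test skipped because torch_xla was unavailable. A sketch of the sort of availability probe behind that skip message (the exact check in the test file may differ):

    try:
        import torch_xla.core.xla_model as xm
        xm.xla_device()      # raises if no XLA device is configured
        HAVE_XLA = True
    except (ImportError, RuntimeError):
        HAVE_XLA = False
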
[2023-01-11 21:22:54.759617] 2023-01-11T21:22:54.7603099Z Executing ['/opt/conda/bin/python', '-bb', 'test_python_dispatch.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:22:54.759963] 2023-01-11T21:22:59.4492670Z 2023-01-11T21:22:59.4493947Z Expand the folded group to see the log file of test_python_dispatch 2023-01-11T21:22:59.4495178Z ##[group]PRINTING LOG FILE of test_python_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_python_dispatch_fh1knau0) 2023-01-11T21:22:59.4495523Z 2023-01-11T21:22:59.4495633Z Running tests... 2023-01-11T21:22:59.4496203Z ---------------------------------------------------------------------- 2023-01-11T21:22:59.4496770Z Test results will be stored in test-reports/python-unittest/test_python_dispatch 2023-01-11T21:22:59.4497246Z test_all_same_mode (__main__.TestPythonDispatch) ... ok (1.070s) 2023-01-11T21:22:59.4497685Z test_autograd_in_attr (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4498105Z test_basic (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4498548Z test_capture_logs_with_torch_dispatch_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4499286Z test_construct_int_tensor (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4499776Z test_custom_autograd (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4500264Z test_deepcopy_non_wrapper_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4500766Z test_deepcopy_wrapper_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4501292Z test_deepcopy_wrapper_subclass_with_clone_returning_different_type (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4501848Z test_detach_appears_twice_when_called_once (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4502341Z test_device_slowpath (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4502749Z test_dim_slowpath (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4503165Z test_dispatch_super_call (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4503638Z test_dispatch_super_call_list_arg (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4504143Z test_dispatch_super_dont_autograd (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4504608Z test_error_using_class_method_on_mode (__main__.TestPythonDispatch) ... ok (0.005s) 2023-01-11T21:22:59.4505069Z test_exception_handling (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4505493Z test_fancy_strides (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4505883Z test_format (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4506282Z test_get_cur_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4506789Z test_get_mode_stack (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4507255Z test_index_put_where_only_index_is_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4507887Z test_invalid_ret (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:447: DeprecationWarning: Please use assertRaisesRegex instead. 2023-01-11T21:22:59.4508409Z self.assertRaisesRegexp( 2023-01-11T21:22:59.4508692Z ok (0.001s) 2023-01-11T21:22:59.4509030Z test_is_contiguous_slow_path (__main__.TestPythonDispatch) ... ok (0.003s) 2023-01-11T21:22:59.4509446Z test_kwarg_only (__main__.TestPythonDispatch) ... 
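
The DeprecationWarning above flags unittest's old assertRaisesRegexp alias; only the spelling changed. For example:

    import unittest

    class Example(unittest.TestCase):
        def test_raises(self):
            with self.assertRaisesRegex(ValueError, "bad"):  # drop the trailing "p"
                raise ValueError("bad value")
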
ok (0.002s) 2023-01-11T21:22:59.4509886Z test_kwarg_only_and_positional_default (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4510324Z test_layout_slow_path (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4510733Z test_like (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4511147Z test_list_ret (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4511598Z test_make_subclass_with_modes (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4512067Z test_make_wrapper_subclass_noalloc (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4512563Z test_make_wrapper_subclass_propagates_metadata (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513036Z test_maybe_tuple_bug (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513466Z test_mode_with_make_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4513954Z test_multiple_ops_subclass (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4514450Z test_nested_push_logging_tensor_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4514900Z test_nesting_same_mode (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4515304Z test_new_ones (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4515721Z test_none_wrapping (__main__.TestPythonDispatch) ... ok (0.008s) 2023-01-11T21:22:59.4516161Z test_notimplemented_mode (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4516584Z test_optional_tensor_list (__main__.TestPythonDispatch) ... woof 2023-01-11T21:22:59.4516907Z ok (0.002s) 2023-01-11T21:22:59.4517294Z test_out (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4517711Z test_produce_real_type (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4518123Z test_set_data (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4518547Z test_shallow_copy_and_detach (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4518972Z test_sizes_slow_path (__main__.TestPythonDispatch) ... ok (0.003s) 2023-01-11T21:22:59.4519392Z test_standard_is_not_subclass (__main__.TestPythonDispatch) ... ok (0.000s) 2023-01-11T21:22:59.4520316Z test_storage (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:469: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:22:59.4521170Z self.assertRaises(RuntimeError, lambda: x.storage()) 2023-01-11T21:22:59.4521488Z ok (0.001s) 2023-01-11T21:22:59.4522334Z test_storage_can_be_converted_to_python_object (__main__.TestPythonDispatch) ... /var/lib/jenkins/workspace/test/test_python_dispatch.py:1197: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:22:59.4523147Z s = torch.Storage() 2023-01-11T21:22:59.4523408Z ok (0.001s) 2023-01-11T21:22:59.4523746Z test_strides_slow_path (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4524280Z test_subclass_autograd_device_check (__main__.TestPythonDispatch) ... 
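
The TypedStorage deprecation above names tensor.untyped_storage() as the replacement accessor. A quick sketch:

    import torch

    t = torch.arange(4.0)
    buf = t.untyped_storage()  # preferred over the deprecated t.storage()
    print(buf.nbytes())        # 16: four float32 elements
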
ok (0.002s) 2023-01-11T21:22:59.4524734Z test_subclass_creation (__main__.TestPythonDispatch) ... ok (0.004s) 2023-01-11T21:22:59.4525179Z test_subclass_priority (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4525638Z test_tolist_numpy_with_torch_dispatch_mode (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4526115Z test_torch_dispatch_mode_basic (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4526588Z test_torch_dispatch_mode_respects_no_dispatch (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4527082Z test_torch_dispatch_mode_subclass_priority (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4527578Z test_torch_dispatch_mode_unrelated_tensors (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528005Z test_version (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528445Z test_with_mode_created_separately (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4528894Z test_with_nested_modes (__main__.TestPythonDispatch) ... ok (0.001s) 2023-01-11T21:22:59.4529339Z test_wrapper_subclass_serializes (__main__.TestPythonDispatch) ... ok (0.002s) 2023-01-11T21:22:59.4529767Z test_basic (__main__.TestPythonDispatcher) ... ok (0.001s) 2023-01-11T21:22:59.4530163Z test_lstsq (__main__.TestPythonDispatcher) ... ok (0.003s) 2023-01-11T21:22:59.4530613Z test_alias_analysis (__main__.TestPythonRegistration) ... ok (0.006s) 2023-01-11T21:22:59.4531057Z test_create_new_library (__main__.TestPythonRegistration) ... ok (0.002s) 2023-01-11T21:22:59.4531517Z test_error_for_unsupported_ns_or_kind (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4531995Z test_error_if_fn_not_callable (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4532483Z test_extend_library_with_dispatch_key_arg (__main__.TestPythonRegistration) ... ok (0.002s) 2023-01-11T21:22:59.4533503Z test_override_aten_ops_with_multiple_libraries (__main__.TestPythonRegistration) ... /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4534459Z operator: aten::mul.Tensor(Tensor self, Tensor other) -> Tensor 2023-01-11T21:22:59.4535067Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4535446Z dispatch key: ZeroTensor 2023-01-11T21:22:59.4535909Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4536543Z new kernel: registered at /dev/null:549 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4537150Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4537536Z ok (0.003s) 2023-01-11T21:22:59.4537901Z test_override_cpu_sum (__main__.TestPythonRegistration) ... ok (0.001s) 2023-01-11T21:22:59.4538806Z test_override_cuda_with_jiterator (__main__.TestPythonRegistration) ... 
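
The torch_dispatch_mode tests above exercise the __torch_dispatch__ protocol. A minimal sketch of a dispatch mode that logs every aten op it intercepts (LoggingMode is an illustrative name, not the class the tests use; inside the handler the mode is popped, so calling func directly does not recurse):

    import torch
    from torch.utils._python_dispatch import TorchDispatchMode

    class LoggingMode(TorchDispatchMode):
        def __torch_dispatch__(self, func, types, args=(), kwargs=None):
            print("dispatch:", func)             # one line per aten op
            return func(*args, **(kwargs or {}))

    with LoggingMode():
        torch.ones(2) + torch.ones(2)
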
/opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4539599Z operator: aten::where.self(Tensor condition, Tensor self, Tensor other) -> Tensor 2023-01-11T21:22:59.4540068Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4540441Z dispatch key: CUDA 2023-01-11T21:22:59.4540878Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/functorch/BatchRulesBinaryOps.cpp:322 2023-01-11T21:22:59.4541481Z new kernel: registered at /dev/null:209 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4542286Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4543062Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4543713Z operator: aten::gelu(Tensor self, *, str approximate="none") -> Tensor 2023-01-11T21:22:59.4544160Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4544538Z dispatch key: CUDA 2023-01-11T21:22:59.4544967Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/functorch/BatchRulesUnaryOps.cpp:82 2023-01-11T21:22:59.4545594Z new kernel: registered at /dev/null:210 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4546165Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4546964Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4547557Z operator: aten::exp(Tensor self) -> Tensor 2023-01-11T21:22:59.4547967Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4548338Z dispatch key: CUDA 2023-01-11T21:22:59.4548794Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4549440Z new kernel: registered at /dev/null:211 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 2023-01-11T21:22:59.4549995Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4550767Z /opt/conda/lib/python3.10/site-packages/torch/library.py:126: UserWarning: Overriding a previously registered kernel for the same operator and the same dispatch key 2023-01-11T21:22:59.4551445Z operator: aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor 2023-01-11T21:22:59.4551921Z registered at /var/lib/jenkins/workspace/build/aten/src/ATen/RegisterSchema.cpp:6 2023-01-11T21:22:59.4552296Z dispatch key: CUDA 2023-01-11T21:22:59.4552858Z previous kernel: registered at /var/lib/jenkins/workspace/aten/src/ATen/LegacyBatchingRegistrations.cpp:1070 2023-01-11T21:22:59.4553503Z new kernel: registered at /dev/null:212 (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/core/dispatch/OperatorEntry.cpp:159.) 
2023-01-11T21:22:59.4554058Z self.m.impl(name, dispatch_key if dispatch_key != "" else "CompositeImplicitAutograd", fn) 2023-01-11T21:22:59.4554408Z ok (1.703s) 2023-01-11T21:22:59.4554565Z 2023-01-11T21:22:59.4554847Z ---------------------------------------------------------------------- 2023-01-11T21:22:59.4555189Z Ran 72 tests in 2.894s 2023-01-11T21:22:59.4555352Z 2023-01-11T21:22:59.4555432Z OK 2023-01-11T21:22:59.4555573Z 2023-01-11T21:22:59.4555716Z Generating XML reports... 2023-01-11T21:22:59.4556448Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatch-20230111212256.xml 2023-01-11T21:22:59.4557424Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatcher-20230111212256.xml 2023-01-11T21:22:59.4558336Z Generated XML report: test-reports/python-unittest/test_python_dispatch/TEST-TestPythonRegistration-20230111212256.xml 2023-01-11T21:22:59.4558691Z 2023-01-11T21:22:59.4559117Z ##[endgroup] 2023-01-11T21:22:59.4559677Z FINISHED PRINTING LOG FILE of test_python_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_python_dispatch_fh1knau0) 2023-01-11T21:22:59.4560008Z 2023-01-11T21:23:01.3731793Z Ignoring disabled issues: [] 2023-01-11T21:23:01.3966722Z Running test_scatter_gather_ops ... [2023-01-11 21:23:01.395949] 2023-01-11T21:23:01.3967364Z Executing ['/opt/conda/bin/python', '-bb', 'test_scatter_gather_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:01.396273] 2023-01-11T21:23:08.6247006Z 2023-01-11T21:23:08.6247471Z Expand the folded group to see the log file of test_scatter_gather_ops 2023-01-11T21:23:08.6248737Z ##[group]PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585) 2023-01-11T21:23:08.6249138Z 2023-01-11T21:23:08.6249235Z Running tests... 2023-01-11T21:23:08.6249676Z ---------------------------------------------------------------------- 2023-01-11T21:23:08.6250088Z Test results will be stored in test-reports/python-unittest/test_scatter_gather_ops 2023-01-11T21:23:08.6250500Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.004s) 2023-01-11T21:23:08.6250928Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6251353Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6251779Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6252168Z test_gather_bool_cuda_bool (__main__.TestScatterGatherCUDA) ... ok (0.001s) 2023-01-11T21:23:08.6252502Z test_gather_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.019s) 2023-01-11T21:23:08.6252826Z test_gather_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.019s) 2023-01-11T21:23:08.6253161Z test_scatter__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s) 2023-01-11T21:23:08.6253498Z test_scatter__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.036s) 2023-01-11T21:23:08.6253908Z test_scatter__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s) 2023-01-11T21:23:08.6254256Z test_scatter__reductions_cuda_float16 (__main__.TestScatterGatherCUDA) ... 
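
The repeated "Overriding a previously registered kernel" warnings above come from torch.library re-registering kernels on dispatch keys that already have one. A minimal sketch of the pattern, assuming a toy kernel (fake_exp) and the CPU key so it runs anywhere, unlike the CUDA jiterator kernels the test installs:

    import torch

    lib = torch.library.Library("aten", "IMPL")

    def fake_exp(x):
        return torch.ones_like(x)  # illustration only, not a real exp

    # Re-registering aten::exp on a key that already has a kernel
    # triggers the same UserWarning seen in the log.
    lib.impl("exp", fake_exp, "CPU")
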
2023-01-11T21:23:01.3731793Z Ignoring disabled issues: []
2023-01-11T21:23:01.3966722Z Running test_scatter_gather_ops ... [2023-01-11 21:23:01.395949]
2023-01-11T21:23:01.3967364Z Executing ['/opt/conda/bin/python', '-bb', 'test_scatter_gather_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:01.396273]
2023-01-11T21:23:08.6247006Z 
2023-01-11T21:23:08.6247471Z Expand the folded group to see the log file of test_scatter_gather_ops
2023-01-11T21:23:08.6248737Z ##[group]PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585)
2023-01-11T21:23:08.6249138Z 
2023-01-11T21:23:08.6249235Z Running tests...
2023-01-11T21:23:08.6249676Z ----------------------------------------------------------------------
2023-01-11T21:23:08.6250088Z Test results will be stored in test-reports/python-unittest/test_scatter_gather_ops
2023-01-11T21:23:08.6250500Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.004s)
2023-01-11T21:23:08.6250928Z test_gather_backward_with_empty_index_tensor_sparse_grad_False_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6251353Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6251779Z test_gather_backward_with_empty_index_tensor_sparse_grad_True_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6252168Z test_gather_bool_cuda_bool (__main__.TestScatterGatherCUDA) ... ok (0.001s)
2023-01-11T21:23:08.6252502Z test_gather_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.019s)
2023-01-11T21:23:08.6252826Z test_gather_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.019s)
2023-01-11T21:23:08.6253161Z test_scatter__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6253498Z test_scatter__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6253908Z test_scatter__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6254256Z test_scatter__reductions_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.450s)
2023-01-11T21:23:08.6254868Z test_scatter__reductions_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.450s)
2023-01-11T21:23:08.6255225Z test_scatter__scalar_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.035s)
2023-01-11T21:23:08.6255692Z test_scatter__scalar_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.035s)
2023-01-11T21:23:08.6256040Z test_scatter__scalar_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6256387Z test_scatter_add__cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6256731Z test_scatter_add__cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.037s)
2023-01-11T21:23:08.6257058Z test_scatter_add__cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.036s)
2023-01-11T21:23:08.6257427Z test_scatter_add_mult_index_base_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.002s)
2023-01-11T21:23:08.6258030Z test_scatter_reduce_amax_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... /var/lib/jenkins/workspace/test/test_scatter_gather_ops.py:107: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorAdvancedIndexing.cpp:1739.)
2023-01-11T21:23:08.6258612Z actual = fn(base.clone(), dim, idx, src, reduce=reduction, include_self=include_self)
2023-01-11T21:23:08.6258862Z ok (0.090s)
2023-01-11T21:23:08.6259130Z test_scatter_reduce_amax_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6259481Z test_scatter_reduce_amax_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6259834Z test_scatter_reduce_amax_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6260170Z test_scatter_reduce_amax_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.069s)
2023-01-11T21:23:08.6260513Z test_scatter_reduce_amax_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6260917Z test_scatter_reduce_amax_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6261255Z test_scatter_reduce_amax_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6261600Z test_scatter_reduce_amax_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.069s)
2023-01-11T21:23:08.6261945Z test_scatter_reduce_amin_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6262301Z test_scatter_reduce_amin_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6262638Z test_scatter_reduce_amin_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6262975Z test_scatter_reduce_amin_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.072s)
2023-01-11T21:23:08.6263323Z test_scatter_reduce_amin_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.070s)
2023-01-11T21:23:08.6263652Z test_scatter_reduce_amin_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6263994Z test_scatter_reduce_amin_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6264337Z test_scatter_reduce_amin_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.070s)
2023-01-11T21:23:08.6264679Z test_scatter_reduce_amin_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.071s)
2023-01-11T21:23:08.6265017Z test_scatter_reduce_mean_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6265364Z test_scatter_reduce_mean_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6265715Z test_scatter_reduce_mean_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6266048Z test_scatter_reduce_mean_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6266391Z test_scatter_reduce_mean_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6266732Z test_scatter_reduce_mean_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6267079Z test_scatter_reduce_mean_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6267410Z test_scatter_reduce_mean_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6267778Z test_scatter_reduce_mean_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6268125Z test_scatter_reduce_prod_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6268475Z test_scatter_reduce_prod_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6268813Z test_scatter_reduce_prod_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6269157Z test_scatter_reduce_prod_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6269503Z test_scatter_reduce_prod_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6269836Z test_scatter_reduce_prod_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270173Z test_scatter_reduce_prod_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270517Z test_scatter_reduce_prod_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6270860Z test_scatter_reduce_prod_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6271191Z test_scatter_reduce_sum_cuda_bfloat16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6271539Z test_scatter_reduce_sum_cuda_complex128 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6271894Z test_scatter_reduce_sum_cuda_complex64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6272236Z test_scatter_reduce_sum_cuda_float16 (__main__.TestScatterGatherCUDA) ... ok (0.061s)
2023-01-11T21:23:08.6272582Z test_scatter_reduce_sum_cuda_float32 (__main__.TestScatterGatherCUDA) ... ok (0.060s)
2023-01-11T21:23:08.6272955Z test_scatter_reduce_sum_cuda_float64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273296Z test_scatter_reduce_sum_cuda_int16 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273633Z test_scatter_reduce_sum_cuda_int32 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6273971Z test_scatter_reduce_sum_cuda_int64 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6274311Z test_scatter_reduce_sum_cuda_int8 (__main__.TestScatterGatherCUDA) ... ok (0.062s)
2023-01-11T21:23:08.6274690Z test_scatter_reduce_sum_cuda_uint8 (__main__.TestScatterGatherCUDA) ... ok (0.063s)
2023-01-11T21:23:08.6274880Z 
2023-01-11T21:23:08.6275095Z ----------------------------------------------------------------------
2023-01-11T21:23:08.6275358Z Ran 66 tests in 4.376s
2023-01-11T21:23:08.6275482Z 
2023-01-11T21:23:08.6275549Z OK
2023-01-11T21:23:08.6275643Z 
2023-01-11T21:23:08.6275740Z Generating XML reports...
2023-01-11T21:23:08.6276208Z Generated XML report: test-reports/python-unittest/test_scatter_gather_ops/TEST-TestScatterGatherCUDA-20230111212303.xml
2023-01-11T21:23:08.6276472Z 
2023-01-11T21:23:08.6294189Z ##[endgroup]
2023-01-11T21:23:08.6294798Z FINISHED PRINTING LOG FILE of test_scatter_gather_ops (/var/lib/jenkins/workspace/test/test-reports/test_scatter_gather_ops_339th585)
2023-01-11T21:23:08.6295062Z 
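The UserWarning in the group above flags `scatter_reduce()` as a beta API. A minimal sketch of the call that the TestScatterGatherCUDA reduction tests drive, with hypothetical toy values (editor's illustration, not part of the log):

```python
import torch

base = torch.zeros(3)
idx = torch.tensor([0, 0, 1, 2])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Reduce src into base at the slots named by idx along dim 0.
# include_self=False leaves base's initial values out of the reduction,
# matching the include_self argument in the warning's source line.
out = base.scatter_reduce(0, idx, src, reduce="amax", include_self=False)
print(out)  # tensor([2., 3., 4.])
```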
2023-01-11T21:23:10.5351943Z Ignoring disabled issues: []
2023-01-11T21:23:10.5584696Z Running test_testing ... [2023-01-11 21:23:10.558055]
2023-01-11T21:23:10.5587071Z Executing ['/opt/conda/bin/python', '-bb', 'test_testing.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:10.558396]
2023-01-11T21:23:47.4350023Z 
2023-01-11T21:23:47.4350541Z Expand the folded group to see the log file of test_testing
2023-01-11T21:23:47.4351592Z ##[group]PRINTING LOG FILE of test_testing (/var/lib/jenkins/workspace/test/test-reports/test_testing_qs5jmj7o)
2023-01-11T21:23:47.4352018Z 
2023-01-11T21:23:47.4352150Z Running tests...
2023-01-11T21:23:47.4352771Z ----------------------------------------------------------------------
2023-01-11T21:23:47.4357891Z Test results will be stored in test-reports/python-unittest/test_testing
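The TestAssertClose* cases that follow exercise `torch.testing.assert_close`. A minimal sketch of the behaviors they probe — tolerances, NaN handling, and dtype checks — with hypothetical values (editor's illustration, not part of the log):

```python
import torch

actual = torch.tensor([1.0, float("nan")])
expected = torch.tensor([1.0 + 1e-7, float("nan")])

# Passes: values agree within the default (rtol, atol) for float32, and
# NaNs compare equal when equal_nan=True is requested.
torch.testing.assert_close(actual, expected, equal_nan=True)

# Raises AssertionError (as in the mismatch-message tests below) because
# the dtypes differ and check_dtype defaults to True.
try:
    torch.testing.assert_close(torch.ones(2), torch.ones(2, dtype=torch.float64))
except AssertionError as e:
    print(type(e).__name__)  # AssertionError
```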
2023-01-11T21:23:47.4358590Z test_bool (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4359070Z test_default_tolerance_selection_mismatching_dtypes (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4359531Z test_docstring_examples (__main__.TestAssertClose) ... ok (0.007s)
2023-01-11T21:23:47.4359887Z test_matching (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4360965Z test_matching_atol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4361471Z test_matching_conjugate_bit (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4362296Z test_matching_nan (__main__.TestAssertClose) ... ok (0.003s)
2023-01-11T21:23:47.4362949Z test_matching_nan_with_equal_nan (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4363410Z test_matching_rtol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4363833Z test_meta (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4364294Z test_mismatching_dtype (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4364769Z test_mismatching_dtype_no_check (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4365840Z test_mismatching_layout (__main__.TestAssertClose) ... /var/lib/jenkins/workspace/test/test_testing.py:619: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/SparseCsrTensorImpl.cpp:56.)
2023-01-11T21:23:47.4366733Z sparse_csr = strided.to_sparse_csr()
2023-01-11T21:23:47.4367014Z ok (0.002s)
2023-01-11T21:23:47.4367292Z test_mismatching_layout_no_check (__main__.TestAssertClose) ... ok (0.004s)
2023-01-11T21:23:47.4367782Z test_mismatching_shape (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4368091Z test_mismatching_stride (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4376898Z test_mismatching_stride_no_check (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4377351Z test_mismatching_types (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4377782Z test_mismatching_types_subclasses (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4378239Z test_mismatching_types_type_equality (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4378639Z test_mismatching_values (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4379044Z test_mismatching_values_atol (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4379451Z test_mismatching_values_rtol (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4379807Z test_none (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4380154Z test_none_mismatch (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4380514Z test_numpy (__main__.TestAssertClose) ... ok (0.002s)
2023-01-11T21:23:47.4380867Z test_only_atol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4381204Z test_only_rtol (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4381550Z test_scalar (__main__.TestAssertClose) ... ok (0.003s)
2023-01-11T21:23:47.4381920Z test_unexpected_error_compare (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4382329Z test_unexpected_error_originate (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4382703Z test_unknown_layout (__main__.TestAssertClose) ... ok (0.001s)
2023-01-11T21:23:47.4383069Z test_unknown_type (__main__.TestAssertClose) ... ok (0.008s)
2023-01-11T21:23:47.4383655Z test_mapping_mismatching_keys (__main__.TestAssertCloseContainer) ... ok (0.000s)
2023-01-11T21:23:47.4384133Z test_mapping_mismatching_values_msg (__main__.TestAssertCloseContainer) ... ok (0.001s)
2023-01-11T21:23:47.4384550Z test_sequence_mismatching_len (__main__.TestAssertCloseContainer) ... ok (0.000s)
2023-01-11T21:23:47.4384903Z test_sequence_mismatching_values_msg (__main__.TestAssertCloseContainer) ... ok (0.001s)
2023-01-11T21:23:47.4385449Z test_abs_diff (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4385777Z test_abs_diff_scalar (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4386104Z test_atol (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4386435Z test_identifier_scalars (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4386777Z test_identifier_tensor_likes (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4387129Z test_mismatched_elements (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4387464Z test_msg_callable (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4387789Z test_msg_str (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4388101Z test_not_close (__main__.TestAssertCloseErrorMessage) ... ok (0.007s)
2023-01-11T21:23:47.4388419Z test_not_equal (__main__.TestAssertCloseErrorMessage) ... ok (0.002s)
2023-01-11T21:23:47.4388742Z test_rel_diff (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4389067Z test_rel_diff_scalar (__main__.TestAssertCloseErrorMessage) ... ok (0.001s)
2023-01-11T21:23:47.4389378Z test_rtol (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4389758Z test_zero_div_zero (__main__.TestAssertCloseErrorMessage) ... ok (0.003s)
2023-01-11T21:23:47.4390214Z test_mismatching_device_cuda (__main__.TestAssertCloseMultiDeviceCUDA) ... ok (0.002s)
2023-01-11T21:23:47.4390692Z test_mismatching_device_no_check_cuda (__main__.TestAssertCloseMultiDeviceCUDA) ... ok (0.003s)
2023-01-11T21:23:47.4391154Z test_matching_per_channel (__main__.TestAssertCloseQuantized) ... 
ok (0.002s) 2023-01-11T21:23:47.4391673Z test_matching_per_tensor (__main__.TestAssertCloseQuantized) ... ok (0.002s) 2023-01-11T21:23:47.4392122Z test_mismatching_is_quantized (__main__.TestAssertCloseQuantized) ... ok (0.001s) 2023-01-11T21:23:47.4392548Z test_mismatching_qscheme (__main__.TestAssertCloseQuantized) ... ok (0.001s) 2023-01-11T21:23:47.4392944Z test_matching (__main__.TestAssertCloseSparseBSC) ... ok (0.002s) 2023-01-11T21:23:47.4393281Z test_mismatching_ccol_indices_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4393627Z test_mismatching_row_indices_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4393980Z test_mismatching_values_msg (__main__.TestAssertCloseSparseBSC) ... ok (0.003s) 2023-01-11T21:23:47.4394312Z test_matching (__main__.TestAssertCloseSparseBSR) ... ok (0.002s) 2023-01-11T21:23:47.4394643Z test_mismatching_col_indices_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4394990Z test_mismatching_crow_indices_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4395331Z test_mismatching_values_msg (__main__.TestAssertCloseSparseBSR) ... ok (0.003s) 2023-01-11T21:23:47.4395668Z test_matching_coalesced (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4396001Z test_matching_uncoalesced (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4396349Z test_mismatching_indices_msg (__main__.TestAssertCloseSparseCOO) ... ok (0.003s) 2023-01-11T21:23:47.4396682Z test_mismatching_nnz (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4397015Z test_mismatching_sparse_dims (__main__.TestAssertCloseSparseCOO) ... ok (0.001s) 2023-01-11T21:23:47.4397347Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCOO) ... ok (0.003s) 2023-01-11T21:23:47.4397674Z test_matching (__main__.TestAssertCloseSparseCSC) ... ok (0.001s) 2023-01-11T21:23:47.4398014Z test_mismatching_ccol_indices_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4398355Z test_mismatching_row_indices_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4398697Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCSC) ... ok (0.003s) 2023-01-11T21:23:47.4399130Z test_hybrid_support (__main__.TestAssertCloseSparseCSR) ... expected failure (0.006s) 2023-01-11T21:23:47.4399462Z test_matching (__main__.TestAssertCloseSparseCSR) ... ok (0.002s) 2023-01-11T21:23:47.4399788Z test_mismatching_col_indices_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400137Z test_mismatching_crow_indices_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400482Z test_mismatching_values_msg (__main__.TestAssertCloseSparseCSR) ... ok (0.003s) 2023-01-11T21:23:47.4400798Z test_filtering_env_var (__main__.TestFrameworkUtils) ... ok (11.351s) 2023-01-11T21:23:47.4401088Z test_circular_dependencies (__main__.TestImports) 2023-01-11T21:23:47.4401726Z Checks that all modules inside torch can be imported ... 2023-01-11 21:23:39,302 - torch.distributed.nn.jit.instantiator - INFO - Created a temporary directory at /tmp/tmpec57be6h 2023-01-11T21:23:47.4402324Z 2023-01-11 21:23:39,302 - torch.distributed.nn.jit.instantiator - INFO - Writing /tmp/tmpec57be6h/_remote_module_non_scriptable.py 2023-01-11T21:23:47.4402632Z ok (15.258s) 2023-01-11T21:23:47.4402910Z test_no_mutate_global_logging_on_import_path_functorch (__main__.TestImports) ... 
ok (1.382s) 2023-01-11T21:23:47.4403256Z test_no_mutate_global_logging_on_import_path_torch (__main__.TestImports) ... ok (1.367s) 2023-01-11T21:23:47.4403676Z test_no_warning_on_import (__main__.TestImports) ... /var/lib/jenkins/workspace/test/test_testing.py:1836: DeprecationWarning: Please use assertEqual instead. 2023-01-11T21:23:47.4404027Z self.assertEquals(out, "") 2023-01-11T21:23:47.4404234Z ok (1.384s) 2023-01-11T21:23:47.4404572Z test_opinfo_error_generators_T_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4404956Z test_opinfo_error_generators___radd___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4405354Z test_opinfo_error_generators___rand___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4405744Z test_opinfo_error_generators___rdiv___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4406124Z test_opinfo_error_generators___rmod___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4406499Z test_opinfo_error_generators___rmul___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4406891Z test_opinfo_error_generators___ror___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4407280Z test_opinfo_error_generators___rpow___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4407663Z test_opinfo_error_generators___rsub___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408046Z test_opinfo_error_generators___rxor___cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408427Z test_opinfo_error_generators_add_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4408813Z test_opinfo_error_generators_amax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4409188Z test_opinfo_error_generators_amin_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4409579Z test_opinfo_error_generators_aminmax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4409976Z test_opinfo_error_generators_arange_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4410385Z test_opinfo_error_generators_as_strided_scatter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4410780Z test_opinfo_error_generators_atan2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4411174Z test_opinfo_error_generators_bernoulli_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4411607Z test_opinfo_error_generators_bitwise_and_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412008Z test_opinfo_error_generators_bitwise_left_shift_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412411Z test_opinfo_error_generators_bitwise_or_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4412828Z test_opinfo_error_generators_bitwise_right_shift_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4413238Z test_opinfo_error_generators_bitwise_xor_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4413622Z test_opinfo_error_generators_cat_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4414143Z test_opinfo_error_generators_clamp_max_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4414768Z test_opinfo_error_generators_clamp_min_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415171Z test_opinfo_error_generators_complex_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415552Z test_opinfo_error_generators_copysign_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4415960Z test_opinfo_error_generators_cov_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4416365Z test_opinfo_error_generators_diag_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4416743Z test_opinfo_error_generators_diag_embed_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4417149Z test_opinfo_error_generators_diagonal_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4417627Z test_opinfo_error_generators_diagonal_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418032Z test_opinfo_error_generators_div_floor_rounding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418440Z test_opinfo_error_generators_div_no_rounding_mode_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4418856Z test_opinfo_error_generators_div_trunc_rounding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4419256Z test_opinfo_error_generators_dsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4419648Z test_opinfo_error_generators_dstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420024Z test_opinfo_error_generators_eq_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420410Z test_opinfo_error_generators_eye_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4420799Z test_opinfo_error_generators_fft_fft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421182Z test_opinfo_error_generators_fft_fft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421572Z test_opinfo_error_generators_fft_fftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4421966Z test_opinfo_error_generators_fft_hfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4422358Z test_opinfo_error_generators_fft_hfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4422739Z test_opinfo_error_generators_fft_hfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423128Z test_opinfo_error_generators_fft_ifft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423525Z test_opinfo_error_generators_fft_ifft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4423914Z test_opinfo_error_generators_fft_ifftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4424339Z test_opinfo_error_generators_fft_ihfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4424734Z test_opinfo_error_generators_fft_ihfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4425122Z test_opinfo_error_generators_fft_ihfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4425518Z test_opinfo_error_generators_fft_irfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4425900Z test_opinfo_error_generators_fft_irfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4426285Z test_opinfo_error_generators_fft_irfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4426672Z test_opinfo_error_generators_fft_rfft2_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427057Z test_opinfo_error_generators_fft_rfft_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427452Z test_opinfo_error_generators_fft_rfftn_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4427841Z test_opinfo_error_generators_fliplr_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4428229Z test_opinfo_error_generators_flipud_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4428614Z test_opinfo_error_generators_float_power_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429017Z test_opinfo_error_generators_floor_divide_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429411Z test_opinfo_error_generators_fmax_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4429828Z test_opinfo_error_generators_fmin_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430204Z test_opinfo_error_generators_fmod_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430598Z test_opinfo_error_generators_gather_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4430990Z test_opinfo_error_generators_gcd_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4431362Z test_opinfo_error_generators_ge_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4431749Z test_opinfo_error_generators_gradient_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432135Z test_opinfo_error_generators_gt_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432523Z test_opinfo_error_generators_heaviside_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4432910Z test_opinfo_error_generators_hsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4433299Z test_opinfo_error_generators_hstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4433687Z test_opinfo_error_generators_hypot_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434074Z test_opinfo_error_generators_igamma_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434452Z test_opinfo_error_generators_igammac_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4434853Z test_opinfo_error_generators_index_select_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4435248Z test_opinfo_error_generators_isclose_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4435639Z test_opinfo_error_generators_jiterator_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436064Z test_opinfo_error_generators_jiterator_binary_return_by_ref_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436507Z test_opinfo_error_generators_kthvalue_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4436903Z test_opinfo_error_generators_lcm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4437279Z test_opinfo_error_generators_ldexp_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4437662Z test_opinfo_error_generators_le_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438053Z test_opinfo_error_generators_linalg_cross_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438457Z test_opinfo_error_generators_linalg_lstsq_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4438866Z test_opinfo_error_generators_linalg_lstsq_grad_oriented_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4439286Z test_opinfo_error_generators_linspace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4439689Z test_opinfo_error_generators_logcumsumexp_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440088Z test_opinfo_error_generators_logical_and_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440481Z test_opinfo_error_generators_logical_or_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4440874Z test_opinfo_error_generators_logical_xor_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4441276Z test_opinfo_error_generators_logspace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4441654Z test_opinfo_error_generators_lt_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442091Z test_opinfo_error_generators_masked_fill_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442491Z test_opinfo_error_generators_masked_select_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4442893Z test_opinfo_error_generators_max_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4443280Z test_opinfo_error_generators_maximum_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4443664Z test_opinfo_error_generators_mean_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444052Z test_opinfo_error_generators_median_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444442Z test_opinfo_error_generators_min_binary_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4444824Z test_opinfo_error_generators_minimum_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445219Z test_opinfo_error_generators_movedim_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445608Z test_opinfo_error_generators_mul_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4445994Z test_opinfo_error_generators_multinomial_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4446396Z test_opinfo_error_generators_narrow_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4446792Z test_opinfo_error_generators_narrow_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447197Z test_opinfo_error_generators_native_layer_norm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447585Z test_opinfo_error_generators_ne_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4447970Z test_opinfo_error_generators_neg_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4448357Z test_opinfo_error_generators_nextafter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4448807Z test_opinfo_error_generators_nn_functional_avg_pool1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4449230Z test_opinfo_error_generators_nn_functional_avg_pool2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4449657Z test_opinfo_error_generators_nn_functional_avg_pool3d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450078Z test_opinfo_error_generators_nn_functional_conv1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450489Z test_opinfo_error_generators_nn_functional_conv2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4450904Z test_opinfo_error_generators_nn_functional_embedding_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4451340Z test_opinfo_error_generators_nn_functional_gaussian_nll_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4451768Z test_opinfo_error_generators_nn_functional_gelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4452178Z test_opinfo_error_generators_nn_functional_group_norm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4452612Z test_opinfo_error_generators_nn_functional_hinge_embedding_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4453046Z test_opinfo_error_generators_nn_functional_huber_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4453464Z test_opinfo_error_generators_nn_functional_l1_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4454005Z test_opinfo_error_generators_nn_functional_margin_ranking_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4454651Z test_opinfo_error_generators_nn_functional_max_pool1d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455075Z test_opinfo_error_generators_nn_functional_max_pool2d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455514Z test_opinfo_error_generators_nn_functional_max_pool3d_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4455947Z test_opinfo_error_generators_nn_functional_poisson_nll_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4456374Z test_opinfo_error_generators_nn_functional_prelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4456780Z test_opinfo_error_generators_nn_functional_rrelu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4457204Z test_opinfo_error_generators_nn_functional_soft_margin_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4457620Z test_opinfo_error_generators_nn_functional_softshrink_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458048Z test_opinfo_error_generators_nn_functional_triplet_margin_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458499Z test_opinfo_error_generators_nn_functional_triplet_margin_with_distance_loss_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4458918Z test_opinfo_error_generators_ormqr_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4459295Z test_opinfo_error_generators_polar_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4459678Z test_opinfo_error_generators_pow_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460066Z test_opinfo_error_generators_remainder_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460457Z test_opinfo_error_generators_renorm_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4460839Z test_opinfo_error_generators_reshape_as_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4461308Z test_opinfo_error_generators_reshape_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4461693Z test_opinfo_error_generators_roll_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462066Z test_opinfo_error_generators_rot90_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462445Z test_opinfo_error_generators_rsub_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4462832Z test_opinfo_error_generators_scatter_add_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4463227Z test_opinfo_error_generators_scatter_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4463629Z test_opinfo_error_generators_signal_windows_bartlett_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464050Z test_opinfo_error_generators_signal_windows_blackman_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464467Z test_opinfo_error_generators_signal_windows_cosine_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4464887Z test_opinfo_error_generators_signal_windows_exponential_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4465296Z test_opinfo_error_generators_signal_windows_gaussian_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4465716Z test_opinfo_error_generators_signal_windows_general_cosine_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4466145Z test_opinfo_error_generators_signal_windows_general_hamming_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4466672Z test_opinfo_error_generators_signal_windows_hamming_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4467086Z test_opinfo_error_generators_signal_windows_hann_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4467503Z test_opinfo_error_generators_signal_windows_kaiser_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4467924Z test_opinfo_error_generators_signal_windows_nuttall_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4468352Z test_opinfo_error_generators_special_chebyshev_polynomial_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4468782Z test_opinfo_error_generators_special_chebyshev_polynomial_u_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4469499Z test_opinfo_error_generators_special_chebyshev_polynomial_v_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4470170Z test_opinfo_error_generators_special_chebyshev_polynomial_w_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4470664Z test_opinfo_error_generators_special_hermite_polynomial_h_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4471094Z test_opinfo_error_generators_special_hermite_polynomial_he_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4471534Z test_opinfo_error_generators_special_laguerre_polynomial_l_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4472144Z test_opinfo_error_generators_special_legendre_polynomial_p_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4472820Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4473531Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_u_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4474206Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_v_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4474872Z test_opinfo_error_generators_special_shifted_chebyshev_polynomial_w_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4475352Z test_opinfo_error_generators_special_xlog1py_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4475786Z test_opinfo_error_generators_special_zeta_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476200Z test_opinfo_error_generators_sub_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476597Z test_opinfo_error_generators_sum_to_size_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4476986Z test_opinfo_error_generators_t_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4477360Z test_opinfo_error_generators_take_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4477750Z test_opinfo_error_generators_trace_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4478135Z test_opinfo_error_generators_tril_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4478509Z test_opinfo_error_generators_triu_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4478992Z test_opinfo_error_generators_true_divide_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4479387Z test_opinfo_error_generators_unbind_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4479782Z test_opinfo_error_generators_uniform_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480175Z test_opinfo_error_generators_view_as_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480560Z test_opinfo_error_generators_view_copy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4480949Z test_opinfo_error_generators_view_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4481329Z test_opinfo_error_generators_vsplit_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4481710Z test_opinfo_error_generators_vstack_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482106Z test_opinfo_error_generators_where_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482495Z test_opinfo_error_generators_xlogy_cuda (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4482902Z test_opinfo_reference_generators___radd___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4483309Z test_opinfo_reference_generators___rand___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4483715Z test_opinfo_reference_generators___rdiv___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484131Z test_opinfo_reference_generators___rmod___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484537Z test_opinfo_reference_generators___rmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4484946Z test_opinfo_reference_generators___ror___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4485350Z test_opinfo_reference_generators___rpow___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4485812Z test_opinfo_reference_generators___rsub___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4486223Z test_opinfo_reference_generators___rxor___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4486620Z test_opinfo_reference_generators_abs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487030Z test_opinfo_reference_generators_acos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487438Z test_opinfo_reference_generators_acosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4487841Z test_opinfo_reference_generators_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4488259Z test_opinfo_reference_generators_addcdiv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.001s) 2023-01-11T21:23:47.4488673Z test_opinfo_reference_generators_addcmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489087Z test_opinfo_reference_generators_angle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489486Z test_opinfo_reference_generators_asin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4489897Z test_opinfo_reference_generators_asinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4490299Z test_opinfo_reference_generators_atan2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4490708Z test_opinfo_reference_generators_atan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4491144Z test_opinfo_reference_generators_atanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4491560Z test_opinfo_reference_generators_bfloat16_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4491977Z test_opinfo_reference_generators_bitwise_and_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4492405Z test_opinfo_reference_generators_bitwise_left_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4492820Z test_opinfo_reference_generators_bitwise_not_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4493231Z test_opinfo_reference_generators_bitwise_or_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4493655Z test_opinfo_reference_generators_bitwise_right_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4494215Z test_opinfo_reference_generators_bitwise_xor_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4494883Z test_opinfo_reference_generators_bool_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4495309Z test_opinfo_reference_generators_broadcast_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4495733Z test_opinfo_reference_generators_bucketize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496140Z test_opinfo_reference_generators_byte_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496534Z test_opinfo_reference_generators_cat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4496942Z test_opinfo_reference_generators_cdouble_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4497349Z test_opinfo_reference_generators_ceil_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4497746Z test_opinfo_reference_generators_cfloat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4498238Z test_opinfo_reference_generators_chalf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4498642Z test_opinfo_reference_generators_char_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4499046Z test_opinfo_reference_generators_chunk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4499440Z test_opinfo_reference_generators_clamp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4499847Z test_opinfo_reference_generators_clamp_max_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4500266Z test_opinfo_reference_generators_clamp_min_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4500669Z test_opinfo_reference_generators_clone_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4501068Z test_opinfo_reference_generators_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4501473Z test_opinfo_reference_generators_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4501883Z test_opinfo_reference_generators_conj_physical_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4502310Z test_opinfo_reference_generators_contiguous_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4502720Z test_opinfo_reference_generators_copysign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4503127Z test_opinfo_reference_generators_cos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4503583Z test_opinfo_reference_generators_cosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4504000Z test_opinfo_reference_generators_deg2rad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4504413Z test_opinfo_reference_generators_diag_embed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4504838Z test_opinfo_reference_generators_diagonal_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4505259Z test_opinfo_reference_generators_diagonal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4505666Z test_opinfo_reference_generators_digamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4506141Z test_opinfo_reference_generators_div_floor_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4506585Z test_opinfo_reference_generators_div_no_rounding_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4507022Z test_opinfo_reference_generators_div_trunc_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4507437Z test_opinfo_reference_generators_double_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4507855Z test_opinfo_reference_generators_empty_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4508268Z test_opinfo_reference_generators_eq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4508674Z test_opinfo_reference_generators_erf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4509075Z test_opinfo_reference_generators_erfc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4509492Z test_opinfo_reference_generators_erfinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4509900Z test_opinfo_reference_generators_exp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4510339Z test_opinfo_reference_generators_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4510739Z test_opinfo_reference_generators_expm1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511152Z test_opinfo_reference_generators_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511563Z test_opinfo_reference_generators_flatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4511977Z test_opinfo_reference_generators_float_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4512387Z test_opinfo_reference_generators_float_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4512800Z test_opinfo_reference_generators_floor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4513216Z test_opinfo_reference_generators_floor_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4513632Z test_opinfo_reference_generators_fmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514026Z test_opinfo_reference_generators_fmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514430Z test_opinfo_reference_generators_fmod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4514837Z test_opinfo_reference_generators_frac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4515270Z test_opinfo_reference_generators_frexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4515675Z test_opinfo_reference_generators_gcd_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4516088Z test_opinfo_reference_generators_ge_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4516489Z test_opinfo_reference_generators_gt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4516883Z test_opinfo_reference_generators_half_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4517297Z test_opinfo_reference_generators_heaviside_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4517716Z test_opinfo_reference_generators_hypot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4518124Z test_opinfo_reference_generators_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4518521Z test_opinfo_reference_generators_igamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4518935Z test_opinfo_reference_generators_igammac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4519352Z test_opinfo_reference_generators_imag_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4519772Z test_opinfo_reference_generators_index_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s)
2023-01-11T21:23:47.4520183Z test_opinfo_reference_generators_index_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4520603Z test_opinfo_reference_generators_index_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521024Z test_opinfo_reference_generators_index_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521436Z test_opinfo_reference_generators_int_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4521846Z test_opinfo_reference_generators_isclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4522290Z test_opinfo_reference_generators_isfinite_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4522705Z test_opinfo_reference_generators_isinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523099Z test_opinfo_reference_generators_isnan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523510Z test_opinfo_reference_generators_isneginf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4523919Z test_opinfo_reference_generators_isposinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4524338Z test_opinfo_reference_generators_isreal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4524750Z test_opinfo_reference_generators_jiterator_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4525198Z test_opinfo_reference_generators_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4525639Z test_opinfo_reference_generators_jiterator_unary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4526057Z test_opinfo_reference_generators_lcm_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4526457Z test_opinfo_reference_generators_ldexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4526864Z test_opinfo_reference_generators_le_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4527306Z test_opinfo_reference_generators_lgamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4527713Z test_opinfo_reference_generators_log10_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528111Z test_opinfo_reference_generators_log1p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528521Z test_opinfo_reference_generators_log2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4528929Z test_opinfo_reference_generators_log_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4529345Z test_opinfo_reference_generators_logical_and_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4529761Z test_opinfo_reference_generators_logical_not_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4530183Z test_opinfo_reference_generators_logical_or_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4530602Z test_opinfo_reference_generators_logical_xor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531011Z test_opinfo_reference_generators_logit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531418Z test_opinfo_reference_generators_long_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4531822Z test_opinfo_reference_generators_lt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4532233Z test_opinfo_reference_generators_max_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4532640Z test_opinfo_reference_generators_maximum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4533059Z test_opinfo_reference_generators_min_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4533476Z test_opinfo_reference_generators_minimum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4534068Z test_opinfo_reference_generators_movedim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4534619Z test_opinfo_reference_generators_mul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535045Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535493Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4535971Z test_opinfo_reference_generators_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4536394Z test_opinfo_reference_generators_nan_to_num_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4536818Z test_opinfo_reference_generators_narrow_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4537237Z test_opinfo_reference_generators_narrow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4537645Z test_opinfo_reference_generators_ne_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4538044Z test_opinfo_reference_generators_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4538457Z test_opinfo_reference_generators_nextafter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4538883Z test_opinfo_reference_generators_nn_functional_celu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4539318Z test_opinfo_reference_generators_nn_functional_elu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4539851Z test_opinfo_reference_generators_nn_functional_group_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4540301Z test_opinfo_reference_generators_nn_functional_hardshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4540754Z test_opinfo_reference_generators_nn_functional_hardsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4541209Z test_opinfo_reference_generators_nn_functional_hardtanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4541658Z test_opinfo_reference_generators_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4542116Z test_opinfo_reference_generators_nn_functional_logsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4542577Z test_opinfo_reference_generators_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543024Z test_opinfo_reference_generators_nn_functional_mish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543459Z test_opinfo_reference_generators_nn_functional_prelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4543895Z test_opinfo_reference_generators_nn_functional_relu6_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4544323Z test_opinfo_reference_generators_nn_functional_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4544756Z test_opinfo_reference_generators_nn_functional_rrelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4545176Z test_opinfo_reference_generators_nn_functional_selu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4545614Z test_opinfo_reference_generators_nn_functional_silu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4546138Z test_opinfo_reference_generators_nn_functional_softplus_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4546573Z test_opinfo_reference_generators_nn_functional_softshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547013Z test_opinfo_reference_generators_nn_functional_softsign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547452Z test_opinfo_reference_generators_nn_functional_tanhshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4547894Z test_opinfo_reference_generators_nn_functional_threshold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4548318Z test_opinfo_reference_generators_permute_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4548729Z test_opinfo_reference_generators_polar_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4549158Z test_opinfo_reference_generators_polygamma_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4549601Z test_opinfo_reference_generators_polygamma_polygamma_n_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550028Z test_opinfo_reference_generators_polygamma_polygamma_n_2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550465Z test_opinfo_reference_generators_polygamma_polygamma_n_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4550891Z test_opinfo_reference_generators_polygamma_polygamma_n_4_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4551355Z test_opinfo_reference_generators_positive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4551761Z test_opinfo_reference_generators_pow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552172Z test_opinfo_reference_generators_rad2deg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552583Z test_opinfo_reference_generators_real_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4552996Z test_opinfo_reference_generators_reciprocal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4553408Z test_opinfo_reference_generators_remainder_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4553830Z test_opinfo_reference_generators_reshape_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4554254Z test_opinfo_reference_generators_reshape_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4554663Z test_opinfo_reference_generators_round_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4555078Z test_opinfo_reference_generators_round_decimals_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4555509Z test_opinfo_reference_generators_round_decimals_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4555940Z test_opinfo_reference_generators_round_decimals_neg_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4556358Z test_opinfo_reference_generators_rsqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4556752Z test_opinfo_reference_generators_rsub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4557164Z test_opinfo_reference_generators_sgn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4557567Z test_opinfo_reference_generators_short_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4558006Z test_opinfo_reference_generators_sigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4558407Z test_opinfo_reference_generators_sign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4558836Z test_opinfo_reference_generators_signal_windows_bartlett_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4559275Z test_opinfo_reference_generators_signal_windows_blackman_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4559709Z test_opinfo_reference_generators_signal_windows_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4560159Z test_opinfo_reference_generators_signal_windows_exponential_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4560609Z test_opinfo_reference_generators_signal_windows_gaussian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4561060Z test_opinfo_reference_generators_signal_windows_general_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4561513Z test_opinfo_reference_generators_signal_windows_general_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4561958Z test_opinfo_reference_generators_signal_windows_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4562394Z test_opinfo_reference_generators_signal_windows_hann_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4562875Z test_opinfo_reference_generators_signal_windows_kaiser_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4563305Z test_opinfo_reference_generators_signal_windows_nuttall_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4563736Z test_opinfo_reference_generators_signbit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4564145Z test_opinfo_reference_generators_sin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4564554Z test_opinfo_reference_generators_sinc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4564952Z test_opinfo_reference_generators_sinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4565373Z test_opinfo_reference_generators_special_airy_ai_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4565810Z test_opinfo_reference_generators_special_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4566240Z test_opinfo_reference_generators_special_bessel_j1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4566660Z test_opinfo_reference_generators_special_bessel_y0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567083Z test_opinfo_reference_generators_special_bessel_y1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567531Z test_opinfo_reference_generators_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4567994Z test_opinfo_reference_generators_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4568727Z test_opinfo_reference_generators_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4569419Z test_opinfo_reference_generators_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4569980Z test_opinfo_reference_generators_special_entr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4570407Z test_opinfo_reference_generators_special_erfcx_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4570843Z test_opinfo_reference_generators_special_hermite_polynomial_h_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4571304Z test_opinfo_reference_generators_special_hermite_polynomial_he_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4571744Z test_opinfo_reference_generators_special_i0e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4572175Z test_opinfo_reference_generators_special_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4572591Z test_opinfo_reference_generators_special_i1e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4573029Z test_opinfo_reference_generators_special_laguerre_polynomial_l_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4574819Z test_opinfo_reference_generators_special_legendre_polynomial_p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4575321Z test_opinfo_reference_generators_special_log_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4575764Z test_opinfo_reference_generators_special_modified_bessel_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4576293Z test_opinfo_reference_generators_special_modified_bessel_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4576746Z test_opinfo_reference_generators_special_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4577198Z test_opinfo_reference_generators_special_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4577633Z test_opinfo_reference_generators_special_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578052Z test_opinfo_reference_generators_special_ndtri_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578505Z test_opinfo_reference_generators_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4578981Z test_opinfo_reference_generators_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4579443Z test_opinfo_reference_generators_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4580084Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4580790Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4581495Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4582204Z test_opinfo_reference_generators_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s)
2023-01-11T21:23:47.4582758Z test_opinfo_reference_generators_special_spherical_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
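[Editor's note] The "skip:" entries above for the Chebyshev/Legendre polynomial ops are not failures: they are deliberately disabled pending issue #79528 because their inputs take unreasonably long to test. In unittest terms this kind of gate is just a skip decorator whose reason string the runner echoes after "skip:". The sketch below is a minimal, hypothetical illustration of that pattern (class and test names are illustrative, not the actual PyTorch source; PyTorch's own runner also appends the elapsed time):

    import unittest

    # Reason string mirrors the log lines above; verbosity=2 prints one
    # "test_name (module.Class) ... <result>" line per test, as in this log.
    REASON = "Skipping - testing takes an unreasonably long time, #79528"

    class TestOpInfoSampleFunctionsExample(unittest.TestCase):
        @unittest.skip(REASON)
        def test_special_chebyshev_polynomial_v(self):
            pass  # never executes; reported as a skip, not a failure

        def test_special_entr(self):
            pass  # executes normally and is reported as "ok"

    if __name__ == "__main__":
        unittest.main(verbosity=2)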
2023-01-11T21:23:47.4583197Z test_opinfo_reference_generators_special_xlog1py_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4583670Z test_opinfo_reference_generators_special_zeta_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584083Z test_opinfo_reference_generators_sqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584493Z test_opinfo_reference_generators_square_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4584893Z test_opinfo_reference_generators_sub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4585303Z test_opinfo_reference_generators_tan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4585704Z test_opinfo_reference_generators_tanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586113Z test_opinfo_reference_generators_true_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586524Z test_opinfo_reference_generators_trunc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4586932Z test_opinfo_reference_generators_view_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4587340Z test_opinfo_reference_generators_view_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4587742Z test_opinfo_reference_generators_where_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588182Z test_opinfo_reference_generators_xlogy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588584Z test_opinfo_sample_generators_H_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4588987Z test_opinfo_sample_generators_T_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4589384Z test_opinfo_sample_generators___getitem___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4589793Z test_opinfo_sample_generators___radd___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4590198Z test_opinfo_sample_generators___rand___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4590598Z test_opinfo_sample_generators___rdiv___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591007Z test_opinfo_sample_generators___rmatmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591408Z test_opinfo_sample_generators___rmod___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4591810Z test_opinfo_sample_generators___rmul___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592208Z test_opinfo_sample_generators___ror___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592597Z test_opinfo_sample_generators___rpow___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4592991Z test_opinfo_sample_generators___rsub___cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4593394Z test_opinfo_sample_generators___rxor___cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4593807Z test_opinfo_sample_generators__native_batch_norm_legit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4594251Z test_opinfo_sample_generators__softmax_backward_data_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4594700Z test_opinfo_sample_generators_abs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595105Z test_opinfo_sample_generators_acos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595498Z test_opinfo_sample_generators_acosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4595901Z test_opinfo_sample_generators_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4596305Z test_opinfo_sample_generators_addbmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4596714Z test_opinfo_sample_generators_addcdiv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597118Z test_opinfo_sample_generators_addcmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597523Z test_opinfo_sample_generators_addmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4597942Z test_opinfo_sample_generators_addmm_decomposed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4598363Z test_opinfo_sample_generators_addmv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4598759Z test_opinfo_sample_generators_addr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4599159Z test_opinfo_sample_generators_all_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4599563Z test_opinfo_sample_generators_allclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600003Z test_opinfo_sample_generators_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600398Z test_opinfo_sample_generators_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4600802Z test_opinfo_sample_generators_aminmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601205Z test_opinfo_sample_generators_angle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601597Z test_opinfo_sample_generators_any_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4601998Z test_opinfo_sample_generators_arange_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4602401Z test_opinfo_sample_generators_argmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4602801Z test_opinfo_sample_generators_argmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4603196Z test_opinfo_sample_generators_argsort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4603607Z test_opinfo_sample_generators_argwhere_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604024Z test_opinfo_sample_generators_as_strided_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604453Z test_opinfo_sample_generators_as_strided_partial_views_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4604881Z test_opinfo_sample_generators_as_strided_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4605299Z test_opinfo_sample_generators_asin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4605698Z test_opinfo_sample_generators_asinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606105Z test_opinfo_sample_generators_atan2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606491Z test_opinfo_sample_generators_atan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4606922Z test_opinfo_sample_generators_atanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4607334Z test_opinfo_sample_generators_atleast_1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4607740Z test_opinfo_sample_generators_atleast_2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608145Z test_opinfo_sample_generators_atleast_3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608545Z test_opinfo_sample_generators_baddbmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4608959Z test_opinfo_sample_generators_bernoulli_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4609362Z test_opinfo_sample_generators_bfloat16_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4609773Z test_opinfo_sample_generators_bincount_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4610178Z test_opinfo_sample_generators_bitwise_and_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4610594Z test_opinfo_sample_generators_bitwise_left_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611001Z test_opinfo_sample_generators_bitwise_not_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611408Z test_opinfo_sample_generators_bitwise_or_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4611824Z test_opinfo_sample_generators_bitwise_right_shift_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4612272Z test_opinfo_sample_generators_bitwise_xor_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4612676Z test_opinfo_sample_generators_block_diag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4613081Z test_opinfo_sample_generators_bmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4613479Z test_opinfo_sample_generators_bool_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4614014Z test_opinfo_sample_generators_broadcast_shapes_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4614656Z test_opinfo_sample_generators_broadcast_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615092Z test_opinfo_sample_generators_broadcast_to_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615558Z test_opinfo_sample_generators_bucketize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4615951Z test_opinfo_sample_generators_byte_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4616361Z test_opinfo_sample_generators_cartesian_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4616771Z test_opinfo_sample_generators_cat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617167Z test_opinfo_sample_generators_cdist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617555Z test_opinfo_sample_generators_cdouble_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4617951Z test_opinfo_sample_generators_ceil_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4618351Z test_opinfo_sample_generators_cfloat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4618748Z test_opinfo_sample_generators_chalf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4619211Z test_opinfo_sample_generators_char_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4619612Z test_opinfo_sample_generators_cholesky_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620022Z test_opinfo_sample_generators_cholesky_inverse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620440Z test_opinfo_sample_generators_cholesky_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4620836Z test_opinfo_sample_generators_chunk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4621233Z test_opinfo_sample_generators_clamp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4621631Z test_opinfo_sample_generators_clamp_max_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622036Z test_opinfo_sample_generators_clamp_min_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622431Z test_opinfo_sample_generators_clone_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4622835Z test_opinfo_sample_generators_column_stack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4623249Z test_opinfo_sample_generators_combinations_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4623649Z test_opinfo_sample_generators_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624046Z test_opinfo_sample_generators_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624514Z test_opinfo_sample_generators_conj_physical_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4624938Z test_opinfo_sample_generators_constant_pad_nd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4625350Z test_opinfo_sample_generators_contiguous_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4625767Z test_opinfo_sample_generators_copysign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4626224Z test_opinfo_sample_generators_corrcoef_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4626627Z test_opinfo_sample_generators_cos_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627018Z test_opinfo_sample_generators_cosh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627434Z test_opinfo_sample_generators_count_nonzero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4627839Z test_opinfo_sample_generators_cov_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4628242Z test_opinfo_sample_generators_cross_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4628642Z test_opinfo_sample_generators_cummax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629048Z test_opinfo_sample_generators_cummin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629454Z test_opinfo_sample_generators_cumprod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4629855Z test_opinfo_sample_generators_cumsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4630271Z test_opinfo_sample_generators_cumulative_trapezoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4630693Z test_opinfo_sample_generators_deg2rad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631138Z test_opinfo_sample_generators_diag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631540Z test_opinfo_sample_generators_diag_embed_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4631949Z test_opinfo_sample_generators_diagflat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4632362Z test_opinfo_sample_generators_diagonal_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4632775Z test_opinfo_sample_generators_diagonal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4633181Z test_opinfo_sample_generators_diagonal_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4633594Z test_opinfo_sample_generators_diff_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634001Z test_opinfo_sample_generators_digamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634409Z test_opinfo_sample_generators_dist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4634814Z test_opinfo_sample_generators_div_floor_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4635240Z test_opinfo_sample_generators_div_no_rounding_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4635671Z test_opinfo_sample_generators_div_trunc_rounding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636124Z test_opinfo_sample_generators_dot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636563Z test_opinfo_sample_generators_double_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4636963Z test_opinfo_sample_generators_dsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4637365Z test_opinfo_sample_generators_dstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4637750Z test_opinfo_sample_generators_einsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638145Z test_opinfo_sample_generators_empty_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638561Z test_opinfo_sample_generators_empty_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4638965Z test_opinfo_sample_generators_eq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4639360Z test_opinfo_sample_generators_equal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4639759Z test_opinfo_sample_generators_erf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640158Z test_opinfo_sample_generators_erfc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640558Z test_opinfo_sample_generators_erfinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4640953Z test_opinfo_sample_generators_exp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4641349Z test_opinfo_sample_generators_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4641751Z test_opinfo_sample_generators_expand_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642151Z test_opinfo_sample_generators_expand_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642550Z test_opinfo_sample_generators_expm1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4642946Z test_opinfo_sample_generators_eye_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4643378Z test_opinfo_sample_generators_fft_fft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4643774Z test_opinfo_sample_generators_fft_fft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4644180Z test_opinfo_sample_generators_fft_fftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4644592Z test_opinfo_sample_generators_fft_fftshift_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645005Z test_opinfo_sample_generators_fft_hfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645405Z test_opinfo_sample_generators_fft_hfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4645804Z test_opinfo_sample_generators_fft_hfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4646207Z test_opinfo_sample_generators_fft_ifft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4646613Z test_opinfo_sample_generators_fft_ifft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647002Z test_opinfo_sample_generators_fft_ifftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647414Z test_opinfo_sample_generators_fft_ifftshift_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4647830Z test_opinfo_sample_generators_fft_ihfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4648270Z test_opinfo_sample_generators_fft_ihfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4648666Z test_opinfo_sample_generators_fft_ihfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649070Z test_opinfo_sample_generators_fft_irfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649479Z test_opinfo_sample_generators_fft_irfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4649874Z test_opinfo_sample_generators_fft_irfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4650276Z test_opinfo_sample_generators_fft_rfft2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4650676Z test_opinfo_sample_generators_fft_rfft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651075Z test_opinfo_sample_generators_fft_rfftn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651471Z test_opinfo_sample_generators_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4651874Z test_opinfo_sample_generators_flatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4652276Z test_opinfo_sample_generators_flip_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4652677Z test_opinfo_sample_generators_fliplr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653072Z test_opinfo_sample_generators_flipud_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653475Z test_opinfo_sample_generators_float_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4653987Z test_opinfo_sample_generators_float_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4654436Z test_opinfo_sample_generators_floor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655076Z test_opinfo_sample_generators_floor_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655581Z test_opinfo_sample_generators_fmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4655976Z test_opinfo_sample_generators_fmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4656365Z test_opinfo_sample_generators_fmod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4656746Z test_opinfo_sample_generators_frac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657141Z test_opinfo_sample_generators_frexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657537Z test_opinfo_sample_generators_full_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4657923Z test_opinfo_sample_generators_full_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4658330Z test_opinfo_sample_generators_gather_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4658726Z test_opinfo_sample_generators_gcd_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659118Z test_opinfo_sample_generators_ge_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659504Z test_opinfo_sample_generators_geqrf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4659901Z test_opinfo_sample_generators_gradient_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4660313Z test_opinfo_sample_generators_grid_sampler_2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4660768Z test_opinfo_sample_generators_gt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661158Z test_opinfo_sample_generators_half_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661566Z test_opinfo_sample_generators_heaviside_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4661969Z test_opinfo_sample_generators_histc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4662370Z test_opinfo_sample_generators_hsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4662766Z test_opinfo_sample_generators_hstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663166Z test_opinfo_sample_generators_hypot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663564Z test_opinfo_sample_generators_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4663955Z test_opinfo_sample_generators_igamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4664362Z test_opinfo_sample_generators_igammac_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4664769Z test_opinfo_sample_generators_imag_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665175Z test_opinfo_sample_generators_index_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665578Z test_opinfo_sample_generators_index_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4665982Z test_opinfo_sample_generators_index_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4666389Z test_opinfo_sample_generators_index_put_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4666808Z test_opinfo_sample_generators_index_reduce_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4667315Z test_opinfo_sample_generators_index_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4667722Z test_opinfo_sample_generators_inner_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4668126Z test_opinfo_sample_generators_int_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4668531Z test_opinfo_sample_generators_isclose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4668929Z test_opinfo_sample_generators_isfinite_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4669328Z test_opinfo_sample_generators_isin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4669728Z test_opinfo_sample_generators_isinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670116Z test_opinfo_sample_generators_isnan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670518Z test_opinfo_sample_generators_isneginf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4670922Z test_opinfo_sample_generators_isposinf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4671330Z test_opinfo_sample_generators_isreal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4671734Z test_opinfo_sample_generators_istft_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4672162Z test_opinfo_sample_generators_jiterator_2inputs_2outputs_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4672644Z test_opinfo_sample_generators_jiterator_4inputs_with_extra_args_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673080Z test_opinfo_sample_generators_jiterator_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673510Z test_opinfo_sample_generators_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4673945Z test_opinfo_sample_generators_jiterator_unary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4674352Z test_opinfo_sample_generators_kron_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4674755Z test_opinfo_sample_generators_kthvalue_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675153Z test_opinfo_sample_generators_lcm_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675597Z test_opinfo_sample_generators_ldexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4675995Z test_opinfo_sample_generators_le_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4676398Z test_opinfo_sample_generators_lerp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4676792Z test_opinfo_sample_generators_lgamma_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4677209Z test_opinfo_sample_generators_linalg_cholesky_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4677633Z test_opinfo_sample_generators_linalg_cholesky_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678054Z test_opinfo_sample_generators_linalg_cond_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678464Z test_opinfo_sample_generators_linalg_cross_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4678879Z test_opinfo_sample_generators_linalg_det_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4679329Z test_opinfo_sample_generators_linalg_det_singular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4679744Z test_opinfo_sample_generators_linalg_eig_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680158Z test_opinfo_sample_generators_linalg_eigh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680576Z test_opinfo_sample_generators_linalg_eigvals_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4680999Z test_opinfo_sample_generators_linalg_eigvalsh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4681433Z test_opinfo_sample_generators_linalg_householder_product_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4681858Z test_opinfo_sample_generators_linalg_inv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4682273Z test_opinfo_sample_generators_linalg_inv_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4682692Z test_opinfo_sample_generators_linalg_ldl_factor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683111Z test_opinfo_sample_generators_linalg_ldl_factor_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683533Z test_opinfo_sample_generators_linalg_ldl_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4683949Z test_opinfo_sample_generators_linalg_lstsq_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4684415Z test_opinfo_sample_generators_linalg_lstsq_grad_oriented_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4684836Z test_opinfo_sample_generators_linalg_lu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4685255Z test_opinfo_sample_generators_linalg_lu_factor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4685681Z test_opinfo_sample_generators_linalg_lu_factor_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686105Z test_opinfo_sample_generators_linalg_lu_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686515Z test_opinfo_sample_generators_linalg_matrix_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4686940Z test_opinfo_sample_generators_linalg_matrix_power_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4687362Z test_opinfo_sample_generators_linalg_matrix_rank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4687792Z test_opinfo_sample_generators_linalg_matrix_rank_hermitian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4688223Z test_opinfo_sample_generators_linalg_multi_dot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4688642Z test_opinfo_sample_generators_linalg_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689075Z test_opinfo_sample_generators_linalg_norm_subgradients_at_zero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689511Z test_opinfo_sample_generators_linalg_pinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4689930Z test_opinfo_sample_generators_linalg_pinv_hermitian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4690425Z test_opinfo_sample_generators_linalg_pinv_singular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:23:47.4690928Z test_opinfo_sample_generators_linalg_qr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4691353Z test_opinfo_sample_generators_linalg_slogdet_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4691762Z test_opinfo_sample_generators_linalg_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4692178Z test_opinfo_sample_generators_linalg_solve_ex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4692602Z test_opinfo_sample_generators_linalg_solve_triangular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693029Z test_opinfo_sample_generators_linalg_svd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693437Z test_opinfo_sample_generators_linalg_svdvals_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4693998Z test_opinfo_sample_generators_linalg_tensorinv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4694459Z test_opinfo_sample_generators_linalg_tensorsolve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695129Z test_opinfo_sample_generators_linalg_vander_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695545Z test_opinfo_sample_generators_linalg_vecdot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4695957Z test_opinfo_sample_generators_linalg_vector_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4696457Z test_opinfo_sample_generators_linspace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
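[Editor's note] The linalg_pinv_singular skip above is the standard slow-test gate: PyTorch marks such tests with the slowTest decorator from torch.testing._internal.common_utils, and they only run when PYTORCH_TEST_WITH_SLOW=1 is set in the environment. Below is a minimal sketch of that mechanism, assuming only the env-var convention visible in the log (the helper and test names are illustrative re-implementations, not the PyTorch source):

    import os
    import unittest

    # Slow tests run only when explicitly requested via the environment,
    # matching the skip reason printed in the log above.
    TEST_WITH_SLOW = os.getenv("PYTORCH_TEST_WITH_SLOW", "0") == "1"

    def slow_test(fn):
        # Skip unless slow tests were opted into with PYTORCH_TEST_WITH_SLOW=1.
        return unittest.skipUnless(
            TEST_WITH_SLOW,
            "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test",
        )(fn)

    class ExampleSlowTests(unittest.TestCase):
        @slow_test
        def test_linalg_pinv_singular_like(self):
            pass  # executes only under PYTORCH_TEST_WITH_SLOW=1

To re-run such a test locally with slow tests enabled, something along the lines of: PYTORCH_TEST_WITH_SLOW=1 python <test_file.py> -k linalg_pinv_singular (where <test_file.py> stands for whichever test module produced this log).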
2023-01-11T21:23:47.4696853Z test_opinfo_sample_generators_log10_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4697262Z test_opinfo_sample_generators_log1p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4697661Z test_opinfo_sample_generators_log2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698060Z test_opinfo_sample_generators_log_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698459Z test_opinfo_sample_generators_log_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4698878Z test_opinfo_sample_generators_log_softmax_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4699301Z test_opinfo_sample_generators_logaddexp2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4699718Z test_opinfo_sample_generators_logaddexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700128Z test_opinfo_sample_generators_logcumsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700545Z test_opinfo_sample_generators_logdet_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4700952Z test_opinfo_sample_generators_logical_and_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4701359Z test_opinfo_sample_generators_logical_not_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4701762Z test_opinfo_sample_generators_logical_or_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702167Z test_opinfo_sample_generators_logical_xor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702576Z test_opinfo_sample_generators_logit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4702986Z test_opinfo_sample_generators_logspace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4703452Z test_opinfo_sample_generators_logsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4703866Z test_opinfo_sample_generators_long_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4704261Z test_opinfo_sample_generators_lt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4704649Z test_opinfo_sample_generators_lu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705051Z test_opinfo_sample_generators_lu_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705459Z test_opinfo_sample_generators_lu_unpack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4705862Z test_opinfo_sample_generators_mH_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4706250Z test_opinfo_sample_generators_mT_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4706655Z test_opinfo_sample_generators_masked_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707057Z test_opinfo_sample_generators_masked_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707474Z test_opinfo_sample_generators_masked_argmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4707885Z test_opinfo_sample_generators_masked_argmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4708302Z test_opinfo_sample_generators_masked_cumprod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4708755Z test_opinfo_sample_generators_masked_cumsum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4709168Z test_opinfo_sample_generators_masked_fill_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4709579Z test_opinfo_sample_generators_masked_log_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710006Z test_opinfo_sample_generators_masked_logaddexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710430Z test_opinfo_sample_generators_masked_logsumexp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4710848Z test_opinfo_sample_generators_masked_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4711252Z test_opinfo_sample_generators_masked_median_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4711666Z test_opinfo_sample_generators_masked_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712077Z test_opinfo_sample_generators_masked_normalize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712488Z test_opinfo_sample_generators_masked_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4712905Z test_opinfo_sample_generators_masked_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4713320Z test_opinfo_sample_generators_masked_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4713731Z test_opinfo_sample_generators_masked_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714142Z test_opinfo_sample_generators_masked_softmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714556Z test_opinfo_sample_generators_masked_std_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4714986Z test_opinfo_sample_generators_masked_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4715395Z test_opinfo_sample_generators_masked_var_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s)
2023-01-11T21:23:47.4715797Z test_opinfo_sample_generators_matmul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4716200Z test_opinfo_sample_generators_matrix_exp_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4716599Z test_opinfo_sample_generators_max_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717035Z test_opinfo_sample_generators_max_pool2d_with_indices_backward_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717475Z test_opinfo_sample_generators_max_reduction_no_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4717906Z test_opinfo_sample_generators_max_reduction_with_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4718323Z test_opinfo_sample_generators_maximum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4718725Z test_opinfo_sample_generators_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719119Z test_opinfo_sample_generators_median_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719536Z test_opinfo_sample_generators_meshgrid_list_of_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4719973Z test_opinfo_sample_generators_meshgrid_variadic_tensors_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4720430Z test_opinfo_sample_generators_min_binary_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4720843Z test_opinfo_sample_generators_min_reduction_no_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4721273Z test_opinfo_sample_generators_min_reduction_with_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4721686Z test_opinfo_sample_generators_minimum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722078Z test_opinfo_sample_generators_mm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722473Z test_opinfo_sample_generators_mode_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4722871Z test_opinfo_sample_generators_movedim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4723274Z test_opinfo_sample_generators_msort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4723665Z test_opinfo_sample_generators_mul_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724068Z test_opinfo_sample_generators_multinomial_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724467Z test_opinfo_sample_generators_mv_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4724882Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4725308Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4725739Z test_opinfo_sample_generators_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726157Z test_opinfo_sample_generators_nan_to_num_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726565Z test_opinfo_sample_generators_nanmean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4726997Z test_opinfo_sample_generators_nanmedian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4727411Z test_opinfo_sample_generators_nanquantile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4727815Z test_opinfo_sample_generators_nansum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4728224Z test_opinfo_sample_generators_narrow_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4728624Z test_opinfo_sample_generators_narrow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729041Z test_opinfo_sample_generators_native_batch_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729476Z test_opinfo_sample_generators_native_dropout_backward_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4729913Z test_opinfo_sample_generators_native_layer_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4730318Z test_opinfo_sample_generators_ne_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4730715Z test_opinfo_sample_generators_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731121Z test_opinfo_sample_generators_new_empty_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731527Z test_opinfo_sample_generators_new_empty_strided_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4731981Z test_opinfo_sample_generators_new_full_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4732382Z test_opinfo_sample_generators_new_ones_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4732788Z test_opinfo_sample_generators_new_zeros_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4733184Z test_opinfo_sample_generators_nextafter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4733624Z test_opinfo_sample_generators_nn_functional__scaled_dot_product_attention_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4734227Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4734895Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4735339Z test_opinfo_sample_generators_nn_functional_adaptive_avg_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4735787Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4736232Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4736672Z test_opinfo_sample_generators_nn_functional_adaptive_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s)
2023-01-11T21:23:47.4737108Z test_opinfo_sample_generators_nn_functional_alpha_dropout_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ...
ok (0.000s) 2023-01-11T21:23:47.4737544Z test_opinfo_sample_generators_nn_functional_avg_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4737973Z test_opinfo_sample_generators_nn_functional_avg_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4738399Z test_opinfo_sample_generators_nn_functional_avg_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4738915Z test_opinfo_sample_generators_nn_functional_batch_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4739364Z test_opinfo_sample_generators_nn_functional_batch_norm_without_cudnn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4739810Z test_opinfo_sample_generators_nn_functional_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4740248Z test_opinfo_sample_generators_nn_functional_binary_cross_entropy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4740706Z test_opinfo_sample_generators_nn_functional_binary_cross_entropy_with_logits_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4741154Z test_opinfo_sample_generators_nn_functional_celu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4741579Z test_opinfo_sample_generators_nn_functional_conv1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742003Z test_opinfo_sample_generators_nn_functional_conv2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742426Z test_opinfo_sample_generators_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4742870Z test_opinfo_sample_generators_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4743316Z test_opinfo_sample_generators_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4743818Z test_opinfo_sample_generators_nn_functional_cosine_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4744266Z test_opinfo_sample_generators_nn_functional_cosine_similarity_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4744715Z test_opinfo_sample_generators_nn_functional_cross_entropy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4745152Z test_opinfo_sample_generators_nn_functional_ctc_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4745582Z test_opinfo_sample_generators_nn_functional_dropout2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746010Z test_opinfo_sample_generators_nn_functional_dropout3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746439Z test_opinfo_sample_generators_nn_functional_dropout_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4746872Z test_opinfo_sample_generators_nn_functional_elu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4747312Z test_opinfo_sample_generators_nn_functional_embedding_bag_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4747744Z test_opinfo_sample_generators_nn_functional_embedding_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4748203Z test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_with_train_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4748683Z test_opinfo_sample_generators_nn_functional_feature_alpha_dropout_without_train_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4749151Z test_opinfo_sample_generators_nn_functional_fractional_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4749606Z test_opinfo_sample_generators_nn_functional_fractional_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750090Z test_opinfo_sample_generators_nn_functional_gaussian_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750526Z test_opinfo_sample_generators_nn_functional_gelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4750954Z test_opinfo_sample_generators_nn_functional_glu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4751379Z test_opinfo_sample_generators_nn_functional_grid_sample_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4751816Z test_opinfo_sample_generators_nn_functional_group_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4752250Z test_opinfo_sample_generators_nn_functional_hardshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4752688Z test_opinfo_sample_generators_nn_functional_hardsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4753125Z test_opinfo_sample_generators_nn_functional_hardswish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4753557Z test_opinfo_sample_generators_nn_functional_hardtanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754005Z test_opinfo_sample_generators_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754446Z test_opinfo_sample_generators_nn_functional_huber_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4754879Z test_opinfo_sample_generators_nn_functional_instance_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4755361Z test_opinfo_sample_generators_nn_functional_interpolate_area_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4755854Z test_opinfo_sample_generators_nn_functional_interpolate_bicubic_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4756322Z test_opinfo_sample_generators_nn_functional_interpolate_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4756770Z test_opinfo_sample_generators_nn_functional_interpolate_linear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4757222Z test_opinfo_sample_generators_nn_functional_interpolate_nearest_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4757673Z test_opinfo_sample_generators_nn_functional_interpolate_trilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758120Z test_opinfo_sample_generators_nn_functional_kl_div_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758538Z test_opinfo_sample_generators_nn_functional_l1_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4758968Z test_opinfo_sample_generators_nn_functional_layer_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4759395Z test_opinfo_sample_generators_nn_functional_leaky_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4759822Z test_opinfo_sample_generators_nn_functional_linear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4760256Z test_opinfo_sample_generators_nn_functional_local_response_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4760698Z test_opinfo_sample_generators_nn_functional_logsigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4761144Z test_opinfo_sample_generators_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4761611Z test_opinfo_sample_generators_nn_functional_max_pool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4762029Z test_opinfo_sample_generators_nn_functional_max_pool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4762452Z test_opinfo_sample_generators_nn_functional_max_pool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4762885Z test_opinfo_sample_generators_nn_functional_max_unpool1d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4763330Z test_opinfo_sample_generators_nn_functional_max_unpool1d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4763771Z test_opinfo_sample_generators_nn_functional_max_unpool2d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4764216Z test_opinfo_sample_generators_nn_functional_max_unpool2d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4764657Z test_opinfo_sample_generators_nn_functional_max_unpool3d_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4765110Z test_opinfo_sample_generators_nn_functional_max_unpool3d_grad_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4765577Z test_opinfo_sample_generators_nn_functional_mish_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4766001Z test_opinfo_sample_generators_nn_functional_mse_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4766438Z test_opinfo_sample_generators_nn_functional_multi_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4766922Z test_opinfo_sample_generators_nn_functional_multilabel_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4767381Z test_opinfo_sample_generators_nn_functional_multilabel_soft_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4767829Z test_opinfo_sample_generators_nn_functional_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4768257Z test_opinfo_sample_generators_nn_functional_normalize_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4768684Z test_opinfo_sample_generators_nn_functional_one_hot_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4769110Z test_opinfo_sample_generators_nn_functional_pad_circular_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4769552Z test_opinfo_sample_generators_nn_functional_pad_constant_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4769987Z test_opinfo_sample_generators_nn_functional_pad_reflect_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4770426Z test_opinfo_sample_generators_nn_functional_pad_replicate_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4770864Z test_opinfo_sample_generators_nn_functional_pairwise_distance_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4771301Z test_opinfo_sample_generators_nn_functional_pdist_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4771736Z test_opinfo_sample_generators_nn_functional_pixel_shuffle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4772180Z test_opinfo_sample_generators_nn_functional_pixel_unshuffle_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4772620Z test_opinfo_sample_generators_nn_functional_poisson_nll_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4773145Z test_opinfo_sample_generators_nn_functional_prelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4773573Z test_opinfo_sample_generators_nn_functional_relu6_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4774132Z test_opinfo_sample_generators_nn_functional_relu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4774767Z test_opinfo_sample_generators_nn_functional_rrelu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4775182Z test_opinfo_sample_generators_nn_functional_selu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4775607Z test_opinfo_sample_generators_nn_functional_silu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4776036Z test_opinfo_sample_generators_nn_functional_smooth_l1_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4776471Z test_opinfo_sample_generators_nn_functional_soft_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4776904Z test_opinfo_sample_generators_nn_functional_softmin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4777343Z test_opinfo_sample_generators_nn_functional_softmin_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4777780Z test_opinfo_sample_generators_nn_functional_softplus_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4778203Z test_opinfo_sample_generators_nn_functional_softshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4778726Z test_opinfo_sample_generators_nn_functional_softsign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4779156Z test_opinfo_sample_generators_nn_functional_tanhshrink_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4779588Z test_opinfo_sample_generators_nn_functional_threshold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780026Z test_opinfo_sample_generators_nn_functional_triplet_margin_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780489Z test_opinfo_sample_generators_nn_functional_triplet_margin_with_distance_loss_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4780941Z test_opinfo_sample_generators_nn_functional_unfold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4781386Z test_opinfo_sample_generators_nn_functional_upsample_bilinear_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4781829Z test_opinfo_sample_generators_nn_functional_upsample_nearest_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4782258Z test_opinfo_sample_generators_nonzero_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4782662Z test_opinfo_sample_generators_norm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4783064Z test_opinfo_sample_generators_norm_fro_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4783455Z test_opinfo_sample_generators_norm_inf_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4783845Z test_opinfo_sample_generators_norm_nuc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4784252Z test_opinfo_sample_generators_normal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4784669Z test_opinfo_sample_generators_normal_number_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785115Z test_opinfo_sample_generators_ones_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785522Z test_opinfo_sample_generators_ones_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4785923Z test_opinfo_sample_generators_ormqr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4786321Z test_opinfo_sample_generators_outer_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4786728Z test_opinfo_sample_generators_pca_lowrank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.001s) 2023-01-11T21:23:47.4787142Z test_opinfo_sample_generators_permute_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4787553Z test_opinfo_sample_generators_pinverse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4787951Z test_opinfo_sample_generators_polar_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4788370Z test_opinfo_sample_generators_polygamma_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4788808Z test_opinfo_sample_generators_polygamma_polygamma_n_1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4789243Z test_opinfo_sample_generators_polygamma_polygamma_n_2_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4789664Z test_opinfo_sample_generators_polygamma_polygamma_n_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790124Z test_opinfo_sample_generators_polygamma_polygamma_n_4_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790542Z test_opinfo_sample_generators_positive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4790950Z test_opinfo_sample_generators_pow_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4791347Z test_opinfo_sample_generators_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4791746Z test_opinfo_sample_generators_put_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792142Z test_opinfo_sample_generators_qr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792542Z test_opinfo_sample_generators_quantile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4792941Z test_opinfo_sample_generators_rad2deg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4793352Z test_opinfo_sample_generators_rand_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4793760Z test_opinfo_sample_generators_randint_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794173Z test_opinfo_sample_generators_randint_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794571Z test_opinfo_sample_generators_randn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4794976Z test_opinfo_sample_generators_randn_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4795383Z test_opinfo_sample_generators_ravel_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4795779Z test_opinfo_sample_generators_real_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4796190Z test_opinfo_sample_generators_reciprocal_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4796605Z test_opinfo_sample_generators_remainder_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4797041Z test_opinfo_sample_generators_renorm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4797439Z test_opinfo_sample_generators_repeat_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4797856Z test_opinfo_sample_generators_repeat_interleave_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4798276Z test_opinfo_sample_generators_reshape_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4798681Z test_opinfo_sample_generators_reshape_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799079Z test_opinfo_sample_generators_resize__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799481Z test_opinfo_sample_generators_resize_as__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4799689Z test_opinfo_sample_generators_resolve_conj_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4799886Z test_opinfo_sample_generators_resolve_neg_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800077Z test_opinfo_sample_generators_roll_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800268Z test_opinfo_sample_generators_rot90_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800458Z test_opinfo_sample_generators_round_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800655Z test_opinfo_sample_generators_round_decimals_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4800890Z test_opinfo_sample_generators_round_decimals_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801100Z test_opinfo_sample_generators_round_decimals_neg_3_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801293Z test_opinfo_sample_generators_rsqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801488Z test_opinfo_sample_generators_rsub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801689Z test_opinfo_sample_generators_scalar_tensor_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4801886Z test_opinfo_sample_generators_scatter_add_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802081Z test_opinfo_sample_generators_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802292Z test_opinfo_sample_generators_scatter_reduce_amax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802494Z test_opinfo_sample_generators_scatter_reduce_amin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802694Z test_opinfo_sample_generators_scatter_reduce_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4802893Z test_opinfo_sample_generators_scatter_reduce_prod_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803096Z test_opinfo_sample_generators_scatter_reduce_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803298Z test_opinfo_sample_generators_searchsorted_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4803509Z test_opinfo_sample_generators_segment_reduce_lengths_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803718Z test_opinfo_sample_generators_segment_reduce_offsets_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4803944Z test_opinfo_sample_generators_select_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804149Z test_opinfo_sample_generators_select_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804336Z test_opinfo_sample_generators_sgn_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804521Z test_opinfo_sample_generators_short_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804711Z test_opinfo_sample_generators_sigmoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4804901Z test_opinfo_sample_generators_sign_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805115Z test_opinfo_sample_generators_signal_windows_bartlett_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805368Z test_opinfo_sample_generators_signal_windows_blackman_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805582Z test_opinfo_sample_generators_signal_windows_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4805798Z test_opinfo_sample_generators_signal_windows_exponential_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806006Z test_opinfo_sample_generators_signal_windows_gaussian_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806220Z test_opinfo_sample_generators_signal_windows_general_cosine_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806471Z test_opinfo_sample_generators_signal_windows_general_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806670Z test_opinfo_sample_generators_signal_windows_hamming_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4806875Z test_opinfo_sample_generators_signal_windows_hann_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807080Z test_opinfo_sample_generators_signal_windows_kaiser_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807281Z test_opinfo_sample_generators_signal_windows_nuttall_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807472Z test_opinfo_sample_generators_signbit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807662Z test_opinfo_sample_generators_sin_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4807855Z test_opinfo_sample_generators_sinc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808043Z test_opinfo_sample_generators_sinh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808234Z test_opinfo_sample_generators_slice_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4808425Z test_opinfo_sample_generators_slice_scatter_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808614Z test_opinfo_sample_generators_softmax_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4808817Z test_opinfo_sample_generators_softmax_with_dtype_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809003Z test_opinfo_sample_generators_sort_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809212Z test_opinfo_sample_generators_sparse_sampled_addmm_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809416Z test_opinfo_sample_generators_special_airy_ai_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809673Z test_opinfo_sample_generators_special_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4809873Z test_opinfo_sample_generators_special_bessel_j1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810071Z test_opinfo_sample_generators_special_bessel_y0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810263Z test_opinfo_sample_generators_special_bessel_y1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810474Z test_opinfo_sample_generators_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4810697Z test_opinfo_sample_generators_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4811199Z test_opinfo_sample_generators_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4811580Z test_opinfo_sample_generators_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4811780Z test_opinfo_sample_generators_special_entr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4811980Z test_opinfo_sample_generators_special_erfcx_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812199Z test_opinfo_sample_generators_special_hermite_polynomial_h_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812451Z test_opinfo_sample_generators_special_hermite_polynomial_he_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812648Z test_opinfo_sample_generators_special_i0e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4812847Z test_opinfo_sample_generators_special_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813031Z test_opinfo_sample_generators_special_i1e_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813250Z test_opinfo_sample_generators_special_laguerre_polynomial_l_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4813644Z test_opinfo_sample_generators_special_legendre_polynomial_p_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4814003Z test_opinfo_sample_generators_special_log_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814232Z test_opinfo_sample_generators_special_modified_bessel_i0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814460Z test_opinfo_sample_generators_special_modified_bessel_i1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4814935Z test_opinfo_sample_generators_special_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815137Z test_opinfo_sample_generators_special_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815340Z test_opinfo_sample_generators_special_ndtr_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815570Z test_opinfo_sample_generators_special_ndtri_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4815816Z test_opinfo_sample_generators_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816036Z test_opinfo_sample_generators_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816346Z test_opinfo_sample_generators_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4816761Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817165Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817570Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4817973Z test_opinfo_sample_generators_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.000s) 2023-01-11T21:23:47.4818190Z test_opinfo_sample_generators_special_spherical_bessel_j0_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818391Z test_opinfo_sample_generators_special_xlog1py_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818588Z test_opinfo_sample_generators_special_zeta_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4818779Z test_opinfo_sample_generators_split_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819026Z test_opinfo_sample_generators_split_list_args_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819220Z test_opinfo_sample_generators_split_with_sizes_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819412Z test_opinfo_sample_generators_sqrt_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4819604Z test_opinfo_sample_generators_square_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819797Z test_opinfo_sample_generators_squeeze_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4819986Z test_opinfo_sample_generators_stack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820174Z test_opinfo_sample_generators_std_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820367Z test_opinfo_sample_generators_std_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820571Z test_opinfo_sample_generators_std_mean_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820769Z test_opinfo_sample_generators_std_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4820949Z test_opinfo_sample_generators_stft_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821134Z test_opinfo_sample_generators_sub_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821320Z test_opinfo_sample_generators_sum_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821513Z test_opinfo_sample_generators_sum_to_size_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821696Z test_opinfo_sample_generators_svd_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4821892Z test_opinfo_sample_generators_svd_lowrank_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822082Z test_opinfo_sample_generators_symeig_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822293Z test_opinfo_sample_generators_t_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822497Z test_opinfo_sample_generators_take_along_dim_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822678Z test_opinfo_sample_generators_take_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4822866Z test_opinfo_sample_generators_tan_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823051Z test_opinfo_sample_generators_tanh_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823248Z test_opinfo_sample_generators_tensor_split_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823446Z test_opinfo_sample_generators_tensordot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823633Z test_opinfo_sample_generators_tile_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4823818Z test_opinfo_sample_generators_to_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824012Z test_opinfo_sample_generators_to_sparse_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824197Z test_opinfo_sample_generators_topk_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824378Z test_opinfo_sample_generators_trace_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4824571Z test_opinfo_sample_generators_transpose_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4824861Z test_opinfo_sample_generators_trapezoid_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825049Z test_opinfo_sample_generators_trapz_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825256Z test_opinfo_sample_generators_triangular_solve_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825441Z test_opinfo_sample_generators_tril_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825641Z test_opinfo_sample_generators_tril_indices_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4825855Z test_opinfo_sample_generators_triu_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826064Z test_opinfo_sample_generators_triu_indices_cuda_int64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826264Z test_opinfo_sample_generators_true_divide_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826445Z test_opinfo_sample_generators_trunc_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826639Z test_opinfo_sample_generators_unbind_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4826834Z test_opinfo_sample_generators_unflatten_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827028Z test_opinfo_sample_generators_unfold_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827222Z test_opinfo_sample_generators_unfold_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827413Z test_opinfo_sample_generators_uniform_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827621Z test_opinfo_sample_generators_unique_consecutive_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4827811Z test_opinfo_sample_generators_unique_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828029Z test_opinfo_sample_generators_unsqueeze_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828210Z test_opinfo_sample_generators_var_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828401Z test_opinfo_sample_generators_var_mean_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828602Z test_opinfo_sample_generators_var_mean_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828799Z test_opinfo_sample_generators_var_unbiased_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4828987Z test_opinfo_sample_generators_vdot_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829188Z test_opinfo_sample_generators_view_as_complex_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829384Z test_opinfo_sample_generators_view_as_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... 
ok (0.000s) 2023-01-11T21:23:47.4829582Z test_opinfo_sample_generators_view_as_real_cuda_complex64 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829779Z test_opinfo_sample_generators_view_copy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4829958Z test_opinfo_sample_generators_view_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830146Z test_opinfo_sample_generators_vsplit_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830333Z test_opinfo_sample_generators_vstack_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830552Z test_opinfo_sample_generators_where_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4830740Z test_opinfo_sample_generators_xlogy_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4830929Z test_opinfo_sample_generators_zero__cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831118Z test_opinfo_sample_generators_zeros_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831311Z test_opinfo_sample_generators_zeros_like_cuda_float32 (__main__.TestOpInfoSampleFunctionsCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4831435Z test_sample_input (__main__.TestOpInfos) ... ok (0.001s) 2023-01-11T21:23:47.4831564Z test_sample_input_metadata (__main__.TestOpInfos) ... ok (0.001s) 2023-01-11T21:23:47.4831711Z test_default_names (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4831877Z test_modules_decorator_misuse_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852237Z test_multiple_handling_of_same_param_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852415Z test_name_fn (__main__.TestTestParametrization) ... ok (0.002s) 2023-01-11T21:23:47.4852589Z test_ops_decorator_misuse_error (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4852766Z test_subtest_expected_failure_x_1 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4852944Z test_subtest_expected_failure_x_2 (__main__.TestTestParametrization) ... expected failure (0.000s) 2023-01-11T21:23:47.4853115Z test_subtest_expected_failure_x_3 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4853262Z test_subtest_names (__main__.TestTestParametrization) ... ok (0.001s) 2023-01-11T21:23:47.4853456Z test_two_things_subtest_expected_failure_x_1_y_4 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4853647Z test_two_things_subtest_expected_failure_x_1_y_5 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4853928Z test_two_things_subtest_expected_failure_x_1_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4854208Z test_two_things_subtest_expected_failure_x_2_y_4 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4854395Z test_two_things_subtest_expected_failure_x_2_y_5 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4854804Z test_two_things_subtest_expected_failure_x_2_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4854978Z test_two_things_subtest_expected_failure_x_3_y_4 (__main__.TestTestParametrization) ... 
ok (0.000s) 2023-01-11T21:23:47.4855175Z test_two_things_subtest_expected_failure_x_3_y_5 (__main__.TestTestParametrization) ... ok (0.000s) 2023-01-11T21:23:47.4855392Z test_two_things_subtest_expected_failure_x_3_y_6 (__main__.TestTestParametrization) ... expected failure (0.001s) 2023-01-11T21:23:47.4855583Z test_default_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4855787Z test_dtypes_composition_invalid_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4855987Z test_dtypes_composition_valid_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4856200Z test_multiple_handling_of_same_param_error_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4856377Z test_name_fn_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4856571Z test_ops_composition_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.523s) 2023-01-11T21:23:47.4856765Z test_subtest_expected_failure_x_1_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4857080Z test_subtest_expected_failure_x_2_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4857281Z test_subtest_expected_failure_x_3_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.000s) 2023-01-11T21:23:47.4857467Z test_subtest_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4857696Z test_two_things_subtest_expected_failure_x_1_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4857924Z test_two_things_subtest_expected_failure_x_1_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4858152Z test_two_things_subtest_expected_failure_x_1_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4858363Z test_two_things_subtest_expected_failure_x_2_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4858579Z test_two_things_subtest_expected_failure_x_2_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4858796Z test_two_things_subtest_expected_failure_x_2_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4859012Z test_two_things_subtest_expected_failure_x_3_y_4_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859217Z test_two_things_subtest_expected_failure_x_3_y_5_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859439Z test_two_things_subtest_expected_failure_x_3_y_6_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... expected failure (0.001s) 2023-01-11T21:23:47.4859635Z test_unparametrized_names_cuda (__main__.TestTestParametrizationDeviceTypeCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4859786Z test_assertEqual_longMessage_cuda (__main__.TestTestingCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4860372Z test_assertEqual_numpy_cuda_bool (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,582 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003079584, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 
2023-01-11T21:23:47.4860489Z ok (0.004s) 2023-01-11T21:23:47.4861017Z test_assertEqual_numpy_cuda_complex128 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,585 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002190448, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4861091Z ok (0.003s) 2023-01-11T21:23:47.4861613Z test_assertEqual_numpy_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,589 - torch.cuda._sanitizer - INFO - Found Stream with id: 1009198656, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4861684Z ok (0.003s) 2023-01-11T21:23:47.4862199Z test_assertEqual_numpy_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,592 - torch.cuda._sanitizer - INFO - Found Stream with id: 1001152576, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4862273Z ok (0.003s) 2023-01-11T21:23:47.4862782Z test_assertEqual_numpy_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,596 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003437184, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4862855Z ok (0.004s) 2023-01-11T21:23:47.4863363Z test_assertEqual_numpy_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,599 - torch.cuda._sanitizer - INFO - Found Stream with id: 1001806720, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4863466Z ok (0.003s) 2023-01-11T21:23:47.4863976Z test_assertEqual_numpy_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,603 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003544512, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4864049Z ok (0.003s) 2023-01-11T21:23:47.4864551Z test_assertEqual_numpy_cuda_int32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,606 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004346976, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4864626Z ok (0.003s) 2023-01-11T21:23:47.4865121Z test_assertEqual_numpy_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,609 - torch.cuda._sanitizer - INFO - Found Stream with id: 1007361680, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4865197Z ok (0.003s) 2023-01-11T21:23:47.4865754Z test_assertEqual_numpy_cuda_int8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,612 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002159456, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4865828Z ok (0.003s) 2023-01-11T21:23:47.4866329Z test_assertEqual_numpy_cuda_uint8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,616 - torch.cuda._sanitizer - INFO - Found Stream with id: 113169504, but no matching stream creation in the trace. 
Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4866401Z ok (0.003s) 2023-01-11T21:23:47.4866648Z test_cuda_assert_should_not_stop_common_distributed_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4866892Z test_cuda_assert_should_stop_common_device_type_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4867181Z test_cuda_assert_should_stop_common_utils_test_suite_cuda (__main__.TestTestingCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:47.4867331Z test_get_supported_dtypes_cuda (__main__.TestTestingCUDA) ... ok (0.111s) 2023-01-11T21:23:47.4867860Z test_isclose_atol_rtol_greater_than_zero_cuda_bool (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,732 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003039408, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4867933Z ok (0.007s) 2023-01-11T21:23:47.4868455Z test_isclose_atol_rtol_greater_than_zero_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,740 - torch.cuda._sanitizer - INFO - Found Stream with id: 996997792, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4868530Z ok (0.007s) 2023-01-11T21:23:47.4869064Z test_isclose_atol_rtol_greater_than_zero_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,747 - torch.cuda._sanitizer - INFO - Found Stream with id: 997146432, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4869140Z ok (0.007s) 2023-01-11T21:23:47.4869668Z test_isclose_atol_rtol_greater_than_zero_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,754 - torch.cuda._sanitizer - INFO - Found Stream with id: 996981072, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4869776Z ok (0.007s) 2023-01-11T21:23:47.4870307Z test_isclose_atol_rtol_greater_than_zero_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,761 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004551360, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4870381Z ok (0.007s) 2023-01-11T21:23:47.4870903Z test_isclose_atol_rtol_greater_than_zero_cuda_int32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,768 - torch.cuda._sanitizer - INFO - Found Stream with id: 996942240, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4870976Z ok (0.007s) 2023-01-11T21:23:47.4871490Z test_isclose_atol_rtol_greater_than_zero_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,775 - torch.cuda._sanitizer - INFO - Found Stream with id: 1005064944, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 
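The repeated torch.cuda._sanitizer INFO lines come from CSAN, the CUDA stream sanitizer, which backfills its trace when it is switched on after CUDA work has already run. A hedged sketch of enabling it (enable_cuda_sanitizer() is a private API and may change; the TORCH_CUDA_SANITIZER=1 environment variable is the usual switch, and a CUDA device is required):

import torch
import torch.cuda._sanitizer as csan

csan.enable_cuda_sanitizer()  # start tracing kernel launches and stream events

s = torch.cuda.Stream()
x = torch.zeros(8, device="cuda")
with torch.cuda.stream(s):
    y = x + 1  # an unsynchronized cross-stream access like this is what CSAN reports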
2023-01-11T21:23:47.4871566Z ok (0.007s) 2023-01-11T21:23:47.4872078Z test_isclose_atol_rtol_greater_than_zero_cuda_int8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,782 - torch.cuda._sanitizer - INFO - Found Stream with id: 1006226560, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4872153Z ok (0.007s) 2023-01-11T21:23:47.4872674Z test_isclose_atol_rtol_greater_than_zero_cuda_uint8 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,789 - torch.cuda._sanitizer - INFO - Found Stream with id: 1000930384, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4872751Z ok (0.007s) 2023-01-11T21:23:47.4873238Z test_isclose_bool_cuda (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,796 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004529328, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4873310Z ok (0.007s) 2023-01-11T21:23:47.4873854Z test_isclose_complex_cuda_complex128 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:45,806 - torch.cuda._sanitizer - INFO - Found Stream with id: 996944224, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4873932Z ok (0.240s) 2023-01-11T21:23:47.4874436Z test_isclose_complex_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,046 - torch.cuda._sanitizer - INFO - Found Stream with id: 1008427296, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4874512Z ok (0.235s) 2023-01-11T21:23:47.4874665Z test_isclose_equality_shortcut_cuda (__main__.TestTestingCUDA) ... ok (0.001s) 2023-01-11T21:23:47.4875159Z test_isclose_float_cuda_float16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,280 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004815696, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4875233Z ok (0.030s) 2023-01-11T21:23:47.4875732Z test_isclose_float_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,311 - torch.cuda._sanitizer - INFO - Found Stream with id: 1005532160, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4875804Z ok (0.031s) 2023-01-11T21:23:47.4876297Z test_isclose_float_cuda_float64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,342 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004110048, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4876401Z ok (0.031s) 2023-01-11T21:23:47.4876902Z test_isclose_integer_cuda_int16 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,373 - torch.cuda._sanitizer - INFO - Found Stream with id: 1003385264, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4876974Z ok (0.014s) 2023-01-11T21:23:47.4877121Z test_isclose_integer_cuda_int32 (__main__.TestTestingCUDA) ... 
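The test_isclose_* cases above exercise torch.isclose across dtypes. A small illustration of its tolerance and NaN semantics (CPU tensors for brevity; the tests run the same checks on CUDA):

import torch

a = torch.tensor([1.0, float("nan")])
b = torch.tensor([1.000001, float("nan")])

print(torch.isclose(a, b))                      # tensor([ True, False]); NaN != NaN by default
print(torch.isclose(a, b, equal_nan=True))      # tensor([True, True])
print(torch.isclose(a, b, rtol=0.0, atol=0.0))  # tensor([False, False]); exact match only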
ok (0.014s) 2023-01-11T21:23:47.4877262Z test_isclose_integer_cuda_int64 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877400Z test_isclose_integer_cuda_int8 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877542Z test_isclose_integer_cuda_uint8 (__main__.TestTestingCUDA) ... ok (0.014s) 2023-01-11T21:23:47.4877712Z test_isclose_nan_equality_shortcut_cuda_complex128 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4877877Z test_isclose_nan_equality_shortcut_cuda_complex64 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878037Z test_isclose_nan_equality_shortcut_cuda_float16 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878200Z test_isclose_nan_equality_shortcut_cuda_float32 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878358Z test_isclose_nan_equality_shortcut_cuda_float64 (__main__.TestTestingCUDA) ... ok (0.002s) 2023-01-11T21:23:47.4878917Z test_make_tensor_complex32_cuda (__main__.TestTestingCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py:167: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 2023-01-11T21:23:47.4879046Z result = torch.empty(shape, device=device, dtype=dtype) 2023-01-11T21:23:47.4879116Z ok (0.001s) 2023-01-11T21:23:47.4879251Z test_make_tensor_cuda_bool (__main__.TestTestingCUDA) ... ok (0.009s) 2023-01-11T21:23:47.4879776Z test_make_tensor_cuda_complex64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,466 - torch.cuda._sanitizer - INFO - Found Stream with id: 1004253456, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4879847Z ok (0.009s) 2023-01-11T21:23:47.4880342Z test_make_tensor_cuda_float32 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,472 - torch.cuda._sanitizer - INFO - Found Stream with id: 1008557520, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4880414Z ok (0.015s) 2023-01-11T21:23:47.4880899Z test_make_tensor_cuda_int64 (__main__.TestTestingCUDA) ... 2023-01-11 21:23:46,488 - torch.cuda._sanitizer - INFO - Found Stream with id: 1002658816, but no matching stream creation in the trace. Backfilling the trace now. Perhaps the sanitizer was enabled after some torch operations? 2023-01-11T21:23:47.4880975Z ok (0.015s) 2023-01-11T21:23:47.4880982Z 2023-01-11T21:23:47.4881181Z ---------------------------------------------------------------------- 2023-01-11T21:23:47.4881268Z Ran 1248 tests in 32.945s 2023-01-11T21:23:47.4881274Z 2023-01-11T21:23:47.4881369Z OK (skipped=25, expected failures=13) 2023-01-11T21:23:47.4881375Z 2023-01-11T21:23:47.4881464Z Generating XML reports... 
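The test_make_tensor_* cases cover torch.testing.make_tensor, the factory the test suite uses to build inputs; the ComplexHalf UserWarning above is triggered on the chalf path. A short sketch (whether the warning fires may depend on the build):

import torch
from torch.testing import make_tensor

t = make_tensor((2, 3), dtype=torch.float32, device="cpu", low=-1, high=1)
c = make_tensor((2, 3), dtype=torch.complex32, device="cpu")  # may emit the ComplexHalf warning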
2023-01-11T21:23:47.4881748Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertClose-20230111212313.xml 2023-01-11T21:23:47.4882054Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseContainer-20230111212313.xml 2023-01-11T21:23:47.4882368Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseErrorMessage-20230111212313.xml 2023-01-11T21:23:47.4882689Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseMultiDeviceCUDA-20230111212313.xml 2023-01-11T21:23:47.4883022Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseQuantized-20230111212313.xml 2023-01-11T21:23:47.4883324Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSC-20230111212313.xml 2023-01-11T21:23:47.4883621Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSR-20230111212313.xml 2023-01-11T21:23:47.4883909Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCOO-20230111212313.xml 2023-01-11T21:23:47.4884201Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSC-20230111212313.xml 2023-01-11T21:23:47.4884507Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSR-20230111212313.xml 2023-01-11T21:23:47.4884786Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestFrameworkUtils-20230111212313.xml 2023-01-11T21:23:47.4885056Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestImports-20230111212313.xml 2023-01-11T21:23:47.4885371Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestOpInfoSampleFunctionsCUDA-20230111212313.xml 2023-01-11T21:23:47.4885631Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestOpInfos-20230111212313.xml 2023-01-11T21:23:47.4885926Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestParametrization-20230111212313.xml 2023-01-11T21:23:47.4886269Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestParametrizationDeviceTypeCUDA-20230111212313.xml 2023-01-11T21:23:47.4886533Z Generated XML report: test-reports/python-unittest/test_testing/TEST-TestTestingCUDA-20230111212313.xml 2023-01-11T21:23:47.4886546Z 2023-01-11T21:23:47.4887047Z ##[endgroup] 2023-01-11T21:23:47.4887318Z FINISHED PRINTING LOG FILE of test_testing (/var/lib/jenkins/workspace/test/test-reports/test_testing_qs5jmj7o) 2023-01-11T21:23:47.4887327Z 2023-01-11T21:23:49.3658983Z Ignoring disabled issues: [] 2023-01-11T21:23:49.3885241Z Running benchmark_utils/test_benchmark_utils ... [2023-01-11 21:23:49.387825] 2023-01-11T21:23:49.3886159Z Executing ['/opt/conda/bin/python', '-bb', 'benchmark_utils/test_benchmark_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:49.388178] 2023-01-11T21:23:53.8296155Z 2023-01-11T21:23:53.8297833Z Expand the folded group to see the log file of benchmark_utils/test_benchmark_utils 2023-01-11T21:23:53.8299028Z ##[group]PRINTING LOG FILE of benchmark_utils/test_benchmark_utils (/var/lib/jenkins/workspace/test/test-reports/benchmark_utils-test_benchmark_utils_qeotpt96) 2023-01-11T21:23:53.8299417Z 2023-01-11T21:23:53.8299538Z Running tests... 
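benchmark_utils/test_benchmark_utils exercises torch.utils.benchmark. A minimal sketch of the adaptive timing that test_adaptive_timer checks, using the public Timer API (the stmt/setup strings are illustrative):

from torch.utils.benchmark import Timer

t = Timer(stmt="x @ x", setup="import torch; x = torch.ones((64, 64))")
m = t.blocked_autorange(min_run_time=0.2)  # picks the number of runs adaptively
print(m)  # a Measurement with median, IQR, and run counts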
2023-01-11T21:23:53.8300131Z ---------------------------------------------------------------------- 2023-01-11T21:23:53.8300773Z Test results will be stored in test-reports/python-unittest/benchmark_utils.test_benchmark_utils 2023-01-11T21:23:53.8301257Z test_adaptive_timer (__main__.TestBenchmarkUtils) ... ok (1.277s) 2023-01-11T21:23:53.8301779Z test_collect_callgrind (__main__.TestBenchmarkUtils) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:23:53.8302392Z test_collect_cpp_callgrind (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.001s) 2023-01-11T21:23:53.8302875Z test_compare (__main__.TestBenchmarkUtils) ... ok (0.152s) 2023-01-11T21:23:53.8303339Z test_cpp_timer (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.000s) 2023-01-11T21:23:53.8303748Z test_fuzzer (__main__.TestBenchmarkUtils) ... ok (0.002s) 2023-01-11T21:23:53.8304176Z test_manipulate_callgrind_stats (__main__.TestBenchmarkUtils) ... ok (0.037s) 2023-01-11T21:23:53.8304594Z test_timer (__main__.TestBenchmarkUtils) ... ok (0.910s) 2023-01-11T21:23:53.8305054Z test_timer_tiny_fast_snippet (__main__.TestBenchmarkUtils) ... skip: Failing on clang, see 74398 (0.000s) 2023-01-11T21:23:53.8305600Z 2023-01-11T21:23:53.8305897Z ---------------------------------------------------------------------- 2023-01-11T21:23:53.8306270Z Ran 9 tests in 2.381s 2023-01-11T21:23:53.8306438Z 2023-01-11T21:23:53.8306561Z OK (skipped=4) 2023-01-11T21:23:53.8306738Z 2023-01-11T21:23:53.8306869Z Generating XML reports... 2023-01-11T21:23:53.8307551Z Generated XML report: test-reports/python-unittest/benchmark_utils.test_benchmark_utils/TEST-TestBenchmarkUtils-20230111212350.xml 2023-01-11T21:23:53.8307926Z 2023-01-11T21:23:53.8308273Z ##[endgroup] 2023-01-11T21:23:53.8308924Z FINISHED PRINTING LOG FILE of benchmark_utils/test_benchmark_utils (/var/lib/jenkins/workspace/test/test-reports/benchmark_utils-test_benchmark_utils_qeotpt96) 2023-01-11T21:23:53.8309331Z 2023-01-11T21:23:55.7315948Z Ignoring disabled issues: [] 2023-01-11T21:23:55.7550646Z Running dynamo/test_comptime ... [2023-01-11 21:23:55.754431] 2023-01-11T21:23:55.7551468Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_comptime.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:23:55.754777] 2023-01-11T21:23:58.8797244Z 2023-01-11T21:23:58.8798254Z Expand the folded group to see the log file of dynamo/test_comptime 2023-01-11T21:23:58.8799403Z ##[group]PRINTING LOG FILE of dynamo/test_comptime (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_comptime_q0eg521l) 2023-01-11T21:23:58.8799723Z 2023-01-11T21:23:58.8799843Z Running tests... 2023-01-11T21:23:58.8800399Z ---------------------------------------------------------------------- 2023-01-11T21:23:58.8800958Z Test results will be stored in test-reports/python-unittest/dynamo.test_comptime 2023-01-11T21:23:58.8801310Z test_get_local (__main__.ComptimeTests) ... ok (1.108s) 2023-01-11T21:23:58.8801704Z test_graph_break (__main__.ComptimeTests) ... 
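dynamo/test_comptime drives torch._dynamo's comptime hooks, which execute at compile time inside a traced frame; the frames/stats lines in its output are dynamo's per-run counters. A hedged sketch, assuming the private torch._dynamo.comptime API of this branch:

import torch
import torch._dynamo as dynamo
from torch._dynamo.comptime import comptime

@dynamo.optimize("eager")  # era-appropriate spelling; newer releases use torch.compile
def f(x):
    y = x * 2
    comptime.print_graph()   # prints the FX graph captured so far, as in test_print_graph
    comptime.print_locals()  # e.g. "y = TensorVariable()", as in test_print_locals
    return y + 3

f(torch.ones(3))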
frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8802199Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8802615Z frames [('total', 6), ('ok', 6)]
2023-01-11T21:23:58.8802958Z stats [('calls_captured', 5), ('unique_graphs', 4), ('fusions_possible', 1)]
2023-01-11T21:23:58.8803300Z unimplemented []
2023-01-11T21:23:58.8803691Z graph_break [('ComptimeContext.graph_break', 2)]
2023-01-11T21:23:58.8804209Z inline_call [('ComptimeContext.graph_break', 1)]
2023-01-11T21:23:58.8804443Z ok (0.016s)
2023-01-11T21:23:58.8804757Z test_print_bt (__main__.ComptimeTests) ... File "/var/lib/jenkins/workspace/test/dynamo/test_comptime.py", line 152, in f
2023-01-11T21:23:58.8805040Z y = g(y)
2023-01-11T21:23:58.8805301Z File "/var/lib/jenkins/workspace/test/dynamo/test_comptime.py", line 145, in g
2023-01-11T21:23:58.8805570Z comptime.print_bt()
2023-01-11T21:23:58.8805696Z
2023-01-11T21:23:58.8805817Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8806026Z inline_call []
2023-01-11T21:23:58.8806396Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8806735Z ok (0.090s)
2023-01-11T21:23:58.8807089Z test_print_disas (__main__.ComptimeTests) ... 54 0 LOAD_FAST 0 (x)
2023-01-11T21:23:58.8807455Z 2 LOAD_CONST 1 (2)
2023-01-11T21:23:58.8807732Z 4 BINARY_MULTIPLY
2023-01-11T21:23:58.8808026Z 6 STORE_FAST 1 (y)
2023-01-11T21:23:58.8808195Z
2023-01-11T21:23:58.8808316Z 56 8 LOAD_GLOBAL 0 (comptime)
2023-01-11T21:23:58.8808457Z
2023-01-11T21:23:58.8808683Z 57 10 LOAD_CONST 2 ()
2023-01-11T21:23:58.8809317Z 12 LOAD_CONST 3 ('ComptimeTests.test_print_disas..f.._')
2023-01-11T21:23:58.8809668Z 14 MAKE_FUNCTION 0
2023-01-11T21:23:58.8809935Z 16 CALL_FUNCTION 1
2023-01-11T21:23:58.8810271Z 18 STORE_FAST 2 (_)
2023-01-11T21:23:58.8810402Z
2023-01-11T21:23:58.8810503Z 60 20 LOAD_GLOBAL 0 (comptime)
2023-01-11T21:23:58.8810749Z 22 LOAD_METHOD 1 (print_disas)
2023-01-11T21:23:58.8810982Z 24 CALL_METHOD 0
2023-01-11T21:23:58.8811233Z --> 26 POP_TOP
2023-01-11T21:23:58.8811346Z
2023-01-11T21:23:58.8811449Z 62 28 LOAD_FAST 1 (y)
2023-01-11T21:23:58.8811675Z 30 LOAD_CONST 4 (3)
2023-01-11T21:23:58.8811884Z 32 BINARY_ADD
2023-01-11T21:23:58.8812077Z 34 RETURN_VALUE
2023-01-11T21:23:58.8812199Z
2023-01-11T21:23:58.8812327Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8812555Z inline_call []
2023-01-11T21:23:58.8812864Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8813102Z ok (0.006s)
2023-01-11T21:23:58.8813327Z test_print_graph (__main__.ComptimeTests) ...
2023-01-11T21:23:58.8813485Z
2023-01-11T21:23:58.8813490Z
2023-01-11T21:23:58.8813595Z def forward(self, x : torch.Tensor):
2023-01-11T21:23:58.8813975Z # File: /var/lib/jenkins/workspace/test/dynamo/test_comptime.py:26, code: y = x * 2
2023-01-11T21:23:58.8814246Z mul = x * 2; x = None
2023-01-11T21:23:58.8814433Z
2023-01-11T21:23:58.8815099Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8815376Z inline_call []
2023-01-11T21:23:58.8815737Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8815970Z ok (0.006s)
2023-01-11T21:23:58.8816199Z test_print_guards (__main__.ComptimeTests) ...
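The test_print_disas dump above is ordinary CPython bytecode; the stdlib dis module renders the same table (source line, offset, opname, argument) for any function:

import dis

def f(x):
    y = x * 2
    return y + 3

dis.dis(f)  # LOAD_FAST/BINARY_MULTIPLY/... rows like those in the log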
-
2023-01-11T21:23:58.8816479Z local 'x' TENSOR_MATCH
2023-01-11T21:23:58.8816674Z {
2023-01-11T21:23:58.8816896Z 'guard_types': None,
2023-01-11T21:23:58.8817135Z 'code': None,
2023-01-11T21:23:58.8817376Z 'obj_weakref': None
2023-01-11T21:23:58.8817621Z 'guarded_class': None
2023-01-11T21:23:58.8817816Z }
2023-01-11T21:23:58.8817991Z
2023-01-11T21:23:58.8818208Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8818407Z inline_call []
2023-01-11T21:23:58.8818715Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8819014Z ok (0.006s)
2023-01-11T21:23:58.8819269Z test_print_locals (__main__.ComptimeTests) ... x = TensorVariable()
2023-01-11T21:23:58.8819523Z y = TensorVariable()
2023-01-11T21:23:58.8819730Z _ = ConstantVariable(NoneType)
2023-01-11T21:23:58.8819983Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8820180Z inline_call []
2023-01-11T21:23:58.8820475Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:23:58.8820719Z ok (0.006s)
2023-01-11T21:23:58.8821019Z test_print_value_stack (__main__.ComptimeTests) ... - TensorVariable()
2023-01-11T21:23:58.8821304Z frames [('total', 1), ('ok', 1)]
2023-01-11T21:23:58.8821503Z inline_call []
2023-01-11T21:23:58.8821804Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:23:58.8822212Z ok (0.007s)
2023-01-11T21:23:58.8822321Z
2023-01-11T21:23:58.8822522Z ----------------------------------------------------------------------
2023-01-11T21:23:58.8822781Z Ran 8 tests in 1.246s
2023-01-11T21:23:58.8822908Z
2023-01-11T21:23:58.8822980Z OK
2023-01-11T21:23:58.8823072Z
2023-01-11T21:23:58.8823166Z Generating XML reports...
2023-01-11T21:23:58.8823585Z Generated XML report: test-reports/python-unittest/dynamo.test_comptime/TEST-ComptimeTests-20230111212357.xml
2023-01-11T21:23:58.8823829Z
2023-01-11T21:23:58.8824143Z ##[endgroup]
2023-01-11T21:23:58.8824548Z FINISHED PRINTING LOG FILE of dynamo/test_comptime (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_comptime_q0eg521l)
2023-01-11T21:23:58.8824779Z
2023-01-11T21:24:00.7868078Z Ignoring disabled issues: []
2023-01-11T21:24:00.8098487Z Running dynamo/test_functions ... [2023-01-11 21:24:00.809254]
2023-01-11T21:24:00.8099867Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_functions.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:00.809617]
2023-01-11T21:24:04.6833605Z
2023-01-11T21:24:04.6834226Z Expand the folded group to see the log file of dynamo/test_functions
2023-01-11T21:24:04.6835796Z ##[group]PRINTING LOG FILE of dynamo/test_functions (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_functions_ovkhnnj2)
2023-01-11T21:24:04.6836068Z
2023-01-11T21:24:04.6836188Z Running tests...
2023-01-11T21:24:04.6837464Z ----------------------------------------------------------------------
2023-01-11T21:24:04.6838026Z Test results will be stored in test-reports/python-unittest/dynamo.test_functions
2023-01-11T21:24:04.6838429Z test_T (__main__.FunctionTests) ... ok (1.198s)
2023-01-11T21:24:04.6838916Z test_add (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)]
2023-01-11T21:24:04.6839457Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)]
2023-01-11T21:24:04.6839759Z ok (0.006s)
2023-01-11T21:24:04.6840373Z test_add_ (__main__.FunctionTests) ...
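The stats [('calls_captured', ...)] lines under each FunctionTests case are dynamo's compile counters. A sketch using the private torch._dynamo.testing.CompileCounter backend, which exposes the same idea programmatically (attribute names as in this era's source):

import torch
import torch._dynamo as dynamo
from torch._dynamo.testing import CompileCounter

cnt = CompileCounter()  # a backend that only counts compiled frames and captured ops

@dynamo.optimize(cnt)
def add(a, b):
    return a + b

add(torch.ones(2), torch.ones(2))
print(cnt.frame_count, cnt.op_count)  # cf. 'unique_graphs' and 'calls_captured' above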
/var/lib/jenkins/workspace/test/dynamo/test_functions.py:73: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6841060Z a_copy = torch.tensor(a)
2023-01-11T21:24:04.6841944Z /opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py:1052: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6842536Z return node.target(*args, **kwargs)
2023-01-11T21:24:04.6843109Z .7:5: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6843673Z tensor = torch.tensor(a); a = None
2023-01-11T21:24:04.6844130Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6844437Z ok (0.015s)
2023-01-11T21:24:04.6845233Z test_addcdiv (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6845604Z ok (0.008s)
2023-01-11T21:24:04.6846071Z test_addcdiv_ (__main__.FunctionTests) ... /var/lib/jenkins/workspace/test/dynamo/test_functions.py:84: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6846609Z a_copy = torch.tensor(a)
2023-01-11T21:24:04.6847293Z /opt/conda/lib/python3.10/site-packages/torch/_dynamo/utils.py:1052: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6847741Z return node.target(*args, **kwargs)
2023-01-11T21:24:04.6848228Z .12:5: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
2023-01-11T21:24:04.6848631Z tensor = torch.tensor(a); a = None
2023-01-11T21:24:04.6849041Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6849400Z ok (0.015s)
2023-01-11T21:24:04.6849757Z test_build_list_unpack (__main__.FunctionTests) ... inline_call []
2023-01-11T21:24:04.6850303Z stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)]
2023-01-11T21:24:04.6850660Z ok (0.012s)
2023-01-11T21:24:04.6851075Z test_chunks1 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:04.6851584Z ok (0.008s)
2023-01-11T21:24:04.6852125Z test_const_tuple_add1 (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6852513Z ok (0.008s)
2023-01-11T21:24:04.6853035Z test_const_tuple_add2 (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:04.6853400Z ok (0.008s)
2023-01-11T21:24:04.6854009Z test_constant1 (__main__.FunctionTests) ...
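The UserWarning repeated above is PyTorch's standard guidance for copy-constructing tensors; the fix it suggests looks like this:

import torch

a = torch.ones(3, requires_grad=True)

bad = torch.tensor(a)      # works, but raises the UserWarning seen in the log
good = a.clone().detach()  # recommended copy construction
tracked = a.clone().detach().requires_grad_(True)  # copy that participates in autograd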
stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6854381Z ok (0.007s) 2023-01-11T21:24:04.6855057Z test_constant2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6855397Z ok (0.007s) 2023-01-11T21:24:04.6855817Z test_constant3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6856159Z ok (0.005s) 2023-01-11T21:24:04.6856683Z test_constant4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6857052Z ok (0.005s) 2023-01-11T21:24:04.6857544Z test_default_dict (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6857878Z ok (0.010s) 2023-01-11T21:24:04.6858321Z test_del (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6858644Z ok (0.007s) 2023-01-11T21:24:04.6859123Z test_device (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6859485Z ok (0.005s) 2023-01-11T21:24:04.6860012Z test_device_constant (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6860381Z ok (0.011s) 2023-01-11T21:24:04.6860860Z test_dict_copy (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6861224Z ok (0.005s) 2023-01-11T21:24:04.6861717Z test_dict_ops (__main__.FunctionTests) ... stats [('calls_captured', 8), ('fusions_possible', 7), ('unique_graphs', 1)] 2023-01-11T21:24:04.6862156Z ok (0.013s) 2023-01-11T21:24:04.6862546Z test_dict_param_keys (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6862830Z ok (0.006s) 2023-01-11T21:24:04.6863215Z test_distributed_is_available (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6863508Z ok (0.005s) 2023-01-11T21:24:04.6863902Z test_distributed_is_initialized (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6864197Z ok (0.005s) 2023-01-11T21:24:04.6864556Z test_dtype (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6864827Z ok (0.005s) 2023-01-11T21:24:04.6865203Z test_dtype_compare (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6865479Z ok (0.007s) 2023-01-11T21:24:04.6865839Z test_finfo (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6866109Z ok (0.007s) 2023-01-11T21:24:04.6866472Z test_float (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6866736Z ok (0.005s) 2023-01-11T21:24:04.6867115Z test_fn_with_self_set (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6867400Z ok (0.010s) 2023-01-11T21:24:04.6867758Z test_fstrings1 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6868094Z ok (0.005s) 2023-01-11T21:24:04.6868466Z test_fstrings2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6868731Z ok (0.005s) 2023-01-11T21:24:04.6869100Z test_fstrings3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6869381Z ok (0.005s) 2023-01-11T21:24:04.6869625Z test_funcdef_closure (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6869985Z stats [('calls_captured', 10), ('fusions_possible', 9), ('unique_graphs', 1)] 2023-01-11T21:24:04.6870232Z ok (0.014s) 2023-01-11T21:24:04.6870615Z test_get_default_dtype (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6870897Z ok (0.005s) 2023-01-11T21:24:04.6871270Z test_globalfn (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6871552Z ok (0.005s) 2023-01-11T21:24:04.6871922Z test_globalmodule (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6872206Z ok (0.009s) 2023-01-11T21:24:04.6872577Z test_globalvar (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6872860Z ok (0.006s) 2023-01-11T21:24:04.6873221Z test_import1 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6873502Z ok (0.006s) 2023-01-11T21:24:04.6873875Z test_indirect1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6874146Z ok (0.005s) 2023-01-11T21:24:04.6874515Z test_indirect2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6874798Z ok (0.005s) 2023-01-11T21:24:04.6875154Z test_indirect3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6875429Z ok (0.005s) 2023-01-11T21:24:04.6875682Z test_inline_jit_annotations (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6876089Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6876332Z ok (0.007s) 2023-01-11T21:24:04.6876713Z test_inline_softmax (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6876996Z ok (0.009s) 2023-01-11T21:24:04.6877236Z test_inline_with_default (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6877605Z stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6877853Z ok (0.006s) 2023-01-11T21:24:04.6878087Z test_inner_function (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6878453Z stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6878696Z ok (0.005s) 2023-01-11T21:24:04.6879096Z test_is_contiguous_memory_format (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6879385Z ok (0.005s) 2023-01-11T21:24:04.6879842Z test_is_fx_tracing (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6880117Z ok (0.005s) 2023-01-11T21:24:04.6880539Z test_is_in_onnx_export (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6880839Z ok (0.005s) 2023-01-11T21:24:04.6881204Z test_is_not_null (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6881484Z ok (0.005s) 2023-01-11T21:24:04.6881862Z test_is_quantized (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6882178Z ok (0.005s) 2023-01-11T21:24:04.6882551Z test_is_sparse (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6882832Z ok (0.004s) 2023-01-11T21:24:04.6883216Z test_islice_chain (__main__.FunctionTests) ... stats [('calls_captured', 6), ('fusions_possible', 5), ('unique_graphs', 1)] 2023-01-11T21:24:04.6883493Z ok (0.011s) 2023-01-11T21:24:04.6883866Z test_jit_annotate (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6884149Z ok (0.006s) 2023-01-11T21:24:04.6884524Z test_len_constant_dict (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6884810Z ok (0.005s) 2023-01-11T21:24:04.6885186Z test_len_constant_list (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6885476Z ok (0.005s) 2023-01-11T21:24:04.6885867Z test_len_constant_misc_iterables (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6886163Z ok (0.005s) 2023-01-11T21:24:04.6886541Z test_len_tensor (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6886816Z ok (0.005s) 2023-01-11T21:24:04.6887186Z test_list_add (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6887464Z ok (0.005s) 2023-01-11T21:24:04.6887829Z test_list_clear (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6888107Z ok (0.007s) 2023-01-11T21:24:04.6888484Z test_list_convert (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6888770Z ok (0.008s) 2023-01-11T21:24:04.6889143Z test_list_reversed (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6889425Z ok (0.009s) 2023-01-11T21:24:04.6889846Z test_list_slice_assignment (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6890134Z ok (0.005s) 2023-01-11T21:24:04.6890503Z test_list_truth (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6890788Z ok (0.005s) 2023-01-11T21:24:04.6891159Z test_listarg1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6891431Z ok (0.005s) 2023-01-11T21:24:04.6891803Z test_listarg2 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6892080Z ok (0.006s) 2023-01-11T21:24:04.6892443Z test_listarg3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6892715Z ok (0.005s) 2023-01-11T21:24:04.6893078Z test_listarg4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6893351Z ok (0.005s) 2023-01-11T21:24:04.6893716Z test_listarg5 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6894076Z ok (0.005s) 2023-01-11T21:24:04.6894458Z test_load_global_bool (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6894989Z ok (0.005s) 2023-01-11T21:24:04.6895244Z test_map_sum (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6895721Z stats [('calls_captured', 8), ('fusions_possible', 7), ('unique_graphs', 1)] 2023-01-11T21:24:04.6895957Z ok (0.013s) 2023-01-11T21:24:04.6896329Z test_methodcall1 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6896692Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6896921Z ok (0.007s) 2023-01-11T21:24:04.6897154Z test_methodcall2 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6897505Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6897733Z ok (0.007s) 2023-01-11T21:24:04.6897966Z test_methodcall3 (__main__.FunctionTests) ... inline_call [] 2023-01-11T21:24:04.6898311Z stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6898546Z ok (0.007s) 2023-01-11T21:24:04.6898907Z test_min_max (__main__.FunctionTests) ... stats [('calls_captured', 11), ('fusions_possible', 10), ('unique_graphs', 1)] 2023-01-11T21:24:04.6899187Z ok (0.017s) 2023-01-11T21:24:04.6899563Z test_module_constant (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6899838Z ok (0.009s) 2023-01-11T21:24:04.6900219Z test_ndim (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6900524Z ok (0.005s) 2023-01-11T21:24:04.6900877Z test_pop (__main__.FunctionTests) ... stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)] 2023-01-11T21:24:04.6901149Z ok (0.009s) 2023-01-11T21:24:04.6901512Z test_range1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6901786Z ok (0.005s) 2023-01-11T21:24:04.6902142Z test_range2 (__main__.FunctionTests) ... stats [('calls_captured', 13), ('fusions_possible', 12), ('unique_graphs', 1)] 2023-01-11T21:24:04.6902420Z ok (0.015s) 2023-01-11T21:24:04.6902782Z test_reduce (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6903048Z ok (0.008s) 2023-01-11T21:24:04.6903425Z test_return_dict (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6903701Z ok (0.006s) 2023-01-11T21:24:04.6904072Z test_return_dict2 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6904394Z ok (0.006s) 2023-01-11T21:24:04.6904772Z test_return_tuple1 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6905051Z ok (0.006s) 2023-01-11T21:24:04.6905413Z test_return_tuple2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6905684Z ok (0.005s) 2023-01-11T21:24:04.6906044Z test_shape1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6906311Z ok (0.005s) 2023-01-11T21:24:04.6906674Z test_shape2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6906952Z ok (0.004s) 2023-01-11T21:24:04.6907310Z test_slice1 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6907574Z ok (0.004s) 2023-01-11T21:24:04.6907939Z test_slice2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6908213Z ok (0.004s) 2023-01-11T21:24:04.6908563Z test_slice3 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6908831Z ok (0.005s) 2023-01-11T21:24:04.6909190Z test_slice4 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6909448Z ok (0.004s) 2023-01-11T21:24:04.6909807Z test_slice5 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6910143Z ok (0.005s) 2023-01-11T21:24:04.6910526Z test_slice6 (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6910787Z ok (0.006s) 2023-01-11T21:24:04.6911156Z test_startswith (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6911434Z ok (0.006s) 2023-01-11T21:24:04.6911791Z test_tensor_len (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6912069Z ok (0.006s) 2023-01-11T21:24:04.6912451Z test_tensor_new_with_shape (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6912739Z ok (0.017s) 2023-01-11T21:24:04.6913113Z test_tensor_new_with_size (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6913398Z ok (0.017s) 2023-01-11T21:24:04.6913767Z test_tensor_type (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6914033Z ok (0.018s) 2023-01-11T21:24:04.6914403Z test_tensor_type2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6914681Z ok (0.030s) 2023-01-11T21:24:04.6915054Z test_transpose_for_scores (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6915337Z ok (0.007s) 2023-01-11T21:24:04.6915694Z test_tuple1 (__main__.FunctionTests) ... 
stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6915965Z ok (0.005s) 2023-01-11T21:24:04.6916313Z test_tuple2 (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6916581Z ok (0.005s) 2023-01-11T21:24:04.6916954Z test_tuple_contains (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6917225Z ok (0.005s) 2023-01-11T21:24:04.6917594Z test_tuple_iadd (__main__.FunctionTests) ... stats [('calls_captured', 2), ('fusions_possible', 1), ('unique_graphs', 1)] 2023-01-11T21:24:04.6917906Z ok (0.006s) 2023-01-11T21:24:04.6918272Z test_unpack1 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6918538Z ok (0.006s) 2023-01-11T21:24:04.6918901Z test_unpack2 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6919178Z ok (0.006s) 2023-01-11T21:24:04.6919529Z test_unpack3 (__main__.FunctionTests) ... stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)] 2023-01-11T21:24:04.6919798Z ok (0.006s) 2023-01-11T21:24:04.6920163Z test_unpack_ex1 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6920433Z ok (0.008s) 2023-01-11T21:24:04.6920799Z test_unpack_ex2 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6921074Z ok (0.008s) 2023-01-11T21:24:04.6921440Z test_unpack_ex3 (__main__.FunctionTests) ... stats [('calls_captured', 5), ('fusions_possible', 4), ('unique_graphs', 1)] 2023-01-11T21:24:04.6921701Z ok (0.008s) 2023-01-11T21:24:04.6922068Z test_viamethod (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6922345Z ok (0.005s) 2023-01-11T21:24:04.6922702Z test_viatorch (__main__.FunctionTests) ... stats [('calls_captured', 1), ('unique_graphs', 1), ('fusions_possible', 0)] 2023-01-11T21:24:04.6922974Z ok (0.005s) 2023-01-11T21:24:04.6923082Z 2023-01-11T21:24:04.6923292Z ---------------------------------------------------------------------- 2023-01-11T21:24:04.6923592Z Ran 109 tests in 1.981s 2023-01-11T21:24:04.6923719Z 2023-01-11T21:24:04.6923792Z OK 2023-01-11T21:24:04.6923893Z 2023-01-11T21:24:04.6923987Z Generating XML reports... 2023-01-11T21:24:04.6924415Z Generated XML report: test-reports/python-unittest/dynamo.test_functions/TEST-FunctionTests-20230111212402.xml 2023-01-11T21:24:04.6924666Z 2023-01-11T21:24:04.6926836Z ##[endgroup] 2023-01-11T21:24:04.6927276Z FINISHED PRINTING LOG FILE of dynamo/test_functions (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_functions_ovkhnnj2) 2023-01-11T21:24:04.6927514Z 2023-01-11T21:24:06.6019519Z Ignoring disabled issues: [] 2023-01-11T21:24:06.6250321Z Running dynamo/test_replay_record ... [2023-01-11 21:24:06.624355] 2023-01-11T21:24:06.6251789Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_replay_record.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2023-01-11 21:24:06.624700] 2023-01-11T21:24:08.3281024Z 2023-01-11T21:24:08.3281727Z Expand the folded group to see the log file of dynamo/test_replay_record 2023-01-11T21:24:08.3282863Z ##[group]PRINTING LOG FILE of dynamo/test_replay_record (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_replay_record_vdt6sddc) 2023-01-11T21:24:08.3283123Z 2023-01-11T21:24:08.3283210Z Running tests... 2023-01-11T21:24:08.3283652Z ---------------------------------------------------------------------- 2023-01-11T21:24:08.3284066Z Test results will be stored in test-reports/python-unittest/dynamo.test_replay_record 2023-01-11T21:24:08.3284425Z test_fn_call_args (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3284758Z test_local_module (__main__.ReplayRecordTests) ... skip: requires dill (0.001s) 2023-01-11T21:24:08.3285090Z test_nonlocal_fn_call (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3285625Z test_nonlocal_module_class (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3285980Z test_nonlocal_module_fn_call (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286334Z test_successful_inline (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286672Z test_unsuccessful_inline (__main__.ReplayRecordTests) ... skip: requires dill (0.000s) 2023-01-11T21:24:08.3286862Z 2023-01-11T21:24:08.3287339Z ---------------------------------------------------------------------- 2023-01-11T21:24:08.3287599Z Ran 7 tests in 0.003s 2023-01-11T21:24:08.3287722Z 2023-01-11T21:24:08.3287795Z OK (skipped=7) 2023-01-11T21:24:08.3287912Z 2023-01-11T21:24:08.3288005Z Generating XML reports... 2023-01-11T21:24:08.3288459Z Generated XML report: test-reports/python-unittest/dynamo.test_replay_record/TEST-ReplayRecordTests-20230111212408.xml 2023-01-11T21:24:08.3288720Z 2023-01-11T21:24:08.3288953Z ##[endgroup] 2023-01-11T21:24:08.3289374Z FINISHED PRINTING LOG FILE of dynamo/test_replay_record (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_replay_record_vdt6sddc) 2023-01-11T21:24:08.3289620Z 2023-01-11T21:24:10.2671119Z Ignoring disabled issues: [] 2023-01-11T21:24:10.2906542Z Running dynamo/test_verify_correctness ... [2023-01-11 21:24:10.290147] 2023-01-11T21:24:10.2909298Z Executing ['/opt/conda/bin/python', '-bb', 'dynamo/test_verify_correctness.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:10.290490] 2023-01-11T21:24:13.4470715Z 2023-01-11T21:24:13.4471318Z Expand the folded group to see the log file of dynamo/test_verify_correctness 2023-01-11T21:24:13.4472118Z ##[group]PRINTING LOG FILE of dynamo/test_verify_correctness (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_verify_correctness_dts3b7im) 2023-01-11T21:24:13.4472424Z 2023-01-11T21:24:13.4472517Z Running tests... 2023-01-11T21:24:13.4472960Z ---------------------------------------------------------------------- 2023-01-11T21:24:13.4473386Z Test results will be stored in test-reports/python-unittest/dynamo.test_verify_correctness 2023-01-11T21:24:13.4473965Z test_example_inputs (__main__.TestVerifyCorrectness) ... ok (1.178s) 2023-01-11T21:24:13.4474277Z test_incorrect_verify_false (__main__.TestVerifyCorrectness) 2023-01-11T21:24:13.4474730Z The bad optimization return a graph that ... 
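TestVerifyCorrectness covers dynamo's cross-checking mode: with torch._dynamo.config.verify_correctness = True, each compiled graph's output is compared against eager execution, and a mismatch raises the "incorrect results of backend" RuntimeError shown in the traceback below. A hedged sketch using a trivially correct backend (the config flag and optimize() spelling are this branch's private API):

import torch
import torch._dynamo as dynamo

dynamo.config.verify_correctness = True

@dynamo.optimize("eager")  # output matches eager by construction, so verification passes
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(8))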
stats [('calls_captured', 3), ('fusions_possible', 2), ('unique_graphs', 1)]
2023-01-11T21:24:13.4475077Z frames [('total', 2), ('ok', 2)]
2023-01-11T21:24:13.4475427Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 2)]
2023-01-11T21:24:13.4475667Z ok (0.016s)
2023-01-11T21:24:13.4475914Z test_incorrect_verify_true (__main__.TestVerifyCorrectness)
2023-01-11T21:24:13.4476377Z If a bad optimization return a graph that ... [2023-01-11 21:24:12,911] torch._dynamo.output_graph: [ERROR] error in verify_correctness
2023-01-11T21:24:13.4476691Z Traceback (most recent call last):
2023-01-11T21:24:13.4477089Z File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 173, in __call__
2023-01-11T21:24:13.4477435Z raise RuntimeError(f"incorrect results of backend {self}")
2023-01-11T21:24:13.4477804Z RuntimeError: incorrect results of backend
2023-01-11T21:24:13.4478143Z frames [('total', 2), ('ok', 1)]
2023-01-11T21:24:13.4478472Z stats [('calls_captured', 7), ('fusions_possible', 5), ('unique_graphs', 1)]
2023-01-11T21:24:13.4478723Z ok (0.016s)
2023-01-11T21:24:13.4478986Z test_ipex_fp32 (__main__.TestVerifyCorrectness) ... skip: requires ipex (0.001s)
2023-01-11T21:24:13.4479382Z test_nnc (__main__.TestVerifyCorrectness) ... frames [('total', 1), ('ok', 1)]
2023-01-11T21:24:13.4479756Z stats [('calls_captured', 4), ('fusions_possible', 3), ('unique_graphs', 1)]
2023-01-11T21:24:13.4479999Z ok (0.087s)
2023-01-11T21:24:13.4480112Z
2023-01-11T21:24:13.4480318Z ----------------------------------------------------------------------
2023-01-11T21:24:13.4480579Z Ran 5 tests in 1.299s
2023-01-11T21:24:13.4480702Z
2023-01-11T21:24:13.4480782Z OK (skipped=1)
2023-01-11T21:24:13.4480895Z
2023-01-11T21:24:13.4480988Z Generating XML reports...
2023-01-11T21:24:13.4481449Z Generated XML report: test-reports/python-unittest/dynamo.test_verify_correctness/TEST-TestVerifyCorrectness-20230111212411.xml
2023-01-11T21:24:13.4481721Z
2023-01-11T21:24:13.4481961Z ##[endgroup]
2023-01-11T21:24:13.4482468Z FINISHED PRINTING LOG FILE of dynamo/test_verify_correctness (/var/lib/jenkins/workspace/test/test-reports/dynamo-test_verify_correctness_dts3b7im)
2023-01-11T21:24:13.4482726Z
2023-01-11T21:24:15.3606009Z Ignoring disabled issues: []
2023-01-11T21:24:15.3834641Z Running lazy/test_extract_compiled_graph ... [2023-01-11 21:24:15.382872]
2023-01-11T21:24:15.3836383Z Executing ['/opt/conda/bin/python', '-bb', 'lazy/test_extract_compiled_graph.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:15.383234]
2023-01-11T21:24:16.7529621Z
2023-01-11T21:24:16.7530807Z Expand the folded group to see the log file of lazy/test_extract_compiled_graph
2023-01-11T21:24:16.7532184Z ##[group]PRINTING LOG FILE of lazy/test_extract_compiled_graph (/var/lib/jenkins/workspace/test/test-reports/lazy-test_extract_compiled_graph_rb0dxqv0)
2023-01-11T21:24:16.7532701Z
2023-01-11T21:24:16.7533137Z ##[endgroup]
2023-01-11T21:24:16.7534035Z FINISHED PRINTING LOG FILE of lazy/test_extract_compiled_graph (/var/lib/jenkins/workspace/test/test-reports/lazy-test_extract_compiled_graph_rb0dxqv0)
2023-01-11T21:24:16.7534330Z
2023-01-11T21:24:18.6500809Z Ignoring disabled issues: []
2023-01-11T21:24:18.6728573Z Running lazy/test_ts_opinfo ... [2023-01-11 21:24:18.672074]
2023-01-11T21:24:18.6730294Z Executing ['/opt/conda/bin/python', '-bb', 'lazy/test_ts_opinfo.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ...
[2023-01-11 21:24:18.672408] 2023-01-11T21:24:22.3675189Z 2023-01-11T21:24:22.3675647Z Expand the folded group to see the log file of lazy/test_ts_opinfo 2023-01-11T21:24:22.3676509Z ##[group]PRINTING LOG FILE of lazy/test_ts_opinfo (/var/lib/jenkins/workspace/test/test-reports/lazy-test_ts_opinfo_ro8y0r1x) 2023-01-11T21:24:22.3677096Z 2023-01-11T21:24:22.3677201Z Running tests... 2023-01-11T21:24:22.3677752Z ---------------------------------------------------------------------- 2023-01-11T21:24:22.3678193Z Test results will be stored in test-reports/python-unittest/lazy.test_ts_opinfo 2023-01-11T21:24:22.3678539Z test_nonzero_dynamic (__main__.TestLazyDynamicOps) ... ok (0.160s) 2023-01-11T21:24:22.3678904Z testConvolutionBackward (__main__.TestLazyTensor) ... skip: Disable until autograd supports symints (0.001s) 2023-01-11T21:24:22.3679248Z test_tensor_ctr (__main__.TestLazyTensor) ... ok (0.001s) 2023-01-11T21:24:22.3679554Z test_view_mark_step_preserved (__main__.TestLazyTensor) ... ok (0.002s) 2023-01-11T21:24:22.3679733Z 2023-01-11T21:24:22.3679935Z ---------------------------------------------------------------------- 2023-01-11T21:24:22.3680186Z Ran 4 tests in 0.166s 2023-01-11T21:24:22.3680313Z 2023-01-11T21:24:22.3680391Z OK (skipped=1) 2023-01-11T21:24:22.3680512Z 2023-01-11T21:24:22.3680611Z Generating XML reports... 2023-01-11T21:24:22.3681056Z Generated XML report: test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyDynamicOps-20230111212421.xml 2023-01-11T21:24:22.3681573Z Generated XML report: test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyTensor-20230111212421.xml 2023-01-11T21:24:22.3681802Z 2023-01-11T21:24:22.3682045Z ##[endgroup] 2023-01-11T21:24:22.3682446Z FINISHED PRINTING LOG FILE of lazy/test_ts_opinfo (/var/lib/jenkins/workspace/test/test-reports/lazy-test_ts_opinfo_ro8y0r1x) 2023-01-11T21:24:22.3682677Z 2023-01-11T21:24:24.2938583Z Ignoring disabled issues: [] 2023-01-11T21:24:24.3176704Z Running nn/test_init ... [2023-01-11 21:24:24.316959] 2023-01-11T21:24:24.3177577Z Executing ['/opt/conda/bin/python', '-bb', 'nn/test_init.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:24.317321] 2023-01-11T21:24:26.2064978Z 2023-01-11T21:24:26.2066002Z Expand the folded group to see the log file of nn/test_init 2023-01-11T21:24:26.2067078Z ##[group]PRINTING LOG FILE of nn/test_init (/var/lib/jenkins/workspace/test/test-reports/nn-test_init__s30kh7v) 2023-01-11T21:24:26.2067312Z 2023-01-11T21:24:26.2067542Z ##[endgroup] 2023-01-11T21:24:26.2068353Z FINISHED PRINTING LOG FILE of nn/test_init (/var/lib/jenkins/workspace/test/test-reports/nn-test_init__s30kh7v) 2023-01-11T21:24:26.2068584Z 2023-01-11T21:24:28.1452466Z Ignoring disabled issues: [] 2023-01-11T21:24:28.1681181Z Running nn/test_packed_sequence ... [2023-01-11 21:24:28.167512] 2023-01-11T21:24:28.1681865Z Executing ['/opt/conda/bin/python', '-bb', 'nn/test_packed_sequence.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:28.167859] 2023-01-11T21:24:31.1616828Z 2023-01-11T21:24:31.1617408Z Expand the folded group to see the log file of nn/test_packed_sequence 2023-01-11T21:24:31.1618333Z ##[group]PRINTING LOG FILE of nn/test_packed_sequence (/var/lib/jenkins/workspace/test/test-reports/nn-test_packed_sequence_ctas8rzn) 2023-01-11T21:24:31.1618684Z 2023-01-11T21:24:31.1618805Z Running tests... 
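nn/test_packed_sequence exercises the torch.nn.utils.rnn helpers. A compact round-trip of the public APIs named in the results that follow (pad_sequence, pack_sequence, unpad_sequence); shapes are illustrative:

import torch
from torch.nn.utils.rnn import pack_sequence, pad_sequence, unpad_sequence

seqs = [torch.randn(3, 5), torch.randn(2, 5), torch.randn(1, 5)]  # descending lengths

padded = pad_sequence(seqs, batch_first=True)      # (3, 3, 5), zero padded at the tail
packed = pack_sequence(seqs, enforce_sorted=True)  # PackedSequence for RNN consumption

lengths = torch.tensor([3, 2, 1])
restored = unpad_sequence(padded, lengths, batch_first=True)  # recovers the original list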
2023-01-11T21:24:31.1619380Z ---------------------------------------------------------------------- 2023-01-11T21:24:31.1619955Z Test results will be stored in test-reports/python-unittest/nn.test_packed_sequence 2023-01-11T21:24:31.1620446Z test_pack_padded_sequence (__main__.PackedSequenceTest) ... ok (1.129s) 2023-01-11T21:24:31.1620876Z test_pack_sequence (__main__.PackedSequenceTest) ... ok (0.055s) 2023-01-11T21:24:31.1621196Z test_pad_sequence (__main__.PackedSequenceTest) ... ok (0.009s) 2023-01-11T21:24:31.1621657Z test_pad_sequence_with_non_iterable_sequences (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1622095Z test_pad_sequence_with_tensor_sequences (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1622412Z test_to (__main__.PackedSequenceTest) ... ok (0.007s) 2023-01-11T21:24:31.1622710Z test_to_memory_format (__main__.PackedSequenceTest) ... ok (0.001s) 2023-01-11T21:24:31.1623224Z test_total_length (__main__.PackedSequenceTest) ... ok (0.003s) 2023-01-11T21:24:31.1623509Z test_type_casts (__main__.PackedSequenceTest) 2023-01-11T21:24:31.1623824Z Test type casting of `PackedSequence` against type casting of tensor ... ok (0.024s) 2023-01-11T21:24:31.1624149Z test_unpack_sequence (__main__.PackedSequenceTest) ... ok (0.009s) 2023-01-11T21:24:31.1624459Z test_unpad_sequence (__main__.PackedSequenceTest) ... ok (0.008s) 2023-01-11T21:24:31.1624760Z test_wrong_order (__main__.PackedSequenceTest) ... ok (0.004s) 2023-01-11T21:24:31.1624927Z 2023-01-11T21:24:31.1625136Z ---------------------------------------------------------------------- 2023-01-11T21:24:31.1625400Z Ran 12 tests in 1.253s 2023-01-11T21:24:31.1625525Z 2023-01-11T21:24:31.1625594Z OK 2023-01-11T21:24:31.1625693Z 2023-01-11T21:24:31.1625786Z Generating XML reports... 2023-01-11T21:24:31.1626230Z Generated XML report: test-reports/python-unittest/nn.test_packed_sequence/TEST-PackedSequenceTest-20230111212429.xml 2023-01-11T21:24:31.1626492Z 2023-01-11T21:24:31.1626761Z ##[endgroup] 2023-01-11T21:24:31.1627179Z FINISHED PRINTING LOG FILE of nn/test_packed_sequence (/var/lib/jenkins/workspace/test/test-reports/nn-test_packed_sequence_ctas8rzn) 2023-01-11T21:24:31.1627431Z 2023-01-11T21:24:33.0811293Z Ignoring disabled issues: [] 2023-01-11T21:24:33.1045499Z Running profiler/test_memory_profiler ... [2023-01-11 21:24:33.103677] 2023-01-11T21:24:33.1046593Z Executing ['/opt/conda/bin/python', '-bb', 'profiler/test_memory_profiler.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:33.104010] 2023-01-11T21:24:39.4227989Z 2023-01-11T21:24:39.4228530Z Expand the folded group to see the log file of profiler/test_memory_profiler 2023-01-11T21:24:39.4229921Z ##[group]PRINTING LOG FILE of profiler/test_memory_profiler (/var/lib/jenkins/workspace/test/test-reports/profiler-test_memory_profiler_ksquozlp) 2023-01-11T21:24:39.4230388Z 2023-01-11T21:24:39.4230545Z Running tests... 2023-01-11T21:24:39.4231128Z ---------------------------------------------------------------------- 2023-01-11T21:24:39.4231727Z Test results will be stored in test-reports/python-unittest/profiler.test_memory_profiler 2023-01-11T21:24:39.4232876Z test_data_flow_graph_complicated (__main__.TestDataFlow) ... 
STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4233714Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4234460Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4235491Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:336: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4236516Z lines.append(f"{name + ':':<8} T{storage_to_id[t.storage().data_ptr()]}") 2023-01-11T21:24:39.4237138Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4237704Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4238175Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4238459Z ok (1.327s) 2023-01-11T21:24:39.4238916Z test_data_flow_graph_non_op_allocations (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4239426Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4239938Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4240320Z ok (0.068s) 2023-01-11T21:24:39.4240750Z test_data_flow_graph_simple (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4241249Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4241708Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4242855Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:337: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (Triggered internally at /var/lib/jenkins/workspace/build/aten/src/ATen/core/TensorBody.h:485.) 2023-01-11T21:24:39.4243590Z if t.grad is not None: 2023-01-11T21:24:39.4243963Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4244406Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4244869Z STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4245150Z ok (0.131s) 2023-01-11T21:24:39.4245596Z test_data_flow_graph_simple_backward (__main__.TestDataFlow) ... 
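The UserWarning above from test_memory_profiler.py:337 (accessing .grad on a non-leaf tensor) names the fix itself: call .retain_grad() on the non-leaf tensor before backward. A minimal sketch of that behavior, with illustrative tensors not taken from the test:

import torch

x = torch.ones(3, requires_grad=True)  # leaf tensor
y = x * 2                              # non-leaf: produced by an op
y.retain_grad()                        # without this, y.grad stays None and
                                       # reading it emits the warning above
y.sum().backward()
print(y.grad)                          # tensor([1., 1., 1.])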
STAGE:2023-01-11 21:24:35 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4246101Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4246561Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4247233Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4247772Z grad_id = storage_to_id[t.grad.storage().data_ptr()] 2023-01-11T21:24:39.4248005Z ok (0.071s) 2023-01-11T21:24:39.4248468Z test_data_flow_graph_simple_inplace (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4248969Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4249425Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4249880Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4250330Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4250787Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4251058Z ok (0.136s) 2023-01-11T21:24:39.4251503Z test_data_flow_graph_stacked (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4251999Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4252445Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4252893Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4253341Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4253838Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4254423Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4255072Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4255531Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4255806Z ok (0.242s) 2023-01-11T21:24:39.4256252Z test_data_flow_graph_with_annotations (__main__.TestDataFlow) ... 
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4256754Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4257206Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4257474Z ok (0.077s) 2023-01-11T21:24:39.4257899Z test_match_schemas (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4258384Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4258842Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4259112Z ok (0.008s) 2023-01-11T21:24:39.4259549Z test_match_schemas_backward (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4260039Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4260483Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4260762Z ok (0.006s) 2023-01-11T21:24:39.4261201Z test_match_schemas_tensorlist (__main__.TestDataFlow) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4261792Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4262240Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4262516Z ok (0.003s) 2023-01-11T21:24:39.4262981Z test_extract_gradients_from_module (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4263494Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4263941Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4264575Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:117: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4265131Z return tensor.storage().data_ptr() == key.storage.ptr 2023-01-11T21:24:39.4265701Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:147: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4266244Z allowed_set = {t.storage().data_ptr() for t in tensors} 2023-01-11T21:24:39.4266636Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4267134Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4267595Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4268046Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4268499Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4268960Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4269413Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4269902Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4270368Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4270651Z ok (0.045s) 2023-01-11T21:24:39.4271138Z test_extract_gradients_from_module_and_optimizer (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4271669Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4272128Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4272405Z ok (0.009s) 2023-01-11T21:24:39.4272874Z test_extract_gradients_from_optimizer (__main__.TestIdentifyGradients) ... 
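The repeated TypedStorage deprecation warnings above spell out the migration path: call tensor.untyped_storage() instead of tensor.storage(). A minimal before/after sketch (the tensor is illustrative); both calls resolve to the same underlying allocation, which is why the tests' pointer comparisons keep working:

import torch

t = torch.arange(4.0)
old_ptr = t.storage().data_ptr()          # deprecated TypedStorage path (warns)
new_ptr = t.untyped_storage().data_ptr()  # recommended UntypedStorage path
assert old_ptr == new_ptr                 # same allocation either way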
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4273395Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4273853Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4274303Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4274746Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4275244Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4275698Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4276137Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4276596Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4277049Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4277501Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4277955Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4278237Z ok (0.046s) 2023-01-11T21:24:39.4278736Z test_extract_gradients_from_optimizer_set_to_none (__main__.TestIdentifyGradients) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4279260Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4279713Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4280163Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4280611Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4281075Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4281554Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4282004Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4282465Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4282915Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4283363Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4283817Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4284093Z ok (0.053s) 2023-01-11T21:24:39.4284554Z test_extract_gradients_low_level (__main__.TestIdentifyGradients) ... 
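The TestIdentifyGradients cases above locate gradient storages created by module and optimizer steps, including the set_to_none variants. A generic sketch of where those gradients live across a backward/step/zero_grad cycle; the model and optimizer here are illustrative, not the test's:

import torch

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()                   # populates p.grad for every parameter
opt.step()                        # consumes the gradients
opt.zero_grad(set_to_none=True)   # the "set_to_none" behavior tested above
print(all(p.grad is None for p in model.parameters()))  # True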
STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4285072Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4285528Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4285972Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4286419Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4286879Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4287156Z ok (0.014s) 2023-01-11T21:24:39.4287591Z test_config_check (__main__.TestMemoryProfiler) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4288089Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4288549Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4288996Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4289542Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4290002Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4290451Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4290893Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4291355Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4291638Z ok (0.008s) 2023-01-11T21:24:39.4292123Z test_categories_e2e_sequential_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4292809Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:81: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4293360Z if isinstance(t, torch.Tensor) and t.storage() 2023-01-11T21:24:39.4293925Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:78: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4294666Z return tuple( 2023-01-11T21:24:39.4295190Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:79: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4295698Z (t._cdata, t.storage().data_ptr()) 2023-01-11T21:24:39.4296098Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4296563Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4296848Z ok (0.078s) 2023-01-11T21:24:39.4297321Z test_categories_e2e_sequential_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4297846Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4298304Z STAGE:2023-01-11 21:24:36 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4298586Z ok (0.331s) 2023-01-11T21:24:39.4299045Z test_categories_e2e_simple_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4299555Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4300060Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4300331Z ok (0.041s) 2023-01-11T21:24:39.4300796Z test_categories_e2e_simple_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4301312Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4301770Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4302040Z ok (0.197s) 2023-01-11T21:24:39.4302576Z test_categories_e2e_simple_fwd_bwd_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4303103Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4303556Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4303835Z ok (0.245s) 2023-01-11T21:24:39.4304307Z test_categories_e2e_simple_module_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4304824Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4305286Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4305570Z ok (0.032s) 2023-01-11T21:24:39.4306053Z test_categories_e2e_simple_module_fwd_bwd (__main__.TestMemoryProfilerE2E) ... 
STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4306580Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4307036Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4307315Z ok (0.126s) 2023-01-11T21:24:39.4307802Z test_categories_e2e_simple_module_fwd_bwd_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4308320Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4308826Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4309110Z ok (0.206s) 2023-01-11T21:24:39.4309559Z test_inputs_fwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4310047Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4310504Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4311137Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:828: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4311657Z storage = t.storage() 2023-01-11T21:24:39.4312195Z /var/lib/jenkins/workspace/test/profiler/test_memory_profiler.py:836: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:24:39.4312777Z if key.storage.ptr == storage.data_ptr() and key.device == storage.device 2023-01-11T21:24:39.4313038Z ok (0.028s) 2023-01-11T21:24:39.4313491Z test_inputs_fwd_bwd (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4313988Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4314454Z STAGE:2023-01-11 21:24:37 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4314739Z ok (0.157s) 2023-01-11T21:24:39.4315190Z test_inputs_fwd_lazy (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4315726Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4316189Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4316467Z ok (0.033s) 2023-01-11T21:24:39.4316922Z test_lazily_initialized (__main__.TestMemoryProfilerE2E) ... 
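test_inputs_fwd_lazy and test_lazily_initialized above involve parameters that only materialize on first use. A generic sketch of that pattern with torch.nn.LazyLinear; the module choice is an assumption, since the test's model is not shown in this log:

import torch

layer = torch.nn.LazyLinear(out_features=2)  # in_features deferred
print(layer.weight)          # <UninitializedParameter> before any input
layer(torch.randn(3, 5))     # first forward fixes in_features = 5
print(layer.weight.shape)    # torch.Size([2, 5])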
STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4317426Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4317887Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4318175Z ok (0.082s) 2023-01-11T21:24:39.4318626Z test_manual_optimizer_step (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4319136Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4319604Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4319887Z ok (0.044s) 2023-01-11T21:24:39.4320332Z test_memory_timeline (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4320836Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4321296Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4321631Z ok (0.191s) 2023-01-11T21:24:39.4322101Z test_parameters_and_gradients (__main__.TestMemoryProfilerE2E) ... STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4322619Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4323085Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4323528Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4323976Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4324440Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4324895Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4325339Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4325801Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4326258Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4326707Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4327156Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4327437Z ok (0.180s) 2023-01-11T21:24:39.4327917Z test_parameters_and_gradients_set_to_none (__main__.TestMemoryProfilerE2E) ... 
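The interleaved STAGE: ... Warm Up / Collection / Post Processing lines come from the Kineto activity profiler that each of these tests starts and stops. A minimal sketch of the kind of CPU memory-profiling session profiler/test_memory_profiler drives; the workload and sort key are illustrative, not taken from the test file:

import torch
from torch.profiler import ProfilerActivity, profile

with profile(activities=[ProfilerActivity.CPU], profile_memory=True) as prof:
    x = torch.randn(64, 64)
    y = (x @ x).relu().sum()

# Per-op aggregates, including allocations recorded during Collection.
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=5))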
STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4328425Z [W CPUAllocator.cpp:231] Memory block of unknown size was allocated before the profiling started, profiler results will not include the deallocation event 2023-01-11T21:24:39.4328935Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4329394Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4329889Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:24:39.4330385Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:24:39.4330849Z STAGE:2023-01-11 21:24:38 4185:4185 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:24:39.4331127Z ok (0.246s) 2023-01-11T21:24:39.4331243Z 2023-01-11T21:24:39.4331451Z ---------------------------------------------------------------------- 2023-01-11T21:24:39.4331710Z Ran 32 tests in 4.464s 2023-01-11T21:24:39.4331836Z 2023-01-11T21:24:39.4331908Z OK 2023-01-11T21:24:39.4332013Z 2023-01-11T21:24:39.4332112Z Generating XML reports... 2023-01-11T21:24:39.4332543Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestDataFlow-20230111212434.xml 2023-01-11T21:24:39.4333121Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestIdentifyGradients-20230111212434.xml 2023-01-11T21:24:39.4333706Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfiler-20230111212434.xml 2023-01-11T21:24:39.4334392Z Generated XML report: test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfilerE2E-20230111212434.xml 2023-01-11T21:24:39.4334869Z 2023-01-11T21:24:39.4335220Z ##[endgroup] 2023-01-11T21:24:39.4335669Z FINISHED PRINTING LOG FILE of profiler/test_memory_profiler (/var/lib/jenkins/workspace/test/test-reports/profiler-test_memory_profiler_ksquozlp) 2023-01-11T21:24:39.4335929Z 2023-01-11T21:24:41.3616610Z Ignoring disabled issues: [] 2023-01-11T21:24:41.3850091Z Running test_autocast ... [2023-01-11 21:24:41.384380] 2023-01-11T21:24:41.3851709Z Executing ['/opt/conda/bin/python', '-bb', 'test_autocast.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:41.384754] 2023-01-11T21:24:45.0413086Z 2023-01-11T21:24:45.0413846Z Expand the folded group to see the log file of test_autocast 2023-01-11T21:24:45.0415132Z ##[group]PRINTING LOG FILE of test_autocast (/var/lib/jenkins/workspace/test/test-reports/test_autocast_9mc1y8u0) 2023-01-11T21:24:45.0415452Z 2023-01-11T21:24:45.0415573Z Running tests... 2023-01-11T21:24:45.0416036Z ---------------------------------------------------------------------- 2023-01-11T21:24:45.0416504Z Test results will be stored in test-reports/python-unittest/test_autocast 2023-01-11T21:24:45.0416859Z test_autocast_methods_expect_builtin_promote (__main__.TestAutocastCPU) ... ok (1.110s) 2023-01-11T21:24:45.0417187Z test_autocast_nn_bf16 (__main__.TestAutocastCPU) ... ok (0.005s) 2023-01-11T21:24:45.0417484Z test_autocast_nn_fp32 (__main__.TestAutocastCPU) ... ok (0.007s) 2023-01-11T21:24:45.0417787Z test_autocast_torch_bf16 (__main__.TestAutocastCPU) ... ok (0.010s) 2023-01-11T21:24:45.0418114Z test_autocast_torch_expect_builtin_promote (__main__.TestAutocastCPU) ... 
ok (0.005s) 2023-01-11T21:24:45.0418439Z test_autocast_torch_fp32 (__main__.TestAutocastCPU) ... ok (0.064s) 2023-01-11T21:24:45.0418757Z test_autocast_torch_need_autocast_promote (__main__.TestAutocastCPU) ... ok (0.005s) 2023-01-11T21:24:45.0419067Z test_cast_cache_is_global (__main__.TestAutocastGPU) 2023-01-11T21:24:45.0419368Z Verifies that the autocast cache is global. This is done by ... ok (0.656s) 2023-01-11T21:24:45.0419676Z test_autocast_fast_dtype (__main__.TestTorchAutocast) ... ok (0.001s) 2023-01-11T21:24:45.0419850Z 2023-01-11T21:24:45.0420054Z ---------------------------------------------------------------------- 2023-01-11T21:24:45.0420316Z Ran 9 tests in 1.864s 2023-01-11T21:24:45.0420448Z 2023-01-11T21:24:45.0420523Z OK 2023-01-11T21:24:45.0420627Z 2023-01-11T21:24:45.0420714Z Generating XML reports... 2023-01-11T21:24:45.0421145Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestAutocastCPU-20230111212442.xml 2023-01-11T21:24:45.0421662Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestAutocastGPU-20230111212442.xml 2023-01-11T21:24:45.0422522Z Generated XML report: test-reports/python-unittest/test_autocast/TEST-TestTorchAutocast-20230111212442.xml 2023-01-11T21:24:45.0422756Z 2023-01-11T21:24:45.0423021Z ##[endgroup] 2023-01-11T21:24:45.0423418Z FINISHED PRINTING LOG FILE of test_autocast (/var/lib/jenkins/workspace/test/test-reports/test_autocast_9mc1y8u0) 2023-01-11T21:24:45.0423643Z 2023-01-11T21:24:46.9620368Z Ignoring disabled issues: [] 2023-01-11T21:24:46.9850142Z Running test_comparison_utils ... [2023-01-11 21:24:46.984506] 2023-01-11T21:24:46.9853547Z Executing ['/opt/conda/bin/python', '-bb', 'test_comparison_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:46.984853] 2023-01-11T21:24:48.5794519Z 2023-01-11T21:24:48.5795094Z Expand the folded group to see the log file of test_comparison_utils 2023-01-11T21:24:48.5796693Z ##[group]PRINTING LOG FILE of test_comparison_utils (/var/lib/jenkins/workspace/test/test-reports/test_comparison_utils_lzsvj7yi) 2023-01-11T21:24:48.5797259Z 2023-01-11T21:24:48.5797726Z ##[endgroup] 2023-01-11T21:24:48.5798668Z FINISHED PRINTING LOG FILE of test_comparison_utils (/var/lib/jenkins/workspace/test/test-reports/test_comparison_utils_lzsvj7yi) 2023-01-11T21:24:48.5799134Z 2023-01-11T21:24:50.4712456Z Ignoring disabled issues: [] 2023-01-11T21:24:50.4945141Z Running test_dataloader ... [2023-01-11 21:24:50.493920] 2023-01-11T21:24:50.4945931Z Executing ['/opt/conda/bin/python', '-bb', 'test_dataloader.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:24:50.494249] 2023-01-11T21:27:03.4835434Z 2023-01-11T21:27:03.4837196Z Expand the folded group to see the log file of test_dataloader 2023-01-11T21:27:03.4852707Z ##[group]PRINTING LOG FILE of test_dataloader (/var/lib/jenkins/workspace/test/test-reports/test_dataloader_rmt2ijzu) 2023-01-11T21:27:03.4853311Z 2023-01-11T21:27:03.4853441Z Running tests... 2023-01-11T21:27:03.4854154Z ---------------------------------------------------------------------- 2023-01-11T21:27:03.4855059Z Test results will be stored in test-reports/python-unittest/test_dataloader 2023-01-11T21:27:03.4856008Z test_shuffler_iterdatapipe (__main__.IntegrationTestDataLoaderDataPipe) 2023-01-11T21:27:03.4896218Z Verify ``IterDataPipe.shuffle`` is controlled by ``DataLoader`` ... 
skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:27:03.4897198Z test_add_dataset (__main__.TestConcatDataset) ... ok (0.226s) 2023-01-11T21:27:03.4898096Z test_concat_raises_index_error (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4898744Z test_concat_two_non_singletons (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4899621Z test_concat_two_non_singletons_with_empty (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4900400Z test_concat_two_singletons (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4900890Z test_iterable_dataset_err (__main__.TestConcatDataset) ... ok (0.001s) 2023-01-11T21:27:03.4901949Z test_conv_after_fork (__main__.TestConvAfterFork) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/75492 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.000s) 2023-01-11T21:27:03.4902599Z test_custom_batch_pin (__main__.TestCustomPinFn) ... ok (0.004s) 2023-01-11T21:27:03.4990334Z test_custom_batch_pin_worker (__main__.TestCustomPinFn) ... ok (0.138s) 2023-01-11T21:27:03.4991370Z test_batch_sampler (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4991987Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4992696Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4993637Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4994127Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4994526Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4995091Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4995522Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4996225Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.4996782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.4997529Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.4998206Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.4998969Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5017431Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5018171Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5018662Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5018981Z ok (1.970s) 2023-01-11T21:27:03.5019302Z test_builtin_collection_conversion (__main__.TestDataLoader) ... ok (0.274s) 2023-01-11T21:27:03.5020061Z test_bulk_loading_nobatch (__main__.TestDataLoader) ... 
ok (0.082s) 2023-01-11T21:27:03.5020462Z test_chain_iterable_style_dataset (__main__.TestDataLoader) ... ok (0.127s) 2023-01-11T21:27:03.5020873Z test_default_collate_bad_numpy_types (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5021297Z test_default_collate_bad_sequence_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5021705Z test_default_collate_dtype (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5022116Z test_default_collate_mapping_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5023591Z test_default_collate_numpy_memmap (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py:172: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:210.) 2023-01-11T21:27:03.5024714Z return collate([torch.as_tensor(b) for b in batch], collate_fn_map=collate_fn_map) 2023-01-11T21:27:03.5025052Z ok (0.013s) 2023-01-11T21:27:03.5025391Z test_default_collate_sequence_dont_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5025828Z test_default_collate_sequence_keep_type (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5026256Z test_default_collate_shared_tensor (__main__.TestDataLoader) ... ok (0.003s) 2023-01-11T21:27:03.5026907Z test_default_convert_mapping_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5027339Z test_default_convert_sequence_dont_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5027777Z test_default_convert_sequence_keep_type (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5028128Z test_distributed_sampler_invalid_rank (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5028454Z test_duplicating_data_with_drop_last (__main__.TestDataLoader) ... ok (0.003s) 2023-01-11T21:27:03.5028753Z test_error (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5029126Z test_error_in_init (__main__.TestDataLoader) ... ok (0.093s) 2023-01-11T21:27:03.5029412Z test_error_workers (__main__.TestDataLoader) ... ok (0.083s) 2023-01-11T21:27:03.5029710Z test_excessive_thread_creation_warning (__main__.TestDataLoader) ... ok (0.006s) 2023-01-11T21:27:03.5030021Z test_fd_limit_exceeded (__main__.TestDataLoader) ... ok (1.521s) 2023-01-11T21:27:03.5030607Z test_get_worker_info (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5030993Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5031438Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5031804Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5032045Z ok (3.357s) 2023-01-11T21:27:03.5032278Z test_growing_dataset (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5032611Z test_invalid_assign_after_init (__main__.TestDataLoader) ... ok (0.001s) 2023-01-11T21:27:03.5032961Z test_invalid_ctor_args_combinations (__main__.TestDataLoader) ... 
ok (0.003s) 2023-01-11T21:27:03.5033296Z test_iterable_style_dataset (__main__.TestDataLoader) ... ok (1.104s) 2023-01-11T21:27:03.5033627Z test_iterabledataset_len (__main__.TestDataLoader) ... ok (0.002s) 2023-01-11T21:27:03.5034234Z test_large_sampler_indices (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5034719Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5035211Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5035613Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5035875Z ok (3.103s) 2023-01-11T21:27:03.5036111Z test_len (__main__.TestDataLoader) ... ok (0.007s) 2023-01-11T21:27:03.5036436Z test_multi_epochs_reproducibility (__main__.TestDataLoader) ... ok (0.150s) 2023-01-11T21:27:03.5037053Z test_multiple_dataloaders (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5037494Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5037988Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5038391Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5038894Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5039266Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5039767Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5040163Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5040662Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5041034Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5041528Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5041922Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5042412Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5042782Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5043309Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5043705Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5043953Z ok (3.406s) 2023-01-11T21:27:03.5044496Z test_multiprocessing_contexts (__main__.TestDataLoader) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5044943Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5045442Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5045837Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5046337Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5046716Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5047205Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5047601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5048097Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5048474Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5048961Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5049356Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5049823Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5050341Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5050877Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5051453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5051838Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5052324Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5052719Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5053222Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5053599Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5054088Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5054798Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5055444Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5055849Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5056280Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5056635Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5057018Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. 
See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5057449Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5057949Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors] 2023-01-11T21:27:03.5058453Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5058797Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5059217Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5059569Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5060002Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5060374Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5060815Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5061168Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5061604Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5061932Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5062359Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5062707Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5063141Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5063555Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5063984Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5064340Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5064771Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5065100Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5065530Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5065881Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5066308Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5066657Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5067085Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5067435Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5067667Z ok (12.336s) 2023-01-11T21:27:03.5068354Z test_multiprocessing_iterdatapipe (__main__.TestDataLoader) ... 
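test_multiprocessing_contexts above runs the loader under several worker start methods, and the [W CudaIPCTypes.cpp:15] lines are the known warning for a producer process exiting while shared CUDA tensors are still alive. A minimal sketch of pinning the start method through DataLoader's multiprocessing_context argument; the dataset and sizes are illustrative:

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    ds = TensorDataset(torch.arange(8.0))
    # "spawn" starts clean worker processes (required when workers touch CUDA).
    loader = DataLoader(ds, batch_size=2, num_workers=2,
                        multiprocessing_context="spawn")
    for (batch,) in loader:
        print(batch)

if __name__ == "__main__":   # required under spawn
    main()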
/opt/conda/lib/python3.10/site-packages/torch/utils/data/graph_settings.py:88: UserWarning: `shuffle=True` was set, but the datapipe does not contain a `Shuffler`. Adding one at the end. Be aware that the default buffer size might not be sufficient for your task. 2023-01-11T21:27:03.5068839Z warnings.warn( 2023-01-11T21:27:03.5069226Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5069566Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5070005Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5070363Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5070838Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5071174Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5071606Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5071965Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5072393Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5072732Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5073163Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5073520Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5073946Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5074291Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5074721Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5075065Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5075496Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5075831Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5076250Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5076622Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5077052Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5077410Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5077849Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:27:03.5078195Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:27:03.5078627Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:27:03.5078965Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:27:03.5079377Z 
2023-01-11T21:27:03.5081526Z ok (6.467s)
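The graph_settings.py warning above fires because DataLoader(shuffle=True) over an IterDataPipe expects a Shuffler node in the pipe and appends one when it is missing. A minimal sketch, not taken from the test file, of silencing it by declaring the Shuffler explicitly:

    from torch.utils.data import DataLoader
    from torch.utils.data.datapipes.iter import IterableWrapper

    dp = IterableWrapper(range(10))

    # Warns as above: no Shuffler in the graph, so one is added implicitly.
    implicit = DataLoader(dp, shuffle=True)

    # Silent: the Shuffler is part of the pipe, with an explicit buffer size.
    explicit = DataLoader(dp.shuffle(buffer_size=10), shuffle=True)

Declaring the Shuffler also lets you size its buffer, which is what the warning's note about the default buffer size is pointing at.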
2023-01-11T21:27:03.5081970Z test_no_segfault (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5082353Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5082781Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5083140Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5083369Z ok (1.695s)
2023-01-11T21:27:03.5083594Z test_numpy (__main__.TestDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5083875Z test_numpy_gen_state (__main__.TestDataLoader) ... ok (0.003s)
2023-01-11T21:27:03.5084193Z test_numpy_scalars (__main__.TestDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5084469Z test_partial_workers (__main__.TestDataLoader)
2023-01-11T21:27:03.5084766Z Check that workers exit even if the iterator is not exhausted. ... ok (0.167s)
2023-01-11T21:27:03.5085045Z test_proper_exit (__main__.TestDataLoader)
2023-01-11T21:27:03.5085470Z There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.004s)
2023-01-11T21:27:03.5085895Z test_random_sampler (__main__.TestDataLoader) ... ok (0.003s)
2023-01-11T21:27:03.5086212Z test_random_sampler_len_with_replacement (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5086543Z test_random_sampler_len_without_replacement (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5087068Z test_sampler (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5087452Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5087888Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5088241Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5093184Z ok (1.950s)
2023-01-11T21:27:03.5093429Z test_sampler_reproducibility (__main__.TestDataLoader) ... ok (0.014s)
2023-01-11T21:27:03.5093943Z test_segfault (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5094326Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5096668Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5097019Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5097256Z ok (3.254s)
2023-01-11T21:27:03.5097498Z test_seqential_batch_workers (__main__.TestDataLoader) ... ok (0.132s)
2023-01-11T21:27:03.5097810Z test_seqential_batch_workers_prefetch (__main__.TestDataLoader) ... ok (0.134s)
2023-01-11T21:27:03.5098112Z test_sequential_batch (__main__.TestDataLoader) ... ok (0.038s)
2023-01-11T21:27:03.5098405Z test_sequential_nonbatch (__main__.TestDataLoader) ... ok (0.021s)
2023-01-11T21:27:03.5098778Z test_sequential_pin_memory (__main__.TestDataLoader) ... ok (0.004s)
2023-01-11T21:27:03.5099068Z test_sequential_workers (__main__.TestDataLoader) ... ok (0.185s)
2023-01-11T21:27:03.5099345Z test_shuffle (__main__.TestDataLoader) ... ok (0.058s)
2023-01-11T21:27:03.5099618Z test_shuffle_batch (__main__.TestDataLoader) ... ok (0.052s)
2023-01-11T21:27:03.5099891Z test_shuffle_batch_none (__main__.TestDataLoader) ... ok (0.054s)
2023-01-11T21:27:03.5100182Z test_shuffle_batch_workers (__main__.TestDataLoader) ... ok (0.163s)
2023-01-11T21:27:03.5100486Z test_shuffle_batch_workers_prefetch (__main__.TestDataLoader) ... ok (0.163s)
2023-01-11T21:27:03.5100780Z test_shuffle_pin_memory (__main__.TestDataLoader) ... ok (0.122s)
2023-01-11T21:27:03.5101088Z test_shuffle_reproducibility (__main__.TestDataLoader) ... ok (0.406s)
2023-01-11T21:27:03.5101385Z test_shuffle_workers (__main__.TestDataLoader) ... ok (0.213s)
2023-01-11T21:27:03.5101880Z test_timeout (__main__.TestDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5102252Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5102682Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5103034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5104867Z ok (5.314s)
2023-01-11T21:27:03.5105096Z test_typing (__main__.TestDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5105378Z test_worker_init_fn (__main__.TestDataLoader) ... ok (0.051s)
2023-01-11T21:27:03.5105648Z test_worker_seed (__main__.TestDataLoader) ... ok (0.105s)
2023-01-11T21:27:03.5105944Z test_worker_seed_reproducibility (__main__.TestDataLoader) ... ok (0.208s)
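The suite that follows re-runs the same DataLoader tests with persistent_workers=True, which keeps worker processes alive across epochs instead of tearing them down whenever an iterator is exhausted. A minimal sketch of the flag:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(8.0))
    loader = DataLoader(ds, batch_size=2, num_workers=2,
                        persistent_workers=True)

    for epoch in range(2):   # both epochs are served by the same two workers
        for (batch,) in loader:
            pass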
2023-01-11T21:27:03.5106508Z test_batch_sampler (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5106924Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5107350Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5107713Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5112692Z ok (2.008s)
2023-01-11T21:27:03.5112983Z test_builtin_collection_conversion (__main__.TestDataLoaderPersistentWorkers) ... ok (0.291s)
2023-01-11T21:27:03.5113361Z test_bulk_loading_nobatch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.081s)
2023-01-11T21:27:03.5113739Z test_chain_iterable_style_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.110s)
2023-01-11T21:27:03.5114105Z test_dataset_not_reset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.226s)
2023-01-11T21:27:03.5114482Z test_default_collate_bad_numpy_types (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5114877Z test_default_collate_bad_sequence_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5115251Z test_default_collate_dtype (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5115639Z test_default_collate_mapping_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116027Z test_default_collate_numpy_memmap (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116422Z test_default_collate_sequence_dont_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5116815Z test_default_collate_sequence_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5117237Z test_default_collate_shared_tensor (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
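The test_default_collate_* cases above exercise the DataLoader's default batching function, which stacks tensors and recurses into mappings and sequences. A minimal sketch of that behavior:

    import torch
    from torch.utils.data import default_collate

    samples = [
        {"x": torch.tensor([1.0, 2.0]), "y": 0},
        {"x": torch.tensor([3.0, 4.0]), "y": 1},
    ]
    batch = default_collate(samples)
    print(batch["x"].shape)  # torch.Size([2, 2]): tensors are stacked
    print(batch["y"])        # tensor([0, 1]): Python ints become a tensor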
2023-01-11T21:27:03.5117623Z test_default_convert_mapping_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118028Z test_default_convert_sequence_dont_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118422Z test_default_convert_sequence_keep_type (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5118818Z test_distributed_sampler_invalid_rank (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5119209Z test_duplicating_data_with_drop_last (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5119567Z test_early_exit (__main__.TestDataLoaderPersistentWorkers) ... ok (12.399s)
2023-01-11T21:27:03.5119906Z test_error (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5120248Z test_error_in_init (__main__.TestDataLoaderPersistentWorkers) ... ok (0.100s)
2023-01-11T21:27:03.5120596Z test_error_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.058s)
2023-01-11T21:27:03.5120965Z test_excessive_thread_creation_warning (__main__.TestDataLoaderPersistentWorkers) ... ok (0.008s)
2023-01-11T21:27:03.5121344Z test_fd_limit_exceeded (__main__.TestDataLoaderPersistentWorkers) ... ok (1.501s)
2023-01-11T21:27:03.5121932Z test_get_worker_info (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5122355Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5122784Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5123145Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5123385Z ok (3.342s)
2023-01-11T21:27:03.5123653Z test_growing_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5124020Z test_invalid_assign_after_init (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5124403Z test_invalid_ctor_args_combinations (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5124820Z test_iterable_style_dataset (__main__.TestDataLoaderPersistentWorkers) ... ok (20.994s)
2023-01-11T21:27:03.5125185Z test_iterabledataset_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5125783Z test_large_sampler_indices (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5126208Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5126638Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5127005Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5127244Z ok (3.116s)
2023-01-11T21:27:03.5127502Z test_len (__main__.TestDataLoaderPersistentWorkers) ... ok (0.007s)
2023-01-11T21:27:03.5127862Z test_multi_epochs_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.057s)
2023-01-11T21:27:03.5128468Z test_multiple_dataloaders (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5128896Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5129325Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5129686Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5134955Z ok (3.481s)
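test_multiprocessing_contexts, which runs next, exercises the DataLoader under each multiprocessing start method. A minimal sketch of the parameter it iterates over (the start methods listed are the standard Linux ones; fork is unsafe once CUDA has been initialized, which is why spawn appears throughout this log):

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    if __name__ == "__main__":
        ds = TensorDataset(torch.arange(8.0))
        for ctx in ("fork", "spawn", "forkserver"):
            loader = DataLoader(ds, batch_size=2, num_workers=2,
                                multiprocessing_context=ctx)
            assert sum(len(b) for (b,) in loader) == 8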
2023-01-11T21:27:03.5135461Z test_multiprocessing_contexts (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5135883Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5136312Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5136656Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5140171Z [W CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
2023-01-11T21:27:03.5156732Z ok (12.296s)
2023-01-11T21:27:03.5157450Z test_multiprocessing_iterdatapipe (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/graph_settings.py:88: UserWarning: `shuffle=True` was set, but the datapipe does not contain a `Shuffler`. Adding one at the end. Be aware that the default buffer size might not be sufficient for your task.
2023-01-11T21:27:03.5157964Z warnings.warn(
2023-01-11T21:27:03.5158351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5158683Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5159116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5159469Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5170587Z ok (6.424s)
2023-01-11T21:27:03.5171076Z test_no_segfault (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5171485Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5171917Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5172273Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5172512Z ok (1.675s)
2023-01-11T21:27:03.5172775Z test_numpy (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5173124Z test_numpy_gen_state (__main__.TestDataLoaderPersistentWorkers) ... ok (0.003s)
2023-01-11T21:27:03.5173484Z test_numpy_scalars (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5173818Z test_partial_workers (__main__.TestDataLoaderPersistentWorkers)
2023-01-11T21:27:03.5174149Z Check that workers exit even if the iterator is not exhausted. ... ok (0.225s)
2023-01-11T21:27:03.5174637Z test_proper_exit (__main__.TestDataLoaderPersistentWorkers)
2023-01-11T21:27:03.5175130Z There might be ConnectionResetError or leaked semaphore warning (due to dirty process exit), but they are all safe to ignore ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s)
2023-01-11T21:27:03.5175681Z test_random_sampler (__main__.TestDataLoaderPersistentWorkers) ... ok (0.002s)
2023-01-11T21:27:03.5176070Z test_random_sampler_len_with_replacement (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5176471Z test_random_sampler_len_without_replacement (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5177121Z test_sampler (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5177534Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5177964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5178317Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5183232Z ok (1.965s)
2023-01-11T21:27:03.5183525Z test_sampler_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.013s)
2023-01-11T21:27:03.5184102Z test_segfault (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5184515Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5184949Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5185310Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5185541Z ok (3.227s)
2023-01-11T21:27:03.5185823Z test_seqential_batch_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.141s)
2023-01-11T21:27:03.5186213Z test_seqential_batch_workers_prefetch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.151s)
2023-01-11T21:27:03.5186585Z test_sequential_batch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.037s)
2023-01-11T21:27:03.5186952Z test_sequential_nonbatch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.021s)
2023-01-11T21:27:03.5187320Z test_sequential_pin_memory (__main__.TestDataLoaderPersistentWorkers) ... ok (0.004s)
2023-01-11T21:27:03.5187682Z test_sequential_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.197s)
2023-01-11T21:27:03.5188027Z test_shuffle (__main__.TestDataLoaderPersistentWorkers) ... ok (0.058s)
2023-01-11T21:27:03.5188374Z test_shuffle_batch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.052s)
2023-01-11T21:27:03.5188729Z test_shuffle_batch_none (__main__.TestDataLoaderPersistentWorkers) ... ok (0.054s)
2023-01-11T21:27:03.5189088Z test_shuffle_batch_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.182s)
2023-01-11T21:27:03.5189468Z test_shuffle_batch_workers_prefetch (__main__.TestDataLoaderPersistentWorkers) ... ok (0.176s)
2023-01-11T21:27:03.5189840Z test_shuffle_pin_memory (__main__.TestDataLoaderPersistentWorkers) ... ok (0.133s)
2023-01-11T21:27:03.5190238Z test_shuffle_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.302s)
2023-01-11T21:27:03.5190599Z test_shuffle_workers (__main__.TestDataLoaderPersistentWorkers) ... ok (0.208s)
2023-01-11T21:27:03.5191170Z test_timeout (__main__.TestDataLoaderPersistentWorkers) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:27:03.5191582Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:27:03.5192016Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:27:03.5192370Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:27:03.5194178Z ok (5.317s)
2023-01-11T21:27:03.5194433Z test_typing (__main__.TestDataLoaderPersistentWorkers) ... ok (0.001s)
2023-01-11T21:27:03.5194779Z test_worker_init_fn (__main__.TestDataLoaderPersistentWorkers) ... ok (0.052s)
2023-01-11T21:27:03.5195125Z test_worker_seed (__main__.TestDataLoaderPersistentWorkers) ... ok (0.110s)
2023-01-11T21:27:03.5195492Z test_worker_seed_reproducibility (__main__.TestDataLoaderPersistentWorkers) ... ok (0.208s)
2023-01-11T21:27:03.5195917Z test_incomplete_fractional_splits (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5196272Z test_lengths_must_equal_dataset_size (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5196621Z test_slicing_of_subset_of_dataset (__main__.TestDatasetRandomSplit) ... ok (0.003s)
2023-01-11T21:27:03.5196959Z test_slicing_of_subset_of_subset (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197295Z test_splits_are_mutually_exclusive (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197628Z test_splits_generator (__main__.TestDatasetRandomSplit) ... ok (0.002s)
2023-01-11T21:27:03.5197955Z test_splits_have_correct_size (__main__.TestDatasetRandomSplit) ... ok (0.001s)
2023-01-11T21:27:03.5198271Z test_splits_indexing_type (__main__.TestDatasetRandomSplit)
2023-01-11T21:27:03.5198555Z Indices generated by random_split ... ok (0.002s)
2023-01-11T21:27:03.5198866Z test_splits_reproducibility (__main__.TestDatasetRandomSplit) ... ok (0.007s)
2023-01-11T21:27:03.5199173Z test_pin_memory (__main__.TestDictDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5199474Z test_pin_memory_device (__main__.TestDictDataLoader) ... ok (0.001s)
2023-01-11T21:27:03.5200213Z test_pin_memory_with_only_device (__main__.TestDictDataLoader) ... /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:599: UserWarning: pin memory device is set and pin_memory flag is not used then device pinned memory won't be usedplease set pin_memory to true, if you need to use the device pin memory
2023-01-11T21:27:03.5200701Z warnings.warn(warn_msg)
2023-01-11T21:27:03.5200892Z ok (0.001s)
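The dataloader.py:599 warning above means pin_memory_device alone does nothing: the pin_memory flag must also be set. A minimal sketch, assuming a CUDA machine:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    ds = TensorDataset(torch.arange(4.0))

    warned = DataLoader(ds, pin_memory_device="cuda")  # warns; nothing is pinned
    pinned = DataLoader(ds, pin_memory=True, pin_memory_device="cuda")

    (batch,) = next(iter(pinned))
    print(batch.is_pinned())  # True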
2023-01-11T21:27:03.5201137Z test_sequential_batch (__main__.TestDictDataLoader) ... ok (0.034s)
2023-01-11T21:27:03.5201972Z test_ind_worker_queue (__main__.TestIndividualWorkerQueue) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/68643 for platform(s) macos, linux, win. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s)
2023-01-11T21:27:03.5202593Z test_dataloader_with_namedtuple (__main__.TestNamedTupleDataLoader) ... ok (0.002s)
2023-01-11T21:27:03.5202971Z test_set_affinity_in_worker_init (__main__.TestSetAffinity) ... ok (0.057s)
2023-01-11T21:27:03.5203287Z test_shuffle_pin_memory (__main__.TestStringDataLoader) ... ok (0.075s)
2023-01-11T21:27:03.5203586Z test_getitem (__main__.TestTensorDataset) ... ok (0.005s)
2023-01-11T21:27:03.5203869Z test_getitem_1d (__main__.TestTensorDataset) ... ok (0.004s)
2023-01-11T21:27:03.5204139Z test_len (__main__.TestTensorDataset) ... ok (0.001s)
2023-01-11T21:27:03.5204420Z test_many_tensors (__main__.TestTensorDataset) ... ok (0.003s)
2023-01-11T21:27:03.5204712Z test_single_tensor (__main__.TestTensorDataset) ... ok (0.001s)
2023-01-11T21:27:03.5204887Z 
2023-01-11T21:27:03.5205084Z ----------------------------------------------------------------------
2023-01-11T21:27:03.5205346Z Ran 164 tests in 131.135s
2023-01-11T21:27:03.5205469Z 
2023-01-11T21:27:03.5205549Z OK (skipped=5)
2023-01-11T21:27:03.5205664Z 
2023-01-11T21:27:03.5205755Z Generating XML reports...
2023-01-11T21:27:03.5206171Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestConcatDataset-20230111212451.xml
2023-01-11T21:27:03.5206701Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestCustomPinFn-20230111212451.xml
2023-01-11T21:27:03.5207214Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDataLoader-20230111212451.xml
2023-01-11T21:27:03.5207760Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDataLoaderPersistentWorkers-20230111212451.xml
2023-01-11T21:27:03.5208334Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDatasetRandomSplit-20230111212451.xml
2023-01-11T21:27:03.5208912Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestDictDataLoader-20230111212451.xml
2023-01-11T21:27:03.5209454Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestNamedTupleDataLoader-20230111212451.xml
2023-01-11T21:27:03.5209979Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestSetAffinity-20230111212451.xml
2023-01-11T21:27:03.5210505Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestStringDataLoader-20230111212451.xml
2023-01-11T21:27:03.5211027Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestTensorDataset-20230111212451.xml
2023-01-11T21:27:03.5211601Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-IntegrationTestDataLoaderDataPipe-20230111212451.xml
2023-01-11T21:27:03.5212151Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestConvAfterFork-20230111212451.xml
2023-01-11T21:27:03.5212699Z Generated XML report: test-reports/python-unittest/test_dataloader/TEST-TestIndividualWorkerQueue-20230111212451.xml
2023-01-11T21:27:03.5212963Z 
2023-01-11T21:27:03.5213353Z ##[endgroup]
2023-01-11T21:27:03.5213741Z FINISHED PRINTING LOG FILE of test_dataloader (/var/lib/jenkins/workspace/test/test-reports/test_dataloader_rmt2ijzu)
2023-01-11T21:27:03.5213963Z 
2023-01-11T21:27:05.4220663Z Ignoring disabled issues: []
2023-01-11T21:27:05.4460874Z Running test_functional_optim ... [2023-01-11 21:27:05.445528]
2023-01-11T21:27:05.4462281Z Executing ['/opt/conda/bin/python', '-bb', 'test_functional_optim.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:05.445902]
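The Executing line above shows how the harness launches each test file: the interpreter's -bb flag turns bytes/str comparisons into errors, and the two --import-* flags are what produce the "loaded 76 slow tests" / "loaded 210 disabled tests" warnings throughout this log. A sketch of reproducing one such invocation locally; the working directory is an assumption, not something the log states:

    import subprocess
    import sys

    subprocess.run(
        [sys.executable, "-bb", "test_functional_optim.py", "-v",
         "--import-slow-tests", "--import-disabled-tests"],
        cwd="pytorch/test",  # assumed location of the checkout's test dir
        check=True,
    )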
2023-01-11T21:27:08.5022684Z 
2023-01-11T21:27:08.5023166Z Expand the folded group to see the log file of test_functional_optim
2023-01-11T21:27:08.5024152Z ##[group]PRINTING LOG FILE of test_functional_optim (/var/lib/jenkins/workspace/test/test-reports/test_functional_optim_xasvohng)
2023-01-11T21:27:08.5024517Z 
2023-01-11T21:27:08.5024653Z Running tests...
2023-01-11T21:27:08.5025166Z ----------------------------------------------------------------------
2023-01-11T21:27:08.5025568Z Test results will be stored in test-reports/python-unittest/test_functional_optim
2023-01-11T21:27:08.5025938Z test_functional_optim_parity_adam (__main__.TestFunctionalOptimParity) ... ok (1.129s)
2023-01-11T21:27:08.5026518Z test_functional_optim_parity_adam_w (__main__.TestFunctionalOptimParity) ... ok (0.020s)
2023-01-11T21:27:08.5026890Z test_functional_optim_parity_sgd (__main__.TestFunctionalOptimParity) ... ok (0.018s)
2023-01-11T21:27:08.5027246Z test_functional_optim_registration (__main__.TestFunctionalOptimParity) ... ok (0.000s)
2023-01-11T21:27:08.5027452Z 
2023-01-11T21:27:08.5027652Z ----------------------------------------------------------------------
2023-01-11T21:27:08.5027912Z Ran 4 tests in 1.168s
2023-01-11T21:27:08.5028033Z 
2023-01-11T21:27:08.5028094Z OK
2023-01-11T21:27:08.5028194Z 
2023-01-11T21:27:08.5028294Z Generating XML reports...
2023-01-11T21:27:08.5028766Z Generated XML report: test-reports/python-unittest/test_functional_optim/TEST-TestFunctionalOptimParity-20230111212706.xml
2023-01-11T21:27:08.5029034Z 
2023-01-11T21:27:08.5029263Z ##[endgroup]
2023-01-11T21:27:08.5029672Z FINISHED PRINTING LOG FILE of test_functional_optim (/var/lib/jenkins/workspace/test/test-reports/test_functional_optim_xasvohng)
2023-01-11T21:27:08.5029908Z 
2023-01-11T21:27:10.4149796Z Ignoring disabled issues: []
2023-01-11T21:27:10.4383344Z Running test_fx_experimental ... [2023-01-11 21:27:10.437525]
2023-01-11T21:27:10.4383939Z Executing ['/opt/conda/bin/python', '-bb', 'test_fx_experimental.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:10.437870]
2023-01-11T21:27:27.6753342Z 
2023-01-11T21:27:27.6754112Z Expand the folded group to see the log file of test_fx_experimental
2023-01-11T21:27:27.6755158Z ##[group]PRINTING LOG FILE of test_fx_experimental (/var/lib/jenkins/workspace/test/test-reports/test_fx_experimental_bv6m_9hq)
2023-01-11T21:27:27.6755905Z 
2023-01-11T21:27:27.6756043Z Running tests...
2023-01-11T21:27:27.6756645Z ----------------------------------------------------------------------
2023-01-11T21:27:27.6757288Z Test results will be stored in test-reports/python-unittest/test_fx_experimental
2023-01-11T21:27:27.6757806Z test_annotate_getitem_node (__main__.TestFXExperimental) ... ok (0.008s)
2023-01-11T21:27:27.6759027Z test_annotate_returns_with_schema (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py:181: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
2023-01-11T21:27:27.6759948Z warnings.warn("The TorchScript type system doesn't support "
2023-01-11T21:27:27.6760289Z ok (1.169s)
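The jit/_check.py warning above concerns attributes initialized as empty containers in __init__, whose element type TorchScript cannot infer. A minimal sketch of the two remedies the warning itself names; the Good class is illustrative, not from the test file:

    from typing import List

    import torch

    class Good(torch.nn.Module):
        words: List[str]  # 1) class-body annotation

        def __init__(self):
            super().__init__()
            self.words = []
            # 2) or wrap the empty value so the scripter sees its type
            self.counts = torch.jit.Attribute([], List[int])

        def forward(self, w: str) -> int:
            self.words.append(w)
            return len(self.words)

    scripted = torch.jit.script(Good())  # compiles without the warning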
2023-01-11T21:27:27.6760635Z test_aot_based_partition (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6761097Z test_call_to_assert_no_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6761557Z test_call_to_assert_with_empty_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6762024Z test_call_to_assert_with_msg (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6762487Z test_call_to_assert_with_multiline_message (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6763185Z test_conv_bn_fusion (__main__.TestFXExperimental) ... ok (0.727s)
2023-01-11T21:27:27.6763680Z test_conv_bn_fusion_not_running_state (__main__.TestFXExperimental) ... ok (0.008s)
2023-01-11T21:27:27.6764158Z test_cost_aware_partition (__main__.TestFXExperimental) ... ok (0.011s)
2023-01-11T21:27:27.6764576Z test_fetch (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6765012Z test_find_single_partition (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6765456Z test_lack_of_devices (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6765904Z test_large_node_error (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6766348Z test_merge_matmuls (__main__.TestFXExperimental)
2023-01-11T21:27:27.6766717Z A collection of test cases for torch.fx.experimental.merge_matmul, ... ok (0.025s)
2023-01-11T21:27:27.6767148Z test_meta_tracer (__main__.TestFXExperimental) ... ok (0.017s)
2023-01-11T21:27:27.6767474Z test_normalize_args (__main__.TestFXExperimental) ... ok (0.466s)
2023-01-11T21:27:27.6768139Z test_normalize_args_perserve_type (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/fx/operator_schemas.py:207: UserWarning: Does not support nested parametric types, got typing.List[~t]. Please file a bug.
2023-01-11T21:27:27.6768553Z warnings.warn(
2023-01-11T21:27:27.6768745Z ok (0.005s)
2023-01-11T21:27:27.6768999Z test_normalize_args_preserve_meta (__main__.TestFXExperimental) ... ok (0.004s)
2023-01-11T21:27:27.6769341Z test_normalize_binary_operators (__main__.TestFXExperimental) ... ok (0.047s)
2023-01-11T21:27:27.6769667Z test_normalize_modules_exhaustive (__main__.TestFXExperimental)
2023-01-11T21:27:27.6770468Z Exhaustively test `Node.normalized_arguments` on all standard ... /opt/conda/lib/python3.10/site-packages/torch/nn/modules/conv.py:309: UserWarning: Using padding='same' with even kernel lengths and odd dilation may require a zero-padded copy of the input be created (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Convolution.cpp:997.)
2023-01-11T21:27:27.6771035Z return F.conv1d(input, weight, bias, self.stride,
2023-01-11T21:27:27.6771258Z ok (1.404s)
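The conv.py warning above is about padding='same' with an even kernel length: the required padding is then asymmetric, so a zero-padded copy of the input may be created. A minimal sketch of the triggering and non-triggering cases:

    import torch
    import torch.nn as nn

    x = torch.randn(1, 3, 16)

    even = nn.Conv1d(3, 8, kernel_size=4, padding="same")  # warns; may copy input
    odd = nn.Conv1d(3, 8, kernel_size=3, padding="same")   # pads symmetrically

    print(even(x).shape, odd(x).shape)  # both preserve the length of 16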
2023-01-11T21:27:27.6771517Z test_optimize_for_inference_cpu (__main__.TestFXExperimental) ... ok (0.229s)
2023-01-11T21:27:27.6771854Z test_optimize_for_inference_cpu_torchvision (__main__.TestFXExperimental) ... ok (7.288s)
2023-01-11T21:27:27.6772201Z test_partition_device_mapping (__main__.TestFXExperimental) ... ok (0.007s)
2023-01-11T21:27:27.6772562Z test_partition_latency (__main__.TestFXExperimental) ... ok (0.006s)
2023-01-11T21:27:27.6772882Z test_partition_node_manipulation (__main__.TestFXExperimental) ... ok (0.004s)
2023-01-11T21:27:27.6773223Z test_replace_target_nodes_with (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6773537Z test_saturate_host (__main__.TestFXExperimental) ... [0, 4]
2023-01-11T21:27:27.6773765Z [1, 2]
2023-01-11T21:27:27.6773929Z ok (0.005s)
2023-01-11T21:27:27.6774179Z test_size_based_partition (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6774881Z test_sparse_nn_partition (__main__.TestFXExperimental) ... ok (0.103s)
2023-01-11T21:27:27.6775205Z test_split_module_default_arg (__main__.TestFXExperimental) ... ok (0.006s)
2023-01-11T21:27:27.6775544Z test_split_module_kwargs_expansion (__main__.TestFXExperimental) ... ok (0.003s)
2023-01-11T21:27:27.6775881Z test_split_qualname_mapping (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6776198Z test_subgraph_creation (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6776523Z test_subgraph_trivial_resnet (__main__.TestFXExperimental) ... ok (0.152s)
2023-01-11T21:27:27.6776848Z test_subgraph_uniquename (__main__.TestFXExperimental) ... ok (0.005s)
2023-01-11T21:27:27.6777481Z test_to_folder (__main__.TestFXExperimental) ... /opt/conda/lib/python3.10/site-packages/torch/fx/graph_module.py:476: UserWarning: Was not able to save the following children modules as reprs -saved as pickled files instead: ['seq']
2023-01-11T21:27:27.6777984Z warnings.warn("Was not able to save the following children modules as reprs -"
2023-01-11T21:27:27.6778240Z ok (0.008s)
2023-01-11T21:27:27.6778518Z test_traceable_function_with_nonstandard_name (__main__.TestFXExperimental) ... ok (0.002s)
2023-01-11T21:27:27.6778840Z test_type_matches (__main__.TestFXExperimental) ... ok (0.002s)
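The graph_module.py warning above comes from GraphModule.to_folder, which dumps a traced module as Python source but falls back to pickle files for submodules it cannot reproduce as a constructor call (here a child named 'seq'). A minimal sketch, with hypothetical module and folder names:

    import torch.nn as nn
    from torch.fx import symbolic_trace

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.seq = nn.Sequential(nn.Linear(4, 4), nn.ReLU())

        def forward(self, x):
            return self.seq(x)

    gm = symbolic_trace(Net())
    gm.to_folder("exported_net", "ExportedNet")  # warns: ['seq'] is pickled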
2023-01-11T21:27:27.6779179Z test_normalize_args_op_overload_cuda (__main__.TestNormalizeOperatorsCUDA) ... ok (0.002s)
2023-01-11T21:27:27.6779592Z test_normalize_operator_exhaustive_H_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780022Z test_normalize_operator_exhaustive_T_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780527Z test_normalize_operator_exhaustive___getitem___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6780979Z test_normalize_operator_exhaustive___radd___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6781418Z test_normalize_operator_exhaustive___rdiv___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6781858Z test_normalize_operator_exhaustive___rmatmul___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6782290Z test_normalize_operator_exhaustive___rmod___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6782727Z test_normalize_operator_exhaustive___rmul___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6783167Z test_normalize_operator_exhaustive___rpow___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6783605Z test_normalize_operator_exhaustive___rsub___cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784049Z test_normalize_operator_exhaustive__native_batch_norm_legit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784519Z test_normalize_operator_exhaustive__softmax_backward_data_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6784969Z test_normalize_operator_exhaustive_abs_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6785456Z test_normalize_operator_exhaustive_acos_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6785892Z test_normalize_operator_exhaustive_acosh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6786334Z test_normalize_operator_exhaustive_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6786822Z test_normalize_operator_exhaustive_addbmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6787270Z test_normalize_operator_exhaustive_addcdiv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6787710Z test_normalize_operator_exhaustive_addcmul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6788159Z test_normalize_operator_exhaustive_addmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6788613Z test_normalize_operator_exhaustive_addmm_decomposed_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789062Z test_normalize_operator_exhaustive_addmv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789492Z test_normalize_operator_exhaustive_addr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6789929Z test_normalize_operator_exhaustive_all_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6790372Z test_normalize_operator_exhaustive_allclose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6790817Z test_normalize_operator_exhaustive_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6791244Z test_normalize_operator_exhaustive_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6791715Z test_normalize_operator_exhaustive_aminmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6792161Z test_normalize_operator_exhaustive_angle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6792598Z test_normalize_operator_exhaustive_any_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793031Z test_normalize_operator_exhaustive_arange_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793476Z test_normalize_operator_exhaustive_argmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6793929Z test_normalize_operator_exhaustive_argmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6794372Z test_normalize_operator_exhaustive_argsort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6794810Z test_normalize_operator_exhaustive_argwhere_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6795256Z test_normalize_operator_exhaustive_as_strided_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6795723Z test_normalize_operator_exhaustive_as_strided_partial_views_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6796198Z test_normalize_operator_exhaustive_as_strided_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6796683Z test_normalize_operator_exhaustive_asin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797112Z test_normalize_operator_exhaustive_asinh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797556Z test_normalize_operator_exhaustive_atan2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6797995Z test_normalize_operator_exhaustive_atan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6798439Z test_normalize_operator_exhaustive_atanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6798875Z test_normalize_operator_exhaustive_atleast_1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6799328Z test_normalize_operator_exhaustive_atleast_2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6799774Z test_normalize_operator_exhaustive_atleast_3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6800228Z test_normalize_operator_exhaustive_baddbmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6800667Z test_normalize_operator_exhaustive_bernoulli_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801111Z test_normalize_operator_exhaustive_bfloat16_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801558Z test_normalize_operator_exhaustive_block_diag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6801997Z test_normalize_operator_exhaustive_bmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6802427Z test_normalize_operator_exhaustive_bool_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6802908Z test_normalize_operator_exhaustive_broadcast_shapes_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6803374Z test_normalize_operator_exhaustive_broadcast_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6803832Z test_normalize_operator_exhaustive_broadcast_to_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6804276Z test_normalize_operator_exhaustive_bucketize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6804719Z test_normalize_operator_exhaustive_byte_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6805173Z test_normalize_operator_exhaustive_cartesian_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6805624Z test_normalize_operator_exhaustive_cat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6806058Z test_normalize_operator_exhaustive_cdist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6806501Z test_normalize_operator_exhaustive_cdouble_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6807390Z test_normalize_operator_exhaustive_cfloat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6807850Z test_normalize_operator_exhaustive_chalf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6808288Z test_normalize_operator_exhaustive_char_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6808734Z test_normalize_operator_exhaustive_cholesky_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6809188Z test_normalize_operator_exhaustive_cholesky_inverse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6809642Z test_normalize_operator_exhaustive_cholesky_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810089Z test_normalize_operator_exhaustive_chunk_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810523Z test_normalize_operator_exhaustive_clamp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6810973Z test_normalize_operator_exhaustive_clamp_max_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6811417Z test_normalize_operator_exhaustive_clamp_min_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6811860Z test_normalize_operator_exhaustive_clone_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6812307Z test_normalize_operator_exhaustive_column_stack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6812760Z test_normalize_operator_exhaustive_combinations_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6813201Z test_normalize_operator_exhaustive_complex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6813647Z test_normalize_operator_exhaustive_conj_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6814123Z test_normalize_operator_exhaustive_conj_physical_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6814855Z test_normalize_operator_exhaustive_constant_pad_nd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6815305Z test_normalize_operator_exhaustive_contiguous_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6815757Z test_normalize_operator_exhaustive_copysign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6816205Z test_normalize_operator_exhaustive_corrcoef_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6816649Z test_normalize_operator_exhaustive_cos_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817083Z test_normalize_operator_exhaustive_cosh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817528Z test_normalize_operator_exhaustive_count_nonzero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6817973Z test_normalize_operator_exhaustive_cov_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6818407Z test_normalize_operator_exhaustive_cross_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6818842Z test_normalize_operator_exhaustive_cummax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6819284Z test_normalize_operator_exhaustive_cummin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6819782Z test_normalize_operator_exhaustive_cumprod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6820231Z test_normalize_operator_exhaustive_cumsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6820681Z test_normalize_operator_exhaustive_cumulative_trapezoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6821138Z test_normalize_operator_exhaustive_deg2rad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6821577Z test_normalize_operator_exhaustive_diag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6822024Z test_normalize_operator_exhaustive_diag_embed_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:27:27.6822471Z test_normalize_operator_exhaustive_diagflat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6822926Z test_normalize_operator_exhaustive_diagonal_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6823378Z test_normalize_operator_exhaustive_diagonal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6823836Z test_normalize_operator_exhaustive_diagonal_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6824281Z test_normalize_operator_exhaustive_diff_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6824719Z test_normalize_operator_exhaustive_digamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6825163Z test_normalize_operator_exhaustive_dist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6825612Z test_normalize_operator_exhaustive_div_floor_rounding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6826103Z test_normalize_operator_exhaustive_div_no_rounding_mode_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6826573Z test_normalize_operator_exhaustive_div_trunc_rounding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827075Z test_normalize_operator_exhaustive_dot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827515Z test_normalize_operator_exhaustive_double_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6827955Z test_normalize_operator_exhaustive_dsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6828399Z test_normalize_operator_exhaustive_dstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6828843Z test_normalize_operator_exhaustive_einsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6829284Z test_normalize_operator_exhaustive_empty_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6829718Z test_normalize_operator_exhaustive_empty_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6830154Z test_normalize_operator_exhaustive_eq_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6830584Z test_normalize_operator_exhaustive_equal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831058Z test_normalize_operator_exhaustive_erf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831490Z test_normalize_operator_exhaustive_erfc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6831933Z test_normalize_operator_exhaustive_erfinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6832373Z test_normalize_operator_exhaustive_exp2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6832812Z test_normalize_operator_exhaustive_exp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6833247Z test_normalize_operator_exhaustive_expand_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6833698Z test_normalize_operator_exhaustive_expand_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6834140Z test_normalize_operator_exhaustive_expm1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6834582Z test_normalize_operator_exhaustive_eye_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835013Z test_normalize_operator_exhaustive_fft_fft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835457Z test_normalize_operator_exhaustive_fft_fft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6835900Z test_normalize_operator_exhaustive_fft_fftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6836350Z test_normalize_operator_exhaustive_fft_fftshift_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6836794Z test_normalize_operator_exhaustive_fft_hfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6837281Z test_normalize_operator_exhaustive_fft_hfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6837728Z test_normalize_operator_exhaustive_fft_hfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6838173Z test_normalize_operator_exhaustive_fft_ifft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6838605Z test_normalize_operator_exhaustive_fft_ifft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839046Z test_normalize_operator_exhaustive_fft_ifftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839500Z test_normalize_operator_exhaustive_fft_ifftshift_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6839953Z test_normalize_operator_exhaustive_fft_ihfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6840391Z test_normalize_operator_exhaustive_fft_ihfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6840838Z test_normalize_operator_exhaustive_fft_ihfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6841284Z test_normalize_operator_exhaustive_fft_irfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6841731Z test_normalize_operator_exhaustive_fft_irfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6842226Z test_normalize_operator_exhaustive_fft_irfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6842671Z test_normalize_operator_exhaustive_fft_rfft2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843113Z test_normalize_operator_exhaustive_fft_rfft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843554Z test_normalize_operator_exhaustive_fft_rfftn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6843983Z test_normalize_operator_exhaustive_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6844426Z test_normalize_operator_exhaustive_flatten_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6844870Z test_normalize_operator_exhaustive_flip_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6845317Z test_normalize_operator_exhaustive_fliplr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6845752Z test_normalize_operator_exhaustive_flipud_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6846195Z test_normalize_operator_exhaustive_float_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6846646Z test_normalize_operator_exhaustive_float_power_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6847092Z test_normalize_operator_exhaustive_floor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6847576Z test_normalize_operator_exhaustive_floor_divide_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848028Z test_normalize_operator_exhaustive_fmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848491Z test_normalize_operator_exhaustive_fmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6848923Z test_normalize_operator_exhaustive_fmod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6849349Z test_normalize_operator_exhaustive_frac_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6849787Z test_normalize_operator_exhaustive_frexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6850217Z test_normalize_operator_exhaustive_full_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6850662Z test_normalize_operator_exhaustive_full_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851099Z test_normalize_operator_exhaustive_gather_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851543Z test_normalize_operator_exhaustive_ge_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6851982Z test_normalize_operator_exhaustive_geqrf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6852427Z test_normalize_operator_exhaustive_gradient_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6852874Z test_normalize_operator_exhaustive_grid_sampler_2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6853320Z test_normalize_operator_exhaustive_gt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6853792Z test_normalize_operator_exhaustive_half_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6854238Z test_normalize_operator_exhaustive_heaviside_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6854879Z test_normalize_operator_exhaustive_histc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6855325Z test_normalize_operator_exhaustive_hsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6855764Z test_normalize_operator_exhaustive_hstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6856199Z test_normalize_operator_exhaustive_hypot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6856625Z test_normalize_operator_exhaustive_i0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857082Z test_normalize_operator_exhaustive_igamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857551Z test_normalize_operator_exhaustive_igammac_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6857995Z test_normalize_operator_exhaustive_index_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6858437Z test_normalize_operator_exhaustive_index_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6858885Z test_normalize_operator_exhaustive_index_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6859332Z test_normalize_operator_exhaustive_index_put_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6859782Z test_normalize_operator_exhaustive_index_reduce_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6860290Z test_normalize_operator_exhaustive_index_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6860728Z test_normalize_operator_exhaustive_inner_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6861159Z test_normalize_operator_exhaustive_int_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6861592Z test_normalize_operator_exhaustive_isclose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862041Z test_normalize_operator_exhaustive_isfinite_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862474Z test_normalize_operator_exhaustive_isin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6862909Z test_normalize_operator_exhaustive_isinf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6863344Z test_normalize_operator_exhaustive_isnan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6863780Z test_normalize_operator_exhaustive_isneginf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6864216Z test_normalize_operator_exhaustive_isposinf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6864656Z test_normalize_operator_exhaustive_isreal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6865164Z test_normalize_operator_exhaustive_jiterator_2inputs_2outputs_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6865654Z test_normalize_operator_exhaustive_jiterator_4inputs_with_extra_args_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6866127Z test_normalize_operator_exhaustive_jiterator_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6866601Z test_normalize_operator_exhaustive_jiterator_binary_return_by_ref_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867075Z test_normalize_operator_exhaustive_jiterator_unary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867522Z test_normalize_operator_exhaustive_kron_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6867964Z test_normalize_operator_exhaustive_kthvalue_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6868407Z test_normalize_operator_exhaustive_ldexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6868858Z test_normalize_operator_exhaustive_le_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6869298Z test_normalize_operator_exhaustive_lerp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6869731Z test_normalize_operator_exhaustive_lgamma_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6870183Z test_normalize_operator_exhaustive_linalg_cholesky_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6870648Z test_normalize_operator_exhaustive_linalg_cholesky_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6871106Z test_normalize_operator_exhaustive_linalg_cond_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:27:27.6871581Z test_normalize_operator_exhaustive_linalg_cross_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872033Z test_normalize_operator_exhaustive_linalg_det_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872493Z test_normalize_operator_exhaustive_linalg_det_singular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6872951Z test_normalize_operator_exhaustive_linalg_eig_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6873393Z test_normalize_operator_exhaustive_linalg_eigh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6873851Z test_normalize_operator_exhaustive_linalg_eigvals_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6874315Z test_normalize_operator_exhaustive_linalg_eigvalsh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6874786Z test_normalize_operator_exhaustive_linalg_householder_product_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6875245Z test_normalize_operator_exhaustive_linalg_inv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6875692Z test_normalize_operator_exhaustive_linalg_inv_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6876149Z test_normalize_operator_exhaustive_linalg_ldl_factor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6876646Z test_normalize_operator_exhaustive_linalg_ldl_factor_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6877103Z test_normalize_operator_exhaustive_linalg_ldl_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6877561Z test_normalize_operator_exhaustive_linalg_lstsq_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878025Z test_normalize_operator_exhaustive_linalg_lstsq_grad_oriented_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878486Z test_normalize_operator_exhaustive_linalg_lu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6878927Z test_normalize_operator_exhaustive_linalg_lu_factor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6879394Z test_normalize_operator_exhaustive_linalg_lu_factor_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6879855Z test_normalize_operator_exhaustive_linalg_lu_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6880318Z test_normalize_operator_exhaustive_linalg_matrix_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6880779Z test_normalize_operator_exhaustive_linalg_matrix_power_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6881230Z test_normalize_operator_exhaustive_linalg_matrix_rank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6881698Z test_normalize_operator_exhaustive_linalg_matrix_rank_hermitian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6882167Z test_normalize_operator_exhaustive_linalg_multi_dot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6882648Z test_normalize_operator_exhaustive_linalg_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6883114Z test_normalize_operator_exhaustive_linalg_norm_subgradients_at_zero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6883584Z test_normalize_operator_exhaustive_linalg_pinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6884045Z test_normalize_operator_exhaustive_linalg_pinv_hermitian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6884560Z test_normalize_operator_exhaustive_linalg_pinv_singular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:27:27.6885051Z test_normalize_operator_exhaustive_linalg_qr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6885503Z test_normalize_operator_exhaustive_linalg_slogdet_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6885960Z test_normalize_operator_exhaustive_linalg_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6886419Z test_normalize_operator_exhaustive_linalg_solve_ex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6886925Z test_normalize_operator_exhaustive_linalg_solve_triangular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6887417Z test_normalize_operator_exhaustive_linalg_svd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6887870Z test_normalize_operator_exhaustive_linalg_svdvals_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6888340Z test_normalize_operator_exhaustive_linalg_tensorinv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6888795Z test_normalize_operator_exhaustive_linalg_tensorsolve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6889254Z test_normalize_operator_exhaustive_linalg_vander_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6889708Z test_normalize_operator_exhaustive_linalg_vecdot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6890175Z test_normalize_operator_exhaustive_linalg_vector_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6890625Z test_normalize_operator_exhaustive_linspace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891076Z test_normalize_operator_exhaustive_log10_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891513Z test_normalize_operator_exhaustive_log1p_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6891954Z test_normalize_operator_exhaustive_log2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6892382Z test_normalize_operator_exhaustive_log_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6892828Z test_normalize_operator_exhaustive_log_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6893294Z test_normalize_operator_exhaustive_log_softmax_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6893788Z test_normalize_operator_exhaustive_logaddexp2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6894232Z test_normalize_operator_exhaustive_logaddexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6894932Z test_normalize_operator_exhaustive_logcumsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6895381Z test_normalize_operator_exhaustive_logdet_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6895825Z test_normalize_operator_exhaustive_logical_and_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6896267Z test_normalize_operator_exhaustive_logical_not_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6896719Z test_normalize_operator_exhaustive_logical_or_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6897165Z test_normalize_operator_exhaustive_logical_xor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6897602Z test_normalize_operator_exhaustive_logit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898043Z test_normalize_operator_exhaustive_logspace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898479Z test_normalize_operator_exhaustive_logsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6898978Z test_normalize_operator_exhaustive_long_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6899415Z test_normalize_operator_exhaustive_lt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6899846Z test_normalize_operator_exhaustive_lu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6900283Z test_normalize_operator_exhaustive_lu_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6900733Z test_normalize_operator_exhaustive_lu_unpack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6901175Z test_normalize_operator_exhaustive_mH_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6901612Z test_normalize_operator_exhaustive_mT_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902054Z test_normalize_operator_exhaustive_masked_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902506Z test_normalize_operator_exhaustive_masked_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6902963Z test_normalize_operator_exhaustive_masked_argmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6903419Z test_normalize_operator_exhaustive_masked_argmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6903871Z test_normalize_operator_exhaustive_masked_cumprod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6904324Z test_normalize_operator_exhaustive_masked_cumsum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6904774Z test_normalize_operator_exhaustive_masked_fill_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6905266Z test_normalize_operator_exhaustive_masked_log_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6905727Z test_normalize_operator_exhaustive_masked_logaddexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6906185Z test_normalize_operator_exhaustive_masked_logsumexp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6906642Z test_normalize_operator_exhaustive_masked_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6907096Z test_normalize_operator_exhaustive_masked_median_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6907541Z test_normalize_operator_exhaustive_masked_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908002Z test_normalize_operator_exhaustive_masked_normalize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908456Z test_normalize_operator_exhaustive_masked_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6908912Z test_normalize_operator_exhaustive_masked_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6909360Z test_normalize_operator_exhaustive_masked_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6909816Z test_normalize_operator_exhaustive_masked_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6910301Z test_normalize_operator_exhaustive_masked_softmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6910759Z test_normalize_operator_exhaustive_masked_std_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6911202Z test_normalize_operator_exhaustive_masked_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6911648Z test_normalize_operator_exhaustive_masked_var_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912092Z test_normalize_operator_exhaustive_matmul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912537Z test_normalize_operator_exhaustive_matrix_exp_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6912979Z test_normalize_operator_exhaustive_max_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6913449Z test_normalize_operator_exhaustive_max_pool2d_with_indices_backward_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6913935Z test_normalize_operator_exhaustive_max_reduction_no_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6914404Z test_normalize_operator_exhaustive_max_reduction_with_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6914858Z test_normalize_operator_exhaustive_maximum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6915307Z test_normalize_operator_exhaustive_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6915749Z test_normalize_operator_exhaustive_median_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6916207Z test_normalize_operator_exhaustive_meshgrid_list_of_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6916717Z test_normalize_operator_exhaustive_meshgrid_variadic_tensors_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6917176Z test_normalize_operator_exhaustive_min_binary_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6917633Z test_normalize_operator_exhaustive_min_reduction_no_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918103Z test_normalize_operator_exhaustive_min_reduction_with_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918553Z test_normalize_operator_exhaustive_minimum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6918992Z test_normalize_operator_exhaustive_mm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6919432Z test_normalize_operator_exhaustive_mode_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6919874Z test_normalize_operator_exhaustive_movedim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6920309Z test_normalize_operator_exhaustive_msort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6920748Z test_normalize_operator_exhaustive_mul_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6921191Z test_normalize_operator_exhaustive_multinomial_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6921667Z test_normalize_operator_exhaustive_mv_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6922114Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6922582Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923048Z test_normalize_operator_exhaustive_mvlgamma_mvlgamma_p_5_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923508Z test_normalize_operator_exhaustive_nan_to_num_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6923961Z test_normalize_operator_exhaustive_nanmean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6924404Z test_normalize_operator_exhaustive_nanmedian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6924857Z test_normalize_operator_exhaustive_nanquantile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6925303Z test_normalize_operator_exhaustive_nansum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6925752Z test_normalize_operator_exhaustive_narrow_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6926192Z test_normalize_operator_exhaustive_narrow_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6926642Z test_normalize_operator_exhaustive_native_batch_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6927115Z test_normalize_operator_exhaustive_native_dropout_backward_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6927662Z test_normalize_operator_exhaustive_native_layer_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928101Z test_normalize_operator_exhaustive_ne_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928542Z test_normalize_operator_exhaustive_neg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6928985Z test_normalize_operator_exhaustive_new_empty_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6929440Z test_normalize_operator_exhaustive_new_empty_strided_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6929891Z test_normalize_operator_exhaustive_new_full_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6930341Z test_normalize_operator_exhaustive_new_ones_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6930791Z test_normalize_operator_exhaustive_new_zeros_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6931241Z test_normalize_operator_exhaustive_nextafter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6931706Z test_normalize_operator_exhaustive_nn_functional__scaled_dot_product_attention_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s) 2023-01-11T21:27:27.6932205Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6932736Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6933233Z test_normalize_operator_exhaustive_nn_functional_adaptive_avg_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6933711Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6934194Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6934834Z test_normalize_operator_exhaustive_nn_functional_adaptive_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6935318Z test_normalize_operator_exhaustive_nn_functional_alpha_dropout_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6935800Z test_normalize_operator_exhaustive_nn_functional_avg_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6936265Z test_normalize_operator_exhaustive_nn_functional_avg_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6936735Z test_normalize_operator_exhaustive_nn_functional_avg_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6937205Z test_normalize_operator_exhaustive_nn_functional_batch_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6937685Z test_normalize_operator_exhaustive_nn_functional_batch_norm_without_cudnn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6938160Z test_normalize_operator_exhaustive_nn_functional_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6938637Z test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6939255Z test_normalize_operator_exhaustive_nn_functional_binary_cross_entropy_with_logits_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6939812Z test_normalize_operator_exhaustive_nn_functional_celu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6940266Z test_normalize_operator_exhaustive_nn_functional_conv1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6940727Z test_normalize_operator_exhaustive_nn_functional_conv2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6941201Z test_normalize_operator_exhaustive_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6941688Z test_normalize_operator_exhaustive_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6942158Z test_normalize_operator_exhaustive_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6942642Z test_normalize_operator_exhaustive_nn_functional_cosine_embedding_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6943130Z test_normalize_operator_exhaustive_nn_functional_cosine_similarity_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6943610Z test_normalize_operator_exhaustive_nn_functional_cross_entropy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6944138Z test_normalize_operator_exhaustive_nn_functional_ctc_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6944609Z test_normalize_operator_exhaustive_nn_functional_dropout2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6945080Z test_normalize_operator_exhaustive_nn_functional_dropout3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6945553Z test_normalize_operator_exhaustive_nn_functional_dropout_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946022Z test_normalize_operator_exhaustive_nn_functional_elu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946489Z test_normalize_operator_exhaustive_nn_functional_embedding_bag_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6946966Z test_normalize_operator_exhaustive_nn_functional_embedding_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6965725Z test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_with_train_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6966401Z test_normalize_operator_exhaustive_nn_functional_feature_alpha_dropout_without_train_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6967076Z test_normalize_operator_exhaustive_nn_functional_fractional_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6967767Z test_normalize_operator_exhaustive_nn_functional_fractional_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6968466Z test_normalize_operator_exhaustive_nn_functional_gaussian_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6969241Z test_normalize_operator_exhaustive_nn_functional_gelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6969898Z test_normalize_operator_exhaustive_nn_functional_glu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6970555Z test_normalize_operator_exhaustive_nn_functional_grid_sample_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6971230Z test_normalize_operator_exhaustive_nn_functional_group_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6971897Z test_normalize_operator_exhaustive_nn_functional_hardshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:27:27.6972577Z test_normalize_operator_exhaustive_nn_functional_hardsigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... 
skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6973257Z test_normalize_operator_exhaustive_nn_functional_hardswish_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6973915Z test_normalize_operator_exhaustive_nn_functional_hardtanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6974734Z test_normalize_operator_exhaustive_nn_functional_hinge_embedding_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6975251Z test_normalize_operator_exhaustive_nn_functional_huber_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6975828Z test_normalize_operator_exhaustive_nn_functional_instance_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6976297Z test_normalize_operator_exhaustive_nn_functional_interpolate_area_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6976837Z test_normalize_operator_exhaustive_nn_functional_interpolate_bicubic_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6977328Z test_normalize_operator_exhaustive_nn_functional_interpolate_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6977814Z test_normalize_operator_exhaustive_nn_functional_interpolate_linear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6978291Z test_normalize_operator_exhaustive_nn_functional_interpolate_nearest_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6978782Z test_normalize_operator_exhaustive_nn_functional_interpolate_trilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6979257Z test_normalize_operator_exhaustive_nn_functional_kl_div_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6979713Z test_normalize_operator_exhaustive_nn_functional_l1_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6980172Z test_normalize_operator_exhaustive_nn_functional_layer_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6980626Z test_normalize_operator_exhaustive_nn_functional_leaky_relu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6981086Z test_normalize_operator_exhaustive_nn_functional_linear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6981557Z test_normalize_operator_exhaustive_nn_functional_local_response_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6982067Z test_normalize_operator_exhaustive_nn_functional_logsigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6982535Z test_normalize_operator_exhaustive_nn_functional_margin_ranking_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983007Z test_normalize_operator_exhaustive_nn_functional_max_pool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983465Z test_normalize_operator_exhaustive_nn_functional_max_pool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6983927Z test_normalize_operator_exhaustive_nn_functional_max_pool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6984384Z test_normalize_operator_exhaustive_nn_functional_max_unpool1d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6984865Z test_normalize_operator_exhaustive_nn_functional_max_unpool1d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6985336Z test_normalize_operator_exhaustive_nn_functional_max_unpool2d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6985807Z test_normalize_operator_exhaustive_nn_functional_max_unpool2d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6986269Z test_normalize_operator_exhaustive_nn_functional_max_unpool3d_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6986773Z test_normalize_operator_exhaustive_nn_functional_max_unpool3d_grad_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6987242Z test_normalize_operator_exhaustive_nn_functional_mish_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6987702Z test_normalize_operator_exhaustive_nn_functional_mse_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6988167Z test_normalize_operator_exhaustive_nn_functional_multi_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6988640Z test_normalize_operator_exhaustive_nn_functional_multilabel_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6989131Z test_normalize_operator_exhaustive_nn_functional_multilabel_soft_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6989610Z test_normalize_operator_exhaustive_nn_functional_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6990072Z test_normalize_operator_exhaustive_nn_functional_normalize_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6990529Z test_normalize_operator_exhaustive_nn_functional_pad_circular_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991002Z test_normalize_operator_exhaustive_nn_functional_pad_constant_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991470Z test_normalize_operator_exhaustive_nn_functional_pad_reflect_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6991946Z test_normalize_operator_exhaustive_nn_functional_pad_replicate_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6992446Z test_normalize_operator_exhaustive_nn_functional_pairwise_distance_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6992915Z test_normalize_operator_exhaustive_nn_functional_pdist_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6993387Z test_normalize_operator_exhaustive_nn_functional_pixel_shuffle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6993860Z test_normalize_operator_exhaustive_nn_functional_pixel_unshuffle_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6994326Z test_normalize_operator_exhaustive_nn_functional_poisson_nll_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6994795Z test_normalize_operator_exhaustive_nn_functional_prelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6995251Z test_normalize_operator_exhaustive_nn_functional_relu6_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6995707Z test_normalize_operator_exhaustive_nn_functional_relu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6996165Z test_normalize_operator_exhaustive_nn_functional_rrelu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6996611Z test_normalize_operator_exhaustive_nn_functional_selu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6997060Z test_normalize_operator_exhaustive_nn_functional_silu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6997557Z test_normalize_operator_exhaustive_nn_functional_smooth_l1_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998033Z test_normalize_operator_exhaustive_nn_functional_soft_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998493Z test_normalize_operator_exhaustive_nn_functional_softmin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6998963Z test_normalize_operator_exhaustive_nn_functional_softmin_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6999435Z test_normalize_operator_exhaustive_nn_functional_softplus_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.6999903Z test_normalize_operator_exhaustive_nn_functional_softshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7000359Z test_normalize_operator_exhaustive_nn_functional_softsign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7000824Z test_normalize_operator_exhaustive_nn_functional_tanhshrink_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7001289Z test_normalize_operator_exhaustive_nn_functional_threshold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7001760Z test_normalize_operator_exhaustive_nn_functional_triplet_margin_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7002247Z test_normalize_operator_exhaustive_nn_functional_triplet_margin_with_distance_loss_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7002729Z test_normalize_operator_exhaustive_nn_functional_unfold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7003224Z test_normalize_operator_exhaustive_nn_functional_upsample_bilinear_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7003707Z test_normalize_operator_exhaustive_nn_functional_upsample_nearest_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7004170Z test_normalize_operator_exhaustive_nonzero_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7004597Z test_normalize_operator_exhaustive_norm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005034Z test_normalize_operator_exhaustive_norm_fro_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005473Z test_normalize_operator_exhaustive_norm_inf_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7005909Z test_normalize_operator_exhaustive_norm_nuc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7006333Z test_normalize_operator_exhaustive_normal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7006776Z test_normalize_operator_exhaustive_normal_number_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7007215Z test_normalize_operator_exhaustive_ones_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7007654Z test_normalize_operator_exhaustive_ones_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008115Z test_normalize_operator_exhaustive_ormqr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008546Z test_normalize_operator_exhaustive_outer_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7008992Z test_normalize_operator_exhaustive_pca_lowrank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7009434Z test_normalize_operator_exhaustive_permute_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7009870Z test_normalize_operator_exhaustive_pinverse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7010310Z test_normalize_operator_exhaustive_polar_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7010762Z test_normalize_operator_exhaustive_polygamma_polygamma_n_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7011232Z test_normalize_operator_exhaustive_polygamma_polygamma_n_1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7011685Z test_normalize_operator_exhaustive_polygamma_polygamma_n_2_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7012138Z test_normalize_operator_exhaustive_polygamma_polygamma_n_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7012587Z test_normalize_operator_exhaustive_polygamma_polygamma_n_4_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7013033Z test_normalize_operator_exhaustive_positive_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7013462Z test_normalize_operator_exhaustive_pow_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7013897Z test_normalize_operator_exhaustive_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7014374Z test_normalize_operator_exhaustive_put_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015040Z test_normalize_operator_exhaustive_qr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015471Z test_normalize_operator_exhaustive_quantile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7015910Z test_normalize_operator_exhaustive_rad2deg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7016347Z test_normalize_operator_exhaustive_rand_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7016790Z test_normalize_operator_exhaustive_randint_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7017273Z test_normalize_operator_exhaustive_randint_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7017714Z test_normalize_operator_exhaustive_randn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7018152Z test_normalize_operator_exhaustive_randn_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7018589Z test_normalize_operator_exhaustive_ravel_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019014Z test_normalize_operator_exhaustive_real_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019522Z test_normalize_operator_exhaustive_reciprocal_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7019972Z test_normalize_operator_exhaustive_remainder_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7020416Z test_normalize_operator_exhaustive_renorm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7020844Z test_normalize_operator_exhaustive_repeat_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7021300Z test_normalize_operator_exhaustive_repeat_interleave_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7021753Z test_normalize_operator_exhaustive_reshape_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7022200Z test_normalize_operator_exhaustive_reshape_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7022633Z test_normalize_operator_exhaustive_resize__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023077Z test_normalize_operator_exhaustive_resize_as__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023524Z test_normalize_operator_exhaustive_resolve_conj_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7023974Z test_normalize_operator_exhaustive_resolve_neg_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7024409Z test_normalize_operator_exhaustive_roll_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7024843Z test_normalize_operator_exhaustive_rot90_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7025277Z test_normalize_operator_exhaustive_round_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7025806Z test_normalize_operator_exhaustive_round_decimals_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7026257Z test_normalize_operator_exhaustive_round_decimals_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7026713Z test_normalize_operator_exhaustive_round_decimals_neg_3_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7027224Z test_normalize_operator_exhaustive_rsqrt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7027714Z test_normalize_operator_exhaustive_rsub_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7028164Z test_normalize_operator_exhaustive_scalar_tensor_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7028617Z test_normalize_operator_exhaustive_scatter_add_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029058Z test_normalize_operator_exhaustive_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029512Z test_normalize_operator_exhaustive_scatter_reduce_amax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7029978Z test_normalize_operator_exhaustive_scatter_reduce_amin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7030443Z test_normalize_operator_exhaustive_scatter_reduce_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7030940Z test_normalize_operator_exhaustive_scatter_reduce_prod_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7031400Z test_normalize_operator_exhaustive_scatter_reduce_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7031860Z test_normalize_operator_exhaustive_searchsorted_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7032327Z test_normalize_operator_exhaustive_segment_reduce_lengths_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7032799Z test_normalize_operator_exhaustive_segment_reduce_offsets_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7033250Z test_normalize_operator_exhaustive_select_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7033701Z test_normalize_operator_exhaustive_select_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7034149Z test_normalize_operator_exhaustive_sgn_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7034586Z test_normalize_operator_exhaustive_short_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035020Z test_normalize_operator_exhaustive_sigmoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035462Z test_normalize_operator_exhaustive_sign_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7035916Z test_normalize_operator_exhaustive_signal_windows_bartlett_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7036387Z test_normalize_operator_exhaustive_signal_windows_blackman_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7036855Z test_normalize_operator_exhaustive_signal_windows_cosine_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7037360Z test_normalize_operator_exhaustive_signal_windows_exponential_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7037887Z test_normalize_operator_exhaustive_signal_windows_gaussian_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7038364Z test_normalize_operator_exhaustive_signal_windows_general_cosine_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7038841Z test_normalize_operator_exhaustive_signal_windows_general_hamming_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7039320Z test_normalize_operator_exhaustive_signal_windows_hamming_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7039780Z test_normalize_operator_exhaustive_signal_windows_hann_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7040247Z test_normalize_operator_exhaustive_signal_windows_kaiser_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7040706Z test_normalize_operator_exhaustive_signal_windows_nuttall_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7041158Z test_normalize_operator_exhaustive_signbit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7041595Z test_normalize_operator_exhaustive_sin_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042062Z test_normalize_operator_exhaustive_sinc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042495Z test_normalize_operator_exhaustive_sinh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7042937Z test_normalize_operator_exhaustive_slice_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7043381Z test_normalize_operator_exhaustive_slice_scatter_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7043828Z test_normalize_operator_exhaustive_softmax_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7044270Z test_normalize_operator_exhaustive_softmax_with_dtype_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7044718Z test_normalize_operator_exhaustive_sort_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7045173Z test_normalize_operator_exhaustive_sparse_sampled_addmm_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7045637Z test_normalize_operator_exhaustive_special_airy_ai_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7046086Z test_normalize_operator_exhaustive_special_bessel_j0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7046545Z test_normalize_operator_exhaustive_special_bessel_j1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047003Z test_normalize_operator_exhaustive_special_bessel_y0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047493Z test_normalize_operator_exhaustive_special_bessel_y1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7047985Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7048492Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_u_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7049271Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_v_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7049955Z test_normalize_operator_exhaustive_special_chebyshev_polynomial_w_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7050468Z test_normalize_operator_exhaustive_special_entr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7050919Z test_normalize_operator_exhaustive_special_erfcx_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7051394Z test_normalize_operator_exhaustive_special_hermite_polynomial_h_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7051881Z test_normalize_operator_exhaustive_special_hermite_polynomial_he_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7052350Z test_normalize_operator_exhaustive_special_i0e_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7052792Z test_normalize_operator_exhaustive_special_i1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7053240Z test_normalize_operator_exhaustive_special_i1e_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7053743Z test_normalize_operator_exhaustive_special_laguerre_polynomial_l_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7054399Z test_normalize_operator_exhaustive_special_legendre_polynomial_p_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7055147Z test_normalize_operator_exhaustive_special_log_ndtr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7055613Z test_normalize_operator_exhaustive_special_modified_bessel_i0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7056087Z test_normalize_operator_exhaustive_special_modified_bessel_i1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7056571Z test_normalize_operator_exhaustive_special_modified_bessel_k0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057048Z test_normalize_operator_exhaustive_special_modified_bessel_k1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057509Z test_normalize_operator_exhaustive_special_ndtr_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7057964Z test_normalize_operator_exhaustive_special_ndtri_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7058446Z test_normalize_operator_exhaustive_special_polygamma_special_polygamma_n_0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7058941Z test_normalize_operator_exhaustive_special_scaled_modified_bessel_k0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7059427Z test_normalize_operator_exhaustive_special_scaled_modified_bessel_k1_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7060163Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7060875Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_u_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7061573Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_v_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7062266Z test_normalize_operator_exhaustive_special_shifted_chebyshev_polynomial_w_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipping - testing takes an unreasonably long time, #79528 (0.002s)
2023-01-11T21:27:27.7062784Z test_normalize_operator_exhaustive_special_spherical_bessel_j0_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7063256Z test_normalize_operator_exhaustive_special_xlog1py_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7063718Z test_normalize_operator_exhaustive_special_zeta_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7064165Z test_normalize_operator_exhaustive_split_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7064606Z test_normalize_operator_exhaustive_split_list_args_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7065063Z test_normalize_operator_exhaustive_split_with_sizes_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7065557Z test_normalize_operator_exhaustive_sqrt_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066004Z test_normalize_operator_exhaustive_square_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066441Z test_normalize_operator_exhaustive_squeeze_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7066882Z test_normalize_operator_exhaustive_stack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7067321Z test_normalize_operator_exhaustive_std_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7067762Z test_normalize_operator_exhaustive_std_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7068213Z test_normalize_operator_exhaustive_std_mean_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7068671Z test_normalize_operator_exhaustive_std_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069113Z test_normalize_operator_exhaustive_stft_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069546Z test_normalize_operator_exhaustive_sub_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7069969Z test_normalize_operator_exhaustive_sum_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7070409Z test_normalize_operator_exhaustive_sum_to_size_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7070855Z test_normalize_operator_exhaustive_svd_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7071292Z test_normalize_operator_exhaustive_svd_lowrank_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7071758Z test_normalize_operator_exhaustive_symeig_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7072196Z test_normalize_operator_exhaustive_t_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7072642Z test_normalize_operator_exhaustive_take_along_dim_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073088Z test_normalize_operator_exhaustive_take_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073516Z test_normalize_operator_exhaustive_tan_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7073950Z test_normalize_operator_exhaustive_tanh_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7074397Z test_normalize_operator_exhaustive_tensor_split_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7074851Z test_normalize_operator_exhaustive_tensordot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7075283Z test_normalize_operator_exhaustive_tile_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7075717Z test_normalize_operator_exhaustive_to_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Skipped! (0.002s)
2023-01-11T21:27:27.7076161Z test_normalize_operator_exhaustive_to_sparse_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7076637Z test_normalize_operator_exhaustive_topk_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7077065Z test_normalize_operator_exhaustive_trace_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7077561Z test_normalize_operator_exhaustive_transpose_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078004Z test_normalize_operator_exhaustive_trapezoid_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078443Z test_normalize_operator_exhaustive_trapz_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7078887Z test_normalize_operator_exhaustive_triangular_solve_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7079337Z test_normalize_operator_exhaustive_tril_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7079767Z test_normalize_operator_exhaustive_triu_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7080217Z test_normalize_operator_exhaustive_true_divide_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7080653Z test_normalize_operator_exhaustive_trunc_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081093Z test_normalize_operator_exhaustive_unbind_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081540Z test_normalize_operator_exhaustive_unflatten_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7081990Z test_normalize_operator_exhaustive_unfold_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7082429Z test_normalize_operator_exhaustive_unfold_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7082915Z test_normalize_operator_exhaustive_uniform_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7083374Z test_normalize_operator_exhaustive_unique_consecutive_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7083827Z test_normalize_operator_exhaustive_unique_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7084261Z test_normalize_operator_exhaustive_unsqueeze_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7084700Z test_normalize_operator_exhaustive_var_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7085144Z test_normalize_operator_exhaustive_var_mean_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7085600Z test_normalize_operator_exhaustive_var_mean_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086060Z test_normalize_operator_exhaustive_var_unbiased_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086497Z test_normalize_operator_exhaustive_vdot_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7086944Z test_normalize_operator_exhaustive_view_as_complex_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7087392Z test_normalize_operator_exhaustive_view_as_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7087864Z test_normalize_operator_exhaustive_view_copy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7088296Z test_normalize_operator_exhaustive_view_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7088739Z test_normalize_operator_exhaustive_vsplit_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7089177Z test_normalize_operator_exhaustive_vstack_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7089618Z test_normalize_operator_exhaustive_where_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090049Z test_normalize_operator_exhaustive_xlogy_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090484Z test_normalize_operator_exhaustive_zero__cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7090921Z test_normalize_operator_exhaustive_zeros_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7091365Z test_normalize_operator_exhaustive_zeros_like_cuda_float32 (__main__.TestNormalizeOperatorsCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:27:27.7091760Z test_normalize_quantized_eb_cuda (__main__.TestNormalizeOperatorsCUDA) ... ok (0.001s)
2023-01-11T21:27:27.7091961Z 
2023-01-11T21:27:27.7092176Z ----------------------------------------------------------------------
2023-01-11T21:27:27.7092443Z Ran 671 tests in 13.191s
2023-01-11T21:27:27.7092574Z 
2023-01-11T21:27:27.7092649Z OK (skipped=630)
2023-01-11T21:27:27.7092769Z 
2023-01-11T21:27:27.7092862Z Generating XML reports...
2023-01-11T21:27:27.7093308Z Generated XML report: test-reports/python-unittest/test_fx_experimental/TEST-TestFXExperimental-20230111212713.xml
2023-01-11T21:27:27.7093891Z Generated XML report: test-reports/python-unittest/test_fx_experimental/TEST-TestNormalizeOperatorsCUDA-20230111212713.xml
2023-01-11T21:27:27.7094154Z 
2023-01-11T21:27:27.7094830Z ##[endgroup]
2023-01-11T21:27:27.7095336Z FINISHED PRINTING LOG FILE of test_fx_experimental (/var/lib/jenkins/workspace/test/test-reports/test_fx_experimental_bv6m_9hq)
2023-01-11T21:27:27.7095578Z 
2023-01-11T21:27:29.6011878Z Ignoring disabled issues: []
2023-01-11T21:27:29.6242437Z Running test_import_stats ... [2023-01-11 21:27:29.623471]
2023-01-11T21:27:29.6243557Z Executing ['/opt/conda/bin/python', '-bb', 'test_import_stats.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:29.623788]
2023-01-11T21:27:35.1669313Z 
2023-01-11T21:27:35.1670012Z Expand the folded group to see the log file of test_import_stats
2023-01-11T21:27:35.1671145Z ##[group]PRINTING LOG FILE of test_import_stats (/var/lib/jenkins/workspace/test/test-reports/test_import_stats_pq1awtgl)
2023-01-11T21:27:35.1671494Z 
2023-01-11T21:27:35.1671634Z Running tests...
2023-01-11T21:27:35.1672159Z ----------------------------------------------------------------------
2023-01-11T21:27:35.1672727Z Test results will be stored in test-reports/python-unittest/test_import_stats
2023-01-11T21:27:35.1673331Z test_time_cuda_device_count (__main__.TestImportTime) ... ok (2.465s)
2023-01-11T21:27:35.1673802Z test_time_import_torch (__main__.TestImportTime) ... ok (1.365s)
2023-01-11T21:27:35.1673986Z 
2023-01-11T21:27:35.1674198Z ----------------------------------------------------------------------
2023-01-11T21:27:35.1674451Z Ran 2 tests in 3.831s
2023-01-11T21:27:35.1674573Z 
2023-01-11T21:27:35.1674646Z OK
2023-01-11T21:27:35.1674749Z 
2023-01-11T21:27:35.1674843Z Generating XML reports...
2023-01-11T21:27:35.1675273Z Generated XML report: test-reports/python-unittest/test_import_stats/TEST-TestImportTime-20230111212730.xml
2023-01-11T21:27:35.1675513Z 
2023-01-11T21:27:35.1675762Z ##[endgroup]
2023-01-11T21:27:35.1676483Z FINISHED PRINTING LOG FILE of test_import_stats (/var/lib/jenkins/workspace/test/test-reports/test_import_stats_pq1awtgl)
2023-01-11T21:27:35.1676719Z 
2023-01-11T21:27:37.0800885Z Ignoring disabled issues: []
2023-01-11T21:27:37.1055181Z Running test_jit_autocast ... [2023-01-11 21:27:37.104913]
2023-01-11T21:27:37.1057214Z Executing ['/opt/conda/bin/python', '-bb', 'test_jit_autocast.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:27:37.105265]
2023-01-11T21:28:46.2066825Z 
2023-01-11T21:28:46.2067409Z Expand the folded group to see the log file of test_jit_autocast
2023-01-11T21:28:46.2071308Z ##[group]PRINTING LOG FILE of test_jit_autocast (/var/lib/jenkins/workspace/test/test-reports/test_jit_autocast_kac_6jpc)
2023-01-11T21:28:46.2072121Z monkeytype is not installed. Skipping tests for Profile-Directed Typing
2023-01-11T21:28:46.2072324Z 
2023-01-11T21:28:46.2072411Z Running tests...
2023-01-11T21:28:46.2072804Z ----------------------------------------------------------------------
2023-01-11T21:28:46.2073419Z Test results will be stored in test-reports/python-unittest/test_jit_autocast
2023-01-11T21:28:46.2073832Z test_autocast_api (__main__.TestAutocast) ... ok (0.766s)
2023-01-11T21:28:46.2074189Z test_autocast_api_not_supported (__main__.TestAutocast) ... skip: we need to provide dtype argument at this moment (0.001s)
2023-01-11T21:28:46.2074544Z test_autocast_autodiff (__main__.TestAutocast) ... ok (0.334s)
2023-01-11T21:28:46.2074888Z test_autocast_decorator (__main__.TestAutocast) ... skip: autocast decorators not supported (0.000s)
2023-01-11T21:28:46.2075257Z test_autocast_decorator_outside_jit (__main__.TestAutocast) ... ok (0.003s)
2023-01-11T21:28:46.2075563Z test_autocast_mixed_dtypes (__main__.TestAutocast) ... ok (0.033s)
2023-01-11T21:28:46.2075868Z test_callees (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2076277Z test_callees_with_autocast_off (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2076633Z test_callees_with_autocast_on (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2078270Z test_conditional_autocast (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2078855Z test_control_flow (__main__.TestAutocast) ... skip: broken due to lack of type propagation (0.001s)
2023-01-11T21:28:46.2079490Z test_divergent_autocast (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2079895Z test_divergent_types (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2080274Z test_duplicate_inputs (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2080652Z test_eager_and_script (__main__.TestAutocast) ... ok (0.004s)
2023-01-11T21:28:46.2081011Z test_explicit_casts (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2081295Z test_fp32_policy (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2081579Z test_fp32_policy_with_fp64 (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2081884Z test_fp32_set_opt_dtype_policy (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2083437Z test_fp32_set_opt_dtype_policy_fp64 (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2083783Z test_ignore_amp (__main__.TestAutocast) ... ok (0.003s)
2023-01-11T21:28:46.2084095Z test_implicitly_nested_autocast (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2084421Z test_inplace (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2085004Z test_jit_autocast_softmax_cpu (__main__.TestAutocast) ... skip: CPU-only test (0.000s)
2023-01-11T21:28:46.2085480Z test_jit_autocast_softmax_gpu (__main__.TestAutocast) ... ok (0.307s)
2023-01-11T21:28:46.2085913Z test_jit_call_method_under_autocast (__main__.TestAutocast) ... ok (0.038s)
2023-01-11T21:28:46.2086346Z test_jit_executor_under_autocast (__main__.TestAutocast) ... ok (0.015s)
2023-01-11T21:28:46.2086648Z test_jit_freeze_autocast_basic (__main__.TestAutocast) ... ok (0.022s)
2023-01-11T21:28:46.2086953Z test_jit_freeze_autocast_constants (__main__.TestAutocast) ... ok (0.017s)
2023-01-11T21:28:46.2087553Z test_linear_bf16 (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2087810Z test_minimal (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2088079Z test_minimal_cpu (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2088353Z test_minimal_off (__main__.TestAutocast) ... ok (0.009s)
2023-01-11T21:28:46.2088622Z test_nested_autocast (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2088893Z test_promote_policy (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2089202Z test_promote_policy_fp64 (__main__.TestAutocast) ... ok (0.010s)
2023-01-11T21:28:46.2089508Z test_reused_autocast (__main__.TestAutocast) ... ok (0.011s)
2023-01-11T21:28:46.2089823Z test_reused_autocast_expr (__main__.TestAutocast) ... skip: unsupported autocast syntax (0.001s)
2023-01-11T21:28:46.2090150Z test_runtime_autocast_state (__main__.TestAutocast) ... ok (0.012s)
2023-01-11T21:28:46.2090454Z test_runtime_autocast_state_expr (__main__.TestAutocast) ... ok (0.013s)
2023-01-11T21:28:46.2090805Z test_script_and_tracing (__main__.TestAutocast) ... ok (0.014s)
2023-01-11T21:28:46.2091343Z test_script_and_tracing_with_autocast (__main__.TestAutocast) ... skip: autocast(False) is ignored inside traced functions (0.001s)
2023-01-11T21:28:46.2091907Z test_script_module (__main__.TestAutocast) ... ok (0.017s)
2023-01-11T21:28:46.2092330Z test_tracing_and_script (__main__.TestAutocast) ... ok (0.019s)
2023-01-11T21:28:46.2092891Z test_tracing_with_autocast_and_script (__main__.TestAutocast) ... skip: scripted called from traced TorchScript is not yet working (0.001s)
2023-01-11T21:28:46.2093435Z test_cat_promote (__main__.TestJitTraceAutocast) ... ok (0.121s)
2023-01-11T21:28:46.2093867Z test_generate_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (21.547s)
2023-01-11T21:28:46.2094343Z test_nchw_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (30.489s)
2023-01-11T21:28:46.2094955Z test_nhwc_autocast_jit_trace_model (__main__.TestJitTraceAutocast) ... ok (10.596s)
2023-01-11T21:28:46.2095294Z test_script_autocast_cpu (__main__.TestJitTraceAutocast) ... ok (0.057s)
2023-01-11T21:28:46.2095615Z test_script_autocast_cuda (__main__.TestJitTraceAutocast) ... ok (0.057s)
2023-01-11T21:28:46.2096053Z test_script_autocast_enable_and_check (__main__.TestJitTraceAutocast) ... ok (0.068s)
2023-01-11T21:28:46.2096390Z test_scripted_aliasing (__main__.TestJitTraceAutocast) ... ok (0.064s)
ok (0.064s) 2023-01-11T21:28:46.2096571Z 2023-01-11T21:28:46.2096793Z ---------------------------------------------------------------------- 2023-01-11T21:28:46.2097055Z Ran 53 tests in 64.875s 2023-01-11T21:28:46.2097169Z 2023-01-11T21:28:46.2097250Z OK (skipped=7) 2023-01-11T21:28:46.2097364Z 2023-01-11T21:28:46.2097457Z Generating XML reports... 2023-01-11T21:28:46.2097881Z Generated XML report: test-reports/python-unittest/test_jit_autocast/TEST-TestAutocast-20230111212740.xml 2023-01-11T21:28:46.2098406Z Generated XML report: test-reports/python-unittest/test_jit_autocast/TEST-TestJitTraceAutocast-20230111212740.xml 2023-01-11T21:28:46.2098660Z 2023-01-11T21:28:46.2099057Z ##[endgroup] 2023-01-11T21:28:46.2099460Z FINISHED PRINTING LOG FILE of test_jit_autocast (/var/lib/jenkins/workspace/test/test-reports/test_jit_autocast_kac_6jpc) 2023-01-11T21:28:46.2099686Z 2023-01-11T21:28:48.1126774Z Ignoring disabled issues: [] 2023-01-11T21:28:48.1364567Z Running test_jit_llga_fuser ... [2023-01-11 21:28:48.135974] 2023-01-11T21:28:48.1367356Z Executing ['/opt/conda/bin/python', '-bb', 'test_jit_llga_fuser.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:28:48.136302] 2023-01-11T21:28:51.3026799Z 2023-01-11T21:28:51.3027546Z Expand the folded group to see the log file of test_jit_llga_fuser 2023-01-11T21:28:51.3028344Z ##[group]PRINTING LOG FILE of test_jit_llga_fuser (/var/lib/jenkins/workspace/test/test-reports/test_jit_llga_fuser_05vvhmu3) 2023-01-11T21:28:51.3028578Z 2023-01-11T21:28:51.3028928Z Running tests... 2023-01-11T21:28:51.3029367Z ---------------------------------------------------------------------- 2023-01-11T21:28:51.3029858Z test_dynamo_aot_ts_onednn (__main__.TestDynamoAOT) ... Test results will be stored in test-reports/python-unittest/test_jit_llga_fuser 2023-01-11T21:28:51.3030243Z skip: Enable when integration with dynamo aot_autograd is more stable (0.001s) 2023-01-11T21:28:51.3030590Z test_context_manager (__main__.TestEnableDisableLlgaFuser) ... ok (0.066s) 2023-01-11T21:28:51.3030989Z test_bn2d_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3031428Z test_bn2d_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3031850Z test_conv2d_bn_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3032280Z test_conv2d_bn_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3032712Z test_conv2d_bn_relu_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033144Z test_conv2d_bn_relu_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033564Z test_conv2d_clamp_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3033990Z test_conv2d_clamp_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3034428Z test_conv2d_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. 
(0.001s) 2023-01-11T21:28:51.3034862Z test_conv2d_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3035273Z test_conv2d_silu_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3035769Z test_conv2d_silu_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3036192Z test_conv2d_sum_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3036613Z test_conv2d_sum_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037043Z test_ensure_tensor_is_rewrapped_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037494Z test_ensure_tensor_is_rewrapped_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3037945Z test_linear_eltwise_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3038376Z test_linear_eltwise_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3038818Z test_rewrap_tensor_input_to_pytorch_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3039262Z test_rewrap_tensor_input_to_pytorch_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3039702Z test_wildcard_cuda_bfloat16 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3040154Z test_wildcard_cuda_float32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3040592Z test_wildcard_unsupported_dtype_cuda_int32 (__main__.TestFusionPatternCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041006Z test_vision_alexnet_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041393Z test_vision_alexnet_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3041780Z test_vision_densenet121_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042167Z test_vision_densenet121_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042540Z test_vision_densenet161_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3042924Z test_vision_densenet161_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3043307Z test_vision_densenet169_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3043688Z test_vision_densenet169_float32 (__main__.TestModel) ... 
skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044058Z test_vision_densenet201_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044436Z test_vision_densenet201_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3044824Z test_vision_efficientnet_b0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3045222Z test_vision_efficientnet_b0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3045614Z test_vision_efficientnet_b1_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046039Z test_vision_efficientnet_b1_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046435Z test_vision_efficientnet_b2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3046824Z test_vision_efficientnet_b2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047202Z test_vision_efficientnet_b3_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047592Z test_vision_efficientnet_b3_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3047988Z test_vision_efficientnet_b4_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3048374Z test_vision_efficientnet_b4_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3048752Z test_vision_efficientnet_b5_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049141Z test_vision_efficientnet_b5_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049527Z test_vision_efficientnet_b6_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3049909Z test_vision_efficientnet_b6_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3050320Z test_vision_efficientnet_b7_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3050707Z test_vision_efficientnet_b7_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051096Z test_vision_googlenet_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051481Z test_vision_googlenet_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3051858Z test_vision_mnasnet1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3052235Z test_vision_mnasnet1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. 
(0.000s) 2023-01-11T21:28:51.3052663Z test_vision_mobilenet_v2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053046Z test_vision_mobilenet_v2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053443Z test_vision_mobilenet_v3_large_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3053837Z test_vision_mobilenet_v3_large_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3054226Z test_vision_regnet_y_400mf_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3054938Z test_vision_regnet_y_400mf_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3055351Z test_vision_resnet50_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3055731Z test_vision_resnet50_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056124Z test_vision_resnext101_32x8d_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056562Z test_vision_resnext101_32x8d_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3056952Z test_vision_resnext50_32x4d_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3057329Z test_vision_resnext50_32x4d_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3057721Z test_vision_shufflenet_v2_x1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058119Z test_vision_shufflenet_v2_x1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058508Z test_vision_squeezenet1_0_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3058892Z test_vision_squeezenet1_0_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3059270Z test_vision_vgg16_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3059646Z test_vision_vgg16_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060026Z test_vision_wide_resnet50_2_bfloat16 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060412Z test_vision_wide_resnet50_2_float32 (__main__.TestModel) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3060829Z test_add_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061200Z test_add_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061585Z test_add_scalar_cuda_bfloat16 (__main__.TestOpCUDA) ... 
skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3061963Z test_add_scalar_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3062346Z test_addmm_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3062726Z test_addmm_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3063107Z test_avg_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3063484Z test_avg_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3063869Z test_bn2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3064240Z test_bn2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3064612Z test_cat_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3064977Z test_cat_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3065349Z test_conv2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3065732Z test_conv2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066107Z test_eltwise_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066514Z test_eltwise_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3066903Z test_identity_binary_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3067299Z test_identity_binary_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3067690Z test_layer_norm_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068066Z test_layer_norm_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068448Z test_linear_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3068822Z test_linear_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3069199Z test_max_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3069586Z test_max_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s) 2023-01-11T21:28:51.3069959Z test_mul_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s) 2023-01-11T21:28:51.3070325Z test_mul_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. 
(0.000s)
2023-01-11T21:28:51.3070692Z test_softmax_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071107Z test_softmax_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071490Z test_typecheck_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3071887Z test_typecheck_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.000s)
2023-01-11T21:28:51.3072287Z test_variable_kernel_avg_pool2d_cuda_bfloat16 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s)
2023-01-11T21:28:51.3072705Z test_variable_kernel_avg_pool2d_cuda_float32 (__main__.TestOpCUDA) ... skip: This test fails for BF16 on machines without AVX512. (0.001s)
2023-01-11T21:28:51.3072928Z
2023-01-11T21:28:51.3073178Z ----------------------------------------------------------------------
2023-01-11T21:28:51.3073438Z Ran 107 tests in 0.119s
2023-01-11T21:28:51.3073569Z
2023-01-11T21:28:51.3073644Z OK (skipped=106)
2023-01-11T21:28:51.3073763Z
2023-01-11T21:28:51.3073854Z Generating XML reports...
2023-01-11T21:28:51.3074328Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestEnableDisableLlgaFuser-20230111212850.xml
2023-01-11T21:28:51.3074877Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestDynamoAOT-20230111212850.xml
2023-01-11T21:28:51.3075402Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestFusionPatternCUDA-20230111212850.xml
2023-01-11T21:28:51.3075913Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestModel-20230111212850.xml
2023-01-11T21:28:51.3076398Z Generated XML report: test-reports/python-unittest/test_jit_llga_fuser/TEST-TestOpCUDA-20230111212850.xml
2023-01-11T21:28:51.3076621Z
2023-01-11T21:28:51.3076917Z ##[endgroup]
2023-01-11T21:28:51.3077318Z FINISHED PRINTING LOG FILE of test_jit_llga_fuser (/var/lib/jenkins/workspace/test/test-reports/test_jit_llga_fuser_05vvhmu3)
2023-01-11T21:28:51.3077548Z
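Note: the summary above reads OK (skipped=106), i.e. essentially the whole test_jit_llga_fuser file was skipped with one shared reason rather than failing 106 times. A guard of this shape is typically a single unittest.skipIf applied at class level; the sketch below is illustrative only, and the cpu_supports_avx512() helper with its /proc/cpuinfo probe (Linux-only) is a hypothetical stand-in for however the suite actually detects AVX512.

    import unittest

    def cpu_supports_avx512():
        # Hypothetical Linux-only probe: look for an AVX-512 feature flag
        # in /proc/cpuinfo; the suite's real detection may differ.
        try:
            with open("/proc/cpuinfo") as f:
                return "avx512f" in f.read()
        except OSError:
            return False

    @unittest.skipIf(not cpu_supports_avx512(),
                     "This test fails for BF16 on machines without AVX512.")
    class TestModel(unittest.TestCase):
        def test_vision_resnet50_bfloat16(self):
            ...  # the model comparison would go here

    if __name__ == "__main__":
        unittest.main()

With a class-level skipIf, every generated test in the class is reported individually as skipped with the same message, which matches the pattern in the log.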
2023-01-11T21:28:53.2526735Z Ignoring disabled issues: []
2023-01-11T21:28:53.2765035Z Running test_matmul_cuda ... [2023-01-11 21:28:53.276086]
2023-01-11T21:28:53.2767687Z Executing ['/opt/conda/bin/python', '-bb', 'test_matmul_cuda.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:28:53.276441]
2023-01-11T21:30:26.0607249Z
2023-01-11T21:30:26.0608194Z Expand the folded group to see the log file of test_matmul_cuda
2023-01-11T21:30:26.0608895Z ##[group]PRINTING LOG FILE of test_matmul_cuda (/var/lib/jenkins/workspace/test/test-reports/test_matmul_cuda__5okok14)
2023-01-11T21:30:26.0610463Z
2023-01-11T21:30:26.0610837Z Running tests...
2023-01-11T21:30:26.0611314Z ----------------------------------------------------------------------
2023-01-11T21:30:26.0613229Z Test results will be stored in test-reports/python-unittest/test_matmul_cuda
2023-01-11T21:30:26.0613641Z test_cublas_addmm_size_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (9.429s)
2023-01-11T21:30:26.0614112Z test_cublas_addmm_size_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (13.119s)
2023-01-11T21:30:26.0614737Z test_cublas_addmm_size_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (7.847s)
2023-01-11T21:30:26.0615304Z test_cublas_addmm_size_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.043s)
2023-01-11T21:30:26.0619624Z test_cublas_addmm_size_1000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.088s)
2023-01-11T21:30:26.0620108Z test_cublas_addmm_size_1000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.033s)
2023-01-11T21:30:26.0620596Z test_cublas_addmm_size_100_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0621085Z test_cublas_addmm_size_100_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0621552Z test_cublas_addmm_size_100_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0622299Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (15.688s)
2023-01-11T21:30:26.0622674Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (14.463s)
2023-01-11T21:30:26.0623040Z test_cublas_baddbmm_large_input_1_10000_10000_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (15.260s)
2023-01-11T21:30:26.0623518Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (4.352s)
2023-01-11T21:30:26.0623965Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (4.600s)
2023-01-11T21:30:26.0624454Z test_cublas_baddbmm_large_input_1_10000_1000_10000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (4.673s)
2023-01-11T21:30:26.0624901Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.069s)
2023-01-11T21:30:26.0625264Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.067s)
2023-01-11T21:30:26.0625631Z test_cublas_baddbmm_large_input_2_1000_1000_1000_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.074s)
2023-01-11T21:30:26.0625999Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_bfloat16 (__main__.TestMatmulCudaCUDA) ... ok (0.010s)
2023-01-11T21:30:26.0626355Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_float16 (__main__.TestMatmulCudaCUDA) ... ok (0.004s)
2023-01-11T21:30:26.0626718Z test_cublas_baddbmm_large_input_2_100_100_100_cuda_float32 (__main__.TestMatmulCudaCUDA) ... ok (0.003s)
2023-01-11T21:30:26.0626917Z
2023-01-11T21:30:26.0627159Z ----------------------------------------------------------------------
2023-01-11T21:30:26.0627427Z Ran 21 tests in 89.835s
2023-01-11T21:30:26.0627554Z
2023-01-11T21:30:26.0627617Z OK
2023-01-11T21:30:26.0627720Z
2023-01-11T21:30:26.0627815Z Generating XML reports...
2023-01-11T21:30:26.0632671Z Generated XML report: test-reports/python-unittest/test_matmul_cuda/TEST-TestMatmulCudaCUDA-20230111212855.xml
2023-01-11T21:30:26.0632937Z
2023-01-11T21:30:26.0633314Z ##[endgroup]
2023-01-11T21:30:26.0633725Z FINISHED PRINTING LOG FILE of test_matmul_cuda (/var/lib/jenkins/workspace/test/test-reports/test_matmul_cuda__5okok14)
2023-01-11T21:30:26.0634095Z
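Note: judging by the names, each test_cublas_addmm_size_* case exercises torch.addmm on the GPU (where it is backed by cuBLAS) at a given size and dtype. A common way to write such a check, shown here as a sketch with illustrative tolerances and a float64 CPU reference (the suite's actual inputs and thresholds are not visible in this log), requires a CUDA device:

    import torch

    def check_cublas_addmm(n, dtype, rtol, atol):
        # fp32 inputs on CPU; fp64 reference; CUDA result in the reduced dtype.
        m, a, b = (torch.randn(n, n) for _ in range(3))
        ref = torch.addmm(m.double(), a.double(), b.double())
        out = torch.addmm(m.to("cuda", dtype),
                          a.to("cuda", dtype),
                          b.to("cuda", dtype))
        torch.testing.assert_close(out.cpu().double(), ref, rtol=rtol, atol=atol)

    # Loose tolerances chosen for bfloat16's ~8-bit mantissa; illustrative only.
    check_cublas_addmm(100, torch.bfloat16, rtol=5e-2, atol=5e-1)

The timings above also show why such suites parametrize by size: the 10000x10000 cases dominate the 89.835s wall time while the size-100 cases are essentially free.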
2023-01-11T21:30:28.0102741Z Ignoring disabled issues: []
2023-01-11T21:30:28.0340496Z Running test_mkldnn_fusion ... [2023-01-11 21:30:28.033291]
2023-01-11T21:30:28.0341623Z Executing ['/opt/conda/bin/python', '-bb', 'test_mkldnn_fusion.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:30:28.033650]
2023-01-11T21:31:07.6482941Z
2023-01-11T21:31:07.6483498Z Expand the folded group to see the log file of test_mkldnn_fusion
2023-01-11T21:31:07.6484243Z ##[group]PRINTING LOG FILE of test_mkldnn_fusion (/var/lib/jenkins/workspace/test/test-reports/test_mkldnn_fusion_8b5he92q)
2023-01-11T21:31:07.6484473Z
2023-01-11T21:31:07.6486036Z Running tests...
2023-01-11T21:31:07.6486652Z ----------------------------------------------------------------------
2023-01-11T21:31:07.6487231Z Test results will be stored in test-reports/python-unittest/test_mkldnn_fusion
2023-01-11T21:31:07.6487822Z test_conv_binary_fusion_ops (__main__.TestMkldnnFusion) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s)
2023-01-11T21:31:07.6488322Z test_conv_unary_fusion_nnc (__main__.TestMkldnnFusion) ... ok (0.870s)
2023-01-11T21:31:07.6488741Z test_conv_unary_fusion_ops (__main__.TestMkldnnFusion) ... ok (27.080s)
2023-01-11T21:31:07.6490615Z test_linear_binary_fusion_ops (__main__.TestMkldnnFusion) ... ok (0.015s)
2023-01-11T21:31:07.6490990Z test_linear_unary_fusion_ops (__main__.TestMkldnnFusion) ... ok (0.019s)
2023-01-11T21:31:07.6491353Z test_single_conv (__main__.TestMkldnnFusion) ... ok (1.273s)
2023-01-11T21:31:07.6491674Z test_unsupported_conv (__main__.TestMkldnnFusion) ... ok (7.621s)
2023-01-11T21:31:07.6492138Z
2023-01-11T21:31:07.6492387Z ----------------------------------------------------------------------
2023-01-11T21:31:07.6492662Z Ran 7 tests in 37.741s
2023-01-11T21:31:07.6492827Z
2023-01-11T21:31:07.6492933Z OK (skipped=1)
2023-01-11T21:31:07.6501323Z
2023-01-11T21:31:07.6501637Z Generating XML reports...
2023-01-11T21:31:07.6502301Z Generated XML report: test-reports/python-unittest/test_mkldnn_fusion/TEST-TestMkldnnFusion-20230111213029.xml
2023-01-11T21:31:07.6502649Z
2023-01-11T21:31:07.6503052Z ##[endgroup]
2023-01-11T21:31:07.6503473Z FINISHED PRINTING LOG FILE of test_mkldnn_fusion (/var/lib/jenkins/workspace/test/test-reports/test_mkldnn_fusion_8b5he92q)
2023-01-11T21:31:07.6503703Z
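Note: the one skip above is a speed gate, not a failure; the message itself names the switch. To rerun just that case with slow tests enabled, something like the sketch below should work from the checkout's test/ directory (selecting a single test by name relies on unittest's standard CLI; using sys.executable instead of the CI interpreter path is an assumption for portability):

    import os
    import subprocess
    import sys

    # CI invoked '/opt/conda/bin/python -bb test_mkldnn_fusion.py -v ...';
    # PYTORCH_TEST_WITH_SLOW=1 enables cases skipped as "test is slow".
    env = dict(os.environ, PYTORCH_TEST_WITH_SLOW="1")
    subprocess.run(
        [sys.executable, "-bb", "test_mkldnn_fusion.py", "-v",
         "TestMkldnnFusion.test_conv_binary_fusion_ops"],
        env=env,
        check=True,
    )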
2023-01-11T21:31:09.5978476Z Ignoring disabled issues: []
2023-01-11T21:31:09.6213428Z Running test_module_init ... [2023-01-11 21:31:09.620784]
2023-01-11T21:31:09.6215751Z Executing ['/opt/conda/bin/python', '-bb', 'test_module_init.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:09.621148]
2023-01-11T21:31:15.5358604Z
2023-01-11T21:31:15.5359264Z Expand the folded group to see the log file of test_module_init
2023-01-11T21:31:15.5360218Z ##[group]PRINTING LOG FILE of test_module_init (/var/lib/jenkins/workspace/test/test-reports/test_module_init___9kb0fz)
2023-01-11T21:31:15.5360514Z
2023-01-11T21:31:15.5360634Z Running tests...
2023-01-11T21:31:15.5361278Z ----------------------------------------------------------------------
2023-01-11T21:31:15.5361866Z Test results will be stored in test-reports/python-unittest/test_module_init
2023-01-11T21:31:15.5362283Z test_nn_AdaptiveAvgPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5364149Z test_nn_AdaptiveAvgPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5364639Z test_nn_AdaptiveAvgPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365071Z test_nn_AdaptiveAvgPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365479Z test_nn_AdaptiveAvgPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5365830Z test_nn_AdaptiveAvgPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5366411Z test_nn_AdaptiveLogSoftmaxWithLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s)
2023-01-11T21:31:15.5366803Z test_nn_AdaptiveLogSoftmaxWithLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5367161Z test_nn_AdaptiveMaxPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5367510Z test_nn_AdaptiveMaxPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5367854Z test_nn_AdaptiveMaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368195Z test_nn_AdaptiveMaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368527Z test_nn_AdaptiveMaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5368871Z test_nn_AdaptiveMaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369215Z test_nn_AlphaDropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369543Z test_nn_AlphaDropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5369875Z test_nn_AvgPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370197Z test_nn_AvgPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370522Z test_nn_AvgPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5370833Z test_nn_AvgPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371174Z test_nn_AvgPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371519Z test_nn_AvgPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5371900Z test_nn_BCELoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372224Z test_nn_BCELoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372559Z test_nn_BCEWithLogitsLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5372911Z test_nn_BCEWithLogitsLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5373246Z test_nn_BatchNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5373576Z test_nn_BatchNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5373900Z test_nn_BatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5374216Z test_nn_BatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5374882Z test_nn_BatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375217Z test_nn_BatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375611Z test_nn_Bilinear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5375930Z test_nn_Bilinear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5376252Z test_nn_CELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5376566Z test_nn_CELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5376879Z test_nn_CTCLoss_cuda_float32 (__main__.TestModuleInitCUDA) ...
ok (0.002s) 2023-01-11T21:31:15.5377204Z test_nn_CTCLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5377537Z test_nn_ChannelShuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5377876Z test_nn_ChannelShuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378208Z test_nn_ConstantPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378550Z test_nn_ConstantPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5378889Z test_nn_ConstantPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379284Z test_nn_ConstantPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379618Z test_nn_ConstantPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5379953Z test_nn_ConstantPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5380287Z test_nn_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5380595Z test_nn_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5380947Z test_nn_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381276Z test_nn_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381582Z test_nn_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5381890Z test_nn_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382220Z test_nn_ConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382563Z test_nn_ConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5382903Z test_nn_ConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383244Z test_nn_ConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383590Z test_nn_ConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5383917Z test_nn_ConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5384274Z test_nn_CosineEmbeddingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5384680Z test_nn_CosineEmbeddingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385039Z test_nn_CosineSimilarity_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385381Z test_nn_CosineSimilarity_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5385733Z test_nn_CrossEntropyLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386080Z test_nn_CrossEntropyLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386416Z test_nn_CrossMapLRN2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5386764Z test_nn_CrossMapLRN2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387100Z test_nn_Dropout1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387429Z test_nn_Dropout1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5387752Z test_nn_Dropout2d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5388076Z test_nn_Dropout2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5388400Z test_nn_Dropout3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5388723Z test_nn_Dropout3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5389052Z test_nn_Dropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5389377Z test_nn_Dropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5389693Z test_nn_ELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5390000Z test_nn_ELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5390332Z test_nn_EmbeddingBag_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5390673Z test_nn_EmbeddingBag_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391014Z test_nn_Embedding_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391379Z test_nn_Embedding_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5391756Z test_nn_FeatureAlphaDropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392129Z test_nn_FeatureAlphaDropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392464Z test_nn_Flatten_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5392791Z test_nn_Flatten_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393112Z test_nn_Fold_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393422Z test_nn_Fold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5393765Z test_nn_FractionalMaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394132Z test_nn_FractionalMaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394490Z test_nn_FractionalMaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5394838Z test_nn_FractionalMaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395174Z test_nn_GELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395492Z test_nn_GELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5395801Z test_nn_GLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5396116Z test_nn_GLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5396435Z test_nn_GRUCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5396757Z test_nn_GRUCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5397064Z test_nn_GRU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.771s) 2023-01-11T21:31:15.5397407Z test_nn_GRU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5397738Z test_nn_GaussianNLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5398081Z test_nn_GaussianNLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5398421Z test_nn_GroupNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5398750Z test_nn_GroupNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5399086Z test_nn_Hardshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5399408Z test_nn_Hardshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5399737Z test_nn_Hardsigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400071Z test_nn_Hardsigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400423Z test_nn_Hardswish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5400770Z test_nn_Hardswish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401098Z test_nn_Hardtanh_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401431Z test_nn_Hardtanh_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5401767Z test_nn_HingeEmbeddingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402126Z test_nn_HingeEmbeddingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402474Z test_nn_HuberLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5402792Z test_nn_HuberLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403117Z test_nn_Identity_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403444Z test_nn_Identity_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5403780Z test_nn_InstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404115Z test_nn_InstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404491Z test_nn_InstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5404836Z test_nn_InstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405164Z test_nn_InstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405507Z test_nn_InstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5405845Z test_nn_KLDivLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406183Z test_nn_KLDivLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406502Z test_nn_L1Loss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5406827Z test_nn_L1Loss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407153Z test_nn_LPPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407477Z test_nn_LPPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5407808Z test_nn_LPPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5408137Z test_nn_LPPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5408463Z test_nn_LSTMCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5408776Z test_nn_LSTMCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409099Z test_nn_LSTM_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409420Z test_nn_LSTM_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5409769Z test_nn_LayerNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5410094Z test_nn_LayerNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5410887Z test_nn_LazyBatchNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py:180: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2023-01-11T21:31:15.5411534Z warnings.warn('Lazy modules are a new feature under heavy development ' 2023-01-11T21:31:15.5411789Z ok (0.004s) 2023-01-11T21:31:15.5412061Z test_nn_LazyBatchNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5412414Z test_nn_LazyBatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5412762Z test_nn_LazyBatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413106Z test_nn_LazyBatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413456Z test_nn_LazyBatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5413799Z test_nn_LazyConv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5414125Z test_nn_LazyConv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5414454Z test_nn_LazyConv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5415306Z test_nn_LazyConv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5415734Z test_nn_LazyConv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416045Z test_nn_LazyConv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416383Z test_nn_LazyConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5416734Z test_nn_LazyConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5417082Z test_nn_LazyConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5417432Z test_nn_LazyConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5417855Z test_nn_LazyConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418204Z test_nn_LazyConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418543Z test_nn_LazyInstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5418889Z test_nn_LazyInstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5419232Z test_nn_LazyInstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5419567Z test_nn_LazyInstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5419917Z test_nn_LazyInstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420254Z test_nn_LazyInstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420584Z test_nn_LazyLinear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5420897Z test_nn_LazyLinear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5421217Z test_nn_LeakyReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5421534Z test_nn_LeakyReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5421851Z test_nn_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5422152Z test_nn_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5422486Z test_nn_LocalResponseNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5422834Z test_nn_LocalResponseNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423220Z test_nn_LogSigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423547Z test_nn_LogSigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5423871Z test_nn_LogSoftmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5424185Z test_nn_LogSoftmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5424509Z test_nn_MSELoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5424836Z test_nn_MSELoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425168Z test_nn_MarginRankingLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425512Z test_nn_MarginRankingLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5425849Z test_nn_MaxPool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426178Z test_nn_MaxPool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426493Z test_nn_MaxPool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5426813Z test_nn_MaxPool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427140Z test_nn_MaxPool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427464Z test_nn_MaxPool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5427783Z test_nn_MaxUnpool1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428117Z test_nn_MaxUnpool1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428448Z test_nn_MaxUnpool2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5428765Z test_nn_MaxUnpool2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429095Z test_nn_MaxUnpool3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429418Z test_nn_MaxUnpool3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5429878Z test_nn_Mish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430244Z test_nn_Mish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430600Z test_nn_ModuleDict_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5430916Z test_nn_ModuleDict_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431236Z test_nn_ModuleList_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431557Z test_nn_ModuleList_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5431898Z test_nn_MultiLabelMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5432258Z test_nn_MultiLabelMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5432627Z test_nn_MultiLabelSoftMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5433003Z test_nn_MultiLabelSoftMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5433356Z test_nn_MultiMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5433706Z test_nn_MultiMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5434055Z test_nn_MultiheadAttention_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5434399Z test_nn_MultiheadAttention_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5434735Z test_nn_NLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5435057Z test_nn_NLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5435379Z test_nn_PReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5435726Z test_nn_PReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5436062Z test_nn_PairwiseDistance_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5436416Z test_nn_PairwiseDistance_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5436752Z test_nn_ParameterDict_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437091Z test_nn_ParameterDict_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437431Z test_nn_ParameterList_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5437775Z test_nn_ParameterList_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438107Z test_nn_PixelShuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438441Z test_nn_PixelShuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5438787Z test_nn_PixelUnshuffle_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439127Z test_nn_PixelUnshuffle_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439468Z test_nn_PoissonNLLLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5439812Z test_nn_PoissonNLLLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5440149Z test_nn_RNNBase_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5440464Z test_nn_RNNBase_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5440788Z test_nn_RNNCellBase_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441117Z test_nn_RNNCellBase_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441435Z test_nn_RNNCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5441758Z test_nn_RNNCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442073Z test_nn_RNN_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442386Z test_nn_RNN_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5442727Z test_nn_RReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443045Z test_nn_RReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5443358Z test_nn_ReLU6_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443655Z test_nn_ReLU6_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5443962Z test_nn_ReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444274Z test_nn_ReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444604Z test_nn_ReflectionPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5444949Z test_nn_ReflectionPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445294Z test_nn_ReflectionPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445643Z test_nn_ReflectionPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5445977Z test_nn_ReflectionPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5446319Z test_nn_ReflectionPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5446669Z test_nn_ReplicationPad1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447019Z test_nn_ReplicationPad1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447356Z test_nn_ReplicationPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5447703Z test_nn_ReplicationPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5448098Z test_nn_ReplicationPad3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5448432Z test_nn_ReplicationPad3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5448755Z test_nn_SELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449069Z test_nn_SELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449392Z test_nn_Sequential_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5449714Z test_nn_Sequential_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450033Z test_nn_SiLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450344Z test_nn_SiLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450653Z test_nn_Sigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5450978Z test_nn_Sigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451306Z test_nn_SmoothL1Loss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451637Z test_nn_SmoothL1Loss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5451967Z test_nn_SoftMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452310Z test_nn_SoftMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452641Z test_nn_Softmax2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5452954Z test_nn_Softmax2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453280Z test_nn_Softmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453600Z test_nn_Softmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5453917Z test_nn_Softmin_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5454230Z test_nn_Softmin_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5454877Z test_nn_Softplus_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5475495Z test_nn_Softplus_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476170Z test_nn_Softshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476505Z test_nn_Softshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5476818Z test_nn_Softsign_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5477139Z test_nn_Softsign_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5477465Z test_nn_SyncBatchNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5477802Z test_nn_SyncBatchNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5478122Z test_nn_Tanh_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5478430Z test_nn_Tanh_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5478749Z test_nn_Tanhshrink_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479063Z test_nn_Tanhshrink_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479388Z test_nn_Threshold_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5479707Z test_nn_Threshold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5480052Z test_nn_TransformerDecoderLayer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.027s) 2023-01-11T21:31:15.5480440Z test_nn_TransformerDecoderLayer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.006s) 2023-01-11T21:31:15.5480820Z test_nn_TransformerDecoder_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.007s) 2023-01-11T21:31:15.5481230Z test_nn_TransformerDecoder_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.007s) 2023-01-11T21:31:15.5481576Z test_nn_TransformerEncoderLayer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5481941Z test_nn_TransformerEncoderLayer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5482295Z test_nn_TransformerEncoder_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5482644Z test_nn_TransformerEncoder_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.006s) 2023-01-11T21:31:15.5482972Z test_nn_Transformer_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.307s) 2023-01-11T21:31:15.5483295Z test_nn_Transformer_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.376s) 2023-01-11T21:31:15.5483638Z test_nn_TripletMarginLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5483984Z test_nn_TripletMarginLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5484355Z test_nn_TripletMarginWithDistanceLoss_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5484743Z test_nn_TripletMarginWithDistanceLoss_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485105Z test_nn_Unflatten_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485420Z test_nn_Unflatten_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5485732Z test_nn_Unfold_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.002s) 2023-01-11T21:31:15.5486045Z test_nn_Unfold_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5486360Z test_nn_Upsample_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5486679Z test_nn_Upsample_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487018Z test_nn_UpsamplingBilinear2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487376Z test_nn_UpsamplingBilinear2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5487724Z test_nn_UpsamplingNearest2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488081Z test_nn_UpsamplingNearest2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488461Z test_nn_ZeroPad2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5488789Z test_nn_ZeroPad2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5489099Z test_qat_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5489415Z test_qat_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5489721Z test_qat_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5490018Z test_qat_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5490328Z test_qat_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5490642Z test_qat_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5490959Z test_qat_EmbeddingBag_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5491285Z test_qat_EmbeddingBag_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5491614Z test_qat_Embedding_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5491936Z test_qat_Embedding_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5492246Z test_qat_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5492557Z test_qat_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5492879Z test_quantizable_LSTMCell_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5493227Z test_quantizable_LSTMCell_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5493588Z test_quantizable_LSTM_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5493918Z test_quantizable_LSTM_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5494276Z test_quantizable_MultiheadAttention_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5495010Z test_quantizable_MultiheadAttention_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.005s) 2023-01-11T21:31:15.5495398Z test_quantized_BatchNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5495838Z test_quantized_BatchNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496189Z test_quantized_BatchNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496523Z test_quantized_BatchNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5496855Z test_quantized_Conv1d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s) 2023-01-11T21:31:15.5497183Z test_quantized_Conv1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.004s) 2023-01-11T21:31:15.5497499Z test_quantized_Conv2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5497823Z test_quantized_Conv2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5498148Z test_quantized_Conv3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5498471Z test_quantized_Conv3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5498808Z test_quantized_ConvTranspose1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499161Z test_quantized_ConvTranspose1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499509Z test_quantized_ConvTranspose2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5499849Z test_quantized_ConvTranspose2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500197Z test_quantized_ConvTranspose3d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500552Z test_quantized_ConvTranspose3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5500989Z test_quantized_DeQuantize_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501327Z test_quantized_DeQuantize_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501664Z test_quantized_Dropout_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5501996Z test_quantized_Dropout_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502321Z test_quantized_ELU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502643Z test_quantized_ELU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5502989Z test_quantized_FXFloatFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5503357Z test_quantized_FXFloatFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5503710Z test_quantized_FloatFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5504061Z test_quantized_FloatFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5504401Z test_quantized_GroupNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5504741Z test_quantized_GroupNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5505062Z test_quantized_Hardswish_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s) 2023-01-11T21:31:15.5505394Z test_quantized_Hardswish_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5505734Z test_quantized_InstanceNorm1d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506134Z test_quantized_InstanceNorm1d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506479Z test_quantized_InstanceNorm2d_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5506821Z test_quantized_InstanceNorm2d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s) 2023-01-11T21:31:15.5507159Z test_quantized_InstanceNorm3d_cuda_float32 (__main__.TestModuleInitCUDA) ... 
ok (0.003s)
2023-01-11T21:31:15.5507493Z test_quantized_InstanceNorm3d_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5507833Z test_quantized_LayerNorm_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508167Z test_quantized_LayerNorm_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508487Z test_quantized_LeakyReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5508815Z test_quantized_LeakyReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5509155Z test_quantized_Linear_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5509485Z test_quantized_Linear_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5509809Z test_quantized_PReLU_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.009s)
2023-01-11T21:31:15.5510143Z test_quantized_PReLU_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.012s)
2023-01-11T21:31:15.5510507Z test_quantized_QFunctional_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.004s)
2023-01-11T21:31:15.5510866Z test_quantized_QFunctional_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5511207Z test_quantized_Quantize_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5511540Z test_quantized_Quantize_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.003s)
2023-01-11T21:31:15.5511872Z test_quantized_ReLU6_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512191Z test_quantized_ReLU6_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512520Z test_quantized_Sigmoid_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5512892Z test_quantized_Sigmoid_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513223Z test_quantized_Softmax_cuda_float32 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513550Z test_quantized_Softmax_cuda_float64 (__main__.TestModuleInitCUDA) ... ok (0.002s)
2023-01-11T21:31:15.5513739Z
2023-01-11T21:31:15.5514060Z ----------------------------------------------------------------------
2023-01-11T21:31:15.5514325Z Ran 374 tests in 2.499s
2023-01-11T21:31:15.5514449Z
2023-01-11T21:31:15.5514513Z OK
2023-01-11T21:31:15.5514617Z
2023-01-11T21:31:15.5514711Z Generating XML reports...
2023-01-11T21:31:15.5515160Z Generated XML report: test-reports/python-unittest/test_module_init/TEST-TestModuleInitCUDA-20230111213112.xml
2023-01-11T21:31:15.5515416Z
2023-01-11T21:31:15.5515799Z ##[endgroup]
2023-01-11T21:31:15.5516196Z FINISHED PRINTING LOG FILE of test_module_init (/var/lib/jenkins/workspace/test/test-reports/test_module_init___9kb0fz)
2023-01-11T21:31:15.5516421Z
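Note: the 374 cases above are generated, one per (module, dtype) pair, and the test names suggest each checks that the module can be constructed directly on the device with the given dtype, i.e. that the device=/dtype= factory kwargs work across torch.nn. A minimal sketch of that kind of check, using nn.Linear as an illustrative example (the suite's exact assertions are not visible in this log; requires a CUDA device):

    import torch

    # Construct directly on CUDA in float32 via the factory kwargs, then
    # verify every parameter landed on that device with that dtype.
    m = torch.nn.Linear(8, 4, device="cuda", dtype=torch.float32)
    assert all(p.device.type == "cuda" and p.dtype == torch.float32
               for p in m.parameters())

Constructing on-device avoids a CPU-then-.to() round trip, which is why these checks are cheap: the whole 374-case suite ran in 2.499s.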
2023-01-11T21:31:17.4472376Z Ignoring disabled issues: []
2023-01-11T21:31:17.4710251Z Running test_native_mha ... [2023-01-11 21:31:17.470488]
2023-01-11T21:31:17.4710966Z Executing ['/opt/conda/bin/python', '-bb', 'test_native_mha.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:17.470826]
2023-01-11T21:31:21.3459174Z
2023-01-11T21:31:21.3459885Z Expand the folded group to see the log file of test_native_mha
2023-01-11T21:31:21.3460936Z ##[group]PRINTING LOG FILE of test_native_mha (/var/lib/jenkins/workspace/test/test-reports/test_native_mha_y7qjc5i2)
2023-01-11T21:31:21.3461290Z
2023-01-11T21:31:21.3461419Z Running tests...
2023-01-11T21:31:21.3461991Z ----------------------------------------------------------------------
2023-01-11T21:31:21.3462686Z Test results will be stored in test-reports/python-unittest/test_native_mha
2023-01-11T21:31:21.3463050Z test_native_multihead_attention_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.670s)
2023-01-11T21:31:21.3463449Z test_native_multihead_attention_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.003s)
2023-01-11T21:31:21.3463837Z test_native_multihead_encoder_decoder_attention_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.002s)
2023-01-11T21:31:21.3464225Z test_native_multihead_encoder_decoder_attention_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.002s)
2023-01-11T21:31:21.3464720Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3465288Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3465850Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3466410Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3466952Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3467494Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3468054Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3468665Z test_native_multihead_self_attention_use_nt_False_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.004s)
2023-01-11T21:31:21.3469216Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3469770Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3470322Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s)
2023-01-11T21:31:21.3470884Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ...
ok (0.005s) 2023-01-11T21:31:21.3471428Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3471985Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3472527Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3473111Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3473663Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3474208Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3474755Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3475293Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3475843Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3476383Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3476925Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3477471Z test_native_multihead_self_attention_use_nt_False_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3478299Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... /var/lib/jenkins/workspace/test/test_native_mha.py:207: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/NestedTensorImpl.cpp:179.) 
2023-01-11T21:31:21.3478951Z q = torch.nested.nested_tensor(qs, device=device, dtype=dtype) 2023-01-11T21:31:21.3479198Z ok (0.006s) 2023-01-11T21:31:21.3479577Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3480126Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3480668Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3481210Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3481745Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3482291Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3482865Z test_native_multihead_self_attention_use_nt_True_use_padding_False_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3483404Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3483942Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3484476Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3485021Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3485568Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3486103Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3486642Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.005s) 2023-01-11T21:31:21.3487182Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_False_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... 
ok (0.005s) 2023-01-11T21:31:21.3487728Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3488304Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3488842Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3489381Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_False_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3489915Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3490457Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_False_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3490994Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float16 (__main__.TestMHADeviceTypeCUDA) ... ok (0.008s) 2023-01-11T21:31:21.3491528Z test_native_multihead_self_attention_use_nt_True_use_padding_True_pad_all_True_need_weights_False_average_attn_weights_True_fused_True_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.007s) 2023-01-11T21:31:21.3491972Z test_transform_bias_rescale_qkv_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.011s) 2023-01-11T21:31:21.3492418Z test_transform_bias_rescale_qkv_nested_cuda_float32 (__main__.TestMHADeviceTypeCUDA) ... ok (0.013s) 2023-01-11T21:31:21.3492629Z 2023-01-11T21:31:21.3492840Z ---------------------------------------------------------------------- 2023-01-11T21:31:21.3493098Z Ran 54 tests in 0.982s 2023-01-11T21:31:21.3493226Z 2023-01-11T21:31:21.3493298Z OK 2023-01-11T21:31:21.3493402Z 2023-01-11T21:31:21.3493504Z Generating XML reports... 2023-01-11T21:31:21.3493961Z Generated XML report: test-reports/python-unittest/test_native_mha/TEST-TestMHADeviceTypeCUDA-20230111213119.xml 2023-01-11T21:31:21.3494207Z 2023-01-11T21:31:21.3494455Z ##[endgroup] 2023-01-11T21:31:21.3494989Z FINISHED PRINTING LOG FILE of test_native_mha (/var/lib/jenkins/workspace/test/test-reports/test_native_mha_y7qjc5i2) 2023-01-11T21:31:21.3495210Z 2023-01-11T21:31:23.2848983Z Ignoring disabled issues: [] 2023-01-11T21:31:23.3083115Z Running test_numpy_interop ... [2023-01-11 21:31:23.307528] 2023-01-11T21:31:23.3083761Z Executing ['/opt/conda/bin/python', '-bb', 'test_numpy_interop.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:23.307887] 2023-01-11T21:31:26.2523103Z 2023-01-11T21:31:26.2523727Z Expand the folded group to see the log file of test_numpy_interop 2023-01-11T21:31:26.2524967Z ##[group]PRINTING LOG FILE of test_numpy_interop (/var/lib/jenkins/workspace/test/test-reports/test_numpy_interop_ei77a9xw) 2023-01-11T21:31:26.2525332Z 2023-01-11T21:31:26.2525456Z Running tests... 
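Note on the test_native_mha log above: the UserWarning about nested tensors being "in prototype stage" is triggered by constructing a nested tensor from a list of variable-length tensors, as the quoted source line (q = torch.nested.nested_tensor(...)) shows. A minimal sketch of that construction, assuming a PyTorch build that, like the one under test here, ships the prototype torch.nested API:

    import torch

    # Ragged batch: two sequences of different lengths, same feature width.
    qs = [torch.randn(3, 8), torch.randn(5, 8)]

    # Emits "The PyTorch API of nested tensors is in prototype stage and will
    # change in the near future." on first use, matching the log above.
    q = torch.nested.nested_tensor(qs, dtype=torch.float32)
    print(q.is_nested)  # True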
2023-01-11T21:31:26.2526057Z ---------------------------------------------------------------------- 2023-01-11T21:31:26.2526621Z Test results will be stored in test-reports/python-unittest/test_numpy_interop 2023-01-11T21:31:26.2527634Z test_ctor_with_invalid_numpy_array_sequence_cuda (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:265: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_new.cpp:259.) 2023-01-11T21:31:26.2528586Z torch.tensor([np.random.random(size=(3, 3)), np.random.random(size=(3, 0))], device=device) 2023-01-11T21:31:26.2528969Z ok (0.006s) 2023-01-11T21:31:26.2529621Z test_ctor_with_numpy_scalar_ctor_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2530141Z test_from_list_of_ndarray_warning_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.001s) 2023-01-11T21:31:26.2530600Z test_from_numpy_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.009s) 2023-01-11T21:31:26.2531519Z test_has_storage_numpy_cuda (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:441: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2532472Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.float32).storage()) 2023-01-11T21:31:26.2533340Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:442: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2534172Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.double).storage()) 2023-01-11T21:31:26.2535353Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:443: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2536384Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.int).storage()) 2023-01-11T21:31:26.2537231Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:444: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2538039Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.long).storage()) 2023-01-11T21:31:26.2538903Z /var/lib/jenkins/workspace/test/test_numpy_interop.py:445: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:31:26.2539719Z self.assertIsNotNone(torch.tensor(arr, device=device, dtype=torch.uint8).storage()) 2023-01-11T21:31:26.2540102Z ok (0.002s) 2023-01-11T21:31:26.2540510Z test_multiplication_numpy_scalar_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2541054Z test_numpy_array_interface_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:31:26.2541579Z test_numpy_index_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2542090Z test_numpy_non_writeable_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T21:31:26.2542592Z test_numpy_scalar_cmp_cuda_bfloat16 (__main__.TestNumPyInteropCUDA) ... ok (0.004s) 2023-01-11T21:31:26.2543064Z test_numpy_scalar_cmp_cuda_bool (__main__.TestNumPyInteropCUDA) ... ok (0.003s) 2023-01-11T21:31:26.2543596Z test_numpy_scalar_cmp_cuda_complex128 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2544290Z test_numpy_scalar_cmp_cuda_complex64 (__main__.TestNumPyInteropCUDA) ... /var/lib/jenkins/workspace/test/test_numpy_interop.py:471: ComplexWarning: Casting complex values to real discards the imaginary part 2023-01-11T21:31:26.2544916Z self.assertFalse(t == a) 2023-01-11T21:31:26.2545176Z ok (0.002s) 2023-01-11T21:31:26.2545636Z test_numpy_scalar_cmp_cuda_float16 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2546105Z test_numpy_scalar_cmp_cuda_float32 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2546582Z test_numpy_scalar_cmp_cuda_float64 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2547047Z test_numpy_scalar_cmp_cuda_int16 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2547531Z test_numpy_scalar_cmp_cuda_int32 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2550285Z test_numpy_scalar_cmp_cuda_int64 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2550769Z test_numpy_scalar_cmp_cuda_int8 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2551243Z test_numpy_scalar_cmp_cuda_uint8 (__main__.TestNumPyInteropCUDA) ... ok (0.002s) 2023-01-11T21:31:26.2551732Z test_numpy_unresizable_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2552252Z test_parse_numpy_int_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:31:26.2552745Z test_to_numpy_bool_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.001s) 2023-01-11T21:31:26.2553230Z test_to_numpy_cuda (__main__.TestNumPyInteropCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T21:31:26.2553707Z test_to_numpy_force_argument_cuda (__main__.TestNumPyInteropCUDA) ... ok (0.014s) 2023-01-11T21:31:26.2553961Z 2023-01-11T21:31:26.2554331Z ---------------------------------------------------------------------- 2023-01-11T21:31:26.2554692Z Ran 26 tests in 0.075s 2023-01-11T21:31:26.2554861Z 2023-01-11T21:31:26.2554980Z OK (skipped=8) 2023-01-11T21:31:26.2555280Z 2023-01-11T21:31:26.2555408Z Generating XML reports... 
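Note on the test_numpy_interop log above: two of its recurring warnings are actionable API guidance. Building a tensor from a Python list of numpy.ndarrays goes through a slow element-wise path, and Tensor.storage() (TypedStorage) is deprecated in favor of Tensor.untyped_storage(), exactly as the warning text says. A minimal sketch of both, assuming a PyTorch recent enough to provide untyped_storage():

    import numpy as np
    import torch

    arrs = [np.random.random(size=(3, 3)), np.random.random(size=(3, 3))]
    t_slow = torch.tensor(arrs)            # warns: "extremely slow" list-of-ndarrays path
    t_fast = torch.tensor(np.array(arrs))  # preferred: one ndarray, then one conversion

    x = torch.zeros(4)
    s_old = x.storage()          # warns: TypedStorage is deprecated
    s_new = x.untyped_storage()  # replacement suggested by the warning text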
2023-01-11T21:31:26.2556009Z Generated XML report: test-reports/python-unittest/test_numpy_interop/TEST-TestNumPyInteropCUDA-20230111213125.xml 2023-01-11T21:31:26.2556366Z 2023-01-11T21:31:26.2556846Z ##[endgroup] 2023-01-11T21:31:26.2557411Z FINISHED PRINTING LOG FILE of test_numpy_interop (/var/lib/jenkins/workspace/test/test-reports/test_numpy_interop_ei77a9xw) 2023-01-11T21:31:26.2557742Z 2023-01-11T21:31:28.1987381Z Ignoring disabled issues: [] 2023-01-11T21:31:28.2226940Z Running test_optim ... [2023-01-11 21:31:28.222120] 2023-01-11T21:31:28.2227458Z Executing ['/opt/conda/bin/python', '-bb', 'test_optim.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:31:28.222464] 2023-01-11T21:33:08.0026761Z 2023-01-11T21:33:08.0027354Z Expand the folded group to see the log file of test_optim 2023-01-11T21:33:08.0028256Z ##[group]PRINTING LOG FILE of test_optim (/var/lib/jenkins/workspace/test/test-reports/test_optim_mznt_uo2) 2023-01-11T21:33:08.0029697Z 2023-01-11T21:33:08.0029923Z Running tests... 2023-01-11T21:33:08.0030437Z ---------------------------------------------------------------------- 2023-01-11T21:33:08.0058882Z Test results will be stored in test-reports/python-unittest/test_optim 2023-01-11T21:33:08.0059282Z test_adadelta (__main__.TestDifferentiableOptimizer) ... ok (1.090s) 2023-01-11T21:33:08.0059760Z test_adagrad (__main__.TestDifferentiableOptimizer) ... ok (0.005s) 2023-01-11T21:33:08.0060219Z test_adam (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0060596Z test_adamax (__main__.TestDifferentiableOptimizer) ... ok (0.009s) 2023-01-11T21:33:08.0060978Z test_adamw (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0062069Z test_asgd (__main__.TestDifferentiableOptimizer) ... ok (0.005s) 2023-01-11T21:33:08.0062544Z test_nadam (__main__.TestDifferentiableOptimizer) ... ok (0.012s) 2023-01-11T21:33:08.0063018Z test_radam (__main__.TestDifferentiableOptimizer) ... ok (0.008s) 2023-01-11T21:33:08.0063471Z test_rmsprop (__main__.TestDifferentiableOptimizer) ... ok (0.011s) 2023-01-11T21:33:08.0063867Z test_rprop (__main__.TestDifferentiableOptimizer) ... ok (0.009s) 2023-01-11T21:33:08.0064180Z test_sgd (__main__.TestDifferentiableOptimizer) ... ok (0.004s) 2023-01-11T21:33:08.0064871Z test_CosineAnnealingWarmRestarts_lr1_T_mult_1 (__main__.TestLRScheduler) ... ok (0.008s) 2023-01-11T21:33:08.0065316Z test_CosineAnnealingWarmRestarts_lr1_T_mult_2 (__main__.TestLRScheduler) ... ok (0.008s) 2023-01-11T21:33:08.0065819Z test_CosineAnnealingWarmRestarts_lr1_T_mult_4 (__main__.TestLRScheduler) ... ok (0.007s) 2023-01-11T21:33:08.0066309Z test_CosineAnnealingWarmRestarts_lr2 (__main__.TestLRScheduler) ... ok (0.063s) 2023-01-11T21:33:08.0066680Z test_CosineAnnealingWarmRestarts_lr3 (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0067041Z test_CosineAnnealingWarmRestarts_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0068161Z test_chained_lr1 (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. 
See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0070423Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0070820Z ok (0.001s) 2023-01-11T21:33:08.0071165Z test_chained_lr2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0071564Z test_chained_lr2_get_last_lr_before_step (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0071895Z test_chained_lr3 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0072310Z test_chained_lr4 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0072590Z test_chained_lr5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0072875Z test_closed_form_constantlr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0073249Z test_closed_form_cos_anneal_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0073694Z test_closed_form_exp_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074099Z test_closed_form_linearlr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074443Z test_closed_form_multi_step_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0074755Z test_closed_form_poly_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0075057Z test_closed_form_step_lr (__main__.TestLRScheduler) ... ok (0.006s) 2023-01-11T21:33:08.0076360Z test_compound_cosanneal_and_exp_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0077435Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0077796Z ok (0.002s) 2023-01-11T21:33:08.0078122Z test_compound_cosanneal_and_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0078567Z test_compound_cosanneal_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0078986Z test_compound_cosanneal_and_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0079383Z test_compound_exp_and_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0079783Z test_compound_exp_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0080118Z test_compound_linearlr_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0080450Z test_compound_reduce_lr_on_plateau1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0080838Z test_compound_reduce_lr_on_plateau2 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0081162Z test_compound_reduce_lr_on_plateau3 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0081481Z test_compound_reduce_lr_on_plateau4 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0081795Z test_compound_reduce_lr_on_plateau5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0082170Z test_compound_step_and_constantlr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0082489Z test_compound_step_and_exp_lr (__main__.TestLRScheduler) ... 
ok (0.002s) 2023-01-11T21:33:08.0082885Z test_compound_step_and_multistep_lr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0083190Z test_constantlr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0083508Z test_constantlr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0083832Z test_constantlr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0084868Z test_cos_anneal_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0085612Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0085919Z ok (0.001s) 2023-01-11T21:33:08.0086159Z test_cos_anneal_lr_continue (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0086463Z test_cosine_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0086764Z test_cosine_then_cyclic (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0087110Z test_cycle_lr_cycle_momentum_fail_with_momentumless_optimizer (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0087446Z test_cycle_lr_exp_range_mode (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0087760Z test_cycle_lr_exp_range_mode_one_lr (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0088095Z test_cycle_lr_exp_range_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0088410Z test_cycle_lr_invalid_mode (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0088730Z test_cycle_lr_removed_after_out_of_scope (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089068Z test_cycle_lr_scale_fn_restored_from_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089401Z test_cycle_lr_state_dict_picklable (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0089712Z test_cycle_lr_triangular2_mode (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0090037Z test_cycle_lr_triangular2_mode_one_lr (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0090378Z test_cycle_lr_triangular2_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.005s) 2023-01-11T21:33:08.0090698Z test_cycle_lr_triangular_mode (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0091019Z test_cycle_lr_triangular_mode_one_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0091353Z test_cycle_lr_triangular_mode_one_lr_no_momentum (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0091695Z test_cycle_lr_triangular_mode_step_size_up_down (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0092010Z test_cycle_lr_with_adam (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0092334Z test_cycle_lr_with_momentumless_optimizer (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0092662Z test_error_when_getlr_has_epoch (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0092990Z test_exp_lr (__main__.TestLRScheduler) ... 
ok (0.001s) 2023-01-11T21:33:08.0093284Z test_exp_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0093624Z test_exponential_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0093955Z test_get_last_lr_constantlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0094250Z test_get_last_lr_linearlr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0095219Z test_get_last_lr_multi_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0096492Z test_get_last_lr_sequentiallr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:152: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose. 2023-01-11T21:33:08.0097247Z warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning) 2023-01-11T21:33:08.0097479Z ok (0.002s) 2023-01-11T21:33:08.0097725Z test_get_last_lr_step_lr (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0098743Z test_lambda_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0099621Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0099887Z ok (0.001s) 2023-01-11T21:33:08.0100132Z test_lambda_lr_state_dict_fn (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0100443Z test_lambda_lr_state_dict_obj (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0100777Z test_linear_linearlr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0101819Z test_linearlr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0102607Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0102869Z ok (0.001s) 2023-01-11T21:33:08.0103125Z test_linearlr_start_factor_limits1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0103434Z test_linearlr_start_factor_limits2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0103746Z test_linearlr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0104752Z test_multi_step_lr (__main__.TestLRScheduler) ... 
/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0105543Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0105809Z ok (0.001s) 2023-01-11T21:33:08.0106049Z test_multi_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0106363Z test_multi_step_lr_with_epoch (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0107376Z test_multiplicative_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0108100Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0108350Z ok (0.001s) 2023-01-11T21:33:08.0108600Z test_new_pattern_no_warning (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0108918Z test_new_pattern_no_warning_with_arg (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0109265Z test_new_pattern_no_warning_with_overridden_optim_step (__main__.TestLRScheduler) ... ok (0.005s) 2023-01-11T21:33:08.0109593Z test_no_cyclic_references (__main__.TestLRScheduler) ... ok (0.138s) 2023-01-11T21:33:08.0109906Z test_no_cyclic_references_in_step (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0110229Z test_old_pattern_warning (__main__.TestLRScheduler) ... ok (0.004s) 2023-01-11T21:33:08.0110534Z test_old_pattern_warning_resuming (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0110911Z test_old_pattern_warning_resuming_with_arg (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111251Z test_old_pattern_warning_with_arg (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111599Z test_old_pattern_warning_with_overridden_optim_step (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0111956Z test_onecycle_lr_cannot_calculate_total_steps (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0112294Z test_onecycle_lr_cosine_annealing (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0112626Z test_onecycle_lr_invalid_anneal_strategy (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0112950Z test_onecycle_lr_invalid_pct_start (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0113275Z test_onecycle_lr_linear_annealing (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0113614Z test_onecycle_lr_linear_annealing_three_phases (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0114650Z test_poly_lr (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. 
In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0115378Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0115635Z ok (0.001s) 2023-01-11T21:33:08.0115910Z test_polynomial_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0116246Z test_reduce_lr_on_plateau1 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0116546Z test_reduce_lr_on_plateau2 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0116855Z test_reduce_lr_on_plateau3 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117160Z test_reduce_lr_on_plateau4 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117461Z test_reduce_lr_on_plateau5 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0117795Z test_reduce_lr_on_plateau6 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118095Z test_reduce_lr_on_plateau7 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118389Z test_reduce_lr_on_plateau8 (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0118696Z test_reduce_lr_on_plateau_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0119721Z test_sequentiallr1 (__main__.TestLRScheduler) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0120472Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0121487Z /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:152: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose. 2023-01-11T21:33:08.0122174Z warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning) 2023-01-11T21:33:08.0122456Z ok (0.002s) 2023-01-11T21:33:08.0122691Z test_sequentiallr2 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0122982Z test_sequentiallr3 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123273Z test_sequentiallr4 (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123551Z test_step_lr (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0123870Z test_step_lr_is_constant_for_constant_epoch (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0124194Z test_step_lr_state_dict (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0124488Z test_swa_lr_state_dict (__main__.TestLRScheduler) ... 
ok (0.001s) 2023-01-11T21:33:08.0124815Z test_swalr_cosine_anneal_after_multiplicative (__main__.TestLRScheduler) ... ok (0.002s) 2023-01-11T21:33:08.0125137Z test_swalr_hypers (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0125460Z test_swalr_linear_anneal_after_multiplicative (__main__.TestLRScheduler) ... ok (0.003s) 2023-01-11T21:33:08.0125780Z test_swalr_no_anneal (__main__.TestLRScheduler) ... ok (0.001s) 2023-01-11T21:33:08.0126782Z test_adadelta (__main__.TestOptim) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0127515Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0127785Z ok (5.248s) 2023-01-11T21:33:08.0128016Z test_adadelta_complex (__main__.TestOptim) ... ok (0.004s) 2023-01-11T21:33:08.0128289Z test_adagrad (__main__.TestOptim) ... ok (6.156s) 2023-01-11T21:33:08.0128563Z test_adagrad_complex (__main__.TestOptim) ... ok (0.005s) 2023-01-11T21:33:08.0128838Z test_adagrad_sparse (__main__.TestOptim) ... ok (8.841s) 2023-01-11T21:33:08.0129105Z test_adam (__main__.TestOptim) ... ok (15.067s) 2023-01-11T21:33:08.0129368Z test_adamax (__main__.TestOptim) ... ok (3.625s) 2023-01-11T21:33:08.0129631Z test_adamw (__main__.TestOptim) ... ok (4.546s) 2023-01-11T21:33:08.0129929Z test_asgd (__main__.TestOptim) ... ok (6.321s) 2023-01-11T21:33:08.0130218Z test_duplicate_params_in_param_group (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0130508Z test_empty_grad (__main__.TestOptim) ... ok (0.007s) 2023-01-11T21:33:08.0130797Z test_functional_fused_adam_with_foundinf (__main__.TestOptim) ... ok (0.005s) 2023-01-11T21:33:08.0131097Z test_fused_optimizers (__main__.TestOptim) ... ok (0.043s) 2023-01-11T21:33:08.0131382Z test_invalid_param_type (__main__.TestOptim) ... ok (0.000s) 2023-01-11T21:33:08.0131640Z test_lbfgs (__main__.TestOptim) ... ok (0.708s) 2023-01-11T21:33:08.0131911Z test_lbfgs_return_type (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0132203Z test_multi_tensor_optimizers (__main__.TestOptim) ... ok (0.303s) 2023-01-11T21:33:08.0133191Z test_nadam (__main__.TestOptim) ... /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate 2023-01-11T21:33:08.0133923Z warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. " 2023-01-11T21:33:08.0134180Z ok (2.351s) 2023-01-11T21:33:08.0134420Z test_no_grad_for_all_params (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0135231Z test_post_hook (__main__.TestOptim) ... ok (0.001s) 2023-01-11T21:33:08.0135627Z test_pre_and_post_hook (__main__.TestOptim) ... ok (0.002s) 2023-01-11T21:33:08.0135904Z test_pre_hook (__main__.TestOptim) ... 
ok (0.001s) 2023-01-11T21:33:08.0136267Z test_radam (__main__.TestOptim) ... ok (2.638s) 2023-01-11T21:33:08.0136522Z test_rmsprop (__main__.TestOptim) ... ok (11.664s) 2023-01-11T21:33:08.0136784Z test_rprop (__main__.TestOptim) ... ok (5.183s) 2023-01-11T21:33:08.0137040Z test_sgd (__main__.TestOptim) ... ok (12.021s) 2023-01-11T21:33:08.0137295Z test_sgd_complex (__main__.TestOptim) ... ok (0.009s) 2023-01-11T21:33:08.0137574Z test_sgd_sparse (__main__.TestOptim) ... ok (7.405s) 2023-01-11T21:33:08.0137850Z test_sparse_adam (__main__.TestOptim) ... ok (2.217s) 2023-01-11T21:33:08.0138145Z test_averaged_model_all_devices (__main__.TestSWAUtils) ... ok (0.058s) 2023-01-11T21:33:08.0138456Z test_averaged_model_exponential (__main__.TestSWAUtils) ... ok (0.008s) 2023-01-11T21:33:08.0138779Z test_averaged_model_exponential_buffers (__main__.TestSWAUtils) ... ok (0.005s) 2023-01-11T21:33:08.0139108Z test_averaged_model_mixed_device (__main__.TestSWAUtils) ... ok (0.006s) 2023-01-11T21:33:08.0139414Z test_averaged_model_state_dict (__main__.TestSWAUtils) ... ok (0.004s) 2023-01-11T21:33:08.0139725Z test_bn_update_eval_momentum (__main__.TestSWAUtils) ... ok (0.052s) 2023-01-11T21:33:08.0140020Z test_update_bn_cnn (__main__.TestSWAUtils) ... ok (1.391s) 2023-01-11T21:33:08.0140305Z test_update_bn_dnn (__main__.TestSWAUtils) ... ok (0.038s) 2023-01-11T21:33:08.0140457Z 2023-01-11T21:33:08.0140681Z ---------------------------------------------------------------------- 2023-01-11T21:33:08.0140951Z Ran 166 tests in 97.587s 2023-01-11T21:33:08.0141076Z 2023-01-11T21:33:08.0141150Z OK 2023-01-11T21:33:08.0141254Z 2023-01-11T21:33:08.0141342Z Generating XML reports... 2023-01-11T21:33:08.0141822Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestDifferentiableOptimizer-20230111213129.xml 2023-01-11T21:33:08.0142391Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestLRScheduler-20230111213129.xml 2023-01-11T21:33:08.0142888Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestOptim-20230111213129.xml 2023-01-11T21:33:08.0143365Z Generated XML report: test-reports/python-unittest/test_optim/TEST-TestSWAUtils-20230111213129.xml 2023-01-11T21:33:08.0143656Z 2023-01-11T21:33:08.0144087Z ##[endgroup] 2023-01-11T21:33:08.0144476Z FINISHED PRINTING LOG FILE of test_optim (/var/lib/jenkins/workspace/test/test-reports/test_optim_mznt_uo2) 2023-01-11T21:33:08.0144691Z 2023-01-11T21:33:09.9330521Z Ignoring disabled issues: [] 2023-01-11T21:33:09.9563903Z Running test_shape_ops ... [2023-01-11 21:33:09.955699] 2023-01-11T21:33:09.9564974Z Executing ['/opt/conda/bin/python', '-bb', 'test_shape_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:09.956032] 2023-01-11T21:33:14.5124223Z 2023-01-11T21:33:14.5124668Z Expand the folded group to see the log file of test_shape_ops 2023-01-11T21:33:14.5125712Z ##[group]PRINTING LOG FILE of test_shape_ops (/var/lib/jenkins/workspace/test/test-reports/test_shape_ops_sg7hohkm) 2023-01-11T21:33:14.5126017Z 2023-01-11T21:33:14.5126136Z Running tests... 2023-01-11T21:33:14.5126686Z ---------------------------------------------------------------------- 2023-01-11T21:33:14.5127234Z Test results will be stored in test-reports/python-unittest/test_shape_ops 2023-01-11T21:33:14.5127667Z test_clamp_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.018s) 2023-01-11T21:33:14.5127985Z test_clamp_cuda_int64 (__main__.TestShapeOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:14.5128296Z test_clamp_propagates_nans_cuda (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5128656Z test_clamp_raises_arg_errors_cuda (__main__.TestShapeOpsCUDA) ... ok (0.011s) 2023-01-11T21:33:14.5128987Z test_complex_rot90_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5129315Z test_complex_rot90_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5129823Z test_diag_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130124Z test_diag_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130427Z test_diagonal_cuda (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5130760Z test_diagonal_multidim_cuda_float32 (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5131097Z test_flip_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.025s) 2023-01-11T21:33:14.5131393Z test_flip_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.015s) 2023-01-11T21:33:14.5131700Z test_flip_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.028s) 2023-01-11T21:33:14.5132008Z test_flip_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.027s) 2023-01-11T21:33:14.5132313Z test_flip_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.025s) 2023-01-11T21:33:14.5132614Z test_flip_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.091s) 2023-01-11T21:33:14.5132912Z test_flip_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5133209Z test_flip_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5133504Z test_flip_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5133802Z test_flip_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5134083Z test_flip_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5134378Z test_flip_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.014s) 2023-01-11T21:33:14.5134880Z test_flip_errors_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135190Z test_flip_errors_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135508Z test_flip_errors_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5135832Z test_flip_errors_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136250Z test_flip_errors_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136566Z test_flip_errors_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5136875Z test_flip_errors_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137260Z test_flip_errors_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137562Z test_flip_errors_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5137862Z test_flip_errors_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138172Z test_flip_errors_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138471Z test_flip_errors_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.013s) 2023-01-11T21:33:14.5138817Z test_flip_large_tensor_cuda (__main__.TestShapeOpsCUDA) ... skip: Insufficient cpu memory (0.121s) 2023-01-11T21:33:14.5139162Z test_flip_numpy_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.024s) 2023-01-11T21:33:14.5139477Z test_flip_numpy_cuda_bool (__main__.TestShapeOpsCUDA) ... 
ok (0.021s) 2023-01-11T21:33:14.5139785Z test_flip_numpy_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140107Z test_flip_numpy_cuda_complex64 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140426Z test_flip_numpy_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5140730Z test_flip_numpy_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5141038Z test_flip_numpy_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.022s) 2023-01-11T21:33:14.5141350Z test_flip_numpy_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5141658Z test_flip_numpy_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5141954Z test_flip_numpy_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142261Z test_flip_numpy_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142608Z test_flip_numpy_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.021s) 2023-01-11T21:33:14.5142911Z test_fliplr_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5143219Z test_fliplr_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5143528Z test_fliplr_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5143846Z test_fliplr_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144165Z test_fliplr_invalid_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144484Z test_fliplr_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5144797Z test_flipud_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:14.5145097Z test_flipud_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5145401Z test_flipud_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:14.5145717Z test_flipud_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146034Z test_flipud_invalid_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146355Z test_flipud_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5146670Z test_movedim_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.055s) 2023-01-11T21:33:14.5146986Z test_movedim_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.049s) 2023-01-11T21:33:14.5147281Z test_movedim_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.031s) 2023-01-11T21:33:14.5147597Z test_movedim_invalid_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5147922Z test_movedim_invalid_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:14.5148234Z test_movedim_invalid_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:14.5148560Z test_nonzero_astuple_out_cuda (__main__.TestShapeOpsCUDA) ... ok (0.053s) 2023-01-11T21:33:14.5148873Z test_nonzero_cuda_bfloat16 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5149178Z test_nonzero_cuda_bool (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5149514Z test_nonzero_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5149820Z test_nonzero_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5150122Z test_nonzero_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5150416Z test_nonzero_cuda_int16 (__main__.TestShapeOpsCUDA) ... 
ok (0.041s) 2023-01-11T21:33:14.5150714Z test_nonzero_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5151008Z test_nonzero_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:14.5151302Z test_nonzero_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5151593Z test_nonzero_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:14.5151915Z test_nonzero_discontiguous_cuda (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5152233Z test_nonzero_no_warning_cuda (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5152540Z test_nonzero_non_diff_cuda (__main__.TestShapeOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:14.5152836Z test_rot90_cuda (__main__.TestShapeOpsCUDA) ... ok (0.023s) 2023-01-11T21:33:14.5153143Z test_sparse_dense_dim_cuda_complex128 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5153474Z test_sparse_dense_dim_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5153788Z test_sparse_dense_dim_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:14.5154113Z test_tolist_cuda (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5154428Z test_trace_cuda_float16 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5154754Z test_trace_cuda_float32 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5155052Z test_trace_cuda_float64 (__main__.TestShapeOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:14.5155348Z test_trace_cuda_int16 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5155640Z test_trace_cuda_int32 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5155936Z test_trace_cuda_int64 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156229Z test_trace_cuda_int8 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156525Z test_trace_cuda_uint8 (__main__.TestShapeOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:14.5156830Z test_unbind_cuda (__main__.TestShapeOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:14.5157012Z 2023-01-11T21:33:14.5157252Z ---------------------------------------------------------------------- 2023-01-11T21:33:14.5157510Z Ran 93 tests in 1.715s 2023-01-11T21:33:14.5157629Z 2023-01-11T21:33:14.5157706Z OK (skipped=4) 2023-01-11T21:33:14.5157822Z 2023-01-11T21:33:14.5157913Z Generating XML reports... 2023-01-11T21:33:14.5158348Z Generated XML report: test-reports/python-unittest/test_shape_ops/TEST-TestShapeOpsCUDA-20230111213312.xml 2023-01-11T21:33:14.5158592Z 2023-01-11T21:33:14.5158880Z ##[endgroup] 2023-01-11T21:33:14.5159276Z FINISHED PRINTING LOG FILE of test_shape_ops (/var/lib/jenkins/workspace/test/test-reports/test_shape_ops_sg7hohkm) 2023-01-11T21:33:14.5159496Z 2023-01-11T21:33:16.4386065Z Ignoring disabled issues: [] 2023-01-11T21:33:16.4623807Z Running test_type_info ... [2023-01-11 21:33:16.461571] 2023-01-11T21:33:16.4624650Z Executing ['/opt/conda/bin/python', '-bb', 'test_type_info.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:16.461911] 2023-01-11T21:33:19.3085183Z 2023-01-11T21:33:19.3085951Z Expand the folded group to see the log file of test_type_info 2023-01-11T21:33:19.3086806Z ##[group]PRINTING LOG FILE of test_type_info (/var/lib/jenkins/workspace/test/test-reports/test_type_info_vq_q4qi7) 2023-01-11T21:33:19.3087053Z 2023-01-11T21:33:19.3087140Z Running tests... 
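Note on the test_optim log further above: it repeats two scheduler warnings, one about calling lr_scheduler.step() before optimizer.step() and one about the deprecated epoch argument to scheduler.step(epoch). A minimal sketch of the usage the warnings ask for, per the warning text itself (the model, optimizer, and scheduler choices here are illustrative):

    import torch

    model = torch.nn.Linear(4, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=10)

    for _ in range(3):
        opt.zero_grad()
        model(torch.randn(8, 4)).sum().backward()
        opt.step()    # optimizer first, as required since PyTorch 1.1.0 ...
        sched.step()  # ... then the scheduler, with no epoch argument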
2023-01-11T21:33:19.3087573Z ---------------------------------------------------------------------- 2023-01-11T21:33:19.3087975Z Test results will be stored in test-reports/python-unittest/test_type_info 2023-01-11T21:33:19.3088495Z test_finfo (__main__.TestDTypeInfo) ... ok (1.096s) 2023-01-11T21:33:19.3088763Z test_iinfo (__main__.TestDTypeInfo) ... ok (0.001s) 2023-01-11T21:33:19.3089043Z test_invalid_input (__main__.TestDTypeInfo) ... ok (0.001s) 2023-01-11T21:33:19.3089205Z 2023-01-11T21:33:19.3089406Z ---------------------------------------------------------------------- 2023-01-11T21:33:19.3089662Z Ran 3 tests in 1.098s 2023-01-11T21:33:19.3089785Z 2023-01-11T21:33:19.3089859Z OK 2023-01-11T21:33:19.3089960Z 2023-01-11T21:33:19.3090056Z Generating XML reports... 2023-01-11T21:33:19.3090493Z Generated XML report: test-reports/python-unittest/test_type_info/TEST-TestDTypeInfo-20230111213317.xml 2023-01-11T21:33:19.3090732Z 2023-01-11T21:33:19.3090960Z ##[endgroup] 2023-01-11T21:33:19.3091351Z FINISHED PRINTING LOG FILE of test_type_info (/var/lib/jenkins/workspace/test/test-reports/test_type_info_vq_q4qi7) 2023-01-11T21:33:19.3091572Z 2023-01-11T21:33:21.2182644Z Ignoring disabled issues: [] 2023-01-11T21:33:21.2420429Z Running test_view_ops ... [2023-01-11 21:33:21.241339] 2023-01-11T21:33:21.2421461Z Executing ['/opt/conda/bin/python', '-bb', 'test_view_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:33:21.241713] 2023-01-11T21:33:40.0382656Z 2023-01-11T21:33:40.0383292Z Expand the folded group to see the log file of test_view_ops 2023-01-11T21:33:40.0384014Z ##[group]PRINTING LOG FILE of test_view_ops (/var/lib/jenkins/workspace/test/test-reports/test_view_ops_yux6z2dq) 2023-01-11T21:33:40.0384241Z 2023-01-11T21:33:40.0387633Z Running tests... 2023-01-11T21:33:40.0388305Z ---------------------------------------------------------------------- 2023-01-11T21:33:40.0389162Z Test results will be stored in test-reports/python-unittest/test_view_ops 2023-01-11T21:33:40.0390717Z test_T_cuda (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1305: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorShape.cpp:3542.) 2023-01-11T21:33:40.0391736Z t1 = a.T 2023-01-11T21:33:40.0391986Z ok (0.004s) 2023-01-11T21:33:40.0392324Z test_atleast_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.296s) 2023-01-11T21:33:40.0392780Z test_atleast_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.299s) 2023-01-11T21:33:40.0393242Z test_atleast_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.271s) 2023-01-11T21:33:40.0393687Z test_atleast_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.266s) 2023-01-11T21:33:40.0394791Z test_atleast_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.259s) 2023-01-11T21:33:40.0395216Z test_atleast_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.137s) 2023-01-11T21:33:40.0395647Z test_atleast_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.139s) 2023-01-11T21:33:40.0396064Z test_atleast_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.129s) 2023-01-11T21:33:40.0396490Z test_atleast_cuda_int8 (__main__.TestOldViewOpsCUDA) ... 
ok (0.136s) 2023-01-11T21:33:40.0396922Z test_atleast_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.136s) 2023-01-11T21:33:40.0397351Z test_atleast_gradient_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.076s) 2023-01-11T21:33:40.0397760Z test_big_transpose_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.006s) 2023-01-11T21:33:40.0398250Z test_broadcast_shapes_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:33:40.0398811Z test_broadcast_tensors_cuda_float32 (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0453547Z test_broadcast_to_cuda_bool (__main__.TestOldViewOpsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py:679: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/utils/tensor_numpy.cpp:210.) 2023-01-11T21:33:40.0455100Z return torch.as_tensor(tensor_like) 2023-01-11T21:33:40.0455448Z ok (0.042s) 2023-01-11T21:33:40.0455805Z test_broadcast_to_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0456401Z test_broadcast_to_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0456874Z test_broadcast_to_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0457343Z test_broadcast_to_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.043s) 2023-01-11T21:33:40.0457802Z test_broadcast_to_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0458262Z test_broadcast_to_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:40.0458738Z test_broadcast_to_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0459197Z test_broadcast_to_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.041s) 2023-01-11T21:33:40.0459654Z test_broadcast_to_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0460078Z test_broadcast_to_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.042s) 2023-01-11T21:33:40.0460407Z test_chunk_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0460880Z test_conj_neg_view_numpy_error_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0461535Z test_contiguous_cuda (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1683: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0462088Z x.set_(x.storage(), 0, x.size(), stride) 2023-01-11T21:33:40.0462295Z ok (0.001s) 2023-01-11T21:33:40.0462564Z test_crow_col_indices_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0462889Z test_empty_reshape_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0463192Z test_expand_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.010s) 2023-01-11T21:33:40.0463493Z test_flatten_cuda (__main__.TestOldViewOpsCUDA) ... 
ok (0.016s) 2023-01-11T21:33:40.0463797Z test_memory_format_resize__cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0464123Z test_memory_format_resize_as_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0464457Z test_narrow_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0464800Z test_narrow_tensor_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0465122Z test_python_types_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0465421Z test_ravel_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.006s) 2023-01-11T21:33:40.0465717Z test_reshape_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0466354Z test_reshape_view_semantics_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:1669: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0466988Z self.assertEqual(tensor.storage().data_ptr(), view_tensor.storage().data_ptr()) 2023-01-11T21:33:40.0467616Z /var/lib/jenkins/workspace/test/test_view_ops.py:1675: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0468193Z self.assertNotEqual(tensor.storage().data_ptr(), copy_tensor.storage().data_ptr()) 2023-01-11T21:33:40.0468461Z ok (0.001s) 2023-01-11T21:33:40.0468720Z test_reshape_view_semantics_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469067Z test_reshape_view_semantics_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469431Z test_reshape_view_semantics_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0469782Z test_reshape_view_semantics_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470122Z test_reshape_view_semantics_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470465Z test_reshape_view_semantics_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0470855Z test_reshape_view_semantics_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471188Z test_reshape_view_semantics_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471526Z test_reshape_view_semantics_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0471866Z test_reshape_view_semantics_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0472240Z test_reshape_view_semantics_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0472571Z test_resize_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0472917Z test_resize_as_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0473263Z test_resize_as_preserves_strides_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0473584Z test_resize_overflow_cuda (__main__.TestOldViewOpsCUDA) ... 
ok (0.006s) 2023-01-11T21:33:40.0473912Z test_split_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:33:40.0474246Z test_t_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0474570Z test_tensor_split_errors_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.040s) 2023-01-11T21:33:40.0474892Z test_tensor_split_indices_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.169s) 2023-01-11T21:33:40.0475236Z test_tensor_split_indices_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.224s) 2023-01-11T21:33:40.0475586Z test_tensor_split_indices_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.229s) 2023-01-11T21:33:40.0475920Z test_tensor_split_indices_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.213s) 2023-01-11T21:33:40.0476260Z test_tensor_split_indices_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.214s) 2023-01-11T21:33:40.0476596Z test_tensor_split_indices_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.211s) 2023-01-11T21:33:40.0476933Z test_tensor_split_indices_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.171s) 2023-01-11T21:33:40.0477261Z test_tensor_split_indices_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.175s) 2023-01-11T21:33:40.0477591Z test_tensor_split_indices_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.170s) 2023-01-11T21:33:40.0477926Z test_tensor_split_indices_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.176s) 2023-01-11T21:33:40.0478253Z test_tensor_split_indices_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.174s) 2023-01-11T21:33:40.0478588Z test_tensor_split_sections_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.605s) 2023-01-11T21:33:40.0478929Z test_tensor_split_sections_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.798s) 2023-01-11T21:33:40.0479307Z test_tensor_split_sections_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.820s) 2023-01-11T21:33:40.0479648Z test_tensor_split_sections_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.754s) 2023-01-11T21:33:40.0479989Z test_tensor_split_sections_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.765s) 2023-01-11T21:33:40.0480328Z test_tensor_split_sections_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.740s) 2023-01-11T21:33:40.0480671Z test_tensor_split_sections_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.599s) 2023-01-11T21:33:40.0480998Z test_tensor_split_sections_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.601s) 2023-01-11T21:33:40.0481339Z test_tensor_split_sections_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.576s) 2023-01-11T21:33:40.0481669Z test_tensor_split_sections_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.603s) 2023-01-11T21:33:40.0481994Z test_tensor_split_sections_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.597s) 2023-01-11T21:33:40.0482338Z test_transpose_invalid_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0482678Z test_transpose_invalid_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0483013Z test_transpose_invalid_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.036s) 2023-01-11T21:33:40.0483343Z test_transpose_vs_numpy_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.033s) 2023-01-11T21:33:40.0483684Z test_transpose_vs_numpy_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.029s) 2023-01-11T21:33:40.0484021Z test_transpose_vs_numpy_cuda_int64 (__main__.TestOldViewOpsCUDA) ... 
ok (0.018s) 2023-01-11T21:33:40.0484381Z test_transposes_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0484708Z test_transposes_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0485031Z test_transposes_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.008s) 2023-01-11T21:33:40.0485371Z test_transposes_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0485690Z test_transposes_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0486014Z test_transposes_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0486338Z test_transposes_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0486643Z test_transposes_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0486960Z test_transposes_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487271Z test_transposes_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487587Z test_transposes_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0487892Z test_transposes_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0488217Z test_transposes_errors_cuda_bfloat16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0488560Z test_transposes_errors_cuda_bool (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0488887Z test_transposes_errors_cuda_complex128 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0489359Z test_transposes_errors_cuda_complex64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0489694Z test_transposes_errors_cuda_float16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490026Z test_transposes_errors_cuda_float32 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490364Z test_transposes_errors_cuda_float64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0490697Z test_transposes_errors_cuda_int16 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491015Z test_transposes_errors_cuda_int32 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491337Z test_transposes_errors_cuda_int64 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0491698Z test_transposes_errors_cuda_int8 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0492015Z test_transposes_errors_cuda_uint8 (__main__.TestOldViewOpsCUDA) ... ok (0.012s) 2023-01-11T21:33:40.0492348Z test_unsqueeze_cuda (__main__.TestOldViewOpsCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:33:40.0492693Z test_view_all_dtypes_and_devices_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0493014Z test_view_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.038s) 2023-01-11T21:33:40.0493304Z test_view_empty_cuda (__main__.TestOldViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0493609Z test_T_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0493922Z test_advanced_indexing_assignment_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0494247Z test_advanced_indexing_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0495280Z test_as_strided_gradients_cuda (__main__.TestViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0495896Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0496565Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0497195Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0497744Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0498270Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0498818Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0499347Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0499894Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0500408Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0500997Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0501537Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0502113Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0502644Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0503179Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0503699Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0504249Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0504779Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0505326Z /var/lib/jenkins/workspace/test/test_view_ops.py:667: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:33:40.0505842Z assert max_offset < len(y.storage()), "test case resizes storage" 2023-01-11T21:33:40.0506073Z ok (0.096s) 2023-01-11T21:33:40.0506332Z test_as_strided_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0506683Z test_as_strided_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0506998Z test_basic_indexing_ellipses_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0507335Z test_basic_indexing_newaxis_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0507667Z test_basic_indexing_slice_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0508054Z test_chunk_view_cuda (__main__.TestViewOpsCUDA) ... skip: See https://github.com/pytorch/pytorch/pull/32720 (0.000s) 2023-01-11T21:33:40.0508426Z test_conj_imag_view_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0508749Z test_conj_imag_view_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509067Z test_conj_self_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509369Z test_conj_self_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509683Z test_conj_self_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0509987Z test_conj_self_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0510282Z test_conj_self_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0510588Z test_conj_self_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0510886Z test_conj_self_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511185Z test_conj_self_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511477Z test_conj_self_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0511788Z test_conj_view_with_shared_memory_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0512111Z test_contiguous_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0512412Z test_contiguous_self_cuda (__main__.TestViewOpsCUDA) ... ok (0.000s) 2023-01-11T21:33:40.0512719Z test_diagonal_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0513021Z test_expand_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0513312Z test_expand_view_cuda (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0513662Z test_flatten_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0513966Z test_flatten_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0514279Z test_imag_noncomplex_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0514594Z test_imag_noncomplex_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0514916Z test_imag_noncomplex_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515234Z test_imag_noncomplex_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515547Z test_imag_noncomplex_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0515868Z test_imag_noncomplex_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516187Z test_imag_noncomplex_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516497Z test_imag_noncomplex_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0516807Z test_imag_noncomplex_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0517116Z test_movedim_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.004s) 2023-01-11T21:33:40.0517421Z test_narrow_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0517710Z test_permute_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0518024Z test_real_imag_view_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0518343Z test_real_imag_view_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.003s) 2023-01-11T21:33:40.0518654Z test_reshape_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0518990Z test_reshape_nonview_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519290Z test_reshape_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519590Z test_select_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0519901Z test_set_real_imag_cuda_complex128_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0520242Z test_set_real_imag_cuda_complex128_bool (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0520581Z test_set_real_imag_cuda_complex128_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0520977Z test_set_real_imag_cuda_complex128_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521305Z test_set_real_imag_cuda_complex128_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521638Z test_set_real_imag_cuda_complex128_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0521972Z test_set_real_imag_cuda_complex128_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0522293Z test_set_real_imag_cuda_complex128_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0522625Z test_set_real_imag_cuda_complex128_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0522951Z test_set_real_imag_cuda_complex128_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0523278Z test_set_real_imag_cuda_complex128_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0523600Z test_set_real_imag_cuda_complex128_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0523930Z test_set_real_imag_cuda_complex64_bfloat16 (__main__.TestViewOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:40.0524262Z test_set_real_imag_cuda_complex64_bool (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0524590Z test_set_real_imag_cuda_complex64_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0524933Z test_set_real_imag_cuda_complex64_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0525269Z test_set_real_imag_cuda_complex64_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0525634Z test_set_real_imag_cuda_complex64_float32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0525957Z test_set_real_imag_cuda_complex64_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0526291Z test_set_real_imag_cuda_complex64_int16 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0526622Z test_set_real_imag_cuda_complex64_int32 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0526942Z test_set_real_imag_cuda_complex64_int64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0527269Z test_set_real_imag_cuda_complex64_int8 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0527594Z test_set_real_imag_cuda_complex64_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0527977Z test_split_view_cuda (__main__.TestViewOpsCUDA) ... skip: See https://github.com/pytorch/pytorch/pull/32720 (0.000s) 2023-01-11T21:33:40.0528337Z test_squeeze_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0528647Z test_squeeze_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0528945Z test_t_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0529233Z test_t_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0529531Z test_transpose_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0529845Z test_transpose_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0530138Z test_unbind_cuda (__main__.TestViewOpsCUDA) ... ok (0.005s) 2023-01-11T21:33:40.0530417Z test_unbind_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0530718Z test_unfold_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531105Z test_unsqueeze_inplace_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531405Z test_unsqueeze_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0531708Z test_view_as_complex_cuda (__main__.TestViewOpsCUDA) ... ok (0.030s) 2023-01-11T21:33:40.0532022Z test_view_as_real_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0532871Z test_view_as_real_cuda_complex32 (__main__.TestViewOpsCUDA) ... /var/lib/jenkins/workspace/test/test_view_ops.py:308: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 2023-01-11T21:33:40.0533351Z t = torch.randn(3, 4, dtype=dtype, device=device) 2023-01-11T21:33:40.0533570Z ok (0.002s) 2023-01-11T21:33:40.0533821Z test_view_as_real_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0534119Z test_view_as_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0534412Z test_view_copy_cuda (__main__.TestViewOpsCUDA) ... ok (0.002s) 2023-01-11T21:33:40.0538973Z test_view_copy_out_cuda (__main__.TestViewOpsCUDA) ... 
ok (0.002s) 2023-01-11T21:33:40.0539302Z test_view_copy_output_contiguous_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0539613Z test_view_dtype_new_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0539930Z test_view_dtype_new_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.198s) 2023-01-11T21:33:40.0540250Z test_view_dtype_new_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.166s) 2023-01-11T21:33:40.0560637Z test_view_dtype_new_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0560969Z test_view_dtype_new_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0561288Z test_view_dtype_new_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.162s) 2023-01-11T21:33:40.0561618Z test_view_dtype_new_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0561923Z test_view_dtype_new_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.179s) 2023-01-11T21:33:40.0562233Z test_view_dtype_new_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.162s) 2023-01-11T21:33:40.0562819Z test_view_dtype_new_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.180s) 2023-01-11T21:33:40.0563135Z test_view_dtype_new_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.178s) 2023-01-11T21:33:40.0563451Z test_view_dtype_upsize_errors_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.091s) 2023-01-11T21:33:40.0563792Z test_view_dtype_upsize_errors_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.135s) 2023-01-11T21:33:40.0564138Z test_view_dtype_upsize_errors_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0564477Z test_view_dtype_upsize_errors_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0564825Z test_view_dtype_upsize_errors_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.090s) 2023-01-11T21:33:40.0565165Z test_view_dtype_upsize_errors_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.060s) 2023-01-11T21:33:40.0565509Z test_view_dtype_upsize_errors_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0565842Z test_view_dtype_upsize_errors_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.090s) 2023-01-11T21:33:40.0566171Z test_view_dtype_upsize_errors_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.060s) 2023-01-11T21:33:40.0566500Z test_view_dtype_upsize_errors_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.016s) 2023-01-11T21:33:40.0566820Z test_view_dtype_upsize_errors_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.135s) 2023-01-11T21:33:40.0567153Z test_view_dtype_upsize_errors_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.137s) 2023-01-11T21:33:40.0567487Z test_view_tensor_dsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0567903Z test_view_tensor_dsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568230Z test_view_tensor_dsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568565Z test_view_tensor_dsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0568899Z test_view_tensor_dsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569217Z test_view_tensor_dsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569542Z test_view_tensor_dsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0569867Z test_view_tensor_dsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0570188Z test_view_tensor_dsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0570505Z test_view_tensor_dsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0570827Z test_view_tensor_dsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571158Z test_view_tensor_dsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571476Z test_view_tensor_hsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0571809Z test_view_tensor_hsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572141Z test_view_tensor_hsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572478Z test_view_tensor_hsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0572800Z test_view_tensor_hsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573126Z test_view_tensor_hsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573456Z test_view_tensor_hsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0573769Z test_view_tensor_hsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0574094Z test_view_tensor_hsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0574414Z test_view_tensor_hsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575122Z test_view_tensor_hsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575526Z test_view_tensor_hsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0575845Z test_view_tensor_split_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576273Z test_view_tensor_split_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576591Z test_view_tensor_split_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0576920Z test_view_tensor_split_cuda_complex64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577247Z test_view_tensor_split_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577577Z test_view_tensor_split_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0577887Z test_view_tensor_split_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578207Z test_view_tensor_split_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578528Z test_view_tensor_split_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0578834Z test_view_tensor_split_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579144Z test_view_tensor_split_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579459Z test_view_tensor_split_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0579778Z test_view_tensor_vsplit_cuda_bfloat16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580093Z test_view_tensor_vsplit_cuda_bool (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580419Z test_view_tensor_vsplit_cuda_complex128 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0580862Z test_view_tensor_vsplit_cuda_complex64 (__main__.TestViewOpsCUDA) ... 
ok (0.001s) 2023-01-11T21:33:40.0581181Z test_view_tensor_vsplit_cuda_float16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0581506Z test_view_tensor_vsplit_cuda_float32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0581826Z test_view_tensor_vsplit_cuda_float64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582149Z test_view_tensor_vsplit_cuda_int16 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582455Z test_view_tensor_vsplit_cuda_int32 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0582772Z test_view_tensor_vsplit_cuda_int64 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583088Z test_view_tensor_vsplit_cuda_int8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583392Z test_view_tensor_vsplit_cuda_uint8 (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583699Z test_view_view_cuda (__main__.TestViewOpsCUDA) ... ok (0.001s) 2023-01-11T21:33:40.0583867Z 2023-01-11T21:33:40.0584184Z ---------------------------------------------------------------------- 2023-01-11T21:33:40.0584452Z Ran 278 tests in 15.850s 2023-01-11T21:33:40.0584568Z 2023-01-11T21:33:40.0584656Z OK (skipped=12) 2023-01-11T21:33:40.0584774Z 2023-01-11T21:33:40.0584866Z Generating XML reports... 2023-01-11T21:33:40.0585311Z Generated XML report: test-reports/python-unittest/test_view_ops/TEST-TestOldViewOpsCUDA-20230111213323.xml 2023-01-11T21:33:40.0585821Z Generated XML report: test-reports/python-unittest/test_view_ops/TEST-TestViewOpsCUDA-20230111213323.xml 2023-01-11T21:33:40.0586059Z 2023-01-11T21:33:40.0586478Z ##[endgroup] 2023-01-11T21:33:40.0586868Z FINISHED PRINTING LOG FILE of test_view_ops (/var/lib/jenkins/workspace/test/test-reports/test_view_ops_yux6z2dq) 2023-01-11T21:33:40.0587088Z 2023-01-11T21:33:42.0001283Z Ignoring disabled issues: [] 2023-01-11T21:38:05.6660668Z 2023-01-11T21:38:05.6663080Z Expand the folded group to see the log file of inductor/test_torchinductor 2023-01-11T21:38:05.6663865Z ##[group]PRINTING LOG FILE of inductor/test_torchinductor (/var/lib/jenkins/workspace/test/test-reports/inductor-test_torchinductor_y59pervs) 2023-01-11T21:38:05.6709623Z 2023-01-11T21:38:05.6710254Z Running tests... 2023-01-11T21:38:05.6714453Z ---------------------------------------------------------------------- 2023-01-11T21:38:05.6715199Z Test results will be stored in test-reports/python-unittest/inductor.test_torchinductor 2023-01-11T21:38:05.6715723Z test_auto_simd (__main__.CPUReproTests) ... ok (2.951s) 2023-01-11T21:38:05.6716161Z test_complex_memory_overlap (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.6717157Z test_conv_stride_constraints (__main__.CPUReproTests) ... 
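The dump that follows is TorchInductor's compile log for test_conv_stride_constraints: the "Step 1" messages come from torch._inductor.compile_fx, and the code printed after them is the generated Python wrapper plus the C++ kernel it calls. As a rough reproduction sketch (the debug flag and its exact effect are assumptions about this era of the codebase, not something this log states), a similar dump can be obtained by compiling a small convolution with the inductor backend:

import torch
import torch._dynamo as dynamo
import torch._inductor.config as inductor_config

# Assumption: with config.debug enabled, inductor prints the generated
# wrapper/kernel source, which is what appears in the log below.
inductor_config.debug = True

def conv(inp, weight):
    # Same shapes as the rand_strided inputs in the dump below.
    return torch.nn.functional.conv2d(inp, weight)

compiled = dynamo.optimize("inductor")(conv)
compiled(torch.randn(2, 5, 16, 16), torch.randn(6, 5, 3, 3))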
[2023-01-11 21:23:39,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6717830Z [2023-01-11 21:23:40,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6718312Z [2023-01-11 21:23:40,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6718781Z [2023-01-11 21:23:40,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6718995Z 2023-01-11T21:38:05.6719101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6719501Z import torch 2023-01-11T21:38:05.6720075Z import random 2023-01-11T21:38:05.6720371Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6720714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6720922Z 2023-01-11T21:38:05.6721044Z aten = torch.ops.aten 2023-01-11T21:38:05.6721391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6723174Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6723584Z 2023-01-11T21:38:05.6723705Z import triton 2023-01-11T21:38:05.6723991Z import triton.language as tl 2023-01-11T21:38:05.6724339Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6724719Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6724955Z 2023-01-11T21:38:05.6724962Z 2023-01-11T21:38:05.6725201Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6725652Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6726081Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6726341Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6726546Z { 2023-01-11T21:38:05.6726751Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6726948Z { 2023-01-11T21:38:05.6727140Z #pragma omp for collapse(3) 2023-01-11T21:38:05.6727360Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6727545Z { 2023-01-11T21:38:05.6727740Z for(long i1=0; i1<5; i1+=1) 2023-01-11T21:38:05.6727930Z { 2023-01-11T21:38:05.6728120Z for(long i2=0; i2<256; i2+=1) 2023-01-11T21:38:05.6728315Z { 2023-01-11T21:38:05.6728493Z { 2023-01-11T21:38:05.6729182Z { 2023-01-11T21:38:05.6729452Z auto tmp0 = in_ptr0[i2 + (256*i1) + (1280*i0)]; 2023-01-11T21:38:05.6729797Z out_ptr0[i1 + (5*i2) + (1280*i0)] = tmp0; 2023-01-11T21:38:05.6730074Z } 2023-01-11T21:38:05.6730313Z } 2023-01-11T21:38:05.6730557Z } 2023-01-11T21:38:05.6730788Z } 2023-01-11T21:38:05.6731014Z } 2023-01-11T21:38:05.6731201Z } 2023-01-11T21:38:05.6731404Z } 2023-01-11T21:38:05.6731654Z ''') 2023-01-11T21:38:05.6731784Z 2023-01-11T21:38:05.6731789Z 2023-01-11T21:38:05.6731916Z async_compile.wait(globals()) 2023-01-11T21:38:05.6732182Z del async_compile 2023-01-11T21:38:05.6732335Z 2023-01-11T21:38:05.6732432Z def call(args): 2023-01-11T21:38:05.6732701Z inp_1, weight_1 = args 2023-01-11T21:38:05.6732973Z args.clear() 2023-01-11T21:38:05.6733438Z buf0 = empty_strided((2, 5, 16, 16), (1280, 1, 80, 5), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6733930Z kernel_cpp_0(c_void_p(inp_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.6734253Z del inp_1 2023-01-11T21:38:05.6734909Z buf1 = aten.convolution(buf0, weight_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.6735324Z assert_size_stride(buf1, (2, 6, 14, 14), (1176, 1, 84, 6)) 
2023-01-11T21:38:05.6735613Z del buf0 2023-01-11T21:38:05.6735853Z del weight_1 2023-01-11T21:38:05.6736069Z return (buf1, ) 2023-01-11T21:38:05.6736190Z 2023-01-11T21:38:05.6736195Z 2023-01-11T21:38:05.6736280Z if __name__ == "__main__": 2023-01-11T21:38:05.6736516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6736800Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6737278Z inp_1 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6737672Z weight_1 = rand_strided((6, 5, 3, 3), (45, 1, 15, 5), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6737976Z print_performance(lambda: call([inp_1, weight_1])) 2023-01-11T21:38:05.6738136Z 2023-01-11T21:38:05.6738140Z 2023-01-11T21:38:05.6738235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6738448Z import torch 2023-01-11T21:38:05.6738635Z import random 2023-01-11T21:38:05.6738861Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6739140Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6742705Z 2023-01-11T21:38:05.6742916Z aten = torch.ops.aten 2023-01-11T21:38:05.6743303Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6743677Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6743977Z 2023-01-11T21:38:05.6744061Z import triton 2023-01-11T21:38:05.6744267Z import triton.language as tl 2023-01-11T21:38:05.6744528Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6744825Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6745000Z 2023-01-11T21:38:05.6745005Z 2023-01-11T21:38:05.6745104Z async_compile.wait(globals()) 2023-01-11T21:38:05.6745306Z del async_compile 2023-01-11T21:38:05.6745427Z 2023-01-11T21:38:05.6745507Z def call(args): 2023-01-11T21:38:05.6745702Z inp_1, weight_1 = args 2023-01-11T21:38:05.6745893Z args.clear() 2023-01-11T21:38:05.6746151Z buf0 = aten.convolution(inp_1, weight_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.6746447Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:05.6746657Z del inp_1 2023-01-11T21:38:05.6746866Z del weight_1 2023-01-11T21:38:05.6747168Z return (buf0, ) 2023-01-11T21:38:05.6747348Z 2023-01-11T21:38:05.6747360Z 2023-01-11T21:38:05.6747451Z if __name__ == "__main__": 2023-01-11T21:38:05.6747688Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6748713Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6749341Z inp_1 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6749957Z weight_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6750262Z print_performance(lambda: call([inp_1, weight_1])) 2023-01-11T21:38:05.6750423Z 2023-01-11T21:38:05.6750500Z ok (1.862s) 2023-01-11T21:38:05.6751072Z test_cpp_kernel_profile (__main__.CPUReproTests) ... 
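The two generated modules above differ only in memory format: in the first, the weight arrives with channels-last strides (45, 1, 15, 5), so inductor emits kernel_cpp_0 to repack the input into matching channels-last strides (1280, 1, 80, 5) before calling aten.convolution, and asserts a channels-last output (1176, 1, 84, 6); in the second, the contiguous weight (45, 9, 3, 1) goes straight to aten.convolution and the output stays contiguous (1176, 196, 14, 1). A small illustration of the two weight layouts (not from the log):

import torch

w = torch.randn(6, 5, 3, 3)
print(w.stride())  # (45, 9, 3, 1): contiguous, as in the second dump
print(w.to(memory_format=torch.channels_last).stride())  # (45, 1, 15, 5), as in the first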
/opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py:179: UserWarning: CUDA is not available, disabling CUDA profiling 2023-01-11T21:38:05.6751771Z warn("CUDA is not available, disabling CUDA profiling") 2023-01-11T21:38:05.6752481Z STAGE:2023-01-11 21:23:41 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:05.6753244Z [2023-01-11 21:23:41,065] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 0 2023-01-11T21:38:05.6754024Z [2023-01-11 21:23:45,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 0 2023-01-11T21:38:05.6754690Z STAGE:2023-01-11 21:23:45 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:05.6755223Z STAGE:2023-01-11 21:23:45 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:05.6755571Z 2023-01-11T21:38:05.6755732Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6756059Z import torch 2023-01-11T21:38:05.6756377Z import random 2023-01-11T21:38:05.6756687Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6757053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6757264Z 2023-01-11T21:38:05.6757378Z aten = torch.ops.aten 2023-01-11T21:38:05.6757710Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6758056Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6758225Z 2023-01-11T21:38:05.6758305Z import triton 2023-01-11T21:38:05.6758507Z import triton.language as tl 2023-01-11T21:38:05.6758786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6759118Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6759308Z 2023-01-11T21:38:05.6759313Z 2023-01-11T21:38:05.6759473Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6759729Z #include <ATen/record_function.h> 2023-01-11T21:38:05.6760102Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6760490Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6760762Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6761003Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6761264Z { 2023-01-11T21:38:05.6761512Z RECORD_FUNCTION("graph_0_kernel_cpp_0", c10::ArrayRef<c10::IValue>({})); 2023-01-11T21:38:05.6761787Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6761991Z { 2023-01-11T21:38:05.6762163Z #pragma omp for 2023-01-11T21:38:05.6762373Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.6762567Z { 2023-01-11T21:38:05.6762799Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6763100Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6763355Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6763582Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6763776Z } 2023-01-11T21:38:05.6763976Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6764199Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:05.6764383Z { 2023-01-11T21:38:05.6764576Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6764799Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6765030Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6765270Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.6765456Z } 2023-01-11T21:38:05.6765614Z } 2023-01-11T21:38:05.6765775Z } 2023-01-11T21:38:05.6765955Z ''') 2023-01-11T21:38:05.6766047Z 2023-01-11T21:38:05.6766054Z 2023-01-11T21:38:05.6766149Z async_compile.wait(globals()) 
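# Note (editorial comment, not emitted by inductor; the semantics below are an
# assumption about torch._inductor.codecache in this era, not stated by the log):
# async_compile.cpp() above queues the C++ source for background compilation and
# returns a handle, and async_compile.wait(globals()) blocks until every queued
# kernel has finished building, rebinding names like kernel_cpp_0 to the loaded,
# callable kernels before call() runs.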
2023-01-11T21:38:05.6766357Z del async_compile 2023-01-11T21:38:05.6766473Z 2023-01-11T21:38:05.6766551Z def call(args): 2023-01-11T21:38:05.6766734Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.6766923Z args.clear() 2023-01-11T21:38:05.6767231Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6767555Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.6767815Z del arg0_1 2023-01-11T21:38:05.6767992Z del arg1_1 2023-01-11T21:38:05.6768171Z return (buf0, ) 2023-01-11T21:38:05.6768287Z 2023-01-11T21:38:05.6768291Z 2023-01-11T21:38:05.6768375Z if __name__ == "__main__": 2023-01-11T21:38:05.6768613Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6768890Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6769274Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6769660Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6769960Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.6770128Z 2023-01-11T21:38:05.6770194Z ok (4.174s) 2023-01-11T21:38:05.6770449Z test_cpu_vec_cosim (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.6771012Z test_inplace_add_alpha (__main__.CPUReproTests) ... [2023-01-11 21:23:45,102] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.6771598Z [2023-01-11 21:23:46,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.6771832Z 2023-01-11T21:38:05.6771932Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6772154Z import torch 2023-01-11T21:38:05.6772352Z import random 2023-01-11T21:38:05.6772594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6772901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6773079Z 2023-01-11T21:38:05.6773169Z aten = torch.ops.aten 2023-01-11T21:38:05.6773453Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6773735Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6773881Z 2023-01-11T21:38:05.6773960Z import triton 2023-01-11T21:38:05.6774176Z import triton.language as tl 2023-01-11T21:38:05.6774448Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6775055Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6775255Z 2023-01-11T21:38:05.6775259Z 2023-01-11T21:38:05.6775494Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6775836Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6776294Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.6776568Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6776817Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.6777012Z { 2023-01-11T21:38:05.6777324Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6777539Z { 2023-01-11T21:38:05.6777713Z #pragma omp for 2023-01-11T21:38:05.6777920Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6778112Z { 2023-01-11T21:38:05.6778348Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6778650Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6778956Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(0.55)); 2023-01-11T21:38:05.6779209Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.6779429Z 
auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.6779652Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6779845Z } 2023-01-11T21:38:05.6780051Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6780280Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6780473Z { 2023-01-11T21:38:05.6780660Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.6780879Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6781115Z auto tmp2 = static_cast<float>(0.55); 2023-01-11T21:38:05.6782834Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.6783127Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.6783424Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.6783687Z } 2023-01-11T21:38:05.6783911Z } 2023-01-11T21:38:05.6784144Z } 2023-01-11T21:38:05.6784423Z ''') 2023-01-11T21:38:05.6784562Z 2023-01-11T21:38:05.6784574Z 2023-01-11T21:38:05.6784700Z async_compile.wait(globals()) 2023-01-11T21:38:05.6784999Z del async_compile 2023-01-11T21:38:05.6785165Z 2023-01-11T21:38:05.6785279Z def call(args): 2023-01-11T21:38:05.6785514Z x_1, y_1 = args 2023-01-11T21:38:05.6785763Z args.clear() 2023-01-11T21:38:05.6786162Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(y_1.data_ptr()), c_void_p(x_1.data_ptr())) 2023-01-11T21:38:05.6786422Z del y_1 2023-01-11T21:38:05.6786599Z return (x_1, ) 2023-01-11T21:38:05.6786718Z 2023-01-11T21:38:05.6786722Z 2023-01-11T21:38:05.6786808Z if __name__ == "__main__": 2023-01-11T21:38:05.6787049Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6787323Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6787697Z x_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6788053Z y_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6788328Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:05.6788477Z 2023-01-11T21:38:05.6788553Z ok (1.707s) 2023-01-11T21:38:05.6789204Z test_inplace_squeeze_needed (__main__.CPUReproTests) ... 
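For reference, the test_inplace_add_alpha kernel above is a single fused in-place update: an 8-lane Vectorized<float> main loop followed by a scalar tail for elements 8 and 9 (note the tail reads out_ptr0, which aliases x_1). A minimal eager-mode equivalent of the compiled graph (a sketch; the sizes follow the rand_strided((10, ), (1, )) inputs in the dump):

import torch

x = torch.randn(10)
y = torch.randn(10)
x.add_(y, alpha=0.55)  # same arithmetic as tmp0 + tmp1 * 0.55, written back into x's buffer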
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.6789717Z warnings.warn( 2023-01-11T21:38:05.6790095Z [2023-01-11 21:23:46,937] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1 2023-01-11T21:38:05.6790551Z [2023-01-11 21:23:48,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1 2023-01-11T21:38:05.6790763Z 2023-01-11T21:38:05.6790869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6791085Z import torch 2023-01-11T21:38:05.6791268Z import random 2023-01-11T21:38:05.6791503Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6791822Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6792003Z 2023-01-11T21:38:05.6792092Z aten = torch.ops.aten 2023-01-11T21:38:05.6792349Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6792618Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6792756Z 2023-01-11T21:38:05.6792838Z import triton 2023-01-11T21:38:05.6793039Z import triton.language as tl 2023-01-11T21:38:05.6793298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6793606Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6793786Z 2023-01-11T21:38:05.6793790Z 2023-01-11T21:38:05.6793934Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6794275Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6794630Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.6794894Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.6795142Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6795398Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.6795648Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.6795891Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6796137Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.6796381Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.6796623Z bool* __restrict__ out_ptr4) 2023-01-11T21:38:05.6796821Z { 2023-01-11T21:38:05.6797009Z auto in_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.6797233Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.6797423Z { 2023-01-11T21:38:05.6797719Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6798012Z float tmp3 = 0; 2023-01-11T21:38:05.6798251Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3); 2023-01-11T21:38:05.6798510Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6798724Z { 2023-01-11T21:38:05.6798933Z #pragma omp for reduction(+:tmp3_vec) 2023-01-11T21:38:05.6799172Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6799411Z { 2023-01-11T21:38:05.6799657Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6799966Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6800230Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6800451Z tmp3_vec += tmp2; 2023-01-11T21:38:05.6800636Z } 2023-01-11T21:38:05.6800943Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec); 2023-01-11T21:38:05.6801290Z #pragma omp for simd simdlen(4) reduction(+:tmp3) 2023-01-11T21:38:05.6801535Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6801731Z { 
2023-01-11T21:38:05.6801934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6802155Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6802375Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6802588Z tmp3 += tmp2; 2023-01-11T21:38:05.6802784Z } 2023-01-11T21:38:05.6802953Z } 2023-01-11T21:38:05.6803142Z out_ptr0[0] = tmp3; 2023-01-11T21:38:05.6803334Z } 2023-01-11T21:38:05.6803492Z { 2023-01-11T21:38:05.6803788Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6804083Z float tmp8 = 0; 2023-01-11T21:38:05.6804322Z auto tmp8_vec = at::vec::Vectorized<float>(tmp8); 2023-01-11T21:38:05.6808441Z float tmp9 = 0; 2023-01-11T21:38:05.6808758Z auto tmp9_vec = at::vec::Vectorized<float>(tmp9); 2023-01-11T21:38:05.6810737Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6811040Z { 2023-01-11T21:38:05.6811367Z #pragma omp for reduction(+:tmp8_vec) reduction(+:tmp9_vec) 2023-01-11T21:38:05.6811744Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.6811977Z { 2023-01-11T21:38:05.6812229Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.6812535Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6812840Z auto tmp3 = at::vec::Vectorized<float>(out_ptr0[0]); 2023-01-11T21:38:05.6813096Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6813430Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:05.6813817Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6814209Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6814717Z auto tmp7 = tmp6.pow(2); 2023-01-11T21:38:05.6815017Z tmp8_vec += tmp7; 2023-01-11T21:38:05.6815446Z tmp9_vec += tmp2; 2023-01-11T21:38:05.6817820Z } 2023-01-11T21:38:05.6818329Z tmp8 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp8_vec); 2023-01-11T21:38:05.6818924Z tmp9 = at::vec::vec_reduce_all([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp9_vec); 2023-01-11T21:38:05.6819426Z #pragma omp for simd simdlen(4) reduction(+:tmp8) reduction(+:tmp9) 2023-01-11T21:38:05.6819772Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.6820032Z { 2023-01-11T21:38:05.6820304Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6820530Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6820937Z auto tmp3 = out_ptr0[0]; 2023-01-11T21:38:05.6821164Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.6821419Z auto tmp4 = static_cast<float>(10); 2023-01-11T21:38:05.6821747Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6824378Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6824607Z auto tmp7 = tmp6 * tmp6; 2023-01-11T21:38:05.6824813Z tmp8 += tmp7; 2023-01-11T21:38:05.6825112Z tmp9 += tmp2; 2023-01-11T21:38:05.6825340Z } 2023-01-11T21:38:05.6825513Z } 2023-01-11T21:38:05.6825690Z out_ptr1[0] = tmp8; 2023-01-11T21:38:05.6825889Z out_ptr2[0] = tmp9; 2023-01-11T21:38:05.6826069Z } 2023-01-11T21:38:05.6826262Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6826524Z { 2023-01-11T21:38:05.6826768Z #pragma omp for 2023-01-11T21:38:05.6827026Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.6827250Z { 2023-01-11T21:38:05.6827424Z { 2023-01-11T21:38:05.6827590Z { 2023-01-11T21:38:05.6827834Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.6828142Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.6828423Z auto tmp3 = out_ptr2[0]; 2023-01-11T21:38:05.6828676Z auto tmp7 = out_ptr1[0]; 2023-01-11T21:38:05.6828907Z auto tmp13 = in_ptr2[i0]; 2023-01-11T21:38:05.6829145Z auto tmp15 = in_ptr3[i0]; 2023-01-11T21:38:05.6829441Z auto tmp2 = tmp0 + tmp1; 
2023-01-11T21:38:05.6829765Z auto tmp4 = static_cast<float>(10); 2023-01-11T21:38:05.6912288Z auto tmp5 = tmp3 / tmp4; 2023-01-11T21:38:05.6912732Z auto tmp6 = tmp2 - tmp5; 2023-01-11T21:38:05.6913030Z auto tmp8 = tmp7 / tmp4; 2023-01-11T21:38:05.6913404Z auto tmp9 = static_cast<float>(1e-05); 2023-01-11T21:38:05.6915586Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.6915905Z auto tmp11 = 1 / std::sqrt(tmp10); 2023-01-11T21:38:05.6916337Z auto tmp12 = tmp6 * tmp11; 2023-01-11T21:38:05.6916638Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:05.6925125Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:05.6925465Z auto tmp17 = tmp16 * (tmp16>0); 2023-01-11T21:38:05.6925759Z auto tmp18 = static_cast<float>(0); 2023-01-11T21:38:05.6926056Z auto tmp19 = tmp17 <= tmp18; 2023-01-11T21:38:05.6926381Z in_out_ptr0[i0] = tmp12; 2023-01-11T21:38:05.6926668Z out_ptr3[i0] = tmp17; 2023-01-11T21:38:05.6926951Z out_ptr4[i0] = tmp19; 2023-01-11T21:38:05.6927203Z } 2023-01-11T21:38:05.6927417Z } 2023-01-11T21:38:05.6927595Z } 2023-01-11T21:38:05.6927786Z #pragma omp single 2023-01-11T21:38:05.6928005Z { 2023-01-11T21:38:05.6928178Z { 2023-01-11T21:38:05.6928346Z { 2023-01-11T21:38:05.6928540Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:05.6928776Z auto tmp1 = static_cast<float>(10); 2023-01-11T21:38:05.6929008Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.6929362Z auto tmp3 = static_cast<float>(1e-05); 2023-01-11T21:38:05.6929657Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.6929971Z auto tmp5 = 1 / std::sqrt(tmp4); 2023-01-11T21:38:05.6930268Z auto tmp6 = tmp5 / tmp1; 2023-01-11T21:38:05.6930538Z in_out_ptr1[0] = tmp6; 2023-01-11T21:38:05.6930797Z } 2023-01-11T21:38:05.6931020Z } 2023-01-11T21:38:05.6931231Z } 2023-01-11T21:38:05.6931445Z } 2023-01-11T21:38:05.6931659Z } 2023-01-11T21:38:05.6931905Z ''') 2023-01-11T21:38:05.6932044Z 2023-01-11T21:38:05.6932051Z 2023-01-11T21:38:05.6932189Z async_compile.wait(globals()) 2023-01-11T21:38:05.6932471Z del async_compile 2023-01-11T21:38:05.6932628Z 2023-01-11T21:38:05.6932723Z def call(args): 2023-01-11T21:38:05.6933044Z primals_1, primals_2, primals_3, primals_4, primals_5 = args 2023-01-11T21:38:05.6933340Z args.clear() 2023-01-11T21:38:05.6933767Z buf0 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6934296Z aten.mm.out(as_strided(primals_5, (1, 10), (10, 1)), as_strided(primals_1, (10, 10), (1, 10)), out=buf0) 2023-01-11T21:38:05.6934802Z del primals_1 2023-01-11T21:38:05.6935222Z buf1 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6935681Z buf2 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6936116Z buf3 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6936449Z buf4 = as_strided(buf0, (10, ), (1, )); del buf0 # reuse 2023-01-11T21:38:05.6936876Z buf5 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6937435Z buf6 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.6937783Z buf7 = buf2; del buf2 # reuse 2023-01-11T21:38:05.6938386Z kernel_cpp_0(c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:05.6938979Z del buf1 2023-01-11T21:38:05.6939210Z del buf3 2023-01-11T21:38:05.6939456Z del primals_2 2023-01-11T21:38:05.6939706Z del primals_4 2023-01-11T21:38:05.6944690Z 
return (buf5, primals_3, as_strided(primals_5, (1, 10), (10, 1)), buf4, buf6, buf7, ) 2023-01-11T21:38:05.6944941Z 2023-01-11T21:38:05.6944948Z 2023-01-11T21:38:05.6945055Z if __name__ == "__main__": 2023-01-11T21:38:05.6945365Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6945728Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6946376Z primals_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6946879Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6947389Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6947883Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6948367Z primals_5 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6948798Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5])) 2023-01-11T21:38:05.6949055Z 2023-01-11T21:38:05.6949148Z ok (2.058s) 2023-01-11T21:38:05.6949770Z test_load_same_bool_tensor_twice (__main__.CPUReproTests) ... [2023-01-11 21:23:48,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 2 2023-01-11T21:38:05.6950444Z [2023-01-11 21:23:50,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 2 2023-01-11T21:38:05.6950727Z 2023-01-11T21:38:05.6950863Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6951132Z import torch 2023-01-11T21:38:05.6951389Z import random 2023-01-11T21:38:05.6951702Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6952054Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6952264Z 2023-01-11T21:38:05.6952377Z aten = torch.ops.aten 2023-01-11T21:38:05.6952712Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6953053Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6953220Z 2023-01-11T21:38:05.6953330Z import triton 2023-01-11T21:38:05.6953608Z import triton.language as tl 2023-01-11T21:38:05.6953945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6954310Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6954538Z 2023-01-11T21:38:05.6954546Z 2023-01-11T21:38:05.6954752Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6955260Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6955701Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.6956048Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6956440Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6956760Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.6957000Z { 2023-01-11T21:38:05.6957271Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6957544Z { 2023-01-11T21:38:05.6957771Z #pragma omp for 2023-01-11T21:38:05.6958052Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.6958300Z { 2023-01-11T21:38:05.6958568Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6958901Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6959297Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6959692Z auto tmp2 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.6960044Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6960524Z auto tmp1 = 
at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6960907Z auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0); 2023-01-11T21:38:05.6961233Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.6961531Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.6961799Z } 2023-01-11T21:38:05.6962058Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.6962353Z for(long i0=32; i0<34; i0+=1) 2023-01-11T21:38:05.6962603Z { 2023-01-11T21:38:05.6962842Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.6963130Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.6963507Z auto tmp1 = static_cast(-33.0); 2023-01-11T21:38:05.6963881Z auto tmp3 = tmp0 ? tmp1 : tmp2; 2023-01-11T21:38:05.6964174Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.6964444Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.6964679Z } 2023-01-11T21:38:05.6964930Z } 2023-01-11T21:38:05.6965172Z } 2023-01-11T21:38:05.6965408Z ''') 2023-01-11T21:38:05.6965539Z 2023-01-11T21:38:05.6965550Z 2023-01-11T21:38:05.6965679Z async_compile.wait(globals()) 2023-01-11T21:38:05.6965962Z del async_compile 2023-01-11T21:38:05.6966118Z 2023-01-11T21:38:05.6966220Z def call(args): 2023-01-11T21:38:05.6966468Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.6966720Z args.clear() 2023-01-11T21:38:05.6967149Z buf0 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6967633Z buf1 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6968093Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.6968479Z del arg0_1 2023-01-11T21:38:05.6968713Z del arg1_1 2023-01-11T21:38:05.6968968Z return (buf0, buf1, ) 2023-01-11T21:38:05.6969121Z 2023-01-11T21:38:05.6969128Z 2023-01-11T21:38:05.6969244Z if __name__ == "__main__": 2023-01-11T21:38:05.6969559Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.6969916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.6970399Z arg0_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.6970876Z arg1_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.6971236Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.6971431Z 2023-01-11T21:38:05.6971533Z ok (1.747s) 2023-01-11T21:38:05.6972141Z test_masked_fill_softmax (__main__.CPUReproTests) ... 
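The kernel logged next fuses a masked_fill with a softmax over the last dimension into a single C++ kernel (a max-reduce, an exp/sum-reduce, then a normalize pass). A minimal sketch of the kind of eager code that lowers to it; the function name and shapes are illustrative, not the actual test body:

import torch

def masked_softmax(x, mask):
    # the fill value (-33.0, matching the constant embedded in the
    # generated kernel) gets folded into all three softmax loops
    return torch.softmax(x.masked_fill(mask.to(torch.bool), -33.0), dim=-1)

x = torch.randn(2, 17)
mask = torch.randint(0, 2, (2, 17), dtype=torch.uint8)
torch.testing.assert_close(torch.compile(masked_softmax)(x, mask),
                           masked_softmax(x, mask))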
[2023-01-11 21:23:50,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 3 2023-01-11T21:38:05.6972830Z [2023-01-11 21:23:52,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 3 2023-01-11T21:38:05.6973088Z 2023-01-11T21:38:05.6973222Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.6973513Z import torch 2023-01-11T21:38:05.6973761Z import random 2023-01-11T21:38:05.6974122Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.6974703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.6974938Z 2023-01-11T21:38:05.6975051Z aten = torch.ops.aten 2023-01-11T21:38:05.6975382Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.6975736Z async_compile = AsyncCompile() 2023-01-11T21:38:05.6975911Z 2023-01-11T21:38:05.6976014Z import triton 2023-01-11T21:38:05.6976298Z import triton.language as tl 2023-01-11T21:38:05.6976620Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.6977014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.6977354Z 2023-01-11T21:38:05.6977360Z 2023-01-11T21:38:05.6977564Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.6978008Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.6978467Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.6978828Z const unsigned char* __restrict__ in_ptr0, 2023-01-11T21:38:05.6979163Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.6979480Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.6979792Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.6980056Z { 2023-01-11T21:38:05.6980298Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:05.6980600Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.6980876Z { 2023-01-11T21:38:05.6981106Z #pragma omp for 2023-01-11T21:38:05.6981379Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6981631Z { 2023-01-11T21:38:05.6981937Z { 2023-01-11T21:38:05.6982593Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.6983261Z float tmp5 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.6983642Z auto tmp5_vec = at::vec::Vectorized(tmp5); 2023-01-11T21:38:05.6983957Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.6984218Z { 2023-01-11T21:38:05.6984509Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6984874Z flag_to_float(in_ptr0 + (8*i1) + (17*i0), g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6985296Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6985709Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6986049Z auto tmp1 = (tmp0); 2023-01-11T21:38:05.6986517Z auto tmp2 = at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6986914Z auto tmp4 = decltype(tmp2)::blendv(tmp3, tmp2, tmp1); 2023-01-11T21:38:05.6987291Z tmp5_vec = at::vec::maximum(tmp5_vec, tmp4); 2023-01-11T21:38:05.6987583Z } 2023-01-11T21:38:05.6987996Z tmp5 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp5_vec); 2023-01-11T21:38:05.6988468Z #pragma omp simd simdlen(4) reduction(max:tmp5) 2023-01-11T21:38:05.6988793Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.6989046Z { 2023-01-11T21:38:05.6989326Z auto tmp0 = in_ptr0[i1 + (17*i0)]; 
2023-01-11T21:38:05.6989646Z auto tmp3 = in_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.6989972Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.6990381Z auto tmp2 = static_cast(-33.0); 2023-01-11T21:38:05.6990714Z auto tmp4 = tmp1 ? tmp2 : tmp3; 2023-01-11T21:38:05.6991020Z tmp5 = std::max(tmp5, tmp4); 2023-01-11T21:38:05.6991284Z } 2023-01-11T21:38:05.6991623Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.6991877Z } 2023-01-11T21:38:05.6992101Z } 2023-01-11T21:38:05.6992338Z #pragma omp for 2023-01-11T21:38:05.6992605Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.6992860Z { 2023-01-11T21:38:05.6993086Z { 2023-01-11T21:38:05.6993470Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.6993860Z float tmp8 = 0; 2023-01-11T21:38:05.6994195Z auto tmp8_vec = at::vec::Vectorized(tmp8); 2023-01-11T21:38:05.6994517Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.6994764Z { 2023-01-11T21:38:05.6995055Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.6995419Z flag_to_float(in_ptr0 + (8*i1) + (17*i0), g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.6995829Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.6996250Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6996647Z auto tmp5 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.6996978Z auto tmp1 = (tmp0); 2023-01-11T21:38:05.6997446Z auto tmp2 = at::vec::Vectorized(static_cast(-33.0)); 2023-01-11T21:38:05.6997844Z auto tmp4 = decltype(tmp2)::blendv(tmp3, tmp2, tmp1); 2023-01-11T21:38:05.6998238Z auto tmp6 = tmp4 - tmp5; 2023-01-11T21:38:05.6998593Z auto tmp7 = tmp6.exp(); 2023-01-11T21:38:05.6998914Z tmp7.store(out_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.6999212Z tmp8_vec += tmp7; 2023-01-11T21:38:05.6999463Z } 2023-01-11T21:38:05.6999873Z tmp8 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp8_vec); 2023-01-11T21:38:05.7000323Z #pragma omp simd simdlen(4) reduction(+:tmp8) 2023-01-11T21:38:05.7000649Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.7000904Z { 2023-01-11T21:38:05.7001179Z auto tmp0 = in_ptr0[i1 + (17*i0)]; 2023-01-11T21:38:05.7001502Z auto tmp3 = in_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.7001795Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:05.7002122Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.7002542Z auto tmp2 = static_cast(-33.0); 2023-01-11T21:38:05.7002874Z auto tmp4 = tmp1 ? 
tmp2 : tmp3; 2023-01-11T21:38:05.7003240Z auto tmp6 = tmp4 - tmp5; 2023-01-11T21:38:05.7003560Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:05.7003873Z out_ptr1[i1 + (17*i0)] = tmp7; 2023-01-11T21:38:05.7004140Z tmp8 += tmp7; 2023-01-11T21:38:05.7004386Z } 2023-01-11T21:38:05.7004650Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:05.7004890Z } 2023-01-11T21:38:05.7005118Z } 2023-01-11T21:38:05.7005360Z #pragma omp for 2023-01-11T21:38:05.7005626Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7005872Z { 2023-01-11T21:38:05.7006129Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7006369Z { 2023-01-11T21:38:05.7006697Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.7007098Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.7007417Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.7007736Z tmp2.store(in_out_ptr0 + (8*i1) + (17*i0)); 2023-01-11T21:38:05.7008016Z } 2023-01-11T21:38:05.7008277Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.7008623Z for(long i1=16; i1<17; i1+=1) 2023-01-11T21:38:05.7008876Z { 2023-01-11T21:38:05.7009144Z auto tmp0 = out_ptr1[i1 + (17*i0)]; 2023-01-11T21:38:05.7009436Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.7009733Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.7010039Z in_out_ptr0[i1 + (17*i0)] = tmp2; 2023-01-11T21:38:05.7010289Z } 2023-01-11T21:38:05.7010518Z } 2023-01-11T21:38:05.7010741Z } 2023-01-11T21:38:05.7010945Z } 2023-01-11T21:38:05.7011187Z ''') 2023-01-11T21:38:05.7011317Z 2023-01-11T21:38:05.7011324Z 2023-01-11T21:38:05.7011462Z async_compile.wait(globals()) 2023-01-11T21:38:05.7011729Z del async_compile 2023-01-11T21:38:05.7011892Z 2023-01-11T21:38:05.7011994Z def call(args): 2023-01-11T21:38:05.7012256Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.7012507Z args.clear() 2023-01-11T21:38:05.7012930Z buf0 = empty_strided((2, 1), (1, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7013411Z buf1 = empty_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7013896Z buf2 = empty_strided((2, 1), (1, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7014220Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.7014887Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.7015303Z del arg0_1 2023-01-11T21:38:05.7015527Z del arg1_1 2023-01-11T21:38:05.7015767Z return (buf3, ) 2023-01-11T21:38:05.7015925Z 2023-01-11T21:38:05.7016003Z 2023-01-11T21:38:05.7016118Z if __name__ == "__main__": 2023-01-11T21:38:05.7016417Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7016786Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7017376Z arg0_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7017863Z arg1_1 = rand_strided((2, 17), (17, 1), device='cpu', dtype=torch.uint8) 2023-01-11T21:38:05.7018231Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.7018436Z 2023-01-11T21:38:05.7018538Z ok (1.825s) 2023-01-11T21:38:05.7019143Z test_new_vec_op_cpu_only (__main__.CPUReproTests) ... 
[2023-01-11 21:23:52,449] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7019812Z [2023-01-11 21:23:54,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7020084Z 2023-01-11T21:38:05.7020219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7020500Z import torch 2023-01-11T21:38:05.7020744Z import random 2023-01-11T21:38:05.7021031Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7021390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7021598Z 2023-01-11T21:38:05.7021710Z aten = torch.ops.aten 2023-01-11T21:38:05.7022052Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7022400Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7022569Z 2023-01-11T21:38:05.7022672Z import triton 2023-01-11T21:38:05.7022935Z import triton.language as tl 2023-01-11T21:38:05.7023276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7023672Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7023900Z 2023-01-11T21:38:05.7023907Z 2023-01-11T21:38:05.7024109Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7024544Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7025020Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7025384Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7025668Z { 2023-01-11T21:38:05.7025938Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7026204Z { 2023-01-11T21:38:05.7026538Z #pragma omp for 2023-01-11T21:38:05.7026812Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7027060Z { 2023-01-11T21:38:05.7027370Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7027699Z auto tmp1 = tmp0.erf(); 2023-01-11T21:38:05.7027996Z auto tmp2 = tmp1.expm1(); 2023-01-11T21:38:05.7028286Z auto tmp3 = tmp2.log1p(); 2023-01-11T21:38:05.7028556Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7028824Z } 2023-01-11T21:38:05.7029090Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7029374Z for(long i0=16; i0<18; i0+=1) 2023-01-11T21:38:05.7029636Z { 2023-01-11T21:38:05.7029895Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7030173Z auto tmp1 = std::erf(tmp0); 2023-01-11T21:38:05.7030473Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.7030768Z auto tmp3 = std::log1p(tmp2); 2023-01-11T21:38:05.7031054Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.7031306Z } 2023-01-11T21:38:05.7031522Z } 2023-01-11T21:38:05.7031724Z } 2023-01-11T21:38:05.7031979Z ''') 2023-01-11T21:38:05.7032117Z 2023-01-11T21:38:05.7032124Z 2023-01-11T21:38:05.7032248Z async_compile.wait(globals()) 2023-01-11T21:38:05.7032514Z del async_compile 2023-01-11T21:38:05.7032668Z 2023-01-11T21:38:05.7032780Z def call(args): 2023-01-11T21:38:05.7033022Z x_1, = args 2023-01-11T21:38:05.7033258Z args.clear() 2023-01-11T21:38:05.7033672Z buf0 = empty_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7034062Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7103722Z del x_1 2023-01-11T21:38:05.7103967Z return (buf0, ) 2023-01-11T21:38:05.7104124Z 2023-01-11T21:38:05.7104131Z 2023-01-11T21:38:05.7104250Z if __name__ == "__main__": 2023-01-11T21:38:05.7104573Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7105018Z from torch._inductor.utils import 
print_performance 2023-01-11T21:38:05.7105593Z x_1 = rand_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7105967Z print_performance(lambda: call([x_1])) 2023-01-11T21:38:05.7106173Z 2023-01-11T21:38:05.7106279Z ok (1.708s) 2023-01-11T21:38:05.7106887Z test_no_op_squeeze (__main__.CPUReproTests) ... [2023-01-11 21:23:54,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 4 2023-01-11T21:38:05.7107592Z [2023-01-11 21:23:54,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 4 2023-01-11T21:38:05.7107875Z 2023-01-11T21:38:05.7108020Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7108323Z import torch 2023-01-11T21:38:05.7108576Z import random 2023-01-11T21:38:05.7108891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7158673Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7158908Z 2023-01-11T21:38:05.7159038Z aten = torch.ops.aten 2023-01-11T21:38:05.7159388Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7159751Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7159943Z 2023-01-11T21:38:05.7160046Z import triton 2023-01-11T21:38:05.7160326Z import triton.language as tl 2023-01-11T21:38:05.7160659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7161065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7161292Z 2023-01-11T21:38:05.7161299Z 2023-01-11T21:38:05.7161427Z async_compile.wait(globals()) 2023-01-11T21:38:05.7161692Z del async_compile 2023-01-11T21:38:05.7161855Z 2023-01-11T21:38:05.7161960Z def call(args): 2023-01-11T21:38:05.7162215Z arg0_1, = args 2023-01-11T21:38:05.7162462Z args.clear() 2023-01-11T21:38:05.7162715Z return (arg0_1, ) 2023-01-11T21:38:05.7162879Z 2023-01-11T21:38:05.7162886Z 2023-01-11T21:38:05.7163001Z if __name__ == "__main__": 2023-01-11T21:38:05.7163498Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7163885Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7164409Z arg0_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7164804Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7164989Z 2023-01-11T21:38:05.7165085Z ok (0.018s) 2023-01-11T21:38:05.7165723Z test_parallel_num_threads (__main__.CPUReproTests) ... 
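The kernel that follows is emitted without the usual #pragma omp parallel wrapper, presumably because the test pins the thread count before compiling. A sketch of that setup (the exact mechanism the real test uses may differ):

import torch

def add(a, b):
    return a + b

# Inductor reads the active thread count at compile time; with a single
# thread the OpenMP parallel region is omitted, leaving a bare loop.
torch.set_num_threads(1)
out = torch.compile(add)(torch.randn(10, 20), torch.randn(10, 20))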
[2023-01-11 21:23:54,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 5 2023-01-11T21:38:05.7166441Z [2023-01-11 21:23:55,873] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 5 2023-01-11T21:38:05.7166734Z 2023-01-11T21:38:05.7166882Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7167154Z import torch 2023-01-11T21:38:05.7167399Z import random 2023-01-11T21:38:05.7167713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7168078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7168300Z 2023-01-11T21:38:05.7168414Z aten = torch.ops.aten 2023-01-11T21:38:05.7168760Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7169101Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7169296Z 2023-01-11T21:38:05.7169395Z import triton 2023-01-11T21:38:05.7169667Z import triton.language as tl 2023-01-11T21:38:05.7170005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7170412Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7170632Z 2023-01-11T21:38:05.7170639Z 2023-01-11T21:38:05.7170963Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7171422Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7171868Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7172215Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7172542Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7172798Z { 2023-01-11T21:38:05.7173053Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.7173301Z { 2023-01-11T21:38:05.7173633Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7174007Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7174343Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7174804Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7175068Z } 2023-01-11T21:38:05.7175321Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.7175617Z for(long i0=200; i0<200; i0+=1) 2023-01-11T21:38:05.7175854Z { 2023-01-11T21:38:05.7176096Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7176373Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.7176637Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7176913Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7177262Z } 2023-01-11T21:38:05.7177472Z } 2023-01-11T21:38:05.7177731Z ''') 2023-01-11T21:38:05.7177867Z 2023-01-11T21:38:05.7177874Z 2023-01-11T21:38:05.7178011Z async_compile.wait(globals()) 2023-01-11T21:38:05.7178277Z del async_compile 2023-01-11T21:38:05.7178437Z 2023-01-11T21:38:05.7178546Z def call(args): 2023-01-11T21:38:05.7178815Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.7179057Z args.clear() 2023-01-11T21:38:05.7179506Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7179942Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7180298Z del arg0_1 2023-01-11T21:38:05.7180539Z del arg1_1 2023-01-11T21:38:05.7180787Z return (buf0, ) 2023-01-11T21:38:05.7180945Z 2023-01-11T21:38:05.7180952Z 2023-01-11T21:38:05.7181073Z if __name__ == "__main__": 2023-01-11T21:38:05.7181379Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7181850Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7182376Z arg0_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 
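# rand_strided builds each input with an explicit (size, stride) layout,
# so this standalone repro exercises exactly the strides the graph was
# compiled for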
2023-01-11T21:38:05.7182877Z arg1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7183279Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.7183491Z 2023-01-11T21:38:05.7183594Z ok (1.763s) 2023-01-11T21:38:05.7184197Z test_sign_cpu_only (__main__.CPUReproTests) ... [2023-01-11 21:23:55,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7184873Z [2023-01-11 21:23:57,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7185158Z 2023-01-11T21:38:05.7185299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7185584Z import torch 2023-01-11T21:38:05.7185837Z import random 2023-01-11T21:38:05.7186132Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7186497Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7186720Z 2023-01-11T21:38:05.7186839Z aten = torch.ops.aten 2023-01-11T21:38:05.7187174Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7187516Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7187693Z 2023-01-11T21:38:05.7187796Z import triton 2023-01-11T21:38:05.7188057Z import triton.language as tl 2023-01-11T21:38:05.7188396Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7188780Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7189003Z 2023-01-11T21:38:05.7189009Z 2023-01-11T21:38:05.7189304Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7189739Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7190200Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7190537Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7190791Z { 2023-01-11T21:38:05.7191049Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7191315Z { 2023-01-11T21:38:05.7191540Z #pragma omp for 2023-01-11T21:38:05.7191811Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7192060Z { 2023-01-11T21:38:05.7192366Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7192807Z auto tmp1 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), decltype(tmp0)(0) < tmp0); 2023-01-11T21:38:05.7193282Z auto tmp2 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), tmp0 < decltype(tmp0)(0)); 2023-01-11T21:38:05.7193712Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:05.7194000Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7194253Z } 2023-01-11T21:38:05.7194519Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7194816Z for(long i0=16; i0<18; i0+=1) 2023-01-11T21:38:05.7195102Z { 2023-01-11T21:38:05.7195348Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7195618Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:05.7195905Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7196236Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:05.7196501Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.7196743Z } 2023-01-11T21:38:05.7196956Z } 2023-01-11T21:38:05.7197158Z } 2023-01-11T21:38:05.7197392Z ''') 2023-01-11T21:38:05.7197523Z 2023-01-11T21:38:05.7197529Z 2023-01-11T21:38:05.7197654Z async_compile.wait(globals()) 2023-01-11T21:38:05.7197915Z del async_compile 2023-01-11T21:38:05.7198070Z 2023-01-11T21:38:05.7198174Z def call(args): 2023-01-11T21:38:05.7198408Z x_1, = args 2023-01-11T21:38:05.7198640Z args.clear() 2023-01-11T21:38:05.7199042Z buf0 = empty_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7199431Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7199785Z del x_1 2023-01-11T21:38:05.7200012Z return (buf0, ) 2023-01-11T21:38:05.7200161Z 2023-01-11T21:38:05.7200167Z 2023-01-11T21:38:05.7200274Z if __name__ == "__main__": 2023-01-11T21:38:05.7200577Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7200924Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7201387Z x_1 = rand_strided((2, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7201732Z print_performance(lambda: call([x_1])) 2023-01-11T21:38:05.7201914Z 2023-01-11T21:38:05.7202007Z ok (1.741s) 2023-01-11T21:38:05.7202301Z test_timed_cpu_only (__main__.CPUReproTests) ... ok (0.001s) 2023-01-11T21:38:05.7202968Z test_vec_dynamic_shapes (__main__.CPUReproTests) ... [2023-01-11 21:23:57,802] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 6 2023-01-11T21:38:05.7203647Z [2023-01-11 21:23:59,535] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 6 2023-01-11T21:38:05.7203914Z 2023-01-11T21:38:05.7204045Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7204309Z import torch 2023-01-11T21:38:05.7204545Z import random 2023-01-11T21:38:05.7204844Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7205191Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7205399Z 2023-01-11T21:38:05.7205508Z aten = torch.ops.aten 2023-01-11T21:38:05.7205839Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7206172Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7206344Z 2023-01-11T21:38:05.7206449Z import triton 2023-01-11T21:38:05.7206778Z import triton.language as tl 2023-01-11T21:38:05.7207110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7207496Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7207725Z 2023-01-11T21:38:05.7207731Z 2023-01-11T21:38:05.7207924Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7208379Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7208821Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.7209162Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7209471Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7209768Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.7210203Z const long ks0, 2023-01-11T21:38:05.7210470Z const long ks1) 2023-01-11T21:38:05.7210705Z { 2023-01-11T21:38:05.7210949Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:05.7211249Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7211505Z { 2023-01-11T21:38:05.7211742Z #pragma omp for 2023-01-11T21:38:05.7212006Z for(long i0=0; i0::infinity(); 
2023-01-11T21:38:05.7213477Z for(long i1=0; i10); 2023-01-11T21:38:05.7439365Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7439658Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7439957Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7440238Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7440601Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7440888Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7441169Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7441457Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7441734Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7442008Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7442302Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7442600Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7442898Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7443211Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7443508Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7443897Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7444221Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7444529Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7444827Z auto tmp27 = tmp25 < 0 ? 1 : 0; 2023-01-11T21:38:05.7445199Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7445488Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7445772Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7446016Z } 2023-01-11T21:38:05.7446242Z } 2023-01-11T21:38:05.7446467Z } 2023-01-11T21:38:05.7446679Z } 2023-01-11T21:38:05.7446880Z } 2023-01-11T21:38:05.7447112Z ''') 2023-01-11T21:38:05.7447248Z 2023-01-11T21:38:05.7447255Z 2023-01-11T21:38:05.7447385Z async_compile.wait(globals()) 2023-01-11T21:38:05.7447650Z del async_compile 2023-01-11T21:38:05.7447802Z 2023-01-11T21:38:05.7447903Z def call(args): 2023-01-11T21:38:05.7448151Z x1_1, x2_1 = args 2023-01-11T21:38:05.7448378Z args.clear() 2023-01-11T21:38:05.7448793Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7449210Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7449529Z del x1_1 2023-01-11T21:38:05.7449755Z del x2_1 2023-01-11T21:38:05.7449987Z return (buf0, ) 2023-01-11T21:38:05.7450133Z 2023-01-11T21:38:05.7450139Z 2023-01-11T21:38:05.7450239Z if __name__ == "__main__": 2023-01-11T21:38:05.7450538Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7450980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7451457Z x1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7451924Z x2_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7452273Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7452475Z 2023-01-11T21:38:05.7452482Z 2023-01-11T21:38:05.7452613Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7452880Z import torch 2023-01-11T21:38:05.7453129Z import random 2023-01-11T21:38:05.7453423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7453769Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7453974Z 2023-01-11T21:38:05.7454085Z aten = torch.ops.aten 2023-01-11T21:38:05.7454417Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7454928Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7455098Z 2023-01-11T21:38:05.7455200Z import triton 2023-01-11T21:38:05.7455463Z import triton.language as tl 2023-01-11T21:38:05.7455807Z from torch._inductor.triton_ops.autotune 
import grid 2023-01-11T21:38:05.7456182Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7456405Z 2023-01-11T21:38:05.7456411Z 2023-01-11T21:38:05.7456607Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7457042Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7457600Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7457931Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7458247Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7458515Z { 2023-01-11T21:38:05.7458761Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7459025Z { 2023-01-11T21:38:05.7459263Z #pragma omp for 2023-01-11T21:38:05.7459520Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.7459773Z { 2023-01-11T21:38:05.7460085Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7460468Z auto tmp11 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7460803Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7461085Z auto tmp2 = tmp1.sin(); 2023-01-11T21:38:05.7461437Z auto tmp3 = tmp2.neg(); 2023-01-11T21:38:05.7461728Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7462080Z auto tmp5 = decltype(tmp4)(1)/(decltype(tmp4)(1) + tmp4.neg().exp()); 2023-01-11T21:38:05.7462461Z auto tmp6 = at::vec::clamp_min(tmp5, decltype(tmp5)(0)); 2023-01-11T21:38:05.7462780Z auto tmp7 = tmp6.cos(); 2023-01-11T21:38:05.7463064Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:05.7463341Z auto tmp9 = tmp8.sqrt(); 2023-01-11T21:38:05.7463617Z auto tmp10 = tmp9 + tmp0; 2023-01-11T21:38:05.7463817Z auto tmp12 = tmp10 - tmp11; 2023-01-11T21:38:05.7463946Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:05.7464065Z auto tmp14 = tmp13 / tmp0; 2023-01-11T21:38:05.7464178Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:05.7464293Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7464411Z auto tmp17 = tmp16 * tmp14; 2023-01-11T21:38:05.7464531Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:05.7464651Z auto tmp19 = tmp18.log(); 2023-01-11T21:38:05.7464776Z auto tmp20 = tmp19.floor(); 2023-01-11T21:38:05.7464896Z auto tmp21 = tmp20.ceil(); 2023-01-11T21:38:05.7465017Z auto tmp22 = tmp21.trunc(); 2023-01-11T21:38:05.7465142Z auto tmp23 = tmp22.lgamma(); 2023-01-11T21:38:05.7465276Z auto tmp24 = tmp23.fmod(tmp11); 2023-01-11T21:38:05.7465488Z auto tmp25 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), decltype(tmp24)(0) < tmp24); 2023-01-11T21:38:05.7465717Z auto tmp26 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), tmp24 < decltype(tmp24)(0)); 2023-01-11T21:38:05.7465986Z auto tmp27 = tmp25 - tmp26; 2023-01-11T21:38:05.7466109Z auto tmp28 = tmp27 + tmp11; 2023-01-11T21:38:05.7466239Z tmp28.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7466334Z } 2023-01-11T21:38:05.7466475Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7466596Z for(long i0=200; i0<200; i0+=1) 2023-01-11T21:38:05.7466681Z { 2023-01-11T21:38:05.7466803Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7466930Z auto tmp12 = in_ptr1[i0]; 2023-01-11T21:38:05.7467064Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7467186Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7467362Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7467483Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7467659Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7467783Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7467904Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7468028Z auto tmp8 = std::cos(tmp7); 
2023-01-11T21:38:05.7468150Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7468285Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7468405Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7468574Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7468692Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7468809Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7468926Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7469046Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7469163Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7469284Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7469404Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7469536Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7469675Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7469807Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7469945Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7470084Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7470218Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7470397Z auto tmp27 = tmp25 < 0 ? 1 : 0; 2023-01-11T21:38:05.7470580Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7470699Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7470814Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7470903Z } 2023-01-11T21:38:05.7470987Z } 2023-01-11T21:38:05.7471077Z } 2023-01-11T21:38:05.7471189Z ''') 2023-01-11T21:38:05.7471208Z 2023-01-11T21:38:05.7471213Z 2023-01-11T21:38:05.7471325Z async_compile.wait(globals()) 2023-01-11T21:38:05.7471433Z del async_compile 2023-01-11T21:38:05.7471441Z 2023-01-11T21:38:05.7471540Z def call(args): 2023-01-11T21:38:05.7471647Z x1_1, x2_1 = args 2023-01-11T21:38:05.7471755Z args.clear() 2023-01-11T21:38:05.7472017Z buf0 = empty_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7472235Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7472324Z del x1_1 2023-01-11T21:38:05.7472422Z del x2_1 2023-01-11T21:38:05.7472514Z return (buf0, ) 2023-01-11T21:38:05.7472521Z 2023-01-11T21:38:05.7472526Z 2023-01-11T21:38:05.7472635Z if __name__ == "__main__": 2023-01-11T21:38:05.7472788Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7472959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7473230Z x1_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7473498Z x2_1 = rand_strided((10, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7473641Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7473704Z 2023-01-11T21:38:05.7474059Z [2023-01-11 21:24:04,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7474404Z [2023-01-11 21:24:04,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.7474755Z [2023-01-11 21:24:06,125] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.7474763Z 2023-01-11T21:38:05.7474905Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7475003Z import torch 2023-01-11T21:38:05.7475105Z import random 2023-01-11T21:38:05.7475268Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7475422Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7475443Z 2023-01-11T21:38:05.7475547Z aten = torch.ops.aten 2023-01-11T21:38:05.7475737Z assert_size_stride = 
torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7475868Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7475878Z 2023-01-11T21:38:05.7475975Z import triton 2023-01-11T21:38:05.7476099Z import triton.language as tl 2023-01-11T21:38:05.7476264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7476456Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7476464Z 2023-01-11T21:38:05.7476474Z 2023-01-11T21:38:05.7476647Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7476916Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7477073Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7477208Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7477342Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7477429Z { 2023-01-11T21:38:05.7477566Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7477658Z { 2023-01-11T21:38:05.7477759Z #pragma omp for 2023-01-11T21:38:05.7477869Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.7477960Z { 2023-01-11T21:38:05.7478078Z #pragma GCC ivdep 2023-01-11T21:38:05.7478191Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.7478283Z { 2023-01-11T21:38:05.7478364Z { 2023-01-11T21:38:05.7478519Z { 2023-01-11T21:38:05.7478673Z auto tmp0 = in_ptr0[i0 + (20*i1)]; 2023-01-11T21:38:05.7478815Z auto tmp12 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:05.7478953Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7479083Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7479269Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7479404Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7479599Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7479729Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7479865Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7480001Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7480133Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7480279Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7480418Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7480611Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7480736Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7480872Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7481004Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7481128Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7481262Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7481389Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7481525Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7481715Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7481864Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7482008Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7482156Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7482310Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7482442Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7482575Z auto tmp27 = tmp25 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7482780Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7482900Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7483027Z out_ptr0[i0 + (20*i1)] = tmp29; 2023-01-11T21:38:05.7483122Z } 2023-01-11T21:38:05.7483217Z } 2023-01-11T21:38:05.7483307Z } 2023-01-11T21:38:05.7483400Z } 2023-01-11T21:38:05.7483490Z } 2023-01-11T21:38:05.7483570Z } 2023-01-11T21:38:05.7483687Z ''') 2023-01-11T21:38:05.7483695Z 2023-01-11T21:38:05.7483702Z 2023-01-11T21:38:05.7483829Z async_compile.wait(globals()) 2023-01-11T21:38:05.7483938Z del async_compile 2023-01-11T21:38:05.7483944Z 2023-01-11T21:38:05.7484049Z def call(args): 2023-01-11T21:38:05.7484148Z x1_1, x2_1 = args 2023-01-11T21:38:05.7484250Z args.clear() 2023-01-11T21:38:05.7484508Z buf0 = empty_strided((20, 10), (1, 20), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7484721Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7484819Z del x1_1 2023-01-11T21:38:05.7484929Z del x2_1 2023-01-11T21:38:05.7485044Z return (buf0, ) 2023-01-11T21:38:05.7485052Z 2023-01-11T21:38:05.7485060Z 2023-01-11T21:38:05.7485186Z if __name__ == "__main__": 2023-01-11T21:38:05.7485347Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7485526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7485794Z x1_1 = rand_strided((20, 10), (1, 20), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7486067Z x2_1 = rand_strided((20, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7486276Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7486283Z 2023-01-11T21:38:05.7486288Z 2023-01-11T21:38:05.7486418Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7486517Z import torch 2023-01-11T21:38:05.7486624Z import random 2023-01-11T21:38:05.7486788Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7486950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7486956Z 2023-01-11T21:38:05.7487050Z aten = torch.ops.aten 2023-01-11T21:38:05.7487227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7487351Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7487362Z 2023-01-11T21:38:05.7487463Z import triton 2023-01-11T21:38:05.7487586Z import triton.language as tl 2023-01-11T21:38:05.7487741Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7487925Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7487932Z 2023-01-11T21:38:05.7487938Z 2023-01-11T21:38:05.7488140Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7488397Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7488564Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7488715Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.7488853Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7488946Z { 2023-01-11T21:38:05.7489083Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7489175Z { 2023-01-11T21:38:05.7489278Z #pragma omp for 2023-01-11T21:38:05.7489394Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7489546Z { 2023-01-11T21:38:05.7489733Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7489914Z auto tmp11 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.7490032Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7490151Z auto tmp2 = tmp1.sin(); 
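// sigmoid is lowered to 1/(1+exp(-x)) and relu to clamp_min(x, 0) on the
// vector type -- see tmp5 and tmp6 just below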
2023-01-11T21:38:05.7490257Z auto tmp3 = tmp2.neg(); 2023-01-11T21:38:05.7490382Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7490563Z auto tmp5 = decltype(tmp4)(1)/(decltype(tmp4)(1) + tmp4.neg().exp()); 2023-01-11T21:38:05.7490740Z auto tmp6 = at::vec::clamp_min(tmp5, decltype(tmp5)(0)); 2023-01-11T21:38:05.7490861Z auto tmp7 = tmp6.cos(); 2023-01-11T21:38:05.7490975Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:05.7491093Z auto tmp9 = tmp8.sqrt(); 2023-01-11T21:38:05.7491216Z auto tmp10 = tmp9 + tmp0; 2023-01-11T21:38:05.7491399Z auto tmp12 = tmp10 - tmp11; 2023-01-11T21:38:05.7491525Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:05.7491646Z auto tmp14 = tmp13 / tmp0; 2023-01-11T21:38:05.7491772Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:05.7491890Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7492010Z auto tmp17 = tmp16 * tmp14; 2023-01-11T21:38:05.7492131Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:05.7492241Z auto tmp19 = tmp18.log(); 2023-01-11T21:38:05.7492362Z auto tmp20 = tmp19.floor(); 2023-01-11T21:38:05.7492481Z auto tmp21 = tmp20.ceil(); 2023-01-11T21:38:05.7492600Z auto tmp22 = tmp21.trunc(); 2023-01-11T21:38:05.7492730Z auto tmp23 = tmp22.lgamma(); 2023-01-11T21:38:05.7492858Z auto tmp24 = tmp23.fmod(tmp11); 2023-01-11T21:38:05.7493084Z auto tmp25 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), decltype(tmp24)(0) < tmp24); 2023-01-11T21:38:05.7493317Z auto tmp26 = decltype(tmp24)::blendv(decltype(tmp24)(0), decltype(tmp24)(1), tmp24 < decltype(tmp24)(0)); 2023-01-11T21:38:05.7493493Z auto tmp27 = tmp25 - tmp26; 2023-01-11T21:38:05.7493617Z auto tmp28 = tmp27 + tmp11; 2023-01-11T21:38:05.7493737Z tmp28.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7493881Z } 2023-01-11T21:38:05.7494018Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7494132Z for(long i0=64; i0<70; i0+=1) 2023-01-11T21:38:05.7494225Z { 2023-01-11T21:38:05.7494338Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7494456Z auto tmp12 = in_ptr1[i0]; 2023-01-11T21:38:05.7655216Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7655720Z auto tmp2 = std::sin(tmp1); 2023-01-11T21:38:05.7655975Z auto tmp3 = -tmp2; 2023-01-11T21:38:05.7656127Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:05.7686671Z auto tmp5 = std::exp(-tmp4); 2023-01-11T21:38:05.7686804Z auto tmp6 = 1 / (1 + tmp5); 2023-01-11T21:38:05.7686952Z auto tmp7 = tmp6 * (tmp6>0); 2023-01-11T21:38:05.7687108Z auto tmp8 = std::cos(tmp7); 2023-01-11T21:38:05.7687728Z auto tmp9 = std::exp(tmp8); 2023-01-11T21:38:05.7688116Z auto tmp10 = std::sqrt(tmp9); 2023-01-11T21:38:05.7688264Z auto tmp11 = tmp10 + tmp0; 2023-01-11T21:38:05.7688437Z auto tmp13 = tmp11 - tmp12; 2023-01-11T21:38:05.7688531Z auto tmp14 = tmp13 * tmp0; 2023-01-11T21:38:05.7688644Z auto tmp15 = tmp14 / tmp0; 2023-01-11T21:38:05.7688755Z auto tmp16 = tmp15 * tmp15; 2023-01-11T21:38:05.7688867Z auto tmp17 = tmp16 * tmp16; 2023-01-11T21:38:05.7688981Z auto tmp18 = tmp17 * tmp15; 2023-01-11T21:38:05.7689200Z auto tmp19 = tmp18 * tmp18; 2023-01-11T21:38:05.7689393Z auto tmp20 = std::log(tmp19); 2023-01-11T21:38:05.7689530Z auto tmp21 = std::floor(tmp20); 2023-01-11T21:38:05.7689810Z auto tmp22 = std::ceil(tmp21); 2023-01-11T21:38:05.7689973Z auto tmp23 = std::trunc(tmp22); 2023-01-11T21:38:05.7690136Z auto tmp24 = std::lgamma(tmp23); 2023-01-11T21:38:05.7690334Z auto tmp25 = std::fmod(tmp24, tmp12); 2023-01-11T21:38:05.7690499Z auto tmp26 = tmp25 > 0 ? 1 : 0; 2023-01-11T21:38:05.7690652Z auto tmp27 = tmp25 < 0 ? 
1 : 0; 2023-01-11T21:38:05.7690894Z auto tmp28 = tmp26 - tmp27; 2023-01-11T21:38:05.7691014Z auto tmp29 = tmp28 + tmp12; 2023-01-11T21:38:05.7691184Z out_ptr0[i0] = tmp29; 2023-01-11T21:38:05.7691307Z } 2023-01-11T21:38:05.7795398Z } 2023-01-11T21:38:05.7795607Z } 2023-01-11T21:38:05.7795826Z ''') 2023-01-11T21:38:05.7795836Z 2023-01-11T21:38:05.7795842Z 2023-01-11T21:38:05.7796002Z async_compile.wait(globals()) 2023-01-11T21:38:05.7796121Z del async_compile 2023-01-11T21:38:05.7796128Z 2023-01-11T21:38:05.7796254Z def call(args): 2023-01-11T21:38:05.7796362Z x1_1, x2_1 = args 2023-01-11T21:38:05.7796470Z args.clear() 2023-01-11T21:38:05.7796795Z buf0 = empty_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7797029Z kernel_cpp_0(c_void_p(x1_1.data_ptr()), c_void_p(x2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7797137Z del x1_1 2023-01-11T21:38:05.7797225Z del x2_1 2023-01-11T21:38:05.7797325Z return (buf0, ) 2023-01-11T21:38:05.7797333Z 2023-01-11T21:38:05.7797339Z 2023-01-11T21:38:05.7797451Z if __name__ == "__main__": 2023-01-11T21:38:05.7797617Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7797795Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7798067Z x1_1 = rand_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7798331Z x2_1 = rand_strided((10, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7798478Z print_performance(lambda: call([x1_1, x2_1])) 2023-01-11T21:38:05.7798489Z 2023-01-11T21:38:05.7798578Z ok (6.587s) 2023-01-11T21:38:05.7799478Z test_abs_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7799664Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7800025Z [2023-01-11 21:24:06,145] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 7 2023-01-11T21:38:05.7800400Z [2023-01-11 21:24:07,866] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 7 2023-01-11T21:38:05.7800408Z 2023-01-11T21:38:05.7800535Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7800642Z import torch 2023-01-11T21:38:05.7800746Z import random 2023-01-11T21:38:05.7800910Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7801074Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7801081Z 2023-01-11T21:38:05.7801194Z aten = torch.ops.aten 2023-01-11T21:38:05.7801380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7801513Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7801520Z 2023-01-11T21:38:05.7801622Z import triton 2023-01-11T21:38:05.7801749Z import triton.language as tl 2023-01-11T21:38:05.7801922Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7802098Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7802106Z 2023-01-11T21:38:05.7802112Z 2023-01-11T21:38:05.7802299Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7802569Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7802821Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7802965Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7803059Z { 2023-01-11T21:38:05.7803204Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7803300Z { 2023-01-11T21:38:05.7803405Z #pragma omp for 2023-01-11T21:38:05.7803530Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.7803620Z { 2023-01-11T21:38:05.7803832Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7803961Z auto tmp1 = tmp0.abs(); 2023-01-11T21:38:05.7804152Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.7804281Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.7804397Z auto tmp4 = tmp0 / tmp3; 2023-01-11T21:38:05.7804518Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7804611Z } 2023-01-11T21:38:05.7804746Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7804874Z for(long i0=16; i0<17; i0+=1) 2023-01-11T21:38:05.7804968Z { 2023-01-11T21:38:05.7805085Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7805208Z auto tmp1 = std::abs(tmp0); 2023-01-11T21:38:05.7805343Z auto tmp2 = static_cast<float>(1); 2023-01-11T21:38:05.7805472Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.7805593Z auto tmp4 = tmp0 / tmp3; 2023-01-11T21:38:05.7805709Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.7805797Z } 2023-01-11T21:38:05.7805886Z } 2023-01-11T21:38:05.7805970Z } 2023-01-11T21:38:05.7806102Z ''') 2023-01-11T21:38:05.7806110Z 2023-01-11T21:38:05.7806116Z 2023-01-11T21:38:05.7806246Z async_compile.wait(globals()) 2023-01-11T21:38:05.7806356Z del async_compile 2023-01-11T21:38:05.7806363Z 2023-01-11T21:38:05.7806464Z def call(args): 2023-01-11T21:38:05.7806567Z arg0_1, = args 2023-01-11T21:38:05.7806666Z args.clear() 2023-01-11T21:38:05.7806944Z buf0 = empty_strided((17, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7807127Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
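# Annotation: the generated wrapper drives the compiled kernel through ctypes -- each tensor
# is handed over as a raw c_void_p data pointer, and shape, stride, and dtype were frozen at
# compile time, so the only Python work left in call() is buffer allocation and pointer plumbing.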
2023-01-11T21:38:05.7807229Z del arg0_1 2023-01-11T21:38:05.7807328Z return (buf0, ) 2023-01-11T21:38:05.7807336Z 2023-01-11T21:38:05.7807342Z 2023-01-11T21:38:05.7807508Z if __name__ == "__main__": 2023-01-11T21:38:05.7807680Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7807861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7808130Z arg0_1 = rand_strided((17, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7808286Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7808293Z 2023-01-11T21:38:05.7808383Z ok (1.743s) 2023-01-11T21:38:05.7809010Z test_adaptive_avg_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7809196Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7809552Z [2023-01-11 21:24:07,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 8 2023-01-11T21:38:05.7809880Z [2023-01-11 21:24:07,904] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:05.7809888Z 2023-01-11T21:38:05.7810038Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7810141Z import torch 2023-01-11T21:38:05.7810245Z import random 2023-01-11T21:38:05.7810409Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7810573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7810581Z 2023-01-11T21:38:05.7810699Z aten = torch.ops.aten 2023-01-11T21:38:05.7810932Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7811062Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7811069Z 2023-01-11T21:38:05.7811171Z import triton 2023-01-11T21:38:05.7811299Z import triton.language as tl 2023-01-11T21:38:05.7811471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7811656Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7811664Z 2023-01-11T21:38:05.7811671Z 2023-01-11T21:38:05.7811855Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7812141Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7812310Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7812452Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7812592Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7812681Z { 2023-01-11T21:38:05.7812822Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7812915Z { 2023-01-11T21:38:05.7813019Z #pragma omp for 2023-01-11T21:38:05.7813132Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7813223Z { 2023-01-11T21:38:05.7813342Z #pragma GCC ivdep 2023-01-11T21:38:05.7813464Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.7813563Z { 2023-01-11T21:38:05.7813674Z #pragma GCC ivdep 2023-01-11T21:38:05.7813803Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.7813896Z { 2023-01-11T21:38:05.7813991Z { 2023-01-11T21:38:05.7814093Z { 2023-01-11T21:38:05.7814260Z auto tmp0 = static_cast(((8*i1) / 3)); 2023-01-11T21:38:05.7814427Z auto tmp1 = static_cast(((21 + (16*i1)) / 6)); 2023-01-11T21:38:05.7814794Z auto tmp2 = tmp0 < tmp1; 
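// Annotation: adaptive-pool window bounds for the 16x16 -> 6x6 case. tmp0 = floor(16*i1/6)
// is the first input row feeding output row i1, tmp1 = ceil(16*(i1+1)/6) is one past its last
// row, and tmp2 guards row validity; tmp3..tmp5 repeat the same computation for columns.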
2023-01-11T21:38:05.7814971Z auto tmp3 = static_cast(((8*i2) / 3)); 2023-01-11T21:38:05.7815166Z auto tmp4 = static_cast(((21 + (16*i2)) / 6)); 2023-01-11T21:38:05.7815316Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7815449Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7815642Z float tmp7 = 0.0; 2023-01-11T21:38:05.7815757Z if(tmp6) 2023-01-11T21:38:05.7815859Z { 2023-01-11T21:38:05.7816017Z auto tmp8 = in_ptr0[(16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7816145Z tmp7 = tmp8; 2023-01-11T21:38:05.7816245Z } 2023-01-11T21:38:05.7816404Z auto tmp9 = static_cast(1 + (((8*i2) / 3))); 2023-01-11T21:38:05.7816542Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:05.7816681Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:05.7816808Z float tmp12 = 0.0; 2023-01-11T21:38:05.7816912Z if(tmp11) 2023-01-11T21:38:05.7817019Z { 2023-01-11T21:38:05.7817280Z auto tmp13 = in_ptr0[1 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7817412Z tmp12 = tmp13; 2023-01-11T21:38:05.7817517Z } 2023-01-11T21:38:05.7817662Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:05.7817833Z auto tmp15 = static_cast(2 + (((8*i2) / 3))); 2023-01-11T21:38:05.7817969Z auto tmp16 = tmp15 < tmp4; 2023-01-11T21:38:05.7818098Z auto tmp17 = tmp2 & tmp16; 2023-01-11T21:38:05.7818228Z float tmp18 = 0.0; 2023-01-11T21:38:05.7818343Z if(tmp17) 2023-01-11T21:38:05.7818534Z { 2023-01-11T21:38:05.7818705Z auto tmp19 = in_ptr0[2 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7818836Z tmp18 = tmp19; 2023-01-11T21:38:05.7818940Z } 2023-01-11T21:38:05.7819077Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:05.7819235Z auto tmp21 = static_cast(3 + (((8*i2) / 3))); 2023-01-11T21:38:05.7819372Z auto tmp22 = tmp21 < tmp4; 2023-01-11T21:38:05.7819508Z auto tmp23 = tmp2 & tmp22; 2023-01-11T21:38:05.7819634Z float tmp24 = 0.0; 2023-01-11T21:38:05.7819750Z if(tmp23) 2023-01-11T21:38:05.7819853Z { 2023-01-11T21:38:05.7820025Z auto tmp25 = in_ptr0[3 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7820142Z tmp24 = tmp25; 2023-01-11T21:38:05.7820250Z } 2023-01-11T21:38:05.7820391Z auto tmp26 = tmp24 + tmp20; 2023-01-11T21:38:05.7820556Z auto tmp27 = static_cast(1 + (((8*i1) / 3))); 2023-01-11T21:38:05.7820690Z auto tmp28 = tmp27 < tmp1; 2023-01-11T21:38:05.7820828Z auto tmp29 = tmp28 & tmp5; 2023-01-11T21:38:05.7820951Z float tmp30 = 0.0; 2023-01-11T21:38:05.7821053Z if(tmp29) 2023-01-11T21:38:05.7821152Z { 2023-01-11T21:38:05.7821318Z auto tmp31 = in_ptr0[16 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7821443Z tmp30 = tmp31; 2023-01-11T21:38:05.7821549Z } 2023-01-11T21:38:05.7821678Z auto tmp32 = tmp30 + tmp26; 2023-01-11T21:38:05.7821818Z auto tmp33 = tmp28 & tmp10; 2023-01-11T21:38:05.7821944Z float tmp34 = 0.0; 2023-01-11T21:38:05.7822047Z if(tmp33) 2023-01-11T21:38:05.7822153Z { 2023-01-11T21:38:05.7822420Z auto tmp35 = in_ptr0[17 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7822548Z tmp34 = tmp35; 2023-01-11T21:38:05.7822651Z } 2023-01-11T21:38:05.7822786Z auto tmp36 = tmp34 + tmp32; 2023-01-11T21:38:05.7822917Z auto tmp37 = tmp28 & tmp16; 2023-01-11T21:38:05.7823033Z float tmp38 = 0.0; 2023-01-11T21:38:05.7823145Z if(tmp37) 2023-01-11T21:38:05.7823243Z { 2023-01-11T21:38:05.7823414Z auto tmp39 = in_ptr0[18 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7823548Z tmp38 = tmp39; 2023-01-11T21:38:05.7823652Z } 2023-01-11T21:38:05.7823785Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.7823923Z 
auto tmp41 = tmp28 & tmp22; 2023-01-11T21:38:05.7824039Z float tmp42 = 0.0; 2023-01-11T21:38:05.7824146Z if(tmp41) 2023-01-11T21:38:05.7824248Z { 2023-01-11T21:38:05.7824423Z auto tmp43 = in_ptr0[19 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7824542Z tmp42 = tmp43; 2023-01-11T21:38:05.7824640Z } 2023-01-11T21:38:05.7824773Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.7824928Z auto tmp45 = static_cast(2 + (((8*i1) / 3))); 2023-01-11T21:38:05.7825100Z auto tmp46 = tmp45 < tmp1; 2023-01-11T21:38:05.7825240Z auto tmp47 = tmp46 & tmp5; 2023-01-11T21:38:05.7825366Z float tmp48 = 0.0; 2023-01-11T21:38:05.7825479Z if(tmp47) 2023-01-11T21:38:05.7825592Z { 2023-01-11T21:38:05.7825769Z auto tmp49 = in_ptr0[32 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7825891Z tmp48 = tmp49; 2023-01-11T21:38:05.7825986Z } 2023-01-11T21:38:05.7826128Z auto tmp50 = tmp48 + tmp44; 2023-01-11T21:38:05.7826268Z auto tmp51 = tmp46 & tmp10; 2023-01-11T21:38:05.7826396Z float tmp52 = 0.0; 2023-01-11T21:38:05.7826515Z if(tmp51) 2023-01-11T21:38:05.7826618Z { 2023-01-11T21:38:05.7826792Z auto tmp53 = in_ptr0[33 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7826916Z tmp52 = tmp53; 2023-01-11T21:38:05.7827011Z } 2023-01-11T21:38:05.7827147Z auto tmp54 = tmp52 + tmp50; 2023-01-11T21:38:05.7827283Z auto tmp55 = tmp46 & tmp16; 2023-01-11T21:38:05.7827399Z float tmp56 = 0.0; 2023-01-11T21:38:05.7827508Z if(tmp55) 2023-01-11T21:38:05.7827601Z { 2023-01-11T21:38:05.7827770Z auto tmp57 = in_ptr0[34 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7827892Z tmp56 = tmp57; 2023-01-11T21:38:05.7827990Z } 2023-01-11T21:38:05.7828127Z auto tmp58 = tmp56 + tmp54; 2023-01-11T21:38:05.7828268Z auto tmp59 = tmp46 & tmp22; 2023-01-11T21:38:05.7828396Z float tmp60 = 0.0; 2023-01-11T21:38:05.7828508Z if(tmp59) 2023-01-11T21:38:05.7828598Z { 2023-01-11T21:38:05.7828810Z auto tmp61 = in_ptr0[35 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7828935Z tmp60 = tmp61; 2023-01-11T21:38:05.7829031Z } 2023-01-11T21:38:05.7829164Z auto tmp62 = tmp60 + tmp58; 2023-01-11T21:38:05.7829315Z auto tmp63 = static_cast(3 + (((8*i1) / 3))); 2023-01-11T21:38:05.7829446Z auto tmp64 = tmp63 < tmp1; 2023-01-11T21:38:05.7829569Z auto tmp65 = tmp64 & tmp5; 2023-01-11T21:38:05.7829696Z float tmp66 = 0.0; 2023-01-11T21:38:05.7829815Z if(tmp65) 2023-01-11T21:38:05.7829922Z { 2023-01-11T21:38:05.7830093Z auto tmp67 = in_ptr0[48 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7830210Z tmp66 = tmp67; 2023-01-11T21:38:05.7830313Z } 2023-01-11T21:38:05.7830450Z auto tmp68 = tmp66 + tmp62; 2023-01-11T21:38:05.7830577Z auto tmp69 = tmp64 & tmp10; 2023-01-11T21:38:05.7830706Z float tmp70 = 0.0; 2023-01-11T21:38:05.7830812Z if(tmp69) 2023-01-11T21:38:05.7830914Z { 2023-01-11T21:38:05.7831087Z auto tmp71 = in_ptr0[49 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7831209Z tmp70 = tmp71; 2023-01-11T21:38:05.7831314Z } 2023-01-11T21:38:05.7831482Z auto tmp72 = tmp70 + tmp68; 2023-01-11T21:38:05.7831616Z auto tmp73 = tmp64 & tmp16; 2023-01-11T21:38:05.7831746Z float tmp74 = 0.0; 2023-01-11T21:38:05.7831858Z if(tmp73) 2023-01-11T21:38:05.7831961Z { 2023-01-11T21:38:05.7832137Z auto tmp75 = in_ptr0[50 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7832257Z tmp74 = tmp75; 2023-01-11T21:38:05.7832350Z } 2023-01-11T21:38:05.7832487Z auto tmp76 = tmp74 + tmp72; 
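// Annotation: the pooling window is fully unrolled here -- up to 4x4 candidate input cells per
// output cell, each load guarded by the conjunction of its row flag (tmp2/tmp28/tmp46/tmp64)
// and column flag (tmp5/tmp10/tmp16/tmp22), so edge windows accumulate only in-bounds elements.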
2023-01-11T21:38:05.7832624Z auto tmp77 = tmp64 & tmp22; 2023-01-11T21:38:05.7832751Z float tmp78 = 0.0; 2023-01-11T21:38:05.7832861Z if(tmp77) 2023-01-11T21:38:05.7832964Z { 2023-01-11T21:38:05.7833143Z auto tmp79 = in_ptr0[51 + (16*(((8*i1) / 3))) + (256*i0) + (((8*i2) / 3))]; 2023-01-11T21:38:05.7833267Z tmp78 = tmp79; 2023-01-11T21:38:05.7833361Z } 2023-01-11T21:38:05.7833492Z auto tmp80 = tmp78 + tmp76; 2023-01-11T21:38:05.7833621Z float tmp81 = 0.0; 2023-01-11T21:38:05.7833734Z if(tmp6) 2023-01-11T21:38:05.7833837Z { 2023-01-11T21:38:05.7833994Z auto tmp82 = static_cast(1); 2023-01-11T21:38:05.7834115Z tmp81 = tmp82; 2023-01-11T21:38:05.7834204Z } 2023-01-11T21:38:05.7834329Z float tmp83 = 0.0; 2023-01-11T21:38:05.7834438Z if(tmp11) 2023-01-11T21:38:05.7834540Z { 2023-01-11T21:38:05.7834695Z auto tmp84 = static_cast(1); 2023-01-11T21:38:05.7834822Z tmp83 = tmp84; 2023-01-11T21:38:05.7834945Z } 2023-01-11T21:38:05.7835088Z auto tmp85 = tmp83 + tmp81; 2023-01-11T21:38:05.7835224Z float tmp86 = 0.0; 2023-01-11T21:38:05.7835381Z if(tmp17) 2023-01-11T21:38:05.7835486Z { 2023-01-11T21:38:05.7835631Z auto tmp87 = static_cast(1); 2023-01-11T21:38:05.7835754Z tmp86 = tmp87; 2023-01-11T21:38:05.7835854Z } 2023-01-11T21:38:05.7835981Z auto tmp88 = tmp86 + tmp85; 2023-01-11T21:38:05.7836113Z float tmp89 = 0.0; 2023-01-11T21:38:05.7836221Z if(tmp23) 2023-01-11T21:38:05.7836323Z { 2023-01-11T21:38:05.7836478Z auto tmp90 = static_cast(1); 2023-01-11T21:38:05.7836607Z tmp89 = tmp90; 2023-01-11T21:38:05.7836708Z } 2023-01-11T21:38:05.7836839Z auto tmp91 = tmp89 + tmp88; 2023-01-11T21:38:05.7836955Z float tmp92 = 0.0; 2023-01-11T21:38:05.7837071Z if(tmp29) 2023-01-11T21:38:05.7837173Z { 2023-01-11T21:38:05.7837328Z auto tmp93 = static_cast(1); 2023-01-11T21:38:05.7837453Z tmp92 = tmp93; 2023-01-11T21:38:05.7837551Z } 2023-01-11T21:38:05.7837692Z auto tmp94 = tmp92 + tmp91; 2023-01-11T21:38:05.7837810Z float tmp95 = 0.0; 2023-01-11T21:38:05.7837922Z if(tmp33) 2023-01-11T21:38:05.7838022Z { 2023-01-11T21:38:05.7838253Z auto tmp96 = static_cast(1); 2023-01-11T21:38:05.7838379Z tmp95 = tmp96; 2023-01-11T21:38:05.7838483Z } 2023-01-11T21:38:05.7838625Z auto tmp97 = tmp95 + tmp94; 2023-01-11T21:38:05.7838745Z float tmp98 = 0.0; 2023-01-11T21:38:05.7838857Z if(tmp37) 2023-01-11T21:38:05.7838961Z { 2023-01-11T21:38:05.7839118Z auto tmp99 = static_cast(1); 2023-01-11T21:38:05.7839241Z tmp98 = tmp99; 2023-01-11T21:38:05.7839346Z } 2023-01-11T21:38:05.7839480Z auto tmp100 = tmp98 + tmp97; 2023-01-11T21:38:05.7839603Z float tmp101 = 0.0; 2023-01-11T21:38:05.7839716Z if(tmp41) 2023-01-11T21:38:05.7839817Z { 2023-01-11T21:38:05.7839984Z auto tmp102 = static_cast(1); 2023-01-11T21:38:05.7840111Z tmp101 = tmp102; 2023-01-11T21:38:05.7840213Z } 2023-01-11T21:38:05.7840356Z auto tmp103 = tmp101 + tmp100; 2023-01-11T21:38:05.7840481Z float tmp104 = 0.0; 2023-01-11T21:38:05.7840599Z if(tmp47) 2023-01-11T21:38:05.7840710Z { 2023-01-11T21:38:05.7840861Z auto tmp105 = static_cast(1); 2023-01-11T21:38:05.7840987Z tmp104 = tmp105; 2023-01-11T21:38:05.7841090Z } 2023-01-11T21:38:05.7841240Z auto tmp106 = tmp104 + tmp103; 2023-01-11T21:38:05.7841367Z float tmp107 = 0.0; 2023-01-11T21:38:05.7841472Z if(tmp51) 2023-01-11T21:38:05.7841573Z { 2023-01-11T21:38:05.7841737Z auto tmp108 = static_cast(1); 2023-01-11T21:38:05.7841867Z tmp107 = tmp108; 2023-01-11T21:38:05.7841974Z } 2023-01-11T21:38:05.7842116Z auto tmp109 = tmp107 + tmp106; 2023-01-11T21:38:05.7842296Z float tmp110 = 0.0; 2023-01-11T21:38:05.7842398Z if(tmp55) 
2023-01-11T21:38:05.7842503Z { 2023-01-11T21:38:05.7842656Z auto tmp111 = static_cast(1); 2023-01-11T21:38:05.7842781Z tmp110 = tmp111; 2023-01-11T21:38:05.7842885Z } 2023-01-11T21:38:05.7843027Z auto tmp112 = tmp110 + tmp109; 2023-01-11T21:38:05.7843158Z float tmp113 = 0.0; 2023-01-11T21:38:05.7843264Z if(tmp59) 2023-01-11T21:38:05.7843371Z { 2023-01-11T21:38:05.7843522Z auto tmp114 = static_cast(1); 2023-01-11T21:38:05.7843643Z tmp113 = tmp114; 2023-01-11T21:38:05.7843745Z } 2023-01-11T21:38:05.7843889Z auto tmp115 = tmp113 + tmp112; 2023-01-11T21:38:05.7844024Z float tmp116 = 0.0; 2023-01-11T21:38:05.7844123Z if(tmp65) 2023-01-11T21:38:05.7844225Z { 2023-01-11T21:38:05.7844390Z auto tmp117 = static_cast(1); 2023-01-11T21:38:05.7844517Z tmp116 = tmp117; 2023-01-11T21:38:05.7844621Z } 2023-01-11T21:38:05.7844762Z auto tmp118 = tmp116 + tmp115; 2023-01-11T21:38:05.7844889Z float tmp119 = 0.0; 2023-01-11T21:38:05.7844999Z if(tmp69) 2023-01-11T21:38:05.7845148Z { 2023-01-11T21:38:05.7845303Z auto tmp120 = static_cast(1); 2023-01-11T21:38:05.7845430Z tmp119 = tmp120; 2023-01-11T21:38:05.7845529Z } 2023-01-11T21:38:05.7845667Z auto tmp121 = tmp119 + tmp118; 2023-01-11T21:38:05.7845795Z float tmp122 = 0.0; 2023-01-11T21:38:05.7845909Z if(tmp73) 2023-01-11T21:38:05.7845998Z { 2023-01-11T21:38:05.7846154Z auto tmp123 = static_cast(1); 2023-01-11T21:38:05.7846283Z tmp122 = tmp123; 2023-01-11T21:38:05.7846382Z } 2023-01-11T21:38:05.7846527Z auto tmp124 = tmp122 + tmp121; 2023-01-11T21:38:05.7846658Z float tmp125 = 0.0; 2023-01-11T21:38:05.7846778Z if(tmp77) 2023-01-11T21:38:05.7846870Z { 2023-01-11T21:38:05.7847023Z auto tmp126 = static_cast(1); 2023-01-11T21:38:05.7847147Z tmp125 = tmp126; 2023-01-11T21:38:05.7847251Z } 2023-01-11T21:38:05.7847394Z auto tmp127 = tmp125 + tmp124; 2023-01-11T21:38:05.7847535Z auto tmp128 = tmp80 / tmp127; 2023-01-11T21:38:05.7847681Z out_ptr0[i2 + (6*i1) + (36*i0)] = tmp128; 2023-01-11T21:38:05.7847769Z } 2023-01-11T21:38:05.7847866Z } 2023-01-11T21:38:05.7847963Z } 2023-01-11T21:38:05.7848059Z } 2023-01-11T21:38:05.7848153Z } 2023-01-11T21:38:05.7848273Z #pragma omp for 2023-01-11T21:38:05.7848400Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.7848484Z { 2023-01-11T21:38:05.7848679Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7848871Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.7848999Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7849134Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.7849269Z } 2023-01-11T21:38:05.7849413Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7849528Z for(long i0=2048; i0<2048; i0+=1) 2023-01-11T21:38:05.7849622Z { 2023-01-11T21:38:05.7849750Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7849893Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.7850013Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7850132Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.7850227Z } 2023-01-11T21:38:05.7850312Z } 2023-01-11T21:38:05.7850402Z } 2023-01-11T21:38:05.7850540Z ''') 2023-01-11T21:38:05.7850547Z 2023-01-11T21:38:05.7850552Z 2023-01-11T21:38:05.7850683Z async_compile.wait(globals()) 2023-01-11T21:38:05.7850794Z del async_compile 2023-01-11T21:38:05.7850802Z 2023-01-11T21:38:05.7850908Z def call(args): 2023-01-11T21:38:05.7851012Z arg0_1, = args 2023-01-11T21:38:05.7851108Z args.clear() 2023-01-11T21:38:05.7851401Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7851704Z buf1 = empty_strided((2, 4, 16, 16), (1024, 256, 16, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7851935Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7852035Z del arg0_1 2023-01-11T21:38:05.7852186Z buf2 = aten._adaptive_avg_pool2d(buf1, [2, 5]) 2023-01-11T21:38:05.7852281Z del buf1 2023-01-11T21:38:05.7852384Z buf3 = buf2 2023-01-11T21:38:05.7852526Z assert_size_stride(buf3, (2, 4, 2, 5), (40, 10, 5, 1)) 2023-01-11T21:38:05.7852622Z del buf2 2023-01-11T21:38:05.7852732Z return (buf0, buf3, ) 2023-01-11T21:38:05.7852786Z 2023-01-11T21:38:05.7852792Z 2023-01-11T21:38:05.7852905Z if __name__ == "__main__": 2023-01-11T21:38:05.7853074Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7853240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7853568Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7853722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7854078Z [2023-01-11 21:24:10,096] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 8 2023-01-11T21:38:05.7854812Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7855004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7855359Z [2023-01-11 21:24:10,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 9 2023-01-11T21:38:05.7855367Z 2023-01-11T21:38:05.7855373Z 2023-01-11T21:38:05.7855515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7855628Z import torch 2023-01-11T21:38:05.7855729Z import random 2023-01-11T21:38:05.7855897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7856069Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7856077Z 2023-01-11T21:38:05.7856182Z aten = torch.ops.aten 2023-01-11T21:38:05.7856364Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7856499Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7856507Z 2023-01-11T21:38:05.7856609Z import triton 2023-01-11T21:38:05.7856740Z import triton.language as tl 2023-01-11T21:38:05.7856911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7857106Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7857113Z 2023-01-11T21:38:05.7857118Z 2023-01-11T21:38:05.7857436Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7857785Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7857961Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7858106Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7858248Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7858344Z { 2023-01-11T21:38:05.7858489Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7858583Z { 2023-01-11T21:38:05.7858684Z #pragma omp for 2023-01-11T21:38:05.7858805Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7858901Z { 2023-01-11T21:38:05.7859021Z #pragma GCC ivdep 2023-01-11T21:38:05.7859144Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.7859241Z { 2023-01-11T21:38:05.7859358Z #pragma GCC 
ivdep 2023-01-11T21:38:05.7859478Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.7859574Z { 2023-01-11T21:38:05.7859673Z { 2023-01-11T21:38:05.7859778Z { 2023-01-11T21:38:05.7859937Z auto tmp0 = static_cast((i1 / 2)); 2023-01-11T21:38:05.7860104Z auto tmp1 = static_cast(((8 + (3*i1)) / 6)); 2023-01-11T21:38:05.7860243Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7860403Z auto tmp3 = static_cast((i2 / 2)); 2023-01-11T21:38:05.7860556Z auto tmp4 = static_cast(((8 + (3*i2)) / 6)); 2023-01-11T21:38:05.7860691Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7860827Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7861014Z float tmp7 = 0.0; 2023-01-11T21:38:05.7861130Z if(tmp6) 2023-01-11T21:38:05.7861234Z { 2023-01-11T21:38:05.7861394Z auto tmp8 = in_ptr0[(3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7861508Z tmp7 = tmp8; 2023-01-11T21:38:05.7861612Z } 2023-01-11T21:38:05.7861776Z auto tmp9 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.7861913Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:05.7862047Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:05.7862179Z float tmp12 = 0.0; 2023-01-11T21:38:05.7862294Z if(tmp11) 2023-01-11T21:38:05.7862388Z { 2023-01-11T21:38:05.7862556Z auto tmp13 = in_ptr0[1 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7862687Z tmp12 = tmp13; 2023-01-11T21:38:05.7862788Z } 2023-01-11T21:38:05.7862927Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:05.7863089Z auto tmp15 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.7863237Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:05.7863369Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:05.7863486Z float tmp18 = 0.0; 2023-01-11T21:38:05.7863599Z if(tmp17) 2023-01-11T21:38:05.7863704Z { 2023-01-11T21:38:05.7863871Z auto tmp19 = in_ptr0[3 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7863991Z tmp18 = tmp19; 2023-01-11T21:38:05.7864096Z } 2023-01-11T21:38:05.7864236Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:05.7864372Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:05.7864503Z float tmp22 = 0.0; 2023-01-11T21:38:05.7864617Z if(tmp21) 2023-01-11T21:38:05.7864708Z { 2023-01-11T21:38:05.7864912Z auto tmp23 = in_ptr0[4 + (3*(i1 / 2)) + (9*i0) + (i2 / 2)]; 2023-01-11T21:38:05.7865042Z tmp22 = tmp23; 2023-01-11T21:38:05.7865146Z } 2023-01-11T21:38:05.7865282Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:05.7865397Z float tmp25 = 0.0; 2023-01-11T21:38:05.7865513Z if(tmp6) 2023-01-11T21:38:05.7865617Z { 2023-01-11T21:38:05.7865779Z auto tmp26 = static_cast(1); 2023-01-11T21:38:05.7865904Z tmp25 = tmp26; 2023-01-11T21:38:05.7866010Z } 2023-01-11T21:38:05.7866133Z float tmp27 = 0.0; 2023-01-11T21:38:05.7866237Z if(tmp11) 2023-01-11T21:38:05.7866343Z { 2023-01-11T21:38:05.7866502Z auto tmp28 = static_cast(1); 2023-01-11T21:38:05.7866630Z tmp27 = tmp28; 2023-01-11T21:38:05.7866731Z } 2023-01-11T21:38:05.7866863Z auto tmp29 = tmp27 + tmp25; 2023-01-11T21:38:05.7866986Z float tmp30 = 0.0; 2023-01-11T21:38:05.7867086Z if(tmp17) 2023-01-11T21:38:05.7867188Z { 2023-01-11T21:38:05.7867347Z auto tmp31 = static_cast(1); 2023-01-11T21:38:05.7867471Z tmp30 = tmp31; 2023-01-11T21:38:05.7867575Z } 2023-01-11T21:38:05.7867766Z auto tmp32 = tmp30 + tmp29; 2023-01-11T21:38:05.7867894Z float tmp33 = 0.0; 2023-01-11T21:38:05.7867997Z if(tmp21) 2023-01-11T21:38:05.7868103Z { 2023-01-11T21:38:05.7868271Z auto tmp34 = static_cast(1); 2023-01-11T21:38:05.7868393Z tmp33 = tmp34; 2023-01-11T21:38:05.7868494Z } 2023-01-11T21:38:05.7868629Z auto tmp35 = tmp33 + tmp32; 2023-01-11T21:38:05.7868771Z auto tmp36 = tmp24 / tmp35; 
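// Annotation: tmp24 is the guarded sum over the (at most 2x2) window for this 3x3 -> 6x6 case
// and tmp35 is the matching element count built from the same flags, so the division yields a
// true adaptive average over however many elements were actually in bounds, not a fixed window size.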
2023-01-11T21:38:05.7868918Z out_ptr0[i2 + (6*i1) + (36*i0)] = tmp36; 2023-01-11T21:38:05.7869012Z } 2023-01-11T21:38:05.7869109Z } 2023-01-11T21:38:05.7869202Z } 2023-01-11T21:38:05.7869294Z } 2023-01-11T21:38:05.7869393Z } 2023-01-11T21:38:05.7869517Z #pragma omp for 2023-01-11T21:38:05.7869630Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7869729Z { 2023-01-11T21:38:05.7869842Z #pragma GCC ivdep 2023-01-11T21:38:05.7869966Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7870066Z { 2023-01-11T21:38:05.7870191Z #pragma GCC ivdep 2023-01-11T21:38:05.7870330Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:05.7870418Z { 2023-01-11T21:38:05.7870514Z { 2023-01-11T21:38:05.7870614Z { 2023-01-11T21:38:05.7870772Z auto tmp0 = static_cast(((3*i1) / 2)); 2023-01-11T21:38:05.7870940Z auto tmp1 = static_cast(2 + (((3*i1) / 2))); 2023-01-11T21:38:05.7871085Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7871245Z auto tmp3 = static_cast(((3*i2) / 5)); 2023-01-11T21:38:05.7871401Z auto tmp4 = static_cast(((7 + (3*i2)) / 5)); 2023-01-11T21:38:05.7871537Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7871679Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7871809Z float tmp7 = 0.0; 2023-01-11T21:38:05.7871966Z if(tmp6) 2023-01-11T21:38:05.7872069Z { 2023-01-11T21:38:05.7872242Z auto tmp8 = in_ptr0[(3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7872403Z auto tmp9 = static_cast(1); 2023-01-11T21:38:05.7872536Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.7872658Z tmp7 = tmp10; 2023-01-11T21:38:05.7872755Z } 2023-01-11T21:38:05.7872916Z auto tmp11 = static_cast(1 + (((3*i2) / 5))); 2023-01-11T21:38:05.7873057Z auto tmp12 = tmp11 < tmp4; 2023-01-11T21:38:05.7873191Z auto tmp13 = tmp2 & tmp12; 2023-01-11T21:38:05.7873323Z float tmp14 = 0.0; 2023-01-11T21:38:05.7873421Z if(tmp13) 2023-01-11T21:38:05.7873523Z { 2023-01-11T21:38:05.7873705Z auto tmp15 = in_ptr0[1 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7873864Z auto tmp16 = static_cast(1); 2023-01-11T21:38:05.7874011Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:05.7874130Z tmp14 = tmp17; 2023-01-11T21:38:05.7874233Z } 2023-01-11T21:38:05.7874374Z auto tmp18 = tmp14 + tmp7; 2023-01-11T21:38:05.7874532Z auto tmp19 = static_cast(1 + (((3*i1) / 2))); 2023-01-11T21:38:05.7874671Z auto tmp20 = tmp19 < tmp1; 2023-01-11T21:38:05.7874851Z auto tmp21 = tmp20 & tmp5; 2023-01-11T21:38:05.7874981Z float tmp22 = 0.0; 2023-01-11T21:38:05.7875095Z if(tmp21) 2023-01-11T21:38:05.7875206Z { 2023-01-11T21:38:05.7875388Z auto tmp23 = in_ptr0[3 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7875527Z auto tmp24 = static_cast(1); 2023-01-11T21:38:05.7875671Z auto tmp25 = tmp23 + tmp24; 2023-01-11T21:38:05.7875796Z tmp22 = tmp25; 2023-01-11T21:38:05.7875905Z } 2023-01-11T21:38:05.7876044Z auto tmp26 = tmp22 + tmp18; 2023-01-11T21:38:05.7876182Z auto tmp27 = tmp20 & tmp12; 2023-01-11T21:38:05.7876310Z float tmp28 = 0.0; 2023-01-11T21:38:05.7876427Z if(tmp27) 2023-01-11T21:38:05.7876522Z { 2023-01-11T21:38:05.7876694Z auto tmp29 = in_ptr0[4 + (3*(((3*i1) / 2))) + (9*i0) + (((3*i2) / 5))]; 2023-01-11T21:38:05.7876858Z auto tmp30 = static_cast(1); 2023-01-11T21:38:05.7876997Z auto tmp31 = tmp29 + tmp30; 2023-01-11T21:38:05.7877119Z tmp28 = tmp31; 2023-01-11T21:38:05.7877223Z } 2023-01-11T21:38:05.7877364Z auto tmp32 = tmp28 + tmp26; 2023-01-11T21:38:05.7877483Z float tmp33 = 0.0; 2023-01-11T21:38:05.7877598Z if(tmp6) 2023-01-11T21:38:05.7877698Z { 2023-01-11T21:38:05.7877851Z auto tmp34 = static_cast(1); 
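// Annotation: the divisor mirrors the guarded loads above -- each window flag (tmp6, tmp13,
// tmp21, tmp27) that admitted an element contributes a count of 1 to the running total below.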
2023-01-11T21:38:05.7877974Z tmp33 = tmp34; 2023-01-11T21:38:05.7878077Z } 2023-01-11T21:38:05.7878200Z float tmp35 = 0.0; 2023-01-11T21:38:05.7878304Z if(tmp13) 2023-01-11T21:38:05.7878411Z { 2023-01-11T21:38:05.7878614Z auto tmp36 = static_cast(1); 2023-01-11T21:38:05.7878739Z tmp35 = tmp36; 2023-01-11T21:38:05.7878839Z } 2023-01-11T21:38:05.7878972Z auto tmp37 = tmp35 + tmp33; 2023-01-11T21:38:05.7879099Z float tmp38 = 0.0; 2023-01-11T21:38:05.7879211Z if(tmp21) 2023-01-11T21:38:05.7879304Z { 2023-01-11T21:38:05.7879454Z auto tmp39 = static_cast(1); 2023-01-11T21:38:05.7879581Z tmp38 = tmp39; 2023-01-11T21:38:05.7879684Z } 2023-01-11T21:38:05.7879824Z auto tmp40 = tmp38 + tmp37; 2023-01-11T21:38:05.7879953Z float tmp41 = 0.0; 2023-01-11T21:38:05.7880065Z if(tmp27) 2023-01-11T21:38:05.7880155Z { 2023-01-11T21:38:05.7880314Z auto tmp42 = static_cast(1); 2023-01-11T21:38:05.7880440Z tmp41 = tmp42; 2023-01-11T21:38:05.7880542Z } 2023-01-11T21:38:05.7880676Z auto tmp43 = tmp41 + tmp40; 2023-01-11T21:38:05.7880812Z auto tmp44 = tmp32 / tmp43; 2023-01-11T21:38:05.7880953Z out_ptr1[i2 + (5*i1) + (10*i0)] = tmp44; 2023-01-11T21:38:05.7881044Z } 2023-01-11T21:38:05.7881148Z } 2023-01-11T21:38:05.7881245Z } 2023-01-11T21:38:05.7881338Z } 2023-01-11T21:38:05.7881472Z } 2023-01-11T21:38:05.7881562Z } 2023-01-11T21:38:05.7881654Z } 2023-01-11T21:38:05.7881775Z ''') 2023-01-11T21:38:05.7881784Z 2023-01-11T21:38:05.7881790Z 2023-01-11T21:38:05.7881930Z async_compile.wait(globals()) 2023-01-11T21:38:05.7882041Z del async_compile 2023-01-11T21:38:05.7882048Z 2023-01-11T21:38:05.7882152Z def call(args): 2023-01-11T21:38:05.7882260Z arg0_1, = args 2023-01-11T21:38:05.7882367Z args.clear() 2023-01-11T21:38:05.7882667Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7882939Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7883168Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7883271Z del arg0_1 2023-01-11T21:38:05.7883386Z return (buf0, buf1, ) 2023-01-11T21:38:05.7883394Z 2023-01-11T21:38:05.7883400Z 2023-01-11T21:38:05.7883506Z if __name__ == "__main__": 2023-01-11T21:38:05.7883673Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7883849Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7884147Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7884289Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7884657Z [2023-01-11 21:24:12,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 9 2023-01-11T21:38:05.7885242Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7885417Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7885782Z [2023-01-11 21:24:12,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 10 2023-01-11T21:38:05.7885791Z 2023-01-11T21:38:05.7885797Z 2023-01-11T21:38:05.7885934Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7886036Z import torch 2023-01-11T21:38:05.7886140Z import random 2023-01-11T21:38:05.7886363Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7886539Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7886547Z 2023-01-11T21:38:05.7886647Z aten = torch.ops.aten 2023-01-11T21:38:05.7886840Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7886975Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7886982Z 2023-01-11T21:38:05.7887087Z import triton 2023-01-11T21:38:05.7887215Z import triton.language as tl 2023-01-11T21:38:05.7887382Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7887577Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7887589Z 2023-01-11T21:38:05.7887596Z 2023-01-11T21:38:05.7887792Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7888059Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7888232Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7888382Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.7888530Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.7888622Z { 2023-01-11T21:38:05.7888756Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7888852Z { 2023-01-11T21:38:05.7888956Z #pragma omp for 2023-01-11T21:38:05.7889083Z for(long i0=0; i0<36; i0+=1) 2023-01-11T21:38:05.7889178Z { 2023-01-11T21:38:05.7889363Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7889496Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7889591Z } 2023-01-11T21:38:05.7889781Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7889893Z for(long i0=288; i0<288; i0+=1) 2023-01-11T21:38:05.7889987Z { 2023-01-11T21:38:05.7890107Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7890218Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.7890314Z } 2023-01-11T21:38:05.7890430Z #pragma omp for 2023-01-11T21:38:05.7890553Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.7890636Z { 2023-01-11T21:38:05.7890756Z #pragma GCC ivdep 2023-01-11T21:38:05.7890873Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.7890966Z { 2023-01-11T21:38:05.7891086Z #pragma GCC ivdep 2023-01-11T21:38:05.7891210Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:05.7891303Z { 2023-01-11T21:38:05.7891392Z { 2023-01-11T21:38:05.7891490Z { 2023-01-11T21:38:05.7891652Z auto tmp0 = static_cast(3*i1); 2023-01-11T21:38:05.7891812Z auto tmp1 = static_cast(3 + (3*i1)); 2023-01-11T21:38:05.7891949Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.7892111Z auto tmp3 = static_cast(((6*i2) / 5)); 2023-01-11T21:38:05.7892284Z auto tmp4 = static_cast(2 + (((6*i2) / 5))); 2023-01-11T21:38:05.7892404Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:05.7892545Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:05.7892673Z float tmp7 = 0.0; 2023-01-11T21:38:05.7892786Z if(tmp6) 2023-01-11T21:38:05.7892891Z { 2023-01-11T21:38:05.7893057Z auto tmp8 = in_ptr0[(18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7893214Z auto tmp9 = 
static_cast(1); 2023-01-11T21:38:05.7893362Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.7893474Z tmp7 = tmp10; 2023-01-11T21:38:05.7893575Z } 2023-01-11T21:38:05.7893739Z auto tmp11 = static_cast(1 + (((6*i2) / 5))); 2023-01-11T21:38:05.7893916Z auto tmp12 = tmp11 < tmp4; 2023-01-11T21:38:05.7894057Z auto tmp13 = tmp2 & tmp12; 2023-01-11T21:38:05.7894184Z float tmp14 = 0.0; 2023-01-11T21:38:05.7894301Z if(tmp13) 2023-01-11T21:38:05.7894392Z { 2023-01-11T21:38:05.7932113Z auto tmp15 = in_ptr0[1 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7932292Z auto tmp16 = static_cast(1); 2023-01-11T21:38:05.7932442Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:05.7932567Z tmp14 = tmp17; 2023-01-11T21:38:05.7932679Z } 2023-01-11T21:38:05.7932825Z auto tmp18 = tmp14 + tmp7; 2023-01-11T21:38:05.7932991Z auto tmp19 = static_cast(1 + (3*i1)); 2023-01-11T21:38:05.7933124Z auto tmp20 = tmp19 < tmp1; 2023-01-11T21:38:05.7933263Z auto tmp21 = tmp20 & tmp5; 2023-01-11T21:38:05.7933391Z float tmp22 = 0.0; 2023-01-11T21:38:05.7933505Z if(tmp21) 2023-01-11T21:38:05.7933611Z { 2023-01-11T21:38:05.7933775Z auto tmp23 = in_ptr0[6 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7933932Z auto tmp24 = static_cast(1); 2023-01-11T21:38:05.7934066Z auto tmp25 = tmp23 + tmp24; 2023-01-11T21:38:05.7934191Z tmp22 = tmp25; 2023-01-11T21:38:05.7934387Z } 2023-01-11T21:38:05.7934665Z auto tmp26 = tmp22 + tmp18; 2023-01-11T21:38:05.7934812Z auto tmp27 = tmp20 & tmp12; 2023-01-11T21:38:05.7934942Z float tmp28 = 0.0; 2023-01-11T21:38:05.7935060Z if(tmp27) 2023-01-11T21:38:05.7935155Z { 2023-01-11T21:38:05.7935352Z auto tmp29 = in_ptr0[7 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7935528Z auto tmp30 = static_cast(1); 2023-01-11T21:38:05.7935669Z auto tmp31 = tmp29 + tmp30; 2023-01-11T21:38:05.7935794Z tmp28 = tmp31; 2023-01-11T21:38:05.7935899Z } 2023-01-11T21:38:05.7936035Z auto tmp32 = tmp28 + tmp26; 2023-01-11T21:38:05.7936195Z auto tmp33 = static_cast(2 + (3*i1)); 2023-01-11T21:38:05.7936327Z auto tmp34 = tmp33 < tmp1; 2023-01-11T21:38:05.7936465Z auto tmp35 = tmp34 & tmp5; 2023-01-11T21:38:05.7936596Z float tmp36 = 0.0; 2023-01-11T21:38:05.7936708Z if(tmp35) 2023-01-11T21:38:05.7936809Z { 2023-01-11T21:38:05.7936980Z auto tmp37 = in_ptr0[12 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7937204Z auto tmp38 = static_cast(1); 2023-01-11T21:38:05.7937333Z auto tmp39 = tmp37 + tmp38; 2023-01-11T21:38:05.7937459Z tmp36 = tmp39; 2023-01-11T21:38:05.7937566Z } 2023-01-11T21:38:05.7937705Z auto tmp40 = tmp36 + tmp32; 2023-01-11T21:38:05.7937842Z auto tmp41 = tmp34 & tmp12; 2023-01-11T21:38:05.7937969Z float tmp42 = 0.0; 2023-01-11T21:38:05.7938080Z if(tmp41) 2023-01-11T21:38:05.7938185Z { 2023-01-11T21:38:05.7938343Z auto tmp43 = in_ptr0[13 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7938565Z auto tmp44 = static_cast(1); 2023-01-11T21:38:05.7938708Z auto tmp45 = tmp43 + tmp44; 2023-01-11T21:38:05.7938831Z tmp42 = tmp45; 2023-01-11T21:38:05.7938934Z } 2023-01-11T21:38:05.7939071Z auto tmp46 = tmp42 + tmp40; 2023-01-11T21:38:05.7939202Z auto tmp47 = tmp1 < tmp1; 2023-01-11T21:38:05.7939327Z auto tmp48 = tmp47 & tmp5; 2023-01-11T21:38:05.7939453Z float tmp49 = 0.0; 2023-01-11T21:38:05.7939566Z if(tmp48) 2023-01-11T21:38:05.7939674Z { 2023-01-11T21:38:05.7939835Z auto tmp50 = in_ptr0[18 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7939983Z auto tmp51 = static_cast(1); 2023-01-11T21:38:05.7940131Z auto tmp52 = tmp50 + tmp51; 2023-01-11T21:38:05.7940254Z 
tmp49 = tmp52; 2023-01-11T21:38:05.7940347Z } 2023-01-11T21:38:05.7940486Z auto tmp53 = tmp49 + tmp46; 2023-01-11T21:38:05.7940616Z auto tmp54 = tmp47 & tmp12; 2023-01-11T21:38:05.7940750Z float tmp55 = 0.0; 2023-01-11T21:38:05.7940863Z if(tmp54) 2023-01-11T21:38:05.7940969Z { 2023-01-11T21:38:05.7941135Z auto tmp56 = in_ptr0[19 + (18*i1) + (36*i0) + (((6*i2) / 5))]; 2023-01-11T21:38:05.7941333Z auto tmp57 = static_cast(1); 2023-01-11T21:38:05.7941471Z auto tmp58 = tmp56 + tmp57; 2023-01-11T21:38:05.7941597Z tmp55 = tmp58; 2023-01-11T21:38:05.7941698Z } 2023-01-11T21:38:05.7941832Z auto tmp59 = tmp55 + tmp53; 2023-01-11T21:38:05.7941958Z float tmp60 = 0.0; 2023-01-11T21:38:05.7942071Z if(tmp6) 2023-01-11T21:38:05.7942165Z { 2023-01-11T21:38:05.7942321Z auto tmp61 = static_cast(1); 2023-01-11T21:38:05.7942444Z tmp60 = tmp61; 2023-01-11T21:38:05.7942544Z } 2023-01-11T21:38:05.7942669Z float tmp62 = 0.0; 2023-01-11T21:38:05.7942782Z if(tmp13) 2023-01-11T21:38:05.7942882Z { 2023-01-11T21:38:05.7943039Z auto tmp63 = static_cast(1); 2023-01-11T21:38:05.7943151Z tmp62 = tmp63; 2023-01-11T21:38:05.7943253Z } 2023-01-11T21:38:05.7943393Z auto tmp64 = tmp62 + tmp60; 2023-01-11T21:38:05.7943526Z float tmp65 = 0.0; 2023-01-11T21:38:05.7943640Z if(tmp21) 2023-01-11T21:38:05.7943740Z { 2023-01-11T21:38:05.7943891Z auto tmp66 = static_cast(1); 2023-01-11T21:38:05.7944004Z tmp65 = tmp66; 2023-01-11T21:38:05.7944106Z } 2023-01-11T21:38:05.7944249Z auto tmp67 = tmp65 + tmp64; 2023-01-11T21:38:05.7944374Z float tmp68 = 0.0; 2023-01-11T21:38:05.7944490Z if(tmp27) 2023-01-11T21:38:05.7944595Z { 2023-01-11T21:38:05.7944758Z auto tmp69 = static_cast(1); 2023-01-11T21:38:05.7944870Z tmp68 = tmp69; 2023-01-11T21:38:05.7944974Z } 2023-01-11T21:38:05.7945109Z auto tmp70 = tmp68 + tmp67; 2023-01-11T21:38:05.7945294Z float tmp71 = 0.0; 2023-01-11T21:38:05.7945418Z if(tmp35) 2023-01-11T21:38:05.7945529Z { 2023-01-11T21:38:05.7945697Z auto tmp72 = static_cast(1); 2023-01-11T21:38:05.7945830Z tmp71 = tmp72; 2023-01-11T21:38:05.7945935Z } 2023-01-11T21:38:05.7946072Z auto tmp73 = tmp71 + tmp70; 2023-01-11T21:38:05.7946197Z float tmp74 = 0.0; 2023-01-11T21:38:05.7946312Z if(tmp41) 2023-01-11T21:38:05.7946415Z { 2023-01-11T21:38:05.7946571Z auto tmp75 = static_cast(1); 2023-01-11T21:38:05.7946688Z tmp74 = tmp75; 2023-01-11T21:38:05.7946789Z } 2023-01-11T21:38:05.7946929Z auto tmp76 = tmp74 + tmp73; 2023-01-11T21:38:05.7947061Z float tmp77 = 0.0; 2023-01-11T21:38:05.7947172Z if(tmp48) 2023-01-11T21:38:05.7947273Z { 2023-01-11T21:38:05.7947429Z auto tmp78 = static_cast(1); 2023-01-11T21:38:05.7947553Z tmp77 = tmp78; 2023-01-11T21:38:05.7947646Z } 2023-01-11T21:38:05.7947785Z auto tmp79 = tmp77 + tmp76; 2023-01-11T21:38:05.7947906Z float tmp80 = 0.0; 2023-01-11T21:38:05.7948019Z if(tmp54) 2023-01-11T21:38:05.7948171Z { 2023-01-11T21:38:05.7948332Z auto tmp81 = static_cast(1); 2023-01-11T21:38:05.7948460Z tmp80 = tmp81; 2023-01-11T21:38:05.7948555Z } 2023-01-11T21:38:05.7948687Z auto tmp82 = tmp80 + tmp79; 2023-01-11T21:38:05.7948818Z auto tmp83 = tmp59 / tmp82; 2023-01-11T21:38:05.7948970Z out_ptr1[i2 + (5*i1) + (10*i0)] = tmp83; 2023-01-11T21:38:05.7949073Z } 2023-01-11T21:38:05.7949177Z } 2023-01-11T21:38:05.7949268Z } 2023-01-11T21:38:05.7949354Z } 2023-01-11T21:38:05.7949451Z } 2023-01-11T21:38:05.7949544Z } 2023-01-11T21:38:05.7949635Z } 2023-01-11T21:38:05.7949778Z ''') 2023-01-11T21:38:05.7949785Z 2023-01-11T21:38:05.7949792Z 2023-01-11T21:38:05.7949921Z async_compile.wait(globals()) 2023-01-11T21:38:05.7950037Z del 
async_compile 2023-01-11T21:38:05.7950044Z 2023-01-11T21:38:05.7950140Z def call(args): 2023-01-11T21:38:05.7950244Z arg0_1, = args 2023-01-11T21:38:05.7950352Z args.clear() 2023-01-11T21:38:05.7950647Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7950947Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7951181Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.7951279Z del arg0_1 2023-01-11T21:38:05.7951389Z return (buf0, buf1, ) 2023-01-11T21:38:05.7951397Z 2023-01-11T21:38:05.7951403Z 2023-01-11T21:38:05.7951506Z if __name__ == "__main__": 2023-01-11T21:38:05.7951666Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7951832Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7973791Z arg0_1 = rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7973956Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7974330Z [2023-01-11 21:24:14,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 10 2023-01-11T21:38:05.7974340Z 2023-01-11T21:38:05.7974443Z ok (6.186s) 2023-01-11T21:38:05.7975302Z test_adaptive_avg_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7975489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7975842Z [2023-01-11 21:24:14,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 11 2023-01-11T21:38:05.7976183Z [2023-01-11 21:24:14,077] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:05.7976543Z [2023-01-11 21:24:14,079] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 11 2023-01-11T21:38:05.7976551Z 2023-01-11T21:38:05.7976684Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7976792Z import torch 2023-01-11T21:38:05.7976896Z import random 2023-01-11T21:38:05.7977054Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7977312Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7977320Z 2023-01-11T21:38:05.7977437Z aten = torch.ops.aten 2023-01-11T21:38:05.7977618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7977744Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7977751Z 2023-01-11T21:38:05.7977856Z import triton 2023-01-11T21:38:05.7977982Z import triton.language as tl 2023-01-11T21:38:05.7978157Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7978427Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7978435Z 2023-01-11T21:38:05.7978441Z 2023-01-11T21:38:05.7978574Z async_compile.wait(globals()) 2023-01-11T21:38:05.7978682Z del async_compile 2023-01-11T21:38:05.7978689Z 2023-01-11T21:38:05.7978784Z def call(args): 2023-01-11T21:38:05.7978888Z arg0_1, = args 2023-01-11T21:38:05.7978991Z args.clear() 2023-01-11T21:38:05.7979153Z buf0 = aten._adaptive_avg_pool2d(arg0_1, [4, 4]) 
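# Annotation: no kernel_cpp_0 is emitted for this graph -- as the FallbackKernel warning above
# notes, the (21, 21) -> (4, 4) adaptive pool is dispatched straight to the eager aten op, and
# the wrapper only asserts the expected output size and strides afterwards.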
2023-01-11T21:38:05.7979254Z del arg0_1 2023-01-11T21:38:05.7979353Z buf1 = buf0 2023-01-11T21:38:05.7979498Z assert_size_stride(buf1, (2, 4, 4, 4), (64, 16, 4, 1)) 2023-01-11T21:38:05.7979586Z del buf0 2023-01-11T21:38:05.7979690Z return (buf1, ) 2023-01-11T21:38:05.7979697Z 2023-01-11T21:38:05.7979704Z 2023-01-11T21:38:05.7979814Z if __name__ == "__main__": 2023-01-11T21:38:05.7979971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7980151Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7980470Z arg0_1 = rand_strided((2, 4, 21, 21), (1764, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7980622Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7980629Z 2023-01-11T21:38:05.7980729Z ok (0.026s) 2023-01-11T21:38:05.7981350Z test_add_const_float_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7981534Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7981889Z [2023-01-11 21:24:14,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 12 2023-01-11T21:38:05.7982254Z [2023-01-11 21:24:15,796] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 12 2023-01-11T21:38:05.7982261Z 2023-01-11T21:38:05.7982401Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7982507Z import torch 2023-01-11T21:38:05.7982608Z import random 2023-01-11T21:38:05.7982813Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7983016Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7983024Z 2023-01-11T21:38:05.7983132Z aten = torch.ops.aten 2023-01-11T21:38:05.7983394Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7983547Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7983554Z 2023-01-11T21:38:05.7983824Z import triton 2023-01-11T21:38:05.7983979Z import triton.language as tl 2023-01-11T21:38:05.7984181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7984400Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7984413Z 2023-01-11T21:38:05.7984420Z 2023-01-11T21:38:05.7984642Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7984908Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7985098Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7985265Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7985381Z { 2023-01-11T21:38:05.7985547Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7985660Z { 2023-01-11T21:38:05.7985834Z #pragma omp for 2023-01-11T21:38:05.7985947Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.7986061Z { 2023-01-11T21:38:05.7986267Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7986522Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1.5)); 2023-01-11T21:38:05.7986663Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7986814Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7986996Z } 2023-01-11T21:38:05.7987156Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7987270Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.7987385Z { 2023-01-11T21:38:05.7987531Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7987704Z auto tmp1 = static_cast<float>(1.5); 2023-01-11T21:38:05.7987878Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7988019Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7988149Z } 2023-01-11T21:38:05.7988231Z } 2023-01-11T21:38:05.7988340Z } 2023-01-11T21:38:05.7988494Z ''') 2023-01-11T21:38:05.7988502Z 2023-01-11T21:38:05.7988508Z 2023-01-11T21:38:05.7988649Z async_compile.wait(globals()) 2023-01-11T21:38:05.7988806Z del async_compile 2023-01-11T21:38:05.7988813Z 2023-01-11T21:38:05.7989021Z def call(args): 2023-01-11T21:38:05.7989147Z arg0_1, = args 2023-01-11T21:38:05.7989244Z args.clear() 2023-01-11T21:38:05.7989543Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7989773Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.7989919Z del arg0_1 2023-01-11T21:38:05.7990169Z return (buf0, ) 2023-01-11T21:38:05.7990176Z 2023-01-11T21:38:05.7990182Z 2023-01-11T21:38:05.7990327Z if __name__ == "__main__": 2023-01-11T21:38:05.7990503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.7990698Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.7990971Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.7991181Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.7991189Z 2023-01-11T21:38:05.7991320Z ok (1.718s) 2023-01-11T21:38:05.7991965Z test_add_const_int_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.7992173Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.7992627Z [2023-01-11 21:24:15,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 13 2023-01-11T21:38:05.7993013Z [2023-01-11 21:24:17,519] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 13 2023-01-11T21:38:05.7993021Z 2023-01-11T21:38:05.7993245Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.7993472Z import torch 2023-01-11T21:38:05.7993572Z import random 2023-01-11T21:38:05.7993926Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.7994113Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.7994121Z 2023-01-11T21:38:05.7994255Z aten = torch.ops.aten 2023-01-11T21:38:05.7994468Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.7994615Z async_compile = AsyncCompile() 2023-01-11T21:38:05.7994622Z 2023-01-11T21:38:05.7994751Z import triton 2023-01-11T21:38:05.7994935Z import triton.language as tl 2023-01-11T21:38:05.7995101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.7995311Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.7995319Z 2023-01-11T21:38:05.7995325Z 2023-01-11T21:38:05.7995545Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.7995838Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.7996026Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.7996213Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.7996338Z { 2023-01-11T21:38:05.7996536Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.7996621Z { 2023-01-11T21:38:05.7996810Z #pragma omp for 2023-01-11T21:38:05.7996950Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.7997067Z { 2023-01-11T21:38:05.7997278Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.7997481Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.7997641Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7997765Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.7997878Z } 2023-01-11T21:38:05.7998035Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.7998201Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.7998314Z { 2023-01-11T21:38:05.7998458Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.7998624Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.7998742Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.7998884Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.7998994Z } 2023-01-11T21:38:05.7999154Z } 2023-01-11T21:38:05.7999267Z } 2023-01-11T21:38:05.7999416Z ''') 2023-01-11T21:38:05.7999424Z 2023-01-11T21:38:05.7999430Z 2023-01-11T21:38:05.7999583Z async_compile.wait(globals()) 2023-01-11T21:38:05.7999679Z del async_compile 2023-01-11T21:38:05.7999687Z 2023-01-11T21:38:05.7999809Z def call(args): 2023-01-11T21:38:05.7999971Z arg0_1, = args 2023-01-11T21:38:05.8000103Z args.clear() 2023-01-11T21:38:05.8000402Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8000605Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8000725Z del arg0_1 2023-01-11T21:38:05.8000821Z return (buf0, ) 2023-01-11T21:38:05.8001158Z 2023-01-11T21:38:05.8001166Z 2023-01-11T21:38:05.8001277Z if __name__ == "__main__":
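# Annotation: this benchmark harness is appended to every generated module -- rand_strided
# rebuilds an input with exactly the shape and strides the graph was compiled against, and
# print_performance times the call.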
2023-01-11T21:38:05.8001468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8001694Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8001995Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8002171Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8002179Z 2023-01-11T21:38:05.8002347Z ok (1.722s) 2023-01-11T21:38:05.8003043Z test_add_inplace_permuted_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8003248Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8003633Z [2023-01-11 21:24:17,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 14 2023-01-11T21:38:05.8003984Z [2023-01-11 21:24:19,304] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 14 2023-01-11T21:38:05.8004592Z 2023-01-11T21:38:05.8004738Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8004861Z import torch 2023-01-11T21:38:05.8004992Z import random 2023-01-11T21:38:05.8005206Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8005423Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8005437Z 2023-01-11T21:38:05.8005576Z aten = torch.ops.aten 2023-01-11T21:38:05.8005811Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8005940Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8005947Z 2023-01-11T21:38:05.8006074Z import triton 2023-01-11T21:38:05.8006232Z import triton.language as tl 2023-01-11T21:38:05.8006435Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8006635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8006643Z 2023-01-11T21:38:05.8006648Z 2023-01-11T21:38:05.8006939Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8007310Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8007489Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8007623Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8007791Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8007907Z { 2023-01-11T21:38:05.8008109Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8008225Z { 2023-01-11T21:38:05.8008375Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8008542Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8008624Z { 2023-01-11T21:38:05.8008774Z for(long i1=0; i1<12; i1+=1) 2023-01-11T21:38:05.8009324Z { 2023-01-11T21:38:05.8009482Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8009603Z { 2023-01-11T21:38:05.8009849Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (221*i1) + (2652*i0)); 2023-01-11T21:38:05.8010084Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i2) + (221*i0)); 2023-01-11T21:38:05.8010203Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8010835Z tmp2.store(out_ptr0 + (8*i2) + (221*i1) + (2652*i0)); 2023-01-11T21:38:05.8010965Z } 2023-01-11T21:38:05.8011128Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8011270Z for(long i2=216; i2<221; i2+=1) 2023-01-11T21:38:05.8011386Z { 
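// Scalar tail of the vectorized loop above: 27 iterations x 8 float lanes
// cover 216 of the 221 elements in each contiguous run, so i2 = 216..220
// is handled element-wise here. Note the tail loads from out_ptr0 rather
// than in_ptr0: this is an in-place add (call() below passes arg0_1 as
// both the first input and the output), so the two pointers alias the
// same storage and either name reads the same unprocessed elements.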
2023-01-11T21:38:05.8011556Z auto tmp0 = out_ptr0[i2 + (221*i1) + (2652*i0)]; 2023-01-11T21:38:05.8011718Z auto tmp1 = in_ptr1[i2 + (221*i0)]; 2023-01-11T21:38:05.8011843Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8012017Z out_ptr0[i2 + (221*i1) + (2652*i0)] = tmp2; 2023-01-11T21:38:05.8012130Z } 2023-01-11T21:38:05.8012246Z } 2023-01-11T21:38:05.8012367Z } 2023-01-11T21:38:05.8012518Z } 2023-01-11T21:38:05.8012609Z } 2023-01-11T21:38:05.8012762Z ''') 2023-01-11T21:38:05.8012770Z 2023-01-11T21:38:05.8012776Z 2023-01-11T21:38:05.8012929Z async_compile.wait(globals()) 2023-01-11T21:38:05.8013070Z del async_compile 2023-01-11T21:38:05.8013077Z 2023-01-11T21:38:05.8013291Z def call(args): 2023-01-11T21:38:05.8013442Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8013572Z args.clear() 2023-01-11T21:38:05.8013809Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8013903Z del arg1_1 2023-01-11T21:38:05.8014039Z return (arg0_1, ) 2023-01-11T21:38:05.8014047Z 2023-01-11T21:38:05.8014054Z 2023-01-11T21:38:05.8014199Z if __name__ == "__main__": 2023-01-11T21:38:05.8014387Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8014851Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8015198Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8015523Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8015698Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8015706Z 2023-01-11T21:38:05.8015832Z ok (1.785s) 2023-01-11T21:38:05.8016438Z test_addmm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8016655Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8017033Z [2023-01-11 21:24:19,333] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 15 2023-01-11T21:38:05.8017626Z [2023-01-11 21:24:21,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 15 2023-01-11T21:38:05.8017634Z 2023-01-11T21:38:05.8017876Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8018005Z import torch 2023-01-11T21:38:05.8018133Z import random 2023-01-11T21:38:05.8018328Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8018540Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8018548Z 2023-01-11T21:38:05.8018656Z aten = torch.ops.aten 2023-01-11T21:38:05.8018861Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8019023Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8019030Z 2023-01-11T21:38:05.8019160Z import triton 2023-01-11T21:38:05.8019312Z import triton.language as tl 2023-01-11T21:38:05.8019505Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8019723Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8019735Z 2023-01-11T21:38:05.8019741Z 2023-01-11T21:38:05.8019969Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8020237Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8020423Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8020605Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8020777Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8020999Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8021161Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8021309Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.8021432Z { 2023-01-11T21:38:05.8021570Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8021686Z { 2023-01-11T21:38:05.8021823Z #pragma omp for 2023-01-11T21:38:05.8021993Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8022111Z { 2023-01-11T21:38:05.8022330Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8022540Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8022657Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8022884Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8023027Z } 2023-01-11T21:38:05.8023185Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8023330Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8023445Z { 2023-01-11T21:38:05.8023643Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8023780Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8023950Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8024087Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8024200Z } 2023-01-11T21:38:05.8024336Z #pragma omp for 2023-01-11T21:38:05.8024476Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8024801Z { 2023-01-11T21:38:05.8024980Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8025193Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8025456Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8025613Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8025724Z } 2023-01-11T21:38:05.8025888Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8026055Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8026143Z {
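// Dead remainder loop: each 8x8 operand holds 64 floats, which the
// 8-wide vectorized loop above covers exactly (8 iterations x 8 lanes),
// so this tail runs from 64 to 64, i.e. zero iterations. Inductor emits
// the scalar tail unconditionally; presumably the C++ compiler is
// expected to eliminate it.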
2023-01-11T21:38:05.8026290Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8026455Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8026611Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8026796Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8027482Z } 2023-01-11T21:38:05.8027624Z #pragma omp for 2023-01-11T21:38:05.8027738Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8027929Z { 2023-01-11T21:38:05.8028139Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.8028732Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8028904Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8029063Z tmp2.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8029179Z } 2023-01-11T21:38:05.8029313Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8029457Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8029566Z { 2023-01-11T21:38:05.8029713Z auto tmp0 = in_ptr2[i0]; 2023-01-11T21:38:05.8029872Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8030030Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8030163Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:05.8030249Z } 2023-01-11T21:38:05.8030431Z } 2023-01-11T21:38:05.8030544Z } 2023-01-11T21:38:05.8030706Z ''') 2023-01-11T21:38:05.8030719Z 2023-01-11T21:38:05.8030724Z 2023-01-11T21:38:05.8030941Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8031247Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8031453Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8031577Z { 2023-01-11T21:38:05.8031708Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8031828Z { 2023-01-11T21:38:05.8031969Z #pragma omp for 2023-01-11T21:38:05.8032156Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8032270Z { 2023-01-11T21:38:05.8032499Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8032715Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(4)); 2023-01-11T21:38:05.8032835Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8032998Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8033111Z } 2023-01-11T21:38:05.8033329Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8033473Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8033586Z { 2023-01-11T21:38:05.8033763Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8033905Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:05.8034109Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8034257Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8034377Z } 2023-01-11T21:38:05.8034489Z } 2023-01-11T21:38:05.8034633Z } 2023-01-11T21:38:05.8034799Z ''') 2023-01-11T21:38:05.8034808Z 2023-01-11T21:38:05.8034814Z 2023-01-11T21:38:05.8034956Z async_compile.wait(globals()) 2023-01-11T21:38:05.8035120Z del async_compile 2023-01-11T21:38:05.8035128Z 2023-01-11T21:38:05.8035254Z def call(args): 2023-01-11T21:38:05.8035394Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8035537Z args.clear() 2023-01-11T21:38:05.8035833Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036187Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036515Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8036836Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8036972Z del arg0_1 2023-01-11T21:38:05.8037095Z del arg1_1 2023-01-11T21:38:05.8037207Z del
arg2_1 2023-01-11T21:38:05.8037534Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8037729Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:05.8037853Z del buf0 2023-01-11T21:38:05.8037945Z del buf1 2023-01-11T21:38:05.8038065Z del buf2 2023-01-11T21:38:05.8038213Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8038404Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8038592Z return (buf4, ) 2023-01-11T21:38:05.8038600Z 2023-01-11T21:38:05.8038607Z 2023-01-11T21:38:05.8038734Z if __name__ == "__main__": 2023-01-11T21:38:05.8038926Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8039144Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8039431Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8039780Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8040080Z arg2_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8040271Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8040279Z 2023-01-11T21:38:05.8040399Z ok (1.901s) 2023-01-11T21:38:05.8041069Z test_alexnet_prefix_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8041267Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8041656Z [2023-01-11 21:24:21,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 16 2023-01-11T21:38:05.8041998Z [2023-01-11 21:24:21,408] torch._inductor.scheduler: [DEBUG] removed dead node: buf2 2023-01-11T21:38:05.8042356Z [2023-01-11 21:24:23,118] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 16 2023-01-11T21:38:05.8042395Z 2023-01-11T21:38:05.8042529Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8042681Z import torch 2023-01-11T21:38:05.8042808Z import random 2023-01-11T21:38:05.8042999Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8043188Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8043200Z 2023-01-11T21:38:05.8043336Z aten = torch.ops.aten 2023-01-11T21:38:05.8043572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8043693Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8043776Z 2023-01-11T21:38:05.8043875Z import triton 2023-01-11T21:38:05.8044070Z import triton.language as tl 2023-01-11T21:38:05.8044265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8044481Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8044490Z 2023-01-11T21:38:05.8044496Z 2023-01-11T21:38:05.8044707Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8045003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8045199Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8045358Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8045446Z { 2023-01-11T21:38:05.8045604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8045712Z { 
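// Fused ReLU + 3x3/stride-2 max pooling over the convolution output buf0
// of shape (16, 64, 55, 55): the loop nest below walks all 1024
// batch-channel planes, gathers a 3x3 window per output position
// (offsets 0/1/2, 55/56/57, 110/111/112 within the 55-wide rows),
// applies the branch-free ReLU tmp * (tmp > 0), and reduces with
// (x != x) ? x : std::max(y, x) -- a max that propagates NaN instead of
// discarding it -- to produce the (16, 64, 27, 27) pooled result.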
2023-01-11T21:38:05.8045845Z #pragma omp for 2023-01-11T21:38:05.8046172Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8046298Z { 2023-01-11T21:38:05.8046410Z #pragma GCC ivdep 2023-01-11T21:38:05.8046566Z for(long i1=0; i1<27; i1+=1) 2023-01-11T21:38:05.8046682Z { 2023-01-11T21:38:05.8046882Z #pragma GCC ivdep 2023-01-11T21:38:05.8047059Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8047177Z { 2023-01-11T21:38:05.8047299Z { 2023-01-11T21:38:05.8047395Z { 2023-01-11T21:38:05.8047584Z auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8047814Z auto tmp2 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048007Z auto tmp5 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048254Z auto tmp8 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048450Z auto tmp11 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048663Z auto tmp14 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8048852Z auto tmp17 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049013Z auto tmp20 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049205Z auto tmp23 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8049372Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.8049535Z auto tmp3 = tmp2 * (tmp2>0); 2023-01-11T21:38:05.8049742Z auto tmp4 = (tmp1 != tmp1) ? tmp1 : std::max(tmp3, tmp1); 2023-01-11T21:38:05.8049953Z auto tmp6 = tmp5 * (tmp5>0); 2023-01-11T21:38:05.8050175Z auto tmp7 = (tmp4 != tmp4) ? tmp4 : std::max(tmp6, tmp4); 2023-01-11T21:38:05.8050339Z auto tmp9 = tmp8 * (tmp8>0); 2023-01-11T21:38:05.8050552Z auto tmp10 = (tmp7 != tmp7) ? tmp7 : std::max(tmp9, tmp7); 2023-01-11T21:38:05.8050698Z auto tmp12 = tmp11 * (tmp11>0); 2023-01-11T21:38:05.8050909Z auto tmp13 = (tmp10 != tmp10) ? tmp10 : std::max(tmp12, tmp10); 2023-01-11T21:38:05.8051099Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8051312Z auto tmp16 = (tmp13 != tmp13) ? tmp13 : std::max(tmp15, tmp13); 2023-01-11T21:38:05.8051489Z auto tmp18 = tmp17 * (tmp17>0); 2023-01-11T21:38:05.8051705Z auto tmp19 = (tmp16 != tmp16) ? tmp16 : std::max(tmp18, tmp16); 2023-01-11T21:38:05.8051877Z auto tmp21 = tmp20 * (tmp20>0); 2023-01-11T21:38:05.8052098Z auto tmp22 = (tmp19 != tmp19) ? tmp19 : std::max(tmp21, tmp19); 2023-01-11T21:38:05.8052274Z auto tmp24 = tmp23 * (tmp23>0); 2023-01-11T21:38:05.8052494Z auto tmp25 = (tmp22 != tmp22) ? 
tmp22 : std::max(tmp24, tmp22); 2023-01-11T21:38:05.8052665Z out_ptr0[i2 + (27*i1) + (729*i0)] = tmp25; 2023-01-11T21:38:05.8052787Z } 2023-01-11T21:38:05.8052931Z } 2023-01-11T21:38:05.8053088Z } 2023-01-11T21:38:05.8053199Z } 2023-01-11T21:38:05.8053320Z } 2023-01-11T21:38:05.8053402Z } 2023-01-11T21:38:05.8053507Z } 2023-01-11T21:38:05.8053655Z ''') 2023-01-11T21:38:05.8053663Z 2023-01-11T21:38:05.8053670Z 2023-01-11T21:38:05.8053820Z async_compile.wait(globals()) 2023-01-11T21:38:05.8053944Z del async_compile 2023-01-11T21:38:05.8053957Z 2023-01-11T21:38:05.8054105Z def call(args): 2023-01-11T21:38:05.8054249Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8054343Z args.clear() 2023-01-11T21:38:05.8054744Z buf0 = aten.convolution(arg2_1, arg1_1, arg0_1, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8054890Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:05.8054979Z del arg0_1 2023-01-11T21:38:05.8055067Z del arg1_1 2023-01-11T21:38:05.8055155Z del arg2_1 2023-01-11T21:38:05.8055410Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8055557Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8055682Z return (buf1, ) 2023-01-11T21:38:05.8055688Z 2023-01-11T21:38:05.8055692Z 2023-01-11T21:38:05.8055837Z if __name__ == "__main__": 2023-01-11T21:38:05.8055971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8056194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8056409Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8056645Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8056896Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8057051Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8057056Z 2023-01-11T21:38:05.8057195Z ok (1.955s) 2023-01-11T21:38:05.8057745Z test_any_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8057902Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8058182Z [2023-01-11 21:24:23,198] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 17 2023-01-11T21:38:05.8058465Z [2023-01-11 21:24:24,949] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 17 2023-01-11T21:38:05.8058901Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8059050Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8059330Z [2023-01-11 21:24:24,991] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 18 2023-01-11T21:38:05.8059613Z [2023-01-11 21:24:26,864] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 18 2023-01-11T21:38:05.8059619Z 2023-01-11T21:38:05.8059798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8059869Z import torch 2023-01-11T21:38:05.8059967Z import random 2023-01-11T21:38:05.8060144Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8060288Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8060294Z 2023-01-11T21:38:05.8060392Z aten = torch.ops.aten 2023-01-11T21:38:05.8060556Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8060669Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8060674Z 2023-01-11T21:38:05.8060767Z import triton 2023-01-11T21:38:05.8060857Z import triton.language as tl 2023-01-11T21:38:05.8061016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8061177Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8061186Z 2023-01-11T21:38:05.8061190Z 2023-01-11T21:38:05.8061350Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8061575Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8061721Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8061847Z bool* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.8061980Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8062081Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8062240Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.8062336Z { 2023-01-11T21:38:05.8062443Z auto out_ptr2 = in_out_ptr0; 2023-01-11T21:38:05.8062547Z auto out_ptr3 = in_out_ptr1; 2023-01-11T21:38:05.8062637Z { 2023-01-11T21:38:05.8062719Z { 2023-01-11T21:38:05.8062795Z bool tmp2 = 0; 2023-01-11T21:38:05.8062888Z bool tmp4 = 0; 2023-01-11T21:38:05.8062981Z bool tmp8 = 0; 2023-01-11T21:38:05.8063107Z bool tmp10 = 0; 2023-01-11T21:38:05.8063249Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8063339Z { 2023-01-11T21:38:05.8063530Z #pragma omp for reduction(||:tmp2) reduction(||:tmp4) reduction(||:tmp8) reduction(||:tmp10) 2023-01-11T21:38:05.8063625Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8063712Z { 2023-01-11T21:38:05.8063800Z { 2023-01-11T21:38:05.8063921Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8064095Z auto tmp1 = static_cast<bool>(tmp0); 2023-01-11T21:38:05.8064227Z auto tmp3 = std::isinf(tmp0); 2023-01-11T21:38:05.8064360Z auto tmp5 = tmp3 == 0; 2023-01-11T21:38:05.8064472Z auto tmp6 = static_cast<float>(tmp5); 2023-01-11T21:38:05.8064602Z auto tmp7 = static_cast<bool>(tmp6); 2023-01-11T21:38:05.8064716Z auto tmp9 = tmp5 == 0; 2023-01-11T21:38:05.8064828Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8064938Z tmp4 = tmp4 || tmp3; 2023-01-11T21:38:05.8065047Z tmp8 = tmp8 || tmp7; 2023-01-11T21:38:05.8065161Z tmp10 = tmp10 || tmp9; 2023-01-11T21:38:05.8065234Z } 2023-01-11T21:38:05.8065342Z } 2023-01-11T21:38:05.8065451Z } 2023-01-11T21:38:05.8065564Z out_ptr0[0] = tmp2; 2023-01-11T21:38:05.8065663Z out_ptr1[0] = tmp4; 2023-01-11T21:38:05.8065760Z out_ptr2[0] = tmp8; 2023-01-11T21:38:05.8065858Z out_ptr3[0] = tmp10;
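// A single pass over the 64 inputs computes four ||-reductions at once
// via the OpenMP reduction clauses above: tmp2 = any(x != 0),
// tmp4 = any(isinf(x)), tmp8 = any(!isinf(x)) (through a bool->float->bool
// round trip), and tmp10 = any(isinf(x)) again via the double negation
// tmp9. The scalar blocks below then negate tmp8 and tmp10 in place,
// using !any(p) == all(!p) to produce the all()-style outputs.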
2023-01-11T21:38:05.8065921Z } 2023-01-11T21:38:05.8066036Z } 2023-01-11T21:38:05.8066123Z { 2023-01-11T21:38:05.8066205Z { 2023-01-11T21:38:05.8066324Z auto tmp0 = out_ptr2[0]; 2023-01-11T21:38:05.8066429Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8066514Z in_out_ptr0[0] = tmp1; 2023-01-11T21:38:05.8066596Z } 2023-01-11T21:38:05.8066674Z } 2023-01-11T21:38:05.8066756Z { 2023-01-11T21:38:05.8066845Z { 2023-01-11T21:38:05.8066949Z auto tmp0 = out_ptr3[0]; 2023-01-11T21:38:05.8067052Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8067161Z in_out_ptr1[0] = tmp1; 2023-01-11T21:38:05.8067260Z } 2023-01-11T21:38:05.8067343Z } 2023-01-11T21:38:05.8067420Z } 2023-01-11T21:38:05.8067524Z ''') 2023-01-11T21:38:05.8067530Z 2023-01-11T21:38:05.8067534Z 2023-01-11T21:38:05.8067688Z async_compile.wait(globals()) 2023-01-11T21:38:05.8067780Z del async_compile 2023-01-11T21:38:05.8067785Z 2023-01-11T21:38:05.8067855Z def call(args): 2023-01-11T21:38:05.8067943Z arg0_1, = args 2023-01-11T21:38:05.8068034Z args.clear() 2023-01-11T21:38:05.8068248Z buf0 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068447Z buf1 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068644Z buf2 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068846Z buf3 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8068931Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:05.8069040Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8069275Z kernel_cpp_0(c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8069364Z del arg0_1 2023-01-11T21:38:05.8069475Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:05.8069480Z 2023-01-11T21:38:05.8069485Z 2023-01-11T21:38:05.8069594Z if __name__ == "__main__": 2023-01-11T21:38:05.8069727Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8069878Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8070073Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8070276Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8070281Z 2023-01-11T21:38:05.8070286Z 2023-01-11T21:38:05.8070398Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8070486Z import torch 2023-01-11T21:38:05.8070576Z import random 2023-01-11T21:38:05.8070714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8070879Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8070884Z 2023-01-11T21:38:05.8070983Z aten = torch.ops.aten 2023-01-11T21:38:05.8071118Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8071227Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8071232Z 2023-01-11T21:38:05.8071321Z import triton 2023-01-11T21:38:05.8071429Z import triton.language as tl 2023-01-11T21:38:05.8071570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8071726Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8071735Z 2023-01-11T21:38:05.8071739Z 2023-01-11T21:38:05.8071902Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8072144Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8072259Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8072385Z bool* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.8072511Z 
const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8072666Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8072783Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.8072863Z { 2023-01-11T21:38:05.8072979Z auto out_ptr3 = in_out_ptr0; 2023-01-11T21:38:05.8073064Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:05.8073182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8073276Z { 2023-01-11T21:38:05.8073373Z #pragma omp for 2023-01-11T21:38:05.8073475Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8073562Z { 2023-01-11T21:38:05.8073644Z { 2023-01-11T21:38:05.8073708Z { 2023-01-11T21:38:05.8073815Z bool tmp2 = 0; 2023-01-11T21:38:05.8073929Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8074015Z { 2023-01-11T21:38:05.8074145Z { 2023-01-11T21:38:05.8074276Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8074446Z auto tmp1 = static_cast<bool>(tmp0); 2023-01-11T21:38:05.8074540Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8074628Z } 2023-01-11T21:38:05.8074722Z } 2023-01-11T21:38:05.8074829Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8074915Z } 2023-01-11T21:38:05.8074998Z } 2023-01-11T21:38:05.8075093Z } 2023-01-11T21:38:05.8075156Z } 2023-01-11T21:38:05.8075238Z { 2023-01-11T21:38:05.8075323Z { 2023-01-11T21:38:05.8075428Z bool tmp2 = 0; 2023-01-11T21:38:05.8075526Z bool tmp5 = 0; 2023-01-11T21:38:05.8075655Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8075718Z { 2023-01-11T21:38:05.8075872Z #pragma omp for reduction(||:tmp2) reduction(||:tmp5) 2023-01-11T21:38:05.8075988Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.8076086Z { 2023-01-11T21:38:05.8076205Z { 2023-01-11T21:38:05.8076332Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8076460Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.8076554Z auto tmp3 = tmp1 == 0; 2023-01-11T21:38:05.8076669Z auto tmp4 = tmp3 == 0; 2023-01-11T21:38:05.8076779Z tmp2 = tmp2 || tmp1; 2023-01-11T21:38:05.8076892Z tmp5 = tmp5 || tmp4; 2023-01-11T21:38:05.8077012Z } 2023-01-11T21:38:05.8077117Z } 2023-01-11T21:38:05.8077206Z } 2023-01-11T21:38:05.8077287Z out_ptr1[0] = tmp2; 2023-01-11T21:38:05.8077387Z out_ptr2[0] = tmp5; 2023-01-11T21:38:05.8077470Z } 2023-01-11T21:38:05.8077554Z } 2023-01-11T21:38:05.8077677Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8077757Z { 2023-01-11T21:38:05.8077863Z #pragma omp for 2023-01-11T21:38:05.8077945Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8078076Z { 2023-01-11T21:38:05.8078160Z { 2023-01-11T21:38:05.8078242Z { 2023-01-11T21:38:05.8078344Z bool tmp5 = 0; 2023-01-11T21:38:05.8078458Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:05.8078524Z { 2023-01-11T21:38:05.8078617Z { 2023-01-11T21:38:05.8078752Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.8078887Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.8079018Z auto tmp2 = tmp1 == 0; 2023-01-11T21:38:05.8079153Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8079286Z auto tmp4 = static_cast<bool>(tmp3); 2023-01-11T21:38:05.8079403Z tmp5 = tmp5 || tmp4; 2023-01-11T21:38:05.8079474Z } 2023-01-11T21:38:05.8079563Z } 2023-01-11T21:38:05.8079677Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8079762Z } 2023-01-11T21:38:05.8079846Z } 2023-01-11T21:38:05.8079979Z } 2023-01-11T21:38:05.8080058Z #pragma omp for 2023-01-11T21:38:05.8080165Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8080247Z { 2023-01-11T21:38:05.8080332Z { 2023-01-11T21:38:05.8241079Z { 2023-01-11T21:38:05.8241365Z auto tmp0 = out_ptr3[i0]; 2023-01-11T21:38:05.8241478Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8241579Z in_out_ptr0[i0] = tmp1;
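// This negation finishes the per-column reduction computed just above:
// out_ptr3[i0] (aliased to in_out_ptr0 at the top of the kernel) held
// any(!isinf) over column i0, gathered with stride-8 loads
// in_ptr0[i0 + (8*i1)]; since !any(!p) == all(p), in_out_ptr0 now holds
// the per-column all()-style result, mirroring the scalar negation of
// the global reduction in the #pragma omp single block below.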
2023-01-11T21:38:05.8241656Z } 2023-01-11T21:38:05.8241732Z } 2023-01-11T21:38:05.8241804Z } 2023-01-11T21:38:05.8241897Z #pragma omp single 2023-01-11T21:38:05.8242263Z { 2023-01-11T21:38:05.8242338Z { 2023-01-11T21:38:05.8242409Z { 2023-01-11T21:38:05.8242510Z auto tmp0 = out_ptr2[0]; 2023-01-11T21:38:05.8242606Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8242699Z in_out_ptr1[0] = tmp1; 2023-01-11T21:38:05.8242768Z } 2023-01-11T21:38:05.8242831Z } 2023-01-11T21:38:05.8242900Z } 2023-01-11T21:38:05.8242965Z } 2023-01-11T21:38:05.8243034Z } 2023-01-11T21:38:05.8243233Z ''') 2023-01-11T21:38:05.8243242Z 2023-01-11T21:38:05.8243246Z 2023-01-11T21:38:05.8243348Z async_compile.wait(globals()) 2023-01-11T21:38:05.8243429Z del async_compile 2023-01-11T21:38:05.8243434Z 2023-01-11T21:38:05.8243505Z def call(args): 2023-01-11T21:38:05.8243585Z arg0_1, = args 2023-01-11T21:38:05.8243661Z args.clear() 2023-01-11T21:38:05.8243869Z buf0 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244054Z buf1 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244234Z buf4 = empty_strided((), (), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244423Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8244516Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:05.8244600Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:05.8244822Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8244897Z del arg0_1 2023-01-11T21:38:05.8244991Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:05.8245068Z 2023-01-11T21:38:05.8245073Z 2023-01-11T21:38:05.8245155Z if __name__ == "__main__": 2023-01-11T21:38:05.8245280Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8245408Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8245642Z arg0_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8245768Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8245774Z 2023-01-11T21:38:05.8245845Z ok (3.704s) 2023-01-11T21:38:05.8246299Z test_arange1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8246432Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8246697Z [2023-01-11 21:24:26,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 19 2023-01-11T21:38:05.8246962Z [2023-01-11 21:24:28,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 19 2023-01-11T21:38:05.8246971Z 2023-01-11T21:38:05.8247072Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8247148Z import torch 2023-01-11T21:38:05.8247222Z import random 2023-01-11T21:38:05.8247337Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8247463Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8247468Z 2023-01-11T21:38:05.8247550Z aten = torch.ops.aten 2023-01-11T21:38:05.8247687Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8247784Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8247790Z 2023-01-11T21:38:05.8247866Z import triton 2023-01-11T21:38:05.8247963Z import triton.language as tl 2023-01-11T21:38:05.8248083Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8248225Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8248231Z 2023-01-11T21:38:05.8248235Z 2023-01-11T21:38:05.8248373Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8248613Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8248737Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8248843Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8248945Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8249009Z { 2023-01-11T21:38:05.8249106Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8249174Z { 2023-01-11T21:38:05.8249259Z #pragma omp for 2023-01-11T21:38:05.8249347Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8249415Z { 2023-01-11T21:38:05.8249486Z { 2023-01-11T21:38:05.8249553Z { 2023-01-11T21:38:05.8249645Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8249756Z auto tmp1 = static_cast<float>(i0); 2023-01-11T21:38:05.8249853Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8249945Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8250014Z } 2023-01-11T21:38:05.8250084Z } 2023-01-11T21:38:05.8250154Z } 2023-01-11T21:38:05.8250229Z #pragma omp for 2023-01-11T21:38:05.8250315Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8250383Z { 2023-01-11T21:38:05.8250469Z #pragma GCC ivdep 2023-01-11T21:38:05.8250557Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8250625Z { 2023-01-11T21:38:05.8250688Z { 2023-01-11T21:38:05.8250759Z { 2023-01-11T21:38:05.8250870Z auto tmp0 = out_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8251015Z auto tmp1 = static_cast<long>(10 + i1); 2023-01-11T21:38:05.8251133Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.8251232Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.8251333Z out_ptr1[i1 + (8*i0)] = tmp3; 2023-01-11T21:38:05.8251408Z } 2023-01-11T21:38:05.8251472Z } 2023-01-11T21:38:05.8251539Z } 2023-01-11T21:38:05.8251605Z } 2023-01-11T21:38:05.8251673Z } 2023-01-11T21:38:05.8251741Z } 2023-01-11T21:38:05.8251831Z ''') 2023-01-11T21:38:05.8251837Z 2023-01-11T21:38:05.8251841Z 2023-01-11T21:38:05.8251933Z async_compile.wait(globals()) 2023-01-11T21:38:05.8252004Z del async_compile 2023-01-11T21:38:05.8252009Z 2023-01-11T21:38:05.8252084Z def call(args):
2023-01-11T21:38:05.8252160Z arg0_1, = args 2023-01-11T21:38:05.8252233Z args.clear() 2023-01-11T21:38:05.8252432Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8252629Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8252799Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8252867Z del arg0_1 2023-01-11T21:38:05.8252949Z return (buf0, buf1, ) 2023-01-11T21:38:05.8252957Z 2023-01-11T21:38:05.8252962Z 2023-01-11T21:38:05.8253044Z if __name__ == "__main__": 2023-01-11T21:38:05.8253160Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8253288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8253486Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8253599Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8253604Z 2023-01-11T21:38:05.8253675Z ok (1.841s) 2023-01-11T21:38:05.8254120Z test_arange2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8254295Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8254712Z [2023-01-11 21:24:28,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 20 2023-01-11T21:38:05.8254981Z [2023-01-11 21:24:30,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 20 2023-01-11T21:38:05.8254987Z 2023-01-11T21:38:05.8255086Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8255163Z import torch 2023-01-11T21:38:05.8255242Z import random 2023-01-11T21:38:05.8255360Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8255485Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8255494Z 2023-01-11T21:38:05.8255571Z aten = torch.ops.aten 2023-01-11T21:38:05.8255707Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8255803Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8255808Z 2023-01-11T21:38:05.8255883Z import triton 2023-01-11T21:38:05.8255977Z import triton.language as tl 2023-01-11T21:38:05.8256101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8256243Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8256248Z 2023-01-11T21:38:05.8256253Z 2023-01-11T21:38:05.8256392Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8256593Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8256716Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8256818Z long* __restrict__ out_ptr0) 2023-01-11T21:38:05.8256885Z { 2023-01-11T21:38:05.8257032Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8257098Z { 2023-01-11T21:38:05.8257248Z #pragma omp for 2023-01-11T21:38:05.8257330Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8257401Z { 2023-01-11T21:38:05.8257486Z #pragma GCC ivdep 2023-01-11T21:38:05.8257574Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8257646Z { 2023-01-11T21:38:05.8257715Z { 2023-01-11T21:38:05.8257785Z { 
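// No arange buffer is materialized in this kernel: the iota term is
// folded straight into the index arithmetic (consistent with an
// x + torch.arange(8) pattern in the test), so each output is computed
// below as in_ptr0[i1 + (8*i0)] + static_cast<long>(i1), reading only
// the int64 input, with #pragma GCC ivdep asserting the inner
// iterations are independent.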
2023-01-11T21:38:05.8257887Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.8258001Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.8258102Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8258202Z out_ptr0[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:05.8258275Z } 2023-01-11T21:38:05.8258345Z } 2023-01-11T21:38:05.8258412Z } 2023-01-11T21:38:05.8258474Z } 2023-01-11T21:38:05.8258544Z } 2023-01-11T21:38:05.8258609Z } 2023-01-11T21:38:05.8258694Z ''') 2023-01-11T21:38:05.8258699Z 2023-01-11T21:38:05.8258704Z 2023-01-11T21:38:05.8258800Z async_compile.wait(globals()) 2023-01-11T21:38:05.8258877Z del async_compile 2023-01-11T21:38:05.8258882Z 2023-01-11T21:38:05.8258957Z def call(args): 2023-01-11T21:38:05.8259028Z arg0_1, = args 2023-01-11T21:38:05.8259104Z args.clear() 2023-01-11T21:38:05.8259297Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8259438Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8259512Z del arg0_1 2023-01-11T21:38:05.8259587Z return (buf0, ) 2023-01-11T21:38:05.8259592Z 2023-01-11T21:38:05.8259596Z 2023-01-11T21:38:05.8259677Z if __name__ == "__main__": 2023-01-11T21:38:05.8259789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8259915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8260107Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8260221Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8260226Z 2023-01-11T21:38:05.8260298Z ok (1.735s) 2023-01-11T21:38:05.8260823Z test_arange3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8260957Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8261214Z [2023-01-11 21:24:30,478] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 21 2023-01-11T21:38:05.8261474Z [2023-01-11 21:24:32,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 21 2023-01-11T21:38:05.8261483Z 2023-01-11T21:38:05.8261582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8261651Z import torch 2023-01-11T21:38:05.8261727Z import random 2023-01-11T21:38:05.8261847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8261969Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8261976Z 2023-01-11T21:38:05.8262060Z aten = torch.ops.aten 2023-01-11T21:38:05.8262196Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8262292Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8262297Z 2023-01-11T21:38:05.8262365Z import triton 2023-01-11T21:38:05.8262458Z import triton.language as tl 2023-01-11T21:38:05.8262585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8262724Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8262730Z 2023-01-11T21:38:05.8262735Z 2023-01-11T21:38:05.8262871Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8263113Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8263239Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8263343Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8263403Z { 2023-01-11T21:38:05.8263507Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8263574Z { 2023-01-11T21:38:05.8263655Z #pragma omp for 2023-01-11T21:38:05.8263746Z for(long i0=0; i0<14; i0+=1) 2023-01-11T21:38:05.8263815Z { 2023-01-11T21:38:05.8263882Z { 2023-01-11T21:38:05.8263945Z { 2023-01-11T21:38:05.8264042Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8264155Z auto tmp1 = static_cast(4*i0); 2023-01-11T21:38:05.8264266Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8264368Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.8264461Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8264531Z } 2023-01-11T21:38:05.8264592Z } 2023-01-11T21:38:05.8264662Z } 2023-01-11T21:38:05.8264728Z } 2023-01-11T21:38:05.8264792Z } 2023-01-11T21:38:05.8264876Z ''') 2023-01-11T21:38:05.8264882Z 2023-01-11T21:38:05.8264886Z 2023-01-11T21:38:05.8264982Z async_compile.wait(globals()) 2023-01-11T21:38:05.8265059Z del async_compile 2023-01-11T21:38:05.8265064Z 2023-01-11T21:38:05.8265132Z def call(args): 2023-01-11T21:38:05.8265209Z arg0_1, = args 2023-01-11T21:38:05.8265285Z args.clear() 2023-01-11T21:38:05.8265481Z buf0 = empty_strided((14, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8265619Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8265692Z del arg0_1 2023-01-11T21:38:05.8265768Z return (buf0, ) 2023-01-11T21:38:05.8265773Z 2023-01-11T21:38:05.8265777Z 2023-01-11T21:38:05.8265851Z if __name__ == "__main__": 2023-01-11T21:38:05.8265971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8266102Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8266296Z arg0_1 = rand_strided((14, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8266408Z 
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8266442Z 2023-01-11T21:38:05.8266516Z ok (1.750s) 2023-01-11T21:38:05.8266960Z test_arange4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8267090Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8267346Z [2023-01-11 21:24:32,224] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 22 2023-01-11T21:38:05.8267609Z [2023-01-11 21:24:33,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 22 2023-01-11T21:38:05.8267614Z 2023-01-11T21:38:05.8267706Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8267779Z import torch 2023-01-11T21:38:05.8267857Z import random 2023-01-11T21:38:05.8267976Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8268099Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8268104Z 2023-01-11T21:38:05.8268185Z aten = torch.ops.aten 2023-01-11T21:38:05.8268322Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8268411Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8268422Z 2023-01-11T21:38:05.8268491Z import triton 2023-01-11T21:38:05.8268583Z import triton.language as tl 2023-01-11T21:38:05.8268709Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8268881Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8268887Z 2023-01-11T21:38:05.8268891Z 2023-01-11T21:38:05.8269028Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8269233Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8269357Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8269455Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8269521Z { 2023-01-11T21:38:05.8269623Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8269689Z { 2023-01-11T21:38:05.8269771Z #pragma omp for 2023-01-11T21:38:05.8269859Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8269926Z { 2023-01-11T21:38:05.8269988Z { 2023-01-11T21:38:05.8270055Z { 2023-01-11T21:38:05.8270153Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8270326Z auto tmp1 = static_cast(512 + ((-1)*i0)); 2023-01-11T21:38:05.8270438Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8270575Z auto tmp3 = tmp0 - tmp2; 2023-01-11T21:38:05.8270665Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8270728Z } 2023-01-11T21:38:05.8270795Z } 2023-01-11T21:38:05.8270863Z } 2023-01-11T21:38:05.8270930Z } 2023-01-11T21:38:05.8270993Z } 2023-01-11T21:38:05.8271077Z ''') 2023-01-11T21:38:05.8271082Z 2023-01-11T21:38:05.8271086Z 2023-01-11T21:38:05.8271179Z async_compile.wait(globals()) 2023-01-11T21:38:05.8271249Z del async_compile 2023-01-11T21:38:05.8271254Z 2023-01-11T21:38:05.8271330Z def call(args): 2023-01-11T21:38:05.8271404Z arg0_1, = args 2023-01-11T21:38:05.8271478Z args.clear() 2023-01-11T21:38:05.8271676Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8271813Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8271888Z del 
arg0_1 2023-01-11T21:38:05.8271957Z return (buf0, ) 2023-01-11T21:38:05.8271962Z 2023-01-11T21:38:05.8271973Z 2023-01-11T21:38:05.8272048Z if __name__ == "__main__": 2023-01-11T21:38:05.8272167Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8272329Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8272553Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8272676Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8272681Z 2023-01-11T21:38:05.8272756Z ok (1.733s) 2023-01-11T21:38:05.8273299Z test_argmax_argmin1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8273445Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8273737Z [2023-01-11 21:24:33,957] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 23 2023-01-11T21:38:05.8274034Z [2023-01-11 21:24:35,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 23 2023-01-11T21:38:05.8274039Z 2023-01-11T21:38:05.8274142Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8274217Z import torch 2023-01-11T21:38:05.8274292Z import random 2023-01-11T21:38:05.8274422Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8274557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8274562Z 2023-01-11T21:38:05.8274648Z aten = torch.ops.aten 2023-01-11T21:38:05.8274791Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8274891Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8274896Z 2023-01-11T21:38:05.8275010Z import triton 2023-01-11T21:38:05.8275126Z import triton.language as tl 2023-01-11T21:38:05.8275277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8275443Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8275448Z 2023-01-11T21:38:05.8275452Z 2023-01-11T21:38:05.8275607Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8275834Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8275961Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8276069Z long* __restrict__ out_ptr0, 2023-01-11T21:38:05.8276177Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.8276243Z { 2023-01-11T21:38:05.8276309Z { 2023-01-11T21:38:05.8276379Z { 2023-01-11T21:38:05.8276511Z struct IndexValue_1 {size_t index; float value;}; 2023-01-11T21:38:05.8276802Z IndexValue_1 tmp1{0, -std::numeric_limits::infinity()}; 2023-01-11T21:38:05.8276942Z #pragma omp declare reduction(argmax : struct IndexValue_1 :\ 2023-01-11T21:38:05.8277103Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:05.8277260Z omp_out.index = omp_in.value < omp_out.value ? 
omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8277495Z initializer(omp_priv = {0, -std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8277618Z struct IndexValue_2 {size_t index; float value;}; 2023-01-11T21:38:05.8277756Z IndexValue_2 tmp2{0, std::numeric_limits::infinity()}; 2023-01-11T21:38:05.8277898Z #pragma omp declare reduction(argmin : struct IndexValue_2 :\ 2023-01-11T21:38:05.8278048Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:05.8278191Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8278342Z initializer(omp_priv = {0, std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8278451Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8278520Z { 2023-01-11T21:38:05.8278709Z #pragma omp for reduction(argmax:tmp1) reduction(argmin:tmp2) 2023-01-11T21:38:05.8278810Z for(long i0=0; i0<524288; i0+=1) 2023-01-11T21:38:05.8278880Z { 2023-01-11T21:38:05.8278955Z { 2023-01-11T21:38:05.8279050Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8279150Z if (tmp1.value < tmp0) { 2023-01-11T21:38:05.8279269Z tmp1.index = i0; tmp1.value = tmp0; 2023-01-11T21:38:05.8279343Z } 2023-01-11T21:38:05.8279441Z if (tmp2.value > tmp0) { 2023-01-11T21:38:05.8279554Z tmp2.index = i0; tmp2.value = tmp0; 2023-01-11T21:38:05.8279630Z } 2023-01-11T21:38:05.8279695Z } 2023-01-11T21:38:05.8279764Z } 2023-01-11T21:38:05.8279829Z } 2023-01-11T21:38:05.8279921Z out_ptr0[0] = tmp1.index; 2023-01-11T21:38:05.8280010Z out_ptr1[0] = tmp2.index; 2023-01-11T21:38:05.8280078Z } 2023-01-11T21:38:05.8280141Z } 2023-01-11T21:38:05.8280205Z } 2023-01-11T21:38:05.8280292Z ''') 2023-01-11T21:38:05.8280298Z 2023-01-11T21:38:05.8280302Z 2023-01-11T21:38:05.8280398Z async_compile.wait(globals()) 2023-01-11T21:38:05.8280474Z del async_compile 2023-01-11T21:38:05.8280479Z 2023-01-11T21:38:05.8280555Z def call(args): 2023-01-11T21:38:05.8280627Z arg0_1, = args 2023-01-11T21:38:05.8280703Z args.clear() 2023-01-11T21:38:05.8280879Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8281060Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8281229Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8281336Z del arg0_1 2023-01-11T21:38:05.8281418Z return (buf0, buf1, ) 2023-01-11T21:38:05.8281423Z 2023-01-11T21:38:05.8281428Z 2023-01-11T21:38:05.8281510Z if __name__ == "__main__": 2023-01-11T21:38:05.8281630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8281750Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8281968Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8282082Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8282087Z 2023-01-11T21:38:05.8282158Z ok (1.732s) 2023-01-11T21:38:05.8282618Z test_argmax_argmin2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
2023-01-11T21:38:05.8297014Z test_as_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.8297201Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.8297471Z [2023-01-11 21:24:37,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 25
2023-01-11T21:38:05.8297736Z [2023-01-11 21:24:39,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 25
2023-01-11T21:38:05.8297742Z 
2023-01-11T21:38:05.8297837Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.8297911Z import torch
2023-01-11T21:38:05.8297985Z import random
2023-01-11T21:38:05.8298103Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.8298232Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.8298237Z 
2023-01-11T21:38:05.8298313Z aten = torch.ops.aten
2023-01-11T21:38:05.8298452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.8298549Z async_compile = AsyncCompile()
2023-01-11T21:38:05.8298555Z 
2023-01-11T21:38:05.8298629Z import triton
2023-01-11T21:38:05.8298762Z import triton.language as tl
2023-01-11T21:38:05.8298885Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.8299024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.8299030Z 
2023-01-11T21:38:05.8299034Z 
2023-01-11T21:38:05.8299177Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.8299378Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.8299503Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.8299614Z                        const float* __restrict__ in_ptr0)
2023-01-11T21:38:05.8299681Z {
2023-01-11T21:38:05.8299783Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.8299847Z     {
2023-01-11T21:38:05.8299929Z         #pragma omp for
2023-01-11T21:38:05.8300011Z         for(long i0=0; i0<512; i0+=1)
2023-01-11T21:38:05.8300080Z         {
2023-01-11T21:38:05.8300220Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.8300361Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:05.8300451Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.8300584Z             auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:05.8300677Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8300780Z             tmp4.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:05.8300842Z         }
2023-01-11T21:38:05.8300942Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.8301035Z         for(long i0=4096; i0<4096; i0+=1)
2023-01-11T21:38:05.8301104Z         {
2023-01-11T21:38:05.8301194Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.8301297Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:05.8301380Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.8301483Z             auto tmp3 = static_cast<float>(2);
2023-01-11T21:38:05.8301571Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8301662Z             in_out_ptr0[i0] = tmp4;
2023-01-11T21:38:05.8301729Z         }
2023-01-11T21:38:05.8301795Z     }
2023-01-11T21:38:05.8301862Z }
2023-01-11T21:38:05.8301941Z ''')
2023-01-11T21:38:05.8301946Z 
2023-01-11T21:38:05.8301957Z 
2023-01-11T21:38:05.8302044Z async_compile.wait(globals())
2023-01-11T21:38:05.8302153Z del async_compile
2023-01-11T21:38:05.8302160Z 
2023-01-11T21:38:05.8302238Z def call(args):
2023-01-11T21:38:05.8302313Z     arg0_1, = args
2023-01-11T21:38:05.8302388Z     args.clear()
2023-01-11T21:38:05.8302591Z     buf0 = empty_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8302718Z     buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse
2023-01-11T21:38:05.8302849Z     kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr()))
2023-01-11T21:38:05.8302964Z     return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, )
2023-01-11T21:38:05.8302969Z 
2023-01-11T21:38:05.8302974Z 
2023-01-11T21:38:05.8303057Z if __name__ == "__main__":
2023-01-11T21:38:05.8303174Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.8303301Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.8303501Z     arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8303616Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.8303621Z 
2023-01-11T21:38:05.8303695Z ok (1.719s)
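The kernel above shows the two-loop shape inductor's CPU backend gives vectorizable pointwise ops: a main loop over full 8-float Vectorized<float> chunks, then a scalar remainder loop; with 4096 elements the remainder range is empty, hence the degenerate for(long i0=4096; i0<4096; i0+=1). A plain-C++ sketch of that split, with a scalar inner loop standing in for the SIMD ops (hypothetical helper, not the ATen API):

    #include <cstdio>

    void add_one_then_two(const float* in, float* out, long n) {
        const long kWidth = 8;                           // lanes per vector register
        const long main_end = (n / kWidth) * kWidth;
        for (long i0 = 0; i0 < main_end; i0 += kWidth) {
            for (long lane = 0; lane < kWidth; lane += 1)  // one simulated vector op
                out[i0 + lane] = (in[i0 + lane] + 1.0f) + 2.0f;
        }
        for (long i0 = main_end; i0 < n; i0 += 1)        // scalar tail
            out[i0] = (in[i0] + 1.0f) + 2.0f;
    }

    int main() {
        float in[13], out[13];
        for (int i = 0; i < 13; i += 1) in[i] = static_cast<float>(i);
        add_one_then_two(in, out, 13);
        std::printf("%g %g\n", out[0], out[12]);         // 3 15
        return 0;
    }

When n is a multiple of the vector width, main_end equals n and the tail loop body never runs, matching the empty-range tail loops in the generated code.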
2023-01-11T21:38:05.8304155Z test_as_strided_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.8304281Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.8304564Z [2023-01-11 21:24:39,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 26
2023-01-11T21:38:05.8304825Z [2023-01-11 21:24:40,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 26
2023-01-11T21:38:05.8304831Z 
2023-01-11T21:38:05.8304929Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.8305009Z import torch
2023-01-11T21:38:05.8305086Z import random
2023-01-11T21:38:05.8305205Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.8305339Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.8305345Z 
2023-01-11T21:38:05.8305436Z aten = torch.ops.aten
2023-01-11T21:38:05.8305593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.8305692Z async_compile = AsyncCompile()
2023-01-11T21:38:05.8305698Z 
2023-01-11T21:38:05.8305773Z import triton
2023-01-11T21:38:05.8305867Z import triton.language as tl
2023-01-11T21:38:05.8305993Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.8306135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.8306140Z 
2023-01-11T21:38:05.8306145Z 
2023-01-11T21:38:05.8306279Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.8306478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.8306606Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.8306716Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.8306819Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.8306885Z {
2023-01-11T21:38:05.8306986Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.8307052Z     {
2023-01-11T21:38:05.8307128Z         #pragma omp for
2023-01-11T21:38:05.8307219Z         for(long i0=0; i0<1280; i0+=1)
2023-01-11T21:38:05.8307286Z         {
2023-01-11T21:38:05.8307425Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.8307568Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(8));
2023-01-11T21:38:05.8307656Z             auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8307797Z             auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:05.8307885Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8308007Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.8308076Z         }
2023-01-11T21:38:05.8308177Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.8308273Z         for(long i0=10240; i0<10240; i0+=1)
2023-01-11T21:38:05.8308340Z         {
2023-01-11T21:38:05.8308428Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.8308532Z             auto tmp1 = static_cast<float>(8);
2023-01-11T21:38:05.8308615Z             auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8308720Z             auto tmp3 = static_cast<float>(10);
2023-01-11T21:38:05.8308810Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:05.8308896Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:05.8308965Z         }
2023-01-11T21:38:05.8309047Z         #pragma omp for
2023-01-11T21:38:05.8309134Z         for(long i0=0; i0<5120; i0+=1)
2023-01-11T21:38:05.8309194Z         {
2023-01-11T21:38:05.8309262Z             {
2023-01-11T21:38:05.8309331Z                 {
2023-01-11T21:38:05.8309431Z                     auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:05.8309542Z                     auto tmp1 = static_cast<float>(2);
2023-01-11T21:38:05.8309637Z                     auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:05.8309739Z                     auto tmp3 = static_cast<float>(4);
2023-01-11T21:38:05.8309880Z                     auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.8309975Z                     out_ptr0[2*i0] = tmp4;
2023-01-11T21:38:05.8310045Z                 }
2023-01-11T21:38:05.8310111Z             }
2023-01-11T21:38:05.8310179Z         }
2023-01-11T21:38:05.8310248Z     }
2023-01-11T21:38:05.8310306Z }
2023-01-11T21:38:05.8310388Z ''')
2023-01-11T21:38:05.8310424Z 
2023-01-11T21:38:05.8310428Z 
2023-01-11T21:38:05.8310523Z async_compile.wait(globals())
2023-01-11T21:38:05.8310602Z del async_compile
2023-01-11T21:38:05.8310607Z 
2023-01-11T21:38:05.8310681Z def call(args):
2023-01-11T21:38:05.8310767Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.8310843Z     args.clear()
2023-01-11T21:38:05.8311053Z     buf0 = empty_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8311214Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.8311288Z     del arg0_1
2023-01-11T21:38:05.8311359Z     del arg1_1
2023-01-11T21:38:05.8311441Z     return (buf0, )
2023-01-11T21:38:05.8311446Z 
2023-01-11T21:38:05.8311451Z 
2023-01-11T21:38:05.8311532Z if __name__ == "__main__":
2023-01-11T21:38:05.8311651Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.8311777Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.8311976Z     arg0_1 = rand_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8312181Z     arg1_1 = rand_strided((10, 512), (512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.8312302Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.8312309Z 
2023-01-11T21:38:05.8312381Z ok (1.785s)
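The fused as_strided_scatter kernel above first writes the elementwise-transformed base tensor to the output, then overwrites a stride-2 view of the same buffer with values computed from the source tensor. A 1-D sketch of that pattern (hypothetical function; the 8/10 and 2/4 constants mirror the kernel above):

    #include <cstdio>

    void as_strided_scatter_1d(const float* base, const float* src,
                               float* out, long n_base, long n_src) {
        for (long i = 0; i < n_base; i += 1)
            out[i] = base[i] * 8.0f + 10.0f;    // elementwise op on the base tensor
        for (long i = 0; i < n_src; i += 1)
            out[2 * i] = src[i] * 2.0f - 4.0f;  // scatter into every other element
    }

    int main() {
        float base[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        float src[4] = {0, 1, 2, 3};
        float out[8];
        as_strided_scatter_1d(base, src, out, 8, 4);
        for (int i = 0; i < 8; i += 1) std::printf("%g ", out[i]);
        std::printf("\n");  // -4 18 -2 34 0 50 2 66
        return 0;
    }

The order of the two loops matters: the scatter must run after the base write, since both target the same buffer and the strided slots take the source's values.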
omp_out.index : omp_in.index)\ 2023-01-11T21:38:05.8291857Z initializer(omp_priv = {0, std::numeric_limits::infinity()}) 2023-01-11T21:38:05.8291958Z for(long i1=0; i1<144; i1+=1) 2023-01-11T21:38:05.8292029Z { 2023-01-11T21:38:05.8292101Z { 2023-01-11T21:38:05.8292211Z auto tmp0 = in_ptr0[i1 + (144*i0)]; 2023-01-11T21:38:05.8292315Z if (tmp1.value < tmp0) { 2023-01-11T21:38:05.8292425Z tmp1.index = i1; tmp1.value = tmp0; 2023-01-11T21:38:05.8292501Z } 2023-01-11T21:38:05.8292633Z if (tmp2.value > tmp0) { 2023-01-11T21:38:05.8292751Z tmp2.index = i1; tmp2.value = tmp0; 2023-01-11T21:38:05.8292824Z } 2023-01-11T21:38:05.8292899Z } 2023-01-11T21:38:05.8292971Z } 2023-01-11T21:38:05.8293068Z out_ptr2[i0] = tmp1.index; 2023-01-11T21:38:05.8293164Z out_ptr3[i0] = tmp2.index; 2023-01-11T21:38:05.8293235Z } 2023-01-11T21:38:05.8293303Z } 2023-01-11T21:38:05.8293369Z } 2023-01-11T21:38:05.8293436Z } 2023-01-11T21:38:05.8293494Z } 2023-01-11T21:38:05.8293579Z ''') 2023-01-11T21:38:05.8293585Z 2023-01-11T21:38:05.8293589Z 2023-01-11T21:38:05.8293684Z async_compile.wait(globals()) 2023-01-11T21:38:05.8293762Z del async_compile 2023-01-11T21:38:05.8293767Z 2023-01-11T21:38:05.8293843Z def call(args): 2023-01-11T21:38:05.8293917Z arg0_1, = args 2023-01-11T21:38:05.8293996Z args.clear() 2023-01-11T21:38:05.8294190Z buf0 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294374Z buf1 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294679Z buf2 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8294875Z buf3 = empty_strided((144, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8295095Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8295170Z del arg0_1 2023-01-11T21:38:05.8295261Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8295267Z 2023-01-11T21:38:05.8295271Z 2023-01-11T21:38:05.8295352Z if __name__ == "__main__": 2023-01-11T21:38:05.8295472Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8295618Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8295852Z arg0_1 = rand_strided((144, 144), (144, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8295966Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8295971Z 2023-01-11T21:38:05.8296044Z ok (1.715s) 2023-01-11T21:38:05.8296168Z test_argmax_argmin3_cpu (__main__.CpuTests) ... skip: 2023-01-11T21:38:05.8296376Z FIXME: In the case of having equally max/min elements, our implementation returns 2023-01-11T21:38:05.8296491Z the last index instead of the first one 2023-01-11T21:38:05.8296563Z (0.001s) 2023-01-11T21:38:05.8297014Z test_as_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8297201Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8297471Z [2023-01-11 21:24:37,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 25 2023-01-11T21:38:05.8297736Z [2023-01-11 21:24:39,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 25 2023-01-11T21:38:05.8297742Z 2023-01-11T21:38:05.8297837Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8297911Z import torch 2023-01-11T21:38:05.8297985Z import random 2023-01-11T21:38:05.8298103Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8298232Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8298237Z 2023-01-11T21:38:05.8298313Z aten = torch.ops.aten 2023-01-11T21:38:05.8298452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8298549Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8298555Z 2023-01-11T21:38:05.8298629Z import triton 2023-01-11T21:38:05.8298762Z import triton.language as tl 2023-01-11T21:38:05.8298885Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8299024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8299030Z 2023-01-11T21:38:05.8299034Z 2023-01-11T21:38:05.8299177Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8299378Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8299503Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8299614Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.8299681Z { 2023-01-11T21:38:05.8299783Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8299847Z { 2023-01-11T21:38:05.8299929Z #pragma omp for 2023-01-11T21:38:05.8300011Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:05.8300080Z { 2023-01-11T21:38:05.8300220Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8300361Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8300451Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8300584Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8300677Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8300780Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8300842Z } 2023-01-11T21:38:05.8300942Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8301035Z for(long i0=4096; i0<4096; i0+=1) 2023-01-11T21:38:05.8301104Z { 2023-01-11T21:38:05.8301194Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8301297Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8301380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8301483Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8301571Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8301662Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8301729Z } 2023-01-11T21:38:05.8301795Z } 2023-01-11T21:38:05.8301862Z } 2023-01-11T21:38:05.8301941Z ''') 2023-01-11T21:38:05.8301946Z 2023-01-11T21:38:05.8301957Z 2023-01-11T21:38:05.8302044Z async_compile.wait(globals()) 2023-01-11T21:38:05.8302153Z del async_compile 2023-01-11T21:38:05.8302160Z 2023-01-11T21:38:05.8302238Z def call(args): 2023-01-11T21:38:05.8302313Z arg0_1, = args 2023-01-11T21:38:05.8302388Z args.clear() 2023-01-11T21:38:05.8302591Z buf0 = empty_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8302718Z buf1 = as_strided(buf0, (8, 8, 
64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:05.8302849Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8302964Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:05.8302969Z 2023-01-11T21:38:05.8302974Z 2023-01-11T21:38:05.8303057Z if __name__ == "__main__": 2023-01-11T21:38:05.8303174Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8303301Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8303501Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8303616Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8303621Z 2023-01-11T21:38:05.8303695Z ok (1.719s) 2023-01-11T21:38:05.8304155Z test_as_strided_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8304281Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8304564Z [2023-01-11 21:24:39,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 26 2023-01-11T21:38:05.8304825Z [2023-01-11 21:24:40,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 26 2023-01-11T21:38:05.8304831Z 2023-01-11T21:38:05.8304929Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8305009Z import torch 2023-01-11T21:38:05.8305086Z import random 2023-01-11T21:38:05.8305205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8305339Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8305345Z 2023-01-11T21:38:05.8305436Z aten = torch.ops.aten 2023-01-11T21:38:05.8305593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8305692Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8305698Z 2023-01-11T21:38:05.8305773Z import triton 2023-01-11T21:38:05.8305867Z import triton.language as tl 2023-01-11T21:38:05.8305993Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8306135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8306140Z 2023-01-11T21:38:05.8306145Z 2023-01-11T21:38:05.8306279Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8306478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8306606Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8306716Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8306819Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8306885Z { 2023-01-11T21:38:05.8306986Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8307052Z { 2023-01-11T21:38:05.8307128Z #pragma omp for 2023-01-11T21:38:05.8307219Z for(long i0=0; i0<1280; i0+=1) 2023-01-11T21:38:05.8307286Z { 2023-01-11T21:38:05.8307425Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8307568Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:05.8307656Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8307797Z auto tmp3 = at::vec::Vectorized(static_cast(10)); 2023-01-11T21:38:05.8307885Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8308007Z 
tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8308076Z } 2023-01-11T21:38:05.8308177Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8308273Z for(long i0=10240; i0<10240; i0+=1) 2023-01-11T21:38:05.8308340Z { 2023-01-11T21:38:05.8308428Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8308532Z auto tmp1 = static_cast(8); 2023-01-11T21:38:05.8308615Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8308720Z auto tmp3 = static_cast(10); 2023-01-11T21:38:05.8308810Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8308896Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8308965Z } 2023-01-11T21:38:05.8309047Z #pragma omp for 2023-01-11T21:38:05.8309134Z for(long i0=0; i0<5120; i0+=1) 2023-01-11T21:38:05.8309194Z { 2023-01-11T21:38:05.8309262Z { 2023-01-11T21:38:05.8309331Z { 2023-01-11T21:38:05.8309431Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8309542Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8309637Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8309739Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8309880Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.8309975Z out_ptr0[2*i0] = tmp4; 2023-01-11T21:38:05.8310045Z } 2023-01-11T21:38:05.8310111Z } 2023-01-11T21:38:05.8310179Z } 2023-01-11T21:38:05.8310248Z } 2023-01-11T21:38:05.8310306Z } 2023-01-11T21:38:05.8310388Z ''') 2023-01-11T21:38:05.8310424Z 2023-01-11T21:38:05.8310428Z 2023-01-11T21:38:05.8310523Z async_compile.wait(globals()) 2023-01-11T21:38:05.8310602Z del async_compile 2023-01-11T21:38:05.8310607Z 2023-01-11T21:38:05.8310681Z def call(args): 2023-01-11T21:38:05.8310767Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8310843Z args.clear() 2023-01-11T21:38:05.8311053Z buf0 = empty_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8311214Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8311288Z del arg0_1 2023-01-11T21:38:05.8311359Z del arg1_1 2023-01-11T21:38:05.8311441Z return (buf0, ) 2023-01-11T21:38:05.8311446Z 2023-01-11T21:38:05.8311451Z 2023-01-11T21:38:05.8311532Z if __name__ == "__main__": 2023-01-11T21:38:05.8311651Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8311777Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8311976Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8312181Z arg1_1 = rand_strided((10, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8312302Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8312309Z 2023-01-11T21:38:05.8312381Z ok (1.785s) 2023-01-11T21:38:05.8312836Z test_avg_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8312971Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8313227Z [2023-01-11 21:24:40,896] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 27 2023-01-11T21:38:05.8313493Z [2023-01-11 21:24:42,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 27 2023-01-11T21:38:05.8313499Z 2023-01-11T21:38:05.8313597Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8313674Z import torch 2023-01-11T21:38:05.8313743Z import random 2023-01-11T21:38:05.8313892Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8314017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8314022Z 2023-01-11T21:38:05.8314104Z aten = torch.ops.aten 2023-01-11T21:38:05.8314244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8314341Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8314347Z 2023-01-11T21:38:05.8314420Z import triton 2023-01-11T21:38:05.8314507Z import triton.language as tl 2023-01-11T21:38:05.8314634Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8314773Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8314782Z 2023-01-11T21:38:05.8314786Z 2023-01-11T21:38:05.8314924Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8315173Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8315311Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8315419Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8315484Z { 2023-01-11T21:38:05.8315579Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8315646Z { 2023-01-11T21:38:05.8315728Z #pragma omp for 2023-01-11T21:38:05.8315816Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8315883Z { 2023-01-11T21:38:05.8315970Z #pragma GCC ivdep 2023-01-11T21:38:05.8316058Z for(long i1=0; i1<7; i1+=1) 2023-01-11T21:38:05.8316120Z { 2023-01-11T21:38:05.8316206Z #pragma GCC ivdep 2023-01-11T21:38:05.8316300Z for(long i2=0; i2<7; i2+=1) 2023-01-11T21:38:05.8316400Z { 2023-01-11T21:38:05.8316473Z { 2023-01-11T21:38:05.8316547Z { 2023-01-11T21:38:05.8316665Z auto tmp0 = in_ptr0[(2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8316778Z auto tmp1 = in_ptr0[1 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8316902Z auto tmp3 = in_ptr0[2 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317021Z auto tmp5 = in_ptr0[16 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317134Z auto tmp7 = in_ptr0[17 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317248Z auto tmp9 = in_ptr0[18 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317367Z auto tmp11 = in_ptr0[32 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317481Z auto tmp13 = in_ptr0[33 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317598Z auto tmp15 = in_ptr0[34 + (2*i2) + (32*i1) + (256*i0)]; 2023-01-11T21:38:05.8317693Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8317789Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8317889Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8317987Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8318087Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8318187Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8318291Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8318387Z auto tmp16 = tmp15 + tmp14; 
2023-01-11T21:38:05.8318512Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8318612Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8318722Z out_ptr0[i2 + (7*i1) + (49*i0)] = tmp18; 2023-01-11T21:38:05.8318797Z } 2023-01-11T21:38:05.8318868Z } 2023-01-11T21:38:05.8318936Z } 2023-01-11T21:38:05.8318998Z } 2023-01-11T21:38:05.8319063Z } 2023-01-11T21:38:05.8319129Z } 2023-01-11T21:38:05.8319191Z } 2023-01-11T21:38:05.8319317Z ''') 2023-01-11T21:38:05.8319323Z 2023-01-11T21:38:05.8319328Z 2023-01-11T21:38:05.8319421Z async_compile.wait(globals()) 2023-01-11T21:38:05.8319500Z del async_compile 2023-01-11T21:38:05.8319505Z 2023-01-11T21:38:05.8319579Z def call(args): 2023-01-11T21:38:05.8319647Z arg0_1, = args 2023-01-11T21:38:05.8319722Z args.clear() 2023-01-11T21:38:05.8319935Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8320074Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8320146Z del arg0_1 2023-01-11T21:38:05.8320221Z return (buf0, ) 2023-01-11T21:38:05.8320229Z 2023-01-11T21:38:05.8320233Z 2023-01-11T21:38:05.8320313Z if __name__ == "__main__": 2023-01-11T21:38:05.8320424Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8320548Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8320770Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8320883Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8320889Z 2023-01-11T21:38:05.8320958Z ok (1.777s) 2023-01-11T21:38:05.8321408Z test_avg_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8321541Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8321825Z [2023-01-11 21:24:42,694] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 28 2023-01-11T21:38:05.8322087Z [2023-01-11 21:24:44,436] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 28 2023-01-11T21:38:05.8322096Z 2023-01-11T21:38:05.8322196Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8322266Z import torch 2023-01-11T21:38:05.8322340Z import random 2023-01-11T21:38:05.8322458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8322582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8322587Z 2023-01-11T21:38:05.8322669Z aten = torch.ops.aten 2023-01-11T21:38:05.8322806Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8322902Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8322907Z 2023-01-11T21:38:05.8322975Z import triton 2023-01-11T21:38:05.8323072Z import triton.language as tl 2023-01-11T21:38:05.8323200Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8323339Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8323345Z 2023-01-11T21:38:05.8323349Z 2023-01-11T21:38:05.8323482Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8323691Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8323815Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8323918Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8323978Z { 2023-01-11T21:38:05.8324078Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8324143Z { 2023-01-11T21:38:05.8324222Z #pragma omp for 2023-01-11T21:38:05.8324309Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.8324377Z { 2023-01-11T21:38:05.8324462Z #pragma GCC ivdep 2023-01-11T21:38:05.8324546Z for(long i1=0; i1<27; i1+=1) 2023-01-11T21:38:05.8324616Z { 2023-01-11T21:38:05.8324701Z #pragma GCC ivdep 2023-01-11T21:38:05.8324797Z for(long i2=0; i2<27; i2+=1) 2023-01-11T21:38:05.8324864Z { 2023-01-11T21:38:05.8324933Z { 2023-01-11T21:38:05.8325000Z { 2023-01-11T21:38:05.8325147Z auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325268Z auto tmp1 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325384Z auto tmp3 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325502Z auto tmp5 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325618Z auto tmp7 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325734Z auto tmp9 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325857Z auto tmp11 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8325978Z auto tmp13 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8326093Z auto tmp15 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)]; 2023-01-11T21:38:05.8326197Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8326294Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8326392Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8326489Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8326591Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8326693Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8326791Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8326894Z auto tmp16 = tmp15 + 
tmp14; 2023-01-11T21:38:05.8327050Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8327151Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8327264Z out_ptr0[i2 + (27*i1) + (729*i0)] = tmp18; 2023-01-11T21:38:05.8327339Z } 2023-01-11T21:38:05.8327411Z } 2023-01-11T21:38:05.8327474Z } 2023-01-11T21:38:05.8327542Z } 2023-01-11T21:38:05.8327611Z } 2023-01-11T21:38:05.8327677Z } 2023-01-11T21:38:05.8327741Z } 2023-01-11T21:38:05.8327826Z ''') 2023-01-11T21:38:05.8327831Z 2023-01-11T21:38:05.8327836Z 2023-01-11T21:38:05.8327931Z async_compile.wait(globals()) 2023-01-11T21:38:05.8328002Z del async_compile 2023-01-11T21:38:05.8328014Z 2023-01-11T21:38:05.8328084Z def call(args): 2023-01-11T21:38:05.8328156Z arg0_1, = args 2023-01-11T21:38:05.8328229Z args.clear() 2023-01-11T21:38:05.8328452Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8328593Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8328664Z del arg0_1 2023-01-11T21:38:05.8328737Z return (buf0, ) 2023-01-11T21:38:05.8328742Z 2023-01-11T21:38:05.8328746Z 2023-01-11T21:38:05.8328823Z if __name__ == "__main__": 2023-01-11T21:38:05.8328939Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8329065Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8329291Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8329405Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8329412Z 2023-01-11T21:38:05.8329484Z ok (1.826s) 2023-01-11T21:38:05.8329935Z test_avg_pool2d3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8330069Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8330353Z [2023-01-11 21:24:44,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 29 2023-01-11T21:38:05.8330611Z [2023-01-11 21:24:46,312] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 29 2023-01-11T21:38:05.8330622Z 2023-01-11T21:38:05.8330715Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8330790Z import torch 2023-01-11T21:38:05.8330863Z import random 2023-01-11T21:38:05.8330983Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8331106Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8331111Z 2023-01-11T21:38:05.8331194Z aten = torch.ops.aten 2023-01-11T21:38:05.8331334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8331424Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8331429Z 2023-01-11T21:38:05.8331503Z import triton 2023-01-11T21:38:05.8331596Z import triton.language as tl 2023-01-11T21:38:05.8331725Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8331863Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8331869Z 2023-01-11T21:38:05.8331873Z 2023-01-11T21:38:05.8332010Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8332216Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8332338Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8332435Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8332501Z { 2023-01-11T21:38:05.8332604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8332699Z { 2023-01-11T21:38:05.8332781Z #pragma omp for 2023-01-11T21:38:05.8332868Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8332936Z { 2023-01-11T21:38:05.8333016Z #pragma GCC ivdep 2023-01-11T21:38:05.8333105Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8333173Z { 2023-01-11T21:38:05.8333245Z { 2023-01-11T21:38:05.8333316Z { 2023-01-11T21:38:05.8333495Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8333606Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8333701Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8333808Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8333908Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8334005Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8334177Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8334279Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8334379Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8334469Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8334770Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8334869Z float tmp11 = 0.0; 2023-01-11T21:38:05.8334954Z if(tmp10) 2023-01-11T21:38:05.8335026Z { 2023-01-11T21:38:05.8335212Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8335302Z tmp11 = tmp12; 2023-01-11T21:38:05.8335370Z } 2023-01-11T21:38:05.8335484Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8335587Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8335686Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8335789Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8335887Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8335978Z float tmp18 = 0.0; 2023-01-11T21:38:05.8336053Z if(tmp17) 2023-01-11T21:38:05.8336126Z { 2023-01-11T21:38:05.8336532Z auto tmp19 
= in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8336627Z tmp18 = tmp19; 2023-01-11T21:38:05.8336704Z } 2023-01-11T21:38:05.8336807Z auto tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8336924Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8337023Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8337115Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8337274Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8337383Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8337488Z float tmp26 = 0.0; 2023-01-11T21:38:05.8337580Z if(tmp25) 2023-01-11T21:38:05.8337671Z { 2023-01-11T21:38:05.8337849Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8337933Z tmp26 = tmp27; 2023-01-11T21:38:05.8338004Z } 2023-01-11T21:38:05.8338102Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8338215Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8338313Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8338408Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8338504Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8338593Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8338682Z float tmp34 = 0.0; 2023-01-11T21:38:05.8338803Z if(tmp33) 2023-01-11T21:38:05.8338874Z { 2023-01-11T21:38:05.8339050Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8339138Z tmp34 = tmp35; 2023-01-11T21:38:05.8339209Z } 2023-01-11T21:38:05.8339304Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8339398Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8339488Z float tmp38 = 0.0; 2023-01-11T21:38:05.8339567Z if(tmp37) 2023-01-11T21:38:05.8339638Z { 2023-01-11T21:38:05.8339749Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8339839Z tmp38 = tmp39; 2023-01-11T21:38:05.8339905Z } 2023-01-11T21:38:05.8340002Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8340103Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8340191Z float tmp42 = 0.0; 2023-01-11T21:38:05.8340271Z if(tmp41) 2023-01-11T21:38:05.8340343Z { 2023-01-11T21:38:05.8340456Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8340540Z tmp42 = tmp43; 2023-01-11T21:38:05.8340612Z } 2023-01-11T21:38:05.8340710Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8340825Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8340921Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8341020Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8341116Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8341211Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8341294Z float tmp50 = 0.0; 2023-01-11T21:38:05.8341377Z if(tmp49) 2023-01-11T21:38:05.8341447Z { 2023-01-11T21:38:05.8341560Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8341647Z tmp50 = tmp51; 2023-01-11T21:38:05.8341754Z } 2023-01-11T21:38:05.8341854Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8341943Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8342032Z float tmp54 = 0.0; 2023-01-11T21:38:05.8342114Z if(tmp53) 2023-01-11T21:38:05.8342184Z { 2023-01-11T21:38:05.8342296Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8342385Z tmp54 = tmp55; 2023-01-11T21:38:05.8342456Z } 2023-01-11T21:38:05.8342546Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8342643Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8342733Z float tmp58 = 0.0; 2023-01-11T21:38:05.8342814Z if(tmp57) 2023-01-11T21:38:05.8342886Z { 2023-01-11T21:38:05.8343000Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8343088Z tmp58 = tmp59; 2023-01-11T21:38:05.8343154Z } 2023-01-11T21:38:05.8343248Z auto tmp60 = tmp58 + tmp56; 
2023-01-11T21:38:05.8343372Z auto tmp61 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8343469Z auto tmp62 = tmp60 * tmp61; 2023-01-11T21:38:05.8343567Z out_ptr0[i1 + (4*i0)] = tmp62; 2023-01-11T21:38:05.8343637Z } 2023-01-11T21:38:05.8343705Z } 2023-01-11T21:38:05.8343765Z } 2023-01-11T21:38:05.8343864Z } 2023-01-11T21:38:05.8343928Z } 2023-01-11T21:38:05.8343994Z } 2023-01-11T21:38:05.8344081Z ''') 2023-01-11T21:38:05.8344087Z 2023-01-11T21:38:05.8344091Z 2023-01-11T21:38:05.8344184Z async_compile.wait(globals()) 2023-01-11T21:38:05.8344260Z del async_compile 2023-01-11T21:38:05.8344265Z 2023-01-11T21:38:05.8344333Z def call(args): 2023-01-11T21:38:05.8344409Z arg0_1, = args 2023-01-11T21:38:05.8344483Z args.clear() 2023-01-11T21:38:05.8344696Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8344835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8344904Z del arg0_1 2023-01-11T21:38:05.8344978Z return (buf0, ) 2023-01-11T21:38:05.8344983Z 2023-01-11T21:38:05.8344988Z 2023-01-11T21:38:05.8345067Z if __name__ == "__main__": 2023-01-11T21:38:05.8345178Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8345303Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8345519Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8345631Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8345636Z 2023-01-11T21:38:05.8345706Z ok (1.832s) 2023-01-11T21:38:05.8346169Z test_avg_pool2d4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8346300Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8346556Z [2023-01-11 21:24:46,332] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 30 2023-01-11T21:38:05.8346821Z [2023-01-11 21:24:48,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 30 2023-01-11T21:38:05.8346829Z 2023-01-11T21:38:05.8346920Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8346997Z import torch 2023-01-11T21:38:05.8347072Z import random 2023-01-11T21:38:05.8347189Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8347338Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8347343Z 2023-01-11T21:38:05.8347425Z aten = torch.ops.aten 2023-01-11T21:38:05.8347561Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8347657Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8347663Z 2023-01-11T21:38:05.8347730Z import triton 2023-01-11T21:38:05.8347820Z import triton.language as tl 2023-01-11T21:38:05.8347943Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8348085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8348091Z 2023-01-11T21:38:05.8348098Z 2023-01-11T21:38:05.8348236Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8348444Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8348567Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8348671Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8348731Z { 2023-01-11T21:38:05.8348832Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8348896Z { 2023-01-11T21:38:05.8348977Z #pragma omp for 2023-01-11T21:38:05.8349063Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8349127Z { 2023-01-11T21:38:05.8349205Z #pragma GCC ivdep 2023-01-11T21:38:05.8349295Z for(long i1=0; i1<55; i1+=1) 2023-01-11T21:38:05.8349365Z { 2023-01-11T21:38:05.8349449Z #pragma GCC ivdep 2023-01-11T21:38:05.8349543Z for(long i2=0; i2<55; i2+=1) 2023-01-11T21:38:05.8349612Z { 2023-01-11T21:38:05.8349711Z { 2023-01-11T21:38:05.8349777Z { 2023-01-11T21:38:05.8349897Z auto tmp0 = in_ptr0[(2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350021Z auto tmp1 = in_ptr0[1 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350140Z auto tmp3 = in_ptr0[2 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350261Z auto tmp5 = in_ptr0[111 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350380Z auto tmp7 = in_ptr0[112 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350500Z auto tmp9 = in_ptr0[113 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350622Z auto tmp11 = in_ptr0[222 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350735Z auto tmp13 = in_ptr0[223 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350856Z auto tmp15 = in_ptr0[224 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.8350961Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:05.8351060Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:05.8351155Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:05.8351256Z auto tmp8 = tmp7 + tmp6; 2023-01-11T21:38:05.8351357Z auto tmp10 = tmp9 + tmp8; 2023-01-11T21:38:05.8351460Z auto tmp12 = tmp11 + tmp10; 2023-01-11T21:38:05.8351555Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:05.8351656Z auto tmp16 
= tmp15 + tmp14; 2023-01-11T21:38:05.8351781Z auto tmp17 = static_cast<float>(0.1111111111111111); 2023-01-11T21:38:05.8351880Z auto tmp18 = tmp16 * tmp17; 2023-01-11T21:38:05.8351990Z out_ptr0[i2 + (55*i1) + (3025*i0)] = tmp18; 2023-01-11T21:38:05.8352065Z } 2023-01-11T21:38:05.8352136Z } 2023-01-11T21:38:05.8352198Z } 2023-01-11T21:38:05.8352263Z } 2023-01-11T21:38:05.8352329Z } 2023-01-11T21:38:05.8352395Z } 2023-01-11T21:38:05.8352456Z } 2023-01-11T21:38:05.8352586Z ''') 2023-01-11T21:38:05.8352592Z 2023-01-11T21:38:05.8352597Z 2023-01-11T21:38:05.8352690Z async_compile.wait(globals()) 2023-01-11T21:38:05.8352760Z del async_compile 2023-01-11T21:38:05.8352765Z 2023-01-11T21:38:05.8352842Z def call(args): 2023-01-11T21:38:05.8352914Z arg0_1, = args 2023-01-11T21:38:05.8352987Z args.clear() 2023-01-11T21:38:05.8353212Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8353350Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8353422Z del arg0_1 2023-01-11T21:38:05.8353490Z return (buf0, ) 2023-01-11T21:38:05.8353497Z 2023-01-11T21:38:05.8353509Z 2023-01-11T21:38:05.8353583Z if __name__ == "__main__": 2023-01-11T21:38:05.8353699Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8353823Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8354051Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8354163Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8354168Z 2023-01-11T21:38:05.8354239Z ok (1.794s) 2023-01-11T21:38:05.8354689Z test_avg_pool2d5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8354848Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8355106Z [2023-01-11 21:24:48,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 31 2023-01-11T21:38:05.8355111Z 2023-01-11T21:38:05.8355203Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8355274Z import torch 2023-01-11T21:38:05.8355350Z import random 2023-01-11T21:38:05.8355467Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8355589Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8355595Z 2023-01-11T21:38:05.8355675Z aten = torch.ops.aten 2023-01-11T21:38:05.8355808Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8355897Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8355907Z 2023-01-11T21:38:05.8355973Z import triton 2023-01-11T21:38:05.8356064Z import triton.language as tl 2023-01-11T21:38:05.8356188Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8356330Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8356335Z 2023-01-11T21:38:05.8356340Z 2023-01-11T21:38:05.8356475Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8356681Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8356806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8356903Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8356965Z { 2023-01-11T21:38:05.8357065Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8357130Z { 2023-01-11T21:38:05.8357210Z #pragma omp for 2023-01-11T21:38:05.8357293Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8357360Z { 2023-01-11T21:38:05.8357439Z #pragma GCC ivdep 2023-01-11T21:38:05.8357527Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8357593Z { 2023-01-11T21:38:05.8357659Z { 2023-01-11T21:38:05.8357733Z { 2023-01-11T21:38:05.8357907Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8358015Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8358108Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8358257Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8358356Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8358455Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8358626Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8358725Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8358822Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8358911Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8359008Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8359096Z float tmp11 = 0.0; 2023-01-11T21:38:05.8359180Z if(tmp10) 2023-01-11T21:38:05.8359254Z { 2023-01-11T21:38:05.8359431Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8359518Z tmp11 = tmp12; 2023-01-11T21:38:05.8359584Z } 2023-01-11T21:38:05.8359703Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8359805Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8359903Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8359998Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8360093Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8360185Z float tmp18 = 0.0; 2023-01-11T21:38:05.8360269Z if(tmp17) 2023-01-11T21:38:05.8360335Z { 2023-01-11T21:38:05.8360508Z auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8360625Z tmp18 = tmp19; 2023-01-11T21:38:05.8360699Z } 2023-01-11T21:38:05.8360796Z auto 
tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8360911Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8361011Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8361102Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8361196Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8361292Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8361382Z float tmp26 = 0.0; 2023-01-11T21:38:05.8361463Z if(tmp25) 2023-01-11T21:38:05.8361536Z { 2023-01-11T21:38:05.8361708Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8361789Z tmp26 = tmp27; 2023-01-11T21:38:05.8361865Z } 2023-01-11T21:38:05.8361963Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8362076Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8362175Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8362276Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8362371Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8362460Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8362550Z float tmp34 = 0.0; 2023-01-11T21:38:05.8362630Z if(tmp33) 2023-01-11T21:38:05.8362701Z { 2023-01-11T21:38:05.8362873Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8362961Z tmp34 = tmp35; 2023-01-11T21:38:05.8363031Z } 2023-01-11T21:38:05.8363122Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8363220Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8363309Z float tmp38 = 0.0; 2023-01-11T21:38:05.8363387Z if(tmp37) 2023-01-11T21:38:05.8363459Z { 2023-01-11T21:38:05.8363597Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8363685Z tmp38 = tmp39; 2023-01-11T21:38:05.8363750Z } 2023-01-11T21:38:05.8363846Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8363944Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8364032Z float tmp42 = 0.0; 2023-01-11T21:38:05.8364112Z if(tmp41) 2023-01-11T21:38:05.8364187Z { 2023-01-11T21:38:05.8364300Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8364387Z tmp42 = tmp43; 2023-01-11T21:38:05.8364455Z } 2023-01-11T21:38:05.8364552Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8364666Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8364763Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8364865Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8364962Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8365057Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8365140Z float tmp50 = 0.0; 2023-01-11T21:38:05.8365218Z if(tmp49) 2023-01-11T21:38:05.8365290Z { 2023-01-11T21:38:05.8365403Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8365489Z tmp50 = tmp51; 2023-01-11T21:38:05.8365562Z } 2023-01-11T21:38:05.8365693Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8365782Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8365873Z float tmp54 = 0.0; 2023-01-11T21:38:05.8365952Z if(tmp53) 2023-01-11T21:38:05.8366023Z { 2023-01-11T21:38:05.8366137Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8366225Z tmp54 = tmp55; 2023-01-11T21:38:05.8366294Z } 2023-01-11T21:38:05.8366384Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8366479Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8366568Z float tmp58 = 0.0; 2023-01-11T21:38:05.8366646Z if(tmp57) 2023-01-11T21:38:05.8366718Z { 2023-01-11T21:38:05.8366831Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8366921Z tmp58 = tmp59; 2023-01-11T21:38:05.8366987Z } 2023-01-11T21:38:05.8367083Z auto tmp60 = tmp58 + tmp56; 2023-01-11T21:38:05.8367172Z float tmp61 = 0.0; 2023-01-11T21:38:05.8367251Z if(tmp10) 2023-01-11T21:38:05.8367325Z { 2023-01-11T21:38:05.8367439Z auto tmp62 = 
static_cast(1); 2023-01-11T21:38:05.8367528Z tmp61 = tmp62; 2023-01-11T21:38:05.8367592Z } 2023-01-11T21:38:05.8367681Z float tmp63 = 0.0; 2023-01-11T21:38:05.8367760Z if(tmp17) 2023-01-11T21:38:05.8367830Z { 2023-01-11T21:38:05.8367942Z auto tmp64 = static_cast(1); 2023-01-11T21:38:05.8368028Z tmp63 = tmp64; 2023-01-11T21:38:05.8368099Z } 2023-01-11T21:38:05.8368191Z auto tmp65 = tmp63 + tmp61; 2023-01-11T21:38:05.8368281Z float tmp66 = 0.0; 2023-01-11T21:38:05.8368361Z if(tmp25) 2023-01-11T21:38:05.8368432Z { 2023-01-11T21:38:05.8368543Z auto tmp67 = static_cast(1); 2023-01-11T21:38:05.8368667Z tmp66 = tmp67; 2023-01-11T21:38:05.8368740Z } 2023-01-11T21:38:05.8368831Z auto tmp68 = tmp66 + tmp65; 2023-01-11T21:38:05.8368919Z float tmp69 = 0.0; 2023-01-11T21:38:05.8368998Z if(tmp33) 2023-01-11T21:38:05.8369069Z { 2023-01-11T21:38:05.8369179Z auto tmp70 = static_cast(1); 2023-01-11T21:38:05.8369264Z tmp69 = tmp70; 2023-01-11T21:38:05.8369336Z } 2023-01-11T21:38:05.8369427Z auto tmp71 = tmp69 + tmp68; 2023-01-11T21:38:05.8369519Z float tmp72 = 0.0; 2023-01-11T21:38:05.8369600Z if(tmp37) 2023-01-11T21:38:05.8369674Z { 2023-01-11T21:38:05.8369782Z auto tmp73 = static_cast(1); 2023-01-11T21:38:05.8369870Z tmp72 = tmp73; 2023-01-11T21:38:05.8369945Z } 2023-01-11T21:38:05.8370036Z auto tmp74 = tmp72 + tmp71; 2023-01-11T21:38:05.8370125Z float tmp75 = 0.0; 2023-01-11T21:38:05.8370206Z if(tmp41) 2023-01-11T21:38:05.8370279Z { 2023-01-11T21:38:05.8370387Z auto tmp76 = static_cast(1); 2023-01-11T21:38:05.8370473Z tmp75 = tmp76; 2023-01-11T21:38:05.8370544Z } 2023-01-11T21:38:05.8370635Z auto tmp77 = tmp75 + tmp74; 2023-01-11T21:38:05.8370750Z float tmp78 = 0.0; 2023-01-11T21:38:05.8370829Z if(tmp49) 2023-01-11T21:38:05.8370901Z { 2023-01-11T21:38:05.8371010Z auto tmp79 = static_cast(1); 2023-01-11T21:38:05.8371097Z tmp78 = tmp79; 2023-01-11T21:38:05.8371176Z } 2023-01-11T21:38:05.8371267Z auto tmp80 = tmp78 + tmp77; 2023-01-11T21:38:05.8371357Z float tmp81 = 0.0; 2023-01-11T21:38:05.8371437Z if(tmp53) 2023-01-11T21:38:05.8371509Z { 2023-01-11T21:38:05.8371619Z auto tmp82 = static_cast(1); 2023-01-11T21:38:05.8371706Z tmp81 = tmp82; 2023-01-11T21:38:05.8371781Z } 2023-01-11T21:38:05.8371878Z auto tmp83 = tmp81 + tmp80; 2023-01-11T21:38:05.8371961Z float tmp84 = 0.0; 2023-01-11T21:38:05.8372043Z if(tmp57) 2023-01-11T21:38:05.8372118Z { 2023-01-11T21:38:05.8372227Z auto tmp85 = static_cast(1); 2023-01-11T21:38:05.8372313Z tmp84 = tmp85; 2023-01-11T21:38:05.8372384Z } 2023-01-11T21:38:05.8372484Z auto tmp86 = tmp84 + tmp83; 2023-01-11T21:38:05.8372575Z auto tmp87 = tmp60 / tmp86; 2023-01-11T21:38:05.8372675Z out_ptr0[i1 + (4*i0)] = tmp87; 2023-01-11T21:38:05.8372746Z } 2023-01-11T21:38:05.8372813Z } 2023-01-11T21:38:05.8372882Z } 2023-01-11T21:38:05.8372949Z } 2023-01-11T21:38:05.8373009Z } 2023-01-11T21:38:05.8373072Z } 2023-01-11T21:38:05.8373159Z ''') 2023-01-11T21:38:05.8373165Z 2023-01-11T21:38:05.8373169Z 2023-01-11T21:38:05.8373264Z async_compile.wait(globals()) 2023-01-11T21:38:05.8373342Z del async_compile 2023-01-11T21:38:05.8373347Z 2023-01-11T21:38:05.8373421Z def call(args): 2023-01-11T21:38:05.8373492Z arg0_1, = args 2023-01-11T21:38:05.8373567Z args.clear() 2023-01-11T21:38:05.8373773Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8373991Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8374067Z del arg0_1 2023-01-11T21:38:05.8374143Z return (buf0, ) 2023-01-11T21:38:05.8374148Z 
2023-01-11T21:38:05.8374152Z 2023-01-11T21:38:05.8374232Z if __name__ == "__main__": 2023-01-11T21:38:05.8374348Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8374587Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8374795Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8374906Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8375169Z [2023-01-11 21:24:49,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 31 2023-01-11T21:38:05.8375179Z 2023-01-11T21:38:05.8375249Z ok (1.822s) 2023-01-11T21:38:05.8375708Z test_avg_pool2d6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8375840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8376093Z [2023-01-11 21:24:49,948] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 32 2023-01-11T21:38:05.8376352Z [2023-01-11 21:24:51,727] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 32 2023-01-11T21:38:05.8376358Z 2023-01-11T21:38:05.8376502Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8376576Z import torch 2023-01-11T21:38:05.8376644Z import random 2023-01-11T21:38:05.8376760Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8376882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8376887Z 2023-01-11T21:38:05.8376970Z aten = torch.ops.aten 2023-01-11T21:38:05.8377105Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8377256Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8377262Z 2023-01-11T21:38:05.8377338Z import triton 2023-01-11T21:38:05.8377423Z import triton.language as tl 2023-01-11T21:38:05.8377549Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8377689Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8377695Z 2023-01-11T21:38:05.8377700Z 2023-01-11T21:38:05.8377838Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8378042Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8378169Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8378273Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8378336Z { 2023-01-11T21:38:05.8378431Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8378499Z { 2023-01-11T21:38:05.8378580Z #pragma omp for 2023-01-11T21:38:05.8378669Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8378734Z { 2023-01-11T21:38:05.8378819Z #pragma GCC ivdep 2023-01-11T21:38:05.8378906Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8378968Z { 2023-01-11T21:38:05.8379040Z { 2023-01-11T21:38:05.8379112Z { 2023-01-11T21:38:05.8379285Z auto tmp0 = static_cast((-1) + (2*i0)); 2023-01-11T21:38:05.8379395Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8379495Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8379604Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8379697Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8379795Z auto tmp5 = tmp2 & tmp4; 
2023-01-11T21:38:05.8380015Z auto tmp6 = static_cast((-1) + (2*i1)); 2023-01-11T21:38:05.8380116Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:05.8380214Z auto tmp8 = tmp6 < tmp3; 2023-01-11T21:38:05.8380310Z auto tmp9 = tmp7 & tmp8; 2023-01-11T21:38:05.8380411Z auto tmp10 = tmp5 & tmp9; 2023-01-11T21:38:05.8380495Z float tmp11 = 0.0; 2023-01-11T21:38:05.8380576Z if(tmp10) 2023-01-11T21:38:05.8380649Z { 2023-01-11T21:38:05.8380824Z auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8380917Z tmp11 = tmp12; 2023-01-11T21:38:05.8380992Z } 2023-01-11T21:38:05.8381107Z auto tmp13 = static_cast(2*i1); 2023-01-11T21:38:05.8381207Z auto tmp14 = tmp13 >= tmp1; 2023-01-11T21:38:05.8381299Z auto tmp15 = tmp13 < tmp3; 2023-01-11T21:38:05.8381398Z auto tmp16 = tmp14 & tmp15; 2023-01-11T21:38:05.8381498Z auto tmp17 = tmp5 & tmp16; 2023-01-11T21:38:05.8381589Z float tmp18 = 0.0; 2023-01-11T21:38:05.8381670Z if(tmp17) 2023-01-11T21:38:05.8381740Z { 2023-01-11T21:38:05.8381914Z auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8381996Z tmp18 = tmp19; 2023-01-11T21:38:05.8382068Z } 2023-01-11T21:38:05.8382166Z auto tmp20 = tmp18 + tmp11; 2023-01-11T21:38:05.8382313Z auto tmp21 = static_cast(1 + (2*i1)); 2023-01-11T21:38:05.8382411Z auto tmp22 = tmp21 >= tmp1; 2023-01-11T21:38:05.8382510Z auto tmp23 = tmp21 < tmp3; 2023-01-11T21:38:05.8382605Z auto tmp24 = tmp22 & tmp23; 2023-01-11T21:38:05.8382698Z auto tmp25 = tmp5 & tmp24; 2023-01-11T21:38:05.8382787Z float tmp26 = 0.0; 2023-01-11T21:38:05.8382869Z if(tmp25) 2023-01-11T21:38:05.8382940Z { 2023-01-11T21:38:05.8383112Z auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8383200Z tmp26 = tmp27; 2023-01-11T21:38:05.8383274Z } 2023-01-11T21:38:05.8383367Z auto tmp28 = tmp26 + tmp20; 2023-01-11T21:38:05.8383478Z auto tmp29 = static_cast(2*i0); 2023-01-11T21:38:05.8383579Z auto tmp30 = tmp29 >= tmp1; 2023-01-11T21:38:05.8383675Z auto tmp31 = tmp29 < tmp3; 2023-01-11T21:38:05.8383771Z auto tmp32 = tmp30 & tmp31; 2023-01-11T21:38:05.8383867Z auto tmp33 = tmp32 & tmp9; 2023-01-11T21:38:05.8383958Z float tmp34 = 0.0; 2023-01-11T21:38:05.8384033Z if(tmp33) 2023-01-11T21:38:05.8384106Z { 2023-01-11T21:38:05.8384278Z auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8384368Z tmp34 = tmp35; 2023-01-11T21:38:05.8384439Z } 2023-01-11T21:38:05.8384539Z auto tmp36 = tmp34 + tmp28; 2023-01-11T21:38:05.8384636Z auto tmp37 = tmp32 & tmp16; 2023-01-11T21:38:05.8384725Z float tmp38 = 0.0; 2023-01-11T21:38:05.8384800Z if(tmp37) 2023-01-11T21:38:05.8384880Z { 2023-01-11T21:38:05.8384994Z auto tmp39 = in_ptr0[(2*i1) + (16*i0)]; 2023-01-11T21:38:05.8385081Z tmp38 = tmp39; 2023-01-11T21:38:05.8385153Z } 2023-01-11T21:38:05.8385251Z auto tmp40 = tmp38 + tmp36; 2023-01-11T21:38:05.8385375Z auto tmp41 = tmp32 & tmp24; 2023-01-11T21:38:05.8385460Z float tmp42 = 0.0; 2023-01-11T21:38:05.8385551Z if(tmp41) 2023-01-11T21:38:05.8385637Z { 2023-01-11T21:38:05.8385765Z auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8385861Z tmp42 = tmp43; 2023-01-11T21:38:05.8385931Z } 2023-01-11T21:38:05.8386030Z auto tmp44 = tmp42 + tmp40; 2023-01-11T21:38:05.8386140Z auto tmp45 = static_cast(1 + (2*i0)); 2023-01-11T21:38:05.8386239Z auto tmp46 = tmp45 >= tmp1; 2023-01-11T21:38:05.8386337Z auto tmp47 = tmp45 < tmp3; 2023-01-11T21:38:05.8386434Z auto tmp48 = tmp46 & tmp47; 2023-01-11T21:38:05.8386529Z auto tmp49 = tmp48 & tmp9; 2023-01-11T21:38:05.8386624Z float tmp50 = 0.0; 2023-01-11T21:38:05.8386704Z if(tmp49) 
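// The same pattern repeats for each of the nine 3x3 taps: build an
// in-bounds mask, conditionally load the element, and fold it into the
// running sum (tmp20, tmp28, tmp36, ...).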
2023-01-11T21:38:05.8386770Z { 2023-01-11T21:38:05.8386883Z auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8386971Z tmp50 = tmp51; 2023-01-11T21:38:05.8387041Z } 2023-01-11T21:38:05.8387139Z auto tmp52 = tmp50 + tmp44; 2023-01-11T21:38:05.8387235Z auto tmp53 = tmp48 & tmp16; 2023-01-11T21:38:05.8387323Z float tmp54 = 0.0; 2023-01-11T21:38:05.8387396Z if(tmp53) 2023-01-11T21:38:05.8387499Z { 2023-01-11T21:38:05.8387610Z auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8387697Z tmp54 = tmp55; 2023-01-11T21:38:05.8387769Z } 2023-01-11T21:38:05.8387869Z auto tmp56 = tmp54 + tmp52; 2023-01-11T21:38:05.8387967Z auto tmp57 = tmp48 & tmp24; 2023-01-11T21:38:05.8388050Z float tmp58 = 0.0; 2023-01-11T21:38:05.8388130Z if(tmp57) 2023-01-11T21:38:05.8388201Z { 2023-01-11T21:38:05.8388314Z auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)]; 2023-01-11T21:38:05.8388403Z tmp58 = tmp59; 2023-01-11T21:38:05.8388476Z } 2023-01-11T21:38:05.8388573Z auto tmp60 = tmp58 + tmp56; 2023-01-11T21:38:05.8388690Z auto tmp61 = static_cast(0.3333333333333333); 2023-01-11T21:38:05.8388791Z auto tmp62 = tmp60 * tmp61; 2023-01-11T21:38:05.8388891Z out_ptr0[i1 + (4*i0)] = tmp62; 2023-01-11T21:38:05.8388963Z } 2023-01-11T21:38:05.8389033Z } 2023-01-11T21:38:05.8389097Z } 2023-01-11T21:38:05.8389167Z } 2023-01-11T21:38:05.8389226Z } 2023-01-11T21:38:05.8389290Z } 2023-01-11T21:38:05.8389376Z ''') 2023-01-11T21:38:05.8389381Z 2023-01-11T21:38:05.8389385Z 2023-01-11T21:38:05.8389481Z async_compile.wait(globals()) 2023-01-11T21:38:05.8389556Z del async_compile 2023-01-11T21:38:05.8389561Z 2023-01-11T21:38:05.8389634Z def call(args): 2023-01-11T21:38:05.8389706Z arg0_1, = args 2023-01-11T21:38:05.8389774Z args.clear() 2023-01-11T21:38:05.8389987Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8390124Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8390200Z del arg0_1 2023-01-11T21:38:05.8390274Z return (buf0, ) 2023-01-11T21:38:05.8390279Z 2023-01-11T21:38:05.8390283Z 2023-01-11T21:38:05.8390363Z if __name__ == "__main__": 2023-01-11T21:38:05.8390479Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8390634Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8390840Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8390955Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8390960Z 2023-01-11T21:38:05.8391032Z ok (1.798s) 2023-01-11T21:38:05.8391481Z test_avg_pool2d7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8391616Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8391872Z [2023-01-11 21:24:51,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 33 2023-01-11T21:38:05.8392100Z [2023-01-11 21:24:51,751] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:05.8392361Z [2023-01-11 21:24:51,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 33 2023-01-11T21:38:05.8392368Z 2023-01-11T21:38:05.8392468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8392542Z import torch 2023-01-11T21:38:05.8392610Z import random 2023-01-11T21:38:05.8392729Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8392851Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8392857Z 2023-01-11T21:38:05.8392942Z aten = torch.ops.aten 2023-01-11T21:38:05.8393108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8393204Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8393210Z 2023-01-11T21:38:05.8393282Z import triton 2023-01-11T21:38:05.8393368Z import triton.language as tl 2023-01-11T21:38:05.8393492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8393634Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8393640Z 2023-01-11T21:38:05.8393644Z 2023-01-11T21:38:05.8393736Z async_compile.wait(globals()) 2023-01-11T21:38:05.8393811Z del async_compile 2023-01-11T21:38:05.8393817Z 2023-01-11T21:38:05.8393890Z def call(args): 2023-01-11T21:38:05.8393964Z arg0_1, = args 2023-01-11T21:38:05.8394042Z args.clear() 2023-01-11T21:38:05.8394168Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:05.8394240Z del arg0_1 2023-01-11T21:38:05.8394312Z buf1 = buf0 2023-01-11T21:38:05.8394425Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:05.8394498Z del buf0 2023-01-11T21:38:05.8394574Z return (buf1, ) 2023-01-11T21:38:05.8394579Z 2023-01-11T21:38:05.8394583Z 2023-01-11T21:38:05.8394662Z if __name__ == "__main__": 2023-01-11T21:38:05.8394770Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8394898Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8395119Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8395232Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8395237Z 2023-01-11T21:38:05.8395307Z ok (0.026s) 2023-01-11T21:38:05.8395771Z test_avg_pool2d_backward2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8395906Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8396167Z [2023-01-11 21:24:51,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 34 2023-01-11T21:38:05.8396172Z 2023-01-11T21:38:05.8396310Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8396389Z import torch 2023-01-11T21:38:05.8396458Z import random 2023-01-11T21:38:05.8396575Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8396698Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8396703Z 2023-01-11T21:38:05.8396785Z aten = torch.ops.aten 2023-01-11T21:38:05.8396922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8397017Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8397022Z 2023-01-11T21:38:05.8397095Z import triton 2023-01-11T21:38:05.8397180Z import triton.language as tl 2023-01-11T21:38:05.8397306Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8397446Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8397452Z 2023-01-11T21:38:05.8397456Z 2023-01-11T21:38:05.8397593Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8397806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8397930Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8398034Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8398103Z { 2023-01-11T21:38:05.8398197Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8398265Z { 2023-01-11T21:38:05.8398350Z #pragma omp for 2023-01-11T21:38:05.8398438Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.8398505Z { 2023-01-11T21:38:05.8398588Z #pragma GCC ivdep 2023-01-11T21:38:05.8398679Z for(long i1=0; i1<15; i1+=1) 2023-01-11T21:38:05.8398770Z { 2023-01-11T21:38:05.8398838Z { 2023-01-11T21:38:05.8398909Z { 2023-01-11T21:38:05.8399078Z auto tmp0 = static_cast((-1) + i0); 2023-01-11T21:38:05.8399246Z auto tmp1 = static_cast((-1) + i1); 2023-01-11T21:38:05.8399362Z auto tmp2 = static_cast(2 + i0); 2023-01-11T21:38:05.8399473Z auto tmp3 = static_cast(2 + i1); 2023-01-11T21:38:05.8399575Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8399713Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8399847Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8399956Z auto tmp7 = static_cast(20); 2023-01-11T21:38:05.8400084Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8400195Z auto tmp9 = static_cast(15); 2023-01-11T21:38:05.8400330Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp3, tmp9); 2023-01-11T21:38:05.8400429Z auto tmp11 = tmp5 + tmp4; 2023-01-11T21:38:05.8400522Z auto tmp12 = tmp6 + tmp4; 2023-01-11T21:38:05.8400633Z auto tmp13 = static_cast(1); 2023-01-11T21:38:05.8400738Z auto tmp14 = static_cast(3); 2023-01-11T21:38:05.8400838Z auto tmp15 = tmp11 * tmp13; 2023-01-11T21:38:05.8400983Z auto tmp16 = tmp15 - tmp13; 2023-01-11T21:38:05.8401079Z auto tmp17 = tmp12 * tmp13; 2023-01-11T21:38:05.8401220Z auto tmp18 = tmp17 - tmp13; 2023-01-11T21:38:05.8401310Z auto tmp19 = tmp16 + tmp14; 2023-01-11T21:38:05.8401408Z auto tmp20 = tmp7 + tmp13; 2023-01-11T21:38:05.8401547Z auto tmp21 = (tmp20 != tmp20) ? 
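// (x != x) holds only for a floating-point NaN, so this ternary is
// TorchInductor's NaN-propagating minimum/maximum idiom; for these
// integer window bounds the check is vacuously false and it reduces
// to a plain std::min/std::max clamp.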
tmp20 : std::min(tmp19, tmp20); 2023-01-11T21:38:05.8401646Z auto tmp22 = tmp18 + tmp14; 2023-01-11T21:38:05.8401744Z auto tmp23 = tmp9 + tmp13; 2023-01-11T21:38:05.8401915Z auto tmp24 = (tmp23 != tmp23) ? tmp23 : std::min(tmp22, tmp23); 2023-01-11T21:38:05.8402051Z auto tmp25 = (tmp4 != tmp4) ? tmp4 : std::max(tmp16, tmp4); 2023-01-11T21:38:05.8402182Z auto tmp26 = (tmp4 != tmp4) ? tmp4 : std::max(tmp18, tmp4); 2023-01-11T21:38:05.8402309Z auto tmp27 = (tmp7 != tmp7) ? tmp7 : std::min(tmp21, tmp7); 2023-01-11T21:38:05.8402428Z auto tmp28 = (tmp9 != tmp9) ? tmp9 : std::min(tmp24, tmp9); 2023-01-11T21:38:05.8402574Z auto tmp29 = tmp27 - tmp25; 2023-01-11T21:38:05.8402718Z auto tmp30 = tmp28 - tmp26; 2023-01-11T21:38:05.8402817Z auto tmp31 = tmp29 * tmp30; 2023-01-11T21:38:05.8402959Z auto tmp32 = tmp8 - tmp13; 2023-01-11T21:38:05.8403095Z auto tmp33 = (tmp32 != tmp32) ? tmp32 : std::min(tmp11, tmp32); 2023-01-11T21:38:05.8403239Z auto tmp34 = tmp10 - tmp13; 2023-01-11T21:38:05.8403375Z auto tmp35 = (tmp34 != tmp34) ? tmp34 : std::min(tmp12, tmp34); 2023-01-11T21:38:05.8403483Z auto tmp36 = in_ptr0[tmp35 + (15*tmp33)]; 2023-01-11T21:38:05.8403581Z auto tmp37 = tmp36 / tmp31; 2023-01-11T21:38:05.8403679Z auto tmp38 = tmp11 < tmp8; 2023-01-11T21:38:05.8403774Z auto tmp39 = tmp12 < tmp10; 2023-01-11T21:38:05.8403870Z auto tmp40 = tmp38 & tmp39; 2023-01-11T21:38:05.8403986Z auto tmp41 = static_cast(0.0); 2023-01-11T21:38:05.8404094Z auto tmp42 = tmp40 ? tmp37 : tmp41; 2023-01-11T21:38:05.8404216Z auto tmp43 = tmp6 + tmp13; 2023-01-11T21:38:05.8404315Z auto tmp44 = tmp43 * tmp13; 2023-01-11T21:38:05.8404458Z auto tmp45 = tmp44 - tmp13; 2023-01-11T21:38:05.8404554Z auto tmp46 = tmp45 + tmp14; 2023-01-11T21:38:05.8404692Z auto tmp47 = (tmp23 != tmp23) ? tmp23 : std::min(tmp46, tmp23); 2023-01-11T21:38:05.8404827Z auto tmp48 = (tmp4 != tmp4) ? tmp4 : std::max(tmp45, tmp4); 2023-01-11T21:38:05.8404956Z auto tmp49 = (tmp9 != tmp9) ? tmp9 : std::min(tmp47, tmp9); 2023-01-11T21:38:05.8405099Z auto tmp50 = tmp49 - tmp48; 2023-01-11T21:38:05.8405197Z auto tmp51 = tmp29 * tmp50; 2023-01-11T21:38:05.8405322Z auto tmp52 = (tmp34 != tmp34) ? tmp34 : std::min(tmp43, tmp34); 2023-01-11T21:38:05.8405438Z auto tmp53 = in_ptr0[tmp52 + (15*tmp33)]; 2023-01-11T21:38:05.8405540Z auto tmp54 = tmp53 / tmp51; 2023-01-11T21:38:05.8405636Z auto tmp55 = tmp43 < tmp10; 2023-01-11T21:38:05.8405733Z auto tmp56 = tmp38 & tmp55; 2023-01-11T21:38:05.8405837Z auto tmp57 = tmp42 + tmp54; 2023-01-11T21:38:05.8405945Z auto tmp58 = tmp56 ? tmp57 : tmp42; 2023-01-11T21:38:05.8406047Z auto tmp59 = static_cast(2); 2023-01-11T21:38:05.8406146Z auto tmp60 = tmp6 + tmp59; 2023-01-11T21:38:05.8406242Z auto tmp61 = tmp60 * tmp13; 2023-01-11T21:38:05.8406384Z auto tmp62 = tmp61 - tmp13; 2023-01-11T21:38:05.8406482Z auto tmp63 = tmp62 + tmp14; 2023-01-11T21:38:05.8406614Z auto tmp64 = (tmp23 != tmp23) ? tmp23 : std::min(tmp63, tmp23); 2023-01-11T21:38:05.8406744Z auto tmp65 = (tmp4 != tmp4) ? tmp4 : std::max(tmp62, tmp4); 2023-01-11T21:38:05.8406874Z auto tmp66 = (tmp9 != tmp9) ? tmp9 : std::min(tmp64, tmp9); 2023-01-11T21:38:05.8407010Z auto tmp67 = tmp66 - tmp65; 2023-01-11T21:38:05.8407106Z auto tmp68 = tmp29 * tmp67; 2023-01-11T21:38:05.8407272Z auto tmp69 = (tmp34 != tmp34) ? 
tmp34 : std::min(tmp60, tmp34); 2023-01-11T21:38:05.8407387Z auto tmp70 = in_ptr0[tmp69 + (15*tmp33)]; 2023-01-11T21:38:05.8407482Z auto tmp71 = tmp70 / tmp68; 2023-01-11T21:38:05.8407580Z auto tmp72 = tmp60 < tmp10; 2023-01-11T21:38:05.8407678Z auto tmp73 = tmp38 & tmp72; 2023-01-11T21:38:05.8407773Z auto tmp74 = tmp58 + tmp71; 2023-01-11T21:38:05.8407874Z auto tmp75 = tmp73 ? tmp74 : tmp58; 2023-01-11T21:38:05.8407971Z auto tmp76 = tmp5 + tmp13; 2023-01-11T21:38:05.8408069Z auto tmp77 = tmp76 * tmp13; 2023-01-11T21:38:05.8408214Z auto tmp78 = tmp77 - tmp13; 2023-01-11T21:38:05.8408309Z auto tmp79 = tmp78 + tmp14; 2023-01-11T21:38:05.8408445Z auto tmp80 = (tmp20 != tmp20) ? tmp20 : std::min(tmp79, tmp20); 2023-01-11T21:38:05.8408580Z auto tmp81 = (tmp4 != tmp4) ? tmp4 : std::max(tmp78, tmp4); 2023-01-11T21:38:05.8408710Z auto tmp82 = (tmp7 != tmp7) ? tmp7 : std::min(tmp80, tmp7); 2023-01-11T21:38:05.8408847Z auto tmp83 = tmp82 - tmp81; 2023-01-11T21:38:05.8408944Z auto tmp84 = tmp83 * tmp30; 2023-01-11T21:38:05.8409075Z auto tmp85 = (tmp32 != tmp32) ? tmp32 : std::min(tmp76, tmp32); 2023-01-11T21:38:05.8409188Z auto tmp86 = in_ptr0[tmp35 + (15*tmp85)]; 2023-01-11T21:38:05.8409313Z auto tmp87 = tmp86 / tmp84; 2023-01-11T21:38:05.8409409Z auto tmp88 = tmp76 < tmp8; 2023-01-11T21:38:05.8409505Z auto tmp89 = tmp88 & tmp39; 2023-01-11T21:38:05.8409604Z auto tmp90 = tmp75 + tmp87; 2023-01-11T21:38:05.8409710Z auto tmp91 = tmp89 ? tmp90 : tmp75; 2023-01-11T21:38:05.8409811Z auto tmp92 = tmp83 * tmp50; 2023-01-11T21:38:05.8409924Z auto tmp93 = in_ptr0[tmp52 + (15*tmp85)]; 2023-01-11T21:38:05.8410022Z auto tmp94 = tmp93 / tmp92; 2023-01-11T21:38:05.8410116Z auto tmp95 = tmp88 & tmp55; 2023-01-11T21:38:05.8410211Z auto tmp96 = tmp91 + tmp94; 2023-01-11T21:38:05.8410319Z auto tmp97 = tmp95 ? tmp96 : tmp91; 2023-01-11T21:38:05.8410409Z auto tmp98 = tmp83 * tmp67; 2023-01-11T21:38:05.8410521Z auto tmp99 = in_ptr0[tmp69 + (15*tmp85)]; 2023-01-11T21:38:05.8410628Z auto tmp100 = tmp99 / tmp98; 2023-01-11T21:38:05.8410728Z auto tmp101 = tmp88 & tmp72; 2023-01-11T21:38:05.8410831Z auto tmp102 = tmp97 + tmp100; 2023-01-11T21:38:05.8410948Z auto tmp103 = tmp101 ? tmp102 : tmp97; 2023-01-11T21:38:05.8411048Z auto tmp104 = tmp5 + tmp59; 2023-01-11T21:38:05.8411143Z auto tmp105 = tmp104 * tmp13; 2023-01-11T21:38:05.8411293Z auto tmp106 = tmp105 - tmp13; 2023-01-11T21:38:05.8411393Z auto tmp107 = tmp106 + tmp14; 2023-01-11T21:38:05.8411530Z auto tmp108 = (tmp20 != tmp20) ? tmp20 : std::min(tmp107, tmp20); 2023-01-11T21:38:05.8411668Z auto tmp109 = (tmp4 != tmp4) ? tmp4 : std::max(tmp106, tmp4); 2023-01-11T21:38:05.8411800Z auto tmp110 = (tmp7 != tmp7) ? tmp7 : std::min(tmp108, tmp7); 2023-01-11T21:38:05.8411953Z auto tmp111 = tmp110 - tmp109; 2023-01-11T21:38:05.8412053Z auto tmp112 = tmp111 * tmp30; 2023-01-11T21:38:05.8412189Z auto tmp113 = (tmp32 != tmp32) ? tmp32 : std::min(tmp104, tmp32); 2023-01-11T21:38:05.8412335Z auto tmp114 = in_ptr0[tmp35 + (15*tmp113)]; 2023-01-11T21:38:05.8412439Z auto tmp115 = tmp114 / tmp112; 2023-01-11T21:38:05.8412541Z auto tmp116 = tmp104 < tmp8; 2023-01-11T21:38:05.8412641Z auto tmp117 = tmp116 & tmp39; 2023-01-11T21:38:05.8412743Z auto tmp118 = tmp103 + tmp115; 2023-01-11T21:38:05.8412854Z auto tmp119 = tmp117 ? 
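// Every guarded step here has the shape acc' = cond ? acc + g/area : acc,
// folding in one of the up-to-nine pooled cells whose (clamped) window
// covers this grad_input position.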
tmp118 : tmp103; 2023-01-11T21:38:05.8412957Z auto tmp120 = tmp111 * tmp50; 2023-01-11T21:38:05.8413065Z auto tmp121 = in_ptr0[tmp52 + (15*tmp113)]; 2023-01-11T21:38:05.8413169Z auto tmp122 = tmp121 / tmp120; 2023-01-11T21:38:05.8413271Z auto tmp123 = tmp116 & tmp55; 2023-01-11T21:38:05.8413371Z auto tmp124 = tmp119 + tmp122; 2023-01-11T21:38:05.8413481Z auto tmp125 = tmp123 ? tmp124 : tmp119; 2023-01-11T21:38:05.8413584Z auto tmp126 = tmp111 * tmp67; 2023-01-11T21:38:05.8413701Z auto tmp127 = in_ptr0[tmp69 + (15*tmp113)]; 2023-01-11T21:38:05.8413802Z auto tmp128 = tmp127 / tmp126; 2023-01-11T21:38:05.8413896Z auto tmp129 = tmp116 & tmp72; 2023-01-11T21:38:05.8413997Z auto tmp130 = tmp125 + tmp128; 2023-01-11T21:38:05.8414111Z auto tmp131 = tmp129 ? tmp130 : tmp125; 2023-01-11T21:38:05.8414215Z out_ptr0[i1 + (15*i0)] = tmp131; 2023-01-11T21:38:05.8414286Z } 2023-01-11T21:38:05.8414385Z } 2023-01-11T21:38:05.8414451Z } 2023-01-11T21:38:05.8414692Z } 2023-01-11T21:38:05.8414762Z } 2023-01-11T21:38:05.8414824Z } 2023-01-11T21:38:05.8414912Z ''') 2023-01-11T21:38:05.8414918Z 2023-01-11T21:38:05.8414922Z 2023-01-11T21:38:05.8415020Z async_compile.wait(globals()) 2023-01-11T21:38:05.8415102Z del async_compile 2023-01-11T21:38:05.8415107Z 2023-01-11T21:38:05.8415182Z def call(args): 2023-01-11T21:38:05.8415273Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8415352Z args.clear() 2023-01-11T21:38:05.8415594Z buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8415733Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8415804Z del arg0_1 2023-01-11T21:38:05.8415879Z return (buf0, ) 2023-01-11T21:38:05.8415884Z 2023-01-11T21:38:05.8415889Z 2023-01-11T21:38:05.8415968Z if __name__ == "__main__": 2023-01-11T21:38:05.8416088Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8416211Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8416428Z arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8416646Z arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8416766Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8417030Z [2023-01-11 21:24:53,617] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 34 2023-01-11T21:38:05.8417036Z 2023-01-11T21:38:05.8417108Z ok (1.870s) 2023-01-11T21:38:05.8417625Z test_avg_pool2d_backward3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8417762Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8418018Z [2023-01-11 21:24:53,658] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 35 2023-01-11T21:38:05.8418331Z [2023-01-11 21:24:55,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 35 2023-01-11T21:38:05.8418346Z 2023-01-11T21:38:05.8418438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8418512Z import torch 2023-01-11T21:38:05.8418584Z import random 2023-01-11T21:38:05.8418702Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8418827Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8418832Z 2023-01-11T21:38:05.8418912Z aten = torch.ops.aten 2023-01-11T21:38:05.8419048Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8419138Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8419146Z 2023-01-11T21:38:05.8419220Z import triton 2023-01-11T21:38:05.8419312Z import triton.language as tl 2023-01-11T21:38:05.8419438Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8419578Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8419584Z 2023-01-11T21:38:05.8419590Z 2023-01-11T21:38:05.8419728Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8419933Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8420055Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8420152Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8420219Z { 2023-01-11T21:38:05.8420320Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8420387Z { 2023-01-11T21:38:05.8420468Z #pragma omp for 2023-01-11T21:38:05.8420556Z for(long i0=0; i0<2016; i0+=1) 2023-01-11T21:38:05.8420662Z { 2023-01-11T21:38:05.8420740Z #pragma GCC ivdep 2023-01-11T21:38:05.8420829Z for(long i1=0; i1<21; i1+=1) 2023-01-11T21:38:05.8420896Z { 2023-01-11T21:38:05.8420981Z #pragma GCC ivdep 2023-01-11T21:38:05.8421082Z for(long i2=0; i2<21; i2+=1) 2023-01-11T21:38:05.8421153Z { 2023-01-11T21:38:05.8421222Z { 2023-01-11T21:38:05.8421288Z { 2023-01-11T21:38:05.8421411Z auto tmp0 = static_cast(((1 + i1) / 2)); 2023-01-11T21:38:05.8421529Z auto tmp1 = static_cast(((1 + i2) / 2)); 2023-01-11T21:38:05.8421648Z auto tmp2 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.8421765Z auto tmp3 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.8421878Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8422015Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8422142Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8422256Z auto tmp7 = static_cast(11); 2023-01-11T21:38:05.8422392Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8422519Z auto tmp9 = (tmp7 != tmp7) ? tmp7 : std::min(tmp3, tmp7); 2023-01-11T21:38:05.8422620Z auto tmp10 = tmp5 + tmp4; 2023-01-11T21:38:05.8422718Z auto tmp11 = tmp6 + tmp4; 2023-01-11T21:38:05.8422833Z auto tmp12 = static_cast(1); 2023-01-11T21:38:05.8422983Z auto tmp13 = tmp8 - tmp12; 2023-01-11T21:38:05.8423121Z auto tmp14 = (tmp13 != tmp13) ? tmp13 : std::min(tmp10, tmp13); 2023-01-11T21:38:05.8423263Z auto tmp15 = tmp9 - tmp12; 2023-01-11T21:38:05.8423404Z auto tmp16 = (tmp15 != tmp15) ? 
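// Clamp to the last valid pooled cell, then read the single grad_output
// element whose window covers this grad_input position; the divisor has
// been constant-folded for this configuration (the "/ 1" just below).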
tmp15 : std::min(tmp11, tmp15); 2023-01-11T21:38:05.8423528Z auto tmp17 = in_ptr0[tmp16 + (11*tmp14) + (121*i0)]; 2023-01-11T21:38:05.8423628Z auto tmp18 = tmp17 / 1; 2023-01-11T21:38:05.8423761Z auto tmp19 = tmp10 < tmp8; 2023-01-11T21:38:05.8423864Z auto tmp20 = tmp11 < tmp9; 2023-01-11T21:38:05.8423968Z auto tmp21 = tmp19 & tmp20; 2023-01-11T21:38:05.8424077Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:05.8424188Z auto tmp23 = tmp21 ? tmp18 : tmp22; 2023-01-11T21:38:05.8424300Z out_ptr0[i2 + (21*i1) + (441*i0)] = tmp23; 2023-01-11T21:38:05.8424374Z } 2023-01-11T21:38:05.8424445Z } 2023-01-11T21:38:05.8424517Z } 2023-01-11T21:38:05.8424583Z } 2023-01-11T21:38:05.8424643Z } 2023-01-11T21:38:05.8424710Z } 2023-01-11T21:38:05.8424773Z } 2023-01-11T21:38:05.8424856Z ''') 2023-01-11T21:38:05.8424861Z 2023-01-11T21:38:05.8424866Z 2023-01-11T21:38:05.8424959Z async_compile.wait(globals()) 2023-01-11T21:38:05.8425035Z del async_compile 2023-01-11T21:38:05.8425042Z 2023-01-11T21:38:05.8425117Z def call(args): 2023-01-11T21:38:05.8425195Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8425263Z args.clear() 2023-01-11T21:38:05.8425494Z buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8425633Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8425705Z del arg0_1 2023-01-11T21:38:05.8425779Z return (buf0, ) 2023-01-11T21:38:05.8425786Z 2023-01-11T21:38:05.8425790Z 2023-01-11T21:38:05.8425870Z if __name__ == "__main__": 2023-01-11T21:38:05.8425987Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8426136Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8426362Z arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8426582Z arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8426704Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8426709Z 2023-01-11T21:38:05.8426779Z ok (1.954s) 2023-01-11T21:38:05.8427242Z test_avg_pool2d_backward4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8427374Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8427635Z [2023-01-11 21:24:55,600] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 36 2023-01-11T21:38:05.8427875Z [2023-01-11 21:24:55,611] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d_backward 2023-01-11T21:38:05.8428137Z [2023-01-11 21:24:55,613] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 36 2023-01-11T21:38:05.8428143Z 2023-01-11T21:38:05.8428235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8428308Z import torch 2023-01-11T21:38:05.8428381Z import random 2023-01-11T21:38:05.8428500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8428624Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8428629Z 2023-01-11T21:38:05.8428712Z aten = torch.ops.aten 2023-01-11T21:38:05.8428849Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8428945Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8428953Z 2023-01-11T21:38:05.8429020Z import triton 2023-01-11T21:38:05.8429114Z import triton.language as tl 2023-01-11T21:38:05.8429240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8429380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8429386Z 2023-01-11T21:38:05.8429419Z 2023-01-11T21:38:05.8429515Z async_compile.wait(globals()) 2023-01-11T21:38:05.8429591Z del async_compile 2023-01-11T21:38:05.8429596Z 2023-01-11T21:38:05.8429671Z def call(args): 2023-01-11T21:38:05.8429750Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8429818Z args.clear() 2023-01-11T21:38:05.8429968Z buf0 = aten.avg_pool2d_backward(arg0_1, arg1_1, [13, 13], [1, 1], [0, 0], True, False, None) 2023-01-11T21:38:05.8430040Z del arg0_1 2023-01-11T21:38:05.8430110Z del arg1_1 2023-01-11T21:38:05.8430183Z buf1 = buf0 2023-01-11T21:38:05.8430297Z assert_size_stride(buf1, (1, 16, 24, 24), (9216, 576, 24, 1)) 2023-01-11T21:38:05.8430370Z del buf0 2023-01-11T21:38:05.8430439Z return (buf1, ) 2023-01-11T21:38:05.8430444Z 2023-01-11T21:38:05.8430449Z 2023-01-11T21:38:05.8430528Z if __name__ == "__main__": 2023-01-11T21:38:05.8430645Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8430774Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8430999Z arg0_1 = rand_strided((1, 16, 12, 12), (2304, 144, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8431217Z arg1_1 = rand_strided((1, 16, 24, 24), (9216, 576, 24, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8431338Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8431343Z 2023-01-11T21:38:05.8431412Z ok (0.036s) 2023-01-11T21:38:05.8431866Z test_avg_pool2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8432025Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8432283Z [2023-01-11 21:24:55,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 37 2023-01-11T21:38:05.8432546Z [2023-01-11 21:24:57,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 37 2023-01-11T21:38:05.8432552Z 2023-01-11T21:38:05.8432650Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8432725Z import torch 2023-01-11T21:38:05.8432799Z import random 2023-01-11T21:38:05.8432917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8433042Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8433047Z 2023-01-11T21:38:05.8433121Z aten = torch.ops.aten 2023-01-11T21:38:05.8433260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8433354Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8433359Z 2023-01-11T21:38:05.8433431Z import triton 2023-01-11T21:38:05.8433523Z import triton.language as tl 2023-01-11T21:38:05.8433646Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8433788Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8433793Z 2023-01-11T21:38:05.8433798Z 2023-01-11T21:38:05.8433935Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8434134Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8434257Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8434362Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8434428Z { 2023-01-11T21:38:05.8434528Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8434593Z { 2023-01-11T21:38:05.8434677Z #pragma omp for 2023-01-11T21:38:05.8434757Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8434823Z { 2023-01-11T21:38:05.8434907Z #pragma GCC ivdep 2023-01-11T21:38:05.8434997Z for(long i1=0; i1<14; i1+=1) 2023-01-11T21:38:05.8435064Z { 2023-01-11T21:38:05.8435149Z #pragma GCC ivdep 2023-01-11T21:38:05.8435286Z for(long i2=0; i2<14; i2+=1) 2023-01-11T21:38:05.8435350Z { 2023-01-11T21:38:05.8435421Z { 2023-01-11T21:38:05.8435495Z { 2023-01-11T21:38:05.8435612Z auto tmp0 = static_cast((i1 / 2)); 2023-01-11T21:38:05.8435727Z auto tmp1 = static_cast((i2 / 2)); 2023-01-11T21:38:05.8435845Z auto tmp2 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:05.8435961Z auto tmp3 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:05.8436067Z auto tmp4 = static_cast(0); 2023-01-11T21:38:05.8436207Z auto tmp5 = (tmp4 != tmp4) ? tmp4 : std::max(tmp0, tmp4); 2023-01-11T21:38:05.8436338Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp1, tmp4); 2023-01-11T21:38:05.8436452Z auto tmp7 = static_cast(7); 2023-01-11T21:38:05.8436584Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp2, tmp7); 2023-01-11T21:38:05.8436712Z auto tmp9 = (tmp7 != tmp7) ? tmp7 : std::min(tmp3, tmp7); 2023-01-11T21:38:05.8436812Z auto tmp10 = tmp5 + tmp4; 2023-01-11T21:38:05.8436911Z auto tmp11 = tmp6 + tmp4; 2023-01-11T21:38:05.8437016Z auto tmp12 = static_cast(1); 2023-01-11T21:38:05.8437168Z auto tmp13 = tmp8 - tmp12; 2023-01-11T21:38:05.8437308Z auto tmp14 = (tmp13 != tmp13) ? tmp13 : std::min(tmp10, tmp13); 2023-01-11T21:38:05.8437487Z auto tmp15 = tmp9 - tmp12; 2023-01-11T21:38:05.8437625Z auto tmp16 = (tmp15 != tmp15) ? 
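// Same single-covering-window scatter as graph 35 above, but with uniform
// 2x2 windows the divisor is the constant window area, hence the "/ 4"
// just below.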
tmp15 : std::min(tmp11, tmp15); 2023-01-11T21:38:05.8437749Z auto tmp17 = in_ptr0[tmp16 + (7*tmp14) + (49*i0)]; 2023-01-11T21:38:05.8437850Z auto tmp18 = tmp17 / 4; 2023-01-11T21:38:05.8437950Z auto tmp19 = tmp10 < tmp8; 2023-01-11T21:38:05.8438045Z auto tmp20 = tmp11 < tmp9; 2023-01-11T21:38:05.8438150Z auto tmp21 = tmp19 & tmp20; 2023-01-11T21:38:05.8438267Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:05.8438378Z auto tmp23 = tmp21 ? tmp18 : tmp22; 2023-01-11T21:38:05.8438491Z out_ptr0[i2 + (14*i1) + (196*i0)] = tmp23; 2023-01-11T21:38:05.8438565Z } 2023-01-11T21:38:05.8438640Z } 2023-01-11T21:38:05.8438708Z } 2023-01-11T21:38:05.8438768Z } 2023-01-11T21:38:05.8438834Z } 2023-01-11T21:38:05.8438899Z } 2023-01-11T21:38:05.8438964Z } 2023-01-11T21:38:05.8439049Z ''') 2023-01-11T21:38:05.8439054Z 2023-01-11T21:38:05.8439059Z 2023-01-11T21:38:05.8439157Z async_compile.wait(globals()) 2023-01-11T21:38:05.8439234Z del async_compile 2023-01-11T21:38:05.8439239Z 2023-01-11T21:38:05.8439307Z def call(args): 2023-01-11T21:38:05.8439387Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8439461Z args.clear() 2023-01-11T21:38:05.8439683Z buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8439824Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8439898Z del arg0_1 2023-01-11T21:38:05.8439973Z return (buf0, ) 2023-01-11T21:38:05.8439978Z 2023-01-11T21:38:05.8439983Z 2023-01-11T21:38:05.8440059Z if __name__ == "__main__": 2023-01-11T21:38:05.8440176Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8440305Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8440519Z arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8440766Z arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8440892Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8440898Z 2023-01-11T21:38:05.8440970Z ok (1.781s) 2023-01-11T21:38:05.8441422Z test_baddbmm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8441556Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8441805Z [2023-01-11 21:24:57,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 38 2023-01-11T21:38:05.8442068Z [2023-01-11 21:24:59,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 38 2023-01-11T21:38:05.8442074Z 2023-01-11T21:38:05.8442173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8442247Z import torch 2023-01-11T21:38:05.8442322Z import random 2023-01-11T21:38:05.8442441Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8442565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8442571Z 2023-01-11T21:38:05.8442652Z aten = torch.ops.aten 2023-01-11T21:38:05.8442781Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8442877Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8442882Z 2023-01-11T21:38:05.8442988Z import triton 2023-01-11T21:38:05.8443081Z import triton.language as tl 2023-01-11T21:38:05.8443209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8443353Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8443359Z 2023-01-11T21:38:05.8443364Z 2023-01-11T21:38:05.8443506Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8443714Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8443828Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8443938Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.8444004Z { 2023-01-11T21:38:05.8444106Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8444173Z { 2023-01-11T21:38:05.8444267Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8444354Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.8444414Z { 2023-01-11T21:38:05.8444505Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:05.8444576Z { 2023-01-11T21:38:05.8444671Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:05.8444741Z { 2023-01-11T21:38:05.8444892Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i2) + (100*i0)); 2023-01-11T21:38:05.8445055Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8445153Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8445273Z tmp2.store(in_out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8445360Z } 2023-01-11T21:38:05.8445472Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8445590Z for(long i2=96; i2<100; i2+=1) 2023-01-11T21:38:05.8445659Z { 2023-01-11T21:38:05.8445764Z auto tmp0 = in_ptr0[i2 + (100*i0)]; 2023-01-11T21:38:05.8445882Z auto tmp1 = in_out_ptr0[i2 + (100*i1) + (12800*i0)]; 2023-01-11T21:38:05.8445974Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8446086Z in_out_ptr0[i2 + (100*i1) + (12800*i0)] = tmp2; 2023-01-11T21:38:05.8446155Z } 2023-01-11T21:38:05.8446223Z } 2023-01-11T21:38:05.8446290Z } 2023-01-11T21:38:05.8446386Z } 2023-01-11T21:38:05.8446454Z } 2023-01-11T21:38:05.8446532Z ''') 2023-01-11T21:38:05.8446537Z 2023-01-11T21:38:05.8446542Z 2023-01-11T21:38:05.8446637Z async_compile.wait(globals()) 2023-01-11T21:38:05.8446712Z del async_compile 2023-01-11T21:38:05.8446717Z 2023-01-11T21:38:05.8446792Z def call(args): 2023-01-11T21:38:05.8446879Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8446954Z args.clear() 
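    # baddbmm is decomposed here: aten.bmm.out writes the batched matmul into
    # buf0, then kernel_cpp_0 adds the (6, 1, 100) bias in place, broadcasting
    # it across the 128 rows (buf1 aliases buf0, the "reuse" noted below).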
2023-01-11T21:38:05.8447170Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8447264Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:05.8447336Z del arg1_1 2023-01-11T21:38:05.8447411Z del arg2_1 2023-01-11T21:38:05.8447501Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.8447638Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.8447710Z del arg0_1 2023-01-11T21:38:05.8447784Z return (buf1, ) 2023-01-11T21:38:05.8447789Z 2023-01-11T21:38:05.8447796Z 2023-01-11T21:38:05.8447877Z if __name__ == "__main__": 2023-01-11T21:38:05.8447988Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8448115Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8448328Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448538Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448749Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8448879Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8448919Z 2023-01-11T21:38:05.8448992Z ok (1.796s) 2023-01-11T21:38:05.8449452Z test_batch_norm_2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8449587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8449836Z [2023-01-11 21:24:59,463] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 39 2023-01-11T21:38:05.8450099Z [2023-01-11 21:25:01,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 39 2023-01-11T21:38:05.8450520Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8450655Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8450912Z [2023-01-11 21:25:01,606] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 40 2023-01-11T21:38:05.8450918Z 2023-01-11T21:38:05.8451016Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8451090Z import torch 2023-01-11T21:38:05.8451164Z import random 2023-01-11T21:38:05.8451286Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8451403Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8451408Z 2023-01-11T21:38:05.8451490Z aten = torch.ops.aten 2023-01-11T21:38:05.8451626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8451724Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8451729Z 2023-01-11T21:38:05.8451803Z import triton 2023-01-11T21:38:05.8451896Z import triton.language as tl 2023-01-11T21:38:05.8452020Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8452159Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8452193Z 2023-01-11T21:38:05.8452199Z 2023-01-11T21:38:05.8452329Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8452536Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8452660Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8452769Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8452877Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8452984Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.8453089Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:05.8453195Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8453291Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8453390Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8453493Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8453561Z { 2023-01-11T21:38:05.8453664Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8453730Z { 2023-01-11T21:38:05.8453812Z #pragma omp for 2023-01-11T21:38:05.8453892Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8453958Z { 2023-01-11T21:38:05.8454099Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8454195Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8454263Z } 2023-01-11T21:38:05.8454363Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8454448Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8454647Z { 2023-01-11T21:38:05.8454782Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8454869Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8454934Z } 2023-01-11T21:38:05.8455015Z #pragma omp for 2023-01-11T21:38:05.8455101Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8455160Z { 2023-01-11T21:38:05.8455299Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8455407Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8455489Z } 2023-01-11T21:38:05.8455600Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8455698Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8455763Z { 2023-01-11T21:38:05.8455845Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8455930Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.8455999Z } 2023-01-11T21:38:05.8456091Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8456176Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8456248Z { 2023-01-11T21:38:05.8456337Z for(long i1=0; i1<10; i1+=1) 
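// i1 walks the 10 channels: the channel's mean (out_ptr0), variance
// (out_ptr1), weight (in_ptr3) and bias (in_ptr4) are splatted into 8-wide
// vectors, then (x - mean) * rsqrt(var + 1e-05) is scaled, shifted, and
// clamped at zero (the fused ReLU).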
2023-01-11T21:38:05.8456398Z { 2023-01-11T21:38:05.8456492Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.8456562Z { 2023-01-11T21:38:05.8456721Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr2 + (8*i2) + (64*i1) + (640*i0)); 2023-01-11T21:38:05.8456856Z auto tmp1 = at::vec::Vectorized(out_ptr0[i1]); 2023-01-11T21:38:05.8456988Z auto tmp3 = at::vec::Vectorized(out_ptr1[i1]); 2023-01-11T21:38:05.8457120Z auto tmp11 = at::vec::Vectorized(in_ptr3[i1]); 2023-01-11T21:38:05.8457309Z auto tmp13 = at::vec::Vectorized(in_ptr4[i1]); 2023-01-11T21:38:05.8457447Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8457657Z auto tmp4 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:05.8457758Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8457853Z auto tmp6 = tmp5.sqrt(); 2023-01-11T21:38:05.8457960Z auto tmp7 = tmp6.reciprocal(); 2023-01-11T21:38:05.8458104Z auto tmp8 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8458243Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8458341Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8458433Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8458533Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8458671Z auto tmp15 = at::vec::clamp_min(tmp14, decltype(tmp14)(0)); 2023-01-11T21:38:05.8458792Z tmp15.store(out_ptr2 + (8*i2) + (64*i1) + (640*i0)); 2023-01-11T21:38:05.8458863Z } 2023-01-11T21:38:05.8458962Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8459057Z for(long i2=64; i2<64; i2+=1) 2023-01-11T21:38:05.8459123Z { 2023-01-11T21:38:05.8459234Z auto tmp0 = in_ptr2[i2 + (64*i1) + (640*i0)]; 2023-01-11T21:38:05.8459331Z auto tmp1 = out_ptr0[i1]; 2023-01-11T21:38:05.8459433Z auto tmp3 = out_ptr1[i1]; 2023-01-11T21:38:05.8459531Z auto tmp11 = in_ptr3[i1]; 2023-01-11T21:38:05.8459626Z auto tmp13 = in_ptr4[i1]; 2023-01-11T21:38:05.8459764Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8459921Z auto tmp4 = static_cast(1e-05); 2023-01-11T21:38:05.8460016Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8460120Z auto tmp6 = std::sqrt(tmp5); 2023-01-11T21:38:05.8460212Z auto tmp7 = 1 / tmp6; 2023-01-11T21:38:05.8460323Z auto tmp8 = static_cast(1); 2023-01-11T21:38:05.8460419Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8460514Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8460691Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8460783Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8460884Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8460991Z out_ptr2[i2 + (64*i1) + (640*i0)] = tmp15; 2023-01-11T21:38:05.8461064Z } 2023-01-11T21:38:05.8461131Z } 2023-01-11T21:38:05.8461196Z } 2023-01-11T21:38:05.8461280Z #pragma omp for 2023-01-11T21:38:05.8461361Z for(long i0=0; i0<1280; i0+=1) 2023-01-11T21:38:05.8461428Z { 2023-01-11T21:38:05.8461494Z { 2023-01-11T21:38:05.8461563Z { 2023-01-11T21:38:05.8461660Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:05.8461769Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8461857Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8461950Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8462021Z } 2023-01-11T21:38:05.8462158Z } 2023-01-11T21:38:05.8462240Z } 2023-01-11T21:38:05.8462334Z } 2023-01-11T21:38:05.8462422Z } 2023-01-11T21:38:05.8462668Z ''') 2023-01-11T21:38:05.8462674Z 2023-01-11T21:38:05.8462679Z 2023-01-11T21:38:05.8462793Z async_compile.wait(globals()) 2023-01-11T21:38:05.8462888Z del async_compile 2023-01-11T21:38:05.8462893Z 2023-01-11T21:38:05.8462986Z def call(args): 2023-01-11T21:38:05.8463124Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:05.8463216Z args.clear() 
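    # Buffers allocated below: buf0/buf1 receive the copied per-channel stats
    # (they appear to be the running mean/var), buf2 holds the normalized and
    # ReLU'd activation, and buf3 a bool mask (activation <= 0) saved for the
    # ReLU backward.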
2023-01-11T21:38:05.8463441Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8463676Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8463936Z buf2 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8464205Z buf3 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8464583Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(primals_6.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8464680Z del primals_1 2023-01-11T21:38:05.8464749Z del primals_2 2023-01-11T21:38:05.8464887Z del primals_3 2023-01-11T21:38:05.8464983Z del primals_4 2023-01-11T21:38:05.8465125Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:05.8465131Z 2023-01-11T21:38:05.8465135Z 2023-01-11T21:38:05.8465236Z if __name__ == "__main__": 2023-01-11T21:38:05.8465372Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8465542Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8465798Z primals_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8465995Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466212Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466435Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8466642Z primals_5 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8466880Z primals_6 = rand_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8467076Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:05.8467082Z 2023-01-11T21:38:05.8467362Z [2023-01-11 21:25:03,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 40 2023-01-11T21:38:05.8467368Z 2023-01-11T21:38:05.8467502Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8467595Z import torch 2023-01-11T21:38:05.8467703Z import random 2023-01-11T21:38:05.8467884Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8468023Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8468028Z 2023-01-11T21:38:05.8468127Z aten = torch.ops.aten 2023-01-11T21:38:05.8468281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8468395Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8468400Z 2023-01-11T21:38:05.8468486Z import triton 2023-01-11T21:38:05.8468571Z import triton.language as tl 2023-01-11T21:38:05.8468735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8468891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8468896Z 2023-01-11T21:38:05.8468900Z 2023-01-11T21:38:05.8469058Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8469282Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8469422Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8469551Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8469674Z const float* __restrict__ in_ptr2, 
2023-01-11T21:38:05.8469781Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.8469913Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:05.8470046Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8470210Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8470323Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8470437Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8470517Z { 2023-01-11T21:38:05.8470614Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8470696Z { 2023-01-11T21:38:05.8470802Z #pragma omp for 2023-01-11T21:38:05.8470907Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8471003Z { 2023-01-11T21:38:05.8471164Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8471277Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8471337Z } 2023-01-11T21:38:05.8471452Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8471554Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8471674Z { 2023-01-11T21:38:05.8471781Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8471886Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8472027Z } 2023-01-11T21:38:05.8472101Z #pragma omp for 2023-01-11T21:38:05.8472203Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.8472285Z { 2023-01-11T21:38:05.8472435Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8472551Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8472637Z } 2023-01-11T21:38:05.8472750Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8472833Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.8472914Z { 2023-01-11T21:38:05.8473030Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8473132Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.8473217Z } 2023-01-11T21:38:05.8473333Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8473436Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8473496Z { 2023-01-11T21:38:05.8473600Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.8473681Z { 2023-01-11T21:38:05.8473794Z for(long i2=0; i2<32; i2+=1) 2023-01-11T21:38:05.8473935Z { 2023-01-11T21:38:05.8474112Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + (8*i2) + (256*i1) + (2560*i0)); 2023-01-11T21:38:05.8474269Z auto tmp1 = at::vec::Vectorized<float>(out_ptr0[i1]); 2023-01-11T21:38:05.8474394Z auto tmp3 = at::vec::Vectorized<float>(out_ptr1[i1]); 2023-01-11T21:38:05.8474540Z auto tmp11 = at::vec::Vectorized<float>(in_ptr3[i1]); 2023-01-11T21:38:05.8474717Z auto tmp13 = at::vec::Vectorized<float>(in_ptr4[i1]); 2023-01-11T21:38:05.8474880Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8475113Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1e-05)); 2023-01-11T21:38:05.8475248Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8475404Z auto tmp6 = tmp5.sqrt(); 2023-01-11T21:38:05.8475527Z auto tmp7 = tmp6.reciprocal(); 2023-01-11T21:38:05.8475663Z auto tmp8 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8475773Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8475883Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8475996Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8476115Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8476271Z auto tmp15 = at::vec::clamp_min(tmp14, decltype(tmp14)(0)); 2023-01-11T21:38:05.8476460Z tmp15.store(out_ptr2 + (8*i2) + (256*i1) + (2560*i0)); 2023-01-11T21:38:05.8476556Z } 2023-01-11T21:38:05.8476651Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8476765Z for(long i2=256; i2<256; i2+=1) 2023-01-11T21:38:05.8476848Z { 2023-01-11T21:38:05.8476975Z auto tmp0 = in_ptr2[i2 + (256*i1) + (2560*i0)]; 2023-01-11T21:38:05.8477090Z auto tmp1 = out_ptr0[i1]; 2023-01-11T21:38:05.8477202Z auto tmp3 = out_ptr1[i1]; 2023-01-11T21:38:05.8477317Z auto tmp11 = in_ptr3[i1]; 2023-01-11T21:38:05.8477405Z auto tmp13 = in_ptr4[i1]; 2023-01-11T21:38:05.8477560Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8477751Z auto tmp4 = static_cast<float>(1e-05); 2023-01-11T21:38:05.8477863Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.8477983Z auto tmp6 = std::sqrt(tmp5); 2023-01-11T21:38:05.8478088Z auto tmp7 = 1 / tmp6; 2023-01-11T21:38:05.8478212Z auto tmp8 = static_cast<float>(1); 2023-01-11T21:38:05.8478301Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.8478450Z auto tmp10 = tmp2 * tmp9; 2023-01-11T21:38:05.8478604Z auto tmp12 = tmp10 * tmp11; 2023-01-11T21:38:05.8478715Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:05.8478847Z auto tmp15 = tmp14 * (tmp14>0); 2023-01-11T21:38:05.8478971Z out_ptr2[i2 + (256*i1) + (2560*i0)] = tmp15; 2023-01-11T21:38:05.8479056Z } 2023-01-11T21:38:05.8479116Z } 2023-01-11T21:38:05.8479196Z } 2023-01-11T21:38:05.8479300Z #pragma omp for 2023-01-11T21:38:05.8479410Z for(long i0=0; i0<7680; i0+=1) 2023-01-11T21:38:05.8479497Z { 2023-01-11T21:38:05.8479578Z { 2023-01-11T21:38:05.8479675Z { 2023-01-11T21:38:05.8479765Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:05.8479890Z auto tmp1 = static_cast<float>(0); 2023-01-11T21:38:05.8480000Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8480116Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8480200Z } 2023-01-11T21:38:05.8480322Z } 2023-01-11T21:38:05.8480404Z } 2023-01-11T21:38:05.8480463Z } 2023-01-11T21:38:05.8480541Z } 2023-01-11T21:38:05.8480656Z ''') 2023-01-11T21:38:05.8480662Z 2023-01-11T21:38:05.8480666Z 2023-01-11T21:38:05.8480775Z async_compile.wait(globals()) 2023-01-11T21:38:05.8480874Z del async_compile 2023-01-11T21:38:05.8480879Z 2023-01-11T21:38:05.8480972Z def call(args): 2023-01-11T21:38:05.8481131Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:05.8481235Z args.clear() 2023-01-11T21:38:05.8481444Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8481650Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8481884Z buf2 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8482138Z buf3 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8482482Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_4.data_ptr()), c_void_p(primals_6.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8482580Z del primals_1 2023-01-11T21:38:05.8482671Z del primals_2 2023-01-11T21:38:05.8482739Z del primals_3 2023-01-11T21:38:05.8482869Z del primals_4 2023-01-11T21:38:05.8483009Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:05.8483021Z 2023-01-11T21:38:05.8483026Z 2023-01-11T21:38:05.8483120Z if __name__ == "__main__": 2023-01-11T21:38:05.8483261Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8483417Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8483640Z primals_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8483853Z primals_2 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484042Z primals_3 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484252Z primals_4 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484458Z primals_5 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8484702Z primals_6 = rand_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8484904Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:05.8484912Z 2023-01-11T21:38:05.8485003Z ok (4.235s) 2023-01-11T21:38:05.8485527Z test_bernoulli1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8485677Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8485953Z [2023-01-11 21:25:03,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 41 2023-01-11T21:38:05.8486270Z [2023-01-11 21:25:05,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 41 2023-01-11T21:38:05.8486276Z 2023-01-11T21:38:05.8486372Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8486463Z import torch 2023-01-11T21:38:05.8486560Z import random 2023-01-11T21:38:05.8486696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8486835Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8486840Z 2023-01-11T21:38:05.8486959Z aten = torch.ops.aten 2023-01-11T21:38:05.8487111Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8487225Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8487230Z 2023-01-11T21:38:05.8487297Z import triton 2023-01-11T21:38:05.8487404Z import triton.language as tl 2023-01-11T21:38:05.8487560Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8487718Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8487723Z 2023-01-11T21:38:05.8487727Z 2023-01-11T21:38:05.8487882Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8488109Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8488291Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8488371Z { 2023-01-11T21:38:05.8488467Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8488596Z { 2023-01-11T21:38:05.8488692Z #pragma omp for 2023-01-11T21:38:05.8488796Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.8488879Z { 2023-01-11T21:38:05.8489036Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:05.8489147Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8489208Z } 2023-01-11T21:38:05.8489334Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8489445Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:05.8489529Z { 2023-01-11T21:38:05.8489646Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:05.8489747Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.8489807Z } 2023-01-11T21:38:05.8489890Z } 2023-01-11T21:38:05.8489968Z } 2023-01-11T21:38:05.8490067Z ''') 2023-01-11T21:38:05.8490072Z 2023-01-11T21:38:05.8490076Z 2023-01-11T21:38:05.8490207Z async_compile.wait(globals()) 2023-01-11T21:38:05.8490300Z del async_compile 2023-01-11T21:38:05.8490305Z 2023-01-11T21:38:05.8490435Z def call(args):
2023-01-11T21:38:05.8490527Z arg0_1, = args 2023-01-11T21:38:05.8490595Z args.clear() 2023-01-11T21:38:05.8490803Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8490926Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8491029Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:05.8491132Z return (buf0, buf0, ) 2023-01-11T21:38:05.8491137Z 2023-01-11T21:38:05.8491141Z 2023-01-11T21:38:05.8491248Z if __name__ == "__main__": 2023-01-11T21:38:05.8491381Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8491502Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8491715Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8491845Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8491850Z 2023-01-11T21:38:05.8491934Z ok (1.795s) 2023-01-11T21:38:05.8492434Z test_bernoulli2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8492593Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8492868Z [2023-01-11 21:25:05,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 42 2023-01-11T21:38:05.8493155Z [2023-01-11 21:25:05,246] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:05.8493477Z [2023-01-11 21:25:06,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 42 2023-01-11T21:38:05.8493483Z 2023-01-11T21:38:05.8493600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8493670Z import torch 2023-01-11T21:38:05.8493765Z import random 2023-01-11T21:38:05.8493905Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8494054Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8494060Z 2023-01-11T21:38:05.8494158Z aten = torch.ops.aten 2023-01-11T21:38:05.8494310Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8494433Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8494439Z 2023-01-11T21:38:05.8494633Z import triton 2023-01-11T21:38:05.8494747Z import triton.language as tl 2023-01-11T21:38:05.8494894Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8495052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8495295Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:05.8495301Z 2023-01-11T21:38:05.8495305Z 2023-01-11T21:38:05.8495459Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8495683Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8495822Z extern "C" void kernel(const long* __restrict__ seed0, 2023-01-11T21:38:05.8495926Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8496104Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.8496183Z { 2023-01-11T21:38:05.8496305Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8496389Z { 2023-01-11T21:38:05.8496485Z #pragma omp for 2023-01-11T21:38:05.8496587Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8496647Z { 
2023-01-11T21:38:05.8496727Z { 2023-01-11T21:38:05.8496811Z { 2023-01-11T21:38:05.8496932Z auto tmp0 = seed0[0]; 2023-01-11T21:38:05.8497054Z auto tmp5 = in_ptr1[i0]; 2023-01-11T21:38:05.8497304Z auto tmp1 = static_cast<long>(65535); 2023-01-11T21:38:05.8497419Z auto tmp2 = tmp0 ^ tmp1; 2023-01-11T21:38:05.8497523Z auto tmp3 = static_cast<long>(i0); 2023-01-11T21:38:05.8497683Z auto tmp4 = static_cast<float>(normalized_rand_cpu(tmp2, tmp3)); 2023-01-11T21:38:05.8497798Z auto tmp6 = tmp4 < tmp5; 2023-01-11T21:38:05.8497902Z out_ptr0[i0] = tmp6; 2023-01-11T21:38:05.8498027Z } 2023-01-11T21:38:05.8498127Z } 2023-01-11T21:38:05.8498208Z } 2023-01-11T21:38:05.8498270Z } 2023-01-11T21:38:05.8498349Z } 2023-01-11T21:38:05.8498452Z ''') 2023-01-11T21:38:05.8498458Z 2023-01-11T21:38:05.8498462Z 2023-01-11T21:38:05.8498572Z async_compile.wait(globals()) 2023-01-11T21:38:05.8498662Z del async_compile 2023-01-11T21:38:05.8498672Z 2023-01-11T21:38:05.8498759Z def call(args): 2023-01-11T21:38:05.8498855Z arg0_1, = args 2023-01-11T21:38:05.8498926Z args.clear() 2023-01-11T21:38:05.8499095Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None) 2023-01-11T21:38:05.8499339Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8499539Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8499627Z del arg0_1 2023-01-11T21:38:05.8499718Z return (buf0, ) 2023-01-11T21:38:05.8499724Z 2023-01-11T21:38:05.8499728Z 2023-01-11T21:38:05.8499822Z if __name__ == "__main__": 2023-01-11T21:38:05.8500031Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8500152Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8500366Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8500590Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8500721Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8500726Z 2023-01-11T21:38:05.8500810Z ok (1.761s) 2023-01-11T21:38:05.8501279Z test_bitwise2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8501434Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8501711Z [2023-01-11 21:25:07,003] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 43 2023-01-11T21:38:05.8501988Z [2023-01-11 21:25:08,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 43 2023-01-11T21:38:05.8502025Z 2023-01-11T21:38:05.8502139Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8502206Z import torch 2023-01-11T21:38:05.8502317Z import random 2023-01-11T21:38:05.8502452Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8502589Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8502597Z 2023-01-11T21:38:05.8502703Z aten = torch.ops.aten 2023-01-11T21:38:05.8502855Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8503006Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8503012Z 2023-01-11T21:38:05.8503079Z import triton 2023-01-11T21:38:05.8503186Z import triton.language as tl 2023-01-11T21:38:05.8503335Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8503502Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8503508Z 2023-01-11T21:38:05.8503512Z 2023-01-11T21:38:05.8503667Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8503901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8504041Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8504167Z const bool* __restrict__ in_ptr1, 2023-01-11T21:38:05.8504264Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8504387Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8504505Z bool* __restrict__ out_ptr2, 2023-01-11T21:38:05.8504618Z bool* __restrict__ out_ptr3) 2023-01-11T21:38:05.8504714Z { 2023-01-11T21:38:05.8504839Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8504919Z { 2023-01-11T21:38:05.8504995Z #pragma omp for 2023-01-11T21:38:05.8505098Z for(long i0=0; i0<40; i0+=1) 2023-01-11T21:38:05.8505221Z { 2023-01-11T21:38:05.8505307Z { 2023-01-11T21:38:05.8505393Z { 2023-01-11T21:38:05.8505510Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8505645Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8505733Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:05.8505844Z auto tmp3 = tmp0 | tmp2; 2023-01-11T21:38:05.8505957Z auto tmp4 = tmp0 ^ tmp2; 2023-01-11T21:38:05.8506108Z auto tmp5 = tmp0 & tmp2; 2023-01-11T21:38:05.8506216Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8506320Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8506435Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8506517Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8506601Z } 2023-01-11T21:38:05.8506696Z } 2023-01-11T21:38:05.8506782Z } 2023-01-11T21:38:05.8506866Z } 2023-01-11T21:38:05.8506993Z } 2023-01-11T21:38:05.8507073Z ''') 2023-01-11T21:38:05.8507100Z 2023-01-11T21:38:05.8507105Z 2023-01-11T21:38:05.8507192Z async_compile.wait(globals()) 2023-01-11T21:38:05.8507299Z del async_compile 2023-01-11T21:38:05.8507304Z 2023-01-11T21:38:05.8507396Z def call(args): 2023-01-11T21:38:05.8507492Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8507596Z args.clear() 2023-01-11T21:38:05.8507809Z buf0 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508019Z buf1 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 
2023-01-11T21:38:05.8508198Z buf2 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508400Z buf3 = empty_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8508661Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8508751Z del arg0_1 2023-01-11T21:38:05.8508838Z del arg1_1 2023-01-11T21:38:05.8508948Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8508982Z 2023-01-11T21:38:05.8508987Z 2023-01-11T21:38:05.8509097Z if __name__ == "__main__": 2023-01-11T21:38:05.8509230Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8509351Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8509601Z arg0_1 = rand_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8509817Z arg1_1 = rand_strided((2, 20), (20, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8509954Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8509959Z 2023-01-11T21:38:05.8510046Z ok (1.767s) 2023-01-11T21:38:05.8510516Z test_bitwise_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8510670Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8510962Z [2023-01-11 21:25:08,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 44 2023-01-11T21:38:05.8511246Z [2023-01-11 21:25:10,632] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 44 2023-01-11T21:38:05.8511252Z 2023-01-11T21:38:05.8511378Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8511448Z import torch 2023-01-11T21:38:05.8511539Z import random 2023-01-11T21:38:05.8511675Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8511819Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8511824Z 2023-01-11T21:38:05.8511923Z aten = torch.ops.aten 2023-01-11T21:38:05.8512076Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8512205Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8512214Z 2023-01-11T21:38:05.8512284Z import triton 2023-01-11T21:38:05.8512400Z import triton.language as tl 2023-01-11T21:38:05.8512586Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8512742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8512747Z 2023-01-11T21:38:05.8512752Z 2023-01-11T21:38:05.8512939Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8513163Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8513300Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.8513429Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:05.8513526Z int* __restrict__ out_ptr0, 2023-01-11T21:38:05.8513655Z int* __restrict__ out_ptr1, 2023-01-11T21:38:05.8513771Z int* __restrict__ out_ptr2, 2023-01-11T21:38:05.8513886Z int* __restrict__ out_ptr3) 2023-01-11T21:38:05.8513973Z { 2023-01-11T21:38:05.8514093Z #pragma omp 
parallel num_threads(8) 2023-01-11T21:38:05.8514176Z { 2023-01-11T21:38:05.8514253Z #pragma omp for 2023-01-11T21:38:05.8514362Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8514445Z { 2023-01-11T21:38:05.8514543Z { 2023-01-11T21:38:05.8514669Z { 2023-01-11T21:38:05.8514784Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8514899Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8514983Z auto tmp1 = ~tmp0; 2023-01-11T21:38:05.8515097Z auto tmp3 = tmp0 | tmp2; 2023-01-11T21:38:05.8515216Z auto tmp4 = tmp0 ^ tmp2; 2023-01-11T21:38:05.8515327Z auto tmp5 = tmp0 & tmp2; 2023-01-11T21:38:05.8515433Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8515550Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8515645Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8535206Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8535308Z } 2023-01-11T21:38:05.8535384Z } 2023-01-11T21:38:05.8535449Z } 2023-01-11T21:38:05.8535522Z } 2023-01-11T21:38:05.8535590Z } 2023-01-11T21:38:05.8535705Z ''') 2023-01-11T21:38:05.8535711Z 2023-01-11T21:38:05.8535720Z 2023-01-11T21:38:05.8535833Z async_compile.wait(globals()) 2023-01-11T21:38:05.8535916Z del async_compile 2023-01-11T21:38:05.8535922Z 2023-01-11T21:38:05.8536001Z def call(args): 2023-01-11T21:38:05.8536078Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8536159Z args.clear() 2023-01-11T21:38:05.8536388Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536583Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536776Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8536965Z buf3 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8537272Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.8537362Z del arg0_1 2023-01-11T21:38:05.8537445Z del arg1_1 2023-01-11T21:38:05.8537543Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:05.8537549Z 2023-01-11T21:38:05.8537553Z 2023-01-11T21:38:05.8537637Z if __name__ == "__main__": 2023-01-11T21:38:05.8537757Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8537885Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8538079Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8538270Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.8538391Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8538397Z 2023-01-11T21:38:05.8538464Z ok (1.884s) 2023-01-11T21:38:05.8539010Z test_bmm1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8539147Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8539413Z [2023-01-11 21:25:10,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 45 2023-01-11T21:38:05.8539675Z [2023-01-11 21:25:12,563] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 45 2023-01-11T21:38:05.8540090Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8540224Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8540479Z [2023-01-11 21:25:12,586] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 46 2023-01-11T21:38:05.8540485Z 2023-01-11T21:38:05.8540582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8540657Z import torch 2023-01-11T21:38:05.8540734Z import random 2023-01-11T21:38:05.8540847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8540969Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8540974Z 2023-01-11T21:38:05.8541056Z aten = torch.ops.aten 2023-01-11T21:38:05.8541197Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8541293Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8541341Z 2023-01-11T21:38:05.8541419Z import triton 2023-01-11T21:38:05.8541515Z import triton.language as tl 2023-01-11T21:38:05.8541633Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8541773Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8541778Z 2023-01-11T21:38:05.8541783Z 2023-01-11T21:38:05.8541925Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8542132Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8542259Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8542368Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8542472Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8542575Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8542634Z { 2023-01-11T21:38:05.8542737Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8542805Z { 2023-01-11T21:38:05.8542891Z #pragma omp for 2023-01-11T21:38:05.8542980Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8543048Z { 2023-01-11T21:38:05.8543199Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8543331Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8543428Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8543524Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8543593Z } 2023-01-11T21:38:05.8543695Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8543785Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8543852Z { 2023-01-11T21:38:05.8543934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8544038Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8544127Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8544215Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8544283Z } 2023-01-11T21:38:05.8544371Z #pragma omp for 2023-01-11T21:38:05.8544457Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8544516Z { 2023-01-11T21:38:05.8544650Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8544786Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8544914Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8545010Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8545078Z } 2023-01-11T21:38:05.8545180Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8545261Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8545329Z { 2023-01-11T21:38:05.8545420Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8545524Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8545614Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8545701Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8545770Z } 2023-01-11T21:38:05.8545831Z } 2023-01-11T21:38:05.8545897Z } 2023-01-11T21:38:05.8545984Z ''') 2023-01-11T21:38:05.8545990Z 2023-01-11T21:38:05.8545994Z 2023-01-11T21:38:05.8546131Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8546336Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8546462Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8546528Z { 2023-01-11T21:38:05.8546622Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8546690Z { 2023-01-11T21:38:05.8546771Z #pragma omp for 2023-01-11T21:38:05.8546858Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8546927Z { 2023-01-11T21:38:05.8547072Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8547206Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8547296Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8547423Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8547493Z } 2023-01-11T21:38:05.8547595Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8547685Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8547754Z { 2023-01-11T21:38:05.8547851Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8547953Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8548045Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8548136Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8548206Z } 2023-01-11T21:38:05.8548275Z } 2023-01-11T21:38:05.8548341Z } 2023-01-11T21:38:05.8548428Z ''') 2023-01-11T21:38:05.8548434Z 2023-01-11T21:38:05.8548438Z 2023-01-11T21:38:05.8548535Z async_compile.wait(globals()) 2023-01-11T21:38:05.8548609Z del async_compile 2023-01-11T21:38:05.8548614Z 2023-01-11T21:38:05.8548692Z def call(args): 2023-01-11T21:38:05.8548774Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8548853Z args.clear() 2023-01-11T21:38:05.8549063Z buf0 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549171Z aten.bmm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:05.8549377Z buf1 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549571Z buf2 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8549771Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8549848Z del arg0_1 2023-01-11T21:38:05.8549926Z del arg1_1 2023-01-11T21:38:05.8550128Z buf3 = empty_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8550226Z aten.bmm.out(buf1, buf2, out=buf3) 2023-01-11T21:38:05.8550298Z del buf1 2023-01-11T21:38:05.8550363Z del buf2 2023-01-11T21:38:05.8550457Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8550571Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8550659Z return (buf0, buf4, ) 2023-01-11T21:38:05.8550664Z 2023-01-11T21:38:05.8550668Z 2023-01-11T21:38:05.8550753Z if __name__ == "__main__": 2023-01-11T21:38:05.8550873Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8551003Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8551255Z arg0_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8551446Z arg1_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8551568Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8551574Z 2023-01-11T21:38:05.8551578Z 2023-01-11T21:38:05.8551676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8551750Z import torch 2023-01-11T21:38:05.8551829Z import random 2023-01-11T21:38:05.8551949Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8552073Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8552081Z 2023-01-11T21:38:05.8552164Z aten = torch.ops.aten 2023-01-11T21:38:05.8552293Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8552389Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8552394Z 2023-01-11T21:38:05.8552469Z import triton 2023-01-11T21:38:05.8552564Z import triton.language as tl 2023-01-11T21:38:05.8552691Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8552830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8552836Z 2023-01-11T21:38:05.8552840Z 2023-01-11T21:38:05.8552974Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8553175Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8553292Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8553401Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8553505Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8553638Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8553706Z { 2023-01-11T21:38:05.8553809Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8553879Z { 2023-01-11T21:38:05.8553954Z #pragma omp for 2023-01-11T21:38:05.8554044Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8554112Z { 2023-01-11T21:38:05.8554251Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8554388Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8554479Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8554576Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8554635Z } 2023-01-11T21:38:05.8554735Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8554823Z for(long i0=128; i0<128; i0+=1) 2023-01-11T21:38:05.8554890Z { 2023-01-11T21:38:05.8554978Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8555087Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.8555177Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8555255Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8555324Z } 2023-01-11T21:38:05.8555405Z #pragma omp for 2023-01-11T21:38:05.8555495Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.8555562Z { 2023-01-11T21:38:05.8555699Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8555836Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8555918Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8556015Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8556084Z } 2023-01-11T21:38:05.8556182Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8556274Z for(long i0=80; i0<80; i0+=1) 2023-01-11T21:38:05.8556345Z { 2023-01-11T21:38:05.8556434Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:05.8556534Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.8556622Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8556708Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8556775Z } 2023-01-11T21:38:05.8556842Z } 2023-01-11T21:38:05.8556908Z } 2023-01-11T21:38:05.8557025Z ''') 2023-01-11T21:38:05.8557031Z 2023-01-11T21:38:05.8557035Z 2023-01-11T21:38:05.8557162Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.8557370Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8557489Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.8557554Z { 2023-01-11T21:38:05.8557658Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8557726Z { 2023-01-11T21:38:05.8557807Z #pragma omp for 2023-01-11T21:38:05.8557886Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.8557954Z { 2023-01-11T21:38:05.8558102Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8558240Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.8558329Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8558432Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.8558503Z } 2023-01-11T21:38:05.8558595Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8558683Z for(long i0=160; i0<160; i0+=1) 2023-01-11T21:38:05.8558749Z { 2023-01-11T21:38:05.8558843Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8558948Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.8559036Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8559123Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8559183Z } 2023-01-11T21:38:05.8559250Z } 2023-01-11T21:38:05.8559315Z } 2023-01-11T21:38:05.8559405Z ''') 2023-01-11T21:38:05.8559410Z 2023-01-11T21:38:05.8559444Z 2023-01-11T21:38:05.8559539Z async_compile.wait(globals()) 2023-01-11T21:38:05.8559617Z del async_compile 2023-01-11T21:38:05.8559623Z 2023-01-11T21:38:05.8559698Z def call(args): 2023-01-11T21:38:05.8559770Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8559846Z args.clear() 2023-01-11T21:38:05.8560055Z buf0 = empty_strided((1, 16, 10), (160, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560231Z aten.mm.out(as_strided(arg0_1, (16, 8), (8, 1)), as_strided(arg1_1, (8, 10), (10, 1)), out=as_strided(buf0, (16, 10), (10, 1))) 2023-01-11T21:38:05.8560438Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560638Z buf2 = empty_strided((1, 8, 10), (80, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8560834Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8560906Z del arg0_1 2023-01-11T21:38:05.8560971Z del arg1_1 2023-01-11T21:38:05.8561181Z buf3 = empty_strided((1, 16, 10), (160, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8561350Z aten.mm.out(as_strided(buf1, (16, 8), (8, 1)), as_strided(buf2, (8, 10), (10, 1)), out=as_strided(buf3, (16, 10), (10, 1))) 2023-01-11T21:38:05.8561422Z del buf1 2023-01-11T21:38:05.8561493Z del buf2 2023-01-11T21:38:05.8561585Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:05.8561692Z kernel_cpp_1(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8561766Z return (buf0, buf4, ) 2023-01-11T21:38:05.8561779Z 2023-01-11T21:38:05.8561783Z 2023-01-11T21:38:05.8561857Z if __name__ == "__main__": 
2023-01-11T21:38:05.8561978Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8562110Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8562315Z arg0_1 = rand_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8562516Z arg1_1 = rand_strided((1, 8, 10), (80, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8562638Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8562643Z 2023-01-11T21:38:05.8562908Z [2023-01-11 21:25:14,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 46 2023-01-11T21:38:05.8562984Z ok (3.816s) 2023-01-11T21:38:05.8563460Z test_bmm2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8563587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8563841Z [2023-01-11 21:25:14,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 47 2023-01-11T21:38:05.8564107Z [2023-01-11 21:25:14,470] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 47 2023-01-11T21:38:05.8564112Z 2023-01-11T21:38:05.8564215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8564289Z import torch 2023-01-11T21:38:05.8564363Z import random 2023-01-11T21:38:05.8564484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8564609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8564615Z 2023-01-11T21:38:05.8564690Z aten = torch.ops.aten 2023-01-11T21:38:05.8564827Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8564925Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8564930Z 2023-01-11T21:38:05.8565004Z import triton 2023-01-11T21:38:05.8565096Z import triton.language as tl 2023-01-11T21:38:05.8565225Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8565367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8565401Z 2023-01-11T21:38:05.8565406Z 2023-01-11T21:38:05.8565502Z async_compile.wait(globals()) 2023-01-11T21:38:05.8565572Z del async_compile 2023-01-11T21:38:05.8565577Z 2023-01-11T21:38:05.8565652Z def call(args): 2023-01-11T21:38:05.8565733Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8565808Z args.clear() 2023-01-11T21:38:05.8566017Z buf0 = empty_strided((1, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8566184Z aten.mm.out(as_strided(arg0_1, (8, 8), (1, 8)), as_strided(arg1_1, (8, 8), (8, 1)), out=as_strided(buf0, (8, 8), (8, 1))) 2023-01-11T21:38:05.8566258Z del arg0_1 2023-01-11T21:38:05.8566322Z del arg1_1 2023-01-11T21:38:05.8566398Z return (buf0, ) 2023-01-11T21:38:05.8566403Z 2023-01-11T21:38:05.8566408Z 2023-01-11T21:38:05.8566490Z if __name__ == "__main__": 2023-01-11T21:38:05.8566607Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8566736Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8566944Z arg0_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8567142Z arg1_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.8567262Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8567267Z 2023-01-11T21:38:05.8567340Z ok (0.021s) 2023-01-11T21:38:05.8567782Z test_bool_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8567917Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8568172Z [2023-01-11 21:25:14,504] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 48 2023-01-11T21:38:05.8568433Z [2023-01-11 21:25:16,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 48 2023-01-11T21:38:05.8568439Z 2023-01-11T21:38:05.8568537Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8568611Z import torch 2023-01-11T21:38:05.8568686Z import random 2023-01-11T21:38:05.8568834Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8568961Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8568966Z 2023-01-11T21:38:05.8569042Z aten = torch.ops.aten 2023-01-11T21:38:05.8569177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8569275Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8569280Z 2023-01-11T21:38:05.8569354Z import triton 2023-01-11T21:38:05.8569446Z import triton.language as tl 2023-01-11T21:38:05.8569569Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8569711Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8569719Z 2023-01-11T21:38:05.8569724Z 2023-01-11T21:38:05.8569864Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8570063Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8570190Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8570303Z const bool* __restrict__ in_ptr1, 2023-01-11T21:38:05.8570408Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.8570508Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8570607Z bool* __restrict__ out_ptr2, 2023-01-11T21:38:05.8570704Z bool* __restrict__ out_ptr3, 2023-01-11T21:38:05.8570793Z bool* __restrict__ out_ptr4, 2023-01-11T21:38:05.8570890Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:05.8570987Z bool* __restrict__ out_ptr6, 2023-01-11T21:38:05.8571117Z bool* __restrict__ out_ptr7, 2023-01-11T21:38:05.8571219Z bool* __restrict__ out_ptr8) 2023-01-11T21:38:05.8571286Z { 2023-01-11T21:38:05.8571369Z #pragma GCC ivdep 2023-01-11T21:38:05.8571448Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.8571515Z { 2023-01-11T21:38:05.8571586Z { 2023-01-11T21:38:05.8571659Z { 2023-01-11T21:38:05.8571755Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8571848Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.8571941Z auto tmp2 = tmp0 || tmp1; 2023-01-11T21:38:05.8572028Z auto tmp3 = tmp0 && tmp1; 2023-01-11T21:38:05.8572118Z auto tmp4 = tmp0 & tmp1; 2023-01-11T21:38:05.8572210Z auto tmp5 = tmp0 | tmp1; 2023-01-11T21:38:05.8572302Z auto tmp6 = tmp0 ^ tmp1; 2023-01-11T21:38:05.8572393Z auto tmp7 = tmp0 == 0; 2023-01-11T21:38:05.8572483Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8572572Z out_ptr1[i0] = tmp3; 
2023-01-11T21:38:05.8572650Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:05.8572734Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:05.8572821Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:05.8572905Z out_ptr5[i0] = tmp3; 2023-01-11T21:38:05.8572993Z out_ptr6[i0] = tmp2; 2023-01-11T21:38:05.8573077Z out_ptr7[i0] = tmp7; 2023-01-11T21:38:05.8573160Z out_ptr8[i0] = tmp1; 2023-01-11T21:38:05.8573221Z } 2023-01-11T21:38:05.8573288Z } 2023-01-11T21:38:05.8573354Z } 2023-01-11T21:38:05.8573420Z } 2023-01-11T21:38:05.8573509Z ''') 2023-01-11T21:38:05.8573514Z 2023-01-11T21:38:05.8573519Z 2023-01-11T21:38:05.8573620Z async_compile.wait(globals()) 2023-01-11T21:38:05.8573700Z del async_compile 2023-01-11T21:38:05.8573705Z 2023-01-11T21:38:05.8573772Z def call(args): 2023-01-11T21:38:05.8573853Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8573932Z args.clear() 2023-01-11T21:38:05.8574121Z buf0 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574306Z buf1 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574620Z buf2 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8574897Z buf3 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575074Z buf4 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575255Z buf5 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575436Z buf6 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575614Z buf7 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8575792Z buf8 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8576140Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.8576220Z del arg0_1 2023-01-11T21:38:05.8576293Z del arg1_1 2023-01-11T21:38:05.8576420Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:05.8576426Z 2023-01-11T21:38:05.8576430Z 2023-01-11T21:38:05.8576504Z if __name__ == "__main__": 2023-01-11T21:38:05.8576625Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8576752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8576939Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8577176Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8577307Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8577350Z 2023-01-11T21:38:05.8577423Z ok (1.800s) 2023-01-11T21:38:05.8577748Z test_both_scalars_cpu (__main__.CpuTests) ... 
[2023-01-11 21:25:16,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 49 2023-01-11T21:38:05.8578008Z [2023-01-11 21:25:18,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 49 2023-01-11T21:38:05.8578020Z 2023-01-11T21:38:05.8578111Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8578187Z import torch 2023-01-11T21:38:05.8578261Z import random 2023-01-11T21:38:05.8578380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8578505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8578510Z 2023-01-11T21:38:05.8578592Z aten = torch.ops.aten 2023-01-11T21:38:05.8578727Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8578815Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8578822Z 2023-01-11T21:38:05.8578896Z import triton 2023-01-11T21:38:05.8578993Z import triton.language as tl 2023-01-11T21:38:05.8579118Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8579258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8579263Z 2023-01-11T21:38:05.8579267Z 2023-01-11T21:38:05.8579405Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8579615Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8579732Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8579826Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8579927Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8580027Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8580126Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8580225Z float* __restrict__ out_ptr5) 2023-01-11T21:38:05.8580291Z { 2023-01-11T21:38:05.8580360Z { 2023-01-11T21:38:05.8580420Z { 2023-01-11T21:38:05.8580525Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8580634Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8580723Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8580808Z out_ptr0[0] = tmp2; 2023-01-11T21:38:05.8580907Z } 2023-01-11T21:38:05.8580978Z } 2023-01-11T21:38:05.8581036Z { 2023-01-11T21:38:05.8581102Z { 2023-01-11T21:38:05.8581210Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8581316Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8581405Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8581488Z out_ptr1[0] = tmp2; 2023-01-11T21:38:05.8581548Z } 2023-01-11T21:38:05.8581614Z } 2023-01-11T21:38:05.8581680Z { 2023-01-11T21:38:05.8581749Z { 2023-01-11T21:38:05.8581855Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8581965Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8582095Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8582171Z out_ptr2[0] = tmp2; 2023-01-11T21:38:05.8582238Z } 2023-01-11T21:38:05.8582304Z } 2023-01-11T21:38:05.8582371Z { 2023-01-11T21:38:05.8582439Z { 2023-01-11T21:38:05.8582550Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8582654Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8582774Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8582858Z out_ptr3[0] = tmp2; 2023-01-11T21:38:05.8582923Z } 2023-01-11T21:38:05.8582989Z } 2023-01-11T21:38:05.8583060Z { 2023-01-11T21:38:05.8583129Z { 2023-01-11T21:38:05.8583224Z auto tmp0 = static_cast(4); 2023-01-11T21:38:05.8583331Z auto tmp1 = static_cast(3.3); 2023-01-11T21:38:05.8583422Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8583504Z out_ptr4[0] = tmp2; 2023-01-11T21:38:05.8583608Z } 2023-01-11T21:38:05.8583678Z } 2023-01-11T21:38:05.8583747Z { 
2023-01-11T21:38:05.8583806Z { 2023-01-11T21:38:05.8583910Z auto tmp0 = static_cast(3.3); 2023-01-11T21:38:05.8584012Z auto tmp1 = static_cast(4); 2023-01-11T21:38:05.8584104Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8584190Z out_ptr5[0] = tmp2; 2023-01-11T21:38:05.8584256Z } 2023-01-11T21:38:05.8584322Z } 2023-01-11T21:38:05.8584381Z } 2023-01-11T21:38:05.8584466Z ''') 2023-01-11T21:38:05.8584472Z 2023-01-11T21:38:05.8584476Z 2023-01-11T21:38:05.8584572Z async_compile.wait(globals()) 2023-01-11T21:38:05.8584649Z del async_compile 2023-01-11T21:38:05.8584654Z 2023-01-11T21:38:05.8584729Z def call(args): 2023-01-11T21:38:05.8584914Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585096Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585271Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585448Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585622Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8585799Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8586033Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.8586144Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:05.8586150Z 2023-01-11T21:38:05.8586155Z 2023-01-11T21:38:05.8586236Z if __name__ == "__main__": 2023-01-11T21:38:05.8586355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8586483Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8586578Z print_performance(lambda: call([])) 2023-01-11T21:38:05.8586585Z 2023-01-11T21:38:05.8586659Z ok (1.839s) 2023-01-11T21:38:05.8587139Z test_cat_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8587275Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8587534Z [2023-01-11 21:25:18,159] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 50 2023-01-11T21:38:05.8587794Z [2023-01-11 21:25:20,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 50 2023-01-11T21:38:05.8588205Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8588340Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8588598Z [2023-01-11 21:25:20,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 51 2023-01-11T21:38:05.8588603Z 2023-01-11T21:38:05.8588701Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8588777Z import torch 2023-01-11T21:38:05.8588845Z import random 2023-01-11T21:38:05.8588964Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8589088Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8589093Z 2023-01-11T21:38:05.8589176Z aten = torch.ops.aten 2023-01-11T21:38:05.8589311Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8589409Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8589446Z 2023-01-11T21:38:05.8589522Z import triton 2023-01-11T21:38:05.8589607Z import triton.language as tl 2023-01-11T21:38:05.8589734Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8589873Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8589878Z 2023-01-11T21:38:05.8589885Z 2023-01-11T21:38:05.8590023Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8590230Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8590356Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8590463Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8590563Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8590655Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8590757Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8590858Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8590962Z double* __restrict__ out_ptr5, 2023-01-11T21:38:05.8591063Z double* __restrict__ out_ptr6) 2023-01-11T21:38:05.8591130Z { 2023-01-11T21:38:05.8591233Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8591292Z { 2023-01-11T21:38:05.8591384Z #pragma omp for 2023-01-11T21:38:05.8591471Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8591537Z { 2023-01-11T21:38:05.8591628Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8591697Z { 2023-01-11T21:38:05.8591851Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (16*i0)); 2023-01-11T21:38:05.8591986Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8592081Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8592193Z tmp0.store(out_ptr0 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8592303Z tmp2.store(out_ptr1 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8592371Z } 2023-01-11T21:38:05.8592471Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8592563Z for(long i1=16; i1<16; i1+=1) 2023-01-11T21:38:05.8592623Z { 2023-01-11T21:38:05.8592767Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8592876Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8592970Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8593067Z out_ptr0[i1 + (36*i0)] = tmp0; 2023-01-11T21:38:05.8593163Z out_ptr1[i1 + (36*i0)] = tmp2; 2023-01-11T21:38:05.8593231Z } 2023-01-11T21:38:05.8593291Z } 2023-01-11T21:38:05.8593373Z #pragma omp for 2023-01-11T21:38:05.8593461Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8593529Z { 2023-01-11T21:38:05.8593614Z #pragma GCC ivdep 2023-01-11T21:38:05.8593701Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8593773Z { 2023-01-11T21:38:05.8593835Z { 2023-01-11T21:38:05.8593909Z { 
2023-01-11T21:38:05.8594021Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8594136Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8594241Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8594345Z out_ptr2[i1 + (36*i0)] = tmp2; 2023-01-11T21:38:05.8594416Z } 2023-01-11T21:38:05.8594477Z } 2023-01-11T21:38:05.8594546Z } 2023-01-11T21:38:05.8594613Z } 2023-01-11T21:38:05.8594697Z #pragma omp for 2023-01-11T21:38:05.8594785Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.8594851Z { 2023-01-11T21:38:05.8594919Z { 2023-01-11T21:38:05.8594980Z { 2023-01-11T21:38:05.8595078Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8595215Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8595313Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8595427Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:05.8595518Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8595608Z out_ptr4[i0] = tmp2; 2023-01-11T21:38:05.8595694Z out_ptr5[i0] = tmp3; 2023-01-11T21:38:05.8595783Z out_ptr6[i0] = tmp3; 2023-01-11T21:38:05.8595852Z } 2023-01-11T21:38:05.8595922Z } 2023-01-11T21:38:05.8595990Z } 2023-01-11T21:38:05.8596056Z } 2023-01-11T21:38:05.8596113Z } 2023-01-11T21:38:05.8596202Z ''') 2023-01-11T21:38:05.8596208Z 2023-01-11T21:38:05.8596212Z 2023-01-11T21:38:05.8596305Z async_compile.wait(globals()) 2023-01-11T21:38:05.8596381Z del async_compile 2023-01-11T21:38:05.8596386Z 2023-01-11T21:38:05.8596462Z def call(args): 2023-01-11T21:38:05.8596534Z arg0_1, = args 2023-01-11T21:38:05.8596614Z args.clear() 2023-01-11T21:38:05.8596818Z buf3 = empty_strided((8, 36), (36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8596918Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:05.8597027Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:05.8597138Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:05.8597337Z buf6 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8597442Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:05.8597551Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:05.8597751Z buf9 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8597848Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:05.8597956Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:05.8598235Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.8598312Z del arg0_1 2023-01-11T21:38:05.8598398Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:05.8598404Z 2023-01-11T21:38:05.8598435Z 2023-01-11T21:38:05.8598518Z if __name__ == "__main__": 2023-01-11T21:38:05.8598636Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8598766Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8598965Z arg0_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8599071Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8599076Z 2023-01-11T21:38:05.8599341Z [2023-01-11 21:25:22,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 51 2023-01-11T21:38:05.8599347Z 2023-01-11T21:38:05.8599445Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8599523Z import torch 2023-01-11T21:38:05.8599599Z import random 
2023-01-11T21:38:05.8599718Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8599843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8599848Z 2023-01-11T21:38:05.8599933Z aten = torch.ops.aten 2023-01-11T21:38:05.8600066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8600161Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8600166Z 2023-01-11T21:38:05.8600241Z import triton 2023-01-11T21:38:05.8600334Z import triton.language as tl 2023-01-11T21:38:05.8600460Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8600599Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8600605Z 2023-01-11T21:38:05.8600610Z 2023-01-11T21:38:05.8600746Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8600953Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8601102Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8601214Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8601324Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.8601439Z const double* __restrict__ in_ptr3, 2023-01-11T21:38:05.8601546Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8601648Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8601748Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8601840Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8601940Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.8602038Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.8602142Z double* __restrict__ out_ptr6, 2023-01-11T21:38:05.8602246Z double* __restrict__ out_ptr7, 2023-01-11T21:38:05.8602350Z float* __restrict__ out_ptr8, 2023-01-11T21:38:05.8602452Z double* __restrict__ out_ptr9) 2023-01-11T21:38:05.8602510Z { 2023-01-11T21:38:05.8602614Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8602681Z { 2023-01-11T21:38:05.8602780Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8602868Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8602934Z { 2023-01-11T21:38:05.8603023Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8603084Z { 2023-01-11T21:38:05.8603181Z for(long i2=0; i2<16; i2+=1) 2023-01-11T21:38:05.8603250Z { 2023-01-11T21:38:05.8603321Z { 2023-01-11T21:38:05.8603396Z { 2023-01-11T21:38:05.8603516Z auto tmp0 = in_ptr0[i0 + (3*i2) + (48*i1)]; 2023-01-11T21:38:05.8603633Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8603730Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8603848Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8603951Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8604064Z out_ptr0[i2 + (48*i1) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8604201Z out_ptr1[i2 + (48*i1) + (144*i0)] = tmp2; 2023-01-11T21:38:05.8604309Z out_ptr2[i2 + (48*i1) + (144*i0)] = tmp4; 2023-01-11T21:38:05.8604383Z } 2023-01-11T21:38:05.8604454Z } 2023-01-11T21:38:05.8604516Z } 2023-01-11T21:38:05.8604584Z } 2023-01-11T21:38:05.8604652Z } 2023-01-11T21:38:05.8604749Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8604836Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8604907Z { 2023-01-11T21:38:05.8605012Z for(long i1=0; i1<144; i1+=1) 2023-01-11T21:38:05.8605087Z { 2023-01-11T21:38:05.8605173Z { 2023-01-11T21:38:05.8605250Z { 2023-01-11T21:38:05.8605359Z auto tmp0 = in_ptr1[i1 + (144*i0)]; 2023-01-11T21:38:05.8605461Z out_ptr3[i0 + (3*i1)] = tmp0; 2023-01-11T21:38:05.8605535Z } 2023-01-11T21:38:05.8605596Z } 2023-01-11T21:38:05.8605668Z } 2023-01-11T21:38:05.8605734Z } 
2023-01-11T21:38:05.8605833Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8605919Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.8605988Z { 2023-01-11T21:38:05.8606078Z for(long i1=0; i1<48; i1+=1) 2023-01-11T21:38:05.8606138Z { 2023-01-11T21:38:05.8606208Z { 2023-01-11T21:38:05.8606279Z { 2023-01-11T21:38:05.8606390Z auto tmp0 = in_ptr0[i0 + (3*i1)]; 2023-01-11T21:38:05.8606504Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8606635Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.8606739Z out_ptr4[i1 + (48*i0)] = tmp2; 2023-01-11T21:38:05.8606803Z } 2023-01-11T21:38:05.8606872Z } 2023-01-11T21:38:05.8606942Z } 2023-01-11T21:38:05.8607018Z } 2023-01-11T21:38:05.8607100Z #pragma omp for 2023-01-11T21:38:05.8607190Z for(long i0=0; i0<144; i0+=1) 2023-01-11T21:38:05.8607250Z { 2023-01-11T21:38:05.8607316Z { 2023-01-11T21:38:05.8607386Z { 2023-01-11T21:38:05.8607483Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:05.8607596Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8607686Z out_ptr5[i0] = tmp0; 2023-01-11T21:38:05.8607776Z out_ptr6[i0] = tmp1; 2023-01-11T21:38:05.8607857Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:05.8607929Z } 2023-01-11T21:38:05.8607996Z } 2023-01-11T21:38:05.8608067Z } 2023-01-11T21:38:05.8608161Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8608248Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8608316Z { 2023-01-11T21:38:05.8608396Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8608470Z { 2023-01-11T21:38:05.8608564Z for(long i2=0; i2<48; i2+=1) 2023-01-11T21:38:05.8608633Z { 2023-01-11T21:38:05.8608703Z { 2023-01-11T21:38:05.8608776Z { 2023-01-11T21:38:05.8608895Z auto tmp0 = in_ptr2[i2 + (48*i1) + (144*i0)]; 2023-01-11T21:38:05.8608998Z out_ptr8[i1 + (3*i2) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8609071Z } 2023-01-11T21:38:05.8609141Z } 2023-01-11T21:38:05.8609209Z } 2023-01-11T21:38:05.8609277Z } 2023-01-11T21:38:05.8609346Z } 2023-01-11T21:38:05.8609433Z #pragma omp for collapse(3) 2023-01-11T21:38:05.8609520Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8609586Z { 2023-01-11T21:38:05.8609675Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.8609744Z { 2023-01-11T21:38:05.8609865Z for(long i2=0; i2<48; i2+=1) 2023-01-11T21:38:05.8609936Z { 2023-01-11T21:38:05.8609999Z { 2023-01-11T21:38:05.8610072Z { 2023-01-11T21:38:05.8610190Z auto tmp0 = in_ptr3[i2 + (48*i1) + (144*i0)]; 2023-01-11T21:38:05.8610301Z out_ptr9[i1 + (3*i2) + (144*i0)] = tmp0; 2023-01-11T21:38:05.8610373Z } 2023-01-11T21:38:05.8610444Z } 2023-01-11T21:38:05.8610511Z } 2023-01-11T21:38:05.8610571Z } 2023-01-11T21:38:05.8610638Z } 2023-01-11T21:38:05.8610705Z } 2023-01-11T21:38:05.8610775Z } 2023-01-11T21:38:05.8610865Z ''') 2023-01-11T21:38:05.8610870Z 2023-01-11T21:38:05.8610875Z 2023-01-11T21:38:05.8610969Z async_compile.wait(globals()) 2023-01-11T21:38:05.8611047Z del async_compile 2023-01-11T21:38:05.8611052Z 2023-01-11T21:38:05.8611120Z def call(args): 2023-01-11T21:38:05.8611195Z arg0_1, = args 2023-01-11T21:38:05.8611277Z args.clear() 2023-01-11T21:38:05.8611500Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8611617Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:05.8611735Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:05.8611853Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:05.8612066Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:05.8612269Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8612416Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:05.8612532Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:05.8612747Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8612869Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:05.8612990Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:05.8613201Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8613413Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.8613820Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf12.data_ptr())) 2023-01-11T21:38:05.8613891Z del arg0_1 2023-01-11T21:38:05.8613960Z del buf0 2023-01-11T21:38:05.8614029Z del buf1 2023-01-11T21:38:05.8614104Z del buf10 2023-01-11T21:38:05.8614174Z del buf11 2023-01-11T21:38:05.8614243Z del buf2 2023-01-11T21:38:05.8614310Z del buf3 2023-01-11T21:38:05.8614371Z del buf5 2023-01-11T21:38:05.8614441Z del buf6 2023-01-11T21:38:05.8614617Z del buf7 2023-01-11T21:38:05.8614687Z del buf9 2023-01-11T21:38:05.8614778Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:05.8614783Z 2023-01-11T21:38:05.8614788Z 2023-01-11T21:38:05.8614869Z if __name__ == "__main__": 2023-01-11T21:38:05.8614989Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8615111Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8615332Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8615449Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8615454Z 2023-01-11T21:38:05.8615527Z ok (3.967s) 2023-01-11T21:38:05.8616027Z test_cat_extern_kernel_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8616161Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8616418Z [2023-01-11 21:25:22,119] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 52 2023-01-11T21:38:05.8616682Z [2023-01-11 21:25:23,879] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 52 2023-01-11T21:38:05.8616690Z 2023-01-11T21:38:05.8616789Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8616856Z import torch 2023-01-11T21:38:05.8616932Z import random 2023-01-11T21:38:05.8617051Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8617230Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8617238Z 2023-01-11T21:38:05.8617329Z aten = torch.ops.aten 2023-01-11T21:38:05.8617468Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8617564Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8617569Z 2023-01-11T21:38:05.8617644Z import triton 2023-01-11T21:38:05.8617729Z import triton.language as tl 2023-01-11T21:38:05.8617855Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8617995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8618000Z 2023-01-11T21:38:05.8618004Z 2023-01-11T21:38:05.8618183Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8618390Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8618515Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8618621Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8618688Z { 2023-01-11T21:38:05.8618782Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8618847Z { 2023-01-11T21:38:05.8618931Z #pragma omp for 2023-01-11T21:38:05.8619021Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.8619087Z { 2023-01-11T21:38:05.8619178Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8619239Z { 2023-01-11T21:38:05.8619389Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (256*i0)); 2023-01-11T21:38:05.8619498Z tmp0.store(out_ptr0 + (8*i1) + (512*i0)); 2023-01-11T21:38:05.8619567Z } 2023-01-11T21:38:05.8619672Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8619766Z for(long i1=256; i1<256; i1+=1) 2023-01-11T21:38:05.8619835Z { 2023-01-11T21:38:05.8619928Z auto tmp0 = in_ptr0[i1 + (256*i0)]; 2023-01-11T21:38:05.8620025Z out_ptr0[i1 + (512*i0)] = tmp0; 2023-01-11T21:38:05.8620094Z } 2023-01-11T21:38:05.8620165Z } 2023-01-11T21:38:05.8620232Z } 2023-01-11T21:38:05.8620300Z } 2023-01-11T21:38:05.8620385Z ''') 2023-01-11T21:38:05.8620391Z 2023-01-11T21:38:05.8620395Z 2023-01-11T21:38:05.8620487Z async_compile.wait(globals()) 2023-01-11T21:38:05.8620557Z del async_compile 2023-01-11T21:38:05.8620562Z 2023-01-11T21:38:05.8620637Z def call(args): 2023-01-11T21:38:05.8620733Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.8620810Z args.clear() 2023-01-11T21:38:05.8621016Z buf0 = empty_strided((256, 1600), (1600, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8621115Z aten.mm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:05.8621191Z del arg1_1 2023-01-11T21:38:05.8621255Z del arg2_1 2023-01-11T21:38:05.8621457Z buf3 = empty_strided((256, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8621572Z buf1 = as_strided(buf3, (256, 256), (512, 1)) # alias 
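    # buf1 aliases the left (256, 256) half of buf3's 512-wide rows, so the
    # extern aten.mm.out call below writes its result straight into the cat
    # output; only the right half (buf2, element offset 256) is filled by
    # the generated C++ copy kernel. The mm input is itself an as_strided
    # view selecting the first 100 columns of buf0.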
2023-01-11T21:38:05.8621727Z aten.mm.out(as_strided(buf0, (256, 100), (1600, 1)), arg3_1, out=buf1) 2023-01-11T21:38:05.8621801Z del arg3_1 2023-01-11T21:38:05.8621873Z del buf0 2023-01-11T21:38:05.8621983Z buf2 = as_strided(buf3, (256, 256), (512, 1), 256) # alias 2023-01-11T21:38:05.8622114Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8622188Z del arg0_1 2023-01-11T21:38:05.8622264Z return (buf3, ) 2023-01-11T21:38:05.8622269Z 2023-01-11T21:38:05.8622274Z 2023-01-11T21:38:05.8622356Z if __name__ == "__main__": 2023-01-11T21:38:05.8622473Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8622599Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8622807Z arg0_1 = rand_strided((256, 256), (256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623012Z arg1_1 = rand_strided((256, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623211Z arg2_1 = rand_strided((1024, 1600), (1600, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623409Z arg3_1 = rand_strided((100, 256), (256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8623545Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.8623551Z 2023-01-11T21:38:05.8623625Z ok (1.836s) 2023-01-11T21:38:05.8624088Z test_cat_upcasting_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8624254Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8624511Z [2023-01-11 21:25:23,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 53 2023-01-11T21:38:05.8624776Z [2023-01-11 21:25:25,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 53 2023-01-11T21:38:05.8624782Z 2023-01-11T21:38:05.8624880Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8624947Z import torch 2023-01-11T21:38:05.8625024Z import random 2023-01-11T21:38:05.8625142Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8625265Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8625271Z 2023-01-11T21:38:05.8625354Z aten = torch.ops.aten 2023-01-11T21:38:05.8625491Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8625590Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8625599Z 2023-01-11T21:38:05.8625672Z import triton 2023-01-11T21:38:05.8625758Z import triton.language as tl 2023-01-11T21:38:05.8625883Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8626023Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8626029Z 2023-01-11T21:38:05.8626035Z 2023-01-11T21:38:05.8626172Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8626380Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8626506Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8626618Z const half* __restrict__ in_ptr1, 2023-01-11T21:38:05.8626724Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8626817Z float* __restrict__ out_ptr1) 
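// Mixed-dtype cat: in_ptr1 is the float16 operand. The vectorized loop
// copies the float32 operand as-is, while the second (scalar) loop upcasts
// in_ptr1 element-by-element from half to float before storing into the
// shared float32 output rows of width 36 (16 float + 20 upcast elements).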
2023-01-11T21:38:05.8626883Z { 2023-01-11T21:38:05.8626990Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8627061Z { 2023-01-11T21:38:05.8627145Z #pragma omp for 2023-01-11T21:38:05.8627235Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8627302Z { 2023-01-11T21:38:05.8627384Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8627456Z { 2023-01-11T21:38:05.8627633Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (16*i0)); 2023-01-11T21:38:05.8627747Z tmp0.store(out_ptr0 + (8*i1) + (36*i0)); 2023-01-11T21:38:05.8627812Z } 2023-01-11T21:38:05.8627911Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8628004Z for(long i1=16; i1<16; i1+=1) 2023-01-11T21:38:05.8628065Z { 2023-01-11T21:38:05.8628166Z auto tmp0 = in_ptr0[i1 + (16*i0)]; 2023-01-11T21:38:05.8628264Z out_ptr0[i1 + (36*i0)] = tmp0; 2023-01-11T21:38:05.8628333Z } 2023-01-11T21:38:05.8628401Z } 2023-01-11T21:38:05.8628484Z #pragma omp for 2023-01-11T21:38:05.8628576Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8628635Z { 2023-01-11T21:38:05.8628722Z #pragma GCC ivdep 2023-01-11T21:38:05.8628814Z for(long i1=0; i1<20; i1+=1) 2023-01-11T21:38:05.8628884Z { 2023-01-11T21:38:05.8628956Z { 2023-01-11T21:38:05.8629027Z { 2023-01-11T21:38:05.8629152Z auto tmp0 = static_cast(in_ptr1[i1 + (20*i0)]); 2023-01-11T21:38:05.8629271Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8629373Z out_ptr1[i1 + (36*i0)] = tmp1; 2023-01-11T21:38:05.8629446Z } 2023-01-11T21:38:05.8629513Z } 2023-01-11T21:38:05.8629582Z } 2023-01-11T21:38:05.8629650Z } 2023-01-11T21:38:05.8629709Z } 2023-01-11T21:38:05.8629778Z } 2023-01-11T21:38:05.8629864Z ''') 2023-01-11T21:38:05.8629869Z 2023-01-11T21:38:05.8629873Z 2023-01-11T21:38:05.8629966Z async_compile.wait(globals()) 2023-01-11T21:38:05.8630089Z del async_compile 2023-01-11T21:38:05.8630094Z 2023-01-11T21:38:05.8630169Z def call(args): 2023-01-11T21:38:05.8630249Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8630317Z args.clear() 2023-01-11T21:38:05.8630518Z buf2 = empty_strided((8, 36), (36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8630631Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:05.8630740Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:05.8630934Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8631005Z del arg0_1 2023-01-11T21:38:05.8631078Z del arg1_1 2023-01-11T21:38:05.8631156Z return (buf2, ) 2023-01-11T21:38:05.8631161Z 2023-01-11T21:38:05.8631166Z 2023-01-11T21:38:05.8631239Z if __name__ == "__main__": 2023-01-11T21:38:05.8631358Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8631490Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8631690Z arg0_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8631885Z arg1_1 = rand_strided((8, 20), (20, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.8632008Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8632014Z 2023-01-11T21:38:05.8632085Z ok (1.767s) 2023-01-11T21:38:05.8632538Z test_cauchy_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8632669Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8632917Z [2023-01-11 21:25:25,700] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 54 2023-01-11T21:38:05.8633180Z [2023-01-11 21:25:27,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 54 2023-01-11T21:38:05.8633186Z 2023-01-11T21:38:05.8633285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8633358Z import torch 2023-01-11T21:38:05.8633474Z import random 2023-01-11T21:38:05.8633594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8633720Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8633726Z 2023-01-11T21:38:05.8633810Z aten = torch.ops.aten 2023-01-11T21:38:05.8633939Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8634036Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8634041Z 2023-01-11T21:38:05.8634115Z import triton 2023-01-11T21:38:05.8634208Z import triton.language as tl 2023-01-11T21:38:05.8634334Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8634478Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8634483Z 2023-01-11T21:38:05.8634488Z 2023-01-11T21:38:05.8634625Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8634834Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8634977Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8635099Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8635217Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8635285Z { 2023-01-11T21:38:05.8635351Z { 2023-01-11T21:38:05.8635418Z { 2023-01-11T21:38:05.8635502Z float tmp6 = 0; 2023-01-11T21:38:05.8635604Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8635672Z { 2023-01-11T21:38:05.8635783Z #pragma omp for reduction(+:tmp6) 2023-01-11T21:38:05.8635879Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8635982Z { 2023-01-11T21:38:05.8636082Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8636155Z { 2023-01-11T21:38:05.8636220Z { 2023-01-11T21:38:05.8636322Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8636423Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:05.8636567Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.8636663Z auto tmp3 = 1 / tmp2; 2023-01-11T21:38:05.8636778Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.8636879Z auto tmp5 = tmp3 * tmp4; 2023-01-11T21:38:05.8636959Z tmp6 += tmp5; 2023-01-11T21:38:05.8637037Z } 2023-01-11T21:38:05.8637109Z } 2023-01-11T21:38:05.8637179Z } 2023-01-11T21:38:05.8637247Z } 2023-01-11T21:38:05.8637336Z out_ptr0[0] = tmp6; 2023-01-11T21:38:05.8637405Z } 2023-01-11T21:38:05.8637463Z } 2023-01-11T21:38:05.8637531Z } 2023-01-11T21:38:05.8637615Z ''') 2023-01-11T21:38:05.8637621Z 2023-01-11T21:38:05.8637625Z 2023-01-11T21:38:05.8637719Z async_compile.wait(globals()) 2023-01-11T21:38:05.8637798Z del async_compile 2023-01-11T21:38:05.8637805Z 2023-01-11T21:38:05.8637883Z def call(args): 2023-01-11T21:38:05.8637962Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8638031Z args.clear() 2023-01-11T21:38:05.8638219Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8638386Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
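    # buf0 is a 0-d scalar tensor: the kernel reduces the full 32x32 grid of
    # 1/(x_i - y_j) terms via `#pragma omp for reduction(+:tmp6)` and stores
    # the single sum to out_ptr0[0]. Eager sketch (illustrative only):
    #   (1.0 / (arg0_1[:, None] - arg1_1[None, :])).sum()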
2023-01-11T21:38:05.8638458Z del arg0_1 2023-01-11T21:38:05.8638529Z del arg1_1 2023-01-11T21:38:05.8638605Z return (buf0, ) 2023-01-11T21:38:05.8638610Z 2023-01-11T21:38:05.8638615Z 2023-01-11T21:38:05.8638698Z if __name__ == "__main__": 2023-01-11T21:38:05.8638818Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8638936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8639134Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8639327Z arg1_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8639476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8639482Z 2023-01-11T21:38:05.8639556Z ok (1.752s) 2023-01-11T21:38:05.8640005Z test_clamp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8640137Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8640397Z [2023-01-11 21:25:27,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 55 2023-01-11T21:38:05.8640659Z [2023-01-11 21:25:29,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 55 2023-01-11T21:38:05.8640665Z 2023-01-11T21:38:05.8640757Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8640833Z import torch 2023-01-11T21:38:05.8640909Z import random 2023-01-11T21:38:05.8641028Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8641153Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8641158Z 2023-01-11T21:38:05.8641240Z aten = torch.ops.aten 2023-01-11T21:38:05.8641376Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8641475Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8641481Z 2023-01-11T21:38:05.8641547Z import triton 2023-01-11T21:38:05.8641642Z import triton.language as tl 2023-01-11T21:38:05.8641796Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8641936Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8641941Z 2023-01-11T21:38:05.8641946Z 2023-01-11T21:38:05.8642084Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8642291Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8642415Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8642527Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8642622Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8642725Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8642825Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.8642893Z { 2023-01-11T21:38:05.8642995Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8643065Z { 2023-01-11T21:38:05.8643148Z #pragma omp for 2023-01-11T21:38:05.8643232Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8643301Z { 2023-01-11T21:38:05.8643441Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8643578Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8643803Z auto tmp1 = at::vec::Vectorized(static_cast(-0.10000000149011612)); 
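// clamp(x, -0.1, 0.1) lowers to a vec maximum with the lower bound followed
// by a vec minimum with the upper bound (tmp2/tmp4 below); the other two
// outputs reuse the same loads: clamp_min(y, 0.0) and clamp_max(x + y, 0.0).
// -0.10000000149011612 is simply 0.1 rounded to float32 precision.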
2023-01-11T21:38:05.8643919Z auto tmp2 = at::vec::maximum(tmp0, tmp1); 2023-01-11T21:38:05.8644067Z auto tmp3 = at::vec::Vectorized(static_cast(0.10000000149011612)); 2023-01-11T21:38:05.8644180Z auto tmp4 = at::vec::minimum(tmp2, tmp3); 2023-01-11T21:38:05.8644313Z auto tmp6 = at::vec::Vectorized(static_cast(0.0)); 2023-01-11T21:38:05.8644428Z auto tmp7 = at::vec::maximum(tmp5, tmp6); 2023-01-11T21:38:05.8644519Z auto tmp8 = tmp0 + tmp5; 2023-01-11T21:38:05.8644630Z auto tmp9 = at::vec::minimum(tmp8, tmp6); 2023-01-11T21:38:05.8644728Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8644826Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8644919Z tmp9.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8644979Z } 2023-01-11T21:38:05.8645079Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8645199Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8645271Z { 2023-01-11T21:38:05.8645360Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8645449Z auto tmp5 = in_ptr1[i0]; 2023-01-11T21:38:05.8645625Z auto tmp1 = static_cast(-0.10000000149011612); 2023-01-11T21:38:05.8645756Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:05.8645865Z auto tmp3 = static_cast(0.10000000149011612); 2023-01-11T21:38:05.8645991Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3); 2023-01-11T21:38:05.8646098Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.8646228Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:05.8646319Z auto tmp8 = tmp0 + tmp5; 2023-01-11T21:38:05.8646444Z auto tmp9 = (tmp6 != tmp6) ? tmp6 : std::min(tmp8, tmp6); 2023-01-11T21:38:05.8646532Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8646613Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:05.8646702Z out_ptr2[i0] = tmp9; 2023-01-11T21:38:05.8646768Z } 2023-01-11T21:38:05.8646835Z } 2023-01-11T21:38:05.8646899Z } 2023-01-11T21:38:05.8646984Z ''') 2023-01-11T21:38:05.8646990Z 2023-01-11T21:38:05.8646994Z 2023-01-11T21:38:05.8647089Z async_compile.wait(globals()) 2023-01-11T21:38:05.8647159Z del async_compile 2023-01-11T21:38:05.8647173Z 2023-01-11T21:38:05.8647240Z def call(args): 2023-01-11T21:38:05.8647319Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8647394Z args.clear() 2023-01-11T21:38:05.8647589Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8647812Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8648000Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8648222Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8648288Z del arg0_1 2023-01-11T21:38:05.8648360Z del arg1_1 2023-01-11T21:38:05.8648447Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.8648452Z 2023-01-11T21:38:05.8648457Z 2023-01-11T21:38:05.8648536Z if __name__ == "__main__": 2023-01-11T21:38:05.8648657Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8648783Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8648979Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8649178Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8649292Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8649297Z 2023-01-11T21:38:05.8649368Z ok (1.792s) 2023-01-11T21:38:05.8649818Z test_clone_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8649950Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8650207Z [2023-01-11 21:25:29,257] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 56 2023-01-11T21:38:05.8650470Z [2023-01-11 21:25:31,127] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 56 2023-01-11T21:38:05.8650479Z 2023-01-11T21:38:05.8650579Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8650652Z import torch 2023-01-11T21:38:05.8650727Z import random 2023-01-11T21:38:05.8650839Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8650963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8650968Z 2023-01-11T21:38:05.8651077Z aten = torch.ops.aten 2023-01-11T21:38:05.8651216Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8651312Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8651317Z 2023-01-11T21:38:05.8651392Z import triton 2023-01-11T21:38:05.8651485Z import triton.language as tl 2023-01-11T21:38:05.8651610Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8651742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8651748Z 2023-01-11T21:38:05.8651753Z 2023-01-11T21:38:05.8651890Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8652099Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8652225Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8652330Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8652434Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8652502Z { 2023-01-11T21:38:05.8652596Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8652665Z { 2023-01-11T21:38:05.8652747Z #pragma omp for 2023-01-11T21:38:05.8652834Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8652902Z { 2023-01-11T21:38:05.8653042Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8653179Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.8653267Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8653395Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8653511Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8653607Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8653701Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8653769Z } 2023-01-11T21:38:05.8653869Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8653961Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.8654021Z { 2023-01-11T21:38:05.8654111Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8654216Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8654305Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8654409Z auto tmp3 = static_cast(1); 2023-01-11T21:38:05.8654623Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:05.8654714Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8654792Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.8654859Z } 2023-01-11T21:38:05.8654926Z } 2023-01-11T21:38:05.8654989Z } 2023-01-11T21:38:05.8655087Z ''') 2023-01-11T21:38:05.8655092Z 2023-01-11T21:38:05.8655097Z 
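# The scalar tail loop in the kernel above, `for(long i0=256; i0<256; ...)`,
# has an empty range: all 16*16 = 256 elements are covered by 32 vectorized
# iterations of 8 float lanes each, so the tail survives only as a codegen
# template artifact and compiles away.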
2023-01-11T21:38:05.8655194Z async_compile.wait(globals()) 2023-01-11T21:38:05.8655271Z del async_compile 2023-01-11T21:38:05.8655276Z 2023-01-11T21:38:05.8655343Z def call(args): 2023-01-11T21:38:05.8655417Z arg0_1, = args 2023-01-11T21:38:05.8655492Z args.clear() 2023-01-11T21:38:05.8655698Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8655896Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8656065Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8656140Z del arg0_1 2023-01-11T21:38:05.8656214Z return (buf0, buf1, ) 2023-01-11T21:38:05.8656232Z 2023-01-11T21:38:05.8656236Z 2023-01-11T21:38:05.8656309Z if __name__ == "__main__": 2023-01-11T21:38:05.8656428Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8656556Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8656756Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8656869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8656874Z 2023-01-11T21:38:05.8656946Z ok (1.913s) 2023-01-11T21:38:05.8657557Z test_constant_pad_1d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8657695Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8657956Z [2023-01-11 21:25:31,164] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 57 2023-01-11T21:38:05.8658215Z [2023-01-11 21:25:32,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 57 2023-01-11T21:38:05.8658224Z 2023-01-11T21:38:05.8658329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8658406Z import torch 2023-01-11T21:38:05.8658486Z import random 2023-01-11T21:38:05.8658607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8658735Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8658740Z 2023-01-11T21:38:05.8658825Z aten = torch.ops.aten 2023-01-11T21:38:05.8658958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8659059Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8659064Z 2023-01-11T21:38:05.8659143Z import triton 2023-01-11T21:38:05.8659238Z import triton.language as tl 2023-01-11T21:38:05.8659370Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8659512Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8659518Z 2023-01-11T21:38:05.8659555Z 2023-01-11T21:38:05.8659696Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8659901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8660029Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8660131Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8660237Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8660306Z { 2023-01-11T21:38:05.8660412Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8660482Z { 2023-01-11T21:38:05.8660567Z #pragma omp for 2023-01-11T21:38:05.8660650Z 
for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8660720Z { 2023-01-11T21:38:05.8660808Z #pragma GCC ivdep 2023-01-11T21:38:05.8660902Z for(long i1=0; i1<32; i1+=1) 2023-01-11T21:38:05.8660975Z { 2023-01-11T21:38:05.8661048Z { 2023-01-11T21:38:05.8661121Z { 2023-01-11T21:38:05.8661232Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8661343Z auto tmp1 = static_cast(31); 2023-01-11T21:38:05.8661447Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8661542Z float tmp3 = 6.0; 2023-01-11T21:38:05.8661631Z if(tmp2) 2023-01-11T21:38:05.8661712Z { 2023-01-11T21:38:05.8661825Z auto tmp4 = in_ptr0[i1 + (31*i0)]; 2023-01-11T21:38:05.8661907Z tmp3 = tmp4; 2023-01-11T21:38:05.8661987Z } 2023-01-11T21:38:05.8662092Z out_ptr0[i1 + (32*i0)] = tmp3; 2023-01-11T21:38:05.8662164Z } 2023-01-11T21:38:05.8662236Z } 2023-01-11T21:38:05.8662308Z } 2023-01-11T21:38:05.8662378Z } 2023-01-11T21:38:05.8662455Z #pragma omp for 2023-01-11T21:38:05.8662545Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8662619Z { 2023-01-11T21:38:05.8662706Z #pragma GCC ivdep 2023-01-11T21:38:05.8662798Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:05.8662874Z { 2023-01-11T21:38:05.8662946Z { 2023-01-11T21:38:05.8663012Z { 2023-01-11T21:38:05.8663216Z auto tmp0 = static_cast((-2) + i1); 2023-01-11T21:38:05.8663334Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8663441Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8663554Z auto tmp3 = static_cast(31); 2023-01-11T21:38:05.8663659Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8663764Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8663851Z float tmp6 = 99.0; 2023-01-11T21:38:05.8663941Z if(tmp5) 2023-01-11T21:38:05.8664018Z { 2023-01-11T21:38:05.8664199Z auto tmp7 = in_ptr0[(-2) + i1 + (31*i0)]; 2023-01-11T21:38:05.8664292Z tmp6 = tmp7; 2023-01-11T21:38:05.8664369Z } 2023-01-11T21:38:05.8664473Z out_ptr1[i1 + (36*i0)] = tmp6; 2023-01-11T21:38:05.8664538Z } 2023-01-11T21:38:05.8664612Z } 2023-01-11T21:38:05.8664681Z } 2023-01-11T21:38:05.8664750Z } 2023-01-11T21:38:05.8664820Z } 2023-01-11T21:38:05.8664888Z } 2023-01-11T21:38:05.8664970Z ''') 2023-01-11T21:38:05.8664986Z 2023-01-11T21:38:05.8664991Z 2023-01-11T21:38:05.8665087Z async_compile.wait(globals()) 2023-01-11T21:38:05.8665186Z del async_compile 2023-01-11T21:38:05.8665191Z 2023-01-11T21:38:05.8665278Z def call(args): 2023-01-11T21:38:05.8665371Z arg0_1, = args 2023-01-11T21:38:05.8665451Z args.clear() 2023-01-11T21:38:05.8665662Z buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8665943Z buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8666105Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8666180Z del arg0_1 2023-01-11T21:38:05.8666261Z return (buf0, buf1, ) 2023-01-11T21:38:05.8666270Z 2023-01-11T21:38:05.8666274Z 2023-01-11T21:38:05.8666354Z if __name__ == "__main__": 2023-01-11T21:38:05.8666473Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8666600Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8666807Z arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8666921Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8666926Z 2023-01-11T21:38:05.8666989Z ok (1.817s) 2023-01-11T21:38:05.8667447Z test_constant_pad_2d_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8667591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8667849Z [2023-01-11 21:25:32,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 58 2023-01-11T21:38:05.8668113Z [2023-01-11 21:25:34,739] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 58 2023-01-11T21:38:05.8668118Z 2023-01-11T21:38:05.8668218Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8668292Z import torch 2023-01-11T21:38:05.8668370Z import random 2023-01-11T21:38:05.8668491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8668608Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8668626Z 2023-01-11T21:38:05.8668700Z aten = torch.ops.aten 2023-01-11T21:38:05.8668834Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8668932Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8668938Z 2023-01-11T21:38:05.8669012Z import triton 2023-01-11T21:38:05.8669133Z import triton.language as tl 2023-01-11T21:38:05.8669257Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8669397Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8669403Z 2023-01-11T21:38:05.8669407Z 2023-01-11T21:38:05.8669536Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8669745Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8669868Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8669973Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8670075Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8670145Z { 2023-01-11T21:38:05.8670247Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8670313Z { 2023-01-11T21:38:05.8670401Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8670488Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:05.8670554Z { 2023-01-11T21:38:05.8670652Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.8670720Z { 2023-01-11T21:38:05.8670790Z { 2023-01-11T21:38:05.8670854Z { 2023-01-11T21:38:05.8671023Z auto tmp0 = static_cast((-1) + i0); 2023-01-11T21:38:05.8671136Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8671236Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8671347Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8671444Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8671612Z auto tmp5 = static_cast((-1) + i1); 2023-01-11T21:38:05.8671745Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8671836Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8671934Z auto tmp8 = tmp2 & tmp4; 2023-01-11T21:38:05.8672033Z auto tmp9 = tmp8 & tmp6; 2023-01-11T21:38:05.8672133Z auto tmp10 = tmp9 & tmp7; 2023-01-11T21:38:05.8672227Z float tmp11 = 6.0; 2023-01-11T21:38:05.8672310Z if(tmp10) 2023-01-11T21:38:05.8672387Z { 2023-01-11T21:38:05.8672554Z auto tmp12 = in_ptr0[(-9) + i1 + (8*i0)]; 2023-01-11T21:38:05.8672645Z tmp11 = tmp12; 2023-01-11T21:38:05.8672720Z } 2023-01-11T21:38:05.8672823Z out_ptr0[i1 + (10*i0)] = tmp11; 2023-01-11T21:38:05.8672893Z } 2023-01-11T21:38:05.8672963Z } 2023-01-11T21:38:05.8673034Z } 
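// constant_pad_2d lowers to a bounds mask: the comparisons above test
// whether the shifted indices (-1 + i0, -1 + i1) land inside the original
// 8x8 input; in-bounds positions load from in_ptr0 at the negatively
// offset base (-9 = -8 - 1), and everything else keeps the preloaded fill
// value (6.0 for out_ptr0 here, 99.0 for out_ptr1 in the next loop nest).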
2023-01-11T21:38:05.8673094Z } 2023-01-11T21:38:05.8673191Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8673276Z for(long i0=0; i0<15; i0+=1) 2023-01-11T21:38:05.8673344Z { 2023-01-11T21:38:05.8673434Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:05.8673502Z { 2023-01-11T21:38:05.8673564Z { 2023-01-11T21:38:05.8673635Z { 2023-01-11T21:38:05.8673803Z auto tmp0 = static_cast((-3) + i0); 2023-01-11T21:38:05.8673913Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8674011Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8674118Z auto tmp3 = static_cast(8); 2023-01-11T21:38:05.8674216Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8674384Z auto tmp5 = static_cast((-1) + i1); 2023-01-11T21:38:05.8674475Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8674580Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8674681Z auto tmp8 = tmp2 & tmp4; 2023-01-11T21:38:05.8674778Z auto tmp9 = tmp8 & tmp6; 2023-01-11T21:38:05.8674875Z auto tmp10 = tmp9 & tmp7; 2023-01-11T21:38:05.8675008Z float tmp11 = 99.0; 2023-01-11T21:38:05.8675093Z if(tmp10) 2023-01-11T21:38:05.8675159Z { 2023-01-11T21:38:05.8675334Z auto tmp12 = in_ptr0[(-25) + i1 + (8*i0)]; 2023-01-11T21:38:05.8675424Z tmp11 = tmp12; 2023-01-11T21:38:05.8675499Z } 2023-01-11T21:38:05.8675603Z out_ptr1[i1 + (11*i0)] = tmp11; 2023-01-11T21:38:05.8675676Z } 2023-01-11T21:38:05.8675749Z } 2023-01-11T21:38:05.8675809Z } 2023-01-11T21:38:05.8675881Z } 2023-01-11T21:38:05.8675948Z } 2023-01-11T21:38:05.8676014Z } 2023-01-11T21:38:05.8676098Z ''') 2023-01-11T21:38:05.8676105Z 2023-01-11T21:38:05.8676109Z 2023-01-11T21:38:05.8676206Z async_compile.wait(globals()) 2023-01-11T21:38:05.8676284Z del async_compile 2023-01-11T21:38:05.8676289Z 2023-01-11T21:38:05.8676357Z def call(args): 2023-01-11T21:38:05.8676436Z arg0_1, = args 2023-01-11T21:38:05.8676511Z args.clear() 2023-01-11T21:38:05.8676733Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8676952Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8677121Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8677196Z del arg0_1 2023-01-11T21:38:05.8677270Z return (buf0, buf1, ) 2023-01-11T21:38:05.8677288Z 2023-01-11T21:38:05.8677293Z 2023-01-11T21:38:05.8677366Z if __name__ == "__main__": 2023-01-11T21:38:05.8677517Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8677647Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8677860Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8677981Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8677986Z 2023-01-11T21:38:05.8678057Z ok (1.787s) 2023-01-11T21:38:05.8678516Z test_constant_pad_3d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8678649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8678905Z [2023-01-11 21:25:34,768] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 59 2023-01-11T21:38:05.8679164Z [2023-01-11 21:25:36,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 59 2023-01-11T21:38:05.8679170Z 2023-01-11T21:38:05.8679269Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8679348Z import torch 2023-01-11T21:38:05.8679424Z import random 2023-01-11T21:38:05.8679544Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8679668Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8679675Z 2023-01-11T21:38:05.8679758Z aten = torch.ops.aten 2023-01-11T21:38:05.8679898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8679987Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8679992Z 2023-01-11T21:38:05.8680071Z import triton 2023-01-11T21:38:05.8680161Z import triton.language as tl 2023-01-11T21:38:05.8680288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8680433Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8680438Z 2023-01-11T21:38:05.8680442Z 2023-01-11T21:38:05.8680579Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8680783Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8680941Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8681040Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8681146Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8681213Z { 2023-01-11T21:38:05.8681317Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8681385Z { 2023-01-11T21:38:05.8681479Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8681565Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8681627Z { 2023-01-11T21:38:05.8681718Z for(long i1=0; i1<15; i1+=1) 2023-01-11T21:38:05.8681788Z { 2023-01-11T21:38:05.8681879Z #pragma GCC ivdep 2023-01-11T21:38:05.8681975Z for(long i2=0; i2<11; i2+=1) 2023-01-11T21:38:05.8682044Z { 2023-01-11T21:38:05.8682126Z #pragma GCC ivdep 2023-01-11T21:38:05.8682223Z for(long i3=0; i3<7; i3+=1) 2023-01-11T21:38:05.8682297Z { 2023-01-11T21:38:05.8682371Z { 2023-01-11T21:38:05.8682445Z { 2023-01-11T21:38:05.8682623Z auto tmp0 = static_cast((-5) + i1); 2023-01-11T21:38:05.8682738Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8682836Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8682948Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8683051Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8683228Z auto tmp5 = static_cast((-3) + i2); 2023-01-11T21:38:05.8683364Z auto tmp6 = tmp5 >= tmp1; 2023-01-11T21:38:05.8683470Z auto tmp7 = tmp5 < tmp3; 2023-01-11T21:38:05.8683645Z auto tmp8 = static_cast((-1) + i3); 2023-01-11T21:38:05.8683753Z auto tmp9 = tmp8 >= tmp1; 2023-01-11T21:38:05.8683848Z auto tmp10 = tmp8 < tmp3; 2023-01-11T21:38:05.8683953Z auto tmp11 = tmp2 & tmp4; 2023-01-11T21:38:05.8684063Z auto tmp12 = tmp11 & tmp6; 2023-01-11T21:38:05.8684168Z auto tmp13 = tmp12 & tmp7; 2023-01-11T21:38:05.8684272Z auto tmp14 = tmp13 & tmp9; 2023-01-11T21:38:05.8684379Z auto tmp15 = tmp14 & tmp10; 2023-01-11T21:38:05.8684477Z float tmp16 = 6.0; 2023-01-11T21:38:05.8684568Z if(tmp15) 2023-01-11T21:38:05.8684639Z { 
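// Same masking in three dimensions: tmp15 AND-combines six range checks on
// the shifted (i1, i2, i3) indices, and -93 = -(5*16 + 3*4 + 1) is the
// combined base offset into the (2, 4, 4, 4) input.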
2023-01-11T21:38:05.8684844Z auto tmp17 = in_ptr0[(-93) + i3 + (4*i2) + (16*i1) + (64*i0)]; 2023-01-11T21:38:05.8684938Z tmp16 = tmp17; 2023-01-11T21:38:05.8685018Z } 2023-01-11T21:38:05.8685139Z out_ptr0[i3 + (7*i2) + (77*i1) + (1155*i0)] = tmp16; 2023-01-11T21:38:05.8685214Z } 2023-01-11T21:38:05.8685288Z } 2023-01-11T21:38:05.8685351Z } 2023-01-11T21:38:05.8685421Z } 2023-01-11T21:38:05.8685485Z } 2023-01-11T21:38:05.8685552Z } 2023-01-11T21:38:05.8685634Z #pragma omp for 2023-01-11T21:38:05.8685723Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8685783Z { 2023-01-11T21:38:05.8685868Z #pragma GCC ivdep 2023-01-11T21:38:05.8685959Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:05.8686028Z { 2023-01-11T21:38:05.8686114Z #pragma GCC ivdep 2023-01-11T21:38:05.8686210Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.8686280Z { 2023-01-11T21:38:05.8686343Z { 2023-01-11T21:38:05.8686447Z { 2023-01-11T21:38:05.8686622Z auto tmp0 = static_cast((-3) + i1); 2023-01-11T21:38:05.8686734Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8686837Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:05.8686949Z auto tmp3 = static_cast(4); 2023-01-11T21:38:05.8687050Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:05.8687142Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:05.8687235Z float tmp6 = 6.0; 2023-01-11T21:38:05.8687322Z if(tmp5) 2023-01-11T21:38:05.8687397Z { 2023-01-11T21:38:05.8687581Z auto tmp7 = in_ptr0[(-12) + i2 + (4*i1) + (16*i0)]; 2023-01-11T21:38:05.8687673Z tmp6 = tmp7; 2023-01-11T21:38:05.8687748Z } 2023-01-11T21:38:05.8687859Z out_ptr1[i2 + (4*i1) + (44*i0)] = tmp6; 2023-01-11T21:38:05.8687925Z } 2023-01-11T21:38:05.8687996Z } 2023-01-11T21:38:05.8688065Z } 2023-01-11T21:38:05.8688133Z } 2023-01-11T21:38:05.8688200Z } 2023-01-11T21:38:05.8688267Z } 2023-01-11T21:38:05.8688324Z } 2023-01-11T21:38:05.8688410Z ''') 2023-01-11T21:38:05.8688415Z 2023-01-11T21:38:05.8688420Z 2023-01-11T21:38:05.8688514Z async_compile.wait(globals()) 2023-01-11T21:38:05.8688593Z del async_compile 2023-01-11T21:38:05.8688598Z 2023-01-11T21:38:05.8688673Z def call(args): 2023-01-11T21:38:05.8688779Z arg0_1, = args 2023-01-11T21:38:05.8688857Z args.clear() 2023-01-11T21:38:05.8689079Z buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8689286Z buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8689459Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8689536Z del arg0_1 2023-01-11T21:38:05.8689619Z return (buf0, buf1, ) 2023-01-11T21:38:05.8689624Z 2023-01-11T21:38:05.8689628Z 2023-01-11T21:38:05.8689711Z if __name__ == "__main__": 2023-01-11T21:38:05.8689831Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8689959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8690170Z arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8690275Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8690283Z 2023-01-11T21:38:05.8690354Z ok (1.796s) 2023-01-11T21:38:05.8690833Z test_conv2d_backward_channels_last_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8690965Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8691219Z [2023-01-11 21:25:36,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 60 2023-01-11T21:38:05.8691480Z [2023-01-11 21:25:36,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 60 2023-01-11T21:38:05.8691485Z 2023-01-11T21:38:05.8691587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8691665Z import torch 2023-01-11T21:38:05.8691741Z import random 2023-01-11T21:38:05.8691854Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8691978Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8691984Z 2023-01-11T21:38:05.8692070Z aten = torch.ops.aten 2023-01-11T21:38:05.8692235Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8692334Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8692339Z 2023-01-11T21:38:05.8692415Z import triton 2023-01-11T21:38:05.8692512Z import triton.language as tl 2023-01-11T21:38:05.8692639Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8692771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8692777Z 2023-01-11T21:38:05.8692781Z 2023-01-11T21:38:05.8692874Z async_compile.wait(globals()) 2023-01-11T21:38:05.8692952Z del async_compile 2023-01-11T21:38:05.8692957Z 2023-01-11T21:38:05.8693034Z def call(args): 2023-01-11T21:38:05.8693125Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8693201Z args.clear() 2023-01-11T21:38:05.8693376Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [320], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8693451Z del arg0_1 2023-01-11T21:38:05.8693518Z del arg1_1 2023-01-11T21:38:05.8693594Z del arg2_1 2023-01-11T21:38:05.8693667Z buf1 = buf0[0] 2023-01-11T21:38:05.8693787Z assert_size_stride(buf1, (2, 2048, 8, 8), (131072, 1, 16384, 2048)) 2023-01-11T21:38:05.8693862Z buf2 = buf0[1] 2023-01-11T21:38:05.8693979Z assert_size_stride(buf2, (320, 2048, 1, 1), (2048, 1, 2048, 2048)) 2023-01-11T21:38:05.8694044Z buf3 = buf0[2] 2023-01-11T21:38:05.8694147Z assert_size_stride(buf3, (320, ), (1, )) 2023-01-11T21:38:05.8694216Z del buf0 2023-01-11T21:38:05.8694304Z return (buf1, buf2, buf3, ) 2023-01-11T21:38:05.8694309Z 2023-01-11T21:38:05.8694314Z 2023-01-11T21:38:05.8694394Z if __name__ == "__main__": 2023-01-11T21:38:05.8694655Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8694783Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8695012Z arg0_1 = rand_strided((2, 320, 8, 8), (20480, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695227Z arg1_1 = rand_strided((2, 2048, 8, 8), (131072, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695456Z arg2_1 = rand_strided((320, 2048, 1, 1), (2048, 1, 2048, 2048), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8695584Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8695589Z 2023-01-11T21:38:05.8695663Z ok (0.167s) 2023-01-11T21:38:05.8695833Z test_conv2d_binary_cpu (__main__.CpuTests) ... skip: only support cpu conv2d binary test (0.002s) 2023-01-11T21:38:05.8696004Z test_conv2d_channels_last_cpu (__main__.CpuTests) ... 
skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:05.8696169Z test_conv2d_packed_cpu (__main__.CpuTests) ... skip: only support cpu conv2d unary test (0.000s) 2023-01-11T21:38:05.8696334Z test_conv2d_unary_cpu (__main__.CpuTests) ... skip: only support cpu conv2d unary test (0.001s) 2023-01-11T21:38:05.8696506Z test_conv3d_channels_last_cpu (__main__.CpuTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:05.8697011Z test_conv_autotune_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.8697095Z warnings.warn( 2023-01-11T21:38:05.8697417Z [2023-01-11 21:25:36,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 61 2023-01-11T21:38:05.8697678Z [2023-01-11 21:25:36,795] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 61 2023-01-11T21:38:05.8697684Z 2023-01-11T21:38:05.8697786Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8697862Z import torch 2023-01-11T21:38:05.8697938Z import random 2023-01-11T21:38:05.8698058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8698174Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8698188Z 2023-01-11T21:38:05.8698309Z aten = torch.ops.aten 2023-01-11T21:38:05.8698449Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8698546Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8698551Z 2023-01-11T21:38:05.8698626Z import triton 2023-01-11T21:38:05.8698720Z import triton.language as tl 2023-01-11T21:38:05.8698846Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8698985Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8698990Z 2023-01-11T21:38:05.8699131Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:05.8699280Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:05.8699427Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:05.8699432Z 2023-01-11T21:38:05.8699437Z 2023-01-11T21:38:05.8699528Z async_compile.wait(globals()) 2023-01-11T21:38:05.8699604Z del async_compile 2023-01-11T21:38:05.8699609Z 2023-01-11T21:38:05.8699686Z def call(args): 2023-01-11T21:38:05.8699776Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8699851Z args.clear() 2023-01-11T21:38:05.8699987Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8700104Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:05.8700179Z del arg0_1 2023-01-11T21:38:05.8700251Z del arg1_1 2023-01-11T21:38:05.8700322Z del arg2_1 2023-01-11T21:38:05.8700398Z return (buf0, ) 2023-01-11T21:38:05.8700403Z 2023-01-11T21:38:05.8700409Z 2023-01-11T21:38:05.8700489Z if __name__ == "__main__": 2023-01-11T21:38:05.8700646Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8700765Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8700997Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701218Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701409Z arg2_1 = 
rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8701537Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8701542Z 2023-01-11T21:38:05.8701614Z ok (0.099s) 2023-01-11T21:38:05.8701939Z test_conv_backward_cpu (__main__.CpuTests) ... [2023-01-11 21:25:36,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 62 2023-01-11T21:38:05.8702202Z [2023-01-11 21:25:36,963] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 62 2023-01-11T21:38:05.8702211Z 2023-01-11T21:38:05.8702301Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8702377Z import torch 2023-01-11T21:38:05.8702453Z import random 2023-01-11T21:38:05.8702571Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8702695Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8702701Z 2023-01-11T21:38:05.8702786Z aten = torch.ops.aten 2023-01-11T21:38:05.8702923Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8703018Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8703023Z 2023-01-11T21:38:05.8703090Z import triton 2023-01-11T21:38:05.8703184Z import triton.language as tl 2023-01-11T21:38:05.8703309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8703453Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8703459Z 2023-01-11T21:38:05.8703463Z 2023-01-11T21:38:05.8703558Z async_compile.wait(globals()) 2023-01-11T21:38:05.8703635Z del async_compile 2023-01-11T21:38:05.8703643Z 2023-01-11T21:38:05.8703719Z def call(args): 2023-01-11T21:38:05.8703849Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:05.8703918Z args.clear() 2023-01-11T21:38:05.8704124Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8704202Z buf1 = buf0[0] 2023-01-11T21:38:05.8704316Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8704389Z buf2 = buf0[1] 2023-01-11T21:38:05.8704498Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:05.8704570Z buf3 = buf0[2] 2023-01-11T21:38:05.8704662Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:05.8704732Z del buf0 2023-01-11T21:38:05.8704900Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:05.8704974Z del arg0_1 2023-01-11T21:38:05.8705061Z del arg1_1 2023-01-11T21:38:05.8705148Z del arg2_1 2023-01-11T21:38:05.8705227Z buf5 = buf4[0] 2023-01-11T21:38:05.8705347Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8705420Z del buf4 2023-01-11T21:38:05.8705585Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:05.8705659Z del arg3_1 2023-01-11T21:38:05.8705730Z del arg4_1 2023-01-11T21:38:05.8705802Z del arg5_1 2023-01-11T21:38:05.8705875Z buf7 = buf6[0] 2023-01-11T21:38:05.8705977Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:05.8706052Z buf8 = buf6[1] 2023-01-11T21:38:05.8706160Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:05.8706232Z buf9 = buf6[2] 2023-01-11T21:38:05.8706330Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:05.8706402Z del buf6 2023-01-11T21:38:05.8706576Z buf10 = 
aten.convolution_backward(arg6_1, arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:05.8706671Z del arg6_1 2023-01-11T21:38:05.8706746Z del arg7_1 2023-01-11T21:38:05.8706817Z del arg8_1 2023-01-11T21:38:05.8706893Z buf11 = buf10[0] 2023-01-11T21:38:05.8707014Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:05.8707091Z buf12 = buf10[1] 2023-01-11T21:38:05.8707206Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:05.8707273Z buf13 = buf10[2] 2023-01-11T21:38:05.8707374Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:05.8707444Z del buf10 2023-01-11T21:38:05.8707577Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:05.8707583Z 2023-01-11T21:38:05.8707588Z 2023-01-11T21:38:05.8707670Z if __name__ == "__main__": 2023-01-11T21:38:05.8707789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8707920Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8708136Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708341Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708554Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708755Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8708963Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709164Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709379Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709600Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709818Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8709976Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:05.8709990Z 2023-01-11T21:38:05.8710083Z ok (0.168s) 2023-01-11T21:38:05.8710245Z test_conv_bn_fuse_cpu (__main__.CpuTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:05.8710418Z test_conv_functional_bn_fuse_cpu (__main__.CpuTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:05.8710876Z test_convolution1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8711010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8711268Z [2023-01-11 21:25:37,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 63 2023-01-11T21:38:05.8711530Z [2023-01-11 21:25:38,841] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 63 2023-01-11T21:38:05.8711536Z 2023-01-11T21:38:05.8711636Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8711712Z import torch 2023-01-11T21:38:05.8711780Z import random 2023-01-11T21:38:05.8711900Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8712024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8712029Z 2023-01-11T21:38:05.8712110Z aten = torch.ops.aten 2023-01-11T21:38:05.8712247Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8712345Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8712377Z 2023-01-11T21:38:05.8712453Z import triton 2023-01-11T21:38:05.8712546Z import triton.language as tl 2023-01-11T21:38:05.8712664Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8712805Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8712811Z 2023-01-11T21:38:05.8712816Z 2023-01-11T21:38:05.8712959Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8713167Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8713289Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.8713396Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.8713465Z { 2023-01-11T21:38:05.8713559Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8713625Z { 2023-01-11T21:38:05.8713709Z #pragma omp for 2023-01-11T21:38:05.8713797Z for(long i0=0; i0<2352; i0+=1) 2023-01-11T21:38:05.8713866Z { 2023-01-11T21:38:05.8713939Z { 2023-01-11T21:38:05.8714008Z { 2023-01-11T21:38:05.8714104Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.8714205Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.8714316Z auto tmp2 = static_cast<float>(0); 2023-01-11T21:38:05.8714418Z auto tmp3 = tmp1 <= tmp2; 2023-01-11T21:38:05.8714514Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.8714606Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8714675Z } 2023-01-11T21:38:05.8714735Z } 2023-01-11T21:38:05.8714801Z } 2023-01-11T21:38:05.8714867Z } 2023-01-11T21:38:05.8714931Z } 2023-01-11T21:38:05.8715016Z ''') 2023-01-11T21:38:05.8715022Z 2023-01-11T21:38:05.8715026Z 2023-01-11T21:38:05.8715121Z async_compile.wait(globals()) 2023-01-11T21:38:05.8715198Z del async_compile 2023-01-11T21:38:05.8715203Z 2023-01-11T21:38:05.8715270Z def call(args): 2023-01-11T21:38:05.8715378Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.8715459Z args.clear() 2023-01-11T21:38:05.8715617Z buf0 = aten.convolution(primals_3, primals_1, primals_2, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:05.8715734Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:05.8715812Z del primals_2 2023-01-11T21:38:05.8715973Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:05.8716187Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8716318Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8716430Z
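    # buf1 reinterprets buf0 in place (the "del buf0  # reuse" above marks the
    # aliasing), so the ReLU kernel writes into the convolution's own output
    # buffer; buf2 keeps the <= 0 mask that the backward pass will need.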
return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:05.8716435Z 2023-01-11T21:38:05.8716439Z 2023-01-11T21:38:05.8716520Z if __name__ == "__main__": 2023-01-11T21:38:05.8716638Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8716764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8716989Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717188Z primals_2 = rand_strided((6, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717412Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8717548Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.8717553Z 2023-01-11T21:38:05.8717626Z ok (1.867s) 2023-01-11T21:38:05.8718080Z test_convolution2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8718209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8718493Z [2023-01-11 21:25:38,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 64 2023-01-11T21:38:05.8718754Z [2023-01-11 21:25:38,907] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 64 2023-01-11T21:38:05.8718762Z 2023-01-11T21:38:05.8718862Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8718938Z import torch 2023-01-11T21:38:05.8719012Z import random 2023-01-11T21:38:05.8719124Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8719253Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8719259Z 2023-01-11T21:38:05.8719343Z aten = torch.ops.aten 2023-01-11T21:38:05.8719480Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8719582Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8719587Z 2023-01-11T21:38:05.8719664Z import triton 2023-01-11T21:38:05.8719757Z import triton.language as tl 2023-01-11T21:38:05.8719888Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8720020Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8720026Z 2023-01-11T21:38:05.8720038Z 2023-01-11T21:38:05.8720124Z async_compile.wait(globals()) 2023-01-11T21:38:05.8720204Z del async_compile 2023-01-11T21:38:05.8720209Z 2023-01-11T21:38:05.8720284Z def call(args): 2023-01-11T21:38:05.8720371Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8720449Z args.clear() 2023-01-11T21:38:05.8720585Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (4,), (0,), (1,), True, (0,), 1) 2023-01-11T21:38:05.8720697Z assert_size_stride(buf0, (2, 16, 364), (5824, 364, 1)) 2023-01-11T21:38:05.8720763Z del arg0_1 2023-01-11T21:38:05.8720835Z del arg1_1 2023-01-11T21:38:05.8720908Z del arg2_1 2023-01-11T21:38:05.8720988Z return (buf0, ) 2023-01-11T21:38:05.8720993Z 2023-01-11T21:38:05.8720997Z 2023-01-11T21:38:05.8721083Z if __name__ == "__main__": 2023-01-11T21:38:05.8721200Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8721326Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8721531Z arg0_1 = rand_strided((2, 32, 90), 
(2880, 90, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8721766Z arg1_1 = rand_strided((32, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8721961Z arg2_1 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8722091Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8722096Z 2023-01-11T21:38:05.8722167Z ok (0.064s) 2023-01-11T21:38:05.8722615Z test_cos_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8722752Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8723010Z [2023-01-11 21:25:38,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 65 2023-01-11T21:38:05.8723275Z [2023-01-11 21:25:40,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 65 2023-01-11T21:38:05.8723280Z 2023-01-11T21:38:05.8723380Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8723446Z import torch 2023-01-11T21:38:05.8723521Z import random 2023-01-11T21:38:05.8723638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8723765Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8723770Z 2023-01-11T21:38:05.8723852Z aten = torch.ops.aten 2023-01-11T21:38:05.8723989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8724089Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8724122Z 2023-01-11T21:38:05.8724190Z import triton 2023-01-11T21:38:05.8724284Z import triton.language as tl 2023-01-11T21:38:05.8724409Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8724551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8724556Z 2023-01-11T21:38:05.8724563Z 2023-01-11T21:38:05.8724704Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8724912Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8725040Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8725146Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8725240Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8725310Z { 2023-01-11T21:38:05.8725413Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8725482Z { 2023-01-11T21:38:05.8725564Z #pragma omp for 2023-01-11T21:38:05.8725655Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.8725724Z { 2023-01-11T21:38:05.8725859Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8725950Z auto tmp1 = tmp0.cos(); 2023-01-11T21:38:05.8726092Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.8726184Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.8726320Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.8726412Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.8726503Z auto tmp6 = tmp5.cos(); 2023-01-11T21:38:05.8726600Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8726688Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8726756Z } 2023-01-11T21:38:05.8726857Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8726946Z for(long i0=256; i0<256; i0+=1)
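// scalar tail loop for elements left over after the 8-wide vectorized loop;
// 16*16 = 256 elements fill exactly 32 vectors, so this loop body never runs
// (its bounds are 256..256).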
2023-01-11T21:38:05.8727015Z { 2023-01-11T21:38:05.8727104Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8727191Z auto tmp1 = std::cos(tmp0); 2023-01-11T21:38:05.8727303Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.8727392Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.8727495Z auto tmp4 = static_cast<float>(1); 2023-01-11T21:38:05.8727622Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.8727717Z auto tmp6 = std::cos(tmp5); 2023-01-11T21:38:05.8727804Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8727881Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.8727948Z } 2023-01-11T21:38:05.8728015Z } 2023-01-11T21:38:05.8728081Z } 2023-01-11T21:38:05.8728166Z ''') 2023-01-11T21:38:05.8728172Z 2023-01-11T21:38:05.8728176Z 2023-01-11T21:38:05.8728274Z async_compile.wait(globals()) 2023-01-11T21:38:05.8728354Z del async_compile 2023-01-11T21:38:05.8728360Z 2023-01-11T21:38:05.8728435Z def call(args): 2023-01-11T21:38:05.8728507Z arg0_1, = args 2023-01-11T21:38:05.8728583Z args.clear() 2023-01-11T21:38:05.8728786Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8728987Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8729157Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8729232Z del arg0_1 2023-01-11T21:38:05.8729313Z return (buf0, buf1, ) 2023-01-11T21:38:05.8729318Z 2023-01-11T21:38:05.8729323Z 2023-01-11T21:38:05.8729396Z if __name__ == "__main__": 2023-01-11T21:38:05.8729516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8729642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8729841Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8729959Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8729964Z 2023-01-11T21:38:05.8730071Z ok (1.833s) 2023-01-11T21:38:05.8730238Z test_cpp_wrapper_cpu (__main__.CpuTests) ... skip: cpp_wrapper only supports cpu (0.001s) 2023-01-11T21:38:05.8730379Z test_cudnn_rnn_cpu (__main__.CpuTests) ... skip: requires CUDA (0.003s) 2023-01-11T21:38:05.8730589Z test_dense_mask_index_cpu (__main__.CpuTests) ... skip: https://github.com/pytorch/torchdynamo/issues/1697 (0.001s) 2023-01-11T21:38:05.8731037Z test_div1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8731170Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8731436Z [2023-01-11 21:25:40,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 66 2023-01-11T21:38:05.8731699Z [2023-01-11 21:25:42,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 66 2023-01-11T21:38:05.8731705Z 2023-01-11T21:38:05.8731804Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8731880Z import torch 2023-01-11T21:38:05.8731955Z import random 2023-01-11T21:38:05.8732076Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8732192Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8732204Z 2023-01-11T21:38:05.8732280Z aten = torch.ops.aten 2023-01-11T21:38:05.8732417Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8732514Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8732521Z 2023-01-11T21:38:05.8732596Z import triton 2023-01-11T21:38:05.8732691Z import triton.language as tl 2023-01-11T21:38:05.8732817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8732957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8732965Z 2023-01-11T21:38:05.8732969Z 2023-01-11T21:38:05.8733107Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8733305Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8733457Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8733570Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8733674Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8733777Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8733876Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8733974Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8734065Z float* __restrict__ out_ptr4) 2023-01-11T21:38:05.8734133Z { 2023-01-11T21:38:05.8734235Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8734302Z { 2023-01-11T21:38:05.8734387Z #pragma omp for 2023-01-11T21:38:05.8734581Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.8734651Z { 2023-01-11T21:38:05.8734786Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.8734923Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.8735025Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8735135Z auto tmp3 = tmp2.floor(); 2023-01-11T21:38:05.8735237Z auto tmp4 = tmp2.trunc(); 2023-01-11T21:38:05.8735346Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.8735440Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.8735527Z tmp4.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.8735622Z tmp2.store(out_ptr3 + 8*i0); 2023-01-11T21:38:05.8735714Z tmp3.store(out_ptr4 + 8*i0); 2023-01-11T21:38:05.8735782Z } 2023-01-11T21:38:05.8735880Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.8736018Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.8736086Z { 2023-01-11T21:38:05.8736169Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8736259Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.8736346Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8736447Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:05.8736550Z auto tmp4 = std::trunc(tmp2); 2023-01-11T21:38:05.8736634Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.8736720Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.8736796Z out_ptr2[i0] = tmp4;
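// one fused pass materializes all five requested outputs: true division
// (tmp2), floor (tmp3), trunc (tmp4), and the first two again for the
// aliased buffers out_ptr3/out_ptr4.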
2023-01-11T21:38:05.8736882Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:05.8736967Z out_ptr4[i0] = tmp3; 2023-01-11T21:38:05.8737037Z } 2023-01-11T21:38:05.8737105Z } 2023-01-11T21:38:05.8737225Z } 2023-01-11T21:38:05.8737330Z ''') 2023-01-11T21:38:05.8737336Z 2023-01-11T21:38:05.8737340Z 2023-01-11T21:38:05.8737426Z async_compile.wait(globals()) 2023-01-11T21:38:05.8737508Z del async_compile 2023-01-11T21:38:05.8737513Z 2023-01-11T21:38:05.8737588Z def call(args): 2023-01-11T21:38:05.8737669Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8737744Z args.clear() 2023-01-11T21:38:05.8737943Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738140Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738322Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738513Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738700Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8738961Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8739037Z del arg0_1 2023-01-11T21:38:05.8739114Z del arg1_1 2023-01-11T21:38:05.8739216Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8739222Z 2023-01-11T21:38:05.8739226Z 2023-01-11T21:38:05.8739306Z if __name__ == "__main__": 2023-01-11T21:38:05.8739425Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8739588Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8739787Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8739983Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8740104Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8740109Z 2023-01-11T21:38:05.8740179Z ok (1.985s) 2023-01-11T21:38:05.8740623Z test_div2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8740760Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8741021Z [2023-01-11 21:25:42,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 67 2023-01-11T21:38:05.8741285Z [2023-01-11 21:25:44,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 67 2023-01-11T21:38:05.8741291Z 2023-01-11T21:38:05.8741381Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8741459Z import torch 2023-01-11T21:38:05.8741534Z import random 2023-01-11T21:38:05.8741654Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8741777Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8741782Z 2023-01-11T21:38:05.8741867Z aten = torch.ops.aten 2023-01-11T21:38:05.8742002Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8742132Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8742137Z 2023-01-11T21:38:05.8742204Z import triton 2023-01-11T21:38:05.8742298Z import triton.language as tl 2023-01-11T21:38:05.8742422Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8742568Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8742574Z 2023-01-11T21:38:05.8742579Z 2023-01-11T21:38:05.8742717Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8742926Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8743053Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8743162Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8743257Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8743356Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.8743457Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.8743560Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8743658Z float* __restrict__ out_ptr4) 2023-01-11T21:38:05.8743724Z { 2023-01-11T21:38:05.8743825Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8743884Z { 2023-01-11T21:38:05.8743969Z #pragma omp for 2023-01-11T21:38:05.8744055Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8744123Z { 2023-01-11T21:38:05.8744191Z { 2023-01-11T21:38:05.8744262Z { 2023-01-11T21:38:05.8744353Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8744453Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8744570Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8744667Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.8744774Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:05.8744878Z auto tmp5 = std::trunc(tmp3); 2023-01-11T21:38:05.8744971Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8745061Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.8745142Z out_ptr2[i0] = tmp5; 2023-01-11T21:38:05.8745229Z out_ptr3[i0] = tmp3; 2023-01-11T21:38:05.8745345Z out_ptr4[i0] = tmp4; 2023-01-11T21:38:05.8745417Z } 2023-01-11T21:38:05.8745484Z } 2023-01-11T21:38:05.8745553Z } 2023-01-11T21:38:05.8745613Z } 2023-01-11T21:38:05.8745678Z } 2023-01-11T21:38:05.8745764Z ''') 2023-01-11T21:38:05.8745770Z 2023-01-11T21:38:05.8745774Z 2023-01-11T21:38:05.8745869Z async_compile.wait(globals()) 2023-01-11T21:38:05.8745948Z del async_compile 2023-01-11T21:38:05.8745953Z 2023-01-11T21:38:05.8746028Z def call(args): 2023-01-11T21:38:05.8746110Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8746185Z args.clear() 2023-01-11T21:38:05.8746374Z buf0 = empty_strided((8, 8), (8, 1),
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746568Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746756Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8746945Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8747133Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8747392Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8747467Z del arg0_1 2023-01-11T21:38:05.8747539Z del arg1_1 2023-01-11T21:38:05.8747633Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8747639Z 2023-01-11T21:38:05.8747643Z 2023-01-11T21:38:05.8747724Z if __name__ == "__main__": 2023-01-11T21:38:05.8747843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8748002Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8748194Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8748387Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8748511Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8748516Z 2023-01-11T21:38:05.8748587Z ok (1.847s) 2023-01-11T21:38:05.8749036Z test_div3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8749162Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8749420Z [2023-01-11 21:25:44,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 68 2023-01-11T21:38:05.8749683Z [2023-01-11 21:25:46,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 68 2023-01-11T21:38:05.8749689Z 2023-01-11T21:38:05.8749788Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8749865Z import torch 2023-01-11T21:38:05.8749939Z import random 2023-01-11T21:38:05.8750059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8750184Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8750189Z 2023-01-11T21:38:05.8750264Z aten = torch.ops.aten 2023-01-11T21:38:05.8750401Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8750498Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8750504Z 2023-01-11T21:38:05.8750580Z import triton 2023-01-11T21:38:05.8750673Z import triton.language as tl 2023-01-11T21:38:05.8750797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8750938Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8750944Z 2023-01-11T21:38:05.8750948Z 2023-01-11T21:38:05.8751087Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8751286Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8751449Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8751563Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8751669Z 
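// the output buffers mix dtypes: float pointers receive the true-division
// results, long pointers the floor- and trunc-division quotients.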
float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8751772Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8751872Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8751974Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8752072Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8752130Z { 2023-01-11T21:38:05.8752234Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8752304Z { 2023-01-11T21:38:05.8752386Z #pragma omp for 2023-01-11T21:38:05.8752475Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8752544Z { 2023-01-11T21:38:05.8752605Z { 2023-01-11T21:38:05.8752674Z { 2023-01-11T21:38:05.8752775Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8752873Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8752986Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8753100Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8753195Z auto tmp4 = tmp1 / tmp3; 2023-01-11T21:38:05.8753433Z auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2); 2023-01-11T21:38:05.8753530Z auto tmp6 = tmp0 / tmp2; 2023-01-11T21:38:05.8753622Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8753743Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8753832Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8753921Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.8754009Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8754079Z } 2023-01-11T21:38:05.8754140Z } 2023-01-11T21:38:05.8754210Z } 2023-01-11T21:38:05.8754279Z } 2023-01-11T21:38:05.8754342Z } 2023-01-11T21:38:05.8754428Z ''') 2023-01-11T21:38:05.8754434Z 2023-01-11T21:38:05.8754439Z 2023-01-11T21:38:05.8754533Z async_compile.wait(globals()) 2023-01-11T21:38:05.8754603Z del async_compile 2023-01-11T21:38:05.8754619Z 2023-01-11T21:38:05.8754687Z def call(args): 2023-01-11T21:38:05.8754770Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8754850Z args.clear() 2023-01-11T21:38:05.8755047Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8755238Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8755435Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8755630Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8755808Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8756066Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8756142Z del arg0_1 2023-01-11T21:38:05.8756216Z del arg1_1 2023-01-11T21:38:05.8756321Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8756328Z 2023-01-11T21:38:05.8756332Z 2023-01-11T21:38:05.8756413Z if __name__ == "__main__": 2023-01-11T21:38:05.8756532Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8756661Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8756847Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8757039Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8757162Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8757167Z 2023-01-11T21:38:05.8757238Z ok (1.773s) 2023-01-11T21:38:05.8757768Z test_div4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated.
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8757904Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8758160Z [2023-01-11 21:25:46,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 69 2023-01-11T21:38:05.8758427Z [2023-01-11 21:25:46,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 69 2023-01-11T21:38:05.8758433Z 2023-01-11T21:38:05.8758536Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8758613Z import torch 2023-01-11T21:38:05.8758681Z import random 2023-01-11T21:38:05.8758806Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8758931Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8758936Z 2023-01-11T21:38:05.8759022Z aten = torch.ops.aten 2023-01-11T21:38:05.8759164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8759261Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8759266Z 2023-01-11T21:38:05.8759342Z import triton 2023-01-11T21:38:05.8759427Z import triton.language as tl 2023-01-11T21:38:05.8759558Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8759697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8759732Z 2023-01-11T21:38:05.8759737Z 2023-01-11T21:38:05.8759878Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8760084Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8760207Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8760320Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8760424Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8760516Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8760617Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8760719Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8760817Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8760883Z { 2023-01-11T21:38:05.8760986Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8761052Z { 2023-01-11T21:38:05.8761126Z #pragma omp for 2023-01-11T21:38:05.8761218Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8761286Z { 2023-01-11T21:38:05.8761355Z { 2023-01-11T21:38:05.8761424Z { 2023-01-11T21:38:05.8761523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8761621Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:05.8761730Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8761844Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:05.8761940Z auto tmp4 = tmp1 / tmp3; 2023-01-11T21:38:05.8762188Z auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ?
tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2); 2023-01-11T21:38:05.8762283Z auto tmp6 = tmp0 / tmp2; 2023-01-11T21:38:05.8762374Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.8762465Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8762546Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8762637Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.8762726Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8762803Z } 2023-01-11T21:38:05.8762870Z } 2023-01-11T21:38:05.8762937Z } 2023-01-11T21:38:05.8763005Z } 2023-01-11T21:38:05.8763062Z } 2023-01-11T21:38:05.8763146Z ''') 2023-01-11T21:38:05.8763182Z 2023-01-11T21:38:05.8763187Z 2023-01-11T21:38:05.8763284Z async_compile.wait(globals()) 2023-01-11T21:38:05.8763360Z del async_compile 2023-01-11T21:38:05.8763365Z 2023-01-11T21:38:05.8763440Z def call(args): 2023-01-11T21:38:05.8763526Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8763602Z args.clear() 2023-01-11T21:38:05.8763788Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8763980Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764170Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764365Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8764553Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8764811Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8764904Z del arg0_1 2023-01-11T21:38:05.8764984Z del arg1_1 2023-01-11T21:38:05.8765098Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8765115Z 2023-01-11T21:38:05.8765120Z 2023-01-11T21:38:05.8765194Z if __name__ == "__main__": 2023-01-11T21:38:05.8765314Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8765440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8765632Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8765822Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8765981Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8765986Z 2023-01-11T21:38:05.8766057Z ok (0.067s) 2023-01-11T21:38:05.8766501Z test_div5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8766635Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8766881Z [2023-01-11 21:25:46,588] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 70 2023-01-11T21:38:05.8767141Z [2023-01-11 21:25:48,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 70 2023-01-11T21:38:05.8767150Z 2023-01-11T21:38:05.8767250Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8767326Z import torch 2023-01-11T21:38:05.8767402Z import random 2023-01-11T21:38:05.8767521Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8767649Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8767654Z 2023-01-11T21:38:05.8767741Z aten = torch.ops.aten 2023-01-11T21:38:05.8767871Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8767966Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8767972Z 2023-01-11T21:38:05.8768046Z import triton 2023-01-11T21:38:05.8768141Z import triton.language as tl 2023-01-11T21:38:05.8768265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8768405Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8768410Z 2023-01-11T21:38:05.8768415Z 2023-01-11T21:38:05.8768552Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8768759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8768873Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8768978Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8769082Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8769213Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8769320Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8769421Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8769489Z { 2023-01-11T21:38:05.8769584Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8769650Z { 2023-01-11T21:38:05.8769731Z #pragma omp for 2023-01-11T21:38:05.8769822Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8769892Z { 2023-01-11T21:38:05.8769960Z { 2023-01-11T21:38:05.8770030Z { 2023-01-11T21:38:05.8770120Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8770238Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8770352Z auto tmp2 = static_cast<float>(16); 2023-01-11T21:38:05.8770451Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.8770560Z auto tmp4 = static_cast<long>(16); 2023-01-11T21:38:05.8770813Z auto tmp5 = ((tmp0 < 0) != (tmp4 < 0) ? (tmp0 % tmp4 != 0 ?
tmp0 / tmp4 - 1 : tmp0 / tmp4) : tmp0 / tmp4); 2023-01-11T21:38:05.8770914Z auto tmp6 = tmp0 / tmp4; 2023-01-11T21:38:05.8770997Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.8771085Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.8771174Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.8771264Z out_ptr3[i0] = tmp3; 2023-01-11T21:38:05.8771352Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:05.8771421Z } 2023-01-11T21:38:05.8771490Z } 2023-01-11T21:38:05.8771582Z } 2023-01-11T21:38:05.8771650Z } 2023-01-11T21:38:05.8771715Z } 2023-01-11T21:38:05.8771799Z ''') 2023-01-11T21:38:05.8771805Z 2023-01-11T21:38:05.8771809Z 2023-01-11T21:38:05.8771903Z async_compile.wait(globals()) 2023-01-11T21:38:05.8771980Z del async_compile 2023-01-11T21:38:05.8771986Z 2023-01-11T21:38:05.8772063Z def call(args): 2023-01-11T21:38:05.8772131Z arg0_1, = args 2023-01-11T21:38:05.8772207Z args.clear() 2023-01-11T21:38:05.8772401Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8772590Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8772779Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8772969Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8773153Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8773399Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8773466Z del arg0_1 2023-01-11T21:38:05.8773569Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8773574Z 2023-01-11T21:38:05.8773579Z 2023-01-11T21:38:05.8773662Z if __name__ == "__main__": 2023-01-11T21:38:05.8773780Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8773910Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8774103Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8774217Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8774222Z 2023-01-11T21:38:05.8774294Z ok (1.835s) 2023-01-11T21:38:05.8774845Z test_div6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8774976Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8775288Z [2023-01-11 21:25:48,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 71 2023-01-11T21:38:05.8775553Z [2023-01-11 21:25:49,958] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 71 2023-01-11T21:38:05.8775559Z 2023-01-11T21:38:05.8775659Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8775734Z import torch 2023-01-11T21:38:05.8775812Z import random 2023-01-11T21:38:05.8775933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8776056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8776061Z 2023-01-11T21:38:05.8776136Z aten = torch.ops.aten 2023-01-11T21:38:05.8776278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8776380Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8776385Z 2023-01-11T21:38:05.8776460Z import triton 2023-01-11T21:38:05.8776554Z import triton.language as tl 2023-01-11T21:38:05.8776681Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8776820Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8776826Z 2023-01-11T21:38:05.8776830Z 2023-01-11T21:38:05.8776968Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8777222Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8777354Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.8777465Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.8777569Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8777669Z long* __restrict__ out_ptr1, 2023-01-11T21:38:05.8777809Z long* __restrict__ out_ptr2, 2023-01-11T21:38:05.8777911Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.8778008Z long* __restrict__ out_ptr4) 2023-01-11T21:38:05.8778066Z { 2023-01-11T21:38:05.8778171Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8778238Z { 2023-01-11T21:38:05.8778320Z #pragma omp for 2023-01-11T21:38:05.8778409Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8778477Z { 2023-01-11T21:38:05.8778538Z { 2023-01-11T21:38:05.8778610Z { 2023-01-11T21:38:05.8778708Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8778805Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:05.8778914Z auto tmp1 = static_cast<long>(tmp0); 2023-01-11T21:38:05.8779028Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.8779138Z auto tmp4 = static_cast<float>(tmp3); 2023-01-11T21:38:05.8779231Z auto tmp5 = tmp2 / tmp4; 2023-01-11T21:38:05.8779477Z auto tmp6 = ((tmp1 < 0) != (tmp3 < 0) ? (tmp1 % tmp3 != 0 ?
tmp1 / tmp3 - 1 : tmp1 / tmp3) : tmp1 / tmp3); 2023-01-11T21:38:05.8779573Z auto tmp7 = tmp1 / tmp3; 2023-01-11T21:38:05.8779686Z auto tmp8 = static_cast<float>(tmp0); 2023-01-11T21:38:05.8779781Z auto tmp9 = tmp8 / tmp4; 2023-01-11T21:38:05.8779870Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.8779960Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.8780051Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:05.8780133Z out_ptr3[i0] = tmp9; 2023-01-11T21:38:05.8780221Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:05.8780292Z } 2023-01-11T21:38:05.8780361Z } 2023-01-11T21:38:05.8780428Z } 2023-01-11T21:38:05.8780496Z } 2023-01-11T21:38:05.8780553Z } 2023-01-11T21:38:05.8780638Z ''') 2023-01-11T21:38:05.8780644Z 2023-01-11T21:38:05.8780649Z 2023-01-11T21:38:05.8780742Z async_compile.wait(globals()) 2023-01-11T21:38:05.8780821Z del async_compile 2023-01-11T21:38:05.8780826Z 2023-01-11T21:38:05.8780902Z def call(args): 2023-01-11T21:38:05.8780982Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8781092Z args.clear() 2023-01-11T21:38:05.8781289Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8781471Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8781663Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8781854Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8782039Z buf4 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8782298Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.8782377Z del arg0_1 2023-01-11T21:38:05.8782449Z del arg1_1 2023-01-11T21:38:05.8782551Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8782556Z 2023-01-11T21:38:05.8782563Z 2023-01-11T21:38:05.8782637Z if __name__ == "__main__": 2023-01-11T21:38:05.8782755Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8782880Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8783068Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8783259Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8783377Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8783382Z 2023-01-11T21:38:05.8783454Z ok (1.707s) 2023-01-11T21:38:05.8783902Z test_div7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
test_div7_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:50,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 72
[2023-01-11 21:25:51,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 72

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10000; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp2 = in_ptr1[i0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((100, 100), (100, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (1.713s)
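As in graph 71, the int64 ÷ int64 case shows why call() mixes buffer dtypes: true division promotes integer inputs to float32, while the floor and trunc variants keep int64, so out_ptr0/out_ptr3 are float and out_ptr1/out_ptr2/out_ptr4 stay long. The same promotion is easy to confirm in eager mode (illustrative check, not from the log):

import torch

a = torch.randint(1, 100, (100, 100), dtype=torch.int64)
b = torch.randint(1, 100, (100, 100), dtype=torch.int64)
print((a / b).dtype)                                 # torch.float32
print(torch.div(a, b, rounding_mode="floor").dtype)  # torch.int64
print(torch.div(a, b, rounding_mode="trunc").dtype)  # torch.int64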
test_div8_cpu (__main__.CpuTests) ... [2023-01-11 21:25:51,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 73
[2023-01-11 21:25:53,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 73

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(long* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2)
{
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = tmp0 / tmp1;
            out_ptr0[0] = tmp2;
        }
    }
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1);
            out_ptr1[0] = tmp2;
        }
    }
    {
        {
            auto tmp0 = static_cast<long>(1024);
            auto tmp1 = static_cast<long>(100);
            auto tmp2 = tmp0 / tmp1;
            out_ptr2[0] = tmp2;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
    buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((), (), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    return (buf0, buf1, buf2, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))

ok (1.683s)
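Graph 73 takes no tensor inputs at all: the operands 1024 and 100 were constants in the traced graph, so they are baked into the kernel and call([]) merely allocates three 0-dim int64 outputs. A 0-dim tensor is declared with empty shape and stride tuples; a small illustration of what the kernel's single-element stores amount to (the fill value is inferred from the constants, for illustration only):

import torch

buf = torch.empty_strided((), (), dtype=torch.int64)  # 0-dim, one element
buf.fill_(1024 // 100)                                # what out_ptr1[0] receives
print(buf.item(), buf.dim(), buf.numel())             # 10 0 1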
test_div_prim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:53,393] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 74
[2023-01-11 21:25:55,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 74
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:55,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 75
[2023-01-11 21:25:56,747] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 75

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<12; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 / tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=96; i0<100; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 / tmp1;
            out_ptr0[i0] = tmp2;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       long* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<100; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp1 = in_ptr1[i0];
                    auto tmp2 = tmp0 / tmp1;
                    out_ptr0[i0] = tmp2;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (3.392s)
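The float kernel in the div_prim dump shows the CPU backend's standard loop split: a main loop over 8-lane at::vec::Vectorized<float> values covers 96 of the 100 elements in 12 iterations, and a scalar tail loop annotated #pragma omp for simd simdlen(4) handles the remaining 4; the int64 kernel, which has no vector path here, stays fully scalar. The trip counts fall out of simple lane arithmetic (illustrative):

n, lanes = 100, 8
main_iters, tail = n // lanes, n % lanes
print(main_iters, tail)    # 12 4 -> for(i0=0; i0<12) vector loop
print(main_iters * lanes)  # 96  -> tail loop runs i0 = 96..99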
test_div_zero_dim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:56,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 76
[2023-01-11 21:25:58,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 76
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:25:58,606] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 77
[2023-01-11 21:26:00,368] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 77
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:00,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 78

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       float* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]);
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = tmp2.floor();
            auto tmp4 = tmp2.trunc();
            tmp2.store(out_ptr0 + 8*i0);
            tmp3.store(out_ptr1 + 8*i0);
            tmp4.store(out_ptr2 + 8*i0);
            tmp2.store(out_ptr3 + 8*i0);
            tmp3.store(out_ptr4 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8; i0<10; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[0];
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = std::floor(tmp2);
            auto tmp4 = std::trunc(tmp2);
            out_ptr0[i0] = tmp2;
            out_ptr1[i0] = tmp3;
            out_ptr2[i0] = tmp4;
            out_ptr3[i0] = tmp2;
            out_ptr4[i0] = tmp3;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       float* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>(in_ptr0[0]);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = tmp2.floor();
            auto tmp4 = tmp2.trunc();
            tmp2.store(out_ptr0 + 8*i0);
            tmp3.store(out_ptr1 + 8*i0);
            tmp4.store(out_ptr2 + 8*i0);
            tmp2.store(out_ptr3 + 8*i0);
            tmp3.store(out_ptr4 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8; i0<10; i0+=1)
        {
            auto tmp0 = in_ptr0[0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 / tmp1;
            auto tmp3 = std::floor(tmp2);
            auto tmp4 = std::trunc(tmp2);
            out_ptr0[i0] = tmp2;
            out_ptr1[i0] = tmp3;
            out_ptr2[i0] = tmp4;
            out_ptr3[i0] = tmp2;
            out_ptr4[i0] = tmp3;
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:26:02,196] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 78
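In both graph-78 kernels the 0-dim operand is splatted across all lanes once via the Vectorized<float> broadcast constructor (e.g. Vectorized<float>(in_ptr1[0])) rather than loadu, and for floating-point inputs the rounding modes reduce to per-lane floor()/trunc() of the true quotient. The eager-mode equivalent of the five outputs (illustrative):

import torch

x = torch.randn(10)
s = torch.tensor(2.0)   # 0-dim divisor, broadcast over x
out0 = x / s
out1 = torch.div(x, s, rounding_mode="floor")  # tmp2.floor()
out2 = torch.div(x, s, rounding_mode="trunc")  # tmp2.trunc()
torch.testing.assert_close(out1, (x / s).floor())
torch.testing.assert_close(out2, (x / s).trunc())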
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:02,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 79
[2023-01-11 21:26:03,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 79

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp2 = in_ptr1[0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       long* __restrict__ out_ptr1,
                       long* __restrict__ out_ptr2,
                       float* __restrict__ out_ptr3,
                       long* __restrict__ out_ptr4)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<10; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[0];
                    auto tmp2 = in_ptr1[i0];
                    auto tmp1 = static_cast<float>(tmp0);
                    auto tmp3 = static_cast<float>(tmp2);
                    auto tmp4 = tmp1 / tmp3;
                    auto tmp5 = ((tmp0 < 0) != (tmp2 < 0) ? (tmp0 % tmp2 != 0 ? tmp0 / tmp2 - 1 : tmp0 / tmp2) : tmp0 / tmp2);
                    auto tmp6 = tmp0 / tmp2;
                    out_ptr0[i0] = tmp4;
                    out_ptr1[i0] = tmp5;
                    out_ptr2[i0] = tmp6;
                    out_ptr3[i0] = tmp4;
                    out_ptr4[i0] = tmp5;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    buf3 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, buf2, buf3, buf4, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (7.180s)
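Every call() in these dumps follows the same memory discipline: args.clear() drops the argument list's references on entry, outputs are allocated with empty_strided, raw data_ptr()s are passed to the C kernel, and del argN_1 releases each input as soon as the kernel has consumed it. A stripped-down version of the pattern (simplified sketch, not the generated code):

import torch
from ctypes import c_void_p

def call(args):
    arg0_1, = args
    args.clear()                       # caller's list no longer pins the input
    buf0 = torch.empty_strided((10,), (1,), dtype=torch.float32)
    ptr = c_void_p(arg0_1.data_ptr())  # raw pointer handed to the C kernel
    # ... compiled kernel would run here ...
    del arg0_1                         # input is dead; its storage can be reused
    return (buf0,)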
test_dropout_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:26:03,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 80
[2023-01-11 21:26:03,994] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:05,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 80
[2023-01-11 21:26:05,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 81
[2023-01-11 21:26:05,731] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:05,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 81

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1000; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.5);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.0);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1000; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.5);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.0);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1000, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.811s)
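The dropout kernels are fully deterministic given the seed tensor: each element draws normalized_rand_cpu(seed, i0), keeps the value when the draw exceeds p, and rescales survivors by 1/(1 - p), which for p = 0.5 is the 2.0 constant in the kernel. A minimal eager sketch of the same inverted-dropout arithmetic (using ordinary torch.rand in place of the generated RNG helper):

import torch

def inverted_dropout(x: torch.Tensor, p: float, seed: int) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)      # fixed seed -> reproducible mask
    keep = torch.rand(x.shape, generator=g) > p  # tmp4 = tmp2 > tmp3
    return keep.to(x.dtype) * x * (1.0 / (1.0 - p))

y = inverted_dropout(torch.randn(1000), p=0.5, seed=0)  # scale = 2.0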
test_dropout_deterministic_cpu (__main__.CpuTests) ... [2023-01-11 21:26:05,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 82
[2023-01-11 21:26:05,804] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:07,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 82
[2023-01-11 21:26:07,546] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 83
[2023-01-11 21:26:07,546] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
[2023-01-11 21:26:07,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 83

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.55);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.2222222222222223);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce

kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ seed0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            {
                {
                    auto tmp0 = seed0[0];
                    auto tmp6 = in_ptr1[i0];
                    auto tmp1 = static_cast<long>(i0);
                    auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));;
                    auto tmp3 = static_cast<float>(0.55);
                    auto tmp4 = tmp2 > tmp3;
                    auto tmp5 = static_cast<float>(tmp4);
                    auto tmp7 = tmp5 * tmp6;
                    auto tmp8 = static_cast<float>(2.2222222222222223);
                    auto tmp9 = tmp7 * tmp8;
                    out_ptr0[i0] = tmp9;
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
    buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
    arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.815s)
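The deterministic variant differs only in the threshold and the scale: with p = 0.55 the survivor rescale 1/(1 - p) prints at double precision as exactly the constant embedded in the kernel:

p = 0.55
print(1.0 / (1.0 - p))  # 2.2222222222222223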
parallel num_threads(8) 2023-01-11T21:38:05.8937366Z { 2023-01-11T21:38:05.8937449Z #pragma omp for 2023-01-11T21:38:05.8937530Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8937598Z { 2023-01-11T21:38:05.8937667Z { 2023-01-11T21:38:05.8937736Z { 2023-01-11T21:38:05.8937999Z float tmp5 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.8938098Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.8938171Z { 2023-01-11T21:38:05.8938238Z { 2023-01-11T21:38:05.8938353Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8938473Z auto tmp1 = static_cast(63); 2023-01-11T21:38:05.8938575Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8938668Z float tmp3 = 0.0; 2023-01-11T21:38:05.8938750Z if(tmp2) 2023-01-11T21:38:05.8938829Z { 2023-01-11T21:38:05.8938943Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:05.8939027Z tmp3 = tmp4; 2023-01-11T21:38:05.8939105Z } 2023-01-11T21:38:05.8939216Z tmp5 = std::max(tmp5, tmp3); 2023-01-11T21:38:05.8939289Z } 2023-01-11T21:38:05.8939359Z } 2023-01-11T21:38:05.8939451Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.8939524Z } 2023-01-11T21:38:05.8939585Z } 2023-01-11T21:38:05.8939654Z } 2023-01-11T21:38:05.8939739Z #pragma omp for 2023-01-11T21:38:05.8939827Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8939895Z { 2023-01-11T21:38:05.8939962Z { 2023-01-11T21:38:05.8940024Z { 2023-01-11T21:38:05.8940111Z float tmp8 = 0; 2023-01-11T21:38:05.8940257Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.8940330Z { 2023-01-11T21:38:05.8940404Z { 2023-01-11T21:38:05.8940510Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:05.8940624Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:05.8940728Z auto tmp1 = static_cast(63); 2023-01-11T21:38:05.8940829Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:05.8940923Z float tmp3 = 0.0; 2023-01-11T21:38:05.8941006Z if(tmp2) 2023-01-11T21:38:05.8941082Z { 2023-01-11T21:38:05.8941198Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:05.8941291Z tmp3 = tmp4; 2023-01-11T21:38:05.8941359Z } 2023-01-11T21:38:05.8941507Z auto tmp6 = tmp3 - tmp5; 2023-01-11T21:38:05.8941621Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:05.8941728Z out_ptr1[i1 + (64*i0)] = tmp7; 2023-01-11T21:38:05.8941818Z tmp8 += tmp7; 2023-01-11T21:38:05.8941892Z } 2023-01-11T21:38:05.8941965Z } 2023-01-11T21:38:05.8942047Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:05.8942121Z } 2023-01-11T21:38:05.8942188Z } 2023-01-11T21:38:05.8942256Z } 2023-01-11T21:38:05.8942338Z #pragma omp for 2023-01-11T21:38:05.8942427Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:05.8942495Z { 2023-01-11T21:38:05.8942621Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.8942690Z { 2023-01-11T21:38:05.8942842Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.8942976Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.8943073Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8943189Z tmp2.store(in_out_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.8943259Z } 2023-01-11T21:38:05.8943347Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8943441Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:05.8943525Z { 2023-01-11T21:38:05.8943655Z auto tmp0 = out_ptr1[i1 + (64*i0)]; 2023-01-11T21:38:05.8943777Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.8943895Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.8944021Z in_out_ptr0[i1 + (64*i0)] = tmp2; 2023-01-11T21:38:05.8944089Z } 2023-01-11T21:38:05.8944159Z } 2023-01-11T21:38:05.8944228Z } 2023-01-11T21:38:05.8944297Z } 2023-01-11T21:38:05.8944392Z ''') 2023-01-11T21:38:05.8944398Z 2023-01-11T21:38:05.8944403Z 
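# Editor's note: the kernel above is a three-pass softmax in the numerically
# stable max/exp/sum form; the 63-wide input rows are zero-padded to a 64-wide
# buffer via the tmp2 = tmp0 < tmp1 bounds check. A rough eager-mode sketch of
# the same computation, ignoring the padding lane (hypothetical reference code,
# not part of the generated module):
def _softmax_reference(x):
    m = x.max(dim=-1, keepdim=True).values   # pass 1: per-row max
    e = (x - m).exp()                        # pass 2: exp(x - max) and row sums
    return e / e.sum(dim=-1, keepdim=True)   # pass 3: normalize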
2023-01-11T21:38:05.8944500Z async_compile.wait(globals()) 2023-01-11T21:38:05.8944581Z del async_compile 2023-01-11T21:38:05.8944591Z 2023-01-11T21:38:05.8944670Z def call(args): 2023-01-11T21:38:05.8944742Z arg0_1, = args 2023-01-11T21:38:05.8944821Z args.clear() 2023-01-11T21:38:05.8945037Z buf0 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945256Z buf1 = empty_strided((128, 32, 64), (2048, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945468Z buf2 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8945564Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.8945760Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8945831Z del arg0_1 2023-01-11T21:38:05.8945912Z return (buf3, ) 2023-01-11T21:38:05.8945917Z 2023-01-11T21:38:05.8945921Z 2023-01-11T21:38:05.8946016Z if __name__ == "__main__": 2023-01-11T21:38:05.8946139Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8946319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8946533Z arg0_1 = rand_strided((128, 32, 63), (2016, 63, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8946653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8946658Z 2023-01-11T21:38:05.8946731Z ok (1.750s) 2023-01-11T21:38:05.8947263Z test_elu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8947403Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8947699Z [2023-01-11 21:26:09,347] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 85 2023-01-11T21:38:05.8948000Z [2023-01-11 21:26:11,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 85 2023-01-11T21:38:05.8948006Z 2023-01-11T21:38:05.8948110Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8948188Z import torch 2023-01-11T21:38:05.8948266Z import random 2023-01-11T21:38:05.8948394Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8948528Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8948533Z 2023-01-11T21:38:05.8948610Z aten = torch.ops.aten 2023-01-11T21:38:05.8948761Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8948896Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8948901Z 2023-01-11T21:38:05.8948981Z import triton 2023-01-11T21:38:05.8949080Z import triton.language as tl 2023-01-11T21:38:05.8949225Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8949368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8949375Z 2023-01-11T21:38:05.8949380Z 2023-01-11T21:38:05.8949519Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8949719Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8949845Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8949951Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8950061Z float* 
__restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<256; i0+=1)
        {
            {
                {
                    auto tmp0 = in_ptr0[i0];
                    auto tmp1 = static_cast<float>(0);
                    auto tmp2 = tmp0 > tmp1;
                    auto tmp3 = static_cast<float>(1.0507009873554805);
                    auto tmp4 = tmp0 * tmp3;
                    auto tmp5 = static_cast<float>(1.0);
                    auto tmp6 = tmp0 * tmp5;
                    auto tmp7 = std::expm1(tmp6);
                    auto tmp8 = static_cast<float>(1.7580993408473766);
                    auto tmp9 = tmp7 * tmp8;
                    auto tmp10 = tmp2 ? tmp4 : tmp9;
                    auto tmp11 = static_cast<float>(2);
                    auto tmp12 = tmp10 + tmp11;
                    auto tmp13 = static_cast<float>(1);
                    auto tmp14 = tmp0 + tmp13;
                    auto tmp15 = tmp14 > tmp1;
                    auto tmp16 = static_cast<float>(3);
                    auto tmp17 = tmp14 * tmp16;
                    auto tmp18 = static_cast<float>(4);
                    auto tmp19 = tmp14 * tmp18;
                    auto tmp20 = std::expm1(tmp19);
                    auto tmp21 = static_cast<float>(6);
                    auto tmp22 = tmp20 * tmp21;
                    auto tmp23 = tmp15 ? tmp17 : tmp22;
                    out_ptr0[i0] = tmp12;
                    out_ptr1[i0] = tmp23;
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.723s)
test_embedding_bag_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated.
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8956474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8956736Z [2023-01-11 21:26:11,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 86 2023-01-11T21:38:05.8956971Z [2023-01-11 21:26:11,066] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag 2023-01-11T21:38:05.8957236Z [2023-01-11 21:26:11,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 86 2023-01-11T21:38:05.8957242Z 2023-01-11T21:38:05.8957345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8957424Z import torch 2023-01-11T21:38:05.8957506Z import random 2023-01-11T21:38:05.8957621Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8957750Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8957758Z 2023-01-11T21:38:05.8957844Z aten = torch.ops.aten 2023-01-11T21:38:05.8957983Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8958082Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8958087Z 2023-01-11T21:38:05.8958164Z import triton 2023-01-11T21:38:05.8958291Z import triton.language as tl 2023-01-11T21:38:05.8958411Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8958552Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8958558Z 2023-01-11T21:38:05.8958562Z 2023-01-11T21:38:05.8958657Z async_compile.wait(globals()) 2023-01-11T21:38:05.8958735Z del async_compile 2023-01-11T21:38:05.8958740Z 2023-01-11T21:38:05.8958816Z def call(args): 2023-01-11T21:38:05.8958905Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.8958982Z args.clear() 2023-01-11T21:38:05.8959095Z buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1) 2023-01-11T21:38:05.8959160Z del arg0_1 2023-01-11T21:38:05.8959236Z del arg1_1 2023-01-11T21:38:05.8959307Z del arg2_1 2023-01-11T21:38:05.8959381Z buf1 = buf0[0] 2023-01-11T21:38:05.8959481Z assert_size_stride(buf1, (3, 4), (4, 1)) 2023-01-11T21:38:05.8959556Z buf2 = buf0[1] 2023-01-11T21:38:05.8959656Z assert_size_stride(buf2, (0, ), (1, )) 2023-01-11T21:38:05.8959724Z buf3 = buf0[2] 2023-01-11T21:38:05.8959824Z assert_size_stride(buf3, (3, ), (1, )) 2023-01-11T21:38:05.8959899Z buf4 = buf0[3] 2023-01-11T21:38:05.8959994Z assert_size_stride(buf4, (3, ), (1, )) 2023-01-11T21:38:05.8960066Z del buf0 2023-01-11T21:38:05.8960159Z return (buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:05.8960165Z 2023-01-11T21:38:05.8960169Z 2023-01-11T21:38:05.8960250Z if __name__ == "__main__": 2023-01-11T21:38:05.8960361Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8960488Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8960689Z arg0_1 = rand_strided((10, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8960915Z arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8961106Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8961235Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.8961243Z 2023-01-11T21:38:05.8961317Z ok (0.041s) 2023-01-11T21:38:05.8961769Z test_embedding_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8961899Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8962149Z [2023-01-11 21:26:11,172] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 87 2023-01-11T21:38:05.8962412Z [2023-01-11 21:26:12,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 87 2023-01-11T21:38:05.8962418Z 2023-01-11T21:38:05.8962519Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8962594Z import torch 2023-01-11T21:38:05.8962673Z import random 2023-01-11T21:38:05.8962794Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8962919Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8962924Z 2023-01-11T21:38:05.8963008Z aten = torch.ops.aten 2023-01-11T21:38:05.8963137Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8963234Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8963239Z 2023-01-11T21:38:05.8963314Z import triton 2023-01-11T21:38:05.8963409Z import triton.language as tl 2023-01-11T21:38:05.8963534Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8963675Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8963680Z 2023-01-11T21:38:05.8963684Z 2023-01-11T21:38:05.8963822Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8964029Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8964176Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.8964289Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.8964394Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8964498Z bool* __restrict__ out_ptr1, 2023-01-11T21:38:05.8964599Z long* __restrict__ out_ptr2) 2023-01-11T21:38:05.8964666Z { 2023-01-11T21:38:05.8964770Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8964830Z { 2023-01-11T21:38:05.8964913Z #pragma omp for 2023-01-11T21:38:05.8965002Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8965071Z { 2023-01-11T21:38:05.8965164Z #pragma GCC ivdep 2023-01-11T21:38:05.8965252Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.8965319Z { 2023-01-11T21:38:05.8965381Z { 2023-01-11T21:38:05.8965453Z { 2023-01-11T21:38:05.8965554Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8965669Z auto tmp1 = in_ptr1[i1 + (4*tmp0)]; 2023-01-11T21:38:05.8965772Z auto tmp2 = tmp1 * (tmp1>0); 2023-01-11T21:38:05.8965873Z out_ptr0[i1 + (4*i0)] = tmp2; 2023-01-11T21:38:05.8965946Z } 2023-01-11T21:38:05.8966009Z } 2023-01-11T21:38:05.8966075Z } 2023-01-11T21:38:05.8966142Z } 2023-01-11T21:38:05.8966223Z #pragma omp for 2023-01-11T21:38:05.8966311Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.8966376Z { 2023-01-11T21:38:05.8966437Z { 2023-01-11T21:38:05.8966506Z { 2023-01-11T21:38:05.8966638Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.8966749Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.8966848Z auto tmp2 = tmp0 <= tmp1; 2023-01-11T21:38:05.8966940Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.8967009Z } 2023-01-11T21:38:05.8967072Z } 2023-01-11T21:38:05.8967142Z } 2023-01-11T21:38:05.8967224Z #pragma omp 
for 2023-01-11T21:38:05.8967309Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.8967377Z { 2023-01-11T21:38:05.8967444Z { 2023-01-11T21:38:05.8967513Z { 2023-01-11T21:38:05.8967602Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.8967718Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.8967828Z auto tmp2 = static_cast(tmp1); 2023-01-11T21:38:05.8967918Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:05.8967990Z } 2023-01-11T21:38:05.8968057Z } 2023-01-11T21:38:05.8968126Z } 2023-01-11T21:38:05.8968185Z } 2023-01-11T21:38:05.8968250Z } 2023-01-11T21:38:05.8968338Z ''') 2023-01-11T21:38:05.8968344Z 2023-01-11T21:38:05.8968348Z 2023-01-11T21:38:05.8968444Z async_compile.wait(globals()) 2023-01-11T21:38:05.8968530Z del async_compile 2023-01-11T21:38:05.8968535Z 2023-01-11T21:38:05.8968610Z def call(args): 2023-01-11T21:38:05.8968704Z primals_1, primals_2 = args 2023-01-11T21:38:05.8968772Z args.clear() 2023-01-11T21:38:05.8968981Z buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8969181Z buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.8969374Z buf2 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8969602Z kernel_cpp_0(c_void_p(primals_2.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.8969686Z del primals_1 2023-01-11T21:38:05.8969762Z del primals_2 2023-01-11T21:38:05.8969850Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.8969855Z 2023-01-11T21:38:05.8969860Z 2023-01-11T21:38:05.8969933Z if __name__ == "__main__": 2023-01-11T21:38:05.8970095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8970226Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8970431Z primals_1 = rand_strided((10, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8970629Z primals_2 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.8970762Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.8970767Z 2023-01-11T21:38:05.8970837Z ok (1.818s) 2023-01-11T21:38:05.8971286Z test_exp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:26:12,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 88
[2023-01-11 21:26:14,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 88

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp1 = tmp0.exp();
            auto tmp3 = tmp0 + tmp2;
            auto tmp4 = tmp3.exp();
            tmp1.store(out_ptr0 + 8*i0);
            tmp4.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp2 = in_ptr1[i0];
            auto tmp1 = std::exp(tmp0);
            auto tmp3 = tmp0 + tmp2;
            auto tmp4 = std::exp(tmp3);
            out_ptr0[i0] = tmp1;
            out_ptr1[i0] = tmp4;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
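    # Editor's note: buf0 = exp(arg0_1) and buf1 = exp(arg0_1 + arg1_1), both
    # (8, 8) float32; the vectorized loop in the kernel above covers all 64
    # elements, so the scalar tail loop (i0 from 64 to 64) never runs.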
2023-01-11T21:38:05.8977262Z args.clear() 2023-01-11T21:38:05.8977470Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8977665Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8977865Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8977939Z del arg0_1 2023-01-11T21:38:05.8978011Z del arg1_1 2023-01-11T21:38:05.8978094Z return (buf0, buf1, ) 2023-01-11T21:38:05.8978099Z 2023-01-11T21:38:05.8978104Z 2023-01-11T21:38:05.8978177Z if __name__ == "__main__": 2023-01-11T21:38:05.8978298Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8978425Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8978622Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8978815Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8978990Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8978995Z 2023-01-11T21:38:05.8979074Z ok (1.723s) 2023-01-11T21:38:05.8979532Z test_expand_as_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8979665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8979914Z [2023-01-11 21:26:14,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 89 2023-01-11T21:38:05.8980174Z [2023-01-11 21:26:16,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 89 2023-01-11T21:38:05.8980179Z 2023-01-11T21:38:05.8980281Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8980358Z import torch 2023-01-11T21:38:05.8980435Z import random 2023-01-11T21:38:05.8980554Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8980682Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8980687Z 2023-01-11T21:38:05.8980770Z aten = torch.ops.aten 2023-01-11T21:38:05.8980900Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8980996Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8981001Z 2023-01-11T21:38:05.8981076Z import triton 2023-01-11T21:38:05.8981169Z import triton.language as tl 2023-01-11T21:38:05.8981294Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8981435Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8981441Z 2023-01-11T21:38:05.8981445Z 2023-01-11T21:38:05.8981584Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8981791Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8981911Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8982018Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.8982085Z { 2023-01-11T21:38:05.8982186Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8982254Z { 2023-01-11T21:38:05.8982382Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8982474Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.8982535Z { 2023-01-11T21:38:05.8982626Z for(long i1=0; 
i1<128; i1+=1) 2023-01-11T21:38:05.8982697Z { 2023-01-11T21:38:05.8982790Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:05.8982863Z { 2023-01-11T21:38:05.8983014Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (100*i0)); 2023-01-11T21:38:05.8983160Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.8983261Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8983349Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.8983470Z tmp3.store(out_ptr0 + (8*i2) + (100*i1) + (12800*i0)); 2023-01-11T21:38:05.8983541Z } 2023-01-11T21:38:05.8983643Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.8983742Z for(long i2=96; i2<100; i2+=1) 2023-01-11T21:38:05.8983813Z { 2023-01-11T21:38:05.8983920Z auto tmp0 = in_ptr0[i2 + (100*i0)]; 2023-01-11T21:38:05.8984021Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8984118Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8984215Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.8984327Z out_ptr0[i2 + (100*i1) + (12800*i0)] = tmp3; 2023-01-11T21:38:05.8984398Z } 2023-01-11T21:38:05.8984467Z } 2023-01-11T21:38:05.8984536Z } 2023-01-11T21:38:05.8984626Z } 2023-01-11T21:38:05.8984689Z } 2023-01-11T21:38:05.8984775Z ''') 2023-01-11T21:38:05.8984780Z 2023-01-11T21:38:05.8984785Z 2023-01-11T21:38:05.8984883Z async_compile.wait(globals()) 2023-01-11T21:38:05.8984980Z del async_compile 2023-01-11T21:38:05.8984987Z 2023-01-11T21:38:05.8985065Z def call(args): 2023-01-11T21:38:05.8985169Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.8985240Z args.clear() 2023-01-11T21:38:05.8985458Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8985598Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.8985714Z return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, ) 2023-01-11T21:38:05.8985719Z 2023-01-11T21:38:05.8985723Z 2023-01-11T21:38:05.8985808Z if __name__ == "__main__": 2023-01-11T21:38:05.8985925Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8986053Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8986268Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8986475Z arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8986593Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.8986599Z 2023-01-11T21:38:05.8986676Z ok (1.755s) 2023-01-11T21:38:05.8987130Z test_expand_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8987264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8987518Z [2023-01-11 21:26:16,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 90 2023-01-11T21:38:05.8987782Z [2023-01-11 21:26:18,090] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 90 2023-01-11T21:38:05.8987787Z 2023-01-11T21:38:05.8987888Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.8987963Z import torch 2023-01-11T21:38:05.8988031Z import random 2023-01-11T21:38:05.8988222Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.8988350Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.8988356Z 2023-01-11T21:38:05.8988438Z aten = torch.ops.aten 2023-01-11T21:38:05.8988574Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.8988671Z async_compile = AsyncCompile() 2023-01-11T21:38:05.8988676Z 2023-01-11T21:38:05.8988752Z import triton 2023-01-11T21:38:05.8988846Z import triton.language as tl 2023-01-11T21:38:05.8988965Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.8989107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.8989115Z 2023-01-11T21:38:05.8989120Z 2023-01-11T21:38:05.8989259Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.8989469Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.8989596Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.8989702Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.8989806Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.8989873Z { 2023-01-11T21:38:05.8989967Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.8990034Z { 2023-01-11T21:38:05.8990129Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8990217Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.8990284Z { 2023-01-11T21:38:05.8990372Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8990433Z { 2023-01-11T21:38:05.8990522Z #pragma GCC ivdep 2023-01-11T21:38:05.8990644Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:05.8990718Z { 2023-01-11T21:38:05.8990809Z #pragma GCC ivdep 2023-01-11T21:38:05.8990907Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:05.8990977Z { 2023-01-11T21:38:05.8991045Z { 2023-01-11T21:38:05.8991122Z { 2023-01-11T21:38:05.8991234Z auto tmp0 = in_ptr0[i3 + (2*i1)]; 2023-01-11T21:38:05.8991353Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.8991457Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8991571Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.8991674Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.8991792Z out_ptr0[i3 + (2*i2) + (6*i1) + (12*i0)] = tmp4; 2023-01-11T21:38:05.8991863Z } 2023-01-11T21:38:05.8991937Z } 2023-01-11T21:38:05.8992010Z } 2023-01-11T21:38:05.8992078Z } 2023-01-11T21:38:05.8992146Z } 2023-01-11T21:38:05.8992213Z } 2023-01-11T21:38:05.8992301Z #pragma omp for collapse(2) 2023-01-11T21:38:05.8992394Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.8992460Z { 2023-01-11T21:38:05.8992548Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:05.8992615Z { 2023-01-11T21:38:05.8992704Z #pragma GCC ivdep 2023-01-11T21:38:05.8992797Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:05.8992859Z { 2023-01-11T21:38:05.8992947Z #pragma GCC ivdep 2023-01-11T21:38:05.8993042Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:05.8993114Z { 
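// Editor's note: i0 and i2 never index in_ptr0 in this loop nest; those dims
// are stride-0 broadcasts of the (2, 1, 2) input, so each output element is
// simply in_ptr0[i3 + 2*i1] + 2 written into out_ptr1.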
2023-01-11T21:38:05.8993188Z { 2023-01-11T21:38:05.8993267Z { 2023-01-11T21:38:05.8993380Z auto tmp0 = in_ptr0[i3 + (2*i1)]; 2023-01-11T21:38:05.8993489Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.8993593Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.8993739Z out_ptr1[i3 + (2*i2) + (6*i1) + (12*i0)] = tmp2; 2023-01-11T21:38:05.8993817Z } 2023-01-11T21:38:05.8993891Z } 2023-01-11T21:38:05.8993962Z } 2023-01-11T21:38:05.8994031Z } 2023-01-11T21:38:05.8994091Z } 2023-01-11T21:38:05.8994162Z } 2023-01-11T21:38:05.8994232Z } 2023-01-11T21:38:05.8994300Z } 2023-01-11T21:38:05.8994386Z ''') 2023-01-11T21:38:05.8994392Z 2023-01-11T21:38:05.8994396Z 2023-01-11T21:38:05.8994491Z async_compile.wait(globals()) 2023-01-11T21:38:05.8994569Z del async_compile 2023-01-11T21:38:05.8994575Z 2023-01-11T21:38:05.8994644Z def call(args): 2023-01-11T21:38:05.8994723Z arg0_1, = args 2023-01-11T21:38:05.8994801Z args.clear() 2023-01-11T21:38:05.8995019Z buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8995238Z buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8995410Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.8995535Z return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), ) 2023-01-11T21:38:05.8995541Z 2023-01-11T21:38:05.8995545Z 2023-01-11T21:38:05.8995625Z if __name__ == "__main__": 2023-01-11T21:38:05.8995736Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.8995863Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.8996065Z arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.8996179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.8996213Z 2023-01-11T21:38:05.8996288Z ok (1.726s) 2023-01-11T21:38:05.8996924Z test_expanded_reduction_cpu (__main__.CpuTests) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/87157 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2023-01-11T21:38:05.8997376Z test_expm1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8997509Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8997769Z [2023-01-11 21:26:18,110] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 91 2023-01-11T21:38:05.8998032Z [2023-01-11 21:26:19,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 91 2023-01-11T21:38:05.8998451Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8998584Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8998829Z [2023-01-11 21:26:19,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 92 2023-01-11T21:38:05.8999087Z [2023-01-11 21:26:21,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 92 2023-01-11T21:38:05.8999534Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.8999668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.8999920Z [2023-01-11 21:26:21,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 93 2023-01-11T21:38:05.9000179Z [2023-01-11 21:26:23,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 93 2023-01-11T21:38:05.9000593Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9000727Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9000984Z [2023-01-11 21:26:23,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 94 2023-01-11T21:38:05.9001243Z [2023-01-11 21:26:24,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 94 2023-01-11T21:38:05.9001655Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9001784Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9002051Z [2023-01-11 21:26:24,968] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 95 2023-01-11T21:38:05.9002069Z 2023-01-11T21:38:05.9002161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9002237Z import torch 2023-01-11T21:38:05.9002312Z import random 2023-01-11T21:38:05.9002433Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9002558Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9002564Z 2023-01-11T21:38:05.9002647Z aten = torch.ops.aten 2023-01-11T21:38:05.9002788Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9002876Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9002881Z 2023-01-11T21:38:05.9002956Z import triton 2023-01-11T21:38:05.9003050Z import triton.language as tl 2023-01-11T21:38:05.9003177Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9003318Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9003327Z 2023-01-11T21:38:05.9003331Z 2023-01-11T21:38:05.9003469Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9003678Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9003801Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9003899Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9004001Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9004067Z { 2023-01-11T21:38:05.9004171Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9004237Z { 2023-01-11T21:38:05.9004319Z #pragma omp for 2023-01-11T21:38:05.9004406Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9004466Z { 2023-01-11T21:38:05.9004535Z { 2023-01-11T21:38:05.9004605Z { 2023-01-11T21:38:05.9004727Z auto tmp0 = static_cast(in_ptr0[i0]); 2023-01-11T21:38:05.9004837Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9004949Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9005046Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9005129Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9005219Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9005321Z } 2023-01-11T21:38:05.9005391Z } 2023-01-11T21:38:05.9005458Z } 2023-01-11T21:38:05.9005527Z } 2023-01-11T21:38:05.9005584Z } 2023-01-11T21:38:05.9005672Z ''') 2023-01-11T21:38:05.9005677Z 2023-01-11T21:38:05.9005682Z 2023-01-11T21:38:05.9005775Z async_compile.wait(globals()) 2023-01-11T21:38:05.9005852Z del async_compile 2023-01-11T21:38:05.9005857Z 2023-01-11T21:38:05.9005934Z def call(args): 2023-01-11T21:38:05.9006009Z arg0_1, = args 2023-01-11T21:38:05.9006086Z args.clear() 2023-01-11T21:38:05.9006283Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9006469Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9006638Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9006712Z del arg0_1 2023-01-11T21:38:05.9006793Z return (buf0, buf1, ) 2023-01-11T21:38:05.9006798Z 2023-01-11T21:38:05.9006802Z 2023-01-11T21:38:05.9006886Z if __name__ == "__main__": 2023-01-11T21:38:05.9007004Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9007132Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9007325Z 
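    # Editor's note: in the kernel above the half input is upcast to float for
    # std::expm1 and the results are narrowed back to half on store; the
    # static_cast calls in these dumps appear to have lost their template
    # arguments (e.g. static_cast<float>(...)) in the log rendering.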
arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9007430Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9007435Z 2023-01-11T21:38:05.9007439Z 2023-01-11T21:38:05.9007540Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9007614Z import torch 2023-01-11T21:38:05.9007690Z import random 2023-01-11T21:38:05.9007809Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9007961Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9007967Z 2023-01-11T21:38:05.9008049Z aten = torch.ops.aten 2023-01-11T21:38:05.9008184Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9008272Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9008280Z 2023-01-11T21:38:05.9008353Z import triton 2023-01-11T21:38:05.9008446Z import triton.language as tl 2023-01-11T21:38:05.9008573Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9008716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9008721Z 2023-01-11T21:38:05.9008725Z 2023-01-11T21:38:05.9008859Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9009064Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9009188Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9009282Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9009389Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9009457Z { 2023-01-11T21:38:05.9009559Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9009625Z { 2023-01-11T21:38:05.9009707Z #pragma omp for 2023-01-11T21:38:05.9009791Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9009859Z { 2023-01-11T21:38:05.9009928Z { 2023-01-11T21:38:05.9009998Z { 2023-01-11T21:38:05.9010117Z auto tmp0 = static_cast(in_ptr0[i0]); 2023-01-11T21:38:05.9010224Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9010331Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9010420Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9010511Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9010602Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9010672Z } 2023-01-11T21:38:05.9010742Z } 2023-01-11T21:38:05.9010810Z } 2023-01-11T21:38:05.9010877Z } 2023-01-11T21:38:05.9010934Z } 2023-01-11T21:38:05.9011020Z ''') 2023-01-11T21:38:05.9011025Z 2023-01-11T21:38:05.9011029Z 2023-01-11T21:38:05.9011123Z async_compile.wait(globals()) 2023-01-11T21:38:05.9011200Z del async_compile 2023-01-11T21:38:05.9011247Z 2023-01-11T21:38:05.9011325Z def call(args): 2023-01-11T21:38:05.9011399Z arg0_1, = args 2023-01-11T21:38:05.9011474Z args.clear() 2023-01-11T21:38:05.9011660Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9011853Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9012020Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9012095Z del arg0_1 2023-01-11T21:38:05.9012176Z return (buf0, buf1, ) 2023-01-11T21:38:05.9012181Z 2023-01-11T21:38:05.9012185Z 2023-01-11T21:38:05.9012268Z if __name__ == "__main__": 2023-01-11T21:38:05.9012386Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9012512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9012701Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9012818Z print_performance(lambda: call([arg0_1])) 
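Editor's note: test_expm1_cpu sweeps dtypes (float16, float32, float64, int32) and two sizes (64 and 201), so the dumps that follow differ only in dtype, trip count, and whether the main loop is vectorized; size 201 exercises the scalar tail loop after the 8-wide vectorized body. A minimal eager-mode equivalent of what every variant computes (hypothetical reference code, not Inductor output):

import torch

def expm1_times_two(x):
    # matches the generated kernels: out_ptr0 = expm1(x), out_ptr1 = expm1(x) * 2
    y = torch.expm1(x)
    return y, y * 2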
2023-01-11T21:38:05.9012824Z 2023-01-11T21:38:05.9012828Z 2023-01-11T21:38:05.9012926Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9013000Z import torch 2023-01-11T21:38:05.9013074Z import random 2023-01-11T21:38:05.9013192Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9013314Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9013319Z 2023-01-11T21:38:05.9013401Z aten = torch.ops.aten 2023-01-11T21:38:05.9013529Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9013625Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9013659Z 2023-01-11T21:38:05.9013733Z import triton 2023-01-11T21:38:05.9013826Z import triton.language as tl 2023-01-11T21:38:05.9013954Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9014096Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9014101Z 2023-01-11T21:38:05.9014106Z 2023-01-11T21:38:05.9014244Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9014450Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9014683Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9014792Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9014895Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9014961Z { 2023-01-11T21:38:05.9015063Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9015129Z { 2023-01-11T21:38:05.9015210Z #pragma omp for 2023-01-11T21:38:05.9015290Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9015361Z { 2023-01-11T21:38:05.9015505Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9015600Z auto tmp1 = tmp0.expm1(); 2023-01-11T21:38:05.9015738Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9015829Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9015927Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9016028Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9016088Z } 2023-01-11T21:38:05.9016188Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9016276Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9016343Z { 2023-01-11T21:38:05.9016438Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9016537Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9016634Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9016724Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9016813Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9016898Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9016968Z } 2023-01-11T21:38:05.9017036Z } 2023-01-11T21:38:05.9017102Z } 2023-01-11T21:38:05.9017264Z ''') 2023-01-11T21:38:05.9017270Z 2023-01-11T21:38:05.9017283Z 2023-01-11T21:38:05.9017431Z async_compile.wait(globals()) 2023-01-11T21:38:05.9017526Z del async_compile 2023-01-11T21:38:05.9017532Z 2023-01-11T21:38:05.9017620Z def call(args): 2023-01-11T21:38:05.9017708Z arg0_1, = args 2023-01-11T21:38:05.9017799Z args.clear() 2023-01-11T21:38:05.9017996Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9018185Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9018342Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9018420Z del arg0_1 2023-01-11T21:38:05.9018501Z return (buf0, buf1, ) 2023-01-11T21:38:05.9018509Z 2023-01-11T21:38:05.9018514Z 2023-01-11T21:38:05.9018598Z if __name__ == "__main__": 2023-01-11T21:38:05.9018715Z from torch._dynamo.testing 
import rand_strided 2023-01-11T21:38:05.9018842Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9019035Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9019150Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9019155Z 2023-01-11T21:38:05.9019159Z 2023-01-11T21:38:05.9019249Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9019323Z import torch 2023-01-11T21:38:05.9019397Z import random 2023-01-11T21:38:05.9019517Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9019641Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9019646Z 2023-01-11T21:38:05.9019728Z aten = torch.ops.aten 2023-01-11T21:38:05.9019862Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9019992Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9020005Z 2023-01-11T21:38:05.9020072Z import triton 2023-01-11T21:38:05.9020166Z import triton.language as tl 2023-01-11T21:38:05.9020290Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9020432Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9020440Z 2023-01-11T21:38:05.9020444Z 2023-01-11T21:38:05.9020581Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9020784Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9020908Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9021011Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9021105Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9021170Z { 2023-01-11T21:38:05.9021271Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9021338Z { 2023-01-11T21:38:05.9021423Z #pragma omp for 2023-01-11T21:38:05.9021510Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.9021570Z { 2023-01-11T21:38:05.9021709Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9021800Z auto tmp1 = tmp0.expm1(); 2023-01-11T21:38:05.9021938Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9022029Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9022129Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9022224Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9022292Z } 2023-01-11T21:38:05.9022384Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9022473Z for(long i0=200; i0<201; i0+=1) 2023-01-11T21:38:05.9022540Z { 2023-01-11T21:38:05.9022629Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9022727Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9022831Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9022923Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9023002Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9023089Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9023160Z } 2023-01-11T21:38:05.9023229Z } 2023-01-11T21:38:05.9023295Z } 2023-01-11T21:38:05.9023380Z ''') 2023-01-11T21:38:05.9023422Z 2023-01-11T21:38:05.9023427Z 2023-01-11T21:38:05.9023520Z async_compile.wait(globals()) 2023-01-11T21:38:05.9023592Z del async_compile 2023-01-11T21:38:05.9023597Z 2023-01-11T21:38:05.9023673Z def call(args): 2023-01-11T21:38:05.9023750Z arg0_1, = args 2023-01-11T21:38:05.9023825Z args.clear() 2023-01-11T21:38:05.9024020Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9024212Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9024379Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), 
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9024448Z del arg0_1 2023-01-11T21:38:05.9024529Z return (buf0, buf1, ) 2023-01-11T21:38:05.9024534Z 2023-01-11T21:38:05.9024538Z 2023-01-11T21:38:05.9024620Z if __name__ == "__main__": 2023-01-11T21:38:05.9024738Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9024871Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9025066Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9025179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9025184Z 2023-01-11T21:38:05.9025443Z [2023-01-11 21:26:26,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 95 2023-01-11T21:38:05.9025861Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9026017Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9026272Z [2023-01-11 21:26:26,688] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 96 2023-01-11T21:38:05.9026539Z [2023-01-11 21:26:28,446] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 96 2023-01-11T21:38:05.9026953Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9027086Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9027340Z [2023-01-11 21:26:28,463] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 97 2023-01-11T21:38:05.9027603Z [2023-01-11 21:26:30,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 97 2023-01-11T21:38:05.9028018Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9028149Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9028400Z [2023-01-11 21:26:30,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 98 2023-01-11T21:38:05.9028660Z [2023-01-11 21:26:32,127] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 98 2023-01-11T21:38:05.9029072Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9029228Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9029481Z [2023-01-11 21:26:32,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 99 2023-01-11T21:38:05.9029487Z 2023-01-11T21:38:05.9029587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9029661Z import torch 2023-01-11T21:38:05.9029737Z import random 2023-01-11T21:38:05.9029858Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9029983Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9029989Z 2023-01-11T21:38:05.9030076Z aten = torch.ops.aten 2023-01-11T21:38:05.9030207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9030304Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9030310Z 2023-01-11T21:38:05.9030386Z import triton 2023-01-11T21:38:05.9030479Z import triton.language as tl 2023-01-11T21:38:05.9030607Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9030750Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9030755Z 2023-01-11T21:38:05.9030760Z 2023-01-11T21:38:05.9030897Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9031105Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9031224Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9031332Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9031435Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9031504Z { 2023-01-11T21:38:05.9031606Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9031706Z { 2023-01-11T21:38:05.9031791Z #pragma omp for 2023-01-11T21:38:05.9031872Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9031941Z { 2023-01-11T21:38:05.9032009Z { 2023-01-11T21:38:05.9032078Z { 2023-01-11T21:38:05.9032181Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9032288Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9032398Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9032488Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9032579Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9032670Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9032741Z } 2023-01-11T21:38:05.9032808Z } 2023-01-11T21:38:05.9032877Z } 2023-01-11T21:38:05.9032943Z } 2023-01-11T21:38:05.9033000Z } 2023-01-11T21:38:05.9033086Z ''') 2023-01-11T21:38:05.9033095Z 2023-01-11T21:38:05.9033099Z 2023-01-11T21:38:05.9033194Z async_compile.wait(globals()) 2023-01-11T21:38:05.9033272Z del async_compile 2023-01-11T21:38:05.9033277Z 2023-01-11T21:38:05.9033351Z def call(args): 2023-01-11T21:38:05.9033426Z arg0_1, = args 2023-01-11T21:38:05.9033502Z args.clear() 2023-01-11T21:38:05.9033692Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9033887Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9034055Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9034130Z del arg0_1 2023-01-11T21:38:05.9034212Z return (buf0, buf1, ) 2023-01-11T21:38:05.9034217Z 2023-01-11T21:38:05.9034222Z 2023-01-11T21:38:05.9034303Z if __name__ == "__main__": 2023-01-11T21:38:05.9034422Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9034551Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9034738Z arg0_1 = 
rand_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9034853Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9034858Z 2023-01-11T21:38:05.9034862Z 2023-01-11T21:38:05.9034961Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9035036Z import torch 2023-01-11T21:38:05.9035141Z import random 2023-01-11T21:38:05.9035262Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9035386Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9035391Z 2023-01-11T21:38:05.9035472Z aten = torch.ops.aten 2023-01-11T21:38:05.9035600Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9035696Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9035701Z 2023-01-11T21:38:05.9035775Z import triton 2023-01-11T21:38:05.9035866Z import triton.language as tl 2023-01-11T21:38:05.9035989Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9036132Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9036138Z 2023-01-11T21:38:05.9036142Z 2023-01-11T21:38:05.9036279Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9036484Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9036605Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9036711Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9036814Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9036881Z { 2023-01-11T21:38:05.9036984Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9037050Z { 2023-01-11T21:38:05.9037131Z #pragma omp for 2023-01-11T21:38:05.9037212Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9037279Z { 2023-01-11T21:38:05.9037346Z { 2023-01-11T21:38:05.9037414Z { 2023-01-11T21:38:05.9037514Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9037647Z auto tmp1 = std::expm1(tmp0); 2023-01-11T21:38:05.9037758Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9037847Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9037939Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9038032Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9038104Z } 2023-01-11T21:38:05.9038174Z } 2023-01-11T21:38:05.9038243Z } 2023-01-11T21:38:05.9038310Z } 2023-01-11T21:38:05.9038367Z } 2023-01-11T21:38:05.9038452Z ''') 2023-01-11T21:38:05.9038457Z 2023-01-11T21:38:05.9038461Z 2023-01-11T21:38:05.9038556Z async_compile.wait(globals()) 2023-01-11T21:38:05.9038634Z del async_compile 2023-01-11T21:38:05.9038639Z 2023-01-11T21:38:05.9038716Z def call(args): 2023-01-11T21:38:05.9038791Z arg0_1, = args 2023-01-11T21:38:05.9038868Z args.clear() 2023-01-11T21:38:05.9039054Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9039254Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9039420Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9039495Z del arg0_1 2023-01-11T21:38:05.9039578Z return (buf0, buf1, ) 2023-01-11T21:38:05.9039586Z 2023-01-11T21:38:05.9039591Z 2023-01-11T21:38:05.9039673Z if __name__ == "__main__": 2023-01-11T21:38:05.9039790Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9039914Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9040100Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9040211Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9040216Z 
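[NOTE] The block above is Inductor's generated module for the float64, 201-element variant of this graph; every dump in this run has the same shape (a C++ kernel registered via async_compile.cpp, a call() wrapper, and a standalone __main__ benchmark). A minimal sketch of reproducing this kind of dump outside the test suite, assuming the torch._dynamo / torch._inductor APIs of this era and assuming torch._inductor.config.debug is the switch that prints the generated module (both are guesses about this build, and fn below is reconstructed from the kernels, not the test's actual source):

import torch
import torch._dynamo
import torch._inductor.config as inductor_config

inductor_config.debug = True  # assumption: makes Inductor print the generated wrapper/kernel code

def fn(x):
    y = torch.expm1(x)  # shows up as std::expm1(...) in the C++ kernel
    return y, y * 2     # shows up as the second output buffer (tmp * 2)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(201, dtype=torch.float64))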
2023-01-11T21:38:05.9040221Z 2023-01-11T21:38:05.9040317Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9040393Z import torch 2023-01-11T21:38:05.9040473Z import random 2023-01-11T21:38:05.9040591Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9040716Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9040721Z 2023-01-11T21:38:05.9040804Z aten = torch.ops.aten 2023-01-11T21:38:05.9040931Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9041058Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9041064Z 2023-01-11T21:38:05.9041142Z import triton 2023-01-11T21:38:05.9041236Z import triton.language as tl 2023-01-11T21:38:05.9041361Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9041503Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9041508Z 2023-01-11T21:38:05.9041513Z 2023-01-11T21:38:05.9041648Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9041854Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9041967Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9042077Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9042181Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9042248Z { 2023-01-11T21:38:05.9042352Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9042419Z { 2023-01-11T21:38:05.9042504Z #pragma omp for 2023-01-11T21:38:05.9042585Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9042655Z { 2023-01-11T21:38:05.9042724Z { 2023-01-11T21:38:05.9042794Z { 2023-01-11T21:38:05.9042890Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9043003Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9043103Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9043212Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9043309Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9043400Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9043518Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9043588Z } 2023-01-11T21:38:05.9043657Z } 2023-01-11T21:38:05.9043717Z } 2023-01-11T21:38:05.9043784Z } 2023-01-11T21:38:05.9043849Z } 2023-01-11T21:38:05.9043937Z ''') 2023-01-11T21:38:05.9043943Z 2023-01-11T21:38:05.9043950Z 2023-01-11T21:38:05.9044044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9044125Z del async_compile 2023-01-11T21:38:05.9044130Z 2023-01-11T21:38:05.9044205Z def call(args): 2023-01-11T21:38:05.9044279Z arg0_1, = args 2023-01-11T21:38:05.9044347Z args.clear() 2023-01-11T21:38:05.9044541Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9044732Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9044900Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9044974Z del arg0_1 2023-01-11T21:38:05.9045060Z return (buf0, buf1, ) 2023-01-11T21:38:05.9045065Z 2023-01-11T21:38:05.9045070Z 2023-01-11T21:38:05.9045153Z if __name__ == "__main__": 2023-01-11T21:38:05.9045262Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9045386Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9045581Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9045695Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9045700Z 2023-01-11T21:38:05.9045704Z 2023-01-11T21:38:05.9045803Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9045878Z import torch 2023-01-11T21:38:05.9045956Z import random 2023-01-11T21:38:05.9046075Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9046190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9046195Z 2023-01-11T21:38:05.9046282Z aten = torch.ops.aten 2023-01-11T21:38:05.9046422Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9046525Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9046530Z 2023-01-11T21:38:05.9046603Z import triton 2023-01-11T21:38:05.9046698Z import triton.language as tl 2023-01-11T21:38:05.9046827Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9046998Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9047005Z 2023-01-11T21:38:05.9047010Z 2023-01-11T21:38:05.9047138Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9047345Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9047465Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9047569Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9047672Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9047739Z { 2023-01-11T21:38:05.9047841Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9047903Z { 2023-01-11T21:38:05.9047987Z #pragma omp for 2023-01-11T21:38:05.9048076Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9048144Z { 2023-01-11T21:38:05.9048211Z { 2023-01-11T21:38:05.9048281Z { 2023-01-11T21:38:05.9048378Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9048487Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9048597Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9048706Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9048803Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9048893Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9048982Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9049053Z } 2023-01-11T21:38:05.9049113Z } 2023-01-11T21:38:05.9049181Z } 2023-01-11T21:38:05.9049248Z } 2023-01-11T21:38:05.9049313Z } 2023-01-11T21:38:05.9049427Z ''') 2023-01-11T21:38:05.9049433Z 2023-01-11T21:38:05.9049437Z 2023-01-11T21:38:05.9049531Z async_compile.wait(globals()) 2023-01-11T21:38:05.9049611Z del async_compile 2023-01-11T21:38:05.9049617Z 2023-01-11T21:38:05.9049685Z def call(args): 2023-01-11T21:38:05.9049759Z arg0_1, = args 2023-01-11T21:38:05.9049834Z args.clear() 2023-01-11T21:38:05.9050031Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9050223Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9050389Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9050463Z del arg0_1 2023-01-11T21:38:05.9050546Z return (buf0, buf1, ) 2023-01-11T21:38:05.9050551Z 2023-01-11T21:38:05.9050555Z 2023-01-11T21:38:05.9050627Z if __name__ == "__main__": 2023-01-11T21:38:05.9050746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9050872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9051068Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9051182Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9051186Z 2023-01-11T21:38:05.9051453Z [2023-01-11 21:26:33,977] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 99 2023-01-11T21:38:05.9051868Z 
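[NOTE] The dumps for graphs 95-100 around this point are the same computation specialized per input dtype and length (float32/float64/int32/int64, each at 64 and 201 elements). For the integer variants the kernel widens each element first (auto tmp1 = static_cast<float>(tmp0);) because expm1 promotes integer inputs to the default float dtype, which is why the output buffers are allocated as torch.float32. A quick eager-mode check of that promotion rule:

import torch
assert torch.expm1(torch.arange(64, dtype=torch.int32)).dtype == torch.float32
assert torch.expm1(torch.arange(64, dtype=torch.int64)).dtype == torch.float32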
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9051999Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9052256Z [2023-01-11 21:26:33,994] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 100 2023-01-11T21:38:05.9052511Z [2023-01-11 21:26:35,767] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 100 2023-01-11T21:38:05.9052531Z 2023-01-11T21:38:05.9052622Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9052699Z import torch 2023-01-11T21:38:05.9052773Z import random 2023-01-11T21:38:05.9052932Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9053059Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9053064Z 2023-01-11T21:38:05.9053147Z aten = torch.ops.aten 2023-01-11T21:38:05.9053285Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9053373Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9053378Z 2023-01-11T21:38:05.9053454Z import triton 2023-01-11T21:38:05.9053550Z import triton.language as tl 2023-01-11T21:38:05.9053676Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9053817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9053825Z 2023-01-11T21:38:05.9053829Z 2023-01-11T21:38:05.9053966Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9054172Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9054294Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9054393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9054606Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9054673Z { 2023-01-11T21:38:05.9054777Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9054846Z { 2023-01-11T21:38:05.9054928Z #pragma omp for 2023-01-11T21:38:05.9055017Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9055077Z { 2023-01-11T21:38:05.9055148Z { 2023-01-11T21:38:05.9055216Z { 2023-01-11T21:38:05.9055314Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9055430Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9055586Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9055696Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9055786Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9055877Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9055970Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9056041Z } 2023-01-11T21:38:05.9056107Z } 2023-01-11T21:38:05.9056174Z } 2023-01-11T21:38:05.9056239Z } 2023-01-11T21:38:05.9056296Z } 2023-01-11T21:38:05.9056382Z ''') 2023-01-11T21:38:05.9056388Z 2023-01-11T21:38:05.9056392Z 2023-01-11T21:38:05.9056484Z async_compile.wait(globals()) 2023-01-11T21:38:05.9056561Z del async_compile 2023-01-11T21:38:05.9056566Z 2023-01-11T21:38:05.9056647Z def call(args): 2023-01-11T21:38:05.9056721Z arg0_1, = args 2023-01-11T21:38:05.9056797Z args.clear() 2023-01-11T21:38:05.9056984Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9057266Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9057438Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9057516Z del arg0_1 2023-01-11T21:38:05.9057601Z return (buf0, buf1, ) 2023-01-11T21:38:05.9057606Z 2023-01-11T21:38:05.9057611Z 2023-01-11T21:38:05.9057692Z if __name__ == "__main__": 2023-01-11T21:38:05.9057811Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9057936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9058119Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9058235Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9058240Z 2023-01-11T21:38:05.9058244Z 2023-01-11T21:38:05.9058342Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9058416Z import torch 2023-01-11T21:38:05.9058492Z import random 2023-01-11T21:38:05.9058612Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9058737Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9058742Z 2023-01-11T21:38:05.9058821Z aten = torch.ops.aten 2023-01-11T21:38:05.9058949Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9059088Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9059094Z 2023-01-11T21:38:05.9059171Z import triton 2023-01-11T21:38:05.9059269Z import triton.language as tl 2023-01-11T21:38:05.9059395Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9059537Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9059542Z 2023-01-11T21:38:05.9059547Z 2023-01-11T21:38:05.9059684Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9059890Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9060006Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9060114Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9060221Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9060287Z { 2023-01-11T21:38:05.9060388Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9060458Z { 2023-01-11T21:38:05.9060544Z #pragma omp for 2023-01-11T21:38:05.9060625Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9060695Z { 2023-01-11T21:38:05.9060763Z { 2023-01-11T21:38:05.9060832Z { 2023-01-11T21:38:05.9060929Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9061044Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:05.9061152Z auto tmp2 = std::expm1(tmp1); 2023-01-11T21:38:05.9061253Z auto tmp3 = static_cast(2); 2023-01-11T21:38:05.9061349Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9061439Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9061556Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9061626Z } 2023-01-11T21:38:05.9061694Z } 2023-01-11T21:38:05.9061753Z } 2023-01-11T21:38:05.9061824Z } 2023-01-11T21:38:05.9061894Z } 2023-01-11T21:38:05.9061980Z ''') 2023-01-11T21:38:05.9061986Z 2023-01-11T21:38:05.9061993Z 2023-01-11T21:38:05.9062089Z async_compile.wait(globals()) 2023-01-11T21:38:05.9062166Z del async_compile 2023-01-11T21:38:05.9062171Z 2023-01-11T21:38:05.9062250Z def call(args): 2023-01-11T21:38:05.9062324Z arg0_1, = args 2023-01-11T21:38:05.9062392Z args.clear() 2023-01-11T21:38:05.9062586Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9062780Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9062948Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 
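    # [NOTE] As in the int32 variants above, the int64 input is widened inside the
    # C++ kernel (one static_cast per element) before std::expm1, so both outputs
    # land in the float32 buffers allocated just above.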
2023-01-11T21:38:05.9063026Z del arg0_1 2023-01-11T21:38:05.9063112Z return (buf0, buf1, ) 2023-01-11T21:38:05.9063117Z 2023-01-11T21:38:05.9063122Z 2023-01-11T21:38:05.9063203Z if __name__ == "__main__": 2023-01-11T21:38:05.9063322Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9063441Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9063636Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9063748Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9063753Z 2023-01-11T21:38:05.9063828Z ok (17.682s) 2023-01-11T21:38:05.9064279Z test_fill1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9064413Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9064675Z [2023-01-11 21:26:35,819] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 101 2023-01-11T21:38:05.9064969Z [2023-01-11 21:26:37,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 101 2023-01-11T21:38:05.9064976Z 2023-01-11T21:38:05.9065077Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9065144Z import torch 2023-01-11T21:38:05.9065220Z import random 2023-01-11T21:38:05.9065342Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9065470Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9065475Z 2023-01-11T21:38:05.9065559Z aten = torch.ops.aten 2023-01-11T21:38:05.9065695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9065799Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9065804Z 2023-01-11T21:38:05.9065883Z import triton 2023-01-11T21:38:05.9065968Z import triton.language as tl 2023-01-11T21:38:05.9066094Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9066235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9066240Z 2023-01-11T21:38:05.9066244Z 2023-01-11T21:38:05.9066384Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9066590Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9066709Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9066813Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9066871Z { 2023-01-11T21:38:05.9066975Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9067042Z { 2023-01-11T21:38:05.9067125Z #pragma omp for 2023-01-11T21:38:05.9067216Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9067286Z { 2023-01-11T21:38:05.9067429Z auto tmp0 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9067546Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9067614Z } 2023-01-11T21:38:05.9067715Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9067806Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9067874Z { 2023-01-11T21:38:05.9067983Z auto tmp0 = static_cast(1); 2023-01-11T21:38:05.9068071Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9068131Z } 2023-01-11T21:38:05.9068211Z #pragma omp for 2023-01-11T21:38:05.9068300Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9068366Z { 
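                // [NOTE] Same fill pattern as the first loop: broadcast the constant
                // into an 8-lane float vector and store 8 elements per iteration; 32
                // iterations cover all 256 elements of the 16x16 output, so the scalar
                // tail loop below (i0 = 256..256) never executes.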
2023-01-11T21:38:05.9068502Z auto tmp0 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9068600Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9068671Z } 2023-01-11T21:38:05.9068762Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9068850Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9068922Z { 2023-01-11T21:38:05.9069027Z auto tmp0 = static_cast(2); 2023-01-11T21:38:05.9069113Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.9069181Z } 2023-01-11T21:38:05.9069248Z } 2023-01-11T21:38:05.9069306Z } 2023-01-11T21:38:05.9069391Z ''') 2023-01-11T21:38:05.9069397Z 2023-01-11T21:38:05.9069404Z 2023-01-11T21:38:05.9069501Z async_compile.wait(globals()) 2023-01-11T21:38:05.9069581Z del async_compile 2023-01-11T21:38:05.9069586Z 2023-01-11T21:38:05.9069665Z def call(args): 2023-01-11T21:38:05.9069742Z arg0_1, = args 2023-01-11T21:38:05.9069819Z args.clear() 2023-01-11T21:38:05.9070012Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9070212Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9070351Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9070434Z return (buf0, buf1, ) 2023-01-11T21:38:05.9070442Z 2023-01-11T21:38:05.9070446Z 2023-01-11T21:38:05.9070528Z if __name__ == "__main__": 2023-01-11T21:38:05.9070648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9070775Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9071004Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9071110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9071115Z 2023-01-11T21:38:05.9071191Z ok (1.846s) 2023-01-11T21:38:05.9071638Z test_fill2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9071772Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9072035Z [2023-01-11 21:26:37,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 102 2023-01-11T21:38:05.9072300Z [2023-01-11 21:26:39,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 102 2023-01-11T21:38:05.9072306Z 2023-01-11T21:38:05.9072406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9072481Z import torch 2023-01-11T21:38:05.9072556Z import random 2023-01-11T21:38:05.9072667Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9072794Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9072799Z 2023-01-11T21:38:05.9072884Z aten = torch.ops.aten 2023-01-11T21:38:05.9073020Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9073117Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9073122Z 2023-01-11T21:38:05.9073195Z import triton 2023-01-11T21:38:05.9073289Z import triton.language as tl 2023-01-11T21:38:05.9073450Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9073583Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9073589Z 2023-01-11T21:38:05.9073602Z 2023-01-11T21:38:05.9073731Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9073938Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9074059Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9074164Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9074229Z { 2023-01-11T21:38:05.9074335Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9074403Z { 2023-01-11T21:38:05.9074477Z #pragma omp for 2023-01-11T21:38:05.9074569Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9074636Z { 2023-01-11T21:38:05.9074780Z auto tmp0 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9074877Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9074947Z } 2023-01-11T21:38:05.9075039Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9075126Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9075194Z { 2023-01-11T21:38:05.9075301Z auto tmp0 = static_cast(1); 2023-01-11T21:38:05.9075391Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9075458Z } 2023-01-11T21:38:05.9075543Z #pragma omp for 2023-01-11T21:38:05.9075621Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9075689Z { 2023-01-11T21:38:05.9075831Z auto tmp0 = at::vec::Vectorized(static_cast(3.0)); 2023-01-11T21:38:05.9075928Z tmp0.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9075997Z } 2023-01-11T21:38:05.9076098Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9076186Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9076245Z { 2023-01-11T21:38:05.9076351Z auto tmp0 = static_cast(3.0); 2023-01-11T21:38:05.9076438Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:05.9076507Z } 2023-01-11T21:38:05.9076574Z } 2023-01-11T21:38:05.9076638Z } 2023-01-11T21:38:05.9076724Z ''') 2023-01-11T21:38:05.9076729Z 2023-01-11T21:38:05.9076734Z 2023-01-11T21:38:05.9076819Z async_compile.wait(globals()) 2023-01-11T21:38:05.9076926Z del async_compile 2023-01-11T21:38:05.9076932Z 2023-01-11T21:38:05.9077007Z def call(args): 2023-01-11T21:38:05.9077084Z arg0_1, = args 2023-01-11T21:38:05.9077161Z args.clear() 2023-01-11T21:38:05.9077364Z buf0 = empty_strided((16, 16), (16, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9077561Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9077698Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9077772Z return (buf0, buf1, ) 2023-01-11T21:38:05.9077777Z 2023-01-11T21:38:05.9077781Z 2023-01-11T21:38:05.9077866Z if __name__ == "__main__": 2023-01-11T21:38:05.9077985Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9078114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9078312Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9078428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9078433Z 2023-01-11T21:38:05.9078506Z ok (1.914s) 2023-01-11T21:38:05.9078954Z test_flip_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9079086Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9079335Z [2023-01-11 21:26:39,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 103 2023-01-11T21:38:05.9079672Z [2023-01-11 21:26:41,414] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 103 2023-01-11T21:38:05.9079677Z 2023-01-11T21:38:05.9079779Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9079861Z import torch 2023-01-11T21:38:05.9079936Z import random 2023-01-11T21:38:05.9080055Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9080178Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9080183Z 2023-01-11T21:38:05.9080265Z aten = torch.ops.aten 2023-01-11T21:38:05.9080393Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9080491Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9080496Z 2023-01-11T21:38:05.9080570Z import triton 2023-01-11T21:38:05.9080661Z import triton.language as tl 2023-01-11T21:38:05.9080786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9080932Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9080938Z 2023-01-11T21:38:05.9080943Z 2023-01-11T21:38:05.9081081Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9081287Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9081406Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9081513Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9081616Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9081683Z { 2023-01-11T21:38:05.9081785Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9081853Z { 2023-01-11T21:38:05.9081932Z #pragma omp for 2023-01-11T21:38:05.9082013Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:05.9082081Z { 2023-01-11T21:38:05.9082167Z #pragma GCC ivdep 2023-01-11T21:38:05.9082256Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.9082328Z { 2023-01-11T21:38:05.9082398Z { 2023-01-11T21:38:05.9082473Z { 2023-01-11T21:38:05.9082640Z auto tmp0 = in_ptr0[5 + ((-1)*i1) + (6*i0)]; 2023-01-11T21:38:05.9082742Z out_ptr0[i1 + (6*i0)] = tmp0; 
2023-01-11T21:38:05.9082844Z } 2023-01-11T21:38:05.9082914Z } 2023-01-11T21:38:05.9082983Z } 2023-01-11T21:38:05.9083050Z } 2023-01-11T21:38:05.9083138Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9083226Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9083296Z { 2023-01-11T21:38:05.9083385Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:05.9083454Z { 2023-01-11T21:38:05.9083542Z #pragma GCC ivdep 2023-01-11T21:38:05.9083638Z for(long i2=0; i2<6; i2+=1) 2023-01-11T21:38:05.9083700Z { 2023-01-11T21:38:05.9083772Z { 2023-01-11T21:38:05.9083846Z { 2023-01-11T21:38:05.9084031Z auto tmp0 = in_ptr0[30 + i2 + ((-6)*i1) + (36*i0)]; 2023-01-11T21:38:05.9084147Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9084297Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9084409Z out_ptr1[i2 + (6*i1) + (36*i0)] = tmp2; 2023-01-11T21:38:05.9084475Z } 2023-01-11T21:38:05.9084545Z } 2023-01-11T21:38:05.9084618Z } 2023-01-11T21:38:05.9084689Z } 2023-01-11T21:38:05.9084757Z } 2023-01-11T21:38:05.9084823Z } 2023-01-11T21:38:05.9084889Z } 2023-01-11T21:38:05.9084966Z ''') 2023-01-11T21:38:05.9084971Z 2023-01-11T21:38:05.9084976Z 2023-01-11T21:38:05.9085070Z async_compile.wait(globals()) 2023-01-11T21:38:05.9085148Z del async_compile 2023-01-11T21:38:05.9085153Z 2023-01-11T21:38:05.9085230Z def call(args): 2023-01-11T21:38:05.9085339Z arg0_1, = args 2023-01-11T21:38:05.9085418Z args.clear() 2023-01-11T21:38:05.9085631Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9085832Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9086005Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9086079Z del arg0_1 2023-01-11T21:38:05.9086160Z return (buf0, buf1, ) 2023-01-11T21:38:05.9086165Z 2023-01-11T21:38:05.9086170Z 2023-01-11T21:38:05.9086254Z if __name__ == "__main__": 2023-01-11T21:38:05.9086375Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9086504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9086717Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9086821Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9086841Z 2023-01-11T21:38:05.9086904Z ok (1.893s) 2023-01-11T21:38:05.9087358Z test_fmod_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9087493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9087752Z [2023-01-11 21:26:41,450] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 104 2023-01-11T21:38:05.9088017Z [2023-01-11 21:26:43,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 104 2023-01-11T21:38:05.9088023Z 2023-01-11T21:38:05.9088122Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9088197Z import torch 2023-01-11T21:38:05.9088275Z import random 2023-01-11T21:38:05.9088397Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9088514Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9088519Z 2023-01-11T21:38:05.9088600Z aten = torch.ops.aten 2023-01-11T21:38:05.9088737Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9088861Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9088868Z 2023-01-11T21:38:05.9088945Z import triton 2023-01-11T21:38:05.9089038Z import triton.language as tl 2023-01-11T21:38:05.9089164Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9089297Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9089312Z 2023-01-11T21:38:05.9089316Z 2023-01-11T21:38:05.9089445Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9089652Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9089777Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9089890Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9089994Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9090095Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9090165Z { 2023-01-11T21:38:05.9090261Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9090328Z { 2023-01-11T21:38:05.9090412Z #pragma omp for 2023-01-11T21:38:05.9090502Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:05.9090570Z { 2023-01-11T21:38:05.9090713Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9090849Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9090938Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9091081Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(3.0)); 2023-01-11T21:38:05.9091173Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9091296Z auto tmp5 = tmp4.fmod(tmp1); 2023-01-11T21:38:05.9091436Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(2.0)); 2023-01-11T21:38:05.9091563Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:05.9091657Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9091754Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9091814Z } 2023-01-11T21:38:05.9091914Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9092003Z for(long i0=72; i0<72; i0+=1) 2023-01-11T21:38:05.9092070Z { 2023-01-11T21:38:05.9092158Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9092246Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9092353Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9092452Z auto tmp3 = static_cast<float>(3.0); 2023-01-11T21:38:05.9092546Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9092650Z auto tmp5 = std::fmod(tmp4, tmp1); 2023-01-11T21:38:05.9092761Z auto tmp6 = static_cast<float>(2.0); 2023-01-11T21:38:05.9092886Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:05.9092976Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9093062Z out_ptr1[i0] = tmp7;
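                        // [NOTE] This scalar tail mirrors the vectorized loop above:
                        // out_ptr0 = fmod(a, b) and out_ptr1 = fmod(a * 3, b) - 2. Since
                        // 9 iterations of 8 lanes already cover all 72 elements, the
                        // range (72..72) is empty and this body never runs.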
2023-01-11T21:38:05.9093122Z } 2023-01-11T21:38:05.9093190Z } 2023-01-11T21:38:05.9093258Z } 2023-01-11T21:38:05.9093343Z ''') 2023-01-11T21:38:05.9093348Z 2023-01-11T21:38:05.9093353Z 2023-01-11T21:38:05.9093447Z async_compile.wait(globals()) 2023-01-11T21:38:05.9093526Z del async_compile 2023-01-11T21:38:05.9093531Z 2023-01-11T21:38:05.9093608Z def call(args): 2023-01-11T21:38:05.9093681Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9093761Z args.clear() 2023-01-11T21:38:05.9093974Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9094189Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9094387Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9094464Z del arg0_1 2023-01-11T21:38:05.9094650Z del arg1_1 2023-01-11T21:38:05.9094725Z return (buf0, buf1, ) 2023-01-11T21:38:05.9094739Z 2023-01-11T21:38:05.9094743Z 2023-01-11T21:38:05.9094873Z if __name__ == "__main__": 2023-01-11T21:38:05.9094993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9095125Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9095338Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9095549Z arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9095669Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9095675Z 2023-01-11T21:38:05.9095747Z ok (1.846s) 2023-01-11T21:38:05.9096205Z test_fmod_zero_dim_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9096343Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9096594Z [2023-01-11 21:26:43,287] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 105 2023-01-11T21:38:05.9096857Z [2023-01-11 21:26:45,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 105 2023-01-11T21:38:05.9097334Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9097503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9097757Z [2023-01-11 21:26:45,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 106 2023-01-11T21:38:05.9098018Z [2023-01-11 21:26:46,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 106 2023-01-11T21:38:05.9098024Z 2023-01-11T21:38:05.9098122Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9098199Z import torch 2023-01-11T21:38:05.9098275Z import random 2023-01-11T21:38:05.9098387Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9098514Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9098519Z 2023-01-11T21:38:05.9098603Z aten = torch.ops.aten 2023-01-11T21:38:05.9098743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9098844Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9098849Z 2023-01-11T21:38:05.9098923Z import triton 2023-01-11T21:38:05.9099015Z import triton.language as tl 2023-01-11T21:38:05.9099142Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9099274Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9099282Z 2023-01-11T21:38:05.9099296Z 2023-01-11T21:38:05.9099426Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9099634Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9099759Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9099870Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9099980Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9100046Z { 2023-01-11T21:38:05.9100149Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9100208Z { 2023-01-11T21:38:05.9100295Z #pragma omp for 2023-01-11T21:38:05.9100384Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9100452Z { 2023-01-11T21:38:05.9100594Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9100720Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:05.9100852Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9100942Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9101009Z } 2023-01-11T21:38:05.9101115Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9101204Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9101273Z { 2023-01-11T21:38:05.9101363Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9101451Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:05.9101551Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9101646Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9101715Z } 2023-01-11T21:38:05.9101783Z } 2023-01-11T21:38:05.9101853Z } 2023-01-11T21:38:05.9101940Z ''') 2023-01-11T21:38:05.9101945Z 2023-01-11T21:38:05.9101949Z 2023-01-11T21:38:05.9102042Z async_compile.wait(globals()) 2023-01-11T21:38:05.9102115Z del async_compile 2023-01-11T21:38:05.9102120Z 2023-01-11T21:38:05.9102194Z def call(args): 2023-01-11T21:38:05.9102274Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9102356Z args.clear() 2023-01-11T21:38:05.9102549Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9102716Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9102790Z del arg0_1 2023-01-11T21:38:05.9102855Z del arg1_1 2023-01-11T21:38:05.9102933Z return
(buf0, ) 2023-01-11T21:38:05.9102938Z 2023-01-11T21:38:05.9102942Z 2023-01-11T21:38:05.9103028Z if __name__ == "__main__": 2023-01-11T21:38:05.9103148Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9103274Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9103503Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9103690Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9103810Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9103815Z 2023-01-11T21:38:05.9103821Z 2023-01-11T21:38:05.9103912Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9103989Z import torch 2023-01-11T21:38:05.9104065Z import random 2023-01-11T21:38:05.9104184Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9104305Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9104310Z 2023-01-11T21:38:05.9104394Z aten = torch.ops.aten 2023-01-11T21:38:05.9104529Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9104628Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9104633Z 2023-01-11T21:38:05.9104699Z import triton 2023-01-11T21:38:05.9104796Z import triton.language as tl 2023-01-11T21:38:05.9104920Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9105059Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9105064Z 2023-01-11T21:38:05.9105069Z 2023-01-11T21:38:05.9105204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9105413Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9105539Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9105650Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9105746Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9105813Z { 2023-01-11T21:38:05.9105915Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9105982Z { 2023-01-11T21:38:05.9106066Z #pragma omp for 2023-01-11T21:38:05.9106155Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9106215Z { 2023-01-11T21:38:05.9106345Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:05.9106482Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9106578Z auto tmp2 = tmp0.fmod(tmp1); 2023-01-11T21:38:05.9106677Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9106744Z } 2023-01-11T21:38:05.9106875Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9106965Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9107024Z { 2023-01-11T21:38:05.9107116Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:05.9107206Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9107313Z auto tmp2 = std::fmod(tmp0, tmp1); 2023-01-11T21:38:05.9107399Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9107469Z } 2023-01-11T21:38:05.9107535Z } 2023-01-11T21:38:05.9107593Z } 2023-01-11T21:38:05.9107679Z ''') 2023-01-11T21:38:05.9107685Z 2023-01-11T21:38:05.9107689Z 2023-01-11T21:38:05.9107789Z async_compile.wait(globals()) 2023-01-11T21:38:05.9107869Z del async_compile 2023-01-11T21:38:05.9107874Z 2023-01-11T21:38:05.9107950Z def call(args): 2023-01-11T21:38:05.9108031Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9108108Z args.clear() 2023-01-11T21:38:05.9108294Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9108463Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
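    # [NOTE] In this second variant the zero-dim tensor is the left operand: the
    # kernel reads it once as in_ptr0[0] and broadcasts it across the vector lanes,
    # while the 10-element operand is loaded contiguously (loadu).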
2023-01-11T21:38:05.9108536Z del arg0_1 2023-01-11T21:38:05.9108607Z del arg1_1 2023-01-11T21:38:05.9108686Z return (buf0, ) 2023-01-11T21:38:05.9108691Z 2023-01-11T21:38:05.9108696Z 2023-01-11T21:38:05.9108777Z if __name__ == "__main__": 2023-01-11T21:38:05.9108898Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9109029Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9109208Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9109430Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9109551Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9109556Z 2023-01-11T21:38:05.9109628Z ok (3.563s) 2023-01-11T21:38:05.9110092Z test_forced_buffer_realize_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9110231Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9110494Z [2023-01-11 21:26:46,861] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 107 2023-01-11T21:38:05.9110755Z [2023-01-11 21:26:48,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 107 2023-01-11T21:38:05.9110764Z 2023-01-11T21:38:05.9110866Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9110933Z import torch 2023-01-11T21:38:05.9111009Z import random 2023-01-11T21:38:05.9111131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9111259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9111264Z 2023-01-11T21:38:05.9111348Z aten = torch.ops.aten 2023-01-11T21:38:05.9111484Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9111581Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9111586Z 2023-01-11T21:38:05.9111661Z import triton 2023-01-11T21:38:05.9111746Z import triton.language as tl 2023-01-11T21:38:05.9111871Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9112010Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9112015Z 2023-01-11T21:38:05.9112019Z 2023-01-11T21:38:05.9112159Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9112366Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9112487Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9112605Z const float* __restrict__ in_ptr0) 2023-01-11T21:38:05.9112672Z { 2023-01-11T21:38:05.9112795Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9112866Z { 2023-01-11T21:38:05.9112948Z #pragma omp for 2023-01-11T21:38:05.9113036Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9113104Z { 2023-01-11T21:38:05.9113244Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9113383Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9113466Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9113554Z auto tmp3 = tmp2 * tmp1; 2023-01-11T21:38:05.9113658Z tmp3.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9113729Z } 2023-01-11T21:38:05.9113830Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9113920Z 
for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:05.9113988Z { 2023-01-11T21:38:05.9114070Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9114180Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9114267Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9114358Z auto tmp3 = tmp2 * tmp1; 2023-01-11T21:38:05.9114444Z in_out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9114513Z } 2023-01-11T21:38:05.9114572Z } 2023-01-11T21:38:05.9114639Z } 2023-01-11T21:38:05.9114724Z ''') 2023-01-11T21:38:05.9114730Z 2023-01-11T21:38:05.9114734Z 2023-01-11T21:38:05.9114829Z async_compile.wait(globals()) 2023-01-11T21:38:05.9114908Z del async_compile 2023-01-11T21:38:05.9114913Z 2023-01-11T21:38:05.9114988Z def call(args): 2023-01-11T21:38:05.9115063Z arg0_1, = args 2023-01-11T21:38:05.9115165Z args.clear() 2023-01-11T21:38:05.9115353Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9115443Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9115584Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9115659Z del arg0_1 2023-01-11T21:38:05.9115737Z return (buf1, ) 2023-01-11T21:38:05.9115742Z 2023-01-11T21:38:05.9115746Z 2023-01-11T21:38:05.9115829Z if __name__ == "__main__": 2023-01-11T21:38:05.9115947Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9116067Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9116260Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9116374Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9116379Z 2023-01-11T21:38:05.9116451Z ok (1.766s) 2023-01-11T21:38:05.9116896Z test_full_like_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9117037Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9117296Z [2023-01-11 21:26:48,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 108 2023-01-11T21:38:05.9117559Z [2023-01-11 21:26:50,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 108 2023-01-11T21:38:05.9117564Z 2023-01-11T21:38:05.9117661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9117736Z import torch 2023-01-11T21:38:05.9117803Z import random 2023-01-11T21:38:05.9117924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9118049Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9118056Z 2023-01-11T21:38:05.9118141Z aten = torch.ops.aten 2023-01-11T21:38:05.9118276Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9118374Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9118379Z 2023-01-11T21:38:05.9118456Z import triton 2023-01-11T21:38:05.9118568Z import triton.language as tl 2023-01-11T21:38:05.9118696Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9118837Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9118842Z 2023-01-11T21:38:05.9118847Z 2023-01-11T21:38:05.9118984Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9119191Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9119310Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9119377Z { 2023-01-11T21:38:05.9119481Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9119544Z { 2023-01-11T21:38:05.9119627Z #pragma omp for 2023-01-11T21:38:05.9119716Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9119784Z { 2023-01-11T21:38:05.9119928Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(7.777)); 2023-01-11T21:38:05.9120070Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9120199Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9120288Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9120357Z } 2023-01-11T21:38:05.9120459Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9120544Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:05.9120615Z { 2023-01-11T21:38:05.9120723Z auto tmp0 = static_cast<float>(7.777); 2023-01-11T21:38:05.9120825Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9120944Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9121031Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9121127Z } 2023-01-11T21:38:05.9121198Z } 2023-01-11T21:38:05.9121263Z } 2023-01-11T21:38:05.9121349Z ''') 2023-01-11T21:38:05.9121354Z 2023-01-11T21:38:05.9121359Z 2023-01-11T21:38:05.9121453Z async_compile.wait(globals()) 2023-01-11T21:38:05.9121524Z del async_compile 2023-01-11T21:38:05.9121530Z 2023-01-11T21:38:05.9121609Z def call(args): 2023-01-11T21:38:05.9121685Z arg0_1, = args 2023-01-11T21:38:05.9121760Z args.clear() 2023-01-11T21:38:05.9121953Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9122061Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9122140Z return (buf0, ) 2023-01-11T21:38:05.9122145Z 2023-01-11T21:38:05.9122149Z 2023-01-11T21:38:05.9122231Z if __name__ == "__main__": 2023-01-11T21:38:05.9122341Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9122468Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:05.9122660Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9122780Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9122785Z 2023-01-11T21:38:05.9122859Z ok (1.784s) 2023-01-11T21:38:05.9123320Z test_fuse_tiled_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9123453Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9123713Z [2023-01-11 21:26:50,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 109 2023-01-11T21:38:05.9123977Z [2023-01-11 21:26:52,258] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 109 2023-01-11T21:38:05.9123985Z 2023-01-11T21:38:05.9124076Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9124150Z import torch 2023-01-11T21:38:05.9124226Z import random 2023-01-11T21:38:05.9124347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9124472Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9124477Z 2023-01-11T21:38:05.9124591Z aten = torch.ops.aten 2023-01-11T21:38:05.9124730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9124819Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9124833Z 2023-01-11T21:38:05.9124900Z import triton 2023-01-11T21:38:05.9124994Z import triton.language as tl 2023-01-11T21:38:05.9125120Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9125259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9125264Z 2023-01-11T21:38:05.9125269Z 2023-01-11T21:38:05.9125407Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9125612Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9125740Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9125851Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9125953Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9126060Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9126164Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9126228Z { 2023-01-11T21:38:05.9126332Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9126400Z { 2023-01-11T21:38:05.9126475Z #pragma omp for 2023-01-11T21:38:05.9126566Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.9126633Z { 2023-01-11T21:38:05.9126724Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:05.9126794Z { 2023-01-11T21:38:05.9126929Z auto tmp0 = at::vec::Vectorized(in_ptr0[i0]); 2023-01-11T21:38:05.9127102Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:05.9127197Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9127299Z tmp2.store(out_ptr0 + (8*i1) + (128*i0)); 2023-01-11T21:38:05.9127369Z } 2023-01-11T21:38:05.9127474Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9127571Z for(long i1=128; i1<128; i1+=1) 2023-01-11T21:38:05.9127640Z { 2023-01-11T21:38:05.9127732Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9127824Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:05.9127908Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9128008Z out_ptr0[i1 + 
(128*i0)] = tmp2; 2023-01-11T21:38:05.9128080Z } 2023-01-11T21:38:05.9128147Z } 2023-01-11T21:38:05.9128233Z #pragma omp for 2023-01-11T21:38:05.9128322Z for(long i0=0; i0<2048; i0+=1) 2023-01-11T21:38:05.9128382Z { 2023-01-11T21:38:05.9128523Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.9128662Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9128752Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9128848Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9128919Z } 2023-01-11T21:38:05.9129018Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9129112Z for(long i0=16384; i0<16384; i0+=1) 2023-01-11T21:38:05.9129172Z { 2023-01-11T21:38:05.9129261Z auto tmp0 = in_ptr2[i0]; 2023-01-11T21:38:05.9129364Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9129455Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9129542Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9129610Z } 2023-01-11T21:38:05.9129669Z } 2023-01-11T21:38:05.9129735Z } 2023-01-11T21:38:05.9129820Z ''') 2023-01-11T21:38:05.9129825Z 2023-01-11T21:38:05.9129832Z 2023-01-11T21:38:05.9129927Z async_compile.wait(globals()) 2023-01-11T21:38:05.9130004Z del async_compile 2023-01-11T21:38:05.9130009Z 2023-01-11T21:38:05.9130085Z def call(args): 2023-01-11T21:38:05.9130176Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9130252Z args.clear() 2023-01-11T21:38:05.9130477Z buf0 = empty_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9130681Z buf1 = empty_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9130897Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9130975Z del arg0_1 2023-01-11T21:38:05.9131051Z del arg1_1 2023-01-11T21:38:05.9131124Z del arg2_1 2023-01-11T21:38:05.9131204Z return (buf0, buf1, ) 2023-01-11T21:38:05.9131209Z 2023-01-11T21:38:05.9131213Z 2023-01-11T21:38:05.9131295Z if __name__ == "__main__": 2023-01-11T21:38:05.9131410Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9131540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9131745Z arg0_1 = rand_strided((128, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9131953Z arg1_1 = rand_strided((1, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9132158Z arg2_1 = rand_strided((128, 128), (128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9132291Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9132296Z 2023-01-11T21:38:05.9132369Z ok (1.873s) 2023-01-11T21:38:05.9132823Z test_gather1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9132984Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9133235Z [2023-01-11 21:26:52,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 110 2023-01-11T21:38:05.9133499Z [2023-01-11 21:26:54,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 110 2023-01-11T21:38:05.9133505Z 2023-01-11T21:38:05.9133604Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9133681Z import torch 2023-01-11T21:38:05.9133756Z import random 2023-01-11T21:38:05.9133877Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9134002Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9134007Z 2023-01-11T21:38:05.9134090Z aten = torch.ops.aten 2023-01-11T21:38:05.9134219Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9134317Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9134326Z 2023-01-11T21:38:05.9134400Z import triton 2023-01-11T21:38:05.9134598Z import triton.language as tl 2023-01-11T21:38:05.9134725Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9134864Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9134870Z 2023-01-11T21:38:05.9134874Z 2023-01-11T21:38:05.9135015Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9135223Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9135338Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9135451Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9135558Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9135662Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9135727Z { 2023-01-11T21:38:05.9135829Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9135901Z { 2023-01-11T21:38:05.9135978Z #pragma omp for 2023-01-11T21:38:05.9136067Z for(long i0=0; i0<20; i0+=1) 2023-01-11T21:38:05.9136135Z { 2023-01-11T21:38:05.9136220Z #pragma GCC ivdep 2023-01-11T21:38:05.9136311Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:05.9136379Z { 2023-01-11T21:38:05.9136507Z { 2023-01-11T21:38:05.9136572Z { 2023-01-11T21:38:05.9136684Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:05.9136795Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9136898Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9137008Z auto tmp3 = in_ptr1[tmp2 + (6*i1)]; 2023-01-11T21:38:05.9137111Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:05.9137281Z out_ptr1[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:05.9137345Z } 2023-01-11T21:38:05.9137416Z } 2023-01-11T21:38:05.9137491Z } 2023-01-11T21:38:05.9166692Z } 2023-01-11T21:38:05.9166771Z } 2023-01-11T21:38:05.9166831Z } 2023-01-11T21:38:05.9166943Z ''') 2023-01-11T21:38:05.9166949Z 2023-01-11T21:38:05.9166954Z 2023-01-11T21:38:05.9167057Z async_compile.wait(globals()) 2023-01-11T21:38:05.9167132Z del async_compile 2023-01-11T21:38:05.9167142Z 2023-01-11T21:38:05.9167214Z def call(args): 2023-01-11T21:38:05.9167290Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9167361Z args.clear() 2023-01-11T21:38:05.9167582Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9167790Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9167983Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), 
c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9168053Z del arg0_1
2023-01-11T21:38:05.9168120Z del arg1_1
2023-01-11T21:38:05.9168306Z return (buf0, buf1, )
2023-01-11T21:38:05.9168312Z 
2023-01-11T21:38:05.9168317Z 
2023-01-11T21:38:05.9168393Z if __name__ == "__main__":
2023-01-11T21:38:05.9168510Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9168634Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9168846Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9169051Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9169167Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9169173Z 
2023-01-11T21:38:05.9169240Z ok (1.826s)
2023-01-11T21:38:05.9169354Z test_gather2_cpu (__main__.CpuTests) ... ok (0.001s)
2023-01-11T21:38:05.9169813Z test_gather_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9169944Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9170205Z [2023-01-11 21:26:54,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 111
2023-01-11T21:38:05.9170465Z [2023-01-11 21:26:55,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 111
2023-01-11T21:38:05.9170471Z 
2023-01-11T21:38:05.9170566Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9170633Z import torch
2023-01-11T21:38:05.9170704Z import random
2023-01-11T21:38:05.9170819Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9170941Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9170946Z 
2023-01-11T21:38:05.9171025Z aten = torch.ops.aten
2023-01-11T21:38:05.9171161Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9171253Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9171258Z 
2023-01-11T21:38:05.9171326Z import triton
2023-01-11T21:38:05.9171415Z import triton.language as tl
2023-01-11T21:38:05.9171540Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9171710Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9171716Z 
2023-01-11T21:38:05.9171720Z 
2023-01-11T21:38:05.9171852Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9172056Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9172175Z extern "C" void kernel(const long* __restrict__ in_ptr0,
2023-01-11T21:38:05.9172283Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9172384Z float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9172446Z {
2023-01-11T21:38:05.9172547Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9172607Z {
2023-01-11T21:38:05.9172685Z #pragma omp for
2023-01-11T21:38:05.9172769Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:05.9172832Z {
2023-01-11T21:38:05.9172975Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0));
2023-01-11T21:38:05.9173071Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9173134Z }
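// Annotation (not part of the generated kernel): the vectorized loop above
// zero-initializes out_ptr0 in 8-float chunks; the scalar tail loop that
// follows covers leftover elements and is empty here because all 512 outputs
// (16*32) are handled by the 64 vector iterations. The gather/scatter loop
// further down then accumulates into out_ptr0 with atomic_add, since
// different (i0, i1) iterations may target the same output row tmp0.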
2023-01-11T21:38:05.9173227Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9173311Z for(long i0=512; i0<512; i0+=1)
2023-01-11T21:38:05.9173374Z {
2023-01-11T21:38:05.9173475Z auto tmp0 = static_cast<float>(0);
2023-01-11T21:38:05.9173557Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:05.9173620Z }
2023-01-11T21:38:05.9173697Z #pragma omp for
2023-01-11T21:38:05.9173776Z for(long i0=0; i0<80; i0+=1)
2023-01-11T21:38:05.9173838Z {
2023-01-11T21:38:05.9173920Z #pragma GCC ivdep
2023-01-11T21:38:05.9174043Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:05.9174107Z {
2023-01-11T21:38:05.9174172Z {
2023-01-11T21:38:05.9174239Z {
2023-01-11T21:38:05.9174337Z auto tmp0 = in_ptr0[80 + i0];
2023-01-11T21:38:05.9174435Z auto tmp1 = in_ptr0[i0];
2023-01-11T21:38:05.9174827Z auto tmp2 = in_ptr1[i1 + (32*tmp1)];
2023-01-11T21:38:05.9174935Z auto tmp3 = in_ptr1[i1 + (32*tmp0)];
2023-01-11T21:38:05.9175082Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9175190Z auto tmp5 = static_cast<float>(1);
2023-01-11T21:38:05.9175284Z auto tmp6 = tmp4 + tmp5;
2023-01-11T21:38:05.9175398Z atomic_add(&out_ptr0[i1 + (32*tmp0)], tmp6);
2023-01-11T21:38:05.9175466Z }
2023-01-11T21:38:05.9175531Z }
2023-01-11T21:38:05.9175594Z }
2023-01-11T21:38:05.9175661Z }
2023-01-11T21:38:05.9175723Z }
2023-01-11T21:38:05.9175783Z }
2023-01-11T21:38:05.9175860Z ''')
2023-01-11T21:38:05.9175865Z 
2023-01-11T21:38:05.9175870Z 
2023-01-11T21:38:05.9175960Z async_compile.wait(globals())
2023-01-11T21:38:05.9176033Z del async_compile
2023-01-11T21:38:05.9176039Z 
2023-01-11T21:38:05.9176112Z def call(args):
2023-01-11T21:38:05.9176188Z arg0_1, arg1_1 = args
2023-01-11T21:38:05.9176260Z args.clear()
2023-01-11T21:38:05.9176457Z buf0 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9176618Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9176687Z del arg0_1
2023-01-11T21:38:05.9176754Z del arg1_1
2023-01-11T21:38:05.9176829Z return (buf0, )
2023-01-11T21:38:05.9176834Z 
2023-01-11T21:38:05.9176838Z 
2023-01-11T21:38:05.9176915Z if __name__ == "__main__":
2023-01-11T21:38:05.9177029Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9177231Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9177451Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9177642Z arg1_1 = rand_strided((2, 80), (80, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9177809Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9177815Z 
2023-01-11T21:38:05.9177884Z ok (1.848s)
2023-01-11T21:38:05.9178330Z test_gelu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9178458Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9178717Z [2023-01-11 21:26:55,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 112 2023-01-11T21:38:05.9178976Z [2023-01-11 21:26:57,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 112 2023-01-11T21:38:05.9178982Z 2023-01-11T21:38:05.9179080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9179152Z import torch 2023-01-11T21:38:05.9179219Z import random 2023-01-11T21:38:05.9179335Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9179453Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9179459Z 2023-01-11T21:38:05.9179537Z aten = torch.ops.aten 2023-01-11T21:38:05.9179670Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9179761Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9179766Z 2023-01-11T21:38:05.9179836Z import triton 2023-01-11T21:38:05.9179924Z import triton.language as tl 2023-01-11T21:38:05.9180041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9180216Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9180222Z 2023-01-11T21:38:05.9180226Z 2023-01-11T21:38:05.9180363Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9180570Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9180693Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9180797Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9180898Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9180966Z { 2023-01-11T21:38:05.9181063Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9181132Z { 2023-01-11T21:38:05.9181216Z #pragma omp for 2023-01-11T21:38:05.9181306Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9181376Z { 2023-01-11T21:38:05.9181520Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9181668Z auto tmp1 = at::vec::Vectorized(static_cast(0.5)); 2023-01-11T21:38:05.9181753Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9181903Z auto tmp3 = at::vec::Vectorized(static_cast(0.7071067811865476)); 2023-01-11T21:38:05.9181998Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9182091Z auto tmp5 = tmp4.erf(); 2023-01-11T21:38:05.9182231Z auto tmp6 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9182323Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9182413Z auto tmp8 = tmp2 * tmp7; 2023-01-11T21:38:05.9182552Z auto tmp9 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9182639Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.9182731Z auto tmp11 = tmp0 + tmp6; 2023-01-11T21:38:05.9182824Z auto tmp12 = tmp11 * tmp1; 2023-01-11T21:38:05.9182916Z auto tmp13 = tmp11 * tmp3; 2023-01-11T21:38:05.9183011Z auto tmp14 = tmp13.erf(); 2023-01-11T21:38:05.9183103Z auto tmp15 = tmp14 + tmp6; 2023-01-11T21:38:05.9183198Z auto tmp16 = tmp12 * tmp15; 2023-01-11T21:38:05.9183291Z tmp10.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9183390Z tmp16.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9183490Z } 2023-01-11T21:38:05.9183597Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9183690Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9183760Z { 2023-01-11T21:38:05.9183851Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9183954Z auto tmp1 = 
static_cast(0.5); 2023-01-11T21:38:05.9184045Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9184162Z auto tmp3 = static_cast(0.7071067811865476); 2023-01-11T21:38:05.9184255Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9184351Z auto tmp5 = std::erf(tmp4); 2023-01-11T21:38:05.9184460Z auto tmp6 = static_cast(1); 2023-01-11T21:38:05.9184551Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9184634Z auto tmp8 = tmp2 * tmp7; 2023-01-11T21:38:05.9184740Z auto tmp9 = static_cast(2); 2023-01-11T21:38:05.9184835Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:05.9184927Z auto tmp11 = tmp0 + tmp6; 2023-01-11T21:38:05.9185020Z auto tmp12 = tmp11 * tmp1; 2023-01-11T21:38:05.9185115Z auto tmp13 = tmp11 * tmp3; 2023-01-11T21:38:05.9185214Z auto tmp14 = std::erf(tmp13); 2023-01-11T21:38:05.9185299Z auto tmp15 = tmp14 + tmp6; 2023-01-11T21:38:05.9185392Z auto tmp16 = tmp12 * tmp15; 2023-01-11T21:38:05.9185482Z out_ptr0[i0] = tmp10; 2023-01-11T21:38:05.9185570Z out_ptr1[i0] = tmp16; 2023-01-11T21:38:05.9185640Z } 2023-01-11T21:38:05.9185709Z } 2023-01-11T21:38:05.9185769Z } 2023-01-11T21:38:05.9185857Z ''') 2023-01-11T21:38:05.9185896Z 2023-01-11T21:38:05.9185901Z 2023-01-11T21:38:05.9186002Z async_compile.wait(globals()) 2023-01-11T21:38:05.9186082Z del async_compile 2023-01-11T21:38:05.9186088Z 2023-01-11T21:38:05.9186166Z def call(args): 2023-01-11T21:38:05.9186243Z arg0_1, = args 2023-01-11T21:38:05.9186321Z args.clear() 2023-01-11T21:38:05.9186530Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9186726Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9186901Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9186978Z del arg0_1 2023-01-11T21:38:05.9187062Z return (buf0, buf1, ) 2023-01-11T21:38:05.9187067Z 2023-01-11T21:38:05.9187072Z 2023-01-11T21:38:05.9187154Z if __name__ == "__main__": 2023-01-11T21:38:05.9187276Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9187405Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9188926Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9189033Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9189038Z 2023-01-11T21:38:05.9189112Z ok (1.830s) 2023-01-11T21:38:05.9189567Z test_glu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9189703Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9189964Z [2023-01-11 21:26:57,805] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 113 2023-01-11T21:38:05.9190232Z [2023-01-11 21:26:59,603] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 113 2023-01-11T21:38:05.9190240Z 2023-01-11T21:38:05.9190341Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9190417Z import torch 2023-01-11T21:38:05.9190495Z import random 2023-01-11T21:38:05.9190610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9190766Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9190771Z 2023-01-11T21:38:05.9190858Z aten = torch.ops.aten 2023-01-11T21:38:05.9190999Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9191100Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9191105Z 2023-01-11T21:38:05.9191180Z import triton 2023-01-11T21:38:05.9191277Z import triton.language as tl 2023-01-11T21:38:05.9191405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9191543Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9191549Z 2023-01-11T21:38:05.9191553Z 2023-01-11T21:38:05.9191698Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9191907Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9192033Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9192141Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9192249Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9192353Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9192420Z { 2023-01-11T21:38:05.9192518Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9192588Z { 2023-01-11T21:38:05.9192673Z #pragma omp for 2023-01-11T21:38:05.9192763Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9192836Z { 2023-01-11T21:38:05.9192923Z #pragma GCC ivdep 2023-01-11T21:38:05.9193006Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9193077Z { 2023-01-11T21:38:05.9193148Z { 2023-01-11T21:38:05.9193255Z { 2023-01-11T21:38:05.9193365Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9193478Z auto tmp1 = in_ptr0[4 + i1 + (8*i0)]; 2023-01-11T21:38:05.9193636Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9193732Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9193835Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9193938Z out_ptr0[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:05.9194013Z } 2023-01-11T21:38:05.9194086Z } 2023-01-11T21:38:05.9194158Z } 2023-01-11T21:38:05.9194228Z } 2023-01-11T21:38:05.9194305Z #pragma omp for 2023-01-11T21:38:05.9194395Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9194467Z { 2023-01-11T21:38:05.9194561Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9194631Z { 2023-01-11T21:38:05.9194784Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:05.9194937Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 512 + (8*i1) + (1024*i0)); 2023-01-11T21:38:05.9195083Z auto tmp2 = decltype(tmp1)(1)/(decltype(tmp1)(1) + tmp1.neg().exp()); 2023-01-11T21:38:05.9195175Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:05.9195285Z tmp3.store(out_ptr1 + (8*i1) + (512*i0)); 2023-01-11T21:38:05.9195355Z } 2023-01-11T21:38:05.9195454Z #pragma omp simd simdlen(4) 
2023-01-11T21:38:05.9195551Z for(long i1=512; i1<512; i1+=1) 2023-01-11T21:38:05.9195621Z { 2023-01-11T21:38:05.9195727Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:05.9195831Z auto tmp1 = in_ptr0[512 + i1 + (1024*i0)]; 2023-01-11T21:38:05.9195977Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9196075Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9196175Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9196274Z out_ptr1[i1 + (512*i0)] = tmp4; 2023-01-11T21:38:05.9196346Z } 2023-01-11T21:38:05.9196422Z } 2023-01-11T21:38:05.9196499Z #pragma omp for 2023-01-11T21:38:05.9196623Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:05.9196695Z { 2023-01-11T21:38:05.9196785Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9196854Z { 2023-01-11T21:38:05.9197001Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.9197152Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 32 + (8*i1) + (64*i0)); 2023-01-11T21:38:05.9197288Z auto tmp2 = decltype(tmp1)(1)/(decltype(tmp1)(1) + tmp1.neg().exp()); 2023-01-11T21:38:05.9197383Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:05.9197496Z tmp3.store(out_ptr2 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9197569Z } 2023-01-11T21:38:05.9197668Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9197762Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9197833Z { 2023-01-11T21:38:05.9197929Z auto tmp0 = in_ptr0[i1 + (64*i0)]; 2023-01-11T21:38:05.9198039Z auto tmp1 = in_ptr0[32 + i1 + (64*i0)]; 2023-01-11T21:38:05.9198188Z auto tmp2 = std::exp(-tmp1); 2023-01-11T21:38:05.9198284Z auto tmp3 = 1 / (1 + tmp2); 2023-01-11T21:38:05.9198380Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9198480Z out_ptr2[i1 + (32*i0)] = tmp4; 2023-01-11T21:38:05.9198549Z } 2023-01-11T21:38:05.9198612Z } 2023-01-11T21:38:05.9198681Z } 2023-01-11T21:38:05.9198748Z } 2023-01-11T21:38:05.9198833Z ''') 2023-01-11T21:38:05.9198839Z 2023-01-11T21:38:05.9198843Z 2023-01-11T21:38:05.9198940Z async_compile.wait(globals()) 2023-01-11T21:38:05.9199048Z del async_compile 2023-01-11T21:38:05.9199054Z 2023-01-11T21:38:05.9199132Z def call(args): 2023-01-11T21:38:05.9199209Z arg0_1, = args 2023-01-11T21:38:05.9199280Z args.clear() 2023-01-11T21:38:05.9199500Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9199717Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9199927Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9200125Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9200200Z del arg0_1 2023-01-11T21:38:05.9200292Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9200297Z 2023-01-11T21:38:05.9200301Z 2023-01-11T21:38:05.9200385Z if __name__ == "__main__": 2023-01-11T21:38:05.9200499Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9200633Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9200852Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9200969Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9200974Z 2023-01-11T21:38:05.9201047Z ok (1.838s) 2023-01-11T21:38:05.9201511Z test_grid_sampler_2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9201646Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9201909Z [2023-01-11 21:27:00,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 114 2023-01-11T21:38:05.9202123Z [2023-01-11 21:27:01,595] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:05.9202321Z [2023-01-11 21:27:01,596] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:05.9202524Z [2023-01-11 21:27:01,596] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:05.9202608Z 2023-01-11T21:38:05.9202711Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9202788Z import torch 2023-01-11T21:38:05.9202867Z import random 2023-01-11T21:38:05.9202990Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9203117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9203122Z 2023-01-11T21:38:05.9203208Z aten = torch.ops.aten 2023-01-11T21:38:05.9203340Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9203440Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9203445Z 2023-01-11T21:38:05.9203521Z import triton 2023-01-11T21:38:05.9203618Z import triton.language as tl 2023-01-11T21:38:05.9203748Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9203892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9203898Z 2023-01-11T21:38:05.9203903Z 2023-01-11T21:38:05.9204043Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9204257Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9204374Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9204488Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9204603Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9204716Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9204825Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9204930Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9205036Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9205162Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9205266Z long* __restrict__ out_ptr4, 2023-01-11T21:38:05.9205366Z long* __restrict__ out_ptr5, 2023-01-11T21:38:05.9205469Z float* __restrict__ out_ptr6, 2023-01-11T21:38:05.9205571Z long* __restrict__ out_ptr7, 2023-01-11T21:38:05.9205673Z long* __restrict__ out_ptr8, 2023-01-11T21:38:05.9205775Z float* __restrict__ out_ptr9, 2023-01-11T21:38:05.9205878Z long* __restrict__ out_ptr10, 2023-01-11T21:38:05.9205973Z long* __restrict__ out_ptr11, 2023-01-11T21:38:05.9206080Z float* __restrict__ out_ptr12, 2023-01-11T21:38:05.9206184Z long* __restrict__ out_ptr13, 2023-01-11T21:38:05.9206285Z long* __restrict__ out_ptr14, 2023-01-11T21:38:05.9206394Z float* __restrict__ out_ptr15) 2023-01-11T21:38:05.9206462Z { 2023-01-11T21:38:05.9206568Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9206629Z { 2023-01-11T21:38:05.9206713Z #pragma omp for 2023-01-11T21:38:05.9206807Z for(long i0=0; i0<495616; i0+=1) 2023-01-11T21:38:05.9206880Z { 2023-01-11T21:38:05.9206953Z { 2023-01-11T21:38:05.9207028Z { 2023-01-11T21:38:05.9207124Z auto tmp0 = in_ptr0[2*i0]; 
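// Annotation (not part of the generated kernel): tmp0 here and tmp9 on the
// next line are the normalized (x, y) grid coordinates; the arithmetic below
// (coord * 175.5 + 175.5, then floor and floor + 1) appears to unnormalize
// [-1, 1] onto the 352-pixel range and select the two nearest grid lines per
// axis, with tmp24/tmp31/tmp39/tmp43 the four bilinear corner weights
// (zeroed whenever a corner falls outside the image).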
2023-01-11T21:38:05.9207230Z auto tmp9 = in_ptr0[1 + (2*i0)]; 2023-01-11T21:38:05.9207345Z auto tmp1 = static_cast(175.5); 2023-01-11T21:38:05.9207446Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9207544Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.9207654Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:05.9207765Z auto tmp5 = static_cast(0); 2023-01-11T21:38:05.9207868Z auto tmp6 = tmp4 >= tmp5; 2023-01-11T21:38:05.9207975Z auto tmp7 = static_cast(352); 2023-01-11T21:38:05.9208073Z auto tmp8 = tmp4 < tmp7; 2023-01-11T21:38:05.9208172Z auto tmp10 = tmp9 * tmp1; 2023-01-11T21:38:05.9208273Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:05.9208425Z auto tmp12 = std::floor(tmp11); 2023-01-11T21:38:05.9208528Z auto tmp13 = tmp12 >= tmp5; 2023-01-11T21:38:05.9208628Z auto tmp14 = tmp12 < tmp7; 2023-01-11T21:38:05.9208724Z auto tmp15 = tmp13 && tmp14; 2023-01-11T21:38:05.9208826Z auto tmp16 = tmp8 && tmp15; 2023-01-11T21:38:05.9208928Z auto tmp17 = tmp6 && tmp16; 2023-01-11T21:38:05.9209044Z auto tmp18 = static_cast(1); 2023-01-11T21:38:05.9209145Z auto tmp19 = tmp4 + tmp18; 2023-01-11T21:38:05.9209292Z auto tmp20 = tmp19 - tmp3; 2023-01-11T21:38:05.9209398Z auto tmp21 = tmp12 + tmp18; 2023-01-11T21:38:05.9209534Z auto tmp22 = tmp21 - tmp11; 2023-01-11T21:38:05.9209635Z auto tmp23 = tmp20 * tmp22; 2023-01-11T21:38:05.9209743Z auto tmp24 = tmp17 ? tmp23 : tmp5; 2023-01-11T21:38:05.9209846Z auto tmp25 = tmp19 >= tmp5; 2023-01-11T21:38:05.9209947Z auto tmp26 = tmp19 < tmp7; 2023-01-11T21:38:05.9210049Z auto tmp27 = tmp26 && tmp15; 2023-01-11T21:38:05.9210151Z auto tmp28 = tmp25 && tmp27; 2023-01-11T21:38:05.9210287Z auto tmp29 = tmp3 - tmp4; 2023-01-11T21:38:05.9210389Z auto tmp30 = tmp29 * tmp22; 2023-01-11T21:38:05.9210499Z auto tmp31 = tmp28 ? tmp30 : tmp5; 2023-01-11T21:38:05.9210596Z auto tmp32 = tmp21 >= tmp5; 2023-01-11T21:38:05.9210696Z auto tmp33 = tmp21 < tmp7; 2023-01-11T21:38:05.9210828Z auto tmp34 = tmp32 && tmp33; 2023-01-11T21:38:05.9210930Z auto tmp35 = tmp8 && tmp34; 2023-01-11T21:38:05.9211028Z auto tmp36 = tmp6 && tmp35; 2023-01-11T21:38:05.9211164Z auto tmp37 = tmp11 - tmp12; 2023-01-11T21:38:05.9211266Z auto tmp38 = tmp20 * tmp37; 2023-01-11T21:38:05.9211375Z auto tmp39 = tmp36 ? tmp38 : tmp5; 2023-01-11T21:38:05.9211476Z auto tmp40 = tmp26 && tmp34; 2023-01-11T21:38:05.9211577Z auto tmp41 = tmp25 && tmp40; 2023-01-11T21:38:05.9211677Z auto tmp42 = tmp29 * tmp37; 2023-01-11T21:38:05.9211785Z auto tmp43 = tmp41 ? tmp42 : tmp5; 2023-01-11T21:38:05.9211895Z auto tmp44 = static_cast(176.0); 2023-01-11T21:38:05.9211996Z auto tmp45 = tmp0 * tmp44; 2023-01-11T21:38:05.9212094Z auto tmp46 = tmp45 + tmp1; 2023-01-11T21:38:05.9212212Z auto tmp47 = static_cast(0.0); 2023-01-11T21:38:05.9212357Z auto tmp48 = (tmp47 != tmp47) ? tmp47 : std::max(tmp46, tmp47); 2023-01-11T21:38:05.9212473Z auto tmp49 = static_cast(351.0); 2023-01-11T21:38:05.9212614Z auto tmp50 = (tmp49 != tmp49) ? tmp49 : std::min(tmp48, tmp49); 2023-01-11T21:38:05.9212722Z auto tmp51 = std::floor(tmp50); 2023-01-11T21:38:05.9212814Z auto tmp52 = tmp51 >= tmp5; 2023-01-11T21:38:05.9212914Z auto tmp53 = tmp51 < tmp7; 2023-01-11T21:38:05.9213013Z auto tmp54 = tmp9 * tmp44; 2023-01-11T21:38:05.9213114Z auto tmp55 = tmp54 + tmp1; 2023-01-11T21:38:05.9213250Z auto tmp56 = (tmp47 != tmp47) ? tmp47 : std::max(tmp55, tmp47); 2023-01-11T21:38:05.9213383Z auto tmp57 = (tmp49 != tmp49) ? 
tmp49 : std::min(tmp56, tmp49); 2023-01-11T21:38:05.9213496Z auto tmp58 = std::floor(tmp57); 2023-01-11T21:38:05.9213587Z auto tmp59 = tmp58 >= tmp5; 2023-01-11T21:38:05.9213687Z auto tmp60 = tmp58 < tmp7; 2023-01-11T21:38:05.9213788Z auto tmp61 = tmp59 && tmp60; 2023-01-11T21:38:05.9213919Z auto tmp62 = tmp53 && tmp61; 2023-01-11T21:38:05.9214022Z auto tmp63 = tmp52 && tmp62; 2023-01-11T21:38:05.9214140Z auto tmp64 = static_cast(tmp51); 2023-01-11T21:38:05.9214253Z auto tmp65 = static_cast(0); 2023-01-11T21:38:05.9214364Z auto tmp66 = tmp63 ? tmp64 : tmp65; 2023-01-11T21:38:05.9214471Z auto tmp67 = static_cast(tmp58); 2023-01-11T21:38:05.9214693Z auto tmp68 = tmp63 ? tmp67 : tmp65; 2023-01-11T21:38:05.9214793Z auto tmp69 = tmp51 + tmp18; 2023-01-11T21:38:05.9214943Z auto tmp70 = tmp69 - tmp50; 2023-01-11T21:38:05.9215043Z auto tmp71 = tmp58 + tmp18; 2023-01-11T21:38:05.9215187Z auto tmp72 = tmp71 - tmp57; 2023-01-11T21:38:05.9215284Z auto tmp73 = tmp70 * tmp72; 2023-01-11T21:38:05.9215384Z auto tmp74 = tmp63 ? tmp73 : tmp5; 2023-01-11T21:38:05.9215497Z auto tmp75 = tmp69 >= tmp5; 2023-01-11T21:38:05.9215604Z auto tmp76 = tmp69 < tmp7; 2023-01-11T21:38:05.9215725Z auto tmp77 = tmp76 && tmp61; 2023-01-11T21:38:05.9215826Z auto tmp78 = tmp75 && tmp77; 2023-01-11T21:38:05.9215939Z auto tmp79 = static_cast(tmp69); 2023-01-11T21:38:05.9216049Z auto tmp80 = tmp78 ? tmp79 : tmp65; 2023-01-11T21:38:05.9216146Z auto tmp81 = tmp78 ? tmp67 : tmp65; 2023-01-11T21:38:05.9216287Z auto tmp82 = tmp50 - tmp51; 2023-01-11T21:38:05.9216435Z auto tmp83 = tmp82 * tmp72; 2023-01-11T21:38:05.9216539Z auto tmp84 = tmp78 ? tmp83 : tmp5; 2023-01-11T21:38:05.9216635Z auto tmp85 = tmp71 >= tmp5; 2023-01-11T21:38:05.9216735Z auto tmp86 = tmp71 < tmp7; 2023-01-11T21:38:05.9216837Z auto tmp87 = tmp85 && tmp86; 2023-01-11T21:38:05.9216935Z auto tmp88 = tmp53 && tmp87; 2023-01-11T21:38:05.9217025Z auto tmp89 = tmp52 && tmp88; 2023-01-11T21:38:05.9217180Z auto tmp90 = tmp89 ? tmp64 : tmp65; 2023-01-11T21:38:05.9217303Z auto tmp91 = static_cast(tmp71); 2023-01-11T21:38:05.9217407Z auto tmp92 = tmp89 ? tmp91 : tmp65; 2023-01-11T21:38:05.9217550Z auto tmp93 = tmp57 - tmp58; 2023-01-11T21:38:05.9217649Z auto tmp94 = tmp70 * tmp93; 2023-01-11T21:38:05.9217753Z auto tmp95 = tmp89 ? tmp94 : tmp5; 2023-01-11T21:38:05.9217846Z auto tmp96 = tmp76 && tmp87; 2023-01-11T21:38:05.9217943Z auto tmp97 = tmp75 && tmp96; 2023-01-11T21:38:05.9218046Z auto tmp98 = tmp97 ? tmp79 : tmp65; 2023-01-11T21:38:05.9218150Z auto tmp99 = tmp97 ? tmp91 : tmp65; 2023-01-11T21:38:05.9218250Z auto tmp100 = tmp82 * tmp93; 2023-01-11T21:38:05.9218357Z auto tmp101 = tmp97 ? 
tmp100 : tmp5; 2023-01-11T21:38:05.9218449Z out_ptr0[i0] = tmp24; 2023-01-11T21:38:05.9218532Z out_ptr1[i0] = tmp31; 2023-01-11T21:38:05.9218621Z out_ptr2[i0] = tmp39; 2023-01-11T21:38:05.9218710Z out_ptr3[i0] = tmp43; 2023-01-11T21:38:05.9218799Z out_ptr4[i0] = tmp66; 2023-01-11T21:38:05.9218886Z out_ptr5[i0] = tmp68; 2023-01-11T21:38:05.9218974Z out_ptr6[i0] = tmp74; 2023-01-11T21:38:05.9219066Z out_ptr7[i0] = tmp80; 2023-01-11T21:38:05.9219146Z out_ptr8[i0] = tmp81; 2023-01-11T21:38:05.9219233Z out_ptr9[i0] = tmp84; 2023-01-11T21:38:05.9219327Z out_ptr10[i0] = tmp90; 2023-01-11T21:38:05.9219422Z out_ptr11[i0] = tmp92; 2023-01-11T21:38:05.9219565Z out_ptr12[i0] = tmp95; 2023-01-11T21:38:05.9219657Z out_ptr13[i0] = tmp98; 2023-01-11T21:38:05.9219746Z out_ptr14[i0] = tmp99; 2023-01-11T21:38:05.9219832Z out_ptr15[i0] = tmp101; 2023-01-11T21:38:05.9219903Z } 2023-01-11T21:38:05.9219972Z } 2023-01-11T21:38:05.9220044Z } 2023-01-11T21:38:05.9220141Z #pragma omp for collapse(3) 2023-01-11T21:38:05.9220229Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9220296Z { 2023-01-11T21:38:05.9220377Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9220446Z { 2023-01-11T21:38:05.9220547Z for(long i2=0; i2<123904; i2+=1) 2023-01-11T21:38:05.9220616Z { 2023-01-11T21:38:05.9220694Z { 2023-01-11T21:38:05.9220766Z { 2023-01-11T21:38:05.9220882Z auto tmp2 = in_ptr0[(2*i2) + (247808*i0)]; 2023-01-11T21:38:05.9220996Z auto tmp11 = in_ptr0[1 + (2*i2) + (247808*i0)]; 2023-01-11T21:38:05.9221116Z auto tmp51 = out_ptr0[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221232Z auto tmp53 = out_ptr1[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221345Z auto tmp56 = out_ptr2[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221454Z auto tmp59 = out_ptr3[i2 + (123904*i0)]; 2023-01-11T21:38:05.9221569Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:05.9221683Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.9221831Z auto tmp3 = static_cast(175.5); 2023-01-11T21:38:05.9221928Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9222031Z auto tmp5 = tmp4 + tmp3; 2023-01-11T21:38:05.9222148Z auto tmp6 = std::floor(tmp5); 2023-01-11T21:38:05.9222266Z auto tmp7 = static_cast(0); 2023-01-11T21:38:05.9222371Z auto tmp8 = tmp6 >= tmp7; 2023-01-11T21:38:05.9222488Z auto tmp9 = static_cast(352); 2023-01-11T21:38:05.9222595Z auto tmp10 = tmp6 < tmp9; 2023-01-11T21:38:05.9222692Z auto tmp12 = tmp11 * tmp3; 2023-01-11T21:38:05.9222795Z auto tmp13 = tmp12 + tmp3; 2023-01-11T21:38:05.9222912Z auto tmp14 = std::floor(tmp13); 2023-01-11T21:38:05.9223016Z auto tmp15 = tmp14 >= tmp7; 2023-01-11T21:38:05.9223123Z auto tmp16 = tmp14 < tmp9; 2023-01-11T21:38:05.9223230Z auto tmp17 = tmp15 && tmp16; 2023-01-11T21:38:05.9223338Z auto tmp18 = tmp10 && tmp17; 2023-01-11T21:38:05.9223441Z auto tmp19 = tmp8 && tmp18; 2023-01-11T21:38:05.9223557Z auto tmp20 = static_cast(tmp14); 2023-01-11T21:38:05.9223676Z auto tmp21 = static_cast(0); 2023-01-11T21:38:05.9223789Z auto tmp22 = tmp19 ? tmp20 : tmp21; 2023-01-11T21:38:05.9223907Z auto tmp23 = static_cast(tmp6); 2023-01-11T21:38:05.9224021Z auto tmp24 = tmp19 ? 
tmp23 : tmp21; 2023-01-11T21:38:05.9224164Z auto tmp25 = in_ptr1[tmp24 + (352*tmp22) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9224280Z auto tmp26 = static_cast(1); 2023-01-11T21:38:05.9224379Z auto tmp27 = tmp6 + tmp26; 2023-01-11T21:38:05.9224483Z auto tmp28 = tmp27 >= tmp7; 2023-01-11T21:38:05.9224589Z auto tmp29 = tmp27 < tmp9; 2023-01-11T21:38:05.9224698Z auto tmp30 = tmp29 && tmp17; 2023-01-11T21:38:05.9224838Z auto tmp31 = tmp28 && tmp30; 2023-01-11T21:38:05.9224947Z auto tmp32 = tmp31 ? tmp20 : tmp21; 2023-01-11T21:38:05.9225064Z auto tmp33 = static_cast(tmp27); 2023-01-11T21:38:05.9225178Z auto tmp34 = tmp31 ? tmp33 : tmp21; 2023-01-11T21:38:05.9225309Z auto tmp35 = in_ptr1[tmp34 + (352*tmp32) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9225413Z auto tmp36 = tmp14 + tmp26; 2023-01-11T21:38:05.9225517Z auto tmp37 = tmp36 >= tmp7; 2023-01-11T21:38:05.9225623Z auto tmp38 = tmp36 < tmp9; 2023-01-11T21:38:05.9225729Z auto tmp39 = tmp37 && tmp38; 2023-01-11T21:38:05.9225836Z auto tmp40 = tmp10 && tmp39; 2023-01-11T21:38:05.9225941Z auto tmp41 = tmp8 && tmp40; 2023-01-11T21:38:05.9226052Z auto tmp42 = static_cast(tmp36); 2023-01-11T21:38:05.9226162Z auto tmp43 = tmp41 ? tmp42 : tmp21; 2023-01-11T21:38:05.9226270Z auto tmp44 = tmp41 ? tmp23 : tmp21; 2023-01-11T21:38:05.9226409Z auto tmp45 = in_ptr1[tmp44 + (352*tmp43) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9226515Z auto tmp46 = tmp29 && tmp39; 2023-01-11T21:38:05.9226623Z auto tmp47 = tmp28 && tmp46; 2023-01-11T21:38:05.9226734Z auto tmp48 = tmp47 ? tmp42 : tmp21; 2023-01-11T21:38:05.9226874Z auto tmp49 = tmp47 ? tmp33 : tmp21; 2023-01-11T21:38:05.9227007Z auto tmp50 = in_ptr1[tmp49 + (352*tmp48) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9227115Z auto tmp52 = tmp25 * tmp51; 2023-01-11T21:38:05.9227221Z auto tmp54 = tmp35 * tmp53; 2023-01-11T21:38:05.9227324Z auto tmp55 = tmp52 + tmp54; 2023-01-11T21:38:05.9227425Z auto tmp57 = tmp45 * tmp56; 2023-01-11T21:38:05.9227527Z auto tmp58 = tmp55 + tmp57; 2023-01-11T21:38:05.9227629Z auto tmp60 = tmp50 * tmp59; 2023-01-11T21:38:05.9227732Z auto tmp61 = tmp58 + tmp60; 2023-01-11T21:38:05.9227849Z in_out_ptr0[i2 + (123904*i1) + (371712*i0)] = tmp61; 2023-01-11T21:38:05.9227925Z } 2023-01-11T21:38:05.9227999Z } 2023-01-11T21:38:05.9228073Z } 2023-01-11T21:38:05.9228143Z } 2023-01-11T21:38:05.9228212Z } 2023-01-11T21:38:05.9228312Z #pragma omp for collapse(3) 2023-01-11T21:38:05.9228397Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9228467Z { 2023-01-11T21:38:05.9228562Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9228633Z { 2023-01-11T21:38:05.9228736Z for(long i2=0; i2<123904; i2+=1) 2023-01-11T21:38:05.9228809Z { 2023-01-11T21:38:05.9228875Z { 2023-01-11T21:38:05.9228955Z { 2023-01-11T21:38:05.9229074Z auto tmp2 = out_ptr5[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229187Z auto tmp3 = out_ptr4[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229298Z auto tmp5 = out_ptr6[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229410Z auto tmp7 = out_ptr8[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229521Z auto tmp8 = out_ptr7[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229639Z auto tmp10 = out_ptr9[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229749Z auto tmp13 = out_ptr11[i2 + (123904*i0)]; 2023-01-11T21:38:05.9229887Z auto tmp14 = out_ptr10[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230001Z auto tmp16 = out_ptr12[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230112Z auto tmp19 = out_ptr14[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230222Z auto tmp20 = out_ptr13[i2 + (123904*i0)]; 
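// Annotation (not part of the generated kernel): this second pass reloads the
// corner indices and bilinear weights precomputed into out_ptr4..out_ptr15 by
// the first loop, gathers the four neighboring input pixels from in_ptr1, and
// blends them into in_out_ptr1 (buf22 in the wrapper below).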
2023-01-11T21:38:05.9230331Z auto tmp22 = out_ptr15[i2 + (123904*i0)]; 2023-01-11T21:38:05.9230449Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:05.9230566Z auto tmp1 = static_cast(i1); 2023-01-11T21:38:05.9230699Z auto tmp4 = in_ptr1[tmp3 + (352*tmp2) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9230808Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9230945Z auto tmp9 = in_ptr1[tmp8 + (352*tmp7) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231055Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9231160Z auto tmp12 = tmp6 + tmp11; 2023-01-11T21:38:05.9231304Z auto tmp15 = in_ptr1[tmp14 + (352*tmp13) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231410Z auto tmp17 = tmp15 * tmp16; 2023-01-11T21:38:05.9231507Z auto tmp18 = tmp12 + tmp17; 2023-01-11T21:38:05.9231645Z auto tmp21 = in_ptr1[tmp20 + (352*tmp19) + (123904*tmp1) + (371712*tmp0)]; 2023-01-11T21:38:05.9231750Z auto tmp23 = tmp21 * tmp22; 2023-01-11T21:38:05.9231878Z auto tmp24 = tmp18 + tmp23; 2023-01-11T21:38:05.9231998Z in_out_ptr1[i2 + (123904*i1) + (371712*i0)] = tmp24; 2023-01-11T21:38:05.9232071Z } 2023-01-11T21:38:05.9232145Z } 2023-01-11T21:38:05.9232213Z } 2023-01-11T21:38:05.9232273Z } 2023-01-11T21:38:05.9232344Z } 2023-01-11T21:38:05.9232410Z } 2023-01-11T21:38:05.9232476Z } 2023-01-11T21:38:05.9232565Z ''') 2023-01-11T21:38:05.9232570Z 2023-01-11T21:38:05.9232575Z 2023-01-11T21:38:05.9232670Z async_compile.wait(globals()) 2023-01-11T21:38:05.9232750Z del async_compile 2023-01-11T21:38:05.9232755Z 2023-01-11T21:38:05.9232823Z def call(args): 2023-01-11T21:38:05.9232902Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9232976Z args.clear() 2023-01-11T21:38:05.9233195Z buf0 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233410Z buf2 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233622Z buf4 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9233827Z buf6 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9234041Z buf9 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234245Z buf10 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234460Z buf11 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9234665Z buf12 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9234875Z buf13 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235091Z buf14 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9235299Z buf15 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235503Z buf16 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9235745Z buf17 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9235942Z buf19 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9236150Z buf20 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9236359Z buf21 = empty_strided((4, 352, 352), (123904, 352, 1), device='cpu', dtype=torch.float32) 
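# Annotation (not part of the generated wrapper): the two (4, 3, 352, 352)
# output buffers below are allocated once and then aliased (buf8 = buf1,
# buf22 = buf18) so the kernel can write its results in place; Inductor marks
# these aliases with "# reuse".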
2023-01-11T21:38:05.9236588Z buf1 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9236679Z buf8 = buf1; del buf1 # reuse 2023-01-11T21:38:05.9236906Z buf18 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9237003Z buf22 = buf18; del buf18 # reuse 2023-01-11T21:38:05.9237541Z kernel_cpp_0(c_void_p(buf8.data_ptr()), c_void_p(buf22.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf13.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf15.data_ptr()), c_void_p(buf16.data_ptr()), c_void_p(buf17.data_ptr()), c_void_p(buf19.data_ptr()), c_void_p(buf20.data_ptr()), c_void_p(buf21.data_ptr())) 2023-01-11T21:38:05.9237618Z del arg0_1 2023-01-11T21:38:05.9237691Z del arg1_1 2023-01-11T21:38:05.9237769Z return (buf8, buf22, ) 2023-01-11T21:38:05.9237774Z 2023-01-11T21:38:05.9237779Z 2023-01-11T21:38:05.9237862Z if __name__ == "__main__": 2023-01-11T21:38:05.9237981Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9238140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9238373Z arg0_1 = rand_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9238600Z arg1_1 = rand_strided((4, 352, 352, 2), (247808, 704, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9238719Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9238984Z [2023-01-11 21:27:03,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 114 2023-01-11T21:38:05.9238989Z 2023-01-11T21:38:05.9239054Z ok (3.888s) 2023-01-11T21:38:05.9239513Z test_hardsigmoid_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9239649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9239910Z [2023-01-11 21:27:03,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 115
2023-01-11T21:38:05.9240176Z [2023-01-11 21:27:05,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 115
2023-01-11T21:38:05.9240182Z 
2023-01-11T21:38:05.9240279Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9240355Z import torch
2023-01-11T21:38:05.9240430Z import random
2023-01-11T21:38:05.9240549Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9240666Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9240679Z 
2023-01-11T21:38:05.9240755Z aten = torch.ops.aten
2023-01-11T21:38:05.9240892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9240987Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9240995Z 
2023-01-11T21:38:05.9241069Z import triton
2023-01-11T21:38:05.9241162Z import triton.language as tl
2023-01-11T21:38:05.9241288Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9241429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9241434Z 
2023-01-11T21:38:05.9241467Z 
2023-01-11T21:38:05.9241607Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9241806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9241930Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9242037Z float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9242139Z float* __restrict__ out_ptr1,
2023-01-11T21:38:05.9242239Z float* __restrict__ out_ptr2)
2023-01-11T21:38:05.9242303Z {
2023-01-11T21:38:05.9242407Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9242470Z {
2023-01-11T21:38:05.9242555Z #pragma omp for
2023-01-11T21:38:05.9242643Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9242710Z {
2023-01-11T21:38:05.9242851Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9242994Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:05.9243086Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9243221Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(0.0));
2023-01-11T21:38:05.9243335Z auto tmp4 = at::vec::maximum(tmp2, tmp3);
2023-01-11T21:38:05.9243473Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(6.0));
2023-01-11T21:38:05.9243587Z auto tmp6 = at::vec::minimum(tmp4, tmp5);
2023-01-11T21:38:05.9243724Z auto tmp7 = at::vec::Vectorized<float>(static_cast<float>(6));
2023-01-11T21:38:05.9243815Z auto tmp8 = tmp6 / tmp7;
2023-01-11T21:38:05.9243934Z auto tmp9 = tmp2 + tmp1;
2023-01-11T21:38:05.9244049Z auto tmp10 = at::vec::maximum(tmp9, tmp3);
2023-01-11T21:38:05.9244157Z auto tmp11 = at::vec::minimum(tmp10, tmp5);
2023-01-11T21:38:05.9244248Z auto tmp12 = tmp11 / tmp7;
2023-01-11T21:38:05.9244381Z auto tmp13 = tmp0 - tmp1;
2023-01-11T21:38:05.9244473Z auto tmp14 = tmp13 + tmp1;
2023-01-11T21:38:05.9244587Z auto tmp15 = at::vec::maximum(tmp14, tmp3);
2023-01-11T21:38:05.9244698Z auto tmp16 = at::vec::minimum(tmp15, tmp5);
2023-01-11T21:38:05.9244789Z auto tmp17 = tmp16 / tmp7;
2023-01-11T21:38:05.9244886Z tmp8.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9244976Z tmp12.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9245072Z tmp17.store(out_ptr2 + 8*i0);
2023-01-11T21:38:05.9245138Z }
2023-01-11T21:38:05.9245238Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9245334Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:05.9245419Z {
2023-01-11T21:38:05.9245508Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9245630Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:05.9245719Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9245825Z auto tmp3 = static_cast<float>(0.0);
2023-01-11T21:38:05.9245959Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::max(tmp2, tmp3);
2023-01-11T21:38:05.9246066Z auto tmp5 = static_cast<float>(6.0);
2023-01-11T21:38:05.9246192Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::min(tmp4, tmp5);
2023-01-11T21:38:05.9246296Z auto tmp7 = static_cast<float>(6);
2023-01-11T21:38:05.9246379Z auto tmp8 = tmp6 / tmp7;
2023-01-11T21:38:05.9246466Z auto tmp9 = tmp2 + tmp1;
2023-01-11T21:38:05.9246596Z auto tmp10 = (tmp3 != tmp3) ? tmp3 : std::max(tmp9, tmp3);
2023-01-11T21:38:05.9246724Z auto tmp11 = (tmp5 != tmp5) ? tmp5 : std::min(tmp10, tmp5);
2023-01-11T21:38:05.9246820Z auto tmp12 = tmp11 / tmp7;
2023-01-11T21:38:05.9246950Z auto tmp13 = tmp0 - tmp1;
2023-01-11T21:38:05.9247042Z auto tmp14 = tmp13 + tmp1;
2023-01-11T21:38:05.9247170Z auto tmp15 = (tmp3 != tmp3) ? tmp3 : std::max(tmp14, tmp3);
2023-01-11T21:38:05.9247331Z auto tmp16 = (tmp5 != tmp5) ? tmp5 : std::min(tmp15, tmp5);
2023-01-11T21:38:05.9247425Z auto tmp17 = tmp16 / tmp7;
2023-01-11T21:38:05.9247511Z out_ptr0[i0] = tmp8;
2023-01-11T21:38:05.9247596Z out_ptr1[i0] = tmp12;
2023-01-11T21:38:05.9247681Z out_ptr2[i0] = tmp17;
2023-01-11T21:38:05.9247748Z }
2023-01-11T21:38:05.9247814Z }
2023-01-11T21:38:05.9247872Z }
2023-01-11T21:38:05.9247958Z ''')
2023-01-11T21:38:05.9247963Z 
2023-01-11T21:38:05.9247968Z 
2023-01-11T21:38:05.9248062Z async_compile.wait(globals())
2023-01-11T21:38:05.9248138Z del async_compile
2023-01-11T21:38:05.9248143Z 
2023-01-11T21:38:05.9248219Z def call(args):
2023-01-11T21:38:05.9248294Z arg0_1, = args
2023-01-11T21:38:05.9248369Z args.clear()
2023-01-11T21:38:05.9248557Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9248750Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9248941Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9249136Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:05.9249210Z del arg0_1
2023-01-11T21:38:05.9249297Z return (buf0, buf1, buf2, )
2023-01-11T21:38:05.9249302Z 
2023-01-11T21:38:05.9249307Z 
2023-01-11T21:38:05.9249388Z if __name__ == "__main__":
2023-01-11T21:38:05.9249507Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9249627Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9249819Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9249965Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9249970Z 
2023-01-11T21:38:05.9250043Z ok (1.812s)
2023-01-11T21:38:05.9250499Z test_hardswish_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9250631Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9250891Z [2023-01-11 21:27:05,370] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 116 2023-01-11T21:38:05.9251153Z [2023-01-11 21:27:07,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 116 2023-01-11T21:38:05.9251162Z 2023-01-11T21:38:05.9251261Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9251329Z import torch 2023-01-11T21:38:05.9251404Z import random 2023-01-11T21:38:05.9251527Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9251651Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9251656Z 2023-01-11T21:38:05.9251742Z aten = torch.ops.aten 2023-01-11T21:38:05.9251879Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9251975Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9251980Z 2023-01-11T21:38:05.9252055Z import triton 2023-01-11T21:38:05.9252140Z import triton.language as tl 2023-01-11T21:38:05.9252265Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9252406Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9252412Z 2023-01-11T21:38:05.9252416Z 2023-01-11T21:38:05.9252554Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9252762Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9252887Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9252992Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9253095Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9253217Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9253285Z { 2023-01-11T21:38:05.9253391Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9253458Z { 2023-01-11T21:38:05.9253541Z #pragma omp for 2023-01-11T21:38:05.9253630Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9253700Z { 2023-01-11T21:38:05.9253832Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9253972Z auto tmp1 = at::vec::Vectorized(static_cast(3)); 2023-01-11T21:38:05.9254063Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9254212Z auto tmp3 = at::vec::Vectorized(static_cast(0.0)); 2023-01-11T21:38:05.9254325Z auto tmp4 = at::vec::maximum(tmp2, tmp3); 2023-01-11T21:38:05.9254466Z auto tmp5 = at::vec::Vectorized(static_cast(6.0)); 2023-01-11T21:38:05.9254700Z auto tmp6 = at::vec::minimum(tmp4, tmp5); 2023-01-11T21:38:05.9254793Z auto tmp7 = tmp0 * tmp6; 2023-01-11T21:38:05.9254922Z auto tmp8 = at::vec::Vectorized(static_cast(6)); 2023-01-11T21:38:05.9255015Z auto tmp9 = tmp7 / tmp8; 2023-01-11T21:38:05.9255106Z auto tmp10 = tmp2 + tmp1; 2023-01-11T21:38:05.9255221Z auto tmp11 = at::vec::maximum(tmp10, tmp3); 2023-01-11T21:38:05.9255333Z auto tmp12 = at::vec::minimum(tmp11, tmp5); 2023-01-11T21:38:05.9255427Z auto tmp13 = tmp2 * tmp12; 2023-01-11T21:38:05.9255518Z auto tmp14 = tmp13 / tmp8; 2023-01-11T21:38:05.9255641Z auto tmp15 = tmp0 - tmp1; 2023-01-11T21:38:05.9255784Z auto tmp16 = tmp15 + tmp1; 2023-01-11T21:38:05.9255896Z auto tmp17 = at::vec::maximum(tmp16, tmp3); 2023-01-11T21:38:05.9256009Z auto tmp18 = at::vec::minimum(tmp17, tmp5); 2023-01-11T21:38:05.9256099Z auto tmp19 = tmp15 * tmp18; 2023-01-11T21:38:05.9256194Z auto tmp20 = tmp19 / tmp8; 2023-01-11T21:38:05.9256294Z 
            tmp9.store(out_ptr0 + 8*i0);
            tmp14.store(out_ptr1 + 8*i0);
            tmp20.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(3);
            auto tmp2 = tmp0 + tmp1;
            auto tmp3 = static_cast<float>(0.0);
            auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::max(tmp2, tmp3);
            auto tmp5 = static_cast<float>(6.0);
            auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::min(tmp4, tmp5);
            auto tmp7 = tmp0 * tmp6;
            auto tmp8 = static_cast<float>(6);
            auto tmp9 = tmp7 / tmp8;
            auto tmp10 = tmp2 + tmp1;
            auto tmp11 = (tmp3 != tmp3) ? tmp3 : std::max(tmp10, tmp3);
            auto tmp12 = (tmp5 != tmp5) ? tmp5 : std::min(tmp11, tmp5);
            auto tmp13 = tmp2 * tmp12;
            auto tmp14 = tmp13 / tmp8;
            auto tmp15 = tmp0 - tmp1;
            auto tmp16 = tmp15 + tmp1;
            auto tmp17 = (tmp3 != tmp3) ? tmp3 : std::max(tmp16, tmp3);
            auto tmp18 = (tmp5 != tmp5) ? tmp5 : std::min(tmp17, tmp5);
            auto tmp19 = tmp15 * tmp18;
            auto tmp20 = tmp19 / tmp8;
            out_ptr0[i0] = tmp9;
            out_ptr1[i0] = tmp14;
            out_ptr2[i0] = tmp20;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.827s)
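The hardswish kernel above follows the same clamp template but multiplies by its argument before the division: tmp9 = x * clamp(x + 3, 0, 6) / 6, and likewise for x + 3 and x - 3. Equivalent eager code (a sketch, assuming the test evaluates the op at those three offsets, as the three fused outputs suggest):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # hardswish(v) = v * clamp(v + 3, 0, 6) / 6
    out = (F.hardswish(x), F.hardswish(x + 3), F.hardswish(x - 3))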
test_hardtanh_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:07,170] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 117
[2023-01-11 21:27:08,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 117

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.0));
            auto tmp2 = at::vec::maximum(tmp0, tmp1);
            auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(1.0));
            auto tmp4 = at::vec::minimum(tmp2, tmp3);
            auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp6 = tmp0 + tmp5;
            auto tmp7 = at::vec::maximum(tmp6, tmp1);
            auto tmp8 = at::vec::minimum(tmp7, tmp3);
            auto tmp9 = tmp0 - tmp5;
            auto tmp10 = at::vec::maximum(tmp9, tmp1);
            auto tmp11 = at::vec::minimum(tmp10, tmp3);
            tmp4.store(out_ptr0 + 8*i0);
            tmp8.store(out_ptr1 + 8*i0);
            tmp11.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=64; i0<64; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(-1.0);
            auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
            auto tmp3 = static_cast<float>(1.0);
            auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3);
            auto tmp5 = static_cast<float>(1);
            auto tmp6 = tmp0 + tmp5;
            auto tmp7 = (tmp1 != tmp1) ? tmp1 : std::max(tmp6, tmp1);
            auto tmp8 = (tmp3 != tmp3) ? tmp3 : std::min(tmp7, tmp3);
            auto tmp9 = tmp0 - tmp5;
            auto tmp10 = (tmp1 != tmp1) ? tmp1 : std::max(tmp9, tmp1);
            auto tmp11 = (tmp3 != tmp3) ? tmp3 : std::min(tmp10, tmp3);
            out_ptr0[i0] = tmp4;
            out_ptr1[i0] = tmp8;
            out_ptr2[i0] = tmp11;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (1.775s)
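hardtanh needs no division, so each output reduces to a single maximum/minimum pair clamping to [-1, 1]. A hedged sketch of the apparent computation:

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # hardtanh(v) = clamp(v, -1, 1)
    out = (F.hardtanh(x), F.hardtanh(x + 1), F.hardtanh(x - 1))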
test_horizonal_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:08,925] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 118
[2023-01-11 21:27:10,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 118

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<256; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=2048; i0<2048; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = in_ptr1[i0];
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<16; i1+=1)
            {
                for(long i2=0; i2<2; i2+=1)
                {
                    auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i2) + (16*i1) + (256*i0));
                    auto tmp1 = at::vec::Vectorized<float>(in_ptr2[i1]);
                    auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i2) + (16*i1) + (256*i0));
                    auto tmp2 = tmp0 - tmp1;
                    auto tmp4 = tmp3 * tmp1;
                    tmp2.store(out_ptr1 + (8*i2) + (16*i1) + (256*i0));
                    tmp4.store(out_ptr2 + (8*i2) + (16*i1) + (256*i0));
                }
                #pragma omp simd simdlen(4)
                for(long i2=16; i2<16; i2+=1)
                {
                    auto tmp0 = in_ptr0[i2 + (16*i1) + (256*i0)];
                    auto tmp1 = in_ptr2[i1];
                    auto tmp3 = in_ptr1[i2 + (16*i1) + (256*i0)];
                    auto tmp2 = tmp0 - tmp1;
                    auto tmp4 = tmp3 * tmp1;
                    out_ptr1[i2 + (16*i1) + (256*i0)] = tmp2;
                    out_ptr2[i2 + (16*i1) + (256*i0)] = tmp4;
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (1.768s)
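"Horizontal fusion" here means that three independent elementwise results sharing inputs (x + y, x - z and y * z, with z broadcast from shape (1, 16, 1)) are emitted as a single C++ kernel instead of three, so the inputs are swept once per loop nest rather than once per op. A hedged eager-mode equivalent:

    import torch

    x = torch.randn(8, 16, 16)
    y = torch.randn(8, 16, 16)
    z = torch.randn(1, 16, 1)
    out = (x + y, x - z, y * z)  # all three produced by kernel_cpp_0 above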
test_horizonal_fusion2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:10,693] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 119
[2023-01-11 21:27:12,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 119

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<128; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=1024; i0<1024; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<16; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=128; i0<128; i0+=1)
        {
            auto tmp0 = in_ptr1[i0];
            auto tmp1 = static_cast<float>(2);
            auto tmp2 = tmp0 + tmp1;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<16; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=128; i0<128; i0+=1)
        {
            auto tmp0 = in_ptr2[i0];
            auto tmp1 = static_cast<float>(3);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
    arg2_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (1.757s)
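The second fusion test makes the point more starkly: the three additions share neither inputs nor shapes, yet they still land in one kernel as three separate #pragma omp for loops inside a single parallel region, amortizing the thread-pool fork/join. Hedged equivalent:

    import torch

    a = torch.randn(8, 16, 8)
    b = torch.randn(8, 16)
    c = torch.randn(16, 8)
    out = (a + 1, b + 2, c + 3)  # one fused kernel, three output buffers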
test_index1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:12,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 120
[2023-01-11 21:27:14,170] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 120
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:14,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 121
[2023-01-11 21:27:16,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 121

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<12; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr0[i0];
                        auto tmp1 = in_ptr1[i0];
                        auto tmp2 = in_ptr2[i1 + (12*tmp1) + (96*tmp0)];
                        out_ptr0[i1 + (12*i0)] = tmp2;
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((4, 12), (12, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<4; i1+=1)
            {
                #pragma GCC ivdep
                for(long i2=0; i2<12; i2+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i1];
                            auto tmp1 = in_ptr1[i0];
                            auto tmp2 = in_ptr2[i2 + (12*tmp1) + (96*tmp0)];
                            out_ptr0[i2 + (12*i1) + (48*i0)] = tmp2;
                        }
                    }
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((1, 4), (4, 1), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, 1), (1, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (3.619s)
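Both graphs above lower advanced integer indexing to plain gather loops: the index tensors are read into tmp0/tmp1 and folded into the input offset, e.g. in_ptr2[i1 + (12*tmp1) + (96*tmp0)] for the (96, 12, 1)-strided source. A hedged eager reading of the two compiled graphs (index values are illustrative):

    import torch

    x = torch.randn(8, 8, 12)
    i = torch.randint(0, 8, (4,))
    j = torch.randint(0, 8, (4,))
    out_120 = x[i, j]                        # graph 120 -> shape (4, 12)
    out_121 = x[i.view(1, 4), j.view(4, 1)]  # graph 121 -> broadcast to (4, 4, 12)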
test_index2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:16,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 122
[2023-01-11 21:27:17,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 122

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const long* __restrict__ in_ptr0,
                       const float* __restrict__ in_ptr1,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<64; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr0[i0];
                        auto tmp1 = in_ptr1[i1 + (64*tmp0)];
                        out_ptr0[i1 + (64*i0)] = tmp1;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<8; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<4; i1+=1)
            {
                #pragma GCC ivdep
                for(long i2=0; i2<8; i2+=1)
                {
                    {
                        {
                            auto tmp0 = in_ptr0[i1];
                            auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)];
                            out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1;
                        }
                    }
                }
            }
        }
    }
}
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cpu', dtype=torch.float32)
    buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
    del arg0_1
    del arg1_1
    return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((1, 4), (4, 1), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (1.852s)
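test_index2 exercises gathering along different dimensions: the first loop nest copies whole 64-element slices selected by the index (an x[idx] gather hoisted to the leading dimension), while the second keeps the batch stride 64*i0 in place and gathers along dim 1. Hedged reading:

    import torch

    x = torch.randn(8, 8, 8)
    idx = torch.randint(0, 8, (1, 4))
    out = (x[idx], x[:, idx])  # shapes (1, 4, 8, 8) and (8, 1, 4, 8), as in the wrapper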
test_index3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:17,953] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 123
[2023-01-11 21:27:17,978] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
[2023-01-11 21:27:17,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 123

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
    del arg0_1
    del arg1_1
    del arg2_1
    buf1 = buf0
    assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
    del buf0
    return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.079s)
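No kernel is generated for test_index3: the [WARNING] line shows inductor falling back to the eager aten.index kernel for this pattern, which mixes None (slice) entries with integer index tensors on a zero-strided broadcast view, so the wrapper just builds the view, calls the op, and asserts the output layout. A hedged sketch of what the fallback computes:

    import torch

    x = torch.randn(3, 4, 4, 4, 3)
    i = torch.randint(0, 4, (3,))
    j = torch.randint(0, 4, (3,))
    base = torch.as_strided(x, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1))
    out = base[:, i, :, j]  # non-adjacent advanced indices -> result shape (3, 3, 1, 3)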
test_index_put1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:18,199] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 124
[2023-01-11 21:27:19,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 124
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:20,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 125
[2023-01-11 21:27:22,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 125

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1254400; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp0.store(out_ptr0 + 8*i0);
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=10035200; i0<10035200; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp0;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for
        for(long i0=0; i0<601; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<12544; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i1 + (12544*i0)];
                        auto tmp2 = static_cast<long>(1);
                        auto tmp3 = tmp0 + tmp2;
                        auto tmp4 = static_cast<float>(1);
                        auto tmp5 = tmp1 + tmp4;
                        out_ptr0[i1 + (12544*tmp0)] = tmp1;
                        out_ptr1[i1 + (12544*tmp3)] = tmp5;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<1254400; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=10035200; i0<10035200; i0+=1)
        {
            auto tmp0 = out_ptr1[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((601, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0,
                       float* __restrict__ out_ptr1,
                       float* __restrict__ out_ptr2)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp0.store(out_ptr0 + 8*i0);
            tmp2.store(out_ptr1 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8192; i0<8192; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr0[i0] = tmp0;
            out_ptr1[i0] = tmp2;
        }
        #pragma omp for collapse(2)
        for(long i0=0; i0<4; i0+=1)
        {
            for(long i1=0; i1<8; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i0];
                        auto tmp2 = static_cast<long>(1);
                        auto tmp3 = tmp0 + tmp2;
                        auto tmp4 = static_cast<float>(1);
                        auto tmp5 = tmp1 + tmp4;
                        out_ptr0[i1 + (8*tmp0)] = tmp1;
                        out_ptr1[i1 + (8*tmp3)] = tmp5;
                    }
                }
            }
        }
        #pragma omp for
        for(long i0=0; i0<1024; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
            auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
            auto tmp2 = tmp0 + tmp1;
            tmp2.store(out_ptr2 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=8192; i0<8192; i0+=1)
        {
            auto tmp0 = out_ptr1[i0];
            auto tmp1 = static_cast<float>(1);
            auto tmp2 = tmp0 + tmp1;
            out_ptr2[i0] = tmp2;
        }
    }
}
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf4.data_ptr()))
    del arg0_1
    del arg1_1
    del arg2_1
    return (buf0, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32)
    arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64)
    arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (4.310s)
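index_put with accumulate=False compiles to a bulk copy of the base tensor followed by a scatter loop that overwrites the rows addressed by the index tensor (last writer wins on duplicates); the surrounding +1 arithmetic from the test is fused into the same kernel, which is why out_ptr1 scatters v + 1 at index idx + 1. Hedged eager equivalent of the first graph (shapes taken from the wrapper; values illustrative, assuming idx + 1 stays in range as the test's indices evidently do):

    import torch

    x = torch.randn(800, 256, 7, 7)
    idx = torch.randint(0, 799, (601,))
    v = torch.randn(601, 256, 7, 7)
    buf0 = torch.index_put(x, (idx,), v)                  # plain scatter
    buf4 = torch.index_put(x + 1, (idx + 1,), v + 1) + 1  # with the fused +1s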
test_index_put2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:27:22,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 126
[2023-01-11 21:27:24,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 126

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


kernel_cpp_0 = async_compile.cpp('''
#include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
extern "C" void kernel(const float* __restrict__ in_ptr0,
                       const long* __restrict__ in_ptr1,
                       const float* __restrict__ in_ptr2,
                       float* __restrict__ out_ptr0)
{
    #pragma omp parallel num_threads(8)
    {
        #pragma omp for
        for(long i0=0; i0<156800; i0+=1)
        {
            auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
            tmp0.store(out_ptr0 + 8*i0);
        }
        #pragma omp for simd simdlen(4)
        for(long i0=1254400; i0<1254400; i0+=1)
        {
            auto tmp0 = in_ptr0[i0];
            out_ptr0[i0] = tmp0;
        }
        #pragma omp for
        for(long i0=0; i0<600; i0+=1)
        {
            #pragma GCC ivdep
            for(long i1=0; i1<12544; i1+=1)
            {
                {
                    {
                        auto tmp0 = in_ptr1[i0];
                        auto tmp1 = in_ptr2[i1 + (12544*i0)];
                        atomic_add(&out_ptr0[i1 + (12544*tmp0)], tmp1);
                    }
                }
            }
        }
    }
}
''')


async_compile.wait(globals())
del async_compile
2023-01-11T21:38:05.9350873Z 2023-01-11T21:38:05.9350948Z def call(args): 2023-01-11T21:38:05.9351031Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9351100Z args.clear() 2023-01-11T21:38:05.9351324Z buf0 = empty_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9351517Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9351591Z del arg0_1 2023-01-11T21:38:05.9351665Z del arg1_1 2023-01-11T21:38:05.9351735Z del arg2_1 2023-01-11T21:38:05.9351811Z return (buf0, ) 2023-01-11T21:38:05.9351816Z 2023-01-11T21:38:05.9351820Z 2023-01-11T21:38:05.9351901Z if __name__ == "__main__": 2023-01-11T21:38:05.9352014Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9352140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9352360Z arg0_1 = rand_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9352553Z arg1_1 = rand_strided((600, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9352772Z arg2_1 = rand_strided((600, 256, 7, 7), (12544, 49, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9352904Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9352909Z 2023-01-11T21:38:05.9352983Z ok (2.114s) 2023-01-11T21:38:05.9353461Z test_index_put3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9353596Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9353846Z [2023-01-11 21:27:24,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 127 2023-01-11T21:38:05.9354109Z [2023-01-11 21:27:26,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 127 2023-01-11T21:38:05.9354117Z 2023-01-11T21:38:05.9354215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9354289Z import torch 2023-01-11T21:38:05.9354361Z import random 2023-01-11T21:38:05.9354481Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9354609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9354614Z 2023-01-11T21:38:05.9354696Z aten = torch.ops.aten 2023-01-11T21:38:05.9354825Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9354920Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9354925Z 2023-01-11T21:38:05.9354999Z import triton 2023-01-11T21:38:05.9355091Z import triton.language as tl 2023-01-11T21:38:05.9355215Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9355356Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9355361Z 2023-01-11T21:38:05.9355366Z 2023-01-11T21:38:05.9355502Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9355737Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9355853Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9355962Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9356069Z float* __restrict__ out_ptr0, 
2023-01-11T21:38:05.9356168Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9356233Z { 2023-01-11T21:38:05.9356334Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9356401Z { 2023-01-11T21:38:05.9356475Z #pragma omp for 2023-01-11T21:38:05.9356562Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9356630Z { 2023-01-11T21:38:05.9356713Z #pragma GCC ivdep 2023-01-11T21:38:05.9356801Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9356870Z { 2023-01-11T21:38:05.9356957Z #pragma GCC ivdep 2023-01-11T21:38:05.9357047Z for(long i2=0; i2<2; i2+=1) 2023-01-11T21:38:05.9357116Z { 2023-01-11T21:38:05.9357188Z { 2023-01-11T21:38:05.9357262Z { 2023-01-11T21:38:05.9357364Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9357477Z auto tmp1 = in_ptr1[i2 + (2*i0)]; 2023-01-11T21:38:05.9357590Z out_ptr0[i2 + (2*tmp0) + (8*i0)] = tmp1; 2023-01-11T21:38:05.9357657Z } 2023-01-11T21:38:05.9357726Z } 2023-01-11T21:38:05.9357796Z } 2023-01-11T21:38:05.9357864Z } 2023-01-11T21:38:05.9357932Z } 2023-01-11T21:38:05.9358012Z #pragma omp for 2023-01-11T21:38:05.9358093Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9358160Z { 2023-01-11T21:38:05.9358304Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9358447Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9358538Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9358634Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9358702Z } 2023-01-11T21:38:05.9358802Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9358913Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9358986Z { 2023-01-11T21:38:05.9359077Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9359182Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9359273Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9359360Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9359419Z } 2023-01-11T21:38:05.9359502Z #pragma omp for 2023-01-11T21:38:05.9359588Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9359654Z { 2023-01-11T21:38:05.9359739Z #pragma GCC ivdep 2023-01-11T21:38:05.9359826Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:05.9359897Z { 2023-01-11T21:38:05.9359976Z #pragma GCC ivdep 2023-01-11T21:38:05.9360071Z for(long i2=0; i2<2; i2+=1) 2023-01-11T21:38:05.9360139Z { 2023-01-11T21:38:05.9360210Z { 2023-01-11T21:38:05.9360282Z { 2023-01-11T21:38:05.9360387Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9360496Z auto tmp3 = in_ptr1[i2 + (2*i0)]; 2023-01-11T21:38:05.9360600Z auto tmp1 = static_cast(1); 2023-01-11T21:38:05.9360700Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9360813Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.9360916Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9361029Z out_ptr1[i2 + (2*tmp2) + (8*i0)] = tmp5; 2023-01-11T21:38:05.9361102Z } 2023-01-11T21:38:05.9361205Z } 2023-01-11T21:38:05.9361267Z } 2023-01-11T21:38:05.9361337Z } 2023-01-11T21:38:05.9361408Z } 2023-01-11T21:38:05.9361475Z } 2023-01-11T21:38:05.9361542Z } 2023-01-11T21:38:05.9361630Z ''') 2023-01-11T21:38:05.9361636Z 2023-01-11T21:38:05.9361641Z 2023-01-11T21:38:05.9361737Z async_compile.wait(globals()) 2023-01-11T21:38:05.9361807Z del async_compile 2023-01-11T21:38:05.9361821Z 2023-01-11T21:38:05.9361889Z def call(args): 2023-01-11T21:38:05.9361975Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9362051Z args.clear() 2023-01-11T21:38:05.9362259Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9362454Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), 
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9362528Z del arg0_1 2023-01-11T21:38:05.9362602Z del arg1_1 2023-01-11T21:38:05.9362666Z del arg2_1 2023-01-11T21:38:05.9362747Z return (buf1, ) 2023-01-11T21:38:05.9362752Z 2023-01-11T21:38:05.9362756Z 2023-01-11T21:38:05.9362840Z if __name__ == "__main__": 2023-01-11T21:38:05.9362958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9363085Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9363295Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9363486Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9363684Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9363810Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9363816Z 2023-01-11T21:38:05.9363887Z ok (1.910s) 2023-01-11T21:38:05.9364353Z test_index_put_as_masked_fill_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9364487Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9364779Z [2023-01-11 21:27:26,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 128 2023-01-11T21:38:05.9365043Z [2023-01-11 21:27:28,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 128 2023-01-11T21:38:05.9365458Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9365596Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9365854Z [2023-01-11 21:27:28,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 129 2023-01-11T21:38:05.9366121Z [2023-01-11 21:27:29,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 129 2023-01-11T21:38:05.9366127Z 2023-01-11T21:38:05.9366226Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9366294Z import torch 2023-01-11T21:38:05.9366368Z import random 2023-01-11T21:38:05.9366488Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9366615Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9366620Z 2023-01-11T21:38:05.9366702Z aten = torch.ops.aten 2023-01-11T21:38:05.9366841Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9366940Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9366945Z 2023-01-11T21:38:05.9367012Z import triton 2023-01-11T21:38:05.9367136Z import triton.language as tl 2023-01-11T21:38:05.9367262Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9367404Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9367409Z 2023-01-11T21:38:05.9367413Z 2023-01-11T21:38:05.9367550Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9367758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9367883Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.9367994Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9368095Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9368202Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9368268Z { 2023-01-11T21:38:05.9368371Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9368440Z { 2023-01-11T21:38:05.9368525Z #pragma omp for 2023-01-11T21:38:05.9368619Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9368680Z { 2023-01-11T21:38:05.9368787Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.9368915Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.9369067Z auto tmp0 = at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.9369192Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:05.9369329Z auto tmp2 = at::vec::Vectorized::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:05.9369456Z auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0); 2023-01-11T21:38:05.9369555Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9369616Z } 2023-01-11T21:38:05.9369713Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9369804Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9369872Z { 2023-01-11T21:38:05.9369962Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9370055Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:05.9370143Z auto tmp2 = in_ptr2[i0]; 2023-01-11T21:38:05.9370235Z auto tmp3 = tmp0 ? 
tmp1 : tmp2; 2023-01-11T21:38:05.9370323Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9370390Z } 2023-01-11T21:38:05.9370456Z } 2023-01-11T21:38:05.9370556Z } 2023-01-11T21:38:05.9370645Z ''') 2023-01-11T21:38:05.9370651Z 2023-01-11T21:38:05.9370655Z 2023-01-11T21:38:05.9370751Z async_compile.wait(globals()) 2023-01-11T21:38:05.9370822Z del async_compile 2023-01-11T21:38:05.9370827Z 2023-01-11T21:38:05.9370902Z def call(args): 2023-01-11T21:38:05.9370995Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9371073Z args.clear() 2023-01-11T21:38:05.9371280Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9371472Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9371547Z del arg0_1 2023-01-11T21:38:05.9371612Z del arg1_1 2023-01-11T21:38:05.9371684Z del arg2_1 2023-01-11T21:38:05.9371759Z return (buf0, ) 2023-01-11T21:38:05.9371764Z 2023-01-11T21:38:05.9371769Z 2023-01-11T21:38:05.9371850Z if __name__ == "__main__": 2023-01-11T21:38:05.9371970Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9372099Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9372304Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9372509Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9372688Z arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9372815Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9372821Z 2023-01-11T21:38:05.9372825Z 2023-01-11T21:38:05.9372922Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9373026Z import torch 2023-01-11T21:38:05.9373102Z import random 2023-01-11T21:38:05.9373223Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9373347Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9373352Z 2023-01-11T21:38:05.9373435Z aten = torch.ops.aten 2023-01-11T21:38:05.9373567Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9373666Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9373671Z 2023-01-11T21:38:05.9373745Z import triton 2023-01-11T21:38:05.9373841Z import triton.language as tl 2023-01-11T21:38:05.9373966Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9374109Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9374115Z 2023-01-11T21:38:05.9374119Z 2023-01-11T21:38:05.9374255Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9374463Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9374697Z extern "C" void kernel(const bool* __restrict__ in_ptr0, 2023-01-11T21:38:05.9374809Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9374919Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9375022Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9375089Z { 2023-01-11T21:38:05.9375191Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9375257Z { 2023-01-11T21:38:05.9375331Z #pragma omp for 2023-01-11T21:38:05.9375418Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9375486Z { 2023-01-11T21:38:05.9375593Z float g_tmp_buffer_in_ptr0[8] = {0}; 2023-01-11T21:38:05.9375721Z flag_to_float(in_ptr0 + 8*i0, g_tmp_buffer_in_ptr0, 8); 2023-01-11T21:38:05.9375869Z auto tmp0 = 
at::vec::Vectorized::loadu(g_tmp_buffer_in_ptr0); 2023-01-11T21:38:05.9376006Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9376128Z auto tmp2 = at::vec::Vectorized(in_ptr2[0]); 2023-01-11T21:38:05.9376220Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9376350Z auto tmp4 = decltype(tmp3)::blendv(tmp1, tmp3, tmp0); 2023-01-11T21:38:05.9376450Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9376562Z } 2023-01-11T21:38:05.9376667Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9376756Z for(long i0=8192; i0<8192; i0+=1) 2023-01-11T21:38:05.9376824Z { 2023-01-11T21:38:05.9376907Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9376993Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9377083Z auto tmp2 = in_ptr2[0]; 2023-01-11T21:38:05.9377226Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9377340Z auto tmp4 = tmp0 ? tmp3 : tmp1; 2023-01-11T21:38:05.9377439Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:05.9377499Z } 2023-01-11T21:38:05.9377566Z } 2023-01-11T21:38:05.9377633Z } 2023-01-11T21:38:05.9377720Z ''') 2023-01-11T21:38:05.9377726Z 2023-01-11T21:38:05.9377730Z 2023-01-11T21:38:05.9377823Z async_compile.wait(globals()) 2023-01-11T21:38:05.9377901Z del async_compile 2023-01-11T21:38:05.9377906Z 2023-01-11T21:38:05.9377982Z def call(args): 2023-01-11T21:38:05.9378070Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9378138Z args.clear() 2023-01-11T21:38:05.9378347Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9378539Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9378612Z del arg0_1 2023-01-11T21:38:05.9378683Z del arg1_1 2023-01-11T21:38:05.9378754Z del arg2_1 2023-01-11T21:38:05.9378831Z return (buf0, ) 2023-01-11T21:38:05.9378836Z 2023-01-11T21:38:05.9378840Z 2023-01-11T21:38:05.9378913Z if __name__ == "__main__": 2023-01-11T21:38:05.9379032Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9379198Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9379404Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9379605Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9379793Z arg2_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9379922Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9379927Z 2023-01-11T21:38:05.9380001Z ok (3.530s) 2023-01-11T21:38:05.9380469Z test_index_put_fallback1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9380597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9380855Z [2023-01-11 21:27:29,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 130 2023-01-11T21:38:05.9381120Z [2023-01-11 21:27:31,582] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 130 2023-01-11T21:38:05.9381536Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9381667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9381922Z [2023-01-11 21:27:31,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 131 2023-01-11T21:38:05.9382189Z [2023-01-11 21:27:31,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 131 2023-01-11T21:38:05.9382194Z 2023-01-11T21:38:05.9382292Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9382366Z import torch 2023-01-11T21:38:05.9382441Z import random 2023-01-11T21:38:05.9382627Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9382753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9382758Z 2023-01-11T21:38:05.9382841Z aten = torch.ops.aten 2023-01-11T21:38:05.9382976Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9383074Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9383080Z 2023-01-11T21:38:05.9383154Z import triton 2023-01-11T21:38:05.9383247Z import triton.language as tl 2023-01-11T21:38:05.9383365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9383506Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9383514Z 2023-01-11T21:38:05.9383518Z 2023-01-11T21:38:05.9383657Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9383862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9383986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9384093Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9384160Z { 2023-01-11T21:38:05.9384239Z #pragma GCC ivdep 2023-01-11T21:38:05.9384318Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9384384Z { 2023-01-11T21:38:05.9384451Z { 2023-01-11T21:38:05.9384519Z { 2023-01-11T21:38:05.9384614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9384703Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9384770Z } 2023-01-11T21:38:05.9384830Z } 2023-01-11T21:38:05.9384897Z } 2023-01-11T21:38:05.9384961Z } 2023-01-11T21:38:05.9385046Z ''') 2023-01-11T21:38:05.9385083Z 2023-01-11T21:38:05.9385088Z 2023-01-11T21:38:05.9385181Z async_compile.wait(globals()) 2023-01-11T21:38:05.9385259Z del async_compile 2023-01-11T21:38:05.9385264Z 2023-01-11T21:38:05.9385344Z def call(args): 2023-01-11T21:38:05.9385423Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9385499Z args.clear() 2023-01-11T21:38:05.9385694Z buf0 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9385835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9385910Z del arg0_1 2023-01-11T21:38:05.9386019Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:05.9386093Z del arg1_1 2023-01-11T21:38:05.9386157Z del arg2_1 2023-01-11T21:38:05.9386231Z return (buf0, ) 2023-01-11T21:38:05.9386237Z 2023-01-11T21:38:05.9386241Z 2023-01-11T21:38:05.9386321Z if __name__ == "__main__": 2023-01-11T21:38:05.9386438Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9386569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9386766Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 
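# arg1_1 below is the (3,) bool mask that forced the aten.index_put_ fallback
# in call() above; the second compiled variant that follows differs only in
# passing accumulate=True. A standalone sketch of both calls appears after
# that second module.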
2023-01-11T21:38:05.9386953Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9387146Z arg2_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9387267Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9387272Z 2023-01-11T21:38:05.9387277Z 2023-01-11T21:38:05.9387375Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9387450Z import torch 2023-01-11T21:38:05.9387525Z import random 2023-01-11T21:38:05.9387644Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9387767Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9387771Z 2023-01-11T21:38:05.9387855Z aten = torch.ops.aten 2023-01-11T21:38:05.9387993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9388084Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9388089Z 2023-01-11T21:38:05.9388162Z import triton 2023-01-11T21:38:05.9388258Z import triton.language as tl 2023-01-11T21:38:05.9388384Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9388523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9388555Z 2023-01-11T21:38:05.9388560Z 2023-01-11T21:38:05.9388696Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9388901Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9389026Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9389123Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9389188Z { 2023-01-11T21:38:05.9389269Z #pragma GCC ivdep 2023-01-11T21:38:05.9389354Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9389420Z { 2023-01-11T21:38:05.9389486Z { 2023-01-11T21:38:05.9389546Z { 2023-01-11T21:38:05.9389641Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9389729Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9389799Z } 2023-01-11T21:38:05.9389865Z } 2023-01-11T21:38:05.9389930Z } 2023-01-11T21:38:05.9389994Z } 2023-01-11T21:38:05.9390070Z ''') 2023-01-11T21:38:05.9390075Z 2023-01-11T21:38:05.9390082Z 2023-01-11T21:38:05.9390173Z async_compile.wait(globals()) 2023-01-11T21:38:05.9390248Z del async_compile 2023-01-11T21:38:05.9390253Z 2023-01-11T21:38:05.9390326Z def call(args): 2023-01-11T21:38:05.9390411Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9390487Z args.clear() 2023-01-11T21:38:05.9390677Z buf0 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9390808Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9390880Z del arg0_1 2023-01-11T21:38:05.9390987Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:05.9391092Z del arg1_1 2023-01-11T21:38:05.9391162Z del arg2_1 2023-01-11T21:38:05.9391236Z return (buf0, ) 2023-01-11T21:38:05.9391241Z 2023-01-11T21:38:05.9391246Z 2023-01-11T21:38:05.9391326Z if __name__ == "__main__": 2023-01-11T21:38:05.9391443Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9391563Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9391757Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9391943Z arg1_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9392134Z arg2_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9392263Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9392269Z 
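Both test_index_put_fallback1 wrappers above clone the input and then hand the
masked update to the eager ATen op, since a boolean index makes the number of
written elements data-dependent; the two variants differ only in the
accumulate flag. A minimal standalone sketch of those two calls (tensor
values hypothetical):

    import torch
    aten = torch.ops.aten
    buf0 = torch.zeros(3)
    mask = torch.tensor([True, False, True])
    vals = torch.tensor([1.0, 2.0])
    aten.index_put_(buf0, [mask], vals, False)  # overwrite masked slots
    aten.index_put_(buf0, [mask], vals, True)   # accumulate into masked slots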
2023-01-11T21:38:05.9392340Z ok (1.791s) 2023-01-11T21:38:05.9392805Z test_index_put_fallback2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9392943Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9393204Z [2023-01-11 21:27:31,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 132 2023-01-11T21:38:05.9393460Z [2023-01-11 21:27:33,365] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 132 2023-01-11T21:38:05.9393878Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9394010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9394267Z [2023-01-11 21:27:33,416] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 133 2023-01-11T21:38:05.9394555Z [2023-01-11 21:27:33,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 133 2023-01-11T21:38:05.9394561Z 2023-01-11T21:38:05.9394664Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9394739Z import torch 2023-01-11T21:38:05.9394814Z import random 2023-01-11T21:38:05.9394935Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9395052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9395062Z 2023-01-11T21:38:05.9395138Z aten = torch.ops.aten 2023-01-11T21:38:05.9395275Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9395371Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9395376Z 2023-01-11T21:38:05.9395455Z import triton 2023-01-11T21:38:05.9395547Z import triton.language as tl 2023-01-11T21:38:05.9395672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9395813Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9395819Z 2023-01-11T21:38:05.9395823Z 2023-01-11T21:38:05.9395956Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9396162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9396285Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9396388Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9396455Z { 2023-01-11T21:38:05.9396536Z #pragma GCC ivdep 2023-01-11T21:38:05.9396618Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.9396677Z { 2023-01-11T21:38:05.9396745Z { 2023-01-11T21:38:05.9396813Z { 2023-01-11T21:38:05.9396907Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9397024Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9397094Z } 2023-01-11T21:38:05.9397159Z } 2023-01-11T21:38:05.9397218Z } 2023-01-11T21:38:05.9397282Z } 2023-01-11T21:38:05.9397366Z ''') 2023-01-11T21:38:05.9397371Z 2023-01-11T21:38:05.9397376Z 2023-01-11T21:38:05.9397471Z async_compile.wait(globals()) 2023-01-11T21:38:05.9397549Z del async_compile 
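# The copy kernel above only clones the 6-element input into a fresh buffer;
# the indexed update is again delegated to ATen in call() below, because the
# index list mixes None, an int64 tensor and a boolean mask, which this
# inductor build does not codegen. A rough eager equivalent of the fallback
# call (tensor values hypothetical):
#
#   import torch
#   x = torch.zeros(1, 2, 3)
#   rows = torch.tensor([0, 1])
#   mask = torch.tensor([True, False, True])
#   torch.ops.aten.index_put_(x, [None, rows, mask], torch.tensor(5.0), False)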
2023-01-11T21:38:05.9397554Z 2023-01-11T21:38:05.9397627Z def call(args): 2023-01-11T21:38:05.9397722Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9397790Z args.clear() 2023-01-11T21:38:05.9397991Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9398129Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9398203Z del arg0_1 2023-01-11T21:38:05.9398326Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:05.9398398Z del arg1_1 2023-01-11T21:38:05.9398473Z del arg2_1 2023-01-11T21:38:05.9398537Z del arg3_1 2023-01-11T21:38:05.9398614Z return (buf0, ) 2023-01-11T21:38:05.9398619Z 2023-01-11T21:38:05.9398624Z 2023-01-11T21:38:05.9398703Z if __name__ == "__main__": 2023-01-11T21:38:05.9398820Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9398952Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9399154Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9399342Z arg1_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9399527Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9399704Z arg3_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9399838Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9399843Z 2023-01-11T21:38:05.9399848Z 2023-01-11T21:38:05.9399948Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9400025Z import torch 2023-01-11T21:38:05.9400100Z import random 2023-01-11T21:38:05.9400220Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9400343Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9400348Z 2023-01-11T21:38:05.9400430Z aten = torch.ops.aten 2023-01-11T21:38:05.9400586Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9400684Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9400690Z 2023-01-11T21:38:05.9400764Z import triton 2023-01-11T21:38:05.9400859Z import triton.language as tl 2023-01-11T21:38:05.9400985Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9401127Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9401132Z 2023-01-11T21:38:05.9401136Z 2023-01-11T21:38:05.9401271Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9401476Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9401595Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9401700Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9401765Z { 2023-01-11T21:38:05.9401846Z #pragma GCC ivdep 2023-01-11T21:38:05.9401932Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:05.9402001Z { 2023-01-11T21:38:05.9402072Z { 2023-01-11T21:38:05.9402132Z { 2023-01-11T21:38:05.9402226Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9402314Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:05.9402381Z } 2023-01-11T21:38:05.9402448Z } 2023-01-11T21:38:05.9402516Z } 2023-01-11T21:38:05.9402581Z } 2023-01-11T21:38:05.9402657Z ''') 2023-01-11T21:38:05.9402663Z 2023-01-11T21:38:05.9402668Z 2023-01-11T21:38:05.9402760Z async_compile.wait(globals()) 2023-01-11T21:38:05.9402838Z del async_compile 2023-01-11T21:38:05.9402843Z 2023-01-11T21:38:05.9402920Z def call(args): 2023-01-11T21:38:05.9403042Z arg0_1, arg1_1, arg2_1, arg3_1 = args 
2023-01-11T21:38:05.9403118Z args.clear() 2023-01-11T21:38:05.9403317Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9403449Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9403520Z del arg0_1 2023-01-11T21:38:05.9403650Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:05.9403722Z del arg1_1 2023-01-11T21:38:05.9403793Z del arg2_1 2023-01-11T21:38:05.9403865Z del arg3_1 2023-01-11T21:38:05.9403941Z return (buf0, ) 2023-01-11T21:38:05.9403946Z 2023-01-11T21:38:05.9403950Z 2023-01-11T21:38:05.9404023Z if __name__ == "__main__": 2023-01-11T21:38:05.9404142Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9404270Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9404472Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9404663Z arg1_1 = rand_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9404850Z arg2_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9405034Z arg3_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9405171Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9405176Z 2023-01-11T21:38:05.9405240Z ok (1.788s) 2023-01-11T21:38:05.9405694Z test_index_select_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9405827Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9406088Z [2023-01-11 21:27:33,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 134 2023-01-11T21:38:05.9406350Z [2023-01-11 21:27:35,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 134 2023-01-11T21:38:05.9406795Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9406929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9407185Z [2023-01-11 21:27:35,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 135 2023-01-11T21:38:05.9407447Z [2023-01-11 21:27:37,083] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 135 2023-01-11T21:38:05.9407455Z 2023-01-11T21:38:05.9407554Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9407629Z import torch 2023-01-11T21:38:05.9407697Z import random 2023-01-11T21:38:05.9407821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9407944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9407952Z 2023-01-11T21:38:05.9408037Z aten = torch.ops.aten 2023-01-11T21:38:05.9408173Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9408269Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9408274Z 2023-01-11T21:38:05.9408348Z import triton 2023-01-11T21:38:05.9408443Z import triton.language as tl 2023-01-11T21:38:05.9408562Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9408703Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9408708Z 2023-01-11T21:38:05.9408713Z 2023-01-11T21:38:05.9408850Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9409082Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9409207Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9409317Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9409422Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9409527Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9409621Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9409690Z { 2023-01-11T21:38:05.9409793Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9409860Z { 2023-01-11T21:38:05.9409955Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9410041Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9410101Z { 2023-01-11T21:38:05.9410193Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9410261Z { 2023-01-11T21:38:05.9410330Z { 2023-01-11T21:38:05.9410406Z { 2023-01-11T21:38:05.9410506Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9410618Z auto tmp1 = in_ptr1[i1 + (64*tmp0)]; 2023-01-11T21:38:05.9410714Z out_ptr0[i1 + (64*i0)] = tmp1; 2023-01-11T21:38:05.9410784Z } 2023-01-11T21:38:05.9410856Z } 2023-01-11T21:38:05.9410924Z } 2023-01-11T21:38:05.9410990Z } 2023-01-11T21:38:05.9411072Z #pragma omp for 2023-01-11T21:38:05.9411160Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9411221Z { 2023-01-11T21:38:05.9411306Z #pragma GCC ivdep 2023-01-11T21:38:05.9411394Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9411463Z { 2023-01-11T21:38:05.9411551Z #pragma GCC ivdep 2023-01-11T21:38:05.9411646Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.9411716Z { 2023-01-11T21:38:05.9411780Z { 2023-01-11T21:38:05.9411856Z { 2023-01-11T21:38:05.9411959Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9412077Z auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9412190Z out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1; 2023-01-11T21:38:05.9412301Z } 2023-01-11T21:38:05.9412374Z } 2023-01-11T21:38:05.9412435Z } 2023-01-11T21:38:05.9412502Z } 2023-01-11T21:38:05.9412569Z } 2023-01-11T21:38:05.9412652Z #pragma omp for 2023-01-11T21:38:05.9412740Z 
for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9412807Z { 2023-01-11T21:38:05.9412884Z #pragma GCC ivdep 2023-01-11T21:38:05.9412970Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9413038Z { 2023-01-11T21:38:05.9413123Z #pragma GCC ivdep 2023-01-11T21:38:05.9413218Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.9413293Z { 2023-01-11T21:38:05.9413364Z { 2023-01-11T21:38:05.9413430Z { 2023-01-11T21:38:05.9413531Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9413629Z auto tmp1 = in_ptr0[i2]; 2023-01-11T21:38:05.9413752Z auto tmp2 = in_ptr1[tmp1 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9413863Z out_ptr2[i2 + (4*i1) + (16*i0)] = tmp2; 2023-01-11T21:38:05.9413938Z } 2023-01-11T21:38:05.9414009Z } 2023-01-11T21:38:05.9414071Z } 2023-01-11T21:38:05.9414138Z } 2023-01-11T21:38:05.9414204Z } 2023-01-11T21:38:05.9414272Z } 2023-01-11T21:38:05.9414337Z } 2023-01-11T21:38:05.9414424Z ''') 2023-01-11T21:38:05.9414429Z 2023-01-11T21:38:05.9414433Z 2023-01-11T21:38:05.9414638Z async_compile.wait(globals()) 2023-01-11T21:38:05.9414708Z del async_compile 2023-01-11T21:38:05.9414756Z 2023-01-11T21:38:05.9414835Z def call(args): 2023-01-11T21:38:05.9414914Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9414989Z args.clear() 2023-01-11T21:38:05.9415196Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415399Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415593Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9415810Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9415877Z del arg0_1 2023-01-11T21:38:05.9415948Z del arg1_1 2023-01-11T21:38:05.9416036Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9416041Z 2023-01-11T21:38:05.9416046Z 2023-01-11T21:38:05.9416126Z if __name__ == "__main__": 2023-01-11T21:38:05.9416245Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9416373Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9416576Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9416767Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9416884Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9416889Z 2023-01-11T21:38:05.9416893Z 2023-01-11T21:38:05.9416992Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9417067Z import torch 2023-01-11T21:38:05.9417192Z import random 2023-01-11T21:38:05.9417321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9417449Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9417454Z 2023-01-11T21:38:05.9417536Z aten = torch.ops.aten 2023-01-11T21:38:05.9417665Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9417760Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9417768Z 2023-01-11T21:38:05.9417842Z import triton 2023-01-11T21:38:05.9417938Z import triton.language as tl 2023-01-11T21:38:05.9418061Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9418201Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9418206Z 2023-01-11T21:38:05.9418210Z 2023-01-11T21:38:05.9418389Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9418596Z #include 
"/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9418717Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9418820Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9418924Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9419024Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9419123Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9419190Z { 2023-01-11T21:38:05.9419294Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9419362Z { 2023-01-11T21:38:05.9419449Z #pragma omp for collapse(2) 2023-01-11T21:38:05.9419539Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9419605Z { 2023-01-11T21:38:05.9419699Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:05.9419768Z { 2023-01-11T21:38:05.9419840Z { 2023-01-11T21:38:05.9419903Z { 2023-01-11T21:38:05.9420007Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9420121Z auto tmp1 = in_ptr1[i1 + (64*tmp0)]; 2023-01-11T21:38:05.9420224Z out_ptr0[i1 + (64*i0)] = tmp1; 2023-01-11T21:38:05.9420294Z } 2023-01-11T21:38:05.9420363Z } 2023-01-11T21:38:05.9420433Z } 2023-01-11T21:38:05.9420495Z } 2023-01-11T21:38:05.9420578Z #pragma omp for 2023-01-11T21:38:05.9420664Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9420760Z { 2023-01-11T21:38:05.9420849Z #pragma GCC ivdep 2023-01-11T21:38:05.9420936Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9421005Z { 2023-01-11T21:38:05.9421084Z #pragma GCC ivdep 2023-01-11T21:38:05.9421179Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:05.9421255Z { 2023-01-11T21:38:05.9421326Z { 2023-01-11T21:38:05.9421398Z { 2023-01-11T21:38:05.9421498Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9421616Z auto tmp1 = in_ptr1[i2 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9421719Z out_ptr1[i2 + (8*i1) + (32*i0)] = tmp1; 2023-01-11T21:38:05.9421792Z } 2023-01-11T21:38:05.9421861Z } 2023-01-11T21:38:05.9421929Z } 2023-01-11T21:38:05.9421997Z } 2023-01-11T21:38:05.9422064Z } 2023-01-11T21:38:05.9422141Z #pragma omp for 2023-01-11T21:38:05.9422230Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9422296Z { 2023-01-11T21:38:05.9422383Z #pragma GCC ivdep 2023-01-11T21:38:05.9422470Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9422537Z { 2023-01-11T21:38:05.9422629Z #pragma GCC ivdep 2023-01-11T21:38:05.9422715Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:05.9422785Z { 2023-01-11T21:38:05.9422856Z { 2023-01-11T21:38:05.9422928Z { 2023-01-11T21:38:05.9423027Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:05.9423126Z auto tmp1 = in_ptr0[i2]; 2023-01-11T21:38:05.9423245Z auto tmp2 = in_ptr1[tmp1 + (8*tmp0) + (64*i0)]; 2023-01-11T21:38:05.9423347Z out_ptr2[i2 + (4*i1) + (16*i0)] = tmp2; 2023-01-11T21:38:05.9423423Z } 2023-01-11T21:38:05.9423496Z } 2023-01-11T21:38:05.9423566Z } 2023-01-11T21:38:05.9423633Z } 2023-01-11T21:38:05.9423700Z } 2023-01-11T21:38:05.9423765Z } 2023-01-11T21:38:05.9423822Z } 2023-01-11T21:38:05.9423910Z ''') 2023-01-11T21:38:05.9423916Z 2023-01-11T21:38:05.9423920Z 2023-01-11T21:38:05.9424044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9424122Z del async_compile 2023-01-11T21:38:05.9424128Z 2023-01-11T21:38:05.9424204Z def call(args): 2023-01-11T21:38:05.9424283Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9424359Z args.clear() 2023-01-11T21:38:05.9424554Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9424750Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9424946Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), 
device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9425181Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9425270Z del arg0_1 2023-01-11T21:38:05.9425358Z del arg1_1 2023-01-11T21:38:05.9425445Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9425450Z 2023-01-11T21:38:05.9425454Z 2023-01-11T21:38:05.9425538Z if __name__ == "__main__": 2023-01-11T21:38:05.9425651Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9425780Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9425983Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9426173Z arg1_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9426296Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9426303Z 2023-01-11T21:38:05.9426373Z ok (3.659s) 2023-01-11T21:38:05.9426842Z test_indirect_load_broadcast_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9427004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9427263Z [2023-01-11 21:27:37,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 136 2023-01-11T21:38:05.9427526Z [2023-01-11 21:27:38,882] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 136 2023-01-11T21:38:05.9427532Z 2023-01-11T21:38:05.9427624Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9427698Z import torch 2023-01-11T21:38:05.9427773Z import random 2023-01-11T21:38:05.9427894Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9428018Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9428026Z 2023-01-11T21:38:05.9428108Z aten = torch.ops.aten 2023-01-11T21:38:05.9428245Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9428333Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9428348Z 2023-01-11T21:38:05.9428415Z import triton 2023-01-11T21:38:05.9428510Z import triton.language as tl 2023-01-11T21:38:05.9428634Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9428775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9428780Z 2023-01-11T21:38:05.9428785Z 2023-01-11T21:38:05.9428922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9429125Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9429247Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9429357Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9429462Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9429567Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9429636Z { 2023-01-11T21:38:05.9429736Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9429803Z { 2023-01-11T21:38:05.9429886Z #pragma omp for 2023-01-11T21:38:05.9430001Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9430069Z { 2023-01-11T21:38:05.9430157Z #pragma GCC ivdep 2023-01-11T21:38:05.9430246Z for(long 
i1=0; i1<21; i1+=1) 2023-01-11T21:38:05.9430314Z { 2023-01-11T21:38:05.9430383Z { 2023-01-11T21:38:05.9430455Z { 2023-01-11T21:38:05.9430558Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:05.9430659Z auto tmp2 = in_ptr2[i0]; 2023-01-11T21:38:05.9430771Z auto tmp1 = in_ptr1[i1 + (512*tmp0)]; 2023-01-11T21:38:05.9430872Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9430977Z out_ptr0[i1 + (21*i0)] = tmp3; 2023-01-11T21:38:05.9431047Z } 2023-01-11T21:38:05.9431117Z } 2023-01-11T21:38:05.9431177Z } 2023-01-11T21:38:05.9431245Z } 2023-01-11T21:38:05.9431315Z } 2023-01-11T21:38:05.9431385Z } 2023-01-11T21:38:05.9431472Z ''') 2023-01-11T21:38:05.9431478Z 2023-01-11T21:38:05.9431482Z 2023-01-11T21:38:05.9431575Z async_compile.wait(globals()) 2023-01-11T21:38:05.9431653Z del async_compile 2023-01-11T21:38:05.9431658Z 2023-01-11T21:38:05.9431725Z def call(args): 2023-01-11T21:38:05.9431813Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9431890Z args.clear() 2023-01-11T21:38:05.9432091Z buf0 = empty_strided((32, 21), (21, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9432283Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9432388Z del arg0_1 2023-01-11T21:38:05.9432460Z del arg1_1 2023-01-11T21:38:05.9432525Z del arg2_1 2023-01-11T21:38:05.9432601Z return (buf0, ) 2023-01-11T21:38:05.9432607Z 2023-01-11T21:38:05.9432611Z 2023-01-11T21:38:05.9432692Z if __name__ == "__main__": 2023-01-11T21:38:05.9432812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9432941Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9433140Z arg0_1 = rand_strided((32, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9433344Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9433541Z arg2_1 = rand_strided((32, 21), (1, 32), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9433660Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9433673Z 2023-01-11T21:38:05.9433737Z ok (1.853s) 2023-01-11T21:38:05.9434183Z test_inf_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9434324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9434581Z [2023-01-11 21:27:38,955] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 137
2023-01-11T21:38:05.9434845Z [2023-01-11 21:27:40,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 137
2023-01-11T21:38:05.9434851Z
2023-01-11T21:38:05.9434950Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9435025Z import torch
2023-01-11T21:38:05.9435100Z import random
2023-01-11T21:38:05.9435219Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9435335Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9435342Z
2023-01-11T21:38:05.9435424Z aten = torch.ops.aten
2023-01-11T21:38:05.9435559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9435656Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9435661Z
2023-01-11T21:38:05.9435736Z import triton
2023-01-11T21:38:05.9435858Z import triton.language as tl
2023-01-11T21:38:05.9435985Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9436117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9436131Z
2023-01-11T21:38:05.9436136Z
2023-01-11T21:38:05.9436265Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9436470Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9436593Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9436699Z float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9436805Z float* __restrict__ out_ptr1,
2023-01-11T21:38:05.9436905Z float* __restrict__ out_ptr2)
2023-01-11T21:38:05.9436969Z {
2023-01-11T21:38:05.9443344Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9443427Z {
2023-01-11T21:38:05.9443517Z #pragma omp for
2023-01-11T21:38:05.9443610Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9443673Z {
2023-01-11T21:38:05.9443827Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9443990Z auto tmp1 = at::vec::Vectorized<float>(std::numeric_limits<float>::infinity());
2023-01-11T21:38:05.9444082Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9444331Z auto tmp3 = at::vec::Vectorized<float>(-std::numeric_limits<float>::infinity());
2023-01-11T21:38:05.9444422Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:05.9444509Z auto tmp5 = tmp0 * tmp3;
2023-01-11T21:38:05.9444600Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9444756Z tmp4.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9444858Z tmp5.store(out_ptr2 + 8*i0);
2023-01-11T21:38:05.9444928Z }
2023-01-11T21:38:05.9445029Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9445120Z for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9445191Z {
2023-01-11T21:38:05.9445277Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9445409Z auto tmp1 = std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9445502Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9445753Z auto tmp3 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9445847Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:05.9445937Z auto tmp5 = tmp0 * tmp3;
2023-01-11T21:38:05.9446026Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9446107Z out_ptr1[i0] = tmp4;
2023-01-11T21:38:05.9446193Z out_ptr2[i0] = tmp5;
2023-01-11T21:38:05.9446268Z }
2023-01-11T21:38:05.9446336Z }
2023-01-11T21:38:05.9446402Z }
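// Note on the loops above: the vectorized loop (i0 < 1) already covers all
// 8 floats, so the scalar tail loop (i0 = 8; i0 < 8) is intentionally empty.
// The three outputs exercise IEEE-754 infinity arithmetic: for finite x,
// x + inf == inf, x + -inf == -inf, and x * -inf is an infinity with the
// sign of x flipped (NaN when x == 0).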
2023-01-11T21:38:05.9446489Z ''') 2023-01-11T21:38:05.9446496Z 2023-01-11T21:38:05.9446500Z 2023-01-11T21:38:05.9446597Z async_compile.wait(globals()) 2023-01-11T21:38:05.9446670Z del async_compile 2023-01-11T21:38:05.9446682Z 2023-01-11T21:38:05.9446752Z def call(args): 2023-01-11T21:38:05.9446833Z arg0_1, = args 2023-01-11T21:38:05.9446911Z args.clear() 2023-01-11T21:38:05.9447112Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447302Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447498Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9447694Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9447769Z del arg0_1 2023-01-11T21:38:05.9447858Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:05.9447866Z 2023-01-11T21:38:05.9447870Z 2023-01-11T21:38:05.9447954Z if __name__ == "__main__": 2023-01-11T21:38:05.9448078Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9448201Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9448427Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9448543Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9448548Z 2023-01-11T21:38:05.9448623Z ok (1.702s) 2023-01-11T21:38:05.9449088Z test_inplace_activations_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9449221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9449482Z [2023-01-11 21:27:40,783] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 138 2023-01-11T21:38:05.9449747Z [2023-01-11 21:27:42,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 138 2023-01-11T21:38:05.9449753Z 2023-01-11T21:38:05.9449853Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9449929Z import torch 2023-01-11T21:38:05.9449998Z import random 2023-01-11T21:38:05.9450115Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9450240Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9450245Z 2023-01-11T21:38:05.9450327Z aten = torch.ops.aten 2023-01-11T21:38:05.9450465Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9450561Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9450566Z 2023-01-11T21:38:05.9450640Z import triton 2023-01-11T21:38:05.9450725Z import triton.language as tl 2023-01-11T21:38:05.9450878Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9451021Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9451027Z 2023-01-11T21:38:05.9451031Z 2023-01-11T21:38:05.9451169Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9451376Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9451502Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9451607Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9451709Z float* 
__restrict__ out_ptr1, 2023-01-11T21:38:05.9451802Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9451900Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9451999Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.9452098Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.9452202Z float* __restrict__ out_ptr6) 2023-01-11T21:38:05.9452268Z { 2023-01-11T21:38:05.9452370Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9452429Z { 2023-01-11T21:38:05.9452510Z #pragma omp for 2023-01-11T21:38:05.9452597Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9452664Z { 2023-01-11T21:38:05.9452730Z { 2023-01-11T21:38:05.9452798Z { 2023-01-11T21:38:05.9452896Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9453000Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9453100Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9453206Z auto tmp3 = static_cast<float>(3); 2023-01-11T21:38:05.9453303Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9453414Z auto tmp5 = static_cast<float>(0.0); 2023-01-11T21:38:05.9453549Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp4, tmp5); 2023-01-11T21:38:05.9453663Z auto tmp7 = static_cast<float>(6.0); 2023-01-11T21:38:05.9453788Z auto tmp8 = (tmp7 != tmp7) ? tmp7 : std::min(tmp6, tmp7); 2023-01-11T21:38:05.9453884Z auto tmp9 = tmp2 * tmp8; 2023-01-11T21:38:05.9454018Z auto tmp10 = static_cast<float>(6); 2023-01-11T21:38:05.9454119Z auto tmp11 = tmp9 / tmp10; 2023-01-11T21:38:05.9454284Z auto tmp12 = static_cast<float>(-1.0); 2023-01-11T21:38:05.9454422Z auto tmp13 = (tmp12 != tmp12) ? tmp12 : std::max(tmp2, tmp12); 2023-01-11T21:38:05.9454674Z auto tmp14 = static_cast<float>(1.0); 2023-01-11T21:38:05.9454812Z auto tmp15 = (tmp14 != tmp14) ? tmp14 : std::min(tmp13, tmp14); 2023-01-11T21:38:05.9454913Z auto tmp16 = static_cast<float>(0); 2023-01-11T21:38:05.9455013Z auto tmp17 = tmp2 > tmp16; 2023-01-11T21:38:05.9455131Z auto tmp18 = static_cast<float>(0.01); 2023-01-11T21:38:05.9455240Z auto tmp19 = tmp2 * tmp18; 2023-01-11T21:38:05.9455365Z auto tmp20 = tmp17 ? tmp2 : tmp19; 2023-01-11T21:38:05.9455542Z auto tmp21 = std::exp(-tmp2); 2023-01-11T21:38:05.9455642Z auto tmp22 = 1 / (1 + tmp21); 2023-01-11T21:38:05.9455738Z auto tmp23 = tmp2 * tmp22; 2023-01-11T21:38:05.9455838Z auto tmp24 = std::log1p(tmp2); 2023-01-11T21:38:05.9455943Z auto tmp25 = static_cast<bool>(0); 2023-01-11T21:38:05.9456055Z auto tmp26 = static_cast<float>(99.0); 2023-01-11T21:38:05.9456160Z auto tmp27 = tmp25 ? tmp26 : tmp2; 2023-01-11T21:38:05.9456265Z auto tmp28 = static_cast<bool>(1); 2023-01-11T21:38:05.9456368Z auto tmp29 = tmp28 ? 
tmp26 : tmp2; 2023-01-11T21:38:05.9456458Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9456599Z out_ptr1[i0] = tmp15; 2023-01-11T21:38:05.9456693Z out_ptr2[i0] = tmp20; 2023-01-11T21:38:05.9456784Z out_ptr3[i0] = tmp23; 2023-01-11T21:38:05.9456875Z out_ptr4[i0] = tmp24; 2023-01-11T21:38:05.9456971Z out_ptr5[i0] = tmp27; 2023-01-11T21:38:05.9457064Z out_ptr6[i0] = tmp29; 2023-01-11T21:38:05.9457188Z } 2023-01-11T21:38:05.9457271Z } 2023-01-11T21:38:05.9457343Z } 2023-01-11T21:38:05.9457412Z } 2023-01-11T21:38:05.9457481Z } 2023-01-11T21:38:05.9457571Z ''') 2023-01-11T21:38:05.9457576Z 2023-01-11T21:38:05.9457581Z 2023-01-11T21:38:05.9457680Z async_compile.wait(globals()) 2023-01-11T21:38:05.9457761Z del async_compile 2023-01-11T21:38:05.9457766Z 2023-01-11T21:38:05.9457836Z def call(args): 2023-01-11T21:38:05.9457913Z arg0_1, = args 2023-01-11T21:38:05.9457990Z args.clear() 2023-01-11T21:38:05.9458194Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458391Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458582Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458778Z buf3 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9458967Z buf4 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459147Z buf5 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459334Z buf6 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9459617Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:05.9459692Z del arg0_1 2023-01-11T21:38:05.9459812Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:05.9459817Z 2023-01-11T21:38:05.9459822Z 2023-01-11T21:38:05.9459905Z if __name__ == "__main__": 2023-01-11T21:38:05.9460027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9460156Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9460399Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9460514Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9460519Z 2023-01-11T21:38:05.9460589Z ok (1.860s) 2023-01-11T21:38:05.9461103Z test_inplace_add_cpu (__main__.CpuTests) ... 
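The TypedStorage deprecation warnings repeated throughout this suite all come from one harness line, `buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()`. A minimal sketch of the migration the warning suggests, assuming a PyTorch build where Tensor.untyped_storage() is available; note the unit change, since an untyped storage reports its size in bytes rather than elements:

    import torch

    x = torch.randn(4, 4)
    # Deprecated: x.storage().size() counts elements of a TypedStorage.
    # untyped_storage().size() counts bytes, so divide by the element size.
    n = x.untyped_storage().size() // x.element_size()
    buffer = torch.as_strided(x, (n,), (1,), 0).clone()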
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9461182Z warnings.warn( 2023-01-11T21:38:05.9461440Z [2023-01-11 21:27:42,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 139 2023-01-11T21:38:05.9461708Z [2023-01-11 21:27:44,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 139 2023-01-11T21:38:05.9461715Z 2023-01-11T21:38:05.9461812Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9461889Z import torch 2023-01-11T21:38:05.9461957Z import random 2023-01-11T21:38:05.9462077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9462201Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9462206Z 2023-01-11T21:38:05.9462289Z aten = torch.ops.aten 2023-01-11T21:38:05.9462427Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9462523Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9462528Z 2023-01-11T21:38:05.9462602Z import triton 2023-01-11T21:38:05.9462688Z import triton.language as tl 2023-01-11T21:38:05.9462811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9462981Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9462986Z 2023-01-11T21:38:05.9462992Z 2023-01-11T21:38:05.9463132Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9463342Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9463471Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9463583Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9463689Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9463750Z { 2023-01-11T21:38:05.9463853Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9463921Z { 2023-01-11T21:38:05.9464005Z #pragma omp for 2023-01-11T21:38:05.9464098Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9464169Z { 2023-01-11T21:38:05.9464312Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9464457Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9464543Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9464641Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9464710Z } 2023-01-11T21:38:05.9464814Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9464910Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9464978Z { 2023-01-11T21:38:05.9465066Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9465156Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9465247Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9465336Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9465405Z } 2023-01-11T21:38:05.9465472Z } 2023-01-11T21:38:05.9465539Z } 2023-01-11T21:38:05.9465619Z ''') 2023-01-11T21:38:05.9465624Z 2023-01-11T21:38:05.9465635Z 2023-01-11T21:38:05.9465724Z async_compile.wait(globals()) 2023-01-11T21:38:05.9465803Z del async_compile 2023-01-11T21:38:05.9465810Z 2023-01-11T21:38:05.9465886Z def call(args): 2023-01-11T21:38:05.9465973Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9466051Z args.clear() 2023-01-11T21:38:05.9466221Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9466297Z del arg1_1 2023-01-11T21:38:05.9466400Z return (arg0_1, ) 2023-01-11T21:38:05.9466406Z 
2023-01-11T21:38:05.9466410Z 2023-01-11T21:38:05.9466496Z if __name__ == "__main__": 2023-01-11T21:38:05.9466616Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9466744Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9466943Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9467138Z arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9467261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9467266Z 2023-01-11T21:38:05.9467340Z ok (1.750s) 2023-01-11T21:38:05.9467681Z test_inplace_mixed_dtype_ops_cpu (__main__.CpuTests) ... [2023-01-11 21:27:44,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 140 2023-01-11T21:38:05.9467949Z [2023-01-11 21:27:46,266] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 140 2023-01-11T21:38:05.9467957Z 2023-01-11T21:38:05.9468056Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9468133Z import torch 2023-01-11T21:38:05.9468211Z import random 2023-01-11T21:38:05.9468336Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9468462Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9468467Z 2023-01-11T21:38:05.9468554Z aten = torch.ops.aten 2023-01-11T21:38:05.9468685Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9468783Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9468788Z 2023-01-11T21:38:05.9468865Z import triton 2023-01-11T21:38:05.9468988Z import triton.language as tl 2023-01-11T21:38:05.9469114Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9469253Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9469259Z 2023-01-11T21:38:05.9469263Z 2023-01-11T21:38:05.9469400Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9469610Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9469723Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9469832Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9469944Z const double* __restrict__ in_ptr1) 2023-01-11T21:38:05.9470009Z { 2023-01-11T21:38:05.9470111Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9470176Z { 2023-01-11T21:38:05.9470258Z #pragma omp for 2023-01-11T21:38:05.9470337Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9470403Z { 2023-01-11T21:38:05.9470475Z { 2023-01-11T21:38:05.9470544Z { 2023-01-11T21:38:05.9470643Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9470744Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9470857Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:05.9470950Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:05.9471064Z auto tmp4 = static_cast<double>(tmp3); 2023-01-11T21:38:05.9471161Z auto tmp5 = tmp4 + tmp1; 2023-01-11T21:38:05.9471272Z auto tmp6 = static_cast<float>(tmp5); 2023-01-11T21:38:05.9471385Z auto tmp7 = static_cast<double>(tmp6); 2023-01-11T21:38:05.9471480Z auto tmp8 = tmp7 * tmp1; 2023-01-11T21:38:05.9471588Z auto tmp9 = static_cast<float>(tmp8); 2023-01-11T21:38:05.9471676Z in_out_ptr0[i0] = tmp9; 2023-01-11T21:38:05.9471746Z } 2023-01-11T21:38:05.9471816Z } 2023-01-11T21:38:05.9471882Z } 2023-01-11T21:38:05.9471948Z } 2023-01-11T21:38:05.9472013Z } 2023-01-11T21:38:05.9472090Z ''') 2023-01-11T21:38:05.9472104Z 2023-01-11T21:38:05.9472108Z 2023-01-11T21:38:05.9472194Z async_compile.wait(globals()) 
2023-01-11T21:38:05.9472269Z del async_compile 2023-01-11T21:38:05.9472275Z 2023-01-11T21:38:05.9472374Z def call(args): 2023-01-11T21:38:05.9472454Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9472529Z args.clear() 2023-01-11T21:38:05.9472727Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9472817Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9472977Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr())) 2023-01-11T21:38:05.9473050Z del arg0_1 2023-01-11T21:38:05.9473120Z del arg1_1 2023-01-11T21:38:05.9473195Z return (buf1, ) 2023-01-11T21:38:05.9473201Z 2023-01-11T21:38:05.9473205Z 2023-01-11T21:38:05.9473289Z if __name__ == "__main__": 2023-01-11T21:38:05.9473409Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9473537Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9473731Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9473920Z arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9474038Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9474045Z 2023-01-11T21:38:05.9474115Z ok (2.029s) 2023-01-11T21:38:05.9474445Z test_input_mutation1_cpu (__main__.CpuTests) ... [2023-01-11 21:27:46,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 141 2023-01-11T21:38:05.9474657Z [2023-01-11 21:27:46,318] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:05.9474861Z [2023-01-11 21:27:46,320] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:05.9475122Z [2023-01-11 21:27:48,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 141 2023-01-11T21:38:05.9475160Z 2023-01-11T21:38:05.9475260Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9475330Z import torch 2023-01-11T21:38:05.9475404Z import random 2023-01-11T21:38:05.9475522Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9475646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9475651Z 2023-01-11T21:38:05.9475733Z aten = torch.ops.aten 2023-01-11T21:38:05.9475868Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9475964Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9475971Z 2023-01-11T21:38:05.9476045Z import triton 2023-01-11T21:38:05.9476130Z import triton.language as tl 2023-01-11T21:38:05.9476254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9476394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9476399Z 2023-01-11T21:38:05.9476407Z 2023-01-11T21:38:05.9476547Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9476751Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9476874Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9476982Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9477084Z float* __restrict__ out_ptr2) 2023-01-11T21:38:05.9477142Z { 2023-01-11T21:38:05.9477244Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9477311Z { 2023-01-11T21:38:05.9477395Z #pragma omp for 2023-01-11T21:38:05.9477481Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9477548Z { 2023-01-11T21:38:05.9477689Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9477821Z auto tmp1 = 
at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9477911Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9478002Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:05.9478136Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9478226Z auto tmp5 = tmp2 + tmp4; 2023-01-11T21:38:05.9478315Z auto tmp6 = tmp3 / tmp5; 2023-01-11T21:38:05.9478440Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9478529Z tmp6.store(out_ptr2 + 8*i0); 2023-01-11T21:38:05.9478597Z } 2023-01-11T21:38:05.9478699Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9478786Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9478852Z { 2023-01-11T21:38:05.9478942Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9479047Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9479128Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9479218Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:05.9479323Z auto tmp4 = static_cast<float>(2); 2023-01-11T21:38:05.9479415Z auto tmp5 = tmp2 + tmp4; 2023-01-11T21:38:05.9479504Z auto tmp6 = tmp3 / tmp5; 2023-01-11T21:38:05.9479589Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9479673Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:05.9479733Z } 2023-01-11T21:38:05.9479799Z } 2023-01-11T21:38:05.9479864Z } 2023-01-11T21:38:05.9479954Z ''') 2023-01-11T21:38:05.9479960Z 2023-01-11T21:38:05.9479964Z 2023-01-11T21:38:05.9480055Z async_compile.wait(globals()) 2023-01-11T21:38:05.9480132Z del async_compile 2023-01-11T21:38:05.9480139Z 2023-01-11T21:38:05.9480216Z def call(args): 2023-01-11T21:38:05.9480283Z arg0_1, = args 2023-01-11T21:38:05.9480357Z args.clear() 2023-01-11T21:38:05.9480552Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9480722Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:05.9480795Z del arg0_1 2023-01-11T21:38:05.9480937Z return (buf2, ) 2023-01-11T21:38:05.9480942Z 2023-01-11T21:38:05.9480946Z 2023-01-11T21:38:05.9481025Z if __name__ == "__main__": 2023-01-11T21:38:05.9481143Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9481263Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9481461Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9481574Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9481579Z 2023-01-11T21:38:05.9481650Z ok (1.991s) 2023-01-11T21:38:05.9481980Z test_input_mutation2_cpu (__main__.CpuTests) ... 
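The test_input_mutation* dumps above and below show how Inductor handles graphs that mutate their inputs: the generated call() passes arg0_1.data_ptr() as both an input and an output pointer, so the mutation lands directly in the caller's buffer. A minimal sketch of the pattern these tests exercise; the function body is an illustrative assumption, not the test's exact source:

    import torch

    def fn(x):
        x.add_(1)        # in-place mutation of a graph input
        return x * x / (x + 2)

    compiled = torch.compile(fn)   # TorchDynamo + Inductor
    x = torch.randn(64)
    out = compiled(x)              # x itself is updated, matching eager semantics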
[2023-01-11 21:27:48,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 142 2023-01-11T21:38:05.9482202Z [2023-01-11 21:27:48,360] torch._inductor.graph: [WARNING] Creating implicit fallback for: 2023-01-11T21:38:05.9482301Z target: aten.expand_copy.default 2023-01-11T21:38:05.9482397Z args[0]: TensorBox(StorageBox( 2023-01-11T21:38:05.9482470Z Pointwise( 2023-01-11T21:38:05.9482564Z 'cpu', 2023-01-11T21:38:05.9482643Z torch.float32, 2023-01-11T21:38:05.9482746Z tmp0 = constant(66.0, torch.float32) 2023-01-11T21:38:05.9482820Z return tmp0 2023-01-11T21:38:05.9482886Z , 2023-01-11T21:38:05.9482952Z ranges=[1], 2023-01-11T21:38:05.9483070Z origins={lift_fresh_copy, _tensor_constant0} 2023-01-11T21:38:05.9483136Z ) 2023-01-11T21:38:05.9483200Z )) 2023-01-11T21:38:05.9483272Z args[1]: [64] 2023-01-11T21:38:05.9483539Z [2023-01-11 21:27:48,370] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.aten.expand_copy.default 2023-01-11T21:38:05.9483799Z [2023-01-11 21:27:50,445] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 142 2023-01-11T21:38:05.9483805Z 2023-01-11T21:38:05.9483902Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9483969Z import torch 2023-01-11T21:38:05.9484041Z import random 2023-01-11T21:38:05.9484161Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9484287Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9484292Z 2023-01-11T21:38:05.9484375Z aten = torch.ops.aten 2023-01-11T21:38:05.9484516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9484613Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9484647Z 2023-01-11T21:38:05.9484715Z import triton 2023-01-11T21:38:05.9484812Z import triton.language as tl 2023-01-11T21:38:05.9484938Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9485078Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9485084Z 2023-01-11T21:38:05.9485088Z 2023-01-11T21:38:05.9485224Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9485427Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9485552Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9485658Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9485757Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9485823Z { 2023-01-11T21:38:05.9485926Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9485995Z { 2023-01-11T21:38:05.9486077Z #pragma omp for 2023-01-11T21:38:05.9486167Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9486234Z { 2023-01-11T21:38:05.9486365Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9486501Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9486591Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9486689Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9486757Z } 2023-01-11T21:38:05.9486856Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9486943Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9487002Z { 2023-01-11T21:38:05.9487090Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9487225Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9487314Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9487399Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9487467Z } 2023-01-11T21:38:05.9487554Z #pragma omp single 2023-01-11T21:38:05.9487614Z { 2023-01-11T21:38:05.9487682Z { 
2023-01-11T21:38:05.9487749Z { 2023-01-11T21:38:05.9487862Z auto tmp0 = static_cast<float>(66.0); 2023-01-11T21:38:05.9487954Z out_ptr1[0] = tmp0; 2023-01-11T21:38:05.9488026Z } 2023-01-11T21:38:05.9488093Z } 2023-01-11T21:38:05.9488152Z } 2023-01-11T21:38:05.9488218Z } 2023-01-11T21:38:05.9488282Z } 2023-01-11T21:38:05.9488367Z ''') 2023-01-11T21:38:05.9488372Z 2023-01-11T21:38:05.9488377Z 2023-01-11T21:38:05.9488512Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.9488717Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9488844Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9488940Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9489005Z { 2023-01-11T21:38:05.9489105Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9489172Z { 2023-01-11T21:38:05.9489256Z #pragma omp for 2023-01-11T21:38:05.9489343Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9489410Z { 2023-01-11T21:38:05.9489540Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9489678Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9489771Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9489866Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9489933Z } 2023-01-11T21:38:05.9490033Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9490122Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9490185Z { 2023-01-11T21:38:05.9490274Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9490376Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.9490466Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9490551Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9490647Z } 2023-01-11T21:38:05.9490715Z } 2023-01-11T21:38:05.9490772Z } 2023-01-11T21:38:05.9490858Z ''') 2023-01-11T21:38:05.9490864Z 2023-01-11T21:38:05.9490868Z 2023-01-11T21:38:05.9490960Z async_compile.wait(globals()) 2023-01-11T21:38:05.9491037Z del async_compile 2023-01-11T21:38:05.9491042Z 2023-01-11T21:38:05.9491116Z def call(args): 2023-01-11T21:38:05.9491197Z primals_1, = args 2023-01-11T21:38:05.9491273Z args.clear() 2023-01-11T21:38:05.9491465Z buf0 = empty_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9491654Z buf1 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9491831Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9491910Z del primals_1 2023-01-11T21:38:05.9492041Z buf2 = torch.ops.aten.expand_copy.default(buf1, [64]) 2023-01-11T21:38:05.9492111Z del buf1 2023-01-11T21:38:05.9492183Z buf3 = buf2 2023-01-11T21:38:05.9492279Z assert_size_stride(buf3, (64, ), (1, )) 2023-01-11T21:38:05.9492349Z del buf2 2023-01-11T21:38:05.9492546Z buf4 = empty_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9492684Z kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.9492800Z return (as_strided(buf3, (1, 64), (64, 1)), buf0, buf4, ) 2023-01-11T21:38:05.9492805Z 2023-01-11T21:38:05.9492809Z 2023-01-11T21:38:05.9492889Z if __name__ == "__main__": 2023-01-11T21:38:05.9493007Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9493128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9493354Z primals_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9493471Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:05.9493476Z 2023-01-11T21:38:05.9493547Z ok 
(2.182s) 2023-01-11T21:38:05.9493881Z test_input_mutation3_cpu (__main__.CpuTests) ... [2023-01-11 21:27:50,496] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 143 2023-01-11T21:38:05.9494147Z [2023-01-11 21:27:52,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 143 2023-01-11T21:38:05.9494153Z 2023-01-11T21:38:05.9494249Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9494325Z import torch 2023-01-11T21:38:05.9494400Z import random 2023-01-11T21:38:05.9494635Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9494761Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9494767Z 2023-01-11T21:38:05.9494852Z aten = torch.ops.aten 2023-01-11T21:38:05.9494991Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9495087Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9495092Z 2023-01-11T21:38:05.9495168Z import triton 2023-01-11T21:38:05.9495261Z import triton.language as tl 2023-01-11T21:38:05.9495390Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9495523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9495529Z 2023-01-11T21:38:05.9495540Z 2023-01-11T21:38:05.9495671Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9495875Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9495998Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9496102Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9496166Z { 2023-01-11T21:38:05.9496266Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9496332Z { 2023-01-11T21:38:05.9496409Z #pragma omp for 2023-01-11T21:38:05.9496496Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9496562Z { 2023-01-11T21:38:05.9496706Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9496886Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:05.9496979Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9497074Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9497187Z } 2023-01-11T21:38:05.9497302Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9497402Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9497485Z { 2023-01-11T21:38:05.9497591Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9497708Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:05.9497812Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9497904Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9498078Z } 2023-01-11T21:38:05.9498167Z #pragma omp for 2023-01-11T21:38:05.9498253Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9498325Z { 2023-01-11T21:38:05.9498463Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9498604Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9498686Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9498779Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9498846Z } 2023-01-11T21:38:05.9498947Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9499033Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9499102Z { 2023-01-11T21:38:05.9499191Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9499288Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:05.9499377Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9499464Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9499567Z } 2023-01-11T21:38:05.9499650Z #pragma omp for 2023-01-11T21:38:05.9499736Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9499801Z { 
2023-01-11T21:38:05.9499932Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9500076Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:05.9500173Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9500239Z } 2023-01-11T21:38:05.9500337Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9500425Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9500491Z { 2023-01-11T21:38:05.9500575Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9500716Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:05.9500804Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:05.9500891Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9500958Z } 2023-01-11T21:38:05.9501039Z #pragma omp for 2023-01-11T21:38:05.9501128Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9501188Z { 2023-01-11T21:38:05.9501325Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9501462Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3)); 2023-01-11T21:38:05.9501554Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9501648Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9501717Z } 2023-01-11T21:38:05.9501815Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9501894Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9501963Z { 2023-01-11T21:38:05.9502055Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9502159Z auto tmp1 = static_cast<float>(3); 2023-01-11T21:38:05.9502246Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9502332Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9502400Z } 2023-01-11T21:38:05.9502474Z #pragma omp for 2023-01-11T21:38:05.9502562Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9502631Z { 2023-01-11T21:38:05.9502769Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9502904Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(4)); 2023-01-11T21:38:05.9503033Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9503128Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9503188Z } 2023-01-11T21:38:05.9503286Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9503376Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9503443Z { 2023-01-11T21:38:05.9503533Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9503638Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:05.9503730Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9503808Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9503873Z } 2023-01-11T21:38:05.9503955Z #pragma omp for 2023-01-11T21:38:05.9504043Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9504109Z { 2023-01-11T21:38:05.9504245Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9504380Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9504469Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9504536Z } 2023-01-11T21:38:05.9504634Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9504722Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9504787Z { 2023-01-11T21:38:05.9504878Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9504971Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9505049Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9505114Z } 2023-01-11T21:38:05.9505179Z } 2023-01-11T21:38:05.9505243Z } 2023-01-11T21:38:05.9505330Z ''') 2023-01-11T21:38:05.9505336Z 2023-01-11T21:38:05.9505340Z 2023-01-11T21:38:05.9505473Z async_compile.wait(globals()) 2023-01-11T21:38:05.9505549Z del async_compile 2023-01-11T21:38:05.9505555Z 2023-01-11T21:38:05.9505622Z def call(args): 2023-01-11T21:38:05.9505700Z arg0_1, = args 2023-01-11T21:38:05.9505776Z args.clear() 2023-01-11T21:38:05.9505912Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9506017Z return (as_strided(arg0_1, (64, ), (1, )), ) 2023-01-11T21:38:05.9506022Z 2023-01-11T21:38:05.9506026Z 2023-01-11T21:38:05.9506107Z if __name__ == "__main__": 2023-01-11T21:38:05.9506225Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9506351Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9506546Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9506659Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9506664Z 2023-01-11T21:38:05.9506736Z ok (2.289s) 2023-01-11T21:38:05.9507065Z test_input_mutation4_cpu (__main__.CpuTests) ... [2023-01-11 21:27:52,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 144 2023-01-11T21:38:05.9507338Z [2023-01-11 21:27:54,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 144 2023-01-11T21:38:05.9507343Z 2023-01-11T21:38:05.9507446Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9507522Z import torch 2023-01-11T21:38:05.9507599Z import random 2023-01-11T21:38:05.9507711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9507835Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9507840Z 2023-01-11T21:38:05.9507922Z aten = torch.ops.aten 2023-01-11T21:38:05.9508060Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9508159Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9508164Z 2023-01-11T21:38:05.9508237Z import triton 2023-01-11T21:38:05.9508328Z import triton.language as tl 2023-01-11T21:38:05.9508446Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9508587Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9508592Z 2023-01-11T21:38:05.9508596Z 2023-01-11T21:38:05.9508735Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9508999Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9509123Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9509229Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9509295Z { 2023-01-11T21:38:05.9509394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9509453Z { 2023-01-11T21:38:05.9509534Z #pragma omp for 2023-01-11T21:38:05.9509620Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9509685Z { 2023-01-11T21:38:05.9509823Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9509957Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9510058Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9510117Z } 2023-01-11T21:38:05.9510217Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9510302Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9510368Z { 2023-01-11T21:38:05.9510461Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:05.9510553Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9510639Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9510699Z } 2023-01-11T21:38:05.9510763Z } 2023-01-11T21:38:05.9510827Z } 2023-01-11T21:38:05.9510910Z ''') 2023-01-11T21:38:05.9510916Z 2023-01-11T21:38:05.9510920Z 2023-01-11T21:38:05.9511012Z async_compile.wait(globals()) 2023-01-11T21:38:05.9511089Z del async_compile 2023-01-11T21:38:05.9511094Z 2023-01-11T21:38:05.9511174Z def call(args): 2023-01-11T21:38:05.9511240Z arg0_1, = args 2023-01-11T21:38:05.9511318Z 
args.clear() 2023-01-11T21:38:05.9511487Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg0_1.data_ptr())) 2023-01-11T21:38:05.9511566Z return (arg0_1, ) 2023-01-11T21:38:05.9511571Z 2023-01-11T21:38:05.9511575Z 2023-01-11T21:38:05.9511654Z if __name__ == "__main__": 2023-01-11T21:38:05.9511773Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9511903Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9512102Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9512207Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9512212Z 2023-01-11T21:38:05.9512288Z ok (1.809s) 2023-01-11T21:38:05.9512759Z test_invalid_operand_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9512895Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9513154Z [2023-01-11 21:27:54,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 145 2023-01-11T21:38:05.9513419Z [2023-01-11 21:27:56,648] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 145 2023-01-11T21:38:05.9513425Z 2023-01-11T21:38:05.9513524Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9513597Z import torch 2023-01-11T21:38:05.9513672Z import random 2023-01-11T21:38:05.9513785Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9513909Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9513914Z 2023-01-11T21:38:05.9513995Z aten = torch.ops.aten 2023-01-11T21:38:05.9514130Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9514225Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9514233Z 2023-01-11T21:38:05.9514306Z import triton 2023-01-11T21:38:05.9514399Z import triton.language as tl 2023-01-11T21:38:05.9514524Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9514658Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9514664Z 2023-01-11T21:38:05.9514702Z 2023-01-11T21:38:05.9514833Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9515038Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9515161Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9515269Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:05.9515376Z const long* __restrict__ in_ptr2, 2023-01-11T21:38:05.9515486Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:05.9515589Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9515650Z { 2023-01-11T21:38:05.9515751Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9515816Z { 2023-01-11T21:38:05.9515897Z #pragma omp for 2023-01-11T21:38:05.9515983Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9516050Z { 2023-01-11T21:38:05.9516135Z #pragma GCC ivdep 2023-01-11T21:38:05.9516222Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:05.9516290Z { 2023-01-11T21:38:05.9516377Z #pragma GCC ivdep 2023-01-11T21:38:05.9516472Z for(long i2=0; i2<768; i2+=1) 2023-01-11T21:38:05.9516541Z { 2023-01-11T21:38:05.9516611Z { 
2023-01-11T21:38:05.9516678Z { 2023-01-11T21:38:05.9516780Z auto tmp3 = in_ptr0[i0]; 2023-01-11T21:38:05.9516891Z auto tmp8 = in_ptr2[i1 + (128*i0)]; 2023-01-11T21:38:05.9517004Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:05.9517141Z auto tmp1 = static_cast<long>(0); 2023-01-11T21:38:05.9517242Z auto tmp2 = tmp0 == tmp1; 2023-01-11T21:38:05.9517352Z auto tmp4 = static_cast<long>(1); 2023-01-11T21:38:05.9517456Z auto tmp5 = tmp0 >= tmp4; 2023-01-11T21:38:05.9517541Z auto tmp6 = 0; 2023-01-11T21:38:05.9517624Z if(tmp5) 2023-01-11T21:38:05.9517698Z { 2023-01-11T21:38:05.9517877Z auto tmp7 = in_ptr1[(-1) + i1 + (127*i0)]; 2023-01-11T21:38:05.9517970Z tmp6 = tmp7; 2023-01-11T21:38:05.9518050Z } 2023-01-11T21:38:05.9518160Z auto tmp9 = tmp5 ? tmp6 : tmp8; 2023-01-11T21:38:05.9518261Z auto tmp10 = tmp2 ? tmp3 : tmp9; 2023-01-11T21:38:05.9518375Z auto tmp11 = in_ptr3[i2 + (768*tmp10)]; 2023-01-11T21:38:05.9518492Z out_ptr0[i2 + (768*i1) + (98304*i0)] = tmp11; 2023-01-11T21:38:05.9518565Z } 2023-01-11T21:38:05.9518635Z } 2023-01-11T21:38:05.9518703Z } 2023-01-11T21:38:05.9518770Z } 2023-01-11T21:38:05.9518829Z } 2023-01-11T21:38:05.9518898Z } 2023-01-11T21:38:05.9518965Z } 2023-01-11T21:38:05.9519048Z ''') 2023-01-11T21:38:05.9519053Z 2023-01-11T21:38:05.9519057Z 2023-01-11T21:38:05.9519151Z async_compile.wait(globals()) 2023-01-11T21:38:05.9519225Z del async_compile 2023-01-11T21:38:05.9519230Z 2023-01-11T21:38:05.9519304Z def call(args): 2023-01-11T21:38:05.9519401Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:05.9519477Z args.clear() 2023-01-11T21:38:05.9519692Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9519910Z kernel_cpp_0(c_void_p(arg3_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9519987Z del arg0_1 2023-01-11T21:38:05.9520059Z del arg2_1 2023-01-11T21:38:05.9520131Z del arg3_1 2023-01-11T21:38:05.9520195Z del arg4_1 2023-01-11T21:38:05.9520271Z return (buf0, ) 2023-01-11T21:38:05.9520277Z 2023-01-11T21:38:05.9520310Z 2023-01-11T21:38:05.9520392Z if __name__ == "__main__": 2023-01-11T21:38:05.9520513Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9520642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9520850Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9521050Z arg1_1 = rand_strided((8, 128), (128, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521247Z arg2_1 = rand_strided((8, 127), (127, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521428Z arg3_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521620Z arg4_1 = rand_strided((8, 128), (128, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9521761Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:05.9521767Z 2023-01-11T21:38:05.9521837Z ok (2.607s) 2023-01-11T21:38:05.9522287Z test_isinf2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9522419Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9522678Z [2023-01-11 21:27:57,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 146 2023-01-11T21:38:05.9522941Z [2023-01-11 21:27:58,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 146 2023-01-11T21:38:05.9522975Z 2023-01-11T21:38:05.9523073Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9523148Z import torch 2023-01-11T21:38:05.9523215Z import random 2023-01-11T21:38:05.9523336Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9523461Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9523466Z 2023-01-11T21:38:05.9523549Z aten = torch.ops.aten 2023-01-11T21:38:05.9523685Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9523779Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9523785Z 2023-01-11T21:38:05.9523858Z import triton 2023-01-11T21:38:05.9523944Z import triton.language as tl 2023-01-11T21:38:05.9524068Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9524208Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9524213Z 2023-01-11T21:38:05.9524221Z 2023-01-11T21:38:05.9524363Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9524568Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9524691Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9524797Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.9524865Z { 2023-01-11T21:38:05.9524939Z #pragma GCC ivdep 2023-01-11T21:38:05.9525024Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9525089Z { 2023-01-11T21:38:05.9525155Z { 2023-01-11T21:38:05.9525224Z { 2023-01-11T21:38:05.9525316Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9525435Z auto tmp1 = static_cast<long>(i0); 2023-01-11T21:38:05.9525541Z auto tmp2 = static_cast<long>(2); 2023-01-11T21:38:05.9525651Z auto tmp3 = tmp1 < tmp2; 2023-01-11T21:38:05.9525757Z auto tmp4 = static_cast<long>(1); 2023-01-11T21:38:05.9525850Z auto tmp5 = tmp1 < tmp4; 2023-01-11T21:38:05.9525960Z auto tmp6 = static_cast<float>(1.0); 2023-01-11T21:38:05.9526088Z auto tmp7 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:05.9526191Z auto tmp8 = tmp5 ? tmp6 : tmp7; 2023-01-11T21:38:05.9526312Z auto tmp9 = static_cast<long>(3); 2023-01-11T21:38:05.9526407Z auto tmp10 = tmp1 < tmp9; 2023-01-11T21:38:05.9526515Z auto tmp11 = static_cast<float>(2.0); 2023-01-11T21:38:05.9526620Z auto tmp12 = static_cast<long>(4); 2023-01-11T21:38:05.9526717Z auto tmp13 = tmp1 < tmp12; 2023-01-11T21:38:05.9526932Z auto tmp14 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:05.9527062Z auto tmp15 = std::numeric_limits<float>::quiet_NaN(); 2023-01-11T21:38:05.9527165Z auto tmp16 = tmp13 ? tmp14 : tmp15; 2023-01-11T21:38:05.9527259Z auto tmp17 = tmp10 ? tmp11 : tmp16; 2023-01-11T21:38:05.9527363Z auto tmp18 = tmp3 ? 
tmp8 : tmp17; 2023-01-11T21:38:05.9527458Z auto tmp19 = tmp0 == tmp18; 2023-01-11T21:38:05.9527547Z out_ptr0[i0] = tmp19; 2023-01-11T21:38:05.9527616Z } 2023-01-11T21:38:05.9527683Z } 2023-01-11T21:38:05.9527749Z } 2023-01-11T21:38:05.9527808Z } 2023-01-11T21:38:05.9527894Z ''') 2023-01-11T21:38:05.9527899Z 2023-01-11T21:38:05.9527903Z 2023-01-11T21:38:05.9527997Z async_compile.wait(globals()) 2023-01-11T21:38:05.9528073Z del async_compile 2023-01-11T21:38:05.9528077Z 2023-01-11T21:38:05.9528152Z def call(args): 2023-01-11T21:38:05.9528226Z arg0_1, = args 2023-01-11T21:38:05.9528301Z args.clear() 2023-01-11T21:38:05.9528480Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9528618Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9528691Z del arg0_1 2023-01-11T21:38:05.9528797Z return (buf0, ) 2023-01-11T21:38:05.9528802Z 2023-01-11T21:38:05.9528806Z 2023-01-11T21:38:05.9528886Z if __name__ == "__main__": 2023-01-11T21:38:05.9529007Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9529134Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9529329Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9529435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9529440Z 2023-01-11T21:38:05.9529510Z ok (1.719s) 2023-01-11T21:38:05.9529960Z test_isinf_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9530089Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9530352Z [2023-01-11 21:27:58,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 147 2023-01-11T21:38:05.9530612Z [2023-01-11 21:28:00,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 147 2023-01-11T21:38:05.9531032Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9531164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9531416Z [2023-01-11 21:28:00,683] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 148 2023-01-11T21:38:05.9531676Z [2023-01-11 21:28:02,781] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 148 2023-01-11T21:38:05.9531685Z 2023-01-11T21:38:05.9531782Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9531849Z import torch 2023-01-11T21:38:05.9531922Z import random 2023-01-11T21:38:05.9532040Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9532192Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9532197Z 2023-01-11T21:38:05.9532284Z aten = torch.ops.aten 2023-01-11T21:38:05.9532420Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9532516Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9532521Z 2023-01-11T21:38:05.9532588Z import triton 2023-01-11T21:38:05.9532682Z import triton.language as tl 2023-01-11T21:38:05.9532810Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9532951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9532957Z 2023-01-11T21:38:05.9532965Z 2023-01-11T21:38:05.9533103Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9533316Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9533439Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9533544Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.9533637Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.9533702Z { 2023-01-11T21:38:05.9533782Z #pragma GCC ivdep 2023-01-11T21:38:05.9533868Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9533932Z { 2023-01-11T21:38:05.9534002Z { 2023-01-11T21:38:05.9534072Z { 2023-01-11T21:38:05.9534159Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9534265Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.9534367Z auto tmp2 = std::isnan(tmp0); 2023-01-11T21:38:05.9534456Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9534704Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9534772Z } 2023-01-11T21:38:05.9534838Z } 2023-01-11T21:38:05.9534896Z } 2023-01-11T21:38:05.9534963Z } 2023-01-11T21:38:05.9535048Z ''') 2023-01-11T21:38:05.9535053Z 2023-01-11T21:38:05.9535057Z 2023-01-11T21:38:05.9535151Z async_compile.wait(globals()) 2023-01-11T21:38:05.9535231Z del async_compile 2023-01-11T21:38:05.9535236Z 2023-01-11T21:38:05.9535310Z def call(args): 2023-01-11T21:38:05.9535382Z arg0_1, = args 2023-01-11T21:38:05.9535455Z args.clear() 2023-01-11T21:38:05.9535675Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9535875Z buf1 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9536043Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9536115Z del arg0_1 2023-01-11T21:38:05.9536197Z return (buf0, buf1, ) 2023-01-11T21:38:05.9536202Z 2023-01-11T21:38:05.9536210Z 2023-01-11T21:38:05.9536289Z if __name__ == "__main__": 2023-01-11T21:38:05.9536406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9536526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9536718Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.9536833Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9536839Z 2023-01-11T21:38:05.9536843Z 2023-01-11T21:38:05.9536940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9537016Z import torch 2023-01-11T21:38:05.9537089Z import random 2023-01-11T21:38:05.9537271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9537402Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9537407Z 2023-01-11T21:38:05.9537482Z aten = torch.ops.aten 2023-01-11T21:38:05.9537618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9537713Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9537721Z 2023-01-11T21:38:05.9537796Z import triton 2023-01-11T21:38:05.9537889Z import triton.language as tl 2023-01-11T21:38:05.9538014Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9538154Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9538159Z 2023-01-11T21:38:05.9538211Z 2023-01-11T21:38:05.9538354Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9538553Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9538680Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9538782Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:05.9538881Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:05.9538945Z { 2023-01-11T21:38:05.9539029Z #pragma GCC ivdep 2023-01-11T21:38:05.9539114Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:05.9539172Z { 2023-01-11T21:38:05.9539239Z { 2023-01-11T21:38:05.9539311Z { 2023-01-11T21:38:05.9539404Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9539509Z auto tmp1 = std::isinf(tmp0); 2023-01-11T21:38:05.9539610Z auto tmp2 = std::isnan(tmp0); 2023-01-11T21:38:05.9539696Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9539776Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:05.9539844Z } 2023-01-11T21:38:05.9539910Z } 2023-01-11T21:38:05.9539978Z } 2023-01-11T21:38:05.9540042Z } 2023-01-11T21:38:05.9540127Z ''') 2023-01-11T21:38:05.9540132Z 2023-01-11T21:38:05.9540136Z 2023-01-11T21:38:05.9540228Z async_compile.wait(globals()) 2023-01-11T21:38:05.9540298Z del async_compile 2023-01-11T21:38:05.9540303Z 2023-01-11T21:38:05.9540376Z def call(args): 2023-01-11T21:38:05.9540449Z arg0_1, = args 2023-01-11T21:38:05.9540525Z args.clear() 2023-01-11T21:38:05.9540715Z buf0 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9540932Z buf1 = empty_strided((5, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9541101Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9541166Z del arg0_1 2023-01-11T21:38:05.9541247Z return (buf0, buf1, ) 2023-01-11T21:38:05.9541252Z 2023-01-11T21:38:05.9541260Z 2023-01-11T21:38:05.9541338Z if __name__ == "__main__": 2023-01-11T21:38:05.9541456Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9541583Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9541776Z arg0_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9541889Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9541894Z 2023-01-11T21:38:05.9541964Z ok (3.905s) 2023-01-11T21:38:05.9542468Z test_kernel_names_cpu (__main__.CpuTests) ... 
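The recurring eval_frame.py warning (above, and again below) is advisory: on Ampere-or-newer GPUs, float32 matmuls can use TensorFloat32 tensor cores once the matmul precision is lowered. A minimal sketch of the opt-in the warning suggests; the precision trade-off noted in the comment is an assumption stated for illustration:

    import torch

    # 'high' lets float32 matmuls run on TF32 tensor cores (faster,
    # roughly 10 mantissa bits) instead of strict IEEE float32.
    torch.set_float32_matmul_precision('high')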
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9542554Z warnings.warn( 2023-01-11T21:38:05.9542813Z [2023-01-11 21:28:02,796] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 149 2023-01-11T21:38:05.9543078Z [2023-01-11 21:28:04,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 149 2023-01-11T21:38:05.9543083Z 2023-01-11T21:38:05.9543180Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9543253Z import torch 2023-01-11T21:38:05.9543327Z import random 2023-01-11T21:38:05.9543446Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9543563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9543576Z 2023-01-11T21:38:05.9543650Z aten = torch.ops.aten 2023-01-11T21:38:05.9543786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9543881Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9543889Z 2023-01-11T21:38:05.9543962Z import triton 2023-01-11T21:38:05.9544054Z import triton.language as tl 2023-01-11T21:38:05.9544183Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9544328Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9544333Z 2023-01-11T21:38:05.9544374Z 2023-01-11T21:38:05.9544515Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9544713Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9544839Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9544945Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9545012Z { 2023-01-11T21:38:05.9545113Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9545180Z { 2023-01-11T21:38:05.9545262Z #pragma omp for 2023-01-11T21:38:05.9545343Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:05.9545414Z { 2023-01-11T21:38:05.9545553Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9545690Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9545782Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9545878Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9545946Z } 2023-01-11T21:38:05.9546039Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9546126Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:05.9546191Z { 2023-01-11T21:38:05.9546280Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9546384Z auto tmp1 = static_cast(2); 2023-01-11T21:38:05.9546474Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9546559Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9546619Z } 2023-01-11T21:38:05.9546683Z } 2023-01-11T21:38:05.9546748Z } 2023-01-11T21:38:05.9546832Z ''') 2023-01-11T21:38:05.9546837Z 2023-01-11T21:38:05.9546877Z 2023-01-11T21:38:05.9546973Z async_compile.wait(globals()) 2023-01-11T21:38:05.9547051Z del async_compile 2023-01-11T21:38:05.9547056Z 2023-01-11T21:38:05.9547130Z def call(args): 2023-01-11T21:38:05.9547197Z arg0_1, = args 2023-01-11T21:38:05.9547271Z args.clear() 2023-01-11T21:38:05.9547466Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9547605Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9547677Z del arg0_1 2023-01-11T21:38:05.9547752Z return (buf0, ) 2023-01-11T21:38:05.9547757Z 
2023-01-11T21:38:05.9547762Z 2023-01-11T21:38:05.9547843Z if __name__ == "__main__": 2023-01-11T21:38:05.9547959Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9548079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9548271Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9548384Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9548393Z 2023-01-11T21:38:05.9548462Z ok (2.003s) 2023-01-11T21:38:05.9548623Z test_kwargs_cpu (__main__.CpuTests) ... skip: histogramdd only supports cpu (0.001s) 2023-01-11T21:38:05.9549077Z test_l1_loss_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9549209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9549467Z [2023-01-11 21:28:04,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 150 2023-01-11T21:38:05.9549730Z [2023-01-11 21:28:06,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 150 2023-01-11T21:38:05.9549738Z 2023-01-11T21:38:05.9549828Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9549902Z import torch 2023-01-11T21:38:05.9549978Z import random 2023-01-11T21:38:05.9550097Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9550218Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9550252Z 2023-01-11T21:38:05.9550337Z aten = torch.ops.aten 2023-01-11T21:38:05.9550471Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9550559Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9550570Z 2023-01-11T21:38:05.9550637Z import triton 2023-01-11T21:38:05.9550729Z import triton.language as tl 2023-01-11T21:38:05.9550853Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9550990Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9550996Z 2023-01-11T21:38:05.9551001Z 2023-01-11T21:38:05.9551137Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9551346Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9551466Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9551572Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9551677Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9551787Z const float* __restrict__ in_ptr1) 2023-01-11T21:38:05.9551853Z { 2023-01-11T21:38:05.9551944Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.9552035Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.9552101Z { 2023-01-11T21:38:05.9552290Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9552363Z float tmp4 = 0; 2023-01-11T21:38:05.9552487Z auto tmp4_vec = at::vec::Vectorized<float>(tmp4); 2023-01-11T21:38:05.9552566Z float tmp6 = 0; 2023-01-11T21:38:05.9552688Z auto tmp6_vec = at::vec::Vectorized<float>(tmp6); 2023-01-11T21:38:05.9552825Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9552892Z { 2023-01-11T21:38:05.9553030Z #pragma omp for reduction(+:tmp4_vec) reduction(+:tmp6_vec) 2023-01-11T21:38:05.9553115Z for(long i0=0; i0<192; i0+=1) 2023-01-11T21:38:05.9553187Z { 2023-01-11T21:38:05.9553329Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9553468Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9553603Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9553696Z auto tmp3 = tmp2.abs(); 2023-01-11T21:38:05.9553786Z auto tmp5 = tmp2 * tmp2; 2023-01-11T21:38:05.9553873Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9553949Z tmp6_vec += tmp5; 2023-01-11T21:38:05.9554017Z } 2023-01-11T21:38:05.9554218Z tmp4 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9554419Z tmp6 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:05.9554570Z #pragma omp for simd simdlen(4) reduction(+:tmp4) reduction(+:tmp6) 2023-01-11T21:38:05.9554665Z for(long i0=1536; i0<1536; i0+=1) 2023-01-11T21:38:05.9554732Z { 2023-01-11T21:38:05.9554821Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9554906Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9555037Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9555140Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9555229Z auto tmp5 = tmp2 * tmp2; 2023-01-11T21:38:05.9555309Z tmp4 += tmp3; 2023-01-11T21:38:05.9555389Z tmp6 += tmp5; 2023-01-11T21:38:05.9555454Z } 2023-01-11T21:38:05.9555516Z } 2023-01-11T21:38:05.9555594Z out_ptr0[0] = tmp4; 2023-01-11T21:38:05.9555672Z out_ptr1[0] = tmp6; 2023-01-11T21:38:05.9555738Z } 2023-01-11T21:38:05.9555802Z { 2023-01-11T21:38:05.9555868Z { 2023-01-11T21:38:05.9555952Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:05.9556086Z auto tmp1 = static_cast<float>(1536); 2023-01-11T21:38:05.9556180Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9556267Z in_out_ptr0[0] = tmp2; 2023-01-11T21:38:05.9556334Z } 2023-01-11T21:38:05.9556397Z } 2023-01-11T21:38:05.9556462Z { 2023-01-11T21:38:05.9556522Z { 2023-01-11T21:38:05.9556611Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:05.9556717Z auto tmp1 = static_cast<float>(1536); 2023-01-11T21:38:05.9556806Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9556890Z in_out_ptr1[0] = tmp2; 2023-01-11T21:38:05.9556955Z } 2023-01-11T21:38:05.9557025Z } 2023-01-11T21:38:05.9557083Z } 2023-01-11T21:38:05.9557168Z ''') 2023-01-11T21:38:05.9557173Z 2023-01-11T21:38:05.9557178Z 2023-01-11T21:38:05.9557272Z async_compile.wait(globals()) 2023-01-11T21:38:05.9557348Z del async_compile 2023-01-11T21:38:05.9557353Z 2023-01-11T21:38:05.9557427Z def call(args): 2023-01-11T21:38:05.9557511Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9557586Z args.clear() 2023-01-11T21:38:05.9557766Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9557947Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9558035Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9558123Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:05.9558318Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr())) 2023-01-11T21:38:05.9558390Z del arg0_1 2023-01-11T21:38:05.9558461Z del arg1_1 2023-01-11T21:38:05.9558535Z return (buf2, buf3, ) 2023-01-11T21:38:05.9558577Z 2023-01-11T21:38:05.9558581Z 2023-01-11T21:38:05.9558656Z if __name__ == "__main__": 2023-01-11T21:38:05.9558773Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9558900Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9559121Z arg0_1 =
rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9559336Z arg1_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9559456Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9559462Z 2023-01-11T21:38:05.9559533Z ok (2.177s) 2023-01-11T21:38:05.9559986Z test_layer_norm_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9560121Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9560373Z [2023-01-11 21:28:07,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 151 2023-01-11T21:38:05.9560640Z [2023-01-11 21:28:08,972] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 151 2023-01-11T21:38:05.9560645Z 2023-01-11T21:38:05.9560746Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9560820Z import torch 2023-01-11T21:38:05.9560892Z import random 2023-01-11T21:38:05.9561012Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9561133Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9561138Z 2023-01-11T21:38:05.9561221Z aten = torch.ops.aten 2023-01-11T21:38:05.9561350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9561448Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9561454Z 2023-01-11T21:38:05.9561526Z import triton 2023-01-11T21:38:05.9561622Z import triton.language as tl 2023-01-11T21:38:05.9561747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9561888Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9561921Z 2023-01-11T21:38:05.9561926Z 2023-01-11T21:38:05.9562064Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9562269Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9562382Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9562491Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9562602Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9562710Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9562818Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9562924Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9563025Z float* __restrict__ out_ptr3) 2023-01-11T21:38:05.9563086Z { 2023-01-11T21:38:05.9563177Z auto out_ptr2 = in_out_ptr0; 2023-01-11T21:38:05.9563266Z auto out_ptr1 = in_out_ptr1; 2023-01-11T21:38:05.9563369Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9563436Z { 2023-01-11T21:38:05.9563517Z #pragma omp for 2023-01-11T21:38:05.9563604Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9563664Z { 2023-01-11T21:38:05.9563731Z { 2023-01-11T21:38:05.9563928Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9564013Z float tmp1 = 0; 2023-01-11T21:38:05.9564140Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:05.9564234Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9564334Z { 2023-01-11T21:38:05.9564485Z auto tmp0 = 
at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9564565Z tmp1_vec += tmp0; 2023-01-11T21:38:05.9564635Z } 2023-01-11T21:38:05.9564841Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:05.9564967Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:05.9565061Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9565129Z { 2023-01-11T21:38:05.9565234Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9565309Z tmp1 += tmp0; 2023-01-11T21:38:05.9565389Z } 2023-01-11T21:38:05.9565489Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9565566Z } 2023-01-11T21:38:05.9565646Z } 2023-01-11T21:38:05.9565731Z #pragma omp for 2023-01-11T21:38:05.9565817Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9565877Z { 2023-01-11T21:38:05.9565943Z { 2023-01-11T21:38:05.9566135Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9566223Z float tmp6 = 0; 2023-01-11T21:38:05.9566349Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:05.9566433Z float tmp7 = 0; 2023-01-11T21:38:05.9566558Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:05.9566651Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9566714Z { 2023-01-11T21:38:05.9566862Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9566998Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9567143Z auto tmp2 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9567240Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.9567381Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:05.9567480Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:05.9567590Z tmp6_vec += tmp5; 2023-01-11T21:38:05.9567677Z tmp7_vec += tmp0; 2023-01-11T21:38:05.9567745Z } 2023-01-11T21:38:05.9567946Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:05.9568145Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:05.9568291Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:05.9568387Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9568463Z { 2023-01-11T21:38:05.9568570Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9568661Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:05.9568769Z auto tmp2 = static_cast(32); 2023-01-11T21:38:05.9568867Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:05.9569005Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:05.9569100Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:05.9569183Z tmp6 += tmp5; 2023-01-11T21:38:05.9569264Z tmp7 += tmp0; 2023-01-11T21:38:05.9569325Z } 2023-01-11T21:38:05.9569413Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.9569496Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:05.9569565Z } 2023-01-11T21:38:05.9569630Z } 2023-01-11T21:38:05.9569714Z #pragma omp for 2023-01-11T21:38:05.9569794Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9569896Z { 2023-01-11T21:38:05.9570035Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:05.9570174Z auto tmp1 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9570263Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9570367Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9570433Z } 2023-01-11T21:38:05.9570535Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9570615Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9570686Z { 2023-01-11T21:38:05.9570774Z auto tmp0 = 
out_ptr2[i0]; 2023-01-11T21:38:05.9570880Z auto tmp1 = static_cast(32); 2023-01-11T21:38:05.9570969Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9571057Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9571125Z } 2023-01-11T21:38:05.9571199Z #pragma omp for 2023-01-11T21:38:05.9571289Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9571359Z { 2023-01-11T21:38:05.9571498Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9571637Z auto tmp1 = at::vec::Vectorized(static_cast(32)); 2023-01-11T21:38:05.9571725Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9571934Z auto tmp3 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:05.9572017Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9572106Z auto tmp5 = tmp4.rsqrt(); 2023-01-11T21:38:05.9572207Z tmp5.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:05.9572272Z } 2023-01-11T21:38:05.9572373Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9572459Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9572525Z { 2023-01-11T21:38:05.9572607Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9572714Z auto tmp1 = static_cast(32); 2023-01-11T21:38:05.9572806Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:05.9572957Z auto tmp3 = static_cast(1e-05); 2023-01-11T21:38:05.9573047Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:05.9573152Z auto tmp5 = 1 / std::sqrt(tmp4); 2023-01-11T21:38:05.9573239Z in_out_ptr1[i0] = tmp5; 2023-01-11T21:38:05.9573299Z } 2023-01-11T21:38:05.9573465Z #pragma omp for 2023-01-11T21:38:05.9573551Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9573616Z { 2023-01-11T21:38:05.9573702Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:05.9573771Z { 2023-01-11T21:38:05.9573919Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9574047Z auto tmp1 = at::vec::Vectorized(in_out_ptr0[i0]); 2023-01-11T21:38:05.9574180Z auto tmp3 = at::vec::Vectorized(in_out_ptr1[i0]); 2023-01-11T21:38:05.9574318Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:05.9574459Z auto tmp7 = at::vec::Vectorized::loadu(in_ptr2 + 8*i1); 2023-01-11T21:38:05.9574709Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9574801Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9574891Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9574985Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:05.9575112Z auto tmp9 = at::vec::clamp_min(tmp8, decltype(tmp8)(0)); 2023-01-11T21:38:05.9575223Z tmp9.store(out_ptr3 + (8*i1) + (32*i0)); 2023-01-11T21:38:05.9575291Z } 2023-01-11T21:38:05.9575392Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9575483Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:05.9575551Z { 2023-01-11T21:38:05.9575654Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:05.9575745Z auto tmp1 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9575893Z auto tmp3 = in_out_ptr1[i0]; 2023-01-11T21:38:05.9575985Z auto tmp5 = in_ptr1[i1]; 2023-01-11T21:38:05.9576076Z auto tmp7 = in_ptr2[i1]; 2023-01-11T21:38:05.9576209Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9576300Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9576390Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:05.9576473Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:05.9576569Z auto tmp9 = tmp8 * (tmp8>0); 2023-01-11T21:38:05.9576666Z out_ptr3[i1 + (32*i0)] = tmp9; 2023-01-11T21:38:05.9576734Z } 2023-01-11T21:38:05.9576802Z } 2023-01-11T21:38:05.9576868Z } 2023-01-11T21:38:05.9576932Z } 2023-01-11T21:38:05.9577009Z ''') 2023-01-11T21:38:05.9577014Z 2023-01-11T21:38:05.9577019Z 2023-01-11T21:38:05.9577113Z async_compile.wait(globals()) 2023-01-11T21:38:05.9577248Z del 
async_compile 2023-01-11T21:38:05.9577253Z 2023-01-11T21:38:05.9577333Z def call(args): 2023-01-11T21:38:05.9577441Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.9577517Z args.clear() 2023-01-11T21:38:05.9577713Z buf0 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9577910Z buf1 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578099Z buf2 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578215Z buf3 = as_strided(buf2, (16, 1), (1, 1)); del buf2 # reuse 2023-01-11T21:38:05.9578333Z buf4 = as_strided(buf1, (16, 1), (1, 1)); del buf1 # reuse 2023-01-11T21:38:05.9578532Z buf5 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9578808Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(primals_3.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.9578940Z return (buf5, primals_1, primals_2, primals_3, buf3, buf4, ) 2023-01-11T21:38:05.9578946Z 2023-01-11T21:38:05.9578950Z 2023-01-11T21:38:05.9579031Z if __name__ == "__main__": 2023-01-11T21:38:05.9579149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9579268Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9579509Z primals_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9579709Z primals_2 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9579914Z primals_3 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9580056Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.9580061Z 2023-01-11T21:38:05.9580130Z ok (2.060s) 2023-01-11T21:38:05.9580583Z test_leaky_relu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9580716Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9580977Z [2023-01-11 21:28:09,067] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 152 2023-01-11T21:38:05.9581240Z [2023-01-11 21:28:11,006] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 152 2023-01-11T21:38:05.9581246Z 2023-01-11T21:38:05.9581336Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9581411Z import torch 2023-01-11T21:38:05.9581485Z import random 2023-01-11T21:38:05.9581602Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9581725Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9581730Z 2023-01-11T21:38:05.9581845Z aten = torch.ops.aten 2023-01-11T21:38:05.9581984Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9582075Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9582089Z 2023-01-11T21:38:05.9582158Z import triton 2023-01-11T21:38:05.9582253Z import triton.language as tl 2023-01-11T21:38:05.9582381Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9582527Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9582532Z 2023-01-11T21:38:05.9582536Z 2023-01-11T21:38:05.9582674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9582881Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9583011Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9583121Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9583218Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9583289Z { 2023-01-11T21:38:05.9583394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9583462Z { 2023-01-11T21:38:05.9583546Z #pragma omp for 2023-01-11T21:38:05.9583638Z for(long i0=0; i0<256; i0+=1) 2023-01-11T21:38:05.9583700Z { 2023-01-11T21:38:05.9583769Z { 2023-01-11T21:38:05.9583841Z { 2023-01-11T21:38:05.9583945Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9584057Z auto tmp1 = static_cast(0); 2023-01-11T21:38:05.9584156Z auto tmp2 = tmp0 > tmp1; 2023-01-11T21:38:05.9584269Z auto tmp3 = static_cast(0.2); 2023-01-11T21:38:05.9584359Z auto tmp4 = tmp0 * tmp3; 2023-01-11T21:38:05.9584464Z auto tmp5 = tmp2 ? tmp0 : tmp4; 2023-01-11T21:38:05.9584578Z auto tmp6 = static_cast(2); 2023-01-11T21:38:05.9584681Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:05.9584794Z auto tmp8 = static_cast(1); 2023-01-11T21:38:05.9584892Z auto tmp9 = tmp0 + tmp8; 2023-01-11T21:38:05.9584992Z auto tmp10 = tmp9 > tmp1; 2023-01-11T21:38:05.9585100Z auto tmp11 = static_cast(0.01); 2023-01-11T21:38:05.9585243Z auto tmp12 = tmp9 * tmp11; 2023-01-11T21:38:05.9585371Z auto tmp13 = tmp10 ? 
tmp9 : tmp12; 2023-01-11T21:38:05.9585480Z out_ptr0[i0] = tmp7; 2023-01-11T21:38:05.9585583Z out_ptr1[i0] = tmp13; 2023-01-11T21:38:05.9585653Z } 2023-01-11T21:38:05.9585722Z } 2023-01-11T21:38:05.9585783Z } 2023-01-11T21:38:05.9585852Z } 2023-01-11T21:38:05.9585919Z } 2023-01-11T21:38:05.9586004Z ''') 2023-01-11T21:38:05.9586010Z 2023-01-11T21:38:05.9586014Z 2023-01-11T21:38:05.9586112Z async_compile.wait(globals()) 2023-01-11T21:38:05.9586192Z del async_compile 2023-01-11T21:38:05.9586198Z 2023-01-11T21:38:05.9586279Z def call(args): 2023-01-11T21:38:05.9586355Z arg0_1, = args 2023-01-11T21:38:05.9586426Z args.clear() 2023-01-11T21:38:05.9586627Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9586827Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9587000Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9587082Z del arg0_1 2023-01-11T21:38:05.9587169Z return (buf0, buf1, ) 2023-01-11T21:38:05.9587175Z 2023-01-11T21:38:05.9587179Z 2023-01-11T21:38:05.9587262Z if __name__ == "__main__": 2023-01-11T21:38:05.9587375Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9587504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9587705Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9587820Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9587854Z 2023-01-11T21:38:05.9587927Z ok (1.990s) 2023-01-11T21:38:05.9588381Z test_lgamma_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9588515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9588775Z [2023-01-11 21:28:11,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 153 2023-01-11T21:38:05.9589040Z [2023-01-11 21:28:12,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 153 2023-01-11T21:38:05.9589047Z 2023-01-11T21:38:05.9589148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9589221Z import torch 2023-01-11T21:38:05.9589298Z import random 2023-01-11T21:38:05.9589419Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9589547Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9589552Z 2023-01-11T21:38:05.9589638Z aten = torch.ops.aten 2023-01-11T21:38:05.9589781Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9589882Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9589887Z 2023-01-11T21:38:05.9589957Z import triton 2023-01-11T21:38:05.9590051Z import triton.language as tl 2023-01-11T21:38:05.9590180Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9590321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9590327Z 2023-01-11T21:38:05.9590332Z 2023-01-11T21:38:05.9590471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9590677Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9590806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9590913Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9591009Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9591078Z { 2023-01-11T21:38:05.9591182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9591279Z { 2023-01-11T21:38:05.9591364Z #pragma omp for 2023-01-11T21:38:05.9591456Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:05.9591525Z { 2023-01-11T21:38:05.9591664Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9591759Z auto tmp1 = tmp0.lgamma(); 2023-01-11T21:38:05.9591898Z auto tmp2 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:05.9591987Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9592126Z auto tmp4 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:05.9592223Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.9592315Z auto tmp6 = tmp5.cos(); 2023-01-11T21:38:05.9592414Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9592505Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9592574Z } 2023-01-11T21:38:05.9592677Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9592770Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:05.9592838Z { 2023-01-11T21:38:05.9592929Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9593033Z auto tmp1 = std::lgamma(tmp0); 2023-01-11T21:38:05.9593133Z auto tmp2 = static_cast(2); 2023-01-11T21:38:05.9593224Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:05.9593334Z auto tmp4 = static_cast(1); 2023-01-11T21:38:05.9593424Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:05.9593520Z auto tmp6 = std::cos(tmp5); 2023-01-11T21:38:05.9593608Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9593721Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:05.9593782Z } 2023-01-11T21:38:05.9593848Z } 2023-01-11T21:38:05.9593911Z } 2023-01-11T21:38:05.9593997Z ''') 2023-01-11T21:38:05.9594003Z 2023-01-11T21:38:05.9594007Z 2023-01-11T21:38:05.9594098Z 
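# kernel_cpp_0 above fuses lgamma(x) + 2 and cos(x + 1) into a single pass over the
# 16x16 input: 32 vectorized iterations of width 8 cover all 256 elements, so the
# simdlen(4) tail loop (i0=256..256) never runs.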
async_compile.wait(globals()) 2023-01-11T21:38:05.9594175Z del async_compile 2023-01-11T21:38:05.9594182Z 2023-01-11T21:38:05.9594258Z def call(args): 2023-01-11T21:38:05.9594325Z arg0_1, = args 2023-01-11T21:38:05.9594397Z args.clear() 2023-01-11T21:38:05.9594598Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9594795Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9594965Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9595035Z del arg0_1 2023-01-11T21:38:05.9595112Z return (buf0, buf1, ) 2023-01-11T21:38:05.9595117Z 2023-01-11T21:38:05.9595122Z 2023-01-11T21:38:05.9595209Z if __name__ == "__main__": 2023-01-11T21:38:05.9595320Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9595446Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9595645Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9595762Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9595767Z 2023-01-11T21:38:05.9595837Z ok (1.992s) 2023-01-11T21:38:05.9596285Z test_linear1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9596416Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9596672Z [2023-01-11 21:28:13,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 154 2023-01-11T21:38:05.9596935Z [2023-01-11 21:28:14,964] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 154 2023-01-11T21:38:05.9596941Z 2023-01-11T21:38:05.9597031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9597136Z import torch 2023-01-11T21:38:05.9597211Z import random 2023-01-11T21:38:05.9597332Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9597455Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9597460Z 2023-01-11T21:38:05.9597543Z aten = torch.ops.aten 2023-01-11T21:38:05.9597679Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9597768Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9597782Z 2023-01-11T21:38:05.9597848Z import triton 2023-01-11T21:38:05.9597940Z import triton.language as tl 2023-01-11T21:38:05.9598062Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9598204Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9598210Z 2023-01-11T21:38:05.9598214Z 2023-01-11T21:38:05.9598353Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9598559Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9598681Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9598739Z { 2023-01-11T21:38:05.9598842Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9598908Z { 2023-01-11T21:38:05.9598989Z #pragma omp for 2023-01-11T21:38:05.9599075Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:05.9599142Z { 2023-01-11T21:38:05.9599288Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 
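// vectorized sigmoid epilogue applied in place to the addmm output: tmp1 = 1 / (1 + exp(-tmp0))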
2023-01-11T21:38:05.9599422Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:05.9599522Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9599621Z } 2023-01-11T21:38:05.9599719Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9599808Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:05.9599874Z { 2023-01-11T21:38:05.9599971Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9600103Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:05.9600195Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:05.9600284Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9600352Z } 2023-01-11T21:38:05.9600416Z } 2023-01-11T21:38:05.9600478Z } 2023-01-11T21:38:05.9600561Z ''') 2023-01-11T21:38:05.9600567Z 2023-01-11T21:38:05.9600571Z 2023-01-11T21:38:05.9600666Z async_compile.wait(globals()) 2023-01-11T21:38:05.9600737Z del async_compile 2023-01-11T21:38:05.9600742Z 2023-01-11T21:38:05.9600817Z def call(args): 2023-01-11T21:38:05.9600924Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:05.9601001Z args.clear() 2023-01-11T21:38:05.9601201Z buf0 = empty_strided((2, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9601371Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:05.9601447Z del primals_1 2023-01-11T21:38:05.9601516Z del primals_2 2023-01-11T21:38:05.9601608Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9601715Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9601810Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:05.9601815Z 2023-01-11T21:38:05.9601820Z 2023-01-11T21:38:05.9601901Z if __name__ == "__main__": 2023-01-11T21:38:05.9602016Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9602140Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9602336Z primals_1 = rand_strided((16, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602533Z primals_2 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602734Z primals_3 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9602878Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:05.9602883Z 2023-01-11T21:38:05.9602953Z ok (1.958s) 2023-01-11T21:38:05.9603433Z test_linear2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9603566Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9603825Z [2023-01-11 21:28:15,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 155 2023-01-11T21:38:05.9604087Z [2023-01-11 21:28:17,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 155 2023-01-11T21:38:05.9604097Z 2023-01-11T21:38:05.9604194Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9604261Z import torch 2023-01-11T21:38:05.9604336Z import random 2023-01-11T21:38:05.9604457Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9604584Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9604589Z 2023-01-11T21:38:05.9604672Z aten = torch.ops.aten 2023-01-11T21:38:05.9604809Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9604907Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9604913Z 2023-01-11T21:38:05.9604987Z import triton 2023-01-11T21:38:05.9605072Z import triton.language as tl 2023-01-11T21:38:05.9605197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9605336Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9605341Z 2023-01-11T21:38:05.9605345Z 2023-01-11T21:38:05.9605510Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9605718Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9605838Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9605903Z { 2023-01-11T21:38:05.9605997Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9606067Z { 2023-01-11T21:38:05.9606149Z #pragma omp for 2023-01-11T21:38:05.9606237Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9606305Z { 2023-01-11T21:38:05.9606450Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9606583Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9606684Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9606744Z } 2023-01-11T21:38:05.9606842Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9606930Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9607003Z { 2023-01-11T21:38:05.9607096Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9607187Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9607268Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9607337Z } 2023-01-11T21:38:05.9607401Z } 2023-01-11T21:38:05.9607465Z } 2023-01-11T21:38:05.9607550Z ''') 2023-01-11T21:38:05.9607559Z 2023-01-11T21:38:05.9607564Z 2023-01-11T21:38:05.9607701Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:05.9607900Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9608018Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9608076Z { 2023-01-11T21:38:05.9608176Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9608242Z { 2023-01-11T21:38:05.9608320Z #pragma omp for 2023-01-11T21:38:05.9608406Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9608473Z { 2023-01-11T21:38:05.9608615Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9608745Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9608844Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9608910Z } 2023-01-11T21:38:05.9609007Z 
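// (same clamp_min ReLU epilogue as kernel_cpp_0: each of the four Linear+ReLU layers in
// test_linear2 gets its own copy of this kernel, and kernel_cpp_3 below additionally
// records the <= 0 mask used by the backward pass)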
#pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9609122Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9609190Z { 2023-01-11T21:38:05.9609281Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9609367Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9609454Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9609522Z } 2023-01-11T21:38:05.9609589Z } 2023-01-11T21:38:05.9609652Z } 2023-01-11T21:38:05.9609735Z ''') 2023-01-11T21:38:05.9609740Z 2023-01-11T21:38:05.9609745Z 2023-01-11T21:38:05.9609876Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:05.9610066Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9610188Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:05.9610256Z { 2023-01-11T21:38:05.9610354Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9610421Z { 2023-01-11T21:38:05.9610503Z #pragma omp for 2023-01-11T21:38:05.9610590Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:05.9610652Z { 2023-01-11T21:38:05.9610790Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9610921Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:05.9611020Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:05.9611086Z } 2023-01-11T21:38:05.9611183Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9611272Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:05.9611331Z { 2023-01-11T21:38:05.9611423Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9611516Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9611631Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9611698Z } 2023-01-11T21:38:05.9611764Z } 2023-01-11T21:38:05.9611829Z } 2023-01-11T21:38:05.9611907Z ''') 2023-01-11T21:38:05.9611913Z 2023-01-11T21:38:05.9611917Z 2023-01-11T21:38:05.9612053Z kernel_cpp_3 = async_compile.cpp(''' 2023-01-11T21:38:05.9612252Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9612371Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9612472Z bool* __restrict__ out_ptr0) 2023-01-11T21:38:05.9612537Z { 2023-01-11T21:38:05.9612642Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9612700Z { 2023-01-11T21:38:05.9612780Z #pragma omp for 2023-01-11T21:38:05.9612866Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9612932Z { 2023-01-11T21:38:05.9612999Z { 2023-01-11T21:38:05.9613068Z { 2023-01-11T21:38:05.9613174Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:05.9613269Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:05.9613382Z auto tmp2 = static_cast(0); 2023-01-11T21:38:05.9613480Z auto tmp3 = tmp1 <= tmp2; 2023-01-11T21:38:05.9613577Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9613666Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:05.9613736Z } 2023-01-11T21:38:05.9613801Z } 2023-01-11T21:38:05.9613860Z } 2023-01-11T21:38:05.9613927Z } 2023-01-11T21:38:05.9613992Z } 2023-01-11T21:38:05.9614076Z ''') 2023-01-11T21:38:05.9614081Z 2023-01-11T21:38:05.9614085Z 2023-01-11T21:38:05.9614179Z async_compile.wait(globals()) 2023-01-11T21:38:05.9614256Z del async_compile 2023-01-11T21:38:05.9614261Z 2023-01-11T21:38:05.9614336Z def call(args): 2023-01-11T21:38:05.9614630Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:05.9614713Z args.clear() 2023-01-11T21:38:05.9614908Z buf0 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9615082Z aten.addmm.out(primals_2, primals_9, 
as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:05.9615223Z del primals_1 2023-01-11T21:38:05.9615314Z del primals_2 2023-01-11T21:38:05.9615415Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:05.9615515Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9615707Z buf2 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9615871Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:05.9615948Z del primals_4 2023-01-11T21:38:05.9616038Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:05.9616143Z kernel_cpp_1(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:05.9616335Z buf4 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9616499Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:05.9616568Z del primals_6 2023-01-11T21:38:05.9616656Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:05.9616762Z kernel_cpp_2(c_void_p(buf5.data_ptr())) 2023-01-11T21:38:05.9638871Z buf6 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9639039Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:05.9639118Z del primals_8 2023-01-11T21:38:05.9639207Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:05.9639396Z buf8 = empty_strided((2, 8), (8, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:05.9639527Z kernel_cpp_3(c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.9639727Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 2023-01-11T21:38:05.9639902Z 2023-01-11T21:38:05.9639907Z 2023-01-11T21:38:05.9639993Z if __name__ == "__main__": 2023-01-11T21:38:05.9640110Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9640244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9640446Z primals_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9640642Z primals_2 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9640837Z primals_3 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641024Z primals_4 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641219Z primals_5 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641410Z primals_6 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641610Z primals_7 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641804Z primals_8 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9641999Z primals_9 = rand_strided((2, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9642212Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:05.9642218Z 2023-01-11T21:38:05.9642291Z ok (2.372s) 2023-01-11T21:38:05.9642416Z test_linear_binary_cpu (__main__.CpuTests) ... ok (0.020s) 2023-01-11T21:38:05.9642869Z test_linear_packed_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9643004Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9643260Z [2023-01-11 21:28:17,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 156 2023-01-11T21:38:05.9643559Z [2023-01-11 21:28:17,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 156 2023-01-11T21:38:05.9643975Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9644106Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9644360Z [2023-01-11 21:28:17,465] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 157 2023-01-11T21:38:05.9644627Z [2023-01-11 21:28:17,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 157 2023-01-11T21:38:05.9645093Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9645225Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9645478Z [2023-01-11 21:28:17,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 158 2023-01-11T21:38:05.9645739Z [2023-01-11 21:28:17,546] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 158 2023-01-11T21:38:05.9646145Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9646307Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9646564Z [2023-01-11 21:28:17,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 159 2023-01-11T21:38:05.9646828Z [2023-01-11 21:28:17,592] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 159 2023-01-11T21:38:05.9646834Z 2023-01-11T21:38:05.9646938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9647020Z import torch 2023-01-11T21:38:05.9647098Z import random 2023-01-11T21:38:05.9647220Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9647349Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9647356Z 2023-01-11T21:38:05.9647434Z aten = torch.ops.aten 2023-01-11T21:38:05.9647575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9647673Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9647679Z 2023-01-11T21:38:05.9647760Z import triton 2023-01-11T21:38:05.9647857Z import triton.language as tl 2023-01-11T21:38:05.9647986Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9648128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9648133Z 2023-01-11T21:38:05.9648138Z 2023-01-11T21:38:05.9648232Z async_compile.wait(globals()) 2023-01-11T21:38:05.9648305Z del async_compile 2023-01-11T21:38:05.9648310Z 2023-01-11T21:38:05.9648388Z def call(args): 2023-01-11T21:38:05.9648487Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9648565Z args.clear() 2023-01-11T21:38:05.9648700Z buf0 = torch.ops.mkl._mkl_linear(arg3_1, arg2_1, arg0_1, arg1_1, 6) 2023-01-11T21:38:05.9648777Z del arg0_1 2023-01-11T21:38:05.9648852Z del arg1_1 2023-01-11T21:38:05.9648919Z del arg2_1 2023-01-11T21:38:05.9648993Z del arg3_1 2023-01-11T21:38:05.9649072Z return (buf0, ) 2023-01-11T21:38:05.9649077Z 2023-01-11T21:38:05.9649081Z 2023-01-11T21:38:05.9649166Z if __name__ == "__main__": 2023-01-11T21:38:05.9649318Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9649449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9649653Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9649850Z arg1_1 = rand_strided((30, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650047Z arg2_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650255Z arg3_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9650390Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9650400Z 2023-01-11T21:38:05.9650404Z 2023-01-11T21:38:05.9650507Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9650584Z import torch 2023-01-11T21:38:05.9650661Z import random 2023-01-11T21:38:05.9650782Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9650903Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9650916Z 2023-01-11T21:38:05.9650994Z aten = torch.ops.aten 2023-01-11T21:38:05.9651133Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9651232Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9651237Z 2023-01-11T21:38:05.9651314Z import triton 2023-01-11T21:38:05.9651410Z import triton.language as tl 
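# These test_linear_packed graphs lower to torch.ops.mkl._mkl_linear with an MKL-prepacked
# weight: the (1459233, 1)-shaped, (1, 0)-strided argument is the opaque packed-weight blob,
# and the trailing integer (6 for the (2, 3, 10) input flattened to 6x10, 2 for the 2D case)
# appears to be the batch size the weight was prepacked for -- an inference from the shapes
# in these dumps, not something the log states.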
2023-01-11T21:38:05.9651539Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9651679Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9651684Z 2023-01-11T21:38:05.9651689Z 2023-01-11T21:38:05.9651818Z async_compile.wait(globals()) 2023-01-11T21:38:05.9651888Z del async_compile 2023-01-11T21:38:05.9651892Z 2023-01-11T21:38:05.9651967Z def call(args): 2023-01-11T21:38:05.9652058Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9652135Z args.clear() 2023-01-11T21:38:05.9652266Z buf0 = torch.ops.mkl._mkl_linear(arg2_1, arg1_1, arg0_1, None, 6) 2023-01-11T21:38:05.9652341Z del arg0_1 2023-01-11T21:38:05.9652416Z del arg1_1 2023-01-11T21:38:05.9652480Z del arg2_1 2023-01-11T21:38:05.9652555Z return (buf0, ) 2023-01-11T21:38:05.9652560Z 2023-01-11T21:38:05.9652565Z 2023-01-11T21:38:05.9652647Z if __name__ == "__main__": 2023-01-11T21:38:05.9652764Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9652892Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9653092Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653294Z arg1_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653491Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9653619Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9653624Z 2023-01-11T21:38:05.9653628Z 2023-01-11T21:38:05.9653727Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9653806Z import torch 2023-01-11T21:38:05.9653879Z import random 2023-01-11T21:38:05.9653996Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9654118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9654123Z 2023-01-11T21:38:05.9654205Z aten = torch.ops.aten 2023-01-11T21:38:05.9654332Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9654427Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9654432Z 2023-01-11T21:38:05.9654657Z import triton 2023-01-11T21:38:05.9654755Z import triton.language as tl 2023-01-11T21:38:05.9654895Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9655056Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9655062Z 2023-01-11T21:38:05.9655068Z 2023-01-11T21:38:05.9655170Z async_compile.wait(globals()) 2023-01-11T21:38:05.9655249Z del async_compile 2023-01-11T21:38:05.9655254Z 2023-01-11T21:38:05.9655321Z def call(args): 2023-01-11T21:38:05.9655460Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:05.9655539Z args.clear() 2023-01-11T21:38:05.9655670Z buf0 = torch.ops.mkl._mkl_linear(arg3_1, arg2_1, arg0_1, arg1_1, 2) 2023-01-11T21:38:05.9655744Z del arg0_1 2023-01-11T21:38:05.9655818Z del arg1_1 2023-01-11T21:38:05.9655889Z del arg2_1 2023-01-11T21:38:05.9655952Z del arg3_1 2023-01-11T21:38:05.9656028Z return (buf0, ) 2023-01-11T21:38:05.9656033Z 2023-01-11T21:38:05.9656038Z 2023-01-11T21:38:05.9656118Z if __name__ == "__main__": 2023-01-11T21:38:05.9656236Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9656367Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9656569Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9656760Z arg1_1 = rand_strided((30, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9656962Z arg2_1 = rand_strided((1459233, 
1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9657212Z arg3_1 = rand_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9657346Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:05.9657352Z 2023-01-11T21:38:05.9657356Z 2023-01-11T21:38:05.9657453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9657528Z import torch 2023-01-11T21:38:05.9657603Z import random 2023-01-11T21:38:05.9657722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9657844Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9657849Z 2023-01-11T21:38:05.9657931Z aten = torch.ops.aten 2023-01-11T21:38:05.9658100Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9658196Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9658201Z 2023-01-11T21:38:05.9658276Z import triton 2023-01-11T21:38:05.9658374Z import triton.language as tl 2023-01-11T21:38:05.9658502Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9658641Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9658647Z 2023-01-11T21:38:05.9658651Z 2023-01-11T21:38:05.9658745Z async_compile.wait(globals()) 2023-01-11T21:38:05.9658821Z del async_compile 2023-01-11T21:38:05.9658826Z 2023-01-11T21:38:05.9658894Z def call(args): 2023-01-11T21:38:05.9658983Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:05.9659057Z args.clear() 2023-01-11T21:38:05.9659185Z buf0 = torch.ops.mkl._mkl_linear(arg2_1, arg1_1, arg0_1, None, 2) 2023-01-11T21:38:05.9659258Z del arg0_1 2023-01-11T21:38:05.9659330Z del arg1_1 2023-01-11T21:38:05.9659405Z del arg2_1 2023-01-11T21:38:05.9659473Z return (buf0, ) 2023-01-11T21:38:05.9659478Z 2023-01-11T21:38:05.9659482Z 2023-01-11T21:38:05.9659562Z if __name__ == "__main__": 2023-01-11T21:38:05.9659678Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9659804Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9660013Z arg0_1 = rand_strided((30, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660211Z arg1_1 = rand_strided((1459233, 1), (1, 0), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660408Z arg2_1 = rand_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9660535Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:05.9660540Z 2023-01-11T21:38:05.9660604Z ok (0.237s) 2023-01-11T21:38:05.9660727Z test_linear_unary_cpu (__main__.CpuTests) ... ok (0.001s) 2023-01-11T21:38:05.9661182Z test_linspace1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9661348Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9661610Z [2023-01-11 21:28:17,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 160 2023-01-11T21:38:05.9661875Z [2023-01-11 21:28:19,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 160 2023-01-11T21:38:05.9661880Z 2023-01-11T21:38:05.9661979Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9662059Z import torch 2023-01-11T21:38:05.9662137Z import random 2023-01-11T21:38:05.9662250Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9662378Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9662383Z 2023-01-11T21:38:05.9662464Z aten = torch.ops.aten 2023-01-11T21:38:05.9662601Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9662697Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9662702Z 2023-01-11T21:38:05.9662776Z import triton 2023-01-11T21:38:05.9662870Z import triton.language as tl 2023-01-11T21:38:05.9662995Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9663128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9663133Z 2023-01-11T21:38:05.9663145Z 2023-01-11T21:38:05.9663274Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9663481Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9663609Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9663716Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9663821Z { 2023-01-11T21:38:05.9663903Z #pragma GCC ivdep 2023-01-11T21:38:05.9663994Z for(long i0=0; i0<7; i0+=1) 2023-01-11T21:38:05.9664053Z { 2023-01-11T21:38:05.9664121Z { 2023-01-11T21:38:05.9664190Z { 2023-01-11T21:38:05.9664285Z auto tmp4 = in_ptr0[i0]; 2023-01-11T21:38:05.9664400Z auto tmp0 = static_cast<float>(0.125); 2023-01-11T21:38:05.9664506Z auto tmp1 = static_cast<float>(i0); 2023-01-11T21:38:05.9664591Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9664682Z auto tmp3 = tmp2 + tmp0; 2023-01-11T21:38:05.9664772Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9664879Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:05.9664953Z } 2023-01-11T21:38:05.9665032Z } 2023-01-11T21:38:05.9665111Z } 2023-01-11T21:38:05.9665168Z } 2023-01-11T21:38:05.9665253Z ''') 2023-01-11T21:38:05.9665258Z 2023-01-11T21:38:05.9665262Z 2023-01-11T21:38:05.9665360Z async_compile.wait(globals()) 2023-01-11T21:38:05.9665437Z del async_compile 2023-01-11T21:38:05.9665442Z 2023-01-11T21:38:05.9665517Z def call(args): 2023-01-11T21:38:05.9665591Z arg0_1, = args 2023-01-11T21:38:05.9665666Z args.clear() 2023-01-11T21:38:05.9665855Z buf0 = empty_strided((1, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9666000Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9666073Z del arg0_1 2023-01-11T21:38:05.9666149Z return (buf0, ) 2023-01-11T21:38:05.9666154Z 2023-01-11T21:38:05.9666158Z 2023-01-11T21:38:05.9666239Z if __name__ == "__main__": 2023-01-11T21:38:05.9666356Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9666482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9666678Z arg0_1 = rand_strided((1, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9666782Z
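# Note (annotation, hedged -- not part of the generated dump above): rand_strided, from
# torch._dynamo.testing, allocates a random tensor with exactly the size/stride the graph was
# specialized on, and print_performance below times the compiled call(); saving one of these
# dumps to a file and running it directly should reproduce the kernel standalone.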
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9666797Z 2023-01-11T21:38:05.9666861Z ok (1.861s) 2023-01-11T21:38:05.9667346Z test_linspace2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9667479Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9667734Z [2023-01-11 21:28:19,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 161 2023-01-11T21:38:05.9667994Z [2023-01-11 21:28:21,203] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 161 2023-01-11T21:38:05.9668000Z 2023-01-11T21:38:05.9668099Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9668173Z import torch 2023-01-11T21:38:05.9668250Z import random 2023-01-11T21:38:05.9668362Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9668485Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9668490Z 2023-01-11T21:38:05.9668574Z aten = torch.ops.aten 2023-01-11T21:38:05.9668712Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9668809Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9668814Z 2023-01-11T21:38:05.9668890Z import triton 2023-01-11T21:38:05.9668984Z import triton.language as tl 2023-01-11T21:38:05.9669109Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9669241Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9669246Z 2023-01-11T21:38:05.9669259Z 2023-01-11T21:38:05.9669388Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9669593Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9669717Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9669852Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9669920Z { 2023-01-11T21:38:05.9669989Z { 2023-01-11T21:38:05.9670058Z { 2023-01-11T21:38:05.9670143Z auto tmp4 = in_ptr0[0]; 2023-01-11T21:38:05.9670255Z auto tmp0 = static_cast<float>(0.0); 2023-01-11T21:38:05.9670361Z auto tmp1 = static_cast<float>(0); 2023-01-11T21:38:05.9670453Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9670545Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:05.9670635Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:05.9670721Z out_ptr0[0] = tmp5; 2023-01-11T21:38:05.9670783Z } 2023-01-11T21:38:05.9670850Z } 2023-01-11T21:38:05.9670917Z } 2023-01-11T21:38:05.9671004Z ''') 2023-01-11T21:38:05.9671010Z 2023-01-11T21:38:05.9671014Z 2023-01-11T21:38:05.9671111Z async_compile.wait(globals()) 2023-01-11T21:38:05.9671190Z del async_compile 2023-01-11T21:38:05.9671199Z 2023-01-11T21:38:05.9671276Z def call(args): 2023-01-11T21:38:05.9671345Z arg0_1, = args 2023-01-11T21:38:05.9671421Z args.clear() 2023-01-11T21:38:05.9671619Z buf0 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9671764Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9671839Z del arg0_1 2023-01-11T21:38:05.9671916Z return (buf0, ) 2023-01-11T21:38:05.9671922Z 2023-01-11T21:38:05.9671926Z 2023-01-11T21:38:05.9672008Z if __name__ == "__main__": 2023-01-11T21:38:05.9672121Z from
torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9672251Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9672449Z arg0_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9672563Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9672568Z 2023-01-11T21:38:05.9672642Z ok (1.748s) 2023-01-11T21:38:05.9673126Z test_linspace3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9673262Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9673526Z [2023-01-11 21:28:21,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 162 2023-01-11T21:38:05.9673789Z [2023-01-11 21:28:21,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 162 2023-01-11T21:38:05.9673795Z 2023-01-11T21:38:05.9673895Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9673964Z import torch 2023-01-11T21:38:05.9674041Z import random 2023-01-11T21:38:05.9674164Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9674292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9674297Z 2023-01-11T21:38:05.9674384Z aten = torch.ops.aten 2023-01-11T21:38:05.9674523Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9674620Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9674628Z 2023-01-11T21:38:05.9674700Z import triton 2023-01-11T21:38:05.9674795Z import triton.language as tl 2023-01-11T21:38:05.9674925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9675070Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9675076Z 2023-01-11T21:38:05.9675080Z 2023-01-11T21:38:05.9675178Z async_compile.wait(globals()) 2023-01-11T21:38:05.9675257Z del async_compile 2023-01-11T21:38:05.9675262Z 2023-01-11T21:38:05.9675339Z def call(args): 2023-01-11T21:38:05.9675533Z buf0 = empty_strided((0, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9675605Z return (buf0, ) 2023-01-11T21:38:05.9675638Z 2023-01-11T21:38:05.9675653Z 2023-01-11T21:38:05.9675728Z if __name__ == "__main__": 2023-01-11T21:38:05.9675846Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9675974Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9676078Z print_performance(lambda: call([])) 2023-01-11T21:38:05.9676086Z 2023-01-11T21:38:05.9676160Z ok (0.029s) 2023-01-11T21:38:05.9676497Z test_list_clearing_cpu (__main__.CpuTests) ... 
[2023-01-11 21:28:21,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:05.9676766Z [2023-01-11 21:28:22,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:05.9676772Z 2023-01-11T21:38:05.9676874Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9676944Z import torch 2023-01-11T21:38:05.9677021Z import random 2023-01-11T21:38:05.9677143Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9677272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9677277Z 2023-01-11T21:38:05.9677363Z aten = torch.ops.aten 2023-01-11T21:38:05.9677500Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9677597Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9677602Z 2023-01-11T21:38:05.9677674Z import triton 2023-01-11T21:38:05.9677768Z import triton.language as tl 2023-01-11T21:38:05.9677895Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9678036Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9678041Z 2023-01-11T21:38:05.9678046Z 2023-01-11T21:38:05.9678185Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9678392Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9678518Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9678630Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9678731Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9678801Z { 2023-01-11T21:38:05.9678906Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9678974Z { 2023-01-11T21:38:05.9679059Z #pragma omp for 2023-01-11T21:38:05.9679178Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:05.9679249Z { 2023-01-11T21:38:05.9679394Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9679532Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9679623Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9679721Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9679791Z } 2023-01-11T21:38:05.9679894Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9679989Z for(long i0=24; i0<25; i0+=1) 2023-01-11T21:38:05.9680052Z { 2023-01-11T21:38:05.9680143Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9680236Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9680326Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9680414Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9680483Z } 2023-01-11T21:38:05.9680551Z } 2023-01-11T21:38:05.9680610Z } 2023-01-11T21:38:05.9680699Z ''') 2023-01-11T21:38:05.9680706Z 2023-01-11T21:38:05.9680711Z 2023-01-11T21:38:05.9680806Z async_compile.wait(globals()) 2023-01-11T21:38:05.9680886Z del async_compile 2023-01-11T21:38:05.9680891Z 2023-01-11T21:38:05.9680968Z def call(args): 2023-01-11T21:38:05.9681045Z x_1, y_1 = args 2023-01-11T21:38:05.9681123Z args.clear() 2023-01-11T21:38:05.9681312Z buf0 = empty_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9681475Z kernel_cpp_0(c_void_p(x_1.data_ptr()), c_void_p(y_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9681545Z del x_1 2023-01-11T21:38:05.9681619Z del y_1 2023-01-11T21:38:05.9681814Z buf1 = empty_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9681938Z aten.mm.out(buf0, buf0, out=buf1) 2023-01-11T21:38:05.9682015Z return (buf1, ) 2023-01-11T21:38:05.9682020Z 2023-01-11T21:38:05.9682025Z
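# Note (annotation): this wrapper is the pattern test_list_clearing exercises -- call()
# empties the incoming args list (args.clear()) and dels x_1/y_1 right after the fused
# add kernel, so the caller's tensors can be freed before aten.mm.out runs; only buf0
# stays live for the matmul.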
2023-01-11T21:38:05.9682104Z if __name__ == "__main__": 2023-01-11T21:38:05.9682216Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9682344Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9682534Z x_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9682724Z y_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9682839Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:05.9682844Z 2023-01-11T21:38:05.9682916Z ok (1.761s) 2023-01-11T21:38:05.9683364Z test_log1p_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9683500Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9683757Z [2023-01-11 21:28:23,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 163 2023-01-11T21:38:05.9684014Z [2023-01-11 21:28:25,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 163 2023-01-11T21:38:05.9684428Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9684559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9684824Z [2023-01-11 21:28:25,026] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 164 2023-01-11T21:38:05.9685124Z [2023-01-11 21:28:26,758] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 164 2023-01-11T21:38:05.9685567Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9685699Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9685954Z [2023-01-11 21:28:26,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 165 2023-01-11T21:38:05.9686217Z [2023-01-11 21:28:28,640] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 165 2023-01-11T21:38:05.9686633Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9686763Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9687016Z [2023-01-11 21:28:28,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 166 2023-01-11T21:38:05.9687267Z [2023-01-11 21:28:30,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 166 2023-01-11T21:38:05.9687679Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9687881Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9688133Z [2023-01-11 21:28:30,545] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 167 2023-01-11T21:38:05.9688139Z 2023-01-11T21:38:05.9688237Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9688314Z import torch 2023-01-11T21:38:05.9688390Z import random 2023-01-11T21:38:05.9688509Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9688633Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9688638Z 2023-01-11T21:38:05.9688712Z aten = torch.ops.aten 2023-01-11T21:38:05.9688850Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9688947Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9688956Z 2023-01-11T21:38:05.9689030Z import triton 2023-01-11T21:38:05.9689123Z import triton.language as tl 2023-01-11T21:38:05.9689250Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9689388Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9689394Z 2023-01-11T21:38:05.9689401Z 2023-01-11T21:38:05.9689539Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9689738Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9689862Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9689966Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9690066Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9690133Z { 2023-01-11T21:38:05.9690236Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9690302Z { 2023-01-11T21:38:05.9690376Z #pragma omp for 2023-01-11T21:38:05.9690467Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9690534Z { 2023-01-11T21:38:05.9690601Z { 2023-01-11T21:38:05.9690672Z { 2023-01-11T21:38:05.9690796Z auto tmp0 = static_cast<float>(in_ptr0[i0]); 2023-01-11T21:38:05.9690906Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9691034Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9691134Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9691223Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9691313Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9691382Z } 2023-01-11T21:38:05.9691450Z } 2023-01-11T21:38:05.9691515Z } 2023-01-11T21:38:05.9691574Z } 2023-01-11T21:38:05.9691640Z } 2023-01-11T21:38:05.9691725Z ''') 2023-01-11T21:38:05.9691730Z 2023-01-11T21:38:05.9691735Z 2023-01-11T21:38:05.9691829Z async_compile.wait(globals()) 2023-01-11T21:38:05.9691910Z del async_compile 2023-01-11T21:38:05.9691915Z 2023-01-11T21:38:05.9691990Z def call(args): 2023-01-11T21:38:05.9692064Z arg0_1, =
args 2023-01-11T21:38:05.9692132Z args.clear() 2023-01-11T21:38:05.9692326Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9692521Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9692692Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9692765Z del arg0_1 2023-01-11T21:38:05.9692848Z return (buf0, buf1, ) 2023-01-11T21:38:05.9692853Z 2023-01-11T21:38:05.9692857Z 2023-01-11T21:38:05.9692939Z if __name__ == "__main__": 2023-01-11T21:38:05.9693056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9693175Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9693368Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9693508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9693513Z 2023-01-11T21:38:05.9693518Z 2023-01-11T21:38:05.9693615Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9693691Z import torch 2023-01-11T21:38:05.9693765Z import random 2023-01-11T21:38:05.9693883Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9694010Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9694015Z 2023-01-11T21:38:05.9694089Z aten = torch.ops.aten 2023-01-11T21:38:05.9694222Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9694318Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9694323Z 2023-01-11T21:38:05.9694397Z import triton 2023-01-11T21:38:05.9694602Z import triton.language as tl 2023-01-11T21:38:05.9694729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9694868Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9694874Z 2023-01-11T21:38:05.9694882Z 2023-01-11T21:38:05.9695020Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9695217Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9695339Z extern "C" void kernel(const half* __restrict__ in_ptr0, 2023-01-11T21:38:05.9695447Z half* __restrict__ out_ptr0, 2023-01-11T21:38:05.9695546Z half* __restrict__ out_ptr1) 2023-01-11T21:38:05.9695610Z { 2023-01-11T21:38:05.9695712Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9695779Z { 2023-01-11T21:38:05.9695852Z #pragma omp for 2023-01-11T21:38:05.9695940Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9696006Z { 2023-01-11T21:38:05.9696075Z { 2023-01-11T21:38:05.9696144Z { 2023-01-11T21:38:05.9696264Z auto tmp0 = static_cast<float>(in_ptr0[i0]); 2023-01-11T21:38:05.9696374Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9696476Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9696571Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9696660Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9696750Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9696819Z } 2023-01-11T21:38:05.9696931Z } 2023-01-11T21:38:05.9696999Z } 2023-01-11T21:38:05.9697061Z } 2023-01-11T21:38:05.9697188Z } 2023-01-11T21:38:05.9697279Z ''') 2023-01-11T21:38:05.9697285Z 2023-01-11T21:38:05.9697289Z 2023-01-11T21:38:05.9697385Z async_compile.wait(globals()) 2023-01-11T21:38:05.9697462Z del async_compile 2023-01-11T21:38:05.9697467Z 2023-01-11T21:38:05.9697542Z def call(args): 2023-01-11T21:38:05.9697616Z arg0_1, = args 2023-01-11T21:38:05.9697685Z args.clear() 2023-01-11T21:38:05.9697879Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16)
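# Note (annotation): both outputs are preallocated with empty_strided before the kernel
# call; for this float16 graph the C++ kernel above appears to compute in float
# (std::log1p on the upcast value) and convert back to half on store.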
2023-01-11T21:38:05.9698069Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9698236Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9698308Z del arg0_1 2023-01-11T21:38:05.9698389Z return (buf0, buf1, ) 2023-01-11T21:38:05.9698394Z 2023-01-11T21:38:05.9698398Z 2023-01-11T21:38:05.9698480Z if __name__ == "__main__": 2023-01-11T21:38:05.9698598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9698716Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9698914Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float16) 2023-01-11T21:38:05.9699027Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9699032Z 2023-01-11T21:38:05.9699036Z 2023-01-11T21:38:05.9699134Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9699209Z import torch 2023-01-11T21:38:05.9699283Z import random 2023-01-11T21:38:05.9699402Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9699562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9699575Z 2023-01-11T21:38:05.9699650Z aten = torch.ops.aten 2023-01-11T21:38:05.9699786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9699880Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9699886Z 2023-01-11T21:38:05.9699964Z import triton 2023-01-11T21:38:05.9700057Z import triton.language as tl 2023-01-11T21:38:05.9700182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9700320Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9700326Z 2023-01-11T21:38:05.9700330Z 2023-01-11T21:38:05.9700469Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9700665Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9700788Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9700892Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9700998Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9701064Z { 2023-01-11T21:38:05.9701168Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9701236Z { 2023-01-11T21:38:05.9701309Z #pragma omp for 2023-01-11T21:38:05.9701397Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9701466Z { 2023-01-11T21:38:05.9701608Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9701699Z auto tmp1 = tmp0.log1p(); 2023-01-11T21:38:05.9701837Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9701927Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9702015Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9702110Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9702178Z } 2023-01-11T21:38:05.9702280Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9702369Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:05.9702440Z { 2023-01-11T21:38:05.9702530Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9702621Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9702728Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9702818Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9702937Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9703022Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9703090Z } 2023-01-11T21:38:05.9703156Z } 2023-01-11T21:38:05.9703213Z } 2023-01-11T21:38:05.9703304Z ''') 2023-01-11T21:38:05.9703309Z 2023-01-11T21:38:05.9703313Z 2023-01-11T21:38:05.9703406Z async_compile.wait(globals()) 2023-01-11T21:38:05.9703484Z del
async_compile 2023-01-11T21:38:05.9703489Z 2023-01-11T21:38:05.9703566Z def call(args): 2023-01-11T21:38:05.9703641Z arg0_1, = args 2023-01-11T21:38:05.9703717Z args.clear() 2023-01-11T21:38:05.9703900Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9704093Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9704264Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9704336Z del arg0_1 2023-01-11T21:38:05.9704416Z return (buf0, buf1, ) 2023-01-11T21:38:05.9704424Z 2023-01-11T21:38:05.9704429Z 2023-01-11T21:38:05.9704509Z if __name__ == "__main__": 2023-01-11T21:38:05.9704633Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9704762Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9704948Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9705078Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9705084Z 2023-01-11T21:38:05.9705090Z 2023-01-11T21:38:05.9705197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9705294Z import torch 2023-01-11T21:38:05.9705374Z import random 2023-01-11T21:38:05.9705533Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9705657Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9705662Z 2023-01-11T21:38:05.9705744Z aten = torch.ops.aten 2023-01-11T21:38:05.9705872Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9705971Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9705976Z 2023-01-11T21:38:05.9706053Z import triton 2023-01-11T21:38:05.9706147Z import triton.language as tl 2023-01-11T21:38:05.9706274Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9706414Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9706419Z 2023-01-11T21:38:05.9706424Z 2023-01-11T21:38:05.9706560Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9706764Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9706880Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9706989Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9707092Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9707159Z { 2023-01-11T21:38:05.9707261Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9707329Z { 2023-01-11T21:38:05.9707410Z #pragma omp for 2023-01-11T21:38:05.9707493Z for(long i0=0; i0<25; i0+=1) 2023-01-11T21:38:05.9707559Z { 2023-01-11T21:38:05.9707699Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9707790Z auto tmp1 = tmp0.log1p(); 2023-01-11T21:38:05.9707928Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:05.9708017Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9708111Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9708199Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:05.9708267Z } 2023-01-11T21:38:05.9708366Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9708456Z for(long i0=200; i0<201; i0+=1) 2023-01-11T21:38:05.9708523Z { 2023-01-11T21:38:05.9708614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9708713Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9708810Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:05.9708930Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9709017Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9709103Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9709169Z } 2023-01-11T21:38:05.9709235Z } 2023-01-11T21:38:05.9709303Z } 2023-01-11T21:38:05.9709380Z ''') 2023-01-11T21:38:05.9709386Z 2023-01-11T21:38:05.9709390Z 2023-01-11T21:38:05.9709483Z async_compile.wait(globals()) 2023-01-11T21:38:05.9709561Z del async_compile 2023-01-11T21:38:05.9709566Z 2023-01-11T21:38:05.9709641Z def call(args): 2023-01-11T21:38:05.9709716Z arg0_1, = args 2023-01-11T21:38:05.9709792Z args.clear() 2023-01-11T21:38:05.9709989Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9710173Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9710337Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9710413Z del arg0_1 2023-01-11T21:38:05.9710496Z return (buf0, buf1, ) 2023-01-11T21:38:05.9710501Z 2023-01-11T21:38:05.9710506Z 2023-01-11T21:38:05.9710588Z if __name__ == "__main__": 2023-01-11T21:38:05.9710707Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9710837Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9711032Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9711138Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9711150Z 2023-01-11T21:38:05.9711406Z [2023-01-11 21:28:32,592] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 167 2023-01-11T21:38:05.9711860Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9711991Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9712248Z [2023-01-11 21:28:32,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 168 2023-01-11T21:38:05.9712511Z [2023-01-11 21:28:34,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 168 2023-01-11T21:38:05.9712924Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9713059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9713318Z [2023-01-11 21:28:34,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 169 2023-01-11T21:38:05.9713578Z [2023-01-11 21:28:36,262] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 169 2023-01-11T21:38:05.9713988Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9714119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9714366Z [2023-01-11 21:28:36,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 170 2023-01-11T21:38:05.9714632Z [2023-01-11 21:28:38,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 170 2023-01-11T21:38:05.9715070Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9715203Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9715455Z [2023-01-11 21:28:38,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 171 2023-01-11T21:38:05.9715461Z 2023-01-11T21:38:05.9715559Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9715636Z import torch 2023-01-11T21:38:05.9715710Z import random 2023-01-11T21:38:05.9715829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9715946Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9715959Z 2023-01-11T21:38:05.9716034Z aten = torch.ops.aten 2023-01-11T21:38:05.9716171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9716267Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9716272Z 2023-01-11T21:38:05.9716347Z import triton 2023-01-11T21:38:05.9716440Z import triton.language as tl 2023-01-11T21:38:05.9716566Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9716706Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9716711Z 2023-01-11T21:38:05.9716716Z 2023-01-11T21:38:05.9716852Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9717050Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9717205Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9717311Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9717413Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9717479Z { 2023-01-11T21:38:05.9717584Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9717651Z { 2023-01-11T21:38:05.9717725Z #pragma omp for 2023-01-11T21:38:05.9717815Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9717883Z { 2023-01-11T21:38:05.9717950Z { 2023-01-11T21:38:05.9718019Z { 2023-01-11T21:38:05.9718120Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9718221Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9718331Z auto tmp2 = static_cast<double>(2); 2023-01-11T21:38:05.9718428Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9718517Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9718611Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9718680Z } 2023-01-11T21:38:05.9718748Z } 2023-01-11T21:38:05.9718808Z } 2023-01-11T21:38:05.9718874Z } 2023-01-11T21:38:05.9718936Z } 2023-01-11T21:38:05.9719021Z ''') 2023-01-11T21:38:05.9719026Z 2023-01-11T21:38:05.9719034Z 2023-01-11T21:38:05.9719127Z async_compile.wait(globals()) 2023-01-11T21:38:05.9719205Z del async_compile 2023-01-11T21:38:05.9719210Z 2023-01-11T21:38:05.9719286Z def call(args): 2023-01-11T21:38:05.9719363Z arg0_1, = args
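# Note (annotation): same wrapper skeleton as the float16/float32 variants above, now
# specialized for float64 -- unpack args, clear the list so the caller drops its
# references, run the kernel, del the input.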
2023-01-11T21:38:05.9719432Z args.clear() 2023-01-11T21:38:05.9719628Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9719820Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9719989Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9720061Z del arg0_1 2023-01-11T21:38:05.9720145Z return (buf0, buf1, ) 2023-01-11T21:38:05.9720150Z 2023-01-11T21:38:05.9720155Z 2023-01-11T21:38:05.9720234Z if __name__ == "__main__": 2023-01-11T21:38:05.9720345Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9720470Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9720688Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9720802Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9720807Z 2023-01-11T21:38:05.9720812Z 2023-01-11T21:38:05.9720909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9720985Z import torch 2023-01-11T21:38:05.9721058Z import random 2023-01-11T21:38:05.9721177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9721292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9721297Z 2023-01-11T21:38:05.9721378Z aten = torch.ops.aten 2023-01-11T21:38:05.9721513Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9721613Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9721618Z 2023-01-11T21:38:05.9721693Z import triton 2023-01-11T21:38:05.9721785Z import triton.language as tl 2023-01-11T21:38:05.9721910Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9722054Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9722060Z 2023-01-11T21:38:05.9722064Z 2023-01-11T21:38:05.9722192Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9722397Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9722522Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9722628Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9722732Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9722799Z { 2023-01-11T21:38:05.9722900Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9722990Z { 2023-01-11T21:38:05.9723073Z #pragma omp for 2023-01-11T21:38:05.9723160Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9723228Z { 2023-01-11T21:38:05.9723294Z { 2023-01-11T21:38:05.9723363Z { 2023-01-11T21:38:05.9723462Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9723564Z auto tmp1 = std::log1p(tmp0); 2023-01-11T21:38:05.9723673Z auto tmp2 = static_cast<double>(2); 2023-01-11T21:38:05.9723769Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9723858Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9723947Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9724018Z } 2023-01-11T21:38:05.9724087Z } 2023-01-11T21:38:05.9724147Z } 2023-01-11T21:38:05.9724213Z } 2023-01-11T21:38:05.9724276Z } 2023-01-11T21:38:05.9724361Z ''') 2023-01-11T21:38:05.9724367Z 2023-01-11T21:38:05.9724372Z 2023-01-11T21:38:05.9724468Z async_compile.wait(globals()) 2023-01-11T21:38:05.9724545Z del async_compile 2023-01-11T21:38:05.9724550Z 2023-01-11T21:38:05.9724623Z def call(args): 2023-01-11T21:38:05.9724689Z arg0_1, = args 2023-01-11T21:38:05.9724767Z args.clear() 2023-01-11T21:38:05.9724962Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64)
2023-01-11T21:38:05.9725153Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9725316Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9725389Z del arg0_1 2023-01-11T21:38:05.9725469Z return (buf0, buf1, ) 2023-01-11T21:38:05.9725475Z 2023-01-11T21:38:05.9725479Z 2023-01-11T21:38:05.9725558Z if __name__ == "__main__": 2023-01-11T21:38:05.9725667Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9725791Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9725986Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9726100Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9726105Z 2023-01-11T21:38:05.9726110Z 2023-01-11T21:38:05.9726207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9726285Z import torch 2023-01-11T21:38:05.9726359Z import random 2023-01-11T21:38:05.9726506Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9726623Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9726628Z 2023-01-11T21:38:05.9726709Z aten = torch.ops.aten 2023-01-11T21:38:05.9726844Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9726941Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9726946Z 2023-01-11T21:38:05.9727020Z import triton 2023-01-11T21:38:05.9727113Z import triton.language as tl 2023-01-11T21:38:05.9727236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9727367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9727382Z 2023-01-11T21:38:05.9727387Z 2023-01-11T21:38:05.9727515Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9727721Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9727842Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9727946Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9728049Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9728114Z { 2023-01-11T21:38:05.9728214Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9728273Z { 2023-01-11T21:38:05.9728354Z #pragma omp for 2023-01-11T21:38:05.9728441Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9728509Z { 2023-01-11T21:38:05.9728576Z { 2023-01-11T21:38:05.9728644Z { 2023-01-11T21:38:05.9728740Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9728846Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9728989Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9729098Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9729194Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9729288Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9729379Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9729448Z } 2023-01-11T21:38:05.9729508Z } 2023-01-11T21:38:05.9729575Z } 2023-01-11T21:38:05.9729641Z } 2023-01-11T21:38:05.9729705Z } 2023-01-11T21:38:05.9729790Z ''') 2023-01-11T21:38:05.9729796Z 2023-01-11T21:38:05.9729800Z 2023-01-11T21:38:05.9729892Z async_compile.wait(globals()) 2023-01-11T21:38:05.9729972Z del async_compile 2023-01-11T21:38:05.9729977Z 2023-01-11T21:38:05.9730044Z def call(args): 2023-01-11T21:38:05.9730117Z arg0_1, = args 2023-01-11T21:38:05.9730191Z args.clear() 2023-01-11T21:38:05.9730385Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9730576Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
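# Note (annotation): the repro here feeds an int32 input while both outputs are float32 --
# log1p type-promotes integer inputs, which is why the kernel above reads const int* and
# writes float*.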
2023-01-11T21:38:05.9730738Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9730810Z del arg0_1 2023-01-11T21:38:05.9730887Z return (buf0, buf1, ) 2023-01-11T21:38:05.9730901Z 2023-01-11T21:38:05.9730906Z 2023-01-11T21:38:05.9730978Z if __name__ == "__main__": 2023-01-11T21:38:05.9731095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9731220Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9731411Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9731522Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9731527Z 2023-01-11T21:38:05.9731532Z 2023-01-11T21:38:05.9731629Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9731702Z import torch 2023-01-11T21:38:05.9731772Z import random 2023-01-11T21:38:05.9731890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9732012Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9732017Z 2023-01-11T21:38:05.9732100Z aten = torch.ops.aten 2023-01-11T21:38:05.9732260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9732358Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9732363Z 2023-01-11T21:38:05.9732438Z import triton 2023-01-11T21:38:05.9732531Z import triton.language as tl 2023-01-11T21:38:05.9732647Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9732784Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9732790Z 2023-01-11T21:38:05.9732794Z 2023-01-11T21:38:05.9732930Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9733133Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9733255Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:05.9733359Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9733460Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9733526Z { 2023-01-11T21:38:05.9733620Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9733686Z { 2023-01-11T21:38:05.9733769Z #pragma omp for 2023-01-11T21:38:05.9733859Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9733926Z { 2023-01-11T21:38:05.9733993Z { 2023-01-11T21:38:05.9734063Z { 2023-01-11T21:38:05.9734153Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9734264Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9734370Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9734590Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9734686Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9743465Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9743579Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9743652Z } 2023-01-11T21:38:05.9743723Z } 2023-01-11T21:38:05.9743792Z } 2023-01-11T21:38:05.9743851Z } 2023-01-11T21:38:05.9743920Z } 2023-01-11T21:38:05.9744026Z ''') 2023-01-11T21:38:05.9744037Z 2023-01-11T21:38:05.9744042Z 2023-01-11T21:38:05.9744140Z async_compile.wait(globals()) 2023-01-11T21:38:05.9744221Z del async_compile 2023-01-11T21:38:05.9744226Z 2023-01-11T21:38:05.9744310Z def call(args): 2023-01-11T21:38:05.9744387Z arg0_1, = args 2023-01-11T21:38:05.9744455Z args.clear() 2023-01-11T21:38:05.9744658Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9744855Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9745044Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()),
c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9745127Z del arg0_1 2023-01-11T21:38:05.9745229Z return (buf0, buf1, ) 2023-01-11T21:38:05.9745235Z 2023-01-11T21:38:05.9745241Z 2023-01-11T21:38:05.9745322Z if __name__ == "__main__": 2023-01-11T21:38:05.9745441Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9745563Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9745757Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:05.9745869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9745874Z 2023-01-11T21:38:05.9746143Z [2023-01-11 21:28:40,196] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 171 2023-01-11T21:38:05.9746561Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9746697Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9746954Z [2023-01-11 21:28:40,254] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 172 2023-01-11T21:38:05.9747294Z [2023-01-11 21:28:42,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 172 2023-01-11T21:38:05.9747300Z 2023-01-11T21:38:05.9747401Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9747476Z import torch 2023-01-11T21:38:05.9747544Z import random 2023-01-11T21:38:05.9747663Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9747788Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9747793Z 2023-01-11T21:38:05.9747874Z aten = torch.ops.aten 2023-01-11T21:38:05.9748006Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9748107Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9748112Z 2023-01-11T21:38:05.9748185Z import triton 2023-01-11T21:38:05.9748276Z import triton.language as tl 2023-01-11T21:38:05.9748400Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9748541Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9748549Z 2023-01-11T21:38:05.9748553Z 2023-01-11T21:38:05.9748694Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9748899Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9749015Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9749121Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9749223Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9749288Z { 2023-01-11T21:38:05.9749391Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9749459Z { 2023-01-11T21:38:05.9749543Z #pragma omp for 2023-01-11T21:38:05.9749656Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9749724Z { 2023-01-11T21:38:05.9749792Z { 2023-01-11T21:38:05.9749860Z { 2023-01-11T21:38:05.9749957Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9750072Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9750185Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9750286Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9750383Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9750473Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9750562Z out_ptr1[i0] = tmp4;
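// Note (annotation): same kernel shape for the int64 variant -- elements appear to be
// cast to floating point before std::log1p, mirroring eager-mode integer-to-float
// promotion.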
2023-01-11T21:38:05.9750633Z } 2023-01-11T21:38:05.9750700Z } 2023-01-11T21:38:05.9750766Z } 2023-01-11T21:38:05.9750826Z } 2023-01-11T21:38:05.9750889Z } 2023-01-11T21:38:05.9750977Z ''') 2023-01-11T21:38:05.9750983Z 2023-01-11T21:38:05.9750991Z 2023-01-11T21:38:05.9751086Z async_compile.wait(globals()) 2023-01-11T21:38:05.9751164Z del async_compile 2023-01-11T21:38:05.9751169Z 2023-01-11T21:38:05.9751244Z def call(args): 2023-01-11T21:38:05.9751320Z arg0_1, = args 2023-01-11T21:38:05.9751387Z args.clear() 2023-01-11T21:38:05.9751586Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9751782Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9751950Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9752023Z del arg0_1 2023-01-11T21:38:05.9752105Z return (buf0, buf1, ) 2023-01-11T21:38:05.9752110Z 2023-01-11T21:38:05.9752114Z 2023-01-11T21:38:05.9752194Z if __name__ == "__main__": 2023-01-11T21:38:05.9752314Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9752433Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9752629Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9752744Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9752749Z 2023-01-11T21:38:05.9752754Z 2023-01-11T21:38:05.9752851Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9752923Z import torch 2023-01-11T21:38:05.9753026Z import random 2023-01-11T21:38:05.9753146Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9753261Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9753273Z 2023-01-11T21:38:05.9753348Z aten = torch.ops.aten 2023-01-11T21:38:05.9753483Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9753579Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9753584Z 2023-01-11T21:38:05.9753657Z import triton 2023-01-11T21:38:05.9753750Z import triton.language as tl 2023-01-11T21:38:05.9753874Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9754012Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9754020Z 2023-01-11T21:38:05.9754024Z 2023-01-11T21:38:05.9754162Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9754361Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9754483Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9754589Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9754692Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9754757Z { 2023-01-11T21:38:05.9754858Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9754925Z { 2023-01-11T21:38:05.9754999Z #pragma omp for 2023-01-11T21:38:05.9755088Z for(long i0=0; i0<201; i0+=1) 2023-01-11T21:38:05.9755155Z { 2023-01-11T21:38:05.9755222Z { 2023-01-11T21:38:05.9755293Z { 2023-01-11T21:38:05.9755390Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9755539Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:05.9755645Z auto tmp2 = std::log1p(tmp1); 2023-01-11T21:38:05.9755753Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:05.9755849Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:05.9755941Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9756031Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9756099Z } 2023-01-11T21:38:05.9756160Z } 2023-01-11T21:38:05.9756231Z }
2023-01-11T21:38:05.9756298Z } 2023-01-11T21:38:05.9756362Z } 2023-01-11T21:38:05.9756447Z ''') 2023-01-11T21:38:05.9756452Z 2023-01-11T21:38:05.9756457Z 2023-01-11T21:38:05.9756550Z async_compile.wait(globals()) 2023-01-11T21:38:05.9756627Z del async_compile 2023-01-11T21:38:05.9756632Z 2023-01-11T21:38:05.9756707Z def call(args): 2023-01-11T21:38:05.9756773Z arg0_1, = args 2023-01-11T21:38:05.9756849Z args.clear() 2023-01-11T21:38:05.9757047Z buf0 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9757240Z buf1 = empty_strided((201, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9757409Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9757484Z del arg0_1 2023-01-11T21:38:05.9757566Z return (buf0, buf1, ) 2023-01-11T21:38:05.9757571Z 2023-01-11T21:38:05.9757575Z 2023-01-11T21:38:05.9757648Z if __name__ == "__main__": 2023-01-11T21:38:05.9757763Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9757889Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9758080Z arg0_1 = rand_strided((201, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9758191Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9758196Z 2023-01-11T21:38:05.9758268Z ok (19.158s) 2023-01-11T21:38:05.9758713Z test_log2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9758874Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9759135Z [2023-01-11 21:28:42,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 173 2023-01-11T21:38:05.9759397Z [2023-01-11 21:28:44,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 173 2023-01-11T21:38:05.9759403Z 2023-01-11T21:38:05.9759494Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9759569Z import torch 2023-01-11T21:38:05.9759646Z import random 2023-01-11T21:38:05.9759764Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9759889Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9759894Z 2023-01-11T21:38:05.9759976Z aten = torch.ops.aten 2023-01-11T21:38:05.9760120Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9760208Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9760214Z 2023-01-11T21:38:05.9760289Z import triton 2023-01-11T21:38:05.9760385Z import triton.language as tl 2023-01-11T21:38:05.9760512Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9760652Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9760658Z 2023-01-11T21:38:05.9760662Z 2023-01-11T21:38:05.9760797Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9761003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9761125Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9761222Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9761351Z float* __restrict__ out_ptr1) 2023-01-11T21:38:05.9761418Z { 2023-01-11T21:38:05.9761518Z 
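// Note (annotation): log2 appears to be lowered as log(x) * 1.4426950408889634 (= 1/ln 2),
// so out_ptr0 below holds log2(x) and out_ptr1 holds log2(x + 1) - 2 computed the same way.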
#pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9761585Z {
2023-01-11T21:38:05.9761666Z #pragma omp for
2023-01-11T21:38:05.9761755Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9761817Z {
2023-01-11T21:38:05.9761958Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9762048Z auto tmp1 = tmp0.log();
2023-01-11T21:38:05.9762198Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1.4426950408889634));
2023-01-11T21:38:05.9762288Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:05.9762424Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:05.9762514Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:05.9762603Z auto tmp6 = tmp5.log();
2023-01-11T21:38:05.9762683Z auto tmp7 = tmp6 * tmp2;
2023-01-11T21:38:05.9762824Z auto tmp8 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:05.9762953Z auto tmp9 = tmp7 - tmp8;
2023-01-11T21:38:05.9763051Z tmp3.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9763147Z tmp9.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9763215Z }
2023-01-11T21:38:05.9763318Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9763399Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:05.9763466Z {
2023-01-11T21:38:05.9763554Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9763650Z auto tmp1 = std::log(tmp0);
2023-01-11T21:38:05.9763767Z auto tmp2 = static_cast<float>(1.4426950408889634);
2023-01-11T21:38:05.9763856Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:05.9763959Z auto tmp4 = static_cast<float>(1);
2023-01-11T21:38:05.9764041Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:05.9764135Z auto tmp6 = std::log(tmp5);
2023-01-11T21:38:05.9764226Z auto tmp7 = tmp6 * tmp2;
2023-01-11T21:38:05.9764326Z auto tmp8 = static_cast<float>(2);
2023-01-11T21:38:05.9764452Z auto tmp9 = tmp7 - tmp8;
2023-01-11T21:38:05.9764538Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:05.9764623Z out_ptr1[i0] = tmp9;
2023-01-11T21:38:05.9764716Z }
2023-01-11T21:38:05.9764786Z }
2023-01-11T21:38:05.9764851Z }
2023-01-11T21:38:05.9764934Z ''')
2023-01-11T21:38:05.9764939Z 
2023-01-11T21:38:05.9764944Z 
2023-01-11T21:38:05.9765039Z async_compile.wait(globals())
2023-01-11T21:38:05.9765115Z del async_compile
2023-01-11T21:38:05.9765120Z 
2023-01-11T21:38:05.9765195Z def call(args):
2023-01-11T21:38:05.9765262Z arg0_1, = args
2023-01-11T21:38:05.9765335Z args.clear()
2023-01-11T21:38:05.9765529Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9765721Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9765893Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9765966Z del arg0_1
2023-01-11T21:38:05.9766048Z return (buf0, buf1, )
2023-01-11T21:38:05.9766054Z 
2023-01-11T21:38:05.9766058Z 
2023-01-11T21:38:05.9766138Z if __name__ == "__main__":
2023-01-11T21:38:05.9766252Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9766379Z from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9766571Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9766684Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9766689Z 
2023-01-11T21:38:05.9766764Z ok (2.068s)
2023-01-11T21:38:05.9767212Z test_log_fp64_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9767380Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9767639Z [2023-01-11 21:28:44,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 174 2023-01-11T21:38:05.9767902Z [2023-01-11 21:28:46,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 174 2023-01-11T21:38:05.9767908Z 2023-01-11T21:38:05.9767998Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9768074Z import torch 2023-01-11T21:38:05.9768148Z import random 2023-01-11T21:38:05.9768267Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9768390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9768395Z 2023-01-11T21:38:05.9768476Z aten = torch.ops.aten 2023-01-11T21:38:05.9768610Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9768709Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9768715Z 2023-01-11T21:38:05.9768781Z import triton 2023-01-11T21:38:05.9768874Z import triton.language as tl 2023-01-11T21:38:05.9768998Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9769140Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9769146Z 2023-01-11T21:38:05.9769150Z 2023-01-11T21:38:05.9769288Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9769493Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9769618Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:05.9769724Z double* __restrict__ out_ptr0, 2023-01-11T21:38:05.9769821Z double* __restrict__ out_ptr1) 2023-01-11T21:38:05.9769887Z { 2023-01-11T21:38:05.9769989Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9770057Z { 2023-01-11T21:38:05.9770138Z #pragma omp for 2023-01-11T21:38:05.9770229Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9770296Z { 2023-01-11T21:38:05.9770356Z { 2023-01-11T21:38:05.9770425Z { 2023-01-11T21:38:05.9770523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9770658Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9770783Z auto tmp2 = static_cast(1.4426950408889634); 2023-01-11T21:38:05.9770883Z auto tmp3 = tmp1 * tmp2; 2023-01-11T21:38:05.9770973Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9771055Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:05.9771125Z } 2023-01-11T21:38:05.9771191Z } 2023-01-11T21:38:05.9771257Z } 2023-01-11T21:38:05.9771323Z } 2023-01-11T21:38:05.9771387Z } 2023-01-11T21:38:05.9771464Z ''') 2023-01-11T21:38:05.9771470Z 2023-01-11T21:38:05.9771484Z 2023-01-11T21:38:05.9771571Z async_compile.wait(globals()) 2023-01-11T21:38:05.9771648Z del async_compile 2023-01-11T21:38:05.9771653Z 2023-01-11T21:38:05.9771727Z def call(args): 2023-01-11T21:38:05.9771802Z arg0_1, = args 2023-01-11T21:38:05.9771876Z args.clear() 2023-01-11T21:38:05.9772078Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9772274Z buf1 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9772434Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9772507Z del arg0_1 2023-01-11T21:38:05.9772590Z return (buf0, buf1, ) 2023-01-11T21:38:05.9772595Z 2023-01-11T21:38:05.9772599Z 2023-01-11T21:38:05.9772679Z if __name__ == "__main__": 2023-01-11T21:38:05.9772796Z 
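# Benchmark harness emitted with every generated module: build random inputs matching the traced shapes and strides, then time the compiled call.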
from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9772923Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9773150Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:05.9773261Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9773267Z 2023-01-11T21:38:05.9773331Z ok (1.784s) 2023-01-11T21:38:05.9773789Z test_log_softmax_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9773923Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9774179Z [2023-01-11 21:28:46,049] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 175 2023-01-11T21:38:05.9774185Z 2023-01-11T21:38:05.9774283Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9774358Z import torch 2023-01-11T21:38:05.9774430Z import random 2023-01-11T21:38:05.9774684Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9774814Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9774820Z 2023-01-11T21:38:05.9774913Z aten = torch.ops.aten 2023-01-11T21:38:05.9775071Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9775179Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9775185Z 2023-01-11T21:38:05.9775258Z import triton 2023-01-11T21:38:05.9775351Z import triton.language as tl 2023-01-11T21:38:05.9775474Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9775613Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9775618Z 2023-01-11T21:38:05.9775623Z 2023-01-11T21:38:05.9775761Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9775958Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9776088Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9776198Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9776301Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9776402Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9776549Z float* __restrict__ out_ptr2, 2023-01-11T21:38:05.9776650Z float* __restrict__ out_ptr3, 2023-01-11T21:38:05.9776742Z float* __restrict__ out_ptr4, 2023-01-11T21:38:05.9776838Z float* __restrict__ out_ptr5, 2023-01-11T21:38:05.9776934Z float* __restrict__ out_ptr6, 2023-01-11T21:38:05.9777031Z float* __restrict__ out_ptr7, 2023-01-11T21:38:05.9777180Z float* __restrict__ out_ptr8) 2023-01-11T21:38:05.9777255Z { 2023-01-11T21:38:05.9777359Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9777422Z { 2023-01-11T21:38:05.9777504Z #pragma omp for 2023-01-11T21:38:05.9777590Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9777658Z { 2023-01-11T21:38:05.9777726Z { 2023-01-11T21:38:05.9778095Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.9778334Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9778462Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:05.9778550Z for(long i1=0; i1<1; i1+=1) 
2023-01-11T21:38:05.9778620Z {
2023-01-11T21:38:05.9778768Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9778917Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9779014Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9779178Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2);
2023-01-11T21:38:05.9779249Z }
2023-01-11T21:38:05.9779468Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp3_vec);
2023-01-11T21:38:05.9779597Z #pragma omp simd simdlen(4) reduction(max:tmp3)
2023-01-11T21:38:05.9779684Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:05.9779753Z {
2023-01-11T21:38:05.9779857Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:05.9779959Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:05.9780062Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9780166Z tmp3 = std::max(tmp3, tmp2);
2023-01-11T21:38:05.9780235Z }
2023-01-11T21:38:05.9780315Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:05.9780386Z }
2023-01-11T21:38:05.9780452Z }
2023-01-11T21:38:05.9780534Z #pragma omp for
2023-01-11T21:38:05.9780619Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9780686Z {
2023-01-11T21:38:05.9780746Z {
2023-01-11T21:38:05.9780944Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:05.9781033Z float tmp6 = 0;
2023-01-11T21:38:05.9781158Z auto tmp6_vec = at::vec::Vectorized<float>(tmp6);
2023-01-11T21:38:05.9781515Z #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits<float>::infinity()}})
2023-01-11T21:38:05.9781725Z float tmp7 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9781853Z auto tmp7_vec = at::vec::Vectorized<float>(tmp7);
2023-01-11T21:38:05.9781950Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:05.9782021Z {
2023-01-11T21:38:05.9782159Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9782304Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:05.9782463Z auto tmp3 = at::vec::Vectorized<float>(out_ptr0[i0]);
2023-01-11T21:38:05.9782563Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9782702Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9782797Z auto tmp5 = tmp4.exp();
2023-01-11T21:38:05.9782884Z tmp6_vec += tmp5;
2023-01-11T21:38:05.9783008Z tmp7_vec = at::vec::maximum(tmp7_vec, tmp1);
2023-01-11T21:38:05.9783071Z }
2023-01-11T21:38:05.9783271Z tmp6 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp6_vec);
2023-01-11T21:38:05.9783490Z tmp7 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp7_vec);
2023-01-11T21:38:05.9783639Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(max:tmp7)
2023-01-11T21:38:05.9783735Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:05.9783807Z {
2023-01-11T21:38:05.9783914Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:05.9784015Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:05.9784114Z auto tmp3 = out_ptr0[i0];
2023-01-11T21:38:05.9784203Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:05.9784342Z auto tmp4 = tmp2 - tmp3;
2023-01-11T21:38:05.9784450Z auto tmp5 = std::exp(tmp4);
2023-01-11T21:38:05.9784534Z tmp6 += tmp5;
2023-01-11T21:38:05.9784639Z tmp7 = std::max(tmp7, tmp1);
2023-01-11T21:38:05.9784742Z }
2023-01-11T21:38:05.9784833Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:05.9784925Z out_ptr2[i0] = tmp7;
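// The two row-wise passes above reduce along the contiguous last dimension, so they can use 8-wide vector reductions; the column-wise max and exp-sum passes below access in_ptr0[i0 + 8*i1] with stride 8 and fall back to plain scalar loops.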
2023-01-11T21:38:05.9785005Z } 2023-01-11T21:38:05.9785078Z } 2023-01-11T21:38:05.9785181Z #pragma omp for 2023-01-11T21:38:05.9785272Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9785339Z { 2023-01-11T21:38:05.9785400Z { 2023-01-11T21:38:05.9785468Z { 2023-01-11T21:38:05.9785684Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9785781Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9785852Z { 2023-01-11T21:38:05.9785925Z { 2023-01-11T21:38:05.9786037Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9786146Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9786214Z } 2023-01-11T21:38:05.9786285Z } 2023-01-11T21:38:05.9786374Z out_ptr3[i0] = tmp1; 2023-01-11T21:38:05.9786443Z } 2023-01-11T21:38:05.9786512Z } 2023-01-11T21:38:05.9786578Z } 2023-01-11T21:38:05.9786652Z #pragma omp for 2023-01-11T21:38:05.9786740Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9786808Z { 2023-01-11T21:38:05.9786875Z { 2023-01-11T21:38:05.9786944Z { 2023-01-11T21:38:05.9787030Z float tmp4 = 0; 2023-01-11T21:38:05.9787126Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9787189Z { 2023-01-11T21:38:05.9787266Z { 2023-01-11T21:38:05.9787378Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9787480Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:05.9787627Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9787737Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9787825Z tmp4 += tmp3; 2023-01-11T21:38:05.9787891Z } 2023-01-11T21:38:05.9787962Z } 2023-01-11T21:38:05.9788052Z out_ptr4[i0] = tmp4; 2023-01-11T21:38:05.9788194Z } 2023-01-11T21:38:05.9788263Z } 2023-01-11T21:38:05.9788330Z } 2023-01-11T21:38:05.9788411Z #pragma omp for 2023-01-11T21:38:05.9788490Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9788556Z { 2023-01-11T21:38:05.9788624Z { 2023-01-11T21:38:05.9788819Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9788908Z float tmp4 = 0; 2023-01-11T21:38:05.9789034Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:05.9789131Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9789198Z { 2023-01-11T21:38:05.9789345Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9789478Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.9789621Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9789718Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:05.9789807Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9789876Z } 2023-01-11T21:38:05.9790076Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9790195Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:05.9790290Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9790358Z { 2023-01-11T21:38:05.9790463Z auto tmp0 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:05.9790594Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.9790731Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9790835Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9790920Z tmp4 += tmp3; 2023-01-11T21:38:05.9790982Z } 2023-01-11T21:38:05.9791072Z out_ptr5[i0] = tmp4; 2023-01-11T21:38:05.9791140Z } 2023-01-11T21:38:05.9791206Z } 2023-01-11T21:38:05.9791288Z #pragma omp for 2023-01-11T21:38:05.9791373Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9791433Z { 2023-01-11T21:38:05.9791520Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9791588Z { 2023-01-11T21:38:05.9791734Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9791878Z auto tmp1 = 
at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9792015Z auto tmp3 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9792143Z auto tmp5 = at::vec::Vectorized(out_ptr1[i0]); 2023-01-11T21:38:05.9792286Z auto tmp8 = at::vec::Vectorized::loadu(out_ptr3 + 8*i1); 2023-01-11T21:38:05.9792428Z auto tmp10 = at::vec::Vectorized::loadu(out_ptr4 + 8*i1); 2023-01-11T21:38:05.9792553Z auto tmp13 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:05.9792685Z auto tmp15 = at::vec::Vectorized(out_ptr5[i0]); 2023-01-11T21:38:05.9792778Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9792911Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.9793003Z auto tmp6 = tmp5.log(); 2023-01-11T21:38:05.9793134Z auto tmp7 = tmp4 - tmp6; 2023-01-11T21:38:05.9793265Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:05.9793353Z auto tmp11 = tmp10.log(); 2023-01-11T21:38:05.9793491Z auto tmp12 = tmp9 - tmp11; 2023-01-11T21:38:05.9793626Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:05.9793721Z auto tmp16 = tmp15.log(); 2023-01-11T21:38:05.9793858Z auto tmp17 = tmp14 - tmp16; 2023-01-11T21:38:05.9793966Z tmp7.store(out_ptr6 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794106Z tmp12.store(out_ptr7 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794208Z tmp17.store(out_ptr8 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9794277Z } 2023-01-11T21:38:05.9794374Z #pragma omp simd simdlen(4) 2023-01-11T21:38:05.9794463Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9794532Z { 2023-01-11T21:38:05.9794636Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9794740Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:05.9794834Z auto tmp3 = out_ptr0[i0]; 2023-01-11T21:38:05.9794944Z auto tmp5 = out_ptr1[i0]; 2023-01-11T21:38:05.9795052Z auto tmp8 = out_ptr3[i1]; 2023-01-11T21:38:05.9795160Z auto tmp10 = out_ptr4[i1]; 2023-01-11T21:38:05.9795254Z auto tmp13 = out_ptr2[i0]; 2023-01-11T21:38:05.9795346Z auto tmp15 = out_ptr5[i0]; 2023-01-11T21:38:05.9795442Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:05.9795567Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:05.9795667Z auto tmp6 = std::log(tmp5); 2023-01-11T21:38:05.9795798Z auto tmp7 = tmp4 - tmp6; 2023-01-11T21:38:05.9795930Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:05.9796030Z auto tmp11 = std::log(tmp10); 2023-01-11T21:38:05.9796164Z auto tmp12 = tmp9 - tmp11; 2023-01-11T21:38:05.9796297Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:05.9796398Z auto tmp16 = std::log(tmp15); 2023-01-11T21:38:05.9796528Z auto tmp17 = tmp14 - tmp16; 2023-01-11T21:38:05.9796669Z out_ptr6[i1 + (8*i0)] = tmp7; 2023-01-11T21:38:05.9796769Z out_ptr7[i1 + (8*i0)] = tmp12; 2023-01-11T21:38:05.9796865Z out_ptr8[i1 + (8*i0)] = tmp17; 2023-01-11T21:38:05.9796934Z } 2023-01-11T21:38:05.9797002Z } 2023-01-11T21:38:05.9797070Z } 2023-01-11T21:38:05.9797130Z } 2023-01-11T21:38:05.9797214Z ''') 2023-01-11T21:38:05.9797220Z 2023-01-11T21:38:05.9797224Z 2023-01-11T21:38:05.9797319Z async_compile.wait(globals()) 2023-01-11T21:38:05.9797395Z del async_compile 2023-01-11T21:38:05.9797400Z 2023-01-11T21:38:05.9797476Z def call(args): 2023-01-11T21:38:05.9797559Z arg0_1, arg1_1 = args 2023-01-11T21:38:05.9797634Z args.clear() 2023-01-11T21:38:05.9797823Z buf0 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798018Z buf1 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798206Z buf6 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798395Z buf3 = empty_strided((1, 8), (8, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:05.9798580Z buf4 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798770Z buf7 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9798957Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799142Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799319Z buf8 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9799665Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf8.data_ptr())) 2023-01-11T21:38:05.9799741Z del arg0_1 2023-01-11T21:38:05.9799813Z del arg1_1 2023-01-11T21:38:05.9799901Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:05.9799907Z 2023-01-11T21:38:05.9799911Z 2023-01-11T21:38:05.9799992Z if __name__ == "__main__": 2023-01-11T21:38:05.9800115Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9800272Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9800461Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9800657Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9800778Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:05.9801044Z [2023-01-11 21:28:47,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 175 2023-01-11T21:38:05.9801050Z 2023-01-11T21:38:05.9801122Z ok (1.915s) 2023-01-11T21:38:05.9801577Z test_logsumexp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9801715Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9801977Z [2023-01-11 21:28:47,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 176 2023-01-11T21:38:05.9802236Z [2023-01-11 21:28:49,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 176 2023-01-11T21:38:05.9802242Z 2023-01-11T21:38:05.9802343Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9802411Z import torch 2023-01-11T21:38:05.9802488Z import random 2023-01-11T21:38:05.9802607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9802759Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9802765Z 2023-01-11T21:38:05.9802848Z aten = torch.ops.aten 2023-01-11T21:38:05.9802984Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9803081Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9803087Z 2023-01-11T21:38:05.9803161Z import triton 2023-01-11T21:38:05.9803248Z import triton.language as tl 2023-01-11T21:38:05.9803374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9803518Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9803524Z 2023-01-11T21:38:05.9803528Z 2023-01-11T21:38:05.9803666Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9803874Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9803997Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:05.9804107Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:05.9804225Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9804322Z float* __restrict__ out_ptr1, 2023-01-11T21:38:05.9804424Z float* __restrict__ out_ptr3) 2023-01-11T21:38:05.9804491Z { 2023-01-11T21:38:05.9804584Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:05.9804676Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:05.9804780Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9804839Z { 2023-01-11T21:38:05.9804942Z #pragma omp for 2023-01-11T21:38:05.9805036Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9805122Z { 2023-01-11T21:38:05.9805198Z { 2023-01-11T21:38:05.9805562Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:05.9805776Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9805904Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:05.9805992Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9806062Z { 2023-01-11T21:38:05.9806210Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9806362Z tmp1_vec = at::vec::maximum(tmp1_vec, tmp0); 2023-01-11T21:38:05.9806435Z } 2023-01-11T21:38:05.9806650Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp1_vec); 2023-01-11T21:38:05.9806777Z #pragma omp simd simdlen(4) reduction(max:tmp1) 2023-01-11T21:38:05.9806870Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9806944Z { 2023-01-11T21:38:05.9807042Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9807145Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9807217Z } 2023-01-11T21:38:05.9807304Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:05.9807374Z } 2023-01-11T21:38:05.9807441Z } 
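// Stable logsumexp: with the row max m stored in out_ptr0 above, the next loop accumulates sum(exp(x - m)); the final combine adds log(sum) back to m, replacing an infinite m with 0 first, which matches ATen's guard for all-infinite rows.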
2023-01-11T21:38:05.9807517Z #pragma omp for 2023-01-11T21:38:05.9807605Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9807674Z { 2023-01-11T21:38:05.9807742Z { 2023-01-11T21:38:05.9807935Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:05.9808021Z float tmp4 = 0; 2023-01-11T21:38:05.9808147Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:05.9808243Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:05.9808305Z { 2023-01-11T21:38:05.9808458Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:05.9808593Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:05.9808762Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9808858Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:05.9808949Z tmp4_vec += tmp3; 2023-01-11T21:38:05.9809018Z } 2023-01-11T21:38:05.9809221Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:05.9809340Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:05.9809435Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:05.9809504Z { 2023-01-11T21:38:05.9809607Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:05.9809704Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:05.9809842Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9809943Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9810022Z tmp4 += tmp3; 2023-01-11T21:38:05.9810091Z } 2023-01-11T21:38:05.9810179Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9810249Z } 2023-01-11T21:38:05.9810317Z } 2023-01-11T21:38:05.9810398Z #pragma omp for 2023-01-11T21:38:05.9810483Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9810546Z { 2023-01-11T21:38:05.9810613Z { 2023-01-11T21:38:05.9810683Z { 2023-01-11T21:38:05.9810778Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:05.9810875Z auto tmp2 = out_ptr0[i0]; 2023-01-11T21:38:05.9810978Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9811079Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9811204Z auto tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:05.9811299Z auto tmp5 = tmp3 == tmp4; 2023-01-11T21:38:05.9811411Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.9811517Z auto tmp7 = tmp5 ? 
tmp6 : tmp2; 2023-01-11T21:38:05.9811613Z auto tmp8 = tmp1 + tmp7; 2023-01-11T21:38:05.9811707Z in_out_ptr0[i0] = tmp8; 2023-01-11T21:38:05.9811775Z } 2023-01-11T21:38:05.9811836Z } 2023-01-11T21:38:05.9811936Z } 2023-01-11T21:38:05.9812019Z #pragma omp for 2023-01-11T21:38:05.9812105Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9812172Z { 2023-01-11T21:38:05.9812239Z { 2023-01-11T21:38:05.9812308Z { 2023-01-11T21:38:05.9812517Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:05.9812613Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9812686Z { 2023-01-11T21:38:05.9812761Z { 2023-01-11T21:38:05.9812873Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9812986Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:05.9813059Z } 2023-01-11T21:38:05.9813123Z } 2023-01-11T21:38:05.9813216Z out_ptr2[i0] = tmp1; 2023-01-11T21:38:05.9813287Z } 2023-01-11T21:38:05.9813355Z } 2023-01-11T21:38:05.9813424Z } 2023-01-11T21:38:05.9813508Z #pragma omp for 2023-01-11T21:38:05.9813587Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9813654Z { 2023-01-11T21:38:05.9813721Z { 2023-01-11T21:38:05.9813790Z { 2023-01-11T21:38:05.9813875Z float tmp4 = 0; 2023-01-11T21:38:05.9813970Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:05.9814044Z { 2023-01-11T21:38:05.9814109Z { 2023-01-11T21:38:05.9814218Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:05.9814323Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:05.9814625Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9814736Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:05.9814825Z tmp4 += tmp3; 2023-01-11T21:38:05.9814898Z } 2023-01-11T21:38:05.9814964Z } 2023-01-11T21:38:05.9815056Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:05.9815125Z } 2023-01-11T21:38:05.9815192Z } 2023-01-11T21:38:05.9815259Z } 2023-01-11T21:38:05.9815340Z #pragma omp for 2023-01-11T21:38:05.9815425Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:05.9815485Z { 2023-01-11T21:38:05.9815554Z { 2023-01-11T21:38:05.9815623Z { 2023-01-11T21:38:05.9815719Z auto tmp0 = out_ptr3[i0]; 2023-01-11T21:38:05.9815816Z auto tmp2 = out_ptr2[i0]; 2023-01-11T21:38:05.9815920Z auto tmp1 = std::log(tmp0); 2023-01-11T21:38:05.9816025Z auto tmp3 = std::abs(tmp2); 2023-01-11T21:38:05.9816148Z auto tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:05.9816245Z auto tmp5 = tmp3 == tmp4; 2023-01-11T21:38:05.9816355Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:05.9816462Z auto tmp7 = tmp5 ? 
tmp6 : tmp2; 2023-01-11T21:38:05.9816561Z auto tmp8 = tmp1 + tmp7; 2023-01-11T21:38:05.9816670Z auto tmp9 = static_cast(2); 2023-01-11T21:38:05.9816812Z auto tmp10 = tmp8 - tmp9; 2023-01-11T21:38:05.9816899Z in_out_ptr1[i0] = tmp10; 2023-01-11T21:38:05.9816970Z } 2023-01-11T21:38:05.9817038Z } 2023-01-11T21:38:05.9817105Z } 2023-01-11T21:38:05.9817241Z } 2023-01-11T21:38:05.9817309Z } 2023-01-11T21:38:05.9817395Z ''') 2023-01-11T21:38:05.9817400Z 2023-01-11T21:38:05.9817408Z 2023-01-11T21:38:05.9817496Z async_compile.wait(globals()) 2023-01-11T21:38:05.9817577Z del async_compile 2023-01-11T21:38:05.9817582Z 2023-01-11T21:38:05.9817656Z def call(args): 2023-01-11T21:38:05.9817730Z arg0_1, = args 2023-01-11T21:38:05.9817808Z args.clear() 2023-01-11T21:38:05.9818059Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818253Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818369Z buf2 = as_strided(buf0, (8, ), (1, )); del buf0 # reuse 2023-01-11T21:38:05.9818554Z buf3 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818743Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9818859Z buf5 = as_strided(buf3, (8, ), (1, )); del buf3 # reuse 2023-01-11T21:38:05.9819077Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:05.9819155Z del arg0_1 2023-01-11T21:38:05.9819237Z return (buf2, buf5, ) 2023-01-11T21:38:05.9819243Z 2023-01-11T21:38:05.9819247Z 2023-01-11T21:38:05.9819328Z if __name__ == "__main__": 2023-01-11T21:38:05.9819446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9819567Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9819765Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9819879Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9819885Z 2023-01-11T21:38:05.9819955Z ok (1.846s) 2023-01-11T21:38:05.9820410Z test_long_tensor_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9820578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9820838Z [2023-01-11 21:28:49,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 177 2023-01-11T21:38:05.9821105Z [2023-01-11 21:28:51,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 177 2023-01-11T21:38:05.9821111Z 2023-01-11T21:38:05.9821209Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9821276Z import torch 2023-01-11T21:38:05.9821352Z import random 2023-01-11T21:38:05.9821472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9821601Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9821606Z 2023-01-11T21:38:05.9821688Z aten = torch.ops.aten 2023-01-11T21:38:05.9821826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9821926Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9821935Z 2023-01-11T21:38:05.9822009Z import triton 2023-01-11T21:38:05.9822093Z import triton.language as tl 2023-01-11T21:38:05.9822218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9822358Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9822363Z 2023-01-11T21:38:05.9822371Z 2023-01-11T21:38:05.9822508Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9822720Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9822845Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9822948Z long* __restrict__ out_ptr0, 2023-01-11T21:38:05.9823049Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9823107Z { 2023-01-11T21:38:05.9823209Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9823276Z { 2023-01-11T21:38:05.9823358Z #pragma omp for 2023-01-11T21:38:05.9823449Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:05.9823516Z { 2023-01-11T21:38:05.9823577Z { 2023-01-11T21:38:05.9823646Z { 2023-01-11T21:38:05.9823743Z auto tmp1 = in_ptr0[i0]; 2023-01-11T21:38:05.9823851Z auto tmp0 = static_cast(294); 2023-01-11T21:38:05.9824020Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:05.9824131Z auto tmp3 = static_cast(295); 2023-01-11T21:38:05.9824231Z auto tmp4 = tmp3 + tmp1; 2023-01-11T21:38:05.9824314Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9824407Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:05.9824475Z } 2023-01-11T21:38:05.9824543Z } 2023-01-11T21:38:05.9824609Z } 2023-01-11T21:38:05.9824675Z } 2023-01-11T21:38:05.9824740Z } 2023-01-11T21:38:05.9824817Z ''') 2023-01-11T21:38:05.9824823Z 2023-01-11T21:38:05.9824827Z 2023-01-11T21:38:05.9824925Z async_compile.wait(globals()) 2023-01-11T21:38:05.9825001Z del async_compile 2023-01-11T21:38:05.9825006Z 2023-01-11T21:38:05.9825083Z def call(args): 2023-01-11T21:38:05.9825157Z arg0_1, = args 2023-01-11T21:38:05.9825232Z args.clear() 2023-01-11T21:38:05.9825424Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9825619Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9825779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9825852Z del arg0_1 2023-01-11T21:38:05.9825933Z return (buf0, buf1, ) 2023-01-11T21:38:05.9825938Z 2023-01-11T21:38:05.9825942Z 2023-01-11T21:38:05.9826024Z if __name__ == "__main__": 
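# Harness for the int64 kernel above: 294 - x and 295 + x stay in long arithmetic end to end, so no float casts appear in the generated code.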
2023-01-11T21:38:05.9826143Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9826268Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9826458Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9826593Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9826606Z 2023-01-11T21:38:05.9826670Z ok (1.705s) 2023-01-11T21:38:05.9827189Z test_lowmem_dropout1_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:05.9827270Z warnings.warn( 2023-01-11T21:38:05.9827527Z [2023-01-11 21:28:51,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 178 2023-01-11T21:38:05.9827788Z [2023-01-11 21:28:53,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 178 2023-01-11T21:38:05.9828043Z [2023-01-11 21:28:53,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 178 2023-01-11T21:38:05.9828309Z [2023-01-11 21:28:53,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 178 2023-01-11T21:38:05.9828565Z [2023-01-11 21:28:53,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 179 2023-01-11T21:38:05.9828822Z [2023-01-11 21:28:53,328] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:05.9829075Z [2023-01-11 21:28:55,028] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 179 2023-01-11T21:38:05.9829327Z [2023-01-11 21:28:55,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 179 2023-01-11T21:38:05.9829333Z 2023-01-11T21:38:05.9829436Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9829513Z import torch 2023-01-11T21:38:05.9829589Z import random 2023-01-11T21:38:05.9829710Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9829834Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9829842Z 2023-01-11T21:38:05.9829925Z aten = torch.ops.aten 2023-01-11T21:38:05.9830053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9830149Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9830154Z 2023-01-11T21:38:05.9830227Z import triton 2023-01-11T21:38:05.9830319Z import triton.language as tl 2023-01-11T21:38:05.9830473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9830614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9830619Z 2023-01-11T21:38:05.9830623Z 2023-01-11T21:38:05.9830760Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9830967Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9831083Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9831193Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9831296Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9831365Z { 2023-01-11T21:38:05.9831466Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9831532Z { 2023-01-11T21:38:05.9831615Z #pragma omp for 2023-01-11T21:38:05.9831697Z for(long i0=0; i0<12500; i0+=1) 2023-01-11T21:38:05.9831763Z { 2023-01-11T21:38:05.9831906Z auto tmp0 = 
at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9832043Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9832133Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9832230Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9832297Z } 2023-01-11T21:38:05.9832389Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9832485Z for(long i0=100000; i0<100000; i0+=1) 2023-01-11T21:38:05.9832551Z { 2023-01-11T21:38:05.9832639Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9832726Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9832844Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9832929Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9832989Z } 2023-01-11T21:38:05.9833055Z } 2023-01-11T21:38:05.9833119Z } 2023-01-11T21:38:05.9833204Z ''') 2023-01-11T21:38:05.9833209Z 2023-01-11T21:38:05.9833214Z 2023-01-11T21:38:05.9833309Z async_compile.wait(globals()) 2023-01-11T21:38:05.9833387Z del async_compile 2023-01-11T21:38:05.9833392Z 2023-01-11T21:38:05.9833467Z def call(args): 2023-01-11T21:38:05.9833553Z primals_1, primals_2 = args 2023-01-11T21:38:05.9833629Z args.clear() 2023-01-11T21:38:05.9833832Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9834008Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9834085Z del primals_2 2023-01-11T21:38:05.9834177Z return (buf0, primals_1, ) 2023-01-11T21:38:05.9834182Z 2023-01-11T21:38:05.9834186Z 2023-01-11T21:38:05.9834268Z if __name__ == "__main__": 2023-01-11T21:38:05.9834385Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9834504Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9834713Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9834917Z primals_2 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9835050Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.9835056Z 2023-01-11T21:38:05.9835061Z 2023-01-11T21:38:05.9835160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9835235Z import torch 2023-01-11T21:38:05.9835309Z import random 2023-01-11T21:38:05.9835430Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9835546Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9835551Z 2023-01-11T21:38:05.9835632Z aten = torch.ops.aten 2023-01-11T21:38:05.9835767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9835865Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9835870Z 2023-01-11T21:38:05.9835943Z import triton 2023-01-11T21:38:05.9836036Z import triton.language as tl 2023-01-11T21:38:05.9836161Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9836324Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9836337Z 2023-01-11T21:38:05.9836341Z 2023-01-11T21:38:05.9836471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9836677Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9836801Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9836909Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9837013Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9837079Z { 2023-01-11T21:38:05.9837182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9837244Z { 2023-01-11T21:38:05.9837325Z #pragma omp for 
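// Elementwise product of two 100000-float tensors: 12500 iterations of 8-wide vectors cover the whole tensor, so the simdlen(4) scalar tail loop is empty.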
2023-01-11T21:38:05.9837413Z for(long i0=0; i0<12500; i0+=1) 2023-01-11T21:38:05.9837480Z { 2023-01-11T21:38:05.9837619Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:05.9837756Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:05.9837845Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9837941Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:05.9838002Z } 2023-01-11T21:38:05.9838101Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:05.9838197Z for(long i0=100000; i0<100000; i0+=1) 2023-01-11T21:38:05.9838264Z { 2023-01-11T21:38:05.9838353Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:05.9838442Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:05.9838523Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:05.9838609Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:05.9838720Z } 2023-01-11T21:38:05.9838786Z } 2023-01-11T21:38:05.9838852Z } 2023-01-11T21:38:05.9838937Z ''') 2023-01-11T21:38:05.9838942Z 2023-01-11T21:38:05.9838947Z 2023-01-11T21:38:05.9839044Z async_compile.wait(globals()) 2023-01-11T21:38:05.9839115Z del async_compile 2023-01-11T21:38:05.9839126Z 2023-01-11T21:38:05.9839196Z def call(args): 2023-01-11T21:38:05.9839290Z primals_1, tangents_1 = args 2023-01-11T21:38:05.9839366Z args.clear() 2023-01-11T21:38:05.9839567Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9839746Z kernel_cpp_0(c_void_p(tangents_1.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9839826Z del primals_1 2023-01-11T21:38:05.9839901Z del tangents_1 2023-01-11T21:38:05.9839976Z return (None, buf0, ) 2023-01-11T21:38:05.9839982Z 2023-01-11T21:38:05.9839987Z 2023-01-11T21:38:05.9840065Z if __name__ == "__main__": 2023-01-11T21:38:05.9840184Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9840311Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9840516Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9840724Z tangents_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9840855Z print_performance(lambda: call([primals_1, tangents_1])) 2023-01-11T21:38:05.9840861Z 2023-01-11T21:38:05.9840865Z 2023-01-11T21:38:05.9840961Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9841028Z import torch 2023-01-11T21:38:05.9841102Z import random 2023-01-11T21:38:05.9841221Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9841344Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9841349Z 2023-01-11T21:38:05.9841430Z aten = torch.ops.aten 2023-01-11T21:38:05.9841565Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9841664Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9841669Z 2023-01-11T21:38:05.9841743Z import triton 2023-01-11T21:38:05.9841828Z import triton.language as tl 2023-01-11T21:38:05.9841953Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9842091Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9842286Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:05.9842293Z 2023-01-11T21:38:05.9842297Z 2023-01-11T21:38:05.9842434Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9842642Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9842760Z extern "C" void kernel(const long* __restrict__ seed0, 
2023-01-11T21:38:05.9842874Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9842978Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9843083Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9843152Z { 2023-01-11T21:38:05.9843254Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9843321Z { 2023-01-11T21:38:05.9843402Z #pragma omp for 2023-01-11T21:38:05.9843492Z for(long i0=0; i0<100000; i0+=1) 2023-01-11T21:38:05.9843552Z { 2023-01-11T21:38:05.9843622Z { 2023-01-11T21:38:05.9843692Z { 2023-01-11T21:38:05.9843784Z auto tmp0 = seed0[0]; 2023-01-11T21:38:05.9843884Z auto tmp6 = in_ptr1[i0]; 2023-01-11T21:38:05.9843982Z auto tmp7 = in_ptr2[i0]; 2023-01-11T21:38:05.9844081Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:05.9844223Z auto tmp2 = static_cast(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:05.9844333Z auto tmp3 = static_cast(0.33); 2023-01-11T21:38:05.9844431Z auto tmp4 = tmp2 > tmp3; 2023-01-11T21:38:05.9844573Z auto tmp5 = static_cast(tmp4); 2023-01-11T21:38:05.9844669Z auto tmp8 = tmp6 * tmp7; 2023-01-11T21:38:05.9844763Z auto tmp9 = tmp5 * tmp8; 2023-01-11T21:38:05.9844886Z auto tmp10 = static_cast(1.492537313432836); 2023-01-11T21:38:05.9844981Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9845072Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9845141Z } 2023-01-11T21:38:05.9845208Z } 2023-01-11T21:38:05.9845277Z } 2023-01-11T21:38:05.9845348Z } 2023-01-11T21:38:05.9845420Z } 2023-01-11T21:38:05.9845510Z ''') 2023-01-11T21:38:05.9845516Z 2023-01-11T21:38:05.9845522Z 2023-01-11T21:38:05.9845637Z async_compile.wait(globals()) 2023-01-11T21:38:05.9845715Z del async_compile 2023-01-11T21:38:05.9845720Z 2023-01-11T21:38:05.9845795Z def call(args): 2023-01-11T21:38:05.9845886Z primals_1, primals_2 = args 2023-01-11T21:38:05.9845961Z args.clear() 2023-01-11T21:38:05.9846103Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None) 2023-01-11T21:38:05.9846295Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9846512Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9846589Z del primals_2 2023-01-11T21:38:05.9846707Z return (buf0, primals_1, seed_cpu_None.clone(), ) 2023-01-11T21:38:05.9846713Z 2023-01-11T21:38:05.9846717Z 2023-01-11T21:38:05.9846800Z if __name__ == "__main__": 2023-01-11T21:38:05.9846917Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9847044Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9847238Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9847437Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9847639Z primals_2 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9847771Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:05.9847776Z 2023-01-11T21:38:05.9848071Z [2023-01-11 21:28:56,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 179 2023-01-11T21:38:05.9848077Z 2023-01-11T21:38:05.9848176Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9848251Z import torch 2023-01-11T21:38:05.9848329Z import random 2023-01-11T21:38:05.9848449Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9848565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9848578Z 2023-01-11T21:38:05.9848653Z aten = 
torch.ops.aten 2023-01-11T21:38:05.9848789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9848884Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9848889Z 2023-01-11T21:38:05.9848966Z import triton 2023-01-11T21:38:05.9849061Z import triton.language as tl 2023-01-11T21:38:05.9849185Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9849325Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9849331Z 2023-01-11T21:38:05.9849335Z 2023-01-11T21:38:05.9849472Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9849673Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9849797Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:05.9849907Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:05.9850017Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:05.9850121Z float* __restrict__ out_ptr0) 2023-01-11T21:38:05.9850188Z { 2023-01-11T21:38:05.9850291Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9850350Z { 2023-01-11T21:38:05.9850431Z #pragma omp for 2023-01-11T21:38:05.9850552Z for(long i0=0; i0<100000; i0+=1) 2023-01-11T21:38:05.9850619Z { 2023-01-11T21:38:05.9850688Z { 2023-01-11T21:38:05.9850756Z { 2023-01-11T21:38:05.9850846Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:05.9850944Z auto tmp6 = in_ptr1[i0]; 2023-01-11T21:38:05.9851043Z auto tmp10 = in_ptr2[i0]; 2023-01-11T21:38:05.9851149Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:05.9851293Z auto tmp2 = static_cast(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:05.9851403Z auto tmp3 = static_cast(0.33); 2023-01-11T21:38:05.9851501Z auto tmp4 = tmp2 > tmp3; 2023-01-11T21:38:05.9851613Z auto tmp5 = static_cast(tmp4); 2023-01-11T21:38:05.9851702Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:05.9851821Z auto tmp8 = static_cast(1.492537313432836); 2023-01-11T21:38:05.9851919Z auto tmp9 = tmp7 * tmp8; 2023-01-11T21:38:05.9852018Z auto tmp11 = tmp9 * tmp10; 2023-01-11T21:38:05.9852109Z out_ptr0[i0] = tmp11; 2023-01-11T21:38:05.9852178Z } 2023-01-11T21:38:05.9852246Z } 2023-01-11T21:38:05.9852306Z } 2023-01-11T21:38:05.9852378Z } 2023-01-11T21:38:05.9852444Z } 2023-01-11T21:38:05.9852530Z ''') 2023-01-11T21:38:05.9852535Z 2023-01-11T21:38:05.9852540Z 2023-01-11T21:38:05.9852635Z async_compile.wait(globals()) 2023-01-11T21:38:05.9852712Z del async_compile 2023-01-11T21:38:05.9852717Z 2023-01-11T21:38:05.9852791Z def call(args): 2023-01-11T21:38:05.9852899Z primals_1, philox_seed_like, tangents_1 = args 2023-01-11T21:38:05.9852974Z args.clear() 2023-01-11T21:38:05.9853176Z buf0 = empty_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9853397Z kernel_cpp_0(c_void_p(philox_seed_like.data_ptr()), c_void_p(tangents_1.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:05.9853486Z del philox_seed_like 2023-01-11T21:38:05.9853562Z del primals_1 2023-01-11T21:38:05.9853638Z del tangents_1 2023-01-11T21:38:05.9853713Z return (None, buf0, ) 2023-01-11T21:38:05.9853726Z 2023-01-11T21:38:05.9853730Z 2023-01-11T21:38:05.9853834Z if __name__ == "__main__": 2023-01-11T21:38:05.9853954Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9854079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9854285Z primals_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9854591Z philox_seed_like = rand_strided((), (), 
device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9854805Z tangents_1 = rand_strided((100000, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9854965Z print_performance(lambda: call([primals_1, philox_seed_like, tangents_1]))
2023-01-11T21:38:05.9854975Z 
2023-01-11T21:38:05.9855062Z ok (5.232s)
2023-01-11T21:38:05.9855411Z test_lowmem_dropout2_cpu (__main__.CpuTests) ... [2023-01-11 21:28:56,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 180
2023-01-11T21:38:05.9855667Z [2023-01-11 21:28:56,935] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
2023-01-11T21:38:05.9855933Z [2023-01-11 21:28:58,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 180
2023-01-11T21:38:05.9856190Z [2023-01-11 21:28:58,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856452Z [2023-01-11 21:29:01,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856457Z 
2023-01-11T21:38:05.9856555Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9856632Z import torch
2023-01-11T21:38:05.9856709Z import random
2023-01-11T21:38:05.9856868Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9856992Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9856997Z 
2023-01-11T21:38:05.9857079Z aten = torch.ops.aten
2023-01-11T21:38:05.9857274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9857388Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9857393Z 
2023-01-11T21:38:05.9857478Z import triton
2023-01-11T21:38:05.9857571Z import triton.language as tl
2023-01-11T21:38:05.9857697Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9857828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9857992Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce
2023-01-11T21:38:05.9857997Z 
2023-01-11T21:38:05.9858002Z 
2023-01-11T21:38:05.9858140Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9858344Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9858471Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9858578Z const long* __restrict__ seed0)
2023-01-11T21:38:05.9858643Z {
2023-01-11T21:38:05.9858744Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9858803Z {
2023-01-11T21:38:05.9858889Z #pragma omp for
2023-01-11T21:38:05.9858977Z for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9859044Z {
2023-01-11T21:38:05.9859111Z {
2023-01-11T21:38:05.9859183Z {
2023-01-11T21:38:05.9859268Z auto tmp0 = seed0[0];
2023-01-11T21:38:05.9859372Z auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9859478Z auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9859626Z auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9859736Z auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9859838Z auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9859948Z auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9860044Z auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9860148Z auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9860285Z auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9860382Z in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9860452Z }
2023-01-11T21:38:05.9860520Z }
2023-01-11T21:38:05.9860587Z }
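// kernel_cpp_1 below repeats the masked scale for the second dropout, offsetting the random-stream index by 256 (one slot per element of the first layer) so both layers draw independent values from the same seed.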
2023-01-11T21:38:05.9855411Z test_lowmem_dropout2_cpu (__main__.CpuTests) ... [2023-01-11 21:28:56,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 180
2023-01-11T21:38:05.9855667Z [2023-01-11 21:28:56,935] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager
2023-01-11T21:38:05.9855933Z [2023-01-11 21:28:58,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 180
2023-01-11T21:38:05.9856190Z [2023-01-11 21:28:58,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856452Z [2023-01-11 21:29:01,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 180
2023-01-11T21:38:05.9856457Z 
2023-01-11T21:38:05.9856555Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9856632Z import torch
2023-01-11T21:38:05.9856709Z import random
2023-01-11T21:38:05.9856868Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9856992Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9856997Z 
2023-01-11T21:38:05.9857079Z aten = torch.ops.aten
2023-01-11T21:38:05.9857274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9857388Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9857393Z 
2023-01-11T21:38:05.9857478Z import triton
2023-01-11T21:38:05.9857571Z import triton.language as tl
2023-01-11T21:38:05.9857697Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9857828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9857992Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce
2023-01-11T21:38:05.9857997Z 
2023-01-11T21:38:05.9858002Z 
2023-01-11T21:38:05.9858140Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9858344Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9858471Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9858578Z                        const long* __restrict__ seed0)
2023-01-11T21:38:05.9858643Z {
2023-01-11T21:38:05.9858744Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9858803Z     {
2023-01-11T21:38:05.9858889Z         #pragma omp for
2023-01-11T21:38:05.9858977Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9859044Z         {
2023-01-11T21:38:05.9859111Z             {
2023-01-11T21:38:05.9859183Z                 {
2023-01-11T21:38:05.9859268Z                     auto tmp0 = seed0[0];
2023-01-11T21:38:05.9859372Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9859478Z                     auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9859626Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9859736Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9859838Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9859948Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9860044Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9860148Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9860285Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9860382Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9860452Z                 }
2023-01-11T21:38:05.9860520Z             }
2023-01-11T21:38:05.9860587Z         }
2023-01-11T21:38:05.9860653Z     }
2023-01-11T21:38:05.9860711Z }
2023-01-11T21:38:05.9860797Z ''')
2023-01-11T21:38:05.9860802Z 
2023-01-11T21:38:05.9860807Z 
2023-01-11T21:38:05.9860942Z kernel_cpp_1 = async_compile.cpp('''
2023-01-11T21:38:05.9861149Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9861270Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9861378Z                        const long* __restrict__ seed0)
2023-01-11T21:38:05.9861442Z {
2023-01-11T21:38:05.9861537Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9861604Z     {
2023-01-11T21:38:05.9861685Z         #pragma omp for
2023-01-11T21:38:05.9861772Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9861841Z         {
2023-01-11T21:38:05.9861909Z             {
2023-01-11T21:38:05.9861978Z                 {
2023-01-11T21:38:05.9862061Z                     auto tmp0 = seed0[0];
2023-01-11T21:38:05.9862163Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9862276Z                     auto tmp1 = static_cast<long>(256 + i0);
2023-01-11T21:38:05.9862421Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9862531Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9862626Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9862737Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9862857Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9862969Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9863064Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9863160Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9863236Z                 }
2023-01-11T21:38:05.9863303Z             }
2023-01-11T21:38:05.9863371Z         }
2023-01-11T21:38:05.9863431Z     }
2023-01-11T21:38:05.9863494Z }
2023-01-11T21:38:05.9863580Z ''')
2023-01-11T21:38:05.9863585Z 
2023-01-11T21:38:05.9863590Z 
2023-01-11T21:38:05.9863684Z async_compile.wait(globals())
2023-01-11T21:38:05.9863761Z del async_compile
2023-01-11T21:38:05.9863766Z 
2023-01-11T21:38:05.9863840Z def call(args):
2023-01-11T21:38:05.9863946Z     primals_1, primals_2, primals_3 = args
2023-01-11T21:38:05.9864015Z     args.clear()
2023-01-11T21:38:05.9864153Z     torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
2023-01-11T21:38:05.9864354Z     buf0 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9864492Z     aten.mm.out(primals_3, as_strided(primals_1, (32, 32), (1, 32)), out=buf0)
2023-01-11T21:38:05.9864568Z     del primals_1
2023-01-11T21:38:05.9864659Z     buf1 = buf0; del buf0 # reuse
2023-01-11T21:38:05.9864810Z     kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(seed_cpu_None.data_ptr()))
2023-01-11T21:38:05.9865006Z     buf2 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9865128Z     aten.mm.out(buf1, as_strided(primals_2, (32, 32), (1, 32)), out=buf2)
2023-01-11T21:38:05.9865242Z     buf3 = buf2; del buf2 # reuse
2023-01-11T21:38:05.9865410Z     kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(seed_cpu_None.data_ptr()))
2023-01-11T21:38:05.9865574Z     return (buf3, primals_3, seed_cpu_None.clone(), buf1, as_strided(primals_2, (32, 32), (32, 1)), )
2023-01-11T21:38:05.9865580Z 
2023-01-11T21:38:05.9865586Z 
2023-01-11T21:38:05.9865665Z if __name__ == "__main__":
2023-01-11T21:38:05.9865782Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9865908Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9866103Z     seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9866333Z     primals_1 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866534Z     primals_2 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866732Z     primals_3 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9866877Z     print_performance(lambda: call([primals_1, primals_2, primals_3]))
2023-01-11T21:38:05.9866882Z 
2023-01-11T21:38:05.9866887Z 
2023-01-11T21:38:05.9866989Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9867064Z import torch
2023-01-11T21:38:05.9867138Z import random
2023-01-11T21:38:05.9867259Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9867380Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9867385Z 
2023-01-11T21:38:05.9867467Z aten = torch.ops.aten
2023-01-11T21:38:05.9867603Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9867699Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9867707Z 
2023-01-11T21:38:05.9867783Z import triton
2023-01-11T21:38:05.9867873Z import triton.language as tl
2023-01-11T21:38:05.9867998Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9868135Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9868141Z 
2023-01-11T21:38:05.9868145Z 
2023-01-11T21:38:05.9868274Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9868478Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9868600Z extern "C" void kernel(const long* __restrict__ in_ptr0,
2023-01-11T21:38:05.9868714Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9868847Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9868912Z {
2023-01-11T21:38:05.9869015Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9869074Z     {
2023-01-11T21:38:05.9869155Z         #pragma omp for
2023-01-11T21:38:05.9869245Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9869314Z         {
2023-01-11T21:38:05.9869381Z             {
2023-01-11T21:38:05.9869450Z                 {
2023-01-11T21:38:05.9869546Z                     auto tmp0 = in_ptr0[0];
2023-01-11T21:38:05.9869637Z                     auto tmp6 = in_ptr1[i0];
2023-01-11T21:38:05.9869750Z                     auto tmp1 = static_cast<long>(256 + i0);
2023-01-11T21:38:05.9869896Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9870005Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9870101Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9870217Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9870317Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9870420Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9870515Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9870607Z                     out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9870679Z                 }
2023-01-11T21:38:05.9870746Z             }
2023-01-11T21:38:05.9870813Z         }
2023-01-11T21:38:05.9870879Z     }
2023-01-11T21:38:05.9870936Z }
2023-01-11T21:38:05.9871021Z ''')
2023-01-11T21:38:05.9871026Z 
2023-01-11T21:38:05.9871030Z 
2023-01-11T21:38:05.9871167Z kernel_cpp_1 = async_compile.cpp('''
2023-01-11T21:38:05.9871371Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9871492Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:05.9871601Z                        const long* __restrict__ in_ptr0)
2023-01-11T21:38:05.9871670Z {
2023-01-11T21:38:05.9871771Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9871831Z     {
2023-01-11T21:38:05.9871911Z         #pragma omp for
2023-01-11T21:38:05.9871999Z         for(long i0=0; i0<256; i0+=1)
2023-01-11T21:38:05.9872067Z         {
2023-01-11T21:38:05.9872134Z             {
2023-01-11T21:38:05.9872235Z                 {
2023-01-11T21:38:05.9872324Z                     auto tmp0 = in_ptr0[0];
2023-01-11T21:38:05.9872426Z                     auto tmp6 = in_out_ptr0[i0];
2023-01-11T21:38:05.9872534Z                     auto tmp1 = static_cast<long>(i0);
2023-01-11T21:38:05.9872677Z                     auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));
2023-01-11T21:38:05.9872788Z                     auto tmp3 = static_cast<float>(0.5);
2023-01-11T21:38:05.9872883Z                     auto tmp4 = tmp2 > tmp3;
2023-01-11T21:38:05.9872994Z                     auto tmp5 = static_cast<float>(tmp4);
2023-01-11T21:38:05.9873094Z                     auto tmp7 = tmp5 * tmp6;
2023-01-11T21:38:05.9873195Z                     auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9873290Z                     auto tmp9 = tmp7 * tmp8;
2023-01-11T21:38:05.9873384Z                     in_out_ptr0[i0] = tmp9;
2023-01-11T21:38:05.9873455Z                 }
2023-01-11T21:38:05.9873527Z             }
2023-01-11T21:38:05.9873594Z         }
2023-01-11T21:38:05.9873653Z     }
2023-01-11T21:38:05.9873719Z }
2023-01-11T21:38:05.9873804Z ''')
2023-01-11T21:38:05.9873810Z 
2023-01-11T21:38:05.9873814Z 
2023-01-11T21:38:05.9873909Z async_compile.wait(globals())
2023-01-11T21:38:05.9873986Z del async_compile
2023-01-11T21:38:05.9873991Z 
2023-01-11T21:38:05.9874069Z def call(args):
2023-01-11T21:38:05.9874207Z     primals_3, philox_seed_like, mul_1, permute_4, tangents_1 = args
2023-01-11T21:38:05.9874288Z     args.clear()
2023-01-11T21:38:05.9874479Z     buf0 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9874698Z     kernel_cpp_0(c_void_p(philox_seed_like.data_ptr()), c_void_p(tangents_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9874779Z     del tangents_1
2023-01-11T21:38:05.9874979Z     buf1 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9875106Z     aten.mm.out(as_strided(buf0, (32, 8), (1, 32)), mul_1, out=buf1)
2023-01-11T21:38:05.9875176Z     del mul_1
2023-01-11T21:38:05.9875369Z     buf2 = empty_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9875465Z     aten.mm.out(buf0, permute_4, out=buf2)
2023-01-11T21:38:05.9875537Z     del buf0
2023-01-11T21:38:05.9875614Z     del permute_4
2023-01-11T21:38:05.9875704Z     buf3 = buf2; del buf2 # reuse
2023-01-11T21:38:05.9875858Z     kernel_cpp_1(c_void_p(buf3.data_ptr()), c_void_p(philox_seed_like.data_ptr()))
2023-01-11T21:38:05.9875943Z     del philox_seed_like
2023-01-11T21:38:05.9876139Z     buf4 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9876265Z     aten.mm.out(as_strided(buf3, (32, 8), (1, 32)), primals_3, out=buf4)
2023-01-11T21:38:05.9876339Z     del buf3
2023-01-11T21:38:05.9876415Z     del primals_3
2023-01-11T21:38:05.9876552Z     return (as_strided(buf4, (32, 32), (32, 1)), as_strided(buf1, (32, 32), (32, 1)), None, )
2023-01-11T21:38:05.9876558Z 
2023-01-11T21:38:05.9876565Z 
2023-01-11T21:38:05.9876646Z if __name__ == "__main__":
2023-01-11T21:38:05.9876768Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9876897Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9877103Z     primals_3 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877294Z     philox_seed_like = rand_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9877488Z     mul_1 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877691Z     permute_4 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9877892Z     tangents_1 = rand_strided((8, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9878063Z     print_performance(lambda: call([primals_3, philox_seed_like, mul_1, permute_4, tangents_1]))
2023-01-11T21:38:05.9878069Z 
2023-01-11T21:38:05.9878140Z ok (4.423s)
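Two details of the call() wrappers above are worth noting: each dropout is applied in place on the matmul's output buffer (buf1 = buf0; del buf0 # reuse, with the kernels taking an in_out_ptr0), so no extra tensor is allocated between layers, and the second kernel offsets the Philox stream with static_cast<long>(256 + i0) so both dropout layers draw disjoint randomness from one seed. A rough eager equivalent of the compiled forward, assuming the two-linear-layer structure implied by the (8, 32) and (32, 32) buffers (names are illustrative):

import torch
import torch.nn.functional as F

def forward(x, w1, w2, p=0.5):
    # mm into a fresh buffer, then dropout applied in place on that buffer, twice
    h = F.dropout(x @ w1.t(), p=p, training=True)      # buf0 reused as buf1
    return F.dropout(h @ w2.t(), p=p, training=True)   # buf2 reused as buf3

x, w1, w2 = torch.randn(8, 32), torch.randn(32, 32), torch.randn(32, 32)
print(forward(x, w1, w2).shape)  # torch.Size([8, 32])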
2023-01-11T21:38:05.9878627Z test_masked_fill_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9878764Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9879023Z [2023-01-11 21:29:01,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 181
2023-01-11T21:38:05.9879286Z [2023-01-11 21:29:03,275] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 181
2023-01-11T21:38:05.9879294Z 
2023-01-11T21:38:05.9879386Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9879465Z import torch
2023-01-11T21:38:05.9879542Z import random
2023-01-11T21:38:05.9879663Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9879787Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9879793Z 
2023-01-11T21:38:05.9879875Z aten = torch.ops.aten
2023-01-11T21:38:05.9880015Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9880111Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9880116Z 
2023-01-11T21:38:05.9880184Z import triton
2023-01-11T21:38:05.9880276Z import triton.language as tl
2023-01-11T21:38:05.9880401Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9880541Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9880629Z 
2023-01-11T21:38:05.9880634Z 
2023-01-11T21:38:05.9880773Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9880979Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9881103Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9881214Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9881311Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9881413Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9881480Z {
2023-01-11T21:38:05.9881585Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9881652Z     {
2023-01-11T21:38:05.9881735Z         #pragma omp for
2023-01-11T21:38:05.9881823Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9881884Z         {
2023-01-11T21:38:05.9881969Z             #pragma GCC ivdep
2023-01-11T21:38:05.9882062Z             for(long i1=0; i1<16; i1+=1)
2023-01-11T21:38:05.9882133Z             {
2023-01-11T21:38:05.9882206Z                 {
2023-01-11T21:38:05.9882277Z                     {
2023-01-11T21:38:05.9882371Z                         auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9882483Z                         auto tmp2 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9882654Z                         auto tmp1 = static_cast<float>(-10000.0);
2023-01-11T21:38:05.9882766Z                         auto tmp3 = tmp0 ? tmp1 : tmp2;
2023-01-11T21:38:05.9882877Z                         auto tmp4 = static_cast<float>(2);
2023-01-11T21:38:05.9882979Z                         auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:05.9883076Z                         auto tmp6 = tmp0 == 0;
2023-01-11T21:38:05.9883192Z                         auto tmp7 = static_cast<float>(667.0);
2023-01-11T21:38:05.9883300Z                         auto tmp8 = static_cast<float>(2.0);
2023-01-11T21:38:05.9883398Z                         auto tmp9 = tmp2 / tmp8;
2023-01-11T21:38:05.9883505Z                         auto tmp10 = tmp6 ? tmp7 : tmp9;
2023-01-11T21:38:05.9883611Z                         out_ptr0[i1 + (16*i0)] = tmp5;
2023-01-11T21:38:05.9883714Z                         out_ptr1[i1 + (16*i0)] = tmp10;
2023-01-11T21:38:05.9883786Z                     }
2023-01-11T21:38:05.9883857Z                 }
2023-01-11T21:38:05.9883918Z             }
2023-01-11T21:38:05.9883985Z         }
2023-01-11T21:38:05.9884081Z     }
2023-01-11T21:38:05.9884146Z }
2023-01-11T21:38:05.9884231Z ''')
2023-01-11T21:38:05.9884237Z 
2023-01-11T21:38:05.9884241Z 
2023-01-11T21:38:05.9884337Z async_compile.wait(globals())
2023-01-11T21:38:05.9884414Z del async_compile
2023-01-11T21:38:05.9884419Z 
2023-01-11T21:38:05.9884487Z def call(args):
2023-01-11T21:38:05.9884567Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9884644Z     args.clear()
2023-01-11T21:38:05.9884843Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9885042Z     buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9885240Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9885314Z     del arg0_1
2023-01-11T21:38:05.9885379Z     del arg1_1
2023-01-11T21:38:05.9885465Z     return (buf0, buf1, )
2023-01-11T21:38:05.9885470Z 
2023-01-11T21:38:05.9885475Z 
2023-01-11T21:38:05.9885558Z if __name__ == "__main__":
2023-01-11T21:38:05.9885677Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9885804Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9885998Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9886196Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9886317Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9886322Z 
2023-01-11T21:38:05.9886386Z ok (2.153s)
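Both returned tensors come out of the single kernel_cpp_0 above: one pass over the 16x16 input computes tmp5 = where(mask, -10000.0, x) + 2 and tmp10 = where(mask == 0, 667.0, x / 2.0), i.e. the two masked_fill expressions are fused with their follow-up arithmetic into one loop. An eager-mode sketch consistent with the kernel (the exact test body lives in test_torchinductor.py):

import torch

def fn(mask, x):
    a = torch.masked_fill(x, mask, -10000.0) + 2          # tmp1..tmp5
    b = torch.where(~mask, torch.tensor(667.0), x / 2.0)  # tmp6..tmp10
    return a, b

mask = torch.rand(1, 16) > 0.5
x = torch.randn(16, 16)
a, b = fn(mask, x)
print(a.shape, b.shape)  # torch.Size([16, 16]) torch.Size([16, 16])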
2023-01-11T21:38:05.9886908Z test_masked_fill_promotion_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
2023-01-11T21:38:05.9887024Z   warnings.warn(
2023-01-11T21:38:05.9887286Z [2023-01-11 21:29:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 182
2023-01-11T21:38:05.9887551Z [2023-01-11 21:29:05,432] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 182
2023-01-11T21:38:05.9887806Z [2023-01-11 21:29:05,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 183
2023-01-11T21:38:05.9888068Z [2023-01-11 21:29:07,157] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 183
2023-01-11T21:38:05.9888074Z 
2023-01-11T21:38:05.9888174Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9888248Z import torch
2023-01-11T21:38:05.9888319Z import random
2023-01-11T21:38:05.9888440Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9888564Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9888569Z 
2023-01-11T21:38:05.9888654Z aten = torch.ops.aten
2023-01-11T21:38:05.9888791Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9888895Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9888900Z 
2023-01-11T21:38:05.9888975Z import triton
2023-01-11T21:38:05.9889067Z import triton.language as tl
2023-01-11T21:38:05.9889185Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9889325Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9889331Z 
2023-01-11T21:38:05.9889335Z 
2023-01-11T21:38:05.9889472Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9889676Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9889800Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9889911Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9890016Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:05.9890081Z {
2023-01-11T21:38:05.9890176Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9890243Z     {
2023-01-11T21:38:05.9890353Z         #pragma omp for
2023-01-11T21:38:05.9890443Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9890510Z         {
2023-01-11T21:38:05.9890600Z             for(long i1=0; i1<2; i1+=1)
2023-01-11T21:38:05.9890661Z             {
2023-01-11T21:38:05.9890770Z                 float g_tmp_buffer_in_ptr0[8] = {0};
2023-01-11T21:38:05.9890901Z                 flag_to_float(in_ptr0 + 8*i1, g_tmp_buffer_in_ptr0, 8);
2023-01-11T21:38:05.9891055Z                 auto tmp0 = at::vec::Vectorized<float>::loadu(g_tmp_buffer_in_ptr0);
2023-01-11T21:38:05.9891202Z                 auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (16*i0));
2023-01-11T21:38:05.9891349Z                 auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3.5));
2023-01-11T21:38:05.9891480Z                 auto tmp3 = decltype(tmp1)::blendv(tmp2, tmp1, tmp0);
2023-01-11T21:38:05.9891589Z                 tmp3.store(out_ptr0 + (8*i1) + (16*i0));
2023-01-11T21:38:05.9891658Z             }
2023-01-11T21:38:05.9891752Z             #pragma omp simd simdlen(4)
2023-01-11T21:38:05.9891846Z             for(long i1=16; i1<16; i1+=1)
2023-01-11T21:38:05.9891917Z             {
2023-01-11T21:38:05.9892011Z                 auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9892114Z                 auto tmp2 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9892220Z                 auto tmp1 = static_cast<float>(3.5);
2023-01-11T21:38:05.9892323Z                 auto tmp3 = tmp0 ? tmp1 : tmp2;
2023-01-11T21:38:05.9892412Z                 out_ptr0[i1 + (16*i0)] = tmp3;
2023-01-11T21:38:05.9892480Z             }
2023-01-11T21:38:05.9892547Z         }
2023-01-11T21:38:05.9892614Z     }
2023-01-11T21:38:05.9892705Z }
2023-01-11T21:38:05.9892791Z ''')
2023-01-11T21:38:05.9892796Z 
2023-01-11T21:38:05.9892801Z 
2023-01-11T21:38:05.9892896Z async_compile.wait(globals())
2023-01-11T21:38:05.9892967Z del async_compile
2023-01-11T21:38:05.9892973Z 
2023-01-11T21:38:05.9893049Z def call(args):
2023-01-11T21:38:05.9893130Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9893208Z     args.clear()
2023-01-11T21:38:05.9893408Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9893575Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9893649Z     del arg0_1
2023-01-11T21:38:05.9893713Z     del arg1_1
2023-01-11T21:38:05.9893790Z     return (buf0, )
2023-01-11T21:38:05.9893795Z 
2023-01-11T21:38:05.9893799Z 
2023-01-11T21:38:05.9893880Z if __name__ == "__main__":
2023-01-11T21:38:05.9893998Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9894124Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9894321Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9894632Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9894754Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9894763Z 
2023-01-11T21:38:05.9894767Z 
2023-01-11T21:38:05.9894857Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9894932Z import torch
2023-01-11T21:38:05.9895008Z import random
2023-01-11T21:38:05.9895131Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9895254Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9895259Z 
2023-01-11T21:38:05.9895343Z aten = torch.ops.aten
2023-01-11T21:38:05.9895479Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9895576Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9895581Z 
2023-01-11T21:38:05.9895648Z import triton
2023-01-11T21:38:05.9895745Z import triton.language as tl
2023-01-11T21:38:05.9895869Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9896008Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9896013Z 
2023-01-11T21:38:05.9896017Z 
2023-01-11T21:38:05.9896154Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9896410Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9896534Z extern "C" void kernel(const bool* __restrict__ in_ptr0,
2023-01-11T21:38:05.9896643Z                        const long* __restrict__ in_ptr1,
2023-01-11T21:38:05.9896738Z                        long* __restrict__ out_ptr0)
2023-01-11T21:38:05.9896803Z {
2023-01-11T21:38:05.9896904Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9896970Z     {
2023-01-11T21:38:05.9897051Z         #pragma omp for
2023-01-11T21:38:05.9897199Z         for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:05.9897261Z         {
2023-01-11T21:38:05.9897349Z             #pragma GCC ivdep
2023-01-11T21:38:05.9897440Z             for(long i1=0; i1<16; i1+=1)
2023-01-11T21:38:05.9897508Z             {
2023-01-11T21:38:05.9897577Z                 {
2023-01-11T21:38:05.9897654Z                     {
2023-01-11T21:38:05.9897755Z                         auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:05.9897858Z                         auto tmp3 = in_ptr1[i1 + (16*i0)];
2023-01-11T21:38:05.9897974Z                         auto tmp1 = static_cast<float>(3.5);
2023-01-11T21:38:05.9898087Z                         auto tmp2 = static_cast<long>(tmp1);
2023-01-11T21:38:05.9898192Z                         auto tmp4 = tmp0 ? tmp2 : tmp3;
2023-01-11T21:38:05.9898297Z                         out_ptr0[i1 + (16*i0)] = tmp4;
2023-01-11T21:38:05.9898368Z                     }
2023-01-11T21:38:05.9898437Z                 }
2023-01-11T21:38:05.9898497Z             }
2023-01-11T21:38:05.9898565Z         }
2023-01-11T21:38:05.9898631Z     }
2023-01-11T21:38:05.9898736Z }
2023-01-11T21:38:05.9898826Z ''')
2023-01-11T21:38:05.9898831Z 
2023-01-11T21:38:05.9898836Z 
2023-01-11T21:38:05.9898929Z async_compile.wait(globals())
2023-01-11T21:38:05.9899007Z del async_compile
2023-01-11T21:38:05.9899012Z 
2023-01-11T21:38:05.9899079Z def call(args):
2023-01-11T21:38:05.9899159Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9899239Z     args.clear()
2023-01-11T21:38:05.9899438Z     buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9899605Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:05.9899679Z     del arg0_1
2023-01-11T21:38:05.9899750Z     del arg1_1
2023-01-11T21:38:05.9899818Z     return (buf0, )
2023-01-11T21:38:05.9899830Z 
2023-01-11T21:38:05.9899834Z 
2023-01-11T21:38:05.9899907Z if __name__ == "__main__":
2023-01-11T21:38:05.9900028Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9900154Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9900350Z     arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:05.9900548Z     arg1_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9900667Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9900672Z 
2023-01-11T21:38:05.9900746Z ok (3.877s)
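The promotion being tested shows up in the second (int64) kernel as a pair of scalar casts: the fill value is materialized as static_cast<float>(3.5) and then static_cast<long>(tmp1), so integer outputs receive the truncated value 3 before the per-element select. The same cast-then-select order in eager terms:

import torch

x = torch.zeros(4, dtype=torch.int64)
mask = torch.tensor([True, False, True, False])
# the float fill value is converted to the output dtype first (3.5 -> 3),
# then selected per element, mirroring tmp2 = static_cast<long>(tmp1)
filled = torch.where(mask, torch.tensor(3.5).to(torch.int64), x)
print(filled)  # tensor([3, 0, 3, 0])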
2023-01-11T21:38:05.9901198Z test_max_min_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9901330Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9901584Z [2023-01-11 21:29:07,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 184
2023-01-11T21:38:05.9901850Z [2023-01-11 21:29:08,858] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 184
2023-01-11T21:38:05.9902294Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9902427Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9902684Z [2023-01-11 21:29:08,874] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 185
2023-01-11T21:38:05.9902947Z [2023-01-11 21:29:08,883] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 185
2023-01-11T21:38:05.9902953Z 
2023-01-11T21:38:05.9903051Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9903129Z import torch
2023-01-11T21:38:05.9903204Z import random
2023-01-11T21:38:05.9903317Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9903445Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9903450Z 
2023-01-11T21:38:05.9903535Z aten = torch.ops.aten
2023-01-11T21:38:05.9903674Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9903770Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9903775Z 
2023-01-11T21:38:05.9903850Z import triton
2023-01-11T21:38:05.9903945Z import triton.language as tl
2023-01-11T21:38:05.9904072Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9904204Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9904210Z 
2023-01-11T21:38:05.9904222Z 
2023-01-11T21:38:05.9904352Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9904556Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9904708Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9904817Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9904921Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9905028Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9905099Z {
2023-01-11T21:38:05.9905194Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9905261Z     {
2023-01-11T21:38:05.9905343Z         #pragma omp for
2023-01-11T21:38:05.9905431Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9905499Z         {
2023-01-11T21:38:05.9905641Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9905778Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:05.9905884Z             auto tmp2 = at::vec::maximum(tmp0, tmp1);
2023-01-11T21:38:05.9905995Z             auto tmp3 = at::vec::minimum(tmp0, tmp1);
2023-01-11T21:38:05.9906094Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9906191Z             tmp3.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9906258Z         }
2023-01-11T21:38:05.9906359Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9906446Z         for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9906507Z         {
2023-01-11T21:38:05.9906598Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9906688Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:05.9906817Z             auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
2023-01-11T21:38:05.9906942Z             auto tmp3 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1);
2023-01-11T21:38:05.9907027Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9907112Z             out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9907172Z         }
2023-01-11T21:38:05.9907240Z     }
2023-01-11T21:38:05.9907305Z }
2023-01-11T21:38:05.9907392Z ''')
2023-01-11T21:38:05.9907397Z 
2023-01-11T21:38:05.9907402Z 
2023-01-11T21:38:05.9907497Z async_compile.wait(globals())
2023-01-11T21:38:05.9907575Z del async_compile
2023-01-11T21:38:05.9907580Z 
2023-01-11T21:38:05.9907657Z def call(args):
2023-01-11T21:38:05.9907737Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9907806Z     args.clear()
2023-01-11T21:38:05.9908033Z     buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9908228Z     buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9908422Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9908495Z     del arg0_1
2023-01-11T21:38:05.9908567Z     del arg1_1
2023-01-11T21:38:05.9908651Z     return (buf0, buf1, )
2023-01-11T21:38:05.9908656Z 
2023-01-11T21:38:05.9908660Z 
2023-01-11T21:38:05.9908734Z if __name__ == "__main__":
2023-01-11T21:38:05.9908853Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9908979Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9909174Z     arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9909365Z     arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9909484Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9909489Z 
2023-01-11T21:38:05.9909496Z 
2023-01-11T21:38:05.9909594Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9909669Z import torch
2023-01-11T21:38:05.9909737Z import random
2023-01-11T21:38:05.9909857Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9909981Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9909986Z 
2023-01-11T21:38:05.9910068Z aten = torch.ops.aten
2023-01-11T21:38:05.9910204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9910300Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9910305Z 
2023-01-11T21:38:05.9910379Z import triton
2023-01-11T21:38:05.9910500Z import triton.language as tl
2023-01-11T21:38:05.9910618Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9910758Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9910764Z 
2023-01-11T21:38:05.9910768Z 
2023-01-11T21:38:05.9910903Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9911112Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9911236Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9911347Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:05.9911453Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9911553Z                        float* __restrict__ out_ptr1)
2023-01-11T21:38:05.9911612Z {
2023-01-11T21:38:05.9911713Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9911780Z     {
2023-01-11T21:38:05.9911865Z         #pragma omp for
2023-01-11T21:38:05.9911952Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:05.9912025Z         {
2023-01-11T21:38:05.9912163Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:05.9912292Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:05.9912407Z             auto tmp2 = at::vec::maximum(tmp0, tmp1);
2023-01-11T21:38:05.9912521Z             auto tmp3 = at::vec::minimum(tmp0, tmp1);
2023-01-11T21:38:05.9912618Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:05.9912715Z             tmp3.store(out_ptr1 + 8*i0);
2023-01-11T21:38:05.9912784Z         }
2023-01-11T21:38:05.9912883Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:05.9912963Z         for(long i0=8; i0<8; i0+=1)
2023-01-11T21:38:05.9913030Z         {
2023-01-11T21:38:05.9913120Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:05.9913211Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:05.9913338Z             auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1);
2023-01-11T21:38:05.9913466Z             auto tmp3 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1);
2023-01-11T21:38:05.9913554Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:05.9913631Z             out_ptr1[i0] = tmp3;
2023-01-11T21:38:05.9913702Z         }
2023-01-11T21:38:05.9913768Z     }
2023-01-11T21:38:05.9913833Z }
2023-01-11T21:38:05.9913918Z ''')
2023-01-11T21:38:05.9913956Z 
2023-01-11T21:38:05.9913961Z 
2023-01-11T21:38:05.9914057Z async_compile.wait(globals())
2023-01-11T21:38:05.9914135Z del async_compile
2023-01-11T21:38:05.9914141Z 
2023-01-11T21:38:05.9914216Z def call(args):
2023-01-11T21:38:05.9914289Z     arg0_1, arg1_1 = args
2023-01-11T21:38:05.9914365Z     args.clear()
2023-01-11T21:38:05.9914556Z     buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9914747Z     buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9914941Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9915018Z     del arg0_1
2023-01-11T21:38:05.9915090Z     del arg1_1
2023-01-11T21:38:05.9915164Z     return (buf0, buf1, )
2023-01-11T21:38:05.9915169Z 
2023-01-11T21:38:05.9915181Z 
2023-01-11T21:38:05.9915255Z if __name__ == "__main__":
2023-01-11T21:38:05.9915375Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9915502Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9915694Z     arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9915885Z     arg1_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9916009Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:05.9916014Z 
2023-01-11T21:38:05.9916085Z ok (1.727s)
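Note how both branches of the max_min kernel handle NaN: the vector path uses at::vec::maximum/minimum, and the scalar tail spells out the same semantics as (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1) — the self-inequality test detects NaN so it propagates, matching torch.maximum rather than plain std::max. (The tail loop for(long i0=8; i0<8; ...) is empty here because the 8-element input fits exactly one 8-wide vector.) A quick eager check of the semantics being reproduced:

import torch

a = torch.tensor([1.0, 2.0])
b = torch.tensor([float('nan'), 0.0])
print(torch.maximum(a, b))  # tensor([nan, 2.]) -- NaN propagates
print(torch.minimum(a, b))  # tensor([nan, 0.])
print(max(1.0, float('nan')))  # 1.0 -- a bare max() can silently drop the NaN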
2023-01-11T21:38:05.9916541Z test_max_pool2d1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9916695Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9916960Z [2023-01-11 21:29:08,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 186
2023-01-11T21:38:05.9917226Z [2023-01-11 21:29:10,682] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 186
2023-01-11T21:38:05.9917232Z 
2023-01-11T21:38:05.9917329Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9917405Z import torch
2023-01-11T21:38:05.9917481Z import random
2023-01-11T21:38:05.9917601Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9917726Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9917731Z 
2023-01-11T21:38:05.9917809Z aten = torch.ops.aten
2023-01-11T21:38:05.9917946Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9918043Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9918048Z 
2023-01-11T21:38:05.9918123Z import triton
2023-01-11T21:38:05.9918218Z import triton.language as tl
2023-01-11T21:38:05.9918346Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9918486Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9918492Z 
2023-01-11T21:38:05.9918496Z 
2023-01-11T21:38:05.9918634Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9918831Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9918954Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9919061Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9919161Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9919232Z {
2023-01-11T21:38:05.9919336Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9919406Z     {
2023-01-11T21:38:05.9919480Z         #pragma omp for
2023-01-11T21:38:05.9919565Z         for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:05.9919633Z         {
2023-01-11T21:38:05.9919717Z             #pragma GCC ivdep
2023-01-11T21:38:05.9919833Z             for(long i1=0; i1<7; i1+=1)
2023-01-11T21:38:05.9919903Z             {
2023-01-11T21:38:05.9919990Z                 #pragma GCC ivdep
2023-01-11T21:38:05.9920077Z                 for(long i2=0; i2<7; i2+=1)
2023-01-11T21:38:05.9920148Z                 {
2023-01-11T21:38:05.9920222Z                     {
2023-01-11T21:38:05.9920295Z                         {
2023-01-11T21:38:05.9920413Z                             auto tmp0 = in_ptr0[(2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920535Z                             auto tmp1 = in_ptr0[1 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920657Z                             auto tmp3 = in_ptr0[2 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920773Z                             auto tmp5 = in_ptr0[16 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9920888Z                             auto tmp7 = in_ptr0[17 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921000Z                             auto tmp9 = in_ptr0[18 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921121Z                             auto tmp11 = in_ptr0[32 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921236Z                             auto tmp13 = in_ptr0[33 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921349Z                             auto tmp15 = in_ptr0[34 + (2*i2) + (32*i1) + (256*i0)];
2023-01-11T21:38:05.9921486Z                             auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
2023-01-11T21:38:05.9921615Z                             auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
2023-01-11T21:38:05.9921748Z                             auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
2023-01-11T21:38:05.9921905Z                             auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
2023-01-11T21:38:05.9922040Z                             auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
2023-01-11T21:38:05.9922179Z                             auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
2023-01-11T21:38:05.9922318Z                             auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
2023-01-11T21:38:05.9922452Z                             auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
2023-01-11T21:38:05.9922576Z                             auto tmp17 = static_cast<long>((2*i2) + (32*i1));
2023-01-11T21:38:05.9922698Z                             auto tmp18 = static_cast<long>(1 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9922800Z                             auto tmp19 = tmp1 > tmp0;
2023-01-11T21:38:05.9922911Z                             auto tmp20 = tmp19 ? tmp18 : tmp17;
2023-01-11T21:38:05.9923030Z                             auto tmp21 = static_cast<long>(2 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923131Z                             auto tmp22 = tmp3 > tmp2;
2023-01-11T21:38:05.9923241Z                             auto tmp23 = tmp22 ? tmp21 : tmp20;
2023-01-11T21:38:05.9923369Z                             auto tmp24 = static_cast<long>(16 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923470Z                             auto tmp25 = tmp5 > tmp4;
2023-01-11T21:38:05.9923577Z                             auto tmp26 = tmp25 ? tmp24 : tmp23;
2023-01-11T21:38:05.9923700Z                             auto tmp27 = static_cast<long>(17 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9923801Z                             auto tmp28 = tmp7 > tmp6;
2023-01-11T21:38:05.9923903Z                             auto tmp29 = tmp28 ? tmp27 : tmp26;
2023-01-11T21:38:05.9924026Z                             auto tmp30 = static_cast<long>(18 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924130Z                             auto tmp31 = tmp9 > tmp8;
2023-01-11T21:38:05.9924241Z                             auto tmp32 = tmp31 ? tmp30 : tmp29;
2023-01-11T21:38:05.9924363Z                             auto tmp33 = static_cast<long>(32 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924467Z                             auto tmp34 = tmp11 > tmp10;
2023-01-11T21:38:05.9924601Z                             auto tmp35 = tmp34 ? tmp33 : tmp32;
2023-01-11T21:38:05.9924716Z                             auto tmp36 = static_cast<long>(33 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9924817Z                             auto tmp37 = tmp13 > tmp12;
2023-01-11T21:38:05.9924924Z                             auto tmp38 = tmp37 ? tmp36 : tmp35;
2023-01-11T21:38:05.9925045Z                             auto tmp39 = static_cast<long>(34 + (2*i2) + (32*i1));
2023-01-11T21:38:05.9925145Z                             auto tmp40 = tmp15 > tmp14;
2023-01-11T21:38:05.9925251Z                             auto tmp41 = tmp40 ? tmp39 : tmp38;
2023-01-11T21:38:05.9925378Z                             out_ptr0[i2 + (7*i1) + (49*i0)] = tmp16;
2023-01-11T21:38:05.9925500Z                             out_ptr1[i2 + (7*i1) + (49*i0)] = tmp41;
2023-01-11T21:38:05.9925583Z                         }
2023-01-11T21:38:05.9925659Z                     }
2023-01-11T21:38:05.9925728Z                 }
2023-01-11T21:38:05.9925796Z             }
2023-01-11T21:38:05.9925865Z         }
2023-01-11T21:38:05.9925932Z     }
2023-01-11T21:38:05.9925995Z }
2023-01-11T21:38:05.9926075Z ''')
2023-01-11T21:38:05.9926081Z 
2023-01-11T21:38:05.9926085Z 
2023-01-11T21:38:05.9926182Z async_compile.wait(globals())
2023-01-11T21:38:05.9926263Z del async_compile
2023-01-11T21:38:05.9926268Z 
2023-01-11T21:38:05.9926343Z def call(args):
2023-01-11T21:38:05.9926418Z     arg0_1, = args
2023-01-11T21:38:05.9926494Z     args.clear()
2023-01-11T21:38:05.9926708Z     buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9926911Z     buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9927112Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9927185Z     del arg0_1
2023-01-11T21:38:05.9927266Z     return (buf0, buf1, )
2023-01-11T21:38:05.9927271Z 
2023-01-11T21:38:05.9927276Z 
2023-01-11T21:38:05.9927359Z if __name__ == "__main__":
2023-01-11T21:38:05.9927478Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9927604Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9927824Z     arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9927930Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9927940Z 
2023-01-11T21:38:05.9928005Z ok (1.799s)
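The pooling kernel above evaluates a nine-tap, stride-2 window as a chain of NaN-aware maxes (tmp2..tmp16) and, alongside it, replays the comparisons to track the flat argmax index within each 16x16 plane (tmp17..tmp41); the 7x7 output follows from floor((16 - 3)/2) + 1 = 7. An eager call consistent with the kernel's nine taps and stride-2 indexing is the two-output form of max_pool2d:

import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 16, 16)
out, idx = F.max_pool2d(x, kernel_size=3, stride=2, return_indices=True)
print(out.shape, idx.shape)  # torch.Size([2, 4, 7, 7]) torch.Size([2, 4, 7, 7])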
2023-01-11T21:38:05.9928457Z test_max_pool2d2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9928594Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9928854Z [2023-01-11 21:29:10,726] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 187
2023-01-11T21:38:05.9929121Z [2023-01-11 21:29:12,646] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 187
2023-01-11T21:38:05.9929127Z 
2023-01-11T21:38:05.9929225Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9929299Z import torch
2023-01-11T21:38:05.9929374Z import random
2023-01-11T21:38:05.9929485Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9929609Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9929614Z 
2023-01-11T21:38:05.9929696Z aten = torch.ops.aten
2023-01-11T21:38:05.9929834Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9929931Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9929936Z 
2023-01-11T21:38:05.9930012Z import triton
2023-01-11T21:38:05.9930103Z import triton.language as tl
2023-01-11T21:38:05.9930256Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9930389Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9930395Z 
2023-01-11T21:38:05.9930407Z 
2023-01-11T21:38:05.9930537Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9930744Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9930871Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9930979Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9931082Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9931152Z {
2023-01-11T21:38:05.9931259Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9931319Z     {
2023-01-11T21:38:05.9931402Z         #pragma omp for
2023-01-11T21:38:05.9931491Z         for(long i0=0; i0<1024; i0+=1)
2023-01-11T21:38:05.9931558Z         {
2023-01-11T21:38:05.9931642Z             #pragma GCC ivdep
2023-01-11T21:38:05.9931735Z             for(long i1=0; i1<27; i1+=1)
2023-01-11T21:38:05.9931802Z             {
2023-01-11T21:38:05.9931883Z                 #pragma GCC ivdep
2023-01-11T21:38:05.9931976Z                 for(long i2=0; i2<27; i2+=1)
2023-01-11T21:38:05.9932046Z                 {
2023-01-11T21:38:05.9932120Z                     {
2023-01-11T21:38:05.9932193Z                         {
2023-01-11T21:38:05.9932314Z                             auto tmp0 = in_ptr0[(2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932433Z                             auto tmp1 = in_ptr0[1 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932542Z                             auto tmp3 = in_ptr0[2 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932706Z                             auto tmp5 = in_ptr0[55 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932820Z                             auto tmp7 = in_ptr0[56 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9932935Z                             auto tmp9 = in_ptr0[57 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933057Z                             auto tmp11 = in_ptr0[110 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933176Z                             auto tmp13 = in_ptr0[111 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933296Z                             auto tmp15 = in_ptr0[112 + (2*i2) + (110*i1) + (3025*i0)];
2023-01-11T21:38:05.9933433Z                             auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0);
2023-01-11T21:38:05.9933554Z                             auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2);
2023-01-11T21:38:05.9933683Z                             auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4);
2023-01-11T21:38:05.9933815Z                             auto tmp8 = (tmp6 != tmp6) ? tmp6 : std::max(tmp7, tmp6);
2023-01-11T21:38:05.9933948Z                             auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8);
2023-01-11T21:38:05.9934091Z                             auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10);
2023-01-11T21:38:05.9934227Z                             auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12);
2023-01-11T21:38:05.9934359Z                             auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14);
2023-01-11T21:38:05.9934589Z                             auto tmp17 = static_cast<long>((2*i2) + (110*i1));
2023-01-11T21:38:05.9934720Z                             auto tmp18 = static_cast<long>(1 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9934815Z                             auto tmp19 = tmp1 > tmp0;
2023-01-11T21:38:05.9934925Z                             auto tmp20 = tmp19 ? tmp18 : tmp17;
2023-01-11T21:38:05.9935051Z                             auto tmp21 = static_cast<long>(2 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935152Z                             auto tmp22 = tmp3 > tmp2;
2023-01-11T21:38:05.9935261Z                             auto tmp23 = tmp22 ? tmp21 : tmp20;
2023-01-11T21:38:05.9935431Z                             auto tmp24 = static_cast<long>(55 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935533Z                             auto tmp25 = tmp5 > tmp4;
2023-01-11T21:38:05.9935640Z                             auto tmp26 = tmp25 ? tmp24 : tmp23;
2023-01-11T21:38:05.9935757Z                             auto tmp27 = static_cast<long>(56 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9935863Z                             auto tmp28 = tmp7 > tmp6;
2023-01-11T21:38:05.9935972Z                             auto tmp29 = tmp28 ? tmp27 : tmp26;
2023-01-11T21:38:05.9936099Z                             auto tmp30 = static_cast<long>(57 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936201Z                             auto tmp31 = tmp9 > tmp8;
2023-01-11T21:38:05.9936310Z                             auto tmp32 = tmp31 ? tmp30 : tmp29;
2023-01-11T21:38:05.9936433Z                             auto tmp33 = static_cast<long>(110 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936536Z                             auto tmp34 = tmp11 > tmp10;
2023-01-11T21:38:05.9936637Z                             auto tmp35 = tmp34 ? tmp33 : tmp32;
2023-01-11T21:38:05.9936762Z                             auto tmp36 = static_cast<long>(111 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9936865Z                             auto tmp37 = tmp13 > tmp12;
2023-01-11T21:38:05.9936971Z                             auto tmp38 = tmp37 ? tmp36 : tmp35;
2023-01-11T21:38:05.9937091Z                             auto tmp39 = static_cast<long>(112 + (2*i2) + (110*i1));
2023-01-11T21:38:05.9937245Z                             auto tmp40 = tmp15 > tmp14;
2023-01-11T21:38:05.9937352Z                             auto tmp41 = tmp40 ? tmp39 : tmp38;
2023-01-11T21:38:05.9937491Z                             out_ptr0[i2 + (27*i1) + (729*i0)] = tmp16;
2023-01-11T21:38:05.9937601Z                             out_ptr1[i2 + (27*i1) + (729*i0)] = tmp41;
2023-01-11T21:38:05.9937675Z                         }
2023-01-11T21:38:05.9937745Z                     }
2023-01-11T21:38:05.9937815Z                 }
2023-01-11T21:38:05.9937884Z             }
2023-01-11T21:38:05.9937953Z         }
2023-01-11T21:38:05.9938012Z     }
2023-01-11T21:38:05.9938075Z }
2023-01-11T21:38:05.9938164Z ''')
2023-01-11T21:38:05.9938170Z 
2023-01-11T21:38:05.9938174Z 
2023-01-11T21:38:05.9938271Z async_compile.wait(globals())
2023-01-11T21:38:05.9938347Z del async_compile
2023-01-11T21:38:05.9938352Z 
2023-01-11T21:38:05.9938426Z def call(args):
2023-01-11T21:38:05.9938500Z     arg0_1, = args
2023-01-11T21:38:05.9938574Z     args.clear()
2023-01-11T21:38:05.9938792Z     buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9939012Z     buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:05.9939183Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:05.9939256Z     del arg0_1
2023-01-11T21:38:05.9939335Z     return (buf0, buf1, )
2023-01-11T21:38:05.9939340Z 
2023-01-11T21:38:05.9939344Z 
2023-01-11T21:38:05.9939428Z if __name__ == "__main__":
2023-01-11T21:38:05.9939547Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:05.9939674Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:05.9939893Z     arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:05.9940004Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:05.9940009Z 
2023-01-11T21:38:05.9940079Z ok (2.106s)
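Same codegen pattern at a larger size: nine taps with stride-2 indexing over a 55x55 input give the (16, 64, 27, 27) output buffers, since floor((55 - 3)/2) + 1 = 27. A one-liner to sanity-check the output-size arithmetic (floor mode, the default for max_pool2d):

def pool_out(n, k, s, p=0):
    # standard pooling output-size formula
    return (n + 2 * p - k) // s + 1

print(pool_out(16, 3, 2), pool_out(55, 3, 2))  # 7 27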
2023-01-11T21:38:05.9940532Z test_max_pool2d3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:05.9940668Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:05.9940955Z [2023-01-11 21:29:12,816] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 188
2023-01-11T21:38:05.9940962Z 
2023-01-11T21:38:05.9941058Z from ctypes import c_void_p, c_long
2023-01-11T21:38:05.9941134Z import torch
2023-01-11T21:38:05.9941208Z import random
2023-01-11T21:38:05.9941320Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:05.9941444Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:05.9941449Z 
2023-01-11T21:38:05.9941530Z aten = torch.ops.aten
2023-01-11T21:38:05.9941668Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:05.9941769Z async_compile = AsyncCompile()
2023-01-11T21:38:05.9941774Z 
2023-01-11T21:38:05.9941846Z import triton
2023-01-11T21:38:05.9941938Z import triton.language as tl
2023-01-11T21:38:05.9942054Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:05.9942192Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:05.9942201Z 
2023-01-11T21:38:05.9942205Z 
2023-01-11T21:38:05.9942342Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:05.9942544Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:05.9942669Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:05.9942772Z                        float* __restrict__ out_ptr0,
2023-01-11T21:38:05.9942874Z                        long* __restrict__ out_ptr1)
2023-01-11T21:38:05.9942939Z {
2023-01-11T21:38:05.9943033Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:05.9943100Z     {
2023-01-11T21:38:05.9943212Z         #pragma omp for
2023-01-11T21:38:05.9943299Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:05.9943367Z         {
2023-01-11T21:38:05.9943452Z             #pragma GCC ivdep
2023-01-11T21:38:05.9943539Z             for(long i1=0; i1<4; i1+=1)
2023-01-11T21:38:05.9943599Z             {
2023-01-11T21:38:05.9943669Z                 {
2023-01-11T21:38:05.9943743Z                     {
2023-01-11T21:38:05.9943921Z                         auto tmp0 = static_cast<long>((-1) + (2*i0));
2023-01-11T21:38:05.9944030Z                         auto tmp1 = static_cast<long>(0);
2023-01-11T21:38:05.9944131Z                         auto tmp2 = tmp0 >= tmp1;
2023-01-11T21:38:05.9944239Z                         auto tmp3 = static_cast<long>(8);
2023-01-11T21:38:05.9944331Z                         auto tmp4 = tmp0 < tmp3;
2023-01-11T21:38:05.9944451Z                         auto tmp5 = tmp2 & tmp4;
2023-01-11T21:38:05.9944686Z                         auto tmp6 = static_cast<long>((-1) + (2*i1));
2023-01-11T21:38:05.9944818Z                         auto tmp7 = tmp6 >= tmp1;
2023-01-11T21:38:05.9944942Z                         auto tmp8 = tmp6 < tmp3;
2023-01-11T21:38:05.9945064Z                         auto tmp9 = tmp7 & tmp8;
2023-01-11T21:38:05.9945187Z                         auto tmp10 = tmp5 & tmp9;
2023-01-11T21:38:05.9945480Z                         float tmp11 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9945590Z                         if(tmp10)
2023-01-11T21:38:05.9945688Z                         {
2023-01-11T21:38:05.9945918Z                             auto tmp12 = in_ptr0[(-9) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9946033Z                             tmp11 = tmp12;
2023-01-11T21:38:05.9946131Z                         }
2023-01-11T21:38:05.9946275Z                         auto tmp13 = static_cast<long>(2*i1);
2023-01-11T21:38:05.9946409Z                         auto tmp14 = tmp13 >= tmp1;
2023-01-11T21:38:05.9946535Z                         auto tmp15 = tmp13 < tmp3;
2023-01-11T21:38:05.9946673Z                         auto tmp16 = tmp14 & tmp15;
2023-01-11T21:38:05.9946810Z                         auto tmp17 = tmp5 & tmp16;
2023-01-11T21:38:05.9947139Z                         float tmp18 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9947264Z                         if(tmp17)
2023-01-11T21:38:05.9947444Z                         {
2023-01-11T21:38:05.9947719Z                             auto tmp19 = in_ptr0[(-8) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9947836Z                             tmp18 = tmp19;
2023-01-11T21:38:05.9947936Z                         }
2023-01-11T21:38:05.9948123Z                         auto tmp20 = (tmp11 != tmp11) ? tmp11 : std::max(tmp18, tmp11);
2023-01-11T21:38:05.9948285Z                         auto tmp21 = static_cast<long>(1 + (2*i1));
2023-01-11T21:38:05.9948426Z                         auto tmp22 = tmp21 >= tmp1;
2023-01-11T21:38:05.9948558Z                         auto tmp23 = tmp21 < tmp3;
2023-01-11T21:38:05.9948691Z                         auto tmp24 = tmp22 & tmp23;
2023-01-11T21:38:05.9948828Z                         auto tmp25 = tmp5 & tmp24;
2023-01-11T21:38:05.9949135Z                         float tmp26 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9949251Z                         if(tmp25)
2023-01-11T21:38:05.9949350Z                         {
2023-01-11T21:38:05.9949607Z                             auto tmp27 = in_ptr0[(-7) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9949729Z                             tmp26 = tmp27;
2023-01-11T21:38:05.9949834Z                         }
2023-01-11T21:38:05.9950024Z                         auto tmp28 = (tmp20 != tmp20) ? tmp20 : std::max(tmp26, tmp20);
2023-01-11T21:38:05.9950170Z                         auto tmp29 = static_cast<long>(2*i0);
2023-01-11T21:38:05.9950308Z                         auto tmp30 = tmp29 >= tmp1;
2023-01-11T21:38:05.9950440Z                         auto tmp31 = tmp29 < tmp3;
2023-01-11T21:38:05.9950584Z                         auto tmp32 = tmp30 & tmp31;
2023-01-11T21:38:05.9950799Z                         auto tmp33 = tmp32 & tmp9;
2023-01-11T21:38:05.9951123Z                         float tmp34 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9951238Z                         if(tmp33)
2023-01-11T21:38:05.9951342Z                         {
2023-01-11T21:38:05.9951574Z                             auto tmp35 = in_ptr0[(-1) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9951691Z                             tmp34 = tmp35;
2023-01-11T21:38:05.9951790Z                         }
2023-01-11T21:38:05.9951976Z                         auto tmp36 = (tmp28 != tmp28) ? tmp28 : std::max(tmp34, tmp28);
2023-01-11T21:38:05.9952093Z                         auto tmp37 = tmp32 & tmp16;
2023-01-11T21:38:05.9952325Z                         float tmp38 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9952407Z                         if(tmp37)
2023-01-11T21:38:05.9952475Z                         {
2023-01-11T21:38:05.9952591Z                             auto tmp39 = in_ptr0[(2*i1) + (16*i0)];
2023-01-11T21:38:05.9952685Z                             tmp38 = tmp39;
2023-01-11T21:38:05.9952759Z                         }
2023-01-11T21:38:05.9952896Z                         auto tmp40 = (tmp36 != tmp36) ? tmp36 : std::max(tmp38, tmp36);
2023-01-11T21:38:05.9952999Z                         auto tmp41 = tmp32 & tmp24;
2023-01-11T21:38:05.9953226Z                         float tmp42 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9953310Z                         if(tmp41)
2023-01-11T21:38:05.9953377Z                         {
2023-01-11T21:38:05.9953492Z                             auto tmp43 = in_ptr0[1 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9953582Z                             tmp42 = tmp43;
2023-01-11T21:38:05.9953656Z                         }
2023-01-11T21:38:05.9953790Z                         auto tmp44 = (tmp40 != tmp40) ? tmp40 : std::max(tmp42, tmp40);
2023-01-11T21:38:05.9953908Z                         auto tmp45 = static_cast<long>(1 + (2*i0));
2023-01-11T21:38:05.9954014Z                         auto tmp46 = tmp45 >= tmp1;
2023-01-11T21:38:05.9954107Z                         auto tmp47 = tmp45 < tmp3;
2023-01-11T21:38:05.9954206Z                         auto tmp48 = tmp46 & tmp47;
2023-01-11T21:38:05.9954306Z                         auto tmp49 = tmp48 & tmp9;
2023-01-11T21:38:05.9954607Z                         float tmp50 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9954693Z                         if(tmp49)
2023-01-11T21:38:05.9954765Z                         {
2023-01-11T21:38:05.9954880Z                             auto tmp51 = in_ptr0[7 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9954970Z                             tmp50 = tmp51;
2023-01-11T21:38:05.9955037Z                         }
2023-01-11T21:38:05.9955171Z                         auto tmp52 = (tmp44 != tmp44) ? tmp44 : std::max(tmp50, tmp44);
2023-01-11T21:38:05.9955271Z                         auto tmp53 = tmp48 & tmp16;
2023-01-11T21:38:05.9955488Z                         float tmp54 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9955574Z                         if(tmp53)
2023-01-11T21:38:05.9955649Z                         {
2023-01-11T21:38:05.9955762Z                             auto tmp55 = in_ptr0[8 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9955844Z                             tmp54 = tmp55;
2023-01-11T21:38:05.9955922Z                         }
2023-01-11T21:38:05.9956059Z                         auto tmp56 = (tmp52 != tmp52) ? tmp52 : std::max(tmp54, tmp52);
2023-01-11T21:38:05.9956160Z                         auto tmp57 = tmp48 & tmp24;
2023-01-11T21:38:05.9956378Z                         float tmp58 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9956460Z                         if(tmp57)
2023-01-11T21:38:05.9956535Z                         {
2023-01-11T21:38:05.9956654Z                             auto tmp59 = in_ptr0[9 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9956740Z                             tmp58 = tmp59;
2023-01-11T21:38:05.9956848Z                         }
2023-01-11T21:38:05.9956985Z                         auto tmp60 = (tmp56 != tmp56) ? tmp56 : std::max(tmp58, tmp56);
2023-01-11T21:38:05.9957206Z                         float tmp61 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9957292Z                         if(tmp10)
2023-01-11T21:38:05.9957370Z                         {
2023-01-11T21:38:05.9957547Z                             auto tmp62 = in_ptr0[(-9) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9957639Z                             tmp61 = tmp62;
2023-01-11T21:38:05.9957707Z                         }
2023-01-11T21:38:05.9957897Z                         auto tmp63 = static_cast<long>((-9) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9958123Z                         float tmp64 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9958206Z                         if(tmp17)
2023-01-11T21:38:05.9958280Z                         {
2023-01-11T21:38:05.9958457Z                             auto tmp65 = in_ptr0[(-8) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9958556Z                             tmp64 = tmp65;
2023-01-11T21:38:05.9958623Z                         }
2023-01-11T21:38:05.9958814Z                         auto tmp66 = static_cast<long>((-8) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9958917Z                         auto tmp67 = tmp64 > tmp61;
2023-01-11T21:38:05.9959034Z                         auto tmp68 = tmp67 ? tmp66 : tmp63;
2023-01-11T21:38:05.9959174Z                         auto tmp69 = (tmp61 != tmp61) ? tmp61 : std::max(tmp64, tmp61);
2023-01-11T21:38:05.9959398Z                         float tmp70 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9959489Z                         if(tmp25)
2023-01-11T21:38:05.9959568Z                         {
2023-01-11T21:38:05.9959738Z                             auto tmp71 = in_ptr0[(-7) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9959829Z                             tmp70 = tmp71;
2023-01-11T21:38:05.9959904Z                         }
2023-01-11T21:38:05.9960095Z                         auto tmp72 = static_cast<long>((-7) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9960198Z                         auto tmp73 = tmp70 > tmp69;
2023-01-11T21:38:05.9960310Z                         auto tmp74 = tmp73 ? tmp72 : tmp68;
2023-01-11T21:38:05.9960485Z                         auto tmp75 = (tmp69 != tmp69) ? tmp69 : std::max(tmp70, tmp69);
2023-01-11T21:38:05.9960705Z                         float tmp76 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9960780Z                         if(tmp33)
2023-01-11T21:38:05.9960855Z                         {
2023-01-11T21:38:05.9961029Z                             auto tmp77 = in_ptr0[(-1) + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9961119Z                             tmp76 = tmp77;
2023-01-11T21:38:05.9961191Z                         }
2023-01-11T21:38:05.9961373Z                         auto tmp78 = static_cast<long>((-1) + (2*i1) + (16*i0));
2023-01-11T21:38:05.9961473Z                         auto tmp79 = tmp76 > tmp75;
2023-01-11T21:38:05.9961578Z                         auto tmp80 = tmp79 ? tmp78 : tmp74;
2023-01-11T21:38:05.9961711Z                         auto tmp81 = (tmp75 != tmp75) ? tmp75 : std::max(tmp76, tmp75);
2023-01-11T21:38:05.9961927Z                         float tmp82 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9962011Z                         if(tmp37)
2023-01-11T21:38:05.9962086Z                         {
2023-01-11T21:38:05.9962200Z                             auto tmp83 = in_ptr0[(2*i1) + (16*i0)];
2023-01-11T21:38:05.9962287Z                             tmp82 = tmp83;
2023-01-11T21:38:05.9962359Z                         }
2023-01-11T21:38:05.9962475Z                         auto tmp84 = static_cast<long>((2*i1) + (16*i0));
2023-01-11T21:38:05.9962573Z                         auto tmp85 = tmp82 > tmp81;
2023-01-11T21:38:05.9962684Z                         auto tmp86 = tmp85 ? tmp84 : tmp80;
2023-01-11T21:38:05.9962821Z                         auto tmp87 = (tmp81 != tmp81) ? tmp81 : std::max(tmp82, tmp81);
2023-01-11T21:38:05.9963087Z                         float tmp88 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9963169Z                         if(tmp41)
2023-01-11T21:38:05.9963242Z                         {
2023-01-11T21:38:05.9963350Z                             auto tmp89 = in_ptr0[1 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9963442Z                             tmp88 = tmp89;
2023-01-11T21:38:05.9963515Z                         }
2023-01-11T21:38:05.9963640Z                         auto tmp90 = static_cast<long>(1 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9963740Z                         auto tmp91 = tmp88 > tmp87;
2023-01-11T21:38:05.9963848Z                         auto tmp92 = tmp91 ? tmp90 : tmp86;
2023-01-11T21:38:05.9963985Z                         auto tmp93 = (tmp87 != tmp87) ? tmp87 : std::max(tmp88, tmp87);
2023-01-11T21:38:05.9964200Z                         float tmp94 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9964275Z                         if(tmp49)
2023-01-11T21:38:05.9964352Z                         {
2023-01-11T21:38:05.9964468Z                             auto tmp95 = in_ptr0[7 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9964557Z                             tmp94 = tmp95;
2023-01-11T21:38:05.9964629Z                         }
2023-01-11T21:38:05.9964754Z                         auto tmp96 = static_cast<long>(7 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9964855Z                         auto tmp97 = tmp94 > tmp93;
2023-01-11T21:38:05.9964958Z                         auto tmp98 = tmp97 ? tmp96 : tmp92;
2023-01-11T21:38:05.9965093Z                         auto tmp99 = (tmp93 != tmp93) ? tmp93 : std::max(tmp94, tmp93);
2023-01-11T21:38:05.9965338Z                         float tmp100 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9965427Z                         if(tmp53)
2023-01-11T21:38:05.9965515Z                         {
2023-01-11T21:38:05.9965642Z                             auto tmp101 = in_ptr0[8 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9965739Z                             tmp100 = tmp101;
2023-01-11T21:38:05.9965810Z                         }
2023-01-11T21:38:05.9965926Z                         auto tmp102 = static_cast<long>(8 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9966030Z                         auto tmp103 = tmp100 > tmp99;
2023-01-11T21:38:05.9966177Z                         auto tmp104 = tmp103 ? tmp102 : tmp98;
2023-01-11T21:38:05.9966322Z                         auto tmp105 = (tmp99 != tmp99) ? tmp99 : std::max(tmp100, tmp99);
2023-01-11T21:38:05.9966543Z                         float tmp106 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:05.9966625Z                         if(tmp57)
2023-01-11T21:38:05.9966697Z                         {
2023-01-11T21:38:05.9966813Z                             auto tmp107 = in_ptr0[9 + (2*i1) + (16*i0)];
2023-01-11T21:38:05.9966899Z                             tmp106 = tmp107;
2023-01-11T21:38:05.9966973Z                         }
2023-01-11T21:38:05.9967097Z                         auto tmp108 = static_cast<long>(9 + (2*i1) + (16*i0));
2023-01-11T21:38:05.9967205Z                         auto tmp109 = tmp106 > tmp105;
2023-01-11T21:38:05.9967317Z                         auto tmp110 = tmp109 ? tmp108 : tmp104;
2023-01-11T21:38:05.9967462Z                         auto tmp111 = (tmp105 != tmp105) ?
tmp105 : std::max(tmp106, tmp105); 2023-01-11T21:38:05.9967567Z out_ptr0[i1 + (4*i0)] = tmp60; 2023-01-11T21:38:05.9967663Z out_ptr1[i1 + (4*i0)] = tmp110; 2023-01-11T21:38:05.9967739Z } 2023-01-11T21:38:05.9967808Z } 2023-01-11T21:38:05.9967876Z } 2023-01-11T21:38:05.9967943Z } 2023-01-11T21:38:05.9968011Z } 2023-01-11T21:38:05.9968074Z } 2023-01-11T21:38:05.9968152Z ''') 2023-01-11T21:38:05.9968159Z 2023-01-11T21:38:05.9968163Z 2023-01-11T21:38:05.9968259Z async_compile.wait(globals()) 2023-01-11T21:38:05.9968338Z del async_compile 2023-01-11T21:38:05.9968343Z 2023-01-11T21:38:05.9968448Z def call(args): 2023-01-11T21:38:05.9968520Z arg0_1, = args 2023-01-11T21:38:05.9968595Z args.clear() 2023-01-11T21:38:05.9968808Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9969024Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9969187Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9969261Z del arg0_1 2023-01-11T21:38:05.9969342Z return (buf0, buf1, ) 2023-01-11T21:38:05.9969347Z 2023-01-11T21:38:05.9969351Z 2023-01-11T21:38:05.9969432Z if __name__ == "__main__": 2023-01-11T21:38:05.9969551Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9969680Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9969893Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9970015Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9970279Z [2023-01-11 21:29:14,938] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 188 2023-01-11T21:38:05.9970285Z 2023-01-11T21:38:05.9970358Z ok (2.170s) 2023-01-11T21:38:05.9970822Z test_max_pool2d4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
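The kernel for graph 188 above folds each window tap into its running maximum with the pattern (a != a) ? a : std::max(b, a): since a != a is true only when a is NaN, a NaN that enters the window result stays there, matching eager max_pool2d semantics. A minimal sketch of that behavior, assuming the pool parameters (kernel 3, stride 2, padding 1) inferred from the (-1) + (2*i0) index arithmetic rather than taken from the test source:

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
x[0, 0, 3, 3] = float("nan")
# NaN is sticky: every window that contains x[..., 3, 3] pools to NaN
out, idx = F.max_pool2d(x, kernel_size=3, stride=2, padding=1, return_indices=True)
assert out.isnan().any() and not out.isnan().all()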
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9970956Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9971216Z [2023-01-11 21:29:14,991] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 189 2023-01-11T21:38:05.9971481Z [2023-01-11 21:29:17,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 189 2023-01-11T21:38:05.9971489Z 2023-01-11T21:38:05.9971588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9971664Z import torch 2023-01-11T21:38:05.9971740Z import random 2023-01-11T21:38:05.9971852Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9972048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9972054Z 2023-01-11T21:38:05.9972137Z aten = torch.ops.aten 2023-01-11T21:38:05.9972276Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9972377Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9972382Z 2023-01-11T21:38:05.9972458Z import triton 2023-01-11T21:38:05.9972550Z import triton.language as tl 2023-01-11T21:38:05.9972675Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9972810Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9972815Z 2023-01-11T21:38:05.9972820Z 2023-01-11T21:38:05.9972960Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9973168Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9973298Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9973404Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9973509Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9973575Z { 2023-01-11T21:38:05.9973678Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9973738Z { 2023-01-11T21:38:05.9973820Z #pragma omp for 2023-01-11T21:38:05.9973908Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:05.9973977Z { 2023-01-11T21:38:05.9974062Z #pragma GCC ivdep 2023-01-11T21:38:05.9974152Z for(long i1=0; i1<55; i1+=1) 2023-01-11T21:38:05.9974213Z { 2023-01-11T21:38:05.9974301Z #pragma GCC ivdep 2023-01-11T21:38:05.9974397Z for(long i2=0; i2<55; i2+=1) 2023-01-11T21:38:05.9974646Z { 2023-01-11T21:38:05.9974721Z { 2023-01-11T21:38:05.9974794Z { 2023-01-11T21:38:05.9974918Z auto tmp0 = in_ptr0[(2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975038Z auto tmp1 = in_ptr0[1 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975156Z auto tmp3 = in_ptr0[2 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975278Z auto tmp5 = in_ptr0[111 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975401Z auto tmp7 = in_ptr0[112 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975521Z auto tmp9 = in_ptr0[113 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975642Z auto tmp11 = in_ptr0[222 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975764Z auto tmp13 = in_ptr0[223 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9975890Z auto tmp15 = in_ptr0[224 + (2*i2) + (222*i1) + (12321*i0)]; 2023-01-11T21:38:05.9976032Z auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0); 2023-01-11T21:38:05.9976162Z auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2); 2023-01-11T21:38:05.9976292Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4); 2023-01-11T21:38:05.9976422Z auto tmp8 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp7, tmp6); 2023-01-11T21:38:05.9976556Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8); 2023-01-11T21:38:05.9976697Z auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10); 2023-01-11T21:38:05.9976835Z auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12); 2023-01-11T21:38:05.9976967Z auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14); 2023-01-11T21:38:05.9977096Z auto tmp17 = static_cast((2*i2) + (222*i1)); 2023-01-11T21:38:05.9977296Z auto tmp18 = static_cast(1 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9977453Z auto tmp19 = tmp1 > tmp0; 2023-01-11T21:38:05.9977575Z auto tmp20 = tmp19 ? tmp18 : tmp17; 2023-01-11T21:38:05.9977712Z auto tmp21 = static_cast(2 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9977820Z auto tmp22 = tmp3 > tmp2; 2023-01-11T21:38:05.9977937Z auto tmp23 = tmp22 ? tmp21 : tmp20; 2023-01-11T21:38:05.9978073Z auto tmp24 = static_cast(111 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978178Z auto tmp25 = tmp5 > tmp4; 2023-01-11T21:38:05.9978294Z auto tmp26 = tmp25 ? tmp24 : tmp23; 2023-01-11T21:38:05.9978429Z auto tmp27 = static_cast(112 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978536Z auto tmp28 = tmp7 > tmp6; 2023-01-11T21:38:05.9978653Z auto tmp29 = tmp28 ? tmp27 : tmp26; 2023-01-11T21:38:05.9978790Z auto tmp30 = static_cast(113 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9978897Z auto tmp31 = tmp9 > tmp8; 2023-01-11T21:38:05.9979012Z auto tmp32 = tmp31 ? tmp30 : tmp29; 2023-01-11T21:38:05.9979148Z auto tmp33 = static_cast(222 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979249Z auto tmp34 = tmp11 > tmp10; 2023-01-11T21:38:05.9979363Z auto tmp35 = tmp34 ? tmp33 : tmp32; 2023-01-11T21:38:05.9979495Z auto tmp36 = static_cast(223 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979639Z auto tmp37 = tmp13 > tmp12; 2023-01-11T21:38:05.9979752Z auto tmp38 = tmp37 ? tmp36 : tmp35; 2023-01-11T21:38:05.9979883Z auto tmp39 = static_cast(224 + (2*i2) + (222*i1)); 2023-01-11T21:38:05.9979989Z auto tmp40 = tmp15 > tmp14; 2023-01-11T21:38:05.9980105Z auto tmp41 = tmp40 ? 
tmp39 : tmp38; 2023-01-11T21:38:05.9980217Z out_ptr0[i2 + (55*i1) + (3025*i0)] = tmp16; 2023-01-11T21:38:05.9980337Z out_ptr1[i2 + (55*i1) + (3025*i0)] = tmp41; 2023-01-11T21:38:05.9980412Z } 2023-01-11T21:38:05.9980484Z } 2023-01-11T21:38:05.9980554Z } 2023-01-11T21:38:05.9980621Z } 2023-01-11T21:38:05.9980688Z } 2023-01-11T21:38:05.9980747Z } 2023-01-11T21:38:05.9980810Z } 2023-01-11T21:38:05.9980905Z ''') 2023-01-11T21:38:05.9980915Z 2023-01-11T21:38:05.9980919Z 2023-01-11T21:38:05.9981019Z async_compile.wait(globals()) 2023-01-11T21:38:05.9981097Z del async_compile 2023-01-11T21:38:05.9981102Z 2023-01-11T21:38:05.9981177Z def call(args): 2023-01-11T21:38:05.9981254Z arg0_1, = args 2023-01-11T21:38:05.9981322Z args.clear() 2023-01-11T21:38:05.9981586Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9981806Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9981976Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9982050Z del arg0_1 2023-01-11T21:38:05.9982132Z return (buf0, buf1, ) 2023-01-11T21:38:05.9982138Z 2023-01-11T21:38:05.9982142Z 2023-01-11T21:38:05.9982222Z if __name__ == "__main__": 2023-01-11T21:38:05.9982342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9982461Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9982695Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9982809Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9982814Z 2023-01-11T21:38:05.9982886Z ok (2.327s) 2023-01-11T21:38:05.9983375Z test_max_pool2d5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
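From the wrapper above, graph 189 maps a (2, 8, 111, 111) float input to (2, 8, 55, 55) values plus int64 indices, and the taps at offsets 0..2, 111..113, and 222..224 along (2*i2) + (222*i1) are consistent with a 3x3 window at stride 2 and no padding. A hedged reconstruction of what test_max_pool2d4 appears to compile (parameters inferred from the generated code, not taken from the test source):

import torch
import torch.nn.functional as F

def pool(x):
    return F.max_pool2d(x, kernel_size=3, stride=2, return_indices=True)

x = torch.randn(2, 8, 111, 111)
eager_out, eager_idx = pool(x)
# torch.compile routes through torch._inductor, as in this log
compiled_out, compiled_idx = torch.compile(pool)(x)
torch.testing.assert_close(compiled_out, eager_out)
torch.testing.assert_close(compiled_idx, eager_idx)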
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9983513Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9983773Z [2023-01-11 21:29:17,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 190 2023-01-11T21:38:05.9984041Z [2023-01-11 21:29:19,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 190 2023-01-11T21:38:05.9984049Z 2023-01-11T21:38:05.9984148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9984222Z import torch 2023-01-11T21:38:05.9984290Z import random 2023-01-11T21:38:05.9984411Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9984537Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9984542Z 2023-01-11T21:38:05.9984623Z aten = torch.ops.aten 2023-01-11T21:38:05.9984759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9984855Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9984860Z 2023-01-11T21:38:05.9984934Z import triton 2023-01-11T21:38:05.9985020Z import triton.language as tl 2023-01-11T21:38:05.9985152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9985319Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9985358Z 2023-01-11T21:38:05.9985364Z 2023-01-11T21:38:05.9985517Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:05.9985728Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:05.9985852Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:05.9985961Z float* __restrict__ out_ptr0, 2023-01-11T21:38:05.9986064Z long* __restrict__ out_ptr1) 2023-01-11T21:38:05.9986122Z { 2023-01-11T21:38:05.9986224Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:05.9986290Z { 2023-01-11T21:38:05.9986372Z #pragma omp for 2023-01-11T21:38:05.9986461Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:05.9986529Z { 2023-01-11T21:38:05.9986616Z #pragma GCC ivdep 2023-01-11T21:38:05.9986700Z for(long i1=0; i1<18; i1+=1) 2023-01-11T21:38:05.9986768Z { 2023-01-11T21:38:05.9986857Z #pragma GCC ivdep 2023-01-11T21:38:05.9986956Z for(long i2=0; i2<18; i2+=1) 2023-01-11T21:38:05.9987027Z { 2023-01-11T21:38:05.9987100Z { 2023-01-11T21:38:05.9987172Z { 2023-01-11T21:38:05.9987290Z auto tmp0 = in_ptr0[(3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987417Z auto tmp1 = in_ptr0[1 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987536Z auto tmp3 = in_ptr0[2 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987660Z auto tmp5 = in_ptr0[55 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987778Z auto tmp7 = in_ptr0[56 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9987892Z auto tmp9 = in_ptr0[57 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988015Z auto tmp11 = in_ptr0[110 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988145Z auto tmp13 = in_ptr0[111 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988261Z auto tmp15 = in_ptr0[112 + (3*i2) + (165*i1) + (3025*i0)]; 2023-01-11T21:38:05.9988402Z auto tmp2 = (tmp0 != tmp0) ? tmp0 : std::max(tmp1, tmp0); 2023-01-11T21:38:05.9988563Z auto tmp4 = (tmp2 != tmp2) ? tmp2 : std::max(tmp3, tmp2); 2023-01-11T21:38:05.9988693Z auto tmp6 = (tmp4 != tmp4) ? tmp4 : std::max(tmp5, tmp4); 2023-01-11T21:38:05.9988822Z auto tmp8 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp7, tmp6); 2023-01-11T21:38:05.9988957Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::max(tmp9, tmp8); 2023-01-11T21:38:05.9989099Z auto tmp12 = (tmp10 != tmp10) ? tmp10 : std::max(tmp11, tmp10); 2023-01-11T21:38:05.9989238Z auto tmp14 = (tmp12 != tmp12) ? tmp12 : std::max(tmp13, tmp12); 2023-01-11T21:38:05.9989377Z auto tmp16 = (tmp14 != tmp14) ? tmp14 : std::max(tmp15, tmp14); 2023-01-11T21:38:05.9989495Z auto tmp17 = static_cast((3*i2) + (165*i1)); 2023-01-11T21:38:05.9989627Z auto tmp18 = static_cast(1 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9989731Z auto tmp19 = tmp1 > tmp0; 2023-01-11T21:38:05.9989844Z auto tmp20 = tmp19 ? tmp18 : tmp17; 2023-01-11T21:38:05.9989973Z auto tmp21 = static_cast(2 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990079Z auto tmp22 = tmp3 > tmp2; 2023-01-11T21:38:05.9990189Z auto tmp23 = tmp22 ? tmp21 : tmp20; 2023-01-11T21:38:05.9990317Z auto tmp24 = static_cast(55 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990412Z auto tmp25 = tmp5 > tmp4; 2023-01-11T21:38:05.9990548Z auto tmp26 = tmp25 ? tmp24 : tmp23; 2023-01-11T21:38:05.9990674Z auto tmp27 = static_cast(56 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9990776Z auto tmp28 = tmp7 > tmp6; 2023-01-11T21:38:05.9990885Z auto tmp29 = tmp28 ? tmp27 : tmp26; 2023-01-11T21:38:05.9991011Z auto tmp30 = static_cast(57 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991113Z auto tmp31 = tmp9 > tmp8; 2023-01-11T21:38:05.9991224Z auto tmp32 = tmp31 ? tmp30 : tmp29; 2023-01-11T21:38:05.9991343Z auto tmp33 = static_cast(110 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991448Z auto tmp34 = tmp11 > tmp10; 2023-01-11T21:38:05.9991560Z auto tmp35 = tmp34 ? tmp33 : tmp32; 2023-01-11T21:38:05.9991688Z auto tmp36 = static_cast(111 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9991794Z auto tmp37 = tmp13 > tmp12; 2023-01-11T21:38:05.9991904Z auto tmp38 = tmp37 ? tmp36 : tmp35; 2023-01-11T21:38:05.9992027Z auto tmp39 = static_cast(112 + (3*i2) + (165*i1)); 2023-01-11T21:38:05.9992132Z auto tmp40 = tmp15 > tmp14; 2023-01-11T21:38:05.9992234Z auto tmp41 = tmp40 ? 
tmp39 : tmp38; 2023-01-11T21:38:05.9992349Z out_ptr0[i2 + (18*i1) + (324*i0)] = tmp16; 2023-01-11T21:38:05.9992461Z out_ptr1[i2 + (18*i1) + (324*i0)] = tmp41; 2023-01-11T21:38:05.9992535Z } 2023-01-11T21:38:05.9992610Z } 2023-01-11T21:38:05.9992680Z } 2023-01-11T21:38:05.9992748Z } 2023-01-11T21:38:05.9992808Z } 2023-01-11T21:38:05.9992875Z } 2023-01-11T21:38:05.9992938Z } 2023-01-11T21:38:05.9993028Z ''') 2023-01-11T21:38:05.9993034Z 2023-01-11T21:38:05.9993038Z 2023-01-11T21:38:05.9993133Z async_compile.wait(globals()) 2023-01-11T21:38:05.9993210Z del async_compile 2023-01-11T21:38:05.9993215Z 2023-01-11T21:38:05.9993290Z def call(args): 2023-01-11T21:38:05.9993357Z arg0_1, = args 2023-01-11T21:38:05.9993430Z args.clear() 2023-01-11T21:38:05.9993690Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9993912Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:05.9994081Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:05.9994155Z del arg0_1 2023-01-11T21:38:05.9994236Z return (buf0, buf1, ) 2023-01-11T21:38:05.9994241Z 2023-01-11T21:38:05.9994245Z 2023-01-11T21:38:05.9994326Z if __name__ == "__main__": 2023-01-11T21:38:05.9994439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9994571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9994806Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9994923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9994928Z 2023-01-11T21:38:05.9995001Z ok (2.284s) 2023-01-11T21:38:05.9995463Z test_max_pool2d6_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
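Alongside the running maximum, these kernels carry a running argmax: each tap's flat input offset is selected with a strict > comparison against the running maximum before it is updated, so ties resolve to the earliest tap, and the NaN check keeps the running value sticky. A Python paraphrase of the per-window reduction (the helper name is illustrative, not from the inductor source):

def window_max_argmax(taps):
    # taps: (value, flat_input_index) pairs in the order the kernel visits them
    best_v, best_i = taps[0]
    for v, i in taps[1:]:
        if v > best_v:                   # strict '>': ties keep the earliest index
            best_i = i
        if best_v == best_v:             # a NaN running max is sticky
            best_v = best_v if v < best_v else v   # same result as std::max(v, best_v)
    return best_v, best_i

print(window_max_argmax([(1.0, 10), (3.0, 11), (3.0, 12)]))  # (3.0, 11)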
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:05.9995598Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:05.9995857Z [2023-01-11 21:29:19,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 191 2023-01-11T21:38:05.9996136Z [2023-01-11 21:29:19,618] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:05.9996394Z [2023-01-11 21:29:19,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 191 2023-01-11T21:38:05.9996408Z 2023-01-11T21:38:05.9996503Z from ctypes import c_void_p, c_long 2023-01-11T21:38:05.9996578Z import torch 2023-01-11T21:38:05.9996652Z import random 2023-01-11T21:38:05.9996773Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:05.9996899Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:05.9996904Z 2023-01-11T21:38:05.9996987Z aten = torch.ops.aten 2023-01-11T21:38:05.9997124Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:05.9997214Z async_compile = AsyncCompile() 2023-01-11T21:38:05.9997219Z 2023-01-11T21:38:05.9997293Z import triton 2023-01-11T21:38:05.9997388Z import triton.language as tl 2023-01-11T21:38:05.9997518Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:05.9997660Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:05.9997666Z 2023-01-11T21:38:05.9997670Z 2023-01-11T21:38:05.9997765Z async_compile.wait(globals()) 2023-01-11T21:38:05.9997843Z del async_compile 2023-01-11T21:38:05.9997848Z 2023-01-11T21:38:05.9997926Z def call(args): 2023-01-11T21:38:05.9997993Z arg0_1, = args 2023-01-11T21:38:05.9998068Z args.clear() 2023-01-11T21:38:05.9998203Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:05.9998276Z del arg0_1 2023-01-11T21:38:05.9998349Z buf1 = buf0[0] 2023-01-11T21:38:05.9998464Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:05.9998537Z buf2 = buf0[1] 2023-01-11T21:38:05.9998644Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:05.9998713Z del buf0 2023-01-11T21:38:05.9998794Z return (buf1, buf2, ) 2023-01-11T21:38:05.9998802Z 2023-01-11T21:38:05.9998806Z 2023-01-11T21:38:05.9998886Z if __name__ == "__main__": 2023-01-11T21:38:05.9999006Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:05.9999133Z from torch._inductor.utils import print_performance 2023-01-11T21:38:05.9999394Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:05.9999508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:05.9999513Z 2023-01-11T21:38:05.9999577Z ok (0.075s) 2023-01-11T21:38:06.0000055Z test_max_pool2d_with_indices_backward2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
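test_max_pool2d6 is the first case above that produces no C++ kernel at all: a 13x13 window would unroll 169 taps per output element, so inductor lowers it to a direct aten.max_pool2d_with_indices call instead (the "Using FallbackKernel" warning), which is also why the test finishes in 0.075s. A hedged repro; the exact unrolling threshold is an implementation detail of torch._inductor.lowering:

import torch
import torch.nn.functional as F

pool = torch.compile(
    lambda x: F.max_pool2d(x, kernel_size=13, stride=13, return_indices=True))
out, idx = pool(torch.randn(16, 64, 55, 55))
print(out.shape)  # torch.Size([16, 64, 4, 4]), matching the assert_size_stride above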
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0000192Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0000450Z [2023-01-11 21:29:19,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 192 2023-01-11T21:38:06.0000718Z [2023-01-11 21:29:21,584] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 192 2023-01-11T21:38:06.0000724Z 2023-01-11T21:38:06.0000821Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0000896Z import torch 2023-01-11T21:38:06.0000971Z import random 2023-01-11T21:38:06.0001091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0001208Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0001213Z 2023-01-11T21:38:06.0001294Z aten = torch.ops.aten 2023-01-11T21:38:06.0001431Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0001527Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0001532Z 2023-01-11T21:38:06.0001605Z import triton 2023-01-11T21:38:06.0001739Z import triton.language as tl 2023-01-11T21:38:06.0001864Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0001998Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0002011Z 2023-01-11T21:38:06.0002015Z 2023-01-11T21:38:06.0002146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0002357Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0002482Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0002594Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0002700Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0002765Z { 2023-01-11T21:38:06.0002868Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0002927Z { 2023-01-11T21:38:06.0003010Z #pragma omp for 2023-01-11T21:38:06.0003097Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0003165Z { 2023-01-11T21:38:06.0003252Z #pragma GCC ivdep 2023-01-11T21:38:06.0003346Z for(long i1=0; i1<40; i1+=1) 2023-01-11T21:38:06.0003416Z { 2023-01-11T21:38:06.0003495Z #pragma GCC ivdep 2023-01-11T21:38:06.0003591Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0003664Z { 2023-01-11T21:38:06.0003742Z { 2023-01-11T21:38:06.0003815Z { 2023-01-11T21:38:06.0003938Z auto tmp0 = static_cast(i2 + (56*i1)); 2023-01-11T21:38:06.0004057Z auto tmp1 = static_cast((i1 / 2)); 2023-01-11T21:38:06.0004165Z auto tmp2 = static_cast((i2 / 2)); 2023-01-11T21:38:06.0004290Z auto tmp3 = static_cast(1 + (((1 + i1) / 2))); 2023-01-11T21:38:06.0004411Z auto tmp4 = static_cast(1 + (((1 + i2) / 2))); 2023-01-11T21:38:06.0004526Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0004669Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0004803Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0004917Z auto tmp8 = static_cast(21); 2023-01-11T21:38:06.0005077Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0005186Z auto tmp10 = static_cast(29); 2023-01-11T21:38:06.0005331Z auto tmp11 = (tmp10 != tmp10) ? 
tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0005459Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0005571Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0005701Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0005857Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0006000Z auto tmp16 = (tmp15 != tmp15) ? tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0006152Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0006289Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0006412Z auto tmp19 = in_ptr0[tmp18 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0006541Z auto tmp20 = in_ptr1[tmp18 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0006645Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0006762Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0006875Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0006979Z auto tmp24 = tmp7 + tmp14; 2023-01-11T21:38:06.0007115Z auto tmp25 = (tmp17 != tmp17) ? tmp17 : std::min(tmp24, tmp17); 2023-01-11T21:38:06.0007268Z auto tmp26 = in_ptr0[tmp25 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0007383Z auto tmp27 = in_ptr1[tmp25 + (29*tmp16) + (609*i0)]; 2023-01-11T21:38:06.0007487Z auto tmp28 = tmp26 == tmp0; 2023-01-11T21:38:06.0007599Z auto tmp29 = tmp12 < tmp9; 2023-01-11T21:38:06.0007700Z auto tmp30 = tmp24 < tmp11; 2023-01-11T21:38:06.0007802Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0007902Z auto tmp32 = tmp31 & tmp28; 2023-01-11T21:38:06.0008003Z auto tmp33 = tmp23 + tmp27; 2023-01-11T21:38:06.0008108Z auto tmp34 = tmp32 ? tmp33 : tmp23; 2023-01-11T21:38:06.0008212Z auto tmp35 = tmp6 + tmp14; 2023-01-11T21:38:06.0008351Z auto tmp36 = (tmp15 != tmp15) ? tmp15 : std::min(tmp35, tmp15); 2023-01-11T21:38:06.0008481Z auto tmp37 = in_ptr0[tmp18 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0008606Z auto tmp38 = in_ptr1[tmp18 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0008708Z auto tmp39 = tmp37 == tmp0; 2023-01-11T21:38:06.0008814Z auto tmp40 = tmp35 < tmp9; 2023-01-11T21:38:06.0008917Z auto tmp41 = tmp13 < tmp11; 2023-01-11T21:38:06.0009009Z auto tmp42 = tmp40 & tmp41; 2023-01-11T21:38:06.0009112Z auto tmp43 = tmp42 & tmp39; 2023-01-11T21:38:06.0009213Z auto tmp44 = tmp34 + tmp38; 2023-01-11T21:38:06.0009324Z auto tmp45 = tmp43 ? tmp44 : tmp34; 2023-01-11T21:38:06.0009449Z auto tmp46 = in_ptr0[tmp25 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0009573Z auto tmp47 = in_ptr1[tmp25 + (29*tmp36) + (609*i0)]; 2023-01-11T21:38:06.0009677Z auto tmp48 = tmp46 == tmp0; 2023-01-11T21:38:06.0009776Z auto tmp49 = tmp40 & tmp30; 2023-01-11T21:38:06.0009868Z auto tmp50 = tmp49 & tmp48; 2023-01-11T21:38:06.0010004Z auto tmp51 = tmp45 + tmp47; 2023-01-11T21:38:06.0010120Z auto tmp52 = tmp50 ? 
tmp51 : tmp45; 2023-01-11T21:38:06.0010235Z out_ptr0[i2 + (56*i1) + (2240*i0)] = tmp52; 2023-01-11T21:38:06.0010312Z } 2023-01-11T21:38:06.0010384Z } 2023-01-11T21:38:06.0010455Z } 2023-01-11T21:38:06.0010515Z } 2023-01-11T21:38:06.0010583Z } 2023-01-11T21:38:06.0010651Z } 2023-01-11T21:38:06.0010717Z } 2023-01-11T21:38:06.0010804Z ''') 2023-01-11T21:38:06.0010810Z 2023-01-11T21:38:06.0010814Z 2023-01-11T21:38:06.0010910Z async_compile.wait(globals()) 2023-01-11T21:38:06.0010992Z del async_compile 2023-01-11T21:38:06.0010997Z 2023-01-11T21:38:06.0011064Z def call(args): 2023-01-11T21:38:06.0011151Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0011230Z args.clear() 2023-01-11T21:38:06.0011454Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0011626Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0011700Z del arg0_1 2023-01-11T21:38:06.0011771Z del arg2_1 2023-01-11T21:38:06.0011839Z return (buf0, ) 2023-01-11T21:38:06.0011845Z 2023-01-11T21:38:06.0011856Z 2023-01-11T21:38:06.0011931Z if __name__ == "__main__": 2023-01-11T21:38:06.0012050Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0012177Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0012396Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0012649Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0012866Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0012995Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0013003Z 2023-01-11T21:38:06.0013067Z ok (1.954s) 2023-01-11T21:38:06.0013537Z test_max_pool2d_with_indices_backward3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
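The backward kernels in this block compute max-pool gradients in gathered form: for each input element they enumerate every pooling window that could have selected it, compare the window's recorded argmax against the element's own flat index (tmp21 = tmp19 == tmp0 above), and accumulate the matching output gradients. The equivalent scatter-style reference formulation, as a hedged sketch (the helper is illustrative, not the test's code):

import torch

def max_pool2d_backward_reference(grad_out, indices, input_shape):
    # grad_in accumulates grad_out scatter-added at each window's winning flat index
    N, C, H, W = input_shape
    grad_in = torch.zeros(N, C, H * W, dtype=grad_out.dtype)
    grad_in.scatter_add_(2, indices.flatten(2), grad_out.flatten(2))
    return grad_in.view(N, C, H, W)

# shape check with random stand-ins for the (2, 4, 21, 29) -> (2, 4, 40, 56) case above
g = max_pool2d_backward_reference(
    torch.randn(2, 4, 21, 29), torch.randint(40 * 56, (2, 4, 21, 29)), (2, 4, 40, 56))
assert g.shape == (2, 4, 40, 56)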
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0013673Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0013932Z [2023-01-11 21:29:21,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 193 2023-01-11T21:38:06.0014202Z [2023-01-11 21:29:24,125] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 193 2023-01-11T21:38:06.0014208Z 2023-01-11T21:38:06.0014305Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0014381Z import torch 2023-01-11T21:38:06.0014455Z import random 2023-01-11T21:38:06.0014705Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0014832Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0014838Z 2023-01-11T21:38:06.0014913Z aten = torch.ops.aten 2023-01-11T21:38:06.0015051Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0015148Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0015153Z 2023-01-11T21:38:06.0015227Z import triton 2023-01-11T21:38:06.0015322Z import triton.language as tl 2023-01-11T21:38:06.0015449Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0015595Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0015601Z 2023-01-11T21:38:06.0015605Z 2023-01-11T21:38:06.0015744Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0015945Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0016117Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0016231Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0016340Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0016405Z { 2023-01-11T21:38:06.0016509Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0016578Z { 2023-01-11T21:38:06.0016653Z #pragma omp for 2023-01-11T21:38:06.0016743Z for(long i0=0; i0<8192; i0+=1) 2023-01-11T21:38:06.0016810Z { 2023-01-11T21:38:06.0016896Z #pragma GCC ivdep 2023-01-11T21:38:06.0016985Z for(long i1=0; i1<37; i1+=1) 2023-01-11T21:38:06.0023603Z { 2023-01-11T21:38:06.0023713Z #pragma GCC ivdep 2023-01-11T21:38:06.0023813Z for(long i2=0; i2<38; i2+=1) 2023-01-11T21:38:06.0023888Z { 2023-01-11T21:38:06.0023964Z { 2023-01-11T21:38:06.0024033Z { 2023-01-11T21:38:06.0024160Z auto tmp0 = static_cast<long>(i2 + (38*i1)); 2023-01-11T21:38:06.0024285Z auto tmp1 = static_cast<long>(((1 + i1) / 2)); 2023-01-11T21:38:06.0024406Z auto tmp2 = static_cast<long>(((1 + i2) / 2)); 2023-01-11T21:38:06.0024526Z auto tmp3 = static_cast<long>(1 + (i1 / 2)); 2023-01-11T21:38:06.0024646Z auto tmp4 = static_cast<long>(1 + (i2 / 2)); 2023-01-11T21:38:06.0024759Z auto tmp5 = static_cast<long>(0); 2023-01-11T21:38:06.0024906Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0025132Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0025271Z auto tmp8 = static_cast<long>(19); 2023-01-11T21:38:06.0025407Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0025545Z auto tmp10 = (tmp8 != tmp8) ? tmp8 : std::min(tmp4, tmp8); 2023-01-11T21:38:06.0025654Z auto tmp11 = tmp6 + tmp5; 2023-01-11T21:38:06.0025756Z auto tmp12 = tmp7 + tmp5; 2023-01-11T21:38:06.0025871Z auto tmp13 = static_cast<long>(1); 2023-01-11T21:38:06.0026044Z auto tmp14 = tmp9 - tmp13; 2023-01-11T21:38:06.0026176Z auto tmp15 = (tmp14 != tmp14) ?
tmp14 : std::min(tmp11, tmp14); 2023-01-11T21:38:06.0026334Z auto tmp16 = tmp10 - tmp13; 2023-01-11T21:38:06.0026478Z auto tmp17 = (tmp16 != tmp16) ? tmp16 : std::min(tmp12, tmp16); 2023-01-11T21:38:06.0026604Z auto tmp18 = in_ptr0[tmp17 + (19*tmp15) + (361*i0)]; 2023-01-11T21:38:06.0026727Z auto tmp19 = in_ptr1[tmp17 + (19*tmp15) + (361*i0)]; 2023-01-11T21:38:06.0026833Z auto tmp20 = tmp18 == tmp0; 2023-01-11T21:38:06.0026947Z auto tmp21 = static_cast<float>(0.0); 2023-01-11T21:38:06.0027059Z auto tmp22 = tmp20 ? tmp19 : tmp21; 2023-01-11T21:38:06.0027162Z out_ptr0[i2 + (38*i1) + (1406*i0)] = tmp22; 2023-01-11T21:38:06.0027238Z } 2023-01-11T21:38:06.0027311Z } 2023-01-11T21:38:06.0027381Z } 2023-01-11T21:38:06.0027449Z } 2023-01-11T21:38:06.0027516Z } 2023-01-11T21:38:06.0027578Z } 2023-01-11T21:38:06.0027635Z } 2023-01-11T21:38:06.0027725Z ''') 2023-01-11T21:38:06.0027731Z 2023-01-11T21:38:06.0027735Z 2023-01-11T21:38:06.0027831Z async_compile.wait(globals()) 2023-01-11T21:38:06.0027908Z del async_compile 2023-01-11T21:38:06.0027913Z 2023-01-11T21:38:06.0027980Z def call(args): 2023-01-11T21:38:06.0028067Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0028142Z args.clear() 2023-01-11T21:38:06.0028405Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0028571Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0028646Z del arg0_1 2023-01-11T21:38:06.0028718Z del arg2_1 2023-01-11T21:38:06.0028786Z return (buf0, ) 2023-01-11T21:38:06.0028798Z 2023-01-11T21:38:06.0028802Z 2023-01-11T21:38:06.0028878Z if __name__ == "__main__": 2023-01-11T21:38:06.0028999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0029127Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0029362Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0029589Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0029811Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0029940Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0029945Z 2023-01-11T21:38:06.0030017Z ok (2.813s) 2023-01-11T21:38:06.0030489Z test_max_pool2d_with_indices_backward4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0030648Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0030913Z [2023-01-11 21:29:24,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 194 2023-01-11T21:38:06.0030918Z 2023-01-11T21:38:06.0031017Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0031098Z import torch 2023-01-11T21:38:06.0031176Z import random 2023-01-11T21:38:06.0031300Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0031427Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0031432Z 2023-01-11T21:38:06.0031515Z aten = torch.ops.aten 2023-01-11T21:38:06.0031646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0031746Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0031751Z 2023-01-11T21:38:06.0031830Z import triton 2023-01-11T21:38:06.0031923Z import triton.language as tl 2023-01-11T21:38:06.0032050Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0032195Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0032201Z 2023-01-11T21:38:06.0032205Z 2023-01-11T21:38:06.0032346Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0032554Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0032674Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0032789Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0032897Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0032965Z { 2023-01-11T21:38:06.0033068Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0033136Z { 2023-01-11T21:38:06.0033220Z #pragma omp for 2023-01-11T21:38:06.0033303Z for(long i0=0; i0<128; i0+=1) 2023-01-11T21:38:06.0033373Z { 2023-01-11T21:38:06.0033461Z #pragma GCC ivdep 2023-01-11T21:38:06.0033551Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0033624Z { 2023-01-11T21:38:06.0033715Z #pragma GCC ivdep 2023-01-11T21:38:06.0033813Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:06.0033878Z { 2023-01-11T21:38:06.0033953Z { 2023-01-11T21:38:06.0034030Z { 2023-01-11T21:38:06.0034183Z auto tmp0 = static_cast(i2 + (4*i1)); 2023-01-11T21:38:06.0034361Z auto tmp1 = static_cast((-2) + i1); 2023-01-11T21:38:06.0034531Z auto tmp2 = static_cast((-2) + i2); 2023-01-11T21:38:06.0034646Z auto tmp3 = static_cast(3 + i1); 2023-01-11T21:38:06.0034755Z auto tmp4 = static_cast(3 + i2); 2023-01-11T21:38:06.0034867Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0035006Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0035143Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0035258Z auto tmp8 = static_cast(3); 2023-01-11T21:38:06.0035397Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0035513Z auto tmp10 = static_cast(4); 2023-01-11T21:38:06.0035654Z auto tmp11 = (tmp10 != tmp10) ? tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0035751Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0035855Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0035968Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0036122Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0036267Z auto tmp16 = (tmp15 != tmp15) ? tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0036420Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0036589Z auto tmp18 = (tmp17 != tmp17) ? 
tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0036716Z auto tmp19 = in_ptr0[tmp18 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0036844Z auto tmp20 = in_ptr1[tmp18 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0036943Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0037060Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0037171Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0037275Z auto tmp24 = tmp7 + tmp14; 2023-01-11T21:38:06.0037410Z auto tmp25 = (tmp17 != tmp17) ? tmp17 : std::min(tmp24, tmp17); 2023-01-11T21:38:06.0037534Z auto tmp26 = in_ptr0[tmp25 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0037659Z auto tmp27 = in_ptr1[tmp25 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0037759Z auto tmp28 = tmp26 == tmp0; 2023-01-11T21:38:06.0037862Z auto tmp29 = tmp12 < tmp9; 2023-01-11T21:38:06.0037964Z auto tmp30 = tmp24 < tmp11; 2023-01-11T21:38:06.0038072Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0038173Z auto tmp32 = tmp31 & tmp28; 2023-01-11T21:38:06.0038276Z auto tmp33 = tmp23 + tmp27; 2023-01-11T21:38:06.0038387Z auto tmp34 = tmp32 ? tmp33 : tmp23; 2023-01-11T21:38:06.0038503Z auto tmp35 = static_cast(2); 2023-01-11T21:38:06.0038600Z auto tmp36 = tmp7 + tmp35; 2023-01-11T21:38:06.0038736Z auto tmp37 = (tmp17 != tmp17) ? tmp17 : std::min(tmp36, tmp17); 2023-01-11T21:38:06.0038858Z auto tmp38 = in_ptr0[tmp37 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0038983Z auto tmp39 = in_ptr1[tmp37 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0039085Z auto tmp40 = tmp38 == tmp0; 2023-01-11T21:38:06.0039188Z auto tmp41 = tmp36 < tmp11; 2023-01-11T21:38:06.0039315Z auto tmp42 = tmp29 & tmp41; 2023-01-11T21:38:06.0039418Z auto tmp43 = tmp42 & tmp40; 2023-01-11T21:38:06.0039511Z auto tmp44 = tmp34 + tmp39; 2023-01-11T21:38:06.0039627Z auto tmp45 = tmp43 ? tmp44 : tmp34; 2023-01-11T21:38:06.0039732Z auto tmp46 = tmp7 + tmp8; 2023-01-11T21:38:06.0039866Z auto tmp47 = (tmp17 != tmp17) ? tmp17 : std::min(tmp46, tmp17); 2023-01-11T21:38:06.0039987Z auto tmp48 = in_ptr0[tmp47 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0040109Z auto tmp49 = in_ptr1[tmp47 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0040215Z auto tmp50 = tmp48 == tmp0; 2023-01-11T21:38:06.0040320Z auto tmp51 = tmp46 < tmp11; 2023-01-11T21:38:06.0040414Z auto tmp52 = tmp29 & tmp51; 2023-01-11T21:38:06.0040519Z auto tmp53 = tmp52 & tmp50; 2023-01-11T21:38:06.0040618Z auto tmp54 = tmp45 + tmp49; 2023-01-11T21:38:06.0040731Z auto tmp55 = tmp53 ? tmp54 : tmp45; 2023-01-11T21:38:06.0040835Z auto tmp56 = tmp7 + tmp10; 2023-01-11T21:38:06.0040977Z auto tmp57 = (tmp17 != tmp17) ? tmp17 : std::min(tmp56, tmp17); 2023-01-11T21:38:06.0041101Z auto tmp58 = in_ptr0[tmp57 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0041215Z auto tmp59 = in_ptr1[tmp57 + (4*tmp16) + (12*i0)]; 2023-01-11T21:38:06.0041345Z auto tmp60 = tmp58 == tmp0; 2023-01-11T21:38:06.0041448Z auto tmp61 = tmp56 < tmp11; 2023-01-11T21:38:06.0041549Z auto tmp62 = tmp29 & tmp61; 2023-01-11T21:38:06.0041651Z auto tmp63 = tmp62 & tmp60; 2023-01-11T21:38:06.0041754Z auto tmp64 = tmp55 + tmp59; 2023-01-11T21:38:06.0041867Z auto tmp65 = tmp63 ? tmp64 : tmp55; 2023-01-11T21:38:06.0041969Z auto tmp66 = tmp6 + tmp14; 2023-01-11T21:38:06.0042101Z auto tmp67 = (tmp15 != tmp15) ? 
tmp15 : std::min(tmp66, tmp15); 2023-01-11T21:38:06.0042224Z auto tmp68 = in_ptr0[tmp18 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0042345Z auto tmp69 = in_ptr1[tmp18 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0042447Z auto tmp70 = tmp68 == tmp0; 2023-01-11T21:38:06.0042552Z auto tmp71 = tmp66 < tmp9; 2023-01-11T21:38:06.0042656Z auto tmp72 = tmp13 < tmp11; 2023-01-11T21:38:06.0042760Z auto tmp73 = tmp71 & tmp72; 2023-01-11T21:38:06.0042860Z auto tmp74 = tmp73 & tmp70; 2023-01-11T21:38:06.0042956Z auto tmp75 = tmp65 + tmp69; 2023-01-11T21:38:06.0043068Z auto tmp76 = tmp74 ? tmp75 : tmp65; 2023-01-11T21:38:06.0043190Z auto tmp77 = in_ptr0[tmp25 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0043312Z auto tmp78 = in_ptr1[tmp25 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0043414Z auto tmp79 = tmp77 == tmp0; 2023-01-11T21:38:06.0043516Z auto tmp80 = tmp71 & tmp30; 2023-01-11T21:38:06.0043616Z auto tmp81 = tmp80 & tmp79; 2023-01-11T21:38:06.0043710Z auto tmp82 = tmp76 + tmp78; 2023-01-11T21:38:06.0043826Z auto tmp83 = tmp81 ? tmp82 : tmp76; 2023-01-11T21:38:06.0043947Z auto tmp84 = in_ptr0[tmp37 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044067Z auto tmp85 = in_ptr1[tmp37 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044198Z auto tmp86 = tmp84 == tmp0; 2023-01-11T21:38:06.0044302Z auto tmp87 = tmp71 & tmp41; 2023-01-11T21:38:06.0044404Z auto tmp88 = tmp87 & tmp86; 2023-01-11T21:38:06.0044505Z auto tmp89 = tmp83 + tmp85; 2023-01-11T21:38:06.0044611Z auto tmp90 = tmp88 ? tmp89 : tmp83; 2023-01-11T21:38:06.0044732Z auto tmp91 = in_ptr0[tmp47 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044853Z auto tmp92 = in_ptr1[tmp47 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0044955Z auto tmp93 = tmp91 == tmp0; 2023-01-11T21:38:06.0045060Z auto tmp94 = tmp71 & tmp51; 2023-01-11T21:38:06.0045160Z auto tmp95 = tmp94 & tmp93; 2023-01-11T21:38:06.0045260Z auto tmp96 = tmp90 + tmp92; 2023-01-11T21:38:06.0045373Z auto tmp97 = tmp95 ? tmp96 : tmp90; 2023-01-11T21:38:06.0045486Z auto tmp98 = in_ptr0[tmp57 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0045605Z auto tmp99 = in_ptr1[tmp57 + (4*tmp67) + (12*i0)]; 2023-01-11T21:38:06.0045713Z auto tmp100 = tmp98 == tmp0; 2023-01-11T21:38:06.0045820Z auto tmp101 = tmp71 & tmp61; 2023-01-11T21:38:06.0045929Z auto tmp102 = tmp101 & tmp100; 2023-01-11T21:38:06.0046033Z auto tmp103 = tmp97 + tmp99; 2023-01-11T21:38:06.0046149Z auto tmp104 = tmp102 ? tmp103 : tmp97; 2023-01-11T21:38:06.0046284Z auto tmp105 = tmp6 + tmp35; 2023-01-11T21:38:06.0046429Z auto tmp106 = (tmp15 != tmp15) ? tmp15 : std::min(tmp105, tmp15); 2023-01-11T21:38:06.0046556Z auto tmp107 = in_ptr0[tmp18 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0046683Z auto tmp108 = in_ptr1[tmp18 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0046790Z auto tmp109 = tmp107 == tmp0; 2023-01-11T21:38:06.0046895Z auto tmp110 = tmp105 < tmp9; 2023-01-11T21:38:06.0047003Z auto tmp111 = tmp110 & tmp72; 2023-01-11T21:38:06.0047111Z auto tmp112 = tmp111 & tmp109; 2023-01-11T21:38:06.0047212Z auto tmp113 = tmp104 + tmp108; 2023-01-11T21:38:06.0047326Z auto tmp114 = tmp112 ? tmp113 : tmp104; 2023-01-11T21:38:06.0047449Z auto tmp115 = in_ptr0[tmp25 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0047575Z auto tmp116 = in_ptr1[tmp25 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0047685Z auto tmp117 = tmp115 == tmp0; 2023-01-11T21:38:06.0047790Z auto tmp118 = tmp110 & tmp30; 2023-01-11T21:38:06.0047902Z auto tmp119 = tmp118 & tmp117; 2023-01-11T21:38:06.0048010Z auto tmp120 = tmp114 + tmp116; 2023-01-11T21:38:06.0048116Z auto tmp121 = tmp119 ? 
tmp120 : tmp114; 2023-01-11T21:38:06.0048239Z auto tmp122 = in_ptr0[tmp37 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0048360Z auto tmp123 = in_ptr1[tmp37 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0048467Z auto tmp124 = tmp122 == tmp0; 2023-01-11T21:38:06.0048571Z auto tmp125 = tmp110 & tmp41; 2023-01-11T21:38:06.0048681Z auto tmp126 = tmp125 & tmp124; 2023-01-11T21:38:06.0048789Z auto tmp127 = tmp121 + tmp123; 2023-01-11T21:38:06.0048903Z auto tmp128 = tmp126 ? tmp127 : tmp121; 2023-01-11T21:38:06.0049047Z auto tmp129 = in_ptr0[tmp47 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049173Z auto tmp130 = in_ptr1[tmp47 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049278Z auto tmp131 = tmp129 == tmp0; 2023-01-11T21:38:06.0049384Z auto tmp132 = tmp110 & tmp51; 2023-01-11T21:38:06.0049492Z auto tmp133 = tmp132 & tmp131; 2023-01-11T21:38:06.0049598Z auto tmp134 = tmp128 + tmp130; 2023-01-11T21:38:06.0049715Z auto tmp135 = tmp133 ? tmp134 : tmp128; 2023-01-11T21:38:06.0049832Z auto tmp136 = in_ptr0[tmp57 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0049955Z auto tmp137 = in_ptr1[tmp57 + (4*tmp106) + (12*i0)]; 2023-01-11T21:38:06.0050062Z auto tmp138 = tmp136 == tmp0; 2023-01-11T21:38:06.0050168Z auto tmp139 = tmp110 & tmp61; 2023-01-11T21:38:06.0050277Z auto tmp140 = tmp139 & tmp138; 2023-01-11T21:38:06.0050385Z auto tmp141 = tmp135 + tmp137; 2023-01-11T21:38:06.0050500Z auto tmp142 = tmp140 ? tmp141 : tmp135; 2023-01-11T21:38:06.0050604Z auto tmp143 = tmp6 + tmp8; 2023-01-11T21:38:06.0050741Z auto tmp144 = (tmp15 != tmp15) ? tmp15 : std::min(tmp143, tmp15); 2023-01-11T21:38:06.0050864Z auto tmp145 = in_ptr0[tmp18 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0050986Z auto tmp146 = in_ptr1[tmp18 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0051130Z auto tmp147 = tmp145 == tmp0; 2023-01-11T21:38:06.0051236Z auto tmp148 = tmp143 < tmp9; 2023-01-11T21:38:06.0051341Z auto tmp149 = tmp148 & tmp72; 2023-01-11T21:38:06.0051446Z auto tmp150 = tmp149 & tmp147; 2023-01-11T21:38:06.0051555Z auto tmp151 = tmp142 + tmp146; 2023-01-11T21:38:06.0051661Z auto tmp152 = tmp150 ? tmp151 : tmp142; 2023-01-11T21:38:06.0051783Z auto tmp153 = in_ptr0[tmp25 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0051903Z auto tmp154 = in_ptr1[tmp25 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052008Z auto tmp155 = tmp153 == tmp0; 2023-01-11T21:38:06.0052113Z auto tmp156 = tmp148 & tmp30; 2023-01-11T21:38:06.0052218Z auto tmp157 = tmp156 & tmp155; 2023-01-11T21:38:06.0052328Z auto tmp158 = tmp152 + tmp154; 2023-01-11T21:38:06.0052440Z auto tmp159 = tmp157 ? tmp158 : tmp152; 2023-01-11T21:38:06.0052554Z auto tmp160 = in_ptr0[tmp37 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052678Z auto tmp161 = in_ptr1[tmp37 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0052782Z auto tmp162 = tmp160 == tmp0; 2023-01-11T21:38:06.0052890Z auto tmp163 = tmp148 & tmp41; 2023-01-11T21:38:06.0052999Z auto tmp164 = tmp163 & tmp162; 2023-01-11T21:38:06.0053104Z auto tmp165 = tmp159 + tmp161; 2023-01-11T21:38:06.0053219Z auto tmp166 = tmp164 ? tmp165 : tmp159; 2023-01-11T21:38:06.0053336Z auto tmp167 = in_ptr0[tmp47 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0053456Z auto tmp168 = in_ptr1[tmp47 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0053563Z auto tmp169 = tmp167 == tmp0; 2023-01-11T21:38:06.0053668Z auto tmp170 = tmp148 & tmp51; 2023-01-11T21:38:06.0053775Z auto tmp171 = tmp170 & tmp169; 2023-01-11T21:38:06.0053906Z auto tmp172 = tmp166 + tmp168; 2023-01-11T21:38:06.0054021Z auto tmp173 = tmp171 ? 
tmp172 : tmp166; 2023-01-11T21:38:06.0054145Z auto tmp174 = in_ptr0[tmp57 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0054258Z auto tmp175 = in_ptr1[tmp57 + (4*tmp144) + (12*i0)]; 2023-01-11T21:38:06.0054363Z auto tmp176 = tmp174 == tmp0; 2023-01-11T21:38:06.0054468Z auto tmp177 = tmp148 & tmp61; 2023-01-11T21:38:06.0054718Z auto tmp178 = tmp177 & tmp176; 2023-01-11T21:38:06.0054826Z auto tmp179 = tmp173 + tmp175; 2023-01-11T21:38:06.0054944Z auto tmp180 = tmp178 ? tmp179 : tmp173; 2023-01-11T21:38:06.0055062Z auto tmp181 = tmp6 + tmp10; 2023-01-11T21:38:06.0055224Z auto tmp182 = (tmp15 != tmp15) ? tmp15 : std::min(tmp181, tmp15); 2023-01-11T21:38:06.0055357Z auto tmp183 = in_ptr0[tmp18 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0055478Z auto tmp184 = in_ptr1[tmp18 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0055585Z auto tmp185 = tmp183 == tmp0; 2023-01-11T21:38:06.0055691Z auto tmp186 = tmp181 < tmp9; 2023-01-11T21:38:06.0055795Z auto tmp187 = tmp186 & tmp72; 2023-01-11T21:38:06.0055900Z auto tmp188 = tmp187 & tmp185; 2023-01-11T21:38:06.0056006Z auto tmp189 = tmp180 + tmp184; 2023-01-11T21:38:06.0056113Z auto tmp190 = tmp188 ? tmp189 : tmp180; 2023-01-11T21:38:06.0056277Z auto tmp191 = in_ptr0[tmp25 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0056398Z auto tmp192 = in_ptr1[tmp25 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0056503Z auto tmp193 = tmp191 == tmp0; 2023-01-11T21:38:06.0056608Z auto tmp194 = tmp186 & tmp30; 2023-01-11T21:38:06.0056712Z auto tmp195 = tmp194 & tmp193; 2023-01-11T21:38:06.0056816Z auto tmp196 = tmp190 + tmp192; 2023-01-11T21:38:06.0056929Z auto tmp197 = tmp195 ? tmp196 : tmp190; 2023-01-11T21:38:06.0057044Z auto tmp198 = in_ptr0[tmp37 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0057230Z auto tmp199 = in_ptr1[tmp37 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0057337Z auto tmp200 = tmp198 == tmp0; 2023-01-11T21:38:06.0057444Z auto tmp201 = tmp186 & tmp41; 2023-01-11T21:38:06.0057551Z auto tmp202 = tmp201 & tmp200; 2023-01-11T21:38:06.0057656Z auto tmp203 = tmp197 + tmp199; 2023-01-11T21:38:06.0057771Z auto tmp204 = tmp202 ? tmp203 : tmp197; 2023-01-11T21:38:06.0057895Z auto tmp205 = in_ptr0[tmp47 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058009Z auto tmp206 = in_ptr1[tmp47 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058114Z auto tmp207 = tmp205 == tmp0; 2023-01-11T21:38:06.0058217Z auto tmp208 = tmp186 & tmp51; 2023-01-11T21:38:06.0058324Z auto tmp209 = tmp208 & tmp207; 2023-01-11T21:38:06.0058429Z auto tmp210 = tmp204 + tmp206; 2023-01-11T21:38:06.0058541Z auto tmp211 = tmp209 ? tmp210 : tmp204; 2023-01-11T21:38:06.0058667Z auto tmp212 = in_ptr0[tmp57 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058788Z auto tmp213 = in_ptr1[tmp57 + (4*tmp182) + (12*i0)]; 2023-01-11T21:38:06.0058885Z auto tmp214 = tmp212 == tmp0; 2023-01-11T21:38:06.0059026Z auto tmp215 = tmp186 & tmp61; 2023-01-11T21:38:06.0059134Z auto tmp216 = tmp215 & tmp214; 2023-01-11T21:38:06.0059237Z auto tmp217 = tmp211 + tmp213; 2023-01-11T21:38:06.0059348Z auto tmp218 = tmp216 ? 
tmp217 : tmp211; 2023-01-11T21:38:06.0059457Z out_ptr0[i2 + (4*i1) + (12*i0)] = tmp218; 2023-01-11T21:38:06.0059531Z } 2023-01-11T21:38:06.0059595Z } 2023-01-11T21:38:06.0059663Z } 2023-01-11T21:38:06.0059731Z } 2023-01-11T21:38:06.0059798Z } 2023-01-11T21:38:06.0059869Z } 2023-01-11T21:38:06.0059934Z } 2023-01-11T21:38:06.0060024Z ''') 2023-01-11T21:38:06.0060029Z 2023-01-11T21:38:06.0060034Z 2023-01-11T21:38:06.0060123Z async_compile.wait(globals()) 2023-01-11T21:38:06.0060200Z del async_compile 2023-01-11T21:38:06.0060207Z 2023-01-11T21:38:06.0060286Z def call(args): 2023-01-11T21:38:06.0060375Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0060451Z args.clear() 2023-01-11T21:38:06.0060665Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0060851Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0060917Z del arg0_1 2023-01-11T21:38:06.0060989Z del arg2_1 2023-01-11T21:38:06.0061069Z return (buf0, ) 2023-01-11T21:38:06.0061074Z 2023-01-11T21:38:06.0061079Z 2023-01-11T21:38:06.0061160Z if __name__ == "__main__": 2023-01-11T21:38:06.0061289Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0061490Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0061705Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0061916Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0062120Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0062248Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0062514Z [2023-01-11 21:29:26,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 194 2023-01-11T21:38:06.0062521Z 2023-01-11T21:38:06.0062592Z ok (2.417s) 2023-01-11T21:38:06.0063070Z test_max_pool2d_with_indices_backward5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0063204Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0063466Z [2023-01-11 21:29:26,860] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 195 2023-01-11T21:38:06.0063723Z [2023-01-11 21:29:26,880] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.0063982Z [2023-01-11 21:29:26,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 195 2023-01-11T21:38:06.0063988Z 2023-01-11T21:38:06.0064090Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0064158Z import torch 2023-01-11T21:38:06.0064233Z import random 2023-01-11T21:38:06.0064354Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0064478Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0064487Z 2023-01-11T21:38:06.0064568Z aten = torch.ops.aten 2023-01-11T21:38:06.0064706Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0064802Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0064807Z 2023-01-11T21:38:06.0064882Z import triton 2023-01-11T21:38:06.0064994Z import triton.language as tl 2023-01-11T21:38:06.0065122Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0065263Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0065269Z 2023-01-11T21:38:06.0065273Z 2023-01-11T21:38:06.0065364Z async_compile.wait(globals()) 2023-01-11T21:38:06.0065441Z del async_compile 2023-01-11T21:38:06.0065446Z 2023-01-11T21:38:06.0065520Z def call(args): 2023-01-11T21:38:06.0065606Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0065681Z args.clear() 2023-01-11T21:38:06.0065836Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.0065913Z del arg0_1 2023-01-11T21:38:06.0065985Z del arg1_1 2023-01-11T21:38:06.0066056Z del arg2_1 2023-01-11T21:38:06.0066129Z buf1 = buf0 2023-01-11T21:38:06.0066244Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.0066315Z del buf0 2023-01-11T21:38:06.0066386Z return (buf1, ) 2023-01-11T21:38:06.0066391Z 2023-01-11T21:38:06.0066396Z 2023-01-11T21:38:06.0066476Z if __name__ == "__main__": 2023-01-11T21:38:06.0066592Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0066719Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0066942Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0067162Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0067376Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0067539Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0067545Z 2023-01-11T21:38:06.0067609Z ok (0.059s) 2023-01-11T21:38:06.0068087Z test_max_pool2d_with_indices_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0068219Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0068478Z [2023-01-11 21:29:26,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 196 2023-01-11T21:38:06.0068742Z [2023-01-11 21:29:28,882] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 196 2023-01-11T21:38:06.0068751Z 2023-01-11T21:38:06.0068850Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0068925Z import torch 2023-01-11T21:38:06.0069000Z import random 2023-01-11T21:38:06.0069120Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0069236Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0069244Z 2023-01-11T21:38:06.0069325Z aten = torch.ops.aten 2023-01-11T21:38:06.0069464Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0069561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0069566Z 2023-01-11T21:38:06.0069641Z import triton 2023-01-11T21:38:06.0069737Z import triton.language as tl 2023-01-11T21:38:06.0069864Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0070003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0070009Z 2023-01-11T21:38:06.0070014Z 2023-01-11T21:38:06.0070144Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0070353Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0070476Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0070586Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0070718Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0070785Z { 2023-01-11T21:38:06.0070888Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0070947Z { 2023-01-11T21:38:06.0071028Z #pragma omp for 2023-01-11T21:38:06.0071116Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0071183Z { 2023-01-11T21:38:06.0071268Z #pragma GCC ivdep 2023-01-11T21:38:06.0071360Z for(long i1=0; i1<18; i1+=1) 2023-01-11T21:38:06.0071431Z { 2023-01-11T21:38:06.0071510Z #pragma GCC ivdep 2023-01-11T21:38:06.0071604Z for(long i2=0; i2<14; i2+=1) 2023-01-11T21:38:06.0071673Z { 2023-01-11T21:38:06.0071746Z { 2023-01-11T21:38:06.0071818Z { 2023-01-11T21:38:06.0071939Z auto tmp0 = static_cast(i2 + (14*i1)); 2023-01-11T21:38:06.0072057Z auto tmp1 = static_cast((i1 / 2)); 2023-01-11T21:38:06.0072165Z auto tmp2 = static_cast((i2 / 2)); 2023-01-11T21:38:06.0072282Z auto tmp3 = static_cast(1 + (i1 / 2)); 2023-01-11T21:38:06.0072404Z auto tmp4 = static_cast(1 + (i2 / 2)); 2023-01-11T21:38:06.0072515Z auto tmp5 = static_cast(0); 2023-01-11T21:38:06.0072652Z auto tmp6 = (tmp5 != tmp5) ? tmp5 : std::max(tmp1, tmp5); 2023-01-11T21:38:06.0072782Z auto tmp7 = (tmp5 != tmp5) ? tmp5 : std::max(tmp2, tmp5); 2023-01-11T21:38:06.0072894Z auto tmp8 = static_cast(9); 2023-01-11T21:38:06.0073057Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::min(tmp3, tmp8); 2023-01-11T21:38:06.0073162Z auto tmp10 = static_cast(7); 2023-01-11T21:38:06.0073299Z auto tmp11 = (tmp10 != tmp10) ? tmp10 : std::min(tmp4, tmp10); 2023-01-11T21:38:06.0073402Z auto tmp12 = tmp6 + tmp5; 2023-01-11T21:38:06.0073502Z auto tmp13 = tmp7 + tmp5; 2023-01-11T21:38:06.0073612Z auto tmp14 = static_cast(1); 2023-01-11T21:38:06.0073765Z auto tmp15 = tmp9 - tmp14; 2023-01-11T21:38:06.0073904Z auto tmp16 = (tmp15 != tmp15) ? 
tmp15 : std::min(tmp12, tmp15); 2023-01-11T21:38:06.0074057Z auto tmp17 = tmp11 - tmp14; 2023-01-11T21:38:06.0074193Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp13, tmp17); 2023-01-11T21:38:06.0074310Z auto tmp19 = in_ptr0[tmp18 + (7*tmp16) + (63*i0)]; 2023-01-11T21:38:06.0074437Z auto tmp20 = in_ptr1[tmp18 + (7*tmp16) + (63*i0)]; 2023-01-11T21:38:06.0074541Z auto tmp21 = tmp19 == tmp0; 2023-01-11T21:38:06.0074657Z auto tmp22 = static_cast(0.0); 2023-01-11T21:38:06.0074770Z auto tmp23 = tmp21 ? tmp20 : tmp22; 2023-01-11T21:38:06.0074886Z out_ptr0[i2 + (14*i1) + (252*i0)] = tmp23; 2023-01-11T21:38:06.0074960Z } 2023-01-11T21:38:06.0075024Z } 2023-01-11T21:38:06.0075094Z } 2023-01-11T21:38:06.0075163Z } 2023-01-11T21:38:06.0075231Z } 2023-01-11T21:38:06.0075297Z } 2023-01-11T21:38:06.0075360Z } 2023-01-11T21:38:06.0075446Z ''') 2023-01-11T21:38:06.0075452Z 2023-01-11T21:38:06.0075456Z 2023-01-11T21:38:06.0075544Z async_compile.wait(globals()) 2023-01-11T21:38:06.0075624Z del async_compile 2023-01-11T21:38:06.0075630Z 2023-01-11T21:38:06.0075704Z def call(args): 2023-01-11T21:38:06.0075792Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0075868Z args.clear() 2023-01-11T21:38:06.0076085Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0076281Z kernel_cpp_0(c_void_p(arg2_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0076355Z del arg0_1 2023-01-11T21:38:06.0076420Z del arg2_1 2023-01-11T21:38:06.0076495Z return (buf0, ) 2023-01-11T21:38:06.0076500Z 2023-01-11T21:38:06.0076505Z 2023-01-11T21:38:06.0076586Z if __name__ == "__main__": 2023-01-11T21:38:06.0076705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0076831Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0077043Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0077264Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0077465Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0077592Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0077597Z 2023-01-11T21:38:06.0077672Z ok (2.002s) 2023-01-11T21:38:06.0078121Z test_mean_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0078254Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0078512Z [2023-01-11 21:29:28,919] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 197 2023-01-11T21:38:06.0078805Z [2023-01-11 21:29:30,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 197 2023-01-11T21:38:06.0078811Z 2023-01-11T21:38:06.0078909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0078983Z import torch 2023-01-11T21:38:06.0079060Z import random 2023-01-11T21:38:06.0079175Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0079299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0079304Z 2023-01-11T21:38:06.0079386Z aten = torch.ops.aten 2023-01-11T21:38:06.0079524Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0079620Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0079625Z 2023-01-11T21:38:06.0079701Z import triton 2023-01-11T21:38:06.0079794Z import triton.language as tl 2023-01-11T21:38:06.0079912Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0080050Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0080058Z 2023-01-11T21:38:06.0080063Z 2023-01-11T21:38:06.0080202Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0080406Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0080529Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0080637Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0080750Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0080852Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0080947Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0081015Z { 2023-01-11T21:38:06.0081105Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.0081195Z auto out_ptr0 = in_out_ptr1; 2023-01-11T21:38:06.0081261Z { 2023-01-11T21:38:06.0081454Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0081539Z float tmp1 = 0; 2023-01-11T21:38:06.0081664Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0081765Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0081832Z { 2023-01-11T21:38:06.0081944Z #pragma omp for reduction(+:tmp1_vec) 2023-01-11T21:38:06.0082063Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0082133Z { 2023-01-11T21:38:06.0082272Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0082362Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0082422Z } 2023-01-11T21:38:06.0082620Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0082747Z #pragma omp for simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0082838Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0082909Z { 2023-01-11T21:38:06.0083005Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0083087Z tmp1 += tmp0; 2023-01-11T21:38:06.0083147Z } 2023-01-11T21:38:06.0083215Z } 2023-01-11T21:38:06.0083300Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0083368Z } 2023-01-11T21:38:06.0083475Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0083542Z { 2023-01-11T21:38:06.0083624Z #pragma omp for 2023-01-11T21:38:06.0083704Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0083772Z { 
2023-01-11T21:38:06.0083839Z { 2023-01-11T21:38:06.0084031Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0084115Z float tmp1 = 0; 2023-01-11T21:38:06.0084246Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0084343Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0084443Z { 2023-01-11T21:38:06.0084594Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0084682Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0084755Z } 2023-01-11T21:38:06.0084982Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0085124Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0085226Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0085296Z { 2023-01-11T21:38:06.0085393Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0085476Z tmp1 += tmp0; 2023-01-11T21:38:06.0085545Z } 2023-01-11T21:38:06.0085633Z out_ptr1[i0] = tmp1; 2023-01-11T21:38:06.0085703Z } 2023-01-11T21:38:06.0085769Z } 2023-01-11T21:38:06.0085855Z #pragma omp for 2023-01-11T21:38:06.0085934Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0086002Z { 2023-01-11T21:38:06.0086140Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0086277Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0086371Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0086470Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0086538Z } 2023-01-11T21:38:06.0086631Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0086716Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0086782Z { 2023-01-11T21:38:06.0086871Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.0086974Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0087064Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0087153Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0087212Z } 2023-01-11T21:38:06.0087295Z #pragma omp for 2023-01-11T21:38:06.0087379Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0087444Z { 2023-01-11T21:38:06.0087532Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0087600Z { 2023-01-11T21:38:06.0087774Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0087917Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 8 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088065Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr0 + 16 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088208Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr0 + 24 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0088301Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0088396Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0088488Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0088628Z auto tmp7 = at::vec::Vectorized(static_cast(4)); 2023-01-11T21:38:06.0088721Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0088823Z tmp8.store(out_ptr2 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0088892Z } 2023-01-11T21:38:06.0088988Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0089079Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0089145Z { 2023-01-11T21:38:06.0089248Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0089350Z auto tmp1 = in_ptr0[8 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089447Z auto tmp3 = in_ptr0[16 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089550Z auto tmp5 = in_ptr0[24 + i1 + (32*i0)]; 2023-01-11T21:38:06.0089641Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0089732Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0089825Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0089932Z 
auto tmp7 = static_cast(4); 2023-01-11T21:38:06.0090056Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0090146Z out_ptr2[i1 + (8*i0)] = tmp8; 2023-01-11T21:38:06.0090219Z } 2023-01-11T21:38:06.0090287Z } 2023-01-11T21:38:06.0090367Z #pragma omp for 2023-01-11T21:38:06.0090457Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0090526Z { 2023-01-11T21:38:06.0090663Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0090803Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 32 + (8*i0)); 2023-01-11T21:38:06.0090887Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0091022Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0091111Z auto tmp4 = tmp2 / tmp3; 2023-01-11T21:38:06.0091206Z tmp4.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0091273Z } 2023-01-11T21:38:06.0091373Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0091466Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0091526Z { 2023-01-11T21:38:06.0091614Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0091707Z auto tmp1 = in_ptr0[32 + i0]; 2023-01-11T21:38:06.0091798Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0091904Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0091991Z auto tmp4 = tmp2 / tmp3; 2023-01-11T21:38:06.0092078Z out_ptr3[i0] = tmp4; 2023-01-11T21:38:06.0092138Z } 2023-01-11T21:38:06.0092221Z #pragma omp single 2023-01-11T21:38:06.0092291Z { 2023-01-11T21:38:06.0092358Z { 2023-01-11T21:38:06.0092428Z { 2023-01-11T21:38:06.0092524Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:06.0092625Z auto tmp1 = static_cast(64); 2023-01-11T21:38:06.0092722Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0092820Z in_out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0092892Z } 2023-01-11T21:38:06.0092959Z } 2023-01-11T21:38:06.0093028Z } 2023-01-11T21:38:06.0093095Z } 2023-01-11T21:38:06.0093152Z } 2023-01-11T21:38:06.0093243Z ''') 2023-01-11T21:38:06.0093248Z 2023-01-11T21:38:06.0093253Z 2023-01-11T21:38:06.0093376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0093455Z del async_compile 2023-01-11T21:38:06.0093460Z 2023-01-11T21:38:06.0093535Z def call(args): 2023-01-11T21:38:06.0093608Z arg0_1, = args 2023-01-11T21:38:06.0093684Z args.clear() 2023-01-11T21:38:06.0093867Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094066Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094157Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0094363Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094685Z buf4 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0094780Z buf5 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0095000Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0095073Z del arg0_1 2023-01-11T21:38:06.0095161Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.0095166Z 2023-01-11T21:38:06.0095178Z 2023-01-11T21:38:06.0095251Z if __name__ == "__main__": 2023-01-11T21:38:06.0095369Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0095494Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0095705Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0095817Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0095822Z 2023-01-11T21:38:06.0095900Z ok (1.887s) 2023-01-11T21:38:06.0096363Z 
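[Note] For orientation, the fused kernel above for test_mean_cpu computes four reductions of the (1, 2, 4, 8) input in a single pass. A plain-PyTorch sketch of what the four outputs appear to be, inferred from the index arithmetic and the divisors (64, 8, 4, 2) in the generated code; this mapping is a reading of the dump, not taken from the test's own source:

import torch

x = torch.randn(1, 2, 4, 8)          # same shape/strides as arg0_1 in the dump above
buf5 = x.mean()                      # divide-by-64 branch (in_out_ptr1): global mean
buf2 = x.mean(dim=3)                 # divide-by-8 branch (in_out_ptr0): shape (1, 2, 4)
buf3 = x.mean(dim=2, keepdim=True)   # divide-by-4 branch (out_ptr2): shape (1, 2, 1, 8)
buf4 = x.mean(dim=1).reshape(4, 8)   # divide-by-2 branch (out_ptr3): shape (4, 8)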
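[Note] A pattern worth noting throughout these CPU kernels: each vectorized 8-lane loop is paired with a scalar remainder loop whose bounds coincide, e.g. "for(long i0=64; i0<64; i0+=1)" above. When the element count divides evenly by 8 the tail has a zero trip count and the compiler can discard it; it only runs for the last numel % 8 elements otherwise. A minimal sketch of the same main/tail split (illustrative only, not Inductor's actual codegen):

def sum_with_tail(xs, lanes=8):
    n_vec = (len(xs) // lanes) * lanes
    total = 0.0
    for i in range(0, n_vec, lanes):   # "vector" main loop: whole groups of `lanes` elements
        total += sum(xs[i:i + lanes])
    for i in range(n_vec, len(xs)):    # scalar tail; empty when len(xs) % lanes == 0
        total += xs[i]
    return total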
test_min_max_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0096540Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0096805Z [2023-01-11 21:29:30,801] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 198 2023-01-11T21:38:06.0097063Z [2023-01-11 21:29:33,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 198 2023-01-11T21:38:06.0097069Z 2023-01-11T21:38:06.0097238Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0097328Z import torch 2023-01-11T21:38:06.0097418Z import random 2023-01-11T21:38:06.0097554Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0097701Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0097707Z 2023-01-11T21:38:06.0097806Z aten = torch.ops.aten 2023-01-11T21:38:06.0097957Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0098056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0098061Z 2023-01-11T21:38:06.0098139Z import triton 2023-01-11T21:38:06.0098234Z import triton.language as tl 2023-01-11T21:38:06.0098360Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0098502Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0098509Z 2023-01-11T21:38:06.0098513Z 2023-01-11T21:38:06.0098654Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0098862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0098987Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0099092Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0099200Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0099304Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0099406Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0099473Z { 2023-01-11T21:38:06.0099542Z { 2023-01-11T21:38:06.0099946Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0100157Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0100275Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.0100520Z #pragma omp declare reduction(min:at::vec::Vectorized:omp_out = at::vec::minimum(omp_out, omp_in)) initializer(omp_priv={{std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0100649Z float tmp4 = std::numeric_limits::infinity(); 2023-01-11T21:38:06.0100777Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0100975Z float tmp7 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0101100Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0101213Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0101287Z { 2023-01-11T21:38:06.0101455Z #pragma omp for reduction(max:tmp3_vec) reduction(min:tmp4_vec) reduction(max:tmp7_vec) 2023-01-11T21:38:06.0101545Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0101615Z { 2023-01-11T21:38:06.0101758Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0101903Z auto tmp1 = 
at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0102000Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0102140Z auto tmp5 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0102269Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0102384Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2); 2023-01-11T21:38:06.0102503Z tmp4_vec = at::vec::minimum(tmp4_vec, tmp2); 2023-01-11T21:38:06.0102620Z tmp7_vec = at::vec::maximum(tmp7_vec, tmp6); 2023-01-11T21:38:06.0102691Z } 2023-01-11T21:38:06.0102904Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp3_vec); 2023-01-11T21:38:06.0103114Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::minimum(x, y);}, tmp4_vec); 2023-01-11T21:38:06.0103322Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp7_vec); 2023-01-11T21:38:06.0103496Z #pragma omp for simd simdlen(4) reduction(max:tmp3) reduction(min:tmp4) reduction(max:tmp7) 2023-01-11T21:38:06.0103591Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0103660Z { 2023-01-11T21:38:06.0103746Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0103839Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0103933Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0104039Z auto tmp5 = static_cast(1); 2023-01-11T21:38:06.0104129Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0104232Z tmp3 = std::max(tmp3, tmp2); 2023-01-11T21:38:06.0104332Z tmp4 = std::min(tmp4, tmp2); 2023-01-11T21:38:06.0104422Z tmp7 = std::max(tmp7, tmp6); 2023-01-11T21:38:06.0104491Z } 2023-01-11T21:38:06.0104559Z } 2023-01-11T21:38:06.0104643Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0104724Z out_ptr1[0] = tmp4; 2023-01-11T21:38:06.0104806Z out_ptr2[0] = tmp7; 2023-01-11T21:38:06.0104875Z } 2023-01-11T21:38:06.0104934Z } 2023-01-11T21:38:06.0105017Z ''') 2023-01-11T21:38:06.0105023Z 2023-01-11T21:38:06.0105027Z 2023-01-11T21:38:06.0105121Z async_compile.wait(globals()) 2023-01-11T21:38:06.0105198Z del async_compile 2023-01-11T21:38:06.0105203Z 2023-01-11T21:38:06.0105277Z def call(args): 2023-01-11T21:38:06.0105385Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0105462Z args.clear() 2023-01-11T21:38:06.0105644Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0105826Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0106017Z buf2 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0106240Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0106313Z del arg0_1 2023-01-11T21:38:06.0106383Z del arg1_1 2023-01-11T21:38:06.0106468Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0106477Z 2023-01-11T21:38:06.0106481Z 2023-01-11T21:38:06.0106563Z if __name__ == "__main__": 2023-01-11T21:38:06.0106676Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0106803Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0107002Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0107199Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0107317Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0107323Z 2023-01-11T21:38:06.0107395Z ok (2.310s) 2023-01-11T21:38:06.0107863Z test_misaligned_address_issue1_cpu (__main__.CpuTests) 
... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0108028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0108288Z [2023-01-11 21:29:33,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 199 2023-01-11T21:38:06.0108547Z [2023-01-11 21:29:35,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 199 2023-01-11T21:38:06.0108561Z 2023-01-11T21:38:06.0108651Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0108731Z import torch 2023-01-11T21:38:06.0108807Z import random 2023-01-11T21:38:06.0108928Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0109051Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0109056Z 2023-01-11T21:38:06.0109137Z aten = torch.ops.aten 2023-01-11T21:38:06.0109274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0109365Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0109370Z 2023-01-11T21:38:06.0109445Z import triton 2023-01-11T21:38:06.0109544Z import triton.language as tl 2023-01-11T21:38:06.0109668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0109806Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0109815Z 2023-01-11T21:38:06.0109820Z 2023-01-11T21:38:06.0109956Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0110163Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0110285Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0110388Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0110494Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0110560Z { 2023-01-11T21:38:06.0110624Z { 2023-01-11T21:38:06.0110692Z { 2023-01-11T21:38:06.0110784Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.0110880Z auto tmp1 = in_ptr1[tmp0]; 2023-01-11T21:38:06.0110957Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0111026Z } 2023-01-11T21:38:06.0111094Z } 2023-01-11T21:38:06.0111160Z } 2023-01-11T21:38:06.0111244Z ''') 2023-01-11T21:38:06.0111249Z 2023-01-11T21:38:06.0111254Z 2023-01-11T21:38:06.0111376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0111455Z del async_compile 2023-01-11T21:38:06.0111460Z 2023-01-11T21:38:06.0111527Z def call(args): 2023-01-11T21:38:06.0111608Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0111683Z args.clear() 2023-01-11T21:38:06.0111877Z buf0 = empty_strided((1, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0112045Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0112123Z del arg0_1 2023-01-11T21:38:06.0112196Z del arg1_1 2023-01-11T21:38:06.0112264Z return (buf0, ) 2023-01-11T21:38:06.0112269Z 2023-01-11T21:38:06.0112284Z 2023-01-11T21:38:06.0112359Z if __name__ == "__main__": 2023-01-11T21:38:06.0112477Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0112603Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0112807Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0113000Z 
arg1_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0113121Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0113126Z 2023-01-11T21:38:06.0113196Z ok (2.001s) 2023-01-11T21:38:06.0113650Z test_mm_views_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0113812Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0114074Z [2023-01-11 21:29:35,134] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 200 2023-01-11T21:38:06.0114338Z [2023-01-11 21:29:35,137] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 200 2023-01-11T21:38:06.0114344Z 2023-01-11T21:38:06.0114447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0114521Z import torch 2023-01-11T21:38:06.0114594Z import random 2023-01-11T21:38:06.0114714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0114838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0114843Z 2023-01-11T21:38:06.0114935Z aten = torch.ops.aten 2023-01-11T21:38:06.0115089Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0115201Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0115207Z 2023-01-11T21:38:06.0115285Z import triton 2023-01-11T21:38:06.0115380Z import triton.language as tl 2023-01-11T21:38:06.0115507Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0115646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0115651Z 2023-01-11T21:38:06.0115655Z 2023-01-11T21:38:06.0115748Z async_compile.wait(globals()) 2023-01-11T21:38:06.0115820Z del async_compile 2023-01-11T21:38:06.0115832Z 2023-01-11T21:38:06.0115900Z def call(args): 2023-01-11T21:38:06.0115981Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0116056Z args.clear() 2023-01-11T21:38:06.0116257Z buf0 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0116386Z aten.mm.out(arg0_1, as_strided(arg1_1, (32, 32), (32, 1)), out=buf0) 2023-01-11T21:38:06.0116460Z del arg0_1 2023-01-11T21:38:06.0116524Z del arg1_1 2023-01-11T21:38:06.0116602Z return (buf0, ) 2023-01-11T21:38:06.0116607Z 2023-01-11T21:38:06.0116612Z 2023-01-11T21:38:06.0116694Z if __name__ == "__main__": 2023-01-11T21:38:06.0116812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0116936Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0117138Z arg0_1 = rand_strided((32, 32), (1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0117379Z arg1_1 = rand_strided((32, 1, 32), (32, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0117498Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0117503Z 2023-01-11T21:38:06.0117576Z ok (0.068s) 2023-01-11T21:38:06.0118030Z test_move_arange_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0118164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0118420Z [2023-01-11 21:29:35,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 201
2023-01-11T21:38:06.0118683Z [2023-01-11 21:29:37,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 201
2023-01-11T21:38:06.0118692Z
2023-01-11T21:38:06.0118791Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0118865Z import torch
2023-01-11T21:38:06.0118939Z import random
2023-01-11T21:38:06.0119060Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0119186Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0119191Z
2023-01-11T21:38:06.0119266Z aten = torch.ops.aten
2023-01-11T21:38:06.0119405Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0119501Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0119506Z
2023-01-11T21:38:06.0119579Z import triton
2023-01-11T21:38:06.0119701Z import triton.language as tl
2023-01-11T21:38:06.0119826Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0119965Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0119972Z
2023-01-11T21:38:06.0119976Z
2023-01-11T21:38:06.0120114Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0120315Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0120440Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0120545Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0120610Z {
2023-01-11T21:38:06.0120711Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0120777Z {
2023-01-11T21:38:06.0120858Z #pragma omp for
2023-01-11T21:38:06.0120938Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0121004Z {
2023-01-11T21:38:06.0121073Z {
2023-01-11T21:38:06.0121144Z {
2023-01-11T21:38:06.0121249Z auto tmp2 = in_ptr0[i0];
2023-01-11T21:38:06.0121359Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0121471Z auto tmp1 = static_cast<float>(tmp0);
2023-01-11T21:38:06.0121560Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0121655Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0121724Z }
2023-01-11T21:38:06.0121792Z }
2023-01-11T21:38:06.0121859Z }
2023-01-11T21:38:06.0121926Z }
2023-01-11T21:38:06.0121983Z }
2023-01-11T21:38:06.0122067Z ''')
2023-01-11T21:38:06.0122073Z
2023-01-11T21:38:06.0122077Z
2023-01-11T21:38:06.0122170Z async_compile.wait(globals())
2023-01-11T21:38:06.0122247Z del async_compile
2023-01-11T21:38:06.0122252Z
2023-01-11T21:38:06.0122326Z def call(args):
2023-01-11T21:38:06.0122400Z arg0_1, = args
2023-01-11T21:38:06.0122477Z args.clear()
2023-01-11T21:38:06.0122670Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0122805Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0122881Z del arg0_1
2023-01-11T21:38:06.0122956Z return (buf0, )
2023-01-11T21:38:06.0122961Z
2023-01-11T21:38:06.0122966Z
2023-01-11T21:38:06.0123045Z if __name__ == "__main__":
2023-01-11T21:38:06.0123190Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0123318Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0123515Z arg0_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0123619Z
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0123632Z 2023-01-11T21:38:06.0123696Z ok (2.029s) 2023-01-11T21:38:06.0124149Z test_multi_device_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0124283Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0124544Z [2023-01-11 21:29:37,367] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 202 2023-01-11T21:38:06.0124724Z [2023-01-11 21:29:37,370] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0124903Z [2023-01-11 21:29:37,371] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125087Z [2023-01-11 21:29:37,373] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125287Z [2023-01-11 21:29:37,374] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.0125563Z [2023-01-11 21:29:39,669] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 202 2023-01-11T21:38:06.0125569Z 2023-01-11T21:38:06.0125661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0125778Z import torch 2023-01-11T21:38:06.0125852Z import random 2023-01-11T21:38:06.0125972Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0126096Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0126101Z 2023-01-11T21:38:06.0126181Z aten = torch.ops.aten 2023-01-11T21:38:06.0126321Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0126409Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0126424Z 2023-01-11T21:38:06.0126490Z import triton 2023-01-11T21:38:06.0126586Z import triton.language as tl 2023-01-11T21:38:06.0126711Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0126851Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0126856Z 2023-01-11T21:38:06.0126861Z 2023-01-11T21:38:06.0126997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0127202Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0127331Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0127428Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0127498Z { 2023-01-11T21:38:06.0127601Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0127667Z { 2023-01-11T21:38:06.0127753Z #pragma omp for 2023-01-11T21:38:06.0127840Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0127910Z { 2023-01-11T21:38:06.0128042Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0128181Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0128271Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0128407Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0128496Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0128592Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0128665Z } 2023-01-11T21:38:06.0128765Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0128845Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0128912Z { 2023-01-11T21:38:06.0128999Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0129104Z auto tmp1 = static_cast(1); 
2023-01-11T21:38:06.0129225Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0129330Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0129423Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0129502Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0129571Z } 2023-01-11T21:38:06.0129638Z } 2023-01-11T21:38:06.0129703Z } 2023-01-11T21:38:06.0129789Z ''') 2023-01-11T21:38:06.0129794Z 2023-01-11T21:38:06.0129799Z 2023-01-11T21:38:06.0130001Z triton_fused_add_add_1_add_2_add_3_device_put_1 = async_compile.triton(''' 2023-01-11T21:38:06.0130078Z import triton 2023-01-11T21:38:06.0130163Z import triton.language as tl 2023-01-11T21:38:06.0130282Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.0130384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.0130517Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.0130644Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.0130650Z 2023-01-11T21:38:06.0131058Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.0131133Z @triton.jit 2023-01-11T21:38:06.0131260Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.0131327Z xnumel = 40 2023-01-11T21:38:06.0131425Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.0131557Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.0131640Z xmask = xindex < xnumel 2023-01-11T21:38:06.0131712Z x0 = xindex 2023-01-11T21:38:06.0131844Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.0131915Z tmp1 = 3 2023-01-11T21:38:06.0131988Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.0132056Z tmp3 = 4 2023-01-11T21:38:06.0132134Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.0132276Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.0132365Z ''') 2023-01-11T21:38:06.0132371Z 2023-01-11T21:38:06.0132375Z 2023-01-11T21:38:06.0132512Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.0132718Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0132834Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.0132899Z { 2023-01-11T21:38:06.0132999Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0133065Z { 2023-01-11T21:38:06.0133147Z #pragma omp for 2023-01-11T21:38:06.0133234Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0133304Z { 2023-01-11T21:38:06.0133442Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0133580Z auto tmp1 = at::vec::Vectorized(static_cast(5)); 2023-01-11T21:38:06.0133670Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0133808Z auto tmp3 = at::vec::Vectorized(static_cast(6)); 2023-01-11T21:38:06.0133897Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0134001Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0134071Z } 2023-01-11T21:38:06.0134165Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0134255Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0134326Z { 2023-01-11T21:38:06.0134420Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0134633Z auto tmp1 = static_cast(5); 2023-01-11T21:38:06.0134723Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0134828Z auto tmp3 = static_cast(6); 2023-01-11T21:38:06.0134913Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0135017Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0135093Z 
} 2023-01-11T21:38:06.0135167Z } 2023-01-11T21:38:06.0135246Z } 2023-01-11T21:38:06.0135333Z ''') 2023-01-11T21:38:06.0135338Z 2023-01-11T21:38:06.0135343Z 2023-01-11T21:38:06.0135649Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_3 = async_compile.triton(''' 2023-01-11T21:38:06.0135727Z import triton 2023-01-11T21:38:06.0135812Z import triton.language as tl 2023-01-11T21:38:06.0135927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.0136029Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.0136163Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.0136291Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.0136297Z 2023-01-11T21:38:06.0136694Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.0136771Z @triton.jit 2023-01-11T21:38:06.0136898Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.0136965Z xnumel = 40 2023-01-11T21:38:06.0137064Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.0137249Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.0137333Z xmask = xindex < xnumel 2023-01-11T21:38:06.0137402Z x0 = xindex 2023-01-11T21:38:06.0137508Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.0137579Z tmp1 = 7 2023-01-11T21:38:06.0137652Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.0137722Z tmp3 = 8 2023-01-11T21:38:06.0137804Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.0137942Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.0138027Z ''') 2023-01-11T21:38:06.0138071Z 2023-01-11T21:38:06.0138076Z 2023-01-11T21:38:06.0138214Z kernel_cpp_4 = async_compile.cpp(''' 2023-01-11T21:38:06.0138422Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0138539Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.0138605Z { 2023-01-11T21:38:06.0138712Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0138781Z { 2023-01-11T21:38:06.0138865Z #pragma omp for 2023-01-11T21:38:06.0138954Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0139023Z { 2023-01-11T21:38:06.0139165Z auto tmp0 = at::vec::Vectorized::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0139306Z auto tmp1 = at::vec::Vectorized(static_cast(9)); 2023-01-11T21:38:06.0139398Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0139538Z auto tmp3 = at::vec::Vectorized(static_cast(10)); 2023-01-11T21:38:06.0139629Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0139738Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0139806Z } 2023-01-11T21:38:06.0139902Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0139992Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0140059Z { 2023-01-11T21:38:06.0140160Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0140266Z auto tmp1 = static_cast(9); 2023-01-11T21:38:06.0140363Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0140469Z auto tmp3 = static_cast(10); 2023-01-11T21:38:06.0140553Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0140644Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0140712Z } 2023-01-11T21:38:06.0140780Z } 2023-01-11T21:38:06.0140846Z } 2023-01-11T21:38:06.0140933Z ''') 2023-01-11T21:38:06.0140938Z 2023-01-11T21:38:06.0140943Z 2023-01-11T21:38:06.0141038Z 
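# [Editorial note, not part of the generated module] The call() below ping-pongs between
# devices: kernel_cpp_0 adds 1 and 2 on CPU, the buffer is copied to CUDA where
# triton_fused_add_add_1_add_2_add_3_device_put_1 adds 3 and 4 in place, it is copied back
# for kernel_cpp_2 (adds 5 and 6), returned to CUDA for the second Triton kernel (adds 7
# and 8), and finished on CPU by kernel_cpp_4 (adds 9 and 10). Each hop reuses a dead
# buffer's allocation via the "bufN = bufM; del bufM  # reuse" idiom instead of allocating anew.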
async_compile.wait(globals()) 2023-01-11T21:38:06.0141115Z del async_compile 2023-01-11T21:38:06.0141129Z 2023-01-11T21:38:06.0141200Z def call(args): 2023-01-11T21:38:06.0141275Z arg0_1, = args 2023-01-11T21:38:06.0141353Z args.clear() 2023-01-11T21:38:06.0141564Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0141740Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0141817Z del arg0_1 2023-01-11T21:38:06.0141909Z with torch.cuda.device(0): 2023-01-11T21:38:06.0142112Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.0142193Z buf1.copy_(buf0) 2023-01-11T21:38:06.0142285Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0142377Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.0142540Z triton_fused_add_add_1_add_2_add_3_device_put_1.run(buf2, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.0142630Z buf3 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0142712Z buf3.copy_(buf2) 2023-01-11T21:38:06.0142793Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0142900Z kernel_cpp_2(c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0142995Z with torch.cuda.device(0): 2023-01-11T21:38:06.0143085Z buf5 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0143169Z buf5.copy_(buf4) 2023-01-11T21:38:06.0143261Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0143452Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_3.run(buf6, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.0143534Z buf7 = buf4; del buf4 # reuse 2023-01-11T21:38:06.0143614Z buf7.copy_(buf6) 2023-01-11T21:38:06.0143684Z del buf6 2023-01-11T21:38:06.0143773Z buf8 = buf7; del buf7 # reuse 2023-01-11T21:38:06.0143877Z kernel_cpp_4(c_void_p(buf8.data_ptr())) 2023-01-11T21:38:06.0143952Z return (buf8, ) 2023-01-11T21:38:06.0143957Z 2023-01-11T21:38:06.0143962Z 2023-01-11T21:38:06.0144082Z if __name__ == "__main__": 2023-01-11T21:38:06.0144193Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0144319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0144527Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0144644Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0144649Z 2023-01-11T21:38:06.0144719Z ok (2.484s) 2023-01-11T21:38:06.0144887Z test_multi_gpu_device_cpu (__main__.CpuTests) ... skip: requires multiple cuda devices (0.000s) 2023-01-11T21:38:06.0145040Z test_multilayer_low_prec_cpu (__main__.CpuTests) ... skip: requires CUDA (0.001s) 2023-01-11T21:38:06.0145190Z test_nan_to_num_cpu (__main__.CpuTests) ... skip: Skipping due to op bugs (0.001s) 2023-01-11T21:38:06.0145645Z test_narrow_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0145779Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0146033Z [2023-01-11 21:29:39,703] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 203
2023-01-11T21:38:06.0146298Z [2023-01-11 21:29:42,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 203
2023-01-11T21:38:06.0146304Z
2023-01-11T21:38:06.0146402Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0146477Z import torch
2023-01-11T21:38:06.0146552Z import random
2023-01-11T21:38:06.0146672Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0146796Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0146801Z
2023-01-11T21:38:06.0146885Z aten = torch.ops.aten
2023-01-11T21:38:06.0147016Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0147114Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0147120Z
2023-01-11T21:38:06.0147194Z import triton
2023-01-11T21:38:06.0147287Z import triton.language as tl
2023-01-11T21:38:06.0147411Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0147576Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0147582Z
2023-01-11T21:38:06.0147587Z
2023-01-11T21:38:06.0147722Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0147932Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0148048Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0148155Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0148221Z {
2023-01-11T21:38:06.0148322Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0148390Z {
2023-01-11T21:38:06.0148477Z #pragma omp for
2023-01-11T21:38:06.0148557Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0148624Z {
2023-01-11T21:38:06.0148768Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 640 + (8*i0));
2023-01-11T21:38:06.0148907Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0149002Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0149141Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0149229Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0149325Z tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0149385Z }
2023-01-11T21:38:06.0149486Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0149579Z for(long i0=1024; i0<1024; i0+=1)
2023-01-11T21:38:06.0149646Z {
2023-01-11T21:38:06.0149741Z auto tmp0 = in_ptr0[640 + i0];
2023-01-11T21:38:06.0149844Z auto tmp1 = static_cast<float>(2);
2023-01-11T21:38:06.0149961Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0150057Z auto tmp3 = static_cast<float>(1);
2023-01-11T21:38:06.0150145Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0150232Z out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.0150301Z }
2023-01-11T21:38:06.0150368Z }
2023-01-11T21:38:06.0150438Z }
2023-01-11T21:38:06.0150524Z ''')
2023-01-11T21:38:06.0150529Z
2023-01-11T21:38:06.0150534Z
2023-01-11T21:38:06.0150620Z async_compile.wait(globals())
2023-01-11T21:38:06.0150702Z del async_compile
2023-01-11T21:38:06.0150708Z
2023-01-11T21:38:06.0150783Z def call(args):
2023-01-11T21:38:06.0150857Z arg0_1, = args
2023-01-11T21:38:06.0150936Z args.clear()
2023-01-11T21:38:06.0151138Z buf0 = empty_strided((16, 64), (64, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0151276Z
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0151383Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.0151401Z 2023-01-11T21:38:06.0151405Z 2023-01-11T21:38:06.0151478Z if __name__ == "__main__": 2023-01-11T21:38:06.0151598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0151723Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0151930Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0152044Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0152049Z 2023-01-11T21:38:06.0152119Z ok (2.372s) 2023-01-11T21:38:06.0152582Z test_new_empty_strided_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0152716Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0152973Z [2023-01-11 21:29:42,095] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 204 2023-01-11T21:38:06.0153230Z [2023-01-11 21:29:44,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 204 2023-01-11T21:38:06.0153314Z 2023-01-11T21:38:06.0153406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0153483Z import torch 2023-01-11T21:38:06.0153557Z import random 2023-01-11T21:38:06.0153676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0153800Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0153805Z 2023-01-11T21:38:06.0153886Z aten = torch.ops.aten 2023-01-11T21:38:06.0154026Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0154115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0154120Z 2023-01-11T21:38:06.0154194Z import triton 2023-01-11T21:38:06.0154292Z import triton.language as tl 2023-01-11T21:38:06.0154418Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0154557Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0154563Z 2023-01-11T21:38:06.0154567Z 2023-01-11T21:38:06.0154703Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0154910Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0155028Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0155086Z { 2023-01-11T21:38:06.0155191Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0155257Z { 2023-01-11T21:38:06.0155342Z #pragma omp for 2023-01-11T21:38:06.0155431Z for(long i0=0; i0<2048; i0+=1) 2023-01-11T21:38:06.0155498Z { 2023-01-11T21:38:06.0155640Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(123)); 2023-01-11T21:38:06.0155728Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0155829Z } 2023-01-11T21:38:06.0155928Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0156023Z for(long i0=16384; i0<16384; i0+=1) 2023-01-11T21:38:06.0156091Z { 2023-01-11T21:38:06.0156196Z auto tmp0 = static_cast<float>(123); 2023-01-11T21:38:06.0156284Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0156346Z } 2023-01-11T21:38:06.0156412Z } 2023-01-11T21:38:06.0156478Z } 2023-01-11T21:38:06.0156564Z ''')
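Dumps like the two modules above can be regenerated locally. A minimal sketch, assuming a PyTorch build of this vintage where torch.compile routes through inductor and torch._inductor.config.debug echoes the generated wrapper and kernel source (treat both knobs as assumptions to verify against your checkout):

import torch
import torch._inductor.config as inductor_config

inductor_config.debug = True  # assumption: echoes the generated call()/kernel_cpp source

def fn(x):
    # mirrors the narrow-then-add pattern of graph 203 above: the C++ kernel
    # reads in_ptr0 at offset 640 (= 10 rows of 64) and adds 2, then 1
    return x.narrow(0, 10, 16) + 2 + 1

compiled = torch.compile(fn)
out = compiled(torch.randn(64, 64))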
2023-01-11T21:38:06.0156569Z 2023-01-11T21:38:06.0156574Z 2023-01-11T21:38:06.0156667Z async_compile.wait(globals()) 2023-01-11T21:38:06.0156744Z del async_compile 2023-01-11T21:38:06.0156749Z 2023-01-11T21:38:06.0156824Z def call(args): 2023-01-11T21:38:06.0156890Z arg0_1, = args 2023-01-11T21:38:06.0156965Z args.clear() 2023-01-11T21:38:06.0157183Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0157293Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0157373Z return (buf0, ) 2023-01-11T21:38:06.0157378Z 2023-01-11T21:38:06.0157382Z 2023-01-11T21:38:06.0157463Z if __name__ == "__main__": 2023-01-11T21:38:06.0157581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0157700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0157895Z arg0_1 = rand_strided((55, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0158007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0158012Z 2023-01-11T21:38:06.0158085Z ok (2.177s) 2023-01-11T21:38:06.0158537Z test_new_ones_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0158672Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0158936Z [2023-01-11 21:29:44,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 205 2023-01-11T21:38:06.0159228Z [2023-01-11 21:29:46,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 205 2023-01-11T21:38:06.0159234Z 2023-01-11T21:38:06.0159335Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0159411Z import torch 2023-01-11T21:38:06.0159478Z import random 2023-01-11T21:38:06.0159598Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0159724Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0159729Z 2023-01-11T21:38:06.0159812Z aten = torch.ops.aten 2023-01-11T21:38:06.0159950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0160043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0160049Z 2023-01-11T21:38:06.0160122Z import triton 2023-01-11T21:38:06.0160212Z import triton.language as tl 2023-01-11T21:38:06.0160336Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0160477Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0160482Z 2023-01-11T21:38:06.0160487Z 2023-01-11T21:38:06.0160626Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0160832Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0160951Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0161055Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0161121Z { 2023-01-11T21:38:06.0161180Z { 2023-01-11T21:38:06.0161247Z { 2023-01-11T21:38:06.0161351Z auto tmp0 = static_cast<float>(1); 2023-01-11T21:38:06.0161438Z out_ptr0[0] = tmp0; 2023-01-11T21:38:06.0161507Z } 2023-01-11T21:38:06.0161575Z } 2023-01-11T21:38:06.0161642Z { 2023-01-11T21:38:06.0161736Z { 2023-01-11T21:38:06.0161841Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.0161927Z 
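// [editor note] zero-dimensional outputs need no loops: this kernel is just
// two scalar stores, the 1 into out_ptr0 above and the 0 into out_ptr1 below.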
out_ptr1[0] = tmp0; 2023-01-11T21:38:06.0161994Z } 2023-01-11T21:38:06.0162060Z } 2023-01-11T21:38:06.0162125Z } 2023-01-11T21:38:06.0162203Z ''') 2023-01-11T21:38:06.0162208Z 2023-01-11T21:38:06.0162219Z 2023-01-11T21:38:06.0162309Z async_compile.wait(globals()) 2023-01-11T21:38:06.0162387Z del async_compile 2023-01-11T21:38:06.0162392Z 2023-01-11T21:38:06.0162468Z def call(args): 2023-01-11T21:38:06.0162545Z arg0_1, = args 2023-01-11T21:38:06.0162620Z args.clear() 2023-01-11T21:38:06.0162810Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0162991Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0163123Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0163204Z return (buf0, buf1, ) 2023-01-11T21:38:06.0163210Z 2023-01-11T21:38:06.0163217Z 2023-01-11T21:38:06.0163301Z if __name__ == "__main__": 2023-01-11T21:38:06.0163421Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0163548Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0163740Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0163854Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0163860Z 2023-01-11T21:38:06.0163929Z ok (1.990s) 2023-01-11T21:38:06.0164375Z test_nll_loss_forward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0164503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0164762Z [2023-01-11 21:29:46,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 206 2023-01-11T21:38:06.0165024Z [2023-01-11 21:29:48,521] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 206 2023-01-11T21:38:06.0165030Z 2023-01-11T21:38:06.0165164Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0165243Z import torch 2023-01-11T21:38:06.0165321Z import random 2023-01-11T21:38:06.0165440Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0165574Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0165579Z 2023-01-11T21:38:06.0165654Z aten = torch.ops.aten 2023-01-11T21:38:06.0165789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0165885Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0165890Z 2023-01-11T21:38:06.0165965Z import triton 2023-01-11T21:38:06.0166055Z import triton.language as tl 2023-01-11T21:38:06.0166181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0166320Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0166326Z 2023-01-11T21:38:06.0166330Z 2023-01-11T21:38:06.0166471Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0166669Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0166791Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0166902Z const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.0167012Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0167119Z float* __restrict__ out_ptr1) 
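// [editor note] nll_loss forward, mean reduction: the first block below sums
// -in_ptr1[target + 5*i] over the five rows, the second divides the total by
// the element count, and the third stores the total weight (5.0).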
2023-01-11T21:38:06.0167184Z { 2023-01-11T21:38:06.0167275Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.0167336Z { 2023-01-11T21:38:06.0167404Z { 2023-01-11T21:38:06.0167488Z float tmp3 = 0; 2023-01-11T21:38:06.0167576Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0167681Z { 2023-01-11T21:38:06.0167751Z { 2023-01-11T21:38:06.0167848Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0167947Z auto tmp1 = in_ptr1[tmp0 + (5*i0)]; 2023-01-11T21:38:06.0168077Z auto tmp2 = -tmp1; 2023-01-11T21:38:06.0168164Z tmp3 += tmp2; 2023-01-11T21:38:06.0168233Z } 2023-01-11T21:38:06.0168300Z } 2023-01-11T21:38:06.0168384Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0168452Z } 2023-01-11T21:38:06.0168510Z } 2023-01-11T21:38:06.0168576Z { 2023-01-11T21:38:06.0168644Z { 2023-01-11T21:38:06.0168734Z auto tmp0 = out_ptr0[0]; 2023-01-11T21:38:06.0168838Z auto tmp1 = static_cast<float>(5); 2023-01-11T21:38:06.0168928Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0169008Z in_out_ptr0[0] = tmp2; 2023-01-11T21:38:06.0169072Z } 2023-01-11T21:38:06.0169136Z } 2023-01-11T21:38:06.0169204Z { 2023-01-11T21:38:06.0169270Z { 2023-01-11T21:38:06.0169375Z auto tmp0 = static_cast<float>(5.0); 2023-01-11T21:38:06.0169457Z out_ptr1[0] = tmp0; 2023-01-11T21:38:06.0169516Z } 2023-01-11T21:38:06.0169583Z } 2023-01-11T21:38:06.0169647Z } 2023-01-11T21:38:06.0169730Z ''') 2023-01-11T21:38:06.0169739Z 2023-01-11T21:38:06.0169745Z 2023-01-11T21:38:06.0169836Z async_compile.wait(globals()) 2023-01-11T21:38:06.0169915Z del async_compile 2023-01-11T21:38:06.0169920Z 2023-01-11T21:38:06.0169998Z def call(args): 2023-01-11T21:38:06.0170071Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0170146Z args.clear() 2023-01-11T21:38:06.0170333Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0170424Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0170601Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0170795Z kernel_cpp_0(c_void_p(buf1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0170871Z del arg0_1 2023-01-11T21:38:06.0170936Z del arg1_1 2023-01-11T21:38:06.0171019Z return (buf1, buf2, ) 2023-01-11T21:38:06.0171024Z 2023-01-11T21:38:06.0171029Z 2023-01-11T21:38:06.0171110Z if __name__ == "__main__": 2023-01-11T21:38:06.0171259Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0171387Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0171584Z arg0_1 = rand_strided((5, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0171775Z arg1_1 = rand_strided((5, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0171895Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0171901Z 2023-01-11T21:38:06.0171964Z ok (2.314s) 2023-01-11T21:38:06.0172436Z test_no_mega_fusion_during_lowering_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0172573Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0172829Z [2023-01-11 21:29:48,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 207 2023-01-11T21:38:06.0172835Z 2023-01-11T21:38:06.0172933Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0173011Z import torch 2023-01-11T21:38:06.0173086Z import random 2023-01-11T21:38:06.0173202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0173327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0173332Z 2023-01-11T21:38:06.0173407Z aten = torch.ops.aten 2023-01-11T21:38:06.0173542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0173667Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0173672Z 2023-01-11T21:38:06.0173746Z import triton 2023-01-11T21:38:06.0173838Z import triton.language as tl 2023-01-11T21:38:06.0173963Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0174106Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0174111Z 2023-01-11T21:38:06.0174115Z 2023-01-11T21:38:06.0174254Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0174451Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0174788Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0174900Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0175011Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0175123Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0175236Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:06.0175341Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:06.0175447Z const float* __restrict__ in_ptr5, 2023-01-11T21:38:06.0175545Z const float* __restrict__ in_ptr6, 2023-01-11T21:38:06.0175654Z const float* __restrict__ in_ptr7, 2023-01-11T21:38:06.0175762Z const float* __restrict__ in_ptr8, 2023-01-11T21:38:06.0175866Z const float* __restrict__ in_ptr9, 2023-01-11T21:38:06.0175981Z const float* __restrict__ in_ptr10, 2023-01-11T21:38:06.0176092Z const float* __restrict__ in_ptr11, 2023-01-11T21:38:06.0176200Z const float* __restrict__ in_ptr12, 2023-01-11T21:38:06.0176306Z const float* __restrict__ in_ptr13, 2023-01-11T21:38:06.0176406Z const float* __restrict__ in_ptr14, 2023-01-11T21:38:06.0176515Z const float* __restrict__ in_ptr15, 2023-01-11T21:38:06.0176621Z const float* __restrict__ in_ptr16, 2023-01-11T21:38:06.0176729Z const float* __restrict__ in_ptr17, 2023-01-11T21:38:06.0176834Z const float* __restrict__ in_ptr18, 2023-01-11T21:38:06.0176993Z const float* __restrict__ in_ptr19, 2023-01-11T21:38:06.0177102Z const float* __restrict__ in_ptr20, 2023-01-11T21:38:06.0177255Z const float* __restrict__ in_ptr21, 2023-01-11T21:38:06.0177363Z const float* __restrict__ in_ptr22, 2023-01-11T21:38:06.0177468Z const float* __restrict__ in_ptr23, 2023-01-11T21:38:06.0177574Z const float* __restrict__ in_ptr24, 2023-01-11T21:38:06.0177679Z const float* __restrict__ in_ptr25, 2023-01-11T21:38:06.0177784Z const float* __restrict__ in_ptr26, 2023-01-11T21:38:06.0177894Z const float* __restrict__ in_ptr27, 2023-01-11T21:38:06.0177999Z const float* __restrict__ in_ptr28, 2023-01-11T21:38:06.0178098Z const float* __restrict__ in_ptr29, 2023-01-11T21:38:06.0178206Z const float* __restrict__ in_ptr30, 
2023-01-11T21:38:06.0178311Z const float* __restrict__ in_ptr31, 2023-01-11T21:38:06.0178418Z const float* __restrict__ in_ptr32, 2023-01-11T21:38:06.0178524Z const float* __restrict__ in_ptr33, 2023-01-11T21:38:06.0178627Z const float* __restrict__ in_ptr34, 2023-01-11T21:38:06.0178733Z const float* __restrict__ in_ptr35, 2023-01-11T21:38:06.0178831Z const float* __restrict__ in_ptr36, 2023-01-11T21:38:06.0178935Z const float* __restrict__ in_ptr37, 2023-01-11T21:38:06.0179080Z const float* __restrict__ in_ptr38, 2023-01-11T21:38:06.0179186Z const float* __restrict__ in_ptr39, 2023-01-11T21:38:06.0179292Z const float* __restrict__ in_ptr40, 2023-01-11T21:38:06.0179398Z const float* __restrict__ in_ptr41, 2023-01-11T21:38:06.0179505Z const float* __restrict__ in_ptr42, 2023-01-11T21:38:06.0179612Z const float* __restrict__ in_ptr43, 2023-01-11T21:38:06.0179711Z const float* __restrict__ in_ptr44, 2023-01-11T21:38:06.0179819Z const float* __restrict__ in_ptr45, 2023-01-11T21:38:06.0179923Z const float* __restrict__ in_ptr46, 2023-01-11T21:38:06.0180035Z const float* __restrict__ in_ptr47, 2023-01-11T21:38:06.0180140Z const float* __restrict__ in_ptr48, 2023-01-11T21:38:06.0180247Z const float* __restrict__ in_ptr49) 2023-01-11T21:38:06.0180314Z { 2023-01-11T21:38:06.0180399Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.0180498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0180563Z { 2023-01-11T21:38:06.0180645Z #pragma omp for 2023-01-11T21:38:06.0180733Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0180798Z { 2023-01-11T21:38:06.0180943Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0181073Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0181204Z auto tmp4 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:06.0181336Z auto tmp6 = at::vec::Vectorized<float>::loadu(in_ptr3 + 8*i0); 2023-01-11T21:38:06.0181465Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr4 + 8*i0); 2023-01-11T21:38:06.0181597Z auto tmp10 = at::vec::Vectorized<float>::loadu(in_ptr5 + 8*i0); 2023-01-11T21:38:06.0181731Z auto tmp12 = at::vec::Vectorized<float>::loadu(in_ptr6 + 8*i0); 2023-01-11T21:38:06.0181861Z auto tmp14 = at::vec::Vectorized<float>::loadu(in_ptr7 + 8*i0); 2023-01-11T21:38:06.0181992Z auto tmp16 = at::vec::Vectorized<float>::loadu(in_ptr8 + 8*i0); 2023-01-11T21:38:06.0182083Z auto tmp1 = tmp0 + tmp0; 2023-01-11T21:38:06.0182194Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.0182286Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:06.0182373Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.0182460Z auto tmp9 = tmp7 + tmp8; 2023-01-11T21:38:06.0182553Z auto tmp11 = tmp9 + tmp10; 2023-01-11T21:38:06.0182643Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.0182733Z auto tmp15 = tmp13 + tmp14; 2023-01-11T21:38:06.0182816Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:06.0182912Z tmp17.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0182979Z } 2023-01-11T21:38:06.0183077Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0183168Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0183235Z { 2023-01-11T21:38:06.0183322Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0183402Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.0183489Z auto tmp4 = in_ptr2[i0]; 2023-01-11T21:38:06.0183577Z auto tmp6 = in_ptr3[i0]; 2023-01-11T21:38:06.0183665Z auto tmp8 = in_ptr4[i0]; 2023-01-11T21:38:06.0183753Z auto tmp10 = in_ptr5[i0]; 2023-01-11T21:38:06.0183841Z auto tmp12 = in_ptr6[i0]; 2023-01-11T21:38:06.0183921Z auto tmp14 = in_ptr7[i0]; 2023-01-11T21:38:06.0184010Z auto tmp16 = in_ptr8[i0]; 
2023-01-11T21:38:06.0184098Z auto tmp1 = tmp0 + tmp0; 2023-01-11T21:38:06.0184184Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.0184274Z auto tmp5 = tmp3 + tmp4; 2023-01-11T21:38:06.0184359Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.0184474Z auto tmp9 = tmp7 + tmp8; 2023-01-11T21:38:06.0184556Z auto tmp11 = tmp9 + tmp10; 2023-01-11T21:38:06.0184648Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.0184738Z auto tmp15 = tmp13 + tmp14; 2023-01-11T21:38:06.0184828Z auto tmp17 = tmp15 + tmp16; 2023-01-11T21:38:06.0184916Z out_ptr0[i0] = tmp17; 2023-01-11T21:38:06.0184983Z } 2023-01-11T21:38:06.0185064Z #pragma omp for 2023-01-11T21:38:06.0185147Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0185231Z { 2023-01-11T21:38:06.0185387Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0185531Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr9 + 8*i0); 2023-01-11T21:38:06.0185665Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr10 + 8*i0); 2023-01-11T21:38:06.0185798Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr11 + 8*i0); 2023-01-11T21:38:06.0185932Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr12 + 8*i0); 2023-01-11T21:38:06.0186061Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr13 + 8*i0); 2023-01-11T21:38:06.0186199Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr14 + 8*i0); 2023-01-11T21:38:06.0186330Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr15 + 8*i0); 2023-01-11T21:38:06.0186465Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr16 + 8*i0); 2023-01-11T21:38:06.0186559Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0186648Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0186738Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0186823Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0186912Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0186995Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0187086Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0187176Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0187281Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0187351Z } 2023-01-11T21:38:06.0187451Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0187538Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0187597Z { 2023-01-11T21:38:06.0187714Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0187801Z auto tmp1 = in_ptr9[i0]; 2023-01-11T21:38:06.0187889Z auto tmp3 = in_ptr10[i0]; 2023-01-11T21:38:06.0187979Z auto tmp5 = in_ptr11[i0]; 2023-01-11T21:38:06.0188065Z auto tmp7 = in_ptr12[i0]; 2023-01-11T21:38:06.0188156Z auto tmp9 = in_ptr13[i0]; 2023-01-11T21:38:06.0188239Z auto tmp11 = in_ptr14[i0]; 2023-01-11T21:38:06.0188330Z auto tmp13 = in_ptr15[i0]; 2023-01-11T21:38:06.0188422Z auto tmp15 = in_ptr16[i0]; 2023-01-11T21:38:06.0188512Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0188601Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0188693Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0188781Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0188862Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0188952Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0189045Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0189137Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0189224Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0189291Z } 2023-01-11T21:38:06.0189372Z #pragma omp for 2023-01-11T21:38:06.0189452Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0189522Z { 2023-01-11T21:38:06.0189667Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0189801Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr17 + 8*i0); 2023-01-11T21:38:06.0189935Z auto tmp3 = 
at::vec::Vectorized<float>::loadu(in_ptr18 + 8*i0); 2023-01-11T21:38:06.0190094Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr19 + 8*i0); 2023-01-11T21:38:06.0190226Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr20 + 8*i0); 2023-01-11T21:38:06.0190354Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr21 + 8*i0); 2023-01-11T21:38:06.0190496Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr22 + 8*i0); 2023-01-11T21:38:06.0190623Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr23 + 8*i0); 2023-01-11T21:38:06.0190754Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr24 + 8*i0); 2023-01-11T21:38:06.0190844Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0190933Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0191019Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0191103Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0191192Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0191275Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0191367Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0191456Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0191556Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0191623Z } 2023-01-11T21:38:06.0191723Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0191814Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0191873Z { 2023-01-11T21:38:06.0191968Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0192058Z auto tmp1 = in_ptr17[i0]; 2023-01-11T21:38:06.0192151Z auto tmp3 = in_ptr18[i0]; 2023-01-11T21:38:06.0192239Z auto tmp5 = in_ptr19[i0]; 2023-01-11T21:38:06.0192327Z auto tmp7 = in_ptr20[i0]; 2023-01-11T21:38:06.0192415Z auto tmp9 = in_ptr21[i0]; 2023-01-11T21:38:06.0192498Z auto tmp11 = in_ptr22[i0]; 2023-01-11T21:38:06.0192588Z auto tmp13 = in_ptr23[i0]; 2023-01-11T21:38:06.0192677Z auto tmp15 = in_ptr24[i0]; 2023-01-11T21:38:06.0192768Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0192855Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0192942Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0193030Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0193111Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0193228Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0193321Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0193411Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0193499Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0193568Z } 2023-01-11T21:38:06.0193648Z #pragma omp for 2023-01-11T21:38:06.0193728Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0193793Z { 2023-01-11T21:38:06.0193935Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0194075Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr25 + 8*i0); 2023-01-11T21:38:06.0194218Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr26 + 8*i0); 2023-01-11T21:38:06.0194350Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr27 + 8*i0); 2023-01-11T21:38:06.0194483Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr28 + 8*i0); 2023-01-11T21:38:06.0194615Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr29 + 8*i0); 2023-01-11T21:38:06.0194746Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr30 + 8*i0); 2023-01-11T21:38:06.0194881Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr31 + 8*i0); 2023-01-11T21:38:06.0195014Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr32 + 8*i0); 2023-01-11T21:38:06.0195103Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0195193Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0195282Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0195369Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0195487Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0195570Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0195660Z auto 
tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0195748Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0195851Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0195919Z } 2023-01-11T21:38:06.0196022Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0196108Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0196168Z { 2023-01-11T21:38:06.0196263Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0196354Z auto tmp1 = in_ptr25[i0]; 2023-01-11T21:38:06.0196444Z auto tmp3 = in_ptr26[i0]; 2023-01-11T21:38:06.0196532Z auto tmp5 = in_ptr27[i0]; 2023-01-11T21:38:06.0196619Z auto tmp7 = in_ptr28[i0]; 2023-01-11T21:38:06.0196707Z auto tmp9 = in_ptr29[i0]; 2023-01-11T21:38:06.0196789Z auto tmp11 = in_ptr30[i0]; 2023-01-11T21:38:06.0196881Z auto tmp13 = in_ptr31[i0]; 2023-01-11T21:38:06.0196971Z auto tmp15 = in_ptr32[i0]; 2023-01-11T21:38:06.0197063Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0197150Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0197237Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0197320Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0197408Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0197496Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0197587Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0197677Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0197764Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0197833Z } 2023-01-11T21:38:06.0197907Z #pragma omp for 2023-01-11T21:38:06.0197994Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0198059Z { 2023-01-11T21:38:06.0198202Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0198337Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr33 + 8*i0); 2023-01-11T21:38:06.0198470Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr34 + 8*i0); 2023-01-11T21:38:06.0198601Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr35 + 8*i0); 2023-01-11T21:38:06.0198756Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr36 + 8*i0); 2023-01-11T21:38:06.0198886Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr37 + 8*i0); 2023-01-11T21:38:06.0199014Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr38 + 8*i0); 2023-01-11T21:38:06.0199148Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr39 + 8*i0); 2023-01-11T21:38:06.0199283Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr40 + 8*i0); 2023-01-11T21:38:06.0199372Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0199461Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0199549Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0199637Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0199718Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0199813Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0199906Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0199993Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0200096Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0200161Z } 2023-01-11T21:38:06.0200261Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0200340Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0200408Z { 2023-01-11T21:38:06.0200502Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0200592Z auto tmp1 = in_ptr33[i0]; 2023-01-11T21:38:06.0200681Z auto tmp3 = in_ptr34[i0]; 2023-01-11T21:38:06.0200770Z auto tmp5 = in_ptr35[i0]; 2023-01-11T21:38:06.0200858Z auto tmp7 = in_ptr36[i0]; 2023-01-11T21:38:06.0200938Z auto tmp9 = in_ptr37[i0]; 2023-01-11T21:38:06.0201056Z auto tmp11 = in_ptr38[i0]; 2023-01-11T21:38:06.0201147Z auto tmp13 = in_ptr39[i0]; 2023-01-11T21:38:06.0201236Z auto tmp15 = in_ptr40[i0]; 2023-01-11T21:38:06.0201325Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0201420Z auto tmp4 = tmp2 + 
tmp3; 2023-01-11T21:38:06.0201509Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0201590Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0201680Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0201769Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0201857Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0201949Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0202035Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0202102Z } 2023-01-11T21:38:06.0202176Z #pragma omp for 2023-01-11T21:38:06.0202262Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0202330Z { 2023-01-11T21:38:06.0202472Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0202606Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr41 + 8*i0); 2023-01-11T21:38:06.0202740Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr42 + 8*i0); 2023-01-11T21:38:06.0202875Z auto tmp5 = at::vec::Vectorized<float>::loadu(in_ptr43 + 8*i0); 2023-01-11T21:38:06.0203005Z auto tmp7 = at::vec::Vectorized<float>::loadu(in_ptr44 + 8*i0); 2023-01-11T21:38:06.0203127Z auto tmp9 = at::vec::Vectorized<float>::loadu(in_ptr45 + 8*i0); 2023-01-11T21:38:06.0203262Z auto tmp11 = at::vec::Vectorized<float>::loadu(in_ptr46 + 8*i0); 2023-01-11T21:38:06.0203396Z auto tmp13 = at::vec::Vectorized<float>::loadu(in_ptr47 + 8*i0); 2023-01-11T21:38:06.0203530Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr48 + 8*i0); 2023-01-11T21:38:06.0203625Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0203717Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0203806Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0203892Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0203975Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0204066Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0204183Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0204274Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0204376Z tmp16.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0204445Z } 2023-01-11T21:38:06.0204545Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0204624Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0204689Z { 2023-01-11T21:38:06.0204782Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0204874Z auto tmp1 = in_ptr41[i0]; 2023-01-11T21:38:06.0204963Z auto tmp3 = in_ptr42[i0]; 2023-01-11T21:38:06.0205051Z auto tmp5 = in_ptr43[i0]; 2023-01-11T21:38:06.0205142Z auto tmp7 = in_ptr44[i0]; 2023-01-11T21:38:06.0205222Z auto tmp9 = in_ptr45[i0]; 2023-01-11T21:38:06.0205312Z auto tmp11 = in_ptr46[i0]; 2023-01-11T21:38:06.0205402Z auto tmp13 = in_ptr47[i0]; 2023-01-11T21:38:06.0205491Z auto tmp15 = in_ptr48[i0]; 2023-01-11T21:38:06.0205582Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0205671Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0205758Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0205837Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.0205930Z auto tmp10 = tmp8 + tmp9; 2023-01-11T21:38:06.0206019Z auto tmp12 = tmp10 + tmp11; 2023-01-11T21:38:06.0206109Z auto tmp14 = tmp12 + tmp13; 2023-01-11T21:38:06.0206198Z auto tmp16 = tmp14 + tmp15; 2023-01-11T21:38:06.0206285Z in_out_ptr0[i0] = tmp16; 2023-01-11T21:38:06.0206353Z } 2023-01-11T21:38:06.0206470Z #pragma omp for 2023-01-11T21:38:06.0206557Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0206623Z { 2023-01-11T21:38:06.0206766Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0206899Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr49 + 8*i0); 2023-01-11T21:38:06.0206992Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0207092Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0207151Z } 2023-01-11T21:38:06.0207254Z #pragma omp for 
simd simdlen(4) 2023-01-11T21:38:06.0207340Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0207410Z { 2023-01-11T21:38:06.0207506Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.0207597Z auto tmp1 = in_ptr49[i0]; 2023-01-11T21:38:06.0207685Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0207766Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0207832Z } 2023-01-11T21:38:06.0207901Z } 2023-01-11T21:38:06.0207966Z } 2023-01-11T21:38:06.0208059Z ''') 2023-01-11T21:38:06.0208065Z 2023-01-11T21:38:06.0208070Z 2023-01-11T21:38:06.0208164Z async_compile.wait(globals()) 2023-01-11T21:38:06.0208240Z del async_compile 2023-01-11T21:38:06.0208246Z 2023-01-11T21:38:06.0208313Z def call(args): 2023-01-11T21:38:06.0208634Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1 = args 2023-01-11T21:38:06.0208711Z args.clear() 2023-01-11T21:38:06.0208910Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0209001Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0209093Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0209179Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0209266Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0209352Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.0209431Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0210667Z kernel_cpp_0(c_void_p(buf6.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg5_1.data_ptr()), c_void_p(arg6_1.data_ptr()), c_void_p(arg7_1.data_ptr()), c_void_p(arg8_1.data_ptr()), c_void_p(arg9_1.data_ptr()), c_void_p(arg10_1.data_ptr()), c_void_p(arg11_1.data_ptr()), c_void_p(arg12_1.data_ptr()), c_void_p(arg13_1.data_ptr()), c_void_p(arg14_1.data_ptr()), c_void_p(arg15_1.data_ptr()), c_void_p(arg16_1.data_ptr()), c_void_p(arg17_1.data_ptr()), c_void_p(arg18_1.data_ptr()), c_void_p(arg19_1.data_ptr()), c_void_p(arg20_1.data_ptr()), c_void_p(arg21_1.data_ptr()), c_void_p(arg22_1.data_ptr()), c_void_p(arg23_1.data_ptr()), c_void_p(arg24_1.data_ptr()), c_void_p(arg25_1.data_ptr()), c_void_p(arg26_1.data_ptr()), c_void_p(arg27_1.data_ptr()), c_void_p(arg28_1.data_ptr()), c_void_p(arg29_1.data_ptr()), c_void_p(arg30_1.data_ptr()), c_void_p(arg31_1.data_ptr()), c_void_p(arg32_1.data_ptr()), c_void_p(arg33_1.data_ptr()), c_void_p(arg34_1.data_ptr()), c_void_p(arg35_1.data_ptr()), c_void_p(arg36_1.data_ptr()), c_void_p(arg37_1.data_ptr()), c_void_p(arg38_1.data_ptr()), c_void_p(arg39_1.data_ptr()), c_void_p(arg40_1.data_ptr()), c_void_p(arg41_1.data_ptr()), c_void_p(arg42_1.data_ptr()), c_void_p(arg43_1.data_ptr()), c_void_p(arg44_1.data_ptr()), c_void_p(arg45_1.data_ptr()), c_void_p(arg46_1.data_ptr()), c_void_p(arg47_1.data_ptr()), c_void_p(arg48_1.data_ptr()), c_void_p(arg49_1.data_ptr())) 2023-01-11T21:38:06.0210754Z del arg0_1 2023-01-11T21:38:06.0210827Z del arg10_1 2023-01-11T21:38:06.0210891Z del arg11_1 2023-01-11T21:38:06.0210964Z del arg12_1 2023-01-11T21:38:06.0211059Z del arg13_1 2023-01-11T21:38:06.0211129Z del arg14_1 2023-01-11T21:38:06.0211203Z del arg15_1 2023-01-11T21:38:06.0211275Z del arg16_1 
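# [editor note] this run of del statements drops every argN_1 reference now
# that the single fused kernel_cpp_0 call has consumed all fifty inputs,
# letting their storage be freed eagerly.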
2023-01-11T21:38:06.0211345Z del arg17_1 2023-01-11T21:38:06.0211408Z del arg18_1 2023-01-11T21:38:06.0211480Z del arg19_1 2023-01-11T21:38:06.0211550Z del arg1_1 2023-01-11T21:38:06.0211623Z del arg20_1 2023-01-11T21:38:06.0211692Z del arg21_1 2023-01-11T21:38:06.0211763Z del arg22_1 2023-01-11T21:38:06.0211825Z del arg23_1 2023-01-11T21:38:06.0211897Z del arg24_1 2023-01-11T21:38:06.0211965Z del arg25_1 2023-01-11T21:38:06.0212036Z del arg26_1 2023-01-11T21:38:06.0212110Z del arg27_1 2023-01-11T21:38:06.0212179Z del arg28_1 2023-01-11T21:38:06.0212249Z del arg29_1 2023-01-11T21:38:06.0212314Z del arg2_1 2023-01-11T21:38:06.0212384Z del arg30_1 2023-01-11T21:38:06.0212454Z del arg31_1 2023-01-11T21:38:06.0212524Z del arg32_1 2023-01-11T21:38:06.0212594Z del arg33_1 2023-01-11T21:38:06.0212665Z del arg34_1 2023-01-11T21:38:06.0212728Z del arg35_1 2023-01-11T21:38:06.0212798Z del arg36_1 2023-01-11T21:38:06.0212867Z del arg37_1 2023-01-11T21:38:06.0212936Z del arg38_1 2023-01-11T21:38:06.0213007Z del arg39_1 2023-01-11T21:38:06.0213078Z del arg3_1 2023-01-11T21:38:06.0213149Z del arg40_1 2023-01-11T21:38:06.0213213Z del arg41_1 2023-01-11T21:38:06.0213283Z del arg42_1 2023-01-11T21:38:06.0213351Z del arg43_1 2023-01-11T21:38:06.0213422Z del arg44_1 2023-01-11T21:38:06.0213492Z del arg45_1 2023-01-11T21:38:06.0213559Z del arg46_1 2023-01-11T21:38:06.0213622Z del arg47_1 2023-01-11T21:38:06.0213692Z del arg48_1 2023-01-11T21:38:06.0213764Z del arg49_1 2023-01-11T21:38:06.0213834Z del arg4_1 2023-01-11T21:38:06.0213904Z del arg5_1 2023-01-11T21:38:06.0213973Z del arg6_1 2023-01-11T21:38:06.0214043Z del arg7_1 2023-01-11T21:38:06.0214106Z del arg8_1 2023-01-11T21:38:06.0214179Z del arg9_1 2023-01-11T21:38:06.0214253Z return (buf6, ) 2023-01-11T21:38:06.0214259Z 2023-01-11T21:38:06.0214263Z 2023-01-11T21:38:06.0214343Z if __name__ == "__main__": 2023-01-11T21:38:06.0214468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0214703Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0214964Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215179Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215370Z arg2_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215556Z arg3_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215740Z arg4_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0215925Z arg5_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216108Z arg6_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216295Z arg7_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216478Z arg8_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216656Z arg9_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0216844Z arg10_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217034Z arg11_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217273Z arg12_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217460Z arg13_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0217648Z arg14_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:06.0217835Z arg15_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218062Z arg16_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218239Z arg17_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218422Z arg18_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218607Z arg19_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218792Z arg20_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0218976Z arg21_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219160Z arg22_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219341Z arg23_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219524Z arg24_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219700Z arg25_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0219888Z arg26_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220071Z arg27_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220254Z arg28_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220442Z arg29_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220627Z arg30_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220812Z arg31_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0220995Z arg32_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221172Z arg33_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221359Z arg34_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221542Z arg35_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221729Z arg36_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0221911Z arg37_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222125Z arg38_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222311Z arg39_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222494Z arg40_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222670Z arg41_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0222857Z arg42_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223039Z arg43_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223224Z arg44_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223411Z arg45_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223596Z arg46_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223780Z arg47_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0223967Z arg48_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0224144Z arg49_1 = rand_strided((64, ), (1, ), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:06.0224496Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1])) 2023-01-11T21:38:06.0224792Z [2023-01-11 21:29:51,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 207 2023-01-11T21:38:06.0224799Z 2023-01-11T21:38:06.0224885Z --> 7 2023-01-11T21:38:06.0224958Z ok (3.028s) 2023-01-11T21:38:06.0225420Z test_no_op_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0225552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0225809Z [2023-01-11 21:29:51,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 208 2023-01-11T21:38:06.0226074Z [2023-01-11 21:29:53,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 208 2023-01-11T21:38:06.0226083Z 2023-01-11T21:38:06.0226181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0226257Z import torch 2023-01-11T21:38:06.0226325Z import random 2023-01-11T21:38:06.0226445Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0226573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0226578Z 2023-01-11T21:38:06.0226660Z aten = torch.ops.aten 2023-01-11T21:38:06.0226798Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0226894Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0226899Z 2023-01-11T21:38:06.0226977Z import triton 2023-01-11T21:38:06.0227069Z import triton.language as tl 2023-01-11T21:38:06.0227188Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0227327Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0227333Z 2023-01-11T21:38:06.0227338Z 2023-01-11T21:38:06.0227477Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0227686Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0227810Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0227916Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0228046Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0228106Z { 2023-01-11T21:38:06.0228209Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0228274Z { 2023-01-11T21:38:06.0228356Z #pragma omp for 2023-01-11T21:38:06.0228442Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0228507Z { 2023-01-11T21:38:06.0228646Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0228785Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0228869Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0228962Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0229066Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0229133Z } 
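// [editor note] the loop above is the 8-wide vectorized body; the simd loop
// below is the scalar remainder, and its bounds (i0=8; i0<8) collapse to zero
// iterations because the 8-element buffer divides evenly into vector lanes.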
2023-01-11T21:38:06.0229233Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0229319Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0229379Z { 2023-01-11T21:38:06.0229471Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0229578Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.0229667Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0229754Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0229839Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0229906Z } 2023-01-11T21:38:06.0229965Z } 2023-01-11T21:38:06.0230030Z } 2023-01-11T21:38:06.0230117Z ''') 2023-01-11T21:38:06.0230122Z 2023-01-11T21:38:06.0230127Z 2023-01-11T21:38:06.0230222Z async_compile.wait(globals()) 2023-01-11T21:38:06.0230300Z del async_compile 2023-01-11T21:38:06.0230305Z 2023-01-11T21:38:06.0230382Z def call(args): 2023-01-11T21:38:06.0230484Z arg0_1, = args 2023-01-11T21:38:06.0230559Z args.clear() 2023-01-11T21:38:06.0230750Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0230954Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0231126Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0231200Z del arg0_1 2023-01-11T21:38:06.0231278Z return (buf0, buf1, ) 2023-01-11T21:38:06.0231283Z 2023-01-11T21:38:06.0231288Z 2023-01-11T21:38:06.0231368Z if __name__ == "__main__": 2023-01-11T21:38:06.0231487Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0231607Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0231808Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0231921Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0231929Z 2023-01-11T21:38:06.0232000Z ok (2.239s) 2023-01-11T21:38:06.0232521Z test_output_strides_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0232603Z warnings.warn( 2023-01-11T21:38:06.0232862Z [2023-01-11 21:29:53,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 209 2023-01-11T21:38:06.0233125Z [2023-01-11 21:29:56,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 209 2023-01-11T21:38:06.0233378Z [2023-01-11 21:29:56,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 210 2023-01-11T21:38:06.0233633Z [2023-01-11 21:29:56,077] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 210 2023-01-11T21:38:06.0233884Z [2023-01-11 21:29:56,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 211 2023-01-11T21:38:06.0234143Z [2023-01-11 21:29:56,119] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 211 2023-01-11T21:38:06.0234595Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:3148: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0234716Z self.assertEqual(inp.storage(), out.storage()) 2023-01-11T21:38:06.0235351Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:1904: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0235452Z device=typed_storage.device, 2023-01-11T21:38:06.0235457Z 2023-01-11T21:38:06.0235556Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0235629Z import torch 2023-01-11T21:38:06.0235704Z import random 2023-01-11T21:38:06.0235820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0235947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0235952Z 2023-01-11T21:38:06.0236034Z aten = torch.ops.aten 2023-01-11T21:38:06.0236171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0236266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0236272Z 2023-01-11T21:38:06.0236346Z import triton 2023-01-11T21:38:06.0236438Z import triton.language as tl 2023-01-11T21:38:06.0236562Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0236695Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0236728Z 2023-01-11T21:38:06.0236738Z 2023-01-11T21:38:06.0236869Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0237078Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0237201Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0237311Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0237377Z { 2023-01-11T21:38:06.0237483Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0237550Z { 2023-01-11T21:38:06.0237638Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0237722Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0237790Z { 2023-01-11T21:38:06.0237879Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0237946Z { 2023-01-11T21:38:06.0238034Z #pragma GCC ivdep 2023-01-11T21:38:06.0238128Z for(long i2=0; i2<4; i2+=1) 2023-01-11T21:38:06.0238190Z { 2023-01-11T21:38:06.0238267Z { 2023-01-11T21:38:06.0238341Z { 2023-01-11T21:38:06.0238457Z auto tmp0 = in_ptr0[i1 + (16*i2) + (64*i0)]; 2023-01-11T21:38:06.0238566Z out_ptr0[i2 + (4*i1) + (64*i0)] = tmp0; 2023-01-11T21:38:06.0238638Z } 2023-01-11T21:38:06.0238704Z } 2023-01-11T21:38:06.0238773Z } 2023-01-11T21:38:06.0238839Z } 2023-01-11T21:38:06.0238908Z } 2023-01-11T21:38:06.0238976Z } 2023-01-11T21:38:06.0239043Z } 2023-01-11T21:38:06.0239128Z ''') 2023-01-11T21:38:06.0239134Z 2023-01-11T21:38:06.0239138Z 2023-01-11T21:38:06.0239224Z async_compile.wait(globals()) 2023-01-11T21:38:06.0239299Z del async_compile 2023-01-11T21:38:06.0239304Z 2023-01-11T21:38:06.0239378Z def call(args): 2023-01-11T21:38:06.0239454Z arg0_1, = args 2023-01-11T21:38:06.0239529Z args.clear() 2023-01-11T21:38:06.0239740Z buf0 = empty_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0239881Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0239953Z del arg0_1 2023-01-11T21:38:06.0240021Z return (buf0, ) 2023-01-11T21:38:06.0240026Z 2023-01-11T21:38:06.0240031Z 
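Reading the index arithmetic in kernel_cpp_0 above, in_ptr0[i1 + (16*i2) + (64*i0)] is gathered into the contiguous out_ptr0[i2 + (4*i1) + (64*i0)], which appears to be exactly the shuffle an eager permute-then-copy performs. A quick sketch to sanity-check that reading (tensor names are illustrative, not from the test):

import torch

x = torch.randn(4, 4, 4, 4)
# same element shuffle as the C++ loop nest, under the reading above
y = x.permute(0, 2, 3, 1).contiguous()
assert y.shape == (4, 4, 4, 4) and y.is_contiguous()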
2023-01-11T21:38:06.0240140Z if __name__ == "__main__": 2023-01-11T21:38:06.0240260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0240387Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0240595Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0240706Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0240712Z 2023-01-11T21:38:06.0240716Z 2023-01-11T21:38:06.0240813Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0240888Z import torch 2023-01-11T21:38:06.0240955Z import random 2023-01-11T21:38:06.0241073Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0241199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0241204Z 2023-01-11T21:38:06.0241286Z aten = torch.ops.aten 2023-01-11T21:38:06.0241421Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0241518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0241523Z 2023-01-11T21:38:06.0241600Z import triton 2023-01-11T21:38:06.0241686Z import triton.language as tl 2023-01-11T21:38:06.0241810Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0241951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0241956Z 2023-01-11T21:38:06.0241961Z 2023-01-11T21:38:06.0242052Z async_compile.wait(globals()) 2023-01-11T21:38:06.0242129Z del async_compile 2023-01-11T21:38:06.0242134Z 2023-01-11T21:38:06.0242207Z def call(args): 2023-01-11T21:38:06.0242281Z arg0_1, = args 2023-01-11T21:38:06.0242354Z args.clear() 2023-01-11T21:38:06.0242450Z return (as_strided(arg0_1, (64, 4), (4, 1)), ) 2023-01-11T21:38:06.0242524Z 2023-01-11T21:38:06.0242537Z 2023-01-11T21:38:06.0242610Z if __name__ == "__main__": 2023-01-11T21:38:06.0242729Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0242856Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0243070Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0243183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0243188Z 2023-01-11T21:38:06.0243192Z 2023-01-11T21:38:06.0243289Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0243365Z import torch 2023-01-11T21:38:06.0243432Z import random 2023-01-11T21:38:06.0243552Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0243675Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0243680Z 2023-01-11T21:38:06.0243761Z aten = torch.ops.aten 2023-01-11T21:38:06.0243896Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0243992Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0243997Z 2023-01-11T21:38:06.0244071Z import triton 2023-01-11T21:38:06.0244163Z import triton.language as tl 2023-01-11T21:38:06.0244280Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0244427Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0244433Z 2023-01-11T21:38:06.0244437Z 2023-01-11T21:38:06.0244531Z async_compile.wait(globals()) 2023-01-11T21:38:06.0244607Z del async_compile 2023-01-11T21:38:06.0244611Z 2023-01-11T21:38:06.0244685Z def call(args): 2023-01-11T21:38:06.0244756Z arg0_1, = args 2023-01-11T21:38:06.0244830Z args.clear() 2023-01-11T21:38:06.0244944Z return (as_strided(arg0_1, (4, 4, 1), (4, 16, 0), 3), ) 2023-01-11T21:38:06.0244949Z 2023-01-11T21:38:06.0244954Z 
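# [editor note] view-only graphs compile to no kernels at all: call() above
# simply reinterprets the input storage with as_strided -- here a (4, 4, 1)
# view with a zero stride in the trailing dimension and storage offset 3, so
# no data is moved.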
2023-01-11T21:38:06.0245026Z if __name__ == "__main__": 2023-01-11T21:38:06.0245141Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0245269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0245476Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0245587Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0245591Z 2023-01-11T21:38:06.0245663Z ok (2.327s) 2023-01-11T21:38:06.0246144Z test_permute_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0246279Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0246536Z [2023-01-11 21:29:56,145] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 212 2023-01-11T21:38:06.0246795Z [2023-01-11 21:29:58,015] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 212 2023-01-11T21:38:06.0246808Z 2023-01-11T21:38:06.0246899Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0246973Z import torch 2023-01-11T21:38:06.0247047Z import random 2023-01-11T21:38:06.0247168Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0247292Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0247299Z 2023-01-11T21:38:06.0247380Z aten = torch.ops.aten 2023-01-11T21:38:06.0247515Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0247603Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0247608Z 2023-01-11T21:38:06.0247680Z import triton 2023-01-11T21:38:06.0247773Z import triton.language as tl 2023-01-11T21:38:06.0247900Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0248038Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0248084Z 2023-01-11T21:38:06.0248089Z 2023-01-11T21:38:06.0248226Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0248432Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0248555Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0248655Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0248755Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0248824Z { 2023-01-11T21:38:06.0248925Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0248992Z { 2023-01-11T21:38:06.0249072Z #pragma omp for 2023-01-11T21:38:06.0249158Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0249218Z { 2023-01-11T21:38:06.0249358Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0249498Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0249587Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0249728Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.0249819Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0249908Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0249997Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0250095Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0250162Z } 2023-01-11T21:38:06.0250261Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0250347Z for(long i0=32; i0<32; i0+=1)
2023-01-11T21:38:06.0250413Z { 2023-01-11T21:38:06.0250502Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0250599Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.0250688Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0250791Z auto tmp3 = static_cast<float>(2); 2023-01-11T21:38:06.0250884Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0250971Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0251059Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0251141Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0251201Z } 2023-01-11T21:38:06.0251266Z } 2023-01-11T21:38:06.0251330Z } 2023-01-11T21:38:06.0251416Z ''') 2023-01-11T21:38:06.0251422Z 2023-01-11T21:38:06.0251426Z 2023-01-11T21:38:06.0251548Z async_compile.wait(globals()) 2023-01-11T21:38:06.0251625Z del async_compile 2023-01-11T21:38:06.0251630Z 2023-01-11T21:38:06.0251706Z def call(args): 2023-01-11T21:38:06.0251773Z arg0_1, = args 2023-01-11T21:38:06.0251850Z args.clear() 2023-01-11T21:38:06.0252074Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0252285Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0252451Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0252524Z del arg0_1 2023-01-11T21:38:06.0252607Z return (buf0, buf1, ) 2023-01-11T21:38:06.0252612Z 2023-01-11T21:38:06.0252617Z 2023-01-11T21:38:06.0252696Z if __name__ == "__main__": 2023-01-11T21:38:06.0252806Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0252932Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0253150Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0253262Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0253267Z 2023-01-11T21:38:06.0253337Z ok (1.910s) 2023-01-11T21:38:06.0253786Z test_pow1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0253948Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0254206Z [2023-01-11 21:29:58,232] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 213 2023-01-11T21:38:06.0254584Z [2023-01-11 21:30:00,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 213 2023-01-11T21:38:06.0254590Z 2023-01-11T21:38:06.0254690Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0254757Z import torch 2023-01-11T21:38:06.0254830Z import random 2023-01-11T21:38:06.0254950Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0255075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0255080Z 2023-01-11T21:38:06.0255162Z aten = torch.ops.aten 2023-01-11T21:38:06.0255299Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0255393Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0255398Z 2023-01-11T21:38:06.0255469Z import triton 2023-01-11T21:38:06.0255560Z import triton.language as tl 2023-01-11T21:38:06.0255684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0255825Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0255831Z 2023-01-11T21:38:06.0255835Z 2023-01-11T21:38:06.0255977Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0256182Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0256305Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0256409Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0256503Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0256602Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0256698Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.0256795Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.0256897Z float* __restrict__ out_ptr5, 2023-01-11T21:38:06.0256996Z float* __restrict__ out_ptr6, 2023-01-11T21:38:06.0257097Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0257258Z float* __restrict__ out_ptr8, 2023-01-11T21:38:06.0257415Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.0257529Z float* __restrict__ out_ptr10, 2023-01-11T21:38:06.0257635Z float* __restrict__ out_ptr11, 2023-01-11T21:38:06.0257736Z float* __restrict__ out_ptr12, 2023-01-11T21:38:06.0257836Z float* __restrict__ out_ptr13, 2023-01-11T21:38:06.0257935Z float* __restrict__ out_ptr14, 2023-01-11T21:38:06.0258036Z float* __restrict__ out_ptr15) 2023-01-11T21:38:06.0258094Z { 2023-01-11T21:38:06.0258195Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0258261Z { 2023-01-11T21:38:06.0258345Z #pragma omp for 2023-01-11T21:38:06.0258432Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0258501Z { 2023-01-11T21:38:06.0258641Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0258737Z auto tmp1 = tmp0.reciprocal(); 2023-01-11T21:38:06.0258830Z auto tmp2 = tmp1 * tmp1; 2023-01-11T21:38:06.0258921Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:06.0259007Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:06.0259093Z auto tmp5 = tmp2 * tmp1; 2023-01-11T21:38:06.0259178Z auto tmp6 = tmp5 * tmp5; 2023-01-11T21:38:06.0259264Z auto tmp7 = tmp6 * tmp1; 2023-01-11T21:38:06.0259343Z auto tmp8 = tmp3 * tmp1; 2023-01-11T21:38:06.0259482Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.0259572Z auto tmp10 = tmp0 *
tmp0; 2023-01-11T21:38:06.0259662Z auto tmp11 = tmp10 * tmp0; 2023-01-11T21:38:06.0259792Z auto tmp12 = tmp10 * tmp10; 2023-01-11T21:38:06.0259881Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:06.0259968Z auto tmp14 = tmp11 * tmp11; 2023-01-11T21:38:06.0260049Z auto tmp15 = tmp14 * tmp0; 2023-01-11T21:38:06.0260137Z auto tmp16 = tmp12 * tmp12; 2023-01-11T21:38:06.0260237Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0260331Z tmp7.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0260426Z tmp6.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0260519Z tmp8.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0260611Z tmp3.store(out_ptr4 + 8*i0); 2023-01-11T21:38:06.0260697Z tmp5.store(out_ptr5 + 8*i0); 2023-01-11T21:38:06.0260790Z tmp2.store(out_ptr6 + 8*i0); 2023-01-11T21:38:06.0260881Z tmp1.store(out_ptr7 + 8*i0); 2023-01-11T21:38:06.0260972Z tmp9.store(out_ptr8 + 8*i0); 2023-01-11T21:38:06.0261066Z tmp10.store(out_ptr9 + 8*i0); 2023-01-11T21:38:06.0261168Z tmp11.store(out_ptr10 + 8*i0); 2023-01-11T21:38:06.0261267Z tmp12.store(out_ptr11 + 8*i0); 2023-01-11T21:38:06.0261356Z tmp13.store(out_ptr12 + 8*i0); 2023-01-11T21:38:06.0261451Z tmp14.store(out_ptr13 + 8*i0); 2023-01-11T21:38:06.0261548Z tmp15.store(out_ptr14 + 8*i0); 2023-01-11T21:38:06.0261640Z tmp16.store(out_ptr15 + 8*i0); 2023-01-11T21:38:06.0261709Z } 2023-01-11T21:38:06.0261808Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0261893Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0261953Z { 2023-01-11T21:38:06.0262043Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0262134Z auto tmp1 = 1 / tmp0; 2023-01-11T21:38:06.0262226Z auto tmp2 = tmp1 * tmp1; 2023-01-11T21:38:06.0262316Z auto tmp3 = tmp2 * tmp2; 2023-01-11T21:38:06.0262407Z auto tmp4 = tmp3 * tmp3; 2023-01-11T21:38:06.0262497Z auto tmp5 = tmp2 * tmp1; 2023-01-11T21:38:06.0262576Z auto tmp6 = tmp5 * tmp5; 2023-01-11T21:38:06.0262660Z auto tmp7 = tmp6 * tmp1; 2023-01-11T21:38:06.0262746Z auto tmp8 = tmp3 * tmp1; 2023-01-11T21:38:06.0262850Z auto tmp9 = static_cast<float>(1); 2023-01-11T21:38:06.0262968Z auto tmp10 = tmp0 * tmp0; 2023-01-11T21:38:06.0263059Z auto tmp11 = tmp10 * tmp0; 2023-01-11T21:38:06.0263151Z auto tmp12 = tmp10 * tmp10; 2023-01-11T21:38:06.0263233Z auto tmp13 = tmp12 * tmp0; 2023-01-11T21:38:06.0263323Z auto tmp14 = tmp11 * tmp11; 2023-01-11T21:38:06.0263411Z auto tmp15 = tmp14 * tmp0; 2023-01-11T21:38:06.0263500Z auto tmp16 = tmp12 * tmp12; 2023-01-11T21:38:06.0263585Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0263669Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:06.0263755Z out_ptr2[i0] = tmp6; 2023-01-11T21:38:06.0263831Z out_ptr3[i0] = tmp8; 2023-01-11T21:38:06.0263916Z out_ptr4[i0] = tmp3; 2023-01-11T21:38:06.0263999Z out_ptr5[i0] = tmp5; 2023-01-11T21:38:06.0264082Z out_ptr6[i0] = tmp2; 2023-01-11T21:38:06.0264164Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:06.0264247Z out_ptr8[i0] = tmp9; 2023-01-11T21:38:06.0264326Z out_ptr9[i0] = tmp10; 2023-01-11T21:38:06.0264413Z out_ptr10[i0] = tmp11; 2023-01-11T21:38:06.0264498Z out_ptr11[i0] = tmp12; 2023-01-11T21:38:06.0264583Z out_ptr12[i0] = tmp13; 2023-01-11T21:38:06.0264665Z out_ptr13[i0] = tmp14; 2023-01-11T21:38:06.0264748Z out_ptr14[i0] = tmp15; 2023-01-11T21:38:06.0264833Z out_ptr15[i0] = tmp16; 2023-01-11T21:38:06.0264893Z } 2023-01-11T21:38:06.0264959Z } 2023-01-11T21:38:06.0265022Z } 2023-01-11T21:38:06.0265112Z ''') 2023-01-11T21:38:06.0265118Z 2023-01-11T21:38:06.0265122Z 2023-01-11T21:38:06.0265215Z async_compile.wait(globals()) 2023-01-11T21:38:06.0265329Z del async_compile 2023-01-11T21:38:06.0265336Z
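# (Editorial sketch, not generated code.) The kernel above lowers small integer
# powers to chains of multiplies, plus one reciprocal for the negative exponents,
# rather than calling pow; e.g. x**8 is three squarings of x, and x**-8 is three
# squarings of 1/x:
import torch

x = torch.rand(16, 16) + 0.5   # bounded away from 0 so the negative powers stay finite
r = x.reciprocal()             # tmp1 in the kernel
r2 = r * r                     # tmp2 -> x**-2
r4 = r2 * r2                   # tmp3 -> x**-4
r8 = r4 * r4                   # tmp4 -> x**-8
x2 = x * x                     # tmp10 -> x**2
x4 = x2 * x2                   # tmp12 -> x**4
x8 = x4 * x4                   # tmp16 -> x**8
torch.testing.assert_close(r8, x.pow(-8))
torch.testing.assert_close(x8, x.pow(8))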
2023-01-11T21:38:06.0265409Z def call(args): 2023-01-11T21:38:06.0265483Z arg0_1, = args 2023-01-11T21:38:06.0265551Z args.clear() 2023-01-11T21:38:06.0265755Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0265956Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266153Z buf2 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266346Z buf3 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266539Z buf4 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266728Z buf5 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0266911Z buf6 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267099Z buf7 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267291Z buf8 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267480Z buf9 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267677Z buf10 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0267871Z buf11 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268067Z buf12 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268260Z buf13 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268444Z buf14 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0268634Z buf15 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0269135Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf13.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf15.data_ptr())) 2023-01-11T21:38:06.0269308Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.0269314Z 2023-01-11T21:38:06.0269318Z 2023-01-11T21:38:06.0269399Z if __name__ == "__main__": 2023-01-11T21:38:06.0269520Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0269646Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0269840Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0269954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0269962Z 2023-01-11T21:38:06.0270036Z ok (2.189s) 2023-01-11T21:38:06.0270487Z test_pow2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0270618Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0270870Z [2023-01-11 21:30:00,255] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 214 2023-01-11T21:38:06.0271134Z [2023-01-11 21:30:02,112] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 214 2023-01-11T21:38:06.0271140Z 2023-01-11T21:38:06.0271240Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0271346Z import torch 2023-01-11T21:38:06.0271421Z import random 2023-01-11T21:38:06.0271542Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0271667Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0271672Z 2023-01-11T21:38:06.0271754Z aten = torch.ops.aten 2023-01-11T21:38:06.0271885Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0271981Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0271986Z 2023-01-11T21:38:06.0272062Z import triton 2023-01-11T21:38:06.0272153Z import triton.language as tl 2023-01-11T21:38:06.0272279Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0272417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0272423Z 2023-01-11T21:38:06.0272427Z 2023-01-11T21:38:06.0272566Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0272771Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0272892Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0272996Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0273097Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0273162Z { 2023-01-11T21:38:06.0273263Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0273331Z { 2023-01-11T21:38:06.0273414Z #pragma omp for 2023-01-11T21:38:06.0273494Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0273562Z { 2023-01-11T21:38:06.0273703Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0273844Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(1000)); 2023-01-11T21:38:06.0273937Z auto tmp2 = tmp0.pow(tmp1); 2023-01-11T21:38:06.0274029Z auto tmp3 = tmp1.pow(tmp0); 2023-01-11T21:38:06.0274125Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0274214Z tmp3.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0274283Z } 2023-01-11T21:38:06.0274380Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0274465Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0274532Z { 2023-01-11T21:38:06.0274621Z auto tmp1 = in_ptr0[i0]; 2023-01-11T21:38:06.0274726Z auto tmp0 = static_cast<float>(1000); 2023-01-11T21:38:06.0274856Z auto tmp2 = std::pow(tmp0, tmp1); 2023-01-11T21:38:06.0274964Z auto tmp3 = std::pow(tmp1, tmp0); 2023-01-11T21:38:06.0275050Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0275134Z out_ptr1[i0] = tmp3; 2023-01-11T21:38:06.0275203Z } 2023-01-11T21:38:06.0275270Z } 2023-01-11T21:38:06.0275337Z } 2023-01-11T21:38:06.0275415Z ''') 2023-01-11T21:38:06.0275421Z 2023-01-11T21:38:06.0275425Z 2023-01-11T21:38:06.0275519Z async_compile.wait(globals()) 2023-01-11T21:38:06.0275595Z del async_compile 2023-01-11T21:38:06.0275600Z 2023-01-11T21:38:06.0275675Z def call(args): 2023-01-11T21:38:06.0275754Z arg0_1, = args 2023-01-11T21:38:06.0275828Z args.clear() 2023-01-11T21:38:06.0276028Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0276216Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0276388Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0276460Z del arg0_1 2023-01-11T21:38:06.0276545Z return (buf0, buf1, ) 2023-01-11T21:38:06.0276550Z 2023-01-11T21:38:06.0276555Z 2023-01-11T21:38:06.0276633Z if __name__ == "__main__": 2023-01-11T21:38:06.0276751Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0276879Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0277075Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0277179Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0277192Z 2023-01-11T21:38:06.0277285Z ok (1.908s) 2023-01-11T21:38:06.0277791Z test_pow3_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0277874Z warnings.warn( 2023-01-11T21:38:06.0278131Z [2023-01-11 21:30:02,152] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 215 2023-01-11T21:38:06.0278395Z [2023-01-11 21:30:04,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 215 2023-01-11T21:38:06.0278400Z 2023-01-11T21:38:06.0278498Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0278572Z import torch 2023-01-11T21:38:06.0278649Z import random 2023-01-11T21:38:06.0278761Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0278886Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0278894Z 2023-01-11T21:38:06.0278978Z aten = torch.ops.aten 2023-01-11T21:38:06.0279115Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0279210Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0279215Z 2023-01-11T21:38:06.0279291Z import triton 2023-01-11T21:38:06.0279384Z import triton.language as tl 2023-01-11T21:38:06.0279506Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0279646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0279652Z 2023-01-11T21:38:06.0279656Z 2023-01-11T21:38:06.0279792Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0279996Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0280119Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0280224Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0280291Z { 2023-01-11T21:38:06.0280355Z { 2023-01-11T21:38:06.0280418Z { 2023-01-11T21:38:06.0280505Z auto tmp1 = in_ptr0[0]; 2023-01-11T21:38:06.0280622Z auto tmp0 = static_cast<float>(0.12300000339746475); 2023-01-11T21:38:06.0280711Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0280807Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0280924Z out_ptr0[0] = tmp3; 2023-01-11T21:38:06.0280997Z } 2023-01-11T21:38:06.0281059Z } 2023-01-11T21:38:06.0281126Z } 2023-01-11T21:38:06.0281210Z ''') 2023-01-11T21:38:06.0281215Z 2023-01-11T21:38:06.0281220Z 2023-01-11T21:38:06.0281314Z async_compile.wait(globals()) 2023-01-11T21:38:06.0281390Z del async_compile 2023-01-11T21:38:06.0281395Z 2023-01-11T21:38:06.0281470Z def call(args): 2023-01-11T21:38:06.0281546Z arg0_1, = args
2023-01-11T21:38:06.0281614Z args.clear() 2023-01-11T21:38:06.0281797Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0281937Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0282014Z del arg0_1 2023-01-11T21:38:06.0282091Z return (buf0, ) 2023-01-11T21:38:06.0282096Z 2023-01-11T21:38:06.0282101Z 2023-01-11T21:38:06.0282181Z if __name__ == "__main__": 2023-01-11T21:38:06.0282298Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0282427Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0282606Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0282718Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0282723Z 2023-01-11T21:38:06.0282793Z ok (2.178s) 2023-01-11T21:38:06.0283131Z test_profiler_mark_wrapper_call_cpu (__main__.CpuTests) ... STAGE:2023-01-11 21:30:04 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:06.0283387Z [2023-01-11 21:30:04,320] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 216 2023-01-11T21:38:06.0283648Z [2023-01-11 21:30:06,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 216 2023-01-11T21:38:06.0283933Z STAGE:2023-01-11 21:30:06 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:06.0284195Z STAGE:2023-01-11 21:30:06 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:06.0284201Z 2023-01-11T21:38:06.0284298Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0284365Z import torch 2023-01-11T21:38:06.0284441Z import random 2023-01-11T21:38:06.0284559Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0284683Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0284688Z 2023-01-11T21:38:06.0284769Z aten = torch.ops.aten 2023-01-11T21:38:06.0284908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0285003Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0285008Z 2023-01-11T21:38:06.0285075Z import triton 2023-01-11T21:38:06.0285171Z import triton.language as tl 2023-01-11T21:38:06.0285298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0285439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0285444Z 2023-01-11T21:38:06.0285450Z 2023-01-11T21:38:06.0285589Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0285798Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0285923Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0286034Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0286131Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0286199Z { 2023-01-11T21:38:06.0286302Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0286370Z { 2023-01-11T21:38:06.0286452Z #pragma omp for 2023-01-11T21:38:06.0286541Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0286610Z { 2023-01-11T21:38:06.0286744Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0286879Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0286972Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0287067Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0287164Z } 2023-01-11T21:38:06.0287267Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0287358Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.0287418Z { 2023-01-11T21:38:06.0287506Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0287595Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0287682Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0287766Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0287829Z } 2023-01-11T21:38:06.0287892Z } 2023-01-11T21:38:06.0287949Z } 2023-01-11T21:38:06.0288034Z ''') 2023-01-11T21:38:06.0288039Z 2023-01-11T21:38:06.0288046Z 2023-01-11T21:38:06.0288140Z async_compile.wait(globals()) 2023-01-11T21:38:06.0288216Z del async_compile 2023-01-11T21:38:06.0288222Z 2023-01-11T21:38:06.0288295Z def call(args): 2023-01-11T21:38:06.0288376Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0288451Z args.clear() 2023-01-11T21:38:06.0288559Z from torch.profiler import record_function 2023-01-11T21:38:06.0288724Z with record_function('inductor_wrapper_call'): 2023-01-11T21:38:06.0288927Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0289098Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0289175Z del arg0_1 2023-01-11T21:38:06.0289248Z del arg1_1 2023-01-11T21:38:06.0289327Z return (buf0, ) 2023-01-11T21:38:06.0289333Z 2023-01-11T21:38:06.0289338Z 2023-01-11T21:38:06.0289421Z if __name__ == "__main__": 2023-01-11T21:38:06.0289531Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0289699Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0289896Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0290091Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0290216Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0290221Z 2023-01-11T21:38:06.0290293Z ok (2.424s) 2023-01-11T21:38:06.0290813Z test_rand_like_deterministic_cpu (__main__.CpuTests) ... 
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.0290891Z warnings.warn( 2023-01-11T21:38:06.0291146Z [2023-01-11 21:30:06,825] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 217 2023-01-11T21:38:06.0291392Z [2023-01-11 21:30:06,826] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.0291654Z [2023-01-11 21:30:09,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 217 2023-01-11T21:38:06.0291660Z 2023-01-11T21:38:06.0291758Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0291830Z import torch 2023-01-11T21:38:06.0291907Z import random 2023-01-11T21:38:06.0292024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0292148Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0292154Z 2023-01-11T21:38:06.0292237Z aten = torch.ops.aten 2023-01-11T21:38:06.0292365Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0292458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0292463Z 2023-01-11T21:38:06.0292536Z import triton 2023-01-11T21:38:06.0292628Z import triton.language as tl 2023-01-11T21:38:06.0292752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0292896Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0293062Z seed_cpu_None = None # 9130db9322feaa41c28986790b86d7dd047e77339ff46fce775dbaa5929b26ce 2023-01-11T21:38:06.0293068Z 2023-01-11T21:38:06.0293072Z 2023-01-11T21:38:06.0293207Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0293429Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0293550Z extern "C" void kernel(const long* __restrict__ seed0, 2023-01-11T21:38:06.0293653Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0293757Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0293822Z { 2023-01-11T21:38:06.0293924Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0293988Z { 2023-01-11T21:38:06.0294063Z #pragma omp for 2023-01-11T21:38:06.0294151Z for(long i0=0; i0<1024; i0+=1) 2023-01-11T21:38:06.0294217Z { 2023-01-11T21:38:06.0294288Z { 2023-01-11T21:38:06.0294355Z { 2023-01-11T21:38:06.0294447Z auto tmp0 = seed0[0]; 2023-01-11T21:38:06.0294669Z auto tmp1 = static_cast<long>(i0); 2023-01-11T21:38:06.0294808Z auto tmp2 = static_cast<float>(normalized_rand_cpu(tmp0, tmp1));; 2023-01-11T21:38:06.0294923Z auto tmp3 = static_cast<long>(1024 + i0); 2023-01-11T21:38:06.0295067Z auto tmp4 = static_cast<float>(normalized_rand_cpu(tmp0, tmp3));; 2023-01-11T21:38:06.0295156Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0295246Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.0295317Z } 2023-01-11T21:38:06.0295384Z } 2023-01-11T21:38:06.0295443Z } 2023-01-11T21:38:06.0295509Z } 2023-01-11T21:38:06.0295570Z } 2023-01-11T21:38:06.0301863Z ''') 2023-01-11T21:38:06.0301869Z 2023-01-11T21:38:06.0301874Z 2023-01-11T21:38:06.0301984Z async_compile.wait(globals()) 2023-01-11T21:38:06.0302148Z del async_compile 2023-01-11T21:38:06.0302153Z 2023-01-11T21:38:06.0302233Z def call(args): 2023-01-11T21:38:06.0302311Z arg0_1, = args 2023-01-11T21:38:06.0302392Z args.clear() 2023-01-11T21:38:06.0302537Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cpu_None)
2023-01-11T21:38:06.0302737Z buf0 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0302935Z buf1 = empty_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0303110Z kernel_cpp_0(c_void_p(seed_cpu_None.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0303192Z return (buf0, buf1, ) 2023-01-11T21:38:06.0303198Z 2023-01-11T21:38:06.0303202Z 2023-01-11T21:38:06.0303280Z if __name__ == "__main__": 2023-01-11T21:38:06.0303398Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0303525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0303722Z seed_cpu_None = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0303912Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0304025Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0304031Z 2023-01-11T21:38:06.0304100Z ok (2.462s) 2023-01-11T21:38:06.0304560Z test_reduction1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0304693Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0304950Z [2023-01-11 21:30:09,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 218 2023-01-11T21:38:06.0305219Z [2023-01-11 21:30:11,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 218 2023-01-11T21:38:06.0305225Z 2023-01-11T21:38:06.0305323Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0305398Z import torch 2023-01-11T21:38:06.0305467Z import random 2023-01-11T21:38:06.0305625Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0305754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0305760Z 2023-01-11T21:38:06.0305843Z aten = torch.ops.aten 2023-01-11T21:38:06.0305978Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0306073Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0306078Z 2023-01-11T21:38:06.0306155Z import triton 2023-01-11T21:38:06.0306249Z import triton.language as tl 2023-01-11T21:38:06.0306366Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0306505Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0306514Z 2023-01-11T21:38:06.0306519Z 2023-01-11T21:38:06.0306655Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0306857Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0306986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0307087Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0307188Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0307288Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0307389Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.0307490Z long* __restrict__ out_ptr4) 2023-01-11T21:38:06.0307557Z { 2023-01-11T21:38:06.0307623Z { 2023-01-11T21:38:06.0307684Z { 2023-01-11T21:38:06.0307766Z float tmp1 = 0; 2023-01-11T21:38:06.0307987Z float tmp2 = -std::numeric_limits<float>::infinity();
2023-01-11T21:38:06.0308114Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0308270Z struct IndexValue_7 {size_t index; float value;}; 2023-01-11T21:38:06.0308491Z IndexValue_7 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0308632Z #pragma omp declare reduction(argmax : struct IndexValue_7 :\ 2023-01-11T21:38:06.0308790Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0308937Z omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0309171Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0309293Z struct IndexValue_8 {size_t index; float value;}; 2023-01-11T21:38:06.0309431Z IndexValue_8 tmp5{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0309571Z #pragma omp declare reduction(argmin : struct IndexValue_8 :\ 2023-01-11T21:38:06.0309726Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0309877Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0310025Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0310119Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:06.0310181Z { 2023-01-11T21:38:06.0310251Z { 2023-01-11T21:38:06.0310349Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0310431Z tmp1 += tmp0; 2023-01-11T21:38:06.0310537Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0310641Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0310737Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.0310845Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0310917Z } 2023-01-11T21:38:06.0311015Z if (tmp5.value > tmp0) { 2023-01-11T21:38:06.0311125Z tmp5.index = i0; tmp5.value = tmp0; 2023-01-11T21:38:06.0311198Z } 2023-01-11T21:38:06.0311267Z } 2023-01-11T21:38:06.0311334Z } 2023-01-11T21:38:06.0311412Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0311524Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0311607Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0311697Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0311785Z out_ptr4[0] = tmp5.index; 2023-01-11T21:38:06.0311852Z } 2023-01-11T21:38:06.0311911Z } 2023-01-11T21:38:06.0311974Z } 2023-01-11T21:38:06.0312058Z ''') 2023-01-11T21:38:06.0312064Z 2023-01-11T21:38:06.0312068Z 2023-01-11T21:38:06.0312161Z async_compile.wait(globals()) 2023-01-11T21:38:06.0312239Z del async_compile 2023-01-11T21:38:06.0312244Z 2023-01-11T21:38:06.0312319Z def call(args): 2023-01-11T21:38:06.0312393Z arg0_1, = args 2023-01-11T21:38:06.0312471Z args.clear() 2023-01-11T21:38:06.0312653Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0312833Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0313013Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0313194Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0313371Z buf4 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0313609Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.0313684Z del arg0_1 2023-01-11T21:38:06.0313777Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.0313788Z 2023-01-11T21:38:06.0313793Z 2023-01-11T21:38:06.0313867Z if __name__ == "__main__": 2023-01-11T21:38:06.0313985Z from torch._dynamo.testing import
rand_strided 2023-01-11T21:38:06.0314146Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0314339Z arg0_1 = rand_strided((3, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0314451Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0314456Z 2023-01-11T21:38:06.0314530Z ok (2.399s) 2023-01-11T21:38:06.0314986Z test_reduction2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0315119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0315375Z [2023-01-11 21:30:11,608] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 219 2023-01-11T21:38:06.0315634Z [2023-01-11 21:30:14,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 219 2023-01-11T21:38:06.0315647Z 2023-01-11T21:38:06.0315739Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0315812Z import torch 2023-01-11T21:38:06.0315887Z import random 2023-01-11T21:38:06.0316007Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0316130Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0316135Z 2023-01-11T21:38:06.0316218Z aten = torch.ops.aten 2023-01-11T21:38:06.0316357Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0316446Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0316451Z 2023-01-11T21:38:06.0316524Z import triton 2023-01-11T21:38:06.0316618Z import triton.language as tl 2023-01-11T21:38:06.0316746Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0316888Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0316896Z 2023-01-11T21:38:06.0316901Z 2023-01-11T21:38:06.0317038Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0317243Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0317368Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0317495Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0317599Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0317702Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0317803Z long* __restrict__ out_ptr3) 2023-01-11T21:38:06.0317868Z { 2023-01-11T21:38:06.0317934Z { 2023-01-11T21:38:06.0318002Z { 2023-01-11T21:38:06.0318077Z float tmp1 = 0; 2023-01-11T21:38:06.0318284Z float tmp2 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0318415Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0318543Z struct IndexValue_9 {size_t index; float value;}; 2023-01-11T21:38:06.0318681Z IndexValue_9 tmp4{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0318821Z #pragma omp declare reduction(argmin : struct IndexValue_9 :\ 2023-01-11T21:38:06.0318976Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0319126Z omp_out.index = omp_in.value > omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0319266Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0319358Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0319425Z { 2023-01-11T21:38:06.0319494Z { 2023-01-11T21:38:06.0319592Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0319675Z tmp1 += tmp0; 2023-01-11T21:38:06.0319781Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0319906Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0320005Z if (tmp4.value > tmp0) { 2023-01-11T21:38:06.0320120Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0320193Z } 2023-01-11T21:38:06.0320262Z } 2023-01-11T21:38:06.0320331Z } 2023-01-11T21:38:06.0320419Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0320496Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0320576Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0320666Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0320732Z } 2023-01-11T21:38:06.0320797Z } 2023-01-11T21:38:06.0320861Z } 2023-01-11T21:38:06.0320946Z ''') 2023-01-11T21:38:06.0320951Z 2023-01-11T21:38:06.0320955Z 2023-01-11T21:38:06.0321042Z async_compile.wait(globals()) 2023-01-11T21:38:06.0321116Z del async_compile 2023-01-11T21:38:06.0321121Z 2023-01-11T21:38:06.0321194Z def call(args): 2023-01-11T21:38:06.0321269Z arg0_1, = args 2023-01-11T21:38:06.0321344Z args.clear() 2023-01-11T21:38:06.0321530Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0321710Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0321884Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0322065Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0322280Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0322354Z del arg0_1 2023-01-11T21:38:06.0322446Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0322451Z 2023-01-11T21:38:06.0322455Z 2023-01-11T21:38:06.0322536Z if __name__ == "__main__": 2023-01-11T21:38:06.0322654Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0322780Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0322968Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0323082Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0323087Z 2023-01-11T21:38:06.0323157Z ok (2.634s) 2023-01-11T21:38:06.0323639Z test_reduction3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0323777Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0324035Z [2023-01-11 21:30:14,281] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 220 2023-01-11T21:38:06.0324296Z [2023-01-11 21:30:16,495] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 220 2023-01-11T21:38:06.0324305Z 2023-01-11T21:38:06.0324402Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0324476Z import torch 2023-01-11T21:38:06.0324551Z import random 2023-01-11T21:38:06.0324663Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0324789Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0324794Z 2023-01-11T21:38:06.0324875Z aten = torch.ops.aten 2023-01-11T21:38:06.0325012Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0325108Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0325113Z 2023-01-11T21:38:06.0325187Z import triton 2023-01-11T21:38:06.0325278Z import triton.language as tl 2023-01-11T21:38:06.0325408Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0325564Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0325571Z 2023-01-11T21:38:06.0325576Z 2023-01-11T21:38:06.0325757Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0325963Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0326087Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0326193Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0326296Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0326395Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0326489Z long* __restrict__ out_ptr3) 2023-01-11T21:38:06.0326554Z { 2023-01-11T21:38:06.0326616Z { 2023-01-11T21:38:06.0326684Z { 2023-01-11T21:38:06.0326766Z float tmp1 = 0; 2023-01-11T21:38:06.0326970Z float tmp2 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0327097Z float tmp3 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0327214Z struct IndexValue_10 {size_t index; float value;}; 2023-01-11T21:38:06.0327438Z IndexValue_10 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.0327580Z #pragma omp declare reduction(argmax : struct IndexValue_10 :\ 2023-01-11T21:38:06.0327735Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.0327886Z omp_out.index = omp_in.value < omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.0328115Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.0328206Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0328274Z { 2023-01-11T21:38:06.0328337Z { 2023-01-11T21:38:06.0328436Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0328519Z tmp1 += tmp0; 2023-01-11T21:38:06.0328625Z tmp2 = std::max(tmp2, tmp0); 2023-01-11T21:38:06.0328726Z tmp3 = std::min(tmp3, tmp0); 2023-01-11T21:38:06.0328826Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.0328941Z tmp4.index = i0; tmp4.value = tmp0; 2023-01-11T21:38:06.0329012Z } 2023-01-11T21:38:06.0329074Z } 2023-01-11T21:38:06.0329140Z } 2023-01-11T21:38:06.0329254Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0329338Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.0329420Z out_ptr2[0] = tmp3; 2023-01-11T21:38:06.0329509Z out_ptr3[0] = tmp4.index; 2023-01-11T21:38:06.0329569Z } 2023-01-11T21:38:06.0329636Z } 2023-01-11T21:38:06.0329698Z } 2023-01-11T21:38:06.0329782Z ''') 2023-01-11T21:38:06.0329787Z 2023-01-11T21:38:06.0329792Z 2023-01-11T21:38:06.0329884Z async_compile.wait(globals()) 2023-01-11T21:38:06.0329962Z del async_compile 2023-01-11T21:38:06.0329967Z 2023-01-11T21:38:06.0330041Z def call(args): 2023-01-11T21:38:06.0330108Z arg0_1, = args 2023-01-11T21:38:06.0330186Z args.clear() 2023-01-11T21:38:06.0330373Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330551Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330727Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0330906Z buf3 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0331121Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0331197Z del arg0_1 2023-01-11T21:38:06.0331282Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0331287Z 2023-01-11T21:38:06.0331296Z 2023-01-11T21:38:06.0331371Z if __name__ == "__main__": 2023-01-11T21:38:06.0331489Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0331615Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0331806Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0331946Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0331951Z 2023-01-11T21:38:06.0332020Z ok (2.270s) 2023-01-11T21:38:06.0332250Z test_reduction4_cpu (__main__.CpuTests) ... skip: Non-deterministic CPU results (0.001s) 2023-01-11T21:38:06.0332728Z test_reflection_pad2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0332860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0333113Z [2023-01-11 21:30:16,524] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 221 2023-01-11T21:38:06.0333379Z [2023-01-11 21:30:18,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 221 2023-01-11T21:38:06.0333799Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0333930Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0334183Z [2023-01-11 21:30:18,643] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 222 2023-01-11T21:38:06.0334442Z [2023-01-11 21:30:21,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 222 2023-01-11T21:38:06.0335001Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0335138Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0335442Z [2023-01-11 21:30:21,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 223 2023-01-11T21:38:06.0335450Z 2023-01-11T21:38:06.0335568Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0335643Z import torch 2023-01-11T21:38:06.0335737Z import random 2023-01-11T21:38:06.0335856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0335981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0335987Z 2023-01-11T21:38:06.0336069Z aten = torch.ops.aten 2023-01-11T21:38:06.0336207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0336307Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0336312Z 2023-01-11T21:38:06.0336386Z import triton 2023-01-11T21:38:06.0336471Z import triton.language as tl 2023-01-11T21:38:06.0336595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0336733Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0336741Z 2023-01-11T21:38:06.0336745Z 2023-01-11T21:38:06.0336882Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0337087Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0337329Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0337437Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0337504Z { 2023-01-11T21:38:06.0337599Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0337666Z { 2023-01-11T21:38:06.0337748Z #pragma omp for 2023-01-11T21:38:06.0337834Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0337959Z { 2023-01-11T21:38:06.0338049Z #pragma GCC ivdep 2023-01-11T21:38:06.0338129Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0338196Z { 
2023-01-11T21:38:06.0338265Z {
2023-01-11T21:38:06.0338336Z {
2023-01-11T21:38:06.0338449Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0338561Z auto tmp1 = static_cast<long>(i1);
2023-01-11T21:38:06.0338674Z auto tmp2 = in_ptr0[tmp1 + (8*tmp0)];
2023-01-11T21:38:06.0338767Z out_ptr0[i1 + (8*i0)] = tmp2;
2023-01-11T21:38:06.0338839Z }
2023-01-11T21:38:06.0338907Z }
2023-01-11T21:38:06.0338974Z }
2023-01-11T21:38:06.0339041Z }
2023-01-11T21:38:06.0339108Z }
2023-01-11T21:38:06.0339175Z }
2023-01-11T21:38:06.0339254Z ''')
2023-01-11T21:38:06.0339259Z
2023-01-11T21:38:06.0339270Z
2023-01-11T21:38:06.0339362Z async_compile.wait(globals())
2023-01-11T21:38:06.0339438Z del async_compile
2023-01-11T21:38:06.0339443Z
2023-01-11T21:38:06.0339519Z def call(args):
2023-01-11T21:38:06.0339598Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0339672Z args.clear()
2023-01-11T21:38:06.0339887Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0340039Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0340105Z del arg0_1
2023-01-11T21:38:06.0340188Z return (buf0, )
2023-01-11T21:38:06.0340193Z
2023-01-11T21:38:06.0340197Z
2023-01-11T21:38:06.0340280Z if __name__ == "__main__":
2023-01-11T21:38:06.0340407Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0340543Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0340786Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0340996Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0341120Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0341125Z
2023-01-11T21:38:06.0341129Z
2023-01-11T21:38:06.0341219Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0341299Z import torch
2023-01-11T21:38:06.0341373Z import random
2023-01-11T21:38:06.0341566Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0341693Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0341698Z
2023-01-11T21:38:06.0341781Z aten = torch.ops.aten
2023-01-11T21:38:06.0341916Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0342004Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0342016Z
2023-01-11T21:38:06.0342083Z import triton
2023-01-11T21:38:06.0342180Z import triton.language as tl
2023-01-11T21:38:06.0342305Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0342444Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0342451Z
2023-01-11T21:38:06.0342456Z
2023-01-11T21:38:06.0342590Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0342793Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0342924Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0343033Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0343091Z {
2023-01-11T21:38:06.0343191Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0343262Z {
2023-01-11T21:38:06.0343343Z #pragma omp for
2023-01-11T21:38:06.0343427Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0343493Z {
2023-01-11T21:38:06.0343571Z #pragma GCC ivdep
2023-01-11T21:38:06.0343659Z for(long i1=0; i1<8; i1+=1)
2023-01-11T21:38:06.0343732Z {
2023-01-11T21:38:06.0343801Z {
2023-01-11T21:38:06.0343871Z {
2023-01-11T21:38:06.0344014Z auto tmp0 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0344129Z auto tmp1 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0344233Z auto tmp2 = in_ptr0[tmp1 + (10*tmp0)];
2023-01-11T21:38:06.0344340Z auto tmp3 = static_cast<long>(i1);
2023-01-11T21:38:06.0344439Z auto tmp4 = tmp3 >= 1;
2023-01-11T21:38:06.0344535Z auto tmp5 = tmp3 <= 1;
2023-01-11T21:38:06.0344637Z auto tmp6 = tmp4 & tmp5;
2023-01-11T21:38:06.0344727Z float tmp7 = 0.0;
2023-01-11T21:38:06.0344807Z if(tmp6)
2023-01-11T21:38:06.0344873Z {
2023-01-11T21:38:06.0344985Z auto tmp8 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0345162Z auto tmp9 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0345278Z auto tmp10 = in_ptr0[tmp9 + (10*tmp8)];
2023-01-11T21:38:06.0345367Z tmp7 = tmp10;
2023-01-11T21:38:06.0345440Z }
2023-01-11T21:38:06.0345539Z auto tmp11 = tmp2 + tmp7;
2023-01-11T21:38:06.0345639Z auto tmp12 = tmp3 >= 6;
2023-01-11T21:38:06.0345732Z auto tmp13 = tmp3 <= 6;
2023-01-11T21:38:06.0345833Z auto tmp14 = tmp12 & tmp13;
2023-01-11T21:38:06.0345925Z float tmp15 = 0.0;
2023-01-11T21:38:06.0346006Z if(tmp14)
2023-01-11T21:38:06.0346078Z {
2023-01-11T21:38:06.0346192Z auto tmp16 = static_cast<long>(1 + i0);
2023-01-11T21:38:06.0346369Z auto tmp17 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0346480Z auto tmp18 = in_ptr0[tmp17 + (10*tmp16)];
2023-01-11T21:38:06.0346571Z tmp15 = tmp18;
2023-01-11T21:38:06.0346645Z }
2023-01-11T21:38:06.0346748Z auto tmp19 = tmp11 + tmp15;
2023-01-11T21:38:06.0346856Z auto tmp20 = static_cast<long>(i0);
2023-01-11T21:38:06.0346953Z auto tmp21 = tmp20 >= 1;
2023-01-11T21:38:06.0347080Z auto tmp22 = tmp20 <= 1;
2023-01-11T21:38:06.0347174Z auto tmp23 = tmp21 & tmp22;
2023-01-11T21:38:06.0347263Z float tmp24 = 0.0;
2023-01-11T21:38:06.0347344Z if(tmp23)
2023-01-11T21:38:06.0347415Z {
2023-01-11T21:38:06.0347591Z auto tmp25 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0347704Z auto tmp26 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0347823Z auto tmp27 = in_ptr0[tmp26 + (10*tmp25)];
2023-01-11T21:38:06.0347904Z tmp24 = tmp27;
2023-01-11T21:38:06.0347979Z }
2023-01-11T21:38:06.0348080Z auto tmp28 = tmp19 + tmp24;
2023-01-11T21:38:06.0348178Z auto tmp29 = tmp20 >= 6;
2023-01-11T21:38:06.0348273Z auto tmp30 = tmp20 <= 6;
2023-01-11T21:38:06.0348375Z auto tmp31 = tmp29 & tmp30;
2023-01-11T21:38:06.0348465Z float tmp32 = 0.0;
2023-01-11T21:38:06.0348545Z if(tmp31)
2023-01-11T21:38:06.0348611Z {
2023-01-11T21:38:06.0348788Z auto tmp33 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0348905Z auto tmp34 = static_cast<long>(1 + i1);
2023-01-11T21:38:06.0349021Z auto tmp35 = in_ptr0[tmp34 + (10*tmp33)];
2023-01-11T21:38:06.0349110Z tmp32 = tmp35;
2023-01-11T21:38:06.0349182Z }
2023-01-11T21:38:06.0349314Z auto tmp36 = tmp28 + tmp32;
2023-01-11T21:38:06.0349406Z auto tmp37 = tmp23 & tmp6;
2023-01-11T21:38:06.0349497Z float tmp38 = 0.0;
2023-01-11T21:38:06.0349583Z if(tmp37)
2023-01-11T21:38:06.0349656Z {
2023-01-11T21:38:06.0349838Z auto tmp39 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0350012Z auto tmp40 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0350127Z auto tmp41 = in_ptr0[tmp40 + (10*tmp39)];
2023-01-11T21:38:06.0350208Z tmp38 = tmp41;
2023-01-11T21:38:06.0350279Z }
2023-01-11T21:38:06.0350379Z auto tmp42 = tmp36 + tmp38;
2023-01-11T21:38:06.0350478Z auto tmp43 = tmp23 & tmp14;
2023-01-11T21:38:06.0350570Z float tmp44 = 0.0;
2023-01-11T21:38:06.0350654Z if(tmp43)
2023-01-11T21:38:06.0350726Z {
2023-01-11T21:38:06.0350892Z auto tmp45 = static_cast<long>(1 + ((-1)*i0));
2023-01-11T21:38:06.0351069Z auto tmp46 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0351185Z auto tmp47 = in_ptr0[tmp46 + (10*tmp45)];
2023-01-11T21:38:06.0351272Z tmp44 = tmp47;
2023-01-11T21:38:06.0351344Z }
2023-01-11T21:38:06.0351443Z auto tmp48 = tmp42 + tmp44;
2023-01-11T21:38:06.0351542Z auto tmp49 = tmp31 & tmp6;
2023-01-11T21:38:06.0351631Z float tmp50 = 0.0;
2023-01-11T21:38:06.0351705Z if(tmp49)
2023-01-11T21:38:06.0351777Z {
2023-01-11T21:38:06.0351953Z auto tmp51 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0352124Z auto tmp52 = static_cast<long>(1 + ((-1)*i1));
2023-01-11T21:38:06.0352241Z auto tmp53 = in_ptr0[tmp52 + (10*tmp51)];
2023-01-11T21:38:06.0352329Z tmp50 = tmp53;
2023-01-11T21:38:06.0352400Z }
2023-01-11T21:38:06.0352518Z auto tmp54 = tmp48 + tmp50;
2023-01-11T21:38:06.0352618Z auto tmp55 = tmp31 & tmp14;
2023-01-11T21:38:06.0352708Z float tmp56 = 0.0;
2023-01-11T21:38:06.0352792Z if(tmp55)
2023-01-11T21:38:06.0352868Z {
2023-01-11T21:38:06.0353042Z auto tmp57 = static_cast<long>(15 + ((-1)*i0));
2023-01-11T21:38:06.0353216Z auto tmp58 = static_cast<long>(15 + ((-1)*i1));
2023-01-11T21:38:06.0353323Z auto tmp59 = in_ptr0[tmp58 + (10*tmp57)];
2023-01-11T21:38:06.0353411Z tmp56 = tmp59;
2023-01-11T21:38:06.0353487Z }
2023-01-11T21:38:06.0353585Z auto tmp60 = tmp54 + tmp56;
2023-01-11T21:38:06.0353688Z out_ptr0[i1 + (8*i0)] = tmp60;
2023-01-11T21:38:06.0353761Z }
2023-01-11T21:38:06.0353831Z }
2023-01-11T21:38:06.0353891Z }
2023-01-11T21:38:06.0353960Z }
2023-01-11T21:38:06.0354027Z }
2023-01-11T21:38:06.0354093Z }
2023-01-11T21:38:06.0354177Z ''')
2023-01-11T21:38:06.0354183Z
2023-01-11T21:38:06.0354187Z
2023-01-11T21:38:06.0354278Z async_compile.wait(globals())
2023-01-11T21:38:06.0354356Z del async_compile
2023-01-11T21:38:06.0354361Z
2023-01-11T21:38:06.0354429Z def call(args):
2023-01-11T21:38:06.0354510Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0354585Z args.clear()
2023-01-11T21:38:06.0354799Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0354937Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0355050Z del arg0_1
2023-01-11T21:38:06.0355142Z return (buf0, )
2023-01-11T21:38:06.0355147Z
2023-01-11T21:38:06.0355151Z
2023-01-11T21:38:06.0355240Z if __name__ == "__main__":
2023-01-11T21:38:06.0355368Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0355496Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0355715Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0355924Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0356047Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0356052Z
2023-01-11T21:38:06.0356317Z [2023-01-11 21:30:23,676] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 223
2023-01-11T21:38:06.0356323Z
2023-01-11T21:38:06.0356422Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0356500Z import torch
2023-01-11T21:38:06.0356568Z import random
2023-01-11T21:38:06.0356684Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0356806Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0356811Z
2023-01-11T21:38:06.0356891Z aten = torch.ops.aten
2023-01-11T21:38:06.0357028Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0357124Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0357129Z
2023-01-11T21:38:06.0357203Z import
triton 2023-01-11T21:38:06.0357296Z import triton.language as tl 2023-01-11T21:38:06.0357414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0357551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0357556Z 2023-01-11T21:38:06.0357561Z 2023-01-11T21:38:06.0357699Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0357905Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0358032Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0358136Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0358199Z { 2023-01-11T21:38:06.0358293Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0358360Z { 2023-01-11T21:38:06.0358444Z #pragma omp for 2023-01-11T21:38:06.0358563Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0358633Z { 2023-01-11T21:38:06.0358723Z #pragma GCC ivdep 2023-01-11T21:38:06.0358810Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0358871Z { 2023-01-11T21:38:06.0358941Z { 2023-01-11T21:38:06.0359010Z { 2023-01-11T21:38:06.0359123Z auto tmp0 = static_cast(3 + i0); 2023-01-11T21:38:06.0359236Z auto tmp1 = static_cast(1 + i1); 2023-01-11T21:38:06.0359350Z auto tmp2 = in_ptr0[tmp1 + (11*tmp0)]; 2023-01-11T21:38:06.0359461Z auto tmp3 = static_cast(i1); 2023-01-11T21:38:06.0359551Z auto tmp4 = tmp3 >= 1; 2023-01-11T21:38:06.0359651Z auto tmp5 = tmp3 <= 1; 2023-01-11T21:38:06.0359754Z auto tmp6 = tmp4 & tmp5; 2023-01-11T21:38:06.0359848Z float tmp7 = 0.0; 2023-01-11T21:38:06.0359929Z if(tmp6) 2023-01-11T21:38:06.0360003Z { 2023-01-11T21:38:06.0360120Z auto tmp8 = static_cast(3 + i0); 2023-01-11T21:38:06.0360287Z auto tmp9 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0360404Z auto tmp10 = in_ptr0[tmp9 + (11*tmp8)]; 2023-01-11T21:38:06.0360492Z tmp7 = tmp10; 2023-01-11T21:38:06.0360566Z } 2023-01-11T21:38:06.0360668Z auto tmp11 = tmp2 + tmp7; 2023-01-11T21:38:06.0360766Z auto tmp12 = tmp3 >= 5; 2023-01-11T21:38:06.0360892Z auto tmp13 = tmp3 <= 6; 2023-01-11T21:38:06.0360991Z auto tmp14 = tmp12 & tmp13; 2023-01-11T21:38:06.0361075Z float tmp15 = 0.0; 2023-01-11T21:38:06.0361161Z if(tmp14) 2023-01-11T21:38:06.0361240Z { 2023-01-11T21:38:06.0361356Z auto tmp16 = static_cast(3 + i0); 2023-01-11T21:38:06.0361540Z auto tmp17 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0361656Z auto tmp18 = in_ptr0[tmp17 + (11*tmp16)]; 2023-01-11T21:38:06.0361744Z tmp15 = tmp18; 2023-01-11T21:38:06.0361809Z } 2023-01-11T21:38:06.0361909Z auto tmp19 = tmp11 + tmp15; 2023-01-11T21:38:06.0362018Z auto tmp20 = static_cast(i0); 2023-01-11T21:38:06.0362116Z auto tmp21 = tmp20 >= 1; 2023-01-11T21:38:06.0362214Z auto tmp22 = tmp20 <= 3; 2023-01-11T21:38:06.0362314Z auto tmp23 = tmp21 & tmp22; 2023-01-11T21:38:06.0362404Z float tmp24 = 0.0; 2023-01-11T21:38:06.0362478Z if(tmp23) 2023-01-11T21:38:06.0362555Z { 2023-01-11T21:38:06.0362734Z auto tmp25 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0362845Z auto tmp26 = static_cast(1 + i1); 2023-01-11T21:38:06.0362964Z auto tmp27 = in_ptr0[tmp26 + (11*tmp25)]; 2023-01-11T21:38:06.0363055Z tmp24 = tmp27; 2023-01-11T21:38:06.0363129Z } 2023-01-11T21:38:06.0363221Z auto tmp28 = tmp19 + tmp24; 2023-01-11T21:38:06.0363320Z auto tmp29 = tmp20 >= 3; 2023-01-11T21:38:06.0363413Z auto tmp30 = tmp20 <= 6; 2023-01-11T21:38:06.0363516Z auto tmp31 = tmp29 & tmp30; 2023-01-11T21:38:06.0363606Z float tmp32 = 0.0; 2023-01-11T21:38:06.0363692Z if(tmp31) 2023-01-11T21:38:06.0363765Z { 2023-01-11T21:38:06.0363978Z 
auto tmp33 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0364087Z auto tmp34 = static_cast(1 + i1); 2023-01-11T21:38:06.0364202Z auto tmp35 = in_ptr0[tmp34 + (11*tmp33)]; 2023-01-11T21:38:06.0364293Z tmp32 = tmp35; 2023-01-11T21:38:06.0364367Z } 2023-01-11T21:38:06.0364468Z auto tmp36 = tmp28 + tmp32; 2023-01-11T21:38:06.0364565Z auto tmp37 = tmp23 & tmp6; 2023-01-11T21:38:06.0364655Z float tmp38 = 0.0; 2023-01-11T21:38:06.0364729Z if(tmp37) 2023-01-11T21:38:06.0364807Z { 2023-01-11T21:38:06.0364982Z auto tmp39 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0365158Z auto tmp40 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0365273Z auto tmp41 = in_ptr0[tmp40 + (11*tmp39)]; 2023-01-11T21:38:06.0365363Z tmp38 = tmp41; 2023-01-11T21:38:06.0365436Z } 2023-01-11T21:38:06.0365528Z auto tmp42 = tmp36 + tmp38; 2023-01-11T21:38:06.0365625Z auto tmp43 = tmp23 & tmp14; 2023-01-11T21:38:06.0365716Z float tmp44 = 0.0; 2023-01-11T21:38:06.0365797Z if(tmp43) 2023-01-11T21:38:06.0365870Z { 2023-01-11T21:38:06.0366045Z auto tmp45 = static_cast(3 + ((-1)*i0)); 2023-01-11T21:38:06.0366224Z auto tmp46 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0366360Z auto tmp47 = in_ptr0[tmp46 + (11*tmp45)]; 2023-01-11T21:38:06.0366448Z tmp44 = tmp47; 2023-01-11T21:38:06.0366521Z } 2023-01-11T21:38:06.0366624Z auto tmp48 = tmp42 + tmp44; 2023-01-11T21:38:06.0366724Z auto tmp49 = tmp31 & tmp6; 2023-01-11T21:38:06.0366813Z float tmp50 = 0.0; 2023-01-11T21:38:06.0366897Z if(tmp49) 2023-01-11T21:38:06.0366971Z { 2023-01-11T21:38:06.0367141Z auto tmp51 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0367317Z auto tmp52 = static_cast(1 + ((-1)*i1)); 2023-01-11T21:38:06.0367430Z auto tmp53 = in_ptr0[tmp52 + (11*tmp51)]; 2023-01-11T21:38:06.0367520Z tmp50 = tmp53; 2023-01-11T21:38:06.0367592Z } 2023-01-11T21:38:06.0367694Z auto tmp54 = tmp48 + tmp50; 2023-01-11T21:38:06.0367790Z auto tmp55 = tmp31 & tmp14; 2023-01-11T21:38:06.0367875Z float tmp56 = 0.0; 2023-01-11T21:38:06.0367958Z if(tmp55) 2023-01-11T21:38:06.0368031Z { 2023-01-11T21:38:06.0368213Z auto tmp57 = static_cast(17 + ((-1)*i0)); 2023-01-11T21:38:06.0368387Z auto tmp58 = static_cast(15 + ((-1)*i1)); 2023-01-11T21:38:06.0368505Z auto tmp59 = in_ptr0[tmp58 + (11*tmp57)]; 2023-01-11T21:38:06.0368594Z tmp56 = tmp59; 2023-01-11T21:38:06.0368659Z } 2023-01-11T21:38:06.0368760Z auto tmp60 = tmp54 + tmp56; 2023-01-11T21:38:06.0368864Z out_ptr0[i1 + (8*i0)] = tmp60; 2023-01-11T21:38:06.0368938Z } 2023-01-11T21:38:06.0369009Z } 2023-01-11T21:38:06.0369077Z } 2023-01-11T21:38:06.0369142Z } 2023-01-11T21:38:06.0369201Z } 2023-01-11T21:38:06.0369267Z } 2023-01-11T21:38:06.0369350Z ''') 2023-01-11T21:38:06.0369355Z 2023-01-11T21:38:06.0369360Z 2023-01-11T21:38:06.0369453Z async_compile.wait(globals()) 2023-01-11T21:38:06.0369557Z del async_compile 2023-01-11T21:38:06.0369563Z 2023-01-11T21:38:06.0369641Z def call(args): 2023-01-11T21:38:06.0369721Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0369790Z args.clear() 2023-01-11T21:38:06.0370000Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0370139Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0370211Z del arg0_1 2023-01-11T21:38:06.0370287Z return (buf0, ) 2023-01-11T21:38:06.0370292Z 2023-01-11T21:38:06.0370296Z 2023-01-11T21:38:06.0370372Z if __name__ == "__main__": 2023-01-11T21:38:06.0370493Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0370623Z from torch._inductor.utils import 
print_performance 2023-01-11T21:38:06.0370837Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0371045Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0371166Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0371171Z 2023-01-11T21:38:06.0371242Z ok (7.180s) 2023-01-11T21:38:06.0371703Z test_reflection_pad2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0371834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0372119Z [2023-01-11 21:30:23,760] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 224 2023-01-11T21:38:06.0372380Z [2023-01-11 21:30:26,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 224 2023-01-11T21:38:06.0372386Z 2023-01-11T21:38:06.0372490Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0372565Z import torch 2023-01-11T21:38:06.0372633Z import random 2023-01-11T21:38:06.0372752Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0372875Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0372881Z 2023-01-11T21:38:06.0372964Z aten = torch.ops.aten 2023-01-11T21:38:06.0373100Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0373196Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0373202Z 2023-01-11T21:38:06.0373276Z import triton 2023-01-11T21:38:06.0373361Z import triton.language as tl 2023-01-11T21:38:06.0373492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0373631Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0373637Z 2023-01-11T21:38:06.0373641Z 2023-01-11T21:38:06.0373778Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0373985Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0374109Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0374213Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0374313Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0374372Z { 2023-01-11T21:38:06.0374474Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0374650Z { 2023-01-11T21:38:06.0374744Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0374829Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.0374897Z { 2023-01-11T21:38:06.0374986Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.0375053Z { 2023-01-11T21:38:06.0375122Z { 2023-01-11T21:38:06.0375193Z { 2023-01-11T21:38:06.0375303Z auto tmp0 = static_cast(7); 2023-01-11T21:38:06.0375416Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:06.0375565Z auto tmp2 = static_cast(1); 2023-01-11T21:38:06.0375713Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0375812Z auto tmp4 = std::abs(tmp3); 2023-01-11T21:38:06.0375954Z auto tmp5 = tmp0 - tmp4; 2023-01-11T21:38:06.0376061Z auto tmp6 = std::abs(tmp5); 2023-01-11T21:38:06.0376202Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0376310Z auto tmp8 = static_cast(i1); 2023-01-11T21:38:06.0376449Z auto tmp9 = tmp8 - tmp2; 
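// Annotation (not part of the captured log): tmp0..tmp13 around this point
// implement reflection indexing, src = (size-1) - |(size-1) - |dst - pad||,
// here 7 - |7 - |i - 1||, which mirrors out-of-range coordinates of the padded
// output back into the 8x8 input. The static_cast calls in this kernel
// presumably read static_cast<long>(...) before the template arguments were
// lost in extraction.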
2023-01-11T21:38:06.0376558Z auto tmp10 = std::abs(tmp9); 2023-01-11T21:38:06.0376696Z auto tmp11 = tmp0 - tmp10; 2023-01-11T21:38:06.0376804Z auto tmp12 = std::abs(tmp11); 2023-01-11T21:38:06.0376944Z auto tmp13 = tmp0 - tmp12; 2023-01-11T21:38:06.0377056Z auto tmp14 = in_ptr0[tmp13 + (8*tmp7)]; 2023-01-11T21:38:06.0377208Z out_ptr0[i1 + (10*i0)] = tmp14; 2023-01-11T21:38:06.0377292Z } 2023-01-11T21:38:06.0377375Z } 2023-01-11T21:38:06.0377447Z } 2023-01-11T21:38:06.0377517Z } 2023-01-11T21:38:06.0377611Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0377697Z for(long i0=0; i0<15; i0+=1) 2023-01-11T21:38:06.0377764Z { 2023-01-11T21:38:06.0377855Z for(long i1=0; i1<11; i1+=1) 2023-01-11T21:38:06.0377924Z { 2023-01-11T21:38:06.0377986Z { 2023-01-11T21:38:06.0378058Z { 2023-01-11T21:38:06.0378219Z auto tmp0 = static_cast(7); 2023-01-11T21:38:06.0378329Z auto tmp1 = static_cast(i0); 2023-01-11T21:38:06.0378435Z auto tmp2 = static_cast(3); 2023-01-11T21:38:06.0378580Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0378685Z auto tmp4 = std::abs(tmp3); 2023-01-11T21:38:06.0378818Z auto tmp5 = tmp0 - tmp4; 2023-01-11T21:38:06.0378922Z auto tmp6 = std::abs(tmp5); 2023-01-11T21:38:06.0379061Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0379167Z auto tmp8 = static_cast(i1); 2023-01-11T21:38:06.0379277Z auto tmp9 = static_cast(1); 2023-01-11T21:38:06.0379421Z auto tmp10 = tmp8 - tmp9; 2023-01-11T21:38:06.0379525Z auto tmp11 = std::abs(tmp10); 2023-01-11T21:38:06.0379669Z auto tmp12 = tmp0 - tmp11; 2023-01-11T21:38:06.0379767Z auto tmp13 = std::abs(tmp12); 2023-01-11T21:38:06.0379911Z auto tmp14 = tmp0 - tmp13; 2023-01-11T21:38:06.0380024Z auto tmp15 = in_ptr0[tmp14 + (8*tmp7)]; 2023-01-11T21:38:06.0380129Z out_ptr1[i1 + (11*i0)] = tmp15; 2023-01-11T21:38:06.0380201Z } 2023-01-11T21:38:06.0380269Z } 2023-01-11T21:38:06.0380334Z } 2023-01-11T21:38:06.0380394Z } 2023-01-11T21:38:06.0380459Z } 2023-01-11T21:38:06.0380523Z } 2023-01-11T21:38:06.0380605Z ''') 2023-01-11T21:38:06.0380611Z 2023-01-11T21:38:06.0380615Z 2023-01-11T21:38:06.0380709Z async_compile.wait(globals()) 2023-01-11T21:38:06.0380786Z del async_compile 2023-01-11T21:38:06.0380791Z 2023-01-11T21:38:06.0380866Z def call(args): 2023-01-11T21:38:06.0380932Z arg0_1, = args 2023-01-11T21:38:06.0381006Z args.clear() 2023-01-11T21:38:06.0381226Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0381440Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0381609Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0381714Z del arg0_1 2023-01-11T21:38:06.0381796Z return (buf0, buf1, ) 2023-01-11T21:38:06.0381801Z 2023-01-11T21:38:06.0381806Z 2023-01-11T21:38:06.0381889Z if __name__ == "__main__": 2023-01-11T21:38:06.0381999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0382128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0382339Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0382453Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0382458Z 2023-01-11T21:38:06.0382530Z ok (2.715s) 2023-01-11T21:38:06.0382986Z test_relu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0383120Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0383376Z [2023-01-11 21:30:26,422] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 225
2023-01-11T21:38:06.0383639Z [2023-01-11 21:30:28,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 225
2023-01-11T21:38:06.0383645Z
2023-01-11T21:38:06.0383744Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0383811Z import torch
2023-01-11T21:38:06.0383887Z import random
2023-01-11T21:38:06.0384008Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0384161Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0384166Z
2023-01-11T21:38:06.0384249Z aten = torch.ops.aten
2023-01-11T21:38:06.0384388Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0384480Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0384487Z
2023-01-11T21:38:06.0384555Z import triton
2023-01-11T21:38:06.0384648Z import triton.language as tl
2023-01-11T21:38:06.0384773Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0384913Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0384919Z
2023-01-11T21:38:06.0384923Z
2023-01-11T21:38:06.0385058Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0385264Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0385387Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0385499Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0385596Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0385697Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0385760Z {
2023-01-11T21:38:06.0385862Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0385926Z {
2023-01-11T21:38:06.0386012Z #pragma omp for
2023-01-11T21:38:06.0386099Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0386159Z {
2023-01-11T21:38:06.0386300Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0386437Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0386575Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0));
2023-01-11T21:38:06.0386669Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0386799Z auto tmp4 = at::vec::clamp_min(tmp3, decltype(tmp3)(0));
2023-01-11T21:38:06.0386938Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:06.0387034Z auto tmp6 = tmp4 / tmp5;
2023-01-11T21:38:06.0387125Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0387220Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0387286Z }
2023-01-11T21:38:06.0387416Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0387506Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0387576Z {
2023-01-11T21:38:06.0387665Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0387746Z auto tmp2 = in_ptr1[i0];
2023-01-11T21:38:06.0387838Z auto tmp1 = tmp0 * (tmp0>0);
2023-01-11T21:38:06.0387928Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0388020Z auto tmp4 = tmp3 * (tmp3>0);
2023-01-11T21:38:06.0388124Z auto tmp5 = static_cast<float>(10);
2023-01-11T21:38:06.0388213Z auto tmp6 = tmp4 / tmp5;
2023-01-11T21:38:06.0388303Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0388385Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0388455Z }
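// Annotation (not part of the captured log): the scalar tail loop above,
// for(long i0=64; i0<64; ...), is degenerate and never runs -- all 64 floats
// are covered by the 8-wide vectorized loop; inductor appears to emit the tail
// unconditionally. The kernel computes out_ptr0 = relu(x) and
// out_ptr1 = relu(x + y) / 10 in a single fused pass.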
2023-01-11T21:38:06.0388523Z } 2023-01-11T21:38:06.0388587Z } 2023-01-11T21:38:06.0388673Z ''') 2023-01-11T21:38:06.0388678Z 2023-01-11T21:38:06.0388683Z 2023-01-11T21:38:06.0388774Z async_compile.wait(globals()) 2023-01-11T21:38:06.0388850Z del async_compile 2023-01-11T21:38:06.0388855Z 2023-01-11T21:38:06.0388923Z def call(args): 2023-01-11T21:38:06.0389002Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0389079Z args.clear() 2023-01-11T21:38:06.0389279Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0389470Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0389666Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0389738Z del arg0_1 2023-01-11T21:38:06.0389803Z del arg1_1 2023-01-11T21:38:06.0389914Z return (buf0, buf1, ) 2023-01-11T21:38:06.0389919Z 2023-01-11T21:38:06.0389923Z 2023-01-11T21:38:06.0390003Z if __name__ == "__main__": 2023-01-11T21:38:06.0390120Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0390246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0390443Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0390636Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0390754Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0390759Z 2023-01-11T21:38:06.0390823Z ok (1.780s) 2023-01-11T21:38:06.0391280Z test_remainder_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0391416Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0391672Z [2023-01-11 21:30:28,198] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 226 2023-01-11T21:38:06.0391935Z [2023-01-11 21:30:29,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 226 2023-01-11T21:38:06.0391941Z 2023-01-11T21:38:06.0392043Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0392119Z import torch 2023-01-11T21:38:06.0392192Z import random 2023-01-11T21:38:06.0392311Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0392429Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0392434Z 2023-01-11T21:38:06.0392514Z aten = torch.ops.aten 2023-01-11T21:38:06.0392653Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0392747Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0392755Z 2023-01-11T21:38:06.0392829Z import triton 2023-01-11T21:38:06.0392925Z import triton.language as tl 2023-01-11T21:38:06.0393048Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0393184Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0393189Z 2023-01-11T21:38:06.0393232Z 2023-01-11T21:38:06.0393365Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0393573Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0393697Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0393810Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0393918Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0394023Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0394126Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0394199Z { 2023-01-11T21:38:06.0394296Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0394365Z { 2023-01-11T21:38:06.0394450Z #pragma omp for 2023-01-11T21:38:06.0394542Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.0394611Z { 2023-01-11T21:38:06.0394683Z { 2023-01-11T21:38:06.0394748Z { 2023-01-11T21:38:06.0394850Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0394948Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.0395051Z auto tmp2 = mod(tmp0, tmp1); 2023-01-11T21:38:06.0395152Z auto tmp3 = tmp2 + tmp1; 2023-01-11T21:38:06.0395283Z auto tmp4 = ((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))) ? tmp3 : tmp2; 2023-01-11T21:38:06.0395396Z auto tmp5 = static_cast(1); 2023-01-11T21:38:06.0395487Z auto tmp6 = tmp0 + tmp5; 2023-01-11T21:38:06.0395629Z auto tmp7 = tmp1 - tmp5; 2023-01-11T21:38:06.0395759Z auto tmp8 = mod(tmp6, tmp7); 2023-01-11T21:38:06.0395853Z auto tmp9 = tmp8 + tmp7; 2023-01-11T21:38:06.0395982Z auto tmp10 = ((tmp8 != 0) & ((tmp8 < 0) != (tmp7 < 0))) ? tmp9 : tmp8; 2023-01-11T21:38:06.0396124Z auto tmp11 = tmp0 - tmp5; 2023-01-11T21:38:06.0396223Z auto tmp12 = tmp1 + tmp5; 2023-01-11T21:38:06.0396326Z auto tmp13 = mod(tmp11, tmp12); 2023-01-11T21:38:06.0396417Z auto tmp14 = tmp13 + tmp12; 2023-01-11T21:38:06.0396548Z auto tmp15 = ((tmp13 != 0) & ((tmp13 < 0) != (tmp12 < 0))) ? 
tmp14 : tmp13; 2023-01-11T21:38:06.0396640Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0396731Z out_ptr1[i0] = tmp10; 2023-01-11T21:38:06.0396820Z out_ptr2[i0] = tmp15; 2023-01-11T21:38:06.0396888Z } 2023-01-11T21:38:06.0396954Z } 2023-01-11T21:38:06.0397020Z } 2023-01-11T21:38:06.0397088Z } 2023-01-11T21:38:06.0397150Z } 2023-01-11T21:38:06.0397235Z ''') 2023-01-11T21:38:06.0397240Z 2023-01-11T21:38:06.0397245Z 2023-01-11T21:38:06.0397338Z async_compile.wait(globals()) 2023-01-11T21:38:06.0397413Z del async_compile 2023-01-11T21:38:06.0397419Z 2023-01-11T21:38:06.0397492Z def call(args): 2023-01-11T21:38:06.0397568Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0397644Z args.clear() 2023-01-11T21:38:06.0397842Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398034Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398222Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0398436Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0398511Z del arg0_1 2023-01-11T21:38:06.0398587Z del arg1_1 2023-01-11T21:38:06.0398668Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0398673Z 2023-01-11T21:38:06.0398677Z 2023-01-11T21:38:06.0398756Z if __name__ == "__main__": 2023-01-11T21:38:06.0398875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0398999Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0399220Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0399414Z arg1_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0399535Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0399541Z 2023-01-11T21:38:06.0399613Z ok (1.752s) 2023-01-11T21:38:06.0400058Z test_repeat_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0400192Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0400452Z [2023-01-11 21:30:29,942] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 227 2023-01-11T21:38:06.0400715Z [2023-01-11 21:30:32,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 227 2023-01-11T21:38:06.0400721Z 2023-01-11T21:38:06.0400821Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0400895Z import torch 2023-01-11T21:38:06.0400972Z import random 2023-01-11T21:38:06.0401092Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0401217Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0401223Z 2023-01-11T21:38:06.0401298Z aten = torch.ops.aten 2023-01-11T21:38:06.0401436Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0401564Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0401569Z 2023-01-11T21:38:06.0401644Z import triton 2023-01-11T21:38:06.0401740Z import triton.language as tl 2023-01-11T21:38:06.0401865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0402003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0402012Z 2023-01-11T21:38:06.0402016Z 2023-01-11T21:38:06.0402154Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0402350Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0402471Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0402576Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0402677Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0402774Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0402838Z { 2023-01-11T21:38:06.0402943Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0403005Z { 2023-01-11T21:38:06.0403100Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0403185Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0403251Z { 2023-01-11T21:38:06.0403339Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0403405Z { 2023-01-11T21:38:06.0403494Z #pragma GCC ivdep 2023-01-11T21:38:06.0403583Z for(long i2=0; i2<12; i2+=1) 2023-01-11T21:38:06.0403652Z { 2023-01-11T21:38:06.0403746Z for(long i3=0; i3<1; i3+=1) 2023-01-11T21:38:06.0403817Z { 2023-01-11T21:38:06.0403981Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i3) + (8*(i2 % 4)) + (32*(i1 % 2))); 2023-01-11T21:38:06.0404108Z tmp0.store(out_ptr0 + (8*i2) + (8*i3) + (96*i1) + (384*i0)); 2023-01-11T21:38:06.0404177Z } 2023-01-11T21:38:06.0404274Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0404377Z for(long i3=8; i3<8; i3+=1) 2023-01-11T21:38:06.0404449Z { 2023-01-11T21:38:06.0404568Z auto tmp0 = in_ptr0[i3 + (8*(i2 % 4)) + (32*(i1 % 2))]; 2023-01-11T21:38:06.0404682Z out_ptr0[i3 + (8*i2) + (96*i1) + (384*i0)] = tmp0; 2023-01-11T21:38:06.0404780Z } 2023-01-11T21:38:06.0404851Z } 2023-01-11T21:38:06.0404911Z } 2023-01-11T21:38:06.0404980Z } 2023-01-11T21:38:06.0405064Z #pragma omp for 2023-01-11T21:38:06.0405150Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0405217Z { 2023-01-11T21:38:06.0405308Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0405388Z { 2023-01-11T21:38:06.0405539Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.0405666Z tmp0.store(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.0405737Z } 
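// Annotation (not part of the captured log): the modulo indexing in the first
// nest above, in_ptr0[... + (8*(i2 % 4)) + (32*(i1 % 2))], is the general
// torch.repeat lowering -- source coordinates wrap while destination
// coordinates run over the tiled (2, 4, 12, 8) output. This second nest
// streams the same 64-float source block once per output tile, consistent
// with a repeat(8, 1, 1, 1)-style copy (inferred from the shapes in call()).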
2023-01-11T21:38:06.0405835Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0405926Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.0405993Z { 2023-01-11T21:38:06.0406083Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0406172Z out_ptr1[i1 + (64*i0)] = tmp0; 2023-01-11T21:38:06.0406241Z } 2023-01-11T21:38:06.0406309Z } 2023-01-11T21:38:06.0406390Z #pragma omp for 2023-01-11T21:38:06.0406474Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0406541Z { 2023-01-11T21:38:06.0406627Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0406687Z { 2023-01-11T21:38:06.0406827Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.0406937Z tmp0.store(out_ptr2 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.0407005Z } 2023-01-11T21:38:06.0407102Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0407217Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.0407286Z { 2023-01-11T21:38:06.0407370Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0407465Z out_ptr2[i1 + (64*i0)] = tmp0; 2023-01-11T21:38:06.0407532Z } 2023-01-11T21:38:06.0407600Z } 2023-01-11T21:38:06.0407669Z } 2023-01-11T21:38:06.0407735Z } 2023-01-11T21:38:06.0407815Z ''') 2023-01-11T21:38:06.0407827Z 2023-01-11T21:38:06.0407831Z 2023-01-11T21:38:06.0407918Z async_compile.wait(globals()) 2023-01-11T21:38:06.0407993Z del async_compile 2023-01-11T21:38:06.0407999Z 2023-01-11T21:38:06.0408072Z def call(args): 2023-01-11T21:38:06.0408146Z arg0_1, = args 2023-01-11T21:38:06.0408222Z args.clear() 2023-01-11T21:38:06.0408440Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0408650Z buf1 = empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0408872Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0409067Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0409139Z del arg0_1 2023-01-11T21:38:06.0409229Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0409236Z 2023-01-11T21:38:06.0409240Z 2023-01-11T21:38:06.0409320Z if __name__ == "__main__": 2023-01-11T21:38:06.0409440Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0409569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0409780Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0409893Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0409899Z 2023-01-11T21:38:06.0409962Z ok (2.100s) 2023-01-11T21:38:06.0410413Z test_roi_align_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0410575Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0410835Z [2023-01-11 21:30:33,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 228 2023-01-11T21:38:06.0411094Z [2023-01-11 21:30:34,178] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.0411356Z [2023-01-11 21:30:34,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 228 2023-01-11T21:38:06.0411362Z 2023-01-11T21:38:06.0411456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0411532Z import torch 2023-01-11T21:38:06.0411607Z import random 2023-01-11T21:38:06.0411720Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0411842Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0411847Z 2023-01-11T21:38:06.0411927Z aten = torch.ops.aten 2023-01-11T21:38:06.0412062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0412158Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0412164Z 2023-01-11T21:38:06.0412237Z import triton 2023-01-11T21:38:06.0412328Z import triton.language as tl 2023-01-11T21:38:06.0412453Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0412586Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0412591Z 2023-01-11T21:38:06.0412596Z 2023-01-11T21:38:06.0412689Z async_compile.wait(globals()) 2023-01-11T21:38:06.0412766Z del async_compile 2023-01-11T21:38:06.0412771Z 2023-01-11T21:38:06.0412845Z def call(args): 2023-01-11T21:38:06.0412924Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0413029Z args.clear() 2023-01-11T21:38:06.0413175Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.0413241Z del arg0_1 2023-01-11T21:38:06.0413309Z del arg1_1 2023-01-11T21:38:06.0413379Z buf1 = buf0 2023-01-11T21:38:06.0413498Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.0413569Z del buf0 2023-01-11T21:38:06.0413642Z return (buf1, ) 2023-01-11T21:38:06.0413648Z 2023-01-11T21:38:06.0413652Z 2023-01-11T21:38:06.0413729Z if __name__ == "__main__": 2023-01-11T21:38:06.0413844Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0413963Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0414196Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0414396Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0414637Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0414646Z 2023-01-11T21:38:06.0414722Z ok (4.251s) 2023-01-11T21:38:06.0415172Z test_roll_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0415303Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0415560Z [2023-01-11 21:30:36,340] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 229 2023-01-11T21:38:06.0415824Z [2023-01-11 21:30:39,345] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 229 2023-01-11T21:38:06.0415830Z 2023-01-11T21:38:06.0415921Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0415999Z import torch 2023-01-11T21:38:06.0416073Z import random 2023-01-11T21:38:06.0416194Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0416315Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0416321Z 2023-01-11T21:38:06.0416404Z aten = torch.ops.aten 2023-01-11T21:38:06.0416585Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0416682Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0416687Z 2023-01-11T21:38:06.0416754Z import triton 2023-01-11T21:38:06.0416846Z import triton.language as tl 2023-01-11T21:38:06.0416969Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0417109Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0417114Z 2023-01-11T21:38:06.0417119Z 2023-01-11T21:38:06.0417305Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0417508Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0417633Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0417735Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0417829Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0417893Z { 2023-01-11T21:38:06.0417994Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0418063Z { 2023-01-11T21:38:06.0418158Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0418244Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0418303Z { 2023-01-11T21:38:06.0418394Z for(long i1=0; i1<56; i1+=1) 2023-01-11T21:38:06.0418460Z { 2023-01-11T21:38:06.0418545Z #pragma GCC ivdep 2023-01-11T21:38:06.0418640Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0418709Z { 2023-01-11T21:38:06.0418804Z for(long i3=0; i3<2; i3+=1) 2023-01-11T21:38:06.0418867Z { 2023-01-11T21:38:06.0419091Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i3) + (16*((46 + i2) % 56)) + (896*((3 + i1) % 56)) + (50176*i0)); 2023-01-11T21:38:06.0419214Z tmp0.store(out_ptr0 + (8*i3) + (16*i2) + (896*i1) + (50176*i0)); 2023-01-11T21:38:06.0419287Z } 2023-01-11T21:38:06.0419395Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0419491Z for(long i3=16; i3<16; i3+=1) 2023-01-11T21:38:06.0419562Z { 2023-01-11T21:38:06.0419700Z auto tmp0 = in_ptr0[i3 + (16*((46 + i2) % 56)) + (896*((3 + i1) % 56)) + (50176*i0)]; 2023-01-11T21:38:06.0419808Z out_ptr0[i3 + (16*i2) + (896*i1) + (50176*i0)] = tmp0; 2023-01-11T21:38:06.0419880Z } 2023-01-11T21:38:06.0419948Z } 2023-01-11T21:38:06.0420014Z } 2023-01-11T21:38:06.0420080Z } 2023-01-11T21:38:06.0420174Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0420264Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0420324Z { 2023-01-11T21:38:06.0420415Z for(long i1=0; i1<56; i1+=1) 2023-01-11T21:38:06.0420480Z { 2023-01-11T21:38:06.0420567Z #pragma GCC ivdep 2023-01-11T21:38:06.0420660Z for(long i2=0; i2<56; i2+=1) 2023-01-11T21:38:06.0420731Z { 2023-01-11T21:38:06.0420820Z #pragma GCC 
ivdep 2023-01-11T21:38:06.0420908Z for(long i3=0; i3<16; i3+=1) 2023-01-11T21:38:06.0420977Z { 2023-01-11T21:38:06.0421052Z { 2023-01-11T21:38:06.0421126Z { 2023-01-11T21:38:06.0421260Z auto tmp0 = in_ptr0[(100347 + i3 + (16*i2) + (896*i1) + (50176*i0)) % 100352]; 2023-01-11T21:38:06.0421382Z out_ptr1[i3 + (16*i2) + (896*i1) + (50176*i0)] = tmp0; 2023-01-11T21:38:06.0421456Z } 2023-01-11T21:38:06.0421523Z } 2023-01-11T21:38:06.0421593Z } 2023-01-11T21:38:06.0421660Z } 2023-01-11T21:38:06.0421725Z } 2023-01-11T21:38:06.0421789Z } 2023-01-11T21:38:06.0421854Z } 2023-01-11T21:38:06.0421916Z } 2023-01-11T21:38:06.0421996Z ''') 2023-01-11T21:38:06.0422002Z 2023-01-11T21:38:06.0422034Z 2023-01-11T21:38:06.0422128Z async_compile.wait(globals()) 2023-01-11T21:38:06.0422204Z del async_compile 2023-01-11T21:38:06.0422209Z 2023-01-11T21:38:06.0422282Z def call(args): 2023-01-11T21:38:06.0422357Z arg0_1, = args 2023-01-11T21:38:06.0422438Z args.clear() 2023-01-11T21:38:06.0422660Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0422873Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0423040Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0423116Z del arg0_1 2023-01-11T21:38:06.0423200Z return (buf0, buf1, ) 2023-01-11T21:38:06.0423205Z 2023-01-11T21:38:06.0423209Z 2023-01-11T21:38:06.0423289Z if __name__ == "__main__": 2023-01-11T21:38:06.0423405Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0423534Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0423755Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0423861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0423872Z 2023-01-11T21:38:06.0423936Z ok (3.098s) 2023-01-11T21:38:06.0424395Z test_round_correctness_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0424557Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0424810Z [2023-01-11 21:30:39,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 230 2023-01-11T21:38:06.0425076Z [2023-01-11 21:30:41,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 230 2023-01-11T21:38:06.0425082Z 2023-01-11T21:38:06.0425181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0425255Z import torch 2023-01-11T21:38:06.0425330Z import random 2023-01-11T21:38:06.0425443Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0425569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0425574Z 2023-01-11T21:38:06.0425655Z aten = torch.ops.aten 2023-01-11T21:38:06.0425789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0425890Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0425895Z 2023-01-11T21:38:06.0425967Z import triton 2023-01-11T21:38:06.0426056Z import triton.language as tl 2023-01-11T21:38:06.0426183Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0426315Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0426324Z 2023-01-11T21:38:06.0426337Z 2023-01-11T21:38:06.0426465Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0426670Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0426797Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.0426902Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.0426966Z { 2023-01-11T21:38:06.0427067Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0427133Z { 2023-01-11T21:38:06.0427208Z #pragma omp for 2023-01-11T21:38:06.0427293Z for(long i0=0; i0<200; i0+=1) 2023-01-11T21:38:06.0427363Z { 2023-01-11T21:38:06.0427432Z { 2023-01-11T21:38:06.0427500Z { 2023-01-11T21:38:06.0427598Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0427711Z auto tmp1 = std::nearbyint(tmp0); 2023-01-11T21:38:06.0427795Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0427892Z } 2023-01-11T21:38:06.0427960Z } 2023-01-11T21:38:06.0428028Z } 2023-01-11T21:38:06.0428094Z } 2023-01-11T21:38:06.0428156Z } 2023-01-11T21:38:06.0428233Z ''') 2023-01-11T21:38:06.0428239Z 2023-01-11T21:38:06.0428253Z 2023-01-11T21:38:06.0428339Z async_compile.wait(globals()) 2023-01-11T21:38:06.0428416Z del async_compile 2023-01-11T21:38:06.0428421Z 2023-01-11T21:38:06.0428493Z def call(args): 2023-01-11T21:38:06.0428568Z arg0_1, = args 2023-01-11T21:38:06.0428646Z args.clear() 2023-01-11T21:38:06.0428839Z buf0 = empty_strided((200, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.0428980Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0429045Z del arg0_1 2023-01-11T21:38:06.0429120Z return (buf0, ) 2023-01-11T21:38:06.0429125Z 2023-01-11T21:38:06.0429129Z 2023-01-11T21:38:06.0429209Z if __name__ == "__main__": 2023-01-11T21:38:06.0429331Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0429457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0429652Z arg0_1 = rand_strided((200, ), (1, ), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.0429764Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0429769Z 2023-01-11T21:38:06.0429840Z ok 
(2.382s) 2023-01-11T21:38:06.0430272Z test_round_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0430478Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0430736Z [2023-01-11 21:30:41,807] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 231 2023-01-11T21:38:06.0431000Z [2023-01-11 21:30:44,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 231 2023-01-11T21:38:06.0431005Z 2023-01-11T21:38:06.0431103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0431176Z import torch 2023-01-11T21:38:06.0431251Z import random 2023-01-11T21:38:06.0431372Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0431492Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0431498Z 2023-01-11T21:38:06.0431573Z aten = torch.ops.aten 2023-01-11T21:38:06.0431708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0431807Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0431812Z 2023-01-11T21:38:06.0431890Z import triton 2023-01-11T21:38:06.0431983Z import triton.language as tl 2023-01-11T21:38:06.0432106Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0432244Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0432253Z 2023-01-11T21:38:06.0432257Z 2023-01-11T21:38:06.0432394Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0432589Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0432712Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0432819Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0432920Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0433020Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0433121Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0433188Z { 2023-01-11T21:38:06.0433283Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0433349Z { 2023-01-11T21:38:06.0433431Z #pragma omp for 2023-01-11T21:38:06.0433518Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0433585Z { 2023-01-11T21:38:06.0433753Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0433846Z auto tmp1 = tmp0.round(); 2023-01-11T21:38:06.0433981Z auto tmp2 = at::vec::Vectorized(static_cast(100.0)); 2023-01-11T21:38:06.0434069Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:06.0434160Z auto tmp4 = tmp3.round(); 2023-01-11T21:38:06.0434299Z auto tmp5 = at::vec::Vectorized(static_cast(0.01)); 2023-01-11T21:38:06.0434389Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.0434487Z tmp1.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0434581Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0434654Z } 2023-01-11T21:38:06.0434749Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0434838Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0434919Z { 2023-01-11T21:38:06.0435015Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0435144Z auto tmp1 = std::nearbyint(tmp0); 2023-01-11T21:38:06.0435254Z auto tmp2 = static_cast(100.0); 2023-01-11T21:38:06.0435343Z auto tmp3 = tmp0 * tmp2; 
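// Annotation (not part of the captured log): tmp2..tmp6 in this tail implement
// round-to-decimals by scale/round/unscale -- nearbyint(x * 100) * 0.01, i.e.
// torch.round(x, decimals=2) -- while tmp1 is the plain nearbyint round of x.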
2023-01-11T21:38:06.0435441Z auto tmp4 = std::nearbyint(tmp3); 2023-01-11T21:38:06.0435547Z auto tmp5 = static_cast(0.01); 2023-01-11T21:38:06.0435636Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.0435724Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0435808Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.0435874Z } 2023-01-11T21:38:06.0435947Z #pragma omp for 2023-01-11T21:38:06.0436035Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0436136Z { 2023-01-11T21:38:06.0436274Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.0436411Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0436501Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0436595Z auto tmp3 = tmp2.round(); 2023-01-11T21:38:06.0436690Z tmp3.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0436750Z } 2023-01-11T21:38:06.0436849Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0436935Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0437002Z { 2023-01-11T21:38:06.0437090Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.0437191Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0437283Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0437382Z auto tmp3 = std::nearbyint(tmp2); 2023-01-11T21:38:06.0437468Z out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.0437538Z } 2023-01-11T21:38:06.0437604Z } 2023-01-11T21:38:06.0437668Z } 2023-01-11T21:38:06.0437756Z ''') 2023-01-11T21:38:06.0437761Z 2023-01-11T21:38:06.0437766Z 2023-01-11T21:38:06.0437859Z async_compile.wait(globals()) 2023-01-11T21:38:06.0437932Z del async_compile 2023-01-11T21:38:06.0437938Z 2023-01-11T21:38:06.0438012Z def call(args): 2023-01-11T21:38:06.0438092Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0438166Z args.clear() 2023-01-11T21:38:06.0438361Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0438582Z buf2 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0438795Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0439031Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0439105Z del arg0_1 2023-01-11T21:38:06.0439181Z del arg1_1 2023-01-11T21:38:06.0439272Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0439277Z 2023-01-11T21:38:06.0439281Z 2023-01-11T21:38:06.0439362Z if __name__ == "__main__": 2023-01-11T21:38:06.0439490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0439627Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0439879Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0440068Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0440192Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0440197Z 2023-01-11T21:38:06.0440266Z ok (3.025s) 2023-01-11T21:38:06.0440711Z test_rsqrt_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0440844Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0441101Z [2023-01-11 21:30:44,819] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 232
2023-01-11T21:38:06.0441366Z [2023-01-11 21:30:47,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 232
2023-01-11T21:38:06.0441372Z
2023-01-11T21:38:06.0441472Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0441547Z import torch
2023-01-11T21:38:06.0441615Z import random
2023-01-11T21:38:06.0441733Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0441856Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0441861Z
2023-01-11T21:38:06.0441942Z aten = torch.ops.aten
2023-01-11T21:38:06.0442077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0442200Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0442205Z
2023-01-11T21:38:06.0442279Z import triton
2023-01-11T21:38:06.0442369Z import triton.language as tl
2023-01-11T21:38:06.0442486Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0442626Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0442632Z
2023-01-11T21:38:06.0442636Z
2023-01-11T21:38:06.0442771Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0442975Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0443099Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0443203Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0443304Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0443370Z {
2023-01-11T21:38:06.0443466Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0443531Z {
2023-01-11T21:38:06.0443617Z #pragma omp for
2023-01-11T21:38:06.0443702Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0443773Z {
2023-01-11T21:38:06.0443910Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0443999Z auto tmp1 = tmp0.rsqrt();
2023-01-11T21:38:06.0444133Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0444224Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0444312Z auto tmp4 = tmp3.rsqrt();
2023-01-11T21:38:06.0444446Z auto tmp5 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0444575Z auto tmp6 = tmp4 - tmp5;
2023-01-11T21:38:06.0444670Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0444763Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0444823Z }
2023-01-11T21:38:06.0444921Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0445013Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0445084Z {
2023-01-11T21:38:06.0445174Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0445275Z auto tmp1 = 1 / std::sqrt(tmp0);
2023-01-11T21:38:06.0445380Z auto tmp2 = static_cast<float>(1);
2023-01-11T21:38:06.0445462Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0445590Z auto tmp4 = 1 / std::sqrt(tmp3);
2023-01-11T21:38:06.0445693Z auto tmp5 = static_cast<float>(2);
2023-01-11T21:38:06.0445819Z auto tmp6 = tmp4 - tmp5;
2023-01-11T21:38:06.0445902Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0445986Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0446053Z }
2023-01-11T21:38:06.0446112Z }
2023-01-11T21:38:06.0446175Z }
2023-01-11T21:38:06.0446261Z ''')
2023-01-11T21:38:06.0446266Z
2023-01-11T21:38:06.0446272Z
2023-01-11T21:38:06.0446366Z
async_compile.wait(globals()) 2023-01-11T21:38:06.0446443Z del async_compile 2023-01-11T21:38:06.0446448Z 2023-01-11T21:38:06.0446524Z def call(args): 2023-01-11T21:38:06.0446598Z arg0_1, = args 2023-01-11T21:38:06.0446666Z args.clear() 2023-01-11T21:38:06.0446860Z buf0 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0447050Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0447220Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0447295Z del arg0_1 2023-01-11T21:38:06.0447376Z return (buf0, buf1, ) 2023-01-11T21:38:06.0447381Z 2023-01-11T21:38:06.0447386Z 2023-01-11T21:38:06.0447466Z if __name__ == "__main__": 2023-01-11T21:38:06.0447581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0447703Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0447896Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0448007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0448040Z 2023-01-11T21:38:06.0448111Z ok (3.180s) 2023-01-11T21:38:06.0448561Z test_scatter1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0448695Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0448953Z [2023-01-11 21:30:47,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 233 2023-01-11T21:38:06.0449213Z [2023-01-11 21:30:50,987] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 233 2023-01-11T21:38:06.0449219Z 2023-01-11T21:38:06.0449317Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0449392Z import torch 2023-01-11T21:38:06.0449463Z import random 2023-01-11T21:38:06.0449582Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0449703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0449709Z 2023-01-11T21:38:06.0449790Z aten = torch.ops.aten 2023-01-11T21:38:06.0449929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0450024Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0450029Z 2023-01-11T21:38:06.0450103Z import triton 2023-01-11T21:38:06.0450188Z import triton.language as tl 2023-01-11T21:38:06.0450311Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0450450Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0450455Z 2023-01-11T21:38:06.0450459Z 2023-01-11T21:38:06.0450596Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0450800Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0450924Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0451036Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0451145Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0451242Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0451308Z { 2023-01-11T21:38:06.0451420Z #pragma GCC ivdep 2023-01-11T21:38:06.0451509Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0451575Z { 
2023-01-11T21:38:06.0451641Z { 2023-01-11T21:38:06.0451712Z { 2023-01-11T21:38:06.0451799Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0451886Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0451952Z } 2023-01-11T21:38:06.0452017Z } 2023-01-11T21:38:06.0452081Z } 2023-01-11T21:38:06.0452148Z { 2023-01-11T21:38:06.0452207Z { 2023-01-11T21:38:06.0452296Z auto tmp0 = in_ptr1[0]; 2023-01-11T21:38:06.0452383Z auto tmp1 = in_ptr2[0]; 2023-01-11T21:38:06.0452472Z out_ptr0[tmp0] = tmp1; 2023-01-11T21:38:06.0452539Z } 2023-01-11T21:38:06.0452603Z } 2023-01-11T21:38:06.0452664Z } 2023-01-11T21:38:06.0452742Z ''') 2023-01-11T21:38:06.0452747Z 2023-01-11T21:38:06.0452751Z 2023-01-11T21:38:06.0452842Z async_compile.wait(globals()) 2023-01-11T21:38:06.0452919Z del async_compile 2023-01-11T21:38:06.0452924Z 2023-01-11T21:38:06.0452998Z def call(args): 2023-01-11T21:38:06.0453084Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0453158Z args.clear() 2023-01-11T21:38:06.0453352Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0453542Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0453608Z del arg0_1 2023-01-11T21:38:06.0453680Z del arg1_1 2023-01-11T21:38:06.0453749Z del arg2_1 2023-01-11T21:38:06.0453822Z return (buf0, ) 2023-01-11T21:38:06.0453856Z 2023-01-11T21:38:06.0453861Z 2023-01-11T21:38:06.0453942Z if __name__ == "__main__": 2023-01-11T21:38:06.0454062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0454187Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0454377Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0454680Z arg1_1 = rand_strided((1, 1), (1, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0454872Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0455004Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0455009Z 2023-01-11T21:38:06.0455080Z ok (3.029s) 2023-01-11T21:38:06.0455534Z test_scatter2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0455668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0455927Z [2023-01-11 21:30:51,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 234
2023-01-11T21:38:06.0456190Z [2023-01-11 21:30:53,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 234
2023-01-11T21:38:06.0456196Z
2023-01-11T21:38:06.0456291Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0456358Z import torch
2023-01-11T21:38:06.0456436Z import random
2023-01-11T21:38:06.0456555Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0456676Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0456681Z
2023-01-11T21:38:06.0456765Z aten = torch.ops.aten
2023-01-11T21:38:06.0456908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0457007Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0457012Z
2023-01-11T21:38:06.0457079Z import triton
2023-01-11T21:38:06.0457245Z import triton.language as tl
2023-01-11T21:38:06.0457370Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0457559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0457565Z
2023-01-11T21:38:06.0457569Z
2023-01-11T21:38:06.0457707Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0457912Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0458036Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0458146Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0458247Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0458351Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0458416Z {
2023-01-11T21:38:06.0458520Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0458584Z {
2023-01-11T21:38:06.0458667Z #pragma omp for
2023-01-11T21:38:06.0458757Z for(long i0=0; i0<4096; i0+=1)
2023-01-11T21:38:06.0458817Z {
2023-01-11T21:38:06.0458958Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0459061Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0459128Z }
2023-01-11T21:38:06.0459225Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0459319Z for(long i0=32768; i0<32768; i0+=1)
2023-01-11T21:38:06.0459387Z {
2023-01-11T21:38:06.0459469Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0459552Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0459619Z }
2023-01-11T21:38:06.0459700Z #pragma omp for
2023-01-11T21:38:06.0459787Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0459849Z {
2023-01-11T21:38:06.0459934Z #pragma GCC ivdep
2023-01-11T21:38:06.0460066Z for(long i1=0; i1<512; i1+=1)
2023-01-11T21:38:06.0460133Z {
2023-01-11T21:38:06.0460203Z {
2023-01-11T21:38:06.0460277Z {
2023-01-11T21:38:06.0460384Z auto tmp0 = in_ptr1[i1 + (512*i0)];
2023-01-11T21:38:06.0460497Z auto tmp1 = in_ptr2[i1 + (512*i0)];
2023-01-11T21:38:06.0460616Z atomic_add(&out_ptr0[i1 + (512*tmp0)], tmp1);
2023-01-11T21:38:06.0460680Z }
2023-01-11T21:38:06.0460747Z }
2023-01-11T21:38:06.0460812Z }
2023-01-11T21:38:06.0460878Z }
2023-01-11T21:38:06.0460944Z }
2023-01-11T21:38:06.0461009Z }
2023-01-11T21:38:06.0461088Z ''')
2023-01-11T21:38:06.0461093Z
2023-01-11T21:38:06.0461106Z
2023-01-11T21:38:06.0461193Z async_compile.wait(globals())
2023-01-11T21:38:06.0461269Z del async_compile
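The scatter2 kernel above has the structure inductor uses for scatter with an add reduction: a vectorized loop copies the self tensor into the output buffer, then a second loop applies atomic_add at the scattered locations along dim 0. A small eager-mode sketch of the same semantics, with shapes taken from the wrapper and hypothetical random values (not the test's source):

import torch

x = torch.randn(64, 512)
idx = torch.randint(0, 64, (64, 512))
src = torch.randn(64, 512)

out = x.clone()                # the vectorized copy loop
out.scatter_add_(0, idx, src)  # the atomic_add loop, accumulating along dim 0
assert torch.equal(out, torch.scatter_add(x, 0, idx, src))
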
2023-01-11T21:38:06.0461274Z 2023-01-11T21:38:06.0461351Z def call(args): 2023-01-11T21:38:06.0461442Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0461515Z args.clear() 2023-01-11T21:38:06.0461718Z buf0 = empty_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0461914Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0461981Z del arg0_1 2023-01-11T21:38:06.0462053Z del arg1_1 2023-01-11T21:38:06.0462123Z del arg2_1 2023-01-11T21:38:06.0462199Z return (buf0, ) 2023-01-11T21:38:06.0462204Z 2023-01-11T21:38:06.0462209Z 2023-01-11T21:38:06.0462289Z if __name__ == "__main__": 2023-01-11T21:38:06.0462412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0462538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0462743Z arg0_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0462935Z arg1_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0463140Z arg2_1 = rand_strided((64, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0463269Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0463274Z 2023-01-11T21:38:06.0463346Z ok (2.672s) 2023-01-11T21:38:06.0463823Z test_scatter3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0463954Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0464213Z [2023-01-11 21:30:53,691] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 235 2023-01-11T21:38:06.0464476Z [2023-01-11 21:30:56,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 235 2023-01-11T21:38:06.0464484Z 2023-01-11T21:38:06.0464585Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0464652Z import torch 2023-01-11T21:38:06.0464727Z import random 2023-01-11T21:38:06.0464844Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0464971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0464976Z 2023-01-11T21:38:06.0465055Z aten = torch.ops.aten 2023-01-11T21:38:06.0465189Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0465288Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0465293Z 2023-01-11T21:38:06.0465367Z import triton 2023-01-11T21:38:06.0465452Z import triton.language as tl 2023-01-11T21:38:06.0465576Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0465716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0465721Z 2023-01-11T21:38:06.0465769Z 2023-01-11T21:38:06.0465907Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0466111Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0466234Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0466346Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0466449Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0466508Z { 2023-01-11T21:38:06.0466608Z 
#pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0466673Z {
2023-01-11T21:38:06.0466754Z #pragma omp for
2023-01-11T21:38:06.0466841Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0466908Z {
2023-01-11T21:38:06.0467049Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0467138Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0467202Z }
2023-01-11T21:38:06.0467303Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0467398Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0467464Z {
2023-01-11T21:38:06.0467553Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0467632Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0467700Z }
2023-01-11T21:38:06.0467781Z #pragma omp single
2023-01-11T21:38:06.0467851Z {
2023-01-11T21:38:06.0467935Z #pragma GCC ivdep
2023-01-11T21:38:06.0468023Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0468091Z {
2023-01-11T21:38:06.0468153Z {
2023-01-11T21:38:06.0468226Z {
2023-01-11T21:38:06.0468325Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0468439Z auto tmp1 = static_cast<float>(0.8);
2023-01-11T21:38:06.0468552Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0468621Z }
2023-01-11T21:38:06.0468686Z }
2023-01-11T21:38:06.0468750Z }
2023-01-11T21:38:06.0468815Z }
2023-01-11T21:38:06.0468880Z }
2023-01-11T21:38:06.0468946Z }
2023-01-11T21:38:06.0469031Z ''')
2023-01-11T21:38:06.0469036Z
2023-01-11T21:38:06.0469041Z
2023-01-11T21:38:06.0469133Z async_compile.wait(globals())
2023-01-11T21:38:06.0469208Z del async_compile
2023-01-11T21:38:06.0469213Z
2023-01-11T21:38:06.0469310Z def call(args):
2023-01-11T21:38:06.0469393Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0469468Z args.clear()
2023-01-11T21:38:06.0469676Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0469847Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0469918Z del arg0_1
2023-01-11T21:38:06.0469990Z del arg1_1
2023-01-11T21:38:06.0470058Z return (buf0, )
2023-01-11T21:38:06.0470072Z
2023-01-11T21:38:06.0470077Z
2023-01-11T21:38:06.0470151Z if __name__ == "__main__":
2023-01-11T21:38:06.0470271Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0470397Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0470606Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0470806Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0470925Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0470931Z
2023-01-11T21:38:06.0470998Z ok (2.404s)
2023-01-11T21:38:06.0471445Z test_scatter4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0471576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0471855Z [2023-01-11 21:30:56,112] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 236
2023-01-11T21:38:06.0472118Z [2023-01-11 21:30:58,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 236
2023-01-11T21:38:06.0472123Z
2023-01-11T21:38:06.0472231Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0472304Z import torch
2023-01-11T21:38:06.0472379Z import random
2023-01-11T21:38:06.0472497Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0472622Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0472627Z
2023-01-11T21:38:06.0472709Z aten = torch.ops.aten
2023-01-11T21:38:06.0472837Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0472930Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0472936Z
2023-01-11T21:38:06.0473009Z import triton
2023-01-11T21:38:06.0473103Z import triton.language as tl
2023-01-11T21:38:06.0473229Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0473368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0473374Z
2023-01-11T21:38:06.0473378Z
2023-01-11T21:38:06.0473515Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0473722Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0473837Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0473947Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0474058Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0474166Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0474231Z {
2023-01-11T21:38:06.0474335Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0474398Z {
2023-01-11T21:38:06.0474472Z #pragma omp for
2023-01-11T21:38:06.0474559Z for(long i0=0; i0<24304; i0+=1)
2023-01-11T21:38:06.0474633Z {
2023-01-11T21:38:06.0474772Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0474869Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0474936Z }
2023-01-11T21:38:06.0475033Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0475152Z for(long i0=194432; i0<194432; i0+=1)
2023-01-11T21:38:06.0475219Z {
2023-01-11T21:38:06.0475309Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0475394Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0475460Z }
2023-01-11T21:38:06.0475538Z #pragma omp for
2023-01-11T21:38:06.0475618Z for(long i0=0; i0<992; i0+=1)
2023-01-11T21:38:06.0475685Z {
2023-01-11T21:38:06.0475754Z {
2023-01-11T21:38:06.0475822Z {
2023-01-11T21:38:06.0475921Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0476016Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0476127Z out_ptr0[i0 + (992*tmp0)] = tmp1;
2023-01-11T21:38:06.0476189Z }
2023-01-11T21:38:06.0476257Z }
2023-01-11T21:38:06.0476324Z }
2023-01-11T21:38:06.0476387Z }
2023-01-11T21:38:06.0476450Z }
2023-01-11T21:38:06.0476533Z ''')
2023-01-11T21:38:06.0476539Z
2023-01-11T21:38:06.0476544Z
2023-01-11T21:38:06.0476635Z async_compile.wait(globals())
2023-01-11T21:38:06.0476705Z del async_compile
2023-01-11T21:38:06.0476717Z
2023-01-11T21:38:06.0476784Z def call(args):
2023-01-11T21:38:06.0476869Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0476946Z args.clear()
2023-01-11T21:38:06.0477151Z buf0 =
empty_strided((196, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0477343Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0477414Z del arg0_1 2023-01-11T21:38:06.0477486Z del arg1_1 2023-01-11T21:38:06.0477580Z del arg2_1 2023-01-11T21:38:06.0477656Z return (buf0, ) 2023-01-11T21:38:06.0477662Z 2023-01-11T21:38:06.0477666Z 2023-01-11T21:38:06.0477747Z if __name__ == "__main__": 2023-01-11T21:38:06.0477868Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0477992Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0478200Z arg0_1 = rand_strided((196, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0478398Z arg1_1 = rand_strided((1, 992), (992, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0478591Z arg2_1 = rand_strided((1, 992), (992, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0478717Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0478722Z 2023-01-11T21:38:06.0478790Z ok (2.339s) 2023-01-11T21:38:06.0478947Z test_scatter_add1_cpu (__main__.CpuTests) ... skip: Flaky test, needs debugging (0.001s) 2023-01-11T21:38:06.0479399Z test_scatter_add2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0479535Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0479792Z [2023-01-11 21:30:58,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 237 2023-01-11T21:38:06.0480051Z [2023-01-11 21:31:00,297] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 237 2023-01-11T21:38:06.0480057Z 2023-01-11T21:38:06.0480154Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0480229Z import torch 2023-01-11T21:38:06.0480297Z import random 2023-01-11T21:38:06.0480414Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0480539Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0480544Z 2023-01-11T21:38:06.0480625Z aten = torch.ops.aten 2023-01-11T21:38:06.0480762Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0480857Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0480862Z 2023-01-11T21:38:06.0480962Z import triton 2023-01-11T21:38:06.0481048Z import triton.language as tl 2023-01-11T21:38:06.0481173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0481310Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0481316Z 2023-01-11T21:38:06.0481320Z 2023-01-11T21:38:06.0481461Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0481664Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0481786Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0481896Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0482005Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0482109Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0482168Z { 2023-01-11T21:38:06.0482249Z #pragma GCC ivdep 
2023-01-11T21:38:06.0482332Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0482398Z { 2023-01-11T21:38:06.0482469Z { 2023-01-11T21:38:06.0482536Z { 2023-01-11T21:38:06.0482621Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0482710Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0482778Z } 2023-01-11T21:38:06.0482842Z } 2023-01-11T21:38:06.0482906Z } 2023-01-11T21:38:06.0482985Z #pragma GCC ivdep 2023-01-11T21:38:06.0483071Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0483130Z { 2023-01-11T21:38:06.0483216Z #pragma GCC ivdep 2023-01-11T21:38:06.0483305Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0483371Z { 2023-01-11T21:38:06.0483469Z { 2023-01-11T21:38:06.0483545Z { 2023-01-11T21:38:06.0483642Z auto tmp0 = in_ptr1[i1 + (3*i0)]; 2023-01-11T21:38:06.0483747Z auto tmp1 = in_ptr2[i1 + (3*i0)]; 2023-01-11T21:38:06.0483863Z atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1); 2023-01-11T21:38:06.0483936Z } 2023-01-11T21:38:06.0484002Z } 2023-01-11T21:38:06.0484067Z } 2023-01-11T21:38:06.0484131Z } 2023-01-11T21:38:06.0484188Z } 2023-01-11T21:38:06.0484276Z ''') 2023-01-11T21:38:06.0484281Z 2023-01-11T21:38:06.0484286Z 2023-01-11T21:38:06.0484376Z async_compile.wait(globals()) 2023-01-11T21:38:06.0484456Z del async_compile 2023-01-11T21:38:06.0484462Z 2023-01-11T21:38:06.0484536Z def call(args): 2023-01-11T21:38:06.0484627Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.0484703Z args.clear() 2023-01-11T21:38:06.0484892Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0485087Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0485160Z del arg0_1 2023-01-11T21:38:06.0485233Z del arg1_1 2023-01-11T21:38:06.0485303Z del arg2_1 2023-01-11T21:38:06.0485377Z return (buf0, ) 2023-01-11T21:38:06.0485383Z 2023-01-11T21:38:06.0485394Z 2023-01-11T21:38:06.0485473Z if __name__ == "__main__": 2023-01-11T21:38:06.0485591Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0485711Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0485908Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0486099Z arg1_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0486288Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0486416Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0486423Z 2023-01-11T21:38:06.0486494Z ok (1.891s) 2023-01-11T21:38:06.0486975Z test_scatter_add3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
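The scatter_add2 kernel above (the two #pragma GCC ivdep loops) is a direct scalar translation of scatter_add on a (2, 3) tensor: after copying self into the output, each (i0, i1) pair does out[idx[i0, i1], i1] += src[i0, i1]. A hedged eager-mode equivalent with hypothetical values (not the test's source):

import torch

x = torch.randn(2, 3)
idx = torch.randint(0, 2, (2, 3))
src = torch.randn(2, 3)

out = x.clone()
for i0 in range(2):
    for i1 in range(3):
        # mirrors atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1) in the kernel
        out[idx[i0, i1], i1] += src[i0, i1]
assert torch.allclose(out, torch.scatter_add(x, 0, idx, src))
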
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0487109Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0487364Z [2023-01-11 21:31:00,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 238
2023-01-11T21:38:06.0487625Z [2023-01-11 21:31:02,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 238
2023-01-11T21:38:06.0487631Z
2023-01-11T21:38:06.0487722Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0487796Z import torch
2023-01-11T21:38:06.0487873Z import random
2023-01-11T21:38:06.0487993Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0488116Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0488121Z
2023-01-11T21:38:06.0488203Z aten = torch.ops.aten
2023-01-11T21:38:06.0488343Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0488431Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0488441Z
2023-01-11T21:38:06.0488509Z import triton
2023-01-11T21:38:06.0488602Z import triton.language as tl
2023-01-11T21:38:06.0488725Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0488865Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0488871Z
2023-01-11T21:38:06.0488875Z
2023-01-11T21:38:06.0489014Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0489217Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0489342Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0489474Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0489584Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0489686Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0489752Z {
2023-01-11T21:38:06.0489861Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0489928Z {
2023-01-11T21:38:06.0490014Z #pragma omp for
2023-01-11T21:38:06.0490095Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0490165Z {
2023-01-11T21:38:06.0490306Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0490404Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0490473Z }
2023-01-11T21:38:06.0490572Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0490665Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0490725Z {
2023-01-11T21:38:06.0490817Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0490901Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0490971Z }
2023-01-11T21:38:06.0491054Z #pragma omp single
2023-01-11T21:38:06.0491121Z {
2023-01-11T21:38:06.0491210Z #pragma GCC ivdep
2023-01-11T21:38:06.0491290Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0491359Z {
2023-01-11T21:38:06.0491429Z {
2023-01-11T21:38:06.0491501Z {
2023-01-11T21:38:06.0491601Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0491700Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0491810Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0491874Z }
2023-01-11T21:38:06.0491942Z }
2023-01-11T21:38:06.0492012Z }
2023-01-11T21:38:06.0492078Z }
2023-01-11T21:38:06.0492145Z }
2023-01-11T21:38:06.0492209Z }
2023-01-11T21:38:06.0492286Z ''')
2023-01-11T21:38:06.0492299Z
2023-01-11T21:38:06.0492312Z
2023-01-11T21:38:06.0492399Z async_compile.wait(globals())
2023-01-11T21:38:06.0492473Z del async_compile
2023-01-11T21:38:06.0492479Z
2023-01-11T21:38:06.0492554Z def call(args):
2023-01-11T21:38:06.0492640Z arg0_1,
arg1_1, arg2_1 = args 2023-01-11T21:38:06.0492715Z args.clear() 2023-01-11T21:38:06.0492952Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0493145Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0493212Z del arg0_1 2023-01-11T21:38:06.0493284Z del arg1_1 2023-01-11T21:38:06.0493354Z del arg2_1 2023-01-11T21:38:06.0493427Z return (buf0, ) 2023-01-11T21:38:06.0493432Z 2023-01-11T21:38:06.0493437Z 2023-01-11T21:38:06.0493518Z if __name__ == "__main__": 2023-01-11T21:38:06.0493634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0493764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0493972Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0494166Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0494369Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0494603Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0494608Z 2023-01-11T21:38:06.0494679Z ok (2.436s) 2023-01-11T21:38:06.0495186Z test_scatter_reduce1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0495361Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0495575Z [W TensorAdvancedIndexing.cpp:1739] Warning: scatter_reduce() is in beta and the API may change at any time. 
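The beta warning above comes from scatter_reduce, which generalizes scatter_add to other reductions; the reduce="sum" path is what the generated kernel below lowers to a copy plus atomic_add. A minimal sketch of the API on a 1-D toy example (hypothetical values, not the test's shapes):

import torch

t = torch.zeros(13)
idx = torch.tensor([2, 2, 5, 11])
src = torch.tensor([1.0, 2.0, 3.0, 4.0])
# reduce="sum" accumulates duplicates: position 2 receives 1.0 + 2.0
out = t.scatter_reduce(0, idx, src, reduce="sum")
assert out[2] == 3.0 and out[5] == 3.0 and out[11] == 4.0
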
(function operator())
2023-01-11T21:38:06.0495838Z [2023-01-11 21:31:02,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 239
2023-01-11T21:38:06.0496098Z [2023-01-11 21:31:02,766] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 239
2023-01-11T21:38:06.0496105Z
2023-01-11T21:38:06.0496204Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0496272Z import torch
2023-01-11T21:38:06.0496345Z import random
2023-01-11T21:38:06.0496465Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0496588Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0496593Z
2023-01-11T21:38:06.0496673Z aten = torch.ops.aten
2023-01-11T21:38:06.0496807Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0496907Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0496912Z
2023-01-11T21:38:06.0496978Z import triton
2023-01-11T21:38:06.0497070Z import triton.language as tl
2023-01-11T21:38:06.0497252Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0497395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0497400Z
2023-01-11T21:38:06.0497405Z
2023-01-11T21:38:06.0497543Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0497750Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0497871Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0497980Z const long* __restrict__ in_ptr1,
2023-01-11T21:38:06.0498081Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0498184Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0498248Z {
2023-01-11T21:38:06.0498352Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0498419Z {
2023-01-11T21:38:06.0498501Z #pragma omp for
2023-01-11T21:38:06.0498586Z for(long i0=0; i0<235; i0+=1)
2023-01-11T21:38:06.0498646Z {
2023-01-11T21:38:06.0498786Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0498923Z tmp0.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0498993Z }
2023-01-11T21:38:06.0499091Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0499179Z for(long i0=1880; i0<1885; i0+=1)
2023-01-11T21:38:06.0499244Z {
2023-01-11T21:38:06.0499326Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0499412Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0499479Z }
2023-01-11T21:38:06.0499561Z #pragma omp single
2023-01-11T21:38:06.0499629Z {
2023-01-11T21:38:06.0499715Z #pragma GCC ivdep
2023-01-11T21:38:06.0499803Z for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.0499868Z {
2023-01-11T21:38:06.0499936Z {
2023-01-11T21:38:06.0500007Z {
2023-01-11T21:38:06.0500105Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0500204Z auto tmp1 = in_ptr2[i0];
2023-01-11T21:38:06.0500316Z atomic_add(&out_ptr0[tmp0], tmp1);
2023-01-11T21:38:06.0500380Z }
2023-01-11T21:38:06.0500446Z }
2023-01-11T21:38:06.0500512Z }
2023-01-11T21:38:06.0500575Z }
2023-01-11T21:38:06.0500640Z }
2023-01-11T21:38:06.0500702Z }
2023-01-11T21:38:06.0500787Z ''')
2023-01-11T21:38:06.0500793Z
2023-01-11T21:38:06.0500797Z
2023-01-11T21:38:06.0500884Z async_compile.wait(globals())
2023-01-11T21:38:06.0500961Z del async_compile
2023-01-11T21:38:06.0500966Z
2023-01-11T21:38:06.0501040Z def call(args):
2023-01-11T21:38:06.0501127Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0501204Z args.clear()
2023-01-11T21:38:06.0501450Z buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cpu',
dtype=torch.float32) 2023-01-11T21:38:06.0501640Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0501711Z del arg0_1 2023-01-11T21:38:06.0501775Z del arg1_1 2023-01-11T21:38:06.0501848Z del arg2_1 2023-01-11T21:38:06.0501926Z return (buf0, ) 2023-01-11T21:38:06.0501931Z 2023-01-11T21:38:06.0501935Z 2023-01-11T21:38:06.0502013Z if __name__ == "__main__": 2023-01-11T21:38:06.0502131Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0502256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0502466Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0502659Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.0502865Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0502992Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0502998Z 2023-01-11T21:38:06.0503070Z ok (0.032s) 2023-01-11T21:38:06.0503529Z test_scatter_reduce2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0503660Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0503915Z [2023-01-11 21:31:02,786] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 240 2023-01-11T21:38:06.0504177Z [2023-01-11 21:31:04,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 240 2023-01-11T21:38:06.0504186Z 2023-01-11T21:38:06.0504285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0504359Z import torch 2023-01-11T21:38:06.0504427Z import random 2023-01-11T21:38:06.0504546Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0504669Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0504703Z 2023-01-11T21:38:06.0504786Z aten = torch.ops.aten 2023-01-11T21:38:06.0504920Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0505016Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0505021Z 2023-01-11T21:38:06.0505100Z import triton 2023-01-11T21:38:06.0505186Z import triton.language as tl 2023-01-11T21:38:06.0505313Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0505451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0505456Z 2023-01-11T21:38:06.0505461Z 2023-01-11T21:38:06.0505602Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0505811Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0505930Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0506041Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.0506150Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.0506255Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0506313Z { 2023-01-11T21:38:06.0506393Z #pragma GCC ivdep 2023-01-11T21:38:06.0506477Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0506543Z { 2023-01-11T21:38:06.0506609Z { 2023-01-11T21:38:06.0506675Z { 
2023-01-11T21:38:06.0506761Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0506848Z out_ptr0[i0] = tmp0;
2023-01-11T21:38:06.0506913Z }
2023-01-11T21:38:06.0506979Z }
2023-01-11T21:38:06.0507044Z }
2023-01-11T21:38:06.0507122Z #pragma GCC ivdep
2023-01-11T21:38:06.0507237Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0507296Z {
2023-01-11T21:38:06.0507376Z #pragma GCC ivdep
2023-01-11T21:38:06.0507463Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0507530Z {
2023-01-11T21:38:06.0507601Z {
2023-01-11T21:38:06.0507671Z {
2023-01-11T21:38:06.0507772Z auto tmp0 = in_ptr1[i1 + (3*i0)];
2023-01-11T21:38:06.0507879Z auto tmp1 = static_cast<float>(0);
2023-01-11T21:38:06.0507980Z out_ptr0[i1 + (3*tmp0)] = tmp1;
2023-01-11T21:38:06.0508048Z }
2023-01-11T21:38:06.0508112Z }
2023-01-11T21:38:06.0508177Z }
2023-01-11T21:38:06.0508242Z }
2023-01-11T21:38:06.0508314Z #pragma GCC ivdep
2023-01-11T21:38:06.0508395Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0508461Z {
2023-01-11T21:38:06.0508542Z #pragma GCC ivdep
2023-01-11T21:38:06.0508627Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0508695Z {
2023-01-11T21:38:06.0508762Z {
2023-01-11T21:38:06.0508826Z {
2023-01-11T21:38:06.0508932Z auto tmp0 = in_ptr1[i1 + (3*i0)];
2023-01-11T21:38:06.0509033Z auto tmp1 = in_ptr2[i1 + (3*i0)];
2023-01-11T21:38:06.0509150Z atomic_add(&out_ptr0[i1 + (3*tmp0)], tmp1);
2023-01-11T21:38:06.0509219Z }
2023-01-11T21:38:06.0509285Z }
2023-01-11T21:38:06.0509350Z }
2023-01-11T21:38:06.0509409Z }
2023-01-11T21:38:06.0509473Z }
2023-01-11T21:38:06.0509557Z ''')
2023-01-11T21:38:06.0509563Z
2023-01-11T21:38:06.0509568Z
2023-01-11T21:38:06.0509659Z async_compile.wait(globals())
2023-01-11T21:38:06.0509735Z del async_compile
2023-01-11T21:38:06.0509740Z
2023-01-11T21:38:06.0509817Z def call(args):
2023-01-11T21:38:06.0509904Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0509974Z args.clear()
2023-01-11T21:38:06.0510170Z buf0 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0510366Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0510439Z del arg0_1
2023-01-11T21:38:06.0510509Z del arg1_1
2023-01-11T21:38:06.0510580Z del arg2_1
2023-01-11T21:38:06.0510683Z return (buf0, )
2023-01-11T21:38:06.0510688Z
2023-01-11T21:38:06.0510693Z
2023-01-11T21:38:06.0510766Z if __name__ == "__main__":
2023-01-11T21:38:06.0510884Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0511011Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0511206Z arg0_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0511398Z arg1_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0511590Z arg2_1 = rand_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0511721Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.0511729Z
2023-01-11T21:38:06.0511801Z ok (2.056s)
2023-01-11T21:38:06.0512277Z test_scheduler_vertical_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0512402Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0512658Z [2023-01-11 21:31:05,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 241
2023-01-11T21:38:06.0512915Z [2023-01-11 21:31:07,464] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 241
2023-01-11T21:38:06.0512921Z
2023-01-11T21:38:06.0513019Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0513122Z import torch
2023-01-11T21:38:06.0513195Z import random
2023-01-11T21:38:06.0513314Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0513437Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0513442Z
2023-01-11T21:38:06.0513518Z aten = torch.ops.aten
2023-01-11T21:38:06.0513658Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0513751Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0513756Z
2023-01-11T21:38:06.0513830Z import triton
2023-01-11T21:38:06.0513920Z import triton.language as tl
2023-01-11T21:38:06.0514044Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0514185Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0514190Z
2023-01-11T21:38:06.0514195Z
2023-01-11T21:38:06.0514330Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0514528Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0514653Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0514759Z float* __restrict__ in_out_ptr1,
2023-01-11T21:38:06.0514868Z const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0514980Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0515085Z const float* __restrict__ in_ptr2)
2023-01-11T21:38:06.0515150Z {
2023-01-11T21:38:06.0515241Z auto out_ptr1 = in_out_ptr1;
2023-01-11T21:38:06.0515335Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0515402Z {
2023-01-11T21:38:06.0515483Z #pragma omp for
2023-01-11T21:38:06.0515575Z for(long i0=0; i0<135252; i0+=1)
2023-01-11T21:38:06.0515640Z {
2023-01-11T21:38:06.0515781Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0515914Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0516135Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.061519070296458e-11));
2023-01-11T21:38:06.0516227Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0516445Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(-1.988366587925593e-08));
2023-01-11T21:38:06.0516566Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0516657Z auto tmp5 = tmp0 * tmp4;
2023-01-11T21:38:06.0516878Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(-3.087032500374211e-07));
2023-01-11T21:38:06.0516967Z auto tmp7 = tmp5 + tmp6;
2023-01-11T21:38:06.0517184Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1.55093272922008e-10));
2023-01-11T21:38:06.0517268Z auto tmp10 = tmp8 * tmp9;
2023-01-11T21:38:06.0517360Z auto tmp11 = tmp7 + tmp10;
2023-01-11T21:38:06.0517467Z auto tmp12 = tmp11.reciprocal();
2023-01-11T21:38:06.0517612Z auto tmp13 = at::vec::Vectorized<float>(static_cast<float>(1.0));
2023-01-11T21:38:06.0517707Z auto tmp14 = tmp12 * tmp13;
2023-01-11T21:38:06.0517810Z tmp11.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0517907Z tmp14.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0517974Z }
2023-01-11T21:38:06.0518069Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0518173Z for(long i0=1082016; i0<1082016; i0+=1)
2023-01-11T21:38:06.0518239Z {
2023-01-11T21:38:06.0518328Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0518416Z auto tmp8 = in_ptr1[i0];
2023-01-11T21:38:06.0518593Z auto tmp1 = static_cast<float>(-1.061519070296458e-11);
2023-01-11T21:38:06.0518681Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0518846Z auto tmp3 = static_cast<float>(-1.988366587925593e-08);
2023-01-11T21:38:06.0518934Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0519023Z auto tmp5 = tmp0 * tmp4;
2023-01-11T21:38:06.0519228Z auto tmp6 = static_cast<float>(-3.087032500374211e-07);
2023-01-11T21:38:06.0519315Z auto tmp7 = tmp5 + tmp6;
2023-01-11T21:38:06.0519489Z auto tmp9 = static_cast<float>(1.55093272922008e-10);
2023-01-11T21:38:06.0519578Z auto tmp10 = tmp8 * tmp9;
2023-01-11T21:38:06.0519665Z auto tmp11 = tmp7 + tmp10;
2023-01-11T21:38:06.0519751Z auto tmp12 = 1 / tmp11;
2023-01-11T21:38:06.0519856Z auto tmp13 = static_cast<float>(1.0);
2023-01-11T21:38:06.0519947Z auto tmp14 = tmp12 * tmp13;
2023-01-11T21:38:06.0520035Z in_out_ptr0[i0] = tmp11;
2023-01-11T21:38:06.0520120Z out_ptr1[i0] = tmp14;
2023-01-11T21:38:06.0520186Z }
2023-01-11T21:38:06.0520260Z #pragma omp for
2023-01-11T21:38:06.0520348Z for(long i0=0; i0<41616; i0+=1)
2023-01-11T21:38:06.0520414Z {
2023-01-11T21:38:06.0520499Z for(long i1=0; i1<3; i1+=1)
2023-01-11T21:38:06.0520570Z {
2023-01-11T21:38:06.0520723Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + (8*i1) + (26*i0));
2023-01-11T21:38:06.0520864Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i1);
2023-01-11T21:38:06.0520949Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0521066Z tmp2.store(in_out_ptr0 + (8*i1) + (26*i0));
2023-01-11T21:38:06.0521134Z }
2023-01-11T21:38:06.0521233Z #pragma omp simd simdlen(4)
2023-01-11T21:38:06.0521324Z for(long i1=24; i1<26; i1+=1)
2023-01-11T21:38:06.0521395Z {
2023-01-11T21:38:06.0521503Z auto tmp0 = in_out_ptr0[i1 + (26*i0)];
2023-01-11T21:38:06.0521588Z auto tmp1 = in_ptr2[i1];
2023-01-11T21:38:06.0521678Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0521777Z in_out_ptr0[i1 + (26*i0)] = tmp2;
2023-01-11T21:38:06.0521846Z }
2023-01-11T21:38:06.0521920Z }
2023-01-11T21:38:06.0521998Z #pragma omp for
2023-01-11T21:38:06.0522086Z for(long i0=0; i0<135252; i0+=1)
2023-01-11T21:38:06.0522145Z {
2023-01-11T21:38:06.0522282Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0522497Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0522588Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0522689Z tmp2.store(in_out_ptr1 + 8*i0);
2023-01-11T21:38:06.0522755Z }
2023-01-11T21:38:06.0522853Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0522942Z for(long i0=1082016; i0<1082016; i0+=1)
2023-01-11T21:38:06.0523008Z {
2023-01-11T21:38:06.0523098Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0523193Z auto tmp1 = in_out_ptr0[i0];
2023-01-11T21:38:06.0523278Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0523367Z in_out_ptr1[i0] = tmp2;
2023-01-11T21:38:06.0523436Z }
2023-01-11T21:38:06.0523494Z }
2023-01-11T21:38:06.0523558Z }
2023-01-11T21:38:06.0523642Z ''')
2023-01-11T21:38:06.0523648Z
2023-01-11T21:38:06.0523652Z
2023-01-11T21:38:06.0523744Z async_compile.wait(globals())
2023-01-11T21:38:06.0523821Z del async_compile
2023-01-11T21:38:06.0523827Z
2023-01-11T21:38:06.0523905Z def call(args):
2023-01-11T21:38:06.0523992Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0524060Z args.clear()
2023-01-11T21:38:06.0524273Z buf0 = empty_strided((204, 204,
26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0524363Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.0524575Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0524663Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0524751Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.0525002Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr())) 2023-01-11T21:38:06.0525115Z del arg0_1 2023-01-11T21:38:06.0525179Z del arg1_1 2023-01-11T21:38:06.0525249Z del arg2_1 2023-01-11T21:38:06.0525324Z return (buf4, ) 2023-01-11T21:38:06.0525329Z 2023-01-11T21:38:06.0525334Z 2023-01-11T21:38:06.0525416Z if __name__ == "__main__": 2023-01-11T21:38:06.0525537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0525662Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0525875Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526080Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526270Z arg2_1 = rand_strided((26, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0526394Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.0526400Z 2023-01-11T21:38:06.0526474Z ok (2.664s) 2023-01-11T21:38:06.0526935Z test_select_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
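The vertical-fusion test above is notable for what it does not generate: despite the graph containing a polynomial, a reciprocal, a broadcasted multiply, and a final add, everything lands in a single kernel_cpp_0 under one #pragma omp parallel region, and the wrapper reuses buf0/buf1/buf2 in place rather than allocating a buffer per op. A sketch of how a similar pointwise chain can be compiled and checked against eager, using a hypothetical toy function and the torch._dynamo.optimize entry point that this log's PyTorch build uses (not the test's source):

import torch
import torch._dynamo

def chain(a, b, w):
    t = a * 0.5 + b * 0.25           # pointwise producers
    r = (t * t + 1.0).reciprocal()   # consumer, fused vertically into the same kernel
    return (r + t) * w               # multiply broadcast over the last dim

compiled = torch._dynamo.optimize("inductor")(chain)
a = torch.randn(4, 4, 26)
b = torch.randn(4, 4, 26)
w = torch.randn(26)
assert torch.allclose(compiled(a, b, w), chain(a, b, w), atol=1e-6)
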
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0527066Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0527325Z [2023-01-11 21:31:07,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 242
2023-01-11T21:38:06.0527585Z [2023-01-11 21:31:09,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 242
2023-01-11T21:38:06.0527591Z
2023-01-11T21:38:06.0527686Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0527758Z import torch
2023-01-11T21:38:06.0527826Z import random
2023-01-11T21:38:06.0527945Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0528066Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0528072Z
2023-01-11T21:38:06.0528151Z aten = torch.ops.aten
2023-01-11T21:38:06.0528288Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0528415Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0528421Z
2023-01-11T21:38:06.0528495Z import triton
2023-01-11T21:38:06.0528581Z import triton.language as tl
2023-01-11T21:38:06.0528706Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0528844Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0528849Z
2023-01-11T21:38:06.0528854Z
2023-01-11T21:38:06.0528991Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0529197Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0529320Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0529431Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0529537Z const float* __restrict__ in_ptr2,
2023-01-11T21:38:06.0529641Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0529736Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0529806Z {
2023-01-11T21:38:06.0529907Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0529974Z {
2023-01-11T21:38:06.0530056Z #pragma omp for
2023-01-11T21:38:06.0530142Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0530201Z {
2023-01-11T21:38:06.0530287Z #pragma GCC ivdep
2023-01-11T21:38:06.0530377Z for(long i1=0; i1<197; i1+=1)
2023-01-11T21:38:06.0530443Z {
2023-01-11T21:38:06.0530529Z #pragma GCC ivdep
2023-01-11T21:38:06.0530625Z for(long i2=0; i2<38; i2+=1)
2023-01-11T21:38:06.0530694Z {
2023-01-11T21:38:06.0530786Z {
2023-01-11T21:38:06.0530859Z {
2023-01-11T21:38:06.0530970Z auto tmp3 = in_ptr0[i2 + (38*i0)];
2023-01-11T21:38:06.0531085Z auto tmp4 = in_ptr1[i2 + (38*i1) + (7486*i0)];
2023-01-11T21:38:06.0531197Z auto tmp0 = static_cast<long>(i1);
2023-01-11T21:38:06.0531307Z auto tmp1 = static_cast<long>(0);
2023-01-11T21:38:06.0531408Z auto tmp2 = tmp0 == tmp1;
2023-01-11T21:38:06.0531509Z auto tmp5 = tmp2 ? tmp3 : tmp4;
2023-01-11T21:38:06.0531621Z out_ptr0[i2 + (38*i1) + (7486*i0)] = tmp5;
2023-01-11T21:38:06.0531694Z }
2023-01-11T21:38:06.0531766Z }
2023-01-11T21:38:06.0531836Z }
2023-01-11T21:38:06.0531904Z }
2023-01-11T21:38:06.0531971Z }
2023-01-11T21:38:06.0532044Z #pragma omp for
2023-01-11T21:38:06.0532134Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0532198Z {
2023-01-11T21:38:06.0532282Z #pragma GCC ivdep
2023-01-11T21:38:06.0532373Z for(long i1=0; i1<7486; i1+=1)
2023-01-11T21:38:06.0532439Z {
2023-01-11T21:38:06.0532508Z {
2023-01-11T21:38:06.0532574Z {
2023-01-11T21:38:06.0532675Z auto tmp3 = in_ptr2[i1];
2023-01-11T21:38:06.0532788Z auto tmp4 = in_ptr1[i1 + (7486*i0)];
2023-01-11T21:38:06.0532896Z auto tmp0 = static_cast<long>(i0);
2023-01-11T21:38:06.0533005Z auto tmp1 = static_cast<long>(1);
2023-01-11T21:38:06.0533106Z auto tmp2 = tmp0 == tmp1;
2023-01-11T21:38:06.0533212Z auto tmp5 = tmp2 ? tmp3 : tmp4;
2023-01-11T21:38:06.0533306Z out_ptr1[i1 + (7486*i0)] = tmp5;
2023-01-11T21:38:06.0533376Z }
2023-01-11T21:38:06.0533449Z }
2023-01-11T21:38:06.0533514Z }
2023-01-11T21:38:06.0533582Z }
2023-01-11T21:38:06.0533649Z }
2023-01-11T21:38:06.0533714Z }
2023-01-11T21:38:06.0533792Z ''')
2023-01-11T21:38:06.0533798Z
2023-01-11T21:38:06.0533803Z
2023-01-11T21:38:06.0533898Z async_compile.wait(globals())
2023-01-11T21:38:06.0534001Z del async_compile
2023-01-11T21:38:06.0534007Z
2023-01-11T21:38:06.0534082Z def call(args):
2023-01-11T21:38:06.0534166Z arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.0534242Z args.clear()
2023-01-11T21:38:06.0534454Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0534770Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0534987Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0535065Z del arg0_1
2023-01-11T21:38:06.0535136Z del arg1_1
2023-01-11T21:38:06.0535206Z del arg2_1
2023-01-11T21:38:06.0535288Z return (buf0, buf1, )
2023-01-11T21:38:06.0535294Z
2023-01-11T21:38:06.0535298Z
2023-01-11T21:38:06.0535376Z if __name__ == "__main__":
2023-01-11T21:38:06.0535493Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0535615Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0535825Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536024Z arg1_1 = rand_strided((8, 38), (38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536223Z arg2_1 = rand_strided((197, 38), (38, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0536348Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.0536353Z
2023-01-11T21:38:06.0536421Z ok (1.800s)
2023-01-11T21:38:06.0536869Z test_sgn_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0537051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0537377Z [2023-01-11 21:31:09,307] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 243 2023-01-11T21:38:06.0537644Z [2023-01-11 21:31:11,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 243 2023-01-11T21:38:06.0537650Z 2023-01-11T21:38:06.0537741Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0537816Z import torch 2023-01-11T21:38:06.0537892Z import random 2023-01-11T21:38:06.0538013Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0538138Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0538147Z 2023-01-11T21:38:06.0538228Z aten = torch.ops.aten 2023-01-11T21:38:06.0538367Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0538455Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0538466Z 2023-01-11T21:38:06.0538532Z import triton 2023-01-11T21:38:06.0538627Z import triton.language as tl 2023-01-11T21:38:06.0538749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0538890Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0538895Z 2023-01-11T21:38:06.0538900Z 2023-01-11T21:38:06.0539039Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0539248Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0539371Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0539469Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0539581Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0539647Z { 2023-01-11T21:38:06.0539753Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0539818Z { 2023-01-11T21:38:06.0539900Z #pragma omp for 2023-01-11T21:38:06.0539988Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0540048Z { 2023-01-11T21:38:06.0540228Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0540401Z auto tmp1 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), decltype(tmp0)(0) < tmp0); 2023-01-11T21:38:06.0540575Z auto tmp2 = decltype(tmp0)::blendv(decltype(tmp0)(0), decltype(tmp0)(1), tmp0 < decltype(tmp0)(0)); 2023-01-11T21:38:06.0540705Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0540840Z auto tmp4 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0540929Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.0541096Z auto tmp6 = decltype(tmp5)::blendv(decltype(tmp5)(0), decltype(tmp5)(1), decltype(tmp5)(0) < tmp5); 2023-01-11T21:38:06.0541266Z auto tmp7 = decltype(tmp5)::blendv(decltype(tmp5)(0), decltype(tmp5)(1), tmp5 < decltype(tmp5)(0)); 2023-01-11T21:38:06.0541386Z auto tmp8 = tmp6 - tmp7; 2023-01-11T21:38:06.0541512Z auto tmp9 = tmp8 - tmp4; 2023-01-11T21:38:06.0541611Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0541706Z tmp9.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0541772Z } 2023-01-11T21:38:06.0541874Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0541960Z for(long i0=40; i0<41; i0+=1) 2023-01-11T21:38:06.0542020Z { 2023-01-11T21:38:06.0542108Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0542199Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:06.0542290Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:06.0542416Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0542562Z auto tmp4 = static_cast(1); 2023-01-11T21:38:06.0542653Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.0542736Z auto tmp6 = tmp5 > 0 ? 1 : 0; 2023-01-11T21:38:06.0542830Z auto tmp7 = tmp5 < 0 ? 1 : 0; 2023-01-11T21:38:06.0542958Z auto tmp8 = tmp6 - tmp7; 2023-01-11T21:38:06.0543084Z auto tmp9 = tmp8 - tmp4; 2023-01-11T21:38:06.0543172Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0543257Z out_ptr1[i0] = tmp9; 2023-01-11T21:38:06.0543323Z } 2023-01-11T21:38:06.0543381Z } 2023-01-11T21:38:06.0543444Z } 2023-01-11T21:38:06.0543529Z ''') 2023-01-11T21:38:06.0543535Z 2023-01-11T21:38:06.0543539Z 2023-01-11T21:38:06.0543633Z async_compile.wait(globals()) 2023-01-11T21:38:06.0543715Z del async_compile 2023-01-11T21:38:06.0543720Z 2023-01-11T21:38:06.0543794Z def call(args): 2023-01-11T21:38:06.0543868Z arg0_1, = args 2023-01-11T21:38:06.0543936Z args.clear() 2023-01-11T21:38:06.0544132Z buf0 = empty_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0544328Z buf1 = empty_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0544496Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0544568Z del arg0_1 2023-01-11T21:38:06.0544651Z return (buf0, buf1, ) 2023-01-11T21:38:06.0544656Z 2023-01-11T21:38:06.0544661Z 2023-01-11T21:38:06.0544740Z if __name__ == "__main__": 2023-01-11T21:38:06.0544858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0544977Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0545173Z arg0_1 = rand_strided((41, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0545287Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0545292Z 2023-01-11T21:38:06.0545362Z ok (1.748s) 2023-01-11T21:38:06.0545820Z test_sgn_extremal_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0545986Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0546244Z [2023-01-11 21:31:11,048] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 244 2023-01-11T21:38:06.0546507Z [2023-01-11 21:31:12,722] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 244 2023-01-11T21:38:06.0546513Z 2023-01-11T21:38:06.0546612Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0546680Z import torch 2023-01-11T21:38:06.0546755Z import random 2023-01-11T21:38:06.0546874Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0546998Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0547007Z 2023-01-11T21:38:06.0547088Z aten = torch.ops.aten 2023-01-11T21:38:06.0547225Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0547319Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0547324Z 2023-01-11T21:38:06.0547397Z import triton 2023-01-11T21:38:06.0547485Z import triton.language as tl 2023-01-11T21:38:06.0547611Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0547751Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0547756Z 2023-01-11T21:38:06.0547761Z 2023-01-11T21:38:06.0547899Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0548104Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0548228Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0548330Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0548424Z { 2023-01-11T21:38:06.0548498Z #pragma GCC ivdep 2023-01-11T21:38:06.0548582Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0548647Z { 2023-01-11T21:38:06.0548713Z { 2023-01-11T21:38:06.0548780Z { 2023-01-11T21:38:06.0548873Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0548966Z auto tmp1 = tmp0 > 0 ? 1 : 0; 2023-01-11T21:38:06.0549063Z auto tmp2 = tmp0 < 0 ? 
1 : 0; 2023-01-11T21:38:06.0549192Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.0549279Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0549344Z } 2023-01-11T21:38:06.0549412Z } 2023-01-11T21:38:06.0549479Z } 2023-01-11T21:38:06.0549536Z } 2023-01-11T21:38:06.0549618Z ''') 2023-01-11T21:38:06.0549624Z 2023-01-11T21:38:06.0549628Z 2023-01-11T21:38:06.0549719Z async_compile.wait(globals()) 2023-01-11T21:38:06.0549796Z del async_compile 2023-01-11T21:38:06.0549802Z 2023-01-11T21:38:06.0549874Z def call(args): 2023-01-11T21:38:06.0549952Z arg0_1, = args 2023-01-11T21:38:06.0550025Z args.clear() 2023-01-11T21:38:06.0550217Z buf0 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0550348Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0550421Z del arg0_1 2023-01-11T21:38:06.0550501Z return (buf0, ) 2023-01-11T21:38:06.0550506Z 2023-01-11T21:38:06.0550510Z 2023-01-11T21:38:06.0550590Z if __name__ == "__main__": 2023-01-11T21:38:06.0550707Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0550833Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0551025Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0551129Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0551144Z 2023-01-11T21:38:06.0551208Z ok (1.686s) 2023-01-11T21:38:06.0551673Z test_shape_prop_torch_ones_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0551837Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0552093Z [2023-01-11 21:31:12,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 245 2023-01-11T21:38:06.0552354Z [2023-01-11 21:31:14,628] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 245 2023-01-11T21:38:06.0552360Z 2023-01-11T21:38:06.0552458Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0552533Z import torch 2023-01-11T21:38:06.0552607Z import random 2023-01-11T21:38:06.0552726Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0552846Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0552855Z 2023-01-11T21:38:06.0552936Z aten = torch.ops.aten 2023-01-11T21:38:06.0553072Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0553167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0553172Z 2023-01-11T21:38:06.0553245Z import triton 2023-01-11T21:38:06.0553341Z import triton.language as tl 2023-01-11T21:38:06.0553465Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0553597Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0553610Z 2023-01-11T21:38:06.0553614Z 2023-01-11T21:38:06.0553743Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0553946Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0554069Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0554171Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0554266Z { 2023-01-11T21:38:06.0554367Z #pragma omp 
parallel num_threads(8)
2023-01-11T21:38:06.0554433Z {
2023-01-11T21:38:06.0554507Z #pragma omp for
2023-01-11T21:38:06.0554600Z for(long i0=0; i0<3145728; i0+=1)
2023-01-11T21:38:06.0554665Z {
2023-01-11T21:38:06.0554806Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0554956Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0555058Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0555168Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0555235Z }
2023-01-11T21:38:06.0555332Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0555430Z for(long i0=25165824; i0<25165824; i0+=1)
2023-01-11T21:38:06.0555499Z {
2023-01-11T21:38:06.0555586Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0555690Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0555779Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0555860Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0555926Z }
2023-01-11T21:38:06.0555990Z }
2023-01-11T21:38:06.0556054Z }
2023-01-11T21:38:06.0556138Z ''')
2023-01-11T21:38:06.0556144Z 
2023-01-11T21:38:06.0556148Z 
2023-01-11T21:38:06.0556239Z async_compile.wait(globals())
2023-01-11T21:38:06.0556318Z del async_compile
2023-01-11T21:38:06.0556324Z 
2023-01-11T21:38:06.0556393Z def call(args):
2023-01-11T21:38:06.0556466Z arg0_1, = args
2023-01-11T21:38:06.0556542Z args.clear()
2023-01-11T21:38:06.0556774Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0556913Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0556984Z del arg0_1
2023-01-11T21:38:06.0557060Z return (buf0, )
2023-01-11T21:38:06.0557065Z 
2023-01-11T21:38:06.0557069Z 
2023-01-11T21:38:06.0557150Z if __name__ == "__main__":
2023-01-11T21:38:06.0557263Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0557388Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0557619Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0557732Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0557765Z 
2023-01-11T21:38:06.0557838Z ok (2.438s)
2023-01-11T21:38:06.0558289Z test_sigmoid_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0558420Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0558674Z [2023-01-11 21:31:15,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 246
2023-01-11T21:38:06.0558933Z [2023-01-11 21:31:16,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 246
2023-01-11T21:38:06.0558939Z 
2023-01-11T21:38:06.0559030Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0559103Z import torch
2023-01-11T21:38:06.0559181Z import random
2023-01-11T21:38:06.0559300Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0559422Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0559427Z 
2023-01-11T21:38:06.0559509Z aten = torch.ops.aten
2023-01-11T21:38:06.0559642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0559737Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0559743Z 
2023-01-11T21:38:06.0559810Z import triton
2023-01-11T21:38:06.0559903Z import triton.language as tl
2023-01-11T21:38:06.0560028Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0560196Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0560201Z 
2023-01-11T21:38:06.0560206Z 
2023-01-11T21:38:06.0560340Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0560543Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0560671Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0560781Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0560878Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0560980Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0561045Z {
2023-01-11T21:38:06.0561147Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0561211Z {
2023-01-11T21:38:06.0561292Z #pragma omp for
2023-01-11T21:38:06.0561376Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0561436Z {
2023-01-11T21:38:06.0561575Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0561716Z auto tmp2 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0561855Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp());
2023-01-11T21:38:06.0561946Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.0562090Z auto tmp4 = decltype(tmp3)(1)/(decltype(tmp3)(1) + tmp3.neg().exp());
2023-01-11T21:38:06.0562186Z tmp1.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0562275Z tmp4.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0562341Z }
2023-01-11T21:38:06.0562440Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0562526Z for(long i0=64; i0<64; i0+=1)
2023-01-11T21:38:06.0562591Z {
2023-01-11T21:38:06.0562678Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0562765Z auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.0562895Z auto tmp1 = std::exp(-tmp0);
2023-01-11T21:38:06.0562986Z auto tmp2 = 1 / (1 + tmp1);
2023-01-11T21:38:06.0563073Z auto tmp4 = tmp0 + tmp3;
2023-01-11T21:38:06.0563206Z auto tmp5 = std::exp(-tmp4);
2023-01-11T21:38:06.0563292Z auto tmp6 = 1 / (1 + tmp5);
2023-01-11T21:38:06.0563377Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0563490Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0563552Z }
2023-01-11T21:38:06.0563619Z }
2023-01-11T21:38:06.0563683Z }
2023-01-11T21:38:06.0563767Z ''')
2023-01-11T21:38:06.0563773Z 
2023-01-11T21:38:06.0563777Z 
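The sigmoid kernel above fuses two pointwise ops into one pass: out_ptr0 receives sigmoid(arg0) and out_ptr1 receives sigmoid(arg0 + arg1), via an 8-lane at::vec::Vectorized<float> main loop plus a scalar remainder loop that is empty here (64 elements divide evenly by 8). A minimal repro sketch of the same pattern, assuming a PyTorch 2.x build where inductor is the default torch.compile backend; the function name fn is illustrative, not taken from the test source:

import torch

def fn(x, y):
    # mirrors the fused kernel: sigmoid(x) -> out_ptr0, sigmoid(x + y) -> out_ptr1
    return torch.sigmoid(x), torch.sigmoid(x + y)

x, y = torch.randn(8, 8), torch.randn(8, 8)
for got, want in zip(torch.compile(fn)(x, y), fn(x, y)):
    torch.testing.assert_close(got, want)  # compiled output should match eager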
2023-01-11T21:38:06.0563870Z async_compile.wait(globals()) 2023-01-11T21:38:06.0563946Z del async_compile 2023-01-11T21:38:06.0563952Z 2023-01-11T21:38:06.0564027Z def call(args): 2023-01-11T21:38:06.0564100Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0564174Z args.clear() 2023-01-11T21:38:06.0564369Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0564564Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0564761Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0564831Z del arg0_1 2023-01-11T21:38:06.0564899Z del arg1_1 2023-01-11T21:38:06.0564978Z return (buf0, buf1, ) 2023-01-11T21:38:06.0564986Z 2023-01-11T21:38:06.0564992Z 2023-01-11T21:38:06.0565079Z if __name__ == "__main__": 2023-01-11T21:38:06.0565208Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0565357Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0565549Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0565742Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0565861Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0565866Z 2023-01-11T21:38:06.0565936Z ok (1.728s) 2023-01-11T21:38:06.0566387Z test_signbit_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0566547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0566798Z [2023-01-11 21:31:16,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 247 2023-01-11T21:38:06.0567058Z [2023-01-11 21:31:18,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 247 2023-01-11T21:38:06.0567063Z 2023-01-11T21:38:06.0567160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0567235Z import torch 2023-01-11T21:38:06.0567311Z import random 2023-01-11T21:38:06.0567429Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0567553Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0567558Z 2023-01-11T21:38:06.0567639Z aten = torch.ops.aten 2023-01-11T21:38:06.0567767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0567862Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0567867Z 2023-01-11T21:38:06.0567943Z import triton 2023-01-11T21:38:06.0568036Z import triton.language as tl 2023-01-11T21:38:06.0568162Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0568301Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0568306Z 2023-01-11T21:38:06.0568311Z 2023-01-11T21:38:06.0568446Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0568654Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0568771Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0568871Z bool* __restrict__ out_ptr0, 2023-01-11T21:38:06.0568974Z long* __restrict__ out_ptr1) 2023-01-11T21:38:06.0569036Z { 
2023-01-11T21:38:06.0569139Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0569205Z {
2023-01-11T21:38:06.0569289Z #pragma omp for
2023-01-11T21:38:06.0569368Z for(long i0=0; i0<72; i0+=1)
2023-01-11T21:38:06.0569463Z {
2023-01-11T21:38:06.0569531Z {
2023-01-11T21:38:06.0569600Z {
2023-01-11T21:38:06.0569700Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0569808Z auto tmp1 = std::signbit(tmp0);
2023-01-11T21:38:06.0569935Z auto tmp2 = -tmp0;
2023-01-11T21:38:06.0570039Z auto tmp3 = std::signbit(tmp2);
2023-01-11T21:38:06.0576413Z auto tmp4 = tmp3 == 0;
2023-01-11T21:38:06.0576549Z auto tmp5 = static_cast<long>(tmp4);
2023-01-11T21:38:06.0576653Z auto tmp6 = static_cast<long>(1);
2023-01-11T21:38:06.0576757Z auto tmp7 = tmp5 & tmp6;
2023-01-11T21:38:06.0576848Z out_ptr0[i0] = tmp1;
2023-01-11T21:38:06.0576942Z out_ptr1[i0] = tmp7;
2023-01-11T21:38:06.0577017Z }
2023-01-11T21:38:06.0577088Z }
2023-01-11T21:38:06.0577235Z }
2023-01-11T21:38:06.0577301Z }
2023-01-11T21:38:06.0577369Z }
2023-01-11T21:38:06.0577478Z ''')
2023-01-11T21:38:06.0577483Z 
2023-01-11T21:38:06.0577488Z 
2023-01-11T21:38:06.0577586Z async_compile.wait(globals())
2023-01-11T21:38:06.0577665Z del async_compile
2023-01-11T21:38:06.0577670Z 
2023-01-11T21:38:06.0577748Z def call(args):
2023-01-11T21:38:06.0577823Z arg0_1, = args
2023-01-11T21:38:06.0577894Z args.clear()
2023-01-11T21:38:06.0578104Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.bool)
2023-01-11T21:38:06.0578311Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0578561Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0578635Z del arg0_1
2023-01-11T21:38:06.0578715Z return (buf0, buf1, )
2023-01-11T21:38:06.0578720Z 
2023-01-11T21:38:06.0578724Z 
2023-01-11T21:38:06.0578806Z if __name__ == "__main__":
2023-01-11T21:38:06.0578926Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0579045Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0579257Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0579370Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0579375Z 
2023-01-11T21:38:06.0579447Z ok (1.776s)
2023-01-11T21:38:06.0579899Z test_silu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0580035Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0580298Z [2023-01-11 21:31:18,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 248 2023-01-11T21:38:06.0580565Z [2023-01-11 21:31:20,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 248 2023-01-11T21:38:06.0580571Z 2023-01-11T21:38:06.0580670Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0580737Z import torch 2023-01-11T21:38:06.0580814Z import random 2023-01-11T21:38:06.0580933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0581053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0581058Z 2023-01-11T21:38:06.0581143Z aten = torch.ops.aten 2023-01-11T21:38:06.0581274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0581372Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0581378Z 2023-01-11T21:38:06.0581451Z import triton 2023-01-11T21:38:06.0581548Z import triton.language as tl 2023-01-11T21:38:06.0581672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0581846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0581853Z 2023-01-11T21:38:06.0581857Z 2023-01-11T21:38:06.0581997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0582204Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0582322Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0582427Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0582492Z { 2023-01-11T21:38:06.0582592Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0582658Z { 2023-01-11T21:38:06.0582739Z #pragma omp for 2023-01-11T21:38:06.0582826Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0582887Z { 2023-01-11T21:38:06.0583034Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0583172Z auto tmp1 = decltype(tmp0)(1)/(decltype(tmp0)(1) + tmp0.neg().exp()); 2023-01-11T21:38:06.0583264Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0583359Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0583426Z } 2023-01-11T21:38:06.0583526Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0583606Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0583673Z { 2023-01-11T21:38:06.0583758Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0583897Z auto tmp1 = std::exp(-tmp0); 2023-01-11T21:38:06.0583985Z auto tmp2 = 1 / (1 + tmp1); 2023-01-11T21:38:06.0584072Z auto tmp3 = tmp0 * tmp2; 2023-01-11T21:38:06.0584157Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0584252Z } 2023-01-11T21:38:06.0584317Z } 2023-01-11T21:38:06.0584378Z } 2023-01-11T21:38:06.0584461Z ''') 2023-01-11T21:38:06.0584467Z 2023-01-11T21:38:06.0584471Z 2023-01-11T21:38:06.0584565Z async_compile.wait(globals()) 2023-01-11T21:38:06.0584645Z del async_compile 2023-01-11T21:38:06.0584651Z 2023-01-11T21:38:06.0584725Z def call(args): 2023-01-11T21:38:06.0584794Z arg0_1, = args 2023-01-11T21:38:06.0584872Z args.clear() 2023-01-11T21:38:06.0585072Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0585225Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0585309Z del arg0_1 2023-01-11T21:38:06.0585392Z return (buf0, ) 2023-01-11T21:38:06.0585399Z 
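Like the other generated wrappers in this log, the silu call(args) above destructively consumes its argument list: it unpacks the inputs, clears the list, and deletes each argument after its last use so input storage can be freed as early as possible. A hedged usage sketch, assuming the generated module printed above has been saved and imported so that call and its kernel are in scope:

import torch
from torch._dynamo.testing import rand_strided  # same helper the __main__ block uses

arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
args = [arg0_1]
buf0, = call(args)  # returns (buf0, ); the wrapper empties `args` as it runs
assert args == []   # the input reference was dropped inside call()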
2023-01-11T21:38:06.0585404Z 2023-01-11T21:38:06.0585498Z if __name__ == "__main__": 2023-01-11T21:38:06.0585618Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0585738Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0585936Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0586045Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0586050Z 2023-01-11T21:38:06.0586123Z ok (1.710s) 2023-01-11T21:38:06.0586582Z test_simplify_loops_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0587223Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0587649Z [2023-01-11 21:31:20,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 249 2023-01-11T21:38:06.0588110Z [2023-01-11 21:31:22,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 249 2023-01-11T21:38:06.0588321Z 2023-01-11T21:38:06.0588423Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0588632Z import torch 2023-01-11T21:38:06.0588810Z import random 2023-01-11T21:38:06.0589038Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0589352Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0589519Z 2023-01-11T21:38:06.0589595Z aten = torch.ops.aten 2023-01-11T21:38:06.0589856Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0590121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0590256Z 2023-01-11T21:38:06.0590332Z import triton 2023-01-11T21:38:06.0590526Z import triton.language as tl 2023-01-11T21:38:06.0590786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0591087Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0591270Z 2023-01-11T21:38:06.0591274Z 2023-01-11T21:38:06.0591411Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0591756Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0592112Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0592375Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0592618Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0592820Z { 2023-01-11T21:38:06.0593022Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0593220Z { 2023-01-11T21:38:06.0593411Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0593629Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.0593813Z { 2023-01-11T21:38:06.0594000Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0594193Z { 2023-01-11T21:38:06.0594384Z for(long i2=0; i2<3; i2+=1) 2023-01-11T21:38:06.0594579Z { 2023-01-11T21:38:06.0594845Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i2) + (30*i1) + (120*i0)); 2023-01-11T21:38:06.0595266Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i2) + (30*i0) + (180*i1)); 2023-01-11T21:38:06.0595543Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0595799Z tmp2.store(out_ptr0 + (8*i2) + (30*i1) + (120*i0)); 2023-01-11T21:38:06.0596017Z } 2023-01-11T21:38:06.0596217Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0596447Z 
for(long i2=24; i2<30; i2+=1) 2023-01-11T21:38:06.0596646Z { 2023-01-11T21:38:06.0596858Z auto tmp0 = in_ptr0[i2 + (30*i1) + (120*i0)]; 2023-01-11T21:38:06.0597116Z auto tmp1 = in_ptr1[i2 + (30*i0) + (180*i1)]; 2023-01-11T21:38:06.0597355Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0597587Z out_ptr0[i2 + (30*i1) + (120*i0)] = tmp2; 2023-01-11T21:38:06.0597788Z } 2023-01-11T21:38:06.0597965Z } 2023-01-11T21:38:06.0598129Z } 2023-01-11T21:38:06.0598292Z } 2023-01-11T21:38:06.0598461Z } 2023-01-11T21:38:06.0598646Z ''') 2023-01-11T21:38:06.0598758Z 2023-01-11T21:38:06.0598763Z 2023-01-11T21:38:06.0598861Z async_compile.wait(globals()) 2023-01-11T21:38:06.0599071Z del async_compile 2023-01-11T21:38:06.0599189Z 2023-01-11T21:38:06.0599269Z def call(args): 2023-01-11T21:38:06.0599456Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0599649Z args.clear() 2023-01-11T21:38:06.0599990Z buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0600328Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0600592Z del arg0_1 2023-01-11T21:38:06.0600767Z del arg1_1 2023-01-11T21:38:06.0600942Z return (buf0, ) 2023-01-11T21:38:06.0601056Z 2023-01-11T21:38:06.0601061Z 2023-01-11T21:38:06.0601142Z if __name__ == "__main__": 2023-01-11T21:38:06.0601380Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0601663Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0602055Z arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0602472Z arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0602765Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0602921Z 2023-01-11T21:38:06.0602994Z ok (1.734s) 2023-01-11T21:38:06.0603549Z test_sin_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0604125Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0604552Z [2023-01-11 21:31:22,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 250
2023-01-11T21:38:06.0605023Z [2023-01-11 21:31:23,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 250
2023-01-11T21:38:06.0605260Z 
2023-01-11T21:38:06.0605386Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0605592Z import torch
2023-01-11T21:38:06.0605774Z import random
2023-01-11T21:38:06.0606002Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0606271Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0606434Z 
2023-01-11T21:38:06.0606518Z aten = torch.ops.aten
2023-01-11T21:38:06.0606778Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0607033Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0607170Z 
2023-01-11T21:38:06.0607247Z import triton
2023-01-11T21:38:06.0607484Z import triton.language as tl
2023-01-11T21:38:06.0607735Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0608037Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0608210Z 
2023-01-11T21:38:06.0608215Z 
2023-01-11T21:38:06.0608357Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0608713Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0609139Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0609463Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0609766Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0610024Z {
2023-01-11T21:38:06.0610282Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0610548Z {
2023-01-11T21:38:06.0610794Z #pragma omp for
2023-01-11T21:38:06.0611057Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0611321Z {
2023-01-11T21:38:06.0611641Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0611917Z auto tmp1 = tmp0.sin();
2023-01-11T21:38:06.0612183Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2));
2023-01-11T21:38:06.0612438Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0612704Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0612949Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:06.0613165Z auto tmp6 = tmp5.sin();
2023-01-11T21:38:06.0613387Z tmp3.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0613605Z tmp6.store(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0613797Z }
2023-01-11T21:38:06.0613999Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0614209Z for(long i0=256; i0<256; i0+=1)
2023-01-11T21:38:06.0614399Z {
2023-01-11T21:38:06.0614738Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0614965Z auto tmp1 = std::sin(tmp0);
2023-01-11T21:38:06.0615238Z auto tmp2 = static_cast<float>(2);
2023-01-11T21:38:06.0615503Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.0615740Z auto tmp4 = static_cast<float>(1);
2023-01-11T21:38:06.0615980Z auto tmp5 = tmp0 + tmp4;
2023-01-11T21:38:06.0616298Z auto tmp6 = std::sin(tmp5);
2023-01-11T21:38:06.0616516Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0616713Z out_ptr1[i0] = tmp6;
2023-01-11T21:38:06.0616898Z }
2023-01-11T21:38:06.0617067Z }
2023-01-11T21:38:06.0617326Z }
2023-01-11T21:38:06.0617558Z ''')
2023-01-11T21:38:06.0617659Z 
2023-01-11T21:38:06.0617663Z 
2023-01-11T21:38:06.0617763Z 
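The sin kernel above shows the vector/tail split inductor uses throughout this log: the #pragma omp for loop walks 8-lane Vectorized<float> chunks, and the trailing simd simdlen(4) loop handles any scalar remainder, degenerate here because the (16, 16) input divides evenly by the lane width. A small sketch of that arithmetic, using only the loop bounds printed above:

# Hedged sketch: recompute the vector/tail split encoded in the kernel above.
numel = 16 * 16                  # elements in the (16, 16) input
lanes = 8                        # Vectorized<float> lane count in the main loop
vec_iters = numel // lanes       # 32, matching for(long i0=0; i0<32; i0+=1)
tail_start = vec_iters * lanes   # 256, matching for(long i0=256; i0<256; i0+=1)
assert (vec_iters, tail_start) == (32, 256)  # remainder loop body never runs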
async_compile.wait(globals()) 2023-01-11T21:38:06.0617963Z del async_compile 2023-01-11T21:38:06.0618086Z 2023-01-11T21:38:06.0618164Z def call(args): 2023-01-11T21:38:06.0618350Z arg0_1, = args 2023-01-11T21:38:06.0618526Z args.clear() 2023-01-11T21:38:06.0618844Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0619212Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0619556Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0619809Z del arg0_1 2023-01-11T21:38:06.0620005Z return (buf0, buf1, ) 2023-01-11T21:38:06.0620127Z 2023-01-11T21:38:06.0620131Z 2023-01-11T21:38:06.0620217Z if __name__ == "__main__": 2023-01-11T21:38:06.0620445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0620719Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0621082Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0621362Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0621501Z 2023-01-11T21:38:06.0621574Z ok (1.795s) 2023-01-11T21:38:06.0622143Z test_sizehint_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0622795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0623329Z [2023-01-11 21:31:24,057] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 251 2023-01-11T21:38:06.0624010Z [2023-01-11 21:31:25,803] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 251 2023-01-11T21:38:06.0624315Z 2023-01-11T21:38:06.0624461Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0624766Z import torch 2023-01-11T21:38:06.0625034Z import random 2023-01-11T21:38:06.0625345Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0625658Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0625820Z 2023-01-11T21:38:06.0625908Z aten = torch.ops.aten 2023-01-11T21:38:06.0626160Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0626426Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0626563Z 2023-01-11T21:38:06.0626643Z import triton 2023-01-11T21:38:06.0626845Z import triton.language as tl 2023-01-11T21:38:06.0627093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0627389Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0627560Z 2023-01-11T21:38:06.0627565Z 2023-01-11T21:38:06.0627721Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0628058Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0628407Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0628660Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0628858Z { 2023-01-11T21:38:06.0629057Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0629268Z { 2023-01-11T21:38:06.0629451Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0629669Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0629861Z { 
2023-01-11T21:38:06.0630108Z for(long i1=0; i1<384; i1+=1)
2023-01-11T21:38:06.0630305Z {
2023-01-11T21:38:06.0630496Z #pragma GCC ivdep
2023-01-11T21:38:06.0630714Z for(long i2=0; i2<196; i2+=1)
2023-01-11T21:38:06.0630901Z {
2023-01-11T21:38:06.0631078Z {
2023-01-11T21:38:06.0631256Z {
2023-01-11T21:38:06.0631479Z auto tmp0 = static_cast<long>(4*(i2 / 14));
2023-01-11T21:38:06.0631748Z auto tmp1 = static_cast<long>((i1 / 4) % 4);
2023-01-11T21:38:06.0632000Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0632246Z auto tmp3 = static_cast<long>(4*(i2 % 14));
2023-01-11T21:38:06.0632507Z auto tmp4 = static_cast<long>(i1 % 4);
2023-01-11T21:38:06.0632750Z auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:06.0633010Z auto tmp6 = in_ptr0[tmp5 + (56*tmp2) + (3136*(i1 / 16)) + (75264*i0)];
2023-01-11T21:38:06.0633282Z out_ptr0[i2 + (196*i1) + (75264*i0)] = tmp6;
2023-01-11T21:38:06.0633494Z }
2023-01-11T21:38:06.0633674Z }
2023-01-11T21:38:06.0633843Z }
2023-01-11T21:38:06.0634013Z }
2023-01-11T21:38:06.0634185Z }
2023-01-11T21:38:06.0634342Z }
2023-01-11T21:38:06.0634502Z }
2023-01-11T21:38:06.0634680Z ''')
2023-01-11T21:38:06.0634772Z 
2023-01-11T21:38:06.0634777Z 
2023-01-11T21:38:06.0634873Z async_compile.wait(globals())
2023-01-11T21:38:06.0635078Z del async_compile
2023-01-11T21:38:06.0635192Z 
2023-01-11T21:38:06.0635304Z def call(args):
2023-01-11T21:38:06.0635480Z arg0_1, = args
2023-01-11T21:38:06.0635661Z args.clear()
2023-01-11T21:38:06.0635987Z buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0636290Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0636531Z del arg0_1
2023-01-11T21:38:06.0636713Z return (buf0, )
2023-01-11T21:38:06.0636827Z 
2023-01-11T21:38:06.0636832Z 
2023-01-11T21:38:06.0636907Z if __name__ == "__main__":
2023-01-11T21:38:06.0637141Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0637417Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0637800Z arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0638076Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0638223Z 
2023-01-11T21:38:06.0638297Z ok (1.903s)
2023-01-11T21:38:06.0638864Z test_slice1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0639455Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0639866Z [2023-01-11 21:31:25,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 252 2023-01-11T21:38:06.0640328Z [2023-01-11 21:31:27,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 252 2023-01-11T21:38:06.0640537Z 2023-01-11T21:38:06.0640638Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0640845Z import torch 2023-01-11T21:38:06.0641025Z import random 2023-01-11T21:38:06.0641255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0641533Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0641686Z 2023-01-11T21:38:06.0641769Z aten = torch.ops.aten 2023-01-11T21:38:06.0642023Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0642284Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0642457Z 2023-01-11T21:38:06.0642529Z import triton 2023-01-11T21:38:06.0642731Z import triton.language as tl 2023-01-11T21:38:06.0642987Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0643274Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0643447Z 2023-01-11T21:38:06.0643452Z 2023-01-11T21:38:06.0643591Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0643931Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0644302Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0644555Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0644793Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0644994Z { 2023-01-11T21:38:06.0645182Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0645381Z { 2023-01-11T21:38:06.0645572Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0645784Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0645971Z { 2023-01-11T21:38:06.0646162Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.0646346Z { 2023-01-11T21:38:06.0646519Z { 2023-01-11T21:38:06.0646695Z { 2023-01-11T21:38:06.0646911Z auto tmp0 = in_ptr0[(2*i1) + (40*i0)]; 2023-01-11T21:38:06.0647162Z auto tmp1 = in_ptr0[20 + (2*i1) + (40*i0)]; 2023-01-11T21:38:06.0647403Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0647647Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.0647915Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:06.0648141Z auto tmp5 = tmp1 + tmp3; 2023-01-11T21:38:06.0648373Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0648600Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.0648839Z out_ptr1[i1 + (10*i0)] = tmp6; 2023-01-11T21:38:06.0649045Z } 2023-01-11T21:38:06.0649218Z } 2023-01-11T21:38:06.0649393Z } 2023-01-11T21:38:06.0649570Z } 2023-01-11T21:38:06.0649735Z } 2023-01-11T21:38:06.0649900Z } 2023-01-11T21:38:06.0650083Z ''') 2023-01-11T21:38:06.0650178Z 2023-01-11T21:38:06.0650190Z 2023-01-11T21:38:06.0650281Z async_compile.wait(globals()) 2023-01-11T21:38:06.0650494Z del async_compile 2023-01-11T21:38:06.0650611Z 2023-01-11T21:38:06.0650689Z def call(args): 2023-01-11T21:38:06.0650867Z arg0_1, = args 2023-01-11T21:38:06.0651057Z args.clear() 2023-01-11T21:38:06.0651375Z buf0 = empty_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0651734Z buf1 = empty_strided((2, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0652070Z 
kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0652334Z del arg0_1 2023-01-11T21:38:06.0652531Z return (buf0, buf1, ) 2023-01-11T21:38:06.0652648Z 2023-01-11T21:38:06.0652654Z 2023-01-11T21:38:06.0652737Z if __name__ == "__main__": 2023-01-11T21:38:06.0652974Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0653255Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0653614Z arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0653898Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0654048Z 2023-01-11T21:38:06.0654123Z ok (1.790s) 2023-01-11T21:38:06.0654834Z test_slice2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0655551Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0655982Z [2023-01-11 21:31:27,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 253 2023-01-11T21:38:06.0656449Z [2023-01-11 21:31:29,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 253 2023-01-11T21:38:06.0656658Z 2023-01-11T21:38:06.0656760Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0656967Z import torch 2023-01-11T21:38:06.0657230Z import random 2023-01-11T21:38:06.0657473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0657746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0657910Z 2023-01-11T21:38:06.0657995Z aten = torch.ops.aten 2023-01-11T21:38:06.0658256Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0658518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0658661Z 2023-01-11T21:38:06.0658737Z import triton 2023-01-11T21:38:06.0658948Z import triton.language as tl 2023-01-11T21:38:06.0659209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0659500Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0659673Z 2023-01-11T21:38:06.0659678Z 2023-01-11T21:38:06.0659819Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0660165Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0660517Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0660781Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0661104Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0661308Z { 2023-01-11T21:38:06.0661500Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0661704Z { 2023-01-11T21:38:06.0661887Z #pragma omp for 2023-01-11T21:38:06.0662087Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.0662283Z { 2023-01-11T21:38:06.0662454Z { 2023-01-11T21:38:06.0662620Z { 2023-01-11T21:38:06.0662831Z auto tmp0 = in_ptr0[1 + (4*i0)]; 2023-01-11T21:38:06.0663068Z auto tmp1 = in_ptr0[42 + (4*i0)]; 2023-01-11T21:38:06.0663295Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0663536Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.0663776Z auto tmp4 = tmp0 + tmp3; 2023-01-11T21:38:06.0664010Z auto tmp5 = static_cast(2); 2023-01-11T21:38:06.0664245Z auto 
tmp6 = tmp1 + tmp5; 2023-01-11T21:38:06.0664477Z auto tmp7 = tmp4 + tmp6; 2023-01-11T21:38:06.0664688Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0664905Z out_ptr1[i0] = tmp7; 2023-01-11T21:38:06.0665104Z } 2023-01-11T21:38:06.0665273Z } 2023-01-11T21:38:06.0665443Z } 2023-01-11T21:38:06.0665615Z } 2023-01-11T21:38:06.0665773Z } 2023-01-11T21:38:06.0665954Z ''') 2023-01-11T21:38:06.0666055Z 2023-01-11T21:38:06.0666060Z 2023-01-11T21:38:06.0666158Z async_compile.wait(globals()) 2023-01-11T21:38:06.0666362Z del async_compile 2023-01-11T21:38:06.0666477Z 2023-01-11T21:38:06.0666555Z def call(args): 2023-01-11T21:38:06.0666744Z arg0_1, = args 2023-01-11T21:38:06.0666931Z args.clear() 2023-01-11T21:38:06.0667240Z buf0 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0667604Z buf1 = empty_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0667943Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0668202Z del arg0_1 2023-01-11T21:38:06.0668394Z return (buf0, buf1, ) 2023-01-11T21:38:06.0668516Z 2023-01-11T21:38:06.0668521Z 2023-01-11T21:38:06.0668604Z if __name__ == "__main__": 2023-01-11T21:38:06.0668872Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0669153Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0669524Z arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0669806Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0669946Z 2023-01-11T21:38:06.0670022Z ok (1.760s) 2023-01-11T21:38:06.0670591Z test_slice_mutation1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0671185Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0671611Z [2023-01-11 21:31:29,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 254 2023-01-11T21:38:06.0672070Z [2023-01-11 21:31:31,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 254 2023-01-11T21:38:06.0672282Z 2023-01-11T21:38:06.0672384Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0672595Z import torch 2023-01-11T21:38:06.0672775Z import random 2023-01-11T21:38:06.0673005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0673286Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0673449Z 2023-01-11T21:38:06.0673535Z aten = torch.ops.aten 2023-01-11T21:38:06.0673789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0674087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0674222Z 2023-01-11T21:38:06.0674300Z import triton 2023-01-11T21:38:06.0674496Z import triton.language as tl 2023-01-11T21:38:06.0674754Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0675057Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0675233Z 2023-01-11T21:38:06.0675237Z 2023-01-11T21:38:06.0675372Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0675720Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0676072Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0676326Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0676554Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0676785Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0676991Z { 2023-01-11T21:38:06.0677187Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0677396Z { 2023-01-11T21:38:06.0677577Z #pragma omp for 2023-01-11T21:38:06.0677778Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0677971Z { 2023-01-11T21:38:06.0678219Z auto tmp0 = at::vec::Vectorized(static_cast(0)); 2023-01-11T21:38:06.0678522Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0678774Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0679000Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0679220Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0679418Z } 2023-01-11T21:38:06.0679620Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0679843Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0680033Z { 2023-01-11T21:38:06.0680242Z auto tmp0 = static_cast(0); 2023-01-11T21:38:06.0680486Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0680708Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0680921Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0681126Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0681306Z } 2023-01-11T21:38:06.0681491Z #pragma omp for 2023-01-11T21:38:06.0681699Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0681918Z { 2023-01-11T21:38:06.0682087Z { 2023-01-11T21:38:06.0682260Z { 2023-01-11T21:38:06.0682471Z auto tmp0 = static_cast(3.0); 2023-01-11T21:38:06.0682711Z out_ptr0[3 + (8*i0)] = tmp0; 2023-01-11T21:38:06.0682910Z } 2023-01-11T21:38:06.0683078Z } 2023-01-11T21:38:06.0683248Z } 2023-01-11T21:38:06.0683429Z #pragma omp for 2023-01-11T21:38:06.0683626Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0683813Z { 2023-01-11T21:38:06.0684052Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 
8*i0); 2023-01-11T21:38:06.0684309Z tmp0.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0684508Z } 2023-01-11T21:38:06.0684707Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0684919Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0685115Z { 2023-01-11T21:38:06.0685307Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0685515Z out_ptr2[i0] = tmp0; 2023-01-11T21:38:06.0685727Z } 2023-01-11T21:38:06.0685929Z #pragma omp for 2023-01-11T21:38:06.0686124Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0686315Z { 2023-01-11T21:38:06.0686559Z auto tmp0 = at::vec::Vectorized(static_cast(4.0)); 2023-01-11T21:38:06.0686828Z tmp0.store(out_ptr0 + 32 + (8*i0)); 2023-01-11T21:38:06.0687021Z } 2023-01-11T21:38:06.0687219Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0687437Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0687618Z { 2023-01-11T21:38:06.0687870Z auto tmp0 = static_cast(4.0); 2023-01-11T21:38:06.0688095Z out_ptr0[32 + i0] = tmp0; 2023-01-11T21:38:06.0688276Z } 2023-01-11T21:38:06.0688458Z #pragma omp for 2023-01-11T21:38:06.0688658Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0688837Z { 2023-01-11T21:38:06.0689076Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0689381Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0689629Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0689851Z tmp2.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.0690045Z } 2023-01-11T21:38:06.0690235Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0690452Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.0690642Z { 2023-01-11T21:38:06.0690820Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0691049Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0691276Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0691485Z out_ptr3[i0] = tmp2; 2023-01-11T21:38:06.0691664Z } 2023-01-11T21:38:06.0691829Z } 2023-01-11T21:38:06.0691991Z } 2023-01-11T21:38:06.0692161Z ''') 2023-01-11T21:38:06.0692261Z 2023-01-11T21:38:06.0692266Z 2023-01-11T21:38:06.0692366Z async_compile.wait(globals()) 2023-01-11T21:38:06.0692575Z del async_compile 2023-01-11T21:38:06.0692682Z 2023-01-11T21:38:06.0692760Z def call(args): 2023-01-11T21:38:06.0692943Z arg0_1, = args 2023-01-11T21:38:06.0693124Z args.clear() 2023-01-11T21:38:06.0693425Z buf0 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0693782Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694125Z buf3 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694461Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0694923Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.0695227Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.0695360Z 2023-01-11T21:38:06.0695365Z 2023-01-11T21:38:06.0695451Z if __name__ == "__main__": 2023-01-11T21:38:06.0695725Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0696002Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0696358Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0696628Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0696775Z 2023-01-11T21:38:06.0696848Z ok (1.815s) 2023-01-11T21:38:06.0697345Z test_slice_mutation2_cpu (__main__.CpuTests) ... 
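The graph 254 listing above is a compact picture of how inductor compiles in-place slice writes: the zero fill, the `x + 1`, the scalar store to column 3, the clone, the row-4 fill, and the final `+ 1` all fuse into the single kernel_cpp_0, and call() returns the four live buffers. A minimal sketch of the kind of function being compiled, assuming a plain torch.compile entry point; the body is reconstructed from the generated kernel, not copied from test_torchinductor.py:

import torch

def fn(a):
    # reconstructed pattern (assumption), matching the kernel above:
    x = torch.zeros_like(a)
    b = x + 1        # out_ptr1
    x[:, 3] = 3.0    # the scalar store out_ptr0[3 + (8*i0)]
    c = x.clone()    # the loadu/store copy into out_ptr2
    x[4, :] = 4.0    # the fill at out_ptr0 + 32
    d = x + 1        # out_ptr3
    return x, b, c, d

compiled = torch.compile(fn, backend="inductor")
print(compiled(torch.randn(8, 8)))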
[2023-01-11 21:31:31,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 255 2023-01-11T21:38:06.0697855Z [2023-01-11 21:31:32,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 255 2023-01-11T21:38:06.0698067Z 2023-01-11T21:38:06.0698160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0698367Z import torch 2023-01-11T21:38:06.0698553Z import random 2023-01-11T21:38:06.0698776Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0699055Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0699214Z 2023-01-11T21:38:06.0699298Z aten = torch.ops.aten 2023-01-11T21:38:06.0699558Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0699813Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0699952Z 2023-01-11T21:38:06.0700028Z import triton 2023-01-11T21:38:06.0700228Z import triton.language as tl 2023-01-11T21:38:06.0700475Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0700769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0700941Z 2023-01-11T21:38:06.0700945Z 2023-01-11T21:38:06.0701124Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0701463Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0701815Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0702072Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0702310Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0702534Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0702736Z { 2023-01-11T21:38:06.0702931Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0703126Z { 2023-01-11T21:38:06.0703307Z #pragma omp for 2023-01-11T21:38:06.0703510Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0703692Z { 2023-01-11T21:38:06.0703937Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 20 + (8*i0)); 2023-01-11T21:38:06.0704246Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0704496Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0704720Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0704917Z } 2023-01-11T21:38:06.0705113Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0705337Z for(long i0=16; i0<20; i0+=1) 2023-01-11T21:38:06.0705527Z { 2023-01-11T21:38:06.0705717Z auto tmp0 = in_ptr0[20 + i0]; 2023-01-11T21:38:06.0705952Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0706176Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0706387Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0706567Z } 2023-01-11T21:38:06.0706746Z #pragma omp for 2023-01-11T21:38:06.0706952Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0707133Z { 2023-01-11T21:38:06.0707373Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0707639Z tmp0.store(out_ptr1 + 20 + (8*i0)); 2023-01-11T21:38:06.0707834Z } 2023-01-11T21:38:06.0708029Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0708250Z for(long i0=16; i0<20; i0+=1) 2023-01-11T21:38:06.0708431Z { 2023-01-11T21:38:06.0708619Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.0708830Z out_ptr1[20 + i0] = tmp0; 2023-01-11T21:38:06.0709009Z } 2023-01-11T21:38:06.0709227Z #pragma omp for 2023-01-11T21:38:06.0709433Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0709614Z { 2023-01-11T21:38:06.0709855Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 1 + (8*i0)); 2023-01-11T21:38:06.0710161Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 
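// out_ptr1 is the same storage as in_ptr0: call() below passes
// arg0_1.data_ptr() for both, so the loads here and the stores in the copy
// loop above touch the argument itself; the graph mutates its input in
// place, which is why call() returns an empty tuple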
2023-01-11T21:38:06.0710406Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0710627Z tmp2.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0710821Z } 2023-01-11T21:38:06.0711016Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0711224Z for(long i0=8; i0<9; i0+=1) 2023-01-11T21:38:06.0711413Z { 2023-01-11T21:38:06.0711604Z auto tmp0 = out_ptr1[1 + i0]; 2023-01-11T21:38:06.0711832Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.0712055Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0712268Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.0712333Z } 2023-01-11T21:38:06.0712416Z #pragma omp for 2023-01-11T21:38:06.0712502Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0712570Z { 2023-01-11T21:38:06.0712709Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.0712813Z tmp0.store(out_ptr1 + 2 + (8*i0)); 2023-01-11T21:38:06.0712880Z } 2023-01-11T21:38:06.0712973Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0713059Z for(long i0=8; i0<9; i0+=1) 2023-01-11T21:38:06.0713127Z { 2023-01-11T21:38:06.0713216Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.0713337Z out_ptr1[2 + i0] = tmp0; 2023-01-11T21:38:06.0713405Z } 2023-01-11T21:38:06.0713477Z } 2023-01-11T21:38:06.0713536Z } 2023-01-11T21:38:06.0713625Z ''') 2023-01-11T21:38:06.0713631Z 2023-01-11T21:38:06.0713635Z 2023-01-11T21:38:06.0713730Z async_compile.wait(globals()) 2023-01-11T21:38:06.0713812Z del async_compile 2023-01-11T21:38:06.0713820Z 2023-01-11T21:38:06.0713899Z def call(args): 2023-01-11T21:38:06.0713976Z arg0_1, = args 2023-01-11T21:38:06.0714052Z args.clear() 2023-01-11T21:38:06.0714247Z buf0 = empty_strided((1, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0714442Z buf2 = empty_strided((1, 9), (9, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0714643Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0714718Z del arg0_1 2023-01-11T21:38:06.0714789Z return () 2023-01-11T21:38:06.0714794Z 2023-01-11T21:38:06.0714801Z 2023-01-11T21:38:06.0714884Z if __name__ == "__main__": 2023-01-11T21:38:06.0715003Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0715144Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0715370Z arg0_1 = rand_strided((1, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0715491Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0715497Z 2023-01-11T21:38:06.0715570Z ok (1.771s) 2023-01-11T21:38:06.0716032Z test_slice_scatter2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0716168Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0716431Z [2023-01-11 21:31:32,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 256 2023-01-11T21:38:06.0716698Z [2023-01-11 21:31:34,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 256 2023-01-11T21:38:06.0716704Z 2023-01-11T21:38:06.0716833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0716912Z import torch 2023-01-11T21:38:06.0716982Z import random 2023-01-11T21:38:06.0717102Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0717233Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0717238Z 2023-01-11T21:38:06.0717323Z aten = torch.ops.aten 2023-01-11T21:38:06.0717462Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0717561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0717566Z 2023-01-11T21:38:06.0717643Z import triton 2023-01-11T21:38:06.0717737Z import triton.language as tl 2023-01-11T21:38:06.0717856Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0718002Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0718007Z 2023-01-11T21:38:06.0718012Z 2023-01-11T21:38:06.0718149Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0718357Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0718483Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0718588Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0718653Z { 2023-01-11T21:38:06.0718754Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0718815Z { 2023-01-11T21:38:06.0718900Z #pragma omp for 2023-01-11T21:38:06.0718988Z for(long i0=0; i0<75648; i0+=1) 2023-01-11T21:38:06.0719054Z { 2023-01-11T21:38:06.0719194Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0719294Z tmp0.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0719391Z } 2023-01-11T21:38:06.0719484Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0719580Z for(long i0=605184; i0<605184; i0+=1) 2023-01-11T21:38:06.0719648Z { 2023-01-11T21:38:06.0719739Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0719826Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.0719895Z } 2023-01-11T21:38:06.0719956Z } 2023-01-11T21:38:06.0720021Z } 2023-01-11T21:38:06.0720106Z ''') 2023-01-11T21:38:06.0720112Z 2023-01-11T21:38:06.0720117Z 2023-01-11T21:38:06.0720210Z async_compile.wait(globals()) 2023-01-11T21:38:06.0720289Z del async_compile 2023-01-11T21:38:06.0720294Z 2023-01-11T21:38:06.0720370Z def call(args): 2023-01-11T21:38:06.0720451Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0720527Z args.clear() 2023-01-11T21:38:06.0720735Z buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0720874Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0720952Z del arg1_1 2023-01-11T21:38:06.0721028Z return (buf0, ) 2023-01-11T21:38:06.0721034Z 2023-01-11T21:38:06.0721038Z 2023-01-11T21:38:06.0721118Z if __name__ == "__main__": 2023-01-11T21:38:06.0721235Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0721364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0721581Z arg0_1 = rand_strided((8, 197, 384), 
(75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0721788Z arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0721910Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0721915Z 2023-01-11T21:38:06.0721986Z ok (1.732s) 2023-01-11T21:38:06.0722446Z test_slice_scatter_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0722580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0722869Z [2023-01-11 21:31:34,712] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 257 2023-01-11T21:38:06.0723135Z [2023-01-11 21:31:36,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 257 2023-01-11T21:38:06.0723140Z 2023-01-11T21:38:06.0723240Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0723316Z import torch 2023-01-11T21:38:06.0723385Z import random 2023-01-11T21:38:06.0723506Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0723631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0723636Z 2023-01-11T21:38:06.0723723Z aten = torch.ops.aten 2023-01-11T21:38:06.0723864Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0723960Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0723966Z 2023-01-11T21:38:06.0724040Z import triton 2023-01-11T21:38:06.0724133Z import triton.language as tl 2023-01-11T21:38:06.0724253Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0724396Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0724402Z 2023-01-11T21:38:06.0724406Z 2023-01-11T21:38:06.0724544Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0724754Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0724879Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0724991Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0725096Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0725206Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0725310Z { 2023-01-11T21:38:06.0725432Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0725505Z { 2023-01-11T21:38:06.0725587Z #pragma omp for 2023-01-11T21:38:06.0725675Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0725742Z { 2023-01-11T21:38:06.0725823Z #pragma GCC ivdep 2023-01-11T21:38:06.0725915Z for(long i1=0; i1<100; i1+=1) 2023-01-11T21:38:06.0725984Z { 2023-01-11T21:38:06.0726054Z { 2023-01-11T21:38:06.0726127Z { 2023-01-11T21:38:06.0726241Z auto tmp8 = in_ptr1[i1 + (100*i0)]; 2023-01-11T21:38:06.0726353Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.0726456Z auto tmp1 = static_cast(10); 2023-01-11T21:38:06.0726556Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:06.0726667Z auto tmp3 = static_cast(90); 2023-01-11T21:38:06.0726769Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.0726872Z auto tmp5 = tmp2 & tmp4; 2023-01-11T21:38:06.0726963Z float tmp6 = 0.0; 2023-01-11T21:38:06.0727046Z if(tmp5) 2023-01-11T21:38:06.0727113Z { 2023-01-11T21:38:06.0727290Z auto tmp7 = 
in_ptr0[(-10) + i1 + (80*i0)]; 2023-01-11T21:38:06.0727379Z tmp6 = tmp7; 2023-01-11T21:38:06.0727453Z } 2023-01-11T21:38:06.0727560Z auto tmp9 = tmp5 ? tmp6 : tmp8; 2023-01-11T21:38:06.0727676Z auto tmp10 = static_cast(i1 % 2); 2023-01-11T21:38:06.0727784Z auto tmp11 = static_cast(0); 2023-01-11T21:38:06.0727886Z auto tmp12 = tmp10 == tmp11; 2023-01-11T21:38:06.0727979Z auto tmp13 = tmp5 & tmp12; 2023-01-11T21:38:06.0728072Z float tmp14 = 0.0; 2023-01-11T21:38:06.0728158Z if(tmp13) 2023-01-11T21:38:06.0728233Z { 2023-01-11T21:38:06.0728412Z auto tmp15 = in_ptr0[(-5) + (80*i0) + (i1 / 2)]; 2023-01-11T21:38:06.0728503Z tmp14 = tmp15; 2023-01-11T21:38:06.0728608Z } 2023-01-11T21:38:06.0728714Z auto tmp16 = tmp13 ? tmp14 : tmp8; 2023-01-11T21:38:06.0728816Z out_ptr0[i1 + (100*i0)] = tmp9; 2023-01-11T21:38:06.0728919Z out_ptr1[i1 + (100*i0)] = tmp16; 2023-01-11T21:38:06.0728992Z } 2023-01-11T21:38:06.0729061Z } 2023-01-11T21:38:06.0729131Z } 2023-01-11T21:38:06.0729199Z } 2023-01-11T21:38:06.0729259Z } 2023-01-11T21:38:06.0729324Z } 2023-01-11T21:38:06.0729410Z ''') 2023-01-11T21:38:06.0729415Z 2023-01-11T21:38:06.0729420Z 2023-01-11T21:38:06.0729515Z async_compile.wait(globals()) 2023-01-11T21:38:06.0729600Z del async_compile 2023-01-11T21:38:06.0729605Z 2023-01-11T21:38:06.0729680Z def call(args): 2023-01-11T21:38:06.0729760Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0729830Z args.clear() 2023-01-11T21:38:06.0730042Z buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0730255Z buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0730450Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0730523Z del arg0_1 2023-01-11T21:38:06.0730595Z del arg1_1 2023-01-11T21:38:06.0730676Z return (buf0, buf1, ) 2023-01-11T21:38:06.0730681Z 2023-01-11T21:38:06.0730685Z 2023-01-11T21:38:06.0730766Z if __name__ == "__main__": 2023-01-11T21:38:06.0730878Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0731004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0731245Z arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0731452Z arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0731573Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0731581Z 2023-01-11T21:38:06.0731652Z ok (1.753s) 2023-01-11T21:38:06.0732106Z test_softmax_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0732239Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0732499Z [2023-01-11 21:31:36,469] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 258 2023-01-11T21:38:06.0732507Z 2023-01-11T21:38:06.0732599Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0732674Z import torch 2023-01-11T21:38:06.0732749Z import random 2023-01-11T21:38:06.0732871Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0732999Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0733004Z 2023-01-11T21:38:06.0733088Z aten = torch.ops.aten 2023-01-11T21:38:06.0733224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0733320Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0733326Z 2023-01-11T21:38:06.0733394Z import triton 2023-01-11T21:38:06.0733488Z import triton.language as tl 2023-01-11T21:38:06.0733613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0733752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0733758Z 2023-01-11T21:38:06.0733762Z 2023-01-11T21:38:06.0733903Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0734109Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0734230Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0734337Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0734463Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.0734706Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0734816Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0734921Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0735023Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0735124Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0735224Z float* __restrict__ out_ptr6, 2023-01-11T21:38:06.0735317Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0735420Z float* __restrict__ out_ptr8) 2023-01-11T21:38:06.0735490Z { 2023-01-11T21:38:06.0735583Z auto out_ptr3 = in_out_ptr0; 2023-01-11T21:38:06.0735673Z auto out_ptr4 = in_out_ptr1; 2023-01-11T21:38:06.0735762Z auto out_ptr5 = in_out_ptr2; 2023-01-11T21:38:06.0735864Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0735926Z { 2023-01-11T21:38:06.0736009Z #pragma omp for 2023-01-11T21:38:06.0736097Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0736165Z { 2023-01-11T21:38:06.0736234Z { 2023-01-11T21:38:06.0736600Z #pragma omp declare reduction(max:at::vec::Vectorized:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits::infinity()}}) 2023-01-11T21:38:06.0736844Z float tmp3 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0736973Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.0737284Z float tmp4 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0737417Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0737512Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0737583Z { 2023-01-11T21:38:06.0737736Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0737885Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0737986Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0738109Z tmp3_vec = at::vec::maximum(tmp3_vec, tmp2); 2023-01-11T21:38:06.0738225Z tmp4_vec = 
at::vec::maximum(tmp4_vec, tmp1); 2023-01-11T21:38:06.0738295Z } 2023-01-11T21:38:06.0738512Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp3_vec); 2023-01-11T21:38:06.0738719Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return at::vec::maximum(x, y);}, tmp4_vec); 2023-01-11T21:38:06.0738872Z #pragma omp simd simdlen(4) reduction(max:tmp3) reduction(max:tmp4) 2023-01-11T21:38:06.0738971Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0739043Z { 2023-01-11T21:38:06.0739150Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0739253Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.0739344Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0739451Z tmp3 = std::max(tmp3, tmp2); 2023-01-11T21:38:06.0739557Z tmp4 = std::max(tmp4, tmp1); 2023-01-11T21:38:06.0739627Z } 2023-01-11T21:38:06.0739714Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.0739801Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.0739875Z } 2023-01-11T21:38:06.0739936Z } 2023-01-11T21:38:06.0740019Z #pragma omp for 2023-01-11T21:38:06.0740106Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0740173Z { 2023-01-11T21:38:06.0740243Z { 2023-01-11T21:38:06.0740312Z { 2023-01-11T21:38:06.0740578Z float tmp1 = -std::numeric_limits::infinity(); 2023-01-11T21:38:06.0740671Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0740745Z { 2023-01-11T21:38:06.0740820Z { 2023-01-11T21:38:06.0740931Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:06.0741043Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:06.0741118Z } 2023-01-11T21:38:06.0741191Z } 2023-01-11T21:38:06.0741276Z out_ptr2[i0] = tmp1; 2023-01-11T21:38:06.0741346Z } 2023-01-11T21:38:06.0741419Z } 2023-01-11T21:38:06.0741487Z } 2023-01-11T21:38:06.0741570Z #pragma omp for 2023-01-11T21:38:06.0741655Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0741716Z { 2023-01-11T21:38:06.0741785Z { 2023-01-11T21:38:06.0741981Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0742071Z float tmp12 = 0; 2023-01-11T21:38:06.0742200Z auto tmp12_vec = at::vec::Vectorized(tmp12); 2023-01-11T21:38:06.0742295Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0742366Z { 2023-01-11T21:38:06.0742515Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0742655Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0742787Z auto tmp3 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.0742976Z auto tmp6 = at::vec::Vectorized::loadu(out_ptr2 + 8*i1); 2023-01-11T21:38:06.0743110Z auto tmp9 = at::vec::Vectorized(out_ptr1[i0]); 2023-01-11T21:38:06.0743210Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0743354Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:06.0743452Z auto tmp5 = tmp4.exp(); 2023-01-11T21:38:06.0743591Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0743679Z auto tmp8 = tmp7.exp(); 2023-01-11T21:38:06.0743820Z auto tmp10 = tmp1 - tmp9; 2023-01-11T21:38:06.0743919Z auto tmp11 = tmp10.exp(); 2023-01-11T21:38:06.0744033Z tmp5.store(out_ptr3 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744145Z tmp8.store(out_ptr4 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744261Z tmp11.store(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0744355Z tmp12_vec += tmp5; 2023-01-11T21:38:06.0744418Z } 2023-01-11T21:38:06.0744620Z tmp12 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp12_vec); 2023-01-11T21:38:06.0744748Z #pragma omp simd 
simdlen(4) reduction(+:tmp12) 2023-01-11T21:38:06.0744846Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0744914Z { 2023-01-11T21:38:06.0745019Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0745121Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.0745218Z auto tmp3 = out_ptr0[i0]; 2023-01-11T21:38:06.0745309Z auto tmp6 = out_ptr2[i1]; 2023-01-11T21:38:06.0745405Z auto tmp9 = out_ptr1[i0]; 2023-01-11T21:38:06.0745502Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0745641Z auto tmp4 = tmp2 - tmp3; 2023-01-11T21:38:06.0745750Z auto tmp5 = std::exp(tmp4); 2023-01-11T21:38:06.0745890Z auto tmp7 = tmp0 - tmp6; 2023-01-11T21:38:06.0745995Z auto tmp8 = std::exp(tmp7); 2023-01-11T21:38:06.0746126Z auto tmp10 = tmp1 - tmp9; 2023-01-11T21:38:06.0746264Z auto tmp11 = std::exp(tmp10); 2023-01-11T21:38:06.0746365Z out_ptr3[i1 + (8*i0)] = tmp5; 2023-01-11T21:38:06.0746462Z out_ptr4[i1 + (8*i0)] = tmp8; 2023-01-11T21:38:06.0746561Z out_ptr5[i1 + (8*i0)] = tmp11; 2023-01-11T21:38:06.0746648Z tmp12 += tmp5; 2023-01-11T21:38:06.0746719Z } 2023-01-11T21:38:06.0746812Z out_ptr6[i0] = tmp12; 2023-01-11T21:38:06.0746874Z } 2023-01-11T21:38:06.0746943Z } 2023-01-11T21:38:06.0747026Z #pragma omp for 2023-01-11T21:38:06.0747114Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0747186Z { 2023-01-11T21:38:06.0747274Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0747336Z { 2023-01-11T21:38:06.0747489Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr3 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0747627Z auto tmp1 = at::vec::Vectorized(out_ptr6[i0]); 2023-01-11T21:38:06.0747721Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0747834Z tmp2.store(in_out_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0747905Z } 2023-01-11T21:38:06.0748003Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0748090Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0748153Z { 2023-01-11T21:38:06.0748255Z auto tmp0 = out_ptr3[i1 + (8*i0)]; 2023-01-11T21:38:06.0748349Z auto tmp1 = out_ptr6[i0]; 2023-01-11T21:38:06.0748444Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0748543Z in_out_ptr0[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0748646Z } 2023-01-11T21:38:06.0748715Z } 2023-01-11T21:38:06.0748791Z #pragma omp for 2023-01-11T21:38:06.0748877Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0748947Z { 2023-01-11T21:38:06.0749014Z { 2023-01-11T21:38:06.0749083Z { 2023-01-11T21:38:06.0749172Z float tmp1 = 0; 2023-01-11T21:38:06.0749262Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0749335Z { 2023-01-11T21:38:06.0749409Z { 2023-01-11T21:38:06.0749521Z auto tmp0 = out_ptr4[i0 + (8*i1)]; 2023-01-11T21:38:06.0749610Z tmp1 += tmp0; 2023-01-11T21:38:06.0749683Z } 2023-01-11T21:38:06.0749754Z } 2023-01-11T21:38:06.0749837Z out_ptr7[i0] = tmp1; 2023-01-11T21:38:06.0749908Z } 2023-01-11T21:38:06.0749976Z } 2023-01-11T21:38:06.0750047Z } 2023-01-11T21:38:06.0750129Z #pragma omp for 2023-01-11T21:38:06.0750217Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0750288Z { 2023-01-11T21:38:06.0750369Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0750438Z { 2023-01-11T21:38:06.0750590Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr4 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0750733Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr7 + 8*i1); 2023-01-11T21:38:06.0750828Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0750942Z tmp2.store(in_out_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0751013Z } 2023-01-11T21:38:06.0751103Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0751191Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0751260Z { 
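// scalar tail loop for the vector remainder: the trip count here is zero
// (i1 runs from 8 to 8) because the row length is an exact multiple of the
// 8-lane float vector width, so the body is emitted but never executes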
2023-01-11T21:38:06.0751362Z auto tmp0 = out_ptr4[i1 + (8*i0)]; 2023-01-11T21:38:06.0751458Z auto tmp1 = out_ptr7[i1]; 2023-01-11T21:38:06.0751552Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0751656Z in_out_ptr1[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0751717Z } 2023-01-11T21:38:06.0751787Z } 2023-01-11T21:38:06.0751870Z #pragma omp for 2023-01-11T21:38:06.0751986Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0752057Z { 2023-01-11T21:38:06.0752127Z { 2023-01-11T21:38:06.0752322Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0752401Z float tmp1 = 0; 2023-01-11T21:38:06.0752530Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0752626Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0752697Z { 2023-01-11T21:38:06.0752847Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0752939Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0753011Z } 2023-01-11T21:38:06.0753220Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0753342Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0753437Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0753507Z { 2023-01-11T21:38:06.0753614Z auto tmp0 = out_ptr5[i1 + (8*i0)]; 2023-01-11T21:38:06.0753698Z tmp1 += tmp0; 2023-01-11T21:38:06.0753769Z } 2023-01-11T21:38:06.0753858Z out_ptr8[i0] = tmp1; 2023-01-11T21:38:06.0753920Z } 2023-01-11T21:38:06.0753987Z } 2023-01-11T21:38:06.0754070Z #pragma omp for 2023-01-11T21:38:06.0754157Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0754227Z { 2023-01-11T21:38:06.0754347Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0754418Z { 2023-01-11T21:38:06.0754556Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr5 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0754690Z auto tmp1 = at::vec::Vectorized(out_ptr8[i0]); 2023-01-11T21:38:06.0754786Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0754896Z tmp2.store(in_out_ptr2 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0754966Z } 2023-01-11T21:38:06.0755062Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0755150Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0755212Z { 2023-01-11T21:38:06.0755317Z auto tmp0 = out_ptr5[i1 + (8*i0)]; 2023-01-11T21:38:06.0755411Z auto tmp1 = out_ptr8[i0]; 2023-01-11T21:38:06.0755503Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0755604Z in_out_ptr2[i1 + (8*i0)] = tmp2; 2023-01-11T21:38:06.0755679Z } 2023-01-11T21:38:06.0755747Z } 2023-01-11T21:38:06.0755807Z } 2023-01-11T21:38:06.0755873Z } 2023-01-11T21:38:06.0755961Z ''') 2023-01-11T21:38:06.0755967Z 2023-01-11T21:38:06.0755973Z 2023-01-11T21:38:06.0756067Z async_compile.wait(globals()) 2023-01-11T21:38:06.0756146Z del async_compile 2023-01-11T21:38:06.0756151Z 2023-01-11T21:38:06.0756229Z def call(args): 2023-01-11T21:38:06.0756310Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0756380Z args.clear() 2023-01-11T21:38:06.0756576Z buf0 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0756770Z buf8 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0756960Z buf4 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757150Z buf1 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757336Z buf5 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757527Z buf9 = empty_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 
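    # the three (8, 8) buffers above hold the exponentials for the three
    # softmaxes; the (8, 1) / (1, 8) buffers hold their row- and column-wise
    # maxima and sums. The `bufN = bufM; del bufM  # reuse` lines below mark
    # inductor reusing a dead buffer's storage instead of allocating afresh.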
2023-01-11T21:38:06.0757711Z buf2 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0757796Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0758013Z buf6 = empty_strided((1, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0758106Z buf7 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0758301Z buf10 = empty_strided((8, 1), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0758392Z buf11 = buf9; del buf9 # reuse 2023-01-11T21:38:06.0758740Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf11.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf10.data_ptr())) 2023-01-11T21:38:06.0758814Z del arg0_1 2023-01-11T21:38:06.0758890Z del arg1_1 2023-01-11T21:38:06.0758975Z return (buf3, buf7, buf11, ) 2023-01-11T21:38:06.0758991Z 2023-01-11T21:38:06.0758996Z 2023-01-11T21:38:06.0759071Z if __name__ == "__main__": 2023-01-11T21:38:06.0759191Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0759320Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0759518Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0759715Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0759837Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0760107Z [2023-01-11 21:31:38,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 258 2023-01-11T21:38:06.0760113Z 2023-01-11T21:38:06.0760186Z ok (1.876s) 2023-01-11T21:38:06.0760640Z test_softmax_one_kernel_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0760811Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0761075Z [2023-01-11 21:31:38,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 259 2023-01-11T21:38:06.0761338Z [2023-01-11 21:31:40,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 259 2023-01-11T21:38:06.0761344Z 2023-01-11T21:38:06.0761443Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0761520Z import torch 2023-01-11T21:38:06.0761596Z import random 2023-01-11T21:38:06.0761719Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0761843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0761852Z 2023-01-11T21:38:06.0761929Z aten = torch.ops.aten 2023-01-11T21:38:06.0762067Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0762166Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0762170Z 2023-01-11T21:38:06.0762247Z import triton 2023-01-11T21:38:06.0762344Z import triton.language as tl 2023-01-11T21:38:06.0762471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0762610Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0762616Z 2023-01-11T21:38:06.0762620Z 2023-01-11T21:38:06.0762757Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0762959Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0763080Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0763192Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0763298Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0763401Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0763468Z { 2023-01-11T21:38:06.0763561Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.0763656Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0763724Z { 2023-01-11T21:38:06.0763805Z #pragma omp for 2023-01-11T21:38:06.0763922Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.0763992Z { 2023-01-11T21:38:06.0764062Z { 2023-01-11T21:38:06.0764428Z #pragma omp declare reduction(max:at::vec::Vectorized<float>:omp_out = at::vec::maximum(omp_out, omp_in)) initializer(omp_priv={{-std::numeric_limits<float>::infinity()}}) 2023-01-11T21:38:06.0764641Z float tmp1 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.0764762Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1); 2023-01-11T21:38:06.0764859Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0764932Z { 2023-01-11T21:38:06.0765087Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0765212Z tmp1_vec = at::vec::maximum(tmp1_vec, tmp0); 2023-01-11T21:38:06.0765283Z } 2023-01-11T21:38:06.0765500Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return at::vec::maximum(x, y);}, tmp1_vec); 2023-01-11T21:38:06.0765628Z #pragma omp simd simdlen(4) reduction(max:tmp1) 2023-01-11T21:38:06.0765717Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0765786Z { 2023-01-11T21:38:06.0765894Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0766000Z tmp1 = std::max(tmp1, tmp0); 2023-01-11T21:38:06.0766069Z } 2023-01-11T21:38:06.0766157Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.0766226Z } 2023-01-11T21:38:06.0766320Z } 2023-01-11T21:38:06.0766402Z #pragma omp for 2023-01-11T21:38:06.0766489Z for(long i0=0; i0<16; i0+=1) 
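// second pass: exp(x * rowmax) and its row sum accumulated in one fused
// loop; note the generated code multiplies by the row maximum
// (tmp2 = tmp0 * tmp1) rather than subtracting it, so this mirrors the
// test's graph, not the usual max-subtraction stabilization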
2023-01-11T21:38:06.0766557Z { 2023-01-11T21:38:06.0766626Z { 2023-01-11T21:38:06.0766824Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0766910Z float tmp4 = 0; 2023-01-11T21:38:06.0767030Z auto tmp4_vec = at::vec::Vectorized(tmp4); 2023-01-11T21:38:06.0767127Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0767197Z { 2023-01-11T21:38:06.0767344Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0767479Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.0767582Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0767680Z auto tmp3 = tmp2.exp(); 2023-01-11T21:38:06.0767798Z tmp3.store(out_ptr1 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0767880Z tmp4_vec += tmp3; 2023-01-11T21:38:06.0767949Z } 2023-01-11T21:38:06.0768151Z tmp4 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp4_vec); 2023-01-11T21:38:06.0768281Z #pragma omp simd simdlen(4) reduction(+:tmp4) 2023-01-11T21:38:06.0768378Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0768449Z { 2023-01-11T21:38:06.0768554Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0768653Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.0768744Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0768850Z auto tmp3 = std::exp(tmp2); 2023-01-11T21:38:06.0768949Z out_ptr1[i1 + (32*i0)] = tmp3; 2023-01-11T21:38:06.0769036Z tmp4 += tmp3; 2023-01-11T21:38:06.0769108Z } 2023-01-11T21:38:06.0769195Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:06.0769264Z } 2023-01-11T21:38:06.0769325Z } 2023-01-11T21:38:06.0769408Z #pragma omp for 2023-01-11T21:38:06.0769495Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.0769595Z { 2023-01-11T21:38:06.0769687Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0769756Z { 2023-01-11T21:38:06.0769899Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0770032Z auto tmp1 = at::vec::Vectorized(out_ptr2[i0]); 2023-01-11T21:38:06.0770126Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0770241Z tmp2.store(in_out_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0770312Z } 2023-01-11T21:38:06.0770410Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0770501Z for(long i1=32; i1<32; i1+=1) 2023-01-11T21:38:06.0770573Z { 2023-01-11T21:38:06.0770669Z auto tmp0 = out_ptr1[i1 + (32*i0)]; 2023-01-11T21:38:06.0770763Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:06.0770856Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0770959Z in_out_ptr0[i1 + (32*i0)] = tmp2; 2023-01-11T21:38:06.0771028Z } 2023-01-11T21:38:06.0771096Z } 2023-01-11T21:38:06.0771163Z } 2023-01-11T21:38:06.0771222Z } 2023-01-11T21:38:06.0771311Z ''') 2023-01-11T21:38:06.0771316Z 2023-01-11T21:38:06.0771321Z 2023-01-11T21:38:06.0771419Z async_compile.wait(globals()) 2023-01-11T21:38:06.0771497Z del async_compile 2023-01-11T21:38:06.0771503Z 2023-01-11T21:38:06.0771576Z def call(args): 2023-01-11T21:38:06.0771651Z arg0_1, = args 2023-01-11T21:38:06.0771730Z args.clear() 2023-01-11T21:38:06.0771922Z buf0 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772151Z buf1 = empty_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772347Z buf2 = empty_strided((16, 1), (1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0772439Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0772635Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 
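    # the max / exp / sum / divide chain compiled into the single kernel_cpp_0
    # above (hence the test name), with buf3 taking over buf1's storage via
    # `buf3 = buf1; del buf1  # reuse`. A plausible source for the graph,
    # reconstructed from the kernel and not copied from the test file (note
    # the multiply by x_max, matching tmp2 = tmp0 * tmp1 above):
    #     x_max = x.amax(1, keepdim=True)
    #     e = torch.exp(x * x_max)
    #     return e / e.sum(1, keepdim=True)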
2023-01-11T21:38:06.0772709Z del arg0_1 2023-01-11T21:38:06.0772788Z return (buf3, ) 2023-01-11T21:38:06.0772793Z 2023-01-11T21:38:06.0772798Z 2023-01-11T21:38:06.0772879Z if __name__ == "__main__": 2023-01-11T21:38:06.0772991Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0773119Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0773322Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0773435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0773440Z 2023-01-11T21:38:06.0773516Z ok (1.751s) 2023-01-11T21:38:06.0773971Z test_sort_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0774105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0774367Z [2023-01-11 21:31:40,069] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 260 2023-01-11T21:38:06.0774701Z [2023-01-11 21:31:40,074] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.0774963Z [2023-01-11 21:31:40,077] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 260 2023-01-11T21:38:06.0774978Z 2023-01-11T21:38:06.0775071Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0775150Z import torch 2023-01-11T21:38:06.0775244Z import random 2023-01-11T21:38:06.0775380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0775520Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0775526Z 2023-01-11T21:38:06.0775608Z aten = torch.ops.aten 2023-01-11T21:38:06.0775803Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0775896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0775901Z 2023-01-11T21:38:06.0775977Z import triton 2023-01-11T21:38:06.0776071Z import triton.language as tl 2023-01-11T21:38:06.0776197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0776338Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0776344Z 2023-01-11T21:38:06.0776348Z 2023-01-11T21:38:06.0776442Z async_compile.wait(globals()) 2023-01-11T21:38:06.0776520Z del async_compile 2023-01-11T21:38:06.0776525Z 2023-01-11T21:38:06.0776601Z def call(args): 2023-01-11T21:38:06.0776672Z arg0_1, = args 2023-01-11T21:38:06.0776747Z args.clear() 2023-01-11T21:38:06.0776837Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.0776910Z del arg0_1 2023-01-11T21:38:06.0776984Z buf1 = buf0[0] 2023-01-11T21:38:06.0777098Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.0777234Z buf2 = buf0[1] 2023-01-11T21:38:06.0777358Z assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.0777432Z del buf0 2023-01-11T21:38:06.0777515Z return (buf1, buf2, ) 2023-01-11T21:38:06.0777520Z 2023-01-11T21:38:06.0777525Z 2023-01-11T21:38:06.0777606Z if __name__ == "__main__": 2023-01-11T21:38:06.0777725Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0777853Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0778071Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 
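    # aten.sort has no inductor lowering, so this graph emits no kernel_cpp
    # at all: call() above invokes the ATen kernel directly (hence the
    # `[WARNING] Using FallbackKernel: aten.sort` line) and only checks the
    # fallback's output layout with assert_size_stride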
2023-01-11T21:38:06.0778177Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0778230Z 2023-01-11T21:38:06.0778296Z ok (0.022s) 2023-01-11T21:38:06.0778755Z test_split_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0778889Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0779150Z [2023-01-11 21:31:40,098] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 261 2023-01-11T21:38:06.0779415Z [2023-01-11 21:31:40,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 261 2023-01-11T21:38:06.0779832Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0779970Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0780227Z [2023-01-11 21:31:40,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 262 2023-01-11T21:38:06.0780490Z [2023-01-11 21:31:41,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 262 2023-01-11T21:38:06.0780495Z 2023-01-11T21:38:06.0780594Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0780662Z import torch 2023-01-11T21:38:06.0780739Z import random 2023-01-11T21:38:06.0780859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0780984Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0780992Z 2023-01-11T21:38:06.0781077Z aten = torch.ops.aten 2023-01-11T21:38:06.0781215Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0781312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0781317Z 2023-01-11T21:38:06.0781392Z import triton 2023-01-11T21:38:06.0781478Z import triton.language as tl 2023-01-11T21:38:06.0781679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0781823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0781828Z 2023-01-11T21:38:06.0781833Z 2023-01-11T21:38:06.0781926Z async_compile.wait(globals()) 2023-01-11T21:38:06.0782005Z del async_compile 2023-01-11T21:38:06.0782010Z 2023-01-11T21:38:06.0782085Z def call(args): 2023-01-11T21:38:06.0782162Z arg0_1, = args 2023-01-11T21:38:06.0782231Z args.clear() 2023-01-11T21:38:06.0782433Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.0782442Z 2023-01-11T21:38:06.0782446Z 2023-01-11T21:38:06.0782528Z if __name__ == "__main__": 2023-01-11T21:38:06.0782648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0782777Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0782989Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0783102Z 
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0783107Z 2023-01-11T21:38:06.0783111Z 2023-01-11T21:38:06.0783213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0783288Z import torch 2023-01-11T21:38:06.0783356Z import random 2023-01-11T21:38:06.0783475Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0783599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0783605Z 2023-01-11T21:38:06.0783689Z aten = torch.ops.aten 2023-01-11T21:38:06.0783826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0783961Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0783966Z 2023-01-11T21:38:06.0784039Z import triton 2023-01-11T21:38:06.0784133Z import triton.language as tl 2023-01-11T21:38:06.0784251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0784394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0784400Z 2023-01-11T21:38:06.0784404Z 2023-01-11T21:38:06.0784543Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0784751Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0784875Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0784981Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.0785048Z { 2023-01-11T21:38:06.0785143Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0785213Z { 2023-01-11T21:38:06.0785299Z #pragma omp for 2023-01-11T21:38:06.0785398Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.0785486Z { 2023-01-11T21:38:06.0785649Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0785798Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0785890Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0785981Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0786051Z } 2023-01-11T21:38:06.0786152Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0786242Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.0786311Z { 2023-01-11T21:38:06.0786406Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0786513Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0786597Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0786688Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0786756Z } 2023-01-11T21:38:06.0786825Z } 2023-01-11T21:38:06.0786897Z } 2023-01-11T21:38:06.0786982Z ''') 2023-01-11T21:38:06.0786988Z 2023-01-11T21:38:06.0786992Z 2023-01-11T21:38:06.0787089Z async_compile.wait(globals()) 2023-01-11T21:38:06.0787161Z del async_compile 2023-01-11T21:38:06.0787166Z 2023-01-11T21:38:06.0787251Z def call(args): 2023-01-11T21:38:06.0787326Z arg0_1, = args 2023-01-11T21:38:06.0787431Z args.clear() 2023-01-11T21:38:06.0787649Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0787789Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.0787863Z del arg0_1 2023-01-11T21:38:06.0788047Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.0788062Z 2023-01-11T21:38:06.0788066Z 2023-01-11T21:38:06.0788141Z if __name__ == "__main__": 2023-01-11T21:38:06.0788260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0788390Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0788597Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', 
dtype=torch.float32) 2023-01-11T21:38:06.0788711Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0788716Z 2023-01-11T21:38:06.0788791Z ok (1.755s) 2023-01-11T21:38:06.0789245Z test_split_with_sizes_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0789379Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0789640Z [2023-01-11 21:31:41,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 263 2023-01-11T21:38:06.0789929Z [2023-01-11 21:31:43,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 263 2023-01-11T21:38:06.0790344Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0790477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0790732Z [2023-01-11 21:31:43,673] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 264 2023-01-11T21:38:06.0790997Z [2023-01-11 21:31:45,604] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 264 2023-01-11T21:38:06.0791416Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0791554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0791811Z [2023-01-11 21:31:45,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 265 2023-01-11T21:38:06.0791817Z 2023-01-11T21:38:06.0791915Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0791993Z import torch 2023-01-11T21:38:06.0792070Z import random 2023-01-11T21:38:06.0792185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0792311Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0792316Z 2023-01-11T21:38:06.0792400Z aten = torch.ops.aten 2023-01-11T21:38:06.0792540Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0792640Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0792645Z 2023-01-11T21:38:06.0792725Z import triton 2023-01-11T21:38:06.0792817Z import triton.language as tl 2023-01-11T21:38:06.0792940Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0793111Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0793117Z 2023-01-11T21:38:06.0793122Z 2023-01-11T21:38:06.0793264Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0793477Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0793605Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0793712Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0793815Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0793917Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0793977Z { 2023-01-11T21:38:06.0794086Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0794153Z { 2023-01-11T21:38:06.0794237Z #pragma omp for 2023-01-11T21:38:06.0794327Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0794398Z { 2023-01-11T21:38:06.0794485Z #pragma GCC ivdep 2023-01-11T21:38:06.0794570Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0794640Z { 2023-01-11T21:38:06.0794710Z { 2023-01-11T21:38:06.0794782Z { 2023-01-11T21:38:06.0794892Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.0795009Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0795111Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0795220Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0795320Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0795424Z out_ptr0[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0795533Z } 2023-01-11T21:38:06.0795604Z } 2023-01-11T21:38:06.0795674Z } 2023-01-11T21:38:06.0795743Z } 2023-01-11T21:38:06.0795820Z #pragma omp for 2023-01-11T21:38:06.0795911Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0795981Z { 2023-01-11T21:38:06.0796069Z #pragma GCC ivdep 2023-01-11T21:38:06.0796157Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0796228Z { 2023-01-11T21:38:06.0796291Z { 2023-01-11T21:38:06.0796362Z { 2023-01-11T21:38:06.0796474Z auto tmp0 = in_ptr0[3 + i1 + (10*i0)]; 2023-01-11T21:38:06.0796588Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0796689Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0796802Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0796901Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0796999Z out_ptr1[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0797071Z } 2023-01-11T21:38:06.0797141Z } 2023-01-11T21:38:06.0797212Z } 2023-01-11T21:38:06.0797281Z } 2023-01-11T21:38:06.0797366Z #pragma omp for 2023-01-11T21:38:06.0797456Z for(long i0=0; 
i0<4; i0+=1) 2023-01-11T21:38:06.0797517Z { 2023-01-11T21:38:06.0797603Z #pragma GCC ivdep 2023-01-11T21:38:06.0797693Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0797763Z { 2023-01-11T21:38:06.0797833Z { 2023-01-11T21:38:06.0797906Z { 2023-01-11T21:38:06.0798017Z auto tmp0 = in_ptr0[6 + i1 + (10*i0)]; 2023-01-11T21:38:06.0798124Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0798223Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0798338Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0798440Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0798542Z out_ptr2[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0798616Z } 2023-01-11T21:38:06.0798686Z } 2023-01-11T21:38:06.0798747Z } 2023-01-11T21:38:06.0798844Z } 2023-01-11T21:38:06.0798916Z } 2023-01-11T21:38:06.0798981Z } 2023-01-11T21:38:06.0799069Z ''') 2023-01-11T21:38:06.0799075Z 2023-01-11T21:38:06.0799079Z 2023-01-11T21:38:06.0799178Z async_compile.wait(globals()) 2023-01-11T21:38:06.0799257Z del async_compile 2023-01-11T21:38:06.0799262Z 2023-01-11T21:38:06.0799331Z def call(args): 2023-01-11T21:38:06.0799408Z arg0_1, = args 2023-01-11T21:38:06.0799486Z args.clear() 2023-01-11T21:38:06.0799695Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0799899Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0800101Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0800294Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0800367Z del arg0_1 2023-01-11T21:38:06.0800451Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.0800457Z 2023-01-11T21:38:06.0800461Z 2023-01-11T21:38:06.0800542Z if __name__ == "__main__": 2023-01-11T21:38:06.0800663Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0800792Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0801003Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0801116Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0801122Z 2023-01-11T21:38:06.0801126Z 2023-01-11T21:38:06.0801225Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0801301Z import torch 2023-01-11T21:38:06.0801399Z import random 2023-01-11T21:38:06.0801522Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0801646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0801651Z 2023-01-11T21:38:06.0801735Z aten = torch.ops.aten 2023-01-11T21:38:06.0801874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0801971Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0801977Z 2023-01-11T21:38:06.0802051Z import triton 2023-01-11T21:38:06.0802139Z import triton.language as tl 2023-01-11T21:38:06.0802264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0802406Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0802412Z 2023-01-11T21:38:06.0802416Z 2023-01-11T21:38:06.0802554Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0802759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0802889Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0802993Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0803096Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0803190Z 
float* __restrict__ out_ptr2) 2023-01-11T21:38:06.0803259Z { 2023-01-11T21:38:06.0803363Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0803429Z { 2023-01-11T21:38:06.0803512Z #pragma omp for 2023-01-11T21:38:06.0803597Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0803671Z { 2023-01-11T21:38:06.0803750Z #pragma GCC ivdep 2023-01-11T21:38:06.0803839Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0803908Z { 2023-01-11T21:38:06.0803977Z { 2023-01-11T21:38:06.0804048Z { 2023-01-11T21:38:06.0804159Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.0804274Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0804370Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0804486Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0804586Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0804687Z out_ptr0[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0804786Z } 2023-01-11T21:38:06.0804858Z } 2023-01-11T21:38:06.0804930Z } 2023-01-11T21:38:06.0804991Z } 2023-01-11T21:38:06.0805075Z #pragma omp for 2023-01-11T21:38:06.0805163Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0805232Z { 2023-01-11T21:38:06.0805318Z #pragma GCC ivdep 2023-01-11T21:38:06.0805422Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0805492Z { 2023-01-11T21:38:06.0805572Z { 2023-01-11T21:38:06.0805657Z { 2023-01-11T21:38:06.0805769Z auto tmp0 = in_ptr0[4 + i1 + (10*i0)]; 2023-01-11T21:38:06.0805886Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0805988Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0806103Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0806202Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0806299Z out_ptr1[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0806372Z } 2023-01-11T21:38:06.0806444Z } 2023-01-11T21:38:06.0806513Z } 2023-01-11T21:38:06.0806581Z } 2023-01-11T21:38:06.0806661Z #pragma omp for 2023-01-11T21:38:06.0806741Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0806810Z { 2023-01-11T21:38:06.0806895Z #pragma GCC ivdep 2023-01-11T21:38:06.0806984Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0807052Z { 2023-01-11T21:38:06.0807121Z { 2023-01-11T21:38:06.0807223Z { 2023-01-11T21:38:06.0807329Z auto tmp0 = in_ptr0[7 + i1 + (10*i0)]; 2023-01-11T21:38:06.0807443Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0807542Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0807659Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0807758Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0807859Z out_ptr2[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0807932Z } 2023-01-11T21:38:06.0807995Z } 2023-01-11T21:38:06.0808063Z } 2023-01-11T21:38:06.0808134Z } 2023-01-11T21:38:06.0808202Z } 2023-01-11T21:38:06.0808266Z } 2023-01-11T21:38:06.0808357Z ''') 2023-01-11T21:38:06.0808362Z 2023-01-11T21:38:06.0808367Z 2023-01-11T21:38:06.0808463Z async_compile.wait(globals()) 2023-01-11T21:38:06.0808534Z del async_compile 2023-01-11T21:38:06.0808539Z 2023-01-11T21:38:06.0808618Z def call(args): 2023-01-11T21:38:06.0808695Z arg0_1, = args 2023-01-11T21:38:06.0808770Z args.clear() 2023-01-11T21:38:06.0808972Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809173Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809376Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0809569Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.0809636Z del arg0_1 2023-01-11T21:38:06.0809724Z return (buf0, 
buf1, buf2, ) 2023-01-11T21:38:06.0809729Z 2023-01-11T21:38:06.0809733Z 2023-01-11T21:38:06.0809814Z if __name__ == "__main__": 2023-01-11T21:38:06.0809934Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0810063Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0810273Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0810387Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0810392Z 2023-01-11T21:38:06.0810659Z [2023-01-11 21:31:47,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 265 2023-01-11T21:38:06.0810693Z 2023-01-11T21:38:06.0810787Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0810863Z import torch 2023-01-11T21:38:06.0810940Z import random 2023-01-11T21:38:06.0811059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0811187Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0811192Z 2023-01-11T21:38:06.0811276Z aten = torch.ops.aten 2023-01-11T21:38:06.0811413Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0811509Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0811515Z 2023-01-11T21:38:06.0811583Z import triton 2023-01-11T21:38:06.0811675Z import triton.language as tl 2023-01-11T21:38:06.0811805Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0811945Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0811951Z 2023-01-11T21:38:06.0811955Z 2023-01-11T21:38:06.0812093Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0812300Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0812425Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0812529Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0812625Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.0812725Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0812825Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.0812892Z { 2023-01-11T21:38:06.0812973Z #pragma GCC ivdep 2023-01-11T21:38:06.0813059Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0813156Z { 2023-01-11T21:38:06.0813217Z { 2023-01-11T21:38:06.0813285Z { 2023-01-11T21:38:06.0813386Z auto tmp0 = in_ptr0[10*i0]; 2023-01-11T21:38:06.0813495Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0813589Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0813698Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0813790Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0813871Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0813940Z } 2023-01-11T21:38:06.0814007Z } 2023-01-11T21:38:06.0814075Z } 2023-01-11T21:38:06.0814176Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0814244Z { 2023-01-11T21:38:06.0814319Z #pragma omp for 2023-01-11T21:38:06.0814406Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0814584Z { 2023-01-11T21:38:06.0814674Z #pragma GCC ivdep 2023-01-11T21:38:06.0814762Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.0814833Z { 2023-01-11T21:38:06.0814903Z { 2023-01-11T21:38:06.0814971Z { 2023-01-11T21:38:06.0815084Z auto tmp0 = in_ptr0[1 + i1 + (10*i0)]; 2023-01-11T21:38:06.0815201Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0815302Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0815417Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0815516Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0815618Z out_ptr1[i1 + (2*i0)] = tmp4; 
2023-01-11T21:38:06.0815683Z } 2023-01-11T21:38:06.0815753Z } 2023-01-11T21:38:06.0815821Z } 2023-01-11T21:38:06.0815889Z } 2023-01-11T21:38:06.0815972Z #pragma omp for 2023-01-11T21:38:06.0816058Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0816127Z { 2023-01-11T21:38:06.0816208Z #pragma GCC ivdep 2023-01-11T21:38:06.0816296Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.0816364Z { 2023-01-11T21:38:06.0816434Z { 2023-01-11T21:38:06.0816509Z { 2023-01-11T21:38:06.0816618Z auto tmp0 = in_ptr0[3 + i1 + (10*i0)]; 2023-01-11T21:38:06.0816769Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0816871Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0816986Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0817085Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0817256Z out_ptr2[i1 + (3*i0)] = tmp4; 2023-01-11T21:38:06.0817332Z } 2023-01-11T21:38:06.0817402Z } 2023-01-11T21:38:06.0817463Z } 2023-01-11T21:38:06.0817532Z } 2023-01-11T21:38:06.0817616Z #pragma omp for 2023-01-11T21:38:06.0817706Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0817774Z { 2023-01-11T21:38:06.0817861Z #pragma GCC ivdep 2023-01-11T21:38:06.0817949Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.0818010Z { 2023-01-11T21:38:06.0818079Z { 2023-01-11T21:38:06.0818150Z { 2023-01-11T21:38:06.0818262Z auto tmp0 = in_ptr0[6 + i1 + (10*i0)]; 2023-01-11T21:38:06.0818373Z auto tmp1 = static_cast(2.0); 2023-01-11T21:38:06.0818474Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.0818586Z auto tmp3 = static_cast(1.0); 2023-01-11T21:38:06.0818678Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0818779Z out_ptr3[i1 + (4*i0)] = tmp4; 2023-01-11T21:38:06.0818850Z } 2023-01-11T21:38:06.0818920Z } 2023-01-11T21:38:06.0818988Z } 2023-01-11T21:38:06.0819095Z } 2023-01-11T21:38:06.0819162Z } 2023-01-11T21:38:06.0819220Z } 2023-01-11T21:38:06.0819311Z ''') 2023-01-11T21:38:06.0819316Z 2023-01-11T21:38:06.0819321Z 2023-01-11T21:38:06.0819417Z async_compile.wait(globals()) 2023-01-11T21:38:06.0819496Z del async_compile 2023-01-11T21:38:06.0819501Z 2023-01-11T21:38:06.0819577Z def call(args): 2023-01-11T21:38:06.0819654Z arg0_1, = args 2023-01-11T21:38:06.0819730Z args.clear() 2023-01-11T21:38:06.0819928Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820129Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820328Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820523Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0820739Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.0820817Z del arg0_1 2023-01-11T21:38:06.0820909Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.0820914Z 2023-01-11T21:38:06.0820919Z 2023-01-11T21:38:06.0820998Z if __name__ == "__main__": 2023-01-11T21:38:06.0821123Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0821244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0821451Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0821569Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0821574Z 2023-01-11T21:38:06.0821647Z ok (5.732s) 2023-01-11T21:38:06.0822102Z test_squeeze1_cpu (__main__.CpuTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0822240Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0822499Z [2023-01-11 21:31:47,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 266 2023-01-11T21:38:06.0822792Z [2023-01-11 21:31:49,501] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 266 2023-01-11T21:38:06.0822798Z 2023-01-11T21:38:06.0822898Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0822967Z import torch 2023-01-11T21:38:06.0823042Z import random 2023-01-11T21:38:06.0823162Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0823288Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0823293Z 2023-01-11T21:38:06.0823379Z aten = torch.ops.aten 2023-01-11T21:38:06.0823516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0823617Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0823622Z 2023-01-11T21:38:06.0823697Z import triton 2023-01-11T21:38:06.0823783Z import triton.language as tl 2023-01-11T21:38:06.0823911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0824052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0824059Z 2023-01-11T21:38:06.0824064Z 2023-01-11T21:38:06.0824204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0824410Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0824534Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0824641Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0824743Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0824803Z { 2023-01-11T21:38:06.0824907Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0824973Z { 2023-01-11T21:38:06.0825094Z #pragma omp for 2023-01-11T21:38:06.0825180Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.0825249Z { 2023-01-11T21:38:06.0825384Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0825526Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0825619Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0825756Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0825847Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0825936Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0826033Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0826128Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0826189Z } 2023-01-11T21:38:06.0826290Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0826375Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.0826443Z { 2023-01-11T21:38:06.0826537Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0826643Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0826730Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0826827Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0826916Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0827008Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0827096Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0827181Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0827249Z } 2023-01-11T21:38:06.0827309Z } 2023-01-11T21:38:06.0827375Z } 
2023-01-11T21:38:06.0827462Z ''') 2023-01-11T21:38:06.0827468Z 2023-01-11T21:38:06.0827473Z 2023-01-11T21:38:06.0827568Z async_compile.wait(globals()) 2023-01-11T21:38:06.0827647Z del async_compile 2023-01-11T21:38:06.0827653Z 2023-01-11T21:38:06.0827728Z def call(args): 2023-01-11T21:38:06.0827803Z arg0_1, = args 2023-01-11T21:38:06.0827878Z args.clear() 2023-01-11T21:38:06.0828075Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0828280Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0828454Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0828529Z del arg0_1 2023-01-11T21:38:06.0828640Z return (buf0, buf1, ) 2023-01-11T21:38:06.0828646Z 2023-01-11T21:38:06.0828651Z 2023-01-11T21:38:06.0828735Z if __name__ == "__main__": 2023-01-11T21:38:06.0828855Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0828984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0829208Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0829323Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0829328Z 2023-01-11T21:38:06.0829400Z ok (1.956s) 2023-01-11T21:38:06.0829857Z test_squeeze2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0829995Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0830254Z [2023-01-11 21:31:49,545] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 267 2023-01-11T21:38:06.0830519Z [2023-01-11 21:31:51,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 267 2023-01-11T21:38:06.0830525Z 2023-01-11T21:38:06.0830626Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0830702Z import torch 2023-01-11T21:38:06.0830771Z import random 2023-01-11T21:38:06.0830890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0831043Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0831048Z 2023-01-11T21:38:06.0831132Z aten = torch.ops.aten 2023-01-11T21:38:06.0831268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0831367Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0831372Z 2023-01-11T21:38:06.0831450Z import triton 2023-01-11T21:38:06.0831549Z import triton.language as tl 2023-01-11T21:38:06.0831668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0831812Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0831817Z 2023-01-11T21:38:06.0831822Z 2023-01-11T21:38:06.0831959Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0832165Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0832288Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0832393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0832502Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0832568Z { 2023-01-11T21:38:06.0832663Z #pragma omp parallel 
num_threads(8) 2023-01-11T21:38:06.0832731Z { 2023-01-11T21:38:06.0832815Z #pragma omp for 2023-01-11T21:38:06.0832902Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.0832974Z { 2023-01-11T21:38:06.0833114Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0833254Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.0833339Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0833475Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.0833566Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0833653Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0833753Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.0833847Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.0833918Z } 2023-01-11T21:38:06.0834011Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0834100Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.0834167Z { 2023-01-11T21:38:06.0834257Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0834387Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.0834480Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0834584Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.0834668Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0834757Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.0834846Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.0834933Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.0835003Z } 2023-01-11T21:38:06.0835070Z } 2023-01-11T21:38:06.0835140Z } 2023-01-11T21:38:06.0835220Z ''') 2023-01-11T21:38:06.0835225Z 2023-01-11T21:38:06.0835229Z 2023-01-11T21:38:06.0835326Z async_compile.wait(globals()) 2023-01-11T21:38:06.0835408Z del async_compile 2023-01-11T21:38:06.0835413Z 2023-01-11T21:38:06.0835489Z def call(args): 2023-01-11T21:38:06.0835564Z arg0_1, = args 2023-01-11T21:38:06.0835640Z args.clear() 2023-01-11T21:38:06.0835862Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0836080Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0836244Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0836320Z del arg0_1 2023-01-11T21:38:06.0836403Z return (buf0, buf1, ) 2023-01-11T21:38:06.0836408Z 2023-01-11T21:38:06.0836412Z 2023-01-11T21:38:06.0836494Z if __name__ == "__main__": 2023-01-11T21:38:06.0836614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0836746Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0836977Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0837115Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.0837127Z 2023-01-11T21:38:06.0837193Z ok (1.928s) 2023-01-11T21:38:06.0837646Z test_stack_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0837782Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0838046Z [2023-01-11 21:31:51,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 268 2023-01-11T21:38:06.0838309Z [2023-01-11 21:31:53,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 268 2023-01-11T21:38:06.0838318Z 2023-01-11T21:38:06.0838417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0838495Z import torch 2023-01-11T21:38:06.0838575Z import random 2023-01-11T21:38:06.0838688Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0838817Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0838822Z 2023-01-11T21:38:06.0838906Z aten = torch.ops.aten 2023-01-11T21:38:06.0839043Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0839139Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0839144Z 2023-01-11T21:38:06.0839218Z import triton 2023-01-11T21:38:06.0839311Z import triton.language as tl 2023-01-11T21:38:06.0839437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0839570Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0839576Z 2023-01-11T21:38:06.0839589Z 2023-01-11T21:38:06.0839720Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0839930Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0840057Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0840167Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.0840300Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0840404Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.0840473Z { 2023-01-11T21:38:06.0840569Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0840636Z { 2023-01-11T21:38:06.0840731Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0840817Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0840884Z { 2023-01-11T21:38:06.0840977Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0841044Z { 2023-01-11T21:38:06.0841108Z { 2023-01-11T21:38:06.0841182Z { 2023-01-11T21:38:06.0841287Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.0841397Z out_ptr0[(2*i1) + (32*i0)] = tmp0; 2023-01-11T21:38:06.0841470Z } 2023-01-11T21:38:06.0841541Z } 2023-01-11T21:38:06.0841602Z } 2023-01-11T21:38:06.0841670Z } 2023-01-11T21:38:06.0841769Z #pragma omp for collapse(2) 2023-01-11T21:38:06.0841855Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.0841923Z { 2023-01-11T21:38:06.0842014Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.0842082Z { 2023-01-11T21:38:06.0842145Z { 2023-01-11T21:38:06.0842217Z { 2023-01-11T21:38:06.0842318Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.0842426Z out_ptr1[(2*i1) + (32*i0)] = tmp0; 2023-01-11T21:38:06.0842496Z } 2023-01-11T21:38:06.0842565Z } 2023-01-11T21:38:06.0842633Z } 2023-01-11T21:38:06.0842722Z } 2023-01-11T21:38:06.0842787Z } 2023-01-11T21:38:06.0842851Z } 2023-01-11T21:38:06.0842936Z ''') 2023-01-11T21:38:06.0842942Z 2023-01-11T21:38:06.0842947Z 2023-01-11T21:38:06.0843044Z async_compile.wait(globals()) 2023-01-11T21:38:06.0843122Z del async_compile 2023-01-11T21:38:06.0843127Z 2023-01-11T21:38:06.0843203Z def call(args): 2023-01-11T21:38:06.0843279Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.0843356Z args.clear() 2023-01-11T21:38:06.0843565Z buf2 = 
empty_strided((12, 16, 2), (32, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0843677Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.0843790Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.0843985Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.0844059Z del arg0_1 2023-01-11T21:38:06.0844125Z del arg1_1 2023-01-11T21:38:06.0844205Z return (buf2, ) 2023-01-11T21:38:06.0844210Z 2023-01-11T21:38:06.0844215Z 2023-01-11T21:38:06.0844296Z if __name__ == "__main__": 2023-01-11T21:38:06.0844415Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.0844543Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.0844746Z arg0_1 = rand_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0844944Z arg1_1 = rand_strided((12, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0845065Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.0845070Z 2023-01-11T21:38:06.0845143Z ok (1.734s) 2023-01-11T21:38:06.0845585Z test_std_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.0845721Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.0845980Z [2023-01-11 21:31:53,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 269 2023-01-11T21:38:06.0845986Z 2023-01-11T21:38:06.0846124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.0846204Z import torch 2023-01-11T21:38:06.0846279Z import random 2023-01-11T21:38:06.0846399Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.0846525Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.0846530Z 2023-01-11T21:38:06.0846606Z aten = torch.ops.aten 2023-01-11T21:38:06.0846743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.0846841Z async_compile = AsyncCompile() 2023-01-11T21:38:06.0846846Z 2023-01-11T21:38:06.0846920Z import triton 2023-01-11T21:38:06.0847015Z import triton.language as tl 2023-01-11T21:38:06.0847144Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.0847284Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.0847289Z 2023-01-11T21:38:06.0847293Z 2023-01-11T21:38:06.0847431Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.0847632Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.0857619Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.0857779Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.0857890Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.0857988Z float* __restrict__ in_out_ptr3, 2023-01-11T21:38:06.0858094Z float* __restrict__ in_out_ptr4, 2023-01-11T21:38:06.0858199Z float* __restrict__ in_out_ptr5, 2023-01-11T21:38:06.0858304Z float* __restrict__ in_out_ptr6, 2023-01-11T21:38:06.0858503Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.0858611Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.0858714Z 
float* __restrict__ out_ptr2, 2023-01-11T21:38:06.0858807Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.0858915Z float* __restrict__ out_ptr5, 2023-01-11T21:38:06.0859013Z float* __restrict__ out_ptr7, 2023-01-11T21:38:06.0859114Z float* __restrict__ out_ptr10, 2023-01-11T21:38:06.0859221Z float* __restrict__ out_ptr12, 2023-01-11T21:38:06.0859322Z float* __restrict__ out_ptr14) 2023-01-11T21:38:06.0859382Z { 2023-01-11T21:38:06.0859474Z auto out_ptr6 = in_out_ptr0; 2023-01-11T21:38:06.0859562Z auto out_ptr8 = in_out_ptr1; 2023-01-11T21:38:06.0859654Z auto out_ptr11 = in_out_ptr2; 2023-01-11T21:38:06.0859743Z auto out_ptr13 = in_out_ptr3; 2023-01-11T21:38:06.0859834Z auto out_ptr1 = in_out_ptr4; 2023-01-11T21:38:06.0859921Z auto out_ptr3 = in_out_ptr5; 2023-01-11T21:38:06.0860001Z auto out_ptr9 = in_out_ptr6; 2023-01-11T21:38:06.0860069Z { 2023-01-11T21:38:06.0860272Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0860355Z float tmp1 = 0; 2023-01-11T21:38:06.0860479Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0860588Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0860655Z { 2023-01-11T21:38:06.0860761Z #pragma omp for reduction(+:tmp1_vec) 2023-01-11T21:38:06.0860851Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0860920Z { 2023-01-11T21:38:06.0861061Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0861146Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0861215Z } 2023-01-11T21:38:06.0861417Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0861546Z #pragma omp for simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0861633Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0861742Z { 2023-01-11T21:38:06.0861837Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0861918Z tmp1 += tmp0; 2023-01-11T21:38:06.0861985Z } 2023-01-11T21:38:06.0862052Z } 2023-01-11T21:38:06.0862136Z out_ptr0[0] = tmp1; 2023-01-11T21:38:06.0862196Z } 2023-01-11T21:38:06.0862261Z { 2023-01-11T21:38:06.0862450Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0862535Z float tmp6 = 0; 2023-01-11T21:38:06.0862656Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0862739Z float tmp7 = 0; 2023-01-11T21:38:06.0862862Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0862963Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0863031Z { 2023-01-11T21:38:06.0863172Z #pragma omp for reduction(+:tmp6_vec) reduction(+:tmp7_vec) 2023-01-11T21:38:06.0863265Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0863333Z { 2023-01-11T21:38:06.0863470Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0863603Z auto tmp1 = at::vec::Vectorized(out_ptr0[0]); 2023-01-11T21:38:06.0863743Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0863830Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0863986Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0864079Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0864165Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0864278Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0864348Z } 2023-01-11T21:38:06.0864546Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0864739Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 
2023-01-11T21:38:06.0864882Z #pragma omp for simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0864972Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0865039Z { 2023-01-11T21:38:06.0865130Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0865221Z auto tmp1 = out_ptr0[0]; 2023-01-11T21:38:06.0865330Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0865421Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0865547Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0865642Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0865723Z tmp6 += tmp5; 2023-01-11T21:38:06.0865803Z tmp7 += tmp0; 2023-01-11T21:38:06.0865869Z } 2023-01-11T21:38:06.0865936Z } 2023-01-11T21:38:06.0866013Z out_ptr1[0] = tmp6; 2023-01-11T21:38:06.0866088Z out_ptr2[0] = tmp7; 2023-01-11T21:38:06.0866155Z } 2023-01-11T21:38:06.0866219Z { 2023-01-11T21:38:06.0866410Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0866489Z float tmp6 = 0; 2023-01-11T21:38:06.0866612Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0866692Z float tmp7 = 0; 2023-01-11T21:38:06.0866804Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0866913Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0866980Z { 2023-01-11T21:38:06.0867120Z #pragma omp for reduction(+:tmp6_vec) reduction(+:tmp7_vec) 2023-01-11T21:38:06.0867210Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0867278Z { 2023-01-11T21:38:06.0867418Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0867572Z auto tmp1 = at::vec::Vectorized(out_ptr2[0]); 2023-01-11T21:38:06.0867716Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0867807Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0867941Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0868036Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0868121Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0868204Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0868271Z } 2023-01-11T21:38:06.0868460Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0868654Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.0868799Z #pragma omp for simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0868895Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0868967Z { 2023-01-11T21:38:06.0869058Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0869149Z auto tmp1 = out_ptr2[0]; 2023-01-11T21:38:06.0869258Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0869342Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0869474Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0869566Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0869648Z tmp6 += tmp5; 2023-01-11T21:38:06.0869727Z tmp7 += tmp0; 2023-01-11T21:38:06.0869823Z } 2023-01-11T21:38:06.0869890Z } 2023-01-11T21:38:06.0869965Z out_ptr3[0] = tmp6; 2023-01-11T21:38:06.0870045Z out_ptr4[0] = tmp7; 2023-01-11T21:38:06.0870110Z } 2023-01-11T21:38:06.0870215Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0870284Z { 2023-01-11T21:38:06.0870366Z #pragma omp for 2023-01-11T21:38:06.0870449Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0870516Z { 2023-01-11T21:38:06.0870583Z { 2023-01-11T21:38:06.0870778Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0870862Z float tmp1 = 0; 2023-01-11T21:38:06.0870988Z auto tmp1_vec = 
at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.0871081Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0871150Z { 2023-01-11T21:38:06.0871291Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0871382Z tmp1_vec += tmp0; 2023-01-11T21:38:06.0871450Z } 2023-01-11T21:38:06.0871648Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.0871776Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.0871871Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0871940Z { 2023-01-11T21:38:06.0872049Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0872124Z tmp1 += tmp0; 2023-01-11T21:38:06.0872194Z } 2023-01-11T21:38:06.0872280Z out_ptr5[i0] = tmp1; 2023-01-11T21:38:06.0872348Z } 2023-01-11T21:38:06.0872415Z } 2023-01-11T21:38:06.0872496Z #pragma omp for 2023-01-11T21:38:06.0872576Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0872646Z { 2023-01-11T21:38:06.0872714Z { 2023-01-11T21:38:06.0872903Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0872988Z float tmp6 = 0; 2023-01-11T21:38:06.0873114Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0873226Z float tmp7 = 0; 2023-01-11T21:38:06.0873352Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.0873439Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0873512Z { 2023-01-11T21:38:06.0873661Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0873794Z auto tmp1 = at::vec::Vectorized(out_ptr5[i0]); 2023-01-11T21:38:06.0873936Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0874034Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0874182Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0874278Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0874358Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0874442Z tmp7_vec += tmp0; 2023-01-11T21:38:06.0874510Z } 2023-01-11T21:38:06.0874714Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0874907Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.0875052Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.0875146Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0875216Z { 2023-01-11T21:38:06.0875332Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0875492Z auto tmp1 = out_ptr5[i0]; 2023-01-11T21:38:06.0875615Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0875712Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0875849Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0875946Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0876028Z tmp6 += tmp5; 2023-01-11T21:38:06.0876104Z tmp7 += tmp0; 2023-01-11T21:38:06.0876172Z } 2023-01-11T21:38:06.0876260Z out_ptr6[i0] = tmp6; 2023-01-11T21:38:06.0876346Z out_ptr7[i0] = tmp7; 2023-01-11T21:38:06.0876414Z } 2023-01-11T21:38:06.0876479Z } 2023-01-11T21:38:06.0876561Z #pragma omp for 2023-01-11T21:38:06.0876641Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0876707Z { 2023-01-11T21:38:06.0876849Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr6 + 8*i0); 2023-01-11T21:38:06.0876992Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.0877081Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0877183Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.0877252Z } 2023-01-11T21:38:06.0877344Z #pragma omp for 
simd simdlen(4) 2023-01-11T21:38:06.0877434Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0877502Z { 2023-01-11T21:38:06.0877593Z auto tmp0 = out_ptr6[i0]; 2023-01-11T21:38:06.0877696Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.0877786Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0877873Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.0877932Z } 2023-01-11T21:38:06.0878013Z #pragma omp for 2023-01-11T21:38:06.0878099Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0878164Z { 2023-01-11T21:38:06.0878231Z { 2023-01-11T21:38:06.0878423Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0878510Z float tmp6 = 0; 2023-01-11T21:38:06.0878629Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0878722Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0878820Z { 2023-01-11T21:38:06.0878969Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0879104Z auto tmp1 = at::vec::Vectorized(out_ptr7[i0]); 2023-01-11T21:38:06.0879247Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0879344Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0879484Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0879573Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0879659Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0879732Z } 2023-01-11T21:38:06.0879928Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0880055Z #pragma omp simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.0880156Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0880226Z { 2023-01-11T21:38:06.0880331Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.0880422Z auto tmp1 = out_ptr7[i0]; 2023-01-11T21:38:06.0880531Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0880628Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0880765Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0880859Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0880942Z tmp6 += tmp5; 2023-01-11T21:38:06.0881013Z } 2023-01-11T21:38:06.0881093Z out_ptr8[i0] = tmp6; 2023-01-11T21:38:06.0881236Z } 2023-01-11T21:38:06.0881303Z } 2023-01-11T21:38:06.0881387Z #pragma omp for 2023-01-11T21:38:06.0881473Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0881538Z { 2023-01-11T21:38:06.0881681Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr8 + 8*i0); 2023-01-11T21:38:06.0881813Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0881904Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0882006Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.0882073Z } 2023-01-11T21:38:06.0882171Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0882258Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0882325Z { 2023-01-11T21:38:06.0882408Z auto tmp0 = out_ptr8[i0]; 2023-01-11T21:38:06.0882512Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0882600Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0882693Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.0882763Z } 2023-01-11T21:38:06.0882830Z } 2023-01-11T21:38:06.0882890Z { 2023-01-11T21:38:06.0883082Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.0883165Z float tmp6 = 0; 2023-01-11T21:38:06.0883286Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.0883394Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0883461Z { 2023-01-11T21:38:06.0883576Z #pragma omp for reduction(+:tmp6_vec) 2023-01-11T21:38:06.0883666Z for(long i0=0; i0<32; i0+=1) 
2023-01-11T21:38:06.0883728Z { 2023-01-11T21:38:06.0883869Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.0884000Z auto tmp1 = at::vec::Vectorized(out_ptr4[0]); 2023-01-11T21:38:06.0884142Z auto tmp2 = at::vec::Vectorized(static_cast(256)); 2023-01-11T21:38:06.0884237Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0884371Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0884464Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.0884548Z tmp6_vec += tmp5; 2023-01-11T21:38:06.0884637Z } 2023-01-11T21:38:06.0884836Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.0884962Z #pragma omp for simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.0885059Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.0885128Z { 2023-01-11T21:38:06.0885219Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.0885310Z auto tmp1 = out_ptr4[0]; 2023-01-11T21:38:06.0885412Z auto tmp2 = static_cast(256); 2023-01-11T21:38:06.0885505Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0885642Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0885734Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0885814Z tmp6 += tmp5; 2023-01-11T21:38:06.0885883Z } 2023-01-11T21:38:06.0885952Z } 2023-01-11T21:38:06.0886029Z out_ptr9[0] = tmp6; 2023-01-11T21:38:06.0886095Z } 2023-01-11T21:38:06.0886197Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.0886263Z { 2023-01-11T21:38:06.0886344Z #pragma omp for 2023-01-11T21:38:06.0886432Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0886492Z { 2023-01-11T21:38:06.0886559Z { 2023-01-11T21:38:06.0886627Z { 2023-01-11T21:38:06.0886713Z float tmp1 = 0; 2023-01-11T21:38:06.0886811Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0886883Z { 2023-01-11T21:38:06.0886955Z { 2023-01-11T21:38:06.0887089Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0887178Z tmp1 += tmp0; 2023-01-11T21:38:06.0887254Z } 2023-01-11T21:38:06.0887325Z } 2023-01-11T21:38:06.0887416Z out_ptr10[i0] = tmp1; 2023-01-11T21:38:06.0887486Z } 2023-01-11T21:38:06.0887553Z } 2023-01-11T21:38:06.0887613Z } 2023-01-11T21:38:06.0887695Z #pragma omp for 2023-01-11T21:38:06.0887780Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0887846Z { 2023-01-11T21:38:06.0887913Z { 2023-01-11T21:38:06.0887981Z { 2023-01-11T21:38:06.0888067Z float tmp6 = 0; 2023-01-11T21:38:06.0888145Z float tmp7 = 0; 2023-01-11T21:38:06.0888241Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0888312Z { 2023-01-11T21:38:06.0888385Z { 2023-01-11T21:38:06.0888502Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0888608Z auto tmp1 = out_ptr10[i0]; 2023-01-11T21:38:06.0888721Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0888816Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0888968Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0889069Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0889156Z tmp6 += tmp5; 2023-01-11T21:38:06.0889244Z tmp7 += tmp0; 2023-01-11T21:38:06.0889315Z } 2023-01-11T21:38:06.0889385Z } 2023-01-11T21:38:06.0889468Z out_ptr11[i0] = tmp6; 2023-01-11T21:38:06.0889556Z out_ptr12[i0] = tmp7; 2023-01-11T21:38:06.0889625Z } 2023-01-11T21:38:06.0889692Z } 2023-01-11T21:38:06.0889758Z } 2023-01-11T21:38:06.0889842Z #pragma omp for 2023-01-11T21:38:06.0889929Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0889989Z { 2023-01-11T21:38:06.0890131Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr11 + 8*i0); 2023-01-11T21:38:06.0890299Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.0890390Z auto tmp2 = tmp0 / tmp1; 
2023-01-11T21:38:06.0890479Z auto tmp3 = tmp2.sqrt(); 2023-01-11T21:38:06.0890580Z tmp3.store(in_out_ptr2 + 8*i0); 2023-01-11T21:38:06.0890648Z } 2023-01-11T21:38:06.0890740Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0890827Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0890894Z { 2023-01-11T21:38:06.0890986Z auto tmp0 = out_ptr11[i0]; 2023-01-11T21:38:06.0891089Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.0891179Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0891279Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0891360Z in_out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.0891426Z } 2023-01-11T21:38:06.0891507Z #pragma omp for 2023-01-11T21:38:06.0891594Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.0891661Z { 2023-01-11T21:38:06.0891730Z { 2023-01-11T21:38:06.0891794Z { 2023-01-11T21:38:06.0891880Z float tmp6 = 0; 2023-01-11T21:38:06.0891977Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.0892047Z { 2023-01-11T21:38:06.0892119Z { 2023-01-11T21:38:06.0892227Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.0892331Z auto tmp1 = out_ptr12[i0]; 2023-01-11T21:38:06.0892437Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.0892539Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.0892713Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.0892811Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.0892898Z tmp6 += tmp5; 2023-01-11T21:38:06.0892972Z } 2023-01-11T21:38:06.0893042Z } 2023-01-11T21:38:06.0893133Z out_ptr13[i0] = tmp6; 2023-01-11T21:38:06.0893196Z } 2023-01-11T21:38:06.0893264Z } 2023-01-11T21:38:06.0893331Z } 2023-01-11T21:38:06.0893414Z #pragma omp for 2023-01-11T21:38:06.0893499Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.0893568Z { 2023-01-11T21:38:06.0893701Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr13 + 8*i0); 2023-01-11T21:38:06.0893839Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.0893929Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0894020Z auto tmp3 = tmp2.sqrt(); 2023-01-11T21:38:06.0894125Z tmp3.store(in_out_ptr3 + 8*i0); 2023-01-11T21:38:06.0894191Z } 2023-01-11T21:38:06.0894291Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.0894379Z for(long i0=32; i0<32; i0+=1) 2023-01-11T21:38:06.0894439Z { 2023-01-11T21:38:06.0894675Z auto tmp0 = out_ptr13[i0]; 2023-01-11T21:38:06.0894785Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.0894875Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0894971Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0895066Z in_out_ptr3[i0] = tmp3; 2023-01-11T21:38:06.0895140Z } 2023-01-11T21:38:06.0895223Z #pragma omp for 2023-01-11T21:38:06.0895328Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.0895397Z { 2023-01-11T21:38:06.0895488Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.0895559Z { 2023-01-11T21:38:06.0895706Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0895859Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr0 + 8 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896002Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr0 + 16 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896195Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr0 + 24 + (8*i1) + (32*i0)); 2023-01-11T21:38:06.0896289Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0896381Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0896473Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0896614Z auto tmp7 = at::vec::Vectorized(static_cast(4)); 2023-01-11T21:38:06.0896704Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0896838Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:06.0896926Z auto tmp10 = tmp9.pow(2); 
2023-01-11T21:38:06.0897058Z auto tmp11 = tmp1 - tmp8; 2023-01-11T21:38:06.0897228Z auto tmp12 = tmp11.pow(2); 2023-01-11T21:38:06.0897332Z auto tmp13 = tmp10 + tmp12; 2023-01-11T21:38:06.0897466Z auto tmp14 = tmp3 - tmp8; 2023-01-11T21:38:06.0897561Z auto tmp15 = tmp14.pow(2); 2023-01-11T21:38:06.0897654Z auto tmp16 = tmp13 + tmp15; 2023-01-11T21:38:06.0897782Z auto tmp17 = tmp5 - tmp8; 2023-01-11T21:38:06.0897876Z auto tmp18 = tmp17.pow(2); 2023-01-11T21:38:06.0897970Z auto tmp19 = tmp16 + tmp18; 2023-01-11T21:38:06.0898113Z auto tmp20 = at::vec::Vectorized(static_cast(3)); 2023-01-11T21:38:06.0898210Z auto tmp21 = tmp19 / tmp20; 2023-01-11T21:38:06.0898303Z auto tmp22 = tmp21.sqrt(); 2023-01-11T21:38:06.0898414Z tmp22.store(out_ptr14 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.0898476Z } 2023-01-11T21:38:06.0898573Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.0898703Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.0898772Z { 2023-01-11T21:38:06.0898873Z auto tmp0 = in_ptr0[i1 + (32*i0)]; 2023-01-11T21:38:06.0898979Z auto tmp1 = in_ptr0[8 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899086Z auto tmp3 = in_ptr0[16 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899184Z auto tmp5 = in_ptr0[24 + i1 + (32*i0)]; 2023-01-11T21:38:06.0899276Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.0899366Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.0899457Z auto tmp6 = tmp4 + tmp5; 2023-01-11T21:38:06.0899561Z auto tmp7 = static_cast(4); 2023-01-11T21:38:06.0899651Z auto tmp8 = tmp6 / tmp7; 2023-01-11T21:38:06.0899782Z auto tmp9 = tmp0 - tmp8; 2023-01-11T21:38:06.0899868Z auto tmp10 = tmp9 * tmp9; 2023-01-11T21:38:06.0899999Z auto tmp11 = tmp1 - tmp8; 2023-01-11T21:38:06.0900099Z auto tmp12 = tmp11 * tmp11; 2023-01-11T21:38:06.0900193Z auto tmp13 = tmp10 + tmp12; 2023-01-11T21:38:06.0900322Z auto tmp14 = tmp3 - tmp8; 2023-01-11T21:38:06.0900418Z auto tmp15 = tmp14 * tmp14; 2023-01-11T21:38:06.0900511Z auto tmp16 = tmp13 + tmp15; 2023-01-11T21:38:06.0900638Z auto tmp17 = tmp5 - tmp8; 2023-01-11T21:38:06.0900732Z auto tmp18 = tmp17 * tmp17; 2023-01-11T21:38:06.0900822Z auto tmp19 = tmp16 + tmp18; 2023-01-11T21:38:06.0900929Z auto tmp20 = static_cast(3); 2023-01-11T21:38:06.0901022Z auto tmp21 = tmp19 / tmp20; 2023-01-11T21:38:06.0901126Z auto tmp22 = std::sqrt(tmp21); 2023-01-11T21:38:06.0901222Z out_ptr14[i1 + (8*i0)] = tmp22; 2023-01-11T21:38:06.0901289Z } 2023-01-11T21:38:06.0901350Z } 2023-01-11T21:38:06.0901434Z #pragma omp single 2023-01-11T21:38:06.0901502Z { 2023-01-11T21:38:06.0901568Z { 2023-01-11T21:38:06.0901635Z { 2023-01-11T21:38:06.0901731Z auto tmp0 = out_ptr1[0]; 2023-01-11T21:38:06.0901836Z auto tmp1 = static_cast(255); 2023-01-11T21:38:06.0901931Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0902054Z in_out_ptr4[0] = tmp2; 2023-01-11T21:38:06.0902125Z } 2023-01-11T21:38:06.0902192Z } 2023-01-11T21:38:06.0902258Z } 2023-01-11T21:38:06.0902342Z #pragma omp single 2023-01-11T21:38:06.0902402Z { 2023-01-11T21:38:06.0902467Z { 2023-01-11T21:38:06.0902534Z { 2023-01-11T21:38:06.0902629Z auto tmp0 = out_ptr3[0]; 2023-01-11T21:38:06.0902741Z auto tmp1 = static_cast(256); 2023-01-11T21:38:06.0902842Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0902934Z in_out_ptr5[0] = tmp2; 2023-01-11T21:38:06.0902999Z } 2023-01-11T21:38:06.0903066Z } 2023-01-11T21:38:06.0903134Z } 2023-01-11T21:38:06.0903216Z #pragma omp single 2023-01-11T21:38:06.0903283Z { 2023-01-11T21:38:06.0903348Z { 2023-01-11T21:38:06.0903409Z { 2023-01-11T21:38:06.0903506Z auto tmp0 = out_ptr9[0]; 2023-01-11T21:38:06.0903617Z auto tmp1 = 
static_cast(256); 2023-01-11T21:38:06.0903713Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.0903845Z auto tmp3 = std::sqrt(tmp2); 2023-01-11T21:38:06.0903979Z in_out_ptr6[0] = tmp3; 2023-01-11T21:38:06.0904076Z } 2023-01-11T21:38:06.0904156Z } 2023-01-11T21:38:06.0904245Z } 2023-01-11T21:38:06.0904332Z } 2023-01-11T21:38:06.0904414Z } 2023-01-11T21:38:06.0904541Z ''') 2023-01-11T21:38:06.0904550Z 2023-01-11T21:38:06.0904555Z 2023-01-11T21:38:06.0904738Z async_compile.wait(globals()) 2023-01-11T21:38:06.0904846Z del async_compile 2023-01-11T21:38:06.0904853Z 2023-01-11T21:38:06.0904966Z def call(args): 2023-01-11T21:38:06.0905085Z arg0_1, = args 2023-01-11T21:38:06.0905229Z args.clear() 2023-01-11T21:38:06.0905582Z buf0 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0905911Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906214Z buf2 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906502Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0906799Z buf10 = empty_strided((1, 1, 1, 1), (1, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907099Z buf4 = empty_strided((2, 4, 4, 1), (16, 4, 1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907360Z buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907684Z buf7 = empty_strided((2, 4, 4, 1), (16, 4, 1, 32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0907817Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.0908096Z buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0908229Z buf9 = buf8; del buf8 # reuse 2023-01-11T21:38:06.0908527Z buf11 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0908833Z buf12 = empty_strided((1, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909109Z buf13 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909399Z buf15 = empty_strided((1, 1, 4, 8), (32, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909538Z buf14 = buf13; del buf13 # reuse 2023-01-11T21:38:06.0909804Z buf16 = empty_strided((4, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0909936Z buf17 = buf16; del buf16 # reuse 2023-01-11T21:38:06.0910225Z buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.0910343Z buf19 = buf1; del buf1 # reuse 2023-01-11T21:38:06.0910459Z buf20 = buf3; del buf3 # reuse 2023-01-11T21:38:06.0910580Z buf21 = buf11; del buf11 # reuse 2023-01-11T21:38:06.0911147Z kernel_cpp_0(c_void_p(buf6.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf14.data_ptr()), c_void_p(buf17.data_ptr()), c_void_p(buf19.data_ptr()), c_void_p(buf20.data_ptr()), c_void_p(buf21.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf12.data_ptr()), c_void_p(buf15.data_ptr()), c_void_p(buf18.data_ptr())) 2023-01-11T21:38:06.0911233Z del arg0_1 2023-01-11T21:38:06.0911362Z return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, ) 2023-01-11T21:38:06.0911369Z 2023-01-11T21:38:06.0911377Z 2023-01-11T21:38:06.0911462Z if __name__ == "__main__": 2023-01-11T21:38:06.0911584Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.0912910Z test_strided_inputs_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')`
2023-01-11T21:38:06.0912992Z warnings.warn(
2023-01-11T21:38:06.0913292Z [2023-01-11 21:31:55,328] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 270
2023-01-11T21:38:06.0913559Z [2023-01-11 21:31:57,015] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 270
2023-01-11T21:38:06.0913565Z
2023-01-11T21:38:06.0913670Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0913750Z import torch
2023-01-11T21:38:06.0913831Z import random
2023-01-11T21:38:06.0913956Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0914076Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0914081Z
2023-01-11T21:38:06.0914168Z aten = torch.ops.aten
2023-01-11T21:38:06.0914308Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0914407Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0914412Z
2023-01-11T21:38:06.0914488Z import triton
2023-01-11T21:38:06.0914584Z import triton.language as tl
2023-01-11T21:38:06.0914710Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0914849Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0914863Z
2023-01-11T21:38:06.0914867Z
2023-01-11T21:38:06.0915001Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0915214Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0915342Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0915455Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0915562Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0915634Z {
2023-01-11T21:38:06.0915742Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0915804Z {
2023-01-11T21:38:06.0915890Z #pragma omp for
2023-01-11T21:38:06.0915986Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0916057Z {
2023-01-11T21:38:06.0916128Z {
2023-01-11T21:38:06.0916198Z {
2023-01-11T21:38:06.0916305Z auto tmp0 = in_ptr0[2*i0];
2023-01-11T21:38:06.0916398Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.0916495Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0916587Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0916663Z }
2023-01-11T21:38:06.0916764Z }
2023-01-11T21:38:06.0916837Z }
2023-01-11T21:38:06.0916905Z }
2023-01-11T21:38:06.0916964Z }
2023-01-11T21:38:06.0917051Z ''')
2023-01-11T21:38:06.0917057Z
2023-01-11T21:38:06.0917062Z
2023-01-11T21:38:06.0917157Z async_compile.wait(globals())
2023-01-11T21:38:06.0917233Z del async_compile
2023-01-11T21:38:06.0917238Z
2023-01-11T21:38:06.0917317Z def call(args):
2023-01-11T21:38:06.0917398Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0917476Z args.clear()
2023-01-11T21:38:06.0917668Z buf0 = empty_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0917835Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0917911Z del arg0_1
2023-01-11T21:38:06.0917983Z del arg1_1
2023-01-11T21:38:06.0918060Z return (buf0, )
2023-01-11T21:38:06.0918065Z
2023-01-11T21:38:06.0918070Z
2023-01-11T21:38:06.0918149Z if __name__ == "__main__":
2023-01-11T21:38:06.0918270Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0918399Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0918592Z arg0_1 = rand_strided((8, 16), (32, 2), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0918789Z arg1_1 = rand_strided((8, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0918913Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0918918Z
2023-01-11T21:38:06.0918988Z ok (1.700s)
2023-01-11T21:38:06.0919443Z test_sum1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0919621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0919882Z [2023-01-11 21:31:57,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 271
2023-01-11T21:38:06.0920147Z [2023-01-11 21:31:58,726] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 271
2023-01-11T21:38:06.0920153Z
2023-01-11T21:38:06.0920251Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0920319Z import torch
2023-01-11T21:38:06.0920397Z import random
2023-01-11T21:38:06.0920517Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0920642Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0920647Z
2023-01-11T21:38:06.0920735Z aten = torch.ops.aten
2023-01-11T21:38:06.0920874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0920970Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0920976Z
2023-01-11T21:38:06.0921051Z import triton
2023-01-11T21:38:06.0921137Z import triton.language as tl
2023-01-11T21:38:06.0921266Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0921408Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0921414Z
2023-01-11T21:38:06.0921418Z
2023-01-11T21:38:06.0921558Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0921767Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0921894Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0922004Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0922109Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0922172Z {
2023-01-11T21:38:06.0922273Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0922342Z {
2023-01-11T21:38:06.0922424Z #pragma omp for
2023-01-11T21:38:06.0922512Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0922581Z {
2023-01-11T21:38:06.0922643Z {
2023-01-11T21:38:06.0922873Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0922962Z float tmp3 = 0;
2023-01-11T21:38:06.0923092Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0923189Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0923262Z {
2023-01-11T21:38:06.0923412Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0923561Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0923662Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0923748Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0923819Z }
2023-01-11T21:38:06.0924023Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0924153Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0924250Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0924325Z {
2023-01-11T21:38:06.0924435Z auto tmp0 = in_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0924532Z auto tmp1 = in_ptr1[i1 + (8*i0)];
2023-01-11T21:38:06.0924631Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0924717Z tmp3 += tmp2;
2023-01-11T21:38:06.0924789Z }
2023-01-11T21:38:06.0924878Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0924950Z }
2023-01-11T21:38:06.0925046Z }
2023-01-11T21:38:06.0925108Z }
2023-01-11T21:38:06.0925172Z }
2023-01-11T21:38:06.0925259Z ''')
2023-01-11T21:38:06.0925265Z
2023-01-11T21:38:06.0925269Z
2023-01-11T21:38:06.0925367Z async_compile.wait(globals())
2023-01-11T21:38:06.0925446Z del async_compile
2023-01-11T21:38:06.0925451Z
2023-01-11T21:38:06.0925528Z def call(args):
2023-01-11T21:38:06.0925614Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0925684Z args.clear()
2023-01-11T21:38:06.0925879Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0926049Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0926126Z del arg0_1
2023-01-11T21:38:06.0926199Z del arg1_1
2023-01-11T21:38:06.0926278Z return (buf0, )
2023-01-11T21:38:06.0926283Z
2023-01-11T21:38:06.0926288Z
2023-01-11T21:38:06.0926372Z if __name__ == "__main__":
2023-01-11T21:38:06.0926495Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0926620Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0926819Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0927020Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0927143Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0927148Z
2023-01-11T21:38:06.0927225Z ok (1.717s)
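For orientation, the kernel above is the canonical contiguous row reduction: one Vectorized<float> accumulator per output element, collapsed with vec_reduce_all, plus a scalar remainder loop that is empty here because the row length 8 equals the vector width. A Python-level equivalent reconstructed from the (8, 8) inputs and the (8,) output buffer (the actual test body is an assumption):

import torch

def fn(a, b):
    # row sums of an elementwise add; lowers to the vectorized
    # accumulate-then-vec_reduce_all pattern in kernel_cpp_0 above
    return (a + b).sum(-1)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(8, 8), torch.randn(8, 8))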
2023-01-11T21:38:06.0927671Z test_sum2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0927805Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0928065Z [2023-01-11 21:31:58,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 272
2023-01-11T21:38:06.0928337Z [2023-01-11 21:32:00,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 272
2023-01-11T21:38:06.0928343Z
2023-01-11T21:38:06.0928437Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0928543Z import torch
2023-01-11T21:38:06.0928621Z import random
2023-01-11T21:38:06.0928745Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0928871Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0928876Z
2023-01-11T21:38:06.0928961Z aten = torch.ops.aten
2023-01-11T21:38:06.0929102Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0929200Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0929206Z
2023-01-11T21:38:06.0929275Z import triton
2023-01-11T21:38:06.0929369Z import triton.language as tl
2023-01-11T21:38:06.0929496Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0929644Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0929650Z
2023-01-11T21:38:06.0929654Z
2023-01-11T21:38:06.0929794Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0930003Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0930134Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0930246Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0930346Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0930448Z float* __restrict__ out_ptr1)
2023-01-11T21:38:06.0930516Z {
2023-01-11T21:38:06.0930622Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0930691Z {
2023-01-11T21:38:06.0930776Z #pragma omp for
2023-01-11T21:38:06.0930867Z for(long i0=0; i0<8; i0+=1)
2023-01-11T21:38:06.0930930Z {
2023-01-11T21:38:06.0931019Z #pragma GCC ivdep
2023-01-11T21:38:06.0931143Z for(long i1=0; i1<21; i1+=1)
2023-01-11T21:38:06.0931213Z {
2023-01-11T21:38:06.0931284Z {
2023-01-11T21:38:06.0931356Z {
2023-01-11T21:38:06.0931438Z float tmp3 = 0;
2023-01-11T21:38:06.0931543Z for(long i2=0; i2<27; i2+=1)
2023-01-11T21:38:06.0931617Z {
2023-01-11T21:38:06.0931694Z {
2023-01-11T21:38:06.0931812Z auto tmp0 = in_ptr0[i1 + (21*i2) + (567*i0)];
2023-01-11T21:38:06.0931932Z auto tmp1 = in_ptr1[i1 + (21*i2) + (567*i0)];
2023-01-11T21:38:06.0932041Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0932134Z tmp3 += tmp2;
2023-01-11T21:38:06.0932203Z }
2023-01-11T21:38:06.0932277Z }
2023-01-11T21:38:06.0932381Z out_ptr0[i1 + (21*i0)] = tmp3;
2023-01-11T21:38:06.0932457Z }
2023-01-11T21:38:06.0932525Z }
2023-01-11T21:38:06.0932594Z }
2023-01-11T21:38:06.0932654Z }
2023-01-11T21:38:06.0932736Z #pragma omp for
2023-01-11T21:38:06.0932824Z for(long i0=0; i0<216; i0+=1)
2023-01-11T21:38:06.0932893Z {
2023-01-11T21:38:06.0932961Z {
2023-01-11T21:38:06.0933156Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0933243Z float tmp3 = 0;
2023-01-11T21:38:06.0933376Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0933467Z for(long i1=0; i1<2; i1+=1)
2023-01-11T21:38:06.0933536Z {
2023-01-11T21:38:06.0933686Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (21*i0));
2023-01-11T21:38:06.0933832Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (21*i0));
2023-01-11T21:38:06.0933937Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0934026Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0934097Z }
2023-01-11T21:38:06.0934324Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0934446Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0934690Z for(long i1=16; i1<21; i1+=1)
2023-01-11T21:38:06.0934767Z {
2023-01-11T21:38:06.0934875Z auto tmp0 = in_ptr0[i1 + (21*i0)];
2023-01-11T21:38:06.0934977Z auto tmp1 = in_ptr1[i1 + (21*i0)];
2023-01-11T21:38:06.0935076Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0935162Z tmp3 += tmp2;
2023-01-11T21:38:06.0935225Z }
2023-01-11T21:38:06.0935316Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0935384Z }
2023-01-11T21:38:06.0935452Z }
2023-01-11T21:38:06.0935522Z }
2023-01-11T21:38:06.0935589Z }
2023-01-11T21:38:06.0935671Z ''')
2023-01-11T21:38:06.0935684Z
2023-01-11T21:38:06.0935688Z
2023-01-11T21:38:06.0935776Z async_compile.wait(globals())
2023-01-11T21:38:06.0935857Z del async_compile
2023-01-11T21:38:06.0935862Z
2023-01-11T21:38:06.0935939Z def call(args):
2023-01-11T21:38:06.0936018Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0936093Z args.clear()
2023-01-11T21:38:06.0936299Z buf0 = empty_strided((8, 21), (21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0936503Z buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0936693Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0936768Z del arg0_1
2023-01-11T21:38:06.0936842Z del arg1_1
2023-01-11T21:38:06.0936982Z return (buf0, buf1, )
2023-01-11T21:38:06.0936988Z
2023-01-11T21:38:06.0936992Z
2023-01-11T21:38:06.0937072Z if __name__ == "__main__":
2023-01-11T21:38:06.0937269Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0937400Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0937619Z arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0937821Z arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0937943Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0937948Z
2023-01-11T21:38:06.0938020Z ok (1.749s)
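test_sum2 shows both reduction strategies side by side: the (8, 21) output reduces 27 elements at stride 21 and therefore stays scalar under #pragma GCC ivdep, while the 216-row reduction over the contiguous last dim of length 21 is vectorized with a real 5-element scalar tail (i1 = 16..21). A sketch consistent with the two returned buffers (assumed, not recorded in the log):

import torch

def fn(a, b):
    s = a + b
    # (8, 21): strided reduction over dims 1-2; (8, 9, 3): contiguous
    # reduction over the last dim, whose length 21 is not a multiple of 8
    return s.sum(dim=(1, 2)), s.sum(-1)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(8, 9, 3, 21), torch.randn(8, 9, 3, 21))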
2023-01-11T21:38:06.0938464Z test_sum3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0938601Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0938863Z [2023-01-11 21:32:00,501] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 273
2023-01-11T21:38:06.0939128Z [2023-01-11 21:32:02,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 273
2023-01-11T21:38:06.0939134Z
2023-01-11T21:38:06.0939233Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0939308Z import torch
2023-01-11T21:38:06.0939383Z import random
2023-01-11T21:38:06.0939496Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0939620Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0939625Z
2023-01-11T21:38:06.0939710Z aten = torch.ops.aten
2023-01-11T21:38:06.0939847Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0939948Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0939953Z
2023-01-11T21:38:06.0940029Z import triton
2023-01-11T21:38:06.0940122Z import triton.language as tl
2023-01-11T21:38:06.0940241Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0940421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0940427Z
2023-01-11T21:38:06.0940432Z
2023-01-11T21:38:06.0940572Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0940777Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0940903Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0941010Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.0941115Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0941217Z float* __restrict__ out_ptr1,
2023-01-11T21:38:06.0941312Z float* __restrict__ out_ptr2)
2023-01-11T21:38:06.0941382Z {
2023-01-11T21:38:06.0941485Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0941552Z {
2023-01-11T21:38:06.0941631Z #pragma omp for
2023-01-11T21:38:06.0941720Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.0941789Z {
2023-01-11T21:38:06.0941854Z {
2023-01-11T21:38:06.0942046Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0942132Z float tmp3 = 0;
2023-01-11T21:38:06.0942260Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0942358Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0942429Z {
2023-01-11T21:38:06.0942579Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.0942724Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1);
2023-01-11T21:38:06.0942845Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0942961Z tmp2.store(out_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.0943049Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0943121Z }
2023-01-11T21:38:06.0943322Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0943448Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0943547Z for(long i1=8; i1<10; i1+=1)
2023-01-11T21:38:06.0943617Z {
2023-01-11T21:38:06.0943717Z auto tmp0 = in_ptr0[i1 + (10*i0)];
2023-01-11T21:38:06.0943815Z auto tmp1 = in_ptr1[i1];
2023-01-11T21:38:06.0943912Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0944011Z out_ptr0[i1 + (10*i0)] = tmp2;
2023-01-11T21:38:06.0944099Z tmp3 += tmp2;
2023-01-11T21:38:06.0944174Z }
2023-01-11T21:38:06.0944262Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0944324Z }
2023-01-11T21:38:06.0944392Z }
2023-01-11T21:38:06.0944474Z #pragma omp for
2023-01-11T21:38:06.0944562Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.0944634Z {
2023-01-11T21:38:06.0944775Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.0944916Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(10));
2023-01-11T21:38:06.0945000Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0945099Z tmp2.store(out_ptr2 + 8*i0);
2023-01-11T21:38:06.0945170Z }
2023-01-11T21:38:06.0945268Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0945356Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.0945423Z {
2023-01-11T21:38:06.0945512Z auto tmp0 = in_ptr1[i0];
2023-01-11T21:38:06.0945614Z auto tmp1 = static_cast<float>(10);
2023-01-11T21:38:06.0945705Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0945790Z out_ptr2[i0] = tmp2;
2023-01-11T21:38:06.0945858Z }
2023-01-11T21:38:06.0945925Z }
2023-01-11T21:38:06.0945991Z }
2023-01-11T21:38:06.0946070Z ''')
2023-01-11T21:38:06.0946076Z
2023-01-11T21:38:06.0946115Z
2023-01-11T21:38:06.0946204Z async_compile.wait(globals())
2023-01-11T21:38:06.0946282Z del async_compile
2023-01-11T21:38:06.0946287Z
2023-01-11T21:38:06.0946360Z def call(args):
2023-01-11T21:38:06.0946441Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.0946517Z args.clear()
2023-01-11T21:38:06.0946717Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0946910Z buf1 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0947094Z buf2 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0947310Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:06.0947388Z del arg0_1
2023-01-11T21:38:06.0947462Z del arg1_1
2023-01-11T21:38:06.0947549Z return (buf0, buf1, buf2, )
2023-01-11T21:38:06.0947554Z
2023-01-11T21:38:06.0947558Z
2023-01-11T21:38:06.0947642Z if __name__ == "__main__":
2023-01-11T21:38:06.0947763Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0947890Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0948083Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0948280Z arg1_1 = rand_strided((1, 10), (10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0948402Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.0948407Z
2023-01-11T21:38:06.0948479Z ok (1.747s)
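test_sum3 fuses three outputs into one kernel: the broadcast add is stored (out_ptr0) and reduced on the fly (out_ptr1), and the second input gets an independent +10 (out_ptr2). A sketch matching the (10, 10) and (1, 10) inputs (assumed):

import torch

def fn(a, b):
    c = a + b              # stored and simultaneously row-reduced
    return c, c.sum(-1), b + 10

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(10, 10), torch.randn(1, 10))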
2023-01-11T21:38:06.0948918Z test_sum4_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0949084Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0949344Z [2023-01-11 21:32:02,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 274
2023-01-11T21:38:06.0949610Z [2023-01-11 21:32:03,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 274
2023-01-11T21:38:06.0949615Z
2023-01-11T21:38:06.0949715Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0949783Z import torch
2023-01-11T21:38:06.0949859Z import random
2023-01-11T21:38:06.0949979Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0950104Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0950112Z
2023-01-11T21:38:06.0950195Z aten = torch.ops.aten
2023-01-11T21:38:06.0950334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0950429Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0950434Z
2023-01-11T21:38:06.0950509Z import triton
2023-01-11T21:38:06.0950599Z import triton.language as tl
2023-01-11T21:38:06.0950726Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0950866Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0950872Z
2023-01-11T21:38:06.0950876Z
2023-01-11T21:38:06.0951015Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0951220Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0951345Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0951450Z float* __restrict__ out_ptr0,
2023-01-11T21:38:06.0951556Z float* __restrict__ out_ptr1,
2023-01-11T21:38:06.0951652Z float* __restrict__ out_ptr2,
2023-01-11T21:38:06.0951753Z float* __restrict__ out_ptr3,
2023-01-11T21:38:06.0951853Z float* __restrict__ out_ptr4)
2023-01-11T21:38:06.0951920Z {
2023-01-11T21:38:06.0952049Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0952119Z {
2023-01-11T21:38:06.0952203Z #pragma omp for
2023-01-11T21:38:06.0952286Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0952355Z {
2023-01-11T21:38:06.0952496Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.0952635Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0952727Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0952824Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.0952893Z }
2023-01-11T21:38:06.0952986Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0953082Z for(long i0=1024; i0<1024; i0+=1)
2023-01-11T21:38:06.0953151Z {
2023-01-11T21:38:06.0953241Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0953348Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0953438Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0953525Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0953587Z }
2023-01-11T21:38:06.0953669Z #pragma omp for
2023-01-11T21:38:06.0953759Z for(long i0=0; i0<128; i0+=1)
2023-01-11T21:38:06.0953827Z {
2023-01-11T21:38:06.0953895Z {
2023-01-11T21:38:06.0954087Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0954172Z float tmp1 = 0;
2023-01-11T21:38:06.0954292Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1);
2023-01-11T21:38:06.0954387Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0954485Z {
2023-01-11T21:38:06.0954637Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0954727Z tmp1_vec += tmp0;
2023-01-11T21:38:06.0954797Z }
2023-01-11T21:38:06.0955002Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp1_vec);
2023-01-11T21:38:06.0955132Z #pragma omp simd simdlen(4) reduction(+:tmp1)
2023-01-11T21:38:06.0955221Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0955291Z {
2023-01-11T21:38:06.0955398Z auto tmp0 = out_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0955482Z tmp1 += tmp0;
2023-01-11T21:38:06.0955551Z }
2023-01-11T21:38:06.0955638Z out_ptr1[i0] = tmp1;
2023-01-11T21:38:06.0955707Z }
2023-01-11T21:38:06.0955767Z }
2023-01-11T21:38:06.0955857Z #pragma omp for
2023-01-11T21:38:06.0955944Z for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:06.0956011Z {
2023-01-11T21:38:06.0956150Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0956294Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:06.0956386Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0956476Z tmp2.store(out_ptr2 + 8*i0);
2023-01-11T21:38:06.0956546Z }
2023-01-11T21:38:06.0956648Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0956734Z for(long i0=128; i0<128; i0+=1)
2023-01-11T21:38:06.0956800Z {
2023-01-11T21:38:06.0956891Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0956996Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:06.0957079Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0957164Z out_ptr2[i0] = tmp2;
2023-01-11T21:38:06.0957234Z }
2023-01-11T21:38:06.0957316Z #pragma omp for
2023-01-11T21:38:06.0957406Z for(long i0=0; i0<16; i0+=1)
2023-01-11T21:38:06.0957473Z {
2023-01-11T21:38:06.0957541Z {
2023-01-11T21:38:06.0957726Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0957839Z float tmp1 = 0;
2023-01-11T21:38:06.0957967Z auto tmp1_vec = at::vec::Vectorized<float>(tmp1);
2023-01-11T21:38:06.0958061Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0958132Z {
2023-01-11T21:38:06.0958280Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr2 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0958367Z tmp1_vec += tmp0;
2023-01-11T21:38:06.0958438Z }
2023-01-11T21:38:06.0958629Z tmp1 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp1_vec);
2023-01-11T21:38:06.0958762Z #pragma omp simd simdlen(4) reduction(+:tmp1)
2023-01-11T21:38:06.0958856Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0958926Z {
2023-01-11T21:38:06.0959034Z auto tmp0 = out_ptr2[i1 + (8*i0)];
2023-01-11T21:38:06.0959119Z tmp1 += tmp0;
2023-01-11T21:38:06.0959189Z }
2023-01-11T21:38:06.0959269Z out_ptr3[i0] = tmp1;
2023-01-11T21:38:06.0959336Z }
2023-01-11T21:38:06.0959406Z }
2023-01-11T21:38:06.0959487Z #pragma omp for
2023-01-11T21:38:06.0959573Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0959641Z {
2023-01-11T21:38:06.0959780Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr3 + 8*i0);
2023-01-11T21:38:06.0959913Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5));
2023-01-11T21:38:06.0960003Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0960142Z tmp2.store(out_ptr4 + 8*i0);
2023-01-11T21:38:06.0960212Z }
2023-01-11T21:38:06.0960312Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0960399Z for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.0960467Z {
2023-01-11T21:38:06.0960550Z auto tmp0 = out_ptr3[i0];
2023-01-11T21:38:06.0960655Z auto tmp1 = static_cast<float>(5);
2023-01-11T21:38:06.0960747Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0960833Z out_ptr4[i0] = tmp2;
2023-01-11T21:38:06.0960901Z }
2023-01-11T21:38:06.0960968Z }
2023-01-11T21:38:06.0961034Z }
2023-01-11T21:38:06.0961116Z ''')
2023-01-11T21:38:06.0961121Z
2023-01-11T21:38:06.0961126Z
2023-01-11T21:38:06.0961218Z async_compile.wait(globals())
2023-01-11T21:38:06.0961296Z del async_compile
2023-01-11T21:38:06.0961302Z
2023-01-11T21:38:06.0961379Z def call(args):
2023-01-11T21:38:06.0961457Z arg0_1, = args
2023-01-11T21:38:06.0961534Z args.clear()
2023-01-11T21:38:06.0961754Z buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0961955Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962154Z buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962354Z buf3 = empty_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962552Z buf4 = empty_strided((1, 16), (16, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0962791Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()))
2023-01-11T21:38:06.0962865Z del arg0_1
2023-01-11T21:38:06.0962968Z return (buf4, buf3, buf2, buf1, buf0, )
2023-01-11T21:38:06.0962974Z
2023-01-11T21:38:06.0962978Z
2023-01-11T21:38:06.0963059Z if __name__ == "__main__":
2023-01-11T21:38:06.0963182Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0963304Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0963518Z arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0963632Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0963669Z
2023-01-11T21:38:06.0963745Z ok (1.768s)
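test_sum4 chains add/reduce pairs and returns every intermediate, so nothing can be fused away; note that all the scalar remainder loops are degenerate (e.g. for(long i0=1024; i0<1024; ...)) because 1024, 128 and 16 are multiples of the 8-lane vector width. A sketch consistent with the five returned buffers (assumed):

import torch

def fn(x):
    a = x + 1
    a_sum = a.sum(-1)
    b = a_sum + 3
    b_sum = b.sum(-1)
    return b_sum + 5, b_sum, b, a_sum, a

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(1, 16, 8, 8))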
2023-01-11T21:38:06.0964194Z test_sum5_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0964327Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0964588Z [2023-01-11 21:32:04,017] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 275
2023-01-11T21:38:06.0964854Z [2023-01-11 21:32:05,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 275
2023-01-11T21:38:06.0964860Z
2023-01-11T21:38:06.0964958Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0965027Z import torch
2023-01-11T21:38:06.0965104Z import random
2023-01-11T21:38:06.0965227Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0965352Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0965358Z
2023-01-11T21:38:06.0965442Z aten = torch.ops.aten
2023-01-11T21:38:06.0965580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0965675Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0965681Z
2023-01-11T21:38:06.0965756Z import triton
2023-01-11T21:38:06.0965843Z import triton.language as tl
2023-01-11T21:38:06.0965968Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0966107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0966139Z
2023-01-11T21:38:06.0966144Z
2023-01-11T21:38:06.0966284Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0966492Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0966617Z extern "C" void kernel(float* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0966729Z const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0966832Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.0966892Z {
2023-01-11T21:38:06.0966985Z auto out_ptr1 = in_out_ptr0;
2023-01-11T21:38:06.0967088Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0967155Z {
2023-01-11T21:38:06.0967238Z #pragma omp for
2023-01-11T21:38:06.0967326Z for(long i0=0; i0<136; i0+=1)
2023-01-11T21:38:06.0967397Z {
2023-01-11T21:38:06.0967459Z {
2023-01-11T21:38:06.0967653Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0967743Z float tmp3 = 0;
2023-01-11T21:38:06.0967872Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0967968Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0968041Z {
2023-01-11T21:38:06.0968190Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (9*i0));
2023-01-11T21:38:06.0968336Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.0968428Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0968515Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0968585Z }
2023-01-11T21:38:06.0968784Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0968913Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0969010Z for(long i1=8; i1<9; i1+=1)
2023-01-11T21:38:06.0969081Z {
2023-01-11T21:38:06.0969179Z auto tmp0 = in_ptr0[i1 + (9*i0)];
2023-01-11T21:38:06.0969291Z auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.0969414Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0969500Z tmp3 += tmp2;
2023-01-11T21:38:06.0969570Z }
2023-01-11T21:38:06.0969658Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.0969728Z }
2023-01-11T21:38:06.0969790Z }
2023-01-11T21:38:06.0969873Z #pragma omp for
2023-01-11T21:38:06.0969959Z for(long i0=0; i0<17; i0+=1)
2023-01-11T21:38:06.0970027Z {
2023-01-11T21:38:06.0970095Z {
2023-01-11T21:38:06.0970286Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}})
2023-01-11T21:38:06.0970374Z float tmp3 = 0;
2023-01-11T21:38:06.0970494Z auto tmp3_vec = at::vec::Vectorized<float>(tmp3);
2023-01-11T21:38:06.0970591Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.0970661Z {
2023-01-11T21:38:06.0970812Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr0 + (8*i0) + (8*i1));
2023-01-11T21:38:06.0970957Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(3));
2023-01-11T21:38:06.0971055Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0971143Z tmp3_vec += tmp2;
2023-01-11T21:38:06.0971214Z }
2023-01-11T21:38:06.0971404Z tmp3 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp3_vec);
2023-01-11T21:38:06.0971530Z #pragma omp simd simdlen(4) reduction(+:tmp3)
2023-01-11T21:38:06.0971626Z for(long i1=8; i1<8; i1+=1)
2023-01-11T21:38:06.0971724Z {
2023-01-11T21:38:06.0971831Z auto tmp0 = out_ptr0[i1 + (8*i0)];
2023-01-11T21:38:06.0971941Z auto tmp1 = static_cast<float>(3);
2023-01-11T21:38:06.0972037Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0972122Z tmp3 += tmp2;
2023-01-11T21:38:06.0972188Z }
2023-01-11T21:38:06.0972276Z out_ptr1[i0] = tmp3;
2023-01-11T21:38:06.0972346Z }
2023-01-11T21:38:06.0972416Z }
2023-01-11T21:38:06.0972498Z #pragma omp for
2023-01-11T21:38:06.0972586Z for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.0972647Z {
2023-01-11T21:38:06.0972790Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + 8*i0);
2023-01-11T21:38:06.0972928Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5));
2023-01-11T21:38:06.0973020Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0973123Z tmp2.store(in_out_ptr0 + 8*i0);
2023-01-11T21:38:06.0973195Z }
2023-01-11T21:38:06.0973297Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.0973387Z for(long i0=16; i0<17; i0+=1)
2023-01-11T21:38:06.0973447Z {
2023-01-11T21:38:06.0973539Z auto tmp0 = out_ptr1[i0];
2023-01-11T21:38:06.0973645Z auto tmp1 = static_cast<float>(5);
2023-01-11T21:38:06.0973734Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.0973822Z in_out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0973889Z }
2023-01-11T21:38:06.0973949Z }
2023-01-11T21:38:06.0974015Z }
2023-01-11T21:38:06.0974102Z ''')
2023-01-11T21:38:06.0974108Z
2023-01-11T21:38:06.0974112Z
2023-01-11T21:38:06.0974207Z async_compile.wait(globals())
2023-01-11T21:38:06.0974285Z del async_compile
2023-01-11T21:38:06.0974290Z
2023-01-11T21:38:06.0974366Z def call(args):
2023-01-11T21:38:06.0974441Z arg0_1, = args
2023-01-11T21:38:06.0974636Z args.clear()
2023-01-11T21:38:06.0974842Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0975041Z buf1 = empty_strided((1, 17), (17, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0975131Z buf2 = buf1; del buf1 # reuse
2023-01-11T21:38:06.0975382Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.0975458Z del arg0_1
2023-01-11T21:38:06.0975533Z return (buf2, )
2023-01-11T21:38:06.0975539Z
2023-01-11T21:38:06.0975543Z
2023-01-11T21:38:06.0975624Z if __name__ == "__main__":
2023-01-11T21:38:06.0975746Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0975866Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0976082Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0976192Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0976197Z
2023-01-11T21:38:06.0976267Z ok (1.769s)
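test_sum5 is the same chained pattern with extents 9 and 17 that are not multiples of 8, so the scalar remainder loops (i1 = 8..9, i0 = 16..17) actually run, and since only the final value escapes, the last add writes in place through in_out_ptr0. A sketch (assumed):

import torch

def fn(x):
    # only the final tensor is returned, so the +5 can be done in place
    return ((x + 1).sum(-1) + 3).sum(-1) + 5

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(1, 17, 8, 9))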
2023-01-11T21:38:06.0976729Z test_sum_dtype_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0976859Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0977116Z [2023-01-11 21:32:05,788] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 276
2023-01-11T21:38:06.0977474Z [2023-01-11 21:32:07,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 276
2023-01-11T21:38:06.0977481Z
2023-01-11T21:38:06.0977600Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0977673Z import torch
2023-01-11T21:38:06.0977750Z import random
2023-01-11T21:38:06.0977918Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0978045Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0978050Z
2023-01-11T21:38:06.0978132Z aten = torch.ops.aten
2023-01-11T21:38:06.0978268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0978366Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0978374Z
2023-01-11T21:38:06.0978449Z import triton
2023-01-11T21:38:06.0978535Z import triton.language as tl
2023-01-11T21:38:06.0978659Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0978799Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0978805Z
2023-01-11T21:38:06.0978809Z
2023-01-11T21:38:06.0978945Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0979149Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0979272Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.0979383Z double* __restrict__ out_ptr0,
2023-01-11T21:38:06.0979492Z double* __restrict__ out_ptr1,
2023-01-11T21:38:06.0979587Z double* __restrict__ out_ptr2)
2023-01-11T21:38:06.0979652Z {
2023-01-11T21:38:06.0979756Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0979828Z {
2023-01-11T21:38:06.0979909Z #pragma omp for
2023-01-11T21:38:06.0980002Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0980062Z {
2023-01-11T21:38:06.0980132Z {
2023-01-11T21:38:06.0980202Z {
2023-01-11T21:38:06.0980288Z double tmp2 = 0;
2023-01-11T21:38:06.0980385Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:06.0980456Z {
2023-01-11T21:38:06.0980530Z {
2023-01-11T21:38:06.0980633Z auto tmp0 = in_ptr0[i1 + (32*i0)];
2023-01-11T21:38:06.0980755Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0980846Z tmp2 += tmp1;
2023-01-11T21:38:06.0980919Z }
2023-01-11T21:38:06.0980991Z }
2023-01-11T21:38:06.0981082Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.0981152Z }
2023-01-11T21:38:06.0981244Z }
2023-01-11T21:38:06.0981312Z }
2023-01-11T21:38:06.0981378Z }
2023-01-11T21:38:06.0981443Z {
2023-01-11T21:38:06.0981512Z {
2023-01-11T21:38:06.0981595Z double tmp2 = 0;
2023-01-11T21:38:06.0981700Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0981770Z {
2023-01-11T21:38:06.0981877Z #pragma omp for reduction(+:tmp2)
2023-01-11T21:38:06.0981973Z for(long i0=0; i0<1024; i0+=1)
2023-01-11T21:38:06.0982042Z {
2023-01-11T21:38:06.0982113Z {
2023-01-11T21:38:06.0982215Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0982328Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0982413Z tmp2 += tmp1;
2023-01-11T21:38:06.0982484Z }
2023-01-11T21:38:06.0982554Z }
2023-01-11T21:38:06.0982627Z }
2023-01-11T21:38:06.0982710Z out_ptr1[0] = tmp2;
2023-01-11T21:38:06.0982780Z }
2023-01-11T21:38:06.0982838Z }
2023-01-11T21:38:06.0982941Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0983005Z {
2023-01-11T21:38:06.0983086Z #pragma omp for
2023-01-11T21:38:06.0983172Z for(long i0=0; i0<32; i0+=1)
2023-01-11T21:38:06.0983239Z {
2023-01-11T21:38:06.0983324Z #pragma GCC ivdep
2023-01-11T21:38:06.0983407Z for(long i1=0; i1<32; i1+=1)
2023-01-11T21:38:06.0983475Z {
2023-01-11T21:38:06.0983549Z {
2023-01-11T21:38:06.0983620Z {
2023-01-11T21:38:06.0983730Z auto tmp0 = in_ptr0[i1 + (32*i0)];
2023-01-11T21:38:06.0983863Z auto tmp2 = out_ptr0[i1];
2023-01-11T21:38:06.0983962Z auto tmp4 = out_ptr1[0];
2023-01-11T21:38:06.0984071Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.0984170Z auto tmp3 = tmp1 * tmp2;
2023-01-11T21:38:06.0984273Z auto tmp5 = tmp3 + tmp4;
2023-01-11T21:38:06.0984378Z out_ptr2[i1 + (32*i0)] = tmp5;
2023-01-11T21:38:06.0984449Z }
2023-01-11T21:38:06.0984518Z }
2023-01-11T21:38:06.0984586Z }
2023-01-11T21:38:06.0984645Z }
2023-01-11T21:38:06.0984711Z }
2023-01-11T21:38:06.0984776Z }
2023-01-11T21:38:06.0984862Z ''')
2023-01-11T21:38:06.0984868Z
2023-01-11T21:38:06.0984872Z
2023-01-11T21:38:06.0984968Z async_compile.wait(globals())
2023-01-11T21:38:06.0985044Z del async_compile
2023-01-11T21:38:06.0985049Z
2023-01-11T21:38:06.0985124Z def call(args):
2023-01-11T21:38:06.0985193Z arg0_1, = args
2023-01-11T21:38:06.0985268Z args.clear()
2023-01-11T21:38:06.0985463Z buf0 = empty_strided((32, ), (1, ), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0985653Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0985855Z buf2 = empty_strided((32, 32), (32, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.0986052Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()))
2023-01-11T21:38:06.0986125Z del arg0_1
2023-01-11T21:38:06.0986194Z return (buf2, )
2023-01-11T21:38:06.0986207Z
2023-01-11T21:38:06.0986211Z
2023-01-11T21:38:06.0986285Z if __name__ == "__main__":
2023-01-11T21:38:06.0986403Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0986530Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0986730Z arg0_1 = rand_strided((32, 32), (32, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.0986849Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0986854Z
2023-01-11T21:38:06.0986924Z ok (1.740s)
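test_sum_dtype exercises sum's dtype= argument: the float32 input is upcast element by element (static_cast<double>(tmp0)) and accumulated in double, and all output buffers are float64. A sketch that would produce the three loops above, row sums, a full sum, and the broadcasting multiply-add consumer (the exact expression is an assumption):

import torch

def fn(x):
    # both reductions accumulate in float64; broadcasting the (32,) row
    # sums against (32, 32) indexes them by column, as out_ptr0[i1] does
    return x * x.sum(-1, dtype=torch.float64) + x.sum(dtype=torch.float64)

opt_fn = torch._dynamo.optimize("inductor")(fn)
opt_fn(torch.randn(32, 32))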
2023-01-11T21:38:06.0987406Z test_sum_int_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0987540Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0987799Z [2023-01-11 21:32:07,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 277
2023-01-11T21:38:06.0988057Z [2023-01-11 21:32:09,232] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 277
2023-01-11T21:38:06.0988477Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0988609Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0988867Z [2023-01-11 21:32:09,250] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 278
2023-01-11T21:38:06.0989130Z [2023-01-11 21:32:10,966] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 278
2023-01-11T21:38:06.0989541Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.0989702Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.0989956Z [2023-01-11 21:32:10,984] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 279
2023-01-11T21:38:06.0990216Z [2023-01-11 21:32:12,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 279
2023-01-11T21:38:06.0990222Z
2023-01-11T21:38:06.0990320Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0990392Z import torch
2023-01-11T21:38:06.0990460Z import random
2023-01-11T21:38:06.0990581Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0990703Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0990709Z
2023-01-11T21:38:06.0990791Z aten = torch.ops.aten
2023-01-11T21:38:06.0990928Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0991028Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0991034Z
2023-01-11T21:38:06.0991110Z import triton
2023-01-11T21:38:06.0991196Z import triton.language as tl
2023-01-11T21:38:06.0991320Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0991461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0991469Z
2023-01-11T21:38:06.0991473Z
2023-01-11T21:38:06.0991610Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0991816Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0991938Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0992047Z const bool* __restrict__ in_ptr0,
2023-01-11T21:38:06.0992146Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.0992204Z {
2023-01-11T21:38:06.0992295Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.0992361Z {
2023-01-11T21:38:06.0992428Z {
2023-01-11T21:38:06.0992510Z long tmp2 = 0;
2023-01-11T21:38:06.0992589Z long tmp3 = 0;
2023-01-11T21:38:06.0992699Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0992760Z {
2023-01-11T21:38:06.0992893Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.0993016Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0993087Z {
2023-01-11T21:38:06.0993161Z {
2023-01-11T21:38:06.0993262Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.0993377Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.0993455Z tmp2 += tmp1;
2023-01-11T21:38:06.0993538Z tmp3 += tmp1;
2023-01-11T21:38:06.0993608Z }
2023-01-11T21:38:06.0993675Z }
2023-01-11T21:38:06.0993743Z }
2023-01-11T21:38:06.0993826Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.0993907Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.0993975Z }
2023-01-11T21:38:06.0994040Z }
2023-01-11T21:38:06.0994104Z {
2023-01-11T21:38:06.0994174Z {
2023-01-11T21:38:06.0994265Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.0994353Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.0994451Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.0994542Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.0994629Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.0994716Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.0994782Z }
2023-01-11T21:38:06.0994858Z }
2023-01-11T21:38:06.0994937Z }
2023-01-11T21:38:06.0995021Z ''')
2023-01-11T21:38:06.0995028Z
2023-01-11T21:38:06.0995033Z
2023-01-11T21:38:06.0995142Z async_compile.wait(globals())
2023-01-11T21:38:06.0995216Z del async_compile
2023-01-11T21:38:06.0995221Z
2023-01-11T21:38:06.0995296Z def call(args):
2023-01-11T21:38:06.0995370Z arg0_1, = args
2023-01-11T21:38:06.0995477Z args.clear()
2023-01-11T21:38:06.0995665Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0995838Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.0995930Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.0996102Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.0996175Z del arg0_1
2023-01-11T21:38:06.0996249Z return (buf2, )
2023-01-11T21:38:06.0996255Z
2023-01-11T21:38:06.0996259Z
2023-01-11T21:38:06.0996339Z if __name__ == "__main__":
2023-01-11T21:38:06.0996458Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.0996584Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.0996765Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.bool)
2023-01-11T21:38:06.0996877Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.0996882Z
2023-01-11T21:38:06.0996890Z
2023-01-11T21:38:06.0996991Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.0997067Z import torch
2023-01-11T21:38:06.0997143Z import random
2023-01-11T21:38:06.0997262Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.0997385Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.0997390Z
2023-01-11T21:38:06.0997478Z aten = torch.ops.aten
2023-01-11T21:38:06.0997607Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.0997702Z async_compile = AsyncCompile()
2023-01-11T21:38:06.0997707Z
2023-01-11T21:38:06.0997780Z import triton
2023-01-11T21:38:06.0997872Z import triton.language as tl
2023-01-11T21:38:06.0997998Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.0998136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.0998141Z
2023-01-11T21:38:06.0998146Z
2023-01-11T21:38:06.0998282Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.0998489Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.0998604Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.0998727Z const unsigned char* __restrict__ in_ptr0,
2023-01-11T21:38:06.0998829Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.0998925Z {
2023-01-11T21:38:06.0999018Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.0999084Z {
2023-01-11T21:38:06.0999150Z {
2023-01-11T21:38:06.0999223Z long tmp2 = 0;
2023-01-11T21:38:06.0999302Z long tmp3 = 0;
2023-01-11T21:38:06.0999413Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.0999480Z {
2023-01-11T21:38:06.0999612Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.0999706Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.0999775Z {
2023-01-11T21:38:06.0999839Z {
2023-01-11T21:38:06.0999940Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1000054Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.1000144Z tmp2 += tmp1;
2023-01-11T21:38:06.1000228Z tmp3 += tmp1;
2023-01-11T21:38:06.1000297Z }
2023-01-11T21:38:06.1000367Z }
2023-01-11T21:38:06.1000428Z }
2023-01-11T21:38:06.1000514Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.1000597Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.1000663Z }
2023-01-11T21:38:06.1000729Z }
2023-01-11T21:38:06.1000794Z {
2023-01-11T21:38:06.1000853Z {
2023-01-11T21:38:06.1000942Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.1001029Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.1001132Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.1001220Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.1001308Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1001432Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.1001491Z }
2023-01-11T21:38:06.1001557Z }
2023-01-11T21:38:06.1001620Z }
2023-01-11T21:38:06.1001705Z ''')
2023-01-11T21:38:06.1001710Z
2023-01-11T21:38:06.1001714Z
2023-01-11T21:38:06.1001806Z async_compile.wait(globals())
2023-01-11T21:38:06.1001886Z del async_compile
2023-01-11T21:38:06.1001891Z
2023-01-11T21:38:06.1001965Z def call(args):
2023-01-11T21:38:06.1002032Z arg0_1, = args
2023-01-11T21:38:06.1002106Z args.clear()
2023-01-11T21:38:06.1002286Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1002464Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1002553Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.1002722Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.1002796Z del arg0_1
2023-01-11T21:38:06.1002864Z return (buf2, )
2023-01-11T21:38:06.1002882Z
2023-01-11T21:38:06.1002887Z
2023-01-11T21:38:06.1002960Z if __name__ == "__main__":
2023-01-11T21:38:06.1003076Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1003203Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1003396Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.uint8)
2023-01-11T21:38:06.1003508Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.1003513Z
2023-01-11T21:38:06.1003517Z
2023-01-11T21:38:06.1003615Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1003687Z import torch
2023-01-11T21:38:06.1003754Z import random
2023-01-11T21:38:06.1003873Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1003994Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1003999Z
2023-01-11T21:38:06.1004080Z aten = torch.ops.aten
2023-01-11T21:38:06.1004217Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1004314Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1004319Z
2023-01-11T21:38:06.1004395Z import triton
2023-01-11T21:38:06.1004489Z import triton.language as tl
2023-01-11T21:38:06.1004607Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1004744Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1004776Z
2023-01-11T21:38:06.1004781Z
2023-01-11T21:38:06.1004917Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1005125Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1005243Z extern "C" void kernel(long* __restrict__ in_out_ptr0,
2023-01-11T21:38:06.1005350Z const int* __restrict__ in_ptr0,
2023-01-11T21:38:06.1005453Z long* __restrict__ out_ptr1)
2023-01-11T21:38:06.1005516Z {
2023-01-11T21:38:06.1005599Z auto out_ptr0 = in_out_ptr0;
2023-01-11T21:38:06.1005664Z {
2023-01-11T21:38:06.1005729Z {
2023-01-11T21:38:06.1005812Z long tmp2 = 0;
2023-01-11T21:38:06.1005891Z long tmp3 = 0;
2023-01-11T21:38:06.1006002Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1006070Z {
2023-01-11T21:38:06.1006195Z #pragma omp for reduction(+:tmp2) reduction(+:tmp3)
2023-01-11T21:38:06.1006291Z for(long i0=0; i0<64; i0+=1)
2023-01-11T21:38:06.1006360Z {
2023-01-11T21:38:06.1006428Z {
2023-01-11T21:38:06.1006531Z auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1006645Z auto tmp1 = static_cast<long>(tmp0);
2023-01-11T21:38:06.1006730Z tmp2 += tmp1;
2023-01-11T21:38:06.1006806Z tmp3 += tmp1;
2023-01-11T21:38:06.1006876Z }
2023-01-11T21:38:06.1006945Z }
2023-01-11T21:38:06.1007012Z }
2023-01-11T21:38:06.1007096Z out_ptr0[0] = tmp2;
2023-01-11T21:38:06.1007206Z out_ptr1[0] = tmp3;
2023-01-11T21:38:06.1007266Z }
2023-01-11T21:38:06.1007332Z }
2023-01-11T21:38:06.1007397Z {
2023-01-11T21:38:06.1007463Z {
2023-01-11T21:38:06.1007551Z auto tmp0 = out_ptr0[0];
2023-01-11T21:38:06.1007644Z auto tmp3 = out_ptr1[0];
2023-01-11T21:38:06.1007750Z auto tmp1 = static_cast<long>(2);
2023-01-11T21:38:06.1007833Z auto tmp2 = tmp0 * tmp1;
2023-01-11T21:38:06.1007922Z auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1008008Z in_out_ptr0[0] = tmp4;
2023-01-11T21:38:06.1008076Z }
2023-01-11T21:38:06.1008141Z }
2023-01-11T21:38:06.1008205Z }
2023-01-11T21:38:06.1008282Z ''')
2023-01-11T21:38:06.1008294Z
2023-01-11T21:38:06.1008298Z
2023-01-11T21:38:06.1008385Z async_compile.wait(globals())
2023-01-11T21:38:06.1008462Z del async_compile
2023-01-11T21:38:06.1008467Z
2023-01-11T21:38:06.1008541Z def call(args):
2023-01-11T21:38:06.1008615Z arg0_1, = args
2023-01-11T21:38:06.1008694Z args.clear()
2023-01-11T21:38:06.1008876Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1009053Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64)
2023-01-11T21:38:06.1009135Z buf2 = buf0; del buf0 # reuse
2023-01-11T21:38:06.1009300Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()))
2023-01-11T21:38:06.1009374Z del arg0_1
2023-01-11T21:38:06.1009449Z return (buf2, )
2023-01-11T21:38:06.1009454Z
2023-01-11T21:38:06.1009459Z
2023-01-11T21:38:06.1009539Z if __name__ == "__main__":
2023-01-11T21:38:06.1009655Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1009782Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1009974Z arg0_1 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int32)
2023-01-11T21:38:06.1010078Z print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.1010083Z
2023-01-11T21:38:06.1010155Z ok (5.178s)
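The three kernels above are one graph specialized for bool, uint8 and int32 inputs; in each case the accumulators are C long, matching PyTorch's promotion of integral sums to int64, and the epilogue combines the two reductions as 2*s + s into the reused in_out_ptr0. A sketch of the presumed common test body (the combination is read off the epilogue; the actual test source is an assumption):

import torch

def fn(x):
    # integral sums promote to int64 ("long" in the generated kernels)
    return x.sum() * 2 + x.sum(0)

opt_fn = torch._dynamo.optimize("inductor")(fn)
for dtype in (torch.bool, torch.uint8, torch.int32):
    opt_fn((torch.arange(64) % 2).to(dtype))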
dtype=torch.int32) 2023-01-11T21:38:06.1010078Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1010083Z 2023-01-11T21:38:06.1010155Z ok (5.178s) 2023-01-11T21:38:06.1010640Z test_sum_keepdims_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1010774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1011032Z [2023-01-11 21:32:12,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 280 2023-01-11T21:38:06.1011295Z [2023-01-11 21:32:12,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 280 2023-01-11T21:38:06.1011301Z 2023-01-11T21:38:06.1011399Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1011473Z import torch 2023-01-11T21:38:06.1011550Z import random 2023-01-11T21:38:06.1011664Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1011786Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1011791Z 2023-01-11T21:38:06.1011873Z aten = torch.ops.aten 2023-01-11T21:38:06.1012007Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1012106Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1012112Z 2023-01-11T21:38:06.1012186Z import triton 2023-01-11T21:38:06.1012280Z import triton.language as tl 2023-01-11T21:38:06.1012404Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1012536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1012542Z 2023-01-11T21:38:06.1012546Z 2023-01-11T21:38:06.1012684Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1012891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1013014Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1013152Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1013256Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1013323Z { 2023-01-11T21:38:06.1013426Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1013485Z { 2023-01-11T21:38:06.1013568Z #pragma omp for 2023-01-11T21:38:06.1013656Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1013722Z { 2023-01-11T21:38:06.1013790Z { 2023-01-11T21:38:06.1013985Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1014071Z float tmp3 = 0; 2023-01-11T21:38:06.1014192Z auto tmp3_vec = at::vec::Vectorized(tmp3); 2023-01-11T21:38:06.1014287Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1014356Z { 2023-01-11T21:38:06.1014619Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1014772Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1014867Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1014955Z tmp3_vec += tmp2; 2023-01-11T21:38:06.1015016Z } 2023-01-11T21:38:06.1015220Z tmp3 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp3_vec); 2023-01-11T21:38:06.1015350Z #pragma omp simd simdlen(4) reduction(+:tmp3) 2023-01-11T21:38:06.1015445Z for(long i1=8; i1<8; i1+=1) 
2023-01-11T21:38:06.1015515Z { 2023-01-11T21:38:06.1015621Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1015730Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.1015825Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1015901Z tmp3 += tmp2; 2023-01-11T21:38:06.1015973Z } 2023-01-11T21:38:06.1016059Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1016125Z } 2023-01-11T21:38:06.1016190Z } 2023-01-11T21:38:06.1016254Z } 2023-01-11T21:38:06.1016317Z } 2023-01-11T21:38:06.1016397Z ''') 2023-01-11T21:38:06.1016402Z 2023-01-11T21:38:06.1016448Z 2023-01-11T21:38:06.1016542Z async_compile.wait(globals()) 2023-01-11T21:38:06.1016619Z del async_compile 2023-01-11T21:38:06.1016623Z 2023-01-11T21:38:06.1016696Z def call(args): 2023-01-11T21:38:06.1016776Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1016849Z args.clear() 2023-01-11T21:38:06.1017048Z buf0 = empty_strided((8, 1), (1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1017268Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1017345Z del arg0_1 2023-01-11T21:38:06.1017417Z del arg1_1 2023-01-11T21:38:06.1017494Z return (buf0, ) 2023-01-11T21:38:06.1017503Z 2023-01-11T21:38:06.1017507Z 2023-01-11T21:38:06.1017587Z if __name__ == "__main__": 2023-01-11T21:38:06.1017705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1017832Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1018032Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1018221Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1018343Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1018348Z 2023-01-11T21:38:06.1018418Z ok (0.025s) 2023-01-11T21:38:06.1018866Z test_tanh_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1019042Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1019302Z [2023-01-11 21:32:12,739] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 281 2023-01-11T21:38:06.1019567Z [2023-01-11 21:32:14,415] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 281 2023-01-11T21:38:06.1019573Z 2023-01-11T21:38:06.1019672Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1019746Z import torch 2023-01-11T21:38:06.1019814Z import random 2023-01-11T21:38:06.1019934Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1020057Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1020062Z 2023-01-11T21:38:06.1020142Z aten = torch.ops.aten 2023-01-11T21:38:06.1020278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1020375Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1020384Z 2023-01-11T21:38:06.1020457Z import triton 2023-01-11T21:38:06.1020549Z import triton.language as tl 2023-01-11T21:38:06.1020667Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1020808Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1020814Z 2023-01-11T21:38:06.1020818Z 2023-01-11T21:38:06.1020957Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1021162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1021288Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1021393Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1021494Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1021560Z { 2023-01-11T21:38:06.1021655Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1021722Z { 2023-01-11T21:38:06.1021804Z #pragma omp for 2023-01-11T21:38:06.1021893Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.1021961Z { 2023-01-11T21:38:06.1022101Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1022191Z auto tmp1 = tmp0.tanh(); 2023-01-11T21:38:06.1022320Z auto tmp2 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.1022442Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1022578Z auto tmp4 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1022667Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.1022756Z auto tmp6 = tmp5.tanh(); 2023-01-11T21:38:06.1022852Z tmp3.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1022945Z tmp6.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1023006Z } 2023-01-11T21:38:06.1023106Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1023193Z for(long i0=256; i0<256; i0+=1) 2023-01-11T21:38:06.1023257Z { 2023-01-11T21:38:06.1023349Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1023447Z auto tmp1 = std::tanh(tmp0); 2023-01-11T21:38:06.1023558Z auto tmp2 = static_cast<float>(2); 2023-01-11T21:38:06.1023640Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1023743Z auto tmp4 = static_cast<float>(1); 2023-01-11T21:38:06.1023834Z auto tmp5 = tmp0 + tmp4; 2023-01-11T21:38:06.1023933Z auto tmp6 = std::tanh(tmp5); 2023-01-11T21:38:06.1024020Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1024101Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1024169Z } 2023-01-11T21:38:06.1024228Z } 2023-01-11T21:38:06.1024292Z } 2023-01-11T21:38:06.1024377Z ''') 2023-01-11T21:38:06.1024383Z 2023-01-11T21:38:06.1024387Z 2023-01-11T21:38:06.1024483Z
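# [editor annotation] async_compile.wait(globals()) below blocks until the queued
# kernel builds finish and rebinds names like kernel_cpp_0 in this module's globals
# to the compiled shared-library entry points; nothing calls a kernel before that.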
async_compile.wait(globals()) 2023-01-11T21:38:06.1024560Z del async_compile 2023-01-11T21:38:06.1024565Z 2023-01-11T21:38:06.1024639Z def call(args): 2023-01-11T21:38:06.1024712Z arg0_1, = args 2023-01-11T21:38:06.1024810Z args.clear() 2023-01-11T21:38:06.1025012Z buf0 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1025213Z buf1 = empty_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1025384Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1025459Z del arg0_1 2023-01-11T21:38:06.1025540Z return (buf0, buf1, ) 2023-01-11T21:38:06.1025546Z 2023-01-11T21:38:06.1025550Z 2023-01-11T21:38:06.1025628Z if __name__ == "__main__": 2023-01-11T21:38:06.1025746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1025865Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1026064Z arg0_1 = rand_strided((16, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1026176Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1026182Z 2023-01-11T21:38:06.1026252Z ok (1.706s) 2023-01-11T21:38:06.1026708Z test_tensor1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1026840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1027097Z [2023-01-11 21:32:14,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 282 2023-01-11T21:38:06.1027361Z [2023-01-11 21:32:16,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 282 2023-01-11T21:38:06.1027367Z 2023-01-11T21:38:06.1027464Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1027539Z import torch 2023-01-11T21:38:06.1027607Z import random 2023-01-11T21:38:06.1027728Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1027852Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1027857Z 2023-01-11T21:38:06.1027941Z aten = torch.ops.aten 2023-01-11T21:38:06.1028078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1028174Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1028210Z 2023-01-11T21:38:06.1028287Z import triton 2023-01-11T21:38:06.1028372Z import triton.language as tl 2023-01-11T21:38:06.1028496Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1028635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1028640Z 2023-01-11T21:38:06.1028645Z 2023-01-11T21:38:06.1028780Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1028985Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1029111Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1029222Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1029326Z long* __restrict__ out_ptr1) 2023-01-11T21:38:06.1029384Z { 2023-01-11T21:38:06.1029485Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1029552Z { 2023-01-11T21:38:06.1029635Z #pragma omp for 2023-01-11T21:38:06.1029725Z for(long i0=0; i0<10; i0+=1) 
2023-01-11T21:38:06.1029793Z { 2023-01-11T21:38:06.1029858Z { 2023-01-11T21:38:06.1029920Z { 2023-01-11T21:38:06.1030018Z auto tmp2 = in_ptr0[i0]; 2023-01-11T21:38:06.1030126Z auto tmp0 = static_cast<long>(1); 2023-01-11T21:38:06.1030236Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.1030333Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1030425Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1030492Z } 2023-01-11T21:38:06.1030552Z } 2023-01-11T21:38:06.1030649Z } 2023-01-11T21:38:06.1030734Z #pragma omp single 2023-01-11T21:38:06.1030800Z { 2023-01-11T21:38:06.1030868Z { 2023-01-11T21:38:06.1030935Z { 2023-01-11T21:38:06.1031035Z auto tmp0 = static_cast<long>(5); 2023-01-11T21:38:06.1031122Z out_ptr1[0] = tmp0; 2023-01-11T21:38:06.1031197Z } 2023-01-11T21:38:06.1031268Z } 2023-01-11T21:38:06.1031333Z } 2023-01-11T21:38:06.1031402Z } 2023-01-11T21:38:06.1031469Z } 2023-01-11T21:38:06.1031547Z ''') 2023-01-11T21:38:06.1031552Z 2023-01-11T21:38:06.1031557Z 2023-01-11T21:38:06.1031649Z async_compile.wait(globals()) 2023-01-11T21:38:06.1031726Z del async_compile 2023-01-11T21:38:06.1031731Z 2023-01-11T21:38:06.1031804Z def call(args): 2023-01-11T21:38:06.1031881Z arg0_1, = args 2023-01-11T21:38:06.1031957Z args.clear() 2023-01-11T21:38:06.1032151Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1032327Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1032493Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1032568Z del arg0_1 2023-01-11T21:38:06.1032650Z return (buf0, buf1, ) 2023-01-11T21:38:06.1032655Z 2023-01-11T21:38:06.1032662Z 2023-01-11T21:38:06.1032743Z if __name__ == "__main__": 2023-01-11T21:38:06.1032861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1032984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1033177Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1033282Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1033296Z 2023-01-11T21:38:06.1033360Z ok (1.692s) 2023-01-11T21:38:06.1033812Z test_tensor2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1033951Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1034246Z [2023-01-11 21:32:16,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 283 2023-01-11T21:38:06.1034510Z [2023-01-11 21:32:17,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 283 2023-01-11T21:38:06.1034516Z 2023-01-11T21:38:06.1034616Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1034692Z import torch 2023-01-11T21:38:06.1034768Z import random 2023-01-11T21:38:06.1034881Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1035004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1035012Z 2023-01-11T21:38:06.1035094Z aten = torch.ops.aten 2023-01-11T21:38:06.1035230Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1035324Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1035329Z 2023-01-11T21:38:06.1035402Z import triton 2023-01-11T21:38:06.1035495Z import triton.language as tl 2023-01-11T21:38:06.1035622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1035754Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1035921Z constant0 = None # 4ebd4ff1c68a89413a036eaaf84436373c4ec2939ac1d7f84e9908772a109281 2023-01-11T21:38:06.1035926Z 2023-01-11T21:38:06.1035931Z 2023-01-11T21:38:06.1036068Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1036274Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1036399Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.1036510Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1036644Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1036709Z { 2023-01-11T21:38:06.1036803Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1036869Z { 2023-01-11T21:38:06.1036952Z #pragma omp for 2023-01-11T21:38:06.1037039Z for(long i0=0; i0<19; i0+=1) 2023-01-11T21:38:06.1037108Z { 2023-01-11T21:38:06.1037175Z { 2023-01-11T21:38:06.1037244Z { 2023-01-11T21:38:06.1037334Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1037429Z auto tmp2 = in_ptr1[0]; 2023-01-11T21:38:06.1037541Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:06.1037639Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1037728Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1037797Z } 2023-01-11T21:38:06.1037864Z } 2023-01-11T21:38:06.1037923Z } 2023-01-11T21:38:06.1037993Z } 2023-01-11T21:38:06.1038057Z } 2023-01-11T21:38:06.1038139Z ''') 2023-01-11T21:38:06.1038145Z 2023-01-11T21:38:06.1038149Z 2023-01-11T21:38:06.1038241Z async_compile.wait(globals()) 2023-01-11T21:38:06.1038319Z del async_compile 2023-01-11T21:38:06.1038324Z 2023-01-11T21:38:06.1038398Z def call(args): 2023-01-11T21:38:06.1038465Z arg0_1, = args 2023-01-11T21:38:06.1038544Z args.clear() 2023-01-11T21:38:06.1038739Z buf0 = empty_strided((19, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1038915Z kernel_cpp_0(c_void_p(constant0.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1038989Z del arg0_1 2023-01-11T21:38:06.1039063Z return (buf0, ) 2023-01-11T21:38:06.1039069Z 2023-01-11T21:38:06.1039073Z 2023-01-11T21:38:06.1039152Z if __name__ == "__main__": 2023-01-11T21:38:06.1039261Z from torch._dynamo.testing import rand_strided 
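# [editor annotation] Frozen graph constants such as constant0, identified above only
# by a SHA-256 content hash, are re-created in this harness as rand_strided tensors of
# the recorded shape and dtype, so the standalone benchmark runs without the original
# constant data.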
2023-01-11T21:38:06.1039388Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1039590Z constant0 = rand_strided((19, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1039782Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1039893Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1039898Z 2023-01-11T21:38:06.1039971Z ok (1.683s) 2023-01-11T21:38:06.1040447Z test_tensor3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1040581Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1040839Z [2023-01-11 21:32:17,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 284 2023-01-11T21:38:06.1041107Z [2023-01-11 21:32:19,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 284 2023-01-11T21:38:06.1041113Z 2023-01-11T21:38:06.1041204Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1041278Z import torch 2023-01-11T21:38:06.1041353Z import random 2023-01-11T21:38:06.1041474Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1041597Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1041602Z 2023-01-11T21:38:06.1041683Z aten = torch.ops.aten 2023-01-11T21:38:06.1041819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1041908Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1041922Z 2023-01-11T21:38:06.1041989Z import triton 2023-01-11T21:38:06.1042086Z import triton.language as tl 2023-01-11T21:38:06.1042211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1042350Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1042396Z 2023-01-11T21:38:06.1042400Z 2023-01-11T21:38:06.1042538Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1042745Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1042869Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1042966Z long* __restrict__ out_ptr0, 2023-01-11T21:38:06.1043066Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1043168Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.1043234Z { 2023-01-11T21:38:06.1043316Z #pragma GCC ivdep 2023-01-11T21:38:06.1043402Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1043468Z { 2023-01-11T21:38:06.1043528Z { 2023-01-11T21:38:06.1043597Z { 2023-01-11T21:38:06.1043703Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1043804Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1043899Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1044000Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1044102Z auto tmp4 = tmp2 ? 
tmp1 : tmp3; 2023-01-11T21:38:06.1044187Z auto tmp5 = tmp4 + tmp1; 2023-01-11T21:38:06.1044274Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:06.1044348Z } 2023-01-11T21:38:06.1044415Z } 2023-01-11T21:38:06.1044478Z } 2023-01-11T21:38:06.1044558Z #pragma GCC ivdep 2023-01-11T21:38:06.1044642Z for(long i0=0; i0<3; i0+=1) 2023-01-11T21:38:06.1044701Z { 2023-01-11T21:38:06.1044767Z { 2023-01-11T21:38:06.1044838Z { 2023-01-11T21:38:06.1044941Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1045040Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1045133Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1045227Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1045319Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.1045417Z auto tmp5 = static_cast(3); 2023-01-11T21:38:06.1045519Z auto tmp6 = tmp4 ? tmp3 : tmp5; 2023-01-11T21:38:06.1045619Z auto tmp7 = tmp2 ? tmp1 : tmp6; 2023-01-11T21:38:06.1045712Z auto tmp8 = tmp7 + tmp3; 2023-01-11T21:38:06.1045829Z out_ptr1[i0] = tmp8; 2023-01-11T21:38:06.1045899Z } 2023-01-11T21:38:06.1045959Z } 2023-01-11T21:38:06.1046026Z } 2023-01-11T21:38:06.1046107Z #pragma GCC ivdep 2023-01-11T21:38:06.1046196Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1046262Z { 2023-01-11T21:38:06.1046332Z { 2023-01-11T21:38:06.1046393Z { 2023-01-11T21:38:06.1046487Z auto tmp12 = in_ptr0[i0]; 2023-01-11T21:38:06.1046591Z auto tmp0 = static_cast(i0); 2023-01-11T21:38:06.1046692Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1046784Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1046888Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1046979Z auto tmp4 = tmp0 < tmp3; 2023-01-11T21:38:06.1047072Z auto tmp5 = tmp4 ? tmp3 : tmp1; 2023-01-11T21:38:06.1047177Z auto tmp6 = static_cast(3); 2023-01-11T21:38:06.1047272Z auto tmp7 = tmp0 < tmp6; 2023-01-11T21:38:06.1047376Z auto tmp8 = static_cast(4); 2023-01-11T21:38:06.1047475Z auto tmp9 = tmp7 ? tmp6 : tmp8; 2023-01-11T21:38:06.1047579Z auto tmp10 = tmp2 ? 
tmp5 : tmp9; 2023-01-11T21:38:06.1047690Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1047779Z auto tmp13 = tmp11 + tmp12; 2023-01-11T21:38:06.1047867Z out_ptr2[i0] = tmp13; 2023-01-11T21:38:06.1047932Z } 2023-01-11T21:38:06.1047999Z } 2023-01-11T21:38:06.1048064Z } 2023-01-11T21:38:06.1048127Z } 2023-01-11T21:38:06.1048242Z ''') 2023-01-11T21:38:06.1048247Z 2023-01-11T21:38:06.1048252Z 2023-01-11T21:38:06.1048339Z async_compile.wait(globals()) 2023-01-11T21:38:06.1048416Z del async_compile 2023-01-11T21:38:06.1048421Z 2023-01-11T21:38:06.1048496Z def call(args): 2023-01-11T21:38:06.1048570Z arg0_1, = args 2023-01-11T21:38:06.1048647Z args.clear() 2023-01-11T21:38:06.1048840Z buf0 = empty_strided((2, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1049028Z buf1 = empty_strided((3, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1049219Z buf2 = empty_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1049403Z buf3 = empty_strided((0, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1049593Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1049667Z del arg0_1 2023-01-11T21:38:06.1049761Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.1049770Z 2023-01-11T21:38:06.1049774Z 2023-01-11T21:38:06.1049854Z if __name__ == "__main__": 2023-01-11T21:38:06.1049969Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1050094Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1050288Z arg0_1 = rand_strided((4, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1050393Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1050398Z 2023-01-11T21:38:06.1050473Z ok (1.700s) 2023-01-11T21:38:06.1050937Z test_tmp_not_defined_issue1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1051068Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1051458Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
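[editor annotation] The metadata-collection warning above ends here; the traceback that follows shows the failing op. A minimal sketch of the prims call class involved (hypothetical repro, not taken from this log; in plain eager mode this returns a broadcasted view rather than asserting):

    import torch
    x = torch.randn(1, 512, 1)
    # prims.broadcast_in_dim(a, shape, broadcast_dimensions) is the alias-free
    # broadcast primitive whose functionalization fallback trips the assert below
    y = torch.ops.prims.broadcast_in_dim.default(x, [1, 512, 4], [0, 1, 2])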
2023-01-11T21:38:06.1051558Z Traceback (most recent call last): 2023-01-11T21:38:06.1051833Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.1052003Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.1052252Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.1052331Z outs = f(*f_args) 2023-01-11T21:38:06.1052589Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.1052719Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.1052949Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.1053053Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.1053293Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.1053414Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.1053660Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.1053753Z return target(*args, **kwargs) 2023-01-11T21:38:06.1053961Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1054063Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1054320Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.1054443Z return func(*args, **kwargs) 2023-01-11T21:38:06.1054783Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1054886Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1055157Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.1055309Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.1055559Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.1055663Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.1055924Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.1056015Z return prim(*args, **kwargs) 2023-01-11T21:38:06.1056231Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.1056334Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.1056804Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.1056810Z 2023-01-11T21:38:06.1057056Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.1057185Z Original traceback: 2023-01-11T21:38:06.1057270Z Module stack: {} 2023-01-11T21:38:06.1057438Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.1057592Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.1057754Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.1057842Z return model(*ex, **kwargs) 2023-01-11T21:38:06.1057850Z 2023-01-11T21:38:06.1058108Z [2023-01-11 21:32:19,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 285 2023-01-11T21:38:06.1058114Z 2023-01-11T21:38:06.1058211Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1058289Z import torch 2023-01-11T21:38:06.1058432Z import random 2023-01-11T21:38:06.1058557Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1058682Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1058687Z 2023-01-11T21:38:06.1058769Z aten = torch.ops.aten 2023-01-11T21:38:06.1058905Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1059001Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1059007Z 2023-01-11T21:38:06.1059083Z import triton 2023-01-11T21:38:06.1059168Z import triton.language as tl 2023-01-11T21:38:06.1059293Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1059432Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1059441Z 2023-01-11T21:38:06.1059445Z 2023-01-11T21:38:06.1059584Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1059790Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1059912Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1060019Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1060134Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1060236Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1060344Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1060453Z const float* __restrict__ in_ptr3, 2023-01-11T21:38:06.1060559Z const float* __restrict__ in_ptr4, 2023-01-11T21:38:06.1060666Z const float* __restrict__ in_ptr5, 2023-01-11T21:38:06.1060770Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1060912Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1061015Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1061110Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.1061179Z { 2023-01-11T21:38:06.1061276Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.1061370Z auto out_ptr4 = in_out_ptr1; 2023-01-11T21:38:06.1061474Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1061545Z { 2023-01-11T21:38:06.1061622Z #pragma omp for 2023-01-11T21:38:06.1061713Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1061784Z { 2023-01-11T21:38:06.1061856Z { 2023-01-11T21:38:06.1062054Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1062142Z float tmp1 = 0; 2023-01-11T21:38:06.1062272Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1062374Z for(long i1=0; 
i1<128; i1+=1) 2023-01-11T21:38:06.1062439Z { 2023-01-11T21:38:06.1062593Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1062683Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1062758Z } 2023-01-11T21:38:06.1062964Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1063093Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1063195Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1063266Z { 2023-01-11T21:38:06.1063369Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:06.1063455Z tmp1 += tmp0; 2023-01-11T21:38:06.1063525Z } 2023-01-11T21:38:06.1063616Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1063688Z } 2023-01-11T21:38:06.1063758Z } 2023-01-11T21:38:06.1063835Z #pragma omp for 2023-01-11T21:38:06.1063926Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1063996Z { 2023-01-11T21:38:06.1064065Z { 2023-01-11T21:38:06.1064290Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1064377Z float tmp6 = 0; 2023-01-11T21:38:06.1064505Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1064599Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1064661Z { 2023-01-11T21:38:06.1064810Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1064944Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.1065087Z auto tmp2 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1065189Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1065334Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1065435Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1065528Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1065608Z } 2023-01-11T21:38:06.1065835Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1065961Z #pragma omp simd simdlen(4) reduction(+:tmp6) 2023-01-11T21:38:06.1066057Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1066127Z { 2023-01-11T21:38:06.1066235Z auto tmp0 = in_ptr0[i1 + (1024*i0)]; 2023-01-11T21:38:06.1066334Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.1066437Z auto tmp2 = static_cast(1024); 2023-01-11T21:38:06.1066607Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1066747Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1066840Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1066923Z tmp6 += tmp5; 2023-01-11T21:38:06.1066993Z } 2023-01-11T21:38:06.1067084Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1067144Z } 2023-01-11T21:38:06.1067211Z } 2023-01-11T21:38:06.1067293Z #pragma omp for 2023-01-11T21:38:06.1067382Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1067451Z { 2023-01-11T21:38:06.1067592Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1067734Z auto tmp1 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1067824Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1067917Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1067987Z } 2023-01-11T21:38:06.1068085Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1068174Z for(long i0=512; i0<512; i0+=1) 2023-01-11T21:38:06.1068241Z { 2023-01-11T21:38:06.1068333Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.1068431Z auto tmp1 = static_cast(1024); 2023-01-11T21:38:06.1068524Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1068612Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1068678Z } 2023-01-11T21:38:06.1068758Z #pragma omp for 2023-01-11T21:38:06.1068845Z for(long 
i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1068910Z { 2023-01-11T21:38:06.1068971Z { 2023-01-11T21:38:06.1069160Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1069246Z float tmp9 = 0; 2023-01-11T21:38:06.1069370Z auto tmp9_vec = at::vec::Vectorized(tmp9); 2023-01-11T21:38:06.1069468Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1069538Z { 2023-01-11T21:38:06.1069684Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1069815Z auto tmp1 = at::vec::Vectorized(in_ptr2[i0]); 2023-01-11T21:38:06.1069969Z auto tmp3 = at::vec::Vectorized(in_ptr3[i0]); 2023-01-11T21:38:06.1070111Z auto tmp5 = at::vec::Vectorized::loadu(in_ptr4 + 8*i1); 2023-01-11T21:38:06.1070246Z auto tmp7 = at::vec::Vectorized::loadu(in_ptr5 + 8*i1); 2023-01-11T21:38:06.1070387Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.1070482Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1070575Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1070672Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.1070783Z tmp8.store(out_ptr2 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1070867Z tmp9_vec += tmp8; 2023-01-11T21:38:06.1070935Z } 2023-01-11T21:38:06.1071132Z tmp9 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp9_vec); 2023-01-11T21:38:06.1071260Z #pragma omp simd simdlen(4) reduction(+:tmp9) 2023-01-11T21:38:06.1071358Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1071428Z { 2023-01-11T21:38:06.1071534Z auto tmp0 = in_ptr1[i1 + (1024*i0)]; 2023-01-11T21:38:06.1071631Z auto tmp1 = in_ptr2[i0]; 2023-01-11T21:38:06.1071718Z auto tmp3 = in_ptr3[i0]; 2023-01-11T21:38:06.1071814Z auto tmp5 = in_ptr4[i1]; 2023-01-11T21:38:06.1071907Z auto tmp7 = in_ptr5[i1]; 2023-01-11T21:38:06.1072044Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.1072168Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1072261Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1072352Z auto tmp8 = tmp6 + tmp7; 2023-01-11T21:38:06.1072446Z out_ptr2[i1 + (1024*i0)] = tmp8; 2023-01-11T21:38:06.1072530Z tmp9 += tmp8; 2023-01-11T21:38:06.1072602Z } 2023-01-11T21:38:06.1072690Z out_ptr3[i0] = tmp9; 2023-01-11T21:38:06.1072757Z } 2023-01-11T21:38:06.1072824Z } 2023-01-11T21:38:06.1072908Z #pragma omp for 2023-01-11T21:38:06.1072989Z for(long i0=0; i0<512; i0+=1) 2023-01-11T21:38:06.1073057Z { 2023-01-11T21:38:06.1073126Z { 2023-01-11T21:38:06.1073319Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1073407Z float tmp6 = 0; 2023-01-11T21:38:06.1073531Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1073619Z float tmp7 = 0; 2023-01-11T21:38:06.1073738Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1073833Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1073902Z { 2023-01-11T21:38:06.1074054Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + (8*i1) + (1024*i0)); 2023-01-11T21:38:06.1074187Z auto tmp1 = at::vec::Vectorized(out_ptr3[i0]); 2023-01-11T21:38:06.1074330Z auto tmp2 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1074425Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1074563Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1074652Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1074742Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1074826Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1074898Z } 2023-01-11T21:38:06.1075100Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, 
tmp6_vec); 2023-01-11T21:38:06.1075319Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1075470Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1075565Z for(long i1=1024; i1<1024; i1+=1) 2023-01-11T21:38:06.1075627Z { 2023-01-11T21:38:06.1075735Z auto tmp0 = out_ptr2[i1 + (1024*i0)]; 2023-01-11T21:38:06.1075831Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1075943Z auto tmp2 = static_cast(1024); 2023-01-11T21:38:06.1076040Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1076179Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1076275Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1076359Z tmp6 += tmp5; 2023-01-11T21:38:06.1076434Z tmp7 += tmp0; 2023-01-11T21:38:06.1076503Z } 2023-01-11T21:38:06.1076591Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:06.1076680Z out_ptr5[i0] = tmp7; 2023-01-11T21:38:06.1076747Z } 2023-01-11T21:38:06.1076814Z } 2023-01-11T21:38:06.1076888Z #pragma omp for 2023-01-11T21:38:06.1076974Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1077041Z { 2023-01-11T21:38:06.1077181Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr4 + 8*i0); 2023-01-11T21:38:06.1077321Z auto tmp1 = at::vec::Vectorized(static_cast(1024)); 2023-01-11T21:38:06.1077410Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1077611Z auto tmp3 = at::vec::Vectorized(static_cast(1e-05)); 2023-01-11T21:38:06.1077731Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1077826Z tmp4.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1077894Z } 2023-01-11T21:38:06.1077993Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1078079Z for(long i0=512; i0<512; i0+=1) 2023-01-11T21:38:06.1078146Z { 2023-01-11T21:38:06.1078238Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:06.1078343Z auto tmp1 = static_cast(1024); 2023-01-11T21:38:06.1078424Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1078576Z auto tmp3 = static_cast(1e-05); 2023-01-11T21:38:06.1078663Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1078750Z in_out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1078817Z } 2023-01-11T21:38:06.1078882Z } 2023-01-11T21:38:06.1078939Z } 2023-01-11T21:38:06.1079022Z ''') 2023-01-11T21:38:06.1079027Z 2023-01-11T21:38:06.1079032Z 2023-01-11T21:38:06.1079126Z async_compile.wait(globals()) 2023-01-11T21:38:06.1079209Z del async_compile 2023-01-11T21:38:06.1079214Z 2023-01-11T21:38:06.1079288Z def call(args): 2023-01-11T21:38:06.1079400Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.1079477Z args.clear() 2023-01-11T21:38:06.1079687Z buf0 = empty_strided((1, 512, 1), (512, 1, 512), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1079884Z buf1 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1079977Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.1080196Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080404Z buf4 = empty_strided((1, 512, 1), (512, 1, 512), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080601Z buf5 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080797Z buf6 = empty_strided((1, 512), (512, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1080924Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.1081318Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(arg3_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(arg4_1.data_ptr()), c_void_p(arg5_1.data_ptr()), 
c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf6.data_ptr())) 2023-01-11T21:38:06.1081395Z del arg0_1 2023-01-11T21:38:06.1081460Z del arg1_1 2023-01-11T21:38:06.1081532Z del arg2_1 2023-01-11T21:38:06.1081603Z del arg3_1 2023-01-11T21:38:06.1081673Z del arg4_1 2023-01-11T21:38:06.1081745Z del arg5_1 2023-01-11T21:38:06.1081831Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.1081836Z 2023-01-11T21:38:06.1081841Z 2023-01-11T21:38:06.1081922Z if __name__ == "__main__": 2023-01-11T21:38:06.1082034Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1082161Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1082362Z arg0_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082559Z arg1_1 = rand_strided((1024, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082779Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1082997Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083204Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083410Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1083550Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.1083814Z [2023-01-11 21:32:21,487] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 285 2023-01-11T21:38:06.1083820Z 2023-01-11T21:38:06.1083932Z ok (2.005s) 2023-01-11T21:38:06.1084400Z test_tmp_not_defined_issue2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1084533Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1084791Z [2023-01-11 21:32:21,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 286 2023-01-11T21:38:06.1085050Z [2023-01-11 21:32:23,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 286 2023-01-11T21:38:06.1085056Z 2023-01-11T21:38:06.1085156Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1085230Z import torch 2023-01-11T21:38:06.1085307Z import random 2023-01-11T21:38:06.1085420Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1085543Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1085549Z 2023-01-11T21:38:06.1085629Z aten = torch.ops.aten 2023-01-11T21:38:06.1085766Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1085864Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1085870Z 2023-01-11T21:38:06.1085943Z import triton 2023-01-11T21:38:06.1086035Z import triton.language as tl 2023-01-11T21:38:06.1086152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1086291Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1086297Z 2023-01-11T21:38:06.1086301Z 2023-01-11T21:38:06.1086436Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1086644Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1086767Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1086879Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1086988Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1087094Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1087152Z { 2023-01-11T21:38:06.1087223Z { 2023-01-11T21:38:06.1087441Z #pragma omp declare reduction(+:at::vec::Vectorized<float>:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1087524Z float tmp5 = 0; 2023-01-11T21:38:06.1087645Z auto tmp5_vec = at::vec::Vectorized<float>(tmp5); 2023-01-11T21:38:06.1087754Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1087824Z { 2023-01-11T21:38:06.1087938Z #pragma omp for reduction(+:tmp5_vec) 2023-01-11T21:38:06.1088026Z for(long i0=0; i0<17600; i0+=1) 2023-01-11T21:38:06.1088096Z { 2023-01-11T21:38:06.1088237Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1088369Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.1088513Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i0); 2023-01-11T21:38:06.1088611Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1088704Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1088783Z tmp5_vec += tmp4; 2023-01-11T21:38:06.1088854Z } 2023-01-11T21:38:06.1089055Z tmp5 = at::vec::vec_reduce_all<float>([](at::vec::Vectorized<float>& x, at::vec::Vectorized<float>&y) {return x + y;}, tmp5_vec); 2023-01-11T21:38:06.1089187Z #pragma omp for simd simdlen(4) reduction(+:tmp5) 2023-01-11T21:38:06.1089284Z for(long i0=140800; i0<140800; i0+=1) 2023-01-11T21:38:06.1089351Z { 2023-01-11T21:38:06.1089442Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1089533Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.1089682Z auto tmp3 = in_ptr2[i0]; 2023-01-11T21:38:06.1089775Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1089864Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1089945Z tmp5 += tmp4; 2023-01-11T21:38:06.1090013Z }
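// [editor annotation] the scalar tail loop above is degenerate in this instance:
// the 8-wide vector loop already covers all 17600*8 = 140800 elements, so the tail
// starts at its end bound and never executes; inductor emits it unconditionally to
// handle sizes that are not a multiple of the vector width.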
2023-01-11T21:38:06.1090079Z } 2023-01-11T21:38:06.1090164Z out_ptr0[0] = tmp5; 2023-01-11T21:38:06.1090223Z } 2023-01-11T21:38:06.1090288Z } 2023-01-11T21:38:06.1090374Z ''') 2023-01-11T21:38:06.1090379Z 2023-01-11T21:38:06.1090384Z 2023-01-11T21:38:06.1090480Z async_compile.wait(globals()) 2023-01-11T21:38:06.1090558Z del async_compile 2023-01-11T21:38:06.1090563Z 2023-01-11T21:38:06.1090638Z def call(args): 2023-01-11T21:38:06.1090744Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.1090813Z args.clear() 2023-01-11T21:38:06.1091001Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1091210Z kernel_cpp_0(c_void_p(primals_3.data_ptr()), c_void_p(primals_2.data_ptr()), c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1091333Z return (buf0, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.1091338Z 2023-01-11T21:38:06.1091343Z 2023-01-11T21:38:06.1091425Z if __name__ == "__main__": 2023-01-11T21:38:06.1091548Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1091676Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1091907Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092092Z primals_2 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092323Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1092469Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.1092474Z 2023-01-11T21:38:06.1092545Z ok (1.760s) 2023-01-11T21:38:06.1093037Z test_to_device_constant_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1093169Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1093429Z [2023-01-11 21:32:23,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 287 2023-01-11T21:38:06.1093692Z [2023-01-11 21:32:25,094] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 287 2023-01-11T21:38:06.1093697Z 2023-01-11T21:38:06.1093796Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1093874Z import torch 2023-01-11T21:38:06.1093942Z import random 2023-01-11T21:38:06.1094063Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1094190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1094195Z 2023-01-11T21:38:06.1094277Z aten = torch.ops.aten 2023-01-11T21:38:06.1094413Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1094625Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1094633Z 2023-01-11T21:38:06.1094710Z import triton 2023-01-11T21:38:06.1094795Z import triton.language as tl 2023-01-11T21:38:06.1094919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1095060Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1095213Z constant0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.1095219Z 2023-01-11T21:38:06.1095223Z 2023-01-11T21:38:06.1095362Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1095566Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1095735Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1095846Z const long* __restrict__ in_ptr1, 2023-01-11T21:38:06.1095953Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1096048Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1096156Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.1096223Z { 2023-01-11T21:38:06.1096328Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1096394Z { 2023-01-11T21:38:06.1096475Z #pragma omp for 2023-01-11T21:38:06.1096555Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.1096621Z { 2023-01-11T21:38:06.1096687Z { 2023-01-11T21:38:06.1096757Z { 2023-01-11T21:38:06.1096856Z auto tmp2 = in_ptr0[i0]; 2023-01-11T21:38:06.1096967Z auto tmp0 = static_cast<long>(i0); 2023-01-11T21:38:06.1097079Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.1097243Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.1097335Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1097410Z } 2023-01-11T21:38:06.1097476Z } 2023-01-11T21:38:06.1097544Z } 2023-01-11T21:38:06.1097625Z #pragma omp for 2023-01-11T21:38:06.1097716Z for(long i0=0; i0<64; i0+=1) 2023-01-11T21:38:06.1097775Z { 2023-01-11T21:38:06.1097845Z { 2023-01-11T21:38:06.1097913Z { 2023-01-11T21:38:06.1098010Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.1098120Z auto tmp1 = static_cast<long>(1); 2023-01-11T21:38:06.1098214Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1098306Z out_ptr1[i0] = tmp0; 2023-01-11T21:38:06.1098389Z out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.1098457Z } 2023-01-11T21:38:06.1098525Z } 2023-01-11T21:38:06.1098596Z } 2023-01-11T21:38:06.1098665Z } 2023-01-11T21:38:06.1098730Z } 2023-01-11T21:38:06.1098809Z ''') 2023-01-11T21:38:06.1098824Z 2023-01-11T21:38:06.1098829Z 2023-01-11T21:38:06.1098915Z async_compile.wait(globals()) 2023-01-11T21:38:06.1098991Z del async_compile
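# [editor annotation] This test_to_device_constant_cpu dump is a cross-device constant
# case: constant0 (hash above) lives on cuda:0 while the kernel reads the CPU copy
# constant0_cpu0, so call() below passes constant0_cpu0.data_ptr() and no device
# transfer happens at kernel-invocation time.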
2023-01-11T21:38:06.1098996Z 2023-01-11T21:38:06.1099071Z def call(args): 2023-01-11T21:38:06.1099186Z arg0_1, = args 2023-01-11T21:38:06.1099264Z args.clear() 2023-01-11T21:38:06.1099462Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1099654Z buf1 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1099836Z buf2 = empty_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1100063Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(constant0_cpu0.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1100138Z del arg0_1 2023-01-11T21:38:06.1100229Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1100234Z 2023-01-11T21:38:06.1100239Z 2023-01-11T21:38:06.1100321Z if __name__ == "__main__": 2023-01-11T21:38:06.1100440Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1100569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1100777Z constant0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1100975Z constant0_cpu0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1101169Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1101281Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1101287Z 2023-01-11T21:38:06.1101357Z ok (1.838s) 2023-01-11T21:38:06.1101818Z test_to_device_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1101982Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1102247Z [2023-01-11 21:32:25,147] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 288 2023-01-11T21:38:06.1102430Z [2023-01-11 21:32:25,148] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.1102695Z [2023-01-11 21:32:25,150] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 288 2023-01-11T21:38:06.1102701Z 2023-01-11T21:38:06.1102800Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1102867Z import torch 2023-01-11T21:38:06.1102943Z import random 2023-01-11T21:38:06.1103065Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1103190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1103198Z 2023-01-11T21:38:06.1103283Z aten = torch.ops.aten 2023-01-11T21:38:06.1103419Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1103515Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1103520Z 2023-01-11T21:38:06.1103587Z import triton 2023-01-11T21:38:06.1103680Z import triton.language as tl 2023-01-11T21:38:06.1103808Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1103950Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1103955Z 2023-01-11T21:38:06.1103960Z 2023-01-11T21:38:06.1104053Z async_compile.wait(globals()) 2023-01-11T21:38:06.1104132Z del async_compile 2023-01-11T21:38:06.1104137Z 2023-01-11T21:38:06.1104211Z def call(args): 2023-01-11T21:38:06.1104285Z arg0_1, = args 2023-01-11T21:38:06.1104352Z args.clear() 2023-01-11T21:38:06.1104444Z with torch.cuda.device(0): 2023-01-11T21:38:06.1104656Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1104742Z buf0.copy_(arg0_1) 2023-01-11T21:38:06.1104815Z del arg0_1 2023-01-11T21:38:06.1104894Z return (buf0, ) 2023-01-11T21:38:06.1104899Z 2023-01-11T21:38:06.1104903Z 2023-01-11T21:38:06.1104986Z if __name__ == "__main__": 2023-01-11T21:38:06.1105129Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1105250Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1105458Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1105571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1105576Z 2023-01-11T21:38:06.1105649Z ok (0.057s) 2023-01-11T21:38:06.1106102Z test_to_dtype_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1106237Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1106493Z [2023-01-11 21:32:25,242] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 289 2023-01-11T21:38:06.1106757Z [2023-01-11 21:32:26,946] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 289 2023-01-11T21:38:06.1106763Z 2023-01-11T21:38:06.1106863Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1106931Z import torch 2023-01-11T21:38:06.1107005Z import random 2023-01-11T21:38:06.1107126Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1107251Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1107256Z 2023-01-11T21:38:06.1107339Z aten = torch.ops.aten 2023-01-11T21:38:06.1107479Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1107604Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1107609Z 2023-01-11T21:38:06.1107684Z import triton 2023-01-11T21:38:06.1107770Z import triton.language as tl 2023-01-11T21:38:06.1107896Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1108035Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1108043Z 2023-01-11T21:38:06.1108047Z 2023-01-11T21:38:06.1108185Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1108393Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1108522Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.1108626Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1108721Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:06.1108787Z { 2023-01-11T21:38:06.1108888Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1108956Z { 2023-01-11T21:38:06.1109042Z #pragma omp for 2023-01-11T21:38:06.1109130Z for(long i0=0; i0<40; i0+=1) 2023-01-11T21:38:06.1109195Z { 2023-01-11T21:38:06.1109255Z { 2023-01-11T21:38:06.1109324Z { 2023-01-11T21:38:06.1109423Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1109537Z auto tmp1 = static_cast<double>(1); 2023-01-11T21:38:06.1109634Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1109746Z auto tmp3 = static_cast<float>(tmp2); 2023-01-11T21:38:06.1109858Z auto tmp4 = static_cast<bool>(tmp0); 2023-01-11T21:38:06.1109940Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.1110029Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1110098Z } 2023-01-11T21:38:06.1110166Z } 2023-01-11T21:38:06.1110234Z } 2023-01-11T21:38:06.1110300Z } 2023-01-11T21:38:06.1110365Z } 2023-01-11T21:38:06.1110442Z ''') 2023-01-11T21:38:06.1110450Z 2023-01-11T21:38:06.1110455Z 2023-01-11T21:38:06.1110549Z async_compile.wait(globals()) 2023-01-11T21:38:06.1110625Z del async_compile 2023-01-11T21:38:06.1110631Z 2023-01-11T21:38:06.1110704Z def call(args): 2023-01-11T21:38:06.1110784Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1110859Z args.clear() 2023-01-11T21:38:06.1111095Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1111287Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1111456Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1111552Z return (arg0_1, buf0, arg1_1, buf1, ) 2023-01-11T21:38:06.1111558Z 2023-01-11T21:38:06.1111562Z 2023-01-11T21:38:06.1111644Z if __name__ ==
"__main__": 2023-01-11T21:38:06.1111760Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1111886Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1112094Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1112291Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1112408Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1112413Z 2023-01-11T21:38:06.1112481Z ok (1.795s) 2023-01-11T21:38:06.1112927Z test_topk_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1113057Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1113317Z [2023-01-11 21:32:26,961] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 290 2023-01-11T21:38:06.1113563Z [2023-01-11 21:32:26,967] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.1113827Z [2023-01-11 21:32:26,969] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 290 2023-01-11T21:38:06.1113832Z 2023-01-11T21:38:06.1113934Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1114009Z import torch 2023-01-11T21:38:06.1114084Z import random 2023-01-11T21:38:06.1114197Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1114322Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1114327Z 2023-01-11T21:38:06.1114408Z aten = torch.ops.aten 2023-01-11T21:38:06.1114547Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1114645Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1114650Z 2023-01-11T21:38:06.1114723Z import triton 2023-01-11T21:38:06.1114815Z import triton.language as tl 2023-01-11T21:38:06.1114944Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1115077Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1115082Z 2023-01-11T21:38:06.1115087Z 2023-01-11T21:38:06.1115179Z async_compile.wait(globals()) 2023-01-11T21:38:06.1115255Z del async_compile 2023-01-11T21:38:06.1115260Z 2023-01-11T21:38:06.1115339Z def call(args): 2023-01-11T21:38:06.1115412Z arg0_1, = args 2023-01-11T21:38:06.1115486Z args.clear() 2023-01-11T21:38:06.1115573Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.1115639Z del arg0_1 2023-01-11T21:38:06.1115712Z buf1 = buf0[0] 2023-01-11T21:38:06.1115824Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.1115897Z buf2 = buf0[1] 2023-01-11T21:38:06.1116006Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.1116075Z del buf0 2023-01-11T21:38:06.1116155Z return (buf1, buf2, ) 2023-01-11T21:38:06.1116160Z 2023-01-11T21:38:06.1116165Z 2023-01-11T21:38:06.1116240Z if __name__ == "__main__": 2023-01-11T21:38:06.1116357Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1116482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1116693Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1116837Z print_performance(lambda: 
call([arg0_1])) 2023-01-11T21:38:06.1116843Z 2023-01-11T21:38:06.1116915Z ok (0.022s) 2023-01-11T21:38:06.1117371Z test_transpose_add_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1117504Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1117765Z [2023-01-11 21:32:26,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 291 2023-01-11T21:38:06.1118026Z [2023-01-11 21:32:28,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 291 2023-01-11T21:38:06.1118031Z 2023-01-11T21:38:06.1118125Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1118199Z import torch 2023-01-11T21:38:06.1118273Z import random 2023-01-11T21:38:06.1118392Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1118517Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1118523Z 2023-01-11T21:38:06.1118605Z aten = torch.ops.aten 2023-01-11T21:38:06.1118742Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1118831Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1118843Z 2023-01-11T21:38:06.1118910Z import triton 2023-01-11T21:38:06.1119002Z import triton.language as tl 2023-01-11T21:38:06.1119129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1119294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1119299Z 2023-01-11T21:38:06.1119303Z 2023-01-11T21:38:06.1119440Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1119651Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1119774Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1119877Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1119983Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1120048Z { 2023-01-11T21:38:06.1120152Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1120219Z { 2023-01-11T21:38:06.1120301Z #pragma omp for 2023-01-11T21:38:06.1120389Z for(long i0=0; i0<32; i0+=1) 2023-01-11T21:38:06.1120450Z { 2023-01-11T21:38:06.1120534Z #pragma GCC ivdep 2023-01-11T21:38:06.1120629Z for(long i1=0; i1<16; i1+=1) 2023-01-11T21:38:06.1120697Z { 2023-01-11T21:38:06.1120767Z { 2023-01-11T21:38:06.1120837Z { 2023-01-11T21:38:06.1120945Z auto tmp0 = in_ptr0[i0 + (32*i1)]; 2023-01-11T21:38:06.1121046Z auto tmp1 = in_ptr1[i1 + (16*i0)]; 2023-01-11T21:38:06.1121155Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1121258Z out_ptr0[i0 + (32*i1)] = tmp2; 2023-01-11T21:38:06.1121330Z } 2023-01-11T21:38:06.1121399Z } 2023-01-11T21:38:06.1121470Z } 2023-01-11T21:38:06.1121540Z } 2023-01-11T21:38:06.1121599Z } 2023-01-11T21:38:06.1121663Z } 2023-01-11T21:38:06.1121748Z ''') 2023-01-11T21:38:06.1121753Z 2023-01-11T21:38:06.1121758Z 2023-01-11T21:38:06.1121850Z async_compile.wait(globals()) 2023-01-11T21:38:06.1121927Z del async_compile 2023-01-11T21:38:06.1121932Z 2023-01-11T21:38:06.1122010Z def call(args): 2023-01-11T21:38:06.1122091Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1122160Z args.clear() 2023-01-11T21:38:06.1122363Z buf0 = empty_strided((32, 16), (1, 
32), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1122531Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1122631Z del arg0_1 2023-01-11T21:38:06.1130305Z del arg1_1 2023-01-11T21:38:06.1130403Z return (buf0, ) 2023-01-11T21:38:06.1130409Z 2023-01-11T21:38:06.1130413Z 2023-01-11T21:38:06.1130498Z if __name__ == "__main__": 2023-01-11T21:38:06.1130621Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1130752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1130967Z arg0_1 = rand_strided((16, 32), (32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1131169Z arg1_1 = rand_strided((32, 16), (16, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1131290Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1131300Z 2023-01-11T21:38:06.1131374Z ok (1.730s) 2023-01-11T21:38:06.1131835Z test_transpose_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1131968Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1132221Z [2023-01-11 21:32:28,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 292 2023-01-11T21:38:06.1132480Z [2023-01-11 21:32:30,458] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 292 2023-01-11T21:38:06.1132486Z 2023-01-11T21:38:06.1132587Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1132728Z import torch 2023-01-11T21:38:06.1132804Z import random 2023-01-11T21:38:06.1132924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1133048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1133053Z 2023-01-11T21:38:06.1133136Z aten = torch.ops.aten 2023-01-11T21:38:06.1133269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1133369Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1133374Z 2023-01-11T21:38:06.1133450Z import triton 2023-01-11T21:38:06.1133542Z import triton.language as tl 2023-01-11T21:38:06.1133668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1133809Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1133815Z 2023-01-11T21:38:06.1133819Z 2023-01-11T21:38:06.1133957Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1134164Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1134289Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1134391Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1134645Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1134750Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1134820Z { 2023-01-11T21:38:06.1134921Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1134987Z { 2023-01-11T21:38:06.1135063Z #pragma omp for 2023-01-11T21:38:06.1135151Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1135218Z { 2023-01-11T21:38:06.1135303Z #pragma GCC ivdep 2023-01-11T21:38:06.1135391Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.1135460Z { 2023-01-11T21:38:06.1135527Z { 
2023-01-11T21:38:06.1135591Z { 2023-01-11T21:38:06.1135698Z auto tmp0 = in_ptr0[i0 + (8*i1)]; 2023-01-11T21:38:06.1135808Z auto tmp1 = in_ptr1[i1 + (8*i0)]; 2023-01-11T21:38:06.1135910Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1136011Z out_ptr0[i0 + (8*i1)] = tmp2; 2023-01-11T21:38:06.1136082Z } 2023-01-11T21:38:06.1136150Z } 2023-01-11T21:38:06.1136262Z } 2023-01-11T21:38:06.1136331Z } 2023-01-11T21:38:06.1136415Z #pragma omp for 2023-01-11T21:38:06.1136501Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1136568Z { 2023-01-11T21:38:06.1136713Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1136853Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(2)); 2023-01-11T21:38:06.1136935Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1137074Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:06.1137216Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1137331Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1137413Z } 2023-01-11T21:38:06.1137530Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1137630Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.1137690Z { 2023-01-11T21:38:06.1137777Z auto tmp0 = in_ptr1[i0]; 2023-01-11T21:38:06.1137885Z auto tmp1 = static_cast<float>(2); 2023-01-11T21:38:06.1137974Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1138077Z auto tmp3 = static_cast<float>(10); 2023-01-11T21:38:06.1138164Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1138249Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1138309Z } 2023-01-11T21:38:06.1138374Z } 2023-01-11T21:38:06.1138437Z } 2023-01-11T21:38:06.1138526Z ''') 2023-01-11T21:38:06.1138534Z 2023-01-11T21:38:06.1138539Z 2023-01-11T21:38:06.1138631Z async_compile.wait(globals()) 2023-01-11T21:38:06.1138708Z del async_compile 2023-01-11T21:38:06.1138756Z 2023-01-11T21:38:06.1138832Z def call(args): 2023-01-11T21:38:06.1138913Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1138981Z args.clear() 2023-01-11T21:38:06.1139180Z buf0 = empty_strided((8, 8), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1139376Z buf1 = empty_strided((8, 8), (1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1139572Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1139647Z del arg0_1 2023-01-11T21:38:06.1139718Z del arg1_1 2023-01-11T21:38:06.1139800Z return (buf0, buf1, ) 2023-01-11T21:38:06.1139805Z 2023-01-11T21:38:06.1139810Z 2023-01-11T21:38:06.1139883Z if __name__ == "__main__": 2023-01-11T21:38:06.1140000Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1140127Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1140325Z arg0_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1140523Z arg1_1 = rand_strided((8, 8), (8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1140645Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1140650Z 2023-01-11T21:38:06.1140720Z ok (1.759s) 2023-01-11T21:38:06.1141250Z test_transposed_propagates_cpu (__main__.CpuTests) ...
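A note on the loop structure in the C++ kernels above: the inductor CPP backend splits each elementwise loop into a main loop over at::vec::Vectorized<float> (eight float lanes per iteration here) plus an `#pragma omp for simd simdlen(4)` scalar tail for any remainder; in these kernels the tail bounds are `i0=64; i0<64`, i.e. empty, because the element count divides evenly by the vector width. A minimal sketch of that split in Python terms (`lanes` is an illustrative stand-in for the vector width, not a name from the log):

n, lanes = 64, 8                            # total elements; assumed 8 float lanes
for i0 in range(n // lanes):
    pass                                    # vectorized main loop: loadu/compute/store 8 floats per step
for i0 in range((n // lanes) * lanes, n):
    pass                                    # scalar tail; empty here since 64 % 8 == 0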
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1141331Z warnings.warn( 2023-01-11T21:38:06.1141583Z [2023-01-11 21:32:30,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 293 2023-01-11T21:38:06.1141846Z [2023-01-11 21:32:32,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 293 2023-01-11T21:38:06.1141851Z 2023-01-11T21:38:06.1141950Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1142027Z import torch 2023-01-11T21:38:06.1142104Z import random 2023-01-11T21:38:06.1142230Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1142354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1142359Z 2023-01-11T21:38:06.1142448Z aten = torch.ops.aten 2023-01-11T21:38:06.1142611Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1142709Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1142714Z 2023-01-11T21:38:06.1142793Z import triton 2023-01-11T21:38:06.1142886Z import triton.language as tl 2023-01-11T21:38:06.1143011Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1143152Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1143158Z 2023-01-11T21:38:06.1143162Z 2023-01-11T21:38:06.1143300Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1143506Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1143625Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1143736Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1143840Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1143907Z { 2023-01-11T21:38:06.1144011Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1144077Z { 2023-01-11T21:38:06.1144158Z #pragma omp for 2023-01-11T21:38:06.1144239Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1144307Z { 2023-01-11T21:38:06.1144449Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1144587Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1144678Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1144773Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1144839Z } 2023-01-11T21:38:06.1144932Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1145051Z for(long i0=64; i0<64; i0+=1) 2023-01-11T21:38:06.1145119Z { 2023-01-11T21:38:06.1145207Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1145302Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1145408Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1145500Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1145576Z } 2023-01-11T21:38:06.1145645Z } 2023-01-11T21:38:06.1145709Z } 2023-01-11T21:38:06.1145794Z ''') 2023-01-11T21:38:06.1145799Z 2023-01-11T21:38:06.1145804Z 2023-01-11T21:38:06.1145897Z async_compile.wait(globals()) 2023-01-11T21:38:06.1145975Z del async_compile 2023-01-11T21:38:06.1145979Z 2023-01-11T21:38:06.1146052Z def call(args): 2023-01-11T21:38:06.1146125Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1146203Z args.clear() 2023-01-11T21:38:06.1146418Z buf0 = empty_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1146590Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
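# Note on the kernel_cpp_0(...) call above: the generated wrapper passes raw data
# pointers (ctypes c_void_p over tensor.data_ptr()) into the ahead-of-time-compiled
# C++ kernel; outputs were preallocated with empty_strided and are filled in place,
# so no tensor objects cross the Python/C++ boundary.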
2023-01-11T21:38:06.1146665Z del arg0_1 2023-01-11T21:38:06.1146738Z del arg1_1 2023-01-11T21:38:06.1146811Z return (buf0, ) 2023-01-11T21:38:06.1146817Z 2023-01-11T21:38:06.1146821Z 2023-01-11T21:38:06.1146902Z if __name__ == "__main__": 2023-01-11T21:38:06.1147016Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1147143Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1147357Z arg0_1 = rand_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1147562Z arg1_1 = rand_strided((4, 4, 4), (4, 1, 16), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1147682Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1147688Z 2023-01-11T21:38:06.1147764Z ok (1.684s) 2023-01-11T21:38:06.1148273Z test_triton_conv_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1148355Z warnings.warn( 2023-01-11T21:38:06.1148605Z [2023-01-11 21:32:32,209] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 294 2023-01-11T21:38:06.1148925Z [2023-01-11 21:32:32,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 294 2023-01-11T21:38:06.1148931Z 2023-01-11T21:38:06.1149031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1149108Z import torch 2023-01-11T21:38:06.1149182Z import random 2023-01-11T21:38:06.1149301Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1149425Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1149430Z 2023-01-11T21:38:06.1149512Z aten = torch.ops.aten 2023-01-11T21:38:06.1149641Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1149738Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1149744Z 2023-01-11T21:38:06.1149819Z import triton 2023-01-11T21:38:06.1149911Z import triton.language as tl 2023-01-11T21:38:06.1150034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1150182Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1150190Z 2023-01-11T21:38:06.1150340Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.1150487Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.1150618Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.1150624Z 2023-01-11T21:38:06.1150635Z 2023-01-11T21:38:06.1150720Z async_compile.wait(globals()) 2023-01-11T21:38:06.1150798Z del async_compile 2023-01-11T21:38:06.1150803Z 2023-01-11T21:38:06.1150878Z def call(args): 2023-01-11T21:38:06.1150965Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.1151039Z args.clear() 2023-01-11T21:38:06.1151211Z buf0 = aten.convolution(arg0_1, arg1_1, arg2_1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.1151325Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:06.1151390Z del arg0_1 2023-01-11T21:38:06.1151461Z del arg1_1 2023-01-11T21:38:06.1151535Z del arg2_1 2023-01-11T21:38:06.1151617Z return (buf0, ) 2023-01-11T21:38:06.1151623Z 2023-01-11T21:38:06.1151627Z 2023-01-11T21:38:06.1151706Z if __name__ == "__main__": 2023-01-11T21:38:06.1151824Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1151949Z 
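# The __main__ harness emitted with each debug module works the same way in every
# dump above and below: rand_strided rebuilds inputs with the recorded shape,
# stride, device and dtype, and print_performance times call() on those synthetic
# inputs.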
from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1152171Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152387Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152579Z arg2_1 = rand_strided((32, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1152709Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.1152714Z 2023-01-11T21:38:06.1152785Z ok (0.099s) 2023-01-11T21:38:06.1153293Z test_triton_mm2_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1153375Z warnings.warn( 2023-01-11T21:38:06.1153635Z [2023-01-11 21:32:32,278] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 295 2023-01-11T21:38:06.1153895Z [2023-01-11 21:32:33,946] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 295 2023-01-11T21:38:06.1153901Z 2023-01-11T21:38:06.1153992Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1154067Z import torch 2023-01-11T21:38:06.1154142Z import random 2023-01-11T21:38:06.1154264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1154388Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1154393Z 2023-01-11T21:38:06.1154476Z aten = torch.ops.aten 2023-01-11T21:38:06.1154615Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1154740Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1154746Z 2023-01-11T21:38:06.1154814Z import triton 2023-01-11T21:38:06.1154907Z import triton.language as tl 2023-01-11T21:38:06.1155031Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1155173Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1155179Z 2023-01-11T21:38:06.1155314Z from torch._inductor.triton_ops.autotune import mm_heuristics 2023-01-11T21:38:06.1155474Z from torch._inductor.triton_ops.autotune import mm_autotune 2023-01-11T21:38:06.1155479Z 2023-01-11T21:38:06.1155484Z 2023-01-11T21:38:06.1155644Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1155854Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1155968Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.1156034Z { 2023-01-11T21:38:06.1156135Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1156201Z { 2023-01-11T21:38:06.1156285Z #pragma omp for 2023-01-11T21:38:06.1156378Z for(long i0=0; i0<131072; i0+=1) 2023-01-11T21:38:06.1156445Z { 2023-01-11T21:38:06.1156582Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1156718Z auto tmp1 = at::vec::clamp_min(tmp0, decltype(tmp0)(0)); 2023-01-11T21:38:06.1156819Z tmp1.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1156886Z } 2023-01-11T21:38:06.1156985Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1157082Z for(long i0=1048576; i0<1048576; i0+=1) 2023-01-11T21:38:06.1157151Z { 2023-01-11T21:38:06.1157267Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.1157358Z auto tmp1 = tmp0 * (tmp0>0); 2023-01-11T21:38:06.1157445Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1157511Z } 2023-01-11T21:38:06.1157578Z }
2023-01-11T21:38:06.1157643Z } 2023-01-11T21:38:06.1157728Z ''') 2023-01-11T21:38:06.1157736Z 2023-01-11T21:38:06.1157740Z 2023-01-11T21:38:06.1157827Z async_compile.wait(globals()) 2023-01-11T21:38:06.1157905Z del async_compile 2023-01-11T21:38:06.1157910Z 2023-01-11T21:38:06.1157982Z def call(args): 2023-01-11T21:38:06.1158061Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1158136Z args.clear() 2023-01-11T21:38:06.1158345Z buf0 = empty_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1158445Z aten.mm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.1158510Z del arg0_1 2023-01-11T21:38:06.1158582Z del arg1_1 2023-01-11T21:38:06.1158673Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1158784Z kernel_cpp_0(c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1158860Z return (buf1, ) 2023-01-11T21:38:06.1158865Z 2023-01-11T21:38:06.1158870Z 2023-01-11T21:38:06.1158950Z if __name__ == "__main__": 2023-01-11T21:38:06.1159068Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1159196Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1159395Z arg0_1 = rand_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1159595Z arg1_1 = rand_strided((1024, 1024), (1024, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1159714Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1159721Z 2023-01-11T21:38:06.1159791Z ok (1.715s) 2023-01-11T21:38:06.1160236Z test_triu_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1160370Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1160655Z [2023-01-11 21:32:33,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 296 2023-01-11T21:38:06.1160919Z [2023-01-11 21:32:35,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 296 2023-01-11T21:38:06.1160925Z 2023-01-11T21:38:06.1161023Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1161090Z import torch 2023-01-11T21:38:06.1161166Z import random 2023-01-11T21:38:06.1161285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1161408Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1161413Z 2023-01-11T21:38:06.1161493Z aten = torch.ops.aten 2023-01-11T21:38:06.1161630Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1161728Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1161734Z 2023-01-11T21:38:06.1161807Z import triton 2023-01-11T21:38:06.1161892Z import triton.language as tl 2023-01-11T21:38:06.1162018Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1162161Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1162166Z 2023-01-11T21:38:06.1162171Z 2023-01-11T21:38:06.1162306Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1162512Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1162639Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1162744Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1162846Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1162939Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.1163035Z { 2023-01-11T21:38:06.1163137Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1163204Z { 2023-01-11T21:38:06.1163299Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1163385Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1163452Z { 2023-01-11T21:38:06.1163535Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.1163606Z { 2023-01-11T21:38:06.1163696Z #pragma GCC ivdep 2023-01-11T21:38:06.1163792Z for(long i2=0; i2<10; i2+=1) 2023-01-11T21:38:06.1163861Z { 2023-01-11T21:38:06.1163935Z { 2023-01-11T21:38:06.1164000Z { 2023-01-11T21:38:06.1164121Z auto tmp3 = in_ptr0[i2 + (10*i1) + (100*i0)]; 2023-01-11T21:38:06.1164310Z auto tmp0 = static_cast<long>((-1) + i2 + ((-1)*i1)); 2023-01-11T21:38:06.1164422Z auto tmp1 = static_cast<long>(0); 2023-01-11T21:38:06.1164529Z auto tmp2 = tmp0 >= tmp1; 2023-01-11T21:38:06.1164641Z auto tmp4 = static_cast<float>(0); 2023-01-11T21:38:06.1164751Z auto tmp5 = tmp2 ? tmp3 : tmp4; 2023-01-11T21:38:06.1164930Z auto tmp6 = static_cast<long>(i2 + ((-1)*i1)); 2023-01-11T21:38:06.1165027Z auto tmp7 = tmp6 >= tmp1; 2023-01-11T21:38:06.1165137Z auto tmp8 = tmp7 ? tmp3 : tmp4; 2023-01-11T21:38:06.1165344Z auto tmp9 = static_cast<long>((-2) + i2 + ((-1)*i1)); 2023-01-11T21:38:06.1165457Z auto tmp10 = tmp9 >= tmp1; 2023-01-11T21:38:06.1165586Z auto tmp11 = tmp10 ?
tmp3 : tmp4; 2023-01-11T21:38:06.1165699Z out_ptr0[i2 + (10*i1) + (100*i0)] = tmp5; 2023-01-11T21:38:06.1165807Z out_ptr1[i2 + (10*i1) + (100*i0)] = tmp8; 2023-01-11T21:38:06.1165922Z out_ptr2[i2 + (10*i1) + (100*i0)] = tmp11; 2023-01-11T21:38:06.1165989Z } 2023-01-11T21:38:06.1166060Z } 2023-01-11T21:38:06.1166130Z } 2023-01-11T21:38:06.1166196Z } 2023-01-11T21:38:06.1166264Z } 2023-01-11T21:38:06.1166330Z } 2023-01-11T21:38:06.1166464Z } 2023-01-11T21:38:06.1166552Z ''') 2023-01-11T21:38:06.1166557Z 2023-01-11T21:38:06.1166562Z 2023-01-11T21:38:06.1166656Z async_compile.wait(globals()) 2023-01-11T21:38:06.1166734Z del async_compile 2023-01-11T21:38:06.1166739Z 2023-01-11T21:38:06.1166815Z def call(args): 2023-01-11T21:38:06.1166890Z arg0_1, = args 2023-01-11T21:38:06.1166965Z args.clear() 2023-01-11T21:38:06.1167179Z buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167382Z buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167586Z buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1167786Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.1167860Z del arg0_1 2023-01-11T21:38:06.1167945Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1167950Z 2023-01-11T21:38:06.1167957Z 2023-01-11T21:38:06.1168038Z if __name__ == "__main__": 2023-01-11T21:38:06.1168157Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1168284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1168485Z arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1168598Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1168603Z 2023-01-11T21:38:06.1168673Z ok (1.767s) 2023-01-11T21:38:06.1169125Z test_unbind_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1169288Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1169550Z [2023-01-11 21:32:35,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 297 2023-01-11T21:38:06.1169813Z [2023-01-11 21:32:35,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 297 2023-01-11T21:38:06.1169819Z 2023-01-11T21:38:06.1169918Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1169993Z import torch 2023-01-11T21:38:06.1170061Z import random 2023-01-11T21:38:06.1170179Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1170304Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1170311Z 2023-01-11T21:38:06.1170394Z aten = torch.ops.aten 2023-01-11T21:38:06.1170531Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1170627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1170632Z 2023-01-11T21:38:06.1170706Z import triton 2023-01-11T21:38:06.1170797Z import triton.language as tl 2023-01-11T21:38:06.1170918Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1171058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1171063Z 2023-01-11T21:38:06.1171068Z 2023-01-11T21:38:06.1171159Z async_compile.wait(globals()) 2023-01-11T21:38:06.1171236Z del async_compile 2023-01-11T21:38:06.1171241Z 2023-01-11T21:38:06.1171315Z def call(args): 2023-01-11T21:38:06.1171389Z arg0_1, = args 2023-01-11T21:38:06.1171468Z args.clear() 2023-01-11T21:38:06.1171741Z return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), ) 2023-01-11T21:38:06.1171749Z 2023-01-11T21:38:06.1171754Z 2023-01-11T21:38:06.1171836Z if __name__ == "__main__": 2023-01-11T21:38:06.1171973Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1172103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1172308Z arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1172419Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1172424Z 2023-01-11T21:38:06.1172496Z ok (0.030s) 2023-01-11T21:38:06.1172961Z test_unroll_small_reduction_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1173096Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1173355Z [2023-01-11 21:32:35,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 298 2023-01-11T21:38:06.1173615Z [2023-01-11 21:32:37,522] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 298 2023-01-11T21:38:06.1174033Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1174157Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1174452Z [2023-01-11 21:32:37,561] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 299 2023-01-11T21:38:06.1174457Z 2023-01-11T21:38:06.1174673Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1174751Z import torch 2023-01-11T21:38:06.1174826Z import random 2023-01-11T21:38:06.1174948Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1175073Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1175078Z 2023-01-11T21:38:06.1175167Z aten = torch.ops.aten 2023-01-11T21:38:06.1175296Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1175392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1175397Z 2023-01-11T21:38:06.1175471Z import triton 2023-01-11T21:38:06.1175563Z import triton.language as tl 2023-01-11T21:38:06.1175690Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1175832Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1175841Z 2023-01-11T21:38:06.1175845Z 2023-01-11T21:38:06.1175997Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1176205Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1176323Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1176433Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1176541Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1176643Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1176741Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.1176840Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1176938Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:06.1177036Z bool* __restrict__ out_ptr6, 2023-01-11T21:38:06.1177177Z long* __restrict__ out_ptr7, 2023-01-11T21:38:06.1177291Z long* __restrict__ out_ptr8, 2023-01-11T21:38:06.1177403Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.1177510Z float* __restrict__ out_ptr10) 2023-01-11T21:38:06.1177575Z { 2023-01-11T21:38:06.1177677Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1177744Z { 2023-01-11T21:38:06.1177866Z #pragma omp for 2023-01-11T21:38:06.1177957Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1178024Z { 2023-01-11T21:38:06.1178092Z { 2023-01-11T21:38:06.1178161Z { 2023-01-11T21:38:06.1178263Z auto tmp0 = in_ptr0[3*i0]; 2023-01-11T21:38:06.1178358Z auto tmp1 = in_ptr0[1 + (3*i0)]; 2023-01-11T21:38:06.1178456Z auto tmp3 = in_ptr0[2 + (3*i0)]; 2023-01-11T21:38:06.1178593Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::min(tmp0, tmp1); 2023-01-11T21:38:06.1178724Z auto tmp4 = (tmp3 != tmp3) ? tmp3 : std::min(tmp2, tmp3); 2023-01-11T21:38:06.1178834Z auto tmp5 = static_cast<long>(0); 2023-01-11T21:38:06.1178937Z auto tmp6 = static_cast<long>(1); 2023-01-11T21:38:06.1179036Z auto tmp7 = tmp1 < tmp0; 2023-01-11T21:38:06.1179145Z auto tmp8 = tmp7 ? tmp6 : tmp5; 2023-01-11T21:38:06.1179246Z auto tmp9 = static_cast<long>(2); 2023-01-11T21:38:06.1179345Z auto tmp10 = tmp3 < tmp2; 2023-01-11T21:38:06.1179447Z auto tmp11 = tmp10 ? tmp9 : tmp8; 2023-01-11T21:38:06.1179581Z auto tmp12 = (tmp1 != tmp1) ?
tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:06.1179717Z auto tmp13 = (tmp3 != tmp3) ? tmp3 : std::max(tmp12, tmp3); 2023-01-11T21:38:06.1179812Z auto tmp14 = tmp1 > tmp0; 2023-01-11T21:38:06.1179914Z auto tmp15 = tmp14 ? tmp6 : tmp5; 2023-01-11T21:38:06.1180012Z auto tmp16 = tmp3 > tmp12; 2023-01-11T21:38:06.1180148Z auto tmp17 = tmp16 ? tmp9 : tmp15; 2023-01-11T21:38:06.1180246Z auto tmp18 = tmp0 + tmp1; 2023-01-11T21:38:06.1180344Z auto tmp19 = tmp18 + tmp3; 2023-01-11T21:38:06.1180454Z auto tmp20 = static_cast<float>(1); 2023-01-11T21:38:06.1180553Z auto tmp21 = tmp0 > tmp20; 2023-01-11T21:38:06.1180669Z auto tmp22 = static_cast<long>(tmp21); 2023-01-11T21:38:06.1180782Z auto tmp23 = static_cast<bool>(tmp22); 2023-01-11T21:38:06.1180871Z auto tmp24 = tmp1 > tmp20; 2023-01-11T21:38:06.1180983Z auto tmp25 = static_cast<long>(tmp24); 2023-01-11T21:38:06.1181095Z auto tmp26 = static_cast<bool>(tmp25); 2023-01-11T21:38:06.1181195Z auto tmp27 = tmp23 || tmp26; 2023-01-11T21:38:06.1181292Z auto tmp28 = tmp3 > tmp20; 2023-01-11T21:38:06.1181401Z auto tmp29 = static_cast<long>(tmp28); 2023-01-11T21:38:06.1181514Z auto tmp30 = static_cast<bool>(tmp29); 2023-01-11T21:38:06.1181607Z auto tmp31 = tmp27 || tmp30; 2023-01-11T21:38:06.1181714Z auto tmp32 = static_cast<float>(0); 2023-01-11T21:38:06.1181809Z auto tmp33 = tmp0 > tmp32; 2023-01-11T21:38:06.1181912Z auto tmp34 = tmp33 == 0; 2023-01-11T21:38:06.1182023Z auto tmp35 = static_cast<long>(tmp34); 2023-01-11T21:38:06.1182132Z auto tmp36 = static_cast<bool>(tmp35); 2023-01-11T21:38:06.1182230Z auto tmp37 = tmp1 > tmp32; 2023-01-11T21:38:06.1182326Z auto tmp38 = tmp37 == 0; 2023-01-11T21:38:06.1182428Z auto tmp39 = static_cast<long>(tmp38); 2023-01-11T21:38:06.1182536Z auto tmp40 = static_cast<bool>(tmp39); 2023-01-11T21:38:06.1182634Z auto tmp41 = tmp36 || tmp40; 2023-01-11T21:38:06.1182732Z auto tmp42 = tmp3 > tmp32; 2023-01-11T21:38:06.1182827Z auto tmp43 = tmp42 == 0; 2023-01-11T21:38:06.1182935Z auto tmp44 = static_cast<long>(tmp43); 2023-01-11T21:38:06.1183041Z auto tmp45 = static_cast<bool>(tmp44); 2023-01-11T21:38:06.1183159Z auto tmp46 = tmp41 || tmp45; 2023-01-11T21:38:06.1183256Z auto tmp47 = tmp46 == 0; 2023-01-11T21:38:06.1183345Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1183437Z out_ptr1[i0] = tmp11; 2023-01-11T21:38:06.1183526Z out_ptr2[i0] = tmp13; 2023-01-11T21:38:06.1183617Z out_ptr3[i0] = tmp17; 2023-01-11T21:38:06.1183707Z out_ptr4[i0] = tmp19; 2023-01-11T21:38:06.1183787Z out_ptr5[i0] = tmp31; 2023-01-11T21:38:06.1183874Z out_ptr6[i0] = tmp47; 2023-01-11T21:38:06.1183962Z out_ptr7[i0] = tmp11; 2023-01-11T21:38:06.1184052Z out_ptr8[i0] = tmp17; 2023-01-11T21:38:06.1184142Z out_ptr9[i0] = tmp4; 2023-01-11T21:38:06.1184234Z out_ptr10[i0] = tmp13; 2023-01-11T21:38:06.1184303Z } 2023-01-11T21:38:06.1184364Z } 2023-01-11T21:38:06.1184431Z } 2023-01-11T21:38:06.1184499Z } 2023-01-11T21:38:06.1184568Z } 2023-01-11T21:38:06.1184657Z ''') 2023-01-11T21:38:06.1184663Z 2023-01-11T21:38:06.1184668Z 2023-01-11T21:38:06.1184764Z async_compile.wait(globals()) 2023-01-11T21:38:06.1184839Z del async_compile 2023-01-11T21:38:06.1184844Z 2023-01-11T21:38:06.1184911Z def call(args): 2023-01-11T21:38:06.1184986Z arg0_1, = args 2023-01-11T21:38:06.1185059Z args.clear() 2023-01-11T21:38:06.1185255Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1185454Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1185722Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1185911Z buf3 = empty_strided((8, ), (1, ),
device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1186097Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1186276Z buf5 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1186461Z buf6 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1186646Z buf7 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1186831Z buf8 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1187018Z buf9 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1187208Z buf10 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1187576Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf6.data_ptr()), c_void_p(buf7.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr())) 2023-01-11T21:38:06.1187654Z del arg0_1 2023-01-11T21:38:06.1187786Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, ) 2023-01-11T21:38:06.1187798Z 2023-01-11T21:38:06.1187803Z 2023-01-11T21:38:06.1187876Z if __name__ == "__main__": 2023-01-11T21:38:06.1187997Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1188124Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1188322Z arg0_1 = rand_strided((8, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1188435Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1188440Z 2023-01-11T21:38:06.1188706Z [2023-01-11 21:32:39,271] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 299 2023-01-11T21:38:06.1188714Z 2023-01-11T21:38:06.1188812Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1188888Z import torch 2023-01-11T21:38:06.1188956Z import random 2023-01-11T21:38:06.1189077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1189203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1189239Z 2023-01-11T21:38:06.1189326Z aten = torch.ops.aten 2023-01-11T21:38:06.1189466Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1189562Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1189567Z 2023-01-11T21:38:06.1189642Z import triton 2023-01-11T21:38:06.1189733Z import triton.language as tl 2023-01-11T21:38:06.1189852Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1189994Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1190000Z 2023-01-11T21:38:06.1190004Z 2023-01-11T21:38:06.1190144Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1190351Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1190471Z extern "C" void kernel(bool* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1190583Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1190693Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1190793Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.1190887Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1190985Z long* __restrict__ out_ptr3, 2023-01-11T21:38:06.1191086Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1191182Z bool* __restrict__ out_ptr5, 2023-01-11T21:38:06.1191282Z long* __restrict__ out_ptr7, 
2023-01-11T21:38:06.1191380Z long* __restrict__ out_ptr8, 2023-01-11T21:38:06.1191481Z float* __restrict__ out_ptr9, 2023-01-11T21:38:06.1191606Z float* __restrict__ out_ptr10) 2023-01-11T21:38:06.1191670Z { 2023-01-11T21:38:06.1191764Z auto out_ptr6 = in_out_ptr0; 2023-01-11T21:38:06.1191866Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1191931Z { 2023-01-11T21:38:06.1192017Z #pragma omp for 2023-01-11T21:38:06.1192107Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1192167Z { 2023-01-11T21:38:06.1192234Z { 2023-01-11T21:38:06.1192303Z { 2023-01-11T21:38:06.1192440Z float tmp1 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1192573Z struct IndexValue_11 {size_t index; float value;}; 2023-01-11T21:38:06.1192720Z IndexValue_11 tmp2{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1192866Z #pragma omp declare reduction(argmin : struct IndexValue_11 :\ 2023-01-11T21:38:06.1193023Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1193180Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1193334Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1193593Z float tmp3 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1193722Z struct IndexValue_12 {size_t index; float value;}; 2023-01-11T21:38:06.1193954Z IndexValue_12 tmp4{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1194103Z #pragma omp declare reduction(argmax : struct IndexValue_12 :\ 2023-01-11T21:38:06.1194259Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1194410Z omp_out.index = omp_in.value < omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1194643Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1194729Z float tmp5 = 0; 2023-01-11T21:38:06.1194812Z bool tmp10 = 0; 2023-01-11T21:38:06.1194896Z bool tmp16 = 0; 2023-01-11T21:38:06.1195058Z struct IndexValue_13 {size_t index; float value;}; 2023-01-11T21:38:06.1195209Z IndexValue_13 tmp17{0, std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1195374Z #pragma omp declare reduction(argmin : struct IndexValue_13 :\ 2023-01-11T21:38:06.1195555Z omp_out.value = omp_in.value > omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1195698Z omp_out.index = omp_in.value > omp_out.value ? omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1195850Z initializer(omp_priv = {0, std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1195977Z struct IndexValue_14 {size_t index; float value;}; 2023-01-11T21:38:06.1196217Z IndexValue_14 tmp18{0, -std::numeric_limits<float>::infinity()}; 2023-01-11T21:38:06.1196360Z #pragma omp declare reduction(argmax : struct IndexValue_14 :\ 2023-01-11T21:38:06.1196518Z omp_out.value = omp_in.value < omp_out.value ? omp_out.value : omp_in.value,\ 2023-01-11T21:38:06.1196666Z omp_out.index = omp_in.value < omp_out.value ?
omp_out.index : omp_in.index)\ 2023-01-11T21:38:06.1196901Z initializer(omp_priv = {0, -std::numeric_limits<float>::infinity()}) 2023-01-11T21:38:06.1197035Z float tmp19 = std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1197239Z float tmp20 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.1197335Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.1197406Z { 2023-01-11T21:38:06.1197509Z { 2023-01-11T21:38:06.1197624Z auto tmp0 = in_ptr0[i1 + (3*i0)]; 2023-01-11T21:38:06.1197739Z auto tmp6 = static_cast<float>(1); 2023-01-11T21:38:06.1197841Z auto tmp7 = tmp0 > tmp6; 2023-01-11T21:38:06.1197960Z auto tmp8 = static_cast<long>(tmp7); 2023-01-11T21:38:06.1198068Z auto tmp9 = static_cast<bool>(tmp8); 2023-01-11T21:38:06.1198184Z auto tmp11 = static_cast<float>(0); 2023-01-11T21:38:06.1198287Z auto tmp12 = tmp0 > tmp11; 2023-01-11T21:38:06.1198387Z auto tmp13 = tmp12 == 0; 2023-01-11T21:38:06.1198504Z auto tmp14 = static_cast<long>(tmp13); 2023-01-11T21:38:06.1198616Z auto tmp15 = static_cast<bool>(tmp14); 2023-01-11T21:38:06.1198731Z tmp1 = std::min(tmp1, tmp0); 2023-01-11T21:38:06.1198829Z if (tmp2.value > tmp0) { 2023-01-11T21:38:06.1198946Z tmp2.index = i1; tmp2.value = tmp0; 2023-01-11T21:38:06.1199024Z } 2023-01-11T21:38:06.1199132Z tmp3 = std::max(tmp3, tmp0); 2023-01-11T21:38:06.1199235Z if (tmp4.value < tmp0) { 2023-01-11T21:38:06.1199351Z tmp4.index = i1; tmp4.value = tmp0; 2023-01-11T21:38:06.1199424Z } 2023-01-11T21:38:06.1199513Z tmp5 += tmp0; 2023-01-11T21:38:06.1199603Z tmp10 = tmp10 || tmp9; 2023-01-11T21:38:06.1199702Z tmp16 = tmp16 || tmp15; 2023-01-11T21:38:06.1199801Z if (tmp17.value > tmp0) { 2023-01-11T21:38:06.1199918Z tmp17.index = i1; tmp17.value = tmp0; 2023-01-11T21:38:06.1199992Z } 2023-01-11T21:38:06.1200095Z if (tmp18.value < tmp0) { 2023-01-11T21:38:06.1200211Z tmp18.index = i1; tmp18.value = tmp0; 2023-01-11T21:38:06.1200279Z } 2023-01-11T21:38:06.1200390Z tmp19 = std::min(tmp19, tmp0); 2023-01-11T21:38:06.1200526Z tmp20 = std::max(tmp20, tmp0); 2023-01-11T21:38:06.1200603Z } 2023-01-11T21:38:06.1200675Z } 2023-01-11T21:38:06.1200768Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1200867Z out_ptr1[i0] = tmp2.index; 2023-01-11T21:38:06.1200949Z out_ptr2[i0] = tmp3; 2023-01-11T21:38:06.1201047Z out_ptr3[i0] = tmp4.index; 2023-01-11T21:38:06.1201135Z out_ptr4[i0] = tmp5; 2023-01-11T21:38:06.1201226Z out_ptr5[i0] = tmp10; 2023-01-11T21:38:06.1201317Z out_ptr6[i0] = tmp16; 2023-01-11T21:38:06.1201423Z out_ptr7[i0] = tmp17.index; 2023-01-11T21:38:06.1201520Z out_ptr8[i0] = tmp18.index; 2023-01-11T21:38:06.1201602Z out_ptr9[i0] = tmp19; 2023-01-11T21:38:06.1201695Z out_ptr10[i0] = tmp20; 2023-01-11T21:38:06.1201767Z } 2023-01-11T21:38:06.1201837Z } 2023-01-11T21:38:06.1201907Z } 2023-01-11T21:38:06.1201991Z #pragma omp for 2023-01-11T21:38:06.1202079Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1202139Z { 2023-01-11T21:38:06.1202209Z { 2023-01-11T21:38:06.1202278Z { 2023-01-11T21:38:06.1202377Z auto tmp0 = out_ptr6[i0]; 2023-01-11T21:38:06.1202476Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1202570Z in_out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1202639Z } 2023-01-11T21:38:06.1202700Z } 2023-01-11T21:38:06.1202796Z } 2023-01-11T21:38:06.1202862Z } 2023-01-11T21:38:06.1202926Z } 2023-01-11T21:38:06.1203015Z ''') 2023-01-11T21:38:06.1203021Z 2023-01-11T21:38:06.1203026Z 2023-01-11T21:38:06.1203121Z async_compile.wait(globals()) 2023-01-11T21:38:06.1203198Z del async_compile 2023-01-11T21:38:06.1203203Z 2023-01-11T21:38:06.1203271Z def call(args): 2023-01-11T21:38:06.1203349Z arg0_1, = args
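# args.clear() below empties the caller's argument list so the wrapper holds the
# only reference to each input; the subsequent `del arg0_1` then frees the storage
# as early as possible, which is also what enables in-place buffer reuse like
# `buf7 = buf6; del buf6  # reuse` further down.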
2023-01-11T21:38:06.1203424Z args.clear() 2023-01-11T21:38:06.1203620Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1203809Z buf1 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1203998Z buf2 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1204183Z buf3 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1204361Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1204544Z buf5 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1204728Z buf6 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1204913Z buf8 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1205097Z buf9 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1205293Z buf10 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1205485Z buf11 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1205578Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.1205939Z kernel_cpp_0(c_void_p(buf7.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(buf10.data_ptr()), c_void_p(buf11.data_ptr())) 2023-01-11T21:38:06.1206014Z del arg0_1 2023-01-11T21:38:06.1206154Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.1206159Z 2023-01-11T21:38:06.1206165Z 2023-01-11T21:38:06.1206245Z if __name__ == "__main__": 2023-01-11T21:38:06.1206367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1206525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1206728Z arg0_1 = rand_strided((8, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1206842Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1206847Z 2023-01-11T21:38:06.1206918Z ok (3.519s) 2023-01-11T21:38:06.1207427Z test_unspec_inputs_cpu (__main__.CpuTests) ... 
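For orientation, the two kernels above for test_unroll_small_reduction_cpu fuse a set of row-wise reductions over an (8, 3) float input into one pass per row, with out_ptr7 through out_ptr10 duplicating the argmin/argmax/min/max results. A rough eager-mode equivalent of what the outputs hold (an illustrative sketch, not the test's actual source):

import torch

x = torch.randn(8, 3)
outs = (x.min(1).values, x.min(1).indices,   # min / argmin per row
        x.max(1).values, x.max(1).indices,   # max / argmax per row
        x.sum(1),                            # row sums
        (x > 1).any(1),                      # any(x > 1) per row
        (x > 0).all(1))                      # computed as !any(x <= 0) in the kernel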
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1207507Z warnings.warn( 2023-01-11T21:38:06.1207769Z [2023-01-11 21:32:39,292] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 300 2023-01-11T21:38:06.1208020Z [2023-01-11 21:32:39,706] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1208285Z [2023-01-11 21:32:39,706] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 300 2023-01-11T21:38:06.1208540Z [2023-01-11 21:32:39,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 301 2023-01-11T21:38:06.1208786Z [2023-01-11 21:32:40,137] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1209046Z [2023-01-11 21:32:40,137] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 301 2023-01-11T21:38:06.1209298Z [2023-01-11 21:32:40,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 302 2023-01-11T21:38:06.1209543Z [2023-01-11 21:32:40,409] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1209825Z [2023-01-11 21:32:40,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 302 2023-01-11T21:38:06.1210078Z [2023-01-11 21:32:40,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 303 2023-01-11T21:38:06.1210084Z 2023-01-11T21:38:06.1210182Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1210259Z import torch 2023-01-11T21:38:06.1210334Z import random 2023-01-11T21:38:06.1210452Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1210575Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1210580Z 2023-01-11T21:38:06.1210662Z aten = torch.ops.aten 2023-01-11T21:38:06.1210793Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1210889Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1210894Z 2023-01-11T21:38:06.1210968Z import triton 2023-01-11T21:38:06.1211065Z import triton.language as tl 2023-01-11T21:38:06.1211191Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1211331Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1211336Z 2023-01-11T21:38:06.1211341Z 2023-01-11T21:38:06.1211513Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1211588Z import triton 2023-01-11T21:38:06.1211674Z import triton.language as tl 2023-01-11T21:38:06.1211793Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1211896Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1212028Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1212153Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1212158Z 2023-01-11T21:38:06.1212605Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1212682Z @triton.jit 2023-01-11T21:38:06.1212842Z def triton_(in_ptr0, 
in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1212908Z xnumel = 6 2023-01-11T21:38:06.1213048Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1213187Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1213270Z xmask = xindex < xnumel 2023-01-11T21:38:06.1213344Z x0 = xindex 2023-01-11T21:38:06.1213537Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1213610Z tmp1 = in_ptr1 2023-01-11T21:38:06.1213701Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1213791Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1213871Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1213951Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1214029Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1214168Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1214302Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1214426Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1214626Z ''') 2023-01-11T21:38:06.1214633Z 2023-01-11T21:38:06.1214639Z 2023-01-11T21:38:06.1214735Z async_compile.wait(globals()) 2023-01-11T21:38:06.1214810Z del async_compile 2023-01-11T21:38:06.1214816Z 2023-01-11T21:38:06.1214889Z def call(args): 2023-01-11T21:38:06.1214967Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1215041Z args.clear() 2023-01-11T21:38:06.1215126Z with torch.cuda.device(0): 2023-01-11T21:38:06.1215329Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215529Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215724Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1215889Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1216058Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1216130Z del arg0_1 2023-01-11T21:38:06.1216203Z del arg1_1 2023-01-11T21:38:06.1216289Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1216294Z 2023-01-11T21:38:06.1216298Z 2023-01-11T21:38:06.1216380Z if __name__ == "__main__": 2023-01-11T21:38:06.1216499Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1216627Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1216826Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1217010Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1217176Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1217182Z 2023-01-11T21:38:06.1217191Z 2023-01-11T21:38:06.1217299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1217378Z import torch 2023-01-11T21:38:06.1217453Z import random 2023-01-11T21:38:06.1217572Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1217696Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1217702Z 2023-01-11T21:38:06.1217786Z aten = torch.ops.aten 2023-01-11T21:38:06.1217924Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1218022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1218027Z 2023-01-11T21:38:06.1218100Z import triton 2023-01-11T21:38:06.1218185Z import triton.language as tl 2023-01-11T21:38:06.1218310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1218451Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1218457Z 2023-01-11T21:38:06.1218461Z 2023-01-11T21:38:06.1218629Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1218704Z import triton 2023-01-11T21:38:06.1218796Z import triton.language as tl 2023-01-11T21:38:06.1218910Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1219006Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1219139Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1219303Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1219309Z 2023-01-11T21:38:06.1219756Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1219829Z @triton.jit 2023-01-11T21:38:06.1219989Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1220060Z xnumel = 6 2023-01-11T21:38:06.1220157Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1220292Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1220368Z xmask = xindex < xnumel 2023-01-11T21:38:06.1220438Z x0 = xindex 2023-01-11T21:38:06.1220512Z tmp0 = in_ptr0 2023-01-11T21:38:06.1220703Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1220804Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1220893Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1220971Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1221043Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1221117Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1221251Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1221382Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1221512Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1221596Z ''') 2023-01-11T21:38:06.1221602Z 2023-01-11T21:38:06.1221634Z 2023-01-11T21:38:06.1221726Z async_compile.wait(globals()) 2023-01-11T21:38:06.1221796Z del async_compile 2023-01-11T21:38:06.1221801Z 2023-01-11T21:38:06.1221877Z def call(args): 2023-01-11T21:38:06.1221957Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1222030Z args.clear() 2023-01-11T21:38:06.1222123Z with torch.cuda.device(0): 2023-01-11T21:38:06.1222322Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222517Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222713Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1222798Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1222961Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1223035Z del arg0_1 2023-01-11T21:38:06.1223106Z del arg1_1 2023-01-11T21:38:06.1223194Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1223203Z 2023-01-11T21:38:06.1223208Z 2023-01-11T21:38:06.1223288Z if __name__ == "__main__": 2023-01-11T21:38:06.1223404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1223524Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1223710Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1223904Z arg1_1 = 
rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1224021Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1224027Z 2023-01-11T21:38:06.1224031Z 2023-01-11T21:38:06.1224128Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1224203Z import torch 2023-01-11T21:38:06.1224277Z import random 2023-01-11T21:38:06.1224395Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1224511Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1224516Z 2023-01-11T21:38:06.1224600Z aten = torch.ops.aten 2023-01-11T21:38:06.1224739Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1224835Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1224840Z 2023-01-11T21:38:06.1224912Z import triton 2023-01-11T21:38:06.1225006Z import triton.language as tl 2023-01-11T21:38:06.1225160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1225302Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1225307Z 2023-01-11T21:38:06.1225312Z 2023-01-11T21:38:06.1225488Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1225572Z import triton 2023-01-11T21:38:06.1225680Z import triton.language as tl 2023-01-11T21:38:06.1225801Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1225902Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1226035Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1226161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1226169Z 2023-01-11T21:38:06.1226617Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1226687Z @triton.jit 2023-01-11T21:38:06.1226848Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1226922Z xnumel = 6 2023-01-11T21:38:06.1227020Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1227152Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1227236Z xmask = xindex < xnumel 2023-01-11T21:38:06.1227307Z x0 = xindex 2023-01-11T21:38:06.1227491Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1227567Z tmp1 = in_ptr1 2023-01-11T21:38:06.1227663Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1227780Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1227858Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1227935Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1228012Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1228140Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1228272Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1228402Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1228489Z ''') 2023-01-11T21:38:06.1228494Z 2023-01-11T21:38:06.1228499Z 2023-01-11T21:38:06.1228591Z async_compile.wait(globals()) 2023-01-11T21:38:06.1228667Z del async_compile 2023-01-11T21:38:06.1228672Z 2023-01-11T21:38:06.1228749Z def call(args): 2023-01-11T21:38:06.1228830Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1228898Z args.clear() 2023-01-11T21:38:06.1228990Z with torch.cuda.device(0): 2023-01-11T21:38:06.1229188Z 
buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229390Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229585Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1229677Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1229844Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1229917Z del arg0_1 2023-01-11T21:38:06.1229983Z del arg1_1 2023-01-11T21:38:06.1230072Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1230077Z 2023-01-11T21:38:06.1230081Z 2023-01-11T21:38:06.1230161Z if __name__ == "__main__": 2023-01-11T21:38:06.1230279Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1230406Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1230605Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1230797Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.1230908Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1230920Z 2023-01-11T21:38:06.1231162Z [2023-01-11 21:32:40,682] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1231454Z [2023-01-11 21:32:40,682] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 303 2023-01-11T21:38:06.1231708Z [2023-01-11 21:32:40,698] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 304 2023-01-11T21:38:06.1231960Z [2023-01-11 21:32:41,016] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1232222Z [2023-01-11 21:32:41,017] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 304 2023-01-11T21:38:06.1232475Z [2023-01-11 21:32:41,032] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 305 2023-01-11T21:38:06.1232725Z [2023-01-11 21:32:41,353] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1232983Z [2023-01-11 21:32:41,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 305 2023-01-11T21:38:06.1233234Z [2023-01-11 21:32:41,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 306 2023-01-11T21:38:06.1233240Z 2023-01-11T21:38:06.1233331Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1233406Z import torch 2023-01-11T21:38:06.1233479Z import random 2023-01-11T21:38:06.1233599Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1233721Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1233727Z 2023-01-11T21:38:06.1233808Z aten = torch.ops.aten 2023-01-11T21:38:06.1233944Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1234032Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1234071Z 2023-01-11T21:38:06.1234139Z import triton 2023-01-11T21:38:06.1234231Z import triton.language as tl 2023-01-11T21:38:06.1234358Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1234499Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1234505Z 2023-01-11T21:38:06.1234512Z 2023-01-11T21:38:06.1234678Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1234751Z import triton 2023-01-11T21:38:06.1234842Z 
import triton.language as tl 2023-01-11T21:38:06.1234949Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1235050Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1235198Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1235338Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1235344Z 2023-01-11T21:38:06.1235808Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1235886Z @triton.jit 2023-01-11T21:38:06.1236052Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1236128Z xnumel = 6 2023-01-11T21:38:06.1236227Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1236350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1236432Z xmask = xindex < xnumel 2023-01-11T21:38:06.1236503Z x0 = xindex 2023-01-11T21:38:06.1236577Z tmp0 = in_ptr0 2023-01-11T21:38:06.1236768Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1236867Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1236955Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1237027Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1237108Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1237184Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1237319Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1237453Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1237607Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1237694Z ''') 2023-01-11T21:38:06.1237699Z 2023-01-11T21:38:06.1237704Z 2023-01-11T21:38:06.1237789Z async_compile.wait(globals()) 2023-01-11T21:38:06.1237870Z del async_compile 2023-01-11T21:38:06.1237875Z 2023-01-11T21:38:06.1237951Z def call(args): 2023-01-11T21:38:06.1238031Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1238106Z args.clear() 2023-01-11T21:38:06.1238199Z with torch.cuda.device(0): 2023-01-11T21:38:06.1238400Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238592Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238790Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1238882Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1239050Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1239125Z del arg0_1 2023-01-11T21:38:06.1239201Z del arg1_1 2023-01-11T21:38:06.1239290Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1239295Z 2023-01-11T21:38:06.1239300Z 2023-01-11T21:38:06.1239379Z if __name__ == "__main__": 2023-01-11T21:38:06.1239490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1239616Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1239805Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.1240002Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1240151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1240156Z 2023-01-11T21:38:06.1240161Z 2023-01-11T21:38:06.1240260Z from ctypes import 
c_void_p, c_long 2023-01-11T21:38:06.1240335Z import torch 2023-01-11T21:38:06.1240410Z import random 2023-01-11T21:38:06.1240523Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1240650Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1240655Z 2023-01-11T21:38:06.1240736Z aten = torch.ops.aten 2023-01-11T21:38:06.1240873Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1240970Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1240975Z 2023-01-11T21:38:06.1241050Z import triton 2023-01-11T21:38:06.1241142Z import triton.language as tl 2023-01-11T21:38:06.1241264Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1241396Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1241402Z 2023-01-11T21:38:06.1241409Z 2023-01-11T21:38:06.1241576Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1241650Z import triton 2023-01-11T21:38:06.1241742Z import triton.language as tl 2023-01-11T21:38:06.1241856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1241961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1242099Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1242217Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1242229Z 2023-01-11T21:38:06.1242669Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1242744Z @triton.jit 2023-01-11T21:38:06.1242903Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1242979Z xnumel = 6 2023-01-11T21:38:06.1243077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1243209Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1243293Z xmask = xindex < xnumel 2023-01-11T21:38:06.1243357Z x0 = xindex 2023-01-11T21:38:06.1243580Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1243657Z tmp1 = in_ptr1 2023-01-11T21:38:06.1243757Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1243837Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1243913Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.1243991Z tmp5 = tmp4 / tmp1 2023-01-11T21:38:06.1244117Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.1244246Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1244374Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.1244458Z ''') 2023-01-11T21:38:06.1244466Z 2023-01-11T21:38:06.1244471Z 2023-01-11T21:38:06.1244563Z async_compile.wait(globals()) 2023-01-11T21:38:06.1244640Z del async_compile 2023-01-11T21:38:06.1244645Z 2023-01-11T21:38:06.1244717Z def call(args): 2023-01-11T21:38:06.1244796Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1244864Z args.clear() 2023-01-11T21:38:06.1244958Z with torch.cuda.device(0): 2023-01-11T21:38:06.1245159Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245357Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245553Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1245646Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1245812Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1245885Z del arg0_1 2023-01-11T21:38:06.1245951Z del arg1_1 2023-01-11T21:38:06.1246069Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1246074Z 2023-01-11T21:38:06.1246079Z 2023-01-11T21:38:06.1246155Z if __name__ == "__main__": 2023-01-11T21:38:06.1246273Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1246401Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1246600Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1246785Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1246897Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1246911Z 2023-01-11T21:38:06.1246915Z 2023-01-11T21:38:06.1247005Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1247077Z import torch 2023-01-11T21:38:06.1247151Z import random 2023-01-11T21:38:06.1247271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1247393Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1247401Z 2023-01-11T21:38:06.1247482Z aten = torch.ops.aten 2023-01-11T21:38:06.1247619Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1247707Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1247717Z 2023-01-11T21:38:06.1247787Z import triton 2023-01-11T21:38:06.1247879Z import triton.language as tl 2023-01-11T21:38:06.1248007Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1248148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1248154Z 2023-01-11T21:38:06.1248158Z 2023-01-11T21:38:06.1248324Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1248402Z import triton 2023-01-11T21:38:06.1248493Z import triton.language as tl 2023-01-11T21:38:06.1248599Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1248699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1248829Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1248955Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1248962Z 2023-01-11T21:38:06.1249437Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1249513Z @triton.jit 2023-01-11T21:38:06.1249675Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1249749Z xnumel = 6 2023-01-11T21:38:06.1249841Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1249969Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1250051Z xmask = xindex < xnumel 2023-01-11T21:38:06.1250120Z x0 = xindex 2023-01-11T21:38:06.1250194Z tmp0 = in_ptr0 2023-01-11T21:38:06.1250387Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1250489Z tmp4 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1250561Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1250641Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.1250722Z tmp5 = tmp0 / tmp4 2023-01-11T21:38:06.1250855Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.1250994Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1251122Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.1251206Z ''') 2023-01-11T21:38:06.1251212Z 2023-01-11T21:38:06.1251217Z 2023-01-11T21:38:06.1251309Z async_compile.wait(globals()) 2023-01-11T21:38:06.1251381Z del async_compile 2023-01-11T21:38:06.1251386Z 2023-01-11T21:38:06.1251460Z def call(args): 2023-01-11T21:38:06.1251539Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1251612Z args.clear() 2023-01-11T21:38:06.1251704Z with torch.cuda.device(0): 2023-01-11T21:38:06.1251904Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252132Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252323Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1252418Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1252588Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1252665Z del arg0_1 2023-01-11T21:38:06.1252736Z del arg1_1 2023-01-11T21:38:06.1252825Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1252831Z 2023-01-11T21:38:06.1252835Z 2023-01-11T21:38:06.1252916Z if __name__ == "__main__": 2023-01-11T21:38:06.1253036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1253156Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1253341Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1253542Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1253660Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1253666Z 2023-01-11T21:38:06.1253914Z [2023-01-11 21:32:41,784] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1254183Z [2023-01-11 21:32:41,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 306 2023-01-11T21:38:06.1254436Z [2023-01-11 21:32:41,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 307 2023-01-11T21:38:06.1254804Z [2023-01-11 21:32:42,223] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1255064Z [2023-01-11 21:32:42,223] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 307 2023-01-11T21:38:06.1255308Z [2023-01-11 21:32:42,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 308 2023-01-11T21:38:06.1255559Z [2023-01-11 21:32:42,663] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1255819Z [2023-01-11 21:32:42,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 308 2023-01-11T21:38:06.1256113Z [2023-01-11 21:32:42,684] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 309 2023-01-11T21:38:06.1256120Z 2023-01-11T21:38:06.1256219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1256294Z import torch 2023-01-11T21:38:06.1256369Z import random 2023-01-11T21:38:06.1256490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1256606Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1256611Z 2023-01-11T21:38:06.1256695Z aten = torch.ops.aten 2023-01-11T21:38:06.1256831Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1256929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1256937Z 2023-01-11T21:38:06.1257011Z import triton 2023-01-11T21:38:06.1257104Z import triton.language as tl 2023-01-11T21:38:06.1257288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1257429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1257435Z 2023-01-11T21:38:06.1257441Z 2023-01-11T21:38:06.1257602Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1257677Z import triton 2023-01-11T21:38:06.1257771Z import triton.language as tl 2023-01-11T21:38:06.1257884Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1257986Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1258119Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1258244Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1258249Z 2023-01-11T21:38:06.1258692Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1258851Z @triton.jit 2023-01-11T21:38:06.1259010Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1259087Z xnumel = 6 2023-01-11T21:38:06.1259186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1259316Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1259400Z xmask = xindex < xnumel 2023-01-11T21:38:06.1259473Z x0 = xindex 2023-01-11T21:38:06.1259657Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1259734Z tmp1 = in_ptr1 2023-01-11T21:38:06.1259831Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1259920Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1259998Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1260080Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1260160Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1260289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1260424Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1260557Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1260642Z ''') 2023-01-11T21:38:06.1260648Z 2023-01-11T21:38:06.1260652Z 2023-01-11T21:38:06.1260744Z async_compile.wait(globals()) 2023-01-11T21:38:06.1260820Z del async_compile 2023-01-11T21:38:06.1260826Z 2023-01-11T21:38:06.1260901Z def call(args): 2023-01-11T21:38:06.1260981Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1261050Z args.clear() 2023-01-11T21:38:06.1261143Z with torch.cuda.device(0): 2023-01-11T21:38:06.1261338Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261537Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261734Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1261828Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1261995Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1262097Z del arg0_1 2023-01-11T21:38:06.1262165Z del arg1_1 2023-01-11T21:38:06.1262254Z return (buf0, buf1, buf2, ) 
2023-01-11T21:38:06.1262259Z 2023-01-11T21:38:06.1262264Z 2023-01-11T21:38:06.1262346Z if __name__ == "__main__": 2023-01-11T21:38:06.1262469Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1262597Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1262795Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1262980Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1263092Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1263109Z 2023-01-11T21:38:06.1263113Z 2023-01-11T21:38:06.1263203Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1263277Z import torch 2023-01-11T21:38:06.1263351Z import random 2023-01-11T21:38:06.1263467Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1263599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1263604Z 2023-01-11T21:38:06.1263686Z aten = torch.ops.aten 2023-01-11T21:38:06.1263821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1263910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1263921Z 2023-01-11T21:38:06.1263988Z import triton 2023-01-11T21:38:06.1264080Z import triton.language as tl 2023-01-11T21:38:06.1264205Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1264343Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1264349Z 2023-01-11T21:38:06.1264353Z 2023-01-11T21:38:06.1264548Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1264626Z import triton 2023-01-11T21:38:06.1264721Z import triton.language as tl 2023-01-11T21:38:06.1264827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1264931Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1265071Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1265199Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1265205Z 2023-01-11T21:38:06.1265697Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1265772Z @triton.jit 2023-01-11T21:38:06.1265936Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1266008Z xnumel = 6 2023-01-11T21:38:06.1266102Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1266232Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1266316Z xmask = xindex < xnumel 2023-01-11T21:38:06.1266388Z x0 = xindex 2023-01-11T21:38:06.1266462Z tmp0 = in_ptr0 2023-01-11T21:38:06.1266656Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1266754Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1266835Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1266914Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1266991Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1267065Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1267197Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1267330Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1267461Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1267541Z ''') 
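[editor's sketch] The kernels dumped in this test are near-identical; they differ only in the scalar's dtype in the @pointwise signature ('fp16', 'bf16', 'fp32', 'fp64', later 'i32' and 'i64') and in which operand call() passes by value via .item(). A 0-d CPU tensor mixed with a CUDA tensor is treated as an unspecialized scalar kernel argument, so Inductor compiles one specialization per scalar dtype and per argument order, and cudagraphs is skipped because the inputs span two devices. A minimal sketch that would plausibly trigger this family of triton_fused_add_div_mul_0 kernels follows; it is a hypothetical repro under those assumptions, not the actual test body.

import torch

@torch.compile  # torch.compile is assumed available (PyTorch 2.0-era master)
def f(x, s):
    # mirrors the fused add/mul/div in the generated kernels
    return x + s, x * s, x / s

x = torch.randn(2, 3, device="cuda")
for dt in (torch.float16, torch.bfloat16, torch.float32,
           torch.float64, torch.int32, torch.int64):
    s = torch.ones((), dtype=dt)  # 0-d scalar tensor left on the CPU
    f(x, s)  # each dtype recompiles the same pointwise kernel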
2023-01-11T21:38:06.1267554Z 2023-01-11T21:38:06.1267559Z 2023-01-11T21:38:06.1267645Z async_compile.wait(globals()) 2023-01-11T21:38:06.1267722Z del async_compile 2023-01-11T21:38:06.1267727Z 2023-01-11T21:38:06.1267801Z def call(args): 2023-01-11T21:38:06.1267881Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1267955Z args.clear() 2023-01-11T21:38:06.1268079Z with torch.cuda.device(0): 2023-01-11T21:38:06.1268280Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268470Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268667Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1268760Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1268927Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1269001Z del arg0_1 2023-01-11T21:38:06.1269078Z del arg1_1 2023-01-11T21:38:06.1269166Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1269172Z 2023-01-11T21:38:06.1269176Z 2023-01-11T21:38:06.1269256Z if __name__ == "__main__": 2023-01-11T21:38:06.1269367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1269496Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1269685Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.1269884Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1270001Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1270007Z 2023-01-11T21:38:06.1270011Z 2023-01-11T21:38:06.1270109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1270183Z import torch 2023-01-11T21:38:06.1270257Z import random 2023-01-11T21:38:06.1270369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1270492Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1270523Z 2023-01-11T21:38:06.1270606Z aten = torch.ops.aten 2023-01-11T21:38:06.1270743Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1270840Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1270845Z 2023-01-11T21:38:06.1270919Z import triton 2023-01-11T21:38:06.1271012Z import triton.language as tl 2023-01-11T21:38:06.1271129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1271268Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1271273Z 2023-01-11T21:38:06.1271277Z 2023-01-11T21:38:06.1271444Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1271517Z import triton 2023-01-11T21:38:06.1271609Z import triton.language as tl 2023-01-11T21:38:06.1271728Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1271830Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1271961Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1272082Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1272087Z 2023-01-11T21:38:06.1272536Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1272609Z @triton.jit 2023-01-11T21:38:06.1272769Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.1272843Z xnumel = 6 2023-01-11T21:38:06.1272940Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1273072Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1273154Z xmask = xindex < xnumel 2023-01-11T21:38:06.1273218Z x0 = xindex 2023-01-11T21:38:06.1273406Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1273483Z tmp1 = in_ptr1 2023-01-11T21:38:06.1273580Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1273669Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1273750Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1273827Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1273897Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1274060Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1274192Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1274328Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1274411Z ''') 2023-01-11T21:38:06.1274416Z 2023-01-11T21:38:06.1274421Z 2023-01-11T21:38:06.1274512Z async_compile.wait(globals()) 2023-01-11T21:38:06.1274588Z del async_compile 2023-01-11T21:38:06.1274593Z 2023-01-11T21:38:06.1274668Z def call(args): 2023-01-11T21:38:06.1274740Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1274815Z args.clear() 2023-01-11T21:38:06.1274912Z with torch.cuda.device(0): 2023-01-11T21:38:06.1275110Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275335Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275551Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1275647Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1275807Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1275880Z del arg0_1 2023-01-11T21:38:06.1275953Z del arg1_1 2023-01-11T21:38:06.1276043Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1276048Z 2023-01-11T21:38:06.1276052Z 2023-01-11T21:38:06.1276130Z if __name__ == "__main__": 2023-01-11T21:38:06.1276249Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1276373Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1276596Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1276771Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.1276888Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1276893Z 2023-01-11T21:38:06.1277144Z [2023-01-11 21:32:43,109] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1277410Z [2023-01-11 21:32:43,109] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 309 2023-01-11T21:38:06.1277664Z [2023-01-11 21:32:43,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 310 2023-01-11T21:38:06.1277912Z [2023-01-11 21:32:43,540] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1278171Z [2023-01-11 21:32:43,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 310 2023-01-11T21:38:06.1278427Z [2023-01-11 21:32:43,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 311 2023-01-11T21:38:06.1278673Z [2023-01-11 21:32:43,974] 
torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.1278925Z [2023-01-11 21:32:43,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 311 2023-01-11T21:38:06.1278939Z 2023-01-11T21:38:06.1279031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1279103Z import torch 2023-01-11T21:38:06.1279178Z import random 2023-01-11T21:38:06.1279302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1279425Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1279430Z 2023-01-11T21:38:06.1279512Z aten = torch.ops.aten 2023-01-11T21:38:06.1279650Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1279739Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1279744Z 2023-01-11T21:38:06.1279819Z import triton 2023-01-11T21:38:06.1279911Z import triton.language as tl 2023-01-11T21:38:06.1280034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1280178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1280183Z 2023-01-11T21:38:06.1280188Z 2023-01-11T21:38:06.1280381Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1280458Z import triton 2023-01-11T21:38:06.1280550Z import triton.language as tl 2023-01-11T21:38:06.1280658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1280757Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1280894Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1281020Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1281025Z 2023-01-11T21:38:06.1281467Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1281542Z @triton.jit 2023-01-11T21:38:06.1281700Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1281775Z xnumel = 6 2023-01-11T21:38:06.1281869Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1281999Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1282087Z xmask = xindex < xnumel 2023-01-11T21:38:06.1282157Z x0 = xindex 2023-01-11T21:38:06.1282232Z tmp0 = in_ptr0 2023-01-11T21:38:06.1282422Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1282519Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1282601Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1282681Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1282758Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1282868Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1283003Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1283135Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1283266Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1283344Z ''') 2023-01-11T21:38:06.1283352Z 2023-01-11T21:38:06.1283363Z 2023-01-11T21:38:06.1283450Z async_compile.wait(globals()) 2023-01-11T21:38:06.1283526Z del async_compile 2023-01-11T21:38:06.1283532Z 2023-01-11T21:38:06.1283606Z def call(args): 2023-01-11T21:38:06.1283685Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1283761Z args.clear() 2023-01-11T21:38:06.1283855Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.1284056Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284247Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284443Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1284537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1284702Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1284775Z del arg0_1 2023-01-11T21:38:06.1284850Z del arg1_1 2023-01-11T21:38:06.1284938Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1284943Z 2023-01-11T21:38:06.1284948Z 2023-01-11T21:38:06.1285030Z if __name__ == "__main__": 2023-01-11T21:38:06.1285141Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1285269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1285450Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.1285647Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1285768Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1285775Z 2023-01-11T21:38:06.1285779Z 2023-01-11T21:38:06.1285878Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1285952Z import torch 2023-01-11T21:38:06.1286020Z import random 2023-01-11T21:38:06.1286138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1286262Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1286296Z 2023-01-11T21:38:06.1286381Z aten = torch.ops.aten 2023-01-11T21:38:06.1286521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1286614Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1286619Z 2023-01-11T21:38:06.1286697Z import triton 2023-01-11T21:38:06.1286787Z import triton.language as tl 2023-01-11T21:38:06.1286905Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1287045Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1287050Z 2023-01-11T21:38:06.1287054Z 2023-01-11T21:38:06.1287219Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1287299Z import triton 2023-01-11T21:38:06.1287391Z import triton.language as tl 2023-01-11T21:38:06.1287506Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1287607Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1287740Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1287862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1287867Z 2023-01-11T21:38:06.1288307Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1288383Z @triton.jit 2023-01-11T21:38:06.1288542Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1288615Z xnumel = 6 2023-01-11T21:38:06.1288712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1288869Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1288952Z xmask = xindex < xnumel 2023-01-11T21:38:06.1289015Z x0 = xindex 2023-01-11T21:38:06.1289207Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.1289281Z tmp1 = in_ptr1 2023-01-11T21:38:06.1289382Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1289471Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.1289551Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.1289630Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.1289701Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.1289834Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1289968Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1290099Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1290184Z ''') 2023-01-11T21:38:06.1290192Z 2023-01-11T21:38:06.1290196Z 2023-01-11T21:38:06.1290288Z async_compile.wait(globals()) 2023-01-11T21:38:06.1290364Z del async_compile 2023-01-11T21:38:06.1290369Z 2023-01-11T21:38:06.1290441Z def call(args): 2023-01-11T21:38:06.1290514Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1290588Z args.clear() 2023-01-11T21:38:06.1290682Z with torch.cuda.device(0): 2023-01-11T21:38:06.1290881Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291078Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291273Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1291365Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1291524Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1291596Z del arg0_1 2023-01-11T21:38:06.1291673Z del arg1_1 2023-01-11T21:38:06.1291765Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1291771Z 2023-01-11T21:38:06.1291775Z 2023-01-11T21:38:06.1291857Z if __name__ == "__main__": 2023-01-11T21:38:06.1291977Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1292103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1292326Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1292503Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1292623Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1292628Z 2023-01-11T21:38:06.1292632Z 2023-01-11T21:38:06.1292729Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1292803Z import torch 2023-01-11T21:38:06.1292875Z import random 2023-01-11T21:38:06.1292994Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1293117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1293122Z 2023-01-11T21:38:06.1293200Z aten = torch.ops.aten 2023-01-11T21:38:06.1293335Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1293430Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1293436Z 2023-01-11T21:38:06.1293510Z import triton 2023-01-11T21:38:06.1293599Z import triton.language as tl 2023-01-11T21:38:06.1293729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1293869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1293874Z 2023-01-11T21:38:06.1293879Z 2023-01-11T21:38:06.1294044Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.1294111Z import triton 2023-01-11T21:38:06.1294201Z import triton.language as tl 2023-01-11T21:38:06.1294311Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1294412Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1294655Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1294781Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1294829Z 2023-01-11T21:38:06.1295272Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1295347Z @triton.jit 2023-01-11T21:38:06.1295504Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1295570Z xnumel = 6 2023-01-11T21:38:06.1295667Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1295796Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1295880Z xmask = xindex < xnumel 2023-01-11T21:38:06.1295953Z x0 = xindex 2023-01-11T21:38:06.1296027Z tmp0 = in_ptr0 2023-01-11T21:38:06.1296219Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1296313Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.1296402Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.1296480Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1296557Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.1296635Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.1296769Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.1296905Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1297030Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.1297114Z ''') 2023-01-11T21:38:06.1297120Z 2023-01-11T21:38:06.1297166Z 2023-01-11T21:38:06.1297275Z async_compile.wait(globals()) 2023-01-11T21:38:06.1297367Z del async_compile 2023-01-11T21:38:06.1297373Z 2023-01-11T21:38:06.1297459Z def call(args): 2023-01-11T21:38:06.1297551Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1297627Z args.clear() 2023-01-11T21:38:06.1297713Z with torch.cuda.device(0): 2023-01-11T21:38:06.1297915Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298112Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298303Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1298435Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1298603Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.1298678Z del arg0_1 2023-01-11T21:38:06.1298750Z del arg1_1 2023-01-11T21:38:06.1298832Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.1298837Z 2023-01-11T21:38:06.1298842Z 2023-01-11T21:38:06.1298920Z if __name__ == "__main__": 2023-01-11T21:38:06.1299038Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1299166Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1299349Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1299553Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1299673Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1299679Z 2023-01-11T21:38:06.1299750Z ok (4.701s) 2023-01-11T21:38:06.1300202Z test_unsqueeze_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1300336Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1300596Z [2023-01-11 21:32:44,005] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 312 2023-01-11T21:38:06.1300858Z [2023-01-11 21:32:45,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 312 2023-01-11T21:38:06.1300902Z 2023-01-11T21:38:06.1301000Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1301077Z import torch 2023-01-11T21:38:06.1301153Z import random 2023-01-11T21:38:06.1301272Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1301399Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1301404Z 2023-01-11T21:38:06.1301479Z aten = torch.ops.aten 2023-01-11T21:38:06.1301614Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1301712Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1301717Z 2023-01-11T21:38:06.1301793Z import triton 2023-01-11T21:38:06.1301889Z import triton.language as tl 2023-01-11T21:38:06.1302016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1302155Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1302161Z 2023-01-11T21:38:06.1302168Z 2023-01-11T21:38:06.1302305Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1302506Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1302632Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1302737Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1302842Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1302944Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1303044Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1303111Z { 2023-01-11T21:38:06.1303207Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1303272Z { 2023-01-11T21:38:06.1303353Z #pragma omp for 2023-01-11T21:38:06.1303442Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1303512Z { 2023-01-11T21:38:06.1303653Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1303797Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1303888Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1304016Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1304107Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1304224Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.1304323Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1304418Z tmp5.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1304511Z tmp4.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1304606Z tmp5.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1304666Z } 2023-01-11T21:38:06.1304768Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1304857Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1304924Z { 2023-01-11T21:38:06.1305013Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1305117Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1305210Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1305307Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1305397Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1305505Z auto tmp5 = tmp0 + tmp3; 2023-01-11T21:38:06.1305597Z out_ptr0[i0] = tmp4; 
2023-01-11T21:38:06.1305704Z out_ptr1[i0] = tmp5; 2023-01-11T21:38:06.1305790Z out_ptr2[i0] = tmp4; 2023-01-11T21:38:06.1305866Z out_ptr3[i0] = tmp5; 2023-01-11T21:38:06.1305936Z } 2023-01-11T21:38:06.1306002Z } 2023-01-11T21:38:06.1306067Z } 2023-01-11T21:38:06.1306152Z ''') 2023-01-11T21:38:06.1306158Z 2023-01-11T21:38:06.1306162Z 2023-01-11T21:38:06.1306257Z async_compile.wait(globals()) 2023-01-11T21:38:06.1306334Z del async_compile 2023-01-11T21:38:06.1306339Z 2023-01-11T21:38:06.1306415Z def call(args): 2023-01-11T21:38:06.1306482Z arg0_1, = args 2023-01-11T21:38:06.1306557Z args.clear() 2023-01-11T21:38:06.1306802Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307019Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307233Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307449Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1307665Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1307738Z del arg0_1 2023-01-11T21:38:06.1307823Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.1307828Z 2023-01-11T21:38:06.1307832Z 2023-01-11T21:38:06.1307912Z if __name__ == "__main__": 2023-01-11T21:38:06.1308031Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1308160Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1308370Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1308483Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1308488Z 2023-01-11T21:38:06.1308558Z ok (1.852s) 2023-01-11T21:38:06.1309023Z test_unsqueeze_inplace_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
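In the unsqueeze CPU dump above, four differently-shaped outputs come out of one C++ kernel: the main loop works on 8-wide at::vec::Vectorized chunks (the <float> template arguments appear to have been stripped by the log renderer), and the scalar tail loop is empty because its bounds are for(long i0=16; i0<16; ...), the 16 elements dividing evenly by the vector width. Both intermediates are stored twice, since an unsqueeze only relabels strides: tmp4 = x + 1 + 2 feeds out_ptr0 and out_ptr2, tmp5 = x + 2 feeds out_ptr1 and out_ptr3. A hedged reconstruction of the test function, inferred from the buffer shapes in call(); the actual source in test_torchinductor.py may differ in detail:

    import torch

    def fn(x):
        a = x + 1 + 2  # tmp4 in the kernel, written to out_ptr0 / out_ptr2
        b = x + 2      # tmp5 in the kernel, written to out_ptr1 / out_ptr3
        return (
            a.unsqueeze(-1),  # buf0: (2, 2, 2, 2, 1)
            b.unsqueeze(2),   # buf1: (2, 2, 1, 2, 2)
            a.unsqueeze(0),   # buf2: (1, 2, 2, 2, 2)
            b.unsqueeze(3),   # buf3: (2, 2, 2, 1, 2)
        )

    out = torch.compile(fn)(torch.rand(2, 2, 2, 2))

The test_unsqueeze_inplace_cpu dump that follows repeats the pattern with two outputs; an in-place unsqueeze_ at the Python level still compiles to plain stores into freshly allocated buffers.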
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1309156Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1309406Z [2023-01-11 21:32:45,858] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 313 2023-01-11T21:38:06.1309666Z [2023-01-11 21:32:47,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 313 2023-01-11T21:38:06.1309674Z 2023-01-11T21:38:06.1309773Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1309848Z import torch 2023-01-11T21:38:06.1309924Z import random 2023-01-11T21:38:06.1310041Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1310194Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1310199Z 2023-01-11T21:38:06.1310282Z aten = torch.ops.aten 2023-01-11T21:38:06.1310411Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1310508Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1310513Z 2023-01-11T21:38:06.1310588Z import triton 2023-01-11T21:38:06.1310680Z import triton.language as tl 2023-01-11T21:38:06.1310808Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1310952Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1310958Z 2023-01-11T21:38:06.1310962Z 2023-01-11T21:38:06.1311103Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1311312Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1311430Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1311535Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1311640Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1311708Z { 2023-01-11T21:38:06.1311812Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1311880Z { 2023-01-11T21:38:06.1311962Z #pragma omp for 2023-01-11T21:38:06.1312042Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1312109Z { 2023-01-11T21:38:06.1312249Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1312385Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1312476Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1312640Z auto tmp3 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1312729Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1312818Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1312913Z tmp4.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1312979Z } 2023-01-11T21:38:06.1313079Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1313169Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1313235Z { 2023-01-11T21:38:06.1313325Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1313422Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1313513Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1313617Z auto tmp3 = static_cast(2); 2023-01-11T21:38:06.1313707Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1313793Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1313876Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1313943Z } 2023-01-11T21:38:06.1314006Z } 2023-01-11T21:38:06.1314069Z } 2023-01-11T21:38:06.1314154Z ''') 2023-01-11T21:38:06.1314160Z 2023-01-11T21:38:06.1314164Z 2023-01-11T21:38:06.1314260Z async_compile.wait(globals()) 2023-01-11T21:38:06.1314339Z del async_compile 2023-01-11T21:38:06.1314344Z 2023-01-11T21:38:06.1314418Z def call(args): 2023-01-11T21:38:06.1314498Z arg0_1, = args 2023-01-11T21:38:06.1314565Z args.clear() 
2023-01-11T21:38:06.1314783Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1314999Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1315173Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1315253Z del arg0_1 2023-01-11T21:38:06.1315332Z return (buf0, buf1, ) 2023-01-11T21:38:06.1315337Z 2023-01-11T21:38:06.1315341Z 2023-01-11T21:38:06.1315442Z if __name__ == "__main__": 2023-01-11T21:38:06.1315575Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1315712Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1315918Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1316031Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1316065Z 2023-01-11T21:38:06.1316136Z ok (1.700s) 2023-01-11T21:38:06.1316604Z test_upsample_bicubic2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1316736Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1316993Z [2023-01-11 21:32:49,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 314 2023-01-11T21:38:06.1317002Z 2023-01-11T21:38:06.1317098Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1317173Z import torch 2023-01-11T21:38:06.1317248Z import random 2023-01-11T21:38:06.1317360Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1317486Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1317492Z 2023-01-11T21:38:06.1317574Z aten = torch.ops.aten 2023-01-11T21:38:06.1317709Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1317803Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1317808Z 2023-01-11T21:38:06.1317882Z import triton 2023-01-11T21:38:06.1317975Z import triton.language as tl 2023-01-11T21:38:06.1318093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1318230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1318236Z 2023-01-11T21:38:06.1318266Z 2023-01-11T21:38:06.1318402Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1318610Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1318736Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1318842Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1318947Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1319011Z { 2023-01-11T21:38:06.1319105Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1319174Z { 2023-01-11T21:38:06.1319269Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1319354Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.1319421Z { 2023-01-11T21:38:06.1319512Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1319579Z { 2023-01-11T21:38:06.1319660Z #pragma GCC ivdep 2023-01-11T21:38:06.1319753Z for(long i2=0; i2<128; i2+=1) 2023-01-11T21:38:06.1319826Z { 2023-01-11T21:38:06.1319898Z { 2023-01-11T21:38:06.1319972Z { 
2023-01-11T21:38:06.1320089Z auto tmp0 = static_cast(i2); 2023-01-11T21:38:06.1320197Z auto tmp1 = 0.2440944881889764 * tmp0; 2023-01-11T21:38:06.1320303Z auto tmp2 = std::floor(tmp1); 2023-01-11T21:38:06.1320452Z auto tmp3 = tmp1 - tmp2; 2023-01-11T21:38:06.1320565Z auto tmp4 = static_cast(i1); 2023-01-11T21:38:06.1320672Z auto tmp5 = 0.49606299212598426 * tmp4; 2023-01-11T21:38:06.1320780Z auto tmp6 = std::floor(tmp5); 2023-01-11T21:38:06.1320924Z auto tmp7 = tmp5 - tmp6; 2023-01-11T21:38:06.1321035Z auto tmp8 = static_cast(tmp6); 2023-01-11T21:38:06.1321140Z auto tmp9 = static_cast(tmp2); 2023-01-11T21:38:06.1321291Z auto tmp10 = tmp8 + -1; 2023-01-11T21:38:06.1321389Z auto tmp11 = tmp8 + 0; 2023-01-11T21:38:06.1321485Z auto tmp12 = tmp8 + 1; 2023-01-11T21:38:06.1321579Z auto tmp13 = tmp8 + 2; 2023-01-11T21:38:06.1321747Z auto tmp14 = tmp9 + -1; 2023-01-11T21:38:06.1321843Z auto tmp15 = tmp9 + 0; 2023-01-11T21:38:06.1321936Z auto tmp16 = tmp9 + 1; 2023-01-11T21:38:06.1322022Z auto tmp17 = tmp9 + 2; 2023-01-11T21:38:06.1322161Z auto tmp18 = (tmp10 != tmp10) ? tmp10 : std::min(63, tmp10); 2023-01-11T21:38:06.1322294Z auto tmp19 = (tmp18 != tmp18) ? tmp18 : std::max(0, tmp18); 2023-01-11T21:38:06.1322428Z auto tmp20 = (tmp14 != tmp14) ? tmp14 : std::min(31, tmp14); 2023-01-11T21:38:06.1322560Z auto tmp21 = (tmp20 != tmp20) ? tmp20 : std::max(0, tmp20); 2023-01-11T21:38:06.1322685Z auto tmp22 = in_ptr0[tmp21 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1322815Z auto tmp23 = (tmp15 != tmp15) ? tmp15 : std::min(31, tmp15); 2023-01-11T21:38:06.1322950Z auto tmp24 = (tmp23 != tmp23) ? tmp23 : std::max(0, tmp23); 2023-01-11T21:38:06.1323072Z auto tmp25 = in_ptr0[tmp24 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323195Z auto tmp26 = (tmp16 != tmp16) ? tmp16 : std::min(31, tmp16); 2023-01-11T21:38:06.1323325Z auto tmp27 = (tmp26 != tmp26) ? tmp26 : std::max(0, tmp26); 2023-01-11T21:38:06.1323448Z auto tmp28 = in_ptr0[tmp27 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323576Z auto tmp29 = (tmp17 != tmp17) ? tmp17 : std::min(31, tmp17); 2023-01-11T21:38:06.1323733Z auto tmp30 = (tmp29 != tmp29) ? 
tmp29 : std::max(0, tmp29); 2023-01-11T21:38:06.1323856Z auto tmp31 = in_ptr0[tmp30 + (32*tmp19) + (2048*i0)]; 2023-01-11T21:38:06.1323957Z auto tmp32 = tmp3 + 1.0; 2023-01-11T21:38:06.1324110Z auto tmp33 = -0.75 * tmp32; 2023-01-11T21:38:06.1324254Z auto tmp34 = tmp33 - -3.75; 2023-01-11T21:38:06.1324356Z auto tmp35 = tmp34 * tmp32; 2023-01-11T21:38:06.1324505Z auto tmp36 = tmp35 + -6.0; 2023-01-11T21:38:06.1324605Z auto tmp37 = tmp36 * tmp32; 2023-01-11T21:38:06.1324751Z auto tmp38 = tmp37 - -3.0; 2023-01-11T21:38:06.1324854Z auto tmp39 = 1.25 * tmp3; 2023-01-11T21:38:06.1325001Z auto tmp40 = tmp39 - 2.25; 2023-01-11T21:38:06.1325099Z auto tmp41 = tmp40 * tmp3; 2023-01-11T21:38:06.1325196Z auto tmp42 = tmp41 * tmp3; 2023-01-11T21:38:06.1325296Z auto tmp43 = tmp42 + 1.0; 2023-01-11T21:38:06.1325438Z auto tmp44 = 1.0 - tmp3; 2023-01-11T21:38:06.1325542Z auto tmp45 = 1.25 * tmp44; 2023-01-11T21:38:06.1325686Z auto tmp46 = tmp45 - 2.25; 2023-01-11T21:38:06.1325788Z auto tmp47 = tmp46 * tmp44; 2023-01-11T21:38:06.1325886Z auto tmp48 = tmp47 * tmp44; 2023-01-11T21:38:06.1325978Z auto tmp49 = tmp48 + 1.0; 2023-01-11T21:38:06.1326077Z auto tmp50 = tmp44 + 1.0; 2023-01-11T21:38:06.1326223Z auto tmp51 = -0.75 * tmp50; 2023-01-11T21:38:06.1326370Z auto tmp52 = tmp51 - -3.75; 2023-01-11T21:38:06.1326468Z auto tmp53 = tmp52 * tmp50; 2023-01-11T21:38:06.1326615Z auto tmp54 = tmp53 + -6.0; 2023-01-11T21:38:06.1326714Z auto tmp55 = tmp54 * tmp50; 2023-01-11T21:38:06.1326860Z auto tmp56 = tmp55 - -3.0; 2023-01-11T21:38:06.1326952Z auto tmp57 = tmp22 * tmp38; 2023-01-11T21:38:06.1327077Z auto tmp58 = tmp25 * tmp43; 2023-01-11T21:38:06.1327179Z auto tmp59 = tmp28 * tmp49; 2023-01-11T21:38:06.1327277Z auto tmp60 = tmp31 * tmp56; 2023-01-11T21:38:06.1327376Z auto tmp61 = tmp59 + tmp60; 2023-01-11T21:38:06.1327474Z auto tmp62 = tmp58 + tmp61; 2023-01-11T21:38:06.1327574Z auto tmp63 = tmp57 + tmp62; 2023-01-11T21:38:06.1327701Z auto tmp64 = (tmp11 != tmp11) ? tmp11 : std::min(63, tmp11); 2023-01-11T21:38:06.1327834Z auto tmp65 = (tmp64 != tmp64) ? tmp64 : std::max(0, tmp64); 2023-01-11T21:38:06.1327963Z auto tmp66 = in_ptr0[tmp21 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328087Z auto tmp67 = in_ptr0[tmp24 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328212Z auto tmp68 = in_ptr0[tmp27 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328331Z auto tmp69 = in_ptr0[tmp30 + (32*tmp65) + (2048*i0)]; 2023-01-11T21:38:06.1328433Z auto tmp70 = tmp66 * tmp38; 2023-01-11T21:38:06.1328533Z auto tmp71 = tmp67 * tmp43; 2023-01-11T21:38:06.1328626Z auto tmp72 = tmp68 * tmp49; 2023-01-11T21:38:06.1328727Z auto tmp73 = tmp69 * tmp56; 2023-01-11T21:38:06.1328825Z auto tmp74 = tmp72 + tmp73; 2023-01-11T21:38:06.1328923Z auto tmp75 = tmp71 + tmp74; 2023-01-11T21:38:06.1329057Z auto tmp76 = tmp70 + tmp75; 2023-01-11T21:38:06.1329189Z auto tmp77 = (tmp12 != tmp12) ? tmp12 : std::min(63, tmp12); 2023-01-11T21:38:06.1329317Z auto tmp78 = (tmp77 != tmp77) ? 
tmp77 : std::max(0, tmp77); 2023-01-11T21:38:06.1329441Z auto tmp79 = in_ptr0[tmp21 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329555Z auto tmp80 = in_ptr0[tmp24 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329675Z auto tmp81 = in_ptr0[tmp27 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329799Z auto tmp82 = in_ptr0[tmp30 + (32*tmp78) + (2048*i0)]; 2023-01-11T21:38:06.1329899Z auto tmp83 = tmp79 * tmp38; 2023-01-11T21:38:06.1329999Z auto tmp84 = tmp80 * tmp43; 2023-01-11T21:38:06.1330103Z auto tmp85 = tmp81 * tmp49; 2023-01-11T21:38:06.1330205Z auto tmp86 = tmp82 * tmp56; 2023-01-11T21:38:06.1330302Z auto tmp87 = tmp85 + tmp86; 2023-01-11T21:38:06.1330393Z auto tmp88 = tmp84 + tmp87; 2023-01-11T21:38:06.1330494Z auto tmp89 = tmp83 + tmp88; 2023-01-11T21:38:06.1330630Z auto tmp90 = (tmp13 != tmp13) ? tmp13 : std::min(63, tmp13); 2023-01-11T21:38:06.1330762Z auto tmp91 = (tmp90 != tmp90) ? tmp90 : std::max(0, tmp90); 2023-01-11T21:38:06.1330883Z auto tmp92 = in_ptr0[tmp21 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331003Z auto tmp93 = in_ptr0[tmp24 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331122Z auto tmp94 = in_ptr0[tmp27 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331241Z auto tmp95 = in_ptr0[tmp30 + (32*tmp91) + (2048*i0)]; 2023-01-11T21:38:06.1331337Z auto tmp96 = tmp92 * tmp38; 2023-01-11T21:38:06.1331437Z auto tmp97 = tmp93 * tmp43; 2023-01-11T21:38:06.1331537Z auto tmp98 = tmp94 * tmp49; 2023-01-11T21:38:06.1331635Z auto tmp99 = tmp95 * tmp56; 2023-01-11T21:38:06.1331769Z auto tmp100 = tmp98 + tmp99; 2023-01-11T21:38:06.1331877Z auto tmp101 = tmp97 + tmp100; 2023-01-11T21:38:06.1331981Z auto tmp102 = tmp96 + tmp101; 2023-01-11T21:38:06.1332083Z auto tmp103 = tmp7 + 1.0; 2023-01-11T21:38:06.1332231Z auto tmp104 = -0.75 * tmp103; 2023-01-11T21:38:06.1332384Z auto tmp105 = tmp104 - -3.75; 2023-01-11T21:38:06.1332489Z auto tmp106 = tmp105 * tmp103; 2023-01-11T21:38:06.1332642Z auto tmp107 = tmp106 + -6.0; 2023-01-11T21:38:06.1332749Z auto tmp108 = tmp107 * tmp103; 2023-01-11T21:38:06.1332904Z auto tmp109 = tmp108 - -3.0; 2023-01-11T21:38:06.1333003Z auto tmp110 = 1.25 * tmp7; 2023-01-11T21:38:06.1333147Z auto tmp111 = tmp110 - 2.25; 2023-01-11T21:38:06.1333254Z auto tmp112 = tmp111 * tmp7; 2023-01-11T21:38:06.1333356Z auto tmp113 = tmp112 * tmp7; 2023-01-11T21:38:06.1333458Z auto tmp114 = tmp113 + 1.0; 2023-01-11T21:38:06.1333603Z auto tmp115 = 1.0 - tmp7; 2023-01-11T21:38:06.1333704Z auto tmp116 = 1.25 * tmp115; 2023-01-11T21:38:06.1333851Z auto tmp117 = tmp116 - 2.25; 2023-01-11T21:38:06.1333949Z auto tmp118 = tmp117 * tmp115; 2023-01-11T21:38:06.1334055Z auto tmp119 = tmp118 * tmp115; 2023-01-11T21:38:06.1334184Z auto tmp120 = tmp119 + 1.0; 2023-01-11T21:38:06.1334283Z auto tmp121 = tmp115 + 1.0; 2023-01-11T21:38:06.1334433Z auto tmp122 = -0.75 * tmp121; 2023-01-11T21:38:06.1334939Z auto tmp123 = tmp122 - -3.75; 2023-01-11T21:38:06.1335049Z auto tmp124 = tmp123 * tmp121; 2023-01-11T21:38:06.1335201Z auto tmp125 = tmp124 + -6.0; 2023-01-11T21:38:06.1335299Z auto tmp126 = tmp125 * tmp121; 2023-01-11T21:38:06.1335451Z auto tmp127 = tmp126 - -3.0; 2023-01-11T21:38:06.1335554Z auto tmp128 = tmp63 * tmp109; 2023-01-11T21:38:06.1335660Z auto tmp129 = tmp76 * tmp114; 2023-01-11T21:38:06.1335764Z auto tmp130 = tmp89 * tmp120; 2023-01-11T21:38:06.1335867Z auto tmp131 = tmp102 * tmp127; 2023-01-11T21:38:06.1335975Z auto tmp132 = tmp130 + tmp131; 2023-01-11T21:38:06.1336071Z auto tmp133 = tmp129 + tmp132; 2023-01-11T21:38:06.1336173Z auto tmp134 = tmp128 + tmp133; 
2023-01-11T21:38:06.1336288Z out_ptr0[i2 + (128*i1) + (16384*i0)] = tmp134; 2023-01-11T21:38:06.1336360Z } 2023-01-11T21:38:06.1336431Z } 2023-01-11T21:38:06.1336501Z } 2023-01-11T21:38:06.1336570Z } 2023-01-11T21:38:06.1336630Z } 2023-01-11T21:38:06.1336728Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1336814Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.1336882Z { 2023-01-11T21:38:06.1336972Z for(long i1=0; i1<128; i1+=1) 2023-01-11T21:38:06.1337039Z { 2023-01-11T21:38:06.1337180Z #pragma GCC ivdep 2023-01-11T21:38:06.1337289Z for(long i2=0; i2<256; i2+=1) 2023-01-11T21:38:06.1337363Z { 2023-01-11T21:38:06.1337435Z { 2023-01-11T21:38:06.1337508Z { 2023-01-11T21:38:06.1337625Z auto tmp0 = static_cast(i2); 2023-01-11T21:38:06.1337723Z auto tmp1 = tmp0 + 0.5; 2023-01-11T21:38:06.1337874Z auto tmp2 = 0.125 * tmp1; 2023-01-11T21:38:06.1338015Z auto tmp3 = tmp2 - 0.5; 2023-01-11T21:38:06.1338127Z auto tmp4 = std::floor(tmp3); 2023-01-11T21:38:06.1338274Z auto tmp5 = tmp3 - tmp4; 2023-01-11T21:38:06.1338389Z auto tmp6 = static_cast(i1); 2023-01-11T21:38:06.1338486Z auto tmp7 = tmp6 + 0.5; 2023-01-11T21:38:06.1338579Z auto tmp8 = 0.5 * tmp7; 2023-01-11T21:38:06.1338722Z auto tmp9 = tmp8 - 0.5; 2023-01-11T21:38:06.1338836Z auto tmp10 = std::floor(tmp9); 2023-01-11T21:38:06.1338979Z auto tmp11 = tmp9 - tmp10; 2023-01-11T21:38:06.1339093Z auto tmp12 = static_cast(tmp10); 2023-01-11T21:38:06.1339209Z auto tmp13 = static_cast(tmp4); 2023-01-11T21:38:06.1339357Z auto tmp14 = tmp12 + -1; 2023-01-11T21:38:06.1339456Z auto tmp15 = tmp12 + 0; 2023-01-11T21:38:06.1339551Z auto tmp16 = tmp12 + 1; 2023-01-11T21:38:06.1339647Z auto tmp17 = tmp12 + 2; 2023-01-11T21:38:06.1339784Z auto tmp18 = tmp13 + -1; 2023-01-11T21:38:06.1339880Z auto tmp19 = tmp13 + 0; 2023-01-11T21:38:06.1339976Z auto tmp20 = tmp13 + 1; 2023-01-11T21:38:06.1340069Z auto tmp21 = tmp13 + 2; 2023-01-11T21:38:06.1340249Z auto tmp22 = (tmp14 != tmp14) ? tmp14 : std::min(63, tmp14); 2023-01-11T21:38:06.1340383Z auto tmp23 = (tmp22 != tmp22) ? tmp22 : std::max(0, tmp22); 2023-01-11T21:38:06.1340513Z auto tmp24 = (tmp18 != tmp18) ? tmp18 : std::min(31, tmp18); 2023-01-11T21:38:06.1340648Z auto tmp25 = (tmp24 != tmp24) ? tmp24 : std::max(0, tmp24); 2023-01-11T21:38:06.1340765Z auto tmp26 = in_ptr0[tmp25 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1340893Z auto tmp27 = (tmp19 != tmp19) ? tmp19 : std::min(31, tmp19); 2023-01-11T21:38:06.1341025Z auto tmp28 = (tmp27 != tmp27) ? tmp27 : std::max(0, tmp27); 2023-01-11T21:38:06.1341146Z auto tmp29 = in_ptr0[tmp28 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1341274Z auto tmp30 = (tmp20 != tmp20) ? tmp20 : std::min(31, tmp20); 2023-01-11T21:38:06.1341406Z auto tmp31 = (tmp30 != tmp30) ? tmp30 : std::max(0, tmp30); 2023-01-11T21:38:06.1341525Z auto tmp32 = in_ptr0[tmp31 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1341653Z auto tmp33 = (tmp21 != tmp21) ? tmp21 : std::min(31, tmp21); 2023-01-11T21:38:06.1341783Z auto tmp34 = (tmp33 != tmp33) ? 
tmp33 : std::max(0, tmp33); 2023-01-11T21:38:06.1341906Z auto tmp35 = in_ptr0[tmp34 + (32*tmp23) + (2048*i0)]; 2023-01-11T21:38:06.1342000Z auto tmp36 = tmp5 + 1.0; 2023-01-11T21:38:06.1342150Z auto tmp37 = -0.75 * tmp36; 2023-01-11T21:38:06.1342300Z auto tmp38 = tmp37 - -3.75; 2023-01-11T21:38:06.1342401Z auto tmp39 = tmp38 * tmp36; 2023-01-11T21:38:06.1342549Z auto tmp40 = tmp39 + -6.0; 2023-01-11T21:38:06.1342650Z auto tmp41 = tmp40 * tmp36; 2023-01-11T21:38:06.1342799Z auto tmp42 = tmp41 - -3.0; 2023-01-11T21:38:06.1342892Z auto tmp43 = 1.25 * tmp5; 2023-01-11T21:38:06.1343042Z auto tmp44 = tmp43 - 2.25; 2023-01-11T21:38:06.1343169Z auto tmp45 = tmp44 * tmp5; 2023-01-11T21:38:06.1343271Z auto tmp46 = tmp45 * tmp5; 2023-01-11T21:38:06.1343372Z auto tmp47 = tmp46 + 1.0; 2023-01-11T21:38:06.1343516Z auto tmp48 = 1.0 - tmp5; 2023-01-11T21:38:06.1343612Z auto tmp49 = 1.25 * tmp48; 2023-01-11T21:38:06.1343752Z auto tmp50 = tmp49 - 2.25; 2023-01-11T21:38:06.1343854Z auto tmp51 = tmp50 * tmp48; 2023-01-11T21:38:06.1343952Z auto tmp52 = tmp51 * tmp48; 2023-01-11T21:38:06.1344056Z auto tmp53 = tmp52 + 1.0; 2023-01-11T21:38:06.1344154Z auto tmp54 = tmp48 + 1.0; 2023-01-11T21:38:06.1344299Z auto tmp55 = -0.75 * tmp54; 2023-01-11T21:38:06.1344446Z auto tmp56 = tmp55 - -3.75; 2023-01-11T21:38:06.1344548Z auto tmp57 = tmp56 * tmp54; 2023-01-11T21:38:06.1344686Z auto tmp58 = tmp57 + -6.0; 2023-01-11T21:38:06.1344787Z auto tmp59 = tmp58 * tmp54; 2023-01-11T21:38:06.1344934Z auto tmp60 = tmp59 - -3.0; 2023-01-11T21:38:06.1345033Z auto tmp61 = tmp26 * tmp42; 2023-01-11T21:38:06.1345133Z auto tmp62 = tmp29 * tmp47; 2023-01-11T21:38:06.1345232Z auto tmp63 = tmp32 * tmp53; 2023-01-11T21:38:06.1345335Z auto tmp64 = tmp35 * tmp60; 2023-01-11T21:38:06.1345426Z auto tmp65 = tmp63 + tmp64; 2023-01-11T21:38:06.1345558Z auto tmp66 = tmp62 + tmp65; 2023-01-11T21:38:06.1345657Z auto tmp67 = tmp61 + tmp66; 2023-01-11T21:38:06.1345788Z auto tmp68 = (tmp15 != tmp15) ? tmp15 : std::min(63, tmp15); 2023-01-11T21:38:06.1345924Z auto tmp69 = (tmp68 != tmp68) ? tmp68 : std::max(0, tmp68); 2023-01-11T21:38:06.1346048Z auto tmp70 = in_ptr0[tmp25 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346170Z auto tmp71 = in_ptr0[tmp28 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346291Z auto tmp72 = in_ptr0[tmp31 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346413Z auto tmp73 = in_ptr0[tmp34 + (32*tmp69) + (2048*i0)]; 2023-01-11T21:38:06.1346507Z auto tmp74 = tmp70 * tmp42; 2023-01-11T21:38:06.1346608Z auto tmp75 = tmp71 * tmp47; 2023-01-11T21:38:06.1346712Z auto tmp76 = tmp72 * tmp53; 2023-01-11T21:38:06.1346811Z auto tmp77 = tmp73 * tmp60; 2023-01-11T21:38:06.1346910Z auto tmp78 = tmp76 + tmp77; 2023-01-11T21:38:06.1347010Z auto tmp79 = tmp75 + tmp78; 2023-01-11T21:38:06.1347107Z auto tmp80 = tmp74 + tmp79; 2023-01-11T21:38:06.1347232Z auto tmp81 = (tmp16 != tmp16) ? tmp16 : std::min(63, tmp16); 2023-01-11T21:38:06.1347365Z auto tmp82 = (tmp81 != tmp81) ? 
tmp81 : std::max(0, tmp81); 2023-01-11T21:38:06.1347485Z auto tmp83 = in_ptr0[tmp25 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347607Z auto tmp84 = in_ptr0[tmp28 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347727Z auto tmp85 = in_ptr0[tmp31 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347851Z auto tmp86 = in_ptr0[tmp34 + (32*tmp82) + (2048*i0)]; 2023-01-11T21:38:06.1347953Z auto tmp87 = tmp83 * tmp42; 2023-01-11T21:38:06.1348050Z auto tmp88 = tmp84 * tmp47; 2023-01-11T21:38:06.1348221Z auto tmp89 = tmp85 * tmp53; 2023-01-11T21:38:06.1348322Z auto tmp90 = tmp86 * tmp60; 2023-01-11T21:38:06.1348424Z auto tmp91 = tmp89 + tmp90; 2023-01-11T21:38:06.1348521Z auto tmp92 = tmp88 + tmp91; 2023-01-11T21:38:06.1348620Z auto tmp93 = tmp87 + tmp92; 2023-01-11T21:38:06.1348751Z auto tmp94 = (tmp17 != tmp17) ? tmp17 : std::min(63, tmp17); 2023-01-11T21:38:06.1348885Z auto tmp95 = (tmp94 != tmp94) ? tmp94 : std::max(0, tmp94); 2023-01-11T21:38:06.1349005Z auto tmp96 = in_ptr0[tmp25 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349122Z auto tmp97 = in_ptr0[tmp28 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349241Z auto tmp98 = in_ptr0[tmp31 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349363Z auto tmp99 = in_ptr0[tmp34 + (32*tmp95) + (2048*i0)]; 2023-01-11T21:38:06.1349469Z auto tmp100 = tmp96 * tmp42; 2023-01-11T21:38:06.1349577Z auto tmp101 = tmp97 * tmp47; 2023-01-11T21:38:06.1349680Z auto tmp102 = tmp98 * tmp53; 2023-01-11T21:38:06.1349785Z auto tmp103 = tmp99 * tmp60; 2023-01-11T21:38:06.1349891Z auto tmp104 = tmp102 + tmp103; 2023-01-11T21:38:06.1349988Z auto tmp105 = tmp101 + tmp104; 2023-01-11T21:38:06.1350092Z auto tmp106 = tmp100 + tmp105; 2023-01-11T21:38:06.1350193Z auto tmp107 = tmp11 + 1.0; 2023-01-11T21:38:06.1350375Z auto tmp108 = -0.75 * tmp107; 2023-01-11T21:38:06.1350529Z auto tmp109 = tmp108 - -3.75; 2023-01-11T21:38:06.1350636Z auto tmp110 = tmp109 * tmp107; 2023-01-11T21:38:06.1350790Z auto tmp111 = tmp110 + -6.0; 2023-01-11T21:38:06.1350886Z auto tmp112 = tmp111 * tmp107; 2023-01-11T21:38:06.1351038Z auto tmp113 = tmp112 - -3.0; 2023-01-11T21:38:06.1351138Z auto tmp114 = 1.25 * tmp11; 2023-01-11T21:38:06.1351289Z auto tmp115 = tmp114 - 2.25; 2023-01-11T21:38:06.1351393Z auto tmp116 = tmp115 * tmp11; 2023-01-11T21:38:06.1351497Z auto tmp117 = tmp116 * tmp11; 2023-01-11T21:38:06.1351596Z auto tmp118 = tmp117 + 1.0; 2023-01-11T21:38:06.1351741Z auto tmp119 = 1.0 - tmp11; 2023-01-11T21:38:06.1351838Z auto tmp120 = 1.25 * tmp119; 2023-01-11T21:38:06.1351989Z auto tmp121 = tmp120 - 2.25; 2023-01-11T21:38:06.1352094Z auto tmp122 = tmp121 * tmp119; 2023-01-11T21:38:06.1352200Z auto tmp123 = tmp122 * tmp119; 2023-01-11T21:38:06.1352301Z auto tmp124 = tmp123 + 1.0; 2023-01-11T21:38:06.1352400Z auto tmp125 = tmp119 + 1.0; 2023-01-11T21:38:06.1352552Z auto tmp126 = -0.75 * tmp125; 2023-01-11T21:38:06.1352696Z auto tmp127 = tmp126 - -3.75; 2023-01-11T21:38:06.1352800Z auto tmp128 = tmp127 * tmp125; 2023-01-11T21:38:06.1352951Z auto tmp129 = tmp128 + -6.0; 2023-01-11T21:38:06.1353055Z auto tmp130 = tmp129 * tmp125; 2023-01-11T21:38:06.1353208Z auto tmp131 = tmp130 - -3.0; 2023-01-11T21:38:06.1353314Z auto tmp132 = tmp67 * tmp113; 2023-01-11T21:38:06.1353415Z auto tmp133 = tmp80 * tmp118; 2023-01-11T21:38:06.1353519Z auto tmp134 = tmp93 * tmp124; 2023-01-11T21:38:06.1353644Z auto tmp135 = tmp106 * tmp131; 2023-01-11T21:38:06.1353751Z auto tmp136 = tmp134 + tmp135; 2023-01-11T21:38:06.1353853Z auto tmp137 = tmp133 + tmp136; 2023-01-11T21:38:06.1353956Z auto tmp138 = tmp132 + 
tmp137; 2023-01-11T21:38:06.1354067Z out_ptr1[i2 + (256*i1) + (32768*i0)] = tmp138; 2023-01-11T21:38:06.1354140Z } 2023-01-11T21:38:06.1354211Z } 2023-01-11T21:38:06.1354273Z } 2023-01-11T21:38:06.1354341Z } 2023-01-11T21:38:06.1354409Z } 2023-01-11T21:38:06.1354482Z } 2023-01-11T21:38:06.1354550Z } 2023-01-11T21:38:06.1354633Z ''') 2023-01-11T21:38:06.1354639Z 2023-01-11T21:38:06.1354643Z 2023-01-11T21:38:06.1354739Z async_compile.wait(globals()) 2023-01-11T21:38:06.1354809Z del async_compile 2023-01-11T21:38:06.1354814Z 2023-01-11T21:38:06.1354888Z def call(args): 2023-01-11T21:38:06.1354963Z arg0_1, = args 2023-01-11T21:38:06.1355041Z args.clear() 2023-01-11T21:38:06.1355287Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1355545Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1355714Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1355788Z del arg0_1 2023-01-11T21:38:06.1355862Z return (buf0, buf1, ) 2023-01-11T21:38:06.1355868Z 2023-01-11T21:38:06.1355872Z 2023-01-11T21:38:06.1355952Z if __name__ == "__main__": 2023-01-11T21:38:06.1356111Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1356240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1356463Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1356580Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1356847Z [2023-01-11 21:32:51,050] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 314 2023-01-11T21:38:06.1356854Z 2023-01-11T21:38:06.1356925Z ok (3.538s) 2023-01-11T21:38:06.1357394Z test_upsample_bilinear2d_a_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
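The long bicubic kernel above is a fully inlined 2D cubic convolution: for each output pixel it computes four clamped neighbor rows and columns (the min/max ladders written NaN-safely as (t != t) ? t : std::min(...)) and blends the 16 taps with Keys cubic-convolution weights. All the magic constants come from one kernel parameter, a = -0.75, expanded by Horner's rule: 1.25 = a + 2 and 2.25 = a + 3 for the inner taps, 3.75 = -5a, 6.0 = -8a and 3.0 = -4a for the outer ones. The first loop nest maps indices as i * (in - 1) / (out - 1) (align_corners=True, 0.2440944... = 31/127) and the second as (i + 0.5) * in / out - 0.5 (align_corners=False, 0.125 = 32/256). A small reference implementation of the weight function; the name cubic_weight is mine, not the log's:

    def cubic_weight(s: float, a: float = -0.75) -> float:
        # Keys (1981) cubic convolution kernel, as inlined by the codegen:
        #   |s| <= 1    : (a+2)|s|^3 - (a+3)|s|^2 + 1
        #   1 < |s| < 2 : a|s|^3 - 5a|s|^2 + 8a|s| - 4a
        s = abs(s)
        if s <= 1.0:
            return ((a + 2.0) * s - (a + 3.0)) * s * s + 1.0
        if s < 2.0:
            return ((a * s - 5.0 * a) * s + 8.0 * a) * s - 4.0 * a
        return 0.0

At each fractional source offset t in [0, 1) the four taps get weights cubic_weight(t + 1), cubic_weight(t), cubic_weight(1 - t) and cubic_weight(2 - t), which is exactly the tmp32..tmp56 block repeated once per axis.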
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1357523Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1357779Z [2023-01-11 21:32:51,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 315 2023-01-11T21:38:06.1357785Z 2023-01-11T21:38:06.1357883Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1357963Z import torch 2023-01-11T21:38:06.1358038Z import random 2023-01-11T21:38:06.1358160Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1358283Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1358288Z 2023-01-11T21:38:06.1358370Z aten = torch.ops.aten 2023-01-11T21:38:06.1358500Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1358598Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1358603Z 2023-01-11T21:38:06.1358678Z import triton 2023-01-11T21:38:06.1358771Z import triton.language as tl 2023-01-11T21:38:06.1358896Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1359040Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1359046Z 2023-01-11T21:38:06.1359050Z 2023-01-11T21:38:06.1359186Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1359393Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1359533Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1359642Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1359752Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1359858Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1359961Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1360027Z { 2023-01-11T21:38:06.1360118Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.1360201Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:06.1360304Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1360372Z { 2023-01-11T21:38:06.1360458Z #pragma omp for 2023-01-11T21:38:06.1360547Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1360614Z { 2023-01-11T21:38:06.1360702Z #pragma GCC ivdep 2023-01-11T21:38:06.1360787Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1360859Z { 2023-01-11T21:38:06.1360949Z #pragma GCC ivdep 2023-01-11T21:38:06.1361045Z for(long i2=0; i2<45; i2+=1) 2023-01-11T21:38:06.1361114Z { 2023-01-11T21:38:06.1361186Z { 2023-01-11T21:38:06.1361259Z { 2023-01-11T21:38:06.1361368Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1361486Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1361585Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1361709Z auto tmp3 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1361838Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1361983Z auto tmp5 = tmp4 - tmp1; 2023-01-11T21:38:06.1362098Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:06.1362229Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:06.1362345Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1362461Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1362577Z auto tmp10 = static_cast(i2); 2023-01-11T21:38:06.1362682Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:06.1362807Z auto tmp12 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1362912Z auto tmp13 = tmp11 * tmp12; 2023-01-11T21:38:06.1363080Z auto tmp14 = tmp13 - tmp1; 2023-01-11T21:38:06.1363247Z auto tmp15 = (tmp6 != tmp6) ? 
tmp6 : std::max(tmp14, tmp6); 2023-01-11T21:38:06.1363397Z auto tmp16 = std::floor(tmp15); 2023-01-11T21:38:06.1363552Z auto tmp17 = static_cast(tmp16); 2023-01-11T21:38:06.1363712Z auto tmp18 = in_ptr0[tmp17 + (38*tmp9) + (1406*i0)]; 2023-01-11T21:38:06.1363844Z auto tmp19 = static_cast(1.0); 2023-01-11T21:38:06.1363983Z auto tmp20 = static_cast(tmp9); 2023-01-11T21:38:06.1364170Z auto tmp21 = tmp7 - tmp20; 2023-01-11T21:38:06.1364343Z auto tmp22 = tmp19 - tmp21; 2023-01-11T21:38:06.1364467Z auto tmp23 = tmp18 * tmp22; 2023-01-11T21:38:06.1364594Z auto tmp24 = std::ceil(tmp7); 2023-01-11T21:38:06.1364727Z auto tmp25 = static_cast(36.0); 2023-01-11T21:38:06.1364891Z auto tmp26 = (tmp25 != tmp25) ? tmp25 : std::min(tmp24, tmp25); 2023-01-11T21:38:06.1365027Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1365163Z auto tmp28 = in_ptr0[tmp17 + (38*tmp27) + (1406*i0)]; 2023-01-11T21:38:06.1365333Z auto tmp29 = tmp28 * tmp21; 2023-01-11T21:38:06.1365463Z auto tmp30 = tmp23 + tmp29; 2023-01-11T21:38:06.1365594Z out_ptr0[i2 + (45*i1) + (2025*i0)] = tmp30; 2023-01-11T21:38:06.1365701Z } 2023-01-11T21:38:06.1365785Z } 2023-01-11T21:38:06.1365867Z } 2023-01-11T21:38:06.1365948Z } 2023-01-11T21:38:06.1366028Z } 2023-01-11T21:38:06.1366123Z #pragma omp for 2023-01-11T21:38:06.1366219Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1366305Z { 2023-01-11T21:38:06.1366417Z #pragma GCC ivdep 2023-01-11T21:38:06.1366543Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1366636Z { 2023-01-11T21:38:06.1366746Z #pragma GCC ivdep 2023-01-11T21:38:06.1366835Z for(long i2=0; i2<45; i2+=1) 2023-01-11T21:38:06.1366905Z { 2023-01-11T21:38:06.1366978Z { 2023-01-11T21:38:06.1367050Z { 2023-01-11T21:38:06.1367163Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1367276Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1367380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1367494Z auto tmp3 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1367596Z auto tmp4 = tmp2 * tmp3; 2023-01-11T21:38:06.1367771Z auto tmp5 = tmp4 - tmp1; 2023-01-11T21:38:06.1367887Z auto tmp6 = static_cast(0.0); 2023-01-11T21:38:06.1368068Z auto tmp7 = (tmp6 != tmp6) ? tmp6 : std::max(tmp5, tmp6); 2023-01-11T21:38:06.1368180Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1368295Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1368418Z auto tmp10 = static_cast(i2); 2023-01-11T21:38:06.1368518Z auto tmp11 = tmp10 + tmp1; 2023-01-11T21:38:06.1368647Z auto tmp12 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1368752Z auto tmp13 = tmp11 * tmp12; 2023-01-11T21:38:06.1368905Z auto tmp14 = tmp13 - tmp1; 2023-01-11T21:38:06.1369041Z auto tmp15 = (tmp6 != tmp6) ? tmp6 : std::max(tmp14, tmp6); 2023-01-11T21:38:06.1369153Z auto tmp16 = std::ceil(tmp15); 2023-01-11T21:38:06.1369271Z auto tmp17 = static_cast(37.0); 2023-01-11T21:38:06.1369409Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1369523Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1369653Z auto tmp20 = in_ptr0[tmp19 + (38*tmp9) + (1406*i0)]; 2023-01-11T21:38:06.1369772Z auto tmp21 = static_cast(1.0); 2023-01-11T21:38:06.1369891Z auto tmp22 = static_cast(tmp9); 2023-01-11T21:38:06.1370043Z auto tmp23 = tmp7 - tmp22; 2023-01-11T21:38:06.1370196Z auto tmp24 = tmp21 - tmp23; 2023-01-11T21:38:06.1370303Z auto tmp25 = tmp20 * tmp24; 2023-01-11T21:38:06.1370414Z auto tmp26 = std::ceil(tmp7); 2023-01-11T21:38:06.1370522Z auto tmp27 = static_cast(36.0); 2023-01-11T21:38:06.1370661Z auto tmp28 = (tmp27 != tmp27) ? 
tmp27 : std::min(tmp26, tmp27); 2023-01-11T21:38:06.1370780Z auto tmp29 = static_cast(tmp28); 2023-01-11T21:38:06.1370907Z auto tmp30 = in_ptr0[tmp19 + (38*tmp29) + (1406*i0)]; 2023-01-11T21:38:06.1371041Z auto tmp31 = tmp30 * tmp23; 2023-01-11T21:38:06.1371145Z auto tmp32 = tmp25 + tmp31; 2023-01-11T21:38:06.1371259Z out_ptr1[i2 + (45*i1) + (2025*i0)] = tmp32; 2023-01-11T21:38:06.1371335Z } 2023-01-11T21:38:06.1371402Z } 2023-01-11T21:38:06.1371474Z } 2023-01-11T21:38:06.1371544Z } 2023-01-11T21:38:06.1371613Z } 2023-01-11T21:38:06.1371696Z #pragma omp for 2023-01-11T21:38:06.1371785Z for(long i0=0; i0<360; i0+=1) 2023-01-11T21:38:06.1371854Z { 2023-01-11T21:38:06.1371934Z #pragma GCC ivdep 2023-01-11T21:38:06.1372031Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1372100Z { 2023-01-11T21:38:06.1372170Z { 2023-01-11T21:38:06.1372240Z { 2023-01-11T21:38:06.1372355Z auto tmp0 = out_ptr0[i1 + (45*i0)]; 2023-01-11T21:38:06.1372463Z auto tmp16 = out_ptr1[i1 + (45*i0)]; 2023-01-11T21:38:06.1372581Z auto tmp1 = static_cast(1.0); 2023-01-11T21:38:06.1372692Z auto tmp2 = static_cast(i1); 2023-01-11T21:38:06.1372805Z auto tmp3 = static_cast(0.5); 2023-01-11T21:38:06.1372908Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1373032Z auto tmp5 = static_cast(0.8444444444444444); 2023-01-11T21:38:06.1373135Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1373282Z auto tmp7 = tmp6 - tmp3; 2023-01-11T21:38:06.1373423Z auto tmp8 = static_cast(0.0); 2023-01-11T21:38:06.1373560Z auto tmp9 = (tmp8 != tmp8) ? tmp8 : std::max(tmp7, tmp8); 2023-01-11T21:38:06.1373672Z auto tmp10 = std::floor(tmp9); 2023-01-11T21:38:06.1373790Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1373908Z auto tmp12 = static_cast(tmp11); 2023-01-11T21:38:06.1374056Z auto tmp13 = tmp9 - tmp12; 2023-01-11T21:38:06.1374202Z auto tmp14 = tmp1 - tmp13; 2023-01-11T21:38:06.1374302Z auto tmp15 = tmp0 * tmp14; 2023-01-11T21:38:06.1374397Z auto tmp17 = tmp16 * tmp13; 2023-01-11T21:38:06.1374685Z auto tmp18 = tmp15 + tmp17; 2023-01-11T21:38:06.1374832Z in_out_ptr0[i1 + (45*i0)] = tmp18; 2023-01-11T21:38:06.1374909Z } 2023-01-11T21:38:06.1374985Z } 2023-01-11T21:38:06.1375052Z } 2023-01-11T21:38:06.1375111Z } 2023-01-11T21:38:06.1375194Z #pragma omp for 2023-01-11T21:38:06.1375282Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1375362Z { 2023-01-11T21:38:06.1375459Z #pragma GCC ivdep 2023-01-11T21:38:06.1375568Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1375643Z { 2023-01-11T21:38:06.1375724Z #pragma GCC ivdep 2023-01-11T21:38:06.1375818Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1375886Z { 2023-01-11T21:38:06.1375957Z { 2023-01-11T21:38:06.1376030Z { 2023-01-11T21:38:06.1376147Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1376267Z auto tmp1 = static_cast(0.4931506849315068); 2023-01-11T21:38:06.1376361Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1376473Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1376590Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1376701Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1376892Z auto tmp6 = static_cast(0.49333333333333335); 2023-01-11T21:38:06.1376997Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1377107Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1377298Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1377420Z auto tmp10 = in_ptr0[tmp9 + (38*tmp4) + (1406*i0)]; 2023-01-11T21:38:06.1377533Z auto tmp11 = static_cast(1.0); 2023-01-11T21:38:06.1377649Z auto tmp12 = static_cast(tmp4); 2023-01-11T21:38:06.1377809Z auto tmp13 = tmp2 - tmp12; 
2023-01-11T21:38:06.1377963Z auto tmp14 = tmp11 - tmp13; 2023-01-11T21:38:06.1378068Z auto tmp15 = tmp10 * tmp14; 2023-01-11T21:38:06.1378180Z auto tmp16 = std::ceil(tmp2); 2023-01-11T21:38:06.1378292Z auto tmp17 = static_cast(36.0); 2023-01-11T21:38:06.1378423Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1378538Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1378661Z auto tmp20 = in_ptr0[tmp9 + (38*tmp19) + (1406*i0)]; 2023-01-11T21:38:06.1378763Z auto tmp21 = tmp20 * tmp13; 2023-01-11T21:38:06.1378864Z auto tmp22 = tmp15 + tmp21; 2023-01-11T21:38:06.1378981Z auto tmp23 = static_cast(tmp9); 2023-01-11T21:38:06.1379128Z auto tmp24 = tmp7 - tmp23; 2023-01-11T21:38:06.1379325Z auto tmp25 = tmp11 - tmp24; 2023-01-11T21:38:06.1379426Z auto tmp26 = tmp22 * tmp25; 2023-01-11T21:38:06.1379538Z out_ptr2[i2 + (76*i1) + (5624*i0)] = tmp26; 2023-01-11T21:38:06.1379611Z } 2023-01-11T21:38:06.1379685Z } 2023-01-11T21:38:06.1379754Z } 2023-01-11T21:38:06.1379821Z } 2023-01-11T21:38:06.1379880Z } 2023-01-11T21:38:06.1379961Z #pragma omp for 2023-01-11T21:38:06.1380046Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1380113Z { 2023-01-11T21:38:06.1380200Z #pragma GCC ivdep 2023-01-11T21:38:06.1380290Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1380359Z { 2023-01-11T21:38:06.1380438Z #pragma GCC ivdep 2023-01-11T21:38:06.1380532Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1380600Z { 2023-01-11T21:38:06.1380674Z { 2023-01-11T21:38:06.1380746Z { 2023-01-11T21:38:06.1380859Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1380982Z auto tmp1 = static_cast(0.4931506849315068); 2023-01-11T21:38:06.1381078Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1381187Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1381302Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1381413Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1381536Z auto tmp6 = static_cast(0.49333333333333335); 2023-01-11T21:38:06.1381637Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1381746Z auto tmp8 = std::ceil(tmp7); 2023-01-11T21:38:06.1381860Z auto tmp9 = static_cast(37.0); 2023-01-11T21:38:06.1381992Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp8, tmp9); 2023-01-11T21:38:06.1382108Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1382230Z auto tmp12 = in_ptr0[tmp11 + (38*tmp4) + (1406*i0)]; 2023-01-11T21:38:06.1382377Z auto tmp13 = static_cast(1.0); 2023-01-11T21:38:06.1382494Z auto tmp14 = static_cast(tmp4); 2023-01-11T21:38:06.1382647Z auto tmp15 = tmp2 - tmp14; 2023-01-11T21:38:06.1382796Z auto tmp16 = tmp13 - tmp15; 2023-01-11T21:38:06.1382898Z auto tmp17 = tmp12 * tmp16; 2023-01-11T21:38:06.1383001Z auto tmp18 = std::ceil(tmp2); 2023-01-11T21:38:06.1383118Z auto tmp19 = static_cast(36.0); 2023-01-11T21:38:06.1383256Z auto tmp20 = (tmp19 != tmp19) ? 
tmp19 : std::min(tmp18, tmp19); 2023-01-11T21:38:06.1383374Z auto tmp21 = static_cast(tmp20); 2023-01-11T21:38:06.1383499Z auto tmp22 = in_ptr0[tmp11 + (38*tmp21) + (1406*i0)]; 2023-01-11T21:38:06.1383601Z auto tmp23 = tmp22 * tmp15; 2023-01-11T21:38:06.1383709Z auto tmp24 = tmp17 + tmp23; 2023-01-11T21:38:06.1383821Z auto tmp25 = std::floor(tmp7); 2023-01-11T21:38:06.1383928Z auto tmp26 = static_cast(tmp25); 2023-01-11T21:38:06.1384044Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1384194Z auto tmp28 = tmp7 - tmp27; 2023-01-11T21:38:06.1384296Z auto tmp29 = tmp24 * tmp28; 2023-01-11T21:38:06.1384407Z out_ptr3[i2 + (76*i1) + (5624*i0)] = tmp29; 2023-01-11T21:38:06.1384480Z } 2023-01-11T21:38:06.1384581Z } 2023-01-11T21:38:06.1384643Z } 2023-01-11T21:38:06.1384711Z } 2023-01-11T21:38:06.1384778Z } 2023-01-11T21:38:06.1384862Z #pragma omp for 2023-01-11T21:38:06.1384950Z for(long i0=0; i0<5624; i0+=1) 2023-01-11T21:38:06.1385018Z { 2023-01-11T21:38:06.1385159Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1385292Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1385383Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1385482Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1385550Z } 2023-01-11T21:38:06.1385650Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1385744Z for(long i0=44992; i0<44992; i0+=1) 2023-01-11T21:38:06.1385812Z { 2023-01-11T21:38:06.1385896Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.1385989Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1386078Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1386165Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1386232Z } 2023-01-11T21:38:06.1386299Z } 2023-01-11T21:38:06.1386365Z } 2023-01-11T21:38:06.1386443Z ''') 2023-01-11T21:38:06.1386449Z 2023-01-11T21:38:06.1386455Z 2023-01-11T21:38:06.1386550Z async_compile.wait(globals()) 2023-01-11T21:38:06.1386628Z del async_compile 2023-01-11T21:38:06.1386633Z 2023-01-11T21:38:06.1386709Z def call(args): 2023-01-11T21:38:06.1386784Z arg0_1, = args 2023-01-11T21:38:06.1386858Z args.clear() 2023-01-11T21:38:06.1387083Z buf0 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387295Z buf1 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387387Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1387608Z buf3 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387829Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1387920Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.1388171Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(buf5.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.1388247Z del arg0_1 2023-01-11T21:38:06.1388330Z return (buf2, buf5, ) 2023-01-11T21:38:06.1388335Z 2023-01-11T21:38:06.1388339Z 2023-01-11T21:38:06.1388420Z if __name__ == "__main__": 2023-01-11T21:38:06.1388533Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1388660Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1388881Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1388995Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1389262Z [2023-01-11 21:32:53,555] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done 
compiling FORWARDS graph 315 2023-01-11T21:38:06.1389268Z 2023-01-11T21:38:06.1389339Z ok (2.491s) 2023-01-11T21:38:06.1389814Z test_upsample_bilinear2d_b_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1389949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1390203Z [2023-01-11 21:32:53,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 316 2023-01-11T21:38:06.1390460Z [2023-01-11 21:32:55,635] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 316 2023-01-11T21:38:06.1390501Z 2023-01-11T21:38:06.1390593Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1390668Z import torch 2023-01-11T21:38:06.1390743Z import random 2023-01-11T21:38:06.1390865Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1390989Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1390996Z 2023-01-11T21:38:06.1391080Z aten = torch.ops.aten 2023-01-11T21:38:06.1391218Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1391308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1391313Z 2023-01-11T21:38:06.1391388Z import triton 2023-01-11T21:38:06.1391481Z import triton.language as tl 2023-01-11T21:38:06.1391605Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1391746Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1391751Z 2023-01-11T21:38:06.1391756Z 2023-01-11T21:38:06.1391892Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1392104Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1392226Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1392329Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1392438Z float* __restrict__ out_ptr1) 2023-01-11T21:38:06.1392506Z { 2023-01-11T21:38:06.1392600Z auto out_ptr0 = in_out_ptr0; 2023-01-11T21:38:06.1392702Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1392768Z { 2023-01-11T21:38:06.1392863Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1392943Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1393011Z { 2023-01-11T21:38:06.1393102Z for(long i1=0; i1<80; i1+=1) 2023-01-11T21:38:06.1393172Z { 2023-01-11T21:38:06.1393258Z #pragma GCC ivdep 2023-01-11T21:38:06.1393354Z for(long i2=0; i2<118; i2+=1) 2023-01-11T21:38:06.1393419Z { 2023-01-11T21:38:06.1393491Z { 2023-01-11T21:38:06.1393568Z { 2023-01-11T21:38:06.1393689Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1393814Z auto tmp1 = static_cast(0.4936708860759494); 2023-01-11T21:38:06.1393944Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1394058Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1394176Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1394282Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1394406Z auto tmp6 = static_cast(0.49572649572649574); 2023-01-11T21:38:06.1394508Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1394622Z auto tmp8 = std::floor(tmp7); 2023-01-11T21:38:06.1394736Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1394862Z auto tmp10 = in_ptr0[tmp9 + (59*tmp4) + 
(2360*i0)]; 2023-01-11T21:38:06.1394978Z auto tmp11 = static_cast(1.0); 2023-01-11T21:38:06.1395087Z auto tmp12 = static_cast(tmp4); 2023-01-11T21:38:06.1395242Z auto tmp13 = tmp2 - tmp12; 2023-01-11T21:38:06.1395394Z auto tmp14 = tmp11 - tmp13; 2023-01-11T21:38:06.1395498Z auto tmp15 = tmp10 * tmp14; 2023-01-11T21:38:06.1395608Z auto tmp16 = std::ceil(tmp2); 2023-01-11T21:38:06.1395724Z auto tmp17 = static_cast(39.0); 2023-01-11T21:38:06.1395863Z auto tmp18 = (tmp17 != tmp17) ? tmp17 : std::min(tmp16, tmp17); 2023-01-11T21:38:06.1395981Z auto tmp19 = static_cast(tmp18); 2023-01-11T21:38:06.1396098Z auto tmp20 = in_ptr0[tmp9 + (59*tmp19) + (2360*i0)]; 2023-01-11T21:38:06.1396228Z auto tmp21 = tmp20 * tmp13; 2023-01-11T21:38:06.1396329Z auto tmp22 = tmp15 + tmp21; 2023-01-11T21:38:06.1396446Z auto tmp23 = static_cast(tmp9); 2023-01-11T21:38:06.1396598Z auto tmp24 = tmp7 - tmp23; 2023-01-11T21:38:06.1396748Z auto tmp25 = tmp11 - tmp24; 2023-01-11T21:38:06.1396852Z auto tmp26 = tmp22 * tmp25; 2023-01-11T21:38:06.1396965Z out_ptr0[i2 + (118*i1) + (9440*i0)] = tmp26; 2023-01-11T21:38:06.1397032Z } 2023-01-11T21:38:06.1397103Z } 2023-01-11T21:38:06.1397174Z } 2023-01-11T21:38:06.1397242Z } 2023-01-11T21:38:06.1397311Z } 2023-01-11T21:38:06.1397407Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1397500Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1397560Z { 2023-01-11T21:38:06.1397651Z for(long i1=0; i1<80; i1+=1) 2023-01-11T21:38:06.1397719Z { 2023-01-11T21:38:06.1397809Z #pragma GCC ivdep 2023-01-11T21:38:06.1397906Z for(long i2=0; i2<118; i2+=1) 2023-01-11T21:38:06.1397979Z { 2023-01-11T21:38:06.1398043Z { 2023-01-11T21:38:06.1398117Z { 2023-01-11T21:38:06.1398229Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1398352Z auto tmp1 = static_cast(0.4936708860759494); 2023-01-11T21:38:06.1398454Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1398564Z auto tmp3 = std::floor(tmp2); 2023-01-11T21:38:06.1398683Z auto tmp4 = static_cast(tmp3); 2023-01-11T21:38:06.1398796Z auto tmp5 = static_cast(i2); 2023-01-11T21:38:06.1398916Z auto tmp6 = static_cast(0.49572649572649574); 2023-01-11T21:38:06.1399019Z auto tmp7 = tmp5 * tmp6; 2023-01-11T21:38:06.1399130Z auto tmp8 = std::ceil(tmp7); 2023-01-11T21:38:06.1399274Z auto tmp9 = static_cast(58.0); 2023-01-11T21:38:06.1399414Z auto tmp10 = (tmp9 != tmp9) ? tmp9 : std::min(tmp8, tmp9); 2023-01-11T21:38:06.1399530Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1399652Z auto tmp12 = in_ptr0[tmp11 + (59*tmp4) + (2360*i0)]; 2023-01-11T21:38:06.1399771Z auto tmp13 = static_cast(1.0); 2023-01-11T21:38:06.1399879Z auto tmp14 = static_cast(tmp4); 2023-01-11T21:38:06.1400031Z auto tmp15 = tmp2 - tmp14; 2023-01-11T21:38:06.1400181Z auto tmp16 = tmp13 - tmp15; 2023-01-11T21:38:06.1400286Z auto tmp17 = tmp12 * tmp16; 2023-01-11T21:38:06.1400395Z auto tmp18 = std::ceil(tmp2); 2023-01-11T21:38:06.1400511Z auto tmp19 = static_cast(39.0); 2023-01-11T21:38:06.1400653Z auto tmp20 = (tmp19 != tmp19) ? 
tmp19 : std::min(tmp18, tmp19); 2023-01-11T21:38:06.1400767Z auto tmp21 = static_cast(tmp20); 2023-01-11T21:38:06.1400884Z auto tmp22 = in_ptr0[tmp11 + (59*tmp21) + (2360*i0)]; 2023-01-11T21:38:06.1400987Z auto tmp23 = tmp22 * tmp15; 2023-01-11T21:38:06.1401089Z auto tmp24 = tmp17 + tmp23; 2023-01-11T21:38:06.1401202Z auto tmp25 = std::floor(tmp7); 2023-01-11T21:38:06.1401317Z auto tmp26 = static_cast(tmp25); 2023-01-11T21:38:06.1401462Z auto tmp27 = static_cast(tmp26); 2023-01-11T21:38:06.1401611Z auto tmp28 = tmp7 - tmp27; 2023-01-11T21:38:06.1401705Z auto tmp29 = tmp24 * tmp28; 2023-01-11T21:38:06.1401820Z out_ptr1[i2 + (118*i1) + (9440*i0)] = tmp29; 2023-01-11T21:38:06.1401897Z } 2023-01-11T21:38:06.1401968Z } 2023-01-11T21:38:06.1402039Z } 2023-01-11T21:38:06.1402107Z } 2023-01-11T21:38:06.1402176Z } 2023-01-11T21:38:06.1402251Z #pragma omp for 2023-01-11T21:38:06.1402340Z for(long i0=0; i0<2360; i0+=1) 2023-01-11T21:38:06.1402407Z { 2023-01-11T21:38:06.1402547Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1402686Z auto tmp1 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1402777Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1402882Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1402942Z } 2023-01-11T21:38:06.1403045Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1403137Z for(long i0=18880; i0<18880; i0+=1) 2023-01-11T21:38:06.1403204Z { 2023-01-11T21:38:06.1403299Z auto tmp0 = out_ptr0[i0]; 2023-01-11T21:38:06.1403391Z auto tmp1 = out_ptr1[i0]; 2023-01-11T21:38:06.1403480Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1403561Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1403629Z } 2023-01-11T21:38:06.1403697Z } 2023-01-11T21:38:06.1403762Z } 2023-01-11T21:38:06.1403850Z ''') 2023-01-11T21:38:06.1403855Z 2023-01-11T21:38:06.1403860Z 2023-01-11T21:38:06.1403957Z async_compile.wait(globals()) 2023-01-11T21:38:06.1404034Z del async_compile 2023-01-11T21:38:06.1404039Z 2023-01-11T21:38:06.1404115Z def call(args): 2023-01-11T21:38:06.1404182Z arg0_1, = args 2023-01-11T21:38:06.1404257Z args.clear() 2023-01-11T21:38:06.1404490Z buf0 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1404708Z buf1 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1404800Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.1404994Z kernel_cpp_0(c_void_p(buf2.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1405072Z del arg0_1 2023-01-11T21:38:06.1405141Z return (buf2, ) 2023-01-11T21:38:06.1405148Z 2023-01-11T21:38:06.1405154Z 2023-01-11T21:38:06.1405256Z if __name__ == "__main__": 2023-01-11T21:38:06.1405389Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1405526Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1405747Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1405861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1405870Z 2023-01-11T21:38:06.1405943Z ok (2.080s) 2023-01-11T21:38:06.1406415Z test_upsample_nearest1d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
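Both bilinear dumps above decompose the 2D interpolation into separable 1D lerps spread over several loop nests, with buffer reuse in the wrapper (buf2 = buf0; del buf0 # reuse) so the final vertical blend overwrites an intermediate through in_out_ptr0. The index math encodes the two align_corners conventions side by side: test_upsample_bilinear2d_a uses max((i + 0.5) * in/out - 0.5, 0) for its 45x45 output (align_corners=False, 0.8222... = 37/45) but i * (in - 1)/(out - 1) for its 74x76 output (align_corners=True, 0.4931506849315068 = 36/73), while _b uses only the latter (0.4936708860759494 = 39/79, 0.49572649572649574 = 58/117). A hedged eager-mode equivalent of the _b case, matching the shapes built in its call() and __main__ blocks:

    import torch
    import torch.nn.functional as F

    x = torch.rand(1, 2, 40, 59)
    # (40-1)/(80-1) = 39/79 and (59-1)/(118-1) = 58/117 are the per-axis
    # scales hard-coded into kernel_cpp_0 above
    fn = torch.compile(lambda t: F.interpolate(
        t, size=(80, 118), mode="bilinear", align_corners=True))
    y = fn(x)
    assert y.shape == (1, 2, 80, 118)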
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1406548Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1406799Z [2023-01-11 21:32:55,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 317 2023-01-11T21:38:06.1407063Z [2023-01-11 21:32:57,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 317 2023-01-11T21:38:06.1407069Z 2023-01-11T21:38:06.1407166Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1407241Z import torch 2023-01-11T21:38:06.1407350Z import random 2023-01-11T21:38:06.1407471Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1407594Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1407599Z 2023-01-11T21:38:06.1407681Z aten = torch.ops.aten 2023-01-11T21:38:06.1407813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1407911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1407916Z 2023-01-11T21:38:06.1407990Z import triton 2023-01-11T21:38:06.1408084Z import triton.language as tl 2023-01-11T21:38:06.1408211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1408351Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1408356Z 2023-01-11T21:38:06.1408361Z 2023-01-11T21:38:06.1408499Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1408705Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1408825Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1408930Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1409031Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1409130Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1409236Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1409341Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1409408Z { 2023-01-11T21:38:06.1409503Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1409569Z { 2023-01-11T21:38:06.1409653Z #pragma omp for 2023-01-11T21:38:06.1409741Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1409807Z { 2023-01-11T21:38:06.1409893Z #pragma GCC ivdep 2023-01-11T21:38:06.1409987Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1410048Z { 2023-01-11T21:38:06.1410117Z { 2023-01-11T21:38:06.1410189Z { 2023-01-11T21:38:06.1410306Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1410421Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1410521Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1410634Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1410767Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1410871Z out_ptr0[i1 + (74*i0)] = tmp4; 2023-01-11T21:38:06.1410970Z out_ptr1[i1 + (74*i0)] = tmp4; 2023-01-11T21:38:06.1411042Z } 2023-01-11T21:38:06.1411113Z } 2023-01-11T21:38:06.1411181Z } 2023-01-11T21:38:06.1411248Z } 2023-01-11T21:38:06.1411323Z #pragma omp for 2023-01-11T21:38:06.1411412Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1411481Z { 2023-01-11T21:38:06.1411567Z #pragma GCC ivdep 2023-01-11T21:38:06.1411661Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1411728Z { 2023-01-11T21:38:06.1411797Z { 2023-01-11T21:38:06.1411860Z { 2023-01-11T21:38:06.1411972Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1412097Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1412199Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1412312Z auto tmp3 = 
static_cast(tmp2); 2023-01-11T21:38:06.1412423Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1412521Z out_ptr2[i1 + (70*i0)] = tmp4; 2023-01-11T21:38:06.1421406Z } 2023-01-11T21:38:06.1421495Z } 2023-01-11T21:38:06.1421567Z } 2023-01-11T21:38:06.1421633Z } 2023-01-11T21:38:06.1421724Z #pragma omp for 2023-01-11T21:38:06.1421812Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1422001Z { 2023-01-11T21:38:06.1422094Z #pragma GCC ivdep 2023-01-11T21:38:06.1422191Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1422255Z { 2023-01-11T21:38:06.1422327Z { 2023-01-11T21:38:06.1422402Z { 2023-01-11T21:38:06.1422522Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1422647Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1422750Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1422864Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1422971Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1423074Z out_ptr3[i1 + (45*i0)] = tmp4; 2023-01-11T21:38:06.1423143Z } 2023-01-11T21:38:06.1423219Z } 2023-01-11T21:38:06.1423289Z } 2023-01-11T21:38:06.1423356Z } 2023-01-11T21:38:06.1423441Z #pragma omp for 2023-01-11T21:38:06.1423529Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1423600Z { 2023-01-11T21:38:06.1423687Z #pragma GCC ivdep 2023-01-11T21:38:06.1423780Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:06.1423852Z { 2023-01-11T21:38:06.1423919Z { 2023-01-11T21:38:06.1423991Z { 2023-01-11T21:38:06.1424105Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1424226Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1424325Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1424438Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1424549Z auto tmp4 = in_ptr0[tmp3 + (37*i0)]; 2023-01-11T21:38:06.1424644Z out_ptr4[i1 + (36*i0)] = tmp4; 2023-01-11T21:38:06.1424716Z } 2023-01-11T21:38:06.1424789Z } 2023-01-11T21:38:06.1424860Z } 2023-01-11T21:38:06.1424931Z } 2023-01-11T21:38:06.1425001Z } 2023-01-11T21:38:06.1425062Z } 2023-01-11T21:38:06.1425171Z ''') 2023-01-11T21:38:06.1425177Z 2023-01-11T21:38:06.1425182Z 2023-01-11T21:38:06.1425280Z async_compile.wait(globals()) 2023-01-11T21:38:06.1425401Z del async_compile 2023-01-11T21:38:06.1425407Z 2023-01-11T21:38:06.1425485Z def call(args): 2023-01-11T21:38:06.1425562Z arg0_1, = args 2023-01-11T21:38:06.1425640Z args.clear() 2023-01-11T21:38:06.1425861Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426060Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426263Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426463Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426666Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1426908Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1426986Z del arg0_1 2023-01-11T21:38:06.1427089Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1427094Z 2023-01-11T21:38:06.1427098Z 2023-01-11T21:38:06.1427179Z if __name__ == "__main__": 2023-01-11T21:38:06.1427290Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1427417Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1427623Z arg0_1 = rand_strided((2, 4, 37), (148, 
37, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1427737Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1427742Z 2023-01-11T21:38:06.1427814Z ok (2.008s) 2023-01-11T21:38:06.1428292Z test_upsample_nearest2d_backward_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1428474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1428735Z [2023-01-11 21:32:57,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 318 2023-01-11T21:38:06.1428741Z 2023-01-11T21:38:06.1428838Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1428914Z import torch 2023-01-11T21:38:06.1428984Z import random 2023-01-11T21:38:06.1429108Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1429231Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1429239Z 2023-01-11T21:38:06.1429322Z aten = torch.ops.aten 2023-01-11T21:38:06.1429459Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1429558Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1429564Z 2023-01-11T21:38:06.1429638Z import triton 2023-01-11T21:38:06.1429724Z import triton.language as tl 2023-01-11T21:38:06.1429854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1429996Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1430001Z 2023-01-11T21:38:06.1430006Z 2023-01-11T21:38:06.1430146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1430354Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1430478Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1430587Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1430693Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1430789Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1430890Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1430988Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1431054Z { 2023-01-11T21:38:06.1431159Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1431253Z { 2023-01-11T21:38:06.1431336Z #pragma omp for 2023-01-11T21:38:06.1431417Z for(long i0=0; i0<27; i0+=1) 2023-01-11T21:38:06.1431484Z { 2023-01-11T21:38:06.1431570Z #pragma GCC ivdep 2023-01-11T21:38:06.1431660Z for(long i1=0; i1<6; i1+=1) 2023-01-11T21:38:06.1431728Z { 2023-01-11T21:38:06.1431797Z { 2023-01-11T21:38:06.1431868Z { 2023-01-11T21:38:06.1431973Z auto tmp0 = in_ptr0[(2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432085Z auto tmp1 = in_ptr0[1 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432199Z auto tmp3 = in_ptr0[12 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432311Z auto tmp5 = in_ptr0[13 + (2*i1) + (24*i0)]; 2023-01-11T21:38:06.1432414Z auto tmp2 = tmp1 + tmp0; 2023-01-11T21:38:06.1432511Z auto tmp4 = tmp3 + tmp2; 2023-01-11T21:38:06.1432611Z auto tmp6 = tmp5 + tmp4; 2023-01-11T21:38:06.1432720Z auto tmp7 = static_cast(1.0); 2023-01-11T21:38:06.1432817Z auto tmp8 = tmp6 * tmp7; 2023-01-11T21:38:06.1432923Z out_ptr0[i1 + (6*i0)] = tmp8; 2023-01-11T21:38:06.1432995Z } 2023-01-11T21:38:06.1433063Z } 2023-01-11T21:38:06.1433131Z } 
2023-01-11T21:38:06.1433198Z } 2023-01-11T21:38:06.1433287Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1433375Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1433441Z { 2023-01-11T21:38:06.1433558Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.1433631Z { 2023-01-11T21:38:06.1433721Z #pragma GCC ivdep 2023-01-11T21:38:06.1433814Z for(long i2=0; i2<5; i2+=1) 2023-01-11T21:38:06.1433877Z { 2023-01-11T21:38:06.1433947Z { 2023-01-11T21:38:06.1434023Z { 2023-01-11T21:38:06.1434147Z auto tmp0 = static_cast(((3 + (6*i1)) / 4)); 2023-01-11T21:38:06.1434270Z auto tmp1 = static_cast(((9 + (6*i1)) / 4)); 2023-01-11T21:38:06.1434370Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1434492Z auto tmp3 = static_cast(((4 + (12*i2)) / 5)); 2023-01-11T21:38:06.1434609Z auto tmp4 = static_cast(((16 + (12*i2)) / 5)); 2023-01-11T21:38:06.1434712Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1434811Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1434907Z float tmp7 = 0.0; 2023-01-11T21:38:06.1434996Z if(tmp6) 2023-01-11T21:38:06.1435116Z { 2023-01-11T21:38:06.1435312Z auto tmp8 = in_ptr0[(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1435435Z tmp7 = tmp8; 2023-01-11T21:38:06.1435521Z } 2023-01-11T21:38:06.1435671Z auto tmp9 = static_cast(1 + (((4 + (12*i2)) / 5))); 2023-01-11T21:38:06.1435792Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1435918Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1436041Z float tmp12 = 0.0; 2023-01-11T21:38:06.1436148Z if(tmp11) 2023-01-11T21:38:06.1436246Z { 2023-01-11T21:38:06.1438893Z auto tmp13 = in_ptr0[1 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1439079Z tmp12 = tmp13; 2023-01-11T21:38:06.1439159Z } 2023-01-11T21:38:06.1439267Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1439454Z auto tmp15 = static_cast(2 + (((4 + (12*i2)) / 5))); 2023-01-11T21:38:06.1439561Z auto tmp16 = tmp15 < tmp4; 2023-01-11T21:38:06.1439666Z auto tmp17 = tmp2 & tmp16; 2023-01-11T21:38:06.1439757Z float tmp18 = 0.0; 2023-01-11T21:38:06.1439850Z if(tmp17) 2023-01-11T21:38:06.1439926Z { 2023-01-11T21:38:06.1440065Z auto tmp19 = in_ptr0[2 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1440162Z tmp18 = tmp19; 2023-01-11T21:38:06.1440242Z } 2023-01-11T21:38:06.1440347Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1440477Z auto tmp21 = static_cast(1 + (((3 + (6*i1)) / 4))); 2023-01-11T21:38:06.1440573Z auto tmp22 = tmp21 < tmp1; 2023-01-11T21:38:06.1440682Z auto tmp23 = tmp22 & tmp5; 2023-01-11T21:38:06.1440779Z float tmp24 = 0.0; 2023-01-11T21:38:06.1440866Z if(tmp23) 2023-01-11T21:38:06.1440943Z { 2023-01-11T21:38:06.1441082Z auto tmp25 = in_ptr0[12 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1441183Z tmp24 = tmp25; 2023-01-11T21:38:06.1441252Z } 2023-01-11T21:38:06.1441361Z auto tmp26 = tmp24 + tmp20; 2023-01-11T21:38:06.1441497Z auto tmp27 = tmp22 & tmp10; 2023-01-11T21:38:06.1441591Z float tmp28 = 0.0; 2023-01-11T21:38:06.1441674Z if(tmp27) 2023-01-11T21:38:06.1441751Z { 2023-01-11T21:38:06.1441894Z auto tmp29 = in_ptr0[13 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1441989Z tmp28 = tmp29; 2023-01-11T21:38:06.1442057Z } 2023-01-11T21:38:06.1442161Z auto tmp30 = tmp28 + tmp26; 2023-01-11T21:38:06.1442263Z auto tmp31 = tmp22 & tmp16; 2023-01-11T21:38:06.1442358Z float tmp32 = 0.0; 2023-01-11T21:38:06.1442443Z if(tmp31) 2023-01-11T21:38:06.1442518Z { 2023-01-11T21:38:06.1442657Z auto tmp33 = in_ptr0[14 + 
(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((4 + (12*i2)) / 5))]; 2023-01-11T21:38:06.1442747Z tmp32 = tmp33; 2023-01-11T21:38:06.1442822Z } 2023-01-11T21:38:06.1442924Z auto tmp34 = tmp32 + tmp30; 2023-01-11T21:38:06.1443042Z out_ptr1[i2 + (5*i1) + (20*i0)] = tmp34; 2023-01-11T21:38:06.1443117Z } 2023-01-11T21:38:06.1443190Z } 2023-01-11T21:38:06.1443259Z } 2023-01-11T21:38:06.1443322Z } 2023-01-11T21:38:06.1443390Z } 2023-01-11T21:38:06.1443490Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1443579Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1443651Z { 2023-01-11T21:38:06.1443740Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1443810Z { 2023-01-11T21:38:06.1443892Z #pragma GCC ivdep 2023-01-11T21:38:06.1443987Z for(long i2=0; i2<8; i2+=1) 2023-01-11T21:38:06.1444061Z { 2023-01-11T21:38:06.1444134Z { 2023-01-11T21:38:06.1444207Z { 2023-01-11T21:38:06.1444325Z auto tmp0 = static_cast(3*i1); 2023-01-11T21:38:06.1444445Z auto tmp1 = static_cast(3 + (3*i1)); 2023-01-11T21:38:06.1444572Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1444700Z auto tmp3 = static_cast(((7 + (12*i2)) / 8)); 2023-01-11T21:38:06.1444826Z auto tmp4 = static_cast(((19 + (12*i2)) / 8)); 2023-01-11T21:38:06.1444930Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1445030Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1445128Z float tmp7 = 0.0; 2023-01-11T21:38:06.1445213Z if(tmp6) 2023-01-11T21:38:06.1445283Z { 2023-01-11T21:38:06.1445415Z auto tmp8 = in_ptr0[(36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1445512Z tmp7 = tmp8; 2023-01-11T21:38:06.1445589Z } 2023-01-11T21:38:06.1445715Z auto tmp9 = static_cast(1 + (((7 + (12*i2)) / 8))); 2023-01-11T21:38:06.1445821Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1445925Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1446014Z float tmp12 = 0.0; 2023-01-11T21:38:06.1446098Z if(tmp11) 2023-01-11T21:38:06.1446178Z { 2023-01-11T21:38:06.1446310Z auto tmp13 = in_ptr0[1 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1446405Z tmp12 = tmp13; 2023-01-11T21:38:06.1446482Z } 2023-01-11T21:38:06.1446635Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1446757Z auto tmp15 = static_cast(1 + (3*i1)); 2023-01-11T21:38:06.1446854Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:06.1446957Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:06.1447057Z float tmp18 = 0.0; 2023-01-11T21:38:06.1447146Z if(tmp17) 2023-01-11T21:38:06.1447223Z { 2023-01-11T21:38:06.1447354Z auto tmp19 = in_ptr0[12 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1447447Z tmp18 = tmp19; 2023-01-11T21:38:06.1447515Z } 2023-01-11T21:38:06.1447621Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1447728Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:06.1447824Z float tmp22 = 0.0; 2023-01-11T21:38:06.1447919Z if(tmp21) 2023-01-11T21:38:06.1447997Z { 2023-01-11T21:38:06.1448125Z auto tmp23 = in_ptr0[13 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1448212Z tmp22 = tmp23; 2023-01-11T21:38:06.1448292Z } 2023-01-11T21:38:06.1448397Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:06.1448520Z auto tmp25 = static_cast(2 + (3*i1)); 2023-01-11T21:38:06.1448623Z auto tmp26 = tmp25 < tmp1; 2023-01-11T21:38:06.1448727Z auto tmp27 = tmp26 & tmp5; 2023-01-11T21:38:06.1448823Z float tmp28 = 0.0; 2023-01-11T21:38:06.1448909Z if(tmp27) 2023-01-11T21:38:06.1448977Z { 2023-01-11T21:38:06.1449109Z auto tmp29 = in_ptr0[24 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1449205Z tmp28 = tmp29; 2023-01-11T21:38:06.1449283Z } 2023-01-11T21:38:06.1449390Z auto tmp30 
= tmp28 + tmp24; 2023-01-11T21:38:06.1449540Z auto tmp31 = tmp26 & tmp10; 2023-01-11T21:38:06.1449637Z float tmp32 = 0.0; 2023-01-11T21:38:06.1449714Z if(tmp31) 2023-01-11T21:38:06.1449794Z { 2023-01-11T21:38:06.1449920Z auto tmp33 = in_ptr0[25 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1450013Z tmp32 = tmp33; 2023-01-11T21:38:06.1450091Z } 2023-01-11T21:38:06.1450194Z auto tmp34 = tmp32 + tmp30; 2023-01-11T21:38:06.1450289Z float tmp35 = 0.0; 2023-01-11T21:38:06.1450371Z if(tmp6) 2023-01-11T21:38:06.1450449Z { 2023-01-11T21:38:06.1450580Z auto tmp36 = in_ptr0[(36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1450673Z tmp35 = tmp36; 2023-01-11T21:38:06.1450754Z } 2023-01-11T21:38:06.1450848Z float tmp37 = 0.0; 2023-01-11T21:38:06.1450934Z if(tmp11) 2023-01-11T21:38:06.1451003Z { 2023-01-11T21:38:06.1451131Z auto tmp38 = in_ptr0[1 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1451223Z tmp37 = tmp38; 2023-01-11T21:38:06.1451298Z } 2023-01-11T21:38:06.1451402Z auto tmp39 = tmp37 + tmp35; 2023-01-11T21:38:06.1451498Z float tmp40 = 0.0; 2023-01-11T21:38:06.1451612Z if(tmp17) 2023-01-11T21:38:06.1451690Z { 2023-01-11T21:38:06.1451812Z auto tmp41 = in_ptr0[12 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1451904Z tmp40 = tmp41; 2023-01-11T21:38:06.1451980Z } 2023-01-11T21:38:06.1452085Z auto tmp42 = tmp40 + tmp39; 2023-01-11T21:38:06.1452180Z float tmp43 = 0.0; 2023-01-11T21:38:06.1452267Z if(tmp21) 2023-01-11T21:38:06.1452343Z { 2023-01-11T21:38:06.1452462Z auto tmp44 = in_ptr0[13 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1452556Z tmp43 = tmp44; 2023-01-11T21:38:06.1452632Z } 2023-01-11T21:38:06.1452736Z auto tmp45 = tmp43 + tmp42; 2023-01-11T21:38:06.1452834Z float tmp46 = 0.0; 2023-01-11T21:38:06.1452920Z if(tmp27) 2023-01-11T21:38:06.1452997Z { 2023-01-11T21:38:06.1453120Z auto tmp47 = in_ptr0[24 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1453214Z tmp46 = tmp47; 2023-01-11T21:38:06.1453292Z } 2023-01-11T21:38:06.1453397Z auto tmp48 = tmp46 + tmp45; 2023-01-11T21:38:06.1453491Z float tmp49 = 0.0; 2023-01-11T21:38:06.1453575Z if(tmp31) 2023-01-11T21:38:06.1453652Z { 2023-01-11T21:38:06.1453777Z auto tmp50 = in_ptr0[25 + (36*i1) + (72*i0) + (((7 + (12*i2)) / 8))]; 2023-01-11T21:38:06.1453861Z tmp49 = tmp50; 2023-01-11T21:38:06.1453936Z } 2023-01-11T21:38:06.1454042Z auto tmp51 = tmp49 + tmp48; 2023-01-11T21:38:06.1454161Z out_ptr2[i2 + (8*i1) + (16*i0)] = tmp34; 2023-01-11T21:38:06.1454270Z out_ptr3[i2 + (8*i1) + (16*i0)] = tmp51; 2023-01-11T21:38:06.1454344Z } 2023-01-11T21:38:06.1454417Z } 2023-01-11T21:38:06.1454843Z } 2023-01-11T21:38:06.1454939Z } 2023-01-11T21:38:06.1455007Z } 2023-01-11T21:38:06.1455108Z #pragma omp for collapse(2) 2023-01-11T21:38:06.1455194Z for(long i0=0; i0<9; i0+=1) 2023-01-11T21:38:06.1455264Z { 2023-01-11T21:38:06.1455356Z for(long i1=0; i1<4; i1+=1) 2023-01-11T21:38:06.1455417Z { 2023-01-11T21:38:06.1455504Z #pragma GCC ivdep 2023-01-11T21:38:06.1455599Z for(long i2=0; i2<7; i2+=1) 2023-01-11T21:38:06.1455670Z { 2023-01-11T21:38:06.1455739Z { 2023-01-11T21:38:06.1455815Z { 2023-01-11T21:38:06.1455932Z auto tmp0 = static_cast(((3 + (6*i1)) / 4)); 2023-01-11T21:38:06.1456054Z auto tmp1 = static_cast(((9 + (6*i1)) / 4)); 2023-01-11T21:38:06.1456159Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.1456283Z auto tmp3 = static_cast(((6 + (12*i2)) / 7)); 2023-01-11T21:38:06.1456407Z auto tmp4 = static_cast(((18 + (12*i2)) / 7)); 
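                        // tmp0/tmp1 and tmp3/tmp4 are the half-open row/column windows
                        // [ceil(i*scale), ceil((i+1)*scale)) of grad_output cells that
                        // nearest-mapped from this grad_input cell; the < comparisons
                        // below guard the unrolled taps so out-of-window reads add 0.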
2023-01-11T21:38:06.1456510Z auto tmp5 = tmp3 < tmp4; 2023-01-11T21:38:06.1456608Z auto tmp6 = tmp2 & tmp5; 2023-01-11T21:38:06.1456702Z float tmp7 = 0.0; 2023-01-11T21:38:06.1456777Z if(tmp6) 2023-01-11T21:38:06.1456852Z { 2023-01-11T21:38:06.1456985Z auto tmp8 = in_ptr0[(12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1457117Z tmp7 = tmp8; 2023-01-11T21:38:06.1457271Z } 2023-01-11T21:38:06.1457416Z auto tmp9 = static_cast(1 + (((6 + (12*i2)) / 7))); 2023-01-11T21:38:06.1457521Z auto tmp10 = tmp9 < tmp4; 2023-01-11T21:38:06.1457616Z auto tmp11 = tmp2 & tmp10; 2023-01-11T21:38:06.1457714Z float tmp12 = 0.0; 2023-01-11T21:38:06.1457796Z if(tmp11) 2023-01-11T21:38:06.1457870Z { 2023-01-11T21:38:06.1458006Z auto tmp13 = in_ptr0[1 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1458099Z tmp12 = tmp13; 2023-01-11T21:38:06.1458174Z } 2023-01-11T21:38:06.1458275Z auto tmp14 = tmp12 + tmp7; 2023-01-11T21:38:06.1458398Z auto tmp15 = static_cast(1 + (((3 + (6*i1)) / 4))); 2023-01-11T21:38:06.1458501Z auto tmp16 = tmp15 < tmp1; 2023-01-11T21:38:06.1458603Z auto tmp17 = tmp16 & tmp5; 2023-01-11T21:38:06.1458697Z float tmp18 = 0.0; 2023-01-11T21:38:06.1458784Z if(tmp17) 2023-01-11T21:38:06.1458858Z { 2023-01-11T21:38:06.1458997Z auto tmp19 = in_ptr0[12 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1459082Z tmp18 = tmp19; 2023-01-11T21:38:06.1459157Z } 2023-01-11T21:38:06.1459260Z auto tmp20 = tmp18 + tmp14; 2023-01-11T21:38:06.1459363Z auto tmp21 = tmp16 & tmp10; 2023-01-11T21:38:06.1459456Z float tmp22 = 0.0; 2023-01-11T21:38:06.1459542Z if(tmp21) 2023-01-11T21:38:06.1459617Z { 2023-01-11T21:38:06.1459753Z auto tmp23 = in_ptr0[13 + (12*(((3 + (6*i1)) / 4))) + (72*i0) + (((6 + (12*i2)) / 7))]; 2023-01-11T21:38:06.1459837Z tmp22 = tmp23; 2023-01-11T21:38:06.1459943Z } 2023-01-11T21:38:06.1460046Z auto tmp24 = tmp22 + tmp20; 2023-01-11T21:38:06.1460162Z out_ptr4[i2 + (7*i1) + (28*i0)] = tmp24; 2023-01-11T21:38:06.1460235Z } 2023-01-11T21:38:06.1460307Z } 2023-01-11T21:38:06.1460376Z } 2023-01-11T21:38:06.1460436Z } 2023-01-11T21:38:06.1460502Z } 2023-01-11T21:38:06.1460568Z } 2023-01-11T21:38:06.1460632Z } 2023-01-11T21:38:06.1460767Z ''') 2023-01-11T21:38:06.1460774Z 2023-01-11T21:38:06.1460779Z 2023-01-11T21:38:06.1460876Z async_compile.wait(globals()) 2023-01-11T21:38:06.1460958Z del async_compile 2023-01-11T21:38:06.1460963Z 2023-01-11T21:38:06.1461032Z def call(args): 2023-01-11T21:38:06.1461107Z arg0_1, = args 2023-01-11T21:38:06.1461182Z args.clear() 2023-01-11T21:38:06.1461406Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1461622Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1461829Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462035Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462230Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1462470Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr())) 2023-01-11T21:38:06.1462575Z del arg0_1 2023-01-11T21:38:06.1462679Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1462684Z 2023-01-11T21:38:06.1462689Z 
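# buf0..buf4 are grad_inputs of nearest-neighbor upsampling to (6, 12) for the
# five requested source sizes; each cell sums the (6, 12) grad_output window
# that nearest-mapped from it (buf2 and buf3 repeat the (2, 8) case).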
2023-01-11T21:38:06.1462776Z if __name__ == "__main__": 2023-01-11T21:38:06.1462896Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1463027Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1463243Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1463361Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1463626Z [2023-01-11 21:33:00,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 318 2023-01-11T21:38:06.1463632Z 2023-01-11T21:38:06.1463708Z ok (2.379s) 2023-01-11T21:38:06.1464184Z test_upsample_nearest2d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1464325Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1464592Z [2023-01-11 21:33:00,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 319 2023-01-11T21:38:06.1464862Z [2023-01-11 21:33:02,677] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 319 2023-01-11T21:38:06.1464868Z 2023-01-11T21:38:06.1464968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1465045Z import torch 2023-01-11T21:38:06.1465133Z import random 2023-01-11T21:38:06.1465266Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1465419Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1465424Z 2023-01-11T21:38:06.1465516Z aten = torch.ops.aten 2023-01-11T21:38:06.1465660Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1465760Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1465765Z 2023-01-11T21:38:06.1465844Z import triton 2023-01-11T21:38:06.1465939Z import triton.language as tl 2023-01-11T21:38:06.1466099Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1466235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1466241Z 2023-01-11T21:38:06.1466254Z 2023-01-11T21:38:06.1466387Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1466599Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1466726Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1466835Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1466943Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1467046Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1467152Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1467247Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1467315Z { 2023-01-11T21:38:06.1467419Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1467490Z { 2023-01-11T21:38:06.1467577Z #pragma omp for 2023-01-11T21:38:06.1467669Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1467740Z { 2023-01-11T21:38:06.1467822Z #pragma GCC ivdep 2023-01-11T21:38:06.1467915Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1467985Z { 2023-01-11T21:38:06.1468076Z #pragma GCC ivdep 2023-01-11T21:38:06.1468176Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1468248Z { 2023-01-11T21:38:06.1468315Z { 2023-01-11T21:38:06.1468392Z { 
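                        // Nearest-neighbor gather: each output coordinate is scaled by
                        // in_size/out_size (0.5 for both axes of this 74x76 <- 37x38 case)
                        // and truncated toward zero to select the source pixel.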
2023-01-11T21:38:06.1468513Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1468662Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1468769Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1468886Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1469005Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1469108Z auto tmp5 = tmp4 * tmp1; 2023-01-11T21:38:06.1469215Z auto tmp6 = static_cast(tmp5); 2023-01-11T21:38:06.1469343Z auto tmp7 = in_ptr0[tmp6 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1469458Z out_ptr0[i2 + (76*i1) + (5624*i0)] = tmp7; 2023-01-11T21:38:06.1469573Z out_ptr1[i2 + (76*i1) + (5624*i0)] = tmp7; 2023-01-11T21:38:06.1469648Z } 2023-01-11T21:38:06.1469721Z } 2023-01-11T21:38:06.1469793Z } 2023-01-11T21:38:06.1469859Z } 2023-01-11T21:38:06.1469928Z } 2023-01-11T21:38:06.1470012Z #pragma omp for 2023-01-11T21:38:06.1470103Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1470172Z { 2023-01-11T21:38:06.1470262Z #pragma GCC ivdep 2023-01-11T21:38:06.1470356Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1470420Z { 2023-01-11T21:38:06.1470511Z #pragma GCC ivdep 2023-01-11T21:38:06.1470609Z for(long i2=0; i2<75; i2+=1) 2023-01-11T21:38:06.1470680Z { 2023-01-11T21:38:06.1470755Z { 2023-01-11T21:38:06.1470829Z { 2023-01-11T21:38:06.1470939Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1471066Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1471173Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1471288Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1471408Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1471537Z auto tmp5 = static_cast(0.5066666666666667); 2023-01-11T21:38:06.1471644Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1471787Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1471908Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1472023Z out_ptr2[i2 + (75*i1) + (5250*i0)] = tmp8; 2023-01-11T21:38:06.1472098Z } 2023-01-11T21:38:06.1472174Z } 2023-01-11T21:38:06.1472246Z } 2023-01-11T21:38:06.1472317Z } 2023-01-11T21:38:06.1472385Z } 2023-01-11T21:38:06.1472463Z #pragma omp for 2023-01-11T21:38:06.1472554Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1472627Z { 2023-01-11T21:38:06.1472715Z #pragma GCC ivdep 2023-01-11T21:38:06.1472807Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1472879Z { 2023-01-11T21:38:06.1472969Z #pragma GCC ivdep 2023-01-11T21:38:06.1473059Z for(long i2=0; i2<74; i2+=1) 2023-01-11T21:38:06.1473133Z { 2023-01-11T21:38:06.1473206Z { 2023-01-11T21:38:06.1473281Z { 2023-01-11T21:38:06.1473396Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1473519Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1473623Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1473729Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1473845Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1473970Z auto tmp5 = static_cast(0.5135135135135135); 2023-01-11T21:38:06.1474200Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1474311Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1474433Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1474547Z out_ptr3[i2 + (74*i1) + (3330*i0)] = tmp8; 2023-01-11T21:38:06.1474612Z } 2023-01-11T21:38:06.1474683Z } 2023-01-11T21:38:06.1474754Z } 2023-01-11T21:38:06.1474825Z } 2023-01-11T21:38:06.1474894Z } 2023-01-11T21:38:06.1474976Z #pragma omp for 2023-01-11T21:38:06.1475062Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1475128Z { 2023-01-11T21:38:06.1475232Z #pragma GCC ivdep 2023-01-11T21:38:06.1475329Z for(long i1=0; i1<36; i1+=1) 
2023-01-11T21:38:06.1475413Z { 2023-01-11T21:38:06.1475499Z #pragma GCC ivdep 2023-01-11T21:38:06.1475596Z for(long i2=0; i2<39; i2+=1) 2023-01-11T21:38:06.1475666Z { 2023-01-11T21:38:06.1475729Z { 2023-01-11T21:38:06.1475801Z { 2023-01-11T21:38:06.1475914Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1476039Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1476142Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1476254Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1476368Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1476483Z auto tmp5 = static_cast(0.9743589743589743); 2023-01-11T21:38:06.1476586Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1476698Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1476826Z auto tmp8 = in_ptr0[tmp7 + (38*tmp3) + (1406*i0)]; 2023-01-11T21:38:06.1476938Z out_ptr4[i2 + (39*i1) + (1404*i0)] = tmp8; 2023-01-11T21:38:06.1477013Z } 2023-01-11T21:38:06.1477084Z } 2023-01-11T21:38:06.1477146Z } 2023-01-11T21:38:06.1477240Z } 2023-01-11T21:38:06.1477309Z } 2023-01-11T21:38:06.1477378Z } 2023-01-11T21:38:06.1477444Z } 2023-01-11T21:38:06.1477537Z ''') 2023-01-11T21:38:06.1477544Z 2023-01-11T21:38:06.1477548Z 2023-01-11T21:38:06.1477648Z async_compile.wait(globals()) 2023-01-11T21:38:06.1477719Z del async_compile 2023-01-11T21:38:06.1477732Z 2023-01-11T21:38:06.1477800Z def call(args): 2023-01-11T21:38:06.1477874Z arg0_1, = args 2023-01-11T21:38:06.1477949Z args.clear() 2023-01-11T21:38:06.1478181Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478402Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478625Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1478844Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1479056Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1479294Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1479369Z del arg0_1 2023-01-11T21:38:06.1479473Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1479479Z 2023-01-11T21:38:06.1479484Z 2023-01-11T21:38:06.1479566Z if __name__ == "__main__": 2023-01-11T21:38:06.1479684Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1479812Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1480057Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1480171Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1480177Z 2023-01-11T21:38:06.1480241Z ok (2.665s) 2023-01-11T21:38:06.1480707Z test_upsample_nearest3d_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1480841Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1481100Z [2023-01-11 21:33:03,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 320 2023-01-11T21:38:06.1481108Z 2023-01-11T21:38:06.1481207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1481283Z import torch 2023-01-11T21:38:06.1481358Z import random 2023-01-11T21:38:06.1481479Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1481603Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1481608Z 2023-01-11T21:38:06.1481685Z aten = torch.ops.aten 2023-01-11T21:38:06.1481821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1481919Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1481924Z 2023-01-11T21:38:06.1481997Z import triton 2023-01-11T21:38:06.1482091Z import triton.language as tl 2023-01-11T21:38:06.1482216Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1482358Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1482364Z 2023-01-11T21:38:06.1482368Z 2023-01-11T21:38:06.1482505Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1482705Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1482833Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1482940Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1483044Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1483172Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1483274Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1483374Z float* __restrict__ out_ptr4) 2023-01-11T21:38:06.1483432Z { 2023-01-11T21:38:06.1483534Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1483602Z { 2023-01-11T21:38:06.1483684Z #pragma omp for 2023-01-11T21:38:06.1483770Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1483838Z { 2023-01-11T21:38:06.1483924Z #pragma GCC ivdep 2023-01-11T21:38:06.1484008Z for(long i1=0; i1<74; i1+=1) 2023-01-11T21:38:06.1484080Z { 2023-01-11T21:38:06.1484167Z #pragma GCC ivdep 2023-01-11T21:38:06.1484262Z for(long i2=0; i2<76; i2+=1) 2023-01-11T21:38:06.1484332Z { 2023-01-11T21:38:06.1484420Z #pragma GCC ivdep 2023-01-11T21:38:06.1484518Z for(long i3=0; i3<78; i3+=1) 2023-01-11T21:38:06.1484585Z { 2023-01-11T21:38:06.1484662Z { 2023-01-11T21:38:06.1484738Z { 2023-01-11T21:38:06.1484857Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1484978Z auto tmp1 = static_cast(0.5); 2023-01-11T21:38:06.1485083Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1485203Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1485310Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1485412Z auto tmp5 = tmp4 * tmp1; 2023-01-11T21:38:06.1485558Z auto tmp6 = static_cast(tmp5); 2023-01-11T21:38:06.1485674Z auto tmp7 = static_cast(i3); 2023-01-11T21:38:06.1485778Z auto tmp8 = tmp7 * tmp1; 2023-01-11T21:38:06.1485896Z auto tmp9 = static_cast(tmp8); 2023-01-11T21:38:06.1486036Z auto tmp10 = in_ptr0[tmp9 + (39*tmp6) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1486157Z out_ptr0[i3 + (78*i2) + (5928*i1) + (438672*i0)] = tmp10; 2023-01-11T21:38:06.1486270Z out_ptr1[i3 + (78*i2) + (5928*i1) + (438672*i0)] = tmp10; 2023-01-11T21:38:06.1486349Z } 2023-01-11T21:38:06.1486422Z } 2023-01-11T21:38:06.1486493Z } 2023-01-11T21:38:06.1486563Z } 
2023-01-11T21:38:06.1486633Z } 2023-01-11T21:38:06.1486700Z } 2023-01-11T21:38:06.1486775Z #pragma omp for 2023-01-11T21:38:06.1486861Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1486929Z { 2023-01-11T21:38:06.1487013Z #pragma GCC ivdep 2023-01-11T21:38:06.1487103Z for(long i1=0; i1<70; i1+=1) 2023-01-11T21:38:06.1487176Z { 2023-01-11T21:38:06.1487256Z #pragma GCC ivdep 2023-01-11T21:38:06.1487352Z for(long i2=0; i2<75; i2+=1) 2023-01-11T21:38:06.1487426Z { 2023-01-11T21:38:06.1487514Z #pragma GCC ivdep 2023-01-11T21:38:06.1487613Z for(long i3=0; i3<80; i3+=1) 2023-01-11T21:38:06.1487684Z { 2023-01-11T21:38:06.1487757Z { 2023-01-11T21:38:06.1487824Z { 2023-01-11T21:38:06.1487940Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1488065Z auto tmp1 = static_cast(0.5285714285714286); 2023-01-11T21:38:06.1488174Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1488290Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1488403Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1488553Z auto tmp5 = static_cast(0.5066666666666667); 2023-01-11T21:38:06.1488659Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1488766Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1488878Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1488999Z auto tmp9 = static_cast(0.4875); 2023-01-11T21:38:06.1489106Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1489226Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1489365Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1489483Z out_ptr2[i3 + (80*i2) + (6000*i1) + (420000*i0)] = tmp12; 2023-01-11T21:38:06.1489558Z } 2023-01-11T21:38:06.1489626Z } 2023-01-11T21:38:06.1489700Z } 2023-01-11T21:38:06.1489770Z } 2023-01-11T21:38:06.1489837Z } 2023-01-11T21:38:06.1489904Z } 2023-01-11T21:38:06.1489986Z #pragma omp for 2023-01-11T21:38:06.1490066Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1490133Z { 2023-01-11T21:38:06.1490217Z #pragma GCC ivdep 2023-01-11T21:38:06.1490306Z for(long i1=0; i1<45; i1+=1) 2023-01-11T21:38:06.1490376Z { 2023-01-11T21:38:06.1490463Z #pragma GCC ivdep 2023-01-11T21:38:06.1490557Z for(long i2=0; i2<74; i2+=1) 2023-01-11T21:38:06.1490649Z { 2023-01-11T21:38:06.1490739Z #pragma GCC ivdep 2023-01-11T21:38:06.1490836Z for(long i3=0; i3<103; i3+=1) 2023-01-11T21:38:06.1490907Z { 2023-01-11T21:38:06.1490980Z { 2023-01-11T21:38:06.1491054Z { 2023-01-11T21:38:06.1491174Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1491290Z auto tmp1 = static_cast(0.8222222222222222); 2023-01-11T21:38:06.1491394Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1491510Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1491623Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1491747Z auto tmp5 = static_cast(0.5135135135135135); 2023-01-11T21:38:06.1491852Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1491969Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1492081Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1492196Z auto tmp9 = static_cast(0.3786407766990291); 2023-01-11T21:38:06.1492306Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1492430Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1492567Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1492690Z out_ptr3[i3 + (103*i2) + (7622*i1) + (342990*i0)] = tmp12; 2023-01-11T21:38:06.1492766Z } 2023-01-11T21:38:06.1492840Z } 2023-01-11T21:38:06.1492904Z } 2023-01-11T21:38:06.1492973Z } 2023-01-11T21:38:06.1493039Z } 2023-01-11T21:38:06.1493108Z } 2023-01-11T21:38:06.1493191Z #pragma omp for 2023-01-11T21:38:06.1493280Z 
for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1493346Z { 2023-01-11T21:38:06.1493423Z #pragma GCC ivdep 2023-01-11T21:38:06.1493514Z for(long i1=0; i1<36; i1+=1) 2023-01-11T21:38:06.1493583Z { 2023-01-11T21:38:06.1493696Z #pragma GCC ivdep 2023-01-11T21:38:06.1493793Z for(long i2=0; i2<39; i2+=1) 2023-01-11T21:38:06.1493865Z { 2023-01-11T21:38:06.1493956Z #pragma GCC ivdep 2023-01-11T21:38:06.1494046Z for(long i3=0; i3<40; i3+=1) 2023-01-11T21:38:06.1494117Z { 2023-01-11T21:38:06.1494189Z { 2023-01-11T21:38:06.1494265Z { 2023-01-11T21:38:06.1494383Z auto tmp0 = static_cast(i1); 2023-01-11T21:38:06.1494811Z auto tmp1 = static_cast(1.0277777777777777); 2023-01-11T21:38:06.1494960Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1495069Z auto tmp3 = static_cast(tmp2); 2023-01-11T21:38:06.1495180Z auto tmp4 = static_cast(i2); 2023-01-11T21:38:06.1495307Z auto tmp5 = static_cast(0.9743589743589743); 2023-01-11T21:38:06.1495423Z auto tmp6 = tmp4 * tmp5; 2023-01-11T21:38:06.1495552Z auto tmp7 = static_cast(tmp6); 2023-01-11T21:38:06.1495688Z auto tmp8 = static_cast(i3); 2023-01-11T21:38:06.1495806Z auto tmp9 = static_cast(0.975); 2023-01-11T21:38:06.1495911Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1496022Z auto tmp11 = static_cast(tmp10); 2023-01-11T21:38:06.1496248Z auto tmp12 = in_ptr0[tmp11 + (39*tmp7) + (1482*tmp3) + (54834*i0)]; 2023-01-11T21:38:06.1496370Z out_ptr4[i3 + (40*i2) + (1560*i1) + (56160*i0)] = tmp12; 2023-01-11T21:38:06.1496448Z } 2023-01-11T21:38:06.1496524Z } 2023-01-11T21:38:06.1496602Z } 2023-01-11T21:38:06.1496675Z } 2023-01-11T21:38:06.1496738Z } 2023-01-11T21:38:06.1496806Z } 2023-01-11T21:38:06.1496875Z } 2023-01-11T21:38:06.1496941Z } 2023-01-11T21:38:06.1497039Z ''') 2023-01-11T21:38:06.1497051Z 2023-01-11T21:38:06.1497055Z 2023-01-11T21:38:06.1497228Z async_compile.wait(globals()) 2023-01-11T21:38:06.1497324Z del async_compile 2023-01-11T21:38:06.1497330Z 2023-01-11T21:38:06.1497411Z def call(args): 2023-01-11T21:38:06.1497505Z arg0_1, = args 2023-01-11T21:38:06.1497591Z args.clear() 2023-01-11T21:38:06.1497837Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498079Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498313Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498559Z buf2 = empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1498792Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1499021Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.1499094Z del arg0_1 2023-01-11T21:38:06.1499195Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.1499201Z 2023-01-11T21:38:06.1499205Z 2023-01-11T21:38:06.1499290Z if __name__ == "__main__": 2023-01-11T21:38:06.1499409Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1499538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1499773Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1499933Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1500200Z 
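A minimal sketch of how codegen like the kernel above can be reproduced outside the harness (assumes a torch build with inductor available; the function name, target sizes, and use of torch.compile are illustrative, not the test's own code):

import torch
import torch.nn.functional as F

def upsample_variants(x):
    # Target sizes taken from the out_ptr shapes in the kernel above
    return (
        F.interpolate(x, size=(74, 76, 78), mode="nearest"),
        F.interpolate(x, size=(70, 75, 80), mode="nearest"),
        F.interpolate(x, size=(45, 74, 103), mode="nearest"),
        F.interpolate(x, size=(36, 39, 40), mode="nearest"),
    )

x = torch.randn(2, 4, 37, 38, 39)  # matches the rand_strided input above
compiled = torch.compile(upsample_variants)
torch.testing.assert_close(compiled(x), upsample_variants(x))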
[2023-01-11 21:33:05,048] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 320 2023-01-11T21:38:06.1500206Z 2023-01-11T21:38:06.1500270Z ok (2.476s) 2023-01-11T21:38:06.1500724Z test_var_mean_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1500860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1501115Z [2023-01-11 21:33:05,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 321 2023-01-11T21:38:06.1501120Z 2023-01-11T21:38:06.1501220Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1501294Z import torch 2023-01-11T21:38:06.1501368Z import random 2023-01-11T21:38:06.1501489Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1501616Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1501622Z 2023-01-11T21:38:06.1501697Z aten = torch.ops.aten 2023-01-11T21:38:06.1501833Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1501930Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1501936Z 2023-01-11T21:38:06.1502010Z import triton 2023-01-11T21:38:06.1502102Z import triton.language as tl 2023-01-11T21:38:06.1502255Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1502394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1502400Z 2023-01-11T21:38:06.1502404Z 2023-01-11T21:38:06.1502542Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1502743Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1502865Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.1502971Z float* __restrict__ in_out_ptr1, 2023-01-11T21:38:06.1503077Z float* __restrict__ in_out_ptr2, 2023-01-11T21:38:06.1503182Z float* __restrict__ in_out_ptr3, 2023-01-11T21:38:06.1503291Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1503396Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1503490Z float* __restrict__ out_ptr3) 2023-01-11T21:38:06.1503558Z { 2023-01-11T21:38:06.1503654Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.1503743Z auto out_ptr2 = in_out_ptr1; 2023-01-11T21:38:06.1503831Z auto out_ptr4 = in_out_ptr2; 2023-01-11T21:38:06.1503918Z auto out_ptr5 = in_out_ptr3; 2023-01-11T21:38:06.1504020Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1504079Z { 2023-01-11T21:38:06.1504162Z #pragma omp for 2023-01-11T21:38:06.1504250Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1504316Z { 2023-01-11T21:38:06.1504386Z { 2023-01-11T21:38:06.1504611Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1504696Z float tmp1 = 0; 2023-01-11T21:38:06.1504816Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1504910Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1504980Z { 2023-01-11T21:38:06.1505128Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1505223Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1505310Z } 2023-01-11T21:38:06.1505540Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, 
at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1505699Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1505789Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.1505858Z { 2023-01-11T21:38:06.1505963Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1506047Z tmp1 += tmp0; 2023-01-11T21:38:06.1506116Z } 2023-01-11T21:38:06.1506202Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1506269Z } 2023-01-11T21:38:06.1506329Z } 2023-01-11T21:38:06.1506412Z #pragma omp for 2023-01-11T21:38:06.1506497Z for(long i0=0; i0<8; i0+=1) 2023-01-11T21:38:06.1506566Z { 2023-01-11T21:38:06.1506633Z { 2023-01-11T21:38:06.1506824Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1506909Z float tmp6 = 0; 2023-01-11T21:38:06.1507030Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1507114Z float tmp7 = 0; 2023-01-11T21:38:06.1507238Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1507334Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.1507404Z { 2023-01-11T21:38:06.1507553Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i1)); 2023-01-11T21:38:06.1507686Z auto tmp1 = at::vec::Vectorized(out_ptr0[i0]); 2023-01-11T21:38:06.1507829Z auto tmp2 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.1507950Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1508094Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1508192Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1508280Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1508367Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1508437Z } 2023-01-11T21:38:06.1508637Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1508834Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1508980Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1509068Z for(long i1=8; i1<8; i1+=1) 2023-01-11T21:38:06.1509136Z { 2023-01-11T21:38:06.1509241Z auto tmp0 = in_ptr0[i1 + (8*i0)]; 2023-01-11T21:38:06.1509341Z auto tmp1 = out_ptr0[i0]; 2023-01-11T21:38:06.1509450Z auto tmp2 = static_cast(8); 2023-01-11T21:38:06.1509546Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1509682Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1509772Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1509856Z tmp6 += tmp5; 2023-01-11T21:38:06.1509939Z tmp7 += tmp0; 2023-01-11T21:38:06.1510008Z } 2023-01-11T21:38:06.1510098Z out_ptr1[i0] = tmp6; 2023-01-11T21:38:06.1510185Z out_ptr2[i0] = tmp7; 2023-01-11T21:38:06.1510253Z } 2023-01-11T21:38:06.1510313Z } 2023-01-11T21:38:06.1510398Z #pragma omp for 2023-01-11T21:38:06.1510484Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1510551Z { 2023-01-11T21:38:06.1510690Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1510833Z auto tmp1 = at::vec::Vectorized(static_cast(7)); 2023-01-11T21:38:06.1510923Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1511017Z tmp2.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.1511085Z } 2023-01-11T21:38:06.1511217Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1511305Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1511372Z { 2023-01-11T21:38:06.1511462Z auto tmp0 = out_ptr1[i0]; 2023-01-11T21:38:06.1511565Z auto tmp1 = static_cast(7); 2023-01-11T21:38:06.1511646Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1511732Z in_out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1511800Z } 
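            // The two loops above finalize the unbiased variance over the last dim:
            // out_ptr1 held sum((x - mean)^2) for 8 elements, so dividing by 7 applies
            // the default correction=1. The next pair of loops divides the running sum
            // in out_ptr2 by 8 to finalize the corresponding mean.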
2023-01-11T21:38:06.1511880Z #pragma omp for 2023-01-11T21:38:06.1511966Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1512032Z { 2023-01-11T21:38:06.1512171Z auto tmp0 = at::vec::Vectorized::loadu(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1512303Z auto tmp1 = at::vec::Vectorized(static_cast(8)); 2023-01-11T21:38:06.1512393Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1512493Z tmp2.store(in_out_ptr1 + 8*i0); 2023-01-11T21:38:06.1512562Z } 2023-01-11T21:38:06.1512663Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1512748Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1512816Z { 2023-01-11T21:38:06.1512899Z auto tmp0 = out_ptr2[i0]; 2023-01-11T21:38:06.1513004Z auto tmp1 = static_cast(8); 2023-01-11T21:38:06.1513092Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1513178Z in_out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1513246Z } 2023-01-11T21:38:06.1513328Z #pragma omp for 2023-01-11T21:38:06.1513413Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1513473Z { 2023-01-11T21:38:06.1513540Z { 2023-01-11T21:38:06.1513777Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1513863Z float tmp1 = 0; 2023-01-11T21:38:06.1513991Z auto tmp1_vec = at::vec::Vectorized(tmp1); 2023-01-11T21:38:06.1514089Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1514158Z { 2023-01-11T21:38:06.1514247Z for(long i2=0; i2<1; i2+=1) 2023-01-11T21:38:06.1514318Z { 2023-01-11T21:38:06.1514476Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i2) + (32*i1)); 2023-01-11T21:38:06.1514568Z tmp1_vec += tmp0; 2023-01-11T21:38:06.1514641Z } 2023-01-11T21:38:06.1514842Z tmp1 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp1_vec); 2023-01-11T21:38:06.1514970Z #pragma omp simd simdlen(4) reduction(+:tmp1) 2023-01-11T21:38:06.1515072Z for(long i2=8; i2<8; i2+=1) 2023-01-11T21:38:06.1515136Z { 2023-01-11T21:38:06.1515248Z auto tmp0 = in_ptr0[i2 + (8*i0) + (32*i1)]; 2023-01-11T21:38:06.1515333Z tmp1 += tmp0; 2023-01-11T21:38:06.1515406Z } 2023-01-11T21:38:06.1515477Z } 2023-01-11T21:38:06.1515565Z out_ptr3[i0] = tmp1; 2023-01-11T21:38:06.1515632Z } 2023-01-11T21:38:06.1515691Z } 2023-01-11T21:38:06.1515774Z #pragma omp for 2023-01-11T21:38:06.1515861Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1515928Z { 2023-01-11T21:38:06.1515994Z { 2023-01-11T21:38:06.1516187Z #pragma omp declare reduction(+:at::vec::Vectorized:omp_out += omp_in) initializer(omp_priv={{0}}) 2023-01-11T21:38:06.1516275Z float tmp6 = 0; 2023-01-11T21:38:06.1516398Z auto tmp6_vec = at::vec::Vectorized(tmp6); 2023-01-11T21:38:06.1516482Z float tmp7 = 0; 2023-01-11T21:38:06.1516608Z auto tmp7_vec = at::vec::Vectorized(tmp7); 2023-01-11T21:38:06.1516700Z for(long i1=0; i1<2; i1+=1) 2023-01-11T21:38:06.1516770Z { 2023-01-11T21:38:06.1516896Z for(long i2=0; i2<1; i2+=1) 2023-01-11T21:38:06.1516971Z { 2023-01-11T21:38:06.1517129Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i0) + (8*i2) + (32*i1)); 2023-01-11T21:38:06.1517258Z auto tmp1 = at::vec::Vectorized(out_ptr3[i0]); 2023-01-11T21:38:06.1517402Z auto tmp2 = at::vec::Vectorized(static_cast(16)); 2023-01-11T21:38:06.1517502Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1517652Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1517755Z auto tmp5 = tmp4.pow(2); 2023-01-11T21:38:06.1517845Z tmp6_vec += tmp5; 2023-01-11T21:38:06.1517935Z tmp7_vec += tmp0; 2023-01-11T21:38:06.1517999Z } 2023-01-11T21:38:06.1518203Z tmp6 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, 
at::vec::Vectorized&y) {return x + y;}, tmp6_vec); 2023-01-11T21:38:06.1518394Z tmp7 = at::vec::vec_reduce_all([](at::vec::Vectorized& x, at::vec::Vectorized&y) {return x + y;}, tmp7_vec); 2023-01-11T21:38:06.1518542Z #pragma omp simd simdlen(4) reduction(+:tmp6) reduction(+:tmp7) 2023-01-11T21:38:06.1518638Z for(long i2=8; i2<8; i2+=1) 2023-01-11T21:38:06.1518709Z { 2023-01-11T21:38:06.1518822Z auto tmp0 = in_ptr0[i2 + (8*i0) + (32*i1)]; 2023-01-11T21:38:06.1518920Z auto tmp1 = out_ptr3[i0]; 2023-01-11T21:38:06.1519066Z auto tmp2 = static_cast(16); 2023-01-11T21:38:06.1519157Z auto tmp3 = tmp1 / tmp2; 2023-01-11T21:38:06.1519298Z auto tmp4 = tmp0 - tmp3; 2023-01-11T21:38:06.1519395Z auto tmp5 = tmp4 * tmp4; 2023-01-11T21:38:06.1519483Z tmp6 += tmp5; 2023-01-11T21:38:06.1519567Z tmp7 += tmp0; 2023-01-11T21:38:06.1519638Z } 2023-01-11T21:38:06.1519706Z } 2023-01-11T21:38:06.1519786Z out_ptr4[i0] = tmp6; 2023-01-11T21:38:06.1519870Z out_ptr5[i0] = tmp7; 2023-01-11T21:38:06.1519940Z } 2023-01-11T21:38:06.1520009Z } 2023-01-11T21:38:06.1520094Z #pragma omp single 2023-01-11T21:38:06.1520161Z { 2023-01-11T21:38:06.1520238Z #pragma GCC ivdep 2023-01-11T21:38:06.1520326Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1520396Z { 2023-01-11T21:38:06.1520465Z { 2023-01-11T21:38:06.1520536Z { 2023-01-11T21:38:06.1520636Z auto tmp0 = out_ptr4[i0]; 2023-01-11T21:38:06.1520749Z auto tmp1 = static_cast(15); 2023-01-11T21:38:06.1520840Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1520938Z in_out_ptr2[i0] = tmp2; 2023-01-11T21:38:06.1521010Z } 2023-01-11T21:38:06.1521078Z } 2023-01-11T21:38:06.1521144Z } 2023-01-11T21:38:06.1521210Z } 2023-01-11T21:38:06.1521293Z #pragma omp single 2023-01-11T21:38:06.1521355Z { 2023-01-11T21:38:06.1521440Z #pragma GCC ivdep 2023-01-11T21:38:06.1521529Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1521596Z { 2023-01-11T21:38:06.1521664Z { 2023-01-11T21:38:06.1521734Z { 2023-01-11T21:38:06.1521832Z auto tmp0 = out_ptr5[i0]; 2023-01-11T21:38:06.1521940Z auto tmp1 = static_cast(16); 2023-01-11T21:38:06.1522039Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.1522133Z in_out_ptr3[i0] = tmp2; 2023-01-11T21:38:06.1522205Z } 2023-01-11T21:38:06.1522301Z } 2023-01-11T21:38:06.1522373Z } 2023-01-11T21:38:06.1522442Z } 2023-01-11T21:38:06.1522503Z } 2023-01-11T21:38:06.1522568Z } 2023-01-11T21:38:06.1522653Z ''') 2023-01-11T21:38:06.1522661Z 2023-01-11T21:38:06.1522665Z 2023-01-11T21:38:06.1522760Z async_compile.wait(globals()) 2023-01-11T21:38:06.1522837Z del async_compile 2023-01-11T21:38:06.1522842Z 2023-01-11T21:38:06.1522920Z def call(args): 2023-01-11T21:38:06.1522994Z arg0_1, = args 2023-01-11T21:38:06.1523062Z args.clear() 2023-01-11T21:38:06.1523273Z buf0 = empty_strided((1, 2, 4, 1), (8, 4, 1, 8), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523479Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523679Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1523771Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.1523857Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.1524065Z buf5 = empty_strided((1, 1, 4, 1), (4, 4, 1, 4), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524248Z buf6 = empty_strided((1, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524441Z buf7 = empty_strided((1, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1524529Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.1524617Z buf9 = buf7; del 
buf7 # reuse 2023-01-11T21:38:06.1524875Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf8.data_ptr()), c_void_p(buf9.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.1524979Z del arg0_1 2023-01-11T21:38:06.1525071Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.1525077Z 2023-01-11T21:38:06.1525081Z 2023-01-11T21:38:06.1525162Z if __name__ == "__main__": 2023-01-11T21:38:06.1525281Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1525402Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1525616Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1525729Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1525997Z [2023-01-11 21:33:06,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 321 2023-01-11T21:38:06.1526003Z 2023-01-11T21:38:06.1526075Z ok (1.831s) 2023-01-11T21:38:06.1526533Z test_vdd_clamp_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1526670Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1526929Z [2023-01-11 21:33:07,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 322 2023-01-11T21:38:06.1527188Z [2023-01-11 21:33:08,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 322 2023-01-11T21:38:06.1527194Z 2023-01-11T21:38:06.1527285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1527360Z import torch 2023-01-11T21:38:06.1527434Z import random 2023-01-11T21:38:06.1527555Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1527679Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1527685Z 2023-01-11T21:38:06.1527767Z aten = torch.ops.aten 2023-01-11T21:38:06.1527908Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1528004Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1528009Z 2023-01-11T21:38:06.1528075Z import triton 2023-01-11T21:38:06.1528169Z import triton.language as tl 2023-01-11T21:38:06.1528295Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1528466Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1528473Z 2023-01-11T21:38:06.1528478Z 2023-01-11T21:38:06.1528616Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1528822Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1528947Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1529053Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1529147Z bool* __restrict__ out_ptr1) 2023-01-11T21:38:06.1529213Z { 2023-01-11T21:38:06.1529315Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1529387Z { 2023-01-11T21:38:06.1529468Z #pragma omp for 2023-01-11T21:38:06.1529557Z for(long i0=0; i0<16; i0+=1) 2023-01-11T21:38:06.1529624Z { 2023-01-11T21:38:06.1529684Z { 2023-01-11T21:38:06.1529752Z { 2023-01-11T21:38:06.1529853Z auto tmp0 = in_ptr0[i0]; 
2023-01-11T21:38:06.1529965Z auto tmp1 = static_cast<float>(3.0); 2023-01-11T21:38:06.1530100Z auto tmp2 = (tmp1 != tmp1) ? tmp1 : std::max(tmp0, tmp1); 2023-01-11T21:38:06.1530213Z auto tmp3 = static_cast<float>(3); 2023-01-11T21:38:06.1530311Z auto tmp4 = tmp0 >= tmp3; 2023-01-11T21:38:06.1530394Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1530484Z out_ptr1[i0] = tmp4; 2023-01-11T21:38:06.1530552Z } 2023-01-11T21:38:06.1530621Z } 2023-01-11T21:38:06.1530688Z } 2023-01-11T21:38:06.1530784Z } 2023-01-11T21:38:06.1530841Z } 2023-01-11T21:38:06.1530925Z ''') 2023-01-11T21:38:06.1530930Z 2023-01-11T21:38:06.1530935Z 2023-01-11T21:38:06.1531032Z async_compile.wait(globals()) 2023-01-11T21:38:06.1531110Z del async_compile 2023-01-11T21:38:06.1531115Z 2023-01-11T21:38:06.1531188Z def call(args): 2023-01-11T21:38:06.1531270Z primals_1, = args 2023-01-11T21:38:06.1531345Z args.clear() 2023-01-11T21:38:06.1531540Z buf0 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1531720Z buf1 = empty_strided((16, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1531896Z kernel_cpp_0(c_void_p(primals_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr())) 2023-01-11T21:38:06.1531973Z del primals_1 2023-01-11T21:38:06.1532057Z return (buf0, buf1, ) 2023-01-11T21:38:06.1532063Z 2023-01-11T21:38:06.1532067Z 2023-01-11T21:38:06.1532149Z if __name__ == "__main__": 2023-01-11T21:38:06.1532267Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1532397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1532595Z primals_1 = rand_strided((16, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1532706Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.1532711Z 2023-01-11T21:38:06.1532784Z ok (1.753s) 2023-01-11T21:38:06.1533244Z test_vertical_fusion1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1533376Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1533633Z [2023-01-11 21:33:08,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 323 2023-01-11T21:38:06.1533899Z [2023-01-11 21:33:10,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 323 2023-01-11T21:38:06.1533904Z 2023-01-11T21:38:06.1534004Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1534080Z import torch 2023-01-11T21:38:06.1534156Z import random 2023-01-11T21:38:06.1534311Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1534438Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1534443Z 2023-01-11T21:38:06.1534735Z aten = torch.ops.aten 2023-01-11T21:38:06.1534920Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1535028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1535034Z 2023-01-11T21:38:06.1535112Z import triton 2023-01-11T21:38:06.1535226Z import triton.language as tl 2023-01-11T21:38:06.1535368Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1535518Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1535528Z 2023-01-11T21:38:06.1535533Z 2023-01-11T21:38:06.1535683Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1535891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1536020Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1536132Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1536242Z const float* __restrict__ in_ptr2, 2023-01-11T21:38:06.1536347Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1536416Z { 2023-01-11T21:38:06.1536514Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1536582Z { 2023-01-11T21:38:06.1536669Z #pragma omp for 2023-01-11T21:38:06.1536761Z for(long i0=0; i0<41616; i0+=1) 2023-01-11T21:38:06.1536830Z { 2023-01-11T21:38:06.1536919Z for(long i1=0; i1<3; i1+=1) 2023-01-11T21:38:06.1537068Z { 2023-01-11T21:38:06.1537293Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1537444Z auto tmp8 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1537586Z auto tmp15 = at::vec::Vectorized<float>::loadu(in_ptr2 + 8*i1); 2023-01-11T21:38:06.1537821Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(-1.061519070296458e-11)); 2023-01-11T21:38:06.1537916Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1538142Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(-1.988366587925593e-08)); 2023-01-11T21:38:06.1538234Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1538325Z auto tmp5 = tmp0 * tmp4; 2023-01-11T21:38:06.1538543Z auto tmp6 = at::vec::Vectorized<float>(static_cast<float>(-3.087032500374211e-07)); 2023-01-11T21:38:06.1538638Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.1538862Z auto tmp9 = at::vec::Vectorized<float>(static_cast<float>(1.55093272922008e-10)); 2023-01-11T21:38:06.1538957Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1539052Z auto tmp11 = tmp7 + tmp10; 2023-01-11T21:38:06.1539160Z auto tmp12 = tmp11.reciprocal(); 2023-01-11T21:38:06.1539307Z auto tmp13 = at::vec::Vectorized<float>(static_cast<float>(1.0)); 2023-01-11T21:38:06.1539404Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:06.1539492Z auto tmp16 = tmp11 * tmp15; 2023-01-11T21:38:06.1539585Z auto tmp17 = tmp14 + tmp16;
2023-01-11T21:38:06.1539695Z tmp17.store(out_ptr0 + (8*i1) + (26*i0)); 2023-01-11T21:38:06.1539763Z } 2023-01-11T21:38:06.1539863Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.1539953Z for(long i1=24; i1<26; i1+=1) 2023-01-11T21:38:06.1540020Z { 2023-01-11T21:38:06.1540118Z auto tmp0 = in_ptr0[i1 + (26*i0)]; 2023-01-11T21:38:06.1540217Z auto tmp8 = in_ptr1[i1 + (26*i0)]; 2023-01-11T21:38:06.1540310Z auto tmp15 = in_ptr2[i1]; 2023-01-11T21:38:06.1540494Z auto tmp1 = static_cast<float>(-1.061519070296458e-11); 2023-01-11T21:38:06.1540636Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1540820Z auto tmp3 = static_cast<float>(-1.988366587925593e-08); 2023-01-11T21:38:06.1540914Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1540998Z auto tmp5 = tmp0 * tmp4; 2023-01-11T21:38:06.1541182Z auto tmp6 = static_cast<float>(-3.087032500374211e-07); 2023-01-11T21:38:06.1541275Z auto tmp7 = tmp5 + tmp6; 2023-01-11T21:38:06.1541452Z auto tmp9 = static_cast<float>(1.55093272922008e-10); 2023-01-11T21:38:06.1541547Z auto tmp10 = tmp8 * tmp9; 2023-01-11T21:38:06.1541640Z auto tmp11 = tmp7 + tmp10; 2023-01-11T21:38:06.1541735Z auto tmp12 = 1 / tmp11; 2023-01-11T21:38:06.1541837Z auto tmp13 = static_cast<float>(1.0); 2023-01-11T21:38:06.1541934Z auto tmp14 = tmp12 * tmp13; 2023-01-11T21:38:06.1542029Z auto tmp16 = tmp11 * tmp15; 2023-01-11T21:38:06.1542125Z auto tmp17 = tmp14 + tmp16; 2023-01-11T21:38:06.1542223Z out_ptr0[i1 + (26*i0)] = tmp17; 2023-01-11T21:38:06.1542291Z } 2023-01-11T21:38:06.1542357Z } 2023-01-11T21:38:06.1542416Z } 2023-01-11T21:38:06.1542480Z } 2023-01-11T21:38:06.1542564Z ''') 2023-01-11T21:38:06.1542570Z 2023-01-11T21:38:06.1542574Z 2023-01-11T21:38:06.1542668Z async_compile.wait(globals()) 2023-01-11T21:38:06.1542750Z del async_compile 2023-01-11T21:38:06.1542755Z 2023-01-11T21:38:06.1542831Z def call(args): 2023-01-11T21:38:06.1542918Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.1542995Z args.clear() 2023-01-11T21:38:06.1543202Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1543430Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(arg2_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1543504Z del arg0_1 2023-01-11T21:38:06.1543577Z del arg1_1 2023-01-11T21:38:06.1543652Z del arg2_1 2023-01-11T21:38:06.1543728Z return (buf0, ) 2023-01-11T21:38:06.1543733Z 2023-01-11T21:38:06.1543738Z 2023-01-11T21:38:06.1543816Z if __name__ == "__main__": 2023-01-11T21:38:06.1543929Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1544055Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1544272Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544487Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544679Z arg2_1 = rand_strided((26, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1544811Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.1544816Z 2023-01-11T21:38:06.1544889Z ok (1.808s) 2023-01-11T21:38:06.1545338Z test_views1_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1545471Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1545729Z [2023-01-11 21:33:10,579] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 324 2023-01-11T21:38:06.1545984Z [2023-01-11 21:33:12,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 324 2023-01-11T21:38:06.1546406Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1546563Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1546819Z [2023-01-11 21:33:12,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 325 2023-01-11T21:38:06.1547084Z [2023-01-11 21:33:13,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 325 2023-01-11T21:38:06.1547498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1547632Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1547886Z [2023-01-11 21:33:13,992] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 326 2023-01-11T21:38:06.1548150Z [2023-01-11 21:33:15,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 326 2023-01-11T21:38:06.1548565Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1548696Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1548950Z [2023-01-11 21:33:15,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 327 2023-01-11T21:38:06.1548983Z 2023-01-11T21:38:06.1549077Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1549157Z import torch 2023-01-11T21:38:06.1549235Z import random 2023-01-11T21:38:06.1549358Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1549490Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1549495Z 2023-01-11T21:38:06.1549580Z aten = torch.ops.aten 2023-01-11T21:38:06.1549721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1549813Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1549826Z 2023-01-11T21:38:06.1549896Z import triton 2023-01-11T21:38:06.1549991Z import triton.language as tl 2023-01-11T21:38:06.1550122Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1550264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1550270Z 2023-01-11T21:38:06.1550279Z 2023-01-11T21:38:06.1550418Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1550629Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1550757Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1550864Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1550971Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1551040Z { 2023-01-11T21:38:06.1551145Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1551216Z { 2023-01-11T21:38:06.1551301Z #pragma omp for 2023-01-11T21:38:06.1551393Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1551456Z { 2023-01-11T21:38:06.1551599Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1551739Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1551832Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1551933Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1552002Z } 2023-01-11T21:38:06.1552107Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1552190Z for(long i0=32; i0<35; i0+=1) 2023-01-11T21:38:06.1552260Z { 2023-01-11T21:38:06.1552352Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1552469Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1552558Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1552645Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1552711Z } 2023-01-11T21:38:06.1552770Z } 2023-01-11T21:38:06.1552836Z } 2023-01-11T21:38:06.1552922Z ''') 2023-01-11T21:38:06.1552928Z 2023-01-11T21:38:06.1552932Z 2023-01-11T21:38:06.1553025Z async_compile.wait(globals()) 2023-01-11T21:38:06.1553102Z del async_compile 2023-01-11T21:38:06.1553107Z 2023-01-11T21:38:06.1553182Z def call(args): 2023-01-11T21:38:06.1553262Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1553330Z args.clear() 2023-01-11T21:38:06.1553529Z buf0 = empty_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1553699Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1553772Z del arg0_1 2023-01-11T21:38:06.1553845Z del arg1_1 2023-01-11T21:38:06.1553923Z return (buf0, ) 2023-01-11T21:38:06.1553928Z 2023-01-11T21:38:06.1553932Z 2023-01-11T21:38:06.1554012Z if __name__ == "__main__":
2023-01-11T21:38:06.1554130Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1554250Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1554444Z arg0_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1554639Z arg1_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1554759Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1554765Z 2023-01-11T21:38:06.1554769Z 2023-01-11T21:38:06.1554906Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1554982Z import torch 2023-01-11T21:38:06.1555055Z import random 2023-01-11T21:38:06.1555173Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1555289Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1555294Z 2023-01-11T21:38:06.1555379Z aten = torch.ops.aten 2023-01-11T21:38:06.1555517Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1555613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1555618Z 2023-01-11T21:38:06.1555694Z import triton 2023-01-11T21:38:06.1555785Z import triton.language as tl 2023-01-11T21:38:06.1555911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1556051Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1556057Z 2023-01-11T21:38:06.1556061Z 2023-01-11T21:38:06.1556194Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1556401Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1556528Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1556637Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1556741Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1556809Z { 2023-01-11T21:38:06.1556914Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1556972Z { 2023-01-11T21:38:06.1557054Z #pragma omp for 2023-01-11T21:38:06.1557139Z for(long i0=0; i0<4; i0+=1) 2023-01-11T21:38:06.1557206Z { 2023-01-11T21:38:06.1557344Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1557479Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1557612Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1557701Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1557784Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1557881Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1557948Z } 2023-01-11T21:38:06.1558049Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1558136Z for(long i0=32; i0<35; i0+=1) 2023-01-11T21:38:06.1558203Z { 2023-01-11T21:38:06.1558319Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1558402Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1558505Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1558594Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1558681Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1558766Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1558833Z } 2023-01-11T21:38:06.1558891Z } 2023-01-11T21:38:06.1558955Z } 2023-01-11T21:38:06.1559040Z ''') 2023-01-11T21:38:06.1559045Z 2023-01-11T21:38:06.1559049Z 2023-01-11T21:38:06.1559143Z async_compile.wait(globals()) 2023-01-11T21:38:06.1559219Z del async_compile 2023-01-11T21:38:06.1559228Z 2023-01-11T21:38:06.1559302Z def call(args): 2023-01-11T21:38:06.1559381Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1559458Z args.clear() 2023-01-11T21:38:06.1559645Z buf0 = empty_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1559815Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1559886Z del arg0_1 2023-01-11T21:38:06.1559960Z del arg1_1 2023-01-11T21:38:06.1560035Z return (buf0, ) 2023-01-11T21:38:06.1560040Z 2023-01-11T21:38:06.1560044Z 2023-01-11T21:38:06.1560128Z if __name__ == "__main__": 2023-01-11T21:38:06.1560247Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1560366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1560559Z arg0_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1560759Z arg1_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1560932Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1560937Z 2023-01-11T21:38:06.1560942Z 2023-01-11T21:38:06.1561039Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1561114Z import torch 2023-01-11T21:38:06.1561187Z import random 2023-01-11T21:38:06.1561308Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1561424Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1561437Z 2023-01-11T21:38:06.1561512Z aten = torch.ops.aten 2023-01-11T21:38:06.1561648Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1561742Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1561747Z 2023-01-11T21:38:06.1561820Z import triton 2023-01-11T21:38:06.1561914Z import triton.language as tl 2023-01-11T21:38:06.1562037Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1562183Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1562192Z 2023-01-11T21:38:06.1562196Z 2023-01-11T21:38:06.1562324Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1562530Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1562652Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1562769Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1562873Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1562940Z { 2023-01-11T21:38:06.1563043Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1563112Z { 2023-01-11T21:38:06.1563187Z #pragma omp for 2023-01-11T21:38:06.1563275Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1563343Z { 2023-01-11T21:38:06.1563482Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1563617Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1563710Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1563804Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1563864Z } 2023-01-11T21:38:06.1563963Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1564054Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1564121Z { 2023-01-11T21:38:06.1564246Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1564338Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1564426Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1564504Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1564570Z } 2023-01-11T21:38:06.1564636Z } 2023-01-11T21:38:06.1564700Z } 2023-01-11T21:38:06.1564785Z ''') 2023-01-11T21:38:06.1564791Z 2023-01-11T21:38:06.1564796Z 2023-01-11T21:38:06.1564888Z async_compile.wait(globals()) 2023-01-11T21:38:06.1564969Z del async_compile 2023-01-11T21:38:06.1564974Z 2023-01-11T21:38:06.1565042Z def call(args): 2023-01-11T21:38:06.1565125Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.1565223Z args.clear() 2023-01-11T21:38:06.1565481Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1565649Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1565722Z del arg0_1 2023-01-11T21:38:06.1565796Z del arg1_1 2023-01-11T21:38:06.1565865Z return (buf0, ) 2023-01-11T21:38:06.1565870Z 2023-01-11T21:38:06.1565883Z 2023-01-11T21:38:06.1565956Z if __name__ == "__main__": 2023-01-11T21:38:06.1566076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1566202Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1566398Z arg0_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1566632Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1566753Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1566786Z 2023-01-11T21:38:06.1567053Z [2023-01-11 21:33:17,488] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 327 2023-01-11T21:38:06.1567470Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1567604Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1567852Z [2023-01-11 21:33:17,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 328 2023-01-11T21:38:06.1568113Z [2023-01-11 21:33:17,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 328 2023-01-11T21:38:06.1568523Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1568660Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1568914Z [2023-01-11 21:33:17,536] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 329 2023-01-11T21:38:06.1569176Z [2023-01-11 21:33:17,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 329 2023-01-11T21:38:06.1569592Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1569725Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1569977Z [2023-01-11 21:33:17,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 330 2023-01-11T21:38:06.1569983Z 2023-01-11T21:38:06.1570114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1570190Z import torch 2023-01-11T21:38:06.1570258Z import random 2023-01-11T21:38:06.1570378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1570502Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1570507Z 2023-01-11T21:38:06.1570589Z aten = torch.ops.aten 2023-01-11T21:38:06.1570724Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1570822Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1570827Z 2023-01-11T21:38:06.1570902Z import triton 2023-01-11T21:38:06.1570987Z import triton.language as tl 2023-01-11T21:38:06.1571116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1571257Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1571262Z 2023-01-11T21:38:06.1571267Z 2023-01-11T21:38:06.1571404Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1571612Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1571735Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1571846Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1571950Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1572009Z { 2023-01-11T21:38:06.1572112Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1572181Z { 2023-01-11T21:38:06.1572266Z #pragma omp for 2023-01-11T21:38:06.1572354Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1572423Z { 2023-01-11T21:38:06.1572560Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1572718Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1572853Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1572946Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1573037Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1573133Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1573201Z } 2023-01-11T21:38:06.1573301Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1573385Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1573453Z { 2023-01-11T21:38:06.1573543Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1573632Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1573740Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1573830Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1573918Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1573998Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1574065Z } 2023-01-11T21:38:06.1574131Z } 2023-01-11T21:38:06.1574198Z } 2023-01-11T21:38:06.1574282Z ''') 2023-01-11T21:38:06.1574288Z 2023-01-11T21:38:06.1574292Z 2023-01-11T21:38:06.1574385Z async_compile.wait(globals()) 2023-01-11T21:38:06.1574468Z del async_compile 2023-01-11T21:38:06.1574473Z 2023-01-11T21:38:06.1574828Z def call(args): 2023-01-11T21:38:06.1574929Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1575005Z args.clear() 2023-01-11T21:38:06.1575249Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1575418Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()),
c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1575490Z del arg0_1 2023-01-11T21:38:06.1575564Z del arg1_1 2023-01-11T21:38:06.1575640Z return (buf0, ) 2023-01-11T21:38:06.1575646Z 2023-01-11T21:38:06.1575661Z 2023-01-11T21:38:06.1575734Z if __name__ == "__main__": 2023-01-11T21:38:06.1575856Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1575987Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1576186Z arg0_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1576489Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1576611Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1576616Z 2023-01-11T21:38:06.1576621Z 2023-01-11T21:38:06.1576718Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1576792Z import torch 2023-01-11T21:38:06.1576859Z import random 2023-01-11T21:38:06.1576977Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1577101Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1577106Z 2023-01-11T21:38:06.1577269Z aten = torch.ops.aten 2023-01-11T21:38:06.1577425Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1577523Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1577529Z 2023-01-11T21:38:06.1577602Z import triton 2023-01-11T21:38:06.1577694Z import triton.language as tl 2023-01-11T21:38:06.1577811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1577953Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1577959Z 2023-01-11T21:38:06.1577963Z 2023-01-11T21:38:06.1578100Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1578306Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1578430Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1578540Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1578645Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1578703Z { 2023-01-11T21:38:06.1578805Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1578915Z { 2023-01-11T21:38:06.1579000Z #pragma omp for 2023-01-11T21:38:06.1579091Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1579162Z { 2023-01-11T21:38:06.1579305Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1579448Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1579534Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1579631Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1579701Z } 2023-01-11T21:38:06.1579802Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1579896Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1579966Z { 2023-01-11T21:38:06.1580049Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1580140Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1580229Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1580316Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1580388Z } 2023-01-11T21:38:06.1580456Z } 2023-01-11T21:38:06.1580524Z } 2023-01-11T21:38:06.1580606Z ''') 2023-01-11T21:38:06.1580611Z 2023-01-11T21:38:06.1580623Z 2023-01-11T21:38:06.1580710Z async_compile.wait(globals()) 2023-01-11T21:38:06.1580792Z del async_compile 2023-01-11T21:38:06.1580797Z 2023-01-11T21:38:06.1580878Z def call(args): 2023-01-11T21:38:06.1580961Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1581039Z args.clear() 2023-01-11T21:38:06.1581272Z buf0 =
empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1581441Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1581510Z del arg0_1 2023-01-11T21:38:06.1581585Z del arg1_1 2023-01-11T21:38:06.1581662Z return (buf0, ) 2023-01-11T21:38:06.1581668Z 2023-01-11T21:38:06.1581672Z 2023-01-11T21:38:06.1581755Z if __name__ == "__main__": 2023-01-11T21:38:06.1581879Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1582008Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1582229Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1582493Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1582610Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1582615Z 2023-01-11T21:38:06.1582626Z 2023-01-11T21:38:06.1582719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1582796Z import torch 2023-01-11T21:38:06.1582874Z import random 2023-01-11T21:38:06.1582995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1583120Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1583125Z 2023-01-11T21:38:06.1583210Z aten = torch.ops.aten 2023-01-11T21:38:06.1583348Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1583442Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1583447Z 2023-01-11T21:38:06.1583524Z import triton 2023-01-11T21:38:06.1583619Z import triton.language as tl 2023-01-11T21:38:06.1583747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1583891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1583897Z 2023-01-11T21:38:06.1583901Z 2023-01-11T21:38:06.1584038Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1584248Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1584372Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1584477Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1584582Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1584650Z { 2023-01-11T21:38:06.1584753Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1584854Z { 2023-01-11T21:38:06.1584939Z #pragma omp for 2023-01-11T21:38:06.1585028Z for(long i0=0; i0<630; i0+=1) 2023-01-11T21:38:06.1585091Z { 2023-01-11T21:38:06.1585242Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1585404Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1585557Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1585648Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1585738Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1585839Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1585902Z } 2023-01-11T21:38:06.1586004Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1586098Z for(long i0=5040; i0<5040; i0+=1) 2023-01-11T21:38:06.1586168Z { 2023-01-11T21:38:06.1586261Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1586352Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1586463Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1586546Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1586634Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1586723Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1586792Z }
2023-01-11T21:38:06.1586861Z } 2023-01-11T21:38:06.1586931Z } 2023-01-11T21:38:06.1587019Z ''') 2023-01-11T21:38:06.1587025Z 2023-01-11T21:38:06.1587029Z 2023-01-11T21:38:06.1587118Z async_compile.wait(globals()) 2023-01-11T21:38:06.1587198Z del async_compile 2023-01-11T21:38:06.1587203Z 2023-01-11T21:38:06.1587280Z def call(args): 2023-01-11T21:38:06.1587361Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1587441Z args.clear() 2023-01-11T21:38:06.1587677Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1587845Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1587916Z del arg0_1 2023-01-11T21:38:06.1587990Z del arg1_1 2023-01-11T21:38:06.1588066Z return (buf0, ) 2023-01-11T21:38:06.1588072Z 2023-01-11T21:38:06.1588076Z 2023-01-11T21:38:06.1588160Z if __name__ == "__main__": 2023-01-11T21:38:06.1588282Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1588449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1588668Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1588901Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1589013Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1589026Z 2023-01-11T21:38:06.1589284Z [2023-01-11 21:33:19,259] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 330 2023-01-11T21:38:06.1589699Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1589836Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1590092Z [2023-01-11 21:33:19,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 331 2023-01-11T21:38:06.1590355Z [2023-01-11 21:33:21,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 331 2023-01-11T21:38:06.1590769Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1590931Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1591186Z [2023-01-11 21:33:21,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 332 2023-01-11T21:38:06.1591450Z [2023-01-11 21:33:22,802] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 332 2023-01-11T21:38:06.1591861Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1591993Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1592248Z [2023-01-11 21:33:22,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 333 2023-01-11T21:38:06.1592257Z 2023-01-11T21:38:06.1592348Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1592423Z import torch 2023-01-11T21:38:06.1592498Z import random 2023-01-11T21:38:06.1592619Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1592746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1592751Z 2023-01-11T21:38:06.1592835Z aten = torch.ops.aten 2023-01-11T21:38:06.1592971Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1593060Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1593072Z 2023-01-11T21:38:06.1593139Z import triton 2023-01-11T21:38:06.1593232Z import triton.language as tl 2023-01-11T21:38:06.1593356Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1593497Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1593502Z 2023-01-11T21:38:06.1593507Z 2023-01-11T21:38:06.1593642Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1593850Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1593973Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1594075Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1594207Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1594275Z { 2023-01-11T21:38:06.1594381Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1594448Z { 2023-01-11T21:38:06.1594532Z #pragma omp for 2023-01-11T21:38:06.1594623Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1594684Z { 2023-01-11T21:38:06.1594822Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1594957Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1595048Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1595143Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1595217Z } 2023-01-11T21:38:06.1595318Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1595401Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1595471Z { 2023-01-11T21:38:06.1595561Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1595648Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1595739Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1595826Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1595895Z } 2023-01-11T21:38:06.1595954Z } 2023-01-11T21:38:06.1596018Z } 2023-01-11T21:38:06.1596103Z ''') 2023-01-11T21:38:06.1596108Z 2023-01-11T21:38:06.1596113Z 2023-01-11T21:38:06.1596205Z async_compile.wait(globals()) 2023-01-11T21:38:06.1596282Z del async_compile 2023-01-11T21:38:06.1596287Z 2023-01-11T21:38:06.1596362Z def call(args): 2023-01-11T21:38:06.1596442Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1596510Z args.clear() 2023-01-11T21:38:06.1596718Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1596925Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1596998Z del arg0_1 2023-01-11T21:38:06.1597070Z del arg1_1 2023-01-11T21:38:06.1597147Z return (buf0, ) 2023-01-11T21:38:06.1597152Z 2023-01-11T21:38:06.1597158Z 2023-01-11T21:38:06.1597241Z if __name__ == "__main__":
2023-01-11T21:38:06.1597360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1597480Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1597679Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1597886Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1598007Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1598012Z 2023-01-11T21:38:06.1598017Z 2023-01-11T21:38:06.1598116Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1598195Z import torch 2023-01-11T21:38:06.1598270Z import random 2023-01-11T21:38:06.1598389Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1598505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1598511Z 2023-01-11T21:38:06.1598594Z aten = torch.ops.aten 2023-01-11T21:38:06.1598732Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1598827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1598832Z 2023-01-11T21:38:06.1598905Z import triton 2023-01-11T21:38:06.1599000Z import triton.language as tl 2023-01-11T21:38:06.1599123Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1599264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1599269Z 2023-01-11T21:38:06.1599274Z 2023-01-11T21:38:06.1599403Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1599609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1599733Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1599846Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1599952Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1600016Z { 2023-01-11T21:38:06.1600145Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1600205Z { 2023-01-11T21:38:06.1600287Z #pragma omp for 2023-01-11T21:38:06.1600375Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1600443Z { 2023-01-11T21:38:06.1600580Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1600715Z auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1600850Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1600942Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1601023Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1601123Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1601190Z } 2023-01-11T21:38:06.1601288Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1601380Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1601449Z { 2023-01-11T21:38:06.1601536Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1601620Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1601726Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1601813Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1601902Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1601986Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1602052Z } 2023-01-11T21:38:06.1602119Z } 2023-01-11T21:38:06.1602177Z } 2023-01-11T21:38:06.1602262Z ''') 2023-01-11T21:38:06.1602267Z 2023-01-11T21:38:06.1602272Z 2023-01-11T21:38:06.1602367Z async_compile.wait(globals()) 2023-01-11T21:38:06.1602446Z del async_compile 2023-01-11T21:38:06.1602493Z 2023-01-11T21:38:06.1602568Z def call(args): 2023-01-11T21:38:06.1602648Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1602724Z args.clear() 2023-01-11T21:38:06.1602926Z buf0 = empty_strided((10, 5, 20), (100, 20, 1),
device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1603094Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1603166Z del arg0_1 2023-01-11T21:38:06.1603237Z del arg1_1 2023-01-11T21:38:06.1603314Z return (buf0, ) 2023-01-11T21:38:06.1603319Z 2023-01-11T21:38:06.1603323Z 2023-01-11T21:38:06.1603408Z if __name__ == "__main__": 2023-01-11T21:38:06.1603527Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1603655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1603847Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1604054Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1604177Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1604183Z 2023-01-11T21:38:06.1604187Z 2023-01-11T21:38:06.1604287Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1604360Z import torch 2023-01-11T21:38:06.1604434Z import random 2023-01-11T21:38:06.1604557Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1604674Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1604686Z 2023-01-11T21:38:06.1604761Z aten = torch.ops.aten 2023-01-11T21:38:06.1604896Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1604992Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1604997Z 2023-01-11T21:38:06.1605072Z import triton 2023-01-11T21:38:06.1605164Z import triton.language as tl 2023-01-11T21:38:06.1605288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1605429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1605436Z 2023-01-11T21:38:06.1605440Z 2023-01-11T21:38:06.1605577Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1605776Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1605902Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1606046Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1606156Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1606224Z { 2023-01-11T21:38:06.1606328Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1606393Z { 2023-01-11T21:38:06.1606467Z #pragma omp for 2023-01-11T21:38:06.1606553Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1606622Z { 2023-01-11T21:38:06.1606760Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1606896Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1606991Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1607088Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1607148Z } 2023-01-11T21:38:06.1607252Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1607338Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1607404Z { 2023-01-11T21:38:06.1607493Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1607581Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1607670Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1607748Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1607815Z } 2023-01-11T21:38:06.1607882Z } 2023-01-11T21:38:06.1607945Z } 2023-01-11T21:38:06.1608029Z ''') 2023-01-11T21:38:06.1608035Z 2023-01-11T21:38:06.1608039Z 2023-01-11T21:38:06.1608133Z async_compile.wait(globals()) 2023-01-11T21:38:06.1608212Z del async_compile 2023-01-11T21:38:06.1608217Z 2023-01-11T21:38:06.1608284Z def call(args): 2023-01-11T21:38:06.1608367Z
arg0_1, arg1_1 = args 2023-01-11T21:38:06.1608472Z args.clear() 2023-01-11T21:38:06.1608664Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1608832Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1608904Z del arg0_1 2023-01-11T21:38:06.1608977Z del arg1_1 2023-01-11T21:38:06.1609046Z return (buf0, ) 2023-01-11T21:38:06.1609051Z 2023-01-11T21:38:06.1609062Z 2023-01-11T21:38:06.1609136Z if __name__ == "__main__": 2023-01-11T21:38:06.1609257Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1609383Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1609592Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1609784Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1609903Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1609911Z 2023-01-11T21:38:06.1610174Z [2023-01-11 21:33:24,502] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 333 2023-01-11T21:38:06.1610587Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1610721Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1610969Z [2023-01-11 21:33:24,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 334 2023-01-11T21:38:06.1611232Z [2023-01-11 21:33:24,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 334 2023-01-11T21:38:06.1611643Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1611802Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1612058Z [2023-01-11 21:33:24,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 335 2023-01-11T21:38:06.1612321Z [2023-01-11 21:33:24,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 335 2023-01-11T21:38:06.1612730Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1612864Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1613119Z [2023-01-11 21:33:24,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 336 2023-01-11T21:38:06.1613124Z 2023-01-11T21:38:06.1613226Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1613304Z import torch 2023-01-11T21:38:06.1613371Z import random 2023-01-11T21:38:06.1613490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1613612Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1613617Z 2023-01-11T21:38:06.1613699Z aten = torch.ops.aten 2023-01-11T21:38:06.1613835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1613934Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1613939Z 2023-01-11T21:38:06.1614013Z import triton 2023-01-11T21:38:06.1614098Z import triton.language as tl 2023-01-11T21:38:06.1614224Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1614390Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1614396Z 2023-01-11T21:38:06.1614400Z 2023-01-11T21:38:06.1614755Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1615002Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1615126Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1615235Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1615338Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1615396Z { 2023-01-11T21:38:06.1615498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1615564Z { 2023-01-11T21:38:06.1615645Z #pragma omp for 2023-01-11T21:38:06.1615732Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1615799Z { 2023-01-11T21:38:06.1615939Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1616072Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1616205Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1616297Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1616384Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1616483Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1616550Z } 2023-01-11T21:38:06.1616647Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1616726Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1616792Z { 2023-01-11T21:38:06.1616880Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1616966Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1617069Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1617289Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1617385Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1617463Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1617532Z } 2023-01-11T21:38:06.1617598Z } 2023-01-11T21:38:06.1617663Z } 2023-01-11T21:38:06.1617756Z ''') 2023-01-11T21:38:06.1617761Z 2023-01-11T21:38:06.1617766Z 2023-01-11T21:38:06.1617858Z async_compile.wait(globals()) 2023-01-11T21:38:06.1617935Z del async_compile 2023-01-11T21:38:06.1617941Z 2023-01-11T21:38:06.1618070Z def call(args): 2023-01-11T21:38:06.1618153Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1618229Z args.clear() 2023-01-11T21:38:06.1618421Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1618587Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 
2023-01-11T21:38:06.1618660Z     del arg0_1
2023-01-11T21:38:06.1618731Z     del arg1_1
2023-01-11T21:38:06.1618799Z     return (buf0, )
2023-01-11T21:38:06.1618811Z 
2023-01-11T21:38:06.1618815Z 
2023-01-11T21:38:06.1618888Z if __name__ == "__main__":
2023-01-11T21:38:06.1619006Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1619136Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1619346Z     arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1619537Z     arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1619659Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1619664Z 
2023-01-11T21:38:06.1619669Z 
2023-01-11T21:38:06.1619766Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1619840Z import torch
2023-01-11T21:38:06.1619906Z import random
2023-01-11T21:38:06.1620026Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1620149Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1620154Z 
2023-01-11T21:38:06.1620236Z aten = torch.ops.aten
2023-01-11T21:38:06.1620373Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1620468Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1620514Z 
2023-01-11T21:38:06.1620594Z import triton
2023-01-11T21:38:06.1620682Z import triton.language as tl
2023-01-11T21:38:06.1620810Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1620951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1620956Z 
2023-01-11T21:38:06.1620963Z 
2023-01-11T21:38:06.1621102Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1621311Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1621437Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1621550Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1621657Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1621718Z {
2023-01-11T21:38:06.1621822Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1621891Z     {
2023-01-11T21:38:06.1621975Z         #pragma omp for
2023-01-11T21:38:06.1622070Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1622138Z         {
2023-01-11T21:38:06.1622280Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1622411Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1622504Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1622605Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1622675Z         }
2023-01-11T21:38:06.1622778Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1622872Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1622941Z         {
2023-01-11T21:38:06.1623025Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1623115Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1623207Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1623295Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1623365Z         }
2023-01-11T21:38:06.1623433Z     }
2023-01-11T21:38:06.1623499Z }
2023-01-11T21:38:06.1623582Z ''')
2023-01-11T21:38:06.1623587Z 
2023-01-11T21:38:06.1623591Z 
2023-01-11T21:38:06.1623687Z async_compile.wait(globals())
2023-01-11T21:38:06.1623767Z del async_compile
2023-01-11T21:38:06.1623772Z 
2023-01-11T21:38:06.1623850Z def call(args):
2023-01-11T21:38:06.1623931Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1624045Z     args.clear()
2023-01-11T21:38:06.1624252Z     buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1624421Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1624490Z     del arg0_1
2023-01-11T21:38:06.1624563Z     del arg1_1
2023-01-11T21:38:06.1624641Z     return (buf0, )
2023-01-11T21:38:06.1624647Z 
2023-01-11T21:38:06.1624651Z 
2023-01-11T21:38:06.1624734Z if __name__ == "__main__":
2023-01-11T21:38:06.1624853Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1624982Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1625227Z     arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1625457Z     arg1_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1625582Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1625588Z 
2023-01-11T21:38:06.1625594Z 
2023-01-11T21:38:06.1625694Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1625770Z import torch
2023-01-11T21:38:06.1625847Z import random
2023-01-11T21:38:06.1625968Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1626095Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1626100Z 
2023-01-11T21:38:06.1626185Z aten = torch.ops.aten
2023-01-11T21:38:06.1626316Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1626414Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1626419Z 
2023-01-11T21:38:06.1626494Z import triton
2023-01-11T21:38:06.1626633Z import triton.language as tl
2023-01-11T21:38:06.1626756Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1626894Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1626899Z 
2023-01-11T21:38:06.1626903Z 
2023-01-11T21:38:06.1627038Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1627243Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1627359Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1627471Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1627578Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1627643Z {
2023-01-11T21:38:06.1627745Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1627815Z     {
2023-01-11T21:38:06.1627896Z         #pragma omp for
2023-01-11T21:38:06.1627977Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1628044Z         {
2023-01-11T21:38:06.1628185Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1628320Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1628457Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1628551Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1628639Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1628737Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1628797Z         }
2023-01-11T21:38:06.1628895Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1628986Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1629053Z         {
2023-01-11T21:38:06.1629143Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1629232Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1629335Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1629416Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1629506Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1629596Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1629662Z         }
2023-01-11T21:38:06.1629727Z     }
2023-01-11T21:38:06.1629792Z }
2023-01-11T21:38:06.1629870Z ''')
2023-01-11T21:38:06.1629876Z 
2023-01-11T21:38:06.1629889Z 
2023-01-11T21:38:06.1629974Z async_compile.wait(globals())
2023-01-11T21:38:06.1630084Z del async_compile
2023-01-11T21:38:06.1630090Z 
2023-01-11T21:38:06.1630166Z def call(args):
2023-01-11T21:38:06.1630249Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1630324Z     args.clear()
2023-01-11T21:38:06.1630527Z     buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1630695Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1630760Z     del arg0_1
2023-01-11T21:38:06.1630831Z     del arg1_1
2023-01-11T21:38:06.1630906Z     return (buf0, )
2023-01-11T21:38:06.1630911Z 
2023-01-11T21:38:06.1630918Z 
2023-01-11T21:38:06.1630999Z if __name__ == "__main__":
2023-01-11T21:38:06.1631116Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1631242Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1631468Z     arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1631672Z     arg1_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1631786Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1631791Z 
2023-01-11T21:38:06.1632055Z [2023-01-11 21:33:26,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 336
2023-01-11T21:38:06.1632467Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1632632Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1632889Z [2023-01-11 21:33:26,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 337
2023-01-11T21:38:06.1633154Z [2023-01-11 21:33:27,984] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 337
2023-01-11T21:38:06.1633564Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1633693Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1633947Z [2023-01-11 21:33:28,000] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 338
2023-01-11T21:38:06.1634210Z [2023-01-11 21:33:28,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 338
2023-01-11T21:38:06.1634624Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1634754Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1635000Z [2023-01-11 21:33:28,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 339
2023-01-11T21:38:06.1635015Z 
2023-01-11T21:38:06.1635106Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1635181Z import torch
2023-01-11T21:38:06.1635256Z import random
2023-01-11T21:38:06.1635378Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1635504Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1635509Z 
2023-01-11T21:38:06.1635592Z aten = torch.ops.aten
2023-01-11T21:38:06.1635729Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1635845Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1635851Z 
2023-01-11T21:38:06.1635929Z import triton
2023-01-11T21:38:06.1636021Z import triton.language as tl
2023-01-11T21:38:06.1636146Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1636288Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1636294Z 
2023-01-11T21:38:06.1636298Z 
2023-01-11T21:38:06.1636437Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1636644Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1636771Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1636875Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1636979Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1637044Z {
2023-01-11T21:38:06.1637148Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1637215Z     {
2023-01-11T21:38:06.1637299Z         #pragma omp for
2023-01-11T21:38:06.1637387Z         for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.1637448Z         {
2023-01-11T21:38:06.1637587Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1637724Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1637817Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1637912Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1637979Z         }
2023-01-11T21:38:06.1638079Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1638159Z         for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.1638271Z         {
2023-01-11T21:38:06.1638359Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1638447Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1638535Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1638620Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1638686Z         }
2023-01-11T21:38:06.1638745Z     }
2023-01-11T21:38:06.1638812Z }
2023-01-11T21:38:06.1638897Z ''')
2023-01-11T21:38:06.1638903Z 
2023-01-11T21:38:06.1638910Z 
2023-01-11T21:38:06.1639001Z async_compile.wait(globals())
2023-01-11T21:38:06.1639078Z del async_compile
2023-01-11T21:38:06.1639083Z 
2023-01-11T21:38:06.1639158Z def call(args):
2023-01-11T21:38:06.1639237Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1639305Z     args.clear()
2023-01-11T21:38:06.1639499Z     buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1639666Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1639739Z     del arg0_1
2023-01-11T21:38:06.1639814Z     del arg1_1
2023-01-11T21:38:06.1639890Z     return (buf0, )
2023-01-11T21:38:06.1639895Z 
2023-01-11T21:38:06.1639899Z 
2023-01-11T21:38:06.1639981Z if __name__ == "__main__":
2023-01-11T21:38:06.1640100Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1640222Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1640429Z     arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1640623Z     arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1640743Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1640749Z 
2023-01-11T21:38:06.1640753Z 
2023-01-11T21:38:06.1640852Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1640926Z import torch
2023-01-11T21:38:06.1640999Z import random
2023-01-11T21:38:06.1641111Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1641233Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1641241Z 
2023-01-11T21:38:06.1641321Z aten = torch.ops.aten
2023-01-11T21:38:06.1641455Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1641550Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1641555Z 
2023-01-11T21:38:06.1641631Z import triton
2023-01-11T21:38:06.1641754Z import triton.language as tl
2023-01-11T21:38:06.1641882Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1642014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1642019Z 
2023-01-11T21:38:06.1642033Z 
2023-01-11T21:38:06.1642163Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1642368Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1642492Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1642604Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1642711Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1642777Z {
2023-01-11T21:38:06.1642878Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1642937Z     {
2023-01-11T21:38:06.1643019Z         #pragma omp for
2023-01-11T21:38:06.1643104Z         for(long i0=0; i0<2; i0+=1)
2023-01-11T21:38:06.1643171Z         {
2023-01-11T21:38:06.1643309Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1643447Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1643582Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1643664Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1643751Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1643848Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1643915Z         }
2023-01-11T21:38:06.1644015Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1644102Z         for(long i0=16; i0<16; i0+=1)
2023-01-11T21:38:06.1644196Z         {
2023-01-11T21:38:06.1644277Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1644365Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1644469Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1644556Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1644646Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1644732Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1644798Z         }
2023-01-11T21:38:06.1644857Z     }
2023-01-11T21:38:06.1644922Z }
2023-01-11T21:38:06.1645005Z ''')
2023-01-11T21:38:06.1645011Z 
2023-01-11T21:38:06.1645015Z 
2023-01-11T21:38:06.1645113Z async_compile.wait(globals())
2023-01-11T21:38:06.1645206Z del async_compile
2023-01-11T21:38:06.1645213Z 
2023-01-11T21:38:06.1645295Z def call(args):
2023-01-11T21:38:06.1645391Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1645468Z     args.clear()
2023-01-11T21:38:06.1645660Z     buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1645829Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1645901Z     del arg0_1
2023-01-11T21:38:06.1645974Z     del arg1_1
2023-01-11T21:38:06.1646050Z     return (buf0, )
2023-01-11T21:38:06.1646055Z 
2023-01-11T21:38:06.1646060Z 
2023-01-11T21:38:06.1646145Z if __name__ == "__main__":
2023-01-11T21:38:06.1646261Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1646381Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1646590Z     arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1646784Z     arg1_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1646904Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1646909Z 
2023-01-11T21:38:06.1646914Z 
2023-01-11T21:38:06.1647011Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1647089Z import torch
2023-01-11T21:38:06.1647164Z import random
2023-01-11T21:38:06.1647282Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1647398Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1647404Z 
2023-01-11T21:38:06.1647484Z aten = torch.ops.aten
2023-01-11T21:38:06.1647654Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1647751Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1647756Z 
2023-01-11T21:38:06.1647830Z import triton
2023-01-11T21:38:06.1647921Z import triton.language as tl
2023-01-11T21:38:06.1648046Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1648178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1648190Z 
2023-01-11T21:38:06.1648194Z 
2023-01-11T21:38:06.1648321Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1648527Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1648653Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1648762Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1648865Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1648931Z {
2023-01-11T21:38:06.1649033Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1649094Z     {
2023-01-11T21:38:06.1649177Z         #pragma omp for
2023-01-11T21:38:06.1649264Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.1649330Z         {
2023-01-11T21:38:06.1649467Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1649601Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1649689Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1649777Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1649844Z         }
2023-01-11T21:38:06.1649942Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1650060Z         for(long i0=32; i0<35; i0+=1)
2023-01-11T21:38:06.1650128Z         {
2023-01-11T21:38:06.1650216Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1650305Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1650385Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1650472Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1650541Z         }
2023-01-11T21:38:06.1650609Z     }
2023-01-11T21:38:06.1650673Z }
2023-01-11T21:38:06.1650756Z ''')
2023-01-11T21:38:06.1650762Z 
2023-01-11T21:38:06.1650766Z 
2023-01-11T21:38:06.1650858Z async_compile.wait(globals())
2023-01-11T21:38:06.1650928Z del async_compile
2023-01-11T21:38:06.1650933Z 
2023-01-11T21:38:06.1651007Z def call(args):
2023-01-11T21:38:06.1651087Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1651162Z     args.clear()
2023-01-11T21:38:06.1651356Z     buf0 = empty_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1651522Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1651597Z     del arg0_1
2023-01-11T21:38:06.1651662Z     del arg1_1
2023-01-11T21:38:06.1651737Z     return (buf0, )
2023-01-11T21:38:06.1651742Z 
2023-01-11T21:38:06.1651747Z 
2023-01-11T21:38:06.1651826Z if __name__ == "__main__":
2023-01-11T21:38:06.1651944Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1652073Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1652269Z     arg0_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1652461Z     arg1_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1652580Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1652585Z 
2023-01-11T21:38:06.1652848Z [2023-01-11 21:33:28,039] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 339
2023-01-11T21:38:06.1653254Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1653417Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1653674Z [2023-01-11 21:33:28,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 340
2023-01-11T21:38:06.1653937Z [2023-01-11 21:33:28,062] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 340
2023-01-11T21:38:06.1654347Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1654624Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1654949Z [2023-01-11 21:33:28,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 341
2023-01-11T21:38:06.1655216Z [2023-01-11 21:33:28,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 341
2023-01-11T21:38:06.1655629Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1655759Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1656011Z [2023-01-11 21:33:28,120] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 342
2023-01-11T21:38:06.1656071Z 
2023-01-11T21:38:06.1656175Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1656245Z import torch
2023-01-11T21:38:06.1656321Z import random
2023-01-11T21:38:06.1656443Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1656573Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1656581Z 
2023-01-11T21:38:06.1656666Z aten = torch.ops.aten
2023-01-11T21:38:06.1656805Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1656903Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1656908Z 
2023-01-11T21:38:06.1656978Z import triton
2023-01-11T21:38:06.1657073Z import triton.language as tl
2023-01-11T21:38:06.1657304Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1657460Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1657466Z 
2023-01-11T21:38:06.1657471Z 
2023-01-11T21:38:06.1657611Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1657821Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1657945Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1658053Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1658153Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1658219Z {
2023-01-11T21:38:06.1658320Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1658387Z     {
2023-01-11T21:38:06.1658470Z         #pragma omp for
2023-01-11T21:38:06.1658558Z         for(long i0=0; i0<4; i0+=1)
2023-01-11T21:38:06.1658627Z         {
2023-01-11T21:38:06.1658760Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1658895Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1659030Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1659122Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1659212Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1659309Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1659376Z         }
2023-01-11T21:38:06.1659475Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1659555Z         for(long i0=32; i0<35; i0+=1)
2023-01-11T21:38:06.1659664Z         {
2023-01-11T21:38:06.1659754Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1659842Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1659950Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1660038Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1660118Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1660204Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1660269Z         }
2023-01-11T21:38:06.1660333Z     }
2023-01-11T21:38:06.1660397Z }
2023-01-11T21:38:06.1660483Z ''')
2023-01-11T21:38:06.1660488Z 
2023-01-11T21:38:06.1660493Z 
2023-01-11T21:38:06.1660587Z async_compile.wait(globals())
2023-01-11T21:38:06.1660661Z del async_compile
2023-01-11T21:38:06.1660674Z 
2023-01-11T21:38:06.1660742Z def call(args):
2023-01-11T21:38:06.1660820Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1660896Z     args.clear()
2023-01-11T21:38:06.1661086Z     buf0 = empty_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1661253Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1661326Z     del arg0_1
2023-01-11T21:38:06.1661397Z     del arg1_1
2023-01-11T21:38:06.1661464Z     return (buf0, )
2023-01-11T21:38:06.1661469Z 
2023-01-11T21:38:06.1661474Z 
2023-01-11T21:38:06.1661556Z if __name__ == "__main__":
2023-01-11T21:38:06.1661674Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1661802Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1661997Z     arg0_1 = rand_strided((5, 7), (7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1662189Z     arg1_1 = rand_strided((35, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1662337Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1662343Z 
2023-01-11T21:38:06.1662347Z 
2023-01-11T21:38:06.1662445Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1662512Z import torch
2023-01-11T21:38:06.1662584Z import random
2023-01-11T21:38:06.1662706Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1662830Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1662835Z 
2023-01-11T21:38:06.1662918Z aten = torch.ops.aten
2023-01-11T21:38:06.1663053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1663148Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1663153Z 
2023-01-11T21:38:06.1663227Z import triton
2023-01-11T21:38:06.1663312Z import triton.language as tl
2023-01-11T21:38:06.1663435Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1663574Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1663583Z 
2023-01-11T21:38:06.1663587Z 
2023-01-11T21:38:06.1663721Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1663928Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1664052Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1664161Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1664265Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1664323Z {
2023-01-11T21:38:06.1664424Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1664489Z     {
2023-01-11T21:38:06.1664570Z         #pragma omp for
2023-01-11T21:38:06.1664657Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1664724Z         {
2023-01-11T21:38:06.1664854Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1664989Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1665081Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1665177Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1665244Z         }
2023-01-11T21:38:06.1665356Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1665462Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1665573Z         {
2023-01-11T21:38:06.1665663Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1665752Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1665838Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1665923Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1665989Z         }
2023-01-11T21:38:06.1666055Z     }
2023-01-11T21:38:06.1666112Z }
2023-01-11T21:38:06.1666197Z ''')
2023-01-11T21:38:06.1666203Z 
2023-01-11T21:38:06.1666207Z 
2023-01-11T21:38:06.1666299Z async_compile.wait(globals())
2023-01-11T21:38:06.1666377Z del async_compile
2023-01-11T21:38:06.1666382Z 
2023-01-11T21:38:06.1666458Z def call(args):
2023-01-11T21:38:06.1666541Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1666615Z     args.clear()
2023-01-11T21:38:06.1666810Z     buf0 = empty_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1666970Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1667044Z     del arg0_1
2023-01-11T21:38:06.1667115Z     del arg1_1
2023-01-11T21:38:06.1667191Z     return (buf0, )
2023-01-11T21:38:06.1667196Z 
2023-01-11T21:38:06.1667201Z 
2023-01-11T21:38:06.1667280Z if __name__ == "__main__":
2023-01-11T21:38:06.1667397Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1667523Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1667749Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1667946Z     arg1_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1668099Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1668105Z 
2023-01-11T21:38:06.1668109Z 
2023-01-11T21:38:06.1668208Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1668285Z import torch
2023-01-11T21:38:06.1668361Z import random
2023-01-11T21:38:06.1668482Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1668611Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1668616Z 
2023-01-11T21:38:06.1668693Z aten = torch.ops.aten
2023-01-11T21:38:06.1668832Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1668929Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1668934Z 
2023-01-11T21:38:06.1669009Z import triton
2023-01-11T21:38:06.1669103Z import triton.language as tl
2023-01-11T21:38:06.1669232Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1669377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1669382Z 
2023-01-11T21:38:06.1669389Z 
2023-01-11T21:38:06.1669527Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1669728Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1669855Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1669968Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1670077Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1670146Z {
2023-01-11T21:38:06.1670249Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1670316Z     {
2023-01-11T21:38:06.1670393Z         #pragma omp for
2023-01-11T21:38:06.1670483Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1670553Z         {
2023-01-11T21:38:06.1670693Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1670831Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1670968Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1671064Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1671154Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1671245Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1671314Z         }
2023-01-11T21:38:06.1671415Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1671537Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1671608Z         {
2023-01-11T21:38:06.1671698Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1671790Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1671890Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1671980Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1672070Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1672157Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1672224Z         }
2023-01-11T21:38:06.1672291Z     }
2023-01-11T21:38:06.1672357Z }
2023-01-11T21:38:06.1672437Z ''')
2023-01-11T21:38:06.1672445Z 
2023-01-11T21:38:06.1672449Z 
2023-01-11T21:38:06.1672546Z async_compile.wait(globals())
2023-01-11T21:38:06.1672624Z del async_compile
2023-01-11T21:38:06.1672630Z 
2023-01-11T21:38:06.1672706Z def call(args):
2023-01-11T21:38:06.1672789Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1672867Z     args.clear()
2023-01-11T21:38:06.1673067Z     buf0 = empty_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1673227Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1673303Z     del arg0_1
2023-01-11T21:38:06.1673376Z     del arg1_1
2023-01-11T21:38:06.1673454Z     return (buf0, )
2023-01-11T21:38:06.1673460Z 
2023-01-11T21:38:06.1673464Z 
2023-01-11T21:38:06.1673547Z if __name__ == "__main__":
2023-01-11T21:38:06.1673665Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1673794Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1674028Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1674248Z     arg1_1 = rand_strided((5040, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1674370Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1674375Z 
2023-01-11T21:38:06.1674644Z [2023-01-11 21:33:28,131] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 342
2023-01-11T21:38:06.1675057Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1675191Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1675450Z [2023-01-11 21:33:28,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 343
2023-01-11T21:38:06.1675715Z [2023-01-11 21:33:28,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 343
2023-01-11T21:38:06.1676132Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1676264Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1676523Z [2023-01-11 21:33:28,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 344
2023-01-11T21:38:06.1676787Z [2023-01-11 21:33:28,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 344
2023-01-11T21:38:06.1677198Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1677366Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1677622Z [2023-01-11 21:33:28,236] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 345
2023-01-11T21:38:06.1677628Z 
2023-01-11T21:38:06.1677728Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1677804Z import torch
2023-01-11T21:38:06.1677881Z import random
2023-01-11T21:38:06.1678003Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1678129Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1678134Z 
2023-01-11T21:38:06.1678218Z aten = torch.ops.aten
2023-01-11T21:38:06.1678350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1678453Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1678458Z 
2023-01-11T21:38:06.1678533Z import triton
2023-01-11T21:38:06.1678629Z import triton.language as tl
2023-01-11T21:38:06.1678756Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1678901Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1678907Z 
2023-01-11T21:38:06.1678911Z 
2023-01-11T21:38:06.1679049Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1679258Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1679376Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1679488Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1679597Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1679666Z {
2023-01-11T21:38:06.1679769Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1679877Z     {
2023-01-11T21:38:06.1679963Z         #pragma omp for
2023-01-11T21:38:06.1680046Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1680115Z         {
2023-01-11T21:38:06.1680254Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1680397Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1680491Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1680590Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1680659Z         }
2023-01-11T21:38:06.1680754Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1680847Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1680915Z         {
2023-01-11T21:38:06.1681006Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1681096Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1681187Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1681274Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1681340Z         }
2023-01-11T21:38:06.1681407Z     }
2023-01-11T21:38:06.1681473Z }
2023-01-11T21:38:06.1681560Z ''')
2023-01-11T21:38:06.1681566Z 
2023-01-11T21:38:06.1681570Z 
2023-01-11T21:38:06.1681666Z async_compile.wait(globals())
2023-01-11T21:38:06.1681745Z del async_compile
2023-01-11T21:38:06.1681750Z 
2023-01-11T21:38:06.1681828Z def call(args):
2023-01-11T21:38:06.1681904Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1681981Z     args.clear()
2023-01-11T21:38:06.1682200Z     buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1682366Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1682442Z     del arg0_1
2023-01-11T21:38:06.1682517Z     del arg1_1
2023-01-11T21:38:06.1682595Z     return (buf0, )
2023-01-11T21:38:06.1682600Z 
2023-01-11T21:38:06.1682604Z 
2023-01-11T21:38:06.1682687Z if __name__ == "__main__":
2023-01-11T21:38:06.1682800Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1682932Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1683168Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1683418Z     arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1683543Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1683549Z 
2023-01-11T21:38:06.1683553Z 
2023-01-11T21:38:06.1683652Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1683730Z import torch
2023-01-11T21:38:06.1683807Z import random
2023-01-11T21:38:06.1683921Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1684046Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1684051Z 
2023-01-11T21:38:06.1684134Z aten = torch.ops.aten
2023-01-11T21:38:06.1684271Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1684370Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1684375Z 
2023-01-11T21:38:06.1684452Z import triton
2023-01-11T21:38:06.1684546Z import triton.language as tl
2023-01-11T21:38:06.1684665Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1684806Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1684814Z 
2023-01-11T21:38:06.1684818Z 
2023-01-11T21:38:06.1684956Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1685162Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1685286Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1685399Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1685504Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1685572Z {
2023-01-11T21:38:06.1685668Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1685737Z     {
2023-01-11T21:38:06.1685851Z         #pragma omp for
2023-01-11T21:38:06.1685943Z         for(long i0=0; i0<630; i0+=1)
2023-01-11T21:38:06.1686012Z         {
2023-01-11T21:38:06.1686151Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1686288Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1686425Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1686511Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1686602Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1686700Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1686770Z         }
2023-01-11T21:38:06.1686871Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1686966Z         for(long i0=5040; i0<5040; i0+=1)
2023-01-11T21:38:06.1687035Z         {
2023-01-11T21:38:06.1687120Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1687210Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1687320Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1687411Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1687502Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1687589Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1687652Z         }
2023-01-11T21:38:06.1687720Z     }
2023-01-11T21:38:06.1687786Z }
2023-01-11T21:38:06.1687875Z ''')
2023-01-11T21:38:06.1687880Z 
2023-01-11T21:38:06.1687885Z 
2023-01-11T21:38:06.1687981Z async_compile.wait(globals())
2023-01-11T21:38:06.1688062Z del async_compile
2023-01-11T21:38:06.1688067Z 
2023-01-11T21:38:06.1688144Z def call(args):
2023-01-11T21:38:06.1688225Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1688296Z     args.clear()
2023-01-11T21:38:06.1688513Z     buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1688683Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1688757Z     del arg0_1
2023-01-11T21:38:06.1688834Z     del arg1_1
2023-01-11T21:38:06.1688911Z     return (buf0, )
2023-01-11T21:38:06.1688916Z 
2023-01-11T21:38:06.1688920Z 
2023-01-11T21:38:06.1689005Z if __name__ == "__main__":
2023-01-11T21:38:06.1689118Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1689277Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1689511Z     arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1689725Z     arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1689845Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1689850Z 
2023-01-11T21:38:06.1689855Z 
2023-01-11T21:38:06.1689953Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1690027Z import torch
2023-01-11T21:38:06.1690101Z import random
2023-01-11T21:38:06.1690213Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1690340Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1690345Z 
2023-01-11T21:38:06.1690426Z aten = torch.ops.aten
2023-01-11T21:38:06.1690563Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1690659Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1690664Z 
2023-01-11T21:38:06.1690740Z import triton
2023-01-11T21:38:06.1690831Z import triton.language as tl
2023-01-11T21:38:06.1690957Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1691089Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1691094Z 
2023-01-11T21:38:06.1691105Z 
2023-01-11T21:38:06.1691232Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1691439Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1691560Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1699059Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1699272Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1699343Z {
2023-01-11T21:38:06.1699451Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1699526Z     {
2023-01-11T21:38:06.1699613Z         #pragma omp for
2023-01-11T21:38:06.1699697Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1699772Z         {
2023-01-11T21:38:06.1699925Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1700065Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1700159Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1700257Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1700327Z         }
2023-01-11T21:38:06.1700422Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1700519Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1700590Z         {
2023-01-11T21:38:06.1700681Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1700770Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1700854Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1700945Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1701013Z         }
2023-01-11T21:38:06.1701082Z     }
2023-01-11T21:38:06.1701148Z }
2023-01-11T21:38:06.1701262Z ''')
2023-01-11T21:38:06.1701269Z 
2023-01-11T21:38:06.1701276Z 
2023-01-11T21:38:06.1701375Z async_compile.wait(globals())
2023-01-11T21:38:06.1701449Z del async_compile
2023-01-11T21:38:06.1701462Z 
2023-01-11T21:38:06.1701534Z def call(args):
2023-01-11T21:38:06.1701618Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1701696Z     args.clear()
2023-01-11T21:38:06.1701903Z     buf0 = empty_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1702087Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1702161Z     del arg0_1
2023-01-11T21:38:06.1702233Z     del arg1_1
2023-01-11T21:38:06.1702306Z     return (buf0, )
2023-01-11T21:38:06.1702311Z 
2023-01-11T21:38:06.1702316Z 
2023-01-11T21:38:06.1702397Z if __name__ == "__main__":
2023-01-11T21:38:06.1702526Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1702663Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1702944Z     arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1703174Z     arg1_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1703303Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1703308Z 
2023-01-11T21:38:06.1703615Z [2023-01-11 21:33:28,257] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 345
2023-01-11T21:38:06.1704104Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1704243Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1704538Z [2023-01-11 21:33:28,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 346
2023-01-11T21:38:06.1704839Z [2023-01-11 21:33:28,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 346
2023-01-11T21:38:06.1705327Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1705469Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1705789Z [2023-01-11 21:33:28,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 347
2023-01-11T21:38:06.1706092Z [2023-01-11 21:33:28,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 347
2023-01-11T21:38:06.1706581Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.1706724Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.1707015Z [2023-01-11 21:33:28,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 348
2023-01-11T21:38:06.1707021Z 
2023-01-11T21:38:06.1707128Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1707202Z import torch
2023-01-11T21:38:06.1707280Z import random
2023-01-11T21:38:06.1707410Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1707545Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1707550Z 
2023-01-11T21:38:06.1707638Z aten = torch.ops.aten
2023-01-11T21:38:06.1707792Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1707895Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1707900Z 
2023-01-11T21:38:06.1707978Z import triton
2023-01-11T21:38:06.1708070Z import triton.language as tl
2023-01-11T21:38:06.1708207Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1708360Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1708366Z 
2023-01-11T21:38:06.1708371Z 
2023-01-11T21:38:06.1708524Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1708758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1708896Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1709013Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1709125Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1709186Z {
2023-01-11T21:38:06.1709296Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1709393Z     {
2023-01-11T21:38:06.1709481Z         #pragma omp for
2023-01-11T21:38:06.1709574Z         for(long i0=0; i0<125; i0+=1)
2023-01-11T21:38:06.1709644Z         {
2023-01-11T21:38:06.1709790Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1709940Z             auto tmp3 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1710085Z             auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(1));
2023-01-11T21:38:06.1710181Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1710276Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1710378Z             tmp4.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1710452Z         }
2023-01-11T21:38:06.1710558Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1710648Z         for(long i0=1000; i0<1000; i0+=1)
2023-01-11T21:38:06.1710717Z         {
2023-01-11T21:38:06.1710811Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1710906Z             auto tmp3 = in_ptr1[i0];
2023-01-11T21:38:06.1711017Z             auto tmp1 = static_cast<float>(1);
2023-01-11T21:38:06.1711112Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1711205Z             auto tmp4 = tmp2 + tmp3;
2023-01-11T21:38:06.1711288Z             out_ptr0[i0] = tmp4;
2023-01-11T21:38:06.1711356Z         }
2023-01-11T21:38:06.1711423Z     }
2023-01-11T21:38:06.1711490Z }
2023-01-11T21:38:06.1711580Z ''')
2023-01-11T21:38:06.1711586Z 
2023-01-11T21:38:06.1711590Z 
2023-01-11T21:38:06.1711691Z async_compile.wait(globals())
2023-01-11T21:38:06.1711772Z del async_compile
2023-01-11T21:38:06.1711777Z 
2023-01-11T21:38:06.1711849Z def call(args):
2023-01-11T21:38:06.1711963Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1712039Z     args.clear()
2023-01-11T21:38:06.1712269Z     buf0 = empty_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1712438Z     kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.1712512Z     del arg0_1
2023-01-11T21:38:06.1712584Z     del arg1_1
2023-01-11T21:38:06.1712652Z     return (buf0, )
2023-01-11T21:38:06.1712657Z 
2023-01-11T21:38:06.1712668Z 
2023-01-11T21:38:06.1712741Z if __name__ == "__main__":
2023-01-11T21:38:06.1712860Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.1712988Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.1713197Z     arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1713394Z     arg1_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.1713517Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.1713523Z 
2023-01-11T21:38:06.1713527Z 
2023-01-11T21:38:06.1713624Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.1713691Z import torch
2023-01-11T21:38:06.1713766Z import random
2023-01-11T21:38:06.1713884Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.1714012Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.1714017Z 
2023-01-11T21:38:06.1714100Z aten = torch.ops.aten
2023-01-11T21:38:06.1714236Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.1714331Z async_compile = AsyncCompile()
2023-01-11T21:38:06.1714336Z 
2023-01-11T21:38:06.1714409Z import triton
2023-01-11T21:38:06.1714494Z import triton.language as tl
2023-01-11T21:38:06.1714618Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.1714758Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.1714764Z 
2023-01-11T21:38:06.1714770Z 
2023-01-11T21:38:06.1714905Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.1715111Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.1715234Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.1715375Z                        const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.1715480Z                        float* __restrict__ out_ptr0)
2023-01-11T21:38:06.1715539Z {
2023-01-11T21:38:06.1715639Z     #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.1715704Z     {
2023-01-11T21:38:06.1715787Z         #pragma omp for
2023-01-11T21:38:06.1715874Z         for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.1715941Z         {
2023-01-11T21:38:06.1716082Z             auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0);
2023-01-11T21:38:06.1716212Z             auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.1716301Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1716400Z             tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.1716466Z         }
2023-01-11T21:38:06.1716568Z         #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.1716654Z         for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.1716720Z         {
2023-01-11T21:38:06.1716801Z             auto tmp0 = in_ptr0[i0];
2023-01-11T21:38:06.1716891Z             auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.1716982Z             auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.1717068Z             out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.1717134Z         }
2023-01-11T21:38:06.1717200Z     }
2023-01-11T21:38:06.1717257Z }
2023-01-11T21:38:06.1717342Z ''')
2023-01-11T21:38:06.1717347Z 
2023-01-11T21:38:06.1717352Z 
2023-01-11T21:38:06.1717444Z async_compile.wait(globals())
2023-01-11T21:38:06.1717522Z del async_compile
2023-01-11T21:38:06.1717527Z 
2023-01-11T21:38:06.1717603Z def call(args):
2023-01-11T21:38:06.1717684Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.1717759Z     args.clear()
2023-01-11T21:38:06.1718000Z     buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
dtype=torch.float32) 2023-01-11T21:38:06.1718158Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1718230Z del arg0_1 2023-01-11T21:38:06.1718306Z del arg1_1 2023-01-11T21:38:06.1718383Z return (buf0, ) 2023-01-11T21:38:06.1718388Z 2023-01-11T21:38:06.1718393Z 2023-01-11T21:38:06.1718473Z if __name__ == "__main__": 2023-01-11T21:38:06.1718590Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1718718Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1718912Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1719108Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1719227Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1719232Z 2023-01-11T21:38:06.1719237Z 2023-01-11T21:38:06.1719337Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1719412Z import torch 2023-01-11T21:38:06.1719487Z import random 2023-01-11T21:38:06.1719608Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1719731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1719737Z 2023-01-11T21:38:06.1719827Z aten = torch.ops.aten 2023-01-11T21:38:06.1719956Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1720053Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1720058Z 2023-01-11T21:38:06.1720133Z import triton 2023-01-11T21:38:06.1720225Z import triton.language as tl 2023-01-11T21:38:06.1720353Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1720493Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1720498Z 2023-01-11T21:38:06.1720503Z 2023-01-11T21:38:06.1720638Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1720843Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1720963Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1721074Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1721179Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1721245Z { 2023-01-11T21:38:06.1721377Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1721444Z { 2023-01-11T21:38:06.1721526Z #pragma omp for 2023-01-11T21:38:06.1721605Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1721672Z { 2023-01-11T21:38:06.1721809Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1721946Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1722081Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1722170Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1722260Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1722350Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1722416Z } 2023-01-11T21:38:06.1722515Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1722601Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.1722668Z { 2023-01-11T21:38:06.1722758Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1722846Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1722943Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1723031Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1723119Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1723205Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1723271Z } 2023-01-11T21:38:06.1723337Z } 2023-01-11T21:38:06.1723400Z } 2023-01-11T21:38:06.1723479Z ''') 2023-01-11T21:38:06.1723484Z 
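A note on the TypedStorage UserWarning that recurs throughout this run: test_torchinductor.py:246 reads the storage size through the deprecated tensor.storage() API, and the warning names the replacement itself. A minimal sketch of the suggested migration, assuming a PyTorch build where Tensor.untyped_storage() is available (the variable names here are illustrative only, not taken from the test):

    import torch

    x = torch.empty((10, 5, 20))

    # Deprecated spelling, as in the test helper above (triggers the warning):
    #     n = x.storage().size()
    # Suggested spelling: UntypedStorage is counted in bytes, so divide by
    # the per-element size to recover the element count.
    n = x.untyped_storage().nbytes() // x.element_size()
    buffer = torch.as_strided(x, (n,), (1,), 0).clone()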
2023-01-11T21:38:06.1723489Z 2023-01-11T21:38:06.1723581Z async_compile.wait(globals()) 2023-01-11T21:38:06.1723657Z del async_compile 2023-01-11T21:38:06.1723695Z 2023-01-11T21:38:06.1723773Z def call(args): 2023-01-11T21:38:06.1723853Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1723928Z args.clear() 2023-01-11T21:38:06.1724133Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1724297Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1724371Z del arg0_1 2023-01-11T21:38:06.1724443Z del arg1_1 2023-01-11T21:38:06.1724519Z return (buf0, ) 2023-01-11T21:38:06.1724525Z 2023-01-11T21:38:06.1724529Z 2023-01-11T21:38:06.1724612Z if __name__ == "__main__": 2023-01-11T21:38:06.1724730Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1724858Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1725051Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1725247Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1725370Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1725375Z 2023-01-11T21:38:06.1725639Z [2023-01-11 21:33:28,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 348 2023-01-11T21:38:06.1726053Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1726186Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1726442Z [2023-01-11 21:33:28,346] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 349 2023-01-11T21:38:06.1726705Z [2023-01-11 21:33:28,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 349 2023-01-11T21:38:06.1727190Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1727323Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1727578Z [2023-01-11 21:33:28,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 350 2023-01-11T21:38:06.1727839Z [2023-01-11 21:33:28,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 350 2023-01-11T21:38:06.1728247Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1728375Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1728630Z [2023-01-11 21:33:28,399] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 351 2023-01-11T21:38:06.1728636Z 2023-01-11T21:38:06.1728734Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1728808Z import torch 2023-01-11T21:38:06.1728882Z import random 2023-01-11T21:38:06.1729001Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1729125Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1729130Z 2023-01-11T21:38:06.1729212Z aten = torch.ops.aten 2023-01-11T21:38:06.1729341Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1729437Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1729442Z 2023-01-11T21:38:06.1729559Z import triton 2023-01-11T21:38:06.1729652Z import triton.language as tl 2023-01-11T21:38:06.1729778Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1729919Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1729925Z 2023-01-11T21:38:06.1729929Z 2023-01-11T21:38:06.1730068Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1730275Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1730392Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1730501Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1730604Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1730670Z { 2023-01-11T21:38:06.1730771Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1730837Z { 2023-01-11T21:38:06.1730919Z #pragma omp for 2023-01-11T21:38:06.1731000Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1731070Z { 2023-01-11T21:38:06.1731209Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1731346Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1731436Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1731535Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1731601Z } 2023-01-11T21:38:06.1731693Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1731783Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1731850Z { 2023-01-11T21:38:06.1731938Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1732025Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1732112Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1732197Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1732256Z } 2023-01-11T21:38:06.1732322Z } 2023-01-11T21:38:06.1732385Z } 2023-01-11T21:38:06.1732470Z ''') 2023-01-11T21:38:06.1732478Z 2023-01-11T21:38:06.1732484Z 2023-01-11T21:38:06.1732576Z async_compile.wait(globals()) 2023-01-11T21:38:06.1732653Z del async_compile 2023-01-11T21:38:06.1732658Z 2023-01-11T21:38:06.1732732Z def call(args): 2023-01-11T21:38:06.1732805Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1732880Z args.clear() 2023-01-11T21:38:06.1733143Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1733313Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1733388Z del arg0_1 2023-01-11T21:38:06.1733458Z del arg1_1 2023-01-11T21:38:06.1733533Z return (buf0, ) 2023-01-11T21:38:06.1733538Z 2023-01-11T21:38:06.1733543Z 2023-01-11T21:38:06.1733623Z if __name__ == 
"__main__": 2023-01-11T21:38:06.1733734Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1733861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1734070Z arg0_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1734297Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1734418Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1734423Z 2023-01-11T21:38:06.1734430Z 2023-01-11T21:38:06.1734685Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1734798Z import torch 2023-01-11T21:38:06.1734894Z import random 2023-01-11T21:38:06.1735008Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1735136Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1735142Z 2023-01-11T21:38:06.1735230Z aten = torch.ops.aten 2023-01-11T21:38:06.1735386Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1735508Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1735514Z 2023-01-11T21:38:06.1735590Z import triton 2023-01-11T21:38:06.1735741Z import triton.language as tl 2023-01-11T21:38:06.1735859Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1735999Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1736005Z 2023-01-11T21:38:06.1736009Z 2023-01-11T21:38:06.1736150Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1736359Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1736480Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1736589Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1736691Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1736755Z { 2023-01-11T21:38:06.1736849Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1736914Z { 2023-01-11T21:38:06.1736995Z #pragma omp for 2023-01-11T21:38:06.1737082Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1737211Z { 2023-01-11T21:38:06.1737373Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1737510Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1737644Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1737727Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1737818Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1737914Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1737985Z } 2023-01-11T21:38:06.1738084Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1738176Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1738243Z { 2023-01-11T21:38:06.1738324Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1738410Z auto tmp3 = in_ptr1[i0]; 2023-01-11T21:38:06.1738514Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1738602Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1738690Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1738778Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1738844Z } 2023-01-11T21:38:06.1738904Z } 2023-01-11T21:38:06.1738969Z } 2023-01-11T21:38:06.1739055Z ''') 2023-01-11T21:38:06.1739060Z 2023-01-11T21:38:06.1739064Z 2023-01-11T21:38:06.1739157Z async_compile.wait(globals()) 2023-01-11T21:38:06.1739275Z del async_compile 2023-01-11T21:38:06.1739280Z 2023-01-11T21:38:06.1739357Z def call(args): 2023-01-11T21:38:06.1739436Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1739504Z args.clear() 2023-01-11T21:38:06.1739734Z buf0 = empty_strided((10, 1, 
10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1739901Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1739974Z del arg0_1 2023-01-11T21:38:06.1740045Z del arg1_1 2023-01-11T21:38:06.1740120Z return (buf0, ) 2023-01-11T21:38:06.1740125Z 2023-01-11T21:38:06.1740133Z 2023-01-11T21:38:06.1740212Z if __name__ == "__main__": 2023-01-11T21:38:06.1740323Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1740453Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1740655Z arg0_1 = rand_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1740886Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1741006Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1741012Z 2023-01-11T21:38:06.1741016Z 2023-01-11T21:38:06.1741114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1741187Z import torch 2023-01-11T21:38:06.1741261Z import random 2023-01-11T21:38:06.1741374Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1741498Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1741503Z 2023-01-11T21:38:06.1741584Z aten = torch.ops.aten 2023-01-11T21:38:06.1741749Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1741849Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1741854Z 2023-01-11T21:38:06.1741930Z import triton 2023-01-11T21:38:06.1742026Z import triton.language as tl 2023-01-11T21:38:06.1742152Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1742289Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1742295Z 2023-01-11T21:38:06.1742305Z 2023-01-11T21:38:06.1742435Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1742643Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1742769Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1742881Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1742987Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1743055Z { 2023-01-11T21:38:06.1743158Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1743223Z { 2023-01-11T21:38:06.1743304Z #pragma omp for 2023-01-11T21:38:06.1743392Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1743461Z { 2023-01-11T21:38:06.1743603Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1743743Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1743834Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1743925Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1743994Z } 2023-01-11T21:38:06.1744096Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1744186Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1744257Z { 2023-01-11T21:38:06.1744347Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1744437Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.1744520Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1744609Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1744681Z } 2023-01-11T21:38:06.1744748Z } 2023-01-11T21:38:06.1744815Z } 2023-01-11T21:38:06.1744901Z ''') 2023-01-11T21:38:06.1744907Z 2023-01-11T21:38:06.1744913Z 2023-01-11T21:38:06.1745006Z async_compile.wait(globals()) 2023-01-11T21:38:06.1745079Z del async_compile 2023-01-11T21:38:06.1745084Z 
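Each module dumped in this log is a self-contained repro: alongside the compiled kernel it carries an `if __name__ == "__main__":` stanza that rebuilds the inputs with rand_strided and times the call() entry point with print_performance, so any one dump can in principle be copied into a file and run with plain `python`. A small sketch of the same helpers used by hand (illustrative; torch._dynamo and torch._inductor are private namespaces that may move between releases):

    import torch
    from torch._dynamo.testing import rand_strided

    # Rebuild an input exactly the way the dumped __main__ stanzas do:
    # a (50, 20) float32 CPU tensor with contiguous (20, 1) strides.
    arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32)
    print(arg0_1.shape, arg0_1.stride())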
2023-01-11T21:38:06.1745190Z def call(args): 2023-01-11T21:38:06.1745274Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1745352Z args.clear() 2023-01-11T21:38:06.1745597Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1745779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1745854Z del arg0_1 2023-01-11T21:38:06.1745921Z del arg1_1 2023-01-11T21:38:06.1745998Z return (buf0, ) 2023-01-11T21:38:06.1746003Z 2023-01-11T21:38:06.1746007Z 2023-01-11T21:38:06.1746090Z if __name__ == "__main__": 2023-01-11T21:38:06.1746211Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1746340Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1746539Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1746749Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1746873Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1746878Z 2023-01-11T21:38:06.1747136Z [2023-01-11 21:33:28,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 351 2023-01-11T21:38:06.1747150Z 2023-01-11T21:38:06.1747243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1747320Z import torch 2023-01-11T21:38:06.1747397Z import random 2023-01-11T21:38:06.1747517Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1747640Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1747645Z 2023-01-11T21:38:06.1747761Z aten = torch.ops.aten 2023-01-11T21:38:06.1747898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1747989Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1747994Z 2023-01-11T21:38:06.1748071Z import triton 2023-01-11T21:38:06.1748166Z import triton.language as tl 2023-01-11T21:38:06.1748295Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1748436Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1748441Z 2023-01-11T21:38:06.1748446Z 2023-01-11T21:38:06.1748586Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1748791Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1748915Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1749019Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1749124Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1749191Z { 2023-01-11T21:38:06.1749303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1749371Z { 2023-01-11T21:38:06.1749455Z #pragma omp for 2023-01-11T21:38:06.1749543Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1749605Z { 2023-01-11T21:38:06.1749743Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1749881Z auto tmp3 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.1750017Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1750109Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1750200Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1750299Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1750370Z } 2023-01-11T21:38:06.1750465Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1750553Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1750621Z { 2023-01-11T21:38:06.1750711Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1750803Z auto tmp3 = in_ptr1[i0]; 
2023-01-11T21:38:06.1750909Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1750994Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1751085Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1751173Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1751271Z } 2023-01-11T21:38:06.1751340Z } 2023-01-11T21:38:06.1751406Z } 2023-01-11T21:38:06.1751494Z ''') 2023-01-11T21:38:06.1751499Z 2023-01-11T21:38:06.1751504Z 2023-01-11T21:38:06.1751599Z async_compile.wait(globals()) 2023-01-11T21:38:06.1751672Z del async_compile 2023-01-11T21:38:06.1751678Z 2023-01-11T21:38:06.1751753Z def call(args): 2023-01-11T21:38:06.1751836Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1751913Z args.clear() 2023-01-11T21:38:06.1752123Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1752293Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1752370Z del arg0_1 2023-01-11T21:38:06.1752436Z del arg1_1 2023-01-11T21:38:06.1752515Z return (buf0, ) 2023-01-11T21:38:06.1752520Z 2023-01-11T21:38:06.1752524Z 2023-01-11T21:38:06.1752606Z if __name__ == "__main__": 2023-01-11T21:38:06.1752732Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1752861Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1753059Z arg0_1 = rand_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1753269Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1753390Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1753395Z 2023-01-11T21:38:06.1753462Z ok (17.852s) 2023-01-11T21:38:06.1753921Z test_views2_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1754084Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1754343Z [2023-01-11 21:33:28,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 352 2023-01-11T21:38:06.1754606Z [2023-01-11 21:33:30,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 352 2023-01-11T21:38:06.1755019Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1755157Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1755459Z [2023-01-11 21:33:30,103] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 353 2023-01-11T21:38:06.1755722Z [2023-01-11 21:33:31,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 353 2023-01-11T21:38:06.1756134Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1756265Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1756519Z [2023-01-11 21:33:31,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 354 2023-01-11T21:38:06.1756775Z [2023-01-11 21:33:33,456] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 354 2023-01-11T21:38:06.1757208Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1757341Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1757594Z [2023-01-11 21:33:33,474] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 355 2023-01-11T21:38:06.1757852Z [2023-01-11 21:33:35,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 355 2023-01-11T21:38:06.1758260Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1758396Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1758648Z [2023-01-11 21:33:35,153] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 356 2023-01-11T21:38:06.1758653Z 2023-01-11T21:38:06.1758751Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1758828Z import torch 2023-01-11T21:38:06.1758895Z import random 2023-01-11T21:38:06.1759016Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1759144Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1759149Z 2023-01-11T21:38:06.1759230Z aten = torch.ops.aten 2023-01-11T21:38:06.1759368Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1759494Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1759499Z 2023-01-11T21:38:06.1759573Z import triton 2023-01-11T21:38:06.1759664Z import triton.language as tl 2023-01-11T21:38:06.1759782Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1759923Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1759929Z 2023-01-11T21:38:06.1759933Z 2023-01-11T21:38:06.1760072Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1760279Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1760401Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1760505Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1760571Z { 2023-01-11T21:38:06.1760671Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1760732Z { 2023-01-11T21:38:06.1760814Z #pragma omp for 2023-01-11T21:38:06.1760900Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1760971Z { 2023-01-11T21:38:06.1761109Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1761248Z auto tmp1 = 
at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1761338Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1761430Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1761498Z } 2023-01-11T21:38:06.1761596Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1761683Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1761748Z { 2023-01-11T21:38:06.1761837Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1761944Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1762026Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1762111Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1762178Z } 2023-01-11T21:38:06.1762244Z } 2023-01-11T21:38:06.1762309Z } 2023-01-11T21:38:06.1762397Z ''') 2023-01-11T21:38:06.1762403Z 2023-01-11T21:38:06.1762407Z 2023-01-11T21:38:06.1762499Z async_compile.wait(globals()) 2023-01-11T21:38:06.1762569Z del async_compile 2023-01-11T21:38:06.1762574Z 2023-01-11T21:38:06.1762648Z def call(args): 2023-01-11T21:38:06.1762723Z arg0_1, = args 2023-01-11T21:38:06.1762798Z args.clear() 2023-01-11T21:38:06.1763021Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1763163Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1763234Z del arg0_1 2023-01-11T21:38:06.1763302Z return (buf0, ) 2023-01-11T21:38:06.1763307Z 2023-01-11T21:38:06.1763311Z 2023-01-11T21:38:06.1763391Z if __name__ == "__main__": 2023-01-11T21:38:06.1763511Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1763637Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1763846Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1763961Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1763966Z 2023-01-11T21:38:06.1763970Z 2023-01-11T21:38:06.1764066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1764141Z import torch 2023-01-11T21:38:06.1764208Z import random 2023-01-11T21:38:06.1764329Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1764451Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1764456Z 2023-01-11T21:38:06.1764537Z aten = torch.ops.aten 2023-01-11T21:38:06.1764671Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1764766Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1764772Z 2023-01-11T21:38:06.1764846Z import triton 2023-01-11T21:38:06.1764937Z import triton.language as tl 2023-01-11T21:38:06.1765055Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1765194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1765230Z 2023-01-11T21:38:06.1765235Z 2023-01-11T21:38:06.1765371Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1765577Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1765700Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1765806Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1765871Z { 2023-01-11T21:38:06.1765965Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1766031Z { 2023-01-11T21:38:06.1766111Z #pragma omp for 2023-01-11T21:38:06.1766198Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1766264Z { 2023-01-11T21:38:06.1766401Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1766537Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1766626Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1766752Z auto 
tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1766845Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1766940Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1767007Z } 2023-01-11T21:38:06.1767107Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1767195Z for(long i0=16; i0<16; i0+=1) 2023-01-11T21:38:06.1767263Z { 2023-01-11T21:38:06.1767344Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1767448Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1767536Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1767639Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1767726Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1767812Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1767879Z } 2023-01-11T21:38:06.1767937Z } 2023-01-11T21:38:06.1768002Z } 2023-01-11T21:38:06.1768088Z ''') 2023-01-11T21:38:06.1768093Z 2023-01-11T21:38:06.1768098Z 2023-01-11T21:38:06.1768193Z async_compile.wait(globals()) 2023-01-11T21:38:06.1768270Z del async_compile 2023-01-11T21:38:06.1768276Z 2023-01-11T21:38:06.1768350Z def call(args): 2023-01-11T21:38:06.1768425Z arg0_1, = args 2023-01-11T21:38:06.1768492Z args.clear() 2023-01-11T21:38:06.1768686Z buf0 = empty_strided((4, 4), (4, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1768854Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1768928Z del arg0_1 2023-01-11T21:38:06.1769003Z return (buf0, ) 2023-01-11T21:38:06.1769009Z 2023-01-11T21:38:06.1769013Z 2023-01-11T21:38:06.1769093Z if __name__ == "__main__": 2023-01-11T21:38:06.1769211Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1769335Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1769537Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1769649Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1769659Z 2023-01-11T21:38:06.1769663Z 2023-01-11T21:38:06.1769761Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1769838Z import torch 2023-01-11T21:38:06.1769913Z import random 2023-01-11T21:38:06.1770031Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1770156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1770162Z 2023-01-11T21:38:06.1770236Z aten = torch.ops.aten 2023-01-11T21:38:06.1770370Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1770464Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1770469Z 2023-01-11T21:38:06.1770542Z import triton 2023-01-11T21:38:06.1770636Z import triton.language as tl 2023-01-11T21:38:06.1770760Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1770900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1770905Z 2023-01-11T21:38:06.1770910Z 2023-01-11T21:38:06.1771045Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1771284Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1771409Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1771513Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1771578Z { 2023-01-11T21:38:06.1771680Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1771746Z { 2023-01-11T21:38:06.1771828Z #pragma omp for 2023-01-11T21:38:06.1771908Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1771973Z { 2023-01-11T21:38:06.1772111Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1772250Z auto tmp1 = 
at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1772340Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1772436Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1772503Z } 2023-01-11T21:38:06.1772604Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1772689Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1772756Z { 2023-01-11T21:38:06.1772845Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1772947Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1773035Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1773126Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1773193Z } 2023-01-11T21:38:06.1773252Z } 2023-01-11T21:38:06.1773315Z } 2023-01-11T21:38:06.1773400Z ''') 2023-01-11T21:38:06.1773405Z 2023-01-11T21:38:06.1773410Z 2023-01-11T21:38:06.1773502Z async_compile.wait(globals()) 2023-01-11T21:38:06.1773578Z del async_compile 2023-01-11T21:38:06.1773583Z 2023-01-11T21:38:06.1773659Z def call(args): 2023-01-11T21:38:06.1773733Z arg0_1, = args 2023-01-11T21:38:06.1773800Z args.clear() 2023-01-11T21:38:06.1774004Z buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1774144Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1774216Z del arg0_1 2023-01-11T21:38:06.1774292Z return (buf0, ) 2023-01-11T21:38:06.1774297Z 2023-01-11T21:38:06.1774302Z 2023-01-11T21:38:06.1774381Z if __name__ == "__main__": 2023-01-11T21:38:06.1774757Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1774919Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1775163Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1775277Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1775283Z 2023-01-11T21:38:06.1775287Z 2023-01-11T21:38:06.1775387Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1775465Z import torch 2023-01-11T21:38:06.1775541Z import random 2023-01-11T21:38:06.1775666Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1775792Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1775801Z 2023-01-11T21:38:06.1775880Z aten = torch.ops.aten 2023-01-11T21:38:06.1776017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1776114Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1776119Z 2023-01-11T21:38:06.1776194Z import triton 2023-01-11T21:38:06.1776292Z import triton.language as tl 2023-01-11T21:38:06.1776417Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1776558Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1776563Z 2023-01-11T21:38:06.1776568Z 2023-01-11T21:38:06.1776705Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1776906Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1777032Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1777196Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1777277Z { 2023-01-11T21:38:06.1777437Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1777504Z { 2023-01-11T21:38:06.1777586Z #pragma omp for 2023-01-11T21:38:06.1777666Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1777732Z { 2023-01-11T21:38:06.1777871Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1778012Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1778101Z auto tmp2 = tmp0 * tmp1; 
2023-01-11T21:38:06.1778234Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1778322Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1778420Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1778479Z } 2023-01-11T21:38:06.1778581Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1778671Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1778738Z { 2023-01-11T21:38:06.1778827Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1778933Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1779021Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1779117Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1779205Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1779296Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1779365Z } 2023-01-11T21:38:06.1779430Z } 2023-01-11T21:38:06.1779494Z } 2023-01-11T21:38:06.1779574Z ''') 2023-01-11T21:38:06.1779587Z 2023-01-11T21:38:06.1779592Z 2023-01-11T21:38:06.1779677Z async_compile.wait(globals()) 2023-01-11T21:38:06.1779754Z del async_compile 2023-01-11T21:38:06.1779759Z 2023-01-11T21:38:06.1779834Z def call(args): 2023-01-11T21:38:06.1779907Z arg0_1, = args 2023-01-11T21:38:06.1779981Z args.clear() 2023-01-11T21:38:06.1780182Z buf0 = empty_strided((10, 100), (100, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1780319Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1780387Z del arg0_1 2023-01-11T21:38:06.1780463Z return (buf0, ) 2023-01-11T21:38:06.1780468Z 2023-01-11T21:38:06.1780472Z 2023-01-11T21:38:06.1780552Z if __name__ == "__main__": 2023-01-11T21:38:06.1780671Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1780827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1781057Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1781172Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1781177Z 2023-01-11T21:38:06.1781440Z [2023-01-11 21:33:35,160] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 356 2023-01-11T21:38:06.1781851Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1781978Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1782235Z [2023-01-11 21:33:35,177] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 357 2023-01-11T21:38:06.1782497Z [2023-01-11 21:33:35,185] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 357 2023-01-11T21:38:06.1782503Z 2023-01-11T21:38:06.1782601Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1782675Z import torch 2023-01-11T21:38:06.1782750Z import random 2023-01-11T21:38:06.1782868Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1782990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1782995Z 2023-01-11T21:38:06.1783070Z aten = torch.ops.aten 2023-01-11T21:38:06.1783204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1783330Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1783335Z 2023-01-11T21:38:06.1783409Z import triton 2023-01-11T21:38:06.1783502Z import triton.language as tl 2023-01-11T21:38:06.1783626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1783771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1783776Z 2023-01-11T21:38:06.1783781Z 2023-01-11T21:38:06.1783917Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1784117Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1784239Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1784344Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1784408Z { 2023-01-11T21:38:06.1784509Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1784576Z { 2023-01-11T21:38:06.1784656Z #pragma omp for 2023-01-11T21:38:06.1784741Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1784807Z { 2023-01-11T21:38:06.1784945Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1785081Z auto tmp1 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1785171Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1785269Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1785341Z } 2023-01-11T21:38:06.1785454Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1785551Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1785625Z { 2023-01-11T21:38:06.1785730Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1785832Z auto tmp1 = static_cast(1); 2023-01-11T21:38:06.1785921Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1786005Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1786072Z } 2023-01-11T21:38:06.1786130Z } 2023-01-11T21:38:06.1786195Z } 2023-01-11T21:38:06.1786283Z ''') 2023-01-11T21:38:06.1786289Z 2023-01-11T21:38:06.1786293Z 2023-01-11T21:38:06.1786386Z async_compile.wait(globals()) 2023-01-11T21:38:06.1786463Z del async_compile 2023-01-11T21:38:06.1786468Z 2023-01-11T21:38:06.1786542Z def call(args): 2023-01-11T21:38:06.1786617Z arg0_1, = args 2023-01-11T21:38:06.1786684Z args.clear() 2023-01-11T21:38:06.1786922Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1787066Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1787140Z del arg0_1 2023-01-11T21:38:06.1787215Z return (buf0, ) 2023-01-11T21:38:06.1787221Z 2023-01-11T21:38:06.1787225Z 2023-01-11T21:38:06.1787306Z if __name__ == "__main__": 
2023-01-11T21:38:06.1787423Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1787541Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1787742Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1787857Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1787862Z 2023-01-11T21:38:06.1787866Z 2023-01-11T21:38:06.1787963Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1788036Z import torch 2023-01-11T21:38:06.1788111Z import random 2023-01-11T21:38:06.1788234Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1788357Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1788362Z 2023-01-11T21:38:06.1788437Z aten = torch.ops.aten 2023-01-11T21:38:06.1788572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1788665Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1788671Z 2023-01-11T21:38:06.1788744Z import triton 2023-01-11T21:38:06.1788837Z import triton.language as tl 2023-01-11T21:38:06.1788960Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1789099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1789131Z 2023-01-11T21:38:06.1789136Z 2023-01-11T21:38:06.1789272Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1789469Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1789593Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1789699Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1789765Z { 2023-01-11T21:38:06.1789866Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1789931Z { 2023-01-11T21:38:06.1790012Z #pragma omp for 2023-01-11T21:38:06.1790092Z for(long i0=0; i0<125; i0+=1) 2023-01-11T21:38:06.1790159Z { 2023-01-11T21:38:06.1790295Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1790432Z auto tmp1 = at::vec::Vectorized(static_cast(2)); 2023-01-11T21:38:06.1790520Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1790660Z auto tmp3 = at::vec::Vectorized(static_cast(1)); 2023-01-11T21:38:06.1790748Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1790843Z tmp4.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1790903Z } 2023-01-11T21:38:06.1791002Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1791096Z for(long i0=1000; i0<1000; i0+=1) 2023-01-11T21:38:06.1791163Z { 2023-01-11T21:38:06.1791252Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1791356Z auto tmp1 = static_cast(2); 2023-01-11T21:38:06.1791445Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.1791540Z auto tmp3 = static_cast(1); 2023-01-11T21:38:06.1791628Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.1791713Z out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.1791778Z } 2023-01-11T21:38:06.1791843Z } 2023-01-11T21:38:06.1791907Z } 2023-01-11T21:38:06.1791984Z ''') 2023-01-11T21:38:06.1791996Z 2023-01-11T21:38:06.1792004Z 2023-01-11T21:38:06.1792090Z async_compile.wait(globals()) 2023-01-11T21:38:06.1792166Z del async_compile 2023-01-11T21:38:06.1792171Z 2023-01-11T21:38:06.1792247Z def call(args): 2023-01-11T21:38:06.1792322Z arg0_1, = args 2023-01-11T21:38:06.1792397Z args.clear() 2023-01-11T21:38:06.1792634Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1792773Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1792839Z del arg0_1 2023-01-11T21:38:06.1792914Z return 
(buf0, ) 2023-01-11T21:38:06.1792919Z 2023-01-11T21:38:06.1792923Z 2023-01-11T21:38:06.1793003Z if __name__ == "__main__": 2023-01-11T21:38:06.1793120Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1793246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1793446Z arg0_1 = rand_strided((50, 20), (20, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1793562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1793567Z 2023-01-11T21:38:06.1793637Z ok (6.775s) 2023-01-11T21:38:06.1794088Z test_views3_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1794221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1794478Z [2023-01-11 21:33:35,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 358 2023-01-11T21:38:06.1794740Z [2023-01-11 21:33:36,954] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 358 2023-01-11T21:38:06.1794746Z 2023-01-11T21:38:06.1794844Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1794948Z import torch 2023-01-11T21:38:06.1795027Z import random 2023-01-11T21:38:06.1795146Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1795295Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1795301Z 2023-01-11T21:38:06.1795385Z aten = torch.ops.aten 2023-01-11T21:38:06.1795538Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1795637Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1795642Z 2023-01-11T21:38:06.1795716Z import triton 2023-01-11T21:38:06.1795811Z import triton.language as tl 2023-01-11T21:38:06.1795935Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1796073Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1796079Z 2023-01-11T21:38:06.1796084Z 2023-01-11T21:38:06.1796220Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1796418Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1796543Z extern "C" void kernel(const long* __restrict__ in_ptr0, 2023-01-11T21:38:06.1796653Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.1796757Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.1796825Z { 2023-01-11T21:38:06.1796930Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1796996Z { 2023-01-11T21:38:06.1797071Z #pragma omp for 2023-01-11T21:38:06.1797159Z for(long i0=0; i0<744; i0+=1) 2023-01-11T21:38:06.1797228Z { 2023-01-11T21:38:06.1797313Z #pragma GCC ivdep 2023-01-11T21:38:06.1797406Z for(long i1=0; i1<192; i1+=1) 2023-01-11T21:38:06.1797475Z { 2023-01-11T21:38:06.1797546Z { 2023-01-11T21:38:06.1797610Z { 2023-01-11T21:38:06.1797723Z auto tmp0 = in_ptr0[(3*i0) + (i1 / 64)]; 2023-01-11T21:38:06.1797838Z auto tmp1 = in_ptr1[(64*tmp0) + (i1 % 64)]; 2023-01-11T21:38:06.1797943Z out_ptr0[i1 + (192*i0)] = tmp1; 2023-01-11T21:38:06.1798013Z } 2023-01-11T21:38:06.1798081Z } 2023-01-11T21:38:06.1798150Z } 2023-01-11T21:38:06.1798209Z } 2023-01-11T21:38:06.1798275Z } 2023-01-11T21:38:06.1798339Z } 2023-01-11T21:38:06.1798430Z ''') 
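Most of the C++ kernels in this log share one shape: an 8-lane vectorized main loop over at::vec chunks, then a scalar tail loop, and the tail bounds frequently come out empty. For example, 1000 elements split into 125 full 8-wide iterations leave the degenerate tail `for(long i0=1000; i0<1000; i0+=1)`, while 10 elements leave a real tail over i0=8..9. A sketch of that trip-count arithmetic (illustrative only, not Inductor's own code):

    def split_loop(n, lanes=8):
        # The vectorized main loop runs n // lanes iterations over full
        # 8-lane chunks; the scalar tail covers whatever elements remain.
        main_iters = n // lanes
        tail_start = main_iters * lanes
        return main_iters, tail_start, n

    print(split_loop(1000))  # (125, 1000, 1000) -> tail loop is empty
    print(split_loop(10))    # (1, 8, 10)        -> tail covers i0 = 8, 9
    print(split_loop(16))    # (2, 16, 16)       -> tail loop is empty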
2023-01-11T21:38:06.1798464Z 2023-01-11T21:38:06.1798469Z 2023-01-11T21:38:06.1798564Z async_compile.wait(globals()) 2023-01-11T21:38:06.1798643Z del async_compile 2023-01-11T21:38:06.1798648Z 2023-01-11T21:38:06.1798724Z def call(args): 2023-01-11T21:38:06.1798796Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1798873Z args.clear() 2023-01-11T21:38:06.1799104Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1799272Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1799346Z del arg0_1 2023-01-11T21:38:06.1799421Z del arg1_1 2023-01-11T21:38:06.1799496Z return (buf0, ) 2023-01-11T21:38:06.1799501Z 2023-01-11T21:38:06.1799506Z 2023-01-11T21:38:06.1799585Z if __name__ == "__main__": 2023-01-11T21:38:06.1799696Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1799821Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1800022Z arg0_1 = rand_strided((64, 64), (64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1800216Z arg1_1 = rand_strided((2232, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1800338Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1800343Z 2023-01-11T21:38:06.1800415Z ok (1.769s) 2023-01-11T21:38:06.1800934Z test_zero_dim_reductions_cpu (__main__.CpuTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1801058Z warnings.warn( 2023-01-11T21:38:06.1801306Z [2023-01-11 21:33:37,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 359 2023-01-11T21:38:06.1801565Z [2023-01-11 21:33:38,661] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 359 2023-01-11T21:38:06.1801819Z [2023-01-11 21:33:38,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 360 2023-01-11T21:38:06.1802080Z [2023-01-11 21:33:38,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 360 2023-01-11T21:38:06.1802086Z 2023-01-11T21:38:06.1802184Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1802259Z import torch 2023-01-11T21:38:06.1802334Z import random 2023-01-11T21:38:06.1802454Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1802572Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1802584Z 2023-01-11T21:38:06.1802663Z aten = torch.ops.aten 2023-01-11T21:38:06.1802799Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1802896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1802901Z 2023-01-11T21:38:06.1802975Z import triton 2023-01-11T21:38:06.1803070Z import triton.language as tl 2023-01-11T21:38:06.1803198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1803337Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1803343Z 2023-01-11T21:38:06.1803347Z 2023-01-11T21:38:06.1803477Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1803679Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1803794Z extern "C" void kernel(bool* __restrict__ out_ptr0) 2023-01-11T21:38:06.1803860Z { 2023-01-11T21:38:06.1803942Z #pragma GCC 
ivdep 2023-01-11T21:38:06.1804027Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1804093Z { 2023-01-11T21:38:06.1804156Z { 2023-01-11T21:38:06.1804226Z { 2023-01-11T21:38:06.1804338Z auto tmp0 = static_cast(false); 2023-01-11T21:38:06.1804428Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1804515Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1804583Z } 2023-01-11T21:38:06.1804678Z } 2023-01-11T21:38:06.1804741Z } 2023-01-11T21:38:06.1804807Z } 2023-01-11T21:38:06.1804891Z ''') 2023-01-11T21:38:06.1804897Z 2023-01-11T21:38:06.1804901Z 2023-01-11T21:38:06.1804996Z async_compile.wait(globals()) 2023-01-11T21:38:06.1805073Z del async_compile 2023-01-11T21:38:06.1805079Z 2023-01-11T21:38:06.1805154Z def call(args): 2023-01-11T21:38:06.1805232Z arg0_1, = args 2023-01-11T21:38:06.1805300Z args.clear() 2023-01-11T21:38:06.1805491Z buf0 = empty_strided((2, 1), (1, 1), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1805599Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1805674Z return (buf0, ) 2023-01-11T21:38:06.1805682Z 2023-01-11T21:38:06.1805686Z 2023-01-11T21:38:06.1805767Z if __name__ == "__main__": 2023-01-11T21:38:06.1805882Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1806007Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1806205Z arg0_1 = rand_strided((2, 0), (1, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1806310Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1806315Z 2023-01-11T21:38:06.1806327Z 2023-01-11T21:38:06.1806417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1806491Z import torch 2023-01-11T21:38:06.1806565Z import random 2023-01-11T21:38:06.1806683Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1806805Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1806810Z 2023-01-11T21:38:06.1806891Z aten = torch.ops.aten 2023-01-11T21:38:06.1807025Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1807143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1807148Z 2023-01-11T21:38:06.1807222Z import triton 2023-01-11T21:38:06.1807314Z import triton.language as tl 2023-01-11T21:38:06.1807437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1807574Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1807582Z 2023-01-11T21:38:06.1807586Z 2023-01-11T21:38:06.1807721Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1807929Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1808044Z extern "C" void kernel(bool* __restrict__ out_ptr0) 2023-01-11T21:38:06.1808103Z { 2023-01-11T21:38:06.1808182Z #pragma GCC ivdep 2023-01-11T21:38:06.1808267Z for(long i0=0; i0<2; i0+=1) 2023-01-11T21:38:06.1808333Z { 2023-01-11T21:38:06.1808402Z { 2023-01-11T21:38:06.1808471Z { 2023-01-11T21:38:06.1808573Z auto tmp0 = static_cast(false); 2023-01-11T21:38:06.1808666Z auto tmp1 = tmp0 == 0; 2023-01-11T21:38:06.1808754Z out_ptr0[i0] = tmp1; 2023-01-11T21:38:06.1808822Z } 2023-01-11T21:38:06.1808889Z } 2023-01-11T21:38:06.1808955Z } 2023-01-11T21:38:06.1809019Z } 2023-01-11T21:38:06.1809096Z ''') 2023-01-11T21:38:06.1809101Z 2023-01-11T21:38:06.1809115Z 2023-01-11T21:38:06.1809202Z async_compile.wait(globals()) 2023-01-11T21:38:06.1809282Z del async_compile 2023-01-11T21:38:06.1809287Z 2023-01-11T21:38:06.1809361Z def call(args): 2023-01-11T21:38:06.1809435Z arg0_1, = args 
2023-01-11T21:38:06.1809511Z args.clear() 2023-01-11T21:38:06.1809700Z buf0 = empty_strided((2, ), (1, ), device='cpu', dtype=torch.bool) 2023-01-11T21:38:06.1809807Z kernel_cpp_0(c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1809876Z return (buf0, ) 2023-01-11T21:38:06.1809881Z 2023-01-11T21:38:06.1809886Z 2023-01-11T21:38:06.1809965Z if __name__ == "__main__": 2023-01-11T21:38:06.1810083Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1810208Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1810406Z arg0_1 = rand_strided((2, 0), (1, 1), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.1810518Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1810523Z 2023-01-11T21:38:06.1810624Z ok (1.770s) 2023-01-11T21:38:06.1811076Z test_zeros_cpu (__main__.CpuTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1811208Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1811458Z [2023-01-11 21:33:38,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 361 2023-01-11T21:38:06.1811722Z [2023-01-11 21:33:40,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 361 2023-01-11T21:38:06.1811728Z 2023-01-11T21:38:06.1811826Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1811900Z import torch 2023-01-11T21:38:06.1811974Z import random 2023-01-11T21:38:06.1812092Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1812214Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1812219Z 2023-01-11T21:38:06.1812300Z aten = torch.ops.aten 2023-01-11T21:38:06.1812428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1812523Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1812528Z 2023-01-11T21:38:06.1812603Z import triton 2023-01-11T21:38:06.1812695Z import triton.language as tl 2023-01-11T21:38:06.1812819Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1812960Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1813005Z 2023-01-11T21:38:06.1813010Z 2023-01-11T21:38:06.1813148Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1813352Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1813470Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1813575Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1813677Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.1813778Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.1813877Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.1813976Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.1814072Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.1814131Z { 2023-01-11T21:38:06.1814232Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1814299Z { 2023-01-11T21:38:06.1814384Z #pragma omp for 2023-01-11T21:38:06.1814470Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.1814661Z { 2023-01-11T21:38:06.1814803Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.1814934Z auto tmp1 =
at::vec::Vectorized<float>(static_cast<float>(1)); 2023-01-11T21:38:06.1815029Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1815125Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.1815220Z tmp2.store(out_ptr1 + 8*i0); 2023-01-11T21:38:06.1815287Z } 2023-01-11T21:38:06.1815385Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1815471Z for(long i0=8; i0<8; i0+=1) 2023-01-11T21:38:06.1815530Z { 2023-01-11T21:38:06.1815621Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.1815725Z auto tmp1 = static_cast<float>(1); 2023-01-11T21:38:06.1815814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1815899Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.1815988Z out_ptr1[i0] = tmp2; 2023-01-11T21:38:06.1816056Z } 2023-01-11T21:38:06.1816130Z #pragma omp for 2023-01-11T21:38:06.1816219Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.1816286Z { 2023-01-11T21:38:06.1816468Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:06.1816568Z tmp0.store(out_ptr2 + 8*i0); 2023-01-11T21:38:06.1816636Z } 2023-01-11T21:38:06.1816741Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1816833Z for(long i0=32768; i0<32768; i0+=1) 2023-01-11T21:38:06.1816899Z { 2023-01-11T21:38:06.1817008Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1817096Z out_ptr2[i0] = tmp0; 2023-01-11T21:38:06.1817252Z } 2023-01-11T21:38:06.1817343Z #pragma omp for 2023-01-11T21:38:06.1817431Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.1817490Z { 2023-01-11T21:38:06.1817630Z auto tmp0 = at::vec::Vectorized<float>(static_cast<float>(0)); 2023-01-11T21:38:06.1817726Z tmp0.store(out_ptr3 + 8*i0); 2023-01-11T21:38:06.1817795Z } 2023-01-11T21:38:06.1817893Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.1817986Z for(long i0=32768; i0<32768; i0+=1) 2023-01-11T21:38:06.1818056Z { 2023-01-11T21:38:06.1818151Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1818235Z out_ptr3[i0] = tmp0; 2023-01-11T21:38:06.1818302Z } 2023-01-11T21:38:06.1818386Z #pragma omp single 2023-01-11T21:38:06.1818454Z { 2023-01-11T21:38:06.1818539Z #pragma GCC ivdep 2023-01-11T21:38:06.1818619Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.1818687Z { 2023-01-11T21:38:06.1818755Z { 2023-01-11T21:38:06.1818826Z { 2023-01-11T21:38:06.1818938Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.1819125Z out_ptr4[i0] = tmp0; 2023-01-11T21:38:06.1819198Z } 2023-01-11T21:38:06.1819262Z } 2023-01-11T21:38:06.1819332Z } 2023-01-11T21:38:06.1819402Z } 2023-01-11T21:38:06.1819489Z #pragma omp single 2023-01-11T21:38:06.1819558Z { 2023-01-11T21:38:06.1819648Z #pragma GCC ivdep 2023-01-11T21:38:06.1819738Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.1819802Z { 2023-01-11T21:38:06.1819874Z { 2023-01-11T21:38:06.1819947Z { 2023-01-11T21:38:06.1820065Z auto tmp0 = static_cast<float>(3.1416); 2023-01-11T21:38:06.1820162Z out_ptr5[i0] = tmp0; 2023-01-11T21:38:06.1820236Z } 2023-01-11T21:38:06.1820310Z } 2023-01-11T21:38:06.1820373Z } 2023-01-11T21:38:06.1820442Z } 2023-01-11T21:38:06.1820513Z } 2023-01-11T21:38:06.1820580Z } 2023-01-11T21:38:06.1820677Z ''') 2023-01-11T21:38:06.1820683Z 2023-01-11T21:38:06.1820687Z 2023-01-11T21:38:06.1820783Z async_compile.wait(globals()) 2023-01-11T21:38:06.1820862Z del async_compile 2023-01-11T21:38:06.1820867Z 2023-01-11T21:38:06.1820936Z def call(args): 2023-01-11T21:38:06.1821012Z arg0_1, = args 2023-01-11T21:38:06.1821092Z args.clear() 2023-01-11T21:38:06.1821294Z buf0 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821488Z buf4 = empty_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821712Z buf1 =
empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1821934Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822121Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822315Z buf5 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1822579Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.1822655Z del arg0_1 2023-01-11T21:38:06.1822763Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.1822798Z 2023-01-11T21:38:06.1822804Z 2023-01-11T21:38:06.1822885Z if __name__ == "__main__": 2023-01-11T21:38:06.1823006Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1823131Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1823322Z arg0_1 = rand_strided((8, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.1823428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1823433Z 2023-01-11T21:38:06.1823504Z ok (1.841s) 2023-01-11T21:38:06.1824027Z test_accuracy_issue1 (__main__.CudaReproTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1824110Z warnings.warn( 2023-01-11T21:38:06.1824369Z [2023-01-11 21:33:41,585] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 362 2023-01-11T21:38:06.1824579Z [2023-01-11 21:33:41,613] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.1824782Z [2023-01-11 21:33:41,613] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.1825044Z [2023-01-11 21:33:42,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 362 2023-01-11T21:38:06.1825298Z [2023-01-11 21:33:42,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 362 2023-01-11T21:38:06.1825499Z [2023-01-11 21:33:42,232] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.1825533Z 2023-01-11T21:38:06.1825654Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1825733Z import torch 2023-01-11T21:38:06.1825826Z import random 2023-01-11T21:38:06.1825953Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1826075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1826080Z 2023-01-11T21:38:06.1826166Z aten = torch.ops.aten 2023-01-11T21:38:06.1826295Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1826391Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1826396Z 2023-01-11T21:38:06.1826470Z import triton 2023-01-11T21:38:06.1826562Z import triton.language as tl 2023-01-11T21:38:06.1826688Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1826828Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1826834Z 2023-01-11T21:38:06.1826838Z 2023-01-11T21:38:06.1827083Z triton_fused_addmm_amax_clone_exp_permute_split_squeeze_sub_sub_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.1827161Z import triton 2023-01-11T21:38:06.1827246Z import 
triton.language as tl 2023-01-11T21:38:06.1827363Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1827465Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1827597Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1827724Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1827729Z 2023-01-11T21:38:06.1827819Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.1827933Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.1828018Z filename=__file__, 2023-01-11T21:38:06.1828380Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.1828455Z @triton.jit 2023-01-11T21:38:06.1828624Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1828700Z xnumel = 1 2023-01-11T21:38:06.1828774Z rnumel = 128 2023-01-11T21:38:06.1828870Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1829005Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1829082Z xmask = xindex < xnumel 2023-01-11T21:38:06.1829228Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1829413Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.1829518Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1829609Z rindex = roffset + rbase 2023-01-11T21:38:06.1829695Z rmask = rindex < rnumel 2023-01-11T21:38:06.1829768Z r0 = rindex 2023-01-11T21:38:06.1829958Z tmp0 = tl.load(in_ptr0 + (2*r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1830090Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.1830206Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1830322Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1830428Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1830515Z rindex = roffset + rbase 2023-01-11T21:38:06.1830601Z rmask = rindex < rnumel 2023-01-11T21:38:06.1830675Z r0 = rindex 2023-01-11T21:38:06.1830864Z tmp2 = tl.load(in_ptr0 + (2*r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1830980Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.1831063Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.1831184Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.1831297Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1831400Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1831489Z rindex = roffset + rbase 2023-01-11T21:38:06.1831566Z rmask = rindex < rnumel 2023-01-11T21:38:06.1831639Z r0 = rindex 2023-01-11T21:38:06.1831774Z tmp6 = tl.load(in_ptr0 + (2*r0), rmask) 2023-01-11T21:38:06.1831889Z tmp7 = tmp6 - tmp1 2023-01-11T21:38:06.1831970Z tmp8 = tl.log(tmp5) 2023-01-11T21:38:06.1832083Z tmp9 = tmp7 - tmp8 2023-01-11T21:38:06.1832239Z tl.store(out_ptr2 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.1832317Z ''') 2023-01-11T21:38:06.1832322Z 2023-01-11T21:38:06.1832326Z 2023-01-11T21:38:06.1832604Z triton_fused_convert_element_type_div_gather_ne_neg_scalar_tensor_squeeze_1_sub_1_sum_2_sum_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.1832680Z import triton 2023-01-11T21:38:06.1832776Z import triton.language as tl 2023-01-11T21:38:06.1832890Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1832992Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.1833125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1833251Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1833259Z 2023-01-11T21:38:06.1833692Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i1', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.1833771Z @triton.jit 2023-01-11T21:38:06.1833932Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1834005Z xnumel = 1 2023-01-11T21:38:06.1834107Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1834238Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1834323Z xmask = xindex < xnumel 2023-01-11T21:38:06.1834559Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1834684Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1834758Z tmp1 = 0 2023-01-11T21:38:06.1834897Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.1834971Z tmp3 = 128 2023-01-11T21:38:06.1835105Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.1835268Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp1, tmp5, tmp1)) 2023-01-11T21:38:06.1835401Z tmp7 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 < tmp3, tmp6, tmp3)) 2023-01-11T21:38:06.1835475Z tmp8 = tmp7 != tmp3 2023-01-11T21:38:06.1835576Z tmp9 = tl.load(in_ptr1 + (tmp4), None) 2023-01-11T21:38:06.1835683Z tmp10 = -tmp9 2023-01-11T21:38:06.1835783Z tmp11 = tl.where(tmp8, tmp10, tmp1) 2023-01-11T21:38:06.1835871Z tmp12 = tmp8.to(tl.int64) 2023-01-11T21:38:06.1835963Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.1836044Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.1836171Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp4, None) 2023-01-11T21:38:06.1836306Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp8, None) 2023-01-11T21:38:06.1836443Z tl.store(out_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp14, None) 2023-01-11T21:38:06.1836530Z ''') 2023-01-11T21:38:06.1836536Z 2023-01-11T21:38:06.1836540Z 2023-01-11T21:38:06.1836634Z async_compile.wait(globals()) 2023-01-11T21:38:06.1836720Z del async_compile 2023-01-11T21:38:06.1836725Z 2023-01-11T21:38:06.1836801Z def call(args): 2023-01-11T21:38:06.1836920Z primals_1, primals_2, primals_3, primals_4 = args 2023-01-11T21:38:06.1836989Z args.clear() 2023-01-11T21:38:06.1837082Z with torch.cuda.device(0): 2023-01-11T21:38:06.1837287Z buf0 = empty_strided((128, 2), (2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1837481Z aten.addmm.out(primals_2, as_strided(primals_4, (128, 768), (768, 1)), as_strided(primals_1, (768, 2), (1, 768)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.1837560Z del primals_2 2023-01-11T21:38:06.1837765Z buf3 = empty_strided((1, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1837886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1838080Z triton_fused_addmm_amax_clone_exp_permute_split_squeeze_sub_sub_1_sum_1_0.run(buf0, buf3, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1838145Z del buf0 2023-01-11T21:38:06.1838345Z buf4 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.int64) 
2023-01-11T21:38:06.1838535Z buf5 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.1838721Z buf6 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1838951Z triton_fused_convert_element_type_div_gather_ne_neg_scalar_tensor_squeeze_1_sub_1_sum_2_sum_3_1.run(primals_3, buf3, buf4, buf5, buf6, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1839031Z del primals_3 2023-01-11T21:38:06.1839200Z return (buf6, as_strided(primals_4, (128, 768), (768, 1)), buf3, buf4, buf5, as_strided(primals_1, (2, 768), (768, 1)), ) 2023-01-11T21:38:06.1839209Z 2023-01-11T21:38:06.1839213Z 2023-01-11T21:38:06.1839296Z if __name__ == "__main__": 2023-01-11T21:38:06.1839408Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1839536Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1839750Z primals_1 = rand_strided((2, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1839954Z primals_2 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1840153Z primals_3 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1840378Z primals_4 = rand_strided((1, 128, 768), (98304, 768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1840533Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4])) 2023-01-11T21:38:06.1840538Z 2023-01-11T21:38:06.1840543Z 2023-01-11T21:38:06.1840643Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1840722Z import torch 2023-01-11T21:38:06.1840790Z import random 2023-01-11T21:38:06.1840911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1841036Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1841041Z 2023-01-11T21:38:06.1841127Z aten = torch.ops.aten 2023-01-11T21:38:06.1841291Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1841389Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1841394Z 2023-01-11T21:38:06.1841469Z import triton 2023-01-11T21:38:06.1841555Z import triton.language as tl 2023-01-11T21:38:06.1841680Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1841823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1841828Z 2023-01-11T21:38:06.1841833Z 2023-01-11T21:38:06.1842013Z triton_fused_scatter_zeros_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.1842088Z import triton 2023-01-11T21:38:06.1842181Z import triton.language as tl 2023-01-11T21:38:06.1842298Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1842400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1842525Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1842650Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1842656Z 2023-01-11T21:38:06.1843048Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1843128Z @triton.jit 2023-01-11T21:38:06.1843250Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1843324Z xnumel = 128 2023-01-11T21:38:06.1843422Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1843550Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1843626Z xmask = xindex < xnumel 
2023-01-11T21:38:06.1843728Z x0 = xindex 2023-01-11T21:38:06.1843800Z tmp0 = 0 2023-01-11T21:38:06.1843934Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.1844020Z ''') 2023-01-11T21:38:06.1844026Z 2023-01-11T21:38:06.1844030Z 2023-01-11T21:38:06.1844209Z triton_fused_scatter_zeros_like_1 = async_compile.triton(''' 2023-01-11T21:38:06.1844288Z import triton 2023-01-11T21:38:06.1844373Z import triton.language as tl 2023-01-11T21:38:06.1844488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1844590Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1844722Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1844846Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1844851Z 2023-01-11T21:38:06.1845256Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1845333Z @triton.jit 2023-01-11T21:38:06.1845488Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1845556Z xnumel = 1 2023-01-11T21:38:06.1845674Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1845811Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1845895Z xmask = xindex < xnumel 2023-01-11T21:38:06.1846128Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1846230Z tmp1 = -1.0 2023-01-11T21:38:06.1846334Z tl.store(out_ptr0 + (tmp0), tmp1, None) 2023-01-11T21:38:06.1846411Z ''') 2023-01-11T21:38:06.1846423Z 2023-01-11T21:38:06.1846428Z 2023-01-11T21:38:06.1846695Z triton_fused_cat_convert_element_type_div_1_mul_ne_2_scalar_tensor_scatter_sum_2_sum_4_where_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.1846775Z import triton 2023-01-11T21:38:06.1846873Z import triton.language as tl 2023-01-11T21:38:06.1846990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1847092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1847223Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1847348Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1847353Z 2023-01-11T21:38:06.1847471Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.1847582Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.1847669Z filename=__file__, 2023-01-11T21:38:06.1848088Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp32', 3: '*i1', 4: '*fp32', 5: '*fp32', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 7), equal_to_1=())]}) 2023-01-11T21:38:06.1848163Z @triton.jit 2023-01-11T21:38:06.1848361Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1848439Z xnumel = 1 2023-01-11T21:38:06.1848513Z rnumel = 128 2023-01-11T21:38:06.1848603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1848740Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1848826Z xmask = xindex < xnumel 2023-01-11T21:38:06.1848948Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1849194Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 
2023-01-11T21:38:06.1849439Z tmp4 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1849677Z tmp5 = tl.load(in_ptr3 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.1849800Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1849900Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1850019Z rindex = roffset + rbase 2023-01-11T21:38:06.1850109Z rmask = rindex < rnumel 2023-01-11T21:38:06.1850181Z r0 = rindex 2023-01-11T21:38:06.1850375Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.1850451Z tmp2 = 128 2023-01-11T21:38:06.1850538Z tmp3 = tmp1 != tmp2 2023-01-11T21:38:06.1850621Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.1850713Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.1850795Z tmp8 = tmp4 / tmp7 2023-01-11T21:38:06.1850868Z tmp9 = 0 2023-01-11T21:38:06.1850969Z tmp10 = tl.where(tmp3, tmp8, tmp9) 2023-01-11T21:38:06.1851052Z tmp11 = tmp0 * tmp10 2023-01-11T21:38:06.1851175Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.1851284Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1851424Z tmp14 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851569Z tmp17 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851706Z tmp18 = tl.load(in_ptr3 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.1851813Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1851902Z rindex = roffset + rbase 2023-01-11T21:38:06.1851991Z rmask = rindex < rnumel 2023-01-11T21:38:06.1852056Z r0 = rindex 2023-01-11T21:38:06.1852158Z tmp13 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.1852262Z tmp25 = tl.load(in_ptr4 + (r0), rmask) 2023-01-11T21:38:06.1852336Z tmp15 = 128 2023-01-11T21:38:06.1852420Z tmp16 = tmp14 != tmp15 2023-01-11T21:38:06.1852513Z tmp19 = tmp18.to(tl.int64) 2023-01-11T21:38:06.1852610Z tmp20 = tmp19.to(tl.float32) 2023-01-11T21:38:06.1852687Z tmp21 = tmp17 / tmp20 2023-01-11T21:38:06.1852758Z tmp22 = 0 2023-01-11T21:38:06.1852863Z tmp23 = tl.where(tmp16, tmp21, tmp22) 2023-01-11T21:38:06.1852949Z tmp24 = tmp13 * tmp23 2023-01-11T21:38:06.1853034Z tmp26 = tl.exp(tmp25) 2023-01-11T21:38:06.1853115Z tmp27 = tmp26 * tmp12 2023-01-11T21:38:06.1853233Z tmp28 = tmp24 - tmp27 2023-01-11T21:38:06.1853380Z tl.store(out_ptr1 + (2*r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp28, rmask & xmask) 2023-01-11T21:38:06.1853492Z ''') 2023-01-11T21:38:06.1853497Z 2023-01-11T21:38:06.1853502Z 2023-01-11T21:38:06.1853664Z triton_fused_zeros_3 = async_compile.triton(''' 2023-01-11T21:38:06.1853738Z import triton 2023-01-11T21:38:06.1853834Z import triton.language as tl 2023-01-11T21:38:06.1853948Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1854051Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1854184Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1854302Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1854307Z 2023-01-11T21:38:06.1854808Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(1,), equal_to_1=())]}) 2023-01-11T21:38:06.1854887Z @triton.jit 2023-01-11T21:38:06.1855010Z def triton_(out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.1855088Z xnumel = 128 2023-01-11T21:38:06.1855185Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1855313Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1855396Z xmask = xindex < xnumel 2023-01-11T21:38:06.1855460Z x0 = xindex 2023-01-11T21:38:06.1855531Z tmp0 = 0 2023-01-11T21:38:06.1855668Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.1855755Z ''') 2023-01-11T21:38:06.1855760Z 2023-01-11T21:38:06.1855764Z 2023-01-11T21:38:06.1855940Z triton_fused_cat_sum_5_view_2_4 = async_compile.triton(''' 2023-01-11T21:38:06.1856016Z import triton 2023-01-11T21:38:06.1856175Z import triton.language as tl 2023-01-11T21:38:06.1856289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1856397Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1856541Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.1856676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1856684Z 2023-01-11T21:38:06.1856777Z @reduction(size_hints=[2, 128], 2023-01-11T21:38:06.1856903Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.1856992Z filename=__file__, 2023-01-11T21:38:06.1857450Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.1857518Z @triton.jit 2023-01-11T21:38:06.1857686Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.1857763Z xnumel = 2 2023-01-11T21:38:06.1857836Z rnumel = 128 2023-01-11T21:38:06.1857934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1858070Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.1858152Z xmask = xindex < xnumel 2023-01-11T21:38:06.1858267Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.1858339Z x0 = xindex 2023-01-11T21:38:06.1858458Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.1858564Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.1858652Z rindex = roffset + rbase 2023-01-11T21:38:06.1858736Z rmask = rindex < rnumel 2023-01-11T21:38:06.1858807Z r1 = rindex 2023-01-11T21:38:06.1858918Z tmp0 = tl.load(in_ptr0 + (x0 + (2*r1)), rmask & xmask) 2023-01-11T21:38:06.1859039Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.1859153Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.1859257Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.1859342Z ''') 2023-01-11T21:38:06.1859348Z 2023-01-11T21:38:06.1859352Z 2023-01-11T21:38:06.1859444Z async_compile.wait(globals()) 2023-01-11T21:38:06.1859522Z del async_compile 2023-01-11T21:38:06.1859527Z 2023-01-11T21:38:06.1859602Z def call(args): 2023-01-11T21:38:06.1859763Z view, sub_1, unsqueeze, ne, permute_1, tangents_1 = args 2023-01-11T21:38:06.1859840Z args.clear() 2023-01-11T21:38:06.1859932Z with torch.cuda.device(0): 2023-01-11T21:38:06.1860140Z buf0 = empty_strided((1, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1860234Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1860381Z triton_fused_scatter_zeros_like_0.run(buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.1860538Z triton_fused_scatter_zeros_like_1.run(unsqueeze, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1860740Z buf5 = 
empty_strided((1, 128, 2), (256, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1860857Z buf3 = as_strided(buf5, (1, 128, 1), (256, 2, 1)) # alias 2023-01-11T21:38:06.1861098Z triton_fused_cat_convert_element_type_div_1_mul_ne_2_scalar_tensor_scatter_sum_2_sum_4_where_1_2.run(buf0, unsqueeze, tangents_1, ne, sub_1, buf3, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.1861171Z del buf0 2023-01-11T21:38:06.1861243Z del ne 2023-01-11T21:38:06.1861315Z del sub_1 2023-01-11T21:38:06.1861394Z del tangents_1 2023-01-11T21:38:06.1861469Z del unsqueeze 2023-01-11T21:38:06.1861578Z buf4 = as_strided(buf5, (1, 128, 1), (256, 2, 1), 1) # alias 2023-01-11T21:38:06.1861714Z triton_fused_zeros_3.run(buf4, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.1861789Z del buf3 2023-01-11T21:38:06.1861871Z del buf4 2023-01-11T21:38:06.1862077Z buf6 = empty_strided((128, 768), (768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1862239Z aten.mm.out(as_strided(buf5, (128, 2), (2, 1)), permute_1, out=buf6) 2023-01-11T21:38:06.1862318Z del permute_1 2023-01-11T21:38:06.1862515Z buf7 = empty_strided((2, 768), (768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1862645Z aten.mm.out(as_strided(buf5, (2, 128), (1, 2)), view, out=buf7) 2023-01-11T21:38:06.1862720Z del view 2023-01-11T21:38:06.1862918Z buf8 = empty_strided((1, 2), (2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1863066Z triton_fused_cat_sum_5_view_2_4.run(buf5, buf8, 2, 128, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.1863242Z return (as_strided(buf7, (2, 768), (768, 1)), as_strided(buf8, (2, ), (1, )), None, as_strided(buf6, (1, 128, 768), (98304, 768, 1)), ) 2023-01-11T21:38:06.1863248Z 2023-01-11T21:38:06.1863253Z 2023-01-11T21:38:06.1863333Z if __name__ == "__main__": 2023-01-11T21:38:06.1863451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1863574Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1863779Z view = rand_strided((128, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1863981Z sub_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864184Z unsqueeze = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.1864368Z ne = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.1864578Z permute_1 = rand_strided((2, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864772Z tangents_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1864941Z print_performance(lambda: call([view, sub_1, unsqueeze, ne, permute_1, tangents_1])) 2023-01-11T21:38:06.1865213Z [2023-01-11 21:33:42,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 362 2023-01-11T21:38:06.1865223Z 2023-01-11T21:38:06.1865287Z ok (2.011s) 2023-01-11T21:38:06.1865419Z test_autotune_inplace_kernel (__main__.CudaReproTests) 2023-01-11T21:38:06.1865616Z This UT tests autotune on an inplace kernel. The autotune should not contaminate ... ok (4.483s) 2023-01-11T21:38:06.1866007Z test_dtype_factory_issue (__main__.CudaReproTests) ... 
[2023-01-11 21:33:47,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:06.1866230Z [2023-01-11 21:33:47,084] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.randn 2023-01-11T21:38:06.1866490Z [2023-01-11 21:33:47,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:06.1866496Z 2023-01-11T21:38:06.1866595Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1866670Z import torch 2023-01-11T21:38:06.1866738Z import random 2023-01-11T21:38:06.1866857Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1866981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1866990Z 2023-01-11T21:38:06.1867074Z aten = torch.ops.aten 2023-01-11T21:38:06.1867210Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1867308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1867313Z 2023-01-11T21:38:06.1867389Z import triton 2023-01-11T21:38:06.1867484Z import triton.language as tl 2023-01-11T21:38:06.1867602Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1867742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1867748Z 2023-01-11T21:38:06.1867752Z 2023-01-11T21:38:06.1867845Z async_compile.wait(globals()) 2023-01-11T21:38:06.1867924Z del async_compile 2023-01-11T21:38:06.1867929Z 2023-01-11T21:38:06.1868006Z def call(args): 2023-01-11T21:38:06.1868098Z with torch.cuda.device(0): 2023-01-11T21:38:06.1868357Z buf0 = aten.randn([12, 64, 1, 64], dtype=torch.float32, device=device(type='cuda', index=0), pin_memory=False) 2023-01-11T21:38:06.1868464Z buf1 = buf0 2023-01-11T21:38:06.1868575Z assert_size_stride(buf1, (12, 64, 1, 64), (4096, 64, 64, 1)) 2023-01-11T21:38:06.1868646Z del buf0 2023-01-11T21:38:06.1868767Z return (as_strided(buf1, (12, 64, 1, 64, 1), (4096, 64, 64, 1, 1)), ) 2023-01-11T21:38:06.1868772Z 2023-01-11T21:38:06.1868777Z 2023-01-11T21:38:06.1868861Z if __name__ == "__main__": 2023-01-11T21:38:06.1868977Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1869101Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1869206Z print_performance(lambda: call([])) 2023-01-11T21:38:06.1869211Z 2023-01-11T21:38:06.1869282Z ok (0.024s) 2023-01-11T21:38:06.1869792Z test_dynamic_shapes (__main__.CudaReproTests) ... 
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.1869874Z warnings.warn( 2023-01-11T21:38:06.1870128Z [2023-01-11 21:33:47,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 363 2023-01-11T21:38:06.1870392Z [2023-01-11 21:33:48,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 363 2023-01-11T21:38:06.1870401Z 2023-01-11T21:38:06.1870501Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1870577Z import torch 2023-01-11T21:38:06.1870651Z import random 2023-01-11T21:38:06.1870769Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1870886Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1870891Z 2023-01-11T21:38:06.1870972Z aten = torch.ops.aten 2023-01-11T21:38:06.1871108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1871203Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1871208Z 2023-01-11T21:38:06.1871281Z import triton 2023-01-11T21:38:06.1871379Z import triton.language as tl 2023-01-11T21:38:06.1871504Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1871642Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1871647Z 2023-01-11T21:38:06.1871652Z 2023-01-11T21:38:06.1871781Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.1872015Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.1872140Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.1872245Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.1872332Z const long ks0) 2023-01-11T21:38:06.1872397Z { 2023-01-11T21:38:06.1872498Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.1872557Z { 2023-01-11T21:38:06.1872639Z #pragma omp for 2023-01-11T21:38:06.1872729Z for(long i0=0; i0(1); 2023-01-11T21:38:06.1947965Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.1948055Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.1948151Z } 2023-01-11T21:38:06.1948213Z } 2023-01-11T21:38:06.1948278Z } 2023-01-11T21:38:06.1948364Z ''') 2023-01-11T21:38:06.1948369Z 2023-01-11T21:38:06.1948373Z 2023-01-11T21:38:06.1948466Z async_compile.wait(globals()) 2023-01-11T21:38:06.1948545Z del async_compile 2023-01-11T21:38:06.1948551Z 2023-01-11T21:38:06.1948630Z def call(args): 2023-01-11T21:38:06.1948712Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.1948781Z args.clear() 2023-01-11T21:38:06.1948964Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1949103Z kernel_cpp_0(c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.1949176Z del arg1_1 2023-01-11T21:38:06.1949300Z return (as_strided(arg0_1, (12, 3, 512, 1, 64), (64, 196608, 768, 0, 1)), buf0, ) 2023-01-11T21:38:06.1949306Z 2023-01-11T21:38:06.1949310Z 2023-01-11T21:38:06.1949390Z if __name__ == "__main__": 2023-01-11T21:38:06.1949508Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1949637Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1949859Z arg0_1 = rand_strided((12, 3, 512, 64), (64, 196608, 768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1950042Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.1950167Z 
print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.1950172Z 2023-01-11T21:38:06.1950246Z ok (1.679s) 2023-01-11T21:38:06.1950700Z test_abs_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1950831Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1951091Z [2023-01-11 21:33:53,763] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 372 2023-01-11T21:38:06.1951355Z [2023-01-11 21:33:53,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 372 2023-01-11T21:38:06.1951801Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1951933Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1952190Z [2023-01-11 21:33:53,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 373 2023-01-11T21:38:06.1952446Z [2023-01-11 21:33:54,008] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 373 2023-01-11T21:38:06.1952454Z 2023-01-11T21:38:06.1952551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1952625Z import torch 2023-01-11T21:38:06.1952699Z import random 2023-01-11T21:38:06.1952819Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1952945Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1952950Z 2023-01-11T21:38:06.1953034Z aten = torch.ops.aten 2023-01-11T21:38:06.1953164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1953258Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1953264Z 2023-01-11T21:38:06.1953336Z import triton 2023-01-11T21:38:06.1953427Z import triton.language as tl 2023-01-11T21:38:06.1953552Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1953692Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1953697Z 2023-01-11T21:38:06.1953702Z 2023-01-11T21:38:06.1953855Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.1953956Z import triton 2023-01-11T21:38:06.1954041Z import triton.language as tl 2023-01-11T21:38:06.1954155Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1954255Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1954388Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1954514Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1954520Z 2023-01-11T21:38:06.1954923Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1954996Z @triton.jit 2023-01-11T21:38:06.1955145Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.1955219Z xnumel = 17 2023-01-11T21:38:06.1955335Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1955470Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1955556Z xmask = xindex < xnumel 2023-01-11T21:38:06.1955627Z x0 = xindex 2023-01-11T21:38:06.1955723Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1955803Z tmp1 = tl.abs(tmp0) 2023-01-11T21:38:06.1955866Z tmp2 = 1 2023-01-11T21:38:06.1955944Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1956025Z tmp4 = tmp0 / tmp3 2023-01-11T21:38:06.1956161Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1956246Z ''') 2023-01-11T21:38:06.1956251Z 2023-01-11T21:38:06.1956256Z 2023-01-11T21:38:06.1956349Z async_compile.wait(globals()) 2023-01-11T21:38:06.1956428Z del async_compile 2023-01-11T21:38:06.1956433Z 2023-01-11T21:38:06.1956507Z def call(args): 2023-01-11T21:38:06.1956573Z arg0_1, = args 2023-01-11T21:38:06.1956647Z args.clear() 2023-01-11T21:38:06.1956737Z with torch.cuda.device(0): 2023-01-11T21:38:06.1956933Z buf0 = empty_strided((17, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1957029Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1957164Z triton_fused_div_0.run(arg0_1, buf0, 17, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.1957237Z del arg0_1 2023-01-11T21:38:06.1957307Z return (buf0, ) 2023-01-11T21:38:06.1957312Z 2023-01-11T21:38:06.1957317Z 2023-01-11T21:38:06.1957426Z if __name__ == "__main__": 2023-01-11T21:38:06.1957549Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1957676Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1957875Z arg0_1 = rand_strided((17, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1957987Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1957992Z 2023-01-11T21:38:06.1957997Z 2023-01-11T21:38:06.1958093Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1958168Z import torch 2023-01-11T21:38:06.1958235Z import random 2023-01-11T21:38:06.1958353Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1958478Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1958483Z 2023-01-11T21:38:06.1958566Z aten = torch.ops.aten 2023-01-11T21:38:06.1958700Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1958793Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1958798Z 2023-01-11T21:38:06.1958874Z import triton 2023-01-11T21:38:06.1958960Z import triton.language as tl 2023-01-11T21:38:06.1959082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1959224Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1959229Z 2023-01-11T21:38:06.1959234Z 2023-01-11T21:38:06.1959385Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.1959458Z import triton 2023-01-11T21:38:06.1959550Z import triton.language as tl 2023-01-11T21:38:06.1959664Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1959765Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1959917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1960043Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1960048Z 2023-01-11T21:38:06.1960451Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.1960525Z @triton.jit 2023-01-11T21:38:06.1960657Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1960731Z xnumel = 17 2023-01-11T21:38:06.1960827Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1960957Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1961033Z xmask = xindex < xnumel 2023-01-11T21:38:06.1961104Z x0 = xindex 2023-01-11T21:38:06.1961223Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.1961304Z tmp1 = tl.abs(tmp0) 2023-01-11T21:38:06.1961380Z tmp2 = 1 2023-01-11T21:38:06.1961458Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.1961535Z tmp4 = tmp0 / tmp3 2023-01-11T21:38:06.1961666Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.1961751Z ''') 2023-01-11T21:38:06.1961756Z 2023-01-11T21:38:06.1961761Z 2023-01-11T21:38:06.1961855Z async_compile.wait(globals()) 2023-01-11T21:38:06.1961932Z del async_compile 2023-01-11T21:38:06.1961937Z 2023-01-11T21:38:06.1962012Z def call(args): 2023-01-11T21:38:06.1962086Z arg0_1, = args 2023-01-11T21:38:06.1962159Z args.clear() 2023-01-11T21:38:06.1962243Z with torch.cuda.device(0): 2023-01-11T21:38:06.1962439Z buf0 = empty_strided((17, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.1962532Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1962666Z triton_fused_div_0.run(arg0_1, buf0, 17, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.1962739Z del arg0_1 2023-01-11T21:38:06.1962820Z return (buf0, ) 2023-01-11T21:38:06.1962825Z 2023-01-11T21:38:06.1962830Z 2023-01-11T21:38:06.1962910Z if __name__ == "__main__": 2023-01-11T21:38:06.1963027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1963146Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1963369Z arg0_1 = rand_strided((17, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.1963482Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1963487Z 2023-01-11T21:38:06.1963562Z ok (0.261s) 2023-01-11T21:38:06.1964033Z test_adaptive_avg_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1964167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1964427Z [2023-01-11 21:33:54,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 374 2023-01-11T21:38:06.1964673Z [2023-01-11 21:33:54,045] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:06.1964679Z 2023-01-11T21:38:06.1964774Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1964841Z import torch 2023-01-11T21:38:06.1964917Z import random 2023-01-11T21:38:06.1965036Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1965157Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1965163Z 2023-01-11T21:38:06.1965244Z aten = torch.ops.aten 2023-01-11T21:38:06.1965380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1965474Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1965479Z 2023-01-11T21:38:06.1965555Z import triton 2023-01-11T21:38:06.1965669Z import triton.language as tl 2023-01-11T21:38:06.1965793Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1965931Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1965936Z 2023-01-11T21:38:06.1965941Z 2023-01-11T21:38:06.1966129Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.1966205Z import triton 2023-01-11T21:38:06.1966297Z import triton.language as tl 2023-01-11T21:38:06.1966409Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1966510Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1966635Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1966761Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1966766Z 2023-01-11T21:38:06.1967170Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1967247Z @triton.jit 2023-01-11T21:38:06.1967377Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1967451Z xnumel = 288 2023-01-11T21:38:06.1967549Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1967681Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1967757Z xmask = xindex < xnumel 2023-01-11T21:38:06.1967839Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.1967914Z x0 = xindex % 6 2023-01-11T21:38:06.1967994Z x2 = (xindex // 36) 2023-01-11T21:38:06.1968065Z x4 = xindex 2023-01-11T21:38:06.1968147Z tmp0 = ((8*x1) // 3) 2023-01-11T21:38:06.1968223Z tmp1 = ((21 + (16*x1)) // 6) 2023-01-11T21:38:06.1968301Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.1968377Z tmp3 = ((8*x0) // 3) 2023-01-11T21:38:06.1968460Z tmp4 = ((21 + (16*x0)) // 6) 2023-01-11T21:38:06.1968539Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.1968623Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.1968934Z tmp7 = tl.load(in_ptr0 + ((16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1969023Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.1969106Z tmp9 = 1 + (((8*x0) // 3)) 2023-01-11T21:38:06.1969212Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.1969295Z tmp11 = tmp2 & tmp10 
2023-01-11T21:38:06.1969616Z tmp12 = tl.load(in_ptr0 + (1 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1969712Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.1969793Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.1969877Z tmp15 = 2 + (((8*x0) // 3)) 2023-01-11T21:38:06.1969950Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.1970031Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.1970344Z tmp18 = tl.load(in_ptr0 + (2 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1970445Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.1970528Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.1970611Z tmp21 = 3 + (((8*x0) // 3)) 2023-01-11T21:38:06.1970693Z tmp22 = tmp21 < tmp4 2023-01-11T21:38:06.1970766Z tmp23 = tmp2 & tmp22 2023-01-11T21:38:06.1971077Z tmp24 = tl.load(in_ptr0 + (3 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1971172Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.1971255Z tmp26 = tmp25 + tmp20 2023-01-11T21:38:06.1971335Z tmp27 = 1 + (((8*x1) // 3)) 2023-01-11T21:38:06.1971416Z tmp28 = tmp27 < tmp1 2023-01-11T21:38:06.1971496Z tmp29 = tmp28 & tmp5 2023-01-11T21:38:06.1971805Z tmp30 = tl.load(in_ptr0 + (16 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp29 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1971927Z tmp31 = tl.where(tmp29, tmp30, 0.0) 2023-01-11T21:38:06.1972009Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.1972093Z tmp33 = tmp28 & tmp10 2023-01-11T21:38:06.1972408Z tmp34 = tl.load(in_ptr0 + (17 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1972504Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.1972585Z tmp36 = tmp35 + tmp32 2023-01-11T21:38:06.1972668Z tmp37 = tmp28 & tmp16 2023-01-11T21:38:06.1972972Z tmp38 = tl.load(in_ptr0 + (18 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1973066Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.1973147Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.1973228Z tmp41 = tmp28 & tmp22 2023-01-11T21:38:06.1973539Z tmp42 = tl.load(in_ptr0 + (19 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1973635Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.1973715Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.1973798Z tmp45 = 2 + (((8*x1) // 3)) 2023-01-11T21:38:06.1973872Z tmp46 = tmp45 < tmp1 2023-01-11T21:38:06.1973954Z tmp47 = tmp46 & tmp5 2023-01-11T21:38:06.1974261Z tmp48 = tl.load(in_ptr0 + (32 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp47 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1974357Z tmp49 = tl.where(tmp47, tmp48, 0.0) 2023-01-11T21:38:06.1974438Z tmp50 = tmp49 + tmp44 2023-01-11T21:38:06.1974637Z tmp51 = tmp46 & tmp10 2023-01-11T21:38:06.1974944Z tmp52 = tl.load(in_ptr0 + (33 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp51 & xmask, eviction_policy='evict_last', other=0) 
2023-01-11T21:38:06.1975036Z tmp53 = tl.where(tmp51, tmp52, 0.0) 2023-01-11T21:38:06.1975117Z tmp54 = tmp53 + tmp50 2023-01-11T21:38:06.1975211Z tmp55 = tmp46 & tmp16 2023-01-11T21:38:06.1975603Z tmp56 = tl.load(in_ptr0 + (34 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp55 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1975704Z tmp57 = tl.where(tmp55, tmp56, 0.0) 2023-01-11T21:38:06.1975787Z tmp58 = tmp57 + tmp54 2023-01-11T21:38:06.1975869Z tmp59 = tmp46 & tmp22 2023-01-11T21:38:06.1976179Z tmp60 = tl.load(in_ptr0 + (35 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp59 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1976270Z tmp61 = tl.where(tmp59, tmp60, 0.0) 2023-01-11T21:38:06.1976354Z tmp62 = tmp61 + tmp58 2023-01-11T21:38:06.1976437Z tmp63 = 3 + (((8*x1) // 3)) 2023-01-11T21:38:06.1976523Z tmp64 = tmp63 < tmp1 2023-01-11T21:38:06.1976606Z tmp65 = tmp64 & tmp5 2023-01-11T21:38:06.1976920Z tmp66 = tl.load(in_ptr0 + (48 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp65 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1977019Z tmp67 = tl.where(tmp65, tmp66, 0.0) 2023-01-11T21:38:06.1977095Z tmp68 = tmp67 + tmp62 2023-01-11T21:38:06.1977245Z tmp69 = tmp64 & tmp10 2023-01-11T21:38:06.1977557Z tmp70 = tl.load(in_ptr0 + (49 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp69 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1977656Z tmp71 = tl.where(tmp69, tmp70, 0.0) 2023-01-11T21:38:06.1977739Z tmp72 = tmp71 + tmp68 2023-01-11T21:38:06.1977823Z tmp73 = tmp64 & tmp16 2023-01-11T21:38:06.1978133Z tmp74 = tl.load(in_ptr0 + (50 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp73 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1978271Z tmp75 = tl.where(tmp73, tmp74, 0.0) 2023-01-11T21:38:06.1978345Z tmp76 = tmp75 + tmp72 2023-01-11T21:38:06.1978425Z tmp77 = tmp64 & tmp22 2023-01-11T21:38:06.1978729Z tmp78 = tl.load(in_ptr0 + (51 + (16*(((8*x1) // 3))) + (256*x2) + (((8*x0) // 3)) + tl.zeros([XBLOCK], tl.int32)), tmp77 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.1978831Z tmp79 = tl.where(tmp77, tmp78, 0.0) 2023-01-11T21:38:06.1978910Z tmp80 = tmp79 + tmp76 2023-01-11T21:38:06.1978983Z tmp81 = 1 2023-01-11T21:38:06.1979079Z tmp82 = tl.where(tmp6, tmp81, 0.0) 2023-01-11T21:38:06.1979144Z tmp83 = 1 2023-01-11T21:38:06.1979238Z tmp84 = tl.where(tmp11, tmp83, 0.0) 2023-01-11T21:38:06.1979318Z tmp85 = tmp84 + tmp82 2023-01-11T21:38:06.1979388Z tmp86 = 1 2023-01-11T21:38:06.1979481Z tmp87 = tl.where(tmp17, tmp86, 0.0) 2023-01-11T21:38:06.1979562Z tmp88 = tmp87 + tmp85 2023-01-11T21:38:06.1979634Z tmp89 = 1 2023-01-11T21:38:06.1979722Z tmp90 = tl.where(tmp23, tmp89, 0.0) 2023-01-11T21:38:06.1979801Z tmp91 = tmp90 + tmp88 2023-01-11T21:38:06.1979874Z tmp92 = 1 2023-01-11T21:38:06.1979966Z tmp93 = tl.where(tmp29, tmp92, 0.0) 2023-01-11T21:38:06.1980044Z tmp94 = tmp93 + tmp91 2023-01-11T21:38:06.1980115Z tmp95 = 1 2023-01-11T21:38:06.1980210Z tmp96 = tl.where(tmp33, tmp95, 0.0) 2023-01-11T21:38:06.1980282Z tmp97 = tmp96 + tmp94 2023-01-11T21:38:06.1980353Z tmp98 = 1 2023-01-11T21:38:06.1980445Z tmp99 = tl.where(tmp37, tmp98, 0.0) 2023-01-11T21:38:06.1980528Z tmp100 = tmp99 + tmp97 2023-01-11T21:38:06.1980600Z tmp101 = 1 2023-01-11T21:38:06.1980702Z tmp102 = tl.where(tmp41, tmp101, 0.0) 
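    # The chain of masked constants 1 accumulates, per output element, the
    # count of valid input cells in its pooling window; the kernel ends by
    # storing sum / count, i.e. the window average.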
2023-01-11T21:38:06.1980779Z tmp103 = tmp102 + tmp100 2023-01-11T21:38:06.1980851Z tmp104 = 1 2023-01-11T21:38:06.1980950Z tmp105 = tl.where(tmp47, tmp104, 0.0) 2023-01-11T21:38:06.1981031Z tmp106 = tmp105 + tmp103 2023-01-11T21:38:06.1981102Z tmp107 = 1 2023-01-11T21:38:06.1981203Z tmp108 = tl.where(tmp51, tmp107, 0.0) 2023-01-11T21:38:06.1981284Z tmp109 = tmp108 + tmp106 2023-01-11T21:38:06.1981349Z tmp110 = 1 2023-01-11T21:38:06.1981446Z tmp111 = tl.where(tmp55, tmp110, 0.0) 2023-01-11T21:38:06.1981528Z tmp112 = tmp111 + tmp109 2023-01-11T21:38:06.1981601Z tmp113 = 1 2023-01-11T21:38:06.1981727Z tmp114 = tl.where(tmp59, tmp113, 0.0) 2023-01-11T21:38:06.1981811Z tmp115 = tmp114 + tmp112 2023-01-11T21:38:06.1981883Z tmp116 = 1 2023-01-11T21:38:06.1981971Z tmp117 = tl.where(tmp65, tmp116, 0.0) 2023-01-11T21:38:06.1982053Z tmp118 = tmp117 + tmp115 2023-01-11T21:38:06.1982125Z tmp119 = 1 2023-01-11T21:38:06.1982218Z tmp120 = tl.where(tmp69, tmp119, 0.0) 2023-01-11T21:38:06.1988990Z tmp121 = tmp120 + tmp118 2023-01-11T21:38:06.1989084Z tmp122 = 1 2023-01-11T21:38:06.1989188Z tmp123 = tl.where(tmp73, tmp122, 0.0) 2023-01-11T21:38:06.1989275Z tmp124 = tmp123 + tmp121 2023-01-11T21:38:06.1989360Z tmp125 = 1 2023-01-11T21:38:06.1989454Z tmp126 = tl.where(tmp77, tmp125, 0.0) 2023-01-11T21:38:06.1989541Z tmp127 = tmp126 + tmp124 2023-01-11T21:38:06.1989628Z tmp128 = tmp80 / tmp127 2023-01-11T21:38:06.1989767Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp128, xmask) 2023-01-11T21:38:06.1989882Z ''') 2023-01-11T21:38:06.1989892Z 2023-01-11T21:38:06.1989897Z 2023-01-11T21:38:06.1990057Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.1990136Z import triton 2023-01-11T21:38:06.1990225Z import triton.language as tl 2023-01-11T21:38:06.1990347Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1990452Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1990585Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1990717Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1990723Z 2023-01-11T21:38:06.1991129Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1991277Z @triton.jit 2023-01-11T21:38:06.1991406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1991482Z xnumel = 2048 2023-01-11T21:38:06.1991583Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.1991714Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.1991798Z xmask = xindex < xnumel 2023-01-11T21:38:06.1991870Z x0 = xindex 2023-01-11T21:38:06.1991972Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.1992036Z tmp1 = 1 2023-01-11T21:38:06.1992115Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.1992249Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.1992334Z ''') 2023-01-11T21:38:06.1992340Z 2023-01-11T21:38:06.1992344Z 2023-01-11T21:38:06.1992437Z async_compile.wait(globals()) 2023-01-11T21:38:06.1992516Z del async_compile 2023-01-11T21:38:06.1992521Z 2023-01-11T21:38:06.1992595Z def call(args): 2023-01-11T21:38:06.1992662Z arg0_1, = args 2023-01-11T21:38:06.1992737Z args.clear() 2023-01-11T21:38:06.1992830Z with torch.cuda.device(0): 2023-01-11T21:38:06.1993053Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1993146Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.1993304Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.1993528Z buf1 = empty_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.1993667Z triton_fused_add_1.run(arg0_1, buf1, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.1993734Z del arg0_1 2023-01-11T21:38:06.1993851Z buf2 = aten._adaptive_avg_pool2d(buf1, [2, 5]) 2023-01-11T21:38:06.1993923Z del buf1 2023-01-11T21:38:06.1993997Z buf3 = buf2 2023-01-11T21:38:06.1994111Z assert_size_stride(buf3, (2, 4, 2, 5), (40, 10, 5, 1)) 2023-01-11T21:38:06.1994181Z del buf2 2023-01-11T21:38:06.1994262Z return (buf0, buf3, ) 2023-01-11T21:38:06.1994267Z 2023-01-11T21:38:06.1994272Z 2023-01-11T21:38:06.1994345Z if __name__ == "__main__": 2023-01-11T21:38:06.1994491Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.1994619Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.1994847Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.1994961Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.1995233Z [2023-01-11 21:33:54,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 374 2023-01-11T21:38:06.1995653Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1995789Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1996048Z [2023-01-11 21:33:54,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 375 2023-01-11T21:38:06.1996310Z [2023-01-11 21:33:55,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 375 2023-01-11T21:38:06.1996717Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.1996878Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.1997133Z [2023-01-11 21:33:55,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 376 2023-01-11T21:38:06.1997138Z 2023-01-11T21:38:06.1997143Z 2023-01-11T21:38:06.1997241Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.1997318Z import torch 2023-01-11T21:38:06.1997392Z import random 2023-01-11T21:38:06.1997511Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.1997634Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.1997641Z 2023-01-11T21:38:06.1997723Z aten = torch.ops.aten 2023-01-11T21:38:06.1997853Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.1997948Z async_compile = AsyncCompile() 2023-01-11T21:38:06.1997953Z 2023-01-11T21:38:06.1998027Z import triton 2023-01-11T21:38:06.1998120Z import triton.language as tl 2023-01-11T21:38:06.1998244Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.1998386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.1998391Z 2023-01-11T21:38:06.1998396Z 2023-01-11T21:38:06.1998581Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.1998656Z import triton 2023-01-11T21:38:06.1998741Z import triton.language as tl 2023-01-11T21:38:06.1998857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.1998960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.1999096Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.1999223Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.1999228Z 2023-01-11T21:38:06.1999634Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.1999709Z @triton.jit 2023-01-11T21:38:06.1999842Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.1999910Z xnumel = 288 2023-01-11T21:38:06.2000006Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2000133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2000217Z xmask = xindex < xnumel 2023-01-11T21:38:06.2000326Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.2000402Z x0 = xindex % 6 2023-01-11T21:38:06.2000481Z x2 = (xindex // 36) 2023-01-11T21:38:06.2000544Z x4 = xindex 2023-01-11T21:38:06.2000619Z tmp0 = (x1 // 2) 2023-01-11T21:38:06.2000701Z tmp1 = ((8 + (3*x1)) // 6) 2023-01-11T21:38:06.2000780Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2000853Z tmp3 = (x0 // 2) 2023-01-11T21:38:06.2000934Z tmp4 = ((8 + (3*x0)) // 6) 2023-01-11T21:38:06.2001006Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2001085Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2001381Z tmp7 = tl.load(in_ptr0 + ((3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2001483Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.2001562Z tmp9 = 1 + (x0 // 2) 2023-01-11T21:38:06.2001642Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.2001723Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.2002029Z tmp12 = tl.load(in_ptr0 + (1 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, 
eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2002119Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.2002199Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.2002277Z tmp15 = 1 + (x1 // 2) 2023-01-11T21:38:06.2002356Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.2002436Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.2002736Z tmp18 = tl.load(in_ptr0 + (3 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2002833Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2002937Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.2003018Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.2003306Z tmp22 = tl.load(in_ptr0 + (4 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.2003405Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.2003489Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.2003562Z tmp25 = 1 2023-01-11T21:38:06.2003659Z tmp26 = tl.where(tmp6, tmp25, 0.0) 2023-01-11T21:38:06.2003724Z tmp27 = 1 2023-01-11T21:38:06.2003816Z tmp28 = tl.where(tmp11, tmp27, 0.0) 2023-01-11T21:38:06.2003900Z tmp29 = tmp28 + tmp26 2023-01-11T21:38:06.2003971Z tmp30 = 1 2023-01-11T21:38:06.2004063Z tmp31 = tl.where(tmp17, tmp30, 0.0) 2023-01-11T21:38:06.2004143Z tmp32 = tmp31 + tmp29 2023-01-11T21:38:06.2004216Z tmp33 = 1 2023-01-11T21:38:06.2004301Z tmp34 = tl.where(tmp21, tmp33, 0.0) 2023-01-11T21:38:06.2004381Z tmp35 = tmp34 + tmp32 2023-01-11T21:38:06.2004463Z tmp36 = tmp24 / tmp35 2023-01-11T21:38:06.2004598Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp36, xmask) 2023-01-11T21:38:06.2004684Z ''') 2023-01-11T21:38:06.2004690Z 2023-01-11T21:38:06.2004694Z 2023-01-11T21:38:06.2004884Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2004962Z import triton 2023-01-11T21:38:06.2005048Z import triton.language as tl 2023-01-11T21:38:06.2005164Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2005270Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2005402Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2005527Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2005532Z 2023-01-11T21:38:06.2005933Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2006007Z @triton.jit 2023-01-11T21:38:06.2006138Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2006205Z xnumel = 80 2023-01-11T21:38:06.2006300Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2006455Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2006543Z xmask = xindex < xnumel 2023-01-11T21:38:06.2006624Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2006696Z x0 = xindex % 5 2023-01-11T21:38:06.2006774Z x2 = (xindex // 10) 2023-01-11T21:38:06.2006837Z x4 = xindex 2023-01-11T21:38:06.2006918Z tmp0 = ((3*x1) // 2) 2023-01-11T21:38:06.2006999Z tmp1 = 2 + (((3*x1) // 2)) 2023-01-11T21:38:06.2007077Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2007153Z tmp3 = ((3*x0) // 5) 2023-01-11T21:38:06.2007234Z tmp4 = ((7 + (3*x0)) // 5) 2023-01-11T21:38:06.2007313Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2007383Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2007556Z tmp7 = tl.load(in_ptr0 + 
((3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.2007626Z tmp8 = 1 2023-01-11T21:38:06.2007704Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2007799Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2007882Z tmp11 = 1 + (((3*x0) // 5)) 2023-01-11T21:38:06.2007962Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2008035Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2008208Z tmp14 = tl.load(in_ptr0 + (1 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0) 2023-01-11T21:38:06.2008280Z tmp15 = 1 2023-01-11T21:38:06.2008360Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2008454Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2008535Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2008615Z tmp19 = 1 + (((3*x1) // 2)) 2023-01-11T21:38:06.2008688Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2008795Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2008960Z tmp22 = tl.load(in_ptr0 + (3 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.2009034Z tmp23 = 1 2023-01-11T21:38:06.2009114Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2009216Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2009297Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2009369Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2009529Z tmp28 = tl.load(in_ptr0 + (4 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.2009601Z tmp29 = 1 2023-01-11T21:38:06.2009679Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2009772Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2009851Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2009924Z tmp33 = 1 2023-01-11T21:38:06.2010011Z tmp34 = tl.where(tmp6, tmp33, 0.0) 2023-01-11T21:38:06.2010082Z tmp35 = 1 2023-01-11T21:38:06.2010180Z tmp36 = tl.where(tmp13, tmp35, 0.0) 2023-01-11T21:38:06.2010258Z tmp37 = tmp36 + tmp34 2023-01-11T21:38:06.2010329Z tmp38 = 1 2023-01-11T21:38:06.2010422Z tmp39 = tl.where(tmp21, tmp38, 0.0) 2023-01-11T21:38:06.2010503Z tmp40 = tmp39 + tmp37 2023-01-11T21:38:06.2010567Z tmp41 = 1 2023-01-11T21:38:06.2010661Z tmp42 = tl.where(tmp27, tmp41, 0.0) 2023-01-11T21:38:06.2010741Z tmp43 = tmp42 + tmp40 2023-01-11T21:38:06.2010821Z tmp44 = tmp32 / tmp43 2023-01-11T21:38:06.2010957Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp44, xmask) 2023-01-11T21:38:06.2011048Z ''') 2023-01-11T21:38:06.2011053Z 2023-01-11T21:38:06.2011058Z 2023-01-11T21:38:06.2011151Z async_compile.wait(globals()) 2023-01-11T21:38:06.2011221Z del async_compile 2023-01-11T21:38:06.2011226Z 2023-01-11T21:38:06.2011302Z def call(args): 2023-01-11T21:38:06.2011375Z arg0_1, = args 2023-01-11T21:38:06.2011455Z args.clear() 2023-01-11T21:38:06.2011551Z with torch.cuda.device(0): 2023-01-11T21:38:06.2011770Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2011863Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2012016Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2012259Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2012418Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2012494Z del arg0_1 2023-01-11T21:38:06.2012576Z return (buf0, buf1, ) 2023-01-11T21:38:06.2012581Z 2023-01-11T21:38:06.2012586Z 
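# The __main__ harness below lets the generated module be benchmarked
# standalone: it rebuilds a random input with the original shape and strides
# and times call() via print_performance.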
2023-01-11T21:38:06.2012666Z if __name__ == "__main__": 2023-01-11T21:38:06.2012785Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2012912Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2013126Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2013237Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2013242Z 2023-01-11T21:38:06.2013506Z [2023-01-11 21:33:55,478] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 376 2023-01-11T21:38:06.2013927Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2014059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2014314Z [2023-01-11 21:33:55,505] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 377 2023-01-11T21:38:06.2014320Z 2023-01-11T21:38:06.2014417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2014752Z import torch 2023-01-11T21:38:06.2014826Z import random 2023-01-11T21:38:06.2014945Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2015062Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2015067Z 2023-01-11T21:38:06.2015150Z aten = torch.ops.aten 2023-01-11T21:38:06.2015289Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2015385Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2015390Z 2023-01-11T21:38:06.2015463Z import triton 2023-01-11T21:38:06.2015556Z import triton.language as tl 2023-01-11T21:38:06.2015681Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2015814Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2015825Z 2023-01-11T21:38:06.2015829Z 2023-01-11T21:38:06.2016010Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2016084Z import triton 2023-01-11T21:38:06.2016181Z import triton.language as tl 2023-01-11T21:38:06.2016294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2016395Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2016527Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2016656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2016665Z 2023-01-11T21:38:06.2017069Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2017192Z @triton.jit 2023-01-11T21:38:06.2017335Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2017410Z xnumel = 288 2023-01-11T21:38:06.2017508Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2017638Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2017724Z xmask = xindex < xnumel 2023-01-11T21:38:06.2017805Z x1 = (xindex // 6) % 6 2023-01-11T21:38:06.2017872Z x0 = xindex % 6 2023-01-11T21:38:06.2017951Z x2 = (xindex // 36) 2023-01-11T21:38:06.2018022Z x4 = xindex 2023-01-11T21:38:06.2018096Z tmp0 = (x1 // 2) 
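    # float16 recompile of the graph-376 pooling kernel: the signature above
    # is '*fp16' and each load below carries .to(tl.float32), so the window
    # sums are still accumulated in float32 before the fp16 store.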
2023-01-11T21:38:06.2018178Z tmp1 = ((8 + (3*x1)) // 6) 2023-01-11T21:38:06.2018307Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2018383Z tmp3 = (x0 // 2) 2023-01-11T21:38:06.2018456Z tmp4 = ((8 + (3*x0)) // 6) 2023-01-11T21:38:06.2018533Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2018608Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2018926Z tmp7 = tl.load(in_ptr0 + ((3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2019024Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.2019102Z tmp9 = 1 + (x0 // 2) 2023-01-11T21:38:06.2019185Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.2019259Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.2019584Z tmp12 = tl.load(in_ptr0 + (1 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2019682Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.2019762Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.2019844Z tmp15 = 1 + (x1 // 2) 2023-01-11T21:38:06.2019924Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.2020008Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.2020318Z tmp18 = tl.load(in_ptr0 + (3 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2020414Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2020494Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.2020576Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.2020886Z tmp22 = tl.load(in_ptr0 + (4 + (3*(x1 // 2)) + (9*x2) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2021019Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.2021100Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.2021172Z tmp25 = 1 2023-01-11T21:38:06.2021260Z tmp26 = tl.where(tmp6, tmp25, 0.0) 2023-01-11T21:38:06.2021331Z tmp27 = 1 2023-01-11T21:38:06.2021427Z tmp28 = tl.where(tmp11, tmp27, 0.0) 2023-01-11T21:38:06.2021511Z tmp29 = tmp28 + tmp26 2023-01-11T21:38:06.2021584Z tmp30 = 1 2023-01-11T21:38:06.2021675Z tmp31 = tl.where(tmp17, tmp30, 0.0) 2023-01-11T21:38:06.2021748Z tmp32 = tmp31 + tmp29 2023-01-11T21:38:06.2021819Z tmp33 = 1 2023-01-11T21:38:06.2021912Z tmp34 = tl.where(tmp21, tmp33, 0.0) 2023-01-11T21:38:06.2021992Z tmp35 = tmp34 + tmp32 2023-01-11T21:38:06.2022071Z tmp36 = tmp24 / tmp35 2023-01-11T21:38:06.2022208Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp36, xmask) 2023-01-11T21:38:06.2022294Z ''') 2023-01-11T21:38:06.2022300Z 2023-01-11T21:38:06.2022308Z 2023-01-11T21:38:06.2022499Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2022567Z import triton 2023-01-11T21:38:06.2022661Z import triton.language as tl 2023-01-11T21:38:06.2022777Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2022878Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2023016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2023142Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2023147Z 2023-01-11T21:38:06.2023552Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2023628Z @triton.jit 2023-01-11T21:38:06.2023754Z def triton_(in_ptr0, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2023827Z xnumel = 80 2023-01-11T21:38:06.2023929Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2024058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2024141Z xmask = xindex < xnumel 2023-01-11T21:38:06.2024227Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2024300Z x0 = xindex % 5 2023-01-11T21:38:06.2024371Z x2 = (xindex // 10) 2023-01-11T21:38:06.2024477Z x4 = xindex 2023-01-11T21:38:06.2024557Z tmp0 = ((3*x1) // 2) 2023-01-11T21:38:06.2024638Z tmp1 = 2 + (((3*x1) // 2)) 2023-01-11T21:38:06.2024717Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2024794Z tmp3 = ((3*x0) // 5) 2023-01-11T21:38:06.2024868Z tmp4 = ((7 + (3*x0)) // 5) 2023-01-11T21:38:06.2024946Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2025024Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2025211Z tmp7 = tl.load(in_ptr0 + ((3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2025281Z tmp8 = 1 2023-01-11T21:38:06.2025364Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2025460Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2025534Z tmp11 = 1 + (((3*x0) // 5)) 2023-01-11T21:38:06.2025614Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2025695Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2025883Z tmp14 = tl.load(in_ptr0 + (1 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2025958Z tmp15 = 1 2023-01-11T21:38:06.2026040Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2026133Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2026206Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2026287Z tmp19 = 1 + (((3*x1) // 2)) 2023-01-11T21:38:06.2026366Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2026445Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2026628Z tmp22 = tl.load(in_ptr0 + (3 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2026731Z tmp23 = 1 2023-01-11T21:38:06.2026812Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2026907Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2026980Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2027059Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2027244Z tmp28 = tl.load(in_ptr0 + (4 + (3*(((3*x1) // 2))) + (9*x2) + (((3*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2027319Z tmp29 = 1 2023-01-11T21:38:06.2027399Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2027496Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2027576Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2027640Z tmp33 = 1 2023-01-11T21:38:06.2027734Z tmp34 = tl.where(tmp6, tmp33, 0.0) 2023-01-11T21:38:06.2027804Z tmp35 = 1 2023-01-11T21:38:06.2027898Z tmp36 = tl.where(tmp13, tmp35, 0.0) 2023-01-11T21:38:06.2027979Z tmp37 = tmp36 + tmp34 2023-01-11T21:38:06.2028050Z tmp38 = 1 2023-01-11T21:38:06.2028138Z tmp39 = tl.where(tmp21, tmp38, 0.0) 2023-01-11T21:38:06.2028222Z tmp40 = tmp39 + tmp37 2023-01-11T21:38:06.2028292Z tmp41 = 1 2023-01-11T21:38:06.2028384Z tmp42 = tl.where(tmp27, tmp41, 0.0) 2023-01-11T21:38:06.2028464Z tmp43 = tmp42 + tmp40 2023-01-11T21:38:06.2028543Z tmp44 = tmp32 / tmp43 2023-01-11T21:38:06.2028682Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp44, xmask) 2023-01-11T21:38:06.2028765Z ''') 2023-01-11T21:38:06.2028779Z 2023-01-11T21:38:06.2028784Z 2023-01-11T21:38:06.2028871Z async_compile.wait(globals()) 
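# async_compile.wait() blocks until the Triton kernels defined above have
# finished compiling in the background; the helper is then dropped.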
2023-01-11T21:38:06.2028948Z del async_compile 2023-01-11T21:38:06.2028954Z 2023-01-11T21:38:06.2029027Z def call(args): 2023-01-11T21:38:06.2029100Z arg0_1, = args 2023-01-11T21:38:06.2029175Z args.clear() 2023-01-11T21:38:06.2029268Z with torch.cuda.device(0): 2023-01-11T21:38:06.2029486Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2029572Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2029735Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2029949Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2030139Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2030214Z del arg0_1 2023-01-11T21:38:06.2030297Z return (buf0, buf1, ) 2023-01-11T21:38:06.2030302Z 2023-01-11T21:38:06.2030307Z 2023-01-11T21:38:06.2030389Z if __name__ == "__main__": 2023-01-11T21:38:06.2030508Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2030629Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2030845Z arg0_1 = rand_strided((2, 4, 3, 3), (36, 9, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2030957Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2030965Z 2023-01-11T21:38:06.2031228Z [2023-01-11 21:33:55,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 377 2023-01-11T21:38:06.2031647Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2031779Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2032035Z [2023-01-11 21:33:55,811] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 378 2023-01-11T21:38:06.2032040Z 2023-01-11T21:38:06.2032137Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2032212Z import torch 2023-01-11T21:38:06.2032280Z import random 2023-01-11T21:38:06.2032400Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2032565Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2032570Z 2023-01-11T21:38:06.2032652Z aten = torch.ops.aten 2023-01-11T21:38:06.2032789Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2032886Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2032891Z 2023-01-11T21:38:06.2032966Z import triton 2023-01-11T21:38:06.2033056Z import triton.language as tl 2023-01-11T21:38:06.2033174Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2033313Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2033319Z 2023-01-11T21:38:06.2033323Z 2023-01-11T21:38:06.2033508Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2033583Z import triton 2023-01-11T21:38:06.2033677Z import triton.language as tl 2023-01-11T21:38:06.2033792Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2033894Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2034031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2034149Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2034154Z 2023-01-11T21:38:06.2034558Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2034634Z @triton.jit 2023-01-11T21:38:06.2034766Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2034839Z xnumel = 288 2023-01-11T21:38:06.2034935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2035065Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2035149Z xmask = xindex < xnumel 2023-01-11T21:38:06.2035213Z x0 = xindex 2023-01-11T21:38:06.2035403Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2035542Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2035629Z ''') 2023-01-11T21:38:06.2035634Z 2023-01-11T21:38:06.2035638Z 2023-01-11T21:38:06.2035825Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2035900Z import triton 2023-01-11T21:38:06.2036023Z import triton.language as tl 2023-01-11T21:38:06.2036131Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2036233Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2036364Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2036489Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2036494Z 2023-01-11T21:38:06.2036892Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2036966Z @triton.jit 2023-01-11T21:38:06.2037103Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2037176Z xnumel = 80 2023-01-11T21:38:06.2037265Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2037393Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2037476Z xmask = xindex < xnumel 2023-01-11T21:38:06.2037559Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2037633Z x0 = xindex % 5 2023-01-11T21:38:06.2037710Z x3 = (xindex // 5) 2023-01-11T21:38:06.2037782Z x4 = xindex 2023-01-11T21:38:06.2037849Z tmp0 = 3*x1 2023-01-11T21:38:06.2037924Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.2038001Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2038080Z tmp3 = ((6*x0) // 5) 2023-01-11T21:38:06.2038163Z tmp4 = 2 + (((6*x0) // 5)) 2023-01-11T21:38:06.2038240Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2038321Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2038477Z tmp7 = tl.load(in_ptr0 + ((18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.2038577Z tmp8 = 1 2023-01-11T21:38:06.2038654Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2038749Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2038829Z tmp11 = 1 + (((6*x0) // 5)) 2023-01-11T21:38:06.2038910Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2038998Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2039153Z tmp14 = tl.load(in_ptr0 + (1 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0) 2023-01-11T21:38:06.2039226Z tmp15 = 1 2023-01-11T21:38:06.2039306Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2039401Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2039481Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2039555Z tmp19 = 1 + (3*x1) 2023-01-11T21:38:06.2039637Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2039709Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2039862Z tmp22 = tl.load(in_ptr0 + (6 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.2039939Z tmp23 = 1 2023-01-11T21:38:06.2040019Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2040112Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2040192Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2040272Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2040429Z tmp28 = tl.load(in_ptr0 + (7 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.2040502Z tmp29 = 1 2023-01-11T21:38:06.2040580Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2040677Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2040757Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2040835Z tmp33 = 2 + (3*x1) 2023-01-11T21:38:06.2040915Z tmp34 = tmp33 < tmp1 2023-01-11T21:38:06.2040988Z tmp35 = tmp34 & tmp5 2023-01-11T21:38:06.2041147Z tmp36 = tl.load(in_ptr0 + (12 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp35 & xmask, other=0) 2023-01-11T21:38:06.2041219Z tmp37 = 1 2023-01-11T21:38:06.2041304Z tmp38 = tmp36 + tmp37 2023-01-11T21:38:06.2041398Z tmp39 = tl.where(tmp35, tmp38, 0.0) 2023-01-11T21:38:06.2041479Z tmp40 = tmp39 + tmp32 2023-01-11T21:38:06.2041557Z tmp41 = tmp34 & tmp12 2023-01-11T21:38:06.2041735Z tmp42 = tl.load(in_ptr0 + (13 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2041812Z tmp43 = 1 2023-01-11T21:38:06.2041891Z tmp44 = tmp42 + tmp43 2023-01-11T21:38:06.2041987Z tmp45 = tl.where(tmp41, tmp44, 0.0) 2023-01-11T21:38:06.2042065Z tmp46 = tmp45 + 
tmp40 2023-01-11T21:38:06.2042147Z tmp47 = tmp1 < tmp1 2023-01-11T21:38:06.2042227Z tmp48 = tmp47 & tmp5 2023-01-11T21:38:06.2042375Z tmp49 = tl.load(in_ptr0 + (18 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp48 & xmask, other=0) 2023-01-11T21:38:06.2042448Z tmp50 = 1 2023-01-11T21:38:06.2042528Z tmp51 = tmp49 + tmp50 2023-01-11T21:38:06.2042624Z tmp52 = tl.where(tmp48, tmp51, 0.0) 2023-01-11T21:38:06.2042706Z tmp53 = tmp52 + tmp46 2023-01-11T21:38:06.2042786Z tmp54 = tmp47 & tmp12 2023-01-11T21:38:06.2042944Z tmp55 = tl.load(in_ptr0 + (19 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp54 & xmask, other=0) 2023-01-11T21:38:06.2043009Z tmp56 = 1 2023-01-11T21:38:06.2043088Z tmp57 = tmp55 + tmp56 2023-01-11T21:38:06.2043184Z tmp58 = tl.where(tmp54, tmp57, 0.0) 2023-01-11T21:38:06.2043263Z tmp59 = tmp58 + tmp53 2023-01-11T21:38:06.2043335Z tmp60 = 1 2023-01-11T21:38:06.2043430Z tmp61 = tl.where(tmp6, tmp60, 0.0) 2023-01-11T21:38:06.2043501Z tmp62 = 1 2023-01-11T21:38:06.2043587Z tmp63 = tl.where(tmp13, tmp62, 0.0) 2023-01-11T21:38:06.2043667Z tmp64 = tmp63 + tmp61 2023-01-11T21:38:06.2043737Z tmp65 = 1 2023-01-11T21:38:06.2043829Z tmp66 = tl.where(tmp21, tmp65, 0.0) 2023-01-11T21:38:06.2043910Z tmp67 = tmp66 + tmp64 2023-01-11T21:38:06.2043981Z tmp68 = 1 2023-01-11T21:38:06.2044066Z tmp69 = tl.where(tmp27, tmp68, 0.0) 2023-01-11T21:38:06.2044181Z tmp70 = tmp69 + tmp67 2023-01-11T21:38:06.2044252Z tmp71 = 1 2023-01-11T21:38:06.2044344Z tmp72 = tl.where(tmp35, tmp71, 0.0) 2023-01-11T21:38:06.2044423Z tmp73 = tmp72 + tmp70 2023-01-11T21:38:06.2044494Z tmp74 = 1 2023-01-11T21:38:06.2044586Z tmp75 = tl.where(tmp41, tmp74, 0.0) 2023-01-11T21:38:06.2044662Z tmp76 = tmp75 + tmp73 2023-01-11T21:38:06.2044732Z tmp77 = 1 2023-01-11T21:38:06.2044823Z tmp78 = tl.where(tmp48, tmp77, 0.0) 2023-01-11T21:38:06.2044902Z tmp79 = tmp78 + tmp76 2023-01-11T21:38:06.2044978Z tmp80 = 1 2023-01-11T21:38:06.2045069Z tmp81 = tl.where(tmp54, tmp80, 0.0) 2023-01-11T21:38:06.2045151Z tmp82 = tmp81 + tmp79 2023-01-11T21:38:06.2045223Z tmp83 = tmp59 / tmp82 2023-01-11T21:38:06.2045360Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.2045448Z ''') 2023-01-11T21:38:06.2045454Z 2023-01-11T21:38:06.2045458Z 2023-01-11T21:38:06.2045556Z async_compile.wait(globals()) 2023-01-11T21:38:06.2045632Z del async_compile 2023-01-11T21:38:06.2045638Z 2023-01-11T21:38:06.2045713Z def call(args): 2023-01-11T21:38:06.2045785Z arg0_1, = args 2023-01-11T21:38:06.2045853Z args.clear() 2023-01-11T21:38:06.2045946Z with torch.cuda.device(0): 2023-01-11T21:38:06.2046168Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2046262Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2046419Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2046633Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2046790Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2046862Z del arg0_1 2023-01-11T21:38:06.2046939Z return (buf0, buf1, ) 2023-01-11T21:38:06.2046944Z 2023-01-11T21:38:06.2046958Z 2023-01-11T21:38:06.2047032Z if __name__ == "__main__": 2023-01-11T21:38:06.2047149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2047275Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2047488Z arg0_1 = 
rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2047625Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2047631Z 2023-01-11T21:38:06.2047895Z [2023-01-11 21:33:56,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 378 2023-01-11T21:38:06.2047901Z 2023-01-11T21:38:06.2047998Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2048078Z import torch 2023-01-11T21:38:06.2048145Z import random 2023-01-11T21:38:06.2048265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2048391Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2048396Z 2023-01-11T21:38:06.2048479Z aten = torch.ops.aten 2023-01-11T21:38:06.2048618Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2048715Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2048720Z 2023-01-11T21:38:06.2048794Z import triton 2023-01-11T21:38:06.2048879Z import triton.language as tl 2023-01-11T21:38:06.2049006Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2049149Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2049154Z 2023-01-11T21:38:06.2049158Z 2023-01-11T21:38:06.2049343Z triton_fused__adaptive_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2049417Z import triton 2023-01-11T21:38:06.2049509Z import triton.language as tl 2023-01-11T21:38:06.2049625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2049727Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2049853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2049977Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2050009Z 2023-01-11T21:38:06.2050411Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2050484Z @triton.jit 2023-01-11T21:38:06.2050617Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2050694Z xnumel = 288 2023-01-11T21:38:06.2050787Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2050918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2050995Z xmask = xindex < xnumel 2023-01-11T21:38:06.2051065Z x0 = xindex 2023-01-11T21:38:06.2051279Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2051413Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2051498Z ''') 2023-01-11T21:38:06.2051509Z 2023-01-11T21:38:06.2051514Z 2023-01-11T21:38:06.2051701Z triton_fused__adaptive_avg_pool2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2051777Z import triton 2023-01-11T21:38:06.2051863Z import triton.language as tl 2023-01-11T21:38:06.2051975Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2052080Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2052211Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2052335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2052341Z 2023-01-11T21:38:06.2052735Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2052813Z @triton.jit 2023-01-11T21:38:06.2052942Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2053008Z xnumel = 80 2023-01-11T21:38:06.2053106Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2053234Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2053316Z xmask = xindex < xnumel 2023-01-11T21:38:06.2053399Z x1 = (xindex // 5) % 2 2023-01-11T21:38:06.2053477Z x0 = xindex % 5 2023-01-11T21:38:06.2053583Z x3 = (xindex // 5) 2023-01-11T21:38:06.2053648Z x4 = xindex 2023-01-11T21:38:06.2053721Z tmp0 = 3*x1 2023-01-11T21:38:06.2053797Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.2053873Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.2053951Z tmp3 = ((6*x0) // 5) 2023-01-11T21:38:06.2054031Z tmp4 = 2 + (((6*x0) // 5)) 2023-01-11T21:38:06.2054108Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.2054178Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.2054352Z tmp7 = tl.load(in_ptr0 + ((18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2054422Z tmp8 = 1 2023-01-11T21:38:06.2054616Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.2054714Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.2054796Z tmp11 = 1 + (((6*x0) // 5)) 2023-01-11T21:38:06.2054876Z tmp12 = tmp11 < tmp4 2023-01-11T21:38:06.2054949Z tmp13 = tmp2 & tmp12 2023-01-11T21:38:06.2055131Z tmp14 = tl.load(in_ptr0 + (1 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2055204Z tmp15 = 1 2023-01-11T21:38:06.2055285Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.2055379Z tmp17 = tl.where(tmp13, tmp16, 0.0) 2023-01-11T21:38:06.2055459Z tmp18 = tmp17 + tmp10 2023-01-11T21:38:06.2055533Z tmp19 = 1 + (3*x1) 2023-01-11T21:38:06.2055606Z tmp20 = tmp19 < tmp1 2023-01-11T21:38:06.2055685Z tmp21 = tmp20 & tmp5 2023-01-11T21:38:06.2055857Z tmp22 = tl.load(in_ptr0 + (6 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2055930Z tmp23 = 1 2023-01-11T21:38:06.2056056Z tmp24 = tmp22 + tmp23 2023-01-11T21:38:06.2056150Z tmp25 = tl.where(tmp21, tmp24, 0.0) 2023-01-11T21:38:06.2056231Z tmp26 = tmp25 + tmp18 2023-01-11T21:38:06.2056303Z tmp27 = tmp20 & tmp12 2023-01-11T21:38:06.2056481Z tmp28 = tl.load(in_ptr0 + (7 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2056553Z tmp29 = 1 2023-01-11T21:38:06.2056632Z tmp30 = tmp28 + tmp29 2023-01-11T21:38:06.2056727Z tmp31 = tl.where(tmp27, tmp30, 0.0) 2023-01-11T21:38:06.2056808Z tmp32 = tmp31 + tmp26 2023-01-11T21:38:06.2056882Z tmp33 = 2 + (3*x1) 2023-01-11T21:38:06.2056955Z tmp34 = tmp33 < tmp1 2023-01-11T21:38:06.2057033Z tmp35 = tmp34 & tmp5 2023-01-11T21:38:06.2057263Z tmp36 = tl.load(in_ptr0 + (12 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp35 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2057337Z tmp37 = 1 2023-01-11T21:38:06.2057419Z tmp38 = tmp36 + tmp37 2023-01-11T21:38:06.2057519Z tmp39 = tl.where(tmp35, tmp38, 0.0) 2023-01-11T21:38:06.2057597Z tmp40 = tmp39 + tmp32 2023-01-11T21:38:06.2057671Z tmp41 = tmp34 & tmp12 2023-01-11T21:38:06.2057842Z tmp42 = tl.load(in_ptr0 + (13 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2057917Z tmp43 = 1 2023-01-11T21:38:06.2057995Z tmp44 = tmp42 + tmp43 
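    # tmp47 below compares tmp1 with itself and is always false, so the two
    # loads it guards (flat offsets 18 and 19) are fully masked out; they
    # appear to pad the unrolled window to a fixed row count.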
2023-01-11T21:38:06.2058089Z tmp45 = tl.where(tmp41, tmp44, 0.0) 2023-01-11T21:38:06.2058169Z tmp46 = tmp45 + tmp40 2023-01-11T21:38:06.2058248Z tmp47 = tmp1 < tmp1 2023-01-11T21:38:06.2058320Z tmp48 = tmp47 & tmp5 2023-01-11T21:38:06.2058490Z tmp49 = tl.load(in_ptr0 + (18 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp48 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2058563Z tmp50 = 1 2023-01-11T21:38:06.2058643Z tmp51 = tmp49 + tmp50 2023-01-11T21:38:06.2058737Z tmp52 = tl.where(tmp48, tmp51, 0.0) 2023-01-11T21:38:06.2058818Z tmp53 = tmp52 + tmp46 2023-01-11T21:38:06.2058895Z tmp54 = tmp47 & tmp12 2023-01-11T21:38:06.2059060Z tmp55 = tl.load(in_ptr0 + (19 + (18*x3) + (((6*x0) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp54 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2059136Z tmp56 = 1 2023-01-11T21:38:06.2059215Z tmp57 = tmp55 + tmp56 2023-01-11T21:38:06.2059348Z tmp58 = tl.where(tmp54, tmp57, 0.0) 2023-01-11T21:38:06.2059428Z tmp59 = tmp58 + tmp53 2023-01-11T21:38:06.2059499Z tmp60 = 1 2023-01-11T21:38:06.2059593Z tmp61 = tl.where(tmp6, tmp60, 0.0) 2023-01-11T21:38:06.2059657Z tmp62 = 1 2023-01-11T21:38:06.2059750Z tmp63 = tl.where(tmp13, tmp62, 0.0) 2023-01-11T21:38:06.2059829Z tmp64 = tmp63 + tmp61 2023-01-11T21:38:06.2059901Z tmp65 = 1 2023-01-11T21:38:06.2059992Z tmp66 = tl.where(tmp21, tmp65, 0.0) 2023-01-11T21:38:06.2060071Z tmp67 = tmp66 + tmp64 2023-01-11T21:38:06.2060142Z tmp68 = 1 2023-01-11T21:38:06.2060227Z tmp69 = tl.where(tmp27, tmp68, 0.0) 2023-01-11T21:38:06.2060309Z tmp70 = tmp69 + tmp67 2023-01-11T21:38:06.2060382Z tmp71 = 1 2023-01-11T21:38:06.2060475Z tmp72 = tl.where(tmp35, tmp71, 0.0) 2023-01-11T21:38:06.2060554Z tmp73 = tmp72 + tmp70 2023-01-11T21:38:06.2060624Z tmp74 = 1 2023-01-11T21:38:06.2060715Z tmp75 = tl.where(tmp41, tmp74, 0.0) 2023-01-11T21:38:06.2060791Z tmp76 = tmp75 + tmp73 2023-01-11T21:38:06.2060861Z tmp77 = 1 2023-01-11T21:38:06.2060952Z tmp78 = tl.where(tmp48, tmp77, 0.0) 2023-01-11T21:38:06.2061032Z tmp79 = tmp78 + tmp76 2023-01-11T21:38:06.2061101Z tmp80 = 1 2023-01-11T21:38:06.2061194Z tmp81 = tl.where(tmp54, tmp80, 0.0) 2023-01-11T21:38:06.2061266Z tmp82 = tmp81 + tmp79 2023-01-11T21:38:06.2061345Z tmp83 = tmp59 / tmp82 2023-01-11T21:38:06.2061482Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.2061574Z ''') 2023-01-11T21:38:06.2061580Z 2023-01-11T21:38:06.2061585Z 2023-01-11T21:38:06.2061681Z async_compile.wait(globals()) 2023-01-11T21:38:06.2061787Z del async_compile 2023-01-11T21:38:06.2061793Z 2023-01-11T21:38:06.2061867Z def call(args): 2023-01-11T21:38:06.2061940Z arg0_1, = args 2023-01-11T21:38:06.2062008Z args.clear() 2023-01-11T21:38:06.2062103Z with torch.cuda.device(0): 2023-01-11T21:38:06.2062326Z buf0 = empty_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2062423Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2062598Z triton_fused__adaptive_avg_pool2d_0.run(arg0_1, buf0, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.2062840Z buf1 = empty_strided((2, 4, 2, 5), (40, 10, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2063012Z triton_fused__adaptive_avg_pool2d_1_1.run(arg0_1, buf1, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2063088Z del arg0_1 2023-01-11T21:38:06.2063167Z return (buf0, buf1, ) 2023-01-11T21:38:06.2063172Z 2023-01-11T21:38:06.2063180Z 2023-01-11T21:38:06.2063262Z if __name__ == "__main__": 2023-01-11T21:38:06.2063388Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.2063525Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2063767Z arg0_1 = rand_strided((2, 4, 6, 6), (144, 36, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2063891Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2063897Z 2023-01-11T21:38:06.2063969Z ok (2.046s) 2023-01-11T21:38:06.2064523Z test_adaptive_avg_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2064667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2064959Z [2023-01-11 21:33:56,072] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 379 2023-01-11T21:38:06.2065238Z [2023-01-11 21:33:56,077] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._adaptive_avg_pool2d 2023-01-11T21:38:06.2065565Z [2023-01-11 21:33:56,080] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 379 2023-01-11T21:38:06.2065571Z 2023-01-11T21:38:06.2065676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2065749Z import torch 2023-01-11T21:38:06.2065825Z import random 2023-01-11T21:38:06.2065954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2066088Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2066093Z 2023-01-11T21:38:06.2066171Z aten = torch.ops.aten 2023-01-11T21:38:06.2066321Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2066422Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2066431Z 2023-01-11T21:38:06.2066506Z import triton 2023-01-11T21:38:06.2066602Z import triton.language as tl 2023-01-11T21:38:06.2066739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2066891Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2066897Z 2023-01-11T21:38:06.2066901Z 2023-01-11T21:38:06.2066999Z async_compile.wait(globals()) 2023-01-11T21:38:06.2067071Z del async_compile 2023-01-11T21:38:06.2067076Z 2023-01-11T21:38:06.2067150Z def call(args): 2023-01-11T21:38:06.2067225Z arg0_1, = args 2023-01-11T21:38:06.2067301Z args.clear() 2023-01-11T21:38:06.2067401Z with torch.cuda.device(0): 2023-01-11T21:38:06.2067525Z buf0 = aten._adaptive_avg_pool2d(arg0_1, [4, 4]) 2023-01-11T21:38:06.2067598Z del arg0_1 2023-01-11T21:38:06.2067664Z buf1 = buf0 2023-01-11T21:38:06.2067785Z assert_size_stride(buf1, (2, 4, 4, 4), (64, 16, 4, 1)) 2023-01-11T21:38:06.2067857Z del buf0 2023-01-11T21:38:06.2067966Z return (buf1, ) 2023-01-11T21:38:06.2067971Z 2023-01-11T21:38:06.2067976Z 2023-01-11T21:38:06.2068058Z if __name__ == "__main__": 2023-01-11T21:38:06.2068183Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2068319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2068583Z arg0_1 = rand_strided((2, 4, 21, 21), (1764, 441, 21, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2068687Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2068693Z 2023-01-11T21:38:06.2068764Z ok (0.025s) 2023-01-11T21:38:06.2069221Z test_add_const_float_cuda (__main__.CudaTests) ... 
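The UserWarning repeated throughout this run comes from the test helper's use of the deprecated TypedStorage API, x.storage().size(). A minimal sketch of the replacement the warning itself suggests, assuming the helper only needs the flattened element count (untyped storage is sized in bytes, hence the division by element_size()):

import torch

def flatten_storage(x: torch.Tensor) -> torch.Tensor:
    # Element count of the underlying storage via the non-deprecated API:
    # untyped storage reports its size in bytes.
    numel = x.untyped_storage().size() // x.element_size()
    # Same view the test builds: the whole storage as a 1-D tensor, cloned.
    return torch.as_strided(x, (numel,), (1,), 0).clone()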
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2069354Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2069614Z [2023-01-11 21:33:56,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 380 2023-01-11T21:38:06.2069879Z [2023-01-11 21:33:56,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 380 2023-01-11T21:38:06.2070290Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2070421Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2070675Z [2023-01-11 21:33:56,167] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 381 2023-01-11T21:38:06.2070940Z [2023-01-11 21:33:56,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 381 2023-01-11T21:38:06.2070946Z 2023-01-11T21:38:06.2071048Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2071115Z import torch 2023-01-11T21:38:06.2071189Z import random 2023-01-11T21:38:06.2071338Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2071465Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2071471Z 2023-01-11T21:38:06.2071552Z aten = torch.ops.aten 2023-01-11T21:38:06.2071688Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2071783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2071789Z 2023-01-11T21:38:06.2071856Z import triton 2023-01-11T21:38:06.2071949Z import triton.language as tl 2023-01-11T21:38:06.2072074Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2072219Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2072228Z 2023-01-11T21:38:06.2072232Z 2023-01-11T21:38:06.2072387Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2072459Z import triton 2023-01-11T21:38:06.2072550Z import triton.language as tl 2023-01-11T21:38:06.2072664Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2072761Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2072897Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2073023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2073029Z 2023-01-11T21:38:06.2073434Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2073507Z @triton.jit 2023-01-11T21:38:06.2073639Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2073712Z xnumel = 32 2023-01-11T21:38:06.2073860Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2073982Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
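    # Standard Inductor pointwise indexing: each program instance handles XBLOCK
    # consecutive elements, and the xmask below turns off out-of-range lanes
    # whenever XBLOCK does not evenly divide xnumel.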
2023-01-11T21:38:06.2074063Z xmask = xindex < xnumel 2023-01-11T21:38:06.2074134Z x0 = xindex 2023-01-11T21:38:06.2074232Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2074305Z tmp1 = 1.5 2023-01-11T21:38:06.2074383Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2074518Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2074597Z ''') 2023-01-11T21:38:06.2074603Z 2023-01-11T21:38:06.2074607Z 2023-01-11T21:38:06.2074699Z async_compile.wait(globals()) 2023-01-11T21:38:06.2074775Z del async_compile 2023-01-11T21:38:06.2074780Z 2023-01-11T21:38:06.2074856Z def call(args): 2023-01-11T21:38:06.2074929Z arg0_1, = args 2023-01-11T21:38:06.2075006Z args.clear() 2023-01-11T21:38:06.2075099Z with torch.cuda.device(0): 2023-01-11T21:38:06.2075323Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2075433Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2075569Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2075642Z del arg0_1 2023-01-11T21:38:06.2075719Z return (buf0, ) 2023-01-11T21:38:06.2075724Z 2023-01-11T21:38:06.2075731Z 2023-01-11T21:38:06.2075810Z if __name__ == "__main__": 2023-01-11T21:38:06.2075928Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2076055Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2076248Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2076358Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2076363Z 2023-01-11T21:38:06.2076368Z 2023-01-11T21:38:06.2076465Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2076537Z import torch 2023-01-11T21:38:06.2076611Z import random 2023-01-11T21:38:06.2076734Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2076856Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2076861Z 2023-01-11T21:38:06.2076944Z aten = torch.ops.aten 2023-01-11T21:38:06.2077071Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2077166Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2077203Z 2023-01-11T21:38:06.2077279Z import triton 2023-01-11T21:38:06.2077371Z import triton.language as tl 2023-01-11T21:38:06.2077495Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2077636Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2077641Z 2023-01-11T21:38:06.2077646Z 2023-01-11T21:38:06.2077799Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2077874Z import triton 2023-01-11T21:38:06.2077959Z import triton.language as tl 2023-01-11T21:38:06.2078072Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2078173Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2078307Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2078432Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2078437Z 2023-01-11T21:38:06.2078845Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2078918Z @triton.jit 2023-01-11T21:38:06.2079050Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2079116Z xnumel = 32 2023-01-11T21:38:06.2079212Z xoffset = tl.program_id(0) * XBLOCK 
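    # fp16 variant of the kernel above: the load below upcasts to fp32 via
    # .to(tl.float32), the add runs in fp32, and the result is narrowed back
    # to fp16 on the store through the *fp16 out_ptr0.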
2023-01-11T21:38:06.2079341Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2079427Z xmask = xindex < xnumel 2023-01-11T21:38:06.2079497Z x0 = xindex 2023-01-11T21:38:06.2079614Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2079718Z tmp1 = 1.5 2023-01-11T21:38:06.2079791Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2079926Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2080012Z ''') 2023-01-11T21:38:06.2080018Z 2023-01-11T21:38:06.2080022Z 2023-01-11T21:38:06.2080115Z async_compile.wait(globals()) 2023-01-11T21:38:06.2080194Z del async_compile 2023-01-11T21:38:06.2080199Z 2023-01-11T21:38:06.2080272Z def call(args): 2023-01-11T21:38:06.2080346Z arg0_1, = args 2023-01-11T21:38:06.2080414Z args.clear() 2023-01-11T21:38:06.2080506Z with torch.cuda.device(0): 2023-01-11T21:38:06.2080703Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2080795Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2080928Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2081002Z del arg0_1 2023-01-11T21:38:06.2081079Z return (buf0, ) 2023-01-11T21:38:06.2081087Z 2023-01-11T21:38:06.2081091Z 2023-01-11T21:38:06.2081172Z if __name__ == "__main__": 2023-01-11T21:38:06.2081283Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2081410Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2081610Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2081722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2081727Z 2023-01-11T21:38:06.2081797Z ok (0.151s) 2023-01-11T21:38:06.2082259Z test_add_const_int_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2082390Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2082648Z [2023-01-11 21:33:56,242] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 382 2023-01-11T21:38:06.2082912Z [2023-01-11 21:33:56,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 382 2023-01-11T21:38:06.2083347Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2083481Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2083737Z [2023-01-11 21:33:56,320] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 383 2023-01-11T21:38:06.2083999Z [2023-01-11 21:33:56,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 383 2023-01-11T21:38:06.2084007Z 2023-01-11T21:38:06.2084106Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2084181Z import torch 2023-01-11T21:38:06.2084255Z import random 2023-01-11T21:38:06.2084373Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2084504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2084510Z 2023-01-11T21:38:06.2084584Z aten = torch.ops.aten 2023-01-11T21:38:06.2084721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2084815Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2084820Z 2023-01-11T21:38:06.2084893Z import triton 2023-01-11T21:38:06.2084985Z import triton.language as tl 2023-01-11T21:38:06.2085110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2085248Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2085253Z 2023-01-11T21:38:06.2085257Z 2023-01-11T21:38:06.2085441Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2085508Z import triton 2023-01-11T21:38:06.2085600Z import triton.language as tl 2023-01-11T21:38:06.2085714Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2085816Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2085950Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2086075Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2086080Z 2023-01-11T21:38:06.2086485Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2086559Z @triton.jit 2023-01-11T21:38:06.2086685Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2086756Z xnumel = 32 2023-01-11T21:38:06.2086853Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2086985Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2087070Z xmask = xindex < xnumel 2023-01-11T21:38:06.2087139Z x0 = xindex 2023-01-11T21:38:06.2087237Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2087301Z tmp1 = 1 2023-01-11T21:38:06.2087379Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2087519Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2087607Z ''') 2023-01-11T21:38:06.2087612Z 2023-01-11T21:38:06.2087617Z 2023-01-11T21:38:06.2087710Z async_compile.wait(globals()) 2023-01-11T21:38:06.2087786Z del async_compile 2023-01-11T21:38:06.2087791Z 2023-01-11T21:38:06.2087866Z def call(args): 2023-01-11T21:38:06.2087933Z arg0_1, = args 2023-01-11T21:38:06.2088007Z args.clear() 2023-01-11T21:38:06.2088099Z with torch.cuda.device(0): 2023-01-11T21:38:06.2088297Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2088391Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2088526Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 
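        # Dropping the input reference immediately after the asynchronous launch
        # is safe: caching-allocator reuse is stream-ordered, so this block can
        # only back a later allocation once the kernel has completed.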
2023-01-11T21:38:06.2088600Z del arg0_1 2023-01-11T21:38:06.2088671Z return (buf0, ) 2023-01-11T21:38:06.2088684Z 2023-01-11T21:38:06.2088688Z 2023-01-11T21:38:06.2088761Z if __name__ == "__main__": 2023-01-11T21:38:06.2088911Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2089037Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2089239Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2089354Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2089360Z 2023-01-11T21:38:06.2089364Z 2023-01-11T21:38:06.2089461Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2089533Z import torch 2023-01-11T21:38:06.2089607Z import random 2023-01-11T21:38:06.2089718Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2089843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2089848Z 2023-01-11T21:38:06.2089930Z aten = torch.ops.aten 2023-01-11T21:38:06.2090068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2090163Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2090168Z 2023-01-11T21:38:06.2090241Z import triton 2023-01-11T21:38:06.2090336Z import triton.language as tl 2023-01-11T21:38:06.2090454Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2090592Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2090598Z 2023-01-11T21:38:06.2090602Z 2023-01-11T21:38:06.2090756Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2090831Z import triton 2023-01-11T21:38:06.2090923Z import triton.language as tl 2023-01-11T21:38:06.2091035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2091138Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2091271Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2091418Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2091423Z 2023-01-11T21:38:06.2091825Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2091899Z @triton.jit 2023-01-11T21:38:06.2092033Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2092106Z xnumel = 32 2023-01-11T21:38:06.2092203Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2092330Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2092416Z xmask = xindex < xnumel 2023-01-11T21:38:06.2092480Z x0 = xindex 2023-01-11T21:38:06.2092596Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2092667Z tmp1 = 1 2023-01-11T21:38:06.2092751Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2092884Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2092971Z ''') 2023-01-11T21:38:06.2092976Z 2023-01-11T21:38:06.2092981Z 2023-01-11T21:38:06.2093071Z async_compile.wait(globals()) 2023-01-11T21:38:06.2093140Z del async_compile 2023-01-11T21:38:06.2093146Z 2023-01-11T21:38:06.2093224Z def call(args): 2023-01-11T21:38:06.2093297Z arg0_1, = args 2023-01-11T21:38:06.2093371Z args.clear() 2023-01-11T21:38:06.2093461Z with torch.cuda.device(0): 2023-01-11T21:38:06.2093658Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2093751Z stream0 = get_cuda_stream(0) 
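        # Identical wrapper to the fp32 version above; only the buffer dtype
        # (torch.float16) and the kernel's *fp16 pointer signature differ.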
2023-01-11T21:38:06.2093878Z triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2093951Z del arg0_1 2023-01-11T21:38:06.2094029Z return (buf0, ) 2023-01-11T21:38:06.2094034Z 2023-01-11T21:38:06.2094039Z 2023-01-11T21:38:06.2094120Z if __name__ == "__main__": 2023-01-11T21:38:06.2094241Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2094366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2094675Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2094834Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2094840Z 2023-01-11T21:38:06.2094905Z ok (0.152s) 2023-01-11T21:38:06.2095378Z test_add_inplace_permuted_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2095510Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2095788Z [2023-01-11 21:33:56,395] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 384 2023-01-11T21:38:06.2096054Z [2023-01-11 21:33:56,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 384 2023-01-11T21:38:06.2096470Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2096600Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2096858Z [2023-01-11 21:33:56,494] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 385 2023-01-11T21:38:06.2097120Z [2023-01-11 21:33:56,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 385 2023-01-11T21:38:06.2097211Z 2023-01-11T21:38:06.2097333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2097407Z import torch 2023-01-11T21:38:06.2097474Z import random 2023-01-11T21:38:06.2097594Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2097716Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2097724Z 2023-01-11T21:38:06.2097806Z aten = torch.ops.aten 2023-01-11T21:38:06.2097942Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2098038Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2098043Z 2023-01-11T21:38:06.2098116Z import triton 2023-01-11T21:38:06.2098208Z import triton.language as tl 2023-01-11T21:38:06.2098326Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2098466Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2098471Z 2023-01-11T21:38:06.2098476Z 2023-01-11T21:38:06.2098634Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.2098712Z import triton 2023-01-11T21:38:06.2098804Z import triton.language as tl 2023-01-11T21:38:06.2098916Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2099017Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2099143Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2099272Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2099277Z 2023-01-11T21:38:06.2099719Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2099791Z @triton.jit 2023-01-11T21:38:06.2099932Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2100008Z xnumel = 5304 2023-01-11T21:38:06.2100108Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2100236Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2100315Z xmask = xindex < xnumel 2023-01-11T21:38:06.2100385Z x3 = xindex 2023-01-11T21:38:06.2100461Z x0 = xindex % 221 2023-01-11T21:38:06.2100542Z x2 = (xindex // 2652) 2023-01-11T21:38:06.2100763Z tmp0 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2100872Z tmp1 = tl.load(in_ptr1 + (x0 + (221*x2)), xmask) 2023-01-11T21:38:06.2100951Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2101078Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2101163Z ''') 2023-01-11T21:38:06.2101168Z 2023-01-11T21:38:06.2101173Z 2023-01-11T21:38:06.2101266Z async_compile.wait(globals()) 2023-01-11T21:38:06.2101342Z del async_compile 2023-01-11T21:38:06.2101347Z 2023-01-11T21:38:06.2101421Z def call(args): 2023-01-11T21:38:06.2101499Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2101572Z args.clear() 2023-01-11T21:38:06.2101663Z with torch.cuda.device(0): 2023-01-11T21:38:06.2101750Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.2101899Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 5304, grid=grid(5304), stream=stream0) 2023-01-11T21:38:06.2101973Z del arg1_1 2023-01-11T21:38:06.2102053Z return (arg0_1, ) 2023-01-11T21:38:06.2102058Z 2023-01-11T21:38:06.2102065Z 2023-01-11T21:38:06.2102147Z if __name__ == "__main__": 2023-01-11T21:38:06.2102262Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2102385Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2102611Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2102821Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2102941Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2102946Z 2023-01-11T21:38:06.2102951Z 2023-01-11T21:38:06.2103084Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2103159Z import torch 2023-01-11T21:38:06.2103233Z import random 2023-01-11T21:38:06.2103351Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2103474Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2103479Z 2023-01-11T21:38:06.2103559Z aten = torch.ops.aten 2023-01-11T21:38:06.2103690Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2103784Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2103789Z 2023-01-11T21:38:06.2103862Z import triton 2023-01-11T21:38:06.2103954Z import triton.language as tl 2023-01-11T21:38:06.2104078Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2104218Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2104223Z 2023-01-11T21:38:06.2104228Z 2023-01-11T21:38:06.2104382Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.2104456Z import triton 2023-01-11T21:38:06.2104545Z import triton.language as tl 2023-01-11T21:38:06.2104658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2104759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2104892Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2105016Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2105023Z 2023-01-11T21:38:06.2105492Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2105584Z @triton.jit 2023-01-11T21:38:06.2105726Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2105794Z xnumel = 5304 2023-01-11T21:38:06.2105890Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2106018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2106104Z xmask = xindex < xnumel 2023-01-11T21:38:06.2106174Z x3 = xindex 2023-01-11T21:38:06.2106248Z x0 = xindex % 221 2023-01-11T21:38:06.2106331Z x2 = (xindex // 2652) 2023-01-11T21:38:06.2106539Z tmp0 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2106741Z tmp1 = tl.load(in_ptr1 + (x0 + (221*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2106820Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2106956Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2107040Z ''') 2023-01-11T21:38:06.2107046Z 
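# Note the aliasing in call() below: arg0_1 is passed as both in_ptr0 and
# out_ptr0, so the kernel updates the permuted tensor in place and returns
# arg0_1 itself; 'mutated_arg_names' in the @pointwise metadata above records
# exactly this pair.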
2023-01-11T21:38:06.2107050Z 2023-01-11T21:38:06.2107143Z async_compile.wait(globals()) 2023-01-11T21:38:06.2107219Z del async_compile 2023-01-11T21:38:06.2107224Z 2023-01-11T21:38:06.2107292Z def call(args): 2023-01-11T21:38:06.2107371Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2107445Z args.clear() 2023-01-11T21:38:06.2107537Z with torch.cuda.device(0): 2023-01-11T21:38:06.2107632Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2107780Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 5304, grid=grid(5304), stream=stream0) 2023-01-11T21:38:06.2107853Z del arg1_1 2023-01-11T21:38:06.2107926Z return (arg0_1, ) 2023-01-11T21:38:06.2107938Z 2023-01-11T21:38:06.2107942Z 2023-01-11T21:38:06.2108018Z if __name__ == "__main__": 2023-01-11T21:38:06.2108138Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2108263Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2108489Z arg0_1 = rand_strided((2, 13, 12, 17), (2652, 17, 221, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2108704Z arg1_1 = rand_strided((2, 13, 1, 17), (221, 17, 17, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2108824Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2108829Z 2023-01-11T21:38:06.2108899Z ok (0.192s) 2023-01-11T21:38:06.2109353Z test_addmm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2109512Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2109761Z [2023-01-11 21:33:56,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 386 2023-01-11T21:38:06.2110025Z [2023-01-11 21:33:56,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 386 2023-01-11T21:38:06.2110440Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2110576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2110831Z [2023-01-11 21:33:56,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 387 2023-01-11T21:38:06.2110840Z 2023-01-11T21:38:06.2110938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2111012Z import torch 2023-01-11T21:38:06.2111086Z import random 2023-01-11T21:38:06.2111204Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2111321Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2111326Z 2023-01-11T21:38:06.2111406Z aten = torch.ops.aten 2023-01-11T21:38:06.2111542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2111636Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2111641Z 2023-01-11T21:38:06.2111714Z import triton 2023-01-11T21:38:06.2111811Z import triton.language as tl 2023-01-11T21:38:06.2111935Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2112074Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2112080Z 2023-01-11T21:38:06.2112084Z 2023-01-11T21:38:06.2112232Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2112334Z import triton 2023-01-11T21:38:06.2112428Z import triton.language as tl 2023-01-11T21:38:06.2112542Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2112644Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2112777Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2112902Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2112908Z 2023-01-11T21:38:06.2113312Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2113381Z @triton.jit 2023-01-11T21:38:06.2113513Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2113585Z xnumel = 64 2023-01-11T21:38:06.2113682Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2113813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2113896Z xmask = xindex < xnumel 2023-01-11T21:38:06.2113965Z x0 = xindex 2023-01-11T21:38:06.2114055Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2114125Z tmp1 = 1 2023-01-11T21:38:06.2114203Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2114337Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2114422Z ''') 2023-01-11T21:38:06.2114428Z 2023-01-11T21:38:06.2114432Z 2023-01-11T21:38:06.2114589Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2114663Z import triton 2023-01-11T21:38:06.2114756Z import triton.language as tl 2023-01-11T21:38:06.2114908Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2115008Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2115151Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2115295Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2115302Z 2023-01-11T21:38:06.2115731Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2115805Z @triton.jit 2023-01-11T21:38:06.2115937Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2116010Z xnumel = 64 2023-01-11T21:38:06.2116099Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2116228Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2116311Z xmask = xindex < xnumel 2023-01-11T21:38:06.2116385Z x0 = xindex 2023-01-11T21:38:06.2116480Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2116550Z tmp1 = 2 2023-01-11T21:38:06.2116622Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2116754Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2116840Z ''') 2023-01-11T21:38:06.2116847Z 2023-01-11T21:38:06.2116853Z 2023-01-11T21:38:06.2117009Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2117086Z import triton 2023-01-11T21:38:06.2117179Z import triton.language as tl 2023-01-11T21:38:06.2117291Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2117392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2117517Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2117639Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2117644Z 2023-01-11T21:38:06.2118045Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2118121Z @triton.jit 2023-01-11T21:38:06.2118252Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2118325Z xnumel = 64 2023-01-11T21:38:06.2118452Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2118582Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2118658Z xmask = xindex < xnumel 2023-01-11T21:38:06.2118726Z x0 = xindex 2023-01-11T21:38:06.2118822Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2118893Z tmp1 = 3 2023-01-11T21:38:06.2118971Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2119104Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2119189Z ''') 2023-01-11T21:38:06.2119195Z 2023-01-11T21:38:06.2119199Z 2023-01-11T21:38:06.2119348Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.2119426Z import triton 2023-01-11T21:38:06.2119518Z import triton.language as tl 2023-01-11T21:38:06.2119631Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2119733Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2119864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2119992Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2119997Z 2023-01-11T21:38:06.2120396Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2120463Z @triton.jit 2023-01-11T21:38:06.2120587Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2120659Z xnumel = 64 2023-01-11T21:38:06.2120755Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2120883Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2120996Z xmask = xindex < xnumel 
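    # Single in_out_ptr0 argument: this epilogue adds the constant in place to
    # the addmm result; the wrapper below reuses the buffer (buf4 = buf3) rather
    # than allocating a new output.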
2023-01-11T21:38:06.2121066Z x0 = xindex 2023-01-11T21:38:06.2121161Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2121232Z tmp1 = 4 2023-01-11T21:38:06.2121310Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2121449Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2121534Z ''') 2023-01-11T21:38:06.2121539Z 2023-01-11T21:38:06.2121544Z 2023-01-11T21:38:06.2121638Z async_compile.wait(globals()) 2023-01-11T21:38:06.2121715Z del async_compile 2023-01-11T21:38:06.2121720Z 2023-01-11T21:38:06.2121792Z def call(args): 2023-01-11T21:38:06.2121873Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2121948Z args.clear() 2023-01-11T21:38:06.2122040Z with torch.cuda.device(0): 2023-01-11T21:38:06.2122241Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2122334Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2122474Z triton_fused_add_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2122547Z del arg0_1 2023-01-11T21:38:06.2122738Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2122880Z triton_fused_add_1_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2122953Z del arg1_1 2023-01-11T21:38:06.2123148Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2123284Z triton_fused_add_2_2.run(arg2_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2123356Z del arg2_1 2023-01-11T21:38:06.2123546Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2123671Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:06.2123745Z del buf0 2023-01-11T21:38:06.2123814Z del buf1 2023-01-11T21:38:06.2123886Z del buf2 2023-01-11T21:38:06.2123976Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2124106Z triton_fused_add_3_3.run(buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2124183Z return (buf4, ) 2023-01-11T21:38:06.2124189Z 2023-01-11T21:38:06.2124193Z 2023-01-11T21:38:06.2124273Z if __name__ == "__main__": 2023-01-11T21:38:06.2124410Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2124538Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2124737Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2124935Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2125130Z arg2_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2125255Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2125261Z 2023-01-11T21:38:06.2125553Z [2023-01-11 21:33:57,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 387 2023-01-11T21:38:06.2125564Z 2023-01-11T21:38:06.2125683Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2125751Z import torch 2023-01-11T21:38:06.2125825Z import random 2023-01-11T21:38:06.2125944Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2126072Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2126077Z 2023-01-11T21:38:06.2126159Z aten = torch.ops.aten 2023-01-11T21:38:06.2126295Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2126391Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2126397Z 2023-01-11T21:38:06.2126468Z import triton 2023-01-11T21:38:06.2126555Z import 
triton.language as tl 2023-01-11T21:38:06.2126679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2126817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2126822Z 2023-01-11T21:38:06.2126857Z 2023-01-11T21:38:06.2127011Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2127085Z import triton 2023-01-11T21:38:06.2127178Z import triton.language as tl 2023-01-11T21:38:06.2127290Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2127384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2127518Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2127641Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2127646Z 2023-01-11T21:38:06.2128048Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2128123Z @triton.jit 2023-01-11T21:38:06.2128255Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2128328Z xnumel = 64 2023-01-11T21:38:06.2128423Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2128547Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2128630Z xmask = xindex < xnumel 2023-01-11T21:38:06.2128699Z x0 = xindex 2023-01-11T21:38:06.2128816Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2128886Z tmp1 = 1 2023-01-11T21:38:06.2128967Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2129103Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2129181Z ''') 2023-01-11T21:38:06.2129186Z 2023-01-11T21:38:06.2129197Z 2023-01-11T21:38:06.2129347Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2129422Z import triton 2023-01-11T21:38:06.2129513Z import triton.language as tl 2023-01-11T21:38:06.2129628Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2129730Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2129862Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2129987Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2129992Z 2023-01-11T21:38:06.2130386Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2130490Z @triton.jit 2023-01-11T21:38:06.2130621Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2130694Z xnumel = 64 2023-01-11T21:38:06.2130790Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2130918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2131001Z xmask = xindex < xnumel 2023-01-11T21:38:06.2131064Z x0 = xindex 2023-01-11T21:38:06.2131180Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2131251Z tmp1 = 2 2023-01-11T21:38:06.2131331Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2131463Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2131550Z ''') 2023-01-11T21:38:06.2131556Z 2023-01-11T21:38:06.2131560Z 2023-01-11T21:38:06.2131715Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2131788Z import triton 2023-01-11T21:38:06.2131874Z import 
triton.language as tl 2023-01-11T21:38:06.2131990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2132091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2132224Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2132350Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2132355Z 2023-01-11T21:38:06.2132755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2132828Z @triton.jit 2023-01-11T21:38:06.2132957Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2133053Z xnumel = 64 2023-01-11T21:38:06.2133152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2133281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2133366Z xmask = xindex < xnumel 2023-01-11T21:38:06.2133438Z x0 = xindex 2023-01-11T21:38:06.2133554Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2133624Z tmp1 = 3 2023-01-11T21:38:06.2133695Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2133826Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2133911Z ''') 2023-01-11T21:38:06.2133916Z 2023-01-11T21:38:06.2133921Z 2023-01-11T21:38:06.2134077Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.2134150Z import triton 2023-01-11T21:38:06.2134242Z import triton.language as tl 2023-01-11T21:38:06.2134354Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2134451Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2134700Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2134826Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2134831Z 2023-01-11T21:38:06.2135236Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2135309Z @triton.jit 2023-01-11T21:38:06.2135435Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2135508Z xnumel = 64 2023-01-11T21:38:06.2135604Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2135725Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2135808Z xmask = xindex < xnumel 2023-01-11T21:38:06.2135879Z x0 = xindex 2023-01-11T21:38:06.2135999Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2136074Z tmp1 = 4 2023-01-11T21:38:06.2136150Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2136287Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2136365Z ''') 2023-01-11T21:38:06.2136370Z 2023-01-11T21:38:06.2136381Z 2023-01-11T21:38:06.2136467Z async_compile.wait(globals()) 2023-01-11T21:38:06.2136588Z del async_compile 2023-01-11T21:38:06.2136594Z 2023-01-11T21:38:06.2136669Z def call(args): 2023-01-11T21:38:06.2136757Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2136832Z args.clear() 2023-01-11T21:38:06.2136924Z with torch.cuda.device(0): 2023-01-11T21:38:06.2137167Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2137267Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2137420Z triton_fused_add_0.run(arg0_1, buf0, 64, 
grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2137494Z del arg0_1 2023-01-11T21:38:06.2137695Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2137836Z triton_fused_add_1_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2137911Z del arg1_1 2023-01-11T21:38:06.2138104Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2138235Z triton_fused_add_2_2.run(arg2_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2138308Z del arg2_1 2023-01-11T21:38:06.2138501Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2138631Z aten.addmm.out(buf0, buf1, buf2, beta=1, alpha=1, out=buf3) 2023-01-11T21:38:06.2138703Z del buf0 2023-01-11T21:38:06.2138772Z del buf1 2023-01-11T21:38:06.2138840Z del buf2 2023-01-11T21:38:06.2138924Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2139055Z triton_fused_add_3_3.run(buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2139177Z return (buf4, ) 2023-01-11T21:38:06.2139182Z 2023-01-11T21:38:06.2139187Z 2023-01-11T21:38:06.2139265Z if __name__ == "__main__": 2023-01-11T21:38:06.2139382Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2139506Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2139708Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2139903Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2140093Z arg2_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2140221Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2140227Z 2023-01-11T21:38:06.2140296Z ok (0.444s) 2023-01-11T21:38:06.2140757Z test_alexnet_prefix_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2140888Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2141149Z [2023-01-11 21:33:57,096] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 388 2023-01-11T21:38:06.2141362Z [2023-01-11 21:33:57,185] torch._inductor.scheduler: [DEBUG] removed dead node: buf3 2023-01-11T21:38:06.2141623Z [2023-01-11 21:33:57,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 388 2023-01-11T21:38:06.2142039Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2142171Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2142424Z [2023-01-11 21:33:57,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 389 2023-01-11T21:38:06.2142660Z [2023-01-11 21:33:57,438] torch._inductor.scheduler: [DEBUG] removed dead node: buf3 2023-01-11T21:38:06.2142674Z 2023-01-11T21:38:06.2142766Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2142840Z import torch 2023-01-11T21:38:06.2142913Z import random 2023-01-11T21:38:06.2143032Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2143155Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2143161Z 2023-01-11T21:38:06.2143242Z aten = torch.ops.aten 2023-01-11T21:38:06.2143378Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2143468Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2143475Z 2023-01-11T21:38:06.2143548Z import triton 2023-01-11T21:38:06.2143640Z import triton.language as tl 2023-01-11T21:38:06.2143765Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2143907Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2143912Z 2023-01-11T21:38:06.2143917Z 2023-01-11T21:38:06.2144099Z triton_fused_convolution_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.2144174Z import triton 2023-01-11T21:38:06.2144266Z import triton.language as tl 2023-01-11T21:38:06.2144372Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2144476Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2144609Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2144733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2144738Z 2023-01-11T21:38:06.2145156Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2145265Z @triton.jit 2023-01-11T21:38:06.2145399Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2145474Z xnumel = 3097600 2023-01-11T21:38:06.2145567Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2145693Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2145774Z xmask = xindex < xnumel 2023-01-11T21:38:06.2145845Z x3 = xindex 2023-01-11T21:38:06.2145930Z x1 = (xindex // 3025) % 64 2023-01-11T21:38:06.2146038Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2146132Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.2146205Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2146322Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.2146460Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2146548Z ''') 2023-01-11T21:38:06.2146553Z 2023-01-11T21:38:06.2146558Z 2023-01-11T21:38:06.2146717Z triton_fused_getitem_1 = async_compile.triton(''' 2023-01-11T21:38:06.2146789Z import triton 2023-01-11T21:38:06.2146880Z import triton.language as tl 2023-01-11T21:38:06.2146990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2147090Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2147222Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2147346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2147351Z 2023-01-11T21:38:06.2147758Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2147832Z @triton.jit 2023-01-11T21:38:06.2147964Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2148042Z xnumel = 746496 2023-01-11T21:38:06.2148132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2148259Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2148341Z xmask = xindex < xnumel 2023-01-11T21:38:06.2148416Z x0 = xindex % 27 2023-01-11T21:38:06.2148526Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2148608Z x2 = (xindex // 729) 2023-01-11T21:38:06.2148678Z x3 = xindex 2023-01-11T21:38:06.2148791Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2148909Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149028Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149146Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149263Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149381Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149500Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149617Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149728Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2149864Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.2150000Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.2150131Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.2150261Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.2150394Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.2150536Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.2150702Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.2150828Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.2150958Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.2151049Z ''') 2023-01-11T21:38:06.2151055Z 2023-01-11T21:38:06.2151059Z 2023-01-11T21:38:06.2151153Z async_compile.wait(globals()) 2023-01-11T21:38:06.2151231Z del async_compile 2023-01-11T21:38:06.2151237Z 2023-01-11T21:38:06.2151313Z def call(args): 2023-01-11T21:38:06.2151399Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2151475Z args.clear() 2023-01-11T21:38:06.2151561Z with torch.cuda.device(0): 2023-01-11T21:38:06.2151704Z buf0 = aten.convolution(arg2_1, arg1_1, None, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2151824Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:06.2151900Z del arg1_1 2023-01-11T21:38:06.2151975Z del arg2_1 2023-01-11T21:38:06.2152103Z buf1 = 
as_strided(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)); del buf0 # reuse 2023-01-11T21:38:06.2152196Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2152349Z triton_fused_convolution_relu_0.run(buf1, arg0_1, 3097600, grid=grid(3097600), stream=stream0) 2023-01-11T21:38:06.2152422Z del arg0_1 2023-01-11T21:38:06.2152652Z buf2 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2152796Z triton_fused_getitem_1.run(buf1, buf2, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2152877Z return (buf2, ) 2023-01-11T21:38:06.2152882Z 2023-01-11T21:38:06.2152887Z 2023-01-11T21:38:06.2152965Z if __name__ == "__main__": 2023-01-11T21:38:06.2153083Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2153212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2153407Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2153630Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2153891Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2154021Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2154026Z 2023-01-11T21:38:06.2154290Z [2023-01-11 21:33:57,574] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 389 2023-01-11T21:38:06.2154296Z 2023-01-11T21:38:06.2154394Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2154472Z import torch 2023-01-11T21:38:06.2154548Z import random 2023-01-11T21:38:06.2154660Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2154785Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2154794Z 2023-01-11T21:38:06.2154875Z aten = torch.ops.aten 2023-01-11T21:38:06.2155009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2155104Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2155110Z 2023-01-11T21:38:06.2155184Z import triton 2023-01-11T21:38:06.2155277Z import triton.language as tl 2023-01-11T21:38:06.2155405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2155536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2155541Z 2023-01-11T21:38:06.2155552Z 2023-01-11T21:38:06.2155724Z triton_fused_convolution_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.2155797Z import triton 2023-01-11T21:38:06.2155889Z import triton.language as tl 2023-01-11T21:38:06.2156003Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2156102Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2156233Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2156356Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2156407Z 2023-01-11T21:38:06.2156825Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2156901Z @triton.jit 2023-01-11T21:38:06.2157034Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2157111Z xnumel = 3097600 2023-01-11T21:38:06.2157210Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2157337Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.2157424Z xmask = xindex < xnumel 2023-01-11T21:38:06.2157495Z x3 = xindex 2023-01-11T21:38:06.2157570Z x1 = (xindex // 3025) % 64 2023-01-11T21:38:06.2157690Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.2157807Z tmp1 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.2157890Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2158005Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.2158141Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2158227Z ''') 2023-01-11T21:38:06.2158233Z 2023-01-11T21:38:06.2158238Z 2023-01-11T21:38:06.2158392Z triton_fused_getitem_1 = async_compile.triton(''' 2023-01-11T21:38:06.2158466Z import triton 2023-01-11T21:38:06.2158557Z import triton.language as tl 2023-01-11T21:38:06.2158671Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2158771Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2158903Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2159029Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2159034Z 2023-01-11T21:38:06.2159444Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2159514Z @triton.jit 2023-01-11T21:38:06.2159648Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2159723Z xnumel = 746496 2023-01-11T21:38:06.2159855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2159983Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2160063Z xmask = xindex < xnumel 2023-01-11T21:38:06.2160139Z x0 = xindex % 27 2023-01-11T21:38:06.2160215Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2160294Z x2 = (xindex // 729) 2023-01-11T21:38:06.2160363Z x3 = xindex 2023-01-11T21:38:06.2160499Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160633Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160765Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2160901Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161032Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161158Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161289Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161420Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161546Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2161680Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.2161813Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.2161944Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.2162101Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.2162230Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 
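# [editor's note, annotation only, not part of the logged kernel] The nine
# tl.load calls above fetch a 3x3 input window: offsets step by 1 along a row
# and by 55 (the row stride of the 55x55 conv output), while 2*x0 and 110*x1
# advance the output with stride 2 in each dimension. Each
# tl.where(a != a, a, tl.where(a > b, a, b)) is a NaN-propagating max, so this
# chain of tl.where reductions computes max_pool2d(kernel_size=3, stride=2)
# one window element at a time.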
2023-01-11T21:38:06.2162376Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.2162513Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.2162645Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.2162776Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.2162866Z ''') 2023-01-11T21:38:06.2162871Z 2023-01-11T21:38:06.2162876Z 2023-01-11T21:38:06.2162969Z async_compile.wait(globals()) 2023-01-11T21:38:06.2163046Z del async_compile 2023-01-11T21:38:06.2163051Z 2023-01-11T21:38:06.2163118Z def call(args): 2023-01-11T21:38:06.2163209Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2163285Z args.clear() 2023-01-11T21:38:06.2163379Z with torch.cuda.device(0): 2023-01-11T21:38:06.2163523Z buf0 = aten.convolution(arg2_1, arg1_1, None, (4, 4), (2, 2), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2163644Z assert_size_stride(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)) 2023-01-11T21:38:06.2163718Z del arg1_1 2023-01-11T21:38:06.2163784Z del arg2_1 2023-01-11T21:38:06.2163911Z buf1 = as_strided(buf0, (16, 64, 55, 55), (193600, 3025, 55, 1)); del buf0 # reuse 2023-01-11T21:38:06.2164004Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2164163Z triton_fused_convolution_relu_0.run(buf1, arg0_1, 3097600, grid=grid(3097600), stream=stream0) 2023-01-11T21:38:06.2164238Z del arg0_1 2023-01-11T21:38:06.2164468Z buf2 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2164609Z triton_fused_getitem_1.run(buf1, buf2, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2164689Z return (buf2, ) 2023-01-11T21:38:06.2164696Z 2023-01-11T21:38:06.2164700Z 2023-01-11T21:38:06.2164780Z if __name__ == "__main__": 2023-01-11T21:38:06.2164890Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2165042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2165247Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165471Z arg1_1 = rand_strided((64, 3, 11, 11), (363, 121, 11, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165701Z arg2_1 = rand_strided((16, 3, 224, 224), (150528, 50176, 224, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2165829Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2165834Z 2023-01-11T21:38:06.2165906Z ok (0.560s) 2023-01-11T21:38:06.2166356Z test_any_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2166493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2166745Z [2023-01-11 21:33:57,612] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 390 2023-01-11T21:38:06.2167008Z [2023-01-11 21:33:57,914] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 390 2023-01-11T21:38:06.2167423Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2167580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2167836Z [2023-01-11 21:33:57,947] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 391 2023-01-11T21:38:06.2167841Z 2023-01-11T21:38:06.2167942Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2168018Z import torch 2023-01-11T21:38:06.2168091Z import random 2023-01-11T21:38:06.2168210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2168327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2168332Z 2023-01-11T21:38:06.2168416Z aten = torch.ops.aten 2023-01-11T21:38:06.2168551Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2168646Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2168651Z 2023-01-11T21:38:06.2168725Z import triton 2023-01-11T21:38:06.2168816Z import triton.language as tl 2023-01-11T21:38:06.2168946Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2169085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2169091Z 2023-01-11T21:38:06.2169095Z 2023-01-11T21:38:06.2169366Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2169443Z import triton 2023-01-11T21:38:06.2169535Z import triton.language as tl 2023-01-11T21:38:06.2169648Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2169749Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2169879Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2170002Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2170007Z 2023-01-11T21:38:06.2170099Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.2170206Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2170292Z filename=__file__, 2023-01-11T21:38:06.2170717Z meta={'signature': {0: '*i1', 1: '*i1', 2: '*fp32', 3: '*i1', 4: '*i1', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2170790Z @triton.jit 2023-01-11T21:38:06.2171018Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2171094Z xnumel = 1 2023-01-11T21:38:06.2171169Z rnumel = 64 2023-01-11T21:38:06.2171266Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2171394Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2171476Z xmask = xindex < xnumel 2023-01-11T21:38:06.2171593Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2171706Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2171817Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2171923Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2172034Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2172131Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2172220Z rindex = roffset + rbase 2023-01-11T21:38:06.2172308Z rmask = rindex < rnumel 
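# [editor's note] Inductor's reduction template tiles the iteration space as
# [XBLOCK, RBLOCK]: each _tmpN accumulator starts as tl.int1 zeros (False),
# and every trip of this loop folds in RBLOCK more elements, with
# tl.where(mask & (_acc < val), val, _acc) acting as a boolean maximum, i.e.
# a running logical OR, which is how torch.any() lowers here.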
2023-01-11T21:38:06.2172381Z r0 = rindex 2023-01-11T21:38:06.2172579Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2172679Z tmp9 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2172760Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2172882Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2172981Z tmp3 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2173108Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.2173187Z tmp5 = tmp3 == 0 2023-01-11T21:38:06.2173304Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.2173384Z tmp7 = (tmp6 != 0) 2023-01-11T21:38:06.2173508Z _tmp8 = tl.where(xmask & rmask & (_tmp8 < tmp7), tmp7, _tmp8) 2023-01-11T21:38:06.2173603Z tmp10 = tl.libdevice.isinf(tmp9) 2023-01-11T21:38:06.2173683Z tmp11 = tmp10 == 0 2023-01-11T21:38:06.2173765Z tmp12 = tmp11 == 0 2023-01-11T21:38:06.2173898Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.2174012Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174144Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2174258Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174383Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.2174621Z tmp8 = tl.reshape(tl.max(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174739Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2174822Z tmp14 = tmp8 == 0 2023-01-11T21:38:06.2174900Z tmp15 = tmp13 == 0 2023-01-11T21:38:06.2175037Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, None) 2023-01-11T21:38:06.2175197Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, None) 2023-01-11T21:38:06.2175287Z ''') 2023-01-11T21:38:06.2175297Z 2023-01-11T21:38:06.2175303Z 2023-01-11T21:38:06.2175406Z async_compile.wait(globals()) 2023-01-11T21:38:06.2175486Z del async_compile 2023-01-11T21:38:06.2175491Z 2023-01-11T21:38:06.2175568Z def call(args): 2023-01-11T21:38:06.2175642Z arg0_1, = args 2023-01-11T21:38:06.2175717Z args.clear() 2023-01-11T21:38:06.2175817Z with torch.cuda.device(0): 2023-01-11T21:38:06.2176027Z buf0 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176200Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176379Z buf2 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176556Z buf3 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2176645Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2176736Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2176826Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2177086Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0.run(buf4, buf5, arg0_1, buf0, buf1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2177229Z del arg0_1 2023-01-11T21:38:06.2177319Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:06.2177324Z 2023-01-11T21:38:06.2177338Z 2023-01-11T21:38:06.2177411Z if __name__ == "__main__": 2023-01-11T21:38:06.2177528Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2177656Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2177856Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2177972Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2177977Z 
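[editor's note] The module above is the complete generated wrapper for graph 391 (test_any_cuda, float32): a single fused reduction kernel produces all four boolean outputs in one launch. The test source itself is not in the log; reading the kernel body back, the compiled function is plausibly something like the hypothetical sketch below (fn and x are illustrative names, not taken from the log):

import torch

def fn(x):  # hypothetical reconstruction from the fused kernel name and body
    i = torch.isinf(x)
    # buf0: x.any(); buf1: i.any(); buf4: not any(~i), i.e. i.all();
    # buf5: not any(i)
    return x.any(), i.any(), i.all(), torch.logical_not(i.any())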
2023-01-11T21:38:06.2178240Z [2023-01-11 21:33:58,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 391 2023-01-11T21:38:06.2178663Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2178795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2179056Z [2023-01-11 21:33:58,156] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 392 2023-01-11T21:38:06.2179062Z 2023-01-11T21:38:06.2179152Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2179264Z import torch 2023-01-11T21:38:06.2179338Z import random 2023-01-11T21:38:06.2179458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2179584Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2179589Z 2023-01-11T21:38:06.2179671Z aten = torch.ops.aten 2023-01-11T21:38:06.2179808Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2179897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2179912Z 2023-01-11T21:38:06.2179979Z import triton 2023-01-11T21:38:06.2180077Z import triton.language as tl 2023-01-11T21:38:06.2180203Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2180344Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2180349Z 2023-01-11T21:38:06.2180354Z 2023-01-11T21:38:06.2180630Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2180705Z import triton 2023-01-11T21:38:06.2180799Z import triton.language as tl 2023-01-11T21:38:06.2180906Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2181005Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2181139Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2181263Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2181271Z 2023-01-11T21:38:06.2181358Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.2181473Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2181557Z filename=__file__, 2023-01-11T21:38:06.2181984Z meta={'signature': {0: '*i1', 1: '*i1', 2: '*fp16', 3: '*i1', 4: '*i1', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2182052Z @triton.jit 2023-01-11T21:38:06.2182251Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2182328Z xnumel = 1 2023-01-11T21:38:06.2182403Z rnumel = 64 2023-01-11T21:38:06.2182499Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2182635Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2182745Z xmask = xindex < xnumel 2023-01-11T21:38:06.2182863Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2182970Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183080Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183187Z _tmp8 = tl.zeros([XBLOCK, 
RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183300Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2183406Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2183495Z rindex = roffset + rbase 2023-01-11T21:38:06.2183582Z rmask = rindex < rnumel 2023-01-11T21:38:06.2183651Z r0 = rindex 2023-01-11T21:38:06.2183869Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2183991Z tmp9 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2184074Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2184204Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2184305Z tmp3 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2184431Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.2184504Z tmp5 = tmp3 == 0 2023-01-11T21:38:06.2184593Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.2184675Z tmp7 = (tmp6 != 0) 2023-01-11T21:38:06.2184801Z _tmp8 = tl.where(xmask & rmask & (_tmp8 < tmp7), tmp7, _tmp8) 2023-01-11T21:38:06.2184903Z tmp10 = tl.libdevice.isinf(tmp9) 2023-01-11T21:38:06.2184983Z tmp11 = tmp10 == 0 2023-01-11T21:38:06.2185063Z tmp12 = tmp11 == 0 2023-01-11T21:38:06.2185216Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.2185332Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185488Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2185622Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185757Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.2185868Z tmp8 = tl.reshape(tl.max(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2185982Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2186053Z tmp14 = tmp8 == 0 2023-01-11T21:38:06.2186132Z tmp15 = tmp13 == 0 2023-01-11T21:38:06.2186270Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, None) 2023-01-11T21:38:06.2186403Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, None) 2023-01-11T21:38:06.2186493Z ''') 2023-01-11T21:38:06.2186499Z 2023-01-11T21:38:06.2186507Z 2023-01-11T21:38:06.2186600Z async_compile.wait(globals()) 2023-01-11T21:38:06.2186681Z del async_compile 2023-01-11T21:38:06.2186686Z 2023-01-11T21:38:06.2186760Z def call(args): 2023-01-11T21:38:06.2186827Z arg0_1, = args 2023-01-11T21:38:06.2186902Z args.clear() 2023-01-11T21:38:06.2186994Z with torch.cuda.device(0): 2023-01-11T21:38:06.2187181Z buf0 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187360Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187535Z buf2 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187710Z buf3 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2187801Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2187885Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2187978Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2188195Z triton_fused_any_1_any_2_any_3_any_4_isinf_isinf_1_isinf_2_logical_not_logical_not_1_logical_not_2_0.run(buf4, buf5, arg0_1, buf0, buf1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2188272Z del arg0_1 2023-01-11T21:38:06.2188366Z return (buf0, buf1, buf4, buf5, ) 2023-01-11T21:38:06.2188372Z 2023-01-11T21:38:06.2188376Z 2023-01-11T21:38:06.2188484Z if __name__ == "__main__": 2023-01-11T21:38:06.2188600Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.2188730Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2188922Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2189033Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2189038Z 2023-01-11T21:38:06.2189300Z [2023-01-11 21:33:58,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 392 2023-01-11T21:38:06.2189717Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2189852Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2190108Z [2023-01-11 21:33:58,458] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 393 2023-01-11T21:38:06.2190113Z 2023-01-11T21:38:06.2190211Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2190288Z import torch 2023-01-11T21:38:06.2190362Z import random 2023-01-11T21:38:06.2190475Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2190599Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2190604Z 2023-01-11T21:38:06.2190685Z aten = torch.ops.aten 2023-01-11T21:38:06.2190821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2190946Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2190951Z 2023-01-11T21:38:06.2191026Z import triton 2023-01-11T21:38:06.2191118Z import triton.language as tl 2023-01-11T21:38:06.2191236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2191377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2191385Z 2023-01-11T21:38:06.2191389Z 2023-01-11T21:38:06.2191550Z triton_fused_any_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2191624Z import triton 2023-01-11T21:38:06.2191716Z import triton.language as tl 2023-01-11T21:38:06.2191831Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2191934Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2192066Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2192183Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2192188Z 2023-01-11T21:38:06.2192278Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.2192395Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2192479Z filename=__file__, 2023-01-11T21:38:06.2192838Z meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2192916Z @triton.jit 2023-01-11T21:38:06.2193086Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2193160Z xnumel = 16 2023-01-11T21:38:06.2193225Z rnumel = 8 2023-01-11T21:38:06.2193322Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2193461Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2193543Z xmask = xindex < xnumel 2023-01-11T21:38:06.2193663Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2193732Z x0 = xindex 2023-01-11T21:38:06.2193842Z _tmp2 = 
tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2193943Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2194030Z rindex = roffset + rbase 2023-01-11T21:38:06.2194116Z rmask = rindex < rnumel 2023-01-11T21:38:06.2194187Z r1 = rindex 2023-01-11T21:38:06.2194427Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2194513Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2194641Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2194748Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2194845Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.2194930Z ''') 2023-01-11T21:38:06.2194936Z 2023-01-11T21:38:06.2194942Z 2023-01-11T21:38:06.2195205Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.2195296Z import triton 2023-01-11T21:38:06.2195398Z import triton.language as tl 2023-01-11T21:38:06.2195531Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2195632Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2195757Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2195883Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2195888Z 2023-01-11T21:38:06.2195977Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.2196091Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2196174Z filename=__file__, 2023-01-11T21:38:06.2196561Z meta={'signature': {0: '*i1', 1: '*fp32', 2: '*i1', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2196634Z @triton.jit 2023-01-11T21:38:06.2196813Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2196880Z xnumel = 1 2023-01-11T21:38:06.2197036Z rnumel = 128 2023-01-11T21:38:06.2197133Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2197268Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2197350Z xmask = xindex < xnumel 2023-01-11T21:38:06.2197468Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2197585Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2197689Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2197793Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2197879Z rindex = roffset + rbase 2023-01-11T21:38:06.2197964Z rmask = rindex < rnumel 2023-01-11T21:38:06.2198033Z r0 = rindex 2023-01-11T21:38:06.2198229Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2198330Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2198422Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2198550Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2198649Z tmp4 = tl.libdevice.isinf(tmp3) 2023-01-11T21:38:06.2198728Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.2198809Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2198939Z _tmp7 = tl.where(xmask & rmask & (_tmp7 < tmp6), tmp6, _tmp7) 2023-01-11T21:38:06.2199052Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2199177Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2199290Z tmp7 = tl.reshape(tl.max(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2199366Z tmp8 = tmp7 == 0 
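# [editor's note] Two reductions share this loop: _tmp2 accumulates
# any(isinf(x)) for out_ptr0, while _tmp7 re-derives isinf through the pair of
# == 0 negations and tmp8 then inverts the accumulated any(), producing the
# fused logical_not result that is stored to in_out_ptr0 below.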
2023-01-11T21:38:06.2199503Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.2199588Z ''') 2023-01-11T21:38:06.2199594Z 2023-01-11T21:38:06.2199598Z 2023-01-11T21:38:06.2199808Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2199890Z import triton 2023-01-11T21:38:06.2199982Z import triton.language as tl 2023-01-11T21:38:06.2200090Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2200191Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2200321Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2200475Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2200480Z 2023-01-11T21:38:06.2200570Z @reduction(size_hints=[8, 16], 2023-01-11T21:38:06.2200687Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2200770Z filename=__file__, 2023-01-11T21:38:06.2201143Z meta={'signature': {0: '*i1', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2201209Z @triton.jit 2023-01-11T21:38:06.2201377Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2201453Z xnumel = 8 2023-01-11T21:38:06.2201526Z rnumel = 16 2023-01-11T21:38:06.2201624Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2201759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2201843Z xmask = xindex < xnumel 2023-01-11T21:38:06.2201958Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2202029Z x0 = xindex 2023-01-11T21:38:06.2202140Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2202242Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2202330Z rindex = roffset + rbase 2023-01-11T21:38:06.2202417Z rmask = rindex < rnumel 2023-01-11T21:38:06.2202488Z r1 = rindex 2023-01-11T21:38:06.2202598Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.2202696Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2202775Z tmp2 = tmp1 == 0 2023-01-11T21:38:06.2202891Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.2202971Z tmp4 = (tmp3 != 0) 2023-01-11T21:38:06.2203098Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.2203212Z tmp5 = tl.reshape(tl.max(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2203282Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2203423Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2203508Z ''') 2023-01-11T21:38:06.2203514Z 2023-01-11T21:38:06.2203518Z 2023-01-11T21:38:06.2203609Z async_compile.wait(globals()) 2023-01-11T21:38:06.2203685Z del async_compile 2023-01-11T21:38:06.2203690Z 2023-01-11T21:38:06.2203763Z def call(args): 2023-01-11T21:38:06.2203836Z arg0_1, = args 2023-01-11T21:38:06.2203905Z args.clear() 2023-01-11T21:38:06.2203996Z with torch.cuda.device(0): 2023-01-11T21:38:06.2204193Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204284Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2204428Z triton_fused_any_1_0.run(arg0_1, buf0, 16, 8, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2204610Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204787Z buf4 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2204879Z buf5 = buf4; 
del buf4 # reuse 2023-01-11T21:38:06.2205069Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1.run(buf5, arg0_1, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2205259Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2205359Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2205559Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2.run(buf3, arg0_1, 8, 16, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2205643Z del arg0_1 2023-01-11T21:38:06.2205737Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.2205745Z 2023-01-11T21:38:06.2205749Z 2023-01-11T21:38:06.2205830Z if __name__ == "__main__": 2023-01-11T21:38:06.2205948Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2206066Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2206295Z arg0_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2206408Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2206413Z 2023-01-11T21:38:06.2206678Z [2023-01-11 21:33:58,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 393 2023-01-11T21:38:06.2206684Z 2023-01-11T21:38:06.2206781Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2206858Z import torch 2023-01-11T21:38:06.2206932Z import random 2023-01-11T21:38:06.2207051Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2207169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2207174Z 2023-01-11T21:38:06.2207258Z aten = torch.ops.aten 2023-01-11T21:38:06.2207393Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2207491Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2207496Z 2023-01-11T21:38:06.2207570Z import triton 2023-01-11T21:38:06.2207662Z import triton.language as tl 2023-01-11T21:38:06.2207788Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2207926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2207931Z 2023-01-11T21:38:06.2207936Z 2023-01-11T21:38:06.2208087Z triton_fused_any_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2208162Z import triton 2023-01-11T21:38:06.2208255Z import triton.language as tl 2023-01-11T21:38:06.2208366Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2208467Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2208598Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2208721Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2208754Z 2023-01-11T21:38:06.2208847Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.2208954Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2209038Z filename=__file__, 2023-01-11T21:38:06.2209398Z meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2209472Z @triton.jit 2023-01-11T21:38:06.2209639Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2209712Z xnumel = 16 2023-01-11T21:38:06.2209782Z rnumel = 8 2023-01-11T21:38:06.2209873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2210005Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2210086Z xmask = xindex < xnumel 
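# [editor's note] This second module appears to be the float16 recompile of
# the same test: the kernel signatures switch from *fp32 to *fp16 and every
# tl.load gains a .to(tl.float32) upcast, while the indexing and reduction
# logic stay identical to the float32 module above.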
2023-01-11T21:38:06.2210204Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2210277Z x0 = xindex 2023-01-11T21:38:06.2210389Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2210495Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2210577Z rindex = roffset + rbase 2023-01-11T21:38:06.2210662Z rmask = rindex < rnumel 2023-01-11T21:38:06.2210734Z r1 = rindex 2023-01-11T21:38:06.2210971Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2211056Z tmp1 = (tmp0 != 0) 2023-01-11T21:38:06.2211182Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2211295Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2211385Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.2211470Z ''') 2023-01-11T21:38:06.2211475Z 2023-01-11T21:38:06.2211480Z 2023-01-11T21:38:06.2211730Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.2211810Z import triton 2023-01-11T21:38:06.2211903Z import triton.language as tl 2023-01-11T21:38:06.2212017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2212117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2212274Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2212394Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2212406Z 2023-01-11T21:38:06.2212489Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.2212607Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2212692Z filename=__file__, 2023-01-11T21:38:06.2213076Z meta={'signature': {0: '*i1', 1: '*fp16', 2: '*i1', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2213150Z @triton.jit 2023-01-11T21:38:06.2213331Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2213405Z xnumel = 1 2023-01-11T21:38:06.2213472Z rnumel = 128 2023-01-11T21:38:06.2213569Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2213707Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2213795Z xmask = xindex < xnumel 2023-01-11T21:38:06.2213913Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2214025Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2214134Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2214231Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2214321Z rindex = roffset + rbase 2023-01-11T21:38:06.2214406Z rmask = rindex < rnumel 2023-01-11T21:38:06.2214588Z r0 = rindex 2023-01-11T21:38:06.2214808Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2214973Z tmp3 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2215074Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2215194Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.2215294Z tmp4 = tl.libdevice.isinf(tmp3) 2023-01-11T21:38:06.2215376Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.2215455Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2215582Z _tmp7 = tl.where(xmask & rmask & (_tmp7 < tmp6), tmp6, _tmp7) 2023-01-11T21:38:06.2215695Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2215827Z 
tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.2215945Z tmp7 = tl.reshape(tl.max(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2216015Z tmp8 = tmp7 == 0 2023-01-11T21:38:06.2216150Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.2216236Z ''') 2023-01-11T21:38:06.2216244Z 2023-01-11T21:38:06.2216249Z 2023-01-11T21:38:06.2216460Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2216535Z import triton 2023-01-11T21:38:06.2216627Z import triton.language as tl 2023-01-11T21:38:06.2216741Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2216838Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2216970Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2217095Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2217100Z 2023-01-11T21:38:06.2217247Z @reduction(size_hints=[8, 16], 2023-01-11T21:38:06.2217367Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2217450Z filename=__file__, 2023-01-11T21:38:06.2217822Z meta={'signature': {0: '*i1', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2217900Z @triton.jit 2023-01-11T21:38:06.2218063Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2218137Z xnumel = 8 2023-01-11T21:38:06.2218209Z rnumel = 16 2023-01-11T21:38:06.2218304Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2218472Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2218556Z xmask = xindex < xnumel 2023-01-11T21:38:06.2218673Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2218737Z x0 = xindex 2023-01-11T21:38:06.2218848Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.2218951Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2219038Z rindex = roffset + rbase 2023-01-11T21:38:06.2219124Z rmask = rindex < rnumel 2023-01-11T21:38:06.2219196Z r1 = rindex 2023-01-11T21:38:06.2219325Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2219423Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.2219499Z tmp2 = tmp1 == 0 2023-01-11T21:38:06.2219587Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.2219667Z tmp4 = (tmp3 != 0) 2023-01-11T21:38:06.2219797Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.2219909Z tmp5 = tl.reshape(tl.max(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2219986Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2220117Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2220203Z ''') 2023-01-11T21:38:06.2220209Z 2023-01-11T21:38:06.2220213Z 2023-01-11T21:38:06.2220303Z async_compile.wait(globals()) 2023-01-11T21:38:06.2220380Z del async_compile 2023-01-11T21:38:06.2220385Z 2023-01-11T21:38:06.2220461Z def call(args): 2023-01-11T21:38:06.2220533Z arg0_1, = args 2023-01-11T21:38:06.2220606Z args.clear() 2023-01-11T21:38:06.2220728Z with torch.cuda.device(0): 2023-01-11T21:38:06.2220914Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221008Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2221149Z triton_fused_any_1_0.run(arg0_1, buf0, 16, 8, grid=grid(16), stream=stream0) 
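# [editor's note] The allocation pattern below ("buf5 = buf4; del buf4
# # reuse") is inductor's in-place reuse: deleting the old name leaves a
# single live reference, so the same storage serves as the in_out_ptr
# argument of the next fused kernel instead of a freshly allocated scalar
# buffer.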
2023-01-11T21:38:06.2221331Z buf1 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221509Z buf4 = empty_strided((), (), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2221598Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.2221796Z triton_fused_any_2_any_4_isinf_isinf_2_logical_not_2_logical_not_3_logical_not_4_1.run(buf5, arg0_1, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2221988Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2222071Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.2222241Z triton_fused_any_3_isinf_1_logical_not_logical_not_1_2.run(buf3, arg0_1, 8, 16, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2222318Z del arg0_1 2023-01-11T21:38:06.2222412Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.2222418Z 2023-01-11T21:38:06.2222422Z 2023-01-11T21:38:06.2222500Z if __name__ == "__main__": 2023-01-11T21:38:06.2222620Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2222746Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2222950Z arg0_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2223056Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2223062Z 2023-01-11T21:38:06.2223133Z ok (1.078s) 2023-01-11T21:38:06.2223589Z test_arange1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2223726Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2224014Z [2023-01-11 21:33:58,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 394 2023-01-11T21:38:06.2224277Z [2023-01-11 21:33:58,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 394 2023-01-11T21:38:06.2224691Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2224821Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2225078Z [2023-01-11 21:33:58,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 395 2023-01-11T21:38:06.2225339Z [2023-01-11 21:33:59,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 395 2023-01-11T21:38:06.2225345Z 2023-01-11T21:38:06.2225441Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2225510Z import torch 2023-01-11T21:38:06.2225583Z import random 2023-01-11T21:38:06.2225727Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2225865Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2225871Z 2023-01-11T21:38:06.2225958Z aten = torch.ops.aten 2023-01-11T21:38:06.2226098Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2226194Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2226199Z 2023-01-11T21:38:06.2226269Z import triton 2023-01-11T21:38:06.2226360Z import triton.language as tl 2023-01-11T21:38:06.2226488Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2226659Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2226665Z 2023-01-11T21:38:06.2226669Z 2023-01-11T21:38:06.2226830Z triton_fused_add_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2226906Z import triton 2023-01-11T21:38:06.2226996Z import triton.language as tl 2023-01-11T21:38:06.2227112Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2227207Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2227339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2227465Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2227470Z 2023-01-11T21:38:06.2227891Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2227963Z @triton.jit 2023-01-11T21:38:06.2228109Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2228181Z xnumel = 64 2023-01-11T21:38:06.2228277Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2228401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2228491Z xmask = xindex < xnumel 2023-01-11T21:38:06.2228562Z x0 = xindex 2023-01-11T21:38:06.2228634Z x1 = xindex % 8 2023-01-11T21:38:06.2228731Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2228803Z tmp1 = x0 2023-01-11T21:38:06.2228882Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2228949Z tmp3 = 10 + x1 2023-01-11T21:38:06.2229038Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.2229114Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.2229250Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2229385Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2229474Z ''') 2023-01-11T21:38:06.2229480Z 2023-01-11T21:38:06.2229484Z 2023-01-11T21:38:06.2229576Z async_compile.wait(globals()) 2023-01-11T21:38:06.2229645Z del async_compile 2023-01-11T21:38:06.2229650Z 2023-01-11T21:38:06.2229725Z def call(args): 2023-01-11T21:38:06.2229799Z arg0_1, = args 2023-01-11T21:38:06.2229875Z args.clear() 
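# [editor's note] args.clear() empties the caller's input list so that each
# later "del argN_1" drops the last Python reference, letting the CUDA
# caching allocator reclaim input storage as soon as it is no longer needed.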
2023-01-11T21:38:06.2229996Z with torch.cuda.device(0): 2023-01-11T21:38:06.2230198Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2230398Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2230484Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2230628Z triton_fused_add_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2230701Z del arg0_1 2023-01-11T21:38:06.2230782Z return (buf0, buf1, ) 2023-01-11T21:38:06.2230787Z 2023-01-11T21:38:06.2230791Z 2023-01-11T21:38:06.2230874Z if __name__ == "__main__": 2023-01-11T21:38:06.2230993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2231121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2231320Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2231424Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2231439Z 2023-01-11T21:38:06.2231444Z 2023-01-11T21:38:06.2231534Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2231608Z import torch 2023-01-11T21:38:06.2231682Z import random 2023-01-11T21:38:06.2231803Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2231929Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2231934Z 2023-01-11T21:38:06.2232017Z aten = torch.ops.aten 2023-01-11T21:38:06.2232154Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2232244Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2232249Z 2023-01-11T21:38:06.2232352Z import triton 2023-01-11T21:38:06.2232444Z import triton.language as tl 2023-01-11T21:38:06.2232568Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2232707Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2232712Z 2023-01-11T21:38:06.2232716Z 2023-01-11T21:38:06.2232878Z triton_fused_add_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2232955Z import triton 2023-01-11T21:38:06.2233046Z import triton.language as tl 2023-01-11T21:38:06.2233154Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2233256Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2233388Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2233513Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2233518Z 2023-01-11T21:38:06.2233939Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2234016Z @triton.jit 2023-01-11T21:38:06.2234157Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2234230Z xnumel = 64 2023-01-11T21:38:06.2234320Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2234456Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2234538Z xmask = xindex < xnumel 2023-01-11T21:38:06.2234609Z x0 = xindex 2023-01-11T21:38:06.2234681Z x1 = xindex % 8 2023-01-11T21:38:06.2234800Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2234888Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.2234952Z tmp2 = x0 2023-01-11T21:38:06.2235030Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.2235103Z tmp4 = 10 + x1 2023-01-11T21:38:06.2235188Z tmp5 = tmp4.to(tl.float32) 
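# [editor's note] No arange tensor is ever read from memory here: tmp2 = x0
# and tmp4 = 10 + x1 materialize the torch.arange values straight from the
# flattened index, so the multiply and add fuse into this one pointwise
# kernel; the float16 input is upcast on load and the arithmetic runs in
# fp32.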
2023-01-11T21:38:06.2235264Z tmp6 = tmp3 + tmp5 2023-01-11T21:38:06.2235399Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2235530Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2235609Z ''') 2023-01-11T21:38:06.2235614Z 2023-01-11T21:38:06.2235619Z 2023-01-11T21:38:06.2235710Z async_compile.wait(globals()) 2023-01-11T21:38:06.2235816Z del async_compile 2023-01-11T21:38:06.2235821Z 2023-01-11T21:38:06.2235897Z def call(args): 2023-01-11T21:38:06.2235974Z arg0_1, = args 2023-01-11T21:38:06.2236047Z args.clear() 2023-01-11T21:38:06.2236139Z with torch.cuda.device(0): 2023-01-11T21:38:06.2236331Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2236528Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2236621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2236764Z triton_fused_add_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2236840Z del arg0_1 2023-01-11T21:38:06.2236924Z return (buf0, buf1, ) 2023-01-11T21:38:06.2236929Z 2023-01-11T21:38:06.2236933Z 2023-01-11T21:38:06.2237019Z if __name__ == "__main__": 2023-01-11T21:38:06.2237136Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2237256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2237454Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2237570Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2237575Z 2023-01-11T21:38:06.2237646Z ok (0.353s) 2023-01-11T21:38:06.2238102Z test_arange2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2238276Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2238539Z [2023-01-11 21:33:59,040] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 396 2023-01-11T21:38:06.2238802Z [2023-01-11 21:33:59,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 396 2023-01-11T21:38:06.2238809Z 2023-01-11T21:38:06.2238909Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2238986Z import torch 2023-01-11T21:38:06.2239054Z import random 2023-01-11T21:38:06.2239174Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2239297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2239302Z 2023-01-11T21:38:06.2239382Z aten = torch.ops.aten 2023-01-11T21:38:06.2239521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2239614Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2239620Z 2023-01-11T21:38:06.2239697Z import triton 2023-01-11T21:38:06.2239783Z import triton.language as tl 2023-01-11T21:38:06.2239907Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2240047Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2240053Z 2023-01-11T21:38:06.2240057Z 2023-01-11T21:38:06.2240215Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2240291Z import triton 2023-01-11T21:38:06.2240385Z import triton.language as tl 2023-01-11T21:38:06.2240497Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2240599Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2240724Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2240851Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2240856Z 2023-01-11T21:38:06.2241256Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2241333Z @triton.jit 2023-01-11T21:38:06.2241467Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2241539Z xnumel = 64 2023-01-11T21:38:06.2241637Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2241790Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2241868Z xmask = xindex < xnumel 2023-01-11T21:38:06.2241938Z x2 = xindex 2023-01-11T21:38:06.2242015Z x0 = xindex % 8 2023-01-11T21:38:06.2242111Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2242184Z tmp1 = x0 2023-01-11T21:38:06.2242262Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2242399Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2242477Z ''') 2023-01-11T21:38:06.2242482Z 2023-01-11T21:38:06.2242487Z 2023-01-11T21:38:06.2242578Z async_compile.wait(globals()) 2023-01-11T21:38:06.2242656Z del async_compile 2023-01-11T21:38:06.2242662Z 2023-01-11T21:38:06.2242738Z def call(args): 2023-01-11T21:38:06.2242810Z arg0_1, = args 2023-01-11T21:38:06.2242887Z args.clear() 2023-01-11T21:38:06.2242978Z with torch.cuda.device(0): 2023-01-11T21:38:06.2243170Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2243261Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2243400Z triton_fused_add_0.run(arg0_1, 
buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2243475Z del arg0_1 2023-01-11T21:38:06.2243556Z return (buf0, ) 2023-01-11T21:38:06.2243562Z 2023-01-11T21:38:06.2243566Z 2023-01-11T21:38:06.2243645Z if __name__ == "__main__": 2023-01-11T21:38:06.2243763Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2243888Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2244078Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2244218Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2244224Z 2023-01-11T21:38:06.2244298Z ok (0.098s) 2023-01-11T21:38:06.2244763Z test_arange3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2244894Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2245152Z [2023-01-11 21:33:59,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 397 2023-01-11T21:38:06.2245416Z [2023-01-11 21:33:59,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 397 2023-01-11T21:38:06.2245833Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2245972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2246226Z [2023-01-11 21:33:59,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 398 2023-01-11T21:38:06.2246487Z [2023-01-11 21:33:59,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 398 2023-01-11T21:38:06.2246493Z 2023-01-11T21:38:06.2246584Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2246659Z import torch 2023-01-11T21:38:06.2246735Z import random 2023-01-11T21:38:06.2246856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2246980Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2246988Z 2023-01-11T21:38:06.2247069Z aten = torch.ops.aten 2023-01-11T21:38:06.2247204Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2247293Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2247298Z 2023-01-11T21:38:06.2247374Z import triton 2023-01-11T21:38:06.2247495Z import triton.language as tl 2023-01-11T21:38:06.2247622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2247759Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2247765Z 2023-01-11T21:38:06.2247770Z 2023-01-11T21:38:06.2247927Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2248000Z import triton 2023-01-11T21:38:06.2248094Z import triton.language as tl 2023-01-11T21:38:06.2248202Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2248303Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2248439Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2248566Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2248571Z 2023-01-11T21:38:06.2248975Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2249046Z @triton.jit 2023-01-11T21:38:06.2249181Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2249256Z xnumel = 14 2023-01-11T21:38:06.2249346Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2249474Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2249559Z xmask = xindex < xnumel 2023-01-11T21:38:06.2249629Z x0 = xindex 2023-01-11T21:38:06.2249727Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2249804Z tmp1 = 4*x0 2023-01-11T21:38:06.2249890Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2249991Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.2250125Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2250212Z ''') 2023-01-11T21:38:06.2250218Z 2023-01-11T21:38:06.2250222Z 2023-01-11T21:38:06.2250317Z async_compile.wait(globals()) 2023-01-11T21:38:06.2250398Z del async_compile 2023-01-11T21:38:06.2250405Z 2023-01-11T21:38:06.2250482Z def call(args): 2023-01-11T21:38:06.2250556Z arg0_1, = args 2023-01-11T21:38:06.2250623Z args.clear() 2023-01-11T21:38:06.2250715Z with torch.cuda.device(0): 2023-01-11T21:38:06.2250915Z buf0 = empty_strided((14, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2251009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2251150Z triton_fused_add_0.run(arg0_1, buf0, 14, grid=grid(14), stream=stream0) 2023-01-11T21:38:06.2251224Z del arg0_1 2023-01-11T21:38:06.2251301Z return (buf0, ) 2023-01-11T21:38:06.2251306Z 2023-01-11T21:38:06.2251312Z 2023-01-11T21:38:06.2251396Z if __name__ == "__main__": 2023-01-11T21:38:06.2251507Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2251633Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2251835Z arg0_1 = rand_strided((14, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2251954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2251959Z 2023-01-11T21:38:06.2251963Z 2023-01-11T21:38:06.2252060Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2252136Z import torch 2023-01-11T21:38:06.2252210Z import random 2023-01-11T21:38:06.2252327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2252443Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2252448Z 2023-01-11T21:38:06.2252532Z aten = torch.ops.aten 2023-01-11T21:38:06.2252668Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2252764Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2252772Z 2023-01-11T21:38:06.2252850Z import triton 2023-01-11T21:38:06.2252943Z import triton.language as tl 2023-01-11T21:38:06.2253070Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2253202Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2253215Z 2023-01-11T21:38:06.2253220Z 2023-01-11T21:38:06.2253393Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2253470Z import triton 2023-01-11T21:38:06.2253565Z import triton.language as tl 2023-01-11T21:38:06.2253682Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.2253785Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2253917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2254043Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2254048Z 2023-01-11T21:38:06.2254440Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2254631Z @triton.jit 2023-01-11T21:38:06.2254763Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2254834Z xnumel = 14 2023-01-11T21:38:06.2254934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2255067Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2255150Z xmask = xindex < xnumel 2023-01-11T21:38:06.2255223Z x0 = xindex 2023-01-11T21:38:06.2255333Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2255414Z tmp1 = 4*x0 2023-01-11T21:38:06.2255520Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2255604Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.2255758Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2255847Z ''') 2023-01-11T21:38:06.2255853Z 2023-01-11T21:38:06.2255857Z 2023-01-11T21:38:06.2255952Z async_compile.wait(globals()) 2023-01-11T21:38:06.2256067Z del async_compile 2023-01-11T21:38:06.2256072Z 2023-01-11T21:38:06.2256150Z def call(args): 2023-01-11T21:38:06.2256227Z arg0_1, = args 2023-01-11T21:38:06.2256302Z args.clear() 2023-01-11T21:38:06.2256392Z with torch.cuda.device(0): 2023-01-11T21:38:06.2256596Z buf0 = empty_strided((14, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2256691Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2256818Z triton_fused_add_0.run(arg0_1, buf0, 14, grid=grid(14), stream=stream0) 2023-01-11T21:38:06.2256894Z del arg0_1 2023-01-11T21:38:06.2256971Z return (buf0, ) 2023-01-11T21:38:06.2256976Z 2023-01-11T21:38:06.2256981Z 2023-01-11T21:38:06.2257065Z if __name__ == "__main__": 2023-01-11T21:38:06.2257242Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2257372Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2257574Z arg0_1 = rand_strided((14, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2257690Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2257696Z 2023-01-11T21:38:06.2257759Z ok (0.198s) 2023-01-11T21:38:06.2258216Z test_arange4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2258350Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2258606Z [2023-01-11 21:33:59,336] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 399 2023-01-11T21:38:06.2258869Z [2023-01-11 21:33:59,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 399 2023-01-11T21:38:06.2259295Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2259463Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2259717Z [2023-01-11 21:33:59,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 400 2023-01-11T21:38:06.2259983Z [2023-01-11 21:33:59,505] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 400 2023-01-11T21:38:06.2259989Z 2023-01-11T21:38:06.2260086Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2260161Z import torch 2023-01-11T21:38:06.2260229Z import random 2023-01-11T21:38:06.2260347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2260481Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2260487Z 2023-01-11T21:38:06.2260569Z aten = torch.ops.aten 2023-01-11T21:38:06.2260705Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2260805Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2260810Z 2023-01-11T21:38:06.2260889Z import triton 2023-01-11T21:38:06.2260974Z import triton.language as tl 2023-01-11T21:38:06.2261103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2261242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2261247Z 2023-01-11T21:38:06.2261252Z 2023-01-11T21:38:06.2261407Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2261480Z import triton 2023-01-11T21:38:06.2261576Z import triton.language as tl 2023-01-11T21:38:06.2261688Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2261788Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2261914Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2262071Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2262076Z 2023-01-11T21:38:06.2262487Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2262564Z @triton.jit 2023-01-11T21:38:06.2262711Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2262788Z xnumel = 1024 2023-01-11T21:38:06.2262893Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2263032Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2263111Z xmask = xindex < xnumel 2023-01-11T21:38:06.2263184Z x0 = xindex 2023-01-11T21:38:06.2263289Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2263407Z tmp1 = 512 + ((-1)*x0) 2023-01-11T21:38:06.2263502Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2263617Z tmp3 = tmp0 - tmp2 2023-01-11T21:38:06.2263763Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2263843Z ''') 2023-01-11T21:38:06.2263849Z 2023-01-11T21:38:06.2263863Z 2023-01-11T21:38:06.2263954Z async_compile.wait(globals()) 2023-01-11T21:38:06.2264036Z del async_compile 2023-01-11T21:38:06.2264042Z 2023-01-11T21:38:06.2264118Z def call(args): 2023-01-11T21:38:06.2264191Z arg0_1, = args 2023-01-11T21:38:06.2264267Z args.clear() 2023-01-11T21:38:06.2264362Z with torch.cuda.device(0): 2023-01-11T21:38:06.2264587Z buf0 = 
empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2264673Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2264815Z triton_fused_sub_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.2264892Z del arg0_1 2023-01-11T21:38:06.2264969Z return (buf0, ) 2023-01-11T21:38:06.2264977Z 2023-01-11T21:38:06.2264981Z 2023-01-11T21:38:06.2265064Z if __name__ == "__main__": 2023-01-11T21:38:06.2265182Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2265307Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2265531Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2265645Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2265650Z 2023-01-11T21:38:06.2265655Z 2023-01-11T21:38:06.2265752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2265827Z import torch 2023-01-11T21:38:06.2265899Z import random 2023-01-11T21:38:06.2266014Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2266140Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2266146Z 2023-01-11T21:38:06.2266225Z aten = torch.ops.aten 2023-01-11T21:38:06.2266353Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2266452Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2266457Z 2023-01-11T21:38:06.2266528Z import triton 2023-01-11T21:38:06.2266617Z import triton.language as tl 2023-01-11T21:38:06.2266739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2266876Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2266883Z 2023-01-11T21:38:06.2266888Z 2023-01-11T21:38:06.2267042Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2267115Z import triton 2023-01-11T21:38:06.2267200Z import triton.language as tl 2023-01-11T21:38:06.2267313Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2267419Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2267551Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2267676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2267681Z 2023-01-11T21:38:06.2268084Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2268185Z @triton.jit 2023-01-11T21:38:06.2268317Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2268385Z xnumel = 1024 2023-01-11T21:38:06.2268484Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2268612Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2268698Z xmask = xindex < xnumel 2023-01-11T21:38:06.2268769Z x0 = xindex 2023-01-11T21:38:06.2268885Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2268999Z tmp1 = 512 + ((-1)*x0) 2023-01-11T21:38:06.2269080Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.2269187Z tmp3 = tmp0 - tmp2 2023-01-11T21:38:06.2269318Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2269402Z ''') 2023-01-11T21:38:06.2269410Z 2023-01-11T21:38:06.2269415Z 2023-01-11T21:38:06.2269509Z async_compile.wait(globals()) 2023-01-11T21:38:06.2269583Z del async_compile 2023-01-11T21:38:06.2269589Z 2023-01-11T21:38:06.2269662Z def call(args): 
2023-01-11T21:38:06.2269734Z arg0_1, = args 2023-01-11T21:38:06.2269802Z args.clear() 2023-01-11T21:38:06.2269895Z with torch.cuda.device(0): 2023-01-11T21:38:06.2270095Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2270187Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2270324Z triton_fused_sub_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.2270397Z del arg0_1 2023-01-11T21:38:06.2270475Z return (buf0, ) 2023-01-11T21:38:06.2270480Z 2023-01-11T21:38:06.2270484Z 2023-01-11T21:38:06.2270558Z if __name__ == "__main__": 2023-01-11T21:38:06.2270673Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2270798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2271003Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2271113Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2271118Z 2023-01-11T21:38:06.2271186Z ok (0.199s) 2023-01-11T21:38:06.2271671Z test_argmax_argmin1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2271803Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2272055Z [2023-01-11 21:33:59,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 401 2023-01-11T21:38:06.2272309Z [2023-01-11 21:33:59,757] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 401 2023-01-11T21:38:06.2272726Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2272857Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2273110Z [2023-01-11 21:33:59,781] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 402 2023-01-11T21:38:06.2273369Z [2023-01-11 21:33:59,912] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 402 2023-01-11T21:38:06.2273375Z 2023-01-11T21:38:06.2273472Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2273545Z import torch 2023-01-11T21:38:06.2273618Z import random 2023-01-11T21:38:06.2273767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2273883Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2273896Z 2023-01-11T21:38:06.2273970Z aten = torch.ops.aten 2023-01-11T21:38:06.2274107Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2280386Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2280395Z 2023-01-11T21:38:06.2280486Z import triton 2023-01-11T21:38:06.2280586Z import triton.language as tl 2023-01-11T21:38:06.2280714Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2280856Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2280862Z 2023-01-11T21:38:06.2280866Z 2023-01-11T21:38:06.2281060Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2281129Z import triton 2023-01-11T21:38:06.2281224Z import triton.language as tl 2023-01-11T21:38:06.2281339Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2281446Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2281578Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2281706Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2281711Z 2023-01-11T21:38:06.2281805Z @reduction(size_hints=[1, 524288], 2023-01-11T21:38:06.2281925Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2282004Z filename=__file__, 2023-01-11T21:38:06.2282374Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2282451Z @triton.jit 2023-01-11T21:38:06.2282625Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2282699Z xnumel = 1 2023-01-11T21:38:06.2282773Z rnumel = 524288 2023-01-11T21:38:06.2282877Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2283006Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2283090Z xmask = xindex < xnumel 2023-01-11T21:38:06.2283208Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2283442Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2283561Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2283690Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2283803Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2283900Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2283989Z rindex = roffset + rbase 2023-01-11T21:38:06.2284075Z rmask = rindex < rnumel 2023-01-11T21:38:06.2284146Z r0 = rindex 2023-01-11T21:38:06.2284343Z tmp0 = tl.load(in_ptr0 + (r0), rmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.2284445Z tmp2 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.2284589Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2284710Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2284853Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2284976Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2285071Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2285186Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2285305Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2285404Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2285492Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2285610Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2285741Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.2285882Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2285991Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2286108Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2286201Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2286292Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2286409Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2286542Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.2286629Z ''') 2023-01-11T21:38:06.2286634Z 2023-01-11T21:38:06.2286639Z 2023-01-11T21:38:06.2286732Z async_compile.wait(globals()) 2023-01-11T21:38:06.2286809Z del async_compile 2023-01-11T21:38:06.2286814Z 2023-01-11T21:38:06.2286888Z def call(args): 2023-01-11T21:38:06.2286963Z arg0_1, = args 2023-01-11T21:38:06.2287031Z args.clear() 2023-01-11T21:38:06.2287123Z with torch.cuda.device(0): 2023-01-11T21:38:06.2287314Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2287500Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2287590Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2287752Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 524288, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2287823Z del arg0_1 2023-01-11T21:38:06.2287906Z return (buf0, buf1, ) 2023-01-11T21:38:06.2287911Z 2023-01-11T21:38:06.2287915Z 2023-01-11T21:38:06.2287988Z if __name__ == "__main__": 2023-01-11T21:38:06.2288107Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2288235Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2288456Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2288570Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2288576Z 2023-01-11T21:38:06.2288581Z 2023-01-11T21:38:06.2288679Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2288753Z import torch 2023-01-11T21:38:06.2288829Z import random 2023-01-11T21:38:06.2288940Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2289065Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2289070Z 2023-01-11T21:38:06.2289180Z aten = torch.ops.aten 2023-01-11T21:38:06.2289317Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2289413Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2289418Z 2023-01-11T21:38:06.2289492Z import triton 
2023-01-11T21:38:06.2289584Z import triton.language as tl 2023-01-11T21:38:06.2289708Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2289838Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2289844Z 2023-01-11T21:38:06.2289848Z 2023-01-11T21:38:06.2290019Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2290097Z import triton 2023-01-11T21:38:06.2290189Z import triton.language as tl 2023-01-11T21:38:06.2290302Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2290403Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2290532Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2290651Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2290663Z 2023-01-11T21:38:06.2290748Z @reduction(size_hints=[1, 524288], 2023-01-11T21:38:06.2290864Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2290948Z filename=__file__, 2023-01-11T21:38:06.2291322Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2291394Z @triton.jit 2023-01-11T21:38:06.2291571Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2291674Z xnumel = 1 2023-01-11T21:38:06.2291742Z rnumel = 524288 2023-01-11T21:38:06.2291841Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2291975Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2292058Z xmask = xindex < xnumel 2023-01-11T21:38:06.2292179Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2292364Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2292480Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2292607Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2292714Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2292820Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2292907Z rindex = roffset + rbase 2023-01-11T21:38:06.2292991Z rmask = rindex < rnumel 2023-01-11T21:38:06.2293062Z r0 = rindex 2023-01-11T21:38:06.2293284Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2293404Z tmp2 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.2293539Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2293667Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2293802Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2293925Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2294021Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2294129Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2294252Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2294341Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2294429Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2294692Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2294828Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.2294922Z _tmp3_index_reduce = tl.reshape( 
2023-01-11T21:38:06.2295030Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2295261Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2295366Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2295468Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2295591Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2295721Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.2295830Z ''') 2023-01-11T21:38:06.2295836Z 2023-01-11T21:38:06.2295840Z 2023-01-11T21:38:06.2295933Z async_compile.wait(globals()) 2023-01-11T21:38:06.2296012Z del async_compile 2023-01-11T21:38:06.2296017Z 2023-01-11T21:38:06.2296090Z def call(args): 2023-01-11T21:38:06.2296159Z arg0_1, = args 2023-01-11T21:38:06.2296234Z args.clear() 2023-01-11T21:38:06.2296327Z with torch.cuda.device(0): 2023-01-11T21:38:06.2296512Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2296697Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2296788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2296943Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 524288, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.2297016Z del arg0_1 2023-01-11T21:38:06.2297092Z return (buf0, buf1, ) 2023-01-11T21:38:06.2297097Z 2023-01-11T21:38:06.2297102Z 2023-01-11T21:38:06.2297255Z if __name__ == "__main__": 2023-01-11T21:38:06.2297379Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2297505Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2297725Z arg0_1 = rand_strided((8, 256, 256), (65536, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2297880Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2297885Z 2023-01-11T21:38:06.2297954Z ok (0.407s) 2023-01-11T21:38:06.2298415Z test_argmax_argmin2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2298547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2298799Z [2023-01-11 21:33:59,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 403 2023-01-11T21:38:06.2299061Z [2023-01-11 21:34:01,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 403 2023-01-11T21:38:06.2299487Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2299621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2299876Z [2023-01-11 21:34:01,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 404 2023-01-11T21:38:06.2299881Z 2023-01-11T21:38:06.2299978Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2300052Z import torch 2023-01-11T21:38:06.2300127Z import random 2023-01-11T21:38:06.2300245Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2300362Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2300368Z 2023-01-11T21:38:06.2300450Z aten = torch.ops.aten 2023-01-11T21:38:06.2300590Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2300686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2300691Z 2023-01-11T21:38:06.2300767Z import triton 2023-01-11T21:38:06.2300857Z import triton.language as tl 2023-01-11T21:38:06.2300983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2301153Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2301159Z 2023-01-11T21:38:06.2301163Z 2023-01-11T21:38:06.2301329Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2301405Z import triton 2023-01-11T21:38:06.2301496Z import triton.language as tl 2023-01-11T21:38:06.2301612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2301715Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2301845Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2301973Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2301982Z 2023-01-11T21:38:06.2302073Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2302184Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2302268Z filename=__file__, 2023-01-11T21:38:06.2302649Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2302722Z @triton.jit 2023-01-11T21:38:06.2302897Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2302972Z xnumel = 144 2023-01-11T21:38:06.2303046Z rnumel = 144 2023-01-11T21:38:06.2303138Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2303272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2303354Z xmask = xindex < xnumel 2023-01-11T21:38:06.2303473Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2303576Z x0 = xindex 2023-01-11T21:38:06.2303759Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2303876Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2303995Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2304110Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2304215Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2304305Z rindex = roffset + rbase 2023-01-11T21:38:06.2304392Z rmask = rindex < rnumel 2023-01-11T21:38:06.2304463Z r1 = rindex 2023-01-11T21:38:06.2304683Z tmp0 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2304800Z tmp2 = tl.load(in_ptr0 + (x0 + 
(144*r1)), rmask & xmask) 2023-01-11T21:38:06.2304935Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2305065Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2305203Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2305325Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2305423Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2305558Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2305691Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2305791Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2305880Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2306003Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2306101Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2306198Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2306306Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2306426Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2306517Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2306608Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2306731Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2306828Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2306939Z ''') 2023-01-11T21:38:06.2306946Z 2023-01-11T21:38:06.2306950Z 2023-01-11T21:38:06.2307129Z triton_fused_argmax_1_argmin_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2307204Z import triton 2023-01-11T21:38:06.2307295Z import triton.language as tl 2023-01-11T21:38:06.2307402Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2307505Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2307635Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2307760Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2307765Z 2023-01-11T21:38:06.2307857Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2307974Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2308060Z filename=__file__, 2023-01-11T21:38:06.2308439Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2308506Z @triton.jit 2023-01-11T21:38:06.2308680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2308752Z xnumel = 144 2023-01-11T21:38:06.2308826Z rnumel = 144 2023-01-11T21:38:06.2308924Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2309057Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2309142Z xmask = xindex < xnumel 2023-01-11T21:38:06.2309253Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2309323Z x0 = xindex 2023-01-11T21:38:06.2309536Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2309652Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2309778Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2309892Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2310001Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2310083Z rindex = roffset + rbase 2023-01-11T21:38:06.2310168Z rmask = 
rindex < rnumel 2023-01-11T21:38:06.2310239Z r1 = rindex 2023-01-11T21:38:06.2310459Z tmp0 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2310576Z tmp2 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask) 2023-01-11T21:38:06.2310720Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2310848Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2310991Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2311108Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2311204Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2311316Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2311432Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2311527Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2311614Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2311738Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2311828Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2311922Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2312032Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2312150Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2312245Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2312333Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2312456Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2312546Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2312631Z ''') 2023-01-11T21:38:06.2312636Z 2023-01-11T21:38:06.2312641Z 2023-01-11T21:38:06.2312758Z async_compile.wait(globals()) 2023-01-11T21:38:06.2312839Z del async_compile 2023-01-11T21:38:06.2312844Z 2023-01-11T21:38:06.2312919Z def call(args): 2023-01-11T21:38:06.2312993Z arg0_1, = args 2023-01-11T21:38:06.2313068Z args.clear() 2023-01-11T21:38:06.2313159Z with torch.cuda.device(0): 2023-01-11T21:38:06.2313350Z buf0 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2313546Z buf1 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2313638Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2313795Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2313990Z buf2 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2314182Z buf3 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2314346Z triton_fused_argmax_1_argmin_1_1.run(arg0_1, buf2, buf3, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2314420Z del arg0_1 2023-01-11T21:38:06.2314510Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2314515Z 2023-01-11T21:38:06.2314520Z 2023-01-11T21:38:06.2314599Z if __name__ == "__main__": 2023-01-11T21:38:06.2314717Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2314843Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2315055Z arg0_1 = rand_strided((144, 144), (144, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2315167Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2315206Z 2023-01-11T21:38:06.2315515Z [2023-01-11 21:34:03,297] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 404 2023-01-11T21:38:06.2315521Z 
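The two reduction kernels dumped above for graphs 403/404 share one pattern: each program carries a running maximum `_tmp1` together with the index where it occurred (`_tmp1_index`), and a mirrored `_tmp3`/`_tmp3_index` pair for the minimum, across the `roffset` loop; the epilogue then reduces across the block with `tl.argmax`/`tl.argmin` and extracts the winning index by a masked `tl.sum`. The first kernel loads `in_ptr0 + (x0 + 144*r1)` (strided, reducing over dim 0, `ReductionHint.DEFAULT`), the second loads `in_ptr0 + (r1 + 144*x0)` (contiguous, reducing over dim 1, `ReductionHint.INNER`). A minimal sketch of the eager-mode computation these graphs appear to compile — the function name `fn`, the return order, and the commented `torch.compile` call are illustrative assumptions, not taken from this log:

    import torch

    def fn(x):
        # One reduction per axis, matching the two generated kernels above:
        # the strided kernel (x0 + 144*r1) reduces over dim 0, the
        # contiguous kernel (r1 + 144*x0) over the inner dim 1.
        return x.argmax(0), x.argmin(0), x.argmax(1), x.argmin(1)

    # Hypothetical driver; falls back to CPU so the sketch runs anywhere.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(144, 144, device=device)
    out = fn(x)  # eager reference
    # torch.compile(fn)(x)  # on CUDA, would exercise the inductor codegen shown above
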
2023-01-11T21:38:06.2315620Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2315688Z import torch 2023-01-11T21:38:06.2315762Z import random 2023-01-11T21:38:06.2315882Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2316004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2316009Z 2023-01-11T21:38:06.2316091Z aten = torch.ops.aten 2023-01-11T21:38:06.2316227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2316323Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2316328Z 2023-01-11T21:38:06.2316402Z import triton 2023-01-11T21:38:06.2316488Z import triton.language as tl 2023-01-11T21:38:06.2316613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2316751Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2316759Z 2023-01-11T21:38:06.2316763Z 2023-01-11T21:38:06.2316933Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.2317009Z import triton 2023-01-11T21:38:06.2317100Z import triton.language as tl 2023-01-11T21:38:06.2317216Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2317314Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2317443Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2317568Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2317573Z 2023-01-11T21:38:06.2317665Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2317780Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.2317864Z filename=__file__, 2023-01-11T21:38:06.2318252Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2318329Z @triton.jit 2023-01-11T21:38:06.2318505Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2318580Z xnumel = 144 2023-01-11T21:38:06.2318653Z rnumel = 144 2023-01-11T21:38:06.2318778Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2318911Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2318996Z xmask = xindex < xnumel 2023-01-11T21:38:06.2319115Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2319180Z x0 = xindex 2023-01-11T21:38:06.2319363Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2319479Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2319604Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2319719Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2319827Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2319915Z rindex = roffset + rbase 2023-01-11T21:38:06.2320000Z rmask = rindex < rnumel 2023-01-11T21:38:06.2320065Z r1 = rindex 2023-01-11T21:38:06.2320308Z tmp0 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2320443Z tmp2 = tl.load(in_ptr0 + (x0 + (144*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2320586Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2320710Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2320847Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 
2023-01-11T21:38:06.2320972Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2321060Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2321201Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2321321Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2321417Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2321504Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2321631Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2321729Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2321816Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2321925Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2322044Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2322138Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2322226Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2322349Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2322448Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2322534Z ''') 2023-01-11T21:38:06.2322542Z 2023-01-11T21:38:06.2322547Z 2023-01-11T21:38:06.2322718Z triton_fused_argmax_1_argmin_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2322793Z import triton 2023-01-11T21:38:06.2322885Z import triton.language as tl 2023-01-11T21:38:06.2323000Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2323105Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2323235Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2323360Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2323365Z 2023-01-11T21:38:06.2323449Z @reduction(size_hints=[256, 256], 2023-01-11T21:38:06.2323563Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2323648Z filename=__file__, 2023-01-11T21:38:06.2324021Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.2324097Z @triton.jit 2023-01-11T21:38:06.2324274Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2324348Z xnumel = 144 2023-01-11T21:38:06.2324421Z rnumel = 144 2023-01-11T21:38:06.2324511Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2324673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2324758Z xmask = xindex < xnumel 2023-01-11T21:38:06.2324877Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2324947Z x0 = xindex 2023-01-11T21:38:06.2325129Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.2325249Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2325368Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.2325482Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.2325587Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2325691Z rindex = roffset + rbase 2023-01-11T21:38:06.2325786Z rmask = rindex < rnumel 2023-01-11T21:38:06.2325869Z r1 = rindex 2023-01-11T21:38:06.2326123Z tmp0 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2326262Z tmp2 = tl.load(in_ptr0 + (r1 + (144*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.2326398Z 
_tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.2326526Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.2326666Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.2326788Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.2326882Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.2326990Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2327149Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2327238Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.2327327Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2327452Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2327554Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2327648Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.2327758Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.2327875Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.2327963Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.2328054Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.2328178Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.2328275Z tl.store(out_ptr1 + x0, tmp3, xmask) 2023-01-11T21:38:06.2328361Z ''') 2023-01-11T21:38:06.2328366Z 2023-01-11T21:38:06.2328371Z 2023-01-11T21:38:06.2328468Z async_compile.wait(globals()) 2023-01-11T21:38:06.2328544Z del async_compile 2023-01-11T21:38:06.2328549Z 2023-01-11T21:38:06.2328623Z def call(args): 2023-01-11T21:38:06.2328689Z arg0_1, = args 2023-01-11T21:38:06.2328763Z args.clear() 2023-01-11T21:38:06.2328856Z with torch.cuda.device(0): 2023-01-11T21:38:06.2329053Z buf0 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329250Z buf1 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329341Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2329500Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2329685Z buf2 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2329877Z buf3 = empty_strided((144, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.2330037Z triton_fused_argmax_1_argmin_1_1.run(arg0_1, buf2, buf3, 144, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.2330115Z del arg0_1 2023-01-11T21:38:06.2330211Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2330217Z 2023-01-11T21:38:06.2330221Z 2023-01-11T21:38:06.2330302Z if __name__ == "__main__": 2023-01-11T21:38:06.2330451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2330579Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2330789Z arg0_1 = rand_strided((144, 144), (144, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2330894Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2330899Z 2023-01-11T21:38:06.2330970Z ok (3.387s) 2023-01-11T21:38:06.2331092Z test_argmax_argmin3_cuda (__main__.CudaTests) ... skip: 2023-01-11T21:38:06.2331255Z FIXME: In the case of having equally max/min elements, our implementation returns 2023-01-11T21:38:06.2331363Z the last index instead of the first one 2023-01-11T21:38:06.2331437Z (0.000s) 2023-01-11T21:38:06.2331897Z test_as_strided_cuda (__main__.CudaTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2332028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2332288Z [2023-01-11 21:34:03,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 405 2023-01-11T21:38:06.2332546Z [2023-01-11 21:34:03,407] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 405 2023-01-11T21:38:06.2332959Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2333119Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2333376Z [2023-01-11 21:34:03,438] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 406 2023-01-11T21:38:06.2333637Z [2023-01-11 21:34:03,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 406 2023-01-11T21:38:06.2333643Z 2023-01-11T21:38:06.2333740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2333814Z import torch 2023-01-11T21:38:06.2333887Z import random 2023-01-11T21:38:06.2334005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2334121Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2334126Z 2023-01-11T21:38:06.2334208Z aten = torch.ops.aten 2023-01-11T21:38:06.2334347Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2334443Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2334448Z 2023-01-11T21:38:06.2334634Z import triton 2023-01-11T21:38:06.2334729Z import triton.language as tl 2023-01-11T21:38:06.2334854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2334990Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2335001Z 2023-01-11T21:38:06.2335005Z 2023-01-11T21:38:06.2335162Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2335235Z import triton 2023-01-11T21:38:06.2335327Z import triton.language as tl 2023-01-11T21:38:06.2335437Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2335539Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2335672Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2335797Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2335805Z 2023-01-11T21:38:06.2336224Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2336292Z @triton.jit 2023-01-11T21:38:06.2336469Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2336546Z xnumel = 4096 2023-01-11T21:38:06.2336642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2336776Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2336861Z xmask = xindex < xnumel 2023-01-11T21:38:06.2336931Z x0 = xindex 2023-01-11T21:38:06.2337021Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2337093Z tmp1 = 1 2023-01-11T21:38:06.2337227Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2337298Z tmp3 = 2 2023-01-11T21:38:06.2337377Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2337519Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2337607Z ''') 2023-01-11T21:38:06.2337612Z 2023-01-11T21:38:06.2337617Z 2023-01-11T21:38:06.2337703Z async_compile.wait(globals()) 2023-01-11T21:38:06.2337779Z del async_compile 2023-01-11T21:38:06.2337785Z 2023-01-11T21:38:06.2337858Z def call(args): 2023-01-11T21:38:06.2337934Z arg0_1, = args 2023-01-11T21:38:06.2338007Z args.clear() 2023-01-11T21:38:06.2338097Z with torch.cuda.device(0): 2023-01-11T21:38:06.2338300Z buf0 = empty_strided((64, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2338414Z buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:06.2338506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2338648Z triton_fused_add_add_1_0.run(buf1, arg0_1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.2338765Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:06.2338808Z 2023-01-11T21:38:06.2338812Z 2023-01-11T21:38:06.2338895Z if __name__ == "__main__": 2023-01-11T21:38:06.2339011Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2339136Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2339342Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2339447Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2339460Z 2023-01-11T21:38:06.2339464Z 2023-01-11T21:38:06.2339555Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2339628Z import torch 2023-01-11T21:38:06.2339702Z import random 2023-01-11T21:38:06.2339818Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2339942Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2339947Z 2023-01-11T21:38:06.2340028Z aten = torch.ops.aten 2023-01-11T21:38:06.2340164Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2340254Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2340267Z 2023-01-11T21:38:06.2340334Z import triton 2023-01-11T21:38:06.2340426Z import triton.language as tl 2023-01-11T21:38:06.2340549Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2340687Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2340695Z 2023-01-11T21:38:06.2340699Z 2023-01-11T21:38:06.2340860Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2340935Z import triton 2023-01-11T21:38:06.2341026Z import triton.language as tl 2023-01-11T21:38:06.2341132Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2341236Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2341366Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2341489Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2341494Z 2023-01-11T21:38:06.2341915Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.2341989Z @triton.jit 2023-01-11T21:38:06.2342121Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2342224Z xnumel = 4096 2023-01-11T21:38:06.2342316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2342445Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2342527Z xmask = xindex < xnumel 2023-01-11T21:38:06.2342595Z x0 = xindex 2023-01-11T21:38:06.2342712Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2342780Z tmp1 = 1 2023-01-11T21:38:06.2342860Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2342923Z tmp3 = 2 2023-01-11T21:38:06.2343001Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2343137Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2343226Z ''') 2023-01-11T21:38:06.2343231Z 2023-01-11T21:38:06.2343236Z 2023-01-11T21:38:06.2343327Z async_compile.wait(globals()) 2023-01-11T21:38:06.2343403Z del async_compile 2023-01-11T21:38:06.2343408Z 2023-01-11T21:38:06.2343482Z def call(args): 2023-01-11T21:38:06.2343555Z arg0_1, = args 2023-01-11T21:38:06.2343623Z args.clear() 2023-01-11T21:38:06.2343716Z with torch.cuda.device(0): 2023-01-11T21:38:06.2343920Z buf0 = empty_strided((64, 64), (64, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2344039Z buf1 = as_strided(buf0, (8, 8, 64), (512, 64, 1)); del buf0 # reuse 2023-01-11T21:38:06.2344131Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2344272Z triton_fused_add_add_1_0.run(buf1, arg0_1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.2344388Z return (as_strided(arg0_1, (8, 8, 64), (512, 64, 1)), buf1, ) 2023-01-11T21:38:06.2344393Z 2023-01-11T21:38:06.2344398Z 2023-01-11T21:38:06.2344471Z if __name__ == "__main__": 2023-01-11T21:38:06.2344614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2344740Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2344943Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2345057Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2345062Z 2023-01-11T21:38:06.2345132Z ok (0.213s) 2023-01-11T21:38:06.2345620Z test_as_strided_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2345776Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2346036Z [2023-01-11 21:34:03,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 407 2023-01-11T21:38:06.2346305Z [2023-01-11 21:34:03,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 407 2023-01-11T21:38:06.2346715Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2346845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2347100Z [2023-01-11 21:34:03,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 408 2023-01-11T21:38:06.2347360Z [2023-01-11 21:34:03,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 408 2023-01-11T21:38:06.2347368Z 2023-01-11T21:38:06.2347470Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2347544Z import torch 2023-01-11T21:38:06.2347618Z import random 2023-01-11T21:38:06.2347735Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2347860Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2347865Z 2023-01-11T21:38:06.2347967Z aten = torch.ops.aten 2023-01-11T21:38:06.2348105Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2348200Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2348204Z 2023-01-11T21:38:06.2348277Z import triton 2023-01-11T21:38:06.2348370Z import triton.language as tl 2023-01-11T21:38:06.2348496Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2348634Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2348640Z 2023-01-11T21:38:06.2348644Z 2023-01-11T21:38:06.2348851Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2348922Z import triton 2023-01-11T21:38:06.2349014Z import triton.language as tl 2023-01-11T21:38:06.2349127Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2349229Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2349361Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2349490Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2349496Z 2023-01-11T21:38:06.2349902Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2349976Z @triton.jit 2023-01-11T21:38:06.2350102Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2350178Z xnumel = 10240 2023-01-11T21:38:06.2350275Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2350401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2350509Z xmask = xindex < xnumel 2023-01-11T21:38:06.2350579Z x0 = xindex 2023-01-11T21:38:06.2350675Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2350740Z tmp1 = 8 2023-01-11T21:38:06.2350819Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2350891Z tmp3 = 10 2023-01-11T21:38:06.2350972Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2351107Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2351193Z ''') 2023-01-11T21:38:06.2351199Z 2023-01-11T21:38:06.2351203Z 2023-01-11T21:38:06.2351361Z triton_fused_mul_1_sub_1 = async_compile.triton(''' 2023-01-11T21:38:06.2351428Z import triton 2023-01-11T21:38:06.2351520Z import triton.language as tl 2023-01-11T21:38:06.2351632Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2351732Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2351863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2351991Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2351996Z 2023-01-11T21:38:06.2352410Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2352484Z @triton.jit 2023-01-11T21:38:06.2352609Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2352683Z xnumel = 5120 2023-01-11T21:38:06.2352779Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2352907Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2352990Z xmask = xindex < xnumel 2023-01-11T21:38:06.2353063Z x0 = xindex 2023-01-11T21:38:06.2353159Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2353222Z tmp1 = 2 2023-01-11T21:38:06.2353304Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2353372Z tmp3 = 4 2023-01-11T21:38:06.2353482Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.2353619Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2353704Z ''') 2023-01-11T21:38:06.2353709Z 2023-01-11T21:38:06.2353714Z 2023-01-11T21:38:06.2353804Z async_compile.wait(globals()) 2023-01-11T21:38:06.2353875Z del async_compile 2023-01-11T21:38:06.2353917Z 2023-01-11T21:38:06.2353986Z def call(args): 2023-01-11T21:38:06.2354067Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2354141Z args.clear() 2023-01-11T21:38:06.2354233Z with torch.cuda.device(0): 2023-01-11T21:38:06.2354441Z buf0 = empty_strided((10, 1024), (1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2354533Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2354706Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.run(arg0_1, buf0, 10240, grid=grid(10240), stream=stream0) 2023-01-11T21:38:06.2354773Z del arg0_1 2023-01-11T21:38:06.2354913Z triton_fused_mul_1_sub_1.run(arg1_1, buf0, 5120, grid=grid(5120), stream=stream0) 2023-01-11T21:38:06.2354986Z del arg1_1 2023-01-11T21:38:06.2355065Z return (buf0, ) 2023-01-11T21:38:06.2355070Z 2023-01-11T21:38:06.2355074Z 2023-01-11T21:38:06.2355155Z if __name__ == "__main__": 2023-01-11T21:38:06.2355294Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2355434Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2355652Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2355857Z arg1_1 = rand_strided((10, 512), (512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2355977Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2355983Z 2023-01-11T21:38:06.2355987Z 2023-01-11T21:38:06.2356082Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2356158Z import torch 2023-01-11T21:38:06.2356231Z import random 2023-01-11T21:38:06.2356350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2356500Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2356505Z 2023-01-11T21:38:06.2356585Z aten = torch.ops.aten 2023-01-11T21:38:06.2356713Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2356807Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2356815Z 2023-01-11T21:38:06.2356889Z import triton 2023-01-11T21:38:06.2356980Z import triton.language as tl 2023-01-11T21:38:06.2357103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2357241Z from torch._C import _cuda_getCurrentRawStream as 
get_cuda_stream 2023-01-11T21:38:06.2357246Z 2023-01-11T21:38:06.2357251Z 2023-01-11T21:38:06.2357462Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.2357536Z import triton 2023-01-11T21:38:06.2357621Z import triton.language as tl 2023-01-11T21:38:06.2357733Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2357837Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2357969Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2358093Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2358098Z 2023-01-11T21:38:06.2358506Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2358579Z @triton.jit 2023-01-11T21:38:06.2358709Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2358777Z xnumel = 10240 2023-01-11T21:38:06.2358873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2359003Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2359085Z xmask = xindex < xnumel 2023-01-11T21:38:06.2359156Z x0 = xindex 2023-01-11T21:38:06.2359270Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2359337Z tmp1 = 8 2023-01-11T21:38:06.2359414Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2359484Z tmp3 = 10 2023-01-11T21:38:06.2359560Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.2359694Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2359778Z ''') 2023-01-11T21:38:06.2359811Z 2023-01-11T21:38:06.2359817Z 2023-01-11T21:38:06.2359978Z triton_fused_mul_1_sub_1 = async_compile.triton(''' 2023-01-11T21:38:06.2360052Z import triton 2023-01-11T21:38:06.2360137Z import triton.language as tl 2023-01-11T21:38:06.2360249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2360350Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2360480Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2360606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2360611Z 2023-01-11T21:38:06.2361024Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2361100Z @triton.jit 2023-01-11T21:38:06.2361232Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2361301Z xnumel = 5120 2023-01-11T21:38:06.2361396Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2361524Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2361606Z xmask = xindex < xnumel 2023-01-11T21:38:06.2361675Z x0 = xindex 2023-01-11T21:38:06.2361795Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2361865Z tmp1 = 2 2023-01-11T21:38:06.2361937Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.2362005Z tmp3 = 4 2023-01-11T21:38:06.2362115Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.2362250Z tl.store(out_ptr0 + (2*x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2362363Z ''') 2023-01-11T21:38:06.2362369Z 2023-01-11T21:38:06.2362373Z 2023-01-11T21:38:06.2362466Z async_compile.wait(globals()) 2023-01-11T21:38:06.2362541Z del async_compile 
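# [Editor's note, added commentary -- not in the log] In the call() wrapper
# below, the first kernel fills buf0 densely with arg0_1 * 8 + 10, and the
# second writes arg1_1 * 2 - 4 through the strided offset 2*x0 into the same
# buffer; together they realize the aten.as_strided_scatter pattern named in
# triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.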
2023-01-11T21:38:06.2362546Z 2023-01-11T21:38:06.2362614Z def call(args): 2023-01-11T21:38:06.2362694Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2362770Z args.clear() 2023-01-11T21:38:06.2362862Z with torch.cuda.device(0): 2023-01-11T21:38:06.2363070Z buf0 = empty_strided((10, 1024), (1024, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2363163Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2363338Z triton_fused_add_as_strided_scatter_mul_mul_1_sub_0.run(arg0_1, buf0, 10240, grid=grid(10240), stream=stream0) 2023-01-11T21:38:06.2363405Z del arg0_1 2023-01-11T21:38:06.2363550Z triton_fused_mul_1_sub_1.run(arg1_1, buf0, 5120, grid=grid(5120), stream=stream0) 2023-01-11T21:38:06.2363621Z del arg1_1 2023-01-11T21:38:06.2363696Z return (buf0, ) 2023-01-11T21:38:06.2363705Z 2023-01-11T21:38:06.2363709Z 2023-01-11T21:38:06.2363790Z if __name__ == "__main__": 2023-01-11T21:38:06.2363906Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2364031Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2364242Z arg0_1 = rand_strided((10, 1024), (1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2364441Z arg1_1 = rand_strided((10, 512), (512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2364558Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2364563Z 2023-01-11T21:38:06.2364638Z ok (0.220s) 2023-01-11T21:38:06.2365090Z test_avg_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2365227Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2365482Z [2023-01-11 21:34:03,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 409 2023-01-11T21:38:06.2365775Z [2023-01-11 21:34:03,903] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 409 2023-01-11T21:38:06.2366189Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2366323Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2366578Z [2023-01-11 21:34:03,921] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 410 2023-01-11T21:38:06.2366842Z [2023-01-11 21:34:04,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 410 2023-01-11T21:38:06.2366848Z 2023-01-11T21:38:06.2366939Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2367012Z import torch 2023-01-11T21:38:06.2367087Z import random 2023-01-11T21:38:06.2367205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2367328Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2367333Z 2023-01-11T21:38:06.2367416Z aten = torch.ops.aten 2023-01-11T21:38:06.2367553Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2367649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2367654Z 2023-01-11T21:38:06.2367721Z import triton 2023-01-11T21:38:06.2367814Z import triton.language as tl 2023-01-11T21:38:06.2367936Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2368117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2368122Z 2023-01-11T21:38:06.2368127Z 2023-01-11T21:38:06.2368293Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2368366Z import triton 2023-01-11T21:38:06.2368458Z import triton.language as tl 2023-01-11T21:38:06.2368568Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2368671Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2368805Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2368930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2368935Z 2023-01-11T21:38:06.2369334Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2369408Z @triton.jit 2023-01-11T21:38:06.2369539Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2369616Z xnumel = 392 2023-01-11T21:38:06.2369705Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2369833Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2369916Z xmask = xindex < xnumel 2023-01-11T21:38:06.2369990Z x0 = xindex % 7 2023-01-11T21:38:06.2370077Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.2370156Z x2 = (xindex // 49) 2023-01-11T21:38:06.2370226Z x3 = xindex 2023-01-11T21:38:06.2370337Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370456Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370572Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370691Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370806Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2370917Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371036Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371154Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask) 
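# [Editor's note, added commentary -- not in the log] The nine tl.load calls
# in this kernel read a 3x3 input window anchored at (2*x1, 2*x0): column
# offsets +0/+1/+2 and row offsets +0/+16/+32 step across the 16-wide input
# rows. The window sum is scaled by tmp17 = 1/9 below, i.e.
# avg_pool2d(kernel_size=3, stride=2) over the 16x16 spatial maps.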
2023-01-11T21:38:06.2371291Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask) 2023-01-11T21:38:06.2371372Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2371451Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2371528Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2371604Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2371680Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2371760Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2371833Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2371910Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2371990Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2372068Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2372202Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2372291Z ''') 2023-01-11T21:38:06.2372296Z 2023-01-11T21:38:06.2372301Z 2023-01-11T21:38:06.2372396Z async_compile.wait(globals()) 2023-01-11T21:38:06.2372465Z del async_compile 2023-01-11T21:38:06.2372477Z 2023-01-11T21:38:06.2372545Z def call(args): 2023-01-11T21:38:06.2372618Z arg0_1, = args 2023-01-11T21:38:06.2372694Z args.clear() 2023-01-11T21:38:06.2372789Z with torch.cuda.device(0): 2023-01-11T21:38:06.2373007Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2373097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2373242Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.2373308Z del arg0_1 2023-01-11T21:38:06.2373382Z return (buf0, ) 2023-01-11T21:38:06.2373388Z 2023-01-11T21:38:06.2373392Z 2023-01-11T21:38:06.2373472Z if __name__ == "__main__": 2023-01-11T21:38:06.2373589Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2373744Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2373968Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2374080Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2374085Z 2023-01-11T21:38:06.2374089Z 2023-01-11T21:38:06.2374185Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2374252Z import torch 2023-01-11T21:38:06.2374326Z import random 2023-01-11T21:38:06.2374443Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2374692Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2374697Z 2023-01-11T21:38:06.2374779Z aten = torch.ops.aten 2023-01-11T21:38:06.2374915Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2375009Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2375015Z 2023-01-11T21:38:06.2375081Z import triton 2023-01-11T21:38:06.2375175Z import triton.language as tl 2023-01-11T21:38:06.2375304Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2375443Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2375448Z 2023-01-11T21:38:06.2375452Z 2023-01-11T21:38:06.2375619Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2375698Z import triton 2023-01-11T21:38:06.2375792Z import triton.language as tl 2023-01-11T21:38:06.2375905Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2376000Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2376131Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2376258Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2376264Z 2023-01-11T21:38:06.2376664Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: 
'*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2376740Z @triton.jit 2023-01-11T21:38:06.2376872Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2376946Z xnumel = 392 2023-01-11T21:38:06.2377041Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2377243Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2377387Z xmask = xindex < xnumel 2023-01-11T21:38:06.2377465Z x0 = xindex % 7 2023-01-11T21:38:06.2377547Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.2377626Z x2 = (xindex // 49) 2023-01-11T21:38:06.2377699Z x3 = xindex 2023-01-11T21:38:06.2377831Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2377960Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378085Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378220Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378354Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378484Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378616Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378750Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378878Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2378950Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2379028Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2379105Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2379181Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2379257Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2379336Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2379417Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2379525Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2379605Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2379682Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2379818Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2379905Z ''') 2023-01-11T21:38:06.2379910Z 2023-01-11T21:38:06.2379915Z 2023-01-11T21:38:06.2380010Z async_compile.wait(globals()) 2023-01-11T21:38:06.2380084Z del async_compile 2023-01-11T21:38:06.2380090Z 2023-01-11T21:38:06.2380157Z def call(args): 2023-01-11T21:38:06.2380229Z arg0_1, = args 2023-01-11T21:38:06.2380301Z args.clear() 2023-01-11T21:38:06.2380393Z with torch.cuda.device(0): 2023-01-11T21:38:06.2380610Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2380701Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2380846Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.2380919Z del arg0_1 2023-01-11T21:38:06.2380993Z return (buf0, ) 2023-01-11T21:38:06.2380998Z 2023-01-11T21:38:06.2381002Z 2023-01-11T21:38:06.2381081Z if __name__ == "__main__": 2023-01-11T21:38:06.2381197Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2381322Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2381551Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2381663Z 
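# [Editor's note, added commentary -- not in the log] rand_strided allocates
# a tensor with the exact size/stride layout the graph was traced with, and
# print_performance benchmarks the call() wrapper, so each dumped module
# doubles as a standalone repro/benchmark script.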
print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2381668Z 2023-01-11T21:38:06.2381741Z ok (0.320s) 2023-01-11T21:38:06.2382201Z test_avg_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2382335Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2382587Z [2023-01-11 21:34:04,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 411 2023-01-11T21:38:06.2382880Z [2023-01-11 21:34:04,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 411 2023-01-11T21:38:06.2383295Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2383425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2383680Z [2023-01-11 21:34:04,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 412 2023-01-11T21:38:06.2383945Z [2023-01-11 21:34:04,385] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 412 2023-01-11T21:38:06.2383951Z 2023-01-11T21:38:06.2384049Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2384123Z import torch 2023-01-11T21:38:06.2384196Z import random 2023-01-11T21:38:06.2384310Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2384435Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2384440Z 2023-01-11T21:38:06.2384519Z aten = torch.ops.aten 2023-01-11T21:38:06.2384655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2384750Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2384755Z 2023-01-11T21:38:06.2384827Z import triton 2023-01-11T21:38:06.2384917Z import triton.language as tl 2023-01-11T21:38:06.2385041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2385174Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2385249Z 2023-01-11T21:38:06.2385260Z 2023-01-11T21:38:06.2385419Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2385503Z import triton 2023-01-11T21:38:06.2385608Z import triton.language as tl 2023-01-11T21:38:06.2385742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2385848Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2385980Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2386099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2386111Z 2023-01-11T21:38:06.2386510Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2386583Z @triton.jit 2023-01-11T21:38:06.2386714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.2386790Z xnumel = 746496 2023-01-11T21:38:06.2386887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2387013Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2387097Z xmask = xindex < xnumel 2023-01-11T21:38:06.2387166Z x0 = xindex % 27 2023-01-11T21:38:06.2387248Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2387329Z x2 = (xindex // 729) 2023-01-11T21:38:06.2387399Z x3 = xindex 2023-01-11T21:38:06.2387518Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387635Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387755Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387873Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2387986Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388101Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388223Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388342Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388457Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.2388566Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2388646Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2388718Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2388795Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2388873Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2388954Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2389034Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2389112Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2389192Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2389264Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2389400Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2389489Z ''') 2023-01-11T21:38:06.2389494Z 2023-01-11T21:38:06.2389498Z 2023-01-11T21:38:06.2389592Z async_compile.wait(globals()) 2023-01-11T21:38:06.2389670Z del async_compile 2023-01-11T21:38:06.2389675Z 2023-01-11T21:38:06.2389748Z def call(args): 2023-01-11T21:38:06.2389821Z arg0_1, = args 2023-01-11T21:38:06.2389889Z args.clear() 2023-01-11T21:38:06.2389983Z with torch.cuda.device(0): 2023-01-11T21:38:06.2390214Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2390305Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2390453Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2390527Z del arg0_1 2023-01-11T21:38:06.2390603Z return (buf0, ) 2023-01-11T21:38:06.2390608Z 2023-01-11T21:38:06.2390612Z 2023-01-11T21:38:06.2390691Z if __name__ == "__main__": 2023-01-11T21:38:06.2390803Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2390958Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2391189Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2391300Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2391305Z 2023-01-11T21:38:06.2391309Z 2023-01-11T21:38:06.2391408Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2391484Z import torch 2023-01-11T21:38:06.2391559Z import random 2023-01-11T21:38:06.2391678Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2391794Z from 
torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2391799Z 2023-01-11T21:38:06.2391880Z aten = torch.ops.aten 2023-01-11T21:38:06.2392017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2392112Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2392118Z 2023-01-11T21:38:06.2392190Z import triton 2023-01-11T21:38:06.2392280Z import triton.language as tl 2023-01-11T21:38:06.2392407Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2392538Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2392552Z 2023-01-11T21:38:06.2392556Z 2023-01-11T21:38:06.2392714Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2392789Z import triton 2023-01-11T21:38:06.2392881Z import triton.language as tl 2023-01-11T21:38:06.2392993Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2393095Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2393224Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2393346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2393351Z 2023-01-11T21:38:06.2393760Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2393831Z @triton.jit 2023-01-11T21:38:06.2393961Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2394034Z xnumel = 746496 2023-01-11T21:38:06.2394129Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2394294Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2394378Z xmask = xindex < xnumel 2023-01-11T21:38:06.2394453Z x0 = xindex % 27 2023-01-11T21:38:06.2394529Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.2394608Z x2 = (xindex // 729) 2023-01-11T21:38:06.2394678Z x3 = xindex 2023-01-11T21:38:06.2394812Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2394946Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395078Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395211Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395346Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395469Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395602Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395732Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395856Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2395934Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2396013Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2396089Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2396159Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2396237Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2396315Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2396395Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2396501Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2396579Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2396658Z tmp18 = tmp16 * tmp17 
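# [Editor's note, added commentary -- not in the log] Unlike the fp32 variant
# above, every load in this kernel is upcast via .to(tl.float32), so the
# nine-tap sum and the 1/9 scaling accumulate in fp32; the store below writes
# the result back through the *fp16 out_ptr0.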
2023-01-11T21:38:06.2396787Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2396874Z ''') 2023-01-11T21:38:06.2396880Z 2023-01-11T21:38:06.2396887Z 2023-01-11T21:38:06.2396977Z async_compile.wait(globals()) 2023-01-11T21:38:06.2397055Z del async_compile 2023-01-11T21:38:06.2397061Z 2023-01-11T21:38:06.2397135Z def call(args): 2023-01-11T21:38:06.2397206Z arg0_1, = args 2023-01-11T21:38:06.2397279Z args.clear() 2023-01-11T21:38:06.2397365Z with torch.cuda.device(0): 2023-01-11T21:38:06.2397591Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2397682Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2397829Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.2397902Z del arg0_1 2023-01-11T21:38:06.2397979Z return (buf0, ) 2023-01-11T21:38:06.2397984Z 2023-01-11T21:38:06.2397989Z 2023-01-11T21:38:06.2398069Z if __name__ == "__main__": 2023-01-11T21:38:06.2398185Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2398308Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2398540Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2398653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2398658Z 2023-01-11T21:38:06.2398726Z ok (0.334s) 2023-01-11T21:38:06.2399183Z test_avg_pool2d3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2399314Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2399570Z [2023-01-11 21:34:04,405] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 413 2023-01-11T21:38:06.2399859Z [2023-01-11 21:34:04,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 413 2023-01-11T21:38:06.2400276Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2400406Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2400663Z [2023-01-11 21:34:04,594] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 414 2023-01-11T21:38:06.2400672Z 2023-01-11T21:38:06.2400763Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2400840Z import torch 2023-01-11T21:38:06.2400912Z import random 2023-01-11T21:38:06.2401028Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2401154Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2401160Z 2023-01-11T21:38:06.2401241Z aten = torch.ops.aten 2023-01-11T21:38:06.2401377Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2401474Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2401480Z 2023-01-11T21:38:06.2401547Z import triton 2023-01-11T21:38:06.2401636Z import triton.language as tl 2023-01-11T21:38:06.2401759Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2401904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2401909Z 2023-01-11T21:38:06.2401913Z 2023-01-11T21:38:06.2402106Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2402181Z import triton 2023-01-11T21:38:06.2402271Z import triton.language as tl 2023-01-11T21:38:06.2402378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2402480Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2402615Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2402740Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2402745Z 2023-01-11T21:38:06.2403148Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2403221Z @triton.jit 2023-01-11T21:38:06.2403351Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2403424Z xnumel = 16 2023-01-11T21:38:06.2403514Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2403644Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2403727Z xmask = xindex < xnumel 2023-01-11T21:38:06.2403805Z x1 = (xindex // 4) 2023-01-11T21:38:06.2403878Z x0 = xindex % 4 2023-01-11T21:38:06.2403950Z x2 = xindex 2023-01-11T21:38:06.2404058Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2404124Z tmp1 = 0 2023-01-11T21:38:06.2404203Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2404271Z tmp3 = 8 2023-01-11T21:38:06.2404351Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2404427Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2404533Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2404611Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2404681Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2404757Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2404834Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2405070Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2405169Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2405240Z tmp13 = 2*x0 2023-01-11T21:38:06.2405319Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2405392Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2405471Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2405550Z tmp17 = tmp5 & tmp16 
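# [Editor's note, added commentary -- not in the log] This padded variant
# (8x8 input -> 4x4 output, padding=1) guards every tap: tmp2/tmp4 bound the
# row index -1 + 2*x1 to [0, 8), the column predicates do the same for
# -1 + 2*x0, and each tl.load applies the combined mask with other=0 plus a
# tl.where to zero out-of-bounds taps. The sum is still scaled by 1/9
# regardless of how many taps were in bounds, matching count_include_pad=True.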
2023-01-11T21:38:06.2405823Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2405921Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2406001Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2406075Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2406149Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2406229Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2406310Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2406389Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2406626Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2406725Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2406805Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2406871Z tmp29 = 2*x1 2023-01-11T21:38:06.2406950Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2407027Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2407107Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2407184Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2407420Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2407515Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2407589Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2407667Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2407824Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2407918Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2407998Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2408076Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2408231Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2408359Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2408438Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2408511Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2408590Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2408673Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2408752Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2408829Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2408979Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2409072Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2409150Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2409229Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2409381Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2409475Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2409556Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2409628Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2409783Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2409876Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2409959Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2410038Z tmp61 = 0.1111111111111111 2023-01-11T21:38:06.2410116Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2410257Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2410337Z ''') 2023-01-11T21:38:06.2410342Z 2023-01-11T21:38:06.2410354Z 2023-01-11T21:38:06.2410440Z async_compile.wait(globals()) 2023-01-11T21:38:06.2410517Z del async_compile 2023-01-11T21:38:06.2410522Z 2023-01-11T21:38:06.2410599Z def call(args): 2023-01-11T21:38:06.2410672Z arg0_1, = args 2023-01-11T21:38:06.2410746Z args.clear() 2023-01-11T21:38:06.2410838Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2411055Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2411142Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2411304Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2411380Z del arg0_1 2023-01-11T21:38:06.2411485Z return (buf0, ) 2023-01-11T21:38:06.2411490Z 2023-01-11T21:38:06.2411495Z 2023-01-11T21:38:06.2411576Z if __name__ == "__main__": 2023-01-11T21:38:06.2411696Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2411825Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2412059Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2412183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2412188Z 2023-01-11T21:38:06.2412476Z [2023-01-11 21:34:04,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 414 2023-01-11T21:38:06.2412485Z 2023-01-11T21:38:06.2412581Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2412655Z import torch 2023-01-11T21:38:06.2412731Z import random 2023-01-11T21:38:06.2412851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2412975Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2412980Z 2023-01-11T21:38:06.2413068Z aten = torch.ops.aten 2023-01-11T21:38:06.2413198Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2413299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2413304Z 2023-01-11T21:38:06.2413376Z import triton 2023-01-11T21:38:06.2413468Z import triton.language as tl 2023-01-11T21:38:06.2413598Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2413739Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2413745Z 2023-01-11T21:38:06.2413749Z 2023-01-11T21:38:06.2413913Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2414021Z import triton 2023-01-11T21:38:06.2414105Z import triton.language as tl 2023-01-11T21:38:06.2414218Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2414317Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2414453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2414706Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2414712Z 2023-01-11T21:38:06.2415125Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2415200Z @triton.jit 2023-01-11T21:38:06.2415333Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2415402Z xnumel = 16 2023-01-11T21:38:06.2415508Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2415657Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2415764Z xmask = xindex < xnumel 2023-01-11T21:38:06.2415852Z x1 = (xindex // 4) 2023-01-11T21:38:06.2415924Z x0 = xindex % 4 2023-01-11T21:38:06.2415997Z x2 = xindex 2023-01-11T21:38:06.2416100Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2416173Z tmp1 = 0 2023-01-11T21:38:06.2416259Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2416332Z tmp3 = 8 2023-01-11T21:38:06.2416409Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2416491Z tmp5 = tmp2 & tmp4 
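# [Editor's note, added commentary -- not in the log] Same masking scheme as
# the fp32 kernel above: tmp5 is the row-validity predicate for -1 + 2*x1;
# the matching column predicate is built next and AND-ed per tap, with the
# fp16 loads additionally upcast to fp32 before summing.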
2023-01-11T21:38:06.2416593Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2416674Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2416751Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2416835Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2416915Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2417261Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2417417Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2417487Z tmp13 = 2*x0 2023-01-11T21:38:06.2417567Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2417648Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2417729Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2417808Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2418127Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2418226Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2418300Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2418375Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2418454Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2418532Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2418610Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2418687Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2418948Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2419040Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2419121Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2419195Z tmp29 = 2*x1 2023-01-11T21:38:06.2419274Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2419351Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2419429Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2419507Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2419756Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2419853Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2419931Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2420008Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2420182Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2420277Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2420355Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2420427Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2420632Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2420733Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2420812Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2420887Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2420967Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2421047Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2421120Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2421198Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2421369Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2421464Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2421543Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2421621Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2421787Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2421884Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2421956Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2422035Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2422206Z tmp58 = 
tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2422303Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2422381Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2422460Z tmp61 = 0.1111111111111111 2023-01-11T21:38:06.2422540Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2422670Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2422755Z ''') 2023-01-11T21:38:06.2422762Z 2023-01-11T21:38:06.2422767Z 2023-01-11T21:38:06.2422860Z async_compile.wait(globals()) 2023-01-11T21:38:06.2422937Z del async_compile 2023-01-11T21:38:06.2422942Z 2023-01-11T21:38:06.2423015Z def call(args): 2023-01-11T21:38:06.2423088Z arg0_1, = args 2023-01-11T21:38:06.2423165Z args.clear() 2023-01-11T21:38:06.2423257Z with torch.cuda.device(0): 2023-01-11T21:38:06.2423469Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2423562Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2423733Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2423813Z del arg0_1 2023-01-11T21:38:06.2423890Z return (buf0, ) 2023-01-11T21:38:06.2423895Z 2023-01-11T21:38:06.2423899Z 2023-01-11T21:38:06.2423981Z if __name__ == "__main__": 2023-01-11T21:38:06.2424101Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2424221Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2424438Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2424549Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2424554Z 2023-01-11T21:38:06.2424626Z ok (0.363s) 2023-01-11T21:38:06.2425088Z test_avg_pool2d4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2425224Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2425482Z [2023-01-11 21:34:04,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 415 2023-01-11T21:38:06.2425746Z [2023-01-11 21:34:04,918] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 415 2023-01-11T21:38:06.2426160Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2426319Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2426576Z [2023-01-11 21:34:04,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 416 2023-01-11T21:38:06.2426836Z [2023-01-11 21:34:05,067] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 416 2023-01-11T21:38:06.2426850Z 2023-01-11T21:38:06.2426942Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2427016Z import torch 2023-01-11T21:38:06.2427090Z import random 2023-01-11T21:38:06.2427210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2427334Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2427339Z 2023-01-11T21:38:06.2427422Z aten = torch.ops.aten 2023-01-11T21:38:06.2427560Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2427649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2427654Z 2023-01-11T21:38:06.2427729Z import triton 2023-01-11T21:38:06.2427821Z import triton.language as tl 2023-01-11T21:38:06.2427945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2428086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2428092Z 2023-01-11T21:38:06.2428096Z 2023-01-11T21:38:06.2428263Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2428338Z import triton 2023-01-11T21:38:06.2428431Z import triton.language as tl 2023-01-11T21:38:06.2428538Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2428640Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2428776Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2428903Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2428911Z 2023-01-11T21:38:06.2429316Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2429388Z @triton.jit 2023-01-11T21:38:06.2429554Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2429630Z xnumel = 48400 2023-01-11T21:38:06.2429722Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2429854Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2429939Z xmask = xindex < xnumel 2023-01-11T21:38:06.2430016Z x0 = xindex % 55 2023-01-11T21:38:06.2430098Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.2430177Z x2 = (xindex // 3025) 2023-01-11T21:38:06.2430240Z x3 = xindex 2023-01-11T21:38:06.2430361Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430485Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430611Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430730Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430852Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2430972Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431094Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431208Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) 
+ (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431324Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.2431403Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2431481Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2431558Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2431636Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2431747Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2431822Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2431903Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2431983Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2432064Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2432141Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2432282Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2432369Z ''') 2023-01-11T21:38:06.2432375Z 2023-01-11T21:38:06.2432380Z 2023-01-11T21:38:06.2432472Z async_compile.wait(globals()) 2023-01-11T21:38:06.2432542Z del async_compile 2023-01-11T21:38:06.2432547Z 2023-01-11T21:38:06.2432621Z def call(args): 2023-01-11T21:38:06.2432691Z arg0_1, = args 2023-01-11T21:38:06.2432765Z args.clear() 2023-01-11T21:38:06.2432858Z with torch.cuda.device(0): 2023-01-11T21:38:06.2433084Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2433177Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2433322Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.2433394Z del arg0_1 2023-01-11T21:38:06.2433470Z return (buf0, ) 2023-01-11T21:38:06.2433476Z 2023-01-11T21:38:06.2433480Z 2023-01-11T21:38:06.2433560Z if __name__ == "__main__": 2023-01-11T21:38:06.2433683Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2433810Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2434045Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2434160Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2434165Z 2023-01-11T21:38:06.2434170Z 2023-01-11T21:38:06.2434260Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2434337Z import torch 2023-01-11T21:38:06.2434409Z import random 2023-01-11T21:38:06.2434530Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2434657Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2434663Z 2023-01-11T21:38:06.2434746Z aten = torch.ops.aten 2023-01-11T21:38:06.2434882Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2434978Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2434983Z 2023-01-11T21:38:06.2435079Z import triton 2023-01-11T21:38:06.2435172Z import triton.language as tl 2023-01-11T21:38:06.2435297Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2435439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2435445Z 2023-01-11T21:38:06.2435449Z 2023-01-11T21:38:06.2435617Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2435688Z import triton 2023-01-11T21:38:06.2435780Z import triton.language as tl 2023-01-11T21:38:06.2435888Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2435989Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2436125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2436250Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2436255Z 2023-01-11T21:38:06.2436662Z 
@pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2436736Z @triton.jit 2023-01-11T21:38:06.2436869Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2436943Z xnumel = 48400 2023-01-11T21:38:06.2437034Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2437164Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2437247Z xmask = xindex < xnumel 2023-01-11T21:38:06.2437321Z x0 = xindex % 55 2023-01-11T21:38:06.2437402Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.2437484Z x2 = (xindex // 3025) 2023-01-11T21:38:06.2437587Z x3 = xindex 2023-01-11T21:38:06.2437717Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2437852Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2437989Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438130Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438264Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438394Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438529Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438664Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438796Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2438871Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.2438948Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.2439025Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.2439103Z tmp8 = tmp7 + tmp6 2023-01-11T21:38:06.2439179Z tmp10 = tmp9 + tmp8 2023-01-11T21:38:06.2439260Z tmp12 = tmp11 + tmp10 2023-01-11T21:38:06.2439334Z tmp14 = tmp13 + tmp12 2023-01-11T21:38:06.2439414Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.2439495Z tmp17 = 0.1111111111111111 2023-01-11T21:38:06.2439573Z tmp18 = tmp16 * tmp17 2023-01-11T21:38:06.2439713Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.2439798Z ''') 2023-01-11T21:38:06.2439803Z 2023-01-11T21:38:06.2439808Z 2023-01-11T21:38:06.2439904Z async_compile.wait(globals()) 2023-01-11T21:38:06.2439980Z del async_compile 2023-01-11T21:38:06.2439985Z 2023-01-11T21:38:06.2440053Z def call(args): 2023-01-11T21:38:06.2440130Z arg0_1, = args 2023-01-11T21:38:06.2440204Z args.clear() 2023-01-11T21:38:06.2440297Z with torch.cuda.device(0): 2023-01-11T21:38:06.2440527Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2440621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2440800Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.2440868Z del arg0_1 2023-01-11T21:38:06.2440944Z return (buf0, ) 2023-01-11T21:38:06.2440949Z 2023-01-11T21:38:06.2440953Z 2023-01-11T21:38:06.2441034Z if __name__ == "__main__": 2023-01-11T21:38:06.2441152Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2441279Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2441518Z 
arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2441631Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2441640Z 2023-01-11T21:38:06.2441711Z ok (0.318s) 2023-01-11T21:38:06.2442174Z test_avg_pool2d5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2442301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2442562Z [2023-01-11 21:34:05,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 417 2023-01-11T21:38:06.2442825Z [2023-01-11 21:34:05,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 417 2023-01-11T21:38:06.2443242Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2443405Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2443663Z [2023-01-11 21:34:05,451] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 418 2023-01-11T21:38:06.2443668Z 2023-01-11T21:38:06.2443766Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2443844Z import torch 2023-01-11T21:38:06.2443918Z import random 2023-01-11T21:38:06.2444036Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2444154Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2444159Z 2023-01-11T21:38:06.2444238Z aten = torch.ops.aten 2023-01-11T21:38:06.2444379Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2444479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2444485Z 2023-01-11T21:38:06.2444558Z import triton 2023-01-11T21:38:06.2444653Z import triton.language as tl 2023-01-11T21:38:06.2444778Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2444915Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2444927Z 2023-01-11T21:38:06.2444931Z 2023-01-11T21:38:06.2445092Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2445165Z import triton 2023-01-11T21:38:06.2445257Z import triton.language as tl 2023-01-11T21:38:06.2445370Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2445471Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2445603Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2445729Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2445734Z 2023-01-11T21:38:06.2446136Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2446211Z @triton.jit 2023-01-11T21:38:06.2446342Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.2446446Z xnumel = 16 2023-01-11T21:38:06.2446545Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2446675Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2446758Z xmask = xindex < xnumel 2023-01-11T21:38:06.2446835Z x1 = (xindex // 4) 2023-01-11T21:38:06.2446902Z x0 = xindex % 4 2023-01-11T21:38:06.2446971Z x2 = xindex 2023-01-11T21:38:06.2447083Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2447154Z tmp1 = 0 2023-01-11T21:38:06.2447234Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2447301Z tmp3 = 8 2023-01-11T21:38:06.2447372Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2447448Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2447561Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2447638Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2447715Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2447790Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2447867Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2448106Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2448203Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2448277Z tmp13 = 2*x0 2023-01-11T21:38:06.2448357Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2448437Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2448519Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2448597Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2448827Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2448923Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2449003Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2449119Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2449197Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2449275Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2449353Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2449425Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2449667Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2449762Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2449840Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2449913Z tmp29 = 2*x1 2023-01-11T21:38:06.2449996Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2450074Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2450146Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2450223Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2450456Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2450550Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2450633Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2450710Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2450866Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2450954Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2451033Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2451114Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2451268Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2451365Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2451441Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2451517Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2451589Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2451669Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2451747Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2451829Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2451987Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], 
tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2452085Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2452164Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2452236Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2452422Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2452519Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2452600Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2452679Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2452837Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2452933Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2453006Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2453077Z tmp61 = 1 2023-01-11T21:38:06.2453170Z tmp62 = tl.where(tmp10, tmp61, 0.0) 2023-01-11T21:38:06.2453239Z tmp63 = 1 2023-01-11T21:38:06.2453333Z tmp64 = tl.where(tmp17, tmp63, 0.0) 2023-01-11T21:38:06.2453417Z tmp65 = tmp64 + tmp62 2023-01-11T21:38:06.2453488Z tmp66 = 1 2023-01-11T21:38:06.2453574Z tmp67 = tl.where(tmp25, tmp66, 0.0) 2023-01-11T21:38:06.2453654Z tmp68 = tmp67 + tmp65 2023-01-11T21:38:06.2453726Z tmp69 = 1 2023-01-11T21:38:06.2453816Z tmp70 = tl.where(tmp33, tmp69, 0.0) 2023-01-11T21:38:06.2453896Z tmp71 = tmp70 + tmp68 2023-01-11T21:38:06.2453968Z tmp72 = 1 2023-01-11T21:38:06.2454061Z tmp73 = tl.where(tmp37, tmp72, 0.0) 2023-01-11T21:38:06.2454134Z tmp74 = tmp73 + tmp71 2023-01-11T21:38:06.2454205Z tmp75 = 1 2023-01-11T21:38:06.2454298Z tmp76 = tl.where(tmp41, tmp75, 0.0) 2023-01-11T21:38:06.2454377Z tmp77 = tmp76 + tmp74 2023-01-11T21:38:06.2454447Z tmp78 = 1 2023-01-11T21:38:06.2454654Z tmp79 = tl.where(tmp49, tmp78, 0.0) 2023-01-11T21:38:06.2454733Z tmp80 = tmp79 + tmp77 2023-01-11T21:38:06.2454797Z tmp81 = 1 2023-01-11T21:38:06.2454890Z tmp82 = tl.where(tmp53, tmp81, 0.0) 2023-01-11T21:38:06.2455013Z tmp83 = tmp82 + tmp80 2023-01-11T21:38:06.2455083Z tmp84 = 1 2023-01-11T21:38:06.2455175Z tmp85 = tl.where(tmp57, tmp84, 0.0) 2023-01-11T21:38:06.2455252Z tmp86 = tmp85 + tmp83 2023-01-11T21:38:06.2455324Z tmp87 = tmp60 / tmp86 2023-01-11T21:38:06.2455465Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp87, xmask) 2023-01-11T21:38:06.2455554Z ''') 2023-01-11T21:38:06.2455560Z 2023-01-11T21:38:06.2455565Z 2023-01-11T21:38:06.2455659Z async_compile.wait(globals()) 2023-01-11T21:38:06.2455737Z del async_compile 2023-01-11T21:38:06.2455742Z 2023-01-11T21:38:06.2455820Z def call(args): 2023-01-11T21:38:06.2455892Z arg0_1, = args 2023-01-11T21:38:06.2455967Z args.clear() 2023-01-11T21:38:06.2456054Z with torch.cuda.device(0): 2023-01-11T21:38:06.2456271Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2456361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2456507Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2456582Z del arg0_1 2023-01-11T21:38:06.2456660Z return (buf0, ) 2023-01-11T21:38:06.2456666Z 2023-01-11T21:38:06.2456671Z 2023-01-11T21:38:06.2456750Z if __name__ == "__main__": 2023-01-11T21:38:06.2456870Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2456994Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2457270Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2457406Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2457412Z 2023-01-11T21:38:06.2457713Z 
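[Editor's note, not part of the log] The test_avg_pool2d5 kernel dump above differs from the earlier avg_pool2d dumps: instead of multiplying the window sum by a constant 1/9 (the 0.1111111111111111 factor seen before), it divides by a per-window element count built from tl.where(mask, 1, 0) terms (tmp87 = tmp60 / tmp86). That is the Inductor codegen pattern one would expect for avg_pool2d with count_include_pad=False, where border windows divide by the number of valid taps. A minimal sketch of the kind of call that could produce such a kernel; the kernel_size=3, stride=2, padding=1 parameters are an assumption inferred from the generated index arithmetic ((-1) + (2*x1), nine taps, bounds checked against 0 and 8), and torch.compile assumes a PyTorch 2.x build:

import torch
import torch.nn.functional as F

def fn(x):
    # count_include_pad=False excludes padding from the averaging denominator,
    # so corner/edge windows divide by fewer than 9 elements
    return F.avg_pool2d(x, kernel_size=3, stride=2, padding=1,
                        count_include_pad=False)

compiled = torch.compile(fn)  # lowers through TorchInductor, emitting a Triton kernel
x = torch.randn(1, 1, 8, 8, device="cuda")
out = compiled(x)
print(out.shape)  # torch.Size([1, 1, 4, 4]), matching the buf0 shape in the dump

For comparison, the fp32/fp16 dumps further above that multiply by the constant 1/9 correspond to count_include_pad=True, where every window, padded or not, averages over the full 3x3 = 9 taps.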
[2023-01-11 21:34:05,651] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 418 2023-01-11T21:38:06.2457720Z 2023-01-11T21:38:06.2457833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2457921Z import torch 2023-01-11T21:38:06.2458135Z import random 2023-01-11T21:38:06.2458266Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2458400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2458405Z 2023-01-11T21:38:06.2458490Z aten = torch.ops.aten 2023-01-11T21:38:06.2458642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2458783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2458789Z 2023-01-11T21:38:06.2458865Z import triton 2023-01-11T21:38:06.2458961Z import triton.language as tl 2023-01-11T21:38:06.2459099Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2459247Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2459259Z 2023-01-11T21:38:06.2459263Z 2023-01-11T21:38:06.2459445Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2459520Z import triton 2023-01-11T21:38:06.2459615Z import triton.language as tl 2023-01-11T21:38:06.2459737Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2459848Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2459992Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2460128Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2460134Z 2023-01-11T21:38:06.2460606Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2460679Z @triton.jit 2023-01-11T21:38:06.2460814Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2460886Z xnumel = 16 2023-01-11T21:38:06.2460983Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2461113Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2461197Z xmask = xindex < xnumel 2023-01-11T21:38:06.2461277Z x1 = (xindex // 4) 2023-01-11T21:38:06.2461343Z x0 = xindex % 4 2023-01-11T21:38:06.2461442Z x2 = xindex 2023-01-11T21:38:06.2461555Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2461627Z tmp1 = 0 2023-01-11T21:38:06.2461709Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2461781Z tmp3 = 8 2023-01-11T21:38:06.2461855Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2461933Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2462045Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2462124Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2462204Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2462285Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2462366Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2462627Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2462726Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2462800Z tmp13 = 2*x0 2023-01-11T21:38:06.2462887Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2462969Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2463050Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2463134Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2463394Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2463491Z tmp19 = tl.where(tmp17, tmp18, 
0.0) 2023-01-11T21:38:06.2463575Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2463656Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2463738Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2463819Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2463900Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2463975Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2464237Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2464335Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2464416Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2464493Z tmp29 = 2*x1 2023-01-11T21:38:06.2464572Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2464657Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2464731Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2464814Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2465070Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2465170Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2465285Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2465368Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2465542Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2465632Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2465713Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2465795Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2465966Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2466063Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2466150Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2466228Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2466302Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2466385Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2466467Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2466547Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2466722Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2466820Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2466899Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2466973Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2467144Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2467241Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2467324Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2467409Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2467580Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2467704Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2467786Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2467853Z tmp61 = 1 2023-01-11T21:38:06.2467950Z tmp62 = tl.where(tmp10, tmp61, 0.0) 2023-01-11T21:38:06.2468027Z tmp63 = 1 2023-01-11T21:38:06.2468123Z tmp64 = tl.where(tmp17, tmp63, 0.0) 2023-01-11T21:38:06.2468204Z tmp65 = tmp64 + tmp62 2023-01-11T21:38:06.2468281Z tmp66 = 1 2023-01-11T21:38:06.2468375Z tmp67 = tl.where(tmp25, tmp66, 0.0) 2023-01-11T21:38:06.2468450Z tmp68 = tmp67 + tmp65 2023-01-11T21:38:06.2468523Z tmp69 = 1 2023-01-11T21:38:06.2468617Z tmp70 = tl.where(tmp33, tmp69, 0.0) 2023-01-11T21:38:06.2468699Z tmp71 = tmp70 + tmp68 2023-01-11T21:38:06.2468772Z tmp72 = 1 2023-01-11T21:38:06.2468868Z tmp73 = tl.where(tmp37, tmp72, 0.0) 
2023-01-11T21:38:06.2468944Z tmp74 = tmp73 + tmp71 2023-01-11T21:38:06.2469019Z tmp75 = 1 2023-01-11T21:38:06.2469117Z tmp76 = tl.where(tmp41, tmp75, 0.0) 2023-01-11T21:38:06.2469198Z tmp77 = tmp76 + tmp74 2023-01-11T21:38:06.2469271Z tmp78 = 1 2023-01-11T21:38:06.2469367Z tmp79 = tl.where(tmp49, tmp78, 0.0) 2023-01-11T21:38:06.2469450Z tmp80 = tmp79 + tmp77 2023-01-11T21:38:06.2469519Z tmp81 = 1 2023-01-11T21:38:06.2469614Z tmp82 = tl.where(tmp53, tmp81, 0.0) 2023-01-11T21:38:06.2469695Z tmp83 = tmp82 + tmp80 2023-01-11T21:38:06.2469769Z tmp84 = 1 2023-01-11T21:38:06.2469863Z tmp85 = tl.where(tmp57, tmp84, 0.0) 2023-01-11T21:38:06.2469945Z tmp86 = tmp85 + tmp83 2023-01-11T21:38:06.2470028Z tmp87 = tmp60 / tmp86 2023-01-11T21:38:06.2470161Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp87, xmask) 2023-01-11T21:38:06.2470251Z ''') 2023-01-11T21:38:06.2470257Z 2023-01-11T21:38:06.2470261Z 2023-01-11T21:38:06.2470357Z async_compile.wait(globals()) 2023-01-11T21:38:06.2470435Z del async_compile 2023-01-11T21:38:06.2470443Z 2023-01-11T21:38:06.2470521Z def call(args): 2023-01-11T21:38:06.2470597Z arg0_1, = args 2023-01-11T21:38:06.2470673Z args.clear() 2023-01-11T21:38:06.2470763Z with torch.cuda.device(0): 2023-01-11T21:38:06.2470983Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2471105Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2471255Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2471332Z del arg0_1 2023-01-11T21:38:06.2471412Z return (buf0, ) 2023-01-11T21:38:06.2471417Z 2023-01-11T21:38:06.2471422Z 2023-01-11T21:38:06.2471505Z if __name__ == "__main__": 2023-01-11T21:38:06.2471626Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2471749Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2471969Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2472087Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2472092Z 2023-01-11T21:38:06.2472165Z ok (0.585s) 2023-01-11T21:38:06.2472623Z test_avg_pool2d6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2472760Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2473022Z [2023-01-11 21:34:05,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 419 2023-01-11T21:38:06.2473289Z [2023-01-11 21:34:05,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 419 2023-01-11T21:38:06.2473706Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2473871Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2474130Z [2023-01-11 21:34:05,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 420 2023-01-11T21:38:06.2474136Z 2023-01-11T21:38:06.2474230Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2474307Z import torch 2023-01-11T21:38:06.2474382Z import random 2023-01-11T21:38:06.2474508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2474633Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2474639Z 2023-01-11T21:38:06.2474723Z aten = torch.ops.aten 2023-01-11T21:38:06.2474867Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2474959Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2474969Z 2023-01-11T21:38:06.2475039Z import triton 2023-01-11T21:38:06.2475131Z import triton.language as tl 2023-01-11T21:38:06.2475261Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2475408Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2475414Z 2023-01-11T21:38:06.2475418Z 2023-01-11T21:38:06.2475591Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2475668Z import triton 2023-01-11T21:38:06.2475767Z import triton.language as tl 2023-01-11T21:38:06.2475878Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2475980Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2476117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2476246Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2476254Z 2023-01-11T21:38:06.2476660Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2476735Z @triton.jit 2023-01-11T21:38:06.2476950Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2477028Z xnumel = 16 2023-01-11T21:38:06.2477121Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2477254Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2477341Z xmask = xindex < xnumel 2023-01-11T21:38:06.2477422Z x1 = (xindex // 4) 2023-01-11T21:38:06.2477501Z x0 = xindex % 4 2023-01-11T21:38:06.2477573Z x2 = xindex 2023-01-11T21:38:06.2477686Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2477753Z tmp1 = 0 2023-01-11T21:38:06.2477837Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2477909Z tmp3 = 8 2023-01-11T21:38:06.2477994Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2478073Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2478186Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2478261Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2478339Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2478418Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2478498Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2478747Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.2478846Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2478922Z tmp13 = 2*x0 2023-01-11T21:38:06.2478999Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2479080Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2479161Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2479243Z tmp17 = tmp5 & tmp16 
2023-01-11T21:38:06.2479482Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.2479582Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2479695Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2479765Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2479850Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2479932Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2480015Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2480097Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2480340Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.2480437Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2480512Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2480585Z tmp29 = 2*x1 2023-01-11T21:38:06.2480666Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2480744Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2480826Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2480905Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2481143Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.2481238Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2481321Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2481403Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2481563Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.2481660Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2481745Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2481826Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2481979Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.2482077Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2482157Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2482234Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2482313Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2482396Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2482479Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2482553Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2482718Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.2482817Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2482897Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2482978Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2483161Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.2483260Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2483343Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2483417Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2483578Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.2483677Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2483760Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2483843Z tmp61 = 0.3333333333333333 2023-01-11T21:38:06.2483925Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2484066Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2484151Z ''') 2023-01-11T21:38:06.2484158Z 2023-01-11T21:38:06.2484162Z 2023-01-11T21:38:06.2484260Z async_compile.wait(globals()) 2023-01-11T21:38:06.2484339Z del async_compile 2023-01-11T21:38:06.2484344Z 2023-01-11T21:38:06.2484422Z def call(args): 2023-01-11T21:38:06.2484500Z arg0_1, = args 2023-01-11T21:38:06.2484578Z args.clear() 2023-01-11T21:38:06.2484674Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2484887Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2484983Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2485129Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2485204Z del arg0_1 2023-01-11T21:38:06.2485281Z return (buf0, ) 2023-01-11T21:38:06.2485286Z 2023-01-11T21:38:06.2485291Z 2023-01-11T21:38:06.2485376Z if __name__ == "__main__": 2023-01-11T21:38:06.2485525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2485656Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2485868Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2485984Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2485992Z 2023-01-11T21:38:06.2486263Z [2023-01-11 21:34:05,990] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 420 2023-01-11T21:38:06.2486268Z 2023-01-11T21:38:06.2486373Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2486450Z import torch 2023-01-11T21:38:06.2486526Z import random 2023-01-11T21:38:06.2486649Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2486777Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2486782Z 2023-01-11T21:38:06.2486860Z aten = torch.ops.aten 2023-01-11T21:38:06.2487001Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2487105Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2487110Z 2023-01-11T21:38:06.2487186Z import triton 2023-01-11T21:38:06.2487280Z import triton.language as tl 2023-01-11T21:38:06.2487408Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2487554Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2487560Z 2023-01-11T21:38:06.2487564Z 2023-01-11T21:38:06.2487734Z triton_fused_avg_pool2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.2487805Z import triton 2023-01-11T21:38:06.2487898Z import triton.language as tl 2023-01-11T21:38:06.2488017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2488122Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2488256Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2488383Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2488388Z 2023-01-11T21:38:06.2488791Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2488868Z @triton.jit 2023-01-11T21:38:06.2488997Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2489103Z xnumel = 16 2023-01-11T21:38:06.2489204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2489338Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2489423Z xmask = xindex < xnumel 2023-01-11T21:38:06.2489503Z x1 = (xindex // 4) 2023-01-11T21:38:06.2489579Z x0 = xindex % 4 2023-01-11T21:38:06.2489645Z x2 = xindex 2023-01-11T21:38:06.2489759Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.2489832Z tmp1 = 0 2023-01-11T21:38:06.2489911Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2489983Z tmp3 = 8 2023-01-11T21:38:06.2490062Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2490145Z tmp5 = tmp2 & tmp4 
2023-01-11T21:38:06.2490249Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.2490329Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.2490407Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.2490483Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.2490564Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.2490835Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2490935Z tmp12 = tl.where(tmp10, tmp11, 0.0) 2023-01-11T21:38:06.2491004Z tmp13 = 2*x0 2023-01-11T21:38:06.2491088Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.2491170Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.2491252Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.2491337Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.2491599Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2491698Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.2491815Z tmp20 = tmp19 + tmp12 2023-01-11T21:38:06.2491893Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.2491978Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.2492061Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.2492141Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.2492222Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.2492479Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2492580Z tmp27 = tl.where(tmp25, tmp26, 0.0) 2023-01-11T21:38:06.2492661Z tmp28 = tmp27 + tmp20 2023-01-11T21:38:06.2492737Z tmp29 = 2*x1 2023-01-11T21:38:06.2492817Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.2492898Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.2492979Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.2493054Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.2493308Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2493410Z tmp35 = tl.where(tmp33, tmp34, 0.0) 2023-01-11T21:38:06.2493493Z tmp36 = tmp35 + tmp28 2023-01-11T21:38:06.2493575Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.2493745Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2493847Z tmp39 = tl.where(tmp37, tmp38, 0.0) 2023-01-11T21:38:06.2493927Z tmp40 = tmp39 + tmp36 2023-01-11T21:38:06.2494001Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.2494174Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2494271Z tmp43 = tl.where(tmp41, tmp42, 0.0) 2023-01-11T21:38:06.2494354Z tmp44 = tmp43 + tmp40 2023-01-11T21:38:06.2494431Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.2494624Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.2494704Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.2494777Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.2494856Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.2495031Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2495131Z tmp51 = tl.where(tmp49, tmp50, 0.0) 2023-01-11T21:38:06.2495230Z tmp52 = tmp51 + tmp44 2023-01-11T21:38:06.2495314Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.2495547Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2495637Z tmp55 = tl.where(tmp53, tmp54, 0.0) 2023-01-11T21:38:06.2495718Z tmp56 = tmp55 + tmp52 2023-01-11T21:38:06.2495797Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.2495967Z tmp58 = 
tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2496062Z tmp59 = tl.where(tmp57, tmp58, 0.0) 2023-01-11T21:38:06.2496142Z tmp60 = tmp59 + tmp56 2023-01-11T21:38:06.2496223Z tmp61 = 0.3333333333333333 2023-01-11T21:38:06.2496299Z tmp62 = tmp60 * tmp61 2023-01-11T21:38:06.2496435Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp62, xmask) 2023-01-11T21:38:06.2496523Z ''') 2023-01-11T21:38:06.2496529Z 2023-01-11T21:38:06.2496533Z 2023-01-11T21:38:06.2496627Z async_compile.wait(globals()) 2023-01-11T21:38:06.2496704Z del async_compile 2023-01-11T21:38:06.2496712Z 2023-01-11T21:38:06.2496787Z def call(args): 2023-01-11T21:38:06.2496862Z arg0_1, = args 2023-01-11T21:38:06.2496941Z args.clear() 2023-01-11T21:38:06.2497027Z with torch.cuda.device(0): 2023-01-11T21:38:06.2497304Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2497399Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2497546Z triton_fused_avg_pool2d_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.2497622Z del arg0_1 2023-01-11T21:38:06.2497699Z return (buf0, ) 2023-01-11T21:38:06.2497746Z 2023-01-11T21:38:06.2497750Z 2023-01-11T21:38:06.2497833Z if __name__ == "__main__": 2023-01-11T21:38:06.2497950Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2498072Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2498296Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2498411Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2498416Z 2023-01-11T21:38:06.2498485Z ok (0.339s) 2023-01-11T21:38:06.2498944Z test_avg_pool2d7_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2499077Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2499340Z [2023-01-11 21:34:06,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 421 2023-01-11T21:38:06.2499567Z [2023-01-11 21:34:06,014] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:06.2499834Z [2023-01-11 21:34:06,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 421 2023-01-11T21:38:06.2500255Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2500382Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2500637Z [2023-01-11 21:34:06,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 422 2023-01-11T21:38:06.2500870Z [2023-01-11 21:34:06,039] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d 2023-01-11T21:38:06.2501133Z [2023-01-11 21:34:06,041] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 422 2023-01-11T21:38:06.2501139Z 2023-01-11T21:38:06.2501265Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2501341Z import torch 2023-01-11T21:38:06.2501415Z import random 2023-01-11T21:38:06.2501536Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2501653Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2501666Z 2023-01-11T21:38:06.2501742Z aten = torch.ops.aten 2023-01-11T21:38:06.2501880Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2501974Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2501979Z 2023-01-11T21:38:06.2502053Z import triton 2023-01-11T21:38:06.2502146Z import triton.language as tl 2023-01-11T21:38:06.2502277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2502420Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2502425Z 2023-01-11T21:38:06.2502430Z 2023-01-11T21:38:06.2502516Z async_compile.wait(globals()) 2023-01-11T21:38:06.2502592Z del async_compile 2023-01-11T21:38:06.2502597Z 2023-01-11T21:38:06.2502673Z def call(args): 2023-01-11T21:38:06.2502745Z arg0_1, = args 2023-01-11T21:38:06.2502820Z args.clear() 2023-01-11T21:38:06.2502912Z with torch.cuda.device(0): 2023-01-11T21:38:06.2503048Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:06.2503114Z del arg0_1 2023-01-11T21:38:06.2503186Z buf1 = buf0 2023-01-11T21:38:06.2503304Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:06.2503375Z del buf0 2023-01-11T21:38:06.2503450Z return (buf1, ) 2023-01-11T21:38:06.2503455Z 2023-01-11T21:38:06.2503487Z 2023-01-11T21:38:06.2503571Z if __name__ == "__main__": 2023-01-11T21:38:06.2503692Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2503819Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2504042Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2504157Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2504162Z 2023-01-11T21:38:06.2504166Z 2023-01-11T21:38:06.2504266Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2504342Z import torch 2023-01-11T21:38:06.2504418Z import random 2023-01-11T21:38:06.2504540Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2504667Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2504672Z 2023-01-11T21:38:06.2504755Z aten = torch.ops.aten 2023-01-11T21:38:06.2504886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2504980Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2504989Z 2023-01-11T21:38:06.2505065Z import triton 2023-01-11T21:38:06.2505159Z import triton.language as tl 2023-01-11T21:38:06.2505288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2505428Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2505433Z 2023-01-11T21:38:06.2505440Z 2023-01-11T21:38:06.2505534Z async_compile.wait(globals()) 2023-01-11T21:38:06.2505613Z del async_compile 2023-01-11T21:38:06.2505618Z 2023-01-11T21:38:06.2505689Z def call(args): 2023-01-11T21:38:06.2505764Z arg0_1, = args 2023-01-11T21:38:06.2505841Z args.clear() 2023-01-11T21:38:06.2505935Z with torch.cuda.device(0): 2023-01-11T21:38:06.2506071Z buf0 = aten.avg_pool2d(arg0_1, [13, 13], [1, 1], [0, 0], False, True, None) 2023-01-11T21:38:06.2506151Z del arg0_1 2023-01-11T21:38:06.2506225Z buf1 = buf0 2023-01-11T21:38:06.2506335Z assert_size_stride(buf1, (1, 1, 12, 12), (144, 144, 12, 1)) 2023-01-11T21:38:06.2506412Z del buf0 2023-01-11T21:38:06.2506492Z return (buf1, ) 2023-01-11T21:38:06.2506497Z 2023-01-11T21:38:06.2506501Z 2023-01-11T21:38:06.2506583Z if __name__ == "__main__": 2023-01-11T21:38:06.2506700Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2506857Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2507085Z arg0_1 = rand_strided((1, 1, 24, 24), (576, 576, 24, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2507201Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2507206Z 2023-01-11T21:38:06.2507273Z ok (0.051s) 2023-01-11T21:38:06.2507744Z test_avg_pool2d_backward2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2507885Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2508145Z [2023-01-11 21:34:06,062] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 423 2023-01-11T21:38:06.2508416Z [2023-01-11 21:34:06,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 423 2023-01-11T21:38:06.2508837Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 424

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 300
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 15)
    x0 = xindex % 15
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = (-1) + x0
    tmp2 = 2 + x1
    tmp3 = 2 + x0
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 20
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = 15
    tmp10 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp9, tmp3, tmp9))
    tmp11 = tmp5 + tmp4
    tmp12 = tmp6 + tmp4
    tmp13 = 1
    tmp14 = 3
    tmp15 = tmp11 * tmp13
    tmp16 = tmp15 - tmp13
    tmp17 = tmp12 * tmp13
    tmp18 = tmp17 - tmp13
    tmp19 = tmp16 + tmp14
    tmp20 = tmp7 + tmp13
    tmp21 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 < tmp20, tmp19, tmp20))
    tmp22 = tmp18 + tmp14
    tmp23 = tmp9 + tmp13
    tmp24 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp23, tmp22, tmp23))
    tmp25 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 > tmp4, tmp16, tmp4))
    tmp26 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 > tmp4, tmp18, tmp4))
    tmp27 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 < tmp7, tmp21, tmp7))
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp9, tmp24, tmp9))
    tmp29 = tmp27 - tmp25
    tmp30 = tmp28 - tmp26
    tmp31 = tmp29 * tmp30
    tmp32 = tmp8 - tmp13
    tmp33 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp32, tmp11, tmp32))
    tmp34 = tmp10 - tmp13
    tmp35 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp34, tmp12, tmp34))
    tmp36 = tl.load(in_ptr0 + (tmp35 + (15*tmp33)), None)
    tmp37 = tmp36 / tmp31
    tmp38 = tmp11 < tmp8
    tmp39 = tmp12 < tmp10
    tmp40 = tmp38 & tmp39
    tmp41 = 0.0
    tmp42 = tl.where(tmp40, tmp37, tmp41)
    tmp43 = tmp6 + tmp13
    tmp44 = tmp43 * tmp13
    tmp45 = tmp44 - tmp13
    tmp46 = tmp45 + tmp14
    tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp23, tmp46, tmp23))
    tmp48 = tl.where(tmp45 != tmp45, tmp45, tl.where(tmp45 > tmp4, tmp45, tmp4))
    tmp49 = tl.where(tmp47 != tmp47, tmp47, tl.where(tmp47 < tmp9, tmp47, tmp9))
    tmp50 = tmp49 - tmp48
    tmp51 = tmp29 * tmp50
    tmp52 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 < tmp34, tmp43, tmp34))
    tmp53 = tl.load(in_ptr0 + (tmp52 + (15*tmp33)), None)
    tmp54 = tmp53 / tmp51
    tmp55 = tmp43 < tmp10
    tmp56 = tmp38 & tmp55
    tmp57 = tmp42 + tmp54
    tmp58 = tl.where(tmp56, tmp57, tmp42)
    tmp59 = 2
    tmp60 = tmp6 + tmp59
    tmp61 = tmp60 * tmp13
    tmp62 = tmp61 - tmp13
    tmp63 = tmp62 + tmp14
    tmp64 = tl.where(tmp63 != tmp63, tmp63, tl.where(tmp63 < tmp23, tmp63, tmp23))
    tmp65 = tl.where(tmp62 != tmp62, tmp62, tl.where(tmp62 > tmp4, tmp62, tmp4))
    tmp66 = tl.where(tmp64 != tmp64, tmp64, tl.where(tmp64 < tmp9, tmp64, tmp9))
    tmp67 = tmp66 - tmp65
    tmp68 = tmp29 * tmp67
    tmp69 = tl.where(tmp60 != tmp60, tmp60, tl.where(tmp60 < tmp34, tmp60, tmp34))
    tmp70 = tl.load(in_ptr0 + (tmp69 + (15*tmp33)), None)
    tmp71 = tmp70 / tmp68
    tmp72 = tmp60 < tmp10
    tmp73 = tmp38 & tmp72
    tmp74 = tmp58 + tmp71
    tmp75 = tl.where(tmp73, tmp74, tmp58)
    tmp76 = tmp5 + tmp13
    tmp77 = tmp76 * tmp13
    tmp78 = tmp77 - tmp13
    tmp79 = tmp78 + tmp14
    tmp80 = tl.where(tmp79 != tmp79, tmp79, tl.where(tmp79 < tmp20, tmp79, tmp20))
    tmp81 = tl.where(tmp78 != tmp78, tmp78, tl.where(tmp78 > tmp4, tmp78, tmp4))
    tmp82 = tl.where(tmp80 != tmp80, tmp80, tl.where(tmp80 < tmp7, tmp80, tmp7))
    tmp83 = tmp82 - tmp81
    tmp84 = tmp83 * tmp30
    tmp85 = tl.where(tmp76 != tmp76, tmp76, tl.where(tmp76 < tmp32, tmp76, tmp32))
    tmp86 = tl.load(in_ptr0 + (tmp35 + (15*tmp85)), None)
    tmp87 = tmp86 / tmp84
    tmp88 = tmp76 < tmp8
    tmp89 = tmp88 & tmp39
    tmp90 = tmp75 + tmp87
    tmp91 = tl.where(tmp89, tmp90, tmp75)
    tmp92 = tmp83 * tmp50
    tmp93 = tl.load(in_ptr0 + (tmp52 + (15*tmp85)), None)
    tmp94 = tmp93 / tmp92
    tmp95 = tmp88 & tmp55
    tmp96 = tmp91 + tmp94
    tmp97 = tl.where(tmp95, tmp96, tmp91)
    tmp98 = tmp83 * tmp67
    tmp99 = tl.load(in_ptr0 + (tmp69 + (15*tmp85)), None)
    tmp100 = tmp99 / tmp98
    tmp101 = tmp88 & tmp72
    tmp102 = tmp97 + tmp100
    tmp103 = tl.where(tmp101, tmp102, tmp97)
    tmp104 = tmp5 + tmp59
    tmp105 = tmp104 * tmp13
    tmp106 = tmp105 - tmp13
    tmp107 = tmp106 + tmp14
    tmp108 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 < tmp20, tmp107, tmp20))
    tmp109 = tl.where(tmp106 != tmp106, tmp106, tl.where(tmp106 > tmp4, tmp106, tmp4))
    tmp110 = tl.where(tmp108 != tmp108, tmp108, tl.where(tmp108 < tmp7, tmp108, tmp7))
    tmp111 = tmp110 - tmp109
    tmp112 = tmp111 * tmp30
    tmp113 = tl.where(tmp104 != tmp104, tmp104, tl.where(tmp104 < tmp32, tmp104, tmp32))
    tmp114 = tl.load(in_ptr0 + (tmp35 + (15*tmp113)), None)
    tmp115 = tmp114 / tmp112
    tmp116 = tmp104 < tmp8
    tmp117 = tmp116 & tmp39
    tmp118 = tmp103 + tmp115
    tmp119 = tl.where(tmp117, tmp118, tmp103)
    tmp120 = tmp111 * tmp50
    tmp121 = tl.load(in_ptr0 + (tmp52 + (15*tmp113)), None)
    tmp122 = tmp121 / tmp120
    tmp123 = tmp116 & tmp55
    tmp124 = tmp119 + tmp122
    tmp125 = tl.where(tmp123, tmp124, tmp119)
    tmp126 = tmp111 * tmp67
    tmp127 = tl.load(in_ptr0 + (tmp69 + (15*tmp113)), None)
    tmp128 = tmp127 / tmp126
    tmp129 = tmp116 & tmp72
    tmp130 = tmp125 + tmp128
    tmp131 = tl.where(tmp129, tmp130, tmp125)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp131, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 300, grid=grid(300), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:34:06,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 424

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 300
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 15)
    x0 = xindex % 15
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = (-1) + x0
    tmp2 = 2 + x1
    tmp3 = 2 + x0
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 20
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = 15
    tmp10 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp9, tmp3, tmp9))
    tmp11 = tmp5 + tmp4
    tmp12 = tmp6 + tmp4
    tmp13 = 1
    tmp14 = 3
    tmp15 = tmp11 * tmp13
    tmp16 = tmp15 - tmp13
    tmp17 = tmp12 * tmp13
    tmp18 = tmp17 - tmp13
    tmp19 = tmp16 + tmp14
    tmp20 = tmp7 + tmp13
    tmp21 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 < tmp20, tmp19, tmp20))
    tmp22 = tmp18 + tmp14
    tmp23 = tmp9 + tmp13
    tmp24 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp23, tmp22, tmp23))
    tmp25 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 > tmp4, tmp16, tmp4))
    tmp26 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 > tmp4, tmp18, tmp4))
    tmp27 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 < tmp7, tmp21, tmp7))
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp9, tmp24, tmp9))
    tmp29 = tmp27 - tmp25
    tmp30 = tmp28 - tmp26
    tmp31 = tmp29 * tmp30
    tmp32 = tmp8 - tmp13
    tmp33 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp32, tmp11, tmp32))
    tmp34 = tmp10 - tmp13
    tmp35 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp34, tmp12, tmp34))
    tmp36 = tl.load(in_ptr0 + (tmp35 + (15*tmp33)), None).to(tl.float32)
    tmp37 = tmp36 / tmp31
    tmp38 = tmp11 < tmp8
    tmp39 = tmp12 < tmp10
    tmp40 = tmp38 & tmp39
    tmp41 = 0.0
    tmp42 = tl.where(tmp40, tmp37, tmp41)
    tmp43 = tmp6 + tmp13
    tmp44 = tmp43 * tmp13
    tmp45 = tmp44 - tmp13
    tmp46 = tmp45 + tmp14
    tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp23, tmp46, tmp23))
    tmp48 = tl.where(tmp45 != tmp45, tmp45, tl.where(tmp45 > tmp4, tmp45, tmp4))
    tmp49 = tl.where(tmp47 != tmp47, tmp47, tl.where(tmp47 < tmp9, tmp47, tmp9))
    tmp50 = tmp49 - tmp48
    tmp51 = tmp29 * tmp50
    tmp52 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 < tmp34, tmp43, tmp34))
    tmp53 = tl.load(in_ptr0 + (tmp52 + (15*tmp33)), None).to(tl.float32)
    tmp54 = tmp53 / tmp51
    tmp55 = tmp43 < tmp10
    tmp56 = tmp38 & tmp55
    tmp57 = tmp42 + tmp54
    tmp58 = tl.where(tmp56, tmp57, tmp42)
    tmp59 = 2
    tmp60 = tmp6 + tmp59
    tmp61 = tmp60 * tmp13
    tmp62 = tmp61 - tmp13
    tmp63 = tmp62 + tmp14
    tmp64 = tl.where(tmp63 != tmp63, tmp63, tl.where(tmp63 < tmp23, tmp63, tmp23))
    tmp65 = tl.where(tmp62 != tmp62, tmp62, tl.where(tmp62 > tmp4, tmp62, tmp4))
    tmp66 = tl.where(tmp64 != tmp64, tmp64, tl.where(tmp64 < tmp9, tmp64, tmp9))
    tmp67 = tmp66 - tmp65
    tmp68 = tmp29 * tmp67
    tmp69 = tl.where(tmp60 != tmp60, tmp60, tl.where(tmp60 < tmp34, tmp60, tmp34))
    tmp70 = tl.load(in_ptr0 + (tmp69 + (15*tmp33)), None).to(tl.float32)
    tmp71 = tmp70 / tmp68
    tmp72 = tmp60 < tmp10
    tmp73 = tmp38 & tmp72
    tmp74 = tmp58 + tmp71
    tmp75 = tl.where(tmp73, tmp74, tmp58)
    tmp76 = tmp5 + tmp13
    tmp77 = tmp76 * tmp13
    tmp78 = tmp77 - tmp13
    tmp79 = tmp78 + tmp14
    tmp80 = tl.where(tmp79 != tmp79, tmp79, tl.where(tmp79 < tmp20, tmp79, tmp20))
    tmp81 = tl.where(tmp78 != tmp78, tmp78, tl.where(tmp78 > tmp4, tmp78, tmp4))
    tmp82 = tl.where(tmp80 != tmp80, tmp80, tl.where(tmp80 < tmp7, tmp80, tmp7))
    tmp83 = tmp82 - tmp81
    tmp84 = tmp83 * tmp30
    tmp85 = tl.where(tmp76 != tmp76, tmp76, tl.where(tmp76 < tmp32, tmp76, tmp32))
    tmp86 = tl.load(in_ptr0 + (tmp35 + (15*tmp85)), None).to(tl.float32)
    tmp87 = tmp86 / tmp84
    tmp88 = tmp76 < tmp8
    tmp89 = tmp88 & tmp39
    tmp90 = tmp75 + tmp87
    tmp91 = tl.where(tmp89, tmp90, tmp75)
    tmp92 = tmp83 * tmp50
    tmp93 = tl.load(in_ptr0 + (tmp52 + (15*tmp85)), None).to(tl.float32)
    tmp94 = tmp93 / tmp92
    tmp95 = tmp88 & tmp55
    tmp96 = tmp91 + tmp94
    tmp97 = tl.where(tmp95, tmp96, tmp91)
    tmp98 = tmp83 * tmp67
    tmp99 = tl.load(in_ptr0 + (tmp69 + (15*tmp85)), None).to(tl.float32)
    tmp100 = tmp99 / tmp98
    tmp101 = tmp88 & tmp72
    tmp102 = tmp97 + tmp100
    tmp103 = tl.where(tmp101, tmp102, tmp97)
    tmp104 = tmp5 + tmp59
    tmp105 = tmp104 * tmp13
    tmp106 = tmp105 - tmp13
    tmp107 = tmp106 + tmp14
    tmp108 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 < tmp20, tmp107, tmp20))
    tmp109 = tl.where(tmp106 != tmp106, tmp106, tl.where(tmp106 > tmp4, tmp106, tmp4))
    tmp110 = tl.where(tmp108 != tmp108, tmp108, tl.where(tmp108 < tmp7, tmp108, tmp7))
    tmp111 = tmp110 - tmp109
    tmp112 = tmp111 * tmp30
    tmp113 = tl.where(tmp104 != tmp104, tmp104, tl.where(tmp104 < tmp32, tmp104, tmp32))
    tmp114 = tl.load(in_ptr0 + (tmp35 + (15*tmp113)), None).to(tl.float32)
    tmp115 = tmp114 / tmp112
    tmp116 = tmp104 < tmp8
    tmp117 = tmp116 & tmp39
    tmp118 = tmp103 + tmp115
    tmp119 = tl.where(tmp117, tmp118, tmp103)
    tmp120 = tmp111 * tmp50
    tmp121 = tl.load(in_ptr0 + (tmp52 + (15*tmp113)), None).to(tl.float32)
    tmp122 = tmp121 / tmp120
    tmp123 = tmp116 & tmp55
    tmp124 = tmp119 + tmp122
    tmp125 = tl.where(tmp123, tmp124, tmp119)
    tmp126 = tmp111 * tmp67
    tmp127 = tl.load(in_ptr0 + (tmp69 + (15*tmp113)), None).to(tl.float32)
    tmp128 = tmp127 / tmp126
    tmp129 = tmp116 & tmp72
    tmp130 = tmp125 + tmp128
    tmp131 = tl.where(tmp129, tmp130, tmp125)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp131, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 300, grid=grid(300), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 20, 15), (300, 300, 15, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.678s)
test_avg_pool2d_backward3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 425
[2023-01-11 21:34:06,881] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 425
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:06,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 426
[2023-01-11 21:34:07,022] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 426

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 889056
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 21) % 21
    x0 = xindex % 21
    x2 = (xindex // 441)
    x5 = xindex
    tmp0 = ((1 + x1) // 2)
    tmp1 = ((1 + x0) // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 11
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp14) + (121*x2)), xmask)
    tmp18 = tmp17 / 1
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 889056, grid=grid(889056), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 889056
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 21) % 21
    x0 = xindex % 21
    x2 = (xindex // 441)
    x5 = xindex
    tmp0 = ((1 + x1) // 2)
    tmp1 = ((1 + x0) // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 11
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp14) + (121*x2)), xmask).to(tl.float32)
    tmp18 = tmp17 / 1
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 889056, grid=grid(889056), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2016, 11, 11), (243936, 121, 11, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 2016, 21, 21), (889056, 441, 21, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.303s)
test_avg_pool2d_backward4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 427
[2023-01-11 21:34:07,053] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.avg_pool2d_backward
[2023-01-11 21:34:07,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 427

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten.avg_pool2d_backward(arg0_1, arg1_1, [13, 13], [1, 1], [0, 0], True, False, None)
        del arg0_1
        del arg1_1
        buf1 = buf0
        assert_size_stride(buf1, (1, 16, 24, 24), (9216, 576, 24, 1))
        del buf0
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16, 12, 12), (2304, 144, 12, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 16, 24, 24), (9216, 576, 24, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.033s)
test_avg_pool2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 428
[2023-01-11 21:34:07,205] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 428
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,224] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 429
[2023-01-11 21:34:07,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 429

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1568
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 14) % 14
    x0 = xindex % 14
    x2 = (xindex // 196)
    x5 = xindex
    tmp0 = (x1 // 2)
    tmp1 = (x0 // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 7
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (7*tmp14) + (49*x2)), xmask)
    tmp18 = tmp17 / 4
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 1568, grid=grid(1568), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_avg_pool2d_backward_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1568
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 14) % 14
    x0 = xindex % 14
    x2 = (xindex // 196)
    x5 = xindex
    tmp0 = (x1 // 2)
    tmp1 = (x0 // 2)
    tmp2 = 1 + (x1 // 2)
    tmp3 = 1 + (x0 // 2)
    tmp4 = 0
    tmp5 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp4, tmp0, tmp4))
    tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp4, tmp1, tmp4))
    tmp7 = 7
    tmp8 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp7, tmp2, tmp7))
    tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp7, tmp3, tmp7))
    tmp10 = tmp5 + tmp4
    tmp11 = tmp6 + tmp4
    tmp12 = 1
    tmp13 = tmp8 - tmp12
    tmp14 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp13, tmp10, tmp13))
    tmp15 = tmp9 - tmp12
    tmp16 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp15, tmp11, tmp15))
    tmp17 = tl.load(in_ptr0 + (tmp16 + (7*tmp14) + (49*x2)), xmask).to(tl.float32)
    tmp18 = tmp17 / 4
    tmp19 = tmp10 < tmp8
    tmp20 = tmp11 < tmp9
    tmp21 = tmp19 & tmp20
    tmp22 = 0.0
    tmp23 = tl.where(tmp21, tmp18, tmp22)
    tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_avg_pool2d_backward_0.run(arg0_1, buf0, 1568, grid=grid(1568), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 4, 14, 14), (784, 196, 14, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.288s)
test_baddbmm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:07,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 430
[2023-01-11 21:34:07,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 430
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2599414Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2599665Z [2023-01-11 21:34:07,464] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 431 2023-01-11T21:38:06.2599929Z [2023-01-11 21:34:07,539] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 431 2023-01-11T21:38:06.2599934Z 2023-01-11T21:38:06.2600068Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2600144Z import torch 2023-01-11T21:38:06.2600222Z import random 2023-01-11T21:38:06.2600345Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2600471Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2600476Z 2023-01-11T21:38:06.2600562Z aten = torch.ops.aten 2023-01-11T21:38:06.2600695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2600793Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2600798Z 2023-01-11T21:38:06.2600873Z import triton 2023-01-11T21:38:06.2600969Z import triton.language as tl 2023-01-11T21:38:06.2601093Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2601233Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2601238Z 2023-01-11T21:38:06.2601243Z 2023-01-11T21:38:06.2601400Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2601478Z import triton 2023-01-11T21:38:06.2601569Z import triton.language as tl 2023-01-11T21:38:06.2601684Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2601789Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2601922Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2602050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2602060Z 2023-01-11T21:38:06.2602485Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2602562Z @triton.jit 2023-01-11T21:38:06.2602698Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2602769Z xnumel = 76800 2023-01-11T21:38:06.2602869Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2602999Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2603088Z xmask = xindex < xnumel 2023-01-11T21:38:06.2603168Z x0 = xindex % 100 2023-01-11T21:38:06.2603255Z x2 = (xindex // 12800) 2023-01-11T21:38:06.2603328Z x3 = xindex 2023-01-11T21:38:06.2603432Z tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask) 2023-01-11T21:38:06.2603537Z tmp1 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2603649Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2603791Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2603877Z ''') 2023-01-11T21:38:06.2603883Z 2023-01-11T21:38:06.2603887Z 2023-01-11T21:38:06.2603982Z async_compile.wait(globals()) 2023-01-11T21:38:06.2604061Z del async_compile 2023-01-11T21:38:06.2604066Z 2023-01-11T21:38:06.2604142Z def call(args): 2023-01-11T21:38:06.2604225Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2604303Z args.clear() 2023-01-11T21:38:06.2604397Z with torch.cuda.device(0): 2023-01-11T21:38:06.2604622Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2604731Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2604808Z del arg1_1 2023-01-11T21:38:06.2604883Z del arg2_1 2023-01-11T21:38:06.2604969Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.2605065Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2605207Z triton_fused_add_0.run(buf1, arg0_1, 76800, grid=grid(76800), stream=stream0) 2023-01-11T21:38:06.2605281Z del arg0_1 2023-01-11T21:38:06.2605359Z return (buf1, ) 2023-01-11T21:38:06.2605364Z 2023-01-11T21:38:06.2605368Z 2023-01-11T21:38:06.2605450Z if __name__ == "__main__": 2023-01-11T21:38:06.2605569Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2605700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2605954Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606172Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606419Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2606548Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2606553Z 2023-01-11T21:38:06.2606558Z 2023-01-11T21:38:06.2606662Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2606739Z import torch 2023-01-11T21:38:06.2606816Z import random 2023-01-11T21:38:06.2606930Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2607056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2607061Z 2023-01-11T21:38:06.2607144Z aten = torch.ops.aten 2023-01-11T21:38:06.2607281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2607377Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2607382Z 2023-01-11T21:38:06.2607458Z import triton 2023-01-11T21:38:06.2607553Z import triton.language as tl 2023-01-11T21:38:06.2607684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2607818Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2607823Z 2023-01-11T21:38:06.2607837Z 2023-01-11T21:38:06.2607986Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2608062Z import triton 2023-01-11T21:38:06.2608158Z import triton.language as tl 2023-01-11T21:38:06.2608274Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2608380Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2608515Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2608643Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2608649Z 2023-01-11T21:38:06.2609066Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2609145Z @triton.jit 2023-01-11T21:38:06.2609282Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2609359Z xnumel = 76800 2023-01-11T21:38:06.2609458Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2609615Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2609703Z xmask = xindex < xnumel 2023-01-11T21:38:06.2609785Z x0 = xindex % 100 2023-01-11T21:38:06.2609862Z x2 = (xindex // 12800) 2023-01-11T21:38:06.2609934Z x3 = xindex 2023-01-11T21:38:06.2610064Z tmp0 = tl.load(in_ptr0 + 
(x0 + (100*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.2610190Z tmp1 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.2610272Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2610412Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2610501Z ''') 2023-01-11T21:38:06.2610506Z 2023-01-11T21:38:06.2610511Z 2023-01-11T21:38:06.2610601Z async_compile.wait(globals()) 2023-01-11T21:38:06.2610681Z del async_compile 2023-01-11T21:38:06.2610687Z 2023-01-11T21:38:06.2610763Z def call(args): 2023-01-11T21:38:06.2610852Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2610932Z args.clear() 2023-01-11T21:38:06.2611029Z with torch.cuda.device(0): 2023-01-11T21:38:06.2611254Z buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2611355Z aten.bmm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2611430Z del arg1_1 2023-01-11T21:38:06.2611505Z del arg2_1 2023-01-11T21:38:06.2611596Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.2611690Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2611828Z triton_fused_add_0.run(buf1, arg0_1, 76800, grid=grid(76800), stream=stream0) 2023-01-11T21:38:06.2611903Z del arg0_1 2023-01-11T21:38:06.2611976Z return (buf1, ) 2023-01-11T21:38:06.2611989Z 2023-01-11T21:38:06.2612024Z 2023-01-11T21:38:06.2612100Z if __name__ == "__main__": 2023-01-11T21:38:06.2612221Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2612353Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2612567Z arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2612781Z arg1_1 = rand_strided((6, 128, 64), (8192, 64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2612995Z arg2_1 = rand_strided((6, 64, 100), (6400, 100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2613124Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2613129Z 2023-01-11T21:38:06.2613204Z ok (0.197s) 2023-01-11T21:38:06.2613658Z test_batch_norm_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2613793Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2614055Z [2023-01-11 21:34:07,851] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 432 2023-01-11T21:38:06.2614320Z [2023-01-11 21:34:08,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 432 2023-01-11T21:38:06.2614846Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
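The two generated modules above (float32, then float16) show test_baddbmm's lowering: baddbmm is split into an out-of-place bmm (aten.bmm.out into buf0) followed by the fused Triton broadcast-add triton_fused_add_0, which reuses buf0 in place (buf1 = buf0). A short sketch of the equivalence, with shapes taken from the generated __main__ blocks:

import torch

inp = torch.randn(6, 1, 100, device="cuda")
b1 = torch.randn(6, 128, 64, device="cuda")
b2 = torch.randn(6, 64, 100, device="cuda")
ref = torch.baddbmm(inp, b1, b2)
out = torch.bmm(b1, b2)   # the aten.bmm.out step
out += inp                # the triton_fused_add_0 step, broadcasting over dim 1
assert torch.allclose(out, ref, atol=1e-5)  # equal up to fp rounding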
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2614980Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2615240Z [2023-01-11 21:34:08,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 433 2023-01-11T21:38:06.2615246Z 2023-01-11T21:38:06.2615347Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2615418Z import torch 2023-01-11T21:38:06.2615490Z import random 2023-01-11T21:38:06.2615646Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2615776Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2615781Z 2023-01-11T21:38:06.2615863Z aten = torch.ops.aten 2023-01-11T21:38:06.2615998Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2616094Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2616099Z 2023-01-11T21:38:06.2616171Z import triton 2023-01-11T21:38:06.2616266Z import triton.language as tl 2023-01-11T21:38:06.2616391Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2616524Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2616532Z 2023-01-11T21:38:06.2616543Z 2023-01-11T21:38:06.2616722Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.2616795Z import triton 2023-01-11T21:38:06.2616885Z import triton.language as tl 2023-01-11T21:38:06.2616999Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2617104Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2617324Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2617453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2617459Z 2023-01-11T21:38:06.2617854Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2617928Z @triton.jit 2023-01-11T21:38:06.2618061Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2618176Z xnumel = 10 2023-01-11T21:38:06.2618271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2618401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2618484Z xmask = xindex < xnumel 2023-01-11T21:38:06.2618547Z x0 = xindex 2023-01-11T21:38:06.2618648Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2618782Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2618870Z ''') 2023-01-11T21:38:06.2618875Z 2023-01-11T21:38:06.2618879Z 2023-01-11T21:38:06.2619040Z triton_fused_le_relu_1 = async_compile.triton(''' 2023-01-11T21:38:06.2619116Z import triton 2023-01-11T21:38:06.2619208Z import triton.language as tl 2023-01-11T21:38:06.2619323Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2619416Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2619551Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2619676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2619685Z 2023-01-11T21:38:06.2620166Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*i1', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2620240Z @triton.jit 2023-01-11T21:38:06.2620420Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2620495Z xnumel = 1280 2023-01-11T21:38:06.2620594Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2620718Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2620800Z xmask = xindex < xnumel 2023-01-11T21:38:06.2620869Z x3 = xindex 2023-01-11T21:38:06.2620952Z x1 = (xindex // 64) % 10 2023-01-11T21:38:06.2621049Z tmp0 = tl.load(in_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2621148Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.2621244Z tmp3 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.2621335Z tmp11 = tl.load(in_ptr3 + (x1), xmask) 2023-01-11T21:38:06.2621431Z tmp13 = tl.load(in_ptr4 + (x1), xmask) 2023-01-11T21:38:06.2621542Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.2621670Z tmp4 = 1e-05 2023-01-11T21:38:06.2621750Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.2621830Z tmp6 = tl.sqrt(tmp5) 2023-01-11T21:38:06.2621906Z tmp7 = 1 / tmp6 2023-01-11T21:38:06.2621970Z tmp8 = 1 2023-01-11T21:38:06.2622046Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.2622124Z tmp10 = tmp2 * tmp9 2023-01-11T21:38:06.2622205Z tmp12 = tmp10 * tmp11 2023-01-11T21:38:06.2622284Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.2622404Z tmp15 = tl.where(0 != 0, 0, tl.where(0 > tmp14, 0, tmp14)) 2023-01-11T21:38:06.2622475Z tmp16 = 0 2023-01-11T21:38:06.2622549Z tmp17 = tmp15 <= tmp16 2023-01-11T21:38:06.2622687Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.2622822Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2622906Z ''') 2023-01-11T21:38:06.2622911Z 2023-01-11T21:38:06.2622916Z 2023-01-11T21:38:06.2623009Z async_compile.wait(globals()) 2023-01-11T21:38:06.2623087Z del async_compile 2023-01-11T21:38:06.2623093Z 2023-01-11T21:38:06.2623167Z def call(args): 2023-01-11T21:38:06.2623312Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:06.2623381Z args.clear() 2023-01-11T21:38:06.2623476Z with torch.cuda.device(0): 2023-01-11T21:38:06.2623676Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2623772Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2623933Z triton_fused_convert_element_type_0.run(primals_3, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2624011Z del primals_3 2023-01-11T21:38:06.2624243Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2624396Z triton_fused_convert_element_type_0.run(primals_4, buf1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2624476Z del primals_4 2023-01-11T21:38:06.2624694Z buf2 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2624902Z buf3 = empty_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2625085Z triton_fused_le_relu_1.run(primals_6, buf0, buf1, primals_1, primals_2, buf2, buf3, 1280, grid=grid(1280), stream=stream0) 2023-01-11T21:38:06.2625162Z del primals_1 2023-01-11T21:38:06.2625238Z del primals_2 2023-01-11T21:38:06.2625359Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:06.2625366Z 2023-01-11T21:38:06.2625370Z 2023-01-11T21:38:06.2625444Z if __name__ == "__main__": 2023-01-11T21:38:06.2625589Z 
from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2625737Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2625945Z primals_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626151Z primals_2 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626351Z primals_3 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626552Z primals_4 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2626740Z primals_5 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2626954Z primals_6 = rand_strided((2, 10, 8, 8), (640, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2627132Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:06.2627137Z 2023-01-11T21:38:06.2627400Z [2023-01-11 21:34:08,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 433 2023-01-11T21:38:06.2627409Z 2023-01-11T21:38:06.2627505Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2627579Z import torch 2023-01-11T21:38:06.2627655Z import random 2023-01-11T21:38:06.2627804Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2627930Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2627936Z 2023-01-11T21:38:06.2628011Z aten = torch.ops.aten 2023-01-11T21:38:06.2628147Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2628241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2628246Z 2023-01-11T21:38:06.2628319Z import triton 2023-01-11T21:38:06.2628414Z import triton.language as tl 2023-01-11T21:38:06.2628537Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2628678Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2628687Z 2023-01-11T21:38:06.2628691Z 2023-01-11T21:38:06.2628875Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.2628944Z import triton 2023-01-11T21:38:06.2629037Z import triton.language as tl 2023-01-11T21:38:06.2629148Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2629251Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2629385Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2629510Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2629515Z 2023-01-11T21:38:06.2629917Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2629993Z @triton.jit 2023-01-11T21:38:06.2630118Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2630192Z xnumel = 10 2023-01-11T21:38:06.2630318Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2630449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2630536Z xmask = xindex < xnumel 2023-01-11T21:38:06.2630606Z x0 = xindex 2023-01-11T21:38:06.2630703Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2630836Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2630924Z ''') 2023-01-11T21:38:06.2630929Z 2023-01-11T21:38:06.2630933Z 2023-01-11T21:38:06.2631094Z triton_fused_le_relu_1 = 
async_compile.triton(''' 2023-01-11T21:38:06.2631170Z import triton 2023-01-11T21:38:06.2631265Z import triton.language as tl 2023-01-11T21:38:06.2631382Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2631486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2631622Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2631743Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2631752Z 2023-01-11T21:38:06.2632228Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*i1', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2632310Z @triton.jit 2023-01-11T21:38:06.2632486Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2632563Z xnumel = 7680 2023-01-11T21:38:06.2632663Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2632793Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2632880Z xmask = xindex < xnumel 2023-01-11T21:38:06.2632946Z x3 = xindex 2023-01-11T21:38:06.2633031Z x1 = (xindex // 256) % 10 2023-01-11T21:38:06.2633130Z tmp0 = tl.load(in_ptr0 + (x3), xmask) 2023-01-11T21:38:06.2633228Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.2633327Z tmp3 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.2633429Z tmp11 = tl.load(in_ptr3 + (x1), xmask) 2023-01-11T21:38:06.2633527Z tmp13 = tl.load(in_ptr4 + (x1), xmask) 2023-01-11T21:38:06.2633634Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.2633737Z tmp4 = 1e-05 2023-01-11T21:38:06.2633855Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.2633939Z tmp6 = tl.sqrt(tmp5) 2023-01-11T21:38:06.2634018Z tmp7 = 1 / tmp6 2023-01-11T21:38:06.2634093Z tmp8 = 1 2023-01-11T21:38:06.2634174Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.2634248Z tmp10 = tmp2 * tmp9 2023-01-11T21:38:06.2634332Z tmp12 = tmp10 * tmp11 2023-01-11T21:38:06.2634414Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.2634539Z tmp15 = tl.where(0 != 0, 0, tl.where(0 > tmp14, 0, tmp14)) 2023-01-11T21:38:06.2634613Z tmp16 = 0 2023-01-11T21:38:06.2634696Z tmp17 = tmp15 <= tmp16 2023-01-11T21:38:06.2634832Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.2634965Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2635053Z ''') 2023-01-11T21:38:06.2635058Z 2023-01-11T21:38:06.2635062Z 2023-01-11T21:38:06.2635158Z async_compile.wait(globals()) 2023-01-11T21:38:06.2635253Z del async_compile 2023-01-11T21:38:06.2635260Z 2023-01-11T21:38:06.2635344Z def call(args): 2023-01-11T21:38:06.2635516Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6 = args 2023-01-11T21:38:06.2635592Z args.clear() 2023-01-11T21:38:06.2635680Z with torch.cuda.device(0): 2023-01-11T21:38:06.2635880Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2635976Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2636141Z triton_fused_convert_element_type_0.run(primals_3, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.2636221Z del primals_3 2023-01-11T21:38:06.2636417Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2636607Z triton_fused_convert_element_type_0.run(primals_4, buf1, 10, grid=grid(10), stream=stream0) 
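# Note: despite its name, triton_fused_convert_element_type_0 above is a pure
# elementwise copy (one load, one store, no arithmetic); presumably the dtype
# conversion it was fused from is a no-op here, leaving buf0/buf1 as private
# float32 copies of primals_3/primals_4.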
2023-01-11T21:38:06.2636690Z del primals_4 2023-01-11T21:38:06.2636907Z buf2 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2637132Z buf3 = empty_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2637318Z triton_fused_le_relu_1.run(primals_6, buf0, buf1, primals_1, primals_2, buf2, buf3, 7680, grid=grid(7680), stream=stream0) 2023-01-11T21:38:06.2637397Z del primals_1 2023-01-11T21:38:06.2637476Z del primals_2 2023-01-11T21:38:06.2637600Z return (buf0, buf1, buf2, primals_6, buf0, buf1, buf3, ) 2023-01-11T21:38:06.2637607Z 2023-01-11T21:38:06.2637611Z 2023-01-11T21:38:06.2637692Z if __name__ == "__main__": 2023-01-11T21:38:06.2637812Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2637940Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2638146Z primals_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638351Z primals_2 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638557Z primals_3 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638759Z primals_4 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2638953Z primals_5 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2639184Z primals_6 = rand_strided((3, 10, 16, 16), (2560, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2639366Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6])) 2023-01-11T21:38:06.2639371Z 2023-01-11T21:38:06.2639444Z ok (0.991s) 2023-01-11T21:38:06.2639927Z test_bernoulli1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2640067Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2640330Z [2023-01-11 21:34:08,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 434 2023-01-11T21:38:06.2640593Z [2023-01-11 21:34:08,630] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 434 2023-01-11T21:38:06.2641009Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
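A sketch of the math fused into triton_fused_le_relu_1 above (shapes assumed from the second generated __main__ block): a batch-norm style normalisation with eps=1e-05 and affine scale/shift, followed by relu, plus the boolean mask (out <= 0) returned as buf3 for the backward graph:

import torch

x = torch.randn(3, 10, 16, 16, device="cuda")   # primals_6 (in_ptr0)
mean = torch.randn(10, device="cuda")           # buf0 (in_ptr1), copied from primals_3
var = torch.rand(10, device="cuda")             # buf1 (in_ptr2), copied from primals_4
weight = torch.randn(10, device="cuda")         # primals_1 (in_ptr3)
bias = torch.randn(10, device="cuda")           # primals_2 (in_ptr4)
c = lambda t: t[None, :, None, None]            # broadcast per channel
inv_std = 1.0 / torch.sqrt(var + 1e-05)
out = torch.relu((x - c(mean)) * c(inv_std) * c(weight) + c(bias))  # buf2
mask = out <= 0                                 # buf3 (torch.bool)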
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2641147Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2641404Z [2023-01-11 21:34:08,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 435 2023-01-11T21:38:06.2641670Z [2023-01-11 21:34:08,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 435 2023-01-11T21:38:06.2641676Z 2023-01-11T21:38:06.2641774Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2641852Z import torch 2023-01-11T21:38:06.2641922Z import random 2023-01-11T21:38:06.2642044Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2642170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2642175Z 2023-01-11T21:38:06.2642262Z aten = torch.ops.aten 2023-01-11T21:38:06.2642403Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2642529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2642534Z 2023-01-11T21:38:06.2642608Z import triton 2023-01-11T21:38:06.2642702Z import triton.language as tl 2023-01-11T21:38:06.2642824Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2642967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2642972Z 2023-01-11T21:38:06.2642977Z 2023-01-11T21:38:06.2643147Z triton_fused_empty_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.2643225Z import triton 2023-01-11T21:38:06.2643319Z import triton.language as tl 2023-01-11T21:38:06.2643435Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2643538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2643672Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2643794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2643799Z 2023-01-11T21:38:06.2644188Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.2644267Z @triton.jit 2023-01-11T21:38:06.2644391Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2644469Z xnumel = 100 2023-01-11T21:38:06.2644568Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2644698Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2644781Z xmask = xindex < xnumel 2023-01-11T21:38:06.2644848Z x0 = xindex 2023-01-11T21:38:06.2644921Z tmp0 = 0 2023-01-11T21:38:06.2645058Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2645145Z ''') 2023-01-11T21:38:06.2645151Z 2023-01-11T21:38:06.2645155Z 2023-01-11T21:38:06.2645250Z async_compile.wait(globals()) 2023-01-11T21:38:06.2645329Z del async_compile 2023-01-11T21:38:06.2645334Z 2023-01-11T21:38:06.2645412Z def call(args): 2023-01-11T21:38:06.2645481Z arg0_1, = args 2023-01-11T21:38:06.2645560Z args.clear() 2023-01-11T21:38:06.2645654Z with torch.cuda.device(0): 2023-01-11T21:38:06.2645858Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2645954Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2646122Z triton_fused_empty_like_0.run(buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.2646212Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:06.2646291Z return (buf0, buf0, ) 2023-01-11T21:38:06.2646296Z 
2023-01-11T21:38:06.2646307Z 2023-01-11T21:38:06.2646382Z if __name__ == "__main__": 2023-01-11T21:38:06.2646503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2646632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2646838Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2646954Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2646962Z 2023-01-11T21:38:06.2646967Z 2023-01-11T21:38:06.2647066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2647140Z import torch 2023-01-11T21:38:06.2647210Z import random 2023-01-11T21:38:06.2647330Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2647458Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2647463Z 2023-01-11T21:38:06.2647546Z aten = torch.ops.aten 2023-01-11T21:38:06.2647683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2647785Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2647790Z 2023-01-11T21:38:06.2647865Z import triton 2023-01-11T21:38:06.2647963Z import triton.language as tl 2023-01-11T21:38:06.2648083Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2648225Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2648230Z 2023-01-11T21:38:06.2648235Z 2023-01-11T21:38:06.2648430Z triton_fused_empty_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.2648506Z import triton 2023-01-11T21:38:06.2648600Z import triton.language as tl 2023-01-11T21:38:06.2648717Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2648825Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2648956Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2649084Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2649090Z 2023-01-11T21:38:06.2649481Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.2649556Z @triton.jit 2023-01-11T21:38:06.2649680Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2649756Z xnumel = 100 2023-01-11T21:38:06.2649853Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2649988Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2650066Z xmask = xindex < xnumel 2023-01-11T21:38:06.2650140Z x0 = xindex 2023-01-11T21:38:06.2650212Z tmp0 = 0 2023-01-11T21:38:06.2650348Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2650438Z ''') 2023-01-11T21:38:06.2650446Z 2023-01-11T21:38:06.2650451Z 2023-01-11T21:38:06.2650549Z async_compile.wait(globals()) 2023-01-11T21:38:06.2650628Z del async_compile 2023-01-11T21:38:06.2650633Z 2023-01-11T21:38:06.2650710Z def call(args): 2023-01-11T21:38:06.2650779Z arg0_1, = args 2023-01-11T21:38:06.2650858Z args.clear() 2023-01-11T21:38:06.2650953Z with torch.cuda.device(0): 2023-01-11T21:38:06.2651157Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2651252Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2651393Z triton_fused_empty_like_0.run(buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.2651488Z aten.bernoulli_(buf0, ) 2023-01-11T21:38:06.2651567Z return (buf0, buf0, ) 2023-01-11T21:38:06.2651572Z 2023-01-11T21:38:06.2651584Z 
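# Note on both bernoulli1 variants above: inductor generates only the
# zero-fill of the fresh buffer (triton_fused_empty_like_0) and then falls
# back to the eager aten.bernoulli_ kernel -- no Triton RNG is emitted for
# this graph. A minimal sketch of the same two-step pattern (the log elides
# the probability; Tensor.bernoulli_ defaults to p=0.5):
#
#     import torch
#     buf = torch.zeros(100, device="cuda", dtype=torch.float16)  # Triton zero-fill
#     buf.bernoulli_()                                            # eager fallback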
2023-01-11T21:38:06.2651659Z if __name__ == "__main__": 2023-01-11T21:38:06.2651780Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2651937Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2652140Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2652252Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2652257Z 2023-01-11T21:38:06.2652330Z ok (0.200s) 2023-01-11T21:38:06.2652793Z test_bernoulli2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2652929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2653181Z [2023-01-11 21:34:08,754] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 436 2023-01-11T21:38:06.2653440Z [2023-01-11 21:34:08,754] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.2653705Z [2023-01-11 21:34:08,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 436 2023-01-11T21:38:06.2654122Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2654258Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2654650Z [2023-01-11 21:34:08,967] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 437 2023-01-11T21:38:06.2654903Z [2023-01-11 21:34:08,968] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.2655168Z [2023-01-11 21:34:09,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 437 2023-01-11T21:38:06.2655175Z 2023-01-11T21:38:06.2655294Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2655370Z import torch 2023-01-11T21:38:06.2655454Z import random 2023-01-11T21:38:06.2655579Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2655702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2655707Z 2023-01-11T21:38:06.2655792Z aten = torch.ops.aten 2023-01-11T21:38:06.2655929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2656028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2656033Z 2023-01-11T21:38:06.2656107Z import triton 2023-01-11T21:38:06.2656200Z import triton.language as tl 2023-01-11T21:38:06.2656317Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2656454Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2656621Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.2656627Z 2023-01-11T21:38:06.2656631Z 2023-01-11T21:38:06.2656782Z triton_fused_lt_0 = async_compile.triton(''' 2023-01-11T21:38:06.2656856Z import triton 2023-01-11T21:38:06.2656946Z import triton.language as tl 
2023-01-11T21:38:06.2657060Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2657217Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2657353Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2657479Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2657487Z 2023-01-11T21:38:06.2657893Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2657967Z @triton.jit 2023-01-11T21:38:06.2658155Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2658230Z xnumel = 8 2023-01-11T21:38:06.2658327Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2658449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2658533Z xmask = xindex < xnumel 2023-01-11T21:38:06.2658606Z x0 = xindex 2023-01-11T21:38:06.2658739Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.2658837Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2658908Z tmp1 = 65535 2023-01-11T21:38:06.2658987Z tmp2 = tmp0 ^ tmp1 2023-01-11T21:38:06.2659054Z tmp3 = x0 2023-01-11T21:38:06.2659142Z tmp4 = tl.rand(tmp2, tmp3) 2023-01-11T21:38:06.2659221Z tmp6 = tmp4 < tmp5 2023-01-11T21:38:06.2659359Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2659445Z ''') 2023-01-11T21:38:06.2659451Z 2023-01-11T21:38:06.2659455Z 2023-01-11T21:38:06.2659552Z async_compile.wait(globals()) 2023-01-11T21:38:06.2659629Z del async_compile 2023-01-11T21:38:06.2659635Z 2023-01-11T21:38:06.2659708Z def call(args): 2023-01-11T21:38:06.2659775Z arg0_1, = args 2023-01-11T21:38:06.2659850Z args.clear() 2023-01-11T21:38:06.2659984Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.2660078Z with torch.cuda.device(0): 2023-01-11T21:38:06.2660270Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2660363Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2660509Z triton_fused_lt_0.run(seed_cuda_0, arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2660612Z del arg0_1 2023-01-11T21:38:06.2660694Z return (buf0, ) 2023-01-11T21:38:06.2660699Z 2023-01-11T21:38:06.2660704Z 2023-01-11T21:38:06.2660784Z if __name__ == "__main__": 2023-01-11T21:38:06.2660902Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2661034Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2661236Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2661435Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2661546Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2661551Z 2023-01-11T21:38:06.2661556Z 2023-01-11T21:38:06.2661646Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2661719Z import torch 2023-01-11T21:38:06.2661792Z import random 2023-01-11T21:38:06.2661909Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2662031Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2662040Z 2023-01-11T21:38:06.2662122Z aten = torch.ops.aten 2023-01-11T21:38:06.2662256Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2662348Z async_compile = AsyncCompile() 
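# (Second compile of the same test: graph 437 is the float16 variant of
# graph 436 above -- the kernel below differs from the float32 one only in
# its '*fp16' signature and the .to(tl.float32) upcast on load.)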
2023-01-11T21:38:06.2662354Z 2023-01-11T21:38:06.2662421Z import triton 2023-01-11T21:38:06.2662518Z import triton.language as tl 2023-01-11T21:38:06.2662643Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2662781Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2662944Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.2662950Z 2023-01-11T21:38:06.2662955Z 2023-01-11T21:38:06.2663107Z triton_fused_lt_0 = async_compile.triton(''' 2023-01-11T21:38:06.2663183Z import triton 2023-01-11T21:38:06.2663276Z import triton.language as tl 2023-01-11T21:38:06.2663381Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2663481Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2663615Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2663738Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2663743Z 2023-01-11T21:38:06.2664183Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2664259Z @triton.jit 2023-01-11T21:38:06.2664398Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2664472Z xnumel = 8 2023-01-11T21:38:06.2664563Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2664690Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2664774Z xmask = xindex < xnumel 2023-01-11T21:38:06.2664844Z x0 = xindex 2023-01-11T21:38:06.2664972Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.2665093Z tmp5 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.2665167Z tmp1 = 65535 2023-01-11T21:38:06.2665239Z tmp2 = tmp0 ^ tmp1 2023-01-11T21:38:06.2665310Z tmp3 = x0 2023-01-11T21:38:06.2665401Z tmp4 = tl.rand(tmp2, tmp3) 2023-01-11T21:38:06.2665482Z tmp6 = tmp4 < tmp5 2023-01-11T21:38:06.2665618Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2665707Z ''') 2023-01-11T21:38:06.2665713Z 2023-01-11T21:38:06.2665717Z 2023-01-11T21:38:06.2665809Z async_compile.wait(globals()) 2023-01-11T21:38:06.2665879Z del async_compile 2023-01-11T21:38:06.2665884Z 2023-01-11T21:38:06.2665957Z def call(args): 2023-01-11T21:38:06.2666030Z arg0_1, = args 2023-01-11T21:38:06.2666105Z args.clear() 2023-01-11T21:38:06.2666238Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.2666330Z with torch.cuda.device(0): 2023-01-11T21:38:06.2666566Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2666654Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2666799Z triton_fused_lt_0.run(seed_cuda_0, arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.2666875Z del arg0_1 2023-01-11T21:38:06.2666957Z return (buf0, ) 2023-01-11T21:38:06.2666962Z 2023-01-11T21:38:06.2666967Z 2023-01-11T21:38:06.2667049Z if __name__ == "__main__": 2023-01-11T21:38:06.2667166Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2667293Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2667492Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.2667683Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 
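# Note on triton_fused_lt_0 (both variants): this is inductor's Triton RNG
# path, hence the "[WARNING] using triton random, expect difference from
# eager" lines above. The host draws a fresh seed into seed_cuda_0 via
# torch.randint, and each element i computes tl.rand(seed ^ 65535, i) < p[i],
# i.e. a per-element bernoulli draw, but from a different RNG stream than
# eager torch.bernoulli, so results are not bit-identical to eager mode.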
2023-01-11T21:38:06.2667796Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2667801Z 2023-01-11T21:38:06.2667875Z ok (0.332s) 2023-01-11T21:38:06.2668340Z test_bitwise2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2668474Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2668735Z [2023-01-11 21:34:09,084] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 438 2023-01-11T21:38:06.2669000Z [2023-01-11 21:34:09,274] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 438 2023-01-11T21:38:06.2669419Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2669554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2669883Z [2023-01-11 21:34:09,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 439 2023-01-11T21:38:06.2670153Z [2023-01-11 21:34:09,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 439 2023-01-11T21:38:06.2670158Z 2023-01-11T21:38:06.2670253Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2670329Z import torch 2023-01-11T21:38:06.2670406Z import random 2023-01-11T21:38:06.2670529Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2670655Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2670660Z 2023-01-11T21:38:06.2670745Z aten = torch.ops.aten 2023-01-11T21:38:06.2670886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2670980Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2670991Z 2023-01-11T21:38:06.2671061Z import triton 2023-01-11T21:38:06.2671154Z import triton.language as tl 2023-01-11T21:38:06.2671282Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2671431Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2671437Z 2023-01-11T21:38:06.2671442Z 2023-01-11T21:38:06.2671668Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2671747Z import triton 2023-01-11T21:38:06.2671843Z import triton.language as tl 2023-01-11T21:38:06.2671954Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2672059Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2672193Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2672321Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2672357Z 2023-01-11T21:38:06.2672806Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 
2023-01-11T21:38:06.2672886Z @triton.jit 2023-01-11T21:38:06.2673057Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2673136Z xnumel = 40 2023-01-11T21:38:06.2673237Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2673363Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2673447Z xmask = xindex < xnumel 2023-01-11T21:38:06.2673521Z x0 = xindex 2023-01-11T21:38:06.2673716Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2673911Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2674013Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2674114Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2674186Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.2674267Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2674347Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2674426Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2674569Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2674704Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2674838Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2674965Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2675053Z ''') 2023-01-11T21:38:06.2675058Z 2023-01-11T21:38:06.2675063Z 2023-01-11T21:38:06.2675160Z async_compile.wait(globals()) 2023-01-11T21:38:06.2675239Z del async_compile 2023-01-11T21:38:06.2675244Z 2023-01-11T21:38:06.2675322Z def call(args): 2023-01-11T21:38:06.2675409Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2675488Z args.clear() 2023-01-11T21:38:06.2675584Z with torch.cuda.device(0): 2023-01-11T21:38:06.2675777Z buf0 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676006Z buf1 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676208Z buf2 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676404Z buf3 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2676500Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2676702Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.2676778Z del arg0_1 2023-01-11T21:38:06.2676846Z del arg1_1 2023-01-11T21:38:06.2676944Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2676953Z 2023-01-11T21:38:06.2676958Z 2023-01-11T21:38:06.2677041Z if __name__ == "__main__": 2023-01-11T21:38:06.2677161Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2677291Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2677493Z arg0_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2677691Z arg1_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2677811Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2677817Z 2023-01-11T21:38:06.2677821Z 2023-01-11T21:38:06.2677920Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2677990Z import torch 2023-01-11T21:38:06.2678066Z import random 2023-01-11T21:38:06.2678186Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2678310Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2678315Z 
2023-01-11T21:38:06.2678398Z aten = torch.ops.aten 2023-01-11T21:38:06.2678564Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2678659Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2678665Z 2023-01-11T21:38:06.2678734Z import triton 2023-01-11T21:38:06.2678829Z import triton.language as tl 2023-01-11T21:38:06.2678957Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2679101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2679107Z 2023-01-11T21:38:06.2679111Z 2023-01-11T21:38:06.2679338Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2679415Z import triton 2023-01-11T21:38:06.2679509Z import triton.language as tl 2023-01-11T21:38:06.2679625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2679722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2679857Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2679985Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2679993Z 2023-01-11T21:38:06.2680433Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.2680514Z @triton.jit 2023-01-11T21:38:06.2680688Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2680765Z xnumel = 40 2023-01-11T21:38:06.2680864Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2680990Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2681076Z xmask = xindex < xnumel 2023-01-11T21:38:06.2681148Z x0 = xindex 2023-01-11T21:38:06.2681346Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2681538Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2681639Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2681738Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2681812Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.2681895Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2681971Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2682079Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2682215Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2682351Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2682483Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2682615Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2682695Z ''') 2023-01-11T21:38:06.2682701Z 2023-01-11T21:38:06.2682705Z 2023-01-11T21:38:06.2682799Z async_compile.wait(globals()) 2023-01-11T21:38:06.2682879Z del async_compile 2023-01-11T21:38:06.2682884Z 2023-01-11T21:38:06.2682963Z def call(args): 2023-01-11T21:38:06.2683045Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2683122Z args.clear() 2023-01-11T21:38:06.2683216Z with torch.cuda.device(0): 2023-01-11T21:38:06.2683407Z buf0 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2683605Z buf1 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2683800Z buf2 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 
2023-01-11T21:38:06.2683992Z buf3 = empty_strided((2, 20), (20, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2684087Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2684288Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.2684364Z del arg0_1 2023-01-11T21:38:06.2684438Z del arg1_1 2023-01-11T21:38:06.2684529Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2684562Z 2023-01-11T21:38:06.2684574Z 2023-01-11T21:38:06.2684650Z if __name__ == "__main__": 2023-01-11T21:38:06.2684771Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2684899Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2685100Z arg0_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2685295Z arg1_1 = rand_strided((2, 20), (20, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2685436Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2685443Z 2023-01-11T21:38:06.2685520Z ok (0.245s) 2023-01-11T21:38:06.2686001Z test_bitwise_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2686132Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2686393Z [2023-01-11 21:34:09,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 440 2023-01-11T21:38:06.2686663Z [2023-01-11 21:34:09,410] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 440 2023-01-11T21:38:06.2687080Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
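The two identical bool modules above (graphs 438 and 439) fuse all four bitwise ops of test_bitwise2 into a single kernel; note that on torch.bool, bitwise_not is lowered to a comparison with zero ("tmp1 = tmp0 == 0") rather than an integer complement. A sketch with the logged shapes:

import torch

a = torch.randint(0, 2, (2, 20), device="cuda", dtype=torch.bool)
b = torch.randint(0, 2, (2, 20), device="cuda", dtype=torch.bool)
not_a, a_or_b, a_xor_b, a_and_b = ~a, a | b, a ^ b, a & b  # buf0..buf3, one launch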
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2687214Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2687474Z [2023-01-11 21:34:09,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 441 2023-01-11T21:38:06.2687744Z [2023-01-11 21:34:09,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 441 2023-01-11T21:38:06.2687750Z 2023-01-11T21:38:06.2687850Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2687927Z import torch 2023-01-11T21:38:06.2688034Z import random 2023-01-11T21:38:06.2688150Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2688276Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2688281Z 2023-01-11T21:38:06.2688366Z aten = torch.ops.aten 2023-01-11T21:38:06.2688505Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2688601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2688607Z 2023-01-11T21:38:06.2688687Z import triton 2023-01-11T21:38:06.2688782Z import triton.language as tl 2023-01-11T21:38:06.2688902Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2689046Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2689051Z 2023-01-11T21:38:06.2689055Z 2023-01-11T21:38:06.2689281Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2689358Z import triton 2023-01-11T21:38:06.2689452Z import triton.language as tl 2023-01-11T21:38:06.2689571Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2689675Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2689809Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2689930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2689941Z 2023-01-11T21:38:06.2690386Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: '*i32', 4: '*i32', 5: '*i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2690489Z @triton.jit 2023-01-11T21:38:06.2690659Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2690736Z xnumel = 64 2023-01-11T21:38:06.2690839Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2690971Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2691058Z xmask = xindex < xnumel 2023-01-11T21:38:06.2691125Z x0 = xindex 2023-01-11T21:38:06.2691318Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2691510Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2691608Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2691705Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2691782Z tmp1 = ~tmp0 2023-01-11T21:38:06.2691863Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2691937Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2692016Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2692158Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2692294Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2692425Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2692561Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2692650Z ''') 2023-01-11T21:38:06.2692656Z 2023-01-11T21:38:06.2692660Z 2023-01-11T21:38:06.2692756Z async_compile.wait(globals()) 2023-01-11T21:38:06.2692829Z del async_compile 2023-01-11T21:38:06.2692840Z 2023-01-11T21:38:06.2692910Z def call(args): 2023-01-11T21:38:06.2692991Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2693068Z args.clear() 2023-01-11T21:38:06.2693162Z with torch.cuda.device(0): 2023-01-11T21:38:06.2693362Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693558Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693750Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2693935Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2694031Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2694261Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2694339Z del arg0_1 2023-01-11T21:38:06.2694415Z del arg1_1 2023-01-11T21:38:06.2694618Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2694624Z 2023-01-11T21:38:06.2694628Z 2023-01-11T21:38:06.2694710Z if __name__ == "__main__": 2023-01-11T21:38:06.2694828Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2694949Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2695148Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2695346Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2695482Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2695489Z 2023-01-11T21:38:06.2695495Z 2023-01-11T21:38:06.2695601Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2695696Z import torch 2023-01-11T21:38:06.2695777Z import random 2023-01-11T21:38:06.2695890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2696015Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2696020Z 2023-01-11T21:38:06.2696101Z aten = torch.ops.aten 2023-01-11T21:38:06.2696235Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2696329Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2696334Z 2023-01-11T21:38:06.2696405Z import triton 2023-01-11T21:38:06.2696497Z import triton.language as tl 2023-01-11T21:38:06.2696622Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2696801Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2696806Z 2023-01-11T21:38:06.2696818Z 2023-01-11T21:38:06.2697036Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0 = async_compile.triton(''' 2023-01-11T21:38:06.2697112Z import triton 2023-01-11T21:38:06.2697268Z import triton.language as tl 2023-01-11T21:38:06.2697386Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2697488Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2697621Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2697747Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2697753Z 2023-01-11T21:38:06.2698199Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: '*i32', 4: '*i32', 5: 
'*i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.2698275Z @triton.jit 2023-01-11T21:38:06.2698445Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2698521Z xnumel = 64 2023-01-11T21:38:06.2698621Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2698754Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2698839Z xmask = xindex < xnumel 2023-01-11T21:38:06.2698910Z x0 = xindex 2023-01-11T21:38:06.2699094Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2699284Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2699380Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2699475Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2699549Z tmp1 = ~tmp0 2023-01-11T21:38:06.2699628Z tmp3 = tmp0 | tmp2 2023-01-11T21:38:06.2699709Z tmp4 = tmp0 ^ tmp2 2023-01-11T21:38:06.2699780Z tmp7 = tmp5 & tmp6 2023-01-11T21:38:06.2699921Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2700056Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2700187Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2700354Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2700442Z ''') 2023-01-11T21:38:06.2700448Z 2023-01-11T21:38:06.2700452Z 2023-01-11T21:38:06.2700545Z async_compile.wait(globals()) 2023-01-11T21:38:06.2700621Z del async_compile 2023-01-11T21:38:06.2700626Z 2023-01-11T21:38:06.2700694Z def call(args): 2023-01-11T21:38:06.2700777Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2700852Z args.clear() 2023-01-11T21:38:06.2700946Z with torch.cuda.device(0): 2023-01-11T21:38:06.2701141Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701334Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701527Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701709Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.2701800Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2701997Z triton_fused_bitwise_and_bitwise_not_bitwise_or_bitwise_xor_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.2702070Z del arg0_1 2023-01-11T21:38:06.2702145Z del arg1_1 2023-01-11T21:38:06.2702238Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.2702244Z 2023-01-11T21:38:06.2702248Z 2023-01-11T21:38:06.2702328Z if __name__ == "__main__": 2023-01-11T21:38:06.2702445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2702566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2702759Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2702982Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.2703100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2703106Z 2023-01-11T21:38:06.2703175Z ok (0.135s) 2023-01-11T21:38:06.2703629Z test_bmm1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2703761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2704021Z [2023-01-11 21:34:09,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 442 2023-01-11T21:38:06.2704283Z [2023-01-11 21:34:09,554] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 442 2023-01-11T21:38:06.2704708Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2704840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2705087Z [2023-01-11 21:34:09,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 443 2023-01-11T21:38:06.2705092Z 2023-01-11T21:38:06.2705190Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2705265Z import torch 2023-01-11T21:38:06.2705339Z import random 2023-01-11T21:38:06.2705461Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2705587Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2705594Z 2023-01-11T21:38:06.2705691Z aten = torch.ops.aten 2023-01-11T21:38:06.2705838Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2705952Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2705957Z 2023-01-11T21:38:06.2706030Z import triton 2023-01-11T21:38:06.2706148Z import triton.language as tl 2023-01-11T21:38:06.2706276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2706417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2706422Z 2023-01-11T21:38:06.2706426Z 2023-01-11T21:38:06.2706582Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2706657Z import triton 2023-01-11T21:38:06.2706742Z import triton.language as tl 2023-01-11T21:38:06.2706857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2706958Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2707092Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2707224Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2707229Z 2023-01-11T21:38:06.2707636Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2707710Z @triton.jit 2023-01-11T21:38:06.2707845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2707913Z xnumel = 128 2023-01-11T21:38:06.2708009Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2708136Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2708219Z xmask = xindex < xnumel 2023-01-11T21:38:06.2708292Z x0 = xindex 2023-01-11T21:38:06.2708390Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2708461Z tmp1 = 1 2023-01-11T21:38:06.2708534Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.2708709Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2708799Z ''') 2023-01-11T21:38:06.2708805Z 2023-01-11T21:38:06.2708809Z 2023-01-11T21:38:06.2708969Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2709045Z import triton 2023-01-11T21:38:06.2709143Z import triton.language as tl 2023-01-11T21:38:06.2709262Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2709367Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2709496Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2709627Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2709633Z 2023-01-11T21:38:06.2710041Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2710119Z @triton.jit 2023-01-11T21:38:06.2710255Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2710332Z xnumel = 128 2023-01-11T21:38:06.2710430Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2710560Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2710638Z xmask = xindex < xnumel 2023-01-11T21:38:06.2710713Z x0 = xindex 2023-01-11T21:38:06.2710812Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2710887Z tmp1 = 2 2023-01-11T21:38:06.2710969Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2711102Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2711191Z ''') 2023-01-11T21:38:06.2711197Z 2023-01-11T21:38:06.2711201Z 2023-01-11T21:38:06.2711353Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2711429Z import triton 2023-01-11T21:38:06.2711524Z import triton.language as tl 2023-01-11T21:38:06.2711640Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2711747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2711883Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2712009Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2712015Z 2023-01-11T21:38:06.2712444Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2712515Z @triton.jit 2023-01-11T21:38:06.2712641Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2712717Z xnumel = 128 2023-01-11T21:38:06.2712814Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2712945Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2713030Z xmask = xindex < xnumel 2023-01-11T21:38:06.2713103Z x0 = xindex 2023-01-11T21:38:06.2713202Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2713279Z tmp1 = 3 2023-01-11T21:38:06.2713358Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2713499Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2713585Z ''') 2023-01-11T21:38:06.2713591Z 2023-01-11T21:38:06.2713595Z 2023-01-11T21:38:06.2713692Z async_compile.wait(globals()) 2023-01-11T21:38:06.2713772Z del async_compile 2023-01-11T21:38:06.2713777Z 2023-01-11T21:38:06.2713848Z def call(args): 2023-01-11T21:38:06.2713931Z arg0_1, arg1_1 = args 
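# [editor annotation] Generated wrapper for the bmm test (graph 443): `call` allocates outputs
# with empty_strided, dispatches aten.bmm.out for the matmuls plus the three fused add kernels
# above on stream0, `del`s each input as soon as it is dead, and reuses buf3's storage for buf4.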
2023-01-11T21:38:06.2714006Z args.clear() 2023-01-11T21:38:06.2714101Z with torch.cuda.device(0): 2023-01-11T21:38:06.2714312Z buf0 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2714422Z aten.bmm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.2714624Z buf1 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2714718Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2714882Z triton_fused_add_0.run(arg0_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2714956Z del arg0_1 2023-01-11T21:38:06.2715160Z buf2 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2715330Z triton_fused_add_1_1.run(arg1_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2715416Z del arg1_1 2023-01-11T21:38:06.2715633Z buf3 = empty_strided((2, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2715736Z aten.bmm.out(buf1, buf2, out=buf3) 2023-01-11T21:38:06.2715804Z del buf1 2023-01-11T21:38:06.2715876Z del buf2 2023-01-11T21:38:06.2715968Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2716103Z triton_fused_add_2_2.run(buf4, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2716189Z return (buf0, buf4, ) 2023-01-11T21:38:06.2716194Z 2023-01-11T21:38:06.2716199Z 2023-01-11T21:38:06.2716284Z if __name__ == "__main__": 2023-01-11T21:38:06.2716404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2716532Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2716737Z arg0_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2716949Z arg1_1 = rand_strided((2, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2717071Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2717077Z 2023-01-11T21:38:06.2717346Z [2023-01-11 21:34:09,750] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 443 2023-01-11T21:38:06.2717352Z 2023-01-11T21:38:06.2717450Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2717526Z import torch 2023-01-11T21:38:06.2717602Z import random 2023-01-11T21:38:06.2717722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2717841Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2717849Z 2023-01-11T21:38:06.2717932Z aten = torch.ops.aten 2023-01-11T21:38:06.2718068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2718165Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2718171Z 2023-01-11T21:38:06.2718249Z import triton 2023-01-11T21:38:06.2718343Z import triton.language as tl 2023-01-11T21:38:06.2718500Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2718635Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2718647Z 2023-01-11T21:38:06.2718651Z 2023-01-11T21:38:06.2718803Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.2718880Z import triton 2023-01-11T21:38:06.2718973Z import triton.language as tl 2023-01-11T21:38:06.2719089Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2719193Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2719327Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2719456Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2719461Z 2023-01-11T21:38:06.2719865Z 
@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2719936Z @triton.jit 2023-01-11T21:38:06.2720071Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2720148Z xnumel = 128 2023-01-11T21:38:06.2720247Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2720378Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2720464Z xmask = xindex < xnumel 2023-01-11T21:38:06.2720537Z x0 = xindex 2023-01-11T21:38:06.2720629Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2720701Z tmp1 = 1 2023-01-11T21:38:06.2720783Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2720919Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2721034Z ''') 2023-01-11T21:38:06.2721040Z 2023-01-11T21:38:06.2721044Z 2023-01-11T21:38:06.2721205Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2721284Z import triton 2023-01-11T21:38:06.2721372Z import triton.language as tl 2023-01-11T21:38:06.2721490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2721593Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2721726Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2721854Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2721859Z 2023-01-11T21:38:06.2722262Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2722340Z @triton.jit 2023-01-11T21:38:06.2722474Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2722547Z xnumel = 80 2023-01-11T21:38:06.2722647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2722777Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2722864Z xmask = xindex < xnumel 2023-01-11T21:38:06.2722937Z x0 = xindex 2023-01-11T21:38:06.2723043Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2723116Z tmp1 = 2 2023-01-11T21:38:06.2723191Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2723331Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2723418Z ''') 2023-01-11T21:38:06.2723424Z 2023-01-11T21:38:06.2723428Z 2023-01-11T21:38:06.2723586Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.2723662Z import triton 2023-01-11T21:38:06.2723757Z import triton.language as tl 2023-01-11T21:38:06.2723871Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2723975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2724104Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2724230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2724235Z 2023-01-11T21:38:06.2724669Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.2724747Z @triton.jit 2023-01-11T21:38:06.2724876Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2724953Z xnumel = 160 2023-01-11T21:38:06.2725051Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2725181Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2725259Z xmask = xindex < xnumel 2023-01-11T21:38:06.2725332Z x0 = xindex 2023-01-11T21:38:06.2725437Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2725509Z tmp1 = 3 2023-01-11T21:38:06.2725591Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2725728Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2725809Z ''') 2023-01-11T21:38:06.2725821Z 2023-01-11T21:38:06.2725825Z 2023-01-11T21:38:06.2725914Z async_compile.wait(globals()) 2023-01-11T21:38:06.2725993Z del async_compile 2023-01-11T21:38:06.2726000Z 2023-01-11T21:38:06.2726077Z def call(args): 2023-01-11T21:38:06.2726159Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2726235Z args.clear() 2023-01-11T21:38:06.2726329Z with torch.cuda.device(0): 2023-01-11T21:38:06.2726545Z buf0 = empty_strided((1, 16, 10), (160, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2726712Z aten.mm.out(as_strided(arg0_1, (16, 8), (8, 1)), as_strided(arg1_1, (8, 10), (10, 1)), out=as_strided(buf0, (16, 10), (10, 1))) 2023-01-11T21:38:06.2726926Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2727051Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2727193Z triton_fused_add_0.run(arg0_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2727269Z del arg0_1 2023-01-11T21:38:06.2727476Z buf2 = empty_strided((1, 8, 10), (80, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2727616Z triton_fused_add_1_1.run(arg1_1, buf2, 80, grid=grid(80), stream=stream0) 2023-01-11T21:38:06.2727692Z del arg1_1 2023-01-11T21:38:06.2727895Z buf3 = empty_strided((1, 16, 10), (160, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2728065Z aten.mm.out(as_strided(buf1, (16, 8), (8, 1)), as_strided(buf2, (8, 10), (10, 1)), out=as_strided(buf3, (16, 10), (10, 1))) 2023-01-11T21:38:06.2728140Z del buf1 2023-01-11T21:38:06.2728211Z del buf2 2023-01-11T21:38:06.2728305Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.2728438Z triton_fused_add_2_2.run(buf4, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2728527Z return (buf0, buf4, ) 2023-01-11T21:38:06.2728532Z 2023-01-11T21:38:06.2728536Z 2023-01-11T21:38:06.2728619Z if __name__ == "__main__": 2023-01-11T21:38:06.2728733Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2728863Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2729077Z arg0_1 = rand_strided((1, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2729288Z arg1_1 = rand_strided((1, 8, 10), (80, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2729412Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2729417Z 2023-01-11T21:38:06.2729491Z ok (0.307s) 2023-01-11T21:38:06.2729945Z test_bmm2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2730081Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2730368Z [2023-01-11 21:34:09,766] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 444 2023-01-11T21:38:06.2730626Z [2023-01-11 21:34:09,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 444 2023-01-11T21:38:06.2730637Z 2023-01-11T21:38:06.2730731Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2730807Z import torch 2023-01-11T21:38:06.2730882Z import random 2023-01-11T21:38:06.2731004Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2731128Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2731133Z 2023-01-11T21:38:06.2731217Z aten = torch.ops.aten 2023-01-11T21:38:06.2731357Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2731451Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2731456Z 2023-01-11T21:38:06.2731531Z import triton 2023-01-11T21:38:06.2731625Z import triton.language as tl 2023-01-11T21:38:06.2731752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2731895Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2731900Z 2023-01-11T21:38:06.2731905Z 2023-01-11T21:38:06.2731999Z async_compile.wait(globals()) 2023-01-11T21:38:06.2732080Z del async_compile 2023-01-11T21:38:06.2732085Z 2023-01-11T21:38:06.2732161Z def call(args): 2023-01-11T21:38:06.2732237Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2732317Z args.clear() 2023-01-11T21:38:06.2732412Z with torch.cuda.device(0): 2023-01-11T21:38:06.2732621Z buf0 = empty_strided((1, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2732792Z aten.mm.out(as_strided(arg0_1, (8, 8), (1, 8)), as_strided(arg1_1, (8, 8), (8, 1)), out=as_strided(buf0, (8, 8), (8, 1))) 2023-01-11T21:38:06.2732896Z del arg0_1 2023-01-11T21:38:06.2732969Z del arg1_1 2023-01-11T21:38:06.2733042Z return (buf0, ) 2023-01-11T21:38:06.2733048Z 2023-01-11T21:38:06.2733056Z 2023-01-11T21:38:06.2733131Z if __name__ == "__main__": 2023-01-11T21:38:06.2733254Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2733382Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2733594Z arg0_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2733804Z arg1_1 = rand_strided((1, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2733924Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2733929Z 2023-01-11T21:38:06.2734004Z ok (0.022s) 2023-01-11T21:38:06.2734456Z test_bool_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2734773Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2735036Z [2023-01-11 21:34:09,805] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 445 2023-01-11T21:38:06.2735301Z [2023-01-11 21:34:10,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 445 2023-01-11T21:38:06.2735716Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2735850Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2736106Z [2023-01-11 21:34:10,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 446 2023-01-11T21:38:06.2736412Z [2023-01-11 21:34:10,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 446 2023-01-11T21:38:06.2736418Z 2023-01-11T21:38:06.2736515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2736590Z import torch 2023-01-11T21:38:06.2736665Z import random 2023-01-11T21:38:06.2736778Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2736901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2736907Z 2023-01-11T21:38:06.2736988Z aten = torch.ops.aten 2023-01-11T21:38:06.2737174Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2737284Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2737290Z 2023-01-11T21:38:06.2737374Z import triton 2023-01-11T21:38:06.2737471Z import triton.language as tl 2023-01-11T21:38:06.2737591Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2737733Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2737738Z 2023-01-11T21:38:06.2737743Z 2023-01-11T21:38:06.2738030Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.2738106Z import triton 2023-01-11T21:38:06.2738197Z import triton.language as tl 2023-01-11T21:38:06.2738310Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2738412Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2738545Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2738665Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2738676Z 2023-01-11T21:38:06.2739172Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: '*i1', 7: '*i1', 8: '*i1', 9: '*i1', 10: '*i1', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.2739287Z @triton.jit 2023-01-11T21:38:06.2739498Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2739575Z xnumel = 4 2023-01-11T21:38:06.2739676Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2739809Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2739893Z xmask = xindex < xnumel 2023-01-11T21:38:06.2739967Z x0 = xindex 2023-01-11T21:38:06.2740155Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2740346Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2740445Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2740546Z tmp7 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2740628Z tmp2 = tmp0 | tmp1 2023-01-11T21:38:06.2740708Z tmp3 = tmp0 & tmp1 2023-01-11T21:38:06.2740788Z tmp4 = tmp0 ^ tmp1 2023-01-11T21:38:06.2740859Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2740999Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741135Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741270Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741402Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741532Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2741661Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2741789Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2741917Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2742044Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2742133Z ''') 2023-01-11T21:38:06.2742138Z 2023-01-11T21:38:06.2742143Z 2023-01-11T21:38:06.2742266Z async_compile.wait(globals()) 2023-01-11T21:38:06.2742351Z del async_compile 2023-01-11T21:38:06.2742356Z 2023-01-11T21:38:06.2742431Z def call(args): 2023-01-11T21:38:06.2742513Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2742592Z args.clear() 2023-01-11T21:38:06.2742681Z with torch.cuda.device(0): 2023-01-11T21:38:06.2742873Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743064Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743251Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743439Z buf3 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743626Z buf4 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743812Z buf5 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2743987Z buf6 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744178Z buf7 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744359Z buf8 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2744457Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2744708Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.2744784Z del arg0_1 2023-01-11T21:38:06.2744859Z del arg1_1 2023-01-11T21:38:06.2744984Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:06.2745019Z 2023-01-11T21:38:06.2745023Z 2023-01-11T21:38:06.2745103Z if __name__ == "__main__": 2023-01-11T21:38:06.2745218Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2745347Z from torch._inductor.utils import print_performance 
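# [editor annotation] Every dump closes with this self-contained benchmark harness: rand_strided
# rebuilds inputs with the recorded shape, stride, device and dtype, and print_performance
# times the generated call() wrapper in isolation.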
2023-01-11T21:38:06.2745542Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2745734Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2745856Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2745861Z 2023-01-11T21:38:06.2745865Z 2023-01-11T21:38:06.2745965Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2746041Z import torch 2023-01-11T21:38:06.2746112Z import random 2023-01-11T21:38:06.2746234Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2746359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2746367Z 2023-01-11T21:38:06.2746451Z aten = torch.ops.aten 2023-01-11T21:38:06.2746589Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2746688Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2746693Z 2023-01-11T21:38:06.2746768Z import triton 2023-01-11T21:38:06.2746862Z import triton.language as tl 2023-01-11T21:38:06.2746985Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2747127Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2747132Z 2023-01-11T21:38:06.2747137Z 2023-01-11T21:38:06.2747419Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.2747497Z import triton 2023-01-11T21:38:06.2747591Z import triton.language as tl 2023-01-11T21:38:06.2747708Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2747810Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2747946Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2748069Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2748074Z 2023-01-11T21:38:06.2748607Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i1', 1: '*i1', 2: '*i1', 3: '*i1', 4: '*i1', 5: '*i1', 6: '*i1', 7: '*i1', 8: '*i1', 9: '*i1', 10: '*i1', 11: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.2748686Z @triton.jit 2023-01-11T21:38:06.2748894Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2748970Z xnumel = 4 2023-01-11T21:38:06.2749072Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2749205Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2749292Z xmask = xindex < xnumel 2023-01-11T21:38:06.2749369Z x0 = xindex 2023-01-11T21:38:06.2749555Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2749747Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2749846Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.2749947Z tmp7 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.2750030Z tmp2 = tmp0 | tmp1 2023-01-11T21:38:06.2750110Z tmp3 = tmp0 & tmp1 2023-01-11T21:38:06.2750190Z tmp4 = tmp0 ^ tmp1 2023-01-11T21:38:06.2750261Z tmp6 = tmp5 == 0 2023-01-11T21:38:06.2750397Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2750533Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2750668Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 
2023-01-11T21:38:06.2750799Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2750977Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2751107Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.2751231Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2751359Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.2751487Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.2751576Z ''') 2023-01-11T21:38:06.2751582Z 2023-01-11T21:38:06.2751586Z 2023-01-11T21:38:06.2751683Z async_compile.wait(globals()) 2023-01-11T21:38:06.2751764Z del async_compile 2023-01-11T21:38:06.2751769Z 2023-01-11T21:38:06.2751847Z def call(args): 2023-01-11T21:38:06.2751928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2751999Z args.clear() 2023-01-11T21:38:06.2752094Z with torch.cuda.device(0): 2023-01-11T21:38:06.2752288Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752483Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752671Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2752858Z buf3 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753046Z buf4 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753231Z buf5 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753408Z buf6 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753590Z buf7 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753773Z buf8 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.2753868Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2754118Z triton_fused_add_bitwise_and_bitwise_or_bitwise_xor_logical_and_logical_not_logical_or_mul_sign_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.2754199Z del arg0_1 2023-01-11T21:38:06.2754275Z del arg1_1 2023-01-11T21:38:06.2754400Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, ) 2023-01-11T21:38:06.2754406Z 2023-01-11T21:38:06.2754439Z 2023-01-11T21:38:06.2754517Z if __name__ == "__main__": 2023-01-11T21:38:06.2754639Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2754766Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2754957Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2755163Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.2755299Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2755305Z 2023-01-11T21:38:06.2755403Z ok (0.314s) 2023-01-11T21:38:06.2755735Z test_both_scalars_cuda (__main__.CudaTests) ... 
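[editor annotation] The kernel_cpp_0 dumps under test_both_scalars_cuda come from a function whose inputs are both Python scalars, so Inductor emits a plain C++ CPU kernel instead of a Triton one. A minimal sketch of the kind of function involved (a hypothetical reconstruction, not the test's actual harness; the real body lives in test/inductor/test_torchinductor.py):

    import torch

    def fn(a, b):
        # six scalar results, mirroring out_ptr0..out_ptr5 in kernel_cpp_0 below
        return a + b, b + a, a - b, b - a, a * b, b * a

    # scalar-only inputs: dynamo still traces the function, and inductor
    # lowers it to a CPU C++ kernel like the one dumped below
    print(torch.compile(fn)(4, 3.3))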
[2023-01-11 21:34:10,142] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 447 2023-01-11T21:38:06.2756000Z [2023-01-11 21:34:10,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 447 2023-01-11T21:38:06.2756258Z [2023-01-11 21:34:10,212] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 448 2023-01-11T21:38:06.2756524Z [2023-01-11 21:34:10,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 448 2023-01-11T21:38:06.2756529Z 2023-01-11T21:38:06.2756632Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2756709Z import torch 2023-01-11T21:38:06.2756788Z import random 2023-01-11T21:38:06.2756911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2757035Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2757041Z 2023-01-11T21:38:06.2757119Z aten = torch.ops.aten 2023-01-11T21:38:06.2757285Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2757384Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2757389Z 2023-01-11T21:38:06.2757465Z import triton 2023-01-11T21:38:06.2757560Z import triton.language as tl 2023-01-11T21:38:06.2757687Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2757830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2757836Z 2023-01-11T21:38:06.2757840Z 2023-01-11T21:38:06.2757982Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.2758185Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.2758306Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.2758411Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.2758512Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.2758616Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.2758721Z float* __restrict__ out_ptr4, 2023-01-11T21:38:06.2758821Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.2758889Z { 2023-01-11T21:38:06.2758950Z { 2023-01-11T21:38:06.2759020Z { 2023-01-11T21:38:06.2759126Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2759239Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2759334Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2759421Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.2759483Z } 2023-01-11T21:38:06.2759552Z } 2023-01-11T21:38:06.2759620Z { 2023-01-11T21:38:06.2759687Z { 2023-01-11T21:38:06.2759796Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2759902Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2759994Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2760073Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.2760141Z } 2023-01-11T21:38:06.2760211Z } 2023-01-11T21:38:06.2760277Z { 2023-01-11T21:38:06.2760344Z { 2023-01-11T21:38:06.2760449Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2760557Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2760683Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2760798Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.2760867Z } 2023-01-11T21:38:06.2760934Z } 2023-01-11T21:38:06.2761002Z { 2023-01-11T21:38:06.2761071Z { 2023-01-11T21:38:06.2761171Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2761275Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2761406Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2761489Z out_ptr3[0] = tmp2; 2023-01-11T21:38:06.2761559Z } 2023-01-11T21:38:06.2761627Z } 2023-01-11T21:38:06.2761694Z { 2023-01-11T21:38:06.2761755Z { 2023-01-11T21:38:06.2761858Z auto tmp0
= static_cast<float>(4); 2023-01-11T21:38:06.2761969Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2762059Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2762144Z out_ptr4[0] = tmp2; 2023-01-11T21:38:06.2762212Z } 2023-01-11T21:38:06.2762281Z } 2023-01-11T21:38:06.2762342Z { 2023-01-11T21:38:06.2762409Z { 2023-01-11T21:38:06.2762517Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2762620Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2762714Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2762800Z out_ptr5[0] = tmp2; 2023-01-11T21:38:06.2762863Z } 2023-01-11T21:38:06.2762930Z } 2023-01-11T21:38:06.2762996Z } 2023-01-11T21:38:06.2763081Z ''') 2023-01-11T21:38:06.2763087Z 2023-01-11T21:38:06.2763091Z 2023-01-11T21:38:06.2763185Z async_compile.wait(globals()) 2023-01-11T21:38:06.2763265Z del async_compile 2023-01-11T21:38:06.2763271Z 2023-01-11T21:38:06.2763345Z def call(args): 2023-01-11T21:38:06.2763608Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2763785Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2763966Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764150Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764327Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764505Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2764743Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.2764854Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.2764859Z 2023-01-11T21:38:06.2764864Z 2023-01-11T21:38:06.2764947Z if __name__ == "__main__": 2023-01-11T21:38:06.2765062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2765194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2765298Z print_performance(lambda: call([])) 2023-01-11T21:38:06.2765304Z 2023-01-11T21:38:06.2765308Z 2023-01-11T21:38:06.2765409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2765485Z import torch 2023-01-11T21:38:06.2765563Z import random 2023-01-11T21:38:06.2765684Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2765809Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2765814Z 2023-01-11T21:38:06.2765891Z aten = torch.ops.aten 2023-01-11T21:38:06.2766029Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2766127Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2766132Z 2023-01-11T21:38:06.2766210Z import triton 2023-01-11T21:38:06.2766303Z import triton.language as tl 2023-01-11T21:38:06.2766431Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2766575Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2766580Z 2023-01-11T21:38:06.2766585Z 2023-01-11T21:38:06.2766724Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.2766926Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.2767074Z extern "C" void kernel(float* __restrict__ out_ptr0, 2023-01-11T21:38:06.2767183Z float* __restrict__ out_ptr1, 2023-01-11T21:38:06.2767287Z float* __restrict__ out_ptr2, 2023-01-11T21:38:06.2767389Z float* __restrict__ out_ptr3, 2023-01-11T21:38:06.2767493Z float* __restrict__ out_ptr4,
2023-01-11T21:38:06.2767594Z float* __restrict__ out_ptr5) 2023-01-11T21:38:06.2767655Z { 2023-01-11T21:38:06.2767723Z { 2023-01-11T21:38:06.2767791Z { 2023-01-11T21:38:06.2767897Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2768011Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2768105Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2768191Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.2768253Z } 2023-01-11T21:38:06.2768322Z } 2023-01-11T21:38:06.2768388Z { 2023-01-11T21:38:06.2768455Z { 2023-01-11T21:38:06.2768565Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2768670Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2768764Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.2768842Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.2768910Z } 2023-01-11T21:38:06.2768975Z } 2023-01-11T21:38:06.2769043Z { 2023-01-11T21:38:06.2769111Z { 2023-01-11T21:38:06.2769214Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2769316Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2769449Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2769532Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.2769629Z } 2023-01-11T21:38:06.2769698Z } 2023-01-11T21:38:06.2769765Z { 2023-01-11T21:38:06.2769833Z { 2023-01-11T21:38:06.2769933Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2770038Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2770174Z auto tmp2 = tmp0 - tmp1; 2023-01-11T21:38:06.2770259Z out_ptr3[0] = tmp2; 2023-01-11T21:38:06.2770327Z } 2023-01-11T21:38:06.2770394Z } 2023-01-11T21:38:06.2770461Z { 2023-01-11T21:38:06.2770522Z { 2023-01-11T21:38:06.2770626Z auto tmp0 = static_cast<float>(4); 2023-01-11T21:38:06.2770732Z auto tmp1 = static_cast<float>(3.3); 2023-01-11T21:38:06.2770823Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2770909Z out_ptr4[0] = tmp2; 2023-01-11T21:38:06.2770976Z } 2023-01-11T21:38:06.2771037Z } 2023-01-11T21:38:06.2771104Z { 2023-01-11T21:38:06.2771170Z { 2023-01-11T21:38:06.2771278Z auto tmp0 = static_cast<float>(3.3); 2023-01-11T21:38:06.2771382Z auto tmp1 = static_cast<float>(4); 2023-01-11T21:38:06.2771471Z auto tmp2 = tmp0 * tmp1; 2023-01-11T21:38:06.2771555Z out_ptr5[0] = tmp2; 2023-01-11T21:38:06.2771617Z } 2023-01-11T21:38:06.2771684Z } 2023-01-11T21:38:06.2771752Z } 2023-01-11T21:38:06.2771839Z ''') 2023-01-11T21:38:06.2771844Z 2023-01-11T21:38:06.2771849Z 2023-01-11T21:38:06.2771947Z async_compile.wait(globals()) 2023-01-11T21:38:06.2772026Z del async_compile 2023-01-11T21:38:06.2772031Z 2023-01-11T21:38:06.2772108Z def call(args): 2023-01-11T21:38:06.2772290Z buf0 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772473Z buf1 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772652Z buf2 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2772831Z buf3 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773012Z buf4 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773187Z buf5 = empty_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.2773450Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr()), c_void_p(buf3.data_ptr()), c_void_p(buf4.data_ptr()), c_void_p(buf5.data_ptr())) 2023-01-11T21:38:06.2773557Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.2773563Z 2023-01-11T21:38:06.2773567Z 2023-01-11T21:38:06.2773648Z if __name__ == "__main__": 2023-01-11T21:38:06.2773760Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2773888Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.2773989Z print_performance(lambda: call([])) 2023-01-11T21:38:06.2773995Z 2023-01-11T21:38:06.2774067Z ok (0.141s) 2023-01-11T21:38:06.2774632Z test_cat_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2774770Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2775033Z [2023-01-11 21:34:10,276] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 449 2023-01-11T21:38:06.2775295Z [2023-01-11 21:34:10,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 449 2023-01-11T21:38:06.2775712Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2775920Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2776169Z [2023-01-11 21:34:10,575] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 450 2023-01-11T21:38:06.2776183Z 2023-01-11T21:38:06.2776276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2776351Z import torch 2023-01-11T21:38:06.2776425Z import random 2023-01-11T21:38:06.2776543Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2776665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2776671Z 2023-01-11T21:38:06.2776751Z aten = torch.ops.aten 2023-01-11T21:38:06.2776886Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2776976Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2776981Z 2023-01-11T21:38:06.2777054Z import triton 2023-01-11T21:38:06.2777208Z import triton.language as tl 2023-01-11T21:38:06.2777365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2777506Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2777511Z 2023-01-11T21:38:06.2777516Z 2023-01-11T21:38:06.2777740Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2777819Z import triton 2023-01-11T21:38:06.2777911Z import triton.language as tl 2023-01-11T21:38:06.2778018Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2778122Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2778255Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2778381Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2778386Z 2023-01-11T21:38:06.2778866Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp64', 6: '*fp64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2778943Z @triton.jit 2023-01-11T21:38:06.2779124Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, 
out_ptr4, out_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2779195Z xnumel = 128 2023-01-11T21:38:06.2779326Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2779460Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2779540Z xmask = xindex < xnumel 2023-01-11T21:38:06.2779610Z x2 = xindex 2023-01-11T21:38:06.2779684Z x0 = xindex % 16 2023-01-11T21:38:06.2779764Z x1 = (xindex // 16) 2023-01-11T21:38:06.2779961Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2780053Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2780124Z tmp1 = 2 2023-01-11T21:38:06.2780202Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2780280Z tmp4 = tmp3 * tmp1 2023-01-11T21:38:06.2780371Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.2780515Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2780650Z tl.store(out_ptr1 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2780783Z tl.store(out_ptr2 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2780910Z tl.store(out_ptr3 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2781040Z tl.store(out_ptr4 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2781171Z tl.store(out_ptr5 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2781256Z ''') 2023-01-11T21:38:06.2781262Z 2023-01-11T21:38:06.2781266Z 2023-01-11T21:38:06.2781445Z triton_fused_add_slice_1_slice_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.2781520Z import triton 2023-01-11T21:38:06.2781613Z import triton.language as tl 2023-01-11T21:38:06.2781721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2781856Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2781990Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2782114Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2782120Z 2023-01-11T21:38:06.2782530Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2782603Z @triton.jit 2023-01-11T21:38:06.2782737Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2782810Z xnumel = 32 2023-01-11T21:38:06.2782899Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2783032Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2783118Z xmask = xindex < xnumel 2023-01-11T21:38:06.2783192Z x0 = xindex % 4 2023-01-11T21:38:06.2783273Z x1 = (xindex // 4) 2023-01-11T21:38:06.2783379Z tmp0 = tl.load(in_ptr0 + (x0 + (16*x1)), xmask) 2023-01-11T21:38:06.2783451Z tmp1 = 1 2023-01-11T21:38:06.2783523Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2783663Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2783749Z ''') 2023-01-11T21:38:06.2783757Z 2023-01-11T21:38:06.2783762Z 2023-01-11T21:38:06.2783853Z async_compile.wait(globals()) 2023-01-11T21:38:06.2783932Z del async_compile 2023-01-11T21:38:06.2783937Z 2023-01-11T21:38:06.2784011Z def call(args): 2023-01-11T21:38:06.2784082Z arg0_1, = args 2023-01-11T21:38:06.2784157Z args.clear() 2023-01-11T21:38:06.2784244Z with torch.cuda.device(0): 2023-01-11T21:38:06.2784444Z buf3 = empty_strided((8, 36), (36, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.2784553Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2784667Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:06.2784876Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2784985Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2785096Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2785318Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2785452Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2785572Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2785681Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2785886Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0.run(arg0_1, buf0, buf2, buf4, buf5, buf7, buf8, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2785999Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:06.2786149Z triton_fused_add_slice_1_slice_2_1.run(arg0_1, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2786225Z del arg0_1 2023-01-11T21:38:06.2786309Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:06.2786321Z 2023-01-11T21:38:06.2786325Z 2023-01-11T21:38:06.2786399Z if __name__ == "__main__": 2023-01-11T21:38:06.2786516Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2786644Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2786848Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2786959Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2786964Z 2023-01-11T21:38:06.2787227Z [2023-01-11 21:34:10,712] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 450 2023-01-11T21:38:06.2787646Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2787806Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2788061Z [2023-01-11 21:34:10,760] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 451 2023-01-11T21:38:06.2788067Z 2023-01-11T21:38:06.2788157Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2788231Z import torch 2023-01-11T21:38:06.2788306Z import random 2023-01-11T21:38:06.2788424Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2788549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2788554Z 2023-01-11T21:38:06.2788635Z aten = torch.ops.aten 2023-01-11T21:38:06.2788776Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2788865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2788877Z 2023-01-11T21:38:06.2788948Z import triton 2023-01-11T21:38:06.2789040Z import triton.language as tl 2023-01-11T21:38:06.2789164Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2789307Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2789312Z 2023-01-11T21:38:06.2789317Z 2023-01-11T21:38:06.2789542Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.2789615Z import triton 2023-01-11T21:38:06.2789707Z import triton.language as tl 2023-01-11T21:38:06.2789814Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2789915Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2790047Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2790172Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2790177Z 2023-01-11T21:38:06.2790653Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp64', 6: '*fp64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.2790728Z @triton.jit 2023-01-11T21:38:06.2790938Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2791015Z xnumel = 128 2023-01-11T21:38:06.2791111Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2791234Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2791318Z xmask = xindex < xnumel 2023-01-11T21:38:06.2791389Z x2 = xindex 2023-01-11T21:38:06.2791463Z x0 = xindex % 16 2023-01-11T21:38:06.2791542Z x1 = (xindex // 16) 2023-01-11T21:38:06.2791757Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2791874Z tmp3 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2791942Z tmp1 = 2 2023-01-11T21:38:06.2792022Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2792099Z tmp4 = tmp3 * tmp1 2023-01-11T21:38:06.2792187Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.2792327Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2792464Z tl.store(out_ptr1 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2792596Z tl.store(out_ptr2 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.2792722Z tl.store(out_ptr3 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 
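    # tmp5 (the fp64 upcast of tmp4) goes to both aliased halves of the fp64 cat output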
2023-01-11T21:38:06.2792854Z tl.store(out_ptr4 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2792984Z tl.store(out_ptr5 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.2793069Z ''') 2023-01-11T21:38:06.2793075Z 2023-01-11T21:38:06.2793079Z 2023-01-11T21:38:06.2793260Z triton_fused_add_slice_1_slice_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.2793377Z import triton 2023-01-11T21:38:06.2793471Z import triton.language as tl 2023-01-11T21:38:06.2793585Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2793680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2793812Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2793940Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2793946Z 2023-01-11T21:38:06.2794350Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2794424Z @triton.jit 2023-01-11T21:38:06.2794556Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2794629Z xnumel = 32 2023-01-11T21:38:06.2794724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2794846Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2794931Z xmask = xindex < xnumel 2023-01-11T21:38:06.2795005Z x0 = xindex % 4 2023-01-11T21:38:06.2795083Z x1 = (xindex // 4) 2023-01-11T21:38:06.2795209Z tmp0 = tl.load(in_ptr0 + (x0 + (16*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.2795280Z tmp1 = 1 2023-01-11T21:38:06.2795359Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2795519Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.2795618Z ''') 2023-01-11T21:38:06.2795625Z 2023-01-11T21:38:06.2795630Z 2023-01-11T21:38:06.2795732Z async_compile.wait(globals()) 2023-01-11T21:38:06.2795811Z del async_compile 2023-01-11T21:38:06.2795816Z 2023-01-11T21:38:06.2795890Z def call(args): 2023-01-11T21:38:06.2795963Z arg0_1, = args 2023-01-11T21:38:06.2796037Z args.clear() 2023-01-11T21:38:06.2796123Z with torch.cuda.device(0): 2023-01-11T21:38:06.2796325Z buf3 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2796435Z buf0 = as_strided(buf3, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2796547Z buf2 = as_strided(buf3, (8, 16), (36, 1), 20) # alias 2023-01-11T21:38:06.2796750Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2796890Z buf4 = as_strided(buf6, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2797005Z buf5 = as_strided(buf6, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2797206Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2797306Z buf7 = as_strided(buf9, (8, 16), (16, 1)) # alias 2023-01-11T21:38:06.2797417Z buf8 = as_strided(buf9, (8, 16), (16, 1), 128) # alias 2023-01-11T21:38:06.2797508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2797711Z triton_fused_add_1_cat_cat_1_cat_2_convert_element_type_mul_0.run(arg0_1, buf0, buf2, buf4, buf5, buf7, buf8, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2797825Z buf1 = as_strided(buf3, (8, 4), (36, 1), 16) # alias 2023-01-11T21:38:06.2797975Z triton_fused_add_slice_1_slice_2_1.run(arg0_1, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.2798049Z del arg0_1 
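        # buf3, buf6 and buf9 were assembled in place through the aliased as_strided views above, so no separate concatenation kernel is needed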
2023-01-11T21:38:06.2798138Z return (buf3, buf6, buf9, ) 2023-01-11T21:38:06.2798144Z 2023-01-11T21:38:06.2798150Z 2023-01-11T21:38:06.2798224Z if __name__ == "__main__": 2023-01-11T21:38:06.2798342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2798467Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2798672Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2798784Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2798789Z 2023-01-11T21:38:06.2798793Z 2023-01-11T21:38:06.2798889Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2798964Z import torch 2023-01-11T21:38:06.2799038Z import random 2023-01-11T21:38:06.2799151Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2799309Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2799314Z 2023-01-11T21:38:06.2799395Z aten = torch.ops.aten 2023-01-11T21:38:06.2799530Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2799628Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2799635Z 2023-01-11T21:38:06.2799710Z import triton 2023-01-11T21:38:06.2799802Z import triton.language as tl 2023-01-11T21:38:06.2799919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2800058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2800063Z 2023-01-11T21:38:06.2800067Z 2023-01-11T21:38:06.2800322Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2800396Z import triton 2023-01-11T21:38:06.2800488Z import triton.language as tl 2023-01-11T21:38:06.2800603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2800711Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2800842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2800961Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2800974Z 2023-01-11T21:38:06.2801520Z @pointwise(size_hints=[4, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp64', 7: '*fp64', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 9), equal_to_1=())]}) 2023-01-11T21:38:06.2801594Z @triton.jit 2023-01-11T21:38:06.2801808Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2801881Z xnumel = 3 2023-01-11T21:38:06.2801954Z ynumel = 48 2023-01-11T21:38:06.2802053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2802190Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2802274Z xmask = xindex < xnumel 2023-01-11T21:38:06.2802363Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2802496Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2802608Z ymask = yindex < ynumel 2023-01-11T21:38:06.2802680Z x0 = xindex 2023-01-11T21:38:06.2802750Z y3 = yindex 2023-01-11T21:38:06.2802824Z y1 = yindex % 16 2023-01-11T21:38:06.2802904Z y2 = (yindex // 16) 2023-01-11T21:38:06.2803112Z tmp0 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.2803229Z tmp5 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask) 
2023-01-11T21:38:06.2803302Z tmp1 = 1 2023-01-11T21:38:06.2803381Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2803452Z tmp3 = 2 2023-01-11T21:38:06.2803530Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.2803609Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.2803691Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.2803856Z tl.store(out_ptr0 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2804019Z tl.store(out_ptr1 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.2804180Z tl.store(out_ptr2 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.2804333Z tl.store(out_ptr3 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2804481Z tl.store(out_ptr4 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2804628Z tl.store(out_ptr5 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2804772Z tl.store(out_ptr6 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2804880Z ''') 2023-01-11T21:38:06.2804886Z 2023-01-11T21:38:06.2804893Z 2023-01-11T21:38:06.2805040Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2805115Z import triton 2023-01-11T21:38:06.2805211Z import triton.language as tl 2023-01-11T21:38:06.2805344Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2805456Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2805611Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2805735Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2805740Z 2023-01-11T21:38:06.2806193Z @pointwise(size_hints=[4, 256], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2806267Z @triton.jit 2023-01-11T21:38:06.2806433Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2806509Z xnumel = 3 2023-01-11T21:38:06.2806583Z ynumel = 144 2023-01-11T21:38:06.2806681Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2806815Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2806901Z xmask = xindex < xnumel 2023-01-11T21:38:06.2806991Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2807121Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2807202Z ymask = yindex < ynumel 2023-01-11T21:38:06.2807270Z x0 = xindex 2023-01-11T21:38:06.2807339Z y1 = yindex 2023-01-11T21:38:06.2807454Z tmp0 = tl.load(in_ptr0 + (y1 + (144*x0)), xmask & ymask) 2023-01-11T21:38:06.2807611Z tl.store(out_ptr0 + (x0 + (3*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2807689Z ''') 2023-01-11T21:38:06.2807694Z 2023-01-11T21:38:06.2807699Z 2023-01-11T21:38:06.2807857Z triton_fused_cat_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2807932Z import triton 2023-01-11T21:38:06.2808024Z import triton.language as tl 2023-01-11T21:38:06.2808137Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2808238Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2808398Z from torch._inductor.triton_ops.autotune import 
pointwise 2023-01-11T21:38:06.2808517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2808529Z 2023-01-11T21:38:06.2808983Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2809057Z @triton.jit 2023-01-11T21:38:06.2809222Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2809291Z xnumel = 6 2023-01-11T21:38:06.2809365Z ynumel = 48 2023-01-11T21:38:06.2809459Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2809589Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2809670Z xmask = xindex < xnumel 2023-01-11T21:38:06.2809759Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2809895Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2809980Z ymask = yindex < ynumel 2023-01-11T21:38:06.2810050Z x3 = xindex 2023-01-11T21:38:06.2810121Z y2 = yindex 2023-01-11T21:38:06.2810193Z x0 = xindex % 3 2023-01-11T21:38:06.2810264Z x1 = (xindex // 3) 2023-01-11T21:38:06.2810378Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2810539Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2810625Z ''') 2023-01-11T21:38:06.2810631Z 2023-01-11T21:38:06.2810635Z 2023-01-11T21:38:06.2810818Z triton_fused_cat_2_3 = async_compile.triton(''' 2023-01-11T21:38:06.2810897Z import triton 2023-01-11T21:38:06.2810991Z import triton.language as tl 2023-01-11T21:38:06.2811105Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2811200Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2811339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2811464Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2811470Z 2023-01-11T21:38:06.2811926Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2811999Z @triton.jit 2023-01-11T21:38:06.2812166Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2812237Z xnumel = 6 2023-01-11T21:38:06.2812310Z ynumel = 48 2023-01-11T21:38:06.2812399Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2812534Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2812619Z xmask = xindex < xnumel 2023-01-11T21:38:06.2812712Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2812845Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2812925Z ymask = yindex < ynumel 2023-01-11T21:38:06.2812996Z x3 = xindex 2023-01-11T21:38:06.2813059Z y2 = yindex 2023-01-11T21:38:06.2813134Z x0 = xindex % 3 2023-01-11T21:38:06.2813210Z x1 = (xindex // 3) 2023-01-11T21:38:06.2813324Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2813487Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2813572Z ''') 2023-01-11T21:38:06.2813578Z 
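# The fused kernel above writes into aliased as_strided views of contiguous
# staging buffers, and the three cat kernels then permute each staged result
# into the channels-last output layout. A minimal sketch (hypothetical,
# reconstructed from the kernel names and buffer shapes) of the eager
# computation being compiled:
#
#     def fn(a):                     # a: (1, 3, 3, 16), channels-last
#         tmp = a * 2
#         return (torch.cat((a, a + 1, a + 2), -1),
#                 torch.cat((tmp, tmp), 0),
#                 torch.cat((tmp, tmp.double()), 0))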
2023-01-11T21:38:06.2813582Z 2023-01-11T21:38:06.2813677Z async_compile.wait(globals()) 2023-01-11T21:38:06.2813759Z del async_compile 2023-01-11T21:38:06.2813765Z 2023-01-11T21:38:06.2813832Z def call(args): 2023-01-11T21:38:06.2813906Z arg0_1, = args 2023-01-11T21:38:06.2813983Z args.clear() 2023-01-11T21:38:06.2814074Z with torch.cuda.device(0): 2023-01-11T21:38:06.2814321Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2814438Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:06.2814674Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:06.2814788Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:06.2815004Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2815123Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2815242Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2815464Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2815583Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2815706Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2815799Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2816012Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0.run(arg0_1, buf0, buf1, buf2, buf5, buf6, buf9, buf10, 3, 48, grid=grid(3, 48), stream=stream0) 2023-01-11T21:38:06.2816088Z del arg0_1 2023-01-11T21:38:06.2816303Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2816443Z triton_fused_cat_1.run(buf3, buf4, 3, 144, grid=grid(3, 144), stream=stream0) 2023-01-11T21:38:06.2816513Z del buf0 2023-01-11T21:38:06.2816633Z del buf1 2023-01-11T21:38:06.2816703Z del buf2 2023-01-11T21:38:06.2816771Z del buf3 2023-01-11T21:38:06.2816980Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2817122Z triton_fused_cat_1_2.run(buf7, buf8, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2817268Z del buf5 2023-01-11T21:38:06.2817356Z del buf6 2023-01-11T21:38:06.2817427Z del buf7 2023-01-11T21:38:06.2817641Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2817780Z triton_fused_cat_2_3.run(buf11, buf12, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2817844Z del buf10 2023-01-11T21:38:06.2817915Z del buf11 2023-01-11T21:38:06.2817984Z del buf9 2023-01-11T21:38:06.2818074Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:06.2818080Z 2023-01-11T21:38:06.2818088Z 2023-01-11T21:38:06.2818169Z if __name__ == "__main__": 2023-01-11T21:38:06.2818288Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2818415Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2818634Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2818742Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2819009Z [2023-01-11 21:34:11,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 451 2023-01-11T21:38:06.2819430Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is 
deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2819560Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2819820Z [2023-01-11 21:34:11,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 452 2023-01-11T21:38:06.2819826Z 2023-01-11T21:38:06.2819830Z 2023-01-11T21:38:06.2819926Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2820000Z import torch 2023-01-11T21:38:06.2820114Z import random 2023-01-11T21:38:06.2820236Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2820354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2820359Z 2023-01-11T21:38:06.2820441Z aten = torch.ops.aten 2023-01-11T21:38:06.2820577Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2820673Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2820679Z 2023-01-11T21:38:06.2820753Z import triton 2023-01-11T21:38:06.2820845Z import triton.language as tl 2023-01-11T21:38:06.2820968Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2821101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2821116Z 2023-01-11T21:38:06.2821121Z 2023-01-11T21:38:06.2821369Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.2821443Z import triton 2023-01-11T21:38:06.2821536Z import triton.language as tl 2023-01-11T21:38:06.2821650Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2821753Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2821884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2822011Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2822016Z 2023-01-11T21:38:06.2822574Z @pointwise(size_hints=[4, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp64', 7: '*fp64', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 9), equal_to_1=())]}) 2023-01-11T21:38:06.2822678Z @triton.jit 2023-01-11T21:38:06.2822886Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2822961Z xnumel = 3 2023-01-11T21:38:06.2823036Z ynumel = 48 2023-01-11T21:38:06.2823137Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2823272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2823355Z xmask = xindex < xnumel 2023-01-11T21:38:06.2823452Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2823577Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2823659Z ymask = yindex < ynumel 2023-01-11T21:38:06.2823729Z x0 = xindex 2023-01-11T21:38:06.2823800Z y3 = yindex 2023-01-11T21:38:06.2823876Z y1 = yindex % 16 2023-01-11T21:38:06.2823954Z y2 = (yindex // 16) 2023-01-11T21:38:06.2824195Z tmp0 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & ymask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.2824322Z tmp5 = tl.load(in_ptr0 + (x0 + (3*y3)), xmask & 
ymask).to(tl.float32) 2023-01-11T21:38:06.2824391Z tmp1 = 1 2023-01-11T21:38:06.2824471Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.2824542Z tmp3 = 2 2023-01-11T21:38:06.2824620Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.2824697Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.2824785Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.2824945Z tl.store(out_ptr0 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2825109Z tl.store(out_ptr1 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.2825268Z tl.store(out_ptr2 + (y1 + (48*y2) + (144*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.2825447Z tl.store(out_ptr3 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2825623Z tl.store(out_ptr4 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp6, xmask & ymask) 2023-01-11T21:38:06.2825770Z tl.store(out_ptr5 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2825942Z tl.store(out_ptr6 + (y3 + (48*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.2826030Z ''') 2023-01-11T21:38:06.2826035Z 2023-01-11T21:38:06.2826040Z 2023-01-11T21:38:06.2826195Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2826263Z import triton 2023-01-11T21:38:06.2826357Z import triton.language as tl 2023-01-11T21:38:06.2826475Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2826576Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2826709Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2826834Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2826842Z 2023-01-11T21:38:06.2827302Z @pointwise(size_hints=[4, 256], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2827380Z @triton.jit 2023-01-11T21:38:06.2827540Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2827617Z xnumel = 3 2023-01-11T21:38:06.2827689Z ynumel = 144 2023-01-11T21:38:06.2827784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2827920Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2828004Z xmask = xindex < xnumel 2023-01-11T21:38:06.2828099Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2828224Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2828306Z ymask = yindex < ynumel 2023-01-11T21:38:06.2828408Z x0 = xindex 2023-01-11T21:38:06.2828478Z y1 = yindex 2023-01-11T21:38:06.2828609Z tmp0 = tl.load(in_ptr0 + (y1 + (144*x0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.2828765Z tl.store(out_ptr0 + (x0 + (3*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2828852Z ''') 2023-01-11T21:38:06.2828858Z 2023-01-11T21:38:06.2828862Z 2023-01-11T21:38:06.2829013Z triton_fused_cat_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.2829088Z import triton 2023-01-11T21:38:06.2829180Z import triton.language as tl 2023-01-11T21:38:06.2829294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2829397Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2829529Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2829655Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2829660Z 2023-01-11T21:38:06.2830119Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2830196Z @triton.jit 2023-01-11T21:38:06.2830357Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2830430Z xnumel = 6 2023-01-11T21:38:06.2830503Z ynumel = 48 2023-01-11T21:38:06.2830597Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2830729Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2830812Z xmask = xindex < xnumel 2023-01-11T21:38:06.2830908Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2831032Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2831117Z ymask = yindex < ynumel 2023-01-11T21:38:06.2831191Z x3 = xindex 2023-01-11T21:38:06.2831263Z y2 = yindex 2023-01-11T21:38:06.2831340Z x0 = xindex % 3 2023-01-11T21:38:06.2831418Z x1 = (xindex // 3) 2023-01-11T21:38:06.2831544Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.2831706Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.2831818Z ''') 2023-01-11T21:38:06.2831824Z 2023-01-11T21:38:06.2831828Z 2023-01-11T21:38:06.2831987Z triton_fused_cat_2_3 = async_compile.triton(''' 2023-01-11T21:38:06.2832063Z import triton 2023-01-11T21:38:06.2832155Z import triton.language as tl 2023-01-11T21:38:06.2832270Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2832375Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2832500Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2832625Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2832630Z 2023-01-11T21:38:06.2833084Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2833161Z @triton.jit 2023-01-11T21:38:06.2833330Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.2833403Z xnumel = 6 2023-01-11T21:38:06.2833475Z ynumel = 48 2023-01-11T21:38:06.2833573Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2833700Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2833783Z xmask = xindex < xnumel 2023-01-11T21:38:06.2833878Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.2834007Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.2834090Z ymask = yindex < ynumel 2023-01-11T21:38:06.2834158Z x3 = xindex 2023-01-11T21:38:06.2834271Z y2 = yindex 2023-01-11T21:38:06.2834337Z x0 = xindex % 3 2023-01-11T21:38:06.2834417Z x1 = (xindex // 3) 2023-01-11T21:38:06.2834535Z tmp0 = tl.load(in_ptr0 + (y2 + (48*x3)), xmask & ymask) 2023-01-11T21:38:06.2834698Z tl.store(out_ptr0 + (x0 + (3*y2) + (144*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 
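    # same layout shuffle as triton_fused_cat_1_2 above, but for the fp64 result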
2023-01-11T21:38:06.2834790Z ''') 2023-01-11T21:38:06.2834795Z 2023-01-11T21:38:06.2834800Z 2023-01-11T21:38:06.2834894Z async_compile.wait(globals()) 2023-01-11T21:38:06.2834972Z del async_compile 2023-01-11T21:38:06.2834977Z 2023-01-11T21:38:06.2835054Z def call(args): 2023-01-11T21:38:06.2835138Z arg0_1, = args 2023-01-11T21:38:06.2835219Z args.clear() 2023-01-11T21:38:06.2835333Z with torch.cuda.device(0): 2023-01-11T21:38:06.2835559Z buf3 = empty_strided((1, 3, 3, 48), (432, 144, 48, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2835675Z buf0 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1)) # alias 2023-01-11T21:38:06.2835798Z buf1 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 16) # alias 2023-01-11T21:38:06.2835917Z buf2 = as_strided(buf3, (1, 3, 3, 16), (432, 144, 48, 1), 32) # alias 2023-01-11T21:38:06.2836127Z buf7 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2836248Z buf5 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2836366Z buf6 = as_strided(buf7, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2836585Z buf11 = empty_strided((2, 3, 3, 16), (144, 48, 16, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2836700Z buf9 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1)) # alias 2023-01-11T21:38:06.2836818Z buf10 = as_strided(buf11, (1, 3, 3, 16), (144, 48, 16, 1), 144) # alias 2023-01-11T21:38:06.2836910Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2837132Z triton_fused_add_add_1_cat_cat_1_cat_2_convert_element_type_mul_slice_1_slice_2_0.run(arg0_1, buf0, buf1, buf2, buf5, buf6, buf9, buf10, 3, 48, grid=grid(3, 48), stream=stream0) 2023-01-11T21:38:06.2837208Z del arg0_1 2023-01-11T21:38:06.2837417Z buf4 = empty_strided((1, 3, 3, 48), (432, 1, 144, 3), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2837583Z triton_fused_cat_1.run(buf3, buf4, 3, 144, grid=grid(3, 144), stream=stream0) 2023-01-11T21:38:06.2837657Z del buf0 2023-01-11T21:38:06.2837732Z del buf1 2023-01-11T21:38:06.2837801Z del buf2 2023-01-11T21:38:06.2837870Z del buf3 2023-01-11T21:38:06.2838087Z buf8 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2838219Z triton_fused_cat_1_2.run(buf7, buf8, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2838289Z del buf5 2023-01-11T21:38:06.2838362Z del buf6 2023-01-11T21:38:06.2838431Z del buf7 2023-01-11T21:38:06.2838647Z buf12 = empty_strided((2, 3, 3, 16), (144, 1, 48, 3), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.2838788Z triton_fused_cat_2_3.run(buf11, buf12, 6, 48, grid=grid(6, 48), stream=stream0) 2023-01-11T21:38:06.2838860Z del buf10 2023-01-11T21:38:06.2838924Z del buf11 2023-01-11T21:38:06.2838992Z del buf9 2023-01-11T21:38:06.2839084Z return (buf4, buf8, buf12, ) 2023-01-11T21:38:06.2839089Z 2023-01-11T21:38:06.2839094Z 2023-01-11T21:38:06.2839175Z if __name__ == "__main__": 2023-01-11T21:38:06.2839295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2839421Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2839638Z arg0_1 = rand_strided((1, 3, 3, 16), (144, 1, 48, 3), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2839750Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2840008Z [2023-01-11 21:34:11,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 452 2023-01-11T21:38:06.2840042Z 2023-01-11T21:38:06.2840114Z ok 
(1.677s) 2023-01-11T21:38:06.2840581Z test_cat_extern_kernel_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2840715Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2840970Z [2023-01-11 21:34:11,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 453 2023-01-11T21:38:06.2841232Z [2023-01-11 21:34:12,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 453 2023-01-11T21:38:06.2841239Z 2023-01-11T21:38:06.2841336Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2841413Z import torch 2023-01-11T21:38:06.2841492Z import random 2023-01-11T21:38:06.2841604Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2841728Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2841734Z 2023-01-11T21:38:06.2841816Z aten = torch.ops.aten 2023-01-11T21:38:06.2841958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2842054Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2842059Z 2023-01-11T21:38:06.2842133Z import triton 2023-01-11T21:38:06.2842228Z import triton.language as tl 2023-01-11T21:38:06.2842353Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2842487Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2842492Z 2023-01-11T21:38:06.2842503Z 2023-01-11T21:38:06.2842649Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2842723Z import triton 2023-01-11T21:38:06.2842815Z import triton.language as tl 2023-01-11T21:38:06.2842932Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2843032Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2843162Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2843289Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2843294Z 2023-01-11T21:38:06.2843720Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2843797Z @triton.jit 2023-01-11T21:38:06.2843930Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2844003Z xnumel = 65536 2023-01-11T21:38:06.2844100Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2844229Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2844311Z xmask = xindex < xnumel 2023-01-11T21:38:06.2844377Z x2 = xindex 2023-01-11T21:38:06.2844453Z x0 = xindex % 256 2023-01-11T21:38:06.2844535Z x1 = (xindex // 256) 2023-01-11T21:38:06.2844633Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2844776Z tl.store(out_ptr0 + (x0 + (512*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2844862Z ''') 2023-01-11T21:38:06.2844872Z 2023-01-11T21:38:06.2844877Z 2023-01-11T21:38:06.2844971Z async_compile.wait(globals()) 2023-01-11T21:38:06.2845049Z del async_compile 2023-01-11T21:38:06.2845054Z 2023-01-11T21:38:06.2845122Z def call(args): 2023-01-11T21:38:06.2845215Z 
arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.2845290Z args.clear() 2023-01-11T21:38:06.2845381Z with torch.cuda.device(0): 2023-01-11T21:38:06.2845591Z buf0 = empty_strided((256, 1600), (1600, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2845692Z aten.mm.out(arg1_1, arg2_1, out=buf0) 2023-01-11T21:38:06.2845764Z del arg1_1 2023-01-11T21:38:06.2845859Z del arg2_1 2023-01-11T21:38:06.2846068Z buf3 = empty_strided((256, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2846184Z buf1 = as_strided(buf3, (256, 256), (512, 1)) # alias 2023-01-11T21:38:06.2846310Z aten.mm.out(as_strided(buf0, (256, 100), (1600, 1)), arg3_1, out=buf1) 2023-01-11T21:38:06.2846386Z del arg3_1 2023-01-11T21:38:06.2846458Z del buf0 2023-01-11T21:38:06.2846573Z buf2 = as_strided(buf3, (256, 256), (512, 1), 256) # alias 2023-01-11T21:38:06.2846659Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2846800Z triton_fused_cat_0.run(arg0_1, buf2, 65536, grid=grid(65536), stream=stream0) 2023-01-11T21:38:06.2846872Z del arg0_1 2023-01-11T21:38:06.2846949Z return (buf3, ) 2023-01-11T21:38:06.2846954Z 2023-01-11T21:38:06.2846958Z 2023-01-11T21:38:06.2847035Z if __name__ == "__main__": 2023-01-11T21:38:06.2847152Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2847281Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2847486Z arg0_1 = rand_strided((256, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2847688Z arg1_1 = rand_strided((256, 1024), (1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2847899Z arg2_1 = rand_strided((1024, 1600), (1600, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2848101Z arg3_1 = rand_strided((100, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2848232Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.2848238Z 2023-01-11T21:38:06.2848309Z ok (0.120s) 2023-01-11T21:38:06.2848770Z test_cat_upcasting_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2848905Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2849162Z [2023-01-11 21:34:12,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 454 2023-01-11T21:38:06.2849458Z [2023-01-11 21:34:12,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 454 2023-01-11T21:38:06.2849878Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2850009Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2850258Z [2023-01-11 21:34:12,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 455 2023-01-11T21:38:06.2850522Z [2023-01-11 21:34:12,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 455 2023-01-11T21:38:06.2850527Z 2023-01-11T21:38:06.2850624Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2850701Z import torch 2023-01-11T21:38:06.2850776Z import random 2023-01-11T21:38:06.2850895Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2851016Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2851021Z 2023-01-11T21:38:06.2851101Z aten = torch.ops.aten 2023-01-11T21:38:06.2851231Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2851322Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2851327Z 2023-01-11T21:38:06.2851399Z import triton 2023-01-11T21:38:06.2851489Z import triton.language as tl 2023-01-11T21:38:06.2851616Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2851779Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2851785Z 2023-01-11T21:38:06.2851789Z 2023-01-11T21:38:06.2851943Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2852017Z import triton 2023-01-11T21:38:06.2852103Z import triton.language as tl 2023-01-11T21:38:06.2852220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2852320Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2852451Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2852579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2852584Z 2023-01-11T21:38:06.2852986Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2853060Z @triton.jit 2023-01-11T21:38:06.2853197Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2853264Z xnumel = 128 2023-01-11T21:38:06.2853358Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2853486Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2853569Z xmask = xindex < xnumel 2023-01-11T21:38:06.2853642Z x2 = xindex 2023-01-11T21:38:06.2853717Z x0 = xindex % 16 2023-01-11T21:38:06.2853797Z x1 = (xindex // 16) 2023-01-11T21:38:06.2853888Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.2854029Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2854114Z ''') 2023-01-11T21:38:06.2854119Z 2023-01-11T21:38:06.2854124Z 2023-01-11T21:38:06.2854276Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2854349Z import triton 2023-01-11T21:38:06.2854442Z import triton.language as tl 2023-01-11T21:38:06.2854663Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2854762Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2854896Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2855019Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2855025Z 2023-01-11T21:38:06.2855516Z 
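# upcast half of the cat: this kernel loads the fp16 operand and stores it into the fp32 output slice (note the .to(tl.float32) below)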
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2855593Z @triton.jit 2023-01-11T21:38:06.2855725Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2855798Z xnumel = 160 2023-01-11T21:38:06.2855893Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2856015Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2856098Z xmask = xindex < xnumel 2023-01-11T21:38:06.2856168Z x2 = xindex 2023-01-11T21:38:06.2856242Z x0 = xindex % 20 2023-01-11T21:38:06.2856323Z x1 = (xindex // 20) 2023-01-11T21:38:06.2856441Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2856528Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.2856661Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.2856748Z ''') 2023-01-11T21:38:06.2856753Z 2023-01-11T21:38:06.2856760Z 2023-01-11T21:38:06.2856850Z async_compile.wait(globals()) 2023-01-11T21:38:06.2856926Z del async_compile 2023-01-11T21:38:06.2856931Z 2023-01-11T21:38:06.2857004Z def call(args): 2023-01-11T21:38:06.2857081Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2857213Z args.clear() 2023-01-11T21:38:06.2857306Z with torch.cuda.device(0): 2023-01-11T21:38:06.2857504Z buf2 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2857614Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2857705Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2857881Z triton_fused_cat_0.run(arg0_1, buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2857954Z del arg0_1 2023-01-11T21:38:06.2858066Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:06.2858202Z triton_fused_cat_1.run(arg1_1, buf1, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2858271Z del arg1_1 2023-01-11T21:38:06.2858348Z return (buf2, ) 2023-01-11T21:38:06.2858353Z 2023-01-11T21:38:06.2858357Z 2023-01-11T21:38:06.2858436Z if __name__ == "__main__": 2023-01-11T21:38:06.2858553Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2864823Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2865063Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2865265Z arg1_1 = rand_strided((8, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2865408Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2865423Z 2023-01-11T21:38:06.2865428Z 2023-01-11T21:38:06.2865537Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2865627Z import torch 2023-01-11T21:38:06.2865702Z import random 2023-01-11T21:38:06.2865815Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2865944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2865949Z 2023-01-11T21:38:06.2866033Z aten = torch.ops.aten 2023-01-11T21:38:06.2866171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2866268Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2866274Z 2023-01-11T21:38:06.2866349Z import triton 2023-01-11T21:38:06.2866437Z import triton.language as tl 2023-01-11T21:38:06.2866555Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2866698Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 
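# second variant: both cat operands are already fp16, so no promotion is needed and the output buffer (buf2 below, torch.float16) matches the inputs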
2023-01-11T21:38:06.2866703Z 2023-01-11T21:38:06.2866708Z 2023-01-11T21:38:06.2866859Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.2866938Z import triton 2023-01-11T21:38:06.2867028Z import triton.language as tl 2023-01-11T21:38:06.2867143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2867238Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2867370Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2867548Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2867554Z 2023-01-11T21:38:06.2867959Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2868032Z @triton.jit 2023-01-11T21:38:06.2868165Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2868240Z xnumel = 128 2023-01-11T21:38:06.2868337Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2868459Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2868545Z xmask = xindex < xnumel 2023-01-11T21:38:06.2868615Z x2 = xindex 2023-01-11T21:38:06.2868691Z x0 = xindex % 16 2023-01-11T21:38:06.2868768Z x1 = (xindex // 16) 2023-01-11T21:38:06.2868886Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2869030Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2869109Z ''') 2023-01-11T21:38:06.2869114Z 2023-01-11T21:38:06.2869118Z 2023-01-11T21:38:06.2869275Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.2869350Z import triton 2023-01-11T21:38:06.2869442Z import triton.language as tl 2023-01-11T21:38:06.2869554Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2869656Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2869790Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2869909Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2869952Z 2023-01-11T21:38:06.2870349Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2870421Z @triton.jit 2023-01-11T21:38:06.2870554Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2870626Z xnumel = 160 2023-01-11T21:38:06.2870720Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2870849Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2870933Z xmask = xindex < xnumel 2023-01-11T21:38:06.2870997Z x2 = xindex 2023-01-11T21:38:06.2871072Z x0 = xindex % 20 2023-01-11T21:38:06.2871150Z x1 = (xindex // 20) 2023-01-11T21:38:06.2871267Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.2871410Z tl.store(out_ptr0 + (x0 + (36*x1) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.2871499Z ''') 2023-01-11T21:38:06.2871505Z 2023-01-11T21:38:06.2871509Z 2023-01-11T21:38:06.2871601Z async_compile.wait(globals()) 2023-01-11T21:38:06.2871677Z del async_compile 2023-01-11T21:38:06.2871682Z 2023-01-11T21:38:06.2871750Z def call(args): 2023-01-11T21:38:06.2871831Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.2871906Z args.clear() 2023-01-11T21:38:06.2871999Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.2872200Z buf2 = empty_strided((8, 36), (36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2872309Z buf0 = as_strided(buf2, (8, 16), (36, 1)) # alias 2023-01-11T21:38:06.2872400Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2872530Z triton_fused_cat_0.run(arg0_1, buf0, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.2872603Z del arg0_1 2023-01-11T21:38:06.2872715Z buf1 = as_strided(buf2, (8, 20), (36, 1), 16) # alias 2023-01-11T21:38:06.2872850Z triton_fused_cat_1.run(arg1_1, buf1, 160, grid=grid(160), stream=stream0) 2023-01-11T21:38:06.2872926Z del arg1_1 2023-01-11T21:38:06.2873005Z return (buf2, ) 2023-01-11T21:38:06.2873010Z 2023-01-11T21:38:06.2873015Z 2023-01-11T21:38:06.2873093Z if __name__ == "__main__": 2023-01-11T21:38:06.2873246Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2873368Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2873571Z arg0_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2873768Z arg1_1 = rand_strided((8, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2873887Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.2873892Z 2023-01-11T21:38:06.2873962Z ok (0.302s) 2023-01-11T21:38:06.2874418Z test_cauchy_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2874553Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2874813Z [2023-01-11 21:34:12,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 456 2023-01-11T21:38:06.2875079Z [2023-01-11 21:34:12,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 456 2023-01-11T21:38:06.2875086Z 2023-01-11T21:38:06.2875189Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2875276Z import torch 2023-01-11T21:38:06.2875357Z import random 2023-01-11T21:38:06.2875495Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2875618Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2875623Z 2023-01-11T21:38:06.2875737Z aten = torch.ops.aten 2023-01-11T21:38:06.2875874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2875962Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2875973Z 2023-01-11T21:38:06.2876040Z import triton 2023-01-11T21:38:06.2876130Z import triton.language as tl 2023-01-11T21:38:06.2876257Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2876395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2876400Z 2023-01-11T21:38:06.2876405Z 2023-01-11T21:38:06.2876613Z triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.2876688Z import triton 2023-01-11T21:38:06.2876782Z import triton.language as tl 2023-01-11T21:38:06.2876888Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2876990Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2877121Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2877248Z from torch._inductor.utils 
test_cauchy_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 456
[2023-01-11 21:34:12,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 456

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[1, 1024],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 1
    rnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = (rindex // 32)
        r0 = rindex % 32
        tmp0 = tl.load(in_ptr0 + (r1), rmask)
        tmp1 = tl.load(in_ptr1 + (r0), rmask)
        tmp2 = tmp0 - tmp1
        tmp3 = 1 / tmp2
        tmp4 = 1
        tmp5 = tmp3 * tmp4
        _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6)
    tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((), (), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_mul_reciprocal_sub_sum_1_unsqueeze_0.run(arg0_1, arg1_1, buf0, 1, 1024, grid=grid(1), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.199s)
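The reduction kernel above accumulates 1 / (in_ptr0[r1] - in_ptr1[r0]) over all 32 x 32 = 1024 index pairs into a single scalar, matching the fused mul/reciprocal/sub/sum/unsqueeze in its name. A rough eager-mode sketch; a and b are illustrative stand-ins for the two 32-element inputs, assuming a CUDA device:

import torch

a = torch.randn(32, device='cuda')  # stand-in for arg0_1 (assumption)
b = torch.randn(32, device='cuda')  # stand-in for arg1_1 (assumption)
# Outer difference (32, 32) via unsqueeze, reciprocal, then a full sum to
# a 0-d tensor -- the same value the kernel reduces into buf0.
out = (1.0 / (a.unsqueeze(-1) - b)).sum()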
test_clamp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 457
[2023-01-11 21:34:12,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 457
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 458
[2023-01-11 21:34:12,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 458

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_1_minimum_minimum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp5 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp8 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = -0.10000000149011612
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp3 = 0.10000000149011612
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp6 = 0.0
    tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6))
    tmp10 = tmp8 + tmp9
    tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp6, tmp10, tmp6))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_1_minimum_minimum_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_1_minimum_minimum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp5 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp8 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp9 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp1 = -0.0999755859375
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp3 = 0.0999755859375
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp6 = 0.0
    tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6))
    tmp10 = tmp8 + tmp9
    tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp6, tmp10, tmp6))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_1_minimum_minimum_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.329s)
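The tl.where(x != x, x, ...) pattern in both kernels above is a NaN-propagating min/max, so the three stores correspond to three clamps fused into one kernel (the fp32 and fp16 constants are just 0.1 rounded to each dtype). A sketch of the equivalent eager ops on illustrative (8, 8) CUDA inputs:

import torch

a = torch.randn(8, 8, device='cuda')  # stand-in for arg0_1 (assumption)
b = torch.randn(8, 8, device='cuda')  # stand-in for arg1_1 (assumption)
out0 = a.clamp(-0.1, 0.1)        # tmp4: clamp to [-0.1, 0.1]
out1 = b.clamp(min=0.0)          # tmp7: lower bound only
out2 = (a + b).clamp(max=0.0)    # tmp11: upper bound only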
test_clone_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:12,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 459
[2023-01-11 21:34:13,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 459
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 460
[2023-01-11 21:34:13,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 460

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_clone_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_clone_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_clone_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_clone_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.299s)
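Both stores above read the same input and add a constant, i.e. the clone in the test collapses into two independent pointwise adds on one load. A sketch of the equivalent eager computation on an illustrative (16, 16) CUDA input:

import torch

x = torch.randn(16, 16, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = x + 2   # tmp2
out1 = x + 1   # tmp5 (the clone is folded into the same read of x)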
test_constant_pad_1d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 461
[2023-01-11 21:34:13,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 461
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 462
[2023-01-11 21:34:13,562] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 462

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 32
    x1 = (xindex // 32)
    x2 = xindex
    tmp0 = x0
    tmp1 = 31
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2 & xmask, eviction_policy='evict_last', other=0)
    tmp4 = tl.where(tmp2, tmp3, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1152
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 36
    x1 = (xindex // 36)
    x2 = xindex
    tmp0 = (-2) + x0
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 31
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-2) + x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0)
    tmp7 = tl.where(tmp5, tmp6, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
        buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 1152, grid=grid(1152), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 32
    x1 = (xindex // 32)
    x2 = xindex
    tmp0 = x0
    tmp1 = 31
    tmp2 = tmp0 < tmp1
    tmp3 = tl.load(in_ptr0 + (x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp4 = tl.where(tmp2, tmp3, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1152
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 36
    x1 = (xindex // 36)
    x2 = xindex
    tmp0 = (-2) + x0
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 31
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-2) + x0 + (31*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0).to(tl.float32)
    tmp7 = tl.where(tmp5, tmp6, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 16, 32), (512, 32, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
        buf1 = empty_strided((2, 16, 36), (576, 36, 1), device='cuda', dtype=torch.float16)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 1152, grid=grid(1152), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 16, 31), (496, 31, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.407s)
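Each pad kernel above guards its load with a bounds check on the shifted index ((-2) + x0 in [0, 31), for instance) and substitutes the fill value where the mask is false. The equivalent eager calls, assuming x stands in for the (2, 16, 31) input:

import torch
import torch.nn.functional as F

x = torch.randn(2, 16, 31, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = F.pad(x, (0, 1), value=6.0)    # (2, 16, 32), matches kernel ..._0
out1 = F.pad(x, (2, 3), value=99.0)   # (2, 16, 36), matches kernel ..._1_1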
test_constant_pad_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 463
[2023-01-11 21:34:13,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 463
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 464
[2023-01-11 21:34:13,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 464

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 100
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 10)
    x0 = xindex % 10
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-9) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0)
    tmp12 = tl.where(tmp10, tmp11, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 165
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 11)
    x0 = xindex % 11
    x2 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-25) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0)
    tmp12 = tl.where(tmp10, tmp11, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0)
        buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 100
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 10)
    x0 = xindex % 10
    x2 = xindex
    tmp0 = (-1) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-9) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp12 = tl.where(tmp10, tmp11, 6.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 165
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 11)
    x0 = xindex % 11
    x2 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 8
    tmp4 = tmp0 < tmp3
    tmp5 = (-1) + x0
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = tmp2 & tmp4
    tmp9 = tmp8 & tmp6
    tmp10 = tmp9 & tmp7
    tmp11 = tl.load(in_ptr0 + ((-25) + x0 + (8*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32)
    tmp12 = tl.where(tmp10, tmp11, 99.0)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0)
        buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float16)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.259s)
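The same masked-load pattern in 2D: the row mask ((-1) + x1 or (-3) + x1 in [0, 8)) and column mask ((-1) + x0 in [0, 8)) are combined with & before the load. A sketch of the equivalent padding on an illustrative (1, 1, 8, 8) CUDA input (F.pad takes (left, right, top, bottom) for the last two dims):

import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8, device='cuda')  # stand-in for arg0_1 (assumption)
out0 = F.pad(x, (1, 1, 1, 1), value=6.0)    # (1, 1, 10, 10), kernel ..._0
out1 = F.pad(x, (1, 2, 3, 4), value=99.0)   # (1, 1, 15, 11), kernel ..._1_1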
test_constant_pad_3d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:13,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 465
[2023-01-11 21:34:13,990] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 465
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:14,018] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 466
[2023-01-11 21:34:14,136] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 466

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 2310
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = (xindex // 77) % 15
    x1 = (xindex // 7) % 11
    x0 = xindex % 7
    x3 = (xindex // 1155)
    x7 = xindex
    tmp0 = (-5) + x2
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 4
    tmp4 = tmp0 < tmp3
    tmp5 = (-3) + x1
    tmp6 = tmp5 >= tmp1
    tmp7 = tmp5 < tmp3
    tmp8 = (-1) + x0
    tmp9 = tmp8 >= tmp1
    tmp10 = tmp8 < tmp3
    tmp11 = tmp2 & tmp4
    tmp12 = tmp11 & tmp6
    tmp13 = tmp12 & tmp7
    tmp14 = tmp13 & tmp9
    tmp15 = tmp14 & tmp10
    tmp16 = tl.load(in_ptr0 + ((-93) + x0 + (4*x1) + (16*x2) + (64*x3) + tl.zeros([XBLOCK], tl.int32)), tmp15 & xmask, eviction_policy='evict_last', other=0)
    tmp17 = tl.where(tmp15, tmp16, 6.0)
    tl.store(out_ptr0 + (x7 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


triton_fused_constant_pad_nd_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 352
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4) % 11
    x2 = (xindex // 44)
    x3 = xindex % 44
    x4 = xindex
    tmp0 = (-3) + x1
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp3 = 4
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-12) + x3 + (16*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0)
    tmp7 = tl.where(tmp5, tmp6, 6.0)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 2310, grid=grid(2310), stream=stream0)
        buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cuda', dtype=torch.float32)
        triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 352, grid=grid(352), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_constant_pad_nd_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 2310
    xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.2977625Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2977710Z xmask = xindex < xnumel 2023-01-11T21:38:06.2977793Z x2 = (xindex // 77) % 15 2023-01-11T21:38:06.2977878Z x1 = (xindex // 7) % 11 2023-01-11T21:38:06.2977957Z x0 = xindex % 7 2023-01-11T21:38:06.2978037Z x3 = (xindex // 1155) 2023-01-11T21:38:06.2978110Z x7 = xindex 2023-01-11T21:38:06.2978212Z tmp0 = (-5) + x2 2023-01-11T21:38:06.2978283Z tmp1 = 0 2023-01-11T21:38:06.2978364Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2978435Z tmp3 = 4 2023-01-11T21:38:06.2978515Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2978623Z tmp5 = (-3) + x1 2023-01-11T21:38:06.2978697Z tmp6 = tmp5 >= tmp1 2023-01-11T21:38:06.2978776Z tmp7 = tmp5 < tmp3 2023-01-11T21:38:06.2978881Z tmp8 = (-1) + x0 2023-01-11T21:38:06.2978960Z tmp9 = tmp8 >= tmp1 2023-01-11T21:38:06.2979040Z tmp10 = tmp8 < tmp3 2023-01-11T21:38:06.2979118Z tmp11 = tmp2 & tmp4 2023-01-11T21:38:06.2979201Z tmp12 = tmp11 & tmp6 2023-01-11T21:38:06.2979278Z tmp13 = tmp12 & tmp7 2023-01-11T21:38:06.2979359Z tmp14 = tmp13 & tmp9 2023-01-11T21:38:06.2979444Z tmp15 = tmp14 & tmp10 2023-01-11T21:38:06.2979774Z tmp16 = tl.load(in_ptr0 + ((-93) + x0 + (4*x1) + (16*x2) + (64*x3) + tl.zeros([XBLOCK], tl.int32)), tmp15 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.2979915Z tmp17 = tl.where(tmp15, tmp16, 6.0) 2023-01-11T21:38:06.2980053Z tl.store(out_ptr0 + (x7 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.2980140Z ''') 2023-01-11T21:38:06.2980146Z 2023-01-11T21:38:06.2980153Z 2023-01-11T21:38:06.2980334Z triton_fused_constant_pad_nd_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.2980405Z import triton 2023-01-11T21:38:06.2980497Z import triton.language as tl 2023-01-11T21:38:06.2980612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2980714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2980848Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2980975Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2980980Z 2023-01-11T21:38:06.2981381Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.2981462Z @triton.jit 2023-01-11T21:38:06.2981588Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.2981663Z xnumel = 352 2023-01-11T21:38:06.2981764Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2981897Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.2981982Z xmask = xindex < xnumel 2023-01-11T21:38:06.2982064Z x1 = (xindex // 4) % 11 2023-01-11T21:38:06.2982143Z x2 = (xindex // 44) 2023-01-11T21:38:06.2982214Z x3 = xindex % 44 2023-01-11T21:38:06.2982286Z x4 = xindex 2023-01-11T21:38:06.2982392Z tmp0 = (-3) + x1 2023-01-11T21:38:06.2982465Z tmp1 = 0 2023-01-11T21:38:06.2982545Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.2982614Z tmp3 = 4 2023-01-11T21:38:06.2982687Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.2982766Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.2983025Z tmp6 = tl.load(in_ptr0 + ((-12) + x3 + (16*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.2983124Z tmp7 = tl.where(tmp5, tmp6, 6.0) 2023-01-11T21:38:06.2983259Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 
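    # (annotation) this fp16 module differs from the fp32 one above only in the
    # '*fp16' signature and the .to(tl.float32) upcasts on each load: inductor
    # computes in fp32 and lets the store into the fp16 output pointer downcast.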
2023-01-11T21:38:06.2983373Z ''') 2023-01-11T21:38:06.2983379Z 2023-01-11T21:38:06.2983384Z 2023-01-11T21:38:06.2983479Z async_compile.wait(globals()) 2023-01-11T21:38:06.2983558Z del async_compile 2023-01-11T21:38:06.2983563Z 2023-01-11T21:38:06.2983633Z def call(args): 2023-01-11T21:38:06.2983711Z arg0_1, = args 2023-01-11T21:38:06.2983785Z args.clear() 2023-01-11T21:38:06.2983882Z with torch.cuda.device(0): 2023-01-11T21:38:06.2984106Z buf0 = empty_strided((2, 15, 11, 7), (1155, 77, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2984198Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2984352Z triton_fused_constant_pad_nd_0.run(arg0_1, buf0, 2310, grid=grid(2310), stream=stream0) 2023-01-11T21:38:06.2984569Z buf1 = empty_strided((2, 4, 11, 4), (176, 44, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.2984723Z triton_fused_constant_pad_nd_1_1.run(arg0_1, buf1, 352, grid=grid(352), stream=stream0) 2023-01-11T21:38:06.2984800Z del arg0_1 2023-01-11T21:38:06.2984885Z return (buf0, buf1, ) 2023-01-11T21:38:06.2984890Z 2023-01-11T21:38:06.2984895Z 2023-01-11T21:38:06.2984975Z if __name__ == "__main__": 2023-01-11T21:38:06.2985094Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2985221Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2985435Z arg0_1 = rand_strided((2, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.2985543Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.2985554Z 2023-01-11T21:38:06.2985621Z ok (0.316s) 2023-01-11T21:38:06.2986105Z test_conv2d_backward_channels_last_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.2986268Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.2986531Z [2023-01-11 21:34:14,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 467 2023-01-11T21:38:06.2986798Z [2023-01-11 21:34:14,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 467 2023-01-11T21:38:06.2986803Z 2023-01-11T21:38:06.2986901Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2986976Z import torch 2023-01-11T21:38:06.2987050Z import random 2023-01-11T21:38:06.2987165Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2987296Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2987301Z 2023-01-11T21:38:06.2987385Z aten = torch.ops.aten 2023-01-11T21:38:06.2987526Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2987627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2987635Z 2023-01-11T21:38:06.2987710Z import triton 2023-01-11T21:38:06.2987805Z import triton.language as tl 2023-01-11T21:38:06.2987931Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2988065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2988070Z 2023-01-11T21:38:06.2988081Z 2023-01-11T21:38:06.2988234Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.2988311Z import triton 2023-01-11T21:38:06.2988406Z import triton.language as tl 2023-01-11T21:38:06.2988521Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2988628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2988762Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.2988889Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2988894Z 2023-01-11T21:38:06.2988982Z @reduction(size_hints=[512, 128], 2023-01-11T21:38:06.2989101Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.2989214Z filename=__file__, 2023-01-11T21:38:06.2989585Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.2989661Z @triton.jit 2023-01-11T21:38:06.2989831Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.2989907Z xnumel = 320 2023-01-11T21:38:06.2989983Z rnumel = 128 2023-01-11T21:38:06.2990077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.2990213Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.2990299Z xmask = xindex < xnumel 2023-01-11T21:38:06.2990419Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.2990492Z x0 = xindex 2023-01-11T21:38:06.2990609Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.2990719Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.2990804Z rindex = roffset + rbase 2023-01-11T21:38:06.2990892Z rmask = rindex < rnumel 2023-01-11T21:38:06.2990970Z r1 = rindex % 64 2023-01-11T21:38:06.2991054Z r2 = (rindex // 64) 2023-01-11T21:38:06.2991289Z tmp0 = tl.load(in_ptr0 + (r1 + (64*x0) + (20480*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.2991412Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.2991527Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), 
[XBLOCK, 1]) 2023-01-11T21:38:06.2991621Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.2991744Z ''') 2023-01-11T21:38:06.2991750Z 2023-01-11T21:38:06.2991754Z 2023-01-11T21:38:06.2991854Z async_compile.wait(globals()) 2023-01-11T21:38:06.2991932Z del async_compile 2023-01-11T21:38:06.2991937Z 2023-01-11T21:38:06.2992014Z def call(args): 2023-01-11T21:38:06.2992103Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.2992182Z args.clear() 2023-01-11T21:38:06.2992270Z with torch.cuda.device(0): 2023-01-11T21:38:06.2992471Z buf0 = empty_strided((320, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.2992565Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.2992705Z triton_fused_sum_1_0.run(arg0_1, buf0, 320, 128, grid=grid(320), stream=stream0) 2023-01-11T21:38:06.2992877Z buf1 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [320], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, False]) 2023-01-11T21:38:06.2992951Z del arg0_1 2023-01-11T21:38:06.2993026Z del arg1_1 2023-01-11T21:38:06.2993102Z del arg2_1 2023-01-11T21:38:06.2993171Z buf2 = buf1[0] 2023-01-11T21:38:06.2993293Z assert_size_stride(buf2, (2, 2048, 8, 8), (131072, 1, 16384, 2048)) 2023-01-11T21:38:06.2993369Z buf3 = buf1[1] 2023-01-11T21:38:06.2993490Z assert_size_stride(buf3, (320, 2048, 1, 1), (2048, 1, 2048, 2048)) 2023-01-11T21:38:06.2993566Z del buf1 2023-01-11T21:38:06.2993658Z return (buf2, buf3, buf0, ) 2023-01-11T21:38:06.2993663Z 2023-01-11T21:38:06.2993668Z 2023-01-11T21:38:06.2993749Z if __name__ == "__main__": 2023-01-11T21:38:06.2993863Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.2993991Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.2994216Z arg0_1 = rand_strided((2, 320, 8, 8), (20480, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994442Z arg1_1 = rand_strided((2, 2048, 8, 8), (131072, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994672Z arg2_1 = rand_strided((320, 2048, 1, 1), (2048, 1, 2048, 2048), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.2994801Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.2994806Z 2023-01-11T21:38:06.2994880Z ok (0.365s) 2023-01-11T21:38:06.2995078Z test_conv2d_binary_cuda (__main__.CudaTests) ... skip: only support cpu conv2d binary test (0.001s) 2023-01-11T21:38:06.2995254Z test_conv2d_channels_last_cuda (__main__.CudaTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:06.2995419Z test_conv2d_packed_cuda (__main__.CudaTests) ... skip: only support cpu conv2d unary test (0.000s) 2023-01-11T21:38:06.2995588Z test_conv2d_unary_cuda (__main__.CudaTests) ... skip: only support cpu conv2d unary test (0.001s) 2023-01-11T21:38:06.2995762Z test_conv3d_channels_last_cuda (__main__.CudaTests) ... skip: only support cpu channels_last (0.001s) 2023-01-11T21:38:06.2996302Z test_conv_autotune_cuda (__main__.CudaTests) ... 
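(Annotation, not part of the captured log; it refers to the conv2d-backward dump just above, before the autotune output that follows.) Inductor emits a Triton kernel only for the bias gradient -- triton_fused_sum_1_0 reduces grad_output over batch and spatial positions (xnumel = 320 channels, rnumel = 2 * 8 * 8 = 128 reduced elements per channel) -- and falls back to aten.convolution_backward for grad_input and grad_weight. A standalone sketch with shapes taken from the assert_size_stride calls (contiguous layout here for brevity; the test itself exercises channels_last):

import torch

grad_out = torch.randn(2, 320, 8, 8, device="cuda")
inp = torch.randn(2, 2048, 8, 8, device="cuda")
weight = torch.randn(320, 2048, 1, 1, device="cuda")

# the role of triton_fused_sum_1_0: a plain sum over (N, H, W) per channel
grad_bias = grad_out.sum(dim=(0, 2, 3))

# arguments copied from the call() above; output_mask=[True, True, False]
# means the third result (the bias grad) is left to the Triton reduction
grad_input, grad_weight, _ = torch.ops.aten.convolution_backward(
    grad_out, inp, weight, [320], [1, 1], [0, 0], [1, 1],
    False, [0, 0], 1, [True, True, False])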
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.2996387Z warnings.warn( 2023-01-11T21:38:06.2996650Z [2023-01-11 21:34:14,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 468 2023-01-11T21:38:06.2996912Z [2023-01-11 21:34:25,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 468 2023-01-11T21:38:06.2997111Z for key = ('conv', 32, 128, 32, 32, 32, 128, 1, 1, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.2997324Z timing {'aten.convolution': 0.05315199866890907, 'triton_ops.conv': 0.07064960002899169} 2023-01-11T21:38:06.2997420Z best_kernel aten.convolution 2023-01-11T21:38:06.2997426Z 2023-01-11T21:38:06.2997519Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.2997595Z import torch 2023-01-11T21:38:06.2997671Z import random 2023-01-11T21:38:06.2997825Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.2997950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.2997956Z 2023-01-11T21:38:06.2998038Z aten = torch.ops.aten 2023-01-11T21:38:06.2998175Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.2998269Z async_compile = AsyncCompile() 2023-01-11T21:38:06.2998274Z 2023-01-11T21:38:06.2998350Z import triton 2023-01-11T21:38:06.2998442Z import triton.language as tl 2023-01-11T21:38:06.2998567Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.2998708Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.2998714Z 2023-01-11T21:38:06.2998864Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.2999014Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.2999153Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.2999162Z 2023-01-11T21:38:06.2999167Z 2023-01-11T21:38:06.2999336Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.2999407Z import triton 2023-01-11T21:38:06.2999500Z import triton.language as tl 2023-01-11T21:38:06.2999616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.2999722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.2999855Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.2999980Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.2999985Z 2023-01-11T21:38:06.3000412Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3000488Z @triton.jit 2023-01-11T21:38:06.3000619Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3000699Z xnumel = 1048576 2023-01-11T21:38:06.3000800Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3000930Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3001013Z xmask = xindex < xnumel 2023-01-11T21:38:06.3001085Z x3 = xindex 2023-01-11T21:38:06.3001163Z x1 = (xindex // 1024) % 32 2023-01-11T21:38:06.3001327Z tmp0 = tl.load(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), xmask) 
2023-01-11T21:38:06.3001465Z tmp1 = tl.load(in_ptr0 + (x1 + tl.zeros([XBLOCK], tl.int32)), xmask) 2023-01-11T21:38:06.3001546Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3001681Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3001769Z ''') 2023-01-11T21:38:06.3001775Z 2023-01-11T21:38:06.3001780Z 2023-01-11T21:38:06.3001879Z async_compile.wait(globals()) 2023-01-11T21:38:06.3001957Z del async_compile 2023-01-11T21:38:06.3001962Z 2023-01-11T21:38:06.3002032Z def call(args): 2023-01-11T21:38:06.3002123Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3002200Z args.clear() 2023-01-11T21:38:06.3002292Z with torch.cuda.device(0): 2023-01-11T21:38:06.3002439Z buf0 = aten.convolution(arg0_1, arg1_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3002563Z assert_size_stride(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)) 2023-01-11T21:38:06.3002638Z del arg0_1 2023-01-11T21:38:06.3002706Z del arg1_1 2023-01-11T21:38:06.3002837Z buf1 = as_strided(buf0, (32, 32, 32, 32), (32768, 1024, 32, 1)); del buf0 # reuse 2023-01-11T21:38:06.3002929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3003084Z triton_fused_convolution_0.run(buf1, arg2_1, 1048576, grid=grid(1048576), stream=stream0) 2023-01-11T21:38:06.3003159Z del arg2_1 2023-01-11T21:38:06.3003237Z return (buf1, ) 2023-01-11T21:38:06.3003242Z 2023-01-11T21:38:06.3003246Z 2023-01-11T21:38:06.3003326Z if __name__ == "__main__": 2023-01-11T21:38:06.3003474Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3003595Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3003832Z arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004057Z arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004257Z arg2_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3004387Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3004392Z 2023-01-11T21:38:06.3004465Z ok (10.912s) 2023-01-11T21:38:06.3004796Z test_conv_backward_cuda (__main__.CudaTests) ... 
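(Annotation, not part of the captured log; it refers to the test_conv_autotune output just above.) The autotuner timed both convolution backends for the (32, 128, 32, 32) -> (32, 32, 32, 32) 1x1 conv -- aten.convolution at ~0.053 ms vs triton_ops.conv at ~0.071 ms -- and kept aten, so the generated call() runs the convolution externally and the only Triton kernel left is the fused in-place bias add. A standalone sketch of what that call() executes (argument order copied from the dump):

import torch

x = torch.randn(32, 128, 32, 32, device="cuda")
w = torch.randn(32, 128, 1, 1, device="cuda")
b = torch.randn(32, device="cuda")

# external kernel chosen by the autotuner
out = torch.ops.aten.convolution(x, w, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1)
# triton_fused_convolution_0: x1 = (xindex // 1024) % 32 picks the channel,
# i.e. a broadcasted in-place bias add over all 32*32*32*32 = 1048576 elements
out += b.view(1, 32, 1, 1)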
[2023-01-11 21:34:25,504] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 469 2023-01-11T21:38:06.3005062Z [2023-01-11 21:34:25,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 469 2023-01-11T21:38:06.3005317Z [2023-01-11 21:34:25,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 470 2023-01-11T21:38:06.3005570Z [2023-01-11 21:34:25,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 470 2023-01-11T21:38:06.3005581Z 2023-01-11T21:38:06.3005675Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3005755Z import torch 2023-01-11T21:38:06.3005833Z import random 2023-01-11T21:38:06.3005954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3006078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3006084Z 2023-01-11T21:38:06.3006168Z aten = torch.ops.aten 2023-01-11T21:38:06.3006310Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3006402Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3006407Z 2023-01-11T21:38:06.3006483Z import triton 2023-01-11T21:38:06.3006577Z import triton.language as tl 2023-01-11T21:38:06.3006703Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3006846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3006852Z 2023-01-11T21:38:06.3006856Z 2023-01-11T21:38:06.3006946Z async_compile.wait(globals()) 2023-01-11T21:38:06.3007026Z del async_compile 2023-01-11T21:38:06.3007031Z 2023-01-11T21:38:06.3007138Z def call(args): 2023-01-11T21:38:06.3007265Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:06.3007344Z args.clear() 2023-01-11T21:38:06.3007516Z buf0 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3007591Z buf1 = buf0[0] 2023-01-11T21:38:06.3007706Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3007781Z buf2 = buf0[1] 2023-01-11T21:38:06.3007892Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3007959Z buf3 = buf0[2] 2023-01-11T21:38:06.3008063Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:06.3008134Z del buf0 2023-01-11T21:38:06.3008307Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:06.3008381Z del arg0_1 2023-01-11T21:38:06.3008454Z del arg1_1 2023-01-11T21:38:06.3008528Z del arg2_1 2023-01-11T21:38:06.3008596Z buf5 = buf4[0] 2023-01-11T21:38:06.3008708Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3008779Z del buf4 2023-01-11T21:38:06.3008944Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:06.3009019Z del arg3_1 2023-01-11T21:38:06.3009095Z del arg4_1 2023-01-11T21:38:06.3009168Z del arg5_1 2023-01-11T21:38:06.3009236Z buf7 = buf6[0] 2023-01-11T21:38:06.3009347Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3009422Z buf8 = buf6[1] 2023-01-11T21:38:06.3009571Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3009647Z buf9 = buf6[2] 2023-01-11T21:38:06.3009749Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:06.3009827Z del buf6 2023-01-11T21:38:06.3010001Z buf10 = aten.convolution_backward(arg6_1, 
arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3010076Z del arg6_1 2023-01-11T21:38:06.3010151Z del arg7_1 2023-01-11T21:38:06.3010225Z del arg8_1 2023-01-11T21:38:06.3010302Z buf11 = buf10[0] 2023-01-11T21:38:06.3010419Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:06.3010497Z buf12 = buf10[1] 2023-01-11T21:38:06.3010606Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:06.3010681Z buf13 = buf10[2] 2023-01-11T21:38:06.3010782Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:06.3010853Z del buf10 2023-01-11T21:38:06.3010990Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:06.3010996Z 2023-01-11T21:38:06.3011000Z 2023-01-11T21:38:06.3011082Z if __name__ == "__main__": 2023-01-11T21:38:06.3011202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3011333Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3011540Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3011753Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3011959Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012163Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012369Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012575Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3012800Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013017Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013250Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3013423Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:06.3013428Z 2023-01-11T21:38:06.3013433Z 2023-01-11T21:38:06.3013535Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3013611Z import torch 2023-01-11T21:38:06.3013687Z import random 2023-01-11T21:38:06.3013809Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3013939Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3013944Z 2023-01-11T21:38:06.3014036Z aten = torch.ops.aten 2023-01-11T21:38:06.3014169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3014267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3014272Z 2023-01-11T21:38:06.3014348Z import triton 2023-01-11T21:38:06.3014442Z import triton.language as tl 2023-01-11T21:38:06.3014692Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3014833Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3014839Z 2023-01-11T21:38:06.3014843Z 2023-01-11T21:38:06.3014933Z async_compile.wait(globals()) 2023-01-11T21:38:06.3015013Z del async_compile 2023-01-11T21:38:06.3015018Z 2023-01-11T21:38:06.3015085Z def call(args): 2023-01-11T21:38:06.3015214Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1 = args 2023-01-11T21:38:06.3015289Z args.clear() 2023-01-11T21:38:06.3015455Z buf0 = 
aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3015574Z buf1 = buf0[0] 2023-01-11T21:38:06.3015688Z assert_size_stride(buf1, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3015765Z buf2 = buf0[1] 2023-01-11T21:38:06.3015875Z assert_size_stride(buf2, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3015942Z buf3 = buf0[2] 2023-01-11T21:38:06.3016046Z assert_size_stride(buf3, (4, ), (1, )) 2023-01-11T21:38:06.3016118Z del buf0 2023-01-11T21:38:06.3016287Z buf4 = aten.convolution_backward(arg0_1, arg1_1, arg2_1, [4], [1, 1], [0, 0], [1, 1], False, [0, 0], 1, [True, False, False]) 2023-01-11T21:38:06.3016361Z del arg0_1 2023-01-11T21:38:06.3016432Z del arg1_1 2023-01-11T21:38:06.3016506Z del arg2_1 2023-01-11T21:38:06.3016573Z buf5 = buf4[0] 2023-01-11T21:38:06.3016684Z assert_size_stride(buf5, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3016758Z del buf4 2023-01-11T21:38:06.3016921Z buf6 = aten.convolution_backward(arg3_1, arg4_1, arg5_1, [4], [1], [0], [1], False, [0], 1, [True, True, True]) 2023-01-11T21:38:06.3016999Z del arg3_1 2023-01-11T21:38:06.3017071Z del arg4_1 2023-01-11T21:38:06.3017188Z del arg5_1 2023-01-11T21:38:06.3017265Z buf7 = buf6[0] 2023-01-11T21:38:06.3017380Z assert_size_stride(buf7, (3, 4, 5, 5), (100, 25, 5, 1)) 2023-01-11T21:38:06.3017457Z buf8 = buf6[1] 2023-01-11T21:38:06.3017572Z assert_size_stride(buf8, (4, 4, 3, 3), (36, 9, 3, 1)) 2023-01-11T21:38:06.3017647Z buf9 = buf6[2] 2023-01-11T21:38:06.3017747Z assert_size_stride(buf9, (4, ), (1, )) 2023-01-11T21:38:06.3017812Z del buf6 2023-01-11T21:38:06.3017985Z buf10 = aten.convolution_backward(arg6_1, arg7_1, arg8_1, [4], [1, 1, 1], [0, 0, 0], [1, 1, 1], False, [0, 0, 0], 1, [True, True, True]) 2023-01-11T21:38:06.3018059Z del arg6_1 2023-01-11T21:38:06.3018132Z del arg7_1 2023-01-11T21:38:06.3018204Z del arg8_1 2023-01-11T21:38:06.3018279Z buf11 = buf10[0] 2023-01-11T21:38:06.3018398Z assert_size_stride(buf11, (3, 4, 5, 5, 5), (500, 125, 25, 5, 1)) 2023-01-11T21:38:06.3018472Z buf12 = buf10[1] 2023-01-11T21:38:06.3018588Z assert_size_stride(buf12, (4, 4, 3, 3, 3), (108, 27, 9, 3, 1)) 2023-01-11T21:38:06.3018663Z buf13 = buf10[2] 2023-01-11T21:38:06.3018766Z assert_size_stride(buf13, (4, ), (1, )) 2023-01-11T21:38:06.3018838Z del buf10 2023-01-11T21:38:06.3019010Z return (buf1, buf2, buf3, buf5, buf7, buf8, buf9, buf11, buf12, buf13, ) 2023-01-11T21:38:06.3019016Z 2023-01-11T21:38:06.3019021Z 2023-01-11T21:38:06.3019106Z if __name__ == "__main__": 2023-01-11T21:38:06.3019227Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3019350Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3019566Z arg0_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3019808Z arg1_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020041Z arg2_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020274Z arg3_1 = rand_strided((3, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020509Z arg4_1 = rand_strided((3, 4, 5, 5), (100, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020738Z arg5_1 = rand_strided((4, 4, 3, 3), (36, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3020987Z arg6_1 = rand_strided((3, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 
2023-01-11T21:38:06.3021231Z arg7_1 = rand_strided((3, 4, 5, 5, 5), (500, 125, 25, 5, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3021475Z arg8_1 = rand_strided((4, 4, 3, 3, 3), (108, 27, 9, 3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3021657Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1])) 2023-01-11T21:38:06.3021663Z 2023-01-11T21:38:06.3021735Z ok (0.319s) 2023-01-11T21:38:06.3021943Z test_conv_bn_fuse_cuda (__main__.CudaTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:06.3022142Z test_conv_functional_bn_fuse_cuda (__main__.CudaTests) ... skip: only support cpu conv bn test (0.001s) 2023-01-11T21:38:06.3022690Z test_convolution1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3022835Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3023137Z [2023-01-11 21:34:25,828] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 471 2023-01-11T21:38:06.3023438Z [2023-01-11 21:34:25,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 471 2023-01-11T21:38:06.3023931Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3024076Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3024359Z [2023-01-11 21:34:26,044] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 472 2023-01-11T21:38:06.3024662Z [2023-01-11 21:34:26,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 472 2023-01-11T21:38:06.3024667Z 2023-01-11T21:38:06.3024772Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3024850Z import torch 2023-01-11T21:38:06.3024928Z import random 2023-01-11T21:38:06.3025063Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3025199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3025204Z 2023-01-11T21:38:06.3025291Z aten = torch.ops.aten 2023-01-11T21:38:06.3025435Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3025541Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3025576Z 2023-01-11T21:38:06.3025665Z import triton 2023-01-11T21:38:06.3025781Z import triton.language as tl 2023-01-11T21:38:06.3025942Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3026097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3026102Z 2023-01-11T21:38:06.3026107Z 2023-01-11T21:38:06.3026287Z triton_fused_le_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3026358Z import triton 2023-01-11T21:38:06.3026457Z import triton.language as tl 2023-01-11T21:38:06.3026581Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3026691Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3026840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3026976Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3026981Z 2023-01-11T21:38:06.3027496Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3027571Z @triton.jit 2023-01-11T21:38:06.3027719Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3027790Z xnumel = 2352 2023-01-11T21:38:06.3027890Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3028021Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3028105Z xmask = xindex < xnumel 2023-01-11T21:38:06.3028179Z x3 = xindex 2023-01-11T21:38:06.3028262Z x1 = (xindex // 196) % 6 2023-01-11T21:38:06.3028390Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.3028489Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3028568Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3028687Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.3028759Z tmp4 = 0 2023-01-11T21:38:06.3028846Z tmp5 = tmp3 <= tmp4 2023-01-11T21:38:06.3028986Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3029120Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3029202Z ''') 2023-01-11T21:38:06.3029207Z 2023-01-11T21:38:06.3029212Z 2023-01-11T21:38:06.3029307Z async_compile.wait(globals()) 2023-01-11T21:38:06.3029387Z del async_compile 2023-01-11T21:38:06.3029392Z 2023-01-11T21:38:06.3029469Z def call(args): 
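    # (annotation) buf1 below ends up holding relu(conv(primals_3, primals_1) + bias)
    # computed in place, and buf2 is the boolean (result <= 0) mask that
    # triton_fused_le_relu_0 stores alongside it, presumably for the backward pass.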
2023-01-11T21:38:06.3029580Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.3029657Z args.clear() 2023-01-11T21:38:06.3029754Z with torch.cuda.device(0): 2023-01-11T21:38:06.3029907Z buf0 = aten.convolution(primals_3, primals_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3030028Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:06.3030161Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:06.3030380Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.3030474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3030633Z triton_fused_le_relu_0.run(buf1, primals_2, buf2, 2352, grid=grid(2352), stream=stream0) 2023-01-11T21:38:06.3030716Z del primals_2 2023-01-11T21:38:06.3030827Z return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:06.3030833Z 2023-01-11T21:38:06.3030837Z 2023-01-11T21:38:06.3030919Z if __name__ == "__main__": 2023-01-11T21:38:06.3031033Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3031163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3031387Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031592Z primals_2 = rand_strided((6, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031851Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3031994Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.3032000Z 2023-01-11T21:38:06.3032004Z 2023-01-11T21:38:06.3032106Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3032183Z import torch 2023-01-11T21:38:06.3032253Z import random 2023-01-11T21:38:06.3032375Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3032499Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3032504Z 2023-01-11T21:38:06.3032584Z aten = torch.ops.aten 2023-01-11T21:38:06.3032725Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3032821Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3032826Z 2023-01-11T21:38:06.3032902Z import triton 2023-01-11T21:38:06.3032990Z import triton.language as tl 2023-01-11T21:38:06.3033116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3033260Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3033265Z 2023-01-11T21:38:06.3033270Z 2023-01-11T21:38:06.3033429Z triton_fused_le_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3033506Z import triton 2023-01-11T21:38:06.3033601Z import triton.language as tl 2023-01-11T21:38:06.3033715Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3033818Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3033945Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3034073Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3034105Z 2023-01-11T21:38:06.3034538Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3034615Z @triton.jit 2023-01-11T21:38:06.3034765Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3034841Z xnumel = 2352 2023-01-11T21:38:06.3034942Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3035077Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3035175Z xmask = xindex < xnumel 2023-01-11T21:38:06.3035252Z x3 = xindex 2023-01-11T21:38:06.3035351Z x1 = (xindex // 196) % 6 2023-01-11T21:38:06.3035487Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.3035604Z tmp1 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.3035685Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3035808Z tmp3 = tl.where(0 != 0, 0, tl.where(0 > tmp2, 0, tmp2)) 2023-01-11T21:38:06.3035891Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3035964Z tmp5 = 0 2023-01-11T21:38:06.3036045Z tmp6 = tmp4 <= tmp5 2023-01-11T21:38:06.3036183Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3036318Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3036407Z ''') 2023-01-11T21:38:06.3036413Z 2023-01-11T21:38:06.3036417Z 2023-01-11T21:38:06.3036510Z async_compile.wait(globals()) 2023-01-11T21:38:06.3036582Z del async_compile 2023-01-11T21:38:06.3036594Z 2023-01-11T21:38:06.3036664Z def call(args): 2023-01-11T21:38:06.3036769Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.3036848Z args.clear() 2023-01-11T21:38:06.3036942Z with torch.cuda.device(0): 2023-01-11T21:38:06.3037101Z buf0 = aten.convolution(primals_3, primals_1, None, (1, 1), (0, 0), (1, 1), False, (0, 0), 1) 2023-01-11T21:38:06.3037223Z assert_size_stride(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)) 2023-01-11T21:38:06.3037351Z buf1 = as_strided(buf0, (2, 6, 14, 14), (1176, 196, 14, 1)); del buf0 # reuse 2023-01-11T21:38:06.3037593Z buf2 = empty_strided((2, 6, 14, 14), (1176, 196, 14, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.3037691Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3037845Z triton_fused_le_relu_0.run(buf1, primals_2, buf2, 2352, grid=grid(2352), stream=stream0) 2023-01-11T21:38:06.3037924Z del primals_2 2023-01-11T21:38:06.3038038Z return (buf1, primals_1, primals_3, buf2, ) 2023-01-11T21:38:06.3038044Z 2023-01-11T21:38:06.3038048Z 2023-01-11T21:38:06.3038132Z if __name__ == "__main__": 2023-01-11T21:38:06.3038251Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3038379Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3038594Z primals_1 = rand_strided((6, 5, 3, 3), (45, 9, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3038804Z primals_2 = rand_strided((6, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3039030Z primals_3 = rand_strided((2, 5, 16, 16), (1280, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3039178Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.3039183Z 2023-01-11T21:38:06.3039257Z ok (0.412s) 2023-01-11T21:38:06.3039716Z test_convolution2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3039849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3040140Z [2023-01-11 21:34:26,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 473 2023-01-11T21:38:06.3040407Z [2023-01-11 21:34:26,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 473 2023-01-11T21:38:06.3040413Z 2023-01-11T21:38:06.3040514Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3040584Z import torch 2023-01-11T21:38:06.3040661Z import random 2023-01-11T21:38:06.3040782Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3040907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3040912Z 2023-01-11T21:38:06.3040999Z aten = torch.ops.aten 2023-01-11T21:38:06.3041138Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3041234Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3041239Z 2023-01-11T21:38:06.3041308Z import triton 2023-01-11T21:38:06.3041403Z import triton.language as tl 2023-01-11T21:38:06.3041529Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3041673Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3041679Z 2023-01-11T21:38:06.3041683Z 2023-01-11T21:38:06.3041853Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.3041930Z import triton 2023-01-11T21:38:06.3042025Z import triton.language as tl 2023-01-11T21:38:06.3042143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3042241Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3042376Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3042502Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3042507Z 2023-01-11T21:38:06.3042930Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3043008Z @triton.jit 2023-01-11T21:38:06.3043146Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3043223Z xnumel = 11648 2023-01-11T21:38:06.3043321Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3043444Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3043554Z xmask = xindex < xnumel 2023-01-11T21:38:06.3043630Z x3 = xindex 2023-01-11T21:38:06.3043716Z x1 = (xindex // 364) % 16 2023-01-11T21:38:06.3043823Z tmp0 = tl.load(in_out_ptr0 + (x3), xmask) 2023-01-11T21:38:06.3043919Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3044001Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3044133Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3044222Z ''') 2023-01-11T21:38:06.3044227Z 2023-01-11T21:38:06.3044232Z 2023-01-11T21:38:06.3044324Z async_compile.wait(globals()) 2023-01-11T21:38:06.3044405Z del async_compile 2023-01-11T21:38:06.3044413Z 2023-01-11T21:38:06.3044490Z def call(args): 2023-01-11T21:38:06.3044577Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3044654Z args.clear() 2023-01-11T21:38:06.3044747Z with torch.cuda.device(0): 2023-01-11T21:38:06.3044884Z buf0 = aten.convolution(arg0_1, arg1_1, None, (4,), (0,), (1,), True, (0,), 1) 
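        # (annotation) transposed=True makes this a 1-d transposed convolution:
        # L_out = (L_in - 1) * stride + kernel = (90 - 1) * 4 + 8 = 364, matching
        # the (2, 16, 364) assert below; the bias add is again left to the fused kernel.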
2023-01-11T21:38:06.3045002Z assert_size_stride(buf0, (2, 16, 364), (5824, 364, 1)) 2023-01-11T21:38:06.3045076Z del arg0_1 2023-01-11T21:38:06.3045152Z del arg1_1 2023-01-11T21:38:06.3045279Z buf1 = as_strided(buf0, (2, 16, 364), (5824, 364, 1)); del buf0 # reuse 2023-01-11T21:38:06.3045373Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3045524Z triton_fused_convolution_0.run(buf1, arg2_1, 11648, grid=grid(11648), stream=stream0) 2023-01-11T21:38:06.3045592Z del arg2_1 2023-01-11T21:38:06.3045670Z return (buf1, ) 2023-01-11T21:38:06.3045675Z 2023-01-11T21:38:06.3045680Z 2023-01-11T21:38:06.3045758Z if __name__ == "__main__": 2023-01-11T21:38:06.3045903Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3046032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3046250Z arg0_1 = rand_strided((2, 32, 90), (2880, 90, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046462Z arg1_1 = rand_strided((32, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046660Z arg2_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3046783Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3046793Z 2023-01-11T21:38:06.3046860Z ok (0.141s) 2023-01-11T21:38:06.3047316Z test_cos_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3047452Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3047711Z [2023-01-11 21:34:26,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 474 2023-01-11T21:38:06.3047980Z [2023-01-11 21:34:26,404] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 474 2023-01-11T21:38:06.3048400Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3048534Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3048789Z [2023-01-11 21:34:26,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 475 2023-01-11T21:38:06.3049054Z [2023-01-11 21:34:26,514] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 475 2023-01-11T21:38:06.3049060Z 2023-01-11T21:38:06.3049158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3049229Z import torch 2023-01-11T21:38:06.3049378Z import random 2023-01-11T21:38:06.3049500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3049626Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3049631Z 2023-01-11T21:38:06.3049714Z aten = torch.ops.aten 2023-01-11T21:38:06.3049854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3049955Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3049960Z 2023-01-11T21:38:06.3050036Z import triton 2023-01-11T21:38:06.3050124Z import triton.language as tl 2023-01-11T21:38:06.3050251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3050394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3050399Z 2023-01-11T21:38:06.3050404Z 2023-01-11T21:38:06.3050572Z triton_fused_add_cos_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3050649Z import triton 2023-01-11T21:38:06.3050745Z import triton.language as tl 2023-01-11T21:38:06.3050864Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3050961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3051095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3051221Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3051227Z 2023-01-11T21:38:06.3051650Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3051726Z @triton.jit 2023-01-11T21:38:06.3051871Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3051984Z xnumel = 256 2023-01-11T21:38:06.3052083Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3052206Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3052292Z xmask = xindex < xnumel 2023-01-11T21:38:06.3052371Z x0 = xindex 2023-01-11T21:38:06.3052564Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3052664Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3052747Z tmp1 = tl.cos(tmp0) 2023-01-11T21:38:06.3052818Z tmp2 = 2 2023-01-11T21:38:06.3052893Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3052966Z tmp5 = 1 2023-01-11T21:38:06.3053046Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3053126Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.3053266Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3053402Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3053492Z ''') 2023-01-11T21:38:06.3053498Z 2023-01-11T21:38:06.3053502Z 2023-01-11T21:38:06.3053598Z async_compile.wait(globals()) 2023-01-11T21:38:06.3053671Z del async_compile 2023-01-11T21:38:06.3053676Z 2023-01-11T21:38:06.3053752Z def 
call(args): 2023-01-11T21:38:06.3053827Z arg0_1, = args 2023-01-11T21:38:06.3053905Z args.clear() 2023-01-11T21:38:06.3053998Z with torch.cuda.device(0): 2023-01-11T21:38:06.3054208Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3054408Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3054601Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3054753Z triton_fused_add_cos_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3054828Z del arg0_1 2023-01-11T21:38:06.3054912Z return (buf0, buf1, ) 2023-01-11T21:38:06.3054918Z 2023-01-11T21:38:06.3054926Z 2023-01-11T21:38:06.3055009Z if __name__ == "__main__": 2023-01-11T21:38:06.3055129Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3055258Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3055490Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3055659Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3055667Z 2023-01-11T21:38:06.3055672Z 2023-01-11T21:38:06.3055777Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3055851Z import torch 2023-01-11T21:38:06.3055925Z import random 2023-01-11T21:38:06.3056046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3056170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3056175Z 2023-01-11T21:38:06.3056260Z aten = torch.ops.aten 2023-01-11T21:38:06.3056391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3056489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3056497Z 2023-01-11T21:38:06.3056573Z import triton 2023-01-11T21:38:06.3056666Z import triton.language as tl 2023-01-11T21:38:06.3056796Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3056937Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3056942Z 2023-01-11T21:38:06.3056949Z 2023-01-11T21:38:06.3057113Z triton_fused_add_cos_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3057244Z import triton 2023-01-11T21:38:06.3057333Z import triton.language as tl 2023-01-11T21:38:06.3057449Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3057552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3057688Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3057814Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3057820Z 2023-01-11T21:38:06.3058243Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3058356Z @triton.jit 2023-01-11T21:38:06.3058503Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3058573Z xnumel = 256 2023-01-11T21:38:06.3058675Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3058806Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3058892Z xmask = xindex < xnumel 2023-01-11T21:38:06.3058963Z x0 = xindex 2023-01-11T21:38:06.3059178Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3059297Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3059381Z tmp1 = tl.cos(tmp0) 
2023-01-11T21:38:06.3059447Z tmp2 = 2 2023-01-11T21:38:06.3059526Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3059599Z tmp5 = 1 2023-01-11T21:38:06.3059681Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3059761Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.3059901Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3060030Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3060118Z ''') 2023-01-11T21:38:06.3060124Z 2023-01-11T21:38:06.3060131Z 2023-01-11T21:38:06.3060227Z async_compile.wait(globals()) 2023-01-11T21:38:06.3060306Z del async_compile 2023-01-11T21:38:06.3060311Z 2023-01-11T21:38:06.3060387Z def call(args): 2023-01-11T21:38:06.3060462Z arg0_1, = args 2023-01-11T21:38:06.3060538Z args.clear() 2023-01-11T21:38:06.3060633Z with torch.cuda.device(0): 2023-01-11T21:38:06.3060833Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3061032Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3061125Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3061279Z triton_fused_add_cos_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3061353Z del arg0_1 2023-01-11T21:38:06.3061437Z return (buf0, buf1, ) 2023-01-11T21:38:06.3061442Z 2023-01-11T21:38:06.3061447Z 2023-01-11T21:38:06.3061528Z if __name__ == "__main__": 2023-01-11T21:38:06.3061680Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3061801Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3062003Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3062113Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3062119Z 2023-01-11T21:38:06.3062188Z ok (0.222s) 2023-01-11T21:38:06.3062359Z test_cpp_wrapper_cuda (__main__.CudaTests) ... skip: cpp_wrapper only supports cpu (0.001s) 2023-01-11T21:38:06.3062814Z test_cudnn_rnn_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3062953Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3063208Z [2023-01-11 21:34:28,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 476 2023-01-11T21:38:06.3063430Z [2023-01-11 21:34:28,989] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._cudnn_rnn 2023-01-11T21:38:06.3063686Z [2023-01-11 21:34:28,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 476 2023-01-11T21:38:06.3063699Z 2023-01-11T21:38:06.3063790Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3063863Z import torch 2023-01-11T21:38:06.3063937Z import random 2023-01-11T21:38:06.3064057Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3064206Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3064212Z 2023-01-11T21:38:06.3064292Z aten = torch.ops.aten 2023-01-11T21:38:06.3064428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3064516Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3064523Z 2023-01-11T21:38:06.3064596Z import triton 2023-01-11T21:38:06.3064688Z import triton.language as tl 2023-01-11T21:38:06.3064813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3064952Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3064957Z 2023-01-11T21:38:06.3064962Z 2023-01-11T21:38:06.3065055Z async_compile.wait(globals()) 2023-01-11T21:38:06.3065130Z del async_compile 2023-01-11T21:38:06.3065135Z 2023-01-11T21:38:06.3065208Z def call(args): 2023-01-11T21:38:06.3065395Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1 = args 2023-01-11T21:38:06.3065491Z args.clear() 2023-01-11T21:38:06.3065592Z with torch.cuda.device(0): 2023-01-11T21:38:06.3065840Z buf0 = aten._cudnn_rnn(arg0_1, [arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1], 4, arg17_1, arg18_1, arg19_1, 2, 2048, 0, 2, False, 0.0, False, True, [], None) 2023-01-11T21:38:06.3065913Z del arg0_1 2023-01-11T21:38:06.3065984Z del arg10_1 2023-01-11T21:38:06.3066055Z del arg11_1 2023-01-11T21:38:06.3066126Z del arg12_1 2023-01-11T21:38:06.3066189Z del arg13_1 2023-01-11T21:38:06.3066260Z del arg14_1 2023-01-11T21:38:06.3066328Z del arg15_1 2023-01-11T21:38:06.3066398Z del arg16_1 2023-01-11T21:38:06.3066467Z del arg17_1 2023-01-11T21:38:06.3066536Z del arg18_1 2023-01-11T21:38:06.3066599Z del arg19_1 2023-01-11T21:38:06.3066672Z del arg1_1 2023-01-11T21:38:06.3066740Z del arg2_1 2023-01-11T21:38:06.3066811Z del arg3_1 2023-01-11T21:38:06.3066882Z del arg4_1 2023-01-11T21:38:06.3066951Z del arg5_1 2023-01-11T21:38:06.3067021Z del arg6_1 2023-01-11T21:38:06.3067084Z del arg7_1 2023-01-11T21:38:06.3067154Z del arg8_1 2023-01-11T21:38:06.3067251Z del arg9_1 2023-01-11T21:38:06.3067326Z buf1 = buf0[0] 2023-01-11T21:38:06.3067439Z assert_size_stride(buf1, (92, 8, 4096), (32768, 4096, 1)) 2023-01-11T21:38:06.3067514Z buf2 = buf0[1] 2023-01-11T21:38:06.3067626Z assert_size_stride(buf2, (4, 8, 2048), (16384, 2048, 1)) 2023-01-11T21:38:06.3067694Z buf3 = buf0[2] 2023-01-11T21:38:06.3067804Z assert_size_stride(buf3, (4, 8, 2048), (16384, 2048, 1)) 2023-01-11T21:38:06.3067881Z buf4 = buf0[3] 
2023-01-11T21:38:06.3067981Z assert_size_stride(buf4, (0, ), (1, )) 2023-01-11T21:38:06.3068056Z buf5 = buf0[4] 2023-01-11T21:38:06.3068167Z assert_size_stride(buf5, (167837696, ), (1, )) 2023-01-11T21:38:06.3068238Z del buf0 2023-01-11T21:38:06.3068334Z return (buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.3068339Z 2023-01-11T21:38:06.3068344Z 2023-01-11T21:38:06.3068423Z if __name__ == "__main__": 2023-01-11T21:38:06.3068544Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3068669Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3068895Z arg0_1 = rand_strided((92, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069106Z arg1_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069313Z arg2_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069512Z arg3_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069703Z arg4_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3069934Z arg5_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070136Z arg6_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070332Z arg7_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070526Z arg8_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070726Z arg9_1 = rand_strided((8192, 4096), (4096, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3070935Z arg10_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071136Z arg11_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071326Z arg12_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071535Z arg13_1 = rand_strided((8192, 4096), (4096, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071742Z arg14_1 = rand_strided((8192, 2048), (2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3071937Z arg15_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072136Z arg16_1 = rand_strided((8192, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072342Z arg17_1 = rand_strided((167837696, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072558Z arg18_1 = rand_strided((4, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072773Z arg19_1 = rand_strided((4, 8, 2048), (16384, 2048, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3072990Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1])) 2023-01-11T21:38:06.3072999Z 2023-01-11T21:38:06.3073063Z ok (2.801s) 2023-01-11T21:38:06.3073549Z test_dense_mask_index_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3073683Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3073942Z [2023-01-11 21:34:29,342] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 477 2023-01-11T21:38:06.3074204Z [2023-01-11 21:34:29,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 477 2023-01-11T21:38:06.3074618Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3074755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3075011Z [2023-01-11 21:34:29,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 478 2023-01-11T21:38:06.3075017Z 2023-01-11T21:38:06.3075114Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3075187Z import torch 2023-01-11T21:38:06.3075260Z import random 2023-01-11T21:38:06.3075372Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3075495Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3075500Z 2023-01-11T21:38:06.3075582Z aten = torch.ops.aten 2023-01-11T21:38:06.3075721Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3075817Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3075850Z 2023-01-11T21:38:06.3075925Z import triton 2023-01-11T21:38:06.3076018Z import triton.language as tl 2023-01-11T21:38:06.3076136Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3076273Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3076279Z 2023-01-11T21:38:06.3076285Z 2023-01-11T21:38:06.3076460Z triton_fused_mul_select_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3076537Z import triton 2023-01-11T21:38:06.3076628Z import triton.language as tl 2023-01-11T21:38:06.3076742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3076842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3076972Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3077090Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3077095Z 2023-01-11T21:38:06.3077187Z @reduction(size_hints=[16, 8192], 2023-01-11T21:38:06.3077304Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3077395Z filename=__file__, 2023-01-11T21:38:06.3077772Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3077852Z @triton.jit 2023-01-11T21:38:06.3078028Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3078102Z xnumel = 13 2023-01-11T21:38:06.3078170Z rnumel = 7877 2023-01-11T21:38:06.3078267Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3078401Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3078484Z xmask = xindex < xnumel 2023-01-11T21:38:06.3078600Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 
2023-01-11T21:38:06.3078671Z x0 = xindex 2023-01-11T21:38:06.3078811Z tmp4 = tl.load(in_ptr1 + (2 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3078924Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3079028Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3079119Z rindex = roffset + rbase 2023-01-11T21:38:06.3079204Z rmask = rindex < rnumel 2023-01-11T21:38:06.3079300Z r1 = rindex 2023-01-11T21:38:06.3079382Z tmp0 = r1 + (7877*x0) 2023-01-11T21:38:06.3079458Z tmp1 = 102400 2023-01-11T21:38:06.3079532Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3079703Z tmp3 = tl.load(in_ptr0 + (r1 + (7877*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3079784Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3079880Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3080002Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3080116Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3080219Z tl.store(out_ptr0 + x0, tmp7, xmask) 2023-01-11T21:38:06.3080297Z ''') 2023-01-11T21:38:06.3080303Z 2023-01-11T21:38:06.3080313Z 2023-01-11T21:38:06.3080483Z triton_fused_mul_select_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3080556Z import triton 2023-01-11T21:38:06.3080648Z import triton.language as tl 2023-01-11T21:38:06.3080764Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3080865Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3080997Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3081123Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3081129Z 2023-01-11T21:38:06.3081214Z @reduction(size_hints=[1, 16], 2023-01-11T21:38:06.3081328Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3081415Z filename=__file__, 2023-01-11T21:38:06.3081774Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3081873Z @triton.jit 2023-01-11T21:38:06.3082041Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3082115Z xnumel = 1 2023-01-11T21:38:06.3082186Z rnumel = 13 2023-01-11T21:38:06.3082278Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3082415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3082498Z xmask = xindex < xnumel 2023-01-11T21:38:06.3082616Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3082734Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3082838Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3082927Z rindex = roffset + rbase 2023-01-11T21:38:06.3083005Z rmask = rindex < rnumel 2023-01-11T21:38:06.3083075Z r0 = rindex 2023-01-11T21:38:06.3083181Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.3083303Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3083416Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3083551Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.3083640Z ''') 2023-01-11T21:38:06.3083648Z 2023-01-11T21:38:06.3083652Z 2023-01-11T21:38:06.3083738Z async_compile.wait(globals()) 2023-01-11T21:38:06.3083817Z del async_compile 2023-01-11T21:38:06.3083822Z 2023-01-11T21:38:06.3083895Z def call(args): 2023-01-11T21:38:06.3083974Z arg0_1, 
arg1_1 = args 2023-01-11T21:38:06.3084048Z args.clear() 2023-01-11T21:38:06.3084141Z with torch.cuda.device(0): 2023-01-11T21:38:06.3084339Z buf0 = empty_strided((13, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3084424Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3084582Z triton_fused_mul_select_sum_1_0.run(arg0_1, arg1_1, buf0, 13, 7877, grid=grid(13), stream=stream0) 2023-01-11T21:38:06.3084658Z del arg0_1 2023-01-11T21:38:06.3084731Z del arg1_1 2023-01-11T21:38:06.3084920Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3085069Z triton_fused_mul_select_sum_1_1.run(buf0, buf1, 1, 13, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.3085176Z return (buf1, ) 2023-01-11T21:38:06.3085181Z 2023-01-11T21:38:06.3085186Z 2023-01-11T21:38:06.3085270Z if __name__ == "__main__": 2023-01-11T21:38:06.3085381Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3085529Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3085765Z arg0_1 = rand_strided((102400, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3085960Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3086080Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3086088Z 2023-01-11T21:38:06.3086353Z [2023-01-11 21:34:29,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 478 2023-01-11T21:38:06.3086359Z 2023-01-11T21:38:06.3086462Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3086535Z import torch 2023-01-11T21:38:06.3086603Z import random 2023-01-11T21:38:06.3086723Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3086849Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3086854Z 2023-01-11T21:38:06.3086936Z aten = torch.ops.aten 2023-01-11T21:38:06.3087074Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3087170Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3087176Z 2023-01-11T21:38:06.3087249Z import triton 2023-01-11T21:38:06.3087344Z import triton.language as tl 2023-01-11T21:38:06.3087462Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3087601Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3087633Z 2023-01-11T21:38:06.3087637Z 2023-01-11T21:38:06.3087808Z triton_fused_mul_select_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3087884Z import triton 2023-01-11T21:38:06.3087974Z import triton.language as tl 2023-01-11T21:38:06.3088088Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3088192Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3088324Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3088443Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3088448Z 2023-01-11T21:38:06.3088540Z @reduction(size_hints=[16, 8192], 2023-01-11T21:38:06.3088657Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3088742Z filename=__file__, 2023-01-11T21:38:06.3089114Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3089194Z @triton.jit 2023-01-11T21:38:06.3089368Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3089442Z xnumel = 13 2023-01-11T21:38:06.3089510Z rnumel = 7877 2023-01-11T21:38:06.3089610Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3089747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3089832Z xmask = xindex < xnumel 2023-01-11T21:38:06.3089949Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3090019Z x0 = xindex 2023-01-11T21:38:06.3090174Z tmp4 = tl.load(in_ptr1 + (2 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3090285Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3090391Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3090477Z rindex = roffset + rbase 2023-01-11T21:38:06.3090566Z rmask = rindex < rnumel 2023-01-11T21:38:06.3090639Z r1 = rindex 2023-01-11T21:38:06.3090717Z tmp0 = r1 + (7877*x0) 2023-01-11T21:38:06.3090790Z tmp1 = 102400 2023-01-11T21:38:06.3090864Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3091081Z tmp3 = tl.load(in_ptr0 + (r1 + (7877*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3091165Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3091261Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3091383Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3091497Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3091596Z tl.store(out_ptr0 + x0, tmp7, xmask) 2023-01-11T21:38:06.3091674Z ''') 2023-01-11T21:38:06.3091680Z 2023-01-11T21:38:06.3091691Z 2023-01-11T21:38:06.3091859Z triton_fused_mul_select_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3091934Z import triton 2023-01-11T21:38:06.3092029Z import triton.language as tl 2023-01-11T21:38:06.3092143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3092243Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3092373Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3092497Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3092504Z 2023-01-11T21:38:06.3092587Z @reduction(size_hints=[1, 16], 2023-01-11T21:38:06.3092701Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.3092785Z filename=__file__, 2023-01-11T21:38:06.3093141Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3093217Z @triton.jit 2023-01-11T21:38:06.3093386Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3093460Z xnumel = 1 2023-01-11T21:38:06.3093565Z rnumel = 13 2023-01-11T21:38:06.3093660Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3093797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3093880Z xmask = xindex < xnumel 2023-01-11T21:38:06.3093996Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3094116Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3094221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3094308Z rindex = roffset + rbase 2023-01-11T21:38:06.3094387Z rmask = rindex < rnumel 2023-01-11T21:38:06.3094460Z r0 = rindex 2023-01-11T21:38:06.3094666Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.3094788Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3094902Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 
1]) 2023-01-11T21:38:06.3095035Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.3095125Z ''') 2023-01-11T21:38:06.3095131Z 2023-01-11T21:38:06.3095135Z 2023-01-11T21:38:06.3095222Z async_compile.wait(globals()) 2023-01-11T21:38:06.3095300Z del async_compile 2023-01-11T21:38:06.3095305Z 2023-01-11T21:38:06.3095380Z def call(args): 2023-01-11T21:38:06.3095464Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3095564Z args.clear() 2023-01-11T21:38:06.3095666Z with torch.cuda.device(0): 2023-01-11T21:38:06.3095880Z buf0 = empty_strided((13, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3095965Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3096124Z triton_fused_mul_select_sum_1_0.run(arg0_1, arg1_1, buf0, 13, 7877, grid=grid(13), stream=stream0) 2023-01-11T21:38:06.3096197Z del arg0_1 2023-01-11T21:38:06.3096270Z del arg1_1 2023-01-11T21:38:06.3096457Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3096604Z triton_fused_mul_select_sum_1_1.run(buf0, buf1, 1, 13, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.3096683Z return (buf1, ) 2023-01-11T21:38:06.3096688Z 2023-01-11T21:38:06.3096693Z 2023-01-11T21:38:06.3096772Z if __name__ == "__main__": 2023-01-11T21:38:06.3096884Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3097054Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3097323Z arg0_1 = rand_strided((102400, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3097519Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3097640Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3097646Z 2023-01-11T21:38:06.3097714Z ok (0.322s) 2023-01-11T21:38:06.3098165Z test_div1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3098299Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3098561Z [2023-01-11 21:34:29,694] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 479 2023-01-11T21:38:06.3098816Z [2023-01-11 21:34:29,968] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 479 2023-01-11T21:38:06.3099232Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3099364Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3099659Z [2023-01-11 21:34:30,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 480 2023-01-11T21:38:06.3099922Z [2023-01-11 21:34:30,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 480 2023-01-11T21:38:06.3099927Z 2023-01-11T21:38:06.3100028Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3100102Z import torch 2023-01-11T21:38:06.3100175Z import random 2023-01-11T21:38:06.3100292Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3100409Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3100420Z 2023-01-11T21:38:06.3100495Z aten = torch.ops.aten 2023-01-11T21:38:06.3100633Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3100729Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3100734Z 2023-01-11T21:38:06.3100807Z import triton 2023-01-11T21:38:06.3100898Z import triton.language as tl 2023-01-11T21:38:06.3101031Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3101170Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3101176Z 2023-01-11T21:38:06.3101180Z 2023-01-11T21:38:06.3101371Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3101439Z import triton 2023-01-11T21:38:06.3101532Z import triton.language as tl 2023-01-11T21:38:06.3101645Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3101747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3101878Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3102001Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3102006Z 2023-01-11T21:38:06.3102486Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3102561Z @triton.jit 2023-01-11T21:38:06.3102733Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3102806Z xnumel = 64 2023-01-11T21:38:06.3102933Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3103065Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3103149Z xmask = xindex < xnumel 2023-01-11T21:38:06.3103219Z x0 = xindex 2023-01-11T21:38:06.3103412Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3103594Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3103691Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3103786Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3103866Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3103964Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3104064Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3104140Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3104229Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3104363Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3104499Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3104628Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3104757Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3104883Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3104968Z ''') 2023-01-11T21:38:06.3104974Z 2023-01-11T21:38:06.3104978Z 2023-01-11T21:38:06.3105071Z async_compile.wait(globals()) 2023-01-11T21:38:06.3105140Z del async_compile 2023-01-11T21:38:06.3105146Z 2023-01-11T21:38:06.3105220Z def call(args): 2023-01-11T21:38:06.3105298Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3105402Z args.clear() 2023-01-11T21:38:06.3105497Z with torch.cuda.device(0): 2023-01-11T21:38:06.3105696Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3105894Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106085Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106284Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106477Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3106569Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3106748Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3106822Z del arg0_1 2023-01-11T21:38:06.3106893Z del arg1_1 2023-01-11T21:38:06.3107003Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3107008Z 2023-01-11T21:38:06.3107013Z 2023-01-11T21:38:06.3107092Z if __name__ == "__main__": 2023-01-11T21:38:06.3107205Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3107330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3107530Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3107725Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3107840Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3107846Z 2023-01-11T21:38:06.3107850Z 2023-01-11T21:38:06.3107946Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3108020Z import torch 2023-01-11T21:38:06.3108088Z import random 2023-01-11T21:38:06.3108205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3108327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3108336Z 2023-01-11T21:38:06.3108417Z aten = torch.ops.aten 2023-01-11T21:38:06.3108554Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3108646Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3108651Z 2023-01-11T21:38:06.3108725Z import triton 2023-01-11T21:38:06.3108816Z import triton.language as tl 2023-01-11T21:38:06.3108971Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3109111Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3109117Z 2023-01-11T21:38:06.3109122Z 2023-01-11T21:38:06.3109311Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3109385Z import triton 2023-01-11T21:38:06.3109480Z import triton.language as tl 2023-01-11T21:38:06.3109595Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3109699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3109829Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3109950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3109955Z 2023-01-11T21:38:06.3110438Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3110512Z @triton.jit 2023-01-11T21:38:06.3110691Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3110764Z xnumel = 64 2023-01-11T21:38:06.3110862Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3110989Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3111071Z xmask = xindex < xnumel 2023-01-11T21:38:06.3111135Z x0 = xindex 2023-01-11T21:38:06.3111348Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3111587Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3111704Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3111817Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3111896Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3111997Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3112086Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3112165Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3112261Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3112395Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3112528Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3112661Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3112788Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3112916Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3112995Z ''') 2023-01-11T21:38:06.3113000Z 2023-01-11T21:38:06.3113004Z 2023-01-11T21:38:06.3113097Z async_compile.wait(globals()) 2023-01-11T21:38:06.3113173Z del async_compile 2023-01-11T21:38:06.3113178Z 2023-01-11T21:38:06.3113253Z def call(args): 2023-01-11T21:38:06.3113331Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3113406Z args.clear() 2023-01-11T21:38:06.3113497Z with torch.cuda.device(0): 2023-01-11T21:38:06.3113691Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3113886Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114084Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114277Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114466Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3114561Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3114737Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3114811Z del arg0_1 2023-01-11T21:38:06.3114902Z del arg1_1 2023-01-11T21:38:06.3115009Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3115015Z 2023-01-11T21:38:06.3115019Z 2023-01-11T21:38:06.3115102Z if __name__ == "__main__": 
2023-01-11T21:38:06.3115243Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3115383Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3115595Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3115790Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3115908Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3115916Z 2023-01-11T21:38:06.3115979Z ok (0.544s) 2023-01-11T21:38:06.3116433Z test_div2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3116565Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3116819Z [2023-01-11 21:34:30,238] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 481 2023-01-11T21:38:06.3117084Z [2023-01-11 21:34:30,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 481 2023-01-11T21:38:06.3117498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3117658Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3117911Z [2023-01-11 21:34:30,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 482 2023-01-11T21:38:06.3118171Z [2023-01-11 21:34:30,616] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 482 2023-01-11T21:38:06.3118177Z 2023-01-11T21:38:06.3118279Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3118353Z import torch 2023-01-11T21:38:06.3118420Z import random 2023-01-11T21:38:06.3118538Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3118662Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3118670Z 2023-01-11T21:38:06.3118752Z aten = torch.ops.aten 2023-01-11T21:38:06.3118889Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3118985Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3118990Z 2023-01-11T21:38:06.3119063Z import triton 2023-01-11T21:38:06.3119156Z import triton.language as tl 2023-01-11T21:38:06.3119277Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3119418Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3119424Z 2023-01-11T21:38:06.3119428Z 2023-01-11T21:38:06.3119617Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3119690Z import triton 2023-01-11T21:38:06.3119780Z import triton.language as tl 2023-01-11T21:38:06.3119895Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3119994Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3120120Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3120248Z from torch._inductor.utils import instance_descriptor 
2023-01-11T21:38:06.3120253Z 2023-01-11T21:38:06.3120755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3120830Z @triton.jit 2023-01-11T21:38:06.3121011Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3121085Z xnumel = 64 2023-01-11T21:38:06.3121182Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3121312Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3121394Z xmask = xindex < xnumel 2023-01-11T21:38:06.3121458Z x0 = xindex 2023-01-11T21:38:06.3121651Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3121842Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3121939Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3122034Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3122123Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3122205Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3122298Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3122393Z tmp5 = tl.libdevice.trunc(tmp3) 2023-01-11T21:38:06.3122480Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.3122557Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3122654Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.3122787Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3122921Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3123045Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3123176Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3123334Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3123421Z ''') 2023-01-11T21:38:06.3123427Z 2023-01-11T21:38:06.3123431Z 2023-01-11T21:38:06.3123525Z async_compile.wait(globals()) 2023-01-11T21:38:06.3123599Z del async_compile 2023-01-11T21:38:06.3123608Z 2023-01-11T21:38:06.3123682Z def call(args): 2023-01-11T21:38:06.3123761Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3123829Z args.clear() 2023-01-11T21:38:06.3123922Z with torch.cuda.device(0): 2023-01-11T21:38:06.3124122Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124320Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124515Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124707Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124901Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3124986Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3125165Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3125239Z del arg0_1 2023-01-11T21:38:06.3125329Z del arg1_1 2023-01-11T21:38:06.3125442Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3125449Z 2023-01-11T21:38:06.3125455Z 2023-01-11T21:38:06.3125554Z if __name__ == "__main__": 2023-01-11T21:38:06.3125674Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.3125799Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3125990Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3126189Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3126311Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3126316Z 2023-01-11T21:38:06.3126321Z 2023-01-11T21:38:06.3126418Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3126491Z import torch 2023-01-11T21:38:06.3126565Z import random 2023-01-11T21:38:06.3126713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3126837Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3126843Z 2023-01-11T21:38:06.3126917Z aten = torch.ops.aten 2023-01-11T21:38:06.3127053Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3127150Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3127155Z 2023-01-11T21:38:06.3127234Z import triton 2023-01-11T21:38:06.3127326Z import triton.language as tl 2023-01-11T21:38:06.3127451Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3127594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3127603Z 2023-01-11T21:38:06.3127607Z 2023-01-11T21:38:06.3127798Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3127866Z import triton 2023-01-11T21:38:06.3127957Z import triton.language as tl 2023-01-11T21:38:06.3128070Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3128175Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3128306Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3128430Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3128435Z 2023-01-11T21:38:06.3128913Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3128987Z @triton.jit 2023-01-11T21:38:06.3129163Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3129259Z xnumel = 64 2023-01-11T21:38:06.3129356Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3129485Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3129570Z xmask = xindex < xnumel 2023-01-11T21:38:06.3129642Z x0 = xindex 2023-01-11T21:38:06.3129830Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3130043Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3130134Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3130251Z tmp8 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3130339Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3130418Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3130517Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3130613Z tmp5 = tl.libdevice.trunc(tmp3) 2023-01-11T21:38:06.3130705Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.3130776Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3130874Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.3131006Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 
2023-01-11T21:38:06.3131138Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3131271Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3131400Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3131531Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3131609Z ''') 2023-01-11T21:38:06.3131622Z 2023-01-11T21:38:06.3131626Z 2023-01-11T21:38:06.3131712Z async_compile.wait(globals()) 2023-01-11T21:38:06.3131789Z del async_compile 2023-01-11T21:38:06.3131794Z 2023-01-11T21:38:06.3131868Z def call(args): 2023-01-11T21:38:06.3131946Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3132023Z args.clear() 2023-01-11T21:38:06.3132115Z with torch.cuda.device(0): 2023-01-11T21:38:06.3132312Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132502Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132725Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3132921Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3133110Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3133201Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3133381Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3133455Z del arg0_1 2023-01-11T21:38:06.3133526Z del arg1_1 2023-01-11T21:38:06.3133621Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3133631Z 2023-01-11T21:38:06.3133635Z 2023-01-11T21:38:06.3133714Z if __name__ == "__main__": 2023-01-11T21:38:06.3133836Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3133961Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3134161Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3134358Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3134588Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3134594Z 2023-01-11T21:38:06.3134667Z ok (0.434s) 2023-01-11T21:38:06.3135117Z test_div3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3135301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3135567Z [2023-01-11 21:34:30,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 483 2023-01-11T21:38:06.3135833Z [2023-01-11 21:34:30,769] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 483 2023-01-11T21:38:06.3136246Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3136384Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3136709Z [2023-01-11 21:34:30,815] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 484 2023-01-11T21:38:06.3136976Z [2023-01-11 21:34:30,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 484 2023-01-11T21:38:06.3136982Z 2023-01-11T21:38:06.3137081Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3137215Z import torch 2023-01-11T21:38:06.3137291Z import random 2023-01-11T21:38:06.3137414Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3137537Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3137542Z 2023-01-11T21:38:06.3137625Z aten = torch.ops.aten 2023-01-11T21:38:06.3137762Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3137857Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3137862Z 2023-01-11T21:38:06.3137935Z import triton 2023-01-11T21:38:06.3138027Z import triton.language as tl 2023-01-11T21:38:06.3138145Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3138283Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3138291Z 2023-01-11T21:38:06.3138296Z 2023-01-11T21:38:06.3138489Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3138564Z import triton 2023-01-11T21:38:06.3138658Z import triton.language as tl 2023-01-11T21:38:06.3138818Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3138921Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3139052Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3139171Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3139176Z 2023-01-11T21:38:06.3139642Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3139716Z @triton.jit 2023-01-11T21:38:06.3139894Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3139967Z xnumel = 64 2023-01-11T21:38:06.3140063Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3140192Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3140278Z xmask = xindex < xnumel 2023-01-11T21:38:06.3140342Z x0 = xindex 2023-01-11T21:38:06.3140533Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3140723Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3140894Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3140992Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3141082Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3141168Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3141240Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3141499Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3141650Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3141905Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // 
tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3142047Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3142182Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3142316Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3142446Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3142570Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3142657Z ''') 2023-01-11T21:38:06.3142663Z 2023-01-11T21:38:06.3142668Z 2023-01-11T21:38:06.3142766Z async_compile.wait(globals()) 2023-01-11T21:38:06.3142846Z del async_compile 2023-01-11T21:38:06.3142854Z 2023-01-11T21:38:06.3142932Z def call(args): 2023-01-11T21:38:06.3143013Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3143089Z args.clear() 2023-01-11T21:38:06.3143183Z with torch.cuda.device(0): 2023-01-11T21:38:06.3143378Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3143578Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3143774Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3143973Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3144168Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3144263Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3144445Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3144523Z del arg0_1 2023-01-11T21:38:06.3144591Z del arg1_1 2023-01-11T21:38:06.3144697Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3144703Z 2023-01-11T21:38:06.3144707Z 2023-01-11T21:38:06.3144789Z if __name__ == "__main__": 2023-01-11T21:38:06.3144909Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3145066Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3145265Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3145463Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3145601Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3145608Z 2023-01-11T21:38:06.3145614Z 2023-01-11T21:38:06.3145713Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3145808Z import torch 2023-01-11T21:38:06.3145887Z import random 2023-01-11T21:38:06.3146006Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3146135Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3146141Z 2023-01-11T21:38:06.3146225Z aten = torch.ops.aten 2023-01-11T21:38:06.3146363Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3146454Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3146465Z 2023-01-11T21:38:06.3146538Z import triton 2023-01-11T21:38:06.3146633Z import triton.language as tl 2023-01-11T21:38:06.3146759Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3146900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3146905Z 2023-01-11T21:38:06.3146910Z 2023-01-11T21:38:06.3147101Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3147178Z import triton 2023-01-11T21:38:06.3147271Z import triton.language as tl 
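# [editor's note, not part of the captured log] A reading aid for the kernel
# source being dumped here: the name triton_fused_div_div_1_div_2_div_3_div_4_0
# records which FX nodes Inductor fused -- five aten.div calls (div through
# div_4: the same operands divided with rounding_mode=None, "floor", and
# "trunc", judging by the stores below) collapsed into one pointwise kernel --
# and the trailing _0 is the kernel's index within the compiled graph.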
2023-01-11T21:38:06.3147381Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3147485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3147651Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3147778Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3147784Z 2023-01-11T21:38:06.3148255Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3148331Z @triton.jit 2023-01-11T21:38:06.3148511Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3148587Z xnumel = 64 2023-01-11T21:38:06.3148686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3148813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3148898Z xmask = xindex < xnumel 2023-01-11T21:38:06.3148971Z x0 = xindex 2023-01-11T21:38:06.3149165Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3149355Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3149456Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3149555Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3149642Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3149731Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3149810Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3150066Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3150149Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3150399Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3150535Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3150662Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3150797Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3150928Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3151057Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3151173Z ''') 2023-01-11T21:38:06.3151179Z 2023-01-11T21:38:06.3151183Z 2023-01-11T21:38:06.3151281Z async_compile.wait(globals()) 2023-01-11T21:38:06.3151361Z del async_compile 2023-01-11T21:38:06.3151367Z 2023-01-11T21:38:06.3151443Z def call(args): 2023-01-11T21:38:06.3151517Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3151594Z args.clear() 2023-01-11T21:38:06.3151690Z with torch.cuda.device(0): 2023-01-11T21:38:06.3151891Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3152088Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152287Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152486Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3152679Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3152769Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3152954Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, 
buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3153030Z del arg0_1 2023-01-11T21:38:06.3153104Z del arg1_1 2023-01-11T21:38:06.3153210Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3153215Z 2023-01-11T21:38:06.3153220Z 2023-01-11T21:38:06.3153300Z if __name__ == "__main__": 2023-01-11T21:38:06.3153423Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3153552Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3153773Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3153967Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3154090Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3154095Z 2023-01-11T21:38:06.3154171Z ok (0.217s) 2023-01-11T21:38:06.3154623Z test_div4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3154756Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3155014Z [2023-01-11 21:34:30,880] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 485 2023-01-11T21:38:06.3155303Z [2023-01-11 21:34:30,900] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 485 2023-01-11T21:38:06.3155749Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3155881Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3156131Z [2023-01-11 21:34:30,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 486 2023-01-11T21:38:06.3156393Z [2023-01-11 21:34:30,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 486 2023-01-11T21:38:06.3156399Z 2023-01-11T21:38:06.3156498Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3156576Z import torch 2023-01-11T21:38:06.3156654Z import random 2023-01-11T21:38:06.3156775Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3156901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3156906Z 2023-01-11T21:38:06.3156992Z aten = torch.ops.aten 2023-01-11T21:38:06.3157148Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3157247Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3157252Z 2023-01-11T21:38:06.3157328Z import triton 2023-01-11T21:38:06.3157422Z import triton.language as tl 2023-01-11T21:38:06.3157548Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3157691Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3157696Z 2023-01-11T21:38:06.3157701Z 2023-01-11T21:38:06.3157893Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3157970Z import triton 2023-01-11T21:38:06.3158058Z import triton.language as tl 2023-01-11T21:38:06.3158176Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3158278Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3158410Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3158536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3158541Z 2023-01-11T21:38:06.3159012Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3159088Z @triton.jit 2023-01-11T21:38:06.3159264Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3159339Z xnumel = 64 2023-01-11T21:38:06.3159432Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3159562Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3159675Z xmask = xindex < xnumel 2023-01-11T21:38:06.3159748Z x0 = xindex 2023-01-11T21:38:06.3159939Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3160130Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3160232Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3160323Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3160414Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3160503Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3160582Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3160836Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3160917Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3161167Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // 
tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3161300Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3161434Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3161570Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3161703Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3161829Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3161918Z ''') 2023-01-11T21:38:06.3161924Z 2023-01-11T21:38:06.3161928Z 2023-01-11T21:38:06.3162023Z async_compile.wait(globals()) 2023-01-11T21:38:06.3162101Z del async_compile 2023-01-11T21:38:06.3162107Z 2023-01-11T21:38:06.3162176Z def call(args): 2023-01-11T21:38:06.3162257Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3162334Z args.clear() 2023-01-11T21:38:06.3162426Z with torch.cuda.device(0): 2023-01-11T21:38:06.3162631Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3162831Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163026Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163225Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3163444Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3163540Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3163719Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3163794Z del arg0_1 2023-01-11T21:38:06.3163872Z del arg1_1 2023-01-11T21:38:06.3163977Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3163982Z 2023-01-11T21:38:06.3163987Z 2023-01-11T21:38:06.3164068Z if __name__ == "__main__": 2023-01-11T21:38:06.3164188Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3164313Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3164511Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3164704Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3164828Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3164833Z 2023-01-11T21:38:06.3164838Z 2023-01-11T21:38:06.3164936Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3165012Z import torch 2023-01-11T21:38:06.3165088Z import random 2023-01-11T21:38:06.3165203Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3165328Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3165333Z 2023-01-11T21:38:06.3165417Z aten = torch.ops.aten 2023-01-11T21:38:06.3165553Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3165650Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3165680Z 2023-01-11T21:38:06.3165758Z import triton 2023-01-11T21:38:06.3165853Z import triton.language as tl 2023-01-11T21:38:06.3165979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3166112Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3166118Z 2023-01-11T21:38:06.3166128Z 2023-01-11T21:38:06.3166315Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3166394Z import triton 2023-01-11T21:38:06.3166487Z import triton.language as tl 
2023-01-11T21:38:06.3172813Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3172933Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3173067Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3173194Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3173199Z 2023-01-11T21:38:06.3173688Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3173768Z @triton.jit 2023-01-11T21:38:06.3173950Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3174018Z xnumel = 64 2023-01-11T21:38:06.3174117Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3174248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3174333Z xmask = xindex < xnumel 2023-01-11T21:38:06.3174405Z x0 = xindex 2023-01-11T21:38:06.3174843Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3175032Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3175123Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3175224Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3175318Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3175408Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3175484Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3175741Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3175888Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3176145Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3176282Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3176416Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3176548Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3176679Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3176808Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3176898Z ''') 2023-01-11T21:38:06.3176904Z 2023-01-11T21:38:06.3176908Z 2023-01-11T21:38:06.3176998Z async_compile.wait(globals()) 2023-01-11T21:38:06.3177077Z del async_compile 2023-01-11T21:38:06.3177082Z 2023-01-11T21:38:06.3177215Z def call(args): 2023-01-11T21:38:06.3177303Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3177378Z args.clear() 2023-01-11T21:38:06.3177473Z with torch.cuda.device(0): 2023-01-11T21:38:06.3177678Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3177876Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178065Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178263Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3178454Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3178601Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3178786Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, 
buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3178863Z del arg0_1 2023-01-11T21:38:06.3178936Z del arg1_1 2023-01-11T21:38:06.3179038Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3179051Z 2023-01-11T21:38:06.3179056Z 2023-01-11T21:38:06.3179132Z if __name__ == "__main__": 2023-01-11T21:38:06.3179252Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3179377Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3179575Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3179771Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3179894Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3179899Z 2023-01-11T21:38:06.3179975Z ok (0.131s) 2023-01-11T21:38:06.3180435Z test_div5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3180570Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3180822Z [2023-01-11 21:34:31,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 487 2023-01-11T21:38:06.3181088Z [2023-01-11 21:34:31,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 487 2023-01-11T21:38:06.3181506Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3181640Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3181923Z [2023-01-11 21:34:31,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 488 2023-01-11T21:38:06.3182188Z [2023-01-11 21:34:31,163] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 488 2023-01-11T21:38:06.3182194Z 2023-01-11T21:38:06.3182293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3182371Z import torch 2023-01-11T21:38:06.3182450Z import random 2023-01-11T21:38:06.3182565Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3182693Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3182698Z 2023-01-11T21:38:06.3182782Z aten = torch.ops.aten 2023-01-11T21:38:06.3182925Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3183022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3183027Z 2023-01-11T21:38:06.3183104Z import triton 2023-01-11T21:38:06.3183199Z import triton.language as tl 2023-01-11T21:38:06.3183327Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3183464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3183469Z 2023-01-11T21:38:06.3183474Z 2023-01-11T21:38:06.3183667Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3183744Z import triton 2023-01-11T21:38:06.3183838Z import triton.language as tl 2023-01-11T21:38:06.3183955Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3184062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3184199Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3184320Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3184373Z 2023-01-11T21:38:06.3184826Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i64', 4: '*fp32', 5: '*i64', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3184903Z @triton.jit 2023-01-11T21:38:06.3185076Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3185151Z xnumel = 64 2023-01-11T21:38:06.3185250Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3185381Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3185472Z xmask = xindex < xnumel 2023-01-11T21:38:06.3185561Z x0 = xindex 2023-01-11T21:38:06.3185770Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3185868Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3185960Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3186030Z tmp2 = 16 2023-01-11T21:38:06.3186108Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3186362Z tmp4 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3186443Z tmp5 = tmp0 // tmp2 2023-01-11T21:38:06.3186687Z tmp7 = tl.where((tmp6 < 0) != (tmp2 < 0), tl.where(tmp6 % tmp2 != 0, tmp6 // tmp2 - 1, tmp6 // tmp2), tmp6 // tmp2) 2023-01-11T21:38:06.3186824Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3186957Z tl.store(out_ptr1 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3187089Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3187220Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3187350Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3187440Z ''') 2023-01-11T21:38:06.3187446Z 2023-01-11T21:38:06.3187450Z 2023-01-11T21:38:06.3187545Z async_compile.wait(globals()) 2023-01-11T21:38:06.3187616Z del async_compile 2023-01-11T21:38:06.3187621Z 2023-01-11T21:38:06.3187696Z def call(args): 2023-01-11T21:38:06.3187769Z arg0_1, = args 2023-01-11T21:38:06.3187844Z args.clear() 2023-01-11T21:38:06.3187966Z with torch.cuda.device(0): 2023-01-11T21:38:06.3188169Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3188365Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3188553Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3188755Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3188950Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3189043Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3189219Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3189295Z del arg0_1 2023-01-11T21:38:06.3189400Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3189405Z 2023-01-11T21:38:06.3189410Z 2023-01-11T21:38:06.3189493Z if __name__ == "__main__": 2023-01-11T21:38:06.3189612Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3189731Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3189929Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3190042Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3190047Z 2023-01-11T21:38:06.3190052Z 2023-01-11T21:38:06.3190150Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3190223Z import torch 2023-01-11T21:38:06.3190299Z import random 2023-01-11T21:38:06.3190420Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3190569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3190582Z 2023-01-11T21:38:06.3190659Z aten = torch.ops.aten 2023-01-11T21:38:06.3190797Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3190892Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3190897Z 2023-01-11T21:38:06.3190974Z import triton 2023-01-11T21:38:06.3191067Z import triton.language as tl 2023-01-11T21:38:06.3191193Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3191331Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3191337Z 2023-01-11T21:38:06.3191341Z 2023-01-11T21:38:06.3191525Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3191600Z import triton 2023-01-11T21:38:06.3191692Z import triton.language as tl 2023-01-11T21:38:06.3191806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3191908Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3192041Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3192167Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3192172Z 
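# [editor's note, not part of the captured log] A rough sketch of what the
# @pointwise decorator below encodes in this Inductor snapshot: size_hints=[64]
# is the element count rounded up to a power of two, used to select launch
# configs; 'signature' maps argument index to dtype ('*i64' = int64 pointer,
# '*fp32' = float32 pointer, 'i32' = scalar int32); and the instance_descriptor
# divisible_by_16 tuple lists the arguments the compiler may assume are
# divisible by 16 (aligned pointers, padded sizes), enabling vectorized loads.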
2023-01-11T21:38:06.3192633Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*i64', 3: '*i64', 4: '*fp32', 5: '*i64', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3192709Z @triton.jit 2023-01-11T21:38:06.3192873Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3192947Z xnumel = 64 2023-01-11T21:38:06.3193044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3193172Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3193257Z xmask = xindex < xnumel 2023-01-11T21:38:06.3193328Z x0 = xindex 2023-01-11T21:38:06.3193523Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3193616Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3193705Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3193779Z tmp2 = 16 2023-01-11T21:38:06.3193862Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.3194147Z tmp4 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3194228Z tmp5 = tmp0 // tmp2 2023-01-11T21:38:06.3194482Z tmp7 = tl.where((tmp6 < 0) != (tmp2 < 0), tl.where(tmp6 % tmp2 != 0, tmp6 // tmp2 - 1, tmp6 // tmp2), tmp6 // tmp2) 2023-01-11T21:38:06.3194610Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3194742Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3194872Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3195000Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3195131Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3195218Z ''') 2023-01-11T21:38:06.3195223Z 2023-01-11T21:38:06.3195228Z 2023-01-11T21:38:06.3195324Z async_compile.wait(globals()) 2023-01-11T21:38:06.3195401Z del async_compile 2023-01-11T21:38:06.3195408Z 2023-01-11T21:38:06.3195477Z def call(args): 2023-01-11T21:38:06.3195552Z arg0_1, = args 2023-01-11T21:38:06.3195629Z args.clear() 2023-01-11T21:38:06.3195720Z with torch.cuda.device(0): 2023-01-11T21:38:06.3195920Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3196117Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196312Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196509Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3196727Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3196822Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3196999Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3197075Z del arg0_1 2023-01-11T21:38:06.3197180Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3197185Z 2023-01-11T21:38:06.3197190Z 2023-01-11T21:38:06.3197271Z if __name__ == "__main__": 2023-01-11T21:38:06.3197390Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3197517Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3197709Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 
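# [editor's note, not part of the captured log] Each dumped module doubles as
# a standalone repro script: this __main__ block rebuilds the inputs with
# rand_strided using the exact shape, stride, device, and dtype recorded at
# compile time, and print_performance times call() on them. Saving the dump
# to a file and running it benchmarks this one compiled graph in isolation.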
2023-01-11T21:38:06.3197822Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3197827Z 2023-01-11T21:38:06.3197900Z ok (0.198s) 2023-01-11T21:38:06.3198355Z test_div6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3198492Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3198753Z [2023-01-11 21:34:31,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 489 2023-01-11T21:38:06.3199020Z [2023-01-11 21:34:31,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 489 2023-01-11T21:38:06.3199436Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3199572Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3199855Z [2023-01-11 21:34:31,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 490 2023-01-11T21:38:06.3200115Z [2023-01-11 21:34:31,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 490 2023-01-11T21:38:06.3200128Z 2023-01-11T21:38:06.3200220Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3200296Z import torch 2023-01-11T21:38:06.3200371Z import random 2023-01-11T21:38:06.3200491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3200617Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3200622Z 2023-01-11T21:38:06.3200704Z aten = torch.ops.aten 2023-01-11T21:38:06.3200841Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3200934Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3200939Z 2023-01-11T21:38:06.3201013Z import triton 2023-01-11T21:38:06.3201106Z import triton.language as tl 2023-01-11T21:38:06.3201232Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3201374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3201380Z 2023-01-11T21:38:06.3201384Z 2023-01-11T21:38:06.3201580Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3201654Z import triton 2023-01-11T21:38:06.3201749Z import triton.language as tl 2023-01-11T21:38:06.3201857Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3201961Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3202095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3202221Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3202227Z 2023-01-11T21:38:06.3202697Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 
2023-01-11T21:38:06.3202800Z @triton.jit 2023-01-11T21:38:06.3202981Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3203056Z xnumel = 64 2023-01-11T21:38:06.3203148Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3203280Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3203363Z xmask = xindex < xnumel 2023-01-11T21:38:06.3203434Z x0 = xindex 2023-01-11T21:38:06.3203625Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3203814Z tmp3 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3203914Z tmp10 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3204015Z tmp12 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3204097Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.3204186Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.3204273Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3204351Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.3204605Z tmp6 = tl.where((tmp1 < 0) != (tmp3 < 0), tl.where(tmp1 % tmp3 != 0, tmp1 // tmp3 - 1, tmp1 // tmp3), tmp1 // tmp3) 2023-01-11T21:38:06.3204684Z tmp7 = tmp1 // tmp3 2023-01-11T21:38:06.3204772Z tmp8 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3204844Z tmp9 = tmp8 / tmp4 2023-01-11T21:38:06.3204931Z tmp11 = tmp10.to(tl.int64) 2023-01-11T21:38:06.3205200Z tmp13 = tl.where((tmp11 < 0) != (tmp12 < 0), tl.where(tmp11 % tmp12 != 0, tmp11 // tmp12 - 1, tmp11 // tmp12), tmp11 // tmp12) 2023-01-11T21:38:06.3205342Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3205493Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3205651Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3205782Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3205907Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.3206025Z ''') 2023-01-11T21:38:06.3206031Z 2023-01-11T21:38:06.3206036Z 2023-01-11T21:38:06.3206133Z async_compile.wait(globals()) 2023-01-11T21:38:06.3206210Z del async_compile 2023-01-11T21:38:06.3206215Z 2023-01-11T21:38:06.3206289Z def call(args): 2023-01-11T21:38:06.3206373Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3206450Z args.clear() 2023-01-11T21:38:06.3206543Z with torch.cuda.device(0): 2023-01-11T21:38:06.3206738Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3206934Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207134Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207334Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3207526Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3207621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3207799Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3207873Z del arg0_1 2023-01-11T21:38:06.3207939Z del arg1_1 2023-01-11T21:38:06.3208043Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3208048Z 2023-01-11T21:38:06.3208052Z 2023-01-11T21:38:06.3208132Z if __name__ == "__main__": 2023-01-11T21:38:06.3208250Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3208380Z from 
torch._inductor.utils import print_performance 2023-01-11T21:38:06.3208603Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.3208797Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3208914Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3208920Z 2023-01-11T21:38:06.3208924Z 2023-01-11T21:38:06.3209018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3209093Z import torch 2023-01-11T21:38:06.3209168Z import random 2023-01-11T21:38:06.3209285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3209409Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3209414Z 2023-01-11T21:38:06.3209496Z aten = torch.ops.aten 2023-01-11T21:38:06.3209635Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3209724Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3209734Z 2023-01-11T21:38:06.3209803Z import triton 2023-01-11T21:38:06.3209895Z import triton.language as tl 2023-01-11T21:38:06.3210023Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3210162Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3210168Z 2023-01-11T21:38:06.3210172Z 2023-01-11T21:38:06.3210364Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3210444Z import triton 2023-01-11T21:38:06.3210538Z import triton.language as tl 2023-01-11T21:38:06.3210646Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3210747Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3210880Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3211004Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3211009Z 2023-01-11T21:38:06.3211484Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3211565Z @triton.jit 2023-01-11T21:38:06.3211742Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3211819Z xnumel = 64 2023-01-11T21:38:06.3211949Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3212074Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3212158Z xmask = xindex < xnumel 2023-01-11T21:38:06.3212230Z x0 = xindex 2023-01-11T21:38:06.3212422Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3212612Z tmp3 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3212712Z tmp10 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3212810Z tmp12 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3212891Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.3212980Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.3213070Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3213148Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.3213403Z tmp6 = tl.where((tmp1 < 0) != (tmp3 < 0), tl.where(tmp1 % tmp3 != 0, tmp1 // tmp3 - 1, tmp1 // tmp3), tmp1 // tmp3) 2023-01-11T21:38:06.3213483Z tmp7 = tmp1 // tmp3 2023-01-11T21:38:06.3213574Z tmp8 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3213646Z tmp9 = tmp8 / tmp4 2023-01-11T21:38:06.3213733Z tmp11 = tmp10.to(tl.int64) 
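# [editor's note, not part of the captured log] The tl.where expression below
# is Inductor's floor-division idiom: Triton's integer // truncates toward
# zero (C semantics), so when the operands' signs differ and the remainder is
# nonzero, the truncated quotient is nudged down by one to match Python floor
# semantics. Worked example: for a, b = -7, 2 truncation gives -3; the signs
# differ and a % b != 0, so the kernel stores -3 - 1 = -4, which is -7 // 2.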
2023-01-11T21:38:06.3214002Z tmp13 = tl.where((tmp11 < 0) != (tmp12 < 0), tl.where(tmp11 % tmp12 != 0, tmp11 // tmp12 - 1, tmp11 // tmp12), tmp11 // tmp12) 2023-01-11T21:38:06.3214135Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3214267Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3214399Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3214642Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3214823Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.3214904Z ''') 2023-01-11T21:38:06.3214910Z 2023-01-11T21:38:06.3214914Z 2023-01-11T21:38:06.3215011Z async_compile.wait(globals()) 2023-01-11T21:38:06.3215088Z del async_compile 2023-01-11T21:38:06.3215096Z 2023-01-11T21:38:06.3215171Z def call(args): 2023-01-11T21:38:06.3215251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3215325Z args.clear() 2023-01-11T21:38:06.3215415Z with torch.cuda.device(0): 2023-01-11T21:38:06.3215608Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3215804Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216000Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216197Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3216393Z buf4 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3216485Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3216666Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3216744Z del arg0_1 2023-01-11T21:38:06.3216811Z del arg1_1 2023-01-11T21:38:06.3216915Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3216920Z 2023-01-11T21:38:06.3216924Z 2023-01-11T21:38:06.3217005Z if __name__ == "__main__": 2023-01-11T21:38:06.3217172Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3217305Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3217500Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.3217694Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3217819Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3217825Z 2023-01-11T21:38:06.3217890Z ok (0.222s) 2023-01-11T21:38:06.3218380Z test_div7_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3218515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3218779Z [2023-01-11 21:34:31,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 491 2023-01-11T21:38:06.3219045Z [2023-01-11 21:34:31,585] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 491 2023-01-11T21:38:06.3219458Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3219597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3219854Z [2023-01-11 21:34:31,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 492 2023-01-11T21:38:06.3220118Z [2023-01-11 21:34:31,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 492 2023-01-11T21:38:06.3220124Z 2023-01-11T21:38:06.3220223Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3220298Z import torch 2023-01-11T21:38:06.3220367Z import random 2023-01-11T21:38:06.3220486Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3220611Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3220641Z 2023-01-11T21:38:06.3220726Z aten = torch.ops.aten 2023-01-11T21:38:06.3220863Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3220960Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3220965Z 2023-01-11T21:38:06.3221039Z import triton 2023-01-11T21:38:06.3221127Z import triton.language as tl 2023-01-11T21:38:06.3221254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3221394Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3221400Z 2023-01-11T21:38:06.3221404Z 2023-01-11T21:38:06.3221593Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3221669Z import triton 2023-01-11T21:38:06.3221761Z import triton.language as tl 2023-01-11T21:38:06.3221875Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3221977Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3222105Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3222234Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3222239Z 2023-01-11T21:38:06.3222716Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3222790Z @triton.jit 2023-01-11T21:38:06.3222966Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3223042Z xnumel = 10000 2023-01-11T21:38:06.3223141Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3223272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3223355Z xmask = xindex < xnumel 2023-01-11T21:38:06.3223421Z x0 = xindex 2023-01-11T21:38:06.3223609Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3223801Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3223899Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3223995Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3224084Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3224200Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3224275Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3224528Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // 
tmp2) 2023-01-11T21:38:06.3224610Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3224860Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3224996Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3225130Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3225264Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3225420Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3225557Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3225652Z ''') 2023-01-11T21:38:06.3225658Z 2023-01-11T21:38:06.3225664Z 2023-01-11T21:38:06.3225760Z async_compile.wait(globals()) 2023-01-11T21:38:06.3225842Z del async_compile 2023-01-11T21:38:06.3225847Z 2023-01-11T21:38:06.3225923Z def call(args): 2023-01-11T21:38:06.3226004Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3226079Z args.clear() 2023-01-11T21:38:06.3226164Z with torch.cuda.device(0): 2023-01-11T21:38:06.3226373Z buf0 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3226578Z buf1 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3226779Z buf2 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3227024Z buf3 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3227226Z buf4 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3227318Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3227506Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10000, grid=grid(10000), stream=stream0) 2023-01-11T21:38:06.3227574Z del arg0_1 2023-01-11T21:38:06.3227647Z del arg1_1 2023-01-11T21:38:06.3227750Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3227755Z 2023-01-11T21:38:06.3227759Z 2023-01-11T21:38:06.3227839Z if __name__ == "__main__": 2023-01-11T21:38:06.3227958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3228085Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3228290Z arg0_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3228491Z arg1_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3228605Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3228618Z 2023-01-11T21:38:06.3228623Z 2023-01-11T21:38:06.3228717Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3228794Z import torch 2023-01-11T21:38:06.3228869Z import random 2023-01-11T21:38:06.3228989Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3229112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3229118Z 2023-01-11T21:38:06.3229201Z aten = torch.ops.aten 2023-01-11T21:38:06.3229336Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3229424Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3229429Z 2023-01-11T21:38:06.3229506Z import triton 2023-01-11T21:38:06.3229600Z import triton.language as tl 2023-01-11T21:38:06.3229729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3229869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3229875Z 2023-01-11T21:38:06.3229879Z 
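# [editor's note, not part of the captured log] The compilation pattern here,
# roughly: async_compile.triton('''...''') queues the kernel source on a
# background compile pool and returns a handle immediately, so several kernels
# can build in parallel; async_compile.wait(globals()) further down blocks
# until every pending kernel is ready and rebinds the module-level names to
# the compiled kernels, which is why call() can invoke .run(...) on them.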
2023-01-11T21:38:06.3230070Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3230146Z import triton 2023-01-11T21:38:06.3230266Z import triton.language as tl 2023-01-11T21:38:06.3230376Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3230475Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3230609Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3230736Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3230741Z 2023-01-11T21:38:06.3231210Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3231292Z @triton.jit 2023-01-11T21:38:06.3231466Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3231540Z xnumel = 10000 2023-01-11T21:38:06.3231633Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3231763Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3231846Z xmask = xindex < xnumel 2023-01-11T21:38:06.3231917Z x0 = xindex 2023-01-11T21:38:06.3232108Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3232299Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3232397Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3232494Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3232576Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3232661Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3232777Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3233029Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3233108Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3233362Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3233496Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3233624Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3233755Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3233887Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3234017Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3234104Z ''') 2023-01-11T21:38:06.3234110Z 2023-01-11T21:38:06.3234114Z 2023-01-11T21:38:06.3234213Z async_compile.wait(globals()) 2023-01-11T21:38:06.3234293Z del async_compile 2023-01-11T21:38:06.3234298Z 2023-01-11T21:38:06.3234375Z def call(args): 2023-01-11T21:38:06.3234450Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3234525Z args.clear() 2023-01-11T21:38:06.3234617Z with torch.cuda.device(0): 2023-01-11T21:38:06.3234828Z buf0 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3235039Z buf1 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235256Z buf2 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235497Z buf3 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.float32) 
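# [editor's note, not part of the captured log] call() allocates one output
# buffer per fused div (buf0..buf4, each 100x100 to match the inputs), while
# args.clear() above and the del arg0_1 / del arg1_1 after the launch drop
# Python references as early as possible so the CUDA caching allocator can
# recycle the input memory.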
2023-01-11T21:38:06.3235702Z buf4 = empty_strided((100, 100), (100, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3235790Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3235972Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10000, grid=grid(10000), stream=stream0) 2023-01-11T21:38:06.3236051Z del arg0_1 2023-01-11T21:38:06.3236124Z del arg1_1 2023-01-11T21:38:06.3236229Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3236234Z 2023-01-11T21:38:06.3236239Z 2023-01-11T21:38:06.3236352Z if __name__ == "__main__": 2023-01-11T21:38:06.3236474Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3236601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3236800Z arg0_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3236999Z arg1_1 = rand_strided((100, 100), (100, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3237124Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3237129Z 2023-01-11T21:38:06.3237203Z ok (0.265s) 2023-01-11T21:38:06.3237521Z test_div8_cuda (__main__.CudaTests) ... [2023-01-11 21:34:31,686] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 493 2023-01-11T21:38:06.3237798Z [2023-01-11 21:34:31,695] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 493 2023-01-11T21:38:06.3238056Z [2023-01-11 21:34:31,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 494 2023-01-11T21:38:06.3238319Z [2023-01-11 21:34:31,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 494 2023-01-11T21:38:06.3238324Z 2023-01-11T21:38:06.3238423Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3238493Z import torch 2023-01-11T21:38:06.3238570Z import random 2023-01-11T21:38:06.3238691Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3238820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3238825Z 2023-01-11T21:38:06.3238910Z aten = torch.ops.aten 2023-01-11T21:38:06.3239047Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3239173Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3239178Z 2023-01-11T21:38:06.3239248Z import triton 2023-01-11T21:38:06.3239341Z import triton.language as tl 2023-01-11T21:38:06.3239468Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3239614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3239619Z 2023-01-11T21:38:06.3239624Z 2023-01-11T21:38:06.3239766Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3239975Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3240095Z extern "C" void kernel(long* __restrict__ out_ptr0, 2023-01-11T21:38:06.3240198Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.3240291Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.3240359Z { 2023-01-11T21:38:06.3240426Z { 2023-01-11T21:38:06.3240498Z { 2023-01-11T21:38:06.3240609Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3240717Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3240810Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3240891Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.3240958Z } 2023-01-11T21:38:06.3241026Z } 2023-01-11T21:38:06.3241097Z { 2023-01-11T21:38:06.3241165Z { 2023-01-11T21:38:06.3241272Z auto tmp0 = static_cast<long>(1024);
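// C++ integer division truncates toward zero, so the ternary just below
// adjusts the quotient down by one when the operand signs differ and the
// remainder is nonzero, reproducing Python floor-division semantics.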
2023-01-11T21:38:06.3241370Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3241610Z auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1); 2023-01-11T21:38:06.3241696Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.3241763Z } 2023-01-11T21:38:06.3241831Z } 2023-01-11T21:38:06.3241898Z { 2023-01-11T21:38:06.3241967Z { 2023-01-11T21:38:06.3242066Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3242173Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3242264Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3242349Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.3242416Z } 2023-01-11T21:38:06.3242483Z } 2023-01-11T21:38:06.3242551Z } 2023-01-11T21:38:06.3242630Z ''') 2023-01-11T21:38:06.3242635Z 2023-01-11T21:38:06.3242710Z 2023-01-11T21:38:06.3242808Z async_compile.wait(globals()) 2023-01-11T21:38:06.3242886Z del async_compile 2023-01-11T21:38:06.3242891Z 2023-01-11T21:38:06.3242968Z def call(args): 2023-01-11T21:38:06.3243152Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243335Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243515Z buf2 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3243682Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3243766Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3243773Z 2023-01-11T21:38:06.3243777Z 2023-01-11T21:38:06.3243860Z if __name__ == "__main__": 2023-01-11T21:38:06.3243979Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3244107Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3244212Z print_performance(lambda: call([])) 2023-01-11T21:38:06.3244219Z 2023-01-11T21:38:06.3244223Z 2023-01-11T21:38:06.3244323Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3244398Z import torch 2023-01-11T21:38:06.3244476Z import random 2023-01-11T21:38:06.3244590Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3244714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3244720Z 2023-01-11T21:38:06.3244803Z aten = torch.ops.aten 2023-01-11T21:38:06.3244939Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3245040Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3245046Z 2023-01-11T21:38:06.3245172Z import triton 2023-01-11T21:38:06.3245277Z import triton.language as tl 2023-01-11T21:38:06.3245410Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3245551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3245556Z 2023-01-11T21:38:06.3245561Z 2023-01-11T21:38:06.3245702Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3245908Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3246027Z extern "C" void kernel(long* __restrict__ out_ptr0, 2023-01-11T21:38:06.3246128Z long* __restrict__ out_ptr1, 2023-01-11T21:38:06.3246232Z long* __restrict__ out_ptr2) 2023-01-11T21:38:06.3246299Z { 2023-01-11T21:38:06.3246360Z { 2023-01-11T21:38:06.3246430Z { 2023-01-11T21:38:06.3246538Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3246644Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3246738Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3246823Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.3246893Z } 2023-01-11T21:38:06.3246953Z } 2023-01-11T21:38:06.3247022Z { 2023-01-11T21:38:06.3247089Z {
2023-01-11T21:38:06.3247194Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3247302Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3247545Z auto tmp2 = ((tmp0 < 0) != (tmp1 < 0) ? (tmp0 % tmp1 != 0 ? tmp0 / tmp1 - 1 : tmp0 / tmp1) : tmp0 / tmp1); 2023-01-11T21:38:06.3247630Z out_ptr1[0] = tmp2; 2023-01-11T21:38:06.3247691Z } 2023-01-11T21:38:06.3247756Z } 2023-01-11T21:38:06.3247822Z { 2023-01-11T21:38:06.3247889Z { 2023-01-11T21:38:06.3247993Z auto tmp0 = static_cast<long>(1024); 2023-01-11T21:38:06.3248099Z auto tmp1 = static_cast<long>(100); 2023-01-11T21:38:06.3248190Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3248268Z out_ptr2[0] = tmp2; 2023-01-11T21:38:06.3248342Z } 2023-01-11T21:38:06.3248409Z } 2023-01-11T21:38:06.3248472Z } 2023-01-11T21:38:06.3248557Z ''') 2023-01-11T21:38:06.3248563Z 2023-01-11T21:38:06.3248567Z 2023-01-11T21:38:06.3248663Z async_compile.wait(globals()) 2023-01-11T21:38:06.3248741Z del async_compile 2023-01-11T21:38:06.3248746Z 2023-01-11T21:38:06.3248846Z def call(args): 2023-01-11T21:38:06.3249031Z buf0 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249212Z buf1 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249392Z buf2 = empty_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.3249556Z kernel_cpp_0(c_void_p(buf0.data_ptr()), c_void_p(buf1.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3249645Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3249650Z 2023-01-11T21:38:06.3249656Z 2023-01-11T21:38:06.3249736Z if __name__ == "__main__": 2023-01-11T21:38:06.3249859Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3249980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3250085Z print_performance(lambda: call([])) 2023-01-11T21:38:06.3250091Z 2023-01-11T21:38:06.3250163Z ok (0.087s) 2023-01-11T21:38:06.3250621Z test_div_prim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3250755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3251014Z [2023-01-11 21:34:31,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 495 2023-01-11T21:38:06.3251281Z [2023-01-11 21:34:31,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 495 2023-01-11T21:38:06.3251727Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3251862Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3252118Z [2023-01-11 21:34:31,972] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 496 2023-01-11T21:38:06.3252381Z [2023-01-11 21:34:32,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 496 2023-01-11T21:38:06.3252788Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3252924Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3253181Z [2023-01-11 21:34:32,082] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 497 2023-01-11T21:38:06.3253446Z [2023-01-11 21:34:32,158] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 497 2023-01-11T21:38:06.3253860Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3253989Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3254247Z [2023-01-11 21:34:32,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 498 2023-01-11T21:38:06.3254253Z 2023-01-11T21:38:06.3254352Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3254426Z import torch 2023-01-11T21:38:06.3254613Z import random 2023-01-11T21:38:06.3254771Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3254897Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3254902Z 2023-01-11T21:38:06.3254984Z aten = torch.ops.aten 2023-01-11T21:38:06.3255121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3255218Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3255223Z 2023-01-11T21:38:06.3255297Z import triton 2023-01-11T21:38:06.3255392Z import triton.language as tl 2023-01-11T21:38:06.3255525Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3255683Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3255693Z 2023-01-11T21:38:06.3255699Z 2023-01-11T21:38:06.3255885Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3255961Z import triton 2023-01-11T21:38:06.3256054Z import triton.language as tl 2023-01-11T21:38:06.3256167Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3256273Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3256406Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3256525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3256537Z 2023-01-11T21:38:06.3256949Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3257022Z @triton.jit 2023-01-11T21:38:06.3257246Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3257374Z xnumel = 100 2023-01-11T21:38:06.3257474Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3257603Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3257688Z xmask = xindex < xnumel 2023-01-11T21:38:06.3257754Z x0 = xindex 2023-01-11T21:38:06.3257854Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3257953Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3258033Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3258169Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3258257Z ''') 2023-01-11T21:38:06.3258263Z 2023-01-11T21:38:06.3258267Z 2023-01-11T21:38:06.3258359Z async_compile.wait(globals()) 2023-01-11T21:38:06.3258436Z del async_compile 2023-01-11T21:38:06.3258441Z 2023-01-11T21:38:06.3258510Z def call(args): 2023-01-11T21:38:06.3258589Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3258663Z args.clear() 2023-01-11T21:38:06.3258760Z with torch.cuda.device(0): 2023-01-11T21:38:06.3258962Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3259055Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3259196Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3259267Z del arg0_1 2023-01-11T21:38:06.3259339Z del arg1_1 2023-01-11T21:38:06.3259417Z return (buf0, ) 2023-01-11T21:38:06.3259422Z 2023-01-11T21:38:06.3259427Z 2023-01-11T21:38:06.3259507Z if __name__ == "__main__": 2023-01-11T21:38:06.3259623Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3259748Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3259949Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3260148Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3260262Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3260270Z 2023-01-11T21:38:06.3260274Z 2023-01-11T21:38:06.3260372Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3260446Z import torch 2023-01-11T21:38:06.3260521Z import random 2023-01-11T21:38:06.3260639Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3260793Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3260799Z 2023-01-11T21:38:06.3260883Z aten = torch.ops.aten 2023-01-11T21:38:06.3261014Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3261109Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3261113Z 2023-01-11T21:38:06.3261189Z import triton 2023-01-11T21:38:06.3261282Z import triton.language as tl 2023-01-11T21:38:06.3261409Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3261547Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3261553Z 2023-01-11T21:38:06.3261557Z 2023-01-11T21:38:06.3261714Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3261789Z import triton 2023-01-11T21:38:06.3261875Z import triton.language as tl 2023-01-11T21:38:06.3261990Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3262092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3262226Z from torch._inductor.triton_ops.autotune import pointwise 
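# Note on the fp16 variant dumped here: both operands are loaded with
# .to(tl.float32), so the division itself runs in fp32 before the result is
# stored back through the *fp16 output pointer.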
2023-01-11T21:38:06.3262351Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3262356Z 2023-01-11T21:38:06.3262771Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3262843Z @triton.jit 2023-01-11T21:38:06.3262985Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3263053Z xnumel = 100 2023-01-11T21:38:06.3263151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3263308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3263392Z xmask = xindex < xnumel 2023-01-11T21:38:06.3263462Z x0 = xindex 2023-01-11T21:38:06.3263575Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3263694Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3263768Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3263903Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3263989Z ''') 2023-01-11T21:38:06.3263994Z 2023-01-11T21:38:06.3263999Z 2023-01-11T21:38:06.3264092Z async_compile.wait(globals()) 2023-01-11T21:38:06.3264169Z del async_compile 2023-01-11T21:38:06.3264174Z 2023-01-11T21:38:06.3264250Z def call(args): 2023-01-11T21:38:06.3264326Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3264402Z args.clear() 2023-01-11T21:38:06.3264488Z with torch.cuda.device(0): 2023-01-11T21:38:06.3264687Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3264782Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3264924Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3264997Z del arg0_1 2023-01-11T21:38:06.3265070Z del arg1_1 2023-01-11T21:38:06.3265153Z return (buf0, ) 2023-01-11T21:38:06.3265158Z 2023-01-11T21:38:06.3265162Z 2023-01-11T21:38:06.3265243Z if __name__ == "__main__": 2023-01-11T21:38:06.3265354Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3265479Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3265675Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3265872Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3265995Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3266000Z 2023-01-11T21:38:06.3266008Z 2023-01-11T21:38:06.3266109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3266183Z import torch 2023-01-11T21:38:06.3266251Z import random 2023-01-11T21:38:06.3266369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3266491Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3266497Z 2023-01-11T21:38:06.3266605Z aten = torch.ops.aten 2023-01-11T21:38:06.3266748Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3266842Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3266847Z 2023-01-11T21:38:06.3266922Z import triton 2023-01-11T21:38:06.3267015Z import triton.language as tl 2023-01-11T21:38:06.3267133Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3267270Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3267275Z 2023-01-11T21:38:06.3267280Z 2023-01-11T21:38:06.3267434Z triton_fused_div_0 = async_compile.triton(''' 
2023-01-11T21:38:06.3267513Z import triton 2023-01-11T21:38:06.3267606Z import triton.language as tl 2023-01-11T21:38:06.3267720Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3267822Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3267956Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3268077Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3268083Z 2023-01-11T21:38:06.3268497Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3268571Z @triton.jit 2023-01-11T21:38:06.3268711Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3268787Z xnumel = 100 2023-01-11T21:38:06.3268887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3269017Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3269140Z xmask = xindex < xnumel 2023-01-11T21:38:06.3269205Z x0 = xindex 2023-01-11T21:38:06.3269303Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3269398Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3269477Z tmp2 = tmp0 // tmp1 2023-01-11T21:38:06.3269614Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3269700Z ''') 2023-01-11T21:38:06.3269706Z 2023-01-11T21:38:06.3269710Z 2023-01-11T21:38:06.3269804Z async_compile.wait(globals()) 2023-01-11T21:38:06.3269875Z del async_compile 2023-01-11T21:38:06.3269880Z 2023-01-11T21:38:06.3269955Z def call(args): 2023-01-11T21:38:06.3270034Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3270109Z args.clear() 2023-01-11T21:38:06.3270199Z with torch.cuda.device(0): 2023-01-11T21:38:06.3270396Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3270485Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3270625Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3270699Z del arg0_1 2023-01-11T21:38:06.3270771Z del arg1_1 2023-01-11T21:38:06.3270849Z return (buf0, ) 2023-01-11T21:38:06.3270854Z 2023-01-11T21:38:06.3270860Z 2023-01-11T21:38:06.3270942Z if __name__ == "__main__": 2023-01-11T21:38:06.3271062Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3271189Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3271387Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3271575Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3271694Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3271700Z 2023-01-11T21:38:06.3271965Z [2023-01-11 21:34:32,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 498 2023-01-11T21:38:06.3271974Z 2023-01-11T21:38:06.3272074Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3272147Z import torch 2023-01-11T21:38:06.3272222Z import random 2023-01-11T21:38:06.3272339Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3272464Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3272496Z 2023-01-11T21:38:06.3272574Z aten = torch.ops.aten 2023-01-11T21:38:06.3272710Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3272806Z async_compile = AsyncCompile() 
2023-01-11T21:38:06.3272811Z 2023-01-11T21:38:06.3272884Z import triton 2023-01-11T21:38:06.3272978Z import triton.language as tl 2023-01-11T21:38:06.3273103Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3273242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3273248Z 2023-01-11T21:38:06.3273252Z 2023-01-11T21:38:06.3273406Z triton_fused_div_0 = async_compile.triton(''' 2023-01-11T21:38:06.3273478Z import triton 2023-01-11T21:38:06.3273572Z import triton.language as tl 2023-01-11T21:38:06.3273685Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3273786Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3273918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3274049Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3274054Z 2023-01-11T21:38:06.3274464Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3274538Z @triton.jit 2023-01-11T21:38:06.3274673Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3274747Z xnumel = 100 2023-01-11T21:38:06.3274844Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3274973Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3275084Z xmask = xindex < xnumel 2023-01-11T21:38:06.3275157Z x0 = xindex 2023-01-11T21:38:06.3275263Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3275370Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3275462Z tmp2 = tmp0 // tmp1 2023-01-11T21:38:06.3275612Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3275698Z ''') 2023-01-11T21:38:06.3275704Z 2023-01-11T21:38:06.3275708Z 2023-01-11T21:38:06.3275801Z async_compile.wait(globals()) 2023-01-11T21:38:06.3275878Z del async_compile 2023-01-11T21:38:06.3275883Z 2023-01-11T21:38:06.3275957Z def call(args): 2023-01-11T21:38:06.3276038Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3276107Z args.clear() 2023-01-11T21:38:06.3276199Z with torch.cuda.device(0): 2023-01-11T21:38:06.3276396Z buf0 = empty_strided((100, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3276492Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3276635Z triton_fused_div_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.3276709Z del arg0_1 2023-01-11T21:38:06.3276782Z del arg1_1 2023-01-11T21:38:06.3276853Z return (buf0, ) 2023-01-11T21:38:06.3276858Z 2023-01-11T21:38:06.3276862Z 2023-01-11T21:38:06.3276946Z if __name__ == "__main__": 2023-01-11T21:38:06.3277063Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3277188Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3277389Z arg0_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3277582Z arg1_1 = rand_strided((100, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3277702Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3277707Z 2023-01-11T21:38:06.3277778Z ok (0.463s) 2023-01-11T21:38:06.3278230Z test_div_zero_dim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3278391Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3278650Z [2023-01-11 21:34:32,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 499 2023-01-11T21:38:06.3278915Z [2023-01-11 21:34:32,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 499 2023-01-11T21:38:06.3279331Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3279466Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3279717Z [2023-01-11 21:34:32,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 500 2023-01-11T21:38:06.3279980Z [2023-01-11 21:34:32,728] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 500 2023-01-11T21:38:06.3280396Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3280526Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3280776Z [2023-01-11 21:34:32,782] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 501 2023-01-11T21:38:06.3280807Z 2023-01-11T21:38:06.3280907Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3280976Z import torch 2023-01-11T21:38:06.3281050Z import random 2023-01-11T21:38:06.3281169Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3281297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3281302Z 2023-01-11T21:38:06.3281384Z aten = torch.ops.aten 2023-01-11T21:38:06.3281521Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3281618Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3281623Z 2023-01-11T21:38:06.3281691Z import triton 2023-01-11T21:38:06.3281784Z import triton.language as tl 2023-01-11T21:38:06.3281908Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3282048Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3282053Z 2023-01-11T21:38:06.3282061Z 2023-01-11T21:38:06.3282253Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3282329Z import triton 2023-01-11T21:38:06.3282422Z import triton.language as tl 2023-01-11T21:38:06.3282535Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3282631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3282767Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3282891Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3282896Z 2023-01-11T21:38:06.3283373Z @pointwise(size_hints=[16], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3283448Z @triton.jit 2023-01-11T21:38:06.3283626Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3283701Z xnumel = 10 2023-01-11T21:38:06.3283802Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3283931Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3284009Z xmask = xindex < xnumel 2023-01-11T21:38:06.3284080Z x0 = xindex 2023-01-11T21:38:06.3284298Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3284533Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3284634Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3284764Z tmp6 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3284849Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3284942Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3285038Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3285116Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3285216Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3285348Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3285481Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3285613Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3285739Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3285867Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3285954Z ''') 2023-01-11T21:38:06.3285960Z 2023-01-11T21:38:06.3285964Z 2023-01-11T21:38:06.3286059Z async_compile.wait(globals()) 2023-01-11T21:38:06.3286137Z del async_compile 2023-01-11T21:38:06.3286142Z 2023-01-11T21:38:06.3286216Z def call(args): 2023-01-11T21:38:06.3286295Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3286370Z args.clear() 2023-01-11T21:38:06.3286457Z with torch.cuda.device(0): 2023-01-11T21:38:06.3286656Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3286882Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287078Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287275Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287466Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3287560Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3287740Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3287808Z del arg0_1 2023-01-11T21:38:06.3287881Z del arg1_1 2023-01-11T21:38:06.3287986Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3287991Z 2023-01-11T21:38:06.3287996Z 2023-01-11T21:38:06.3288079Z if __name__ == "__main__": 2023-01-11T21:38:06.3288202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3288333Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3288533Z arg0_1 = rand_strided((10, ), (1, ), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3288725Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3288839Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3288844Z 2023-01-11T21:38:06.3288848Z 2023-01-11T21:38:06.3288945Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3289021Z import torch 2023-01-11T21:38:06.3289095Z import random 2023-01-11T21:38:06.3289215Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3289342Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3289348Z 2023-01-11T21:38:06.3289430Z aten = torch.ops.aten 2023-01-11T21:38:06.3289560Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3289660Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3289665Z 2023-01-11T21:38:06.3289739Z import triton 2023-01-11T21:38:06.3289835Z import triton.language as tl 2023-01-11T21:38:06.3289962Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3290103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3290137Z 2023-01-11T21:38:06.3290142Z 2023-01-11T21:38:06.3290335Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3290411Z import triton 2023-01-11T21:38:06.3290497Z import triton.language as tl 2023-01-11T21:38:06.3290611Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3290714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3290846Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3290972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3290977Z 2023-01-11T21:38:06.3291458Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3291535Z @triton.jit 2023-01-11T21:38:06.3291713Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3291786Z xnumel = 10 2023-01-11T21:38:06.3291877Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3292007Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3292091Z xmask = xindex < xnumel 2023-01-11T21:38:06.3292163Z x0 = xindex 2023-01-11T21:38:06.3292378Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3292636Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3292782Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3292920Z tmp6 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3293001Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3293100Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3293200Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3293280Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3293377Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3293512Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3293639Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3293770Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, 
xmask) 2023-01-11T21:38:06.3293899Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3294027Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3294117Z ''') 2023-01-11T21:38:06.3294122Z 2023-01-11T21:38:06.3294127Z 2023-01-11T21:38:06.3294225Z async_compile.wait(globals()) 2023-01-11T21:38:06.3294302Z del async_compile 2023-01-11T21:38:06.3294307Z 2023-01-11T21:38:06.3294378Z def call(args): 2023-01-11T21:38:06.3294452Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3294655Z args.clear() 2023-01-11T21:38:06.3294748Z with torch.cuda.device(0): 2023-01-11T21:38:06.3294947Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295144Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295337Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295530Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295723Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3295810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3295993Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3296067Z del arg0_1 2023-01-11T21:38:06.3296143Z del arg1_1 2023-01-11T21:38:06.3296247Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3296293Z 2023-01-11T21:38:06.3296298Z 2023-01-11T21:38:06.3296381Z if __name__ == "__main__": 2023-01-11T21:38:06.3296502Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3296629Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3296824Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3297014Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3297213Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3297219Z 2023-01-11T21:38:06.3297518Z [2023-01-11 21:34:32,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 501 2023-01-11T21:38:06.3297947Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3298082Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3298340Z [2023-01-11 21:34:32,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 502 2023-01-11T21:38:06.3298602Z [2023-01-11 21:34:33,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 502 2023-01-11T21:38:06.3299016Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3299187Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3299447Z [2023-01-11 21:34:33,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 503 2023-01-11T21:38:06.3299452Z 2023-01-11T21:38:06.3299546Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3299621Z import torch 2023-01-11T21:38:06.3299700Z import random 2023-01-11T21:38:06.3299821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3299944Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3299949Z 2023-01-11T21:38:06.3300032Z aten = torch.ops.aten 2023-01-11T21:38:06.3300175Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3300266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3300274Z 2023-01-11T21:38:06.3300354Z import triton 2023-01-11T21:38:06.3300451Z import triton.language as tl 2023-01-11T21:38:06.3300578Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3300718Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3300724Z 2023-01-11T21:38:06.3300728Z 2023-01-11T21:38:06.3300924Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3300997Z import triton 2023-01-11T21:38:06.3301091Z import triton.language as tl 2023-01-11T21:38:06.3301200Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3301308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3301443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3301570Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3301575Z 2023-01-11T21:38:06.3302052Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3302128Z @triton.jit 2023-01-11T21:38:06.3302336Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3302410Z xnumel = 10 2023-01-11T21:38:06.3302503Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3302633Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3302717Z xmask = xindex < xnumel 2023-01-11T21:38:06.3302791Z x0 = xindex 2023-01-11T21:38:06.3303025Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3303217Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3303351Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3303452Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3303527Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3303626Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3303723Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3303802Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3303901Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3304039Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3304170Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3304295Z tl.store(out_ptr2 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3304425Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3304556Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3304646Z ''') 2023-01-11T21:38:06.3304652Z 2023-01-11T21:38:06.3304656Z 2023-01-11T21:38:06.3304781Z async_compile.wait(globals()) 2023-01-11T21:38:06.3304859Z del async_compile 2023-01-11T21:38:06.3304864Z 2023-01-11T21:38:06.3304941Z def call(args): 2023-01-11T21:38:06.3305022Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3305092Z args.clear() 2023-01-11T21:38:06.3305187Z with torch.cuda.device(0): 2023-01-11T21:38:06.3305395Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3305626Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3305834Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306037Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306228Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3306316Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3306500Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3306580Z del arg0_1 2023-01-11T21:38:06.3306653Z del arg1_1 2023-01-11T21:38:06.3306759Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3306765Z 2023-01-11T21:38:06.3306769Z 2023-01-11T21:38:06.3306849Z if __name__ == "__main__": 2023-01-11T21:38:06.3306973Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3307100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3307284Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3307482Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3307604Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3307609Z 2023-01-11T21:38:06.3307613Z 2023-01-11T21:38:06.3307710Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3307787Z import torch 2023-01-11T21:38:06.3307863Z import random 2023-01-11T21:38:06.3307985Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3308108Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3308113Z 2023-01-11T21:38:06.3308191Z aten = torch.ops.aten 2023-01-11T21:38:06.3308328Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3308452Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3308458Z 2023-01-11T21:38:06.3308535Z import triton 2023-01-11T21:38:06.3308628Z import triton.language as tl 2023-01-11T21:38:06.3308756Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3308895Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3308901Z 2023-01-11T21:38:06.3308905Z 2023-01-11T21:38:06.3309098Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3309169Z import triton 2023-01-11T21:38:06.3309263Z import triton.language as tl 2023-01-11T21:38:06.3309378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3309489Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3309625Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3309751Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.3309756Z 2023-01-11T21:38:06.3310232Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3310307Z @triton.jit 2023-01-11T21:38:06.3310487Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3310556Z xnumel = 10 2023-01-11T21:38:06.3310655Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3310785Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3310907Z xmask = xindex < xnumel 2023-01-11T21:38:06.3310980Z x0 = xindex 2023-01-11T21:38:06.3311239Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3311453Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3311594Z tmp5 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.3311711Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3311794Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.3311893Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.3311990Z tmp4 = tl.libdevice.trunc(tmp2) 2023-01-11T21:38:06.3312070Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.3312167Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.3312296Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3312429Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3312563Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3312695Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3312825Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3312914Z ''') 2023-01-11T21:38:06.3312920Z 2023-01-11T21:38:06.3312925Z 2023-01-11T21:38:06.3313018Z async_compile.wait(globals()) 2023-01-11T21:38:06.3313096Z del async_compile 2023-01-11T21:38:06.3313101Z 2023-01-11T21:38:06.3313171Z def call(args): 2023-01-11T21:38:06.3313251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3313327Z args.clear() 2023-01-11T21:38:06.3313419Z with torch.cuda.device(0): 2023-01-11T21:38:06.3313617Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3313813Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314011Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314207Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314393Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3314488Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3314697Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3314775Z del arg0_1 2023-01-11T21:38:06.3314851Z del arg1_1 2023-01-11T21:38:06.3314960Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3314965Z 2023-01-11T21:38:06.3314970Z 2023-01-11T21:38:06.3315050Z if __name__ == "__main__": 2023-01-11T21:38:06.3315168Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.3315316Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3315534Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3315736Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3315856Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3315861Z 2023-01-11T21:38:06.3316132Z [2023-01-11 21:34:33,296] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 503 2023-01-11T21:38:06.3316551Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3316682Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3316938Z [2023-01-11 21:34:33,340] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 504 2023-01-11T21:38:06.3317228Z [2023-01-11 21:34:33,359] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 504 2023-01-11T21:38:06.3317645Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3317771Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3318021Z [2023-01-11 21:34:33,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 505 2023-01-11T21:38:06.3318027Z 2023-01-11T21:38:06.3318124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3318197Z import torch 2023-01-11T21:38:06.3318271Z import random 2023-01-11T21:38:06.3318390Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3318515Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3318521Z 2023-01-11T21:38:06.3318604Z aten = torch.ops.aten 2023-01-11T21:38:06.3318733Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3318827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3318835Z 2023-01-11T21:38:06.3318909Z import triton 2023-01-11T21:38:06.3319000Z import triton.language as tl 2023-01-11T21:38:06.3319124Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3319264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3319269Z 2023-01-11T21:38:06.3319274Z 2023-01-11T21:38:06.3319463Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3319539Z import triton 2023-01-11T21:38:06.3319624Z import triton.language as tl 2023-01-11T21:38:06.3319737Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3319837Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3319972Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3320095Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3320100Z 2023-01-11T21:38:06.3320590Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: 
'*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3320666Z @triton.jit 2023-01-11T21:38:06.3320844Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3320913Z xnumel = 10 2023-01-11T21:38:06.3321011Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3321138Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3321222Z xmask = xindex < xnumel 2023-01-11T21:38:06.3321295Z x0 = xindex 2023-01-11T21:38:06.3321482Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3321715Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3321812Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3321938Z tmp8 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3322025Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3322113Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3322191Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3322444Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3322524Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3322775Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3322902Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3323061Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3323191Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3323320Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3323448Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3323533Z ''') 2023-01-11T21:38:06.3323539Z 2023-01-11T21:38:06.3323543Z 2023-01-11T21:38:06.3323636Z async_compile.wait(globals()) 2023-01-11T21:38:06.3323713Z del async_compile 2023-01-11T21:38:06.3323719Z 2023-01-11T21:38:06.3323785Z def call(args): 2023-01-11T21:38:06.3323865Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3323939Z args.clear() 2023-01-11T21:38:06.3324031Z with torch.cuda.device(0): 2023-01-11T21:38:06.3324227Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3324426Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3324618Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3324814Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3325004Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3325097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3325274Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3325350Z del arg0_1 2023-01-11T21:38:06.3325422Z del arg1_1 2023-01-11T21:38:06.3325525Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3325530Z 2023-01-11T21:38:06.3325535Z 2023-01-11T21:38:06.3325613Z if __name__ == "__main__": 2023-01-11T21:38:06.3325732Z from torch._dynamo.testing import rand_strided 
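# In the zero-dim kernels above, the scalar operand is read with
# tl.load(ptr + (0 + tl.zeros([XBLOCK], tl.int32)), None): the vector of zero
# offsets broadcasts the single element across the block, and the mask is None
# because offset 0 is always in bounds.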
2023-01-11T21:38:06.3325854Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3326048Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3326234Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3326354Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3326390Z 2023-01-11T21:38:06.3326396Z 2023-01-11T21:38:06.3326495Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3326571Z import torch 2023-01-11T21:38:06.3326645Z import random 2023-01-11T21:38:06.3326758Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3326882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3326887Z 2023-01-11T21:38:06.3326970Z aten = torch.ops.aten 2023-01-11T21:38:06.3327108Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3327205Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3327210Z 2023-01-11T21:38:06.3327285Z import triton 2023-01-11T21:38:06.3327380Z import triton.language as tl 2023-01-11T21:38:06.3327505Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3327636Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3327642Z 2023-01-11T21:38:06.3327651Z 2023-01-11T21:38:06.3327835Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3327910Z import triton 2023-01-11T21:38:06.3328000Z import triton.language as tl 2023-01-11T21:38:06.3328112Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3328212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3328345Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3328471Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3328476Z 2023-01-11T21:38:06.3328934Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3329043Z @triton.jit 2023-01-11T21:38:06.3329220Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3329301Z xnumel = 10 2023-01-11T21:38:06.3329396Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3329525Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3329608Z xmask = xindex < xnumel 2023-01-11T21:38:06.3329681Z x0 = xindex 2023-01-11T21:38:06.3329865Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3330099Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3330195Z tmp7 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3330326Z tmp8 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3330415Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3330499Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3330577Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3330822Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3330906Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3331155Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // 
tmp8) 2023-01-11T21:38:06.3331289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3331418Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3331547Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3331679Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3331807Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3331887Z ''') 2023-01-11T21:38:06.3331900Z 2023-01-11T21:38:06.3331904Z 2023-01-11T21:38:06.3331991Z async_compile.wait(globals()) 2023-01-11T21:38:06.3332066Z del async_compile 2023-01-11T21:38:06.3332071Z 2023-01-11T21:38:06.3332148Z def call(args): 2023-01-11T21:38:06.3332255Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3332331Z args.clear() 2023-01-11T21:38:06.3332423Z with torch.cuda.device(0): 2023-01-11T21:38:06.3332622Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3332808Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3332997Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3333193Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3333382Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3333476Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3333657Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3333731Z del arg0_1 2023-01-11T21:38:06.3333803Z del arg1_1 2023-01-11T21:38:06.3333902Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3333907Z 2023-01-11T21:38:06.3333912Z 2023-01-11T21:38:06.3333990Z if __name__ == "__main__": 2023-01-11T21:38:06.3334108Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3334232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3334427Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3334727Z arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3334845Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3334850Z 2023-01-11T21:38:06.3335210Z [2023-01-11 21:34:33,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 505 2023-01-11T21:38:06.3335631Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3335757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3336016Z [2023-01-11 21:34:33,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 506 2023-01-11T21:38:06.3336280Z [2023-01-11 21:34:33,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 506 2023-01-11T21:38:06.3336286Z 2023-01-11T21:38:06.3336385Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3336457Z import torch 2023-01-11T21:38:06.3336535Z import random 2023-01-11T21:38:06.3336653Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3336774Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3336780Z 2023-01-11T21:38:06.3336855Z aten = torch.ops.aten 2023-01-11T21:38:06.3336993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3337087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3337092Z 2023-01-11T21:38:06.3337239Z import triton 2023-01-11T21:38:06.3337347Z import triton.language as tl 2023-01-11T21:38:06.3337484Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3337623Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3337629Z 2023-01-11T21:38:06.3337633Z 2023-01-11T21:38:06.3337826Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3337893Z import triton 2023-01-11T21:38:06.3337983Z import triton.language as tl 2023-01-11T21:38:06.3338099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3338201Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3338334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3338459Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3338464Z 2023-01-11T21:38:06.3338964Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3339041Z @triton.jit 2023-01-11T21:38:06.3339222Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3339289Z xnumel = 10 2023-01-11T21:38:06.3339387Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3339513Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3339599Z xmask = xindex < xnumel 2023-01-11T21:38:06.3339670Z x0 = xindex 2023-01-11T21:38:06.3339902Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3340088Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3340217Z tmp7 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3340313Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3340403Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3340491Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3340569Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3340823Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3340901Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3341144Z tmp9 = tl.where((tmp7 
< 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3341307Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3341441Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3341574Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3341710Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3341840Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3341928Z ''') 2023-01-11T21:38:06.3341934Z 2023-01-11T21:38:06.3341938Z 2023-01-11T21:38:06.3342036Z async_compile.wait(globals()) 2023-01-11T21:38:06.3342108Z del async_compile 2023-01-11T21:38:06.3342119Z 2023-01-11T21:38:06.3342189Z def call(args): 2023-01-11T21:38:06.3342271Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3342347Z args.clear() 2023-01-11T21:38:06.3342443Z with torch.cuda.device(0): 2023-01-11T21:38:06.3342642Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3342840Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343036Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343230Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3343427Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3343522Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3343704Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3343779Z del arg0_1 2023-01-11T21:38:06.3343854Z del arg1_1 2023-01-11T21:38:06.3343959Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3343964Z 2023-01-11T21:38:06.3343968Z 2023-01-11T21:38:06.3344049Z if __name__ == "__main__": 2023-01-11T21:38:06.3344165Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3344294Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3344485Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3344680Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3344833Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3344839Z 2023-01-11T21:38:06.3344844Z 2023-01-11T21:38:06.3344943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3345020Z import torch 2023-01-11T21:38:06.3345099Z import random 2023-01-11T21:38:06.3345236Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3345382Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3345389Z 2023-01-11T21:38:06.3345473Z aten = torch.ops.aten 2023-01-11T21:38:06.3345610Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3345708Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3345716Z 2023-01-11T21:38:06.3345792Z import triton 2023-01-11T21:38:06.3345886Z import triton.language as tl 2023-01-11T21:38:06.3346005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3346148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3346153Z 2023-01-11T21:38:06.3346160Z 2023-01-11T21:38:06.3346351Z triton_fused_div_div_1_div_2_div_3_div_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.3346428Z import triton 
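# Kernel source handed to AsyncCompile: the @pointwise decorator below carries the pointer signature and divisible_by_16 hints that inductor uses to specialize and autotune the launch.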
2023-01-11T21:38:06.3346523Z import triton.language as tl 2023-01-11T21:38:06.3346637Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3346740Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3346875Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3346995Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3347000Z 2023-01-11T21:38:06.3347462Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*i64', 4: '*i64', 5: '*fp32', 6: '*i64', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.3347566Z @triton.jit 2023-01-11T21:38:06.3347748Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3347825Z xnumel = 10 2023-01-11T21:38:06.3347925Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3348056Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3348139Z xmask = xindex < xnumel 2023-01-11T21:38:06.3348206Z x0 = xindex 2023-01-11T21:38:06.3348440Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3348630Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3348762Z tmp7 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3348864Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3348954Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3349040Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.3349121Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.3349370Z tmp5 = tl.where((tmp0 < 0) != (tmp2 < 0), tl.where(tmp0 % tmp2 != 0, tmp0 // tmp2 - 1, tmp0 // tmp2), tmp0 // tmp2) 2023-01-11T21:38:06.3349454Z tmp6 = tmp0 // tmp2 2023-01-11T21:38:06.3349706Z tmp9 = tl.where((tmp7 < 0) != (tmp8 < 0), tl.where(tmp7 % tmp8 != 0, tmp7 // tmp8 - 1, tmp7 // tmp8), tmp7 // tmp8) 2023-01-11T21:38:06.3349840Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3349970Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3350103Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3350234Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3350366Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3350447Z ''') 2023-01-11T21:38:06.3350453Z 2023-01-11T21:38:06.3350457Z 2023-01-11T21:38:06.3350553Z async_compile.wait(globals()) 2023-01-11T21:38:06.3350630Z del async_compile 2023-01-11T21:38:06.3350636Z 2023-01-11T21:38:06.3350749Z def call(args): 2023-01-11T21:38:06.3350831Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3350906Z args.clear() 2023-01-11T21:38:06.3350996Z with torch.cuda.device(0): 2023-01-11T21:38:06.3351186Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3351378Z buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3351569Z buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3351764Z buf3 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3351958Z buf4 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3352053Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.3352228Z triton_fused_div_div_1_div_2_div_3_div_4_0.run(arg0_1, arg1_1, buf0, buf1, buf2, buf3, buf4, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3352302Z del arg0_1 2023-01-11T21:38:06.3352370Z del arg1_1 2023-01-11T21:38:06.3352476Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.3352481Z 2023-01-11T21:38:06.3352485Z 2023-01-11T21:38:06.3352562Z if __name__ == "__main__": 2023-01-11T21:38:06.3352679Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3352804Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3352992Z arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3353183Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3353302Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3353348Z 2023-01-11T21:38:06.3353412Z ok (1.371s) 2023-01-11T21:38:06.3353926Z test_dropout_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.3354006Z warnings.warn( 2023-01-11T21:38:06.3354262Z [2023-01-11 21:34:33,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 507 2023-01-11T21:38:06.3354514Z [2023-01-11 21:34:33,639] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3354779Z [2023-01-11 21:34:33,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 507 2023-01-11T21:38:06.3355027Z [2023-01-11 21:34:34,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 508 2023-01-11T21:38:06.3355321Z [2023-01-11 21:34:34,025] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3355587Z [2023-01-11 21:34:34,033] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 508 2023-01-11T21:38:06.3355593Z 2023-01-11T21:38:06.3355695Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3355764Z import torch 2023-01-11T21:38:06.3355839Z import random 2023-01-11T21:38:06.3355957Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3356078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3356083Z 2023-01-11T21:38:06.3356163Z aten = torch.ops.aten 2023-01-11T21:38:06.3356300Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3356396Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3356401Z 2023-01-11T21:38:06.3356469Z import triton 2023-01-11T21:38:06.3356560Z import triton.language as tl 2023-01-11T21:38:06.3356684Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3356827Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3356992Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3356998Z 2023-01-11T21:38:06.3357002Z 2023-01-11T21:38:06.3357184Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3357260Z import triton 2023-01-11T21:38:06.3357352Z import triton.language as tl 2023-01-11T21:38:06.3357458Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3357559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3357692Z 
from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3357817Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3357822Z 2023-01-11T21:38:06.3358236Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3358314Z @triton.jit 2023-01-11T21:38:06.3358454Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3358527Z xnumel = 1000 2023-01-11T21:38:06.3358617Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3358746Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3358828Z xmask = xindex < xnumel 2023-01-11T21:38:06.3358899Z x0 = xindex 2023-01-11T21:38:06.3359028Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3359124Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3359193Z tmp1 = x0 2023-01-11T21:38:06.3359273Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3359345Z tmp3 = 0.5 2023-01-11T21:38:06.3359422Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3359508Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3359584Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3359682Z tmp8 = 2.0 2023-01-11T21:38:06.3359752Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3359885Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3359970Z ''') 2023-01-11T21:38:06.3359976Z 2023-01-11T21:38:06.3359980Z 2023-01-11T21:38:06.3360073Z async_compile.wait(globals()) 2023-01-11T21:38:06.3360150Z del async_compile 2023-01-11T21:38:06.3360155Z 2023-01-11T21:38:06.3360229Z def call(args): 2023-01-11T21:38:06.3360300Z arg0_1, = args 2023-01-11T21:38:06.3360374Z args.clear() 2023-01-11T21:38:06.3360500Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3360590Z with torch.cuda.device(0): 2023-01-11T21:38:06.3360789Z buf0 = empty_strided((1000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3360882Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3361033Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.3361109Z del arg0_1 2023-01-11T21:38:06.3361186Z return (buf0, ) 2023-01-11T21:38:06.3361191Z 2023-01-11T21:38:06.3361195Z 2023-01-11T21:38:06.3361272Z if __name__ == "__main__": 2023-01-11T21:38:06.3361384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3361514Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3361710Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3361908Z arg0_1 = rand_strided((1000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3362020Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3362026Z 2023-01-11T21:38:06.3362030Z 2023-01-11T21:38:06.3362129Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3362201Z import torch 2023-01-11T21:38:06.3362274Z import random 2023-01-11T21:38:06.3362385Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3362507Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3362516Z 2023-01-11T21:38:06.3362595Z aten = torch.ops.aten 2023-01-11T21:38:06.3362730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3362824Z async_compile = AsyncCompile() 
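# The dropout lowering dumped above (and repeated below) draws per-element uniforms with tl.rand(seed, index) from a single int64 seed tensor (seed_cuda_0), keeps elements where the draw exceeds p, and rescales by 1/(1-p). A minimal eager-mode sketch of the same computation, assuming p = 0.5 (hence the 2.0 scale in the kernel):
#   keep = (torch.rand_like(x) > 0.5).to(x.dtype)
#   out = keep * x * 2.0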
2023-01-11T21:38:06.3362829Z 2023-01-11T21:38:06.3362899Z import triton 2023-01-11T21:38:06.3363020Z import triton.language as tl 2023-01-11T21:38:06.3363139Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3363279Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3363442Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3363447Z 2023-01-11T21:38:06.3363451Z 2023-01-11T21:38:06.3363604Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3363675Z import triton 2023-01-11T21:38:06.3363766Z import triton.language as tl 2023-01-11T21:38:06.3363878Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3363981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3364107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3364230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3364235Z 2023-01-11T21:38:06.3364648Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3364723Z @triton.jit 2023-01-11T21:38:06.3364862Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3364936Z xnumel = 1000 2023-01-11T21:38:06.3365032Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3365160Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3365236Z xmask = xindex < xnumel 2023-01-11T21:38:06.3365306Z x0 = xindex 2023-01-11T21:38:06.3365431Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3365554Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3365623Z tmp1 = x0 2023-01-11T21:38:06.3365711Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3365780Z tmp3 = 0.5 2023-01-11T21:38:06.3365851Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3365939Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3366014Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3366085Z tmp8 = 2.0 2023-01-11T21:38:06.3366161Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3366293Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3366378Z ''') 2023-01-11T21:38:06.3366384Z 2023-01-11T21:38:06.3366389Z 2023-01-11T21:38:06.3366473Z async_compile.wait(globals()) 2023-01-11T21:38:06.3366549Z del async_compile 2023-01-11T21:38:06.3366554Z 2023-01-11T21:38:06.3366628Z def call(args): 2023-01-11T21:38:06.3366700Z arg0_1, = args 2023-01-11T21:38:06.3366775Z args.clear() 2023-01-11T21:38:06.3366910Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3367000Z with torch.cuda.device(0): 2023-01-11T21:38:06.3367195Z buf0 = empty_strided((1000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3367287Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3367442Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.3367515Z del arg0_1 2023-01-11T21:38:06.3367592Z return (buf0, ) 2023-01-11T21:38:06.3367597Z 2023-01-11T21:38:06.3367602Z 2023-01-11T21:38:06.3367680Z if __name__ == "__main__": 2023-01-11T21:38:06.3367800Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3367925Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3368110Z seed_cuda_0 = rand_strided((), 
(), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3368311Z arg0_1 = rand_strided((1000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3368426Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3368432Z 2023-01-11T21:38:06.3368501Z ok (0.588s) 2023-01-11T21:38:06.3368843Z test_dropout_deterministic_cuda (__main__.CudaTests) ... [2023-01-11 21:34:34,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 509 2023-01-11T21:38:06.3369125Z [2023-01-11 21:34:34,227] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3369390Z [2023-01-11 21:34:34,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 509 2023-01-11T21:38:06.3369641Z [2023-01-11 21:34:34,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 510 2023-01-11T21:38:06.3369892Z [2023-01-11 21:34:34,430] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.3370144Z [2023-01-11 21:34:34,438] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 510 2023-01-11T21:38:06.3370160Z 2023-01-11T21:38:06.3370251Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3370326Z import torch 2023-01-11T21:38:06.3370401Z import random 2023-01-11T21:38:06.3370520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3370647Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3370652Z 2023-01-11T21:38:06.3370733Z aten = torch.ops.aten 2023-01-11T21:38:06.3370871Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3370958Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3370963Z 2023-01-11T21:38:06.3371035Z import triton 2023-01-11T21:38:06.3371127Z import triton.language as tl 2023-01-11T21:38:06.3371254Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3371391Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3371555Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3371586Z 2023-01-11T21:38:06.3371591Z 2023-01-11T21:38:06.3371747Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3371820Z import triton 2023-01-11T21:38:06.3371905Z import triton.language as tl 2023-01-11T21:38:06.3372021Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3372124Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3372254Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3372379Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3372384Z 2023-01-11T21:38:06.3372801Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3372871Z @triton.jit 2023-01-11T21:38:06.3373013Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3373084Z xnumel = 1024 2023-01-11T21:38:06.3373179Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3373309Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3373392Z xmask = xindex < xnumel 2023-01-11T21:38:06.3373463Z x0 = xindex 2023-01-11T21:38:06.3373591Z tmp0 = tl.load(seed0 + (0 + 
tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3373688Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3373752Z tmp1 = x0 2023-01-11T21:38:06.3373840Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3373911Z tmp3 = 0.55 2023-01-11T21:38:06.3373989Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3374077Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3374153Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3374232Z tmp8 = 2.2222222222222223 2023-01-11T21:38:06.3374302Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3374439Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3374638Z ''') 2023-01-11T21:38:06.3374644Z 2023-01-11T21:38:06.3374648Z 2023-01-11T21:38:06.3374739Z async_compile.wait(globals()) 2023-01-11T21:38:06.3374815Z del async_compile 2023-01-11T21:38:06.3374820Z 2023-01-11T21:38:06.3374893Z def call(args): 2023-01-11T21:38:06.3374966Z arg0_1, = args 2023-01-11T21:38:06.3375034Z args.clear() 2023-01-11T21:38:06.3375214Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3375306Z with torch.cuda.device(0): 2023-01-11T21:38:06.3375506Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3375598Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3375750Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3375822Z del arg0_1 2023-01-11T21:38:06.3375892Z return (buf0, ) 2023-01-11T21:38:06.3375905Z 2023-01-11T21:38:06.3375910Z 2023-01-11T21:38:06.3375982Z if __name__ == "__main__": 2023-01-11T21:38:06.3376103Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3376227Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3376424Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3376628Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3376742Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3376747Z 2023-01-11T21:38:06.3376751Z 2023-01-11T21:38:06.3376849Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3376921Z import torch 2023-01-11T21:38:06.3376988Z import random 2023-01-11T21:38:06.3377105Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3377304Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3377310Z 2023-01-11T21:38:06.3377404Z aten = torch.ops.aten 2023-01-11T21:38:06.3377550Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3377687Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3377692Z 2023-01-11T21:38:06.3377767Z import triton 2023-01-11T21:38:06.3377855Z import triton.language as tl 2023-01-11T21:38:06.3377979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3378120Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3378291Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.3378296Z 2023-01-11T21:38:06.3378300Z 2023-01-11T21:38:06.3378460Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3378535Z import triton 2023-01-11T21:38:06.3378628Z import triton.language as tl 2023-01-11T21:38:06.3378748Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3378845Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3378979Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3379107Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3379115Z 2023-01-11T21:38:06.3379537Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3379612Z @triton.jit 2023-01-11T21:38:06.3379756Z def triton_(seed0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3379833Z xnumel = 1024 2023-01-11T21:38:06.3379933Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3380058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3380142Z xmask = xindex < xnumel 2023-01-11T21:38:06.3380214Z x0 = xindex 2023-01-11T21:38:06.3380345Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.3380443Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3380516Z tmp1 = x0 2023-01-11T21:38:06.3380606Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.3380677Z tmp3 = 0.55 2023-01-11T21:38:06.3380756Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.3380849Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.3380928Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3381009Z tmp8 = 2.2222222222222223 2023-01-11T21:38:06.3381090Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3381253Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3381334Z ''') 2023-01-11T21:38:06.3381339Z 2023-01-11T21:38:06.3381344Z 2023-01-11T21:38:06.3381438Z async_compile.wait(globals()) 2023-01-11T21:38:06.3381519Z del async_compile 2023-01-11T21:38:06.3381524Z 2023-01-11T21:38:06.3381600Z def call(args): 2023-01-11T21:38:06.3381675Z arg0_1, = args 2023-01-11T21:38:06.3381749Z args.clear() 2023-01-11T21:38:06.3381884Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.3381971Z with torch.cuda.device(0): 2023-01-11T21:38:06.3382175Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3382273Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3382425Z triton_fused_mul_1_0.run(seed_cuda_0, arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3382502Z del arg0_1 2023-01-11T21:38:06.3382581Z return (buf0, ) 2023-01-11T21:38:06.3382588Z 2023-01-11T21:38:06.3382593Z 2023-01-11T21:38:06.3382674Z if __name__ == "__main__": 2023-01-11T21:38:06.3382795Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3382916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3383110Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3383311Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3383428Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3383434Z 2023-01-11T21:38:06.3383506Z ok (0.401s) 2023-01-11T21:38:06.3383847Z test_dtype_mismatch_issue_cuda (__main__.CudaTests) ... 
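(The scale constants in the dropout dumps above follow directly from the dropout probability: for p = 0.5 the keep-rescale is 1/(1-0.5) = 2.0, and for p = 0.55 it is 1/(1-0.55) = 2.2222222222222223, matching tmp8 in each kernel.)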
[2023-01-11 21:34:34,583] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 511 2023-01-11T21:38:06.3384145Z [2023-01-11 21:34:34,615] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 511 2023-01-11T21:38:06.3384151Z 2023-01-11T21:38:06.3384255Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3384332Z import torch 2023-01-11T21:38:06.3384402Z import random 2023-01-11T21:38:06.3384521Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3384646Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3384651Z 2023-01-11T21:38:06.3384737Z aten = torch.ops.aten 2023-01-11T21:38:06.3384875Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3384971Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3384976Z 2023-01-11T21:38:06.3385051Z import triton 2023-01-11T21:38:06.3385141Z import triton.language as tl 2023-01-11T21:38:06.3385294Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3385458Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3385464Z 2023-01-11T21:38:06.3385469Z 2023-01-11T21:38:06.3385610Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.3385823Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.3385948Z extern "C" void kernel(float* __restrict__ in_out_ptr0, 2023-01-11T21:38:06.3386061Z const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.3386169Z float* __restrict__ out_ptr0, 2023-01-11T21:38:06.3386265Z float* __restrict__ out_ptr2) 2023-01-11T21:38:06.3386333Z { 2023-01-11T21:38:06.3386425Z auto out_ptr1 = in_out_ptr0; 2023-01-11T21:38:06.3386528Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.3386598Z { 2023-01-11T21:38:06.3386682Z #pragma omp for 2023-01-11T21:38:06.3386777Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3386840Z { 2023-01-11T21:38:06.3386908Z { 2023-01-11T21:38:06.3386978Z { 2023-01-11T21:38:06.3387255Z float tmp5 = -std::numeric_limits<float>::infinity(); 2023-01-11T21:38:06.3387354Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:06.3387458Z { 2023-01-11T21:38:06.3387534Z { 2023-01-11T21:38:06.3387643Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:06.3387757Z auto tmp1 = static_cast<long>(63); 2023-01-11T21:38:06.3387860Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.3387956Z float tmp3 = 0.0; 2023-01-11T21:38:06.3388040Z if(tmp2) 2023-01-11T21:38:06.3388117Z { 2023-01-11T21:38:06.3388227Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:06.3388317Z tmp3 = tmp4; 2023-01-11T21:38:06.3388396Z } 2023-01-11T21:38:06.3388507Z tmp5 = std::max(tmp5, tmp3); 2023-01-11T21:38:06.3388582Z } 2023-01-11T21:38:06.3388655Z } 2023-01-11T21:38:06.3388749Z out_ptr0[i0] = tmp5; 2023-01-11T21:38:06.3388823Z } 2023-01-11T21:38:06.3388885Z } 2023-01-11T21:38:06.3388954Z } 2023-01-11T21:38:06.3389039Z #pragma omp for 2023-01-11T21:38:06.3389127Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3389194Z { 2023-01-11T21:38:06.3389263Z { 2023-01-11T21:38:06.3389333Z { 2023-01-11T21:38:06.3389415Z float tmp8 = 0; 2023-01-11T21:38:06.3389512Z for(long i1=0; i1<64; i1+=1) 2023-01-11T21:38:06.3389586Z { 2023-01-11T21:38:06.3389660Z { 2023-01-11T21:38:06.3389794Z auto tmp5 = out_ptr0[i0]; 2023-01-11T21:38:06.3389908Z auto tmp0 = static_cast<long>(i1); 2023-01-11T21:38:06.3390023Z auto tmp1 = static_cast<long>(63); 2023-01-11T21:38:06.3390119Z auto tmp2 = tmp0 < tmp1; 2023-01-11T21:38:06.3390216Z float tmp3 = 0.0;
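// i1 iterates 64 padded columns but only the first 63 exist in the (128, 32, 63) input; the guard below leaves tmp3 at 0.0 for the padded column.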
2023-01-11T21:38:06.3390299Z if(tmp2) 2023-01-11T21:38:06.3390377Z { 2023-01-11T21:38:06.3390488Z auto tmp4 = in_ptr0[i1 + (63*i0)]; 2023-01-11T21:38:06.3390580Z tmp3 = tmp4; 2023-01-11T21:38:06.3390659Z } 2023-01-11T21:38:06.3390801Z auto tmp6 = tmp3 - tmp5; 2023-01-11T21:38:06.3390911Z auto tmp7 = std::exp(tmp6); 2023-01-11T21:38:06.3391018Z out_ptr1[i1 + (64*i0)] = tmp7; 2023-01-11T21:38:06.3391114Z tmp8 += tmp7; 2023-01-11T21:38:06.3391189Z } 2023-01-11T21:38:06.3391261Z } 2023-01-11T21:38:06.3391354Z out_ptr2[i0] = tmp8; 2023-01-11T21:38:06.3391418Z } 2023-01-11T21:38:06.3391487Z } 2023-01-11T21:38:06.3391558Z } 2023-01-11T21:38:06.3391641Z #pragma omp for 2023-01-11T21:38:06.3391729Z for(long i0=0; i0<4096; i0+=1) 2023-01-11T21:38:06.3391799Z { 2023-01-11T21:38:06.3391883Z for(long i1=0; i1<8; i1+=1) 2023-01-11T21:38:06.3391955Z { 2023-01-11T21:38:06.3392108Z auto tmp0 = at::vec::Vectorized<float>::loadu(out_ptr1 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.3392243Z auto tmp1 = at::vec::Vectorized<float>(out_ptr2[i0]); 2023-01-11T21:38:06.3392338Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3392452Z tmp2.store(in_out_ptr0 + (8*i1) + (64*i0)); 2023-01-11T21:38:06.3392523Z } 2023-01-11T21:38:06.3392628Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.3392714Z for(long i1=64; i1<64; i1+=1) 2023-01-11T21:38:06.3392783Z { 2023-01-11T21:38:06.3392886Z auto tmp0 = out_ptr1[i1 + (64*i0)]; 2023-01-11T21:38:06.3393008Z auto tmp1 = out_ptr2[i0]; 2023-01-11T21:38:06.3393104Z auto tmp2 = tmp0 / tmp1; 2023-01-11T21:38:06.3393205Z in_out_ptr0[i1 + (64*i0)] = tmp2; 2023-01-11T21:38:06.3393275Z } 2023-01-11T21:38:06.3393337Z } 2023-01-11T21:38:06.3393403Z } 2023-01-11T21:38:06.3393469Z } 2023-01-11T21:38:06.3393554Z ''') 2023-01-11T21:38:06.3393560Z 2023-01-11T21:38:06.3393564Z 2023-01-11T21:38:06.3393661Z async_compile.wait(globals()) 2023-01-11T21:38:06.3393739Z del async_compile 2023-01-11T21:38:06.3393744Z 2023-01-11T21:38:06.3393820Z def call(args): 2023-01-11T21:38:06.3393889Z arg0_1, = args 2023-01-11T21:38:06.3393971Z args.clear() 2023-01-11T21:38:06.3394183Z buf0 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394396Z buf1 = empty_strided((128, 32, 64), (2048, 64, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394612Z buf2 = empty_strided((128, 32, 1), (32, 1, 4096), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3394706Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.3394902Z kernel_cpp_0(c_void_p(buf3.data_ptr()), c_void_p(arg0_1.data_ptr()), c_void_p(buf0.data_ptr()), c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.3394977Z del arg0_1 2023-01-11T21:38:06.3395048Z return (buf3, ) 2023-01-11T21:38:06.3395053Z 2023-01-11T21:38:06.3395058Z 2023-01-11T21:38:06.3395149Z if __name__ == "__main__": 2023-01-11T21:38:06.3395287Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3395440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3395653Z arg0_1 = rand_strided((128, 32, 63), (2016, 63, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.3395810Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3395815Z 2023-01-11T21:38:06.3395884Z ok (0.056s) 2023-01-11T21:38:06.3396345Z test_elu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3396477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3396732Z [2023-01-11 21:34:34,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 512 2023-01-11T21:38:06.3396998Z [2023-01-11 21:34:34,821] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 512 2023-01-11T21:38:06.3397420Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3397554Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3397810Z [2023-01-11 21:34:34,884] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 513 2023-01-11T21:38:06.3398075Z [2023-01-11 21:34:35,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 513 2023-01-11T21:38:06.3398080Z 2023-01-11T21:38:06.3398181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3398257Z import torch 2023-01-11T21:38:06.3398333Z import random 2023-01-11T21:38:06.3398448Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3398576Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3398581Z 2023-01-11T21:38:06.3398666Z aten = torch.ops.aten 2023-01-11T21:38:06.3398803Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3398901Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3398906Z 2023-01-11T21:38:06.3399010Z import triton 2023-01-11T21:38:06.3399107Z import triton.language as tl 2023-01-11T21:38:06.3399235Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3399368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3399373Z 2023-01-11T21:38:06.3399384Z 2023-01-11T21:38:06.3399544Z triton_fused_add_where_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3399621Z import triton 2023-01-11T21:38:06.3399716Z import triton.language as tl 2023-01-11T21:38:06.3399830Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3399936Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3400072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3400195Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3400200Z 2023-01-11T21:38:06.3400616Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3400690Z @triton.jit 2023-01-11T21:38:06.3400831Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3400904Z xnumel = 256 2023-01-11T21:38:06.3401000Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3401129Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3401213Z xmask = xindex < xnumel 2023-01-11T21:38:06.3401284Z x0 = xindex 2023-01-11T21:38:06.3401468Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3401598Z 
tmp13 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3401671Z tmp1 = 0 2023-01-11T21:38:06.3401752Z tmp2 = tmp0 > tmp1 2023-01-11T21:38:06.3401831Z tmp3 = 1.0507009873554805 2023-01-11T21:38:06.3401907Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.3401972Z tmp5 = 1.0 2023-01-11T21:38:06.3402051Z tmp6 = tmp0 * tmp5 2023-01-11T21:38:06.3402150Z tmp7 = tl.libdevice.expm1(tmp6) 2023-01-11T21:38:06.3402229Z tmp8 = 1.7580993408473766 2023-01-11T21:38:06.3402307Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.3402404Z tmp10 = tl.where(tmp2, tmp4, tmp9) 2023-01-11T21:38:06.3402476Z tmp11 = 2 2023-01-11T21:38:06.3402549Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.3402622Z tmp14 = 1 2023-01-11T21:38:06.3402700Z tmp15 = tmp13 + tmp14 2023-01-11T21:38:06.3402780Z tmp16 = tmp15 > tmp1 2023-01-11T21:38:06.3402851Z tmp17 = 3 2023-01-11T21:38:06.3402930Z tmp18 = tmp15 * tmp17 2023-01-11T21:38:06.3403002Z tmp19 = 4 2023-01-11T21:38:06.3403077Z tmp20 = tmp15 * tmp19 2023-01-11T21:38:06.3403176Z tmp21 = tl.libdevice.expm1(tmp20) 2023-01-11T21:38:06.3403249Z tmp22 = 6 2023-01-11T21:38:06.3403327Z tmp23 = tmp21 * tmp22 2023-01-11T21:38:06.3403428Z tmp24 = tl.where(tmp16, tmp18, tmp23) 2023-01-11T21:38:06.3403566Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3403699Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3403777Z ''') 2023-01-11T21:38:06.3403782Z 2023-01-11T21:38:06.3403786Z 2023-01-11T21:38:06.3403879Z async_compile.wait(globals()) 2023-01-11T21:38:06.3403955Z del async_compile 2023-01-11T21:38:06.3403960Z 2023-01-11T21:38:06.3404041Z def call(args): 2023-01-11T21:38:06.3404113Z arg0_1, = args 2023-01-11T21:38:06.3404184Z args.clear() 2023-01-11T21:38:06.3404274Z with torch.cuda.device(0): 2023-01-11T21:38:06.3404470Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3404673Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3404764Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3404915Z triton_fused_add_where_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3405014Z del arg0_1 2023-01-11T21:38:06.3405098Z return (buf0, buf1, ) 2023-01-11T21:38:06.3405103Z 2023-01-11T21:38:06.3405107Z 2023-01-11T21:38:06.3405185Z if __name__ == "__main__": 2023-01-11T21:38:06.3405301Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3405419Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3405622Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3405740Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3405745Z 2023-01-11T21:38:06.3405749Z 2023-01-11T21:38:06.3405845Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3405921Z import torch 2023-01-11T21:38:06.3405994Z import random 2023-01-11T21:38:06.3406112Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3406236Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3406241Z 2023-01-11T21:38:06.3406315Z aten = torch.ops.aten 2023-01-11T21:38:06.3406452Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3406547Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3406552Z 2023-01-11T21:38:06.3406623Z import triton 2023-01-11T21:38:06.3406714Z import triton.language as tl 2023-01-11T21:38:06.3406838Z from torch._inductor.triton_ops.autotune import grid 
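# Second compilation of the same fusion, this time for float16 inputs: the loads below are promoted with .to(tl.float32) so the math runs in float32. The constants in both kernels are the SELU parameters, scale = 1.0507009873554805 and scale * alpha = 1.7580993408473766, i.e. x > 0 ? scale * x : scale * alpha * expm1(x).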
2023-01-11T21:38:06.3406981Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3406987Z 2023-01-11T21:38:06.3406991Z 2023-01-11T21:38:06.3407184Z triton_fused_add_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.3407252Z import triton 2023-01-11T21:38:06.3407343Z import triton.language as tl 2023-01-11T21:38:06.3407485Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3407586Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3407717Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3407841Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3407846Z 2023-01-11T21:38:06.3408267Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3408342Z @triton.jit 2023-01-11T21:38:06.3408478Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3408553Z xnumel = 256 2023-01-11T21:38:06.3408650Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3408781Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3408868Z xmask = xindex < xnumel 2023-01-11T21:38:06.3408938Z x0 = xindex 2023-01-11T21:38:06.3409151Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3409262Z tmp15 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3409350Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3409424Z tmp2 = 0 2023-01-11T21:38:06.3409502Z tmp3 = tmp1 > tmp2 2023-01-11T21:38:06.3409580Z tmp4 = 1.0507009873554805 2023-01-11T21:38:06.3409657Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.3409730Z tmp6 = 1.0 2023-01-11T21:38:06.3409800Z tmp7 = tmp1 * tmp6 2023-01-11T21:38:06.3409900Z tmp8 = tl.libdevice.expm1(tmp7) 2023-01-11T21:38:06.3409977Z tmp9 = 1.7580993408473766 2023-01-11T21:38:06.3410056Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.3410151Z tmp11 = tl.where(tmp3, tmp5, tmp10) 2023-01-11T21:38:06.3410239Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3410309Z tmp13 = 2 2023-01-11T21:38:06.3410386Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.3410454Z tmp16 = 1 2023-01-11T21:38:06.3410533Z tmp17 = tmp15 + tmp16 2023-01-11T21:38:06.3410623Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.3410700Z tmp19 = tmp18 > tmp2 2023-01-11T21:38:06.3410770Z tmp20 = 3 2023-01-11T21:38:06.3410842Z tmp21 = tmp18 * tmp20 2023-01-11T21:38:06.3410941Z tmp22 = 4 2023-01-11T21:38:06.3411022Z tmp23 = tmp18 * tmp22 2023-01-11T21:38:06.3411121Z tmp24 = tl.libdevice.expm1(tmp23) 2023-01-11T21:38:06.3411193Z tmp25 = 6 2023-01-11T21:38:06.3411270Z tmp26 = tmp24 * tmp25 2023-01-11T21:38:06.3411370Z tmp27 = tl.where(tmp19, tmp21, tmp26) 2023-01-11T21:38:06.3411451Z tmp28 = tmp27.to(tl.float32) 2023-01-11T21:38:06.3411586Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.3411720Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.3411804Z ''') 2023-01-11T21:38:06.3411809Z 2023-01-11T21:38:06.3411816Z 2023-01-11T21:38:06.3411908Z async_compile.wait(globals()) 2023-01-11T21:38:06.3411984Z del async_compile 2023-01-11T21:38:06.3411990Z 2023-01-11T21:38:06.3412064Z def call(args): 2023-01-11T21:38:06.3412138Z arg0_1, = args 2023-01-11T21:38:06.3412206Z args.clear() 
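    # Allocate the two fp16 output buffers and launch the fused kernel once on the current CUDA stream; both results come back from a single launch.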
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.442s)
test_embedding_bag_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 514
[2023-01-11 21:34:35,096] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag
[2023-01-11 21:34:35,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 514
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 515
[2023-01-11 21:34:35,136] torch._inductor.ir: [WARNING] Using FallbackKernel: aten._embedding_bag
[2023-01-11 21:34:35,138] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 515

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1)
        del arg0_1
        del arg1_1
        del arg2_1
        buf1 = buf0[0]
        assert_size_stride(buf1, (3, 4), (4, 1))
        buf2 = buf0[1]
        assert_size_stride(buf2, (8, ), (1, ))
        buf3 = buf0[2]
        assert_size_stride(buf3, (3, ), (1, ))
        buf4 = buf0[3]
        assert_size_stride(buf4, (0, ), (1, ))
        del buf0
        return (buf1, buf2, buf3, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = aten._embedding_bag(arg0_1, arg1_1, arg2_1)
        del arg0_1
        del arg1_1
        del arg2_1
        buf1 = buf0[0]
        assert_size_stride(buf1, (3, 4), (4, 1))
        buf2 = buf0[1]
        assert_size_stride(buf2, (8, ), (1, ))
        buf3 = buf0[2]
        assert_size_stride(buf3, (3, ), (1, ))
        buf4 = buf0[3]
        assert_size_stride(buf4, (0, ), (1, ))
        del buf0
        return (buf1, buf2, buf3, buf4, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.081s)
test_embedding_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 516
[2023-01-11 21:34:35,432] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 516
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 517
[2023-01-11 21:34:35,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 517

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_le_relu_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*i1', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4)
    x0 = xindex % 4
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x1), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (4*tmp0)), xmask)
    tmp2 = tl.where(0 != 0, 0, tl.where(0 > tmp1, 0, tmp1))
    tmp3 = 0
    tmp4 = tmp2 <= tmp3
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_convert_element_type_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tmp1.to(tl.int64)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_1, primals_2 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.bool)
        stream0 = get_cuda_stream(0)
        triton_fused_le_relu_0.run(primals_2, primals_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del primals_1
        buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.int64)
        triton_fused_convert_element_type_1_1.run(primals_2, buf2, 16, grid=grid(16), stream=stream0)
        del primals_2
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float32)
    primals_2 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([primals_1, primals_2]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_le_relu_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*i1', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x1 = (xindex // 4)
    x0 = xindex % 4
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x1), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (4*tmp0)), xmask).to(tl.float32)
    tmp2 = tl.where(0 != 0, 0, tl.where(0 > tmp1, 0, tmp1))
    tmp3 = tmp2.to(tl.float32)
    tmp4 = 0
    tmp5 = tmp3 <= tmp4
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


triton_fused_convert_element_type_5_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 16
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tmp1.to(tl.int64)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_1, primals_2 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 8, 4), (32, 4, 1), device='cuda', dtype=torch.bool)
        stream0 = get_cuda_stream(0)
        triton_fused_le_relu_0.run(primals_2, primals_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del primals_1
        buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.int64)
        triton_fused_convert_element_type_5_1.run(primals_2, buf2, 16, grid=grid(16), stream=stream0)
        del primals_2
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_1 = rand_strided((10, 4), (4, 1), device='cuda:0', dtype=torch.float16)
    primals_2 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([primals_1, primals_2]))

ok (0.511s)
test_exp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,668] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 518
[2023-01-11 21:34:35,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 518
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,762] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 519
[2023-01-11 21:34:35,835] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 519

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_exp_exp_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = tl.exp(tmp0)
    tmp4 = tmp2 + tmp3
    tmp5 = tl.exp(tmp4)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_exp_exp_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_exp_exp_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
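    # in_ptr0 is read a second time below (tmp2); both fp16 loads are widened
    # to fp32 with .to(tl.float32) so the two fused exponentials run in float32.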
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp1 = tl.exp(tmp0)
    tmp4 = tmp2 + tmp3
    tmp5 = tl.exp(tmp4)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_exp_exp_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.186s)
test_expand_as_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 520
[2023-01-11 21:34:35,949] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 520
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:35,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 521
[2023-01-11 21:34:36,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 521

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 76800
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x2 = (xindex // 12800)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_2_0.run(arg0_1, buf0, 76800, grid=grid(76800), stream=stream0)
        return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 76800
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x2 = (xindex // 12800)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (100*x2)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = tmp2 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
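    # The load index x0 + 100*x2 skips the middle (128-wide) dimension, so the
    # (6, 1, 100) input is broadcast across it; the two adds fuse input + 1 + 1.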
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((6, 128, 100), (12800, 100, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_2_0.run(arg0_1, buf0, 76800, grid=grid(76800), stream=stream0)
        return (as_strided(arg0_1, (6, 128, 100), (100, 0, 1)), buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((6, 1, 100), (100, 100, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((6, 128, 100), (12800, 100, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.226s)
test_expand_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:36,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 522
[2023-01-11 21:34:36,189] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 522
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:36,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 523
[2023-01-11 21:34:36,299] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 523

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 144
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 2
    x2 = (xindex // 6) % 2
    x4 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tmp3 = 2
    tmp4 = tmp2 + tmp3
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


triton_fused_add_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 24
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 2
    x2 = (xindex // 6) % 2
    x4 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last')
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_1_0.run(arg0_1, buf0, 144, grid=grid(144), stream=stream0)
        buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_2_1.run(arg0_1, buf1, 24, grid=grid(24), stream=stream0)
        return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
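# grid() (imported above) turns an element count into a Triton launch grid; the
# raw-stream helper imported next lets call() launch on PyTorch's current CUDA stream.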
2023-01-11T21:38:06.3494340Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3494350Z 2023-01-11T21:38:06.3494354Z 2023-01-11T21:38:06.3494797Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3494878Z import triton 2023-01-11T21:38:06.3494972Z import triton.language as tl 2023-01-11T21:38:06.3495086Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3495189Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3495321Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3495443Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3495448Z 2023-01-11T21:38:06.3495852Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3495923Z @triton.jit 2023-01-11T21:38:06.3496052Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3496128Z xnumel = 144 2023-01-11T21:38:06.3496225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3496355Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3496435Z xmask = xindex < xnumel 2023-01-11T21:38:06.3496510Z x0 = xindex % 2 2023-01-11T21:38:06.3496584Z x2 = (xindex // 6) % 2 2023-01-11T21:38:06.3496653Z x4 = xindex 2023-01-11T21:38:06.3496877Z tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3496946Z tmp1 = 1 2023-01-11T21:38:06.3497025Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3497097Z tmp3 = 2 2023-01-11T21:38:06.3497238Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.3497377Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3497504Z ''') 2023-01-11T21:38:06.3497510Z 2023-01-11T21:38:06.3497514Z 2023-01-11T21:38:06.3497672Z triton_fused_add_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.3497747Z import triton 2023-01-11T21:38:06.3497840Z import triton.language as tl 2023-01-11T21:38:06.3497961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3498067Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3498196Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3498324Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3498329Z 2023-01-11T21:38:06.3498733Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3498808Z @triton.jit 2023-01-11T21:38:06.3498941Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3499020Z xnumel = 24 2023-01-11T21:38:06.3499118Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3499251Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3499329Z xmask = xindex < xnumel 2023-01-11T21:38:06.3499410Z x0 = xindex % 2 2023-01-11T21:38:06.3499495Z x2 = (xindex // 6) % 2 2023-01-11T21:38:06.3499569Z x4 = xindex 2023-01-11T21:38:06.3499797Z tmp0 = tl.load(in_ptr0 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3499870Z tmp1 = 2 2023-01-11T21:38:06.3499951Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3500081Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, 
xmask) 2023-01-11T21:38:06.3500167Z ''') 2023-01-11T21:38:06.3500173Z 2023-01-11T21:38:06.3500177Z 2023-01-11T21:38:06.3500273Z async_compile.wait(globals()) 2023-01-11T21:38:06.3500352Z del async_compile 2023-01-11T21:38:06.3500357Z 2023-01-11T21:38:06.3500437Z def call(args): 2023-01-11T21:38:06.3500513Z arg0_1, = args 2023-01-11T21:38:06.3500587Z args.clear() 2023-01-11T21:38:06.3500681Z with torch.cuda.device(0): 2023-01-11T21:38:06.3500898Z buf0 = empty_strided((3, 4, 2, 3, 2), (48, 12, 6, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3501021Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3501162Z triton_fused_add_1_0.run(arg0_1, buf0, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.3501383Z buf1 = empty_strided((2, 1, 2, 3, 2), (12, 12, 6, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3501525Z triton_fused_add_2_1.run(arg0_1, buf1, 24, grid=grid(24), stream=stream0) 2023-01-11T21:38:06.3501651Z return (buf0, buf1, as_strided(arg0_1, (2, 2, 5, 2), (0, 2, 0, 1)), ) 2023-01-11T21:38:06.3501657Z 2023-01-11T21:38:06.3501661Z 2023-01-11T21:38:06.3501743Z if __name__ == "__main__": 2023-01-11T21:38:06.3501861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3501985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3502192Z arg0_1 = rand_strided((2, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3502306Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3502311Z 2023-01-11T21:38:06.3502385Z ok (0.237s) 2023-01-11T21:38:06.3502852Z test_expanded_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3502988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3503248Z [2023-01-11 21:34:36,316] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 524 2023-01-11T21:38:06.3503545Z [2023-01-11 21:34:36,597] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 524 2023-01-11T21:38:06.3503967Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3504102Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3504357Z [2023-01-11 21:34:36,613] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 525 2023-01-11T21:38:06.3504363Z 2023-01-11T21:38:06.3504456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3504531Z import torch 2023-01-11T21:38:06.3504605Z import random 2023-01-11T21:38:06.3504727Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3504861Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3504866Z 2023-01-11T21:38:06.3504950Z aten = torch.ops.aten 2023-01-11T21:38:06.3505089Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3505181Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3505191Z 2023-01-11T21:38:06.3505263Z import triton 2023-01-11T21:38:06.3505357Z import triton.language as tl 2023-01-11T21:38:06.3505509Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3505673Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3505680Z 2023-01-11T21:38:06.3505686Z 2023-01-11T21:38:06.3505851Z triton_fused_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3505927Z import triton 2023-01-11T21:38:06.3506026Z import triton.language as tl 2023-01-11T21:38:06.3506136Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3506239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3506377Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3506505Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3506510Z 2023-01-11T21:38:06.3506605Z @reduction(size_hints=[1024, 128], 2023-01-11T21:38:06.3506723Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.3506837Z filename=__file__, 2023-01-11T21:38:06.3507218Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3507289Z @triton.jit 2023-01-11T21:38:06.3507468Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3507545Z xnumel = 1024 2023-01-11T21:38:06.3507621Z rnumel = 99 2023-01-11T21:38:06.3507720Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3507865Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3507949Z xmask = xindex < xnumel 2023-01-11T21:38:06.3508064Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3508154Z x1 = (xindex // 256) 2023-01-11T21:38:06.3508232Z x0 = xindex % 256 2023-01-11T21:38:06.3508355Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3508428Z x3 = xindex 2023-01-11T21:38:06.3508535Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3508627Z rindex = roffset + rbase 2023-01-11T21:38:06.3508707Z rmask = rindex < rnumel 2023-01-11T21:38:06.3508782Z r2 = rindex 2023-01-11T21:38:06.3508866Z tmp0 = r2 + (99*x1) 2023-01-11T21:38:06.3508943Z tmp1 = 394 2023-01-11T21:38:06.3509026Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3509209Z tmp3 = tl.load(in_ptr0 + (x0 + (256*((r2 + (99*x1)) % 394)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3509425Z tmp4 = tl.load(in_ptr1 + 
(x0 + (256*(((r2 + (99*x1)) // 197) % 2)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0) 2023-01-11T21:38:06.3509502Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3509600Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3509725Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3509842Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3509945Z tl.store(out_ptr0 + x3, tmp7, xmask) 2023-01-11T21:38:06.3510032Z ''') 2023-01-11T21:38:06.3510038Z 2023-01-11T21:38:06.3510042Z 2023-01-11T21:38:06.3510207Z triton_fused_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3510282Z import triton 2023-01-11T21:38:06.3510370Z import triton.language as tl 2023-01-11T21:38:06.3510484Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3510589Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3510723Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3510857Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3510862Z 2023-01-11T21:38:06.3510953Z @reduction(size_hints=[256, 4], 2023-01-11T21:38:06.3511076Z reduction_hint=ReductionHint.OUTER_TINY, 2023-01-11T21:38:06.3511160Z filename=__file__, 2023-01-11T21:38:06.3511526Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3511601Z @triton.jit 2023-01-11T21:38:06.3511772Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3511849Z xnumel = 256 2023-01-11T21:38:06.3511926Z rnumel = 4 2023-01-11T21:38:06.3512023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3512162Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3512244Z xmask = xindex < xnumel 2023-01-11T21:38:06.3512367Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3512440Z x0 = xindex 2023-01-11T21:38:06.3512560Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3512670Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3512792Z rindex = roffset + rbase 2023-01-11T21:38:06.3512881Z rmask = rindex < rnumel 2023-01-11T21:38:06.3512948Z r1 = rindex 2023-01-11T21:38:06.3513067Z tmp0 = tl.load(in_ptr0 + (x0 + (256*r1)), rmask & xmask) 2023-01-11T21:38:06.3513189Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3513305Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3513410Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.3513497Z ''') 2023-01-11T21:38:06.3513503Z 2023-01-11T21:38:06.3513507Z 2023-01-11T21:38:06.3513602Z async_compile.wait(globals()) 2023-01-11T21:38:06.3513677Z del async_compile 2023-01-11T21:38:06.3513690Z 2023-01-11T21:38:06.3513759Z def call(args): 2023-01-11T21:38:06.3513842Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3513920Z args.clear() 2023-01-11T21:38:06.3514012Z with torch.cuda.device(0): 2023-01-11T21:38:06.3514222Z buf0 = empty_strided((256, 4), (1, 256), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3514315Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3514469Z triton_fused_mul_sum_1_0.run(arg0_1, arg1_1, buf0, 1024, 99, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3514538Z del arg0_1 2023-01-11T21:38:06.3514612Z del arg1_1 2023-01-11T21:38:06.3514814Z buf1 = empty_strided((256, ), (1, ), device='cuda', dtype=torch.float32) 
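# Editorial note (not part of the captured log): this wrapper implements a
# split ("two-stage") reduction. The first kernel wrote 4 fp32 partial sums
# per channel into buf0 of shape (256, 4); the launch below reduces those 4
# partials into the final (256,) result in buf1.
# A hedged eager-mode sketch of what graph 525 appears to compute, judging
# from the argument shapes in the __main__ harness at the bottom of this dump:
#   x = torch.randn(2, 197, 256, device="cuda")   # matches arg0_1
#   y = torch.randn(2, 1, 256, device="cuda")     # matches arg1_1
#   out = (x * y).sum(dim=(0, 1))                 # -> shape (256,), cf. buf1
# The 394 in the first kernel's bounds check is 2 * 197, the total number of
# elements reduced per output channel.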
2023-01-11T21:38:06.3514957Z triton_fused_mul_sum_1_1.run(buf0, buf1, 256, 4, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3515036Z return (buf1, ) 2023-01-11T21:38:06.3515041Z 2023-01-11T21:38:06.3515046Z 2023-01-11T21:38:06.3515171Z if __name__ == "__main__": 2023-01-11T21:38:06.3515308Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3515451Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3515676Z arg0_1 = rand_strided((2, 197, 256), (50432, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3515893Z arg1_1 = rand_strided((2, 1, 256), (256, 256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3516017Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3516023Z 2023-01-11T21:38:06.3516290Z [2023-01-11 21:34:36,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 525 2023-01-11T21:38:06.3516296Z 2023-01-11T21:38:06.3516395Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3516472Z import torch 2023-01-11T21:38:06.3516547Z import random 2023-01-11T21:38:06.3516668Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3516787Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3516796Z 2023-01-11T21:38:06.3516880Z aten = torch.ops.aten 2023-01-11T21:38:06.3517017Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3517115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3517120Z 2023-01-11T21:38:06.3517198Z import triton 2023-01-11T21:38:06.3517296Z import triton.language as tl 2023-01-11T21:38:06.3517423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3517558Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3517570Z 2023-01-11T21:38:06.3517574Z 2023-01-11T21:38:06.3517731Z triton_fused_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3517810Z import triton 2023-01-11T21:38:06.3517905Z import triton.language as tl 2023-01-11T21:38:06.3518024Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3518127Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3518261Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3518391Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3518396Z 2023-01-11T21:38:06.3518484Z @reduction(size_hints=[1024, 128], 2023-01-11T21:38:06.3518600Z reduction_hint=ReductionHint.OUTER, 2023-01-11T21:38:06.3518687Z filename=__file__, 2023-01-11T21:38:06.3519099Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3519177Z @triton.jit 2023-01-11T21:38:06.3519356Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3519434Z xnumel = 1024 2023-01-11T21:38:06.3519512Z rnumel = 99 2023-01-11T21:38:06.3519606Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3519743Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3519829Z xmask = xindex < xnumel 2023-01-11T21:38:06.3519950Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3520033Z x1 = (xindex // 256) 2023-01-11T21:38:06.3520111Z x0 = xindex % 256 2023-01-11T21:38:06.3520230Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 
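# Editorial note: although this variant of the kernel loads fp16 inputs
# (the '*fp16' pointers in the signature above, upcast via .to(tl.float32)
# below), the running accumulator _tmp7 is kept in fp32 to avoid precision
# loss across the 394-element reduction; the fp32 partials land in out_ptr0
# ('*fp32') and are only narrowed back to fp16 by the second-stage kernel.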
2023-01-11T21:38:06.3520298Z x3 = xindex 2023-01-11T21:38:06.3520404Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3520496Z rindex = roffset + rbase 2023-01-11T21:38:06.3520581Z rmask = rindex < rnumel 2023-01-11T21:38:06.3520654Z r2 = rindex 2023-01-11T21:38:06.3520737Z tmp0 = r2 + (99*x1) 2023-01-11T21:38:06.3520814Z tmp1 = 394 2023-01-11T21:38:06.3520891Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.3521090Z tmp3 = tl.load(in_ptr0 + (x0 + (256*((r2 + (99*x1)) % 394)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3521289Z tmp4 = tl.load(in_ptr1 + (x0 + (256*(((r2 + (99*x1)) // 197) % 2)) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.3521402Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3521497Z tmp6 = tl.where(tmp2, tmp5, 0) 2023-01-11T21:38:06.3521620Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.3521739Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3521832Z tl.store(out_ptr0 + x3, tmp7, xmask) 2023-01-11T21:38:06.3521920Z ''') 2023-01-11T21:38:06.3521925Z 2023-01-11T21:38:06.3521930Z 2023-01-11T21:38:06.3522096Z triton_fused_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3522172Z import triton 2023-01-11T21:38:06.3522266Z import triton.language as tl 2023-01-11T21:38:06.3522382Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3522486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3522621Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.3522744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3522750Z 2023-01-11T21:38:06.3522841Z @reduction(size_hints=[256, 4], 2023-01-11T21:38:06.3522965Z reduction_hint=ReductionHint.OUTER_TINY, 2023-01-11T21:38:06.3523051Z filename=__file__, 2023-01-11T21:38:06.3523415Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3523492Z @triton.jit 2023-01-11T21:38:06.3523663Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.3523739Z xnumel = 256 2023-01-11T21:38:06.3523806Z rnumel = 4 2023-01-11T21:38:06.3523905Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3524044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.3524129Z xmask = xindex < xnumel 2023-01-11T21:38:06.3524254Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.3524327Z x0 = xindex 2023-01-11T21:38:06.3524444Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.3524545Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.3524634Z rindex = roffset + rbase 2023-01-11T21:38:06.3524752Z rmask = rindex < rnumel 2023-01-11T21:38:06.3524827Z r1 = rindex 2023-01-11T21:38:06.3524945Z tmp0 = tl.load(in_ptr0 + (x0 + (256*r1)), rmask & xmask) 2023-01-11T21:38:06.3525069Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.3525183Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.3525276Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.3525363Z ''') 2023-01-11T21:38:06.3525369Z 2023-01-11T21:38:06.3525373Z 2023-01-11T21:38:06.3525469Z async_compile.wait(globals()) 2023-01-11T21:38:06.3525548Z del async_compile 2023-01-11T21:38:06.3525557Z 
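# Editorial note: call() below is Inductor's generated wrapper for the fp16
# variant of graph 525. It allocates the fp32 partial-sum buffer (256, 4),
# launches the two reduction kernels on the current CUDA stream, deletes the
# input references as soon as they are dead, and returns the fp16 result.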
2023-01-11T21:38:06.3525634Z def call(args): 2023-01-11T21:38:06.3525715Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3525797Z args.clear() 2023-01-11T21:38:06.3525884Z with torch.cuda.device(0): 2023-01-11T21:38:06.3526093Z buf0 = empty_strided((256, 4), (1, 256), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3526189Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3526343Z triton_fused_mul_sum_1_0.run(arg0_1, arg1_1, buf0, 1024, 99, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.3526417Z del arg0_1 2023-01-11T21:38:06.3526489Z del arg1_1 2023-01-11T21:38:06.3526691Z buf1 = empty_strided((256, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3526835Z triton_fused_mul_sum_1_1.run(buf0, buf1, 256, 4, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3526909Z return (buf1, ) 2023-01-11T21:38:06.3526914Z 2023-01-11T21:38:06.3526918Z 2023-01-11T21:38:06.3527000Z if __name__ == "__main__": 2023-01-11T21:38:06.3527209Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3527337Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3527560Z arg0_1 = rand_strided((2, 197, 256), (50432, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3527778Z arg1_1 = rand_strided((2, 1, 256), (256, 256, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3527903Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3527908Z 2023-01-11T21:38:06.3527982Z ok (0.442s) 2023-01-11T21:38:06.3528443Z test_expm1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3528571Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3528835Z [2023-01-11 21:34:36,758] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 526 2023-01-11T21:38:06.3529103Z [2023-01-11 21:34:36,900] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 526 2023-01-11T21:38:06.3529524Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3529657Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3529916Z [2023-01-11 21:34:36,916] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 527 2023-01-11T21:38:06.3530180Z [2023-01-11 21:34:36,924] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 527 2023-01-11T21:38:06.3530626Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3530761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3531017Z [2023-01-11 21:34:36,940] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 528 2023-01-11T21:38:06.3531283Z [2023-01-11 21:34:37,090] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 528 2023-01-11T21:38:06.3531696Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3531826Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3532086Z [2023-01-11 21:34:37,106] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 529 2023-01-11T21:38:06.3532092Z 2023-01-11T21:38:06.3532194Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3532270Z import torch 2023-01-11T21:38:06.3532349Z import random 2023-01-11T21:38:06.3532471Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3532598Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3532603Z 2023-01-11T21:38:06.3532686Z aten = torch.ops.aten 2023-01-11T21:38:06.3532819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3532918Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3532950Z 2023-01-11T21:38:06.3533027Z import triton 2023-01-11T21:38:06.3533123Z import triton.language as tl 2023-01-11T21:38:06.3533251Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3533393Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3533398Z 2023-01-11T21:38:06.3533405Z 2023-01-11T21:38:06.3533574Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3533652Z import triton 2023-01-11T21:38:06.3533740Z import triton.language as tl 2023-01-11T21:38:06.3533856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3533962Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3534098Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3534224Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3534230Z 2023-01-11T21:38:06.3534774Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3534856Z @triton.jit 2023-01-11T21:38:06.3535002Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3535071Z xnumel = 64 2023-01-11T21:38:06.3535173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3535308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3535393Z xmask = xindex < xnumel 2023-01-11T21:38:06.3535465Z x0 = xindex 2023-01-11T21:38:06.3535679Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3535799Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3535895Z tmp1 = tl.libdevice.expm1(tmp0) 
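# Editorial note: the fp16 inputs are upcast to fp32 before calling the
# libdevice expm1 intrinsic, which is more accurate than computing
# exp(x) - 1 directly for small x. tmp0 and tmp2 read the same address; the
# test apparently uses expm1(x) twice (once as-is, once scaled by 2 below),
# and codegen keeps both loads with different eviction hints rather than
# reusing a single value.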
2023-01-11T21:38:06.3535995Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3536067Z tmp4 = 2 2023-01-11T21:38:06.3536150Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3536289Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3536423Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3536509Z ''') 2023-01-11T21:38:06.3536515Z 2023-01-11T21:38:06.3536519Z 2023-01-11T21:38:06.3536659Z async_compile.wait(globals()) 2023-01-11T21:38:06.3536733Z del async_compile 2023-01-11T21:38:06.3536738Z 2023-01-11T21:38:06.3536814Z def call(args): 2023-01-11T21:38:06.3536888Z arg0_1, = args 2023-01-11T21:38:06.3536967Z args.clear() 2023-01-11T21:38:06.3537063Z with torch.cuda.device(0): 2023-01-11T21:38:06.3537337Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3537536Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3537625Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3537777Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3537857Z del arg0_1 2023-01-11T21:38:06.3537939Z return (buf0, buf1, ) 2023-01-11T21:38:06.3537944Z 2023-01-11T21:38:06.3537949Z 2023-01-11T21:38:06.3538028Z if __name__ == "__main__": 2023-01-11T21:38:06.3538146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3538278Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3538478Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3538587Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3538593Z 2023-01-11T21:38:06.3538597Z 2023-01-11T21:38:06.3538700Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3538775Z import torch 2023-01-11T21:38:06.3538851Z import random 2023-01-11T21:38:06.3538974Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3539104Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3539109Z 2023-01-11T21:38:06.3539232Z aten = torch.ops.aten 2023-01-11T21:38:06.3539370Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3539462Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3539466Z 2023-01-11T21:38:06.3539542Z import triton 2023-01-11T21:38:06.3539639Z import triton.language as tl 2023-01-11T21:38:06.3539769Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3539911Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3539917Z 2023-01-11T21:38:06.3539921Z 2023-01-11T21:38:06.3540088Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3540166Z import triton 2023-01-11T21:38:06.3540253Z import triton.language as tl 2023-01-11T21:38:06.3540369Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3540472Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3540606Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3540731Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3540739Z 2023-01-11T21:38:06.3541162Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3541240Z @triton.jit 2023-01-11T21:38:06.3541385Z def triton_(in_ptr0, out_ptr0, out_ptr1, 
xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3541466Z xnumel = 64 2023-01-11T21:38:06.3541558Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3541690Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3541778Z xmask = xindex < xnumel 2023-01-11T21:38:06.3541851Z x0 = xindex 2023-01-11T21:38:06.3542066Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3542188Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3542288Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3542384Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3542457Z tmp4 = 2 2023-01-11T21:38:06.3542538Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3542673Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3542839Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3542926Z ''') 2023-01-11T21:38:06.3542932Z 2023-01-11T21:38:06.3542936Z 2023-01-11T21:38:06.3543033Z async_compile.wait(globals()) 2023-01-11T21:38:06.3543105Z del async_compile 2023-01-11T21:38:06.3543111Z 2023-01-11T21:38:06.3543189Z def call(args): 2023-01-11T21:38:06.3543265Z arg0_1, = args 2023-01-11T21:38:06.3543343Z args.clear() 2023-01-11T21:38:06.3543437Z with torch.cuda.device(0): 2023-01-11T21:38:06.3543635Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3543831Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3543922Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3544068Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3544145Z del arg0_1 2023-01-11T21:38:06.3544233Z return (buf0, buf1, ) 2023-01-11T21:38:06.3544238Z 2023-01-11T21:38:06.3544242Z 2023-01-11T21:38:06.3544326Z if __name__ == "__main__": 2023-01-11T21:38:06.3544445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3544573Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3544775Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3544882Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3544893Z 2023-01-11T21:38:06.3544897Z 2023-01-11T21:38:06.3544990Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3545066Z import torch 2023-01-11T21:38:06.3545143Z import random 2023-01-11T21:38:06.3545264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3545442Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3545447Z 2023-01-11T21:38:06.3545535Z aten = torch.ops.aten 2023-01-11T21:38:06.3545690Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3545781Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3545789Z 2023-01-11T21:38:06.3545864Z import triton 2023-01-11T21:38:06.3545958Z import triton.language as tl 2023-01-11T21:38:06.3546085Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3546226Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3546231Z 2023-01-11T21:38:06.3546236Z 2023-01-11T21:38:06.3546401Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3546477Z import triton 2023-01-11T21:38:06.3546570Z import triton.language as tl 2023-01-11T21:38:06.3546678Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3546781Z from torch._inductor.ir import TileHint 
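# Editorial note: in the kernel below the actual element count is 201, but
# the autotuner's size_hints round it up to the next power of two (256);
# the xmask computed from "xindex < xnumel" keeps the padded tail lanes
# from reading or writing out of bounds.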
2023-01-11T21:38:06.3546918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3547046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3547051Z 2023-01-11T21:38:06.3547473Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3547548Z @triton.jit 2023-01-11T21:38:06.3547692Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3547769Z xnumel = 201 2023-01-11T21:38:06.3547862Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3547992Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3548076Z xmask = xindex < xnumel 2023-01-11T21:38:06.3548150Z x0 = xindex 2023-01-11T21:38:06.3548366Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3548488Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3548590Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3548683Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3548756Z tmp4 = 2 2023-01-11T21:38:06.3548838Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3549001Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3549138Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3549223Z ''') 2023-01-11T21:38:06.3549229Z 2023-01-11T21:38:06.3549234Z 2023-01-11T21:38:06.3549327Z async_compile.wait(globals()) 2023-01-11T21:38:06.3549403Z del async_compile 2023-01-11T21:38:06.3549408Z 2023-01-11T21:38:06.3549478Z def call(args): 2023-01-11T21:38:06.3549555Z arg0_1, = args 2023-01-11T21:38:06.3549631Z args.clear() 2023-01-11T21:38:06.3549723Z with torch.cuda.device(0): 2023-01-11T21:38:06.3549925Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3550132Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3550225Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3550371Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3550447Z del arg0_1 2023-01-11T21:38:06.3550530Z return (buf0, buf1, ) 2023-01-11T21:38:06.3550535Z 2023-01-11T21:38:06.3550539Z 2023-01-11T21:38:06.3550626Z if __name__ == "__main__": 2023-01-11T21:38:06.3550745Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3550872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3551073Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3551188Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3551193Z 2023-01-11T21:38:06.3551452Z [2023-01-11 21:34:37,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 529 2023-01-11T21:38:06.3551903Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3552040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3552302Z [2023-01-11 21:34:37,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 530 2023-01-11T21:38:06.3552570Z [2023-01-11 21:34:37,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 530 2023-01-11T21:38:06.3552990Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3553127Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3553386Z [2023-01-11 21:34:37,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 531 2023-01-11T21:38:06.3553646Z [2023-01-11 21:34:37,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 531 2023-01-11T21:38:06.3554059Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3554191Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3554443Z [2023-01-11 21:34:37,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 532 2023-01-11T21:38:06.3554449Z 2023-01-11T21:38:06.3554543Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3554619Z import torch 2023-01-11T21:38:06.3554723Z import random 2023-01-11T21:38:06.3554847Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3554973Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3554978Z 2023-01-11T21:38:06.3555063Z aten = torch.ops.aten 2023-01-11T21:38:06.3555202Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3555299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3555305Z 2023-01-11T21:38:06.3555393Z import triton 2023-01-11T21:38:06.3555495Z import triton.language as tl 2023-01-11T21:38:06.3555642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3555787Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3555792Z 2023-01-11T21:38:06.3555796Z 2023-01-11T21:38:06.3555963Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3556038Z import triton 2023-01-11T21:38:06.3556135Z import triton.language as tl 2023-01-11T21:38:06.3556247Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3556351Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3556486Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3556611Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3556616Z 2023-01-11T21:38:06.3557034Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3557110Z @triton.jit 2023-01-11T21:38:06.3557257Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3557360Z xnumel = 201 2023-01-11T21:38:06.3557454Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3557587Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3557671Z xmask = xindex < xnumel 2023-01-11T21:38:06.3557747Z x0 = xindex 2023-01-11T21:38:06.3557964Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3558086Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3558187Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3558288Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3558354Z tmp4 = 2 2023-01-11T21:38:06.3558438Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3558575Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3558709Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3558800Z ''') 2023-01-11T21:38:06.3558806Z 2023-01-11T21:38:06.3558810Z 2023-01-11T21:38:06.3558904Z async_compile.wait(globals()) 2023-01-11T21:38:06.3558985Z del async_compile 2023-01-11T21:38:06.3558990Z 2023-01-11T21:38:06.3559060Z def call(args): 2023-01-11T21:38:06.3559134Z arg0_1, = args 2023-01-11T21:38:06.3559208Z args.clear() 2023-01-11T21:38:06.3559305Z with torch.cuda.device(0): 2023-01-11T21:38:06.3559506Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3559706Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3559800Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3559945Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3560021Z del arg0_1 2023-01-11T21:38:06.3560107Z return (buf0, buf1, ) 2023-01-11T21:38:06.3560112Z 2023-01-11T21:38:06.3560117Z 2023-01-11T21:38:06.3560198Z if __name__ == "__main__": 2023-01-11T21:38:06.3560321Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3560449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3560650Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3560765Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3560799Z 2023-01-11T21:38:06.3560804Z 2023-01-11T21:38:06.3560903Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3560973Z import torch 2023-01-11T21:38:06.3561053Z import random 2023-01-11T21:38:06.3561173Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3561299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3561304Z 2023-01-11T21:38:06.3561388Z aten = torch.ops.aten 2023-01-11T21:38:06.3561525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3561621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3561626Z 2023-01-11T21:38:06.3561698Z import triton 2023-01-11T21:38:06.3561792Z import triton.language as tl 2023-01-11T21:38:06.3561919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3562061Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3562067Z 2023-01-11T21:38:06.3562071Z 2023-01-11T21:38:06.3562238Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3562314Z import triton 2023-01-11T21:38:06.3562408Z 
import triton.language as tl 2023-01-11T21:38:06.3562523Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3562621Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3562757Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3562884Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3562889Z 2023-01-11T21:38:06.3563310Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3563416Z @triton.jit 2023-01-11T21:38:06.3563559Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3563635Z xnumel = 64 2023-01-11T21:38:06.3563733Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3563861Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3563949Z xmask = xindex < xnumel 2023-01-11T21:38:06.3564023Z x0 = xindex 2023-01-11T21:38:06.3564215Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3564315Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3564417Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3564515Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3564582Z tmp4 = 2 2023-01-11T21:38:06.3564660Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3564798Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3564936Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3565023Z ''') 2023-01-11T21:38:06.3565029Z 2023-01-11T21:38:06.3565033Z 2023-01-11T21:38:06.3565129Z async_compile.wait(globals()) 2023-01-11T21:38:06.3565208Z del async_compile 2023-01-11T21:38:06.3565213Z 2023-01-11T21:38:06.3565293Z def call(args): 2023-01-11T21:38:06.3565363Z arg0_1, = args 2023-01-11T21:38:06.3565437Z args.clear() 2023-01-11T21:38:06.3565532Z with torch.cuda.device(0): 2023-01-11T21:38:06.3565729Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3565925Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3566021Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3566170Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3566239Z del arg0_1 2023-01-11T21:38:06.3566326Z return (buf0, buf1, ) 2023-01-11T21:38:06.3566331Z 2023-01-11T21:38:06.3566335Z 2023-01-11T21:38:06.3566416Z if __name__ == "__main__": 2023-01-11T21:38:06.3566537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3566665Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3566896Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3567010Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3567015Z 2023-01-11T21:38:06.3567020Z 2023-01-11T21:38:06.3567119Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3567189Z import torch 2023-01-11T21:38:06.3567265Z import random 2023-01-11T21:38:06.3567384Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3567512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3567517Z 2023-01-11T21:38:06.3567602Z aten = torch.ops.aten 2023-01-11T21:38:06.3567738Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3567838Z 
async_compile = AsyncCompile() 2023-01-11T21:38:06.3567844Z 2023-01-11T21:38:06.3567918Z import triton 2023-01-11T21:38:06.3568006Z import triton.language as tl 2023-01-11T21:38:06.3568131Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3568281Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3568286Z 2023-01-11T21:38:06.3568290Z 2023-01-11T21:38:06.3568456Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3568533Z import triton 2023-01-11T21:38:06.3568627Z import triton.language as tl 2023-01-11T21:38:06.3568742Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3568840Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3568975Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3569099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3569104Z 2023-01-11T21:38:06.3569528Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3569641Z @triton.jit 2023-01-11T21:38:06.3569788Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3569865Z xnumel = 64 2023-01-11T21:38:06.3569964Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3570089Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3570173Z xmask = xindex < xnumel 2023-01-11T21:38:06.3570247Z x0 = xindex 2023-01-11T21:38:06.3570462Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3570584Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3570687Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3570786Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3570856Z tmp4 = 2 2023-01-11T21:38:06.3570938Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3571075Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3571209Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3571295Z ''') 2023-01-11T21:38:06.3571300Z 2023-01-11T21:38:06.3571308Z 2023-01-11T21:38:06.3571403Z async_compile.wait(globals()) 2023-01-11T21:38:06.3571483Z del async_compile 2023-01-11T21:38:06.3571488Z 2023-01-11T21:38:06.3571566Z def call(args): 2023-01-11T21:38:06.3571635Z arg0_1, = args 2023-01-11T21:38:06.3571710Z args.clear() 2023-01-11T21:38:06.3571804Z with torch.cuda.device(0): 2023-01-11T21:38:06.3572003Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3572200Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3572294Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3572446Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3572515Z del arg0_1 2023-01-11T21:38:06.3572603Z return (buf0, buf1, ) 2023-01-11T21:38:06.3572608Z 2023-01-11T21:38:06.3572613Z 2023-01-11T21:38:06.3572694Z if __name__ == "__main__": 2023-01-11T21:38:06.3572840Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3572968Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3573167Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3573276Z print_performance(lambda: 
call([arg0_1])) 2023-01-11T21:38:06.3573282Z 2023-01-11T21:38:06.3573546Z [2023-01-11 21:34:37,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 532 2023-01-11T21:38:06.3573963Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3574099Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3574350Z [2023-01-11 21:34:37,476] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 533 2023-01-11T21:38:06.3574719Z [2023-01-11 21:34:37,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 533 2023-01-11T21:38:06.3575135Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3575268Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3575570Z [2023-01-11 21:34:37,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 534 2023-01-11T21:38:06.3575831Z [2023-01-11 21:34:37,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 534 2023-01-11T21:38:06.3576249Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3576379Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3576632Z [2023-01-11 21:34:37,654] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 535 2023-01-11T21:38:06.3576637Z 2023-01-11T21:38:06.3576738Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3576813Z import torch 2023-01-11T21:38:06.3576881Z import random 2023-01-11T21:38:06.3577003Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3577185Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3577191Z 2023-01-11T21:38:06.3577292Z aten = torch.ops.aten 2023-01-11T21:38:06.3577440Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3577536Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3577541Z 2023-01-11T21:38:06.3577613Z import triton 2023-01-11T21:38:06.3577698Z import triton.language as tl 2023-01-11T21:38:06.3577820Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3577960Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3577965Z 2023-01-11T21:38:06.3577969Z 2023-01-11T21:38:06.3578132Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3578207Z import triton 2023-01-11T21:38:06.3578302Z import triton.language as tl 2023-01-11T21:38:06.3578415Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3578518Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3578644Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3578769Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3578815Z 2023-01-11T21:38:06.3579230Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3579304Z @triton.jit 2023-01-11T21:38:06.3579446Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3579521Z xnumel = 201 2023-01-11T21:38:06.3579618Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3579747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3579826Z xmask = xindex < xnumel 2023-01-11T21:38:06.3579897Z x0 = xindex 2023-01-11T21:38:06.3580086Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3580181Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3580285Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3580386Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3580463Z tmp4 = 2 2023-01-11T21:38:06.3580535Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3580669Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3580802Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3580888Z ''') 2023-01-11T21:38:06.3580893Z 2023-01-11T21:38:06.3580897Z 2023-01-11T21:38:06.3580992Z async_compile.wait(globals()) 2023-01-11T21:38:06.3581069Z del async_compile 2023-01-11T21:38:06.3581075Z 2023-01-11T21:38:06.3581148Z def call(args): 2023-01-11T21:38:06.3581214Z arg0_1, = args 2023-01-11T21:38:06.3581321Z args.clear() 2023-01-11T21:38:06.3581414Z with torch.cuda.device(0): 2023-01-11T21:38:06.3581615Z buf0 = empty_strided((201, ), (1, 
), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3581820Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3581914Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3582065Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3582137Z del arg0_1 2023-01-11T21:38:06.3582213Z return (buf0, buf1, ) 2023-01-11T21:38:06.3582218Z 2023-01-11T21:38:06.3582222Z 2023-01-11T21:38:06.3582300Z if __name__ == "__main__": 2023-01-11T21:38:06.3582418Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3582544Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3582743Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3582861Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3582866Z 2023-01-11T21:38:06.3582871Z 2023-01-11T21:38:06.3582969Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3583042Z import torch 2023-01-11T21:38:06.3583110Z import random 2023-01-11T21:38:06.3583228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3583355Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3583361Z 2023-01-11T21:38:06.3583442Z aten = torch.ops.aten 2023-01-11T21:38:06.3583578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3583672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3583677Z 2023-01-11T21:38:06.3583750Z import triton 2023-01-11T21:38:06.3583836Z import triton.language as tl 2023-01-11T21:38:06.3583957Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3584093Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3584099Z 2023-01-11T21:38:06.3584106Z 2023-01-11T21:38:06.3584268Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3584340Z import triton 2023-01-11T21:38:06.3584432Z import triton.language as tl 2023-01-11T21:38:06.3584546Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3584646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3584804Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3584929Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3584934Z 2023-01-11T21:38:06.3585374Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3585466Z @triton.jit 2023-01-11T21:38:06.3585615Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3585686Z xnumel = 201 2023-01-11T21:38:06.3585784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3585913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3585990Z xmask = xindex < xnumel 2023-01-11T21:38:06.3586060Z x0 = xindex 2023-01-11T21:38:06.3586271Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3586389Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3586490Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3586587Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3586657Z tmp4 = 2 2023-01-11T21:38:06.3586730Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3586862Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, 
xmask) 2023-01-11T21:38:06.3586997Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3587084Z ''') 2023-01-11T21:38:06.3587089Z 2023-01-11T21:38:06.3587093Z 2023-01-11T21:38:06.3587185Z async_compile.wait(globals()) 2023-01-11T21:38:06.3587291Z del async_compile 2023-01-11T21:38:06.3587296Z 2023-01-11T21:38:06.3587372Z def call(args): 2023-01-11T21:38:06.3587445Z arg0_1, = args 2023-01-11T21:38:06.3587513Z args.clear() 2023-01-11T21:38:06.3587604Z with torch.cuda.device(0): 2023-01-11T21:38:06.3587806Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3588006Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3588097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3588244Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3588315Z del arg0_1 2023-01-11T21:38:06.3588391Z return (buf0, buf1, ) 2023-01-11T21:38:06.3588396Z 2023-01-11T21:38:06.3588407Z 2023-01-11T21:38:06.3588479Z if __name__ == "__main__": 2023-01-11T21:38:06.3588596Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3588726Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3588926Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3589038Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3589043Z 2023-01-11T21:38:06.3589048Z 2023-01-11T21:38:06.3589144Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3589220Z import torch 2023-01-11T21:38:06.3589288Z import random 2023-01-11T21:38:06.3589404Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3589527Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3589532Z 2023-01-11T21:38:06.3589616Z aten = torch.ops.aten 2023-01-11T21:38:06.3589749Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3589843Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3589848Z 2023-01-11T21:38:06.3589919Z import triton 2023-01-11T21:38:06.3590010Z import triton.language as tl 2023-01-11T21:38:06.3590126Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3590266Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3590271Z 2023-01-11T21:38:06.3590276Z 2023-01-11T21:38:06.3590438Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3590514Z import triton 2023-01-11T21:38:06.3590643Z import triton.language as tl 2023-01-11T21:38:06.3590760Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3590861Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3590992Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3591111Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3591116Z 2023-01-11T21:38:06.3591535Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3591610Z @triton.jit 2023-01-11T21:38:06.3591752Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3591823Z xnumel = 64 2023-01-11T21:38:06.3591919Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3592049Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
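# Editorial note: this is the float64 instantiation of the same expm1 test.
# Unlike the fp16 dumps above, the loads below carry no .to(tl.float32)
# upcast: libdevice.expm1 is applied to the fp64 values directly, and the
# xmask guard is effectively always true here because xnumel (64) is an
# exact multiple of any power-of-two XBLOCK the autotuner can pick.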
2023-01-11T21:38:06.3592133Z xmask = xindex < xnumel 2023-01-11T21:38:06.3592198Z x0 = xindex 2023-01-11T21:38:06.3592389Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3592486Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3592584Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3592682Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3592753Z tmp4 = 2 2023-01-11T21:38:06.3592824Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3592958Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3593088Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3593201Z ''') 2023-01-11T21:38:06.3593206Z 2023-01-11T21:38:06.3593211Z 2023-01-11T21:38:06.3593305Z async_compile.wait(globals()) 2023-01-11T21:38:06.3593383Z del async_compile 2023-01-11T21:38:06.3593388Z 2023-01-11T21:38:06.3593462Z def call(args): 2023-01-11T21:38:06.3593538Z arg0_1, = args 2023-01-11T21:38:06.3593608Z args.clear() 2023-01-11T21:38:06.3593702Z with torch.cuda.device(0): 2023-01-11T21:38:06.3593899Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3594096Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3594186Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3594334Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3594406Z del arg0_1 2023-01-11T21:38:06.3594481Z return (buf0, buf1, ) 2023-01-11T21:38:06.3594492Z 2023-01-11T21:38:06.3594497Z 2023-01-11T21:38:06.3594571Z if __name__ == "__main__": 2023-01-11T21:38:06.3594689Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3594814Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3595012Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3595127Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3595133Z 2023-01-11T21:38:06.3595399Z [2023-01-11 21:34:37,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 535 2023-01-11T21:38:06.3595813Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3595943Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3596202Z [2023-01-11 21:34:37,678] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 536 2023-01-11T21:38:06.3596456Z [2023-01-11 21:34:37,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 536 2023-01-11T21:38:06.3596897Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
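# Editorial note: the captured dump is truncated at this point. Judging by
# the preceding variants, the body of this final float64 kernel (size hint
# 256, i.e. presumably the 201-element case) should follow the same pattern:
# masked loads, two libdevice.expm1 calls, a scale by 2, and two masked
# stores.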
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3597031Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3597284Z [2023-01-11 21:34:37,845] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 537 2023-01-11T21:38:06.3597541Z [2023-01-11 21:34:37,853] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 537 2023-01-11T21:38:06.3597954Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3598083Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3598337Z [2023-01-11 21:34:37,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 538 2023-01-11T21:38:06.3598344Z 2023-01-11T21:38:06.3598442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3598516Z import torch 2023-01-11T21:38:06.3598591Z import random 2023-01-11T21:38:06.3598704Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3598829Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3598860Z 2023-01-11T21:38:06.3598944Z aten = torch.ops.aten 2023-01-11T21:38:06.3599081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3599177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3599182Z 2023-01-11T21:38:06.3599256Z import triton 2023-01-11T21:38:06.3599348Z import triton.language as tl 2023-01-11T21:38:06.3599469Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3599611Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3599617Z 2023-01-11T21:38:06.3599621Z 2023-01-11T21:38:06.3599784Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3599858Z import triton 2023-01-11T21:38:06.3599951Z import triton.language as tl 2023-01-11T21:38:06.3600065Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3600165Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3600295Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3600416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3600422Z 2023-01-11T21:38:06.3600842Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3600914Z @triton.jit 2023-01-11T21:38:06.3601055Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3601126Z xnumel = 64 2023-01-11T21:38:06.3601224Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3601350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3601433Z xmask = xindex < xnumel 2023-01-11T21:38:06.3601497Z x0 = xindex 2023-01-11T21:38:06.3601687Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3601784Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3601885Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3601981Z tmp3 = 
tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3602051Z tmp4 = 2 2023-01-11T21:38:06.3602129Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3602258Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3602416Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3602500Z ''') 2023-01-11T21:38:06.3602506Z 2023-01-11T21:38:06.3602511Z 2023-01-11T21:38:06.3602600Z async_compile.wait(globals()) 2023-01-11T21:38:06.3602676Z del async_compile 2023-01-11T21:38:06.3602681Z 2023-01-11T21:38:06.3602757Z def call(args): 2023-01-11T21:38:06.3602831Z arg0_1, = args 2023-01-11T21:38:06.3602905Z args.clear() 2023-01-11T21:38:06.3602990Z with torch.cuda.device(0): 2023-01-11T21:38:06.3603186Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3603380Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3603474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3603620Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3603691Z del arg0_1 2023-01-11T21:38:06.3603776Z return (buf0, buf1, ) 2023-01-11T21:38:06.3603781Z 2023-01-11T21:38:06.3603788Z 2023-01-11T21:38:06.3603861Z if __name__ == "__main__": 2023-01-11T21:38:06.3603980Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3604103Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3604302Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3604413Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3604418Z 2023-01-11T21:38:06.3604422Z 2023-01-11T21:38:06.3604517Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3604591Z import torch 2023-01-11T21:38:06.3604663Z import random 2023-01-11T21:38:06.3604803Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3604926Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3604931Z 2023-01-11T21:38:06.3605011Z aten = torch.ops.aten 2023-01-11T21:38:06.3605143Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3605239Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3605244Z 2023-01-11T21:38:06.3605318Z import triton 2023-01-11T21:38:06.3605408Z import triton.language as tl 2023-01-11T21:38:06.3605531Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3605663Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3605669Z 2023-01-11T21:38:06.3605673Z 2023-01-11T21:38:06.3605835Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3605908Z import triton 2023-01-11T21:38:06.3605999Z import triton.language as tl 2023-01-11T21:38:06.3606109Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3606214Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3606346Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3606464Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3606477Z 2023-01-11T21:38:06.3606888Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3606960Z @triton.jit 2023-01-11T21:38:06.3607101Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.3607175Z xnumel = 201 2023-01-11T21:38:06.3607271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3607398Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3607481Z xmask = xindex < xnumel 2023-01-11T21:38:06.3607545Z x0 = xindex 2023-01-11T21:38:06.3607733Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3607833Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3607931Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3608024Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3608094Z tmp4 = 2 2023-01-11T21:38:06.3608174Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3608328Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3608461Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3608547Z ''') 2023-01-11T21:38:06.3608553Z 2023-01-11T21:38:06.3608557Z 2023-01-11T21:38:06.3608653Z async_compile.wait(globals()) 2023-01-11T21:38:06.3608730Z del async_compile 2023-01-11T21:38:06.3608735Z 2023-01-11T21:38:06.3608811Z def call(args): 2023-01-11T21:38:06.3608883Z arg0_1, = args 2023-01-11T21:38:06.3608957Z args.clear() 2023-01-11T21:38:06.3609042Z with torch.cuda.device(0): 2023-01-11T21:38:06.3609241Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3609443Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3609535Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3609683Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3609761Z del arg0_1 2023-01-11T21:38:06.3609844Z return (buf0, buf1, ) 2023-01-11T21:38:06.3609849Z 2023-01-11T21:38:06.3609854Z 2023-01-11T21:38:06.3609932Z if __name__ == "__main__": 2023-01-11T21:38:06.3610043Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3610169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3610367Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3610479Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3610484Z 2023-01-11T21:38:06.3610489Z 2023-01-11T21:38:06.3610634Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3610707Z import torch 2023-01-11T21:38:06.3610781Z import random 2023-01-11T21:38:06.3610892Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3611013Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3611018Z 2023-01-11T21:38:06.3611099Z aten = torch.ops.aten 2023-01-11T21:38:06.3611237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3611332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3611337Z 2023-01-11T21:38:06.3611408Z import triton 2023-01-11T21:38:06.3611498Z import triton.language as tl 2023-01-11T21:38:06.3611619Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3611753Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3611758Z 2023-01-11T21:38:06.3611771Z 2023-01-11T21:38:06.3611927Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3611999Z import triton 2023-01-11T21:38:06.3612095Z import triton.language as tl 2023-01-11T21:38:06.3612204Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3612306Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3612438Z from torch._inductor.triton_ops.autotune 
import pointwise 2023-01-11T21:38:06.3612562Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3612570Z 2023-01-11T21:38:06.3612973Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3613048Z @triton.jit 2023-01-11T21:38:06.3613192Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3613265Z xnumel = 201 2023-01-11T21:38:06.3613362Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3613489Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3613577Z xmask = xindex < xnumel 2023-01-11T21:38:06.3613647Z x0 = xindex 2023-01-11T21:38:06.3613829Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3613925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3614024Z tmp1 = tl.libdevice.expm1(tmp0) 2023-01-11T21:38:06.3614146Z tmp3 = tl.libdevice.expm1(tmp2) 2023-01-11T21:38:06.3614220Z tmp4 = 2 2023-01-11T21:38:06.3614299Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.3614432Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.3614677Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.3614766Z ''') 2023-01-11T21:38:06.3614771Z 2023-01-11T21:38:06.3614775Z 2023-01-11T21:38:06.3614868Z async_compile.wait(globals()) 2023-01-11T21:38:06.3614943Z del async_compile 2023-01-11T21:38:06.3614948Z 2023-01-11T21:38:06.3615021Z def call(args): 2023-01-11T21:38:06.3615093Z arg0_1, = args 2023-01-11T21:38:06.3615174Z args.clear() 2023-01-11T21:38:06.3615259Z with torch.cuda.device(0): 2023-01-11T21:38:06.3615456Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3615656Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.3615750Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3615899Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3615969Z del arg0_1 2023-01-11T21:38:06.3616050Z return (buf0, buf1, ) 2023-01-11T21:38:06.3616055Z 2023-01-11T21:38:06.3616059Z 2023-01-11T21:38:06.3616135Z if __name__ == "__main__": 2023-01-11T21:38:06.3616246Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3616371Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3616568Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.3616722Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3616728Z 2023-01-11T21:38:06.3616992Z [2023-01-11 21:34:38,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 538 2023-01-11T21:38:06.3617485Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3617619Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3617877Z [2023-01-11 21:34:38,029] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 539 2023-01-11T21:38:06.3618139Z [2023-01-11 21:34:38,037] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 539 2023-01-11T21:38:06.3618558Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3618691Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3618938Z [2023-01-11 21:34:38,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 540 2023-01-11T21:38:06.3619200Z [2023-01-11 21:34:38,204] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 540 2023-01-11T21:38:06.3619606Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3619735Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3619988Z [2023-01-11 21:34:38,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 541 2023-01-11T21:38:06.3620076Z 2023-01-11T21:38:06.3620176Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3620252Z import torch 2023-01-11T21:38:06.3620326Z import random 2023-01-11T21:38:06.3620446Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3620562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3620568Z 2023-01-11T21:38:06.3620651Z aten = torch.ops.aten 2023-01-11T21:38:06.3620787Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3620880Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3620886Z 2023-01-11T21:38:06.3620960Z import triton 2023-01-11T21:38:06.3621057Z import triton.language as tl 2023-01-11T21:38:06.3621182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3621314Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3621328Z 2023-01-11T21:38:06.3621333Z 2023-01-11T21:38:06.3621489Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3621566Z import triton 2023-01-11T21:38:06.3621659Z import triton.language as tl 2023-01-11T21:38:06.3621771Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3621873Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3622004Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3622128Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3622133Z 2023-01-11T21:38:06.3622549Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3622644Z @triton.jit 2023-01-11T21:38:06.3622786Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3622861Z xnumel = 64 2023-01-11T21:38:06.3622957Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3623088Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3623172Z xmask = xindex < xnumel 2023-01-11T21:38:06.3623240Z x0 = xindex 2023-01-11T21:38:06.3623423Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3623519Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3623606Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3623703Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3623789Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3623887Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3623957Z tmp6 = 2 2023-01-11T21:38:06.3624028Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3624163Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3624293Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3624377Z ''') 2023-01-11T21:38:06.3624383Z 2023-01-11T21:38:06.3624387Z 2023-01-11T21:38:06.3624476Z async_compile.wait(globals()) 2023-01-11T21:38:06.3624552Z del async_compile 2023-01-11T21:38:06.3624558Z 2023-01-11T21:38:06.3624633Z def call(args): 2023-01-11T21:38:06.3624706Z arg0_1, = args 2023-01-11T21:38:06.3624774Z args.clear() 2023-01-11T21:38:06.3624865Z with torch.cuda.device(0): 2023-01-11T21:38:06.3625062Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3625259Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3625353Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3625500Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3625577Z del arg0_1 2023-01-11T21:38:06.3625653Z return (buf0, buf1, ) 2023-01-11T21:38:06.3625658Z 2023-01-11T21:38:06.3625669Z 2023-01-11T21:38:06.3625741Z if __name__ == "__main__": 2023-01-11T21:38:06.3625859Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3626012Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3626208Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3626320Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3626325Z 2023-01-11T21:38:06.3626330Z 2023-01-11T21:38:06.3626429Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3626504Z import torch 2023-01-11T21:38:06.3626571Z import random 2023-01-11T21:38:06.3626689Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3626811Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3626816Z 2023-01-11T21:38:06.3626899Z aten = torch.ops.aten 2023-01-11T21:38:06.3627038Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3627132Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3627137Z 2023-01-11T21:38:06.3627213Z import triton 2023-01-11T21:38:06.3627305Z import triton.language as tl 2023-01-11T21:38:06.3627424Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3627560Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3627565Z 2023-01-11T21:38:06.3627570Z 2023-01-11T21:38:06.3627733Z triton_fused_expm1_mul_0 = async_compile.triton(''' 
2023-01-11T21:38:06.3627806Z import triton 2023-01-11T21:38:06.3627897Z import triton.language as tl 2023-01-11T21:38:06.3628009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3628113Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3628239Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3628363Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3628395Z 2023-01-11T21:38:06.3628813Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3628887Z @triton.jit 2023-01-11T21:38:06.3629032Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3629108Z xnumel = 64 2023-01-11T21:38:06.3629205Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3629332Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3629409Z xmask = xindex < xnumel 2023-01-11T21:38:06.3629481Z x0 = xindex 2023-01-11T21:38:06.3629671Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3629766Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3629854Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3629961Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3630048Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3630139Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3630208Z tmp6 = 2 2023-01-11T21:38:06.3630288Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3630422Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3630557Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3630640Z ''') 2023-01-11T21:38:06.3630646Z 2023-01-11T21:38:06.3630650Z 2023-01-11T21:38:06.3630741Z async_compile.wait(globals()) 2023-01-11T21:38:06.3630818Z del async_compile 2023-01-11T21:38:06.3630824Z 2023-01-11T21:38:06.3630891Z def call(args): 2023-01-11T21:38:06.3630963Z arg0_1, = args 2023-01-11T21:38:06.3631037Z args.clear() 2023-01-11T21:38:06.3631128Z with torch.cuda.device(0): 2023-01-11T21:38:06.3631323Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3631548Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3631643Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3631797Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3631874Z del arg0_1 2023-01-11T21:38:06.3631984Z return (buf0, buf1, ) 2023-01-11T21:38:06.3631990Z 2023-01-11T21:38:06.3631995Z 2023-01-11T21:38:06.3632075Z if __name__ == "__main__": 2023-01-11T21:38:06.3632202Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3632335Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3632554Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3632665Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3632670Z 2023-01-11T21:38:06.3632675Z 2023-01-11T21:38:06.3632770Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3632837Z import torch 2023-01-11T21:38:06.3632914Z import random 2023-01-11T21:38:06.3633032Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3633156Z from torch._inductor.codecache import AsyncCompile 
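# Recurring preamble of every Inductor-generated module in this log:
# AsyncCompile queues the Triton source below for background compilation,
# and the async_compile.wait(globals()) call further down blocks until the
# compiled kernels are bound into module globals.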
2023-01-11T21:38:06.3633161Z 2023-01-11T21:38:06.3633242Z aten = torch.ops.aten 2023-01-11T21:38:06.3633378Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3633473Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3633478Z 2023-01-11T21:38:06.3633545Z import triton 2023-01-11T21:38:06.3633641Z import triton.language as tl 2023-01-11T21:38:06.3633764Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3633904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3633909Z 2023-01-11T21:38:06.3633914Z 2023-01-11T21:38:06.3634076Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3634151Z import triton 2023-01-11T21:38:06.3634246Z import triton.language as tl 2023-01-11T21:38:06.3634359Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3634480Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3634610Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3634734Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3634739Z 2023-01-11T21:38:06.3635154Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3635232Z @triton.jit 2023-01-11T21:38:06.3635378Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3635454Z xnumel = 201 2023-01-11T21:38:06.3635550Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3635673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3635754Z xmask = xindex < xnumel 2023-01-11T21:38:06.3635826Z x0 = xindex 2023-01-11T21:38:06.3636021Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3636119Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3636208Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3636306Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3636386Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3636486Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3636557Z tmp6 = 2 2023-01-11T21:38:06.3636640Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3636772Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3636907Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3636991Z ''') 2023-01-11T21:38:06.3636996Z 2023-01-11T21:38:06.3637001Z 2023-01-11T21:38:06.3637086Z async_compile.wait(globals()) 2023-01-11T21:38:06.3637162Z del async_compile 2023-01-11T21:38:06.3637168Z 2023-01-11T21:38:06.3637240Z def call(args): 2023-01-11T21:38:06.3637315Z arg0_1, = args 2023-01-11T21:38:06.3637388Z args.clear() 2023-01-11T21:38:06.3637478Z with torch.cuda.device(0): 2023-01-11T21:38:06.3637677Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3637870Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3637995Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3638145Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3638218Z del arg0_1 2023-01-11T21:38:06.3638301Z return (buf0, buf1, ) 2023-01-11T21:38:06.3638306Z 2023-01-11T21:38:06.3638311Z 2023-01-11T21:38:06.3638389Z if __name__ == "__main__": 2023-01-11T21:38:06.3638505Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3638632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3638823Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3638939Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3638945Z 2023-01-11T21:38:06.3639211Z [2023-01-11 21:34:38,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 541 2023-01-11T21:38:06.3639628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3639759Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3640015Z [2023-01-11 21:34:38,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 542 2023-01-11T21:38:06.3640280Z [2023-01-11 21:34:38,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 542 2023-01-11T21:38:06.3640719Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3640851Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3641105Z [2023-01-11 21:34:38,403] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 543 2023-01-11T21:38:06.3641369Z [2023-01-11 21:34:38,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 543 2023-01-11T21:38:06.3641778Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3641905Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3642162Z [2023-01-11 21:34:38,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 544 2023-01-11T21:38:06.3642168Z 2023-01-11T21:38:06.3642267Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3642341Z import torch 2023-01-11T21:38:06.3642416Z import random 2023-01-11T21:38:06.3642537Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3642662Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3642667Z 2023-01-11T21:38:06.3642746Z aten = torch.ops.aten 2023-01-11T21:38:06.3642876Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3642969Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3642974Z 2023-01-11T21:38:06.3643049Z import triton 2023-01-11T21:38:06.3643143Z import triton.language as tl 2023-01-11T21:38:06.3643269Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3643410Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3643416Z 2023-01-11T21:38:06.3643420Z 2023-01-11T21:38:06.3643610Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3643686Z import triton 2023-01-11T21:38:06.3643770Z import triton.language as tl 2023-01-11T21:38:06.3643885Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3643986Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3644120Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3644245Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3644251Z 2023-01-11T21:38:06.3644664Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3644740Z @triton.jit 2023-01-11T21:38:06.3644883Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3644952Z xnumel = 201 2023-01-11T21:38:06.3645049Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3645180Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3645262Z xmask = xindex < xnumel 2023-01-11T21:38:06.3645331Z x0 = xindex 2023-01-11T21:38:06.3645523Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3645619Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3645701Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3645799Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3645885Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3645985Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3646056Z tmp6 = 2 2023-01-11T21:38:06.3646134Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3646300Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3646427Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3646511Z ''') 2023-01-11T21:38:06.3646518Z 2023-01-11T21:38:06.3646522Z 2023-01-11T21:38:06.3646616Z async_compile.wait(globals()) 2023-01-11T21:38:06.3646692Z del async_compile 2023-01-11T21:38:06.3646697Z 2023-01-11T21:38:06.3646770Z def call(args): 2023-01-11T21:38:06.3646843Z arg0_1, = args 2023-01-11T21:38:06.3646916Z args.clear() 
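# args.clear() here, together with the del arg0_1 after the kernel launch,
# drops input references early; this is presumably so the CUDA caching
# allocator can reuse that memory as soon as the input is dead.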
2023-01-11T21:38:06.3647001Z with torch.cuda.device(0): 2023-01-11T21:38:06.3647200Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3647401Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3647493Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3647643Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3647719Z del arg0_1 2023-01-11T21:38:06.3647799Z return (buf0, buf1, ) 2023-01-11T21:38:06.3647804Z 2023-01-11T21:38:06.3647809Z 2023-01-11T21:38:06.3647890Z if __name__ == "__main__": 2023-01-11T21:38:06.3648001Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3648130Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3648329Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.3648445Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3648450Z 2023-01-11T21:38:06.3648454Z 2023-01-11T21:38:06.3648549Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3648623Z import torch 2023-01-11T21:38:06.3648697Z import random 2023-01-11T21:38:06.3648820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3648936Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3648941Z 2023-01-11T21:38:06.3649026Z aten = torch.ops.aten 2023-01-11T21:38:06.3649162Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3649256Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3649262Z 2023-01-11T21:38:06.3649335Z import triton 2023-01-11T21:38:06.3649427Z import triton.language as tl 2023-01-11T21:38:06.3649577Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3649714Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3649720Z 2023-01-11T21:38:06.3649724Z 2023-01-11T21:38:06.3649880Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3649957Z import triton 2023-01-11T21:38:06.3650050Z import triton.language as tl 2023-01-11T21:38:06.3650163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3650261Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3650391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3650516Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3650523Z 2023-01-11T21:38:06.3650939Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3651010Z @triton.jit 2023-01-11T21:38:06.3651152Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3651223Z xnumel = 64 2023-01-11T21:38:06.3651319Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3651445Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3651528Z xmask = xindex < xnumel 2023-01-11T21:38:06.3651597Z x0 = xindex 2023-01-11T21:38:06.3651781Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3651877Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3651965Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3652103Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3652192Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3652289Z tmp5 = 
tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3652359Z tmp6 = 2 2023-01-11T21:38:06.3652430Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3652564Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3652698Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3652786Z ''') 2023-01-11T21:38:06.3652792Z 2023-01-11T21:38:06.3652796Z 2023-01-11T21:38:06.3652886Z async_compile.wait(globals()) 2023-01-11T21:38:06.3652962Z del async_compile 2023-01-11T21:38:06.3652967Z 2023-01-11T21:38:06.3653043Z def call(args): 2023-01-11T21:38:06.3653115Z arg0_1, = args 2023-01-11T21:38:06.3653183Z args.clear() 2023-01-11T21:38:06.3653270Z with torch.cuda.device(0): 2023-01-11T21:38:06.3653472Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3653673Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3653764Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3653911Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3653982Z del arg0_1 2023-01-11T21:38:06.3654061Z return (buf0, buf1, ) 2023-01-11T21:38:06.3654066Z 2023-01-11T21:38:06.3654080Z 2023-01-11T21:38:06.3654153Z if __name__ == "__main__": 2023-01-11T21:38:06.3654270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3654397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3654697Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3654814Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3654819Z 2023-01-11T21:38:06.3654823Z 2023-01-11T21:38:06.3654923Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3655001Z import torch 2023-01-11T21:38:06.3655092Z import random 2023-01-11T21:38:06.3655224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3655359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3655365Z 2023-01-11T21:38:06.3655447Z aten = torch.ops.aten 2023-01-11T21:38:06.3655623Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3655722Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3655728Z 2023-01-11T21:38:06.3655808Z import triton 2023-01-11T21:38:06.3655899Z import triton.language as tl 2023-01-11T21:38:06.3656016Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3656157Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3656163Z 2023-01-11T21:38:06.3656167Z 2023-01-11T21:38:06.3656331Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3656406Z import triton 2023-01-11T21:38:06.3656497Z import triton.language as tl 2023-01-11T21:38:06.3656612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3656713Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3656847Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3656965Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3656970Z 2023-01-11T21:38:06.3657443Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3657519Z @triton.jit 2023-01-11T21:38:06.3657659Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 
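# Same pointwise body as the earlier dumps, now for 64 int64 inputs: both
# loads read in_ptr0, values are upcast to fp32, expm1 is applied, and the
# second result is doubled before the two masked stores.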
2023-01-11T21:38:06.3657729Z xnumel = 64 2023-01-11T21:38:06.3657825Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3657955Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3658038Z xmask = xindex < xnumel 2023-01-11T21:38:06.3658146Z x0 = xindex 2023-01-11T21:38:06.3658342Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3658441Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3658533Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3658634Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3658725Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3658825Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3658891Z tmp6 = 2 2023-01-11T21:38:06.3658971Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3659106Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3659240Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3659327Z ''') 2023-01-11T21:38:06.3659333Z 2023-01-11T21:38:06.3659337Z 2023-01-11T21:38:06.3659434Z async_compile.wait(globals()) 2023-01-11T21:38:06.3659512Z del async_compile 2023-01-11T21:38:06.3659517Z 2023-01-11T21:38:06.3659587Z def call(args): 2023-01-11T21:38:06.3659664Z arg0_1, = args 2023-01-11T21:38:06.3659740Z args.clear() 2023-01-11T21:38:06.3659834Z with torch.cuda.device(0): 2023-01-11T21:38:06.3660033Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3660234Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3660330Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3660471Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3660546Z del arg0_1 2023-01-11T21:38:06.3660630Z return (buf0, buf1, ) 2023-01-11T21:38:06.3660636Z 2023-01-11T21:38:06.3660640Z 2023-01-11T21:38:06.3660722Z if __name__ == "__main__": 2023-01-11T21:38:06.3660839Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3660966Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3661165Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3661281Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3661287Z 2023-01-11T21:38:06.3661553Z [2023-01-11 21:34:38,578] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 544 2023-01-11T21:38:06.3661990Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3662126Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3662383Z [2023-01-11 21:34:38,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 545 2023-01-11T21:38:06.3662647Z [2023-01-11 21:34:38,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 545 2023-01-11T21:38:06.3662656Z 2023-01-11T21:38:06.3662757Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3662832Z import torch 2023-01-11T21:38:06.3662908Z import random 2023-01-11T21:38:06.3663029Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3663153Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3663161Z 2023-01-11T21:38:06.3663238Z aten = torch.ops.aten 2023-01-11T21:38:06.3663377Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3663475Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3663480Z 2023-01-11T21:38:06.3663555Z import triton 2023-01-11T21:38:06.3663645Z import triton.language as tl 2023-01-11T21:38:06.3663768Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3663910Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3663915Z 2023-01-11T21:38:06.3663920Z 2023-01-11T21:38:06.3664084Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3664182Z import triton 2023-01-11T21:38:06.3664275Z import triton.language as tl 2023-01-11T21:38:06.3664390Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3664494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3664631Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3664760Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3664766Z 2023-01-11T21:38:06.3665197Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3665281Z @triton.jit 2023-01-11T21:38:06.3665444Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3665518Z xnumel = 201 2023-01-11T21:38:06.3665617Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3665751Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3665839Z xmask = xindex < xnumel 2023-01-11T21:38:06.3665910Z x0 = xindex 2023-01-11T21:38:06.3666102Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3666195Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3666285Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3666387Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3666473Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3666572Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3666644Z tmp6 = 2 2023-01-11T21:38:06.3666725Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3666855Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3666990Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3667076Z ''') 2023-01-11T21:38:06.3667082Z 2023-01-11T21:38:06.3667086Z 2023-01-11T21:38:06.3667186Z async_compile.wait(globals()) 2023-01-11T21:38:06.3667264Z del async_compile 
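# async_compile.wait(globals()) above has bound the finished Triton kernels
# into module globals; the compile helper itself is not needed past here.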
2023-01-11T21:38:06.3667269Z 2023-01-11T21:38:06.3667347Z def call(args): 2023-01-11T21:38:06.3667421Z arg0_1, = args 2023-01-11T21:38:06.3667492Z args.clear() 2023-01-11T21:38:06.3667584Z with torch.cuda.device(0): 2023-01-11T21:38:06.3667813Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3668015Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3668109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3668258Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3668334Z del arg0_1 2023-01-11T21:38:06.3668418Z return (buf0, buf1, ) 2023-01-11T21:38:06.3668423Z 2023-01-11T21:38:06.3668428Z 2023-01-11T21:38:06.3668503Z if __name__ == "__main__": 2023-01-11T21:38:06.3668623Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3668754Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3668953Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3669067Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3669072Z 2023-01-11T21:38:06.3669077Z 2023-01-11T21:38:06.3669178Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3669250Z import torch 2023-01-11T21:38:06.3669330Z import random 2023-01-11T21:38:06.3669444Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3669569Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3669574Z 2023-01-11T21:38:06.3669657Z aten = torch.ops.aten 2023-01-11T21:38:06.3669796Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3669896Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3669901Z 2023-01-11T21:38:06.3669977Z import triton 2023-01-11T21:38:06.3670072Z import triton.language as tl 2023-01-11T21:38:06.3670219Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3670360Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3670366Z 2023-01-11T21:38:06.3670370Z 2023-01-11T21:38:06.3670535Z triton_fused_expm1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.3670611Z import triton 2023-01-11T21:38:06.3670707Z import triton.language as tl 2023-01-11T21:38:06.3670822Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3670927Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3671059Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3671180Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3671185Z 2023-01-11T21:38:06.3671605Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3671684Z @triton.jit 2023-01-11T21:38:06.3671829Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3671907Z xnumel = 201 2023-01-11T21:38:06.3672007Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3672135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3672222Z xmask = xindex < xnumel 2023-01-11T21:38:06.3672288Z x0 = xindex 2023-01-11T21:38:06.3672478Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3672575Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3672665Z tmp1 = tmp0.to(tl.float32) 
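# The int64 input is upcast to fp32 before tl.libdevice.expm1, matching
# PyTorch's integer-to-float type promotion for floating-point unary ops.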
2023-01-11T21:38:06.3672764Z tmp2 = tl.libdevice.expm1(tmp1) 2023-01-11T21:38:06.3672854Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.3672953Z tmp5 = tl.libdevice.expm1(tmp4) 2023-01-11T21:38:06.3673020Z tmp6 = 2 2023-01-11T21:38:06.3673100Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.3673235Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3673372Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.3673458Z ''') 2023-01-11T21:38:06.3673464Z 2023-01-11T21:38:06.3673468Z 2023-01-11T21:38:06.3673563Z async_compile.wait(globals()) 2023-01-11T21:38:06.3673641Z del async_compile 2023-01-11T21:38:06.3673672Z 2023-01-11T21:38:06.3673750Z def call(args): 2023-01-11T21:38:06.3673819Z arg0_1, = args 2023-01-11T21:38:06.3673895Z args.clear() 2023-01-11T21:38:06.3673989Z with torch.cuda.device(0): 2023-01-11T21:38:06.3674190Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3674388Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3674483Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3674633Z triton_fused_expm1_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.3674701Z del arg0_1 2023-01-11T21:38:06.3674790Z return (buf0, buf1, ) 2023-01-11T21:38:06.3674794Z 2023-01-11T21:38:06.3674799Z 2023-01-11T21:38:06.3674881Z if __name__ == "__main__": 2023-01-11T21:38:06.3675000Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3675139Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3675372Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3675488Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3675493Z 2023-01-11T21:38:06.3675568Z ok (1.860s) 2023-01-11T21:38:06.3676021Z test_fill1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3676175Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3676436Z [2023-01-11 21:34:38,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 546 2023-01-11T21:38:06.3676704Z [2023-01-11 21:34:38,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 546 2023-01-11T21:38:06.3677121Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
test_fill1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 546
[2023-01-11 21:34:38,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 546
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 547
[2023-01-11 21:34:38,828] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 547

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 2
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 2
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.227s)
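Neither kernel in the two programs above reads the input tensor: triton_fused_ones_like_0 stores the constant 1 and triton_fused_full_like_1 stores the constant 2 into freshly allocated (16, 16) buffers, so arg0_1 only contributes shape and dtype. A plausible eager-mode sketch (illustrative, not the verbatim test body):

import torch

def fn(x):
    # the outputs do not depend on x's values, so Inductor emits
    # two pure constant-fill kernels and never loads from x
    return torch.ones_like(x), torch.full_like(x, 2)

compiled = torch.compile(fn)
compiled(torch.randn(16, 16, device="cuda"))                        # float32 graph
compiled(torch.randn(16, 16, device="cuda", dtype=torch.float16))   # float16 graph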
test_fill2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,872] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 548
[2023-01-11 21:34:38,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 548
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:38,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 549
[2023-01-11 21:34:39,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 549

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_ones_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_full_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_ones_like_0.run(buf0, 256, grid=grid(256), stream=stream0)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_full_like_1.run(buf1, 256, grid=grid(256), stream=stream0)
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.224s)
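The only difference from test_fill1 is the second constant: here triton_fused_full_like_1 stores 3.0 rather than 2. The corresponding eager-mode sketch, with the same caveats as above, is simply:

import torch

def fn(x):
    return torch.ones_like(x), torch.full_like(x, 3.0)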
test_flip_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,092] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 550
[2023-01-11 21:34:39,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 550
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 551
[2023-01-11 21:34:39,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 551

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_rev_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (5 + ((-1)*x0) + (6*x1)), xmask, eviction_policy='evict_last')
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_sub_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6) % 6
    x2 = (xindex // 36)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (30 + x0 + ((-6)*x1) + (36*x2)), xmask)
    tmp1 = 2
    tmp2 = tmp0 - tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_rev_0.run(arg0_1, buf0, 72, grid=grid(72), stream=stream0)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        triton_fused_sub_1.run(arg0_1, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_rev_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (5 + ((-1)*x0) + (6*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')

triton_fused_sub_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 6
    x1 = (xindex // 6) % 6
    x2 = (xindex // 36)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (30 + x0 + ((-6)*x1) + (36*x2)), xmask).to(tl.float32)
    tmp1 = 2
    tmp2 = tmp0 - tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_rev_0.run(arg0_1, buf0, 72, grid=grid(72), stream=stream0)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        triton_fused_sub_1.run(arg0_1, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.253s)
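In triton_fused_rev_0 the load index 5 + ((-1)*x0) + (6*x1) walks the innermost axis of the (1, 2, 6, 6) input backwards, and in triton_fused_sub_1 the index 30 + x0 + ((-6)*x1) + (36*x2) reverses the second-to-last axis before subtracting the constant 2. A matching eager-mode sketch (illustrative, not the verbatim test body):

import torch

def fn(x):
    # flip of the last axis, plus flip of dim -2 fused with a subtract,
    # one generated kernel per output
    return torch.flip(x, [-1]), torch.flip(x, [-2]) - 2

x = torch.randn(1, 2, 6, 6, device="cuda")
torch.compile(fn)(x)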
test_fmod_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 552
[2023-01-11 21:34:39,576] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 552
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,597] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 553
[2023-01-11 21:34:39,761] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 553

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_sub_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp6 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tmp4 = 3.0
    tmp5 = tmp3 * tmp4
    tmp7 = tl.libdevice.fmod(tmp5, tmp6)
    tmp8 = 2.0
    tmp9 = tmp7 - tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_sub_0.run(arg0_1, arg1_1, buf0, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_sub_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 72
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tmp4 = 3.0
    tmp5 = tmp3 * tmp4
    tmp7 = tl.libdevice.fmod(tmp5, tmp6)
    tmp8 = 2.0
    tmp9 = tmp7 - tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_sub_0.run(arg0_1, arg1_1, buf0, buf1, 72, grid=grid(72), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.456s)
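triton_fused_fmod_sub_0 produces both outputs in a single fused kernel: tmp2 = fmod(a, b) and tmp9 = fmod(a * 3.0, b) - 2.0, reloading the inputs rather than materializing the intermediate a * 3.0. An eager-mode sketch consistent with it (illustrative names):

import torch

def fn(a, b):
    # both expressions are pointwise over the same shape,
    # so Inductor fuses them into one kernel with two stores
    return torch.fmod(a, b), torch.fmod(a * 3.0, b) - 2.0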
test_fmod_zero_dim_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:39,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 554
[2023-01-11 21:34:40,011] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 554
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 555
[2023-01-11 21:34:40,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 555
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 556
[2023-01-11 21:34:40,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 556
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:34:40,338] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 557

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:34:40,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 557

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_fmod_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.libdevice.fmod(tmp0, tmp1)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_fmod_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )

if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.722s)
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3776155Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3776408Z [2023-01-11 21:34:40,598] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 559 2023-01-11T21:38:06.3776668Z [2023-01-11 21:34:40,667] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 559 2023-01-11T21:38:06.3776674Z 2023-01-11T21:38:06.3776765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3776839Z import torch 2023-01-11T21:38:06.3776911Z import random 2023-01-11T21:38:06.3777030Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3777213Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3777269Z 2023-01-11T21:38:06.3777368Z aten = torch.ops.aten 2023-01-11T21:38:06.3777527Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3777620Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3777633Z 2023-01-11T21:38:06.3777700Z import triton 2023-01-11T21:38:06.3777793Z import triton.language as tl 2023-01-11T21:38:06.3777918Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3778055Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3778061Z 2023-01-11T21:38:06.3778065Z 2023-01-11T21:38:06.3778226Z triton_fused_mul_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3778297Z import triton 2023-01-11T21:38:06.3778389Z import triton.language as tl 2023-01-11T21:38:06.3778495Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3778597Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3778727Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3778859Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3778864Z 2023-01-11T21:38:06.3779284Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3779357Z @triton.jit 2023-01-11T21:38:06.3779488Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3779561Z xnumel = 10 2023-01-11T21:38:06.3779651Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3779778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3779860Z xmask = xindex < xnumel 2023-01-11T21:38:06.3779931Z x0 = xindex 2023-01-11T21:38:06.3780026Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3780094Z tmp1 = 2 2023-01-11T21:38:06.3780170Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3786356Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.3786520Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3786619Z ''') 2023-01-11T21:38:06.3786624Z 2023-01-11T21:38:06.3786629Z 2023-01-11T21:38:06.3786725Z async_compile.wait(globals()) 2023-01-11T21:38:06.3786803Z del async_compile 2023-01-11T21:38:06.3786808Z 2023-01-11T21:38:06.3786946Z def call(args): 2023-01-11T21:38:06.3787022Z arg0_1, = args 2023-01-11T21:38:06.3787100Z args.clear() 2023-01-11T21:38:06.3787194Z with torch.cuda.device(0): 2023-01-11T21:38:06.3787390Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3787482Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.3787575Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.3787716Z triton_fused_mul_mul_1_0.run(buf1, arg0_1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3787789Z del arg0_1 2023-01-11T21:38:06.3787862Z return (buf1, ) 2023-01-11T21:38:06.3787871Z 2023-01-11T21:38:06.3787875Z 2023-01-11T21:38:06.3787950Z if __name__ == "__main__": 2023-01-11T21:38:06.3788069Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3788194Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3788395Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3788507Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3788512Z 2023-01-11T21:38:06.3788517Z 2023-01-11T21:38:06.3788614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3788687Z import torch 2023-01-11T21:38:06.3788765Z import random 2023-01-11T21:38:06.3788877Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3789001Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3789006Z 2023-01-11T21:38:06.3789090Z aten = torch.ops.aten 2023-01-11T21:38:06.3789224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3789350Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3789355Z 2023-01-11T21:38:06.3789429Z import triton 2023-01-11T21:38:06.3789521Z import triton.language as tl 2023-01-11T21:38:06.3789645Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3789778Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3789786Z 2023-01-11T21:38:06.3789790Z 2023-01-11T21:38:06.3789952Z triton_fused_mul_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3790023Z import triton 2023-01-11T21:38:06.3790115Z import triton.language as tl 2023-01-11T21:38:06.3790232Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3790334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3790469Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3790587Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3790599Z 2023-01-11T21:38:06.3791004Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3791079Z @triton.jit 2023-01-11T21:38:06.3791213Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3791289Z xnumel = 10 2023-01-11T21:38:06.3791388Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3791515Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3791598Z xmask = xindex < xnumel 2023-01-11T21:38:06.3791662Z x0 = xindex 2023-01-11T21:38:06.3791775Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3791844Z tmp1 = 2 2023-01-11T21:38:06.3791923Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3792002Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.3792139Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3792226Z ''') 2023-01-11T21:38:06.3792234Z 2023-01-11T21:38:06.3792239Z 2023-01-11T21:38:06.3792331Z async_compile.wait(globals()) 2023-01-11T21:38:06.3792401Z del async_compile 2023-01-11T21:38:06.3792406Z 2023-01-11T21:38:06.3792485Z def call(args): 2023-01-11T21:38:06.3792558Z arg0_1, = args 2023-01-11T21:38:06.3792632Z args.clear() 2023-01-11T21:38:06.3792753Z 
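# args.clear() drops the caller's reference to the input list so the
# `del arg0_1` below releases the tensor as early as possible; together with
# the `buf1 = buf0; del buf0  # reuse` pattern a few lines down, the two
# fused multiplies write through a single reused output buffer (in_out_ptr0)
# rather than allocating one buffer per op.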
with torch.cuda.device(0): 2023-01-11T21:38:06.3792952Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3793041Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.3793126Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3793264Z triton_fused_mul_mul_1_0.run(buf1, arg0_1, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.3793335Z del arg0_1 2023-01-11T21:38:06.3793414Z return (buf1, ) 2023-01-11T21:38:06.3793419Z 2023-01-11T21:38:06.3793423Z 2023-01-11T21:38:06.3793506Z if __name__ == "__main__": 2023-01-11T21:38:06.3793622Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3793749Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3793940Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3794051Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3794056Z 2023-01-11T21:38:06.3794127Z ok (0.183s) 2023-01-11T21:38:06.3794585Z test_full_like_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3794718Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3794977Z [2023-01-11 21:34:40,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 560 2023-01-11T21:38:06.3795281Z [2023-01-11 21:34:40,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 560 2023-01-11T21:38:06.3795696Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3795827Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3796084Z [2023-01-11 21:34:40,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 561 2023-01-11T21:38:06.3796344Z [2023-01-11 21:34:40,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 561 2023-01-11T21:38:06.3796350Z 2023-01-11T21:38:06.3796448Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3796518Z import torch 2023-01-11T21:38:06.3796591Z import random 2023-01-11T21:38:06.3796711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3796837Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3796842Z 2023-01-11T21:38:06.3796923Z aten = torch.ops.aten 2023-01-11T21:38:06.3797063Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3797157Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3797163Z 2023-01-11T21:38:06.3797230Z import triton 2023-01-11T21:38:06.3797320Z import triton.language as tl 2023-01-11T21:38:06.3797443Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3797581Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3797587Z 2023-01-11T21:38:06.3797591Z 2023-01-11T21:38:06.3797742Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.3797814Z import triton 2023-01-11T21:38:06.3797906Z import triton.language as tl 2023-01-11T21:38:06.3798024Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3798119Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3798252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3798377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3798382Z 2023-01-11T21:38:06.3798801Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.3798876Z @triton.jit 2023-01-11T21:38:06.3798999Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3799072Z xnumel = 8 2023-01-11T21:38:06.3799162Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3799295Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3799377Z xmask = xindex < xnumel 2023-01-11T21:38:06.3799448Z x0 = xindex 2023-01-11T21:38:06.3799523Z tmp0 = 7.777 2023-01-11T21:38:06.3799592Z tmp1 = 1 2023-01-11T21:38:06.3799703Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.3799832Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3799916Z ''') 2023-01-11T21:38:06.3799922Z 2023-01-11T21:38:06.3799926Z 2023-01-11T21:38:06.3800019Z async_compile.wait(globals()) 2023-01-11T21:38:06.3800095Z del async_compile 2023-01-11T21:38:06.3800100Z 2023-01-11T21:38:06.3800174Z def call(args): 2023-01-11T21:38:06.3800248Z arg0_1, = args 2023-01-11T21:38:06.3800322Z args.clear() 2023-01-11T21:38:06.3800413Z with torch.cuda.device(0): 2023-01-11T21:38:06.3800603Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3800696Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3800828Z triton_fused_sub_0.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.3800906Z return (buf0, ) 
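# call() unpacks arg0_1 but never reads it: full_like only needs the input
# for its shape and dtype, so the kernel just stores the constant 7.777 - 1
# into all 8 slots. A minimal eager-mode equivalent (a sketch, not the test
# source):
#     out = torch.full_like(x, 7.777) - 1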
2023-01-11T21:38:06.3800911Z 2023-01-11T21:38:06.3800943Z 2023-01-11T21:38:06.3801024Z if __name__ == "__main__": 2023-01-11T21:38:06.3801140Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3801265Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3801463Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3801571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3801577Z 2023-01-11T21:38:06.3801581Z 2023-01-11T21:38:06.3801680Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3801754Z import torch 2023-01-11T21:38:06.3801828Z import random 2023-01-11T21:38:06.3801945Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3802069Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3802074Z 2023-01-11T21:38:06.3802154Z aten = torch.ops.aten 2023-01-11T21:38:06.3802283Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3802379Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3802387Z 2023-01-11T21:38:06.3802462Z import triton 2023-01-11T21:38:06.3802553Z import triton.language as tl 2023-01-11T21:38:06.3802677Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3802814Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3802819Z 2023-01-11T21:38:06.3802823Z 2023-01-11T21:38:06.3802977Z triton_fused_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.3803050Z import triton 2023-01-11T21:38:06.3803135Z import triton.language as tl 2023-01-11T21:38:06.3803248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3803350Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3803480Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3803604Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3803609Z 2023-01-11T21:38:06.3803992Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.3804067Z @triton.jit 2023-01-11T21:38:06.3804186Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3804252Z xnumel = 8 2023-01-11T21:38:06.3804348Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3804502Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3804584Z xmask = xindex < xnumel 2023-01-11T21:38:06.3804657Z x0 = xindex 2023-01-11T21:38:06.3804729Z tmp0 = 7.777 2023-01-11T21:38:06.3804797Z tmp1 = 1 2023-01-11T21:38:06.3804901Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.3805036Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3805120Z ''') 2023-01-11T21:38:06.3805125Z 2023-01-11T21:38:06.3805130Z 2023-01-11T21:38:06.3805221Z async_compile.wait(globals()) 2023-01-11T21:38:06.3805297Z del async_compile 2023-01-11T21:38:06.3805302Z 2023-01-11T21:38:06.3805379Z def call(args): 2023-01-11T21:38:06.3805452Z arg0_1, = args 2023-01-11T21:38:06.3805527Z args.clear() 2023-01-11T21:38:06.3805613Z with torch.cuda.device(0): 2023-01-11T21:38:06.3805809Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3805901Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3806031Z triton_fused_sub_0.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.3806108Z return (buf0, ) 
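# the fp16 variant is the same program modulo dtype: the '*fp32' pointer
# annotations become '*fp16' and the output buffer is float16, while the
# constant is still formed as 7.777 - 1 and, presumably, narrowed to half
# precision at the store.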
2023-01-11T21:38:06.3806113Z 2023-01-11T21:38:06.3806119Z 2023-01-11T21:38:06.3806198Z if __name__ == "__main__": 2023-01-11T21:38:06.3806316Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3806435Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3806629Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3806741Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3806746Z 2023-01-11T21:38:06.3806848Z ok (0.276s) 2023-01-11T21:38:06.3807308Z test_fuse_tiled_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3807437Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3807693Z [2023-01-11 21:34:40,961] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 562 2023-01-11T21:38:06.3807957Z [2023-01-11 21:34:41,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 562 2023-01-11T21:38:06.3808365Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3808502Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3808762Z [2023-01-11 21:34:41,064] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 563 2023-01-11T21:38:06.3809019Z [2023-01-11 21:34:41,147] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 563 2023-01-11T21:38:06.3809032Z 2023-01-11T21:38:06.3809123Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3809195Z import torch 2023-01-11T21:38:06.3809269Z import random 2023-01-11T21:38:06.3809389Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3809512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3809517Z 2023-01-11T21:38:06.3809599Z aten = torch.ops.aten 2023-01-11T21:38:06.3809739Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3809827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3809832Z 2023-01-11T21:38:06.3809905Z import triton 2023-01-11T21:38:06.3809997Z import triton.language as tl 2023-01-11T21:38:06.3810127Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3810343Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3810349Z 2023-01-11T21:38:06.3810354Z 2023-01-11T21:38:06.3810507Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.3810581Z import triton 2023-01-11T21:38:06.3810667Z import triton.language as tl 2023-01-11T21:38:06.3810783Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3810882Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3811013Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3811139Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3811144Z 
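# @pointwise is Inductor's autotuning wrapper for elementwise kernels:
# size_hints bounds the XBLOCK configs it will try, and the
# instance_descriptor in meta appears to record which arguments are known to
# be divisible by 16 so that aligned, vectorizable accesses can be assumed
# (an interpretation of the metadata, not something this log states).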
2023-01-11T21:38:06.3811571Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3811643Z @triton.jit 2023-01-11T21:38:06.3811786Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3811861Z xnumel = 16384 2023-01-11T21:38:06.3811951Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3812078Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3812160Z xmask = xindex < xnumel 2023-01-11T21:38:06.3812240Z x1 = (xindex // 128) 2023-01-11T21:38:06.3812317Z x0 = xindex % 128 2023-01-11T21:38:06.3812387Z x2 = xindex 2023-01-11T21:38:06.3812479Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3812576Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.3812654Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3812817Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3812903Z ''') 2023-01-11T21:38:06.3812909Z 2023-01-11T21:38:06.3812913Z 2023-01-11T21:38:06.3813068Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3813141Z import triton 2023-01-11T21:38:06.3813236Z import triton.language as tl 2023-01-11T21:38:06.3813343Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3813446Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3813578Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3813702Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3813707Z 2023-01-11T21:38:06.3814111Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3814187Z @triton.jit 2023-01-11T21:38:06.3814317Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3814389Z xnumel = 16384 2023-01-11T21:38:06.3814722Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3814856Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3814946Z xmask = xindex < xnumel 2023-01-11T21:38:06.3815019Z x0 = xindex 2023-01-11T21:38:06.3815118Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3815190Z tmp1 = 1 2023-01-11T21:38:06.3815271Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3815401Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3815490Z ''') 2023-01-11T21:38:06.3815496Z 2023-01-11T21:38:06.3815502Z 2023-01-11T21:38:06.3815598Z async_compile.wait(globals()) 2023-01-11T21:38:06.3815677Z del async_compile 2023-01-11T21:38:06.3815682Z 2023-01-11T21:38:06.3815759Z def call(args): 2023-01-11T21:38:06.3815847Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3815928Z args.clear() 2023-01-11T21:38:06.3816015Z with torch.cuda.device(0): 2023-01-11T21:38:06.3816226Z buf0 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3816321Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3816525Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3816600Z del arg0_1 2023-01-11T21:38:06.3816673Z del arg1_1 2023-01-11T21:38:06.3816879Z buf1 = empty_strided((128, 128), (128, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.3817019Z triton_fused_add_1_1.run(arg2_1, buf1, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3817085Z del arg2_1 2023-01-11T21:38:06.3817227Z return (buf0, buf1, ) 2023-01-11T21:38:06.3817233Z 2023-01-11T21:38:06.3817237Z 2023-01-11T21:38:06.3817316Z if __name__ == "__main__": 2023-01-11T21:38:06.3817439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3817568Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3817772Z arg0_1 = rand_strided((128, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3817976Z arg1_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3818182Z arg2_1 = rand_strided((128, 128), (128, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3818302Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3818308Z 2023-01-11T21:38:06.3818320Z 2023-01-11T21:38:06.3818411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3818484Z import torch 2023-01-11T21:38:06.3818555Z import random 2023-01-11T21:38:06.3818674Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3818796Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3818802Z 2023-01-11T21:38:06.3818881Z aten = torch.ops.aten 2023-01-11T21:38:06.3819015Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3819147Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3819152Z 2023-01-11T21:38:06.3819227Z import triton 2023-01-11T21:38:06.3819322Z import triton.language as tl 2023-01-11T21:38:06.3819448Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3819594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3819600Z 2023-01-11T21:38:06.3819604Z 2023-01-11T21:38:06.3819761Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.3819836Z import triton 2023-01-11T21:38:06.3819924Z import triton.language as tl 2023-01-11T21:38:06.3820043Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3820147Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3820281Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3820408Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3820413Z 2023-01-11T21:38:06.3820842Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3820917Z @triton.jit 2023-01-11T21:38:06.3821065Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3821141Z xnumel = 16384 2023-01-11T21:38:06.3821234Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3821363Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3821447Z xmask = xindex < xnumel 2023-01-11T21:38:06.3821529Z x1 = (xindex // 128) 2023-01-11T21:38:06.3821608Z x0 = xindex % 128 2023-01-11T21:38:06.3821680Z x2 = xindex 2023-01-11T21:38:06.3821794Z tmp0 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.3821913Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3821996Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3822129Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3822216Z ''') 
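# broadcast add flattened to 1-D: with arg0_1 of shape (128, 1) and arg1_1 of
# shape (1, 128), the kernel derives x1 = xindex // 128 (row, used to read
# arg0_1) and x0 = xindex % 128 (column, used to read arg1_1), so the whole
# 128x128 broadcast is one pointwise launch over 16384 elements instead of a
# materialized expand.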
2023-01-11T21:38:06.3822222Z 2023-01-11T21:38:06.3822227Z 2023-01-11T21:38:06.3822386Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3822465Z import triton 2023-01-11T21:38:06.3822590Z import triton.language as tl 2023-01-11T21:38:06.3822700Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3822803Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3822937Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3823066Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3823071Z 2023-01-11T21:38:06.3823474Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3823554Z @triton.jit 2023-01-11T21:38:06.3823688Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3823765Z xnumel = 16384 2023-01-11T21:38:06.3823858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3823989Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3824074Z xmask = xindex < xnumel 2023-01-11T21:38:06.3824146Z x0 = xindex 2023-01-11T21:38:06.3824264Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3824339Z tmp1 = 1 2023-01-11T21:38:06.3824420Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3824548Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.3824636Z ''') 2023-01-11T21:38:06.3824641Z 2023-01-11T21:38:06.3824646Z 2023-01-11T21:38:06.3824740Z async_compile.wait(globals()) 2023-01-11T21:38:06.3824818Z del async_compile 2023-01-11T21:38:06.3824823Z 2023-01-11T21:38:06.3824901Z def call(args): 2023-01-11T21:38:06.3825035Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.3825114Z args.clear() 2023-01-11T21:38:06.3825202Z with torch.cuda.device(0): 2023-01-11T21:38:06.3825411Z buf0 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3825508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3825658Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3825735Z del arg0_1 2023-01-11T21:38:06.3825809Z del arg1_1 2023-01-11T21:38:06.3826016Z buf1 = empty_strided((128, 128), (128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3826156Z triton_fused_add_1_1.run(arg2_1, buf1, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.3826224Z del arg2_1 2023-01-11T21:38:06.3826310Z return (buf0, buf1, ) 2023-01-11T21:38:06.3826315Z 2023-01-11T21:38:06.3826319Z 2023-01-11T21:38:06.3826401Z if __name__ == "__main__": 2023-01-11T21:38:06.3826522Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3826648Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3826852Z arg0_1 = rand_strided((128, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827057Z arg1_1 = rand_strided((1, 128), (128, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827264Z arg2_1 = rand_strided((128, 128), (128, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3827386Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.3827391Z 2023-01-11T21:38:06.3827467Z ok (0.205s) 2023-01-11T21:38:06.3827920Z test_gather1_cuda (__main__.CudaTests) ... 
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3828057Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3828317Z [2023-01-11 21:34:41,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 564 2023-01-11T21:38:06.3828619Z [2023-01-11 21:34:41,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 564 2023-01-11T21:38:06.3829035Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3829167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3829425Z [2023-01-11 21:34:41,395] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 565 2023-01-11T21:38:06.3829693Z [2023-01-11 21:34:41,487] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 565 2023-01-11T21:38:06.3829699Z 2023-01-11T21:38:06.3829798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3829868Z import torch 2023-01-11T21:38:06.3829944Z import random 2023-01-11T21:38:06.3830067Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3830191Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3830196Z 2023-01-11T21:38:06.3830281Z aten = torch.ops.aten 2023-01-11T21:38:06.3830423Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3830518Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3830524Z 2023-01-11T21:38:06.3830601Z import triton 2023-01-11T21:38:06.3830689Z import triton.language as tl 2023-01-11T21:38:06.3830817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3830958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3830990Z 2023-01-11T21:38:06.3830995Z 2023-01-11T21:38:06.3831173Z triton_fused_gather_gather_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3831251Z import triton 2023-01-11T21:38:06.3831346Z import triton.language as tl 2023-01-11T21:38:06.3831463Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3831562Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3831695Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3831824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3831829Z 2023-01-11T21:38:06.3832264Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3832341Z @triton.jit 2023-01-11T21:38:06.3832493Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3832569Z xnumel = 200 2023-01-11T21:38:06.3832669Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3832794Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3832879Z xmask = xindex < xnumel 2023-01-11T21:38:06.3832952Z x2 = xindex 2023-01-11T21:38:06.3833031Z x0 = xindex % 10 2023-01-11T21:38:06.3833225Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3833326Z tmp4 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.3833399Z tmp1 = 1 2023-01-11T21:38:06.3833474Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3833680Z tmp3 = tl.load(in_ptr1 + (tmp2 + (6*x0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3833764Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.3833876Z tmp6 = tl.load(in_ptr1 + (tmp5 + (6*x0)), xmask) 2023-01-11T21:38:06.3834012Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3834150Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3834241Z ''') 2023-01-11T21:38:06.3834246Z 2023-01-11T21:38:06.3834251Z 2023-01-11T21:38:06.3834347Z async_compile.wait(globals()) 2023-01-11T21:38:06.3834419Z del async_compile 2023-01-11T21:38:06.3834425Z 2023-01-11T21:38:06.3834531Z def call(args): 2023-01-11T21:38:06.3834615Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3834692Z args.clear() 2023-01-11T21:38:06.3834786Z with torch.cuda.device(0): 2023-01-11T21:38:06.3835004Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3835235Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3835338Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3835529Z triton_fused_gather_gather_1_0.run(arg1_1, arg0_1, buf0, buf1, 200, grid=grid(200), stream=stream0) 2023-01-11T21:38:06.3835604Z del arg0_1 2023-01-11T21:38:06.3835682Z del arg1_1 2023-01-11T21:38:06.3835765Z return (buf0, buf1, ) 2023-01-11T21:38:06.3835771Z 2023-01-11T21:38:06.3835775Z 2023-01-11T21:38:06.3835857Z if __name__ == "__main__": 2023-01-11T21:38:06.3835979Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3836109Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3836321Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3836534Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3836658Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3836663Z 2023-01-11T21:38:06.3836668Z 2023-01-11T21:38:06.3836768Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3836844Z import torch 2023-01-11T21:38:06.3836921Z import random 2023-01-11T21:38:06.3837042Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3837196Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3837202Z 2023-01-11T21:38:06.3837279Z aten = torch.ops.aten 2023-01-11T21:38:06.3837416Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3837513Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3837518Z 2023-01-11T21:38:06.3837597Z import triton 2023-01-11T21:38:06.3837692Z import triton.language as tl 2023-01-11T21:38:06.3837816Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3837957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3837963Z 2023-01-11T21:38:06.3837967Z 2023-01-11T21:38:06.3838145Z triton_fused_gather_gather_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.3838215Z import triton 
2023-01-11T21:38:06.3838309Z import triton.language as tl 2023-01-11T21:38:06.3838425Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3838528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3838665Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3838792Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3838798Z 2023-01-11T21:38:06.3839228Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3839306Z @triton.jit 2023-01-11T21:38:06.3839452Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3839529Z xnumel = 200 2023-01-11T21:38:06.3839628Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3839760Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3839843Z xmask = xindex < xnumel 2023-01-11T21:38:06.3839915Z x2 = xindex 2023-01-11T21:38:06.3839992Z x0 = xindex % 10 2023-01-11T21:38:06.3840180Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3840281Z tmp4 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.3840353Z tmp1 = 1 2023-01-11T21:38:06.3840433Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3840663Z tmp3 = tl.load(in_ptr1 + (tmp2 + (6*x0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3840772Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.3840905Z tmp6 = tl.load(in_ptr1 + (tmp5 + (6*x0)), xmask).to(tl.float32) 2023-01-11T21:38:06.3841035Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3841168Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3841256Z ''') 2023-01-11T21:38:06.3841261Z 2023-01-11T21:38:06.3841265Z 2023-01-11T21:38:06.3841360Z async_compile.wait(globals()) 2023-01-11T21:38:06.3841436Z del async_compile 2023-01-11T21:38:06.3841441Z 2023-01-11T21:38:06.3841514Z def call(args): 2023-01-11T21:38:06.3841596Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3841669Z args.clear() 2023-01-11T21:38:06.3841757Z with torch.cuda.device(0): 2023-01-11T21:38:06.3841974Z buf0 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3842192Z buf1 = empty_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3842285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3842448Z triton_fused_gather_gather_1_0.run(arg1_1, arg0_1, buf0, buf1, 200, grid=grid(200), stream=stream0) 2023-01-11T21:38:06.3842523Z del arg0_1 2023-01-11T21:38:06.3842597Z del arg1_1 2023-01-11T21:38:06.3842675Z return (buf0, buf1, ) 2023-01-11T21:38:06.3842685Z 2023-01-11T21:38:06.3842689Z 2023-01-11T21:38:06.3842764Z if __name__ == "__main__": 2023-01-11T21:38:06.3842882Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3843009Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3843251Z arg0_1 = rand_strided((1, 1, 10, 6), (60, 60, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3843466Z arg1_1 = rand_strided((4, 5, 10, 1), (50, 10, 1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3843588Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3843593Z 2023-01-11T21:38:06.3843670Z ok (0.340s) 2023-01-11T21:38:06.3843792Z 
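The two gather kernels above show how Inductor lowers torch.gather to plain indirect addressing: the int64 indices are loaded (once with eviction_policy='evict_last' and once without, for the two fused gathers), the graph's `+ 1` is folded into the index, and the source is read at `tmp2 + (6*x0)`, i.e. a row-major offset into the (10, 6) view of arg0_1. A minimal eager-mode reconstruction of that addressing, with shapes taken from the rand_strided harness above (variable names are illustrative, not from the test source):

    import torch

    src = torch.randn(1, 1, 10, 6, device='cuda')             # plays the role of in_ptr1
    idx = torch.randint(0, 5, (4, 5, 10, 1), device='cuda')   # plays the role of in_ptr0

    # out[a, b, r, 0] = src[0, 0, r, idx[a, b, r, 0] + 1],
    # which is exactly the kernel's tl.load(in_ptr1 + (tmp2 + 6*x0))
    out = torch.gather(src.expand(4, 5, 10, 6), 3, idx + 1)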
test_gather2_cuda (__main__.CudaTests) ... ok (0.001s) 2023-01-11T21:38:06.3844246Z test_gather_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3844378Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3844637Z [2023-01-11 21:34:41,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 566 2023-01-11T21:38:06.3844903Z [2023-01-11 21:34:41,755] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 566 2023-01-11T21:38:06.3844909Z 2023-01-11T21:38:06.3845012Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3845088Z import torch 2023-01-11T21:38:06.3845167Z import random 2023-01-11T21:38:06.3845288Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3845413Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3845419Z 2023-01-11T21:38:06.3845496Z aten = torch.ops.aten 2023-01-11T21:38:06.3845634Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3845729Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3845734Z 2023-01-11T21:38:06.3845810Z import triton 2023-01-11T21:38:06.3845905Z import triton.language as tl 2023-01-11T21:38:06.3846033Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3846176Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3846181Z 2023-01-11T21:38:06.3846185Z 2023-01-11T21:38:06.3846449Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_0 = async_compile.triton(''' 2023-01-11T21:38:06.3846547Z import triton 2023-01-11T21:38:06.3846642Z import triton.language as tl 2023-01-11T21:38:06.3846757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3846860Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3846993Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3847120Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3847125Z 2023-01-11T21:38:06.3847516Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.3847593Z @triton.jit 2023-01-11T21:38:06.3847710Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3847786Z xnumel = 512 2023-01-11T21:38:06.3847886Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3848015Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3848103Z xmask = xindex < xnumel 2023-01-11T21:38:06.3848176Z x0 = xindex 2023-01-11T21:38:06.3848247Z tmp0 = 0 2023-01-11T21:38:06.3848375Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.3848461Z ''') 2023-01-11T21:38:06.3848467Z 2023-01-11T21:38:06.3848471Z 2023-01-11T21:38:06.3848735Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_1 = async_compile.triton(''' 2023-01-11T21:38:06.3848811Z import triton 2023-01-11T21:38:06.3848906Z import triton.language as tl 
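# kernel 0 above zero-fills the 512-element (16, 32) output buffer (the
# zeros_like in the fused name); the kernel defined here performs the actual
# scatter_add_: as its body below shows, each lane computes a gathered
# difference plus one and folds it in with tl.atomic_add, which is required
# because several of the 2560 lanes can target the same output location.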
2023-01-11T21:38:06.3849022Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3849153Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3849287Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3849407Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3849412Z 2023-01-11T21:38:06.3849841Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3849918Z @triton.jit 2023-01-11T21:38:06.3850057Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3850132Z xnumel = 2560 2023-01-11T21:38:06.3850230Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3850360Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3850444Z xmask = xindex < xnumel 2023-01-11T21:38:06.3850520Z x1 = (xindex // 32) 2023-01-11T21:38:06.3850602Z x0 = xindex % 32 2023-01-11T21:38:06.3850706Z tmp0 = tl.load(in_ptr0 + (80 + x1), xmask) 2023-01-11T21:38:06.3850806Z tmp1 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.3850916Z tmp2 = tl.load(in_ptr1 + (x0 + (32*tmp1)), xmask) 2023-01-11T21:38:06.3851023Z tmp3 = tl.load(in_ptr1 + (x0 + (32*tmp0)), xmask) 2023-01-11T21:38:06.3851138Z tmp4 = tmp2 - tmp3 2023-01-11T21:38:06.3851205Z tmp5 = 1 2023-01-11T21:38:06.3851286Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.3851437Z tl.atomic_add(out_ptr0 + (x0 + (32*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3851524Z ''') 2023-01-11T21:38:06.3851529Z 2023-01-11T21:38:06.3851534Z 2023-01-11T21:38:06.3851627Z async_compile.wait(globals()) 2023-01-11T21:38:06.3851707Z del async_compile 2023-01-11T21:38:06.3851712Z 2023-01-11T21:38:06.3851788Z def call(args): 2023-01-11T21:38:06.3851863Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3851939Z args.clear() 2023-01-11T21:38:06.3852033Z with torch.cuda.device(0): 2023-01-11T21:38:06.3852242Z buf0 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3852337Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3852566Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_0.run(buf0, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.3852781Z triton_fused_add_expand_index_index_1_scatter_add__select_2_sub_unsqueeze_zeros_like_1.run(arg1_1, arg0_1, buf0, 2560, grid=grid(2560), stream=stream0) 2023-01-11T21:38:06.3852856Z del arg0_1 2023-01-11T21:38:06.3852924Z del arg1_1 2023-01-11T21:38:06.3853005Z return (buf0, ) 2023-01-11T21:38:06.3853010Z 2023-01-11T21:38:06.3853015Z 2023-01-11T21:38:06.3853095Z if __name__ == "__main__": 2023-01-11T21:38:06.3853216Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3853343Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3853553Z arg0_1 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3853753Z arg1_1 = rand_strided((2, 80), (80, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.3853874Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3853879Z 2023-01-11T21:38:06.3853945Z ok (0.267s) 2023-01-11T21:38:06.3854403Z test_gelu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3854650Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3854914Z [2023-01-11 21:34:41,793] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 567 2023-01-11T21:38:06.3855221Z [2023-01-11 21:34:41,965] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 567 2023-01-11T21:38:06.3855639Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3855774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3856028Z [2023-01-11 21:34:42,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 568 2023-01-11T21:38:06.3856294Z [2023-01-11 21:34:42,190] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 568 2023-01-11T21:38:06.3856300Z 2023-01-11T21:38:06.3856399Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3856477Z import torch 2023-01-11T21:38:06.3856551Z import random 2023-01-11T21:38:06.3856670Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3856794Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3856800Z 2023-01-11T21:38:06.3856884Z aten = torch.ops.aten 2023-01-11T21:38:06.3857028Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3857177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3857183Z 2023-01-11T21:38:06.3857274Z import triton 2023-01-11T21:38:06.3857374Z import triton.language as tl 2023-01-11T21:38:06.3857516Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3857658Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3857664Z 2023-01-11T21:38:06.3857668Z 2023-01-11T21:38:06.3857836Z triton_fused_add_1_mul_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3857912Z import triton 2023-01-11T21:38:06.3858004Z import triton.language as tl 2023-01-11T21:38:06.3858125Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3858227Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3858356Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3858480Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3858486Z 2023-01-11T21:38:06.3858946Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3859024Z @triton.jit 2023-01-11T21:38:06.3859168Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3859244Z xnumel = 256 2023-01-11T21:38:06.3859342Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3859469Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3859548Z xmask = xindex < xnumel 
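# the straight-line body below is the erf formulation of GELU, fused twice:
# gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2))), with 0.7071067811865476 being
# 1/sqrt(2); out_ptr0 receives gelu(x) + 2 and out_ptr1 receives gelu(x + 1),
# both computed from loads of the same input.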
2023-01-11T21:38:06.3859624Z x0 = xindex 2023-01-11T21:38:06.3859816Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3859917Z tmp11 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3859992Z tmp1 = 0.5 2023-01-11T21:38:06.3860072Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3860155Z tmp3 = 0.7071067811865476 2023-01-11T21:38:06.3860232Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.3860330Z tmp5 = tl.libdevice.erf(tmp4) 2023-01-11T21:38:06.3860402Z tmp6 = 1 2023-01-11T21:38:06.3860479Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.3860558Z tmp8 = tmp2 * tmp7 2023-01-11T21:38:06.3860630Z tmp9 = 2 2023-01-11T21:38:06.3860704Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.3860788Z tmp12 = tmp11 + tmp6 2023-01-11T21:38:06.3860872Z tmp13 = tmp12 * tmp1 2023-01-11T21:38:06.3860954Z tmp14 = tmp12 * tmp3 2023-01-11T21:38:06.3861051Z tmp15 = tl.libdevice.erf(tmp14) 2023-01-11T21:38:06.3861132Z tmp16 = tmp15 + tmp6 2023-01-11T21:38:06.3861241Z tmp17 = tmp13 * tmp16 2023-01-11T21:38:06.3861374Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3861510Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.3861597Z ''') 2023-01-11T21:38:06.3861603Z 2023-01-11T21:38:06.3861608Z 2023-01-11T21:38:06.3861708Z async_compile.wait(globals()) 2023-01-11T21:38:06.3861790Z del async_compile 2023-01-11T21:38:06.3861795Z 2023-01-11T21:38:06.3861870Z def call(args): 2023-01-11T21:38:06.3861946Z arg0_1, = args 2023-01-11T21:38:06.3862022Z args.clear() 2023-01-11T21:38:06.3862110Z with torch.cuda.device(0): 2023-01-11T21:38:06.3862314Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3862518Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3862610Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3862759Z triton_fused_add_1_mul_5_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3862838Z del arg0_1 2023-01-11T21:38:06.3862922Z return (buf0, buf1, ) 2023-01-11T21:38:06.3862927Z 2023-01-11T21:38:06.3862931Z 2023-01-11T21:38:06.3863012Z if __name__ == "__main__": 2023-01-11T21:38:06.3863126Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3863260Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3863465Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3863579Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3863584Z 2023-01-11T21:38:06.3863588Z 2023-01-11T21:38:06.3863686Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3863762Z import torch 2023-01-11T21:38:06.3863837Z import random 2023-01-11T21:38:06.3863952Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3864076Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3864084Z 2023-01-11T21:38:06.3864168Z aten = torch.ops.aten 2023-01-11T21:38:06.3864304Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3864406Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3864411Z 2023-01-11T21:38:06.3864488Z import triton 2023-01-11T21:38:06.3864582Z import triton.language as tl 2023-01-11T21:38:06.3864735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3864871Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3864877Z 2023-01-11T21:38:06.3864886Z 2023-01-11T21:38:06.3865079Z 
triton_fused_add_1_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.3865156Z import triton 2023-01-11T21:38:06.3865249Z import triton.language as tl 2023-01-11T21:38:06.3865365Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3865468Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3865616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3865759Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3865766Z 2023-01-11T21:38:06.3866197Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.3866271Z @triton.jit 2023-01-11T21:38:06.3866414Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3866491Z xnumel = 256 2023-01-11T21:38:06.3866588Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3866719Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3866805Z xmask = xindex < xnumel 2023-01-11T21:38:06.3866878Z x0 = xindex 2023-01-11T21:38:06.3867087Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3867206Z tmp13 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3867339Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3867414Z tmp2 = 0.5 2023-01-11T21:38:06.3867494Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.3867574Z tmp4 = 0.7071067811865476 2023-01-11T21:38:06.3867654Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.3867744Z tmp6 = tl.libdevice.erf(tmp5) 2023-01-11T21:38:06.3867819Z tmp7 = 1 2023-01-11T21:38:06.3867898Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.3867976Z tmp9 = tmp3 * tmp8 2023-01-11T21:38:06.3868066Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.3868140Z tmp11 = 2 2023-01-11T21:38:06.3868216Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.3868300Z tmp14 = tmp13 + tmp7 2023-01-11T21:38:06.3868392Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.3868474Z tmp16 = tmp15 * tmp2 2023-01-11T21:38:06.3868552Z tmp17 = tmp15 * tmp4 2023-01-11T21:38:06.3868649Z tmp18 = tl.libdevice.erf(tmp17) 2023-01-11T21:38:06.3868731Z tmp19 = tmp18 + tmp7 2023-01-11T21:38:06.3868810Z tmp20 = tmp16 * tmp19 2023-01-11T21:38:06.3868901Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.3869038Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3869181Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.3869272Z ''') 2023-01-11T21:38:06.3869278Z 2023-01-11T21:38:06.3869285Z 2023-01-11T21:38:06.3869383Z async_compile.wait(globals()) 2023-01-11T21:38:06.3869461Z del async_compile 2023-01-11T21:38:06.3869466Z 2023-01-11T21:38:06.3869541Z def call(args): 2023-01-11T21:38:06.3869610Z arg0_1, = args 2023-01-11T21:38:06.3869692Z args.clear() 2023-01-11T21:38:06.3869786Z with torch.cuda.device(0): 2023-01-11T21:38:06.3869992Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3870192Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3870285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3870460Z triton_fused_add_1_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.3870528Z del arg0_1 2023-01-11T21:38:06.3870613Z return (buf0, buf1, ) 
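# [illustration, not part of the log] A minimal eager-mode sketch of
# what the two fused kernels above compute, reconstructed from their
# temporaries; the actual test body is not visible in this excerpt, so
# the function name here is hypothetical:
import torch
import torch.nn.functional as F

def gelu_pair(x):
    # buf0: erf-based ("exact") GELU of x, plus the constant 2
    #       (tmp10 = 0.5 * x * (1 + erf(x * 0.7071...)) + 2)
    # buf1: erf-based GELU of (x + 1)  (tmp17)
    return F.gelu(x) + 2, F.gelu(x + 1)

# e.g. gelu_pair(torch.randn(16, 16, device="cuda", dtype=torch.float16))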
2023-01-11T21:38:06.3870618Z 2023-01-11T21:38:06.3870624Z 2023-01-11T21:38:06.3870706Z if __name__ == "__main__": 2023-01-11T21:38:06.3870854Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3870985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3871189Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3871300Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3871306Z 2023-01-11T21:38:06.3871379Z ok (0.434s) 2023-01-11T21:38:06.3871832Z test_glu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3871962Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3872226Z [2023-01-11 21:34:42,227] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 569 2023-01-11T21:38:06.3872489Z [2023-01-11 21:34:42,352] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 569 2023-01-11T21:38:06.3872901Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3873035Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3873319Z [2023-01-11 21:34:42,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 570 2023-01-11T21:38:06.3873325Z 2023-01-11T21:38:06.3873425Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3873501Z import torch 2023-01-11T21:38:06.3873576Z import random 2023-01-11T21:38:06.3873692Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3873816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3873821Z 2023-01-11T21:38:06.3873906Z aten = torch.ops.aten 2023-01-11T21:38:06.3874042Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3874143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3874148Z 2023-01-11T21:38:06.3874224Z import triton 2023-01-11T21:38:06.3874317Z import triton.language as tl 2023-01-11T21:38:06.3874444Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3874579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3874587Z 2023-01-11T21:38:06.3874598Z 2023-01-11T21:38:06.3874746Z triton_fused_glu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3874824Z import triton 2023-01-11T21:38:06.3874919Z import triton.language as tl 2023-01-11T21:38:06.3875036Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3875142Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3875280Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3875429Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3875434Z 2023-01-11T21:38:06.3875857Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3875933Z @triton.jit 2023-01-11T21:38:06.3876066Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3876148Z xnumel = 4096 2023-01-11T21:38:06.3876248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3876376Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3876461Z xmask = xindex < xnumel 2023-01-11T21:38:06.3876539Z x0 = xindex % 4 2023-01-11T21:38:06.3876614Z x1 = (xindex // 4) 2023-01-11T21:38:06.3876713Z x2 = xindex 2023-01-11T21:38:06.3876920Z tmp0 = tl.load(in_ptr0 + (x0 + (8*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3877128Z tmp1 = tl.load(in_ptr0 + (4 + x0 + (8*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3877216Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3877297Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3877435Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3877515Z ''') 2023-01-11T21:38:06.3877520Z 2023-01-11T21:38:06.3877524Z 2023-01-11T21:38:06.3877682Z triton_fused_glu_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3877762Z import triton 2023-01-11T21:38:06.3877865Z import triton.language as tl 2023-01-11T21:38:06.3877982Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3878087Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3878220Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3878343Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3878353Z 2023-01-11T21:38:06.3878753Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3878826Z @triton.jit 2023-01-11T21:38:06.3878960Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3879037Z xnumel = 4096 2023-01-11T21:38:06.3879138Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3879268Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3879384Z xmask = xindex < xnumel 2023-01-11T21:38:06.3879456Z x0 = xindex % 512 2023-01-11T21:38:06.3879538Z x1 = (xindex // 512) 2023-01-11T21:38:06.3879610Z x2 = xindex 2023-01-11T21:38:06.3879818Z tmp0 = tl.load(in_ptr0 + (x0 + (1024*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3880030Z tmp1 = tl.load(in_ptr0 + (512 + x0 + (1024*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3880117Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3880196Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3880322Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3880406Z ''') 2023-01-11T21:38:06.3880411Z 2023-01-11T21:38:06.3880416Z 2023-01-11T21:38:06.3880572Z triton_fused_glu_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.3880647Z import triton 2023-01-11T21:38:06.3880745Z import triton.language as tl 2023-01-11T21:38:06.3880860Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3880966Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3881104Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3881223Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3881229Z 2023-01-11T21:38:06.3881635Z @pointwise(size_hints=[4096], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3881709Z @triton.jit 2023-01-11T21:38:06.3881845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3881919Z xnumel = 4096 2023-01-11T21:38:06.3882022Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3882150Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3882235Z xmask = xindex < xnumel 2023-01-11T21:38:06.3882306Z x0 = xindex % 32 2023-01-11T21:38:06.3882387Z x1 = (xindex // 32) 2023-01-11T21:38:06.3882462Z x2 = xindex 2023-01-11T21:38:06.3882567Z tmp0 = tl.load(in_ptr0 + (x0 + (64*x1)), xmask) 2023-01-11T21:38:06.3882679Z tmp1 = tl.load(in_ptr0 + (32 + x0 + (64*x1)), xmask) 2023-01-11T21:38:06.3882766Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3882847Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3883007Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3883097Z ''') 2023-01-11T21:38:06.3883102Z 2023-01-11T21:38:06.3883107Z 2023-01-11T21:38:06.3883202Z async_compile.wait(globals()) 2023-01-11T21:38:06.3883280Z del async_compile 2023-01-11T21:38:06.3883285Z 2023-01-11T21:38:06.3883363Z def call(args): 2023-01-11T21:38:06.3883437Z arg0_1, = args 2023-01-11T21:38:06.3883514Z args.clear() 2023-01-11T21:38:06.3883602Z with torch.cuda.device(0): 2023-01-11T21:38:06.3883824Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3883919Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3884064Z triton_fused_glu_0.run(arg0_1, buf0, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884282Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3884424Z triton_fused_glu_1_1.run(arg0_1, buf1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884640Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3884780Z triton_fused_glu_2_2.run(arg0_1, buf2, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3884849Z del arg0_1 2023-01-11T21:38:06.3884938Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3884943Z 2023-01-11T21:38:06.3884948Z 2023-01-11T21:38:06.3885031Z if __name__ == "__main__": 2023-01-11T21:38:06.3885151Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3885280Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3885525Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3885641Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3885646Z 2023-01-11T21:38:06.3885912Z [2023-01-11 21:34:42,497] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 570 2023-01-11T21:38:06.3885922Z 2023-01-11T21:38:06.3886021Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3886091Z import torch 2023-01-11T21:38:06.3886167Z import random 2023-01-11T21:38:06.3886291Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3886418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3886423Z 2023-01-11T21:38:06.3886507Z aten = torch.ops.aten 2023-01-11T21:38:06.3886646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3886742Z async_compile = AsyncCompile() 
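# [illustration, not part of the log] The three glu kernels in the fp32
# module above each split one axis of the (8, 16, 8, 8) input in half
# and compute a * sigmoid(b); a hedged eager-mode equivalent
# (hypothetical name, the test body itself is not shown):
import torch
import torch.nn.functional as F

def glu_triple(x):
    # output shapes match buf0 (8, 16, 8, 4), buf1 (8, 8, 8, 8),
    # and buf2 (8, 16, 4, 8) allocated in call() above
    return F.glu(x, dim=-1), F.glu(x, dim=1), F.glu(x, dim=2)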
2023-01-11T21:38:06.3886747Z 2023-01-11T21:38:06.3886816Z import triton 2023-01-11T21:38:06.3886912Z import triton.language as tl 2023-01-11T21:38:06.3887041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3887181Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3887186Z 2023-01-11T21:38:06.3887190Z 2023-01-11T21:38:06.3887345Z triton_fused_glu_0 = async_compile.triton(''' 2023-01-11T21:38:06.3887426Z import triton 2023-01-11T21:38:06.3887519Z import triton.language as tl 2023-01-11T21:38:06.3887633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3887730Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3887863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3887989Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3887995Z 2023-01-11T21:38:06.3888397Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3888473Z @triton.jit 2023-01-11T21:38:06.3888606Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3888683Z xnumel = 4096 2023-01-11T21:38:06.3888787Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3888936Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3889020Z xmask = xindex < xnumel 2023-01-11T21:38:06.3889096Z x0 = xindex % 4 2023-01-11T21:38:06.3889175Z x1 = (xindex // 4) 2023-01-11T21:38:06.3889248Z x2 = xindex 2023-01-11T21:38:06.3889474Z tmp0 = tl.load(in_ptr0 + (x0 + (8*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3889702Z tmp1 = tl.load(in_ptr0 + (4 + x0 + (8*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3889783Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3889864Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3890001Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3890090Z ''') 2023-01-11T21:38:06.3890095Z 2023-01-11T21:38:06.3890100Z 2023-01-11T21:38:06.3890260Z triton_fused_glu_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.3890335Z import triton 2023-01-11T21:38:06.3890432Z import triton.language as tl 2023-01-11T21:38:06.3890543Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3890646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3890781Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3890908Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3890913Z 2023-01-11T21:38:06.3891317Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3891392Z @triton.jit 2023-01-11T21:38:06.3891530Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3891638Z xnumel = 4096 2023-01-11T21:38:06.3891730Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3891859Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3891942Z xmask = xindex < xnumel 2023-01-11T21:38:06.3892022Z x0 = xindex % 512 2023-01-11T21:38:06.3892107Z x1 = (xindex // 512) 2023-01-11T21:38:06.3892180Z x2 = xindex 2023-01-11T21:38:06.3892411Z tmp0 = 
tl.load(in_ptr0 + (x0 + (1024*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3892644Z tmp1 = tl.load(in_ptr0 + (512 + x0 + (1024*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3892725Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3892807Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3892941Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3893028Z ''') 2023-01-11T21:38:06.3893033Z 2023-01-11T21:38:06.3893038Z 2023-01-11T21:38:06.3893196Z triton_fused_glu_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.3893270Z import triton 2023-01-11T21:38:06.3893365Z import triton.language as tl 2023-01-11T21:38:06.3893473Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3893575Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3893710Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3893838Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3893843Z 2023-01-11T21:38:06.3894240Z @pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.3894317Z @triton.jit 2023-01-11T21:38:06.3894454Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3894643Z xnumel = 4096 2023-01-11T21:38:06.3894736Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3894870Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3894956Z xmask = xindex < xnumel 2023-01-11T21:38:06.3895035Z x0 = xindex % 32 2023-01-11T21:38:06.3895115Z x1 = (xindex // 32) 2023-01-11T21:38:06.3895187Z x2 = xindex 2023-01-11T21:38:06.3895357Z tmp0 = tl.load(in_ptr0 + (x0 + (64*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.3895481Z tmp1 = tl.load(in_ptr0 + (32 + x0 + (64*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.3895567Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.3895648Z tmp3 = tmp0 * tmp2 2023-01-11T21:38:06.3895786Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.3895897Z ''') 2023-01-11T21:38:06.3895903Z 2023-01-11T21:38:06.3895907Z 2023-01-11T21:38:06.3896004Z async_compile.wait(globals()) 2023-01-11T21:38:06.3896085Z del async_compile 2023-01-11T21:38:06.3896090Z 2023-01-11T21:38:06.3896166Z def call(args): 2023-01-11T21:38:06.3896234Z arg0_1, = args 2023-01-11T21:38:06.3896314Z args.clear() 2023-01-11T21:38:06.3896408Z with torch.cuda.device(0): 2023-01-11T21:38:06.3896628Z buf0 = empty_strided((8, 16, 8, 4), (512, 32, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3896727Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3896882Z triton_fused_glu_0.run(arg0_1, buf0, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897172Z buf1 = empty_strided((8, 8, 8, 8), (512, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3897312Z triton_fused_glu_1_1.run(arg0_1, buf1, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897547Z buf2 = empty_strided((8, 16, 4, 8), (512, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3897687Z triton_fused_glu_2_2.run(arg0_1, buf2, 4096, grid=grid(4096), stream=stream0) 2023-01-11T21:38:06.3897763Z del arg0_1 2023-01-11T21:38:06.3897851Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3897895Z 2023-01-11T21:38:06.3897900Z 2023-01-11T21:38:06.3897983Z if __name__ == 
"__main__": 2023-01-11T21:38:06.3898102Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3898232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3898448Z arg0_1 = rand_strided((8, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3898562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3898567Z 2023-01-11T21:38:06.3898639Z ok (0.308s) 2023-01-11T21:38:06.3899102Z test_grid_sampler_2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3899236Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3899499Z [2023-01-11 21:34:43,850] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 571 2023-01-11T21:38:06.3899711Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.3899917Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.3900118Z [2023-01-11 21:34:44,454] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.3900124Z 2023-01-11T21:38:06.3900224Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3900295Z import torch 2023-01-11T21:38:06.3900371Z import random 2023-01-11T21:38:06.3900492Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3900614Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3900619Z 2023-01-11T21:38:06.3900704Z aten = torch.ops.aten 2023-01-11T21:38:06.3900843Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3900941Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3900947Z 2023-01-11T21:38:06.3901016Z import triton 2023-01-11T21:38:06.3901112Z import triton.language as tl 2023-01-11T21:38:06.3901240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3901410Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3901416Z 2023-01-11T21:38:06.3901421Z 2023-01-11T21:38:06.3901718Z triton_fused_add_add_1_add_10_add_2_add_3_add_7_add_8_add_9_convert_element_type_10_convert_element_type_11_0 = async_compile.triton(''' 2023-01-11T21:38:06.3901796Z import triton 2023-01-11T21:38:06.3901892Z import triton.language as tl 2023-01-11T21:38:06.3902008Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3902106Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3902239Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3902370Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3902378Z 2023-01-11T21:38:06.3903011Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*i64', 6: '*i64', 7: '*fp32', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*i64', 12: '*i64', 13: '*fp32', 14: '*i64', 15: '*i64', 16: '*fp32', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.3903086Z @triton.jit 
2023-01-11T21:38:06.3903339Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3903416Z xnumel = 495616 2023-01-11T21:38:06.3903517Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3903647Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3903798Z xmask = xindex < xnumel 2023-01-11T21:38:06.3903864Z x0 = xindex 2023-01-11T21:38:06.3904059Z tmp0 = tl.load(in_ptr0 + (2*x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3904258Z tmp9 = tl.load(in_ptr0 + (1 + (2*x0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3904363Z tmp99 = tl.load(in_ptr0 + (2*x0), xmask) 2023-01-11T21:38:06.3904468Z tmp108 = tl.load(in_ptr0 + (1 + (2*x0)), xmask) 2023-01-11T21:38:06.3904544Z tmp1 = 175.5 2023-01-11T21:38:06.3904624Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.3904698Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.3904801Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.3904874Z tmp5 = 0 2023-01-11T21:38:06.3904954Z tmp6 = tmp4 >= tmp5 2023-01-11T21:38:06.3905029Z tmp7 = 352 2023-01-11T21:38:06.3905108Z tmp8 = tmp4 < tmp7 2023-01-11T21:38:06.3905181Z tmp10 = tmp9 * tmp1 2023-01-11T21:38:06.3905265Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.3905375Z tmp12 = tl.libdevice.floor(tmp11) 2023-01-11T21:38:06.3905459Z tmp13 = tmp12 >= tmp5 2023-01-11T21:38:06.3905539Z tmp14 = tmp12 < tmp7 2023-01-11T21:38:06.3905637Z tmp15 = tmp13 & tmp14 2023-01-11T21:38:06.3905722Z tmp16 = tmp8 & tmp15 2023-01-11T21:38:06.3905812Z tmp17 = tmp6 & tmp16 2023-01-11T21:38:06.3905894Z tmp18 = 1 2023-01-11T21:38:06.3905976Z tmp19 = tmp4 + tmp18 2023-01-11T21:38:06.3906090Z tmp20 = tmp19 - tmp3 2023-01-11T21:38:06.3906170Z tmp21 = tmp12 + tmp18 2023-01-11T21:38:06.3906281Z tmp22 = tmp21 - tmp11 2023-01-11T21:38:06.3906361Z tmp23 = tmp20 * tmp22 2023-01-11T21:38:06.3906456Z tmp24 = tl.where(tmp17, tmp23, tmp5) 2023-01-11T21:38:06.3906538Z tmp25 = tmp19 >= tmp5 2023-01-11T21:38:06.3906616Z tmp26 = tmp19 < tmp7 2023-01-11T21:38:06.3906695Z tmp27 = tmp26 & tmp15 2023-01-11T21:38:06.3906777Z tmp28 = tmp25 & tmp27 2023-01-11T21:38:06.3906886Z tmp29 = tmp3 - tmp4 2023-01-11T21:38:06.3906960Z tmp30 = tmp29 * tmp22 2023-01-11T21:38:06.3907064Z tmp31 = tl.where(tmp28, tmp30, tmp5) 2023-01-11T21:38:06.3907144Z tmp32 = tmp21 >= tmp5 2023-01-11T21:38:06.3907223Z tmp33 = tmp21 < tmp7 2023-01-11T21:38:06.3907303Z tmp34 = tmp32 & tmp33 2023-01-11T21:38:06.3907384Z tmp35 = tmp8 & tmp34 2023-01-11T21:38:06.3907466Z tmp36 = tmp6 & tmp35 2023-01-11T21:38:06.3907601Z tmp37 = tmp11 - tmp12 2023-01-11T21:38:06.3907682Z tmp38 = tmp20 * tmp37 2023-01-11T21:38:06.3907778Z tmp39 = tl.where(tmp36, tmp38, tmp5) 2023-01-11T21:38:06.3907857Z tmp40 = tmp26 & tmp34 2023-01-11T21:38:06.3907937Z tmp41 = tmp25 & tmp40 2023-01-11T21:38:06.3908015Z tmp42 = tmp29 * tmp37 2023-01-11T21:38:06.3908113Z tmp43 = tl.where(tmp41, tmp42, tmp5) 2023-01-11T21:38:06.3908182Z tmp44 = 176.0 2023-01-11T21:38:06.3908263Z tmp45 = tmp0 * tmp44 2023-01-11T21:38:06.3908342Z tmp46 = tmp45 + tmp1 2023-01-11T21:38:06.3908413Z tmp47 = 0.0 2023-01-11T21:38:06.3908555Z tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47)) 2023-01-11T21:38:06.3908634Z tmp49 = 351.0 2023-01-11T21:38:06.3908777Z tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 < tmp49, tmp48, tmp49)) 2023-01-11T21:38:06.3908872Z tmp51 = tl.libdevice.floor(tmp50) 
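# [annotation, not TorchInductor output] The constants above appear to
# encode grid_sample coordinate unnormalization for a 352-wide axis:
#   x * 175.5 + 175.5 matches the align_corners=True form
#       ((x + 1) / 2) * (352 - 1),
#   x * 176.0 + 175.5 matches the align_corners=False form
#       ((x + 1) * 352 - 1) / 2.
# The nested tl.where(a != a, a, ...) pairs are NaN-propagating min/max
# that clamp coordinates to [0.0, 351.0], while the mask-and-zero
# tl.where selects earlier in the kernel zero out-of-bounds taps --
# consistent with "border"-style versus "zeros"-style padding for the
# two outputs this test produces.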
2023-01-11T21:38:06.3908954Z tmp52 = tmp51 >= tmp5 2023-01-11T21:38:06.3909036Z tmp53 = tmp51 < tmp7 2023-01-11T21:38:06.3909116Z tmp54 = tmp9 * tmp44 2023-01-11T21:38:06.3909197Z tmp55 = tmp54 + tmp1 2023-01-11T21:38:06.3909335Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp47, tmp55, tmp47)) 2023-01-11T21:38:06.3909470Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp49, tmp56, tmp49)) 2023-01-11T21:38:06.3909564Z tmp58 = tl.libdevice.floor(tmp57) 2023-01-11T21:38:06.3909646Z tmp59 = tmp58 >= tmp5 2023-01-11T21:38:06.3909727Z tmp60 = tmp58 < tmp7 2023-01-11T21:38:06.3909808Z tmp61 = tmp59 & tmp60 2023-01-11T21:38:06.3909888Z tmp62 = tmp53 & tmp61 2023-01-11T21:38:06.3910005Z tmp63 = tmp52 & tmp62 2023-01-11T21:38:06.3910098Z tmp64 = tmp51.to(tl.int64) 2023-01-11T21:38:06.3910191Z tmp65 = tl.where(tmp63, tmp64, tmp5) 2023-01-11T21:38:06.3910279Z tmp66 = tmp58.to(tl.int64) 2023-01-11T21:38:06.3910377Z tmp67 = tl.where(tmp63, tmp66, tmp5) 2023-01-11T21:38:06.3910460Z tmp68 = tmp51 + tmp18 2023-01-11T21:38:06.3910577Z tmp69 = tmp68 - tmp50 2023-01-11T21:38:06.3910658Z tmp70 = tmp58 + tmp18 2023-01-11T21:38:06.3910772Z tmp71 = tmp70 - tmp57 2023-01-11T21:38:06.3910846Z tmp72 = tmp69 * tmp71 2023-01-11T21:38:06.3910945Z tmp73 = tl.where(tmp63, tmp72, tmp5) 2023-01-11T21:38:06.3911026Z tmp74 = tmp68 >= tmp5 2023-01-11T21:38:06.3911105Z tmp75 = tmp68 < tmp7 2023-01-11T21:38:06.3911185Z tmp76 = tmp75 & tmp61 2023-01-11T21:38:06.3911266Z tmp77 = tmp74 & tmp76 2023-01-11T21:38:06.3911348Z tmp78 = tmp68.to(tl.int64) 2023-01-11T21:38:06.3911450Z tmp79 = tl.where(tmp77, tmp78, tmp5) 2023-01-11T21:38:06.3911548Z tmp80 = tl.where(tmp77, tmp66, tmp5) 2023-01-11T21:38:06.3911667Z tmp81 = tmp50 - tmp51 2023-01-11T21:38:06.3911746Z tmp82 = tmp81 * tmp71 2023-01-11T21:38:06.3911844Z tmp83 = tl.where(tmp77, tmp82, tmp5) 2023-01-11T21:38:06.3911925Z tmp84 = tmp70 >= tmp5 2023-01-11T21:38:06.3911999Z tmp85 = tmp70 < tmp7 2023-01-11T21:38:06.3912081Z tmp86 = tmp84 & tmp85 2023-01-11T21:38:06.3912159Z tmp87 = tmp53 & tmp86 2023-01-11T21:38:06.3912239Z tmp88 = tmp52 & tmp87 2023-01-11T21:38:06.3912336Z tmp89 = tl.where(tmp88, tmp64, tmp5) 2023-01-11T21:38:06.3912423Z tmp90 = tmp70.to(tl.int64) 2023-01-11T21:38:06.3912522Z tmp91 = tl.where(tmp88, tmp90, tmp5) 2023-01-11T21:38:06.3912628Z tmp92 = tmp57 - tmp58 2023-01-11T21:38:06.3912710Z tmp93 = tmp69 * tmp92 2023-01-11T21:38:06.3912807Z tmp94 = tl.where(tmp88, tmp93, tmp5) 2023-01-11T21:38:06.3912885Z tmp95 = tmp75 & tmp86 2023-01-11T21:38:06.3912963Z tmp96 = tmp74 & tmp95 2023-01-11T21:38:06.3913058Z tmp97 = tl.where(tmp96, tmp78, tmp5) 2023-01-11T21:38:06.3913159Z tmp98 = tl.where(tmp96, tmp90, tmp5) 2023-01-11T21:38:06.3913236Z tmp100 = tmp99 * tmp44 2023-01-11T21:38:06.3913319Z tmp101 = tmp100 + tmp1 2023-01-11T21:38:06.3913470Z tmp102 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp47, tmp101, tmp47)) 2023-01-11T21:38:06.3913642Z tmp103 = tl.where(tmp102 != tmp102, tmp102, tl.where(tmp102 < tmp49, tmp102, tmp49)) 2023-01-11T21:38:06.3913746Z tmp104 = tl.libdevice.floor(tmp103) 2023-01-11T21:38:06.3913830Z tmp105 = tmp104 + tmp18 2023-01-11T21:38:06.3913912Z tmp106 = tmp105 >= tmp5 2023-01-11T21:38:06.3913989Z tmp107 = tmp105 < tmp7 2023-01-11T21:38:06.3914074Z tmp109 = tmp108 * tmp44 2023-01-11T21:38:06.3914157Z tmp110 = tmp109 + tmp1 2023-01-11T21:38:06.3914300Z tmp111 = tl.where(tmp110 != tmp110, tmp110, tl.where(tmp110 > tmp47, tmp110, tmp47)) 2023-01-11T21:38:06.3914442Z tmp112 = tl.where(tmp111 != tmp111, tmp111, 
tl.where(tmp111 < tmp49, tmp111, tmp49)) 2023-01-11T21:38:06.3914549Z tmp113 = tl.libdevice.floor(tmp112) 2023-01-11T21:38:06.3914632Z tmp114 = tmp113 + tmp18 2023-01-11T21:38:06.3914709Z tmp115 = tmp114 >= tmp5 2023-01-11T21:38:06.3914791Z tmp116 = tmp114 < tmp7 2023-01-11T21:38:06.3914872Z tmp117 = tmp115 & tmp116 2023-01-11T21:38:06.3914957Z tmp118 = tmp107 & tmp117 2023-01-11T21:38:06.3915041Z tmp119 = tmp106 & tmp118 2023-01-11T21:38:06.3915164Z tmp120 = tmp103 - tmp104 2023-01-11T21:38:06.3915282Z tmp121 = tmp112 - tmp113 2023-01-11T21:38:06.3915357Z tmp122 = tmp120 * tmp121 2023-01-11T21:38:06.3915471Z tmp123 = tl.where(tmp119, tmp122, tmp5) 2023-01-11T21:38:06.3915632Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3915786Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp31, xmask) 2023-01-11T21:38:06.3915917Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp39, xmask) 2023-01-11T21:38:06.3916048Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp43, xmask) 2023-01-11T21:38:06.3916206Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp65, xmask) 2023-01-11T21:38:06.3916328Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp67, xmask) 2023-01-11T21:38:06.3916455Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp73, xmask) 2023-01-11T21:38:06.3916583Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp79, xmask) 2023-01-11T21:38:06.3916714Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp80, xmask) 2023-01-11T21:38:06.3916842Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp83, xmask) 2023-01-11T21:38:06.3916974Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp89, xmask) 2023-01-11T21:38:06.3917109Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp91, xmask) 2023-01-11T21:38:06.3917238Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp94, xmask) 2023-01-11T21:38:06.3917363Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp97, xmask) 2023-01-11T21:38:06.3917489Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp98, xmask) 2023-01-11T21:38:06.3917625Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp123, xmask) 2023-01-11T21:38:06.3917715Z ''') 2023-01-11T21:38:06.3917721Z 2023-01-11T21:38:06.3917727Z 2023-01-11T21:38:06.3917933Z triton_fused_add_6_index_index_1_index_2_index_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.3918010Z import triton 2023-01-11T21:38:06.3918104Z import triton.language as tl 2023-01-11T21:38:06.3918221Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3918319Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3918453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3918580Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3918585Z 2023-01-11T21:38:06.3919081Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.3919160Z @triton.jit 2023-01-11T21:38:06.3919369Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3919451Z xnumel = 1486848 2023-01-11T21:38:06.3919550Z xoffset = tl.program_id(0) * XBLOCK 
2023-01-11T21:38:06.3919682Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3919760Z xmask = xindex < xnumel 2023-01-11T21:38:06.3919844Z x2 = (xindex // 371712) 2023-01-11T21:38:06.3919927Z x1 = (xindex // 123904) % 3 2023-01-11T21:38:06.3920008Z x0 = xindex % 123904 2023-01-11T21:38:06.3920081Z x3 = xindex 2023-01-11T21:38:06.3920299Z tmp2 = tl.load(in_ptr0 + ((2*x0) + (247808*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3920522Z tmp11 = tl.load(in_ptr0 + (1 + (2*x0) + (247808*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3920632Z tmp45 = tl.load(in_ptr0 + ((2*x0) + (247808*x2)), xmask) 2023-01-11T21:38:06.3920747Z tmp52 = tl.load(in_ptr0 + (1 + (2*x0) + (247808*x2)), xmask) 2023-01-11T21:38:06.3920862Z tmp67 = tl.load(in_ptr2 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3920975Z tmp69 = tl.load(in_ptr3 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921085Z tmp72 = tl.load(in_ptr4 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921194Z tmp75 = tl.load(in_ptr5 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3921271Z tmp0 = x2 2023-01-11T21:38:06.3921338Z tmp1 = x1 2023-01-11T21:38:06.3921413Z tmp3 = 175.5 2023-01-11T21:38:06.3921492Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.3921573Z tmp5 = tmp4 + tmp3 2023-01-11T21:38:06.3921675Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.3921748Z tmp7 = 0 2023-01-11T21:38:06.3921859Z tmp8 = tmp6 >= tmp7 2023-01-11T21:38:06.3921928Z tmp9 = 352 2023-01-11T21:38:06.3922008Z tmp10 = tmp6 < tmp9 2023-01-11T21:38:06.3922092Z tmp12 = tmp11 * tmp3 2023-01-11T21:38:06.3922171Z tmp13 = tmp12 + tmp3 2023-01-11T21:38:06.3922274Z tmp14 = tl.libdevice.floor(tmp13) 2023-01-11T21:38:06.3922358Z tmp15 = tmp14 >= tmp7 2023-01-11T21:38:06.3922433Z tmp16 = tmp14 < tmp9 2023-01-11T21:38:06.3922512Z tmp17 = tmp15 & tmp16 2023-01-11T21:38:06.3922592Z tmp18 = tmp10 & tmp17 2023-01-11T21:38:06.3922671Z tmp19 = tmp8 & tmp18 2023-01-11T21:38:06.3922759Z tmp20 = tmp14.to(tl.int64) 2023-01-11T21:38:06.3922860Z tmp21 = tl.where(tmp19, tmp20, tmp7) 2023-01-11T21:38:06.3922950Z tmp22 = tmp6.to(tl.int64) 2023-01-11T21:38:06.3923044Z tmp23 = tl.where(tmp19, tmp22, tmp7) 2023-01-11T21:38:06.3923301Z tmp24 = tl.load(in_ptr1 + (tmp23 + (352*tmp21) + (123904*tmp1) + (371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3923377Z tmp25 = 1 2023-01-11T21:38:06.3923458Z tmp26 = tmp6 + tmp25 2023-01-11T21:38:06.3923539Z tmp27 = tmp26 >= tmp7 2023-01-11T21:38:06.3923618Z tmp28 = tmp26 < tmp9 2023-01-11T21:38:06.3923696Z tmp29 = tmp28 & tmp17 2023-01-11T21:38:06.3923770Z tmp30 = tmp27 & tmp29 2023-01-11T21:38:06.3923869Z tmp31 = tl.where(tmp30, tmp20, tmp7) 2023-01-11T21:38:06.3923964Z tmp32 = tmp26.to(tl.int64) 2023-01-11T21:38:06.3924063Z tmp33 = tl.where(tmp30, tmp32, tmp7) 2023-01-11T21:38:06.3924317Z tmp34 = tl.load(in_ptr1 + (tmp33 + (352*tmp31) + (123904*tmp1) + (371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3924403Z tmp35 = tmp14 + tmp25 2023-01-11T21:38:06.3924484Z tmp36 = tmp35 >= tmp7 2023-01-11T21:38:06.3924558Z tmp37 = tmp35 < tmp9 2023-01-11T21:38:06.3924639Z tmp38 = tmp36 & tmp37 2023-01-11T21:38:06.3924720Z tmp39 = tmp10 & tmp38 2023-01-11T21:38:06.3924801Z tmp40 = tmp8 & tmp39 2023-01-11T21:38:06.3924890Z tmp41 = tmp35.to(tl.int64) 2023-01-11T21:38:06.3924991Z tmp42 = tl.where(tmp40, tmp41, tmp7) 2023-01-11T21:38:06.3925091Z tmp43 = tl.where(tmp40, tmp22, tmp7) 2023-01-11T21:38:06.3925334Z tmp44 = tl.load(in_ptr1 + (tmp43 + (352*tmp42) + (123904*tmp1) + 
(371712*tmp0)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.3925417Z tmp46 = tmp45 * tmp3 2023-01-11T21:38:06.3925526Z tmp47 = tmp46 + tmp3 2023-01-11T21:38:06.3925630Z tmp48 = tl.libdevice.floor(tmp47) 2023-01-11T21:38:06.3925711Z tmp49 = tmp48 + tmp25 2023-01-11T21:38:06.3925792Z tmp50 = tmp49 >= tmp7 2023-01-11T21:38:06.3925874Z tmp51 = tmp49 < tmp9 2023-01-11T21:38:06.3925948Z tmp53 = tmp52 * tmp3 2023-01-11T21:38:06.3926029Z tmp54 = tmp53 + tmp3 2023-01-11T21:38:06.3926131Z tmp55 = tl.libdevice.floor(tmp54) 2023-01-11T21:38:06.3926215Z tmp56 = tmp55 + tmp25 2023-01-11T21:38:06.3926296Z tmp57 = tmp56 >= tmp7 2023-01-11T21:38:06.3926377Z tmp58 = tmp56 < tmp9 2023-01-11T21:38:06.3926456Z tmp59 = tmp57 & tmp58 2023-01-11T21:38:06.3926534Z tmp60 = tmp51 & tmp59 2023-01-11T21:38:06.3926612Z tmp61 = tmp50 & tmp60 2023-01-11T21:38:06.3926698Z tmp62 = tmp56.to(tl.int64) 2023-01-11T21:38:06.3926798Z tmp63 = tl.where(tmp61, tmp62, tmp7) 2023-01-11T21:38:06.3926888Z tmp64 = tmp49.to(tl.int64) 2023-01-11T21:38:06.3926989Z tmp65 = tl.where(tmp61, tmp64, tmp7) 2023-01-11T21:38:06.3927129Z tmp66 = tl.load(in_ptr1 + (tmp65 + (352*tmp63) + (123904*tmp1) + (371712*tmp0)), None) 2023-01-11T21:38:06.3927205Z tmp68 = tmp24 * tmp67 2023-01-11T21:38:06.3927288Z tmp70 = tmp34 * tmp69 2023-01-11T21:38:06.3927369Z tmp71 = tmp68 + tmp70 2023-01-11T21:38:06.3927450Z tmp73 = tmp44 * tmp72 2023-01-11T21:38:06.3927530Z tmp74 = tmp71 + tmp73 2023-01-11T21:38:06.3927608Z tmp76 = tmp66 * tmp75 2023-01-11T21:38:06.3927688Z tmp77 = tmp74 + tmp76 2023-01-11T21:38:06.3927824Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp77, xmask) 2023-01-11T21:38:06.3927911Z ''') 2023-01-11T21:38:06.3927943Z 2023-01-11T21:38:06.3927948Z 2023-01-11T21:38:06.3928200Z triton_fused_add_10_add_11_add_12_add_13_add_7_add_8_add_9_floor_2_floor_3_ge_10_2 = async_compile.triton(''' 2023-01-11T21:38:06.3928278Z import triton 2023-01-11T21:38:06.3928373Z import triton.language as tl 2023-01-11T21:38:06.3928493Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3928597Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3928726Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3928852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3928857Z 2023-01-11T21:38:06.3929451Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: '*fp32', 4: '*fp32', 5: '*i64', 6: '*i64', 7: '*fp32', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*i64', 12: '*i64', 13: '*fp32', 14: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14), equal_to_1=())]}) 2023-01-11T21:38:06.3929529Z @triton.jit 2023-01-11T21:38:06.3929753Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, in_ptr8, in_ptr9, in_ptr10, in_ptr11, in_ptr12, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3929835Z xnumel = 1486848 2023-01-11T21:38:06.3929935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3930066Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3930152Z xmask = xindex < xnumel 2023-01-11T21:38:06.3930229Z x2 = (xindex // 371712) 2023-01-11T21:38:06.3930314Z x1 = (xindex // 123904) % 3 2023-01-11T21:38:06.3930394Z x0 = xindex % 123904 2023-01-11T21:38:06.3930466Z x3 = xindex 2023-01-11T21:38:06.3930577Z tmp2 = tl.load(in_ptr0 + (x0 + 
(123904*x2)), xmask) 2023-01-11T21:38:06.3930689Z tmp3 = tl.load(in_ptr1 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3930797Z tmp5 = tl.load(in_ptr3 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3930901Z tmp7 = tl.load(in_ptr4 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931007Z tmp8 = tl.load(in_ptr5 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931119Z tmp10 = tl.load(in_ptr6 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931254Z tmp13 = tl.load(in_ptr7 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931360Z tmp14 = tl.load(in_ptr8 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931465Z tmp16 = tl.load(in_ptr9 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931573Z tmp19 = tl.load(in_ptr10 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931681Z tmp20 = tl.load(in_ptr11 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931781Z tmp22 = tl.load(in_ptr12 + (x0 + (123904*x2)), xmask) 2023-01-11T21:38:06.3931852Z tmp0 = x2 2023-01-11T21:38:06.3931923Z tmp1 = x1 2023-01-11T21:38:06.3932175Z tmp4 = tl.load(in_ptr2 + (tmp3 + (352*tmp2) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932257Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.3932504Z tmp9 = tl.load(in_ptr2 + (tmp8 + (352*tmp7) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932586Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.3932658Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.3932911Z tmp15 = tl.load(in_ptr2 + (tmp14 + (352*tmp13) + (123904*tmp1) + (371712*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3932993Z tmp17 = tmp15 * tmp16 2023-01-11T21:38:06.3933073Z tmp18 = tmp12 + tmp17 2023-01-11T21:38:06.3933208Z tmp21 = tl.load(in_ptr2 + (tmp20 + (352*tmp19) + (123904*tmp1) + (371712*tmp0)), xmask) 2023-01-11T21:38:06.3933286Z tmp23 = tmp21 * tmp22 2023-01-11T21:38:06.3933363Z tmp24 = tmp18 + tmp23 2023-01-11T21:38:06.3933494Z tl.store(in_out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.3933576Z ''') 2023-01-11T21:38:06.3933582Z 2023-01-11T21:38:06.3933615Z 2023-01-11T21:38:06.3933709Z async_compile.wait(globals()) 2023-01-11T21:38:06.3933784Z del async_compile 2023-01-11T21:38:06.3933789Z 2023-01-11T21:38:06.3933862Z def call(args): 2023-01-11T21:38:06.3933940Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.3934013Z args.clear() 2023-01-11T21:38:06.3934103Z with torch.cuda.device(0): 2023-01-11T21:38:06.3934318Z buf0 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3934661Z buf2 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3934882Z buf4 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935097Z buf6 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935314Z buf9 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3935528Z buf10 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3935751Z buf11 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3935961Z buf12 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936179Z buf13 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936390Z buf14 = empty_strided((4, 352, 
352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3936602Z buf15 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3936814Z buf16 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937029Z buf17 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3937309Z buf19 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937525Z buf20 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.3937738Z buf21 = empty_strided((4, 352, 352), (123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3937879Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3938162Z triton_fused_add_add_1_add_10_add_2_add_3_add_7_add_8_add_9_convert_element_type_10_convert_element_type_11_0.run(arg1_1, buf0, buf2, buf4, buf6, buf9, buf10, buf11, buf12, buf13, buf14, buf15, buf16, buf17, buf19, buf20, buf21, 495616, grid=grid(495616), stream=stream0) 2023-01-11T21:38:06.3938388Z buf1 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3938479Z buf8 = buf1; del buf1 # reuse 2023-01-11T21:38:06.3938667Z triton_fused_add_6_index_index_1_index_2_index_3_1.run(buf8, arg1_1, arg0_1, buf0, buf2, buf4, buf6, 1486848, grid=grid(1486848), stream=stream0) 2023-01-11T21:38:06.3938745Z del arg1_1 2023-01-11T21:38:06.3938816Z del buf0 2023-01-11T21:38:06.3938886Z del buf2 2023-01-11T21:38:06.3938955Z del buf4 2023-01-11T21:38:06.3939017Z del buf6 2023-01-11T21:38:06.3939256Z buf18 = empty_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3939349Z buf22 = buf18; del buf18 # reuse 2023-01-11T21:38:06.3939582Z triton_fused_add_10_add_11_add_12_add_13_add_7_add_8_add_9_floor_2_floor_3_ge_10_2.run(buf22, buf10, buf9, arg0_1, buf11, buf13, buf12, buf14, buf16, buf15, buf17, buf20, buf19, buf21, 1486848, grid=grid(1486848), stream=stream0) 2023-01-11T21:38:06.3939657Z del arg0_1 2023-01-11T21:38:06.3939742Z return (buf8, buf22, ) 2023-01-11T21:38:06.3939747Z 2023-01-11T21:38:06.3939751Z 2023-01-11T21:38:06.3939829Z if __name__ == "__main__": 2023-01-11T21:38:06.3939989Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3940118Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3940346Z arg0_1 = rand_strided((4, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3940611Z arg1_1 = rand_strided((4, 352, 352, 2), (247808, 704, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3940739Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.3941046Z [2023-01-11 21:34:44,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 571 2023-01-11T21:38:06.3941052Z 2023-01-11T21:38:06.3941127Z ok (2.432s) 2023-01-11T21:38:06.3941670Z test_hardsigmoid_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3941816Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3942110Z [2023-01-11 21:34:44,978] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 572 2023-01-11T21:38:06.3942416Z [2023-01-11 21:34:45,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 572 2023-01-11T21:38:06.3942908Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3943051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3943336Z [2023-01-11 21:34:45,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 573 2023-01-11T21:38:06.3943638Z [2023-01-11 21:34:45,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 573 2023-01-11T21:38:06.3943644Z 2023-01-11T21:38:06.3943749Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3943853Z import torch 2023-01-11T21:38:06.3943931Z import random 2023-01-11T21:38:06.3944064Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3944199Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3944204Z 2023-01-11T21:38:06.3944291Z aten = torch.ops.aten 2023-01-11T21:38:06.3944437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3944538Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3944544Z 2023-01-11T21:38:06.3944621Z import triton 2023-01-11T21:38:06.3944720Z import triton.language as tl 2023-01-11T21:38:06.3944857Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3945016Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3945022Z 2023-01-11T21:38:06.3945026Z 2023-01-11T21:38:06.3945244Z triton_fused_div_div_1_div_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3945328Z import triton 2023-01-11T21:38:06.3945443Z import triton.language as tl 2023-01-11T21:38:06.3945569Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3945680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3945824Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3945968Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3945973Z 2023-01-11T21:38:06.3946491Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3946605Z @triton.jit 2023-01-11T21:38:06.3946759Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3946829Z xnumel = 64 2023-01-11T21:38:06.3946927Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3947058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3947146Z xmask = xindex < xnumel 2023-01-11T21:38:06.3947220Z x0 = xindex 2023-01-11T21:38:06.3947414Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.3947517Z tmp13 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3947583Z tmp1 = 3 2023-01-11T21:38:06.3947662Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3947736Z tmp3 = 0.0 2023-01-11T21:38:06.3947875Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3947949Z tmp5 = 6.0 2023-01-11T21:38:06.3948085Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.3948160Z tmp7 = 6 2023-01-11T21:38:06.3948234Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.3948312Z tmp9 = tmp2 + tmp1 2023-01-11T21:38:06.3948450Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp3, tmp9, tmp3)) 2023-01-11T21:38:06.3948596Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp5, tmp10, tmp5)) 2023-01-11T21:38:06.3948681Z tmp12 = tmp11 / tmp7 2023-01-11T21:38:06.3948796Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.3948878Z tmp15 = tmp14 + tmp1 2023-01-11T21:38:06.3949013Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp3, tmp15, tmp3)) 2023-01-11T21:38:06.3949153Z tmp17 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp5, tmp16, tmp5)) 2023-01-11T21:38:06.3949233Z tmp18 = tmp17 / tmp7 2023-01-11T21:38:06.3949367Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3949502Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3949635Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.3949725Z ''') 2023-01-11T21:38:06.3949730Z 2023-01-11T21:38:06.3949735Z 2023-01-11T21:38:06.3949831Z async_compile.wait(globals()) 2023-01-11T21:38:06.3949903Z del async_compile 2023-01-11T21:38:06.3949908Z 2023-01-11T21:38:06.3949984Z def call(args): 2023-01-11T21:38:06.3950086Z arg0_1, = args 2023-01-11T21:38:06.3950164Z args.clear() 2023-01-11T21:38:06.3950257Z with torch.cuda.device(0): 2023-01-11T21:38:06.3950458Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950657Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950848Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3950941Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3951100Z triton_fused_div_div_1_div_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3951179Z del arg0_1 2023-01-11T21:38:06.3951267Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3951273Z 2023-01-11T21:38:06.3951277Z 2023-01-11T21:38:06.3951361Z if __name__ == "__main__": 2023-01-11T21:38:06.3951482Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3951612Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3951804Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3951917Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3951922Z 2023-01-11T21:38:06.3951926Z 2023-01-11T21:38:06.3952023Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3952099Z import torch 2023-01-11T21:38:06.3952175Z import random 2023-01-11T21:38:06.3952297Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3952421Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3952426Z 2023-01-11T21:38:06.3952510Z aten = torch.ops.aten 2023-01-11T21:38:06.3952680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3952776Z async_compile = AsyncCompile() 
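# [illustration, not part of the log] The fp32 kernel above fuses three
# hardsigmoid evaluations, clamp(v + 3, 0, 6) / 6 for v in {x, x + 3,
# x - 3}; the clamp is spelled as the NaN-propagating idiom
# tl.where(a != a, a, tl.where(a > b, a, b)). A hedged eager-mode sketch
# (hypothetical name; the test body is not shown):
import torch
import torch.nn.functional as F

def hardsigmoid_triple(x):
    # buf0, buf1, buf2 in the call() above, each of shape (64,)
    return F.hardsigmoid(x), F.hardsigmoid(x + 3), F.hardsigmoid(x - 3)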
2023-01-11T21:38:06.3952781Z 2023-01-11T21:38:06.3952856Z import triton 2023-01-11T21:38:06.3952950Z import triton.language as tl 2023-01-11T21:38:06.3953076Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3953220Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3953225Z 2023-01-11T21:38:06.3953230Z 2023-01-11T21:38:06.3953494Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3953572Z import triton 2023-01-11T21:38:06.3953659Z import triton.language as tl 2023-01-11T21:38:06.3953776Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3953881Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3954015Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3954144Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3954152Z 2023-01-11T21:38:06.3954592Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3954666Z @triton.jit 2023-01-11T21:38:06.3954820Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3954890Z xnumel = 64 2023-01-11T21:38:06.3954991Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3955143Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3955230Z xmask = xindex < xnumel 2023-01-11T21:38:06.3955320Z x0 = xindex 2023-01-11T21:38:06.3955541Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3955661Z tmp18 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3955746Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3955819Z tmp2 = 3 2023-01-11T21:38:06.3955897Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3955972Z tmp4 = 0.0 2023-01-11T21:38:06.3956107Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3956180Z tmp6 = 6.0 2023-01-11T21:38:06.3956347Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 < tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.3956414Z tmp8 = 6 2023-01-11T21:38:06.3956496Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3956586Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.3956669Z tmp11 = tmp0 + tmp2 2023-01-11T21:38:06.3956761Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3956842Z tmp13 = tmp12 + tmp2 2023-01-11T21:38:06.3956985Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp4, tmp13, tmp4)) 2023-01-11T21:38:06.3957120Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 < tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.3957206Z tmp16 = tmp15 / tmp8 2023-01-11T21:38:06.3957297Z tmp17 = tmp16.to(tl.float32) 2023-01-11T21:38:06.3957413Z tmp19 = tmp18 - tmp2 2023-01-11T21:38:06.3957503Z tmp20 = tmp19.to(tl.float32) 2023-01-11T21:38:06.3957583Z tmp21 = tmp20 + tmp2 2023-01-11T21:38:06.3957721Z tmp22 = tl.where(tmp21 != tmp21, tmp21, tl.where(tmp21 > tmp4, tmp21, tmp4)) 2023-01-11T21:38:06.3957857Z tmp23 = tl.where(tmp22 != tmp22, tmp22, tl.where(tmp22 < tmp6, tmp22, tmp6)) 2023-01-11T21:38:06.3957943Z tmp24 = tmp23 / tmp8 2023-01-11T21:38:06.3958037Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.3958172Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.3958306Z tl.store(out_ptr1 + 
(x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.3958436Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp25, xmask) 2023-01-11T21:38:06.3958524Z ''') 2023-01-11T21:38:06.3958530Z 2023-01-11T21:38:06.3958534Z 2023-01-11T21:38:06.3958658Z async_compile.wait(globals()) 2023-01-11T21:38:06.3958730Z del async_compile 2023-01-11T21:38:06.3958735Z 2023-01-11T21:38:06.3958812Z def call(args): 2023-01-11T21:38:06.3958885Z arg0_1, = args 2023-01-11T21:38:06.3958962Z args.clear() 2023-01-11T21:38:06.3959056Z with torch.cuda.device(0): 2023-01-11T21:38:06.3959259Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959458Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959648Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3959743Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3959955Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3960031Z del arg0_1 2023-01-11T21:38:06.3960120Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3960128Z 2023-01-11T21:38:06.3960132Z 2023-01-11T21:38:06.3960215Z if __name__ == "__main__": 2023-01-11T21:38:06.3960334Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3960462Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3960659Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3960771Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3960776Z 2023-01-11T21:38:06.3960849Z ok (0.397s) 2023-01-11T21:38:06.3961312Z test_hardswish_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3961444Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3961707Z [2023-01-11 21:34:45,379] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 574 2023-01-11T21:38:06.3961972Z [2023-01-11 21:34:45,480] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 574 2023-01-11T21:38:06.3962416Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3962552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3962808Z [2023-01-11 21:34:45,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 575 2023-01-11T21:38:06.3963072Z [2023-01-11 21:34:45,743] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 575 2023-01-11T21:38:06.3963080Z 2023-01-11T21:38:06.3963175Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3963250Z import torch 2023-01-11T21:38:06.3963326Z import random 2023-01-11T21:38:06.3963448Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3963573Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3963578Z 2023-01-11T21:38:06.3963663Z aten = torch.ops.aten 2023-01-11T21:38:06.3963801Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3963893Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3963902Z 2023-01-11T21:38:06.3963972Z import triton 2023-01-11T21:38:06.3964066Z import triton.language as tl 2023-01-11T21:38:06.3964193Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3964333Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3964338Z 2023-01-11T21:38:06.3964343Z 2023-01-11T21:38:06.3964546Z triton_fused_div_div_1_div_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3964623Z import triton 2023-01-11T21:38:06.3964717Z import triton.language as tl 2023-01-11T21:38:06.3964826Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3964930Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3965066Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3965192Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3965197Z 2023-01-11T21:38:06.3965632Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3965710Z @triton.jit 2023-01-11T21:38:06.3965862Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3965935Z xnumel = 64 2023-01-11T21:38:06.3966032Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3966159Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3966246Z xmask = xindex < xnumel 2023-01-11T21:38:06.3966320Z x0 = xindex 2023-01-11T21:38:06.3966511Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3966617Z tmp15 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3966688Z tmp1 = 3 2023-01-11T21:38:06.3966763Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.3966838Z tmp3 = 0.0 2023-01-11T21:38:06.3966976Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3967050Z tmp5 = 6.0 2023-01-11T21:38:06.3967186Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.3967265Z tmp7 = tmp0 * tmp6 2023-01-11T21:38:06.3967337Z tmp8 = 6 2023-01-11T21:38:06.3967410Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.3967491Z tmp10 = tmp2 + tmp1 2023-01-11T21:38:06.3967638Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 > tmp3, tmp10, tmp3)) 2023-01-11T21:38:06.3967776Z tmp12 = 
tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp5, tmp11, tmp5)) 2023-01-11T21:38:06.3967859Z tmp13 = tmp2 * tmp12 2023-01-11T21:38:06.3967938Z tmp14 = tmp13 / tmp8 2023-01-11T21:38:06.3968081Z tmp16 = tmp15 - tmp1 2023-01-11T21:38:06.3968156Z tmp17 = tmp16 + tmp1 2023-01-11T21:38:06.3968296Z tmp18 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 > tmp3, tmp17, tmp3)) 2023-01-11T21:38:06.3968435Z tmp19 = tl.where(tmp18 != tmp18, tmp18, tl.where(tmp18 < tmp5, tmp18, tmp5)) 2023-01-11T21:38:06.3968517Z tmp20 = tmp16 * tmp19 2023-01-11T21:38:06.3968598Z tmp21 = tmp20 / tmp8 2023-01-11T21:38:06.3968737Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.3968872Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.3969006Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.3969090Z ''') 2023-01-11T21:38:06.3969095Z 2023-01-11T21:38:06.3969100Z 2023-01-11T21:38:06.3969196Z async_compile.wait(globals()) 2023-01-11T21:38:06.3969275Z del async_compile 2023-01-11T21:38:06.3969280Z 2023-01-11T21:38:06.3969356Z def call(args): 2023-01-11T21:38:06.3969432Z arg0_1, = args 2023-01-11T21:38:06.3969508Z args.clear() 2023-01-11T21:38:06.3969603Z with torch.cuda.device(0): 2023-01-11T21:38:06.3969795Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3969994Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3970193Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3970289Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3970447Z triton_fused_div_div_1_div_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3970555Z del arg0_1 2023-01-11T21:38:06.3970645Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3970651Z 2023-01-11T21:38:06.3970655Z 2023-01-11T21:38:06.3970738Z if __name__ == "__main__": 2023-01-11T21:38:06.3970851Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3970981Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3971183Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3971295Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3971300Z 2023-01-11T21:38:06.3971305Z 2023-01-11T21:38:06.3971403Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3971478Z import torch 2023-01-11T21:38:06.3971555Z import random 2023-01-11T21:38:06.3971678Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3971796Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3971801Z 2023-01-11T21:38:06.3971886Z aten = torch.ops.aten 2023-01-11T21:38:06.3972027Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3972125Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3972130Z 2023-01-11T21:38:06.3972206Z import triton 2023-01-11T21:38:06.3972299Z import triton.language as tl 2023-01-11T21:38:06.3972426Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3972563Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3972575Z 2023-01-11T21:38:06.3972579Z 2023-01-11T21:38:06.3972839Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3972917Z import triton 2023-01-11T21:38:06.3973012Z import triton.language as tl 
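[annotation] The triton_fused_div_div_1_div_2_0 kernel in this test_hardswish_cuda listing differs from the hardsigmoid one above only by the extra multiplies (tmp7 = tmp0 * tmp6 and its shifted twins), i.e. x * clamp(x + 3, 0, 6) / 6 at x, x + 3 and x - 3; the fp16 program that follows is the same computation with explicit fp32 upcasts. A short reference check under the same assumption (plain PyTorch, not the test's own code):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64)
    # per-element math of the kernel: x * relu6(x + 3) / 6
    manual = x * torch.clamp(x + 3, 0.0, 6.0) / 6
    assert torch.allclose(F.hardswish(x), manual)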
2023-01-11T21:38:06.3973127Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3973231Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3973367Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3973494Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3973501Z 2023-01-11T21:38:06.3973931Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3974032Z @triton.jit 2023-01-11T21:38:06.3974188Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3974262Z xnumel = 64 2023-01-11T21:38:06.3974360Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3974668Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3974750Z xmask = xindex < xnumel 2023-01-11T21:38:06.3974820Z x0 = xindex 2023-01-11T21:38:06.3975027Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3975144Z tmp20 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3975237Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3975307Z tmp2 = 3 2023-01-11T21:38:06.3975384Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.3975455Z tmp4 = 0.0 2023-01-11T21:38:06.3975591Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3975656Z tmp6 = 6.0 2023-01-11T21:38:06.3975794Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 < tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.3975872Z tmp8 = tmp1 * tmp7 2023-01-11T21:38:06.3975941Z tmp9 = 6 2023-01-11T21:38:06.3976019Z tmp10 = tmp8 / tmp9 2023-01-11T21:38:06.3976107Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.3976185Z tmp12 = tmp0 + tmp2 2023-01-11T21:38:06.3976267Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.3976348Z tmp14 = tmp13 + tmp2 2023-01-11T21:38:06.3976485Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp4, tmp14, tmp4)) 2023-01-11T21:38:06.3976622Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp6, tmp15, tmp6)) 2023-01-11T21:38:06.3976750Z tmp17 = tmp13 * tmp16 2023-01-11T21:38:06.3976834Z tmp18 = tmp17 / tmp9 2023-01-11T21:38:06.3976924Z tmp19 = tmp18.to(tl.float32) 2023-01-11T21:38:06.3977034Z tmp21 = tmp20 - tmp2 2023-01-11T21:38:06.3977167Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.3977261Z tmp23 = tmp22 + tmp2 2023-01-11T21:38:06.3977402Z tmp24 = tl.where(tmp23 != tmp23, tmp23, tl.where(tmp23 > tmp4, tmp23, tmp4)) 2023-01-11T21:38:06.3977538Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp6, tmp24, tmp6)) 2023-01-11T21:38:06.3977619Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.3977698Z tmp27 = tmp26 / tmp9 2023-01-11T21:38:06.3977781Z tmp28 = tmp27.to(tl.float32) 2023-01-11T21:38:06.3977921Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.3978054Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.3978183Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.3978277Z ''') 2023-01-11T21:38:06.3978283Z 2023-01-11T21:38:06.3978287Z 2023-01-11T21:38:06.3978382Z async_compile.wait(globals()) 2023-01-11T21:38:06.3978461Z del async_compile 2023-01-11T21:38:06.3978466Z 2023-01-11T21:38:06.3978542Z def call(args): 2023-01-11T21:38:06.3978613Z 
arg0_1, = args 2023-01-11T21:38:06.3978689Z args.clear() 2023-01-11T21:38:06.3978784Z with torch.cuda.device(0): 2023-01-11T21:38:06.3978981Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979181Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979380Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3979473Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3979688Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3979761Z del arg0_1 2023-01-11T21:38:06.3979851Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3979857Z 2023-01-11T21:38:06.3979861Z 2023-01-11T21:38:06.3979943Z if __name__ == "__main__": 2023-01-11T21:38:06.3980064Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3980232Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3980436Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3980549Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3980554Z 2023-01-11T21:38:06.3980628Z ok (0.417s) 2023-01-11T21:38:06.3981082Z test_hardtanh_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3981222Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3981481Z [2023-01-11 21:34:45,779] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 576 2023-01-11T21:38:06.3981752Z [2023-01-11 21:34:45,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 576 2023-01-11T21:38:06.3982168Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3982301Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3982558Z [2023-01-11 21:34:46,098] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 577 2023-01-11T21:38:06.3982848Z [2023-01-11 21:34:46,191] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 577 2023-01-11T21:38:06.3982854Z 2023-01-11T21:38:06.3982954Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3983030Z import torch 2023-01-11T21:38:06.3983104Z import random 2023-01-11T21:38:06.3983225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3983349Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3983354Z 2023-01-11T21:38:06.3983438Z aten = torch.ops.aten 2023-01-11T21:38:06.3983578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3983675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3983680Z 2023-01-11T21:38:06.3983756Z import triton 2023-01-11T21:38:06.3983850Z import triton.language as tl 2023-01-11T21:38:06.3983970Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3984118Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3984124Z 2023-01-11T21:38:06.3984128Z 2023-01-11T21:38:06.3984324Z triton_fused_minimum_minimum_1_minimum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.3984401Z import triton 2023-01-11T21:38:06.3984496Z import triton.language as tl 2023-01-11T21:38:06.3984616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3984719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3984853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3984974Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3984979Z 2023-01-11T21:38:06.3985449Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3985536Z @triton.jit 2023-01-11T21:38:06.3985697Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3985773Z xnumel = 64 2023-01-11T21:38:06.3985873Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3986006Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3986090Z xmask = xindex < xnumel 2023-01-11T21:38:06.3986187Z x0 = xindex 2023-01-11T21:38:06.3986381Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.3986481Z tmp9 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.3986581Z tmp1 = -1.0 2023-01-11T21:38:06.3986720Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.3986794Z tmp3 = 1.0 2023-01-11T21:38:06.3986934Z tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3)) 2023-01-11T21:38:06.3987000Z tmp5 = 1 2023-01-11T21:38:06.3987082Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.3987220Z tmp7 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 > tmp1, tmp6, tmp1)) 2023-01-11T21:38:06.3987355Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 < tmp3, tmp7, tmp3)) 2023-01-11T21:38:06.3987469Z tmp10 = tmp9 - tmp5 2023-01-11T21:38:06.3987615Z tmp11 = tl.where(tmp10 != tmp10, tmp10, 
tl.where(tmp10 > tmp1, tmp10, tmp1)) 2023-01-11T21:38:06.3987762Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp3, tmp11, tmp3)) 2023-01-11T21:38:06.3987892Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.3988029Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.3988166Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3988254Z ''') 2023-01-11T21:38:06.3988259Z 2023-01-11T21:38:06.3988264Z 2023-01-11T21:38:06.3988358Z async_compile.wait(globals()) 2023-01-11T21:38:06.3988438Z del async_compile 2023-01-11T21:38:06.3988443Z 2023-01-11T21:38:06.3988521Z def call(args): 2023-01-11T21:38:06.3988628Z arg0_1, = args 2023-01-11T21:38:06.3988698Z args.clear() 2023-01-11T21:38:06.3988792Z with torch.cuda.device(0): 2023-01-11T21:38:06.3988994Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989192Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989394Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.3989487Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3989660Z triton_fused_minimum_minimum_1_minimum_2_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3989737Z del arg0_1 2023-01-11T21:38:06.3989821Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3989826Z 2023-01-11T21:38:06.3989831Z 2023-01-11T21:38:06.3989914Z if __name__ == "__main__": 2023-01-11T21:38:06.3990036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.3990169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3990369Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.3990482Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3990487Z 2023-01-11T21:38:06.3990492Z 2023-01-11T21:38:06.3990591Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.3990670Z import torch 2023-01-11T21:38:06.3990740Z import random 2023-01-11T21:38:06.3990863Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.3990987Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.3990992Z 2023-01-11T21:38:06.3991074Z aten = torch.ops.aten 2023-01-11T21:38:06.3991213Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.3991312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.3991317Z 2023-01-11T21:38:06.3991392Z import triton 2023-01-11T21:38:06.3991480Z import triton.language as tl 2023-01-11T21:38:06.3991611Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.3991752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.3991758Z 2023-01-11T21:38:06.3991762Z 2023-01-11T21:38:06.3992026Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.3992133Z import triton 2023-01-11T21:38:06.3992230Z import triton.language as tl 2023-01-11T21:38:06.3992348Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.3992452Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.3992581Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.3992708Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.3992714Z 2023-01-11T21:38:06.3993150Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 
1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.3993229Z @triton.jit 2023-01-11T21:38:06.3993382Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.3993458Z xnumel = 64 2023-01-11T21:38:06.3993562Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.3993696Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.3993775Z xmask = xindex < xnumel 2023-01-11T21:38:06.3993848Z x0 = xindex 2023-01-11T21:38:06.3994065Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.3994184Z tmp13 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.3994275Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.3994376Z tmp2 = -1.0 2023-01-11T21:38:06.3994513Z tmp3 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp2, tmp1, tmp2)) 2023-01-11T21:38:06.3994590Z tmp4 = 1.0 2023-01-11T21:38:06.3994721Z tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4)) 2023-01-11T21:38:06.3994851Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.3994926Z tmp7 = 1 2023-01-11T21:38:06.3995005Z tmp8 = tmp0 + tmp7 2023-01-11T21:38:06.3995091Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.3995241Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp2, tmp9, tmp2)) 2023-01-11T21:38:06.3995406Z tmp11 = tl.where(tmp10 != tmp10, tmp10, tl.where(tmp10 < tmp4, tmp10, tmp4)) 2023-01-11T21:38:06.3995509Z tmp12 = tmp11.to(tl.float32) 2023-01-11T21:38:06.3995625Z tmp14 = tmp13 - tmp7 2023-01-11T21:38:06.3995718Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.3995859Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp2, tmp15, tmp2)) 2023-01-11T21:38:06.3995998Z tmp17 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp4, tmp16, tmp4)) 2023-01-11T21:38:06.3996088Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.3996223Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.3996355Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.3996486Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.3996570Z ''') 2023-01-11T21:38:06.3996576Z 2023-01-11T21:38:06.3996580Z 2023-01-11T21:38:06.3996679Z async_compile.wait(globals()) 2023-01-11T21:38:06.3996758Z del async_compile 2023-01-11T21:38:06.3996763Z 2023-01-11T21:38:06.3996841Z def call(args): 2023-01-11T21:38:06.3996915Z arg0_1, = args 2023-01-11T21:38:06.3996993Z args.clear() 2023-01-11T21:38:06.3997081Z with torch.cuda.device(0): 2023-01-11T21:38:06.3997280Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997478Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997675Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.3997769Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.3997987Z triton_fused_convert_element_type_1_convert_element_type_3_convert_element_type_5_0.run(arg0_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.3998064Z del arg0_1 2023-01-11T21:38:06.3998149Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.3998162Z 2023-01-11T21:38:06.3998230Z 2023-01-11T21:38:06.3998308Z if __name__ == "__main__": 2023-01-11T21:38:06.3998430Z from torch._dynamo.testing import rand_strided 
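[annotation] triton_fused_minimum_minimum_1_minimum_2_0 is hardtanh, a clamp to [-1, 1] built from the same NaN-propagating max/min pair. The fp16 variant directly above shows the pattern every half-precision listing in this log follows: load through *fp16 pointers, upcast with .to(tl.float32), compute in fp32, and cast back on the stores. A minimal sketch of that compute-in-fp32/store-in-fp16 shape (illustrative; clamp is exact at every precision, so the round trip should not change results):

    import torch
    import torch.nn.functional as F

    x = torch.randn(64, dtype=torch.float16)
    # compute in fp32, store fp16, mirroring the kernel's .to() casts
    manual = torch.clamp(x.float(), -1.0, 1.0).to(torch.float16)
    assert torch.allclose(F.hardtanh(x), manual)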
2023-01-11T21:38:06.3998560Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.3998761Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.3998875Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.3998880Z 2023-01-11T21:38:06.3998957Z ok (0.448s) 2023-01-11T21:38:06.3999422Z test_horizonal_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.3999559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.3999822Z [2023-01-11 21:34:46,210] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 578 2023-01-11T21:38:06.4000080Z [2023-01-11 21:34:46,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 578 2023-01-11T21:38:06.4000498Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4000662Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4000917Z [2023-01-11 21:34:46,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 579 2023-01-11T21:38:06.4001190Z [2023-01-11 21:34:46,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 579 2023-01-11T21:38:06.4001195Z 2023-01-11T21:38:06.4001297Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4001374Z import torch 2023-01-11T21:38:06.4001450Z import random 2023-01-11T21:38:06.4001571Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4001690Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4001695Z 2023-01-11T21:38:06.4001779Z aten = torch.ops.aten 2023-01-11T21:38:06.4001921Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4002017Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4002022Z 2023-01-11T21:38:06.4002100Z import triton 2023-01-11T21:38:06.4002199Z import triton.language as tl 2023-01-11T21:38:06.4002323Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4002464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4002469Z 2023-01-11T21:38:06.4002474Z 2023-01-11T21:38:06.4002641Z triton_fused_add_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4002717Z import triton 2023-01-11T21:38:06.4002813Z import triton.language as tl 2023-01-11T21:38:06.4002929Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4003033Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4003170Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4003296Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4003301Z 2023-01-11T21:38:06.4003768Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: 
'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4003841Z @triton.jit 2023-01-11T21:38:06.4004012Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4004088Z xnumel = 2048 2023-01-11T21:38:06.4004216Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4004348Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4004433Z xmask = xindex < xnumel 2023-01-11T21:38:06.4004507Z x0 = xindex 2023-01-11T21:38:06.4004585Z x2 = (xindex // 16) % 16 2023-01-11T21:38:06.4004777Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4004975Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4005075Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4005266Z tmp4 = tl.load(in_ptr2 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4005366Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4005464Z tmp7 = tl.load(in_ptr2 + (x2), xmask) 2023-01-11T21:38:06.4005546Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4005650Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.4005728Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.4005866Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4006002Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4006135Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.4006222Z ''') 2023-01-11T21:38:06.4006227Z 2023-01-11T21:38:06.4006232Z 2023-01-11T21:38:06.4006327Z async_compile.wait(globals()) 2023-01-11T21:38:06.4006399Z del async_compile 2023-01-11T21:38:06.4006411Z 2023-01-11T21:38:06.4006482Z def call(args): 2023-01-11T21:38:06.4006569Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4006648Z args.clear() 2023-01-11T21:38:06.4006772Z with torch.cuda.device(0): 2023-01-11T21:38:06.4006988Z buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007201Z buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007410Z buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4007499Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4007670Z triton_fused_add_mul_sub_0.run(arg0_1, arg1_1, arg2_1, buf0, buf1, buf2, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.4007744Z del arg0_1 2023-01-11T21:38:06.4007818Z del arg1_1 2023-01-11T21:38:06.4007891Z del arg2_1 2023-01-11T21:38:06.4007982Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4007987Z 2023-01-11T21:38:06.4007991Z 2023-01-11T21:38:06.4008075Z if __name__ == "__main__": 2023-01-11T21:38:06.4008199Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4008324Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4008540Z arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4008751Z arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4008960Z arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4009091Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4009096Z 2023-01-11T21:38:06.4009101Z 2023-01-11T21:38:06.4009199Z from ctypes 
import c_void_p, c_long 2023-01-11T21:38:06.4009276Z import torch 2023-01-11T21:38:06.4009353Z import random 2023-01-11T21:38:06.4009468Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4009595Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4009601Z 2023-01-11T21:38:06.4009685Z aten = torch.ops.aten 2023-01-11T21:38:06.4009825Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4009921Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4009926Z 2023-01-11T21:38:06.4010001Z import triton 2023-01-11T21:38:06.4010099Z import triton.language as tl 2023-01-11T21:38:06.4010218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4010386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4010393Z 2023-01-11T21:38:06.4010397Z 2023-01-11T21:38:06.4010566Z triton_fused_add_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4010642Z import triton 2023-01-11T21:38:06.4010737Z import triton.language as tl 2023-01-11T21:38:06.4010849Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4010953Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4011089Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4011210Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4011218Z 2023-01-11T21:38:06.4011686Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4011764Z @triton.jit 2023-01-11T21:38:06.4011936Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4012016Z xnumel = 2048 2023-01-11T21:38:06.4012115Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4012248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4012334Z xmask = xindex < xnumel 2023-01-11T21:38:06.4012400Z x0 = xindex 2023-01-11T21:38:06.4012486Z x2 = (xindex // 16) % 16 2023-01-11T21:38:06.4012703Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4012940Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4013056Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4013264Z tmp4 = tl.load(in_ptr2 + (x2), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4013382Z tmp6 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4013496Z tmp7 = tl.load(in_ptr2 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.4013568Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4013676Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.4013753Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.4013887Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4014022Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4014153Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.4014237Z ''') 2023-01-11T21:38:06.4014243Z 2023-01-11T21:38:06.4014247Z 2023-01-11T21:38:06.4014337Z async_compile.wait(globals()) 2023-01-11T21:38:06.4014414Z del async_compile 2023-01-11T21:38:06.4014419Z 2023-01-11T21:38:06.4014624Z def call(args): 
2023-01-11T21:38:06.4014714Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4014788Z args.clear() 2023-01-11T21:38:06.4014879Z with torch.cuda.device(0): 2023-01-11T21:38:06.4015099Z buf0 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015309Z buf1 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015532Z buf2 = empty_strided((8, 16, 16), (256, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4015642Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4015818Z triton_fused_add_mul_sub_0.run(arg0_1, arg1_1, arg2_1, buf0, buf1, buf2, 2048, grid=grid(2048), stream=stream0) 2023-01-11T21:38:06.4015893Z del arg0_1 2023-01-11T21:38:06.4015965Z del arg1_1 2023-01-11T21:38:06.4016040Z del arg2_1 2023-01-11T21:38:06.4016131Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4016136Z 2023-01-11T21:38:06.4016140Z 2023-01-11T21:38:06.4016217Z if __name__ == "__main__": 2023-01-11T21:38:06.4016328Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4016500Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4016711Z arg0_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4016919Z arg1_1 = rand_strided((8, 16, 16), (256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4017176Z arg2_1 = rand_strided((1, 16, 1), (16, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4017334Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4017340Z 2023-01-11T21:38:06.4017412Z ok (0.350s) 2023-01-11T21:38:06.4017873Z test_horizonal_fusion2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4018013Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4018266Z [2023-01-11 21:34:46,560] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 580 2023-01-11T21:38:06.4018526Z [2023-01-11 21:34:46,643] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 580 2023-01-11T21:38:06.4018940Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4019110Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4019363Z [2023-01-11 21:34:46,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 581 2023-01-11T21:38:06.4019369Z 2023-01-11T21:38:06.4019468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4019542Z import torch 2023-01-11T21:38:06.4019614Z import random 2023-01-11T21:38:06.4019735Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4019852Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4019857Z 2023-01-11T21:38:06.4019938Z aten = torch.ops.aten 2023-01-11T21:38:06.4020077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4020173Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4020178Z 2023-01-11T21:38:06.4020252Z import triton 2023-01-11T21:38:06.4020344Z import triton.language as tl 2023-01-11T21:38:06.4020473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4020607Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4020619Z 2023-01-11T21:38:06.4020623Z 2023-01-11T21:38:06.4020770Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4020847Z import triton 2023-01-11T21:38:06.4020941Z import triton.language as tl 2023-01-11T21:38:06.4021055Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4021155Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4021290Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4021415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4021421Z 2023-01-11T21:38:06.4021833Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4021903Z @triton.jit 2023-01-11T21:38:06.4022038Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4022111Z xnumel = 1024 2023-01-11T21:38:06.4022208Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4022335Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4022445Z xmask = xindex < xnumel 2023-01-11T21:38:06.4022517Z x0 = xindex 2023-01-11T21:38:06.4022607Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4022679Z tmp1 = 1 2023-01-11T21:38:06.4022758Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4022893Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4022981Z ''') 2023-01-11T21:38:06.4022987Z 2023-01-11T21:38:06.4022992Z 2023-01-11T21:38:06.4023149Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4023224Z import triton 2023-01-11T21:38:06.4023310Z import triton.language as tl 2023-01-11T21:38:06.4023425Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4023528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4023660Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4023785Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4023790Z 2023-01-11T21:38:06.4024194Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4024273Z @triton.jit 2023-01-11T21:38:06.4024406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4024473Z xnumel = 128 2023-01-11T21:38:06.4024571Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4024702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4024785Z xmask = xindex < xnumel 2023-01-11T21:38:06.4024854Z x0 = xindex 2023-01-11T21:38:06.4024981Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4025051Z tmp1 = 2 2023-01-11T21:38:06.4025123Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4025260Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4025348Z ''') 2023-01-11T21:38:06.4025353Z 2023-01-11T21:38:06.4025357Z 2023-01-11T21:38:06.4025542Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.4025621Z import triton 2023-01-11T21:38:06.4025734Z import triton.language as tl 2023-01-11T21:38:06.4025848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4025950Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4026078Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4026205Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4026210Z 2023-01-11T21:38:06.4026615Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4026693Z @triton.jit 2023-01-11T21:38:06.4026825Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4026898Z xnumel = 128 2023-01-11T21:38:06.4026994Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4027127Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4027203Z xmask = xindex < xnumel 2023-01-11T21:38:06.4027272Z x0 = xindex 2023-01-11T21:38:06.4027368Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4027439Z tmp1 = 3 2023-01-11T21:38:06.4027518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4027651Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4027735Z ''') 2023-01-11T21:38:06.4027741Z 2023-01-11T21:38:06.4027745Z 2023-01-11T21:38:06.4027831Z async_compile.wait(globals()) 2023-01-11T21:38:06.4027906Z del async_compile 2023-01-11T21:38:06.4027914Z 2023-01-11T21:38:06.4027989Z def call(args): 2023-01-11T21:38:06.4028073Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4028147Z args.clear() 2023-01-11T21:38:06.4028239Z with torch.cuda.device(0): 2023-01-11T21:38:06.4028450Z buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4028562Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4028707Z triton_fused_add_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4028779Z del arg0_1 2023-01-11T21:38:06.4028982Z buf1 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4029116Z triton_fused_add_1_1.run(arg1_1, buf1, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4029188Z del arg1_1 2023-01-11T21:38:06.4029388Z buf2 = empty_strided((16, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4029526Z triton_fused_add_2_2.run(arg2_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4029595Z del arg2_1 
2023-01-11T21:38:06.4029682Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4029687Z 2023-01-11T21:38:06.4029692Z 2023-01-11T21:38:06.4029771Z if __name__ == "__main__": 2023-01-11T21:38:06.4029892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4030019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4030233Z arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030435Z arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030636Z arg2_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4030758Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4030763Z 2023-01-11T21:38:06.4031030Z [2023-01-11 21:34:46,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 581 2023-01-11T21:38:06.4031063Z 2023-01-11T21:38:06.4031161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4031235Z import torch 2023-01-11T21:38:06.4031310Z import random 2023-01-11T21:38:06.4031427Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4031554Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4031559Z 2023-01-11T21:38:06.4031641Z aten = torch.ops.aten 2023-01-11T21:38:06.4031771Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4031865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4031870Z 2023-01-11T21:38:06.4031944Z import triton 2023-01-11T21:38:06.4032038Z import triton.language as tl 2023-01-11T21:38:06.4032163Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4032302Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4032307Z 2023-01-11T21:38:06.4032312Z 2023-01-11T21:38:06.4032466Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4032544Z import triton 2023-01-11T21:38:06.4032629Z import triton.language as tl 2023-01-11T21:38:06.4032743Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4032844Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4032981Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4033108Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4033114Z 2023-01-11T21:38:06.4033518Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4033589Z @triton.jit 2023-01-11T21:38:06.4033724Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4033792Z xnumel = 1024 2023-01-11T21:38:06.4033886Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4034014Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4034105Z xmask = xindex < xnumel 2023-01-11T21:38:06.4034177Z x0 = xindex 2023-01-11T21:38:06.4034295Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4034366Z tmp1 = 1 2023-01-11T21:38:06.4034438Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4034601Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4034687Z ''') 2023-01-11T21:38:06.4034693Z 2023-01-11T21:38:06.4034697Z 2023-01-11T21:38:06.4034857Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4034932Z import triton 
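[annotation] Contrast the two horizontal-fusion tests: test_horizonal_fusion1 produced a single triton_fused_add_mul_sub_0 kernel (one launch over 2048 elements, three same-shape outputs), while test_horizonal_fusion2 above compiles into three separate kernels (triton_fused_add_0, _1_1 and _2_2, with xnumel 1024, 128 and 128) because the three tensors have different shapes and cannot share one xindex. A hedged reconstruction of what the test functions plausibly look like, inferred only from the generated kernels and the rand_strided shapes in the harnesses (the actual definitions live in test_torchinductor.py and are not shown in this log):

    import torch

    def fusion1(a, b, c):
        # three same-shape outputs: inductor fused these into one kernel above
        return a + b, a - c, b * c

    def fusion2(a, b, c):
        # three differently-shaped outputs: one kernel per tensor in the log above
        return a + 1, b + 2, c + 3

    p, q, r = fusion1(torch.randn(8, 16, 16), torch.randn(8, 16, 16),
                      torch.randn(1, 16, 1))            # all (8, 16, 16)
    r0, r1, r2 = fusion2(torch.randn(8, 16, 8), torch.randn(8, 16),
                         torch.randn(16, 8))            # (8,16,8), (8,16), (16,8)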
2023-01-11T21:38:06.4035024Z import triton.language as tl 2023-01-11T21:38:06.4035142Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4035253Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4035403Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4035545Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4035554Z 2023-01-11T21:38:06.4035953Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4036026Z @triton.jit 2023-01-11T21:38:06.4036160Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4036237Z xnumel = 128 2023-01-11T21:38:06.4036333Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4036455Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4036538Z xmask = xindex < xnumel 2023-01-11T21:38:06.4036607Z x0 = xindex 2023-01-11T21:38:06.4036728Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4036799Z tmp1 = 2 2023-01-11T21:38:06.4036877Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4037011Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4037132Z ''') 2023-01-11T21:38:06.4037138Z 2023-01-11T21:38:06.4037147Z 2023-01-11T21:38:06.4037299Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.4037371Z import triton 2023-01-11T21:38:06.4037468Z import triton.language as tl 2023-01-11T21:38:06.4037582Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4037685Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4037818Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4037945Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4037950Z 2023-01-11T21:38:06.4038346Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4038420Z @triton.jit 2023-01-11T21:38:06.4038552Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4038632Z xnumel = 128 2023-01-11T21:38:06.4038730Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4038858Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4038940Z xmask = xindex < xnumel 2023-01-11T21:38:06.4039008Z x0 = xindex 2023-01-11T21:38:06.4039122Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4039191Z tmp1 = 3 2023-01-11T21:38:06.4039267Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4039403Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4039487Z ''') 2023-01-11T21:38:06.4039492Z 2023-01-11T21:38:06.4039496Z 2023-01-11T21:38:06.4039590Z async_compile.wait(globals()) 2023-01-11T21:38:06.4039666Z del async_compile 2023-01-11T21:38:06.4039671Z 2023-01-11T21:38:06.4039739Z def call(args): 2023-01-11T21:38:06.4039825Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4039901Z args.clear() 2023-01-11T21:38:06.4039995Z with torch.cuda.device(0): 2023-01-11T21:38:06.4040208Z buf0 = empty_strided((8, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4040301Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.4040440Z         triton_fused_add_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0)
2023-01-11T21:38:06.4040507Z         del arg0_1
2023-01-11T21:38:06.4040738Z         buf1 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4040878Z         triton_fused_add_1_1.run(arg1_1, buf1, 128, grid=grid(128), stream=stream0)
2023-01-11T21:38:06.4040952Z         del arg1_1
2023-01-11T21:38:06.4041153Z         buf2 = empty_strided((16, 8), (8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4041291Z         triton_fused_add_2_2.run(arg2_1, buf2, 128, grid=grid(128), stream=stream0)
2023-01-11T21:38:06.4041362Z         del arg2_1
2023-01-11T21:38:06.4041449Z     return (buf0, buf1, buf2, )
2023-01-11T21:38:06.4041454Z 
2023-01-11T21:38:06.4041461Z 
2023-01-11T21:38:06.4041534Z if __name__ == "__main__":
2023-01-11T21:38:06.4041653Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4041781Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4041990Z     arg0_1 = rand_strided((8, 16, 8), (128, 8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042195Z     arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042395Z     arg2_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4042523Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4042528Z 
2023-01-11T21:38:06.4042598Z ok (0.208s)
2023-01-11T21:38:06.4043055Z test_index1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4043209Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4043471Z [2023-01-11 21:34:46,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 582
2023-01-11T21:38:06.4043736Z [2023-01-11 21:34:46,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 582
2023-01-11T21:38:06.4044151Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4044283Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4044544Z [2023-01-11 21:34:46,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 583
2023-01-11T21:38:06.4044802Z [2023-01-11 21:34:46,983] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 583
2023-01-11T21:38:06.4045217Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4045348Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4045601Z [2023-01-11 21:34:47,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 584
2023-01-11T21:38:06.4045863Z [2023-01-11 21:34:47,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 584
2023-01-11T21:38:06.4046300Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4046427Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4046686Z [2023-01-11 21:34:47,162] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 585
2023-01-11T21:38:06.4046692Z 
2023-01-11T21:38:06.4046790Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4046865Z import torch
2023-01-11T21:38:06.4046938Z import random
2023-01-11T21:38:06.4047058Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4047184Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4047189Z 
2023-01-11T21:38:06.4047276Z aten = torch.ops.aten
2023-01-11T21:38:06.4047407Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4047501Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4047506Z 
2023-01-11T21:38:06.4047579Z import triton
2023-01-11T21:38:06.4047670Z import triton.language as tl
2023-01-11T21:38:06.4047799Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4047940Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4047946Z 
2023-01-11T21:38:06.4047950Z 
2023-01-11T21:38:06.4048109Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4048184Z import triton
2023-01-11T21:38:06.4048270Z import triton.language as tl
2023-01-11T21:38:06.4048385Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4048487Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4048625Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4048752Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4048783Z 
2023-01-11T21:38:06.4049218Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4049296Z @triton.jit
2023-01-11T21:38:06.4049449Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4049515Z     xnumel = 48
2023-01-11T21:38:06.4049612Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4049742Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4049824Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4049904Z     x1 = (xindex // 12)
2023-01-11T21:38:06.4049977Z     x0 = xindex % 12
2023-01-11T21:38:06.4050047Z     x2 = xindex
2023-01-11T21:38:06.4050139Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4050236Z     tmp1 = tl.load(in_ptr1 + (x1), xmask)
2023-01-11T21:38:06.4050359Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask)
2023-01-11T21:38:06.4050494Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4050579Z ''')
2023-01-11T21:38:06.4050585Z 
2023-01-11T21:38:06.4050590Z 
2023-01-11T21:38:06.4050685Z async_compile.wait(globals())
2023-01-11T21:38:06.4050763Z del async_compile
2023-01-11T21:38:06.4050768Z 
2023-01-11T21:38:06.4050843Z def call(args):
2023-01-11T21:38:06.4050922Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4050999Z     args.clear()
2023-01-11T21:38:06.4051091Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4051296Z         buf0 = empty_strided((4, 12), (12, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4051390Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4051541Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 48, grid=grid(48), stream=stream0)
2023-01-11T21:38:06.4051615Z         del arg0_1
2023-01-11T21:38:06.4051683Z         del arg1_1
2023-01-11T21:38:06.4051755Z         del arg2_1
2023-01-11T21:38:06.4051831Z     return (buf0, )
2023-01-11T21:38:06.4051836Z 
2023-01-11T21:38:06.4051840Z 
2023-01-11T21:38:06.4051921Z if __name__ == "__main__":
2023-01-11T21:38:06.4052039Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4052191Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4052405Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4052602Z     arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4052789Z     arg2_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4052919Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4052924Z 
2023-01-11T21:38:06.4052928Z 
2023-01-11T21:38:06.4053026Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4053101Z import torch
2023-01-11T21:38:06.4053177Z import random
2023-01-11T21:38:06.4053296Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4053423Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4053428Z 
2023-01-11T21:38:06.4053509Z aten = torch.ops.aten
2023-01-11T21:38:06.4053638Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4053734Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4053739Z 
2023-01-11T21:38:06.4053812Z import triton
2023-01-11T21:38:06.4053905Z import triton.language as tl
2023-01-11T21:38:06.4054031Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4054170Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4054175Z 
2023-01-11T21:38:06.4054180Z 
2023-01-11T21:38:06.4054336Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4054404Z import triton
2023-01-11T21:38:06.4054603Z import triton.language as tl
2023-01-11T21:38:06.4054722Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4054869Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4055004Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4055130Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4055135Z 
2023-01-11T21:38:06.4055620Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4055694Z @triton.jit
2023-01-11T21:38:06.4055847Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4055914Z     xnumel = 48
2023-01-11T21:38:06.4056010Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4056138Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4056220Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4056305Z     x1 = (xindex // 12)
2023-01-11T21:38:06.4056381Z     x0 = xindex % 12
2023-01-11T21:38:06.4056446Z     x2 = xindex
2023-01-11T21:38:06.4056544Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4056641Z     tmp1 = tl.load(in_ptr1 + (x1), xmask)
2023-01-11T21:38:06.4056782Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask).to(tl.float32)
2023-01-11T21:38:06.4056921Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4057005Z ''')
2023-01-11T21:38:06.4057011Z 
2023-01-11T21:38:06.4057015Z 
2023-01-11T21:38:06.4057107Z async_compile.wait(globals())
2023-01-11T21:38:06.4057245Z del async_compile
2023-01-11T21:38:06.4057251Z 
2023-01-11T21:38:06.4057320Z def call(args):
2023-01-11T21:38:06.4057408Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4057482Z     args.clear()
2023-01-11T21:38:06.4057573Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4057780Z         buf0 = empty_strided((4, 12), (12, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4057880Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4058029Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 48, grid=grid(48), stream=stream0)
2023-01-11T21:38:06.4058096Z         del arg0_1
2023-01-11T21:38:06.4058169Z         del arg1_1
2023-01-11T21:38:06.4058238Z         del arg2_1
2023-01-11T21:38:06.4058353Z     return (buf0, )
2023-01-11T21:38:06.4058359Z 
2023-01-11T21:38:06.4058363Z 
2023-01-11T21:38:06.4058446Z if __name__ == "__main__":
2023-01-11T21:38:06.4058564Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4058690Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4058902Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4059090Z     arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4059278Z     arg2_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4059409Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4059414Z 
2023-01-11T21:38:06.4059419Z 
2023-01-11T21:38:06.4059516Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4059591Z import torch
2023-01-11T21:38:06.4059664Z import random
2023-01-11T21:38:06.4059785Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4059909Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4059914Z 
2023-01-11T21:38:06.4059989Z aten = torch.ops.aten
2023-01-11T21:38:06.4060124Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4060220Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4060225Z 
2023-01-11T21:38:06.4060299Z import triton
2023-01-11T21:38:06.4060392Z import triton.language as tl
2023-01-11T21:38:06.4060516Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4060655Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4060660Z 
2023-01-11T21:38:06.4060692Z 
2023-01-11T21:38:06.4060852Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4060922Z import triton
2023-01-11T21:38:06.4061016Z import triton.language as tl
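# The kernel continuing below is the broadcast variant of the gather above:
# the (1, 4) and (4, 1) int64 index tensors broadcast to a (4, 4) grid, so
# the flat index decomposes as x2 = xindex // 48 (rows, fed by the second
# index), x1 = (xindex // 12) % 4 (columns, fed by the first index), and
# x0 = xindex % 12 (the untouched trailing dim of the (8, 8, 12) source).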
2023-01-11T21:38:06.4061134Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4061238Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4061377Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4061505Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4061510Z 
2023-01-11T21:38:06.4061942Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4062017Z @triton.jit
2023-01-11T21:38:06.4062164Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4062242Z     xnumel = 192
2023-01-11T21:38:06.4062342Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4062474Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4062561Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4062646Z     x1 = (xindex // 12) % 4
2023-01-11T21:38:06.4062725Z     x2 = (xindex // 48)
2023-01-11T21:38:06.4062798Z     x0 = xindex % 12
2023-01-11T21:38:06.4062871Z     x4 = xindex
2023-01-11T21:38:06.4062974Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4063075Z     tmp1 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4063198Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask)
2023-01-11T21:38:06.4063337Z     tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4063425Z ''')
2023-01-11T21:38:06.4063431Z 
2023-01-11T21:38:06.4063435Z 
2023-01-11T21:38:06.4063530Z async_compile.wait(globals())
2023-01-11T21:38:06.4063602Z del async_compile
2023-01-11T21:38:06.4063607Z 
2023-01-11T21:38:06.4063685Z def call(args):
2023-01-11T21:38:06.4063774Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4063853Z     args.clear()
2023-01-11T21:38:06.4063948Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4064159Z         buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4064295Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4064448Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 192, grid=grid(192), stream=stream0)
2023-01-11T21:38:06.4064525Z         del arg0_1
2023-01-11T21:38:06.4064598Z         del arg1_1
2023-01-11T21:38:06.4064673Z         del arg2_1
2023-01-11T21:38:06.4064753Z     return (buf0, )
2023-01-11T21:38:06.4064758Z 
2023-01-11T21:38:06.4064762Z 
2023-01-11T21:38:06.4064844Z if __name__ == "__main__":
2023-01-11T21:38:06.4064967Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4065089Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4065303Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4065504Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4065701Z     arg2_1 = rand_strided((4, 1), (1, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4065832Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4065838Z 
2023-01-11T21:38:06.4066106Z [2023-01-11 21:34:47,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 585
2023-01-11T21:38:06.4066112Z 
2023-01-11T21:38:06.4066215Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4066294Z import torch
2023-01-11T21:38:06.4066364Z import random
2023-01-11T21:38:06.4066485Z from torch import empty_strided, as_strided, device
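# The module continuing below is the fp16 twin of the fp32 broadcast-index
# module above; only the dtype and the .to(tl.float32) upcast in the load
# differ. A minimal eager sketch of what both compute (names hypothetical,
# shapes taken from the __main__ harness above):
#
#   out = a[i, j]   # a: (8, 8, 12), i: (1, 4), j: (4, 1) int64
#                   # indices broadcast -> out: (4, 4, 12)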
2023-01-11T21:38:06.4066614Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4066619Z 
2023-01-11T21:38:06.4066703Z aten = torch.ops.aten
2023-01-11T21:38:06.4066845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4066974Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4066979Z 
2023-01-11T21:38:06.4067058Z import triton
2023-01-11T21:38:06.4067155Z import triton.language as tl
2023-01-11T21:38:06.4067276Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4067421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4067426Z 
2023-01-11T21:38:06.4067431Z 
2023-01-11T21:38:06.4067594Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4067671Z import triton
2023-01-11T21:38:06.4067766Z import triton.language as tl
2023-01-11T21:38:06.4067881Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4067990Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4068126Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4068248Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4068253Z 
2023-01-11T21:38:06.4068686Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4068760Z @triton.jit
2023-01-11T21:38:06.4068919Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4068994Z     xnumel = 192
2023-01-11T21:38:06.4069094Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4069224Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4069311Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4069389Z     x1 = (xindex // 12) % 4
2023-01-11T21:38:06.4069470Z     x2 = (xindex // 48)
2023-01-11T21:38:06.4069546Z     x0 = xindex % 12
2023-01-11T21:38:06.4069619Z     x4 = xindex
2023-01-11T21:38:06.4069720Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4069820Z     tmp1 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4069962Z     tmp2 = tl.load(in_ptr2 + (x0 + (12*tmp1) + (96*tmp0)), xmask).to(tl.float32)
2023-01-11T21:38:06.4070093Z     tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4070181Z ''')
2023-01-11T21:38:06.4070187Z 
2023-01-11T21:38:06.4070191Z 
2023-01-11T21:38:06.4070313Z async_compile.wait(globals())
2023-01-11T21:38:06.4070394Z del async_compile
2023-01-11T21:38:06.4070399Z 
2023-01-11T21:38:06.4070476Z def call(args):
2023-01-11T21:38:06.4070564Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4070641Z     args.clear()
2023-01-11T21:38:06.4070728Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4070942Z         buf0 = empty_strided((4, 4, 12), (48, 12, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4071036Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4071194Z         triton_fused_index_0.run(arg1_1, arg2_1, arg0_1, buf0, 192, grid=grid(192), stream=stream0)
2023-01-11T21:38:06.4071269Z         del arg0_1
2023-01-11T21:38:06.4071347Z         del arg1_1
2023-01-11T21:38:06.4071421Z         del arg2_1
2023-01-11T21:38:06.4071493Z     return (buf0, )
2023-01-11T21:38:06.4071506Z 
2023-01-11T21:38:06.4071510Z 
2023-01-11T21:38:06.4071585Z if __name__ == "__main__":
2023-01-11T21:38:06.4071705Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4071835Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4072046Z     arg0_1 = rand_strided((8, 8, 12), (96, 12, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4072245Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4072443Z     arg2_1 = rand_strided((4, 1), (1, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4072573Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4072578Z 
2023-01-11T21:38:06.4072651Z ok (0.504s)
2023-01-11T21:38:06.4073103Z test_index2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4073266Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4073530Z [2023-01-11 21:34:47,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 586
2023-01-11T21:38:06.4073795Z [2023-01-11 21:34:47,423] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 586
2023-01-11T21:38:06.4074213Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4074349Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4074610Z [2023-01-11 21:34:47,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 587
2023-01-11T21:38:06.4074877Z [2023-01-11 21:34:47,582] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 587
2023-01-11T21:38:06.4074883Z 
2023-01-11T21:38:06.4074983Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4075061Z import torch
2023-01-11T21:38:06.4075130Z import random
2023-01-11T21:38:06.4075252Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4075383Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4075388Z 
2023-01-11T21:38:06.4075471Z aten = torch.ops.aten
2023-01-11T21:38:06.4075612Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4075709Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4075718Z 
2023-01-11T21:38:06.4075791Z import triton
2023-01-11T21:38:06.4075883Z import triton.language as tl
2023-01-11T21:38:06.4076004Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4076145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4076151Z 
2023-01-11T21:38:06.4076180Z 
2023-01-11T21:38:06.4076340Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4076418Z import triton
2023-01-11T21:38:06.4076510Z import triton.language as tl
2023-01-11T21:38:06.4076625Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4076729Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4076866Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4076988Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4076993Z 
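# Both loads in the kernel below carry eviction_policy='evict_last'. Roughly
# speaking (an assumption about Inductor's heuristics, not confirmed by this
# log), that hint asks the cache to keep lines it expects to be reused
# resident longer; here each 64-element row rereads the same index value.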
2023-01-11T21:38:06.4077415Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4077493Z @triton.jit
2023-01-11T21:38:06.4077638Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4077716Z     xnumel = 256
2023-01-11T21:38:06.4077817Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4077953Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4078037Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4078112Z     x1 = (xindex // 64)
2023-01-11T21:38:06.4078189Z     x0 = xindex % 64
2023-01-11T21:38:06.4078262Z     x2 = xindex
2023-01-11T21:38:06.4078457Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4078667Z     tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4078805Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4078929Z ''')
2023-01-11T21:38:06.4078935Z 
2023-01-11T21:38:06.4078939Z 
2023-01-11T21:38:06.4079096Z triton_fused_index_1_1 = async_compile.triton('''
2023-01-11T21:38:06.4079174Z import triton
2023-01-11T21:38:06.4079269Z import triton.language as tl
2023-01-11T21:38:06.4079388Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4079492Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4079627Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4079756Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4079761Z 
2023-01-11T21:38:06.4080179Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4080249Z @triton.jit
2023-01-11T21:38:06.4080393Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4080473Z     xnumel = 256
2023-01-11T21:38:06.4080571Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4080704Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4080790Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4080872Z     x1 = (xindex // 8) % 4
2023-01-11T21:38:06.4080944Z     x0 = xindex % 8
2023-01-11T21:38:06.4081025Z     x2 = (xindex // 32)
2023-01-11T21:38:06.4081100Z     x3 = xindex
2023-01-11T21:38:06.4081200Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4081323Z     tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask)
2023-01-11T21:38:06.4081459Z     tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4081545Z ''')
2023-01-11T21:38:06.4081550Z 
2023-01-11T21:38:06.4081555Z 
2023-01-11T21:38:06.4081647Z async_compile.wait(globals())
2023-01-11T21:38:06.4081720Z del async_compile
2023-01-11T21:38:06.4081725Z 
2023-01-11T21:38:06.4081805Z def call(args):
2023-01-11T21:38:06.4081885Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.4081961Z     args.clear()
2023-01-11T21:38:06.4082057Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4082279Z         buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4082400Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4082544Z         triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0)
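        # buf0 is the dim-0 gather; the second launch below fills buf1 by
        # gathering along dim 1 instead. A rough eager equivalent of the pair
        # (names hypothetical, shapes from the __main__ harness below):
        #
        #   buf0 = a[i]       # a: (8, 8, 8) fp32, i: (1, 4) int64 -> (1, 4, 8, 8)
        #   buf1 = a[:, i]    #                                    -> (8, 1, 4, 8)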
2023-01-11T21:38:06.4082763Z         buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4082917Z         triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4082992Z         del arg0_1
2023-01-11T21:38:06.4083065Z         del arg1_1
2023-01-11T21:38:06.4083151Z     return (buf0, buf1, )
2023-01-11T21:38:06.4083156Z 
2023-01-11T21:38:06.4083160Z 
2023-01-11T21:38:06.4083241Z if __name__ == "__main__":
2023-01-11T21:38:06.4083362Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4083485Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4083694Z     arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4090045Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4090168Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.4090174Z 
2023-01-11T21:38:06.4090178Z 
2023-01-11T21:38:06.4090276Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4090353Z import torch
2023-01-11T21:38:06.4090431Z import random
2023-01-11T21:38:06.4090551Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4090677Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4090683Z 
2023-01-11T21:38:06.4090757Z aten = torch.ops.aten
2023-01-11T21:38:06.4090898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4091054Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4091059Z 
2023-01-11T21:38:06.4091136Z import triton
2023-01-11T21:38:06.4091232Z import triton.language as tl
2023-01-11T21:38:06.4091361Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4091499Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4091506Z 
2023-01-11T21:38:06.4091510Z 
2023-01-11T21:38:06.4091667Z triton_fused_index_0 = async_compile.triton('''
2023-01-11T21:38:06.4091745Z import triton
2023-01-11T21:38:06.4091837Z import triton.language as tl
2023-01-11T21:38:06.4091954Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4092058Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4092194Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4092323Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4092328Z 
2023-01-11T21:38:06.4092748Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4092822Z @triton.jit
2023-01-11T21:38:06.4092966Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4093045Z     xnumel = 256
2023-01-11T21:38:06.4093144Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4093275Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4093361Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4093443Z     x1 = (xindex // 64)
2023-01-11T21:38:06.4093514Z     x0 = xindex % 64
2023-01-11T21:38:06.4093586Z     x2 = xindex
2023-01-11T21:38:06.4093780Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4094012Z     tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4094150Z     tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4094239Z ''')
2023-01-11T21:38:06.4094245Z 
2023-01-11T21:38:06.4094249Z 
2023-01-11T21:38:06.4094411Z triton_fused_index_1_1 = async_compile.triton('''
2023-01-11T21:38:06.4094724Z import triton
2023-01-11T21:38:06.4094814Z import triton.language as tl
2023-01-11T21:38:06.4094981Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4095084Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4095218Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4095361Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4095368Z 
2023-01-11T21:38:06.4095818Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4095890Z @triton.jit
2023-01-11T21:38:06.4096034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4096101Z     xnumel = 256
2023-01-11T21:38:06.4096193Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4096321Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4096403Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4096489Z     x1 = (xindex // 8) % 4
2023-01-11T21:38:06.4096562Z     x0 = xindex % 8
2023-01-11T21:38:06.4096639Z     x2 = (xindex // 32)
2023-01-11T21:38:06.4096703Z     x3 = xindex
2023-01-11T21:38:06.4096801Z     tmp0 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4096934Z     tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask).to(tl.float32)
2023-01-11T21:38:06.4097068Z     tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4097219Z ''')
2023-01-11T21:38:06.4097226Z 
2023-01-11T21:38:06.4097234Z 
2023-01-11T21:38:06.4097338Z async_compile.wait(globals())
2023-01-11T21:38:06.4097415Z del async_compile
2023-01-11T21:38:06.4097508Z 
2023-01-11T21:38:06.4097577Z def call(args):
2023-01-11T21:38:06.4097655Z     arg0_1, arg1_1 = args
2023-01-11T21:38:06.4097729Z     args.clear()
2023-01-11T21:38:06.4097818Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4098037Z         buf0 = empty_strided((1, 4, 8, 8), (256, 64, 8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4098131Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4098277Z         triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4098492Z         buf1 = empty_strided((8, 1, 4, 8), (32, 32, 8, 1), device='cuda', dtype=torch.float16)
2023-01-11T21:38:06.4098630Z         triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0)
2023-01-11T21:38:06.4098703Z         del arg0_1
2023-01-11T21:38:06.4098775Z         del arg1_1
2023-01-11T21:38:06.4098859Z     return (buf0, buf1, )
2023-01-11T21:38:06.4098864Z 
2023-01-11T21:38:06.4098868Z 
2023-01-11T21:38:06.4098950Z if __name__ == "__main__":
2023-01-11T21:38:06.4099066Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4099192Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4099400Z     arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4099593Z     arg1_1 = rand_strided((1, 4), (4, 1), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4099711Z     print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.4099717Z 
2023-01-11T21:38:06.4099787Z ok (0.329s)
2023-01-11T21:38:06.4100239Z test_index3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4100373Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4100636Z [2023-01-11 21:34:47,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 588
2023-01-11T21:38:06.4100854Z [2023-01-11 21:34:47,647] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
2023-01-11T21:38:06.4101146Z [2023-01-11 21:34:47,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 588
2023-01-11T21:38:06.4101556Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4101686Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4101930Z [2023-01-11 21:34:47,690] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 589
2023-01-11T21:38:06.4102144Z [2023-01-11 21:34:47,711] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.index
2023-01-11T21:38:06.4102401Z [2023-01-11 21:34:47,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 589
2023-01-11T21:38:06.4102409Z 
2023-01-11T21:38:06.4102506Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4102580Z import torch
2023-01-11T21:38:06.4102652Z import random
2023-01-11T21:38:06.4102770Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4102892Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4102898Z 
2023-01-11T21:38:06.4102973Z aten = torch.ops.aten
2023-01-11T21:38:06.4103111Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4103207Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4103213Z 
2023-01-11T21:38:06.4103286Z import triton
2023-01-11T21:38:06.4103403Z import triton.language as tl
2023-01-11T21:38:06.4103528Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4103668Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4103673Z 
2023-01-11T21:38:06.4103677Z 
2023-01-11T21:38:06.4103769Z async_compile.wait(globals())
2023-01-11T21:38:06.4103840Z del async_compile
2023-01-11T21:38:06.4103851Z 
2023-01-11T21:38:06.4103919Z def call(args):
2023-01-11T21:38:06.4104004Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4104080Z     args.clear()
2023-01-11T21:38:06.4104172Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4104324Z         buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
2023-01-11T21:38:06.4104399Z         del arg0_1
2023-01-11T21:38:06.4104464Z         del arg1_1
2023-01-11T21:38:06.4104534Z         del arg2_1
2023-01-11T21:38:06.4104604Z         buf1 = buf0
2023-01-11T21:38:06.4104716Z         assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
2023-01-11T21:38:06.4104789Z         del buf0
2023-01-11T21:38:06.4104867Z     return (buf1, )
2023-01-11T21:38:06.4104872Z 
2023-01-11T21:38:06.4104877Z 
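# Because Inductor logged "Using FallbackKernel: aten.index" for this graph,
# call() above dispatches straight to the eager ATen kernel rather than a
# generated Triton kernel. Roughly the Python-level equivalent of that
# fallback line (a sketch, not the test source):
#
#   base = torch.as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1))
#   buf0 = base[:, arg1_1, :, arg2_1]
#   # the two (3,) advanced indices, separated by a slice, move their
#   # broadcast dimension to the front: result (3, 3, 1, 3), as asserted above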
2023-01-11T21:38:06.4104955Z if __name__ == "__main__":
2023-01-11T21:38:06.4105070Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4105194Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4105425Z     arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4105639Z     arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4105860Z     arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4105987Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4105993Z 
2023-01-11T21:38:06.4105997Z 
2023-01-11T21:38:06.4106096Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4106170Z import torch
2023-01-11T21:38:06.4106245Z import random
2023-01-11T21:38:06.4106356Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4106480Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4106486Z 
2023-01-11T21:38:06.4106568Z aten = torch.ops.aten
2023-01-11T21:38:06.4106702Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4106823Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4106829Z 
2023-01-11T21:38:06.4106903Z import triton
2023-01-11T21:38:06.4106993Z import triton.language as tl
2023-01-11T21:38:06.4107111Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4107248Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4107253Z 
2023-01-11T21:38:06.4107258Z 
2023-01-11T21:38:06.4107349Z async_compile.wait(globals())
2023-01-11T21:38:06.4107424Z del async_compile
2023-01-11T21:38:06.4107429Z 
2023-01-11T21:38:06.4107503Z def call(args):
2023-01-11T21:38:06.4107587Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4107662Z     args.clear()
2023-01-11T21:38:06.4107752Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4107896Z         buf0 = aten.index(as_strided(arg0_1, (3, 4, 1, 4, 3), (192, 48, 0, 12, 1)), [None, arg1_1, None, arg2_1])
2023-01-11T21:38:06.4107972Z         del arg0_1
2023-01-11T21:38:06.4108045Z         del arg1_1
2023-01-11T21:38:06.4108118Z         del arg2_1
2023-01-11T21:38:06.4108190Z         buf1 = buf0
2023-01-11T21:38:06.4108300Z         assert_size_stride(buf1, (3, 3, 1, 3), (9, 3, 3, 1))
2023-01-11T21:38:06.4108371Z         del buf0
2023-01-11T21:38:06.4108442Z     return (buf1, )
2023-01-11T21:38:06.4108447Z 
2023-01-11T21:38:06.4108457Z 
2023-01-11T21:38:06.4108529Z if __name__ == "__main__":
2023-01-11T21:38:06.4108647Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4108772Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4108995Z     arg0_1 = rand_strided((3, 4, 4, 4, 3), (192, 48, 12, 3, 1), device='cuda:0', dtype=torch.float16)
2023-01-11T21:38:06.4109217Z     arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4109407Z     arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4109531Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4109536Z 
2023-01-11T21:38:06.4109605Z ok (0.131s)
2023-01-11T21:38:06.4110061Z test_index_put1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4110193Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4110449Z [2023-01-11 21:34:47,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 590
2023-01-11T21:38:06.4110712Z [2023-01-11 21:34:48,118] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 590
2023-01-11T21:38:06.4111129Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4111260Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4111518Z [2023-01-11 21:34:48,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 591
2023-01-11T21:38:06.4111524Z 
2023-01-11T21:38:06.4111621Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4111695Z import torch
2023-01-11T21:38:06.4111768Z import random
2023-01-11T21:38:06.4111880Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4112003Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4112009Z 
2023-01-11T21:38:06.4112090Z aten = torch.ops.aten
2023-01-11T21:38:06.4112227Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4112320Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4112354Z 
2023-01-11T21:38:06.4112432Z import triton
2023-01-11T21:38:06.4112527Z import triton.language as tl
2023-01-11T21:38:06.4112646Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4112788Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4112794Z 
2023-01-11T21:38:06.4112798Z 
2023-01-11T21:38:06.4112970Z triton_fused_add_index_put_0 = async_compile.triton('''
2023-01-11T21:38:06.4113045Z import triton
2023-01-11T21:38:06.4113138Z import triton.language as tl
2023-01-11T21:38:06.4113251Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4113355Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4113494Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4113615Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4113620Z 
2023-01-11T21:38:06.4114053Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4114130Z @triton.jit
2023-01-11T21:38:06.4114274Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4114355Z     xnumel = 10035200
2023-01-11T21:38:06.4114455Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4114589Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4114674Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4114741Z     x0 = xindex
2023-01-11T21:38:06.4114933Z     tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4115067Z     tmp1 = tl.load(in_ptr0 + (x0), xmask)
2023-01-11T21:38:06.4115139Z     tmp2 = 1
2023-01-11T21:38:06.4115218Z     tmp3 = tmp1 + tmp2
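    # The two stores below fuse the setup for both index_put_ destinations in
    # one pass: out_ptr0 gets an unmodified copy of the input (the clone that
    # the first scatter mutates) while out_ptr1 gets input + 1 for the second.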
2023-01-11T21:38:06.4115375Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
2023-01-11T21:38:06.4115531Z     tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
2023-01-11T21:38:06.4115618Z ''')
2023-01-11T21:38:06.4115624Z 
2023-01-11T21:38:06.4115636Z 
2023-01-11T21:38:06.4115829Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton('''
2023-01-11T21:38:06.4115903Z import triton
2023-01-11T21:38:06.4115995Z import triton.language as tl
2023-01-11T21:38:06.4116109Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4116215Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4116345Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4116470Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4116479Z 
2023-01-11T21:38:06.4116930Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4117004Z @triton.jit
2023-01-11T21:38:06.4117156Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4117233Z     xnumel = 7538944
2023-01-11T21:38:06.4117330Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4117460Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4117543Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4117627Z     x1 = (xindex // 12544)
2023-01-11T21:38:06.4117690Z     x2 = xindex
2023-01-11T21:38:06.4117771Z     x0 = xindex % 12544
2023-01-11T21:38:06.4117965Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4118159Z     tmp1 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4118255Z     tmp2 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4118351Z     tmp5 = tl.load(in_ptr1 + (x2), xmask)
2023-01-11T21:38:06.4118421Z     tmp3 = 1
2023-01-11T21:38:06.4118492Z     tmp4 = tmp2 + tmp3
2023-01-11T21:38:06.4118598Z     tmp6 = tmp5 + tmp3
2023-01-11T21:38:06.4118743Z     tl.store(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4118888Z     tl.store(out_ptr1 + (x0 + (12544*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
2023-01-11T21:38:06.4118973Z ''')
2023-01-11T21:38:06.4118979Z 
2023-01-11T21:38:06.4118983Z 
2023-01-11T21:38:06.4119139Z triton_fused_add_3_2 = async_compile.triton('''
2023-01-11T21:38:06.4119212Z import triton
2023-01-11T21:38:06.4119298Z import triton.language as tl
2023-01-11T21:38:06.4119411Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4119510Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4119648Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4119772Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4119777Z 
2023-01-11T21:38:06.4120187Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.4120262Z @triton.jit
2023-01-11T21:38:06.4120395Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4120470Z     xnumel = 10035200
2023-01-11T21:38:06.4120560Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4120687Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4120770Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4120841Z     x0 = xindex
2023-01-11T21:38:06.4120935Z     tmp0 = tl.load(in_ptr0 + (x0), xmask)
2023-01-11T21:38:06.4121034Z     tmp1 = 1
2023-01-11T21:38:06.4121106Z     tmp2 = tmp0 + tmp1
2023-01-11T21:38:06.4121240Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4121325Z ''')
2023-01-11T21:38:06.4121331Z 
2023-01-11T21:38:06.4121335Z 
2023-01-11T21:38:06.4121427Z async_compile.wait(globals())
2023-01-11T21:38:06.4121506Z del async_compile
2023-01-11T21:38:06.4121511Z 
2023-01-11T21:38:06.4121589Z def call(args):
2023-01-11T21:38:06.4121676Z     arg0_1, arg1_1, arg2_1 = args
2023-01-11T21:38:06.4121751Z     args.clear()
2023-01-11T21:38:06.4121835Z     with torch.cuda.device(0):
2023-01-11T21:38:06.4122062Z         buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4122280Z         buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4122372Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.4122532Z         triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 10035200, grid=grid(10035200), stream=stream0)
2023-01-11T21:38:06.4122609Z         del arg0_1
2023-01-11T21:38:06.4122789Z         triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 7538944, grid=grid(7538944), stream=stream0)
2023-01-11T21:38:06.4122867Z         del arg1_1
2023-01-11T21:38:06.4122935Z         del arg2_1
2023-01-11T21:38:06.4123159Z         buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.4123300Z         triton_fused_add_3_2.run(buf2, buf4, 10035200, grid=grid(10035200), stream=stream0)
2023-01-11T21:38:06.4123381Z     return (buf0, buf4, )
2023-01-11T21:38:06.4123386Z 
2023-01-11T21:38:06.4123391Z 
2023-01-11T21:38:06.4123470Z if __name__ == "__main__":
2023-01-11T21:38:06.4123589Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.4123713Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.4123939Z     arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4124130Z     arg1_1 = rand_strided((601, ), (1, ), device='cuda:0', dtype=torch.int64)
2023-01-11T21:38:06.4124352Z     arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.4124512Z     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
2023-01-11T21:38:06.4124518Z 
2023-01-11T21:38:06.4124782Z [2023-01-11 21:34:48,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 591
2023-01-11T21:38:06.4125197Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.4125331Z   buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.4125590Z [2023-01-11 21:34:48,412] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 592
2023-01-11T21:38:06.4125596Z 
2023-01-11T21:38:06.4125693Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.4125767Z import torch
2023-01-11T21:38:06.4125836Z import random
2023-01-11T21:38:06.4125957Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.4126079Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.4126084Z 
2023-01-11T21:38:06.4126166Z aten = torch.ops.aten
2023-01-11T21:38:06.4126302Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.4126398Z async_compile = AsyncCompile()
2023-01-11T21:38:06.4126403Z 
2023-01-11T21:38:06.4126476Z import triton
2023-01-11T21:38:06.4126569Z import triton.language as tl
2023-01-11T21:38:06.4126687Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.4126826Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.4126869Z 
2023-01-11T21:38:06.4126874Z 
2023-01-11T21:38:06.4127050Z triton_fused_add_index_put_0 = async_compile.triton('''
2023-01-11T21:38:06.4127125Z import triton
2023-01-11T21:38:06.4127218Z import triton.language as tl
2023-01-11T21:38:06.4127334Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4127435Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4127565Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4127684Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4127689Z 
2023-01-11T21:38:06.4128112Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
2023-01-11T21:38:06.4128188Z @triton.jit
2023-01-11T21:38:06.4128332Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4128412Z     xnumel = 10035200
2023-01-11T21:38:06.4128507Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4128634Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4128718Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4128782Z     x0 = xindex
2023-01-11T21:38:06.4128997Z     tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4129115Z     tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2023-01-11T21:38:06.4129186Z     tmp2 = 1
2023-01-11T21:38:06.4129266Z     tmp3 = tmp1 + tmp2
2023-01-11T21:38:06.4129401Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
2023-01-11T21:38:06.4129531Z     tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
2023-01-11T21:38:06.4129608Z ''')
2023-01-11T21:38:06.4129613Z 
2023-01-11T21:38:06.4129624Z 
2023-01-11T21:38:06.4129817Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton('''
2023-01-11T21:38:06.4129898Z import triton
2023-01-11T21:38:06.4129990Z import triton.language as tl
2023-01-11T21:38:06.4130106Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4130207Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4130339Z from torch._inductor.triton_ops.autotune import pointwise
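# Note the scatter below declares mutated_arg_names={'out_ptr0', 'out_ptr1'}:
# unlike the pointwise kernels, it writes those buffers in place, which
# (roughly, an assumption about the autotuner rather than something this log
# confirms) lets benchmarking restore or clone them between timing runs so
# repeated passes don't compound the writes.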
2023-01-11T21:38:06.4130491Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4130496Z 
2023-01-11T21:38:06.4130948Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
2023-01-11T21:38:06.4131022Z @triton.jit
2023-01-11T21:38:06.4131173Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4131251Z     xnumel = 7538944
2023-01-11T21:38:06.4131356Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4131481Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4131563Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4131644Z     x1 = (xindex // 12544)
2023-01-11T21:38:06.4131707Z     x2 = xindex
2023-01-11T21:38:06.4131783Z     x0 = xindex % 12544
2023-01-11T21:38:06.4131977Z     tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.4132191Z     tmp1 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
2023-01-11T21:38:06.4132288Z     tmp2 = tl.load(in_ptr0 + (x1), xmask)
2023-01-11T21:38:06.4132404Z     tmp5 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
2023-01-11T21:38:06.4132473Z     tmp3 = 1
2023-01-11T21:38:06.4132545Z     tmp4 = tmp2 + tmp3
2023-01-11T21:38:06.4132620Z     tmp6 = tmp5 + tmp3
2023-01-11T21:38:06.4132765Z     tl.store(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
2023-01-11T21:38:06.4132908Z     tl.store(out_ptr1 + (x0 + (12544*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
2023-01-11T21:38:06.4133023Z ''')
2023-01-11T21:38:06.4133028Z 
2023-01-11T21:38:06.4133033Z 
2023-01-11T21:38:06.4133189Z triton_fused_add_3_2 = async_compile.triton('''
2023-01-11T21:38:06.4133262Z import triton
2023-01-11T21:38:06.4133348Z import triton.language as tl
2023-01-11T21:38:06.4133464Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.4133566Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.4133697Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.4133820Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.4133825Z 
2023-01-11T21:38:06.4134232Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.4134307Z @triton.jit
2023-01-11T21:38:06.4134442Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.4134635Z     xnumel = 10035200
2023-01-11T21:38:06.4134727Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.4134858Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4134940Z     xmask = xindex < xnumel
2023-01-11T21:38:06.4135011Z     x0 = xindex
2023-01-11T21:38:06.4135127Z     tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
2023-01-11T21:38:06.4135197Z     tmp1 = 1
2023-01-11T21:38:06.4135269Z     tmp2 = tmp0 + tmp1
2023-01-11T21:38:06.4135401Z     tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
2023-01-11T21:38:06.4135485Z ''')
2023-01-11T21:38:06.4135491Z 
2023-01-11T21:38:06.4135495Z 
2023-01-11T21:38:06.4135587Z async_compile.wait(globals())
2023-01-11T21:38:06.4135663Z del async_compile
2023-01-11T21:38:06.4135668Z 
2023-01-11T21:38:06.4135742Z def call(args):
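    # Same three-kernel schedule as the fp32 module above, now in fp16:
    # kernel 0 writes buf0 = input and buf2 = input + 1, kernel 1 scatters
    # values into buf0[idx] and values + 1 into buf2[idx + 1], and kernel 2
    # adds 1 once more into buf4. A rough eager sketch (names hypothetical):
    #
    #   buf0 = a.clone(); buf0[idx] = vals
    #   buf2 = a + 1;     buf2[idx + 1] = vals + 1
    #   return buf0, buf2 + 1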
2023-01-11T21:38:06.4135828Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4135905Z args.clear() 2023-01-11T21:38:06.4135990Z with torch.cuda.device(0): 2023-01-11T21:38:06.4136217Z buf0 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4136436Z buf2 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4136573Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4136736Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 10035200, grid=grid(10035200), stream=stream0) 2023-01-11T21:38:06.4136809Z del arg0_1 2023-01-11T21:38:06.4136990Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 7538944, grid=grid(7538944), stream=stream0) 2023-01-11T21:38:06.4137061Z del arg1_1 2023-01-11T21:38:06.4137177Z del arg2_1 2023-01-11T21:38:06.4137404Z buf4 = empty_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4137549Z triton_fused_add_3_2.run(buf2, buf4, 10035200, grid=grid(10035200), stream=stream0) 2023-01-11T21:38:06.4137632Z return (buf0, buf4, ) 2023-01-11T21:38:06.4137637Z 2023-01-11T21:38:06.4137641Z 2023-01-11T21:38:06.4137721Z if __name__ == "__main__": 2023-01-11T21:38:06.4137838Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4137968Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4138193Z arg0_1 = rand_strided((800, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4138382Z arg1_1 = rand_strided((601, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4138606Z arg2_1 = rand_strided((601, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4138733Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4138738Z 2023-01-11T21:38:06.4139002Z [2023-01-11 21:34:48,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 592 2023-01-11T21:38:06.4139461Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4139591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4139846Z [2023-01-11 21:34:48,691] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 593 2023-01-11T21:38:06.4139851Z 2023-01-11T21:38:06.4139949Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4140026Z import torch 2023-01-11T21:38:06.4140093Z import random 2023-01-11T21:38:06.4140212Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4140335Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4140343Z 2023-01-11T21:38:06.4140424Z aten = torch.ops.aten 2023-01-11T21:38:06.4140558Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4140652Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4140657Z 2023-01-11T21:38:06.4140732Z import triton 2023-01-11T21:38:06.4140824Z import triton.language as tl 2023-01-11T21:38:06.4140945Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4141085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4141090Z 2023-01-11T21:38:06.4141095Z 2023-01-11T21:38:06.4141264Z triton_fused_add_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4141338Z import triton 2023-01-11T21:38:06.4141429Z import triton.language as tl 2023-01-11T21:38:06.4141541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4141643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4141775Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4141897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4141902Z 2023-01-11T21:38:06.4142345Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4142422Z @triton.jit 2023-01-11T21:38:06.4142568Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4142644Z xnumel = 8192 2023-01-11T21:38:06.4142742Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4142870Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4142953Z xmask = xindex < xnumel 2023-01-11T21:38:06.4143017Z x0 = xindex 2023-01-11T21:38:06.4143207Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4143306Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4143378Z tmp2 = 1 2023-01-11T21:38:06.4143458Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4143593Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4143727Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4143805Z ''') 2023-01-11T21:38:06.4143810Z 2023-01-11T21:38:06.4143817Z 2023-01-11T21:38:06.4144014Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton(''' 2023-01-11T21:38:06.4144088Z import triton 2023-01-11T21:38:06.4144180Z import triton.language as tl 2023-01-11T21:38:06.4144294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4144391Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4144524Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4144641Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.4144653Z 2023-01-11T21:38:06.4145098Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4145204Z @triton.jit 2023-01-11T21:38:06.4145353Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4145431Z xnumel = 32 2023-01-11T21:38:06.4145548Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4145698Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4145783Z xmask = xindex < xnumel 2023-01-11T21:38:06.4145861Z x1 = (xindex // 8) 2023-01-11T21:38:06.4145928Z x0 = xindex % 8 2023-01-11T21:38:06.4146118Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4146310Z tmp1 = tl.load(in_ptr1 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4146407Z tmp2 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4146505Z tmp5 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.4146575Z tmp3 = 1 2023-01-11T21:38:06.4146653Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4146725Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.4146869Z tl.store(out_ptr0 + (x0 + (8*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4147005Z tl.store(out_ptr1 + (x0 + (8*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4147087Z ''') 2023-01-11T21:38:06.4147093Z 2023-01-11T21:38:06.4147097Z 2023-01-11T21:38:06.4147251Z triton_fused_add_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4147321Z import triton 2023-01-11T21:38:06.4147413Z import triton.language as tl 2023-01-11T21:38:06.4147520Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4147621Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4147755Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4147878Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4147887Z 2023-01-11T21:38:06.4148288Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4148361Z @triton.jit 2023-01-11T21:38:06.4148522Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4148597Z xnumel = 8192 2023-01-11T21:38:06.4148687Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4148815Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4148899Z xmask = xindex < xnumel 2023-01-11T21:38:06.4148969Z x0 = xindex 2023-01-11T21:38:06.4149065Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4149136Z tmp1 = 1 2023-01-11T21:38:06.4149214Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4149339Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4149428Z ''') 2023-01-11T21:38:06.4149434Z 2023-01-11T21:38:06.4149438Z 2023-01-11T21:38:06.4149528Z async_compile.wait(globals()) 2023-01-11T21:38:06.4149609Z del async_compile 2023-01-11T21:38:06.4149614Z 2023-01-11T21:38:06.4149687Z def call(args): 2023-01-11T21:38:06.4149774Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4149847Z args.clear() 2023-01-11T21:38:06.4149941Z with torch.cuda.device(0): 
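# A plausible eager-mode reading of graph 593, inferred from the kernel names
# (...add_index_put_0, ...add_add_2_index_put_index_put__1, ...add_3_2) and
# the scatter stores above -- a sketch of what this wrapper computes, not the
# test's actual source:
#
#     def fn(x, idx, v):                        # x:(1024,4,2), idx:(4,), v:(4,1,1)
#         a = x.index_put((idx,), v)                # buf0: copy of x, rows idx <- v
#         b = (x + 1).index_put((idx + 1,), v + 1)  # buf2: fused adds + second scatter
#         return a, b + 1                           # buf4 = buf2 + 1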
2023-01-11T21:38:06.4150141Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4150350Z buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4150443Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4150598Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4150671Z del arg0_1 2023-01-11T21:38:06.4150845Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4150947Z del arg1_1 2023-01-11T21:38:06.4151012Z del arg2_1 2023-01-11T21:38:06.4151220Z buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4151357Z triton_fused_add_3_2.run(buf2, buf4, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4151441Z return (buf0, buf4, ) 2023-01-11T21:38:06.4151447Z 2023-01-11T21:38:06.4151451Z 2023-01-11T21:38:06.4151530Z if __name__ == "__main__": 2023-01-11T21:38:06.4151646Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4151771Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4151979Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4152165Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4152368Z arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4152498Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4152503Z 2023-01-11T21:38:06.4152765Z [2023-01-11 21:34:48,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 593 2023-01-11T21:38:06.4152771Z 2023-01-11T21:38:06.4152868Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4152944Z import torch 2023-01-11T21:38:06.4153020Z import random 2023-01-11T21:38:06.4153138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4153256Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4153267Z 2023-01-11T21:38:06.4153342Z aten = torch.ops.aten 2023-01-11T21:38:06.4153477Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4153572Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4153577Z 2023-01-11T21:38:06.4153648Z import triton 2023-01-11T21:38:06.4153741Z import triton.language as tl 2023-01-11T21:38:06.4153865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4154003Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4154008Z 2023-01-11T21:38:06.4154013Z 2023-01-11T21:38:06.4154184Z triton_fused_add_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4154254Z import triton 2023-01-11T21:38:06.4154373Z import triton.language as tl 2023-01-11T21:38:06.4154488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4154588Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4154719Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4154843Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4154849Z 2023-01-11T21:38:06.4155270Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 
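# Reading the @pointwise meta above: 'signature' gives the Triton-level types
# ('*fp16' = pointer to half, 'i32' = scalar int32), size_hints bounds the
# grid sizes the autotuner will try, 'mutated_arg_names' lists any buffers
# the kernel updates in place (empty here; the scatter kernels elsewhere in
# this log mark out_ptr0/out_ptr1), and divisible_by_16 records which
# arguments may be assumed divisible by 16 -- an alignment hint that lets the
# compiler emit vectorized loads and stores.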
2023-01-11T21:38:06.4155366Z @triton.jit 2023-01-11T21:38:06.4155519Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4155605Z xnumel = 8192 2023-01-11T21:38:06.4155700Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4155829Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4155913Z xmask = xindex < xnumel 2023-01-11T21:38:06.4155983Z x0 = xindex 2023-01-11T21:38:06.4156201Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4156312Z tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4156381Z tmp2 = 1 2023-01-11T21:38:06.4156461Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4156594Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4156724Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4156809Z ''') 2023-01-11T21:38:06.4156815Z 2023-01-11T21:38:06.4156846Z 2023-01-11T21:38:06.4157042Z triton_fused_add_add_2_index_put_index_put__1 = async_compile.triton(''' 2023-01-11T21:38:06.4157110Z import triton 2023-01-11T21:38:06.4157202Z import triton.language as tl 2023-01-11T21:38:06.4157315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4157417Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4157550Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4157673Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4157678Z 2023-01-11T21:38:06.4158134Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4158207Z @triton.jit 2023-01-11T21:38:06.4158358Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4158428Z xnumel = 32 2023-01-11T21:38:06.4158523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4158651Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4158732Z xmask = xindex < xnumel 2023-01-11T21:38:06.4158809Z x1 = (xindex // 8) 2023-01-11T21:38:06.4158879Z x0 = xindex % 8 2023-01-11T21:38:06.4159074Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4159282Z tmp1 = tl.load(in_ptr1 + (x1), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4159378Z tmp2 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4159493Z tmp5 = tl.load(in_ptr1 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.4159564Z tmp3 = 1 2023-01-11T21:38:06.4159640Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4159716Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.4159858Z tl.store(out_ptr0 + (x0 + (8*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4159991Z tl.store(out_ptr1 + (x0 + (8*tmp4) + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4160074Z ''') 2023-01-11T21:38:06.4160080Z 2023-01-11T21:38:06.4160084Z 2023-01-11T21:38:06.4160244Z triton_fused_add_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4160317Z import triton 2023-01-11T21:38:06.4160409Z import triton.language as tl 2023-01-11T21:38:06.4160548Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4160649Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4160773Z from torch._inductor.triton_ops.autotune 
import pointwise 2023-01-11T21:38:06.4160897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4160902Z 2023-01-11T21:38:06.4161307Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4161379Z @triton.jit 2023-01-11T21:38:06.4161516Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4161589Z xnumel = 8192 2023-01-11T21:38:06.4161686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4161814Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4161889Z xmask = xindex < xnumel 2023-01-11T21:38:06.4161963Z x0 = xindex 2023-01-11T21:38:06.4162077Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4162145Z tmp1 = 1 2023-01-11T21:38:06.4162222Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4162354Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4162439Z ''') 2023-01-11T21:38:06.4162444Z 2023-01-11T21:38:06.4162449Z 2023-01-11T21:38:06.4162543Z async_compile.wait(globals()) 2023-01-11T21:38:06.4162612Z del async_compile 2023-01-11T21:38:06.4162617Z 2023-01-11T21:38:06.4162697Z def call(args): 2023-01-11T21:38:06.4162782Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4162885Z args.clear() 2023-01-11T21:38:06.4162972Z with torch.cuda.device(0): 2023-01-11T21:38:06.4163182Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4163388Z buf2 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4163476Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4163630Z triton_fused_add_index_put_0.run(arg0_1, buf0, buf2, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4163703Z del arg0_1 2023-01-11T21:38:06.4163875Z triton_fused_add_add_2_index_put_index_put__1.run(arg1_1, arg2_1, buf0, buf2, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4163947Z del arg1_1 2023-01-11T21:38:06.4164017Z del arg2_1 2023-01-11T21:38:06.4164224Z buf4 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4164361Z triton_fused_add_3_2.run(buf2, buf4, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4164440Z return (buf0, buf4, ) 2023-01-11T21:38:06.4164445Z 2023-01-11T21:38:06.4164449Z 2023-01-11T21:38:06.4164530Z if __name__ == "__main__": 2023-01-11T21:38:06.4164648Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4164781Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4164987Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4165182Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4165385Z arg2_1 = rand_strided((4, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4165514Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4165520Z 2023-01-11T21:38:06.4165583Z ok (1.078s) 2023-01-11T21:38:06.4166044Z test_index_put2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4166208Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4166468Z [2023-01-11 21:34:48,880] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 594 2023-01-11T21:38:06.4166730Z [2023-01-11 21:34:48,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 594 2023-01-11T21:38:06.4166737Z 2023-01-11T21:38:06.4166835Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4166908Z import torch 2023-01-11T21:38:06.4166983Z import random 2023-01-11T21:38:06.4167101Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4167217Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4167225Z 2023-01-11T21:38:06.4167308Z aten = torch.ops.aten 2023-01-11T21:38:06.4167444Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4167539Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4167544Z 2023-01-11T21:38:06.4167615Z import triton 2023-01-11T21:38:06.4167707Z import triton.language as tl 2023-01-11T21:38:06.4167834Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4167976Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4167981Z 2023-01-11T21:38:06.4167985Z 2023-01-11T21:38:06.4168142Z triton_fused_index_put_0 = async_compile.triton(''' 2023-01-11T21:38:06.4168216Z import triton 2023-01-11T21:38:06.4168308Z import triton.language as tl 2023-01-11T21:38:06.4168422Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4168522Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4168657Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4168823Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4168828Z 2023-01-11T21:38:06.4169238Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4169307Z @triton.jit 2023-01-11T21:38:06.4169441Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4169515Z xnumel = 1254400 2023-01-11T21:38:06.4169611Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4169738Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4169820Z xmask = xindex < xnumel 2023-01-11T21:38:06.4169889Z x0 = xindex 2023-01-11T21:38:06.4169979Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4170112Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4170197Z ''') 2023-01-11T21:38:06.4170205Z 2023-01-11T21:38:06.4170209Z 2023-01-11T21:38:06.4170373Z triton_fused_index_put_1 = async_compile.triton(''' 2023-01-11T21:38:06.4170448Z import triton 2023-01-11T21:38:06.4170540Z import triton.language as tl 2023-01-11T21:38:06.4170651Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4170754Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4170879Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4171003Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4171008Z 2023-01-11T21:38:06.4171436Z @pointwise(size_hints=[8388608], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 
'*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4171511Z @triton.jit 2023-01-11T21:38:06.4171651Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4171729Z xnumel = 7526400 2023-01-11T21:38:06.4171825Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4171953Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4172028Z xmask = xindex < xnumel 2023-01-11T21:38:06.4172109Z x1 = (xindex // 12544) 2023-01-11T21:38:06.4172178Z x2 = xindex 2023-01-11T21:38:06.4172288Z x0 = xindex % 12544 2023-01-11T21:38:06.4172385Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4172480Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.4172631Z tl.atomic_add(out_ptr0 + (x0 + (12544*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4172710Z ''') 2023-01-11T21:38:06.4172715Z 2023-01-11T21:38:06.4172726Z 2023-01-11T21:38:06.4172812Z async_compile.wait(globals()) 2023-01-11T21:38:06.4172886Z del async_compile 2023-01-11T21:38:06.4172891Z 2023-01-11T21:38:06.4172965Z def call(args): 2023-01-11T21:38:06.4173050Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4173127Z args.clear() 2023-01-11T21:38:06.4173220Z with torch.cuda.device(0): 2023-01-11T21:38:06.4173447Z buf0 = empty_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4173533Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4173682Z triton_fused_index_put_0.run(arg0_1, buf0, 1254400, grid=grid(1254400), stream=stream0) 2023-01-11T21:38:06.4173755Z del arg0_1 2023-01-11T21:38:06.4173906Z triton_fused_index_put_1.run(arg1_1, arg2_1, buf0, 7526400, grid=grid(7526400), stream=stream0) 2023-01-11T21:38:06.4173980Z del arg1_1 2023-01-11T21:38:06.4174052Z del arg2_1 2023-01-11T21:38:06.4174128Z return (buf0, ) 2023-01-11T21:38:06.4174133Z 2023-01-11T21:38:06.4174137Z 2023-01-11T21:38:06.4174210Z if __name__ == "__main__": 2023-01-11T21:38:06.4174329Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4174456Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4174844Z arg0_1 = rand_strided((100, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4175040Z arg1_1 = rand_strided((600, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4175262Z arg2_1 = rand_strided((600, 256, 7, 7), (12544, 49, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4175395Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4175400Z 2023-01-11T21:38:06.4175471Z ok (0.181s) 2023-01-11T21:38:06.4175975Z test_index_put3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4176107Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4176363Z [2023-01-11 21:34:49,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 595 2023-01-11T21:38:06.4176629Z [2023-01-11 21:34:49,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 595 2023-01-11T21:38:06.4177043Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4177230Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4177489Z [2023-01-11 21:34:49,258] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 596 2023-01-11T21:38:06.4177494Z 2023-01-11T21:38:06.4177591Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4177676Z import torch 2023-01-11T21:38:06.4177750Z import random 2023-01-11T21:38:06.4177866Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4177983Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4177988Z 2023-01-11T21:38:06.4178069Z aten = torch.ops.aten 2023-01-11T21:38:06.4178253Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4178352Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4178357Z 2023-01-11T21:38:06.4178432Z import triton 2023-01-11T21:38:06.4178526Z import triton.language as tl 2023-01-11T21:38:06.4178650Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4178782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4178791Z 2023-01-11T21:38:06.4178796Z 2023-01-11T21:38:06.4178956Z triton_fused_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4179031Z import triton 2023-01-11T21:38:06.4179121Z import triton.language as tl 2023-01-11T21:38:06.4179236Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4179336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4179468Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4179593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4179599Z 2023-01-11T21:38:06.4180034Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4180101Z @triton.jit 2023-01-11T21:38:06.4180241Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4180316Z xnumel = 6144 2023-01-11T21:38:06.4180412Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4180541Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4180667Z xmask = xindex < xnumel 2023-01-11T21:38:06.4180752Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4180821Z x0 = xindex % 2 2023-01-11T21:38:06.4180902Z x2 = (xindex // 6) 2023-01-11T21:38:06.4181097Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4181303Z tmp1 = tl.load(in_ptr1 + (x0 + (2*x2)), 
xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4181453Z tl.store(out_ptr0 + (x0 + (2*tmp0) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4181541Z ''') 2023-01-11T21:38:06.4181547Z 2023-01-11T21:38:06.4181551Z 2023-01-11T21:38:06.4181703Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.4181782Z import triton 2023-01-11T21:38:06.4181870Z import triton.language as tl 2023-01-11T21:38:06.4181988Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4182091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4182223Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4182357Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4182362Z 2023-01-11T21:38:06.4182776Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4182852Z @triton.jit 2023-01-11T21:38:06.4182987Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4183057Z xnumel = 8192 2023-01-11T21:38:06.4183155Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4183286Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4183371Z xmask = xindex < xnumel 2023-01-11T21:38:06.4183444Z x0 = xindex 2023-01-11T21:38:06.4183544Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4183619Z tmp1 = 1 2023-01-11T21:38:06.4183694Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4183833Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4183920Z ''') 2023-01-11T21:38:06.4183926Z 2023-01-11T21:38:06.4183930Z 2023-01-11T21:38:06.4184115Z triton_fused_add_add_2_index_put__1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4184193Z import triton 2023-01-11T21:38:06.4184312Z import triton.language as tl 2023-01-11T21:38:06.4184431Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4184527Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4184662Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4184788Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4184793Z 2023-01-11T21:38:06.4185223Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4185302Z @triton.jit 2023-01-11T21:38:06.4185450Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4185550Z xnumel = 6144 2023-01-11T21:38:06.4185656Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4185797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4185883Z xmask = xindex < xnumel 2023-01-11T21:38:06.4185964Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4186039Z x0 = xindex % 2 2023-01-11T21:38:06.4186116Z x2 = (xindex // 6) 2023-01-11T21:38:06.4186214Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4186320Z tmp3 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask) 2023-01-11T21:38:06.4186387Z tmp1 = 1 2023-01-11T21:38:06.4186468Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4186547Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.4186700Z tl.store(out_ptr0 + (x0 + (2*tmp2) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), 
tmp4, xmask) 2023-01-11T21:38:06.4186788Z ''') 2023-01-11T21:38:06.4186830Z 2023-01-11T21:38:06.4186834Z 2023-01-11T21:38:06.4186932Z async_compile.wait(globals()) 2023-01-11T21:38:06.4187009Z del async_compile 2023-01-11T21:38:06.4187014Z 2023-01-11T21:38:06.4187092Z def call(args): 2023-01-11T21:38:06.4187173Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4187250Z args.clear() 2023-01-11T21:38:06.4187348Z with torch.cuda.device(0): 2023-01-11T21:38:06.4187441Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4187597Z triton_fused_index_put__0.run(arg1_1, arg2_1, arg0_1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4187809Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4187949Z triton_fused_add_1.run(arg0_1, buf1, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4188018Z del arg0_1 2023-01-11T21:38:06.4188180Z triton_fused_add_add_2_index_put__1_2.run(arg1_1, arg2_1, buf1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4188258Z del arg1_1 2023-01-11T21:38:06.4188333Z del arg2_1 2023-01-11T21:38:06.4188414Z return (buf1, ) 2023-01-11T21:38:06.4188420Z 2023-01-11T21:38:06.4188425Z 2023-01-11T21:38:06.4188509Z if __name__ == "__main__": 2023-01-11T21:38:06.4188630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4188764Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4188970Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4189166Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4189376Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4189505Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4189510Z 2023-01-11T21:38:06.4189777Z [2023-01-11 21:34:49,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 596 2023-01-11T21:38:06.4189786Z 2023-01-11T21:38:06.4189887Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4189964Z import torch 2023-01-11T21:38:06.4190041Z import random 2023-01-11T21:38:06.4190155Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4190280Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4190286Z 2023-01-11T21:38:06.4190439Z aten = torch.ops.aten 2023-01-11T21:38:06.4190580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4190676Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4190681Z 2023-01-11T21:38:06.4190757Z import triton 2023-01-11T21:38:06.4190852Z import triton.language as tl 2023-01-11T21:38:06.4190972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4191112Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4191117Z 2023-01-11T21:38:06.4191121Z 2023-01-11T21:38:06.4191290Z triton_fused_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4191370Z import triton 2023-01-11T21:38:06.4191466Z import triton.language as tl 2023-01-11T21:38:06.4191582Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4191687Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4191821Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4191943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4191956Z 2023-01-11T21:38:06.4192375Z @pointwise(size_hints=[8192], filename=__file__, 
meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4192452Z @triton.jit 2023-01-11T21:38:06.4192594Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4192670Z xnumel = 6144 2023-01-11T21:38:06.4192768Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4192900Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4193007Z xmask = xindex < xnumel 2023-01-11T21:38:06.4193081Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4193154Z x0 = xindex % 2 2023-01-11T21:38:06.4193231Z x2 = (xindex // 6) 2023-01-11T21:38:06.4193420Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4193644Z tmp1 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4193790Z tl.store(out_ptr0 + (x0 + (2*tmp0) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4193875Z ''') 2023-01-11T21:38:06.4193880Z 2023-01-11T21:38:06.4193885Z 2023-01-11T21:38:06.4194036Z triton_fused_add_1 = async_compile.triton(''' 2023-01-11T21:38:06.4194104Z import triton 2023-01-11T21:38:06.4194193Z import triton.language as tl 2023-01-11T21:38:06.4194310Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4194414Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4194549Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4194672Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4194677Z 2023-01-11T21:38:06.4195080Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4195151Z @triton.jit 2023-01-11T21:38:06.4195280Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4195372Z xnumel = 8192 2023-01-11T21:38:06.4195477Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4195626Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4195710Z xmask = xindex < xnumel 2023-01-11T21:38:06.4195783Z x0 = xindex 2023-01-11T21:38:06.4195904Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4195970Z tmp1 = 1 2023-01-11T21:38:06.4196048Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4196181Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4196267Z ''') 2023-01-11T21:38:06.4196272Z 2023-01-11T21:38:06.4196276Z 2023-01-11T21:38:06.4196462Z triton_fused_add_add_2_index_put__1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4196567Z import triton 2023-01-11T21:38:06.4196662Z import triton.language as tl 2023-01-11T21:38:06.4196769Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4196869Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4196998Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4197122Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4197127Z 2023-01-11T21:38:06.4197546Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 
3), equal_to_1=())]}) 2023-01-11T21:38:06.4197625Z @triton.jit 2023-01-11T21:38:06.4197763Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4197837Z xnumel = 6144 2023-01-11T21:38:06.4197925Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4198058Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4198140Z xmask = xindex < xnumel 2023-01-11T21:38:06.4198222Z x1 = (xindex // 2) % 3 2023-01-11T21:38:06.4198293Z x0 = xindex % 2 2023-01-11T21:38:06.4198373Z x2 = (xindex // 6) 2023-01-11T21:38:06.4198472Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4198586Z tmp3 = tl.load(in_ptr1 + (x0 + (2*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4198659Z tmp1 = 1 2023-01-11T21:38:06.4198737Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4198813Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.4198957Z tl.store(out_ptr0 + (x0 + (2*tmp2) + (8*x2) + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4199071Z ''') 2023-01-11T21:38:06.4199076Z 2023-01-11T21:38:06.4199081Z 2023-01-11T21:38:06.4199173Z async_compile.wait(globals()) 2023-01-11T21:38:06.4199248Z del async_compile 2023-01-11T21:38:06.4199254Z 2023-01-11T21:38:06.4199321Z def call(args): 2023-01-11T21:38:06.4199408Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4199485Z args.clear() 2023-01-11T21:38:06.4199578Z with torch.cuda.device(0): 2023-01-11T21:38:06.4199671Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4199820Z triton_fused_index_put__0.run(arg1_1, arg2_1, arg0_1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4200025Z buf1 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4200158Z triton_fused_add_1.run(arg0_1, buf1, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4200231Z del arg0_1 2023-01-11T21:38:06.4200394Z triton_fused_add_add_2_index_put__1_2.run(arg1_1, arg2_1, buf1, 6144, grid=grid(6144), stream=stream0) 2023-01-11T21:38:06.4200470Z del arg1_1 2023-01-11T21:38:06.4200542Z del arg2_1 2023-01-11T21:38:06.4200622Z return (buf1, ) 2023-01-11T21:38:06.4200627Z 2023-01-11T21:38:06.4200632Z 2023-01-11T21:38:06.4200714Z if __name__ == "__main__": 2023-01-11T21:38:06.4200835Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4200954Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4201161Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4201353Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4201560Z arg2_1 = rand_strided((1024, 1, 2), (2, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4201687Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4201692Z 2023-01-11T21:38:06.4201762Z ok (0.388s) 2023-01-11T21:38:06.4202260Z test_index_put_as_masked_fill_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4202394Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4202652Z [2023-01-11 21:34:49,410] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 597 2023-01-11T21:38:06.4202917Z [2023-01-11 21:34:49,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 597 2023-01-11T21:38:06.4203325Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4203461Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4203721Z [2023-01-11 21:34:49,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 598 2023-01-11T21:38:06.4203981Z [2023-01-11 21:34:49,607] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 598 2023-01-11T21:38:06.4204391Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4204525Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4204801Z [2023-01-11 21:34:49,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 599 2023-01-11T21:38:06.4205063Z [2023-01-11 21:34:49,736] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 599 2023-01-11T21:38:06.4205478Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4205609Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4205864Z [2023-01-11 21:34:49,784] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 600 2023-01-11T21:38:06.4205870Z 2023-01-11T21:38:06.4205968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4206039Z import torch 2023-01-11T21:38:06.4206113Z import random 2023-01-11T21:38:06.4206231Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4206353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4206358Z 2023-01-11T21:38:06.4206438Z aten = torch.ops.aten 2023-01-11T21:38:06.4206578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4206675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4206680Z 2023-01-11T21:38:06.4206747Z import triton 2023-01-11T21:38:06.4206838Z import triton.language as tl 2023-01-11T21:38:06.4206963Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4207098Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4207104Z 2023-01-11T21:38:06.4207108Z 2023-01-11T21:38:06.4207287Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4207361Z import triton 2023-01-11T21:38:06.4207453Z import triton.language as tl 2023-01-11T21:38:06.4207568Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4207662Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4207793Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4207919Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4207949Z 2023-01-11T21:38:06.4208386Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4208459Z @triton.jit 2023-01-11T21:38:06.4208611Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4208685Z xnumel = 8192 2023-01-11T21:38:06.4208783Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4208903Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4208990Z xmask = xindex < xnumel 2023-01-11T21:38:06.4209061Z x0 = xindex 2023-01-11T21:38:06.4209156Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4209289Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.4209385Z tmp2 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.4209485Z tmp3 = tl.where(tmp0, tmp1, tmp2) 2023-01-11T21:38:06.4209611Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4209694Z ''') 2023-01-11T21:38:06.4209699Z 2023-01-11T21:38:06.4209704Z 2023-01-11T21:38:06.4209794Z async_compile.wait(globals()) 2023-01-11T21:38:06.4209870Z del async_compile 2023-01-11T21:38:06.4209875Z 2023-01-11T21:38:06.4209947Z def call(args): 2023-01-11T21:38:06.4210031Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4210106Z args.clear() 2023-01-11T21:38:06.4210197Z with torch.cuda.device(0): 2023-01-11T21:38:06.4210401Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4210532Z stream0 = get_cuda_stream(0) 
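# get_cuda_stream is torch._C._cuda_getCurrentRawStream (aliased in the
# imports above), so the launch below lands on PyTorch's current stream
# rather than a private one; same-stream ordering is what keeps this kernel
# correctly sequenced with surrounding eager ops without extra synchronization.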
2023-01-11T21:38:06.4210695Z triton_fused_clone_index_put__0.run(arg1_1, arg2_1, arg0_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4210767Z del arg0_1 2023-01-11T21:38:06.4210839Z del arg1_1 2023-01-11T21:38:06.4210914Z del arg2_1 2023-01-11T21:38:06.4210993Z return (buf0, ) 2023-01-11T21:38:06.4210998Z 2023-01-11T21:38:06.4211003Z 2023-01-11T21:38:06.4211076Z if __name__ == "__main__": 2023-01-11T21:38:06.4211194Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4211319Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4211528Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4211729Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4211914Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4212044Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4212049Z 2023-01-11T21:38:06.4212054Z 2023-01-11T21:38:06.4212151Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4212218Z import torch 2023-01-11T21:38:06.4212293Z import random 2023-01-11T21:38:06.4212416Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4212540Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4212545Z 2023-01-11T21:38:06.4212626Z aten = torch.ops.aten 2023-01-11T21:38:06.4212767Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4212863Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4212868Z 2023-01-11T21:38:06.4212940Z import triton 2023-01-11T21:38:06.4213025Z import triton.language as tl 2023-01-11T21:38:06.4213151Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4213292Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4213300Z 2023-01-11T21:38:06.4213305Z 2023-01-11T21:38:06.4213478Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4213552Z import triton 2023-01-11T21:38:06.4213647Z import triton.language as tl 2023-01-11T21:38:06.4213758Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4213887Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4214014Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4214137Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4214143Z 2023-01-11T21:38:06.4214767Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4214842Z @triton.jit 2023-01-11T21:38:06.4214990Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4215066Z xnumel = 8192 2023-01-11T21:38:06.4215162Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4215290Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4215366Z xmask = xindex < xnumel 2023-01-11T21:38:06.4215436Z x0 = xindex 2023-01-11T21:38:06.4215534Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4215678Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.4215795Z tmp2 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4215887Z tmp3 = tl.where(tmp0, tmp1, tmp2) 
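# The where() above is the entire masked fill: tmp0 is the boolean mask,
# tmp1 the 0-d fill value (note the broadcast load at offset 0), and tmp2
# the input tensor. A plausible eager equivalent, inferred from the kernel
# rather than taken from the test source:
#
#     out = torch.where(mask, value, x)  # ~ x.clone().index_put_((mask,), value)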
2023-01-11T21:38:06.4216023Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4216103Z ''') 2023-01-11T21:38:06.4216109Z 2023-01-11T21:38:06.4216119Z 2023-01-11T21:38:06.4216205Z async_compile.wait(globals()) 2023-01-11T21:38:06.4216281Z del async_compile 2023-01-11T21:38:06.4216331Z 2023-01-11T21:38:06.4216404Z def call(args): 2023-01-11T21:38:06.4216491Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4216565Z args.clear() 2023-01-11T21:38:06.4216655Z with torch.cuda.device(0): 2023-01-11T21:38:06.4216864Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4216952Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4217120Z triton_fused_clone_index_put__0.run(arg1_1, arg2_1, arg0_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4217259Z del arg0_1 2023-01-11T21:38:06.4217340Z del arg1_1 2023-01-11T21:38:06.4217411Z del arg2_1 2023-01-11T21:38:06.4217486Z return (buf0, ) 2023-01-11T21:38:06.4217491Z 2023-01-11T21:38:06.4217496Z 2023-01-11T21:38:06.4217575Z if __name__ == "__main__": 2023-01-11T21:38:06.4217685Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4217812Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4218026Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4218227Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4218414Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4218545Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4218550Z 2023-01-11T21:38:06.4218555Z 2023-01-11T21:38:06.4218656Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4218730Z import torch 2023-01-11T21:38:06.4218797Z import random 2023-01-11T21:38:06.4218915Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4219036Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4219041Z 2023-01-11T21:38:06.4219123Z aten = torch.ops.aten 2023-01-11T21:38:06.4219258Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4219350Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4219359Z 2023-01-11T21:38:06.4219431Z import triton 2023-01-11T21:38:06.4219523Z import triton.language as tl 2023-01-11T21:38:06.4219641Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4219779Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4219784Z 2023-01-11T21:38:06.4219826Z 2023-01-11T21:38:06.4220004Z triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4220080Z import triton 2023-01-11T21:38:06.4220171Z import triton.language as tl 2023-01-11T21:38:06.4220283Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4220384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4220509Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4220633Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4220638Z 2023-01-11T21:38:06.4221070Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4221147Z @triton.jit 2023-01-11T21:38:06.4221298Z def 
triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4221373Z xnumel = 8192 2023-01-11T21:38:06.4221467Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4221598Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4221678Z xmask = xindex < xnumel 2023-01-11T21:38:06.4221741Z x0 = xindex 2023-01-11T21:38:06.4221838Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4221933Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4222068Z tmp2 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.4222146Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4222243Z tmp4 = tl.where(tmp0, tmp3, tmp1) 2023-01-11T21:38:06.4222401Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4222479Z ''') 2023-01-11T21:38:06.4222485Z 2023-01-11T21:38:06.4222490Z 2023-01-11T21:38:06.4222582Z async_compile.wait(globals()) 2023-01-11T21:38:06.4222658Z del async_compile 2023-01-11T21:38:06.4222663Z 2023-01-11T21:38:06.4222738Z def call(args): 2023-01-11T21:38:06.4222830Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4222903Z args.clear() 2023-01-11T21:38:06.4222995Z with torch.cuda.device(0): 2023-01-11T21:38:06.4223196Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4223288Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4223450Z triton_fused_clone_index_put__0.run(arg1_1, arg0_1, arg2_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4223520Z del arg0_1 2023-01-11T21:38:06.4223592Z del arg1_1 2023-01-11T21:38:06.4223662Z del arg2_1 2023-01-11T21:38:06.4223742Z return (buf0, ) 2023-01-11T21:38:06.4223747Z 2023-01-11T21:38:06.4223752Z 2023-01-11T21:38:06.4223831Z if __name__ == "__main__": 2023-01-11T21:38:06.4223942Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4224068Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4224281Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4224481Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4224668Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4224794Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4224799Z 2023-01-11T21:38:06.4225062Z [2023-01-11 21:34:49,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 600 2023-01-11T21:38:06.4225068Z 2023-01-11T21:38:06.4225165Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4225234Z import torch 2023-01-11T21:38:06.4225306Z import random 2023-01-11T21:38:06.4225432Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4225557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4225563Z 2023-01-11T21:38:06.4225665Z aten = torch.ops.aten 2023-01-11T21:38:06.4225848Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4225944Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4225950Z 2023-01-11T21:38:06.4226020Z import triton 2023-01-11T21:38:06.4226106Z import triton.language as tl 2023-01-11T21:38:06.4226229Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4226364Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4226369Z 2023-01-11T21:38:06.4226374Z 2023-01-11T21:38:06.4226548Z 
triton_fused_clone_index_put__0 = async_compile.triton(''' 2023-01-11T21:38:06.4226621Z import triton 2023-01-11T21:38:06.4226713Z import triton.language as tl 2023-01-11T21:38:06.4226830Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4226925Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4227060Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4227185Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4227190Z 2023-01-11T21:38:06.4227616Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4227690Z @triton.jit 2023-01-11T21:38:06.4227839Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4227912Z xnumel = 8192 2023-01-11T21:38:06.4228007Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4228135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4228240Z xmask = xindex < xnumel 2023-01-11T21:38:06.4228310Z x0 = xindex 2023-01-11T21:38:06.4228405Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4228521Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4228666Z tmp2 = tl.load(in_ptr2 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.4228745Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4228840Z tmp4 = tl.where(tmp0, tmp3, tmp1) 2023-01-11T21:38:06.4228967Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4229050Z ''') 2023-01-11T21:38:06.4229056Z 2023-01-11T21:38:06.4229060Z 2023-01-11T21:38:06.4229151Z async_compile.wait(globals()) 2023-01-11T21:38:06.4229227Z del async_compile 2023-01-11T21:38:06.4229232Z 2023-01-11T21:38:06.4229305Z def call(args): 2023-01-11T21:38:06.4229391Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4229466Z args.clear() 2023-01-11T21:38:06.4229550Z with torch.cuda.device(0): 2023-01-11T21:38:06.4229760Z buf0 = empty_strided((1024, 4, 2), (8, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4229850Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4230015Z triton_fused_clone_index_put__0.run(arg1_1, arg0_1, arg2_1, buf0, 8192, grid=grid(8192), stream=stream0) 2023-01-11T21:38:06.4230086Z del arg0_1 2023-01-11T21:38:06.4230160Z del arg1_1 2023-01-11T21:38:06.4230231Z del arg2_1 2023-01-11T21:38:06.4230301Z return (buf0, ) 2023-01-11T21:38:06.4230312Z 2023-01-11T21:38:06.4230317Z 2023-01-11T21:38:06.4230390Z if __name__ == "__main__": 2023-01-11T21:38:06.4230507Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4230630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4230840Z arg0_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4231043Z arg1_1 = rand_strided((1024, 4, 2), (8, 2, 1), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4231230Z arg2_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4231356Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4231362Z 2023-01-11T21:38:06.4231430Z ok (0.501s) 2023-01-11T21:38:06.4231919Z test_index_put_fallback1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4232051Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4232308Z [2023-01-11 21:34:49,910] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 601 2023-01-11T21:38:06.4232575Z [2023-01-11 21:34:50,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 601 2023-01-11T21:38:06.4232992Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4233122Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4233374Z [2023-01-11 21:34:50,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 602 2023-01-11T21:38:06.4233636Z [2023-01-11 21:34:50,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 602 2023-01-11T21:38:06.4234045Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4234203Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4234459Z [2023-01-11 21:34:50,227] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 603 2023-01-11T21:38:06.4234721Z [2023-01-11 21:34:50,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 603 2023-01-11T21:38:06.4235130Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4235262Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4235560Z [2023-01-11 21:34:50,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 604 2023-01-11T21:38:06.4235566Z 2023-01-11T21:38:06.4235664Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4235738Z import torch 2023-01-11T21:38:06.4235815Z import random 2023-01-11T21:38:06.4235936Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4236060Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4236066Z 2023-01-11T21:38:06.4236146Z aten = torch.ops.aten 2023-01-11T21:38:06.4236275Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4236368Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4236373Z 2023-01-11T21:38:06.4236446Z import triton 2023-01-11T21:38:06.4236537Z import triton.language as tl 2023-01-11T21:38:06.4236661Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4236800Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4236808Z 2023-01-11T21:38:06.4236813Z 2023-01-11T21:38:06.4236973Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4237048Z import triton 2023-01-11T21:38:06.4237132Z import triton.language as tl 2023-01-11T21:38:06.4237247Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4237378Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4237512Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4237636Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4237641Z 2023-01-11T21:38:06.4238046Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4238118Z @triton.jit 2023-01-11T21:38:06.4238250Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4238320Z xnumel = 3 2023-01-11T21:38:06.4238415Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4238541Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4238623Z xmask = xindex < xnumel 2023-01-11T21:38:06.4238695Z x0 = xindex 2023-01-11T21:38:06.4238794Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4238928Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4239006Z ''') 2023-01-11T21:38:06.4239012Z 2023-01-11T21:38:06.4239017Z 2023-01-11T21:38:06.4239109Z async_compile.wait(globals()) 2023-01-11T21:38:06.4239187Z del async_compile 2023-01-11T21:38:06.4239192Z 2023-01-11T21:38:06.4239266Z def call(args): 2023-01-11T21:38:06.4239351Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4239424Z args.clear() 2023-01-11T21:38:06.4239516Z with torch.cuda.device(0): 2023-01-11T21:38:06.4239705Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4239825Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4239961Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4240038Z del arg0_1 2023-01-11T21:38:06.4240149Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:06.4240223Z del arg1_1 2023-01-11T21:38:06.4240296Z del arg2_1 2023-01-11T21:38:06.4240366Z 
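# Annotation (not part of the captured log): as the test name
# test_index_put_fallback1 suggests, Inductor does not codegen this
# boolean-mask index_put (3-element mask, 2 values); only the defensive clone
# of the input is compiled, and the put itself is the aten call above.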
return (buf0, ) 2023-01-11T21:38:06.4240379Z 2023-01-11T21:38:06.4240384Z 2023-01-11T21:38:06.4240456Z if __name__ == "__main__": 2023-01-11T21:38:06.4240572Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4240697Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4240901Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4241089Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4241280Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4241413Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4241418Z 2023-01-11T21:38:06.4241423Z 2023-01-11T21:38:06.4241523Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4241590Z import torch 2023-01-11T21:38:06.4241664Z import random 2023-01-11T21:38:06.4241783Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4241907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4241912Z 2023-01-11T21:38:06.4241993Z aten = torch.ops.aten 2023-01-11T21:38:06.4242129Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4242223Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4242230Z 2023-01-11T21:38:06.4242296Z import triton 2023-01-11T21:38:06.4242386Z import triton.language as tl 2023-01-11T21:38:06.4242510Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4242646Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4242654Z 2023-01-11T21:38:06.4242658Z 2023-01-11T21:38:06.4242813Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4242888Z import triton 2023-01-11T21:38:06.4242979Z import triton.language as tl 2023-01-11T21:38:06.4243093Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4243213Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4243348Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4243472Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4243478Z 2023-01-11T21:38:06.4243870Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4243943Z @triton.jit 2023-01-11T21:38:06.4244074Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4244150Z xnumel = 3 2023-01-11T21:38:06.4244248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4244369Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4244452Z xmask = xindex < xnumel 2023-01-11T21:38:06.4244522Z x0 = xindex 2023-01-11T21:38:06.4244641Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4244775Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4244862Z ''') 2023-01-11T21:38:06.4244867Z 2023-01-11T21:38:06.4244871Z 2023-01-11T21:38:06.4244963Z async_compile.wait(globals()) 2023-01-11T21:38:06.4245032Z del async_compile 2023-01-11T21:38:06.4245043Z 2023-01-11T21:38:06.4245110Z def call(args): 2023-01-11T21:38:06.4245196Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4245270Z args.clear() 2023-01-11T21:38:06.4245361Z with torch.cuda.device(0): 2023-01-11T21:38:06.4245557Z buf0 = empty_strided((3, ), (1, ), device='cuda', 
dtype=torch.float16) 2023-01-11T21:38:06.4245674Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4245809Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4245875Z del arg0_1 2023-01-11T21:38:06.4245988Z aten.index_put_(buf0, [arg1_1], arg2_1, False) 2023-01-11T21:38:06.4246059Z del arg1_1 2023-01-11T21:38:06.4246136Z del arg2_1 2023-01-11T21:38:06.4246213Z return (buf0, ) 2023-01-11T21:38:06.4246218Z 2023-01-11T21:38:06.4246223Z 2023-01-11T21:38:06.4246301Z if __name__ == "__main__": 2023-01-11T21:38:06.4246418Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4246536Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4246731Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4246920Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4247110Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4247242Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4247247Z 2023-01-11T21:38:06.4247252Z 2023-01-11T21:38:06.4247348Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4247421Z import torch 2023-01-11T21:38:06.4247495Z import random 2023-01-11T21:38:06.4247610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4247731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4247736Z 2023-01-11T21:38:06.4247816Z aten = torch.ops.aten 2023-01-11T21:38:06.4247949Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4248043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4248048Z 2023-01-11T21:38:06.4248122Z import triton 2023-01-11T21:38:06.4248213Z import triton.language as tl 2023-01-11T21:38:06.4248337Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4248467Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4248474Z 2023-01-11T21:38:06.4248478Z 2023-01-11T21:38:06.4248635Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4248708Z import triton 2023-01-11T21:38:06.4248798Z import triton.language as tl 2023-01-11T21:38:06.4248912Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4249039Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4249172Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4249289Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4249299Z 2023-01-11T21:38:06.4249684Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4249757Z @triton.jit 2023-01-11T21:38:06.4249888Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4249961Z xnumel = 3 2023-01-11T21:38:06.4250058Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4250185Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4250268Z xmask = xindex < xnumel 2023-01-11T21:38:06.4250332Z x0 = xindex 2023-01-11T21:38:06.4250428Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4250562Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4250647Z ''') 2023-01-11T21:38:06.4250652Z 2023-01-11T21:38:06.4250657Z 2023-01-11T21:38:06.4250752Z 
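# Annotation (not part of the captured log): same clone kernel as in the
# earlier float32 listing; the only change in this module is the fallback
# call below, which now passes accumulate=True to aten.index_put_.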
async_compile.wait(globals()) 2023-01-11T21:38:06.4250828Z del async_compile 2023-01-11T21:38:06.4250833Z 2023-01-11T21:38:06.4250907Z def call(args): 2023-01-11T21:38:06.4250992Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4251061Z args.clear() 2023-01-11T21:38:06.4251151Z with torch.cuda.device(0): 2023-01-11T21:38:06.4251344Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4251435Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4251610Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4251680Z del arg0_1 2023-01-11T21:38:06.4251791Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:06.4251856Z del arg1_1 2023-01-11T21:38:06.4251931Z del arg2_1 2023-01-11T21:38:06.4252010Z return (buf0, ) 2023-01-11T21:38:06.4252015Z 2023-01-11T21:38:06.4252020Z 2023-01-11T21:38:06.4252100Z if __name__ == "__main__": 2023-01-11T21:38:06.4252218Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4252343Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4252540Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4252730Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4252913Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4253046Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4253051Z 2023-01-11T21:38:06.4253319Z [2023-01-11 21:34:50,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 604 2023-01-11T21:38:06.4253325Z 2023-01-11T21:38:06.4253424Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4253501Z import torch 2023-01-11T21:38:06.4253575Z import random 2023-01-11T21:38:06.4253694Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4253816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4253821Z 2023-01-11T21:38:06.4253896Z aten = torch.ops.aten 2023-01-11T21:38:06.4254030Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4254124Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4254129Z 2023-01-11T21:38:06.4254202Z import triton 2023-01-11T21:38:06.4254295Z import triton.language as tl 2023-01-11T21:38:06.4254421Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4254690Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4254696Z 2023-01-11T21:38:06.4254701Z 2023-01-11T21:38:06.4254858Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4254925Z import triton 2023-01-11T21:38:06.4255016Z import triton.language as tl 2023-01-11T21:38:06.4255174Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4255283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4255413Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4255537Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4255542Z 2023-01-11T21:38:06.4255943Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4256017Z @triton.jit 2023-01-11T21:38:06.4256148Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4256221Z xnumel = 3 
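    # Standard pointwise prologue: each program instance covers XBLOCK
    # contiguous elements; xmask disables the lanes past xnumel (here 3),
    # since XBLOCK generally does not divide the element count evenly.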
2023-01-11T21:38:06.4256315Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4256442Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4256525Z xmask = xindex < xnumel 2023-01-11T21:38:06.4256597Z x0 = xindex 2023-01-11T21:38:06.4256713Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4256840Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4256924Z ''') 2023-01-11T21:38:06.4256930Z 2023-01-11T21:38:06.4256934Z 2023-01-11T21:38:06.4257026Z async_compile.wait(globals()) 2023-01-11T21:38:06.4257102Z del async_compile 2023-01-11T21:38:06.4257107Z 2023-01-11T21:38:06.4257244Z def call(args): 2023-01-11T21:38:06.4257359Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4257443Z args.clear() 2023-01-11T21:38:06.4257528Z with torch.cuda.device(0): 2023-01-11T21:38:06.4257766Z buf0 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4257860Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4257998Z triton_fused_clone_0.run(arg0_1, buf0, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.4258072Z del arg0_1 2023-01-11T21:38:06.4258185Z aten.index_put_(buf0, [arg1_1], arg2_1, True) 2023-01-11T21:38:06.4258255Z del arg1_1 2023-01-11T21:38:06.4258320Z del arg2_1 2023-01-11T21:38:06.4258397Z return (buf0, ) 2023-01-11T21:38:06.4258403Z 2023-01-11T21:38:06.4258407Z 2023-01-11T21:38:06.4258488Z if __name__ == "__main__": 2023-01-11T21:38:06.4258605Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4258734Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4258929Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4259121Z arg1_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4259313Z arg2_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4259432Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4259441Z 2023-01-11T21:38:06.4259505Z ok (0.432s) 2023-01-11T21:38:06.4259972Z test_index_put_fallback2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4260105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4260361Z [2023-01-11 21:34:50,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 605 2023-01-11T21:38:06.4260624Z [2023-01-11 21:34:50,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 605 2023-01-11T21:38:06.4261066Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4261198Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4261452Z [2023-01-11 21:34:50,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 606 2023-01-11T21:38:06.4261711Z [2023-01-11 21:34:50,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 606 2023-01-11T21:38:06.4262123Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4262256Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4262515Z [2023-01-11 21:34:50,664] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 607 2023-01-11T21:38:06.4262771Z [2023-01-11 21:34:50,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 607 2023-01-11T21:38:06.4263182Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4263313Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4263596Z [2023-01-11 21:34:50,723] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 608 2023-01-11T21:38:06.4263601Z 2023-01-11T21:38:06.4263702Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4263784Z import torch 2023-01-11T21:38:06.4263865Z import random 2023-01-11T21:38:06.4263986Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4264112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4264118Z 2023-01-11T21:38:06.4264195Z aten = torch.ops.aten 2023-01-11T21:38:06.4264337Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4264434Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4264439Z 2023-01-11T21:38:06.4264512Z import triton 2023-01-11T21:38:06.4264609Z import triton.language as tl 2023-01-11T21:38:06.4264735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4264879Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4264884Z 2023-01-11T21:38:06.4264889Z 2023-01-11T21:38:06.4265048Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4265118Z import triton 2023-01-11T21:38:06.4265210Z import triton.language as tl 2023-01-11T21:38:06.4265333Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4265438Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4265594Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4265733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4265739Z 2023-01-11T21:38:06.4266150Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4266226Z @triton.jit 2023-01-11T21:38:06.4266354Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4266432Z xnumel = 6 2023-01-11T21:38:06.4266534Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4266664Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4266750Z xmask = xindex < xnumel 2023-01-11T21:38:06.4266823Z x0 = xindex 2023-01-11T21:38:06.4266980Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4267113Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4267201Z ''') 2023-01-11T21:38:06.4267207Z 2023-01-11T21:38:06.4267212Z 2023-01-11T21:38:06.4267306Z async_compile.wait(globals()) 2023-01-11T21:38:06.4267386Z del async_compile 2023-01-11T21:38:06.4267391Z 2023-01-11T21:38:06.4267467Z def call(args): 2023-01-11T21:38:06.4267563Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4267639Z args.clear() 2023-01-11T21:38:06.4267726Z with torch.cuda.device(0): 2023-01-11T21:38:06.4267935Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4268034Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4268173Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4268248Z del arg0_1 2023-01-11T21:38:06.4268379Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:06.4268457Z del arg1_1 2023-01-11T21:38:06.4268524Z del arg2_1 2023-01-11T21:38:06.4268595Z del arg3_1 2023-01-11T21:38:06.4268673Z return (buf0, ) 2023-01-11T21:38:06.4268678Z 2023-01-11T21:38:06.4268682Z 2023-01-11T21:38:06.4268764Z if __name__ == "__main__": 2023-01-11T21:38:06.4268890Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4269017Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4269231Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4269423Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4269633Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4269824Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4269958Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4269965Z 2023-01-11T21:38:06.4269970Z 2023-01-11T21:38:06.4270074Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4270151Z import torch 2023-01-11T21:38:06.4270229Z import random 2023-01-11T21:38:06.4270348Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4270474Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4270479Z 2023-01-11T21:38:06.4270556Z aten = torch.ops.aten 2023-01-11T21:38:06.4270695Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4270789Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4270794Z 2023-01-11T21:38:06.4270871Z import triton 2023-01-11T21:38:06.4270971Z import triton.language as tl 2023-01-11T21:38:06.4271100Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4271238Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4271244Z 2023-01-11T21:38:06.4271248Z 2023-01-11T21:38:06.4271407Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4271478Z import triton 2023-01-11T21:38:06.4271573Z import 
triton.language as tl 2023-01-11T21:38:06.4271687Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4271792Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4271928Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4272050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4272056Z 2023-01-11T21:38:06.4272453Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4272534Z @triton.jit 2023-01-11T21:38:06.4272662Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4272734Z xnumel = 6 2023-01-11T21:38:06.4272834Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4272994Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4273081Z xmask = xindex < xnumel 2023-01-11T21:38:06.4273159Z x0 = xindex 2023-01-11T21:38:06.4273279Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4273409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4273497Z ''') 2023-01-11T21:38:06.4273503Z 2023-01-11T21:38:06.4273507Z 2023-01-11T21:38:06.4273601Z async_compile.wait(globals()) 2023-01-11T21:38:06.4273682Z del async_compile 2023-01-11T21:38:06.4273687Z 2023-01-11T21:38:06.4273763Z def call(args): 2023-01-11T21:38:06.4273858Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4273939Z args.clear() 2023-01-11T21:38:06.4274033Z with torch.cuda.device(0): 2023-01-11T21:38:06.4274233Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4274325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4274463Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4274542Z del arg0_1 2023-01-11T21:38:06.4274671Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, False) 2023-01-11T21:38:06.4274745Z del arg1_1 2023-01-11T21:38:06.4274821Z del arg2_1 2023-01-11T21:38:06.4274889Z del arg3_1 2023-01-11T21:38:06.4274968Z return (buf0, ) 2023-01-11T21:38:06.4274973Z 2023-01-11T21:38:06.4274977Z 2023-01-11T21:38:06.4275063Z if __name__ == "__main__": 2023-01-11T21:38:06.4275195Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4275337Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4275593Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4275786Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4275977Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4276161Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4276295Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4276300Z 2023-01-11T21:38:06.4276305Z 2023-01-11T21:38:06.4276405Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4276481Z import torch 2023-01-11T21:38:06.4276557Z import random 2023-01-11T21:38:06.4276676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4276802Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4276807Z 2023-01-11T21:38:06.4276890Z aten = torch.ops.aten 2023-01-11T21:38:06.4277021Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 
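# Annotation (not part of the captured log): this preamble is common to every
# generated module. assert_size_stride is used to check that runtime tensors
# match the sizes/strides assumed at compile time, and AsyncCompile lets the
# Triton kernels in the module build in the background before .wait().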
2023-01-11T21:38:06.4277121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4277126Z 2023-01-11T21:38:06.4277203Z import triton 2023-01-11T21:38:06.4277296Z import triton.language as tl 2023-01-11T21:38:06.4277423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4277566Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4277571Z 2023-01-11T21:38:06.4277575Z 2023-01-11T21:38:06.4277733Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4277810Z import triton 2023-01-11T21:38:06.4277898Z import triton.language as tl 2023-01-11T21:38:06.4278013Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4278117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4278252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4278377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4278382Z 2023-01-11T21:38:06.4278776Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4278853Z @triton.jit 2023-01-11T21:38:06.4278985Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4279081Z xnumel = 6 2023-01-11T21:38:06.4279182Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4279311Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4279396Z xmask = xindex < xnumel 2023-01-11T21:38:06.4279470Z x0 = xindex 2023-01-11T21:38:06.4279568Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4279706Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4279787Z ''') 2023-01-11T21:38:06.4279793Z 2023-01-11T21:38:06.4279797Z 2023-01-11T21:38:06.4279896Z async_compile.wait(globals()) 2023-01-11T21:38:06.4279975Z del async_compile 2023-01-11T21:38:06.4279980Z 2023-01-11T21:38:06.4280056Z def call(args): 2023-01-11T21:38:06.4280151Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4280227Z args.clear() 2023-01-11T21:38:06.4280320Z with torch.cuda.device(0): 2023-01-11T21:38:06.4280521Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4280618Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4280757Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4280832Z del arg0_1 2023-01-11T21:38:06.4280958Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:06.4281036Z del arg1_1 2023-01-11T21:38:06.4281111Z del arg2_1 2023-01-11T21:38:06.4281178Z del arg3_1 2023-01-11T21:38:06.4281256Z return (buf0, ) 2023-01-11T21:38:06.4281261Z 2023-01-11T21:38:06.4281266Z 2023-01-11T21:38:06.4281348Z if __name__ == "__main__": 2023-01-11T21:38:06.4281538Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4281663Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4281870Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4282065Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4282257Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4282439Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4282577Z print_performance(lambda: call([arg0_1, arg1_1, 
arg2_1, arg3_1])) 2023-01-11T21:38:06.4282582Z 2023-01-11T21:38:06.4282849Z [2023-01-11 21:34:50,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 608 2023-01-11T21:38:06.4282856Z 2023-01-11T21:38:06.4282955Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4283032Z import torch 2023-01-11T21:38:06.4283109Z import random 2023-01-11T21:38:06.4283228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4283352Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4283358Z 2023-01-11T21:38:06.4283434Z aten = torch.ops.aten 2023-01-11T21:38:06.4283571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4283671Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4283676Z 2023-01-11T21:38:06.4283752Z import triton 2023-01-11T21:38:06.4283847Z import triton.language as tl 2023-01-11T21:38:06.4283972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4284115Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4284121Z 2023-01-11T21:38:06.4284126Z 2023-01-11T21:38:06.4284284Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.4284354Z import triton 2023-01-11T21:38:06.4284449Z import triton.language as tl 2023-01-11T21:38:06.4284564Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4284669Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4284803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4284931Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4284936Z 2023-01-11T21:38:06.4285367Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4285441Z @triton.jit 2023-01-11T21:38:06.4285568Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4285641Z xnumel = 6 2023-01-11T21:38:06.4285740Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4285871Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4285959Z xmask = xindex < xnumel 2023-01-11T21:38:06.4286033Z x0 = xindex 2023-01-11T21:38:06.4286151Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4286282Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.4286369Z ''') 2023-01-11T21:38:06.4286375Z 2023-01-11T21:38:06.4286379Z 2023-01-11T21:38:06.4286476Z async_compile.wait(globals()) 2023-01-11T21:38:06.4286554Z del async_compile 2023-01-11T21:38:06.4286561Z 2023-01-11T21:38:06.4286637Z def call(args): 2023-01-11T21:38:06.4286733Z arg0_1, arg1_1, arg2_1, arg3_1 = args 2023-01-11T21:38:06.4286811Z args.clear() 2023-01-11T21:38:06.4286904Z with torch.cuda.device(0): 2023-01-11T21:38:06.4287104Z buf0 = empty_strided((1, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4287199Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4287339Z triton_fused_clone_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.4287415Z del arg0_1 2023-01-11T21:38:06.4287542Z aten.index_put_(buf0, [None,arg1_1,arg2_1], arg3_1, True) 2023-01-11T21:38:06.4287642Z del arg1_1 2023-01-11T21:38:06.4287717Z del arg2_1 2023-01-11T21:38:06.4287784Z del arg3_1 2023-01-11T21:38:06.4287863Z return (buf0, ) 2023-01-11T21:38:06.4287868Z 
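# Annotation (not part of the captured log): an eager-mode equivalent of this
# compiled graph would be roughly the sketch below (illustrative names, not
# from the test):
#   out = x.clone()
#   out.index_put_([None, idx, mask], value, accumulate=True)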
2023-01-11T21:38:06.4287873Z 2023-01-11T21:38:06.4287954Z if __name__ == "__main__": 2023-01-11T21:38:06.4288076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4288204Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4288413Z arg0_1 = rand_strided((1, 2, 3), (6, 3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4288605Z arg1_1 = rand_strided((2, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4288795Z arg2_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.4288977Z arg3_1 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4289113Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1])) 2023-01-11T21:38:06.4289122Z 2023-01-11T21:38:06.4289197Z ok (0.440s) 2023-01-11T21:38:06.4289660Z test_index_select_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4289793Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4290054Z [2023-01-11 21:34:50,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 609 2023-01-11T21:38:06.4290319Z [2023-01-11 21:34:50,894] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 609 2023-01-11T21:38:06.4290733Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4290868Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4291150Z [2023-01-11 21:34:50,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 610 2023-01-11T21:38:06.4291156Z 2023-01-11T21:38:06.4291258Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4291328Z import torch 2023-01-11T21:38:06.4291404Z import random 2023-01-11T21:38:06.4291525Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4291650Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4291655Z 2023-01-11T21:38:06.4291738Z aten = torch.ops.aten 2023-01-11T21:38:06.4291875Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4291974Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4291979Z 2023-01-11T21:38:06.4292049Z import triton 2023-01-11T21:38:06.4292143Z import triton.language as tl 2023-01-11T21:38:06.4292276Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4292417Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4292424Z 2023-01-11T21:38:06.4292429Z 2023-01-11T21:38:06.4292589Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4292668Z import triton 2023-01-11T21:38:06.4292762Z import triton.language as tl 2023-01-11T21:38:06.4292880Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4292977Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4293110Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4293238Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4293243Z 2023-01-11T21:38:06.4293663Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4293786Z @triton.jit 2023-01-11T21:38:06.4293928Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4294002Z xnumel = 256 2023-01-11T21:38:06.4294103Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4294228Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4294314Z xmask = xindex < xnumel 2023-01-11T21:38:06.4294400Z x1 = (xindex // 64) 2023-01-11T21:38:06.4294587Z x0 = xindex % 64 2023-01-11T21:38:06.4294662Z x2 = xindex 2023-01-11T21:38:06.4294861Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4295069Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4295197Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4295290Z ''') 2023-01-11T21:38:06.4295295Z 2023-01-11T21:38:06.4295300Z 2023-01-11T21:38:06.4295460Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4295535Z import triton 2023-01-11T21:38:06.4295633Z import triton.language as tl 2023-01-11T21:38:06.4295753Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4295858Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4295996Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4296116Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4296121Z 2023-01-11T21:38:06.4296540Z 
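# Annotation (not part of the captured log): second of the three gather
# kernels from test_index_select_cuda. This one applies the index vector to
# dim 1 of the (8, 8, 8) input (source offset x0 + 8*tmp0 + 64*x2), leaving
# dims 0 and 2 untouched.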
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4296616Z @triton.jit 2023-01-11T21:38:06.4296755Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4296835Z xnumel = 256 2023-01-11T21:38:06.4296932Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4297062Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4297196Z xmask = xindex < xnumel 2023-01-11T21:38:06.4297335Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4297424Z x0 = xindex % 8 2023-01-11T21:38:06.4297519Z x2 = (xindex // 32) 2023-01-11T21:38:06.4297592Z x3 = xindex 2023-01-11T21:38:06.4297785Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4298001Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4298136Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4298213Z ''') 2023-01-11T21:38:06.4298219Z 2023-01-11T21:38:06.4298223Z 2023-01-11T21:38:06.4298381Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4298462Z import triton 2023-01-11T21:38:06.4298553Z import triton.language as tl 2023-01-11T21:38:06.4298670Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4298770Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4298903Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4299023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4299035Z 2023-01-11T21:38:06.4299444Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4299516Z @triton.jit 2023-01-11T21:38:06.4299653Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4299725Z xnumel = 128 2023-01-11T21:38:06.4299822Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4299953Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4300074Z xmask = xindex < xnumel 2023-01-11T21:38:06.4300147Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4300220Z x0 = xindex % 4 2023-01-11T21:38:06.4300295Z x2 = (xindex // 16) 2023-01-11T21:38:06.4300365Z x4 = xindex 2023-01-11T21:38:06.4300466Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4300562Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4300681Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask) 2023-01-11T21:38:06.4300805Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4300888Z ''') 2023-01-11T21:38:06.4300893Z 2023-01-11T21:38:06.4300897Z 2023-01-11T21:38:06.4300990Z async_compile.wait(globals()) 2023-01-11T21:38:06.4301066Z del async_compile 2023-01-11T21:38:06.4301071Z 2023-01-11T21:38:06.4301145Z def call(args): 2023-01-11T21:38:06.4301224Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4301298Z args.clear() 2023-01-11T21:38:06.4301395Z with torch.cuda.device(0): 2023-01-11T21:38:06.4301596Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4301686Z stream0 = get_cuda_stream(0) 
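        # Annotation (not part of the captured log): buf0 gathers along dim 0,
        # buf1 along dim 1, and buf2 applies the same index vector to two
        # dimensions at once; all three kernels launch on the same stream.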
2023-01-11T21:38:06.4301831Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4302037Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4302182Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4302384Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4302528Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4302600Z del arg0_1 2023-01-11T21:38:06.4302666Z del arg1_1 2023-01-11T21:38:06.4302756Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4302763Z 2023-01-11T21:38:06.4302768Z 2023-01-11T21:38:06.4302846Z if __name__ == "__main__": 2023-01-11T21:38:06.4302963Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4303088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4303328Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4303520Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.4303638Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4303643Z 2023-01-11T21:38:06.4303901Z [2023-01-11 21:34:51,039] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 610 2023-01-11T21:38:06.4304316Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4304448Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4304702Z [2023-01-11 21:34:51,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 611 2023-01-11T21:38:06.4304707Z 2023-01-11T21:38:06.4304807Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4304882Z import torch 2023-01-11T21:38:06.4304957Z import random 2023-01-11T21:38:06.4305077Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4305203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4305208Z 2023-01-11T21:38:06.4305283Z aten = torch.ops.aten 2023-01-11T21:38:06.4305437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4305540Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4305547Z 2023-01-11T21:38:06.4305638Z import triton 2023-01-11T21:38:06.4305733Z import triton.language as tl 2023-01-11T21:38:06.4305893Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4306033Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4306038Z 2023-01-11T21:38:06.4306043Z 2023-01-11T21:38:06.4306202Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4306271Z import triton 2023-01-11T21:38:06.4306365Z import triton.language as tl 2023-01-11T21:38:06.4306478Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4306579Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4306713Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4306836Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4306841Z 2023-01-11T21:38:06.4307256Z 
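# Annotation (not part of the captured log): the kernels in this module look
# like the float16 re-run of the same three gather kernels; apart from the
# *fp16 pointer types and the `.to(tl.float32)` upcasts on the gathered
# loads, the indexing math is unchanged.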
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4307332Z @triton.jit 2023-01-11T21:38:06.4307464Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4307538Z xnumel = 256 2023-01-11T21:38:06.4307634Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4307765Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4307849Z xmask = xindex < xnumel 2023-01-11T21:38:06.4307926Z x1 = (xindex // 64) 2023-01-11T21:38:06.4307999Z x0 = xindex % 64 2023-01-11T21:38:06.4308063Z x2 = xindex 2023-01-11T21:38:06.4308252Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4308485Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4308620Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4308703Z ''') 2023-01-11T21:38:06.4308708Z 2023-01-11T21:38:06.4308713Z 2023-01-11T21:38:06.4308874Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4308949Z import triton 2023-01-11T21:38:06.4309040Z import triton.language as tl 2023-01-11T21:38:06.4309148Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4309249Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4309408Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4309536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4309541Z 2023-01-11T21:38:06.4309958Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4310030Z @triton.jit 2023-01-11T21:38:06.4310169Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4310242Z xnumel = 256 2023-01-11T21:38:06.4310332Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4310463Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4310544Z xmask = xindex < xnumel 2023-01-11T21:38:06.4310622Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4310696Z x0 = xindex % 8 2023-01-11T21:38:06.4310770Z x2 = (xindex // 32) 2023-01-11T21:38:06.4310843Z x3 = xindex 2023-01-11T21:38:06.4311024Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4311263Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4311395Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4311479Z ''') 2023-01-11T21:38:06.4311485Z 2023-01-11T21:38:06.4311489Z 2023-01-11T21:38:06.4311647Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4311722Z import triton 2023-01-11T21:38:06.4311815Z import triton.language as tl 2023-01-11T21:38:06.4311950Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4312051Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4312182Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4312306Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4312311Z 2023-01-11T21:38:06.4312728Z 
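# Annotation (not part of the captured log): the double-indexed gather. tmp0
# and tmp1 are read from the same index vector (at positions x1 and x0) and
# combined into one source offset, which appears to be the compiled form of
# indexing two dimensions with broadcast copies of one index vector.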
@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4312802Z @triton.jit 2023-01-11T21:38:06.4312938Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4313013Z xnumel = 128 2023-01-11T21:38:06.4313109Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4313231Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4313317Z xmask = xindex < xnumel 2023-01-11T21:38:06.4313401Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4313474Z x0 = xindex % 4 2023-01-11T21:38:06.4313551Z x2 = (xindex // 16) 2023-01-11T21:38:06.4313620Z x4 = xindex 2023-01-11T21:38:06.4313711Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4313808Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4313944Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4314078Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4314163Z ''') 2023-01-11T21:38:06.4314169Z 2023-01-11T21:38:06.4314173Z 2023-01-11T21:38:06.4314267Z async_compile.wait(globals()) 2023-01-11T21:38:06.4314343Z del async_compile 2023-01-11T21:38:06.4314348Z 2023-01-11T21:38:06.4314421Z def call(args): 2023-01-11T21:38:06.4314494Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4314568Z args.clear() 2023-01-11T21:38:06.4314659Z with torch.cuda.device(0): 2023-01-11T21:38:06.4314865Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4314959Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4315108Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4315364Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4315536Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4315733Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4315879Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4315949Z del arg0_1 2023-01-11T21:38:06.4316022Z del arg1_1 2023-01-11T21:38:06.4316108Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4316114Z 2023-01-11T21:38:06.4316118Z 2023-01-11T21:38:06.4316196Z if __name__ == "__main__": 2023-01-11T21:38:06.4316321Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4316448Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4316647Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4316841Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.4316960Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4316965Z 2023-01-11T21:38:06.4317228Z [2023-01-11 21:34:51,184] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 611 2023-01-11T21:38:06.4317644Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4317800Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4318057Z [2023-01-11 21:34:51,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 612 2023-01-11T21:38:06.4318063Z 2023-01-11T21:38:06.4318160Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4318236Z import torch 2023-01-11T21:38:06.4318304Z import random 2023-01-11T21:38:06.4318422Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4318547Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4318553Z 2023-01-11T21:38:06.4318634Z aten = torch.ops.aten 2023-01-11T21:38:06.4318769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4318865Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4318870Z 2023-01-11T21:38:06.4318944Z import triton 2023-01-11T21:38:06.4319033Z import triton.language as tl 2023-01-11T21:38:06.4319151Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4319292Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4319297Z 2023-01-11T21:38:06.4319302Z 2023-01-11T21:38:06.4319459Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4319532Z import triton 2023-01-11T21:38:06.4319622Z import triton.language as tl 2023-01-11T21:38:06.4319739Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4319839Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4319964Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4320088Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4320093Z 2023-01-11T21:38:06.4320508Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4320583Z @triton.jit 2023-01-11T21:38:06.4320726Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4320799Z xnumel = 256 2023-01-11T21:38:06.4320895Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4321024Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4321101Z xmask = xindex < xnumel 2023-01-11T21:38:06.4321208Z x1 = (xindex // 64) 2023-01-11T21:38:06.4321284Z x0 = xindex % 64 2023-01-11T21:38:06.4321355Z x2 = xindex 2023-01-11T21:38:06.4321544Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4321749Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4321881Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4321958Z ''') 2023-01-11T21:38:06.4321968Z 2023-01-11T21:38:06.4321972Z 2023-01-11T21:38:06.4322125Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4322199Z import triton 2023-01-11T21:38:06.4322294Z import triton.language as tl 2023-01-11T21:38:06.4322408Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4322509Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4322643Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4322766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4322774Z 2023-01-11T21:38:06.4323187Z 
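# --- Editorial note (added; not emitted by inductor) ---------------------------------
# This variant gathers on the middle axis: out[b, i, j] = a[b, idx[i], j] for an (8, 4, 8)
# output; graph 612 appears to be the same indexing test retraced with fp32 data and
# int64 indices (see the '*i64' signature below and the int64 arg1_1 in __main__).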
@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4323255Z @triton.jit 2023-01-11T21:38:06.4323392Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4323464Z xnumel = 256 2023-01-11T21:38:06.4323560Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4323688Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4323798Z xmask = xindex < xnumel 2023-01-11T21:38:06.4323881Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4323947Z x0 = xindex % 8 2023-01-11T21:38:06.4324027Z x2 = (xindex // 32) 2023-01-11T21:38:06.4324098Z x3 = xindex 2023-01-11T21:38:06.4324287Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4324506Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4324641Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4324721Z ''') 2023-01-11T21:38:06.4324727Z 2023-01-11T21:38:06.4324731Z 2023-01-11T21:38:06.4324893Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4324961Z import triton 2023-01-11T21:38:06.4325057Z import triton.language as tl 2023-01-11T21:38:06.4325170Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4325276Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4325410Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4325535Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4325540Z 2023-01-11T21:38:06.4325954Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4326027Z @triton.jit 2023-01-11T21:38:06.4326158Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4326230Z xnumel = 128 2023-01-11T21:38:06.4326325Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4326453Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4326534Z xmask = xindex < xnumel 2023-01-11T21:38:06.4326616Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4326689Z x0 = xindex % 4 2023-01-11T21:38:06.4326760Z x2 = (xindex // 16) 2023-01-11T21:38:06.4326833Z x4 = xindex 2023-01-11T21:38:06.4326929Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4327026Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4327146Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask) 2023-01-11T21:38:06.4327310Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4327396Z ''') 2023-01-11T21:38:06.4327402Z 2023-01-11T21:38:06.4327406Z 2023-01-11T21:38:06.4327492Z async_compile.wait(globals()) 2023-01-11T21:38:06.4327568Z del async_compile 2023-01-11T21:38:06.4327573Z 2023-01-11T21:38:06.4327647Z def call(args): 2023-01-11T21:38:06.4327725Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4327800Z args.clear() 2023-01-11T21:38:06.4327890Z with torch.cuda.device(0): 2023-01-11T21:38:06.4328097Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4328182Z stream0 = get_cuda_stream(0) 
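# Editorial comment (added; not emitted by inductor): three independent gather launches
# on stream0 follow, each writing its own buffer -- buf0 ~ a[idx], buf1 ~ a[:, idx], and
# buf2 indexing two axes at once.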
2023-01-11T21:38:06.4328330Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4328537Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4328685Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4328890Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4329034Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4329105Z del arg0_1 2023-01-11T21:38:06.4329177Z del arg1_1 2023-01-11T21:38:06.4329258Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4329263Z 2023-01-11T21:38:06.4329273Z 2023-01-11T21:38:06.4329346Z if __name__ == "__main__": 2023-01-11T21:38:06.4329463Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4329589Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4329825Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4330021Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4330139Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4330145Z 2023-01-11T21:38:06.4330414Z [2023-01-11 21:34:51,330] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 612 2023-01-11T21:38:06.4330419Z 2023-01-11T21:38:06.4330515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4330583Z import torch 2023-01-11T21:38:06.4330656Z import random 2023-01-11T21:38:06.4330774Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4330901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4330906Z 2023-01-11T21:38:06.4330989Z aten = torch.ops.aten 2023-01-11T21:38:06.4331126Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4331225Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4331230Z 2023-01-11T21:38:06.4331297Z import triton 2023-01-11T21:38:06.4331389Z import triton.language as tl 2023-01-11T21:38:06.4331513Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4331653Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4331661Z 2023-01-11T21:38:06.4331665Z 2023-01-11T21:38:06.4331825Z triton_fused_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.4331900Z import triton 2023-01-11T21:38:06.4331995Z import triton.language as tl 2023-01-11T21:38:06.4332108Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4332202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4332333Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4332457Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4332462Z 2023-01-11T21:38:06.4332874Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4332950Z @triton.jit 2023-01-11T21:38:06.4333090Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4333197Z xnumel = 256 2023-01-11T21:38:06.4333295Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4333417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4333500Z xmask = xindex < xnumel 2023-01-11T21:38:06.4333581Z x1 = (xindex // 64) 2023-01-11T21:38:06.4333656Z x0 = xindex % 64 2023-01-11T21:38:06.4333725Z x2 = xindex 2023-01-11T21:38:06.4333917Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4334147Z tmp1 = tl.load(in_ptr1 + (x0 + (64*tmp0)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4334272Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4334360Z ''') 2023-01-11T21:38:06.4334366Z 2023-01-11T21:38:06.4334370Z 2023-01-11T21:38:06.4334633Z triton_fused_index_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.4334712Z import triton 2023-01-11T21:38:06.4334804Z import triton.language as tl 2023-01-11T21:38:06.4334920Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4335022Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4335153Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4335271Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4335276Z 2023-01-11T21:38:06.4335694Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4335766Z @triton.jit 2023-01-11T21:38:06.4335901Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4336031Z xnumel = 256 2023-01-11T21:38:06.4336127Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4336256Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4336339Z xmask = xindex < xnumel 2023-01-11T21:38:06.4336416Z x1 = (xindex // 8) % 4 2023-01-11T21:38:06.4336490Z x0 = xindex % 8 2023-01-11T21:38:06.4336567Z x2 = (xindex // 32) 2023-01-11T21:38:06.4336637Z x3 = xindex 2023-01-11T21:38:06.4336822Z tmp0 = tl.load(in_ptr0 + (x1), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4337062Z tmp1 = tl.load(in_ptr1 + (x0 + (8*tmp0) + (64*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4337276Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4337355Z ''') 2023-01-11T21:38:06.4337360Z 2023-01-11T21:38:06.4337364Z 2023-01-11T21:38:06.4337523Z triton_fused_index_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.4337603Z import triton 2023-01-11T21:38:06.4337695Z import triton.language as tl 2023-01-11T21:38:06.4337808Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4337910Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4338042Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4338163Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4338175Z 2023-01-11T21:38:06.4338583Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4338656Z @triton.jit 2023-01-11T21:38:06.4338794Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4338866Z xnumel = 128 2023-01-11T21:38:06.4338962Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4339093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.4339175Z xmask = xindex < xnumel 2023-01-11T21:38:06.4339248Z x1 = (xindex // 4) % 4 2023-01-11T21:38:06.4339320Z x0 = xindex % 4 2023-01-11T21:38:06.4339401Z x2 = (xindex // 16) 2023-01-11T21:38:06.4339472Z x4 = xindex 2023-01-11T21:38:06.4339615Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.4339712Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4339848Z tmp2 = tl.load(in_ptr1 + (tmp1 + (8*tmp0) + (64*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.4339975Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4340061Z ''') 2023-01-11T21:38:06.4340067Z 2023-01-11T21:38:06.4340071Z 2023-01-11T21:38:06.4340165Z async_compile.wait(globals()) 2023-01-11T21:38:06.4340242Z del async_compile 2023-01-11T21:38:06.4340247Z 2023-01-11T21:38:06.4340322Z def call(args): 2023-01-11T21:38:06.4340401Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4340479Z args.clear() 2023-01-11T21:38:06.4340571Z with torch.cuda.device(0): 2023-01-11T21:38:06.4340770Z buf0 = empty_strided((4, 8, 8), (64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4340860Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4341007Z triton_fused_index_0.run(arg1_1, arg0_1, buf0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4341212Z buf1 = empty_strided((8, 4, 8), (32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4341354Z triton_fused_index_1_1.run(arg1_1, arg0_1, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4341555Z buf2 = empty_strided((8, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4341701Z triton_fused_index_3_2.run(arg1_1, arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.4341775Z del arg0_1 2023-01-11T21:38:06.4341841Z del arg1_1 2023-01-11T21:38:06.4341929Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4341961Z 2023-01-11T21:38:06.4341966Z 2023-01-11T21:38:06.4342050Z if __name__ == "__main__": 2023-01-11T21:38:06.4342169Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4342297Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4342509Z arg0_1 = rand_strided((8, 8, 8), (64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4342703Z arg1_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4342827Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4342832Z 2023-01-11T21:38:06.4342898Z ok (0.597s) 2023-01-11T21:38:06.4343373Z test_indirect_load_broadcast_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4343511Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4343771Z [2023-01-11 21:34:51,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 613 2023-01-11T21:38:06.4344036Z [2023-01-11 21:34:51,876] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 613 2023-01-11T21:38:06.4344453Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4344584Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4344841Z [2023-01-11 21:34:51,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 614 2023-01-11T21:38:06.4345110Z [2023-01-11 21:34:52,122] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 614 2023-01-11T21:38:06.4345116Z 2023-01-11T21:38:06.4345217Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4345321Z import torch 2023-01-11T21:38:06.4345393Z import random 2023-01-11T21:38:06.4345537Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4345677Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4345683Z 2023-01-11T21:38:06.4345780Z aten = torch.ops.aten 2023-01-11T21:38:06.4345918Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4346014Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4346019Z 2023-01-11T21:38:06.4346095Z import triton 2023-01-11T21:38:06.4346183Z import triton.language as tl 2023-01-11T21:38:06.4346309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4346451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4346457Z 2023-01-11T21:38:06.4346461Z 2023-01-11T21:38:06.4346617Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4346694Z import triton 2023-01-11T21:38:06.4346790Z import triton.language as tl 2023-01-11T21:38:06.4346908Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4347013Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4347141Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4347269Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4347274Z 2023-01-11T21:38:06.4347767Z @pointwise(size_hints=[32, 32], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4347871Z @triton.jit 2023-01-11T21:38:06.4348055Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.4348129Z xnumel = 32 2023-01-11T21:38:06.4348203Z ynumel = 21 2023-01-11T21:38:06.4348301Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4348434Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4348521Z xmask = xindex < xnumel 2023-01-11T21:38:06.4348617Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4348751Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4348835Z ymask = yindex < ynumel 2023-01-11T21:38:06.4348908Z x0 = xindex 2023-01-11T21:38:06.4348981Z y1 = yindex 2023-01-11T21:38:06.4349092Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.4349192Z tmp2 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.4349314Z tmp1 = tl.load(in_ptr1 + (y1 + (512*tmp0)), xmask & ymask) 2023-01-11T21:38:06.4349401Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4349560Z tl.store(out_ptr0 + (y1 + (21*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.4349647Z ''') 
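# --- Editorial sketch (added; not part of the generated module) ----------------------
# Element [i, j] of the (32, 21) output of triton_fused_add_0 above is
# table[idx[i, j], j] + bias[i, 0]: an indirect row load broadcast against a column
# vector. A rough eager-mode equivalent, with illustrative names (idx, table, bias
# stand for arg2_1, arg1_1, arg0_1 in call() below):
#
#     cols = torch.arange(idx.size(1), device=idx.device)
#     out = table[idx, cols] + bias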
2023-01-11T21:38:06.4349653Z 2023-01-11T21:38:06.4349657Z 2023-01-11T21:38:06.4349753Z async_compile.wait(globals()) 2023-01-11T21:38:06.4349835Z del async_compile 2023-01-11T21:38:06.4349840Z 2023-01-11T21:38:06.4349910Z def call(args): 2023-01-11T21:38:06.4349999Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4350081Z args.clear() 2023-01-11T21:38:06.4350174Z with torch.cuda.device(0): 2023-01-11T21:38:06.4350378Z buf0 = empty_strided((32, 21), (21, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4350472Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4350630Z triton_fused_add_0.run(arg2_1, arg1_1, arg0_1, buf0, 32, 21, grid=grid(32, 21), stream=stream0) 2023-01-11T21:38:06.4350699Z del arg0_1 2023-01-11T21:38:06.4350776Z del arg1_1 2023-01-11T21:38:06.4350850Z del arg2_1 2023-01-11T21:38:06.4350928Z return (buf0, ) 2023-01-11T21:38:06.4350934Z 2023-01-11T21:38:06.4350938Z 2023-01-11T21:38:06.4351020Z if __name__ == "__main__": 2023-01-11T21:38:06.4351139Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4351295Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4351502Z arg0_1 = rand_strided((32, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4351704Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4351902Z arg2_1 = rand_strided((32, 21), (1, 32), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4352032Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4352037Z 2023-01-11T21:38:06.4352042Z 2023-01-11T21:38:06.4352140Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4352215Z import torch 2023-01-11T21:38:06.4352293Z import random 2023-01-11T21:38:06.4352411Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4352529Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4352540Z 2023-01-11T21:38:06.4352617Z aten = torch.ops.aten 2023-01-11T21:38:06.4352758Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4352858Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4352863Z 2023-01-11T21:38:06.4352940Z import triton 2023-01-11T21:38:06.4353035Z import triton.language as tl 2023-01-11T21:38:06.4353160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4353300Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4353306Z 2023-01-11T21:38:06.4353310Z 2023-01-11T21:38:06.4353465Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4353536Z import triton 2023-01-11T21:38:06.4353630Z import triton.language as tl 2023-01-11T21:38:06.4353745Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4353875Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4354006Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4354136Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4354141Z 2023-01-11T21:38:06.4354638Z @pointwise(size_hints=[32, 32], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.4354715Z @triton.jit 2023-01-11T21:38:06.4354892Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 
2023-01-11T21:38:06.4354968Z xnumel = 32 2023-01-11T21:38:06.4355044Z ynumel = 21 2023-01-11T21:38:06.4355144Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4355281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4355369Z xmask = xindex < xnumel 2023-01-11T21:38:06.4355467Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4355593Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4355678Z ymask = yindex < ynumel 2023-01-11T21:38:06.4355750Z x0 = xindex 2023-01-11T21:38:06.4355823Z y1 = yindex 2023-01-11T21:38:06.4355941Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.4356061Z tmp2 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4356197Z tmp1 = tl.load(in_ptr1 + (y1 + (512*tmp0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.4356272Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4356429Z tl.store(out_ptr0 + (y1 + (21*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.4356518Z ''') 2023-01-11T21:38:06.4356523Z 2023-01-11T21:38:06.4356528Z 2023-01-11T21:38:06.4356621Z async_compile.wait(globals()) 2023-01-11T21:38:06.4356704Z del async_compile 2023-01-11T21:38:06.4356709Z 2023-01-11T21:38:06.4356785Z def call(args): 2023-01-11T21:38:06.4356877Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4356954Z args.clear() 2023-01-11T21:38:06.4357042Z with torch.cuda.device(0): 2023-01-11T21:38:06.4357280Z buf0 = empty_strided((32, 21), (21, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4357376Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4357532Z triton_fused_add_0.run(arg2_1, arg1_1, arg0_1, buf0, 32, 21, grid=grid(32, 21), stream=stream0) 2023-01-11T21:38:06.4357607Z del arg0_1 2023-01-11T21:38:06.4357682Z del arg1_1 2023-01-11T21:38:06.4357756Z del arg2_1 2023-01-11T21:38:06.4357828Z return (buf0, ) 2023-01-11T21:38:06.4357833Z 2023-01-11T21:38:06.4357838Z 2023-01-11T21:38:06.4357920Z if __name__ == "__main__": 2023-01-11T21:38:06.4358039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4358168Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4358377Z arg0_1 = rand_strided((32, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4358588Z arg1_1 = rand_strided((9521, 512), (512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4358788Z arg2_1 = rand_strided((32, 21), (1, 32), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4358920Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4358925Z 2023-01-11T21:38:06.4358992Z ok (0.794s) 2023-01-11T21:38:06.4359444Z test_inf_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4359576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4359867Z [2023-01-11 21:34:52,142] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 615 2023-01-11T21:38:06.4360130Z [2023-01-11 21:34:52,313] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 615 2023-01-11T21:38:06.4360550Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4360683Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4360939Z [2023-01-11 21:34:52,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 616 2023-01-11T21:38:06.4361201Z [2023-01-11 21:34:52,406] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 616 2023-01-11T21:38:06.4361209Z 2023-01-11T21:38:06.4361310Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4361386Z import torch 2023-01-11T21:38:06.4361455Z import random 2023-01-11T21:38:06.4361574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4361702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4361707Z 2023-01-11T21:38:06.4361793Z aten = torch.ops.aten 2023-01-11T21:38:06.4361933Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4362030Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4362036Z 2023-01-11T21:38:06.4362111Z import triton 2023-01-11T21:38:06.4362198Z import triton.language as tl 2023-01-11T21:38:06.4362325Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4362468Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4362474Z 2023-01-11T21:38:06.4362481Z 2023-01-11T21:38:06.4362653Z triton_fused_add_add_1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4362730Z import triton 2023-01-11T21:38:06.4362825Z import triton.language as tl 2023-01-11T21:38:06.4362941Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4363044Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4363201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4363331Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4363336Z 2023-01-11T21:38:06.4363768Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4363845Z @triton.jit 2023-01-11T21:38:06.4363999Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4364072Z xnumel = 8 2023-01-11T21:38:06.4364176Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4364306Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4364384Z xmask = xindex < xnumel 2023-01-11T21:38:06.4364458Z x0 = xindex 2023-01-11T21:38:06.4364649Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
2023-01-11T21:38:06.4364752Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4364837Z tmp1 = float("inf") 2023-01-11T21:38:06.4364918Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4365032Z tmp3 = float("-inf") 2023-01-11T21:38:06.4365106Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4365185Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.4365323Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4365456Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4365590Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4365703Z ''') 2023-01-11T21:38:06.4365709Z 2023-01-11T21:38:06.4365714Z 2023-01-11T21:38:06.4365811Z async_compile.wait(globals()) 2023-01-11T21:38:06.4365888Z del async_compile 2023-01-11T21:38:06.4365893Z 2023-01-11T21:38:06.4365962Z def call(args): 2023-01-11T21:38:06.4366037Z arg0_1, = args 2023-01-11T21:38:06.4366111Z args.clear() 2023-01-11T21:38:06.4366209Z with torch.cuda.device(0): 2023-01-11T21:38:06.4366410Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366607Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366804Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4366892Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4367055Z triton_fused_add_add_1_mul_0.run(arg0_1, buf0, buf1, buf2, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4367129Z del arg0_1 2023-01-11T21:38:06.4367220Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4367228Z 2023-01-11T21:38:06.4367233Z 2023-01-11T21:38:06.4367315Z if __name__ == "__main__": 2023-01-11T21:38:06.4367436Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4367565Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4367767Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4367875Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4367880Z 2023-01-11T21:38:06.4367891Z 2023-01-11T21:38:06.4367984Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4368065Z import torch 2023-01-11T21:38:06.4368143Z import random 2023-01-11T21:38:06.4368265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4368390Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4368395Z 2023-01-11T21:38:06.4368479Z aten = torch.ops.aten 2023-01-11T21:38:06.4368617Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4368711Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4368716Z 2023-01-11T21:38:06.4368791Z import triton 2023-01-11T21:38:06.4368883Z import triton.language as tl 2023-01-11T21:38:06.4369010Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4369178Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4369184Z 2023-01-11T21:38:06.4369189Z 2023-01-11T21:38:06.4369361Z triton_fused_add_add_1_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4369436Z import triton 2023-01-11T21:38:06.4369526Z import triton.language as tl 2023-01-11T21:38:06.4369633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4369733Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4369864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4369988Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4369994Z 
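# --- Editorial note (added; not emitted by inductor) ---------------------------------
# fp16 retrace of the kernel above: the inf constants stay fp32 Python floats, the fp16
# loads are upcast with .to(tl.float32), the three outputs (x + inf, x + (-inf),
# x * (-inf)) are computed in fp32, and the stores appear to downcast back to fp16.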
2023-01-11T21:38:06.4370429Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4370506Z @triton.jit 2023-01-11T21:38:06.4370660Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4370733Z xnumel = 8 2023-01-11T21:38:06.4370823Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4370951Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4371033Z xmask = xindex < xnumel 2023-01-11T21:38:06.4371103Z x0 = xindex 2023-01-11T21:38:06.4371318Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4371435Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4371515Z tmp1 = float("inf") 2023-01-11T21:38:06.4371587Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4371727Z tmp3 = float("-inf") 2023-01-11T21:38:06.4371803Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4371880Z tmp6 = tmp5 * tmp3 2023-01-11T21:38:06.4372016Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4372146Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4372276Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4372354Z ''') 2023-01-11T21:38:06.4372359Z 2023-01-11T21:38:06.4372371Z 2023-01-11T21:38:06.4372458Z async_compile.wait(globals()) 2023-01-11T21:38:06.4372533Z del async_compile 2023-01-11T21:38:06.4372538Z 2023-01-11T21:38:06.4372611Z def call(args): 2023-01-11T21:38:06.4372684Z arg0_1, = args 2023-01-11T21:38:06.4372759Z args.clear() 2023-01-11T21:38:06.4372851Z with torch.cuda.device(0): 2023-01-11T21:38:06.4373040Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373233Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373428Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4373519Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4373676Z triton_fused_add_add_1_mul_0.run(arg0_1, buf0, buf1, buf2, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4373750Z del arg0_1 2023-01-11T21:38:06.4373839Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.4373844Z 2023-01-11T21:38:06.4373849Z 2023-01-11T21:38:06.4373929Z if __name__ == "__main__": 2023-01-11T21:38:06.4374039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4374163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4374358Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4374579Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4374585Z 2023-01-11T21:38:06.4374658Z ok (0.282s) 2023-01-11T21:38:06.4375217Z test_inplace_activations_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4375365Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4375660Z [2023-01-11 21:34:52,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 617 2023-01-11T21:38:06.4375921Z [2023-01-11 21:34:52,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 617 2023-01-11T21:38:06.4376335Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4376470Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4376720Z [2023-01-11 21:34:53,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 618 2023-01-11T21:38:06.4376733Z 2023-01-11T21:38:06.4376824Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4376898Z import torch 2023-01-11T21:38:06.4376969Z import random 2023-01-11T21:38:06.4377090Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4377272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4377278Z 2023-01-11T21:38:06.4377368Z aten = torch.ops.aten 2023-01-11T21:38:06.4377516Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4377605Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4377610Z 2023-01-11T21:38:06.4377734Z import triton 2023-01-11T21:38:06.4377825Z import triton.language as tl 2023-01-11T21:38:06.4377950Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4378088Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4378093Z 2023-01-11T21:38:06.4378098Z 2023-01-11T21:38:06.4378349Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_div_gt_lift_fresh_copy_0 = async_compile.triton(''' 2023-01-11T21:38:06.4378424Z import triton 2023-01-11T21:38:06.4378517Z import triton.language as tl 2023-01-11T21:38:06.4378624Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4378727Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4378863Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4378990Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4378995Z 2023-01-11T21:38:06.4379489Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8), equal_to_1=())]}) 2023-01-11T21:38:06.4379566Z @triton.jit 2023-01-11T21:38:06.4379755Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4379831Z xnumel = 64 2023-01-11T21:38:06.4379921Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4380049Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4380132Z xmask = xindex < xnumel 2023-01-11T21:38:06.4380203Z x0 = xindex 2023-01-11T21:38:06.4380394Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
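# Editorial comment (added; not emitted by inductor): everything below derives from
# tmp2 = x + 1 -- a hardswish-style clamp chain (tmp4..tmp11), a hardtanh clamp to
# [-1, 1] (tmp13..tmp15), a leaky-relu select (tmp17..tmp20), x * sigmoid(x) (tmp21,
# tmp22), log1p (tmp23), and two constant-predicate tl.where selects (tmp25, tmp28)
# from the lift_fresh/masked-fill part of the test.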
2023-01-11T21:38:06.4380493Z tmp26 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4380564Z tmp1 = 1 2023-01-11T21:38:06.4380636Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4380709Z tmp3 = 3 2023-01-11T21:38:06.4380785Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.4380857Z tmp5 = 0.0 2023-01-11T21:38:06.4380994Z tmp6 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 > tmp5, tmp4, tmp5)) 2023-01-11T21:38:06.4381066Z tmp7 = 6.0 2023-01-11T21:38:06.4381227Z tmp8 = tl.where(tmp6 != tmp6, tmp6, tl.where(tmp6 < tmp7, tmp6, tmp7)) 2023-01-11T21:38:06.4381299Z tmp9 = tmp2 * tmp8 2023-01-11T21:38:06.4381370Z tmp10 = 6 2023-01-11T21:38:06.4381449Z tmp11 = tmp9 / tmp10 2023-01-11T21:38:06.4381552Z tmp12 = -1.0 2023-01-11T21:38:06.4381690Z tmp13 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp12, tmp2, tmp12)) 2023-01-11T21:38:06.4381761Z tmp14 = 1.0 2023-01-11T21:38:06.4381903Z tmp15 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp14, tmp13, tmp14)) 2023-01-11T21:38:06.4381967Z tmp16 = 0 2023-01-11T21:38:06.4382047Z tmp17 = tmp2 > tmp16 2023-01-11T21:38:06.4382121Z tmp18 = 0.01 2023-01-11T21:38:06.4382202Z tmp19 = tmp2 * tmp18 2023-01-11T21:38:06.4382304Z tmp20 = tl.where(tmp17, tmp2, tmp19) 2023-01-11T21:38:06.4382391Z tmp21 = tl.sigmoid(tmp2) 2023-01-11T21:38:06.4382469Z tmp22 = tmp2 * tmp21 2023-01-11T21:38:06.4382561Z tmp23 = tl.libdevice.log1p(tmp2) 2023-01-11T21:38:06.4382633Z tmp24 = 99.0 2023-01-11T21:38:06.4382734Z tmp25 = tl.where(tmp16, tmp24, tmp2) 2023-01-11T21:38:06.4382811Z tmp27 = tmp26 + tmp1 2023-01-11T21:38:06.4382909Z tmp28 = tl.where(tmp1, tmp24, tmp27) 2023-01-11T21:38:06.4383043Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4383177Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.4383299Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.4383422Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.4383544Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.4383696Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp25, xmask) 2023-01-11T21:38:06.4383819Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.4383903Z ''') 2023-01-11T21:38:06.4383909Z 2023-01-11T21:38:06.4383913Z 2023-01-11T21:38:06.4384007Z async_compile.wait(globals()) 2023-01-11T21:38:06.4384084Z del async_compile 2023-01-11T21:38:06.4384089Z 2023-01-11T21:38:06.4384156Z def call(args): 2023-01-11T21:38:06.4384227Z arg0_1, = args 2023-01-11T21:38:06.4384299Z args.clear() 2023-01-11T21:38:06.4384391Z with torch.cuda.device(0): 2023-01-11T21:38:06.4384589Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4384785Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4384978Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385165Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385360Z buf4 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385551Z buf5 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385776Z buf6 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4385885Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4386100Z 
triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_div_gt_lift_fresh_copy_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4386174Z del arg0_1 2023-01-11T21:38:06.4386290Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:06.4386296Z 2023-01-11T21:38:06.4386300Z 2023-01-11T21:38:06.4386380Z if __name__ == "__main__": 2023-01-11T21:38:06.4386491Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4386617Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4386818Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4386930Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4386936Z 2023-01-11T21:38:06.4387228Z [2023-01-11 21:34:53,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 618 2023-01-11T21:38:06.4387235Z 2023-01-11T21:38:06.4387332Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4387408Z import torch 2023-01-11T21:38:06.4387481Z import random 2023-01-11T21:38:06.4387593Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4387714Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4387719Z 2023-01-11T21:38:06.4387799Z aten = torch.ops.aten 2023-01-11T21:38:06.4387935Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4388028Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4388033Z 2023-01-11T21:38:06.4388109Z import triton 2023-01-11T21:38:06.4388201Z import triton.language as tl 2023-01-11T21:38:06.4388319Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4388461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4388467Z 2023-01-11T21:38:06.4388471Z 2023-01-11T21:38:06.4388783Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_convert_element_type_convert_element_type_1_convert_element_type_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4388858Z import triton 2023-01-11T21:38:06.4388949Z import triton.language as tl 2023-01-11T21:38:06.4389063Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4389168Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4389298Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4389416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4389426Z 2023-01-11T21:38:06.4389906Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: '*fp16', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8), equal_to_1=())]}) 2023-01-11T21:38:06.4390008Z @triton.jit 2023-01-11T21:38:06.4390196Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4390270Z xnumel = 64 2023-01-11T21:38:06.4390367Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4390501Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4390585Z xmask = xindex < xnumel 2023-01-11T21:38:06.4390655Z x0 = xindex 2023-01-11T21:38:06.4390859Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4390976Z tmp31 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4391046Z tmp1 = 1 
2023-01-11T21:38:06.4391128Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4391215Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.4391284Z tmp4 = 3 2023-01-11T21:38:06.4391361Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.4391426Z tmp6 = 0.0 2023-01-11T21:38:06.4391559Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.4391630Z tmp8 = 6.0 2023-01-11T21:38:06.4397973Z tmp9 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 < tmp8, tmp7, tmp8)) 2023-01-11T21:38:06.4398075Z tmp10 = tmp3 * tmp9 2023-01-11T21:38:06.4398151Z tmp11 = 6 2023-01-11T21:38:06.4398237Z tmp12 = tmp10 / tmp11 2023-01-11T21:38:06.4398334Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.4398458Z tmp14 = -1.0 2023-01-11T21:38:06.4398609Z tmp15 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp14, tmp3, tmp14)) 2023-01-11T21:38:06.4398682Z tmp16 = 1.0 2023-01-11T21:38:06.4398836Z tmp17 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp16, tmp15, tmp16)) 2023-01-11T21:38:06.4398936Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.4399009Z tmp19 = 0 2023-01-11T21:38:06.4399093Z tmp20 = tmp3 > tmp19 2023-01-11T21:38:06.4399159Z tmp21 = 0.01 2023-01-11T21:38:06.4399240Z tmp22 = tmp3 * tmp21 2023-01-11T21:38:06.4399340Z tmp23 = tl.where(tmp20, tmp3, tmp22) 2023-01-11T21:38:06.4399487Z tmp24 = tmp23.to(tl.float32) 2023-01-11T21:38:06.4399573Z tmp25 = tl.sigmoid(tmp3) 2023-01-11T21:38:06.4399647Z tmp26 = tmp3 * tmp25 2023-01-11T21:38:06.4399735Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.4399833Z tmp28 = tl.libdevice.log1p(tmp2) 2023-01-11T21:38:06.4399907Z tmp29 = 99.0 2023-01-11T21:38:06.4400005Z tmp30 = tl.where(tmp19, tmp29, tmp2) 2023-01-11T21:38:06.4400084Z tmp32 = tmp31 + tmp1 2023-01-11T21:38:06.4400181Z tmp33 = tl.where(tmp1, tmp29, tmp32) 2023-01-11T21:38:06.4400309Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.4400442Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.4400574Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.4400703Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp27, xmask) 2023-01-11T21:38:06.4400829Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp28, xmask) 2023-01-11T21:38:06.4400958Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask) 2023-01-11T21:38:06.4401081Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp33, xmask) 2023-01-11T21:38:06.4401169Z ''') 2023-01-11T21:38:06.4401175Z 2023-01-11T21:38:06.4401179Z 2023-01-11T21:38:06.4401266Z async_compile.wait(globals()) 2023-01-11T21:38:06.4401343Z del async_compile 2023-01-11T21:38:06.4401348Z 2023-01-11T21:38:06.4401422Z def call(args): 2023-01-11T21:38:06.4401495Z arg0_1, = args 2023-01-11T21:38:06.4401570Z args.clear() 2023-01-11T21:38:06.4401662Z with torch.cuda.device(0): 2023-01-11T21:38:06.4401895Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402087Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402287Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402487Z buf3 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402681Z buf4 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4402872Z buf5 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4403063Z buf6 = empty_strided((64, ), (1, 
), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4403159Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4403416Z triton_fused_add_add_1_add_2_add_3_add_4_add_6_add_7_convert_element_type_convert_element_type_1_convert_element_type_2_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4403494Z del arg0_1 2023-01-11T21:38:06.4403605Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, ) 2023-01-11T21:38:06.4403610Z 2023-01-11T21:38:06.4403615Z 2023-01-11T21:38:06.4403698Z if __name__ == "__main__": 2023-01-11T21:38:06.4403820Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4403951Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4404156Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4404271Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4404276Z 2023-01-11T21:38:06.4404349Z ok (0.857s) 2023-01-11T21:38:06.4404701Z test_inplace_add_alpha_autotune (__main__.CudaTests) ... [2023-01-11 21:34:53,283] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None 2023-01-11T21:38:06.4404946Z [2023-01-11 21:34:53,943] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4405211Z [2023-01-11 21:34:53,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None 2023-01-11T21:38:06.4405217Z 2023-01-11T21:38:06.4405315Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4405391Z import torch 2023-01-11T21:38:06.4405469Z import random 2023-01-11T21:38:06.4405639Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4405781Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4405787Z 2023-01-11T21:38:06.4405885Z aten = torch.ops.aten 2023-01-11T21:38:06.4406018Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4406117Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4406122Z 2023-01-11T21:38:06.4406198Z import triton 2023-01-11T21:38:06.4406292Z import triton.language as tl 2023-01-11T21:38:06.4406419Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4406559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4406567Z 2023-01-11T21:38:06.4406572Z 2023-01-11T21:38:06.4406733Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4406811Z import triton 2023-01-11T21:38:06.4406899Z import triton.language as tl 2023-01-11T21:38:06.4407013Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4407120Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4407253Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4407382Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4407387Z 2023-01-11T21:38:06.4407886Z @pointwise(size_hints=[8, 64], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4407965Z @triton.jit 2023-01-11T21:38:06.4408144Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.4408241Z xnumel = 6 2023-01-11T21:38:06.4408316Z ynumel = 40 2023-01-11T21:38:06.4408417Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4408552Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4408641Z xmask = xindex < xnumel 2023-01-11T21:38:06.4408744Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.4408878Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.4408956Z ymask = yindex < ynumel 2023-01-11T21:38:06.4409028Z x3 = xindex 2023-01-11T21:38:06.4409100Z y2 = yindex 2023-01-11T21:38:06.4409174Z x0 = xindex % 3 2023-01-11T21:38:06.4409256Z x1 = (xindex // 3) 2023-01-11T21:38:06.4409472Z tmp0 = tl.load(in_ptr0 + (y2 + (40*x3)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.4409598Z tmp1 = tl.load(in_ptr1 + (x0 + (3*y2) + (120*x1)), xmask & ymask) 2023-01-11T21:38:06.4409669Z tmp2 = 0.55 2023-01-11T21:38:06.4409750Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4409831Z tmp4 = tmp0 + tmp3 2023-01-11T21:38:06.4409991Z tl.store(out_ptr0 + (y2 + (40*x3) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp4, xmask & ymask) 2023-01-11T21:38:06.4410078Z ''') 2023-01-11T21:38:06.4410084Z 2023-01-11T21:38:06.4410092Z 2023-01-11T21:38:06.4410188Z async_compile.wait(globals()) 2023-01-11T21:38:06.4410268Z del async_compile 2023-01-11T21:38:06.4410273Z 2023-01-11T21:38:06.4410349Z def call(args): 2023-01-11T21:38:06.4410416Z x_1, y_1 = args 2023-01-11T21:38:06.4410493Z args.clear() 2023-01-11T21:38:06.4410587Z with torch.cuda.device(0): 2023-01-11T21:38:06.4410682Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4410821Z triton_fused_add__0.run(x_1, y_1, x_1, 6, 40, grid=grid(6, 40), stream=stream0) 2023-01-11T21:38:06.4410893Z del y_1 2023-01-11T21:38:06.4410972Z return (x_1, ) 2023-01-11T21:38:06.4410977Z 2023-01-11T21:38:06.4410984Z 2023-01-11T21:38:06.4411060Z if __name__ == "__main__": 2023-01-11T21:38:06.4411179Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4411307Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4411552Z x_1 = rand_strided((2, 3, 4, 10), (120, 40, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4411767Z y_1 = rand_strided((2, 3, 4, 10), (120, 1, 30, 3), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4411886Z print_performance(lambda: call([x_1, y_1])) 2023-01-11T21:38:06.4411891Z 2023-01-11T21:38:06.4411963Z ok (8.995s) 2023-01-11T21:38:06.4412486Z test_inplace_add_cuda (__main__.CudaTests) ... 
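test_inplace_add_cuda exercises a plain in-place tensor add. A minimal eager-mode sketch of the pattern, reconstructed from the generated module below (the kernel computes tmp0 + tmp1 and call() passes arg0_1 as both an input and the output pointer); the function and variable names here are hypothetical, not taken from the test source:

import torch

def fn(x, y):
    x += y  # mutates x in place, hence the "skipping cudagraphs due to input mutation" warning
    return x

x = torch.randn(4, 4, device="cuda")
y = torch.randn(4, 4, device="cuda")
out = torch.compile(fn)(x, y)
assert out.data_ptr() == x.data_ptr()  # the input buffer is reused as the output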
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4412567Z warnings.warn( 2023-01-11T21:38:06.4412822Z [2023-01-11 21:35:02,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 619 2023-01-11T21:38:06.4413071Z [2023-01-11 21:35:02,343] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4413336Z [2023-01-11 21:35:02,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 619 2023-01-11T21:38:06.4413342Z 2023-01-11T21:38:06.4413442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4413520Z import torch 2023-01-11T21:38:06.4413595Z import random 2023-01-11T21:38:06.4413716Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4413843Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4413848Z 2023-01-11T21:38:06.4413926Z aten = torch.ops.aten 2023-01-11T21:38:06.4414066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4414163Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4414168Z 2023-01-11T21:38:06.4414273Z import triton 2023-01-11T21:38:06.4414367Z import triton.language as tl 2023-01-11T21:38:06.4414720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4414863Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4414868Z 2023-01-11T21:38:06.4414873Z 2023-01-11T21:38:06.4415035Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4415103Z import triton 2023-01-11T21:38:06.4415194Z import triton.language as tl 2023-01-11T21:38:06.4415321Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4415433Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4415589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4415713Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4415718Z 2023-01-11T21:38:06.4416160Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4416237Z @triton.jit 2023-01-11T21:38:06.4416371Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4416441Z xnumel = 16 2023-01-11T21:38:06.4416536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4416666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4416749Z xmask = xindex < xnumel 2023-01-11T21:38:06.4416817Z x0 = xindex 2023-01-11T21:38:06.4417005Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4417095Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4417239Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4417374Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4417459Z ''') 2023-01-11T21:38:06.4417464Z 2023-01-11T21:38:06.4417469Z 2023-01-11T21:38:06.4417562Z async_compile.wait(globals()) 2023-01-11T21:38:06.4417639Z del async_compile 2023-01-11T21:38:06.4417644Z 2023-01-11T21:38:06.4417717Z def call(args): 2023-01-11T21:38:06.4417797Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4417864Z 
args.clear() 2023-01-11T21:38:06.4417955Z with torch.cuda.device(0): 2023-01-11T21:38:06.4418109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4418256Z triton_fused_add__0.run(arg0_1, arg1_1, arg0_1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4418329Z del arg1_1 2023-01-11T21:38:06.4418408Z return (arg0_1, ) 2023-01-11T21:38:06.4418413Z 2023-01-11T21:38:06.4418418Z 2023-01-11T21:38:06.4418496Z if __name__ == "__main__": 2023-01-11T21:38:06.4418606Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4418732Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4418931Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4419126Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4419248Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4419253Z 2023-01-11T21:38:06.4419328Z ok (0.083s) 2023-01-11T21:38:06.4419793Z test_inplace_buffer_autotune (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4419924Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4420180Z [2023-01-11 21:35:02,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 620 2023-01-11T21:38:06.4420439Z [2023-01-11 21:35:02,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 620 2023-01-11T21:38:06.4420488Z 2023-01-11T21:38:06.4420582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4420657Z import torch 2023-01-11T21:38:06.4420737Z import random 2023-01-11T21:38:06.4420856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4420978Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4420986Z 2023-01-11T21:38:06.4421071Z aten = torch.ops.aten 2023-01-11T21:38:06.4421208Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4421299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4421304Z 2023-01-11T21:38:06.4421380Z import triton 2023-01-11T21:38:06.4421473Z import triton.language as tl 2023-01-11T21:38:06.4421598Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4421741Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4421747Z 2023-01-11T21:38:06.4421751Z 2023-01-11T21:38:06.4421910Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4421990Z import triton 2023-01-11T21:38:06.4422084Z import triton.language as tl 2023-01-11T21:38:06.4422193Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4422298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4422434Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4422567Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4422573Z 2023-01-11T21:38:06.4422989Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4423064Z @triton.jit 
2023-01-11T21:38:06.4423199Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4423275Z xnumel = 25 2023-01-11T21:38:06.4423368Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4423499Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4423587Z xmask = xindex < xnumel 2023-01-11T21:38:06.4423660Z x0 = xindex 2023-01-11T21:38:06.4423767Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4423866Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4423947Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4424112Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4424199Z ''') 2023-01-11T21:38:06.4424205Z 2023-01-11T21:38:06.4424210Z 2023-01-11T21:38:06.4424304Z async_compile.wait(globals()) 2023-01-11T21:38:06.4424383Z del async_compile 2023-01-11T21:38:06.4424388Z 2023-01-11T21:38:06.4424466Z def call(args): 2023-01-11T21:38:06.4424555Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4424633Z args.clear() 2023-01-11T21:38:06.4424720Z with torch.cuda.device(0): 2023-01-11T21:38:06.4424924Z buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4425032Z aten.mm.out(arg0_1, arg1_1, out=buf0) 2023-01-11T21:38:06.4425106Z del arg0_1 2023-01-11T21:38:06.4425181Z del arg1_1 2023-01-11T21:38:06.4425308Z buf1 = as_strided(buf0, (1, 1, 5, 5), (25, 25, 5, 1)); del buf0 # reuse 2023-01-11T21:38:06.4425400Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4425547Z triton_fused_add_0.run(buf1, arg2_1, 25, grid=grid(25), stream=stream0) 2023-01-11T21:38:06.4425631Z del arg2_1 2023-01-11T21:38:06.4425717Z return (buf1, ) 2023-01-11T21:38:06.4425723Z 2023-01-11T21:38:06.4425729Z 2023-01-11T21:38:06.4425825Z if __name__ == "__main__": 2023-01-11T21:38:06.4425946Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4426074Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4426273Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426471Z arg1_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426734Z arg2_1 = rand_strided((1, 1, 5, 5), (25, 1, 5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4426856Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4426861Z 2023-01-11T21:38:06.4426932Z ok (0.088s) 2023-01-11T21:38:06.4427467Z test_inplace_mixed_dtype_ops_cuda (__main__.CudaTests) ... 
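The call() above (test_inplace_buffer_autotune) shows inductor's buffer reuse: the mm output buf0 is reinterpreted with as_strided (the "del buf0  # reuse" line) and then mutated in place by the fused add kernel instead of allocating a fresh output. A rough eager equivalent, with shapes taken from the repro script and hypothetical names:

import torch

a = torch.randn(5, 5, device="cuda")
b = torch.randn(5, 5, device="cuda")
c = torch.randn(1, 1, 5, 5, device="cuda")

out = torch.mm(a, b)        # aten.mm.out writes into buf0
out = out.view(1, 1, 5, 5)  # the as_strided reinterpretation of the same storage
out += c                    # triton_fused_add_0 mutates the reused buffer in place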
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4427548Z warnings.warn( 2023-01-11T21:38:06.4427805Z [2023-01-11 21:35:02,477] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 621 2023-01-11T21:38:06.4428070Z [2023-01-11 21:35:02,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 621 2023-01-11T21:38:06.4428076Z 2023-01-11T21:38:06.4428178Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4428253Z import torch 2023-01-11T21:38:06.4428323Z import random 2023-01-11T21:38:06.4428445Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4428570Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4428576Z 2023-01-11T21:38:06.4428660Z aten = torch.ops.aten 2023-01-11T21:38:06.4428801Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4428897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4428902Z 2023-01-11T21:38:06.4428978Z import triton 2023-01-11T21:38:06.4429072Z import triton.language as tl 2023-01-11T21:38:06.4429194Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4429333Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4429338Z 2023-01-11T21:38:06.4429343Z 2023-01-11T21:38:06.4429552Z triton_fused_add_add__convert_element_type_mul__0 = async_compile.triton(''' 2023-01-11T21:38:06.4429630Z import triton 2023-01-11T21:38:06.4429729Z import triton.language as tl 2023-01-11T21:38:06.4429845Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4429950Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4430084Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4430233Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4430239Z 2023-01-11T21:38:06.4430671Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4430747Z @triton.jit 2023-01-11T21:38:06.4430892Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4430968Z xnumel = 16 2023-01-11T21:38:06.4431067Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4431200Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4431287Z xmask = xindex < xnumel 2023-01-11T21:38:06.4431354Z x0 = xindex 2023-01-11T21:38:06.4431451Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4431645Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4431746Z tmp8 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.4431835Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.4431916Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.4432008Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.4432081Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.4432168Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.4432254Z tmp7 = tmp6.to(tl.float64) 2023-01-11T21:38:06.4432333Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.4432421Z tmp10 = tmp9.to(tl.float32) 2023-01-11T21:38:06.4432563Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 
2023-01-11T21:38:06.4432649Z ''') 2023-01-11T21:38:06.4432655Z 2023-01-11T21:38:06.4432689Z 2023-01-11T21:38:06.4432779Z async_compile.wait(globals()) 2023-01-11T21:38:06.4432859Z del async_compile 2023-01-11T21:38:06.4432864Z 2023-01-11T21:38:06.4432943Z def call(args): 2023-01-11T21:38:06.4433024Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4433101Z args.clear() 2023-01-11T21:38:06.4433195Z with torch.cuda.device(0): 2023-01-11T21:38:06.4433402Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4433489Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4433583Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4433761Z triton_fused_add_add__convert_element_type_mul__0.run(buf1, arg0_1, arg1_1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4433836Z del arg0_1 2023-01-11T21:38:06.4433911Z del arg1_1 2023-01-11T21:38:06.4433991Z return (buf1, ) 2023-01-11T21:38:06.4433996Z 2023-01-11T21:38:06.4434002Z 2023-01-11T21:38:06.4434086Z if __name__ == "__main__": 2023-01-11T21:38:06.4434208Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4434330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4434533Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4434731Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4434857Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4434862Z 2023-01-11T21:38:06.4434936Z ok (0.304s) 2023-01-11T21:38:06.4435275Z test_input_mutation1_cuda (__main__.CudaTests) ... [2023-01-11 21:35:02,754] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 622 2023-01-11T21:38:06.4435525Z [2023-01-11 21:35:02,765] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4435783Z [2023-01-11 21:35:02,832] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4436042Z [2023-01-11 21:35:02,832] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 622 2023-01-11T21:38:06.4436056Z 2023-01-11T21:38:06.4436150Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4436227Z import torch 2023-01-11T21:38:06.4436303Z import random 2023-01-11T21:38:06.4436425Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4436582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4436588Z 2023-01-11T21:38:06.4436675Z aten = torch.ops.aten 2023-01-11T21:38:06.4436813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4436903Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4436908Z 2023-01-11T21:38:06.4436985Z import triton 2023-01-11T21:38:06.4437080Z import triton.language as tl 2023-01-11T21:38:06.4437207Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4437349Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4437355Z 2023-01-11T21:38:06.4437363Z 2023-01-11T21:38:06.4437536Z triton_fused_add_copy__div_0 = async_compile.triton(''' 2023-01-11T21:38:06.4437614Z import triton 2023-01-11T21:38:06.4437708Z import triton.language as tl 2023-01-11T21:38:06.4437817Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4437920Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4438058Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4438190Z from torch._inductor.utils import 
instance_descriptor 2023-01-11T21:38:06.4438195Z 2023-01-11T21:38:06.4438638Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr1', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4438715Z @triton.jit 2023-01-11T21:38:06.4438858Z def triton_(in_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4438933Z xnumel = 64 2023-01-11T21:38:06.4439053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4439184Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4439267Z xmask = xindex < xnumel 2023-01-11T21:38:06.4439339Z x0 = xindex 2023-01-11T21:38:06.4439532Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4439608Z tmp1 = 1 2023-01-11T21:38:06.4439692Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4439766Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.4439838Z tmp4 = 2 2023-01-11T21:38:06.4439919Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.4439997Z tmp6 = tmp3 / tmp5 2023-01-11T21:38:06.4440140Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4440274Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.4440362Z ''') 2023-01-11T21:38:06.4440367Z 2023-01-11T21:38:06.4440372Z 2023-01-11T21:38:06.4440460Z async_compile.wait(globals()) 2023-01-11T21:38:06.4440541Z del async_compile 2023-01-11T21:38:06.4440546Z 2023-01-11T21:38:06.4440624Z def call(args): 2023-01-11T21:38:06.4440701Z arg0_1, = args 2023-01-11T21:38:06.4440777Z args.clear() 2023-01-11T21:38:06.4440878Z with torch.cuda.device(0): 2023-01-11T21:38:06.4441083Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4441171Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4441330Z triton_fused_add_copy__div_0.run(arg0_1, arg0_1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4441404Z del arg0_1 2023-01-11T21:38:06.4441484Z return (buf2, ) 2023-01-11T21:38:06.4441489Z 2023-01-11T21:38:06.4441494Z 2023-01-11T21:38:06.4441577Z if __name__ == "__main__": 2023-01-11T21:38:06.4441697Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4441824Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4442031Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4442142Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4442154Z 2023-01-11T21:38:06.4442220Z ok (0.097s) 2023-01-11T21:38:06.4442579Z test_input_mutation2_cuda (__main__.CudaTests) ... 
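The test_input_mutation1_cuda kernel above fuses an input mutation with a dependent computation: tmp2 = tmp0 + 1 is stored back through out_ptr1 (which call() binds to arg0_1 itself), while tmp6 = tmp2 * tmp2 / (tmp2 + 2) goes to the fresh buffer buf2. A minimal sketch with hypothetical names:

import torch

def fn(x):
    x += 1                  # written back to the input buffer (out_ptr1 is arg0_1)
    return x * x / (x + 2)  # written to the new buffer buf2

x = torch.randn(64, device="cuda")
out = torch.compile(fn)(x)  # input mutation is why cudagraphs was skipped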
[2023-01-11 21:35:02,905] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 623 2023-01-11T21:38:06.4442845Z [2023-01-11 21:35:02,914] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.aten.expand_copy.default 2023-01-11T21:38:06.4443107Z [2023-01-11 21:35:03,070] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 623 2023-01-11T21:38:06.4443113Z 2023-01-11T21:38:06.4443212Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4443289Z import torch 2023-01-11T21:38:06.4443367Z import random 2023-01-11T21:38:06.4443487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4443607Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4443614Z 2023-01-11T21:38:06.4443698Z aten = torch.ops.aten 2023-01-11T21:38:06.4443838Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4443938Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4443944Z 2023-01-11T21:38:06.4444021Z import triton 2023-01-11T21:38:06.4444116Z import triton.language as tl 2023-01-11T21:38:06.4444244Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4444380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4444391Z 2023-01-11T21:38:06.4444395Z 2023-01-11T21:38:06.4444544Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.4444621Z import triton 2023-01-11T21:38:06.4444718Z import triton.language as tl 2023-01-11T21:38:06.4444834Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4444937Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4445072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4445201Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4445240Z 2023-01-11T21:38:06.4445671Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4445770Z @triton.jit 2023-01-11T21:38:06.4445904Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4445981Z xnumel = 64 2023-01-11T21:38:06.4446081Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4446210Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4446294Z xmask = xindex < xnumel 2023-01-11T21:38:06.4446367Z x0 = xindex 2023-01-11T21:38:06.4446458Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4446532Z tmp1 = 1 2023-01-11T21:38:06.4446614Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4446754Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4446845Z ''') 2023-01-11T21:38:06.4446851Z 2023-01-11T21:38:06.4446855Z 2023-01-11T21:38:06.4447031Z triton_fused_lift_fresh_copy_1 = async_compile.triton(''' 2023-01-11T21:38:06.4447108Z import triton 2023-01-11T21:38:06.4447196Z import triton.language as tl 2023-01-11T21:38:06.4447315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4447420Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4447552Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4447679Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4447685Z 2023-01-11T21:38:06.4448069Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.4448145Z @triton.jit 2023-01-11T21:38:06.4448268Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4448339Z xnumel = 1 2023-01-11T21:38:06.4448436Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4448567Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4448653Z xmask = xindex < xnumel 2023-01-11T21:38:06.4448727Z tmp0 = 66.0 2023-01-11T21:38:06.4448892Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.4448981Z ''') 2023-01-11T21:38:06.4448986Z 2023-01-11T21:38:06.4448991Z 2023-01-11T21:38:06.4449151Z triton_fused_add_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.4449222Z import triton 2023-01-11T21:38:06.4449321Z import triton.language as tl 2023-01-11T21:38:06.4449436Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4449538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4449673Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4449799Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4449808Z 2023-01-11T21:38:06.4450210Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4450280Z @triton.jit 2023-01-11T21:38:06.4450418Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4450496Z xnumel = 64 2023-01-11T21:38:06.4450594Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4450724Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4450811Z xmask = xindex < xnumel 2023-01-11T21:38:06.4450884Z x0 = xindex 2023-01-11T21:38:06.4451070Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4451144Z tmp1 = 2 2023-01-11T21:38:06.4451222Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4451355Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4451470Z ''') 2023-01-11T21:38:06.4451476Z 2023-01-11T21:38:06.4451480Z 2023-01-11T21:38:06.4451576Z async_compile.wait(globals()) 2023-01-11T21:38:06.4451656Z del async_compile 2023-01-11T21:38:06.4451661Z 2023-01-11T21:38:06.4451736Z def call(args): 2023-01-11T21:38:06.4451811Z primals_1, = args 2023-01-11T21:38:06.4451889Z args.clear() 2023-01-11T21:38:06.4451985Z with torch.cuda.device(0): 2023-01-11T21:38:06.4452191Z buf0 = empty_strided((1, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4452285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4452425Z triton_fused_add_0.run(primals_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4452505Z del primals_1 2023-01-11T21:38:06.4452696Z buf1 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4452842Z triton_fused_lift_fresh_copy_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.4452980Z buf2 = torch.ops.aten.expand_copy.default(buf1, [64]) 2023-01-11T21:38:06.4453059Z del buf1 2023-01-11T21:38:06.4453135Z buf3 = buf2 2023-01-11T21:38:06.4453241Z assert_size_stride(buf3, (64, ), (1, )) 2023-01-11T21:38:06.4453310Z del buf2 2023-01-11T21:38:06.4453508Z buf4 = empty_strided((1, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4453643Z 
triton_fused_add_1_2.run(buf3, buf4, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4453757Z return (as_strided(buf3, (1, 64), (64, 1)), buf0, buf4, ) 2023-01-11T21:38:06.4453762Z 2023-01-11T21:38:06.4453766Z 2023-01-11T21:38:06.4453847Z if __name__ == "__main__": 2023-01-11T21:38:06.4453965Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4454095Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4454306Z primals_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4454426Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.4454435Z 2023-01-11T21:38:06.4454629Z ok (0.372s) 2023-01-11T21:38:06.4454954Z test_input_mutation3_cuda (__main__.CudaTests) ... [2023-01-11 21:35:03,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 624 2023-01-11T21:38:06.4455244Z [2023-01-11 21:35:03,351] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4455514Z [2023-01-11 21:35:03,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 624 2023-01-11T21:38:06.4455520Z 2023-01-11T21:38:06.4455617Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4455692Z import torch 2023-01-11T21:38:06.4455770Z import random 2023-01-11T21:38:06.4455891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4456015Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4456020Z 2023-01-11T21:38:06.4456095Z aten = torch.ops.aten 2023-01-11T21:38:06.4456237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4456333Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4456339Z 2023-01-11T21:38:06.4456413Z import triton 2023-01-11T21:38:06.4456503Z import triton.language as tl 2023-01-11T21:38:06.4456628Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4456769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4456775Z 2023-01-11T21:38:06.4456779Z 2023-01-11T21:38:06.4456934Z triton_fused_add__0 = async_compile.triton(''' 2023-01-11T21:38:06.4457001Z import triton 2023-01-11T21:38:06.4457092Z import triton.language as tl 2023-01-11T21:38:06.4457264Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4457368Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4457502Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4457629Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4457634Z 2023-01-11T21:38:06.4458180Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4458262Z @triton.jit 2023-01-11T21:38:06.4458395Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4458472Z xnumel = 64 2023-01-11T21:38:06.4458571Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4458702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4458788Z xmask = xindex < xnumel 2023-01-11T21:38:06.4458860Z x0 = xindex 2023-01-11T21:38:06.4459054Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4459121Z tmp1 = 1 2023-01-11T21:38:06.4459205Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4459343Z tl.store(out_ptr0 + 
(x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4459433Z ''') 2023-01-11T21:38:06.4459439Z 2023-01-11T21:38:06.4459443Z 2023-01-11T21:38:06.4459601Z triton_fused_mul__1 = async_compile.triton(''' 2023-01-11T21:38:06.4459678Z import triton 2023-01-11T21:38:06.4459774Z import triton.language as tl 2023-01-11T21:38:06.4459884Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4459991Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4460125Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4460256Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4460261Z 2023-01-11T21:38:06.4460678Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4460751Z @triton.jit 2023-01-11T21:38:06.4460886Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4460963Z xnumel = 64 2023-01-11T21:38:06.4461054Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4461186Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4461271Z xmask = xindex < xnumel 2023-01-11T21:38:06.4461343Z x0 = xindex 2023-01-11T21:38:06.4461561Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4461635Z tmp1 = 2 2023-01-11T21:38:06.4461716Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4461844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4461930Z ''') 2023-01-11T21:38:06.4461936Z 2023-01-11T21:38:06.4461940Z 2023-01-11T21:38:06.4462101Z triton_fused_sigmoid__2 = async_compile.triton(''' 2023-01-11T21:38:06.4462181Z import triton 2023-01-11T21:38:06.4462274Z import triton.language as tl 2023-01-11T21:38:06.4462391Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4462494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4462629Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4462749Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4462754Z 2023-01-11T21:38:06.4463181Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4463257Z @triton.jit 2023-01-11T21:38:06.4463392Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4463467Z xnumel = 64 2023-01-11T21:38:06.4463566Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4463697Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4463782Z xmask = xindex < xnumel 2023-01-11T21:38:06.4463848Z x0 = xindex 2023-01-11T21:38:06.4464039Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4464160Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4464295Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4464381Z ''') 2023-01-11T21:38:06.4464386Z 2023-01-11T21:38:06.4464391Z 2023-01-11T21:38:06.4464558Z triton_fused_add__1_view_3 = async_compile.triton(''' 2023-01-11T21:38:06.4464638Z import triton 2023-01-11T21:38:06.4464726Z import triton.language as tl 2023-01-11T21:38:06.4464842Z from torch._inductor.ir 
import ReductionHint 2023-01-11T21:38:06.4464944Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4465076Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4465203Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4465208Z 2023-01-11T21:38:06.4465627Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4465705Z @triton.jit 2023-01-11T21:38:06.4465840Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4465912Z xnumel = 64 2023-01-11T21:38:06.4466004Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4466136Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4466217Z xmask = xindex < xnumel 2023-01-11T21:38:06.4466290Z x0 = xindex 2023-01-11T21:38:06.4466480Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4466552Z tmp1 = 3 2023-01-11T21:38:06.4466626Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4466760Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4466845Z ''') 2023-01-11T21:38:06.4466852Z 2023-01-11T21:38:06.4466856Z 2023-01-11T21:38:06.4467024Z triton_fused_mul__1_view_4 = async_compile.triton(''' 2023-01-11T21:38:06.4467103Z import triton 2023-01-11T21:38:06.4467200Z import triton.language as tl 2023-01-11T21:38:06.4467317Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4467420Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4467545Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4467676Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4467707Z 2023-01-11T21:38:06.4468134Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4468209Z @triton.jit 2023-01-11T21:38:06.4468345Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4468420Z xnumel = 64 2023-01-11T21:38:06.4468519Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4468651Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4468734Z xmask = xindex < xnumel 2023-01-11T21:38:06.4468808Z x0 = xindex 2023-01-11T21:38:06.4469000Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4469073Z tmp1 = 4 2023-01-11T21:38:06.4469155Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4469291Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4469375Z ''') 2023-01-11T21:38:06.4469380Z 2023-01-11T21:38:06.4469385Z 2023-01-11T21:38:06.4469551Z triton_fused_relu__view_5 = async_compile.triton(''' 2023-01-11T21:38:06.4469621Z import triton 2023-01-11T21:38:06.4469716Z import triton.language as tl 2023-01-11T21:38:06.4469832Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4469936Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4470069Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4470195Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4470201Z 2023-01-11T21:38:06.4470664Z 
@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4470738Z @triton.jit 2023-01-11T21:38:06.4470869Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4470944Z xnumel = 64 2023-01-11T21:38:06.4471043Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4471172Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4471256Z xmask = xindex < xnumel 2023-01-11T21:38:06.4471328Z x0 = xindex 2023-01-11T21:38:06.4471518Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4471630Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4471763Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4471854Z ''') 2023-01-11T21:38:06.4471859Z 2023-01-11T21:38:06.4471863Z 2023-01-11T21:38:06.4471957Z async_compile.wait(globals()) 2023-01-11T21:38:06.4472044Z del async_compile 2023-01-11T21:38:06.4472049Z 2023-01-11T21:38:06.4472128Z def call(args): 2023-01-11T21:38:06.4472204Z arg0_1, = args 2023-01-11T21:38:06.4472274Z args.clear() 2023-01-11T21:38:06.4472373Z with torch.cuda.device(0): 2023-01-11T21:38:06.4472468Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4472608Z triton_fused_add__0.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4472740Z triton_fused_mul__1.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4472884Z triton_fused_sigmoid__2.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473032Z triton_fused_add__1_view_3.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473174Z triton_fused_mul__1_view_4.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473313Z triton_fused_relu__view_5.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4473421Z return (as_strided(arg0_1, (64, ), (1, )), ) 2023-01-11T21:38:06.4473427Z 2023-01-11T21:38:06.4473432Z 2023-01-11T21:38:06.4473518Z if __name__ == "__main__": 2023-01-11T21:38:06.4473662Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4473793Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4473999Z arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4474112Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4474117Z 2023-01-11T21:38:06.4474192Z ok (0.146s) 2023-01-11T21:38:06.4474517Z test_input_mutation4_cuda (__main__.CudaTests) ... 
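test_input_mutation3_cuda above chains six pointwise kernels that all read and write the same buffer arg0_1, then returns a flat view of it. Reconstructed from the kernel constants and names (add_ 1, mul_ 2, sigmoid_, then add_ 3, mul_ 4, and relu_ through a view), a sketch with hypothetical names:

import torch

def fn(x):
    x += 1
    x *= 2
    x.sigmoid_()
    y = x.view(64)
    y += 3
    y *= 4
    return y.relu_()

x = torch.randn(1, 64, device="cuda")
out = torch.compile(fn)(x)  # returns a (64,) view of the mutated input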
[2023-01-11 21:35:03,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 625 2023-01-11T21:38:06.4474767Z [2023-01-11 21:35:03,370] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.4475034Z [2023-01-11 21:35:03,370] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 625 2023-01-11T21:38:06.4475040Z 2023-01-11T21:38:06.4475143Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4475219Z import torch 2023-01-11T21:38:06.4475299Z import random 2023-01-11T21:38:06.4475423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4475549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4475554Z 2023-01-11T21:38:06.4475639Z aten = torch.ops.aten 2023-01-11T21:38:06.4475772Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4475868Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4475873Z 2023-01-11T21:38:06.4475952Z import triton 2023-01-11T21:38:06.4476044Z import triton.language as tl 2023-01-11T21:38:06.4476173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4476395Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4476400Z 2023-01-11T21:38:06.4476405Z 2023-01-11T21:38:06.4476562Z triton_fused_relu__0 = async_compile.triton(''' 2023-01-11T21:38:06.4476633Z import triton 2023-01-11T21:38:06.4476728Z import triton.language as tl 2023-01-11T21:38:06.4476848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4476954Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4477089Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4477215Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4477220Z 2023-01-11T21:38:06.4477641Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0', 'in_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4477717Z @triton.jit 2023-01-11T21:38:06.4477845Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4477926Z xnumel = 64 2023-01-11T21:38:06.4478025Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4478156Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4478243Z xmask = xindex < xnumel 2023-01-11T21:38:06.4478315Z x0 = xindex 2023-01-11T21:38:06.4478511Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4478630Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4478761Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4478847Z ''') 2023-01-11T21:38:06.4478852Z 2023-01-11T21:38:06.4478857Z 2023-01-11T21:38:06.4478953Z async_compile.wait(globals()) 2023-01-11T21:38:06.4479034Z del async_compile 2023-01-11T21:38:06.4479039Z 2023-01-11T21:38:06.4479114Z def call(args): 2023-01-11T21:38:06.4479189Z arg0_1, = args 2023-01-11T21:38:06.4479266Z args.clear() 2023-01-11T21:38:06.4479356Z with torch.cuda.device(0): 2023-01-11T21:38:06.4479452Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4479593Z triton_fused_relu__0.run(arg0_1, arg0_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4479675Z return (arg0_1, ) 2023-01-11T21:38:06.4479680Z 2023-01-11T21:38:06.4479684Z 2023-01-11T21:38:06.4479792Z if 
__name__ == "__main__": 2023-01-11T21:38:06.4479916Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4480042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4480247Z arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4480354Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4480359Z 2023-01-11T21:38:06.4480431Z ok (0.019s) 2023-01-11T21:38:06.4480900Z test_invalid_operand_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4481039Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4481302Z [2023-01-11 21:35:03,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 626 2023-01-11T21:38:06.4481566Z [2023-01-11 21:35:03,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 626 2023-01-11T21:38:06.4481985Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4482142Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4482399Z [2023-01-11 21:35:03,950] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 627 2023-01-11T21:38:06.4482665Z [2023-01-11 21:35:04,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 627 2023-01-11T21:38:06.4482671Z 2023-01-11T21:38:06.4482771Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4482841Z import torch 2023-01-11T21:38:06.4482918Z import random 2023-01-11T21:38:06.4483039Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4483164Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4483169Z 2023-01-11T21:38:06.4483252Z aten = torch.ops.aten 2023-01-11T21:38:06.4483391Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4483489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4483495Z 2023-01-11T21:38:06.4483569Z import triton 2023-01-11T21:38:06.4483660Z import triton.language as tl 2023-01-11T21:38:06.4483784Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4483926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4483932Z 2023-01-11T21:38:06.4483936Z 2023-01-11T21:38:06.4484105Z triton_fused_embedding_0 = async_compile.triton(''' 2023-01-11T21:38:06.4484180Z import triton 2023-01-11T21:38:06.4484274Z import triton.language as tl 2023-01-11T21:38:06.4484390Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4484487Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4484620Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4484747Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4484752Z 2023-01-11T21:38:06.4485199Z @pointwise(size_hints=[1048576], 
filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4485275Z @triton.jit 2023-01-11T21:38:06.4485433Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4485512Z xnumel = 786432 2023-01-11T21:38:06.4485638Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4485765Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4485850Z xmask = xindex < xnumel 2023-01-11T21:38:06.4485935Z x1 = (xindex // 768) % 128 2023-01-11T21:38:06.4486018Z x2 = (xindex // 98304) 2023-01-11T21:38:06.4486095Z x3 = (xindex // 768) 2023-01-11T21:38:06.4486174Z x0 = xindex % 768 2023-01-11T21:38:06.4486246Z x4 = xindex 2023-01-11T21:38:06.4486340Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.4486438Z tmp8 = tl.load(in_ptr2 + (x3), xmask) 2023-01-11T21:38:06.4486510Z tmp0 = x1 2023-01-11T21:38:06.4486587Z tmp1 = 0 2023-01-11T21:38:06.4486668Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.4486741Z tmp4 = 1 2023-01-11T21:38:06.4486821Z tmp5 = tmp0 >= tmp4 2023-01-11T21:38:06.4487051Z tmp6 = tl.load(in_ptr1 + ((-1) + x1 + (127*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0) 2023-01-11T21:38:06.4487149Z tmp7 = tl.where(tmp5, tmp6, 0) 2023-01-11T21:38:06.4487248Z tmp9 = tl.where(tmp5, tmp7, tmp8) 2023-01-11T21:38:06.4487346Z tmp10 = tl.where(tmp2, tmp3, tmp9) 2023-01-11T21:38:06.4487463Z tmp11 = tl.load(in_ptr3 + (x0 + (768*tmp10)), xmask) 2023-01-11T21:38:06.4487600Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4487687Z ''') 2023-01-11T21:38:06.4487692Z 2023-01-11T21:38:06.4487698Z 2023-01-11T21:38:06.4487792Z async_compile.wait(globals()) 2023-01-11T21:38:06.4487865Z del async_compile 2023-01-11T21:38:06.4487870Z 2023-01-11T21:38:06.4487945Z def call(args): 2023-01-11T21:38:06.4488051Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:06.4488158Z args.clear() 2023-01-11T21:38:06.4488254Z with torch.cuda.device(0): 2023-01-11T21:38:06.4488481Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4488577Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4488742Z triton_fused_embedding_0.run(arg3_1, arg2_1, arg4_1, arg0_1, buf0, 786432, grid=grid(786432), stream=stream0) 2023-01-11T21:38:06.4488816Z del arg0_1 2023-01-11T21:38:06.4488890Z del arg2_1 2023-01-11T21:38:06.4488963Z del arg3_1 2023-01-11T21:38:06.4489037Z del arg4_1 2023-01-11T21:38:06.4489115Z return (buf0, ) 2023-01-11T21:38:06.4489120Z 2023-01-11T21:38:06.4489124Z 2023-01-11T21:38:06.4489206Z if __name__ == "__main__": 2023-01-11T21:38:06.4489320Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4489449Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4489663Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4489870Z arg1_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490071Z arg2_1 = rand_strided((8, 127), (127, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490270Z arg3_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4490468Z arg4_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 
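# Annotation (inferred from the generated kernel, not from the test source):
# arg0_1 is a (50005, 768) embedding table and arg2_1/arg3_1/arg4_1 hold int64
# token indices. For each output position the kernel builds one index: position
# 0 of a sequence takes it from arg3_1, positions >= 1 take the previous token
# from arg2_1 (a decoder-style right shift, the masked tl.load with other=0),
# and it then gathers the 768-wide row from arg0_1.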
2023-01-11T21:38:06.4490612Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:06.4490617Z 2023-01-11T21:38:06.4490622Z 2023-01-11T21:38:06.4490722Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4490792Z import torch 2023-01-11T21:38:06.4490867Z import random 2023-01-11T21:38:06.4490987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4491110Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4491119Z 2023-01-11T21:38:06.4491204Z aten = torch.ops.aten 2023-01-11T21:38:06.4491340Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4491437Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4491442Z 2023-01-11T21:38:06.4491512Z import triton 2023-01-11T21:38:06.4491606Z import triton.language as tl 2023-01-11T21:38:06.4491758Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4491900Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4491905Z 2023-01-11T21:38:06.4491910Z 2023-01-11T21:38:06.4492076Z triton_fused_embedding_0 = async_compile.triton(''' 2023-01-11T21:38:06.4492154Z import triton 2023-01-11T21:38:06.4492246Z import triton.language as tl 2023-01-11T21:38:06.4492360Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4492457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4492589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4492714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4492722Z 2023-01-11T21:38:06.4493175Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4493251Z @triton.jit 2023-01-11T21:38:06.4493412Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4493490Z xnumel = 786432 2023-01-11T21:38:06.4493590Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4493715Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4493801Z xmask = xindex < xnumel 2023-01-11T21:38:06.4493887Z x1 = (xindex // 768) % 128 2023-01-11T21:38:06.4493966Z x2 = (xindex // 98304) 2023-01-11T21:38:06.4494047Z x3 = (xindex // 768) 2023-01-11T21:38:06.4494124Z x0 = xindex % 768 2023-01-11T21:38:06.4494224Z x4 = xindex 2023-01-11T21:38:06.4494318Z tmp3 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.4494415Z tmp8 = tl.load(in_ptr2 + (x3), xmask) 2023-01-11T21:38:06.4494599Z tmp0 = x1 2023-01-11T21:38:06.4494674Z tmp1 = 0 2023-01-11T21:38:06.4494755Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.4494827Z tmp4 = 1 2023-01-11T21:38:06.4494908Z tmp5 = tmp0 >= tmp4 2023-01-11T21:38:06.4495145Z tmp6 = tl.load(in_ptr1 + ((-1) + x1 + (127*x2) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, other=0) 2023-01-11T21:38:06.4495244Z tmp7 = tl.where(tmp5, tmp6, 0) 2023-01-11T21:38:06.4495341Z tmp9 = tl.where(tmp5, tmp7, tmp8) 2023-01-11T21:38:06.4495438Z tmp10 = tl.where(tmp2, tmp3, tmp9) 2023-01-11T21:38:06.4495571Z tmp11 = tl.load(in_ptr3 + (x0 + (768*tmp10)), xmask).to(tl.float32) 2023-01-11T21:38:06.4495711Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.4495808Z ''') 2023-01-11T21:38:06.4495817Z 2023-01-11T21:38:06.4495821Z 2023-01-11T21:38:06.4495910Z 
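# Note: this module is the same embedding graph recompiled for float16 inputs;
# the kernel above matches the float32 version except for the *fp16 pointers in
# its signature and the trailing .to(tl.float32) on the gathered row.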
async_compile.wait(globals()) 2023-01-11T21:38:06.4495990Z del async_compile 2023-01-11T21:38:06.4495995Z 2023-01-11T21:38:06.4496070Z def call(args): 2023-01-11T21:38:06.4496175Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1 = args 2023-01-11T21:38:06.4496252Z args.clear() 2023-01-11T21:38:06.4496350Z with torch.cuda.device(0): 2023-01-11T21:38:06.4496572Z buf0 = empty_strided((8, 128, 768), (98304, 768, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4496665Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4496827Z triton_fused_embedding_0.run(arg3_1, arg2_1, arg4_1, arg0_1, buf0, 786432, grid=grid(786432), stream=stream0) 2023-01-11T21:38:06.4496902Z del arg0_1 2023-01-11T21:38:06.4496975Z del arg2_1 2023-01-11T21:38:06.4497048Z del arg3_1 2023-01-11T21:38:06.4497121Z del arg4_1 2023-01-11T21:38:06.4497259Z return (buf0, ) 2023-01-11T21:38:06.4497268Z 2023-01-11T21:38:06.4497272Z 2023-01-11T21:38:06.4497355Z if __name__ == "__main__": 2023-01-11T21:38:06.4497469Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4497596Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4497868Z arg0_1 = rand_strided((50005, 768), (768, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4498075Z arg1_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498276Z arg2_1 = rand_strided((8, 127), (127, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498473Z arg3_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498671Z arg4_1 = rand_strided((8, 128), (128, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4498814Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1])) 2023-01-11T21:38:06.4498819Z 2023-01-11T21:38:06.4498887Z ok (0.719s) 2023-01-11T21:38:06.4499348Z test_isinf2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4499485Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4499744Z [2023-01-11 21:35:04,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 628 2023-01-11T21:38:06.4500009Z [2023-01-11 21:35:04,176] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 628 2023-01-11T21:38:06.4500427Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4500591Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4500852Z [2023-01-11 21:35:04,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 629 2023-01-11T21:38:06.4501112Z [2023-01-11 21:35:04,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 629 2023-01-11T21:38:06.4501118Z 2023-01-11T21:38:06.4501215Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4501287Z import torch 2023-01-11T21:38:06.4501354Z import random 2023-01-11T21:38:06.4501472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4501596Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4501601Z 2023-01-11T21:38:06.4501685Z aten = torch.ops.aten 2023-01-11T21:38:06.4501821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4501920Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4501925Z 2023-01-11T21:38:06.4501999Z import triton 2023-01-11T21:38:06.4502090Z import triton.language as tl 2023-01-11T21:38:06.4502207Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4502346Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4502352Z 2023-01-11T21:38:06.4502356Z 2023-01-11T21:38:06.4502507Z triton_fused_eq_0 = async_compile.triton(''' 2023-01-11T21:38:06.4502580Z import triton 2023-01-11T21:38:06.4502672Z import triton.language as tl 2023-01-11T21:38:06.4502785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4502886Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4503010Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4503134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4503140Z 2023-01-11T21:38:06.4503536Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4503614Z @triton.jit 2023-01-11T21:38:06.4503746Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4503846Z xnumel = 5 2023-01-11T21:38:06.4503946Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4504076Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4504152Z xmask = xindex < xnumel 2023-01-11T21:38:06.4504224Z x0 = xindex 2023-01-11T21:38:06.4504321Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4504393Z tmp1 = x0 2023-01-11T21:38:06.4504461Z tmp2 = 2 2023-01-11T21:38:06.4504540Z tmp3 = tmp1 < tmp2 2023-01-11T21:38:06.4504609Z tmp4 = 1 2023-01-11T21:38:06.4504680Z tmp5 = tmp1 < tmp4 2023-01-11T21:38:06.4504750Z tmp6 = 1.0 2023-01-11T21:38:06.4504827Z tmp7 = float("inf") 2023-01-11T21:38:06.4504926Z tmp8 = tl.where(tmp5, tmp6, tmp7) 2023-01-11T21:38:06.4504994Z tmp9 = 3 2023-01-11T21:38:06.4505073Z tmp10 = tmp1 < tmp9 2023-01-11T21:38:06.4505138Z tmp11 = 2.0 2023-01-11T21:38:06.4505208Z tmp12 = 4 2023-01-11T21:38:06.4505286Z tmp13 = tmp1 < tmp12 2023-01-11T21:38:06.4505407Z tmp14 = float("-inf") 2023-01-11T21:38:06.4505491Z tmp15 = float("nan") 2023-01-11T21:38:06.4505607Z tmp16 = tl.where(tmp13, tmp14, tmp15) 2023-01-11T21:38:06.4505714Z tmp17 = tl.where(tmp10, tmp11, tmp16) 2023-01-11T21:38:06.4505824Z tmp18 = tl.where(tmp3, tmp8, 
tmp17) 2023-01-11T21:38:06.4505905Z tmp19 = tmp0 == tmp18 2023-01-11T21:38:06.4506041Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.4506124Z ''') 2023-01-11T21:38:06.4506130Z 2023-01-11T21:38:06.4506134Z 2023-01-11T21:38:06.4506227Z async_compile.wait(globals()) 2023-01-11T21:38:06.4506302Z del async_compile 2023-01-11T21:38:06.4506338Z 2023-01-11T21:38:06.4506413Z def call(args): 2023-01-11T21:38:06.4506487Z arg0_1, = args 2023-01-11T21:38:06.4506554Z args.clear() 2023-01-11T21:38:06.4506649Z with torch.cuda.device(0): 2023-01-11T21:38:06.4506840Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4506931Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4507065Z triton_fused_eq_0.run(arg0_1, buf0, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4507138Z del arg0_1 2023-01-11T21:38:06.4507214Z return (buf0, ) 2023-01-11T21:38:06.4507219Z 2023-01-11T21:38:06.4507223Z 2023-01-11T21:38:06.4507296Z if __name__ == "__main__": 2023-01-11T21:38:06.4507412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4507539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4507736Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4507848Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4507856Z 2023-01-11T21:38:06.4507860Z 2023-01-11T21:38:06.4507957Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4508029Z import torch 2023-01-11T21:38:06.4508102Z import random 2023-01-11T21:38:06.4508214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4508340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4508347Z 2023-01-11T21:38:06.4508427Z aten = torch.ops.aten 2023-01-11T21:38:06.4508563Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4508657Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4508662Z 2023-01-11T21:38:06.4508735Z import triton 2023-01-11T21:38:06.4508829Z import triton.language as tl 2023-01-11T21:38:06.4508947Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4509086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4509091Z 2023-01-11T21:38:06.4509096Z 2023-01-11T21:38:06.4509247Z triton_fused_eq_0 = async_compile.triton(''' 2023-01-11T21:38:06.4509322Z import triton 2023-01-11T21:38:06.4509413Z import triton.language as tl 2023-01-11T21:38:06.4509526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4509626Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4509786Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4509905Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4509917Z 2023-01-11T21:38:06.4510307Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4510380Z @triton.jit 2023-01-11T21:38:06.4510515Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4510588Z xnumel = 5 2023-01-11T21:38:06.4510685Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4510813Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4510899Z xmask = xindex < xnumel 2023-01-11T21:38:06.4510962Z x0 = xindex 2023-01-11T21:38:06.4511080Z 
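    # fp16 variant of the same kernel: the loads below upcast to fp32 before
    # comparing. The tl.where chain keyed on x0 materializes the constant
    # vector [1.0, inf, 2.0, -inf, nan], so tmp19 is the elementwise test
    #   x == tensor([1.0, inf, 2.0, -inf, nan])
    # with the nan lane always False (nan != nan).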
tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4511149Z tmp1 = x0 2023-01-11T21:38:06.4511218Z tmp2 = 2 2023-01-11T21:38:06.4511302Z tmp3 = tmp1 < tmp2 2023-01-11T21:38:06.4511372Z tmp4 = 1 2023-01-11T21:38:06.4511451Z tmp5 = tmp1 < tmp4 2023-01-11T21:38:06.4511515Z tmp6 = 1.0 2023-01-11T21:38:06.4511590Z tmp7 = float("inf") 2023-01-11T21:38:06.4511685Z tmp8 = tl.where(tmp5, tmp6, tmp7) 2023-01-11T21:38:06.4511753Z tmp9 = 3 2023-01-11T21:38:06.4511832Z tmp10 = tmp1 < tmp9 2023-01-11T21:38:06.4511903Z tmp11 = 2.0 2023-01-11T21:38:06.4511967Z tmp12 = 4 2023-01-11T21:38:06.4512050Z tmp13 = tmp1 < tmp12 2023-01-11T21:38:06.4512164Z tmp14 = float("-inf") 2023-01-11T21:38:06.4512246Z tmp15 = float("nan") 2023-01-11T21:38:06.4512347Z tmp16 = tl.where(tmp13, tmp14, tmp15) 2023-01-11T21:38:06.4512482Z tmp17 = tl.where(tmp10, tmp11, tmp16) 2023-01-11T21:38:06.4512579Z tmp18 = tl.where(tmp3, tmp8, tmp17) 2023-01-11T21:38:06.4512652Z tmp19 = tmp0 == tmp18 2023-01-11T21:38:06.4512788Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.4512876Z ''') 2023-01-11T21:38:06.4512881Z 2023-01-11T21:38:06.4512886Z 2023-01-11T21:38:06.4512980Z async_compile.wait(globals()) 2023-01-11T21:38:06.4513054Z del async_compile 2023-01-11T21:38:06.4513059Z 2023-01-11T21:38:06.4513136Z def call(args): 2023-01-11T21:38:06.4513209Z arg0_1, = args 2023-01-11T21:38:06.4513284Z args.clear() 2023-01-11T21:38:06.4513369Z with torch.cuda.device(0): 2023-01-11T21:38:06.4513555Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4513643Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4513776Z triton_fused_eq_0.run(arg0_1, buf0, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4513850Z del arg0_1 2023-01-11T21:38:06.4513926Z return (buf0, ) 2023-01-11T21:38:06.4513931Z 2023-01-11T21:38:06.4513935Z 2023-01-11T21:38:06.4514014Z if __name__ == "__main__": 2023-01-11T21:38:06.4514124Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4514251Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4514446Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4514557Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4514562Z 2023-01-11T21:38:06.4514632Z ok (0.179s) 2023-01-11T21:38:06.4515085Z test_isinf_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4515220Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4515480Z [2023-01-11 21:35:04,282] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 630 2023-01-11T21:38:06.4515767Z [2023-01-11 21:35:04,420] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 630 2023-01-11T21:38:06.4516186Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4516315Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4516562Z [2023-01-11 21:35:04,434] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 631 2023-01-11T21:38:06.4516828Z [2023-01-11 21:35:04,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 631 2023-01-11T21:38:06.4517239Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4517368Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4517623Z [2023-01-11 21:35:04,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 632 2023-01-11T21:38:06.4517882Z [2023-01-11 21:35:04,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 632 2023-01-11T21:38:06.4518294Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4518451Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4518700Z [2023-01-11 21:35:04,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 633 2023-01-11T21:38:06.4518705Z 2023-01-11T21:38:06.4518802Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4518881Z import torch 2023-01-11T21:38:06.4518948Z import random 2023-01-11T21:38:06.4519065Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4519190Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4519195Z 2023-01-11T21:38:06.4519277Z aten = torch.ops.aten 2023-01-11T21:38:06.4519411Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4519509Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4519514Z 2023-01-11T21:38:06.4519588Z import triton 2023-01-11T21:38:06.4519672Z import triton.language as tl 2023-01-11T21:38:06.4519795Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4519936Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4519942Z 2023-01-11T21:38:06.4519946Z 2023-01-11T21:38:06.4520114Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4520186Z import triton 2023-01-11T21:38:06.4520275Z import triton.language as tl 2023-01-11T21:38:06.4520386Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4520485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4520611Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4520738Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4520743Z 2023-01-11T21:38:06.4521152Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4521226Z @triton.jit 2023-01-11T21:38:06.4521392Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4521465Z xnumel = 5 2023-01-11T21:38:06.4521558Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4521684Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4521760Z xmask = xindex < xnumel 2023-01-11T21:38:06.4521829Z x0 = xindex 2023-01-11T21:38:06.4522020Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4522120Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4522216Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4522313Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4522451Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4522575Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4522658Z ''') 2023-01-11T21:38:06.4522664Z 2023-01-11T21:38:06.4522668Z 2023-01-11T21:38:06.4522760Z async_compile.wait(globals()) 2023-01-11T21:38:06.4522838Z del async_compile 2023-01-11T21:38:06.4522843Z 2023-01-11T21:38:06.4522917Z def call(args): 2023-01-11T21:38:06.4522991Z arg0_1, = args 2023-01-11T21:38:06.4523066Z args.clear() 2023-01-11T21:38:06.4523152Z with torch.cuda.device(0): 2023-01-11T21:38:06.4523341Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4523529Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4523621Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4523770Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4523872Z del arg0_1 2023-01-11T21:38:06.4523954Z return (buf0, buf1, ) 2023-01-11T21:38:06.4523959Z 2023-01-11T21:38:06.4523964Z 2023-01-11T21:38:06.4524044Z if __name__ == "__main__": 2023-01-11T21:38:06.4524155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4524283Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4524480Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4524592Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4524598Z 2023-01-11T21:38:06.4524602Z 2023-01-11T21:38:06.4524698Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4524772Z import torch 2023-01-11T21:38:06.4524845Z import random 2023-01-11T21:38:06.4524961Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4525078Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4525083Z 2023-01-11T21:38:06.4525165Z aten = torch.ops.aten 2023-01-11T21:38:06.4525303Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4525397Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4525402Z 2023-01-11T21:38:06.4525476Z import triton 2023-01-11T21:38:06.4525567Z import triton.language as tl 2023-01-11T21:38:06.4525689Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4525823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4525833Z 2023-01-11T21:38:06.4525838Z 2023-01-11T21:38:06.4525997Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4526072Z import triton 2023-01-11T21:38:06.4526163Z import triton.language as tl 2023-01-11T21:38:06.4526275Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4526376Z 
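# Same isinf/isnan kernel as the fp32 one above, respecialized for fp16
# inputs (signature '*fp16' below): Inductor emits one kernel per input
# dtype, which is why this module repeats with only the pointer type and the
# .to(tl.float32) casts changed.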
from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4526509Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4526630Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4526637Z 2023-01-11T21:38:06.4527044Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4527111Z @triton.jit 2023-01-11T21:38:06.4527280Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4527351Z xnumel = 5 2023-01-11T21:38:06.4527446Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4527573Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4527655Z xmask = xindex < xnumel 2023-01-11T21:38:06.4527724Z x0 = xindex 2023-01-11T21:38:06.4527931Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4528048Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4528145Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4528245Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4528377Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4528509Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4528593Z ''') 2023-01-11T21:38:06.4528598Z 2023-01-11T21:38:06.4528603Z 2023-01-11T21:38:06.4528696Z async_compile.wait(globals()) 2023-01-11T21:38:06.4528765Z del async_compile 2023-01-11T21:38:06.4528771Z 2023-01-11T21:38:06.4528846Z def call(args): 2023-01-11T21:38:06.4528917Z arg0_1, = args 2023-01-11T21:38:06.4528991Z args.clear() 2023-01-11T21:38:06.4529080Z with torch.cuda.device(0): 2023-01-11T21:38:06.4529272Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4529460Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4529545Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4529692Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4529794Z del arg0_1 2023-01-11T21:38:06.4529876Z return (buf0, buf1, ) 2023-01-11T21:38:06.4529881Z 2023-01-11T21:38:06.4529886Z 2023-01-11T21:38:06.4529965Z if __name__ == "__main__": 2023-01-11T21:38:06.4530082Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4530207Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4530402Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4530508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4530513Z 2023-01-11T21:38:06.4530524Z 2023-01-11T21:38:06.4530613Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4530685Z import torch 2023-01-11T21:38:06.4530756Z import random 2023-01-11T21:38:06.4530875Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4531000Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4531009Z 2023-01-11T21:38:06.4531090Z aten = torch.ops.aten 2023-01-11T21:38:06.4531224Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4531311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4531316Z 2023-01-11T21:38:06.4531387Z import triton 2023-01-11T21:38:06.4531477Z import triton.language as tl 2023-01-11T21:38:06.4531604Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4531742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4531748Z 2023-01-11T21:38:06.4531752Z 2023-01-11T21:38:06.4531917Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4531990Z import triton 2023-01-11T21:38:06.4532082Z import triton.language as tl 2023-01-11T21:38:06.4532188Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4532289Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4532422Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4532544Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4532551Z 2023-01-11T21:38:06.4532958Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4533057Z @triton.jit 2023-01-11T21:38:06.4533200Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4533270Z xnumel = 5 2023-01-11T21:38:06.4533360Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4533487Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4533569Z xmask = xindex < xnumel 2023-01-11T21:38:06.4533640Z x0 = xindex 2023-01-11T21:38:06.4533829Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4533925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4534022Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4534114Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4534245Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4534379Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4534462Z ''') 2023-01-11T21:38:06.4534467Z 2023-01-11T21:38:06.4534596Z 2023-01-11T21:38:06.4534691Z async_compile.wait(globals()) 2023-01-11T21:38:06.4534766Z del async_compile 2023-01-11T21:38:06.4534771Z 2023-01-11T21:38:06.4534846Z def call(args): 2023-01-11T21:38:06.4534913Z arg0_1, = args 2023-01-11T21:38:06.4534987Z args.clear() 2023-01-11T21:38:06.4535079Z with torch.cuda.device(0): 2023-01-11T21:38:06.4535270Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4535481Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4535586Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4535745Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4535861Z del arg0_1 2023-01-11T21:38:06.4535937Z return (buf0, buf1, ) 2023-01-11T21:38:06.4535942Z 2023-01-11T21:38:06.4535946Z 2023-01-11T21:38:06.4536024Z if __name__ == "__main__": 2023-01-11T21:38:06.4536146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4536273Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4536469Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4536582Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4536587Z 2023-01-11T21:38:06.4536850Z [2023-01-11 21:35:04,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 633 2023-01-11T21:38:06.4536856Z 2023-01-11T21:38:06.4536951Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4537017Z import torch 
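# This fp64 isinf/isnan module appears twice in the log: the generated
# wrapper is printed once per compiled graph, and graphs 632 and 633
# evidently lowered to the same kernel.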
2023-01-11T21:38:06.4537089Z import random 2023-01-11T21:38:06.4537272Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4537396Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4537401Z 2023-01-11T21:38:06.4537483Z aten = torch.ops.aten 2023-01-11T21:38:06.4537620Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4537718Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4537724Z 2023-01-11T21:38:06.4537798Z import triton 2023-01-11T21:38:06.4537883Z import triton.language as tl 2023-01-11T21:38:06.4538007Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4538143Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4538149Z 2023-01-11T21:38:06.4538153Z 2023-01-11T21:38:06.4538318Z triton_fused_isinf_isnan_0 = async_compile.triton(''' 2023-01-11T21:38:06.4538390Z import triton 2023-01-11T21:38:06.4538482Z import triton.language as tl 2023-01-11T21:38:06.4538597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4538695Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4538827Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4538950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4538955Z 2023-01-11T21:38:06.4539407Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i1', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4539482Z @triton.jit 2023-01-11T21:38:06.4539625Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4539698Z xnumel = 5 2023-01-11T21:38:06.4539794Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4539916Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4539997Z xmask = xindex < xnumel 2023-01-11T21:38:06.4540065Z x0 = xindex 2023-01-11T21:38:06.4540253Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4540354Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4540454Z tmp1 = tl.libdevice.isinf(tmp0) 2023-01-11T21:38:06.4540549Z tmp3 = tl.libdevice.isnan(tmp2) 2023-01-11T21:38:06.4540674Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4540810Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4540894Z ''') 2023-01-11T21:38:06.4540899Z 2023-01-11T21:38:06.4540904Z 2023-01-11T21:38:06.4540999Z async_compile.wait(globals()) 2023-01-11T21:38:06.4541075Z del async_compile 2023-01-11T21:38:06.4541080Z 2023-01-11T21:38:06.4541154Z def call(args): 2023-01-11T21:38:06.4541227Z arg0_1, = args 2023-01-11T21:38:06.4541303Z args.clear() 2023-01-11T21:38:06.4541388Z with torch.cuda.device(0): 2023-01-11T21:38:06.4541575Z buf0 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4541792Z buf1 = empty_strided((5, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4541886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4542039Z triton_fused_isinf_isnan_0.run(arg0_1, buf0, buf1, 5, grid=grid(5), stream=stream0) 2023-01-11T21:38:06.4542112Z del arg0_1 2023-01-11T21:38:06.4542201Z return (buf0, buf1, ) 2023-01-11T21:38:06.4542206Z 2023-01-11T21:38:06.4542211Z 2023-01-11T21:38:06.4542292Z if __name__ == "__main__": 2023-01-11T21:38:06.4542404Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4542531Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4542732Z arg0_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4542847Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4542852Z 2023-01-11T21:38:06.4542925Z ok (0.485s) 2023-01-11T21:38:06.4543443Z test_kernel_names_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.4543526Z warnings.warn( 2023-01-11T21:38:06.4543789Z [2023-01-11 21:35:04,763] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 634 2023-01-11T21:38:06.4544055Z [2023-01-11 21:35:04,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 634 2023-01-11T21:38:06.4544061Z 2023-01-11T21:38:06.4544154Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4544229Z import torch 2023-01-11T21:38:06.4544308Z import random 2023-01-11T21:38:06.4544429Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4544554Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4544559Z 2023-01-11T21:38:06.4544642Z aten = torch.ops.aten 2023-01-11T21:38:06.4544782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4544875Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4544880Z 2023-01-11T21:38:06.4544957Z import triton 2023-01-11T21:38:06.4545050Z import triton.language as tl 2023-01-11T21:38:06.4545178Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4545353Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4545359Z 2023-01-11T21:38:06.4545363Z 2023-01-11T21:38:06.4545499Z triton__0 = async_compile.triton(''' 2023-01-11T21:38:06.4545577Z import triton 2023-01-11T21:38:06.4545687Z import triton.language as tl 2023-01-11T21:38:06.4545808Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4545919Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4546048Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4546173Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4546178Z 2023-01-11T21:38:06.4546580Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4546656Z @triton.jit 2023-01-11T21:38:06.4546789Z def triton__0(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4546862Z xnumel = 8 2023-01-11T21:38:06.4546952Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4547079Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4547160Z xmask = xindex < xnumel 2023-01-11T21:38:06.4547228Z x0 = xindex 2023-01-11T21:38:06.4547323Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4547394Z tmp1 = 2 2023-01-11T21:38:06.4547472Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.4547599Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4547684Z ''') 2023-01-11T21:38:06.4547690Z 2023-01-11T21:38:06.4547695Z 2023-01-11T21:38:06.4547813Z async_compile.wait(globals()) 
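# Note the kernel name in the module above: a bare `triton__0` instead of the
# descriptive `triton_fused_*` names seen elsewhere in this log, which is the
# behavior test_kernel_names_cuda appears to exercise (the config option that
# toggles descriptive kernel names is not shown here). The kernel itself is a
# trivial pointwise doubling of an 8-element fp32 tensor; in eager terms:
#   y = x * 2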
2023-01-11T21:38:06.4547887Z del async_compile 2023-01-11T21:38:06.4547893Z 2023-01-11T21:38:06.4547969Z def call(args): 2023-01-11T21:38:06.4548039Z arg0_1, = args 2023-01-11T21:38:06.4548115Z args.clear() 2023-01-11T21:38:06.4548200Z with torch.cuda.device(0): 2023-01-11T21:38:06.4548399Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4548490Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4548614Z triton__0.run(arg0_1, buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4548685Z del arg0_1 2023-01-11T21:38:06.4548761Z return (buf0, ) 2023-01-11T21:38:06.4548767Z 2023-01-11T21:38:06.4548771Z 2023-01-11T21:38:06.4548852Z if __name__ == "__main__": 2023-01-11T21:38:06.4548967Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4549086Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4549280Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4549394Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4549399Z 2023-01-11T21:38:06.4549467Z ok (0.324s) 2023-01-11T21:38:06.4549626Z test_kwargs_cuda (__main__.CudaTests) ... skip: histogramdd only supports cpu (0.000s) 2023-01-11T21:38:06.4550078Z test_l1_loss_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4550209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4550466Z [2023-01-11 21:35:05,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 635 2023-01-11T21:38:06.4550729Z [2023-01-11 21:35:05,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 635 2023-01-11T21:38:06.4550738Z 2023-01-11T21:38:06.4550834Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4550901Z import torch 2023-01-11T21:38:06.4550978Z import random 2023-01-11T21:38:06.4551124Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4551247Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4551251Z 2023-01-11T21:38:06.4551332Z aten = torch.ops.aten 2023-01-11T21:38:06.4551469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4551566Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4551571Z 2023-01-11T21:38:06.4551639Z import triton 2023-01-11T21:38:06.4551730Z import triton.language as tl 2023-01-11T21:38:06.4551855Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4551993Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4551999Z 2023-01-11T21:38:06.4552006Z 2023-01-11T21:38:06.4552202Z triton_fused_abs_1_mean_mean_1_pow_1_sub_sub_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4552279Z import triton 2023-01-11T21:38:06.4552369Z import triton.language as tl 2023-01-11T21:38:06.4552481Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4552577Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4552707Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4552830Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4552835Z 2023-01-11T21:38:06.4552925Z 
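# The reduction kernel below fuses the two reductions of test_l1_loss_cuda
# into a single pass over rnumel = 1536 = 2 * 3 * 16 * 16 elements: _tmp4
# accumulates sum(|a - b|) and _tmp9 accumulates sum((a - b)**2), and both
# are divided by 1536 at the end. An eager-mode reading:
#   l1  = (a - b).abs().mean()
#   mse = ((a - b) ** 2).mean()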
@reduction(size_hints=[1, 2048], 2023-01-11T21:38:06.4553039Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4553122Z filename=__file__, 2023-01-11T21:38:06.4553545Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.4553652Z @triton.jit 2023-01-11T21:38:06.4553837Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4553909Z xnumel = 1 2023-01-11T21:38:06.4553985Z rnumel = 1536 2023-01-11T21:38:06.4554087Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4554224Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4554308Z xmask = xindex < xnumel 2023-01-11T21:38:06.4554425Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4554537Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4554654Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4554760Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4554847Z rindex = roffset + rbase 2023-01-11T21:38:06.4554931Z rmask = rindex < rnumel 2023-01-11T21:38:06.4555006Z r0 = rindex 2023-01-11T21:38:06.4555201Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4555410Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4555522Z tmp5 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.4555634Z tmp6 = tl.load(in_ptr1 + (r0), rmask) 2023-01-11T21:38:06.4555748Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.4555833Z tmp3 = tl.abs(tmp2) 2023-01-11T21:38:06.4555953Z _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4) 2023-01-11T21:38:06.4556067Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.4556141Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.4556262Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.4556374Z tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4556484Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4556557Z tmp10 = 1536 2023-01-11T21:38:06.4556640Z tmp11 = tmp4 / tmp10 2023-01-11T21:38:06.4556720Z tmp12 = tmp9 / tmp10 2023-01-11T21:38:06.4556853Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, None) 2023-01-11T21:38:06.4556990Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp12, None) 2023-01-11T21:38:06.4557074Z ''') 2023-01-11T21:38:06.4557107Z 2023-01-11T21:38:06.4557112Z 2023-01-11T21:38:06.4557205Z async_compile.wait(globals()) 2023-01-11T21:38:06.4557285Z del async_compile 2023-01-11T21:38:06.4557290Z 2023-01-11T21:38:06.4557363Z def call(args): 2023-01-11T21:38:06.4557443Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4557518Z args.clear() 2023-01-11T21:38:06.4557603Z with torch.cuda.device(0): 2023-01-11T21:38:06.4557792Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4557981Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4558073Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4558165Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.4558255Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4558429Z triton_fused_abs_1_mean_mean_1_pow_1_sub_sub_1_0.run(buf2, buf3, arg0_1, arg1_1, 1, 1536, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.4558495Z del arg0_1 
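    # The wrapper drops its argument references as soon as the kernel launch
    # is queued on stream0, letting the caching allocator recycle that memory;
    # buf2/buf3 above alias buf0/buf1 (marked "# reuse") because the kernel
    # writes the final means through the mutated in_out_ptr0/in_out_ptr1.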
2023-01-11T21:38:06.4558569Z del arg1_1 2023-01-11T21:38:06.4558652Z return (buf2, buf3, ) 2023-01-11T21:38:06.4558657Z 2023-01-11T21:38:06.4558661Z 2023-01-11T21:38:06.4558740Z if __name__ == "__main__": 2023-01-11T21:38:06.4558857Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4558982Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4559206Z arg0_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4559420Z arg1_1 = rand_strided((2, 3, 16, 16), (768, 256, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4559533Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4559575Z 2023-01-11T21:38:06.4559642Z ok (0.239s) 2023-01-11T21:38:06.4560102Z test_layer_norm_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4560236Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4560491Z [2023-01-11 21:35:05,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 636 2023-01-11T21:38:06.4560700Z [2023-01-11 21:35:05,412] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4560962Z [2023-01-11 21:35:05,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 636 2023-01-11T21:38:06.4560970Z 2023-01-11T21:38:06.4561066Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4561140Z import torch 2023-01-11T21:38:06.4561215Z import random 2023-01-11T21:38:06.4561327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4561449Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4561457Z 2023-01-11T21:38:06.4561538Z aten = torch.ops.aten 2023-01-11T21:38:06.4561673Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4561767Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4561772Z 2023-01-11T21:38:06.4561843Z import triton 2023-01-11T21:38:06.4561934Z import triton.language as tl 2023-01-11T21:38:06.4562051Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4562190Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4562196Z 2023-01-11T21:38:06.4562200Z 2023-01-11T21:38:06.4562397Z triton_fused_getitem_1_relu_rsqrt_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.4562474Z import triton 2023-01-11T21:38:06.4562566Z import triton.language as tl 2023-01-11T21:38:06.4562678Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4562778Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4562911Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4563056Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4563062Z 2023-01-11T21:38:06.4563151Z @reduction(size_hints=[16, 32], 2023-01-11T21:38:06.4563268Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4563352Z filename=__file__, 2023-01-11T21:38:06.4563808Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 
'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7), equal_to_1=())]}) 2023-01-11T21:38:06.4563885Z @triton.jit 2023-01-11T21:38:06.4564089Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4564160Z xnumel = 16 2023-01-11T21:38:06.4564225Z rnumel = 32 2023-01-11T21:38:06.4564322Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4564459Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4564543Z xmask = xindex < xnumel 2023-01-11T21:38:06.4564658Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4564727Z x0 = xindex 2023-01-11T21:38:06.4564844Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4564942Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4565029Z rindex = roffset + rbase 2023-01-11T21:38:06.4565116Z rmask = rindex < rnumel 2023-01-11T21:38:06.4565186Z r1 = rindex 2023-01-11T21:38:06.4565406Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4565582Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.4565709Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4565836Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4565947Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4566052Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4566140Z rindex = roffset + rbase 2023-01-11T21:38:06.4566225Z rmask = rindex < rnumel 2023-01-11T21:38:06.4566296Z r1 = rindex 2023-01-11T21:38:06.4566515Z tmp2 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4566588Z tmp3 = 32 2023-01-11T21:38:06.4566662Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.4566775Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.4566856Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.4566977Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4567098Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.4567210Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4567322Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4567386Z tmp9 = 32 2023-01-11T21:38:06.4567469Z tmp10 = tmp8 / tmp9 2023-01-11T21:38:06.4567546Z tmp11 = tmp7 / tmp9 2023-01-11T21:38:06.4567650Z tmp12 = 1e-05 2023-01-11T21:38:06.4567732Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.4567833Z tmp14 = tl.libdevice.rsqrt(tmp13) 2023-01-11T21:38:06.4567966Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4568108Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4568211Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4568299Z rindex = roffset + rbase 2023-01-11T21:38:06.4568381Z rmask = rindex < rnumel 2023-01-11T21:38:06.4568455Z r1 = rindex 2023-01-11T21:38:06.4568571Z tmp15 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask) 2023-01-11T21:38:06.4568679Z tmp18 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.4568774Z tmp20 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4568966Z tmp16 = tmp15 - tmp10 2023-01-11T21:38:06.4569054Z tmp17 = tmp16 * tmp14 2023-01-11T21:38:06.4569134Z tmp19 = tmp17 * tmp18 2023-01-11T21:38:06.4569213Z tmp21 = tmp19 + tmp20 2023-01-11T21:38:06.4569332Z tmp22 = tl.where(0 != 0, 0, tl.where(0 > tmp21, 0, tmp21)) 
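        # tl.where(0 != 0, 0, tl.where(0 > tmp21, 0, tmp21)) is the generated
        # ReLU idiom: the dead first branch folds away and the second clamps
        # negatives to zero. With the three r-loops (row sum, then variance,
        # then normalize * weight + bias), the kernel as a whole computes
        #   relu(layer_norm(x, (32,), weight, bias, eps=1e-05))
        # on the (16, 32) input.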
2023-01-11T21:38:06.4569489Z tl.store(out_ptr1 + (r1 + (32*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp22, rmask & xmask) 2023-01-11T21:38:06.4569567Z ''') 2023-01-11T21:38:06.4569573Z 2023-01-11T21:38:06.4569578Z 2023-01-11T21:38:06.4569668Z async_compile.wait(globals()) 2023-01-11T21:38:06.4569745Z del async_compile 2023-01-11T21:38:06.4569750Z 2023-01-11T21:38:06.4569828Z def call(args): 2023-01-11T21:38:06.4569933Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4570008Z args.clear() 2023-01-11T21:38:06.4570101Z with torch.cuda.device(0): 2023-01-11T21:38:06.4570296Z buf1 = empty_strided((16, 1), (1, 16), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4570494Z buf2 = empty_strided((16, 1), (1, 16), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4570616Z buf3 = as_strided(buf2, (16, 1), (1, 1)); del buf2 # reuse 2023-01-11T21:38:06.4570730Z buf4 = as_strided(buf1, (16, 1), (1, 1)); del buf1 # reuse 2023-01-11T21:38:06.4570929Z buf5 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4571022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4571221Z triton_fused_getitem_1_relu_rsqrt_var_mean_0.run(buf3, buf4, primals_3, primals_1, primals_2, buf5, 16, 32, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4571352Z return (buf5, primals_1, primals_2, primals_3, buf3, buf4, ) 2023-01-11T21:38:06.4571386Z 2023-01-11T21:38:06.4571390Z 2023-01-11T21:38:06.4571468Z if __name__ == "__main__": 2023-01-11T21:38:06.4571580Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4571708Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4571916Z primals_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572119Z primals_2 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572327Z primals_3 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4572470Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4572476Z 2023-01-11T21:38:06.4572545Z ok (0.365s) 2023-01-11T21:38:06.4573007Z test_leaky_relu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4573138Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4573392Z [2023-01-11 21:35:05,715] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 637 2023-01-11T21:38:06.4573657Z [2023-01-11 21:35:05,798] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 637 2023-01-11T21:38:06.4574076Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4574211Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4574464Z [2023-01-11 21:35:05,857] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 638 2023-01-11T21:38:06.4574880Z [2023-01-11 21:35:05,947] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 638 2023-01-11T21:38:06.4574886Z 2023-01-11T21:38:06.4574989Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4575062Z import torch 2023-01-11T21:38:06.4575137Z import random 2023-01-11T21:38:06.4575255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4575371Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4575376Z 2023-01-11T21:38:06.4575459Z aten = torch.ops.aten 2023-01-11T21:38:06.4575597Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4575697Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4575702Z 2023-01-11T21:38:06.4575775Z import triton 2023-01-11T21:38:06.4575872Z import triton.language as tl 2023-01-11T21:38:06.4575996Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4576128Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4576142Z 2023-01-11T21:38:06.4576146Z 2023-01-11T21:38:06.4576309Z triton_fused_add_where_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4576382Z import triton 2023-01-11T21:38:06.4576477Z import triton.language as tl 2023-01-11T21:38:06.4576591Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4576692Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4576826Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4576950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4576956Z 2023-01-11T21:38:06.4577437Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4577598Z @triton.jit 2023-01-11T21:38:06.4577742Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4577816Z xnumel = 256 2023-01-11T21:38:06.4577911Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4578041Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4578126Z xmask = xindex < xnumel 2023-01-11T21:38:06.4578196Z x0 = xindex 2023-01-11T21:38:06.4578382Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4578479Z tmp8 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4578551Z tmp1 = 0 2023-01-11T21:38:06.4578628Z tmp2 = tmp0 > tmp1 2023-01-11T21:38:06.4578696Z tmp3 = 0.2 2023-01-11T21:38:06.4578771Z tmp4 = tmp0 * tmp3 2023-01-11T21:38:06.4578866Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.4578930Z tmp6 = 2 2023-01-11T21:38:06.4579010Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.4579080Z tmp9 = 1 2023-01-11T21:38:06.4579158Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.4579237Z tmp11 = tmp10 > tmp1 2023-01-11T21:38:06.4579309Z tmp12 = 0.01 2023-01-11T21:38:06.4579381Z tmp13 = tmp10 * tmp12 2023-01-11T21:38:06.4579479Z tmp14 = tl.where(tmp11, tmp10, tmp13) 2023-01-11T21:38:06.4579618Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 
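    # Two leaky_relu flavors are fused here (read off the arithmetic above):
    #   out_ptr0 <- leaky_relu(x, negative_slope=0.2) + 2
    #   out_ptr1 <- leaky_relu(x + 1, negative_slope=0.01)
    # The fp16 module that follows is the same computation with
    # convert_element_type casts inserted around each op.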
2023-01-11T21:38:06.4579751Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4579834Z ''') 2023-01-11T21:38:06.4579840Z 2023-01-11T21:38:06.4579844Z 2023-01-11T21:38:06.4579936Z async_compile.wait(globals()) 2023-01-11T21:38:06.4580011Z del async_compile 2023-01-11T21:38:06.4580016Z 2023-01-11T21:38:06.4580091Z def call(args): 2023-01-11T21:38:06.4580157Z arg0_1, = args 2023-01-11T21:38:06.4580230Z args.clear() 2023-01-11T21:38:06.4580321Z with torch.cuda.device(0): 2023-01-11T21:38:06.4580525Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4580722Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4580810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4580990Z triton_fused_add_where_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4581058Z del arg0_1 2023-01-11T21:38:06.4581140Z return (buf0, buf1, ) 2023-01-11T21:38:06.4581145Z 2023-01-11T21:38:06.4581150Z 2023-01-11T21:38:06.4581229Z if __name__ == "__main__": 2023-01-11T21:38:06.4581347Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4581472Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4581677Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4581788Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4581793Z 2023-01-11T21:38:06.4581801Z 2023-01-11T21:38:06.4581900Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4581967Z import torch 2023-01-11T21:38:06.4582041Z import random 2023-01-11T21:38:06.4582159Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4582280Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4582287Z 2023-01-11T21:38:06.4582369Z aten = torch.ops.aten 2023-01-11T21:38:06.4582506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4582601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4582606Z 2023-01-11T21:38:06.4582679Z import triton 2023-01-11T21:38:06.4582764Z import triton.language as tl 2023-01-11T21:38:06.4582887Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4583026Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4583031Z 2023-01-11T21:38:06.4583036Z 2023-01-11T21:38:06.4583229Z triton_fused_add_convert_element_type_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.4583334Z import triton 2023-01-11T21:38:06.4583427Z import triton.language as tl 2023-01-11T21:38:06.4583543Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4583645Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4583772Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4583902Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4583907Z 2023-01-11T21:38:06.4584329Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4584405Z @triton.jit 2023-01-11T21:38:06.4584550Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4584627Z xnumel = 256 2023-01-11T21:38:06.4584725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4584855Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4584936Z xmask = xindex < xnumel 2023-01-11T21:38:06.4585009Z x0 = xindex 2023-01-11T21:38:06.4585223Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4585343Z tmp10 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4585436Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4585509Z tmp2 = 0 2023-01-11T21:38:06.4585591Z tmp3 = tmp1 > tmp2 2023-01-11T21:38:06.4585659Z tmp4 = 0.2 2023-01-11T21:38:06.4585738Z tmp5 = tmp1 * tmp4 2023-01-11T21:38:06.4585837Z tmp6 = tl.where(tmp3, tmp1, tmp5) 2023-01-11T21:38:06.4585926Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.4585998Z tmp8 = 2 2023-01-11T21:38:06.4586080Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.4586146Z tmp11 = 1 2023-01-11T21:38:06.4586230Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.4586322Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.4586403Z tmp14 = tmp13 > tmp2 2023-01-11T21:38:06.4586479Z tmp15 = 0.01 2023-01-11T21:38:06.4586558Z tmp16 = tmp13 * tmp15 2023-01-11T21:38:06.4586660Z tmp17 = tl.where(tmp14, tmp13, tmp16) 2023-01-11T21:38:06.4586745Z tmp18 = tmp17.to(tl.float32) 2023-01-11T21:38:06.4586883Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.4587046Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.4587133Z ''') 2023-01-11T21:38:06.4587138Z 2023-01-11T21:38:06.4587143Z 2023-01-11T21:38:06.4587236Z async_compile.wait(globals()) 2023-01-11T21:38:06.4587310Z del async_compile 2023-01-11T21:38:06.4587316Z 2023-01-11T21:38:06.4587391Z def call(args): 2023-01-11T21:38:06.4587463Z arg0_1, = args 2023-01-11T21:38:06.4587531Z args.clear() 2023-01-11T21:38:06.4587624Z with torch.cuda.device(0): 2023-01-11T21:38:06.4587825Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4588024Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4588116Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4588283Z triton_fused_add_convert_element_type_3_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4588356Z del arg0_1 2023-01-11T21:38:06.4588434Z return (buf0, buf1, ) 2023-01-11T21:38:06.4588440Z 2023-01-11T21:38:06.4588450Z 2023-01-11T21:38:06.4588523Z if __name__ == "__main__": 2023-01-11T21:38:06.4588637Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4588765Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4588968Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4589079Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4589084Z 2023-01-11T21:38:06.4589153Z ok (0.267s) 2023-01-11T21:38:06.4589599Z test_lgamma_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4589757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4590013Z [2023-01-11 21:35:06,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 639 2023-01-11T21:38:06.4590267Z [2023-01-11 21:35:06,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 639 2023-01-11T21:38:06.4590677Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4590809Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4591063Z [2023-01-11 21:35:06,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 640 2023-01-11T21:38:06.4591328Z [2023-01-11 21:35:06,629] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 640 2023-01-11T21:38:06.4591334Z 2023-01-11T21:38:06.4591431Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4591505Z import torch 2023-01-11T21:38:06.4591580Z import random 2023-01-11T21:38:06.4591700Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4591816Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4591821Z 2023-01-11T21:38:06.4591902Z aten = torch.ops.aten 2023-01-11T21:38:06.4592038Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4592133Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4592142Z 2023-01-11T21:38:06.4592214Z import triton 2023-01-11T21:38:06.4592305Z import triton.language as tl 2023-01-11T21:38:06.4592431Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4592563Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4592575Z 2023-01-11T21:38:06.4592608Z 2023-01-11T21:38:06.4592763Z triton_fused_add_cos_0 = async_compile.triton(''' 2023-01-11T21:38:06.4592837Z import triton 2023-01-11T21:38:06.4592926Z import triton.language as tl 2023-01-11T21:38:06.4593044Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4593148Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4593279Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4593406Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4593411Z 2023-01-11T21:38:06.4593829Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4593899Z @triton.jit 2023-01-11T21:38:06.4594040Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4594116Z xnumel = 256 2023-01-11T21:38:06.4594217Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4594347Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4594428Z xmask = xindex < xnumel 2023-01-11T21:38:06.4594499Z x0 = xindex 2023-01-11T21:38:06.4594681Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4594778Z tmp4 = 
tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4594878Z tmp1 = tl.libdevice.lgamma(tmp0) 2023-01-11T21:38:06.4594947Z tmp2 = 2 2023-01-11T21:38:06.4595025Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4595097Z tmp5 = 1 2023-01-11T21:38:06.4595216Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4595298Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.4595452Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4595606Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4595691Z ''') 2023-01-11T21:38:06.4595696Z 2023-01-11T21:38:06.4595703Z 2023-01-11T21:38:06.4595795Z async_compile.wait(globals()) 2023-01-11T21:38:06.4595874Z del async_compile 2023-01-11T21:38:06.4595879Z 2023-01-11T21:38:06.4595954Z def call(args): 2023-01-11T21:38:06.4596020Z arg0_1, = args 2023-01-11T21:38:06.4596095Z args.clear() 2023-01-11T21:38:06.4596186Z with torch.cuda.device(0): 2023-01-11T21:38:06.4596389Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4596587Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4596680Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4596829Z triton_fused_add_cos_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4596900Z del arg0_1 2023-01-11T21:38:06.4596975Z return (buf0, buf1, ) 2023-01-11T21:38:06.4596980Z 2023-01-11T21:38:06.4596984Z 2023-01-11T21:38:06.4597063Z if __name__ == "__main__": 2023-01-11T21:38:06.4597184Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4597310Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4597512Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4597623Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4597628Z 2023-01-11T21:38:06.4597633Z 2023-01-11T21:38:06.4597729Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4597802Z import torch 2023-01-11T21:38:06.4597869Z import random 2023-01-11T21:38:06.4597987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4598107Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4598116Z 2023-01-11T21:38:06.4598195Z aten = torch.ops.aten 2023-01-11T21:38:06.4598329Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4598423Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4598428Z 2023-01-11T21:38:06.4598502Z import triton 2023-01-11T21:38:06.4598618Z import triton.language as tl 2023-01-11T21:38:06.4598745Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4598883Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4598889Z 2023-01-11T21:38:06.4598893Z 2023-01-11T21:38:06.4599052Z triton_fused_add_cos_0 = async_compile.triton(''' 2023-01-11T21:38:06.4599125Z import triton 2023-01-11T21:38:06.4599217Z import triton.language as tl 2023-01-11T21:38:06.4599329Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4599430Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4599555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4599682Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4599687Z 2023-01-11T21:38:06.4600107Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4600181Z @triton.jit 2023-01-11T21:38:06.4600325Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4600401Z xnumel = 256 2023-01-11T21:38:06.4600496Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4600626Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4600702Z xmask = xindex < xnumel 2023-01-11T21:38:06.4600772Z x0 = xindex 2023-01-11T21:38:06.4600986Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4601102Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4601231Z tmp1 = tl.libdevice.lgamma(tmp0) 2023-01-11T21:38:06.4601301Z tmp2 = 2 2023-01-11T21:38:06.4601378Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.4601442Z tmp5 = 1 2023-01-11T21:38:06.4601518Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4601597Z tmp7 = tl.cos(tmp6) 2023-01-11T21:38:06.4601731Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4601861Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4601946Z ''') 2023-01-11T21:38:06.4601953Z 2023-01-11T21:38:06.4601957Z 2023-01-11T21:38:06.4602049Z async_compile.wait(globals()) 2023-01-11T21:38:06.4602126Z del async_compile 2023-01-11T21:38:06.4602131Z 2023-01-11T21:38:06.4602199Z def call(args): 2023-01-11T21:38:06.4602270Z arg0_1, = args 2023-01-11T21:38:06.4602342Z args.clear() 2023-01-11T21:38:06.4602434Z with torch.cuda.device(0): 2023-01-11T21:38:06.4602635Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4602836Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4602927Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4603068Z triton_fused_add_cos_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.4603138Z del arg0_1 2023-01-11T21:38:06.4603221Z return (buf0, buf1, ) 2023-01-11T21:38:06.4603227Z 2023-01-11T21:38:06.4603232Z 2023-01-11T21:38:06.4603311Z if __name__ == "__main__": 2023-01-11T21:38:06.4603427Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4603553Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4603756Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4603869Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4603874Z 2023-01-11T21:38:06.4603938Z ok (0.682s) 2023-01-11T21:38:06.4604417Z test_linear1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4604552Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4604808Z [2023-01-11 21:35:06,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 641 2023-01-11T21:38:06.4605071Z [2023-01-11 21:35:06,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 641 2023-01-11T21:38:06.4605510Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4605663Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4605925Z [2023-01-11 21:35:06,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 642 2023-01-11T21:38:06.4606187Z [2023-01-11 21:35:07,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 642 2023-01-11T21:38:06.4606192Z 2023-01-11T21:38:06.4606287Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4606361Z import torch 2023-01-11T21:38:06.4606428Z import random 2023-01-11T21:38:06.4606544Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4606666Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4606671Z 2023-01-11T21:38:06.4606752Z aten = torch.ops.aten 2023-01-11T21:38:06.4606885Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4607006Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4607011Z 2023-01-11T21:38:06.4607083Z import triton 2023-01-11T21:38:06.4607167Z import triton.language as tl 2023-01-11T21:38:06.4607288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4607429Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4607435Z 2023-01-11T21:38:06.4607439Z 2023-01-11T21:38:06.4607600Z triton_fused_sigmoid_0 = async_compile.triton(''' 2023-01-11T21:38:06.4607675Z import triton 2023-01-11T21:38:06.4607767Z import triton.language as tl 2023-01-11T21:38:06.4607882Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4607981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4608108Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4608231Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4608236Z 2023-01-11T21:38:06.4608638Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4608710Z @triton.jit 2023-01-11T21:38:06.4608835Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4608911Z xnumel = 32 2023-01-11T21:38:06.4609009Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4609139Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4609215Z xmask = xindex < xnumel 2023-01-11T21:38:06.4609285Z x0 = xindex 2023-01-11T21:38:06.4609393Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4609478Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4609612Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4609696Z ''') 2023-01-11T21:38:06.4609702Z 2023-01-11T21:38:06.4609706Z 2023-01-11T21:38:06.4609801Z async_compile.wait(globals()) 2023-01-11T21:38:06.4609871Z del async_compile 2023-01-11T21:38:06.4609881Z 2023-01-11T21:38:06.4609949Z def call(args): 2023-01-11T21:38:06.4610053Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4610129Z args.clear() 2023-01-11T21:38:06.4610220Z with torch.cuda.device(0): 2023-01-11T21:38:06.4610447Z buf0 = empty_strided((2, 16), (16, 1), device='cuda', dtype=torch.float32) 
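# (annotation) The statements that follow show the split Inductor makes here:
# the matmul itself is not code-generated. aten.addmm.out computes
# bias + input @ weight.T into the preallocated buf0 using the prebuilt ATen
# kernel (typically cuBLAS on CUDA), and only the elementwise sigmoid runs as
# the generated triton_fused_sigmoid_0 kernel, in place on the same buffer
# (note 'mutated_arg_names': {'in_out_ptr0'} in its @pointwise metadata above).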
2023-01-11T21:38:06.4610615Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4610692Z del primals_1 2023-01-11T21:38:06.4610761Z del primals_2 2023-01-11T21:38:06.4610852Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4610943Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4611075Z triton_fused_sigmoid_0.run(buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4611171Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:06.4611180Z 2023-01-11T21:38:06.4611184Z 2023-01-11T21:38:06.4611264Z if __name__ == "__main__": 2023-01-11T21:38:06.4611383Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4611503Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4611710Z primals_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4611913Z primals_2 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4612118Z primals_3 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4612261Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4612266Z 2023-01-11T21:38:06.4612270Z 2023-01-11T21:38:06.4612367Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4612440Z import torch 2023-01-11T21:38:06.4612513Z import random 2023-01-11T21:38:06.4612625Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4612748Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4612779Z 2023-01-11T21:38:06.4612861Z aten = torch.ops.aten 2023-01-11T21:38:06.4612993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4613087Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4613093Z 2023-01-11T21:38:06.4613163Z import triton 2023-01-11T21:38:06.4613258Z import triton.language as tl 2023-01-11T21:38:06.4613380Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4613511Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4613516Z 2023-01-11T21:38:06.4613527Z 2023-01-11T21:38:06.4613681Z triton_fused_sigmoid_0 = async_compile.triton(''' 2023-01-11T21:38:06.4613755Z import triton 2023-01-11T21:38:06.4613847Z import triton.language as tl 2023-01-11T21:38:06.4613961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4614062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4614195Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4614322Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4614327Z 2023-01-11T21:38:06.4614831Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4614905Z @triton.jit 2023-01-11T21:38:06.4615029Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4615103Z xnumel = 32 2023-01-11T21:38:06.4615198Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4615327Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4615408Z xmask = xindex < xnumel 2023-01-11T21:38:06.4615473Z x0 = xindex 2023-01-11T21:38:06.4615594Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4615679Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.4615815Z tl.store(in_out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4615904Z ''') 2023-01-11T21:38:06.4615909Z 2023-01-11T21:38:06.4615914Z 2023-01-11T21:38:06.4616006Z async_compile.wait(globals()) 2023-01-11T21:38:06.4616083Z del async_compile 2023-01-11T21:38:06.4616088Z 2023-01-11T21:38:06.4616161Z def call(args): 2023-01-11T21:38:06.4616301Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.4616378Z args.clear() 2023-01-11T21:38:06.4616470Z with torch.cuda.device(0): 2023-01-11T21:38:06.4616671Z buf0 = empty_strided((2, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4616836Z aten.addmm.out(primals_2, primals_3, as_strided(primals_1, (8, 16), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4616912Z del primals_1 2023-01-11T21:38:06.4616987Z del primals_2 2023-01-11T21:38:06.4617070Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4617216Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4617355Z triton_fused_sigmoid_0.run(buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.4617462Z return (buf1, primals_3, buf1, ) 2023-01-11T21:38:06.4617467Z 2023-01-11T21:38:06.4617471Z 2023-01-11T21:38:06.4617551Z if __name__ == "__main__": 2023-01-11T21:38:06.4617669Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4617798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4618003Z primals_1 = rand_strided((16, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618196Z primals_2 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618396Z primals_3 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4618538Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.4618544Z 2023-01-11T21:38:06.4618613Z ok (0.437s) 2023-01-11T21:38:06.4619067Z test_linear2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4619242Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4619503Z [2023-01-11 21:35:07,244] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 643 2023-01-11T21:38:06.4619766Z [2023-01-11 21:35:07,417] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 643 2023-01-11T21:38:06.4620183Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4620319Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4620575Z [2023-01-11 21:35:08,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 644 2023-01-11T21:38:06.4620583Z 2023-01-11T21:38:06.4620677Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4620752Z import torch 2023-01-11T21:38:06.4620828Z import random 2023-01-11T21:38:06.4620949Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4621075Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4621080Z 2023-01-11T21:38:06.4621165Z aten = torch.ops.aten 2023-01-11T21:38:06.4621304Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4621401Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4621406Z 2023-01-11T21:38:06.4621475Z import triton 2023-01-11T21:38:06.4621569Z import triton.language as tl 2023-01-11T21:38:06.4621699Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4621841Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4621846Z 2023-01-11T21:38:06.4621851Z 2023-01-11T21:38:06.4622007Z triton_fused_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.4622113Z import triton 2023-01-11T21:38:06.4622206Z import triton.language as tl 2023-01-11T21:38:06.4622315Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4622414Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4622549Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4622675Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4622680Z 2023-01-11T21:38:06.4623079Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4623158Z @triton.jit 2023-01-11T21:38:06.4623283Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4623359Z xnumel = 16 2023-01-11T21:38:06.4623449Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4623580Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4623665Z xmask = xindex < xnumel 2023-01-11T21:38:06.4623737Z x0 = xindex 2023-01-11T21:38:06.4623841Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4623959Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4624095Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4624173Z ''') 2023-01-11T21:38:06.4624185Z 2023-01-11T21:38:06.4624189Z 2023-01-11T21:38:06.4624342Z triton_fused_le_relu_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.4624418Z import triton 2023-01-11T21:38:06.4624510Z import triton.language as tl 2023-01-11T21:38:06.4624654Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4624755Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4624888Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4625014Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4625019Z 2023-01-11T21:38:06.4625440Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4625510Z @triton.jit 2023-01-11T21:38:06.4625670Z def triton_(in_out_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4625749Z xnumel = 16 2023-01-11T21:38:06.4625871Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4626000Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4626085Z xmask = xindex < xnumel 2023-01-11T21:38:06.4626160Z x0 = xindex 2023-01-11T21:38:06.4626258Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4626374Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4626449Z tmp2 = 0 2023-01-11T21:38:06.4626531Z tmp3 = tmp1 <= tmp2 2023-01-11T21:38:06.4626672Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4626807Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4626897Z ''') 2023-01-11T21:38:06.4626902Z 2023-01-11T21:38:06.4626907Z 2023-01-11T21:38:06.4627001Z async_compile.wait(globals()) 2023-01-11T21:38:06.4627075Z del async_compile 2023-01-11T21:38:06.4627080Z 2023-01-11T21:38:06.4627156Z def call(args): 2023-01-11T21:38:06.4627339Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:06.4627417Z args.clear() 2023-01-11T21:38:06.4627511Z with torch.cuda.device(0): 2023-01-11T21:38:06.4627718Z buf0 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4627889Z aten.addmm.out(primals_2, primals_9, as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4627963Z del primals_1 2023-01-11T21:38:06.4628044Z del primals_2 2023-01-11T21:38:06.4628163Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4628260Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4628392Z triton_fused_relu_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4628594Z buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4628757Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:06.4628831Z del primals_4 2023-01-11T21:38:06.4628922Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.4629054Z triton_fused_relu_0.run(buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4629257Z buf4 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4629421Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:06.4629501Z del primals_6 2023-01-11T21:38:06.4629593Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4629723Z triton_fused_relu_0.run(buf5, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4629912Z buf6 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4630072Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:06.4630152Z del primals_8 2023-01-11T21:38:06.4630244Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.4630440Z buf8 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4630582Z triton_fused_le_relu_3_1.run(buf7, buf8, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4630820Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 
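# (annotation) The return tuple above keeps everything the backward pass will
# need: the three intermediate relu outputs (buf1, buf3, buf5), the <= 0 mask
# buf8 computed alongside the final relu, and transposed as_strided views of
# the weights. Below is a minimal sketch of the kind of module that would
# produce a trace like this -- an assumption for illustration only; the real
# test_linear2 body may differ.
def _sketch_linear2_repro():
    # Four Linear(8, 8) + ReLU stages match the four aten.addmm.out calls and
    # the shared in-place triton_fused_relu_0 kernel in this dump; parameters
    # require grad by default, so AOT autograd saves the activations above.
    import torch
    mod = torch.nn.Sequential(
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
        torch.nn.Linear(8, 8), torch.nn.ReLU(),
    ).cuda()
    compiled = torch.compile(mod)  # Dynamo traces, Inductor generates the code
    return compiled(torch.randn(2, 8, device="cuda"))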
2023-01-11T21:38:06.4630827Z 2023-01-11T21:38:06.4630831Z 2023-01-11T21:38:06.4630913Z if __name__ == "__main__": 2023-01-11T21:38:06.4631038Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4631160Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4631365Z primals_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631567Z primals_2 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631770Z primals_3 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4631969Z primals_4 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632174Z primals_5 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632377Z primals_6 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632580Z primals_7 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632777Z primals_8 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4632979Z primals_9 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4633190Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:06.4633196Z 2023-01-11T21:38:06.4633464Z [2023-01-11 21:35:08,117] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 644 2023-01-11T21:38:06.4633470Z 2023-01-11T21:38:06.4633569Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4633647Z import torch 2023-01-11T21:38:06.4633726Z import random 2023-01-11T21:38:06.4633851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4633971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4633976Z 2023-01-11T21:38:06.4634060Z aten = torch.ops.aten 2023-01-11T21:38:06.4634199Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4634325Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4634332Z 2023-01-11T21:38:06.4634408Z import triton 2023-01-11T21:38:06.4634500Z import triton.language as tl 2023-01-11T21:38:06.4634627Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4634767Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4634772Z 2023-01-11T21:38:06.4634777Z 2023-01-11T21:38:06.4634925Z triton_fused_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.4635003Z import triton 2023-01-11T21:38:06.4635094Z import triton.language as tl 2023-01-11T21:38:06.4635209Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4635327Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4635477Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4635620Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4635625Z 2023-01-11T21:38:06.4636027Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4636094Z @triton.jit 2023-01-11T21:38:06.4636218Z def triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4636291Z xnumel = 16 2023-01-11T21:38:06.4636388Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4636518Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4636601Z xmask = xindex < xnumel 2023-01-11T21:38:06.4636671Z x0 = xindex 2023-01-11T21:38:06.4636786Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4636941Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4637077Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4637163Z ''') 2023-01-11T21:38:06.4637168Z 2023-01-11T21:38:06.4637173Z 2023-01-11T21:38:06.4637335Z triton_fused_le_relu_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.4637410Z import triton 2023-01-11T21:38:06.4637503Z import triton.language as tl 2023-01-11T21:38:06.4637616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4637711Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4637841Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4637965Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4637970Z 2023-01-11T21:38:06.4638381Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4638457Z @triton.jit 2023-01-11T21:38:06.4638592Z def triton_(in_out_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4638665Z xnumel = 16 2023-01-11T21:38:06.4638762Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4638885Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4638970Z xmask = xindex < xnumel 2023-01-11T21:38:06.4639040Z x0 = xindex 2023-01-11T21:38:06.4639160Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4639273Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.4639360Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.4639426Z tmp3 = 0 2023-01-11T21:38:06.4639499Z tmp4 = tmp2 <= tmp3 2023-01-11T21:38:06.4639639Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4639771Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.4639859Z ''') 2023-01-11T21:38:06.4639865Z 2023-01-11T21:38:06.4639869Z 2023-01-11T21:38:06.4639962Z async_compile.wait(globals()) 2023-01-11T21:38:06.4640040Z del async_compile 2023-01-11T21:38:06.4640045Z 2023-01-11T21:38:06.4640119Z def call(args): 2023-01-11T21:38:06.4640321Z primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9 = args 2023-01-11T21:38:06.4640390Z args.clear() 2023-01-11T21:38:06.4640482Z with torch.cuda.device(0): 2023-01-11T21:38:06.4640681Z buf0 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4640853Z aten.addmm.out(primals_2, primals_9, as_strided(primals_1, (8, 8), (1, 8)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4640929Z del primals_1 2023-01-11T21:38:06.4641006Z del primals_2 2023-01-11T21:38:06.4641096Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.4641181Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4641313Z triton_fused_relu_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4641508Z buf2 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4641673Z aten.addmm.out(primals_4, buf1, as_strided(primals_3, (8, 8), (1, 8)), beta=1, alpha=1, out=buf2) 2023-01-11T21:38:06.4641751Z 
del primals_4 2023-01-11T21:38:06.4641842Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.4641971Z triton_fused_relu_0.run(buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4642161Z buf4 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4642323Z aten.addmm.out(primals_6, buf3, as_strided(primals_5, (8, 8), (1, 8)), beta=1, alpha=1, out=buf4) 2023-01-11T21:38:06.4642400Z del primals_6 2023-01-11T21:38:06.4642486Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4642613Z triton_fused_relu_0.run(buf5, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4642835Z buf6 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4642990Z aten.addmm.out(primals_8, buf5, as_strided(primals_7, (8, 8), (1, 8)), beta=1, alpha=1, out=buf6) 2023-01-11T21:38:06.4643068Z del primals_8 2023-01-11T21:38:06.4643153Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.4643345Z buf8 = empty_strided((2, 8), (8, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.4643485Z triton_fused_le_relu_3_1.run(buf7, buf8, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.4643690Z return (buf7, primals_9, buf1, buf3, buf5, buf8, as_strided(primals_7, (8, 8), (8, 1)), as_strided(primals_5, (8, 8), (8, 1)), as_strided(primals_3, (8, 8), (8, 1)), ) 2023-01-11T21:38:06.4643695Z 2023-01-11T21:38:06.4643700Z 2023-01-11T21:38:06.4643779Z if __name__ == "__main__": 2023-01-11T21:38:06.4643897Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4644027Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4644233Z primals_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644434Z primals_2 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644631Z primals_3 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4644828Z primals_4 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645027Z primals_5 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645224Z primals_6 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645422Z primals_7 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645614Z primals_8 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4645811Z primals_9 = rand_strided((2, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4646023Z print_performance(lambda: call([primals_1, primals_2, primals_3, primals_4, primals_5, primals_6, primals_7, primals_8, primals_9])) 2023-01-11T21:38:06.4646029Z 2023-01-11T21:38:06.4646094Z ok (1.051s) 2023-01-11T21:38:06.4646249Z test_linear_binary_cuda (__main__.CudaTests) ... ok (0.001s) 2023-01-11T21:38:06.4646714Z test_linear_packed_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4646845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4647104Z [2023-01-11 21:35:08,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 645 2023-01-11T21:38:06.4647369Z [2023-01-11 21:35:08,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 645 2023-01-11T21:38:06.4648171Z [2023-01-11 21:35:08,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 646 2023-01-11T21:38:06.4648432Z [2023-01-11 21:35:08,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 646 2023-01-11T21:38:06.4649249Z [2023-01-11 21:35:08,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 647 2023-01-11T21:38:06.4649507Z [2023-01-11 21:35:08,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 647 2023-01-11T21:38:06.4650301Z [2023-01-11 21:35:08,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 648 2023-01-11T21:38:06.4650562Z [2023-01-11 21:35:08,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 648 2023-01-11T21:38:06.4651349Z [2023-01-11 21:35:08,308] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 649 2023-01-11T21:38:06.4651610Z [2023-01-11 21:35:08,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 649 2023-01-11T21:38:06.4652417Z [2023-01-11 21:35:08,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 650 2023-01-11T21:38:06.4652671Z [2023-01-11 21:35:08,404] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 650 2023-01-11T21:38:06.4653458Z [2023-01-11 21:35:08,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 651 2023-01-11T21:38:06.4653464Z 2023-01-11T21:38:06.4653560Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4653637Z import torch 2023-01-11T21:38:06.4653713Z import random 2023-01-11T21:38:06.4653833Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4653950Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4653964Z 2023-01-11T21:38:06.4654039Z aten = torch.ops.aten 2023-01-11T21:38:06.4654177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4654310Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4654315Z 2023-01-11T21:38:06.4654389Z import triton 2023-01-11T21:38:06.4654581Z import triton.language as tl 2023-01-11T21:38:06.4654707Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4654850Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4654856Z 2023-01-11T21:38:06.4654860Z 2023-01-11T21:38:06.4654945Z async_compile.wait(globals()) 2023-01-11T21:38:06.4655022Z del async_compile 2023-01-11T21:38:06.4655027Z 2023-01-11T21:38:06.4655100Z def call(args): 2023-01-11T21:38:06.4655187Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4655264Z args.clear() 2023-01-11T21:38:06.4655355Z with torch.cuda.device(0): 2023-01-11T21:38:06.4655562Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4655757Z aten.addmm.out(arg1_1, as_strided(arg2_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4655827Z del arg0_1 2023-01-11T21:38:06.4655899Z del arg1_1 2023-01-11T21:38:06.4655969Z del arg2_1 2023-01-11T21:38:06.4656089Z return
(as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4656094Z 2023-01-11T21:38:06.4656101Z 2023-01-11T21:38:06.4656181Z if __name__ == "__main__": 2023-01-11T21:38:06.4656306Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4656443Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4656675Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4656893Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4657182Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4657326Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4657335Z 2023-01-11T21:38:06.4657340Z 2023-01-11T21:38:06.4657442Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4657516Z import torch 2023-01-11T21:38:06.4657589Z import random 2023-01-11T21:38:06.4657708Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4657876Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4657889Z 2023-01-11T21:38:06.4657965Z aten = torch.ops.aten 2023-01-11T21:38:06.4658101Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4658195Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4658200Z 2023-01-11T21:38:06.4658274Z import triton 2023-01-11T21:38:06.4658363Z import triton.language as tl 2023-01-11T21:38:06.4658487Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4658628Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4658634Z 2023-01-11T21:38:06.4658638Z 2023-01-11T21:38:06.4658728Z async_compile.wait(globals()) 2023-01-11T21:38:06.4658801Z del async_compile 2023-01-11T21:38:06.4658806Z 2023-01-11T21:38:06.4658886Z def call(args): 2023-01-11T21:38:06.4658970Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4659044Z args.clear() 2023-01-11T21:38:06.4659132Z with torch.cuda.device(0): 2023-01-11T21:38:06.4659334Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4659510Z aten.addmm.out(arg1_1, as_strided(arg2_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4659577Z del arg0_1 2023-01-11T21:38:06.4659649Z del arg1_1 2023-01-11T21:38:06.4659718Z del arg2_1 2023-01-11T21:38:06.4659830Z return (as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4659835Z 2023-01-11T21:38:06.4659839Z 2023-01-11T21:38:06.4659920Z if __name__ == "__main__": 2023-01-11T21:38:06.4660035Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4660248Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4660454Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660645Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660859Z arg2_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4660991Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4660997Z 2023-01-11T21:38:06.4661001Z 2023-01-11T21:38:06.4661101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4661177Z import torch 2023-01-11T21:38:06.4661253Z import random 2023-01-11T21:38:06.4661374Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4661499Z from torch._inductor.codecache import AsyncCompile 
2023-01-11T21:38:06.4661504Z 2023-01-11T21:38:06.4661582Z aten = torch.ops.aten 2023-01-11T21:38:06.4661719Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4661820Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4661825Z 2023-01-11T21:38:06.4661900Z import triton 2023-01-11T21:38:06.4661995Z import triton.language as tl 2023-01-11T21:38:06.4662124Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4662271Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4662276Z 2023-01-11T21:38:06.4662280Z 2023-01-11T21:38:06.4662375Z async_compile.wait(globals()) 2023-01-11T21:38:06.4662447Z del async_compile 2023-01-11T21:38:06.4662451Z 2023-01-11T21:38:06.4662527Z def call(args): 2023-01-11T21:38:06.4662608Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4662685Z args.clear() 2023-01-11T21:38:06.4662777Z with torch.cuda.device(0): 2023-01-11T21:38:06.4662979Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4663129Z aten.mm.out(as_strided(arg1_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), out=buf0) 2023-01-11T21:38:06.4663202Z del arg0_1 2023-01-11T21:38:06.4663277Z del arg1_1 2023-01-11T21:38:06.4663390Z return (as_strided(buf0, (2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4663395Z 2023-01-11T21:38:06.4663400Z 2023-01-11T21:38:06.4663481Z if __name__ == "__main__": 2023-01-11T21:38:06.4663630Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4663759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4663964Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4664172Z arg1_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4664288Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4664293Z 2023-01-11T21:38:06.4664305Z 2023-01-11T21:38:06.4664397Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4664472Z import torch 2023-01-11T21:38:06.4664548Z import random 2023-01-11T21:38:06.4664672Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4664797Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4664802Z 2023-01-11T21:38:06.4664884Z aten = torch.ops.aten 2023-01-11T21:38:06.4665022Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4665114Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4665119Z 2023-01-11T21:38:06.4665195Z import triton 2023-01-11T21:38:06.4665290Z import triton.language as tl 2023-01-11T21:38:06.4665421Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4665586Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4665591Z 2023-01-11T21:38:06.4665596Z 2023-01-11T21:38:06.4665707Z async_compile.wait(globals()) 2023-01-11T21:38:06.4665792Z del async_compile 2023-01-11T21:38:06.4665797Z 2023-01-11T21:38:06.4665873Z def call(args): 2023-01-11T21:38:06.4665948Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4666024Z args.clear() 2023-01-11T21:38:06.4666147Z with torch.cuda.device(0): 2023-01-11T21:38:06.4666348Z buf0 = empty_strided((6, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4666496Z aten.mm.out(as_strided(arg1_1, (6, 10), (10, 1)), as_strided(arg0_1, (10, 30), (1, 10)), out=buf0) 2023-01-11T21:38:06.4666572Z del arg0_1 2023-01-11T21:38:06.4666647Z del arg1_1 2023-01-11T21:38:06.4666754Z return (as_strided(buf0, 
(2, 3, 30), (90, 30, 1)), ) 2023-01-11T21:38:06.4666759Z 2023-01-11T21:38:06.4666770Z 2023-01-11T21:38:06.4666845Z if __name__ == "__main__": 2023-01-11T21:38:06.4666963Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4667088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4667294Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4667501Z arg1_1 = rand_strided((2, 3, 10), (30, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4667630Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4667635Z 2023-01-11T21:38:06.4667640Z 2023-01-11T21:38:06.4667740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4667816Z import torch 2023-01-11T21:38:06.4667885Z import random 2023-01-11T21:38:06.4668005Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4668133Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4668138Z 2023-01-11T21:38:06.4668225Z aten = torch.ops.aten 2023-01-11T21:38:06.4668361Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4668458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4668463Z 2023-01-11T21:38:06.4668539Z import triton 2023-01-11T21:38:06.4668626Z import triton.language as tl 2023-01-11T21:38:06.4668752Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4668892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4668897Z 2023-01-11T21:38:06.4668904Z 2023-01-11T21:38:06.4668999Z async_compile.wait(globals()) 2023-01-11T21:38:06.4669076Z del async_compile 2023-01-11T21:38:06.4669082Z 2023-01-11T21:38:06.4669157Z def call(args): 2023-01-11T21:38:06.4669243Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4669322Z args.clear() 2023-01-11T21:38:06.4669408Z with torch.cuda.device(0): 2023-01-11T21:38:06.4669642Z buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4669801Z aten.addmm.out(arg1_1, arg2_1, as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4669876Z del arg0_1 2023-01-11T21:38:06.4669952Z del arg1_1 2023-01-11T21:38:06.4670026Z del arg2_1 2023-01-11T21:38:06.4670106Z return (buf0, ) 2023-01-11T21:38:06.4670112Z 2023-01-11T21:38:06.4670116Z 2023-01-11T21:38:06.4670192Z if __name__ == "__main__": 2023-01-11T21:38:06.4670310Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4670440Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4670642Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4670842Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4671045Z arg2_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4671175Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4671180Z 2023-01-11T21:38:06.4671185Z 2023-01-11T21:38:06.4671285Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4671355Z import torch 2023-01-11T21:38:06.4671429Z import random 2023-01-11T21:38:06.4671549Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4671670Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4671675Z 2023-01-11T21:38:06.4671760Z aten = torch.ops.aten 2023-01-11T21:38:06.4671898Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4672024Z 
async_compile = AsyncCompile() 2023-01-11T21:38:06.4672030Z 2023-01-11T21:38:06.4672105Z import triton 2023-01-11T21:38:06.4672192Z import triton.language as tl 2023-01-11T21:38:06.4672320Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4672461Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4672467Z 2023-01-11T21:38:06.4672471Z 2023-01-11T21:38:06.4672565Z async_compile.wait(globals()) 2023-01-11T21:38:06.4672643Z del async_compile 2023-01-11T21:38:06.4672648Z 2023-01-11T21:38:06.4672725Z def call(args): 2023-01-11T21:38:06.4672813Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.4672890Z args.clear() 2023-01-11T21:38:06.4672977Z with torch.cuda.device(0): 2023-01-11T21:38:06.4673181Z buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4673339Z aten.addmm.out(arg1_1, arg2_1, as_strided(arg0_1, (10, 30), (1, 10)), beta=1, alpha=1, out=buf0) 2023-01-11T21:38:06.4673418Z del arg0_1 2023-01-11T21:38:06.4673493Z del arg1_1 2023-01-11T21:38:06.4673568Z del arg2_1 2023-01-11T21:38:06.4673646Z return (buf0, ) 2023-01-11T21:38:06.4673652Z 2023-01-11T21:38:06.4673656Z 2023-01-11T21:38:06.4673731Z if __name__ == "__main__": 2023-01-11T21:38:06.4673853Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4673981Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4674186Z arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674384Z arg1_1 = rand_strided((30, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674586Z arg2_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4674721Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.4674726Z 2023-01-11T21:38:06.4674992Z [2023-01-11 21:35:08,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 651 2023-01-11T21:38:06.4675467Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
[2023-01-11 21:35:08,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 651
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 652
[2023-01-11 21:35:08,437] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 652

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(arg1_1, as_strided(arg0_1, (10, 30), (1, 10)), out=buf0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))
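This wrapper (and the fp16 twin that follows) is the bias-free variant: it reduces to a plain aten.mm.out against the same transposed weight view. In eager terms, as a sketch with shapes assumed from the dump:

import torch

weight = torch.randn(30, 10, device='cuda')  # arg0_1 in the dump
x = torch.randn(2, 10, device='cuda')        # arg1_1 in the dump

# The (10, 30)/(1, 10) as_strided view is the transpose of the contiguous weight.
assert torch.allclose(torch.mm(x, weight.as_strided((10, 30), (1, 10))),
                      x @ weight.t())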
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 30), (30, 1), device='cuda', dtype=torch.float16)
        aten.mm.out(arg1_1, as_strided(arg0_1, (10, 30), (1, 10)), out=buf0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((30, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.318s)
test_linear_permute_fusion (__main__.CudaTests) ... ok (0.004s)
test_linear_unary_cuda (__main__.CudaTests) ... ok (0.001s)
test_linspace1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,477] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 653
[2023-01-11 21:35:08,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 653
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,581] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 654
[2023-01-11 21:35:08,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 654

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 7
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp4 = tl.load(in_ptr0 + (x0), xmask)
    tmp0 = 0.125
    tmp1 = x0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp0
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 7), (7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 7, grid=grid(7), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 7), (7, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 7
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp0 = 0.125
    tmp1 = x0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp0
    tmp5 = tmp4.to(tl.float32)
    tmp6 = tmp3 + tmp5
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 7), (7, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 7, grid=grid(7), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 7), (7, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.207s)
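In both linspace1 kernels the traced linspace has been folded into index arithmetic: with tmp0 = 0.125 and tmp1 = x0, the stored value is 0.125 * i + 0.125 + input[i] for i in 0..6. A quick eager check of that reading (the original linspace endpoints are an assumption; only the fused arithmetic is visible in the log):

import torch

x = torch.randn(1, 7, device='cuda')
i = torch.arange(7, device='cuda', dtype=torch.float32)
fused = 0.125 * i + 0.125 + x                               # what triton_fused_add_0 stores
eager = torch.linspace(0.125, 0.875, 7, device='cuda') + x  # one expression that traces to it
assert torch.allclose(fused, eager)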
test_linspace2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 655
[2023-01-11 21:35:08,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 655
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 656
[2023-01-11 21:35:08,839] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 656

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp4 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp0 = 0.0
    tmp1 = 0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp1
    tmp5 = tmp3 + tmp4
    tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp5, None)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp4 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tmp0 = 0.0
    tmp1 = 0
    tmp2 = tmp0 * tmp1
    tmp3 = tmp2 + tmp1
    tmp5 = tmp4.to(tl.float32)
    tmp6 = tmp3 + tmp5
    tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp6, None)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.190s)
test_linspace3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 657
[2023-01-11 21:35:08,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 657
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:08,894] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 658
[2023-01-11 21:35:08,896] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 658

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32)
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

async_compile.wait(globals())
del async_compile

def call(args):
    buf0 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32)
    return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    print_performance(lambda: call([]))
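Both graphs for this test compile away entirely: call takes no inputs and just allocates a zero-length buffer, i.e. the traced linspace had zero steps. A sketch of the degenerate case (the endpoints here are assumed purely for illustration):

import torch

# A steps=0 linspace yields a (0,)-shaped tensor, matching empty_strided((0, ), (1, )).
assert torch.linspace(0, 1, steps=0).shape == (0,)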
ok (0.056s)
test_list_clearing_cuda (__main__.CudaTests) ... [2023-01-11 21:35:08,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None
[2023-01-11 21:35:09,221] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None
[2023-01-11 21:35:09,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph None
[2023-01-11 21:35:09,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph None

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 25
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    x_1, y_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(x_1, y_1, buf0, 25, grid=grid(25), stream=stream0)
        del x_1
        del y_1
        buf1 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, buf0, out=buf1)
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    x_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    y_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([x_1, y_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 25
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    x_1, y_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(x_1, y_1, buf0, 25, grid=grid(25), stream=stream0)
        del x_1
        del y_1
        buf1 = empty_strided((5, 5), (5, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, buf0, out=buf1)
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    x_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    y_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([x_1, y_1]))

ok (0.486s)
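Functionally the wrapper is (x + y) @ (x + y), but what this test exercises is the memory discipline around it: args.clear() plus the del x_1 / del y_1 right after the fused add drop the last references to the inputs before the matmul runs, so their storage can be reclaimed. A rough eager rendering, with names assumed:

import torch

def eager_equivalent(x, y):
    tmp = x + y         # triton_fused_add_0: elementwise add of the two 5x5 inputs
    del x, y            # mirrors the generated wrapper freeing inputs once dead
    return tmp.mm(tmp)  # aten.mm.out(buf0, buf0, out=buf1)

out = eager_equivalent(torch.randn(5, 5, device='cuda'), torch.randn(5, 5, device='cuda'))
assert out.shape == (5, 5)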
test_log1p_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 659
[2023-01-11 21:35:09,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 659
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,569] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 660
[2023-01-11 21:35:09,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 660
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 661
[2023-01-11 21:35:09,750] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 661
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,767] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 662
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
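Each log1p kernel produces both outputs of the test in a single pass over the input: log1p(x) into out_ptr0 and log1p(x) * 2 into out_ptr1. Note that the fp16 variants upcast every load with .to(tl.float32) and apply tl.libdevice.log1p in fp32 before storing back to fp16. An eager sketch of the same output pair:

import torch

x = torch.randn(64, device='cuda', dtype=torch.float16)
a = torch.log1p(x)      # out_ptr0
b = torch.log1p(x) * 2  # out_ptr1; Inductor fuses both into one read of x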
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 201
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:09,775] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 662
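For the 201-element variants the launch is grid(201) with a compile-time XBLOCK; since 201 is not a multiple of any power-of-two block size, the xmask = xindex < xnumel guard is what keeps the trailing lanes of the last program from reading or writing out of bounds. A sketch of the same masking logic in plain Python (the block size is chosen arbitrarily for illustration):

import math

xnumel, XBLOCK = 201, 64                   # assumed block size
num_programs = math.ceil(xnumel / XBLOCK)  # what grid(201) resolves to
for pid in range(num_programs):
    xoffset = pid * XBLOCK
    lanes = range(xoffset, xoffset + XBLOCK)
    active = [i for i in lanes if i < xnumel]  # xmask
# The last program covers lanes 192..255, but only 192..200 are active.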
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 663
[2023-01-11 21:35:09,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 663
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 664
[2023-01-11 21:35:09,960] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 664
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:09,975] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 665
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 201
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')

async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))
from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.libdevice.log1p(tmp0)
    tmp3 = tl.libdevice.log1p(tmp2)
    tmp4 = 2
    tmp5 = tmp3 * tmp4
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


[output code repeated: same module specialized for torch.float16 with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3); the fp16 loads are cast to tl.float32 as in the first module]
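Note how the instance_descriptor changes between the dumps: with xnumel = 201 it is divisible_by_16=(0, 1, 2) (only the three pointers), while with xnumel = 64 it grows to (0, 1, 2, 3) because the size argument is itself divisible by 16. A sketch of that selection rule as it appears from these logs (assumed semantics; the zeros stand in for freshly allocated, 16-byte-aligned pointers):

# Assumed model: an argument index lands in divisible_by_16 when its value
# (pointer address or integer size) is a multiple of 16.
def divisible_by_16_args(arg_values):
    return tuple(i for i, v in enumerate(arg_values) if v % 16 == 0)

assert divisible_by_16_args([0, 0, 0, 64]) == (0, 1, 2, 3)   # xnumel = 64
assert divisible_by_16_args([0, 0, 0, 201]) == (0, 1, 2)     # xnumel = 201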
[2023-01-11 21:35:10,128] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 665
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 666
[2023-01-11 21:35:10,152] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 666
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 667
[2023-01-11 21:35:10,318] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 667
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
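The UserWarning repeated throughout this run comes from the helper at test_torchinductor.py:246, which flattens a tensor over its whole storage via the deprecated tensor.storage().size(). A sketch of the replacement the warning itself suggests (note that UntypedStorage is sized in bytes, so the element count has to be recovered through element_size()):

import torch

x = torch.randn(8, 4)

# Deprecated: TypedStorage.size() returns an element count but now warns.
buffer_old = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()

# Replacement: untyped storage reports bytes, so convert back to elements.
numel = x.untyped_storage().nbytes() // x.element_size()
buffer_new = torch.as_strided(x, (numel,), (1,), 0).clone()

assert torch.equal(buffer_old, buffer_new)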
[2023-01-11 21:35:10,334] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 668

[output code repeated: same module specialized for torch.float32 with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2)]
[output code repeated: the torch.float16 / xnumel = 201 module shown earlier is printed again verbatim]
[output code repeated: same module specialized for torch.float64 with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3)]

[2023-01-11 21:35:10,342] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 668
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 669
[2023-01-11 21:35:10,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 669
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
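All of these dumps launch the kernel with grid(xnumel): size_hints is the next power of two at or above xnumel (256 for 201, 64 for 64), and the tail of the final program instance is disabled by xmask = xindex < xnumel. A sketch of the 1D launch arithmetic (grid_1d is an assumed stand-in for torch._inductor's grid helper):

# One program instance covers XBLOCK contiguous elements; the last one is
# partially masked, mirroring `xmask = xindex < xnumel` in the kernels.
def grid_1d(xnumel: int, xblock: int) -> int:
    return (xnumel + xblock - 1) // xblock  # ceil division

for xnumel, xblock in [(201, 64), (201, 256), (64, 64)]:
    programs = grid_1d(xnumel, xblock)
    tail = programs * xblock - xnumel  # lanes switched off by xmask
    print(f"xnumel={xnumel}, XBLOCK={xblock}: {programs} program(s), {tail} masked lane(s)")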
[2023-01-11 21:35:10,533] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 670
[2023-01-11 21:35:10,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 670
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 671

[output code repeated: the torch.float64 / xnumel = 64 module is printed again verbatim]
[output code repeated: same module specialized for torch.float64 with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2)]
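Each __main__ harness builds its input with torch._dynamo.testing.rand_strided. A rough model of what that helper does for the shapes in these dumps (an assumed reimplementation for illustration, not the actual torch._dynamo source):

import torch

def rand_strided_sketch(size, stride, device="cpu", dtype=torch.float32):
    # Allocate with the requested layout, then fill with random values;
    # integer dtypes need randint rather than randn.
    t = torch.empty_strided(size, stride, device=device, dtype=dtype)
    if dtype.is_floating_point:
        return t.copy_(torch.randn(size, device=device).to(dtype))
    return t.copy_(torch.randint(0, 10, size, device=device, dtype=dtype))

arg0_1 = rand_strided_sketch((201,), (1,), dtype=torch.float16)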
[output code repeated: the torch.float64 / xnumel = 201 module is printed again verbatim]

[2023-01-11 21:35:10,699] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 671
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
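print_performance in these harnesses times call() on the GPU. A rough equivalent using only public CUDA-event APIs (warm-up and iteration counts are arbitrary choices, and torch._inductor.utils.print_performance may measure differently):

import torch

def time_cuda_ms(fn, warmup=5, iters=100):
    for _ in range(warmup):          # exclude one-off allocation/compile cost
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()         # wait for the timed work to finish
    return start.elapsed_time(end) / iters  # milliseconds per call

# e.g. with one of the dumps saved to a file: time_cuda_ms(lambda: call([arg0_1]))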
[2023-01-11 21:35:10,716] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 672
[2023-01-11 21:35:10,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 672
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 673
[2023-01-11 21:35:10,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 673
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 674

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_log1p_mul_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tmp0.to(tl.float32)
    tmp2 = tl.libdevice.log1p(tmp1)
    tmp4 = tmp3.to(tl.float32)
    tmp5 = tl.libdevice.log1p(tmp4)
    tmp6 = 2
    tmp7 = tmp5 * tmp6
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32)
    print_performance(lambda: call([arg0_1]))
[output code repeated: the torch.int32 / xnumel = 64 module is printed again verbatim]
[output code repeated: torch.int32 input with xnumel = 201, size_hints=[256], divisible_by_16=(0, 1, 2); outputs allocated as torch.float32]

[2023-01-11 21:35:10,916] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 674
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:10,932] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 675
[2023-01-11 21:35:11,075] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 675
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:11,091] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 676
[2023-01-11 21:35:11,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 676
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
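The int32 modules above differ from the float ones in two ways: loads are explicitly converted with .to(tl.float32) before log1p, and call() allocates float32 output buffers for an integer input. Both follow eager-mode type promotion, which can be checked directly:

import torch

# log1p on integer tensors promotes to the default float dtype (float32),
# matching the torch.float32 buffers in the generated call() above.
for dtype in (torch.int32, torch.int64):
    x = torch.arange(4, dtype=dtype)
    assert torch.log1p(x).dtype == torch.float32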
[2023-01-11 21:35:11,115] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 677

[output code repeated: the torch.int32 / xnumel = 201 module is printed again verbatim]
[output code repeated: torch.int64 input with xnumel = 64, size_hints=[64], divisible_by_16=(0, 1, 2, 3); loads cast via .to(tl.float32), outputs allocated as torch.float32]
[output code repeated: the torch.int64 / xnumel = 64 module is printed again verbatim]
2023-01-11T21:38:06.4867573Z xnumel = 64 2023-01-11T21:38:06.4867673Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4867805Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4867883Z xmask = xindex < xnumel 2023-01-11T21:38:06.4867954Z x0 = xindex 2023-01-11T21:38:06.4868147Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4868247Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4868336Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4868435Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4868524Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4868620Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4868691Z tmp6 = 2 2023-01-11T21:38:06.4868773Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4868910Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4869044Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4869132Z ''') 2023-01-11T21:38:06.4869138Z 2023-01-11T21:38:06.4869142Z 2023-01-11T21:38:06.4869236Z async_compile.wait(globals()) 2023-01-11T21:38:06.4869308Z del async_compile 2023-01-11T21:38:06.4869320Z 2023-01-11T21:38:06.4869391Z def call(args): 2023-01-11T21:38:06.4869466Z arg0_1, = args 2023-01-11T21:38:06.4869569Z args.clear() 2023-01-11T21:38:06.4869662Z with torch.cuda.device(0): 2023-01-11T21:38:06.4869862Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4870063Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4870162Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4870304Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4870377Z del arg0_1 2023-01-11T21:38:06.4870462Z return (buf0, buf1, ) 2023-01-11T21:38:06.4870467Z 2023-01-11T21:38:06.4870472Z 2023-01-11T21:38:06.4870551Z if __name__ == "__main__": 2023-01-11T21:38:06.4870675Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4870801Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4870999Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4871111Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4871121Z 2023-01-11T21:38:06.4871381Z [2023-01-11 21:35:11,268] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 677 2023-01-11T21:38:06.4871797Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4871932Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4872190Z [2023-01-11 21:35:11,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 678 2023-01-11T21:38:06.4872450Z [2023-01-11 21:35:11,292] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 678 2023-01-11T21:38:06.4872458Z 2023-01-11T21:38:06.4872558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4872634Z import torch 2023-01-11T21:38:06.4872711Z import random 2023-01-11T21:38:06.4872829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4872947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4872952Z 2023-01-11T21:38:06.4873061Z aten = torch.ops.aten 2023-01-11T21:38:06.4873202Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4873299Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4873304Z 2023-01-11T21:38:06.4873379Z import triton 2023-01-11T21:38:06.4873471Z import triton.language as tl 2023-01-11T21:38:06.4873599Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4873734Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4873744Z 2023-01-11T21:38:06.4873748Z 2023-01-11T21:38:06.4873908Z triton_fused_log1p_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4873987Z import triton 2023-01-11T21:38:06.4874084Z import triton.language as tl 2023-01-11T21:38:06.4874199Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4874306Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4874443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4874573Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4874578Z 2023-01-11T21:38:06.4874994Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4875063Z @triton.jit 2023-01-11T21:38:06.4875207Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4875294Z xnumel = 201 2023-01-11T21:38:06.4875409Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4875566Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4875679Z xmask = xindex < xnumel 2023-01-11T21:38:06.4875751Z x0 = xindex 2023-01-11T21:38:06.4875938Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4876039Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4876130Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4876227Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4876316Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4876416Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4876489Z tmp6 = 2 2023-01-11T21:38:06.4876563Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4876699Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4876831Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4876922Z ''') 2023-01-11T21:38:06.4876927Z 2023-01-11T21:38:06.4876933Z 2023-01-11T21:38:06.4877025Z async_compile.wait(globals()) 2023-01-11T21:38:06.4877108Z del async_compile 
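# Annotation (not part of the generated code): AsyncCompile builds the
# Triton source above in the background; async_compile.wait(globals())
# blocks until every kernel handle in this module is ready before the
# call() wrapper below can launch it.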
2023-01-11T21:38:06.4877113Z 2023-01-11T21:38:06.4877189Z def call(args): 2023-01-11T21:38:06.4877265Z arg0_1, = args 2023-01-11T21:38:06.4877335Z args.clear() 2023-01-11T21:38:06.4877430Z with torch.cuda.device(0): 2023-01-11T21:38:06.4877636Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4877839Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4877932Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4878081Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.4878155Z del arg0_1 2023-01-11T21:38:06.4878234Z return (buf0, buf1, ) 2023-01-11T21:38:06.4878239Z 2023-01-11T21:38:06.4878248Z 2023-01-11T21:38:06.4878323Z if __name__ == "__main__": 2023-01-11T21:38:06.4878445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4878572Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4878775Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4878888Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4878893Z 2023-01-11T21:38:06.4878898Z 2023-01-11T21:38:06.4878999Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4879103Z import torch 2023-01-11T21:38:06.4879174Z import random 2023-01-11T21:38:06.4879295Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4879421Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4879426Z 2023-01-11T21:38:06.4879510Z aten = torch.ops.aten 2023-01-11T21:38:06.4879650Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4879750Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4879755Z 2023-01-11T21:38:06.4879831Z import triton 2023-01-11T21:38:06.4879922Z import triton.language as tl 2023-01-11T21:38:06.4880042Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4880189Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4880194Z 2023-01-11T21:38:06.4880198Z 2023-01-11T21:38:06.4880366Z triton_fused_log1p_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4880443Z import triton 2023-01-11T21:38:06.4880538Z import triton.language as tl 2023-01-11T21:38:06.4880657Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4880759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4880888Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4881015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4881020Z 2023-01-11T21:38:06.4881434Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4881510Z @triton.jit 2023-01-11T21:38:06.4881680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4881758Z xnumel = 201 2023-01-11T21:38:06.4881859Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4881990Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4882069Z xmask = xindex < xnumel 2023-01-11T21:38:06.4882144Z x0 = xindex 2023-01-11T21:38:06.4882334Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4882434Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4882524Z tmp1 = tmp0.to(tl.float32) 
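# Annotation (not part of the generated code): integer inputs (i64 here)
# are first upcast to fp32, since tl.libdevice.log1p is a floating-point
# primitive; the kernel then writes out both log1p(x) and 2 * log1p(x).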
2023-01-11T21:38:06.4882624Z tmp2 = tl.libdevice.log1p(tmp1) 2023-01-11T21:38:06.4882711Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4882805Z tmp5 = tl.libdevice.log1p(tmp4) 2023-01-11T21:38:06.4882876Z tmp6 = 2 2023-01-11T21:38:06.4882958Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.4883093Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4883233Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.4883322Z ''') 2023-01-11T21:38:06.4883328Z 2023-01-11T21:38:06.4883332Z 2023-01-11T21:38:06.4883425Z async_compile.wait(globals()) 2023-01-11T21:38:06.4883504Z del async_compile 2023-01-11T21:38:06.4883510Z 2023-01-11T21:38:06.4883583Z def call(args): 2023-01-11T21:38:06.4883658Z arg0_1, = args 2023-01-11T21:38:06.4883733Z args.clear() 2023-01-11T21:38:06.4883828Z with torch.cuda.device(0): 2023-01-11T21:38:06.4884029Z buf0 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4884227Z buf1 = empty_strided((201, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4884325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4884470Z triton_fused_log1p_mul_0.run(arg0_1, buf0, buf1, 201, grid=grid(201), stream=stream0) 2023-01-11T21:38:06.4884546Z del arg0_1 2023-01-11T21:38:06.4884630Z return (buf0, buf1, ) 2023-01-11T21:38:06.4884638Z 2023-01-11T21:38:06.4884642Z 2023-01-11T21:38:06.4884726Z if __name__ == "__main__": 2023-01-11T21:38:06.4884848Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4884973Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4885201Z arg0_1 = rand_strided((201, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.4885317Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4885323Z 2023-01-11T21:38:06.4885388Z ok (1.911s) 2023-01-11T21:38:06.4885842Z test_log2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4885972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4886235Z [2023-01-11 21:35:11,313] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 679 2023-01-11T21:38:06.4886503Z [2023-01-11 21:35:11,393] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 679 2023-01-11T21:38:06.4886928Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4887059Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4887318Z [2023-01-11 21:35:11,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 680 2023-01-11T21:38:06.4887584Z [2023-01-11 21:35:11,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 680 2023-01-11T21:38:06.4887614Z 2023-01-11T21:38:06.4887718Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4887794Z import torch 2023-01-11T21:38:06.4887864Z import random 2023-01-11T21:38:06.4887985Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4888111Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4888116Z 2023-01-11T21:38:06.4888198Z aten = torch.ops.aten 2023-01-11T21:38:06.4888339Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4888435Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4888440Z 2023-01-11T21:38:06.4888517Z import triton 2023-01-11T21:38:06.4888605Z import triton.language as tl 2023-01-11T21:38:06.4888735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4888878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4888884Z 2023-01-11T21:38:06.4888891Z 2023-01-11T21:38:06.4889052Z triton_fused_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4889130Z import triton 2023-01-11T21:38:06.4889222Z import triton.language as tl 2023-01-11T21:38:06.4889341Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4889448Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4889580Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4889707Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4889712Z 2023-01-11T21:38:06.4890133Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4890209Z @triton.jit 2023-01-11T21:38:06.4890356Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4890432Z xnumel = 64 2023-01-11T21:38:06.4890536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4890671Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4890750Z xmask = xindex < xnumel 2023-01-11T21:38:06.4890825Z x0 = xindex 2023-01-11T21:38:06.4891019Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4891146Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4891231Z tmp1 = tl.log(tmp0) 2023-01-11T21:38:06.4891313Z tmp2 = 1.4426950408889634 2023-01-11T21:38:06.4891402Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4891469Z tmp5 = 1 2023-01-11T21:38:06.4891547Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4891629Z tmp7 = tl.log(tmp6) 2023-01-11T21:38:06.4891709Z tmp8 = tmp7 * tmp2 2023-01-11T21:38:06.4891783Z tmp9 = 2 2023-01-11T21:38:06.4891897Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.4892035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4892167Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4892252Z ''') 2023-01-11T21:38:06.4892257Z 2023-01-11T21:38:06.4892262Z 
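# Annotation (not part of the generated code): the kernel above lowers
# torch.log2 via the identity log2(x) = ln(x) * log2(e), where
# 1.4426950408889634 is log2(e); the second output folds the extra "+ 1"
# and "- 2" pointwise ops into the same kernel. A minimal sketch of source
# that would compile to such a fused kernel, assuming a CUDA-enabled
# PyTorch 2.x build (the function name is illustrative, not taken from the
# test file):
import torch

def log2_example(x):
    # both expressions fuse into a single pointwise Triton kernel
    return torch.log2(x), torch.log2(x + 1) - 2

compiled_log2 = torch.compile(log2_example)
out0, out1 = compiled_log2(torch.rand(64, device="cuda"))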
2023-01-11T21:38:06.4892356Z async_compile.wait(globals()) 2023-01-11T21:38:06.4892434Z del async_compile 2023-01-11T21:38:06.4892439Z 2023-01-11T21:38:06.4892515Z def call(args): 2023-01-11T21:38:06.4892593Z arg0_1, = args 2023-01-11T21:38:06.4892672Z args.clear() 2023-01-11T21:38:06.4892759Z with torch.cuda.device(0): 2023-01-11T21:38:06.4892961Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4893159Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4893254Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4893400Z triton_fused_mul_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4893475Z del arg0_1 2023-01-11T21:38:06.4893558Z return (buf0, buf1, ) 2023-01-11T21:38:06.4893604Z 2023-01-11T21:38:06.4893608Z 2023-01-11T21:38:06.4893691Z if __name__ == "__main__": 2023-01-11T21:38:06.4893804Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4893933Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4894133Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4894248Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4894254Z 2023-01-11T21:38:06.4894259Z 2023-01-11T21:38:06.4894358Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4894433Z import torch 2023-01-11T21:38:06.4894624Z import random 2023-01-11T21:38:06.4894749Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4894867Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4894872Z 2023-01-11T21:38:06.4894952Z aten = torch.ops.aten 2023-01-11T21:38:06.4895088Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4895181Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4895190Z 2023-01-11T21:38:06.4895261Z import triton 2023-01-11T21:38:06.4895352Z import triton.language as tl 2023-01-11T21:38:06.4895475Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4895608Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4895621Z 2023-01-11T21:38:06.4895628Z 2023-01-11T21:38:06.4895782Z triton_fused_mul_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4895857Z import triton 2023-01-11T21:38:06.4895950Z import triton.language as tl 2023-01-11T21:38:06.4896061Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4896162Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4896293Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4896417Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4896422Z 2023-01-11T21:38:06.4896839Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4896908Z @triton.jit 2023-01-11T21:38:06.4897047Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4897118Z xnumel = 64 2023-01-11T21:38:06.4897379Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4897533Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4897628Z xmask = xindex < xnumel 2023-01-11T21:38:06.4897713Z x0 = xindex 2023-01-11T21:38:06.4897950Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 
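# Annotation (not part of the generated code): in this fp16 variant the
# loads are explicitly upcast with .to(tl.float32) so the log and multiply
# run in fp32; the results are written back through fp16 output buffers
# (see the torch.float16 empty_strided allocations in the wrapper below).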
2023-01-11T21:38:06.4898185Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.4898266Z tmp1 = tl.log(tmp0) 2023-01-11T21:38:06.4898347Z tmp2 = 1.4426950408889634 2023-01-11T21:38:06.4898425Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.4898493Z tmp5 = 1 2023-01-11T21:38:06.4898573Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4898644Z tmp7 = tl.log(tmp6) 2023-01-11T21:38:06.4898721Z tmp8 = tmp7 * tmp2 2023-01-11T21:38:06.4898789Z tmp9 = 2 2023-01-11T21:38:06.4898899Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.4899035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.4899172Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.4899255Z ''') 2023-01-11T21:38:06.4899260Z 2023-01-11T21:38:06.4899265Z 2023-01-11T21:38:06.4899350Z async_compile.wait(globals()) 2023-01-11T21:38:06.4899427Z del async_compile 2023-01-11T21:38:06.4899432Z 2023-01-11T21:38:06.4899507Z def call(args): 2023-01-11T21:38:06.4899581Z arg0_1, = args 2023-01-11T21:38:06.4899655Z args.clear() 2023-01-11T21:38:06.4899746Z with torch.cuda.device(0): 2023-01-11T21:38:06.4899943Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4900179Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4900274Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4900420Z triton_fused_mul_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.4900494Z del arg0_1 2023-01-11T21:38:06.4900580Z return (buf0, buf1, ) 2023-01-11T21:38:06.4900586Z 2023-01-11T21:38:06.4900590Z 2023-01-11T21:38:06.4900672Z if __name__ == "__main__": 2023-01-11T21:38:06.4900792Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4900921Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4901114Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4901229Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4901234Z 2023-01-11T21:38:06.4901310Z ok (0.200s) 2023-01-11T21:38:06.4901771Z test_log_fp64_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4901910Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4902176Z [2023-01-11 21:35:11,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 681 2023-01-11T21:38:06.4902441Z [2023-01-11 21:35:11,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 681 2023-01-11T21:38:06.4902857Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4902997Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4903253Z [2023-01-11 21:35:11,671] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 682 2023-01-11T21:38:06.4903548Z [2023-01-11 21:35:11,680] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 682 2023-01-11T21:38:06.4903555Z 2023-01-11T21:38:06.4903650Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4903727Z import torch 2023-01-11T21:38:06.4903804Z import random 2023-01-11T21:38:06.4903925Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4904053Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4904058Z 2023-01-11T21:38:06.4904141Z aten = torch.ops.aten 2023-01-11T21:38:06.4904281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4904378Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4904386Z 2023-01-11T21:38:06.4904456Z import triton 2023-01-11T21:38:06.4904551Z import triton.language as tl 2023-01-11T21:38:06.4904677Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4904819Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4904824Z 2023-01-11T21:38:06.4904828Z 2023-01-11T21:38:06.4904995Z triton_fused_log_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4905073Z import triton 2023-01-11T21:38:06.4905168Z import triton.language as tl 2023-01-11T21:38:06.4905279Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4905384Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4905540Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4905685Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4905692Z 2023-01-11T21:38:06.4906126Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4906228Z @triton.jit 2023-01-11T21:38:06.4906369Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4906446Z xnumel = 1024 2023-01-11T21:38:06.4906548Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4906675Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4906760Z xmask = xindex < xnumel 2023-01-11T21:38:06.4906834Z x0 = xindex 2023-01-11T21:38:06.4907029Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4907129Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4907226Z tmp1 = tl.libdevice.log(tmp0) 2023-01-11T21:38:06.4907322Z tmp3 = tl.libdevice.log(tmp2) 2023-01-11T21:38:06.4907397Z tmp4 = 1.4426950408889634 2023-01-11T21:38:06.4907476Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.4907616Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4907751Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4907838Z ''') 2023-01-11T21:38:06.4907844Z 2023-01-11T21:38:06.4907848Z 2023-01-11T21:38:06.4907944Z async_compile.wait(globals()) 2023-01-11T21:38:06.4908025Z del async_compile 2023-01-11T21:38:06.4908031Z 2023-01-11T21:38:06.4908101Z def call(args): 2023-01-11T21:38:06.4908176Z arg0_1, = args 
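# Annotation (not part of the generated code): unlike the fp32/fp16
# kernels above, which use tl.log, this float64 kernel routes through
# tl.libdevice.log, keeping the logarithm in double precision.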
2023-01-11T21:38:06.4908252Z args.clear() 2023-01-11T21:38:06.4908345Z with torch.cuda.device(0): 2023-01-11T21:38:06.4908549Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4908750Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4908844Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4908994Z triton_fused_log_mul_0.run(arg0_1, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4909066Z del arg0_1 2023-01-11T21:38:06.4909150Z return (buf0, buf1, ) 2023-01-11T21:38:06.4909156Z 2023-01-11T21:38:06.4909160Z 2023-01-11T21:38:06.4909242Z if __name__ == "__main__": 2023-01-11T21:38:06.4909362Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4909518Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4909724Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4909839Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4909844Z 2023-01-11T21:38:06.4909848Z 2023-01-11T21:38:06.4909947Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4910016Z import torch 2023-01-11T21:38:06.4910094Z import random 2023-01-11T21:38:06.4910215Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4910340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4910345Z 2023-01-11T21:38:06.4910429Z aten = torch.ops.aten 2023-01-11T21:38:06.4910572Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4910669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4910674Z 2023-01-11T21:38:06.4910743Z import triton 2023-01-11T21:38:06.4910837Z import triton.language as tl 2023-01-11T21:38:06.4910966Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4911107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4911112Z 2023-01-11T21:38:06.4911116Z 2023-01-11T21:38:06.4911280Z triton_fused_log_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.4911358Z import triton 2023-01-11T21:38:06.4911453Z import triton.language as tl 2023-01-11T21:38:06.4911567Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4911664Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4911798Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4911926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4911962Z 2023-01-11T21:38:06.4912389Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4912464Z @triton.jit 2023-01-11T21:38:06.4912611Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4912686Z xnumel = 1024 2023-01-11T21:38:06.4912786Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4912911Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4912997Z xmask = xindex < xnumel 2023-01-11T21:38:06.4913073Z x0 = xindex 2023-01-11T21:38:06.4913266Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4913366Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4913465Z tmp1 = tl.libdevice.log(tmp0) 2023-01-11T21:38:06.4913562Z tmp3 = tl.libdevice.log(tmp2) 2023-01-11T21:38:06.4913637Z tmp4 = 
1.4426950408889634 2023-01-11T21:38:06.4913721Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.4913856Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.4913989Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4914080Z ''') 2023-01-11T21:38:06.4914086Z 2023-01-11T21:38:06.4914090Z 2023-01-11T21:38:06.4914185Z async_compile.wait(globals()) 2023-01-11T21:38:06.4914265Z del async_compile 2023-01-11T21:38:06.4914270Z 2023-01-11T21:38:06.4914349Z def call(args): 2023-01-11T21:38:06.4914417Z arg0_1, = args 2023-01-11T21:38:06.4914493Z args.clear() 2023-01-11T21:38:06.4914588Z with torch.cuda.device(0): 2023-01-11T21:38:06.4914789Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4914990Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.4915085Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4915234Z triton_fused_log_mul_0.run(arg0_1, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.4915312Z del arg0_1 2023-01-11T21:38:06.4915413Z return (buf0, buf1, ) 2023-01-11T21:38:06.4915418Z 2023-01-11T21:38:06.4915423Z 2023-01-11T21:38:06.4915548Z if __name__ == "__main__": 2023-01-11T21:38:06.4915671Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4915798Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4915997Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.4916110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4916115Z 2023-01-11T21:38:06.4916185Z ok (0.188s) 2023-01-11T21:38:06.4916646Z test_log_softmax_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4916774Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4917032Z [2023-01-11 21:35:11,724] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 683 2023-01-11T21:38:06.4917241Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4917443Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.4917640Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.4917838Z [2023-01-11 21:35:11,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4918100Z [2023-01-11 21:35:11,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 683 2023-01-11T21:38:06.4918546Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4918678Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4918932Z [2023-01-11 21:35:12,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 684 2023-01-11T21:38:06.4919133Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4919338Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.4919539Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.4919740Z [2023-01-11 21:35:12,194] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4919746Z 2023-01-11T21:38:06.4919842Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4919915Z import torch 2023-01-11T21:38:06.4919989Z import random 2023-01-11T21:38:06.4920111Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4920232Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4920237Z 2023-01-11T21:38:06.4920320Z aten = torch.ops.aten 2023-01-11T21:38:06.4920457Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4920550Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4920555Z 2023-01-11T21:38:06.4920628Z import triton 2023-01-11T21:38:06.4920719Z import triton.language as tl 2023-01-11T21:38:06.4920844Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4920978Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4920988Z 2023-01-11T21:38:06.4920999Z 2023-01-11T21:38:06.4921180Z triton_fused_amax_1_exp_1_sub_2_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4921254Z import triton 2023-01-11T21:38:06.4921344Z import triton.language as tl 2023-01-11T21:38:06.4921458Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4921558Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4921715Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4921844Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4921850Z 2023-01-11T21:38:06.4921931Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4922050Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4922137Z filename=__file__, 2023-01-11T21:38:06.4922514Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4922591Z @triton.jit 2023-01-11T21:38:06.4922767Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4922842Z xnumel = 8 2023-01-11T21:38:06.4922912Z rnumel = 8 2023-01-11T21:38:06.4923003Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4923137Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4923220Z xmask = xindex < xnumel 2023-01-11T21:38:06.4923335Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4923405Z x0 = xindex 2023-01-11T21:38:06.4923588Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4923695Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4923777Z rindex = roffset + rbase 2023-01-11T21:38:06.4923860Z rmask = rindex < rnumel 
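# Annotation (not part of the generated code): this is the numerically
# stable log-softmax recipe. Pass 1 keeps a running row maximum in _tmp1
# (tl.where against an accumulator seeded with -inf), pass 2 accumulates
# sum(exp(x - max)); the second kernel then forms (x - max) - log(sum).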
2023-01-11T21:38:06.4923931Z r1 = rindex 2023-01-11T21:38:06.4924151Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4924307Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4924420Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4924517Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.4924631Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4924736Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4924822Z rindex = roffset + rbase 2023-01-11T21:38:06.4924905Z rmask = rindex < rnumel 2023-01-11T21:38:06.4924975Z r1 = rindex 2023-01-11T21:38:06.4925095Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.4925212Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4925288Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4925410Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4925525Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4925625Z tl.store(out_ptr1 + x0, tmp5, xmask) 2023-01-11T21:38:06.4925711Z ''') 2023-01-11T21:38:06.4925717Z 2023-01-11T21:38:06.4925721Z 2023-01-11T21:38:06.4925953Z triton_fused_add_amax_amax_2_exp_exp_2_sub_sub_1_sub_3_sub_4_sub_5_1 = async_compile.triton(''' 2023-01-11T21:38:06.4926030Z import triton 2023-01-11T21:38:06.4926123Z import triton.language as tl 2023-01-11T21:38:06.4926233Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4926334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4926467Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4926591Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4926596Z 2023-01-11T21:38:06.4926686Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4926802Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4926888Z filename=__file__, 2023-01-11T21:38:06.4927321Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.4927390Z @triton.jit 2023-01-11T21:38:06.4927630Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4927705Z xnumel = 8 2023-01-11T21:38:06.4927776Z rnumel = 8 2023-01-11T21:38:06.4927875Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4928012Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4928096Z xmask = xindex < xnumel 2023-01-11T21:38:06.4928207Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4928276Z x0 = xindex 2023-01-11T21:38:06.4928459Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4928562Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4928655Z rindex = roffset + rbase 2023-01-11T21:38:06.4928740Z rmask = rindex < rnumel 2023-01-11T21:38:06.4928811Z r1 = rindex 2023-01-11T21:38:06.4929018Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4929236Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4929318Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4929445Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 
2023-01-11T21:38:06.4929559Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4929676Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4929862Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4929969Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4930049Z rindex = roffset + rbase 2023-01-11T21:38:06.4930135Z rmask = rindex < rnumel 2023-01-11T21:38:06.4930243Z r1 = rindex 2023-01-11T21:38:06.4930459Z tmp4 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4930672Z tmp5 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4930753Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.4930871Z tmp7 = tmp6 - tmp3 2023-01-11T21:38:06.4930946Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.4931067Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.4931197Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp5), tmp5, _tmp10) 2023-01-11T21:38:06.4931309Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4931423Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4931538Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4931640Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4931721Z rindex = roffset + rbase 2023-01-11T21:38:06.4931809Z rmask = rindex < rnumel 2023-01-11T21:38:06.4931880Z r1 = rindex 2023-01-11T21:38:06.4932096Z tmp11 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4932320Z tmp15 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4932437Z tmp20 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4932541Z tmp21 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4932644Z tmp23 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.4932757Z tmp12 = tmp11 - tmp10 2023-01-11T21:38:06.4932842Z tmp13 = tl.exp(tmp12) 2023-01-11T21:38:06.4932966Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.4933048Z tmp16 = tmp15 + tmp11 2023-01-11T21:38:06.4933164Z tmp17 = tmp16 - tmp3 2023-01-11T21:38:06.4933250Z tmp18 = tl.log(tmp9) 2023-01-11T21:38:06.4933361Z tmp19 = tmp17 - tmp18 2023-01-11T21:38:06.4933476Z tmp22 = tmp20 - tmp21 2023-01-11T21:38:06.4933557Z tmp24 = tl.log(tmp23) 2023-01-11T21:38:06.4933670Z tmp25 = tmp22 - tmp24 2023-01-11T21:38:06.4933829Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp19, rmask & xmask) 2023-01-11T21:38:06.4934016Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.4934135Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4934241Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4934322Z rindex = roffset + rbase 2023-01-11T21:38:06.4934406Z rmask = rindex < rnumel 2023-01-11T21:38:06.4934588Z r1 = rindex 2023-01-11T21:38:06.4934707Z tmp26 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4934825Z tmp27 = tmp26 - tmp10 2023-01-11T21:38:06.4934907Z tmp28 = tl.log(tmp14) 2023-01-11T21:38:06.4935026Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.4935172Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp29, rmask & xmask) 2023-01-11T21:38:06.4935255Z ''') 2023-01-11T21:38:06.4935261Z 2023-01-11T21:38:06.4935265Z 2023-01-11T21:38:06.4935359Z async_compile.wait(globals()) 
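# Annotation (not part of the generated code): calling the two inputs a
# (in_ptr0) and b (in_ptr1), the second kernel above fuses three
# log-softmax computations into one pass: out_ptr4 gets
# log_softmax(a + b) along the last dim, out_ptr5 reuses the dim-0
# max/sum statistics from the first kernel for log_softmax(a, dim=0),
# and out_ptr6 computes log_softmax(b) along the last dim.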
2023-01-11T21:38:06.4935437Z del async_compile 2023-01-11T21:38:06.4935442Z 2023-01-11T21:38:06.4935519Z def call(args): 2023-01-11T21:38:06.4935599Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4935675Z args.clear() 2023-01-11T21:38:06.4935762Z with torch.cuda.device(0): 2023-01-11T21:38:06.4935966Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936166Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4936418Z triton_fused_amax_1_exp_1_sub_2_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4936674Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4936866Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4937064Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4937312Z triton_fused_add_amax_amax_2_exp_exp_2_sub_sub_1_sub_3_sub_4_sub_5_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4937396Z del arg0_1 2023-01-11T21:38:06.4937469Z del arg1_1 2023-01-11T21:38:06.4937558Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.4937564Z 2023-01-11T21:38:06.4937568Z 2023-01-11T21:38:06.4937647Z if __name__ == "__main__": 2023-01-11T21:38:06.4937766Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4937893Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4938095Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4938285Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4938405Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4938410Z 2023-01-11T21:38:06.4938415Z 2023-01-11T21:38:06.4938512Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4938586Z import torch 2023-01-11T21:38:06.4938660Z import random 2023-01-11T21:38:06.4938777Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4938901Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4938906Z 2023-01-11T21:38:06.4938988Z aten = torch.ops.aten 2023-01-11T21:38:06.4939117Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4939214Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4939219Z 2023-01-11T21:38:06.4939292Z import triton 2023-01-11T21:38:06.4939385Z import triton.language as tl 2023-01-11T21:38:06.4939513Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4939654Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4939659Z 2023-01-11T21:38:06.4939664Z 2023-01-11T21:38:06.4939887Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_2_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.4940002Z import triton 2023-01-11T21:38:06.4940089Z import triton.language as tl 2023-01-11T21:38:06.4940201Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4940303Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4940434Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4940562Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4940567Z 2023-01-11T21:38:06.4940655Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4940772Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4940856Z filename=__file__, 
2023-01-11T21:38:06.4941226Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.4941303Z @triton.jit 2023-01-11T21:38:06.4941483Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4941555Z xnumel = 8 2023-01-11T21:38:06.4941626Z rnumel = 8 2023-01-11T21:38:06.4941724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4941861Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4941937Z xmask = xindex < xnumel 2023-01-11T21:38:06.4942056Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4942127Z x0 = xindex 2023-01-11T21:38:06.4942312Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4942417Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4942536Z rindex = roffset + rbase 2023-01-11T21:38:06.4942624Z rmask = rindex < rnumel 2023-01-11T21:38:06.4942690Z r1 = rindex 2023-01-11T21:38:06.4942933Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4943028Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4943160Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4943276Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4943376Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.4943499Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4943607Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4943690Z rindex = roffset + rbase 2023-01-11T21:38:06.4943776Z rmask = rindex < rnumel 2023-01-11T21:38:06.4943849Z r1 = rindex 2023-01-11T21:38:06.4943983Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4944086Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4944204Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4944293Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4944409Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4944529Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4944631Z tl.store(out_ptr1 + x0, tmp7, xmask) 2023-01-11T21:38:06.4944718Z ''') 2023-01-11T21:38:06.4944723Z 2023-01-11T21:38:06.4944728Z 2023-01-11T21:38:06.4945091Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4945170Z import triton 2023-01-11T21:38:06.4945267Z import triton.language as tl 2023-01-11T21:38:06.4945385Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4945485Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4945636Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4945783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4945790Z 2023-01-11T21:38:06.4945899Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4946016Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4946104Z filename=__file__, 2023-01-11T21:38:06.4946615Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: '*fp32', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 
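# Annotation (not part of the generated code): fp16 rerun of the same
# fusion. Note the mixed signature: the intermediate max/sum statistics
# stay in fp32 buffers (*fp32 at positions 2-3) while the inputs and the
# three outputs are fp16, so only the final results are rounded down.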
2023-01-11T21:38:06.4946693Z @triton.jit 2023-01-11T21:38:06.4946896Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4946972Z xnumel = 8 2023-01-11T21:38:06.4947047Z rnumel = 8 2023-01-11T21:38:06.4947151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4947291Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4947375Z xmask = xindex < xnumel 2023-01-11T21:38:06.4947493Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4947560Z x0 = xindex 2023-01-11T21:38:06.4947746Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4947850Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4947940Z rindex = roffset + rbase 2023-01-11T21:38:06.4948027Z rmask = rindex < rnumel 2023-01-11T21:38:06.4948101Z r1 = rindex 2023-01-11T21:38:06.4948340Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4948572Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4948658Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.4948751Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.4948911Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.4949027Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4949148Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4949338Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4949443Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4949526Z rindex = roffset + rbase 2023-01-11T21:38:06.4949613Z rmask = rindex < rnumel 2023-01-11T21:38:06.4949687Z r1 = rindex 2023-01-11T21:38:06.4949929Z tmp5 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4950166Z tmp6 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4950251Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.4950343Z tmp8 = tmp7.to(tl.float32) 2023-01-11T21:38:06.4950456Z tmp9 = tmp8 - tmp4 2023-01-11T21:38:06.4950543Z tmp10 = tl.exp(tmp9) 2023-01-11T21:38:06.4950668Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.4950762Z tmp12 = tmp6.to(tl.float32) 2023-01-11T21:38:06.4950899Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.4951015Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4951134Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4951247Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4951353Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4951443Z rindex = roffset + rbase 2023-01-11T21:38:06.4951530Z rmask = rindex < rnumel 2023-01-11T21:38:06.4951603Z r1 = rindex 2023-01-11T21:38:06.4951843Z tmp14 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4952083Z tmp19 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4952218Z tmp26 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4952316Z tmp28 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.4952447Z tmp30 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.4952543Z tmp15 = 
tmp14.to(tl.float32) 2023-01-11T21:38:06.4952665Z tmp16 = tmp15 - tmp13 2023-01-11T21:38:06.4952753Z tmp17 = tl.exp(tmp16) 2023-01-11T21:38:06.4952876Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.4952958Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.4953043Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.4953159Z tmp22 = tmp21 - tmp4 2023-01-11T21:38:06.4953242Z tmp23 = tl.log(tmp11) 2023-01-11T21:38:06.4953358Z tmp24 = tmp22 - tmp23 2023-01-11T21:38:06.4953447Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.4953541Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.4953659Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.4953736Z tmp31 = tl.log(tmp30) 2023-01-11T21:38:06.4953852Z tmp32 = tmp29 - tmp31 2023-01-11T21:38:06.4953944Z tmp33 = tmp32.to(tl.float32) 2023-01-11T21:38:06.4954108Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.4954268Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp33, rmask & xmask) 2023-01-11T21:38:06.4954387Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4954495Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4954578Z rindex = roffset + rbase 2023-01-11T21:38:06.4954664Z rmask = rindex < rnumel 2023-01-11T21:38:06.4954738Z r1 = rindex 2023-01-11T21:38:06.4954872Z tmp34 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4954992Z tmp35 = tmp34.to(tl.float32) 2023-01-11T21:38:06.4955108Z tmp36 = tmp35 - tmp13 2023-01-11T21:38:06.4955194Z tmp37 = tl.log(tmp18) 2023-01-11T21:38:06.4955311Z tmp38 = tmp36 - tmp37 2023-01-11T21:38:06.4955422Z tmp39 = tmp38.to(tl.float32) 2023-01-11T21:38:06.4955603Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp39, rmask & xmask) 2023-01-11T21:38:06.4955690Z ''') 2023-01-11T21:38:06.4955696Z 2023-01-11T21:38:06.4955700Z 2023-01-11T21:38:06.4955793Z async_compile.wait(globals()) 2023-01-11T21:38:06.4955871Z del async_compile 2023-01-11T21:38:06.4955877Z 2023-01-11T21:38:06.4955954Z def call(args): 2023-01-11T21:38:06.4956037Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.4956108Z args.clear() 2023-01-11T21:38:06.4956202Z with torch.cuda.device(0): 2023-01-11T21:38:06.4956403Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4956602Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4956699Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4956889Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_2_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4957092Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957283Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957477Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4957767Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4957842Z del arg0_1 2023-01-11T21:38:06.4957918Z del arg1_1 2023-01-11T21:38:06.4958019Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.4958024Z 2023-01-11T21:38:06.4958029Z 2023-01-11T21:38:06.4958111Z if __name__ == "__main__": 2023-01-11T21:38:06.4958237Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4958366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4958587Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4958782Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4958906Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.4959173Z [2023-01-11 21:35:12,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 684 2023-01-11T21:38:06.4959179Z 2023-01-11T21:38:06.4959250Z ok (0.640s) 2023-01-11T21:38:06.4959709Z test_logsumexp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4959845Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4960110Z [2023-01-11 21:35:12,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 685 2023-01-11T21:38:06.4960321Z [2023-01-11 21:35:12,392] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4960520Z [2023-01-11 21:35:12,395] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.4960786Z [2023-01-11 21:35:12,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 685 2023-01-11T21:38:06.4961203Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4961365Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4961624Z [2023-01-11 21:35:12,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 686 2023-01-11T21:38:06.4961835Z [2023-01-11 21:35:12,769] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.4962042Z [2023-01-11 21:35:12,769] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.4962243Z [2023-01-11 21:35:12,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.4962443Z [2023-01-11 21:35:12,772] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.4962449Z 2023-01-11T21:38:06.4962550Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4962623Z import torch 2023-01-11T21:38:06.4962700Z import random 2023-01-11T21:38:06.4962823Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4962949Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4962955Z 2023-01-11T21:38:06.4963037Z aten = torch.ops.aten 2023-01-11T21:38:06.4963178Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4963278Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4963284Z 2023-01-11T21:38:06.4963353Z import triton 2023-01-11T21:38:06.4963449Z import triton.language as tl 2023-01-11T21:38:06.4963576Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4963716Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4963722Z 2023-01-11T21:38:06.4963727Z 2023-01-11T21:38:06.4963917Z triton_fused_add_amax_exp_sub_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4963995Z import triton 2023-01-11T21:38:06.4964090Z import triton.language as tl 2023-01-11T21:38:06.4964211Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4964308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4964441Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4964569Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4964574Z 2023-01-11T21:38:06.4964691Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4964810Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4964896Z filename=__file__, 2023-01-11T21:38:06.4965270Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4965344Z @triton.jit 2023-01-11T21:38:06.4965510Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4965586Z xnumel = 8 2023-01-11T21:38:06.4965664Z rnumel = 8 2023-01-11T21:38:06.4965766Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4965904Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4965988Z xmask = xindex < xnumel 2023-01-11T21:38:06.4966110Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4966179Z x0 = xindex 2023-01-11T21:38:06.4966365Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4966475Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4966564Z rindex = roffset + rbase 2023-01-11T21:38:06.4966653Z rmask = rindex < rnumel 2023-01-11T21:38:06.4966724Z r1 = rindex 
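        # annotation: first pass of a numerically stable logsumexp. _tmp1
        # accumulates the running row maximum; the second loop below sums
        # exp(x - max), and the epilogue adds the max back after tl.log.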
2023-01-11T21:38:06.4966940Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4967065Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4967179Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4967332Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4967441Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4967536Z rindex = roffset + rbase 2023-01-11T21:38:06.4967625Z rmask = rindex < rnumel 2023-01-11T21:38:06.4967700Z r1 = rindex 2023-01-11T21:38:06.4967816Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.4967936Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4968022Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4968144Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4968259Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4968345Z tmp6 = tl.log(tmp5) 2023-01-11T21:38:06.4968427Z tmp7 = tl.abs(tmp1) 2023-01-11T21:38:06.4968502Z tmp8 = float("inf") 2023-01-11T21:38:06.4968581Z tmp9 = tmp7 == tmp8 2023-01-11T21:38:06.4968653Z tmp10 = 0.0 2023-01-11T21:38:06.4968752Z tmp11 = tl.where(tmp9, tmp10, tmp1) 2023-01-11T21:38:06.4968839Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.4968981Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.4969070Z ''') 2023-01-11T21:38:06.4969075Z 2023-01-11T21:38:06.4969080Z 2023-01-11T21:38:06.4969276Z triton_fused_amax_1_exp_1_sub_1_sub_2_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4969350Z import triton 2023-01-11T21:38:06.4969444Z import triton.language as tl 2023-01-11T21:38:06.4969560Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4969665Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4969799Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4969926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4969931Z 2023-01-11T21:38:06.4970022Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4970136Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4970225Z filename=__file__, 2023-01-11T21:38:06.4970602Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4970678Z @triton.jit 2023-01-11T21:38:06.4970881Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4970959Z xnumel = 8 2023-01-11T21:38:06.4971033Z rnumel = 8 2023-01-11T21:38:06.4971127Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4971264Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4971348Z xmask = xindex < xnumel 2023-01-11T21:38:06.4971470Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4971542Z x0 = xindex 2023-01-11T21:38:06.4971726Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4971832Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4971927Z rindex = roffset + rbase 2023-01-11T21:38:06.4972008Z rmask = rindex < rnumel 2023-01-11T21:38:06.4972081Z r1 = rindex 2023-01-11T21:38:06.4972297Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4972430Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.4972545Z 
tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4972667Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4972773Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4972858Z rindex = roffset + rbase 2023-01-11T21:38:06.4972942Z rmask = rindex < rnumel 2023-01-11T21:38:06.4973015Z r1 = rindex 2023-01-11T21:38:06.4973135Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.4973254Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.4973340Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.4973492Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.4973601Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4973683Z tmp6 = tl.log(tmp5) 2023-01-11T21:38:06.4973765Z tmp7 = tl.abs(tmp1) 2023-01-11T21:38:06.4973847Z tmp8 = float("inf") 2023-01-11T21:38:06.4973929Z tmp9 = tmp7 == tmp8 2023-01-11T21:38:06.4974008Z tmp10 = 0.0 2023-01-11T21:38:06.4974109Z tmp11 = tl.where(tmp9, tmp10, tmp1) 2023-01-11T21:38:06.4974185Z tmp12 = tmp6 + tmp11 2023-01-11T21:38:06.4974259Z tmp13 = 2 2023-01-11T21:38:06.4974378Z tmp14 = tmp12 - tmp13 2023-01-11T21:38:06.4974636Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.4974721Z ''') 2023-01-11T21:38:06.4974727Z 2023-01-11T21:38:06.4974731Z 2023-01-11T21:38:06.4974824Z async_compile.wait(globals()) 2023-01-11T21:38:06.4974900Z del async_compile 2023-01-11T21:38:06.4974905Z 2023-01-11T21:38:06.4974972Z def call(args): 2023-01-11T21:38:06.4975049Z arg0_1, = args 2023-01-11T21:38:06.4975124Z args.clear() 2023-01-11T21:38:06.4975217Z with torch.cuda.device(0): 2023-01-11T21:38:06.4975420Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4975510Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.4975605Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4975759Z triton_fused_add_amax_exp_sub_sum_1_0.run(buf2, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4975953Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.4976042Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.4976203Z triton_fused_amax_1_exp_1_sub_1_sub_2_sum_2_1.run(buf5, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4976276Z del arg0_1 2023-01-11T21:38:06.4976358Z return (buf2, buf5, ) 2023-01-11T21:38:06.4976363Z 2023-01-11T21:38:06.4976368Z 2023-01-11T21:38:06.4976446Z if __name__ == "__main__": 2023-01-11T21:38:06.4976567Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4976688Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4976888Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.4977040Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4977045Z 2023-01-11T21:38:06.4977383Z [2023-01-11 21:35:12,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 686 2023-01-11T21:38:06.4977390Z 2023-01-11T21:38:06.4977485Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4977559Z import torch 2023-01-11T21:38:06.4977633Z import random 2023-01-11T21:38:06.4977753Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4977871Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4977876Z 2023-01-11T21:38:06.4977957Z aten = torch.ops.aten 2023-01-11T21:38:06.4978096Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4978194Z async_compile = 
AsyncCompile() 2023-01-11T21:38:06.4978199Z 2023-01-11T21:38:06.4978273Z import triton 2023-01-11T21:38:06.4978364Z import triton.language as tl 2023-01-11T21:38:06.4978489Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4978632Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4978637Z 2023-01-11T21:38:06.4978642Z 2023-01-11T21:38:06.4978887Z triton_fused_amax_convert_element_type_convert_element_type_1_exp_sub_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.4978963Z import triton 2023-01-11T21:38:06.4979056Z import triton.language as tl 2023-01-11T21:38:06.4979170Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4979272Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4979402Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4979531Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4979585Z 2023-01-11T21:38:06.4979679Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4979789Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.4979873Z filename=__file__, 2023-01-11T21:38:06.4980235Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4980309Z @triton.jit 2023-01-11T21:38:06.4980477Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4980552Z xnumel = 8 2023-01-11T21:38:06.4980627Z rnumel = 8 2023-01-11T21:38:06.4980718Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4980856Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4980938Z xmask = xindex < xnumel 2023-01-11T21:38:06.4981055Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4981129Z x0 = xindex 2023-01-11T21:38:06.4981311Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4981416Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4981499Z rindex = roffset + rbase 2023-01-11T21:38:06.4981588Z rmask = rindex < rnumel 2023-01-11T21:38:06.4981664Z r1 = rindex 2023-01-11T21:38:06.4981900Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4981993Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4982121Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4982234Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4982349Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4982448Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4982534Z rindex = roffset + rbase 2023-01-11T21:38:06.4982619Z rmask = rindex < rnumel 2023-01-11T21:38:06.4982693Z r1 = rindex 2023-01-11T21:38:06.4982826Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4982915Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4983027Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4983102Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4983253Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4983368Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4983446Z tmp8 = tl.log(tmp7) 2023-01-11T21:38:06.4983523Z tmp9 = tl.abs(tmp2) 2023-01-11T21:38:06.4983603Z tmp10 = float("inf") 2023-01-11T21:38:06.4983682Z tmp11 = tmp9 == tmp10 2023-01-11T21:38:06.4983750Z tmp12 = 0.0 
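    # annotation: inf guard. If the row max is +/-inf, substitute 0 before
    # adding it back, so logsumexp yields +/-inf rather than nan (inf - inf).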
2023-01-11T21:38:06.4983849Z tmp13 = tl.where(tmp11, tmp12, tmp2) 2023-01-11T21:38:06.4983929Z tmp14 = tmp8 + tmp13 2023-01-11T21:38:06.4984017Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.4984151Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.4984239Z ''') 2023-01-11T21:38:06.4984244Z 2023-01-11T21:38:06.4984248Z 2023-01-11T21:38:06.4984481Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sub_2_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.4984549Z import triton 2023-01-11T21:38:06.4984643Z import triton.language as tl 2023-01-11T21:38:06.4984757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4984860Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4984992Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.4985118Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4985123Z 2023-01-11T21:38:06.4985211Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.4985321Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.4985427Z filename=__file__, 2023-01-11T21:38:06.4985817Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.4985924Z @triton.jit 2023-01-11T21:38:06.4986096Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.4986173Z xnumel = 8 2023-01-11T21:38:06.4986251Z rnumel = 8 2023-01-11T21:38:06.4986352Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4986482Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.4986566Z xmask = xindex < xnumel 2023-01-11T21:38:06.4986686Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.4986757Z x0 = xindex 2023-01-11T21:38:06.4986941Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.4987046Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4987136Z rindex = roffset + rbase 2023-01-11T21:38:06.4987216Z rmask = rindex < rnumel 2023-01-11T21:38:06.4987292Z r1 = rindex 2023-01-11T21:38:06.4987532Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.4987626Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.4987757Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.4987876Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4987994Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.4988096Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.4988187Z rindex = roffset + rbase 2023-01-11T21:38:06.4988275Z rmask = rindex < rnumel 2023-01-11T21:38:06.4988349Z r1 = rindex 2023-01-11T21:38:06.4988483Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.4988575Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.4988693Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.4988775Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.4988900Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.4989015Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.4989096Z tmp8 = tl.log(tmp7) 2023-01-11T21:38:06.4989177Z tmp9 = tl.abs(tmp2) 2023-01-11T21:38:06.4989289Z tmp10 = float("inf") 2023-01-11T21:38:06.4989373Z tmp11 = tmp9 == tmp10 2023-01-11T21:38:06.4989440Z tmp12 = 0.0 
2023-01-11T21:38:06.4989542Z tmp13 = tl.where(tmp11, tmp12, tmp2) 2023-01-11T21:38:06.4989626Z tmp14 = tmp8 + tmp13 2023-01-11T21:38:06.4989717Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.4989791Z tmp16 = 2 2023-01-11T21:38:06.4989908Z tmp17 = tmp15 - tmp16 2023-01-11T21:38:06.4990049Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.4990129Z ''') 2023-01-11T21:38:06.4990134Z 2023-01-11T21:38:06.4990145Z 2023-01-11T21:38:06.4990234Z async_compile.wait(globals()) 2023-01-11T21:38:06.4990316Z del async_compile 2023-01-11T21:38:06.4990321Z 2023-01-11T21:38:06.4990398Z def call(args): 2023-01-11T21:38:06.4990475Z arg0_1, = args 2023-01-11T21:38:06.4990551Z args.clear() 2023-01-11T21:38:06.4990646Z with torch.cuda.device(0): 2023-01-11T21:38:06.4990846Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4990935Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.4991137Z triton_fused_amax_convert_element_type_convert_element_type_1_exp_sub_sum_1_0.run(arg0_1, buf2, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4991335Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.4991521Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sub_2_sum_2_1.run(arg0_1, buf5, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.4991595Z del arg0_1 2023-01-11T21:38:06.4991682Z return (buf2, buf5, ) 2023-01-11T21:38:06.4991688Z 2023-01-11T21:38:06.4991719Z 2023-01-11T21:38:06.4991802Z if __name__ == "__main__": 2023-01-11T21:38:06.4991924Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.4992046Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.4992249Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.4992364Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.4992370Z 2023-01-11T21:38:06.4992443Z ok (0.550s) 2023-01-11T21:38:06.4992903Z test_long_tensor_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4993038Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4993301Z [2023-01-11 21:35:12,897] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 687 2023-01-11T21:38:06.4993563Z [2023-01-11 21:35:12,971] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 687 2023-01-11T21:38:06.4993978Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.4994111Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.4994362Z [2023-01-11 21:35:12,997] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 688 2023-01-11T21:38:06.4994623Z [2023-01-11 21:35:13,007] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 688 2023-01-11T21:38:06.4994632Z 2023-01-11T21:38:06.4994733Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.4994807Z import torch 2023-01-11T21:38:06.4994882Z import random 2023-01-11T21:38:06.4995002Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.4995156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.4995162Z 2023-01-11T21:38:06.4995248Z aten = torch.ops.aten 2023-01-11T21:38:06.4995403Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.4995507Z async_compile = AsyncCompile() 2023-01-11T21:38:06.4995513Z 2023-01-11T21:38:06.4995606Z import triton 2023-01-11T21:38:06.4995703Z import triton.language as tl 2023-01-11T21:38:06.4995829Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.4995970Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.4995976Z 2023-01-11T21:38:06.4995981Z 2023-01-11T21:38:06.4996142Z triton_fused_add_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.4996219Z import triton 2023-01-11T21:38:06.4996307Z import triton.language as tl 2023-01-11T21:38:06.4996422Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.4996528Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.4996664Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.4996794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.4996799Z 2023-01-11T21:38:06.4997210Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.4997285Z @triton.jit 2023-01-11T21:38:06.4997431Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.4997500Z xnumel = 64 2023-01-11T21:38:06.4997598Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.4997757Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.4997841Z xmask = xindex < xnumel 2023-01-11T21:38:06.4997914Z x0 = xindex 2023-01-11T21:38:06.4998105Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.4998207Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.4998279Z tmp0 = 294 2023-01-11T21:38:06.4998389Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.4998463Z tmp3 = 295 2023-01-11T21:38:06.4998543Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.4998677Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.4998812Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.4998898Z ''') 2023-01-11T21:38:06.4998903Z 2023-01-11T21:38:06.4998908Z 2023-01-11T21:38:06.4999003Z async_compile.wait(globals()) 2023-01-11T21:38:06.4999075Z del async_compile 2023-01-11T21:38:06.4999081Z 2023-01-11T21:38:06.4999157Z def call(args): 2023-01-11T21:38:06.4999237Z arg0_1, = args 2023-01-11T21:38:06.4999313Z args.clear() 
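    # annotation: clearing the caller's args list drops the last outside
    # references to the inputs, so they can be freed promptly once the
    # wrapper deletes its own handles (see del arg0_1 below).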
2023-01-11T21:38:06.4999406Z with torch.cuda.device(0): 2023-01-11T21:38:06.4999602Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.4999803Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.4999890Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5000037Z triton_fused_add_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5000113Z del arg0_1 2023-01-11T21:38:06.5000197Z return (buf0, buf1, ) 2023-01-11T21:38:06.5000203Z 2023-01-11T21:38:06.5000207Z 2023-01-11T21:38:06.5000293Z if __name__ == "__main__": 2023-01-11T21:38:06.5000412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5000539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5000738Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5000848Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5000854Z 2023-01-11T21:38:06.5000858Z 2023-01-11T21:38:06.5000957Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5001032Z import torch 2023-01-11T21:38:06.5001107Z import random 2023-01-11T21:38:06.5001255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5001382Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5001387Z 2023-01-11T21:38:06.5001467Z aten = torch.ops.aten 2023-01-11T21:38:06.5007845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5007961Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5007967Z 2023-01-11T21:38:06.5008040Z import triton 2023-01-11T21:38:06.5008141Z import triton.language as tl 2023-01-11T21:38:06.5008272Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5008419Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5008428Z 2023-01-11T21:38:06.5008432Z 2023-01-11T21:38:06.5008614Z triton_fused_add_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.5008692Z import triton 2023-01-11T21:38:06.5008787Z import triton.language as tl 2023-01-11T21:38:06.5008906Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5009006Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5009146Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5009273Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5009279Z 2023-01-11T21:38:06.5009698Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5009767Z @triton.jit 2023-01-11T21:38:06.5009916Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5010047Z xnumel = 64 2023-01-11T21:38:06.5010139Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5010272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5010355Z xmask = xindex < xnumel 2023-01-11T21:38:06.5010426Z x0 = xindex 2023-01-11T21:38:06.5010621Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5010720Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5010794Z tmp0 = 294 2023-01-11T21:38:06.5010899Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.5010971Z tmp3 = 295 2023-01-11T21:38:06.5011049Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.5011184Z tl.store(out_ptr0 + (x0 
+ tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5011320Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5011405Z ''') 2023-01-11T21:38:06.5011410Z 2023-01-11T21:38:06.5011415Z 2023-01-11T21:38:06.5011507Z async_compile.wait(globals()) 2023-01-11T21:38:06.5011580Z del async_compile 2023-01-11T21:38:06.5011586Z 2023-01-11T21:38:06.5011659Z def call(args): 2023-01-11T21:38:06.5011731Z arg0_1, = args 2023-01-11T21:38:06.5011805Z args.clear() 2023-01-11T21:38:06.5011899Z with torch.cuda.device(0): 2023-01-11T21:38:06.5012098Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5012294Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5012379Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5012528Z triton_fused_add_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5012598Z del arg0_1 2023-01-11T21:38:06.5012681Z return (buf0, buf1, ) 2023-01-11T21:38:06.5012686Z 2023-01-11T21:38:06.5012691Z 2023-01-11T21:38:06.5012770Z if __name__ == "__main__": 2023-01-11T21:38:06.5012887Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5013012Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5013212Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5013317Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5013326Z 2023-01-11T21:38:06.5013389Z ok (0.136s) 2023-01-11T21:38:06.5013943Z test_lowmem_dropout1_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5014026Z warnings.warn( 2023-01-11T21:38:06.5014286Z [2023-01-11 21:35:13,028] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 689 2023-01-11T21:38:06.5014778Z [2023-01-11 21:35:13,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 689 2023-01-11T21:38:06.5015042Z [2023-01-11 21:35:13,100] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 689 2023-01-11T21:38:06.5015310Z [2023-01-11 21:35:13,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 689 2023-01-11T21:38:06.5015564Z [2023-01-11 21:35:13,287] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 690 2023-01-11T21:38:06.5015824Z [2023-01-11 21:35:13,289] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5016078Z [2023-01-11 21:35:13,428] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 690 2023-01-11T21:38:06.5016331Z [2023-01-11 21:35:13,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 690 2023-01-11T21:38:06.5016337Z 2023-01-11T21:38:06.5016437Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5016514Z import torch 2023-01-11T21:38:06.5016589Z import random 2023-01-11T21:38:06.5016710Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5016887Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5016893Z 2023-01-11T21:38:06.5016977Z aten = torch.ops.aten 2023-01-11T21:38:06.5017107Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 
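# annotation: each generated wrapper module creates its own AsyncCompile so
# the Triton kernels below can compile in the background; the later
# async_compile.wait(globals()) blocks until all of them are ready.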
2023-01-11T21:38:06.5017266Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5017272Z 2023-01-11T21:38:06.5017380Z import triton 2023-01-11T21:38:06.5017475Z import triton.language as tl 2023-01-11T21:38:06.5017601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5017746Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5017752Z 2023-01-11T21:38:06.5017757Z 2023-01-11T21:38:06.5017913Z triton_fused_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.5017987Z import triton 2023-01-11T21:38:06.5018073Z import triton.language as tl 2023-01-11T21:38:06.5018187Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5018289Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5018429Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5018555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5018561Z 2023-01-11T21:38:06.5018992Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5019071Z @triton.jit 2023-01-11T21:38:06.5019216Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5019285Z xnumel = 100000 2023-01-11T21:38:06.5019382Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5019512Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5019595Z xmask = xindex < xnumel 2023-01-11T21:38:06.5019666Z x0 = xindex 2023-01-11T21:38:06.5019856Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5019958Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5020030Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.5020164Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5020248Z ''') 2023-01-11T21:38:06.5020253Z 2023-01-11T21:38:06.5020258Z 2023-01-11T21:38:06.5020392Z async_compile.wait(globals()) 2023-01-11T21:38:06.5020469Z del async_compile 2023-01-11T21:38:06.5020474Z 2023-01-11T21:38:06.5020547Z def call(args): 2023-01-11T21:38:06.5020637Z primals_1, primals_2 = args 2023-01-11T21:38:06.5020706Z args.clear() 2023-01-11T21:38:06.5020798Z with torch.cuda.device(0): 2023-01-11T21:38:06.5021004Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5021096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5021254Z triton_fused_mul_0.run(primals_1, primals_2, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5021332Z del primals_2 2023-01-11T21:38:06.5021429Z return (buf0, primals_1, ) 2023-01-11T21:38:06.5021435Z 2023-01-11T21:38:06.5021439Z 2023-01-11T21:38:06.5021522Z if __name__ == "__main__": 2023-01-11T21:38:06.5021633Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5021759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5021973Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5022177Z primals_2 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5022310Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5022316Z 2023-01-11T21:38:06.5022320Z 2023-01-11T21:38:06.5022419Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5022494Z import torch 
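# annotation: the module starting here is the compiled BACKWARDS graph for
# the elementwise mul above. Its call() returns (None, buf0), where
# buf0 = tangents_1 * primals_1, i.e. gradient flows only to the second input.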
2023-01-11T21:38:06.5022570Z import random 2023-01-11T21:38:06.5022682Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5022804Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5022838Z 2023-01-11T21:38:06.5022922Z aten = torch.ops.aten 2023-01-11T21:38:06.5023058Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5023154Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5023159Z 2023-01-11T21:38:06.5023232Z import triton 2023-01-11T21:38:06.5023325Z import triton.language as tl 2023-01-11T21:38:06.5023444Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5023583Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5023588Z 2023-01-11T21:38:06.5023593Z 2023-01-11T21:38:06.5023751Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5023826Z import triton 2023-01-11T21:38:06.5023918Z import triton.language as tl 2023-01-11T21:38:06.5024030Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5024134Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5024267Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5024388Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5024401Z 2023-01-11T21:38:06.5024821Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5024894Z @triton.jit 2023-01-11T21:38:06.5025034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5025109Z xnumel = 100000 2023-01-11T21:38:06.5025204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5025331Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5025415Z xmask = xindex < xnumel 2023-01-11T21:38:06.5025479Z x0 = xindex 2023-01-11T21:38:06.5025577Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5025673Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5025753Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.5025885Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5025968Z ''') 2023-01-11T21:38:06.5025973Z 2023-01-11T21:38:06.5025978Z 2023-01-11T21:38:06.5026069Z async_compile.wait(globals()) 2023-01-11T21:38:06.5026146Z del async_compile 2023-01-11T21:38:06.5026151Z 2023-01-11T21:38:06.5026255Z def call(args): 2023-01-11T21:38:06.5026350Z primals_1, tangents_1 = args 2023-01-11T21:38:06.5026424Z args.clear() 2023-01-11T21:38:06.5026514Z with torch.cuda.device(0): 2023-01-11T21:38:06.5026719Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5026810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5026968Z triton_fused_mul_1_0.run(tangents_1, primals_1, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5027040Z del primals_1 2023-01-11T21:38:06.5027116Z del tangents_1 2023-01-11T21:38:06.5027202Z return (None, buf0, ) 2023-01-11T21:38:06.5027210Z 2023-01-11T21:38:06.5027215Z 2023-01-11T21:38:06.5027296Z if __name__ == "__main__": 2023-01-11T21:38:06.5027414Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5027537Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5027747Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', 
dtype=torch.float32) 2023-01-11T21:38:06.5027957Z tangents_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5028083Z print_performance(lambda: call([primals_1, tangents_1])) 2023-01-11T21:38:06.5028089Z 2023-01-11T21:38:06.5028099Z 2023-01-11T21:38:06.5028190Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5028264Z import torch 2023-01-11T21:38:06.5028337Z import random 2023-01-11T21:38:06.5028455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5028578Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5028624Z 2023-01-11T21:38:06.5028705Z aten = torch.ops.aten 2023-01-11T21:38:06.5028840Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5028929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5028934Z 2023-01-11T21:38:06.5029007Z import triton 2023-01-11T21:38:06.5029099Z import triton.language as tl 2023-01-11T21:38:06.5029224Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5029364Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5029528Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5029533Z 2023-01-11T21:38:06.5029538Z 2023-01-11T21:38:06.5029694Z triton_fused_mul_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5029768Z import triton 2023-01-11T21:38:06.5029854Z import triton.language as tl 2023-01-11T21:38:06.5029969Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5030071Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5030204Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5030327Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5030332Z 2023-01-11T21:38:06.5030766Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5030837Z @triton.jit 2023-01-11T21:38:06.5030985Z def triton_(seed0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5031054Z xnumel = 100000 2023-01-11T21:38:06.5031151Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5031279Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5031361Z xmask = xindex < xnumel 2023-01-11T21:38:06.5031430Z x0 = xindex 2023-01-11T21:38:06.5031662Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5031854Z tmp6 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5031945Z tmp7 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5032016Z tmp1 = x0 2023-01-11T21:38:06.5032104Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5032208Z tmp3 = 0.33 2023-01-11T21:38:06.5032287Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5032376Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5032455Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.5032526Z tmp9 = tmp5 * tmp8 2023-01-11T21:38:06.5032604Z tmp10 = 1.492537313432836 2023-01-11T21:38:06.5032684Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.5032823Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5032906Z ''') 2023-01-11T21:38:06.5032911Z 2023-01-11T21:38:06.5032916Z 2023-01-11T21:38:06.5033008Z async_compile.wait(globals()) 
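# annotation: triton_fused_mul_2_0 above recomputes the dropout mask on the
# fly instead of loading a saved one: tl.rand(seed, x0) > 0.33 keeps roughly
# 67% of elements, and tmp10 = 1.492537313432836 is the 1/(1 - 0.33) rescale.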
2023-01-11T21:38:06.5033085Z del async_compile 2023-01-11T21:38:06.5033093Z 2023-01-11T21:38:06.5033160Z def call(args): 2023-01-11T21:38:06.5033253Z primals_1, primals_2 = args 2023-01-11T21:38:06.5033326Z args.clear() 2023-01-11T21:38:06.5033461Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5033550Z with torch.cuda.device(0): 2023-01-11T21:38:06.5033758Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5033848Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5034020Z triton_fused_mul_2_0.run(seed_cuda_0, primals_1, primals_2, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5034091Z del primals_2 2023-01-11T21:38:06.5034207Z return (buf0, primals_1, seed_cuda_0.clone(), ) 2023-01-11T21:38:06.5034212Z 2023-01-11T21:38:06.5034217Z 2023-01-11T21:38:06.5034299Z if __name__ == "__main__": 2023-01-11T21:38:06.5034417Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5034543Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5034766Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5034980Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5035186Z primals_2 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5035322Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5035329Z 2023-01-11T21:38:06.5035634Z [2023-01-11 21:35:13,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 690 2023-01-11T21:38:06.5035640Z 2023-01-11T21:38:06.5035737Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5035812Z import torch 2023-01-11T21:38:06.5035889Z import random 2023-01-11T21:38:06.5036008Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5036131Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5036136Z 2023-01-11T21:38:06.5036220Z aten = torch.ops.aten 2023-01-11T21:38:06.5036349Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5036442Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5036447Z 2023-01-11T21:38:06.5036519Z import triton 2023-01-11T21:38:06.5036608Z import triton.language as tl 2023-01-11T21:38:06.5036737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5036875Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5036880Z 2023-01-11T21:38:06.5036885Z 2023-01-11T21:38:06.5037041Z triton_fused_mul_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.5037115Z import triton 2023-01-11T21:38:06.5037201Z import triton.language as tl 2023-01-11T21:38:06.5037316Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5037416Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5037548Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5037673Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5037681Z 2023-01-11T21:38:06.5038116Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5038216Z @triton.jit 2023-01-11T21:38:06.5038366Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
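    # annotation: this backward kernel regenerates the identical mask from the
    # saved philox seed plus element index, so the 100000-element mask never
    # has to be stored between forward and backward (the "lowmem" in the test).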
2023-01-11T21:38:06.5038435Z xnumel = 100000 2023-01-11T21:38:06.5038532Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5038660Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5038743Z xmask = xindex < xnumel 2023-01-11T21:38:06.5038814Z x0 = xindex 2023-01-11T21:38:06.5038946Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5039043Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5039139Z tmp10 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5039210Z tmp1 = x0 2023-01-11T21:38:06.5039301Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5039373Z tmp3 = 0.33 2023-01-11T21:38:06.5039451Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5039541Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5039618Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5039693Z tmp8 = 1.492537313432836 2023-01-11T21:38:06.5039771Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5039851Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.5039986Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5040071Z ''') 2023-01-11T21:38:06.5040077Z 2023-01-11T21:38:06.5040081Z 2023-01-11T21:38:06.5040173Z async_compile.wait(globals()) 2023-01-11T21:38:06.5040250Z del async_compile 2023-01-11T21:38:06.5040255Z 2023-01-11T21:38:06.5040324Z def call(args): 2023-01-11T21:38:06.5040439Z primals_1, philox_seed_like, tangents_1 = args 2023-01-11T21:38:06.5040512Z args.clear() 2023-01-11T21:38:06.5040632Z with torch.cuda.device(0): 2023-01-11T21:38:06.5040836Z buf0 = empty_strided((100000, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5040928Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5041107Z triton_fused_mul_5_0.run(philox_seed_like, tangents_1, primals_1, buf0, 100000, grid=grid(100000), stream=stream0) 2023-01-11T21:38:06.5041194Z del philox_seed_like 2023-01-11T21:38:06.5041264Z del primals_1 2023-01-11T21:38:06.5041342Z del tangents_1 2023-01-11T21:38:06.5041425Z return (None, buf0, ) 2023-01-11T21:38:06.5041430Z 2023-01-11T21:38:06.5041435Z 2023-01-11T21:38:06.5041513Z if __name__ == "__main__": 2023-01-11T21:38:06.5041631Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5041756Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5041968Z primals_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5042169Z philox_seed_like = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5042380Z tangents_1 = rand_strided((100000, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5042532Z print_performance(lambda: call([primals_1, philox_seed_like, tangents_1])) 2023-01-11T21:38:06.5042538Z 2023-01-11T21:38:06.5042614Z ok (0.565s) 2023-01-11T21:38:06.5042948Z test_lowmem_dropout2_cuda (__main__.CudaTests) ... 
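Annotation: the lowmem-dropout kernels above never materialize the dropout
mask; forward and backward both rederive it from a seed and the element index.
Below is a minimal eager-mode sketch of that idea. All names are illustrative
and CUDA is assumed, as in the log; this is not Inductor's actual codegen.

import torch

def lowmem_dropout_fwd(x, p, seed):
    # the mask is a pure function of (seed, index), so it need not be saved
    g = torch.Generator(device=x.device).manual_seed(seed)
    mask = (torch.rand(x.shape, generator=g, device=x.device) > p).to(x.dtype)
    return x * mask * (1.0 / (1.0 - p))

def lowmem_dropout_bwd(grad, p, seed):
    # regenerate the same mask from the same seed; only the seed was kept
    g = torch.Generator(device=grad.device).manual_seed(seed)
    mask = (torch.rand(grad.shape, generator=g, device=grad.device) > p).to(grad.dtype)
    return grad * mask * (1.0 / (1.0 - p))

x = torch.randn(100000, device="cuda")
print(lowmem_dropout_fwd(x, 0.33, 1234).shape)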
[2023-01-11 21:35:13,808] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 691 2023-01-11T21:38:06.5043203Z [2023-01-11 21:35:13,810] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5043466Z [2023-01-11 21:35:13,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 691 2023-01-11T21:38:06.5043720Z [2023-01-11 21:35:14,130] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling BACKWARDS graph 691 2023-01-11T21:38:06.5043727Z 2023-01-11T21:38:06.5043830Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5043898Z import torch 2023-01-11T21:38:06.5043972Z import random 2023-01-11T21:38:06.5044091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5044215Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5044221Z 2023-01-11T21:38:06.5044331Z aten = torch.ops.aten 2023-01-11T21:38:06.5044469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5044563Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5044568Z 2023-01-11T21:38:06.5044642Z import triton 2023-01-11T21:38:06.5044728Z import triton.language as tl 2023-01-11T21:38:06.5044854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5044991Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5045156Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5045161Z 2023-01-11T21:38:06.5045169Z 2023-01-11T21:38:06.5045328Z triton_fused_mul_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5045402Z import triton 2023-01-11T21:38:06.5045495Z import triton.language as tl 2023-01-11T21:38:06.5045603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5045706Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5045843Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5045970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5045975Z 2023-01-11T21:38:06.5046389Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5046463Z @triton.jit 2023-01-11T21:38:06.5046596Z def triton_(in_out_ptr0, seed0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5046671Z xnumel = 256 2023-01-11T21:38:06.5046842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5046966Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5047050Z xmask = xindex < xnumel 2023-01-11T21:38:06.5047120Z x0 = xindex 2023-01-11T21:38:06.5047350Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5047456Z tmp6 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5047528Z tmp1 = x0 2023-01-11T21:38:06.5047616Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5047683Z tmp3 = 0.5 2023-01-11T21:38:06.5047764Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5047851Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5047929Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5048002Z tmp8 = 2.0 2023-01-11T21:38:06.5048077Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5048211Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5048297Z ''') 2023-01-11T21:38:06.5048302Z 
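# annotation: triton_fused_mul_3_1 below is the same kernel as
# triton_fused_mul_1_0 above except for the philox offset (tmp1 = 256 + x0
# versus tmp1 = x0): the model's two dropout calls consume disjoint
# 256-element slices of the random stream derived from one seed_cuda_0.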
2023-01-11T21:38:06.5048310Z 2023-01-11T21:38:06.5048464Z triton_fused_mul_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.5048539Z import triton 2023-01-11T21:38:06.5048631Z import triton.language as tl 2023-01-11T21:38:06.5048745Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5048845Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5048982Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5049100Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5049106Z 2023-01-11T21:38:06.5049520Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5049595Z @triton.jit 2023-01-11T21:38:06.5049730Z def triton_(in_out_ptr0, seed0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5049805Z xnumel = 256 2023-01-11T21:38:06.5049905Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5050035Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5050119Z xmask = xindex < xnumel 2023-01-11T21:38:06.5050183Z x0 = xindex 2023-01-11T21:38:06.5050440Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5050546Z tmp6 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5050622Z tmp1 = 256 + x0 2023-01-11T21:38:06.5050712Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5050785Z tmp3 = 0.5 2023-01-11T21:38:06.5050863Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5050944Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5051022Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.5051093Z tmp8 = 2.0 2023-01-11T21:38:06.5051172Z tmp9 = tmp7 * tmp8 2023-01-11T21:38:06.5051311Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5051396Z ''') 2023-01-11T21:38:06.5051404Z 2023-01-11T21:38:06.5051409Z 2023-01-11T21:38:06.5051502Z async_compile.wait(globals()) 2023-01-11T21:38:06.5051572Z del async_compile 2023-01-11T21:38:06.5051577Z 2023-01-11T21:38:06.5051650Z def call(args): 2023-01-11T21:38:06.5051756Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.5051834Z args.clear() 2023-01-11T21:38:06.5051975Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5052068Z with torch.cuda.device(0): 2023-01-11T21:38:06.5052272Z buf0 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5052401Z aten.mm.out(primals_3, as_strided(primals_1, (32, 32), (1, 32)), out=buf0) 2023-01-11T21:38:06.5052480Z del primals_1 2023-01-11T21:38:06.5052573Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5052668Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5052811Z triton_fused_mul_1_0.run(buf1, seed_cuda_0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5053045Z buf2 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5053174Z aten.mm.out(buf1, as_strided(primals_2, (32, 32), (1, 32)), out=buf2) 2023-01-11T21:38:06.5053265Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5053400Z triton_fused_mul_3_1.run(buf3, seed_cuda_0, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5053564Z return (buf3, primals_3, seed_cuda_0.clone(), buf1, as_strided(primals_2, (32, 32), (32, 1)), ) 2023-01-11T21:38:06.5053570Z 2023-01-11T21:38:06.5053574Z 2023-01-11T21:38:06.5053655Z if __name__ == "__main__": 
2023-01-11T21:38:06.5053774Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5053901Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5054096Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5054305Z primals_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054630Z primals_2 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054832Z primals_3 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5054977Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.5054986Z 2023-01-11T21:38:06.5055255Z [2023-01-11 21:35:14,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 691 2023-01-11T21:38:06.5055260Z 2023-01-11T21:38:06.5055358Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5055433Z import torch 2023-01-11T21:38:06.5055509Z import random 2023-01-11T21:38:06.5055628Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5055753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5055758Z 2023-01-11T21:38:06.5055841Z aten = torch.ops.aten 2023-01-11T21:38:06.5055970Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5056068Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5056073Z 2023-01-11T21:38:06.5056147Z import triton 2023-01-11T21:38:06.5056239Z import triton.language as tl 2023-01-11T21:38:06.5056364Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5056554Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5056560Z 2023-01-11T21:38:06.5056565Z 2023-01-11T21:38:06.5056810Z triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5056888Z import triton 2023-01-11T21:38:06.5056973Z import triton.language as tl 2023-01-11T21:38:06.5057085Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5057254Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5057433Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5057563Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5057568Z 2023-01-11T21:38:06.5057990Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5058066Z @triton.jit 2023-01-11T21:38:06.5058209Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5058277Z xnumel = 256 2023-01-11T21:38:06.5058374Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5058502Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5058587Z xmask = xindex < xnumel 2023-01-11T21:38:06.5058658Z x0 = xindex 2023-01-11T21:38:06.5058899Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5058997Z tmp6 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5059065Z tmp1 = 256 + x0 2023-01-11T21:38:06.5059156Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5059273Z tmp3 = 0.5 2023-01-11T21:38:06.5059350Z tmp4 = tmp2 > tmp3 2023-01-11T21:38:06.5059441Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5059519Z 
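Note how randomness is threaded through this forward graph: a single int64 seed (seed_cuda_0) is drawn once per call with torch.randint, and each dropout site reads that same seed at a distinct Philox counter offset, tl.rand(seed, x0) for the first site and tl.rand(seed, 256 + x0) for the second, so the two masks are independent yet fully reproducible from one scalar. A sketch of the pattern (hypothetical names; a torch.Generator drawn from twice emulates the two counter ranges):

import torch

def lowmem_dropout2_forward(x, w1, w2, seed, p=0.5):
    # One seed for the whole graph; each dropout site consumes the next
    # slice of the random stream (offsets 0..255, then 256..511 above).
    gen = torch.Generator(device=x.device)
    gen.manual_seed(int(seed))
    h = x @ w1.t()
    h = h * (torch.rand(h.shape, generator=gen, device=x.device) > p) / (1 - p)
    out = h @ w2.t()
    out = out * (torch.rand(out.shape, generator=gen, device=x.device) > p) / (1 - p)
    return out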
[2023-01-11 21:35:14,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling BACKWARDS graph 691

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last')
    tmp6 = tl.load(in_ptr1 + (x0), xmask)
    tmp1 = 256 + x0
    tmp2 = tl.rand(tmp0, tmp1)
    tmp3 = 0.5
    tmp4 = tmp2 > tmp3
    tmp5 = tmp4.to(tl.float32)
    tmp7 = tmp5 * tmp6
    tmp8 = 2.0
    tmp9 = tmp7 * tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


triton_fused_convert_element_type_2_convert_element_type_3_gt_2_gt_3_mm_3_mul_4_mul_5_mul_6_mul_7_philox_rand_like_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp6 = tl.load(in_out_ptr0 + (x0), xmask)
    tmp1 = x0
    tmp2 = tl.rand(tmp0, tmp1)
    tmp3 = 0.5
    tmp4 = tmp2 > tmp3
    tmp5 = tmp4.to(tl.float32)
    tmp7 = tmp5 * tmp6
    tmp8 = 2.0
    tmp9 = tmp7 * tmp8
    tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    primals_3, philox_seed_like, mul_1, permute_4, tangents_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_convert_element_type_2_gt_2_mul_4_mul_5_philox_rand_like_2_0.run(philox_seed_like, tangents_1, buf0, 256, grid=grid(256), stream=stream0)
        del tangents_1
        buf1 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(as_strided(buf0, (32, 8), (1, 32)), mul_1, out=buf1)
        del mul_1
        buf2 = empty_strided((8, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(buf0, permute_4, out=buf2)
        del buf0
        del permute_4
        buf3 = buf2; del buf2  # reuse
        triton_fused_convert_element_type_2_convert_element_type_3_gt_2_gt_3_mm_3_mul_4_mul_5_mul_6_mul_7_philox_rand_like_2_1.run(buf3, philox_seed_like, 256, grid=grid(256), stream=stream0)
        del philox_seed_like
        buf4 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32)
        aten.mm.out(as_strided(buf3, (32, 8), (1, 32)), primals_3, out=buf4)
        del buf3
        del primals_3
        return (as_strided(buf4, (32, 32), (32, 1)), as_strided(buf1, (32, 32), (32, 1)), None, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    primals_3 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    philox_seed_like = rand_strided((), (), device='cuda:0', dtype=torch.int64)
    mul_1 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    permute_4 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    tangents_1 = rand_strided((8, 32), (32, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([primals_3, philox_seed_like, mul_1, permute_4, tangents_1]))

ok (0.851s)
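The matching backward above receives only philox_seed_like (the clone of seed_cuda_0 returned by the forward), the saved activation mul_1, and a transposed weight permute_4, then regenerates both dropout masks in-kernel at counter offsets 256 + x0 and x0; no mask tensor is ever materialized between the passes. A sketch of the replay idea under those assumptions (hypothetical helper; a seeded generator stands in for the Philox counter):

import torch

def replay_dropout_masks(seed, shape1, shape2, p=0.5, device='cuda'):
    # Draw masks in the same order as the forward pass, so the backward
    # pass sees identical keep/drop decisions without storing any mask.
    gen = torch.Generator(device=device)
    gen.manual_seed(int(seed))
    mask1 = torch.rand(shape1, generator=gen, device=device) > p  # counter 0..255
    mask2 = torch.rand(shape2, generator=gen, device=device) > p  # counter 256..511
    return mask1, mask2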
test_masked_fill_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:14,461] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 692
[2023-01-11 21:35:14,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 692
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:14,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 693
[2023-01-11 21:35:14,684] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 693

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_where_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
    tmp6 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x2), xmask)
    tmp1 = -10000.0
    tmp3 = tl.where(tmp0, tmp1, tmp2)
    tmp4 = 2
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 == 0
    tmp8 = 667.0
    tmp10 = 2.0
    tmp11 = tmp9 / tmp10
    tmp12 = tl.where(tmp7, tmp8, tmp11)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_where_1_0.run(arg0_1, arg1_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_where_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp2 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (x0), xmask)
    tmp9 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp1 = -10000.0
    tmp3 = tl.where(tmp0, tmp1, tmp2)
    tmp4 = 2
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 == 0
    tmp8 = 667.0
    tmp10 = 2.0
    tmp11 = tmp9 / tmp10
    tmp12 = tl.where(tmp7, tmp8, tmp11)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_where_1_0.run(arg0_1, arg1_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.262s)
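Both dumps implement the same pair of selects, first in float32 and then in float16 (the fp16 variant upcasts its loads with .to(tl.float32) and stores the results back as fp16). In eager terms the fused kernel computes both outputs in a single launch, roughly (a sketch; the helper name is illustrative):

import torch

def masked_fill_pair(mask, x):
    # Two masked_fill-style selects over x, fused into one pass above:
    out0 = x.masked_fill(mask, -10000.0) + 2       # where(mask, -1e4, x) + 2
    out1 = (x / 2).masked_fill(mask == 0, 667.0)   # where(mask == 0, 667, x/2)
    return out0, out1

mask = torch.zeros(1, 16, dtype=torch.bool, device='cuda')
x = torch.randn(16, 16, device='cuda')
out0, out1 = masked_fill_pair(mask, x)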
test_masked_fill_promotion_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:14,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 694
[2023-01-11 21:35:14,787] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 694
[2023-01-11 21:35:14,941] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 695
[2023-01-11 21:35:14,947] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
[2023-01-11 21:35:15,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 695

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_where_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp1 = 3.5
    tmp2 = tmp1.to(tl.float32)
    tmp4 = tl.where(tmp0, tmp2, tmp3)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_where_0.run(arg0_1, arg1_1, buf0, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_where_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*i1', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 16
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp3 = tl.load(in_ptr1 + (x2), xmask)
    tmp1 = 3.5
    tmp2 = tmp1.to(tl.int64)
    tmp4 = tl.where(tmp0, tmp2, tmp3)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.int64)
        stream0 = get_cuda_stream(0)
        triton_fused_where_0.run(arg0_1, arg1_1, buf0, 256, grid=grid(256), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.bool)
    arg1_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.466s)
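The promotion test compiles the same where(mask, 3.5, x) twice: the fp16 graph keeps the scalar as a float (tmp1.to(tl.float32)), while the int64 graph casts it to integer (tmp1.to(tl.int64), truncating 3.5 to 3), so each output keeps its input's dtype. Roughly equivalent eager code (a sketch; the explicit scalar casts mirror tmp2 in the kernels):

import torch

mask = torch.rand(1, 16, device='cuda') > 0.5
x_half = torch.randn(16, 16, device='cuda', dtype=torch.float16)
x_long = torch.randint(10, (16, 16), device='cuda', dtype=torch.int64)

# The scalar is converted to the tensor's dtype before the select,
# so the int64 graph effectively fills with 3.
out_half = torch.where(mask, torch.tensor(3.5, device='cuda', dtype=torch.float16), x_half)
out_long = torch.where(mask, torch.tensor(3.5, device='cuda').to(torch.int64), x_long)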
test_max_min_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 696
[2023-01-11 21:35:15,256] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 696
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 697
[2023-01-11 21:35:15,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 697
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 698
[2023-01-11 21:35:15,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 698
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,390] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 699

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp4 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp4 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (x0), xmask)
    tmp4 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:15,399] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 699

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_maximum_minimum_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp4 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp5 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp4, tmp3, tmp4))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_maximum_minimum_0.run(arg0_1, arg1_1, buf0, buf1, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.248s)
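Each of the four max/min graphs lowers torch.maximum and torch.minimum with explicit NaN propagation: tl.where(a != a, a, ...) returns a whenever a is NaN, matching the eager contract, and the fp16 variants do the comparison in fp32 before storing fp16. A minimal eager sketch of the function under test (names are illustrative):

import torch

def max_min(a, b):
    # torch.maximum/minimum propagate NaNs, hence the a != a
    # checks in the generated kernels above.
    return torch.maximum(a, b), torch.minimum(a, b)

a = torch.randn(8, device='cuda')
b = torch.randn(8, device='cuda')
out_max, out_min = max_min(a, b)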
test_max_pool2d1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,419] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 700
[2023-01-11 21:35:15,797] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 700
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:15,816] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 701

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_getitem_getitem_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 392
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 7
    x1 = (xindex // 7) % 7
    x2 = (xindex // 49)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last')
    tmp17 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask)
    tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp29 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp34 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp39 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp44 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp49 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp54 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask)
    tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0))
    tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2))
    tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4))
    tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6))
    tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8))
    tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10))
    tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12))
    tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14))
    tmp18 = (2*x0) + (32*x1)
    tmp20 = 1 + (2*x0) + (32*x1)
    tmp21 = tmp19 > tmp17
    tmp22 = tl.where(tmp21, tmp20, tmp18)
    tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17))
    tmp25 = 2 + (2*x0) + (32*x1)
    tmp26 = tmp24 > tmp23
    tmp27 = tl.where(tmp26, tmp25, tmp22)
    tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23))
    tmp30 = 16 + (2*x0) + (32*x1)
    tmp31 = tmp29 > tmp28
    tmp32 = tl.where(tmp31, tmp30, tmp27)
    tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28))
    tmp35 = 17 + (2*x0) + (32*x1)
    tmp36 = tmp34 > tmp33
    tmp37 = tl.where(tmp36, tmp35, tmp32)
    tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33))
    tmp40 = 18 + (2*x0) + (32*x1)
    tmp41 = tmp39 > tmp38
    tmp42 = tl.where(tmp41, tmp40, tmp37)
    tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38))
    tmp45 = 32 + (2*x0) + (32*x1)
    tmp46 = tmp44 > tmp43
    tmp47 = tl.where(tmp46, tmp45, tmp42)
    tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43))
    tmp50 = 33 + (2*x0) + (32*x1)
    tmp51 = tmp49 > tmp48
    tmp52 = tl.where(tmp51, tmp50, tmp47)
    tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48))
    tmp55 = 34 + (2*x0) + (32*x1)
    tmp56 = tmp54 > tmp53
    tmp57 = tl.where(tmp56, tmp55, tmp52)
    tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53))
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.int64)
        stream0 = get_cuda_stream(0)
        triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 392, grid=grid(392), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:16,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 701

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_getitem_getitem_1_0 = async_compile.triton('''
2023-01-11T21:38:06.5147104Z import triton 2023-01-11T21:38:06.5147202Z import triton.language as tl 2023-01-11T21:38:06.5147317Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5147421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5147550Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5147679Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5147684Z 2023-01-11T21:38:06.5148101Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5148176Z @triton.jit 2023-01-11T21:38:06.5148320Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5148427Z xnumel = 392 2023-01-11T21:38:06.5148525Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5148656Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5148735Z xmask = xindex < xnumel 2023-01-11T21:38:06.5148811Z x0 = xindex % 7 2023-01-11T21:38:06.5148894Z x1 = (xindex // 7) % 7 2023-01-11T21:38:06.5148974Z x2 = (xindex // 49) 2023-01-11T21:38:06.5149050Z x3 = xindex 2023-01-11T21:38:06.5149296Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5149537Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5149773Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150008Z tmp5 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150250Z tmp7 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150483Z tmp9 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150728Z tmp11 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5150964Z tmp13 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5151198Z tmp15 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + (256*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5151330Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151465Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151603Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151733Z tmp29 = tl.load(in_ptr0 + (16 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5151870Z tmp34 = tl.load(in_ptr0 + (17 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152034Z tmp39 = tl.load(in_ptr0 + (18 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152165Z tmp44 = tl.load(in_ptr0 + (32 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152293Z tmp49 = tl.load(in_ptr0 + (33 + (2*x0) + (32*x1) + (256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152421Z tmp54 = tl.load(in_ptr0 + (34 + (2*x0) + (32*x1) + 
(256*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5152560Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5152697Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5152824Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5152960Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5153097Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5153243Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5153381Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5153517Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5153604Z tmp18 = (2*x0) + (32*x1) 2023-01-11T21:38:06.5153690Z tmp20 = 1 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5153766Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5153866Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5154007Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5154092Z tmp25 = 2 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5154220Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5154319Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5154456Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5154536Z tmp30 = 16 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5154618Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5154721Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5154862Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5154946Z tmp35 = 17 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155029Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5155129Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5155266Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5155366Z tmp40 = 18 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155457Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5155576Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5155722Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5155807Z tmp45 = 32 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5155889Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5155981Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5156120Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5156203Z tmp50 = 33 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5156286Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5156383Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5156522Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5156604Z tmp55 = 34 + (2*x0) + (32*x1) 2023-01-11T21:38:06.5156685Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5156778Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5156915Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5157052Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5157185Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5157276Z ''') 2023-01-11T21:38:06.5157282Z 2023-01-11T21:38:06.5157287Z 2023-01-11T21:38:06.5157381Z async_compile.wait(globals()) 
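# A minimal sketch (helper name is illustrative, not part of the generated
# output): the pattern tl.where(a != a, a, tl.where(a > b, a, b)) used
# throughout these kernels is a NaN-propagating maximum, since a != a is
# true only when a is NaN.
import torch  # re-imported so the sketch stays self-contained

def _nan_propagating_max(a, b):
    # keep a when a is NaN; otherwise return the larger of a and b
    return torch.where(a != a, a, torch.where(a > b, a, b))

# NaN wins over 2.0, unlike a naive a > b chain that would drop it:
assert torch.isnan(_nan_propagating_max(torch.tensor([float("nan")]),
                                        torch.tensor([2.0]))).all()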
2023-01-11T21:38:06.5157487Z del async_compile 2023-01-11T21:38:06.5157493Z 2023-01-11T21:38:06.5157570Z def call(args): 2023-01-11T21:38:06.5157639Z arg0_1, = args 2023-01-11T21:38:06.5157717Z args.clear() 2023-01-11T21:38:06.5157813Z with torch.cuda.device(0): 2023-01-11T21:38:06.5158033Z buf0 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5158252Z buf1 = empty_strided((2, 4, 7, 7), (196, 49, 7, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5158345Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5158505Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 392, grid=grid(392), stream=stream0) 2023-01-11T21:38:06.5158578Z del arg0_1 2023-01-11T21:38:06.5158661Z return (buf0, buf1, ) 2023-01-11T21:38:06.5158666Z 2023-01-11T21:38:06.5158671Z 2023-01-11T21:38:06.5158753Z if __name__ == "__main__": 2023-01-11T21:38:06.5158875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5159009Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5159235Z arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5159348Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5159353Z 2023-01-11T21:38:06.5159426Z ok (0.675s) 2023-01-11T21:38:06.5159885Z test_max_pool2d2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5160040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5160300Z [2023-01-11 21:35:16,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 702 2023-01-11T21:38:06.5160571Z [2023-01-11 21:35:16,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 702 2023-01-11T21:38:06.5160988Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5161121Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5161379Z [2023-01-11 21:35:16,423] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 703 2023-01-11T21:38:06.5161387Z 2023-01-11T21:38:06.5161491Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5161570Z import torch 2023-01-11T21:38:06.5161645Z import random 2023-01-11T21:38:06.5161763Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5161888Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5161893Z 2023-01-11T21:38:06.5161979Z aten = torch.ops.aten 2023-01-11T21:38:06.5162117Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5162216Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5162221Z 2023-01-11T21:38:06.5162297Z import triton 2023-01-11T21:38:06.5162390Z import triton.language as tl 2023-01-11T21:38:06.5162520Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5162655Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5162660Z 2023-01-11T21:38:06.5162678Z 2023-01-11T21:38:06.5162853Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5162929Z import triton 2023-01-11T21:38:06.5163025Z import triton.language as tl 2023-01-11T21:38:06.5163141Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5163246Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5163409Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5163538Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5163544Z 2023-01-11T21:38:06.5163964Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5164041Z @triton.jit 2023-01-11T21:38:06.5164186Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5164264Z xnumel = 746496 2023-01-11T21:38:06.5164366Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5164498Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5164584Z xmask = xindex < xnumel 2023-01-11T21:38:06.5164663Z x0 = xindex % 27 2023-01-11T21:38:06.5164741Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.5164825Z x2 = (xindex // 729) 2023-01-11T21:38:06.5164897Z x3 = xindex 2023-01-11T21:38:06.5165123Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165352Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165579Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5165805Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166024Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166278Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166507Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5166737Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5166965Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5167088Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167209Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167328Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167448Z tmp29 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167565Z tmp34 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167684Z tmp39 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167806Z tmp44 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5167928Z tmp49 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5168047Z tmp54 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5168184Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5168318Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5168450Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5168577Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5168714Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5168861Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5168999Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5169135Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5169248Z tmp18 = (2*x0) + (110*x1) 2023-01-11T21:38:06.5169337Z tmp20 = 1 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5169418Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5169514Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5169653Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5169739Z tmp25 = 2 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5169822Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5169923Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5170059Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5170148Z tmp30 = 55 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5170224Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5170323Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5170465Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5170551Z tmp35 = 56 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5170637Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5170738Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5170874Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5170952Z tmp40 = 57 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5171035Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5171134Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5171272Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5171359Z tmp45 = 110 + (2*x0) + 
(110*x1) 2023-01-11T21:38:06.5171441Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5171566Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5171694Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5171780Z tmp50 = 111 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5171862Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5171959Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5172099Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5172185Z tmp55 = 112 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5172266Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5172359Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5172493Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5172628Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5172761Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5172848Z ''') 2023-01-11T21:38:06.5172857Z 2023-01-11T21:38:06.5172862Z 2023-01-11T21:38:06.5172959Z async_compile.wait(globals()) 2023-01-11T21:38:06.5173037Z del async_compile 2023-01-11T21:38:06.5173042Z 2023-01-11T21:38:06.5173119Z def call(args): 2023-01-11T21:38:06.5173188Z arg0_1, = args 2023-01-11T21:38:06.5173265Z args.clear() 2023-01-11T21:38:06.5173364Z with torch.cuda.device(0): 2023-01-11T21:38:06.5173597Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5173823Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5173918Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5174079Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.5174152Z del arg0_1 2023-01-11T21:38:06.5174231Z return (buf0, buf1, ) 2023-01-11T21:38:06.5174236Z 2023-01-11T21:38:06.5174243Z 2023-01-11T21:38:06.5174326Z if __name__ == "__main__": 2023-01-11T21:38:06.5174446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5174687Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5174922Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5175079Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5175085Z 2023-01-11T21:38:06.5175351Z [2023-01-11 21:35:16,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 703 2023-01-11T21:38:06.5175357Z 2023-01-11T21:38:06.5175455Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5175523Z import torch 2023-01-11T21:38:06.5175599Z import random 2023-01-11T21:38:06.5175717Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5175841Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5175846Z 2023-01-11T21:38:06.5175927Z aten = torch.ops.aten 2023-01-11T21:38:06.5176066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5176160Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5176165Z 2023-01-11T21:38:06.5176239Z import triton 2023-01-11T21:38:06.5176324Z import triton.language as tl 2023-01-11T21:38:06.5176447Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5176587Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5176593Z 2023-01-11T21:38:06.5176598Z 
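# The unrolled pooling windows below also track the flat index of the running
# maximum: each tap computes cand > running_max, then picks the tap's index
# with tl.where. A plain-Python sketch of that chain (names are illustrative,
# not from the generated code):
def _running_max_with_index(taps):
    # taps: sequence of (value, flat_index) pairs, one per window tap,
    # matching the tmp17/tmp19/... loads and their index expressions
    best_val, best_idx = taps[0]
    for val, idx in taps[1:]:
        if val > best_val:  # mirrors tmpN = cand > running_max
            best_val, best_idx = val, idx
    return best_val, best_idx

assert _running_max_with_index([(0.1, 0), (0.7, 1), (0.3, 2)]) == (0.7, 1)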
2023-01-11T21:38:06.5176778Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5176853Z import triton 2023-01-11T21:38:06.5176940Z import triton.language as tl 2023-01-11T21:38:06.5177051Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5177200Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5177339Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5177470Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5177518Z 2023-01-11T21:38:06.5177946Z @pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5178021Z @triton.jit 2023-01-11T21:38:06.5178168Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5178244Z xnumel = 746496 2023-01-11T21:38:06.5178344Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5178475Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5178554Z xmask = xindex < xnumel 2023-01-11T21:38:06.5178633Z x0 = xindex % 27 2023-01-11T21:38:06.5178718Z x1 = (xindex // 27) % 27 2023-01-11T21:38:06.5178800Z x2 = (xindex // 729) 2023-01-11T21:38:06.5178879Z x3 = xindex 2023-01-11T21:38:06.5179128Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179381Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179614Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5179867Z tmp5 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180116Z tmp7 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180362Z tmp9 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180610Z tmp11 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5180857Z tmp13 = tl.load(in_ptr0 + (111 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5181112Z tmp15 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5181245Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181405Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181536Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181673Z tmp29 = tl.load(in_ptr0 + (55 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181810Z tmp34 = tl.load(in_ptr0 + (56 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5181941Z tmp39 = tl.load(in_ptr0 + (57 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182075Z tmp44 = tl.load(in_ptr0 + (110 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182211Z tmp49 = tl.load(in_ptr0 + 
(111 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182346Z tmp54 = tl.load(in_ptr0 + (112 + (2*x0) + (110*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5182484Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5182621Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5182748Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5182880Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5183019Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5183164Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5183306Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5183468Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5183550Z tmp18 = (2*x0) + (110*x1) 2023-01-11T21:38:06.5183630Z tmp20 = 1 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5183713Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5183817Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5183960Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5184044Z tmp25 = 2 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184127Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5184227Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5184363Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5184442Z tmp30 = 55 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184525Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5184628Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5184769Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5184858Z tmp35 = 56 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5184941Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5185041Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5185170Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5185257Z tmp40 = 57 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5185341Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5185438Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5185577Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5185686Z tmp45 = 110 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5185774Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5185888Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5186023Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5186109Z tmp50 = 111 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5186192Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5186292Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5186427Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5186512Z tmp55 = 112 + (2*x0) + (110*x1) 2023-01-11T21:38:06.5186587Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5186712Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5186849Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5186983Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5187118Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 
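    # (Note on the fp16 variant above: every tl.load ends in .to(tl.float32),
    # so the max/argmax comparisons run in fp32; the value written through
    # out_ptr0, declared '*fp16' in the @pointwise signature, is narrowed back
    # to half precision on store, while the int64 index output is unaffected.)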
2023-01-11T21:38:06.5187209Z ''') 2023-01-11T21:38:06.5187215Z 2023-01-11T21:38:06.5187219Z 2023-01-11T21:38:06.5187315Z async_compile.wait(globals()) 2023-01-11T21:38:06.5187399Z del async_compile 2023-01-11T21:38:06.5187404Z 2023-01-11T21:38:06.5187474Z def call(args): 2023-01-11T21:38:06.5187552Z arg0_1, = args 2023-01-11T21:38:06.5187629Z args.clear() 2023-01-11T21:38:06.5187724Z with torch.cuda.device(0): 2023-01-11T21:38:06.5187954Z buf0 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5188181Z buf1 = empty_strided((16, 64, 27, 27), (46656, 729, 27, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5188276Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5188431Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 746496, grid=grid(746496), stream=stream0) 2023-01-11T21:38:06.5188511Z del arg0_1 2023-01-11T21:38:06.5188597Z return (buf0, buf1, ) 2023-01-11T21:38:06.5188602Z 2023-01-11T21:38:06.5188607Z 2023-01-11T21:38:06.5188687Z if __name__ == "__main__": 2023-01-11T21:38:06.5188809Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5188940Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5189200Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5189318Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5189323Z 2023-01-11T21:38:06.5189398Z ok (0.605s) 2023-01-11T21:38:06.5189855Z test_max_pool2d3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5189988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5190249Z [2023-01-11 21:35:16,701] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 704 2023-01-11T21:38:06.5190254Z 2023-01-11T21:38:06.5190354Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5190433Z import torch 2023-01-11T21:38:06.5190509Z import random 2023-01-11T21:38:06.5190632Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5190755Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5190761Z 2023-01-11T21:38:06.5190846Z aten = torch.ops.aten 2023-01-11T21:38:06.5190982Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5191081Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5191086Z 2023-01-11T21:38:06.5191160Z import triton 2023-01-11T21:38:06.5191252Z import triton.language as tl 2023-01-11T21:38:06.5191378Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5191521Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5191527Z 2023-01-11T21:38:06.5191531Z 2023-01-11T21:38:06.5191712Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5191783Z import triton 2023-01-11T21:38:06.5191875Z import triton.language as tl 2023-01-11T21:38:06.5191994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5192097Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5192236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5192366Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5192371Z 2023-01-11T21:38:06.5192822Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5192900Z @triton.jit 2023-01-11T21:38:06.5193042Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5193112Z xnumel = 16 2023-01-11T21:38:06.5193211Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5193341Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5193429Z xmask = xindex < xnumel 2023-01-11T21:38:06.5193512Z x1 = (xindex // 4) 2023-01-11T21:38:06.5193588Z x0 = xindex % 4 2023-01-11T21:38:06.5193654Z x2 = xindex 2023-01-11T21:38:06.5193763Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.5193836Z tmp1 = 0 2023-01-11T21:38:06.5193916Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.5193988Z tmp3 = 8 2023-01-11T21:38:06.5194071Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.5194149Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.5194254Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.5194335Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.5194415Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.5194494Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.5194574Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.5194869Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5195027Z tmp12 = tl.where(tmp10, tmp11, float("-inf")) 2023-01-11T21:38:06.5195097Z tmp13 = 2*x0 2023-01-11T21:38:06.5195217Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.5195299Z tmp15 = tmp13 < tmp3 2023-01-11T21:38:06.5195383Z tmp16 = tmp14 & 
tmp15 2023-01-11T21:38:06.5195467Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.5195751Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5195906Z tmp19 = tl.where(tmp17, tmp18, float("-inf")) 2023-01-11T21:38:06.5196046Z tmp20 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp12, tmp19, tmp12)) 2023-01-11T21:38:06.5196123Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.5196206Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.5196286Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.5196369Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.5196450Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.5196736Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5196889Z tmp27 = tl.where(tmp25, tmp26, float("-inf")) 2023-01-11T21:38:06.5197028Z tmp28 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 > tmp20, tmp27, tmp20)) 2023-01-11T21:38:06.5197103Z tmp29 = 2*x1 2023-01-11T21:38:06.5197185Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.5197265Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.5197349Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.5197432Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.5197702Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5197856Z tmp35 = tl.where(tmp33, tmp34, float("-inf")) 2023-01-11T21:38:06.5197995Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 > tmp28, tmp35, tmp28)) 2023-01-11T21:38:06.5198082Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.5198358Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5198514Z tmp39 = tl.where(tmp37, tmp38, float("-inf")) 2023-01-11T21:38:06.5198654Z tmp40 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp36, tmp39, tmp36)) 2023-01-11T21:38:06.5198737Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.5199039Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5199202Z tmp43 = tl.where(tmp41, tmp42, float("-inf")) 2023-01-11T21:38:06.5199344Z tmp44 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 > tmp40, tmp43, tmp40)) 2023-01-11T21:38:06.5199423Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.5199507Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.5199589Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.5199672Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.5199757Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.5200031Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5200190Z tmp51 = tl.where(tmp49, tmp50, float("-inf")) 2023-01-11T21:38:06.5200330Z tmp52 = tl.where(tmp51 != tmp51, tmp51, tl.where(tmp51 > tmp44, tmp51, tmp44)) 2023-01-11T21:38:06.5200410Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.5200684Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5200839Z tmp55 = tl.where(tmp53, tmp54, float("-inf")) 2023-01-11T21:38:06.5200978Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp52, tmp55, tmp52)) 2023-01-11T21:38:06.5201080Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.5201380Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 
& xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.5201535Z tmp59 = tl.where(tmp57, tmp58, float("-inf")) 2023-01-11T21:38:06.5201674Z tmp60 = tl.where(tmp59 != tmp59, tmp59, tl.where(tmp59 > tmp56, tmp59, tmp56)) 2023-01-11T21:38:06.5201942Z tmp61 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0) 2023-01-11T21:38:06.5202096Z tmp62 = tl.where(tmp10, tmp61, float("-inf")) 2023-01-11T21:38:06.5202223Z tmp63 = (-9) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5202465Z tmp64 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.5202618Z tmp65 = tl.where(tmp17, tmp64, float("-inf")) 2023-01-11T21:38:06.5202738Z tmp66 = (-8) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5202819Z tmp67 = tmp65 > tmp62 2023-01-11T21:38:06.5202919Z tmp68 = tl.where(tmp67, tmp66, tmp63) 2023-01-11T21:38:06.5203059Z tmp69 = tl.where(tmp65 != tmp65, tmp65, tl.where(tmp65 > tmp62, tmp65, tmp62)) 2023-01-11T21:38:06.5203294Z tmp70 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0) 2023-01-11T21:38:06.5203445Z tmp71 = tl.where(tmp25, tmp70, float("-inf")) 2023-01-11T21:38:06.5203571Z tmp72 = (-7) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5203647Z tmp73 = tmp71 > tmp69 2023-01-11T21:38:06.5203748Z tmp74 = tl.where(tmp73, tmp72, tmp68) 2023-01-11T21:38:06.5203887Z tmp75 = tl.where(tmp71 != tmp71, tmp71, tl.where(tmp71 > tmp69, tmp71, tmp69)) 2023-01-11T21:38:06.5204122Z tmp76 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0) 2023-01-11T21:38:06.5204276Z tmp77 = tl.where(tmp33, tmp76, float("-inf")) 2023-01-11T21:38:06.5204399Z tmp78 = (-1) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5204480Z tmp79 = tmp77 > tmp75 2023-01-11T21:38:06.5204574Z tmp80 = tl.where(tmp79, tmp78, tmp74) 2023-01-11T21:38:06.5204711Z tmp81 = tl.where(tmp77 != tmp77, tmp77, tl.where(tmp77 > tmp75, tmp77, tmp75)) 2023-01-11T21:38:06.5204866Z tmp82 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0) 2023-01-11T21:38:06.5205019Z tmp83 = tl.where(tmp37, tmp82, float("-inf")) 2023-01-11T21:38:06.5205105Z tmp84 = (2*x0) + (16*x1) 2023-01-11T21:38:06.5205187Z tmp85 = tmp83 > tmp81 2023-01-11T21:38:06.5205285Z tmp86 = tl.where(tmp85, tmp84, tmp80) 2023-01-11T21:38:06.5205422Z tmp87 = tl.where(tmp83 != tmp83, tmp83, tl.where(tmp83 > tmp81, tmp83, tmp81)) 2023-01-11T21:38:06.5205601Z tmp88 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0) 2023-01-11T21:38:06.5205757Z tmp89 = tl.where(tmp41, tmp88, float("-inf")) 2023-01-11T21:38:06.5205847Z tmp90 = 1 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5205929Z tmp91 = tmp89 > tmp87 2023-01-11T21:38:06.5206028Z tmp92 = tl.where(tmp91, tmp90, tmp86) 2023-01-11T21:38:06.5206169Z tmp93 = tl.where(tmp89 != tmp89, tmp89, tl.where(tmp89 > tmp87, tmp89, tmp87)) 2023-01-11T21:38:06.5206325Z tmp94 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0) 2023-01-11T21:38:06.5206472Z tmp95 = tl.where(tmp49, tmp94, float("-inf")) 2023-01-11T21:38:06.5206561Z tmp96 = 7 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5206643Z tmp97 = tmp95 > tmp93 2023-01-11T21:38:06.5206741Z tmp98 = tl.where(tmp97, tmp96, tmp92) 2023-01-11T21:38:06.5206879Z tmp99 = tl.where(tmp95 != tmp95, tmp95, tl.where(tmp95 > tmp93, tmp95, tmp93)) 2023-01-11T21:38:06.5207037Z tmp100 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + 
tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0) 2023-01-11T21:38:06.5207197Z tmp101 = tl.where(tmp53, tmp100, float("-inf")) 2023-01-11T21:38:06.5207286Z tmp102 = 8 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5207364Z tmp103 = tmp101 > tmp99 2023-01-11T21:38:06.5207469Z tmp104 = tl.where(tmp103, tmp102, tmp98) 2023-01-11T21:38:06.5207614Z tmp105 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp99, tmp101, tmp99)) 2023-01-11T21:38:06.5207772Z tmp106 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0) 2023-01-11T21:38:06.5207958Z tmp107 = tl.where(tmp57, tmp106, float("-inf")) 2023-01-11T21:38:06.5208044Z tmp108 = 9 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5208129Z tmp109 = tmp107 > tmp105 2023-01-11T21:38:06.5208229Z tmp110 = tl.where(tmp109, tmp108, tmp104) 2023-01-11T21:38:06.5208378Z tmp111 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 > tmp105, tmp107, tmp105)) 2023-01-11T21:38:06.5208512Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5208647Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp110, xmask) 2023-01-11T21:38:06.5208735Z ''') 2023-01-11T21:38:06.5208741Z 2023-01-11T21:38:06.5208745Z 2023-01-11T21:38:06.5208841Z async_compile.wait(globals()) 2023-01-11T21:38:06.5208919Z del async_compile 2023-01-11T21:38:06.5208924Z 2023-01-11T21:38:06.5209002Z def call(args): 2023-01-11T21:38:06.5209071Z arg0_1, = args 2023-01-11T21:38:06.5209147Z args.clear() 2023-01-11T21:38:06.5209240Z with torch.cuda.device(0): 2023-01-11T21:38:06.5209460Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5209675Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5209768Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5209932Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5210010Z del arg0_1 2023-01-11T21:38:06.5210089Z return (buf0, buf1, ) 2023-01-11T21:38:06.5210094Z 2023-01-11T21:38:06.5210099Z 2023-01-11T21:38:06.5210182Z if __name__ == "__main__": 2023-01-11T21:38:06.5210303Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5210432Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5210650Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5210764Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5211035Z [2023-01-11 21:35:16,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 704 2023-01-11T21:38:06.5211487Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5211621Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5211872Z [2023-01-11 21:35:16,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 705 2023-01-11T21:38:06.5211878Z 2023-01-11T21:38:06.5211889Z 2023-01-11T21:38:06.5211982Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5212059Z import torch 2023-01-11T21:38:06.5212136Z import random 2023-01-11T21:38:06.5212255Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5212385Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5212391Z 2023-01-11T21:38:06.5212473Z aten = torch.ops.aten 2023-01-11T21:38:06.5212611Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5212703Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5212711Z 2023-01-11T21:38:06.5212786Z import triton 2023-01-11T21:38:06.5212879Z import triton.language as tl 2023-01-11T21:38:06.5213005Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5213148Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5213154Z 2023-01-11T21:38:06.5213158Z 2023-01-11T21:38:06.5213340Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5213418Z import triton 2023-01-11T21:38:06.5213511Z import triton.language as tl 2023-01-11T21:38:06.5213620Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5213723Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5213885Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5214012Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5214017Z 2023-01-11T21:38:06.5214438Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5214624Z @triton.jit 2023-01-11T21:38:06.5214770Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5214847Z xnumel = 16 2023-01-11T21:38:06.5214938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5215066Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5215150Z xmask = xindex < xnumel 2023-01-11T21:38:06.5215229Z x1 = (xindex // 4) 2023-01-11T21:38:06.5215303Z x0 = xindex % 4 2023-01-11T21:38:06.5215378Z x2 = xindex 2023-01-11T21:38:06.5215490Z tmp0 = (-1) + (2*x1) 2023-01-11T21:38:06.5215555Z tmp1 = 0 2023-01-11T21:38:06.5215633Z tmp2 = tmp0 >= tmp1 2023-01-11T21:38:06.5215703Z tmp3 = 8 2023-01-11T21:38:06.5215779Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.5215855Z tmp5 = tmp2 & tmp4 2023-01-11T21:38:06.5215966Z tmp6 = (-1) + (2*x0) 2023-01-11T21:38:06.5216038Z tmp7 = tmp6 >= tmp1 2023-01-11T21:38:06.5216117Z tmp8 = tmp6 < tmp3 2023-01-11T21:38:06.5216192Z tmp9 = tmp7 & tmp8 2023-01-11T21:38:06.5216269Z tmp10 = tmp5 & tmp9 2023-01-11T21:38:06.5216584Z tmp11 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5216738Z tmp12 = tl.where(tmp10, tmp11, float("-inf")) 2023-01-11T21:38:06.5216811Z tmp13 = 2*x0 2023-01-11T21:38:06.5216885Z tmp14 = tmp13 >= tmp1 2023-01-11T21:38:06.5216965Z tmp15 = tmp13 < tmp3 
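    # (Note: tmp0..tmp10 build an in-bounds mask for the padded window; row
    # and column offsets such as (-1) + 2*x1 are checked against the [0, 8)
    # range, and each tl.load combines that mask with xmask, reading
    # masked-off taps as other=0. The following
    # tl.where(mask, val, float("-inf")) then replaces those taps with -inf
    # so out-of-bounds positions can never win the running maximum.)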
2023-01-11T21:38:06.5217046Z tmp16 = tmp14 & tmp15 2023-01-11T21:38:06.5217171Z tmp17 = tmp5 & tmp16 2023-01-11T21:38:06.5217520Z tmp18 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5217722Z tmp19 = tl.where(tmp17, tmp18, float("-inf")) 2023-01-11T21:38:06.5217866Z tmp20 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp12, tmp19, tmp12)) 2023-01-11T21:38:06.5217941Z tmp21 = 1 + (2*x0) 2023-01-11T21:38:06.5218014Z tmp22 = tmp21 >= tmp1 2023-01-11T21:38:06.5218093Z tmp23 = tmp21 < tmp3 2023-01-11T21:38:06.5218169Z tmp24 = tmp22 & tmp23 2023-01-11T21:38:06.5218248Z tmp25 = tmp5 & tmp24 2023-01-11T21:38:06.5218557Z tmp26 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5218712Z tmp27 = tl.where(tmp25, tmp26, float("-inf")) 2023-01-11T21:38:06.5218857Z tmp28 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 > tmp20, tmp27, tmp20)) 2023-01-11T21:38:06.5218924Z tmp29 = 2*x1 2023-01-11T21:38:06.5219002Z tmp30 = tmp29 >= tmp1 2023-01-11T21:38:06.5219080Z tmp31 = tmp29 < tmp3 2023-01-11T21:38:06.5219155Z tmp32 = tmp30 & tmp31 2023-01-11T21:38:06.5219236Z tmp33 = tmp32 & tmp9 2023-01-11T21:38:06.5219531Z tmp34 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5219683Z tmp35 = tl.where(tmp33, tmp34, float("-inf")) 2023-01-11T21:38:06.5219812Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 > tmp28, tmp35, tmp28)) 2023-01-11T21:38:06.5219891Z tmp37 = tmp32 & tmp16 2023-01-11T21:38:06.5220185Z tmp38 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5220336Z tmp39 = tl.where(tmp37, tmp38, float("-inf")) 2023-01-11T21:38:06.5220509Z tmp40 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp36, tmp39, tmp36)) 2023-01-11T21:38:06.5220590Z tmp41 = tmp32 & tmp24 2023-01-11T21:38:06.5220893Z tmp42 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5221046Z tmp43 = tl.where(tmp41, tmp42, float("-inf")) 2023-01-11T21:38:06.5221180Z tmp44 = tl.where(tmp43 != tmp43, tmp43, tl.where(tmp43 > tmp40, tmp43, tmp40)) 2023-01-11T21:38:06.5221259Z tmp45 = 1 + (2*x1) 2023-01-11T21:38:06.5221339Z tmp46 = tmp45 >= tmp1 2023-01-11T21:38:06.5221419Z tmp47 = tmp45 < tmp3 2023-01-11T21:38:06.5221501Z tmp48 = tmp46 & tmp47 2023-01-11T21:38:06.5221581Z tmp49 = tmp48 & tmp9 2023-01-11T21:38:06.5221879Z tmp50 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5222038Z tmp51 = tl.where(tmp49, tmp50, float("-inf")) 2023-01-11T21:38:06.5222172Z tmp52 = tl.where(tmp51 != tmp51, tmp51, tl.where(tmp51 > tmp44, tmp51, tmp44)) 2023-01-11T21:38:06.5222254Z tmp53 = tmp48 & tmp16 2023-01-11T21:38:06.5222556Z tmp54 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5222711Z tmp55 = tl.where(tmp53, tmp54, float("-inf")) 2023-01-11T21:38:06.5222852Z tmp56 = tl.where(tmp55 != tmp55, tmp55, tl.where(tmp55 > tmp52, tmp55, tmp52)) 
2023-01-11T21:38:06.5222936Z tmp57 = tmp48 & tmp24 2023-01-11T21:38:06.5223237Z tmp58 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.5223391Z tmp59 = tl.where(tmp57, tmp58, float("-inf")) 2023-01-11T21:38:06.5223522Z tmp60 = tl.where(tmp59 != tmp59, tmp59, tl.where(tmp59 > tmp56, tmp59, tmp56)) 2023-01-11T21:38:06.5223783Z tmp61 = tl.load(in_ptr0 + ((-9) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp10 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5223937Z tmp62 = tl.where(tmp10, tmp61, float("-inf")) 2023-01-11T21:38:06.5224063Z tmp63 = (-9) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5224351Z tmp64 = tl.load(in_ptr0 + ((-8) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5224508Z tmp65 = tl.where(tmp17, tmp64, float("-inf")) 2023-01-11T21:38:06.5224634Z tmp66 = (-8) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5224710Z tmp67 = tmp65 > tmp62 2023-01-11T21:38:06.5224811Z tmp68 = tl.where(tmp67, tmp66, tmp63) 2023-01-11T21:38:06.5224951Z tmp69 = tl.where(tmp65 != tmp65, tmp65, tl.where(tmp65 > tmp62, tmp65, tmp62)) 2023-01-11T21:38:06.5225202Z tmp70 = tl.load(in_ptr0 + ((-7) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp25 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5225357Z tmp71 = tl.where(tmp25, tmp70, float("-inf")) 2023-01-11T21:38:06.5225480Z tmp72 = (-7) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5225561Z tmp73 = tmp71 > tmp69 2023-01-11T21:38:06.5225662Z tmp74 = tl.where(tmp73, tmp72, tmp68) 2023-01-11T21:38:06.5225809Z tmp75 = tl.where(tmp71 != tmp71, tmp71, tl.where(tmp71 > tmp69, tmp71, tmp69)) 2023-01-11T21:38:06.5226097Z tmp76 = tl.load(in_ptr0 + ((-1) + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp33 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5226252Z tmp77 = tl.where(tmp33, tmp76, float("-inf")) 2023-01-11T21:38:06.5226375Z tmp78 = (-1) + (2*x0) + (16*x1) 2023-01-11T21:38:06.5226458Z tmp79 = tmp77 > tmp75 2023-01-11T21:38:06.5226559Z tmp80 = tl.where(tmp79, tmp78, tmp74) 2023-01-11T21:38:06.5226697Z tmp81 = tl.where(tmp77 != tmp77, tmp77, tl.where(tmp77 > tmp75, tmp77, tmp75)) 2023-01-11T21:38:06.5226865Z tmp82 = tl.load(in_ptr0 + ((2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp37 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5227042Z tmp83 = tl.where(tmp37, tmp82, float("-inf")) 2023-01-11T21:38:06.5227127Z tmp84 = (2*x0) + (16*x1) 2023-01-11T21:38:06.5227210Z tmp85 = tmp83 > tmp81 2023-01-11T21:38:06.5227310Z tmp86 = tl.where(tmp85, tmp84, tmp80) 2023-01-11T21:38:06.5227452Z tmp87 = tl.where(tmp83 != tmp83, tmp83, tl.where(tmp83 > tmp81, tmp83, tmp81)) 2023-01-11T21:38:06.5227620Z tmp88 = tl.load(in_ptr0 + (1 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp41 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5227775Z tmp89 = tl.where(tmp41, tmp88, float("-inf")) 2023-01-11T21:38:06.5227854Z tmp90 = 1 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5227940Z tmp91 = tmp89 > tmp87 2023-01-11T21:38:06.5228043Z tmp92 = tl.where(tmp91, tmp90, tmp86) 2023-01-11T21:38:06.5228181Z tmp93 = tl.where(tmp89 != tmp89, tmp89, tl.where(tmp89 > tmp87, tmp89, tmp87)) 2023-01-11T21:38:06.5228350Z tmp94 = tl.load(in_ptr0 + (7 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp49 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5228505Z tmp95 = tl.where(tmp49, tmp94, float("-inf")) 2023-01-11T21:38:06.5228591Z tmp96 = 7 + (2*x0) + (16*x1) 
2023-01-11T21:38:06.5228672Z tmp97 = tmp95 > tmp93 2023-01-11T21:38:06.5228765Z tmp98 = tl.where(tmp97, tmp96, tmp92) 2023-01-11T21:38:06.5228906Z tmp99 = tl.where(tmp95 != tmp95, tmp95, tl.where(tmp95 > tmp93, tmp95, tmp93)) 2023-01-11T21:38:06.5229077Z tmp100 = tl.load(in_ptr0 + (8 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp53 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5229231Z tmp101 = tl.where(tmp53, tmp100, float("-inf")) 2023-01-11T21:38:06.5229318Z tmp102 = 8 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5229401Z tmp103 = tmp101 > tmp99 2023-01-11T21:38:06.5229508Z tmp104 = tl.where(tmp103, tmp102, tmp98) 2023-01-11T21:38:06.5229647Z tmp105 = tl.where(tmp101 != tmp101, tmp101, tl.where(tmp101 > tmp99, tmp101, tmp99)) 2023-01-11T21:38:06.5229819Z tmp106 = tl.load(in_ptr0 + (9 + (2*x0) + (16*x1) + tl.zeros([XBLOCK], tl.int32)), tmp57 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5229977Z tmp107 = tl.where(tmp57, tmp106, float("-inf")) 2023-01-11T21:38:06.5230066Z tmp108 = 9 + (2*x0) + (16*x1) 2023-01-11T21:38:06.5230150Z tmp109 = tmp107 > tmp105 2023-01-11T21:38:06.5230284Z tmp110 = tl.where(tmp109, tmp108, tmp104) 2023-01-11T21:38:06.5230432Z tmp111 = tl.where(tmp107 != tmp107, tmp107, tl.where(tmp107 > tmp105, tmp107, tmp105)) 2023-01-11T21:38:06.5230569Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5230699Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp110, xmask) 2023-01-11T21:38:06.5230788Z ''') 2023-01-11T21:38:06.5230793Z 2023-01-11T21:38:06.5230798Z 2023-01-11T21:38:06.5230895Z async_compile.wait(globals()) 2023-01-11T21:38:06.5230974Z del async_compile 2023-01-11T21:38:06.5230979Z 2023-01-11T21:38:06.5231058Z def call(args): 2023-01-11T21:38:06.5231137Z arg0_1, = args 2023-01-11T21:38:06.5231215Z args.clear() 2023-01-11T21:38:06.5231304Z with torch.cuda.device(0): 2023-01-11T21:38:06.5231523Z buf0 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5231737Z buf1 = empty_strided((1, 1, 4, 4), (16, 16, 4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5231834Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5231992Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5232068Z del arg0_1 2023-01-11T21:38:06.5232152Z return (buf0, buf1, ) 2023-01-11T21:38:06.5232157Z 2023-01-11T21:38:06.5232162Z 2023-01-11T21:38:06.5232247Z if __name__ == "__main__": 2023-01-11T21:38:06.5232360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5232489Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5232706Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5232895Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5233165Z [2023-01-11 21:35:17,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 705 2023-01-11T21:38:06.5233171Z 2023-01-11T21:38:06.5233246Z ok (0.583s) 2023-01-11T21:38:06.5233708Z test_max_pool2d4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5233842Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5234101Z [2023-01-11 21:35:17,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 706 2023-01-11T21:38:06.5234369Z [2023-01-11 21:35:17,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 706 2023-01-11T21:38:06.5234779Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5234913Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5235170Z [2023-01-11 21:35:17,593] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 707 2023-01-11T21:38:06.5235176Z 2023-01-11T21:38:06.5235276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5235365Z import torch 2023-01-11T21:38:06.5235452Z import random 2023-01-11T21:38:06.5235595Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5235726Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5235732Z 2023-01-11T21:38:06.5235816Z aten = torch.ops.aten 2023-01-11T21:38:06.5235951Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5236052Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5236082Z 2023-01-11T21:38:06.5236161Z import triton 2023-01-11T21:38:06.5236255Z import triton.language as tl 2023-01-11T21:38:06.5236382Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5236525Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5236530Z 2023-01-11T21:38:06.5236535Z 2023-01-11T21:38:06.5236714Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5236790Z import triton 2023-01-11T21:38:06.5236878Z import triton.language as tl 2023-01-11T21:38:06.5236994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5237098Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5237236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5237366Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5237371Z 2023-01-11T21:38:06.5237793Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5237869Z @triton.jit 2023-01-11T21:38:06.5238014Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5238085Z xnumel = 48400 2023-01-11T21:38:06.5238186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5238317Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5238403Z xmask = xindex < xnumel 2023-01-11T21:38:06.5238481Z x0 = xindex % 55 2023-01-11T21:38:06.5238565Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.5238684Z x2 = (xindex // 3025) 2023-01-11T21:38:06.5238750Z x3 = xindex 2023-01-11T21:38:06.5238980Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5239206Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239434Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239662Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5239890Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240115Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240345Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240571Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240798Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5240922Z tmp17 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241044Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241165Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241288Z tmp29 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241409Z tmp34 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241528Z tmp39 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241639Z tmp44 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241757Z tmp49 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5241873Z tmp54 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask) 2023-01-11T21:38:06.5242010Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5242176Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5242314Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5242447Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5242587Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5242725Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5242864Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5242998Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5243086Z tmp18 = (2*x0) + (222*x1) 2023-01-11T21:38:06.5243171Z tmp20 = 1 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5243254Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5243356Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5243499Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5243578Z tmp25 = 2 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5243661Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5243762Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5243899Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5243984Z tmp30 = 111 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244067Z tmp31 = tmp29 > tmp28 
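# [annotation -- not part of the generated kernel] Unlike the padded kernel
# earlier, this one carries no per-tap bounds masks: with a 111x111 input,
# window 3 and stride 2, all 55x55 output windows are fully in-bounds
# ((111 - 3) // 2 + 1 = 55), so no float("-inf") guards are needed. Each tap
# is loaded twice -- once with eviction_policy='evict_last' for the value
# chain (tmp0..tmp15) and once plain for the argmax chain (tmp17..tmp54).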
2023-01-11T21:38:06.5244168Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5244301Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5244417Z tmp35 = 112 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244501Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5244601Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5244736Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5244822Z tmp40 = 113 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5244908Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5245001Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5245139Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5245223Z tmp45 = 222 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5245306Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5245404Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5245538Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5245624Z tmp50 = 223 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5245700Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5245801Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5245936Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5246021Z tmp55 = 224 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5246101Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5246201Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5246339Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5246473Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5246601Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5246691Z ''') 2023-01-11T21:38:06.5246696Z 2023-01-11T21:38:06.5246701Z 2023-01-11T21:38:06.5246800Z async_compile.wait(globals()) 2023-01-11T21:38:06.5246878Z del async_compile 2023-01-11T21:38:06.5246883Z 2023-01-11T21:38:06.5246960Z def call(args): 2023-01-11T21:38:06.5247037Z arg0_1, = args 2023-01-11T21:38:06.5247117Z args.clear() 2023-01-11T21:38:06.5247207Z with torch.cuda.device(0): 2023-01-11T21:38:06.5247436Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5247662Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5247787Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5247951Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.5248028Z del arg0_1 2023-01-11T21:38:06.5248112Z return (buf0, buf1, ) 2023-01-11T21:38:06.5248117Z 2023-01-11T21:38:06.5248122Z 2023-01-11T21:38:06.5248203Z if __name__ == "__main__": 2023-01-11T21:38:06.5248317Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5248447Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5248682Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5248801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5248806Z 2023-01-11T21:38:06.5249073Z [2023-01-11 21:35:17,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 707 2023-01-11T21:38:06.5249079Z 2023-01-11T21:38:06.5249181Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5249258Z import torch 
2023-01-11T21:38:06.5249335Z import random 2023-01-11T21:38:06.5249449Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5249575Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5249580Z 2023-01-11T21:38:06.5249665Z aten = torch.ops.aten 2023-01-11T21:38:06.5249804Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5249900Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5249905Z 2023-01-11T21:38:06.5249982Z import triton 2023-01-11T21:38:06.5250077Z import triton.language as tl 2023-01-11T21:38:06.5250239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5250374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5250379Z 2023-01-11T21:38:06.5250390Z 2023-01-11T21:38:06.5250565Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5250642Z import triton 2023-01-11T21:38:06.5250739Z import triton.language as tl 2023-01-11T21:38:06.5250855Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5250960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5251095Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5251222Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5251227Z 2023-01-11T21:38:06.5251643Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5251721Z @triton.jit 2023-01-11T21:38:06.5251865Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5251943Z xnumel = 48400 2023-01-11T21:38:06.5252044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5252175Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5252263Z xmask = xindex < xnumel 2023-01-11T21:38:06.5252343Z x0 = xindex % 55 2023-01-11T21:38:06.5252421Z x1 = (xindex // 55) % 55 2023-01-11T21:38:06.5252503Z x2 = (xindex // 3025) 2023-01-11T21:38:06.5252575Z x3 = xindex 2023-01-11T21:38:06.5252829Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253083Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253333Z tmp3 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253585Z tmp5 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5253833Z tmp7 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254104Z tmp9 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254357Z tmp11 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254717Z tmp13 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5254969Z tmp15 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5255103Z tmp17 = 
tl.load(in_ptr0 + ((2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255241Z tmp19 = tl.load(in_ptr0 + (1 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255378Z tmp24 = tl.load(in_ptr0 + (2 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255514Z tmp29 = tl.load(in_ptr0 + (111 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255648Z tmp34 = tl.load(in_ptr0 + (112 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255772Z tmp39 = tl.load(in_ptr0 + (113 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5255900Z tmp44 = tl.load(in_ptr0 + (222 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256027Z tmp49 = tl.load(in_ptr0 + (223 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256156Z tmp54 = tl.load(in_ptr0 + (224 + (2*x0) + (222*x1) + (12321*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5256342Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5256479Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5256614Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5256750Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5256882Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5257025Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5257211Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5257365Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5257446Z tmp18 = (2*x0) + (222*x1) 2023-01-11T21:38:06.5257532Z tmp20 = 1 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5257616Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5257713Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5257855Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5257937Z tmp25 = 2 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258018Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5258120Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5258259Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5258344Z tmp30 = 111 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258426Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5258520Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5258662Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5258748Z tmp35 = 112 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5258830Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5258930Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5259065Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5259152Z tmp40 = 113 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5259228Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5259326Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5259464Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5259596Z tmp45 = 222 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5259681Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5259779Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5259915Z tmp48 = tl.where(tmp44 != 
tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5259992Z tmp50 = 223 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5260076Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5260175Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5260312Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5260396Z tmp55 = 224 + (2*x0) + (222*x1) 2023-01-11T21:38:06.5260484Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5260583Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5260711Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5260844Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5260978Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5261067Z ''') 2023-01-11T21:38:06.5261073Z 2023-01-11T21:38:06.5261078Z 2023-01-11T21:38:06.5261174Z async_compile.wait(globals()) 2023-01-11T21:38:06.5261251Z del async_compile 2023-01-11T21:38:06.5261256Z 2023-01-11T21:38:06.5261332Z def call(args): 2023-01-11T21:38:06.5261408Z arg0_1, = args 2023-01-11T21:38:06.5261479Z args.clear() 2023-01-11T21:38:06.5261574Z with torch.cuda.device(0): 2023-01-11T21:38:06.5261805Z buf0 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5262058Z buf1 = empty_strided((2, 8, 55, 55), (24200, 3025, 55, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5262151Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5262308Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 48400, grid=grid(48400), stream=stream0) 2023-01-11T21:38:06.5262382Z del arg0_1 2023-01-11T21:38:06.5262467Z return (buf0, buf1, ) 2023-01-11T21:38:06.5262473Z 2023-01-11T21:38:06.5262477Z 2023-01-11T21:38:06.5262550Z if __name__ == "__main__": 2023-01-11T21:38:06.5262667Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5262796Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5263027Z arg0_1 = rand_strided((2, 8, 111, 111), (98568, 12321, 111, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5263139Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5263144Z 2023-01-11T21:38:06.5263211Z ok (0.592s) 2023-01-11T21:38:06.5263672Z test_max_pool2d5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5263807Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5264067Z [2023-01-11 21:35:17,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 708 2023-01-11T21:38:06.5264322Z [2023-01-11 21:35:18,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 708 2023-01-11T21:38:06.5264738Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5264870Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5265148Z [2023-01-11 21:35:18,201] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 709 2023-01-11T21:38:06.5265154Z 2023-01-11T21:38:06.5265253Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5265327Z import torch 2023-01-11T21:38:06.5265401Z import random 2023-01-11T21:38:06.5265520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5265659Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5265666Z 2023-01-11T21:38:06.5265747Z aten = torch.ops.aten 2023-01-11T21:38:06.5265909Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5266003Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5266008Z 2023-01-11T21:38:06.5266081Z import triton 2023-01-11T21:38:06.5266175Z import triton.language as tl 2023-01-11T21:38:06.5266300Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5266440Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5266446Z 2023-01-11T21:38:06.5266450Z 2023-01-11T21:38:06.5266631Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5266699Z import triton 2023-01-11T21:38:06.5266793Z import triton.language as tl 2023-01-11T21:38:06.5266907Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5267009Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5267141Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5267264Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5267269Z 2023-01-11T21:38:06.5267689Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5267791Z @triton.jit 2023-01-11T21:38:06.5267927Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5268000Z xnumel = 331776 2023-01-11T21:38:06.5268098Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5268229Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5268313Z xmask = xindex < xnumel 2023-01-11T21:38:06.5268387Z x0 = xindex % 18 2023-01-11T21:38:06.5268467Z x1 = (xindex // 18) % 18 2023-01-11T21:38:06.5268539Z x2 = (xindex // 324) 2023-01-11T21:38:06.5268611Z x3 = xindex 2023-01-11T21:38:06.5268838Z tmp0 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269063Z tmp1 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269286Z tmp3 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269512Z tmp5 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269735Z tmp7 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5269958Z tmp9 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270186Z tmp11 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask, 
eviction_policy='evict_last') 2023-01-11T21:38:06.5270404Z tmp13 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270629Z tmp15 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5270748Z tmp17 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5270864Z tmp19 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5270984Z tmp24 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271100Z tmp29 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271217Z tmp34 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271358Z tmp39 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271470Z tmp44 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271588Z tmp49 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271701Z tmp54 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask) 2023-01-11T21:38:06.5271834Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5271967Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5272095Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5272228Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5272359Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5272495Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5272630Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5272760Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5272841Z tmp18 = (3*x0) + (165*x1) 2023-01-11T21:38:06.5272921Z tmp20 = 1 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273000Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5273097Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5273228Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5273311Z tmp25 = 2 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273418Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5273520Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5273656Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5273739Z tmp30 = 55 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5273822Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5273914Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5274052Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5274133Z tmp35 = 56 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5274212Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5274311Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5274441Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5274523Z tmp40 = 57 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5274602Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5274693Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5274832Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5274916Z tmp45 = 110 + (3*x0) + 
(165*x1) 2023-01-11T21:38:06.5275002Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5275101Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5275237Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5275339Z tmp50 = 111 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5275419Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5275534Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5275672Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5275755Z tmp55 = 112 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5275834Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5275931Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5276062Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5276188Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5276317Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 2023-01-11T21:38:06.5276403Z ''') 2023-01-11T21:38:06.5276409Z 2023-01-11T21:38:06.5276413Z 2023-01-11T21:38:06.5276534Z async_compile.wait(globals()) 2023-01-11T21:38:06.5276612Z del async_compile 2023-01-11T21:38:06.5276618Z 2023-01-11T21:38:06.5276694Z def call(args): 2023-01-11T21:38:06.5276768Z arg0_1, = args 2023-01-11T21:38:06.5276844Z args.clear() 2023-01-11T21:38:06.5276929Z with torch.cuda.device(0): 2023-01-11T21:38:06.5277160Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5277384Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5277474Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5277635Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 331776, grid=grid(331776), stream=stream0) 2023-01-11T21:38:06.5277711Z del arg0_1 2023-01-11T21:38:06.5277795Z return (buf0, buf1, ) 2023-01-11T21:38:06.5277800Z 2023-01-11T21:38:06.5277805Z 2023-01-11T21:38:06.5277884Z if __name__ == "__main__": 2023-01-11T21:38:06.5277998Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5278123Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5278355Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5278469Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5278474Z 2023-01-11T21:38:06.5278739Z [2023-01-11 21:35:18,459] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 709 2023-01-11T21:38:06.5278745Z 2023-01-11T21:38:06.5278841Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5278914Z import torch 2023-01-11T21:38:06.5279035Z import random 2023-01-11T21:38:06.5279148Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5279271Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5279276Z 2023-01-11T21:38:06.5279359Z aten = torch.ops.aten 2023-01-11T21:38:06.5279495Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5279595Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5279600Z 2023-01-11T21:38:06.5279675Z import triton 2023-01-11T21:38:06.5279767Z import triton.language as tl 2023-01-11T21:38:06.5279885Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5280024Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5280030Z 2023-01-11T21:38:06.5280034Z 
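# [annotation -- not part of the generated kernel] A second dump follows for
# the half-precision run of the same test; it differs from the fp32 module
# above only in the '*fp16' pointer types and in upcasting every tl.load with
# .to(tl.float32) so the max/argmax chains run in fp32. For reference, the
# NaN-propagating max used throughout these kernels, as a minimal plain-Python
# sketch (hypothetical helper, not emitted by Inductor):
def _nanmax_ref(a, b):
    # mirrors tl.where(a != a, a, tl.where(a > b, a, b)):
    # a NaN in `a` wins outright; otherwise the larger value wins
    return a if (a != a or a > b) else b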
2023-01-11T21:38:06.5280213Z triton_fused_getitem_getitem_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5280288Z import triton 2023-01-11T21:38:06.5280379Z import triton.language as tl 2023-01-11T21:38:06.5280492Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5280595Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5280729Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5280848Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5280860Z 2023-01-11T21:38:06.5281278Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5281352Z @triton.jit 2023-01-11T21:38:06.5281495Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5281571Z xnumel = 331776 2023-01-11T21:38:06.5281668Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5281799Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5281886Z xmask = xindex < xnumel 2023-01-11T21:38:06.5281957Z x0 = xindex % 18 2023-01-11T21:38:06.5282039Z x1 = (xindex // 18) % 18 2023-01-11T21:38:06.5282118Z x2 = (xindex // 324) 2023-01-11T21:38:06.5282188Z x3 = xindex 2023-01-11T21:38:06.5282434Z tmp0 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5282708Z tmp1 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5282947Z tmp3 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283192Z tmp5 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283427Z tmp7 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283668Z tmp9 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5283922Z tmp11 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284168Z tmp13 = tl.load(in_ptr0 + (111 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284415Z tmp15 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5284547Z tmp17 = tl.load(in_ptr0 + ((3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284680Z tmp19 = tl.load(in_ptr0 + (1 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284810Z tmp24 = tl.load(in_ptr0 + (2 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5284942Z tmp29 = tl.load(in_ptr0 + (55 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285065Z tmp34 = tl.load(in_ptr0 + (56 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285228Z tmp39 = tl.load(in_ptr0 + (57 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285358Z tmp44 = tl.load(in_ptr0 + (110 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285494Z tmp49 = tl.load(in_ptr0 + 
(111 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285624Z tmp54 = tl.load(in_ptr0 + (112 + (3*x0) + (165*x1) + (3025*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5285780Z tmp2 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp0, tmp1, tmp0)) 2023-01-11T21:38:06.5285932Z tmp4 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 > tmp2, tmp3, tmp2)) 2023-01-11T21:38:06.5286071Z tmp6 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp4, tmp5, tmp4)) 2023-01-11T21:38:06.5286195Z tmp8 = tl.where(tmp7 != tmp7, tmp7, tl.where(tmp7 > tmp6, tmp7, tmp6)) 2023-01-11T21:38:06.5286327Z tmp10 = tl.where(tmp9 != tmp9, tmp9, tl.where(tmp9 > tmp8, tmp9, tmp8)) 2023-01-11T21:38:06.5286470Z tmp12 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 > tmp10, tmp11, tmp10)) 2023-01-11T21:38:06.5286602Z tmp14 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 > tmp12, tmp13, tmp12)) 2023-01-11T21:38:06.5286735Z tmp16 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 > tmp14, tmp15, tmp14)) 2023-01-11T21:38:06.5286820Z tmp18 = (3*x0) + (165*x1) 2023-01-11T21:38:06.5286905Z tmp20 = 1 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5286985Z tmp21 = tmp19 > tmp17 2023-01-11T21:38:06.5287077Z tmp22 = tl.where(tmp21, tmp20, tmp18) 2023-01-11T21:38:06.5287217Z tmp23 = tl.where(tmp19 != tmp19, tmp19, tl.where(tmp19 > tmp17, tmp19, tmp17)) 2023-01-11T21:38:06.5287301Z tmp25 = 2 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5287383Z tmp26 = tmp24 > tmp23 2023-01-11T21:38:06.5287480Z tmp27 = tl.where(tmp26, tmp25, tmp22) 2023-01-11T21:38:06.5287613Z tmp28 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 > tmp23, tmp24, tmp23)) 2023-01-11T21:38:06.5287699Z tmp30 = 55 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5287771Z tmp31 = tmp29 > tmp28 2023-01-11T21:38:06.5287872Z tmp32 = tl.where(tmp31, tmp30, tmp27) 2023-01-11T21:38:06.5288014Z tmp33 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 > tmp28, tmp29, tmp28)) 2023-01-11T21:38:06.5288097Z tmp35 = 56 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288203Z tmp36 = tmp34 > tmp33 2023-01-11T21:38:06.5288302Z tmp37 = tl.where(tmp36, tmp35, tmp32) 2023-01-11T21:38:06.5288438Z tmp38 = tl.where(tmp34 != tmp34, tmp34, tl.where(tmp34 > tmp33, tmp34, tmp33)) 2023-01-11T21:38:06.5288513Z tmp40 = 57 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288592Z tmp41 = tmp39 > tmp38 2023-01-11T21:38:06.5288690Z tmp42 = tl.where(tmp41, tmp40, tmp37) 2023-01-11T21:38:06.5288825Z tmp43 = tl.where(tmp39 != tmp39, tmp39, tl.where(tmp39 > tmp38, tmp39, tmp38)) 2023-01-11T21:38:06.5288907Z tmp45 = 110 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5288990Z tmp46 = tmp44 > tmp43 2023-01-11T21:38:06.5289091Z tmp47 = tl.where(tmp46, tmp45, tmp42) 2023-01-11T21:38:06.5289223Z tmp48 = tl.where(tmp44 != tmp44, tmp44, tl.where(tmp44 > tmp43, tmp44, tmp43)) 2023-01-11T21:38:06.5289299Z tmp50 = 111 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5289380Z tmp51 = tmp49 > tmp48 2023-01-11T21:38:06.5289482Z tmp52 = tl.where(tmp51, tmp50, tmp47) 2023-01-11T21:38:06.5289616Z tmp53 = tl.where(tmp49 != tmp49, tmp49, tl.where(tmp49 > tmp48, tmp49, tmp48)) 2023-01-11T21:38:06.5289699Z tmp55 = 112 + (3*x0) + (165*x1) 2023-01-11T21:38:06.5289779Z tmp56 = tmp54 > tmp53 2023-01-11T21:38:06.5289877Z tmp57 = tl.where(tmp56, tmp55, tmp52) 2023-01-11T21:38:06.5290002Z tmp58 = tl.where(tmp54 != tmp54, tmp54, tl.where(tmp54 > tmp53, tmp54, tmp53)) 2023-01-11T21:38:06.5290133Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5290265Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp57, xmask) 
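# [annotation -- not part of the generated kernel] One pass produces both
# results of aten.max_pool2d_with_indices: tmp16, the NaN-aware window max,
# is stored through the fp16 out_ptr0, and tmp57, the flat argmax offset
# within the 55x55 input plane, goes to out_ptr1, which call() below
# allocates as an int64 buffer.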
2023-01-11T21:38:06.5290376Z ''') 2023-01-11T21:38:06.5290382Z 2023-01-11T21:38:06.5290386Z 2023-01-11T21:38:06.5290483Z async_compile.wait(globals()) 2023-01-11T21:38:06.5290558Z del async_compile 2023-01-11T21:38:06.5290564Z 2023-01-11T21:38:06.5290638Z def call(args): 2023-01-11T21:38:06.5290711Z arg0_1, = args 2023-01-11T21:38:06.5290780Z args.clear() 2023-01-11T21:38:06.5290875Z with torch.cuda.device(0): 2023-01-11T21:38:06.5291104Z buf0 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5291327Z buf1 = empty_strided((16, 64, 18, 18), (20736, 324, 18, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5291417Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5291579Z triton_fused_getitem_getitem_1_0.run(arg0_1, buf0, buf1, 331776, grid=grid(331776), stream=stream0) 2023-01-11T21:38:06.5291652Z del arg0_1 2023-01-11T21:38:06.5291728Z return (buf0, buf1, ) 2023-01-11T21:38:06.5291733Z 2023-01-11T21:38:06.5291746Z 2023-01-11T21:38:06.5291819Z if __name__ == "__main__": 2023-01-11T21:38:06.5291936Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5292063Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5292296Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5292407Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5292413Z 2023-01-11T21:38:06.5292481Z ok (0.605s) 2023-01-11T21:38:06.5292939Z test_max_pool2d6_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5293071Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5293332Z [2023-01-11 21:35:18,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 710 2023-01-11T21:38:06.5293569Z [2023-01-11 21:35:18,504] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:06.5293859Z [2023-01-11 21:35:18,507] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 710 2023-01-11T21:38:06.5294278Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5294409Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5294849Z [2023-01-11 21:35:18,527] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 711 2023-01-11T21:38:06.5295102Z [2023-01-11 21:35:18,532] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices 2023-01-11T21:38:06.5295367Z [2023-01-11 21:35:18,535] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 711 2023-01-11T21:38:06.5295373Z 2023-01-11T21:38:06.5295492Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5295574Z import torch 2023-01-11T21:38:06.5295655Z import random 2023-01-11T21:38:06.5295784Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5295906Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5295912Z 2023-01-11T21:38:06.5295993Z aten = torch.ops.aten 2023-01-11T21:38:06.5296130Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5296226Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5296231Z 2023-01-11T21:38:06.5296304Z import triton 2023-01-11T21:38:06.5296396Z import triton.language as tl 2023-01-11T21:38:06.5296558Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5296697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5296702Z 2023-01-11T21:38:06.5296707Z 2023-01-11T21:38:06.5296798Z async_compile.wait(globals()) 2023-01-11T21:38:06.5296875Z del async_compile 2023-01-11T21:38:06.5296882Z 2023-01-11T21:38:06.5296954Z def call(args): 2023-01-11T21:38:06.5297028Z arg0_1, = args 2023-01-11T21:38:06.5297102Z args.clear() 2023-01-11T21:38:06.5297257Z with torch.cuda.device(0): 2023-01-11T21:38:06.5297397Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:06.5297471Z del arg0_1 2023-01-11T21:38:06.5297544Z buf1 = buf0[0] 2023-01-11T21:38:06.5297657Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5297730Z buf2 = buf0[1] 2023-01-11T21:38:06.5297841Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5297916Z del buf0 2023-01-11T21:38:06.5297991Z return (buf1, buf2, ) 2023-01-11T21:38:06.5297997Z 2023-01-11T21:38:06.5298001Z 2023-01-11T21:38:06.5298079Z if __name__ == "__main__": 2023-01-11T21:38:06.5298194Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5298322Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5298562Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5298677Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5298682Z 2023-01-11T21:38:06.5298687Z 2023-01-11T21:38:06.5298783Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5298856Z import torch 2023-01-11T21:38:06.5298924Z import random 2023-01-11T21:38:06.5299047Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5299169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5299175Z 2023-01-11T21:38:06.5299262Z aten = torch.ops.aten 2023-01-11T21:38:06.5299396Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5299491Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5299496Z 2023-01-11T21:38:06.5299570Z import triton 2023-01-11T21:38:06.5299655Z 
import triton.language as tl 2023-01-11T21:38:06.5299817Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5299957Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5299963Z 2023-01-11T21:38:06.5299967Z 2023-01-11T21:38:06.5300058Z async_compile.wait(globals()) 2023-01-11T21:38:06.5300133Z del async_compile 2023-01-11T21:38:06.5300138Z 2023-01-11T21:38:06.5300211Z def call(args): 2023-01-11T21:38:06.5300285Z arg0_1, = args 2023-01-11T21:38:06.5300359Z args.clear() 2023-01-11T21:38:06.5300443Z with torch.cuda.device(0): 2023-01-11T21:38:06.5300579Z buf0 = aten.max_pool2d_with_indices(arg0_1, [13, 13], [13, 13], [0, 0], 1, False) 2023-01-11T21:38:06.5300658Z del arg0_1 2023-01-11T21:38:06.5300731Z buf1 = buf0[0] 2023-01-11T21:38:06.5300844Z assert_size_stride(buf1, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5300921Z buf2 = buf0[1] 2023-01-11T21:38:06.5301032Z assert_size_stride(buf2, (16, 64, 4, 4), (1024, 16, 4, 1)) 2023-01-11T21:38:06.5301098Z del buf0 2023-01-11T21:38:06.5301180Z return (buf1, buf2, ) 2023-01-11T21:38:06.5301185Z 2023-01-11T21:38:06.5301189Z 2023-01-11T21:38:06.5301268Z if __name__ == "__main__": 2023-01-11T21:38:06.5301384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5301509Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5301745Z arg0_1 = rand_strided((16, 64, 55, 55), (193600, 3025, 55, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5301859Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5301864Z 2023-01-11T21:38:06.5301934Z ok (0.075s) 2023-01-11T21:38:06.5302443Z test_max_pool2d_with_indices_backward2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5302580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5302834Z [2023-01-11 21:35:18,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 712 2023-01-11T21:38:06.5303095Z [2023-01-11 21:35:18,770] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 712 2023-01-11T21:38:06.5303511Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5303649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5303911Z [2023-01-11 21:35:18,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 713 2023-01-11T21:38:06.5303917Z 2023-01-11T21:38:06.5304016Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5304089Z import torch 2023-01-11T21:38:06.5304162Z import random 2023-01-11T21:38:06.5304281Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5304400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5304405Z 2023-01-11T21:38:06.5304487Z aten = torch.ops.aten 2023-01-11T21:38:06.5304626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5304725Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5304733Z 2023-01-11T21:38:06.5304808Z import triton 2023-01-11T21:38:06.5304901Z import triton.language as tl 2023-01-11T21:38:06.5305028Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5305163Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5305175Z 2023-01-11T21:38:06.5305180Z 2023-01-11T21:38:06.5305406Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5305484Z import triton 2023-01-11T21:38:06.5305583Z import triton.language as tl 2023-01-11T21:38:06.5305724Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5305842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5305985Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5306112Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5306118Z 2023-01-11T21:38:06.5306540Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5306612Z @triton.jit 2023-01-11T21:38:06.5306756Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5306836Z xnumel = 17920 2023-01-11T21:38:06.5306939Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5307073Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5307158Z xmask = xindex < xnumel 2023-01-11T21:38:06.5307236Z x3 = xindex % 2240 2023-01-11T21:38:06.5307314Z x1 = (xindex // 56) % 40 2023-01-11T21:38:06.5307393Z x0 = xindex % 56 2023-01-11T21:38:06.5307474Z x2 = (xindex // 2240) 2023-01-11T21:38:06.5307547Z x5 = xindex 2023-01-11T21:38:06.5307619Z tmp0 = x3 2023-01-11T21:38:06.5307697Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5307774Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5307853Z tmp3 = 1 + (((1 + x1) // 2)) 2023-01-11T21:38:06.5307963Z tmp4 = 1 + (((1 + x0) // 2)) 2023-01-11T21:38:06.5308035Z tmp5 = 0 2023-01-11T21:38:06.5308177Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5308314Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5308387Z tmp8 = 21 2023-01-11T21:38:06.5308523Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5308591Z tmp10 = 29 2023-01-11T21:38:06.5308733Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5308816Z tmp12 = tmp6 + tmp5 
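# [annotation -- not part of the generated kernel] Backward of
# max_pool2d_with_indices: tmp1..tmp11 bound, for this grad_input element,
# the range of pooling windows that could have covered it -- lower bounds
# clamped to 0 (tmp6, tmp7), upper bounds clamped to the 21x29 output extent
# (tmp9, tmp11). Each candidate window below compares its stored argmax
# (in_ptr0, int64) against this element's own flat offset tmp0, and the
# upstream gradient (in_ptr1) is accumulated only where they match.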
2023-01-11T21:38:06.5308897Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5308969Z tmp14 = 1 2023-01-11T21:38:06.5309085Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5309228Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5309336Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5309478Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5309609Z tmp19 = tl.load(in_ptr0 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5309729Z tmp20 = tl.load(in_ptr1 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5309810Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5309887Z tmp22 = 0.0 2023-01-11T21:38:06.5309992Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5310069Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5310213Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5310335Z tmp26 = tl.load(in_ptr0 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5310455Z tmp27 = tl.load(in_ptr1 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5310537Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5310620Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5310701Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5310775Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5310858Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5310937Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5311036Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5317374Z tmp35 = tmp6 + tmp14 2023-01-11T21:38:06.5317540Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 < tmp15, tmp35, tmp15)) 2023-01-11T21:38:06.5317721Z tmp37 = tl.load(in_ptr0 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5317844Z tmp38 = tl.load(in_ptr1 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5317922Z tmp39 = tmp37 == tmp0 2023-01-11T21:38:06.5318006Z tmp40 = tmp35 < tmp9 2023-01-11T21:38:06.5318087Z tmp41 = tmp13 < tmp11 2023-01-11T21:38:06.5318169Z tmp42 = tmp40 & tmp41 2023-01-11T21:38:06.5318251Z tmp43 = tmp42 & tmp39 2023-01-11T21:38:06.5318331Z tmp44 = tmp34 + tmp38 2023-01-11T21:38:06.5318433Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5318550Z tmp46 = tl.load(in_ptr0 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5318676Z tmp47 = tl.load(in_ptr1 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5318758Z tmp48 = tmp46 == tmp0 2023-01-11T21:38:06.5318835Z tmp49 = tmp40 & tmp30 2023-01-11T21:38:06.5318919Z tmp50 = tmp49 & tmp48 2023-01-11T21:38:06.5318998Z tmp51 = tmp45 + tmp47 2023-01-11T21:38:06.5319095Z tmp52 = tl.where(tmp50, tmp51, tmp45) 2023-01-11T21:38:06.5319237Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp52, xmask) 2023-01-11T21:38:06.5319345Z ''') 2023-01-11T21:38:06.5319351Z 2023-01-11T21:38:06.5319355Z 2023-01-11T21:38:06.5319453Z async_compile.wait(globals()) 2023-01-11T21:38:06.5319537Z del async_compile 2023-01-11T21:38:06.5319542Z 2023-01-11T21:38:06.5319619Z def call(args): 2023-01-11T21:38:06.5319710Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5319787Z args.clear() 2023-01-11T21:38:06.5319876Z with torch.cuda.device(0): 2023-01-11T21:38:06.5320109Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5320237Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5320420Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 17920, grid=grid(17920), stream=stream0) 2023-01-11T21:38:06.5320496Z del arg0_1 
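# [annotation -- not part of the generated kernel] Argument routing in the
# run() call above: arg2_1 (the int64 indices saved by the forward) feeds
# in_ptr0 and arg0_1 (the incoming gradient) feeds in_ptr1, with grad_input
# written to buf0. arg1_1 -- evidently the forward input, judging by the
# (2, 4, 40, 56) benchmark tensor below -- is never read by the kernel; only
# its shape was needed to size buf0.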
2023-01-11T21:38:06.5320572Z del arg2_1 2023-01-11T21:38:06.5320652Z return (buf0, ) 2023-01-11T21:38:06.5320657Z 2023-01-11T21:38:06.5320661Z 2023-01-11T21:38:06.5320738Z if __name__ == "__main__": 2023-01-11T21:38:06.5320858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5320990Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5321250Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5321488Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5321705Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5321837Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5321843Z 2023-01-11T21:38:06.5322107Z [2023-01-11 21:35:18,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 713 2023-01-11T21:38:06.5322113Z 2023-01-11T21:38:06.5322213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5322281Z import torch 2023-01-11T21:38:06.5322355Z import random 2023-01-11T21:38:06.5322474Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5322597Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5322602Z 2023-01-11T21:38:06.5322685Z aten = torch.ops.aten 2023-01-11T21:38:06.5322819Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5322914Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5322919Z 2023-01-11T21:38:06.5322993Z import triton 2023-01-11T21:38:06.5323079Z import triton.language as tl 2023-01-11T21:38:06.5323208Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5323347Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5323352Z 2023-01-11T21:38:06.5323357Z 2023-01-11T21:38:06.5323565Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5323670Z import triton 2023-01-11T21:38:06.5323764Z import triton.language as tl 2023-01-11T21:38:06.5323880Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5323974Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5324107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5324230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5324236Z 2023-01-11T21:38:06.5324655Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5324730Z @triton.jit 2023-01-11T21:38:06.5324870Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5324945Z xnumel = 17920 2023-01-11T21:38:06.5325045Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5325170Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5325254Z xmask = xindex < xnumel 2023-01-11T21:38:06.5325332Z x3 = xindex % 2240 2023-01-11T21:38:06.5325415Z x1 = (xindex // 56) % 40 2023-01-11T21:38:06.5325489Z x0 = xindex % 56 2023-01-11T21:38:06.5325571Z x2 = (xindex // 2240) 2023-01-11T21:38:06.5325642Z x5 = xindex 2023-01-11T21:38:06.5325706Z tmp0 = x3 2023-01-11T21:38:06.5325781Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5325854Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5325940Z 
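# Index bookkeeping for the (2, 4, 40, 56) gradient input with strides
# (8960, 2240, 56, 1): x0 is the width coordinate (xindex % 56), x1 the
# height, x2 the fused batch*channel index, and x3 = h*56 + w is this
# element's flat offset within its H*W plane -- the value the stored int64
# argmax indices in in_ptr0 are compared against (tmp0). tmp1/tmp2 begin
# mapping the coordinate back to the pooled output cells whose window can
# cover it; the //2 suggests stride-2 pooling over the (21, 29) output.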
tmp3 = 1 + (((1 + x1) // 2)) 2023-01-11T21:38:06.5326021Z tmp4 = 1 + (((1 + x0) // 2)) 2023-01-11T21:38:06.5326139Z tmp5 = 0 2023-01-11T21:38:06.5326280Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5326413Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5326489Z tmp8 = 21 2023-01-11T21:38:06.5326624Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5326699Z tmp10 = 29 2023-01-11T21:38:06.5326841Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5326922Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5327003Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5327069Z tmp14 = 1 2023-01-11T21:38:06.5327185Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5327331Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5327448Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5327594Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5327723Z tmp19 = tl.load(in_ptr0 + (tmp18 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5327860Z tmp20 = tl.load(in_ptr1 + (tmp18 + (29*tmp16) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5327937Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5328010Z tmp22 = 0.0 2023-01-11T21:38:06.5328112Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5328205Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5328352Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5328477Z tmp26 = tl.load(in_ptr0 + (tmp25 + (29*tmp16) + (609*x2)), xmask) 2023-01-11T21:38:06.5328616Z tmp27 = tl.load(in_ptr1 + (tmp25 + (29*tmp16) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5328691Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5328775Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5328857Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5328941Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5329020Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5329103Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5329204Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5329279Z tmp35 = tmp6 + tmp14 2023-01-11T21:38:06.5329423Z tmp36 = tl.where(tmp35 != tmp35, tmp35, tl.where(tmp35 < tmp15, tmp35, tmp15)) 2023-01-11T21:38:06.5329617Z tmp37 = tl.load(in_ptr0 + (tmp18 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5329756Z tmp38 = tl.load(in_ptr1 + (tmp18 + (29*tmp36) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5329839Z tmp39 = tmp37 == tmp0 2023-01-11T21:38:06.5329919Z tmp40 = tmp35 < tmp9 2023-01-11T21:38:06.5329999Z tmp41 = tmp13 < tmp11 2023-01-11T21:38:06.5330073Z tmp42 = tmp40 & tmp41 2023-01-11T21:38:06.5330151Z tmp43 = tmp42 & tmp39 2023-01-11T21:38:06.5330235Z tmp44 = tmp34 + tmp38 2023-01-11T21:38:06.5330336Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5330460Z tmp46 = tl.load(in_ptr0 + (tmp25 + (29*tmp36) + (609*x2)), xmask) 2023-01-11T21:38:06.5330606Z tmp47 = tl.load(in_ptr1 + (tmp25 + (29*tmp36) + (609*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5330689Z tmp48 = tmp46 == tmp0 2023-01-11T21:38:06.5330771Z tmp49 = tmp40 & tmp30 2023-01-11T21:38:06.5330845Z tmp50 = tmp49 & tmp48 2023-01-11T21:38:06.5330926Z tmp51 = tmp45 + tmp47 2023-01-11T21:38:06.5331028Z tmp52 = tl.where(tmp50, tmp51, tmp45) 2023-01-11T21:38:06.5331167Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp52, xmask) 2023-01-11T21:38:06.5331256Z ''') 2023-01-11T21:38:06.5331262Z 
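# The kernel above is the same max_pool2d backward re-specialized for fp16
# inputs ('*fp16' pointers in the @pointwise signature). Every gradient
# load gets a `.to(tl.float32)`, so the masked accumulation runs in fp32;
# the final tl.store back through the fp16 out_ptr then narrows the result
# (relying, it appears, on Triton casting the stored value to the pointer's
# element type). Accumulating in fp32 avoids losing small gradient
# contributions to half-precision rounding.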
2023-01-11T21:38:06.5331266Z 2023-01-11T21:38:06.5331363Z async_compile.wait(globals()) 2023-01-11T21:38:06.5331445Z del async_compile 2023-01-11T21:38:06.5331450Z 2023-01-11T21:38:06.5331520Z def call(args): 2023-01-11T21:38:06.5331607Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5331684Z args.clear() 2023-01-11T21:38:06.5331778Z with torch.cuda.device(0): 2023-01-11T21:38:06.5332006Z buf0 = empty_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5332131Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5332311Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 17920, grid=grid(17920), stream=stream0) 2023-01-11T21:38:06.5332381Z del arg0_1 2023-01-11T21:38:06.5332457Z del arg2_1 2023-01-11T21:38:06.5332541Z return (buf0, ) 2023-01-11T21:38:06.5332547Z 2023-01-11T21:38:06.5332551Z 2023-01-11T21:38:06.5332631Z if __name__ == "__main__": 2023-01-11T21:38:06.5332750Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5332880Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5333109Z arg0_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5333335Z arg1_1 = rand_strided((2, 4, 40, 56), (8960, 2240, 56, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5333546Z arg2_1 = rand_strided((2, 4, 21, 29), (2436, 609, 29, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5333680Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5333685Z 2023-01-11T21:38:06.5333757Z ok (0.446s) 2023-01-11T21:38:06.5334240Z test_max_pool2d_with_indices_backward3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5334373Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5334781Z [2023-01-11 21:35:19,106] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 714 2023-01-11T21:38:06.5335049Z [2023-01-11 21:35:19,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 714 2023-01-11T21:38:06.5335572Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5335713Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5335976Z [2023-01-11 21:35:19,280] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 715 2023-01-11T21:38:06.5336242Z [2023-01-11 21:35:19,409] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 715 2023-01-11T21:38:06.5336248Z 2023-01-11T21:38:06.5336345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5336412Z import torch 2023-01-11T21:38:06.5336487Z import random 2023-01-11T21:38:06.5336606Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5336730Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5336735Z 2023-01-11T21:38:06.5336817Z aten = torch.ops.aten 2023-01-11T21:38:06.5336954Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5337050Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5337057Z 2023-01-11T21:38:06.5337181Z import triton 2023-01-11T21:38:06.5337293Z import triton.language as tl 2023-01-11T21:38:06.5337423Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5337567Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5337573Z 2023-01-11T21:38:06.5337578Z 2023-01-11T21:38:06.5337783Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5337859Z import triton 2023-01-11T21:38:06.5337950Z import triton.language as tl 2023-01-11T21:38:06.5338065Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5338160Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5338341Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5338471Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5338476Z 2023-01-11T21:38:06.5338907Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5338984Z @triton.jit 2023-01-11T21:38:06.5339125Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5339203Z xnumel = 11517952 2023-01-11T21:38:06.5339302Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5339426Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5339508Z xmask = xindex < xnumel 2023-01-11T21:38:06.5339593Z x3 = xindex % 1406 2023-01-11T21:38:06.5339677Z x1 = (xindex // 38) % 37 2023-01-11T21:38:06.5339760Z x0 = xindex % 38 2023-01-11T21:38:06.5339840Z x2 = (xindex // 1406) 2023-01-11T21:38:06.5339912Z x5 = xindex 2023-01-11T21:38:06.5339979Z tmp0 = x3 2023-01-11T21:38:06.5340061Z tmp1 = ((1 + x1) // 2) 2023-01-11T21:38:06.5340141Z tmp2 = ((1 + x0) // 2) 2023-01-11T21:38:06.5340223Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5340301Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5340376Z tmp5 = 0 2023-01-11T21:38:06.5340510Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5340648Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5340727Z tmp8 = 19 2023-01-11T21:38:06.5340862Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5341001Z tmp10 = 
tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp8, tmp4, tmp8)) 2023-01-11T21:38:06.5341083Z tmp11 = tmp6 + tmp5 2023-01-11T21:38:06.5341164Z tmp12 = tmp7 + tmp5 2023-01-11T21:38:06.5341241Z tmp13 = 1 2023-01-11T21:38:06.5341350Z tmp14 = tmp9 - tmp13 2023-01-11T21:38:06.5341494Z tmp15 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp14, tmp11, tmp14)) 2023-01-11T21:38:06.5341610Z tmp16 = tmp10 - tmp13 2023-01-11T21:38:06.5341782Z tmp17 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp16, tmp12, tmp16)) 2023-01-11T21:38:06.5341906Z tmp18 = tl.load(in_ptr0 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5342026Z tmp19 = tl.load(in_ptr1 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5342104Z tmp20 = tmp18 == tmp0 2023-01-11T21:38:06.5342171Z tmp21 = 0.0 2023-01-11T21:38:06.5342274Z tmp22 = tl.where(tmp20, tmp19, tmp21) 2023-01-11T21:38:06.5342409Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.5342494Z ''') 2023-01-11T21:38:06.5342500Z 2023-01-11T21:38:06.5342504Z 2023-01-11T21:38:06.5342598Z async_compile.wait(globals()) 2023-01-11T21:38:06.5342679Z del async_compile 2023-01-11T21:38:06.5342684Z 2023-01-11T21:38:06.5342759Z def call(args): 2023-01-11T21:38:06.5342844Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5342913Z args.clear() 2023-01-11T21:38:06.5343005Z with torch.cuda.device(0): 2023-01-11T21:38:06.5343241Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5343333Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5343515Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 11517952, grid=grid(11517952), stream=stream0) 2023-01-11T21:38:06.5343588Z del arg0_1 2023-01-11T21:38:06.5343660Z del arg2_1 2023-01-11T21:38:06.5343730Z return (buf0, ) 2023-01-11T21:38:06.5343736Z 2023-01-11T21:38:06.5343740Z 2023-01-11T21:38:06.5343820Z if __name__ == "__main__": 2023-01-11T21:38:06.5343938Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5344065Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5344327Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5344557Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5344782Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5344907Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5344912Z 2023-01-11T21:38:06.5344916Z 2023-01-11T21:38:06.5345012Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5345080Z import torch 2023-01-11T21:38:06.5345153Z import random 2023-01-11T21:38:06.5345273Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5345396Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5345401Z 2023-01-11T21:38:06.5345486Z aten = torch.ops.aten 2023-01-11T21:38:06.5345629Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5345745Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5345750Z 2023-01-11T21:38:06.5345823Z import triton 2023-01-11T21:38:06.5345933Z import triton.language as tl 2023-01-11T21:38:06.5346058Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5346198Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5346203Z 
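# Each dump like the one starting above is a self-contained module that
# Inductor emits for debugging. Roughly: async_compile.triton('''...''')
# hands the kernel source to background compile workers,
# async_compile.wait(globals()) blocks until every kernel in the module is
# built, and call(args) is the compiled graph's entry point. The
# `if __name__ == "__main__"` tail makes the module benchmarkable on its
# own; e.g. the fp32 dump just above ends with (shapes copied from it):
#
#     arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1),
#                           device='cuda:0', dtype=torch.float32)
#     print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
#
# rand_strided allocates tensors with the exact size/stride signature the
# graph was compiled for, so a standalone run exercises the same layout.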
2023-01-11T21:38:06.5346208Z 2023-01-11T21:38:06.5346409Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5346484Z import triton 2023-01-11T21:38:06.5346577Z import triton.language as tl 2023-01-11T21:38:06.5346691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5346786Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5346917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5347046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5347054Z 2023-01-11T21:38:06.5347479Z @pointwise(size_hints=[16777216], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5347552Z @triton.jit 2023-01-11T21:38:06.5347721Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5347800Z xnumel = 11517952 2023-01-11T21:38:06.5347895Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5348016Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5348099Z xmask = xindex < xnumel 2023-01-11T21:38:06.5348177Z x3 = xindex % 1406 2023-01-11T21:38:06.5348259Z x1 = (xindex // 38) % 37 2023-01-11T21:38:06.5348334Z x0 = xindex % 38 2023-01-11T21:38:06.5348415Z x2 = (xindex // 1406) 2023-01-11T21:38:06.5348487Z x5 = xindex 2023-01-11T21:38:06.5348551Z tmp0 = x3 2023-01-11T21:38:06.5348630Z tmp1 = ((1 + x1) // 2) 2023-01-11T21:38:06.5348708Z tmp2 = ((1 + x0) // 2) 2023-01-11T21:38:06.5348784Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5348857Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5348927Z tmp5 = 0 2023-01-11T21:38:06.5349057Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5349197Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5349269Z tmp8 = 19 2023-01-11T21:38:06.5349398Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5349534Z tmp10 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp8, tmp4, tmp8)) 2023-01-11T21:38:06.5349614Z tmp11 = tmp6 + tmp5 2023-01-11T21:38:06.5349698Z tmp12 = tmp7 + tmp5 2023-01-11T21:38:06.5349767Z tmp13 = 1 2023-01-11T21:38:06.5349875Z tmp14 = tmp9 - tmp13 2023-01-11T21:38:06.5350017Z tmp15 = tl.where(tmp11 != tmp11, tmp11, tl.where(tmp11 < tmp14, tmp11, tmp14)) 2023-01-11T21:38:06.5350159Z tmp16 = tmp10 - tmp13 2023-01-11T21:38:06.5350298Z tmp17 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp16, tmp12, tmp16)) 2023-01-11T21:38:06.5350420Z tmp18 = tl.load(in_ptr0 + (tmp17 + (19*tmp15) + (361*x2)), xmask) 2023-01-11T21:38:06.5350563Z tmp19 = tl.load(in_ptr1 + (tmp17 + (19*tmp15) + (361*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5350644Z tmp20 = tmp18 == tmp0 2023-01-11T21:38:06.5350710Z tmp21 = 0.0 2023-01-11T21:38:06.5350810Z tmp22 = tl.where(tmp20, tmp19, tmp21) 2023-01-11T21:38:06.5350943Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.5351026Z ''') 2023-01-11T21:38:06.5351032Z 2023-01-11T21:38:06.5351036Z 2023-01-11T21:38:06.5351130Z async_compile.wait(globals()) 2023-01-11T21:38:06.5351207Z del async_compile 2023-01-11T21:38:06.5351212Z 2023-01-11T21:38:06.5351286Z def call(args): 2023-01-11T21:38:06.5351372Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5351443Z args.clear() 2023-01-11T21:38:06.5351535Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.5351764Z buf0 = empty_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5351857Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5352038Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 11517952, grid=grid(11517952), stream=stream0) 2023-01-11T21:38:06.5352115Z del arg0_1 2023-01-11T21:38:06.5352189Z del arg2_1 2023-01-11T21:38:06.5352259Z return (buf0, ) 2023-01-11T21:38:06.5352264Z 2023-01-11T21:38:06.5352268Z 2023-01-11T21:38:06.5352346Z if __name__ == "__main__": 2023-01-11T21:38:06.5352464Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5352590Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5352821Z arg0_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5353054Z arg1_1 = rand_strided((32, 256, 37, 38), (359936, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5353279Z arg2_1 = rand_strided((32, 256, 19, 19), (92416, 361, 19, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5353406Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5353441Z 2023-01-11T21:38:06.5353507Z ok (0.440s) 2023-01-11T21:38:06.5353983Z test_max_pool2d_with_indices_backward4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5354113Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5354373Z [2023-01-11 21:35:19,444] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 716 2023-01-11T21:38:06.5354381Z 2023-01-11T21:38:06.5354479Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5354556Z import torch 2023-01-11T21:38:06.5354631Z import random 2023-01-11T21:38:06.5354750Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5354874Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5354879Z 2023-01-11T21:38:06.5354954Z aten = torch.ops.aten 2023-01-11T21:38:06.5355087Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5355186Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5355191Z 2023-01-11T21:38:06.5355266Z import triton 2023-01-11T21:38:06.5355373Z import triton.language as tl 2023-01-11T21:38:06.5355514Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5355671Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5355676Z 2023-01-11T21:38:06.5355682Z 2023-01-11T21:38:06.5355912Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5355981Z import triton 2023-01-11T21:38:06.5356073Z import triton.language as tl 2023-01-11T21:38:06.5356186Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5356287Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5356420Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5356545Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5356550Z 2023-01-11T21:38:06.5356973Z @pointwise(size_hints=[2048], filename=__file__, 
meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5357045Z @triton.jit 2023-01-11T21:38:06.5357186Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5357255Z xnumel = 1536 2023-01-11T21:38:06.5357354Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5357485Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5357566Z xmask = xindex < xnumel 2023-01-11T21:38:06.5357643Z x3 = xindex % 12 2023-01-11T21:38:06.5357721Z x1 = (xindex // 4) % 3 2023-01-11T21:38:06.5357791Z x0 = xindex % 4 2023-01-11T21:38:06.5357870Z x2 = (xindex // 12) 2023-01-11T21:38:06.5357940Z x5 = xindex 2023-01-11T21:38:06.5358011Z tmp0 = x3 2023-01-11T21:38:06.5358116Z tmp1 = (-2) + x1 2023-01-11T21:38:06.5358218Z tmp2 = (-2) + x0 2023-01-11T21:38:06.5358291Z tmp3 = 3 + x1 2023-01-11T21:38:06.5358357Z tmp4 = 3 + x0 2023-01-11T21:38:06.5358429Z tmp5 = 0 2023-01-11T21:38:06.5358567Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5358703Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5358773Z tmp8 = 3 2023-01-11T21:38:06.5358906Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5358978Z tmp10 = 4 2023-01-11T21:38:06.5359109Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5359188Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5359264Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5359364Z tmp14 = 1 2023-01-11T21:38:06.5359478Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5359620Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5359733Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5359867Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5359988Z tmp19 = tl.load(in_ptr0 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360104Z tmp20 = tl.load(in_ptr1 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360184Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5360257Z tmp22 = 0.0 2023-01-11T21:38:06.5360359Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5360441Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5360574Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5360693Z tmp26 = tl.load(in_ptr0 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360809Z tmp27 = tl.load(in_ptr1 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5360889Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5360968Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5361049Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5361127Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5361199Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5361276Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5361377Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5361448Z tmp35 = 2 2023-01-11T21:38:06.5361528Z tmp36 = tmp7 + tmp35 2023-01-11T21:38:06.5361667Z tmp37 = tl.where(tmp36 != tmp36, tmp36, tl.where(tmp36 < tmp17, tmp36, tmp17)) 2023-01-11T21:38:06.5361811Z tmp38 = tl.load(in_ptr0 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5361920Z tmp39 = tl.load(in_ptr1 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362003Z tmp40 = tmp38 == tmp0 2023-01-11T21:38:06.5362080Z tmp41 = 
tmp36 < tmp11 2023-01-11T21:38:06.5362160Z tmp42 = tmp29 & tmp41 2023-01-11T21:38:06.5362241Z tmp43 = tmp42 & tmp40 2023-01-11T21:38:06.5362319Z tmp44 = tmp34 + tmp39 2023-01-11T21:38:06.5362419Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5362491Z tmp46 = tmp7 + tmp8 2023-01-11T21:38:06.5362627Z tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp17, tmp46, tmp17)) 2023-01-11T21:38:06.5362746Z tmp48 = tl.load(in_ptr0 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362861Z tmp49 = tl.load(in_ptr1 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5362938Z tmp50 = tmp48 == tmp0 2023-01-11T21:38:06.5363018Z tmp51 = tmp46 < tmp11 2023-01-11T21:38:06.5363101Z tmp52 = tmp29 & tmp51 2023-01-11T21:38:06.5363173Z tmp53 = tmp52 & tmp50 2023-01-11T21:38:06.5363251Z tmp54 = tmp45 + tmp49 2023-01-11T21:38:06.5363350Z tmp55 = tl.where(tmp53, tmp54, tmp45) 2023-01-11T21:38:06.5363427Z tmp56 = tmp7 + tmp10 2023-01-11T21:38:06.5363569Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp17, tmp56, tmp17)) 2023-01-11T21:38:06.5363685Z tmp58 = tl.load(in_ptr0 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5363798Z tmp59 = tl.load(in_ptr1 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5363881Z tmp60 = tmp58 == tmp0 2023-01-11T21:38:06.5363954Z tmp61 = tmp56 < tmp11 2023-01-11T21:38:06.5364033Z tmp62 = tmp29 & tmp61 2023-01-11T21:38:06.5364111Z tmp63 = tmp62 & tmp60 2023-01-11T21:38:06.5364190Z tmp64 = tmp55 + tmp59 2023-01-11T21:38:06.5364288Z tmp65 = tl.where(tmp63, tmp64, tmp55) 2023-01-11T21:38:06.5364369Z tmp66 = tmp6 + tmp14 2023-01-11T21:38:06.5364501Z tmp67 = tl.where(tmp66 != tmp66, tmp66, tl.where(tmp66 < tmp15, tmp66, tmp15)) 2023-01-11T21:38:06.5364622Z tmp68 = tl.load(in_ptr0 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5364737Z tmp69 = tl.load(in_ptr1 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5364817Z tmp70 = tmp68 == tmp0 2023-01-11T21:38:06.5364896Z tmp71 = tmp66 < tmp9 2023-01-11T21:38:06.5365002Z tmp72 = tmp13 < tmp11 2023-01-11T21:38:06.5365082Z tmp73 = tmp71 & tmp72 2023-01-11T21:38:06.5365153Z tmp74 = tmp73 & tmp70 2023-01-11T21:38:06.5365233Z tmp75 = tmp65 + tmp69 2023-01-11T21:38:06.5365336Z tmp76 = tl.where(tmp74, tmp75, tmp65) 2023-01-11T21:38:06.5365451Z tmp77 = tl.load(in_ptr0 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5365565Z tmp78 = tl.load(in_ptr1 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5365646Z tmp79 = tmp77 == tmp0 2023-01-11T21:38:06.5365723Z tmp80 = tmp71 & tmp30 2023-01-11T21:38:06.5365800Z tmp81 = tmp80 & tmp79 2023-01-11T21:38:06.5365875Z tmp82 = tmp76 + tmp78 2023-01-11T21:38:06.5365969Z tmp83 = tl.where(tmp81, tmp82, tmp76) 2023-01-11T21:38:06.5366085Z tmp84 = tl.load(in_ptr0 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366201Z tmp85 = tl.load(in_ptr1 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366282Z tmp86 = tmp84 == tmp0 2023-01-11T21:38:06.5366363Z tmp87 = tmp71 & tmp41 2023-01-11T21:38:06.5366439Z tmp88 = tmp87 & tmp86 2023-01-11T21:38:06.5366511Z tmp89 = tmp83 + tmp85 2023-01-11T21:38:06.5366606Z tmp90 = tl.where(tmp88, tmp89, tmp83) 2023-01-11T21:38:06.5366721Z tmp91 = tl.load(in_ptr0 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366837Z tmp92 = tl.load(in_ptr1 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5366915Z tmp93 = tmp91 == tmp0 2023-01-11T21:38:06.5366993Z tmp94 = tmp71 & tmp51 2023-01-11T21:38:06.5367070Z tmp95 = tmp94 & tmp93 
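# This long tmpN chain is the pooling window fully unrolled: the bounds
# computed from (-2)+x and 3+x span 5 positions per axis, so 25 candidate
# (row, col) pairs are checked. Each position p repeats one branch-free
# step, in pseudocode:
#
#     # match = in_bounds(p) and (indices[p] == my_flat_index)  # e.g. tmp95
#     # acc   = acc + grad[p] if match else acc                 # the tl.where
#
# so each gradient-input element sums exactly the incoming gradients of the
# output cells whose argmax selected it.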
2023-01-11T21:38:06.5367142Z tmp96 = tmp90 + tmp92 2023-01-11T21:38:06.5367278Z tmp97 = tl.where(tmp95, tmp96, tmp90) 2023-01-11T21:38:06.5367397Z tmp98 = tl.load(in_ptr0 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5367511Z tmp99 = tl.load(in_ptr1 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5367592Z tmp100 = tmp98 == tmp0 2023-01-11T21:38:06.5367679Z tmp101 = tmp71 & tmp61 2023-01-11T21:38:06.5367757Z tmp102 = tmp101 & tmp100 2023-01-11T21:38:06.5367832Z tmp103 = tmp97 + tmp99 2023-01-11T21:38:06.5367936Z tmp104 = tl.where(tmp102, tmp103, tmp97) 2023-01-11T21:38:06.5368013Z tmp105 = tmp6 + tmp35 2023-01-11T21:38:06.5368155Z tmp106 = tl.where(tmp105 != tmp105, tmp105, tl.where(tmp105 < tmp15, tmp105, tmp15)) 2023-01-11T21:38:06.5368276Z tmp107 = tl.load(in_ptr0 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5368395Z tmp108 = tl.load(in_ptr1 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5368477Z tmp109 = tmp107 == tmp0 2023-01-11T21:38:06.5368551Z tmp110 = tmp105 < tmp9 2023-01-11T21:38:06.5368635Z tmp111 = tmp110 & tmp72 2023-01-11T21:38:06.5368717Z tmp112 = tmp111 & tmp109 2023-01-11T21:38:06.5368798Z tmp113 = tmp104 + tmp108 2023-01-11T21:38:06.5368900Z tmp114 = tl.where(tmp112, tmp113, tmp104) 2023-01-11T21:38:06.5369022Z tmp115 = tl.load(in_ptr0 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369143Z tmp116 = tl.load(in_ptr1 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369217Z tmp117 = tmp115 == tmp0 2023-01-11T21:38:06.5369299Z tmp118 = tmp110 & tmp30 2023-01-11T21:38:06.5369381Z tmp119 = tmp118 & tmp117 2023-01-11T21:38:06.5369463Z tmp120 = tmp114 + tmp116 2023-01-11T21:38:06.5369567Z tmp121 = tl.where(tmp119, tmp120, tmp114) 2023-01-11T21:38:06.5369686Z tmp122 = tl.load(in_ptr0 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369804Z tmp123 = tl.load(in_ptr1 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5369878Z tmp124 = tmp122 == tmp0 2023-01-11T21:38:06.5369958Z tmp125 = tmp110 & tmp41 2023-01-11T21:38:06.5370041Z tmp126 = tmp125 & tmp124 2023-01-11T21:38:06.5370121Z tmp127 = tmp121 + tmp123 2023-01-11T21:38:06.5370223Z tmp128 = tl.where(tmp126, tmp127, tmp121) 2023-01-11T21:38:06.5370340Z tmp129 = tl.load(in_ptr0 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5370481Z tmp130 = tl.load(in_ptr1 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5370557Z tmp131 = tmp129 == tmp0 2023-01-11T21:38:06.5370638Z tmp132 = tmp110 & tmp51 2023-01-11T21:38:06.5370719Z tmp133 = tmp132 & tmp131 2023-01-11T21:38:06.5370800Z tmp134 = tmp128 + tmp130 2023-01-11T21:38:06.5370901Z tmp135 = tl.where(tmp133, tmp134, tmp128) 2023-01-11T21:38:06.5371021Z tmp136 = tl.load(in_ptr0 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5371139Z tmp137 = tl.load(in_ptr1 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5371213Z tmp138 = tmp136 == tmp0 2023-01-11T21:38:06.5371295Z tmp139 = tmp110 & tmp61 2023-01-11T21:38:06.5371378Z tmp140 = tmp139 & tmp138 2023-01-11T21:38:06.5371458Z tmp141 = tmp135 + tmp137 2023-01-11T21:38:06.5371560Z tmp142 = tl.where(tmp140, tmp141, tmp135) 2023-01-11T21:38:06.5371640Z tmp143 = tmp6 + tmp8 2023-01-11T21:38:06.5371786Z tmp144 = tl.where(tmp143 != tmp143, tmp143, tl.where(tmp143 < tmp15, tmp143, tmp15)) 2023-01-11T21:38:06.5371896Z tmp145 = tl.load(in_ptr0 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372011Z tmp146 = tl.load(in_ptr1 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 
2023-01-11T21:38:06.5372092Z tmp147 = tmp145 == tmp0 2023-01-11T21:38:06.5372172Z tmp148 = tmp143 < tmp9 2023-01-11T21:38:06.5372255Z tmp149 = tmp148 & tmp72 2023-01-11T21:38:06.5372336Z tmp150 = tmp149 & tmp147 2023-01-11T21:38:06.5372415Z tmp151 = tmp142 + tmp146 2023-01-11T21:38:06.5372509Z tmp152 = tl.where(tmp150, tmp151, tmp142) 2023-01-11T21:38:06.5372624Z tmp153 = tl.load(in_ptr0 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372766Z tmp154 = tl.load(in_ptr1 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5372847Z tmp155 = tmp153 == tmp0 2023-01-11T21:38:06.5372926Z tmp156 = tmp148 & tmp30 2023-01-11T21:38:06.5373007Z tmp157 = tmp156 & tmp155 2023-01-11T21:38:06.5373089Z tmp158 = tmp152 + tmp154 2023-01-11T21:38:06.5373186Z tmp159 = tl.where(tmp157, tmp158, tmp152) 2023-01-11T21:38:06.5373303Z tmp160 = tl.load(in_ptr0 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5373418Z tmp161 = tl.load(in_ptr1 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5373498Z tmp162 = tmp160 == tmp0 2023-01-11T21:38:06.5373578Z tmp163 = tmp148 & tmp41 2023-01-11T21:38:06.5373659Z tmp164 = tmp163 & tmp162 2023-01-11T21:38:06.5373740Z tmp165 = tmp159 + tmp161 2023-01-11T21:38:06.5373835Z tmp166 = tl.where(tmp164, tmp165, tmp159) 2023-01-11T21:38:06.5373952Z tmp167 = tl.load(in_ptr0 + (tmp47 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374067Z tmp168 = tl.load(in_ptr1 + (tmp47 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374149Z tmp169 = tmp167 == tmp0 2023-01-11T21:38:06.5374228Z tmp170 = tmp148 & tmp51 2023-01-11T21:38:06.5374309Z tmp171 = tmp170 & tmp169 2023-01-11T21:38:06.5374390Z tmp172 = tmp166 + tmp168 2023-01-11T21:38:06.5374603Z tmp173 = tl.where(tmp171, tmp172, tmp166) 2023-01-11T21:38:06.5374721Z tmp174 = tl.load(in_ptr0 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374835Z tmp175 = tl.load(in_ptr1 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5374916Z tmp176 = tmp174 == tmp0 2023-01-11T21:38:06.5374996Z tmp177 = tmp148 & tmp61 2023-01-11T21:38:06.5375077Z tmp178 = tmp177 & tmp176 2023-01-11T21:38:06.5375158Z tmp179 = tmp173 + tmp175 2023-01-11T21:38:06.5375251Z tmp180 = tl.where(tmp178, tmp179, tmp173) 2023-01-11T21:38:06.5375331Z tmp181 = tmp6 + tmp10 2023-01-11T21:38:06.5375473Z tmp182 = tl.where(tmp181 != tmp181, tmp181, tl.where(tmp181 < tmp15, tmp181, tmp15)) 2023-01-11T21:38:06.5375593Z tmp183 = tl.load(in_ptr0 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5375708Z tmp184 = tl.load(in_ptr1 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5375788Z tmp185 = tmp183 == tmp0 2023-01-11T21:38:06.5375909Z tmp186 = tmp181 < tmp9 2023-01-11T21:38:06.5375989Z tmp187 = tmp186 & tmp72 2023-01-11T21:38:06.5376063Z tmp188 = tmp187 & tmp185 2023-01-11T21:38:06.5376145Z tmp189 = tmp180 + tmp184 2023-01-11T21:38:06.5376246Z tmp190 = tl.where(tmp188, tmp189, tmp180) 2023-01-11T21:38:06.5376360Z tmp191 = tl.load(in_ptr0 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5376476Z tmp192 = tl.load(in_ptr1 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5376555Z tmp193 = tmp191 == tmp0 2023-01-11T21:38:06.5376637Z tmp194 = tmp186 & tmp30 2023-01-11T21:38:06.5376711Z tmp195 = tmp194 & tmp193 2023-01-11T21:38:06.5376793Z tmp196 = tmp190 + tmp192 2023-01-11T21:38:06.5376892Z tmp197 = tl.where(tmp195, tmp196, tmp190) 2023-01-11T21:38:06.5377008Z tmp198 = tl.load(in_ptr0 + (tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377169Z tmp199 = tl.load(in_ptr1 + 
(tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377269Z tmp200 = tmp198 == tmp0 2023-01-11T21:38:06.5377362Z tmp201 = tmp186 & tmp41 2023-01-11T21:38:06.5377437Z tmp202 = tmp201 & tmp200 2023-01-11T21:38:06.5377520Z tmp203 = tmp197 + tmp199 2023-01-11T21:38:06.5377620Z tmp204 = tl.where(tmp202, tmp203, tmp197) 2023-01-11T21:38:06.5377736Z tmp205 = tl.load(in_ptr0 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377849Z tmp206 = tl.load(in_ptr1 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5377933Z tmp207 = tmp205 == tmp0 2023-01-11T21:38:06.5378013Z tmp208 = tmp186 & tmp51 2023-01-11T21:38:06.5378087Z tmp209 = tmp208 & tmp207 2023-01-11T21:38:06.5378166Z tmp210 = tmp204 + tmp206 2023-01-11T21:38:06.5378317Z tmp211 = tl.where(tmp209, tmp210, tmp204) 2023-01-11T21:38:06.5378432Z tmp212 = tl.load(in_ptr0 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5378548Z tmp213 = tl.load(in_ptr1 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5378627Z tmp214 = tmp212 == tmp0 2023-01-11T21:38:06.5378712Z tmp215 = tmp186 & tmp61 2023-01-11T21:38:06.5378785Z tmp216 = tmp215 & tmp214 2023-01-11T21:38:06.5378866Z tmp217 = tmp211 + tmp213 2023-01-11T21:38:06.5378967Z tmp218 = tl.where(tmp216, tmp217, tmp211) 2023-01-11T21:38:06.5379104Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp218, xmask) 2023-01-11T21:38:06.5379196Z ''') 2023-01-11T21:38:06.5379202Z 2023-01-11T21:38:06.5379207Z 2023-01-11T21:38:06.5379300Z async_compile.wait(globals()) 2023-01-11T21:38:06.5379377Z del async_compile 2023-01-11T21:38:06.5379382Z 2023-01-11T21:38:06.5379456Z def call(args): 2023-01-11T21:38:06.5379535Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5379616Z args.clear() 2023-01-11T21:38:06.5379706Z with torch.cuda.device(0): 2023-01-11T21:38:06.5379928Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5380019Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5380200Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 1536, grid=grid(1536), stream=stream0) 2023-01-11T21:38:06.5380272Z del arg0_1 2023-01-11T21:38:06.5380338Z del arg2_1 2023-01-11T21:38:06.5380415Z return (buf0, ) 2023-01-11T21:38:06.5380421Z 2023-01-11T21:38:06.5380425Z 2023-01-11T21:38:06.5380505Z if __name__ == "__main__": 2023-01-11T21:38:06.5380625Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5380752Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5380970Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5381189Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5381399Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5381521Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5381817Z [2023-01-11 21:35:20,323] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 716 2023-01-11T21:38:06.5382235Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5382366Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5382623Z [2023-01-11 21:35:20,346] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 717 2023-01-11T21:38:06.5382631Z 2023-01-11T21:38:06.5382635Z 2023-01-11T21:38:06.5382732Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5382805Z import torch 2023-01-11T21:38:06.5382877Z import random 2023-01-11T21:38:06.5383000Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5383117Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5383130Z 2023-01-11T21:38:06.5383205Z aten = torch.ops.aten 2023-01-11T21:38:06.5383342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5383436Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5383441Z 2023-01-11T21:38:06.5383514Z import triton 2023-01-11T21:38:06.5383604Z import triton.language as tl 2023-01-11T21:38:06.5383729Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5383869Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5383874Z 2023-01-11T21:38:06.5383905Z 2023-01-11T21:38:06.5384104Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5384180Z import triton 2023-01-11T21:38:06.5384272Z import triton.language as tl 2023-01-11T21:38:06.5384387Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5384492Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5384623Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5384751Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5384757Z 2023-01-11T21:38:06.5385177Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5385244Z @triton.jit 2023-01-11T21:38:06.5385384Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5385463Z xnumel = 1536 2023-01-11T21:38:06.5385559Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5385710Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5385800Z xmask = xindex < xnumel 2023-01-11T21:38:06.5385891Z x3 = xindex % 12 2023-01-11T21:38:06.5385966Z x1 = (xindex // 4) % 3 2023-01-11T21:38:06.5386042Z x0 = xindex % 4 2023-01-11T21:38:06.5386120Z x2 = (xindex // 12) 2023-01-11T21:38:06.5386190Z x5 = xindex 2023-01-11T21:38:06.5386261Z tmp0 = x3 2023-01-11T21:38:06.5386363Z tmp1 = (-2) + x1 2023-01-11T21:38:06.5386465Z tmp2 = (-2) + x0 2023-01-11T21:38:06.5386532Z tmp3 = 3 + x1 2023-01-11T21:38:06.5386603Z tmp4 = 3 + x0 2023-01-11T21:38:06.5386673Z tmp5 = 0 2023-01-11T21:38:06.5386811Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5386948Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5387022Z tmp8 = 3 2023-01-11T21:38:06.5387156Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5387220Z tmp10 = 4 2023-01-11T21:38:06.5387357Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5387433Z tmp12 = tmp6 + tmp5 
2023-01-11T21:38:06.5387508Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5387607Z tmp14 = 1 2023-01-11T21:38:06.5387722Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5387867Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5387973Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5388115Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5388235Z tmp19 = tl.load(in_ptr0 + (tmp18 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5388370Z tmp20 = tl.load(in_ptr1 + (tmp18 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5388449Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5388524Z tmp22 = 0.0 2023-01-11T21:38:06.5388624Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5388697Z tmp24 = tmp7 + tmp14 2023-01-11T21:38:06.5388835Z tmp25 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp17, tmp24, tmp17)) 2023-01-11T21:38:06.5388953Z tmp26 = tl.load(in_ptr0 + (tmp25 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5389087Z tmp27 = tl.load(in_ptr1 + (tmp25 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5389167Z tmp28 = tmp26 == tmp0 2023-01-11T21:38:06.5389247Z tmp29 = tmp12 < tmp9 2023-01-11T21:38:06.5389324Z tmp30 = tmp24 < tmp11 2023-01-11T21:38:06.5389396Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5389477Z tmp32 = tmp31 & tmp28 2023-01-11T21:38:06.5389554Z tmp33 = tmp23 + tmp27 2023-01-11T21:38:06.5389652Z tmp34 = tl.where(tmp32, tmp33, tmp23) 2023-01-11T21:38:06.5389721Z tmp35 = 2 2023-01-11T21:38:06.5389800Z tmp36 = tmp7 + tmp35 2023-01-11T21:38:06.5389935Z tmp37 = tl.where(tmp36 != tmp36, tmp36, tl.where(tmp36 < tmp17, tmp36, tmp17)) 2023-01-11T21:38:06.5390073Z tmp38 = tl.load(in_ptr0 + (tmp37 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5390205Z tmp39 = tl.load(in_ptr1 + (tmp37 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5390288Z tmp40 = tmp38 == tmp0 2023-01-11T21:38:06.5390370Z tmp41 = tmp36 < tmp11 2023-01-11T21:38:06.5390449Z tmp42 = tmp29 & tmp41 2023-01-11T21:38:06.5390531Z tmp43 = tmp42 & tmp40 2023-01-11T21:38:06.5390608Z tmp44 = tmp34 + tmp39 2023-01-11T21:38:06.5390700Z tmp45 = tl.where(tmp43, tmp44, tmp34) 2023-01-11T21:38:06.5390779Z tmp46 = tmp7 + tmp8 2023-01-11T21:38:06.5390915Z tmp47 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 < tmp17, tmp46, tmp17)) 2023-01-11T21:38:06.5391034Z tmp48 = tl.load(in_ptr0 + (tmp47 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5391164Z tmp49 = tl.load(in_ptr1 + (tmp47 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5391241Z tmp50 = tmp48 == tmp0 2023-01-11T21:38:06.5391325Z tmp51 = tmp46 < tmp11 2023-01-11T21:38:06.5391403Z tmp52 = tmp29 & tmp51 2023-01-11T21:38:06.5391475Z tmp53 = tmp52 & tmp50 2023-01-11T21:38:06.5391551Z tmp54 = tmp45 + tmp49 2023-01-11T21:38:06.5391648Z tmp55 = tl.where(tmp53, tmp54, tmp45) 2023-01-11T21:38:06.5391725Z tmp56 = tmp7 + tmp10 2023-01-11T21:38:06.5391865Z tmp57 = tl.where(tmp56 != tmp56, tmp56, tl.where(tmp56 < tmp17, tmp56, tmp17)) 2023-01-11T21:38:06.5391983Z tmp58 = tl.load(in_ptr0 + (tmp57 + (4*tmp16) + (12*x2)), xmask) 2023-01-11T21:38:06.5392113Z tmp59 = tl.load(in_ptr1 + (tmp57 + (4*tmp16) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5392185Z tmp60 = tmp58 == tmp0 2023-01-11T21:38:06.5392264Z tmp61 = tmp56 < tmp11 2023-01-11T21:38:06.5392342Z tmp62 = tmp29 & tmp61 2023-01-11T21:38:06.5392419Z tmp63 = tmp62 & tmp60 2023-01-11T21:38:06.5392499Z tmp64 = tmp55 + tmp59 2023-01-11T21:38:06.5392597Z tmp65 = 
tl.where(tmp63, tmp64, tmp55) 2023-01-11T21:38:06.5392678Z tmp66 = tmp6 + tmp14 2023-01-11T21:38:06.5392811Z tmp67 = tl.where(tmp66 != tmp66, tmp66, tl.where(tmp66 < tmp15, tmp66, tmp15)) 2023-01-11T21:38:06.5392928Z tmp68 = tl.load(in_ptr0 + (tmp18 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5393086Z tmp69 = tl.load(in_ptr1 + (tmp18 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5393163Z tmp70 = tmp68 == tmp0 2023-01-11T21:38:06.5393242Z tmp71 = tmp66 < tmp9 2023-01-11T21:38:06.5393321Z tmp72 = tmp13 < tmp11 2023-01-11T21:38:06.5393398Z tmp73 = tmp71 & tmp72 2023-01-11T21:38:06.5393468Z tmp74 = tmp73 & tmp70 2023-01-11T21:38:06.5393544Z tmp75 = tmp65 + tmp69 2023-01-11T21:38:06.5393642Z tmp76 = tl.where(tmp74, tmp75, tmp65) 2023-01-11T21:38:06.5393761Z tmp77 = tl.load(in_ptr0 + (tmp25 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5393891Z tmp78 = tl.load(in_ptr1 + (tmp25 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5393971Z tmp79 = tmp77 == tmp0 2023-01-11T21:38:06.5394052Z tmp80 = tmp71 & tmp30 2023-01-11T21:38:06.5394124Z tmp81 = tmp80 & tmp79 2023-01-11T21:38:06.5394200Z tmp82 = tmp76 + tmp78 2023-01-11T21:38:06.5394295Z tmp83 = tl.where(tmp81, tmp82, tmp76) 2023-01-11T21:38:06.5394413Z tmp84 = tl.load(in_ptr0 + (tmp37 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5394545Z tmp85 = tl.load(in_ptr1 + (tmp37 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5394625Z tmp86 = tmp84 == tmp0 2023-01-11T21:38:06.5394704Z tmp87 = tmp71 & tmp41 2023-01-11T21:38:06.5394775Z tmp88 = tmp87 & tmp86 2023-01-11T21:38:06.5394854Z tmp89 = tmp83 + tmp85 2023-01-11T21:38:06.5394951Z tmp90 = tl.where(tmp88, tmp89, tmp83) 2023-01-11T21:38:06.5395068Z tmp91 = tl.load(in_ptr0 + (tmp47 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5395200Z tmp92 = tl.load(in_ptr1 + (tmp47 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5395285Z tmp93 = tmp91 == tmp0 2023-01-11T21:38:06.5395411Z tmp94 = tmp71 & tmp51 2023-01-11T21:38:06.5395491Z tmp95 = tmp94 & tmp93 2023-01-11T21:38:06.5395585Z tmp96 = tmp90 + tmp92 2023-01-11T21:38:06.5395681Z tmp97 = tl.where(tmp95, tmp96, tmp90) 2023-01-11T21:38:06.5395801Z tmp98 = tl.load(in_ptr0 + (tmp57 + (4*tmp67) + (12*x2)), xmask) 2023-01-11T21:38:06.5395933Z tmp99 = tl.load(in_ptr1 + (tmp57 + (4*tmp67) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5396014Z tmp100 = tmp98 == tmp0 2023-01-11T21:38:06.5396096Z tmp101 = tmp71 & tmp61 2023-01-11T21:38:06.5396172Z tmp102 = tmp101 & tmp100 2023-01-11T21:38:06.5396254Z tmp103 = tmp97 + tmp99 2023-01-11T21:38:06.5396356Z tmp104 = tl.where(tmp102, tmp103, tmp97) 2023-01-11T21:38:06.5396434Z tmp105 = tmp6 + tmp35 2023-01-11T21:38:06.5396578Z tmp106 = tl.where(tmp105 != tmp105, tmp105, tl.where(tmp105 < tmp15, tmp105, tmp15)) 2023-01-11T21:38:06.5396700Z tmp107 = tl.load(in_ptr0 + (tmp18 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5396836Z tmp108 = tl.load(in_ptr1 + (tmp18 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5396911Z tmp109 = tmp107 == tmp0 2023-01-11T21:38:06.5396992Z tmp110 = tmp105 < tmp9 2023-01-11T21:38:06.5397074Z tmp111 = tmp110 & tmp72 2023-01-11T21:38:06.5397156Z tmp112 = tmp111 & tmp109 2023-01-11T21:38:06.5397236Z tmp113 = tmp104 + tmp108 2023-01-11T21:38:06.5397339Z tmp114 = tl.where(tmp112, tmp113, tmp104) 2023-01-11T21:38:06.5397460Z tmp115 = tl.load(in_ptr0 + (tmp25 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5397585Z tmp116 = tl.load(in_ptr1 + (tmp25 + (4*tmp106) + (12*x2)), 
xmask).to(tl.float32) 2023-01-11T21:38:06.5397666Z tmp117 = tmp115 == tmp0 2023-01-11T21:38:06.5397746Z tmp118 = tmp110 & tmp30 2023-01-11T21:38:06.5397827Z tmp119 = tmp118 & tmp117 2023-01-11T21:38:06.5397908Z tmp120 = tmp114 + tmp116 2023-01-11T21:38:06.5398010Z tmp121 = tl.where(tmp119, tmp120, tmp114) 2023-01-11T21:38:06.5398130Z tmp122 = tl.load(in_ptr0 + (tmp37 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5398266Z tmp123 = tl.load(in_ptr1 + (tmp37 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5398341Z tmp124 = tmp122 == tmp0 2023-01-11T21:38:06.5398420Z tmp125 = tmp110 & tmp41 2023-01-11T21:38:06.5398501Z tmp126 = tmp125 & tmp124 2023-01-11T21:38:06.5398609Z tmp127 = tmp121 + tmp123 2023-01-11T21:38:06.5398712Z tmp128 = tl.where(tmp126, tmp127, tmp121) 2023-01-11T21:38:06.5398831Z tmp129 = tl.load(in_ptr0 + (tmp47 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5398959Z tmp130 = tl.load(in_ptr1 + (tmp47 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5399033Z tmp131 = tmp129 == tmp0 2023-01-11T21:38:06.5399112Z tmp132 = tmp110 & tmp51 2023-01-11T21:38:06.5399190Z tmp133 = tmp132 & tmp131 2023-01-11T21:38:06.5399270Z tmp134 = tmp128 + tmp130 2023-01-11T21:38:06.5399371Z tmp135 = tl.where(tmp133, tmp134, tmp128) 2023-01-11T21:38:06.5399490Z tmp136 = tl.load(in_ptr0 + (tmp57 + (4*tmp106) + (12*x2)), xmask) 2023-01-11T21:38:06.5399625Z tmp137 = tl.load(in_ptr1 + (tmp57 + (4*tmp106) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5399699Z tmp138 = tmp136 == tmp0 2023-01-11T21:38:06.5399777Z tmp139 = tmp110 & tmp61 2023-01-11T21:38:06.5399861Z tmp140 = tmp139 & tmp138 2023-01-11T21:38:06.5399944Z tmp141 = tmp135 + tmp137 2023-01-11T21:38:06.5400044Z tmp142 = tl.where(tmp140, tmp141, tmp135) 2023-01-11T21:38:06.5400124Z tmp143 = tmp6 + tmp8 2023-01-11T21:38:06.5400268Z tmp144 = tl.where(tmp143 != tmp143, tmp143, tl.where(tmp143 < tmp15, tmp143, tmp15)) 2023-01-11T21:38:06.5400381Z tmp145 = tl.load(in_ptr0 + (tmp18 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5400509Z tmp146 = tl.load(in_ptr1 + (tmp18 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5400589Z tmp147 = tmp145 == tmp0 2023-01-11T21:38:06.5400667Z tmp148 = tmp143 < tmp9 2023-01-11T21:38:06.5400748Z tmp149 = tmp148 & tmp72 2023-01-11T21:38:06.5400856Z tmp150 = tmp149 & tmp147 2023-01-11T21:38:06.5400935Z tmp151 = tmp142 + tmp146 2023-01-11T21:38:06.5401030Z tmp152 = tl.where(tmp150, tmp151, tmp142) 2023-01-11T21:38:06.5401147Z tmp153 = tl.load(in_ptr0 + (tmp25 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5401276Z tmp154 = tl.load(in_ptr1 + (tmp25 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5401353Z tmp155 = tmp153 == tmp0 2023-01-11T21:38:06.5401435Z tmp156 = tmp148 & tmp30 2023-01-11T21:38:06.5401516Z tmp157 = tmp156 & tmp155 2023-01-11T21:38:06.5401598Z tmp158 = tmp152 + tmp154 2023-01-11T21:38:06.5401692Z tmp159 = tl.where(tmp157, tmp158, tmp152) 2023-01-11T21:38:06.5401809Z tmp160 = tl.load(in_ptr0 + (tmp37 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5401938Z tmp161 = tl.load(in_ptr1 + (tmp37 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5402019Z tmp162 = tmp160 == tmp0 2023-01-11T21:38:06.5402105Z tmp163 = tmp148 & tmp41 2023-01-11T21:38:06.5402189Z tmp164 = tmp163 & tmp162 2023-01-11T21:38:06.5402269Z tmp165 = tmp159 + tmp161 2023-01-11T21:38:06.5402363Z tmp166 = tl.where(tmp164, tmp165, tmp159) 2023-01-11T21:38:06.5402480Z tmp167 = tl.load(in_ptr0 + (tmp47 + (4*tmp144) + (12*x2)), 
xmask) 2023-01-11T21:38:06.5402611Z tmp168 = tl.load(in_ptr1 + (tmp47 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5402693Z tmp169 = tmp167 == tmp0 2023-01-11T21:38:06.5402773Z tmp170 = tmp148 & tmp51 2023-01-11T21:38:06.5402853Z tmp171 = tmp170 & tmp169 2023-01-11T21:38:06.5402931Z tmp172 = tmp166 + tmp168 2023-01-11T21:38:06.5403025Z tmp173 = tl.where(tmp171, tmp172, tmp166) 2023-01-11T21:38:06.5403142Z tmp174 = tl.load(in_ptr0 + (tmp57 + (4*tmp144) + (12*x2)), xmask) 2023-01-11T21:38:06.5403271Z tmp175 = tl.load(in_ptr1 + (tmp57 + (4*tmp144) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5403352Z tmp176 = tmp174 == tmp0 2023-01-11T21:38:06.5403434Z tmp177 = tmp148 & tmp61 2023-01-11T21:38:06.5403516Z tmp178 = tmp177 & tmp176 2023-01-11T21:38:06.5403596Z tmp179 = tmp173 + tmp175 2023-01-11T21:38:06.5403691Z tmp180 = tl.where(tmp178, tmp179, tmp173) 2023-01-11T21:38:06.5403772Z tmp181 = tmp6 + tmp10 2023-01-11T21:38:06.5403943Z tmp182 = tl.where(tmp181 != tmp181, tmp181, tl.where(tmp181 < tmp15, tmp181, tmp15)) 2023-01-11T21:38:06.5404062Z tmp183 = tl.load(in_ptr0 + (tmp18 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5404193Z tmp184 = tl.load(in_ptr1 + (tmp18 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5404274Z tmp185 = tmp183 == tmp0 2023-01-11T21:38:06.5404355Z tmp186 = tmp181 < tmp9 2023-01-11T21:38:06.5404435Z tmp187 = tmp186 & tmp72 2023-01-11T21:38:06.5404510Z tmp188 = tmp187 & tmp185 2023-01-11T21:38:06.5404591Z tmp189 = tmp180 + tmp184 2023-01-11T21:38:06.5404690Z tmp190 = tl.where(tmp188, tmp189, tmp180) 2023-01-11T21:38:06.5404807Z tmp191 = tl.load(in_ptr0 + (tmp25 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5404941Z tmp192 = tl.load(in_ptr1 + (tmp25 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5405020Z tmp193 = tmp191 == tmp0 2023-01-11T21:38:06.5405100Z tmp194 = tmp186 & tmp30 2023-01-11T21:38:06.5405174Z tmp195 = tmp194 & tmp193 2023-01-11T21:38:06.5405257Z tmp196 = tmp190 + tmp192 2023-01-11T21:38:06.5405358Z tmp197 = tl.where(tmp195, tmp196, tmp190) 2023-01-11T21:38:06.5405473Z tmp198 = tl.load(in_ptr0 + (tmp37 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5405603Z tmp199 = tl.load(in_ptr1 + (tmp37 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5405686Z tmp200 = tmp198 == tmp0 2023-01-11T21:38:06.5405767Z tmp201 = tmp186 & tmp41 2023-01-11T21:38:06.5405841Z tmp202 = tmp201 & tmp200 2023-01-11T21:38:06.5405919Z tmp203 = tmp197 + tmp199 2023-01-11T21:38:06.5406020Z tmp204 = tl.where(tmp202, tmp203, tmp197) 2023-01-11T21:38:06.5406140Z tmp205 = tl.load(in_ptr0 + (tmp47 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5406305Z tmp206 = tl.load(in_ptr1 + (tmp47 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5406382Z tmp207 = tmp205 == tmp0 2023-01-11T21:38:06.5406461Z tmp208 = tmp186 & tmp51 2023-01-11T21:38:06.5406535Z tmp209 = tmp208 & tmp207 2023-01-11T21:38:06.5406617Z tmp210 = tmp204 + tmp206 2023-01-11T21:38:06.5406718Z tmp211 = tl.where(tmp209, tmp210, tmp204) 2023-01-11T21:38:06.5406835Z tmp212 = tl.load(in_ptr0 + (tmp57 + (4*tmp182) + (12*x2)), xmask) 2023-01-11T21:38:06.5406963Z tmp213 = tl.load(in_ptr1 + (tmp57 + (4*tmp182) + (12*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5407044Z tmp214 = tmp212 == tmp0 2023-01-11T21:38:06.5407129Z tmp215 = tmp186 & tmp61 2023-01-11T21:38:06.5407203Z tmp216 = tmp215 & tmp214 2023-01-11T21:38:06.5407286Z tmp217 = tmp211 + tmp213 2023-01-11T21:38:06.5407388Z tmp218 = tl.where(tmp216, tmp217, 
tmp211) 2023-01-11T21:38:06.5407529Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp218, xmask) 2023-01-11T21:38:06.5407624Z ''') 2023-01-11T21:38:06.5407630Z 2023-01-11T21:38:06.5407634Z 2023-01-11T21:38:06.5407729Z async_compile.wait(globals()) 2023-01-11T21:38:06.5407807Z del async_compile 2023-01-11T21:38:06.5407812Z 2023-01-11T21:38:06.5407887Z def call(args): 2023-01-11T21:38:06.5407970Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5408046Z args.clear() 2023-01-11T21:38:06.5408140Z with torch.cuda.device(0): 2023-01-11T21:38:06.5408363Z buf0 = empty_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5408455Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5408630Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 1536, grid=grid(1536), stream=stream0) 2023-01-11T21:38:06.5408701Z del arg0_1 2023-01-11T21:38:06.5408766Z del arg2_1 2023-01-11T21:38:06.5408844Z return (buf0, ) 2023-01-11T21:38:06.5408852Z 2023-01-11T21:38:06.5408857Z 2023-01-11T21:38:06.5408937Z if __name__ == "__main__": 2023-01-11T21:38:06.5409054Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5409185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5409433Z arg0_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5409650Z arg1_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5409864Z arg2_1 = rand_strided((2, 64, 3, 4), (768, 12, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5409986Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5410252Z [2023-01-11 21:35:21,222] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 717 2023-01-11T21:38:06.5410258Z 2023-01-11T21:38:06.5410331Z ok (1.801s) 2023-01-11T21:38:06.5410806Z test_max_pool2d_with_indices_backward5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5410942Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5411198Z [2023-01-11 21:35:21,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 718 2023-01-11T21:38:06.5411458Z [2023-01-11 21:35:21,261] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.5411716Z [2023-01-11 21:35:21,264] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 718 2023-01-11T21:38:06.5412131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5412298Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5412554Z [2023-01-11 21:35:21,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 719 2023-01-11T21:38:06.5412806Z [2023-01-11 21:35:21,299] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.max_pool2d_with_indices_backward 2023-01-11T21:38:06.5413071Z [2023-01-11 21:35:21,302] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 719 2023-01-11T21:38:06.5413077Z 2023-01-11T21:38:06.5413173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5413246Z import torch 2023-01-11T21:38:06.5413320Z import random 2023-01-11T21:38:06.5413438Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5413563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5413568Z 2023-01-11T21:38:06.5413652Z aten = torch.ops.aten 2023-01-11T21:38:06.5413782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5413877Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5413885Z 2023-01-11T21:38:06.5413960Z import triton 2023-01-11T21:38:06.5414052Z import triton.language as tl 2023-01-11T21:38:06.5414176Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5414317Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5414323Z 2023-01-11T21:38:06.5414327Z 2023-01-11T21:38:06.5414419Z async_compile.wait(globals()) 2023-01-11T21:38:06.5414614Z del async_compile 2023-01-11T21:38:06.5414620Z 2023-01-11T21:38:06.5414689Z def call(args): 2023-01-11T21:38:06.5414776Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5414852Z args.clear() 2023-01-11T21:38:06.5414948Z with torch.cuda.device(0): 2023-01-11T21:38:06.5415110Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.5415183Z del arg0_1 2023-01-11T21:38:06.5415255Z del arg1_1 2023-01-11T21:38:06.5415319Z del arg2_1 2023-01-11T21:38:06.5415436Z buf1 = buf0 2023-01-11T21:38:06.5415557Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.5415630Z del buf0 2023-01-11T21:38:06.5415708Z return (buf1, ) 2023-01-11T21:38:06.5415715Z 2023-01-11T21:38:06.5415720Z 2023-01-11T21:38:06.5415824Z if __name__ == "__main__": 2023-01-11T21:38:06.5415955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5416094Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5416318Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5416546Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5416767Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5416895Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5416900Z 2023-01-11T21:38:06.5416906Z 2023-01-11T21:38:06.5417004Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5417077Z import torch 2023-01-11T21:38:06.5417213Z import random 2023-01-11T21:38:06.5417339Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5417456Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5417461Z 2023-01-11T21:38:06.5417547Z aten = 
torch.ops.aten 2023-01-11T21:38:06.5417683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5417779Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5417784Z 2023-01-11T21:38:06.5417857Z import triton 2023-01-11T21:38:06.5418032Z import triton.language as tl 2023-01-11T21:38:06.5418157Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5418289Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5418299Z 2023-01-11T21:38:06.5418304Z 2023-01-11T21:38:06.5418389Z async_compile.wait(globals()) 2023-01-11T21:38:06.5418468Z del async_compile 2023-01-11T21:38:06.5418473Z 2023-01-11T21:38:06.5418550Z def call(args): 2023-01-11T21:38:06.5418638Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5418714Z args.clear() 2023-01-11T21:38:06.5418806Z with torch.cuda.device(0): 2023-01-11T21:38:06.5418967Z buf0 = aten.max_pool2d_with_indices_backward(arg0_1, arg1_1, [13, 13], [1, 1], [2, 2], [1, 1], False, arg2_1) 2023-01-11T21:38:06.5419034Z del arg0_1 2023-01-11T21:38:06.5419107Z del arg1_1 2023-01-11T21:38:06.5419178Z del arg2_1 2023-01-11T21:38:06.5419250Z buf1 = buf0 2023-01-11T21:38:06.5419366Z assert_size_stride(buf1, (2, 64, 20, 20), (25600, 400, 20, 1)) 2023-01-11T21:38:06.5419441Z del buf0 2023-01-11T21:38:06.5419516Z return (buf1, ) 2023-01-11T21:38:06.5419521Z 2023-01-11T21:38:06.5419525Z 2023-01-11T21:38:06.5419606Z if __name__ == "__main__": 2023-01-11T21:38:06.5419719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5419846Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5420073Z arg0_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5420293Z arg1_1 = rand_strided((2, 64, 20, 20), (25600, 400, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5420509Z arg2_1 = rand_strided((2, 64, 12, 12), (9216, 144, 12, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5420635Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5420641Z 2023-01-11T21:38:06.5420711Z ok (0.080s) 2023-01-11T21:38:06.5421217Z test_max_pool2d_with_indices_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5421352Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5421603Z [2023-01-11 21:35:21,325] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 720 2023-01-11T21:38:06.5421864Z [2023-01-11 21:35:21,462] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 720 2023-01-11T21:38:06.5422277Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5422409Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5422669Z [2023-01-11 21:35:21,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 721 2023-01-11T21:38:06.5422932Z [2023-01-11 21:35:21,609] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 721 2023-01-11T21:38:06.5422937Z 2023-01-11T21:38:06.5423036Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5423110Z import torch 2023-01-11T21:38:06.5423184Z import random 2023-01-11T21:38:06.5423297Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5423424Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5423430Z 2023-01-11T21:38:06.5423510Z aten = torch.ops.aten 2023-01-11T21:38:06.5423646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5423764Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5423769Z 2023-01-11T21:38:06.5423840Z import triton 2023-01-11T21:38:06.5423931Z import triton.language as tl 2023-01-11T21:38:06.5424058Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5424193Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5424198Z 2023-01-11T21:38:06.5424207Z 2023-01-11T21:38:06.5424404Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5424477Z import triton 2023-01-11T21:38:06.5424567Z import triton.language as tl 2023-01-11T21:38:06.5424681Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5424782Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5424917Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5425042Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5425047Z 2023-01-11T21:38:06.5425463Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5425536Z @triton.jit 2023-01-11T21:38:06.5425679Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5425753Z xnumel = 2016 2023-01-11T21:38:06.5425850Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5425979Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5426061Z xmask = xindex < xnumel 2023-01-11T21:38:06.5426138Z x3 = xindex % 252 2023-01-11T21:38:06.5426214Z x1 = (xindex // 14) % 18 2023-01-11T21:38:06.5426287Z x0 = xindex % 14 2023-01-11T21:38:06.5426365Z x2 = (xindex // 252) 2023-01-11T21:38:06.5426432Z x5 = xindex 2023-01-11T21:38:06.5426502Z tmp0 = x3 2023-01-11T21:38:06.5426577Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5426647Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5426722Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5426798Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5426866Z tmp5 = 0 2023-01-11T21:38:06.5427002Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5427163Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5427232Z tmp8 = 9 2023-01-11T21:38:06.5427356Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5427427Z tmp10 = 7 2023-01-11T21:38:06.5427564Z 
tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5427643Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5427719Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5427790Z tmp14 = 1 2023-01-11T21:38:06.5427902Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5428039Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5428155Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5428294Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5428415Z tmp19 = tl.load(in_ptr0 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5428531Z tmp20 = tl.load(in_ptr1 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5428613Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5428685Z tmp22 = 0.0 2023-01-11T21:38:06.5428778Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5428912Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.5428995Z ''') 2023-01-11T21:38:06.5429001Z 2023-01-11T21:38:06.5429005Z 2023-01-11T21:38:06.5429099Z async_compile.wait(globals()) 2023-01-11T21:38:06.5429175Z del async_compile 2023-01-11T21:38:06.5429180Z 2023-01-11T21:38:06.5429254Z def call(args): 2023-01-11T21:38:06.5429338Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5429414Z args.clear() 2023-01-11T21:38:06.5429528Z with torch.cuda.device(0): 2023-01-11T21:38:06.5429753Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5429845Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5430030Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 2016, grid=grid(2016), stream=stream0) 2023-01-11T21:38:06.5430102Z del arg0_1 2023-01-11T21:38:06.5430175Z del arg2_1 2023-01-11T21:38:06.5430257Z return (buf0, ) 2023-01-11T21:38:06.5430262Z 2023-01-11T21:38:06.5430267Z 2023-01-11T21:38:06.5430344Z if __name__ == "__main__": 2023-01-11T21:38:06.5430454Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5430579Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5430796Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5431016Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5431233Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5431357Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5431363Z 2023-01-11T21:38:06.5431367Z 2023-01-11T21:38:06.5431467Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5431540Z import torch 2023-01-11T21:38:06.5431606Z import random 2023-01-11T21:38:06.5431724Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5431847Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5431852Z 2023-01-11T21:38:06.5431933Z aten = torch.ops.aten 2023-01-11T21:38:06.5432072Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5432167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5432172Z 2023-01-11T21:38:06.5432244Z import triton 2023-01-11T21:38:06.5432330Z import triton.language as tl 2023-01-11T21:38:06.5432455Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5432594Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5432599Z 2023-01-11T21:38:06.5432603Z 
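[Annotation] The two code dumps around this point are the fp32 and fp16 runs of test_max_pool2d_with_indices_backward_cuda (graphs 720 and 721 above). The fp16 kernel that follows is identical to the fp32 one above except that its gradient pointers are typed '*fp16' and the gradient load gains .to(tl.float32), so the index compare-and-select runs in fp32 before the result is stored back through the half-precision output pointer. Below is a minimal sketch of one way to reach this lowering from Python; the shapes are taken from the rand_strided calls in this log, but the helper name bwd and the use of torch.compile are assumptions — this is not how the test suite drives it:

    # Hypothetical repro sketch; not part of test_torchinductor.py.
    import torch
    import torch.nn.functional as F

    def bwd(grad_out, inp, indices):
        # The op Inductor lowers to the pointwise Triton kernel dumped here.
        # Signature: (grad_output, self, kernel_size, stride, padding,
        #             dilation, ceil_mode, indices)
        return torch.ops.aten.max_pool2d_with_indices_backward(
            grad_out, inp, [2, 2], [2, 2], [0, 0], [1, 1], False, indices
        )

    inp = torch.randn(2, 4, 18, 14, device="cuda")           # float32 run
    _out, idx = F.max_pool2d(inp, 2, return_indices=True)    # idx: (2, 4, 9, 7), int64
    grad_out = torch.randn_like(_out)
    compiled = torch.compile(bwd)                            # Inductor is the default backend
    grad_in = compiled(grad_out, inp, idx)                   # (2, 4, 18, 14), as in buf0 above

Repeating the call with grad_out and inp cast to .half() should exercise the fp16 variant below, whose only difference is the '*fp16' signature entries and the .to(tl.float32) upcast on the gradient load. [End annotation]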
2023-01-11T21:38:06.5432803Z triton_fused_max_pool2d_with_indices_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5432876Z import triton 2023-01-11T21:38:06.5432997Z import triton.language as tl 2023-01-11T21:38:06.5433113Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5433212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5433337Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5433462Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5433467Z 2023-01-11T21:38:06.5433884Z @pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5433960Z @triton.jit 2023-01-11T21:38:06.5434102Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5434179Z xnumel = 2016 2023-01-11T21:38:06.5434275Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5434404Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5434480Z xmask = xindex < xnumel 2023-01-11T21:38:06.5434555Z x3 = xindex % 252 2023-01-11T21:38:06.5434637Z x1 = (xindex // 14) % 18 2023-01-11T21:38:06.5434711Z x0 = xindex % 14 2023-01-11T21:38:06.5434788Z x2 = (xindex // 252) 2023-01-11T21:38:06.5434860Z x5 = xindex 2023-01-11T21:38:06.5434933Z tmp0 = x3 2023-01-11T21:38:06.5435001Z tmp1 = (x1 // 2) 2023-01-11T21:38:06.5435075Z tmp2 = (x0 // 2) 2023-01-11T21:38:06.5435156Z tmp3 = 1 + (x1 // 2) 2023-01-11T21:38:06.5435232Z tmp4 = 1 + (x0 // 2) 2023-01-11T21:38:06.5435311Z tmp5 = 0 2023-01-11T21:38:06.5435465Z tmp6 = tl.where(tmp1 != tmp1, tmp1, tl.where(tmp1 > tmp5, tmp1, tmp5)) 2023-01-11T21:38:06.5435653Z tmp7 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 > tmp5, tmp2, tmp5)) 2023-01-11T21:38:06.5435717Z tmp8 = 9 2023-01-11T21:38:06.5435846Z tmp9 = tl.where(tmp3 != tmp3, tmp3, tl.where(tmp3 < tmp8, tmp3, tmp8)) 2023-01-11T21:38:06.5435917Z tmp10 = 7 2023-01-11T21:38:06.5436060Z tmp11 = tl.where(tmp4 != tmp4, tmp4, tl.where(tmp4 < tmp10, tmp4, tmp10)) 2023-01-11T21:38:06.5436136Z tmp12 = tmp6 + tmp5 2023-01-11T21:38:06.5436215Z tmp13 = tmp7 + tmp5 2023-01-11T21:38:06.5436286Z tmp14 = 1 2023-01-11T21:38:06.5436392Z tmp15 = tmp9 - tmp14 2023-01-11T21:38:06.5436534Z tmp16 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 < tmp15, tmp12, tmp15)) 2023-01-11T21:38:06.5436646Z tmp17 = tmp11 - tmp14 2023-01-11T21:38:06.5436785Z tmp18 = tl.where(tmp13 != tmp13, tmp13, tl.where(tmp13 < tmp17, tmp13, tmp17)) 2023-01-11T21:38:06.5436904Z tmp19 = tl.load(in_ptr0 + (tmp18 + (7*tmp16) + (63*x2)), xmask) 2023-01-11T21:38:06.5437041Z tmp20 = tl.load(in_ptr1 + (tmp18 + (7*tmp16) + (63*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.5437119Z tmp21 = tmp19 == tmp0 2023-01-11T21:38:06.5437184Z tmp22 = 0.0 2023-01-11T21:38:06.5437281Z tmp23 = tl.where(tmp21, tmp20, tmp22) 2023-01-11T21:38:06.5437419Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.5437504Z ''') 2023-01-11T21:38:06.5437510Z 2023-01-11T21:38:06.5437514Z 2023-01-11T21:38:06.5437610Z async_compile.wait(globals()) 2023-01-11T21:38:06.5437687Z del async_compile 2023-01-11T21:38:06.5437692Z 2023-01-11T21:38:06.5437767Z def call(args): 2023-01-11T21:38:06.5437855Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.5437924Z args.clear() 2023-01-11T21:38:06.5438018Z with torch.cuda.device(0): 
2023-01-11T21:38:06.5438243Z buf0 = empty_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5438333Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5438510Z triton_fused_max_pool2d_with_indices_backward_0.run(arg2_1, arg0_1, buf0, 2016, grid=grid(2016), stream=stream0) 2023-01-11T21:38:06.5438584Z del arg0_1 2023-01-11T21:38:06.5438656Z del arg2_1 2023-01-11T21:38:06.5438726Z return (buf0, ) 2023-01-11T21:38:06.5438731Z 2023-01-11T21:38:06.5438769Z 2023-01-11T21:38:06.5438844Z if __name__ == "__main__": 2023-01-11T21:38:06.5438964Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5439091Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5439304Z arg0_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5439526Z arg1_1 = rand_strided((2, 4, 18, 14), (1008, 252, 14, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5439736Z arg2_1 = rand_strided((2, 4, 9, 7), (252, 63, 7, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5439862Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.5439870Z 2023-01-11T21:38:06.5439935Z ok (0.307s) 2023-01-11T21:38:06.5440387Z test_mean_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5440519Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5440776Z [2023-01-11 21:35:21,630] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 722 2023-01-11T21:38:06.5441041Z [2023-01-11 21:35:21,837] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 722 2023-01-11T21:38:06.5441457Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5441617Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5441874Z [2023-01-11 21:35:21,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 723 2023-01-11T21:38:06.5442078Z [2023-01-11 21:35:22,040] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.5442282Z [2023-01-11 21:35:22,042] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5442287Z 2023-01-11T21:38:06.5442386Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5442454Z import torch 2023-01-11T21:38:06.5442529Z import random 2023-01-11T21:38:06.5442648Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5442773Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5442778Z 2023-01-11T21:38:06.5442860Z aten = torch.ops.aten 2023-01-11T21:38:06.5443000Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5443094Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5443099Z 2023-01-11T21:38:06.5443176Z import triton 2023-01-11T21:38:06.5443261Z import triton.language as tl 2023-01-11T21:38:06.5443384Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5443520Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5443526Z 2023-01-11T21:38:06.5443530Z 2023-01-11T21:38:06.5443685Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5443759Z import triton 2023-01-11T21:38:06.5443850Z import triton.language as tl 2023-01-11T21:38:06.5443962Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5444056Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5444186Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5444313Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5444319Z 2023-01-11T21:38:06.5444404Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5444519Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5444607Z filename=__file__, 2023-01-11T21:38:06.5445007Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5445082Z @triton.jit 2023-01-11T21:38:06.5445247Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5445320Z xnumel = 1 2023-01-11T21:38:06.5445392Z rnumel = 64 2023-01-11T21:38:06.5445494Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5445627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5445713Z xmask = xindex < xnumel 2023-01-11T21:38:06.5445831Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5445942Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5446048Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5446137Z rindex = roffset + rbase 2023-01-11T21:38:06.5446222Z rmask = rindex < rnumel 2023-01-11T21:38:06.5446295Z r0 = rindex 2023-01-11T21:38:06.5446396Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5446518Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5446625Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5446696Z tmp2 = 64 
2023-01-11T21:38:06.5446774Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5446914Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, None) 2023-01-11T21:38:06.5446999Z ''') 2023-01-11T21:38:06.5447005Z 2023-01-11T21:38:06.5447045Z 2023-01-11T21:38:06.5447206Z triton_fused_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5447283Z import triton 2023-01-11T21:38:06.5447375Z import triton.language as tl 2023-01-11T21:38:06.5447481Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5447581Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5447717Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5447840Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5447845Z 2023-01-11T21:38:06.5447934Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.5448047Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5448132Z filename=__file__, 2023-01-11T21:38:06.5448496Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5448569Z @triton.jit 2023-01-11T21:38:06.5448741Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5448814Z xnumel = 8 2023-01-11T21:38:06.5448887Z rnumel = 8 2023-01-11T21:38:06.5448985Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5449123Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5449206Z xmask = xindex < xnumel 2023-01-11T21:38:06.5449318Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5449390Z x0 = xindex 2023-01-11T21:38:06.5449511Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5449616Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5449704Z rindex = roffset + rbase 2023-01-11T21:38:06.5449787Z rmask = rindex < rnumel 2023-01-11T21:38:06.5449861Z r1 = rindex 2023-01-11T21:38:06.5449972Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.5450091Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5450207Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5450279Z tmp2 = 8 2023-01-11T21:38:06.5450357Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5450497Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5450608Z ''') 2023-01-11T21:38:06.5450614Z 2023-01-11T21:38:06.5450618Z 2023-01-11T21:38:06.5450769Z triton_fused_mean_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.5450845Z import triton 2023-01-11T21:38:06.5450936Z import triton.language as tl 2023-01-11T21:38:06.5451049Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5451151Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5451284Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5451408Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5451414Z 2023-01-11T21:38:06.5451818Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5451889Z @triton.jit 2023-01-11T21:38:06.5452020Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5452097Z xnumel = 16 
2023-01-11T21:38:06.5452191Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5452320Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5452400Z xmask = xindex < xnumel 2023-01-11T21:38:06.5452472Z x0 = xindex % 8 2023-01-11T21:38:06.5452543Z x1 = (xindex // 8) 2023-01-11T21:38:06.5452612Z x2 = xindex 2023-01-11T21:38:06.5452813Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453019Z tmp1 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453224Z tmp3 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453459Z tmp5 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5453539Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5453616Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5453686Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5453758Z tmp7 = 4 2023-01-11T21:38:06.5453836Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.5453970Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5454057Z ''') 2023-01-11T21:38:06.5454062Z 2023-01-11T21:38:06.5454067Z 2023-01-11T21:38:06.5454223Z triton_fused_mean_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.5454296Z import triton 2023-01-11T21:38:06.5454382Z import triton.language as tl 2023-01-11T21:38:06.5454613Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5454717Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5454853Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5454981Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5454986Z 2023-01-11T21:38:06.5455433Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5455518Z @triton.jit 2023-01-11T21:38:06.5455648Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5455715Z xnumel = 32 2023-01-11T21:38:06.5455810Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5455937Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5456018Z xmask = xindex < xnumel 2023-01-11T21:38:06.5456088Z x0 = xindex 2023-01-11T21:38:06.5456182Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5456285Z tmp1 = tl.load(in_ptr0 + (32 + x0), xmask) 2023-01-11T21:38:06.5456357Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5456428Z tmp3 = 2 2023-01-11T21:38:06.5456504Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5456638Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5456723Z ''') 2023-01-11T21:38:06.5456728Z 2023-01-11T21:38:06.5456733Z 2023-01-11T21:38:06.5456826Z async_compile.wait(globals()) 2023-01-11T21:38:06.5456946Z del async_compile 2023-01-11T21:38:06.5456951Z 2023-01-11T21:38:06.5457019Z def call(args): 2023-01-11T21:38:06.5457093Z arg0_1, = args 2023-01-11T21:38:06.5457231Z args.clear() 2023-01-11T21:38:06.5457337Z with torch.cuda.device(0): 2023-01-11T21:38:06.5457550Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5457640Z buf5 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5457729Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5457861Z triton_fused_mean_0.run(buf5, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5458062Z buf1 = 
empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458156Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5458296Z triton_fused_mean_1_1.run(buf2, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5458507Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458645Z triton_fused_mean_2_2.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5458843Z buf4 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5458978Z triton_fused_mean_3_3.run(arg0_1, buf4, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5459045Z del arg0_1 2023-01-11T21:38:06.5459139Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5459144Z 2023-01-11T21:38:06.5459149Z 2023-01-11T21:38:06.5459230Z if __name__ == "__main__": 2023-01-11T21:38:06.5459349Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5459515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5459727Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5459838Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5459843Z 2023-01-11T21:38:06.5460109Z [2023-01-11 21:35:22,132] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 723 2023-01-11T21:38:06.5460115Z 2023-01-11T21:38:06.5460213Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5460281Z import torch 2023-01-11T21:38:06.5460353Z import random 2023-01-11T21:38:06.5460473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5460596Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5460601Z 2023-01-11T21:38:06.5460682Z aten = torch.ops.aten 2023-01-11T21:38:06.5460816Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5460911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5460920Z 2023-01-11T21:38:06.5460986Z import triton 2023-01-11T21:38:06.5461079Z import triton.language as tl 2023-01-11T21:38:06.5461203Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5461341Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5461346Z 2023-01-11T21:38:06.5461354Z 2023-01-11T21:38:06.5461509Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5461583Z import triton 2023-01-11T21:38:06.5461672Z import triton.language as tl 2023-01-11T21:38:06.5461785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5461880Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5462011Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5462134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5462139Z 2023-01-11T21:38:06.5462228Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5462342Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5462430Z filename=__file__, 2023-01-11T21:38:06.5462791Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5462864Z @triton.jit 2023-01-11T21:38:06.5463052Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5463126Z xnumel = 1 2023-01-11T21:38:06.5463198Z rnumel = 64 2023-01-11T21:38:06.5463294Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5463429Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5463512Z xmask = xindex < xnumel 2023-01-11T21:38:06.5463632Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5463744Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5463848Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5463938Z rindex = roffset + rbase 2023-01-11T21:38:06.5464022Z rmask = rindex < rnumel 2023-01-11T21:38:06.5464093Z r0 = rindex 2023-01-11T21:38:06.5464212Z tmp0 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5464300Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5464416Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.5464529Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5464600Z tmp3 = 64 2023-01-11T21:38:06.5464678Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5464763Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5464896Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, None) 2023-01-11T21:38:06.5464981Z ''') 2023-01-11T21:38:06.5464986Z 2023-01-11T21:38:06.5464990Z 2023-01-11T21:38:06.5465149Z triton_fused_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5465217Z import triton 2023-01-11T21:38:06.5465307Z import triton.language as tl 2023-01-11T21:38:06.5465457Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5465575Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5465727Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5465854Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5465859Z 2023-01-11T21:38:06.5465948Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.5466056Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5466139Z filename=__file__, 2023-01-11T21:38:06.5466494Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5466569Z @triton.jit 2023-01-11T21:38:06.5466736Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5466809Z xnumel = 8 2023-01-11T21:38:06.5466883Z rnumel = 8 2023-01-11T21:38:06.5466973Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5467110Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5467191Z xmask = xindex < xnumel 2023-01-11T21:38:06.5467309Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5467381Z x0 = xindex 2023-01-11T21:38:06.5467498Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5467602Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5467682Z rindex = roffset + rbase 2023-01-11T21:38:06.5467769Z rmask = rindex < rnumel 2023-01-11T21:38:06.5467841Z r1 = rindex 2023-01-11T21:38:06.5467973Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.5468062Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5468179Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.5468293Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5468366Z tmp3 = 8 2023-01-11T21:38:06.5468439Z tmp4 = tmp2 / tmp3 2023-01-11T21:38:06.5468522Z tmp5 = tmp4.to(tl.float32) 2023-01-11T21:38:06.5468657Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 
2023-01-11T21:38:06.5468742Z ''') 2023-01-11T21:38:06.5468748Z 2023-01-11T21:38:06.5468781Z 2023-01-11T21:38:06.5468940Z triton_fused_mean_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.5469010Z import triton 2023-01-11T21:38:06.5469100Z import triton.language as tl 2023-01-11T21:38:06.5469207Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5469308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5469437Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5469561Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5469567Z 2023-01-11T21:38:06.5469970Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5470046Z @triton.jit 2023-01-11T21:38:06.5470177Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5470250Z xnumel = 16 2023-01-11T21:38:06.5470341Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5470473Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5470554Z xmask = xindex < xnumel 2023-01-11T21:38:06.5470628Z x0 = xindex % 8 2023-01-11T21:38:06.5470707Z x1 = (xindex // 8) 2023-01-11T21:38:06.5470781Z x2 = xindex 2023-01-11T21:38:06.5471008Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471229Z tmp2 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471454Z tmp5 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471703Z tmp8 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5471794Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5471883Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5471962Z tmp4 = tmp1 + tmp3 2023-01-11T21:38:06.5472047Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.5472118Z tmp7 = tmp4 + tmp6 2023-01-11T21:38:06.5472204Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.5472281Z tmp10 = tmp7 + tmp9 2023-01-11T21:38:06.5472349Z tmp11 = 4 2023-01-11T21:38:06.5472430Z tmp12 = tmp10 / tmp11 2023-01-11T21:38:06.5472518Z tmp13 = tmp12.to(tl.float32) 2023-01-11T21:38:06.5472654Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5472732Z ''') 2023-01-11T21:38:06.5472743Z 2023-01-11T21:38:06.5472748Z 2023-01-11T21:38:06.5472897Z triton_fused_mean_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.5472974Z import triton 2023-01-11T21:38:06.5473068Z import triton.language as tl 2023-01-11T21:38:06.5473178Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5473281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5473411Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5473540Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5473546Z 2023-01-11T21:38:06.5473941Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5474014Z @triton.jit 2023-01-11T21:38:06.5474146Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.5474220Z xnumel = 32 2023-01-11T21:38:06.5474313Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5474443Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5474530Z xmask = xindex < xnumel 2023-01-11T21:38:06.5474599Z x0 = xindex 2023-01-11T21:38:06.5474708Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5474826Z tmp2 = tl.load(in_ptr0 + (32 + x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5474943Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5475030Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5475110Z tmp4 = tmp1 + tmp3 2023-01-11T21:38:06.5475181Z tmp5 = 2 2023-01-11T21:38:06.5475258Z tmp6 = tmp4 / tmp5 2023-01-11T21:38:06.5475350Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.5475502Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5475607Z ''') 2023-01-11T21:38:06.5475613Z 2023-01-11T21:38:06.5475617Z 2023-01-11T21:38:06.5475709Z async_compile.wait(globals()) 2023-01-11T21:38:06.5475786Z del async_compile 2023-01-11T21:38:06.5475791Z 2023-01-11T21:38:06.5475866Z def call(args): 2023-01-11T21:38:06.5475943Z arg0_1, = args 2023-01-11T21:38:06.5476013Z args.clear() 2023-01-11T21:38:06.5476101Z with torch.cuda.device(0): 2023-01-11T21:38:06.5476290Z buf5 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5476383Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5476524Z triton_fused_mean_0.run(arg0_1, buf5, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5476728Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5476867Z triton_fused_mean_1_1.run(arg0_1, buf2, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5477076Z buf3 = empty_strided((1, 2, 1, 8), (16, 8, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5477205Z triton_fused_mean_2_2.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5477402Z buf4 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5477569Z triton_fused_mean_3_3.run(arg0_1, buf4, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5477642Z del arg0_1 2023-01-11T21:38:06.5477737Z return (buf5, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5477743Z 2023-01-11T21:38:06.5477748Z 2023-01-11T21:38:06.5477828Z if __name__ == "__main__": 2023-01-11T21:38:06.5477947Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5478072Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5478281Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5478393Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5478398Z 2023-01-11T21:38:06.5478466Z ok (0.524s) 2023-01-11T21:38:06.5478931Z test_min_max_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5479064Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5479326Z [2023-01-11 21:35:22,157] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 724 2023-01-11T21:38:06.5479590Z [2023-01-11 21:35:22,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 724 2023-01-11T21:38:06.5480004Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5480135Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5480392Z [2023-01-11 21:35:22,277] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 725 2023-01-11T21:38:06.5480655Z [2023-01-11 21:35:22,375] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 725 2023-01-11T21:38:06.5480661Z 2023-01-11T21:38:06.5480779Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5480856Z import torch 2023-01-11T21:38:06.5480927Z import random 2023-01-11T21:38:06.5481046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5481170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5481175Z 2023-01-11T21:38:06.5481255Z aten = torch.ops.aten 2023-01-11T21:38:06.5481390Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5481479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5481490Z 2023-01-11T21:38:06.5481558Z import triton 2023-01-11T21:38:06.5481647Z import triton.language as tl 2023-01-11T21:38:06.5481776Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5481916Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5481921Z 2023-01-11T21:38:06.5481926Z 2023-01-11T21:38:06.5482125Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5482202Z import triton 2023-01-11T21:38:06.5482295Z import triton.language as tl 2023-01-11T21:38:06.5482401Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5482500Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5482630Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5482755Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5482760Z 2023-01-11T21:38:06.5482849Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5482962Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5483046Z filename=__file__, 2023-01-11T21:38:06.5483453Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.5483549Z @triton.jit 2023-01-11T21:38:06.5483745Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5483817Z xnumel = 1 2023-01-11T21:38:06.5483889Z rnumel = 64 2023-01-11T21:38:06.5483985Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5484120Z xindex = 
xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5484202Z xmask = xindex < xnumel 2023-01-11T21:38:06.5484314Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5484496Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5484622Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5484805Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5484911Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5484997Z rindex = roffset + rbase 2023-01-11T21:38:06.5485085Z rmask = rindex < rnumel 2023-01-11T21:38:06.5485158Z r0 = rindex 2023-01-11T21:38:06.5485346Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5485563Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5485672Z tmp4 = tl.load(in_ptr1 + (r0), rmask) 2023-01-11T21:38:06.5485786Z tmp7 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5485865Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5485991Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5486071Z tmp5 = tmp0 + tmp4 2023-01-11T21:38:06.5486191Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5486264Z tmp8 = 1 2023-01-11T21:38:06.5486345Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.5486473Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.5486589Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5486747Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5486865Z tmp6 = tl.reshape(tl.min(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5486990Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5487104Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5487234Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp10, None) 2023-01-11T21:38:06.5487319Z ''') 2023-01-11T21:38:06.5487324Z 2023-01-11T21:38:06.5487329Z 2023-01-11T21:38:06.5487422Z async_compile.wait(globals()) 2023-01-11T21:38:06.5487498Z del async_compile 2023-01-11T21:38:06.5487503Z 2023-01-11T21:38:06.5487578Z def call(args): 2023-01-11T21:38:06.5487658Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5487726Z args.clear() 2023-01-11T21:38:06.5487817Z with torch.cuda.device(0): 2023-01-11T21:38:06.5488006Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488191Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488392Z buf2 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5488484Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5488662Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5488729Z del arg0_1 2023-01-11T21:38:06.5488797Z del arg1_1 2023-01-11T21:38:06.5488887Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.5488892Z 2023-01-11T21:38:06.5488897Z 2023-01-11T21:38:06.5488978Z if __name__ == "__main__": 2023-01-11T21:38:06.5489095Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5489265Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5489467Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5489663Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 
2023-01-11T21:38:06.5489779Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5489791Z 2023-01-11T21:38:06.5489795Z 2023-01-11T21:38:06.5489885Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5489963Z import torch 2023-01-11T21:38:06.5490037Z import random 2023-01-11T21:38:06.5490156Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5490282Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5490287Z 2023-01-11T21:38:06.5490369Z aten = torch.ops.aten 2023-01-11T21:38:06.5490506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5490594Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5490601Z 2023-01-11T21:38:06.5490678Z import triton 2023-01-11T21:38:06.5490772Z import triton.language as tl 2023-01-11T21:38:06.5490897Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5491035Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5491040Z 2023-01-11T21:38:06.5491045Z 2023-01-11T21:38:06.5491245Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5491319Z import triton 2023-01-11T21:38:06.5491409Z import triton.language as tl 2023-01-11T21:38:06.5491515Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5491614Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5491744Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5491865Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5491870Z 2023-01-11T21:38:06.5491962Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.5492075Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5492163Z filename=__file__, 2023-01-11T21:38:06.5492607Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 6), equal_to_1=())]}) 2023-01-11T21:38:06.5492675Z @triton.jit 2023-01-11T21:38:06.5492869Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5492939Z xnumel = 1 2023-01-11T21:38:06.5493013Z rnumel = 64 2023-01-11T21:38:06.5493109Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5493245Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5493329Z xmask = xindex < xnumel 2023-01-11T21:38:06.5493440Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5493621Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5493750Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5493935Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5494041Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5494130Z rindex = roffset + rbase 2023-01-11T21:38:06.5494213Z rmask = rindex < rnumel 2023-01-11T21:38:06.5494284Z r0 = rindex 2023-01-11T21:38:06.5494605Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5494823Z tmp1 = tl.load(in_ptr1 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5494942Z tmp4 = tl.load(in_ptr1 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5495059Z tmp7 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5495140Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.5495267Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5495391Z tmp5 = tmp0 + tmp4 2023-01-11T21:38:06.5495511Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5495581Z tmp8 = 1 2023-01-11T21:38:06.5495660Z tmp9 = tmp7 + tmp8 2023-01-11T21:38:06.5495795Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.5495910Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496038Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5496153Z tmp6 = tl.reshape(tl.min(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496276Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5496392Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5496525Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp10, None) 2023-01-11T21:38:06.5496610Z ''') 2023-01-11T21:38:06.5496616Z 2023-01-11T21:38:06.5496623Z 2023-01-11T21:38:06.5496717Z async_compile.wait(globals()) 2023-01-11T21:38:06.5496794Z del async_compile 2023-01-11T21:38:06.5496800Z 2023-01-11T21:38:06.5496873Z def call(args): 2023-01-11T21:38:06.5496952Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5497020Z args.clear() 2023-01-11T21:38:06.5497113Z with torch.cuda.device(0): 2023-01-11T21:38:06.5497375Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497560Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497762Z buf2 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5497854Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5498033Z triton_fused_add_add_1_add_2_amax_max_1_min_1_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5498105Z del arg0_1 2023-01-11T21:38:06.5498170Z del arg1_1 2023-01-11T21:38:06.5498258Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.5498267Z 2023-01-11T21:38:06.5498271Z 2023-01-11T21:38:06.5498349Z if __name__ == "__main__": 2023-01-11T21:38:06.5498468Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5498596Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5498836Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5499034Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5499147Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5499158Z 2023-01-11T21:38:06.5499221Z ok (0.243s) 2023-01-11T21:38:06.5499696Z test_misaligned_address_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5499831Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5500088Z [2023-01-11 21:35:22,396] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 726 2023-01-11T21:38:06.5500351Z [2023-01-11 21:35:22,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 726 2023-01-11T21:38:06.5500767Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5500900Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5501152Z [2023-01-11 21:35:22,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 727 2023-01-11T21:38:06.5501445Z [2023-01-11 21:35:22,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 727 2023-01-11T21:38:06.5501450Z 2023-01-11T21:38:06.5501551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5501623Z import torch 2023-01-11T21:38:06.5501699Z import random 2023-01-11T21:38:06.5501820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5501946Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5501952Z 2023-01-11T21:38:06.5502036Z aten = torch.ops.aten 2023-01-11T21:38:06.5502176Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5502273Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5502278Z 2023-01-11T21:38:06.5502355Z import triton 2023-01-11T21:38:06.5502443Z import triton.language as tl 2023-01-11T21:38:06.5502570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5502714Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5502720Z 2023-01-11T21:38:06.5502725Z 2023-01-11T21:38:06.5502885Z triton_fused_gather_0 = async_compile.triton(''' 2023-01-11T21:38:06.5502959Z import triton 2023-01-11T21:38:06.5503053Z import triton.language as tl 2023-01-11T21:38:06.5503173Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5503277Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5503406Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5503532Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5503537Z 2023-01-11T21:38:06.5503953Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5504029Z @triton.jit 2023-01-11T21:38:06.5504174Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5504249Z xnumel = 1 2023-01-11T21:38:06.5504347Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5504476Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5504556Z xmask = xindex < xnumel 2023-01-11T21:38:06.5504718Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5504823Z tmp1 = tl.load(in_ptr1 + (tmp0), None) 
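# Gather lowering: tmp0 is the int64 index read from in_ptr0, and it is used
# directly as the pointer offset into in_ptr1, i.e. tmp1 = in_ptr1[tmp0];
# the store below writes the single gathered element to out_ptr0.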
2023-01-11T21:38:06.5504962Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp1, None) 2023-01-11T21:38:06.5505049Z ''') 2023-01-11T21:38:06.5505054Z 2023-01-11T21:38:06.5505058Z 2023-01-11T21:38:06.5505154Z async_compile.wait(globals()) 2023-01-11T21:38:06.5505233Z del async_compile 2023-01-11T21:38:06.5505239Z 2023-01-11T21:38:06.5505309Z def call(args): 2023-01-11T21:38:06.5505389Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5505466Z args.clear() 2023-01-11T21:38:06.5505561Z with torch.cuda.device(0): 2023-01-11T21:38:06.5505798Z buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5505915Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5506064Z triton_fused_gather_0.run(arg1_1, arg0_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5506141Z del arg0_1 2023-01-11T21:38:06.5506211Z del arg1_1 2023-01-11T21:38:06.5506293Z return (buf0, ) 2023-01-11T21:38:06.5506298Z 2023-01-11T21:38:06.5506303Z 2023-01-11T21:38:06.5506384Z if __name__ == "__main__": 2023-01-11T21:38:06.5506504Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5506632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5506843Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5507042Z arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5507157Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5507194Z 2023-01-11T21:38:06.5507198Z 2023-01-11T21:38:06.5507293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5507369Z import torch 2023-01-11T21:38:06.5507445Z import random 2023-01-11T21:38:06.5507566Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5507695Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5507700Z 2023-01-11T21:38:06.5507784Z aten = torch.ops.aten 2023-01-11T21:38:06.5507922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5508013Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5508018Z 2023-01-11T21:38:06.5508092Z import triton 2023-01-11T21:38:06.5508186Z import triton.language as tl 2023-01-11T21:38:06.5508314Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5508457Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5508463Z 2023-01-11T21:38:06.5508467Z 2023-01-11T21:38:06.5508632Z triton_fused_gather_0 = async_compile.triton(''' 2023-01-11T21:38:06.5508706Z import triton 2023-01-11T21:38:06.5508801Z import triton.language as tl 2023-01-11T21:38:06.5508910Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5509014Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5509153Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5509280Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5509285Z 2023-01-11T21:38:06.5509699Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5509775Z @triton.jit 2023-01-11T21:38:06.5509920Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5509999Z xnumel = 1 2023-01-11T21:38:06.5510091Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5510226Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5510312Z xmask = xindex < xnumel 2023-01-11T21:38:06.5510444Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5510564Z tmp1 = tl.load(in_ptr1 + (tmp0), None).to(tl.float32) 2023-01-11T21:38:06.5510774Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp1, None) 2023-01-11T21:38:06.5510862Z ''') 2023-01-11T21:38:06.5510868Z 2023-01-11T21:38:06.5510872Z 2023-01-11T21:38:06.5510970Z async_compile.wait(globals()) 2023-01-11T21:38:06.5511042Z del async_compile 2023-01-11T21:38:06.5511047Z 2023-01-11T21:38:06.5511123Z def call(args): 2023-01-11T21:38:06.5511204Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5511282Z args.clear() 2023-01-11T21:38:06.5511377Z with torch.cuda.device(0): 2023-01-11T21:38:06.5511578Z buf0 = empty_strided((1, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5511673Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5511816Z triton_fused_gather_0.run(arg1_1, arg0_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5511888Z del arg0_1 2023-01-11T21:38:06.5511963Z del arg1_1 2023-01-11T21:38:06.5512041Z return (buf0, ) 2023-01-11T21:38:06.5512046Z 2023-01-11T21:38:06.5512050Z 2023-01-11T21:38:06.5512133Z if __name__ == "__main__": 2023-01-11T21:38:06.5512253Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5512379Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5512587Z arg0_1 = rand_strided((1, 1000), (1000, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5512777Z arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5512898Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5512903Z 2023-01-11T21:38:06.5512976Z ok (0.265s) 2023-01-11T21:38:06.5513434Z test_mm_views_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5513599Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5513859Z [2023-01-11 21:35:22,660] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 728 2023-01-11T21:38:06.5514123Z [2023-01-11 21:35:22,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 728 2023-01-11T21:38:06.5514128Z 2023-01-11T21:38:06.5514227Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5514304Z import torch 2023-01-11T21:38:06.5514374Z import random 2023-01-11T21:38:06.5514497Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5514624Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5514629Z 2023-01-11T21:38:06.5514713Z aten = torch.ops.aten 2023-01-11T21:38:06.5514857Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5514954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5514959Z 2023-01-11T21:38:06.5515041Z import triton 2023-01-11T21:38:06.5515141Z import triton.language as tl 2023-01-11T21:38:06.5515261Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5515428Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5515434Z 2023-01-11T21:38:06.5515438Z 2023-01-11T21:38:06.5515545Z async_compile.wait(globals()) 2023-01-11T21:38:06.5515633Z del async_compile 2023-01-11T21:38:06.5515638Z 2023-01-11T21:38:06.5515714Z def call(args): 2023-01-11T21:38:06.5515795Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5515871Z args.clear() 2023-01-11T21:38:06.5515958Z with torch.cuda.device(0): 2023-01-11T21:38:06.5516169Z buf0 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5516299Z aten.mm.out(arg0_1, as_strided(arg1_1, (32, 32), (32, 1)), out=buf0) 2023-01-11T21:38:06.5516373Z del arg0_1 2023-01-11T21:38:06.5516447Z del arg1_1 2023-01-11T21:38:06.5516526Z return (buf0, ) 2023-01-11T21:38:06.5516559Z 2023-01-11T21:38:06.5516564Z 2023-01-11T21:38:06.5516647Z if __name__ == "__main__": 2023-01-11T21:38:06.5516764Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5516886Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5517090Z arg0_1 = rand_strided((32, 32), (1, 32), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5517302Z arg1_1 = rand_strided((32, 1, 32), (32, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5517424Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5517430Z 2023-01-11T21:38:06.5517505Z ok (0.020s) 2023-01-11T21:38:06.5517977Z test_move_arange_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5518111Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5518370Z [2023-01-11 21:35:22,728] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 729 2023-01-11T21:38:06.5518634Z [2023-01-11 21:35:22,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 729 2023-01-11T21:38:06.5518640Z 2023-01-11T21:38:06.5518740Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5518811Z import torch 2023-01-11T21:38:06.5518890Z import random 2023-01-11T21:38:06.5519045Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5519170Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5519175Z 2023-01-11T21:38:06.5519257Z aten = torch.ops.aten 2023-01-11T21:38:06.5519392Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5519489Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5519497Z 2023-01-11T21:38:06.5519567Z import triton 2023-01-11T21:38:06.5519657Z import triton.language as tl 2023-01-11T21:38:06.5519785Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5519924Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5519930Z 2023-01-11T21:38:06.5519934Z 2023-01-11T21:38:06.5520091Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.5520168Z import triton 2023-01-11T21:38:06.5520262Z import triton.language as tl 2023-01-11T21:38:06.5520380Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5520477Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5520616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5520744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5520750Z 2023-01-11T21:38:06.5521159Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5521235Z @triton.jit 2023-01-11T21:38:06.5521368Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5521444Z xnumel = 32 2023-01-11T21:38:06.5521542Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5521667Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5521753Z xmask = xindex < xnumel 2023-01-11T21:38:06.5521826Z x0 = xindex 2023-01-11T21:38:06.5521925Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5522003Z tmp0 = x0 2023-01-11T21:38:06.5522092Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.5522166Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5522304Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5522394Z ''') 2023-01-11T21:38:06.5522400Z 2023-01-11T21:38:06.5522404Z 2023-01-11T21:38:06.5522527Z async_compile.wait(globals()) 2023-01-11T21:38:06.5522607Z del async_compile 2023-01-11T21:38:06.5522613Z 2023-01-11T21:38:06.5522692Z def call(args): 2023-01-11T21:38:06.5522767Z arg0_1, = args 2023-01-11T21:38:06.5522845Z args.clear() 2023-01-11T21:38:06.5522932Z with torch.cuda.device(0): 2023-01-11T21:38:06.5523132Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5523227Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5523367Z 
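# Note: the arange has been folded into the kernel above (tmp0 = x0,
# upcast to fp32), so the launch below reads only arg0_1 and writes buf0;
# no separate arange tensor is materialized on the device.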
triton_fused_add_0.run(arg0_1, buf0, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5523441Z del arg0_1 2023-01-11T21:38:06.5523525Z return (buf0, ) 2023-01-11T21:38:06.5523530Z 2023-01-11T21:38:06.5523534Z 2023-01-11T21:38:06.5523616Z if __name__ == "__main__": 2023-01-11T21:38:06.5523737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5523858Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5524061Z arg0_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5524177Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5524182Z 2023-01-11T21:38:06.5524258Z ok (0.137s) 2023-01-11T21:38:06.5524717Z test_multi_device_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5524878Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5525137Z [2023-01-11 21:35:22,937] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 730 2023-01-11T21:38:06.5525321Z [2023-01-11 21:35:22,941] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525502Z [2023-01-11 21:35:22,943] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525671Z [2023-01-11 21:35:22,944] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.5525942Z [2023-01-11 21:35:23,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 730 2023-01-11T21:38:06.5525948Z 2023-01-11T21:38:06.5526048Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5526129Z import torch 2023-01-11T21:38:06.5526206Z import random 2023-01-11T21:38:06.5526329Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5526456Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5526463Z 2023-01-11T21:38:06.5526547Z aten = torch.ops.aten 2023-01-11T21:38:06.5526679Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5526776Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5526781Z 2023-01-11T21:38:06.5526856Z import triton 2023-01-11T21:38:06.5526951Z import triton.language as tl 2023-01-11T21:38:06.5527082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5527223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5527229Z 2023-01-11T21:38:06.5527233Z 2023-01-11T21:38:06.5527417Z triton_fused_add_add_1_add_2_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5527495Z import triton 2023-01-11T21:38:06.5527582Z import triton.language as tl 2023-01-11T21:38:06.5527699Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5527800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5527935Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5528067Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5528072Z 2023-01-11T21:38:06.5528473Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5528576Z @triton.jit 
2023-01-11T21:38:06.5528714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5528783Z xnumel = 40 2023-01-11T21:38:06.5528881Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5529011Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5529096Z xmask = xindex < xnumel 2023-01-11T21:38:06.5529170Z x0 = xindex 2023-01-11T21:38:06.5529268Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5529342Z tmp1 = 1 2023-01-11T21:38:06.5529417Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5529490Z tmp3 = 2 2023-01-11T21:38:06.5529572Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5529644Z tmp5 = 3 2023-01-11T21:38:06.5529722Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5529793Z tmp7 = 4 2023-01-11T21:38:06.5529866Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.5530003Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5530091Z ''') 2023-01-11T21:38:06.5530098Z 2023-01-11T21:38:06.5530103Z 2023-01-11T21:38:06.5530242Z kernel_cpp_1 = async_compile.cpp(''' 2023-01-11T21:38:06.5530451Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5530574Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.5530642Z { 2023-01-11T21:38:06.5530747Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5530809Z { 2023-01-11T21:38:06.5530891Z #pragma omp for 2023-01-11T21:38:06.5530981Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.5531050Z { 2023-01-11T21:38:06.5531217Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5531402Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(5)); 2023-01-11T21:38:06.5531491Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5531621Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(6)); 2023-01-11T21:38:06.5531712Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5531813Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5531879Z } 2023-01-11T21:38:06.5531978Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5532066Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.5532132Z { 2023-01-11T21:38:06.5532219Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.5532323Z auto tmp1 = static_cast<float>(5); 2023-01-11T21:38:06.5532413Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5532517Z auto tmp3 = static_cast<float>(6); 2023-01-11T21:38:06.5532603Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5532695Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.5532761Z } 2023-01-11T21:38:06.5532820Z } 2023-01-11T21:38:06.5532886Z } 2023-01-11T21:38:06.5532972Z ''') 2023-01-11T21:38:06.5532977Z 2023-01-11T21:38:06.5532981Z 2023-01-11T21:38:06.5533241Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_2 = async_compile.triton(''' 2023-01-11T21:38:06.5533318Z import triton 2023-01-11T21:38:06.5533408Z import triton.language as tl 2023-01-11T21:38:06.5533523Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5533617Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5533748Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5533872Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5533877Z 2023-01-11T21:38:06.5534274Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5534349Z @triton.jit 2023-01-11T21:38:06.5534586Z def
triton_(in_out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5534661Z xnumel = 40 2023-01-11T21:38:06.5534755Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5534923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5535008Z xmask = xindex < xnumel 2023-01-11T21:38:06.5535079Z x0 = xindex 2023-01-11T21:38:06.5535182Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5535252Z tmp1 = 7 2023-01-11T21:38:06.5535333Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5535405Z tmp3 = 8 2023-01-11T21:38:06.5535475Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5535613Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5535699Z ''') 2023-01-11T21:38:06.5535705Z 2023-01-11T21:38:06.5535709Z 2023-01-11T21:38:06.5535847Z kernel_cpp_3 = async_compile.cpp(''' 2023-01-11T21:38:06.5536051Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5536171Z extern "C" void kernel(float* __restrict__ in_out_ptr0) 2023-01-11T21:38:06.5536235Z { 2023-01-11T21:38:06.5536336Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5536397Z { 2023-01-11T21:38:06.5536478Z #pragma omp for 2023-01-11T21:38:06.5536564Z for(long i0=0; i0<5; i0+=1) 2023-01-11T21:38:06.5536630Z { 2023-01-11T21:38:06.5536772Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5536910Z auto tmp1 = at::vec::Vectorized<float>(static_cast<float>(9)); 2023-01-11T21:38:06.5536998Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5537186Z auto tmp3 = at::vec::Vectorized<float>(static_cast<float>(10)); 2023-01-11T21:38:06.5537298Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5537494Z tmp4.store(in_out_ptr0 + 8*i0); 2023-01-11T21:38:06.5537561Z } 2023-01-11T21:38:06.5537661Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5537747Z for(long i0=40; i0<40; i0+=1) 2023-01-11T21:38:06.5537813Z { 2023-01-11T21:38:06.5537900Z auto tmp0 = in_out_ptr0[i0]; 2023-01-11T21:38:06.5538007Z auto tmp1 = static_cast<float>(9); 2023-01-11T21:38:06.5538095Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5538200Z auto tmp3 = static_cast<float>(10); 2023-01-11T21:38:06.5538288Z auto tmp4 = tmp2 + tmp3; 2023-01-11T21:38:06.5538376Z in_out_ptr0[i0] = tmp4; 2023-01-11T21:38:06.5538442Z } 2023-01-11T21:38:06.5538501Z } 2023-01-11T21:38:06.5538564Z } 2023-01-11T21:38:06.5538659Z ''') 2023-01-11T21:38:06.5538664Z 2023-01-11T21:38:06.5538668Z 2023-01-11T21:38:06.5538763Z async_compile.wait(globals()) 2023-01-11T21:38:06.5538840Z del async_compile 2023-01-11T21:38:06.5538845Z 2023-01-11T21:38:06.5538923Z def call(args): 2023-01-11T21:38:06.5538996Z arg0_1, = args 2023-01-11T21:38:06.5539064Z args.clear() 2023-01-11T21:38:06.5539154Z with torch.cuda.device(0): 2023-01-11T21:38:06.5539366Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5539462Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5539618Z triton_fused_add_add_1_add_2_add_3_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.5539691Z del arg0_1 2023-01-11T21:38:06.5539897Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5539969Z buf1.copy_(buf0) 2023-01-11T21:38:06.5540059Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5540165Z kernel_cpp_1(c_void_p(buf2.data_ptr())) 2023-01-11T21:38:06.5540256Z with torch.cuda.device(0): 2023-01-11T21:38:06.5540345Z buf3 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5540426Z buf3.copy_(buf2)
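# Cross-device schedule: buf0 (cuda, triton add kernel) -> buf2 (cpu,
# kernel_cpp_1) -> buf4 (cuda, triton kernel below) -> buf6 (cpu,
# kernel_cpp_3). Each copy_ corresponds to one of the DeviceCopy nodes
# warned about above, and buffers are reused in place between stages.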
2023-01-11T21:38:06.5540520Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.5540703Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_device_put_device_put_1_2.run(buf4, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.5540792Z buf5 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5540869Z buf5.copy_(buf4) 2023-01-11T21:38:06.5540967Z del buf4 2023-01-11T21:38:06.5541056Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.5541164Z kernel_cpp_3(c_void_p(buf6.data_ptr())) 2023-01-11T21:38:06.5541238Z return (buf6, ) 2023-01-11T21:38:06.5541243Z 2023-01-11T21:38:06.5541248Z 2023-01-11T21:38:06.5541327Z if __name__ == "__main__": 2023-01-11T21:38:06.5541438Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5541566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5541778Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5541890Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5541899Z 2023-01-11T21:38:06.5541968Z ok (0.227s) 2023-01-11T21:38:06.5542136Z test_multi_gpu_device_cuda (__main__.CudaTests) ... skip: requires multiple cuda devices (0.000s) 2023-01-11T21:38:06.5542603Z test_multilayer_low_prec_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5542733Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5542991Z [2023-01-11 21:35:23,124] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 731 2023-01-11T21:38:06.5543193Z [2023-01-11 21:35:23,177] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5543485Z [2023-01-11 21:35:23,362] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 731 2023-01-11T21:38:06.5543907Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5544040Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5544294Z [2023-01-11 21:35:23,375] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 732 2023-01-11T21:38:06.5544501Z [2023-01-11 21:35:23,392] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.5544507Z 2023-01-11T21:38:06.5544600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5544676Z import torch 2023-01-11T21:38:06.5544754Z import random 2023-01-11T21:38:06.5544866Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5544990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5544995Z 2023-01-11T21:38:06.5545074Z aten = torch.ops.aten 2023-01-11T21:38:06.5545215Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5545311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5545317Z 2023-01-11T21:38:06.5545390Z import triton 2023-01-11T21:38:06.5545484Z import triton.language as tl 2023-01-11T21:38:06.5545615Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5545766Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5545772Z 2023-01-11T21:38:06.5545784Z 2023-01-11T21:38:06.5545953Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5546027Z import triton 2023-01-11T21:38:06.5546118Z import triton.language as tl 2023-01-11T21:38:06.5546231Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5546336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5546467Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5546585Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5546596Z 2023-01-11T21:38:06.5546708Z @reduction(size_hints=[512, 8192], 2023-01-11T21:38:06.5546823Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5546907Z filename=__file__, 2023-01-11T21:38:06.5547267Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5547339Z @triton.jit 2023-01-11T21:38:06.5547506Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5547582Z xnumel = 454 2023-01-11T21:38:06.5547649Z rnumel = 8188 2023-01-11T21:38:06.5547748Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5547884Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5547968Z xmask = xindex < xnumel 2023-01-11T21:38:06.5548087Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5548158Z x0 = xindex 2023-01-11T21:38:06.5548276Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5548374Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5548460Z rindex = roffset + rbase 2023-01-11T21:38:06.5548543Z rmask = rindex < rnumel 2023-01-11T21:38:06.5548613Z r1 = rindex 2023-01-11T21:38:06.5548690Z tmp0 = r1 + (8188*x0) 2023-01-11T21:38:06.5548764Z tmp1 = 3717120 2023-01-11T21:38:06.5548846Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.5549036Z tmp3 = tl.load(in_ptr0 + ((r1 + (8188*x0)) % 3717120 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5549155Z tmp4 = tmp3.to(tl.float32) 
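# Low-precision mean, stage 1: fp16 loads are upcast to fp32 (tmp3/tmp4)
# and accumulated in fp32; tmp2 masks flattened indices >= 3717120 so the
# tail of the last x-slice contributes 0 to its partial sum.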
2023-01-11T21:38:06.5549248Z tmp5 = tl.where(tmp2, tmp4, 0) 2023-01-11T21:38:06.5549371Z _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6) 2023-01-11T21:38:06.5549486Z tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5549584Z tl.store(out_ptr0 + x0, tmp6, xmask) 2023-01-11T21:38:06.5549672Z ''') 2023-01-11T21:38:06.5549678Z 2023-01-11T21:38:06.5549682Z 2023-01-11T21:38:06.5549837Z triton_fused_mean_1 = async_compile.triton(''' 2023-01-11T21:38:06.5549904Z import triton 2023-01-11T21:38:06.5549997Z import triton.language as tl 2023-01-11T21:38:06.5550110Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5550212Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5550342Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5550462Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5550467Z 2023-01-11T21:38:06.5550557Z @reduction(size_hints=[1, 512], 2023-01-11T21:38:06.5550669Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5550754Z filename=__file__, 2023-01-11T21:38:06.5551117Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5551190Z @triton.jit 2023-01-11T21:38:06.5551357Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5551427Z xnumel = 1 2023-01-11T21:38:06.5551501Z rnumel = 454 2023-01-11T21:38:06.5551599Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5551728Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5551809Z xmask = xindex < xnumel 2023-01-11T21:38:06.5551927Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5552042Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5552148Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5552237Z rindex = roffset + rbase 2023-01-11T21:38:06.5552321Z rmask = rindex < rnumel 2023-01-11T21:38:06.5552386Z r0 = rindex 2023-01-11T21:38:06.5552485Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5552632Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5552746Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5552821Z tmp2 = 3717120 2023-01-11T21:38:06.5552900Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5552988Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5553116Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp4, None) 2023-01-11T21:38:06.5553204Z ''') 2023-01-11T21:38:06.5553209Z 2023-01-11T21:38:06.5553214Z 2023-01-11T21:38:06.5553306Z async_compile.wait(globals()) 2023-01-11T21:38:06.5553380Z del async_compile 2023-01-11T21:38:06.5553385Z 2023-01-11T21:38:06.5553461Z def call(args): 2023-01-11T21:38:06.5553534Z arg0_1, = args 2023-01-11T21:38:06.5553607Z args.clear() 2023-01-11T21:38:06.5553698Z with torch.cuda.device(0): 2023-01-11T21:38:06.5553890Z buf0 = empty_strided((454, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5553982Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5554126Z triton_fused_mean_0.run(arg0_1, buf0, 454, 8188, grid=grid(454), stream=stream0) 2023-01-11T21:38:06.5554200Z del arg0_1 2023-01-11T21:38:06.5554386Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5554525Z triton_fused_mean_1.run(buf0, buf2, 1, 454, grid=grid(1), stream=stream0) 
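# Stage 2 sums the 454 fp32 partials from buf0, divides by the element
# count (3717120) and only then downcasts to fp16 -- effectively
# x.float().mean().half() rather than a mean accumulated in fp16.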
2023-01-11T21:38:06.5554602Z return (buf2, ) 2023-01-11T21:38:06.5554607Z 2023-01-11T21:38:06.5554612Z 2023-01-11T21:38:06.5554684Z if __name__ == "__main__": 2023-01-11T21:38:06.5554804Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5554933Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5555194Z arg0_1 = rand_strided((10, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5555311Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5555317Z 2023-01-11T21:38:06.5555627Z [2023-01-11 21:35:23,394] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 732 2023-01-11T21:38:06.5555633Z 2023-01-11T21:38:06.5555731Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5555804Z import torch 2023-01-11T21:38:06.5555880Z import random 2023-01-11T21:38:06.5555992Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5556113Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5556118Z 2023-01-11T21:38:06.5556201Z aten = torch.ops.aten 2023-01-11T21:38:06.5556335Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5556429Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5556437Z 2023-01-11T21:38:06.5556511Z import triton 2023-01-11T21:38:06.5556603Z import triton.language as tl 2023-01-11T21:38:06.5556720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5556859Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5556864Z 2023-01-11T21:38:06.5556868Z 2023-01-11T21:38:06.5557025Z triton_fused_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.5557099Z import triton 2023-01-11T21:38:06.5557190Z import triton.language as tl 2023-01-11T21:38:06.5557302Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5557403Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5557533Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5557650Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5557655Z 2023-01-11T21:38:06.5557747Z @reduction(size_hints=[512, 8192], 2023-01-11T21:38:06.5557859Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5557948Z filename=__file__, 2023-01-11T21:38:06.5558301Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5558372Z @triton.jit 2023-01-11T21:38:06.5558568Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5558641Z xnumel = 454 2023-01-11T21:38:06.5558709Z rnumel = 8188 2023-01-11T21:38:06.5558806Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5558944Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5559027Z xmask = xindex < xnumel 2023-01-11T21:38:06.5559145Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5559215Z x0 = xindex 2023-01-11T21:38:06.5559333Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5559434Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5559520Z rindex = roffset + rbase 2023-01-11T21:38:06.5559604Z rmask = rindex < rnumel 2023-01-11T21:38:06.5559674Z r1 = rindex 2023-01-11T21:38:06.5559753Z tmp0 = r1 + (8188*x0) 2023-01-11T21:38:06.5559827Z tmp1 = 3717120 
2023-01-11T21:38:06.5559904Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.5560097Z tmp3 = tl.load(in_ptr0 + ((r1 + (8188*x0)) % 3717120 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.5560187Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5560281Z tmp5 = tl.where(tmp2, tmp4, 0) 2023-01-11T21:38:06.5560402Z _tmp6 = tl.where(xmask & rmask, _tmp6 + tmp5, _tmp6) 2023-01-11T21:38:06.5560515Z tmp6 = tl.reshape(tl.sum(_tmp6, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5560613Z tl.store(out_ptr0 + x0, tmp6, xmask) 2023-01-11T21:38:06.5560698Z ''') 2023-01-11T21:38:06.5560704Z 2023-01-11T21:38:06.5560736Z 2023-01-11T21:38:06.5560885Z triton_fused_mean_1 = async_compile.triton(''' 2023-01-11T21:38:06.5560958Z import triton 2023-01-11T21:38:06.5561050Z import triton.language as tl 2023-01-11T21:38:06.5561163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5561264Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5561393Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5561517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5561523Z 2023-01-11T21:38:06.5561610Z @reduction(size_hints=[1, 512], 2023-01-11T21:38:06.5561718Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5561803Z filename=__file__, 2023-01-11T21:38:06.5562158Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5562235Z @triton.jit 2023-01-11T21:38:06.5562405Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5562477Z xnumel = 1 2023-01-11T21:38:06.5562550Z rnumel = 454 2023-01-11T21:38:06.5562639Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5562773Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5562857Z xmask = xindex < xnumel 2023-01-11T21:38:06.5562974Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5563091Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5563198Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5563286Z rindex = roffset + rbase 2023-01-11T21:38:06.5563364Z rmask = rindex < rnumel 2023-01-11T21:38:06.5563437Z r0 = rindex 2023-01-11T21:38:06.5563539Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5563658Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5563774Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5563847Z tmp2 = 3717120 2023-01-11T21:38:06.5563926Z tmp3 = tmp1 / tmp2 2023-01-11T21:38:06.5564007Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.5564140Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp4, None) 2023-01-11T21:38:06.5564226Z ''') 2023-01-11T21:38:06.5564259Z 2023-01-11T21:38:06.5564264Z 2023-01-11T21:38:06.5564357Z async_compile.wait(globals()) 2023-01-11T21:38:06.5564434Z del async_compile 2023-01-11T21:38:06.5564439Z 2023-01-11T21:38:06.5564514Z def call(args): 2023-01-11T21:38:06.5564586Z arg0_1, = args 2023-01-11T21:38:06.5564660Z args.clear() 2023-01-11T21:38:06.5564744Z with torch.cuda.device(0): 2023-01-11T21:38:06.5564944Z buf0 = empty_strided((454, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5565034Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5565174Z triton_fused_mean_0.run(arg0_1, buf0, 454, 8188, grid=grid(454), 
stream=stream0) 2023-01-11T21:38:06.5565251Z del arg0_1 2023-01-11T21:38:06.5565437Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5565573Z triton_fused_mean_1.run(buf0, buf2, 1, 454, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5565644Z return (buf2, ) 2023-01-11T21:38:06.5565657Z 2023-01-11T21:38:06.5565664Z 2023-01-11T21:38:06.5565737Z if __name__ == "__main__": 2023-01-11T21:38:06.5565854Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5565980Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5566214Z arg0_1 = rand_strided((10, 3, 352, 352), (371712, 123904, 352, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5566327Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5566332Z 2023-01-11T21:38:06.5566402Z ok (0.367s) 2023-01-11T21:38:06.5566558Z test_nan_to_num_cuda (__main__.CudaTests) ... skip: Skipping due to op bugs (0.001s) 2023-01-11T21:38:06.5567048Z test_narrow_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5567182Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5567433Z [2023-01-11 21:35:23,426] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 733 2023-01-11T21:38:06.5567696Z [2023-01-11 21:35:23,497] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 733 2023-01-11T21:38:06.5568107Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5568242Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5568504Z [2023-01-11 21:35:23,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 734 2023-01-11T21:38:06.5568766Z [2023-01-11 21:35:23,598] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 734 2023-01-11T21:38:06.5568772Z 2023-01-11T21:38:06.5568869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5568941Z import torch 2023-01-11T21:38:06.5569017Z import random 2023-01-11T21:38:06.5569129Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5569254Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5569259Z 2023-01-11T21:38:06.5569340Z aten = torch.ops.aten 2023-01-11T21:38:06.5569476Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5569575Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5569580Z 2023-01-11T21:38:06.5569654Z import triton 2023-01-11T21:38:06.5569745Z import triton.language as tl 2023-01-11T21:38:06.5569870Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5570028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5570034Z 2023-01-11T21:38:06.5570039Z 2023-01-11T21:38:06.5570197Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5570272Z import triton 2023-01-11T21:38:06.5570363Z import triton.language as tl 2023-01-11T21:38:06.5570476Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5570577Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5570709Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5570826Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5570835Z 2023-01-11T21:38:06.5571235Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5571313Z @triton.jit 2023-01-11T21:38:06.5571446Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5571521Z xnumel = 1024 2023-01-11T21:38:06.5571618Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5571747Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5571829Z xmask = xindex < xnumel 2023-01-11T21:38:06.5571892Z x0 = xindex 2023-01-11T21:38:06.5572089Z tmp0 = tl.load(in_ptr0 + (640 + x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5572158Z tmp1 = 2 2023-01-11T21:38:06.5572237Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5572306Z tmp3 = 1 2023-01-11T21:38:06.5572384Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5572517Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5572636Z ''') 2023-01-11T21:38:06.5572642Z 2023-01-11T21:38:06.5572653Z 2023-01-11T21:38:06.5572740Z async_compile.wait(globals()) 2023-01-11T21:38:06.5572817Z del async_compile 2023-01-11T21:38:06.5572822Z 2023-01-11T21:38:06.5572896Z def call(args): 2023-01-11T21:38:06.5572973Z arg0_1, = args 2023-01-11T21:38:06.5573047Z args.clear() 2023-01-11T21:38:06.5573139Z with torch.cuda.device(0): 2023-01-11T21:38:06.5573343Z buf0 = empty_strided((16, 64), (64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5573429Z 
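# The narrow itself generates no data movement for the first output: it is
# returned below as an as_strided view of arg0_1 with storage offset 10.
# Only the add is compiled to a kernel, which reads arg0_1 at a fixed
# 640-element offset.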
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5573571Z triton_fused_add_1_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5573684Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.5573690Z 2023-01-11T21:38:06.5573694Z 2023-01-11T21:38:06.5573772Z if __name__ == "__main__": 2023-01-11T21:38:06.5573892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5574023Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5574229Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5574341Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5574347Z 2023-01-11T21:38:06.5574354Z 2023-01-11T21:38:06.5574444Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5574633Z import torch 2023-01-11T21:38:06.5574708Z import random 2023-01-11T21:38:06.5574826Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5574949Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5574954Z 2023-01-11T21:38:06.5575035Z aten = torch.ops.aten 2023-01-11T21:38:06.5575171Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5575267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5575272Z 2023-01-11T21:38:06.5575338Z import triton 2023-01-11T21:38:06.5575436Z import triton.language as tl 2023-01-11T21:38:06.5575559Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5575697Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5575702Z 2023-01-11T21:38:06.5575707Z 2023-01-11T21:38:06.5575865Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5575981Z import triton 2023-01-11T21:38:06.5576075Z import triton.language as tl 2023-01-11T21:38:06.5576181Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5576283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5576414Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5576536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5576541Z 2023-01-11T21:38:06.5576942Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5577019Z @triton.jit 2023-01-11T21:38:06.5577206Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5577294Z xnumel = 1024 2023-01-11T21:38:06.5577386Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5577536Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5577619Z xmask = xindex < xnumel 2023-01-11T21:38:06.5577689Z x0 = xindex 2023-01-11T21:38:06.5577906Z tmp0 = tl.load(in_ptr0 + (640 + x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5577976Z tmp1 = 2 2023-01-11T21:38:06.5578054Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5578117Z tmp3 = 1 2023-01-11T21:38:06.5578194Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5578329Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5578413Z ''') 2023-01-11T21:38:06.5578419Z 2023-01-11T21:38:06.5578423Z 2023-01-11T21:38:06.5578514Z async_compile.wait(globals()) 2023-01-11T21:38:06.5578664Z del async_compile 2023-01-11T21:38:06.5578670Z 2023-01-11T21:38:06.5578745Z def call(args): 2023-01-11T21:38:06.5578819Z arg0_1, = 
args 2023-01-11T21:38:06.5578886Z args.clear() 2023-01-11T21:38:06.5578977Z with torch.cuda.device(0): 2023-01-11T21:38:06.5579188Z buf0 = empty_strided((16, 64), (64, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5579279Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5579419Z triton_fused_add_1_0.run(arg0_1, buf0, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5579532Z return (as_strided(arg0_1, (64, 16), (64, 1), 10), buf0, ) 2023-01-11T21:38:06.5579538Z 2023-01-11T21:38:06.5579542Z 2023-01-11T21:38:06.5579621Z if __name__ == "__main__": 2023-01-11T21:38:06.5579737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5579855Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5580057Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5580173Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5580178Z 2023-01-11T21:38:06.5580250Z ok (0.203s) 2023-01-11T21:38:06.5580725Z test_new_empty_strided_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5580857Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5581112Z [2023-01-11 21:35:23,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 735 2023-01-11T21:38:06.5581377Z [2023-01-11 21:35:23,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 735 2023-01-11T21:38:06.5581826Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5581958Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5582207Z [2023-01-11 21:35:23,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 736 2023-01-11T21:38:06.5582469Z [2023-01-11 21:35:23,923] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 736 2023-01-11T21:38:06.5582475Z 2023-01-11T21:38:06.5582574Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5582647Z import torch 2023-01-11T21:38:06.5582721Z import random 2023-01-11T21:38:06.5582837Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5582966Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5582971Z 2023-01-11T21:38:06.5583053Z aten = torch.ops.aten 2023-01-11T21:38:06.5583182Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5583276Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5583281Z 2023-01-11T21:38:06.5583356Z import triton 2023-01-11T21:38:06.5583450Z import triton.language as tl 2023-01-11T21:38:06.5583573Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5583713Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5583718Z 2023-01-11T21:38:06.5583723Z 2023-01-11T21:38:06.5583908Z triton_fused_fill__new_empty_strided_0 = async_compile.triton(''' 2023-01-11T21:38:06.5583983Z import triton 2023-01-11T21:38:06.5584069Z import triton.language as tl 2023-01-11T21:38:06.5584183Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5584283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5584444Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5584572Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5584578Z 2023-01-11T21:38:06.5584972Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5585044Z @triton.jit 2023-01-11T21:38:06.5585165Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5585233Z xnumel = 16384 2023-01-11T21:38:06.5585331Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5585456Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5585539Z xmask = xindex < xnumel 2023-01-11T21:38:06.5585616Z x0 = xindex 2023-01-11T21:38:06.5585705Z tmp0 = 123 2023-01-11T21:38:06.5585854Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5585944Z ''') 2023-01-11T21:38:06.5585949Z 2023-01-11T21:38:06.5585960Z 2023-01-11T21:38:06.5586046Z async_compile.wait(globals()) 2023-01-11T21:38:06.5586121Z del async_compile 2023-01-11T21:38:06.5586126Z 2023-01-11T21:38:06.5586201Z def call(args): 2023-01-11T21:38:06.5586272Z arg0_1, = args 2023-01-11T21:38:06.5586348Z args.clear() 2023-01-11T21:38:06.5586439Z with torch.cuda.device(0): 2023-01-11T21:38:06.5586653Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5586746Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5586901Z triton_fused_fill__new_empty_strided_0.run(buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.5586978Z return (buf0, ) 2023-01-11T21:38:06.5586983Z 
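# Illustrative sketch (hedged; not part of the generated module): the call()
# wrapper above never reads arg0_1 -- it only allocates a (1, 128, 128)
# float32 buffer and launches triton_fused_fill__new_empty_strided_0 to write
# the constant 123 into all 16384 elements. In eager PyTorch the wrapper is
# roughly equivalent to:
#
#     import torch
#
#     def call_eager(args):
#         args.clear()  # mirrors the generated wrapper's argument handling
#         # new_empty_strided(...) followed by fill_(123) collapses to full()
#         return (torch.full((1, 128, 128), 123.0,
#                            device='cuda', dtype=torch.float32),)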
2023-01-11T21:38:06.5586988Z 2023-01-11T21:38:06.5587066Z if __name__ == "__main__": 2023-01-11T21:38:06.5587186Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5587314Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5587510Z arg0_1 = rand_strided((55, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5587615Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5587629Z 2023-01-11T21:38:06.5587633Z 2023-01-11T21:38:06.5587752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5587826Z import torch 2023-01-11T21:38:06.5587899Z import random 2023-01-11T21:38:06.5588020Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5588143Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5588148Z 2023-01-11T21:38:06.5588227Z aten = torch.ops.aten 2023-01-11T21:38:06.5588359Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5588447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5588456Z 2023-01-11T21:38:06.5588522Z import triton 2023-01-11T21:38:06.5588612Z import triton.language as tl 2023-01-11T21:38:06.5588739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5588878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5588883Z 2023-01-11T21:38:06.5588887Z 2023-01-11T21:38:06.5589070Z triton_fused_fill__new_empty_strided_0 = async_compile.triton(''' 2023-01-11T21:38:06.5589143Z import triton 2023-01-11T21:38:06.5589236Z import triton.language as tl 2023-01-11T21:38:06.5589344Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5589447Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5589579Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5589702Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5589707Z 2023-01-11T21:38:06.5590096Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5590197Z @triton.jit 2023-01-11T21:38:06.5590317Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5590394Z xnumel = 16384 2023-01-11T21:38:06.5590484Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5590610Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5590693Z xmask = xindex < xnumel 2023-01-11T21:38:06.5590763Z x0 = xindex 2023-01-11T21:38:06.5590836Z tmp0 = 123 2023-01-11T21:38:06.5590970Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5591056Z ''') 2023-01-11T21:38:06.5591061Z 2023-01-11T21:38:06.5591065Z 2023-01-11T21:38:06.5591160Z async_compile.wait(globals()) 2023-01-11T21:38:06.5591229Z del async_compile 2023-01-11T21:38:06.5591234Z 2023-01-11T21:38:06.5591307Z def call(args): 2023-01-11T21:38:06.5591379Z arg0_1, = args 2023-01-11T21:38:06.5591451Z args.clear() 2023-01-11T21:38:06.5591546Z with torch.cuda.device(0): 2023-01-11T21:38:06.5591768Z buf0 = empty_strided((1, 128, 128), (16384, 128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5591861Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5592008Z triton_fused_fill__new_empty_strided_0.run(buf0, 16384, grid=grid(16384), stream=stream0) 2023-01-11T21:38:06.5592086Z return (buf0, ) 2023-01-11T21:38:06.5592093Z 2023-01-11T21:38:06.5592097Z 
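# Illustrative sketch (hedged; not part of the generated module): grid(16384)
# in the launch above builds the 1-D launch geometry for the pointwise kernel.
# Conceptually it is a ceil-division of xnumel by the autotuned block size;
# grid_1d below is a hypothetical stand-in for
# torch._inductor.triton_ops.autotune.grid, shown only to make the launch
# arithmetic concrete:
#
#     def grid_1d(xnumel):
#         # one program instance per XBLOCK-sized chunk of the flat range
#         return lambda meta: ((xnumel + meta["XBLOCK"] - 1) // meta["XBLOCK"],)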
2023-01-11T21:38:06.5592177Z if __name__ == "__main__": 2023-01-11T21:38:06.5592295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5592420Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5592620Z arg0_1 = rand_strided((55, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5592733Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5592738Z 2023-01-11T21:38:06.5592806Z ok (0.324s) 2023-01-11T21:38:06.5593256Z test_new_ones_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5593417Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5593678Z [2023-01-11 21:35:23,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 737 2023-01-11T21:38:06.5593942Z [2023-01-11 21:35:24,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 737 2023-01-11T21:38:06.5594356Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5594489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5594742Z [2023-01-11 21:35:24,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 738 2023-01-11T21:38:06.5595002Z [2023-01-11 21:35:24,123] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 738 2023-01-11T21:38:06.5595007Z 2023-01-11T21:38:06.5595105Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5595178Z import torch 2023-01-11T21:38:06.5595246Z import random 2023-01-11T21:38:06.5595382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5595520Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5595526Z 2023-01-11T21:38:06.5595621Z aten = torch.ops.aten 2023-01-11T21:38:06.5595758Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5595853Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5595859Z 2023-01-11T21:38:06.5595960Z import triton 2023-01-11T21:38:06.5596054Z import triton.language as tl 2023-01-11T21:38:06.5596172Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5596312Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5596317Z 2023-01-11T21:38:06.5596322Z 2023-01-11T21:38:06.5596480Z triton_fused_full_0 = async_compile.triton(''' 2023-01-11T21:38:06.5596554Z import triton 2023-01-11T21:38:06.5596645Z import triton.language as tl 2023-01-11T21:38:06.5596756Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5602983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5603144Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5603275Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5603281Z 2023-01-11T21:38:06.5603687Z @pointwise(size_hints=[1], 
filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5603769Z @triton.jit 2023-01-11T21:38:06.5603893Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5603966Z xnumel = 1 2023-01-11T21:38:06.5604067Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5604202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5604293Z xmask = xindex < xnumel 2023-01-11T21:38:06.5604357Z tmp0 = 1 2023-01-11T21:38:06.5604492Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5604579Z ''') 2023-01-11T21:38:06.5604585Z 2023-01-11T21:38:06.5604589Z 2023-01-11T21:38:06.5604751Z triton_fused_new_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.5604822Z import triton 2023-01-11T21:38:06.5604918Z import triton.language as tl 2023-01-11T21:38:06.5605030Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5605124Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5605261Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5605385Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5605390Z 2023-01-11T21:38:06.5605828Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5605907Z @triton.jit 2023-01-11T21:38:06.5606025Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5606099Z xnumel = 1 2023-01-11T21:38:06.5606196Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5606318Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5606400Z xmask = xindex < xnumel 2023-01-11T21:38:06.5606470Z tmp0 = 0 2023-01-11T21:38:06.5606602Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5606688Z ''') 2023-01-11T21:38:06.5606696Z 2023-01-11T21:38:06.5606701Z 2023-01-11T21:38:06.5606795Z async_compile.wait(globals()) 2023-01-11T21:38:06.5606871Z del async_compile 2023-01-11T21:38:06.5606876Z 2023-01-11T21:38:06.5606949Z def call(args): 2023-01-11T21:38:06.5607015Z arg0_1, = args 2023-01-11T21:38:06.5607092Z args.clear() 2023-01-11T21:38:06.5607189Z with torch.cuda.device(0): 2023-01-11T21:38:06.5607380Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5607472Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5607603Z triton_fused_full_0.run(buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5607790Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5607919Z triton_fused_new_zeros_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5608004Z return (buf0, buf1, ) 2023-01-11T21:38:06.5608009Z 2023-01-11T21:38:06.5608014Z 2023-01-11T21:38:06.5608094Z if __name__ == "__main__": 2023-01-11T21:38:06.5608287Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5608412Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5608608Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5608716Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5608724Z 2023-01-11T21:38:06.5608729Z 2023-01-11T21:38:06.5608824Z from ctypes import c_void_p, c_long 
2023-01-11T21:38:06.5608890Z import torch 2023-01-11T21:38:06.5608965Z import random 2023-01-11T21:38:06.5609085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5609209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5609214Z 2023-01-11T21:38:06.5609299Z aten = torch.ops.aten 2023-01-11T21:38:06.5609434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5609531Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5609536Z 2023-01-11T21:38:06.5609614Z import triton 2023-01-11T21:38:06.5609699Z import triton.language as tl 2023-01-11T21:38:06.5609822Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5609958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5609964Z 2023-01-11T21:38:06.5609968Z 2023-01-11T21:38:06.5610126Z triton_fused_full_0 = async_compile.triton(''' 2023-01-11T21:38:06.5610200Z import triton 2023-01-11T21:38:06.5610291Z import triton.language as tl 2023-01-11T21:38:06.5610402Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5610496Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5610631Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5610754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5610760Z 2023-01-11T21:38:06.5611139Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5611214Z @triton.jit 2023-01-11T21:38:06.5611335Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5611407Z xnumel = 1 2023-01-11T21:38:06.5611504Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5611653Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5611738Z xmask = xindex < xnumel 2023-01-11T21:38:06.5611809Z tmp0 = 1 2023-01-11T21:38:06.5611940Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5612024Z ''') 2023-01-11T21:38:06.5612029Z 2023-01-11T21:38:06.5612034Z 2023-01-11T21:38:06.5612196Z triton_fused_new_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.5612270Z import triton 2023-01-11T21:38:06.5612362Z import triton.language as tl 2023-01-11T21:38:06.5612467Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5612568Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5612700Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5612823Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5612828Z 2023-01-11T21:38:06.5613204Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5613277Z @triton.jit 2023-01-11T21:38:06.5613396Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5613469Z xnumel = 1 2023-01-11T21:38:06.5613559Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5613686Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5613768Z xmask = xindex < xnumel 2023-01-11T21:38:06.5613836Z tmp0 = 0 2023-01-11T21:38:06.5613964Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5614047Z ''') 2023-01-11T21:38:06.5614094Z 
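# Illustrative sketch (hedged; not part of the generated module): both scalar
# kernels above follow the same pointwise template seen throughout this log --
# build a flat block of indices, mask off the lanes beyond xnumel, then store:
#
#     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
#     xmask = xindex < xnumel
#     tl.store(out_ptr0 + xindex, value, xmask)
#
# With xnumel == 1, every lane of the single program instance writes the same
# scalar to offset 0, so the generated stores can safely pass mask=None.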
2023-01-11T21:38:06.5614098Z 2023-01-11T21:38:06.5614191Z async_compile.wait(globals()) 2023-01-11T21:38:06.5614261Z del async_compile 2023-01-11T21:38:06.5614266Z 2023-01-11T21:38:06.5614340Z def call(args): 2023-01-11T21:38:06.5614414Z arg0_1, = args 2023-01-11T21:38:06.5614724Z args.clear() 2023-01-11T21:38:06.5614820Z with torch.cuda.device(0): 2023-01-11T21:38:06.5615017Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5615109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5615233Z triton_fused_full_0.run(buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5615420Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5615554Z triton_fused_new_zeros_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5615636Z return (buf0, buf1, ) 2023-01-11T21:38:06.5615643Z 2023-01-11T21:38:06.5615647Z 2023-01-11T21:38:06.5615727Z if __name__ == "__main__": 2023-01-11T21:38:06.5615849Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5615976Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5616172Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5616277Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5616285Z 2023-01-11T21:38:06.5616356Z ok (0.200s) 2023-01-11T21:38:06.5616822Z test_nll_loss_forward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5616954Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5617272Z [2023-01-11 21:35:24,174] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 739 2023-01-11T21:38:06.5617562Z [2023-01-11 21:35:24,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 739 2023-01-11T21:38:06.5618033Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5618170Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5618430Z [2023-01-11 21:35:24,406] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 740 2023-01-11T21:38:06.5618644Z [2023-01-11 21:35:24,416] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.5618909Z [2023-01-11 21:35:24,493] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 740 2023-01-11T21:38:06.5618918Z 2023-01-11T21:38:06.5619011Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5619088Z import torch 2023-01-11T21:38:06.5619163Z import random 2023-01-11T21:38:06.5619285Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5619413Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5619419Z 2023-01-11T21:38:06.5619507Z aten = torch.ops.aten 2023-01-11T21:38:06.5619646Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5619742Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5619747Z 2023-01-11T21:38:06.5619816Z import triton 2023-01-11T21:38:06.5619909Z import triton.language as tl 2023-01-11T21:38:06.5620035Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5620175Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5620181Z 2023-01-11T21:38:06.5620185Z 2023-01-11T21:38:06.5620441Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.5620519Z import triton 2023-01-11T21:38:06.5620612Z import triton.language as tl 2023-01-11T21:38:06.5620721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5620828Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5620961Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5621088Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5621093Z 2023-01-11T21:38:06.5621182Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.5621300Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5621388Z filename=__file__, 2023-01-11T21:38:06.5621776Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5621848Z @triton.jit 2023-01-11T21:38:06.5622032Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5622106Z xnumel = 1 2023-01-11T21:38:06.5622180Z rnumel = 5 2023-01-11T21:38:06.5622279Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5622422Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5622508Z xmask = xindex < xnumel 2023-01-11T21:38:06.5622627Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5622741Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5622848Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5622939Z rindex = roffset + rbase 2023-01-11T21:38:06.5623028Z rmask = rindex < rnumel 2023-01-11T21:38:06.5623103Z r0 = rindex 2023-01-11T21:38:06.5623208Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5623321Z tmp1 = tl.load(in_ptr1 + (tmp0 + (5*r0)), rmask) 2023-01-11T21:38:06.5623426Z tmp2 = -tmp1 
2023-01-11T21:38:06.5623550Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.5623669Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5623742Z tmp4 = 5 2023-01-11T21:38:06.5623824Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.5623991Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, None) 2023-01-11T21:38:06.5624080Z ''') 2023-01-11T21:38:06.5624086Z 2023-01-11T21:38:06.5624091Z 2023-01-11T21:38:06.5624241Z triton_fused_full_1 = async_compile.triton(''' 2023-01-11T21:38:06.5624319Z import triton 2023-01-11T21:38:06.5624413Z import triton.language as tl 2023-01-11T21:38:06.5624529Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5624634Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5624769Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5624897Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5624904Z 2023-01-11T21:38:06.5625293Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5625363Z @triton.jit 2023-01-11T21:38:06.5625487Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5625560Z xnumel = 1 2023-01-11T21:38:06.5625676Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5625822Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5625919Z xmask = xindex < xnumel 2023-01-11T21:38:06.5625994Z tmp0 = 5.0 2023-01-11T21:38:06.5626122Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5626208Z ''') 2023-01-11T21:38:06.5626213Z 2023-01-11T21:38:06.5626218Z 2023-01-11T21:38:06.5626313Z async_compile.wait(globals()) 2023-01-11T21:38:06.5626390Z del async_compile 2023-01-11T21:38:06.5626426Z 2023-01-11T21:38:06.5626503Z def call(args): 2023-01-11T21:38:06.5626584Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5626661Z args.clear() 2023-01-11T21:38:06.5626749Z with torch.cuda.device(0): 2023-01-11T21:38:06.5626941Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5627037Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5627130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5627308Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0.run(buf1, arg1_1, arg0_1, 1, 5, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5627384Z del arg0_1 2023-01-11T21:38:06.5627459Z del arg1_1 2023-01-11T21:38:06.5627651Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5627778Z triton_fused_full_1.run(buf2, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5627864Z return (buf1, buf2, ) 2023-01-11T21:38:06.5627869Z 2023-01-11T21:38:06.5627877Z 2023-01-11T21:38:06.5627960Z if __name__ == "__main__": 2023-01-11T21:38:06.5628080Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5628212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5628414Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5628613Z arg1_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5628733Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5628738Z 2023-01-11T21:38:06.5628743Z 2023-01-11T21:38:06.5628836Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5628912Z import torch 
2023-01-11T21:38:06.5628987Z import random 2023-01-11T21:38:06.5629108Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5629231Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5629236Z 2023-01-11T21:38:06.5629319Z aten = torch.ops.aten 2023-01-11T21:38:06.5629458Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5629559Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5629564Z 2023-01-11T21:38:06.5629634Z import triton 2023-01-11T21:38:06.5629728Z import triton.language as tl 2023-01-11T21:38:06.5629854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5630026Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5630032Z 2023-01-11T21:38:06.5630036Z 2023-01-11T21:38:06.5630248Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0 = async_compile.triton(''' 2023-01-11T21:38:06.5630325Z import triton 2023-01-11T21:38:06.5630417Z import triton.language as tl 2023-01-11T21:38:06.5630526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5630629Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5630761Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5630887Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5630892Z 2023-01-11T21:38:06.5630986Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.5631103Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5631191Z filename=__file__, 2023-01-11T21:38:06.5631569Z meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5631639Z @triton.jit 2023-01-11T21:38:06.5631818Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5631892Z xnumel = 1 2023-01-11T21:38:06.5631966Z rnumel = 5 2023-01-11T21:38:06.5632065Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5632204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5632290Z xmask = xindex < xnumel 2023-01-11T21:38:06.5632409Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5632556Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5632662Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5632752Z rindex = roffset + rbase 2023-01-11T21:38:06.5632839Z rmask = rindex < rnumel 2023-01-11T21:38:06.5632912Z r0 = rindex 2023-01-11T21:38:06.5633018Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5633148Z tmp1 = tl.load(in_ptr1 + (tmp0 + (5*r0)), rmask).to(tl.float32) 2023-01-11T21:38:06.5633250Z tmp2 = -tmp1 2023-01-11T21:38:06.5633344Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.5633468Z _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4) 2023-01-11T21:38:06.5633583Z tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5633656Z tmp5 = 5 2023-01-11T21:38:06.5633738Z tmp6 = tmp4 / tmp5 2023-01-11T21:38:06.5633821Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.5633959Z tl.store(out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp7, None) 2023-01-11T21:38:06.5634048Z ''') 2023-01-11T21:38:06.5634054Z 2023-01-11T21:38:06.5634058Z 2023-01-11T21:38:06.5634216Z triton_fused_full_1 = async_compile.triton(''' 2023-01-11T21:38:06.5634291Z import triton 2023-01-11T21:38:06.5634384Z import triton.language as tl 
2023-01-11T21:38:06.5634499Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5634606Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5634735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5634862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5634867Z 2023-01-11T21:38:06.5635257Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.5635344Z @triton.jit 2023-01-11T21:38:06.5635483Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5635579Z xnumel = 1 2023-01-11T21:38:06.5635683Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5635814Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5635892Z xmask = xindex < xnumel 2023-01-11T21:38:06.5635965Z tmp0 = 5.0 2023-01-11T21:38:06.5636129Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.5636218Z ''') 2023-01-11T21:38:06.5636223Z 2023-01-11T21:38:06.5636228Z 2023-01-11T21:38:06.5636323Z async_compile.wait(globals()) 2023-01-11T21:38:06.5636401Z del async_compile 2023-01-11T21:38:06.5636406Z 2023-01-11T21:38:06.5636483Z def call(args): 2023-01-11T21:38:06.5636569Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5636640Z args.clear() 2023-01-11T21:38:06.5636735Z with torch.cuda.device(0): 2023-01-11T21:38:06.5636927Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5637022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5637204Z triton_fused_gather_mean_neg_squeeze_unsqueeze_0.run(arg1_1, arg0_1, buf1, 1, 5, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5637283Z del arg0_1 2023-01-11T21:38:06.5637357Z del arg1_1 2023-01-11T21:38:06.5637542Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5637676Z triton_fused_full_1.run(buf2, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5637762Z return (buf1, buf2, ) 2023-01-11T21:38:06.5637768Z 2023-01-11T21:38:06.5637772Z 2023-01-11T21:38:06.5637854Z if __name__ == "__main__": 2023-01-11T21:38:06.5637972Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5638102Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5638307Z arg0_1 = rand_strided((5, 5), (5, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5638504Z arg1_1 = rand_strided((5, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5638620Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5638651Z 2023-01-11T21:38:06.5638731Z ok (0.370s) 2023-01-11T21:38:06.5639211Z test_no_mega_fusion_during_lowering_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5639346Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5639608Z [2023-01-11 21:35:24,674] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 741 2023-01-11T21:38:06.5639614Z 2023-01-11T21:38:06.5639717Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5639793Z import torch 2023-01-11T21:38:06.5639872Z import random 2023-01-11T21:38:06.5639995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5640118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5640123Z 2023-01-11T21:38:06.5640206Z aten = torch.ops.aten 2023-01-11T21:38:06.5640344Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5640445Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5640452Z 2023-01-11T21:38:06.5640528Z import triton 2023-01-11T21:38:06.5640623Z import triton.language as tl 2023-01-11T21:38:06.5640749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5640883Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5640897Z 2023-01-11T21:38:06.5640901Z 2023-01-11T21:38:06.5641126Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_add_8_0 = async_compile.triton(''' 2023-01-11T21:38:06.5641203Z import triton 2023-01-11T21:38:06.5641295Z import triton.language as tl 2023-01-11T21:38:06.5641411Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5641519Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5641654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5641782Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5641786Z 2023-01-11T21:38:06.5642343Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: '*fp32', 10: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), equal_to_1=())]}) 2023-01-11T21:38:06.5642420Z @triton.jit 2023-01-11T21:38:06.5642613Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, in_ptr8, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5642688Z xnumel = 64 2023-01-11T21:38:06.5642788Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5642920Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5643010Z xmask = xindex < xnumel 2023-01-11T21:38:06.5643083Z x0 = xindex 2023-01-11T21:38:06.5643182Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5643275Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5643372Z tmp4 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5643472Z tmp6 = tl.load(in_ptr3 + (x0), xmask) 2023-01-11T21:38:06.5643568Z tmp8 = tl.load(in_ptr4 + (x0), xmask) 2023-01-11T21:38:06.5643668Z tmp10 = tl.load(in_ptr5 + (x0), xmask) 2023-01-11T21:38:06.5643767Z tmp12 = tl.load(in_ptr6 + (x0), xmask) 2023-01-11T21:38:06.5643865Z tmp14 = tl.load(in_ptr7 + (x0), xmask) 2023-01-11T21:38:06.5643956Z tmp16 = tl.load(in_ptr8 + (x0), xmask) 2023-01-11T21:38:06.5644036Z tmp1 = tmp0 + tmp0 2023-01-11T21:38:06.5644114Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5644191Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.5644270Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.5644349Z tmp9 = 
tmp7 + tmp8 2023-01-11T21:38:06.5644452Z tmp11 = tmp9 + tmp10 2023-01-11T21:38:06.5644534Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.5644616Z tmp15 = tmp13 + tmp14 2023-01-11T21:38:06.5644696Z tmp17 = tmp15 + tmp16 2023-01-11T21:38:06.5644836Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.5644931Z ''') 2023-01-11T21:38:06.5644937Z 2023-01-11T21:38:06.5644941Z 2023-01-11T21:38:06.5645190Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.5645268Z import triton 2023-01-11T21:38:06.5645357Z import triton.language as tl 2023-01-11T21:38:06.5645473Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5645579Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5645723Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5645871Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5645876Z 2023-01-11T21:38:06.5646426Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.5646506Z @triton.jit 2023-01-11T21:38:06.5646697Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, in_ptr5, in_ptr6, in_ptr7, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5646774Z xnumel = 64 2023-01-11T21:38:06.5646867Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5647002Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5647085Z xmask = xindex < xnumel 2023-01-11T21:38:06.5647157Z x0 = xindex 2023-01-11T21:38:06.5647261Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5647358Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5647458Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5647548Z tmp5 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.5647645Z tmp7 = tl.load(in_ptr3 + (x0), xmask) 2023-01-11T21:38:06.5647740Z tmp9 = tl.load(in_ptr4 + (x0), xmask) 2023-01-11T21:38:06.5647839Z tmp11 = tl.load(in_ptr5 + (x0), xmask) 2023-01-11T21:38:06.5647968Z tmp13 = tl.load(in_ptr6 + (x0), xmask) 2023-01-11T21:38:06.5648064Z tmp15 = tl.load(in_ptr7 + (x0), xmask) 2023-01-11T21:38:06.5648146Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5648217Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5648293Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.5648368Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.5648444Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.5648525Z tmp12 = tmp10 + tmp11 2023-01-11T21:38:06.5648603Z tmp14 = tmp12 + tmp13 2023-01-11T21:38:06.5648675Z tmp16 = tmp14 + tmp15 2023-01-11T21:38:06.5648816Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.5648906Z ''') 2023-01-11T21:38:06.5648911Z 2023-01-11T21:38:06.5648915Z 2023-01-11T21:38:06.5649075Z triton_fused_add_49_2 = async_compile.triton(''' 2023-01-11T21:38:06.5649151Z import triton 2023-01-11T21:38:06.5649244Z import triton.language as tl 2023-01-11T21:38:06.5649357Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5649460Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5649590Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5649716Z from torch._inductor.utils import instance_descriptor 
2023-01-11T21:38:06.5649721Z 2023-01-11T21:38:06.5650134Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5650206Z @triton.jit 2023-01-11T21:38:06.5650340Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5650444Z xnumel = 64 2023-01-11T21:38:06.5650542Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5650671Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5650746Z xmask = xindex < xnumel 2023-01-11T21:38:06.5650815Z x0 = xindex 2023-01-11T21:38:06.5650922Z tmp0 = tl.load(in_out_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5651015Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5651096Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5651235Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5651321Z ''') 2023-01-11T21:38:06.5651327Z 2023-01-11T21:38:06.5651331Z 2023-01-11T21:38:06.5651415Z async_compile.wait(globals()) 2023-01-11T21:38:06.5651492Z del async_compile 2023-01-11T21:38:06.5651497Z 2023-01-11T21:38:06.5651569Z def call(args): 2023-01-11T21:38:06.5651890Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1 = args 2023-01-11T21:38:06.5651974Z args.clear() 2023-01-11T21:38:06.5652065Z with torch.cuda.device(0): 2023-01-11T21:38:06.5652268Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5652361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5652573Z triton_fused_add_add_1_add_2_add_3_add_4_add_5_add_6_add_7_add_8_0.run(arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5652639Z del arg0_1 2023-01-11T21:38:06.5652711Z del arg1_1 2023-01-11T21:38:06.5652782Z del arg2_1 2023-01-11T21:38:06.5652860Z del arg3_1 2023-01-11T21:38:06.5652933Z del arg4_1 2023-01-11T21:38:06.5653002Z del arg5_1 2023-01-11T21:38:06.5653071Z del arg6_1 2023-01-11T21:38:06.5653134Z del arg7_1 2023-01-11T21:38:06.5653204Z del arg8_1 2023-01-11T21:38:06.5653293Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.5653536Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5653615Z del arg10_1 2023-01-11T21:38:06.5653691Z del arg11_1 2023-01-11T21:38:06.5653763Z del arg12_1 2023-01-11T21:38:06.5653829Z del arg13_1 2023-01-11T21:38:06.5653901Z del arg14_1 2023-01-11T21:38:06.5653974Z del arg15_1 2023-01-11T21:38:06.5654044Z del arg16_1 2023-01-11T21:38:06.5654115Z del arg9_1 2023-01-11T21:38:06.5654203Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.5654428Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf2, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5654610Z del arg17_1 
2023-01-11T21:38:06.5654687Z del arg18_1 2023-01-11T21:38:06.5654760Z del arg19_1 2023-01-11T21:38:06.5654830Z del arg20_1 2023-01-11T21:38:06.5654899Z del arg21_1 2023-01-11T21:38:06.5654966Z del arg22_1 2023-01-11T21:38:06.5655034Z del arg23_1 2023-01-11T21:38:06.5655097Z del arg24_1 2023-01-11T21:38:06.5655185Z buf3 = buf2; del buf2 # reuse 2023-01-11T21:38:06.5655398Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf3, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5655469Z del arg25_1 2023-01-11T21:38:06.5655541Z del arg26_1 2023-01-11T21:38:06.5655654Z del arg27_1 2023-01-11T21:38:06.5655726Z del arg28_1 2023-01-11T21:38:06.5655789Z del arg29_1 2023-01-11T21:38:06.5655859Z del arg30_1 2023-01-11T21:38:06.5655929Z del arg31_1 2023-01-11T21:38:06.5656000Z del arg32_1 2023-01-11T21:38:06.5656085Z buf4 = buf3; del buf3 # reuse 2023-01-11T21:38:06.5656304Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf4, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5656375Z del arg33_1 2023-01-11T21:38:06.5656439Z del arg34_1 2023-01-11T21:38:06.5656507Z del arg35_1 2023-01-11T21:38:06.5656577Z del arg36_1 2023-01-11T21:38:06.5656647Z del arg37_1 2023-01-11T21:38:06.5656717Z del arg38_1 2023-01-11T21:38:06.5656786Z del arg39_1 2023-01-11T21:38:06.5656856Z del arg40_1 2023-01-11T21:38:06.5656942Z buf5 = buf4; del buf4 # reuse 2023-01-11T21:38:06.5657209Z triton_fused_add_add_1_add_10_add_11_add_12_add_13_add_14_add_15_add_16_add_2_1.run(buf5, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5657282Z del arg41_1 2023-01-11T21:38:06.5657354Z del arg42_1 2023-01-11T21:38:06.5657437Z del arg43_1 2023-01-11T21:38:06.5657515Z del arg44_1 2023-01-11T21:38:06.5657591Z del arg45_1 2023-01-11T21:38:06.5657655Z del arg46_1 2023-01-11T21:38:06.5657727Z del arg47_1 2023-01-11T21:38:06.5657800Z del arg48_1 2023-01-11T21:38:06.5657888Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.5658029Z triton_fused_add_49_2.run(buf6, arg49_1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5658102Z del arg49_1 2023-01-11T21:38:06.5658178Z return (buf6, ) 2023-01-11T21:38:06.5658184Z 2023-01-11T21:38:06.5658188Z 2023-01-11T21:38:06.5658265Z if __name__ == "__main__": 2023-01-11T21:38:06.5658383Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5658512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5658729Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5658989Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659213Z arg2_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659433Z arg3_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659649Z arg4_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5659858Z arg5_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660074Z arg6_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660288Z arg7_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660509Z arg8_1 = rand_strided((64, ), (1, ), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660722Z arg9_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5660947Z arg10_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661170Z arg11_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661390Z arg12_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661604Z arg13_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5661820Z arg14_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662035Z arg15_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662252Z arg16_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662504Z arg17_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662719Z arg18_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5662935Z arg19_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663154Z arg20_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663362Z arg21_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663574Z arg22_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5663788Z arg23_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664002Z arg24_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664217Z arg25_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664431Z arg26_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664646Z arg27_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5664858Z arg28_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665077Z arg29_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665283Z arg30_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665504Z arg31_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665753Z arg32_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5665975Z arg33_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666192Z arg34_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666405Z arg35_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666621Z arg36_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5666831Z arg37_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667037Z arg38_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667278Z arg39_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667491Z arg40_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667705Z arg41_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5667918Z arg42_1 = 
rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668132Z arg43_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668345Z arg44_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668561Z arg45_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668768Z arg46_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5668980Z arg47_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669196Z arg48_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669411Z arg49_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5669829Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1, arg6_1, arg7_1, arg8_1, arg9_1, arg10_1, arg11_1, arg12_1, arg13_1, arg14_1, arg15_1, arg16_1, arg17_1, arg18_1, arg19_1, arg20_1, arg21_1, arg22_1, arg23_1, arg24_1, arg25_1, arg26_1, arg27_1, arg28_1, arg29_1, arg30_1, arg31_1, arg32_1, arg33_1, arg34_1, arg35_1, arg36_1, arg37_1, arg38_1, arg39_1, arg40_1, arg41_1, arg42_1, arg43_1, arg44_1, arg45_1, arg46_1, arg47_1, arg48_1, arg49_1])) 2023-01-11T21:38:06.5670164Z [2023-01-11 21:35:24,964] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 741 2023-01-11T21:38:06.5670170Z 2023-01-11T21:38:06.5670259Z --> 7 2023-01-11T21:38:06.5670334Z ok (0.487s) 2023-01-11T21:38:06.5670885Z test_no_op_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5671028Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5671322Z [2023-01-11 21:35:24,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 742 2023-01-11T21:38:06.5671618Z [2023-01-11 21:35:25,068] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 742 2023-01-11T21:38:06.5672116Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5672257Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5672553Z [2023-01-11 21:35:25,087] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 743 2023-01-11T21:38:06.5672854Z [2023-01-11 21:35:25,156] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 743 2023-01-11T21:38:06.5672860Z 2023-01-11T21:38:06.5672962Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5673039Z import torch 2023-01-11T21:38:06.5673112Z import random 2023-01-11T21:38:06.5673244Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5673371Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5673377Z 2023-01-11T21:38:06.5673457Z aten = torch.ops.aten 2023-01-11T21:38:06.5673593Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5673712Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5673718Z 2023-01-11T21:38:06.5673795Z import triton 2023-01-11T21:38:06.5673887Z import triton.language as tl 2023-01-11T21:38:06.5674013Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5674145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5674160Z 2023-01-11T21:38:06.5674165Z 2023-01-11T21:38:06.5674323Z triton_fused_amax_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5674397Z import triton 2023-01-11T21:38:06.5674488Z import triton.language as tl 2023-01-11T21:38:06.5674602Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5674705Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5674840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5674964Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5674969Z 2023-01-11T21:38:06.5675413Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5675483Z @triton.jit 2023-01-11T21:38:06.5675648Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5675721Z xnumel = 8 2023-01-11T21:38:06.5675818Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5675947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5676029Z xmask = xindex < xnumel 2023-01-11T21:38:06.5676101Z x0 = xindex 2023-01-11T21:38:06.5676286Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5676413Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5676484Z tmp2 = 1 2023-01-11T21:38:06.5676562Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5676697Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5676833Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5676915Z ''') 2023-01-11T21:38:06.5676920Z 2023-01-11T21:38:06.5676925Z 2023-01-11T21:38:06.5677018Z async_compile.wait(globals()) 2023-01-11T21:38:06.5677088Z del async_compile 2023-01-11T21:38:06.5677093Z 2023-01-11T21:38:06.5677166Z def call(args): 2023-01-11T21:38:06.5677238Z arg0_1, = args 2023-01-11T21:38:06.5677313Z args.clear() 2023-01-11T21:38:06.5677407Z with torch.cuda.device(0): 2023-01-11T21:38:06.5677608Z buf0 = 
empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5677815Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5677904Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5678046Z triton_fused_amax_sum_1_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5678118Z del arg0_1 2023-01-11T21:38:06.5678200Z return (buf0, buf1, ) 2023-01-11T21:38:06.5678209Z 2023-01-11T21:38:06.5678213Z 2023-01-11T21:38:06.5678291Z if __name__ == "__main__": 2023-01-11T21:38:06.5678407Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5678532Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5678738Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5678844Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5678849Z 2023-01-11T21:38:06.5678858Z 2023-01-11T21:38:06.5678949Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5679022Z import torch 2023-01-11T21:38:06.5679093Z import random 2023-01-11T21:38:06.5679217Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5679341Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5679346Z 2023-01-11T21:38:06.5679431Z aten = torch.ops.aten 2023-01-11T21:38:06.5679566Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5679683Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5679688Z 2023-01-11T21:38:06.5679762Z import triton 2023-01-11T21:38:06.5679852Z import triton.language as tl 2023-01-11T21:38:06.5679976Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5680117Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5680122Z 2023-01-11T21:38:06.5680127Z 2023-01-11T21:38:06.5680292Z triton_fused_amax_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5680365Z import triton 2023-01-11T21:38:06.5680449Z import triton.language as tl 2023-01-11T21:38:06.5680562Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5680668Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5680800Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5680924Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5680929Z 2023-01-11T21:38:06.5681347Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5681420Z @triton.jit 2023-01-11T21:38:06.5681563Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5681635Z xnumel = 8 2023-01-11T21:38:06.5681725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5681850Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5681934Z xmask = xindex < xnumel 2023-01-11T21:38:06.5682004Z x0 = xindex 2023-01-11T21:38:06.5682246Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5682364Z tmp1 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5682434Z tmp2 = 1 2023-01-11T21:38:06.5682507Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.5682648Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.5682779Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), 
tmp3, xmask) 2023-01-11T21:38:06.5682863Z ''') 2023-01-11T21:38:06.5682869Z 2023-01-11T21:38:06.5682873Z 2023-01-11T21:38:06.5682965Z async_compile.wait(globals()) 2023-01-11T21:38:06.5683041Z del async_compile 2023-01-11T21:38:06.5683047Z 2023-01-11T21:38:06.5683124Z def call(args): 2023-01-11T21:38:06.5683190Z arg0_1, = args 2023-01-11T21:38:06.5683263Z args.clear() 2023-01-11T21:38:06.5683355Z with torch.cuda.device(0): 2023-01-11T21:38:06.5683554Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5683759Z buf1 = empty_strided((8, 1, 1), (1, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5683849Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5683993Z triton_fused_amax_sum_1_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.5684067Z del arg0_1 2023-01-11T21:38:06.5684146Z return (buf0, buf1, ) 2023-01-11T21:38:06.5684151Z 2023-01-11T21:38:06.5684155Z 2023-01-11T21:38:06.5684236Z if __name__ == "__main__": 2023-01-11T21:38:06.5684355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5684482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5684687Z arg0_1 = rand_strided((8, 1, 1), (1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5684799Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5684804Z 2023-01-11T21:38:06.5684874Z ok (0.175s) 2023-01-11T21:38:06.5685390Z test_output_strides_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5685465Z warnings.warn( 2023-01-11T21:38:06.5685750Z [2023-01-11 21:35:25,171] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 744 2023-01-11T21:38:06.5686014Z [2023-01-11 21:35:25,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 744 2023-01-11T21:38:06.5686268Z [2023-01-11 21:35:25,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 745 2023-01-11T21:38:06.5686526Z [2023-01-11 21:35:25,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 745 2023-01-11T21:38:06.5687038Z /opt/conda/lib/python3.10/site-packages/torch/cuda/graphs.py:82: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAGraph.cpp:192.) 2023-01-11T21:38:06.5687145Z super(CUDAGraph, self).capture_end() 2023-01-11T21:38:06.5687397Z [2023-01-11 21:35:25,681] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 746 2023-01-11T21:38:06.5687658Z [2023-01-11 21:35:25,685] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 746 2023-01-11T21:38:06.5688167Z /opt/conda/lib/python3.10/site-packages/torch/cuda/graphs.py:82: UserWarning: The CUDA Graph is empty. This ususally means that the graph was attempted to be captured on wrong device or stream. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAGraph.cpp:192.) 2023-01-11T21:38:06.5688271Z super(CUDAGraph, self).capture_end() 2023-01-11T21:38:06.5688681Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:3148: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5688828Z self.assertEqual(inp.storage(), out.storage()) 2023-01-11T21:38:06.5689462Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:1904: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5689556Z device=typed_storage.device, 2023-01-11T21:38:06.5689562Z 2023-01-11T21:38:06.5689661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5689736Z import torch 2023-01-11T21:38:06.5689813Z import random 2023-01-11T21:38:06.5689933Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5690061Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5690067Z 2023-01-11T21:38:06.5690142Z aten = torch.ops.aten 2023-01-11T21:38:06.5690278Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5690374Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5690379Z 2023-01-11T21:38:06.5690455Z import triton 2023-01-11T21:38:06.5690548Z import triton.language as tl 2023-01-11T21:38:06.5690671Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5690810Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5690815Z 2023-01-11T21:38:06.5690820Z 2023-01-11T21:38:06.5690978Z triton_fused_clone_0 = async_compile.triton(''' 2023-01-11T21:38:06.5691045Z import triton 2023-01-11T21:38:06.5691136Z import triton.language as tl 2023-01-11T21:38:06.5691248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5691349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5691485Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5691610Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5691615Z 2023-01-11T21:38:06.5692103Z @pointwise(size_hints=[64, 4], tile_hint=TileHint.SQUARE,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5692178Z @triton.jit 2023-01-11T21:38:06.5692338Z def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.5692414Z xnumel = 64 2023-01-11T21:38:06.5692485Z ynumel = 4 2023-01-11T21:38:06.5692581Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5692717Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5692800Z xmask = xindex < xnumel 2023-01-11T21:38:06.5692896Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.5693024Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.5693106Z ymask = yindex < ynumel 2023-01-11T21:38:06.5693181Z x0 = xindex % 16 2023-01-11T21:38:06.5693258Z x1 = (xindex // 16) 2023-01-11T21:38:06.5693327Z y2 = yindex 2023-01-11T21:38:06.5693400Z x3 = xindex 2023-01-11T21:38:06.5693526Z tmp0 = tl.load(in_ptr0 + (x0 + (16*y2) + (64*x1)), xmask & ymask) 2023-01-11T21:38:06.5693678Z tl.store(out_ptr0 + (y2 + (4*x3) + 
tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask) 2023-01-11T21:38:06.5693763Z ''') 2023-01-11T21:38:06.5693769Z 2023-01-11T21:38:06.5693774Z 2023-01-11T21:38:06.5693867Z async_compile.wait(globals()) 2023-01-11T21:38:06.5693945Z del async_compile 2023-01-11T21:38:06.5693950Z 2023-01-11T21:38:06.5694021Z def call(args): 2023-01-11T21:38:06.5694093Z arg0_1, = args 2023-01-11T21:38:06.5694169Z args.clear() 2023-01-11T21:38:06.5694259Z with torch.cuda.device(0): 2023-01-11T21:38:06.5694609Z buf0 = empty_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5694706Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5694846Z triton_fused_clone_0.run(arg0_1, buf0, 64, 4, grid=grid(64, 4), stream=stream0) 2023-01-11T21:38:06.5694919Z del arg0_1 2023-01-11T21:38:06.5695001Z return (buf0, ) 2023-01-11T21:38:06.5695007Z 2023-01-11T21:38:06.5695011Z 2023-01-11T21:38:06.5695090Z if __name__ == "__main__": 2023-01-11T21:38:06.5695206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5695330Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5695540Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5695653Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5695658Z 2023-01-11T21:38:06.5695663Z 2023-01-11T21:38:06.5695760Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5695843Z import torch 2023-01-11T21:38:06.5695919Z import random 2023-01-11T21:38:06.5696038Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5696160Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5696165Z 2023-01-11T21:38:06.5696249Z aten = torch.ops.aten 2023-01-11T21:38:06.5696382Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5696478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5696483Z 2023-01-11T21:38:06.5696556Z import triton 2023-01-11T21:38:06.5696649Z import triton.language as tl 2023-01-11T21:38:06.5696773Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5696910Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5696916Z 2023-01-11T21:38:06.5696920Z 2023-01-11T21:38:06.5697012Z async_compile.wait(globals()) 2023-01-11T21:38:06.5697081Z del async_compile 2023-01-11T21:38:06.5697094Z 2023-01-11T21:38:06.5697238Z def call(args): 2023-01-11T21:38:06.5697326Z arg0_1, = args 2023-01-11T21:38:06.5697415Z args.clear() 2023-01-11T21:38:06.5697523Z return (as_strided(arg0_1, (64, 4), (4, 1)), ) 2023-01-11T21:38:06.5697528Z 2023-01-11T21:38:06.5697532Z 2023-01-11T21:38:06.5697614Z if __name__ == "__main__": 2023-01-11T21:38:06.5697733Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5697915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5698130Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5698243Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5698248Z 2023-01-11T21:38:06.5698252Z 2023-01-11T21:38:06.5698364Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5698453Z import torch 2023-01-11T21:38:06.5698541Z import random 2023-01-11T21:38:06.5698673Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5698917Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5698927Z 2023-01-11T21:38:06.5699028Z aten = torch.ops.aten 2023-01-11T21:38:06.5699193Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5699317Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5699322Z 2023-01-11T21:38:06.5699397Z import triton 2023-01-11T21:38:06.5699490Z import triton.language as tl 2023-01-11T21:38:06.5699619Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5699759Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5699765Z 2023-01-11T21:38:06.5699769Z 2023-01-11T21:38:06.5699862Z async_compile.wait(globals()) 2023-01-11T21:38:06.5699940Z del async_compile 2023-01-11T21:38:06.5699945Z 2023-01-11T21:38:06.5700014Z def call(args): 2023-01-11T21:38:06.5700087Z arg0_1, = args 2023-01-11T21:38:06.5700164Z args.clear() 2023-01-11T21:38:06.5700273Z return (as_strided(arg0_1, (4, 4, 1), (4, 16, 0), 3), ) 2023-01-11T21:38:06.5700278Z 2023-01-11T21:38:06.5700282Z 2023-01-11T21:38:06.5700412Z if __name__ == "__main__": 2023-01-11T21:38:06.5700527Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5700648Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5700869Z arg0_1 = rand_strided((4, 4, 4, 4), (64, 16, 4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5700977Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5700982Z 2023-01-11T21:38:06.5701052Z ok (0.666s) 2023-01-11T21:38:06.5701182Z test_permute_bmm_fusion (__main__.CudaTests) ... ok (0.004s) 2023-01-11T21:38:06.5701643Z test_permute_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5701776Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5702036Z [2023-01-11 21:35:25,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 747 2023-01-11T21:38:06.5702305Z [2023-01-11 21:35:25,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 747 2023-01-11T21:38:06.5702718Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5702849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5703099Z [2023-01-11 21:35:25,948] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 748 2023-01-11T21:38:06.5703365Z [2023-01-11 21:35:26,025] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 748 2023-01-11T21:38:06.5703370Z 2023-01-11T21:38:06.5703463Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5703535Z import torch 2023-01-11T21:38:06.5703608Z import random 2023-01-11T21:38:06.5703802Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5703931Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5703937Z 2023-01-11T21:38:06.5704015Z aten = torch.ops.aten 2023-01-11T21:38:06.5704153Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5704242Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5704253Z 2023-01-11T21:38:06.5704321Z import triton 2023-01-11T21:38:06.5704411Z import triton.language as tl 2023-01-11T21:38:06.5704537Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5704680Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5704688Z 2023-01-11T21:38:06.5704693Z 2023-01-11T21:38:06.5704860Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5704938Z import triton 2023-01-11T21:38:06.5705030Z import triton.language as tl 2023-01-11T21:38:06.5705138Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5705239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5705374Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5705501Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5705506Z 2023-01-11T21:38:06.5705975Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5706049Z @triton.jit 2023-01-11T21:38:06.5706189Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5706263Z xnumel = 32 2023-01-11T21:38:06.5706393Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5706521Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5706604Z xmask = xindex < xnumel 2023-01-11T21:38:06.5706676Z x0 = xindex 2023-01-11T21:38:06.5706870Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5706967Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5707037Z tmp1 = 1 2023-01-11T21:38:06.5707109Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5707179Z tmp3 = 2 2023-01-11T21:38:06.5707257Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5707335Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.5707470Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5707606Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5707696Z ''') 2023-01-11T21:38:06.5707701Z 2023-01-11T21:38:06.5707706Z 2023-01-11T21:38:06.5707791Z async_compile.wait(globals()) 2023-01-11T21:38:06.5707873Z del async_compile 2023-01-11T21:38:06.5707878Z 2023-01-11T21:38:06.5707953Z def call(args): 2023-01-11T21:38:06.5708024Z arg0_1, = args 
2023-01-11T21:38:06.5708098Z args.clear() 2023-01-11T21:38:06.5708191Z with torch.cuda.device(0): 2023-01-11T21:38:06.5708414Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5708631Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5708716Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5708865Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5708938Z del arg0_1 2023-01-11T21:38:06.5709018Z return (buf0, buf1, ) 2023-01-11T21:38:06.5709023Z 2023-01-11T21:38:06.5709027Z 2023-01-11T21:38:06.5709106Z if __name__ == "__main__": 2023-01-11T21:38:06.5709225Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5709353Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5709572Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5709678Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5709683Z 2023-01-11T21:38:06.5709715Z 2023-01-11T21:38:06.5709814Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5709891Z import torch 2023-01-11T21:38:06.5709965Z import random 2023-01-11T21:38:06.5710081Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5710204Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5710209Z 2023-01-11T21:38:06.5710288Z aten = torch.ops.aten 2023-01-11T21:38:06.5710423Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5710512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5710517Z 2023-01-11T21:38:06.5710590Z import triton 2023-01-11T21:38:06.5710680Z import triton.language as tl 2023-01-11T21:38:06.5710811Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5710951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5710956Z 2023-01-11T21:38:06.5710961Z 2023-01-11T21:38:06.5711127Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5711203Z import triton 2023-01-11T21:38:06.5711288Z import triton.language as tl 2023-01-11T21:38:06.5711404Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5711506Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5711639Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5711762Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5711767Z 2023-01-11T21:38:06.5712182Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5712288Z @triton.jit 2023-01-11T21:38:06.5712431Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5712498Z xnumel = 32 2023-01-11T21:38:06.5712595Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5712725Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5712807Z xmask = xindex < xnumel 2023-01-11T21:38:06.5712876Z x0 = xindex 2023-01-11T21:38:06.5713092Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5713208Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5713278Z tmp1 = 1 2023-01-11T21:38:06.5713351Z tmp2 = tmp0 + 
tmp1 2023-01-11T21:38:06.5713421Z tmp3 = 2 2023-01-11T21:38:06.5713497Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5713576Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.5713713Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5713850Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5713928Z ''') 2023-01-11T21:38:06.5713938Z 2023-01-11T21:38:06.5713942Z 2023-01-11T21:38:06.5714028Z async_compile.wait(globals()) 2023-01-11T21:38:06.5714105Z del async_compile 2023-01-11T21:38:06.5714110Z 2023-01-11T21:38:06.5714186Z def call(args): 2023-01-11T21:38:06.5714257Z arg0_1, = args 2023-01-11T21:38:06.5714332Z args.clear() 2023-01-11T21:38:06.5714427Z with torch.cuda.device(0): 2023-01-11T21:38:06.5714646Z buf0 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5714854Z buf1 = empty_strided((2, 2, 2, 2, 2), (4, 8, 1, 16, 2), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5714947Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5715094Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.5715168Z del arg0_1 2023-01-11T21:38:06.5715248Z return (buf0, buf1, ) 2023-01-11T21:38:06.5715253Z 2023-01-11T21:38:06.5715257Z 2023-01-11T21:38:06.5715336Z if __name__ == "__main__": 2023-01-11T21:38:06.5715452Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5715576Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5715814Z arg0_1 = rand_strided((2, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5715927Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5715932Z 2023-01-11T21:38:06.5716004Z ok (0.199s) 2023-01-11T21:38:06.5716514Z test_permute_fusion (__main__.CudaTests) ... 
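Note on the pair of test_permute_cuda dumps above: the input arrives contiguous with strides (16, 8, 4, 2, 1), but both output buffers are allocated with the permuted strides (4, 8, 1, 16, 2), so the fused add kernel materializes the permutation directly instead of launching a separate transpose/copy kernel; the fp16 variant differs only in pointer types and the .to(tl.float32) upcasts around the arithmetic. A minimal sketch of the kind of pattern being compiled (the test body itself is not captured in this log, so the permutation order and constants below are assumptions chosen to be consistent with the dumped kernel):

import torch
import torch._dynamo

def fn(a):
    # One output is (permuted + 1) + 2, the other is permuted + 2,
    # mirroring tmp4 and tmp6 in the dumped Triton kernel. For a contiguous
    # (2, 2, 2, 2, 2) input this permutation yields strides (4, 8, 1, 16, 2).
    b = a.permute(2, 1, 4, 0, 3)
    return b + 1 + 2, b + 2

compiled_fn = torch._dynamo.optimize("inductor")(fn)
a = torch.randn(2, 2, 2, 2, 2, device="cuda")
out0, out1 = compiled_fn(a)
assert torch.allclose(out0, a.permute(2, 1, 4, 0, 3) + 3)
assert torch.allclose(out1, a.permute(2, 1, 4, 0, 3) + 2)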
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5716594Z warnings.warn( 2023-01-11T21:38:06.5716854Z [2023-01-11 21:35:26,082] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 749 2023-01-11T21:38:06.5717118Z [2023-01-11 21:35:26,086] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 749 2023-01-11T21:38:06.5717124Z 2023-01-11T21:38:06.5717222Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5717298Z import torch 2023-01-11T21:38:06.5717365Z import random 2023-01-11T21:38:06.5717484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5717606Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5717611Z 2023-01-11T21:38:06.5717693Z aten = torch.ops.aten 2023-01-11T21:38:06.5717830Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5717925Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5717930Z 2023-01-11T21:38:06.5718003Z import triton 2023-01-11T21:38:06.5718088Z import triton.language as tl 2023-01-11T21:38:06.5718214Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5718380Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5718386Z 2023-01-11T21:38:06.5718390Z 2023-01-11T21:38:06.5718482Z async_compile.wait(globals()) 2023-01-11T21:38:06.5718557Z del async_compile 2023-01-11T21:38:06.5718562Z 2023-01-11T21:38:06.5718637Z def call(args): 2023-01-11T21:38:06.5718732Z primals_1, primals_2 = args 2023-01-11T21:38:06.5718809Z args.clear() 2023-01-11T21:38:06.5718893Z with torch.cuda.device(0): 2023-01-11T21:38:06.5719114Z buf0 = empty_strided((1024, 160, 20), (3200, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5719266Z aten.bmm.out(as_strided(primals_1, (1024, 160, 642), (102720, 1, 160)), primals_2, out=buf0) 2023-01-11T21:38:06.5719433Z return (buf0, as_strided(primals_1, (1024, 642, 160), (102720, 160, 1)), as_strided(primals_2, (1024, 20, 642), (12840, 1, 20)), ) 2023-01-11T21:38:06.5719439Z 2023-01-11T21:38:06.5719443Z 2023-01-11T21:38:06.5719527Z if __name__ == "__main__": 2023-01-11T21:38:06.5719642Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5719767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5719998Z primals_1 = rand_strided((1024, 642, 160), (102720, 160, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5720228Z primals_2 = rand_strided((1024, 642, 20), (12840, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5720354Z print_performance(lambda: call([primals_1, primals_2])) 2023-01-11T21:38:06.5720359Z 2023-01-11T21:38:06.5720430Z ok (0.203s) 2023-01-11T21:38:06.5720561Z test_permute_linear_fusion (__main__.CudaTests) ... ok (0.004s) 2023-01-11T21:38:06.5721018Z test_pow1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5721155Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5721408Z [2023-01-11 21:35:26,381] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 750 2023-01-11T21:38:06.5721695Z [2023-01-11 21:35:26,670] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 750 2023-01-11T21:38:06.5722109Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5722240Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5722492Z [2023-01-11 21:35:26,823] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 751 2023-01-11T21:38:06.5722501Z 2023-01-11T21:38:06.5722600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5722668Z import torch 2023-01-11T21:38:06.5722746Z import random 2023-01-11T21:38:06.5722867Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5722993Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5722998Z 2023-01-11T21:38:06.5723081Z aten = torch.ops.aten 2023-01-11T21:38:06.5723220Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5723315Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5723320Z 2023-01-11T21:38:06.5723387Z import triton 2023-01-11T21:38:06.5723481Z import triton.language as tl 2023-01-11T21:38:06.5723613Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5723752Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5723757Z 2023-01-11T21:38:06.5723789Z 2023-01-11T21:38:06.5724040Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5724116Z import triton 2023-01-11T21:38:06.5724210Z import triton.language as tl 2023-01-11T21:38:06.5724324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5724421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5724553Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5724678Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5724684Z 2023-01-11T21:38:06.5725316Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: '*fp32', 9: '*fp32', 10: '*fp32', 11: '*fp32', 12: '*fp32', 13: '*fp32', 14: '*fp32', 15: '*fp32', 16: '*fp32', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.5725393Z @triton.jit 2023-01-11T21:38:06.5725646Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5725722Z xnumel = 256 2023-01-11T21:38:06.5725816Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5725944Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5726020Z xmask = xindex < xnumel 2023-01-11T21:38:06.5726090Z x0 = xindex 2023-01-11T21:38:06.5726281Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5726380Z tmp16 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5726455Z tmp1 = 1 / tmp0 2023-01-11T21:38:06.5726534Z tmp2 = tmp1 * tmp1 2023-01-11T21:38:06.5726611Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.5726684Z tmp4 = tmp3 * tmp3 2023-01-11T21:38:06.5726760Z tmp5 = tmp2 * tmp1 2023-01-11T21:38:06.5726834Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.5726907Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.5726981Z tmp8 = tmp3 * tmp1 2023-01-11T21:38:06.5727051Z tmp9 = 1 2023-01-11T21:38:06.5727129Z tmp10 = tmp0 * tmp0 2023-01-11T21:38:06.5727233Z tmp11 = tmp10 * tmp0 2023-01-11T21:38:06.5727315Z tmp12 = tmp10 * tmp10 2023-01-11T21:38:06.5727394Z tmp13 = tmp12 * tmp0 2023-01-11T21:38:06.5727472Z tmp14 = tmp11 * tmp11 2023-01-11T21:38:06.5727551Z tmp15 = tmp14 * tmp0 2023-01-11T21:38:06.5727630Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.5727701Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.5727780Z tmp19 = tmp18 * tmp18 2023-01-11T21:38:06.5727916Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5728071Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5728215Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5728362Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5728489Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5728618Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5728738Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5728867Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5728993Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5729124Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.5729254Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5729384Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.5729544Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5729673Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5729792Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5729921Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.5730009Z ''') 2023-01-11T21:38:06.5730015Z 2023-01-11T21:38:06.5730019Z 2023-01-11T21:38:06.5730116Z async_compile.wait(globals()) 2023-01-11T21:38:06.5730191Z del async_compile 2023-01-11T21:38:06.5730196Z 2023-01-11T21:38:06.5730269Z def call(args): 2023-01-11T21:38:06.5730339Z arg0_1, = args 2023-01-11T21:38:06.5730413Z args.clear() 2023-01-11T21:38:06.5730499Z with torch.cuda.device(0): 2023-01-11T21:38:06.5730703Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5730903Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731103Z buf2 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731295Z buf3 
= empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731485Z buf4 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731683Z buf5 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5731872Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732055Z buf7 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732246Z buf8 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732436Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732637Z buf10 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5732837Z buf11 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733030Z buf12 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733224Z buf13 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733455Z buf14 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733642Z buf15 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5733736Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5733985Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, buf11, buf12, buf13, buf14, buf15, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5734153Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.5734163Z 2023-01-11T21:38:06.5734167Z 2023-01-11T21:38:06.5734249Z if __name__ == "__main__": 2023-01-11T21:38:06.5734369Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5734623Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5734833Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5734944Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5734950Z 2023-01-11T21:38:06.5735210Z [2023-01-11 21:35:26,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 751 2023-01-11T21:38:06.5735223Z 2023-01-11T21:38:06.5735315Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5735390Z import torch 2023-01-11T21:38:06.5735461Z import random 2023-01-11T21:38:06.5735580Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5735702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5735753Z 2023-01-11T21:38:06.5735836Z aten = torch.ops.aten 2023-01-11T21:38:06.5735977Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5736066Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5736071Z 2023-01-11T21:38:06.5736143Z import triton 2023-01-11T21:38:06.5736241Z import triton.language as tl 2023-01-11T21:38:06.5736363Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5736503Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5736508Z 2023-01-11T21:38:06.5736514Z 2023-01-11T21:38:06.5736762Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.5736835Z import triton 
2023-01-11T21:38:06.5736928Z import triton.language as tl 2023-01-11T21:38:06.5737035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5737193Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5737329Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5737452Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5737457Z 2023-01-11T21:38:06.5738095Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: '*fp16', 8: '*fp16', 9: '*fp16', 10: '*fp16', 11: '*fp16', 12: '*fp16', 13: '*fp16', 14: '*fp16', 15: '*fp16', 16: '*fp16', 17: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17), equal_to_1=())]}) 2023-01-11T21:38:06.5738167Z @triton.jit 2023-01-11T21:38:06.5738418Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, out_ptr11, out_ptr12, out_ptr13, out_ptr14, out_ptr15, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5738492Z xnumel = 256 2023-01-11T21:38:06.5738595Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5738718Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5738801Z xmask = xindex < xnumel 2023-01-11T21:38:06.5738872Z x0 = xindex 2023-01-11T21:38:06.5739124Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5739244Z tmp16 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5739319Z tmp1 = 1 / tmp0 2023-01-11T21:38:06.5739400Z tmp2 = tmp1 * tmp1 2023-01-11T21:38:06.5739471Z tmp3 = tmp2 * tmp2 2023-01-11T21:38:06.5739549Z tmp4 = tmp3 * tmp3 2023-01-11T21:38:06.5739623Z tmp5 = tmp2 * tmp1 2023-01-11T21:38:06.5739701Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.5739777Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.5739854Z tmp8 = tmp3 * tmp1 2023-01-11T21:38:06.5739921Z tmp9 = 1 2023-01-11T21:38:06.5739992Z tmp10 = tmp0 * tmp0 2023-01-11T21:38:06.5740076Z tmp11 = tmp10 * tmp0 2023-01-11T21:38:06.5740159Z tmp12 = tmp10 * tmp10 2023-01-11T21:38:06.5740240Z tmp13 = tmp12 * tmp0 2023-01-11T21:38:06.5740321Z tmp14 = tmp11 * tmp11 2023-01-11T21:38:06.5740399Z tmp15 = tmp14 * tmp0 2023-01-11T21:38:06.5740476Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.5740548Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.5740629Z tmp19 = tmp18 * tmp18 2023-01-11T21:38:06.5740765Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5740898Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5741027Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.5741157Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.5741283Z tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.5741403Z tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5741558Z tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5741687Z tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5741812Z tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.5741946Z tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 
2023-01-11T21:38:06.5742076Z tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.5742207Z tl.store(out_ptr11 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.5742335Z tl.store(out_ptr12 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.5742457Z tl.store(out_ptr13 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5742583Z tl.store(out_ptr14 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5742707Z tl.store(out_ptr15 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask) 2023-01-11T21:38:06.5742797Z ''') 2023-01-11T21:38:06.5742802Z 2023-01-11T21:38:06.5742807Z 2023-01-11T21:38:06.5742904Z async_compile.wait(globals()) 2023-01-11T21:38:06.5742981Z del async_compile 2023-01-11T21:38:06.5742986Z 2023-01-11T21:38:06.5743060Z def call(args): 2023-01-11T21:38:06.5743133Z arg0_1, = args 2023-01-11T21:38:06.5743204Z args.clear() 2023-01-11T21:38:06.5743297Z with torch.cuda.device(0): 2023-01-11T21:38:06.5743502Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5743700Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5743894Z buf2 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744089Z buf3 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744281Z buf4 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744475Z buf5 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744659Z buf6 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5744849Z buf7 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745068Z buf8 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745260Z buf9 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745462Z buf10 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745685Z buf11 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5745903Z buf12 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746096Z buf13 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746285Z buf14 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746476Z buf15 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5746567Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5746812Z triton_fused_pow_1_pow_11_pow_12_pow_13_pow_14_pow_15_pow_16_pow_17_pow_2_pow_3_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, buf11, buf12, buf13, buf14, buf15, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5746982Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, arg0_1, buf9, buf10, buf11, buf12, buf13, buf14, buf15, ) 2023-01-11T21:38:06.5746987Z 2023-01-11T21:38:06.5746992Z 2023-01-11T21:38:06.5747071Z if __name__ == "__main__": 2023-01-11T21:38:06.5747189Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5747315Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5747516Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 
2023-01-11T21:38:06.5747661Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5747665Z 2023-01-11T21:38:06.5747736Z ok (0.753s) 2023-01-11T21:38:06.5748192Z test_pow2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5748324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5748584Z [2023-01-11 21:35:27,013] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 752 2023-01-11T21:38:06.5748844Z [2023-01-11 21:35:27,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 752 2023-01-11T21:38:06.5749261Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5749393Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5749643Z [2023-01-11 21:35:27,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 753 2023-01-11T21:38:06.5749903Z [2023-01-11 21:35:27,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 753 2023-01-11T21:38:06.5749908Z 2023-01-11T21:38:06.5750005Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5750072Z import torch 2023-01-11T21:38:06.5750145Z import random 2023-01-11T21:38:06.5750265Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5750394Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5750399Z 2023-01-11T21:38:06.5750479Z aten = torch.ops.aten 2023-01-11T21:38:06.5750615Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5750711Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5750717Z 2023-01-11T21:38:06.5750817Z import triton 2023-01-11T21:38:06.5750904Z import triton.language as tl 2023-01-11T21:38:06.5751026Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5751166Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5751171Z 2023-01-11T21:38:06.5751176Z 2023-01-11T21:38:06.5751344Z triton_fused_pow_1_pow_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5751419Z import triton 2023-01-11T21:38:06.5751510Z import triton.language as tl 2023-01-11T21:38:06.5751626Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5751722Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5751858Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5751981Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5751986Z 2023-01-11T21:38:06.5752412Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5752489Z @triton.jit 2023-01-11T21:38:06.5752630Z def triton_(in_ptr0, 
out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5752706Z xnumel = 256 2023-01-11T21:38:06.5752802Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5752924Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5753008Z xmask = xindex < xnumel 2023-01-11T21:38:06.5753081Z x0 = xindex 2023-01-11T21:38:06.5753271Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5753397Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5753469Z tmp0 = 1000 2023-01-11T21:38:06.5753570Z tmp2 = tl.libdevice.pow(tmp0, tmp1) 2023-01-11T21:38:06.5753665Z tmp4 = tl.libdevice.pow(tmp3, tmp0) 2023-01-11T21:38:06.5753799Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5753935Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5754020Z ''') 2023-01-11T21:38:06.5754026Z 2023-01-11T21:38:06.5754030Z 2023-01-11T21:38:06.5754125Z async_compile.wait(globals()) 2023-01-11T21:38:06.5754202Z del async_compile 2023-01-11T21:38:06.5754207Z 2023-01-11T21:38:06.5754279Z def call(args): 2023-01-11T21:38:06.5754352Z arg0_1, = args 2023-01-11T21:38:06.5754420Z args.clear() 2023-01-11T21:38:06.5754514Z with torch.cuda.device(0): 2023-01-11T21:38:06.5754717Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5754917Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5755009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5755158Z triton_fused_pow_1_pow_2_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5755234Z del arg0_1 2023-01-11T21:38:06.5755315Z return (buf0, buf1, ) 2023-01-11T21:38:06.5755330Z 2023-01-11T21:38:06.5755336Z 2023-01-11T21:38:06.5755426Z if __name__ == "__main__": 2023-01-11T21:38:06.5755559Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5755700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5755902Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5756013Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5756018Z 2023-01-11T21:38:06.5756023Z 2023-01-11T21:38:06.5756120Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5756194Z import torch 2023-01-11T21:38:06.5756263Z import random 2023-01-11T21:38:06.5756382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5756505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5756510Z 2023-01-11T21:38:06.5756591Z aten = torch.ops.aten 2023-01-11T21:38:06.5756752Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5756847Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5756852Z 2023-01-11T21:38:06.5756926Z import triton 2023-01-11T21:38:06.5757015Z import triton.language as tl 2023-01-11T21:38:06.5757132Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5757269Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5757274Z 2023-01-11T21:38:06.5757279Z 2023-01-11T21:38:06.5757445Z triton_fused_pow_1_pow_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.5757522Z import triton 2023-01-11T21:38:06.5757613Z import triton.language as tl 2023-01-11T21:38:06.5757728Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5757833Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5757965Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5758083Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5758088Z 2023-01-11T21:38:06.5758508Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5758582Z @triton.jit 2023-01-11T21:38:06.5758724Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5758802Z xnumel = 256 2023-01-11T21:38:06.5758898Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5759025Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5759107Z xmask = xindex < xnumel 2023-01-11T21:38:06.5759200Z x0 = xindex 2023-01-11T21:38:06.5759412Z tmp1 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5759529Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5759601Z tmp0 = 1000 2023-01-11T21:38:06.5759701Z tmp2 = tl.libdevice.pow(tmp0, tmp1) 2023-01-11T21:38:06.5759806Z tmp4 = tl.libdevice.pow(tmp3, tmp0) 2023-01-11T21:38:06.5759940Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5760064Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.5760148Z ''') 2023-01-11T21:38:06.5760154Z 2023-01-11T21:38:06.5760158Z 2023-01-11T21:38:06.5760248Z async_compile.wait(globals()) 2023-01-11T21:38:06.5760324Z del async_compile 2023-01-11T21:38:06.5760329Z 2023-01-11T21:38:06.5760401Z def call(args): 2023-01-11T21:38:06.5760473Z arg0_1, = args 2023-01-11T21:38:06.5760546Z args.clear() 2023-01-11T21:38:06.5760631Z with torch.cuda.device(0): 2023-01-11T21:38:06.5760842Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5761038Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5761130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5761283Z triton_fused_pow_1_pow_2_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.5761356Z del arg0_1 2023-01-11T21:38:06.5761437Z return (buf0, buf1, ) 2023-01-11T21:38:06.5761442Z 2023-01-11T21:38:06.5761446Z 2023-01-11T21:38:06.5761524Z if __name__ == "__main__": 2023-01-11T21:38:06.5761634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5761759Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5761959Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5762071Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5762078Z 2023-01-11T21:38:06.5762148Z ok (0.374s) 2023-01-11T21:38:06.5762650Z test_pow3_cuda (__main__.CudaTests) ... 
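Note on the test_pow1_cuda and test_pow2_cuda dumps above: for small literal integer exponents, inductor lowers Tensor ** n to a chain of multiplications (tmp10 = x*x, tmp12 = tmp10*tmp10 for x**4, and so on, with tmp1 = 1/x feeding the negative exponents), whereas the test_pow2_cuda kernels show the fallback once the exponent is no longer a small integer: both 1000 ** x and x ** 1000 go through tl.libdevice.pow. The multiply chains follow ordinary exponentiation by squaring; a small self-contained sketch of that strategy (the helper name is made up for illustration):

def pow_by_squaring(x, n):
    # x**8 becomes three multiplies (s = x*x; s = s*s; s = s*s),
    # matching tmp17/tmp18/tmp19 in the dumped kernel.
    if n == 0:
        return 1.0
    if n < 0:
        return pow_by_squaring(1.0 / x, -n)
    half = pow_by_squaring(x, n // 2)
    return half * half if n % 2 == 0 else half * half * x

assert pow_by_squaring(3.0, 8) == 3.0 ** 8
assert pow_by_squaring(2.0, -3) == 2.0 ** -3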
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5762753Z warnings.warn( 2023-01-11T21:38:06.5763013Z [2023-01-11 21:35:27,378] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 754 2023-01-11T21:38:06.5763263Z [2023-01-11 21:35:27,530] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.5763519Z [2023-01-11 21:35:27,530] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 754 2023-01-11T21:38:06.5763535Z 2023-01-11T21:38:06.5763625Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5763700Z import torch 2023-01-11T21:38:06.5763774Z import random 2023-01-11T21:38:06.5763896Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5764020Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5764025Z 2023-01-11T21:38:06.5764107Z aten = torch.ops.aten 2023-01-11T21:38:06.5764244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5764334Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5764339Z 2023-01-11T21:38:06.5764414Z import triton 2023-01-11T21:38:06.5764507Z import triton.language as tl 2023-01-11T21:38:06.5764631Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5764771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5764777Z 2023-01-11T21:38:06.5764781Z 2023-01-11T21:38:06.5764970Z triton_fused_add_lift_fresh_copy_pow_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5765043Z import triton 2023-01-11T21:38:06.5765132Z import triton.language as tl 2023-01-11T21:38:06.5765239Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5765370Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5765506Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5765656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5765661Z 2023-01-11T21:38:06.5766087Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5766162Z @triton.jit 2023-01-11T21:38:06.5766294Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5766365Z xnumel = 1 2023-01-11T21:38:06.5766455Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5766583Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5766667Z xmask = xindex < xnumel 2023-01-11T21:38:06.5766742Z tmp1 = in_ptr0 2023-01-11T21:38:06.5766821Z tmp0 = 0.12300000339746475 2023-01-11T21:38:06.5766906Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.5766989Z tmp3 = tl.sqrt(tmp2) 2023-01-11T21:38:06.5767115Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp3, None) 2023-01-11T21:38:06.5767200Z ''') 2023-01-11T21:38:06.5767205Z 2023-01-11T21:38:06.5767211Z 2023-01-11T21:38:06.5767303Z async_compile.wait(globals()) 2023-01-11T21:38:06.5767384Z del async_compile 2023-01-11T21:38:06.5767389Z 2023-01-11T21:38:06.5767465Z def call(args): 2023-01-11T21:38:06.5767537Z arg0_1, = args 2023-01-11T21:38:06.5767610Z args.clear() 2023-01-11T21:38:06.5767695Z with torch.cuda.device(0): 
2023-01-11T21:38:06.5767890Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5767985Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5768151Z triton_fused_add_lift_fresh_copy_pow_1_0.run(arg0_1.item(), buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5768226Z del arg0_1 2023-01-11T21:38:06.5768308Z return (buf0, ) 2023-01-11T21:38:06.5768313Z 2023-01-11T21:38:06.5768318Z 2023-01-11T21:38:06.5768397Z if __name__ == "__main__": 2023-01-11T21:38:06.5768514Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5768632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5768846Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5768960Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5768965Z 2023-01-11T21:38:06.5769033Z ok (0.168s) 2023-01-11T21:38:06.5769375Z test_profiler_mark_wrapper_call_cuda (__main__.CudaTests) ... STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:38:06.5769636Z [2023-01-11 21:35:27,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 755 2023-01-11T21:38:06.5769902Z [2023-01-11 21:35:27,549] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 755 2023-01-11T21:38:06.5770155Z STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:38:06.5770404Z STAGE:2023-01-11 21:35:27 2346:2346 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:38:06.5770417Z 2023-01-11T21:38:06.5770509Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5770584Z import torch 2023-01-11T21:38:06.5770662Z import random 2023-01-11T21:38:06.5770783Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5770909Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5770914Z 2023-01-11T21:38:06.5770997Z aten = torch.ops.aten 2023-01-11T21:38:06.5771141Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5771228Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5771233Z 2023-01-11T21:38:06.5771306Z import triton 2023-01-11T21:38:06.5771398Z import triton.language as tl 2023-01-11T21:38:06.5771524Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5771692Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5771698Z 2023-01-11T21:38:06.5771702Z 2023-01-11T21:38:06.5771842Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.5772053Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.5772176Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.5772279Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.5772386Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.5772452Z { 2023-01-11T21:38:06.5772554Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.5772619Z { 2023-01-11T21:38:06.5772701Z #pragma omp for 2023-01-11T21:38:06.5772790Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.5772850Z { 2023-01-11T21:38:06.5773004Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.5773143Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.5773233Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5773330Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.5773396Z } 2023-01-11T21:38:06.5773495Z 
#pragma omp for simd simdlen(4) 2023-01-11T21:38:06.5773580Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.5773645Z { 2023-01-11T21:38:06.5773734Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.5773822Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.5773909Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.5773998Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.5774065Z } 2023-01-11T21:38:06.5774125Z } 2023-01-11T21:38:06.5774189Z } 2023-01-11T21:38:06.5774274Z ''') 2023-01-11T21:38:06.5774280Z 2023-01-11T21:38:06.5774284Z 2023-01-11T21:38:06.5774376Z async_compile.wait(globals()) 2023-01-11T21:38:06.5774452Z del async_compile 2023-01-11T21:38:06.5774461Z 2023-01-11T21:38:06.5774648Z def call(args): 2023-01-11T21:38:06.5774731Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5774799Z args.clear() 2023-01-11T21:38:06.5774918Z from torch.profiler import record_function 2023-01-11T21:38:06.5775078Z with record_function('inductor_wrapper_call'): 2023-01-11T21:38:06.5775327Z buf0 = empty_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5775497Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.5775570Z del arg0_1 2023-01-11T21:38:06.5775644Z del arg1_1 2023-01-11T21:38:06.5775714Z return (buf0, ) 2023-01-11T21:38:06.5775730Z 2023-01-11T21:38:06.5775734Z 2023-01-11T21:38:06.5775807Z if __name__ == "__main__": 2023-01-11T21:38:06.5775922Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5776050Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5776251Z arg0_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5776443Z arg1_1 = rand_strided((100, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.5776560Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5776566Z 2023-01-11T21:38:06.5776633Z ok (0.023s) 2023-01-11T21:38:06.5777220Z test_rand_like_deterministic_cuda (__main__.CudaTests) ... 
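The test that just passed exercises inductor's option of wrapping the generated call() in record_function('inductor_wrapper_call'), visible in the dump above; at the time of this log that behavior is gated by an inductor config flag (profiler_mark_wrapper_call). A minimal sketch of how such a region is observed from user code, using only the public torch.profiler API:

import torch
from torch.profiler import profile, record_function

def add(a, b):
    return a + b

with profile() as prof:
    with record_function("inductor_wrapper_call"):   # same API as the wrapper
        add(torch.randn(100), torch.randn(100))

# The labelled region shows up as its own row in the profiler output,
# which is what lets a test (or a user) check that the wrapper ran.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))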
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.5777307Z warnings.warn( 2023-01-11T21:38:06.5777591Z [2023-01-11 21:35:27,625] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 756 2023-01-11T21:38:06.5777844Z [2023-01-11 21:35:27,626] torch._inductor.lowering: [WARNING] using triton random, expect difference from eager 2023-01-11T21:38:06.5778146Z [2023-01-11 21:35:27,817] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 756 2023-01-11T21:38:06.5778152Z 2023-01-11T21:38:06.5778250Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5778322Z import torch 2023-01-11T21:38:06.5778399Z import random 2023-01-11T21:38:06.5778520Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5778636Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5778649Z 2023-01-11T21:38:06.5778724Z aten = torch.ops.aten 2023-01-11T21:38:06.5778859Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5778954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5778959Z 2023-01-11T21:38:06.5779033Z import triton 2023-01-11T21:38:06.5779124Z import triton.language as tl 2023-01-11T21:38:06.5779248Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5779385Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5779545Z seed_cuda_0 = None # 12bf87036c8e625335a9db42dcf50de0c1ec952294785adced537424d5733e17 2023-01-11T21:38:06.5779551Z 2023-01-11T21:38:06.5779565Z 2023-01-11T21:38:06.5779765Z triton_fused_philox_rand_like_philox_rand_like_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5779842Z import triton 2023-01-11T21:38:06.5779935Z import triton.language as tl 2023-01-11T21:38:06.5780055Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5780160Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5780292Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5780414Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5780419Z 2023-01-11T21:38:06.5780828Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.5780904Z @triton.jit 2023-01-11T21:38:06.5781043Z def triton_(seed0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5781116Z xnumel = 1024 2023-01-11T21:38:06.5781212Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5781339Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5781451Z xmask = xindex < xnumel 2023-01-11T21:38:06.5781521Z x0 = xindex 2023-01-11T21:38:06.5781747Z tmp0 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5781880Z tmp3 = tl.load(seed0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.5781950Z tmp1 = x0 2023-01-11T21:38:06.5782039Z tmp2 = tl.rand(tmp0, tmp1) 2023-01-11T21:38:06.5782113Z tmp4 = 1024 + x0 2023-01-11T21:38:06.5782201Z tmp5 = tl.rand(tmp3, tmp4) 2023-01-11T21:38:06.5782335Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.5782460Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.5782549Z ''') 2023-01-11T21:38:06.5782555Z 2023-01-11T21:38:06.5782559Z 2023-01-11T21:38:06.5782653Z async_compile.wait(globals()) 2023-01-11T21:38:06.5782729Z del async_compile 2023-01-11T21:38:06.5782734Z 2023-01-11T21:38:06.5782806Z def call(args): 2023-01-11T21:38:06.5782881Z arg0_1, = args 2023-01-11T21:38:06.5782957Z args.clear() 2023-01-11T21:38:06.5783092Z torch.randint(2**31, size=(), dtype=torch.int64, out=seed_cuda_0) 2023-01-11T21:38:06.5783177Z with torch.cuda.device(0): 2023-01-11T21:38:06.5783377Z buf0 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5783576Z buf1 = empty_strided((1024, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5783669Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5783851Z triton_fused_philox_rand_like_philox_rand_like_1_0.run(seed_cuda_0, buf0, buf1, 1024, grid=grid(1024), stream=stream0) 2023-01-11T21:38:06.5783961Z return (buf0, buf1, ) 2023-01-11T21:38:06.5783966Z 2023-01-11T21:38:06.5783971Z 2023-01-11T21:38:06.5784050Z if __name__ == "__main__": 2023-01-11T21:38:06.5784168Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5784287Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5784488Z seed_cuda_0 = rand_strided((), (), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.5784691Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5784801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5784806Z 2023-01-11T21:38:06.5784875Z ok (0.413s) 2023-01-11T21:38:06.5785331Z test_reduction1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5785466Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5785748Z [2023-01-11 21:35:27,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 757 2023-01-11T21:38:06.5786041Z [2023-01-11 21:35:28,228] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 757 2023-01-11T21:38:06.5786455Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5786578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5786835Z [2023-01-11 21:35:28,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 758 2023-01-11T21:38:06.5786842Z 2023-01-11T21:38:06.5786940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5787013Z import torch 2023-01-11T21:38:06.5787087Z import random 2023-01-11T21:38:06.5787205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5787353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5787359Z 2023-01-11T21:38:06.5787442Z aten = torch.ops.aten 2023-01-11T21:38:06.5787571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5787665Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5787670Z 2023-01-11T21:38:06.5787745Z import triton 2023-01-11T21:38:06.5787837Z import triton.language as tl 2023-01-11T21:38:06.5787962Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5788101Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5788107Z 2023-01-11T21:38:06.5788114Z 2023-01-11T21:38:06.5788314Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5788386Z import triton 2023-01-11T21:38:06.5788471Z import triton.language as tl 2023-01-11T21:38:06.5788585Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5788686Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5788819Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5788944Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5788950Z 2023-01-11T21:38:06.5789037Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5789153Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5789231Z filename=__file__, 2023-01-11T21:38:06.5789647Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: '*i64', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.5789758Z @triton.jit 2023-01-11T21:38:06.5789958Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5790031Z xnumel = 1 2023-01-11T21:38:06.5790105Z rnumel = 3 2023-01-11T21:38:06.5790207Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5790342Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5790419Z xmask = xindex < xnumel 2023-01-11T21:38:06.5790538Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5790656Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5790842Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5790966Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5791146Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5791265Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5791387Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5791495Z _tmp6_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5791597Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5791687Z 
rindex = roffset + rbase 2023-01-11T21:38:06.5791773Z rmask = rindex < rnumel 2023-01-11T21:38:06.5791843Z r0 = rindex 2023-01-11T21:38:06.5792039Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5792141Z tmp5 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5792255Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5792382Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5792507Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5792650Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.5792774Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.5792909Z _tmp6_index = tl.where(xmask & rmask & (_tmp6 > tmp5), rindex, _tmp6_index) 2023-01-11T21:38:06.5793028Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5793173Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793299Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5793413Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793545Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5793657Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5793787Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5793883Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.5793992Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5794108Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5794202Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.5794290Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5794415Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5794546Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.5794641Z _tmp6_index_reduce = tl.reshape( 2023-01-11T21:38:06.5794749Z tl.argmin(_tmp6, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5794860Z _tmp6_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5794957Z [1, RBLOCK]) == _tmp6_index_reduce) 2023-01-11T21:38:06.5795043Z tmp6 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5795166Z tl.where(_tmp6_index_mask, _tmp6_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5795294Z tl.store(out_ptr4 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5795472Z ''') 2023-01-11T21:38:06.5795478Z 2023-01-11T21:38:06.5795482Z 2023-01-11T21:38:06.5795595Z async_compile.wait(globals()) 2023-01-11T21:38:06.5795676Z del async_compile 2023-01-11T21:38:06.5795682Z 2023-01-11T21:38:06.5795749Z def call(args): 2023-01-11T21:38:06.5795822Z arg0_1, = args 2023-01-11T21:38:06.5795900Z args.clear() 2023-01-11T21:38:06.5795995Z with torch.cuda.device(0): 2023-01-11T21:38:06.5796186Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796370Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796554Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5796733Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5796918Z buf4 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5797012Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5797196Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, 
buf1, buf2, buf3, buf4, 1, 3, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5797271Z del arg0_1 2023-01-11T21:38:06.5797374Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5797380Z 2023-01-11T21:38:06.5797384Z 2023-01-11T21:38:06.5797467Z if __name__ == "__main__": 2023-01-11T21:38:06.5797588Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5797709Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5797907Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5798019Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5798024Z 2023-01-11T21:38:06.5798289Z [2023-01-11 21:35:28,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 758 2023-01-11T21:38:06.5798295Z 2023-01-11T21:38:06.5798393Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5798466Z import torch 2023-01-11T21:38:06.5798543Z import random 2023-01-11T21:38:06.5798662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5798779Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5798789Z 2023-01-11T21:38:06.5798863Z aten = torch.ops.aten 2023-01-11T21:38:06.5799027Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5799125Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5799130Z 2023-01-11T21:38:06.5799203Z import triton 2023-01-11T21:38:06.5799293Z import triton.language as tl 2023-01-11T21:38:06.5799420Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5799559Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5799564Z 2023-01-11T21:38:06.5799569Z 2023-01-11T21:38:06.5799767Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5799834Z import triton 2023-01-11T21:38:06.5799927Z import triton.language as tl 2023-01-11T21:38:06.5800042Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5800145Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5800274Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5800396Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5800401Z 2023-01-11T21:38:06.5800491Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5800599Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5800682Z filename=__file__, 2023-01-11T21:38:06.5801098Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: '*i64', 6: 'i32', 7: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.5801171Z @triton.jit 2023-01-11T21:38:06.5801374Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5801473Z xnumel = 1 2023-01-11T21:38:06.5801546Z rnumel = 3 2023-01-11T21:38:06.5801642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5801770Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5801853Z xmask = xindex < xnumel 2023-01-11T21:38:06.5801974Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5802090Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5802277Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5802402Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], 
tl.float32) + float("inf") 2023-01-11T21:38:06.5802582Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5802690Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5802811Z _tmp6 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5802924Z _tmp6_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5803032Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5803121Z rindex = roffset + rbase 2023-01-11T21:38:06.5803207Z rmask = rindex < rnumel 2023-01-11T21:38:06.5803278Z r0 = rindex 2023-01-11T21:38:06.5803493Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5803612Z tmp5 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5803732Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5803859Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5803982Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5804126Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.5804249Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.5804386Z _tmp6_index = tl.where(xmask & rmask & (_tmp6 > tmp5), rindex, _tmp6_index) 2023-01-11T21:38:06.5804504Z _tmp6 = tl.where(xmask & rmask & (_tmp6 > tmp5), tmp5, _tmp6) 2023-01-11T21:38:06.5804617Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5804748Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5804886Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5805022Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5805133Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5805267Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5805363Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.5805466Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5805584Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5805680Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.5805771Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5805897Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5806024Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp4, None) 2023-01-11T21:38:06.5806116Z _tmp6_index_reduce = tl.reshape( 2023-01-11T21:38:06.5806219Z tl.argmin(_tmp6, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5806339Z _tmp6_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5806433Z [1, RBLOCK]) == _tmp6_index_reduce) 2023-01-11T21:38:06.5806521Z tmp6 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5806643Z tl.where(_tmp6_index_mask, _tmp6_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5806767Z tl.store(out_ptr4 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp6, None) 2023-01-11T21:38:06.5806854Z ''') 2023-01-11T21:38:06.5806860Z 2023-01-11T21:38:06.5806864Z 2023-01-11T21:38:06.5806957Z async_compile.wait(globals()) 2023-01-11T21:38:06.5807028Z del async_compile 2023-01-11T21:38:06.5807061Z 2023-01-11T21:38:06.5807137Z def call(args): 2023-01-11T21:38:06.5807212Z arg0_1, = args 2023-01-11T21:38:06.5807285Z args.clear() 2023-01-11T21:38:06.5807381Z with torch.cuda.device(0): 2023-01-11T21:38:06.5807569Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 
2023-01-11T21:38:06.5807754Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5807932Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5808116Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5808299Z buf4 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5808392Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5808572Z triton_fused_argmax_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, 1, 3, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5808644Z del arg0_1 2023-01-11T21:38:06.5808749Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.5808759Z 2023-01-11T21:38:06.5808763Z 2023-01-11T21:38:06.5808843Z if __name__ == "__main__": 2023-01-11T21:38:06.5808955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5809083Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5809284Z arg0_1 = rand_strided((3, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5809396Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5809402Z 2023-01-11T21:38:06.5809473Z ok (0.394s) 2023-01-11T21:38:06.5809929Z test_reduction2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5810063Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5810320Z [2023-01-11 21:35:28,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 759 2023-01-11T21:38:06.5810613Z [2023-01-11 21:35:28,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 759 2023-01-11T21:38:06.5811030Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5811154Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5811406Z [2023-01-11 21:35:28,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 760 2023-01-11T21:38:06.5811668Z [2023-01-11 21:35:28,687] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 760 2023-01-11T21:38:06.5811674Z 2023-01-11T21:38:06.5811775Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5811850Z import torch 2023-01-11T21:38:06.5811925Z import random 2023-01-11T21:38:06.5812046Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5812169Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5812176Z 2023-01-11T21:38:06.5812250Z aten = torch.ops.aten 2023-01-11T21:38:06.5812385Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5812479Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5812485Z 2023-01-11T21:38:06.5812556Z import triton 2023-01-11T21:38:06.5812646Z import triton.language as tl 2023-01-11T21:38:06.5812769Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5812908Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5812942Z 2023-01-11T21:38:06.5812946Z 2023-01-11T21:38:06.5813139Z triton_fused_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5813207Z import triton 2023-01-11T21:38:06.5813299Z import triton.language as tl 2023-01-11T21:38:06.5813411Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5813517Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5813648Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5813774Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5813779Z 2023-01-11T21:38:06.5813867Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5813982Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5814061Z filename=__file__, 2023-01-11T21:38:06.5814468Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5814669Z @triton.jit 2023-01-11T21:38:06.5814866Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5814941Z xnumel = 1 2023-01-11T21:38:06.5815013Z rnumel = 4 2023-01-11T21:38:06.5815112Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5815248Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5815325Z xmask = xindex < xnumel 2023-01-11T21:38:06.5815444Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5815563Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5815747Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5815873Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5815997Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5816116Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5816214Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5816301Z rindex = roffset + rbase 2023-01-11T21:38:06.5816386Z 
rmask = rindex < rnumel 2023-01-11T21:38:06.5816456Z r0 = rindex 2023-01-11T21:38:06.5816693Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5816796Z tmp4 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5816918Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5817037Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5817213Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5817377Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 > tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5817508Z _tmp5 = tl.where(xmask & rmask & (_tmp5 > tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5817629Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5817764Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5817878Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818008Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5818114Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818241Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5818337Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5818447Z tl.argmin(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5818568Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5818667Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5818754Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5818871Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5818998Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5819121Z ''') 2023-01-11T21:38:06.5819126Z 2023-01-11T21:38:06.5819131Z 2023-01-11T21:38:06.5819224Z async_compile.wait(globals()) 2023-01-11T21:38:06.5819302Z del async_compile 2023-01-11T21:38:06.5819307Z 2023-01-11T21:38:06.5819381Z def call(args): 2023-01-11T21:38:06.5819455Z arg0_1, = args 2023-01-11T21:38:06.5819530Z args.clear() 2023-01-11T21:38:06.5819615Z with torch.cuda.device(0): 2023-01-11T21:38:06.5819802Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5819986Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5820168Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5820354Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5820445Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5820617Z triton_fused_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5820686Z del arg0_1 2023-01-11T21:38:06.5820782Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5820787Z 2023-01-11T21:38:06.5820791Z 2023-01-11T21:38:06.5820871Z if __name__ == "__main__": 2023-01-11T21:38:06.5820992Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5821121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5821315Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5821427Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5821433Z 2023-01-11T21:38:06.5821437Z 2023-01-11T21:38:06.5821533Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5821609Z import torch 2023-01-11T21:38:06.5821677Z import random 2023-01-11T21:38:06.5821796Z from torch import empty_strided, 
as_strided, device 2023-01-11T21:38:06.5821919Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5821927Z 2023-01-11T21:38:06.5822011Z aten = torch.ops.aten 2023-01-11T21:38:06.5822146Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5822241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5822246Z 2023-01-11T21:38:06.5822318Z import triton 2023-01-11T21:38:06.5822404Z import triton.language as tl 2023-01-11T21:38:06.5822557Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5822702Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5822707Z 2023-01-11T21:38:06.5822712Z 2023-01-11T21:38:06.5822898Z triton_fused_argmin_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5822974Z import triton 2023-01-11T21:38:06.5823065Z import triton.language as tl 2023-01-11T21:38:06.5823178Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5823281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5823404Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5823529Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5823534Z 2023-01-11T21:38:06.5823620Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5823736Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5823820Z filename=__file__, 2023-01-11T21:38:06.5824227Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5824301Z @triton.jit 2023-01-11T21:38:06.5824496Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5824562Z xnumel = 1 2023-01-11T21:38:06.5824632Z rnumel = 4 2023-01-11T21:38:06.5824728Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5824863Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5824974Z xmask = xindex < xnumel 2023-01-11T21:38:06.5825092Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5825208Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5825384Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5825511Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5825649Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5825778Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5825907Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5825994Z rindex = roffset + rbase 2023-01-11T21:38:06.5826079Z rmask = rindex < rnumel 2023-01-11T21:38:06.5826143Z r0 = rindex 2023-01-11T21:38:06.5826362Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5826483Z tmp4 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5826603Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5826731Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5826854Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5826999Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 > tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5827117Z _tmp5 = tl.where(xmask & rmask & (_tmp5 > tmp4), tmp4, _tmp5) 
2023-01-11T21:38:06.5827223Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827356Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5827468Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827600Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5827710Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5827843Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5827937Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5828047Z tl.argmin(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5828161Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5828284Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5828377Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5828503Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5828631Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5828716Z ''') 2023-01-11T21:38:06.5828722Z 2023-01-11T21:38:06.5828726Z 2023-01-11T21:38:06.5828818Z async_compile.wait(globals()) 2023-01-11T21:38:06.5828889Z del async_compile 2023-01-11T21:38:06.5828901Z 2023-01-11T21:38:06.5828969Z def call(args): 2023-01-11T21:38:06.5829040Z arg0_1, = args 2023-01-11T21:38:06.5829115Z args.clear() 2023-01-11T21:38:06.5829209Z with torch.cuda.device(0): 2023-01-11T21:38:06.5829399Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829584Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829768Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5829948Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5830039Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5830208Z triton_fused_argmin_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5830281Z del arg0_1 2023-01-11T21:38:06.5830375Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5830381Z 2023-01-11T21:38:06.5830385Z 2023-01-11T21:38:06.5830465Z if __name__ == "__main__": 2023-01-11T21:38:06.5830584Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5830705Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5830944Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5831056Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5831061Z 2023-01-11T21:38:06.5831131Z ok (0.329s) 2023-01-11T21:38:06.5831590Z test_reduction3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5831722Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5831977Z [2023-01-11 21:35:28,706] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 761 2023-01-11T21:38:06.5832240Z [2023-01-11 21:35:28,804] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 761 2023-01-11T21:38:06.5832661Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5832796Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5833053Z [2023-01-11 21:35:28,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 762 2023-01-11T21:38:06.5833305Z [2023-01-11 21:35:28,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 762 2023-01-11T21:38:06.5833317Z 2023-01-11T21:38:06.5833409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5833481Z import torch 2023-01-11T21:38:06.5833558Z import random 2023-01-11T21:38:06.5833677Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5833802Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5833807Z 2023-01-11T21:38:06.5833889Z aten = torch.ops.aten 2023-01-11T21:38:06.5834025Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5834140Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5834146Z 2023-01-11T21:38:06.5834222Z import triton 2023-01-11T21:38:06.5834314Z import triton.language as tl 2023-01-11T21:38:06.5834437Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5834575Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5834580Z 2023-01-11T21:38:06.5834585Z 2023-01-11T21:38:06.5834770Z triton_fused_argmax_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5834845Z import triton 2023-01-11T21:38:06.5834936Z import triton.language as tl 2023-01-11T21:38:06.5835042Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5835146Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5835278Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5835429Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5835435Z 2023-01-11T21:38:06.5835533Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5835662Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5835747Z filename=__file__, 2023-01-11T21:38:06.5836151Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5836220Z @triton.jit 2023-01-11T21:38:06.5836412Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5836491Z xnumel = 1 2023-01-11T21:38:06.5836593Z rnumel = 4 2023-01-11T21:38:06.5836686Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5836821Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5836906Z xmask = xindex < xnumel 2023-01-11T21:38:06.5837017Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5837138Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5837325Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5837451Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5837629Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5837742Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5837847Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5837929Z rindex = roffset + rbase 2023-01-11T21:38:06.5838015Z rmask = rindex < rnumel 2023-01-11T21:38:06.5838091Z r0 = rindex 2023-01-11T21:38:06.5838288Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5838390Z tmp4 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5838511Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5838643Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5838766Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5838902Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 < tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5839025Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5839138Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839268Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5839381Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839511Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5839624Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5839747Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5839843Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5839978Z tl.argmax(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5840099Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5840195Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5840285Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5840408Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5840536Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5840617Z ''') 2023-01-11T21:38:06.5840622Z 2023-01-11T21:38:06.5840627Z 2023-01-11T21:38:06.5840719Z async_compile.wait(globals()) 2023-01-11T21:38:06.5840796Z del async_compile 2023-01-11T21:38:06.5840804Z 2023-01-11T21:38:06.5840880Z def call(args): 2023-01-11T21:38:06.5840954Z arg0_1, = args 2023-01-11T21:38:06.5841029Z args.clear() 2023-01-11T21:38:06.5841122Z with torch.cuda.device(0): 2023-01-11T21:38:06.5841302Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841488Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841673Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5841856Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5841948Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5842116Z triton_fused_argmax_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5842192Z del arg0_1 
2023-01-11T21:38:06.5842288Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5842293Z 2023-01-11T21:38:06.5842298Z 2023-01-11T21:38:06.5842401Z if __name__ == "__main__": 2023-01-11T21:38:06.5842520Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5842644Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5842841Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5842956Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5842961Z 2023-01-11T21:38:06.5842966Z 2023-01-11T21:38:06.5843061Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5843134Z import torch 2023-01-11T21:38:06.5843207Z import random 2023-01-11T21:38:06.5843321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5843444Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5843449Z 2023-01-11T21:38:06.5843531Z aten = torch.ops.aten 2023-01-11T21:38:06.5843667Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5843766Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5843771Z 2023-01-11T21:38:06.5843848Z import triton 2023-01-11T21:38:06.5843940Z import triton.language as tl 2023-01-11T21:38:06.5844064Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5844196Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5844201Z 2023-01-11T21:38:06.5844205Z 2023-01-11T21:38:06.5844393Z triton_fused_argmax_max_1_min_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.5844467Z import triton 2023-01-11T21:38:06.5844558Z import triton.language as tl 2023-01-11T21:38:06.5844671Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5844774Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5844903Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5845022Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5845033Z 2023-01-11T21:38:06.5845115Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.5845228Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5845315Z filename=__file__, 2023-01-11T21:38:06.5845718Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*i64', 5: 'i32', 6: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5845820Z @triton.jit 2023-01-11T21:38:06.5846015Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5846088Z xnumel = 1 2023-01-11T21:38:06.5846153Z rnumel = 4 2023-01-11T21:38:06.5846252Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5846387Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5846469Z xmask = xindex < xnumel 2023-01-11T21:38:06.5846588Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5846707Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.5846890Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5847015Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5847189Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5847307Z _tmp5_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5847413Z for roffset in range(0, rnumel, RBLOCK): 
2023-01-11T21:38:06.5847501Z rindex = roffset + rbase 2023-01-11T21:38:06.5847586Z rmask = rindex < rnumel 2023-01-11T21:38:06.5847660Z r0 = rindex 2023-01-11T21:38:06.5847875Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5847988Z tmp4 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5848106Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.5848232Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.5848390Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.5848533Z _tmp5_index = tl.where(xmask & rmask & (_tmp5 < tmp4), rindex, _tmp5_index) 2023-01-11T21:38:06.5848654Z _tmp5 = tl.where(xmask & rmask & (_tmp5 < tmp4), tmp4, _tmp5) 2023-01-11T21:38:06.5848770Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5848900Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5849006Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5849135Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp2, None) 2023-01-11T21:38:06.5849247Z tmp3 = tl.reshape(tl.min(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5849376Z tl.store(out_ptr2 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5849471Z _tmp5_index_reduce = tl.reshape( 2023-01-11T21:38:06.5849581Z tl.argmax(_tmp5, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5849701Z _tmp5_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5849793Z [1, RBLOCK]) == _tmp5_index_reduce) 2023-01-11T21:38:06.5849880Z tmp5 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5850003Z tl.where(_tmp5_index_mask, _tmp5_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5850134Z tl.store(out_ptr3 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp5, None) 2023-01-11T21:38:06.5850219Z ''') 2023-01-11T21:38:06.5850224Z 2023-01-11T21:38:06.5850229Z 2023-01-11T21:38:06.5850322Z async_compile.wait(globals()) 2023-01-11T21:38:06.5850399Z del async_compile 2023-01-11T21:38:06.5850404Z 2023-01-11T21:38:06.5850479Z def call(args): 2023-01-11T21:38:06.5850545Z arg0_1, = args 2023-01-11T21:38:06.5850619Z args.clear() 2023-01-11T21:38:06.5850712Z with torch.cuda.device(0): 2023-01-11T21:38:06.5850899Z buf0 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851082Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851270Z buf2 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5851453Z buf3 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5851537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5851737Z triton_fused_argmax_max_1_min_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, buf3, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5851815Z del arg0_1 2023-01-11T21:38:06.5851910Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.5851916Z 2023-01-11T21:38:06.5851920Z 2023-01-11T21:38:06.5852000Z if __name__ == "__main__": 2023-01-11T21:38:06.5852118Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5852244Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5852438Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5852544Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5852549Z 2023-01-11T21:38:06.5852624Z ok (0.233s) 2023-01-11T21:38:06.5853082Z test_reduction4_cuda (__main__.CudaTests) ... 
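All of the reduction kernels above share one scheme: walk the reduction axis in RBLOCK-sized chunks, keep running accumulators (_tmp1 for the sum, _tmp2/_tmp3 for max/min), and derive argmax/argmin by updating a parallel index accumulator under the same tl.where predicate as the value. A plain-PyTorch re-expression of that running-accumulator idea, purely illustrative (this is not inductor's code):

import torch

def blocked_reduce(x, RBLOCK=4):
    # Running accumulators, same roles as _tmp1 (sum) and _tmp5/_tmp5_index
    # (max value and its index) in the generated kernels.
    acc_sum = torch.zeros(())
    acc_max = torch.full((), float("-inf"))
    acc_idx = torch.zeros((), dtype=torch.int64)
    for roffset in range(0, x.numel(), RBLOCK):
        blk = x[roffset:roffset + RBLOCK]            # slicing plays the rmask role
        idx = torch.arange(roffset, roffset + blk.numel())
        acc_sum = acc_sum + blk.sum()
        m, pos = blk.max(), idx[blk.argmax()]
        take = m > acc_max                           # the tl.where predicate
        acc_idx = torch.where(take, pos, acc_idx)
        acc_max = torch.where(take, m, acc_max)
    return acc_sum, acc_max, acc_idx

x = torch.randn(10)
s, mx, am = blocked_reduce(x)
assert torch.allclose(s, x.sum()) and mx == x.max() and am == x.argmax()

The final tl.argmax-over-accumulators, index-mask, and sum steps in the dumps do the cross-lane version of the same selection; the loop above only shows the per-chunk update rule.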
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5853216Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5853474Z [2023-01-11 21:35:28,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 763 2023-01-11T21:38:06.5853738Z [2023-01-11 21:35:29,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 763 2023-01-11T21:38:06.5854151Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5854308Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5854678Z [2023-01-11 21:35:29,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 764 2023-01-11T21:38:06.5854942Z [2023-01-11 21:35:29,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 764 2023-01-11T21:38:06.5855354Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5855481Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5855730Z [2023-01-11 21:35:29,181] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 765 2023-01-11T21:38:06.5855748Z 2023-01-11T21:38:06.5855839Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5855913Z import torch 2023-01-11T21:38:06.5855990Z import random 2023-01-11T21:38:06.5856109Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5856234Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5856239Z 2023-01-11T21:38:06.5856320Z aten = torch.ops.aten 2023-01-11T21:38:06.5856458Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5856547Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5856552Z 2023-01-11T21:38:06.5856626Z import triton 2023-01-11T21:38:06.5856717Z import triton.language as tl 2023-01-11T21:38:06.5856842Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5856985Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5856993Z 2023-01-11T21:38:06.5856997Z 2023-01-11T21:38:06.5857217Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.5857323Z import triton 2023-01-11T21:38:06.5857417Z import triton.language as tl 2023-01-11T21:38:06.5857575Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5857680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5857809Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5857934Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5857939Z 2023-01-11T21:38:06.5858029Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.5858141Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5858227Z filename=__file__, 2023-01-11T21:38:06.5858596Z meta={'signature': {0: '*fp32', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5858674Z @triton.jit 2023-01-11T21:38:06.5858852Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5858925Z xnumel = 1 2023-01-11T21:38:06.5858997Z rnumel = 128 2023-01-11T21:38:06.5859093Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5859225Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5859307Z xmask = xindex < xnumel 2023-01-11T21:38:06.5859418Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5859599Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5859714Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5859840Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5859955Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5860097Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5860185Z rindex = roffset + rbase 2023-01-11T21:38:06.5860263Z rmask = rindex < rnumel 2023-01-11T21:38:06.5860335Z r0 = rindex 2023-01-11T21:38:06.5860529Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5860632Z tmp2 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.5860773Z _tmp1_index = tl.where(xmask & rmask 
& (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.5860899Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.5861036Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.5861153Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5861249Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.5861359Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5861476Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5861574Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.5861661Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5861784Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5861919Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5862007Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.5862112Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5862230Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5862326Z [1, RBLOCK]) == _tmp3_index_reduce) 2023-01-11T21:38:06.5862414Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5862535Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5862663Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5862742Z ''') 2023-01-11T21:38:06.5862747Z 2023-01-11T21:38:06.5862756Z 2023-01-11T21:38:06.5862845Z async_compile.wait(globals()) 2023-01-11T21:38:06.5862920Z del async_compile 2023-01-11T21:38:06.5862925Z 2023-01-11T21:38:06.5863000Z def call(args): 2023-01-11T21:38:06.5863071Z arg0_1, = args 2023-01-11T21:38:06.5863144Z args.clear() 2023-01-11T21:38:06.5863236Z with torch.cuda.device(0): 2023-01-11T21:38:06.5863441Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5863627Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5863717Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5863877Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5863949Z del arg0_1 2023-01-11T21:38:06.5864036Z return (buf0, buf1, ) 2023-01-11T21:38:06.5864041Z 2023-01-11T21:38:06.5864045Z 2023-01-11T21:38:06.5864123Z if __name__ == "__main__": 2023-01-11T21:38:06.5864240Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5864364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5864564Z arg0_1 = rand_strided((128, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5864676Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5864681Z 2023-01-11T21:38:06.5864685Z 2023-01-11T21:38:06.5864782Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5864856Z import torch 2023-01-11T21:38:06.5864931Z import random 2023-01-11T21:38:06.5865052Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5865175Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5865180Z 2023-01-11T21:38:06.5865256Z aten = torch.ops.aten 2023-01-11T21:38:06.5865390Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5865494Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5865501Z 2023-01-11T21:38:06.5865590Z import triton 2023-01-11T21:38:06.5865688Z import triton.language as tl 2023-01-11T21:38:06.5865856Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5865993Z from torch._C 
import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5865999Z 2023-01-11T21:38:06.5866003Z 2023-01-11T21:38:06.5866175Z triton_fused_argmax_argmin_0 = async_compile.triton(''' 2023-01-11T21:38:06.5866242Z import triton 2023-01-11T21:38:06.5866337Z import triton.language as tl 2023-01-11T21:38:06.5866450Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5866552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5866681Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.5866804Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5866809Z 2023-01-11T21:38:06.5866896Z @reduction(size_hints=[1, 128], 2023-01-11T21:38:06.5867011Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.5867089Z filename=__file__, 2023-01-11T21:38:06.5867466Z meta={'signature': {0: '*fp16', 1: '*i64', 2: '*i64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5867543Z @triton.jit 2023-01-11T21:38:06.5867722Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.5867796Z xnumel = 1 2023-01-11T21:38:06.5867869Z rnumel = 128 2023-01-11T21:38:06.5867966Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5868094Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.5868178Z xmask = xindex < xnumel 2023-01-11T21:38:06.5868297Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.5868479Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.5868596Z _tmp1_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5868723Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.5868840Z _tmp3_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.5868944Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.5869025Z rindex = roffset + rbase 2023-01-11T21:38:06.5869108Z rmask = rindex < rnumel 2023-01-11T21:38:06.5869178Z r0 = rindex 2023-01-11T21:38:06.5869423Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5869544Z tmp2 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.5869686Z _tmp1_index = tl.where(xmask & rmask & (_tmp1 < tmp0), rindex, _tmp1_index) 2023-01-11T21:38:06.5869813Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.5869944Z _tmp3_index = tl.where(xmask & rmask & (_tmp3 > tmp2), rindex, _tmp3_index) 2023-01-11T21:38:06.5870066Z _tmp3 = tl.where(xmask & rmask & (_tmp3 > tmp2), tmp2, _tmp3) 2023-01-11T21:38:06.5870162Z _tmp1_index_reduce = tl.reshape( 2023-01-11T21:38:06.5870273Z tl.argmax(_tmp1, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5870392Z _tmp1_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5870486Z [1, RBLOCK]) == _tmp1_index_reduce) 2023-01-11T21:38:06.5870578Z tmp1 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5870698Z tl.where(_tmp1_index_mask, _tmp1_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5870830Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.5870925Z _tmp3_index_reduce = tl.reshape( 2023-01-11T21:38:06.5871035Z tl.argmin(_tmp3, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.5871155Z _tmp3_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.5871251Z [1, RBLOCK]) == 
_tmp3_index_reduce) 2023-01-11T21:38:06.5871338Z tmp3 = tl.reshape(tl.sum( 2023-01-11T21:38:06.5871461Z tl.where(_tmp3_index_mask, _tmp3_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.5871584Z tl.store(out_ptr1 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp3, None) 2023-01-11T21:38:06.5871711Z ''') 2023-01-11T21:38:06.5871717Z 2023-01-11T21:38:06.5871722Z 2023-01-11T21:38:06.5871819Z async_compile.wait(globals()) 2023-01-11T21:38:06.5871898Z del async_compile 2023-01-11T21:38:06.5871903Z 2023-01-11T21:38:06.5871980Z def call(args): 2023-01-11T21:38:06.5872059Z arg0_1, = args 2023-01-11T21:38:06.5872137Z args.clear() 2023-01-11T21:38:06.5872226Z with torch.cuda.device(0): 2023-01-11T21:38:06.5872415Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5872601Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5872694Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5872851Z triton_fused_argmax_argmin_0.run(arg0_1, buf0, buf1, 1, 128, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.5872926Z del arg0_1 2023-01-11T21:38:06.5873011Z return (buf0, buf1, ) 2023-01-11T21:38:06.5873016Z 2023-01-11T21:38:06.5873024Z 2023-01-11T21:38:06.5873107Z if __name__ == "__main__": 2023-01-11T21:38:06.5873220Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5873348Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5873553Z arg0_1 = rand_strided((128, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5873671Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5873676Z 2023-01-11T21:38:06.5873944Z [2023-01-11 21:35:29,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 765 2023-01-11T21:38:06.5874360Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5874496Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5874760Z [2023-01-11 21:35:29,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 766 2023-01-11T21:38:06.5875023Z [2023-01-11 21:35:29,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 766 2023-01-11T21:38:06.5875061Z 2023-01-11T21:38:06.5875162Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5875232Z import torch 2023-01-11T21:38:06.5875309Z import random 2023-01-11T21:38:06.5875455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5875602Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5875608Z 2023-01-11T21:38:06.5875694Z aten = torch.ops.aten 2023-01-11T21:38:06.5875830Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5875927Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5875932Z 2023-01-11T21:38:06.5876001Z import triton 2023-01-11T21:38:06.5876096Z import triton.language as tl 2023-01-11T21:38:06.5876226Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5876366Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5876372Z 2023-01-11T21:38:06.5876376Z 2023-01-11T21:38:06.5876539Z triton_fused_argmax_0 = async_compile.triton(''' 2023-01-11T21:38:06.5876618Z import triton 2023-01-11T21:38:06.5876712Z import triton.language as tl 2023-01-11T21:38:06.5876827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5876924Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5877060Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5877186Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5877191Z 2023-01-11T21:38:06.5877581Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5877683Z @triton.jit 2023-01-11T21:38:06.5877805Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5877879Z xnumel = 16 2023-01-11T21:38:06.5877980Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5878104Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5878195Z xmask = xindex < xnumel 2023-01-11T21:38:06.5878268Z x0 = xindex 2023-01-11T21:38:06.5878402Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), 0, xmask) 2023-01-11T21:38:06.5878491Z ''') 2023-01-11T21:38:06.5878497Z 2023-01-11T21:38:06.5878501Z 2023-01-11T21:38:06.5878596Z async_compile.wait(globals()) 2023-01-11T21:38:06.5878677Z del async_compile 2023-01-11T21:38:06.5878682Z 2023-01-11T21:38:06.5878752Z def call(args): 2023-01-11T21:38:06.5878828Z arg0_1, = args 2023-01-11T21:38:06.5878906Z args.clear() 2023-01-11T21:38:06.5878999Z with torch.cuda.device(0): 2023-01-11T21:38:06.5879200Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5879298Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5879433Z triton_fused_argmax_0.run(buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5879622Z buf1 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5879759Z triton_fused_argmax_0.run(buf1, 16, grid=grid(16), 
stream=stream0) 2023-01-11T21:38:06.5879845Z return (buf0, buf1, ) 2023-01-11T21:38:06.5879850Z 2023-01-11T21:38:06.5879854Z 2023-01-11T21:38:06.5879934Z if __name__ == "__main__": 2023-01-11T21:38:06.5880054Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5880185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5880394Z arg0_1 = rand_strided((4, 4, 1), (4, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5880508Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5880513Z 2023-01-11T21:38:06.5880520Z 2023-01-11T21:38:06.5880619Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5880689Z import torch 2023-01-11T21:38:06.5880766Z import random 2023-01-11T21:38:06.5880885Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5881009Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5881014Z 2023-01-11T21:38:06.5881126Z aten = torch.ops.aten 2023-01-11T21:38:06.5881269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5881366Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5881371Z 2023-01-11T21:38:06.5881439Z import triton 2023-01-11T21:38:06.5881533Z import triton.language as tl 2023-01-11T21:38:06.5881659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5881799Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5881804Z 2023-01-11T21:38:06.5881809Z 2023-01-11T21:38:06.5881969Z triton_fused_argmax_0 = async_compile.triton(''' 2023-01-11T21:38:06.5882048Z import triton 2023-01-11T21:38:06.5882139Z import triton.language as tl 2023-01-11T21:38:06.5882255Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5882352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5882487Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5882615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5882620Z 2023-01-11T21:38:06.5883004Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5883080Z @triton.jit 2023-01-11T21:38:06.5883201Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5883278Z xnumel = 16 2023-01-11T21:38:06.5883377Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5883501Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5883614Z xmask = xindex < xnumel 2023-01-11T21:38:06.5883687Z x0 = xindex 2023-01-11T21:38:06.5883819Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), 0, xmask) 2023-01-11T21:38:06.5883906Z ''') 2023-01-11T21:38:06.5883911Z 2023-01-11T21:38:06.5883916Z 2023-01-11T21:38:06.5884012Z async_compile.wait(globals()) 2023-01-11T21:38:06.5884094Z del async_compile 2023-01-11T21:38:06.5884100Z 2023-01-11T21:38:06.5884179Z def call(args): 2023-01-11T21:38:06.5884248Z arg0_1, = args 2023-01-11T21:38:06.5884325Z args.clear() 2023-01-11T21:38:06.5884418Z with torch.cuda.device(0): 2023-01-11T21:38:06.5884619Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.5884712Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5884848Z triton_fused_argmax_0.run(buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5885046Z buf1 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.int64) 
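        # note: arg0_1, the (4, 4, 1) fp16 input, is never read in this call. An
        # arg-reduction over the trailing size-1 dimension is identically 0, so the
        # generated triton_fused_argmax_0 just zero-fills both (4, 4) int64 outputs.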
2023-01-11T21:38:06.5885173Z triton_fused_argmax_0.run(buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.5885263Z return (buf0, buf1, ) 2023-01-11T21:38:06.5885268Z 2023-01-11T21:38:06.5885272Z 2023-01-11T21:38:06.5885356Z if __name__ == "__main__": 2023-01-11T21:38:06.5885476Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5885608Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5885815Z arg0_1 = rand_strided((4, 4, 1), (4, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5885928Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5885933Z 2023-01-11T21:38:06.5886004Z ok (0.349s) 2023-01-11T21:38:06.5886481Z test_reflection_pad2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5886611Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5886875Z [2023-01-11 21:35:29,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 767 2023-01-11T21:38:06.5887217Z [2023-01-11 21:35:29,360] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 767 2023-01-11T21:38:06.5887633Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5887766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5888021Z [2023-01-11 21:35:29,380] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 768 2023-01-11T21:38:06.5888286Z [2023-01-11 21:35:29,448] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 768 2023-01-11T21:38:06.5888702Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5888834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5889089Z [2023-01-11 21:35:29,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 769 2023-01-11T21:38:06.5889095Z 2023-01-11T21:38:06.5889197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5889267Z import torch 2023-01-11T21:38:06.5889343Z import random 2023-01-11T21:38:06.5889493Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5889620Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5889625Z 2023-01-11T21:38:06.5889708Z aten = torch.ops.aten 2023-01-11T21:38:06.5889847Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5889949Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5889954Z 2023-01-11T21:38:06.5890030Z import triton 2023-01-11T21:38:06.5890118Z import triton.language as tl 2023-01-11T21:38:06.5890243Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5890387Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5890392Z 2023-01-11T21:38:06.5890397Z 2023-01-11T21:38:06.5890592Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5890669Z import triton 2023-01-11T21:38:06.5890764Z import triton.language as tl 2023-01-11T21:38:06.5890881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5890981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5891119Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5891248Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5891254Z 2023-01-11T21:38:06.5891660Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5891739Z @triton.jit 2023-01-11T21:38:06.5891873Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5891953Z xnumel = 64 2023-01-11T21:38:06.5892053Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5892177Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5892263Z xmask = xindex < xnumel 2023-01-11T21:38:06.5892345Z x1 = (xindex // 8) 2023-01-11T21:38:06.5892425Z x0 = xindex % 8 2023-01-11T21:38:06.5892499Z x2 = xindex 2023-01-11T21:38:06.5892572Z tmp0 = x1 2023-01-11T21:38:06.5892645Z tmp1 = x0 2023-01-11T21:38:06.5892751Z tmp2 = tl.load(in_ptr0 + (tmp1 + (8*tmp0)), None) 2023-01-11T21:38:06.5892886Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5893002Z ''') 2023-01-11T21:38:06.5893008Z 2023-01-11T21:38:06.5893013Z 2023-01-11T21:38:06.5893110Z async_compile.wait(globals()) 2023-01-11T21:38:06.5893189Z del async_compile 2023-01-11T21:38:06.5893194Z 2023-01-11T21:38:06.5893270Z def call(args): 2023-01-11T21:38:06.5893354Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5893432Z args.clear() 2023-01-11T21:38:06.5893519Z with torch.cuda.device(0): 2023-01-11T21:38:06.5893737Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5893829Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5893999Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), 
stream=stream0) 2023-01-11T21:38:06.5894078Z del arg0_1 2023-01-11T21:38:06.5894157Z return (buf0, ) 2023-01-11T21:38:06.5894163Z 2023-01-11T21:38:06.5894167Z 2023-01-11T21:38:06.5894252Z if __name__ == "__main__": 2023-01-11T21:38:06.5894367Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5894601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5894824Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5895037Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5895155Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5895160Z 2023-01-11T21:38:06.5895165Z 2023-01-11T21:38:06.5895262Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5895336Z import torch 2023-01-11T21:38:06.5895427Z import random 2023-01-11T21:38:06.5895611Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5895746Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5895751Z 2023-01-11T21:38:06.5895836Z aten = torch.ops.aten 2023-01-11T21:38:06.5895975Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5896074Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5896079Z 2023-01-11T21:38:06.5896155Z import triton 2023-01-11T21:38:06.5896249Z import triton.language as tl 2023-01-11T21:38:06.5896376Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5896510Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5896515Z 2023-01-11T21:38:06.5896525Z 2023-01-11T21:38:06.5896715Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5896792Z import triton 2023-01-11T21:38:06.5896892Z import triton.language as tl 2023-01-11T21:38:06.5897006Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5897114Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5897307Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5897441Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5897445Z 2023-01-11T21:38:06.5897845Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5897919Z @triton.jit 2023-01-11T21:38:06.5898051Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5898125Z xnumel = 64 2023-01-11T21:38:06.5898225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5898355Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5898438Z xmask = xindex < xnumel 2023-01-11T21:38:06.5898512Z x1 = (xindex // 8) 2023-01-11T21:38:06.5898588Z x0 = xindex % 8 2023-01-11T21:38:06.5898665Z x2 = xindex 2023-01-11T21:38:06.5898739Z tmp0 = x1 2023-01-11T21:38:06.5898813Z tmp1 = x0 2023-01-11T21:38:06.5898943Z tmp2 = tl.load(in_ptr0 + (tmp1 + (8*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5899081Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.5899161Z ''') 2023-01-11T21:38:06.5899211Z 2023-01-11T21:38:06.5899216Z 2023-01-11T21:38:06.5899302Z async_compile.wait(globals()) 2023-01-11T21:38:06.5899379Z del async_compile 2023-01-11T21:38:06.5899385Z 2023-01-11T21:38:06.5899460Z def call(args): 2023-01-11T21:38:06.5899539Z 
arg0_1, arg1_1 = args 2023-01-11T21:38:06.5899614Z args.clear() 2023-01-11T21:38:06.5899707Z with torch.cuda.device(0): 2023-01-11T21:38:06.5899923Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5900009Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5900172Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5900246Z del arg0_1 2023-01-11T21:38:06.5900320Z return (buf0, ) 2023-01-11T21:38:06.5900326Z 2023-01-11T21:38:06.5900330Z 2023-01-11T21:38:06.5900408Z if __name__ == "__main__": 2023-01-11T21:38:06.5900533Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5900672Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5900913Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5901146Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5901273Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5901278Z 2023-01-11T21:38:06.5901581Z [2023-01-11 21:35:29,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 769 2023-01-11T21:38:06.5902070Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5902247Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5902537Z [2023-01-11 21:35:29,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 770 2023-01-11T21:38:06.5902544Z 2023-01-11T21:38:06.5902646Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5902722Z import torch 2023-01-11T21:38:06.5902796Z import random 2023-01-11T21:38:06.5902917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5903052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5903057Z 2023-01-11T21:38:06.5903140Z aten = torch.ops.aten 2023-01-11T21:38:06.5903291Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5903392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5903397Z 2023-01-11T21:38:06.5903470Z import triton 2023-01-11T21:38:06.5903570Z import triton.language as tl 2023-01-11T21:38:06.5903705Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5903852Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5903857Z 2023-01-11T21:38:06.5903872Z 2023-01-11T21:38:06.5904085Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5904159Z import triton 2023-01-11T21:38:06.5904257Z import triton.language as tl 2023-01-11T21:38:06.5904378Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5904486Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5904629Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5904766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5904771Z 2023-01-11T21:38:06.5905249Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 
'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5905323Z @triton.jit 2023-01-11T21:38:06.5905489Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5905562Z xnumel = 64 2023-01-11T21:38:06.5905674Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5905817Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5905915Z xmask = xindex < xnumel 2023-01-11T21:38:06.5905987Z x1 = (xindex // 8) 2023-01-11T21:38:06.5906059Z x0 = xindex % 8 2023-01-11T21:38:06.5906131Z x2 = xindex 2023-01-11T21:38:06.5906205Z tmp0 = 1 + x1 2023-01-11T21:38:06.5906278Z tmp1 = 1 + x0 2023-01-11T21:38:06.5906391Z tmp2 = tl.load(in_ptr0 + (tmp1 + (10*tmp0)), None) 2023-01-11T21:38:06.5906460Z tmp3 = x0 2023-01-11T21:38:06.5906533Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5906608Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5906683Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5906791Z tmp7 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5906949Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907074Z tmp9 = tl.load(in_ptr0 + (tmp8 + (10*tmp7)), tmp6, other=0) 2023-01-11T21:38:06.5907169Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5907243Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5907323Z tmp12 = tmp3 >= 6 2023-01-11T21:38:06.5907400Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5907479Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5907585Z tmp15 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907747Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5907872Z tmp17 = tl.load(in_ptr0 + (tmp16 + (10*tmp15)), tmp14, other=0) 2023-01-11T21:38:06.5907960Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5908065Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5908137Z tmp20 = x1 2023-01-11T21:38:06.5908211Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5908286Z tmp22 = tmp20 <= 1 2023-01-11T21:38:06.5914676Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5914882Z tmp24 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5914998Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915118Z tmp26 = tl.load(in_ptr0 + (tmp25 + (10*tmp24)), tmp23, other=0) 2023-01-11T21:38:06.5915218Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5915301Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5915381Z tmp29 = tmp20 >= 6 2023-01-11T21:38:06.5915467Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5915566Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5915756Z tmp32 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915857Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5915984Z tmp34 = tl.load(in_ptr0 + (tmp33 + (10*tmp32)), tmp31, other=0) 2023-01-11T21:38:06.5916084Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5916164Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5916240Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5916403Z tmp38 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5916557Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5916684Z tmp40 = tl.load(in_ptr0 + (tmp39 + (10*tmp38)), tmp37, other=0) 2023-01-11T21:38:06.5916782Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5916863Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5916943Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5917103Z tmp44 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917264Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917382Z tmp46 = 
tl.load(in_ptr0 + (tmp45 + (10*tmp44)), tmp43, other=0) 2023-01-11T21:38:06.5917476Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5917560Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5917643Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5917801Z tmp50 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5917960Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918082Z tmp52 = tl.load(in_ptr0 + (tmp51 + (10*tmp50)), tmp49, other=0) 2023-01-11T21:38:06.5918226Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5918310Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5918390Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5918549Z tmp56 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918709Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5918833Z tmp58 = tl.load(in_ptr0 + (tmp57 + (10*tmp56)), tmp55, other=0) 2023-01-11T21:38:06.5918931Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5919005Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5919142Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5919232Z ''') 2023-01-11T21:38:06.5919238Z 2023-01-11T21:38:06.5919243Z 2023-01-11T21:38:06.5919334Z async_compile.wait(globals()) 2023-01-11T21:38:06.5919411Z del async_compile 2023-01-11T21:38:06.5919416Z 2023-01-11T21:38:06.5919492Z def call(args): 2023-01-11T21:38:06.5919569Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5919645Z args.clear() 2023-01-11T21:38:06.5919731Z with torch.cuda.device(0): 2023-01-11T21:38:06.5919950Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5920040Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5920205Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5920279Z del arg0_1 2023-01-11T21:38:06.5920358Z return (buf0, ) 2023-01-11T21:38:06.5920363Z 2023-01-11T21:38:06.5920367Z 2023-01-11T21:38:06.5920448Z if __name__ == "__main__": 2023-01-11T21:38:06.5920567Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5920729Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5920953Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5921168Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5921290Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5921296Z 2023-01-11T21:38:06.5921560Z [2023-01-11 21:35:29,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 770 2023-01-11T21:38:06.5921977Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5922112Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5922366Z [2023-01-11 21:35:29,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 771 2023-01-11T21:38:06.5922372Z 2023-01-11T21:38:06.5922469Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5922542Z import torch 2023-01-11T21:38:06.5922614Z import random 2023-01-11T21:38:06.5922734Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5922859Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5922864Z 2023-01-11T21:38:06.5922946Z aten = torch.ops.aten 2023-01-11T21:38:06.5923081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5923178Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5923183Z 2023-01-11T21:38:06.5923257Z import triton 2023-01-11T21:38:06.5923342Z import triton.language as tl 2023-01-11T21:38:06.5923467Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5923609Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5923614Z 2023-01-11T21:38:06.5923619Z 2023-01-11T21:38:06.5923812Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5923889Z import triton 2023-01-11T21:38:06.5923980Z import triton.language as tl 2023-01-11T21:38:06.5924121Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5924229Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5924355Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5924480Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5924486Z 2023-01-11T21:38:06.5924892Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5924966Z @triton.jit 2023-01-11T21:38:06.5925099Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5925174Z xnumel = 64 2023-01-11T21:38:06.5925271Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5925417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5925501Z xmask = xindex < xnumel 2023-01-11T21:38:06.5925597Z x1 = (xindex // 8) 2023-01-11T21:38:06.5925679Z x0 = xindex % 8 2023-01-11T21:38:06.5925749Z x2 = xindex 2023-01-11T21:38:06.5925822Z tmp0 = 1 + x1 2023-01-11T21:38:06.5925898Z tmp1 = 1 + x0 2023-01-11T21:38:06.5926021Z tmp2 = tl.load(in_ptr0 + (tmp1 + (10*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5926091Z tmp3 = x0 2023-01-11T21:38:06.5926165Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5926240Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5926318Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5926428Z tmp7 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5926587Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5926751Z tmp9 = tl.load(in_ptr0 + (tmp8 + (10*tmp7)), tmp6, other=0).to(tl.float32) 2023-01-11T21:38:06.5926847Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5926930Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5927007Z tmp12 = tmp3 >= 6 2023-01-11T21:38:06.5927084Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5927168Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5927274Z tmp15 = 1 + x1 + tl.zeros([XBLOCK], tl.int32) 
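    # right-edge band (x0 == 6): tmp15/tmp16 map this output element back to
    # padded column 15 - x0 (= 9) of the 10-wide grad_output, i.e. the column the
    # forward reflection mirrored it into; the load below is masked by tmp14.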
2023-01-11T21:38:06.5927429Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5927572Z tmp17 = tl.load(in_ptr0 + (tmp16 + (10*tmp15)), tmp14, other=0).to(tl.float32) 2023-01-11T21:38:06.5927667Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5927747Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5927822Z tmp20 = x1 2023-01-11T21:38:06.5927898Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5927974Z tmp22 = tmp20 <= 1 2023-01-11T21:38:06.5928047Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5928207Z tmp24 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5928317Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5928452Z tmp26 = tl.load(in_ptr0 + (tmp25 + (10*tmp24)), tmp23, other=0).to(tl.float32) 2023-01-11T21:38:06.5928546Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5928626Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5928709Z tmp29 = tmp20 >= 6 2023-01-11T21:38:06.5928779Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5928860Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5929022Z tmp32 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929127Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929267Z tmp34 = tl.load(in_ptr0 + (tmp33 + (10*tmp32)), tmp31, other=0).to(tl.float32) 2023-01-11T21:38:06.5929362Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5929442Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5929515Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5929677Z tmp38 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929837Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5929976Z tmp40 = tl.load(in_ptr0 + (tmp39 + (10*tmp38)), tmp37, other=0).to(tl.float32) 2023-01-11T21:38:06.5930070Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5930148Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5930264Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5930423Z tmp44 = 1 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5930576Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5930714Z tmp46 = tl.load(in_ptr0 + (tmp45 + (10*tmp44)), tmp43, other=0).to(tl.float32) 2023-01-11T21:38:06.5930807Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5930886Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5930964Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5931121Z tmp50 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931279Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931409Z tmp52 = tl.load(in_ptr0 + (tmp51 + (10*tmp50)), tmp49, other=0).to(tl.float32) 2023-01-11T21:38:06.5931503Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5931581Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5931658Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5931817Z tmp56 = 15 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5931976Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5932113Z tmp58 = tl.load(in_ptr0 + (tmp57 + (10*tmp56)), tmp55, other=0).to(tl.float32) 2023-01-11T21:38:06.5932200Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5932284Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5932419Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5932504Z ''') 2023-01-11T21:38:06.5932509Z 2023-01-11T21:38:06.5932514Z 2023-01-11T21:38:06.5932608Z async_compile.wait(globals()) 2023-01-11T21:38:06.5932684Z del async_compile 2023-01-11T21:38:06.5932717Z 2023-01-11T21:38:06.5932792Z def call(args): 
2023-01-11T21:38:06.5932871Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5932938Z args.clear() 2023-01-11T21:38:06.5933031Z with torch.cuda.device(0): 2023-01-11T21:38:06.5933251Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5933346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5933510Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5933583Z del arg0_1 2023-01-11T21:38:06.5933662Z return (buf0, ) 2023-01-11T21:38:06.5933667Z 2023-01-11T21:38:06.5933671Z 2023-01-11T21:38:06.5933750Z if __name__ == "__main__": 2023-01-11T21:38:06.5933861Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5933988Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5934212Z arg0_1 = rand_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5934426Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5934780Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5934786Z 2023-01-11T21:38:06.5935060Z [2023-01-11 21:35:29,993] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 771 2023-01-11T21:38:06.5935502Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5935652Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5935918Z [2023-01-11 21:35:30,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 772 2023-01-11T21:38:06.5935927Z 2023-01-11T21:38:06.5936018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5936095Z import torch 2023-01-11T21:38:06.5936167Z import random 2023-01-11T21:38:06.5936284Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5936457Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5936463Z 2023-01-11T21:38:06.5936544Z aten = torch.ops.aten 2023-01-11T21:38:06.5936682Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5936777Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5936782Z 2023-01-11T21:38:06.5936850Z import triton 2023-01-11T21:38:06.5936943Z import triton.language as tl 2023-01-11T21:38:06.5937066Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5937264Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5937270Z 2023-01-11T21:38:06.5937275Z 2023-01-11T21:38:06.5937477Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5937556Z import triton 2023-01-11T21:38:06.5937647Z import triton.language as tl 2023-01-11T21:38:06.5937755Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5937856Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5937991Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5938117Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5938122Z 2023-01-11T21:38:06.5938527Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': 
{}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5938601Z @triton.jit 2023-01-11T21:38:06.5938734Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5938807Z xnumel = 64 2023-01-11T21:38:06.5938898Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5939072Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5939160Z xmask = xindex < xnumel 2023-01-11T21:38:06.5939242Z x1 = (xindex // 8) 2023-01-11T21:38:06.5939320Z x0 = xindex % 8 2023-01-11T21:38:06.5939393Z x2 = xindex 2023-01-11T21:38:06.5939468Z tmp0 = 3 + x1 2023-01-11T21:38:06.5939540Z tmp1 = 1 + x0 2023-01-11T21:38:06.5939657Z tmp2 = tl.load(in_ptr0 + (tmp1 + (11*tmp0)), None) 2023-01-11T21:38:06.5939728Z tmp3 = x0 2023-01-11T21:38:06.5939809Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5939887Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5939969Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5940079Z tmp7 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5940233Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5940359Z tmp9 = tl.load(in_ptr0 + (tmp8 + (11*tmp7)), tmp6, other=0) 2023-01-11T21:38:06.5940457Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5940542Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5940627Z tmp12 = tmp3 >= 5 2023-01-11T21:38:06.5940706Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5940787Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5940890Z tmp15 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941054Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941183Z tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp15)), tmp14, other=0) 2023-01-11T21:38:06.5941281Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5941363Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5941439Z tmp20 = x1 2023-01-11T21:38:06.5941519Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5941591Z tmp22 = tmp20 <= 3 2023-01-11T21:38:06.5941673Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5941835Z tmp24 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5941943Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942071Z tmp26 = tl.load(in_ptr0 + (tmp25 + (11*tmp24)), tmp23, other=0) 2023-01-11T21:38:06.5942176Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5942259Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5942331Z tmp29 = tmp20 >= 3 2023-01-11T21:38:06.5942410Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5942495Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5942656Z tmp32 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942796Z tmp33 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5942924Z tmp34 = tl.load(in_ptr0 + (tmp33 + (11*tmp32)), tmp31, other=0) 2023-01-11T21:38:06.5943024Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5943099Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5943182Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5943342Z tmp38 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5943501Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5943627Z tmp40 = tl.load(in_ptr0 + (tmp39 + (11*tmp38)), tmp37, other=0) 2023-01-11T21:38:06.5943724Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5943811Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5943887Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5944046Z tmp44 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5944208Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 
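    # corner term (top band tmp23 & right band tmp14): tmp44/tmp45 address the
    # reflected top-right corner of the 15x11 grad_output at (3 - x1, 15 - x0);
    # the masked load below contributes 0 for lanes outside both bands.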
2023-01-11T21:38:06.5944338Z tmp46 = tl.load(in_ptr0 + (tmp45 + (11*tmp44)), tmp43, other=0) 2023-01-11T21:38:06.5944434Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5944515Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5944596Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5944748Z tmp50 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5944911Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945034Z tmp52 = tl.load(in_ptr0 + (tmp51 + (11*tmp50)), tmp49, other=0) 2023-01-11T21:38:06.5945129Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5945212Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5945297Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5945490Z tmp56 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945643Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5945769Z tmp58 = tl.load(in_ptr0 + (tmp57 + (11*tmp56)), tmp55, other=0) 2023-01-11T21:38:06.5945865Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5945952Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5946095Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5946180Z ''') 2023-01-11T21:38:06.5946185Z 2023-01-11T21:38:06.5946190Z 2023-01-11T21:38:06.5946285Z async_compile.wait(globals()) 2023-01-11T21:38:06.5946364Z del async_compile 2023-01-11T21:38:06.5946369Z 2023-01-11T21:38:06.5946439Z def call(args): 2023-01-11T21:38:06.5946521Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5946598Z args.clear() 2023-01-11T21:38:06.5946694Z with torch.cuda.device(0): 2023-01-11T21:38:06.5946912Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5947013Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5947180Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5947249Z del arg0_1 2023-01-11T21:38:06.5947329Z return (buf0, ) 2023-01-11T21:38:06.5947337Z 2023-01-11T21:38:06.5947342Z 2023-01-11T21:38:06.5947426Z if __name__ == "__main__": 2023-01-11T21:38:06.5947548Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5947678Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5947903Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5948117Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5948239Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5948244Z 2023-01-11T21:38:06.5948515Z [2023-01-11 21:35:30,165] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 772 2023-01-11T21:38:06.5948520Z 2023-01-11T21:38:06.5948614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5948694Z import torch 2023-01-11T21:38:06.5948771Z import random 2023-01-11T21:38:06.5948922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5949052Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5949057Z 2023-01-11T21:38:06.5949141Z aten = torch.ops.aten 2023-01-11T21:38:06.5949281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5949372Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5949384Z 2023-01-11T21:38:06.5949453Z import triton 2023-01-11T21:38:06.5949546Z import triton.language as tl 2023-01-11T21:38:06.5949675Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5949818Z from 
torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5949826Z 2023-01-11T21:38:06.5949831Z 2023-01-11T21:38:06.5950031Z triton_fused_reflection_pad2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.5950107Z import triton 2023-01-11T21:38:06.5950203Z import triton.language as tl 2023-01-11T21:38:06.5950312Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5950421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5950555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5950683Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5950689Z 2023-01-11T21:38:06.5951091Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.5951167Z @triton.jit 2023-01-11T21:38:06.5951304Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5951377Z xnumel = 64 2023-01-11T21:38:06.5951502Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5951636Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5951721Z xmask = xindex < xnumel 2023-01-11T21:38:06.5951801Z x1 = (xindex // 8) 2023-01-11T21:38:06.5951877Z x0 = xindex % 8 2023-01-11T21:38:06.5951950Z x2 = xindex 2023-01-11T21:38:06.5952027Z tmp0 = 3 + x1 2023-01-11T21:38:06.5952096Z tmp1 = 1 + x0 2023-01-11T21:38:06.5952227Z tmp2 = tl.load(in_ptr0 + (tmp1 + (11*tmp0)), None).to(tl.float32) 2023-01-11T21:38:06.5952301Z tmp3 = x0 2023-01-11T21:38:06.5952380Z tmp4 = tmp3 >= 1 2023-01-11T21:38:06.5952459Z tmp5 = tmp3 <= 1 2023-01-11T21:38:06.5952539Z tmp6 = tmp4 & tmp5 2023-01-11T21:38:06.5952653Z tmp7 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5952806Z tmp8 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5952947Z tmp9 = tl.load(in_ptr0 + (tmp8 + (11*tmp7)), tmp6, other=0).to(tl.float32) 2023-01-11T21:38:06.5953050Z tmp10 = tl.where(tmp6, tmp9, 0.0) 2023-01-11T21:38:06.5953133Z tmp11 = tmp2 + tmp10 2023-01-11T21:38:06.5953212Z tmp12 = tmp3 >= 5 2023-01-11T21:38:06.5953291Z tmp13 = tmp3 <= 6 2023-01-11T21:38:06.5953374Z tmp14 = tmp12 & tmp13 2023-01-11T21:38:06.5953476Z tmp15 = 3 + x1 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5953644Z tmp16 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5953786Z tmp17 = tl.load(in_ptr0 + (tmp16 + (11*tmp15)), tmp14, other=0).to(tl.float32) 2023-01-11T21:38:06.5953884Z tmp18 = tl.where(tmp14, tmp17, 0.0) 2023-01-11T21:38:06.5953967Z tmp19 = tmp11 + tmp18 2023-01-11T21:38:06.5954043Z tmp20 = x1 2023-01-11T21:38:06.5954122Z tmp21 = tmp20 >= 1 2023-01-11T21:38:06.5954195Z tmp22 = tmp20 <= 3 2023-01-11T21:38:06.5954276Z tmp23 = tmp21 & tmp22 2023-01-11T21:38:06.5954438Z tmp24 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5954546Z tmp25 = 1 + x0 + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5954691Z tmp26 = tl.load(in_ptr0 + (tmp25 + (11*tmp24)), tmp23, other=0).to(tl.float32) 2023-01-11T21:38:06.5954789Z tmp27 = tl.where(tmp23, tmp26, 0.0) 2023-01-11T21:38:06.5954871Z tmp28 = tmp19 + tmp27 2023-01-11T21:38:06.5954943Z tmp29 = tmp20 >= 3 2023-01-11T21:38:06.5955023Z tmp30 = tmp20 <= 6 2023-01-11T21:38:06.5955135Z tmp31 = tmp29 & tmp30 2023-01-11T21:38:06.5955305Z tmp32 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5955436Z tmp33 = 1 + x0 + 
tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5955599Z tmp34 = tl.load(in_ptr0 + (tmp33 + (11*tmp32)), tmp31, other=0).to(tl.float32) 2023-01-11T21:38:06.5955698Z tmp35 = tl.where(tmp31, tmp34, 0.0) 2023-01-11T21:38:06.5955774Z tmp36 = tmp28 + tmp35 2023-01-11T21:38:06.5955855Z tmp37 = tmp23 & tmp6 2023-01-11T21:38:06.5956015Z tmp38 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956176Z tmp39 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956319Z tmp40 = tl.load(in_ptr0 + (tmp39 + (11*tmp38)), tmp37, other=0).to(tl.float32) 2023-01-11T21:38:06.5956416Z tmp41 = tl.where(tmp37, tmp40, 0.0) 2023-01-11T21:38:06.5956499Z tmp42 = tmp36 + tmp41 2023-01-11T21:38:06.5956574Z tmp43 = tmp23 & tmp14 2023-01-11T21:38:06.5956736Z tmp44 = 3 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5956898Z tmp45 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957040Z tmp46 = tl.load(in_ptr0 + (tmp45 + (11*tmp44)), tmp43, other=0).to(tl.float32) 2023-01-11T21:38:06.5957137Z tmp47 = tl.where(tmp43, tmp46, 0.0) 2023-01-11T21:38:06.5957220Z tmp48 = tmp42 + tmp47 2023-01-11T21:38:06.5957304Z tmp49 = tmp31 & tmp6 2023-01-11T21:38:06.5957457Z tmp50 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957619Z tmp51 = 1 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5957756Z tmp52 = tl.load(in_ptr0 + (tmp51 + (11*tmp50)), tmp49, other=0).to(tl.float32) 2023-01-11T21:38:06.5957883Z tmp53 = tl.where(tmp49, tmp52, 0.0) 2023-01-11T21:38:06.5957965Z tmp54 = tmp48 + tmp53 2023-01-11T21:38:06.5958046Z tmp55 = tmp31 & tmp14 2023-01-11T21:38:06.5958207Z tmp56 = 17 + ((-1)*x1) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5958370Z tmp57 = 15 + ((-1)*x0) + tl.zeros([XBLOCK], tl.int32) 2023-01-11T21:38:06.5958505Z tmp58 = tl.load(in_ptr0 + (tmp57 + (11*tmp56)), tmp55, other=0).to(tl.float32) 2023-01-11T21:38:06.5958601Z tmp59 = tl.where(tmp55, tmp58, 0.0) 2023-01-11T21:38:06.5958684Z tmp60 = tmp54 + tmp59 2023-01-11T21:38:06.5958822Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp60, xmask) 2023-01-11T21:38:06.5958911Z ''') 2023-01-11T21:38:06.5958917Z 2023-01-11T21:38:06.5958922Z 2023-01-11T21:38:06.5959017Z async_compile.wait(globals()) 2023-01-11T21:38:06.5959098Z del async_compile 2023-01-11T21:38:06.5959103Z 2023-01-11T21:38:06.5959173Z def call(args): 2023-01-11T21:38:06.5959260Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5959337Z args.clear() 2023-01-11T21:38:06.5959434Z with torch.cuda.device(0): 2023-01-11T21:38:06.5959650Z buf0 = empty_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5959744Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5959914Z triton_fused_reflection_pad2d_backward_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5959991Z del arg0_1 2023-01-11T21:38:06.5960064Z return (buf0, ) 2023-01-11T21:38:06.5960070Z 2023-01-11T21:38:06.5960074Z 2023-01-11T21:38:06.5960158Z if __name__ == "__main__": 2023-01-11T21:38:06.5960280Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5960407Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5960631Z arg0_1 = rand_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5960844Z arg1_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5960970Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5960976Z 
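The two kernels above are the fp32 and fp16 lowerings of reflection_pad2d_backward. A minimal repro sketch, assuming the torch.compile entry point and pad widths (left=1, right=2, top=3, bottom=4) inferred from the (1, 1, 8, 8) -> (1, 1, 15, 11) buffer shapes in the generated code:

import torch

def fn(x):
    # reflection pad; widths inferred from the buffer shapes above
    return torch.nn.functional.pad(x, (1, 2, 3, 4), mode="reflect")

x = torch.randn(1, 1, 8, 8, device="cuda", requires_grad=True)
out = torch.compile(fn)(x)   # forward: (1, 1, 8, 8) -> (1, 1, 15, 11)
out.sum().backward()         # backward emits a pad2d_backward kernel like the ones above

Each input-gradient element accumulates every padded location that reflects onto it, which is why the kernel issues one unconditional load (the interior) plus eight masked loads (the mirrored edge and corner bands) before the single masked store.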
2023-01-11T21:38:06.5961050Z ok (0.896s) 2023-01-11T21:38:06.5961537Z test_reflection_pad2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5961678Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5961943Z [2023-01-11 21:35:30,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 773 2023-01-11T21:38:06.5962211Z [2023-01-11 21:35:30,298] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 773 2023-01-11T21:38:06.5962628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5962761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5963015Z [2023-01-11 21:35:30,324] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 774 2023-01-11T21:38:06.5963277Z [2023-01-11 21:35:30,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 774 2023-01-11T21:38:06.5963283Z 2023-01-11T21:38:06.5963383Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5963462Z import torch 2023-01-11T21:38:06.5963539Z import random 2023-01-11T21:38:06.5963655Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5963823Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5963829Z 2023-01-11T21:38:06.5963915Z aten = torch.ops.aten 2023-01-11T21:38:06.5964055Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5964153Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5964160Z 2023-01-11T21:38:06.5964238Z import triton 2023-01-11T21:38:06.5964335Z import triton.language as tl 2023-01-11T21:38:06.5964456Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5964599Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5964605Z 2023-01-11T21:38:06.5964609Z 2023-01-11T21:38:06.5964793Z triton_fused_reflection_pad2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.5964873Z import triton 2023-01-11T21:38:06.5964967Z import triton.language as tl 2023-01-11T21:38:06.5965083Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5965188Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5965328Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5965448Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5965453Z 2023-01-11T21:38:06.5965860Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5965933Z @triton.jit 2023-01-11T21:38:06.5966070Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5966146Z xnumel = 100 
2023-01-11T21:38:06.5966246Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5966380Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5966463Z xmask = xindex < xnumel 2023-01-11T21:38:06.5966539Z x1 = (xindex // 10) 2023-01-11T21:38:06.5966619Z x0 = xindex % 10 2023-01-11T21:38:06.5966690Z x2 = xindex 2023-01-11T21:38:06.5966765Z tmp0 = 7 2023-01-11T21:38:06.5966838Z tmp1 = x1 2023-01-11T21:38:06.5966910Z tmp2 = 1 2023-01-11T21:38:06.5967016Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5967099Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5967210Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5967291Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5967429Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5967504Z tmp8 = x0 2023-01-11T21:38:06.5967613Z tmp9 = tmp8 - tmp2 2023-01-11T21:38:06.5967690Z tmp10 = tl.abs(tmp9) 2023-01-11T21:38:06.5967804Z tmp11 = tmp0 - tmp10 2023-01-11T21:38:06.5967886Z tmp12 = tl.abs(tmp11) 2023-01-11T21:38:06.5968000Z tmp13 = tmp0 - tmp12 2023-01-11T21:38:06.5968211Z tmp14 = tl.load(in_ptr0 + (tmp13 + (8*tmp7)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.5968352Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5968439Z ''') 2023-01-11T21:38:06.5968445Z 2023-01-11T21:38:06.5968449Z 2023-01-11T21:38:06.5968630Z triton_fused_reflection_pad2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5968708Z import triton 2023-01-11T21:38:06.5968802Z import triton.language as tl 2023-01-11T21:38:06.5968919Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5969025Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5969164Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5969291Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5969297Z 2023-01-11T21:38:06.5969699Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5969769Z @triton.jit 2023-01-11T21:38:06.5969903Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5969979Z xnumel = 165 2023-01-11T21:38:06.5970080Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5970239Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5970325Z xmask = xindex < xnumel 2023-01-11T21:38:06.5970407Z x1 = (xindex // 11) 2023-01-11T21:38:06.5970478Z x0 = xindex % 11 2023-01-11T21:38:06.5970548Z x2 = xindex 2023-01-11T21:38:06.5970620Z tmp0 = 7 2023-01-11T21:38:06.5970695Z tmp1 = x1 2023-01-11T21:38:06.5970767Z tmp2 = 3 2023-01-11T21:38:06.5970880Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5970962Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5971064Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5971146Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5971255Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5971328Z tmp8 = x0 2023-01-11T21:38:06.5971400Z tmp9 = 1 2023-01-11T21:38:06.5971511Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.5971589Z tmp11 = tl.abs(tmp10) 2023-01-11T21:38:06.5971706Z tmp12 = tmp0 - tmp11 2023-01-11T21:38:06.5971789Z tmp13 = tl.abs(tmp12) 2023-01-11T21:38:06.5971905Z tmp14 = tmp0 - tmp13 2023-01-11T21:38:06.5972023Z tmp15 = tl.load(in_ptr0 + (tmp14 + (8*tmp7)), None) 2023-01-11T21:38:06.5972159Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5972248Z ''') 2023-01-11T21:38:06.5972253Z 
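The index arithmetic in both forward kernels is the branch-free reflect trick: with m = size - 1 (tmp0 == 7 for the 8-wide input), the source index for output position i under a leading pad p is m - |m - |i - p||. A plain-Python sketch of the same computation (the helper name is illustrative, not from the generated code):

def reflect_index(i, pad, size):
    # mirrors i back into [0, size), matching tmp7/tmp14 above
    m = size - 1
    return m - abs(m - abs(i - pad))

# 10-wide output from the 8-wide input with pad 1 (triton_fused_reflection_pad2d_0):
assert [reflect_index(i, 1, 8) for i in range(10)] == [1, 0, 1, 2, 3, 4, 5, 6, 7, 6]

Because the reflected index always lands in bounds, the loads above can pass mask None; only the store is guarded by xmask.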
2023-01-11T21:38:06.5972258Z 2023-01-11T21:38:06.5972356Z async_compile.wait(globals()) 2023-01-11T21:38:06.5972430Z del async_compile 2023-01-11T21:38:06.5972434Z 2023-01-11T21:38:06.5972512Z def call(args): 2023-01-11T21:38:06.5972587Z arg0_1, = args 2023-01-11T21:38:06.5972664Z args.clear() 2023-01-11T21:38:06.5972760Z with torch.cuda.device(0): 2023-01-11T21:38:06.5972989Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5973088Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5973235Z triton_fused_reflection_pad2d_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.5973462Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5973622Z triton_fused_reflection_pad2d_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0) 2023-01-11T21:38:06.5973699Z del arg0_1 2023-01-11T21:38:06.5973785Z return (buf0, buf1, ) 2023-01-11T21:38:06.5973790Z 2023-01-11T21:38:06.5973834Z 2023-01-11T21:38:06.5973918Z if __name__ == "__main__": 2023-01-11T21:38:06.5974037Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5974167Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5974377Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5974603Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5974609Z 2023-01-11T21:38:06.5974614Z 2023-01-11T21:38:06.5974719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5974798Z import torch 2023-01-11T21:38:06.5974875Z import random 2023-01-11T21:38:06.5975002Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5975128Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5975133Z 2023-01-11T21:38:06.5975214Z aten = torch.ops.aten 2023-01-11T21:38:06.5975346Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5975447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5975452Z 2023-01-11T21:38:06.5975527Z import triton 2023-01-11T21:38:06.5975621Z import triton.language as tl 2023-01-11T21:38:06.5975749Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5975893Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5975899Z 2023-01-11T21:38:06.5975903Z 2023-01-11T21:38:06.5976085Z triton_fused_reflection_pad2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.5976162Z import triton 2023-01-11T21:38:06.5976250Z import triton.language as tl 2023-01-11T21:38:06.5976366Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5976519Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5976655Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5976783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5976788Z 2023-01-11T21:38:06.5977258Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5977336Z @triton.jit 2023-01-11T21:38:06.5977470Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5977540Z xnumel = 100 2023-01-11T21:38:06.5977642Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5977775Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
2023-01-11T21:38:06.5977861Z xmask = xindex < xnumel 2023-01-11T21:38:06.5977944Z x1 = (xindex // 10) 2023-01-11T21:38:06.5978021Z x0 = xindex % 10 2023-01-11T21:38:06.5978096Z x2 = xindex 2023-01-11T21:38:06.5978162Z tmp0 = 7 2023-01-11T21:38:06.5978237Z tmp1 = x1 2023-01-11T21:38:06.5978311Z tmp2 = 1 2023-01-11T21:38:06.5978424Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5978504Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5978618Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5978695Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5978806Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5978878Z tmp8 = x0 2023-01-11T21:38:06.5978986Z tmp9 = tmp8 - tmp2 2023-01-11T21:38:06.5979072Z tmp10 = tl.abs(tmp9) 2023-01-11T21:38:06.5979186Z tmp11 = tmp0 - tmp10 2023-01-11T21:38:06.5979271Z tmp12 = tl.abs(tmp11) 2023-01-11T21:38:06.5979378Z tmp13 = tmp0 - tmp12 2023-01-11T21:38:06.5979617Z tmp14 = tl.load(in_ptr0 + (tmp13 + (8*tmp7)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5979754Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.5979842Z ''') 2023-01-11T21:38:06.5979850Z 2023-01-11T21:38:06.5979855Z 2023-01-11T21:38:06.5980039Z triton_fused_reflection_pad2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.5980114Z import triton 2023-01-11T21:38:06.5980210Z import triton.language as tl 2023-01-11T21:38:06.5980324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5980466Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5980606Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5980733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5980739Z 2023-01-11T21:38:06.5981142Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.5981222Z @triton.jit 2023-01-11T21:38:06.5981354Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5981430Z xnumel = 165 2023-01-11T21:38:06.5981534Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5981657Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5981743Z xmask = xindex < xnumel 2023-01-11T21:38:06.5981825Z x1 = (xindex // 11) 2023-01-11T21:38:06.5981902Z x0 = xindex % 11 2023-01-11T21:38:06.5981974Z x2 = xindex 2023-01-11T21:38:06.5982046Z tmp0 = 7 2023-01-11T21:38:06.5982112Z tmp1 = x1 2023-01-11T21:38:06.5982184Z tmp2 = 3 2023-01-11T21:38:06.5982296Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.5982378Z tmp4 = tl.abs(tmp3) 2023-01-11T21:38:06.5982487Z tmp5 = tmp0 - tmp4 2023-01-11T21:38:06.5982567Z tmp6 = tl.abs(tmp5) 2023-01-11T21:38:06.5982676Z tmp7 = tmp0 - tmp6 2023-01-11T21:38:06.5982743Z tmp8 = x0 2023-01-11T21:38:06.5982814Z tmp9 = 1 2023-01-11T21:38:06.5982926Z tmp10 = tmp8 - tmp9 2023-01-11T21:38:06.5983008Z tmp11 = tl.abs(tmp10) 2023-01-11T21:38:06.5983123Z tmp12 = tmp0 - tmp11 2023-01-11T21:38:06.5983237Z tmp13 = tl.abs(tmp12) 2023-01-11T21:38:06.5983341Z tmp14 = tmp0 - tmp13 2023-01-11T21:38:06.5983473Z tmp15 = tl.load(in_ptr0 + (tmp14 + (8*tmp7)), None).to(tl.float32) 2023-01-11T21:38:06.5983610Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.5983695Z ''') 2023-01-11T21:38:06.5983701Z 2023-01-11T21:38:06.5983708Z 2023-01-11T21:38:06.5983804Z async_compile.wait(globals()) 2023-01-11T21:38:06.5983881Z del async_compile 
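The fp16 variants above differ from their fp32 counterparts only in the *fp16 pointer signatures and in each tl.load gaining a .to(tl.float32) cast: the generated code upcasts half-precision inputs, does the indexing and arithmetic in fp32, and writes back through the fp16 output pointer. Schematically (offs stands in for the kernel's computed index; not a name from the log):

tmp = tl.load(in_ptr0 + offs, None).to(tl.float32)  # fp16 -> fp32 on load
tl.store(out_ptr0 + offs, tmp, xmask)               # narrowed back through the *fp16 pointer on store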
2023-01-11T21:38:06.5983885Z 2023-01-11T21:38:06.5983959Z def call(args): 2023-01-11T21:38:06.5984031Z arg0_1, = args 2023-01-11T21:38:06.5984098Z args.clear() 2023-01-11T21:38:06.5984193Z with torch.cuda.device(0): 2023-01-11T21:38:06.5984416Z buf0 = empty_strided((1, 1, 10, 10), (100, 100, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5984508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5984661Z triton_fused_reflection_pad2d_0.run(arg0_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.5984884Z buf1 = empty_strided((1, 1, 15, 11), (165, 165, 11, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.5985038Z triton_fused_reflection_pad2d_1_1.run(arg0_1, buf1, 165, grid=grid(165), stream=stream0) 2023-01-11T21:38:06.5985111Z del arg0_1 2023-01-11T21:38:06.5985191Z return (buf0, buf1, ) 2023-01-11T21:38:06.5985196Z 2023-01-11T21:38:06.5985200Z 2023-01-11T21:38:06.5985282Z if __name__ == "__main__": 2023-01-11T21:38:06.5985406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5985549Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5985796Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.5985908Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.5985913Z 2023-01-11T21:38:06.5985983Z ok (0.255s) 2023-01-11T21:38:06.5986435Z test_relu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5986597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5986853Z [2023-01-11 21:35:30,443] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 775 2023-01-11T21:38:06.5987117Z [2023-01-11 21:35:30,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 775 2023-01-11T21:38:06.5987537Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.5987671Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.5987926Z [2023-01-11 21:35:30,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 776 2023-01-11T21:38:06.5988194Z [2023-01-11 21:35:30,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 776 2023-01-11T21:38:06.5988200Z 2023-01-11T21:38:06.5988298Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5988373Z import torch 2023-01-11T21:38:06.5988448Z import random 2023-01-11T21:38:06.5988561Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5988688Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5988693Z 2023-01-11T21:38:06.5988779Z aten = torch.ops.aten 2023-01-11T21:38:06.5988916Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5989012Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5989088Z 2023-01-11T21:38:06.5989162Z import triton 2023-01-11T21:38:06.5989254Z import triton.language as tl 2023-01-11T21:38:06.5989379Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5989512Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5989517Z 2023-01-11T21:38:06.5989531Z 2023-01-11T21:38:06.5989685Z triton_fused_div_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.5989759Z import triton 2023-01-11T21:38:06.5989851Z import triton.language as tl 2023-01-11T21:38:06.5989965Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5990066Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5990198Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5990323Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5990328Z 2023-01-11T21:38:06.5990755Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5990832Z @triton.jit 2023-01-11T21:38:06.5990982Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5991061Z xnumel = 64 2023-01-11T21:38:06.5991155Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5991286Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5991368Z xmask = xindex < xnumel 2023-01-11T21:38:06.5991437Z x0 = xindex 2023-01-11T21:38:06.5991622Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.5991720Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.5991816Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.5991933Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.5992013Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5992130Z tmp5 = tl.where(0 != 0, 0, tl.where(0 > tmp4, 0, tmp4)) 2023-01-11T21:38:06.5992202Z tmp6 = 10 2023-01-11T21:38:06.5992274Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.5992409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5992574Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5992660Z ''') 2023-01-11T21:38:06.5992666Z 2023-01-11T21:38:06.5992670Z 2023-01-11T21:38:06.5992766Z 
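The triton_fused_div_relu_0 kernel corresponds to a function returning both relu(a) and relu(a + b) / 10, matching its two stores (tmp1 to out_ptr0, tmp7 to out_ptr1). A repro sketch under that reading, again assuming the torch.compile entry point:

import torch

def fn(a, b):
    # tmp1 = relu(a); tmp7 = relu(a + b) / 10
    return torch.relu(a), torch.relu(a + b) / 10

a = torch.randn(8, 8, device="cuda")
b = torch.randn(8, 8, device="cuda")
torch.compile(fn)(a, b)

The seemingly redundant tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) appears to be relu lowered through the NaN-propagating maximum pattern tl.where(x != x, x, tl.where(x > y, x, y)); with x constant-folded to 0 the NaN check is always false, so the expression reduces to max(tmp0, 0).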
async_compile.wait(globals()) 2023-01-11T21:38:06.5992844Z del async_compile 2023-01-11T21:38:06.5992849Z 2023-01-11T21:38:06.5992925Z def call(args): 2023-01-11T21:38:06.5993007Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5993075Z args.clear() 2023-01-11T21:38:06.5993167Z with torch.cuda.device(0): 2023-01-11T21:38:06.5993371Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5993572Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.5993667Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.5993821Z triton_fused_div_relu_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.5993894Z del arg0_1 2023-01-11T21:38:06.5993960Z del arg1_1 2023-01-11T21:38:06.5994045Z return (buf0, buf1, ) 2023-01-11T21:38:06.5994050Z 2023-01-11T21:38:06.5994055Z 2023-01-11T21:38:06.5994134Z if __name__ == "__main__": 2023-01-11T21:38:06.5994252Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.5994377Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.5994578Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5994777Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.5994896Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.5994901Z 2023-01-11T21:38:06.5994906Z 2023-01-11T21:38:06.5995027Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.5995100Z import torch 2023-01-11T21:38:06.5995173Z import random 2023-01-11T21:38:06.5995300Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.5995440Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.5995445Z 2023-01-11T21:38:06.5995547Z aten = torch.ops.aten 2023-01-11T21:38:06.5995688Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.5995783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.5995789Z 2023-01-11T21:38:06.5995855Z import triton 2023-01-11T21:38:06.5995945Z import triton.language as tl 2023-01-11T21:38:06.5996069Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.5996212Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.5996217Z 2023-01-11T21:38:06.5996222Z 2023-01-11T21:38:06.5996383Z triton_fused_div_relu_0 = async_compile.triton(''' 2023-01-11T21:38:06.5996456Z import triton 2023-01-11T21:38:06.5996551Z import triton.language as tl 2023-01-11T21:38:06.5996658Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.5996759Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.5996887Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.5997015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.5997020Z 2023-01-11T21:38:06.5997457Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.5997530Z @triton.jit 2023-01-11T21:38:06.5997681Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.5997755Z xnumel = 64 2023-01-11T21:38:06.5997845Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.5997975Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.5998058Z xmask = xindex < 
xnumel 2023-01-11T21:38:06.5998131Z x0 = xindex 2023-01-11T21:38:06.5998345Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.5998461Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5998602Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.5998719Z tmp1 = tl.where(0 != 0, 0, tl.where(0 > tmp0, 0, tmp0)) 2023-01-11T21:38:06.5998791Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.5998905Z tmp5 = tl.where(0 != 0, 0, tl.where(0 > tmp4, 0, tmp4)) 2023-01-11T21:38:06.5998978Z tmp6 = 10 2023-01-11T21:38:06.5999057Z tmp7 = tmp5 / tmp6 2023-01-11T21:38:06.5999189Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.5999321Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.5999404Z ''') 2023-01-11T21:38:06.5999409Z 2023-01-11T21:38:06.5999416Z 2023-01-11T21:38:06.5999505Z async_compile.wait(globals()) 2023-01-11T21:38:06.5999584Z del async_compile 2023-01-11T21:38:06.5999589Z 2023-01-11T21:38:06.5999663Z def call(args): 2023-01-11T21:38:06.5999742Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.5999818Z args.clear() 2023-01-11T21:38:06.5999909Z with torch.cuda.device(0): 2023-01-11T21:38:06.6000110Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6000306Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6000391Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6000541Z triton_fused_div_relu_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6000615Z del arg0_1 2023-01-11T21:38:06.6000686Z del arg1_1 2023-01-11T21:38:06.6000766Z return (buf0, buf1, ) 2023-01-11T21:38:06.6000771Z 2023-01-11T21:38:06.6000776Z 2023-01-11T21:38:06.6000886Z if __name__ == "__main__": 2023-01-11T21:38:06.6001009Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6001128Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6001328Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6001528Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6001648Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6001653Z 2023-01-11T21:38:06.6001725Z ok (0.200s) 2023-01-11T21:38:06.6002184Z test_remainder_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6002317Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6002583Z [2023-01-11 21:35:30,647] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 777 2023-01-11T21:38:06.6002849Z [2023-01-11 21:35:30,744] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 777 2023-01-11T21:38:06.6003269Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6003402Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6003649Z [2023-01-11 21:35:30,771] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 778 2023-01-11T21:38:06.6003910Z [2023-01-11 21:35:30,870] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 778 2023-01-11T21:38:06.6003919Z 2023-01-11T21:38:06.6004018Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6004091Z import torch 2023-01-11T21:38:06.6004162Z import random 2023-01-11T21:38:06.6004281Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6004436Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6004442Z 2023-01-11T21:38:06.6004525Z aten = torch.ops.aten 2023-01-11T21:38:06.6004655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6004754Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6004760Z 2023-01-11T21:38:06.6004834Z import triton 2023-01-11T21:38:06.6004925Z import triton.language as tl 2023-01-11T21:38:06.6005048Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6005189Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6005194Z 2023-01-11T21:38:06.6005202Z 2023-01-11T21:38:06.6005405Z triton_fused_remainder_remainder_1_remainder_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6005478Z import triton 2023-01-11T21:38:06.6005563Z import triton.language as tl 2023-01-11T21:38:06.6005676Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6005777Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6005912Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6006036Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6006041Z 2023-01-11T21:38:06.6006495Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6006569Z @triton.jit 2023-01-11T21:38:06.6006727Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6006837Z xnumel = 64 2023-01-11T21:38:06.6006934Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6007064Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6007145Z xmask = xindex < xnumel 2023-01-11T21:38:06.6007215Z x0 = xindex 2023-01-11T21:38:06.6007407Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6007598Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6007692Z tmp11 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6007790Z tmp13 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6007870Z tmp2 = tmp0 % tmp1 2023-01-11T21:38:06.6007947Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.6008075Z tmp4 = tl.where(((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))), tmp3, tmp2) 2023-01-11T21:38:06.6008143Z tmp5 = 1 2023-01-11T21:38:06.6008220Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.6008322Z tmp7 = tmp1 - tmp5 2023-01-11T21:38:06.6008403Z tmp8 = tmp6 % tmp7 2023-01-11T21:38:06.6008479Z tmp9 = tmp8 + tmp7 2023-01-11T21:38:06.6008607Z tmp10 = tl.where(((tmp8 != 0) & ((tmp8 < 
0) != (tmp7 < 0))), tmp9, tmp8) 2023-01-11T21:38:06.6008719Z tmp12 = tmp11 - tmp5 2023-01-11T21:38:06.6008799Z tmp14 = tmp13 + tmp5 2023-01-11T21:38:06.6008885Z tmp15 = tmp12 % tmp14 2023-01-11T21:38:06.6008958Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.6009088Z tmp17 = tl.where(((tmp15 != 0) & ((tmp15 < 0) != (tmp14 < 0))), tmp16, tmp15) 2023-01-11T21:38:06.6009223Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6009357Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.6009490Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6009576Z ''') 2023-01-11T21:38:06.6009581Z 2023-01-11T21:38:06.6009586Z 2023-01-11T21:38:06.6009678Z async_compile.wait(globals()) 2023-01-11T21:38:06.6009756Z del async_compile 2023-01-11T21:38:06.6009762Z 2023-01-11T21:38:06.6009829Z def call(args): 2023-01-11T21:38:06.6009906Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6009980Z args.clear() 2023-01-11T21:38:06.6010071Z with torch.cuda.device(0): 2023-01-11T21:38:06.6010299Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010498Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010694Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6010781Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6010968Z triton_fused_remainder_remainder_1_remainder_2_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6011043Z del arg0_1 2023-01-11T21:38:06.6011115Z del arg1_1 2023-01-11T21:38:06.6011207Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6011213Z 2023-01-11T21:38:06.6011220Z 2023-01-11T21:38:06.6011299Z if __name__ == "__main__": 2023-01-11T21:38:06.6011419Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6011544Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6011736Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6011934Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6012053Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6012058Z 2023-01-11T21:38:06.6012063Z 2023-01-11T21:38:06.6012158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6012231Z import torch 2023-01-11T21:38:06.6012304Z import random 2023-01-11T21:38:06.6012421Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6012546Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6012551Z 2023-01-11T21:38:06.6012626Z aten = torch.ops.aten 2023-01-11T21:38:06.6012764Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6012885Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6012890Z 2023-01-11T21:38:06.6012961Z import triton 2023-01-11T21:38:06.6013052Z import triton.language as tl 2023-01-11T21:38:06.6013178Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6013322Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6013328Z 2023-01-11T21:38:06.6013332Z 2023-01-11T21:38:06.6013536Z triton_fused_remainder_remainder_1_remainder_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6013603Z import triton 2023-01-11T21:38:06.6013693Z import triton.language as tl 2023-01-11T21:38:06.6013806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6013907Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.6014038Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6014161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6014169Z 2023-01-11T21:38:06.6014725Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6014800Z @triton.jit 2023-01-11T21:38:06.6014960Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6015034Z xnumel = 64 2023-01-11T21:38:06.6015132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6015263Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6015354Z xmask = xindex < xnumel 2023-01-11T21:38:06.6015443Z x0 = xindex 2023-01-11T21:38:06.6015684Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6015892Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6016012Z tmp11 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6016128Z tmp13 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6016207Z tmp2 = tmp0 % tmp1 2023-01-11T21:38:06.6016285Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.6016454Z tmp4 = tl.where(((tmp2 != 0) & ((tmp2 < 0) != (tmp1 < 0))), tmp3, tmp2) 2023-01-11T21:38:06.6016529Z tmp5 = 1 2023-01-11T21:38:06.6016600Z tmp6 = tmp0 + tmp5 2023-01-11T21:38:06.6016707Z tmp7 = tmp1 - tmp5 2023-01-11T21:38:06.6016783Z tmp8 = tmp6 % tmp7 2023-01-11T21:38:06.6016858Z tmp9 = tmp8 + tmp7 2023-01-11T21:38:06.6016985Z tmp10 = tl.where(((tmp8 != 0) & ((tmp8 < 0) != (tmp7 < 0))), tmp9, tmp8) 2023-01-11T21:38:06.6017097Z tmp12 = tmp11 - tmp5 2023-01-11T21:38:06.6017241Z tmp14 = tmp13 + tmp5 2023-01-11T21:38:06.6017327Z tmp15 = tmp12 % tmp14 2023-01-11T21:38:06.6017423Z tmp16 = tmp15 + tmp14 2023-01-11T21:38:06.6017568Z tmp17 = tl.where(((tmp15 != 0) & ((tmp15 < 0) != (tmp14 < 0))), tmp16, tmp15) 2023-01-11T21:38:06.6017708Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6017839Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.6017970Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6018059Z ''') 2023-01-11T21:38:06.6018064Z 2023-01-11T21:38:06.6018069Z 2023-01-11T21:38:06.6018162Z async_compile.wait(globals()) 2023-01-11T21:38:06.6018232Z del async_compile 2023-01-11T21:38:06.6018237Z 2023-01-11T21:38:06.6018309Z def call(args): 2023-01-11T21:38:06.6018385Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6018462Z args.clear() 2023-01-11T21:38:06.6018551Z with torch.cuda.device(0): 2023-01-11T21:38:06.6018749Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6018946Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6019177Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6019270Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6019458Z triton_fused_remainder_remainder_1_remainder_2_0.run(arg0_1, arg1_1, buf0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6019534Z del arg0_1 2023-01-11T21:38:06.6019606Z del arg1_1 
2023-01-11T21:38:06.6019693Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6019698Z 2023-01-11T21:38:06.6019703Z 2023-01-11T21:38:06.6019781Z if __name__ == "__main__": 2023-01-11T21:38:06.6019899Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6020019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6020214Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6020408Z arg1_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6020530Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6020542Z 2023-01-11T21:38:06.6020611Z ok (0.250s) 2023-01-11T21:38:06.6021069Z test_repeat_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6021205Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6021463Z [2023-01-11 21:35:30,889] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 779 2023-01-11T21:38:06.6021728Z [2023-01-11 21:35:31,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 779 2023-01-11T21:38:06.6022142Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6022304Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6022552Z [2023-01-11 21:35:31,036] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 780 2023-01-11T21:38:06.6022558Z 2023-01-11T21:38:06.6022656Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6022729Z import torch 2023-01-11T21:38:06.6022803Z import random 2023-01-11T21:38:06.6022922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6023048Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6023053Z 2023-01-11T21:38:06.6023135Z aten = torch.ops.aten 2023-01-11T21:38:06.6023273Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6023365Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6023370Z 2023-01-11T21:38:06.6023444Z import triton 2023-01-11T21:38:06.6023534Z import triton.language as tl 2023-01-11T21:38:06.6023659Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6023804Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6023810Z 2023-01-11T21:38:06.6023815Z 2023-01-11T21:38:06.6023975Z triton_fused_repeat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6024048Z import triton 2023-01-11T21:38:06.6024133Z import triton.language as tl 2023-01-11T21:38:06.6024248Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6024349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6024484Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6024610Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6024615Z 2023-01-11T21:38:06.6025019Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6025122Z @triton.jit 2023-01-11T21:38:06.6025254Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6025323Z xnumel = 768 2023-01-11T21:38:06.6025423Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6025555Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6025650Z xmask = xindex < xnumel 2023-01-11T21:38:06.6025735Z x0 = xindex % 8 2023-01-11T21:38:06.6025823Z x1 = (xindex // 8) % 12 2023-01-11T21:38:06.6025916Z x2 = (xindex // 96) % 4 2023-01-11T21:38:06.6025979Z x4 = xindex 2023-01-11T21:38:06.6026208Z tmp0 = tl.load(in_ptr0 + (x0 + (8*(x1 % 4)) + (32*(x2 % 2))), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6026344Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6026432Z ''') 2023-01-11T21:38:06.6026438Z 2023-01-11T21:38:06.6026442Z 2023-01-11T21:38:06.6026604Z triton_fused_repeat_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6026680Z import triton 2023-01-11T21:38:06.6026771Z import triton.language as tl 2023-01-11T21:38:06.6026886Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6026981Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6027112Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6027237Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6027242Z 2023-01-11T21:38:06.6027646Z @pointwise(size_hints=[512], filename=__file__, 
meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6027719Z @triton.jit 2023-01-11T21:38:06.6027853Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6027927Z xnumel = 512 2023-01-11T21:38:06.6028023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6028145Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6028228Z xmask = xindex < xnumel 2023-01-11T21:38:06.6028304Z x0 = xindex % 64 2023-01-11T21:38:06.6028399Z x2 = xindex 2023-01-11T21:38:06.6028589Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6028724Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6028809Z ''') 2023-01-11T21:38:06.6028815Z 2023-01-11T21:38:06.6028819Z 2023-01-11T21:38:06.6028978Z triton_fused_repeat_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6029046Z import triton 2023-01-11T21:38:06.6029136Z import triton.language as tl 2023-01-11T21:38:06.6029249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6029348Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6029483Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6029606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6029612Z 2023-01-11T21:38:06.6030016Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6030089Z @triton.jit 2023-01-11T21:38:06.6030213Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6030286Z xnumel = 128 2023-01-11T21:38:06.6030381Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6030510Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6030592Z xmask = xindex < xnumel 2023-01-11T21:38:06.6030666Z x0 = xindex % 64 2023-01-11T21:38:06.6030729Z x2 = xindex 2023-01-11T21:38:06.6030823Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6030986Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6031071Z ''') 2023-01-11T21:38:06.6031077Z 2023-01-11T21:38:06.6031081Z 2023-01-11T21:38:06.6031176Z async_compile.wait(globals()) 2023-01-11T21:38:06.6031252Z del async_compile 2023-01-11T21:38:06.6031257Z 2023-01-11T21:38:06.6031330Z def call(args): 2023-01-11T21:38:06.6031403Z arg0_1, = args 2023-01-11T21:38:06.6031471Z args.clear() 2023-01-11T21:38:06.6031566Z with torch.cuda.device(0): 2023-01-11T21:38:06.6031787Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6031877Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6032019Z triton_fused_repeat_0.run(arg0_1, buf0, 768, grid=grid(768), stream=stream0) 2023-01-11T21:38:06.6032233Z buf1 = empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6032376Z triton_fused_repeat_1_1.run(arg0_1, buf1, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.6032607Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6032742Z triton_fused_repeat_2_2.run(arg0_1, buf2, 128, grid=grid(128), stream=stream0) 
2023-01-11T21:38:06.6032814Z del arg0_1 2023-01-11T21:38:06.6032905Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6032911Z 2023-01-11T21:38:06.6032915Z 2023-01-11T21:38:06.6032994Z if __name__ == "__main__": 2023-01-11T21:38:06.6033112Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6033237Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6033449Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6033562Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6033567Z 2023-01-11T21:38:06.6033824Z [2023-01-11 21:35:31,131] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 780 2023-01-11T21:38:06.6033846Z 2023-01-11T21:38:06.6033937Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6034011Z import torch 2023-01-11T21:38:06.6034083Z import random 2023-01-11T21:38:06.6034202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6034355Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6034360Z 2023-01-11T21:38:06.6034442Z aten = torch.ops.aten 2023-01-11T21:38:06.6034578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6034666Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6034671Z 2023-01-11T21:38:06.6034741Z import triton 2023-01-11T21:38:06.6034832Z import triton.language as tl 2023-01-11T21:38:06.6034956Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6035094Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6035099Z 2023-01-11T21:38:06.6035104Z 2023-01-11T21:38:06.6035263Z triton_fused_repeat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6035360Z import triton 2023-01-11T21:38:06.6035461Z import triton.language as tl 2023-01-11T21:38:06.6035591Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6035693Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6035825Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6035950Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6035955Z 2023-01-11T21:38:06.6036359Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6036431Z @triton.jit 2023-01-11T21:38:06.6036564Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6036634Z xnumel = 768 2023-01-11T21:38:06.6036725Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6036880Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6036963Z xmask = xindex < xnumel 2023-01-11T21:38:06.6037037Z x0 = xindex % 8 2023-01-11T21:38:06.6037118Z x1 = (xindex // 8) % 12 2023-01-11T21:38:06.6037196Z x2 = (xindex // 96) % 4 2023-01-11T21:38:06.6037259Z x4 = xindex 2023-01-11T21:38:06.6037513Z tmp0 = tl.load(in_ptr0 + (x0 + (8*(x1 % 4)) + (32*(x2 % 2))), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6037648Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6037732Z ''') 2023-01-11T21:38:06.6037738Z 2023-01-11T21:38:06.6037742Z 2023-01-11T21:38:06.6037903Z triton_fused_repeat_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6037976Z import triton 2023-01-11T21:38:06.6038068Z import triton.language as tl 
2023-01-11T21:38:06.6038182Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6038277Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6038409Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6038536Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6038542Z 2023-01-11T21:38:06.6038947Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6039018Z @triton.jit 2023-01-11T21:38:06.6039150Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6039222Z xnumel = 512 2023-01-11T21:38:06.6039316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6039438Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6039519Z xmask = xindex < xnumel 2023-01-11T21:38:06.6039593Z x0 = xindex % 64 2023-01-11T21:38:06.6039664Z x2 = xindex 2023-01-11T21:38:06.6039876Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6040012Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6040096Z ''') 2023-01-11T21:38:06.6040102Z 2023-01-11T21:38:06.6040106Z 2023-01-11T21:38:06.6040265Z triton_fused_repeat_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6040333Z import triton 2023-01-11T21:38:06.6040453Z import triton.language as tl 2023-01-11T21:38:06.6040570Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6040672Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6040803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6040930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6040935Z 2023-01-11T21:38:06.6041341Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6041415Z @triton.jit 2023-01-11T21:38:06.6041538Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6041612Z xnumel = 128 2023-01-11T21:38:06.6041707Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6041836Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6041921Z xmask = xindex < xnumel 2023-01-11T21:38:06.6041998Z x0 = xindex % 64 2023-01-11T21:38:06.6042068Z x2 = xindex 2023-01-11T21:38:06.6042180Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6042312Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6042400Z ''') 2023-01-11T21:38:06.6042406Z 2023-01-11T21:38:06.6042410Z 2023-01-11T21:38:06.6042502Z async_compile.wait(globals()) 2023-01-11T21:38:06.6042578Z del async_compile 2023-01-11T21:38:06.6042583Z 2023-01-11T21:38:06.6042655Z def call(args): 2023-01-11T21:38:06.6042727Z arg0_1, = args 2023-01-11T21:38:06.6042823Z args.clear() 2023-01-11T21:38:06.6042916Z with torch.cuda.device(0): 2023-01-11T21:38:06.6043136Z buf0 = empty_strided((2, 4, 12, 8), (384, 96, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6043230Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6043372Z triton_fused_repeat_0.run(arg0_1, buf0, 768, grid=grid(768), stream=stream0) 2023-01-11T21:38:06.6043587Z buf1 
= empty_strided((8, 2, 4, 8), (64, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6043728Z triton_fused_repeat_1_1.run(arg0_1, buf1, 512, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.6043954Z buf2 = empty_strided((2, 1, 1, 2, 4, 8), (64, 64, 64, 32, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6044088Z triton_fused_repeat_2_2.run(arg0_1, buf2, 128, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.6044160Z del arg0_1 2023-01-11T21:38:06.6044252Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6044260Z 2023-01-11T21:38:06.6044265Z 2023-01-11T21:38:06.6044344Z if __name__ == "__main__": 2023-01-11T21:38:06.6044461Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6044588Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6044810Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6044923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6044928Z 2023-01-11T21:38:06.6044994Z ok (0.261s) 2023-01-11T21:38:06.6045451Z test_roi_align_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6045581Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6045841Z [2023-01-11 21:35:31,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 781 2023-01-11T21:38:06.6046098Z [2023-01-11 21:35:31,227] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.6046385Z [2023-01-11 21:35:31,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 781 2023-01-11T21:38:06.6046801Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6046931Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6047187Z [2023-01-11 21:35:31,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 782 2023-01-11T21:38:06.6047448Z [2023-01-11 21:35:31,326] torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.torchvision.roi_align 2023-01-11T21:38:06.6047709Z [2023-01-11 21:35:31,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 782 2023-01-11T21:38:06.6047717Z 2023-01-11T21:38:06.6047814Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6047882Z import torch 2023-01-11T21:38:06.6047955Z import random 2023-01-11T21:38:06.6048073Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6048196Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6048201Z 2023-01-11T21:38:06.6048282Z aten = torch.ops.aten 2023-01-11T21:38:06.6048416Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6048511Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6048516Z 2023-01-11T21:38:06.6048583Z import triton 2023-01-11T21:38:06.6048675Z import triton.language as tl 2023-01-11T21:38:06.6048838Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6048977Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6048983Z 2023-01-11T21:38:06.6048988Z 2023-01-11T21:38:06.6049083Z async_compile.wait(globals()) 2023-01-11T21:38:06.6049157Z del async_compile 2023-01-11T21:38:06.6049165Z 2023-01-11T21:38:06.6049239Z def call(args): 2023-01-11T21:38:06.6049318Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6049385Z args.clear() 2023-01-11T21:38:06.6049480Z with torch.cuda.device(0): 2023-01-11T21:38:06.6049628Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.6049700Z del arg0_1 2023-01-11T21:38:06.6049772Z del arg1_1 2023-01-11T21:38:06.6049844Z buf1 = buf0 2023-01-11T21:38:06.6049961Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.6050025Z del buf0 2023-01-11T21:38:06.6050104Z return (buf1, ) 2023-01-11T21:38:06.6050110Z 2023-01-11T21:38:06.6050114Z 2023-01-11T21:38:06.6050194Z if __name__ == "__main__": 2023-01-11T21:38:06.6050310Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6050436Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6050679Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6050884Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6051003Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6051008Z 2023-01-11T21:38:06.6051013Z 2023-01-11T21:38:06.6051103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6051175Z import torch 2023-01-11T21:38:06.6051249Z import random 2023-01-11T21:38:06.6051366Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6051488Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6051499Z 2023-01-11T21:38:06.6051578Z aten = torch.ops.aten 2023-01-11T21:38:06.6051711Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6051799Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6051810Z 2023-01-11T21:38:06.6051877Z import triton 
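# Note that no Triton source is emitted for this test: the
# "Using FallbackKernel" warnings above mean inductor has no lowering for
# torch.ops.torchvision.roi_align, so the call() above simply invokes the
# eager CUDA op and asserts the output layout (the fp16 variant below is
# identical apart from dtype). A minimal sketch of the fallback call, with
# the positional arguments copied from call() and interpreted per the usual
# torchvision roi_align signature (that interpretation is an aid, not taken
# from this log):
import torch
import torchvision  # registers torch.ops.torchvision.roi_align
feats = torch.rand(4, 256, 296, 304, device='cuda')
rois = torch.rand(2292, 5, device='cuda')  # rows: (batch_index, x1, y1, x2, y2)
out = torch.ops.torchvision.roi_align(feats, rois, 0.25, 7, 7, 2, False)
assert out.shape == (2292, 256, 7, 7)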
2023-01-11T21:38:06.6051968Z import triton.language as tl 2023-01-11T21:38:06.6052119Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6052258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6052263Z 2023-01-11T21:38:06.6052267Z 2023-01-11T21:38:06.6052358Z async_compile.wait(globals()) 2023-01-11T21:38:06.6052433Z del async_compile 2023-01-11T21:38:06.6052438Z 2023-01-11T21:38:06.6052513Z def call(args): 2023-01-11T21:38:06.6052584Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6052656Z args.clear() 2023-01-11T21:38:06.6052745Z with torch.cuda.device(0): 2023-01-11T21:38:06.6052892Z buf0 = torch.ops.torchvision.roi_align(arg0_1, arg1_1, 0.25, 7, 7, 2, False) 2023-01-11T21:38:06.6052970Z del arg0_1 2023-01-11T21:38:06.6053041Z del arg1_1 2023-01-11T21:38:06.6053111Z buf1 = buf0 2023-01-11T21:38:06.6053221Z assert_size_stride(buf1, (2292, 256, 7, 7), (12544, 49, 7, 1)) 2023-01-11T21:38:06.6053292Z del buf0 2023-01-11T21:38:06.6053371Z return (buf1, ) 2023-01-11T21:38:06.6053377Z 2023-01-11T21:38:06.6053381Z 2023-01-11T21:38:06.6053460Z if __name__ == "__main__": 2023-01-11T21:38:06.6053577Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6053707Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6053944Z arg0_1 = rand_strided((4, 256, 296, 304), (23035904, 89984, 304, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6054144Z arg1_1 = rand_strided((2292, 5), (5, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6054255Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6054270Z 2023-01-11T21:38:06.6054364Z ok (0.262s) 2023-01-11T21:38:06.6054932Z test_roll_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6055063Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6055324Z [2023-01-11 21:35:31,424] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 783 2023-01-11T21:38:06.6055587Z [2023-01-11 21:35:31,596] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 783 2023-01-11T21:38:06.6056000Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6056133Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6056390Z [2023-01-11 21:35:31,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 784 2023-01-11T21:38:06.6056652Z [2023-01-11 21:35:31,742] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 784 2023-01-11T21:38:06.6056657Z 2023-01-11T21:38:06.6056755Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6056822Z import torch 2023-01-11T21:38:06.6056897Z import random 2023-01-11T21:38:06.6057014Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6057188Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6057194Z 2023-01-11T21:38:06.6057276Z aten = torch.ops.aten 2023-01-11T21:38:06.6057415Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6057511Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6057516Z 2023-01-11T21:38:06.6057589Z import triton 2023-01-11T21:38:06.6057674Z import triton.language as tl 2023-01-11T21:38:06.6057797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6057982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6057989Z 2023-01-11T21:38:06.6057993Z 2023-01-11T21:38:06.6058151Z triton_fused_roll_0 = async_compile.triton(''' 2023-01-11T21:38:06.6058228Z import triton 2023-01-11T21:38:06.6058321Z import triton.language as tl 2023-01-11T21:38:06.6058435Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6058529Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6058663Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6058787Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6058795Z 2023-01-11T21:38:06.6059200Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6059271Z @triton.jit 2023-01-11T21:38:06.6059409Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6059483Z xnumel = 100352 2023-01-11T21:38:06.6059581Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6059702Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6059784Z xmask = xindex < xnumel 2023-01-11T21:38:06.6059858Z x0 = xindex % 16 2023-01-11T21:38:06.6059939Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6060021Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6060099Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6060168Z x4 = xindex 2023-01-11T21:38:06.6060425Z tmp0 = tl.load(in_ptr0 + (x0 + (16*((46 + x1) % 56)) + (896*((3 + x2) % 56)) + (50176*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6060599Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6060686Z ''') 2023-01-11T21:38:06.6060692Z 2023-01-11T21:38:06.6060697Z 2023-01-11T21:38:06.6060856Z triton_fused_roll_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6060936Z import triton 2023-01-11T21:38:06.6061028Z import triton.language as tl 2023-01-11T21:38:06.6061145Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6061247Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6061375Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6061502Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6061508Z 2023-01-11T21:38:06.6061909Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6061989Z @triton.jit 2023-01-11T21:38:06.6062124Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6062199Z xnumel = 100352 2023-01-11T21:38:06.6062296Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6062427Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6062506Z xmask = xindex < xnumel 2023-01-11T21:38:06.6062582Z x0 = xindex % 16 2023-01-11T21:38:06.6062666Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6062749Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6062832Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6062903Z x4 = xindex 2023-01-11T21:38:06.6063039Z tmp0 = tl.load(in_ptr0 + ((100347 + x0 + (16*x1) + (896*x2) + (50176*x3)) % 100352), xmask) 2023-01-11T21:38:06.6063170Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6063256Z ''') 2023-01-11T21:38:06.6063261Z 2023-01-11T21:38:06.6063268Z 2023-01-11T21:38:06.6063366Z async_compile.wait(globals()) 2023-01-11T21:38:06.6063444Z del async_compile 2023-01-11T21:38:06.6063449Z 2023-01-11T21:38:06.6063525Z def call(args): 2023-01-11T21:38:06.6063600Z arg0_1, = args 2023-01-11T21:38:06.6063677Z args.clear() 2023-01-11T21:38:06.6063765Z with torch.cuda.device(0): 2023-01-11T21:38:06.6064025Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6064124Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6064268Z triton_fused_roll_0.run(arg0_1, buf0, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6064493Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6064635Z triton_fused_roll_1_1.run(arg0_1, buf1, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6064709Z del arg0_1 2023-01-11T21:38:06.6064794Z return (buf0, buf1, ) 2023-01-11T21:38:06.6064803Z 2023-01-11T21:38:06.6064807Z 2023-01-11T21:38:06.6064889Z if __name__ == "__main__": 2023-01-11T21:38:06.6065002Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6065131Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6065359Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6065474Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6065479Z 2023-01-11T21:38:06.6065484Z 2023-01-11T21:38:06.6065582Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6065678Z import torch 2023-01-11T21:38:06.6065760Z import random 2023-01-11T21:38:06.6065899Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6066025Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6066030Z 2023-01-11T21:38:06.6066113Z aten = torch.ops.aten 2023-01-11T21:38:06.6066250Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6066378Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6066383Z 2023-01-11T21:38:06.6066459Z import triton 2023-01-11T21:38:06.6066554Z import triton.language as tl 2023-01-11T21:38:06.6066680Z 
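# The fp32 roll pair above is again pure index arithmetic:
# triton_fused_roll_0 applies per-dimension modular offsets
# ((46 + x1) % 56 on the stride-16 dim, (3 + x2) % 56 on the stride-896 dim),
# while triton_fused_roll_1_1 rolls the flattened tensor
# ((100347 + x) % 100352); the fp16 variants that follow differ only in
# dtype. One consistent eager-mode reconstruction (the shift values are only
# determined modulo the dim sizes, so these exact numbers are assumptions):
import torch
x = torch.randn(2, 56, 56, 16, device='cuda')
out0 = torch.roll(x, shifts=(-3, 10), dims=(1, 2))  # matches roll_0's offsets
out1 = torch.roll(x, shifts=5)                      # flat roll, matches roll_1_1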
from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6066815Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6066821Z 2023-01-11T21:38:06.6066831Z 2023-01-11T21:38:06.6066982Z triton_fused_roll_0 = async_compile.triton(''' 2023-01-11T21:38:06.6067057Z import triton 2023-01-11T21:38:06.6067149Z import triton.language as tl 2023-01-11T21:38:06.6067263Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6067364Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6067499Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6067625Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6067630Z 2023-01-11T21:38:06.6068026Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6068103Z @triton.jit 2023-01-11T21:38:06.6068237Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6068316Z xnumel = 100352 2023-01-11T21:38:06.6068415Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6068544Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6068630Z xmask = xindex < xnumel 2023-01-11T21:38:06.6068709Z x0 = xindex % 16 2023-01-11T21:38:06.6068786Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6068870Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6068952Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6069023Z x4 = xindex 2023-01-11T21:38:06.6069312Z tmp0 = tl.load(in_ptr0 + (x0 + (16*((46 + x1) % 56)) + (896*((3 + x2) % 56)) + (50176*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6069456Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6069540Z ''') 2023-01-11T21:38:06.6069546Z 2023-01-11T21:38:06.6069550Z 2023-01-11T21:38:06.6069702Z triton_fused_roll_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6069779Z import triton 2023-01-11T21:38:06.6069901Z import triton.language as tl 2023-01-11T21:38:06.6070018Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6070123Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6070257Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6070385Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6070390Z 2023-01-11T21:38:06.6070801Z @pointwise(size_hints=[131072], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6070873Z @triton.jit 2023-01-11T21:38:06.6071007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6071085Z xnumel = 100352 2023-01-11T21:38:06.6071184Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6071312Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6071398Z xmask = xindex < xnumel 2023-01-11T21:38:06.6071475Z x0 = xindex % 16 2023-01-11T21:38:06.6071552Z x1 = (xindex // 16) % 56 2023-01-11T21:38:06.6071632Z x2 = (xindex // 896) % 56 2023-01-11T21:38:06.6071714Z x3 = (xindex // 50176) 2023-01-11T21:38:06.6071786Z x4 = xindex 2023-01-11T21:38:06.6071934Z tmp0 = tl.load(in_ptr0 + ((100347 + x0 + (16*x1) + (896*x2) + (50176*x3)) % 100352), 
xmask).to(tl.float32) 2023-01-11T21:38:06.6072071Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6072159Z ''') 2023-01-11T21:38:06.6072164Z 2023-01-11T21:38:06.6072168Z 2023-01-11T21:38:06.6072290Z async_compile.wait(globals()) 2023-01-11T21:38:06.6072362Z del async_compile 2023-01-11T21:38:06.6072367Z 2023-01-11T21:38:06.6072443Z def call(args): 2023-01-11T21:38:06.6072519Z arg0_1, = args 2023-01-11T21:38:06.6072596Z args.clear() 2023-01-11T21:38:06.6072690Z with torch.cuda.device(0): 2023-01-11T21:38:06.6072921Z buf0 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6073016Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6073150Z triton_fused_roll_0.run(arg0_1, buf0, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6073377Z buf1 = empty_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6073522Z triton_fused_roll_1_1.run(arg0_1, buf1, 100352, grid=grid(100352), stream=stream0) 2023-01-11T21:38:06.6073597Z del arg0_1 2023-01-11T21:38:06.6073680Z return (buf0, buf1, ) 2023-01-11T21:38:06.6073688Z 2023-01-11T21:38:06.6073693Z 2023-01-11T21:38:06.6073774Z if __name__ == "__main__": 2023-01-11T21:38:06.6073892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6074019Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6074243Z arg0_1 = rand_strided((2, 56, 56, 16), (50176, 896, 16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6074359Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6074364Z 2023-01-11T21:38:06.6074436Z ok (0.348s) 2023-01-11T21:38:06.6074612Z test_round_correctness_cuda (__main__.CudaTests) ... skip: need to debug tl.libdevice on A100/V100 (0.001s) 2023-01-11T21:38:06.6075067Z test_round_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6075202Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6075488Z [2023-01-11 21:35:31,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 785 2023-01-11T21:38:06.6075803Z [2023-01-11 21:35:31,912] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 785 2023-01-11T21:38:06.6076218Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6076351Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6076606Z [2023-01-11 21:35:31,934] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 786 2023-01-11T21:38:06.6076867Z [2023-01-11 21:35:32,081] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 786 2023-01-11T21:38:06.6076879Z 2023-01-11T21:38:06.6076972Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6077048Z import torch 2023-01-11T21:38:06.6077126Z import random 2023-01-11T21:38:06.6077249Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6077374Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6077378Z 2023-01-11T21:38:06.6077463Z aten = torch.ops.aten 2023-01-11T21:38:06.6077602Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6077693Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6077698Z 2023-01-11T21:38:06.6077773Z import triton 2023-01-11T21:38:06.6077868Z import triton.language as tl 2023-01-11T21:38:06.6077995Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6078136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6078167Z 2023-01-11T21:38:06.6078172Z 2023-01-11T21:38:06.6078342Z triton_fused_mul_1_round_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6078419Z import triton 2023-01-11T21:38:06.6078514Z import triton.language as tl 2023-01-11T21:38:06.6078626Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6078731Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6078865Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6078992Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6078997Z 2023-01-11T21:38:06.6079417Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6079492Z @triton.jit 2023-01-11T21:38:06.6079636Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6079710Z xnumel = 64 2023-01-11T21:38:06.6079803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6079933Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6080020Z xmask = xindex < xnumel 2023-01-11T21:38:06.6080093Z x0 = xindex 2023-01-11T21:38:06.6080285Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6080381Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6080488Z tmp1 = tl.libdevice.nearbyint(tmp0) 2023-01-11T21:38:06.6080557Z tmp3 = 100.0 2023-01-11T21:38:06.6080638Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.6080748Z tmp5 = tl.libdevice.nearbyint(tmp4) 2023-01-11T21:38:06.6080821Z tmp6 = 0.01 2023-01-11T21:38:06.6080900Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.6081035Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6081174Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6081257Z ''') 2023-01-11T21:38:06.6081262Z 2023-01-11T21:38:06.6081267Z 2023-01-11T21:38:06.6081428Z triton_fused_round_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6081504Z import triton 
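# In triton_fused_mul_1_round_1_0 above, inductor lowers torch.round to
# tl.libdevice.nearbyint (round-half-to-even), and a decimals=2 round is
# lowered to scale/round/unscale: tmp4 = x * 100.0, tmp5 = nearbyint(tmp4),
# tmp7 = tmp5 * 0.01. The kernel whose source begins here rounds the second
# input plus one. A presumed eager equivalent (an assumption; the test body
# is not in this log):
import torch
a = torch.randn(8, 8, device='cuda')
b = torch.randn(8, 8, device='cuda')
out0 = torch.round(a)               # -> buf0
out1 = torch.round(b + 1)           # -> buf1 (triton_fused_round_2_1)
out2 = torch.round(a, decimals=2)   # -> buf2, i.e. nearbyint(a * 100) * 0.01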
2023-01-11T21:38:06.6081602Z import triton.language as tl 2023-01-11T21:38:06.6081795Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6081899Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6082031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6082149Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6082160Z 2023-01-11T21:38:06.6082557Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6082628Z @triton.jit 2023-01-11T21:38:06.6082759Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6082835Z xnumel = 64 2023-01-11T21:38:06.6082931Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6083059Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6083142Z xmask = xindex < xnumel 2023-01-11T21:38:06.6083206Z x0 = xindex 2023-01-11T21:38:06.6083303Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6083371Z tmp1 = 1 2023-01-11T21:38:06.6083451Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6083552Z tmp3 = tl.libdevice.nearbyint(tmp2) 2023-01-11T21:38:06.6083684Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6083769Z ''') 2023-01-11T21:38:06.6083775Z 2023-01-11T21:38:06.6083779Z 2023-01-11T21:38:06.6083865Z async_compile.wait(globals()) 2023-01-11T21:38:06.6083939Z del async_compile 2023-01-11T21:38:06.6083944Z 2023-01-11T21:38:06.6084020Z def call(args): 2023-01-11T21:38:06.6084096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6084200Z args.clear() 2023-01-11T21:38:06.6084292Z with torch.cuda.device(0): 2023-01-11T21:38:06.6084493Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6084690Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6084779Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6084928Z triton_fused_mul_1_round_1_0.run(arg0_1, buf0, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6085001Z del arg0_1 2023-01-11T21:38:06.6085196Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6085339Z triton_fused_round_2_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6085411Z del arg1_1 2023-01-11T21:38:06.6085497Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6085502Z 2023-01-11T21:38:06.6085506Z 2023-01-11T21:38:06.6085585Z if __name__ == "__main__": 2023-01-11T21:38:06.6085702Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6085827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6086025Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6086222Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6086346Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6086352Z 2023-01-11T21:38:06.6086356Z 2023-01-11T21:38:06.6086453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6086529Z import torch 2023-01-11T21:38:06.6086597Z import random 2023-01-11T21:38:06.6086715Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6086836Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6086841Z 2023-01-11T21:38:06.6086923Z aten = 
torch.ops.aten 2023-01-11T21:38:06.6087062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6087162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6087167Z 2023-01-11T21:38:06.6087238Z import triton 2023-01-11T21:38:06.6087328Z import triton.language as tl 2023-01-11T21:38:06.6087445Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6087584Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6087617Z 2023-01-11T21:38:06.6087622Z 2023-01-11T21:38:06.6087789Z triton_fused_mul_1_round_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6087863Z import triton 2023-01-11T21:38:06.6087956Z import triton.language as tl 2023-01-11T21:38:06.6088067Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6088168Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6088298Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6088415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6088420Z 2023-01-11T21:38:06.6088838Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6088916Z @triton.jit 2023-01-11T21:38:06.6089057Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6089133Z xnumel = 64 2023-01-11T21:38:06.6089230Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6089359Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6089444Z xmask = xindex < xnumel 2023-01-11T21:38:06.6089508Z x0 = xindex 2023-01-11T21:38:06.6089721Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6089840Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6089944Z tmp1 = tl.libdevice.nearbyint(tmp0) 2023-01-11T21:38:06.6090015Z tmp3 = 100.0 2023-01-11T21:38:06.6090093Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.6090243Z tmp5 = tl.libdevice.nearbyint(tmp4) 2023-01-11T21:38:06.6090308Z tmp6 = 0.01 2023-01-11T21:38:06.6090386Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.6090521Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6090654Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6090739Z ''') 2023-01-11T21:38:06.6090745Z 2023-01-11T21:38:06.6090749Z 2023-01-11T21:38:06.6090908Z triton_fused_round_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6090983Z import triton 2023-01-11T21:38:06.6091068Z import triton.language as tl 2023-01-11T21:38:06.6091179Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6091283Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6091417Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6091539Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6091545Z 2023-01-11T21:38:06.6091948Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6092025Z @triton.jit 2023-01-11T21:38:06.6092156Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6092225Z xnumel = 64 
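# As in every fp16 variant in this log, the load below widens with
# .to(tl.float32) so the libdevice math runs in fp32, and the store through
# the *fp16 out_ptr narrows the result back to half precision; the index
# arithmetic is identical to the fp32 kernel above.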
2023-01-11T21:38:06.6092321Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6092449Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6092529Z xmask = xindex < xnumel 2023-01-11T21:38:06.6092598Z x0 = xindex 2023-01-11T21:38:06.6092715Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6092783Z tmp1 = 1 2023-01-11T21:38:06.6092855Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6092958Z tmp3 = tl.libdevice.nearbyint(tmp2) 2023-01-11T21:38:06.6093086Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6093178Z ''') 2023-01-11T21:38:06.6093183Z 2023-01-11T21:38:06.6093188Z 2023-01-11T21:38:06.6093282Z async_compile.wait(globals()) 2023-01-11T21:38:06.6093359Z del async_compile 2023-01-11T21:38:06.6093364Z 2023-01-11T21:38:06.6093438Z def call(args): 2023-01-11T21:38:06.6093517Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6093585Z args.clear() 2023-01-11T21:38:06.6093704Z with torch.cuda.device(0): 2023-01-11T21:38:06.6093902Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094099Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094190Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6094337Z triton_fused_mul_1_round_1_0.run(arg0_1, buf0, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6094410Z del arg0_1 2023-01-11T21:38:06.6094723Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6094866Z triton_fused_round_2_1.run(arg1_1, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6094944Z del arg1_1 2023-01-11T21:38:06.6095031Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6095036Z 2023-01-11T21:38:06.6095041Z 2023-01-11T21:38:06.6095120Z if __name__ == "__main__": 2023-01-11T21:38:06.6095238Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6095364Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6095564Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6095753Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6095872Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6095878Z 2023-01-11T21:38:06.6095948Z ok (0.339s) 2023-01-11T21:38:06.6096397Z test_rsqrt_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6096573Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6096835Z [2023-01-11 21:35:32,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 787 2023-01-11T21:38:06.6097098Z [2023-01-11 21:35:32,241] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 787 2023-01-11T21:38:06.6097587Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6097727Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6097981Z [2023-01-11 21:35:32,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 788 2023-01-11T21:38:06.6098245Z [2023-01-11 21:35:32,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 788 2023-01-11T21:38:06.6098251Z 2023-01-11T21:38:06.6098342Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6098416Z import torch 2023-01-11T21:38:06.6098490Z import random 2023-01-11T21:38:06.6098610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6098738Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6098743Z 2023-01-11T21:38:06.6098824Z aten = torch.ops.aten 2023-01-11T21:38:06.6098958Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6099046Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6099057Z 2023-01-11T21:38:06.6099124Z import triton 2023-01-11T21:38:06.6099218Z import triton.language as tl 2023-01-11T21:38:06.6099342Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6099482Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6099487Z 2023-01-11T21:38:06.6099492Z 2023-01-11T21:38:06.6099694Z triton_fused_rsqrt_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6099771Z import triton 2023-01-11T21:38:06.6099862Z import triton.language as tl 2023-01-11T21:38:06.6099968Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6100069Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6100201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6100329Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6100335Z 2023-01-11T21:38:06.6100751Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6100827Z @triton.jit 2023-01-11T21:38:06.6100968Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6101042Z xnumel = 64 2023-01-11T21:38:06.6101132Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6101261Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6101345Z xmask = xindex < xnumel 2023-01-11T21:38:06.6101415Z x0 = xindex 2023-01-11T21:38:06.6101606Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6101705Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6101802Z tmp1 = tl.libdevice.rsqrt(tmp0) 2023-01-11T21:38:06.6101866Z tmp3 = 1 2023-01-11T21:38:06.6101948Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6102049Z tmp5 = tl.libdevice.rsqrt(tmp4) 2023-01-11T21:38:06.6102119Z tmp6 = 2 2023-01-11T21:38:06.6102229Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.6102389Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6102519Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6102597Z ''') 2023-01-11T21:38:06.6102602Z 2023-01-11T21:38:06.6102614Z 2023-01-11T21:38:06.6102700Z async_compile.wait(globals()) 2023-01-11T21:38:06.6102777Z del async_compile 2023-01-11T21:38:06.6102782Z 
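# triton_fused_rsqrt_sub_0 above writes two outputs from a single pass over
# the input (tmp1 = rsqrt(x), tmp7 = rsqrt(x + 1) - 2): inductor fused both
# pointwise expressions into one kernel rather than launching twice and
# reloading the input. A presumed eager equivalent (an assumption; the test
# body is not in this log):
import torch
x = torch.randn(64, device='cuda')
out0 = torch.rsqrt(x)          # -> buf0
out1 = torch.rsqrt(x + 1) - 2  # -> buf1, fused into the same kernel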
2023-01-11T21:38:06.6102853Z def call(args): 2023-01-11T21:38:06.6102926Z arg0_1, = args 2023-01-11T21:38:06.6103004Z args.clear() 2023-01-11T21:38:06.6103124Z with torch.cuda.device(0): 2023-01-11T21:38:06.6103393Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6103602Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6103701Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6103868Z triton_fused_rsqrt_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6103974Z del arg0_1 2023-01-11T21:38:06.6104085Z return (buf0, buf1, ) 2023-01-11T21:38:06.6104094Z 2023-01-11T21:38:06.6104100Z 2023-01-11T21:38:06.6104212Z if __name__ == "__main__": 2023-01-11T21:38:06.6104374Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6104507Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6104706Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6104821Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6104827Z 2023-01-11T21:38:06.6104831Z 2023-01-11T21:38:06.6104932Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6105005Z import torch 2023-01-11T21:38:06.6105077Z import random 2023-01-11T21:38:06.6105194Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6105318Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6105323Z 2023-01-11T21:38:06.6105405Z aten = torch.ops.aten 2023-01-11T21:38:06.6105537Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6105643Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6105650Z 2023-01-11T21:38:06.6105736Z import triton 2023-01-11T21:38:06.6105838Z import triton.language as tl 2023-01-11T21:38:06.6106022Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6106164Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6106169Z 2023-01-11T21:38:06.6106174Z 2023-01-11T21:38:06.6106334Z triton_fused_rsqrt_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6106401Z import triton 2023-01-11T21:38:06.6106493Z import triton.language as tl 2023-01-11T21:38:06.6106633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6106736Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6106868Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6106999Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6107007Z 2023-01-11T21:38:06.6107427Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6107506Z @triton.jit 2023-01-11T21:38:06.6107650Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6107718Z xnumel = 64 2023-01-11T21:38:06.6107814Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6107942Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6108029Z xmask = xindex < xnumel 2023-01-11T21:38:06.6108098Z x0 = xindex 2023-01-11T21:38:06.6108311Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6108432Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6108523Z tmp1 = 
tl.libdevice.rsqrt(tmp0) 2023-01-11T21:38:06.6108625Z tmp3 = 1 2023-01-11T21:38:06.6108707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6108807Z tmp5 = tl.libdevice.rsqrt(tmp4) 2023-01-11T21:38:06.6108880Z tmp6 = 2 2023-01-11T21:38:06.6108990Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.6109121Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6109256Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6109343Z ''') 2023-01-11T21:38:06.6109348Z 2023-01-11T21:38:06.6109353Z 2023-01-11T21:38:06.6109449Z async_compile.wait(globals()) 2023-01-11T21:38:06.6109529Z del async_compile 2023-01-11T21:38:06.6109535Z 2023-01-11T21:38:06.6109609Z def call(args): 2023-01-11T21:38:06.6109686Z arg0_1, = args 2023-01-11T21:38:06.6109763Z args.clear() 2023-01-11T21:38:06.6109851Z with torch.cuda.device(0): 2023-01-11T21:38:06.6110051Z buf0 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6110251Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6110346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6110491Z triton_fused_rsqrt_sub_0.run(arg0_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6110565Z del arg0_1 2023-01-11T21:38:06.6110652Z return (buf0, buf1, ) 2023-01-11T21:38:06.6110657Z 2023-01-11T21:38:06.6110662Z 2023-01-11T21:38:06.6110745Z if __name__ == "__main__": 2023-01-11T21:38:06.6110858Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6110985Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6111185Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6111301Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6111307Z 2023-01-11T21:38:06.6111382Z ok (0.320s) 2023-01-11T21:38:06.6111844Z test_scatter1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6112014Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6112277Z [2023-01-11 21:35:32,425] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 789 2023-01-11T21:38:06.6112542Z [2023-01-11 21:35:32,495] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 789 2023-01-11T21:38:06.6112959Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,517] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 790
[2023-01-11 21:35:32,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 790

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tl.store(out_ptr0 + (tmp0), tmp1, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None)
    tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32)
    tl.store(out_ptr0 + (tmp0), tmp1, None)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 1, grid=grid(1), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1), (1, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.189s)
test_scatter2_cuda (__main__.CudaTests) ... skip: unstable on sm86 (0.001s)
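The wrapper above shows Inductor's functionalized form of an in-place scatter: triton_fused_scatter_0 copies all six elements of the (2, 3) input into a fresh buffer, and triton_fused_scatter_1 performs the single indexed store (xnumel = 1). The flat offset tmp0 with stride 1 is consistent with a scatter along the last dimension. A minimal eager-mode sketch of what call() computes, with shapes taken from the rand_strided calls above (the test body itself is not in this log, so the exact op and index values are inferences):

import torch

x = torch.randn(2, 3, device='cuda')                         # arg0_1
index = torch.zeros(1, 1, dtype=torch.int64, device='cuda')  # arg1_1 (values assumed)
src = torch.randn(2, 3, device='cuda')                       # arg2_1

out = x.clone()              # kernel 0: pointwise copy, xnumel = 6
out.scatter_(1, index, src)  # kernel 1: one store at out[0, index[0, 0]]

Cloning first keeps the graph functional: only the freshly allocated buffer is ever mutated, which is why 'mutated_arg_names' lists just out_ptr0.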
test_scatter3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,611] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 791
[2023-01-11 21:35:32,693] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 791
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,713] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 792
[2023-01-11 21:35:32,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 792

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 0.8
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 0.8
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.198s)
test_scatter4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 793
[2023-01-11 21:35:32,888] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 793
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:32,902] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 794
[2023-01-11 21:35:32,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 794

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 194432
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 992
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + (992*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((196, 992), (992, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 194432, grid=grid(194432), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 992, grid=grid(992), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((196, 992), (992, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 194432
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 992
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + (992*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((196, 992), (992, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_0.run(arg0_1, buf0, 194432, grid=grid(194432), stream=stream0)
        del arg0_1
        triton_fused_scatter_1.run(arg1_1, arg2_1, buf0, 992, grid=grid(992), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((196, 992), (992, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 992), (992, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.190s)
test_scatter_add1_cuda (__main__.CudaTests) ... skip: Flaky test, needs debugging (0.000s)
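In the test_scatter4 wrapper above, the second kernel issues one store per column: the flat offset x0 + 992*tmp0 is column + row_stride * index, i.e. a scatter along dim 0 of the row-major (196, 992) buffer. A rough eager equivalent (shapes from the rand_strided calls; the index values below are invented, since rand_strided leaves the int64 tensor effectively uninitialized):

import torch

x = torch.randn(196, 992, device='cuda')                # arg0_1
index = torch.randint(0, 196, (1, 992), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(1, 992, device='cuda')                # arg2_1

out = x.clone()              # kernel 0: copy all 194432 elements
out.scatter_(0, index, src)  # kernel 1: 992 stores at col + 992 * index[0, col]

Each lane writes a distinct destination here, so a plain tl.store suffices; the scatter_add tests below are lowered to tl.atomic_add instead.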
test_scatter_add2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,001] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 795
[2023-01-11 21:35:33,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 795
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 796
[2023-01-11 21:35:33,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 796

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = xindex
    x0 = xindex % 3
    tmp0 = tl.load(in_ptr0 + (x2), xmask)
    tmp1 = tl.load(in_ptr1 + (x2), xmask)
    tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 6
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x2 = xindex
    x0 = xindex % 3
    tmp0 = tl.load(in_ptr0 + (x2), xmask)
    tmp1 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.193s)
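test_scatter_add2 shows the accumulating variant: instead of tl.store, the second kernel does tl.atomic_add at offset x0 + 3*tmp0, i.e. column x0 of the row named by the index, which matches a scatter_add along dim 0 of the (2, 3) buffer. A sketch of the eager counterpart (shapes from the rand_strided calls; index values invented, as before):

import torch

x = torch.randn(2, 3, device='cuda')                # arg0_1
index = torch.randint(0, 2, (2, 3), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(2, 3, device='cuda')              # arg2_1

out = x.clone()
out.scatter_add_(0, index, src)  # kernel 1: tl.atomic_add at x0 + 3 * index

Because colliding indices are resolved by hardware atomics, the accumulation order is not fixed, so floating-point results from this lowering can differ bitwise between runs.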
test_scatter_add3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 797
[2023-01-11 21:35:33,357] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 797
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 798
[2023-01-11 21:35:33,446] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 798

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_add_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_add_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_add_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))

ok (0.273s)
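test_scatter_add3 above adds at the raw offset tmp0 with an index tensor of shape (1, 1, 4); given the (5, 29, 13) buffer's strides (377, 13, 1), that is consistent with a scatter_add along the last dimension, and only the first four elements of the (1, 1, 10) source are ever read because the index shape bounds the iteration. A sketch under that reading (index values invented):

import torch

x = torch.randn(5, 29, 13, device='cuda')               # arg0_1
index = torch.randint(0, 13, (1, 1, 4), device='cuda')  # arg1_1 (hypothetical values)
src = torch.randn(1, 1, 10, device='cuda')              # arg2_1; elements 4..9 unused

out = x.clone()
out.scatter_add_(2, index, src)  # 4 atomic adds at out[0, 0, index[0, 0, k]]

The test_scatter_reduce1 kernels that follow are the same pattern kernel-for-kernel, so the reduce='sum' path appears to reuse the scatter_add lowering.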
test_scatter_reduce1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,467] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 799
[2023-01-11 21:35:33,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 799
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:33,502] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 800
[2023-01-11 21:35:33,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 800

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_reduce_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_reduce_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_reduce_0.run(arg0_1, buf0, 1885, grid=grid(1885), stream=stream0)
        del arg0_1
        triton_fused_scatter_reduce_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0)
        del arg1_1
        del arg2_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64)
    arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_scatter_reduce_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[2048], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 1885
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_scatter_reduce_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 4
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32)
    tl.atomic_add(out_ptr0 + (tmp0), tmp1, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((5, 29, 13), (377, 13, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_scatter_reduce_0.run(arg0_1, buf0,
1885, grid=grid(1885), stream=stream0) 2023-01-11T21:38:06.6232659Z del arg0_1 2023-01-11T21:38:06.6232808Z triton_fused_scatter_reduce_1.run(arg1_1, arg2_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6232879Z del arg1_1 2023-01-11T21:38:06.6232952Z del arg2_1 2023-01-11T21:38:06.6233032Z return (buf0, ) 2023-01-11T21:38:06.6233037Z 2023-01-11T21:38:06.6233041Z 2023-01-11T21:38:06.6233121Z if __name__ == "__main__": 2023-01-11T21:38:06.6233239Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6233366Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6233606Z arg0_1 = rand_strided((5, 29, 13), (377, 13, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6233804Z arg1_1 = rand_strided((1, 1, 4), (4, 4, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6234011Z arg2_1 = rand_strided((1, 1, 10), (10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6234137Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6234142Z 2023-01-11T21:38:06.6234213Z ok (0.070s) 2023-01-11T21:38:06.6234675Z test_scatter_reduce2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6234811Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6235075Z [2023-01-11 21:35:33,537] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 801 2023-01-11T21:38:06.6235343Z [2023-01-11 21:35:33,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 801 2023-01-11T21:38:06.6235760Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
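A reading of the two kernels above: Inductor lowers the scatter-reduce in graph 800 to a copy kernel (triton_fused_scatter_reduce_0 seeds the output buffer with the input) followed by an accumulation kernel (triton_fused_scatter_reduce_1 does tl.atomic_add at the gathered flat offsets). The fp16 variant (graph 801) is byte-for-byte the same except that loads are upcast with .to(tl.float32) and implicitly narrowed back to fp16 on store. A minimal eager-mode sketch of the same computation, with shapes copied from the rand_strided calls in the log; the reduction dim and the exact test body are assumptions, not read from the test source:

import torch

x = torch.randn(5, 29, 13, device="cuda")               # arg0_1
index = torch.randint(0, 13, (1, 1, 4), device="cuda")  # arg1_1 (int64)
src = torch.randn(1, 1, 10, device="cuda")              # arg2_1
# Kernel 0 copies x into the output buffer (the include-self seed); kernel 1
# atomically adds the first index.numel() elements of src at the indexed slots.
out = torch.scatter_reduce(x, 2, index, src, reduce="sum")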
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6235949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6236223Z [2023-01-11 21:35:33,634] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 802 2023-01-11T21:38:06.6236237Z 2023-01-11T21:38:06.6236329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6236407Z import torch 2023-01-11T21:38:06.6236482Z import random 2023-01-11T21:38:06.6236600Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6236724Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6236729Z 2023-01-11T21:38:06.6236814Z aten = torch.ops.aten 2023-01-11T21:38:06.6236950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6237039Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6237044Z 2023-01-11T21:38:06.6237115Z import triton 2023-01-11T21:38:06.6237204Z import triton.language as tl 2023-01-11T21:38:06.6237328Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6237472Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6237478Z 2023-01-11T21:38:06.6237483Z 2023-01-11T21:38:06.6237656Z triton_fused_scatter_reduce_0 = async_compile.triton(''' 2023-01-11T21:38:06.6237728Z import triton 2023-01-11T21:38:06.6237819Z import triton.language as tl 2023-01-11T21:38:06.6237929Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6238028Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6238160Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6238283Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6238289Z 2023-01-11T21:38:06.6238685Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6238759Z @triton.jit 2023-01-11T21:38:06.6238895Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6238968Z xnumel = 6 2023-01-11T21:38:06.6239058Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6239187Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6239269Z xmask = xindex < xnumel 2023-01-11T21:38:06.6239375Z x0 = xindex 2023-01-11T21:38:06.6239473Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6239608Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6239693Z ''') 2023-01-11T21:38:06.6239699Z 2023-01-11T21:38:06.6239703Z 2023-01-11T21:38:06.6239869Z triton_fused_scatter_reduce_1 = async_compile.triton(''' 2023-01-11T21:38:06.6239947Z import triton 2023-01-11T21:38:06.6240040Z import triton.language as tl 2023-01-11T21:38:06.6240155Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6240254Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6240385Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6240514Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6240520Z 2023-01-11T21:38:06.6240924Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 
2023-01-11T21:38:06.6240992Z @triton.jit 2023-01-11T21:38:06.6241123Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6241196Z xnumel = 6 2023-01-11T21:38:06.6241293Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6241424Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6241507Z xmask = xindex < xnumel 2023-01-11T21:38:06.6241577Z x2 = xindex 2023-01-11T21:38:06.6241644Z x0 = xindex % 3 2023-01-11T21:38:06.6241832Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6241934Z tmp1 = 0 2023-01-11T21:38:06.6242076Z tl.store(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6242158Z ''') 2023-01-11T21:38:06.6242163Z 2023-01-11T21:38:06.6242167Z 2023-01-11T21:38:06.6242338Z triton_fused_scatter_reduce_2 = async_compile.triton(''' 2023-01-11T21:38:06.6242412Z import triton 2023-01-11T21:38:06.6242506Z import triton.language as tl 2023-01-11T21:38:06.6242614Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6242714Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6242847Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6242972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6242977Z 2023-01-11T21:38:06.6243401Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6243479Z @triton.jit 2023-01-11T21:38:06.6243617Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6243689Z xnumel = 6 2023-01-11T21:38:06.6243779Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6243910Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6243993Z xmask = xindex < xnumel 2023-01-11T21:38:06.6244064Z x2 = xindex 2023-01-11T21:38:06.6244136Z x0 = xindex % 3 2023-01-11T21:38:06.6244233Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.6244331Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.6244472Z tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6244556Z ''') 2023-01-11T21:38:06.6244561Z 2023-01-11T21:38:06.6244565Z 2023-01-11T21:38:06.6244658Z async_compile.wait(globals()) 2023-01-11T21:38:06.6244736Z del async_compile 2023-01-11T21:38:06.6244741Z 2023-01-11T21:38:06.6244822Z def call(args): 2023-01-11T21:38:06.6244906Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6244981Z args.clear() 2023-01-11T21:38:06.6245066Z with torch.cuda.device(0): 2023-01-11T21:38:06.6245264Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6245385Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6245535Z triton_fused_scatter_reduce_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245608Z del arg0_1 2023-01-11T21:38:06.6245753Z triton_fused_scatter_reduce_1.run(arg1_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245909Z triton_fused_scatter_reduce_2.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6245981Z del arg1_1 2023-01-11T21:38:06.6246047Z del arg2_1 2023-01-11T21:38:06.6246124Z return (buf0, ) 2023-01-11T21:38:06.6246129Z 2023-01-11T21:38:06.6246133Z 2023-01-11T21:38:06.6246217Z if __name__ == "__main__": 
2023-01-11T21:38:06.6246335Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6246459Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6246657Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6246854Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6247056Z arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6247177Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6247182Z 2023-01-11T21:38:06.6247443Z [2023-01-11 21:35:33,711] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 802 2023-01-11T21:38:06.6247449Z 2023-01-11T21:38:06.6247548Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6247622Z import torch 2023-01-11T21:38:06.6247695Z import random 2023-01-11T21:38:06.6247812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6247962Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6247967Z 2023-01-11T21:38:06.6248048Z aten = torch.ops.aten 2023-01-11T21:38:06.6248177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6248274Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6248281Z 2023-01-11T21:38:06.6248354Z import triton 2023-01-11T21:38:06.6248446Z import triton.language as tl 2023-01-11T21:38:06.6248570Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6248708Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6248713Z 2023-01-11T21:38:06.6248717Z 2023-01-11T21:38:06.6248890Z triton_fused_scatter_reduce_0 = async_compile.triton(''' 2023-01-11T21:38:06.6248967Z import triton 2023-01-11T21:38:06.6249052Z import triton.language as tl 2023-01-11T21:38:06.6249166Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6249268Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6249403Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6249526Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6249531Z 2023-01-11T21:38:06.6249938Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6250011Z @triton.jit 2023-01-11T21:38:06.6250143Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6250209Z xnumel = 6 2023-01-11T21:38:06.6250306Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6250438Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6250520Z xmask = xindex < xnumel 2023-01-11T21:38:06.6250591Z x0 = xindex 2023-01-11T21:38:06.6250708Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6250844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6250922Z ''') 2023-01-11T21:38:06.6250928Z 2023-01-11T21:38:06.6250932Z 2023-01-11T21:38:06.6251106Z triton_fused_scatter_reduce_1 = async_compile.triton(''' 2023-01-11T21:38:06.6251180Z import triton 2023-01-11T21:38:06.6251299Z import triton.language as tl 2023-01-11T21:38:06.6251415Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6251516Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6251647Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6251765Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6251778Z 2023-01-11T21:38:06.6252179Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6252252Z @triton.jit 2023-01-11T21:38:06.6252387Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6252458Z xnumel = 6 2023-01-11T21:38:06.6252553Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6252686Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6252769Z xmask = xindex < xnumel 2023-01-11T21:38:06.6252835Z x2 = xindex 2023-01-11T21:38:06.6252909Z x0 = xindex % 3 2023-01-11T21:38:06.6253097Z tmp0 = tl.load(in_ptr0 + (x2), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6253171Z tmp1 = 0 2023-01-11T21:38:06.6253315Z tl.store(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6253400Z ''') 2023-01-11T21:38:06.6253405Z 2023-01-11T21:38:06.6253409Z 2023-01-11T21:38:06.6253580Z triton_fused_scatter_reduce_2 = async_compile.triton(''' 2023-01-11T21:38:06.6253654Z import triton 2023-01-11T21:38:06.6253740Z import triton.language as tl 2023-01-11T21:38:06.6253853Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6253982Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6254113Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6254236Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6254241Z 2023-01-11T21:38:06.6254800Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6254876Z @triton.jit 2023-01-11T21:38:06.6255014Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6255079Z xnumel = 6 2023-01-11T21:38:06.6255175Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6255304Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6255389Z xmask = xindex < xnumel 2023-01-11T21:38:06.6255463Z x2 = xindex 2023-01-11T21:38:06.6255536Z x0 = xindex % 3 2023-01-11T21:38:06.6255634Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.6255744Z tmp1 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.6255894Z tl.atomic_add(out_ptr0 + (x0 + (3*tmp0) + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6255985Z ''') 2023-01-11T21:38:06.6255991Z 2023-01-11T21:38:06.6255995Z 2023-01-11T21:38:06.6256088Z async_compile.wait(globals()) 2023-01-11T21:38:06.6256169Z del async_compile 2023-01-11T21:38:06.6256174Z 2023-01-11T21:38:06.6256248Z def call(args): 2023-01-11T21:38:06.6256333Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6256408Z args.clear() 2023-01-11T21:38:06.6256493Z with torch.cuda.device(0): 2023-01-11T21:38:06.6256696Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6256788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6256937Z triton_fused_scatter_reduce_0.run(arg0_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257017Z del arg0_1 
2023-01-11T21:38:06.6257226Z triton_fused_scatter_reduce_1.run(arg1_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257397Z triton_fused_scatter_reduce_2.run(arg1_1, arg2_1, buf0, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.6257529Z del arg1_1 2023-01-11T21:38:06.6257616Z del arg2_1 2023-01-11T21:38:06.6257706Z return (buf0, ) 2023-01-11T21:38:06.6257712Z 2023-01-11T21:38:06.6257716Z 2023-01-11T21:38:06.6257797Z if __name__ == "__main__": 2023-01-11T21:38:06.6257914Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6258041Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6258243Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6258436Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.6258631Z arg2_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6258756Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6258761Z 2023-01-11T21:38:06.6258832Z ok (0.194s) 2023-01-11T21:38:06.6259308Z test_scheduler_vertical_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6259442Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6259701Z [2023-01-11 21:35:33,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 803 2023-01-11T21:38:06.6259909Z [2023-01-11 21:35:33,824] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.6260208Z [2023-01-11 21:35:33,906] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 803 2023-01-11T21:38:06.6260628Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
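Graph 802 lowers to three kernels rather than two: the copy kernel seeds buf0 with the input, triton_fused_scatter_reduce_1 stores a literal 0 at every scattered slot (out_ptr0[x0 + 3*tmp0]), and triton_fused_scatter_reduce_2 then atomically adds src at the same slots. Zeroing the targets before accumulating is what a sum reduction with include_self=False would need, so a plausible eager-mode reading is the following; treat the flag and the dim as inferences from the generated indexing, not facts from the test source:

import torch

x = torch.randn(2, 3, device="cuda")                # arg0_1
index = torch.randint(0, 2, (2, 3), device="cuda")  # arg1_1, values index dim 0
src = torch.randn(2, 3, device="cuda")              # arg2_1
# x0 + 3*tmp0 in the kernels is a flat offset with stride 3, i.e. a dim=0 scatter.
out = torch.scatter_reduce(x, 0, index, src, reduce="sum", include_self=False)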
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6260758Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6261005Z [2023-01-11 21:35:33,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 804 2023-01-11T21:38:06.6261212Z [2023-01-11 21:35:34,000] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.6261473Z [2023-01-11 21:35:34,083] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 804 2023-01-11T21:38:06.6261482Z 2023-01-11T21:38:06.6261580Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6261654Z import torch 2023-01-11T21:38:06.6261727Z import random 2023-01-11T21:38:06.6261848Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6261971Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6261979Z 2023-01-11T21:38:06.6262055Z aten = torch.ops.aten 2023-01-11T21:38:06.6262195Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6262290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6262295Z 2023-01-11T21:38:06.6262371Z import triton 2023-01-11T21:38:06.6262463Z import triton.language as tl 2023-01-11T21:38:06.6262587Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6262728Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6262734Z 2023-01-11T21:38:06.6262738Z 2023-01-11T21:38:06.6262975Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0 = async_compile.triton(''' 2023-01-11T21:38:06.6263053Z import triton 2023-01-11T21:38:06.6263138Z import triton.language as tl 2023-01-11T21:38:06.6263251Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6263354Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6263515Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6263645Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6263651Z 2023-01-11T21:38:06.6264137Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6264211Z @triton.jit 2023-01-11T21:38:06.6264373Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6264446Z xnumel = 1082016 2023-01-11T21:38:06.6264544Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6264673Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6264754Z xmask = xindex < xnumel 2023-01-11T21:38:06.6264825Z x0 = xindex 2023-01-11T21:38:06.6264899Z x1 = xindex % 26 2023-01-11T21:38:06.6265092Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6265183Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6265280Z tmp9 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6265378Z tmp16 = tl.load(in_ptr2 + (x1), xmask) 2023-01-11T21:38:06.6265502Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.6265583Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6265705Z tmp4 = -1.988366587925593e-08 2023-01-11T21:38:06.6265784Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.6265855Z tmp6 = tmp3 * tmp5 2023-01-11T21:38:06.6265977Z tmp7 = 
-3.087032500374211e-07 2023-01-11T21:38:06.6266098Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6266220Z tmp10 = 1.55093272922008e-10 2023-01-11T21:38:06.6266302Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.6266384Z tmp12 = tmp8 + tmp11 2023-01-11T21:38:06.6266453Z tmp13 = 1 / tmp12 2023-01-11T21:38:06.6266527Z tmp14 = 1.0 2023-01-11T21:38:06.6266611Z tmp15 = tmp13 * tmp14 2023-01-11T21:38:06.6266694Z tmp17 = tmp12 * tmp16 2023-01-11T21:38:06.6266773Z tmp18 = tmp15 + tmp17 2023-01-11T21:38:06.6266914Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6266999Z ''') 2023-01-11T21:38:06.6267005Z 2023-01-11T21:38:06.6267009Z 2023-01-11T21:38:06.6267104Z async_compile.wait(globals()) 2023-01-11T21:38:06.6267174Z del async_compile 2023-01-11T21:38:06.6267179Z 2023-01-11T21:38:06.6267256Z def call(args): 2023-01-11T21:38:06.6267342Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6267417Z args.clear() 2023-01-11T21:38:06.6267513Z with torch.cuda.device(0): 2023-01-11T21:38:06.6267742Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6267832Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.6268037Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6268132Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.6268223Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6268423Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0.run(buf1, buf4, arg1_1, arg0_1, arg2_1, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.6268496Z del arg0_1 2023-01-11T21:38:06.6268568Z del arg1_1 2023-01-11T21:38:06.6268641Z del arg2_1 2023-01-11T21:38:06.6268711Z return (buf4, ) 2023-01-11T21:38:06.6268723Z 2023-01-11T21:38:06.6268728Z 2023-01-11T21:38:06.6268801Z if __name__ == "__main__": 2023-01-11T21:38:06.6268919Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6269052Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6269267Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269485Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269713Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6269841Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6269846Z 2023-01-11T21:38:06.6269851Z 2023-01-11T21:38:06.6269952Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6270020Z import torch 2023-01-11T21:38:06.6270094Z import random 2023-01-11T21:38:06.6270213Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6270337Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6270342Z 2023-01-11T21:38:06.6270425Z aten = torch.ops.aten 2023-01-11T21:38:06.6270565Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6270662Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6270667Z 2023-01-11T21:38:06.6270741Z import triton 2023-01-11T21:38:06.6270827Z import triton.language as tl 2023-01-11T21:38:06.6270954Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6271097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6271103Z 2023-01-11T21:38:06.6271107Z 2023-01-11T21:38:06.6271344Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0 = 
async_compile.triton(''' 2023-01-11T21:38:06.6271418Z import triton 2023-01-11T21:38:06.6271509Z import triton.language as tl 2023-01-11T21:38:06.6271625Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6271719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6271852Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6271977Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6272011Z 2023-01-11T21:38:06.6272502Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6272575Z @triton.jit 2023-01-11T21:38:06.6272741Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, in_ptr1, in_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6272816Z xnumel = 1082016 2023-01-11T21:38:06.6272916Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6273047Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6273123Z xmask = xindex < xnumel 2023-01-11T21:38:06.6273195Z x0 = xindex 2023-01-11T21:38:06.6273270Z x1 = xindex % 26 2023-01-11T21:38:06.6273481Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6273601Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6273715Z tmp9 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6273831Z tmp16 = tl.load(in_ptr2 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.6273948Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.6274031Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6274152Z tmp4 = -1.988366587925593e-08 2023-01-11T21:38:06.6274231Z tmp5 = tmp2 + tmp4 2023-01-11T21:38:06.6274310Z tmp6 = tmp3 * tmp5 2023-01-11T21:38:06.6274432Z tmp7 = -3.087032500374211e-07 2023-01-11T21:38:06.6274511Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6274626Z tmp10 = 1.55093272922008e-10 2023-01-11T21:38:06.6274706Z tmp11 = tmp9 * tmp10 2023-01-11T21:38:06.6274787Z tmp12 = tmp8 + tmp11 2023-01-11T21:38:06.6274861Z tmp13 = 1 / tmp12 2023-01-11T21:38:06.6274930Z tmp14 = 1.0 2023-01-11T21:38:06.6275008Z tmp15 = tmp13 * tmp14 2023-01-11T21:38:06.6275084Z tmp17 = tmp12 * tmp16 2023-01-11T21:38:06.6275161Z tmp18 = tmp15 + tmp17 2023-01-11T21:38:06.6275303Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6275387Z ''') 2023-01-11T21:38:06.6275393Z 2023-01-11T21:38:06.6275397Z 2023-01-11T21:38:06.6275488Z async_compile.wait(globals()) 2023-01-11T21:38:06.6275639Z del async_compile 2023-01-11T21:38:06.6275645Z 2023-01-11T21:38:06.6275721Z def call(args): 2023-01-11T21:38:06.6275807Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6275876Z args.clear() 2023-01-11T21:38:06.6275966Z with torch.cuda.device(0): 2023-01-11T21:38:06.6276181Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6276273Z buf1 = buf0; del buf0 # reuse 2023-01-11T21:38:06.6276484Z buf2 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6276574Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.6276669Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6276859Z triton_fused_add_add_1_add_2_add_3_mul_mul_1_mul_2_mul_3_mul_4_realize_0.run(buf1, buf4, arg1_1, arg0_1, arg2_1, 1082016, 
grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.6276933Z del arg0_1 2023-01-11T21:38:06.6277007Z del arg1_1 2023-01-11T21:38:06.6277081Z del arg2_1 2023-01-11T21:38:06.6277156Z return (buf4, ) 2023-01-11T21:38:06.6277161Z 2023-01-11T21:38:06.6277166Z 2023-01-11T21:38:06.6277245Z if __name__ == "__main__": 2023-01-11T21:38:06.6277366Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6277492Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6277701Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6277916Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6278113Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6278274Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6278279Z 2023-01-11T21:38:06.6278350Z ok (0.373s) 2023-01-11T21:38:06.6278814Z test_select_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6278945Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6279204Z [2023-01-11 21:35:34,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 805 2023-01-11T21:38:06.6279471Z [2023-01-11 21:35:34,218] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 805 2023-01-11T21:38:06.6279887Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
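The point of test_scheduler_vertical_fusion1 is visible in the output above: the whole mul/add chain compiles to a single pointwise kernel over 1082016 elements, and the scheduler drops one intermediate buffer entirely (the remove_buffer('buf3') debug line), so every tmp value stays in registers instead of round-tripping through global memory. Reading the kernel body back into eager form, with the constants copied verbatim and x/y/z standing for the three inputs (the in_ptr-to-arg mapping follows the call() site; the variable names are mine):

import torch

x = torch.randn(204, 204, 26, device="cuda")  # in_ptr0 (arg1_1 at the call site)
y = torch.randn(204, 204, 26, device="cuda")  # in_ptr1 (arg0_1 at the call site)
z = torch.randn(26, device="cuda")            # in_ptr2, broadcast over the last dim

# tmp12 in the kernel:
t = x * (x * -1.061519070296458e-11 + -1.988366587925593e-08) \
    + -3.087032500374211e-07 + y * 1.55093272922008e-10
out = 1.0 / t + t * z                         # tmp18, the single stored value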
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6280017Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6280265Z [2023-01-11 21:35:34,246] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 806 2023-01-11T21:38:06.6280525Z [2023-01-11 21:35:34,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 806 2023-01-11T21:38:06.6280531Z 2023-01-11T21:38:06.6280629Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6280705Z import torch 2023-01-11T21:38:06.6280780Z import random 2023-01-11T21:38:06.6280901Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6281029Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6281034Z 2023-01-11T21:38:06.6281117Z aten = torch.ops.aten 2023-01-11T21:38:06.6281247Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6281343Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6281376Z 2023-01-11T21:38:06.6281455Z import triton 2023-01-11T21:38:06.6281546Z import triton.language as tl 2023-01-11T21:38:06.6281672Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6281813Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6281818Z 2023-01-11T21:38:06.6281822Z 2023-01-11T21:38:06.6282022Z triton_fused_select_scatter_select_scatter_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6282095Z import triton 2023-01-11T21:38:06.6282180Z import triton.language as tl 2023-01-11T21:38:06.6282293Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6282392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6282526Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6282652Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6282657Z 2023-01-11T21:38:06.6283108Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6283182Z @triton.jit 2023-01-11T21:38:06.6283340Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6283409Z xnumel = 59888 2023-01-11T21:38:06.6283505Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6283634Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6283714Z xmask = xindex < xnumel 2023-01-11T21:38:06.6283799Z x1 = (xindex // 38) % 197 2023-01-11T21:38:06.6283901Z x0 = xindex % 38 2023-01-11T21:38:06.6283980Z x2 = (xindex // 7486) 2023-01-11T21:38:06.6284044Z x3 = xindex 2023-01-11T21:38:06.6284121Z x4 = xindex % 7486 2023-01-11T21:38:06.6284230Z tmp3 = tl.load(in_ptr0 + (x0 + (38*x2)), xmask) 2023-01-11T21:38:06.6284423Z tmp4 = tl.load(in_ptr1 + (x3), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6284521Z tmp9 = tl.load(in_ptr2 + (x4), xmask) 2023-01-11T21:38:06.6284621Z tmp10 = tl.load(in_ptr1 + (x3), xmask) 2023-01-11T21:38:06.6284693Z tmp0 = x1 2023-01-11T21:38:06.6284757Z tmp1 = 0 2023-01-11T21:38:06.6284835Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.6284932Z tmp5 = tl.where(tmp2, tmp3, tmp4) 2023-01-11T21:38:06.6285003Z tmp6 = x2 2023-01-11T21:38:06.6285074Z tmp7 = 1 2023-01-11T21:38:06.6285154Z tmp8 = tmp6 == tmp7 
2023-01-11T21:38:06.6285251Z tmp11 = tl.where(tmp8, tmp9, tmp10) 2023-01-11T21:38:06.6285378Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6285515Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.6285599Z ''') 2023-01-11T21:38:06.6285605Z 2023-01-11T21:38:06.6285609Z 2023-01-11T21:38:06.6285701Z async_compile.wait(globals()) 2023-01-11T21:38:06.6285778Z del async_compile 2023-01-11T21:38:06.6285783Z 2023-01-11T21:38:06.6285859Z def call(args): 2023-01-11T21:38:06.6285946Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6286015Z args.clear() 2023-01-11T21:38:06.6286107Z with torch.cuda.device(0): 2023-01-11T21:38:06.6286323Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6286535Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6286628Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6286816Z triton_fused_select_scatter_select_scatter_1_0.run(arg1_1, arg0_1, arg2_1, buf0, buf1, 59888, grid=grid(59888), stream=stream0) 2023-01-11T21:38:06.6286890Z del arg0_1 2023-01-11T21:38:06.6286960Z del arg1_1 2023-01-11T21:38:06.6287025Z del arg2_1 2023-01-11T21:38:06.6287109Z return (buf0, buf1, ) 2023-01-11T21:38:06.6287114Z 2023-01-11T21:38:06.6287118Z 2023-01-11T21:38:06.6287196Z if __name__ == "__main__": 2023-01-11T21:38:06.6287343Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6287472Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6287685Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6287885Z arg1_1 = rand_strided((8, 38), (38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6288091Z arg2_1 = rand_strided((197, 38), (38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6288211Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6288216Z 2023-01-11T21:38:06.6288226Z 2023-01-11T21:38:06.6288320Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6288396Z import torch 2023-01-11T21:38:06.6288469Z import random 2023-01-11T21:38:06.6288589Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6288712Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6288717Z 2023-01-11T21:38:06.6288801Z aten = torch.ops.aten 2023-01-11T21:38:06.6288938Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6289026Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6289031Z 2023-01-11T21:38:06.6289107Z import triton 2023-01-11T21:38:06.6289199Z import triton.language as tl 2023-01-11T21:38:06.6289324Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6289464Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6289469Z 2023-01-11T21:38:06.6289473Z 2023-01-11T21:38:06.6289672Z triton_fused_select_scatter_select_scatter_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6289777Z import triton 2023-01-11T21:38:06.6289872Z import triton.language as tl 2023-01-11T21:38:06.6289978Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6290078Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6290208Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6290335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6290341Z 2023-01-11T21:38:06.6290795Z 
@pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6290868Z @triton.jit 2023-01-11T21:38:06.6291026Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6291101Z xnumel = 59888 2023-01-11T21:38:06.6291190Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6291322Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6291405Z xmask = xindex < xnumel 2023-01-11T21:38:06.6291486Z x1 = (xindex // 38) % 197 2023-01-11T21:38:06.6291562Z x0 = xindex % 38 2023-01-11T21:38:06.6291640Z x2 = (xindex // 7486) 2023-01-11T21:38:06.6291711Z x3 = xindex 2023-01-11T21:38:06.6291785Z x4 = xindex % 7486 2023-01-11T21:38:06.6291909Z tmp3 = tl.load(in_ptr0 + (x0 + (38*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.6292120Z tmp4 = tl.load(in_ptr1 + (x3), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6292237Z tmp9 = tl.load(in_ptr2 + (x4), xmask).to(tl.float32) 2023-01-11T21:38:06.6292352Z tmp10 = tl.load(in_ptr1 + (x3), xmask).to(tl.float32) 2023-01-11T21:38:06.6292424Z tmp0 = x1 2023-01-11T21:38:06.6292495Z tmp1 = 0 2023-01-11T21:38:06.6292568Z tmp2 = tmp0 == tmp1 2023-01-11T21:38:06.6292667Z tmp5 = tl.where(tmp2, tmp3, tmp4) 2023-01-11T21:38:06.6292742Z tmp6 = x2 2023-01-11T21:38:06.6292811Z tmp7 = 1 2023-01-11T21:38:06.6292894Z tmp8 = tmp6 == tmp7 2023-01-11T21:38:06.6292991Z tmp11 = tl.where(tmp8, tmp9, tmp10) 2023-01-11T21:38:06.6293126Z tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6293282Z tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.6293369Z ''') 2023-01-11T21:38:06.6293374Z 2023-01-11T21:38:06.6293379Z 2023-01-11T21:38:06.6293472Z async_compile.wait(globals()) 2023-01-11T21:38:06.6293551Z del async_compile 2023-01-11T21:38:06.6293557Z 2023-01-11T21:38:06.6293631Z def call(args): 2023-01-11T21:38:06.6293718Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.6293793Z args.clear() 2023-01-11T21:38:06.6293879Z with torch.cuda.device(0): 2023-01-11T21:38:06.6294090Z buf0 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6294304Z buf1 = empty_strided((8, 197, 38), (7486, 38, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6294399Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6294700Z triton_fused_select_scatter_select_scatter_1_0.run(arg1_1, arg0_1, arg2_1, buf0, buf1, 59888, grid=grid(59888), stream=stream0) 2023-01-11T21:38:06.6294777Z del arg0_1 2023-01-11T21:38:06.6294851Z del arg1_1 2023-01-11T21:38:06.6294922Z del arg2_1 2023-01-11T21:38:06.6294997Z return (buf0, buf1, ) 2023-01-11T21:38:06.6295002Z 2023-01-11T21:38:06.6295007Z 2023-01-11T21:38:06.6295089Z if __name__ == "__main__": 2023-01-11T21:38:06.6295206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6295332Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6295549Z arg0_1 = rand_strided((8, 197, 38), (7486, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6295777Z arg1_1 = rand_strided((8, 38), (38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6296053Z arg2_1 = rand_strided((197, 38), (38, 1), device='cuda:0', dtype=torch.float16) 
2023-01-11T21:38:06.6296180Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.6296185Z 2023-01-11T21:38:06.6296252Z ok (0.259s) 2023-01-11T21:38:06.6296705Z test_sgn_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6296839Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6297103Z [2023-01-11 21:35:34,363] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 807 2023-01-11T21:38:06.6297437Z [2023-01-11 21:35:34,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 807 2023-01-11T21:38:06.6297859Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6297994Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6298250Z [2023-01-11 21:35:34,483] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 808 2023-01-11T21:38:06.6298516Z [2023-01-11 21:35:34,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 808 2023-01-11T21:38:06.6298522Z 2023-01-11T21:38:06.6298622Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6298698Z import torch 2023-01-11T21:38:06.6298769Z import random 2023-01-11T21:38:06.6298890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6299019Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6299024Z 2023-01-11T21:38:06.6299108Z aten = torch.ops.aten 2023-01-11T21:38:06.6299248Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6299346Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6299390Z 2023-01-11T21:38:06.6299473Z import triton 2023-01-11T21:38:06.6299562Z import triton.language as tl 2023-01-11T21:38:06.6299689Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6299837Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6299843Z 2023-01-11T21:38:06.6299847Z 2023-01-11T21:38:06.6300012Z triton_fused_sign_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6300087Z import triton 2023-01-11T21:38:06.6300182Z import triton.language as tl 2023-01-11T21:38:06.6300299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6300404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6300536Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6300664Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6300670Z 2023-01-11T21:38:06.6301095Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6301173Z @triton.jit 2023-01-11T21:38:06.6301318Z def 
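Both outputs of test_select_scatter come out of one fused kernel: instead of materializing a copy and writing a slice in place, each select_scatter becomes a predicated select, tl.where(coord == index, src, base). From the decomposed coordinates (x1 is the dim-1 position, x2 the dim-0 position), a hedged eager equivalent is:

import torch

a = torch.randn(8, 197, 38, device="cuda")  # arg0_1
b = torch.randn(8, 38, device="cuda")       # arg1_1
c = torch.randn(197, 38, device="cuda")     # arg2_1

out0 = torch.select_scatter(a, b, dim=1, index=0)  # tmp5:  where(x1 == 0, b, a)
out1 = torch.select_scatter(a, c, dim=0, index=1)  # tmp11: where(x2 == 1, c, a)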
triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6301395Z xnumel = 41 2023-01-11T21:38:06.6301495Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6301627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6301706Z xmask = xindex < xnumel 2023-01-11T21:38:06.6301782Z x0 = xindex 2023-01-11T21:38:06.6301975Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6302105Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6302183Z tmp1 = 0 < tmp0 2023-01-11T21:38:06.6302277Z tmp2 = tl.where(tmp1, 1, 0) 2023-01-11T21:38:06.6302356Z tmp3 = tmp0 < 0 2023-01-11T21:38:06.6302440Z tmp4 = tl.where(tmp3, 1, 0) 2023-01-11T21:38:06.6302554Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6302627Z tmp7 = 1 2023-01-11T21:38:06.6302705Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.6302782Z tmp9 = 0 < tmp8 2023-01-11T21:38:06.6302874Z tmp10 = tl.where(tmp9, 1, 0) 2023-01-11T21:38:06.6302945Z tmp11 = tmp8 < 0 2023-01-11T21:38:06.6303039Z tmp12 = tl.where(tmp11, 1, 0) 2023-01-11T21:38:06.6303156Z tmp13 = tmp10 - tmp12 2023-01-11T21:38:06.6303271Z tmp14 = tmp13 - tmp7 2023-01-11T21:38:06.6303409Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6303546Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp14, xmask) 2023-01-11T21:38:06.6303636Z ''') 2023-01-11T21:38:06.6303642Z 2023-01-11T21:38:06.6303646Z 2023-01-11T21:38:06.6303741Z async_compile.wait(globals()) 2023-01-11T21:38:06.6303813Z del async_compile 2023-01-11T21:38:06.6303819Z 2023-01-11T21:38:06.6303896Z def call(args): 2023-01-11T21:38:06.6303972Z arg0_1, = args 2023-01-11T21:38:06.6304053Z args.clear() 2023-01-11T21:38:06.6304151Z with torch.cuda.device(0): 2023-01-11T21:38:06.6304351Z buf0 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6304550Z buf1 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6304638Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6304785Z triton_fused_sign_sub_0.run(arg0_1, buf0, buf1, 41, grid=grid(41), stream=stream0) 2023-01-11T21:38:06.6304859Z del arg0_1 2023-01-11T21:38:06.6304945Z return (buf0, buf1, ) 2023-01-11T21:38:06.6304950Z 2023-01-11T21:38:06.6304954Z 2023-01-11T21:38:06.6305035Z if __name__ == "__main__": 2023-01-11T21:38:06.6305154Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6305285Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6305485Z arg0_1 = rand_strided((41, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6305629Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6305637Z 2023-01-11T21:38:06.6305650Z 2023-01-11T21:38:06.6305758Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6305848Z import torch 2023-01-11T21:38:06.6305936Z import random 2023-01-11T21:38:06.6306058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6306181Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6306186Z 2023-01-11T21:38:06.6306271Z aten = torch.ops.aten 2023-01-11T21:38:06.6306409Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6306500Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6306505Z 2023-01-11T21:38:06.6306583Z import triton 2023-01-11T21:38:06.6306675Z import triton.language as tl 2023-01-11T21:38:06.6306801Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6306943Z from torch._C 
import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6306948Z 2023-01-11T21:38:06.6306953Z 2023-01-11T21:38:06.6307150Z triton_fused_convert_element_type_1_sub_0 = async_compile.triton(''' 2023-01-11T21:38:06.6307227Z import triton 2023-01-11T21:38:06.6307320Z import triton.language as tl 2023-01-11T21:38:06.6307428Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6307532Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6307667Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6307794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6307799Z 2023-01-11T21:38:06.6308212Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6308329Z @triton.jit 2023-01-11T21:38:06.6308473Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6308551Z xnumel = 41 2023-01-11T21:38:06.6308646Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6308778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6308864Z xmask = xindex < xnumel 2023-01-11T21:38:06.6308935Z x0 = xindex 2023-01-11T21:38:06.6309151Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6309272Z tmp8 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6309362Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6309433Z tmp2 = 0 < tmp1 2023-01-11T21:38:06.6309523Z tmp3 = tl.where(tmp2, 1, 0) 2023-01-11T21:38:06.6309599Z tmp4 = tmp1 < 0 2023-01-11T21:38:06.6309689Z tmp5 = tl.where(tmp4, 1, 0) 2023-01-11T21:38:06.6309804Z tmp6 = tmp3 - tmp5 2023-01-11T21:38:06.6309894Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6309967Z tmp9 = 1 2023-01-11T21:38:06.6310043Z tmp10 = tmp8 + tmp9 2023-01-11T21:38:06.6310135Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.6310211Z tmp12 = 0 < tmp11 2023-01-11T21:38:06.6310307Z tmp13 = tl.where(tmp12, 1, 0) 2023-01-11T21:38:06.6310387Z tmp14 = tmp11 < 0 2023-01-11T21:38:06.6310479Z tmp15 = tl.where(tmp14, 1, 0) 2023-01-11T21:38:06.6310589Z tmp16 = tmp13 - tmp15 2023-01-11T21:38:06.6310680Z tmp17 = tmp16.to(tl.float32) 2023-01-11T21:38:06.6310795Z tmp18 = tmp17 - tmp9 2023-01-11T21:38:06.6310933Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6311069Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6311152Z ''') 2023-01-11T21:38:06.6311158Z 2023-01-11T21:38:06.6311162Z 2023-01-11T21:38:06.6311261Z async_compile.wait(globals()) 2023-01-11T21:38:06.6311340Z del async_compile 2023-01-11T21:38:06.6311345Z 2023-01-11T21:38:06.6311416Z def call(args): 2023-01-11T21:38:06.6311493Z arg0_1, = args 2023-01-11T21:38:06.6311569Z args.clear() 2023-01-11T21:38:06.6311663Z with torch.cuda.device(0): 2023-01-11T21:38:06.6311889Z buf0 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6312090Z buf1 = empty_strided((41, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6312184Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6312346Z triton_fused_convert_element_type_1_sub_0.run(arg0_1, buf0, buf1, 41, grid=grid(41), stream=stream0) 2023-01-11T21:38:06.6312419Z del arg0_1 2023-01-11T21:38:06.6312504Z return (buf0, buf1, ) 
2023-01-11T21:38:06.6312509Z 2023-01-11T21:38:06.6312514Z 2023-01-11T21:38:06.6312594Z if __name__ == "__main__": 2023-01-11T21:38:06.6312712Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6312843Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6313040Z arg0_1 = rand_strided((41, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6313156Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6313161Z 2023-01-11T21:38:06.6313227Z ok (0.221s) 2023-01-11T21:38:06.6313691Z test_sgn_extremal_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6313823Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6314082Z [2023-01-11 21:35:34,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 809 2023-01-11T21:38:06.6314384Z [2023-01-11 21:35:34,642] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 809 2023-01-11T21:38:06.6314800Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6314934Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6315188Z [2023-01-11 21:35:34,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 810 2023-01-11T21:38:06.6315452Z [2023-01-11 21:35:34,734] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 810 2023-01-11T21:38:06.6315458Z 2023-01-11T21:38:06.6315558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6315637Z import torch 2023-01-11T21:38:06.6315708Z import random 2023-01-11T21:38:06.6315829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6315953Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6315958Z 2023-01-11T21:38:06.6316039Z aten = torch.ops.aten 2023-01-11T21:38:06.6316185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6316285Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6316290Z 2023-01-11T21:38:06.6316367Z import triton 2023-01-11T21:38:06.6316455Z import triton.language as tl 2023-01-11T21:38:06.6316583Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6316726Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6316732Z 2023-01-11T21:38:06.6316736Z 2023-01-11T21:38:06.6316895Z triton_fused_sign_0 = async_compile.triton(''' 2023-01-11T21:38:06.6316974Z import triton 2023-01-11T21:38:06.6317075Z import triton.language as tl 2023-01-11T21:38:06.6317195Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6317298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6317427Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6317555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6317560Z 
2023-01-11T21:38:06.6317995Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6318074Z @triton.jit 2023-01-11T21:38:06.6318209Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6318284Z xnumel = 4 2023-01-11T21:38:06.6318386Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6318516Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6318595Z xmask = xindex < xnumel 2023-01-11T21:38:06.6318670Z x0 = xindex 2023-01-11T21:38:06.6318772Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6318851Z tmp1 = 0 < tmp0 2023-01-11T21:38:06.6318942Z tmp2 = tl.where(tmp1, 1, 0) 2023-01-11T21:38:06.6319021Z tmp3 = tmp0 < 0 2023-01-11T21:38:06.6319111Z tmp4 = tl.where(tmp3, 1, 0) 2023-01-11T21:38:06.6319215Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6319355Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6319441Z ''') 2023-01-11T21:38:06.6319446Z 2023-01-11T21:38:06.6319451Z 2023-01-11T21:38:06.6319546Z async_compile.wait(globals()) 2023-01-11T21:38:06.6319627Z del async_compile 2023-01-11T21:38:06.6319632Z 2023-01-11T21:38:06.6319710Z def call(args): 2023-01-11T21:38:06.6319788Z arg0_1, = args 2023-01-11T21:38:06.6319859Z args.clear() 2023-01-11T21:38:06.6319953Z with torch.cuda.device(0): 2023-01-11T21:38:06.6320150Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6320246Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6320415Z triton_fused_sign_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6320491Z del arg0_1 2023-01-11T21:38:06.6320571Z return (buf0, ) 2023-01-11T21:38:06.6320576Z 2023-01-11T21:38:06.6320581Z 2023-01-11T21:38:06.6320661Z if __name__ == "__main__": 2023-01-11T21:38:06.6320777Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6320903Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6321102Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6321219Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6321224Z 2023-01-11T21:38:06.6321229Z 2023-01-11T21:38:06.6321329Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6321403Z import torch 2023-01-11T21:38:06.6321477Z import random 2023-01-11T21:38:06.6321598Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6321720Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6321725Z 2023-01-11T21:38:06.6321809Z aten = torch.ops.aten 2023-01-11T21:38:06.6321944Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6322040Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6322046Z 2023-01-11T21:38:06.6322120Z import triton 2023-01-11T21:38:06.6322217Z import triton.language as tl 2023-01-11T21:38:06.6322344Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6322478Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6322491Z 2023-01-11T21:38:06.6322495Z 2023-01-11T21:38:06.6322677Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6322754Z import triton 2023-01-11T21:38:06.6322849Z import triton.language as tl 2023-01-11T21:38:06.6322966Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.6323073Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6323205Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6323335Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6323340Z 2023-01-11T21:38:06.6323772Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6323842Z @triton.jit 2023-01-11T21:38:06.6323977Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6324053Z xnumel = 4 2023-01-11T21:38:06.6324152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6324282Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6324368Z xmask = xindex < xnumel 2023-01-11T21:38:06.6324442Z x0 = xindex 2023-01-11T21:38:06.6324555Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6324645Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6324723Z tmp2 = 0 < tmp1 2023-01-11T21:38:06.6324813Z tmp3 = tl.where(tmp2, 1, 0) 2023-01-11T21:38:06.6324890Z tmp4 = tmp1 < 0 2023-01-11T21:38:06.6324979Z tmp5 = tl.where(tmp4, 1, 0) 2023-01-11T21:38:06.6325091Z tmp6 = tmp3 - tmp5 2023-01-11T21:38:06.6325175Z tmp7 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6325313Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.6325400Z ''') 2023-01-11T21:38:06.6325406Z 2023-01-11T21:38:06.6325410Z 2023-01-11T21:38:06.6325507Z async_compile.wait(globals()) 2023-01-11T21:38:06.6325586Z del async_compile 2023-01-11T21:38:06.6325591Z 2023-01-11T21:38:06.6325666Z def call(args): 2023-01-11T21:38:06.6325741Z arg0_1, = args 2023-01-11T21:38:06.6325811Z args.clear() 2023-01-11T21:38:06.6325905Z with torch.cuda.device(0): 2023-01-11T21:38:06.6326100Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6326195Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6326383Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6326459Z del arg0_1 2023-01-11T21:38:06.6326540Z return (buf0, ) 2023-01-11T21:38:06.6326545Z 2023-01-11T21:38:06.6326549Z 2023-01-11T21:38:06.6326632Z if __name__ == "__main__": 2023-01-11T21:38:06.6326746Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6326872Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6327070Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6327183Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6327188Z 2023-01-11T21:38:06.6327261Z ok (0.169s) 2023-01-11T21:38:06.6327731Z test_shape_prop_torch_ones_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6327869Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6328133Z [2023-01-11 21:35:34,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 811 2023-01-11T21:38:06.6328402Z [2023-01-11 21:35:35,004] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 811 2023-01-11T21:38:06.6328819Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6328947Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6329206Z [2023-01-11 21:35:35,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 812 2023-01-11T21:38:06.6329468Z [2023-01-11 21:35:35,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 812 2023-01-11T21:38:06.6329474Z 2023-01-11T21:38:06.6329602Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6329682Z import torch 2023-01-11T21:38:06.6329758Z import random 2023-01-11T21:38:06.6329879Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6330005Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6330012Z 2023-01-11T21:38:06.6330089Z aten = torch.ops.aten 2023-01-11T21:38:06.6330228Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6330325Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6330330Z 2023-01-11T21:38:06.6330406Z import triton 2023-01-11T21:38:06.6330502Z import triton.language as tl 2023-01-11T21:38:06.6330638Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6330782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6330787Z 2023-01-11T21:38:06.6330792Z 2023-01-11T21:38:06.6330948Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6331019Z import triton 2023-01-11T21:38:06.6331115Z import triton.language as tl 2023-01-11T21:38:06.6331233Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6331337Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6331472Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6331601Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6331607Z 2023-01-11T21:38:06.6332020Z @pointwise(size_hints=[33554432], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6332132Z @triton.jit 2023-01-11T21:38:06.6332261Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6332339Z xnumel = 25165824 2023-01-11T21:38:06.6332439Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6332574Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6332661Z xmask = xindex < xnumel 2023-01-11T21:38:06.6332737Z x0 = xindex 2023-01-11T21:38:06.6332836Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6332903Z tmp1 = 1 2023-01-11T21:38:06.6332984Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.6333122Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6333207Z ''') 2023-01-11T21:38:06.6333214Z 2023-01-11T21:38:06.6333219Z 2023-01-11T21:38:06.6333314Z async_compile.wait(globals()) 2023-01-11T21:38:06.6333393Z del async_compile 2023-01-11T21:38:06.6333398Z 2023-01-11T21:38:06.6333477Z def call(args): 2023-01-11T21:38:06.6333558Z arg0_1, = args 2023-01-11T21:38:06.6333629Z args.clear() 2023-01-11T21:38:06.6333724Z with torch.cuda.device(0): 2023-01-11T21:38:06.6333963Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6334061Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6334211Z triton_fused_add_0.run(arg0_1, buf0, 25165824, grid=grid(25165824), stream=stream0) 2023-01-11T21:38:06.6334287Z del arg0_1 2023-01-11T21:38:06.6334368Z return (buf0, ) 2023-01-11T21:38:06.6334374Z 2023-01-11T21:38:06.6334378Z 2023-01-11T21:38:06.6334454Z if __name__ == "__main__": 2023-01-11T21:38:06.6334719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6334846Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6335084Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6335197Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6335206Z 2023-01-11T21:38:06.6335211Z 2023-01-11T21:38:06.6335308Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6335383Z import torch 2023-01-11T21:38:06.6335459Z import random 2023-01-11T21:38:06.6335576Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6335764Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6335772Z 2023-01-11T21:38:06.6335872Z aten = torch.ops.aten 2023-01-11T21:38:06.6336006Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6336103Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6336108Z 2023-01-11T21:38:06.6336179Z import triton 2023-01-11T21:38:06.6336270Z import triton.language as tl 2023-01-11T21:38:06.6336394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6336525Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6336531Z 2023-01-11T21:38:06.6336535Z 2023-01-11T21:38:06.6336692Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6336766Z import triton 2023-01-11T21:38:06.6336858Z import triton.language as tl 2023-01-11T21:38:06.6336970Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6337072Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6337266Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6337386Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6337398Z 2023-01-11T21:38:06.6337801Z @pointwise(size_hints=[33554432], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6337876Z @triton.jit 2023-01-11T21:38:06.6338007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6338083Z xnumel = 25165824 2023-01-11T21:38:06.6338176Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6338349Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6338437Z xmask = xindex < xnumel 2023-01-11T21:38:06.6338503Z 
x0 = xindex 2023-01-11T21:38:06.6338621Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6338711Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6338785Z tmp2 = 1 2023-01-11T21:38:06.6338865Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.6339000Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6339091Z ''') 2023-01-11T21:38:06.6339096Z 2023-01-11T21:38:06.6339101Z 2023-01-11T21:38:06.6339195Z async_compile.wait(globals()) 2023-01-11T21:38:06.6339267Z del async_compile 2023-01-11T21:38:06.6339272Z 2023-01-11T21:38:06.6339348Z def call(args): 2023-01-11T21:38:06.6339424Z arg0_1, = args 2023-01-11T21:38:06.6339501Z args.clear() 2023-01-11T21:38:06.6339595Z with torch.cuda.device(0): 2023-01-11T21:38:06.6339832Z buf0 = empty_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6339929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6340066Z triton_fused_add_0.run(arg0_1, buf0, 25165824, grid=grid(25165824), stream=stream0) 2023-01-11T21:38:06.6340142Z del arg0_1 2023-01-11T21:38:06.6340222Z return (buf0, ) 2023-01-11T21:38:06.6340230Z 2023-01-11T21:38:06.6340235Z 2023-01-11T21:38:06.6340315Z if __name__ == "__main__": 2023-01-11T21:38:06.6340435Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6340559Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6340794Z arg0_1 = rand_strided((8, 12, 512, 512), (3145728, 262144, 512, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6340909Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6340914Z 2023-01-11T21:38:06.6340980Z ok (0.420s) 2023-01-11T21:38:06.6341437Z test_sigmoid_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6341602Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6341861Z [2023-01-11 21:35:35,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 813 2023-01-11T21:38:06.6342124Z [2023-01-11 21:35:35,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 813 2023-01-11T21:38:06.6342535Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6342668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6342923Z [2023-01-11 21:35:35,271] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 814 2023-01-11T21:38:06.6343187Z [2023-01-11 21:35:35,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 814 2023-01-11T21:38:06.6343193Z 2023-01-11T21:38:06.6343293Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6343373Z import torch 2023-01-11T21:38:06.6343443Z import random 2023-01-11T21:38:06.6343563Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6343688Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6343693Z 2023-01-11T21:38:06.6343777Z aten = torch.ops.aten 2023-01-11T21:38:06.6343913Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6344010Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6344043Z 2023-01-11T21:38:06.6344121Z import triton 2023-01-11T21:38:06.6344209Z import triton.language as tl 2023-01-11T21:38:06.6344339Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6344480Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6344486Z 2023-01-11T21:38:06.6344493Z 2023-01-11T21:38:06.6344673Z triton_fused_sigmoid_sigmoid_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6344750Z import triton 2023-01-11T21:38:06.6344845Z import triton.language as tl 2023-01-11T21:38:06.6344961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6345066Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6345194Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6345322Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6345327Z 2023-01-11T21:38:06.6345790Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6345888Z @triton.jit 2023-01-11T21:38:06.6346046Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6346131Z xnumel = 64 2023-01-11T21:38:06.6346229Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6346360Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6346438Z xmask = xindex < xnumel 2023-01-11T21:38:06.6346511Z x0 = xindex 2023-01-11T21:38:06.6346702Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6346802Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6346904Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.6346990Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6347071Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6347153Z tmp5 = tl.sigmoid(tmp4) 2023-01-11T21:38:06.6347288Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6347424Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6347511Z ''') 2023-01-11T21:38:06.6347517Z 2023-01-11T21:38:06.6347521Z 2023-01-11T21:38:06.6347643Z async_compile.wait(globals()) 2023-01-11T21:38:06.6347725Z del async_compile 2023-01-11T21:38:06.6347730Z 2023-01-11T21:38:06.6347806Z def call(args): 
2023-01-11T21:38:06.6347887Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6347957Z args.clear() 2023-01-11T21:38:06.6348052Z with torch.cuda.device(0): 2023-01-11T21:38:06.6348254Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6348452Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6348546Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6348712Z triton_fused_sigmoid_sigmoid_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6348790Z del arg0_1 2023-01-11T21:38:06.6348858Z del arg1_1 2023-01-11T21:38:06.6348942Z return (buf0, buf1, ) 2023-01-11T21:38:06.6348947Z 2023-01-11T21:38:06.6348951Z 2023-01-11T21:38:06.6349034Z if __name__ == "__main__": 2023-01-11T21:38:06.6349154Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6349284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6349487Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6349685Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6349807Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6349813Z 2023-01-11T21:38:06.6349817Z 2023-01-11T21:38:06.6349917Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6349987Z import torch 2023-01-11T21:38:06.6350063Z import random 2023-01-11T21:38:06.6350224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6350351Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6350356Z 2023-01-11T21:38:06.6350441Z aten = torch.ops.aten 2023-01-11T21:38:06.6350578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6350677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6350682Z 2023-01-11T21:38:06.6350751Z import triton 2023-01-11T21:38:06.6350846Z import triton.language as tl 2023-01-11T21:38:06.6350972Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6351114Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6351120Z 2023-01-11T21:38:06.6351124Z 2023-01-11T21:38:06.6351306Z triton_fused_sigmoid_sigmoid_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6351383Z import triton 2023-01-11T21:38:06.6351477Z import triton.language as tl 2023-01-11T21:38:06.6351597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6351697Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6351829Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6351956Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6351961Z 2023-01-11T21:38:06.6352398Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6352474Z @triton.jit 2023-01-11T21:38:06.6352625Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6352702Z xnumel = 64 2023-01-11T21:38:06.6352803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6352927Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6353012Z xmask = xindex < xnumel 2023-01-11T21:38:06.6353084Z x0 = xindex 2023-01-11T21:38:06.6353302Z tmp0 = tl.load(in_ptr0 + (x0), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6353421Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6353539Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6353625Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6353727Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6353814Z tmp5 = tl.sigmoid(tmp4) 2023-01-11T21:38:06.6353954Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6354089Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6354176Z ''') 2023-01-11T21:38:06.6354182Z 2023-01-11T21:38:06.6354186Z 2023-01-11T21:38:06.6354282Z async_compile.wait(globals()) 2023-01-11T21:38:06.6354360Z del async_compile 2023-01-11T21:38:06.6354365Z 2023-01-11T21:38:06.6354446Z def call(args): 2023-01-11T21:38:06.6354521Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6354599Z args.clear() 2023-01-11T21:38:06.6354693Z with torch.cuda.device(0): 2023-01-11T21:38:06.6354893Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6355090Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6355188Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6355364Z triton_fused_sigmoid_sigmoid_1_0.run(arg0_1, arg1_1, buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6355449Z del arg0_1 2023-01-11T21:38:06.6355525Z del arg1_1 2023-01-11T21:38:06.6355630Z return (buf0, buf1, ) 2023-01-11T21:38:06.6355635Z 2023-01-11T21:38:06.6355639Z 2023-01-11T21:38:06.6355723Z if __name__ == "__main__": 2023-01-11T21:38:06.6355842Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6355971Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6356169Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6356397Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6356512Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6356517Z 2023-01-11T21:38:06.6356590Z ok (0.196s) 2023-01-11T21:38:06.6357051Z test_signbit_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6357185Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6357444Z [2023-01-11 21:35:35,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 815 2023-01-11T21:38:06.6357709Z [2023-01-11 21:35:35,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 815 2023-01-11T21:38:06.6358131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6358270Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6358525Z [2023-01-11 21:35:35,540] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 816 2023-01-11T21:38:06.6358788Z [2023-01-11 21:35:35,687] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 816 2023-01-11T21:38:06.6358794Z 2023-01-11T21:38:06.6358895Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6358964Z import torch 2023-01-11T21:38:06.6359042Z import random 2023-01-11T21:38:06.6359169Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6359295Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6359300Z 2023-01-11T21:38:06.6359385Z aten = torch.ops.aten 2023-01-11T21:38:06.6359525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6359649Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6359655Z 2023-01-11T21:38:06.6359725Z import triton 2023-01-11T21:38:06.6359819Z import triton.language as tl 2023-01-11T21:38:06.6359946Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6360086Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6360092Z 2023-01-11T21:38:06.6360096Z 2023-01-11T21:38:06.6360279Z triton_fused_bitwise_and_signbit_0 = async_compile.triton(''' 2023-01-11T21:38:06.6360358Z import triton 2023-01-11T21:38:06.6360452Z import triton.language as tl 2023-01-11T21:38:06.6360566Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6360666Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6360801Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6360932Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6360937Z 2023-01-11T21:38:06.6361353Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i1', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6361429Z @triton.jit 2023-01-11T21:38:06.6361576Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6361653Z xnumel = 72 2023-01-11T21:38:06.6361754Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6361879Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6361963Z xmask = xindex < xnumel 2023-01-11T21:38:06.6362035Z x0 = xindex 2023-01-11T21:38:06.6362255Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6362356Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6362512Z tmp1 = tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0 2023-01-11T21:38:06.6362617Z tmp3 = -tmp2 2023-01-11T21:38:06.6362765Z tmp4 = tl.libdevice.signbit(tmp3) if (tmp3).dtype is tl.float32 else tmp3 < 0 2023-01-11T21:38:06.6362848Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.6362934Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.6363009Z tmp7 = 1 2023-01-11T21:38:06.6363092Z tmp8 = tmp6 & tmp7 2023-01-11T21:38:06.6363227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6363363Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.6363443Z ''') 2023-01-11T21:38:06.6363450Z 2023-01-11T21:38:06.6363460Z 
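The signbit kernel just emitted is worth a note: the expression `tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0` is an ordinary Python conditional resolved while the Triton function is traced, not on the GPU, so the float32 kernel bakes in the libdevice call; the second output folds signbit(-x) == 0 into an int64 bitwise-and. A rough eager-mode rendering of what the fused kernel computes, for orientation only (our own helper name):

import torch

def signbit_pair(x: torch.Tensor):
    # out_ptr0 <- signbit(x)                         (bool)
    # out_ptr1 <- (signbit(-x) == 0).to(int64) & 1   (int64)
    out0 = torch.signbit(x)
    out1 = (torch.signbit(-x) == 0).to(torch.int64) & 1
    return out0, out1

x = torch.randn(1, 2, 6, 6)
b, i = signbit_pair(x)
assert b.dtype == torch.bool and i.dtype == torch.int64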
2023-01-11T21:38:06.6363549Z async_compile.wait(globals()) 2023-01-11T21:38:06.6363627Z del async_compile 2023-01-11T21:38:06.6363635Z 2023-01-11T21:38:06.6363713Z def call(args): 2023-01-11T21:38:06.6363790Z arg0_1, = args 2023-01-11T21:38:06.6363866Z args.clear() 2023-01-11T21:38:06.6363963Z with torch.cuda.device(0): 2023-01-11T21:38:06.6364174Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.6364382Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.6364475Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6364637Z triton_fused_bitwise_and_signbit_0.run(arg0_1, buf0, buf1, 72, grid=grid(72), stream=stream0) 2023-01-11T21:38:06.6364711Z del arg0_1 2023-01-11T21:38:06.6364796Z return (buf0, buf1, ) 2023-01-11T21:38:06.6364801Z 2023-01-11T21:38:06.6364806Z 2023-01-11T21:38:06.6364886Z if __name__ == "__main__": 2023-01-11T21:38:06.6365009Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6365139Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6365353Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6365467Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6365472Z 2023-01-11T21:38:06.6365476Z 2023-01-11T21:38:06.6365600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6365707Z import torch 2023-01-11T21:38:06.6365801Z import random 2023-01-11T21:38:06.6365922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6366047Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6366052Z 2023-01-11T21:38:06.6366137Z aten = torch.ops.aten 2023-01-11T21:38:06.6366269Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6366368Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6366373Z 2023-01-11T21:38:06.6366447Z import triton 2023-01-11T21:38:06.6366542Z import triton.language as tl 2023-01-11T21:38:06.6366668Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6366812Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6366818Z 2023-01-11T21:38:06.6366822Z 2023-01-11T21:38:06.6367004Z triton_fused_bitwise_and_signbit_0 = async_compile.triton(''' 2023-01-11T21:38:06.6367081Z import triton 2023-01-11T21:38:06.6367171Z import triton.language as tl 2023-01-11T21:38:06.6367289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6367392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6367524Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6367652Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6367657Z 2023-01-11T21:38:06.6368067Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp16', 1: '*i1', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6368224Z @triton.jit 2023-01-11T21:38:06.6368368Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6368437Z xnumel = 72 2023-01-11T21:38:06.6368537Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6368666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6368754Z xmask = xindex < xnumel 2023-01-11T21:38:06.6368828Z x0 = xindex 2023-01-11T21:38:06.6369046Z tmp0 = tl.load(in_ptr0 + (x0), 
xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6369167Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6369314Z tmp1 = tl.libdevice.signbit(tmp0) if (tmp0).dtype is tl.float32 else tmp0 < 0 2023-01-11T21:38:06.6369417Z tmp3 = -tmp2 2023-01-11T21:38:06.6369569Z tmp4 = tl.libdevice.signbit(tmp3) if (tmp3).dtype is tl.float32 else tmp3 < 0 2023-01-11T21:38:06.6369647Z tmp5 = tmp4 == 0 2023-01-11T21:38:06.6369735Z tmp6 = tmp5.to(tl.int64) 2023-01-11T21:38:06.6369811Z tmp7 = 1 2023-01-11T21:38:06.6369891Z tmp8 = tmp6 & tmp7 2023-01-11T21:38:06.6370019Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.6370153Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.6370239Z ''') 2023-01-11T21:38:06.6370244Z 2023-01-11T21:38:06.6370251Z 2023-01-11T21:38:06.6370350Z async_compile.wait(globals()) 2023-01-11T21:38:06.6370427Z del async_compile 2023-01-11T21:38:06.6370432Z 2023-01-11T21:38:06.6370509Z def call(args): 2023-01-11T21:38:06.6370583Z arg0_1, = args 2023-01-11T21:38:06.6370662Z args.clear() 2023-01-11T21:38:06.6370749Z with torch.cuda.device(0): 2023-01-11T21:38:06.6370956Z buf0 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.6371166Z buf1 = empty_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.6371259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6371423Z triton_fused_bitwise_and_signbit_0.run(arg0_1, buf0, buf1, 72, grid=grid(72), stream=stream0) 2023-01-11T21:38:06.6371499Z del arg0_1 2023-01-11T21:38:06.6371585Z return (buf0, buf1, ) 2023-01-11T21:38:06.6371590Z 2023-01-11T21:38:06.6371595Z 2023-01-11T21:38:06.6371670Z if __name__ == "__main__": 2023-01-11T21:38:06.6371815Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6371945Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6372159Z arg0_1 = rand_strided((1, 2, 6, 6), (72, 36, 6, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6372274Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6372280Z 2023-01-11T21:38:06.6372352Z ok (0.337s) 2023-01-11T21:38:06.6372813Z test_silu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6372950Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6373210Z [2023-01-11 21:35:35,703] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 817 2023-01-11T21:38:06.6373478Z [2023-01-11 21:35:35,773] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 817 2023-01-11T21:38:06.6373886Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6374019Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6374300Z [2023-01-11 21:35:35,823] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 818 2023-01-11T21:38:06.6374671Z [2023-01-11 21:35:35,897] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 818 2023-01-11T21:38:06.6374677Z 2023-01-11T21:38:06.6374780Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6374857Z import torch 2023-01-11T21:38:06.6374936Z import random 2023-01-11T21:38:06.6375059Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6375186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6375191Z 2023-01-11T21:38:06.6375268Z aten = torch.ops.aten 2023-01-11T21:38:06.6375408Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6375505Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6375510Z 2023-01-11T21:38:06.6375586Z import triton 2023-01-11T21:38:06.6375686Z import triton.language as tl 2023-01-11T21:38:06.6375818Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6375958Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6375963Z 2023-01-11T21:38:06.6375968Z 2023-01-11T21:38:06.6376124Z triton_fused_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.6376194Z import triton 2023-01-11T21:38:06.6376290Z import triton.language as tl 2023-01-11T21:38:06.6376406Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6376513Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6376647Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6376775Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6376780Z 2023-01-11T21:38:06.6377237Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6377317Z @triton.jit 2023-01-11T21:38:06.6377445Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6377521Z xnumel = 64 2023-01-11T21:38:06.6377621Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6377751Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6377881Z xmask = xindex < xnumel 2023-01-11T21:38:06.6377955Z x0 = xindex 2023-01-11T21:38:06.6378054Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6378135Z tmp1 = tl.sigmoid(tmp0) 2023-01-11T21:38:06.6378216Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6378350Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6378439Z ''') 2023-01-11T21:38:06.6378445Z 2023-01-11T21:38:06.6378449Z 2023-01-11T21:38:06.6378546Z async_compile.wait(globals()) 2023-01-11T21:38:06.6378624Z del async_compile 2023-01-11T21:38:06.6378629Z 2023-01-11T21:38:06.6378707Z def call(args): 2023-01-11T21:38:06.6378779Z arg0_1, = args 2023-01-11T21:38:06.6378857Z args.clear() 2023-01-11T21:38:06.6378951Z with torch.cuda.device(0): 2023-01-11T21:38:06.6379152Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6379248Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6379388Z triton_fused_mul_0.run(arg0_1, buf0, 64, grid=grid(64), 
stream=stream0) 2023-01-11T21:38:06.6379463Z del arg0_1 2023-01-11T21:38:06.6379536Z return (buf0, ) 2023-01-11T21:38:06.6379548Z 2023-01-11T21:38:06.6379553Z 2023-01-11T21:38:06.6379628Z if __name__ == "__main__": 2023-01-11T21:38:06.6379747Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6379878Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6380079Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6380194Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6380199Z 2023-01-11T21:38:06.6380237Z 2023-01-11T21:38:06.6380337Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6380413Z import torch 2023-01-11T21:38:06.6380493Z import random 2023-01-11T21:38:06.6380607Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6380733Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6380738Z 2023-01-11T21:38:06.6380824Z aten = torch.ops.aten 2023-01-11T21:38:06.6380965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6381067Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6381073Z 2023-01-11T21:38:06.6381149Z import triton 2023-01-11T21:38:06.6381245Z import triton.language as tl 2023-01-11T21:38:06.6381367Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6381508Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6381513Z 2023-01-11T21:38:06.6381517Z 2023-01-11T21:38:06.6381706Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6381786Z import triton 2023-01-11T21:38:06.6381881Z import triton.language as tl 2023-01-11T21:38:06.6381999Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6382103Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6382238Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6382361Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6382366Z 2023-01-11T21:38:06.6382773Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6382851Z @triton.jit 2023-01-11T21:38:06.6382985Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6383059Z xnumel = 64 2023-01-11T21:38:06.6383156Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6383287Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6383376Z xmask = xindex < xnumel 2023-01-11T21:38:06.6383443Z x0 = xindex 2023-01-11T21:38:06.6383562Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6383655Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6383742Z tmp2 = tl.sigmoid(tmp1) 2023-01-11T21:38:06.6383850Z tmp3 = tmp1 * tmp2 2023-01-11T21:38:06.6383940Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6384081Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6384163Z ''') 2023-01-11T21:38:06.6384168Z 2023-01-11T21:38:06.6384172Z 2023-01-11T21:38:06.6384266Z async_compile.wait(globals()) 2023-01-11T21:38:06.6384343Z del async_compile 2023-01-11T21:38:06.6384348Z 2023-01-11T21:38:06.6384426Z def call(args): 2023-01-11T21:38:06.6384500Z arg0_1, = args 2023-01-11T21:38:06.6384576Z args.clear() 2023-01-11T21:38:06.6384670Z with torch.cuda.device(0): 
2023-01-11T21:38:06.6384865Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6384962Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6385124Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6385200Z del arg0_1 2023-01-11T21:38:06.6385278Z return (buf0, ) 2023-01-11T21:38:06.6385286Z 2023-01-11T21:38:06.6385290Z 2023-01-11T21:38:06.6385372Z if __name__ == "__main__": 2023-01-11T21:38:06.6385495Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6385630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6385854Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6385977Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6385982Z 2023-01-11T21:38:06.6386054Z ok (0.210s) 2023-01-11T21:38:06.6386504Z test_simplify_dims (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6386665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6386923Z [2023-01-11 21:35:35,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 819 2023-01-11T21:38:06.6387190Z [2023-01-11 21:35:35,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 819 2023-01-11T21:38:06.6387602Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,010] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 820
[2023-01-11 21:35:36,088] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 820

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (60*x1) + (300*x2)), xmask)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (900, 300, 60, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    x3 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (60*x1) + (300*x2)), xmask).to(tl.float32)
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (900, 300, 60, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.191s)
test_simplify_loops_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,101] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 821
[2023-01-11 21:35:36,186] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 821
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,199] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 822
[2023-01-11 21:35:36,278] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 822

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x3 = xindex
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    tmp0 = tl.load(in_ptr0 + (x3), xmask)
    tmp1 = tl.load(in_ptr1 + (x0 + (30*x2) + (180*x1)), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 720
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x3 = xindex
    x0 = xindex % 30
    x1 = (xindex // 30) % 4
    x2 = (xindex // 120)
    tmp0 = tl.load(in_ptr0 + (x3), xmask).to(tl.float32)
    tmp1 = tl.load(in_ptr1 + (x0 + (30*x2) + (180*x1)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 720, grid=grid(720), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 3, 4, 5, 6), (360, 120, 30, 6, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((2, 3, 4, 5, 6), (90, 30, 180, 6, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.190s)
test_sin_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 823
[2023-01-11 21:35:36,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 823
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 824
[2023-01-11 21:35:36,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 824

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sin_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp4 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = tl.sin(tmp0)
    tmp2 = 2
    tmp3 = tmp1 + tmp2
    tmp5 = 1
    tmp6 = tmp4 + tmp5
    tmp7 = tl.sin(tmp6)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sin_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sin_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 256
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = tl.sin(tmp0)
    tmp2 = 2
    tmp3 = tmp1 + tmp2
    tmp5 = 1
    tmp6 = tmp4 + tmp5
    tmp7 = tl.sin(tmp6)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sin_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.221s)
test_sink_cat_after_pointwise (__main__.CudaTests) ... ok (0.004s)
test_sizehint_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 825
[2023-01-11 21:35:36,795] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 825
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:36,980] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 826
[2023-01-11 21:35:37,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 826

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused__unsafe_view_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 150528
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 196
    x1 = (xindex // 196) % 384
    x2 = (xindex // 75264)
    x4 = xindex
    tmp0 = 4*(x0 // 14)
    tmp1 = (x1 // 4) % 4
    tmp2 = tmp0 + tmp1
    tmp3 = 4*(x0 % 14)
    tmp4 = x1 % 4
    tmp5 = tmp3 + tmp4
    tmp6 = tl.load(in_ptr0 + (tmp5 + (56*tmp2) + (3136*(x1 // 16)) + (75264*x2)), xmask)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused__unsafe_view_0.run(arg0_1, buf0, 150528, grid=grid(150528), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_convert_element_type_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 150528
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 196
    x1 = (xindex // 196) % 384
    x2 = (xindex // 75264)
    x4 = xindex
    tmp0 = 4*(x0 // 14)
    tmp1 = (x1 // 4) % 4
    tmp2 = tmp0 + tmp1
    tmp3 = 4*(x0 % 14)
    tmp4 = x1 % 4
    tmp5 = tmp3 + tmp4
    tmp6 = tl.load(in_ptr0 + (tmp5 + (56*tmp2) + (3136*(x1 // 16)) + (75264*x2)), xmask).to(tl.float32)
    tmp7 = tmp6.to(tl.float32)
    tmp8 = tmp7.to(tl.float32)
    tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 384, 196), (75264, 196, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 150528, grid=grid(150528), stream=stream0)
        del arg0_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 24, 56, 56), (75264, 3136, 56, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.596s)
test_slice1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,140] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 827
[2023-01-11 21:35:37,230] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 827
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,272] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 828
[2023-01-11 21:35:37,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 828

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask)
    tmp6 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 + tmp4
    tmp8 = tmp5 + tmp7
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 20, grid=grid(20), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + ((2*x0) + (40*x1)), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (20 + (2*x0) + (40*x1)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = tmp6 + tmp4
    tmp8 = tmp5 + tmp7
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 10), (10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 20, grid=grid(20), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.257s)
test_slice2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,398] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 829
[2023-01-11 21:35:37,482] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 829
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 830
[2023-01-11 21:35:37,605] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 830

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + (4*x0)), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (42 + (4*x0)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (1 + (4*x0)), xmask)
    tmp6 = tl.load(in_ptr0 + (42 + (4*x0)), xmask)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = 2
    tmp8 = tmp6 + tmp7
    tmp9 = tmp5 + tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 10, grid=grid(10), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_3_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + (4*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (42 + (4*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (1 + (4*x0)), xmask).to(tl.float32)
    tmp6 = tl.load(in_ptr0 + (42 + (4*x0)), xmask).to(tl.float32)
    tmp2 = tmp0 + tmp1
    tmp4 = 1
    tmp5 = tmp3 + tmp4
    tmp7 = 2
    tmp8 = tmp6 + tmp7
    tmp9 = tmp5 + tmp8
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 10), (10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_3_0.run(arg0_1, buf0, buf1, 10, grid=grid(10), stream=stream0)
        del arg0_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 20, 2), (40, 2, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.249s)
test_slice_mutation1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,653] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 831
[2023-01-11 21:35:37,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 831
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:37,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 832

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_zeros_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 0
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 3.0
    tl.store(out_ptr0 + (3 + (8*x0) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_clone_2 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 4.0
    tl.store(out_ptr0 + (32 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_add_1_4 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_zeros_like_0.run(buf0, buf1, 64, grid=grid(64), stream=stream0)
        triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1.run(buf0, 8, grid=grid(8), stream=stream0)
        buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        triton_fused_clone_2.run(buf0, buf3, 64, grid=grid(64), stream=stream0)
        triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3.run(buf0, 8, grid=grid(8), stream=stream0)
        buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_4.run(buf0, buf5, 64, grid=grid(64), stream=stream0)
        return (buf0, buf1, buf3, buf5, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:37,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 832

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_zeros_like_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = 0
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.6500491Z @triton.jit 2023-01-11T21:38:06.6500614Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6500687Z xnumel = 8 2023-01-11T21:38:06.6500785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6500918Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6501003Z xmask = xindex < xnumel 2023-01-11T21:38:06.6501079Z x0 = xindex 2023-01-11T21:38:06.6501147Z tmp0 = 3.0 2023-01-11T21:38:06.6501287Z tl.store(out_ptr0 + (3 + (8*x0) + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6501374Z ''') 2023-01-11T21:38:06.6501379Z 2023-01-11T21:38:06.6501384Z 2023-01-11T21:38:06.6501542Z triton_fused_clone_2 = async_compile.triton(''' 2023-01-11T21:38:06.6501617Z import triton 2023-01-11T21:38:06.6501712Z import triton.language as tl 2023-01-11T21:38:06.6501827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6501923Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6502057Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6502187Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6502192Z 2023-01-11T21:38:06.6502597Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6502673Z @triton.jit 2023-01-11T21:38:06.6502808Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6502881Z xnumel = 64 2023-01-11T21:38:06.6502980Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6503104Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6503189Z xmask = xindex < xnumel 2023-01-11T21:38:06.6503260Z x0 = xindex 2023-01-11T21:38:06.6503476Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6503610Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6503697Z ''') 2023-01-11T21:38:06.6503703Z 2023-01-11T21:38:06.6503707Z 2023-01-11T21:38:06.6503938Z triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3 = async_compile.triton(''' 2023-01-11T21:38:06.6504014Z import triton 2023-01-11T21:38:06.6504130Z import triton.language as tl 2023-01-11T21:38:06.6504249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6504352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6504483Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6504608Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6504614Z 2023-01-11T21:38:06.6505009Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.6505086Z @triton.jit 2023-01-11T21:38:06.6505213Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6505280Z xnumel = 8 2023-01-11T21:38:06.6505381Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6505510Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
2023-01-11T21:38:06.6505607Z xmask = xindex < xnumel 2023-01-11T21:38:06.6505694Z x0 = xindex 2023-01-11T21:38:06.6505777Z tmp0 = 4.0 2023-01-11T21:38:06.6505934Z tl.store(out_ptr0 + (32 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6506014Z ''') 2023-01-11T21:38:06.6506019Z 2023-01-11T21:38:06.6506024Z 2023-01-11T21:38:06.6506183Z triton_fused_add_1_4 = async_compile.triton(''' 2023-01-11T21:38:06.6506259Z import triton 2023-01-11T21:38:06.6506353Z import triton.language as tl 2023-01-11T21:38:06.6506469Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6506572Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6506705Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6506853Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6506865Z 2023-01-11T21:38:06.6507262Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6507336Z @triton.jit 2023-01-11T21:38:06.6507468Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6507541Z xnumel = 64 2023-01-11T21:38:06.6507638Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6507768Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6507852Z xmask = xindex < xnumel 2023-01-11T21:38:06.6507918Z x0 = xindex 2023-01-11T21:38:06.6508132Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6508204Z tmp1 = 1 2023-01-11T21:38:06.6508289Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6508424Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6508511Z ''') 2023-01-11T21:38:06.6508517Z 2023-01-11T21:38:06.6508521Z 2023-01-11T21:38:06.6508614Z async_compile.wait(globals()) 2023-01-11T21:38:06.6508694Z del async_compile 2023-01-11T21:38:06.6508699Z 2023-01-11T21:38:06.6508772Z def call(args): 2023-01-11T21:38:06.6508845Z arg0_1, = args 2023-01-11T21:38:06.6508920Z args.clear() 2023-01-11T21:38:06.6509015Z with torch.cuda.device(0): 2023-01-11T21:38:06.6509214Z buf0 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6509415Z buf1 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6509509Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6509650Z triton_fused_add_zeros_like_0.run(buf0, buf1, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6509831Z triton_fused_fill__lift_fresh_copy_select_slice_1_zeros_like_1.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6510033Z buf3 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6510167Z triton_fused_clone_2.run(buf0, buf3, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6510372Z triton_fused_fill__1_lift_fresh_copy_1_select_1_slice_2_zeros_like_3.run(buf0, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6510572Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6510706Z triton_fused_add_1_4.run(buf0, buf5, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6510805Z return (buf0, buf1, buf3, buf5, ) 2023-01-11T21:38:06.6510811Z 2023-01-11T21:38:06.6510815Z 2023-01-11T21:38:06.6510896Z if __name__ == "__main__": 2023-01-11T21:38:06.6511010Z from torch._dynamo.testing import 
rand_strided 2023-01-11T21:38:06.6511137Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6511339Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6511453Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6511458Z 2023-01-11T21:38:06.6511533Z ok (0.294s) 2023-01-11T21:38:06.6511870Z test_slice_mutation2_cuda (__main__.CudaTests) ... [2023-01-11 21:35:37,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 833 2023-01-11T21:38:06.6512117Z [2023-01-11 21:35:38,027] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation 2023-01-11T21:38:06.6512384Z [2023-01-11 21:35:38,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 833 2023-01-11T21:38:06.6512391Z 2023-01-11T21:38:06.6512484Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6512561Z import torch 2023-01-11T21:38:06.6512637Z import random 2023-01-11T21:38:06.6512757Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6512882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6512919Z 2023-01-11T21:38:06.6513004Z aten = torch.ops.aten 2023-01-11T21:38:06.6513143Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6513241Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6513246Z 2023-01-11T21:38:06.6513316Z import triton 2023-01-11T21:38:06.6513418Z import triton.language as tl 2023-01-11T21:38:06.6513547Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6513687Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6513692Z 2023-01-11T21:38:06.6513697Z 2023-01-11T21:38:06.6513879Z triton_fused_add_slice_1_slice_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6513954Z import triton 2023-01-11T21:38:06.6514048Z import triton.language as tl 2023-01-11T21:38:06.6514157Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6514259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6514392Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6514521Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6514526Z 2023-01-11T21:38:06.6514929Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6515004Z @triton.jit 2023-01-11T21:38:06.6515137Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6515212Z xnumel = 20 2023-01-11T21:38:06.6515304Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6515435Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6515519Z xmask = xindex < xnumel 2023-01-11T21:38:06.6515592Z x0 = xindex 2023-01-11T21:38:06.6515794Z tmp0 = tl.load(in_ptr0 + (20 + x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6515868Z tmp1 = 1 2023-01-11T21:38:06.6515954Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6516083Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6516168Z ''') 2023-01-11T21:38:06.6516174Z 2023-01-11T21:38:06.6516178Z 2023-01-11T21:38:06.6516394Z triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.6516496Z import triton 2023-01-11T21:38:06.6516593Z import 
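A note on the recurring UserWarning above: tensor.storage() now returns the deprecated TypedStorage wrapper, and the warning points at the storage-cloning helper on line 246 of test_torchinductor.py. A minimal sketch of the replacement the warning suggests, assuming x owns its storage; the helper name is hypothetical:

import torch

def clone_storage(x: torch.Tensor) -> torch.Tensor:
    # untyped_storage().size() counts bytes, so convert to an element count
    # before viewing the whole storage at x's dtype and cloning it.
    numel = x.untyped_storage().size() // x.element_size()
    return torch.as_strided(x, (numel,), (1,), 0).clone()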
test_slice_mutation2_cuda (__main__.CudaTests) ... [2023-01-11 21:35:37,935] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 833
[2023-01-11 21:35:38,027] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to input mutation
[2023-01-11 21:35:38,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 833

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_slice_1_slice_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (20 + x0), xmask, eviction_policy='evict_last')
    tmp1 = 1
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 20
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (20 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


triton_fused_add_1_slice_5_slice_6_2 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 9
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (1 + x0), xmask, eviction_policy='evict_last')
    tmp1 = 2
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


triton_fused_add_1_copy__1_slice_5_slice_6_slice_7_slice_8_3 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 9
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (2 + x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 20), (20, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_slice_1_slice_2_0.run(arg0_1, buf0, 20, grid=grid(20), stream=stream0)
        triton_fused_add_copy__slice_1_slice_2_slice_3_slice_4_1.run(buf0, arg0_1, 20, grid=grid(20), stream=stream0)
        del buf0
        buf2 = empty_strided((1, 9), (9, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_slice_5_slice_6_2.run(arg0_1, buf2, 9, grid=grid(9), stream=stream0)
        triton_fused_add_1_copy__1_slice_5_slice_6_slice_7_slice_8_3.run(buf2, arg0_1, 9, grid=grid(9), stream=stream0)
        del arg0_1
        return ()


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 64), (64, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

ok (0.126s)
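The "[WARNING] skipping cudagraphs due to input mutation" line above is about graph 833 writing back into its own input: the two copy_ kernels store into arg0_1 and call() returns (). A rough reconstruction of the kind of compiled function involved, inferred from the kernel offsets (the actual test body lives in test_torchinductor.py and is an assumption here):

import torch

def fn(x):  # x: (1, 64); both statements mutate the graph input in place
    x[0, 20:40] = x[0, 20:40] + 1   # kernels ..._0 and ..._1
    x[0, 2:11] = x[0, 1:10] + 2     # kernels ..._2 and ..._3

CUDA graph capture needs stable input buffers, so inductor falls back to a plain stream launch here.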
test_slice_scatter2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,052] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 834
[2023-01-11 21:35:38,120] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 834
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,139] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 835
[2023-01-11 21:35:38,209] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 835

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 605184
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_0.run(arg1_1, buf0, 605184, grid=grid(605184), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[1048576], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 605184
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 197, 384), (75648, 384, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_0.run(arg1_1, buf0, 605184, grid=grid(605184), stream=stream0)
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 197, 384), (75648, 384, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.183s)
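In both dumps above the slice_scatter collapses into one copy kernel: xnumel = 605184 = 8 * 197 * 384, so the scattered slice covers the whole destination and only the source tensor is ever read. A small sketch of that degenerate case using the public torch.slice_scatter (the test's exact call is an assumption):

import torch

a = torch.randn(8, 197, 384, device="cuda")
b = torch.randn(8, 197, 384, device="cuda")
# a slice spanning the full dim makes slice_scatter an outright copy of b
out = torch.slice_scatter(a, b, dim=0, start=0, end=8)
assert torch.equal(out, b)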
test_slice_scatter_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,243] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 836
[2023-01-11 21:35:38,358] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 836
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:38,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 837
[2023-01-11 21:35:38,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 837

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_slice_scatter_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 3200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x1 = (xindex // 100)
    x2 = xindex
    tmp8 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last')
    tmp16 = tl.load(in_ptr1 + (x2), xmask)
    tmp0 = x0
    tmp1 = 10
    tmp2 = tmp0 >= tmp1
    tmp3 = 90
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-10) + x0 + (80*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, eviction_policy='evict_last', other=0)
    tmp7 = tl.where(tmp5, tmp6, 0.0)
    tmp9 = tl.where(tmp5, tmp7, tmp8)
    tmp10 = x0 % 2
    tmp11 = 0
    tmp12 = tmp10 == tmp11
    tmp13 = tmp5 & tmp12
    tmp14 = tl.load(in_ptr0 + ((-5) + (80*x1) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0)
    tmp15 = tl.where(tmp13, tmp14, 0.0)
    tmp17 = tl.where(tmp13, tmp15, tmp16)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_slice_scatter_1_0.run(arg1_1, arg0_1, buf0, buf1, 3200, grid=grid(3200), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_slice_scatter_slice_scatter_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr):
    xnumel = 3200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 100
    x1 = (xindex // 100)
    x2 = xindex
    tmp8 = tl.load(in_ptr1 + (x2), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp16 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32)
    tmp0 = x0
    tmp1 = 10
    tmp2 = tmp0 >= tmp1
    tmp3 = 90
    tmp4 = tmp0 < tmp3
    tmp5 = tmp2 & tmp4
    tmp6 = tl.load(in_ptr0 + ((-10) + x0 + (80*x1) + tl.zeros([XBLOCK], tl.int32)), tmp5 & xmask, eviction_policy='evict_last', other=0).to(tl.float32)
    tmp7 = tl.where(tmp5, tmp6, 0.0)
    tmp9 = tl.where(tmp5, tmp7, tmp8)
    tmp10 = x0 % 2
    tmp11 = 0
    tmp12 = tmp10 == tmp11
    tmp13 = tmp5 & tmp12
    tmp14 = tl.load(in_ptr0 + ((-5) + (80*x1) + (x0 // 2) + tl.zeros([XBLOCK], tl.int32)), tmp13 & xmask, other=0).to(tl.float32)
    tmp15 = tl.where(tmp13, tmp14, 0.0)
    tmp17 = tl.where(tmp13, tmp15, tmp16)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp9, xmask)
    tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((4, 8, 100), (800, 100, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_slice_scatter_slice_scatter_1_0.run(arg1_1, arg0_1, buf0, buf1, 3200, grid=grid(3200), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 8, 100), (800, 100, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((4, 8, 80), (640, 80, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.291s)
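The fused kernel above shows the predicated lowering of an interior slice_scatter: rather than copying dst and overwriting a slice, each output element is computed as where(index in slice, src, dst), with tmp5 encoding 10 <= x0 < 90 on the last dimension for the first output. A sketch of that equivalence in eager terms (shapes taken from the harness above; the where/pad formulation is illustrative, not inductor's code):

import torch
import torch.nn.functional as F

dst = torch.randn(4, 8, 100)
src = torch.randn(4, 8, 80)
ref = torch.slice_scatter(dst, src, dim=2, start=10, end=90)
# predicated form: select src where the last-dim index falls in [10, 90)
idx = torch.arange(100)
alt = torch.where((idx >= 10) & (idx < 90), F.pad(src, (10, 10)), dst)
assert torch.equal(ref, alt)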
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6566755Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6567013Z [2023-01-11 21:35:38,851] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 839 2023-01-11T21:38:06.6567257Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6567458Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6567659Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf6') 2023-01-11T21:38:06.6567857Z [2023-01-11 21:35:38,899] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6567862Z 2023-01-11T21:38:06.6567959Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6568027Z import torch 2023-01-11T21:38:06.6568100Z import random 2023-01-11T21:38:06.6568219Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6568345Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6568350Z 2023-01-11T21:38:06.6568433Z aten = torch.ops.aten 2023-01-11T21:38:06.6568570Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6568669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6568675Z 2023-01-11T21:38:06.6568748Z import triton 2023-01-11T21:38:06.6568834Z import triton.language as tl 2023-01-11T21:38:06.6568958Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6569097Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6569105Z 2023-01-11T21:38:06.6569110Z 2023-01-11T21:38:06.6569296Z triton_fused_amax_1_exp_1_sub_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6569369Z import triton 2023-01-11T21:38:06.6569461Z import triton.language as tl 2023-01-11T21:38:06.6569575Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6569678Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6569803Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6569929Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6569934Z 2023-01-11T21:38:06.6570022Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6570144Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6570228Z filename=__file__, 2023-01-11T21:38:06.6570633Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6570709Z @triton.jit 2023-01-11T21:38:06.6570885Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6570951Z xnumel = 8 2023-01-11T21:38:06.6571022Z rnumel = 8 2023-01-11T21:38:06.6571118Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6571254Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6571338Z xmask = xindex < xnumel 2023-01-11T21:38:06.6571457Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6571528Z x0 = xindex 2023-01-11T21:38:06.6571709Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6571813Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6571902Z rindex = roffset + rbase 2023-01-11T21:38:06.6571986Z rmask = rindex < rnumel 
2023-01-11T21:38:06.6572057Z r1 = rindex 2023-01-11T21:38:06.6572279Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6572408Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.6572516Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6572612Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.6572731Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6572837Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6572925Z rindex = roffset + rbase 2023-01-11T21:38:06.6573011Z rmask = rindex < rnumel 2023-01-11T21:38:06.6573084Z r1 = rindex 2023-01-11T21:38:06.6573233Z tmp2 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask) 2023-01-11T21:38:06.6573348Z tmp3 = tmp2 - tmp1 2023-01-11T21:38:06.6573432Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.6573556Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.6573675Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6573772Z tl.store(out_ptr1 + x0, tmp5, xmask) 2023-01-11T21:38:06.6573856Z ''') 2023-01-11T21:38:06.6573862Z 2023-01-11T21:38:06.6573867Z 2023-01-11T21:38:06.6574089Z triton_fused_add_amax_amax_2_div_div_1_div_2_exp_exp_2_sub_sub_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6574163Z import triton 2023-01-11T21:38:06.6574258Z import triton.language as tl 2023-01-11T21:38:06.6574371Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6574648Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6574784Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6574913Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6574922Z 2023-01-11T21:38:06.6575010Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6575119Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6575202Z filename=__file__, 2023-01-11T21:38:06.6575647Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 2023-01-11T21:38:06.6575723Z @triton.jit 2023-01-11T21:38:06.6575933Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6576006Z xnumel = 8 2023-01-11T21:38:06.6576077Z rnumel = 8 2023-01-11T21:38:06.6576173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6576301Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6576385Z xmask = xindex < xnumel 2023-01-11T21:38:06.6576501Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6576571Z x0 = xindex 2023-01-11T21:38:06.6576756Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6576906Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6576996Z rindex = roffset + rbase 2023-01-11T21:38:06.6577074Z rmask = rindex < rnumel 2023-01-11T21:38:06.6577201Z r1 = rindex 2023-01-11T21:38:06.6577424Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6577639Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6577720Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6577851Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp2), tmp2, _tmp3) 
2023-01-11T21:38:06.6577965Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6578078Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6578265Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6578369Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6578457Z rindex = roffset + rbase 2023-01-11T21:38:06.6578547Z rmask = rindex < rnumel 2023-01-11T21:38:06.6578619Z r1 = rindex 2023-01-11T21:38:06.6578834Z tmp4 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6579050Z tmp5 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6579126Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.6579241Z tmp7 = tmp6 - tmp3 2023-01-11T21:38:06.6579323Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.6579441Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6579574Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp5), tmp5, _tmp10) 2023-01-11T21:38:06.6579731Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6579844Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6579955Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6580062Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6580151Z rindex = roffset + rbase 2023-01-11T21:38:06.6580237Z rmask = rindex < rnumel 2023-01-11T21:38:06.6580309Z r1 = rindex 2023-01-11T21:38:06.6580526Z tmp11 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6580742Z tmp15 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6580852Z tmp20 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6580954Z tmp21 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.6581058Z tmp24 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.6581182Z tmp12 = tmp11 - tmp10 2023-01-11T21:38:06.6581268Z tmp13 = tl.exp(tmp12) 2023-01-11T21:38:06.6581390Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6581473Z tmp16 = tmp15 + tmp11 2023-01-11T21:38:06.6581581Z tmp17 = tmp16 - tmp3 2023-01-11T21:38:06.6581668Z tmp18 = tl.exp(tmp17) 2023-01-11T21:38:06.6581747Z tmp19 = tmp18 / tmp9 2023-01-11T21:38:06.6581859Z tmp22 = tmp20 - tmp21 2023-01-11T21:38:06.6581939Z tmp23 = tl.exp(tmp22) 2023-01-11T21:38:06.6582017Z tmp25 = tmp23 / tmp24 2023-01-11T21:38:06.6582178Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp19, rmask & xmask) 2023-01-11T21:38:06.6582330Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.6582447Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6582552Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6582642Z rindex = roffset + rbase 2023-01-11T21:38:06.6582728Z rmask = rindex < rnumel 2023-01-11T21:38:06.6582801Z r1 = rindex 2023-01-11T21:38:06.6582919Z tmp26 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6583029Z tmp27 = tmp26 - tmp10 2023-01-11T21:38:06.6583141Z tmp28 = tl.exp(tmp27) 2023-01-11T21:38:06.6583225Z tmp29 = tmp28 / tmp14 2023-01-11T21:38:06.6583378Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp29, rmask & xmask) 2023-01-11T21:38:06.6583463Z ''') 2023-01-11T21:38:06.6583469Z 2023-01-11T21:38:06.6583473Z 2023-01-11T21:38:06.6583569Z async_compile.wait(globals()) 
2023-01-11T21:38:06.6583647Z del async_compile 2023-01-11T21:38:06.6583652Z 2023-01-11T21:38:06.6583728Z def call(args): 2023-01-11T21:38:06.6583800Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6583876Z args.clear() 2023-01-11T21:38:06.6583967Z with torch.cuda.device(0): 2023-01-11T21:38:06.6584174Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6584371Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6584464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6584624Z triton_fused_amax_1_exp_1_sub_1_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6584820Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585006Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585204Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6585411Z triton_fused_add_amax_amax_2_div_div_1_div_2_exp_exp_2_sub_sub_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6585486Z del arg0_1 2023-01-11T21:38:06.6585587Z del arg1_1 2023-01-11T21:38:06.6585675Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.6585681Z 2023-01-11T21:38:06.6585685Z 2023-01-11T21:38:06.6585768Z if __name__ == "__main__": 2023-01-11T21:38:06.6585886Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6586006Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6586210Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6586408Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6586527Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6586533Z 2023-01-11T21:38:06.6586537Z 2023-01-11T21:38:06.6586636Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6586709Z import torch 2023-01-11T21:38:06.6586785Z import random 2023-01-11T21:38:06.6586905Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6587023Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6587031Z 2023-01-11T21:38:06.6587113Z aten = torch.ops.aten 2023-01-11T21:38:06.6587252Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6587348Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6587353Z 2023-01-11T21:38:06.6587427Z import triton 2023-01-11T21:38:06.6587522Z import triton.language as tl 2023-01-11T21:38:06.6587648Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6587782Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6587795Z 2023-01-11T21:38:06.6587800Z 2023-01-11T21:38:06.6588014Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6588090Z import triton 2023-01-11T21:38:06.6588182Z import triton.language as tl 2023-01-11T21:38:06.6588294Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6588396Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6588528Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6588654Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6588659Z 2023-01-11T21:38:06.6588741Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6588857Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6588943Z filename=__file__, 
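# (still inside the @reduction(...) call) ReductionHint.DEFAULT indicates Inductor found no
# strong layout signal for this reduction, in contrast to the INNER hints attached to the
# row-contiguous softmax reductions elsewhere in this file.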
2023-01-11T21:38:06.6589344Z meta={'signature': {0: '*fp16', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6589417Z @triton.jit 2023-01-11T21:38:06.6589592Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6589672Z xnumel = 8 2023-01-11T21:38:06.6589741Z rnumel = 8 2023-01-11T21:38:06.6589832Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6589969Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6590055Z xmask = xindex < xnumel 2023-01-11T21:38:06.6590174Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6590244Z x0 = xindex 2023-01-11T21:38:06.6590428Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6590535Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6590617Z rindex = roffset + rbase 2023-01-11T21:38:06.6590699Z rmask = rindex < rnumel 2023-01-11T21:38:06.6590770Z r1 = rindex 2023-01-11T21:38:06.6591010Z tmp0 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6591103Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6591231Z _tmp2 = tl.where(xmask & rmask & (_tmp2 < tmp1), tmp1, _tmp2) 2023-01-11T21:38:06.6591347Z tmp2 = tl.reshape(tl.max(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6591438Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.6591583Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6591688Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6591776Z rindex = roffset + rbase 2023-01-11T21:38:06.6591859Z rmask = rindex < rnumel 2023-01-11T21:38:06.6591932Z r1 = rindex 2023-01-11T21:38:06.6592066Z tmp3 = tl.load(in_ptr0 + (x0 + (8*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6592149Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6592264Z tmp5 = tmp4 - tmp2 2023-01-11T21:38:06.6592349Z tmp6 = tl.exp(tmp5) 2023-01-11T21:38:06.6592469Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6592583Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6592682Z tl.store(out_ptr1 + x0, tmp7, xmask) 2023-01-11T21:38:06.6592764Z ''') 2023-01-11T21:38:06.6592769Z 2023-01-11T21:38:06.6592773Z 2023-01-11T21:38:06.6593131Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6593204Z import triton 2023-01-11T21:38:06.6593296Z import triton.language as tl 2023-01-11T21:38:06.6593412Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6593515Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6593650Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6593775Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6593781Z 2023-01-11T21:38:06.6593868Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.6593982Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6594060Z filename=__file__, 2023-01-11T21:38:06.6594494Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp32', 3: '*fp32', 4: '*fp16', 5: '*fp16', 6: '*fp16', 7: 'i32', 8: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6), equal_to_1=())]}) 
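# In the signature above the two inputs and the three outputs are *fp16 while the row
# statistics passed in as in_ptr2/in_ptr3 stay *fp32: every half-precision load below is
# widened with .to(tl.float32) so the max/sum accumulators run in fp32. divisible_by_16
# records which pointer and size arguments Triton may assume are 16-byte aligned when it
# specializes the kernel.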
2023-01-11T21:38:06.6594572Z @triton.jit 2023-01-11T21:38:06.6594784Z def triton_(in_ptr0, in_ptr1, in_ptr2, in_ptr3, out_ptr4, out_ptr5, out_ptr6, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6594856Z xnumel = 8 2023-01-11T21:38:06.6594927Z rnumel = 8 2023-01-11T21:38:06.6595051Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6595188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6595264Z xmask = xindex < xnumel 2023-01-11T21:38:06.6595384Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6595455Z x0 = xindex 2023-01-11T21:38:06.6595638Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6595743Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6595831Z rindex = roffset + rbase 2023-01-11T21:38:06.6595916Z rmask = rindex < rnumel 2023-01-11T21:38:06.6595980Z r1 = rindex 2023-01-11T21:38:06.6596220Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6596458Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6596539Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6596630Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.6596758Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp3), tmp3, _tmp4) 2023-01-11T21:38:06.6596873Z tmp4 = tl.reshape(tl.max(_tmp4, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6596984Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6597167Z _tmp13 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6597273Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6597361Z rindex = roffset + rbase 2023-01-11T21:38:06.6597446Z rmask = rindex < rnumel 2023-01-11T21:38:06.6597516Z r1 = rindex 2023-01-11T21:38:06.6597777Z tmp5 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6598003Z tmp6 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6598085Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.6598178Z tmp8 = tmp7.to(tl.float32) 2023-01-11T21:38:06.6598290Z tmp9 = tmp8 - tmp4 2023-01-11T21:38:06.6598374Z tmp10 = tl.exp(tmp9) 2023-01-11T21:38:06.6598496Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6598586Z tmp12 = tmp6.to(tl.float32) 2023-01-11T21:38:06.6598714Z _tmp13 = tl.where(xmask & rmask & (_tmp13 < tmp12), tmp12, _tmp13) 2023-01-11T21:38:06.6598823Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6598938Z tmp13 = tl.reshape(tl.max(_tmp13, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6599053Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6599163Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6599252Z rindex = roffset + rbase 2023-01-11T21:38:06.6599337Z rmask = rindex < rnumel 2023-01-11T21:38:06.6599408Z r1 = rindex 2023-01-11T21:38:06.6599641Z tmp14 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6599881Z tmp19 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6600012Z tmp26 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6600114Z tmp28 = tl.load(in_ptr2 + (r1), rmask) 2023-01-11T21:38:06.6600216Z tmp31 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.6600309Z tmp15 = 
tmp14.to(tl.float32) 2023-01-11T21:38:06.6600428Z tmp16 = tmp15 - tmp13 2023-01-11T21:38:06.6600506Z tmp17 = tl.exp(tmp16) 2023-01-11T21:38:06.6600630Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6600715Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.6600804Z tmp21 = tmp20.to(tl.float32) 2023-01-11T21:38:06.6600919Z tmp22 = tmp21 - tmp4 2023-01-11T21:38:06.6601002Z tmp23 = tl.exp(tmp22) 2023-01-11T21:38:06.6601088Z tmp24 = tmp23 / tmp11 2023-01-11T21:38:06.6601199Z tmp25 = tmp24.to(tl.float32) 2023-01-11T21:38:06.6601290Z tmp27 = tmp26.to(tl.float32) 2023-01-11T21:38:06.6601405Z tmp29 = tmp27 - tmp28 2023-01-11T21:38:06.6601485Z tmp30 = tl.exp(tmp29) 2023-01-11T21:38:06.6601569Z tmp32 = tmp30 / tmp31 2023-01-11T21:38:06.6601656Z tmp33 = tmp32.to(tl.float32) 2023-01-11T21:38:06.6601816Z tl.store(out_ptr4 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp25, rmask & xmask) 2023-01-11T21:38:06.6601967Z tl.store(out_ptr5 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp33, rmask & xmask) 2023-01-11T21:38:06.6602085Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6602194Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6602286Z rindex = roffset + rbase 2023-01-11T21:38:06.6602370Z rmask = rindex < rnumel 2023-01-11T21:38:06.6602441Z r1 = rindex 2023-01-11T21:38:06.6602575Z tmp34 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6602667Z tmp35 = tmp34.to(tl.float32) 2023-01-11T21:38:06.6602777Z tmp36 = tmp35 - tmp13 2023-01-11T21:38:06.6602860Z tmp37 = tl.exp(tmp36) 2023-01-11T21:38:06.6602940Z tmp38 = tmp37 / tmp18 2023-01-11T21:38:06.6603029Z tmp39 = tmp38.to(tl.float32) 2023-01-11T21:38:06.6603184Z tl.store(out_ptr6 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp39, rmask & xmask) 2023-01-11T21:38:06.6603268Z ''') 2023-01-11T21:38:06.6603273Z 2023-01-11T21:38:06.6603278Z 2023-01-11T21:38:06.6603369Z async_compile.wait(globals()) 2023-01-11T21:38:06.6603440Z del async_compile 2023-01-11T21:38:06.6603471Z 2023-01-11T21:38:06.6603545Z def call(args): 2023-01-11T21:38:06.6603624Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6603698Z args.clear() 2023-01-11T21:38:06.6603794Z with torch.cuda.device(0): 2023-01-11T21:38:06.6603996Z buf3 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6604196Z buf4 = empty_strided((1, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6604281Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6604465Z triton_fused_amax_1_convert_element_type_2_exp_1_sub_1_sum_2_0.run(arg0_1, buf3, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6604665Z buf2 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6604863Z buf5 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6605059Z buf8 = empty_strided((8, 8), (8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6605353Z triton_fused_add_amax_amax_2_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_4_convert_element_type_5_exp_exp_2_1.run(arg0_1, arg1_1, buf3, buf4, buf2, buf5, buf8, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6605431Z del arg0_1 2023-01-11T21:38:06.6605504Z del arg1_1 2023-01-11T21:38:06.6605595Z return (buf2, buf5, buf8, ) 2023-01-11T21:38:06.6605600Z 2023-01-11T21:38:06.6605605Z 2023-01-11T21:38:06.6605678Z if __name__ == "__main__": 2023-01-11T21:38:06.6605796Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6605924Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6606122Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6606319Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6606440Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6606710Z [2023-01-11 21:35:39,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 839 2023-01-11T21:38:06.6606716Z 2023-01-11T21:38:06.6606789Z ok (0.522s) 2023-01-11T21:38:06.6607279Z test_softmax_one_kernel_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6607406Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6607661Z [2023-01-11 21:35:39,208] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 840 2023-01-11T21:38:06.6607871Z [2023-01-11 21:35:39,225] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6608080Z [2023-01-11 21:35:39,225] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6608344Z [2023-01-11 21:35:39,322] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 840 2023-01-11T21:38:06.6608350Z 2023-01-11T21:38:06.6608447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6608524Z import torch 2023-01-11T21:38:06.6608597Z import random 2023-01-11T21:38:06.6608711Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6608832Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6608837Z 2023-01-11T21:38:06.6608919Z aten = torch.ops.aten 2023-01-11T21:38:06.6609056Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6609150Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6609155Z 2023-01-11T21:38:06.6609231Z import triton 2023-01-11T21:38:06.6609323Z import triton.language as tl 2023-01-11T21:38:06.6609446Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6609618Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6609624Z 2023-01-11T21:38:06.6609634Z 2023-01-11T21:38:06.6609812Z triton_fused_amax_div_exp_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6609884Z import triton 2023-01-11T21:38:06.6609977Z import triton.language as tl 2023-01-11T21:38:06.6610094Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6610194Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6610328Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6610453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6610458Z 2023-01-11T21:38:06.6610542Z @reduction(size_hints=[16, 32], 2023-01-11T21:38:06.6610657Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6610742Z filename=__file__, 2023-01-11T21:38:06.6611108Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 
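# size_hints=[16, 32] matches xnumel = 16 rows and rnumel = 32 reduction elements below,
# and ReductionHint.INNER tells the autotuner that the reduced dimension is the innermost,
# contiguous one, so the chosen configs keep the loads along r1 coalesced.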
2023-01-11T21:38:06.6611186Z @triton.jit 2023-01-11T21:38:06.6611358Z def triton_(in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6611430Z xnumel = 16 2023-01-11T21:38:06.6611498Z rnumel = 32 2023-01-11T21:38:06.6611591Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6611727Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6611810Z xmask = xindex < xnumel 2023-01-11T21:38:06.6611928Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6611999Z x0 = xindex 2023-01-11T21:38:06.6612182Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.6612287Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6612369Z rindex = roffset + rbase 2023-01-11T21:38:06.6612450Z rmask = rindex < rnumel 2023-01-11T21:38:06.6612525Z r1 = rindex 2023-01-11T21:38:06.6612739Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6612866Z _tmp1 = tl.where(xmask & rmask & (_tmp1 < tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.6612979Z tmp1 = tl.reshape(tl.max(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6613121Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6613221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6613311Z rindex = roffset + rbase 2023-01-11T21:38:06.6613397Z rmask = rindex < rnumel 2023-01-11T21:38:06.6613468Z r1 = rindex 2023-01-11T21:38:06.6613683Z tmp2 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6613769Z tmp3 = tmp2 * tmp1 2023-01-11T21:38:06.6613850Z tmp4 = tl.exp(tmp3) 2023-01-11T21:38:06.6613964Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.6614080Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6614185Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6614273Z rindex = roffset + rbase 2023-01-11T21:38:06.6614359Z rmask = rindex < rnumel 2023-01-11T21:38:06.6614429Z r1 = rindex 2023-01-11T21:38:06.6614657Z tmp6 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask) 2023-01-11T21:38:06.6614734Z tmp7 = tmp6 * tmp1 2023-01-11T21:38:06.6614816Z tmp8 = tl.exp(tmp7) 2023-01-11T21:38:06.6614895Z tmp9 = tmp8 / tmp5 2023-01-11T21:38:06.6615053Z tl.store(out_ptr2 + (r1 + (32*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.6615138Z ''') 2023-01-11T21:38:06.6615144Z 2023-01-11T21:38:06.6615148Z 2023-01-11T21:38:06.6615240Z async_compile.wait(globals()) 2023-01-11T21:38:06.6615314Z del async_compile 2023-01-11T21:38:06.6615320Z 2023-01-11T21:38:06.6615395Z def call(args): 2023-01-11T21:38:06.6615461Z arg0_1, = args 2023-01-11T21:38:06.6615606Z args.clear() 2023-01-11T21:38:06.6615712Z with torch.cuda.device(0): 2023-01-11T21:38:06.6615935Z buf2 = empty_strided((16, 32), (32, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6616029Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6616189Z triton_fused_amax_div_exp_mul_sum_1_0.run(arg0_1, buf2, 16, 32, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6616260Z del arg0_1 2023-01-11T21:38:06.6616330Z return (buf2, ) 2023-01-11T21:38:06.6616336Z 2023-01-11T21:38:06.6616340Z 2023-01-11T21:38:06.6616419Z if __name__ == "__main__": 2023-01-11T21:38:06.6616536Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6616665Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6616874Z arg0_1 = rand_strided((16, 32), (32, 1), 
device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6616986Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6616991Z 2023-01-11T21:38:06.6617065Z ok (0.300s) 2023-01-11T21:38:06.6617573Z test_sort_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6617703Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6617953Z [2023-01-11 21:35:39,336] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 841 2023-01-11T21:38:06.6618171Z [2023-01-11 21:35:39,341] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.6618433Z [2023-01-11 21:35:39,344] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 841 2023-01-11T21:38:06.6618848Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6619025Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6619280Z [2023-01-11 21:35:39,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 842 2023-01-11T21:38:06.6619493Z [2023-01-11 21:35:39,362] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.sort 2023-01-11T21:38:06.6619748Z [2023-01-11 21:35:39,364] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 842 2023-01-11T21:38:06.6619754Z 2023-01-11T21:38:06.6619852Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6619923Z import torch 2023-01-11T21:38:06.6619991Z import random 2023-01-11T21:38:06.6620114Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6620235Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6620240Z 2023-01-11T21:38:06.6620319Z aten = torch.ops.aten 2023-01-11T21:38:06.6620457Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6620555Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6620561Z 2023-01-11T21:38:06.6620636Z import triton 2023-01-11T21:38:06.6620720Z import triton.language as tl 2023-01-11T21:38:06.6620844Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6620982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6620988Z 2023-01-11T21:38:06.6620992Z 2023-01-11T21:38:06.6621085Z async_compile.wait(globals()) 2023-01-11T21:38:06.6621162Z del async_compile 2023-01-11T21:38:06.6621167Z 2023-01-11T21:38:06.6621238Z def call(args): 2023-01-11T21:38:06.6621312Z arg0_1, = args 2023-01-11T21:38:06.6621384Z args.clear() 2023-01-11T21:38:06.6621501Z with torch.cuda.device(0): 2023-01-11T21:38:06.6621591Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.6621667Z del arg0_1 2023-01-11T21:38:06.6621743Z buf1 = buf0[0] 2023-01-11T21:38:06.6621860Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6621936Z buf2 = buf0[1] 2023-01-11T21:38:06.6622049Z 
assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6622115Z del buf0 2023-01-11T21:38:06.6622200Z return (buf1, buf2, ) 2023-01-11T21:38:06.6622205Z 2023-01-11T21:38:06.6622210Z 2023-01-11T21:38:06.6622292Z if __name__ == "__main__": 2023-01-11T21:38:06.6622413Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6622540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6622761Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6622884Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6622892Z 2023-01-11T21:38:06.6622896Z 2023-01-11T21:38:06.6623003Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6623073Z import torch 2023-01-11T21:38:06.6623150Z import random 2023-01-11T21:38:06.6623279Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6623418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6623423Z 2023-01-11T21:38:06.6623508Z aten = torch.ops.aten 2023-01-11T21:38:06.6623659Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6623760Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6623765Z 2023-01-11T21:38:06.6623840Z import triton 2023-01-11T21:38:06.6623932Z import triton.language as tl 2023-01-11T21:38:06.6624071Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6624223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6624228Z 2023-01-11T21:38:06.6624233Z 2023-01-11T21:38:06.6624331Z async_compile.wait(globals()) 2023-01-11T21:38:06.6624416Z del async_compile 2023-01-11T21:38:06.6624421Z 2023-01-11T21:38:06.6624501Z def call(args): 2023-01-11T21:38:06.6624577Z arg0_1, = args 2023-01-11T21:38:06.6624648Z args.clear() 2023-01-11T21:38:06.6624745Z with torch.cuda.device(0): 2023-01-11T21:38:06.6624838Z buf0 = aten.sort(arg0_1) 2023-01-11T21:38:06.6624942Z del arg0_1 2023-01-11T21:38:06.6625021Z buf1 = buf0[0] 2023-01-11T21:38:06.6625143Z assert_size_stride(buf1, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6625219Z buf2 = buf0[1] 2023-01-11T21:38:06.6625331Z assert_size_stride(buf2, (1, 1, 8, 8), (64, 64, 8, 1)) 2023-01-11T21:38:06.6625406Z del buf0 2023-01-11T21:38:06.6625495Z return (buf1, buf2, ) 2023-01-11T21:38:06.6625500Z 2023-01-11T21:38:06.6625504Z 2023-01-11T21:38:06.6625588Z if __name__ == "__main__": 2023-01-11T21:38:06.6625719Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6625862Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6626110Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6626230Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6626235Z 2023-01-11T21:38:06.6626302Z ok (0.042s) 2023-01-11T21:38:06.6626840Z test_split_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6626984Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6627284Z [2023-01-11 21:35:39,386] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 843 2023-01-11T21:38:06.6627582Z [2023-01-11 21:35:39,389] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 843 2023-01-11T21:38:06.6628099Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6628247Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6628540Z [2023-01-11 21:35:39,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 844 2023-01-11T21:38:06.6628841Z [2023-01-11 21:35:39,414] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 844 2023-01-11T21:38:06.6629329Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6629473Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6629766Z [2023-01-11 21:35:39,439] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 845 2023-01-11T21:38:06.6630056Z [2023-01-11 21:35:39,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 845 2023-01-11T21:38:06.6630533Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6630679Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6630969Z [2023-01-11 21:35:39,537] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 846 2023-01-11T21:38:06.6631296Z [2023-01-11 21:35:39,607] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 846 2023-01-11T21:38:06.6631302Z 2023-01-11T21:38:06.6631409Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6631487Z import torch 2023-01-11T21:38:06.6631565Z import random 2023-01-11T21:38:06.6631696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6631825Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6631830Z 2023-01-11T21:38:06.6631916Z aten = torch.ops.aten 2023-01-11T21:38:06.6632067Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6632169Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6632174Z 2023-01-11T21:38:06.6632251Z import triton 2023-01-11T21:38:06.6632353Z import triton.language as tl 2023-01-11T21:38:06.6632490Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6632644Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6632649Z 2023-01-11T21:38:06.6632654Z 2023-01-11T21:38:06.6632745Z async_compile.wait(globals()) 2023-01-11T21:38:06.6632828Z del async_compile 2023-01-11T21:38:06.6632833Z 2023-01-11T21:38:06.6632910Z def call(args): 2023-01-11T21:38:06.6632986Z arg0_1, = args 2023-01-11T21:38:06.6633064Z args.clear() 2023-01-11T21:38:06.6633287Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.6633293Z 2023-01-11T21:38:06.6633297Z 2023-01-11T21:38:06.6633384Z if __name__ == "__main__": 2023-01-11T21:38:06.6633511Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6633642Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6633909Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6634023Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6634029Z 2023-01-11T21:38:06.6634033Z 2023-01-11T21:38:06.6634133Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6634212Z import torch 2023-01-11T21:38:06.6634291Z import random 2023-01-11T21:38:06.6634413Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6634542Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6634547Z 2023-01-11T21:38:06.6634624Z aten = torch.ops.aten 2023-01-11T21:38:06.6634759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6634859Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6634864Z 2023-01-11T21:38:06.6634939Z import triton 2023-01-11T21:38:06.6635036Z import triton.language as tl 2023-01-11T21:38:06.6635162Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6635306Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6635311Z 2023-01-11T21:38:06.6635316Z 2023-01-11T21:38:06.6635411Z async_compile.wait(globals()) 2023-01-11T21:38:06.6635484Z del async_compile 2023-01-11T21:38:06.6635496Z 2023-01-11T21:38:06.6635568Z def call(args): 2023-01-11T21:38:06.6635643Z arg0_1, = args 
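# No kernel launch in this graph: the split is expressed purely as as_strided views of
# arg0_1 with storage offsets 0, 3, 6 and 9 into the last dimension, so call() just
# returns four aliases. Eager equivalent (a sketch, not from the log):
#   x.split(3, dim=-1)  # -> chunks of size 3, 3, 3, 1 on a (2, 2, 10) tensor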
2023-01-11T21:38:06.6635721Z args.clear() 2023-01-11T21:38:06.6635918Z return (as_strided(arg0_1, (2, 2, 3), (20, 10, 1)), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 3), as_strided(arg0_1, (2, 2, 3), (20, 10, 1), 6), as_strided(arg0_1, (2, 2, 1), (20, 10, 1), 9), ) 2023-01-11T21:38:06.6635925Z 2023-01-11T21:38:06.6635929Z 2023-01-11T21:38:06.6636010Z if __name__ == "__main__": 2023-01-11T21:38:06.6636131Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6636261Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6636475Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6636585Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6636590Z 2023-01-11T21:38:06.6636595Z 2023-01-11T21:38:06.6636694Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6636770Z import torch 2023-01-11T21:38:06.6636847Z import random 2023-01-11T21:38:06.6636995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6637122Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6637127Z 2023-01-11T21:38:06.6637211Z aten = torch.ops.aten 2023-01-11T21:38:06.6637342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6637439Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6637444Z 2023-01-11T21:38:06.6637518Z import triton 2023-01-11T21:38:06.6637615Z import triton.language as tl 2023-01-11T21:38:06.6637742Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6637882Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6637891Z 2023-01-11T21:38:06.6637895Z 2023-01-11T21:38:06.6638051Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6638129Z import triton 2023-01-11T21:38:06.6638216Z import triton.language as tl 2023-01-11T21:38:06.6638332Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6638439Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6638575Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6638703Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6638708Z 2023-01-11T21:38:06.6639111Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6639191Z @triton.jit 2023-01-11T21:38:06.6639326Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6639423Z xnumel = 40 2023-01-11T21:38:06.6639523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6639654Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6639740Z xmask = xindex < xnumel 2023-01-11T21:38:06.6639814Z x0 = xindex 2023-01-11T21:38:06.6639912Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6639988Z tmp1 = 1 2023-01-11T21:38:06.6640063Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6640201Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6640287Z ''') 2023-01-11T21:38:06.6640293Z 2023-01-11T21:38:06.6640297Z 2023-01-11T21:38:06.6640392Z async_compile.wait(globals()) 2023-01-11T21:38:06.6640471Z del async_compile 2023-01-11T21:38:06.6640476Z 2023-01-11T21:38:06.6640553Z def call(args): 2023-01-11T21:38:06.6640629Z arg0_1, = args 2023-01-11T21:38:06.6640704Z args.clear() 2023-01-11T21:38:06.6640791Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.6641007Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6641104Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6641247Z triton_fused_add_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.6641323Z del arg0_1 2023-01-11T21:38:06.6641521Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.6641528Z 2023-01-11T21:38:06.6641532Z 2023-01-11T21:38:06.6641615Z if __name__ == "__main__": 2023-01-11T21:38:06.6641735Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6641856Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6642068Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6642184Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6642191Z 2023-01-11T21:38:06.6642196Z 2023-01-11T21:38:06.6642301Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6642377Z import torch 2023-01-11T21:38:06.6642454Z import random 2023-01-11T21:38:06.6642578Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6642702Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6642733Z 2023-01-11T21:38:06.6642813Z aten = torch.ops.aten 2023-01-11T21:38:06.6642950Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6643048Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6643053Z 2023-01-11T21:38:06.6643129Z import triton 2023-01-11T21:38:06.6643224Z import triton.language as tl 2023-01-11T21:38:06.6643352Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6643492Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6643498Z 2023-01-11T21:38:06.6643502Z 2023-01-11T21:38:06.6643656Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6643730Z import triton 2023-01-11T21:38:06.6643824Z import triton.language as tl 2023-01-11T21:38:06.6643939Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6644045Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6644179Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6644312Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6644317Z 2023-01-11T21:38:06.6644719Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6644794Z @triton.jit 2023-01-11T21:38:06.6644923Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6644999Z xnumel = 40 2023-01-11T21:38:06.6645098Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6645230Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6645343Z xmask = xindex < xnumel 2023-01-11T21:38:06.6645416Z x0 = xindex 2023-01-11T21:38:06.6645535Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6645601Z tmp1 = 1 2023-01-11T21:38:06.6645681Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6645821Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.6645910Z ''') 2023-01-11T21:38:06.6645916Z 2023-01-11T21:38:06.6645920Z 
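# Same pointwise add-1 kernel as the fp32 variant above, except the *fp16 load is widened
# with .to(tl.float32); the add runs in fp32 registers and tl.store narrows the result back
# to the pointer's fp16 element type. Eager equivalent (a sketch, not from the log):
#   (x + 1).split(3, dim=-1)  # one fused add kernel, then four strided views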
2023-01-11T21:38:06.6646016Z async_compile.wait(globals()) 2023-01-11T21:38:06.6646095Z del async_compile 2023-01-11T21:38:06.6646100Z 2023-01-11T21:38:06.6646176Z def call(args): 2023-01-11T21:38:06.6646244Z arg0_1, = args 2023-01-11T21:38:06.6646320Z args.clear() 2023-01-11T21:38:06.6646416Z with torch.cuda.device(0): 2023-01-11T21:38:06.6646628Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6646723Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6646865Z triton_fused_add_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.6646942Z del arg0_1 2023-01-11T21:38:06.6647134Z return (as_strided(buf0, (2, 2, 3), (20, 10, 1)), as_strided(buf0, (2, 2, 3), (20, 10, 1), 3), as_strided(buf0, (2, 2, 3), (20, 10, 1), 6), as_strided(buf0, (2, 2, 1), (20, 10, 0), 9), ) 2023-01-11T21:38:06.6647140Z 2023-01-11T21:38:06.6647144Z 2023-01-11T21:38:06.6647220Z if __name__ == "__main__": 2023-01-11T21:38:06.6647340Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6647467Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6647679Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6647797Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6647802Z 2023-01-11T21:38:06.6647874Z ok (0.244s) 2023-01-11T21:38:06.6648336Z test_split_with_sizes_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6648503Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6648763Z [2023-01-11 21:35:39,638] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 847 2023-01-11T21:38:06.6649028Z [2023-01-11 21:35:39,738] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 847 2023-01-11T21:38:06.6649434Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6649571Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6649828Z [2023-01-11 21:35:39,768] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 848 2023-01-11T21:38:06.6649834Z 2023-01-11T21:38:06.6649938Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6650016Z import torch 2023-01-11T21:38:06.6650092Z import random 2023-01-11T21:38:06.6650213Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6650338Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6650343Z 2023-01-11T21:38:06.6650427Z aten = torch.ops.aten 2023-01-11T21:38:06.6650559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6650658Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6650663Z 2023-01-11T21:38:06.6650741Z import triton 2023-01-11T21:38:06.6650835Z import triton.language as tl 2023-01-11T21:38:06.6651002Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6651145Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6651150Z 2023-01-11T21:38:06.6651155Z 2023-01-11T21:38:06.6651311Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6651389Z import triton 2023-01-11T21:38:06.6651477Z import triton.language as tl 2023-01-11T21:38:06.6651595Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6651699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6651835Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6651964Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6651969Z 2023-01-11T21:38:06.6652368Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6652446Z @triton.jit 2023-01-11T21:38:06.6652581Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6652651Z xnumel = 12 2023-01-11T21:38:06.6652751Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6652883Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6652975Z xmask = xindex < xnumel 2023-01-11T21:38:06.6653053Z x0 = xindex % 3 2023-01-11T21:38:06.6653135Z x1 = (xindex // 3) 2023-01-11T21:38:06.6653200Z x2 = xindex 2023-01-11T21:38:06.6653407Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6653483Z tmp1 = 2.0 2023-01-11T21:38:06.6653568Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6653643Z tmp3 = 1.0 2023-01-11T21:38:06.6653723Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6653859Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6653939Z ''') 2023-01-11T21:38:06.6653955Z 2023-01-11T21:38:06.6653960Z 2023-01-11T21:38:06.6654112Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6654189Z import triton 2023-01-11T21:38:06.6654284Z import triton.language as tl 2023-01-11T21:38:06.6654401Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6654628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6654860Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6654988Z from torch._inductor.utils import instance_descriptor 
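# graph 848 lowers split_with_sizes into three pointwise kernels, one per output chunk:
# triton_fused_add_0 reads columns x0, this kernel reads 3 + x0, and triton_fused_add_2_2
# reads 6 + x0 of the (2, 2, 10) input, each writing x * 2 + 1 into its own contiguous
# buffer. Eager equivalent (a sketch, not from the log):
#   [c * 2 + 1 for c in x.split([3, 3, 4], dim=-1)]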
2023-01-11T21:38:06.6654993Z 2023-01-11T21:38:06.6655387Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6655462Z @triton.jit 2023-01-11T21:38:06.6655595Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6655669Z xnumel = 12 2023-01-11T21:38:06.6655765Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6655897Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6655979Z xmask = xindex < xnumel 2023-01-11T21:38:06.6656053Z x0 = xindex % 3 2023-01-11T21:38:06.6656123Z x1 = (xindex // 3) 2023-01-11T21:38:06.6656193Z x2 = xindex 2023-01-11T21:38:06.6656400Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6656473Z tmp1 = 2.0 2023-01-11T21:38:06.6656549Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6656620Z tmp3 = 1.0 2023-01-11T21:38:06.6656690Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6656824Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6656910Z ''') 2023-01-11T21:38:06.6656915Z 2023-01-11T21:38:06.6656920Z 2023-01-11T21:38:06.6657074Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6657248Z import triton 2023-01-11T21:38:06.6657352Z import triton.language as tl 2023-01-11T21:38:06.6657517Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6657618Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6657743Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6657866Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6657871Z 2023-01-11T21:38:06.6658281Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6658355Z @triton.jit 2023-01-11T21:38:06.6658486Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6658557Z xnumel = 16 2023-01-11T21:38:06.6658652Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6658781Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6658856Z xmask = xindex < xnumel 2023-01-11T21:38:06.6658937Z x0 = xindex % 4 2023-01-11T21:38:06.6659014Z x1 = (xindex // 4) 2023-01-11T21:38:06.6659084Z x2 = xindex 2023-01-11T21:38:06.6659194Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6659266Z tmp1 = 2.0 2023-01-11T21:38:06.6659343Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6659408Z tmp3 = 1.0 2023-01-11T21:38:06.6659488Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6659623Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6659708Z ''') 2023-01-11T21:38:06.6659714Z 2023-01-11T21:38:06.6659718Z 2023-01-11T21:38:06.6659811Z async_compile.wait(globals()) 2023-01-11T21:38:06.6659888Z del async_compile 2023-01-11T21:38:06.6659893Z 2023-01-11T21:38:06.6659969Z def call(args): 2023-01-11T21:38:06.6660035Z arg0_1, = args 2023-01-11T21:38:06.6660111Z args.clear() 2023-01-11T21:38:06.6660202Z with torch.cuda.device(0): 2023-01-11T21:38:06.6660408Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6660503Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6660640Z 
triton_fused_add_0.run(arg0_1, buf0, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6660838Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6661002Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6661195Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6661328Z triton_fused_add_2_2.run(arg0_1, buf2, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6661402Z del arg0_1 2023-01-11T21:38:06.6661492Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6661497Z 2023-01-11T21:38:06.6661501Z 2023-01-11T21:38:06.6661580Z if __name__ == "__main__": 2023-01-11T21:38:06.6661700Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6661827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6662038Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6662144Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6662149Z 2023-01-11T21:38:06.6662423Z [2023-01-11 21:35:39,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 848 2023-01-11T21:38:06.6662839Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6662970Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6663226Z [2023-01-11 21:35:39,889] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 849 2023-01-11T21:38:06.6663256Z 2023-01-11T21:38:06.6663357Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6663434Z import torch 2023-01-11T21:38:06.6663511Z import random 2023-01-11T21:38:06.6663634Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6663754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6663759Z 2023-01-11T21:38:06.6663844Z aten = torch.ops.aten 2023-01-11T21:38:06.6663983Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6664079Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6664084Z 2023-01-11T21:38:06.6664162Z import triton 2023-01-11T21:38:06.6664255Z import triton.language as tl 2023-01-11T21:38:06.6664383Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6664517Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6664531Z 2023-01-11T21:38:06.6664535Z 2023-01-11T21:38:06.6664684Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6664765Z import triton 2023-01-11T21:38:06.6664860Z import triton.language as tl 2023-01-11T21:38:06.6664977Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6665081Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6665214Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6665344Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6665349Z 2023-01-11T21:38:06.6665753Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), 
equal_to_1=())]}) 2023-01-11T21:38:06.6665822Z @triton.jit 2023-01-11T21:38:06.6665960Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6666036Z xnumel = 12 2023-01-11T21:38:06.6666133Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6666265Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6666354Z xmask = xindex < xnumel 2023-01-11T21:38:06.6666430Z x0 = xindex % 3 2023-01-11T21:38:06.6666503Z x1 = (xindex // 3) 2023-01-11T21:38:06.6666574Z x2 = xindex 2023-01-11T21:38:06.6666809Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6666915Z tmp1 = 2.0 2023-01-11T21:38:06.6667001Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6667075Z tmp3 = 1.0 2023-01-11T21:38:06.6667154Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6667287Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6667376Z ''') 2023-01-11T21:38:06.6667381Z 2023-01-11T21:38:06.6667386Z 2023-01-11T21:38:06.6667545Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6667621Z import triton 2023-01-11T21:38:06.6667719Z import triton.language as tl 2023-01-11T21:38:06.6667837Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6667941Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6668072Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6668200Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6668205Z 2023-01-11T21:38:06.6668609Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6668684Z @triton.jit 2023-01-11T21:38:06.6668819Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6668894Z xnumel = 12 2023-01-11T21:38:06.6668995Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6669127Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6669205Z xmask = xindex < xnumel 2023-01-11T21:38:06.6669281Z x0 = xindex % 3 2023-01-11T21:38:06.6669365Z x1 = (xindex // 3) 2023-01-11T21:38:06.6669435Z x2 = xindex 2023-01-11T21:38:06.6669703Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6669779Z tmp1 = 2.0 2023-01-11T21:38:06.6669860Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6669928Z tmp3 = 1.0 2023-01-11T21:38:06.6670009Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6670149Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6670235Z ''') 2023-01-11T21:38:06.6670240Z 2023-01-11T21:38:06.6670245Z 2023-01-11T21:38:06.6670406Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6670482Z import triton 2023-01-11T21:38:06.6670577Z import triton.language as tl 2023-01-11T21:38:06.6670686Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6670789Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6670924Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6671052Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6671061Z 2023-01-11T21:38:06.6671464Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6671540Z @triton.jit 2023-01-11T21:38:06.6671680Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6671756Z xnumel = 16 2023-01-11T21:38:06.6671848Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6671979Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6672065Z xmask = xindex < xnumel 2023-01-11T21:38:06.6672141Z x0 = xindex % 4 2023-01-11T21:38:06.6672221Z x1 = (xindex // 4) 2023-01-11T21:38:06.6672292Z x2 = xindex 2023-01-11T21:38:06.6672421Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6672488Z tmp1 = 2.0 2023-01-11T21:38:06.6672570Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6672647Z tmp3 = 1.0 2023-01-11T21:38:06.6672728Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6672868Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6672956Z ''') 2023-01-11T21:38:06.6672962Z 2023-01-11T21:38:06.6672966Z 2023-01-11T21:38:06.6673061Z async_compile.wait(globals()) 2023-01-11T21:38:06.6673168Z del async_compile 2023-01-11T21:38:06.6673178Z 2023-01-11T21:38:06.6673248Z def call(args): 2023-01-11T21:38:06.6673323Z arg0_1, = args 2023-01-11T21:38:06.6673400Z args.clear() 2023-01-11T21:38:06.6673497Z with torch.cuda.device(0): 2023-01-11T21:38:06.6673702Z buf0 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6673798Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6673940Z triton_fused_add_0.run(arg0_1, buf0, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6674136Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6674280Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6674480Z buf2 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6674615Z triton_fused_add_2_2.run(arg0_1, buf2, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6674696Z del arg0_1 2023-01-11T21:38:06.6674788Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6674793Z 2023-01-11T21:38:06.6674798Z 2023-01-11T21:38:06.6674879Z if __name__ == "__main__": 2023-01-11T21:38:06.6674999Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6675121Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6675335Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6675452Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6675457Z 2023-01-11T21:38:06.6675723Z [2023-01-11 21:35:39,986] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 849 2023-01-11T21:38:06.6676172Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6676306Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6676563Z [2023-01-11 21:35:40,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 850 2023-01-11T21:38:06.6676569Z 2023-01-11T21:38:06.6676671Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6676748Z import torch 2023-01-11T21:38:06.6676818Z import random 2023-01-11T21:38:06.6676940Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6677066Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6677074Z 2023-01-11T21:38:06.6677158Z aten = torch.ops.aten 2023-01-11T21:38:06.6677297Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6677395Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6677400Z 2023-01-11T21:38:06.6677479Z import triton 2023-01-11T21:38:06.6677576Z import triton.language as tl 2023-01-11T21:38:06.6677697Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6677840Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6677845Z 2023-01-11T21:38:06.6677849Z 2023-01-11T21:38:06.6678003Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6678080Z import triton 2023-01-11T21:38:06.6678174Z import triton.language as tl 2023-01-11T21:38:06.6678290Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6678394Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6678532Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6678656Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6678661Z 2023-01-11T21:38:06.6679086Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6679164Z @triton.jit 2023-01-11T21:38:06.6679299Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6679375Z xnumel = 16 2023-01-11T21:38:06.6679473Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6679605Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6679694Z xmask = xindex < xnumel 2023-01-11T21:38:06.6679763Z x0 = xindex % 4 2023-01-11T21:38:06.6679843Z x1 = (xindex // 4) 2023-01-11T21:38:06.6679915Z x2 = xindex 2023-01-11T21:38:06.6680119Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6680199Z tmp1 = 2.0 2023-01-11T21:38:06.6680281Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6680349Z tmp3 = 1.0 2023-01-11T21:38:06.6680429Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6680567Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6680654Z ''') 2023-01-11T21:38:06.6680662Z 2023-01-11T21:38:06.6680666Z 2023-01-11T21:38:06.6680825Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6680903Z import triton 2023-01-11T21:38:06.6681000Z import triton.language as tl 2023-01-11T21:38:06.6681119Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6681216Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6681354Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6681481Z from torch._inductor.utils import instance_descriptor 
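# The three pointwise kernels in this graph read slices of the (2, 2, 10) input at
# last-dim offsets 0, 4 and 7 with element counts 16, 12 and 12, each computing
# x * 2.0 + 1.0. One plausible eager-mode reconstruction (inferred from the
# generated code, not stated in the log):
#   x = torch.randn(2, 2, 10, device="cuda")
#   a, b, c = torch.split(x, (4, 3, 3), dim=-1)
#   outs = (a * 2.0 + 1.0, b * 2.0 + 1.0, c * 2.0 + 1.0)
# This second kernel covers the middle section, loading in_ptr0 + (4 + x0 + 10*x1).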
2023-01-11T21:38:06.6681486Z 2023-01-11T21:38:06.6681885Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6681990Z @triton.jit 2023-01-11T21:38:06.6682128Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6682204Z xnumel = 12 2023-01-11T21:38:06.6682306Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6682430Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6682514Z xmask = xindex < xnumel 2023-01-11T21:38:06.6682591Z x0 = xindex % 3 2023-01-11T21:38:06.6682671Z x1 = (xindex // 3) 2023-01-11T21:38:06.6682743Z x2 = xindex 2023-01-11T21:38:06.6682955Z tmp0 = tl.load(in_ptr0 + (4 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6683031Z tmp1 = 2.0 2023-01-11T21:38:06.6683105Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6683178Z tmp3 = 1.0 2023-01-11T21:38:06.6683257Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6683396Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6683485Z ''') 2023-01-11T21:38:06.6683490Z 2023-01-11T21:38:06.6683496Z 2023-01-11T21:38:06.6683655Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6683732Z import triton 2023-01-11T21:38:06.6683822Z import triton.language as tl 2023-01-11T21:38:06.6683939Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6684043Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6684178Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6684305Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6684310Z 2023-01-11T21:38:06.6684707Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6684784Z @triton.jit 2023-01-11T21:38:06.6684923Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6684991Z xnumel = 12 2023-01-11T21:38:06.6685090Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6685222Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6685307Z xmask = xindex < xnumel 2023-01-11T21:38:06.6685414Z x0 = xindex % 3 2023-01-11T21:38:06.6685518Z x1 = (xindex // 3) 2023-01-11T21:38:06.6685595Z x2 = xindex 2023-01-11T21:38:06.6685723Z tmp0 = tl.load(in_ptr0 + (7 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6685800Z tmp1 = 2.0 2023-01-11T21:38:06.6685881Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6685955Z tmp3 = 1.0 2023-01-11T21:38:06.6686035Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6686171Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6686259Z ''') 2023-01-11T21:38:06.6686264Z 2023-01-11T21:38:06.6686269Z 2023-01-11T21:38:06.6686357Z async_compile.wait(globals()) 2023-01-11T21:38:06.6686439Z del async_compile 2023-01-11T21:38:06.6686444Z 2023-01-11T21:38:06.6686521Z def call(args): 2023-01-11T21:38:06.6686596Z arg0_1, = args 2023-01-11T21:38:06.6686674Z args.clear() 2023-01-11T21:38:06.6686768Z with torch.cuda.device(0): 2023-01-11T21:38:06.6686981Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687069Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6687209Z 
triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6687415Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687554Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6687755Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6687889Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6687992Z del arg0_1 2023-01-11T21:38:06.6688082Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6688088Z 2023-01-11T21:38:06.6688092Z 2023-01-11T21:38:06.6688168Z if __name__ == "__main__": 2023-01-11T21:38:06.6688289Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6688424Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6688637Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6688754Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6688759Z 2023-01-11T21:38:06.6689027Z [2023-01-11 21:35:40,107] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 850 2023-01-11T21:38:06.6689449Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6689586Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6689849Z [2023-01-11 21:35:40,141] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 851 2023-01-11T21:38:06.6689855Z 2023-01-11T21:38:06.6689956Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6690025Z import torch 2023-01-11T21:38:06.6690103Z import random 2023-01-11T21:38:06.6690229Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6690356Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6690361Z 2023-01-11T21:38:06.6690445Z aten = torch.ops.aten 2023-01-11T21:38:06.6690584Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6690681Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6690686Z 2023-01-11T21:38:06.6690755Z import triton 2023-01-11T21:38:06.6690853Z import triton.language as tl 2023-01-11T21:38:06.6690978Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6691123Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6691129Z 2023-01-11T21:38:06.6691133Z 2023-01-11T21:38:06.6691324Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6691401Z import triton 2023-01-11T21:38:06.6691496Z import triton.language as tl 2023-01-11T21:38:06.6691612Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6691709Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6691842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6691970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6691975Z 2023-01-11T21:38:06.6692386Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.6692466Z @triton.jit 2023-01-11T21:38:06.6692601Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6692677Z xnumel = 16 2023-01-11T21:38:06.6692776Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6692903Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6692988Z xmask = xindex < xnumel 2023-01-11T21:38:06.6693067Z x0 = xindex % 4 2023-01-11T21:38:06.6693148Z x1 = (xindex // 4) 2023-01-11T21:38:06.6693219Z x2 = xindex 2023-01-11T21:38:06.6693451Z tmp0 = tl.load(in_ptr0 + (x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6693527Z tmp1 = 2.0 2023-01-11T21:38:06.6693600Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6693674Z tmp3 = 1.0 2023-01-11T21:38:06.6693752Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6693888Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6694023Z ''') 2023-01-11T21:38:06.6694028Z 2023-01-11T21:38:06.6694033Z 2023-01-11T21:38:06.6694191Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6694268Z import triton 2023-01-11T21:38:06.6694356Z import triton.language as tl 2023-01-11T21:38:06.6694471Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6694699Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6694845Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6694983Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6694988Z 2023-01-11T21:38:06.6695463Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6695541Z @triton.jit 2023-01-11T21:38:06.6695685Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6695758Z xnumel = 12 2023-01-11T21:38:06.6695861Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6696001Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6696089Z xmask = xindex < xnumel 2023-01-11T21:38:06.6696170Z x0 = xindex % 3 2023-01-11T21:38:06.6696250Z x1 = (xindex // 3) 2023-01-11T21:38:06.6696328Z x2 = xindex 2023-01-11T21:38:06.6696587Z tmp0 = tl.load(in_ptr0 + (4 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6696663Z tmp1 = 2.0 2023-01-11T21:38:06.6696744Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6696818Z tmp3 = 1.0 2023-01-11T21:38:06.6696899Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6697036Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6697123Z ''') 2023-01-11T21:38:06.6697171Z 2023-01-11T21:38:06.6697176Z 2023-01-11T21:38:06.6697350Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6697423Z import triton 2023-01-11T21:38:06.6697519Z import triton.language as tl 2023-01-11T21:38:06.6697636Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6697741Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6697876Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6698056Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6698062Z 2023-01-11T21:38:06.6698463Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6698539Z @triton.jit 2023-01-11T21:38:06.6698667Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6698742Z xnumel = 12 2023-01-11T21:38:06.6698842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6698974Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6699063Z xmask = xindex < xnumel 2023-01-11T21:38:06.6699140Z x0 = xindex % 3 2023-01-11T21:38:06.6699212Z x1 = (xindex // 3) 2023-01-11T21:38:06.6699285Z x2 = xindex 2023-01-11T21:38:06.6699414Z tmp0 = tl.load(in_ptr0 + (7 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6699490Z tmp1 = 2.0 2023-01-11T21:38:06.6699574Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6699649Z tmp3 = 1.0 2023-01-11T21:38:06.6699728Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6699859Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6699946Z ''') 2023-01-11T21:38:06.6699953Z 2023-01-11T21:38:06.6699957Z 2023-01-11T21:38:06.6700053Z async_compile.wait(globals()) 2023-01-11T21:38:06.6700132Z del async_compile 2023-01-11T21:38:06.6700137Z 2023-01-11T21:38:06.6700213Z def call(args): 2023-01-11T21:38:06.6700287Z arg0_1, = args 2023-01-11T21:38:06.6700367Z args.clear() 2023-01-11T21:38:06.6700463Z with torch.cuda.device(0): 2023-01-11T21:38:06.6700698Z buf0 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6700794Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6700934Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6701139Z buf1 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6701279Z triton_fused_add_1_1.run(arg0_1, buf1, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6701479Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6701619Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6701695Z del arg0_1 2023-01-11T21:38:06.6701780Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.6701785Z 2023-01-11T21:38:06.6701789Z 2023-01-11T21:38:06.6701872Z if __name__ == "__main__": 2023-01-11T21:38:06.6701993Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6702124Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6702336Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6702451Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6702456Z 2023-01-11T21:38:06.6702728Z [2023-01-11 21:35:40,237] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 851 2023-01-11T21:38:06.6703148Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6703285Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6703537Z [2023-01-11 21:35:40,273] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 852 2023-01-11T21:38:06.6703552Z 2023-01-11T21:38:06.6703645Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6703723Z import torch 2023-01-11T21:38:06.6703800Z import random 2023-01-11T21:38:06.6703922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6704077Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6704082Z 2023-01-11T21:38:06.6704168Z aten = torch.ops.aten 2023-01-11T21:38:06.6704306Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6704397Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6704403Z 2023-01-11T21:38:06.6704478Z import triton 2023-01-11T21:38:06.6704575Z import triton.language as tl 2023-01-11T21:38:06.6704702Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6704844Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6704850Z 2023-01-11T21:38:06.6704857Z 2023-01-11T21:38:06.6705013Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6705090Z import triton 2023-01-11T21:38:06.6705184Z import triton.language as tl 2023-01-11T21:38:06.6705293Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6705400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6705536Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6705664Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6705669Z 2023-01-11T21:38:06.6706070Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6706150Z @triton.jit 2023-01-11T21:38:06.6706284Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6706360Z xnumel = 4 2023-01-11T21:38:06.6706453Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6706615Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6706701Z xmask = xindex < xnumel 2023-01-11T21:38:06.6706775Z x0 = xindex 2023-01-11T21:38:06.6706972Z tmp0 = tl.load(in_ptr0 + (10*x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6707047Z tmp1 = 2.0 2023-01-11T21:38:06.6707132Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6707200Z tmp3 = 1.0 2023-01-11T21:38:06.6707280Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6707417Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6707504Z ''') 2023-01-11T21:38:06.6707509Z 2023-01-11T21:38:06.6707513Z 2023-01-11T21:38:06.6707673Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6707750Z import triton 2023-01-11T21:38:06.6707850Z import triton.language as tl 2023-01-11T21:38:06.6707959Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6708064Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6708201Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6708329Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6708335Z 2023-01-11T21:38:06.6708740Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': 
{0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6708815Z @triton.jit 2023-01-11T21:38:06.6708949Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6709023Z xnumel = 8 2023-01-11T21:38:06.6709115Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6709247Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6709333Z xmask = xindex < xnumel 2023-01-11T21:38:06.6709409Z x0 = xindex % 2 2023-01-11T21:38:06.6709489Z x1 = (xindex // 2) 2023-01-11T21:38:06.6709562Z x2 = xindex 2023-01-11T21:38:06.6709774Z tmp0 = tl.load(in_ptr0 + (1 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6709843Z tmp1 = 2.0 2023-01-11T21:38:06.6709923Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6709998Z tmp3 = 1.0 2023-01-11T21:38:06.6710076Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6710241Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6710331Z ''') 2023-01-11T21:38:06.6710336Z 2023-01-11T21:38:06.6710341Z 2023-01-11T21:38:06.6710502Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6710572Z import triton 2023-01-11T21:38:06.6710667Z import triton.language as tl 2023-01-11T21:38:06.6710785Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6710888Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6711023Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6711154Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6711159Z 2023-01-11T21:38:06.6711562Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6711638Z @triton.jit 2023-01-11T21:38:06.6711767Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6711847Z xnumel = 12 2023-01-11T21:38:06.6711949Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6712082Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6712169Z xmask = xindex < xnumel 2023-01-11T21:38:06.6712246Z x0 = xindex % 3 2023-01-11T21:38:06.6712328Z x1 = (xindex // 3) 2023-01-11T21:38:06.6712394Z x2 = xindex 2023-01-11T21:38:06.6712604Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6712680Z tmp1 = 2.0 2023-01-11T21:38:06.6712762Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6712872Z tmp3 = 1.0 2023-01-11T21:38:06.6712952Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6713091Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6713172Z ''') 2023-01-11T21:38:06.6713177Z 2023-01-11T21:38:06.6713182Z 2023-01-11T21:38:06.6713342Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6713422Z import triton 2023-01-11T21:38:06.6713517Z import triton.language as tl 2023-01-11T21:38:06.6713633Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6713737Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6713872Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6713993Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6714006Z 2023-01-11T21:38:06.6714405Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': 
{0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6714486Z @triton.jit 2023-01-11T21:38:06.6714623Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6714700Z xnumel = 16 2023-01-11T21:38:06.6714799Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6714931Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6715018Z xmask = xindex < xnumel 2023-01-11T21:38:06.6715088Z x0 = xindex % 4 2023-01-11T21:38:06.6715170Z x1 = (xindex // 4) 2023-01-11T21:38:06.6715251Z x2 = xindex 2023-01-11T21:38:06.6715381Z tmp0 = tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask) 2023-01-11T21:38:06.6715471Z tmp1 = 2.0 2023-01-11T21:38:06.6715561Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6715634Z tmp3 = 1.0 2023-01-11T21:38:06.6715707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6715844Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6715932Z ''') 2023-01-11T21:38:06.6715937Z 2023-01-11T21:38:06.6715945Z 2023-01-11T21:38:06.6716042Z async_compile.wait(globals()) 2023-01-11T21:38:06.6716120Z del async_compile 2023-01-11T21:38:06.6716126Z 2023-01-11T21:38:06.6716202Z def call(args): 2023-01-11T21:38:06.6716277Z arg0_1, = args 2023-01-11T21:38:06.6716356Z args.clear() 2023-01-11T21:38:06.6716444Z with torch.cuda.device(0): 2023-01-11T21:38:06.6716691Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6716788Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6716929Z triton_fused_add_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6717136Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717274Z triton_fused_add_1_1.run(arg0_1, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6717476Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717608Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6717812Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6717948Z triton_fused_add_3_3.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6718025Z del arg0_1 2023-01-11T21:38:06.6718126Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.6718131Z 2023-01-11T21:38:06.6718136Z 2023-01-11T21:38:06.6718219Z if __name__ == "__main__": 2023-01-11T21:38:06.6718342Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6718471Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6718676Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6718791Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6718796Z 2023-01-11T21:38:06.6719063Z [2023-01-11 21:35:40,366] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 852 2023-01-11T21:38:06.6719095Z 2023-01-11T21:38:06.6719199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6719277Z import torch 2023-01-11T21:38:06.6719352Z import random 2023-01-11T21:38:06.6719473Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6719602Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6719607Z 2023-01-11T21:38:06.6719684Z aten = torch.ops.aten 2023-01-11T21:38:06.6719823Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6719921Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6719926Z 2023-01-11T21:38:06.6720003Z import triton 2023-01-11T21:38:06.6720099Z import triton.language as tl 2023-01-11T21:38:06.6720226Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6720367Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6720373Z 2023-01-11T21:38:06.6720377Z 2023-01-11T21:38:06.6720533Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.6720607Z import triton 2023-01-11T21:38:06.6720703Z import triton.language as tl 2023-01-11T21:38:06.6720819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6720925Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6721064Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6721192Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6721197Z 2023-01-11T21:38:06.6721601Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6721677Z @triton.jit 2023-01-11T21:38:06.6721803Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6721879Z xnumel = 4 2023-01-11T21:38:06.6721979Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6722109Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6722198Z xmask = xindex < xnumel 2023-01-11T21:38:06.6722271Z x0 = xindex 2023-01-11T21:38:06.6722489Z tmp0 = tl.load(in_ptr0 + (10*x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6722558Z tmp1 = 2.0 2023-01-11T21:38:06.6722673Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6722750Z tmp3 = 1.0 2023-01-11T21:38:06.6722832Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6722972Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6723059Z ''') 2023-01-11T21:38:06.6723064Z 2023-01-11T21:38:06.6723069Z 2023-01-11T21:38:06.6723227Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.6723302Z import triton 2023-01-11T21:38:06.6723391Z import triton.language as tl 2023-01-11T21:38:06.6723510Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6723613Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6723747Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6723877Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6723883Z 2023-01-11T21:38:06.6724287Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6724363Z @triton.jit 2023-01-11T21:38:06.6724499Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6724567Z xnumel = 8 2023-01-11T21:38:06.6724666Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6724797Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6724882Z xmask = xindex < xnumel 2023-01-11T21:38:06.6724958Z x0 = xindex % 2 2023-01-11T21:38:06.6725038Z x1 = (xindex // 2) 2023-01-11T21:38:06.6725104Z x2 = xindex 2023-01-11T21:38:06.6725338Z tmp0 = tl.load(in_ptr0 + (1 + x0 + (10*x1)), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6725441Z tmp1 = 2.0 2023-01-11T21:38:06.6725523Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6725598Z tmp3 = 1.0 2023-01-11T21:38:06.6725682Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6725823Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6725903Z ''') 2023-01-11T21:38:06.6725917Z 2023-01-11T21:38:06.6725922Z 2023-01-11T21:38:06.6726073Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6726150Z import triton 2023-01-11T21:38:06.6726246Z import triton.language as tl 2023-01-11T21:38:06.6726363Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6726465Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6726599Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6726733Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6726738Z 2023-01-11T21:38:06.6727137Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6727210Z @triton.jit 2023-01-11T21:38:06.6727344Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6727423Z xnumel = 12 2023-01-11T21:38:06.6727523Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6727656Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6727742Z xmask = xindex < xnumel 2023-01-11T21:38:06.6727818Z x0 = xindex % 3 2023-01-11T21:38:06.6727892Z x1 = (xindex // 3) 2023-01-11T21:38:06.6727964Z x2 = xindex 2023-01-11T21:38:06.6728199Z tmp0 = tl.load(in_ptr0 + (3 + x0 + (10*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6728276Z tmp1 = 2.0 2023-01-11T21:38:06.6728357Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6728434Z tmp3 = 1.0 2023-01-11T21:38:06.6728515Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6728644Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6728731Z ''') 2023-01-11T21:38:06.6728737Z 2023-01-11T21:38:06.6728741Z 2023-01-11T21:38:06.6728902Z triton_fused_add_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6729012Z import triton 2023-01-11T21:38:06.6729109Z import triton.language as tl 2023-01-11T21:38:06.6729228Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6729336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6729464Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6729594Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6729599Z 2023-01-11T21:38:06.6730001Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6730082Z @triton.jit 2023-01-11T21:38:06.6730215Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6730293Z xnumel = 16 2023-01-11T21:38:06.6730392Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6730525Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6730603Z xmask = xindex < xnumel 2023-01-11T21:38:06.6730681Z x0 = xindex % 4 2023-01-11T21:38:06.6730763Z x1 = (xindex // 4) 2023-01-11T21:38:06.6730837Z x2 = xindex 2023-01-11T21:38:06.6730968Z tmp0 = 
tl.load(in_ptr0 + (6 + x0 + (10*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.6731043Z tmp1 = 2.0 2023-01-11T21:38:06.6731127Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.6731195Z tmp3 = 1.0 2023-01-11T21:38:06.6731274Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6731415Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6731504Z ''') 2023-01-11T21:38:06.6731536Z 2023-01-11T21:38:06.6731541Z 2023-01-11T21:38:06.6731638Z async_compile.wait(globals()) 2023-01-11T21:38:06.6731717Z del async_compile 2023-01-11T21:38:06.6731722Z 2023-01-11T21:38:06.6731799Z def call(args): 2023-01-11T21:38:06.6731867Z arg0_1, = args 2023-01-11T21:38:06.6731945Z args.clear() 2023-01-11T21:38:06.6732044Z with torch.cuda.device(0): 2023-01-11T21:38:06.6732253Z buf0 = empty_strided((2, 2, 1), (2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6732348Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6732489Z triton_fused_add_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.6732700Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6732843Z triton_fused_add_1_1.run(arg0_1, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6733037Z buf2 = empty_strided((2, 2, 3), (6, 3, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6733180Z triton_fused_add_2_2.run(arg0_1, buf2, 12, grid=grid(12), stream=stream0) 2023-01-11T21:38:06.6733382Z buf3 = empty_strided((2, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6733517Z triton_fused_add_3_3.run(arg0_1, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6733596Z del arg0_1 2023-01-11T21:38:06.6733695Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.6733701Z 2023-01-11T21:38:06.6733705Z 2023-01-11T21:38:06.6733789Z if __name__ == "__main__": 2023-01-11T21:38:06.6733912Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6734034Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6734249Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6734366Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6734371Z 2023-01-11T21:38:06.6734446Z ok (0.759s) 2023-01-11T21:38:06.6735067Z test_squeeze1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6735204Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6735465Z [2023-01-11 21:35:40,388] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 853 2023-01-11T21:38:06.6735727Z [2023-01-11 21:35:40,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 853 2023-01-11T21:38:06.6736144Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6736282Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6736541Z [2023-01-11 21:35:40,481] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 854 2023-01-11T21:38:06.6736794Z [2023-01-11 21:35:40,556] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 854 2023-01-11T21:38:06.6736799Z 2023-01-11T21:38:06.6736897Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6736973Z import torch 2023-01-11T21:38:06.6737050Z import random 2023-01-11T21:38:06.6737225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6737353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6737358Z 2023-01-11T21:38:06.6737441Z aten = torch.ops.aten 2023-01-11T21:38:06.6737571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6737721Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6737727Z 2023-01-11T21:38:06.6737803Z import triton 2023-01-11T21:38:06.6737897Z import triton.language as tl 2023-01-11T21:38:06.6738028Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6738174Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6738179Z 2023-01-11T21:38:06.6738183Z 2023-01-11T21:38:06.6738358Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6738435Z import triton 2023-01-11T21:38:06.6738523Z import triton.language as tl 2023-01-11T21:38:06.6738644Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6738748Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6738884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6739010Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6739016Z 2023-01-11T21:38:06.6739434Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6739514Z @triton.jit 2023-01-11T21:38:06.6739661Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6739732Z xnumel = 8 2023-01-11T21:38:06.6739834Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6739967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6740055Z xmask = xindex < xnumel 2023-01-11T21:38:06.6740129Z x0 = xindex 2023-01-11T21:38:06.6740322Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6740424Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6740490Z tmp1 = 1 2023-01-11T21:38:06.6740571Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6740644Z tmp3 = 2 2023-01-11T21:38:06.6740724Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6740807Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6740944Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6741080Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6741160Z ''') 2023-01-11T21:38:06.6741172Z 2023-01-11T21:38:06.6741176Z 2023-01-11T21:38:06.6741296Z async_compile.wait(globals()) 2023-01-11T21:38:06.6741379Z del async_compile 2023-01-11T21:38:06.6741384Z 2023-01-11T21:38:06.6741462Z def call(args): 2023-01-11T21:38:06.6741538Z arg0_1, = args 
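# Both graph outputs come from the single fused kernel above: in one pass over the
# 8 input elements it computes tmp4 = (x + 1) + 2 and tmp6 = x + 2, writing into
# the two (2, 2, 2) buffers allocated below, with every size-1 dim of the
# (1, 2, 1, 2, 2, 1, 1) input squeezed away. A hedged eager-mode equivalent
# (a reconstruction, not taken from the log):
#   x = torch.randn(1, 2, 1, 2, 2, 1, 1, device="cuda")
#   out0 = torch.squeeze(x) + 1 + 2   # (2, 2, 2)
#   out1 = torch.squeeze(x) + 2       # (2, 2, 2)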
2023-01-11T21:38:06.6741616Z args.clear() 2023-01-11T21:38:06.6741713Z with torch.cuda.device(0): 2023-01-11T21:38:06.6741921Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6742121Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6742218Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6742371Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6742447Z del arg0_1 2023-01-11T21:38:06.6742536Z return (buf0, buf1, ) 2023-01-11T21:38:06.6742541Z 2023-01-11T21:38:06.6742545Z 2023-01-11T21:38:06.6742628Z if __name__ == "__main__": 2023-01-11T21:38:06.6742750Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6742879Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6743109Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6743223Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6743228Z 2023-01-11T21:38:06.6743233Z 2023-01-11T21:38:06.6743333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6743409Z import torch 2023-01-11T21:38:06.6743488Z import random 2023-01-11T21:38:06.6743611Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6743737Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6743769Z 2023-01-11T21:38:06.6743855Z aten = torch.ops.aten 2023-01-11T21:38:06.6743987Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6744086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6744091Z 2023-01-11T21:38:06.6744166Z import triton 2023-01-11T21:38:06.6744264Z import triton.language as tl 2023-01-11T21:38:06.6744392Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6744534Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6744539Z 2023-01-11T21:38:06.6744544Z 2023-01-11T21:38:06.6744713Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6744789Z import triton 2023-01-11T21:38:06.6744876Z import triton.language as tl 2023-01-11T21:38:06.6744993Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6745096Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6745233Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6745364Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6745370Z 2023-01-11T21:38:06.6745839Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6745915Z @triton.jit 2023-01-11T21:38:06.6746060Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6746130Z xnumel = 8 2023-01-11T21:38:06.6746229Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6746361Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6746447Z xmask = xindex < xnumel 2023-01-11T21:38:06.6746521Z x0 = xindex 2023-01-11T21:38:06.6746737Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6746858Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6746927Z tmp1 = 1 2023-01-11T21:38:06.6747010Z tmp2 = tmp0 + tmp1 
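# fp16 variant of the same fused add_1/add_2 kernel: both loads are upcast with
# .to(tl.float32), the lines below finish tmp4 = (x + 1) + 2 and tmp6 = x + 2,
# and the results are stored back through the fp16 output pointers.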
2023-01-11T21:38:06.6747082Z tmp3 = 2 2023-01-11T21:38:06.6747163Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6747242Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6747406Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6747542Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6747623Z ''') 2023-01-11T21:38:06.6747628Z 2023-01-11T21:38:06.6747633Z 2023-01-11T21:38:06.6747728Z async_compile.wait(globals()) 2023-01-11T21:38:06.6747807Z del async_compile 2023-01-11T21:38:06.6747812Z 2023-01-11T21:38:06.6747889Z def call(args): 2023-01-11T21:38:06.6747964Z arg0_1, = args 2023-01-11T21:38:06.6748041Z args.clear() 2023-01-11T21:38:06.6748136Z with torch.cuda.device(0): 2023-01-11T21:38:06.6748337Z buf0 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6748546Z buf1 = empty_strided((2, 2, 2), (4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6748644Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6748795Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.6748873Z del arg0_1 2023-01-11T21:38:06.6748965Z return (buf0, buf1, ) 2023-01-11T21:38:06.6748970Z 2023-01-11T21:38:06.6748974Z 2023-01-11T21:38:06.6749056Z if __name__ == "__main__": 2023-01-11T21:38:06.6749175Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6749296Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6749530Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 1, 1), (8, 4, 4, 2, 1, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6749646Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6749651Z 2023-01-11T21:38:06.6749725Z ok (0.189s) 2023-01-11T21:38:06.6750256Z test_squeeze2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6750394Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6750659Z [2023-01-11 21:35:40,579] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 855 2023-01-11T21:38:06.6750924Z [2023-01-11 21:35:40,657] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 855 2023-01-11T21:38:06.6751340Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6751476Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6751733Z [2023-01-11 21:35:40,679] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 856 2023-01-11T21:38:06.6751987Z [2023-01-11 21:35:40,753] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 856 2023-01-11T21:38:06.6752001Z 2023-01-11T21:38:06.6752095Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6752174Z import torch 2023-01-11T21:38:06.6752255Z import random 2023-01-11T21:38:06.6752378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6752507Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6752513Z 2023-01-11T21:38:06.6752597Z aten = torch.ops.aten 2023-01-11T21:38:06.6752736Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6752831Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6752836Z 2023-01-11T21:38:06.6752911Z import triton 2023-01-11T21:38:06.6753005Z import triton.language as tl 2023-01-11T21:38:06.6753133Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6753300Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6753306Z 2023-01-11T21:38:06.6753311Z 2023-01-11T21:38:06.6753480Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6753559Z import triton 2023-01-11T21:38:06.6753656Z import triton.language as tl 2023-01-11T21:38:06.6753764Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6753871Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6754005Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6754132Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6754137Z 2023-01-11T21:38:06.6754559Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6754639Z @triton.jit 2023-01-11T21:38:06.6754787Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6754863Z xnumel = 16 2023-01-11T21:38:06.6754956Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6755088Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6755174Z xmask = xindex < xnumel 2023-01-11T21:38:06.6755251Z x0 = xindex 2023-01-11T21:38:06.6755445Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6755546Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6755621Z tmp1 = 1 2023-01-11T21:38:06.6755696Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6755769Z tmp3 = 2 2023-01-11T21:38:06.6755877Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6755955Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6756092Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6756229Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6756309Z ''') 2023-01-11T21:38:06.6756323Z 2023-01-11T21:38:06.6756329Z 2023-01-11T21:38:06.6756418Z async_compile.wait(globals()) 2023-01-11T21:38:06.6756498Z del async_compile 2023-01-11T21:38:06.6756503Z 2023-01-11T21:38:06.6756581Z def call(args): 2023-01-11T21:38:06.6756657Z arg0_1, = args 
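# Same fused add_1/add_2 pattern as the previous test, now over 16 elements. The
# two buffers below drop different size-1 dims of the (1, 2, 1, 2, 2, 2, 1) input,
# which is consistent with squeezing specific dims rather than all of them. A
# hedged reconstruction (not taken from the log):
#   x = torch.randn(1, 2, 1, 2, 2, 2, 1, device="cuda")
#   out0 = x.squeeze(2).squeeze(-1) + 1 + 2   # (1, 2, 2, 2, 2)
#   out1 = x.squeeze(0) + 2                   # (2, 1, 2, 2, 2, 1)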
2023-01-11T21:38:06.6756735Z args.clear() 2023-01-11T21:38:06.6756830Z with torch.cuda.device(0): 2023-01-11T21:38:06.6757056Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6757277Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6757377Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6757533Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6757609Z del arg0_1 2023-01-11T21:38:06.6757694Z return (buf0, buf1, ) 2023-01-11T21:38:06.6757699Z 2023-01-11T21:38:06.6757703Z 2023-01-11T21:38:06.6757785Z if __name__ == "__main__": 2023-01-11T21:38:06.6757908Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6758038Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6758268Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6758383Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6758389Z 2023-01-11T21:38:06.6758393Z 2023-01-11T21:38:06.6758494Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6758569Z import torch 2023-01-11T21:38:06.6758646Z import random 2023-01-11T21:38:06.6758767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6758898Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6758903Z 2023-01-11T21:38:06.6758989Z aten = torch.ops.aten 2023-01-11T21:38:06.6759121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6759218Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6759223Z 2023-01-11T21:38:06.6759325Z import triton 2023-01-11T21:38:06.6759422Z import triton.language as tl 2023-01-11T21:38:06.6759548Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6759689Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6759694Z 2023-01-11T21:38:06.6759698Z 2023-01-11T21:38:06.6759867Z triton_fused_add_1_add_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.6759947Z import triton 2023-01-11T21:38:06.6760035Z import triton.language as tl 2023-01-11T21:38:06.6760153Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6760259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6760396Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6760523Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6760528Z 2023-01-11T21:38:06.6760954Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6761029Z @triton.jit 2023-01-11T21:38:06.6761175Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6761243Z xnumel = 16 2023-01-11T21:38:06.6761345Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6761479Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6761564Z xmask = xindex < xnumel 2023-01-11T21:38:06.6761638Z x0 = xindex 2023-01-11T21:38:06.6761852Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6762000Z tmp5 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6762066Z tmp1 = 1 
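# As in the fp32 version above: the arithmetic below yields tmp4 = (x + 1) + 2 and
# tmp6 = x + 2 per element, computed in fp32 after the .to(tl.float32) upcasts and
# written to the two fp16 output buffers.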
2023-01-11T21:38:06.6762148Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6762220Z tmp3 = 2 2023-01-11T21:38:06.6762304Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6762383Z tmp6 = tmp5 + tmp3 2023-01-11T21:38:06.6762522Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.6762658Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.6762739Z ''') 2023-01-11T21:38:06.6762744Z 2023-01-11T21:38:06.6762755Z 2023-01-11T21:38:06.6762843Z async_compile.wait(globals()) 2023-01-11T21:38:06.6762922Z del async_compile 2023-01-11T21:38:06.6762927Z 2023-01-11T21:38:06.6763003Z def call(args): 2023-01-11T21:38:06.6763080Z arg0_1, = args 2023-01-11T21:38:06.6763159Z args.clear() 2023-01-11T21:38:06.6763254Z with torch.cuda.device(0): 2023-01-11T21:38:06.6763478Z buf0 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6763700Z buf1 = empty_strided((2, 1, 2, 2, 2, 1), (8, 8, 4, 2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6763795Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6763949Z triton_fused_add_1_add_2_0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6764030Z del arg0_1 2023-01-11T21:38:06.6764115Z return (buf0, buf1, ) 2023-01-11T21:38:06.6764120Z 2023-01-11T21:38:06.6764124Z 2023-01-11T21:38:06.6764206Z if __name__ == "__main__": 2023-01-11T21:38:06.6764326Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6764457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6764686Z arg0_1 = rand_strided((1, 2, 1, 2, 2, 2, 1), (16, 8, 8, 4, 2, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6764801Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6764809Z 2023-01-11T21:38:06.6764882Z ok (0.197s) 2023-01-11T21:38:06.6765369Z test_stack_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6765505Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6765768Z [2023-01-11 21:35:40,774] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 857 2023-01-11T21:38:06.6766032Z [2023-01-11 21:35:40,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 857 2023-01-11T21:38:06.6766449Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6766588Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6766847Z [2023-01-11 21:35:40,886] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 858 2023-01-11T21:38:06.6767117Z [2023-01-11 21:35:40,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 858 2023-01-11T21:38:06.6767123Z 2023-01-11T21:38:06.6767216Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6767293Z import torch 2023-01-11T21:38:06.6767369Z import random 2023-01-11T21:38:06.6767490Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6767617Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6767622Z 2023-01-11T21:38:06.6767708Z aten = torch.ops.aten 2023-01-11T21:38:06.6767874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6767965Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6767970Z 2023-01-11T21:38:06.6768047Z import triton 2023-01-11T21:38:06.6768141Z import triton.language as tl 2023-01-11T21:38:06.6768271Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6768414Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6768420Z 2023-01-11T21:38:06.6768424Z 2023-01-11T21:38:06.6768584Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6768661Z import triton 2023-01-11T21:38:06.6768762Z import triton.language as tl 2023-01-11T21:38:06.6768872Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6768975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6769109Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6769236Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6769244Z 2023-01-11T21:38:06.6769648Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6769726Z @triton.jit 2023-01-11T21:38:06.6769867Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6769945Z xnumel = 192 2023-01-11T21:38:06.6770037Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6770169Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6770256Z xmask = xindex < xnumel 2023-01-11T21:38:06.6770334Z x0 = xindex % 16 2023-01-11T21:38:06.6770407Z x2 = xindex 2023-01-11T21:38:06.6770506Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.6770645Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6770725Z ''') 2023-01-11T21:38:06.6770734Z 2023-01-11T21:38:06.6770739Z 2023-01-11T21:38:06.6770893Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.6770970Z import triton 2023-01-11T21:38:06.6771067Z import triton.language as tl 2023-01-11T21:38:06.6771182Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6771285Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6771454Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6771572Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6771585Z 2023-01-11T21:38:06.6771975Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: 
'*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6772049Z @triton.jit 2023-01-11T21:38:06.6772181Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6772255Z xnumel = 192 2023-01-11T21:38:06.6772354Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6772483Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6772565Z xmask = xindex < xnumel 2023-01-11T21:38:06.6772638Z x1 = (xindex // 16) 2023-01-11T21:38:06.6772708Z x2 = xindex 2023-01-11T21:38:06.6772808Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.6772943Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6773029Z ''') 2023-01-11T21:38:06.6773034Z 2023-01-11T21:38:06.6773039Z 2023-01-11T21:38:06.6773130Z async_compile.wait(globals()) 2023-01-11T21:38:06.6773207Z del async_compile 2023-01-11T21:38:06.6773212Z 2023-01-11T21:38:06.6773287Z def call(args): 2023-01-11T21:38:06.6773359Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6773433Z args.clear() 2023-01-11T21:38:06.6773525Z with torch.cuda.device(0): 2023-01-11T21:38:06.6773735Z buf2 = empty_strided((12, 16, 2), (32, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6773872Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.6773964Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6774101Z triton_fused_cat_0.run(arg0_1, buf0, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6774167Z del arg0_1 2023-01-11T21:38:06.6774286Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.6774423Z triton_fused_cat_1.run(arg1_1, buf1, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6774629Z del arg1_1 2023-01-11T21:38:06.6774709Z return (buf2, ) 2023-01-11T21:38:06.6774714Z 2023-01-11T21:38:06.6774719Z 2023-01-11T21:38:06.6774798Z if __name__ == "__main__": 2023-01-11T21:38:06.6774915Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6775042Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6775241Z arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6775445Z arg1_1 = rand_strided((12, 1), (1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6775564Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6775570Z 2023-01-11T21:38:06.6775574Z 2023-01-11T21:38:06.6775672Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6775747Z import torch 2023-01-11T21:38:06.6775824Z import random 2023-01-11T21:38:06.6775947Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6776070Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6776076Z 2023-01-11T21:38:06.6776150Z aten = torch.ops.aten 2023-01-11T21:38:06.6776286Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6776382Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6776387Z 2023-01-11T21:38:06.6776460Z import triton 2023-01-11T21:38:06.6776551Z import triton.language as tl 2023-01-11T21:38:06.6776676Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6776817Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6776822Z 2023-01-11T21:38:06.6776827Z 2023-01-11T21:38:06.6776978Z triton_fused_cat_0 = async_compile.triton(''' 2023-01-11T21:38:06.6777045Z import triton 
2023-01-11T21:38:06.6777189Z import triton.language as tl 2023-01-11T21:38:06.6777385Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6777490Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6777630Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6777755Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6777760Z 2023-01-11T21:38:06.6778163Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6778236Z @triton.jit 2023-01-11T21:38:06.6778370Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6778449Z xnumel = 192 2023-01-11T21:38:06.6778549Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6778687Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6778772Z xmask = xindex < xnumel 2023-01-11T21:38:06.6778850Z x0 = xindex % 16 2023-01-11T21:38:06.6778922Z x2 = xindex 2023-01-11T21:38:06.6779040Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.6779187Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6779281Z ''') 2023-01-11T21:38:06.6779286Z 2023-01-11T21:38:06.6779291Z 2023-01-11T21:38:06.6779459Z triton_fused_cat_1 = async_compile.triton(''' 2023-01-11T21:38:06.6779534Z import triton 2023-01-11T21:38:06.6779630Z import triton.language as tl 2023-01-11T21:38:06.6779752Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6779852Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6780042Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6780178Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6780183Z 2023-01-11T21:38:06.6780655Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6780731Z @triton.jit 2023-01-11T21:38:06.6780864Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6780941Z xnumel = 192 2023-01-11T21:38:06.6781039Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6781163Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6781247Z xmask = xindex < xnumel 2023-01-11T21:38:06.6781329Z x1 = (xindex // 16) 2023-01-11T21:38:06.6781403Z x2 = xindex 2023-01-11T21:38:06.6781523Z tmp0 = tl.load(in_ptr0 + (x1), xmask).to(tl.float32) 2023-01-11T21:38:06.6781663Z tl.store(out_ptr0 + (2*x2 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.6781750Z ''') 2023-01-11T21:38:06.6781755Z 2023-01-11T21:38:06.6781760Z 2023-01-11T21:38:06.6781856Z async_compile.wait(globals()) 2023-01-11T21:38:06.6781929Z del async_compile 2023-01-11T21:38:06.6781934Z 2023-01-11T21:38:06.6782015Z def call(args): 2023-01-11T21:38:06.6782095Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.6782174Z args.clear() 2023-01-11T21:38:06.6782268Z with torch.cuda.device(0): 2023-01-11T21:38:06.6782478Z buf2 = empty_strided((12, 16, 2), (32, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6782590Z buf0 = as_strided(buf2, (12, 16, 1), (32, 2, 1)) # alias 2023-01-11T21:38:06.6782678Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.6782817Z triton_fused_cat_0.run(arg0_1, buf0, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6782891Z del arg0_1 2023-01-11T21:38:06.6783011Z buf1 = as_strided(buf2, (12, 16, 1), (32, 2, 1), 1) # alias 2023-01-11T21:38:06.6783153Z triton_fused_cat_1.run(arg1_1, buf1, 192, grid=grid(192), stream=stream0) 2023-01-11T21:38:06.6783229Z del arg1_1 2023-01-11T21:38:06.6783309Z return (buf2, ) 2023-01-11T21:38:06.6783314Z 2023-01-11T21:38:06.6783319Z 2023-01-11T21:38:06.6783427Z if __name__ == "__main__": 2023-01-11T21:38:06.6783542Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6783670Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6783873Z arg0_1 = rand_strided((1, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6784075Z arg1_1 = rand_strided((12, 1), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6784198Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.6784203Z 2023-01-11T21:38:06.6784276Z ok (0.223s) 2023-01-11T21:38:06.6784727Z test_std_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6784874Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6785135Z [2023-01-11 21:35:41,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 859 2023-01-11T21:38:06.6785342Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf10') 2023-01-11T21:38:06.6785551Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.6785755Z [2023-01-11 21:35:41,112] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6785961Z [2023-01-11 21:35:41,118] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6786189Z [2023-01-11 21:35:41,118] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.6786398Z [2023-01-11 21:35:41,123] torch._inductor.scheduler: [DEBUG] remove_buffer('buf15') 2023-01-11T21:38:06.6786602Z [2023-01-11 21:35:41,123] torch._inductor.scheduler: [DEBUG] remove_buffer('buf12') 2023-01-11T21:38:06.6786607Z 2023-01-11T21:38:06.6786709Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6786779Z import torch 2023-01-11T21:38:06.6786856Z import random 2023-01-11T21:38:06.6786978Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6787106Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6787112Z 2023-01-11T21:38:06.6787197Z aten = torch.ops.aten 2023-01-11T21:38:06.6787336Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6787434Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6787440Z 2023-01-11T21:38:06.6787518Z import triton 2023-01-11T21:38:06.6787606Z import triton.language as tl 2023-01-11T21:38:06.6787737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6787880Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6787885Z 2023-01-11T21:38:06.6787890Z 2023-01-11T21:38:06.6788064Z triton_fused_std_var_var_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6788144Z 
import triton 2023-01-11T21:38:06.6788239Z import triton.language as tl 2023-01-11T21:38:06.6788356Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6788455Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6788589Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6788719Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6788724Z 2023-01-11T21:38:06.6788817Z @reduction(size_hints=[1, 256], 2023-01-11T21:38:06.6788935Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6789023Z filename=__file__, 2023-01-11T21:38:06.6789466Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr1', 'in_out_ptr0', 'in_out_ptr2'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6789549Z @triton.jit 2023-01-11T21:38:06.6789776Z def triton_(in_out_ptr0, in_out_ptr1, in_out_ptr2, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6789848Z xnumel = 1 2023-01-11T21:38:06.6789923Z rnumel = 256 2023-01-11T21:38:06.6790023Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6790161Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6790247Z xmask = xindex < xnumel 2023-01-11T21:38:06.6790368Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6790489Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6790591Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6790686Z rindex = roffset + rbase 2023-01-11T21:38:06.6790776Z rmask = rindex < rnumel 2023-01-11T21:38:06.6790854Z r0 = rindex 2023-01-11T21:38:06.6791051Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6791175Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6791294Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6791406Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6791526Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6791633Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6791723Z rindex = roffset + rbase 2023-01-11T21:38:06.6791810Z rmask = rindex < rnumel 2023-01-11T21:38:06.6791884Z r0 = rindex 2023-01-11T21:38:06.6792079Z tmp2 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6792149Z tmp3 = 256 2023-01-11T21:38:06.6792261Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6792378Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6792461Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6792584Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6792704Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6792823Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6792929Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6793049Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6793169Z _tmp15 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6793280Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6793371Z rindex = roffset + rbase 2023-01-11T21:38:06.6793458Z rmask = rindex < rnumel 2023-01-11T21:38:06.6793532Z r0 = rindex 2023-01-11T21:38:06.6793721Z tmp9 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6793801Z tmp10 = 256 2023-01-11T21:38:06.6793885Z tmp11 = tmp8 / 
tmp10 2023-01-11T21:38:06.6794004Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6794091Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6794216Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6794342Z _tmp15 = tl.where(xmask & rmask, _tmp15 + tmp9, _tmp15) 2023-01-11T21:38:06.6794453Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6794570Z tmp15 = tl.reshape(tl.sum(_tmp15, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6794688Z _tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6794798Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6794888Z rindex = roffset + rbase 2023-01-11T21:38:06.6794975Z rmask = rindex < rnumel 2023-01-11T21:38:06.6795051Z r0 = rindex 2023-01-11T21:38:06.6795149Z tmp16 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.6795226Z tmp17 = 256 2023-01-11T21:38:06.6795315Z tmp18 = tmp15 / tmp17 2023-01-11T21:38:06.6795435Z tmp19 = tmp16 - tmp18 2023-01-11T21:38:06.6795524Z tmp20 = tmp19 * tmp19 2023-01-11T21:38:06.6795650Z _tmp21 = tl.where(xmask & rmask, _tmp21 + tmp20, _tmp21) 2023-01-11T21:38:06.6795767Z tmp21 = tl.reshape(tl.sum(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6795862Z tmp22 = 256 2023-01-11T21:38:06.6795948Z tmp23 = tmp21 / tmp22 2023-01-11T21:38:06.6796035Z tmp24 = tl.sqrt(tmp23) 2023-01-11T21:38:06.6796121Z tmp25 = tmp14 / tmp22 2023-01-11T21:38:06.6796196Z tmp26 = 255 2023-01-11T21:38:06.6796279Z tmp27 = tmp7 / tmp26 2023-01-11T21:38:06.6796422Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp24, None) 2023-01-11T21:38:06.6796555Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp25, None) 2023-01-11T21:38:06.6796692Z tl.store(in_out_ptr2 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp27, None) 2023-01-11T21:38:06.6796783Z ''') 2023-01-11T21:38:06.6796789Z 2023-01-11T21:38:06.6796793Z 2023-01-11T21:38:06.6796962Z triton_fused_var_2_var_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.6797040Z import triton 2023-01-11T21:38:06.6797137Z import triton.language as tl 2023-01-11T21:38:06.6797254Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6797362Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6797489Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6797616Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6797621Z 2023-01-11T21:38:06.6797713Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6797833Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6797920Z filename=__file__, 2023-01-11T21:38:06.6798332Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6798433Z @triton.jit 2023-01-11T21:38:06.6798619Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6798688Z xnumel = 32 2023-01-11T21:38:06.6798763Z rnumel = 8 2023-01-11T21:38:06.6798866Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6799004Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6799089Z xmask = xindex < xnumel 2023-01-11T21:38:06.6799211Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6799284Z x0 = xindex 2023-01-11T21:38:06.6799396Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6799503Z for roffset in range(0, 
rnumel, RBLOCK): 2023-01-11T21:38:06.6799593Z rindex = roffset + rbase 2023-01-11T21:38:06.6799681Z rmask = rindex < rnumel 2023-01-11T21:38:06.6799756Z r1 = rindex 2023-01-11T21:38:06.6799978Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6800102Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6800211Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6800329Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6800451Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6800559Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6800652Z rindex = roffset + rbase 2023-01-11T21:38:06.6800742Z rmask = rindex < rnumel 2023-01-11T21:38:06.6800818Z r1 = rindex 2023-01-11T21:38:06.6801029Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6801107Z tmp3 = 8 2023-01-11T21:38:06.6801191Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6801310Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6801393Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6801520Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6801643Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6801753Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6801870Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6802017Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6802125Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6802215Z rindex = roffset + rbase 2023-01-11T21:38:06.6802304Z rmask = rindex < rnumel 2023-01-11T21:38:06.6802379Z r1 = rindex 2023-01-11T21:38:06.6802492Z tmp9 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6802568Z tmp10 = 8 2023-01-11T21:38:06.6802654Z tmp11 = tmp8 / tmp10 2023-01-11T21:38:06.6802773Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6802863Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6802993Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6803115Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6803182Z tmp15 = 7 2023-01-11T21:38:06.6803266Z tmp16 = tmp7 / tmp15 2023-01-11T21:38:06.6803342Z tmp17 = 8 2023-01-11T21:38:06.6803424Z tmp18 = tmp14 / tmp17 2023-01-11T21:38:06.6803572Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp16, xmask) 2023-01-11T21:38:06.6803712Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp18, xmask) 2023-01-11T21:38:06.6803803Z ''') 2023-01-11T21:38:06.6803808Z 2023-01-11T21:38:06.6803812Z 2023-01-11T21:38:06.6803981Z triton_fused_std_1_std_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6804051Z import triton 2023-01-11T21:38:06.6804145Z import triton.language as tl 2023-01-11T21:38:06.6804261Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6804365Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6804501Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6804655Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6804661Z 2023-01-11T21:38:06.6804753Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6804868Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6804957Z filename=__file__, 2023-01-11T21:38:06.6805366Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 
'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6805445Z @triton.jit 2023-01-11T21:38:06.6805654Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6805739Z xnumel = 32 2023-01-11T21:38:06.6805831Z rnumel = 8 2023-01-11T21:38:06.6805929Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6806060Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6806148Z xmask = xindex < xnumel 2023-01-11T21:38:06.6806273Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6806347Z x0 = xindex 2023-01-11T21:38:06.6806465Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6806575Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6806666Z rindex = roffset + rbase 2023-01-11T21:38:06.6806748Z rmask = rindex < rnumel 2023-01-11T21:38:06.6806825Z r1 = rindex 2023-01-11T21:38:06.6807043Z tmp0 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6807165Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6807282Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6807400Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6807518Z _tmp8 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6807620Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6807710Z rindex = roffset + rbase 2023-01-11T21:38:06.6807799Z rmask = rindex < rnumel 2023-01-11T21:38:06.6807874Z r1 = rindex 2023-01-11T21:38:06.6808122Z tmp2 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6808199Z tmp3 = 8 2023-01-11T21:38:06.6808284Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.6808393Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.6808476Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.6808599Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.6808720Z _tmp8 = tl.where(xmask & rmask, _tmp8 + tmp2, _tmp8) 2023-01-11T21:38:06.6808836Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6808950Z tmp8 = tl.reshape(tl.sum(_tmp8, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6809069Z _tmp14 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6809173Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6809265Z rindex = roffset + rbase 2023-01-11T21:38:06.6809353Z rmask = rindex < rnumel 2023-01-11T21:38:06.6809427Z r1 = rindex 2023-01-11T21:38:06.6809546Z tmp9 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask) 2023-01-11T21:38:06.6809625Z tmp10 = 8 2023-01-11T21:38:06.6809712Z tmp11 = tmp8 / tmp10 2023-01-11T21:38:06.6809824Z tmp12 = tmp9 - tmp11 2023-01-11T21:38:06.6809910Z tmp13 = tmp12 * tmp12 2023-01-11T21:38:06.6810034Z _tmp14 = tl.where(xmask & rmask, _tmp14 + tmp13, _tmp14) 2023-01-11T21:38:06.6810152Z tmp14 = tl.reshape(tl.sum(_tmp14, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6810230Z tmp15 = 7 2023-01-11T21:38:06.6810314Z tmp16 = tmp7 / tmp15 2023-01-11T21:38:06.6810404Z tmp17 = tl.sqrt(tmp16) 2023-01-11T21:38:06.6810470Z tmp18 = 8 2023-01-11T21:38:06.6810555Z tmp19 = tmp14 / tmp18 2023-01-11T21:38:06.6810681Z tmp20 = tl.sqrt(tmp19) 2023-01-11T21:38:06.6810822Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.6810961Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.6811049Z ''') 2023-01-11T21:38:06.6811055Z 
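The reduction kernels above and the pointwise triton_fused_std_3_3 that follows were all emitted for a single compiled graph (graph 859, from test_std_cuda) taking std/var reductions of a (2, 4, 4, 8) CUDA tensor over different dims. A minimal sketch of the kind of user code that produces this shape of dump; the exact set of reduction dims is illustrative, not copied from the test source:

import torch

def fn(x):
    # a mix of full-tensor and per-dim reductions, loosely mirroring test_std_cuda
    return x.std(), x.var(), x.std([1, 3]), x.var([1, 3]), x.std(2, keepdim=True)

x = torch.randn(2, 4, 4, 8, device="cuda")  # needs a CUDA device, as on this runner
expected = fn(x)
actual = torch.compile(fn)(x)  # dynamo traces fn, then Inductor emits Triton kernels like those above

Each distinct reduction layout (whole tensor, inner dims, outer dims, keepdim slice) gets its own generated kernel, which is why one small function fans out into four Triton sources in this dump.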
2023-01-11T21:38:06.6811059Z 2023-01-11T21:38:06.6811224Z triton_fused_std_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6811301Z import triton 2023-01-11T21:38:06.6811389Z import triton.language as tl 2023-01-11T21:38:06.6811506Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6811612Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6811746Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.6811876Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6811881Z 2023-01-11T21:38:06.6812288Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.6812368Z @triton.jit 2023-01-11T21:38:06.6812495Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.6812574Z xnumel = 64 2023-01-11T21:38:06.6812673Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6812809Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.6812895Z xmask = xindex < xnumel 2023-01-11T21:38:06.6812973Z x0 = xindex % 8 2023-01-11T21:38:06.6813054Z x1 = (xindex // 8) 2023-01-11T21:38:06.6813120Z x2 = xindex 2023-01-11T21:38:06.6813229Z tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813344Z tmp1 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813457Z tmp3 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813567Z tmp5 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask) 2023-01-11T21:38:06.6813647Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6813731Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.6813804Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.6813877Z tmp7 = 4 2023-01-11T21:38:06.6813957Z tmp8 = tmp6 / tmp7 2023-01-11T21:38:06.6814067Z tmp9 = tmp0 - tmp8 2023-01-11T21:38:06.6814149Z tmp10 = tmp9 * tmp9 2023-01-11T21:38:06.6814290Z tmp11 = tmp1 - tmp8 2023-01-11T21:38:06.6814377Z tmp12 = tmp11 * tmp11 2023-01-11T21:38:06.6814453Z tmp13 = tmp10 + tmp12 2023-01-11T21:38:06.6814694Z tmp14 = tmp3 - tmp8 2023-01-11T21:38:06.6814777Z tmp15 = tmp14 * tmp14 2023-01-11T21:38:06.6814861Z tmp16 = tmp13 + tmp15 2023-01-11T21:38:06.6814972Z tmp17 = tmp5 - tmp8 2023-01-11T21:38:06.6815054Z tmp18 = tmp17 * tmp17 2023-01-11T21:38:06.6815136Z tmp19 = tmp16 + tmp18 2023-01-11T21:38:06.6815202Z tmp20 = 3 2023-01-11T21:38:06.6815286Z tmp21 = tmp19 / tmp20 2023-01-11T21:38:06.6815371Z tmp22 = tl.sqrt(tmp21) 2023-01-11T21:38:06.6815511Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.6815601Z ''') 2023-01-11T21:38:06.6815606Z 2023-01-11T21:38:06.6815610Z 2023-01-11T21:38:06.6815708Z async_compile.wait(globals()) 2023-01-11T21:38:06.6815787Z del async_compile 2023-01-11T21:38:06.6815792Z 2023-01-11T21:38:06.6815862Z def call(args): 2023-01-11T21:38:06.6815938Z arg0_1, = args 2023-01-11T21:38:06.6816017Z args.clear() 2023-01-11T21:38:06.6816112Z with torch.cuda.device(0): 2023-01-11T21:38:06.6816303Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816495Z buf3 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816687Z buf11 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6816778Z buf21 = buf11; del buf11 # reuse 2023-01-11T21:38:06.6816870Z buf20 = buf3; del buf3 # reuse 2023-01-11T21:38:06.6816962Z buf19 = buf1; del buf1 # reuse 
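# The "bufN = bufM; del bufM  # reuse" lines above rebind freshly allocated
# scalar buffers under the names the kernel mutates in place: buf19/buf20/
# buf21 alias buf1/buf3/buf11 and are passed as the in_out_ptr arguments of
# triton_fused_std_var_var_1_0 (cf. mutated_arg_names in its @reduction meta).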
2023-01-11T21:38:06.6817057Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6817332Z triton_fused_std_var_var_1_0.run(buf21, buf20, buf19, arg0_1, 1, 256, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.6817544Z buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6817753Z buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6817848Z buf6 = buf5; del buf5 # reuse 2023-01-11T21:38:06.6817934Z buf9 = buf8; del buf8 # reuse 2023-01-11T21:38:06.6818086Z triton_fused_var_2_var_3_1.run(buf6, buf9, arg0_1, 32, 8, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.6818288Z buf13 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6818488Z buf16 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6818585Z buf14 = buf13; del buf13 # reuse 2023-01-11T21:38:06.6818682Z buf17 = buf16; del buf16 # reuse 2023-01-11T21:38:06.6818840Z triton_fused_std_1_std_2_2.run(buf14, buf17, arg0_1, 32, 8, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.6819055Z buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6819190Z triton_fused_std_3_3.run(arg0_1, buf18, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.6819270Z del arg0_1 2023-01-11T21:38:06.6819398Z return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, ) 2023-01-11T21:38:06.6819404Z 2023-01-11T21:38:06.6819408Z 2023-01-11T21:38:06.6819492Z if __name__ == "__main__": 2023-01-11T21:38:06.6819613Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6819743Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6819960Z arg0_1 = rand_strided((2, 4, 4, 8), (128, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6820074Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6820337Z [2023-01-11 21:35:41,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 859 2023-01-11T21:38:06.6820798Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6820936Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6821197Z [2023-01-11 21:35:41,614] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 860 2023-01-11T21:38:06.6821411Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf10') 2023-01-11T21:38:06.6821620Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.6821824Z [2023-01-11 21:35:41,704] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.6822034Z [2023-01-11 21:35:41,709] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.6822238Z [2023-01-11 21:35:41,709] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.6829207Z [2023-01-11 21:35:41,714] torch._inductor.scheduler: [DEBUG] remove_buffer('buf15') 2023-01-11T21:38:06.6829458Z [2023-01-11 21:35:41,714] torch._inductor.scheduler: [DEBUG] remove_buffer('buf12') 2023-01-11T21:38:06.6829475Z 2023-01-11T21:38:06.6829480Z 2023-01-11T21:38:06.6829577Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6829656Z import torch 2023-01-11T21:38:06.6829736Z import random 2023-01-11T21:38:06.6829859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6829990Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6829995Z 2023-01-11T21:38:06.6830080Z aten = torch.ops.aten 2023-01-11T21:38:06.6830219Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6830392Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6830397Z 2023-01-11T21:38:06.6830475Z import triton 2023-01-11T21:38:06.6830572Z import triton.language as tl 2023-01-11T21:38:06.6830701Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6830846Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6830851Z 2023-01-11T21:38:06.6830855Z 2023-01-11T21:38:06.6831026Z triton_fused_std_var_var_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6831107Z import triton 2023-01-11T21:38:06.6831195Z import triton.language as tl 2023-01-11T21:38:06.6831311Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6831415Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6831549Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6831678Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6831683Z 2023-01-11T21:38:06.6831777Z @reduction(size_hints=[1, 256], 2023-01-11T21:38:06.6831897Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6831986Z filename=__file__, 2023-01-11T21:38:06.6832426Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr1', 'in_out_ptr0', 'in_out_ptr2'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 5), equal_to_1=())]}) 2023-01-11T21:38:06.6832502Z @triton.jit 2023-01-11T21:38:06.6832698Z def triton_(in_out_ptr0, in_out_ptr1, in_out_ptr2, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6832775Z xnumel = 1 2023-01-11T21:38:06.6832851Z rnumel = 256 2023-01-11T21:38:06.6832953Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6833091Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6833176Z xmask 
= xindex < xnumel 2023-01-11T21:38:06.6833291Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6833413Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6833521Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6833612Z rindex = roffset + rbase 2023-01-11T21:38:06.6833700Z rmask = rindex < rnumel 2023-01-11T21:38:06.6833808Z r0 = rindex 2023-01-11T21:38:06.6834030Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6834118Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6834241Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6834360Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6834478Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6834598Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6834707Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6834798Z rindex = roffset + rbase 2023-01-11T21:38:06.6834881Z rmask = rindex < rnumel 2023-01-11T21:38:06.6834956Z r0 = rindex 2023-01-11T21:38:06.6835178Z tmp3 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6835254Z tmp4 = 256 2023-01-11T21:38:06.6835361Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6835463Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6835600Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6835678Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6835800Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6835893Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6836018Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6836135Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6836251Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6836375Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6836515Z _tmp20 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6836624Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6836714Z rindex = roffset + rbase 2023-01-11T21:38:06.6836803Z rmask = rindex < rnumel 2023-01-11T21:38:06.6836877Z r0 = rindex 2023-01-11T21:38:06.6837101Z tmp12 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6837179Z tmp13 = 256 2023-01-11T21:38:06.6837258Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6837351Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6837471Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6837556Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6837683Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6837776Z tmp19 = tmp12.to(tl.float32) 2023-01-11T21:38:06.6837898Z _tmp20 = tl.where(xmask & rmask, _tmp20 + tmp19, _tmp20) 2023-01-11T21:38:06.6838009Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6838128Z tmp20 = tl.reshape(tl.sum(_tmp20, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6838247Z _tmp27 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6838358Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6838451Z rindex = roffset + rbase 2023-01-11T21:38:06.6838541Z rmask = rindex < rnumel 2023-01-11T21:38:06.6838615Z r0 = rindex 2023-01-11T21:38:06.6838730Z tmp21 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.6838805Z tmp22 = 256 2023-01-11T21:38:06.6838890Z tmp23 = tmp20 / tmp22 2023-01-11T21:38:06.6838981Z tmp24 = tmp23.to(tl.float32) 2023-01-11T21:38:06.6839102Z tmp25 = tmp21 - 
tmp24 2023-01-11T21:38:06.6839187Z tmp26 = tmp25 * tmp25 2023-01-11T21:38:06.6839312Z _tmp27 = tl.where(xmask & rmask, _tmp27 + tmp26, _tmp27) 2023-01-11T21:38:06.6839421Z tmp27 = tl.reshape(tl.sum(_tmp27, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6839500Z tmp28 = 256 2023-01-11T21:38:06.6839584Z tmp29 = tmp27 / tmp28 2023-01-11T21:38:06.6839670Z tmp30 = tl.sqrt(tmp29) 2023-01-11T21:38:06.6839752Z tmp31 = tmp18 / tmp28 2023-01-11T21:38:06.6839826Z tmp32 = 255 2023-01-11T21:38:06.6839909Z tmp33 = tmp9 / tmp32 2023-01-11T21:38:06.6840072Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp30, None) 2023-01-11T21:38:06.6840217Z tl.store(in_out_ptr1 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp31, None) 2023-01-11T21:38:06.6840356Z tl.store(in_out_ptr2 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp33, None) 2023-01-11T21:38:06.6840444Z ''') 2023-01-11T21:38:06.6840450Z 2023-01-11T21:38:06.6840454Z 2023-01-11T21:38:06.6840623Z triton_fused_var_2_var_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.6840702Z import triton 2023-01-11T21:38:06.6840798Z import triton.language as tl 2023-01-11T21:38:06.6840915Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6841014Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6841151Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6841278Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6841283Z 2023-01-11T21:38:06.6841376Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6841497Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6841585Z filename=__file__, 2023-01-11T21:38:06.6841995Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6842071Z @triton.jit 2023-01-11T21:38:06.6842248Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6842324Z xnumel = 32 2023-01-11T21:38:06.6842400Z rnumel = 8 2023-01-11T21:38:06.6842500Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6842666Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6842752Z xmask = xindex < xnumel 2023-01-11T21:38:06.6842873Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6842939Z x0 = xindex 2023-01-11T21:38:06.6843060Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6843168Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6843257Z rindex = roffset + rbase 2023-01-11T21:38:06.6843343Z rmask = rindex < rnumel 2023-01-11T21:38:06.6843417Z r1 = rindex 2023-01-11T21:38:06.6843661Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6843749Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6843877Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6843993Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6844114Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6844235Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6844342Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6844433Z rindex = roffset + rbase 2023-01-11T21:38:06.6844513Z rmask = rindex < rnumel 2023-01-11T21:38:06.6844592Z r1 = rindex 2023-01-11T21:38:06.6844833Z tmp3 = tl.load(in_ptr0 + 
(r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6844908Z tmp4 = 8 2023-01-11T21:38:06.6844992Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6845086Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6845201Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6845278Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6845407Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6845503Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6845628Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6845748Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6845864Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6845982Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6846109Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6846202Z rindex = roffset + rbase 2023-01-11T21:38:06.6846290Z rmask = rindex < rnumel 2023-01-11T21:38:06.6846364Z r1 = rindex 2023-01-11T21:38:06.6846502Z tmp12 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6846578Z tmp13 = 8 2023-01-11T21:38:06.6846663Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6846750Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6846871Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6846956Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6847080Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6847202Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6847276Z tmp19 = 7 2023-01-11T21:38:06.6847360Z tmp20 = tmp9 / tmp19 2023-01-11T21:38:06.6847427Z tmp21 = 8 2023-01-11T21:38:06.6847512Z tmp22 = tmp18 / tmp21 2023-01-11T21:38:06.6847657Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp20, xmask) 2023-01-11T21:38:06.6847802Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.6847890Z ''') 2023-01-11T21:38:06.6847896Z 2023-01-11T21:38:06.6847900Z 2023-01-11T21:38:06.6848068Z triton_fused_std_1_std_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.6848146Z import triton 2023-01-11T21:38:06.6848242Z import triton.language as tl 2023-01-11T21:38:06.6848353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6848458Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6848594Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6848748Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6848754Z 2023-01-11T21:38:06.6848845Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6848967Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.6849056Z filename=__file__, 2023-01-11T21:38:06.6849469Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6849538Z @triton.jit 2023-01-11T21:38:06.6849724Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6849800Z xnumel = 32 2023-01-11T21:38:06.6849874Z rnumel = 8 2023-01-11T21:38:06.6849973Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6850111Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6850201Z xmask = xindex < xnumel 2023-01-11T21:38:06.6850318Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6850391Z x0 = 
xindex 2023-01-11T21:38:06.6850509Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6850618Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6850711Z rindex = roffset + rbase 2023-01-11T21:38:06.6850799Z rmask = rindex < rnumel 2023-01-11T21:38:06.6850873Z r1 = rindex 2023-01-11T21:38:06.6851107Z tmp0 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6851203Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.6851326Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6851443Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6851566Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6851686Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6851797Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6851880Z rindex = roffset + rbase 2023-01-11T21:38:06.6851969Z rmask = rindex < rnumel 2023-01-11T21:38:06.6852046Z r1 = rindex 2023-01-11T21:38:06.6852358Z tmp3 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.6852435Z tmp4 = 8 2023-01-11T21:38:06.6852519Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.6852613Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.6852722Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.6852808Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.6852930Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.6853022Z tmp10 = tmp3.to(tl.float32) 2023-01-11T21:38:06.6853147Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.6853264Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6853385Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6853504Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6853604Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6853695Z rindex = roffset + rbase 2023-01-11T21:38:06.6853785Z rmask = rindex < rnumel 2023-01-11T21:38:06.6853859Z r1 = rindex 2023-01-11T21:38:06.6853995Z tmp12 = tl.load(in_ptr0 + (x0 + (32*r1)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6854072Z tmp13 = 8 2023-01-11T21:38:06.6854152Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.6854245Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.6854366Z tmp16 = tmp12 - tmp15 2023-01-11T21:38:06.6854452Z tmp17 = tmp16 * tmp16 2023-01-11T21:38:06.6854729Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.6854846Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6854982Z tmp19 = 7 2023-01-11T21:38:06.6855057Z tmp20 = tmp9 / tmp19 2023-01-11T21:38:06.6855140Z tmp21 = tl.sqrt(tmp20) 2023-01-11T21:38:06.6855212Z tmp22 = 8 2023-01-11T21:38:06.6855293Z tmp23 = tmp18 / tmp22 2023-01-11T21:38:06.6855375Z tmp24 = tl.sqrt(tmp23) 2023-01-11T21:38:06.6855519Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp21, xmask) 2023-01-11T21:38:06.6855659Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.6855740Z ''') 2023-01-11T21:38:06.6855756Z 2023-01-11T21:38:06.6855760Z 2023-01-11T21:38:06.6855911Z triton_fused_std_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.6855986Z import triton 2023-01-11T21:38:06.6856079Z import triton.language as tl 2023-01-11T21:38:06.6856196Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6856298Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6856431Z from 
torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 64
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 8
    x1 = (xindex // 8)
    x2 = xindex
    tmp0 = tl.load(in_ptr0 + (x0 + (32*x1)), xmask).to(tl.float32)
    tmp2 = tl.load(in_ptr0 + (8 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp5 = tl.load(in_ptr0 + (16 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp8 = tl.load(in_ptr0 + (24 + x0 + (32*x1)), xmask).to(tl.float32)
    tmp1 = tmp0.to(tl.float32)
    tmp3 = tmp2.to(tl.float32)
    tmp4 = tmp1 + tmp3
    tmp6 = tmp5.to(tl.float32)
    tmp7 = tmp4 + tmp6
    tmp9 = tmp8.to(tl.float32)
    tmp10 = tmp7 + tmp9
    tmp11 = 4
    tmp12 = tmp10 / tmp11
    tmp13 = tmp12.to(tl.float32)
    tmp14 = tmp0 - tmp13
    tmp15 = tmp14 * tmp14
    tmp16 = tmp2 - tmp13
    tmp17 = tmp16 * tmp16
    tmp18 = tmp15 + tmp17
    tmp19 = tmp5 - tmp13
    tmp20 = tmp19 * tmp19
    tmp21 = tmp18 + tmp20
    tmp22 = tmp8 - tmp13
    tmp23 = tmp22 * tmp22
    tmp24 = tmp21 + tmp23
    tmp25 = 3
    tmp26 = tmp24 / tmp25
    tmp27 = tl.sqrt(tmp26)
    tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp27, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf1 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf3 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf11 = empty_strided((), (), device='cuda', dtype=torch.float16)
        buf21 = buf11; del buf11  # reuse
        buf20 = buf3; del buf3  # reuse
        buf19 = buf1; del buf1  # reuse
        stream0 = get_cuda_stream(0)
        triton_fused_std_var_var_1_0.run(buf21, buf20, buf19, arg0_1, 1, 256, grid=grid(1), stream=stream0)
        buf5 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16)
        buf8 = empty_strided((2, 4, 4), (16, 4, 1), device='cuda', dtype=torch.float16)
        buf6 = buf5; del buf5  # reuse
        buf9 = buf8; del buf8  # reuse
        triton_fused_var_2_var_3_1.run(buf6, buf9, arg0_1, 32, 8, grid=grid(32), stream=stream0)
        buf13 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf16 = empty_strided((4, 8), (8, 1), device='cuda', dtype=torch.float16)
        buf14 = buf13; del buf13  # reuse
        buf17 = buf16; del buf16  # reuse
        triton_fused_std_1_std_2_2.run(buf14, buf17, arg0_1, 32, 8, grid=grid(32), stream=stream0)
        buf18 = empty_strided((2, 4, 1, 8), (32, 8, 8, 1), device='cuda', dtype=torch.float16)
        triton_fused_std_3_3.run(arg0_1, buf18, 64, grid=grid(64), stream=stream0)
        del arg0_1
        return (buf19, buf20, buf6, buf9, buf21, buf14, buf17, buf18, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 4, 4, 8), (128, 32, 8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:41,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 860

ok (0.902s)
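Note: the size-64 pointwise kernel above unrolls a length-4 reduction in registers: four fp16 loads are upcast to fp32, averaged (tmp10 / 4), the squared deviations are summed and divided by 3 (Bessel's correction), and the square root is stored back as fp16, i.e. an unbiased std over one length-4 dimension. A minimal eager-mode sketch of the same arithmetic, assuming the (2, 4, 4, 8) fp16 input from the rand_strided call above and reduction over dim=2 (variable names here are illustrative, not from the log):

import torch

x = torch.randn(2, 4, 4, 8, device='cuda', dtype=torch.float16)
acc = x.to(torch.float32)                    # fp16 loads are upcast before accumulating
mean = acc.sum(dim=2, keepdim=True) / 4      # tmp10 / tmp11
var = ((acc - mean) ** 2).sum(dim=2) / 3     # squared deviations / (N - 1)
std = var.sqrt().to(torch.float16)           # tmp27, stored back as fp16
print(torch.allclose(std.float(), x.std(dim=2).float(), rtol=1e-2, atol=1e-2))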
test_strided_inputs_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:41,892] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 861
[2023-01-11 21:35:41,960] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 861

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 128
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (2*x0), xmask)
    tmp1 = tl.load(in_ptr1 + (x0), xmask)
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 16), (16, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_0.run(arg0_1, arg1_1, buf0, 128, grid=grid(128), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 16), (32, 2), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 16), (16, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.259s)
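Note: this test feeds the kernel a non-contiguous view: arg0_1 has shape (8, 16) but strides (32, 2), and Inductor folds those strides into the flat index, which is why the kernel reads in_ptr0 + (2*x0) while in_ptr1 is read contiguously. A small illustration of the same layout with as_strided (hypothetical names, not from the log):

import torch

# Element [i, j] of an (8, 16) view with strides (32, 2) lives at offset
# 32*i + 2*j = 2*(16*i + j), i.e. exactly the 2*x0 indexing above.
base = torch.randn(8 * 32, device='cuda')
x = base.as_strided((8, 16), (32, 2))
y = torch.randn(8, 16, device='cuda')
out = x + y                            # what call() materializes into buf0
print(x.is_contiguous(), out.shape)    # False torch.Size([8, 16])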
test_sum1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 862
[2023-01-11 21:35:42,233] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 862
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,249] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 863
[2023-01-11 21:35:42,327] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 863

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.189s)
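Note: both generated modules above (fp32 and fp16) use the same reduction template: a [XBLOCK, RBLOCK] fp32 accumulator is updated under rmask & xmask inside the roffset loop, then collapsed with tl.sum and stored once per row. A stripped-down sketch of that pattern against a recent Triton (the kernel name and signature here are mine, not Inductor's):

import torch
import triton
import triton.language as tl

@triton.jit
def row_sum_kernel(in_ptr, out_ptr, n_cols, BLOCK: tl.constexpr):
    # One program per row; the mask plays the role of rmask above.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK)
    mask = cols < n_cols
    vals = tl.load(in_ptr + row * n_cols + cols, mask=mask, other=0.0)
    tl.store(out_ptr + row, tl.sum(vals, axis=0))

x = torch.randn(8, 8, device='cuda')
out = torch.empty(8, device='cuda')
row_sum_kernel[(8,)](x, out, 8, BLOCK=8)
print(torch.allclose(out, x.sum(dim=1)))  # True up to float rounding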
test_sum2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 864
[2023-01-11 21:35:42,743] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 864
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:42,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 865

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.DEFAULT,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 168
    rnumel = 27
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex % 21
    x1 = (xindex // 21)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    x3 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r2 = rindex
        tmp0 = tl.load(in_ptr0 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last')
        tmp1 = tl.load(in_ptr1 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last')
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x3, tmp3, xmask)
''')


triton_fused_add_1_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 216
    rnumel = 21
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (21*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1 + (21*x0)), rmask & xmask)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 21), (21, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 168, 27, grid=grid(168), stream=stream0)
        buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_sum_2_1.run(arg0_1, arg1_1, buf1, 216, 21, grid=grid(216), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:43,093] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 865

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.DEFAULT,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 168
    rnumel = 27
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex % 21
    x1 = (xindex // 21)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    x3 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r2 = rindex
        tmp0 = tl.load(in_ptr0 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (x0 + (21*r2) + (567*x1)), rmask & xmask, eviction_policy='evict_last').to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x3, tmp3, xmask)
''')


triton_fused_add_1_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[256, 32],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 216
    rnumel = 21
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (21*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1 + (21*x0)), rmask & xmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3)
    tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, 21), (21, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 168, 27, grid=grid(168), stream=stream0)
        buf1 = empty_strided((8, 9, 3), (27, 3, 1), device='cuda', dtype=torch.float16)
        triton_fused_add_1_sum_2_1.run(arg0_1, arg1_1, buf1, 216, 21, grid=grid(216), stream=stream0)
        del arg0_1
        del arg1_1
        return (buf0, buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((8, 9, 3, 21), (567, 63, 21, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.766s)
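Note: test_sum2 lowers two reductions over the same (8, 9, 3, 21) inputs, and the hints track the reduced dimensions: the first kernel reduces the middle dims (stride-21 loads, hence eviction_policy='evict_last' and ReductionHint.DEFAULT), while the second reduces the contiguous innermost dim (ReductionHint.INNER). A plausible eager equivalent, inferred from the buffer shapes above (an assumption, not taken from the log):

import torch

a = torch.randn(8, 9, 3, 21, device='cuda')
b = torch.randn(8, 9, 3, 21, device='cuda')
s0 = (a + b).sum(dim=(1, 2))   # (8, 21): strided middle-dim reduction -> buf0
s1 = (a + b).sum(dim=3)        # (8, 9, 3): innermost reduction -> buf1
print(s0.shape, s1.shape)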
test_sum3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,114] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 866
[2023-01-11 21:35:43,310] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 866
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,330] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 867

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 16],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 10
    rnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (10*x0)), rmask & xmask)
        tmp1 = tl.load(in_ptr1 + (r1), rmask)
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (10*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (10*x0)), rmask & xmask)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
''')


triton_fused_add_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask)
    tmp1 = 10
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, buf1, 10, 10, grid=grid(10), stream=stream0)
        del arg0_1
        buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32)
        triton_fused_add_1_1.run(arg1_1, buf2, 10, grid=grid(10), stream=stream0)
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1, 10), (10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

[2023-01-11 21:35:43,430] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 867

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 16],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 10
    rnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (10*x0)), rmask & xmask).to(tl.float32)
        tmp1 = tl.load(in_ptr1 + (r1), rmask).to(tl.float32)
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (10*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (10*x0)), rmask & xmask).to(tl.float32)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
''')


triton_fused_add_1_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr):
    xnumel = 10
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32)
    tmp1 = 10
    tmp2 = tmp0 + tmp1
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, buf1, 10, 10, grid=grid(10), stream=stream0)
        del arg0_1
        buf2 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16)
        triton_fused_add_1_1.run(arg1_1, buf2, 10, grid=grid(10), stream=stream0)
        del arg1_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    arg1_1 = rand_strided((1, 10), (10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (0.338s)
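Note: the fused kernel in test_sum3 runs two passes inside one program: the first roffset loop computes the broadcast add and stores it to out_ptr0 (buf0), and the second loop re-reads that buffer to accumulate the row sums (buf1); the +10 on arg1_1 cannot fuse with the reduction and gets its own pointwise kernel (buf2). A plausible eager equivalent, inferred from the buffer shapes above (an assumption, not taken from the log):

import torch

x = torch.randn(10, 10, device='cuda')
y = torch.randn(1, 10, device='cuda')
z = x + y                 # first loop: stored to buf0
s = z.sum(dim=1)          # second loop: re-reads buf0, accumulates into buf1
w = (y + 10).squeeze(0)   # separate pointwise kernel -> buf2, shape (10,)
print(z.shape, s.shape, w.shape)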
test_sum4_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,453] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 868
[2023-01-11 21:35:43,689] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 868
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:35:43,711] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 869

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_1_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[128, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 128
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        tmp1 = 1
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (8*x0)), rmask & xmask)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
    tmp5 = 3
    tmp6 = tmp4 + tmp5
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask)
''')


triton_fused_add_1_add_2_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 16
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask)
        _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1)
    tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    tmp2 = 5
    tmp3 = tmp1 + tmp2
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0)
        del arg0_1
        buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float32)
        buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float32)
        triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0)
        return (buf4, buf3, buf2, buf1, buf0, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:35:43,832] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 869

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_add_add_1_sum_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[128, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 128
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        tmp1 = 1
        tmp2 = tmp0 + tmp1
        tl.store(out_ptr0 + (r1 + (8*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp2, rmask & xmask)
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp3 = tl.load(out_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        _tmp4 = tl.where(xmask & rmask, _tmp4 + tmp3, _tmp4)
    tmp4 = tl.reshape(tl.sum(_tmp4, 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp4, xmask)
    tmp5 = 3
    tmp6 = tmp4 + tmp5
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp6, xmask)
''')


triton_fused_add_1_add_2_sum_2_1 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[16, 8],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 16
    rnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32)
        _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1)
    tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    tmp2 = 5
    tmp3 = tmp1 + tmp2
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask)
''')


async_compile.wait(globals())
del async_compile


def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0)
        del arg0_1
        buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16)
        buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16)
        triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0)
        return (buf4, buf3, buf2, buf1, buf0, )


if __name__ == "__main__":
from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6959684Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6959811Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6959816Z 2023-01-11T21:38:06.6959909Z @reduction(size_hints=[16, 8], 2023-01-11T21:38:06.6960027Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6960113Z filename=__file__, 2023-01-11T21:38:06.6960498Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6960576Z @triton.jit 2023-01-11T21:38:06.6960775Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6960854Z xnumel = 16 2023-01-11T21:38:06.6960928Z rnumel = 8 2023-01-11T21:38:06.6961028Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6961165Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6961250Z xmask = xindex < xnumel 2023-01-11T21:38:06.6961372Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6961438Z x0 = xindex 2023-01-11T21:38:06.6961557Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6961665Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6961760Z rindex = roffset + rbase 2023-01-11T21:38:06.6961848Z rmask = rindex < rnumel 2023-01-11T21:38:06.6961922Z r1 = rindex 2023-01-11T21:38:06.6962056Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6962172Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.6962293Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6962393Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.6962472Z tmp2 = 5 2023-01-11T21:38:06.6962555Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.6962692Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.6962781Z ''') 2023-01-11T21:38:06.6962786Z 2023-01-11T21:38:06.6962790Z 2023-01-11T21:38:06.6962886Z async_compile.wait(globals()) 2023-01-11T21:38:06.6962959Z del async_compile 2023-01-11T21:38:06.6962964Z 2023-01-11T21:38:06.6963044Z def call(args): 2023-01-11T21:38:06.6963121Z arg0_1, = args 2023-01-11T21:38:06.6963227Z args.clear() 2023-01-11T21:38:06.6963321Z with torch.cuda.device(0): 2023-01-11T21:38:06.6963546Z buf0 = empty_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6963759Z buf1 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6963961Z buf2 = empty_strided((1, 16, 8), (128, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964057Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6964219Z triton_fused_add_add_1_sum_1_0.run(arg0_1, buf0, buf1, buf2, 128, 8, grid=grid(128), stream=stream0) 2023-01-11T21:38:06.6964295Z del arg0_1 2023-01-11T21:38:06.6964500Z buf3 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964697Z buf4 = empty_strided((1, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6964857Z triton_fused_add_1_add_2_sum_2_1.run(buf2, buf3, buf4, 16, 8, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.6964969Z return (buf4, buf3, buf2, buf1, buf0, ) 2023-01-11T21:38:06.6964974Z 2023-01-11T21:38:06.6964979Z 2023-01-11T21:38:06.6965063Z if __name__ == "__main__": 
2023-01-11T21:38:06.6965177Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6965311Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6965555Z arg0_1 = rand_strided((1, 16, 8, 8), (1024, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6965681Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6965687Z 2023-01-11T21:38:06.6965774Z ok (0.402s) 2023-01-11T21:38:06.6966228Z test_sum5_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6966363Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6966627Z [2023-01-11 21:35:43,855] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 870 2023-01-11T21:38:06.6966924Z [2023-01-11 21:35:44,053] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 870 2023-01-11T21:38:06.6967334Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6967468Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6967724Z [2023-01-11 21:35:44,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 871 2023-01-11T21:38:06.6967733Z 2023-01-11T21:38:06.6967833Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6967911Z import torch 2023-01-11T21:38:06.6967987Z import random 2023-01-11T21:38:06.6968109Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6968238Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6968244Z 2023-01-11T21:38:06.6968329Z aten = torch.ops.aten 2023-01-11T21:38:06.6968462Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6968560Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6968565Z 2023-01-11T21:38:06.6968641Z import triton 2023-01-11T21:38:06.6968738Z import triton.language as tl 2023-01-11T21:38:06.6968865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6969008Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6969014Z 2023-01-11T21:38:06.6969042Z 2023-01-11T21:38:06.6969207Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6969284Z import triton 2023-01-11T21:38:06.6969373Z import triton.language as tl 2023-01-11T21:38:06.6969488Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6969592Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6969728Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6969858Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6969863Z 2023-01-11T21:38:06.6969958Z @reduction(size_hints=[256, 16], 2023-01-11T21:38:06.6970074Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6970155Z filename=__file__, 2023-01-11T21:38:06.6970515Z meta={'signature': {0: '*fp32', 1: 
'*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6970594Z @triton.jit 2023-01-11T21:38:06.6970769Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6970848Z xnumel = 136 2023-01-11T21:38:06.6970922Z rnumel = 9 2023-01-11T21:38:06.6971026Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6971167Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6971246Z xmask = xindex < xnumel 2023-01-11T21:38:06.6971366Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6971442Z x0 = xindex 2023-01-11T21:38:06.6971561Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6971671Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6971763Z rindex = roffset + rbase 2023-01-11T21:38:06.6971849Z rmask = rindex < rnumel 2023-01-11T21:38:06.6971916Z r1 = rindex 2023-01-11T21:38:06.6972037Z tmp0 = tl.load(in_ptr0 + (r1 + (9*x0)), rmask & xmask) 2023-01-11T21:38:06.6972112Z tmp1 = 1 2023-01-11T21:38:06.6972199Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6972322Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6972440Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6972540Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.6972621Z ''') 2023-01-11T21:38:06.6972626Z 2023-01-11T21:38:06.6972659Z 2023-01-11T21:38:06.6972853Z triton_fused_add_add_1_add_2_sum_1_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6972930Z import triton 2023-01-11T21:38:06.6973026Z import triton.language as tl 2023-01-11T21:38:06.6973147Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6973252Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6973388Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6973509Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6973514Z 2023-01-11T21:38:06.6973605Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6973724Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6973812Z filename=__file__, 2023-01-11T21:38:06.6974186Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6974267Z @triton.jit 2023-01-11T21:38:06.6974440Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6974644Z xnumel = 17 2023-01-11T21:38:06.6974710Z rnumel = 8 2023-01-11T21:38:06.6974807Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6974942Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6975025Z xmask = xindex < xnumel 2023-01-11T21:38:06.6975145Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6975214Z x0 = xindex 2023-01-11T21:38:06.6975331Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6975480Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6975570Z rindex = roffset + rbase 2023-01-11T21:38:06.6975658Z rmask = rindex < rnumel 2023-01-11T21:38:06.6975732Z r1 = rindex 2023-01-11T21:38:06.6975855Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.6975930Z tmp1 = 3 2023-01-11T21:38:06.6976014Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.6976130Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6976247Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6976320Z tmp4 = 5 2023-01-11T21:38:06.6976402Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.6976544Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6976632Z ''') 2023-01-11T21:38:06.6976638Z 2023-01-11T21:38:06.6976642Z 2023-01-11T21:38:06.6976737Z async_compile.wait(globals()) 2023-01-11T21:38:06.6976813Z del async_compile 2023-01-11T21:38:06.6976829Z 2023-01-11T21:38:06.6976898Z def call(args): 2023-01-11T21:38:06.6976973Z arg0_1, = args 2023-01-11T21:38:06.6977051Z args.clear() 2023-01-11T21:38:06.6977197Z with torch.cuda.device(0): 2023-01-11T21:38:06.6977442Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6977537Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6977683Z triton_fused_add_sum_1_0.run(arg0_1, buf0, 136, 9, grid=grid(136), stream=stream0) 2023-01-11T21:38:06.6977752Z del arg0_1 2023-01-11T21:38:06.6977957Z buf1 = empty_strided((1, 17), (17, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.6978050Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.6978208Z triton_fused_add_add_1_add_2_sum_1_sum_2_1.run(buf2, buf0, 17, 8, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.6978290Z return (buf2, ) 2023-01-11T21:38:06.6978296Z 2023-01-11T21:38:06.6978304Z 2023-01-11T21:38:06.6978385Z if __name__ == "__main__": 2023-01-11T21:38:06.6978503Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6978632Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6978848Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.6979007Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6979013Z 2023-01-11T21:38:06.6979281Z [2023-01-11 21:35:44,194] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 871 2023-01-11T21:38:06.6979287Z 2023-01-11T21:38:06.6979387Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6979466Z import torch 2023-01-11T21:38:06.6979541Z import random 2023-01-11T21:38:06.6979662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6979780Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6979792Z 2023-01-11T21:38:06.6979869Z aten = torch.ops.aten 2023-01-11T21:38:06.6980013Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6980111Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6980116Z 2023-01-11T21:38:06.6980192Z import triton 2023-01-11T21:38:06.6980286Z import triton.language as tl 2023-01-11T21:38:06.6980414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6980555Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6980561Z 2023-01-11T21:38:06.6980565Z 2023-01-11T21:38:06.6980728Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6980798Z import triton 2023-01-11T21:38:06.6980893Z import triton.language as tl 2023-01-11T21:38:06.6981007Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6981112Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6981250Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6981377Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6981409Z 2023-01-11T21:38:06.6981504Z 
@reduction(size_hints=[256, 16], 2023-01-11T21:38:06.6981614Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6981702Z filename=__file__, 2023-01-11T21:38:06.6982067Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6982143Z @triton.jit 2023-01-11T21:38:06.6982313Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6982389Z xnumel = 136 2023-01-11T21:38:06.6982464Z rnumel = 9 2023-01-11T21:38:06.6982564Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6982694Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6982780Z xmask = xindex < xnumel 2023-01-11T21:38:06.6982902Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6982978Z x0 = xindex 2023-01-11T21:38:06.6983098Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6983206Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6983296Z rindex = roffset + rbase 2023-01-11T21:38:06.6983376Z rmask = rindex < rnumel 2023-01-11T21:38:06.6983455Z r1 = rindex 2023-01-11T21:38:06.6983590Z tmp0 = tl.load(in_ptr0 + (r1 + (9*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6983664Z tmp1 = 1 2023-01-11T21:38:06.6983748Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6983870Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6983987Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6984081Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.6984168Z ''') 2023-01-11T21:38:06.6984173Z 2023-01-11T21:38:06.6984178Z 2023-01-11T21:38:06.6984371Z triton_fused_add_add_1_add_2_sum_1_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6984453Z import triton 2023-01-11T21:38:06.6984548Z import triton.language as tl 2023-01-11T21:38:06.6984666Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6984770Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6984897Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6985054Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6985059Z 2023-01-11T21:38:06.6985152Z @reduction(size_hints=[32, 8], 2023-01-11T21:38:06.6985269Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6985355Z filename=__file__, 2023-01-11T21:38:06.6985729Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.6985807Z @triton.jit 2023-01-11T21:38:06.6985978Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6986050Z xnumel = 17 2023-01-11T21:38:06.6986129Z rnumel = 8 2023-01-11T21:38:06.6986228Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6986364Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6986449Z xmask = xindex < xnumel 2023-01-11T21:38:06.6986573Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6986646Z x0 = xindex 2023-01-11T21:38:06.6986757Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.6986865Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6986954Z rindex = roffset + rbase 2023-01-11T21:38:06.6987041Z rmask = rindex < rnumel 
2023-01-11T21:38:06.6987115Z r1 = rindex 2023-01-11T21:38:06.6987248Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.6987323Z tmp1 = 3 2023-01-11T21:38:06.6987400Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.6987551Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.6987667Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6987740Z tmp4 = 5 2023-01-11T21:38:06.6987821Z tmp5 = tmp3 + tmp4 2023-01-11T21:38:06.6987964Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.6988053Z ''') 2023-01-11T21:38:06.6988059Z 2023-01-11T21:38:06.6988063Z 2023-01-11T21:38:06.6988159Z async_compile.wait(globals()) 2023-01-11T21:38:06.6988231Z del async_compile 2023-01-11T21:38:06.6988236Z 2023-01-11T21:38:06.6988314Z def call(args): 2023-01-11T21:38:06.6988389Z arg0_1, = args 2023-01-11T21:38:06.6988466Z args.clear() 2023-01-11T21:38:06.6988561Z with torch.cuda.device(0): 2023-01-11T21:38:06.6988771Z buf0 = empty_strided((1, 17, 8), (136, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6988866Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.6989010Z triton_fused_add_sum_1_0.run(arg0_1, buf0, 136, 9, grid=grid(136), stream=stream0) 2023-01-11T21:38:06.6989086Z del arg0_1 2023-01-11T21:38:06.6989291Z buf1 = empty_strided((1, 17), (17, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.6989384Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.6989547Z triton_fused_add_add_1_add_2_sum_1_sum_2_1.run(buf2, buf0, 17, 8, grid=grid(17), stream=stream0) 2023-01-11T21:38:06.6989631Z return (buf2, ) 2023-01-11T21:38:06.6989636Z 2023-01-11T21:38:06.6989640Z 2023-01-11T21:38:06.6989723Z if __name__ == "__main__": 2023-01-11T21:38:06.6989843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.6989966Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.6990187Z arg0_1 = rand_strided((1, 17, 8, 9), (1224, 72, 9, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.6990304Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.6990309Z 2023-01-11T21:38:06.6990387Z ok (0.361s) 2023-01-11T21:38:06.6990874Z test_sum_dtype_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6991012Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6991274Z [2023-01-11 21:35:44,215] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 872 2023-01-11T21:38:06.6991484Z [2023-01-11 21:35:44,231] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6991749Z [2023-01-11 21:35:44,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 872 2023-01-11T21:38:06.6992162Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.6992294Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.6992553Z [2023-01-11 21:35:44,334] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 873 2023-01-11T21:38:06.6992765Z [2023-01-11 21:35:44,348] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.6992770Z 2023-01-11T21:38:06.6992870Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.6992947Z import torch 2023-01-11T21:38:06.6993024Z import random 2023-01-11T21:38:06.6993145Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.6993272Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.6993304Z 2023-01-11T21:38:06.6993383Z aten = torch.ops.aten 2023-01-11T21:38:06.6993522Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.6993621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.6993626Z 2023-01-11T21:38:06.6993705Z import triton 2023-01-11T21:38:06.6993799Z import triton.language as tl 2023-01-11T21:38:06.6993929Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.6994072Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.6994077Z 2023-01-11T21:38:06.6994082Z 2023-01-11T21:38:06.6994240Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.6994310Z import triton 2023-01-11T21:38:06.6994404Z import triton.language as tl 2023-01-11T21:38:06.6994520Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6994626Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6994761Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6994894Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6994899Z 2023-01-11T21:38:06.6994990Z @reduction(size_hints=[32, 32], 2023-01-11T21:38:06.6995100Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6995206Z filename=__file__, 2023-01-11T21:38:06.6995608Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.6995685Z @triton.jit 2023-01-11T21:38:06.6995859Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6995937Z xnumel = 32 2023-01-11T21:38:06.6996012Z rnumel = 32 2023-01-11T21:38:06.6996111Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6996241Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.6996325Z xmask = xindex < xnumel 2023-01-11T21:38:06.6996453Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.6996526Z x0 = xindex 2023-01-11T21:38:06.6996644Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.6996753Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.6996846Z rindex = roffset + rbase 2023-01-11T21:38:06.6996955Z rmask = rindex < rnumel 2023-01-11T21:38:06.6997030Z r1 = rindex 2023-01-11T21:38:06.6997249Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.6997342Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.6997466Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.6997585Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.6997684Z 
tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.6997764Z ''') 2023-01-11T21:38:06.6997770Z 2023-01-11T21:38:06.6997782Z 2023-01-11T21:38:06.6997938Z triton_fused_add_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.6998019Z import triton 2023-01-11T21:38:06.6998117Z import triton.language as tl 2023-01-11T21:38:06.6998234Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.6998339Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.6998477Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.6998606Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.6998612Z 2023-01-11T21:38:06.6998700Z @reduction(size_hints=[1, 1024], 2023-01-11T21:38:06.6998816Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.6998903Z filename=__file__, 2023-01-11T21:38:06.6999281Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.6999360Z @triton.jit 2023-01-11T21:38:06.6999541Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.6999656Z xnumel = 1 2023-01-11T21:38:06.6999733Z rnumel = 1024 2023-01-11T21:38:06.6999826Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.6999962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7000052Z xmask = xindex < xnumel 2023-01-11T21:38:06.7000176Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7000297Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7000405Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7000497Z rindex = roffset + rbase 2023-01-11T21:38:06.7000577Z rmask = rindex < rnumel 2023-01-11T21:38:06.7000651Z r0 = rindex 2023-01-11T21:38:06.7000849Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7000943Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7001067Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7001189Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7001296Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7001379Z rindex = roffset + rbase 2023-01-11T21:38:06.7001466Z rmask = rindex < rnumel 2023-01-11T21:38:06.7001543Z r0 = rindex 2023-01-11T21:38:06.7001624Z r1 = rindex % 32 2023-01-11T21:38:06.7001730Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7001833Z tmp5 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.7001925Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.7002003Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7002087Z tmp7 = tmp6 + tmp2 2023-01-11T21:38:06.7002241Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp7, rmask & xmask) 2023-01-11T21:38:06.7002329Z ''') 2023-01-11T21:38:06.7002336Z 2023-01-11T21:38:06.7002340Z 2023-01-11T21:38:06.7002437Z async_compile.wait(globals()) 2023-01-11T21:38:06.7002518Z del async_compile 2023-01-11T21:38:06.7002523Z 2023-01-11T21:38:06.7002602Z def call(args): 2023-01-11T21:38:06.7002670Z arg0_1, = args 2023-01-11T21:38:06.7002749Z args.clear() 2023-01-11T21:38:06.7002844Z with torch.cuda.device(0): 2023-01-11T21:38:06.7003076Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7003172Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7003314Z triton_fused_sum_1_0.run(arg0_1, buf0, 32, 32, 
grid=grid(32), stream=stream0) 2023-01-11T21:38:06.7003520Z buf2 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7003670Z triton_fused_add_sum_2_1.run(arg0_1, buf0, buf2, 1, 1024, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7003739Z del arg0_1 2023-01-11T21:38:06.7003820Z return (buf2, ) 2023-01-11T21:38:06.7003825Z 2023-01-11T21:38:06.7003830Z 2023-01-11T21:38:06.7003912Z if __name__ == "__main__": 2023-01-11T21:38:06.7004036Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7004165Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7004372Z arg0_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7004486Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7004493Z 2023-01-11T21:38:06.7004758Z [2023-01-11 21:35:44,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 873 2023-01-11T21:38:06.7004764Z 2023-01-11T21:38:06.7004856Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7004933Z import torch 2023-01-11T21:38:06.7005011Z import random 2023-01-11T21:38:06.7005131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7005256Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7005261Z 2023-01-11T21:38:06.7005346Z aten = torch.ops.aten 2023-01-11T21:38:06.7005484Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7005609Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7005614Z 2023-01-11T21:38:06.7005684Z import triton 2023-01-11T21:38:06.7005778Z import triton.language as tl 2023-01-11T21:38:06.7005904Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7006049Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7006055Z 2023-01-11T21:38:06.7006059Z 2023-01-11T21:38:06.7006220Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7006297Z import triton 2023-01-11T21:38:06.7006393Z import triton.language as tl 2023-01-11T21:38:06.7006502Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7006605Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7006737Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7006864Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7006869Z 2023-01-11T21:38:06.7006960Z @reduction(size_hints=[32, 32], 2023-01-11T21:38:06.7007082Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7007172Z filename=__file__, 2023-01-11T21:38:06.7007538Z meta={'signature': {0: '*fp16', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7007607Z @triton.jit 2023-01-11T21:38:06.7007776Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7007850Z xnumel = 32 2023-01-11T21:38:06.7007924Z rnumel = 32 2023-01-11T21:38:06.7008024Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7008161Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7008248Z xmask = xindex < xnumel 2023-01-11T21:38:06.7008361Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7008435Z x0 = xindex 2023-01-11T21:38:06.7008558Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7008665Z for roffset in range(0, rnumel, 
RBLOCK): 2023-01-11T21:38:06.7008755Z rindex = roffset + rbase 2023-01-11T21:38:06.7008843Z rmask = rindex < rnumel 2023-01-11T21:38:06.7008917Z r1 = rindex 2023-01-11T21:38:06.7009177Z tmp0 = tl.load(in_ptr0 + (r1 + (32*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7009277Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7009400Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7009516Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7009616Z tl.store(out_ptr0 + x0, tmp2, xmask) 2023-01-11T21:38:06.7009703Z ''') 2023-01-11T21:38:06.7009708Z 2023-01-11T21:38:06.7009713Z 2023-01-11T21:38:06.7009879Z triton_fused_add_sum_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7009956Z import triton 2023-01-11T21:38:06.7010044Z import triton.language as tl 2023-01-11T21:38:06.7010163Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7010266Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7010401Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7010528Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7010533Z 2023-01-11T21:38:06.7010628Z @reduction(size_hints=[1, 1024], 2023-01-11T21:38:06.7010743Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7010823Z filename=__file__, 2023-01-11T21:38:06.7011202Z meta={'signature': {0: '*fp16', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7011279Z @triton.jit 2023-01-11T21:38:06.7011457Z def triton_(in_ptr0, in_ptr1, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7011564Z xnumel = 1 2023-01-11T21:38:06.7011640Z rnumel = 1024 2023-01-11T21:38:06.7011740Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7011876Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7011955Z xmask = xindex < xnumel 2023-01-11T21:38:06.7012077Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7012198Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float64) + 0 2023-01-11T21:38:06.7012305Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7012396Z rindex = roffset + rbase 2023-01-11T21:38:06.7012483Z rmask = rindex < rnumel 2023-01-11T21:38:06.7012557Z r0 = rindex 2023-01-11T21:38:06.7012767Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7012860Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.7012985Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7013101Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7013212Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7013305Z rindex = roffset + rbase 2023-01-11T21:38:06.7013392Z rmask = rindex < rnumel 2023-01-11T21:38:06.7013459Z r0 = rindex 2023-01-11T21:38:06.7013541Z r1 = rindex % 32 2023-01-11T21:38:06.7013663Z tmp3 = tl.load(in_ptr0 + (r0), rmask).to(tl.float32) 2023-01-11T21:38:06.7013767Z tmp5 = tl.load(in_ptr1 + (r1), rmask) 2023-01-11T21:38:06.7013860Z tmp4 = tmp3.to(tl.float64) 2023-01-11T21:38:06.7013944Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7014033Z tmp7 = tmp6 + tmp2 2023-01-11T21:38:06.7014183Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp7, rmask & xmask) 2023-01-11T21:38:06.7014270Z ''') 2023-01-11T21:38:06.7014276Z 2023-01-11T21:38:06.7014280Z 
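For orientation: the two kernels above (triton_fused_sum_1_0 and triton_fused_add_sum_2_1) are the float16 variant that test_sum_dtype_cuda compiles; the float32 variant earlier in the log has the same structure. The test body itself is never printed in this log, but reconstructing from the kernels — a per-row sum accumulated in float64, then a full reduction fused with a broadcasted multiply-add — a plausible eager-mode equivalent is sketched below. The function name and the input tensor are hypothetical, not taken from the log.

import torch

def fn(x):
    # kernel 1 (triton_fused_sum_1_0): row sums accumulated in fp64,
    #   buf0[j] = sum_k x[j, k], stored as torch.float64
    # kernel 2 (triton_fused_add_sum_2_1): a single program re-reduces
    #   all 1024 elements to a scalar total, then fuses the broadcasted
    #   multiply-add: out[i, j] = x[i, j] * buf0[j] + total
    return x * x.sum(-1, dtype=torch.double) + x.sum(dtype=torch.double)

x = torch.randn(32, 32, device="cuda", dtype=torch.float16)
expected = fn(x)  # float64 result, matching buf2's dtype=torch.float64 above

Compiled through inductor, this shape of function yields exactly the two-kernel split seen here: the (32,)-shaped float64 buffer buf0 from the first kernel, and the grid(1) second kernel that consumes both arg0_1 and buf0.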
2023-01-11T21:38:06.7014377Z async_compile.wait(globals()) 2023-01-11T21:38:06.7014456Z del async_compile 2023-01-11T21:38:06.7014463Z 2023-01-11T21:38:06.7014728Z def call(args): 2023-01-11T21:38:06.7014807Z arg0_1, = args 2023-01-11T21:38:06.7014886Z args.clear() 2023-01-11T21:38:06.7014982Z with torch.cuda.device(0): 2023-01-11T21:38:06.7015180Z buf0 = empty_strided((32, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7015321Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7015463Z triton_fused_sum_1_0.run(arg0_1, buf0, 32, 32, grid=grid(32), stream=stream0) 2023-01-11T21:38:06.7015666Z buf2 = empty_strided((32, 32), (32, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.7015816Z triton_fused_add_sum_2_1.run(arg0_1, buf0, buf2, 1, 1024, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7015891Z del arg0_1 2023-01-11T21:38:06.7015973Z return (buf2, ) 2023-01-11T21:38:06.7015978Z 2023-01-11T21:38:06.7015982Z 2023-01-11T21:38:06.7016063Z if __name__ == "__main__": 2023-01-11T21:38:06.7016176Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7016309Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7016514Z arg0_1 = rand_strided((32, 32), (32, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7016629Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7016634Z 2023-01-11T21:38:06.7016707Z ok (0.240s) 2023-01-11T21:38:06.7017221Z test_sum_int_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7017359Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7017621Z [2023-01-11 21:35:44,454] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 874 2023-01-11T21:38:06.7017873Z [2023-01-11 21:35:44,464] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7018132Z [2023-01-11 21:35:44,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 874 2023-01-11T21:38:06.7018554Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7018687Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7018945Z [2023-01-11 21:35:44,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 875 2023-01-11T21:38:06.7019155Z [2023-01-11 21:35:44,574] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7019416Z [2023-01-11 21:35:44,654] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 875 2023-01-11T21:38:06.7019835Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7019968Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7020223Z [2023-01-11 21:35:44,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 876 2023-01-11T21:38:06.7020433Z [2023-01-11 21:35:44,682] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7020439Z 2023-01-11T21:38:06.7020541Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7020612Z import torch 2023-01-11T21:38:06.7020689Z import random 2023-01-11T21:38:06.7020812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7020938Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7020943Z 2023-01-11T21:38:06.7021030Z aten = torch.ops.aten 2023-01-11T21:38:06.7021169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7021295Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7021302Z 2023-01-11T21:38:06.7021373Z import triton 2023-01-11T21:38:06.7021467Z import triton.language as tl 2023-01-11T21:38:06.7021595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7021738Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7021743Z 2023-01-11T21:38:06.7021748Z 2023-01-11T21:38:06.7021929Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7022006Z import triton 2023-01-11T21:38:06.7022101Z import triton.language as tl 2023-01-11T21:38:06.7022220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7022320Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7022453Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7022579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7022585Z 2023-01-11T21:38:06.7022681Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7022800Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7022888Z filename=__file__, 2023-01-11T21:38:06.7023260Z meta={'signature': {0: '*i64', 1: '*i1', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7023337Z @triton.jit 2023-01-11T21:38:06.7023502Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7023577Z xnumel = 1 2023-01-11T21:38:06.7023655Z rnumel = 64 2023-01-11T21:38:06.7023754Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7023923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7024009Z xmask = xindex < xnumel 2023-01-11T21:38:06.7024131Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7024241Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7024357Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7024464Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7024557Z rindex = roffset + rbase 2023-01-11T21:38:06.7024644Z rmask = rindex < rnumel 2023-01-11T21:38:06.7024719Z r0 = rindex 2023-01-11T21:38:06.7024916Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7025013Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7025105Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7025230Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 
2023-01-11T21:38:06.7025324Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7025447Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7025562Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7025675Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7025742Z tmp6 = 2 2023-01-11T21:38:06.7025827Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7025908Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7026050Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7026139Z ''') 2023-01-11T21:38:06.7026145Z 2023-01-11T21:38:06.7026149Z 2023-01-11T21:38:06.7026247Z async_compile.wait(globals()) 2023-01-11T21:38:06.7026325Z del async_compile 2023-01-11T21:38:06.7026331Z 2023-01-11T21:38:06.7026407Z def call(args): 2023-01-11T21:38:06.7026476Z arg0_1, = args 2023-01-11T21:38:06.7026554Z args.clear() 2023-01-11T21:38:06.7026648Z with torch.cuda.device(0): 2023-01-11T21:38:06.7026840Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7026939Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7027032Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7027185Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7027254Z del arg0_1 2023-01-11T21:38:06.7027362Z return (buf2, ) 2023-01-11T21:38:06.7027368Z 2023-01-11T21:38:06.7027373Z 2023-01-11T21:38:06.7027455Z if __name__ == "__main__": 2023-01-11T21:38:06.7027576Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7027706Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7027902Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.bool) 2023-01-11T21:38:06.7028017Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7028022Z 2023-01-11T21:38:06.7028027Z 2023-01-11T21:38:06.7028127Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7028199Z import torch 2023-01-11T21:38:06.7028276Z import random 2023-01-11T21:38:06.7028400Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7028529Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7028534Z 2023-01-11T21:38:06.7028619Z aten = torch.ops.aten 2023-01-11T21:38:06.7028759Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7028858Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7028863Z 2023-01-11T21:38:06.7028940Z import triton 2023-01-11T21:38:06.7029027Z import triton.language as tl 2023-01-11T21:38:06.7029156Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7029298Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7029303Z 2023-01-11T21:38:06.7029308Z 2023-01-11T21:38:06.7029490Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7029569Z import triton 2023-01-11T21:38:06.7029663Z import triton.language as tl 2023-01-11T21:38:06.7029806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7029903Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7030036Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7030161Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7030167Z 2023-01-11T21:38:06.7030261Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7030378Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7030466Z filename=__file__, 2023-01-11T21:38:06.7030834Z meta={'signature': {0: '*i64', 1: '*u8', 2: 'i32', 
3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7030909Z @triton.jit 2023-01-11T21:38:06.7031076Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7031152Z xnumel = 1 2023-01-11T21:38:06.7031231Z rnumel = 64 2023-01-11T21:38:06.7031334Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7031471Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7031558Z xmask = xindex < xnumel 2023-01-11T21:38:06.7031680Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7031792Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7031907Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7032018Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7032109Z rindex = roffset + rbase 2023-01-11T21:38:06.7032197Z rmask = rindex < rnumel 2023-01-11T21:38:06.7032271Z r0 = rindex 2023-01-11T21:38:06.7032467Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7032564Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7032654Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7032776Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7032870Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7032993Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7033113Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7033255Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7033330Z tmp6 = 2 2023-01-11T21:38:06.7033405Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7033484Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7033622Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7033711Z ''') 2023-01-11T21:38:06.7033716Z 2023-01-11T21:38:06.7033721Z 2023-01-11T21:38:06.7033819Z async_compile.wait(globals()) 2023-01-11T21:38:06.7033899Z del async_compile 2023-01-11T21:38:06.7033904Z 2023-01-11T21:38:06.7033982Z def call(args): 2023-01-11T21:38:06.7034050Z arg0_1, = args 2023-01-11T21:38:06.7034130Z args.clear() 2023-01-11T21:38:06.7034228Z with torch.cuda.device(0): 2023-01-11T21:38:06.7034423Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7034520Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7034615Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7034771Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7034840Z del arg0_1 2023-01-11T21:38:06.7034921Z return (buf2, ) 2023-01-11T21:38:06.7034926Z 2023-01-11T21:38:06.7034931Z 2023-01-11T21:38:06.7035011Z if __name__ == "__main__": 2023-01-11T21:38:06.7035135Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7035284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7035508Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.uint8) 2023-01-11T21:38:06.7035623Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7035628Z 2023-01-11T21:38:06.7035897Z [2023-01-11 21:35:44,764] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 876 2023-01-11T21:38:06.7035929Z 2023-01-11T21:38:06.7036029Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7036099Z import torch 2023-01-11T21:38:06.7036175Z import random 
2023-01-11T21:38:06.7036296Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7036426Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7036431Z 2023-01-11T21:38:06.7036515Z aten = torch.ops.aten 2023-01-11T21:38:06.7036655Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7036754Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7036759Z 2023-01-11T21:38:06.7036828Z import triton 2023-01-11T21:38:06.7036924Z import triton.language as tl 2023-01-11T21:38:06.7037051Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7037194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7037199Z 2023-01-11T21:38:06.7037207Z 2023-01-11T21:38:06.7037389Z triton_fused_add_mul_sum_1_sum_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7037466Z import triton 2023-01-11T21:38:06.7037562Z import triton.language as tl 2023-01-11T21:38:06.7037677Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7037773Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7037908Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7038036Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7038041Z 2023-01-11T21:38:06.7038133Z @reduction(size_hints=[1, 64], 2023-01-11T21:38:06.7038249Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7038337Z filename=__file__, 2023-01-11T21:38:06.7038708Z meta={'signature': {0: '*i64', 1: '*i32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7038788Z @triton.jit 2023-01-11T21:38:06.7038953Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7039027Z xnumel = 1 2023-01-11T21:38:06.7039103Z rnumel = 64 2023-01-11T21:38:06.7039204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7039405Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7039493Z xmask = xindex < xnumel 2023-01-11T21:38:06.7039616Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7039725Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7039840Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.int64) + 0 2023-01-11T21:38:06.7039948Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7040039Z rindex = roffset + rbase 2023-01-11T21:38:06.7040128Z rmask = rindex < rnumel 2023-01-11T21:38:06.7040204Z r0 = rindex 2023-01-11T21:38:06.7040401Z tmp0 = tl.load(in_ptr0 + (r0), rmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7040502Z tmp3 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7040592Z tmp1 = tmp0.to(tl.int64) 2023-01-11T21:38:06.7040716Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7040810Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.7040931Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp4, _tmp5) 2023-01-11T21:38:06.7041048Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7041162Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7041229Z tmp6 = 2 2023-01-11T21:38:06.7041312Z tmp7 = tmp2 * tmp6 2023-01-11T21:38:06.7041392Z tmp8 = tmp7 + tmp5 2023-01-11T21:38:06.7041532Z tl.store(in_out_ptr0 + (0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp8, None) 2023-01-11T21:38:06.7041620Z ''') 2023-01-11T21:38:06.7041625Z 2023-01-11T21:38:06.7041630Z 
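The kernel above is the int32 variant of triton_fused_add_mul_sum_1_sum_2_0; the log shows the same kernel specialized three times — for bool ('*i1'), uint8 ('*u8'), and int32 ('*i32') inputs — as test_sum_int_cuda re-runs the graph per dtype. Reconstructing from the kernel body (two int64 accumulators, then tmp8 = tmp2 * 2 + tmp5), a plausible eager-mode equivalent is sketched below; the name fn and the concrete input are hypothetical, since the exact test body is not in this log.

import torch

def fn(x):
    # both reductions accumulate in tl.int64 above, matching PyTorch's
    # promotion of bool/uint8/int32 sums to torch.int64; inductor keeps
    # the two syntactic sums as separate accumulators (_tmp2, _tmp5)
    # and fuses the final 2 * s1 + s2 into the same single-program kernel
    return 2 * x.sum() + x.sum()

x = torch.ones(64, device="cuda", dtype=torch.int32)
out = fn(x)  # 0-dim int64 tensor, matching buf0 = empty_strided((), (), dtype=torch.int64)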
2023-01-11T21:38:06.7041729Z async_compile.wait(globals()) 2023-01-11T21:38:06.7041807Z del async_compile 2023-01-11T21:38:06.7041848Z 2023-01-11T21:38:06.7041926Z def call(args): 2023-01-11T21:38:06.7041995Z arg0_1, = args 2023-01-11T21:38:06.7042071Z args.clear() 2023-01-11T21:38:06.7042167Z with torch.cuda.device(0): 2023-01-11T21:38:06.7042354Z buf0 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7042450Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7042545Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7042697Z triton_fused_add_mul_sum_1_sum_2_0.run(buf2, arg0_1, 1, 64, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7042767Z del arg0_1 2023-01-11T21:38:06.7042845Z return (buf2, ) 2023-01-11T21:38:06.7042850Z 2023-01-11T21:38:06.7042855Z 2023-01-11T21:38:06.7042936Z if __name__ == "__main__": 2023-01-11T21:38:06.7043056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7043184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7043383Z arg0_1 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.7043501Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7043506Z 2023-01-11T21:38:06.7043579Z ok (0.329s) 2023-01-11T21:38:06.7044042Z test_sum_keepdims_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7044178Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7044437Z [2023-01-11 21:35:44,780] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 877 2023-01-11T21:38:06.7044706Z [2023-01-11 21:35:44,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 877 2023-01-11T21:38:06.7045150Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7045286Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7045545Z [2023-01-11 21:35:44,809] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 878 2023-01-11T21:38:06.7045807Z [2023-01-11 21:35:44,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 878 2023-01-11T21:38:06.7045813Z 2023-01-11T21:38:06.7045915Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7045992Z import torch 2023-01-11T21:38:06.7046062Z import random 2023-01-11T21:38:06.7046185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7046315Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7046320Z 2023-01-11T21:38:06.7046404Z aten = torch.ops.aten 2023-01-11T21:38:06.7046542Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7046641Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7046646Z 2023-01-11T21:38:06.7046725Z import triton 2023-01-11T21:38:06.7046822Z import triton.language as tl 2023-01-11T21:38:06.7046942Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7047085Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7047091Z 2023-01-11T21:38:06.7047096Z 2023-01-11T21:38:06.7047259Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7047336Z import triton 2023-01-11T21:38:06.7047431Z import triton.language as tl 2023-01-11T21:38:06.7047546Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7047650Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7047809Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7047930Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7047935Z 2023-01-11T21:38:06.7048026Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7048144Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7048234Z filename=__file__, 2023-01-11T21:38:06.7048613Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7048691Z @triton.jit 2023-01-11T21:38:06.7048871Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7048946Z xnumel = 8 2023-01-11T21:38:06.7049014Z rnumel = 8 2023-01-11T21:38:06.7049114Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7049250Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7049340Z xmask = xindex < xnumel 2023-01-11T21:38:06.7049460Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7049533Z x0 = xindex 2023-01-11T21:38:06.7049652Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7049756Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7049846Z rindex = roffset + rbase 2023-01-11T21:38:06.7049936Z rmask = rindex < rnumel 2023-01-11T21:38:06.7050009Z r1 = rindex 2023-01-11T21:38:06.7050128Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7050247Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7050332Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7050448Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 
2023-01-11T21:38:06.7050565Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7050665Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.7050755Z ''') 2023-01-11T21:38:06.7050760Z 2023-01-11T21:38:06.7050764Z 2023-01-11T21:38:06.7050860Z async_compile.wait(globals()) 2023-01-11T21:38:06.7050939Z del async_compile 2023-01-11T21:38:06.7050944Z 2023-01-11T21:38:06.7051021Z def call(args): 2023-01-11T21:38:06.7051096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7051199Z args.clear() 2023-01-11T21:38:06.7051293Z with torch.cuda.device(0): 2023-01-11T21:38:06.7051497Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7051591Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7051742Z triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7051818Z del arg0_1 2023-01-11T21:38:06.7051886Z del arg1_1 2023-01-11T21:38:06.7051966Z return (buf0, ) 2023-01-11T21:38:06.7051971Z 2023-01-11T21:38:06.7051975Z 2023-01-11T21:38:06.7052057Z if __name__ == "__main__": 2023-01-11T21:38:06.7052179Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7052306Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7052509Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7052710Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7052832Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7052837Z 2023-01-11T21:38:06.7052842Z 2023-01-11T21:38:06.7052943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7053012Z import torch 2023-01-11T21:38:06.7053088Z import random 2023-01-11T21:38:06.7053210Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7053336Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7053341Z 2023-01-11T21:38:06.7053426Z aten = torch.ops.aten 2023-01-11T21:38:06.7053564Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7053706Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7053712Z 2023-01-11T21:38:06.7053781Z import triton 2023-01-11T21:38:06.7053878Z import triton.language as tl 2023-01-11T21:38:06.7054006Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7054147Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7054155Z 2023-01-11T21:38:06.7054159Z 2023-01-11T21:38:06.7054322Z triton_fused_add_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7054399Z import triton 2023-01-11T21:38:06.7054617Z import triton.language as tl 2023-01-11T21:38:06.7054732Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7054827Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7054957Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7055083Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7055089Z 2023-01-11T21:38:06.7055175Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7055314Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7055407Z filename=__file__, 2023-01-11T21:38:06.7055807Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7055881Z @triton.jit 2023-01-11T21:38:06.7056051Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, 
rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7056124Z xnumel = 8 2023-01-11T21:38:06.7056196Z rnumel = 8 2023-01-11T21:38:06.7056292Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7056431Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7056513Z xmask = xindex < xnumel 2023-01-11T21:38:06.7056630Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7056694Z x0 = xindex 2023-01-11T21:38:06.7056814Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7056917Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7057004Z rindex = roffset + rbase 2023-01-11T21:38:06.7057089Z rmask = rindex < rnumel 2023-01-11T21:38:06.7057217Z r1 = rindex 2023-01-11T21:38:06.7057406Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7057535Z tmp1 = tl.load(in_ptr1 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7057617Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7057740Z _tmp3 = tl.where(xmask & rmask, _tmp3 + tmp2, _tmp3) 2023-01-11T21:38:06.7057855Z tmp3 = tl.reshape(tl.sum(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7057954Z tl.store(out_ptr0 + x0, tmp3, xmask) 2023-01-11T21:38:06.7058039Z ''') 2023-01-11T21:38:06.7058044Z 2023-01-11T21:38:06.7058049Z 2023-01-11T21:38:06.7058141Z async_compile.wait(globals()) 2023-01-11T21:38:06.7058211Z del async_compile 2023-01-11T21:38:06.7058226Z 2023-01-11T21:38:06.7058294Z def call(args): 2023-01-11T21:38:06.7058372Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7058447Z args.clear() 2023-01-11T21:38:06.7058538Z with torch.cuda.device(0): 2023-01-11T21:38:06.7058737Z buf0 = empty_strided((8, 1), (1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7058831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7058980Z triton_fused_add_sum_1_0.run(arg0_1, arg1_1, buf0, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7059046Z del arg0_1 2023-01-11T21:38:06.7059118Z del arg1_1 2023-01-11T21:38:06.7059195Z return (buf0, ) 2023-01-11T21:38:06.7059200Z 2023-01-11T21:38:06.7059204Z 2023-01-11T21:38:06.7059284Z if __name__ == "__main__": 2023-01-11T21:38:06.7059402Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7059527Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7059725Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7059948Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7060068Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7060073Z 2023-01-11T21:38:06.7060143Z ok (0.057s) 2023-01-11T21:38:06.7060595Z test_tanh_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7060725Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7060982Z [2023-01-11 21:35:44,850] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 879 2023-01-11T21:38:06.7061245Z [2023-01-11 21:35:44,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 879 2023-01-11T21:38:06.7061663Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7061795Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7062048Z [2023-01-11 21:35:45,027] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 880 2023-01-11T21:38:06.7062312Z [2023-01-11 21:35:45,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 880 2023-01-11T21:38:06.7062318Z 2023-01-11T21:38:06.7062417Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7062484Z import torch 2023-01-11T21:38:06.7062561Z import random 2023-01-11T21:38:06.7062680Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7062803Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7062808Z 2023-01-11T21:38:06.7062888Z aten = torch.ops.aten 2023-01-11T21:38:06.7063024Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7063143Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7063149Z 2023-01-11T21:38:06.7063219Z import triton 2023-01-11T21:38:06.7063310Z import triton.language as tl 2023-01-11T21:38:06.7063438Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7063579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7063585Z 2023-01-11T21:38:06.7063589Z 2023-01-11T21:38:06.7063752Z triton_fused_add_tanh_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7063825Z import triton 2023-01-11T21:38:06.7063916Z import triton.language as tl 2023-01-11T21:38:06.7064028Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7064126Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7064258Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7064381Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7064386Z 2023-01-11T21:38:06.7064804Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7064878Z @triton.jit 2023-01-11T21:38:06.7065021Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7065094Z xnumel = 256 2023-01-11T21:38:06.7065193Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7065315Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7065400Z xmask = xindex < xnumel 2023-01-11T21:38:06.7065471Z x0 = xindex 2023-01-11T21:38:06.7065692Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7065792Z tmp4 
= tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7065892Z tmp1 = tl.libdevice.tanh(tmp0) 2023-01-11T21:38:06.7065966Z tmp2 = 2 2023-01-11T21:38:06.7066041Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7066117Z tmp5 = 1 2023-01-11T21:38:06.7066197Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7066296Z tmp7 = tl.libdevice.tanh(tmp6) 2023-01-11T21:38:06.7066434Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7066568Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7066656Z ''') 2023-01-11T21:38:06.7066661Z 2023-01-11T21:38:06.7066666Z 2023-01-11T21:38:06.7066755Z async_compile.wait(globals()) 2023-01-11T21:38:06.7066834Z del async_compile 2023-01-11T21:38:06.7066841Z 2023-01-11T21:38:06.7066916Z def call(args): 2023-01-11T21:38:06.7066991Z arg0_1, = args 2023-01-11T21:38:06.7067071Z args.clear() 2023-01-11T21:38:06.7067166Z with torch.cuda.device(0): 2023-01-11T21:38:06.7067376Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7067569Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7067666Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7067818Z triton_fused_add_tanh_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.7067894Z del arg0_1 2023-01-11T21:38:06.7067981Z return (buf0, buf1, ) 2023-01-11T21:38:06.7067986Z 2023-01-11T21:38:06.7067990Z 2023-01-11T21:38:06.7068071Z if __name__ == "__main__": 2023-01-11T21:38:06.7068192Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7068320Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7068518Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7068637Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7068642Z 2023-01-11T21:38:06.7068647Z 2023-01-11T21:38:06.7068746Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7068822Z import torch 2023-01-11T21:38:06.7068899Z import random 2023-01-11T21:38:06.7069023Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7069176Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7069182Z 2023-01-11T21:38:06.7069267Z aten = torch.ops.aten 2023-01-11T21:38:06.7069398Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7069495Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7069500Z 2023-01-11T21:38:06.7069576Z import triton 2023-01-11T21:38:06.7069672Z import triton.language as tl 2023-01-11T21:38:06.7069799Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7069940Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7069946Z 2023-01-11T21:38:06.7069953Z 2023-01-11T21:38:06.7070119Z triton_fused_add_tanh_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7070196Z import triton 2023-01-11T21:38:06.7070285Z import triton.language as tl 2023-01-11T21:38:06.7070399Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7070503Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7070640Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7070766Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7070772Z 2023-01-11T21:38:06.7071185Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': 
set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7071263Z @triton.jit 2023-01-11T21:38:06.7071406Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7071477Z xnumel = 256 2023-01-11T21:38:06.7071608Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7071742Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7071827Z xmask = xindex < xnumel 2023-01-11T21:38:06.7071900Z x0 = xindex 2023-01-11T21:38:06.7072115Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7072237Z tmp4 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7072328Z tmp1 = tl.libdevice.tanh(tmp0) 2023-01-11T21:38:06.7072402Z tmp2 = 2 2023-01-11T21:38:06.7072484Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7072557Z tmp5 = 1 2023-01-11T21:38:06.7072638Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7072736Z tmp7 = tl.libdevice.tanh(tmp6) 2023-01-11T21:38:06.7072872Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7072999Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7073086Z ''') 2023-01-11T21:38:06.7073094Z 2023-01-11T21:38:06.7073100Z 2023-01-11T21:38:06.7073194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7073274Z del async_compile 2023-01-11T21:38:06.7073279Z 2023-01-11T21:38:06.7073356Z def call(args): 2023-01-11T21:38:06.7073432Z arg0_1, = args 2023-01-11T21:38:06.7073509Z args.clear() 2023-01-11T21:38:06.7073600Z with torch.cuda.device(0): 2023-01-11T21:38:06.7073804Z buf0 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7074002Z buf1 = empty_strided((16, 16), (16, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7074096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7074248Z triton_fused_add_tanh_1_0.run(arg0_1, buf0, buf1, 256, grid=grid(256), stream=stream0) 2023-01-11T21:38:06.7074324Z del arg0_1 2023-01-11T21:38:06.7074408Z return (buf0, buf1, ) 2023-01-11T21:38:06.7074413Z 2023-01-11T21:38:06.7074418Z 2023-01-11T21:38:06.7074499Z if __name__ == "__main__": 2023-01-11T21:38:06.7074614Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7074743Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7074946Z arg0_1 = rand_strided((16, 16), (16, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7075059Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7075094Z 2023-01-11T21:38:06.7075169Z ok (0.355s) 2023-01-11T21:38:06.7075626Z test_tensor1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7075761Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7076020Z [2023-01-11 21:35:45,195] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 881 2023-01-11T21:38:06.7076288Z [2023-01-11 21:35:45,268] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 881 2023-01-11T21:38:06.7076705Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7076840Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7077090Z [2023-01-11 21:35:45,285] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 882 2023-01-11T21:38:06.7077354Z [2023-01-11 21:35:45,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 882 2023-01-11T21:38:06.7077359Z 2023-01-11T21:38:06.7077486Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7077560Z import torch 2023-01-11T21:38:06.7077637Z import random 2023-01-11T21:38:06.7077758Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7077882Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7077887Z 2023-01-11T21:38:06.7077972Z aten = torch.ops.aten 2023-01-11T21:38:06.7078101Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7078197Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7078202Z 2023-01-11T21:38:06.7078275Z import triton 2023-01-11T21:38:06.7078367Z import triton.language as tl 2023-01-11T21:38:06.7078492Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7078631Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7078637Z 2023-01-11T21:38:06.7078642Z 2023-01-11T21:38:06.7078796Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7078872Z import triton 2023-01-11T21:38:06.7078960Z import triton.language as tl 2023-01-11T21:38:06.7079074Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7079176Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7079308Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7079437Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7079445Z 2023-01-11T21:38:06.7079848Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7079924Z @triton.jit 2023-01-11T21:38:06.7080057Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7080124Z xnumel = 10 2023-01-11T21:38:06.7080220Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7080350Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7080436Z xmask = xindex < xnumel 2023-01-11T21:38:06.7080508Z x0 = xindex 2023-01-11T21:38:06.7080604Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7080674Z tmp0 = 1 2023-01-11T21:38:06.7080755Z tmp1 = tmp0.to(tl.float32) 
2023-01-11T21:38:06.7080833Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7080994Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7081082Z ''') 2023-01-11T21:38:06.7081087Z 2023-01-11T21:38:06.7081092Z 2023-01-11T21:38:06.7081269Z triton_fused_lift_fresh_copy_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7081344Z import triton 2023-01-11T21:38:06.7081436Z import triton.language as tl 2023-01-11T21:38:06.7081544Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7081643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7081774Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7081898Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7081906Z 2023-01-11T21:38:06.7082286Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7082362Z @triton.jit 2023-01-11T21:38:06.7082485Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7082557Z xnumel = 1 2023-01-11T21:38:06.7082646Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7082773Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7082855Z xmask = xindex < xnumel 2023-01-11T21:38:06.7082925Z tmp0 = 5 2023-01-11T21:38:06.7083058Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.7083143Z ''') 2023-01-11T21:38:06.7083148Z 2023-01-11T21:38:06.7083152Z 2023-01-11T21:38:06.7083246Z async_compile.wait(globals()) 2023-01-11T21:38:06.7083316Z del async_compile 2023-01-11T21:38:06.7083329Z 2023-01-11T21:38:06.7083435Z def call(args): 2023-01-11T21:38:06.7083508Z arg0_1, = args 2023-01-11T21:38:06.7083585Z args.clear() 2023-01-11T21:38:06.7083677Z with torch.cuda.device(0): 2023-01-11T21:38:06.7083874Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7083968Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7084107Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7084173Z del arg0_1 2023-01-11T21:38:06.7084359Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7084502Z triton_fused_lift_fresh_copy_1_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7084584Z return (buf0, buf1, ) 2023-01-11T21:38:06.7084590Z 2023-01-11T21:38:06.7084594Z 2023-01-11T21:38:06.7084672Z if __name__ == "__main__": 2023-01-11T21:38:06.7084789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7084915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7085109Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7085220Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7085225Z 2023-01-11T21:38:06.7085229Z 2023-01-11T21:38:06.7085324Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7085399Z import torch 2023-01-11T21:38:06.7085473Z import random 2023-01-11T21:38:06.7085592Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7085715Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7085720Z 2023-01-11T21:38:06.7085801Z aten = torch.ops.aten 2023-01-11T21:38:06.7085929Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7086023Z async_compile = AsyncCompile() 
2023-01-11T21:38:06.7086028Z 2023-01-11T21:38:06.7086100Z import triton 2023-01-11T21:38:06.7086192Z import triton.language as tl 2023-01-11T21:38:06.7086315Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7086459Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7086464Z 2023-01-11T21:38:06.7086468Z 2023-01-11T21:38:06.7086622Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7086697Z import triton 2023-01-11T21:38:06.7086782Z import triton.language as tl 2023-01-11T21:38:06.7086922Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7087026Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7087159Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7087284Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7087289Z 2023-01-11T21:38:06.7087688Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7087760Z @triton.jit 2023-01-11T21:38:06.7087897Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7087963Z xnumel = 10 2023-01-11T21:38:06.7088060Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7088188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7088270Z xmask = xindex < xnumel 2023-01-11T21:38:06.7088342Z x0 = xindex 2023-01-11T21:38:06.7088459Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7088529Z tmp0 = 1 2023-01-11T21:38:06.7088609Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7088688Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7088822Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7088907Z ''') 2023-01-11T21:38:06.7088913Z 2023-01-11T21:38:06.7088917Z 2023-01-11T21:38:06.7089095Z triton_fused_lift_fresh_copy_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7089169Z import triton 2023-01-11T21:38:06.7089261Z import triton.language as tl 2023-01-11T21:38:06.7089396Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7089497Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7089630Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7089754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7089759Z 2023-01-11T21:38:06.7090141Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7090214Z @triton.jit 2023-01-11T21:38:06.7090335Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7090407Z xnumel = 1 2023-01-11T21:38:06.7090495Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7090623Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7090705Z xmask = xindex < xnumel 2023-01-11T21:38:06.7090775Z tmp0 = 5 2023-01-11T21:38:06.7090914Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp0, None) 2023-01-11T21:38:06.7091001Z ''') 2023-01-11T21:38:06.7091006Z 2023-01-11T21:38:06.7091011Z 2023-01-11T21:38:06.7091103Z async_compile.wait(globals()) 2023-01-11T21:38:06.7091184Z del async_compile 2023-01-11T21:38:06.7091189Z 2023-01-11T21:38:06.7091257Z def call(args): 
2023-01-11T21:38:06.7091332Z arg0_1, = args 2023-01-11T21:38:06.7091407Z args.clear() 2023-01-11T21:38:06.7091499Z with torch.cuda.device(0): 2023-01-11T21:38:06.7091697Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7091792Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7091930Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7091995Z del arg0_1 2023-01-11T21:38:06.7092182Z buf1 = empty_strided((), (), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7092326Z triton_fused_lift_fresh_copy_1_1.run(buf1, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7092411Z return (buf0, buf1, ) 2023-01-11T21:38:06.7092417Z 2023-01-11T21:38:06.7092421Z 2023-01-11T21:38:06.7092500Z if __name__ == "__main__": 2023-01-11T21:38:06.7092619Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7092772Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7092971Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7093075Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7093080Z 2023-01-11T21:38:06.7093151Z ok (0.179s) 2023-01-11T21:38:06.7093603Z test_tensor2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7093737Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7093993Z [2023-01-11 21:35:45,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 883 2023-01-11T21:38:06.7094258Z [2023-01-11 21:35:45,439] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 883 2023-01-11T21:38:06.7094859Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7094992Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7095252Z [2023-01-11 21:35:45,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 884 2023-01-11T21:38:06.7095557Z [2023-01-11 21:35:45,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 884 2023-01-11T21:38:06.7095562Z 2023-01-11T21:38:06.7095661Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7095729Z import torch 2023-01-11T21:38:06.7095802Z import random 2023-01-11T21:38:06.7095922Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7096046Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7096051Z 2023-01-11T21:38:06.7096132Z aten = torch.ops.aten 2023-01-11T21:38:06.7096268Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7096363Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7096368Z 2023-01-11T21:38:06.7096441Z import triton 2023-01-11T21:38:06.7096526Z import triton.language as tl 2023-01-11T21:38:06.7096651Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7096791Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7096951Z constant0 = None # 00601f3c4f47596913f89a8bc54b1f5991da93d930cde1b0ac60c4341d62906d 2023-01-11T21:38:06.7096957Z 2023-01-11T21:38:06.7096961Z 2023-01-11T21:38:06.7097114Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7097260Z import triton 2023-01-11T21:38:06.7097357Z import triton.language as tl 2023-01-11T21:38:06.7097465Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7097565Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7097697Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7097824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7097829Z 2023-01-11T21:38:06.7098243Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7098319Z @triton.jit 2023-01-11T21:38:06.7098462Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7098536Z xnumel = 19 2023-01-11T21:38:06.7098626Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7098752Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7098874Z xmask = xindex < xnumel 2023-01-11T21:38:06.7098950Z x0 = xindex 2023-01-11T21:38:06.7099047Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7099177Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.7099266Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7099338Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7099472Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7099557Z ''') 2023-01-11T21:38:06.7099562Z 2023-01-11T21:38:06.7099567Z 2023-01-11T21:38:06.7099658Z async_compile.wait(globals()) 2023-01-11T21:38:06.7099734Z del async_compile 2023-01-11T21:38:06.7099742Z 2023-01-11T21:38:06.7099817Z def call(args): 2023-01-11T21:38:06.7099889Z arg0_1, = args 2023-01-11T21:38:06.7099962Z args.clear() 2023-01-11T21:38:06.7100047Z with torch.cuda.device(0): 
2023-01-11T21:38:06.7100244Z buf0 = empty_strided((19, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7100340Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7100487Z triton_fused_add_0.run(constant0, arg0_1, buf0, 19, grid=grid(19), stream=stream0) 2023-01-11T21:38:06.7100559Z del arg0_1 2023-01-11T21:38:06.7100636Z return (buf0, ) 2023-01-11T21:38:06.7100641Z 2023-01-11T21:38:06.7100646Z 2023-01-11T21:38:06.7100725Z if __name__ == "__main__": 2023-01-11T21:38:06.7100843Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7100962Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7101164Z constant0 = rand_strided((19, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7101389Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7101504Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7101509Z 2023-01-11T21:38:06.7101514Z 2023-01-11T21:38:06.7101614Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7101690Z import torch 2023-01-11T21:38:06.7101768Z import random 2023-01-11T21:38:06.7101885Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7102012Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7102017Z 2023-01-11T21:38:06.7102101Z aten = torch.ops.aten 2023-01-11T21:38:06.7102239Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7102336Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7102342Z 2023-01-11T21:38:06.7102417Z import triton 2023-01-11T21:38:06.7102515Z import triton.language as tl 2023-01-11T21:38:06.7102642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7102775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7102936Z constant0 = None # 00601f3c4f47596913f89a8bc54b1f5991da93d930cde1b0ac60c4341d62906d 2023-01-11T21:38:06.7102941Z 2023-01-11T21:38:06.7102946Z 2023-01-11T21:38:06.7103101Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7103179Z import triton 2023-01-11T21:38:06.7103275Z import triton.language as tl 2023-01-11T21:38:06.7103391Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7103494Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7103628Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7103749Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7103755Z 2023-01-11T21:38:06.7104166Z @pointwise(size_hints=[32], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7104244Z @triton.jit 2023-01-11T21:38:06.7104389Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7104464Z xnumel = 19 2023-01-11T21:38:06.7104563Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7104696Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7104808Z xmask = xindex < xnumel 2023-01-11T21:38:06.7104876Z x0 = xindex 2023-01-11T21:38:06.7104976Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7105123Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None).to(tl.float32) 2023-01-11T21:38:06.7105213Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7105293Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7105430Z tl.store(out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7105517Z ''') 2023-01-11T21:38:06.7105522Z 2023-01-11T21:38:06.7105527Z 2023-01-11T21:38:06.7105614Z async_compile.wait(globals()) 2023-01-11T21:38:06.7105698Z del async_compile 2023-01-11T21:38:06.7105703Z 2023-01-11T21:38:06.7105783Z def call(args): 2023-01-11T21:38:06.7105858Z arg0_1, = args 2023-01-11T21:38:06.7105935Z args.clear() 2023-01-11T21:38:06.7106028Z with torch.cuda.device(0): 2023-01-11T21:38:06.7106228Z buf0 = empty_strided((19, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7106319Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7106471Z triton_fused_add_0.run(constant0, arg0_1, buf0, 19, grid=grid(19), stream=stream0) 2023-01-11T21:38:06.7106547Z del arg0_1 2023-01-11T21:38:06.7106626Z return (buf0, ) 2023-01-11T21:38:06.7106631Z 2023-01-11T21:38:06.7106636Z 2023-01-11T21:38:06.7106718Z if __name__ == "__main__": 2023-01-11T21:38:06.7106839Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7106967Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7107171Z constant0 = rand_strided((19, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7107399Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7107513Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7107518Z 2023-01-11T21:38:06.7107591Z ok (0.170s) 2023-01-11T21:38:06.7108049Z test_tensor3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7108182Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7108442Z [2023-01-11 21:35:45,552] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 885 2023-01-11T21:38:06.7108705Z [2023-01-11 21:35:45,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 885 2023-01-11T21:38:06.7109131Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7109264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7109522Z [2023-01-11 21:35:45,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 886 2023-01-11T21:38:06.7109528Z 2023-01-11T21:38:06.7109628Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7109698Z import torch 2023-01-11T21:38:06.7109777Z import random 2023-01-11T21:38:06.7109902Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7110029Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7110037Z 2023-01-11T21:38:06.7110120Z aten = torch.ops.aten 2023-01-11T21:38:06.7110260Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7110358Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7110364Z 2023-01-11T21:38:06.7110440Z import triton 2023-01-11T21:38:06.7110528Z import triton.language as tl 2023-01-11T21:38:06.7110688Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7110833Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7110839Z 2023-01-11T21:38:06.7110843Z 2023-01-11T21:38:06.7111001Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7111080Z import triton 2023-01-11T21:38:06.7111174Z import triton.language as tl 2023-01-11T21:38:06.7111291Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7111388Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7111522Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7111654Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7111659Z 2023-01-11T21:38:06.7112042Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7112121Z @triton.jit 2023-01-11T21:38:06.7112246Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7112322Z xnumel = 2 2023-01-11T21:38:06.7112420Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7112545Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7112630Z xmask = xindex < xnumel 2023-01-11T21:38:06.7112703Z x0 = xindex 2023-01-11T21:38:06.7112776Z tmp0 = x0 2023-01-11T21:38:06.7112849Z tmp1 = 1 2023-01-11T21:38:06.7112929Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7113002Z tmp3 = 2 2023-01-11T21:38:06.7113094Z tmp4 = tl.where(tmp2, tmp1, tmp3) 2023-01-11T21:38:06.7113202Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.7113340Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7113428Z ''') 2023-01-11T21:38:06.7113434Z 2023-01-11T21:38:06.7113438Z 2023-01-11T21:38:06.7113597Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7113675Z import triton 2023-01-11T21:38:06.7113772Z import triton.language as tl 2023-01-11T21:38:06.7113883Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7113987Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7114121Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7114248Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7114253Z 2023-01-11T21:38:06.7114638Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 
'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7114716Z @triton.jit 2023-01-11T21:38:06.7114839Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7114913Z xnumel = 3 2023-01-11T21:38:06.7115005Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7115137Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7115225Z xmask = xindex < xnumel 2023-01-11T21:38:06.7115298Z x0 = xindex 2023-01-11T21:38:06.7115373Z tmp0 = x0 2023-01-11T21:38:06.7115444Z tmp1 = 1 2023-01-11T21:38:06.7115529Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7115594Z tmp3 = 2 2023-01-11T21:38:06.7115673Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7115744Z tmp5 = 3 2023-01-11T21:38:06.7115843Z tmp6 = tl.where(tmp4, tmp3, tmp5) 2023-01-11T21:38:06.7115939Z tmp7 = tl.where(tmp2, tmp1, tmp6) 2023-01-11T21:38:06.7116019Z tmp8 = tmp7 + tmp3 2023-01-11T21:38:06.7116158Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7116239Z ''') 2023-01-11T21:38:06.7116247Z 2023-01-11T21:38:06.7116251Z 2023-01-11T21:38:06.7116410Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7116486Z import triton 2023-01-11T21:38:06.7116581Z import triton.language as tl 2023-01-11T21:38:06.7116697Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7116829Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7116964Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7117082Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7117087Z 2023-01-11T21:38:06.7117486Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7117559Z @triton.jit 2023-01-11T21:38:06.7117692Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7117764Z xnumel = 4 2023-01-11T21:38:06.7117864Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7117996Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7118078Z xmask = xindex < xnumel 2023-01-11T21:38:06.7118142Z x0 = xindex 2023-01-11T21:38:06.7118240Z tmp12 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7118315Z tmp0 = x0 2023-01-11T21:38:06.7118385Z tmp1 = 2 2023-01-11T21:38:06.7118463Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7118531Z tmp3 = 1 2023-01-11T21:38:06.7118608Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7118697Z tmp5 = tl.where(tmp4, tmp3, tmp1) 2023-01-11T21:38:06.7118768Z tmp6 = 3 2023-01-11T21:38:06.7118845Z tmp7 = tmp0 < tmp6 2023-01-11T21:38:06.7118914Z tmp8 = 4 2023-01-11T21:38:06.7119010Z tmp9 = tl.where(tmp7, tmp6, tmp8) 2023-01-11T21:38:06.7119105Z tmp10 = tl.where(tmp2, tmp5, tmp9) 2023-01-11T21:38:06.7126077Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7126177Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.7126385Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7126498Z ''') 2023-01-11T21:38:06.7126504Z 2023-01-11T21:38:06.7126509Z 2023-01-11T21:38:06.7126608Z async_compile.wait(globals()) 2023-01-11T21:38:06.7126691Z del async_compile 2023-01-11T21:38:06.7126696Z 2023-01-11T21:38:06.7126779Z def call(args): 2023-01-11T21:38:06.7126857Z arg0_1, = args 2023-01-11T21:38:06.7126929Z args.clear() 2023-01-11T21:38:06.7127026Z 
with torch.cuda.device(0): 2023-01-11T21:38:06.7127228Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7127325Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7127461Z triton_fused_add_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.7127657Z buf1 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7127785Z triton_fused_add_1_1.run(buf1, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.7127980Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7128123Z triton_fused_add_2_2.run(arg0_1, buf2, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7128198Z del arg0_1 2023-01-11T21:38:06.7128397Z buf3 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7128498Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.7128503Z 2023-01-11T21:38:06.7128508Z 2023-01-11T21:38:06.7128590Z if __name__ == "__main__": 2023-01-11T21:38:06.7128712Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7128841Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7129032Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7129148Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7129153Z 2023-01-11T21:38:06.7129420Z [2023-01-11 21:35:45,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 886 2023-01-11T21:38:06.7129429Z 2023-01-11T21:38:06.7129527Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7129602Z import torch 2023-01-11T21:38:06.7129676Z import random 2023-01-11T21:38:06.7129796Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7129955Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7129961Z 2023-01-11T21:38:06.7130043Z aten = torch.ops.aten 2023-01-11T21:38:06.7130173Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7130272Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7130277Z 2023-01-11T21:38:06.7130351Z import triton 2023-01-11T21:38:06.7130443Z import triton.language as tl 2023-01-11T21:38:06.7130569Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7130710Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7130715Z 2023-01-11T21:38:06.7130719Z 2023-01-11T21:38:06.7130872Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7130944Z import triton 2023-01-11T21:38:06.7131036Z import triton.language as tl 2023-01-11T21:38:06.7131149Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7131250Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7131381Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7131507Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7131512Z 2023-01-11T21:38:06.7131898Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7131972Z @triton.jit 2023-01-11T21:38:06.7132088Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7132163Z xnumel = 2 2023-01-11T21:38:06.7132263Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7132392Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7132518Z xmask = 
xindex < xnumel 2023-01-11T21:38:06.7132590Z x0 = xindex 2023-01-11T21:38:06.7132661Z tmp0 = x0 2023-01-11T21:38:06.7132724Z tmp1 = 1 2023-01-11T21:38:06.7132803Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7132876Z tmp3 = 2 2023-01-11T21:38:06.7132974Z tmp4 = tl.where(tmp2, tmp1, tmp3) 2023-01-11T21:38:06.7133053Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.7133188Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7133273Z ''') 2023-01-11T21:38:06.7133279Z 2023-01-11T21:38:06.7133283Z 2023-01-11T21:38:06.7133433Z triton_fused_add_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7133507Z import triton 2023-01-11T21:38:06.7133599Z import triton.language as tl 2023-01-11T21:38:06.7133716Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7133817Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7133949Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7134077Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7134081Z 2023-01-11T21:38:06.7134702Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*i64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.7134772Z @triton.jit 2023-01-11T21:38:06.7134894Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7134966Z xnumel = 3 2023-01-11T21:38:06.7135063Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7135191Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7135275Z xmask = xindex < xnumel 2023-01-11T21:38:06.7135345Z x0 = xindex 2023-01-11T21:38:06.7135409Z tmp0 = x0 2023-01-11T21:38:06.7135479Z tmp1 = 1 2023-01-11T21:38:06.7135558Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7135627Z tmp3 = 2 2023-01-11T21:38:06.7135705Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7135779Z tmp5 = 3 2023-01-11T21:38:06.7135874Z tmp6 = tl.where(tmp4, tmp3, tmp5) 2023-01-11T21:38:06.7135962Z tmp7 = tl.where(tmp2, tmp1, tmp6) 2023-01-11T21:38:06.7136041Z tmp8 = tmp7 + tmp3 2023-01-11T21:38:06.7136177Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7136320Z ''') 2023-01-11T21:38:06.7136326Z 2023-01-11T21:38:06.7136331Z 2023-01-11T21:38:06.7136494Z triton_fused_add_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7136568Z import triton 2023-01-11T21:38:06.7136663Z import triton.language as tl 2023-01-11T21:38:06.7136771Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7136872Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7137003Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7137174Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7137180Z 2023-01-11T21:38:06.7137624Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7137704Z @triton.jit 2023-01-11T21:38:06.7137848Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7137923Z xnumel = 4 2023-01-11T21:38:06.7138018Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7138158Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7138244Z xmask = xindex < xnumel 2023-01-11T21:38:06.7138315Z x0 = xindex 2023-01-11T21:38:06.7138442Z 
tmp12 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7138513Z tmp0 = x0 2023-01-11T21:38:06.7138582Z tmp1 = 2 2023-01-11T21:38:06.7138656Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7138728Z tmp3 = 1 2023-01-11T21:38:06.7138806Z tmp4 = tmp0 < tmp3 2023-01-11T21:38:06.7138906Z tmp5 = tl.where(tmp4, tmp3, tmp1) 2023-01-11T21:38:06.7139072Z tmp6 = 3 2023-01-11T21:38:06.7139153Z tmp7 = tmp0 < tmp6 2023-01-11T21:38:06.7139225Z tmp8 = 4 2023-01-11T21:38:06.7139319Z tmp9 = tl.where(tmp7, tmp6, tmp8) 2023-01-11T21:38:06.7139422Z tmp10 = tl.where(tmp2, tmp5, tmp9) 2023-01-11T21:38:06.7139519Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7139609Z tmp13 = tmp11 + tmp12 2023-01-11T21:38:06.7139761Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7139854Z ''') 2023-01-11T21:38:06.7139860Z 2023-01-11T21:38:06.7139864Z 2023-01-11T21:38:06.7139962Z async_compile.wait(globals()) 2023-01-11T21:38:06.7140035Z del async_compile 2023-01-11T21:38:06.7140041Z 2023-01-11T21:38:06.7140119Z def call(args): 2023-01-11T21:38:06.7140194Z arg0_1, = args 2023-01-11T21:38:06.7140272Z args.clear() 2023-01-11T21:38:06.7140370Z with torch.cuda.device(0): 2023-01-11T21:38:06.7140591Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7140688Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7140813Z triton_fused_add_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.7141005Z buf1 = empty_strided((3, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7141141Z triton_fused_add_1_1.run(buf1, 3, grid=grid(3), stream=stream0) 2023-01-11T21:38:06.7141340Z buf2 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7141477Z triton_fused_add_2_2.run(arg0_1, buf2, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7141552Z del arg0_1 2023-01-11T21:38:06.7141749Z buf3 = empty_strided((0, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7141852Z return (buf3, buf0, buf1, buf2, ) 2023-01-11T21:38:06.7141857Z 2023-01-11T21:38:06.7141862Z 2023-01-11T21:38:06.7141936Z if __name__ == "__main__": 2023-01-11T21:38:06.7142057Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7142186Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7142383Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7142503Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7142508Z 2023-01-11T21:38:06.7142581Z ok (0.312s) 2023-01-11T21:38:06.7143078Z test_tmp_not_defined_issue1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7143212Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7143578Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
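The traceback that follows comes from AOTAutograd's metadata-collection pass (run_functionalized_fw_and_collect_metadata in torch/_functorch/aot_autograd.py), which traces a functionalized copy of the forward before Inductor compiles it. The functionalization fallback kernel only accepts ops whose schema carries no alias or mutation annotations, and torch.ops.prims.broadcast_in_dim.default trips its !schema.hasAnyAliasInfo() assert, the known prims limitation the warning above names; compilation then proceeds without the metadata, hence "produced code may be suboptimal". Outside functionalization the op itself is well defined; a minimal sketch of its semantics, where the (1, 512) input shape is an assumption inferred from the broadcast arguments [1, 512, 1] / [0, 1] reported further down:

import torch

# Hypothetical stand-in for %var_default_1: assumed (1, 512), since
# broadcast_dimensions [0, 1] map its two dims onto output shape [1, 512, 1].
t = torch.randn(1, 512)
out = torch.ops.prims.broadcast_in_dim.default(t, [1, 512, 1], [0, 1])
print(out.shape)  # torch.Size([1, 512, 1])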
2023-01-11T21:38:06.7143678Z Traceback (most recent call last): 2023-01-11T21:38:06.7143946Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.7144116Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.7144363Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.7144443Z outs = f(*f_args) 2023-01-11T21:38:06.7144705Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.7144834Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.7145064Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.7145193Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.7145421Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.7145557Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.7145837Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.7145930Z return target(*args, **kwargs) 2023-01-11T21:38:06.7146147Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7146251Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7146510Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.7146600Z return func(*args, **kwargs) 2023-01-11T21:38:06.7146808Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7146911Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7147157Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.7147283Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.7147542Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.7147646Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.7147907Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.7147998Z return prim(*args, **kwargs) 2023-01-11T21:38:06.7148205Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7148309Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7148777Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.7148786Z 2023-01-11T21:38:06.7149028Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.7149140Z Original traceback: 2023-01-11T21:38:06.7149219Z Module stack: {} 2023-01-11T21:38:06.7149390Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.7149548Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.7149711Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.7149792Z return model(*ex, **kwargs) 2023-01-11T21:38:06.7149798Z 2023-01-11T21:38:06.7150057Z [2023-01-11 21:35:46,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 887 2023-01-11T21:38:06.7150270Z [2023-01-11 21:35:46,081] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7150476Z [2023-01-11 21:35:46,087] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7150743Z [2023-01-11 21:35:46,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 887 2023-01-11T21:38:06.7151167Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7151299Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7151657Z Failed to collect metadata on function, produced code may be suboptimal. Known situations this can occur are inference mode only compilation involving resize_ or prims (!schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED); if your situation looks different please file a bug to PyTorch. 
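The same TypedStorage warning and metadata-collection failure repeat here because the harness compiles the test again, apparently in float16 (the paired generated modules throughout this log show an fp32 run followed by an fp16 run). The deprecated call is the helper at test_torchinductor.py:246, which sizes a flat view of x via x.storage().size(). A minimal sketch of that line written against UntypedStorage instead, as the warning itself suggests; the division by element_size() is needed because untyped_storage().size() counts bytes where storage().size() counted elements:

import torch

x = torch.randn(4)
# untyped_storage().size() is in bytes; recover the element count explicitly.
numel = x.untyped_storage().size() // x.element_size()
buffer = torch.as_strided(x, (numel,), (1,), 0).clone()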
2023-01-11T21:38:06.7151786Z Traceback (most recent call last): 2023-01-11T21:38:06.7152058Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1273, in aot_wrapper_dedupe 2023-01-11T21:38:06.7152221Z fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_collect_metadata( 2023-01-11T21:38:06.7152466Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 289, in inner 2023-01-11T21:38:06.7152546Z outs = f(*f_args) 2023-01-11T21:38:06.7152807Z File "/opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 2327, in functional_call 2023-01-11T21:38:06.7152934Z out = Interpreter(mod).run(*args[params_len:], **kwargs) 2023-01-11T21:38:06.7153165Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 136, in run 2023-01-11T21:38:06.7153268Z self.env[node] = self.run_node(node) 2023-01-11T21:38:06.7153506Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 177, in run_node 2023-01-11T21:38:06.7153618Z return getattr(self, n.op)(n.target, args, kwargs) 2023-01-11T21:38:06.7153867Z File "/opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py", line 249, in call_function 2023-01-11T21:38:06.7153961Z return target(*args, **kwargs) 2023-01-11T21:38:06.7154175Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7154279Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7154535Z File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/overrides.py", line 36, in __torch_function__ 2023-01-11T21:38:06.7154626Z return func(*args, **kwargs) 2023-01-11T21:38:06.7154842Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7154938Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7155187Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims/__init__.py", line 285, in _autograd_impl 2023-01-11T21:38:06.7155316Z return backwards_not_supported(_prim)(*args, **kwargs) 2023-01-11T21:38:06.7155573Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 309, in _autograd_impl 2023-01-11T21:38:06.7155704Z return redispatch_prim(args, kwargs) 2023-01-11T21:38:06.7155969Z File "/opt/conda/lib/python3.10/site-packages/torch/_prims_common/wrappers.py", line 279, in redispatch_prim 2023-01-11T21:38:06.7156062Z return prim(*args, **kwargs) 2023-01-11T21:38:06.7156283Z File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 284, in __call__ 2023-01-11T21:38:06.7156380Z return self._op(*args, **kwargs or {}) 2023-01-11T21:38:06.7156848Z RuntimeError: !schema.hasAnyAliasInfo() INTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/aten/src/ATen/FunctionalizeFallbackKernel.cpp":32, please report a bug to PyTorch. 
mutating and aliasing ops should all have codegen'd kernels 2023-01-11T21:38:06.7156857Z 2023-01-11T21:38:06.7157100Z While executing %broadcast_in_dim_default : [#users=1] = call_function[target=torch.ops.prims.broadcast_in_dim.default](args = (%var_default_1, [1, 512, 1], [0, 1]), kwargs = {}) 2023-01-11T21:38:06.7157182Z Original traceback: 2023-01-11T21:38:06.7157262Z Module stack: {} 2023-01-11T21:38:06.7157428Z File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 4723, in forward 2023-01-11T21:38:06.7157582Z broadcast_in_dim_default_2 = torch.ops.prims.broadcast_in_dim.default( 2023-01-11T21:38:06.7157748Z | File "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 318, in run 2023-01-11T21:38:06.7157835Z return model(*ex, **kwargs) 2023-01-11T21:38:06.7157840Z 2023-01-11T21:38:06.7158091Z [2023-01-11 21:35:46,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 888 2023-01-11T21:38:06.7158300Z [2023-01-11 21:35:46,542] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7158533Z [2023-01-11 21:35:46,548] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7158539Z 2023-01-11T21:38:06.7158637Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7158712Z import torch 2023-01-11T21:38:06.7158788Z import random 2023-01-11T21:38:06.7158911Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7159034Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7159040Z 2023-01-11T21:38:06.7159115Z aten = torch.ops.aten 2023-01-11T21:38:06.7159252Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7159347Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7159352Z 2023-01-11T21:38:06.7159426Z import triton 2023-01-11T21:38:06.7159518Z import triton.language as tl 2023-01-11T21:38:06.7159645Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7159786Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7159795Z 2023-01-11T21:38:06.7159799Z 2023-01-11T21:38:06.7159953Z triton_fused_var_0 = async_compile.triton(''' 2023-01-11T21:38:06.7160021Z import triton 2023-01-11T21:38:06.7160114Z import triton.language as tl 2023-01-11T21:38:06.7160229Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7160333Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7160466Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7160592Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7160597Z 2023-01-11T21:38:06.7160691Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7160799Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7160884Z filename=__file__, 2023-01-11T21:38:06.7161261Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7161341Z @triton.jit 2023-01-11T21:38:06.7161510Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7161585Z xnumel = 512 2023-01-11T21:38:06.7161658Z rnumel = 1024 2023-01-11T21:38:06.7161782Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7161913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7161998Z xmask = xindex < xnumel 2023-01-11T21:38:06.7162116Z 
rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7162188Z x0 = xindex 2023-01-11T21:38:06.7162310Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7162416Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7162505Z rindex = roffset + rbase 2023-01-11T21:38:06.7162583Z rmask = rindex < rnumel 2023-01-11T21:38:06.7162655Z r1 = rindex 2023-01-11T21:38:06.7162880Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7163006Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7163123Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7163239Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7163345Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7163426Z rindex = roffset + rbase 2023-01-11T21:38:06.7163511Z rmask = rindex < rnumel 2023-01-11T21:38:06.7163584Z r1 = rindex 2023-01-11T21:38:06.7163702Z tmp2 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7163776Z tmp3 = 1024 2023-01-11T21:38:06.7163860Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7163975Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7164048Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7164168Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7164309Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7164383Z tmp8 = 1024 2023-01-11T21:38:06.7164462Z tmp9 = tmp7 / tmp8 2023-01-11T21:38:06.7164601Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp9, xmask) 2023-01-11T21:38:06.7164689Z ''') 2023-01-11T21:38:06.7164694Z 2023-01-11T21:38:06.7164699Z 2023-01-11T21:38:06.7164910Z triton_fused_add_add_1_mul_mul_1_sub_sum_1_var_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7164980Z import triton 2023-01-11T21:38:06.7165095Z import triton.language as tl 2023-01-11T21:38:06.7165220Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7165335Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7165466Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7165591Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7165596Z 2023-01-11T21:38:06.7165689Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7165796Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7165884Z filename=__file__, 2023-01-11T21:38:06.7166352Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: '*fp32', 6: '*fp32', 7: '*fp32', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.7166427Z @triton.jit 2023-01-11T21:38:06.7166645Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7166719Z xnumel = 512 2023-01-11T21:38:06.7166792Z rnumel = 1024 2023-01-11T21:38:06.7166889Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7167018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7167104Z xmask = xindex < xnumel 2023-01-11T21:38:06.7167222Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7167297Z x0 = xindex 2023-01-11T21:38:06.7167394Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7167490Z tmp3 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.7167596Z 
for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7167676Z rindex = roffset + rbase 2023-01-11T21:38:06.7167795Z rmask = rindex < rnumel 2023-01-11T21:38:06.7167868Z r1 = rindex 2023-01-11T21:38:06.7167985Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7168086Z tmp5 = tl.load(in_ptr3 + (r1), rmask) 2023-01-11T21:38:06.7168187Z tmp7 = tl.load(in_ptr4 + (r1), rmask) 2023-01-11T21:38:06.7168304Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.7168378Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7168458Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7168538Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.7168697Z tl.store(out_ptr0 + (r1 + (1024*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp8, rmask & xmask) 2023-01-11T21:38:06.7168822Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7168927Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7169014Z rindex = roffset + rbase 2023-01-11T21:38:06.7169092Z rmask = rindex < rnumel 2023-01-11T21:38:06.7169167Z r1 = rindex 2023-01-11T21:38:06.7169387Z tmp9 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7169511Z _tmp10 = tl.where(xmask & rmask, _tmp10 + tmp9, _tmp10) 2023-01-11T21:38:06.7169627Z tmp10 = tl.reshape(tl.sum(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7169745Z _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7169860Z _tmp18 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7169967Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7170048Z rindex = roffset + rbase 2023-01-11T21:38:06.7170132Z rmask = rindex < rnumel 2023-01-11T21:38:06.7170236Z r1 = rindex 2023-01-11T21:38:06.7170461Z tmp11 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7170583Z tmp17 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7170658Z tmp12 = 1024 2023-01-11T21:38:06.7170745Z tmp13 = tmp10 / tmp12 2023-01-11T21:38:06.7170856Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7170939Z tmp15 = tmp14 * tmp14 2023-01-11T21:38:06.7171062Z _tmp16 = tl.where(xmask & rmask, _tmp16 + tmp15, _tmp16) 2023-01-11T21:38:06.7171184Z _tmp18 = tl.where(xmask & rmask, _tmp18 + tmp17, _tmp18) 2023-01-11T21:38:06.7171300Z tmp16 = tl.reshape(tl.sum(_tmp16, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7171415Z tmp18 = tl.reshape(tl.sum(_tmp18, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7171513Z tl.store(out_ptr2 + x0, tmp18, xmask) 2023-01-11T21:38:06.7171580Z tmp19 = 1024 2023-01-11T21:38:06.7171665Z tmp20 = tmp16 / tmp19 2023-01-11T21:38:06.7171769Z tmp21 = 1e-05 2023-01-11T21:38:06.7171850Z tmp22 = tmp20 + tmp21 2023-01-11T21:38:06.7171991Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7172080Z ''') 2023-01-11T21:38:06.7172085Z 2023-01-11T21:38:06.7172090Z 2023-01-11T21:38:06.7172187Z async_compile.wait(globals()) 2023-01-11T21:38:06.7172257Z del async_compile 2023-01-11T21:38:06.7172262Z 2023-01-11T21:38:06.7172337Z def call(args): 2023-01-11T21:38:06.7172447Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.7172523Z args.clear() 2023-01-11T21:38:06.7172617Z with torch.cuda.device(0): 2023-01-11T21:38:06.7172826Z buf1 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7172918Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7173004Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7173144Z triton_fused_var_0.run(buf2, arg3_1, 512, 1024, grid=grid(512), 
stream=stream0) 2023-01-11T21:38:06.7173221Z del arg3_1 2023-01-11T21:38:06.7173446Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7173649Z buf5 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7173880Z buf6 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7174008Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.7174209Z triton_fused_add_add_1_mul_mul_1_sub_sum_1_var_1_1.run(buf7, arg2_1, arg4_1, arg5_1, arg0_1, arg1_1, buf3, buf6, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7174289Z del arg0_1 2023-01-11T21:38:06.7174357Z del arg1_1 2023-01-11T21:38:06.7174432Z del arg2_1 2023-01-11T21:38:06.7174631Z del arg4_1 2023-01-11T21:38:06.7174705Z del arg5_1 2023-01-11T21:38:06.7174798Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.7174808Z 2023-01-11T21:38:06.7174812Z 2023-01-11T21:38:06.7174895Z if __name__ == "__main__": 2023-01-11T21:38:06.7175019Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7175167Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7175400Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7175604Z arg1_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7175828Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176046Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176261Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176471Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7176677Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.7176683Z 2023-01-11T21:38:06.7176942Z [2023-01-11 21:35:46,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 888 2023-01-11T21:38:06.7176956Z 2023-01-11T21:38:06.7177051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7177177Z import torch 2023-01-11T21:38:06.7177257Z import random 2023-01-11T21:38:06.7177377Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7177504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7177510Z 2023-01-11T21:38:06.7177593Z aten = torch.ops.aten 2023-01-11T21:38:06.7177730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7177818Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7177823Z 2023-01-11T21:38:06.7177900Z import triton 2023-01-11T21:38:06.7177993Z import triton.language as tl 2023-01-11T21:38:06.7178121Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7178259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7178264Z 2023-01-11T21:38:06.7178269Z 2023-01-11T21:38:06.7178422Z triton_fused_var_0 = async_compile.triton(''' 2023-01-11T21:38:06.7178497Z import triton 2023-01-11T21:38:06.7178592Z import triton.language as tl 2023-01-11T21:38:06.7178699Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7178800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7178931Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7179055Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7179060Z 2023-01-11T21:38:06.7179153Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7179267Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7179354Z filename=__file__, 2023-01-11T21:38:06.7179729Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7179800Z @triton.jit 2023-01-11T21:38:06.7179971Z def triton_(in_out_ptr0, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7180101Z xnumel = 512 2023-01-11T21:38:06.7180178Z rnumel = 1024 2023-01-11T21:38:06.7180277Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7180414Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7180497Z xmask = xindex < xnumel 2023-01-11T21:38:06.7180610Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7180680Z x0 = xindex 2023-01-11T21:38:06.7180799Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7180905Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7180992Z rindex = roffset + rbase 2023-01-11T21:38:06.7181081Z rmask = rindex < rnumel 2023-01-11T21:38:06.7181155Z r1 = rindex 2023-01-11T21:38:06.7181391Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7181482Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7181604Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7181718Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7181839Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7181944Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7182032Z rindex = roffset + rbase 2023-01-11T21:38:06.7182110Z rmask = rindex < rnumel 2023-01-11T21:38:06.7182184Z r1 = rindex 2023-01-11T21:38:06.7182316Z tmp3 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7182393Z tmp4 = 1024 2023-01-11T21:38:06.7182477Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.7182595Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.7182709Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.7182783Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.7182904Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7183018Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7183097Z tmp10 = 1024 2023-01-11T21:38:06.7183180Z tmp11 = tmp9 / tmp10 2023-01-11T21:38:06.7183321Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7183405Z ''') 2023-01-11T21:38:06.7183411Z 2023-01-11T21:38:06.7183415Z 2023-01-11T21:38:06.7183652Z triton_fused_add_add_1_convert_element_type_mul_mul_1_sub_sum_1_var_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7183720Z import triton 2023-01-11T21:38:06.7183814Z import triton.language as tl 2023-01-11T21:38:06.7183928Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7184031Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7184167Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7184295Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7184301Z 2023-01-11T21:38:06.7184397Z @reduction(size_hints=[512, 1024], 2023-01-11T21:38:06.7184504Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7184593Z 
filename=__file__, 2023-01-11T21:38:06.7185060Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: '*fp16', 6: '*fp32', 7: '*fp32', 8: 'i32', 9: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), equal_to_1=())]}) 2023-01-11T21:38:06.7185135Z @triton.jit 2023-01-11T21:38:06.7185350Z def triton_(in_out_ptr0, in_ptr0, in_ptr1, in_ptr2, in_ptr3, in_ptr4, out_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7185426Z xnumel = 512 2023-01-11T21:38:06.7185504Z rnumel = 1024 2023-01-11T21:38:06.7185603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7185730Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7185813Z xmask = xindex < xnumel 2023-01-11T21:38:06.7185931Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7186036Z x0 = xindex 2023-01-11T21:38:06.7186155Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7186271Z tmp3 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7186375Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7186456Z rindex = roffset + rbase 2023-01-11T21:38:06.7186543Z rmask = rindex < rnumel 2023-01-11T21:38:06.7186615Z r1 = rindex 2023-01-11T21:38:06.7186747Z tmp0 = tl.load(in_ptr0 + (r1 + (1024*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7186865Z tmp5 = tl.load(in_ptr3 + (r1), rmask).to(tl.float32) 2023-01-11T21:38:06.7186984Z tmp7 = tl.load(in_ptr4 + (r1), rmask).to(tl.float32) 2023-01-11T21:38:06.7187104Z tmp2 = tmp0 - tmp1 2023-01-11T21:38:06.7187179Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7187260Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7187338Z tmp8 = tmp6 + tmp7 2023-01-11T21:38:06.7187429Z tmp9 = tmp8.to(tl.float32) 2023-01-11T21:38:06.7187593Z tl.store(out_ptr0 + (r1 + (1024*x0) + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp9, rmask & xmask) 2023-01-11T21:38:06.7187713Z _tmp11 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7187818Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7187906Z rindex = roffset + rbase 2023-01-11T21:38:06.7187984Z rmask = rindex < rnumel 2023-01-11T21:38:06.7188059Z r1 = rindex 2023-01-11T21:38:06.7188282Z tmp10 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7188405Z _tmp11 = tl.where(xmask & rmask, _tmp11 + tmp10, _tmp11) 2023-01-11T21:38:06.7188552Z tmp11 = tl.reshape(tl.sum(_tmp11, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7188669Z _tmp17 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7188785Z _tmp19 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7188883Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7188973Z rindex = roffset + rbase 2023-01-11T21:38:06.7189059Z rmask = rindex < rnumel 2023-01-11T21:38:06.7189130Z r1 = rindex 2023-01-11T21:38:06.7189351Z tmp12 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7189472Z tmp18 = tl.load(out_ptr0 + (r1 + (1024*x0)), rmask & xmask) 2023-01-11T21:38:06.7189547Z tmp13 = 1024 2023-01-11T21:38:06.7189624Z tmp14 = tmp11 / tmp13 2023-01-11T21:38:06.7189741Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7189821Z tmp16 = tmp15 * tmp15 2023-01-11T21:38:06.7189944Z _tmp17 = tl.where(xmask & rmask, _tmp17 + tmp16, _tmp17) 2023-01-11T21:38:06.7190069Z _tmp19 = tl.where(xmask & rmask, _tmp19 + tmp18, 
_tmp19) 2023-01-11T21:38:06.7190184Z tmp17 = tl.reshape(tl.sum(_tmp17, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7190297Z tmp19 = tl.reshape(tl.sum(_tmp19, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7190394Z tl.store(out_ptr2 + x0, tmp19, xmask) 2023-01-11T21:38:06.7190468Z tmp20 = 1024 2023-01-11T21:38:06.7190555Z tmp21 = tmp17 / tmp20 2023-01-11T21:38:06.7190659Z tmp22 = 1e-05 2023-01-11T21:38:06.7190741Z tmp23 = tmp21 + tmp22 2023-01-11T21:38:06.7190880Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp23, xmask) 2023-01-11T21:38:06.7190963Z ''') 2023-01-11T21:38:06.7190970Z 2023-01-11T21:38:06.7190975Z 2023-01-11T21:38:06.7191067Z async_compile.wait(globals()) 2023-01-11T21:38:06.7191137Z del async_compile 2023-01-11T21:38:06.7191142Z 2023-01-11T21:38:06.7191217Z def call(args): 2023-01-11T21:38:06.7191327Z arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1 = args 2023-01-11T21:38:06.7191404Z args.clear() 2023-01-11T21:38:06.7191499Z with torch.cuda.device(0): 2023-01-11T21:38:06.7191704Z buf1 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7191794Z buf2 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7191906Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7192049Z triton_fused_var_0.run(buf2, arg3_1, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7192123Z del arg3_1 2023-01-11T21:38:06.7192346Z buf3 = empty_strided((1, 512, 1024), (524288, 1024, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192548Z buf5 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192742Z buf6 = empty_strided((1, 512), (512, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7192864Z buf7 = as_strided(buf5, (1, 512, 1), (512, 1, 1)); del buf5 # reuse 2023-01-11T21:38:06.7193086Z triton_fused_add_add_1_convert_element_type_mul_mul_1_sub_sum_1_var_1_1.run(buf7, arg2_1, arg4_1, arg5_1, arg0_1, arg1_1, buf3, buf6, 512, 1024, grid=grid(512), stream=stream0) 2023-01-11T21:38:06.7193153Z del arg0_1 2023-01-11T21:38:06.7193225Z del arg1_1 2023-01-11T21:38:06.7193298Z del arg2_1 2023-01-11T21:38:06.7193372Z del arg4_1 2023-01-11T21:38:06.7193442Z del arg5_1 2023-01-11T21:38:06.7193531Z return (buf2, buf6, buf7, ) 2023-01-11T21:38:06.7193537Z 2023-01-11T21:38:06.7193542Z 2023-01-11T21:38:06.7193622Z if __name__ == "__main__": 2023-01-11T21:38:06.7193735Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7193864Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7194066Z arg0_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194265Z arg1_1 = rand_strided((1024, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194513Z arg2_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194726Z arg3_1 = rand_strided((1, 512, 1024), (524288, 1024, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7194934Z arg4_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7195144Z arg5_1 = rand_strided((1, 512, 1), (512, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7195284Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1, arg3_1, arg4_1, arg5_1])) 2023-01-11T21:38:06.7195298Z 2023-01-11T21:38:06.7195362Z ok (0.827s) 2023-01-11T21:38:06.7195832Z test_tmp_not_defined_issue2_cuda (__main__.CudaTests) ... 
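Every wrapper above launches its kernels as kernel.run(..., grid=grid(xnumel), stream=stream0), with grid imported from torch._inductor.triton_ops.autotune. A sketch of the 1-D convention, under the assumption (not checked against this exact revision) that it ceil-divides the element count by the autotuned block size; grid_1d is a hypothetical stand-in, not the library function:

# Hypothetical sketch: one Triton program per XBLOCK-sized slice of xnumel.
def grid_1d(xnumel):
    def grid_fn(meta):
        # meta["XBLOCK"] is chosen by the @pointwise/@reduction autotuner.
        return ((xnumel + meta["XBLOCK"] - 1) // meta["XBLOCK"],)
    return grid_fn

With xnumel=512 and, say, XBLOCK=1 this yields 512 programs; the xmask = xindex < xnumel guard in the kernels above covers the ragged tail whenever XBLOCK does not divide xnumel.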
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7195966Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7196225Z [2023-01-11 21:35:46,725] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 889 2023-01-11T21:38:06.7196494Z [2023-01-11 21:35:46,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 889 2023-01-11T21:38:06.7196910Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7197042Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7197297Z [2023-01-11 21:35:46,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 890 2023-01-11T21:38:06.7197305Z 2023-01-11T21:38:06.7197403Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7197479Z import torch 2023-01-11T21:38:06.7197547Z import random 2023-01-11T21:38:06.7197669Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7197821Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7197827Z 2023-01-11T21:38:06.7197911Z aten = torch.ops.aten 2023-01-11T21:38:06.7198050Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7198147Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7198152Z 2023-01-11T21:38:06.7198229Z import triton 2023-01-11T21:38:06.7198323Z import triton.language as tl 2023-01-11T21:38:06.7198440Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7198581Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7198587Z 2023-01-11T21:38:06.7198592Z 2023-01-11T21:38:06.7198764Z triton_fused_div_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7198840Z import triton 2023-01-11T21:38:06.7198934Z import triton.language as tl 2023-01-11T21:38:06.7199050Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7199151Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7199277Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7199404Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7199409Z 2023-01-11T21:38:06.7199500Z @reduction(size_hints=[32, 8192], 2023-01-11T21:38:06.7199616Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7199704Z filename=__file__, 2023-01-11T21:38:06.7200092Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7200166Z @triton.jit 2023-01-11T21:38:06.7200379Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7200455Z xnumel = 18 2023-01-11T21:38:06.7200522Z rnumel = 7823 2023-01-11T21:38:06.7200621Z 
xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7200759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7200843Z xmask = xindex < xnumel 2023-01-11T21:38:06.7200963Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7201033Z x0 = xindex 2023-01-11T21:38:06.7201281Z tmp4 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last') 2023-01-11T21:38:06.7201392Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7201498Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7201588Z rindex = roffset + rbase 2023-01-11T21:38:06.7201673Z rmask = rindex < rnumel 2023-01-11T21:38:06.7201748Z r1 = rindex 2023-01-11T21:38:06.7201830Z tmp0 = r1 + (7823*x0) 2023-01-11T21:38:06.7201904Z tmp1 = 140800 2023-01-11T21:38:06.7201979Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7202295Z tmp3 = tl.load(in_ptr0 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7202378Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.7202696Z tmp6 = tl.load(in_ptr2 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7202780Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7202874Z tmp8 = tl.where(tmp2, tmp7, 0) 2023-01-11T21:38:06.7203000Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7203108Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7203207Z tl.store(out_ptr0 + x0, tmp9, xmask) 2023-01-11T21:38:06.7203294Z ''') 2023-01-11T21:38:06.7203300Z 2023-01-11T21:38:06.7203304Z 2023-01-11T21:38:06.7203472Z triton_fused_div_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7203547Z import triton 2023-01-11T21:38:06.7203642Z import triton.language as tl 2023-01-11T21:38:06.7203755Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7203884Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7204009Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7204138Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7204143Z 2023-01-11T21:38:06.7204232Z @reduction(size_hints=[1, 32], 2023-01-11T21:38:06.7204346Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7204431Z filename=__file__, 2023-01-11T21:38:06.7204789Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7204866Z @triton.jit 2023-01-11T21:38:06.7205033Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7205099Z xnumel = 1 2023-01-11T21:38:06.7205171Z rnumel = 18 2023-01-11T21:38:06.7205268Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7205407Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7205490Z xmask = xindex < xnumel 2023-01-11T21:38:06.7205609Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7205726Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7205824Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7205911Z rindex = roffset + rbase 2023-01-11T21:38:06.7205996Z rmask = rindex < rnumel 2023-01-11T21:38:06.7206067Z r0 = rindex 2023-01-11T21:38:06.7206170Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 
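        # _tmp1 accumulates a masked running sum over RBLOCK-sized chunks:
        # tl.where leaves lanes outside (xmask & rmask) at their previous
        # value, and tl.sum below collapses the RBLOCK axis after the loop.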
2023-01-11T21:38:06.7206290Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7206430Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7206555Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.7206641Z ''') 2023-01-11T21:38:06.7206647Z 2023-01-11T21:38:06.7206651Z 2023-01-11T21:38:06.7206748Z async_compile.wait(globals()) 2023-01-11T21:38:06.7206825Z del async_compile 2023-01-11T21:38:06.7206831Z 2023-01-11T21:38:06.7206904Z def call(args): 2023-01-11T21:38:06.7207009Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.7207086Z args.clear() 2023-01-11T21:38:06.7207171Z with torch.cuda.device(0): 2023-01-11T21:38:06.7207370Z buf0 = empty_strided((18, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7207461Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7207635Z triton_fused_div_mul_sum_1_0.run(primals_3, primals_2, primals_1, buf0, 18, 7823, grid=grid(18), stream=stream0) 2023-01-11T21:38:06.7207826Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7207971Z triton_fused_div_mul_sum_1_1.run(buf0, buf1, 1, 18, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7208091Z return (buf1, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.7208096Z 2023-01-11T21:38:06.7208100Z 2023-01-11T21:38:06.7208183Z if __name__ == "__main__": 2023-01-11T21:38:06.7208299Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7208418Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7208652Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7208847Z primals_2 = rand_strided((), (), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7209074Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7209220Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.7209229Z 2023-01-11T21:38:06.7209495Z [2023-01-11 21:35:47,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 890 2023-01-11T21:38:06.7209501Z 2023-01-11T21:38:06.7209600Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7209674Z import torch 2023-01-11T21:38:06.7209771Z import random 2023-01-11T21:38:06.7209891Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7210017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7210022Z 2023-01-11T21:38:06.7210103Z aten = torch.ops.aten 2023-01-11T21:38:06.7210240Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7210338Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7210344Z 2023-01-11T21:38:06.7210417Z import triton 2023-01-11T21:38:06.7210509Z import triton.language as tl 2023-01-11T21:38:06.7210626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7210769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7210774Z 2023-01-11T21:38:06.7210779Z 2023-01-11T21:38:06.7210948Z triton_fused_div_mul_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7211023Z import triton 2023-01-11T21:38:06.7211116Z import triton.language as tl 2023-01-11T21:38:06.7211230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7211332Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7211455Z from torch._inductor.triton_ops.autotune import reduction 
2023-01-11T21:38:06.7211582Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7211587Z 2023-01-11T21:38:06.7211677Z @reduction(size_hints=[32, 8192], 2023-01-11T21:38:06.7211790Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7211875Z filename=__file__, 2023-01-11T21:38:06.7212268Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7212371Z @triton.jit 2023-01-11T21:38:06.7212554Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7212627Z xnumel = 18 2023-01-11T21:38:06.7212697Z rnumel = 7823 2023-01-11T21:38:06.7212795Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7212928Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7213012Z xmask = xindex < xnumel 2023-01-11T21:38:06.7213131Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7213202Z x0 = xindex 2023-01-11T21:38:06.7213471Z tmp4 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), None, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7213582Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7213687Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7213780Z rindex = roffset + rbase 2023-01-11T21:38:06.7213866Z rmask = rindex < rnumel 2023-01-11T21:38:06.7213936Z r1 = rindex 2023-01-11T21:38:06.7214017Z tmp0 = r1 + (7823*x0) 2023-01-11T21:38:06.7214093Z tmp1 = 140800 2023-01-11T21:38:06.7214166Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7214723Z tmp3 = tl.load(in_ptr0 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7214811Z tmp5 = tmp3 / tmp4 2023-01-11T21:38:06.7215147Z tmp6 = tl.load(in_ptr2 + ((r1 + (7823*x0)) % 140800 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), rmask & tmp2 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7215228Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7215322Z tmp8 = tl.where(tmp2, tmp7, 0) 2023-01-11T21:38:06.7215444Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7215562Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7215654Z tl.store(out_ptr0 + x0, tmp9, xmask) 2023-01-11T21:38:06.7215739Z ''') 2023-01-11T21:38:06.7215745Z 2023-01-11T21:38:06.7215750Z 2023-01-11T21:38:06.7215918Z triton_fused_div_mul_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7216038Z import triton 2023-01-11T21:38:06.7216133Z import triton.language as tl 2023-01-11T21:38:06.7216249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7216352Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7216476Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7216600Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7216605Z 2023-01-11T21:38:06.7216695Z @reduction(size_hints=[1, 32], 2023-01-11T21:38:06.7216808Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7216892Z filename=__file__, 2023-01-11T21:38:06.7217311Z meta={'signature': {0: '*fp32', 1: '*fp16', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 
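# Note on the signature dict above: it types each pointer argument, so here
# in_ptr0 is *fp32 (the per-chunk partial sums) while out_ptr0 is *fp16,
# matching the float32 buf0 and float16 buf1 allocated in call() below.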
2023-01-11T21:38:06.7217392Z @triton.jit 2023-01-11T21:38:06.7217561Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7217627Z xnumel = 1 2023-01-11T21:38:06.7217705Z rnumel = 18 2023-01-11T21:38:06.7217803Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7217936Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7218019Z xmask = xindex < xnumel 2023-01-11T21:38:06.7218138Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7218255Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7218353Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7218440Z rindex = roffset + rbase 2023-01-11T21:38:06.7218581Z rmask = rindex < rnumel 2023-01-11T21:38:06.7218658Z r0 = rindex 2023-01-11T21:38:06.7218761Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.7218884Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7218998Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7219129Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.7219219Z ''') 2023-01-11T21:38:06.7219224Z 2023-01-11T21:38:06.7219228Z 2023-01-11T21:38:06.7219324Z async_compile.wait(globals()) 2023-01-11T21:38:06.7219404Z del async_compile 2023-01-11T21:38:06.7219409Z 2023-01-11T21:38:06.7219485Z def call(args): 2023-01-11T21:38:06.7219594Z primals_1, primals_2, primals_3 = args 2023-01-11T21:38:06.7219672Z args.clear() 2023-01-11T21:38:06.7219766Z with torch.cuda.device(0): 2023-01-11T21:38:06.7219959Z buf0 = empty_strided((18, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7220059Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7220233Z triton_fused_div_mul_sum_1_0.run(primals_3, primals_2, primals_1, buf0, 18, 7823, grid=grid(18), stream=stream0) 2023-01-11T21:38:06.7220424Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7220573Z triton_fused_div_mul_sum_1_1.run(buf0, buf1, 1, 18, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.7220696Z return (buf1, primals_1, primals_2, primals_3, ) 2023-01-11T21:38:06.7220701Z 2023-01-11T21:38:06.7220706Z 2023-01-11T21:38:06.7220789Z if __name__ == "__main__": 2023-01-11T21:38:06.7220910Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7221032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7221267Z primals_1 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221464Z primals_2 = rand_strided((), (), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221694Z primals_3 = rand_strided((1, 88, 40, 40), (140800, 1600, 40, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7221840Z print_performance(lambda: call([primals_1, primals_2, primals_3])) 2023-01-11T21:38:06.7221845Z 2023-01-11T21:38:06.7221918Z ok (0.459s) 2023-01-11T21:38:06.7222416Z test_to_device_constant_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7222547Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7222810Z [2023-01-11 21:35:47,267] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 891 2023-01-11T21:38:06.7223072Z [2023-01-11 21:35:47,350] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 891 2023-01-11T21:38:06.7223484Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7223616Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7223870Z [2023-01-11 21:35:47,494] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 892 2023-01-11T21:38:06.7224133Z [2023-01-11 21:35:47,571] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 892 2023-01-11T21:38:06.7224138Z 2023-01-11T21:38:06.7224235Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7224310Z import torch 2023-01-11T21:38:06.7224416Z import random 2023-01-11T21:38:06.7224535Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7224658Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7224663Z 2023-01-11T21:38:06.7224738Z aten = torch.ops.aten 2023-01-11T21:38:06.7224876Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7224973Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7224979Z 2023-01-11T21:38:06.7225051Z import triton 2023-01-11T21:38:06.7225143Z import triton.language as tl 2023-01-11T21:38:06.7225270Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7225409Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7225568Z constant0 = None # 001ba0e61f337f449db2ca901dfd67b1df5cf0825674994be1be485b60aabc98 2023-01-11T21:38:06.7225725Z constant0_cuda0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.7225731Z 2023-01-11T21:38:06.7225735Z 2023-01-11T21:38:06.7225896Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7225970Z import triton 2023-01-11T21:38:06.7226063Z import triton.language as tl 2023-01-11T21:38:06.7226176Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7226280Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7226415Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7226532Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7226544Z 2023-01-11T21:38:06.7226937Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7227013Z @triton.jit 2023-01-11T21:38:06.7227147Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7227219Z xnumel = 10 2023-01-11T21:38:06.7227316Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7227450Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7227533Z 
xmask = xindex < xnumel 2023-01-11T21:38:06.7227597Z x0 = xindex 2023-01-11T21:38:06.7227694Z tmp2 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7227765Z tmp0 = x0 2023-01-11T21:38:06.7227882Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7227961Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7228098Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7228186Z ''') 2023-01-11T21:38:06.7228192Z 2023-01-11T21:38:06.7228196Z 2023-01-11T21:38:06.7228417Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7228485Z import triton 2023-01-11T21:38:06.7228577Z import triton.language as tl 2023-01-11T21:38:06.7228691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7228791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7228922Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7229050Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7229056Z 2023-01-11T21:38:06.7229469Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7229544Z @triton.jit 2023-01-11T21:38:06.7229680Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7229753Z xnumel = 64 2023-01-11T21:38:06.7229849Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7229978Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7230060Z xmask = xindex < xnumel 2023-01-11T21:38:06.7230131Z x0 = xindex 2023-01-11T21:38:06.7230321Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7230439Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7230510Z tmp2 = 1 2023-01-11T21:38:06.7230592Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7230725Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.7230857Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7230943Z ''') 2023-01-11T21:38:06.7230949Z 2023-01-11T21:38:06.7230953Z 2023-01-11T21:38:06.7231047Z async_compile.wait(globals()) 2023-01-11T21:38:06.7231118Z del async_compile 2023-01-11T21:38:06.7231128Z 2023-01-11T21:38:06.7231196Z def call(args): 2023-01-11T21:38:06.7231268Z arg0_1, = args 2023-01-11T21:38:06.7231343Z args.clear() 2023-01-11T21:38:06.7231438Z with torch.cuda.device(0): 2023-01-11T21:38:06.7231636Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7231729Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7231858Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7231934Z del arg0_1 2023-01-11T21:38:06.7232129Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7232324Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7232524Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(constant0_cuda0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7232613Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7232619Z 2023-01-11T21:38:06.7232623Z 2023-01-11T21:38:06.7232703Z if __name__ == "__main__": 2023-01-11T21:38:06.7232821Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7232948Z from 
torch._inductor.utils import print_performance 2023-01-11T21:38:06.7233136Z constant0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7233345Z constant0_cuda0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7233546Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7233658Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7233664Z 2023-01-11T21:38:06.7233668Z 2023-01-11T21:38:06.7233765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7233838Z import torch 2023-01-11T21:38:06.7233984Z import random 2023-01-11T21:38:06.7234098Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7234223Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7234228Z 2023-01-11T21:38:06.7234309Z aten = torch.ops.aten 2023-01-11T21:38:06.7234447Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7234541Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7234546Z 2023-01-11T21:38:06.7234622Z import triton 2023-01-11T21:38:06.7234715Z import triton.language as tl 2023-01-11T21:38:06.7234839Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7234975Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7235143Z constant0 = None # 001ba0e61f337f449db2ca901dfd67b1df5cf0825674994be1be485b60aabc98 2023-01-11T21:38:06.7235302Z constant0_cuda0 = None # beb02a6268c3f406824d31228875474e7116cf8e770246a5eb85f5795315f9cc 2023-01-11T21:38:06.7235308Z 2023-01-11T21:38:06.7235312Z 2023-01-11T21:38:06.7235467Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7235542Z import triton 2023-01-11T21:38:06.7235635Z import triton.language as tl 2023-01-11T21:38:06.7235748Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7235850Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7235976Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7236100Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7236105Z 2023-01-11T21:38:06.7236504Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7236605Z @triton.jit 2023-01-11T21:38:06.7236739Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7236816Z xnumel = 10 2023-01-11T21:38:06.7236916Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7237044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7237120Z xmask = xindex < xnumel 2023-01-11T21:38:06.7237191Z x0 = xindex 2023-01-11T21:38:06.7237310Z tmp2 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7237382Z tmp0 = x0 2023-01-11T21:38:06.7237470Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7237550Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7237685Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7237763Z ''') 2023-01-11T21:38:06.7237769Z 2023-01-11T21:38:06.7237773Z 2023-01-11T21:38:06.7238000Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7238076Z import triton 2023-01-11T21:38:06.7238168Z import triton.language as tl 2023-01-11T21:38:06.7238282Z from torch._inductor.ir import ReductionHint 
2023-01-11T21:38:06.7238383Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7238518Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7238646Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7238651Z 2023-01-11T21:38:06.7239054Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*i64', 1: '*i64', 2: '*i64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7239128Z @triton.jit 2023-01-11T21:38:06.7239270Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7239344Z xnumel = 64 2023-01-11T21:38:06.7239443Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7239571Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7239656Z xmask = xindex < xnumel 2023-01-11T21:38:06.7239726Z x0 = xindex 2023-01-11T21:38:06.7239909Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7240038Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7240112Z tmp2 = 1 2023-01-11T21:38:06.7240192Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7240326Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.7240458Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7240543Z ''') 2023-01-11T21:38:06.7240548Z 2023-01-11T21:38:06.7240553Z 2023-01-11T21:38:06.7240638Z async_compile.wait(globals()) 2023-01-11T21:38:06.7240716Z del async_compile 2023-01-11T21:38:06.7240722Z 2023-01-11T21:38:06.7240796Z def call(args): 2023-01-11T21:38:06.7240874Z arg0_1, = args 2023-01-11T21:38:06.7240951Z args.clear() 2023-01-11T21:38:06.7241044Z with torch.cuda.device(0): 2023-01-11T21:38:06.7241241Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7241326Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7241465Z triton_fused_add_0.run(arg0_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.7241539Z del arg0_1 2023-01-11T21:38:06.7241732Z buf1 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7241925Z buf2 = empty_strided((64, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7242121Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(constant0_cuda0, buf1, buf2, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7242210Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7242215Z 2023-01-11T21:38:06.7242220Z 2023-01-11T21:38:06.7242301Z if __name__ == "__main__": 2023-01-11T21:38:06.7242439Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7242566Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7242765Z constant0 = rand_strided((64, ), (1, ), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7242978Z constant0_cuda0 = rand_strided((64, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.7243176Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7243293Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7243298Z 2023-01-11T21:38:06.7243369Z ok (0.448s) 2023-01-11T21:38:06.7243829Z test_to_device_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
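A note on the pointwise kernels in this block: each program instance turns tl.program_id(0) into a slab of XBLOCK flat indices, masks off the tail where xindex >= xnumel, and performs a masked load/compute/store; triton_fused_add_0 specifically computes out[x0] = float(x0) + in[x0], with an extra .to(tl.float32) upcast in the fp16 variant. A rough eager emulation, assuming plain PyTorch (pointwise_add_index is a hypothetical stand-in, not generated code):

import torch

# Hypothetical emulation of triton_fused_add_0: process one XBLOCK-sized
# slab of indices per program instance, guarding the ragged tail with a mask.
def pointwise_add_index(x: torch.Tensor, xblock: int = 16) -> torch.Tensor:
    out = torch.empty(x.numel(), dtype=torch.float32)
    for pid in range((x.numel() + xblock - 1) // xblock):  # grid(xnumel)
        xindex = pid * xblock + torch.arange(xblock)       # xoffset + arange
        idx = xindex[xindex < x.numel()]                   # apply xmask
        out[idx] = idx.to(torch.float32) + x[idx].to(torch.float32)
    return out

x = torch.randn(10, dtype=torch.float16)                   # xnumel = 10 above
assert torch.allclose(pointwise_add_index(x),
                      torch.arange(10, dtype=torch.float32) + x.float())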
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7243963Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7244219Z [2023-01-11 21:35:47,624] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 893 2023-01-11T21:38:06.7244395Z [2023-01-11 21:35:47,625] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.7244661Z [2023-01-11 21:35:47,627] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 893 2023-01-11T21:38:06.7245076Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7245209Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7245467Z [2023-01-11 21:35:47,677] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 894 2023-01-11T21:38:06.7245648Z [2023-01-11 21:35:47,679] torch._inductor.ir: [WARNING] DeviceCopy 2023-01-11T21:38:06.7245909Z [2023-01-11 21:35:47,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 894 2023-01-11T21:38:06.7245939Z 2023-01-11T21:38:06.7246039Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7246114Z import torch 2023-01-11T21:38:06.7246182Z import random 2023-01-11T21:38:06.7246301Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7246426Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7246432Z 2023-01-11T21:38:06.7246515Z aten = torch.ops.aten 2023-01-11T21:38:06.7246652Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7246748Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7246754Z 2023-01-11T21:38:06.7246827Z import triton 2023-01-11T21:38:06.7246922Z import triton.language as tl 2023-01-11T21:38:06.7247041Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7247180Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7247186Z 2023-01-11T21:38:06.7247190Z 2023-01-11T21:38:06.7247282Z async_compile.wait(globals()) 2023-01-11T21:38:06.7247361Z del async_compile 2023-01-11T21:38:06.7247366Z 2023-01-11T21:38:06.7247442Z def call(args): 2023-01-11T21:38:06.7247514Z arg0_1, = args 2023-01-11T21:38:06.7247590Z args.clear() 2023-01-11T21:38:06.7247790Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7247871Z buf0.copy_(arg0_1) 2023-01-11T21:38:06.7247944Z del arg0_1 2023-01-11T21:38:06.7248024Z return (buf0, ) 2023-01-11T21:38:06.7248029Z 2023-01-11T21:38:06.7248033Z 2023-01-11T21:38:06.7248112Z if __name__ == "__main__": 2023-01-11T21:38:06.7248229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7248384Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7248592Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7248698Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7248703Z 2023-01-11T21:38:06.7248714Z 2023-01-11T21:38:06.7248807Z from ctypes import c_void_p, c_long 
2023-01-11T21:38:06.7248881Z import torch 2023-01-11T21:38:06.7248957Z import random 2023-01-11T21:38:06.7249074Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7249197Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7249202Z 2023-01-11T21:38:06.7249288Z aten = torch.ops.aten 2023-01-11T21:38:06.7249424Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7249512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7249517Z 2023-01-11T21:38:06.7249590Z import triton 2023-01-11T21:38:06.7249683Z import triton.language as tl 2023-01-11T21:38:06.7249812Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7249951Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7249957Z 2023-01-11T21:38:06.7249961Z 2023-01-11T21:38:06.7250144Z triton_fused_convert_element_type_0 = async_compile.triton(''' 2023-01-11T21:38:06.7250218Z import triton 2023-01-11T21:38:06.7250314Z import triton.language as tl 2023-01-11T21:38:06.7250421Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7250522Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7250654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7250780Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7250785Z 2023-01-11T21:38:06.7251184Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7251261Z @triton.jit 2023-01-11T21:38:06.7251392Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7251465Z xnumel = 40 2023-01-11T21:38:06.7251555Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7251683Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7251794Z xmask = xindex < xnumel 2023-01-11T21:38:06.7251867Z x0 = xindex 2023-01-11T21:38:06.7251986Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7252073Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7252208Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.7252286Z ''') 2023-01-11T21:38:06.7252292Z 2023-01-11T21:38:06.7252296Z 2023-01-11T21:38:06.7252388Z async_compile.wait(globals()) 2023-01-11T21:38:06.7252464Z del async_compile 2023-01-11T21:38:06.7252469Z 2023-01-11T21:38:06.7252544Z def call(args): 2023-01-11T21:38:06.7252616Z arg0_1, = args 2023-01-11T21:38:06.7252693Z args.clear() 2023-01-11T21:38:06.7252788Z with torch.cuda.device(0): 2023-01-11T21:38:06.7252989Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7253083Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7253247Z triton_fused_convert_element_type_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7253322Z del arg0_1 2023-01-11T21:38:06.7253526Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7253605Z buf1.copy_(buf0) 2023-01-11T21:38:06.7253681Z return (buf1, ) 2023-01-11T21:38:06.7253686Z 2023-01-11T21:38:06.7253690Z 2023-01-11T21:38:06.7253769Z if __name__ == "__main__": 2023-01-11T21:38:06.7253879Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7254004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7254212Z arg0_1 = 
rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7254360Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7254366Z 2023-01-11T21:38:06.7254437Z ok (0.173s) 2023-01-11T21:38:06.7255030Z test_to_dtype_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7255162Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7255421Z [2023-01-11 21:35:47,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 895 2023-01-11T21:38:06.7255684Z [2023-01-11 21:35:47,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 895 2023-01-11T21:38:06.7256101Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7256228Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7256478Z [2023-01-11 21:35:48,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 896 2023-01-11T21:38:06.7256738Z [2023-01-11 21:35:48,019] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 896 2023-01-11T21:38:06.7256744Z 2023-01-11T21:38:06.7256840Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7256914Z import torch 2023-01-11T21:38:06.7256988Z import random 2023-01-11T21:38:06.7257106Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7257299Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7257305Z 2023-01-11T21:38:06.7257380Z aten = torch.ops.aten 2023-01-11T21:38:06.7257518Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7257613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7257618Z 2023-01-11T21:38:06.7257749Z import triton 2023-01-11T21:38:06.7257846Z import triton.language as tl 2023-01-11T21:38:06.7257971Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7258110Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7258116Z 2023-01-11T21:38:06.7258120Z 2023-01-11T21:38:06.7258343Z triton_fused_convert_element_type_1_convert_element_type_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7258410Z import triton 2023-01-11T21:38:06.7258503Z import triton.language as tl 2023-01-11T21:38:06.7258616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7258719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7258859Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7258985Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7258990Z 2023-01-11T21:38:06.7259403Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.7259478Z @triton.jit 2023-01-11T21:38:06.7259614Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7259688Z xnumel = 40 2023-01-11T21:38:06.7259785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7259913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7259996Z xmask = xindex < xnumel 2023-01-11T21:38:06.7260068Z x0 = xindex 2023-01-11T21:38:06.7260258Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7260399Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7260473Z tmp1 = 1 2023-01-11T21:38:06.7260555Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7260646Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.7260727Z tmp5 = (tmp4 != 0) 2023-01-11T21:38:06.7260868Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7261005Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7261087Z ''') 2023-01-11T21:38:06.7261101Z 2023-01-11T21:38:06.7261106Z 2023-01-11T21:38:06.7261194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7261274Z del async_compile 2023-01-11T21:38:06.7261279Z 2023-01-11T21:38:06.7261356Z def call(args): 2023-01-11T21:38:06.7261438Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7261516Z args.clear() 2023-01-11T21:38:06.7261611Z with torch.cuda.device(0): 2023-01-11T21:38:06.7261821Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7262024Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7262120Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7262309Z triton_fused_convert_element_type_1_convert_element_type_2_0.run(arg1_1, buf0, buf1, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7262416Z return (arg0_1, buf0, arg1_1, buf1, ) 2023-01-11T21:38:06.7262422Z 2023-01-11T21:38:06.7262426Z 2023-01-11T21:38:06.7262510Z if __name__ == "__main__": 2023-01-11T21:38:06.7262629Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7262757Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7262969Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7263173Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.7263297Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7263306Z 2023-01-11T21:38:06.7263310Z 2023-01-11T21:38:06.7263411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7263490Z import torch 2023-01-11T21:38:06.7263565Z import random 2023-01-11T21:38:06.7263685Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7263840Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7263845Z 2023-01-11T21:38:06.7263933Z aten = torch.ops.aten 2023-01-11T21:38:06.7264064Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7264162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7264167Z 2023-01-11T21:38:06.7264244Z import triton 2023-01-11T21:38:06.7264338Z import triton.language as tl 2023-01-11T21:38:06.7264464Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7264605Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7264611Z 2023-01-11T21:38:06.7264616Z 2023-01-11T21:38:06.7264803Z triton_fused_convert_element_type_0 = async_compile.triton(''' 
2023-01-11T21:38:06.7264884Z import triton 2023-01-11T21:38:06.7264972Z import triton.language as tl 2023-01-11T21:38:06.7265089Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7265196Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7265334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7265461Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7265467Z 2023-01-11T21:38:06.7265870Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7265946Z @triton.jit 2023-01-11T21:38:06.7266080Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7266149Z xnumel = 40 2023-01-11T21:38:06.7266249Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7266410Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7266496Z xmask = xindex < xnumel 2023-01-11T21:38:06.7266570Z x0 = xindex 2023-01-11T21:38:06.7266690Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7266783Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7266913Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.7267003Z ''') 2023-01-11T21:38:06.7267008Z 2023-01-11T21:38:06.7267014Z 2023-01-11T21:38:06.7267238Z triton_fused_convert_element_type_1_convert_element_type_2_1 = async_compile.triton(''' 2023-01-11T21:38:06.7267316Z import triton 2023-01-11T21:38:06.7267411Z import triton.language as tl 2023-01-11T21:38:06.7267525Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7267629Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7267764Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7267884Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7267892Z 2023-01-11T21:38:06.7268306Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7268386Z @triton.jit 2023-01-11T21:38:06.7268534Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7268610Z xnumel = 40 2023-01-11T21:38:06.7268712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7268843Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7268930Z xmask = xindex < xnumel 2023-01-11T21:38:06.7268997Z x0 = xindex 2023-01-11T21:38:06.7269194Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7269294Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7269369Z tmp1 = 1 2023-01-11T21:38:06.7269453Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7269545Z tmp3 = tmp2.to(tl.float32) 2023-01-11T21:38:06.7269627Z tmp5 = (tmp4 != 0) 2023-01-11T21:38:06.7269757Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7269890Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7270005Z ''') 2023-01-11T21:38:06.7270011Z 2023-01-11T21:38:06.7270016Z 2023-01-11T21:38:06.7270113Z async_compile.wait(globals()) 2023-01-11T21:38:06.7270192Z del async_compile 2023-01-11T21:38:06.7270197Z 2023-01-11T21:38:06.7270275Z def call(args): 
2023-01-11T21:38:06.7270358Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7270436Z args.clear() 2023-01-11T21:38:06.7270524Z with torch.cuda.device(0): 2023-01-11T21:38:06.7270738Z buf0 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7270833Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7270995Z triton_fused_convert_element_type_0.run(arg0_1, buf0, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7271073Z del arg0_1 2023-01-11T21:38:06.7271280Z buf1 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7271488Z buf2 = empty_strided((2, 2, 10), (20, 10, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7271671Z triton_fused_convert_element_type_1_convert_element_type_2_1.run(arg1_1, buf1, buf2, 40, grid=grid(40), stream=stream0) 2023-01-11T21:38:06.7271774Z return (buf0, buf1, arg1_1, buf2, ) 2023-01-11T21:38:06.7271779Z 2023-01-11T21:38:06.7271784Z 2023-01-11T21:38:06.7271865Z if __name__ == "__main__": 2023-01-11T21:38:06.7271985Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7272113Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7272323Z arg0_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7272536Z arg1_1 = rand_strided((2, 2, 10), (20, 10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.7272685Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7272690Z 2023-01-11T21:38:06.7272761Z ok (0.276s) 2023-01-11T21:38:06.7273211Z test_topk_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7273344Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7273605Z [2023-01-11 21:35:48,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 897 2023-01-11T21:38:06.7273825Z [2023-01-11 21:35:48,040] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.7274094Z [2023-01-11 21:35:48,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 897 2023-01-11T21:38:06.7274515Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
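For reference before the topk modules below: stripped of codegen plumbing, the to_device/to_dtype call() functions above amount to a cast (the convert_element_type kernel, which in the to_dtype case also materializes the bool output via tmp4 != 0) plus a DeviceCopy done as empty_strided + copy_. A hypothetical eager equivalent of the to-device variant, on CPU so the sketch runs anywhere (the log's version casts on 'cuda:0'):

import torch

# call_sketch is hypothetical: cast first (the generated
# convert_element_type kernel), then the DeviceCopy fallback pattern of
# allocating the destination with empty_strided and filling it with copy_.
def call_sketch(arg0_1: torch.Tensor) -> torch.Tensor:
    buf0 = arg0_1.to(torch.float32)
    buf1 = torch.empty_strided((2, 2, 10), (20, 10, 1),
                               device='cpu', dtype=torch.float32)
    buf1.copy_(buf0)
    return buf1

x = torch.randn(2, 2, 10, dtype=torch.float16)
assert torch.equal(call_sketch(x), x.float())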
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7274644Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7274895Z [2023-01-11 21:35:48,056] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 898 2023-01-11T21:38:06.7275106Z [2023-01-11 21:35:48,061] torch._inductor.ir: [WARNING] Using FallbackKernel: aten.topk 2023-01-11T21:38:06.7275362Z [2023-01-11 21:35:48,063] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 898 2023-01-11T21:38:06.7275370Z 2023-01-11T21:38:06.7275469Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7275537Z import torch 2023-01-11T21:38:06.7275613Z import random 2023-01-11T21:38:06.7275733Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7275858Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7275890Z 2023-01-11T21:38:06.7275974Z aten = torch.ops.aten 2023-01-11T21:38:06.7276112Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7276208Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7276213Z 2023-01-11T21:38:06.7276280Z import triton 2023-01-11T21:38:06.7276377Z import triton.language as tl 2023-01-11T21:38:06.7276504Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7276645Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7276651Z 2023-01-11T21:38:06.7276655Z 2023-01-11T21:38:06.7276747Z async_compile.wait(globals()) 2023-01-11T21:38:06.7276827Z del async_compile 2023-01-11T21:38:06.7276832Z 2023-01-11T21:38:06.7276907Z def call(args): 2023-01-11T21:38:06.7276983Z arg0_1, = args 2023-01-11T21:38:06.7277051Z args.clear() 2023-01-11T21:38:06.7277145Z with torch.cuda.device(0): 2023-01-11T21:38:06.7277237Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.7277310Z del arg0_1 2023-01-11T21:38:06.7277386Z buf1 = buf0[0] 2023-01-11T21:38:06.7277501Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7277578Z buf2 = buf0[1] 2023-01-11T21:38:06.7277681Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7277751Z del buf0 2023-01-11T21:38:06.7277836Z return (buf1, buf2, ) 2023-01-11T21:38:06.7277841Z 2023-01-11T21:38:06.7277846Z 2023-01-11T21:38:06.7277928Z if __name__ == "__main__": 2023-01-11T21:38:06.7278046Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7278171Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7278419Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7278525Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7278537Z 2023-01-11T21:38:06.7278542Z 2023-01-11T21:38:06.7278631Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7278709Z import torch 2023-01-11T21:38:06.7278783Z import random 2023-01-11T21:38:06.7278907Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7279032Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7279037Z 2023-01-11T21:38:06.7279121Z aten = torch.ops.aten 2023-01-11T21:38:06.7279257Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7279346Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7279358Z 2023-01-11T21:38:06.7279425Z import triton 2023-01-11T21:38:06.7279517Z import triton.language as tl 2023-01-11T21:38:06.7279642Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7279783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7279788Z 2023-01-11T21:38:06.7279792Z 2023-01-11T21:38:06.7279886Z async_compile.wait(globals()) 2023-01-11T21:38:06.7279963Z del async_compile 2023-01-11T21:38:06.7279968Z 2023-01-11T21:38:06.7280042Z def call(args): 2023-01-11T21:38:06.7280112Z arg0_1, = args 2023-01-11T21:38:06.7280186Z args.clear() 2023-01-11T21:38:06.7280278Z with torch.cuda.device(0): 2023-01-11T21:38:06.7280370Z buf0 = aten.topk(arg0_1, 2) 2023-01-11T21:38:06.7280442Z del arg0_1 2023-01-11T21:38:06.7280517Z buf1 = buf0[0] 2023-01-11T21:38:06.7280631Z assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7280699Z buf2 = buf0[1] 2023-01-11T21:38:06.7280806Z assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1)) 2023-01-11T21:38:06.7280877Z del buf0 2023-01-11T21:38:06.7280960Z return (buf1, buf2, ) 2023-01-11T21:38:06.7280967Z 2023-01-11T21:38:06.7280971Z 2023-01-11T21:38:06.7281051Z if __name__ == "__main__": 2023-01-11T21:38:06.7281168Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7281294Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7281537Z arg0_1 = rand_strided((1, 1, 8, 8), (64, 64, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7281644Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7281649Z 2023-01-11T21:38:06.7281721Z ok (0.043s) 2023-01-11T21:38:06.7282183Z test_transpose_add_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
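As the FallbackKernel warnings above indicate, aten.topk has no Triton lowering at this point, so both generated modules (fp32 and fp16) skip codegen entirely: call() invokes the ATen op eagerly and then guards the layout of what comes back. The same fallback in isolation, using only names that appear in the modules above (run on CPU here; the log uses 'cuda:0'):

import torch

# Direct ATen fallback, as in the generated call(): run the op eagerly,
# then assert the exact sizes and strides handed back to the graph.
aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride

arg0_1 = torch.randn(1, 1, 8, 8)
buf1, buf2 = aten.topk(arg0_1, 2)
assert_size_stride(buf1, (1, 1, 8, 2), (16, 16, 2, 1))
assert_size_stride(buf2, (1, 1, 8, 2), (16, 16, 2, 1))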
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7282314Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7282576Z [2023-01-11 21:35:48,078] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 899 2023-01-11T21:38:06.7282836Z [2023-01-11 21:35:48,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 899 2023-01-11T21:38:06.7282842Z 2023-01-11T21:38:06.7282941Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7283021Z import torch 2023-01-11T21:38:06.7283096Z import random 2023-01-11T21:38:06.7283208Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7283331Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7283337Z 2023-01-11T21:38:06.7283417Z aten = torch.ops.aten 2023-01-11T21:38:06.7283555Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7283650Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7283655Z 2023-01-11T21:38:06.7283729Z import triton 2023-01-11T21:38:06.7283821Z import triton.language as tl 2023-01-11T21:38:06.7283967Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7284107Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7284112Z 2023-01-11T21:38:06.7284117Z 2023-01-11T21:38:06.7284270Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7284345Z import triton 2023-01-11T21:38:06.7284440Z import triton.language as tl 2023-01-11T21:38:06.7284553Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7284659Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7284792Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7284910Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7284923Z 2023-01-11T21:38:06.7285400Z @pointwise(size_hints=[32, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7285477Z @triton.jit 2023-01-11T21:38:06.7285654Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7285733Z xnumel = 32 2023-01-11T21:38:06.7285806Z ynumel = 16 2023-01-11T21:38:06.7285910Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7286044Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7286121Z xmask = xindex < xnumel 2023-01-11T21:38:06.7286217Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7286350Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7286433Z ymask = yindex < ynumel 2023-01-11T21:38:06.7286505Z x0 = xindex 2023-01-11T21:38:06.7286576Z y1 = yindex 2023-01-11T21:38:06.7286693Z tmp0 = tl.load(in_ptr0 + (x0 + (32*y1)), xmask & ymask) 2023-01-11T21:38:06.7286802Z tmp1 = tl.load(in_ptr1 + (y1 + (16*x0)), xmask & ymask) 2023-01-11T21:38:06.7286883Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7287043Z tl.store(out_ptr0 + (x0 + (32*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7287131Z ''') 2023-01-11T21:38:06.7287137Z 2023-01-11T21:38:06.7287141Z 2023-01-11T21:38:06.7287236Z async_compile.wait(globals()) 2023-01-11T21:38:06.7287342Z del 
async_compile 2023-01-11T21:38:06.7287347Z 2023-01-11T21:38:06.7287427Z def call(args): 2023-01-11T21:38:06.7287508Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7287576Z args.clear() 2023-01-11T21:38:06.7287669Z with torch.cuda.device(0): 2023-01-11T21:38:06.7287873Z buf0 = empty_strided((32, 16), (1, 32), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7287966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7288112Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 32, 16, grid=grid(32, 16), stream=stream0) 2023-01-11T21:38:06.7288187Z del arg0_1 2023-01-11T21:38:06.7288260Z del arg1_1 2023-01-11T21:38:06.7288334Z return (buf0, ) 2023-01-11T21:38:06.7288339Z 2023-01-11T21:38:06.7288351Z 2023-01-11T21:38:06.7288424Z if __name__ == "__main__": 2023-01-11T21:38:06.7288541Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7288666Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7288874Z arg0_1 = rand_strided((16, 32), (32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7289075Z arg1_1 = rand_strided((32, 16), (16, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7289197Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7289202Z 2023-01-11T21:38:06.7289273Z ok (0.633s) 2023-01-11T21:38:06.7289729Z test_transpose_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7289889Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7290140Z [2023-01-11 21:35:48,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 900 2023-01-11T21:38:06.7290405Z [2023-01-11 21:35:48,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 900 2023-01-11T21:38:06.7290818Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
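Reading the index expressions in the transpose_add kernel above: in_ptr0 is addressed row-major, in_ptr1 through swapped coordinates, and the result lands in a stride-(1, 32) (column-major) buffer, so the transpose is folded into addressing rather than materialized. A hypothetical eager equivalent of that call(), inferred from the index math rather than taken from the test source:

import torch

# out[x0, y1] = arg0_1[y1, x0] + arg1_1[x0, y1], written into a
# column-major buffer just like buf0 = empty_strided((32, 16), (1, 32)).
a = torch.randn(16, 32)
b = torch.randn(32, 16)
out = torch.empty_strided((32, 16), (1, 32), dtype=torch.float32)
out.copy_(a.t() + b)
assert torch.allclose(out, a.t() + b)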
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7290949Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7291208Z [2023-01-11 21:35:48,927] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 901 2023-01-11T21:38:06.7291470Z [2023-01-11 21:35:49,016] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 901 2023-01-11T21:38:06.7291476Z 2023-01-11T21:38:06.7291575Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7291654Z import torch 2023-01-11T21:38:06.7291729Z import random 2023-01-11T21:38:06.7291842Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7291967Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7291972Z 2023-01-11T21:38:06.7292056Z aten = torch.ops.aten 2023-01-11T21:38:06.7292193Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7292290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7292295Z 2023-01-11T21:38:06.7292370Z import triton 2023-01-11T21:38:06.7292461Z import triton.language as tl 2023-01-11T21:38:06.7292585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7292721Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7292726Z 2023-01-11T21:38:06.7292731Z 2023-01-11T21:38:06.7292897Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7292973Z import triton 2023-01-11T21:38:06.7293067Z import triton.language as tl 2023-01-11T21:38:06.7293205Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7293310Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7293447Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7293566Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7293581Z 2023-01-11T21:38:06.7294065Z @pointwise(size_hints=[8, 8], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7294141Z @triton.jit 2023-01-11T21:38:06.7294328Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7294403Z xnumel = 8 2023-01-11T21:38:06.7294677Z ynumel = 8 2023-01-11T21:38:06.7294781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7294923Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7295007Z xmask = xindex < xnumel 2023-01-11T21:38:06.7295097Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7295230Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7295314Z ymask = yindex < ynumel 2023-01-11T21:38:06.7295385Z x0 = xindex 2023-01-11T21:38:06.7295457Z y1 = yindex 2023-01-11T21:38:06.7295577Z tmp0 = tl.load(in_ptr0 + (x0 + (8*y1)), xmask & ymask) 2023-01-11T21:38:06.7295798Z tmp1 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask, eviction_policy='evict_last') 2023-01-11T21:38:06.7295966Z tmp3 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask) 2023-01-11T21:38:06.7296050Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7296125Z tmp4 = 2 2023-01-11T21:38:06.7296206Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.7296283Z tmp6 = 10 2023-01-11T21:38:06.7296363Z tmp7 = tmp5 + tmp6 
2023-01-11T21:38:06.7296520Z tl.store(out_ptr0 + (x0 + (8*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7296673Z tl.store(out_ptr1 + (y1 + (8*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.7296763Z ''') 2023-01-11T21:38:06.7296769Z 2023-01-11T21:38:06.7296773Z 2023-01-11T21:38:06.7296871Z async_compile.wait(globals()) 2023-01-11T21:38:06.7296950Z del async_compile 2023-01-11T21:38:06.7296955Z 2023-01-11T21:38:06.7297034Z def call(args): 2023-01-11T21:38:06.7297116Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7297257Z args.clear() 2023-01-11T21:38:06.7297359Z with torch.cuda.device(0): 2023-01-11T21:38:06.7297587Z buf0 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7297789Z buf1 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7297884Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7298048Z triton_fused_add_add_1_0.run(arg0_1, arg1_1, buf0, buf1, 8, 8, grid=grid(8, 8), stream=stream0) 2023-01-11T21:38:06.7298127Z del arg0_1 2023-01-11T21:38:06.7298203Z del arg1_1 2023-01-11T21:38:06.7298282Z return (buf0, buf1, ) 2023-01-11T21:38:06.7298295Z 2023-01-11T21:38:06.7298300Z 2023-01-11T21:38:06.7298376Z if __name__ == "__main__": 2023-01-11T21:38:06.7298496Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7298625Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7298827Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7299024Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7299151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7299156Z 2023-01-11T21:38:06.7299160Z 2023-01-11T21:38:06.7299261Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7299338Z import torch 2023-01-11T21:38:06.7299408Z import random 2023-01-11T21:38:06.7299574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7299703Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7299711Z 2023-01-11T21:38:06.7299796Z aten = torch.ops.aten 2023-01-11T21:38:06.7299935Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7300033Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7300038Z 2023-01-11T21:38:06.7300114Z import triton 2023-01-11T21:38:06.7300203Z import triton.language as tl 2023-01-11T21:38:06.7300331Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7300475Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7300484Z 2023-01-11T21:38:06.7300488Z 2023-01-11T21:38:06.7300654Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7300731Z import triton 2023-01-11T21:38:06.7300826Z import triton.language as tl 2023-01-11T21:38:06.7300944Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7301052Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7301181Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7301309Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7301314Z 2023-01-11T21:38:06.7301797Z @pointwise(size_hints=[8, 8], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), 
equal_to_1=())]}) 2023-01-11T21:38:06.7301875Z @triton.jit 2023-01-11T21:38:06.7302066Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.7302180Z xnumel = 8 2023-01-11T21:38:06.7302254Z ynumel = 8 2023-01-11T21:38:06.7302352Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7302480Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7302567Z xmask = xindex < xnumel 2023-01-11T21:38:06.7302665Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.7302796Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.7302881Z ymask = yindex < ynumel 2023-01-11T21:38:06.7302953Z x0 = xindex 2023-01-11T21:38:06.7303023Z y1 = yindex 2023-01-11T21:38:06.7303148Z tmp0 = tl.load(in_ptr0 + (x0 + (8*y1)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.7303386Z tmp1 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7303515Z tmp3 = tl.load(in_ptr1 + (y1 + (8*x0)), xmask & ymask).to(tl.float32) 2023-01-11T21:38:06.7303601Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7303673Z tmp4 = 2 2023-01-11T21:38:06.7303752Z tmp5 = tmp3 * tmp4 2023-01-11T21:38:06.7303825Z tmp6 = 10 2023-01-11T21:38:06.7303895Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.7304059Z tl.store(out_ptr0 + (x0 + (8*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.7304211Z tl.store(out_ptr1 + (y1 + (8*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp7, xmask & ymask) 2023-01-11T21:38:06.7304296Z ''') 2023-01-11T21:38:06.7304302Z 2023-01-11T21:38:06.7304306Z 2023-01-11T21:38:06.7304400Z async_compile.wait(globals()) 2023-01-11T21:38:06.7304480Z del async_compile 2023-01-11T21:38:06.7304485Z 2023-01-11T21:38:06.7304560Z def call(args): 2023-01-11T21:38:06.7304640Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7304708Z args.clear() 2023-01-11T21:38:06.7304804Z with torch.cuda.device(0): 2023-01-11T21:38:06.7305004Z buf0 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7305206Z buf1 = empty_strided((8, 8), (1, 8), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7305299Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7305458Z triton_fused_add_add_1_0.run(arg0_1, arg1_1, buf0, buf1, 8, 8, grid=grid(8, 8), stream=stream0) 2023-01-11T21:38:06.7305560Z del arg0_1 2023-01-11T21:38:06.7305635Z del arg1_1 2023-01-11T21:38:06.7305712Z return (buf0, buf1, ) 2023-01-11T21:38:06.7305717Z 2023-01-11T21:38:06.7305722Z 2023-01-11T21:38:06.7305803Z if __name__ == "__main__": 2023-01-11T21:38:06.7305921Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7306049Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7306248Z arg0_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7306446Z arg1_1 = rand_strided((8, 8), (8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7306568Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7306574Z 2023-01-11T21:38:06.7306644Z ok (0.321s) 2023-01-11T21:38:06.7307169Z test_transposed_propagates_cuda (__main__.CudaTests) ... 
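One more annotation before the next test's output: triton_fused_add_add_1_0 above shows two ops fused into a single kernel with two stores, each store using its own index expression, so no separate transpose kernel is emitted. Roughly, in eager terms (again inferred from the index math; the variable names are just the log's buffer names):

import torch

# buf0[x0, y1] = arg0_1[y1, x0] + arg1_1[x0, y1]  ->  arg0_1.t() + arg1_1
# buf1[y1, x0] = arg1_1[x0, y1] * 2 + 10          ->  arg1_1.t() * 2 + 10
arg0_1 = torch.randn(8, 8)
arg1_1 = torch.randn(8, 8)
buf0 = arg0_1.t() + arg1_1
buf1 = arg1_1.t() * 2 + 10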
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7307251Z warnings.warn( 2023-01-11T21:38:06.7307508Z [2023-01-11 21:35:49,031] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 902 2023-01-11T21:38:06.7307770Z [2023-01-11 21:35:49,100] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 902 2023-01-11T21:38:06.7307776Z 2023-01-11T21:38:06.7307874Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7307949Z import torch 2023-01-11T21:38:06.7308022Z import random 2023-01-11T21:38:06.7308142Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7308289Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7308302Z 2023-01-11T21:38:06.7308378Z aten = torch.ops.aten 2023-01-11T21:38:06.7308517Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7308613Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7308620Z 2023-01-11T21:38:06.7308695Z import triton 2023-01-11T21:38:06.7308787Z import triton.language as tl 2023-01-11T21:38:06.7308911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7309049Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7309055Z 2023-01-11T21:38:06.7309059Z 2023-01-11T21:38:06.7309206Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.7309281Z import triton 2023-01-11T21:38:06.7309375Z import triton.language as tl 2023-01-11T21:38:06.7309491Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7309592Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7309730Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7309855Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7309861Z 2023-01-11T21:38:06.7310283Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7310350Z @triton.jit 2023-01-11T21:38:06.7310491Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7310565Z xnumel = 64 2023-01-11T21:38:06.7310661Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7310791Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7310875Z xmask = xindex < xnumel 2023-01-11T21:38:06.7310947Z x0 = xindex 2023-01-11T21:38:06.7311038Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7311137Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7311217Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7311352Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7311437Z ''') 2023-01-11T21:38:06.7311443Z 2023-01-11T21:38:06.7311447Z 2023-01-11T21:38:06.7311569Z async_compile.wait(globals()) 2023-01-11T21:38:06.7311648Z del async_compile 2023-01-11T21:38:06.7311654Z 2023-01-11T21:38:06.7311728Z def call(args): 2023-01-11T21:38:06.7311800Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7311876Z args.clear() 2023-01-11T21:38:06.7311967Z with torch.cuda.device(0): 2023-01-11T21:38:06.7312182Z buf0 = empty_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cuda', dtype=torch.float32) 
2023-01-11T21:38:06.7312275Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7312422Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 64, grid=grid(64), stream=stream0) 2023-01-11T21:38:06.7312495Z del arg0_1 2023-01-11T21:38:06.7312564Z del arg1_1 2023-01-11T21:38:06.7312640Z return (buf0, ) 2023-01-11T21:38:06.7312646Z 2023-01-11T21:38:06.7312650Z 2023-01-11T21:38:06.7312730Z if __name__ == "__main__": 2023-01-11T21:38:06.7312848Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7312975Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7313190Z arg0_1 = rand_strided((1, 4, 4, 4), (64, 4, 1, 16), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7313395Z arg1_1 = rand_strided((4, 4, 4), (4, 1, 16), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7313516Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7313521Z 2023-01-11T21:38:06.7313585Z ok (0.239s) 2023-01-11T21:38:06.7314096Z test_triton_conv_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7314203Z warnings.warn( 2023-01-11T21:38:06.7314459Z [2023-01-11 21:35:49,300] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 903 2023-01-11T21:38:06.7314798Z [2023-01-11 21:35:49,345] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf1') -- [SchedulerNode(name='buf2')] 2023-01-11T21:38:06.7315012Z [2023-01-11 21:35:49,370] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7315018Z 2023-01-11T21:38:06.7315117Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7315192Z import torch 2023-01-11T21:38:06.7315266Z import random 2023-01-11T21:38:06.7315378Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7315500Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7315505Z 2023-01-11T21:38:06.7315588Z aten = torch.ops.aten 2023-01-11T21:38:06.7315723Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7315826Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7315831Z 2023-01-11T21:38:06.7315904Z import triton 2023-01-11T21:38:06.7315996Z import triton.language as tl 2023-01-11T21:38:06.7316114Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7316255Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7316261Z 2023-01-11T21:38:06.7316414Z from torch._inductor.triton_ops.conv_perf_model import early_config_prune 2023-01-11T21:38:06.7316562Z from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time 2023-01-11T21:38:06.7316700Z from torch._inductor.triton_ops.autotune import conv_heuristics 2023-01-11T21:38:06.7316705Z 2023-01-11T21:38:06.7316710Z 2023-01-11T21:38:06.7316879Z triton_fused_convolution_0 = async_compile.triton(''' 2023-01-11T21:38:06.7316954Z import triton 2023-01-11T21:38:06.7317045Z import triton.language as tl 2023-01-11T21:38:06.7317158Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7317255Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7317391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7317521Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7317526Z 
test_triton_conv_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:49,300] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 903
[2023-01-11 21:35:49,345] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf1') -- [SchedulerNode(name='buf2')]
[2023-01-11 21:35:49,370] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1')

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

from torch._inductor.triton_ops.conv_perf_model import early_config_prune
from torch._inductor.triton_ops.conv_perf_model import estimate_conv_time
from torch._inductor.triton_ops.autotune import conv_heuristics


triton_fused_convolution_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[4096, 1024], tile_hint=TileHint.SQUARE, filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr):
    xnumel = 4096
    ynumel = 1024
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = tl.program_id(1) * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    x3 = xindex
    y2 = yindex
    x0 = xindex % 128
    x1 = (xindex // 128)
    tmp0 = tl.load(in_ptr0 + (y2 + (1024*x3) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), xmask & ymask)
    tl.store(out_ptr0 + (x0 + (128*y2) + (131072*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask)
''')

@conv_heuristics()
@triton.jit
def triton_template_1(
    x,
    w,
    # stride of tensor
    stride_xn,
    stride_xc,
    stride_xh,
    stride_xw,
    stride_wn,
    stride_wc,
    stride_wh,
    stride_ww,
    stride_yn,
    stride_yc,
    stride_yh,
    stride_yw,
    stride_biasn,
    # Tensor dimensions
    BATCH,
    IN_C,
    IN_H,
    IN_W,
    KERNEL_N,
    KERNEL_H,
    KERNEL_W,
    OUT_H,
    OUT_W,
    # parameters of conv
    stride_h,
    stride_w,
    padding_h,
    padding_w,
    dilation_h,
    dilation_w,
    output_padding_h,
    output_padding_w,
    groups: tl.constexpr,
    # pointer inc for x
    delta_x_ptr,
    # fusable kernels args
    in_ptr0,
    out_ptr3,
    # Metaparameters
    ACC_TYPE: tl.constexpr,
    CONV1X1_NHWC: tl.constexpr,
    # blocks in different dimension
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    # reduction tiling parameter for matmul
    BLOCK_K: tl.constexpr,
):
    """
    each program instance computes a [BLOCK_BATCH, BLOCK_N, BLOCK_H, BLOCK_W] block of y
    """
    # -----------------------------------------------------------
    # Map program ids `pid` to the block of y it should compute.
    pid_nhw = tl.program_id(0)
    pid_k = tl.program_id(1)

    # offset for output y
    off_y_k = pid_k * BLOCK_N + tl.arange(0, BLOCK_N)
    off_y_nhw = pid_nhw * BLOCK_M + tl.arange(0, BLOCK_M)
    off_y_n = off_y_nhw // (OUT_H * OUT_W)
    off_y_hw = off_y_nhw % (OUT_H * OUT_W)
    off_y_h = off_y_hw // OUT_W
    off_y_w = off_y_hw % OUT_W

    # offset for the initial ptr for x
    off_x_n = off_y_n
    off_x_h = off_y_h * stride_h - padding_h
    off_x_w = off_y_w * stride_w - padding_w
    off_x_nhw = off_x_n * stride_xn + off_x_h * stride_xh + off_x_w * stride_xw
    off_x_crs = tl.arange(0, BLOCK_K)

    CRS = IN_C * KERNEL_H * KERNEL_W
    # load inc ptr of x, update x_ptrs
    if not CONV1X1_NHWC:
        delta_x_ptrs = delta_x_ptr + off_x_crs
        off_x_crs_unpacked = tl.load(delta_x_ptrs, mask=off_x_crs < CRS, other=0)
        x_ptrs = x + off_x_nhw[:, None] + off_x_crs_unpacked[None, :]
    else:
        x_ptrs = x + off_x_nhw[:, None] + off_x_crs[None, :]

    mask_x = (
        (off_x_n < BATCH)
        & (off_x_h >= 0)
        & (off_x_h < IN_H)
        & (off_x_w >= 0)
        & (off_x_w < IN_W)
    )[:, None] & (off_x_crs < CRS)[None, :]

    # offset for the initial ptr for w
    off_w_crs = tl.arange(0, BLOCK_K)
    off_w_k = off_y_k
    w_ptrs = w + off_w_crs[:, None] + off_w_k[None, :] * stride_wn
    mask_w = (off_x_crs < CRS)[:, None] & (off_w_k < KERNEL_N)[None, :]

    # ------ load x ------
    matrix_x = tl.load(x_ptrs, mask=mask_x, other=0.0)
    # ------ load w ------
    matrix_w = tl.load(w_ptrs, mask=mask_w, other=0.0)

    # -----------------------------------------------------------
    # allocate accumulator
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=ACC_TYPE)
    for crs in range(0, CRS, BLOCK_K):

        # ------ matrix multiplication ------
        acc += tl.dot(matrix_x, matrix_w)
        # ------ update ptrs ------
        w_ptrs += BLOCK_K
        # load inc ptr of x, update x_ptrs
        if not CONV1X1_NHWC:
            delta_x_ptrs += BLOCK_K
            off_x_crs = crs + BLOCK_K + tl.arange(0, BLOCK_K)
            off_x_crs_unpacked = tl.load(delta_x_ptrs, mask=off_x_crs < CRS, other=0)
            x_ptrs = x + off_x_nhw[:, None] + off_x_crs_unpacked[None, :]
        else:
            off_x_crs = crs + BLOCK_K + tl.arange(0, BLOCK_K)
            x_ptrs += BLOCK_K

        mask_x = (
            (off_x_n < BATCH)
            & (off_x_h >= 0)
            & (off_x_h < IN_H)
            & (off_x_w >= 0)
            & (off_x_w < IN_W)
        )[:, None] & (off_x_crs < CRS)[None, :]
        mask_w = (off_x_crs < CRS)[:, None] & (off_w_k < KERNEL_N)[None, :]
        # ------ prefetch ------
        # ------ load x ------
        matrix_x = tl.load(x_ptrs, mask=mask_x, other=0.0)
        # ------ load w ------
        matrix_w = tl.load(w_ptrs, mask=mask_w, other=0.0)

    acc = acc.to(out_ptr3.dtype.element_ty)

    XBLOCK: tl.constexpr = BLOCK_M
    YBLOCK: tl.constexpr = BLOCK_N
    xnumel = BATCH * (OUT_H + 2 * output_padding_h) * (OUT_W + 2 * output_padding_w)
    ynumel = KERNEL_N
    xoffset = pid_nhw * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = pid_k * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    y2 = yindex
    x0 = xindex % 1024
    x1 = (xindex // 1024)
    tmp0 = tl.load(in_ptr0 + (y2 + tl.zeros([XBLOCK, YBLOCK], tl.int32)), xmask & ymask)
    tmp1 = acc + tmp0
    tl.store(out_ptr3 + (x0 + (1024*y2) + (32768*x1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp1, xmask & ymask)

    return


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1, arg2_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((32, 128, 32, 32), (131072, 1, 4096, 128), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_convolution_0.run(arg0_1, buf0, 4096, 1024, grid=grid(4096, 1024), stream=stream0)
        del arg0_1
        buf2 = empty_strided((32, 32, 32, 32), (32768, 1024, 32, 1), device='cuda', dtype=torch.float32)

        def grid_triton_template_1(META):
            return (
                triton.cdiv(32 * 32 * 32, META["BLOCK_M"]),
                triton.cdiv(32, META["BLOCK_N"]),
            )

        triton_template_1[grid_triton_template_1](buf0, arg1_1, 131072, 1, 4096, 128, 128, 1, 1, 1, 32768, 1, 1024, 32, None, 32, 128, 32, 32, 32, 1, 1, 32, 32, 1, 1, 0, 0, 1, 1, 0, 0, 1, None, arg2_1, buf2, ACC_TYPE=tl.float32, CONV1X1_NHWC=True)
        del arg1_1
        del arg2_1
        return (buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((32, 128, 32, 32), (131072, 1024, 32, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((32, 128, 1, 1), (128, 1, 1, 1), device='cuda:0', dtype=torch.float32)
    arg2_1 = rand_strided((32, ), (1, ), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1, arg2_1]))
[2023-01-11 21:35:49,532] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 903

ok (9.917s)
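For reference, the two-axis launch grid that grid_triton_template_1 builds above can be reproduced in plain Python. BLOCK_M/BLOCK_N stand in for whatever conv_heuristics picks at autotuning time, so the concrete values below are illustrative only:

import math

def conv_grid(batch, out_h, out_w, kernel_n, BLOCK_M, BLOCK_N):
    # axis 0 tiles the flattened (batch, out_h, out_w) output positions,
    # axis 1 tiles the output channels, mirroring the triton.cdiv calls above
    return (math.ceil(batch * out_h * out_w / BLOCK_M),
            math.ceil(kernel_n / BLOCK_N))

print(conv_grid(32, 32, 32, 32, BLOCK_M=64, BLOCK_N=32))  # -> (512, 1)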
test_triton_mm2_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
[2023-01-11 21:35:59,191] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 904
[2023-01-11 21:35:59,195] torch._inductor.codegen.triton_template: [DEBUG] template_codegen: TemplateSchedulerNode(name='buf0') -- [SchedulerNode(name='buf1')]
[2023-01-11 21:35:59,200] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0')
[2023-01-11 21:35:59,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 904

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream

from torch._inductor.triton_ops.autotune import mm_heuristics
from torch._inductor.triton_ops.autotune import mm_autotune
import torch
import triton
import triton.language as tl

@mm_autotune()
@mm_heuristics()
@triton.jit
def triton_template_0(
    A,
    B,
    M,
    N,
    K,
    stride_am,
    stride_ak,
    stride_bk,
    stride_bn,
    stride_cm,
    stride_cn,
    # fusable kernels args
    out_ptr3,
    allow_tf32: tl.constexpr,
    BLOCK_M: tl.constexpr,
    BLOCK_N: tl.constexpr,
    BLOCK_K: tl.constexpr,
    GROUP_M: tl.constexpr,
    SPLIT_K: tl.constexpr,
    EVEN_K: tl.constexpr,
    ACC_TYPE: tl.constexpr,
):
    # matrix multiplication
    pid = tl.program_id(0)
    pid_z = tl.program_id(1)
    grid_m = (M + BLOCK_M - 1) // BLOCK_M
    grid_n = (N + BLOCK_N - 1) // BLOCK_N
    # re-order program ID for better L2 performance
    width = GROUP_M * grid_n
    group_id = pid // width
    group_size = min(grid_m - group_id * GROUP_M, GROUP_M)
    pid_m = group_id * GROUP_M + (pid % group_size)
    pid_n = (pid % width) // (group_size)
    # do matrix multiplication
    rm = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    rn = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    ram = tl.max_contiguous(tl.multiple_of(rm % M, BLOCK_M), BLOCK_M)
    rbn = tl.max_contiguous(tl.multiple_of(rn % N, BLOCK_N), BLOCK_N)
    rk = pid_z * BLOCK_K + tl.arange(0, BLOCK_K)
    # pointers
    A_ptrs = A + (ram[:, None] * stride_am + rk[None, :] * stride_ak)
    B_ptrs = B + (rk[:, None] * stride_bk + rbn[None, :] * stride_bn)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=ACC_TYPE)
    for k in range(K, 0, -BLOCK_K * SPLIT_K):
        if EVEN_K:
            a = tl.load(A_ptrs)
            b = tl.load(B_ptrs)
        else:
            a = tl.load(A_ptrs, mask=rk[None, :] < k, other=0.0)
            b = tl.load(B_ptrs, mask=rk[:, None] < k, other=0.0)
        acc += tl.dot(a, b, allow_tf32=allow_tf32)
        A_ptrs += BLOCK_K * SPLIT_K * stride_ak
        B_ptrs += BLOCK_K * SPLIT_K * stride_bk
    acc = acc.to(out_ptr3.dtype.element_ty)

    XBLOCK: tl.constexpr = BLOCK_M
    YBLOCK: tl.constexpr = BLOCK_N
    xnumel = M
    ynumel = N
    xoffset = pid_m * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    yoffset = pid_n * YBLOCK
    yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK])
    ymask = yindex < ynumel
    x0 = xindex
    y1 = yindex
    tmp0 = tl.where(0 != 0, 0, tl.where(0 > acc, 0, acc))
    tl.store(out_ptr3 + (y1 + (1024*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp0, xmask & ymask)


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, arg1_1 = args
    args.clear()
    with torch.cuda.device(0):
        buf1 = empty_strided((1024, 1024), (1024, 1), device='cuda', dtype=torch.float32)

        def grid_triton_template_0(META):
            return (
                triton.cdiv(1024, META["BLOCK_M"]) * triton.cdiv(1024, META["BLOCK_N"]),
                META["SPLIT_K"],
            )

        triton_template_0[grid_triton_template_0](arg0_1, arg1_1, 1024, 1024, 1024, 1024, 1, 1024, 1, 1024, 1, buf1, GROUP_M=8, ACC_TYPE=tl.float32, allow_tf32=False)
        del arg0_1
        del arg1_1
        return (buf1, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((1024, 1024), (1024, 1), device='cuda:0', dtype=torch.float32)
    arg1_1 = rand_strided((1024, 1024), (1024, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1, arg1_1]))

ok (5.666s)
test_triu_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:04,879] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 905
[2023-01-11 21:36:04,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 905
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 906
[2023-01-11 21:36:05,121] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 906

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_triu_triu_1_triu_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10) % 10
    x3 = xindex
    tmp3 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last')
    tmp10 = tl.load(in_ptr0 + (x3), xmask)
    tmp0 = (-1) + x0 + ((-1)*x1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp4 = tl.where(tmp2, tmp3, tmp1)
    tmp5 = x0 + ((-1)*x1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tl.where(tmp6, tmp3, tmp1)
    tmp8 = (-2) + x0 + ((-1)*x1)
    tmp9 = tmp8 >= tmp1
    tmp11 = tl.where(tmp9, tmp10, tmp1)
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_triu_triu_1_triu_2_0.run(arg0_1, buf0, buf1, buf2, 200, grid=grid(200), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_triu_triu_1_triu_2_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr):
    xnumel = 200
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex % 10
    x1 = (xindex // 10) % 10
    x3 = xindex
    tmp3 = tl.load(in_ptr0 + (x3), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp10 = tl.load(in_ptr0 + (x3), xmask).to(tl.float32)
    tmp0 = (-1) + x0 + ((-1)*x1)
    tmp1 = 0
    tmp2 = tmp0 >= tmp1
    tmp4 = tl.where(tmp2, tmp3, tmp1)
    tmp5 = x0 + ((-1)*x1)
    tmp6 = tmp5 >= tmp1
    tmp7 = tl.where(tmp6, tmp3, tmp1)
    tmp8 = (-2) + x0 + ((-1)*x1)
    tmp9 = tmp8 >= tmp1
    tmp11 = tl.where(tmp9, tmp10, tmp1)
    tl.store(out_ptr0 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask)
    tl.store(out_ptr2 + (x3 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        buf2 = empty_strided((2, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_triu_triu_1_triu_2_0.run(arg0_1, buf0, buf1, buf2, 200, grid=grid(200), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((2, 10, 10), (100, 10, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.282s)
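Reading the index arithmetic in the fused kernel above: x0 is the column, x1 the row, and the three guards (col - row >= 1, >= 0, >= 2) select upper-triangular regions at different diagonals. A hedged eager-mode equivalent of the three fused outputs (the traced ops are inferred from the kernel; the test source is not shown in this log):

import torch

x = torch.randn(2, 10, 10, device='cuda')
# out_ptr0 / out_ptr1 / out_ptr2 in the kernel appear to correspond to:
outs = (torch.triu(x, diagonal=1), torch.triu(x, diagonal=0), torch.triu(x, diagonal=2))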
test_unbind_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 907
[2023-01-11 21:36:05,151] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 907
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,173] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 908
[2023-01-11 21:36:05,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 908

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))


from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    return (as_strided(arg0_1, (4, 4), (4, 1)), as_strided(arg0_1, (4, 4), (4, 1), 16), as_strided(arg0_1, (4, 4), (4, 1), 32), as_strided(arg0_1, (4, 4), (4, 1), 48), as_strided(arg0_1, (4, 4), (16, 4)), as_strided(arg0_1, (4, 4), (16, 4), 1), as_strided(arg0_1, (4, 4), (16, 4), 2), as_strided(arg0_1, (4, 4), (16, 4), 3), )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((4, 4, 4), (16, 4, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

ok (0.059s)
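Both unbind graphs above compile to no kernels at all: call() only returns as_strided views of the input. The stride patterns, (4, 1) at offsets 0/16/32/48 and (16, 4) at offsets 0/1/2/3, match unbinding a (4, 4, 4) contiguous tensor along dims 0 and 2, so a plausible eager equivalent (an inference, since the test source is not in this log) is:

import torch

x = torch.randn(4, 4, 4, device='cuda')
views = x.unbind(0) + x.unbind(2)  # eight (4, 4) views, no copies launched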
test_unroll_small_reduction_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,220] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 909
[2023-01-11 21:36:05,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 909
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,541] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 910

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i64', 3: '*fp32', 4: '*i64', 5: '*fp32', 6: '*i1', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*fp32', 12: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (3*x0), xmask, eviction_policy='evict_last')
    tmp1 = tl.load(in_ptr0 + (1 + (3*x0)), xmask, eviction_policy='evict_last')
    tmp3 = tl.load(in_ptr0 + (2 + (3*x0)), xmask, eviction_policy='evict_last')
    tmp46 = tl.load(in_ptr0 + (3*x0), xmask)
    tmp47 = tl.load(in_ptr0 + (1 + (3*x0)), xmask)
    tmp49 = tl.load(in_ptr0 + (2 + (3*x0)), xmask)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 < tmp1, tmp0, tmp1))
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp5 = 0
    tmp6 = 1
    tmp7 = tmp1 < tmp0
    tmp8 = tl.where(tmp7, tmp6, tmp5)
    tmp9 = 2
    tmp10 = tmp3 < tmp2
    tmp11 = tl.where(tmp10, tmp9, tmp8)
    tmp12 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp13 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 > tmp3, tmp12, tmp3))
    tmp14 = tmp1 > tmp0
    tmp15 = tl.where(tmp14, tmp6, tmp5)
    tmp16 = tmp3 > tmp12
    tmp17 = tl.where(tmp16, tmp9, tmp15)
    tmp18 = tmp0 + tmp1
    tmp19 = tmp18 + tmp3
    tmp20 = tmp0 > tmp6
    tmp21 = tmp20.to(tl.int64)
    tmp22 = (tmp21 != 0)
    tmp23 = tmp1 > tmp6
    tmp24 = tmp23.to(tl.int64)
    tmp25 = (tmp24 != 0)
    tmp26 = tmp22 | tmp25
    tmp27 = tmp3 > tmp6
    tmp28 = tmp27.to(tl.int64)
    tmp29 = (tmp28 != 0)
    tmp30 = tmp26 | tmp29
    tmp31 = tmp0 > tmp5
    tmp32 = tmp31 == 0
    tmp33 = tmp32.to(tl.int64)
    tmp34 = (tmp33 != 0)
    tmp35 = tmp1 > tmp5
    tmp36 = tmp35 == 0
    tmp37 = tmp36.to(tl.int64)
    tmp38 = (tmp37 != 0)
    tmp39 = tmp34 | tmp38
    tmp40 = tmp3 > tmp5
    tmp41 = tmp40 == 0
    tmp42 = tmp41.to(tl.int64)
    tmp43 = (tmp42 != 0)
    tmp44 = tmp39 | tmp43
    tmp45 = tmp44 == 0
    tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47))
    tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 > tmp49, tmp48, tmp49))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask)
    tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask)
    tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask)
    tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask)
    tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf7 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32)
        stream0 = get_cuda_stream(0)
        triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, 8, grid=grid(8), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float32)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:36:05,697] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 910
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:05,740] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 911

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import pointwise
from torch._inductor.utils import instance_descriptor

@pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i64', 3: '*fp16', 4: '*i64', 5: '*fp16', 6: '*i1', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp16', 11: '*fp16', 12: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, out_ptr10, xnumel, XBLOCK : tl.constexpr):
    xnumel = 8
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
    xmask = xindex < xnumel
    x0 = xindex
    tmp0 = tl.load(in_ptr0 + (3*x0), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp1 = tl.load(in_ptr0 + (1 + (3*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp3 = tl.load(in_ptr0 + (2 + (3*x0)), xmask, eviction_policy='evict_last').to(tl.float32)
    tmp46 = tl.load(in_ptr0 + (3*x0), xmask).to(tl.float32)
    tmp47 = tl.load(in_ptr0 + (1 + (3*x0)), xmask).to(tl.float32)
    tmp49 = tl.load(in_ptr0 + (2 + (3*x0)), xmask).to(tl.float32)
    tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 < tmp1, tmp0, tmp1))
    tmp4 = tl.where(tmp2 != tmp2, tmp2, tl.where(tmp2 < tmp3, tmp2, tmp3))
    tmp5 = 0
    tmp6 = 1
    tmp7 = tmp1 < tmp0
    tmp8 = tl.where(tmp7, tmp6, tmp5)
    tmp9 = 2
    tmp10 = tmp3 < tmp2
    tmp11 = tl.where(tmp10, tmp9, tmp8)
    tmp12 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1))
    tmp13 = tl.where(tmp12 != tmp12, tmp12, tl.where(tmp12 > tmp3, tmp12, tmp3))
    tmp14 = tmp1 > tmp0
    tmp15 = tl.where(tmp14, tmp6, tmp5)
    tmp16 = tmp3 > tmp12
    tmp17 = tl.where(tmp16, tmp9, tmp15)
    tmp18 = tmp0 + tmp1
    tmp19 = tmp18 + tmp3
    tmp20 = tmp0 > tmp6
    tmp21 = tmp20.to(tl.int64)
    tmp22 = (tmp21 != 0)
    tmp23 = tmp1 > tmp6
    tmp24 = tmp23.to(tl.int64)
    tmp25 = (tmp24 != 0)
    tmp26 = tmp22 | tmp25
    tmp27 = tmp3 > tmp6
    tmp28 = tmp27.to(tl.int64)
    tmp29 = (tmp28 != 0)
    tmp30 = tmp26 | tmp29
    tmp31 = tmp0 > tmp5
    tmp32 = tmp31 == 0
    tmp33 = tmp32.to(tl.int64)
    tmp34 = (tmp33 != 0)
    tmp35 = tmp1 > tmp5
    tmp36 = tmp35 == 0
    tmp37 = tmp36.to(tl.int64)
    tmp38 = (tmp37 != 0)
    tmp39 = tmp34 | tmp38
    tmp40 = tmp3 > tmp5
    tmp41 = tmp40 == 0
    tmp42 = tmp41.to(tl.int64)
    tmp43 = (tmp42 != 0)
    tmp44 = tmp39 | tmp43
    tmp45 = tmp44 == 0
    tmp48 = tl.where(tmp46 != tmp46, tmp46, tl.where(tmp46 > tmp47, tmp46, tmp47))
    tmp50 = tl.where(tmp48 != tmp48, tmp48, tl.where(tmp48 > tmp49, tmp48, tmp49))
    tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp13, xmask)
    tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr4 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp19, xmask)
    tl.store(out_ptr5 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp30, xmask)
    tl.store(out_ptr6 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask)
    tl.store(out_ptr7 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask)
    tl.store(out_ptr8 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask)
    tl.store(out_ptr9 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask)
    tl.store(out_ptr10 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask)
''')


async_compile.wait(globals())
del async_compile

def call(args):
    arg0_1, = args
    args.clear()
    with torch.cuda.device(0):
        buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool)
        buf7 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64)
        buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16)
        stream0 = get_cuda_stream(0)
        triton_fused_amax_amin_any_1_argmax_argmin_getitem_getitem_1_getitem_2_getitem_3_logical_not_1_0.run(arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, 8, grid=grid(8), stream=stream0)
        del arg0_1
        return (buf0, buf1, buf2, buf3, buf4, buf5, buf6, buf7, buf8, buf9, buf10, )


if __name__ == "__main__":
    from torch._dynamo.testing import rand_strided
    from torch._inductor.utils import print_performance
    arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float16)
    print_performance(lambda: call([arg0_1]))

[2023-01-11 21:36:06,030] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 911
/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
[2023-01-11 21:36:06,071] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 912

from ctypes import c_void_p, c_long
import torch
import random
from torch import empty_strided, as_strided, device
from torch._inductor.codecache import AsyncCompile

aten = torch.ops.aten
assert_size_stride = torch._C._dynamo.guards.assert_size_stride
async_compile = AsyncCompile()

import triton
import triton.language as tl
from torch._inductor.triton_ops.autotune import grid
from torch._C import _cuda_getCurrentRawStream as get_cuda_stream


triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0 = async_compile.triton('''
import triton
import triton.language as tl
from torch._inductor.ir import ReductionHint
from torch._inductor.ir import TileHint
from torch._inductor.triton_ops.autotune import reduction
from torch._inductor.utils import instance_descriptor

@reduction(size_hints=[8, 4],
           reduction_hint=ReductionHint.INNER,
           filename=__file__,
           meta={'signature': {0: '*i1', 1: '*fp32', 2: '*fp32', 3: '*i64', 4: '*fp32', 5: '*i64', 6: '*fp32', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp32', 11: '*fp32', 12: 'i32', 13: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]})
@triton.jit
def triton_(in_out_ptr0, in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    xnumel = 8
    rnumel = 3
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1])
    xmask = xindex < xnumel
    rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK])
    x0 = xindex
    _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf")
    _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf")
    _tmp2_index = tl.zeros([XBLOCK, RBLOCK], tl.int64)
    _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64)
    _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0
    _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0
    _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0
    _tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf")
    for roffset in range(0, rnumel, RBLOCK):
        rindex = roffset + rbase
        rmask = rindex < rnumel
        r1 = rindex
        tmp0 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask, eviction_policy='evict_last')
        tmp20 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask)
        _tmp1 = tl.where(xmask & rmask & (_tmp1 > tmp0), tmp0, _tmp1)
        _tmp2_index = tl.where(xmask & rmask & (_tmp2 > tmp0), rindex, _tmp2_index)
        _tmp2 = tl.where(xmask & rmask & (_tmp2 > tmp0), tmp0, _tmp2)
        _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp0), tmp0, _tmp3)
        _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index)
        _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4)
        _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp0, _tmp5)
        tmp6 = 1
        tmp7 = tmp0 > tmp6
        tmp8 = tmp7.to(tl.int64)
        tmp9 = (tmp8 != 0)
        _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10)
        tmp11 = 0
        tmp12 = tmp0 > tmp11
        tmp13 = tmp12 == 0
        tmp14 = tmp13.to(tl.int64)
        tmp15 = (tmp14 != 0)
        _tmp16 = tl.where(xmask & rmask & (_tmp16 < tmp15), tmp15, _tmp16)
        _tmp21 = tl.where(xmask & rmask & (_tmp21 < tmp20), tmp20, _tmp21)
    tmp1 = tl.reshape(tl.min(_tmp1, 1), [XBLOCK, 1])
    tl.store(out_ptr0 + x0, tmp1, xmask)
    _tmp2_index_reduce = tl.reshape(
        tl.argmin(_tmp2, 1), [XBLOCK, 1]).to(tl.int32)
    _tmp2_index_mask = (tl.reshape(tl.arange(0, RBLOCK),
        [1, RBLOCK]) == _tmp2_index_reduce)
    tmp2 = tl.reshape(tl.sum(
        tl.where(_tmp2_index_mask, _tmp2_index, 0), 1), [XBLOCK, 1])
    tl.store(out_ptr1 + x0, tmp2, xmask)
    tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1])
    tl.store(out_ptr2 + x0, tmp3, xmask)
    _tmp4_index_reduce = tl.reshape(
        tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32)
    _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK),
        [1, RBLOCK]) == _tmp4_index_reduce)
    tmp4 = tl.reshape(tl.sum(
        tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1])
    tl.store(out_ptr3 + x0, tmp4, xmask)
    tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK,
1]) 2023-01-11T21:38:06.7418169Z tl.store(out_ptr4 + x0, tmp5, xmask) 2023-01-11T21:38:06.7418293Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7418394Z tl.store(out_ptr5 + x0, tmp10, xmask) 2023-01-11T21:38:06.7418505Z tmp16 = tl.reshape(tl.max(_tmp16, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7418582Z tmp17 = tmp2 2023-01-11T21:38:06.7418682Z tl.store(out_ptr6 + x0, tmp17, xmask) 2023-01-11T21:38:06.7418813Z tmp18 = tmp4 2023-01-11T21:38:06.7418911Z tl.store(out_ptr7 + x0, tmp18, xmask) 2023-01-11T21:38:06.7418985Z tmp19 = tmp1 2023-01-11T21:38:06.7419080Z tl.store(out_ptr8 + x0, tmp19, xmask) 2023-01-11T21:38:06.7419189Z tmp21 = tl.reshape(tl.max(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7419285Z tl.store(out_ptr9 + x0, tmp21, xmask) 2023-01-11T21:38:06.7419364Z tmp22 = tmp16 == 0 2023-01-11T21:38:06.7419503Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7419596Z ''') 2023-01-11T21:38:06.7419601Z 2023-01-11T21:38:06.7419606Z 2023-01-11T21:38:06.7419701Z async_compile.wait(globals()) 2023-01-11T21:38:06.7419782Z del async_compile 2023-01-11T21:38:06.7419788Z 2023-01-11T21:38:06.7419862Z def call(args): 2023-01-11T21:38:06.7419928Z arg0_1, = args 2023-01-11T21:38:06.7420004Z args.clear() 2023-01-11T21:38:06.7420097Z with torch.cuda.device(0): 2023-01-11T21:38:06.7420298Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7420491Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7420689Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7420878Z buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7421064Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7421252Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7421438Z buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7421661Z buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7421851Z buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7422049Z buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7422248Z buf11 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7422340Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7422425Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7422660Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0.run(buf7, arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf8, buf9, buf10, buf11, 8, 3, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7422733Z del arg0_1 2023-01-11T21:38:06.7422871Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.7422876Z 2023-01-11T21:38:06.7422884Z 2023-01-11T21:38:06.7422966Z if __name__ == "__main__": 2023-01-11T21:38:06.7423084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7423211Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7423409Z arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7423526Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7423532Z 2023-01-11T21:38:06.7423790Z [2023-01-11 21:36:06,235] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 912 
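The reduction kernels dumped above implement argmin/argmax without a native arg-reduction over the tiled r-dimension: each lane carries a running extreme (_tmp2, _tmp4) together with the row index at which it was last improved (_tmp2_index, _tmp4_index), and the epilogue locates the winning lane with tl.argmin/tl.argmax, then extracts that lane's stored index through a one-hot equality mask and tl.sum. A minimal NumPy sketch of the same trick, assuming a 1-D row reduced with lane count RBLOCK (all names here are illustrative, not taken from the log):

import numpy as np

def argmin_via_mask(row, RBLOCK=4):
    # Per-lane running minimum plus the row index where it was last improved,
    # mirroring _tmp2 / _tmp2_index in the generated kernel above.
    _tmp = np.full(RBLOCK, np.inf, dtype=np.float32)
    _tmp_index = np.zeros(RBLOCK, dtype=np.int64)
    for roffset in range(0, len(row), RBLOCK):
        rindex = roffset + np.arange(RBLOCK)
        rmask = rindex < len(row)
        vals = np.where(rmask, row[np.minimum(rindex, len(row) - 1)], np.inf)
        update = rmask & (_tmp > vals)        # same predicate as the tl.where updates
        _tmp_index = np.where(update, rindex, _tmp_index)
        _tmp = np.where(update, vals, _tmp)
    winner = np.argmin(_tmp)                  # cross-lane reduce, like tl.argmin(_tmp2, 1)
    onehot = np.arange(RBLOCK) == winner      # plays the role of _tmp2_index_mask
    return int(np.sum(np.where(onehot, _tmp_index, 0)))  # tl.sum pulls out the stored index

assert argmin_via_mask(np.array([3.0, 0.5, 2.0, 4.0, 1.0], dtype=np.float32)) == 1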
2023-01-11T21:38:06.7423796Z 2023-01-11T21:38:06.7423897Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7423974Z import torch 2023-01-11T21:38:06.7424050Z import random 2023-01-11T21:38:06.7424170Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7431659Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7431670Z 2023-01-11T21:38:06.7431819Z aten = torch.ops.aten 2023-01-11T21:38:06.7431975Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7432067Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7432072Z 2023-01-11T21:38:06.7432153Z import triton 2023-01-11T21:38:06.7432248Z import triton.language as tl 2023-01-11T21:38:06.7432379Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7432611Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7432853Z 2023-01-11T21:38:06.7432858Z 2023-01-11T21:38:06.7433148Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7433451Z import triton 2023-01-11T21:38:06.7433675Z import triton.language as tl 2023-01-11T21:38:06.7433927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7434179Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7434458Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7434754Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7434917Z 2023-01-11T21:38:06.7435008Z @reduction(size_hints=[8, 4], 2023-01-11T21:38:06.7435258Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7435541Z filename=__file__, 2023-01-11T21:38:06.7436197Z meta={'signature': {0: '*i1', 1: '*fp16', 2: '*fp16', 3: '*i64', 4: '*fp16', 5: '*i64', 6: '*fp16', 7: '*i1', 8: '*i64', 9: '*i64', 10: '*fp16', 11: '*fp16', 12: 'i32', 13: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11), equal_to_1=())]}) 2023-01-11T21:38:06.7436589Z @triton.jit 2023-01-11T21:38:06.7436968Z def triton_(in_out_ptr0, in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, out_ptr4, out_ptr5, out_ptr6, out_ptr7, out_ptr8, out_ptr9, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7437333Z xnumel = 8 2023-01-11T21:38:06.7437524Z rnumel = 3 2023-01-11T21:38:06.7437781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7438068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7438331Z xmask = xindex < xnumel 2023-01-11T21:38:06.7438577Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7438842Z x0 = xindex 2023-01-11T21:38:06.7439109Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.7439407Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("inf") 2023-01-11T21:38:06.7439676Z _tmp2_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.7440012Z _tmp3 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7440355Z _tmp4 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7440624Z _tmp4_index = tl.zeros([XBLOCK, RBLOCK], tl.int64) 2023-01-11T21:38:06.7440888Z _tmp5 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7441155Z _tmp10 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.7441404Z _tmp16 = tl.zeros([XBLOCK, RBLOCK], tl.int1) + 0 2023-01-11T21:38:06.7441732Z 
_tmp21 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + float("-inf") 2023-01-11T21:38:06.7441996Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7442223Z rindex = roffset + rbase 2023-01-11T21:38:06.7442435Z rmask = rindex < rnumel 2023-01-11T21:38:06.7442627Z r1 = rindex 2023-01-11T21:38:06.7442980Z tmp0 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7443296Z tmp20 = tl.load(in_ptr0 + (r1 + (3*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.7443589Z _tmp1 = tl.where(xmask & rmask & (_tmp1 > tmp0), tmp0, _tmp1) 2023-01-11T21:38:06.7443891Z _tmp2_index = tl.where(xmask & rmask & (_tmp2 > tmp0), rindex, _tmp2_index) 2023-01-11T21:38:06.7444184Z _tmp2 = tl.where(xmask & rmask & (_tmp2 > tmp0), tmp0, _tmp2) 2023-01-11T21:38:06.7444469Z _tmp3 = tl.where(xmask & rmask & (_tmp3 < tmp0), tmp0, _tmp3) 2023-01-11T21:38:06.7444759Z _tmp4_index = tl.where(xmask & rmask & (_tmp4 < tmp0), rindex, _tmp4_index) 2023-01-11T21:38:06.7445050Z _tmp4 = tl.where(xmask & rmask & (_tmp4 < tmp0), tmp0, _tmp4) 2023-01-11T21:38:06.7445353Z _tmp5 = tl.where(xmask & rmask, _tmp5 + tmp0, _tmp5) 2023-01-11T21:38:06.7445581Z tmp6 = 1 2023-01-11T21:38:06.7445772Z tmp7 = tmp0 > tmp6 2023-01-11T21:38:06.7445975Z tmp8 = tmp7.to(tl.int64) 2023-01-11T21:38:06.7446183Z tmp9 = (tmp8 != 0) 2023-01-11T21:38:06.7446435Z _tmp10 = tl.where(xmask & rmask & (_tmp10 < tmp9), tmp9, _tmp10) 2023-01-11T21:38:06.7446660Z tmp11 = 0 2023-01-11T21:38:06.7446852Z tmp12 = tmp0 > tmp11 2023-01-11T21:38:06.7447052Z tmp13 = tmp12 == 0 2023-01-11T21:38:06.7447256Z tmp14 = tmp13.to(tl.int64) 2023-01-11T21:38:06.7447466Z tmp15 = (tmp14 != 0) 2023-01-11T21:38:06.7447720Z _tmp16 = tl.where(xmask & rmask & (_tmp16 < tmp15), tmp15, _tmp16) 2023-01-11T21:38:06.7448003Z _tmp21 = tl.where(xmask & rmask & (_tmp21 < tmp20), tmp20, _tmp21) 2023-01-11T21:38:06.7448277Z tmp1 = tl.reshape(tl.min(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7448519Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.7448744Z _tmp2_index_reduce = tl.reshape( 2023-01-11T21:38:06.7448992Z tl.argmin(_tmp2, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.7449257Z _tmp2_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.7449505Z [1, RBLOCK]) == _tmp2_index_reduce) 2023-01-11T21:38:06.7449716Z tmp2 = tl.reshape(tl.sum( 2023-01-11T21:38:06.7449970Z tl.where(_tmp2_index_mask, _tmp2_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7450223Z tl.store(out_ptr1 + x0, tmp2, xmask) 2023-01-11T21:38:06.7450463Z tmp3 = tl.reshape(tl.max(_tmp3, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7450701Z tl.store(out_ptr2 + x0, tmp3, xmask) 2023-01-11T21:38:06.7450966Z _tmp4_index_reduce = tl.reshape( 2023-01-11T21:38:06.7451202Z tl.argmax(_tmp4, 1), [XBLOCK, 1]).to(tl.int32) 2023-01-11T21:38:06.7451466Z _tmp4_index_mask = (tl.reshape(tl.arange(0, RBLOCK), 2023-01-11T21:38:06.7451712Z [1, RBLOCK]) == _tmp4_index_reduce) 2023-01-11T21:38:06.7451923Z tmp4 = tl.reshape(tl.sum( 2023-01-11T21:38:06.7452175Z tl.where(_tmp4_index_mask, _tmp4_index, 0), 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7452426Z tl.store(out_ptr3 + x0, tmp4, xmask) 2023-01-11T21:38:06.7452674Z tmp5 = tl.reshape(tl.sum(_tmp5, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7452915Z tl.store(out_ptr4 + x0, tmp5, xmask) 2023-01-11T21:38:06.7453161Z tmp10 = tl.reshape(tl.max(_tmp10, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7453409Z tl.store(out_ptr5 + x0, tmp10, xmask) 2023-01-11T21:38:06.7453652Z tmp16 = tl.reshape(tl.max(_tmp16, 1), [XBLOCK, 1]) 
2023-01-11T21:38:06.7453869Z tmp17 = tmp2 2023-01-11T21:38:06.7454077Z tl.store(out_ptr6 + x0, tmp17, xmask) 2023-01-11T21:38:06.7454277Z tmp18 = tmp4 2023-01-11T21:38:06.7454690Z tl.store(out_ptr7 + x0, tmp18, xmask) 2023-01-11T21:38:06.7454901Z tmp19 = tmp1 2023-01-11T21:38:06.7455097Z tl.store(out_ptr8 + x0, tmp19, xmask) 2023-01-11T21:38:06.7455347Z tmp21 = tl.reshape(tl.max(_tmp21, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7455594Z tl.store(out_ptr9 + x0, tmp21, xmask) 2023-01-11T21:38:06.7455796Z tmp22 = tmp16 == 0 2023-01-11T21:38:06.7456058Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp22, xmask) 2023-01-11T21:38:06.7456319Z ''') 2023-01-11T21:38:06.7456423Z 2023-01-11T21:38:06.7456428Z 2023-01-11T21:38:06.7456528Z async_compile.wait(globals()) 2023-01-11T21:38:06.7456728Z del async_compile 2023-01-11T21:38:06.7456846Z 2023-01-11T21:38:06.7456924Z def call(args): 2023-01-11T21:38:06.7457107Z arg0_1, = args 2023-01-11T21:38:06.7457356Z args.clear() 2023-01-11T21:38:06.7457565Z with torch.cuda.device(0): 2023-01-11T21:38:06.7457901Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7458255Z buf1 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7458613Z buf2 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7459034Z buf3 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7459388Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7459742Z buf5 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7460087Z buf6 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.7460439Z buf8 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7460780Z buf9 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.int64) 2023-01-11T21:38:06.7461134Z buf10 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7461498Z buf11 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7461748Z buf7 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7461966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7462338Z triton_fused_amax_amin_any_1_any_2_argmax_argmin_gt_gt_1_logical_not_logical_not_1_0.run(buf7, arg0_1, buf0, buf1, buf2, buf3, buf4, buf5, buf8, buf9, buf10, buf11, 8, 3, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7462666Z del arg0_1 2023-01-11T21:38:06.7462906Z return (buf0, buf1, buf2, buf3, buf4, buf5, buf7, buf8, buf9, buf10, buf11, ) 2023-01-11T21:38:06.7463076Z 2023-01-11T21:38:06.7463080Z 2023-01-11T21:38:06.7463162Z if __name__ == "__main__": 2023-01-11T21:38:06.7463404Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7463675Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7464039Z arg0_1 = rand_strided((8, 3), (3, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7464361Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7464509Z 2023-01-11T21:38:06.7464582Z ok (1.056s) 2023-01-11T21:38:06.7465209Z test_unspec_inputs_cuda (__main__.CudaTests) ... 
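The test_unspec_inputs_cuda output below exercises 0-d CPU ("unspecialized") scalar inputs: the generated wrappers pass them into the kernel as plain Python numbers via arg1_1.item() (or arg0_1.item() when the scalar comes first), and the kernel promotes them to the tensor's compute dtype with tmp1.to(tl.float32) before the fused add/mul/div. A minimal PyTorch sketch of the same promotion semantics, assuming illustrative shapes and a made-up 0.5 value (the logged runs place the non-scalar tensor on CUDA):

import torch

tensor_arg = torch.randn(2, 3)                       # stands in for the (2, 3) fp32 argument
scalar_arg = torch.tensor(0.5, dtype=torch.float16)  # 0-d CPU scalar, like arg1_1

s = scalar_arg.item()     # unwrapped on the host, as in triton_fused_add_div_mul_0.run(..., arg1_1.item(), ...)
out_add = tensor_arg + s  # computed in float32, matching tmp3 = tmp0 + tmp2
out_mul = tensor_arg * s  # tmp4 = tmp0 * tmp2
out_div = tensor_arg / s  # tmp6 = tmp5 / tmp2
assert out_add.dtype == out_mul.dtype == out_div.dtype == torch.float32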
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.7465698Z warnings.warn( 2023-01-11T21:38:06.7466082Z [2023-01-11 21:36:06,255] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 913 2023-01-11T21:38:06.7466539Z [2023-01-11 21:36:06,270] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7466997Z [2023-01-11 21:36:06,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 913 2023-01-11T21:38:06.7467464Z [2023-01-11 21:36:06,286] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 914 2023-01-11T21:38:06.7467913Z [2023-01-11 21:36:06,299] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7468365Z [2023-01-11 21:36:06,299] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 914 2023-01-11T21:38:06.7468823Z [2023-01-11 21:36:06,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 915 2023-01-11T21:38:06.7469269Z [2023-01-11 21:36:06,328] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7469727Z [2023-01-11 21:36:06,328] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 915 2023-01-11T21:38:06.7470172Z [2023-01-11 21:36:06,343] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 916 2023-01-11T21:38:06.7470373Z 2023-01-11T21:38:06.7470473Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7470683Z import torch 2023-01-11T21:38:06.7470860Z import random 2023-01-11T21:38:06.7471090Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7471369Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7471532Z 2023-01-11T21:38:06.7471615Z aten = torch.ops.aten 2023-01-11T21:38:06.7471899Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7472165Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7472301Z 2023-01-11T21:38:06.7472377Z import triton 2023-01-11T21:38:06.7472570Z import triton.language as tl 2023-01-11T21:38:06.7472826Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7473122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7473295Z 2023-01-11T21:38:06.7473300Z 2023-01-11T21:38:06.7473471Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7473691Z import triton 2023-01-11T21:38:06.7473895Z import triton.language as tl 2023-01-11T21:38:06.7474142Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7474383Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7474656Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7474943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7475097Z 2023-01-11T21:38:06.7475554Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7475921Z @triton.jit 2023-01-11T21:38:06.7476187Z def triton_(in_ptr0,
in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7476447Z xnumel = 6 2023-01-11T21:38:06.7476645Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7476912Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7477187Z xmask = xindex < xnumel 2023-01-11T21:38:06.7477375Z x0 = xindex 2023-01-11T21:38:06.7477674Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7477914Z tmp1 = in_ptr1 2023-01-11T21:38:06.7478116Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7478342Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7478547Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7478736Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7478929Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7479179Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7479475Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7479771Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7480013Z ''') 2023-01-11T21:38:06.7480115Z 2023-01-11T21:38:06.7480120Z 2023-01-11T21:38:06.7480215Z async_compile.wait(globals()) 2023-01-11T21:38:06.7480423Z del async_compile 2023-01-11T21:38:06.7480539Z 2023-01-11T21:38:06.7480617Z def call(args): 2023-01-11T21:38:06.7480806Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7480989Z args.clear() 2023-01-11T21:38:06.7481192Z with torch.cuda.device(0): 2023-01-11T21:38:06.7481529Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7481892Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7482251Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7482508Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7482810Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7483066Z del arg0_1 2023-01-11T21:38:06.7483249Z del arg1_1 2023-01-11T21:38:06.7483450Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7483575Z 2023-01-11T21:38:06.7483583Z 2023-01-11T21:38:06.7483667Z if __name__ == "__main__": 2023-01-11T21:38:06.7483907Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7484185Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7484540Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7484931Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.7485218Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7485384Z 2023-01-11T21:38:06.7485390Z 2023-01-11T21:38:06.7485500Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7485726Z import torch 2023-01-11T21:38:06.7485918Z import random 2023-01-11T21:38:06.7486148Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7486415Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7486575Z 2023-01-11T21:38:06.7486659Z aten = torch.ops.aten 2023-01-11T21:38:06.7486918Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7487177Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7487310Z 2023-01-11T21:38:06.7487385Z import triton 2023-01-11T21:38:06.7487586Z import triton.language as tl 2023-01-11T21:38:06.7487838Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7488137Z from torch._C import 
_cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7488313Z 2023-01-11T21:38:06.7488318Z 2023-01-11T21:38:06.7488489Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7488715Z import triton 2023-01-11T21:38:06.7488907Z import triton.language as tl 2023-01-11T21:38:06.7489151Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7489399Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7489663Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7489949Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7490105Z 2023-01-11T21:38:06.7490558Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7490957Z @triton.jit 2023-01-11T21:38:06.7491220Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7491478Z xnumel = 6 2023-01-11T21:38:06.7491683Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7491947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7492191Z xmask = xindex < xnumel 2023-01-11T21:38:06.7492382Z x0 = xindex 2023-01-11T21:38:06.7492554Z tmp0 = in_ptr0 2023-01-11T21:38:06.7492858Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7493118Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7493333Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7493543Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7493736Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7493923Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7494177Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7494669Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7494972Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7495213Z ''') 2023-01-11T21:38:06.7495314Z 2023-01-11T21:38:06.7495319Z 2023-01-11T21:38:06.7495414Z async_compile.wait(globals()) 2023-01-11T21:38:06.7495623Z del async_compile 2023-01-11T21:38:06.7495738Z 2023-01-11T21:38:06.7495807Z def call(args): 2023-01-11T21:38:06.7495995Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7496187Z args.clear() 2023-01-11T21:38:06.7496384Z with torch.cuda.device(0): 2023-01-11T21:38:06.7496715Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497083Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497514Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7497763Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7498108Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7498371Z del arg0_1 2023-01-11T21:38:06.7498546Z del arg1_1 2023-01-11T21:38:06.7498741Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7498871Z 2023-01-11T21:38:06.7498875Z 2023-01-11T21:38:06.7498957Z if __name__ == "__main__": 2023-01-11T21:38:06.7499185Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7499460Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7499809Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float16) 2023-01-11T21:38:06.7500171Z arg1_1 = 
rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7500451Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7500603Z 2023-01-11T21:38:06.7500608Z 2023-01-11T21:38:06.7500706Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7500910Z import torch 2023-01-11T21:38:06.7501088Z import random 2023-01-11T21:38:06.7501314Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7501586Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7501746Z 2023-01-11T21:38:06.7501821Z aten = torch.ops.aten 2023-01-11T21:38:06.7502078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7502338Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7502471Z 2023-01-11T21:38:06.7502545Z import triton 2023-01-11T21:38:06.7502737Z import triton.language as tl 2023-01-11T21:38:06.7502992Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7503286Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7503491Z 2023-01-11T21:38:06.7503505Z 2023-01-11T21:38:06.7503670Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7503901Z import triton 2023-01-11T21:38:06.7504107Z import triton.language as tl 2023-01-11T21:38:06.7504347Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7504601Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7504878Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7505166Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7505325Z 2023-01-11T21:38:06.7505832Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7506202Z @triton.jit 2023-01-11T21:38:06.7506470Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7506727Z xnumel = 6 2023-01-11T21:38:06.7506935Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7507202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7507447Z xmask = xindex < xnumel 2023-01-11T21:38:06.7507633Z x0 = xindex 2023-01-11T21:38:06.7507939Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7508182Z tmp1 = in_ptr1 2023-01-11T21:38:06.7508388Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7508612Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7508822Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7509011Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7509205Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7509456Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7509748Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7510038Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7510292Z ''') 2023-01-11T21:38:06.7510395Z 2023-01-11T21:38:06.7510400Z 2023-01-11T21:38:06.7510491Z async_compile.wait(globals()) 2023-01-11T21:38:06.7510699Z del async_compile 2023-01-11T21:38:06.7510817Z 2023-01-11T21:38:06.7510893Z def call(args): 2023-01-11T21:38:06.7511118Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7511306Z args.clear() 2023-01-11T21:38:06.7511513Z with torch.cuda.device(0): 2023-01-11T21:38:06.7511848Z 
buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512210Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512570Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7512831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7513127Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7513393Z del arg0_1 2023-01-11T21:38:06.7513575Z del arg1_1 2023-01-11T21:38:06.7513768Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7513899Z 2023-01-11T21:38:06.7513904Z 2023-01-11T21:38:06.7513988Z if __name__ == "__main__": 2023-01-11T21:38:06.7514230Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7514515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7514870Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7515229Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.7515516Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7515670Z 2023-01-11T21:38:06.7515919Z [2023-01-11 21:36:06,356] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7516393Z [2023-01-11 21:36:06,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 916 2023-01-11T21:38:06.7516899Z [2023-01-11 21:36:06,373] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 917 2023-01-11T21:38:06.7517354Z [2023-01-11 21:36:06,385] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7517819Z [2023-01-11 21:36:06,385] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 917 2023-01-11T21:38:06.7518283Z [2023-01-11 21:36:06,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 918 2023-01-11T21:38:06.7518729Z [2023-01-11 21:36:06,412] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7519191Z [2023-01-11 21:36:06,412] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 918 2023-01-11T21:38:06.7519644Z [2023-01-11 21:36:06,428] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 919 2023-01-11T21:38:06.7519848Z 2023-01-11T21:38:06.7519951Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7520161Z import torch 2023-01-11T21:38:06.7520341Z import random 2023-01-11T21:38:06.7520581Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7520859Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7521028Z 2023-01-11T21:38:06.7521113Z aten = torch.ops.aten 2023-01-11T21:38:06.7521369Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7521634Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7521771Z 2023-01-11T21:38:06.7521850Z import triton 2023-01-11T21:38:06.7522049Z import triton.language as tl 2023-01-11T21:38:06.7522310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7522614Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7522790Z 2023-01-11T21:38:06.7522795Z 2023-01-11T21:38:06.7522959Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7523195Z import triton 2023-01-11T21:38:06.7523399Z 
import triton.language as tl 2023-01-11T21:38:06.7523639Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7523892Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7524171Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7524500Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7524653Z 2023-01-11T21:38:06.7525105Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7525499Z @triton.jit 2023-01-11T21:38:06.7525793Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7526055Z xnumel = 6 2023-01-11T21:38:06.7526258Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7526532Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7526778Z xmask = xindex < xnumel 2023-01-11T21:38:06.7526971Z x0 = xindex 2023-01-11T21:38:06.7527153Z tmp0 = in_ptr0 2023-01-11T21:38:06.7527464Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7527728Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7527948Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7528154Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7528345Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7528540Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7528798Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7529091Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7529381Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7529632Z ''') 2023-01-11T21:38:06.7529735Z 2023-01-11T21:38:06.7529781Z 2023-01-11T21:38:06.7529880Z async_compile.wait(globals()) 2023-01-11T21:38:06.7530083Z del async_compile 2023-01-11T21:38:06.7530201Z 2023-01-11T21:38:06.7530279Z def call(args): 2023-01-11T21:38:06.7530470Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7530656Z args.clear() 2023-01-11T21:38:06.7530861Z with torch.cuda.device(0): 2023-01-11T21:38:06.7531201Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7531564Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7531923Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7532181Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7532474Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7532738Z del arg0_1 2023-01-11T21:38:06.7532920Z del arg1_1 2023-01-11T21:38:06.7533116Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7533244Z 2023-01-11T21:38:06.7533249Z 2023-01-11T21:38:06.7533332Z if __name__ == "__main__": 2023-01-11T21:38:06.7533571Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7533851Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7534202Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.bfloat16) 2023-01-11T21:38:06.7534777Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7535062Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7535216Z 2023-01-11T21:38:06.7535220Z 2023-01-11T21:38:06.7535317Z from ctypes import 
c_void_p, c_long 2023-01-11T21:38:06.7535513Z import torch 2023-01-11T21:38:06.7535693Z import random 2023-01-11T21:38:06.7535921Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7536186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7536342Z 2023-01-11T21:38:06.7536428Z aten = torch.ops.aten 2023-01-11T21:38:06.7536680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7536935Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7537071Z 2023-01-11T21:38:06.7537191Z import triton 2023-01-11T21:38:06.7537404Z import triton.language as tl 2023-01-11T21:38:06.7537697Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7537995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7538167Z 2023-01-11T21:38:06.7538171Z 2023-01-11T21:38:06.7538344Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7538565Z import triton 2023-01-11T21:38:06.7538754Z import triton.language as tl 2023-01-11T21:38:06.7538994Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7539239Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7539499Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7539778Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7539935Z 2023-01-11T21:38:06.7540386Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7540745Z @triton.jit 2023-01-11T21:38:06.7541003Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7541254Z xnumel = 6 2023-01-11T21:38:06.7541453Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7541709Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7541946Z xmask = xindex < xnumel 2023-01-11T21:38:06.7542135Z x0 = xindex 2023-01-11T21:38:06.7542423Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7542658Z tmp1 = in_ptr1 2023-01-11T21:38:06.7542864Z tmp4 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7543112Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7543307Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.7543502Z tmp5 = tmp4 / tmp1 2023-01-11T21:38:06.7543744Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7544044Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7544332Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7544581Z ''') 2023-01-11T21:38:06.7544677Z 2023-01-11T21:38:06.7544683Z 2023-01-11T21:38:06.7544780Z async_compile.wait(globals()) 2023-01-11T21:38:06.7544986Z del async_compile 2023-01-11T21:38:06.7545100Z 2023-01-11T21:38:06.7545196Z def call(args): 2023-01-11T21:38:06.7545409Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7545605Z args.clear() 2023-01-11T21:38:06.7545810Z with torch.cuda.device(0): 2023-01-11T21:38:06.7546141Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7546510Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7546872Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7547127Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7547424Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7547687Z del arg0_1 2023-01-11T21:38:06.7547871Z del arg1_1 2023-01-11T21:38:06.7548062Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7548195Z 2023-01-11T21:38:06.7548199Z 2023-01-11T21:38:06.7548282Z if __name__ == "__main__": 2023-01-11T21:38:06.7548518Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7548787Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7549152Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7549512Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7549794Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7549939Z 2023-01-11T21:38:06.7549943Z 2023-01-11T21:38:06.7550046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7550252Z import torch 2023-01-11T21:38:06.7550472Z import random 2023-01-11T21:38:06.7550699Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7550976Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7551134Z 2023-01-11T21:38:06.7551218Z aten = torch.ops.aten 2023-01-11T21:38:06.7551469Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7551731Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7551867Z 2023-01-11T21:38:06.7551944Z import triton 2023-01-11T21:38:06.7552140Z import triton.language as tl 2023-01-11T21:38:06.7552398Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7552694Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7552873Z 2023-01-11T21:38:06.7552877Z 2023-01-11T21:38:06.7553050Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7553271Z import triton 2023-01-11T21:38:06.7553474Z import triton.language as tl 2023-01-11T21:38:06.7553721Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7553965Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7554236Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7554526Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7554683Z 2023-01-11T21:38:06.7555133Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7555488Z @triton.jit 2023-01-11T21:38:06.7555755Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7556047Z xnumel = 6 2023-01-11T21:38:06.7556247Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7556511Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7556756Z xmask = xindex < xnumel 2023-01-11T21:38:06.7556944Z x0 = xindex 2023-01-11T21:38:06.7557126Z tmp0 = in_ptr0 2023-01-11T21:38:06.7557429Z tmp1 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7557681Z tmp4 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7557893Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7558091Z tmp3 = tmp0 * tmp1 2023-01-11T21:38:06.7558284Z tmp5 = tmp0 / tmp4 2023-01-11T21:38:06.7558525Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.7558820Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7559109Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7559353Z ''') 2023-01-11T21:38:06.7559454Z 2023-01-11T21:38:06.7559459Z 2023-01-11T21:38:06.7559556Z async_compile.wait(globals()) 2023-01-11T21:38:06.7559766Z del async_compile 2023-01-11T21:38:06.7559881Z 2023-01-11T21:38:06.7559952Z def call(args): 2023-01-11T21:38:06.7560143Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7560335Z args.clear() 2023-01-11T21:38:06.7560533Z with torch.cuda.device(0): 2023-01-11T21:38:06.7560863Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561229Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561589Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7561843Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7562141Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7562406Z del arg0_1 2023-01-11T21:38:06.7562582Z del arg1_1 2023-01-11T21:38:06.7562783Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7562914Z 2023-01-11T21:38:06.7562918Z 2023-01-11T21:38:06.7563001Z if __name__ == "__main__": 2023-01-11T21:38:06.7563233Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7563546Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7563895Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.7564262Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7564538Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7564692Z 2023-01-11T21:38:06.7564947Z [2023-01-11 21:36:06,441] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7565448Z [2023-01-11 21:36:06,441] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 919 2023-01-11T21:38:06.7565915Z [2023-01-11 21:36:06,457] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 920 2023-01-11T21:38:06.7566363Z [2023-01-11 21:36:06,471] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7566823Z [2023-01-11 21:36:06,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 920 2023-01-11T21:38:06.7567281Z [2023-01-11 21:36:06,486] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 921 2023-01-11T21:38:06.7567719Z [2023-01-11 21:36:06,500] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7568177Z [2023-01-11 21:36:06,500] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 921 2023-01-11T21:38:06.7568630Z [2023-01-11 21:36:06,515] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 922 2023-01-11T21:38:06.7568864Z 2023-01-11T21:38:06.7568966Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7569172Z import torch 2023-01-11T21:38:06.7569359Z import random 2023-01-11T21:38:06.7569591Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7569862Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7570025Z 2023-01-11T21:38:06.7570113Z aten = torch.ops.aten 2023-01-11T21:38:06.7570371Z 
assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7570628Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7570639Z 2023-01-11T21:38:06.7570709Z import triton 2023-01-11T21:38:06.7570804Z import triton.language as tl 2023-01-11T21:38:06.7570934Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7571076Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7571082Z 2023-01-11T21:38:06.7571087Z 2023-01-11T21:38:06.7571258Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7571343Z import triton 2023-01-11T21:38:06.7571438Z import triton.language as tl 2023-01-11T21:38:06.7571548Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7571653Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7571791Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7571921Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7571926Z 2023-01-11T21:38:06.7572374Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'fp64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7572450Z @triton.jit 2023-01-11T21:38:06.7572614Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7572689Z xnumel = 6 2023-01-11T21:38:06.7572783Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7572919Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7573008Z xmask = xindex < xnumel 2023-01-11T21:38:06.7573082Z x0 = xindex 2023-01-11T21:38:06.7573275Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7573355Z tmp1 = in_ptr1 2023-01-11T21:38:06.7573484Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7573570Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7573653Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7573734Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7573814Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7573953Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7574089Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7574223Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7574305Z ''') 2023-01-11T21:38:06.7574317Z 2023-01-11T21:38:06.7574324Z 2023-01-11T21:38:06.7574414Z async_compile.wait(globals()) 2023-01-11T21:38:06.7574693Z del async_compile 2023-01-11T21:38:06.7574699Z 2023-01-11T21:38:06.7574778Z def call(args): 2023-01-11T21:38:06.7574858Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7574933Z args.clear() 2023-01-11T21:38:06.7575027Z with torch.cuda.device(0): 2023-01-11T21:38:06.7575239Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575431Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575628Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7575721Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7575888Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7575961Z del arg0_1 2023-01-11T21:38:06.7576033Z del arg1_1 2023-01-11T21:38:06.7576122Z return (buf0, buf1, buf2, ) 
2023-01-11T21:38:06.7576173Z 2023-01-11T21:38:06.7576177Z 2023-01-11T21:38:06.7576262Z if __name__ == "__main__": 2023-01-11T21:38:06.7576376Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7576505Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7576711Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7576898Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.7577018Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7577024Z 2023-01-11T21:38:06.7577028Z 2023-01-11T21:38:06.7577174Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7577262Z import torch 2023-01-11T21:38:06.7577342Z import random 2023-01-11T21:38:06.7577455Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7577582Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7577587Z 2023-01-11T21:38:06.7577672Z aten = torch.ops.aten 2023-01-11T21:38:06.7577813Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7577910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7577915Z 2023-01-11T21:38:06.7577991Z import triton 2023-01-11T21:38:06.7578085Z import triton.language as tl 2023-01-11T21:38:06.7578209Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7578351Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7578356Z 2023-01-11T21:38:06.7578360Z 2023-01-11T21:38:06.7578530Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7578606Z import triton 2023-01-11T21:38:06.7578704Z import triton.language as tl 2023-01-11T21:38:06.7578820Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7578924Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7579059Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7579179Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7579187Z 2023-01-11T21:38:06.7579633Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'fp64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7579749Z @triton.jit 2023-01-11T21:38:06.7579915Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7579991Z xnumel = 6 2023-01-11T21:38:06.7580090Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7580225Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7580312Z xmask = xindex < xnumel 2023-01-11T21:38:06.7580378Z x0 = xindex 2023-01-11T21:38:06.7580456Z tmp0 = in_ptr0 2023-01-11T21:38:06.7580651Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7580754Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7580848Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7580929Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7581011Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7581085Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7581222Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7581359Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7581495Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7581583Z ''') 
2023-01-11T21:38:06.7581589Z 2023-01-11T21:38:06.7581593Z 2023-01-11T21:38:06.7581689Z async_compile.wait(globals()) 2023-01-11T21:38:06.7581767Z del async_compile 2023-01-11T21:38:06.7581772Z 2023-01-11T21:38:06.7581851Z def call(args): 2023-01-11T21:38:06.7581927Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7582006Z args.clear() 2023-01-11T21:38:06.7582100Z with torch.cuda.device(0): 2023-01-11T21:38:06.7582302Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582538Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582737Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7582831Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7582995Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7583073Z del arg0_1 2023-01-11T21:38:06.7583148Z del arg1_1 2023-01-11T21:38:06.7583239Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7583245Z 2023-01-11T21:38:06.7583249Z 2023-01-11T21:38:06.7583330Z if __name__ == "__main__": 2023-01-11T21:38:06.7583451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7583583Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7583772Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.7583968Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7584089Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7584094Z 2023-01-11T21:38:06.7584099Z 2023-01-11T21:38:06.7584199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7584278Z import torch 2023-01-11T21:38:06.7584358Z import random 2023-01-11T21:38:06.7584478Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7584605Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7584610Z 2023-01-11T21:38:06.7584693Z aten = torch.ops.aten 2023-01-11T21:38:06.7584826Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7584925Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7584930Z 2023-01-11T21:38:06.7585005Z import triton 2023-01-11T21:38:06.7585099Z import triton.language as tl 2023-01-11T21:38:06.7585228Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7585385Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7585392Z 2023-01-11T21:38:06.7585397Z 2023-01-11T21:38:06.7585592Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7585680Z import triton 2023-01-11T21:38:06.7585767Z import triton.language as tl 2023-01-11T21:38:06.7585923Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7586032Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7586167Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7586294Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7586299Z 2023-01-11T21:38:06.7586747Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7586824Z @triton.jit 2023-01-11T21:38:06.7586990Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 
2023-01-11T21:38:06.7587060Z xnumel = 6 2023-01-11T21:38:06.7587165Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7587297Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7587385Z xmask = xindex < xnumel 2023-01-11T21:38:06.7587459Z x0 = xindex 2023-01-11T21:38:06.7587651Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7587731Z tmp1 = in_ptr1 2023-01-11T21:38:06.7587823Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7587913Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7587997Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7588077Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7588158Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7588293Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7588433Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7588585Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7588674Z ''') 2023-01-11T21:38:06.7588680Z 2023-01-11T21:38:06.7588684Z 2023-01-11T21:38:06.7588780Z async_compile.wait(globals()) 2023-01-11T21:38:06.7588860Z del async_compile 2023-01-11T21:38:06.7588868Z 2023-01-11T21:38:06.7588947Z def call(args): 2023-01-11T21:38:06.7589029Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7589109Z args.clear() 2023-01-11T21:38:06.7589205Z with torch.cuda.device(0): 2023-01-11T21:38:06.7589402Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589604Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589802Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7589901Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7590068Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7590148Z del arg0_1 2023-01-11T21:38:06.7590223Z del arg1_1 2023-01-11T21:38:06.7590308Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7590321Z 2023-01-11T21:38:06.7590326Z 2023-01-11T21:38:06.7590403Z if __name__ == "__main__": 2023-01-11T21:38:06.7590525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7590655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7590857Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7591043Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.7591163Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7591169Z 2023-01-11T21:38:06.7591419Z [2023-01-11 21:36:06,528] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7591685Z [2023-01-11 21:36:06,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 922 2023-01-11T21:38:06.7591932Z [2023-01-11 21:36:06,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 923 2023-01-11T21:38:06.7592212Z [2023-01-11 21:36:06,557] torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7592474Z [2023-01-11 21:36:06,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 923 2023-01-11T21:38:06.7592726Z [2023-01-11 21:36:06,574] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 924 2023-01-11T21:38:06.7592970Z [2023-01-11 21:36:06,587] 
torch._inductor.compile_fx: [WARNING] skipping cudagraphs due to multiple devices 2023-01-11T21:38:06.7593228Z [2023-01-11 21:36:06,587] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 924 2023-01-11T21:38:06.7593234Z 2023-01-11T21:38:06.7593338Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7593413Z import torch 2023-01-11T21:38:06.7593481Z import random 2023-01-11T21:38:06.7593599Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7593722Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7593727Z 2023-01-11T21:38:06.7593808Z aten = torch.ops.aten 2023-01-11T21:38:06.7593946Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7594043Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7594048Z 2023-01-11T21:38:06.7594122Z import triton 2023-01-11T21:38:06.7594216Z import triton.language as tl 2023-01-11T21:38:06.7594333Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7594471Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7594477Z 2023-01-11T21:38:06.7594481Z 2023-01-11T21:38:06.7594650Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7594724Z import triton 2023-01-11T21:38:06.7594885Z import triton.language as tl 2023-01-11T21:38:06.7595000Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7595106Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7595260Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7595402Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7595410Z 2023-01-11T21:38:06.7595857Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7595932Z @triton.jit 2023-01-11T21:38:06.7596091Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7596163Z xnumel = 6 2023-01-11T21:38:06.7596259Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7596390Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7596476Z xmask = xindex < xnumel 2023-01-11T21:38:06.7596540Z x0 = xindex 2023-01-11T21:38:06.7596615Z tmp0 = in_ptr0 2023-01-11T21:38:06.7596807Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7596904Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7596995Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7597073Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7597151Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7597222Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7597357Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7597489Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7597620Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7597708Z ''') 2023-01-11T21:38:06.7597714Z 2023-01-11T21:38:06.7597718Z 2023-01-11T21:38:06.7597810Z async_compile.wait(globals()) 2023-01-11T21:38:06.7597895Z del async_compile 2023-01-11T21:38:06.7597900Z 2023-01-11T21:38:06.7597974Z def call(args): 2023-01-11T21:38:06.7598046Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7598122Z args.clear() 2023-01-11T21:38:06.7598213Z with 
torch.cuda.device(0): 2023-01-11T21:38:06.7598438Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598637Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598834Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7598926Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7599085Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7599158Z del arg0_1 2023-01-11T21:38:06.7599230Z del arg1_1 2023-01-11T21:38:06.7599320Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7599328Z 2023-01-11T21:38:06.7599332Z 2023-01-11T21:38:06.7599413Z if __name__ == "__main__": 2023-01-11T21:38:06.7599530Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7599655Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7599839Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.7600032Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7600151Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7600156Z 2023-01-11T21:38:06.7600160Z 2023-01-11T21:38:06.7600256Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7600332Z import torch 2023-01-11T21:38:06.7600405Z import random 2023-01-11T21:38:06.7600523Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7600645Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7600650Z 2023-01-11T21:38:06.7600733Z aten = torch.ops.aten 2023-01-11T21:38:06.7600891Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7600986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7600991Z 2023-01-11T21:38:06.7601064Z import triton 2023-01-11T21:38:06.7601155Z import triton.language as tl 2023-01-11T21:38:06.7601280Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7601422Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7601427Z 2023-01-11T21:38:06.7601432Z 2023-01-11T21:38:06.7601598Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7601672Z import triton 2023-01-11T21:38:06.7601757Z import triton.language as tl 2023-01-11T21:38:06.7601872Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7601975Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7602107Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7602231Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7602236Z 2023-01-11T21:38:06.7602680Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i64', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7602754Z @triton.jit 2023-01-11T21:38:06.7602914Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7602980Z xnumel = 6 2023-01-11T21:38:06.7603077Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7603204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7603289Z xmask = xindex < xnumel 2023-01-11T21:38:06.7603360Z x0 = xindex 2023-01-11T21:38:06.7603550Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 
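    # Editor's note (comment added, not in the generated source): the same
    # address is loaded twice -- here with eviction_policy='evict_last' and
    # again below as a plain tl.load -- presumably because Inductor emits one
    # load site per use cluster and leaves de-duplication to the Triton
    # compiler; the arithmetic is unaffected either way.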
2023-01-11T21:38:06.7603625Z tmp1 = in_ptr1 2023-01-11T21:38:06.7603715Z tmp5 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7603801Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.7603882Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.7603960Z tmp4 = tmp0 * tmp2 2023-01-11T21:38:06.7604037Z tmp6 = tmp5 / tmp2 2023-01-11T21:38:06.7604170Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7604335Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7604460Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7604545Z ''') 2023-01-11T21:38:06.7604551Z 2023-01-11T21:38:06.7604555Z 2023-01-11T21:38:06.7604647Z async_compile.wait(globals()) 2023-01-11T21:38:06.7604725Z del async_compile 2023-01-11T21:38:06.7604730Z 2023-01-11T21:38:06.7604806Z def call(args): 2023-01-11T21:38:06.7604886Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7604961Z args.clear() 2023-01-11T21:38:06.7605047Z with torch.cuda.device(0): 2023-01-11T21:38:06.7605243Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605443Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605640Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7605732Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7605904Z triton_fused_add_div_mul_0.run(arg0_1, arg1_1.item(), buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7605978Z del arg0_1 2023-01-11T21:38:06.7606051Z del arg1_1 2023-01-11T21:38:06.7606133Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7606138Z 2023-01-11T21:38:06.7606142Z 2023-01-11T21:38:06.7606224Z if __name__ == "__main__": 2023-01-11T21:38:06.7606341Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7606468Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7606665Z arg0_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7606876Z arg1_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7607000Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7607005Z 2023-01-11T21:38:06.7607010Z 2023-01-11T21:38:06.7607109Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7607179Z import torch 2023-01-11T21:38:06.7607255Z import random 2023-01-11T21:38:06.7607379Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7607504Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7607509Z 2023-01-11T21:38:06.7607593Z aten = torch.ops.aten 2023-01-11T21:38:06.7607730Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7607827Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7607832Z 2023-01-11T21:38:06.7607907Z import triton 2023-01-11T21:38:06.7607996Z import triton.language as tl 2023-01-11T21:38:06.7608123Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7608266Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7608275Z 2023-01-11T21:38:06.7608279Z 2023-01-11T21:38:06.7608448Z triton_fused_add_div_mul_0 = async_compile.triton(''' 2023-01-11T21:38:06.7608525Z import triton 2023-01-11T21:38:06.7608620Z import triton.language as tl 2023-01-11T21:38:06.7608735Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7608836Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7608972Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7609099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7609104Z 2023-01-11T21:38:06.7609547Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: 'i64', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7609622Z @triton.jit 2023-01-11T21:38:06.7609782Z def triton_(in_ptr0, in_ptr1, out_ptr0, out_ptr1, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7609859Z xnumel = 6 2023-01-11T21:38:06.7609958Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7610089Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7610168Z xmask = xindex < xnumel 2023-01-11T21:38:06.7610266Z x0 = xindex 2023-01-11T21:38:06.7610345Z tmp0 = in_ptr0 2023-01-11T21:38:06.7610537Z tmp2 = tl.load(in_ptr1 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7610634Z tmp5 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.7610722Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7610800Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.7610871Z tmp4 = tmp1 * tmp2 2023-01-11T21:38:06.7610947Z tmp6 = tmp1 / tmp5 2023-01-11T21:38:06.7611081Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.7611213Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7611347Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7611432Z ''') 2023-01-11T21:38:06.7611438Z 2023-01-11T21:38:06.7611442Z 2023-01-11T21:38:06.7611536Z async_compile.wait(globals()) 2023-01-11T21:38:06.7611606Z del async_compile 2023-01-11T21:38:06.7611611Z 2023-01-11T21:38:06.7611689Z def call(args): 2023-01-11T21:38:06.7611766Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.7611840Z args.clear() 2023-01-11T21:38:06.7611933Z with torch.cuda.device(0): 2023-01-11T21:38:06.7612133Z buf0 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612331Z buf1 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612525Z buf2 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7612611Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7612776Z triton_fused_add_div_mul_0.run(arg0_1.item(), arg1_1, buf0, buf1, buf2, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.7612896Z del arg0_1 2023-01-11T21:38:06.7612969Z del arg1_1 2023-01-11T21:38:06.7613058Z return (buf0, buf1, buf2, ) 2023-01-11T21:38:06.7613064Z 2023-01-11T21:38:06.7613068Z 2023-01-11T21:38:06.7613148Z if __name__ == "__main__": 2023-01-11T21:38:06.7613270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7613389Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7613574Z arg0_1 = rand_strided((), (), device='cpu', dtype=torch.int64) 2023-01-11T21:38:06.7613770Z arg1_1 = rand_strided((2, 3), (3, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7613888Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.7613894Z 2023-01-11T21:38:06.7613964Z ok (0.349s) 2023-01-11T21:38:06.7614426Z test_unsqueeze_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7614762Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7615033Z [2023-01-11 21:36:06,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 925 2023-01-11T21:38:06.7615324Z [2023-01-11 21:36:06,799] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 925 2023-01-11T21:38:06.7615768Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7615903Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7616151Z [2023-01-11 21:36:06,831] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 926 2023-01-11T21:38:06.7616465Z [2023-01-11 21:36:06,920] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 926 2023-01-11T21:38:06.7616471Z 2023-01-11T21:38:06.7616571Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7616645Z import torch 2023-01-11T21:38:06.7616719Z import random 2023-01-11T21:38:06.7616840Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7616963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7616968Z 2023-01-11T21:38:06.7617050Z aten = torch.ops.aten 2023-01-11T21:38:06.7617231Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7617344Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7617350Z 2023-01-11T21:38:06.7617436Z import triton 2023-01-11T21:38:06.7617552Z import triton.language as tl 2023-01-11T21:38:06.7617679Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7617819Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7617825Z 2023-01-11T21:38:06.7617829Z 2023-01-11T21:38:06.7618024Z triton_fused_add_1_add_2_add_4_add_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.7618099Z import triton 2023-01-11T21:38:06.7618184Z import triton.language as tl 2023-01-11T21:38:06.7618299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7618400Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7618532Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7618657Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7618663Z 2023-01-11T21:38:06.7619115Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: '*fp32', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.7619234Z @triton.jit 2023-01-11T21:38:06.7619412Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7619483Z xnumel = 16 2023-01-11T21:38:06.7619591Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7619733Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
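    # Editor's note: standard Inductor pointwise prologue -- each program
    # instance covers XBLOCK consecutive flat indices starting at
    # program_id(0) * XBLOCK; the xmask computed on the next line guards the
    # tail when xnumel (16 here) is not a multiple of XBLOCK.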
2023-01-11T21:38:06.7619821Z xmask = xindex < xnumel 2023-01-11T21:38:06.7619894Z x0 = xindex 2023-01-11T21:38:06.7620114Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7620218Z tmp6 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7620285Z tmp1 = 1 2023-01-11T21:38:06.7620369Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7620444Z tmp3 = 2 2023-01-11T21:38:06.7620526Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.7620607Z tmp5 = tmp0 + tmp3 2023-01-11T21:38:06.7620693Z tmp7 = tmp6 + tmp3 2023-01-11T21:38:06.7620841Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7620981Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7621123Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7621270Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7621361Z ''') 2023-01-11T21:38:06.7621366Z 2023-01-11T21:38:06.7621371Z 2023-01-11T21:38:06.7621473Z async_compile.wait(globals()) 2023-01-11T21:38:06.7621553Z del async_compile 2023-01-11T21:38:06.7621558Z 2023-01-11T21:38:06.7621638Z def call(args): 2023-01-11T21:38:06.7621715Z arg0_1, = args 2023-01-11T21:38:06.7621787Z args.clear() 2023-01-11T21:38:06.7621885Z with torch.cuda.device(0): 2023-01-11T21:38:06.7622139Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622357Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622577Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622817Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7622911Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7623071Z triton_fused_add_1_add_2_add_4_add_5_0.run(arg0_1, buf0, buf1, buf2, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7623146Z del arg0_1 2023-01-11T21:38:06.7623240Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.7623246Z 2023-01-11T21:38:06.7623250Z 2023-01-11T21:38:06.7623330Z if __name__ == "__main__": 2023-01-11T21:38:06.7623449Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7623575Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7623786Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7623905Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7623910Z 2023-01-11T21:38:06.7623914Z 2023-01-11T21:38:06.7624010Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7624077Z import torch 2023-01-11T21:38:06.7624156Z import random 2023-01-11T21:38:06.7624274Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7624400Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7624405Z 2023-01-11T21:38:06.7624488Z aten = torch.ops.aten 2023-01-11T21:38:06.7624623Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7624724Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7624729Z 2023-01-11T21:38:06.7624795Z import triton 2023-01-11T21:38:06.7624888Z import triton.language as tl 2023-01-11T21:38:06.7625012Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7625151Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7625183Z 2023-01-11T21:38:06.7625188Z 
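# Editor's note: the fp32 kernel above and the fp16 clone that follows fuse four
# unsqueeze-plus-add results into one pointwise pass; the unsqueeze itself costs
# nothing at runtime and only determines the 5-D shapes/strides that call()
# allocates with empty_strided. A hedged eager reference, reconstructed from the
# buffer shapes and the tmp4/tmp5/tmp7 dataflow (not the test source):
def unsqueeze_ref(x):
    # x: (2, 2, 2, 2) CUDA tensor
    buf0 = (x + 1 + 2).unsqueeze(4)  # tmp4 -> out_ptr0, (2, 2, 2, 2, 1)
    buf1 = (x + 2).unsqueeze(2)      # tmp5 -> out_ptr1, (2, 2, 1, 2, 2)
    buf2 = (x + 1 + 2).unsqueeze(0)  # tmp4 -> out_ptr2, (1, 2, 2, 2, 2)
    buf3 = (x + 2).unsqueeze(3)      # tmp7 -> out_ptr3, (2, 2, 2, 1, 2)
    return buf0, buf1, buf2, buf3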
2023-01-11T21:38:06.7625380Z triton_fused_add_1_add_2_add_4_add_5_0 = async_compile.triton(''' 2023-01-11T21:38:06.7625457Z import triton 2023-01-11T21:38:06.7625557Z import triton.language as tl 2023-01-11T21:38:06.7625694Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7625812Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7625951Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7626079Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7626084Z 2023-01-11T21:38:06.7626537Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: '*fp16', 5: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4, 5), equal_to_1=())]}) 2023-01-11T21:38:06.7626614Z @triton.jit 2023-01-11T21:38:06.7626783Z def triton_(in_ptr0, out_ptr0, out_ptr1, out_ptr2, out_ptr3, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7626859Z xnumel = 16 2023-01-11T21:38:06.7626958Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7627083Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7627169Z xmask = xindex < xnumel 2023-01-11T21:38:06.7627244Z x0 = xindex 2023-01-11T21:38:06.7627461Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7627585Z tmp6 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7627659Z tmp1 = 1 2023-01-11T21:38:06.7627740Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7627806Z tmp3 = 2 2023-01-11T21:38:06.7627888Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.7627968Z tmp5 = tmp0 + tmp3 2023-01-11T21:38:06.7628050Z tmp7 = tmp6 + tmp3 2023-01-11T21:38:06.7628188Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7628325Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7628465Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7628590Z tl.store(out_ptr3 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7628679Z ''') 2023-01-11T21:38:06.7628684Z 2023-01-11T21:38:06.7628716Z 2023-01-11T21:38:06.7628813Z async_compile.wait(globals()) 2023-01-11T21:38:06.7628891Z del async_compile 2023-01-11T21:38:06.7628896Z 2023-01-11T21:38:06.7628969Z def call(args): 2023-01-11T21:38:06.7629042Z arg0_1, = args 2023-01-11T21:38:06.7629117Z args.clear() 2023-01-11T21:38:06.7629211Z with torch.cuda.device(0): 2023-01-11T21:38:06.7629424Z buf0 = empty_strided((2, 2, 2, 2, 1), (8, 4, 2, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7629636Z buf1 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7629853Z buf2 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7630067Z buf3 = empty_strided((2, 2, 2, 1, 2), (8, 4, 2, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7630159Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7630328Z triton_fused_add_1_add_2_add_4_add_5_0.run(arg0_1, buf0, buf1, buf2, buf3, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7630406Z del arg0_1 2023-01-11T21:38:06.7630502Z return (buf0, buf1, buf2, buf3, ) 2023-01-11T21:38:06.7630507Z 2023-01-11T21:38:06.7630511Z 2023-01-11T21:38:06.7630584Z if __name__ == "__main__": 2023-01-11T21:38:06.7630703Z from torch._dynamo.testing import rand_strided 
2023-01-11T21:38:06.7630829Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7631042Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7631157Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7631187Z 2023-01-11T21:38:06.7631260Z ok (0.335s) 2023-01-11T21:38:06.7631729Z test_unsqueeze_inplace_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7631861Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7632121Z [2023-01-11 21:36:06,952] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 927 2023-01-11T21:38:06.7632385Z [2023-01-11 21:36:07,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 927 2023-01-11T21:38:06.7632791Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7632927Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7633185Z [2023-01-11 21:36:07,055] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 928 2023-01-11T21:38:06.7633448Z [2023-01-11 21:36:07,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 928 2023-01-11T21:38:06.7633454Z 2023-01-11T21:38:06.7633551Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7633625Z import torch 2023-01-11T21:38:06.7633702Z import random 2023-01-11T21:38:06.7633821Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7633947Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7633952Z 2023-01-11T21:38:06.7634032Z aten = torch.ops.aten 2023-01-11T21:38:06.7634169Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7634267Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7634272Z 2023-01-11T21:38:06.7634346Z import triton 2023-01-11T21:38:06.7634439Z import triton.language as tl 2023-01-11T21:38:06.7634595Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7634736Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7634741Z 2023-01-11T21:38:06.7634746Z 2023-01-11T21:38:06.7634924Z triton_fused_add_2_unsqueeze__0 = async_compile.triton(''' 2023-01-11T21:38:06.7634992Z import triton 2023-01-11T21:38:06.7635085Z import triton.language as tl 2023-01-11T21:38:06.7635198Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7635300Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7635432Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7635558Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7635565Z 2023-01-11T21:38:06.7635988Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7636065Z @triton.jit 2023-01-11T21:38:06.7636201Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7636276Z xnumel = 16 2023-01-11T21:38:06.7636376Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7636507Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7636591Z xmask = xindex < xnumel 2023-01-11T21:38:06.7636662Z x0 = xindex 2023-01-11T21:38:06.7636853Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7636943Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.7637014Z tmp1 = 1 2023-01-11T21:38:06.7637131Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7637212Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.7637282Z tmp5 = 2 2023-01-11T21:38:06.7637359Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7637492Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7637619Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7637704Z ''') 2023-01-11T21:38:06.7637710Z 2023-01-11T21:38:06.7637715Z 2023-01-11T21:38:06.7637809Z async_compile.wait(globals()) 2023-01-11T21:38:06.7637886Z del async_compile 2023-01-11T21:38:06.7637891Z 2023-01-11T21:38:06.7637965Z def call(args): 2023-01-11T21:38:06.7638037Z arg0_1, = args 2023-01-11T21:38:06.7638111Z args.clear() 2023-01-11T21:38:06.7638196Z with torch.cuda.device(0): 2023-01-11T21:38:06.7638417Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7638637Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7638731Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7638883Z triton_fused_add_2_unsqueeze__0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7638956Z del arg0_1 2023-01-11T21:38:06.7639038Z return (buf0, buf1, ) 2023-01-11T21:38:06.7639046Z 2023-01-11T21:38:06.7639051Z 2023-01-11T21:38:06.7639129Z if __name__ == "__main__": 2023-01-11T21:38:06.7639239Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7639368Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7639580Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7639692Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7639697Z 2023-01-11T21:38:06.7639702Z 2023-01-11T21:38:06.7639798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7639871Z import torch 2023-01-11T21:38:06.7639947Z import random 2023-01-11T21:38:06.7640066Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7640183Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7640188Z 2023-01-11T21:38:06.7640270Z aten = torch.ops.aten 2023-01-11T21:38:06.7640406Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7640531Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7640537Z 2023-01-11T21:38:06.7640614Z import triton 2023-01-11T21:38:06.7640706Z import triton.language as tl 2023-01-11T21:38:06.7640831Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7640972Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7640978Z 2023-01-11T21:38:06.7640982Z 2023-01-11T21:38:06.7641153Z triton_fused_add_2_unsqueeze__0 = 
async_compile.triton(''' 2023-01-11T21:38:06.7641228Z import triton 2023-01-11T21:38:06.7641320Z import triton.language as tl 2023-01-11T21:38:06.7641434Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7641538Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7641670Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7641795Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7641800Z 2023-01-11T21:38:06.7642222Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7642289Z @triton.jit 2023-01-11T21:38:06.7642432Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7642506Z xnumel = 16 2023-01-11T21:38:06.7642603Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7642732Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7642815Z xmask = xindex < xnumel 2023-01-11T21:38:06.7642884Z x0 = xindex 2023-01-11T21:38:06.7643118Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7643237Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.7643310Z tmp1 = 1 2023-01-11T21:38:06.7643389Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7643467Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.7643539Z tmp5 = 2 2023-01-11T21:38:06.7643615Z tmp6 = tmp4 + tmp5 2023-01-11T21:38:06.7643743Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.7643876Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp6, xmask) 2023-01-11T21:38:06.7643961Z ''') 2023-01-11T21:38:06.7643967Z 2023-01-11T21:38:06.7643972Z 2023-01-11T21:38:06.7644066Z async_compile.wait(globals()) 2023-01-11T21:38:06.7644142Z del async_compile 2023-01-11T21:38:06.7644147Z 2023-01-11T21:38:06.7644221Z def call(args): 2023-01-11T21:38:06.7644297Z arg0_1, = args 2023-01-11T21:38:06.7644371Z args.clear() 2023-01-11T21:38:06.7644459Z with torch.cuda.device(0): 2023-01-11T21:38:06.7644681Z buf0 = empty_strided((2, 2, 1, 2, 2), (8, 4, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7644896Z buf1 = empty_strided((1, 2, 2, 2, 2), (16, 8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7644990Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7645143Z triton_fused_add_2_unsqueeze__0.run(arg0_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.7645215Z del arg0_1 2023-01-11T21:38:06.7645296Z return (buf0, buf1, ) 2023-01-11T21:38:06.7645301Z 2023-01-11T21:38:06.7645306Z 2023-01-11T21:38:06.7645384Z if __name__ == "__main__": 2023-01-11T21:38:06.7645495Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7645621Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7645833Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7645949Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7645954Z 2023-01-11T21:38:06.7646025Z ok (0.206s) 2023-01-11T21:38:06.7646518Z test_upsample_bicubic2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7646650Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7646907Z [2023-01-11 21:36:08,705] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 929 2023-01-11T21:38:06.7646913Z 2023-01-11T21:38:06.7648395Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7648462Z import torch 2023-01-11T21:38:06.7648539Z import random 2023-01-11T21:38:06.7648660Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7648784Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7648789Z 2023-01-11T21:38:06.7648872Z aten = torch.ops.aten 2023-01-11T21:38:06.7649008Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7649102Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7649110Z 2023-01-11T21:38:06.7649185Z import triton 2023-01-11T21:38:06.7649270Z import triton.language as tl 2023-01-11T21:38:06.7649394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7649532Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7649538Z 2023-01-11T21:38:06.7649542Z 2023-01-11T21:38:06.7649728Z triton_fused_upsample_bicubic2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.7649804Z import triton 2023-01-11T21:38:06.7649896Z import triton.language as tl 2023-01-11T21:38:06.7650009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7650130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7650263Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7650390Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7650395Z 2023-01-11T21:38:06.7650810Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7650884Z @triton.jit 2023-01-11T21:38:06.7651017Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7651092Z xnumel = 196608 2023-01-11T21:38:06.7651192Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7651313Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7651396Z xmask = xindex < xnumel 2023-01-11T21:38:06.7651472Z x0 = xindex % 128 2023-01-11T21:38:06.7651556Z x1 = (xindex // 128) % 128 2023-01-11T21:38:06.7651641Z x2 = (xindex // 16384) 2023-01-11T21:38:06.7651711Z x4 = xindex 2023-01-11T21:38:06.7651784Z tmp0 = x0 2023-01-11T21:38:06.7651863Z tmp1 = 0.2440944881889764 * tmp0 2023-01-11T21:38:06.7651960Z tmp2 = tl.libdevice.floor(tmp1) 2023-01-11T21:38:06.7652071Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.7652145Z tmp4 = x1 2023-01-11T21:38:06.7652230Z tmp5 = 0.49606299212598426 * tmp4 2023-01-11T21:38:06.7652330Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.7652438Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.7652517Z tmp8 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7652601Z tmp9 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7652695Z tmp10 = tmp8 + -1 2023-01-11T21:38:06.7652771Z tmp11 = tmp8 + 0 2023-01-11T21:38:06.7652845Z tmp12 = tmp8 + 1 2023-01-11T21:38:06.7652920Z tmp13 = tmp8 + 2 2023-01-11T21:38:06.7653005Z tmp14 = tmp9 + -1 2023-01-11T21:38:06.7653086Z tmp15 = tmp9 + 0 
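    # Editor's note: bicubic sampling reads a 4x4 neighborhood. tmp10-tmp13
    # and tmp14-tmp17 (continued below) are the four taps at offsets -1, 0,
    # +1, +2 around the integer source row (tmp8) and column (tmp9); the
    # tl.where chains that follow clamp them to the valid input extent
    # (rows 0..63, columns 0..31 for the (4, 3, 64, 32) input).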
2023-01-11T21:38:06.7653158Z tmp16 = tmp9 + 1 2023-01-11T21:38:06.7653236Z tmp17 = tmp9 + 2 2023-01-11T21:38:06.7653361Z tmp18 = tl.where(63 != 63, 63, tl.where(63 < tmp10, 63, tmp10)) 2023-01-11T21:38:06.7653479Z tmp19 = tl.where(0 != 0, 0, tl.where(0 > tmp18, 0, tmp18)) 2023-01-11T21:38:06.7653599Z tmp20 = tl.where(31 != 31, 31, tl.where(31 < tmp14, 31, tmp14)) 2023-01-11T21:38:06.7653738Z tmp21 = tl.where(0 != 0, 0, tl.where(0 > tmp20, 0, tmp20)) 2023-01-11T21:38:06.7653971Z tmp22 = tl.load(in_ptr0 + (tmp21 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7654091Z tmp23 = tl.where(31 != 31, 31, tl.where(31 < tmp15, 31, tmp15)) 2023-01-11T21:38:06.7654205Z tmp24 = tl.where(0 != 0, 0, tl.where(0 > tmp23, 0, tmp23)) 2023-01-11T21:38:06.7654433Z tmp25 = tl.load(in_ptr0 + (tmp24 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7654753Z tmp26 = tl.where(31 != 31, 31, tl.where(31 < tmp16, 31, tmp16)) 2023-01-11T21:38:06.7654872Z tmp27 = tl.where(0 != 0, 0, tl.where(0 > tmp26, 0, tmp26)) 2023-01-11T21:38:06.7655106Z tmp28 = tl.load(in_ptr0 + (tmp27 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7655225Z tmp29 = tl.where(31 != 31, 31, tl.where(31 < tmp17, 31, tmp17)) 2023-01-11T21:38:06.7655331Z tmp30 = tl.where(0 != 0, 0, tl.where(0 > tmp29, 0, tmp29)) 2023-01-11T21:38:06.7655606Z tmp31 = tl.load(in_ptr0 + (tmp30 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7655692Z tmp32 = tmp3 + 1.0 2023-01-11T21:38:06.7655806Z tmp33 = -0.75 * tmp32 2023-01-11T21:38:06.7655921Z tmp34 = tmp33 - -3.75 2023-01-11T21:38:06.7656001Z tmp35 = tmp34 * tmp32 2023-01-11T21:38:06.7656112Z tmp36 = tmp35 + -6.0 2023-01-11T21:38:06.7656185Z tmp37 = tmp36 * tmp32 2023-01-11T21:38:06.7656295Z tmp38 = tmp37 - -3.0 2023-01-11T21:38:06.7656373Z tmp39 = 1.25 * tmp3 2023-01-11T21:38:06.7656482Z tmp40 = tmp39 - 2.25 2023-01-11T21:38:06.7656560Z tmp41 = tmp40 * tmp3 2023-01-11T21:38:06.7656690Z tmp42 = tmp41 * tmp3 2023-01-11T21:38:06.7656764Z tmp43 = tmp42 + 1.0 2023-01-11T21:38:06.7656874Z tmp44 = 1.0 - tmp3 2023-01-11T21:38:06.7656953Z tmp45 = 1.25 * tmp44 2023-01-11T21:38:06.7657063Z tmp46 = tmp45 - 2.25 2023-01-11T21:38:06.7657195Z tmp47 = tmp46 * tmp44 2023-01-11T21:38:06.7657289Z tmp48 = tmp47 * tmp44 2023-01-11T21:38:06.7657371Z tmp49 = tmp48 + 1.0 2023-01-11T21:38:06.7657445Z tmp50 = tmp44 + 1.0 2023-01-11T21:38:06.7657557Z tmp51 = -0.75 * tmp50 2023-01-11T21:38:06.7657671Z tmp52 = tmp51 - -3.75 2023-01-11T21:38:06.7657752Z tmp53 = tmp52 * tmp50 2023-01-11T21:38:06.7657867Z tmp54 = tmp53 + -6.0 2023-01-11T21:38:06.7657949Z tmp55 = tmp54 * tmp50 2023-01-11T21:38:06.7658063Z tmp56 = tmp55 - -3.0 2023-01-11T21:38:06.7658139Z tmp57 = tmp22 * tmp38 2023-01-11T21:38:06.7658220Z tmp58 = tmp25 * tmp43 2023-01-11T21:38:06.7658301Z tmp59 = tmp28 * tmp49 2023-01-11T21:38:06.7658382Z tmp60 = tmp31 * tmp56 2023-01-11T21:38:06.7658466Z tmp61 = tmp59 + tmp60 2023-01-11T21:38:06.7658547Z tmp62 = tmp58 + tmp61 2023-01-11T21:38:06.7658620Z tmp63 = tmp57 + tmp62 2023-01-11T21:38:06.7658740Z tmp64 = tl.where(63 != 63, 63, tl.where(63 < tmp11, 63, tmp11)) 2023-01-11T21:38:06.7658862Z tmp65 = tl.where(0 != 0, 0, tl.where(0 > tmp64, 0, tmp64)) 2023-01-11T21:38:06.7659092Z tmp66 = tl.load(in_ptr0 + (tmp21 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659316Z tmp67 = tl.load(in_ptr0 + (tmp24 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 
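    # Editor's note: tmp32-tmp56 above are the Keys cubic-convolution weights
    # with a = -0.75, evaluated in Horner form:
    #   outer taps, 1 < t <= 2:   w(t) = ((-0.75*t + 3.75)*t - 6.0)*t + 3.0
    #   inner taps, 0 <= t <= 1:  w(t) = ((1.25*t - 2.25)*t)*t + 1.0
    # tmp57-tmp63 blend the four column taps of the first row; tmp64 onward
    # repeat the same pattern for the remaining three rows.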
2023-01-11T21:38:06.7659537Z tmp68 = tl.load(in_ptr0 + (tmp27 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659758Z tmp69 = tl.load(in_ptr0 + (tmp30 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7659842Z tmp70 = tmp66 * tmp38 2023-01-11T21:38:06.7659917Z tmp71 = tmp67 * tmp43 2023-01-11T21:38:06.7660001Z tmp72 = tmp68 * tmp49 2023-01-11T21:38:06.7660084Z tmp73 = tmp69 * tmp56 2023-01-11T21:38:06.7660165Z tmp74 = tmp72 + tmp73 2023-01-11T21:38:06.7660245Z tmp75 = tmp71 + tmp74 2023-01-11T21:38:06.7660325Z tmp76 = tmp70 + tmp75 2023-01-11T21:38:06.7660444Z tmp77 = tl.where(63 != 63, 63, tl.where(63 < tmp12, 63, tmp12)) 2023-01-11T21:38:06.7660597Z tmp78 = tl.where(0 != 0, 0, tl.where(0 > tmp77, 0, tmp77)) 2023-01-11T21:38:06.7660827Z tmp79 = tl.load(in_ptr0 + (tmp21 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661051Z tmp80 = tl.load(in_ptr0 + (tmp24 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661272Z tmp81 = tl.load(in_ptr0 + (tmp27 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661492Z tmp82 = tl.load(in_ptr0 + (tmp30 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7661573Z tmp83 = tmp79 * tmp38 2023-01-11T21:38:06.7661657Z tmp84 = tmp80 * tmp43 2023-01-11T21:38:06.7661737Z tmp85 = tmp81 * tmp49 2023-01-11T21:38:06.7661810Z tmp86 = tmp82 * tmp56 2023-01-11T21:38:06.7661889Z tmp87 = tmp85 + tmp86 2023-01-11T21:38:06.7661967Z tmp88 = tmp84 + tmp87 2023-01-11T21:38:06.7662046Z tmp89 = tmp83 + tmp88 2023-01-11T21:38:06.7662168Z tmp90 = tl.where(63 != 63, 63, tl.where(63 < tmp13, 63, tmp13)) 2023-01-11T21:38:06.7662286Z tmp91 = tl.where(0 != 0, 0, tl.where(0 > tmp90, 0, tmp90)) 2023-01-11T21:38:06.7662511Z tmp92 = tl.load(in_ptr0 + (tmp21 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7662722Z tmp93 = tl.load(in_ptr0 + (tmp24 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7662940Z tmp94 = tl.load(in_ptr0 + (tmp27 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7663158Z tmp95 = tl.load(in_ptr0 + (tmp30 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7663272Z tmp96 = tmp92 * tmp38 2023-01-11T21:38:06.7663355Z tmp97 = tmp93 * tmp43 2023-01-11T21:38:06.7663437Z tmp98 = tmp94 * tmp49 2023-01-11T21:38:06.7663518Z tmp99 = tmp95 * tmp56 2023-01-11T21:38:06.7663601Z tmp100 = tmp98 + tmp99 2023-01-11T21:38:06.7663676Z tmp101 = tmp97 + tmp100 2023-01-11T21:38:06.7663759Z tmp102 = tmp96 + tmp101 2023-01-11T21:38:06.7663839Z tmp103 = tmp7 + 1.0 2023-01-11T21:38:06.7663956Z tmp104 = -0.75 * tmp103 2023-01-11T21:38:06.7664072Z tmp105 = tmp104 - -3.75 2023-01-11T21:38:06.7664154Z tmp106 = tmp105 * tmp103 2023-01-11T21:38:06.7664262Z tmp107 = tmp106 + -6.0 2023-01-11T21:38:06.7664346Z tmp108 = tmp107 * tmp103 2023-01-11T21:38:06.7664462Z tmp109 = tmp108 - -3.0 2023-01-11T21:38:06.7664540Z tmp110 = 1.25 * tmp7 2023-01-11T21:38:06.7664654Z tmp111 = tmp110 - 2.25 2023-01-11T21:38:06.7664737Z tmp112 = tmp111 * tmp7 2023-01-11T21:38:06.7664820Z tmp113 = tmp112 * tmp7 2023-01-11T21:38:06.7664899Z tmp114 = tmp113 + 1.0 2023-01-11T21:38:06.7665010Z tmp115 = 1.0 - tmp7 2023-01-11T21:38:06.7665089Z tmp116 = 1.25 * tmp115 2023-01-11T21:38:06.7665202Z tmp117 = tmp116 - 2.25 2023-01-11T21:38:06.7665286Z tmp118 = tmp117 * tmp115 
2023-01-11T21:38:06.7665367Z tmp119 = tmp118 * tmp115 2023-01-11T21:38:06.7665450Z tmp120 = tmp119 + 1.0 2023-01-11T21:38:06.7665523Z tmp121 = tmp115 + 1.0 2023-01-11T21:38:06.7665638Z tmp122 = -0.75 * tmp121 2023-01-11T21:38:06.7665754Z tmp123 = tmp122 - -3.75 2023-01-11T21:38:06.7665836Z tmp124 = tmp123 * tmp121 2023-01-11T21:38:06.7665950Z tmp125 = tmp124 + -6.0 2023-01-11T21:38:06.7666032Z tmp126 = tmp125 * tmp121 2023-01-11T21:38:06.7666138Z tmp127 = tmp126 - -3.0 2023-01-11T21:38:06.7666219Z tmp128 = tmp63 * tmp109 2023-01-11T21:38:06.7666300Z tmp129 = tmp76 * tmp114 2023-01-11T21:38:06.7666381Z tmp130 = tmp89 * tmp120 2023-01-11T21:38:06.7666461Z tmp131 = tmp102 * tmp127 2023-01-11T21:38:06.7666548Z tmp132 = tmp130 + tmp131 2023-01-11T21:38:06.7666631Z tmp133 = tmp129 + tmp132 2023-01-11T21:38:06.7666705Z tmp134 = tmp128 + tmp133 2023-01-11T21:38:06.7666845Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp134, xmask) 2023-01-11T21:38:06.7666931Z ''') 2023-01-11T21:38:06.7666938Z 2023-01-11T21:38:06.7666943Z 2023-01-11T21:38:06.7667161Z triton_fused_upsample_bicubic2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7667239Z import triton 2023-01-11T21:38:06.7667332Z import triton.language as tl 2023-01-11T21:38:06.7667449Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7667552Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7667681Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7667808Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7667813Z 2023-01-11T21:38:06.7668219Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7668297Z @triton.jit 2023-01-11T21:38:06.7668431Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7668506Z xnumel = 393216 2023-01-11T21:38:06.7668606Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7668737Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7668814Z xmask = xindex < xnumel 2023-01-11T21:38:06.7668893Z x0 = xindex % 256 2023-01-11T21:38:06.7668977Z x1 = (xindex // 256) % 128 2023-01-11T21:38:06.7669057Z x2 = (xindex // 32768) 2023-01-11T21:38:06.7669128Z x4 = xindex 2023-01-11T21:38:06.7669200Z tmp0 = x0 2023-01-11T21:38:06.7669272Z tmp1 = tmp0 + 0.5 2023-01-11T21:38:06.7669351Z tmp2 = 0.125 * tmp1 2023-01-11T21:38:06.7669461Z tmp3 = tmp2 - 0.5 2023-01-11T21:38:06.7669563Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.7669704Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.7669779Z tmp6 = x1 2023-01-11T21:38:06.7669856Z tmp7 = tmp6 + 0.5 2023-01-11T21:38:06.7669926Z tmp8 = 0.5 * tmp7 2023-01-11T21:38:06.7670034Z tmp9 = tmp8 - 0.5 2023-01-11T21:38:06.7670137Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.7670252Z tmp11 = tmp9 - tmp10 2023-01-11T21:38:06.7670345Z tmp12 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7670434Z tmp13 = tmp4.to(tl.int32) 2023-01-11T21:38:06.7670529Z tmp14 = tmp12 + -1 2023-01-11T21:38:06.7670601Z tmp15 = tmp12 + 0 2023-01-11T21:38:06.7670679Z tmp16 = tmp12 + 1 2023-01-11T21:38:06.7670761Z tmp17 = tmp12 + 2 2023-01-11T21:38:06.7670857Z tmp18 = tmp13 + -1 2023-01-11T21:38:06.7670935Z tmp19 = tmp13 + 0 2023-01-11T21:38:06.7671012Z tmp20 = tmp13 + 1 2023-01-11T21:38:06.7671082Z tmp21 = tmp13 + 2 2023-01-11T21:38:06.7671204Z tmp22 = tl.where(63 != 
63, 63, tl.where(63 < tmp14, 63, tmp14)) 2023-01-11T21:38:06.7671321Z tmp23 = tl.where(0 != 0, 0, tl.where(0 > tmp22, 0, tmp22)) 2023-01-11T21:38:06.7671445Z tmp24 = tl.where(31 != 31, 31, tl.where(31 < tmp18, 31, tmp18)) 2023-01-11T21:38:06.7671561Z tmp25 = tl.where(0 != 0, 0, tl.where(0 > tmp24, 0, tmp24)) 2023-01-11T21:38:06.7671687Z tmp26 = tl.load(in_ptr0 + (tmp25 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7671812Z tmp27 = tl.where(31 != 31, 31, tl.where(31 < tmp19, 31, tmp19)) 2023-01-11T21:38:06.7671926Z tmp28 = tl.where(0 != 0, 0, tl.where(0 > tmp27, 0, tmp27)) 2023-01-11T21:38:06.7672040Z tmp29 = tl.load(in_ptr0 + (tmp28 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672157Z tmp30 = tl.where(31 != 31, 31, tl.where(31 < tmp20, 31, tmp20)) 2023-01-11T21:38:06.7672271Z tmp31 = tl.where(0 != 0, 0, tl.where(0 > tmp30, 0, tmp30)) 2023-01-11T21:38:06.7672390Z tmp32 = tl.load(in_ptr0 + (tmp31 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672508Z tmp33 = tl.where(31 != 31, 31, tl.where(31 < tmp21, 31, tmp21)) 2023-01-11T21:38:06.7672624Z tmp34 = tl.where(0 != 0, 0, tl.where(0 > tmp33, 0, tmp33)) 2023-01-11T21:38:06.7672742Z tmp35 = tl.load(in_ptr0 + (tmp34 + (32*tmp23) + (2048*x2)), xmask) 2023-01-11T21:38:06.7672823Z tmp36 = tmp5 + 1.0 2023-01-11T21:38:06.7672931Z tmp37 = -0.75 * tmp36 2023-01-11T21:38:06.7673045Z tmp38 = tmp37 - -3.75 2023-01-11T21:38:06.7673155Z tmp39 = tmp38 * tmp36 2023-01-11T21:38:06.7673270Z tmp40 = tmp39 + -6.0 2023-01-11T21:38:06.7673352Z tmp41 = tmp40 * tmp36 2023-01-11T21:38:06.7673463Z tmp42 = tmp41 - -3.0 2023-01-11T21:38:06.7673535Z tmp43 = 1.25 * tmp5 2023-01-11T21:38:06.7673644Z tmp44 = tmp43 - 2.25 2023-01-11T21:38:06.7673724Z tmp45 = tmp44 * tmp5 2023-01-11T21:38:06.7673803Z tmp46 = tmp45 * tmp5 2023-01-11T21:38:06.7673881Z tmp47 = tmp46 + 1.0 2023-01-11T21:38:06.7673987Z tmp48 = 1.0 - tmp5 2023-01-11T21:38:06.7674064Z tmp49 = 1.25 * tmp48 2023-01-11T21:38:06.7674167Z tmp50 = tmp49 - 2.25 2023-01-11T21:38:06.7674250Z tmp51 = tmp50 * tmp48 2023-01-11T21:38:06.7674334Z tmp52 = tmp51 * tmp48 2023-01-11T21:38:06.7674412Z tmp53 = tmp52 + 1.0 2023-01-11T21:38:06.7674489Z tmp54 = tmp48 + 1.0 2023-01-11T21:38:06.7674598Z tmp55 = -0.75 * tmp54 2023-01-11T21:38:06.7674708Z tmp56 = tmp55 - -3.75 2023-01-11T21:38:06.7674781Z tmp57 = tmp56 * tmp54 2023-01-11T21:38:06.7674893Z tmp58 = tmp57 + -6.0 2023-01-11T21:38:06.7674974Z tmp59 = tmp58 * tmp54 2023-01-11T21:38:06.7675083Z tmp60 = tmp59 - -3.0 2023-01-11T21:38:06.7675163Z tmp61 = tmp26 * tmp42 2023-01-11T21:38:06.7675243Z tmp62 = tmp29 * tmp47 2023-01-11T21:38:06.7675315Z tmp63 = tmp32 * tmp53 2023-01-11T21:38:06.7675394Z tmp64 = tmp35 * tmp60 2023-01-11T21:38:06.7675473Z tmp65 = tmp63 + tmp64 2023-01-11T21:38:06.7675551Z tmp66 = tmp62 + tmp65 2023-01-11T21:38:06.7675629Z tmp67 = tmp61 + tmp66 2023-01-11T21:38:06.7675745Z tmp68 = tl.where(63 != 63, 63, tl.where(63 < tmp15, 63, tmp15)) 2023-01-11T21:38:06.7675892Z tmp69 = tl.where(0 != 0, 0, tl.where(0 > tmp68, 0, tmp68)) 2023-01-11T21:38:06.7676003Z tmp70 = tl.load(in_ptr0 + (tmp25 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676125Z tmp71 = tl.load(in_ptr0 + (tmp28 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676245Z tmp72 = tl.load(in_ptr0 + (tmp31 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676368Z tmp73 = tl.load(in_ptr0 + (tmp34 + (32*tmp69) + (2048*x2)), xmask) 2023-01-11T21:38:06.7676450Z tmp74 = tmp70 * tmp42 2023-01-11T21:38:06.7676531Z tmp75 = tmp71 * tmp47 2023-01-11T21:38:06.7676611Z tmp76 = 
tmp72 * tmp53 2023-01-11T21:38:06.7676691Z tmp77 = tmp73 * tmp60 2023-01-11T21:38:06.7676762Z tmp78 = tmp76 + tmp77 2023-01-11T21:38:06.7676839Z tmp79 = tmp75 + tmp78 2023-01-11T21:38:06.7676917Z tmp80 = tmp74 + tmp79 2023-01-11T21:38:06.7677037Z tmp81 = tl.where(63 != 63, 63, tl.where(63 < tmp16, 63, tmp16)) 2023-01-11T21:38:06.7677153Z tmp82 = tl.where(0 != 0, 0, tl.where(0 > tmp81, 0, tmp81)) 2023-01-11T21:38:06.7677277Z tmp83 = tl.load(in_ptr0 + (tmp25 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677396Z tmp84 = tl.load(in_ptr0 + (tmp28 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677505Z tmp85 = tl.load(in_ptr0 + (tmp31 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677629Z tmp86 = tl.load(in_ptr0 + (tmp34 + (32*tmp82) + (2048*x2)), xmask) 2023-01-11T21:38:06.7677709Z tmp87 = tmp83 * tmp42 2023-01-11T21:38:06.7677787Z tmp88 = tmp84 * tmp47 2023-01-11T21:38:06.7677865Z tmp89 = tmp85 * tmp53 2023-01-11T21:38:06.7677946Z tmp90 = tmp86 * tmp60 2023-01-11T21:38:06.7678024Z tmp91 = tmp89 + tmp90 2023-01-11T21:38:06.7678095Z tmp92 = tmp88 + tmp91 2023-01-11T21:38:06.7678172Z tmp93 = tmp87 + tmp92 2023-01-11T21:38:06.7678287Z tmp94 = tl.where(63 != 63, 63, tl.where(63 < tmp17, 63, tmp17)) 2023-01-11T21:38:06.7678400Z tmp95 = tl.where(0 != 0, 0, tl.where(0 > tmp94, 0, tmp94)) 2023-01-11T21:38:06.7678516Z tmp96 = tl.load(in_ptr0 + (tmp25 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678637Z tmp97 = tl.load(in_ptr0 + (tmp28 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678760Z tmp98 = tl.load(in_ptr0 + (tmp31 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678871Z tmp99 = tl.load(in_ptr0 + (tmp34 + (32*tmp95) + (2048*x2)), xmask) 2023-01-11T21:38:06.7678981Z tmp100 = tmp96 * tmp42 2023-01-11T21:38:06.7679064Z tmp101 = tmp97 * tmp47 2023-01-11T21:38:06.7679143Z tmp102 = tmp98 * tmp53 2023-01-11T21:38:06.7679223Z tmp103 = tmp99 * tmp60 2023-01-11T21:38:06.7679307Z tmp104 = tmp102 + tmp103 2023-01-11T21:38:06.7679387Z tmp105 = tmp101 + tmp104 2023-01-11T21:38:06.7679460Z tmp106 = tmp100 + tmp105 2023-01-11T21:38:06.7679538Z tmp107 = tmp11 + 1.0 2023-01-11T21:38:06.7679657Z tmp108 = -0.75 * tmp107 2023-01-11T21:38:06.7679771Z tmp109 = tmp108 - -3.75 2023-01-11T21:38:06.7679852Z tmp110 = tmp109 * tmp107 2023-01-11T21:38:06.7679965Z tmp111 = tmp110 + -6.0 2023-01-11T21:38:06.7680047Z tmp112 = tmp111 * tmp107 2023-01-11T21:38:06.7680153Z tmp113 = tmp112 - -3.0 2023-01-11T21:38:06.7680230Z tmp114 = 1.25 * tmp11 2023-01-11T21:38:06.7680343Z tmp115 = tmp114 - 2.25 2023-01-11T21:38:06.7680424Z tmp116 = tmp115 * tmp11 2023-01-11T21:38:06.7680504Z tmp117 = tmp116 * tmp11 2023-01-11T21:38:06.7680583Z tmp118 = tmp117 + 1.0 2023-01-11T21:38:06.7680687Z tmp119 = 1.0 - tmp11 2023-01-11T21:38:06.7680765Z tmp120 = 1.25 * tmp119 2023-01-11T21:38:06.7680878Z tmp121 = tmp120 - 2.25 2023-01-11T21:38:06.7680958Z tmp122 = tmp121 * tmp119 2023-01-11T21:38:06.7681038Z tmp123 = tmp122 * tmp119 2023-01-11T21:38:06.7681114Z tmp124 = tmp123 + 1.0 2023-01-11T21:38:06.7681191Z tmp125 = tmp119 + 1.0 2023-01-11T21:38:06.7681296Z tmp126 = -0.75 * tmp125 2023-01-11T21:38:06.7681410Z tmp127 = tmp126 - -3.75 2023-01-11T21:38:06.7681490Z tmp128 = tmp127 * tmp125 2023-01-11T21:38:06.7681601Z tmp129 = tmp128 + -6.0 2023-01-11T21:38:06.7681723Z tmp130 = tmp129 * tmp125 2023-01-11T21:38:06.7681837Z tmp131 = tmp130 - -3.0 2023-01-11T21:38:06.7681917Z tmp132 = tmp67 * tmp113 2023-01-11T21:38:06.7681989Z tmp133 = tmp80 * tmp118 2023-01-11T21:38:06.7682067Z tmp134 = tmp93 * tmp124 
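# [editorial annotation, not part of the captured log] tmp132-tmp138 combine the
# four horizontally-interpolated rows (tmp67, tmp80, tmp93, tmp106) with the
# vertical Keys cubic-convolution weights (tmp113, tmp118, tmp124, tmp131,
# built with a = -0.75) and sum them into tmp138, the bicubic output value.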
2023-01-11T21:38:06.7682147Z tmp135 = tmp106 * tmp131 2023-01-11T21:38:06.7682230Z tmp136 = tmp134 + tmp135 2023-01-11T21:38:06.7682312Z tmp137 = tmp133 + tmp136 2023-01-11T21:38:06.7682396Z tmp138 = tmp132 + tmp137 2023-01-11T21:38:06.7682533Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp138, xmask) 2023-01-11T21:38:06.7682611Z ''') 2023-01-11T21:38:06.7682617Z 2023-01-11T21:38:06.7682621Z 2023-01-11T21:38:06.7682716Z async_compile.wait(globals()) 2023-01-11T21:38:06.7682792Z del async_compile 2023-01-11T21:38:06.7682797Z 2023-01-11T21:38:06.7682871Z def call(args): 2023-01-11T21:38:06.7682943Z arg0_1, = args 2023-01-11T21:38:06.7683018Z args.clear() 2023-01-11T21:38:06.7683113Z with torch.cuda.device(0): 2023-01-11T21:38:06.7683340Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7683434Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7683595Z triton_fused_upsample_bicubic2d_0.run(arg0_1, buf0, 196608, grid=grid(196608), stream=stream0) 2023-01-11T21:38:06.7683826Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7683987Z triton_fused_upsample_bicubic2d_1_1.run(arg0_1, buf1, 393216, grid=grid(393216), stream=stream0) 2023-01-11T21:38:06.7684059Z del arg0_1 2023-01-11T21:38:06.7684142Z return (buf0, buf1, ) 2023-01-11T21:38:06.7684147Z 2023-01-11T21:38:06.7684151Z 2023-01-11T21:38:06.7684230Z if __name__ == "__main__": 2023-01-11T21:38:06.7684348Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7684468Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7684695Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7684807Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7685071Z [2023-01-11 21:36:09,323] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 929 2023-01-11T21:38:06.7685512Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7685644Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7685925Z [2023-01-11 21:36:10,974] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 930 2023-01-11T21:38:06.7685931Z 2023-01-11T21:38:06.7685938Z 2023-01-11T21:38:06.7686046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7686134Z import torch 2023-01-11T21:38:06.7686202Z import random 2023-01-11T21:38:06.7686321Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7686445Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7686450Z 2023-01-11T21:38:06.7686534Z aten = torch.ops.aten 2023-01-11T21:38:06.7686671Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7686767Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7686772Z 2023-01-11T21:38:06.7686845Z import triton 2023-01-11T21:38:06.7686930Z import triton.language as tl 2023-01-11T21:38:06.7687054Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7687194Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7687199Z 2023-01-11T21:38:06.7687204Z 2023-01-11T21:38:06.7687388Z triton_fused_upsample_bicubic2d_0 = async_compile.triton(''' 2023-01-11T21:38:06.7687491Z import triton 2023-01-11T21:38:06.7687583Z import triton.language as tl 2023-01-11T21:38:06.7687698Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7687800Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7687925Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7688059Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7688065Z 2023-01-11T21:38:06.7688473Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7688546Z @triton.jit 2023-01-11T21:38:06.7688682Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7688757Z xnumel = 196608 2023-01-11T21:38:06.7688855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7688985Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7689065Z xmask = xindex < xnumel 2023-01-11T21:38:06.7689141Z x0 = xindex % 128 2023-01-11T21:38:06.7689225Z x1 = (xindex // 128) % 128 2023-01-11T21:38:06.7689305Z x2 = (xindex // 16384) 2023-01-11T21:38:06.7689375Z x4 = xindex 2023-01-11T21:38:06.7689446Z tmp0 = x0 2023-01-11T21:38:06.7689533Z tmp1 = 0.2440944881889764 * tmp0 2023-01-11T21:38:06.7689625Z tmp2 = tl.libdevice.floor(tmp1) 2023-01-11T21:38:06.7689736Z tmp3 = tmp1 - tmp2 2023-01-11T21:38:06.7689805Z tmp4 = x1 2023-01-11T21:38:06.7689891Z tmp5 = 0.49606299212598426 * tmp4 2023-01-11T21:38:06.7689987Z tmp6 = tl.libdevice.floor(tmp5) 2023-01-11T21:38:06.7690097Z tmp7 = tmp5 - tmp6 2023-01-11T21:38:06.7690183Z tmp8 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7690261Z tmp9 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7690356Z tmp10 = tmp8 + -1 2023-01-11T21:38:06.7690432Z tmp11 = tmp8 + 0 2023-01-11T21:38:06.7690507Z tmp12 = tmp8 + 1 2023-01-11T21:38:06.7690585Z tmp13 = tmp8 + 2 2023-01-11T21:38:06.7690679Z tmp14 = tmp9 + -1 2023-01-11T21:38:06.7690747Z tmp15 = tmp9 + 0 2023-01-11T21:38:06.7690820Z tmp16 = tmp9 + 1 
2023-01-11T21:38:06.7690892Z tmp17 = tmp9 + 2 2023-01-11T21:38:06.7691015Z tmp18 = tl.where(63 != 63, 63, tl.where(63 < tmp10, 63, tmp10)) 2023-01-11T21:38:06.7691194Z tmp19 = tl.where(0 != 0, 0, tl.where(0 > tmp18, 0, tmp18)) 2023-01-11T21:38:06.7691315Z tmp20 = tl.where(31 != 31, 31, tl.where(31 < tmp14, 31, tmp14)) 2023-01-11T21:38:06.7691429Z tmp21 = tl.where(0 != 0, 0, tl.where(0 > tmp20, 0, tmp20)) 2023-01-11T21:38:06.7691684Z tmp22 = tl.load(in_ptr0 + (tmp21 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7691795Z tmp23 = tl.where(31 != 31, 31, tl.where(31 < tmp15, 31, tmp15)) 2023-01-11T21:38:06.7691909Z tmp24 = tl.where(0 != 0, 0, tl.where(0 > tmp23, 0, tmp23)) 2023-01-11T21:38:06.7692159Z tmp25 = tl.load(in_ptr0 + (tmp24 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7692278Z tmp26 = tl.where(31 != 31, 31, tl.where(31 < tmp16, 31, tmp16)) 2023-01-11T21:38:06.7692391Z tmp27 = tl.where(0 != 0, 0, tl.where(0 > tmp26, 0, tmp26)) 2023-01-11T21:38:06.7692640Z tmp28 = tl.load(in_ptr0 + (tmp27 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7692759Z tmp29 = tl.where(31 != 31, 31, tl.where(31 < tmp17, 31, tmp17)) 2023-01-11T21:38:06.7692874Z tmp30 = tl.where(0 != 0, 0, tl.where(0 > tmp29, 0, tmp29)) 2023-01-11T21:38:06.7693113Z tmp31 = tl.load(in_ptr0 + (tmp30 + (32*tmp19) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7693191Z tmp32 = tmp3 + 1.0 2023-01-11T21:38:06.7693302Z tmp33 = -0.75 * tmp32 2023-01-11T21:38:06.7693416Z tmp34 = tmp33 - -3.75 2023-01-11T21:38:06.7693495Z tmp35 = tmp34 * tmp32 2023-01-11T21:38:06.7693606Z tmp36 = tmp35 + -6.0 2023-01-11T21:38:06.7693714Z tmp37 = tmp36 * tmp32 2023-01-11T21:38:06.7693817Z tmp38 = tmp37 - -3.0 2023-01-11T21:38:06.7693897Z tmp39 = 1.25 * tmp3 2023-01-11T21:38:06.7694006Z tmp40 = tmp39 - 2.25 2023-01-11T21:38:06.7694085Z tmp41 = tmp40 * tmp3 2023-01-11T21:38:06.7694163Z tmp42 = tmp41 * tmp3 2023-01-11T21:38:06.7694243Z tmp43 = tmp42 + 1.0 2023-01-11T21:38:06.7694342Z tmp44 = 1.0 - tmp3 2023-01-11T21:38:06.7694421Z tmp45 = 1.25 * tmp44 2023-01-11T21:38:06.7694740Z tmp46 = tmp45 - 2.25 2023-01-11T21:38:06.7694821Z tmp47 = tmp46 * tmp44 2023-01-11T21:38:06.7694900Z tmp48 = tmp47 * tmp44 2023-01-11T21:38:06.7694979Z tmp49 = tmp48 + 1.0 2023-01-11T21:38:06.7695058Z tmp50 = tmp44 + 1.0 2023-01-11T21:38:06.7695165Z tmp51 = -0.75 * tmp50 2023-01-11T21:38:06.7695278Z tmp52 = tmp51 - -3.75 2023-01-11T21:38:06.7695356Z tmp53 = tmp52 * tmp50 2023-01-11T21:38:06.7695468Z tmp54 = tmp53 + -6.0 2023-01-11T21:38:06.7695551Z tmp55 = tmp54 * tmp50 2023-01-11T21:38:06.7695667Z tmp56 = tmp55 - -3.0 2023-01-11T21:38:06.7695749Z tmp57 = tmp22 * tmp38 2023-01-11T21:38:06.7695821Z tmp58 = tmp25 * tmp43 2023-01-11T21:38:06.7695899Z tmp59 = tmp28 * tmp49 2023-01-11T21:38:06.7695976Z tmp60 = tmp31 * tmp56 2023-01-11T21:38:06.7696055Z tmp61 = tmp59 + tmp60 2023-01-11T21:38:06.7696136Z tmp62 = tmp58 + tmp61 2023-01-11T21:38:06.7696216Z tmp63 = tmp57 + tmp62 2023-01-11T21:38:06.7696328Z tmp64 = tl.where(63 != 63, 63, tl.where(63 < tmp11, 63, tmp11)) 2023-01-11T21:38:06.7696447Z tmp65 = tl.where(0 != 0, 0, tl.where(0 > tmp64, 0, tmp64)) 2023-01-11T21:38:06.7696697Z tmp66 = tl.load(in_ptr0 + (tmp21 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7696974Z tmp67 = tl.load(in_ptr0 + (tmp24 + (32*tmp65) + (2048*x2)), xmask, 
eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697294Z tmp68 = tl.load(in_ptr0 + (tmp27 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697537Z tmp69 = tl.load(in_ptr0 + (tmp30 + (32*tmp65) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7697620Z tmp70 = tmp66 * tmp38 2023-01-11T21:38:06.7697700Z tmp71 = tmp67 * tmp43 2023-01-11T21:38:06.7697782Z tmp72 = tmp68 * tmp49 2023-01-11T21:38:06.7697909Z tmp73 = tmp69 * tmp56 2023-01-11T21:38:06.7697991Z tmp74 = tmp72 + tmp73 2023-01-11T21:38:06.7698071Z tmp75 = tmp71 + tmp74 2023-01-11T21:38:06.7698151Z tmp76 = tmp70 + tmp75 2023-01-11T21:38:06.7698273Z tmp77 = tl.where(63 != 63, 63, tl.where(63 < tmp12, 63, tmp12)) 2023-01-11T21:38:06.7698391Z tmp78 = tl.where(0 != 0, 0, tl.where(0 > tmp77, 0, tmp77)) 2023-01-11T21:38:06.7698642Z tmp79 = tl.load(in_ptr0 + (tmp21 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7698879Z tmp80 = tl.load(in_ptr0 + (tmp24 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699125Z tmp81 = tl.load(in_ptr0 + (tmp27 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699364Z tmp82 = tl.load(in_ptr0 + (tmp30 + (32*tmp78) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7699446Z tmp83 = tmp79 * tmp38 2023-01-11T21:38:06.7699533Z tmp84 = tmp80 * tmp43 2023-01-11T21:38:06.7699613Z tmp85 = tmp81 * tmp49 2023-01-11T21:38:06.7699695Z tmp86 = tmp82 * tmp56 2023-01-11T21:38:06.7699768Z tmp87 = tmp85 + tmp86 2023-01-11T21:38:06.7699848Z tmp88 = tmp84 + tmp87 2023-01-11T21:38:06.7699927Z tmp89 = tmp83 + tmp88 2023-01-11T21:38:06.7700049Z tmp90 = tl.where(63 != 63, 63, tl.where(63 < tmp13, 63, tmp13)) 2023-01-11T21:38:06.7700166Z tmp91 = tl.where(0 != 0, 0, tl.where(0 > tmp90, 0, tmp90)) 2023-01-11T21:38:06.7700408Z tmp92 = tl.load(in_ptr0 + (tmp21 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7700685Z tmp93 = tl.load(in_ptr0 + (tmp24 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7700925Z tmp94 = tl.load(in_ptr0 + (tmp27 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7701157Z tmp95 = tl.load(in_ptr0 + (tmp30 + (32*tmp91) + (2048*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7701239Z tmp96 = tmp92 * tmp38 2023-01-11T21:38:06.7701318Z tmp97 = tmp93 * tmp43 2023-01-11T21:38:06.7701399Z tmp98 = tmp94 * tmp49 2023-01-11T21:38:06.7701478Z tmp99 = tmp95 * tmp56 2023-01-11T21:38:06.7701561Z tmp100 = tmp98 + tmp99 2023-01-11T21:38:06.7701643Z tmp101 = tmp97 + tmp100 2023-01-11T21:38:06.7701718Z tmp102 = tmp96 + tmp101 2023-01-11T21:38:06.7701798Z tmp103 = tmp7 + 1.0 2023-01-11T21:38:06.7701912Z tmp104 = -0.75 * tmp103 2023-01-11T21:38:06.7702030Z tmp105 = tmp104 - -3.75 2023-01-11T21:38:06.7702116Z tmp106 = tmp105 * tmp103 2023-01-11T21:38:06.7702230Z tmp107 = tmp106 + -6.0 2023-01-11T21:38:06.7702312Z tmp108 = tmp107 * tmp103 2023-01-11T21:38:06.7702419Z tmp109 = tmp108 - -3.0 2023-01-11T21:38:06.7702497Z tmp110 = 1.25 * tmp7 2023-01-11T21:38:06.7702611Z tmp111 = tmp110 - 2.25 2023-01-11T21:38:06.7702694Z tmp112 = tmp111 * tmp7 2023-01-11T21:38:06.7702778Z tmp113 = tmp112 * tmp7 2023-01-11T21:38:06.7702859Z tmp114 = tmp113 + 1.0 2023-01-11T21:38:06.7702970Z tmp115 = 
1.0 - tmp7 2023-01-11T21:38:06.7703041Z tmp116 = 1.25 * tmp115 2023-01-11T21:38:06.7703157Z tmp117 = tmp116 - 2.25 2023-01-11T21:38:06.7703245Z tmp118 = tmp117 * tmp115 2023-01-11T21:38:06.7703331Z tmp119 = tmp118 * tmp115 2023-01-11T21:38:06.7703410Z tmp120 = tmp119 + 1.0 2023-01-11T21:38:06.7703489Z tmp121 = tmp115 + 1.0 2023-01-11T21:38:06.7703596Z tmp122 = -0.75 * tmp121 2023-01-11T21:38:06.7703713Z tmp123 = tmp122 - -3.75 2023-01-11T21:38:06.7703794Z tmp124 = tmp123 * tmp121 2023-01-11T21:38:06.7703912Z tmp125 = tmp124 + -6.0 2023-01-11T21:38:06.7703994Z tmp126 = tmp125 * tmp121 2023-01-11T21:38:06.7704108Z tmp127 = tmp126 - -3.0 2023-01-11T21:38:06.7704189Z tmp128 = tmp63 * tmp109 2023-01-11T21:38:06.7704263Z tmp129 = tmp76 * tmp114 2023-01-11T21:38:06.7704344Z tmp130 = tmp89 * tmp120 2023-01-11T21:38:06.7704452Z tmp131 = tmp102 * tmp127 2023-01-11T21:38:06.7704535Z tmp132 = tmp130 + tmp131 2023-01-11T21:38:06.7704617Z tmp133 = tmp129 + tmp132 2023-01-11T21:38:06.7704701Z tmp134 = tmp128 + tmp133 2023-01-11T21:38:06.7704841Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp134, xmask) 2023-01-11T21:38:06.7704920Z ''') 2023-01-11T21:38:06.7704926Z 2023-01-11T21:38:06.7704930Z 2023-01-11T21:38:06.7705119Z triton_fused_upsample_bicubic2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7705200Z import triton 2023-01-11T21:38:06.7705294Z import triton.language as tl 2023-01-11T21:38:06.7705413Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7705524Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7705657Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7705777Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7705790Z 2023-01-11T21:38:06.7706196Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7706272Z @triton.jit 2023-01-11T21:38:06.7706404Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7706481Z xnumel = 393216 2023-01-11T21:38:06.7706583Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7706714Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7706798Z xmask = xindex < xnumel 2023-01-11T21:38:06.7706869Z x0 = xindex % 256 2023-01-11T21:38:06.7706981Z x1 = (xindex // 256) % 128 2023-01-11T21:38:06.7707062Z x2 = (xindex // 32768) 2023-01-11T21:38:06.7707136Z x4 = xindex 2023-01-11T21:38:06.7707208Z tmp0 = x0 2023-01-11T21:38:06.7707288Z tmp1 = tmp0 + 0.5 2023-01-11T21:38:06.7707369Z tmp2 = 0.125 * tmp1 2023-01-11T21:38:06.7707471Z tmp3 = tmp2 - 0.5 2023-01-11T21:38:06.7707578Z tmp4 = tl.libdevice.floor(tmp3) 2023-01-11T21:38:06.7707686Z tmp5 = tmp3 - tmp4 2023-01-11T21:38:06.7707758Z tmp6 = x1 2023-01-11T21:38:06.7707836Z tmp7 = tmp6 + 0.5 2023-01-11T21:38:06.7707912Z tmp8 = 0.5 * tmp7 2023-01-11T21:38:06.7708018Z tmp9 = tmp8 - 0.5 2023-01-11T21:38:06.7708112Z tmp10 = tl.libdevice.floor(tmp9) 2023-01-11T21:38:06.7708227Z tmp11 = tmp9 - tmp10 2023-01-11T21:38:06.7708317Z tmp12 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7708404Z tmp13 = tmp4.to(tl.int32) 2023-01-11T21:38:06.7708499Z tmp14 = tmp12 + -1 2023-01-11T21:38:06.7708578Z tmp15 = tmp12 + 0 2023-01-11T21:38:06.7708652Z tmp16 = tmp12 + 1 2023-01-11T21:38:06.7708731Z tmp17 = tmp12 + 2 2023-01-11T21:38:06.7708826Z tmp18 = tmp13 + -1 
2023-01-11T21:38:06.7708904Z tmp19 = tmp13 + 0 2023-01-11T21:38:06.7708980Z tmp20 = tmp13 + 1 2023-01-11T21:38:06.7709058Z tmp21 = tmp13 + 2 2023-01-11T21:38:06.7709185Z tmp22 = tl.where(63 != 63, 63, tl.where(63 < tmp14, 63, tmp14)) 2023-01-11T21:38:06.7709294Z tmp23 = tl.where(0 != 0, 0, tl.where(0 > tmp22, 0, tmp22)) 2023-01-11T21:38:06.7709416Z tmp24 = tl.where(31 != 31, 31, tl.where(31 < tmp18, 31, tmp18)) 2023-01-11T21:38:06.7709533Z tmp25 = tl.where(0 != 0, 0, tl.where(0 > tmp24, 0, tmp24)) 2023-01-11T21:38:06.7709677Z tmp26 = tl.load(in_ptr0 + (tmp25 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7709796Z tmp27 = tl.where(31 != 31, 31, tl.where(31 < tmp19, 31, tmp19)) 2023-01-11T21:38:06.7709909Z tmp28 = tl.where(0 != 0, 0, tl.where(0 > tmp27, 0, tmp27)) 2023-01-11T21:38:06.7710048Z tmp29 = tl.load(in_ptr0 + (tmp28 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710168Z tmp30 = tl.where(31 != 31, 31, tl.where(31 < tmp20, 31, tmp20)) 2023-01-11T21:38:06.7710275Z tmp31 = tl.where(0 != 0, 0, tl.where(0 > tmp30, 0, tmp30)) 2023-01-11T21:38:06.7710442Z tmp32 = tl.load(in_ptr0 + (tmp31 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710560Z tmp33 = tl.where(31 != 31, 31, tl.where(31 < tmp21, 31, tmp21)) 2023-01-11T21:38:06.7710672Z tmp34 = tl.where(0 != 0, 0, tl.where(0 > tmp33, 0, tmp33)) 2023-01-11T21:38:06.7710810Z tmp35 = tl.load(in_ptr0 + (tmp34 + (32*tmp23) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7710889Z tmp36 = tmp5 + 1.0 2023-01-11T21:38:06.7711004Z tmp37 = -0.75 * tmp36 2023-01-11T21:38:06.7711113Z tmp38 = tmp37 - -3.75 2023-01-11T21:38:06.7711193Z tmp39 = tmp38 * tmp36 2023-01-11T21:38:06.7711307Z tmp40 = tmp39 + -6.0 2023-01-11T21:38:06.7711390Z tmp41 = tmp40 * tmp36 2023-01-11T21:38:06.7711503Z tmp42 = tmp41 - -3.0 2023-01-11T21:38:06.7711581Z tmp43 = 1.25 * tmp5 2023-01-11T21:38:06.7711691Z tmp44 = tmp43 - 2.25 2023-01-11T21:38:06.7711764Z tmp45 = tmp44 * tmp5 2023-01-11T21:38:06.7711845Z tmp46 = tmp45 * tmp5 2023-01-11T21:38:06.7711924Z tmp47 = tmp46 + 1.0 2023-01-11T21:38:06.7712031Z tmp48 = 1.0 - tmp5 2023-01-11T21:38:06.7712111Z tmp49 = 1.25 * tmp48 2023-01-11T21:38:06.7712224Z tmp50 = tmp49 - 2.25 2023-01-11T21:38:06.7712303Z tmp51 = tmp50 * tmp48 2023-01-11T21:38:06.7712375Z tmp52 = tmp51 * tmp48 2023-01-11T21:38:06.7712453Z tmp53 = tmp52 + 1.0 2023-01-11T21:38:06.7712531Z tmp54 = tmp48 + 1.0 2023-01-11T21:38:06.7712641Z tmp55 = -0.75 * tmp54 2023-01-11T21:38:06.7712754Z tmp56 = tmp55 - -3.75 2023-01-11T21:38:06.7712833Z tmp57 = tmp56 * tmp54 2023-01-11T21:38:06.7712936Z tmp58 = tmp57 + -6.0 2023-01-11T21:38:06.7713016Z tmp59 = tmp58 * tmp54 2023-01-11T21:38:06.7713125Z tmp60 = tmp59 - -3.0 2023-01-11T21:38:06.7713239Z tmp61 = tmp26 * tmp42 2023-01-11T21:38:06.7713317Z tmp62 = tmp29 * tmp47 2023-01-11T21:38:06.7713397Z tmp63 = tmp32 * tmp53 2023-01-11T21:38:06.7713478Z tmp64 = tmp35 * tmp60 2023-01-11T21:38:06.7713550Z tmp65 = tmp63 + tmp64 2023-01-11T21:38:06.7713629Z tmp66 = tmp62 + tmp65 2023-01-11T21:38:06.7713706Z tmp67 = tmp61 + tmp66 2023-01-11T21:38:06.7713827Z tmp68 = tl.where(63 != 63, 63, tl.where(63 < tmp15, 63, tmp15)) 2023-01-11T21:38:06.7713944Z tmp69 = tl.where(0 != 0, 0, tl.where(0 > tmp68, 0, tmp68)) 2023-01-11T21:38:06.7714079Z tmp70 = tl.load(in_ptr0 + (tmp25 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714211Z tmp71 = tl.load(in_ptr0 + (tmp28 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714340Z tmp72 = tl.load(in_ptr0 
+ (tmp31 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714462Z tmp73 = tl.load(in_ptr0 + (tmp34 + (32*tmp69) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7714546Z tmp74 = tmp70 * tmp42 2023-01-11T21:38:06.7714629Z tmp75 = tmp71 * tmp47 2023-01-11T21:38:06.7714709Z tmp76 = tmp72 * tmp53 2023-01-11T21:38:06.7714789Z tmp77 = tmp73 * tmp60 2023-01-11T21:38:06.7714869Z tmp78 = tmp76 + tmp77 2023-01-11T21:38:06.7714947Z tmp79 = tmp75 + tmp78 2023-01-11T21:38:06.7715022Z tmp80 = tmp74 + tmp79 2023-01-11T21:38:06.7715141Z tmp81 = tl.where(63 != 63, 63, tl.where(63 < tmp16, 63, tmp16)) 2023-01-11T21:38:06.7715282Z tmp82 = tl.where(0 != 0, 0, tl.where(0 > tmp81, 0, tmp81)) 2023-01-11T21:38:06.7715435Z tmp83 = tl.load(in_ptr0 + (tmp25 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715571Z tmp84 = tl.load(in_ptr0 + (tmp28 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715701Z tmp85 = tl.load(in_ptr0 + (tmp31 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715829Z tmp86 = tl.load(in_ptr0 + (tmp34 + (32*tmp82) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7715906Z tmp87 = tmp83 * tmp42 2023-01-11T21:38:06.7715987Z tmp88 = tmp84 * tmp47 2023-01-11T21:38:06.7716069Z tmp89 = tmp85 * tmp53 2023-01-11T21:38:06.7716149Z tmp90 = tmp86 * tmp60 2023-01-11T21:38:06.7716227Z tmp91 = tmp89 + tmp90 2023-01-11T21:38:06.7716304Z tmp92 = tmp88 + tmp91 2023-01-11T21:38:06.7716409Z tmp93 = tmp87 + tmp92 2023-01-11T21:38:06.7716521Z tmp94 = tl.where(63 != 63, 63, tl.where(63 < tmp17, 63, tmp17)) 2023-01-11T21:38:06.7716639Z tmp95 = tl.where(0 != 0, 0, tl.where(0 > tmp94, 0, tmp94)) 2023-01-11T21:38:06.7716770Z tmp96 = tl.load(in_ptr0 + (tmp25 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7716899Z tmp97 = tl.load(in_ptr0 + (tmp28 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717026Z tmp98 = tl.load(in_ptr0 + (tmp31 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717152Z tmp99 = tl.load(in_ptr0 + (tmp34 + (32*tmp95) + (2048*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7717239Z tmp100 = tmp96 * tmp42 2023-01-11T21:38:06.7717323Z tmp101 = tmp97 * tmp47 2023-01-11T21:38:06.7717397Z tmp102 = tmp98 * tmp53 2023-01-11T21:38:06.7717477Z tmp103 = tmp99 * tmp60 2023-01-11T21:38:06.7717560Z tmp104 = tmp102 + tmp103 2023-01-11T21:38:06.7717643Z tmp105 = tmp101 + tmp104 2023-01-11T21:38:06.7717725Z tmp106 = tmp100 + tmp105 2023-01-11T21:38:06.7717805Z tmp107 = tmp11 + 1.0 2023-01-11T21:38:06.7717922Z tmp108 = -0.75 * tmp107 2023-01-11T21:38:06.7718032Z tmp109 = tmp108 - -3.75 2023-01-11T21:38:06.7718115Z tmp110 = tmp109 * tmp107 2023-01-11T21:38:06.7718229Z tmp111 = tmp110 + -6.0 2023-01-11T21:38:06.7718310Z tmp112 = tmp111 * tmp107 2023-01-11T21:38:06.7718425Z tmp113 = tmp112 - -3.0 2023-01-11T21:38:06.7718504Z tmp114 = 1.25 * tmp11 2023-01-11T21:38:06.7718611Z tmp115 = tmp114 - 2.25 2023-01-11T21:38:06.7718693Z tmp116 = tmp115 * tmp11 2023-01-11T21:38:06.7718774Z tmp117 = tmp116 * tmp11 2023-01-11T21:38:06.7718882Z tmp118 = tmp117 + 1.0 2023-01-11T21:38:06.7718994Z tmp119 = 1.0 - tmp11 2023-01-11T21:38:06.7719072Z tmp120 = 1.25 * tmp119 2023-01-11T21:38:06.7719186Z tmp121 = tmp120 - 2.25 2023-01-11T21:38:06.7719261Z tmp122 = tmp121 * tmp119 2023-01-11T21:38:06.7719343Z tmp123 = tmp122 * tmp119 2023-01-11T21:38:06.7719425Z tmp124 = tmp123 + 1.0 2023-01-11T21:38:06.7719507Z tmp125 = tmp119 + 1.0 2023-01-11T21:38:06.7719619Z tmp126 = -0.75 * tmp125 
2023-01-11T21:38:06.7719736Z tmp127 = tmp126 - -3.75 2023-01-11T21:38:06.7719817Z tmp128 = tmp127 * tmp125 2023-01-11T21:38:06.7719924Z tmp129 = tmp128 + -6.0 2023-01-11T21:38:06.7720004Z tmp130 = tmp129 * tmp125 2023-01-11T21:38:06.7720118Z tmp131 = tmp130 - -3.0 2023-01-11T21:38:06.7720203Z tmp132 = tmp67 * tmp113 2023-01-11T21:38:06.7720284Z tmp133 = tmp80 * tmp118 2023-01-11T21:38:06.7720368Z tmp134 = tmp93 * tmp124 2023-01-11T21:38:06.7720443Z tmp135 = tmp106 * tmp131 2023-01-11T21:38:06.7720528Z tmp136 = tmp134 + tmp135 2023-01-11T21:38:06.7720612Z tmp137 = tmp133 + tmp136 2023-01-11T21:38:06.7720693Z tmp138 = tmp132 + tmp137 2023-01-11T21:38:06.7720833Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp138, xmask) 2023-01-11T21:38:06.7720921Z ''') 2023-01-11T21:38:06.7720927Z 2023-01-11T21:38:06.7720937Z 2023-01-11T21:38:06.7721031Z async_compile.wait(globals()) 2023-01-11T21:38:06.7721110Z del async_compile 2023-01-11T21:38:06.7721115Z 2023-01-11T21:38:06.7721184Z def call(args): 2023-01-11T21:38:06.7721258Z arg0_1, = args 2023-01-11T21:38:06.7721335Z args.clear() 2023-01-11T21:38:06.7721429Z with torch.cuda.device(0): 2023-01-11T21:38:06.7721663Z buf0 = empty_strided((4, 3, 128, 128), (49152, 16384, 128, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7721758Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7721918Z triton_fused_upsample_bicubic2d_0.run(arg0_1, buf0, 196608, grid=grid(196608), stream=stream0) 2023-01-11T21:38:06.7722145Z buf1 = empty_strided((4, 3, 128, 256), (98304, 32768, 256, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7722309Z triton_fused_upsample_bicubic2d_1_1.run(arg0_1, buf1, 393216, grid=grid(393216), stream=stream0) 2023-01-11T21:38:06.7722385Z del arg0_1 2023-01-11T21:38:06.7722495Z return (buf0, buf1, ) 2023-01-11T21:38:06.7722501Z 2023-01-11T21:38:06.7722505Z 2023-01-11T21:38:06.7722587Z if __name__ == "__main__": 2023-01-11T21:38:06.7722706Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7722835Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7723064Z arg0_1 = rand_strided((4, 3, 64, 32), (6144, 2048, 32, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7723170Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7723433Z [2023-01-11 21:36:11,550] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 930 2023-01-11T21:38:06.7723442Z 2023-01-11T21:38:06.7723515Z ok (4.423s) 2023-01-11T21:38:06.7723992Z test_upsample_bilinear2d_a_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7724125Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7724380Z [2023-01-11 21:36:12,104] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 931 2023-01-11T21:38:06.7724618Z [2023-01-11 21:36:12,268] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7724848Z [2023-01-11 21:36:12,285] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725121Z [2023-01-11 21:36:12,294] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725336Z [2023-01-11 21:36:12,299] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7725557Z [2023-01-11 21:36:12,300] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725786Z [2023-01-11 21:36:12,312] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7725991Z [2023-01-11 21:36:12,320] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7726252Z [2023-01-11 21:36:12,548] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 931 2023-01-11T21:38:06.7726669Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7726806Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7727061Z [2023-01-11 21:36:13,169] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 932 2023-01-11T21:38:06.7727300Z [2023-01-11 21:36:13,317] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727529Z [2023-01-11 21:36:13,328] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727755Z [2023-01-11 21:36:13,337] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7727956Z [2023-01-11 21:36:13,340] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7728162Z [2023-01-11 21:36:13,341] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7728393Z [2023-01-11 21:36:13,342] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7728624Z [2023-01-11 21:36:13,350] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7728824Z [2023-01-11 21:36:13,358] torch._inductor.scheduler: [DEBUG] remove_buffer('buf3') 2023-01-11T21:38:06.7729050Z [2023-01-11 21:36:13,358] torch._inductor.scheduler: [DEBUG] remove_buffer('buf4') 2023-01-11T21:38:06.7729056Z 2023-01-11T21:38:06.7729158Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7729234Z import torch 2023-01-11T21:38:06.7729303Z import random 2023-01-11T21:38:06.7729423Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7729549Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7729554Z 2023-01-11T21:38:06.7729637Z aten = torch.ops.aten 2023-01-11T21:38:06.7729776Z assert_size_stride = 
torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7729874Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7729880Z 2023-01-11T21:38:06.7729960Z import triton 2023-01-11T21:38:06.7730054Z import triton.language as tl 2023-01-11T21:38:06.7730173Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7730314Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7730320Z 2023-01-11T21:38:06.7730324Z 2023-01-11T21:38:06.7730600Z triton_fused_add_add_2_add_3_add_4_arange_convert_element_type_floor_index_index_1_index_2_0 = async_compile.triton(''' 2023-01-11T21:38:06.7730676Z import triton 2023-01-11T21:38:06.7730769Z import triton.language as tl 2023-01-11T21:38:06.7730883Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7730985Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7740679Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7740844Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7740850Z 2023-01-11T21:38:06.7741302Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7741479Z @triton.jit 2023-01-11T21:38:06.7741621Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7741699Z xnumel = 16200 2023-01-11T21:38:06.7741804Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7741944Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7742023Z xmask = xindex < xnumel 2023-01-11T21:38:06.7742117Z x1 = (xindex // 45) % 45 2023-01-11T21:38:06.7742197Z x0 = xindex % 45 2023-01-11T21:38:06.7742285Z x2 = (xindex // 2025) 2023-01-11T21:38:06.7742360Z x4 = xindex 2023-01-11T21:38:06.7742436Z tmp0 = x1 2023-01-11T21:38:06.7742514Z tmp1 = 0.5 2023-01-11T21:38:06.7742590Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7742674Z tmp3 = 0.8222222222222222 2023-01-11T21:38:06.7742758Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7742876Z tmp5 = tmp4 - tmp1 2023-01-11T21:38:06.7742952Z tmp6 = 0.0 2023-01-11T21:38:06.7743100Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.7743206Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7743288Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7743368Z tmp10 = x0 2023-01-11T21:38:06.7743454Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.7743541Z tmp12 = 0.8444444444444444 2023-01-11T21:38:06.7743627Z tmp13 = tmp11 * tmp12 2023-01-11T21:38:06.7743745Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.7743891Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.7743987Z tmp16 = tl.libdevice.floor(tmp15) 2023-01-11T21:38:06.7744077Z tmp17 = tmp16.to(tl.int32) 2023-01-11T21:38:06.7744306Z tmp18 = tl.load(in_ptr0 + (tmp17 + (38*tmp9) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7744383Z tmp19 = 1.0 2023-01-11T21:38:06.7744479Z tmp20 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7744594Z tmp21 = tmp7 - tmp20 2023-01-11T21:38:06.7744713Z tmp22 = tmp19 - tmp21 2023-01-11T21:38:06.7744789Z tmp23 = tmp18 * tmp22 2023-01-11T21:38:06.7744892Z tmp24 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7744971Z tmp25 = 36.0 2023-01-11T21:38:06.7745171Z tmp26 = tl.where(tmp24 != tmp24, tmp24, tl.where(tmp24 < tmp25, tmp24, tmp25)) 2023-01-11T21:38:06.7745261Z tmp27 = 
tmp26.to(tl.int32) 2023-01-11T21:38:06.7745493Z tmp28 = tl.load(in_ptr0 + (tmp17 + (38*tmp27) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7745579Z tmp29 = tmp28 * tmp21 2023-01-11T21:38:06.7745651Z tmp30 = tmp23 + tmp29 2023-01-11T21:38:06.7745750Z tmp31 = tl.libdevice.ceil(tmp15) 2023-01-11T21:38:06.7745826Z tmp32 = 37.0 2023-01-11T21:38:06.7745967Z tmp33 = tl.where(tmp31 != tmp31, tmp31, tl.where(tmp31 < tmp32, tmp31, tmp32)) 2023-01-11T21:38:06.7746057Z tmp34 = tmp33.to(tl.int32) 2023-01-11T21:38:06.7746188Z tmp35 = tl.load(in_ptr0 + (tmp34 + (38*tmp9) + (1406*x2)), xmask) 2023-01-11T21:38:06.7746271Z tmp36 = tmp35 * tmp22 2023-01-11T21:38:06.7746387Z tmp37 = tl.load(in_ptr0 + (tmp34 + (38*tmp27) + (1406*x2)), xmask) 2023-01-11T21:38:06.7746470Z tmp38 = tmp37 * tmp21 2023-01-11T21:38:06.7746557Z tmp39 = tmp36 + tmp38 2023-01-11T21:38:06.7746651Z tmp40 = tmp17.to(tl.float32) 2023-01-11T21:38:06.7746767Z tmp41 = tmp15 - tmp40 2023-01-11T21:38:06.7746882Z tmp42 = tmp19 - tmp41 2023-01-11T21:38:06.7746966Z tmp43 = tmp30 * tmp42 2023-01-11T21:38:06.7747037Z tmp44 = tmp39 * tmp41 2023-01-11T21:38:06.7747118Z tmp45 = tmp43 + tmp44 2023-01-11T21:38:06.7747261Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp45, xmask) 2023-01-11T21:38:06.7747348Z ''') 2023-01-11T21:38:06.7747353Z 2023-01-11T21:38:06.7747358Z 2023-01-11T21:38:06.7747670Z triton_fused_add_5_add_6_add_7_arange_2_arange_3_convert_element_type_4_convert_element_type_6_floor_2_floor_3_index_4_1 = async_compile.triton(''' 2023-01-11T21:38:06.7747782Z import triton 2023-01-11T21:38:06.7747876Z import triton.language as tl 2023-01-11T21:38:06.7747985Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7748093Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7748232Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7748363Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7748368Z 2023-01-11T21:38:06.7748789Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7748867Z @triton.jit 2023-01-11T21:38:06.7749005Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7749082Z xnumel = 44992 2023-01-11T21:38:06.7749173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7749308Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7749391Z xmask = xindex < xnumel 2023-01-11T21:38:06.7749477Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7749552Z x0 = xindex % 76 2023-01-11T21:38:06.7749633Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7749707Z x4 = xindex 2023-01-11T21:38:06.7749772Z tmp0 = x1 2023-01-11T21:38:06.7749855Z tmp1 = 0.4931506849315068 2023-01-11T21:38:06.7749935Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7750036Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7750125Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7750200Z tmp5 = x0 2023-01-11T21:38:06.7750283Z tmp6 = 0.49333333333333335 2023-01-11T21:38:06.7750354Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7750453Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7750539Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7750768Z tmp10 = tl.load(in_ptr0 + (tmp9 + (38*tmp4) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7750844Z tmp11 = 1.0 
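# [editorial annotation, not part of the captured log] bilinear path: tmp10 is
# the (floor_y, floor_x) neighbor; tmp12-tmp14 compute the vertical lerp weight
# (1 - frac_y), with frac_y = src_pos_y - floor(src_pos_y), and tmp15 applies it
# before the ceil-row neighbor is blended in and the result is lerped across
# columns by frac_x.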
2023-01-11T21:38:06.7750934Z tmp12 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7751047Z tmp13 = tmp2 - tmp12 2023-01-11T21:38:06.7751152Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7751233Z tmp15 = tmp10 * tmp14 2023-01-11T21:38:06.7751328Z tmp16 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7751433Z tmp17 = 36.0 2023-01-11T21:38:06.7751575Z tmp18 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp17, tmp16, tmp17)) 2023-01-11T21:38:06.7751663Z tmp19 = tmp18.to(tl.int32) 2023-01-11T21:38:06.7751889Z tmp20 = tl.load(in_ptr0 + (tmp9 + (38*tmp19) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7751963Z tmp21 = tmp20 * tmp13 2023-01-11T21:38:06.7752047Z tmp22 = tmp15 + tmp21 2023-01-11T21:38:06.7752135Z tmp23 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7752250Z tmp24 = tmp7 - tmp23 2023-01-11T21:38:06.7752363Z tmp25 = tmp11 - tmp24 2023-01-11T21:38:06.7752446Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.7752547Z tmp27 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7752613Z tmp28 = 37.0 2023-01-11T21:38:06.7752758Z tmp29 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 < tmp28, tmp27, tmp28)) 2023-01-11T21:38:06.7752846Z tmp30 = tmp29.to(tl.int32) 2023-01-11T21:38:06.7752973Z tmp31 = tl.load(in_ptr0 + (tmp30 + (38*tmp4) + (1406*x2)), xmask) 2023-01-11T21:38:06.7753055Z tmp32 = tmp31 * tmp14 2023-01-11T21:38:06.7753178Z tmp33 = tl.load(in_ptr0 + (tmp30 + (38*tmp19) + (1406*x2)), xmask) 2023-01-11T21:38:06.7753258Z tmp34 = tmp33 * tmp13 2023-01-11T21:38:06.7753330Z tmp35 = tmp32 + tmp34 2023-01-11T21:38:06.7753410Z tmp36 = tmp35 * tmp24 2023-01-11T21:38:06.7753490Z tmp37 = tmp26 + tmp36 2023-01-11T21:38:06.7753631Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp37, xmask) 2023-01-11T21:38:06.7753715Z ''') 2023-01-11T21:38:06.7753721Z 2023-01-11T21:38:06.7753726Z 2023-01-11T21:38:06.7753819Z async_compile.wait(globals()) 2023-01-11T21:38:06.7753927Z del async_compile 2023-01-11T21:38:06.7753932Z 2023-01-11T21:38:06.7754007Z def call(args): 2023-01-11T21:38:06.7754073Z arg0_1, = args 2023-01-11T21:38:06.7754147Z args.clear() 2023-01-11T21:38:06.7754240Z with torch.cuda.device(0): 2023-01-11T21:38:06.7754470Z buf0 = empty_strided((2, 4, 45, 45), (8100, 2025, 45, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7754561Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7754654Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7754864Z triton_fused_add_add_2_add_3_add_4_arange_convert_element_type_floor_index_index_1_index_2_0.run(buf2, arg0_1, 16200, grid=grid(16200), stream=stream0) 2023-01-11T21:38:06.7755082Z buf3 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7755184Z buf5 = buf3; del buf3 # reuse 2023-01-11T21:38:06.7755452Z triton_fused_add_5_add_6_add_7_arange_2_arange_3_convert_element_type_4_convert_element_type_6_floor_2_floor_3_index_4_1.run(buf5, arg0_1, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7755532Z del arg0_1 2023-01-11T21:38:06.7755615Z return (buf2, buf5, ) 2023-01-11T21:38:06.7755621Z 2023-01-11T21:38:06.7755625Z 2023-01-11T21:38:06.7755707Z if __name__ == "__main__": 2023-01-11T21:38:06.7755830Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7755960Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7756178Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7756291Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7756297Z 
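[Editorial annotation, not part of the captured log] The module dumped above is TorchInductor's generated code for the two bilinear resizes in this test: the first kernel maps source coordinates as (i + 0.5) * scale - 0.5 clamped at zero (scales 37/45 and 38/45, i.e. the align_corners=False convention for the 45x45 output), while the second uses i * (in - 1) / (out - 1) (the constants 0.4931... = 36/73 and 0.4933... = 37/75, i.e. align_corners=True for the 74x76 output). A minimal sketch of how one might cross-check such a dump against eager PyTorch, assuming the dumped code has been saved as a module named generated_mod (a hypothetical name, not from the log):

import torch
import torch.nn.functional as F
# import generated_mod  # hypothetical: the dumped code above saved as generated_mod.py

x = torch.randn(2, 4, 37, 38, device="cuda")
# Eager references matching the two coordinate conventions seen in the kernels.
ref_false = F.interpolate(x, size=(45, 45), mode="bilinear", align_corners=False)
ref_true = F.interpolate(x, size=(74, 76), mode="bilinear", align_corners=True)
# out_false, out_true = generated_mod.call([x])
# torch.testing.assert_close(out_false, ref_false)
# torch.testing.assert_close(out_true, ref_true)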
2023-01-11T21:38:06.7756571Z [2023-01-11 21:36:13,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 932 2023-01-11T21:38:06.7756577Z 2023-01-11T21:38:06.7756676Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7756755Z import torch 2023-01-11T21:38:06.7756830Z import random 2023-01-11T21:38:06.7756954Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7757079Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7757084Z 2023-01-11T21:38:06.7757159Z aten = torch.ops.aten 2023-01-11T21:38:06.7757328Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7757427Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7757432Z 2023-01-11T21:38:06.7757508Z import triton 2023-01-11T21:38:06.7757602Z import triton.language as tl 2023-01-11T21:38:06.7757730Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7757874Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7757879Z 2023-01-11T21:38:06.7757884Z 2023-01-11T21:38:06.7758207Z triton_fused_add_add_2_add_3_arange_convert_element_type_convert_element_type_1_convert_element_type_5_floor_index_index_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7758287Z import triton 2023-01-11T21:38:06.7758372Z import triton.language as tl 2023-01-11T21:38:06.7758490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7758594Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7758732Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7758862Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7758867Z 2023-01-11T21:38:06.7759274Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7759349Z @triton.jit 2023-01-11T21:38:06.7759483Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7759551Z xnumel = 16200 2023-01-11T21:38:06.7759650Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7759778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7759899Z xmask = xindex < xnumel 2023-01-11T21:38:06.7759985Z x1 = (xindex // 45) % 45 2023-01-11T21:38:06.7760066Z x0 = xindex % 45 2023-01-11T21:38:06.7760148Z x2 = (xindex // 2025) 2023-01-11T21:38:06.7760214Z x4 = xindex 2023-01-11T21:38:06.7760288Z tmp0 = x1 2023-01-11T21:38:06.7760368Z tmp1 = 0.5 2023-01-11T21:38:06.7760453Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.7760537Z tmp3 = 0.8222222222222222 2023-01-11T21:38:06.7760620Z tmp4 = tmp2 * tmp3 2023-01-11T21:38:06.7760725Z tmp5 = tmp4 - tmp1 2023-01-11T21:38:06.7760801Z tmp6 = 0.0 2023-01-11T21:38:06.7760942Z tmp7 = tl.where(tmp5 != tmp5, tmp5, tl.where(tmp5 > tmp6, tmp5, tmp6)) 2023-01-11T21:38:06.7761045Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7761134Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7761209Z tmp10 = x0 2023-01-11T21:38:06.7761292Z tmp11 = tmp10 + tmp1 2023-01-11T21:38:06.7761367Z tmp12 = 0.8444444444444444 2023-01-11T21:38:06.7761454Z tmp13 = tmp11 * tmp12 2023-01-11T21:38:06.7761570Z tmp14 = tmp13 - tmp1 2023-01-11T21:38:06.7761715Z tmp15 = tl.where(tmp14 != tmp14, tmp14, tl.where(tmp14 > tmp6, tmp14, tmp6)) 2023-01-11T21:38:06.7761820Z tmp16 = tl.libdevice.floor(tmp15) 2023-01-11T21:38:06.7761909Z tmp17 = 
tmp16.to(tl.int32) 2023-01-11T21:38:06.7762169Z tmp18 = tl.load(in_ptr0 + (tmp17 + (38*tmp9) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7762255Z tmp19 = tmp18.to(tl.float32) 2023-01-11T21:38:06.7762332Z tmp20 = 1.0 2023-01-11T21:38:06.7762422Z tmp21 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7762538Z tmp22 = tmp7 - tmp21 2023-01-11T21:38:06.7762656Z tmp23 = tmp20 - tmp22 2023-01-11T21:38:06.7762738Z tmp24 = tmp19 * tmp23 2023-01-11T21:38:06.7762838Z tmp25 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7762907Z tmp26 = 36.0 2023-01-11T21:38:06.7763051Z tmp27 = tl.where(tmp25 != tmp25, tmp25, tl.where(tmp25 < tmp26, tmp25, tmp26)) 2023-01-11T21:38:06.7763146Z tmp28 = tmp27.to(tl.int32) 2023-01-11T21:38:06.7763400Z tmp29 = tl.load(in_ptr0 + (tmp17 + (38*tmp28) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7763492Z tmp30 = tmp29.to(tl.float32) 2023-01-11T21:38:06.7763574Z tmp31 = tmp30 * tmp22 2023-01-11T21:38:06.7763683Z tmp32 = tmp24 + tmp31 2023-01-11T21:38:06.7763775Z tmp33 = tl.libdevice.ceil(tmp15) 2023-01-11T21:38:06.7763848Z tmp34 = 37.0 2023-01-11T21:38:06.7763990Z tmp35 = tl.where(tmp33 != tmp33, tmp33, tl.where(tmp33 < tmp34, tmp33, tmp34)) 2023-01-11T21:38:06.7764077Z tmp36 = tmp35.to(tl.int32) 2023-01-11T21:38:06.7764215Z tmp37 = tl.load(in_ptr0 + (tmp36 + (38*tmp9) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7764303Z tmp38 = tmp37.to(tl.float32) 2023-01-11T21:38:06.7764383Z tmp39 = tmp38 * tmp23 2023-01-11T21:38:06.7764510Z tmp40 = tl.load(in_ptr0 + (tmp36 + (38*tmp28) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7764601Z tmp41 = tmp40.to(tl.float32) 2023-01-11T21:38:06.7764682Z tmp42 = tmp41 * tmp22 2023-01-11T21:38:06.7764761Z tmp43 = tmp39 + tmp42 2023-01-11T21:38:06.7764847Z tmp44 = tmp17.to(tl.float32) 2023-01-11T21:38:06.7764962Z tmp45 = tmp15 - tmp44 2023-01-11T21:38:06.7765074Z tmp46 = tmp20 - tmp45 2023-01-11T21:38:06.7765149Z tmp47 = tmp32 * tmp46 2023-01-11T21:38:06.7765227Z tmp48 = tmp43 * tmp45 2023-01-11T21:38:06.7765306Z tmp49 = tmp47 + tmp48 2023-01-11T21:38:06.7765395Z tmp50 = tmp49.to(tl.float32) 2023-01-11T21:38:06.7765531Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp50, xmask) 2023-01-11T21:38:06.7765616Z ''') 2023-01-11T21:38:06.7765622Z 2023-01-11T21:38:06.7765626Z 2023-01-11T21:38:06.7765989Z triton_fused_add_5_add_6_arange_2_arange_3_convert_element_type_11_convert_element_type_6_convert_element_type_7_convert_element_type_9_floor_2_floor_3_1 = async_compile.triton(''' 2023-01-11T21:38:06.7766066Z import triton 2023-01-11T21:38:06.7766182Z import triton.language as tl 2023-01-11T21:38:06.7766301Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7766404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7766538Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7766669Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7766676Z 2023-01-11T21:38:06.7767088Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7767161Z @triton.jit 2023-01-11T21:38:06.7767295Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7767363Z xnumel = 44992 2023-01-11T21:38:06.7767465Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7767599Z 
xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7767687Z xmask = xindex < xnumel 2023-01-11T21:38:06.7767772Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7767848Z x0 = xindex % 76 2023-01-11T21:38:06.7767927Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7767990Z x4 = xindex 2023-01-11T21:38:06.7768064Z tmp0 = x1 2023-01-11T21:38:06.7768145Z tmp1 = 0.4931506849315068 2023-01-11T21:38:06.7768228Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7768330Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7768416Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7768481Z tmp5 = x0 2023-01-11T21:38:06.7768562Z tmp6 = 0.49333333333333335 2023-01-11T21:38:06.7768641Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7768739Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7768823Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7769075Z tmp10 = tl.load(in_ptr0 + (tmp9 + (38*tmp4) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7769166Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7769235Z tmp12 = 1.0 2023-01-11T21:38:06.7769323Z tmp13 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7769435Z tmp14 = tmp2 - tmp13 2023-01-11T21:38:06.7769551Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7769632Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.7769729Z tmp17 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7769801Z tmp18 = 36.0 2023-01-11T21:38:06.7769965Z tmp19 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 < tmp18, tmp17, tmp18)) 2023-01-11T21:38:06.7770054Z tmp20 = tmp19.to(tl.int32) 2023-01-11T21:38:06.7770302Z tmp21 = tl.load(in_ptr0 + (tmp9 + (38*tmp20) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7770391Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.7770474Z tmp23 = tmp22 * tmp14 2023-01-11T21:38:06.7770555Z tmp24 = tmp16 + tmp23 2023-01-11T21:38:06.7770642Z tmp25 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7770746Z tmp26 = tmp7 - tmp25 2023-01-11T21:38:06.7770859Z tmp27 = tmp12 - tmp26 2023-01-11T21:38:06.7770943Z tmp28 = tmp24 * tmp27 2023-01-11T21:38:06.7771040Z tmp29 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7771113Z tmp30 = 37.0 2023-01-11T21:38:06.7771256Z tmp31 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 < tmp30, tmp29, tmp30)) 2023-01-11T21:38:06.7771343Z tmp32 = tmp31.to(tl.int32) 2023-01-11T21:38:06.7771478Z tmp33 = tl.load(in_ptr0 + (tmp32 + (38*tmp4) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7771566Z tmp34 = tmp33.to(tl.float32) 2023-01-11T21:38:06.7771646Z tmp35 = tmp34 * tmp15 2023-01-11T21:38:06.7771781Z tmp36 = tl.load(in_ptr0 + (tmp32 + (38*tmp20) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7771868Z tmp37 = tmp36.to(tl.float32) 2023-01-11T21:38:06.7771948Z tmp38 = tmp37 * tmp14 2023-01-11T21:38:06.7772030Z tmp39 = tmp35 + tmp38 2023-01-11T21:38:06.7772102Z tmp40 = tmp39 * tmp26 2023-01-11T21:38:06.7772183Z tmp41 = tmp28 + tmp40 2023-01-11T21:38:06.7772272Z tmp42 = tmp41.to(tl.float32) 2023-01-11T21:38:06.7772441Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp42, xmask) 2023-01-11T21:38:06.7772527Z ''') 2023-01-11T21:38:06.7772533Z 2023-01-11T21:38:06.7772537Z 2023-01-11T21:38:06.7772632Z async_compile.wait(globals()) 2023-01-11T21:38:06.7772711Z del async_compile 2023-01-11T21:38:06.7772716Z 2023-01-11T21:38:06.7772793Z def call(args): 2023-01-11T21:38:06.7772859Z arg0_1, = args 2023-01-11T21:38:06.7772933Z args.clear() 2023-01-11T21:38:06.7773024Z with torch.cuda.device(0): 2023-01-11T21:38:06.7773255Z buf2 = empty_strided((2, 4, 45, 45), (8100, 2025, 
45, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7773350Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7773591Z triton_fused_add_add_2_add_3_arange_convert_element_type_convert_element_type_1_convert_element_type_5_floor_index_index_1_0.run(arg0_1, buf2, 16200, grid=grid(16200), stream=stream0) 2023-01-11T21:38:06.7773819Z buf5 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7774084Z triton_fused_add_5_add_6_arange_2_arange_3_convert_element_type_11_convert_element_type_6_convert_element_type_7_convert_element_type_9_floor_2_floor_3_1.run(arg0_1, buf5, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7774152Z del arg0_1 2023-01-11T21:38:06.7774239Z return (buf2, buf5, ) 2023-01-11T21:38:06.7774245Z 2023-01-11T21:38:06.7774249Z 2023-01-11T21:38:06.7774329Z if __name__ == "__main__": 2023-01-11T21:38:06.7774452Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7774819Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7775053Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7775165Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7775170Z 2023-01-11T21:38:06.7775240Z ok (2.030s) 2023-01-11T21:38:06.7775809Z test_upsample_bilinear2d_b_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7775946Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7776201Z [2023-01-11 21:36:13,857] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 933 2023-01-11T21:38:06.7776441Z [2023-01-11 21:36:13,926] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7776672Z [2023-01-11 21:36:13,941] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7776881Z [2023-01-11 21:36:13,949] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7777198Z [2023-01-11 21:36:14,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 933 2023-01-11T21:38:06.7777631Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7777764Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7778018Z [2023-01-11 21:36:14,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 934 2023-01-11T21:38:06.7778259Z [2023-01-11 21:36:14,515] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7778491Z [2023-01-11 21:36:14,524] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.7778747Z [2023-01-11 21:36:14,532] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7778954Z [2023-01-11 21:36:14,532] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.7779221Z [2023-01-11 21:36:14,729] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 934 2023-01-11T21:38:06.7779229Z 2023-01-11T21:38:06.7779330Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7779411Z import torch 2023-01-11T21:38:06.7779489Z import random 2023-01-11T21:38:06.7779612Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7779731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7779745Z 2023-01-11T21:38:06.7779823Z aten = torch.ops.aten 2023-01-11T21:38:06.7779965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7780064Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7780069Z 2023-01-11T21:38:06.7780146Z import triton 2023-01-11T21:38:06.7780245Z import triton.language as tl 2023-01-11T21:38:06.7780373Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7780515Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7780521Z 2023-01-11T21:38:06.7780525Z 2023-01-11T21:38:06.7780830Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0 = async_compile.triton(''' 2023-01-11T21:38:06.7780902Z import triton 2023-01-11T21:38:06.7780997Z import triton.language as tl 2023-01-11T21:38:06.7781117Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7781222Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7781358Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7781487Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7781493Z 2023-01-11T21:38:06.7781916Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7781998Z @triton.jit 2023-01-11T21:38:06.7782138Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7782218Z xnumel = 18880 2023-01-11T21:38:06.7782353Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7782492Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7782580Z xmask = xindex < xnumel 2023-01-11T21:38:06.7782666Z x1 = (xindex // 118) % 80 2023-01-11T21:38:06.7782746Z x0 = xindex % 118 2023-01-11T21:38:06.7782820Z x2 = (xindex // 9440) 2023-01-11T21:38:06.7782891Z x4 = xindex 2023-01-11T21:38:06.7782962Z tmp0 = x1 2023-01-11T21:38:06.7783041Z tmp1 = 0.4936708860759494 2023-01-11T21:38:06.7783121Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7783226Z tmp3 = 
tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7783314Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7783382Z tmp5 = x0 2023-01-11T21:38:06.7783465Z tmp6 = 0.49572649572649574 2023-01-11T21:38:06.7783545Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7783647Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7783735Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7784000Z tmp10 = tl.load(in_ptr0 + (tmp9 + (59*tmp4) + (2360*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7784068Z tmp11 = 1.0 2023-01-11T21:38:06.7784162Z tmp12 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7784282Z tmp13 = tmp2 - tmp12 2023-01-11T21:38:06.7784406Z tmp14 = tmp11 - tmp13 2023-01-11T21:38:06.7784487Z tmp15 = tmp10 * tmp14 2023-01-11T21:38:06.7784590Z tmp16 = tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7784665Z tmp17 = 39.0 2023-01-11T21:38:06.7784810Z tmp18 = tl.where(tmp16 != tmp16, tmp16, tl.where(tmp16 < tmp17, tmp16, tmp17)) 2023-01-11T21:38:06.7784900Z tmp19 = tmp18.to(tl.int32) 2023-01-11T21:38:06.7785160Z tmp20 = tl.load(in_ptr0 + (tmp9 + (59*tmp19) + (2360*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7785273Z tmp21 = tmp20 * tmp13 2023-01-11T21:38:06.7785353Z tmp22 = tmp15 + tmp21 2023-01-11T21:38:06.7785443Z tmp23 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7785562Z tmp24 = tmp7 - tmp23 2023-01-11T21:38:06.7785675Z tmp25 = tmp11 - tmp24 2023-01-11T21:38:06.7785759Z tmp26 = tmp22 * tmp25 2023-01-11T21:38:06.7785861Z tmp27 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7785935Z tmp28 = 58.0 2023-01-11T21:38:06.7786088Z tmp29 = tl.where(tmp27 != tmp27, tmp27, tl.where(tmp27 < tmp28, tmp27, tmp28)) 2023-01-11T21:38:06.7786178Z tmp30 = tmp29.to(tl.int32) 2023-01-11T21:38:06.7786310Z tmp31 = tl.load(in_ptr0 + (tmp30 + (59*tmp4) + (2360*x2)), xmask) 2023-01-11T21:38:06.7786386Z tmp32 = tmp31 * tmp14 2023-01-11T21:38:06.7786517Z tmp33 = tl.load(in_ptr0 + (tmp30 + (59*tmp19) + (2360*x2)), xmask) 2023-01-11T21:38:06.7786597Z tmp34 = tmp33 * tmp13 2023-01-11T21:38:06.7786682Z tmp35 = tmp32 + tmp34 2023-01-11T21:38:06.7786763Z tmp36 = tmp35 * tmp24 2023-01-11T21:38:06.7786844Z tmp37 = tmp26 + tmp36 2023-01-11T21:38:06.7786995Z tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp37, xmask) 2023-01-11T21:38:06.7787076Z ''') 2023-01-11T21:38:06.7787089Z 2023-01-11T21:38:06.7787093Z 2023-01-11T21:38:06.7787187Z async_compile.wait(globals()) 2023-01-11T21:38:06.7787266Z del async_compile 2023-01-11T21:38:06.7787271Z 2023-01-11T21:38:06.7787346Z def call(args): 2023-01-11T21:38:06.7787421Z arg0_1, = args 2023-01-11T21:38:06.7787498Z args.clear() 2023-01-11T21:38:06.7787594Z with torch.cuda.device(0): 2023-01-11T21:38:06.7787860Z buf0 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7787944Z buf2 = buf0; del buf0 # reuse 2023-01-11T21:38:06.7788036Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7788262Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0.run(buf2, arg0_1, 18880, grid=grid(18880), stream=stream0) 2023-01-11T21:38:06.7788341Z del arg0_1 2023-01-11T21:38:06.7788420Z return (buf2, ) 2023-01-11T21:38:06.7788425Z 2023-01-11T21:38:06.7788429Z 2023-01-11T21:38:06.7788509Z if __name__ == "__main__": 2023-01-11T21:38:06.7788654Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7788784Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7789004Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cuda:0', 
dtype=torch.float32) 2023-01-11T21:38:06.7789115Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7789120Z 2023-01-11T21:38:06.7789125Z 2023-01-11T21:38:06.7789223Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7789299Z import torch 2023-01-11T21:38:06.7789373Z import random 2023-01-11T21:38:06.7789491Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7789621Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7789626Z 2023-01-11T21:38:06.7789710Z aten = torch.ops.aten 2023-01-11T21:38:06.7789839Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7789936Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7789941Z 2023-01-11T21:38:06.7790015Z import triton 2023-01-11T21:38:06.7790112Z import triton.language as tl 2023-01-11T21:38:06.7790238Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7790377Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7790383Z 2023-01-11T21:38:06.7790387Z 2023-01-11T21:38:06.7790741Z triton_fused_add_add_1_arange_arange_1_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_5_floor_floor_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.7790817Z import triton 2023-01-11T21:38:06.7790902Z import triton.language as tl 2023-01-11T21:38:06.7791017Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7791152Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7791287Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7791412Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7791418Z 2023-01-11T21:38:06.7791830Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7791906Z @triton.jit 2023-01-11T21:38:06.7792040Z def triton_(in_ptr0, out_ptr2, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7792108Z xnumel = 18880 2023-01-11T21:38:06.7792204Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7792334Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7792419Z xmask = xindex < xnumel 2023-01-11T21:38:06.7792501Z x1 = (xindex // 118) % 80 2023-01-11T21:38:06.7792584Z x0 = xindex % 118 2023-01-11T21:38:06.7792664Z x2 = (xindex // 9440) 2023-01-11T21:38:06.7792728Z x4 = xindex 2023-01-11T21:38:06.7792798Z tmp0 = x1 2023-01-11T21:38:06.7792876Z tmp1 = 0.4936708860759494 2023-01-11T21:38:06.7792956Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7793055Z tmp3 = tl.libdevice.floor(tmp2) 2023-01-11T21:38:06.7793144Z tmp4 = tmp3.to(tl.int32) 2023-01-11T21:38:06.7793215Z tmp5 = x0 2023-01-11T21:38:06.7793287Z tmp6 = 0.49572649572649574 2023-01-11T21:38:06.7793366Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.7793465Z tmp8 = tl.libdevice.floor(tmp7) 2023-01-11T21:38:06.7793550Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7793804Z tmp10 = tl.load(in_ptr0 + (tmp9 + (59*tmp4) + (2360*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7793896Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.7793968Z tmp12 = 1.0 2023-01-11T21:38:06.7794048Z tmp13 = tmp4.to(tl.float32) 2023-01-11T21:38:06.7794163Z tmp14 = tmp2 - tmp13 2023-01-11T21:38:06.7794277Z tmp15 = tmp12 - tmp14 2023-01-11T21:38:06.7794356Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.7794454Z tmp17 = 
tl.libdevice.ceil(tmp2) 2023-01-11T21:38:06.7794526Z tmp18 = 39.0 2023-01-11T21:38:06.7794667Z tmp19 = tl.where(tmp17 != tmp17, tmp17, tl.where(tmp17 < tmp18, tmp17, tmp18)) 2023-01-11T21:38:06.7794774Z tmp20 = tmp19.to(tl.int32) 2023-01-11T21:38:06.7795026Z tmp21 = tl.load(in_ptr0 + (tmp9 + (59*tmp20) + (2360*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7795116Z tmp22 = tmp21.to(tl.float32) 2023-01-11T21:38:06.7795196Z tmp23 = tmp22 * tmp14 2023-01-11T21:38:06.7795276Z tmp24 = tmp16 + tmp23 2023-01-11T21:38:06.7795364Z tmp25 = tmp9.to(tl.float32) 2023-01-11T21:38:06.7795477Z tmp26 = tmp7 - tmp25 2023-01-11T21:38:06.7795582Z tmp27 = tmp12 - tmp26 2023-01-11T21:38:06.7795662Z tmp28 = tmp24 * tmp27 2023-01-11T21:38:06.7795759Z tmp29 = tl.libdevice.ceil(tmp7) 2023-01-11T21:38:06.7795837Z tmp30 = 58.0 2023-01-11T21:38:06.7795981Z tmp31 = tl.where(tmp29 != tmp29, tmp29, tl.where(tmp29 < tmp30, tmp29, tmp30)) 2023-01-11T21:38:06.7796067Z tmp32 = tmp31.to(tl.int32) 2023-01-11T21:38:06.7796207Z tmp33 = tl.load(in_ptr0 + (tmp32 + (59*tmp4) + (2360*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7796290Z tmp34 = tmp33.to(tl.float32) 2023-01-11T21:38:06.7796369Z tmp35 = tmp34 * tmp15 2023-01-11T21:38:06.7796505Z tmp36 = tl.load(in_ptr0 + (tmp32 + (59*tmp20) + (2360*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7796593Z tmp37 = tmp36.to(tl.float32) 2023-01-11T21:38:06.7796673Z tmp38 = tmp37 * tmp14 2023-01-11T21:38:06.7796752Z tmp39 = tmp35 + tmp38 2023-01-11T21:38:06.7796832Z tmp40 = tmp39 * tmp26 2023-01-11T21:38:06.7796903Z tmp41 = tmp28 + tmp40 2023-01-11T21:38:06.7796992Z tmp42 = tmp41.to(tl.float32) 2023-01-11T21:38:06.7797128Z tl.store(out_ptr2 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp42, xmask) 2023-01-11T21:38:06.7797277Z ''') 2023-01-11T21:38:06.7797282Z 2023-01-11T21:38:06.7797287Z 2023-01-11T21:38:06.7797382Z async_compile.wait(globals()) 2023-01-11T21:38:06.7797460Z del async_compile 2023-01-11T21:38:06.7797466Z 2023-01-11T21:38:06.7797540Z def call(args): 2023-01-11T21:38:06.7797605Z arg0_1, = args 2023-01-11T21:38:06.7797680Z args.clear() 2023-01-11T21:38:06.7797775Z with torch.cuda.device(0): 2023-01-11T21:38:06.7798010Z buf2 = empty_strided((1, 2, 80, 118), (18880, 9440, 118, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7798103Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7798361Z triton_fused_add_add_1_arange_arange_1_convert_element_type_convert_element_type_1_convert_element_type_3_convert_element_type_5_floor_floor_1_0.run(arg0_1, buf2, 18880, grid=grid(18880), stream=stream0) 2023-01-11T21:38:06.7798437Z del arg0_1 2023-01-11T21:38:06.7798515Z return (buf2, ) 2023-01-11T21:38:06.7798520Z 2023-01-11T21:38:06.7798527Z 2023-01-11T21:38:06.7798600Z if __name__ == "__main__": 2023-01-11T21:38:06.7798720Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7798848Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7799077Z arg0_1 = rand_strided((1, 2, 40, 59), (4720, 2360, 59, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7799194Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7799200Z 2023-01-11T21:38:06.7799270Z ok (1.149s) 2023-01-11T21:38:06.7799737Z test_upsample_nearest1d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7799869Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7800133Z [2023-01-11 21:36:15,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 935 2023-01-11T21:38:06.7800396Z [2023-01-11 21:36:15,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 935 2023-01-11T21:38:06.7800831Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7800964Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7801223Z [2023-01-11 21:36:15,538] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 936 2023-01-11T21:38:06.7801228Z 2023-01-11T21:38:06.7801326Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7801406Z import torch 2023-01-11T21:38:06.7801483Z import random 2023-01-11T21:38:06.7801603Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7801731Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7801736Z 2023-01-11T21:38:06.7801823Z aten = torch.ops.aten 2023-01-11T21:38:06.7801956Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7802053Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7802058Z 2023-01-11T21:38:06.7802132Z import triton 2023-01-11T21:38:06.7802227Z import triton.language as tl 2023-01-11T21:38:06.7802351Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7802491Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7802497Z 2023-01-11T21:38:06.7802501Z 2023-01-11T21:38:06.7802721Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7802798Z import triton 2023-01-11T21:38:06.7802911Z import triton.language as tl 2023-01-11T21:38:06.7803028Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7803129Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7803264Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7803391Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7803399Z 2023-01-11T21:38:06.7803826Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7803901Z @triton.jit 2023-01-11T21:38:06.7804045Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7804113Z xnumel = 592 2023-01-11T21:38:06.7804210Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7804340Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7804431Z xmask = xindex < xnumel 2023-01-11T21:38:06.7804508Z x0 = xindex % 74 2023-01-11T21:38:06.7804589Z x1 = (xindex // 74) 2023-01-11T21:38:06.7804661Z x2 = xindex 2023-01-11T21:38:06.7804725Z tmp0 = x0 2023-01-11T21:38:06.7804799Z tmp1 = 0.5 2023-01-11T21:38:06.7804878Z tmp2 = tmp0 * tmp1 
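# Nearest-neighbor source index: scale the output coordinate x0 by
# in_size/out_size (37/74 = 0.5), then truncate to int32 below to select
# the input element. The two loads that follow read the same address; the
# fused graph produces two identical outputs (out_ptr0 and out_ptr1).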
2023-01-11T21:38:06.7804967Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7805175Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7805284Z tmp5 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask) 2023-01-11T21:38:06.7805420Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7805546Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7805630Z ''') 2023-01-11T21:38:06.7805636Z 2023-01-11T21:38:06.7805640Z 2023-01-11T21:38:06.7805827Z triton_fused_upsample_nearest1d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7805902Z import triton 2023-01-11T21:38:06.7806000Z import triton.language as tl 2023-01-11T21:38:06.7806115Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7806217Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7806342Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7806468Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7806502Z 2023-01-11T21:38:06.7806911Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7806987Z @triton.jit 2023-01-11T21:38:06.7807120Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7807196Z xnumel = 560 2023-01-11T21:38:06.7807293Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7807422Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7807501Z xmask = xindex < xnumel 2023-01-11T21:38:06.7807577Z x0 = xindex % 70 2023-01-11T21:38:06.7807656Z x1 = (xindex // 70) 2023-01-11T21:38:06.7807727Z x2 = xindex 2023-01-11T21:38:06.7807797Z tmp0 = x0 2023-01-11T21:38:06.7807878Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7807957Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7808038Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7808244Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7808378Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7808463Z ''') 2023-01-11T21:38:06.7808468Z 2023-01-11T21:38:06.7808472Z 2023-01-11T21:38:06.7808656Z triton_fused_upsample_nearest1d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7808730Z import triton 2023-01-11T21:38:06.7808821Z import triton.language as tl 2023-01-11T21:38:06.7808927Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7809029Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7809189Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7809316Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7809321Z 2023-01-11T21:38:06.7809723Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7809798Z @triton.jit 2023-01-11T21:38:06.7809928Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7810003Z xnumel = 360 2023-01-11T21:38:06.7810091Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7810221Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7810304Z xmask = xindex < xnumel 
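# xmask disables lanes whose flat index falls past xnumel, so a partial
# final block neither reads nor writes out of bounds.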
2023-01-11T21:38:06.7810378Z x0 = xindex % 45 2023-01-11T21:38:06.7810457Z x1 = (xindex // 45) 2023-01-11T21:38:06.7810530Z x2 = xindex 2023-01-11T21:38:06.7810605Z tmp0 = x0 2023-01-11T21:38:06.7810677Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7810757Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7810843Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7811050Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7811187Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7811272Z ''') 2023-01-11T21:38:06.7811277Z 2023-01-11T21:38:06.7811281Z 2023-01-11T21:38:06.7811464Z triton_fused_upsample_nearest1d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7811541Z import triton 2023-01-11T21:38:06.7811626Z import triton.language as tl 2023-01-11T21:38:06.7811741Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7811842Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7811975Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7812099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7812107Z 2023-01-11T21:38:06.7812513Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7812589Z @triton.jit 2023-01-11T21:38:06.7812748Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7812817Z xnumel = 288 2023-01-11T21:38:06.7812914Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7813043Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7813126Z xmask = xindex < xnumel 2023-01-11T21:38:06.7813204Z x0 = xindex % 36 2023-01-11T21:38:06.7813283Z x1 = (xindex // 36) 2023-01-11T21:38:06.7813355Z x2 = xindex 2023-01-11T21:38:06.7813419Z tmp0 = x0 2023-01-11T21:38:06.7813498Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7813576Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7813666Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7813777Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask) 2023-01-11T21:38:06.7813912Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7813999Z ''') 2023-01-11T21:38:06.7814005Z 2023-01-11T21:38:06.7814009Z 2023-01-11T21:38:06.7814097Z async_compile.wait(globals()) 2023-01-11T21:38:06.7814174Z del async_compile 2023-01-11T21:38:06.7814181Z 2023-01-11T21:38:06.7814256Z def call(args): 2023-01-11T21:38:06.7814331Z arg0_1, = args 2023-01-11T21:38:06.7814407Z args.clear() 2023-01-11T21:38:06.7814704Z with torch.cuda.device(0): 2023-01-11T21:38:06.7814926Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815129Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815224Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7815410Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0.run(arg0_1, buf0, buf4, 592, grid=grid(592), stream=stream0) 2023-01-11T21:38:06.7815667Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7815828Z triton_fused_upsample_nearest1d_1_1.run(arg0_1, buf1, 560, grid=grid(560), stream=stream0) 2023-01-11T21:38:06.7816041Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7816200Z 
triton_fused_upsample_nearest1d_2_2.run(arg0_1, buf2, 360, grid=grid(360), stream=stream0) 2023-01-11T21:38:06.7816404Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7816561Z triton_fused_upsample_nearest1d_3_3.run(arg0_1, buf3, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.7816630Z del arg0_1 2023-01-11T21:38:06.7816737Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7816742Z 2023-01-11T21:38:06.7816751Z 2023-01-11T21:38:06.7816837Z if __name__ == "__main__": 2023-01-11T21:38:06.7816958Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7817088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7817367Z arg0_1 = rand_strided((2, 4, 37), (148, 37, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7817485Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7817492Z 2023-01-11T21:38:06.7817759Z [2023-01-11 21:36:15,656] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 936 2023-01-11T21:38:06.7817765Z 2023-01-11T21:38:06.7817858Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7817935Z import torch 2023-01-11T21:38:06.7818012Z import random 2023-01-11T21:38:06.7818134Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7818263Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7818268Z 2023-01-11T21:38:06.7818353Z aten = torch.ops.aten 2023-01-11T21:38:06.7818496Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7818599Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7818604Z 2023-01-11T21:38:06.7818674Z import triton 2023-01-11T21:38:06.7818771Z import triton.language as tl 2023-01-11T21:38:06.7818898Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7819079Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7819085Z 2023-01-11T21:38:06.7819090Z 2023-01-11T21:38:06.7819316Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7819393Z import triton 2023-01-11T21:38:06.7819493Z import triton.language as tl 2023-01-11T21:38:06.7819610Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7819708Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7819845Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7819972Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7819981Z 2023-01-11T21:38:06.7820404Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7820480Z @triton.jit 2023-01-11T21:38:06.7820629Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7820706Z xnumel = 592 2023-01-11T21:38:06.7820808Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7820934Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7821024Z xmask = xindex < xnumel 2023-01-11T21:38:06.7821103Z x0 = xindex % 74 2023-01-11T21:38:06.7821186Z x1 = (xindex // 74) 2023-01-11T21:38:06.7821261Z x2 = xindex 2023-01-11T21:38:06.7821334Z tmp0 = x0 2023-01-11T21:38:06.7821402Z tmp1 = 0.5 2023-01-11T21:38:06.7821483Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7821606Z tmp3 = tmp2.to(tl.int32) 
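# fp16 variant of the same nearest-1d kernel: loaded values are widened
# to fp32 via .to(tl.float32), and the store to the fp16 output pointer
# narrows them back.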
2023-01-11T21:38:06.7821841Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7821973Z tmp5 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.7822113Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7822252Z tl.store(out_ptr1 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.7822338Z ''') 2023-01-11T21:38:06.7822344Z 2023-01-11T21:38:06.7822348Z 2023-01-11T21:38:06.7822529Z triton_fused_upsample_nearest1d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7822607Z import triton 2023-01-11T21:38:06.7822701Z import triton.language as tl 2023-01-11T21:38:06.7822819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7822926Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7823061Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7823196Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7823201Z 2023-01-11T21:38:06.7823605Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7823678Z @triton.jit 2023-01-11T21:38:06.7823812Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7823891Z xnumel = 560 2023-01-11T21:38:06.7823992Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7824123Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7824210Z xmask = xindex < xnumel 2023-01-11T21:38:06.7824288Z x0 = xindex % 70 2023-01-11T21:38:06.7824363Z x1 = (xindex // 70) 2023-01-11T21:38:06.7824436Z x2 = xindex 2023-01-11T21:38:06.7824510Z tmp0 = x0 2023-01-11T21:38:06.7824593Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7824677Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7824765Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7824996Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7825125Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7825211Z ''') 2023-01-11T21:38:06.7825246Z 2023-01-11T21:38:06.7825251Z 2023-01-11T21:38:06.7825467Z triton_fused_upsample_nearest1d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7825564Z import triton 2023-01-11T21:38:06.7825663Z import triton.language as tl 2023-01-11T21:38:06.7825780Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7825889Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7826023Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7826144Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7826149Z 2023-01-11T21:38:06.7826550Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7826628Z @triton.jit 2023-01-11T21:38:06.7826762Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7826841Z xnumel = 360 2023-01-11T21:38:06.7826938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7827068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7827156Z xmask = xindex < xnumel 2023-01-11T21:38:06.7827227Z 
x0 = xindex % 45 2023-01-11T21:38:06.7827309Z x1 = (xindex // 45) 2023-01-11T21:38:06.7827384Z x2 = xindex 2023-01-11T21:38:06.7827459Z tmp0 = x0 2023-01-11T21:38:06.7827542Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7827625Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7827707Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7827937Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7828099Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7828188Z ''') 2023-01-11T21:38:06.7828193Z 2023-01-11T21:38:06.7828198Z 2023-01-11T21:38:06.7828385Z triton_fused_upsample_nearest1d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7828464Z import triton 2023-01-11T21:38:06.7828558Z import triton.language as tl 2023-01-11T21:38:06.7828675Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7828772Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7828909Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7829035Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7829040Z 2023-01-11T21:38:06.7829444Z @pointwise(size_hints=[512], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7829524Z @triton.jit 2023-01-11T21:38:06.7829658Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7829736Z xnumel = 288 2023-01-11T21:38:06.7829835Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7829962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7830047Z xmask = xindex < xnumel 2023-01-11T21:38:06.7830125Z x0 = xindex % 36 2023-01-11T21:38:06.7830206Z x1 = (xindex // 36) 2023-01-11T21:38:06.7830280Z x2 = xindex 2023-01-11T21:38:06.7830355Z tmp0 = x0 2023-01-11T21:38:06.7830437Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7830511Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7830599Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7830729Z tmp4 = tl.load(in_ptr0 + (tmp3 + (37*x1)), xmask).to(tl.float32) 2023-01-11T21:38:06.7830867Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.7830958Z ''') 2023-01-11T21:38:06.7830963Z 2023-01-11T21:38:06.7830967Z 2023-01-11T21:38:06.7831064Z async_compile.wait(globals()) 2023-01-11T21:38:06.7831147Z del async_compile 2023-01-11T21:38:06.7831152Z 2023-01-11T21:38:06.7831222Z def call(args): 2023-01-11T21:38:06.7831297Z arg0_1, = args 2023-01-11T21:38:06.7831374Z args.clear() 2023-01-11T21:38:06.7831496Z with torch.cuda.device(0): 2023-01-11T21:38:06.7831708Z buf0 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7831913Z buf4 = empty_strided((2, 4, 74), (296, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832006Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7832193Z triton_fused_upsample_nearest1d_upsample_nearest1d_4_0.run(arg0_1, buf0, buf4, 592, grid=grid(592), stream=stream0) 2023-01-11T21:38:06.7832389Z buf1 = empty_strided((2, 4, 70), (280, 70, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832546Z triton_fused_upsample_nearest1d_1_1.run(arg0_1, buf1, 560, grid=grid(560), stream=stream0) 2023-01-11T21:38:06.7832752Z buf2 = empty_strided((2, 4, 45), (180, 45, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7832907Z 
triton_fused_upsample_nearest1d_2_2.run(arg0_1, buf2, 360, grid=grid(360), stream=stream0) 2023-01-11T21:38:06.7833110Z buf3 = empty_strided((2, 4, 36), (144, 36, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7833261Z triton_fused_upsample_nearest1d_3_3.run(arg0_1, buf3, 288, grid=grid(288), stream=stream0) 2023-01-11T21:38:06.7833338Z del arg0_1 2023-01-11T21:38:06.7833441Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7833447Z 2023-01-11T21:38:06.7833451Z 2023-01-11T21:38:06.7833524Z if __name__ == "__main__": 2023-01-11T21:38:06.7833641Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7833767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7833977Z arg0_1 = rand_strided((2, 4, 37), (148, 37, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7834118Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7834123Z 2023-01-11T21:38:06.7834194Z ok (0.928s) 2023-01-11T21:38:06.7834672Z test_upsample_nearest2d_backward_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7834803Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7835062Z [2023-01-11 21:36:15,683] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 937 2023-01-11T21:38:06.7835069Z 2023-01-11T21:38:06.7835166Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7835237Z import torch 2023-01-11T21:38:06.7835313Z import random 2023-01-11T21:38:06.7835434Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7835557Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7835562Z 2023-01-11T21:38:06.7835646Z aten = torch.ops.aten 2023-01-11T21:38:06.7835788Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7835884Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7835889Z 2023-01-11T21:38:06.7835956Z import triton 2023-01-11T21:38:06.7836049Z import triton.language as tl 2023-01-11T21:38:06.7836175Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7836313Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7836318Z 2023-01-11T21:38:06.7836323Z 2023-01-11T21:38:06.7836527Z triton_fused_upsample_nearest2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.7836603Z import triton 2023-01-11T21:38:06.7836696Z import triton.language as tl 2023-01-11T21:38:06.7836812Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7836906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7837037Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7837162Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7837167Z 2023-01-11T21:38:06.7837595Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7837670Z @triton.jit 2023-01-11T21:38:06.7837802Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7837876Z xnumel 
= 162 2023-01-11T21:38:06.7837972Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7838093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7838175Z xmask = xindex < xnumel 2023-01-11T21:38:06.7838253Z x0 = xindex % 6 2023-01-11T21:38:06.7838331Z x1 = (xindex // 6) 2023-01-11T21:38:06.7838401Z x2 = xindex 2023-01-11T21:38:06.7838608Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7838819Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839020Z tmp3 = tl.load(in_ptr0 + (12 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839227Z tmp5 = tl.load(in_ptr0 + (13 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7839307Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.7839386Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.7839465Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.7839539Z tmp7 = 1.0 2023-01-11T21:38:06.7839616Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.7839743Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7839831Z ''') 2023-01-11T21:38:06.7839863Z 2023-01-11T21:38:06.7839867Z 2023-01-11T21:38:06.7840069Z triton_fused_upsample_nearest2d_backward_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7840144Z import triton 2023-01-11T21:38:06.7840237Z import triton.language as tl 2023-01-11T21:38:06.7840353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7840457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7840589Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7840707Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7840712Z 2023-01-11T21:38:06.7841114Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7841186Z @triton.jit 2023-01-11T21:38:06.7841318Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7841393Z xnumel = 180 2023-01-11T21:38:06.7841494Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7841622Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7841704Z xmask = xindex < xnumel 2023-01-11T21:38:06.7841779Z x1 = (xindex // 5) % 4 2023-01-11T21:38:06.7841853Z x0 = xindex % 5 2023-01-11T21:38:06.7841934Z x2 = (xindex // 20) 2023-01-11T21:38:06.7842008Z x4 = xindex 2023-01-11T21:38:06.7842089Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7842171Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7842243Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7842327Z tmp3 = ((4 + (12*x0)) // 5) 2023-01-11T21:38:06.7842409Z tmp4 = ((16 + (12*x0)) // 5) 2023-01-11T21:38:06.7842490Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7842567Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7842888Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7842989Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7843073Z tmp9 = 1 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7843146Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7843226Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7843580Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], 
tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7843679Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7843759Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7843843Z tmp15 = 2 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7843924Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.7843997Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.7844322Z tmp18 = tl.load(in_ptr0 + (2 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7844420Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7844503Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7844588Z tmp21 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7844668Z tmp22 = tmp21 < tmp1 2023-01-11T21:38:06.7844748Z tmp23 = tmp22 & tmp5 2023-01-11T21:38:06.7845071Z tmp24 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7845160Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.7845240Z tmp26 = tmp25 + tmp20 2023-01-11T21:38:06.7845319Z tmp27 = tmp22 & tmp10 2023-01-11T21:38:06.7845631Z tmp28 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7845729Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7845810Z tmp30 = tmp29 + tmp26 2023-01-11T21:38:06.7845890Z tmp31 = tmp22 & tmp16 2023-01-11T21:38:06.7846219Z tmp32 = tl.load(in_ptr0 + (14 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7846318Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7846400Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7846536Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7846621Z ''') 2023-01-11T21:38:06.7846626Z 2023-01-11T21:38:06.7846631Z 2023-01-11T21:38:06.7846887Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.7846962Z import triton 2023-01-11T21:38:06.7847055Z import triton.language as tl 2023-01-11T21:38:06.7847164Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7847265Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7847399Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7847528Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7847533Z 2023-01-11T21:38:06.7847956Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7848031Z @triton.jit 2023-01-11T21:38:06.7848175Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7848252Z xnumel = 144 2023-01-11T21:38:06.7848341Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7848471Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7848556Z xmask = xindex < xnumel 2023-01-11T21:38:06.7848637Z x1 = (xindex // 8) % 2 2023-01-11T21:38:06.7848714Z x0 = xindex % 8 2023-01-11T21:38:06.7848791Z x3 = (xindex // 8) 2023-01-11T21:38:06.7848862Z x4 = xindex 
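# Backward of nearest upsampling expressed as a gather: each grad-input
# element sums the grad-output positions that mapped onto it. The
# integer-division bounds and `<` comparisons below mask out
# contributions that fall outside each element's window.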
2023-01-11T21:38:06.7848931Z tmp0 = 3*x1 2023-01-11T21:38:06.7849006Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.7849086Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7849167Z tmp3 = ((7 + (12*x0)) // 8) 2023-01-11T21:38:06.7849249Z tmp4 = ((19 + (12*x0)) // 8) 2023-01-11T21:38:06.7849329Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7849405Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7849723Z tmp7 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7849822Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7849907Z tmp9 = 1 + (((7 + (12*x0)) // 8)) 2023-01-11T21:38:06.7849986Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7850066Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7850363Z tmp12 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7850457Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7850534Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7850609Z tmp15 = 1 + (3*x1) 2023-01-11T21:38:06.7850688Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7850767Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7851067Z tmp18 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7851163Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7851243Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7851323Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7851609Z tmp22 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7851704Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7851783Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7851858Z tmp25 = 2 + (3*x1) 2023-01-11T21:38:06.7851938Z tmp26 = tmp25 < tmp1 2023-01-11T21:38:06.7852016Z tmp27 = tmp26 & tmp5 2023-01-11T21:38:06.7852339Z tmp28 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7852427Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7852508Z tmp30 = tmp29 + tmp24 2023-01-11T21:38:06.7852592Z tmp31 = tmp26 & tmp10 2023-01-11T21:38:06.7852879Z tmp32 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0) 2023-01-11T21:38:06.7852973Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7853055Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7853213Z tmp35 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.7853308Z tmp36 = tl.where(tmp6, tmp35, 0.0) 2023-01-11T21:38:06.7853464Z tmp37 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0) 2023-01-11T21:38:06.7853562Z tmp38 = tl.where(tmp11, tmp37, 0.0) 2023-01-11T21:38:06.7853643Z tmp39 = tmp38 + tmp36 2023-01-11T21:38:06.7853802Z tmp40 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.7853902Z tmp41 = tl.where(tmp17, tmp40, 0.0) 2023-01-11T21:38:06.7853982Z tmp42 = tmp41 + tmp39 2023-01-11T21:38:06.7854135Z tmp43 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.7854221Z tmp44 
= tl.where(tmp21, tmp43, 0.0) 2023-01-11T21:38:06.7854302Z tmp45 = tmp44 + tmp42 2023-01-11T21:38:06.7854456Z tmp46 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0) 2023-01-11T21:38:06.7854866Z tmp47 = tl.where(tmp27, tmp46, 0.0) 2023-01-11T21:38:06.7854949Z tmp48 = tmp47 + tmp45 2023-01-11T21:38:06.7855104Z tmp49 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, other=0) 2023-01-11T21:38:06.7855197Z tmp50 = tl.where(tmp31, tmp49, 0.0) 2023-01-11T21:38:06.7855278Z tmp51 = tmp50 + tmp48 2023-01-11T21:38:06.7855406Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7855589Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp51, xmask) 2023-01-11T21:38:06.7855679Z ''') 2023-01-11T21:38:06.7855684Z 2023-01-11T21:38:06.7855688Z 2023-01-11T21:38:06.7855892Z triton_fused_upsample_nearest2d_backward_4_3 = async_compile.triton(''' 2023-01-11T21:38:06.7855970Z import triton 2023-01-11T21:38:06.7856062Z import triton.language as tl 2023-01-11T21:38:06.7856179Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7856281Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7856408Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7856535Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7856543Z 2023-01-11T21:38:06.7856944Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7857020Z @triton.jit 2023-01-11T21:38:06.7857203Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7857285Z xnumel = 252 2023-01-11T21:38:06.7857384Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7857515Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7857591Z xmask = xindex < xnumel 2023-01-11T21:38:06.7857675Z x1 = (xindex // 7) % 4 2023-01-11T21:38:06.7857749Z x0 = xindex % 7 2023-01-11T21:38:06.7857827Z x2 = (xindex // 28) 2023-01-11T21:38:06.7857897Z x4 = xindex 2023-01-11T21:38:06.7857978Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7858094Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7858177Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7858261Z tmp3 = ((6 + (12*x0)) // 7) 2023-01-11T21:38:06.7858345Z tmp4 = ((18 + (12*x0)) // 7) 2023-01-11T21:38:06.7858425Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7858505Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7858693Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0) 2023-01-11T21:38:06.7858784Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7858870Z tmp9 = 1 + (((6 + (12*x0)) // 7)) 2023-01-11T21:38:06.7858954Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7859037Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7859218Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0) 2023-01-11T21:38:06.7859318Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7859402Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7859495Z tmp15 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7859571Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7859652Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7859833Z tmp18 = 
tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0) 2023-01-11T21:38:06.7859931Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7860015Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7860098Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7860277Z tmp22 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0) 2023-01-11T21:38:06.7860367Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7860450Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7860586Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.7860679Z ''') 2023-01-11T21:38:06.7860687Z 2023-01-11T21:38:06.7860692Z 2023-01-11T21:38:06.7860789Z async_compile.wait(globals()) 2023-01-11T21:38:06.7860869Z del async_compile 2023-01-11T21:38:06.7860876Z 2023-01-11T21:38:06.7860953Z def call(args): 2023-01-11T21:38:06.7861030Z arg0_1, = args 2023-01-11T21:38:06.7861101Z args.clear() 2023-01-11T21:38:06.7861225Z with torch.cuda.device(0): 2023-01-11T21:38:06.7861445Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7861539Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7861708Z triton_fused_upsample_nearest2d_backward_0.run(arg0_1, buf0, 162, grid=grid(162), stream=stream0) 2023-01-11T21:38:06.7861918Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862085Z triton_fused_upsample_nearest2d_backward_1_1.run(arg0_1, buf1, 180, grid=grid(180), stream=stream0) 2023-01-11T21:38:06.7862300Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862502Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7862708Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2.run(arg0_1, buf2, buf3, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.7862925Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7863092Z triton_fused_upsample_nearest2d_backward_4_3.run(arg0_1, buf4, 252, grid=grid(252), stream=stream0) 2023-01-11T21:38:06.7863167Z del arg0_1 2023-01-11T21:38:06.7863271Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7863276Z 2023-01-11T21:38:06.7863281Z 2023-01-11T21:38:06.7863362Z if __name__ == "__main__": 2023-01-11T21:38:06.7863483Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7863603Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7863869Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7863982Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7864253Z [2023-01-11 21:36:16,431] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 937 2023-01-11T21:38:06.7864674Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7864807Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7865065Z [2023-01-11 21:36:16,457] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 938 2023-01-11T21:38:06.7865071Z 2023-01-11T21:38:06.7865079Z 2023-01-11T21:38:06.7865177Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7865252Z import torch 2023-01-11T21:38:06.7865326Z import random 2023-01-11T21:38:06.7865439Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7865563Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7865568Z 2023-01-11T21:38:06.7865656Z aten = torch.ops.aten 2023-01-11T21:38:06.7865793Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7865889Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7865895Z 2023-01-11T21:38:06.7865970Z import triton 2023-01-11T21:38:06.7866064Z import triton.language as tl 2023-01-11T21:38:06.7866182Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7866321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7866327Z 2023-01-11T21:38:06.7866331Z 2023-01-11T21:38:06.7866530Z triton_fused_upsample_nearest2d_backward_0 = async_compile.triton(''' 2023-01-11T21:38:06.7866609Z import triton 2023-01-11T21:38:06.7866701Z import triton.language as tl 2023-01-11T21:38:06.7866816Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7866917Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7867050Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7867198Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7867204Z 2023-01-11T21:38:06.7867608Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7867682Z @triton.jit 2023-01-11T21:38:06.7867819Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7867893Z xnumel = 162 2023-01-11T21:38:06.7867991Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7868121Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7868207Z xmask = xindex < xnumel 2023-01-11T21:38:06.7868274Z x0 = xindex % 6 2023-01-11T21:38:06.7868355Z x1 = (xindex // 6) 2023-01-11T21:38:06.7868427Z x2 = xindex 2023-01-11T21:38:06.7868659Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7868893Z tmp1 = tl.load(in_ptr0 + (1 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869126Z tmp3 = tl.load(in_ptr0 + (12 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869358Z tmp5 = tl.load(in_ptr0 + (13 + (2*x0) + (24*x1)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7869439Z tmp2 = tmp1 + tmp0 2023-01-11T21:38:06.7869511Z tmp4 = tmp3 + tmp2 2023-01-11T21:38:06.7869589Z tmp6 = tmp5 + tmp4 2023-01-11T21:38:06.7869665Z tmp7 = 1.0 2023-01-11T21:38:06.7869742Z tmp8 = tmp6 * tmp7 2023-01-11T21:38:06.7869909Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7869993Z ''') 2023-01-11T21:38:06.7869999Z 
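# Non-integer scale case (4x5 grad-input gathering from a 6x12
# grad-output): the source window per grad-input cell is not constant,
# so the kernel below derives per-element start/end bounds with integer
# division and masks every load.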
2023-01-11T21:38:06.7870003Z 2023-01-11T21:38:06.7870205Z triton_fused_upsample_nearest2d_backward_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7870273Z import triton 2023-01-11T21:38:06.7870370Z import triton.language as tl 2023-01-11T21:38:06.7870486Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7870590Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7870724Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7870852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7870857Z 2023-01-11T21:38:06.7871252Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7871327Z @triton.jit 2023-01-11T21:38:06.7871455Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7871530Z xnumel = 180 2023-01-11T21:38:06.7871625Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7871754Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7871837Z xmask = xindex < xnumel 2023-01-11T21:38:06.7871925Z x1 = (xindex // 5) % 4 2023-01-11T21:38:06.7871999Z x0 = xindex % 5 2023-01-11T21:38:06.7872070Z x2 = (xindex // 20) 2023-01-11T21:38:06.7872143Z x4 = xindex 2023-01-11T21:38:06.7872228Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7872312Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7872392Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7872473Z tmp3 = ((4 + (12*x0)) // 5) 2023-01-11T21:38:06.7872554Z tmp4 = ((16 + (12*x0)) // 5) 2023-01-11T21:38:06.7872625Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7872704Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7873047Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7873149Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7873238Z tmp9 = 1 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7873319Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7873428Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7873768Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7873866Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7873946Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7874035Z tmp15 = 2 + (((4 + (12*x0)) // 5)) 2023-01-11T21:38:06.7874115Z tmp16 = tmp15 < tmp4 2023-01-11T21:38:06.7874195Z tmp17 = tmp2 & tmp16 2023-01-11T21:38:06.7874537Z tmp18 = tl.load(in_ptr0 + (2 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7874636Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7874710Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7874794Z tmp21 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7874877Z tmp22 = tmp21 < tmp1 2023-01-11T21:38:06.7874957Z tmp23 = tmp22 & tmp5 2023-01-11T21:38:06.7875302Z tmp24 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp23 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7875398Z tmp25 = tl.where(tmp23, tmp24, 0.0) 2023-01-11T21:38:06.7875479Z tmp26 = 
tmp25 + tmp20 2023-01-11T21:38:06.7875552Z tmp27 = tmp22 & tmp10 2023-01-11T21:38:06.7875886Z tmp28 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7876010Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7876092Z tmp30 = tmp29 + tmp26 2023-01-11T21:38:06.7876171Z tmp31 = tmp22 & tmp16 2023-01-11T21:38:06.7876504Z tmp32 = tl.load(in_ptr0 + (14 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((4 + (12*x0)) // 5)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7876600Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7876680Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7876808Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7876895Z ''') 2023-01-11T21:38:06.7876900Z 2023-01-11T21:38:06.7876905Z 2023-01-11T21:38:06.7877159Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2 = async_compile.triton(''' 2023-01-11T21:38:06.7877238Z import triton 2023-01-11T21:38:06.7877331Z import triton.language as tl 2023-01-11T21:38:06.7877451Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7877554Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7877691Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7877812Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7877817Z 2023-01-11T21:38:06.7878240Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7878315Z @triton.jit 2023-01-11T21:38:06.7878458Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7878533Z xnumel = 144 2023-01-11T21:38:06.7878629Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7878758Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7878841Z xmask = xindex < xnumel 2023-01-11T21:38:06.7878918Z x1 = (xindex // 8) % 2 2023-01-11T21:38:06.7878993Z x0 = xindex % 8 2023-01-11T21:38:06.7879070Z x3 = (xindex // 8) 2023-01-11T21:38:06.7879142Z x4 = xindex 2023-01-11T21:38:06.7879216Z tmp0 = 3*x1 2023-01-11T21:38:06.7879291Z tmp1 = 3 + (3*x1) 2023-01-11T21:38:06.7879363Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7879472Z tmp3 = ((7 + (12*x0)) // 8) 2023-01-11T21:38:06.7879557Z tmp4 = ((19 + (12*x0)) // 8) 2023-01-11T21:38:06.7879636Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7879715Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7880028Z tmp7 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7880124Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7880209Z tmp9 = 1 + (((7 + (12*x0)) // 8)) 2023-01-11T21:38:06.7880282Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7880362Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7880683Z tmp12 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7880780Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7880861Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7880936Z tmp15 = 1 + (3*x1) 
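# (editor note, not part of the generated kernel) Rows reduce 6 -> 2 here, an exact
# multiple, so output row x1 always covers input rows 3*x1 .. 2 + 3*x1; Inductor
# still emits its generic window guards (the compare of tmp15 = 1 + 3*x1 against
# tmp1 = 3 + 3*x1 just below is always true), while the masked
# tl.load(..., other=0) / tl.where pairs do real work in the ragged column
# direction, where the (7 + 12*x0) // 8 ceiling bounds admit one or two taps.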
2023-01-11T21:38:06.7881021Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7881094Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7881409Z tmp18 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7881505Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7881591Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7881672Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7881988Z tmp22 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7882115Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7882194Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7882263Z tmp25 = 2 + (3*x1) 2023-01-11T21:38:06.7882347Z tmp26 = tmp25 < tmp1 2023-01-11T21:38:06.7882429Z tmp27 = tmp26 & tmp5 2023-01-11T21:38:06.7882738Z tmp28 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7882834Z tmp29 = tl.where(tmp27, tmp28, 0.0) 2023-01-11T21:38:06.7882916Z tmp30 = tmp29 + tmp24 2023-01-11T21:38:06.7882997Z tmp31 = tmp26 & tmp10 2023-01-11T21:38:06.7883295Z tmp32 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, eviction_policy='evict_last', other=0).to(tl.float32) 2023-01-11T21:38:06.7883396Z tmp33 = tl.where(tmp31, tmp32, 0.0) 2023-01-11T21:38:06.7883478Z tmp34 = tmp33 + tmp30 2023-01-11T21:38:06.7883653Z tmp35 = tl.load(in_ptr0 + ((36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7883748Z tmp36 = tl.where(tmp6, tmp35, 0.0) 2023-01-11T21:38:06.7883928Z tmp37 = tl.load(in_ptr0 + (1 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884025Z tmp38 = tl.where(tmp11, tmp37, 0.0) 2023-01-11T21:38:06.7884106Z tmp39 = tmp38 + tmp36 2023-01-11T21:38:06.7884270Z tmp40 = tl.load(in_ptr0 + (12 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884367Z tmp41 = tl.where(tmp17, tmp40, 0.0) 2023-01-11T21:38:06.7884450Z tmp42 = tmp41 + tmp39 2023-01-11T21:38:06.7884618Z tmp43 = tl.load(in_ptr0 + (13 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7884711Z tmp44 = tl.where(tmp21, tmp43, 0.0) 2023-01-11T21:38:06.7884796Z tmp45 = tmp44 + tmp42 2023-01-11T21:38:06.7884962Z tmp46 = tl.load(in_ptr0 + (24 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp27 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7885058Z tmp47 = tl.where(tmp27, tmp46, 0.0) 2023-01-11T21:38:06.7885131Z tmp48 = tmp47 + tmp45 2023-01-11T21:38:06.7885321Z tmp49 = tl.load(in_ptr0 + (25 + (36*x3) + (((7 + (12*x0)) // 8)) + tl.zeros([XBLOCK], tl.int32)), tmp31 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7885422Z tmp50 = tl.where(tmp31, tmp49, 0.0) 2023-01-11T21:38:06.7885504Z tmp51 = tmp50 + tmp48 2023-01-11T21:38:06.7885640Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp34, xmask) 2023-01-11T21:38:06.7885772Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp51, xmask) 2023-01-11T21:38:06.7885860Z ''') 2023-01-11T21:38:06.7885866Z 2023-01-11T21:38:06.7885870Z 2023-01-11T21:38:06.7886073Z 
triton_fused_upsample_nearest2d_backward_4_3 = async_compile.triton(''' 2023-01-11T21:38:06.7886144Z import triton 2023-01-11T21:38:06.7886237Z import triton.language as tl 2023-01-11T21:38:06.7886353Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7886457Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7886594Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7886721Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7886727Z 2023-01-11T21:38:06.7887127Z @pointwise(size_hints=[256], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.7887202Z @triton.jit 2023-01-11T21:38:06.7887328Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7887403Z xnumel = 252 2023-01-11T21:38:06.7887501Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7887661Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7887744Z xmask = xindex < xnumel 2023-01-11T21:38:06.7887825Z x1 = (xindex // 7) % 4 2023-01-11T21:38:06.7887900Z x0 = xindex % 7 2023-01-11T21:38:06.7887973Z x2 = (xindex // 28) 2023-01-11T21:38:06.7888045Z x4 = xindex 2023-01-11T21:38:06.7888131Z tmp0 = ((3 + (6*x1)) // 4) 2023-01-11T21:38:06.7888211Z tmp1 = ((9 + (6*x1)) // 4) 2023-01-11T21:38:06.7888291Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.7888370Z tmp3 = ((6 + (12*x0)) // 7) 2023-01-11T21:38:06.7888445Z tmp4 = ((18 + (12*x0)) // 7) 2023-01-11T21:38:06.7888525Z tmp5 = tmp3 < tmp4 2023-01-11T21:38:06.7888603Z tmp6 = tmp2 & tmp5 2023-01-11T21:38:06.7888791Z tmp7 = tl.load(in_ptr0 + ((12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp6 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7888889Z tmp8 = tl.where(tmp6, tmp7, 0.0) 2023-01-11T21:38:06.7888972Z tmp9 = 1 + (((6 + (12*x0)) // 7)) 2023-01-11T21:38:06.7889057Z tmp10 = tmp9 < tmp4 2023-01-11T21:38:06.7889131Z tmp11 = tmp2 & tmp10 2023-01-11T21:38:06.7889321Z tmp12 = tl.load(in_ptr0 + (1 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp11 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7889420Z tmp13 = tl.where(tmp11, tmp12, 0.0) 2023-01-11T21:38:06.7889501Z tmp14 = tmp13 + tmp8 2023-01-11T21:38:06.7889587Z tmp15 = 1 + (((3 + (6*x1)) // 4)) 2023-01-11T21:38:06.7889667Z tmp16 = tmp15 < tmp1 2023-01-11T21:38:06.7889746Z tmp17 = tmp16 & tmp5 2023-01-11T21:38:06.7889934Z tmp18 = tl.load(in_ptr0 + (12 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp17 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7890023Z tmp19 = tl.where(tmp17, tmp18, 0.0) 2023-01-11T21:38:06.7890104Z tmp20 = tmp19 + tmp14 2023-01-11T21:38:06.7890185Z tmp21 = tmp16 & tmp10 2023-01-11T21:38:06.7890366Z tmp22 = tl.load(in_ptr0 + (13 + (12*(((3 + (6*x1)) // 4))) + (72*x2) + (((6 + (12*x0)) // 7)) + tl.zeros([XBLOCK], tl.int32)), tmp21 & xmask, other=0).to(tl.float32) 2023-01-11T21:38:06.7890461Z tmp23 = tl.where(tmp21, tmp22, 0.0) 2023-01-11T21:38:06.7890542Z tmp24 = tmp23 + tmp20 2023-01-11T21:38:06.7890726Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp24, xmask) 2023-01-11T21:38:06.7890814Z ''') 2023-01-11T21:38:06.7890820Z 2023-01-11T21:38:06.7890824Z 2023-01-11T21:38:06.7890911Z async_compile.wait(globals()) 
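# (editor note, not part of the log) async_compile.triton(...) above returns
# placeholder handles while the kernels build in background workers;
# async_compile.wait(globals()) blocks until every compile finishes and swaps the
# real kernel objects into this module's globals before call() can launch them.
#
# The non-integer-scale backward kernels (_1_1: 6x12 -> 4x5, _4_3: 6x12 -> 4x7)
# gather, for each input cell, the half-open range of output pixels whose nearest
# source index rounds back to it: for rows, start = ceil(6*x1/4) = (3 + 6*x1) // 4
# and end = ceil(6*(x1 + 1)/4) = (9 + 6*x1) // 4, with masked loads for the ragged
# tail. A minimal scatter-form PyTorch sketch of the same reduction, assuming the
# integer floor(dst * in/out) indexing matches the float-constant convention used
# above (reference only, my naming):
import torch

def upsample_nearest2d_backward_ref(grad_out: torch.Tensor, in_h: int, in_w: int) -> torch.Tensor:
    # The forward reads source index floor(dst * in/out) per axis, so the backward
    # scatter-adds every output gradient back to its source cell; the map is
    # separable, so reduce rows first, then columns.
    n, c, out_h, out_w = grad_out.shape
    iy = (torch.arange(out_h) * in_h) // out_h
    ix = (torch.arange(out_w) * in_w) // out_w
    tmp = torch.zeros(n, c, in_h, out_w, dtype=grad_out.dtype)
    tmp.index_add_(2, iy, grad_out)   # collapse output rows onto input rows
    grad_in = torch.zeros(n, c, in_h, in_w, dtype=grad_out.dtype)
    grad_in.index_add_(3, ix, tmp)    # collapse output cols onto input cols
    return grad_in

grad_out = torch.randn(3, 3, 6, 12)
assert upsample_nearest2d_backward_ref(grad_out, 4, 5).shape == (3, 3, 4, 5)  # buf1
assert upsample_nearest2d_backward_ref(grad_out, 4, 7).shape == (3, 3, 4, 7)  # buf4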
2023-01-11T21:38:06.7890990Z del async_compile 2023-01-11T21:38:06.7890995Z 2023-01-11T21:38:06.7891069Z def call(args): 2023-01-11T21:38:06.7891148Z arg0_1, = args 2023-01-11T21:38:06.7891223Z args.clear() 2023-01-11T21:38:06.7891317Z with torch.cuda.device(0): 2023-01-11T21:38:06.7891533Z buf0 = empty_strided((3, 3, 3, 6), (54, 18, 6, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7891618Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7891791Z triton_fused_upsample_nearest2d_backward_0.run(arg0_1, buf0, 162, grid=grid(162), stream=stream0) 2023-01-11T21:38:06.7892001Z buf1 = empty_strided((3, 3, 4, 5), (60, 20, 5, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892172Z triton_fused_upsample_nearest2d_backward_1_1.run(arg0_1, buf1, 180, grid=grid(180), stream=stream0) 2023-01-11T21:38:06.7892384Z buf2 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892592Z buf3 = empty_strided((3, 3, 2, 8), (48, 16, 8, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7892796Z triton_fused_upsample_nearest2d_backward_2_upsample_nearest2d_backward_3_2.run(arg0_1, buf2, buf3, 144, grid=grid(144), stream=stream0) 2023-01-11T21:38:06.7893008Z buf4 = empty_strided((3, 3, 4, 7), (84, 28, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7893175Z triton_fused_upsample_nearest2d_backward_4_3.run(arg0_1, buf4, 252, grid=grid(252), stream=stream0) 2023-01-11T21:38:06.7893272Z del arg0_1 2023-01-11T21:38:06.7893379Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7893384Z 2023-01-11T21:38:06.7893389Z 2023-01-11T21:38:06.7893469Z if __name__ == "__main__": 2023-01-11T21:38:06.7893590Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7893721Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7893942Z arg0_1 = rand_strided((3, 3, 6, 12), (216, 72, 12, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7894054Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7894320Z [2023-01-11 21:36:16,977] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 938 2023-01-11T21:38:06.7894326Z 2023-01-11T21:38:06.7894391Z ok (1.321s) 2023-01-11T21:38:06.7895074Z test_upsample_nearest2d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7895219Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7895479Z [2023-01-11 21:36:17,400] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 939 2023-01-11T21:38:06.7895743Z [2023-01-11 21:36:17,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 939 2023-01-11T21:38:06.7896160Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7896296Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7896552Z [2023-01-11 21:36:18,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 940 2023-01-11T21:38:06.7896558Z 2023-01-11T21:38:06.7896714Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7896791Z import torch 2023-01-11T21:38:06.7896867Z import random 2023-01-11T21:38:06.7896980Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7897107Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7897112Z 2023-01-11T21:38:06.7897254Z aten = torch.ops.aten 2023-01-11T21:38:06.7897395Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7897492Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7897498Z 2023-01-11T21:38:06.7897571Z import triton 2023-01-11T21:38:06.7897663Z import triton.language as tl 2023-01-11T21:38:06.7897781Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7897923Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7897928Z 2023-01-11T21:38:06.7897933Z 2023-01-11T21:38:06.7898154Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7898229Z import triton 2023-01-11T21:38:06.7898323Z import triton.language as tl 2023-01-11T21:38:06.7898438Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7898546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7898681Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7898800Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7898805Z 2023-01-11T21:38:06.7899225Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7899345Z @triton.jit 2023-01-11T21:38:06.7899490Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7899569Z xnumel = 44992 2023-01-11T21:38:06.7899667Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7899800Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7899885Z xmask = xindex < xnumel 2023-01-11T21:38:06.7899960Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7900036Z x0 = xindex % 76 2023-01-11T21:38:06.7900116Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7900188Z x4 = xindex 2023-01-11T21:38:06.7900261Z tmp0 = x1 2023-01-11T21:38:06.7900335Z tmp1 = 0.5 2023-01-11T21:38:06.7900414Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7900493Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7900563Z tmp4 = x0 2023-01-11T21:38:06.7900643Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7900727Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7900956Z tmp7 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7901082Z tmp8 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask) 2023-01-11T21:38:06.7901219Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7901349Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7901435Z ''') 2023-01-11T21:38:06.7901441Z 2023-01-11T21:38:06.7901445Z 2023-01-11T21:38:06.7901634Z 
triton_fused_upsample_nearest2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7901708Z import triton 2023-01-11T21:38:06.7901801Z import triton.language as tl 2023-01-11T21:38:06.7901916Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7902019Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7902144Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7902269Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7902277Z 2023-01-11T21:38:06.7902685Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7902759Z @triton.jit 2023-01-11T21:38:06.7902920Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7902997Z xnumel = 42000 2023-01-11T21:38:06.7903097Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7903226Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7903309Z xmask = xindex < xnumel 2023-01-11T21:38:06.7903384Z x1 = (xindex // 75) % 70 2023-01-11T21:38:06.7903460Z x0 = xindex % 75 2023-01-11T21:38:06.7903540Z x2 = (xindex // 5250) 2023-01-11T21:38:06.7903611Z x4 = xindex 2023-01-11T21:38:06.7903686Z tmp0 = x1 2023-01-11T21:38:06.7903764Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7903836Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7903924Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7903995Z tmp4 = x0 2023-01-11T21:38:06.7904073Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7904153Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7904237Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7904467Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7904596Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7904680Z ''') 2023-01-11T21:38:06.7904685Z 2023-01-11T21:38:06.7904690Z 2023-01-11T21:38:06.7904875Z triton_fused_upsample_nearest2d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7904952Z import triton 2023-01-11T21:38:06.7905045Z import triton.language as tl 2023-01-11T21:38:06.7905159Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7905261Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7905396Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7905577Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7905584Z 2023-01-11T21:38:06.7906013Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7906086Z @triton.jit 2023-01-11T21:38:06.7906217Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7906293Z xnumel = 26640 2023-01-11T21:38:06.7906391Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7906523Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7906607Z xmask = xindex < xnumel 2023-01-11T21:38:06.7906681Z x1 = (xindex // 74) % 45 2023-01-11T21:38:06.7906758Z x0 = xindex % 74 2023-01-11T21:38:06.7906838Z x2 = (xindex // 3330) 2023-01-11T21:38:06.7906909Z x4 = xindex 2023-01-11T21:38:06.7906983Z tmp0 = x1 2023-01-11T21:38:06.7907062Z tmp1 = 0.8222222222222222 
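# (editor note, not part of the generated kernel) 0.8222222222222222 is 37/45, the
# reciprocal scale in_h / out_h baked in as a constant for this 37x38 -> 45x74
# output: the nearest source row is floor(x1 * 37/45), implemented below as a float
# multiply followed by the truncating .to(tl.int32); 0.5135135135135135 = 38/74
# plays the same role for the columns.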
2023-01-11T21:38:06.7907134Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7907217Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7907290Z tmp4 = x0 2023-01-11T21:38:06.7907367Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7907446Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7907532Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7907756Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7907885Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7907971Z ''') 2023-01-11T21:38:06.7907976Z 2023-01-11T21:38:06.7907981Z 2023-01-11T21:38:06.7908168Z triton_fused_upsample_nearest2d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7908242Z import triton 2023-01-11T21:38:06.7908335Z import triton.language as tl 2023-01-11T21:38:06.7908453Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7908555Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7908691Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7908809Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7908815Z 2023-01-11T21:38:06.7909247Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7909323Z @triton.jit 2023-01-11T21:38:06.7909456Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7909531Z xnumel = 11232 2023-01-11T21:38:06.7909627Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7909756Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7909839Z xmask = xindex < xnumel 2023-01-11T21:38:06.7909914Z x1 = (xindex // 39) % 36 2023-01-11T21:38:06.7909988Z x0 = xindex % 39 2023-01-11T21:38:06.7910069Z x2 = (xindex // 1404) 2023-01-11T21:38:06.7910141Z x4 = xindex 2023-01-11T21:38:06.7910212Z tmp0 = x1 2023-01-11T21:38:06.7910294Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7910366Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7910455Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7910527Z tmp4 = x0 2023-01-11T21:38:06.7910612Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7910691Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7910776Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7910899Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask) 2023-01-11T21:38:06.7911027Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7911115Z ''') 2023-01-11T21:38:06.7911121Z 2023-01-11T21:38:06.7911125Z 2023-01-11T21:38:06.7911219Z async_compile.wait(globals()) 2023-01-11T21:38:06.7911296Z del async_compile 2023-01-11T21:38:06.7911301Z 2023-01-11T21:38:06.7911374Z def call(args): 2023-01-11T21:38:06.7911448Z arg0_1, = args 2023-01-11T21:38:06.7911552Z args.clear() 2023-01-11T21:38:06.7911646Z with torch.cuda.device(0): 2023-01-11T21:38:06.7911867Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7912095Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7912191Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7912381Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0.run(arg0_1, buf0, buf4, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7912603Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.7912762Z triton_fused_upsample_nearest2d_1_1.run(arg0_1, buf1, 42000, grid=grid(42000), stream=stream0) 2023-01-11T21:38:06.7912987Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7913143Z triton_fused_upsample_nearest2d_2_2.run(arg0_1, buf2, 26640, grid=grid(26640), stream=stream0) 2023-01-11T21:38:06.7913357Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7913511Z triton_fused_upsample_nearest2d_3_3.run(arg0_1, buf3, 11232, grid=grid(11232), stream=stream0) 2023-01-11T21:38:06.7913590Z del arg0_1 2023-01-11T21:38:06.7913698Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7913703Z 2023-01-11T21:38:06.7913707Z 2023-01-11T21:38:06.7913787Z if __name__ == "__main__": 2023-01-11T21:38:06.7913907Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7914033Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7914260Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7914373Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7914378Z 2023-01-11T21:38:06.7914638Z [2023-01-11 21:36:18,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 940 2023-01-11T21:38:06.7914651Z 2023-01-11T21:38:06.7914743Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7914818Z import torch 2023-01-11T21:38:06.7914895Z import random 2023-01-11T21:38:06.7915042Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7915172Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7915177Z 2023-01-11T21:38:06.7915262Z aten = torch.ops.aten 2023-01-11T21:38:06.7915401Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7915490Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7915494Z 2023-01-11T21:38:06.7915569Z import triton 2023-01-11T21:38:06.7915662Z import triton.language as tl 2023-01-11T21:38:06.7915787Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7915926Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7915935Z 2023-01-11T21:38:06.7915940Z 2023-01-11T21:38:06.7916160Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7916239Z import triton 2023-01-11T21:38:06.7916331Z import triton.language as tl 2023-01-11T21:38:06.7916439Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7916546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7916680Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7916807Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7916812Z 2023-01-11T21:38:06.7917234Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7917307Z @triton.jit 2023-01-11T21:38:06.7917452Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7917556Z xnumel = 44992 2023-01-11T21:38:06.7917647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7917778Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7917864Z xmask = xindex 
< xnumel 2023-01-11T21:38:06.7917945Z x1 = (xindex // 76) % 74 2023-01-11T21:38:06.7918023Z x0 = xindex % 76 2023-01-11T21:38:06.7918103Z x2 = (xindex // 5624) 2023-01-11T21:38:06.7918176Z x4 = xindex 2023-01-11T21:38:06.7918240Z tmp0 = x1 2023-01-11T21:38:06.7918313Z tmp1 = 0.5 2023-01-11T21:38:06.7918393Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7918478Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7918550Z tmp4 = x0 2023-01-11T21:38:06.7918629Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7918706Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7918956Z tmp7 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7919094Z tmp8 = tl.load(in_ptr0 + (tmp6 + (38*tmp3) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7919232Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp7, xmask) 2023-01-11T21:38:06.7919365Z tl.store(out_ptr1 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7919451Z ''') 2023-01-11T21:38:06.7919457Z 2023-01-11T21:38:06.7919461Z 2023-01-11T21:38:06.7919649Z triton_fused_upsample_nearest2d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7919725Z import triton 2023-01-11T21:38:06.7919811Z import triton.language as tl 2023-01-11T21:38:06.7919926Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7920028Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7920160Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7920286Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7920291Z 2023-01-11T21:38:06.7920696Z @pointwise(size_hints=[65536], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7920774Z @triton.jit 2023-01-11T21:38:06.7920905Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7920973Z xnumel = 42000 2023-01-11T21:38:06.7921102Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7921236Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7921320Z xmask = xindex < xnumel 2023-01-11T21:38:06.7921404Z x1 = (xindex // 75) % 70 2023-01-11T21:38:06.7921482Z x0 = xindex % 75 2023-01-11T21:38:06.7921563Z x2 = (xindex // 5250) 2023-01-11T21:38:06.7921626Z x4 = xindex 2023-01-11T21:38:06.7921699Z tmp0 = x1 2023-01-11T21:38:06.7921779Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7921858Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7921943Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7922016Z tmp4 = x0 2023-01-11T21:38:06.7922098Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7922170Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7922256Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7922505Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7922642Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7922726Z ''') 2023-01-11T21:38:06.7922732Z 2023-01-11T21:38:06.7922736Z 2023-01-11T21:38:06.7922921Z triton_fused_upsample_nearest2d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7922996Z import triton 2023-01-11T21:38:06.7923081Z import triton.language as tl 2023-01-11T21:38:06.7923196Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7923297Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7923430Z from 
torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7923555Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7923588Z 2023-01-11T21:38:06.7923992Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7924065Z @triton.jit 2023-01-11T21:38:06.7924200Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7924269Z xnumel = 26640 2023-01-11T21:38:06.7924366Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7924496Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7924579Z xmask = xindex < xnumel 2023-01-11T21:38:06.7924663Z x1 = (xindex // 74) % 45 2023-01-11T21:38:06.7924743Z x0 = xindex % 74 2023-01-11T21:38:06.7924824Z x2 = (xindex // 3330) 2023-01-11T21:38:06.7924888Z x4 = xindex 2023-01-11T21:38:06.7924959Z tmp0 = x1 2023-01-11T21:38:06.7925039Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7925119Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7925206Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7925278Z tmp4 = x0 2023-01-11T21:38:06.7925358Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7925430Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7925514Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7925769Z tmp8 = tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7925905Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7925990Z ''') 2023-01-11T21:38:06.7925996Z 2023-01-11T21:38:06.7926000Z 2023-01-11T21:38:06.7926185Z triton_fused_upsample_nearest2d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7926262Z import triton 2023-01-11T21:38:06.7926355Z import triton.language as tl 2023-01-11T21:38:06.7926462Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7926564Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7926697Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7926829Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7926834Z 2023-01-11T21:38:06.7927266Z @pointwise(size_hints=[16384], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7927344Z @triton.jit 2023-01-11T21:38:06.7927476Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7927551Z xnumel = 11232 2023-01-11T21:38:06.7927641Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7927769Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7927851Z xmask = xindex < xnumel 2023-01-11T21:38:06.7927932Z x1 = (xindex // 39) % 36 2023-01-11T21:38:06.7928007Z x0 = xindex % 39 2023-01-11T21:38:06.7928091Z x2 = (xindex // 1404) 2023-01-11T21:38:06.7928154Z x4 = xindex 2023-01-11T21:38:06.7928229Z tmp0 = x1 2023-01-11T21:38:06.7928308Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7928388Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7928473Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7928546Z tmp4 = x0 2023-01-11T21:38:06.7928623Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7928695Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7928782Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7928921Z tmp8 = 
tl.load(in_ptr0 + (tmp7 + (38*tmp3) + (1406*x2)), xmask).to(tl.float32) 2023-01-11T21:38:06.7929058Z tl.store(out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp8, xmask) 2023-01-11T21:38:06.7929144Z ''') 2023-01-11T21:38:06.7929150Z 2023-01-11T21:38:06.7929154Z 2023-01-11T21:38:06.7929247Z async_compile.wait(globals()) 2023-01-11T21:38:06.7929324Z del async_compile 2023-01-11T21:38:06.7929329Z 2023-01-11T21:38:06.7929405Z def call(args): 2023-01-11T21:38:06.7929471Z arg0_1, = args 2023-01-11T21:38:06.7929545Z args.clear() 2023-01-11T21:38:06.7929665Z with torch.cuda.device(0): 2023-01-11T21:38:06.7929893Z buf0 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930116Z buf4 = empty_strided((2, 4, 74, 76), (22496, 5624, 76, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930209Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7930397Z triton_fused_upsample_nearest2d_upsample_nearest2d_4_0.run(arg0_1, buf0, buf4, 44992, grid=grid(44992), stream=stream0) 2023-01-11T21:38:06.7930610Z buf1 = empty_strided((2, 4, 70, 75), (21000, 5250, 75, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7930767Z triton_fused_upsample_nearest2d_1_1.run(arg0_1, buf1, 42000, grid=grid(42000), stream=stream0) 2023-01-11T21:38:06.7930993Z buf2 = empty_strided((2, 4, 45, 74), (13320, 3330, 74, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7931147Z triton_fused_upsample_nearest2d_2_2.run(arg0_1, buf2, 26640, grid=grid(26640), stream=stream0) 2023-01-11T21:38:06.7931372Z buf3 = empty_strided((2, 4, 36, 39), (5616, 1404, 39, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7931529Z triton_fused_upsample_nearest2d_3_3.run(arg0_1, buf3, 11232, grid=grid(11232), stream=stream0) 2023-01-11T21:38:06.7931604Z del arg0_1 2023-01-11T21:38:06.7931711Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7931717Z 2023-01-11T21:38:06.7931721Z 2023-01-11T21:38:06.7931802Z if __name__ == "__main__": 2023-01-11T21:38:06.7931914Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7932039Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7932264Z arg0_1 = rand_strided((2, 4, 37, 38), (5624, 1406, 38, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7932377Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7932382Z 2023-01-11T21:38:06.7932454Z ok (1.268s) 2023-01-11T21:38:06.7932948Z test_upsample_nearest3d_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7933087Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7933345Z [2023-01-11 21:36:18,814] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 941 2023-01-11T21:38:06.7933610Z [2023-01-11 21:36:19,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 941 2023-01-11T21:38:06.7934024Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7934158Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7934409Z [2023-01-11 21:36:19,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 942 2023-01-11T21:38:06.7934414Z 2023-01-11T21:38:06.7934709Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7934794Z import torch 2023-01-11T21:38:06.7934870Z import random 2023-01-11T21:38:06.7934991Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7935116Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7935122Z 2023-01-11T21:38:06.7935203Z aten = torch.ops.aten 2023-01-11T21:38:06.7935334Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7935432Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7935437Z 2023-01-11T21:38:06.7935511Z import triton 2023-01-11T21:38:06.7935657Z import triton.language as tl 2023-01-11T21:38:06.7935786Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7935929Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7935934Z 2023-01-11T21:38:06.7935939Z 2023-01-11T21:38:06.7936170Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0 = async_compile.triton(''' 2023-01-11T21:38:06.7936248Z import triton 2023-01-11T21:38:06.7936337Z import triton.language as tl 2023-01-11T21:38:06.7936455Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7936559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7936696Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7936824Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7936829Z 2023-01-11T21:38:06.7937319Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7937402Z @triton.jit 2023-01-11T21:38:06.7937548Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7937627Z xnumel = 3509376 2023-01-11T21:38:06.7937723Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7937855Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7937943Z xmask = xindex < xnumel 2023-01-11T21:38:06.7938030Z x2 = (xindex // 5928) % 74 2023-01-11T21:38:06.7938114Z x1 = (xindex // 78) % 76 2023-01-11T21:38:06.7938191Z x0 = xindex % 78 2023-01-11T21:38:06.7938266Z x3 = (xindex // 438672) 2023-01-11T21:38:06.7938339Z x5 = xindex 2023-01-11T21:38:06.7938411Z tmp0 = x2 2023-01-11T21:38:06.7938488Z tmp1 = 0.5 2023-01-11T21:38:06.7938569Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7938661Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7938735Z tmp4 = x1 2023-01-11T21:38:06.7938812Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7938898Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7938972Z tmp7 = x0 2023-01-11T21:38:06.7939053Z tmp8 = tmp7 * tmp1 2023-01-11T21:38:06.7939138Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7939434Z tmp10 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7939572Z tmp11 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask) 2023-01-11T21:38:06.7939704Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), 
tmp10, xmask) 2023-01-11T21:38:06.7939839Z tl.store(out_ptr1 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7939928Z ''') 2023-01-11T21:38:06.7939934Z 2023-01-11T21:38:06.7939938Z 2023-01-11T21:38:06.7940127Z triton_fused_upsample_nearest3d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7940205Z import triton 2023-01-11T21:38:06.7940298Z import triton.language as tl 2023-01-11T21:38:06.7940420Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7940525Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7940654Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7940783Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7940788Z 2023-01-11T21:38:06.7941196Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7941272Z @triton.jit 2023-01-11T21:38:06.7941406Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7941484Z xnumel = 3360000 2023-01-11T21:38:06.7941584Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7941716Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7941794Z xmask = xindex < xnumel 2023-01-11T21:38:06.7941913Z x2 = (xindex // 6000) % 70 2023-01-11T21:38:06.7941996Z x1 = (xindex // 80) % 75 2023-01-11T21:38:06.7942074Z x0 = xindex % 80 2023-01-11T21:38:06.7942158Z x3 = (xindex // 420000) 2023-01-11T21:38:06.7942231Z x5 = xindex 2023-01-11T21:38:06.7942305Z tmp0 = x2 2023-01-11T21:38:06.7942380Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7942469Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7942556Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7942631Z tmp4 = x1 2023-01-11T21:38:06.7942714Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7942796Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7942875Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7942950Z tmp8 = x0 2023-01-11T21:38:06.7943026Z tmp9 = 0.4875 2023-01-11T21:38:06.7943110Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7943201Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7943454Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7943593Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7943673Z ''') 2023-01-11T21:38:06.7943686Z 2023-01-11T21:38:06.7943690Z 2023-01-11T21:38:06.7943868Z triton_fused_upsample_nearest3d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7943948Z import triton 2023-01-11T21:38:06.7944046Z import triton.language as tl 2023-01-11T21:38:06.7944162Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7944265Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7944400Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7944527Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7944533Z 2023-01-11T21:38:06.7944933Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7945005Z @triton.jit 2023-01-11T21:38:06.7945136Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7945213Z xnumel = 2743920 
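# (editor note, not part of the generated kernel) Standard Inductor pointwise
# prologue follows: the launcher starts ceil(xnumel / XBLOCK) programs, each
# covering XBLOCK consecutive flat indices from tl.program_id(0) * XBLOCK, and
# xmask = xindex < xnumel guards the final, partially filled block; the x0/x1/x2/x3
# modulo-and-divide splits then decode the flat index into per-axis coordinates
# (here x0 = W, x1 = H, x2 = D, x3 = N*C).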
2023-01-11T21:38:06.7945313Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7945443Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7945558Z xmask = xindex < xnumel 2023-01-11T21:38:06.7945658Z x2 = (xindex // 7622) % 45 2023-01-11T21:38:06.7945745Z x1 = (xindex // 103) % 74 2023-01-11T21:38:06.7945835Z x0 = xindex % 103 2023-01-11T21:38:06.7945929Z x3 = (xindex // 342990) 2023-01-11T21:38:06.7946007Z x5 = xindex 2023-01-11T21:38:06.7946080Z tmp0 = x2 2023-01-11T21:38:06.7946161Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7946235Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7946323Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7946397Z tmp4 = x1 2023-01-11T21:38:06.7946478Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7946558Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7946649Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7946727Z tmp8 = x0 2023-01-11T21:38:06.7946799Z tmp9 = 0.3786407766990291 2023-01-11T21:38:06.7946880Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7946971Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7947224Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7947362Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7947449Z ''') 2023-01-11T21:38:06.7947458Z 2023-01-11T21:38:06.7947463Z 2023-01-11T21:38:06.7947650Z triton_fused_upsample_nearest3d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7947727Z import triton 2023-01-11T21:38:06.7947815Z import triton.language as tl 2023-01-11T21:38:06.7947932Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7948037Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7948173Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7948342Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7948347Z 2023-01-11T21:38:06.7948758Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7948839Z @triton.jit 2023-01-11T21:38:06.7948972Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7949043Z xnumel = 449280 2023-01-11T21:38:06.7949141Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7949272Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7949357Z xmask = xindex < xnumel 2023-01-11T21:38:06.7949443Z x2 = (xindex // 1560) % 36 2023-01-11T21:38:06.7949526Z x1 = (xindex // 40) % 39 2023-01-11T21:38:06.7949605Z x0 = xindex % 40 2023-01-11T21:38:06.7949680Z x3 = (xindex // 56160) 2023-01-11T21:38:06.7949757Z x5 = xindex 2023-01-11T21:38:06.7949830Z tmp0 = x2 2023-01-11T21:38:06.7949910Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7949992Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7950081Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7950148Z tmp4 = x1 2023-01-11T21:38:06.7950231Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7950316Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7950403Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7950477Z tmp8 = x0 2023-01-11T21:38:06.7950555Z tmp9 = 0.975 2023-01-11T21:38:06.7950637Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7950720Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7950857Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask) 2023-01-11T21:38:06.7950994Z 
tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7951086Z ''') 2023-01-11T21:38:06.7951091Z 2023-01-11T21:38:06.7951095Z 2023-01-11T21:38:06.7951194Z async_compile.wait(globals()) 2023-01-11T21:38:06.7951282Z del async_compile 2023-01-11T21:38:06.7951288Z 2023-01-11T21:38:06.7951366Z def call(args): 2023-01-11T21:38:06.7951441Z arg0_1, = args 2023-01-11T21:38:06.7951512Z args.clear() 2023-01-11T21:38:06.7951606Z with torch.cuda.device(0): 2023-01-11T21:38:06.7951881Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952117Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952215Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7952406Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0.run(arg0_1, buf0, buf4, 3509376, grid=grid(3509376), stream=stream0) 2023-01-11T21:38:06.7952643Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7952806Z triton_fused_upsample_nearest3d_1_1.run(arg0_1, buf1, 3360000, grid=grid(3360000), stream=stream0) 2023-01-11T21:38:06.7953045Z buf2 = empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7953211Z triton_fused_upsample_nearest3d_2_2.run(arg0_1, buf2, 2743920, grid=grid(2743920), stream=stream0) 2023-01-11T21:38:06.7953451Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7953612Z triton_fused_upsample_nearest3d_3_3.run(arg0_1, buf3, 449280, grid=grid(449280), stream=stream0) 2023-01-11T21:38:06.7953691Z del arg0_1 2023-01-11T21:38:06.7953802Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7953807Z 2023-01-11T21:38:06.7953812Z 2023-01-11T21:38:06.7953894Z if __name__ == "__main__": 2023-01-11T21:38:06.7954014Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7954137Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7954405Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7954524Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7954529Z 2023-01-11T21:38:06.7954796Z [2023-01-11 21:36:19,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 942 2023-01-11T21:38:06.7954801Z 2023-01-11T21:38:06.7954901Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7954978Z import torch 2023-01-11T21:38:06.7955055Z import random 2023-01-11T21:38:06.7955177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7955322Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7955334Z 2023-01-11T21:38:06.7955418Z aten = torch.ops.aten 2023-01-11T21:38:06.7955575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7955675Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7955680Z 2023-01-11T21:38:06.7955758Z import triton 2023-01-11T21:38:06.7955853Z import triton.language as tl 2023-01-11T21:38:06.7955981Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7956122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7956128Z 2023-01-11T21:38:06.7956132Z 2023-01-11T21:38:06.7956357Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0 
= async_compile.triton(''' 2023-01-11T21:38:06.7956427Z import triton 2023-01-11T21:38:06.7956521Z import triton.language as tl 2023-01-11T21:38:06.7956638Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7956744Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7956882Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7957009Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7957014Z 2023-01-11T21:38:06.7957441Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.7957526Z @triton.jit 2023-01-11T21:38:06.7957664Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7957742Z xnumel = 3509376 2023-01-11T21:38:06.7957871Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7958004Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7958092Z xmask = xindex < xnumel 2023-01-11T21:38:06.7958176Z x2 = (xindex // 5928) % 74 2023-01-11T21:38:06.7958262Z x1 = (xindex // 78) % 76 2023-01-11T21:38:06.7958334Z x0 = xindex % 78 2023-01-11T21:38:06.7958415Z x3 = (xindex // 438672) 2023-01-11T21:38:06.7958489Z x5 = xindex 2023-01-11T21:38:06.7958563Z tmp0 = x2 2023-01-11T21:38:06.7958639Z tmp1 = 0.5 2023-01-11T21:38:06.7958721Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7958802Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7958879Z tmp4 = x1 2023-01-11T21:38:06.7958960Z tmp5 = tmp4 * tmp1 2023-01-11T21:38:06.7959046Z tmp6 = tmp5.to(tl.int32) 2023-01-11T21:38:06.7959120Z tmp7 = x0 2023-01-11T21:38:06.7959202Z tmp8 = tmp7 * tmp1 2023-01-11T21:38:06.7959287Z tmp9 = tmp8.to(tl.int32) 2023-01-11T21:38:06.7959556Z tmp10 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7959707Z tmp11 = tl.load(in_ptr0 + (tmp9 + (39*tmp6) + (1482*tmp3) + (54834*x3)), xmask).to(tl.float32) 2023-01-11T21:38:06.7959842Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.7959976Z tl.store(out_ptr1 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7960063Z ''') 2023-01-11T21:38:06.7960069Z 2023-01-11T21:38:06.7960074Z 2023-01-11T21:38:06.7960261Z triton_fused_upsample_nearest3d_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7960368Z import triton 2023-01-11T21:38:06.7960464Z import triton.language as tl 2023-01-11T21:38:06.7960574Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7960679Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7960816Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7960945Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7960950Z 2023-01-11T21:38:06.7961363Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7961440Z @triton.jit 2023-01-11T21:38:06.7961573Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7961652Z xnumel = 3360000 2023-01-11T21:38:06.7961745Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7961877Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7961967Z xmask = xindex < xnumel 2023-01-11T21:38:06.7962051Z x2 = (xindex // 6000) % 70 2023-01-11T21:38:06.7962133Z x1 = (xindex // 80) % 75 2023-01-11T21:38:06.7962212Z x0 = xindex % 80 2023-01-11T21:38:06.7962295Z x3 = (xindex // 420000) 2023-01-11T21:38:06.7962361Z x5 = xindex 2023-01-11T21:38:06.7962439Z tmp0 = x2 2023-01-11T21:38:06.7962522Z tmp1 = 0.5285714285714286 2023-01-11T21:38:06.7962604Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7962692Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7962766Z tmp4 = x1 2023-01-11T21:38:06.7962847Z tmp5 = 0.5066666666666667 2023-01-11T21:38:06.7962921Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7963008Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7963082Z tmp8 = x0 2023-01-11T21:38:06.7963161Z tmp9 = 0.4875 2023-01-11T21:38:06.7963244Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7963340Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7963605Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7963746Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7963833Z ''') 2023-01-11T21:38:06.7963838Z 2023-01-11T21:38:06.7963843Z 2023-01-11T21:38:06.7964066Z triton_fused_upsample_nearest3d_2_2 = async_compile.triton(''' 2023-01-11T21:38:06.7964145Z import triton 2023-01-11T21:38:06.7964240Z import triton.language as tl 2023-01-11T21:38:06.7964357Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7964461Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7964588Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7964716Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7964721Z 2023-01-11T21:38:06.7965126Z @pointwise(size_hints=[4194304], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7965204Z @triton.jit 2023-01-11T21:38:06.7965337Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7965417Z xnumel = 2743920 2023-01-11T21:38:06.7965516Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7965649Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7965742Z xmask = xindex < xnumel 2023-01-11T21:38:06.7965836Z x2 = (xindex // 7622) % 45 2023-01-11T21:38:06.7965933Z x1 = (xindex // 103) % 74 2023-01-11T21:38:06.7966022Z x0 = xindex % 103 2023-01-11T21:38:06.7966103Z x3 = (xindex // 342990) 2023-01-11T21:38:06.7966176Z x5 = xindex 2023-01-11T21:38:06.7966249Z tmp0 = x2 2023-01-11T21:38:06.7966323Z tmp1 = 0.8222222222222222 2023-01-11T21:38:06.7966403Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7966493Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7966568Z tmp4 = x1 2023-01-11T21:38:06.7966680Z tmp5 = 0.5135135135135135 2023-01-11T21:38:06.7966761Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7966849Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7966916Z tmp8 = x0 2023-01-11T21:38:06.7966996Z tmp9 = 0.3786407766990291 2023-01-11T21:38:06.7967077Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7967170Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7967444Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7967583Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), 
tmp12, xmask) 2023-01-11T21:38:06.7967671Z ''') 2023-01-11T21:38:06.7967677Z 2023-01-11T21:38:06.7967681Z 2023-01-11T21:38:06.7967862Z triton_fused_upsample_nearest3d_3_3 = async_compile.triton(''' 2023-01-11T21:38:06.7967939Z import triton 2023-01-11T21:38:06.7968033Z import triton.language as tl 2023-01-11T21:38:06.7968150Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7968256Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7968390Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.7968518Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7968523Z 2023-01-11T21:38:06.7968933Z @pointwise(size_hints=[524288], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7969003Z @triton.jit 2023-01-11T21:38:06.7969138Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.7969215Z xnumel = 449280 2023-01-11T21:38:06.7969314Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7969446Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.7969532Z xmask = xindex < xnumel 2023-01-11T21:38:06.7969619Z x2 = (xindex // 1560) % 36 2023-01-11T21:38:06.7969695Z x1 = (xindex // 40) % 39 2023-01-11T21:38:06.7969775Z x0 = xindex % 40 2023-01-11T21:38:06.7969860Z x3 = (xindex // 56160) 2023-01-11T21:38:06.7969934Z x5 = xindex 2023-01-11T21:38:06.7970006Z tmp0 = x2 2023-01-11T21:38:06.7970092Z tmp1 = 1.0277777777777777 2023-01-11T21:38:06.7970174Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.7970284Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.7970360Z tmp4 = x1 2023-01-11T21:38:06.7970440Z tmp5 = 0.9743589743589743 2023-01-11T21:38:06.7970521Z tmp6 = tmp4 * tmp5 2023-01-11T21:38:06.7970606Z tmp7 = tmp6.to(tl.int32) 2023-01-11T21:38:06.7970681Z tmp8 = x0 2023-01-11T21:38:06.7970757Z tmp9 = 0.975 2023-01-11T21:38:06.7970832Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.7970921Z tmp11 = tmp10.to(tl.int32) 2023-01-11T21:38:06.7971076Z tmp12 = tl.load(in_ptr0 + (tmp11 + (39*tmp7) + (1482*tmp3) + (54834*x3)), xmask).to(tl.float32) 2023-01-11T21:38:06.7971212Z tl.store(out_ptr0 + (x5 + tl.zeros([XBLOCK], tl.int32)), tmp12, xmask) 2023-01-11T21:38:06.7971306Z ''') 2023-01-11T21:38:06.7971311Z 2023-01-11T21:38:06.7971315Z 2023-01-11T21:38:06.7971411Z async_compile.wait(globals()) 2023-01-11T21:38:06.7971493Z del async_compile 2023-01-11T21:38:06.7971498Z 2023-01-11T21:38:06.7971568Z def call(args): 2023-01-11T21:38:06.7971644Z arg0_1, = args 2023-01-11T21:38:06.7971721Z args.clear() 2023-01-11T21:38:06.7971818Z with torch.cuda.device(0): 2023-01-11T21:38:06.7972070Z buf0 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7972307Z buf4 = empty_strided((2, 4, 74, 76, 78), (1754688, 438672, 5928, 78, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7972403Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7972595Z triton_fused_upsample_nearest3d_upsample_nearest3d_4_0.run(arg0_1, buf0, buf4, 3509376, grid=grid(3509376), stream=stream0) 2023-01-11T21:38:06.7972823Z buf1 = empty_strided((2, 4, 70, 75, 80), (1680000, 420000, 6000, 80, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973014Z triton_fused_upsample_nearest3d_1_1.run(arg0_1, buf1, 3360000, grid=grid(3360000), stream=stream0) 2023-01-11T21:38:06.7973259Z buf2 = 
empty_strided((2, 4, 45, 74, 103), (1371960, 342990, 7622, 103, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973424Z triton_fused_upsample_nearest3d_2_2.run(arg0_1, buf2, 2743920, grid=grid(2743920), stream=stream0) 2023-01-11T21:38:06.7973662Z buf3 = empty_strided((2, 4, 36, 39, 40), (224640, 56160, 1560, 40, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.7973824Z triton_fused_upsample_nearest3d_3_3.run(arg0_1, buf3, 449280, grid=grid(449280), stream=stream0) 2023-01-11T21:38:06.7973901Z del arg0_1 2023-01-11T21:38:06.7974009Z return (buf0, buf1, buf2, buf3, buf4, ) 2023-01-11T21:38:06.7974014Z 2023-01-11T21:38:06.7974019Z 2023-01-11T21:38:06.7974102Z if __name__ == "__main__": 2023-01-11T21:38:06.7974220Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7974349Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7974807Z arg0_1 = rand_strided((2, 4, 37, 38, 39), (219336, 54834, 1482, 39, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.7974923Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7974933Z 2023-01-11T21:38:06.7975004Z ok (1.628s) 2023-01-11T21:38:06.7975463Z test_var_mean_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7975597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7975858Z [2023-01-11 21:36:19,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 943 2023-01-11T21:38:06.7976071Z [2023-01-11 21:36:19,935] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7976267Z [2023-01-11 21:36:19,941] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.7976584Z [2023-01-11 21:36:20,144] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 943 2023-01-11T21:38:06.7977006Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.7977197Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.7977460Z [2023-01-11 21:36:20,166] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 944 2023-01-11T21:38:06.7977670Z [2023-01-11 21:36:20,199] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.7977872Z [2023-01-11 21:36:20,199] torch._inductor.scheduler: [DEBUG] remove_buffer('buf2') 2023-01-11T21:38:06.7978075Z [2023-01-11 21:36:20,204] torch._inductor.scheduler: [DEBUG] remove_buffer('buf7') 2023-01-11T21:38:06.7978273Z [2023-01-11 21:36:20,204] torch._inductor.scheduler: [DEBUG] remove_buffer('buf5') 2023-01-11T21:38:06.7978278Z 2023-01-11T21:38:06.7978378Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7978445Z import torch 2023-01-11T21:38:06.7978521Z import random 2023-01-11T21:38:06.7978641Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7978766Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7978771Z 2023-01-11T21:38:06.7978853Z aten = torch.ops.aten 2023-01-11T21:38:06.7978990Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7979127Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7979133Z 2023-01-11T21:38:06.7979200Z import triton 2023-01-11T21:38:06.7979291Z import triton.language as tl 2023-01-11T21:38:06.7979416Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7979557Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7979565Z 2023-01-11T21:38:06.7979570Z 2023-01-11T21:38:06.7979764Z triton_fused_getitem_getitem_1_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.7979839Z import triton 2023-01-11T21:38:06.7979931Z import triton.language as tl 2023-01-11T21:38:06.7980046Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7980141Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7980273Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7980399Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7980404Z 2023-01-11T21:38:06.7980493Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7980611Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7980698Z filename=__file__, 2023-01-11T21:38:06.7981108Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7981184Z @triton.jit 2023-01-11T21:38:06.7981380Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7981458Z xnumel = 8 2023-01-11T21:38:06.7981532Z rnumel = 8 2023-01-11T21:38:06.7981633Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7981780Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7981872Z xmask = xindex < xnumel 2023-01-11T21:38:06.7981998Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7982063Z x0 = xindex 2023-01-11T21:38:06.7982191Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7982302Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7982397Z rindex = roffset + rbase 2023-01-11T21:38:06.7982485Z rmask = rindex < rnumel 
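# Masked accumulate: tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) keeps lanes
# past rnumel/xnumel from polluting the running sum, so a partial final RBLOCK
# slice is harmless; tl.sum below then collapses the R axis to one total per row.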
2023-01-11T21:38:06.7982557Z r1 = rindex 2023-01-11T21:38:06.7982847Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7982965Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7983081Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7983198Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7983314Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7983419Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7983508Z rindex = roffset + rbase 2023-01-11T21:38:06.7983595Z rmask = rindex < rnumel 2023-01-11T21:38:06.7983662Z r1 = rindex 2023-01-11T21:38:06.7983878Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7983997Z tmp8 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask) 2023-01-11T21:38:06.7984069Z tmp3 = 8 2023-01-11T21:38:06.7984151Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7984267Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7984349Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7984464Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7984584Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7984697Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7984807Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7984884Z tmp10 = 7 2023-01-11T21:38:06.7984967Z tmp11 = tmp7 / tmp10 2023-01-11T21:38:06.7985038Z tmp12 = 8 2023-01-11T21:38:06.7985110Z tmp13 = tmp9 / tmp12 2023-01-11T21:38:06.7985250Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7985445Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7985530Z ''') 2023-01-11T21:38:06.7985535Z 2023-01-11T21:38:06.7985540Z 2023-01-11T21:38:06.7985740Z triton_fused_getitem_2_getitem_3_var_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.7985818Z import triton 2023-01-11T21:38:06.7985912Z import triton.language as tl 2023-01-11T21:38:06.7986031Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7986127Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7986259Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7986387Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7986392Z 2023-01-11T21:38:06.7986482Z @reduction(size_hints=[4, 16], 2023-01-11T21:38:06.7986601Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.7986686Z filename=__file__, 2023-01-11T21:38:06.7987098Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0', 'in_out_ptr1'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.7987171Z @triton.jit 2023-01-11T21:38:06.7987349Z def triton_(in_out_ptr0, in_out_ptr1, in_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7987424Z xnumel = 4 2023-01-11T21:38:06.7987495Z rnumel = 16 2023-01-11T21:38:06.7987593Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7987727Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7987813Z xmask = xindex < xnumel 2023-01-11T21:38:06.7987932Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7987995Z x0 = xindex 2023-01-11T21:38:06.7988111Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 
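# Same two-pass scheme as the kernel above, now reducing rnumel = 16 elements
# (dims 1 and 3 of the (1, 2, 4, 8) input) per output row: the first loop
# accumulates sum(x) so the mean is available, the second re-reads x to
# accumulate (x - mean)**2 alongside sum(x). Per output row this amounts to
#   mean = x.sum() / 16
#   var  = ((x - mean) ** 2).sum() / 15   # divisor N - 1: torch.var_mean's unbiased default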
2023-01-11T21:38:06.7988221Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7988308Z rindex = roffset + rbase 2023-01-11T21:38:06.7988393Z rmask = rindex < rnumel 2023-01-11T21:38:06.7988469Z r1 = rindex % 8 2023-01-11T21:38:06.7988549Z r2 = (rindex // 8) 2023-01-11T21:38:06.7988808Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7988933Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.7989048Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7989163Z _tmp7 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7989278Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7989383Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7989474Z rindex = roffset + rbase 2023-01-11T21:38:06.7989560Z rmask = rindex < rnumel 2023-01-11T21:38:06.7989629Z r1 = rindex % 8 2023-01-11T21:38:06.7989710Z r2 = (rindex // 8) 2023-01-11T21:38:06.7989935Z tmp2 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.7990059Z tmp8 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask) 2023-01-11T21:38:06.7990132Z tmp3 = 16 2023-01-11T21:38:06.7990216Z tmp4 = tmp1 / tmp3 2023-01-11T21:38:06.7990330Z tmp5 = tmp2 - tmp4 2023-01-11T21:38:06.7990403Z tmp6 = tmp5 * tmp5 2023-01-11T21:38:06.7990526Z _tmp7 = tl.where(xmask & rmask, _tmp7 + tmp6, _tmp7) 2023-01-11T21:38:06.7990643Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.7990758Z tmp7 = tl.reshape(tl.sum(_tmp7, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7990869Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7990942Z tmp10 = 15 2023-01-11T21:38:06.7991026Z tmp11 = tmp7 / tmp10 2023-01-11T21:38:06.7991091Z tmp12 = 16 2023-01-11T21:38:06.7991172Z tmp13 = tmp9 / tmp12 2023-01-11T21:38:06.7991349Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp11, xmask) 2023-01-11T21:38:06.7991487Z tl.store(in_out_ptr1 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp13, xmask) 2023-01-11T21:38:06.7991572Z ''') 2023-01-11T21:38:06.7991578Z 2023-01-11T21:38:06.7991582Z 2023-01-11T21:38:06.7991680Z async_compile.wait(globals()) 2023-01-11T21:38:06.7991760Z del async_compile 2023-01-11T21:38:06.7991765Z 2023-01-11T21:38:06.7991832Z def call(args): 2023-01-11T21:38:06.7991907Z arg0_1, = args 2023-01-11T21:38:06.7991982Z args.clear() 2023-01-11T21:38:06.7992075Z with torch.cuda.device(0): 2023-01-11T21:38:06.7992285Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7992488Z buf2 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7992580Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.7992670Z buf4 = buf2; del buf2 # reuse 2023-01-11T21:38:06.7992759Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.7992929Z triton_fused_getitem_getitem_1_var_mean_0.run(buf3, buf4, arg0_1, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.7993130Z buf6 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7993330Z buf7 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.7993421Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.7993513Z buf9 = buf7; del buf7 # reuse 2023-01-11T21:38:06.7993681Z triton_fused_getitem_2_getitem_3_var_mean_1_1.run(buf8, buf9, arg0_1, 4, 16, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.7993747Z del arg0_1 
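# The buf3 = buf1 / buf4 = buf2 aliases above are Inductor's buffer reuse: the
# sums are finalized in place through in_out_ptr0/in_out_ptr1 rather than into
# fresh allocations, and arg0_1 is released right after the launches that read
# it, returning the memory to the caching allocator.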
2023-01-11T21:38:06.7993844Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.7993850Z 2023-01-11T21:38:06.7993854Z 2023-01-11T21:38:06.7993933Z if __name__ == "__main__": 2023-01-11T21:38:06.7994051Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.7994182Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.7994398Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.7994510Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.7994515Z 2023-01-11T21:38:06.7994809Z [2023-01-11 21:36:20,303] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 944 2023-01-11T21:38:06.7994815Z 2023-01-11T21:38:06.7994914Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.7994981Z import torch 2023-01-11T21:38:06.7995057Z import random 2023-01-11T21:38:06.7995183Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.7995327Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.7995332Z 2023-01-11T21:38:06.7995433Z aten = torch.ops.aten 2023-01-11T21:38:06.7995575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.7995672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.7995680Z 2023-01-11T21:38:06.7995746Z import triton 2023-01-11T21:38:06.7995838Z import triton.language as tl 2023-01-11T21:38:06.7995961Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.7996100Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.7996105Z 2023-01-11T21:38:06.7996109Z 2023-01-11T21:38:06.7996304Z triton_fused_getitem_getitem_1_var_mean_0 = async_compile.triton(''' 2023-01-11T21:38:06.7996384Z import triton 2023-01-11T21:38:06.7996477Z import triton.language as tl 2023-01-11T21:38:06.7996590Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.7996685Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.7996816Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.7996942Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.7996947Z 2023-01-11T21:38:06.7997036Z @reduction(size_hints=[8, 8], 2023-01-11T21:38:06.7997151Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.7997266Z filename=__file__, 2023-01-11T21:38:06.7997655Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.7997732Z @triton.jit 2023-01-11T21:38:06.7997905Z def triton_(in_out_ptr0, in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.7997977Z xnumel = 8 2023-01-11T21:38:06.7998048Z rnumel = 8 2023-01-11T21:38:06.7998146Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.7998281Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.7998364Z xmask = xindex < xnumel 2023-01-11T21:38:06.7998483Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.7998547Z x0 = xindex 2023-01-11T21:38:06.7998663Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7998773Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.7998860Z rindex = roffset + rbase 2023-01-11T21:38:06.7998946Z rmask = rindex < rnumel 2023-01-11T21:38:06.7999020Z r1 = rindex 2023-01-11T21:38:06.7999263Z tmp0 = tl.load(in_ptr0 + (r1 + 
(8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.7999348Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.7999468Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.7999583Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.7999699Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7999817Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.7999922Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8000009Z rindex = roffset + rbase 2023-01-11T21:38:06.8000087Z rmask = rindex < rnumel 2023-01-11T21:38:06.8000162Z r1 = rindex 2023-01-11T21:38:06.8000399Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8000532Z tmp10 = tl.load(in_ptr0 + (r1 + (8*x0)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.8000605Z tmp4 = 8 2023-01-11T21:38:06.8000716Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.8000808Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.8000923Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.8000997Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.8001119Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.8001211Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.8001335Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.8001448Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8001563Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8001637Z tmp13 = 8 2023-01-11T21:38:06.8001714Z tmp14 = tmp12 / tmp13 2023-01-11T21:38:06.8001804Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.8001875Z tmp16 = 7 2023-01-11T21:38:06.8001957Z tmp17 = tmp9 / tmp16 2023-01-11T21:38:06.8002094Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.8002236Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8002320Z ''') 2023-01-11T21:38:06.8002325Z 2023-01-11T21:38:06.8002330Z 2023-01-11T21:38:06.8002519Z triton_fused_getitem_2_getitem_3_var_mean_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.8002594Z import triton 2023-01-11T21:38:06.8002687Z import triton.language as tl 2023-01-11T21:38:06.8002802Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8002905Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8003037Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.8003162Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8003194Z 2023-01-11T21:38:06.8003286Z @reduction(size_hints=[4, 16], 2023-01-11T21:38:06.8003397Z reduction_hint=ReductionHint.DEFAULT, 2023-01-11T21:38:06.8003482Z filename=__file__, 2023-01-11T21:38:06.8003871Z meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8003948Z @triton.jit 2023-01-11T21:38:06.8004126Z def triton_(in_out_ptr0, in_ptr0, out_ptr2, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.8004201Z xnumel = 4 2023-01-11T21:38:06.8004274Z rnumel = 16 2023-01-11T21:38:06.8004364Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8004497Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.8004581Z xmask = xindex < xnumel 2023-01-11T21:38:06.8004704Z rbase = 
tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.8004775Z x0 = xindex 2023-01-11T21:38:06.8004893Z _tmp2 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8004998Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8005079Z rindex = roffset + rbase 2023-01-11T21:38:06.8005170Z rmask = rindex < rnumel 2023-01-11T21:38:06.8005250Z r1 = rindex % 8 2023-01-11T21:38:06.8005336Z r2 = (rindex // 8) 2023-01-11T21:38:06.8005630Z tmp0 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8005722Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.8005845Z _tmp2 = tl.where(xmask & rmask, _tmp2 + tmp1, _tmp2) 2023-01-11T21:38:06.8005959Z tmp2 = tl.reshape(tl.sum(_tmp2, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8006069Z _tmp9 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8006187Z _tmp12 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.8006294Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.8006382Z rindex = roffset + rbase 2023-01-11T21:38:06.8006467Z rmask = rindex < rnumel 2023-01-11T21:38:06.8006543Z r1 = rindex % 8 2023-01-11T21:38:06.8006624Z r2 = (rindex // 8) 2023-01-11T21:38:06.8006894Z tmp3 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8007038Z tmp10 = tl.load(in_ptr0 + (r1 + (8*x0) + (32*r2)), rmask & xmask).to(tl.float32) 2023-01-11T21:38:06.8007113Z tmp4 = 16 2023-01-11T21:38:06.8007196Z tmp5 = tmp2 / tmp4 2023-01-11T21:38:06.8007287Z tmp6 = tmp5.to(tl.float32) 2023-01-11T21:38:06.8007400Z tmp7 = tmp3 - tmp6 2023-01-11T21:38:06.8007480Z tmp8 = tmp7 * tmp7 2023-01-11T21:38:06.8007594Z _tmp9 = tl.where(xmask & rmask, _tmp9 + tmp8, _tmp9) 2023-01-11T21:38:06.8007685Z tmp11 = tmp10.to(tl.float32) 2023-01-11T21:38:06.8007812Z _tmp12 = tl.where(xmask & rmask, _tmp12 + tmp11, _tmp12) 2023-01-11T21:38:06.8007926Z tmp9 = tl.reshape(tl.sum(_tmp9, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8008041Z tmp12 = tl.reshape(tl.sum(_tmp12, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.8008116Z tmp13 = 16 2023-01-11T21:38:06.8008200Z tmp14 = tmp12 / tmp13 2023-01-11T21:38:06.8008281Z tmp15 = tmp14.to(tl.float32) 2023-01-11T21:38:06.8008353Z tmp16 = 15 2023-01-11T21:38:06.8008434Z tmp17 = tmp9 / tmp16 2023-01-11T21:38:06.8008571Z tl.store(out_ptr2 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp15, xmask) 2023-01-11T21:38:06.8008708Z tl.store(in_out_ptr0 + (x0 + tl.zeros([XBLOCK, 1], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8008793Z ''') 2023-01-11T21:38:06.8008799Z 2023-01-11T21:38:06.8008803Z 2023-01-11T21:38:06.8008897Z async_compile.wait(globals()) 2023-01-11T21:38:06.8008967Z del async_compile 2023-01-11T21:38:06.8008979Z 2023-01-11T21:38:06.8009046Z def call(args): 2023-01-11T21:38:06.8009155Z arg0_1, = args 2023-01-11T21:38:06.8009229Z args.clear() 2023-01-11T21:38:06.8009324Z with torch.cuda.device(0): 2023-01-11T21:38:06.8009531Z buf1 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8009740Z buf4 = empty_strided((1, 2, 4), (8, 4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8009833Z buf3 = buf1; del buf1 # reuse 2023-01-11T21:38:06.8009918Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8010083Z triton_fused_getitem_getitem_1_var_mean_0.run(buf3, arg0_1, buf4, 8, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8010279Z buf6 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8010476Z 
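# fp16 variant of the same pair of reductions: every load above upcasts via
# .to(tl.float32) and the accumulators stay tl.float32, so only the final
# stores round to half precision.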
buf9 = empty_strided((1, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8010565Z buf8 = buf6; del buf6 # reuse 2023-01-11T21:38:06.8010735Z triton_fused_getitem_2_getitem_3_var_mean_1_1.run(buf8, arg0_1, buf9, 4, 16, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.8010812Z del arg0_1 2023-01-11T21:38:06.8010900Z return (buf3, buf4, buf8, buf9, ) 2023-01-11T21:38:06.8010913Z 2023-01-11T21:38:06.8010918Z 2023-01-11T21:38:06.8010992Z if __name__ == "__main__": 2023-01-11T21:38:06.8011115Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8011241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8011457Z arg0_1 = rand_strided((1, 2, 4, 8), (64, 32, 8, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8011571Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8011576Z 2023-01-11T21:38:06.8011647Z ok (0.429s) 2023-01-11T21:38:06.8012109Z test_vdd_clamp_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8012246Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8012535Z [2023-01-11 21:36:20,341] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 945 2023-01-11T21:38:06.8012792Z [2023-01-11 21:36:20,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 945 2023-01-11T21:38:06.8013208Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8013340Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8013599Z [2023-01-11 21:36:20,447] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 946 2023-01-11T21:38:06.8013861Z [2023-01-11 21:36:20,523] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 946 2023-01-11T21:38:06.8013867Z 2023-01-11T21:38:06.8013968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8014044Z import torch 2023-01-11T21:38:06.8014119Z import random 2023-01-11T21:38:06.8014239Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8014357Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8014362Z 2023-01-11T21:38:06.8014447Z aten = torch.ops.aten 2023-01-11T21:38:06.8014782Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8014881Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8014887Z 2023-01-11T21:38:06.8014962Z import triton 2023-01-11T21:38:06.8015054Z import triton.language as tl 2023-01-11T21:38:06.8015179Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8015369Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8015375Z 2023-01-11T21:38:06.8015380Z 2023-01-11T21:38:06.8015568Z triton_fused_ge_maximum_0 = async_compile.triton(''' 2023-01-11T21:38:06.8015662Z import triton 2023-01-11T21:38:06.8015766Z import triton.language as tl 2023-01-11T21:38:06.8015881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8015983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8016117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8016243Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8016248Z 2023-01-11T21:38:06.8016665Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8016735Z @triton.jit 2023-01-11T21:38:06.8016879Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8016956Z xnumel = 16 2023-01-11T21:38:06.8017052Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8017233Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8017321Z xmask = xindex < xnumel 2023-01-11T21:38:06.8017392Z x0 = xindex 2023-01-11T21:38:06.8017577Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.8017677Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8017750Z tmp1 = 3.0 2023-01-11T21:38:06.8017887Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, tmp0, tmp1)) 2023-01-11T21:38:06.8017958Z tmp4 = 3 2023-01-11T21:38:06.8018039Z tmp5 = tmp3 >= tmp4 2023-01-11T21:38:06.8018174Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8018300Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8018391Z ''') 2023-01-11T21:38:06.8018397Z 2023-01-11T21:38:06.8018401Z 2023-01-11T21:38:06.8018493Z async_compile.wait(globals()) 2023-01-11T21:38:06.8018570Z del async_compile 2023-01-11T21:38:06.8018575Z 2023-01-11T21:38:06.8018656Z def call(args): 2023-01-11T21:38:06.8018736Z primals_1, 
= args 2023-01-11T21:38:06.8018851Z args.clear() 2023-01-11T21:38:06.8018946Z with torch.cuda.device(0): 2023-01-11T21:38:06.8019138Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8019330Z buf1 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8019423Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8019580Z triton_fused_ge_maximum_0.run(primals_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8019658Z del primals_1 2023-01-11T21:38:06.8019742Z return (buf0, buf1, ) 2023-01-11T21:38:06.8019747Z 2023-01-11T21:38:06.8019755Z 2023-01-11T21:38:06.8019835Z if __name__ == "__main__": 2023-01-11T21:38:06.8019957Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8020076Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8020281Z primals_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8020402Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.8020408Z 2023-01-11T21:38:06.8020412Z 2023-01-11T21:38:06.8020511Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8020587Z import torch 2023-01-11T21:38:06.8020662Z import random 2023-01-11T21:38:06.8020784Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8020907Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8020912Z 2023-01-11T21:38:06.8020987Z aten = torch.ops.aten 2023-01-11T21:38:06.8021122Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8021216Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8021248Z 2023-01-11T21:38:06.8021324Z import triton 2023-01-11T21:38:06.8021415Z import triton.language as tl 2023-01-11T21:38:06.8021541Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8021680Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8021685Z 2023-01-11T21:38:06.8021690Z 2023-01-11T21:38:06.8021858Z triton_fused_ge_maximum_0 = async_compile.triton(''' 2023-01-11T21:38:06.8021926Z import triton 2023-01-11T21:38:06.8022019Z import triton.language as tl 2023-01-11T21:38:06.8022131Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8022234Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8022367Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8022493Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8022498Z 2023-01-11T21:38:06.8022915Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*i1', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8022994Z @triton.jit 2023-01-11T21:38:06.8023130Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8023204Z xnumel = 16 2023-01-11T21:38:06.8023307Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8023435Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8023519Z xmask = xindex < xnumel 2023-01-11T21:38:06.8023589Z x0 = xindex 2023-01-11T21:38:06.8023802Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8023913Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8023986Z tmp1 = 3.0 2023-01-11T21:38:06.8024124Z tmp2 = tl.where(tmp0 != tmp0, tmp0, tl.where(tmp0 > tmp1, 
tmp0, tmp1)) 2023-01-11T21:38:06.8024196Z tmp4 = 3 2023-01-11T21:38:06.8024281Z tmp5 = tmp3 >= tmp4 2023-01-11T21:38:06.8024415Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8024548Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8024625Z ''') 2023-01-11T21:38:06.8024631Z 2023-01-11T21:38:06.8024643Z 2023-01-11T21:38:06.8024757Z async_compile.wait(globals()) 2023-01-11T21:38:06.8024836Z del async_compile 2023-01-11T21:38:06.8024841Z 2023-01-11T21:38:06.8024915Z def call(args): 2023-01-11T21:38:06.8024994Z primals_1, = args 2023-01-11T21:38:06.8025069Z args.clear() 2023-01-11T21:38:06.8025162Z with torch.cuda.device(0): 2023-01-11T21:38:06.8025361Z buf0 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8025546Z buf1 = empty_strided((16, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8025639Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8025794Z triton_fused_ge_maximum_0.run(primals_1, buf0, buf1, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8025874Z del primals_1 2023-01-11T21:38:06.8025957Z return (buf0, buf1, ) 2023-01-11T21:38:06.8025962Z 2023-01-11T21:38:06.8025967Z 2023-01-11T21:38:06.8026046Z if __name__ == "__main__": 2023-01-11T21:38:06.8026164Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8026285Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8026489Z primals_1 = rand_strided((16, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8026607Z print_performance(lambda: call([primals_1])) 2023-01-11T21:38:06.8026612Z 2023-01-11T21:38:06.8026682Z ok (0.219s) 2023-01-11T21:38:06.8027148Z test_vertical_fusion1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8027305Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8027564Z [2023-01-11 21:36:20,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 947 2023-01-11T21:38:06.8027829Z [2023-01-11 21:36:20,672] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 947 2023-01-11T21:38:06.8028244Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8028373Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8028629Z [2023-01-11 21:36:20,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 948 2023-01-11T21:38:06.8028887Z [2023-01-11 21:36:20,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 948 2023-01-11T21:38:06.8028900Z 2023-01-11T21:38:06.8028991Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8029068Z import torch 2023-01-11T21:38:06.8029142Z import random 2023-01-11T21:38:06.8029262Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8029385Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8029390Z 2023-01-11T21:38:06.8029472Z aten = torch.ops.aten 2023-01-11T21:38:06.8029608Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8029697Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8029702Z 2023-01-11T21:38:06.8029776Z import triton 2023-01-11T21:38:06.8029868Z import triton.language as tl 2023-01-11T21:38:06.8029993Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8030136Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8030141Z 2023-01-11T21:38:06.8030146Z 2023-01-11T21:38:06.8030303Z triton_fused_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.8030377Z import triton 2023-01-11T21:38:06.8030471Z import triton.language as tl 2023-01-11T21:38:06.8030604Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8030707Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8030839Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8030963Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8030969Z 2023-01-11T21:38:06.8038605Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: '*fp32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8038698Z @triton.jit 2023-01-11T21:38:06.8038862Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8038941Z xnumel = 1082016 2023-01-11T21:38:06.8039040Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8039168Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8039249Z xmask = xindex < xnumel 2023-01-11T21:38:06.8039326Z x2 = xindex 2023-01-11T21:38:06.8039404Z x0 = xindex % 26 2023-01-11T21:38:06.8039504Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.8039601Z tmp8 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.8039694Z tmp15 = tl.load(in_ptr2 + (x0), xmask) 2023-01-11T21:38:06.8039843Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.8039924Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8040048Z tmp3 = -1.988366587925593e-08 2023-01-11T21:38:06.8040130Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8040210Z tmp5 = tmp0 * tmp4 2023-01-11T21:38:06.8040331Z tmp6 = -3.087032500374211e-07 2023-01-11T21:38:06.8040472Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.8040594Z tmp9 = 1.55093272922008e-10 2023-01-11T21:38:06.8040675Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.8040757Z tmp11 = tmp7 + tmp10 2023-01-11T21:38:06.8040835Z tmp12 = 1 / tmp11 2023-01-11T21:38:06.8040909Z tmp13 = 1.0 
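# The tmp chain here is one fused elementwise expression, the "vertical fusion"
# the test name refers to: a Horner-style polynomial
#   t = x * (x * c1 + c2) + c3 + y * c4   # c1..c4 are the small literals above
# followed by out = 1 / t + t * w, where w (in_ptr2) broadcasts along the last
# axis via x0 = xindex % 26. One kernel launch and one trip through global
# memory instead of one per aten op.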
2023-01-11T21:38:06.8040988Z tmp14 = tmp12 * tmp13 2023-01-11T21:38:06.8041070Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.8041152Z tmp17 = tmp14 + tmp16 2023-01-11T21:38:06.8041290Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8041376Z ''') 2023-01-11T21:38:06.8041382Z 2023-01-11T21:38:06.8041387Z 2023-01-11T21:38:06.8041483Z async_compile.wait(globals()) 2023-01-11T21:38:06.8041562Z del async_compile 2023-01-11T21:38:06.8041568Z 2023-01-11T21:38:06.8041642Z def call(args): 2023-01-11T21:38:06.8041723Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.8041801Z args.clear() 2023-01-11T21:38:06.8041897Z with torch.cuda.device(0): 2023-01-11T21:38:06.8042122Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8042216Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8042372Z triton_fused_add_3_0.run(arg1_1, arg0_1, arg2_1, buf0, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.8042449Z del arg0_1 2023-01-11T21:38:06.8042514Z del arg1_1 2023-01-11T21:38:06.8042587Z del arg2_1 2023-01-11T21:38:06.8042665Z return (buf0, ) 2023-01-11T21:38:06.8042671Z 2023-01-11T21:38:06.8042675Z 2023-01-11T21:38:06.8042759Z if __name__ == "__main__": 2023-01-11T21:38:06.8042878Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8043007Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8043226Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043445Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043638Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8043768Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.8043773Z 2023-01-11T21:38:06.8043777Z 2023-01-11T21:38:06.8043876Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8043979Z import torch 2023-01-11T21:38:06.8044056Z import random 2023-01-11T21:38:06.8044177Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8044302Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8044307Z 2023-01-11T21:38:06.8044389Z aten = torch.ops.aten 2023-01-11T21:38:06.8044519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8044615Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8044620Z 2023-01-11T21:38:06.8044694Z import triton 2023-01-11T21:38:06.8044786Z import triton.language as tl 2023-01-11T21:38:06.8044911Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8045052Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8045058Z 2023-01-11T21:38:06.8045062Z 2023-01-11T21:38:06.8045219Z triton_fused_add_3_0 = async_compile.triton(''' 2023-01-11T21:38:06.8045300Z import triton 2023-01-11T21:38:06.8045389Z import triton.language as tl 2023-01-11T21:38:06.8045503Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8045608Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8045743Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8045868Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8045873Z 2023-01-11T21:38:06.8046312Z @pointwise(size_hints=[2097152], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: '*fp16', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': 
set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3, 4), equal_to_1=())]}) 2023-01-11T21:38:06.8046413Z @triton.jit 2023-01-11T21:38:06.8046564Z def triton_(in_ptr0, in_ptr1, in_ptr2, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8046634Z xnumel = 1082016 2023-01-11T21:38:06.8046732Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8046862Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8046949Z xmask = xindex < xnumel 2023-01-11T21:38:06.8047022Z x2 = xindex 2023-01-11T21:38:06.8047100Z x0 = xindex % 26 2023-01-11T21:38:06.8047219Z tmp0 = tl.load(in_ptr0 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.8047330Z tmp8 = tl.load(in_ptr1 + (x2), xmask).to(tl.float32) 2023-01-11T21:38:06.8047449Z tmp15 = tl.load(in_ptr2 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8047574Z tmp1 = -1.061519070296458e-11 2023-01-11T21:38:06.8047657Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8047781Z tmp3 = -1.988366587925593e-08 2023-01-11T21:38:06.8047862Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8047945Z tmp5 = tmp0 * tmp4 2023-01-11T21:38:06.8048060Z tmp6 = -3.087032500374211e-07 2023-01-11T21:38:06.8048138Z tmp7 = tmp5 + tmp6 2023-01-11T21:38:06.8048262Z tmp9 = 1.55093272922008e-10 2023-01-11T21:38:06.8048344Z tmp10 = tmp8 * tmp9 2023-01-11T21:38:06.8048427Z tmp11 = tmp7 + tmp10 2023-01-11T21:38:06.8048505Z tmp12 = 1 / tmp11 2023-01-11T21:38:06.8048575Z tmp13 = 1.0 2023-01-11T21:38:06.8048657Z tmp14 = tmp12 * tmp13 2023-01-11T21:38:06.8048737Z tmp16 = tmp11 * tmp15 2023-01-11T21:38:06.8048818Z tmp17 = tmp14 + tmp16 2023-01-11T21:38:06.8048957Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp17, xmask) 2023-01-11T21:38:06.8049045Z ''') 2023-01-11T21:38:06.8049051Z 2023-01-11T21:38:06.8049056Z 2023-01-11T21:38:06.8049151Z async_compile.wait(globals()) 2023-01-11T21:38:06.8049229Z del async_compile 2023-01-11T21:38:06.8049235Z 2023-01-11T21:38:06.8049303Z def call(args): 2023-01-11T21:38:06.8049390Z arg0_1, arg1_1, arg2_1 = args 2023-01-11T21:38:06.8049473Z args.clear() 2023-01-11T21:38:06.8049567Z with torch.cuda.device(0): 2023-01-11T21:38:06.8049784Z buf0 = empty_strided((204, 204, 26), (5304, 26, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8049878Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8050066Z triton_fused_add_3_0.run(arg1_1, arg0_1, arg2_1, buf0, 1082016, grid=grid(1082016), stream=stream0) 2023-01-11T21:38:06.8050135Z del arg0_1 2023-01-11T21:38:06.8050209Z del arg1_1 2023-01-11T21:38:06.8050281Z del arg2_1 2023-01-11T21:38:06.8050360Z return (buf0, ) 2023-01-11T21:38:06.8050366Z 2023-01-11T21:38:06.8050371Z 2023-01-11T21:38:06.8050453Z if __name__ == "__main__": 2023-01-11T21:38:06.8050571Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8050698Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8050917Z arg0_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051128Z arg1_1 = rand_strided((204, 204, 26), (5304, 26, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051324Z arg2_1 = rand_strided((26, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8051455Z print_performance(lambda: call([arg0_1, arg1_1, arg2_1])) 2023-01-11T21:38:06.8051460Z 2023-01-11T21:38:06.8051534Z ok (0.284s) 2023-01-11T21:38:06.8051990Z test_views1_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8052122Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8052383Z [2023-01-11 21:36:20,824] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 949 2023-01-11T21:38:06.8052672Z [2023-01-11 21:36:20,893] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 949 2023-01-11T21:38:06.8053090Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8053223Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8053477Z [2023-01-11 21:36:20,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 950 2023-01-11T21:38:06.8053733Z [2023-01-11 21:36:20,978] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 950 2023-01-11T21:38:06.8054145Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8054282Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8054852Z [2023-01-11 21:36:20,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 951 2023-01-11T21:38:06.8055118Z [2023-01-11 21:36:21,066] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 951 2023-01-11T21:38:06.8055532Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8055667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8055922Z [2023-01-11 21:36:21,085] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 952 2023-01-11T21:38:06.8055927Z 2023-01-11T21:38:06.8056080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8056157Z import torch 2023-01-11T21:38:06.8056225Z import random 2023-01-11T21:38:06.8056346Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8056470Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8056475Z 2023-01-11T21:38:06.8056561Z aten = torch.ops.aten 2023-01-11T21:38:06.8056699Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8056794Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8056800Z 2023-01-11T21:38:06.8056873Z import triton 2023-01-11T21:38:06.8056965Z import triton.language as tl 2023-01-11T21:38:06.8057086Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8057290Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8057296Z 2023-01-11T21:38:06.8057302Z 2023-01-11T21:38:06.8057481Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8057570Z import triton 2023-01-11T21:38:06.8057668Z import triton.language as tl 2023-01-11T21:38:06.8057782Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8057883Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8058009Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8058134Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8058140Z 2023-01-11T21:38:06.8058558Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8058675Z @triton.jit 2023-01-11T21:38:06.8058819Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8058896Z xnumel = 35 2023-01-11T21:38:06.8058995Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8059125Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8059204Z xmask = xindex < xnumel 2023-01-11T21:38:06.8059277Z x0 = xindex 2023-01-11T21:38:06.8059374Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8059472Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8059554Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8059691Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8059777Z ''') 2023-01-11T21:38:06.8059782Z 2023-01-11T21:38:06.8059787Z 2023-01-11T21:38:06.8059886Z async_compile.wait(globals()) 2023-01-11T21:38:06.8059956Z del async_compile 2023-01-11T21:38:06.8059962Z 2023-01-11T21:38:06.8060039Z def call(args): 2023-01-11T21:38:06.8060122Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8060197Z args.clear() 2023-01-11T21:38:06.8060292Z with torch.cuda.device(0): 2023-01-11T21:38:06.8060493Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8060587Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8060727Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8060803Z del arg0_1 2023-01-11T21:38:06.8060876Z del arg1_1 
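# Note the argument shapes: arg0_1 is a flat (35,) tensor and arg1_1 is (5, 7),
# yet the kernel indexes both with the same linear x0. The view is compiled
# away into index arithmetic; no reshape copy is ever issued.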
2023-01-11T21:38:06.8060953Z return (buf0, ) 2023-01-11T21:38:06.8060958Z 2023-01-11T21:38:06.8060962Z 2023-01-11T21:38:06.8061042Z if __name__ == "__main__": 2023-01-11T21:38:06.8061159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8061284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8061485Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8061675Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8061800Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8061805Z 2023-01-11T21:38:06.8061810Z 2023-01-11T21:38:06.8061907Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8061988Z import torch 2023-01-11T21:38:06.8062063Z import random 2023-01-11T21:38:06.8062214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8062353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8062358Z 2023-01-11T21:38:06.8062443Z aten = torch.ops.aten 2023-01-11T21:38:06.8062584Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8062686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8062691Z 2023-01-11T21:38:06.8062768Z import triton 2023-01-11T21:38:06.8062865Z import triton.language as tl 2023-01-11T21:38:06.8063001Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8063153Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8063162Z 2023-01-11T21:38:06.8063166Z 2023-01-11T21:38:06.8063336Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8063412Z import triton 2023-01-11T21:38:06.8063502Z import triton.language as tl 2023-01-11T21:38:06.8063624Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8063737Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8063882Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8064019Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8064024Z 2023-01-11T21:38:06.8064513Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8064589Z @triton.jit 2023-01-11T21:38:06.8064731Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8064825Z xnumel = 35 2023-01-11T21:38:06.8064922Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8065051Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8065136Z xmask = xindex < xnumel 2023-01-11T21:38:06.8065207Z x0 = xindex 2023-01-11T21:38:06.8065327Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8065445Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8065518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8065651Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8065739Z ''') 2023-01-11T21:38:06.8065745Z 2023-01-11T21:38:06.8065749Z 2023-01-11T21:38:06.8065842Z async_compile.wait(globals()) 2023-01-11T21:38:06.8065920Z del async_compile 2023-01-11T21:38:06.8065925Z 2023-01-11T21:38:06.8065999Z def call(args): 2023-01-11T21:38:06.8066078Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8066146Z args.clear() 2023-01-11T21:38:06.8066244Z with torch.cuda.device(0): 
2023-01-11T21:38:06.8066446Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8066540Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8066681Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8066758Z del arg0_1 2023-01-11T21:38:06.8066834Z del arg1_1 2023-01-11T21:38:06.8066905Z return (buf0, ) 2023-01-11T21:38:06.8066918Z 2023-01-11T21:38:06.8066922Z 2023-01-11T21:38:06.8066995Z if __name__ == "__main__": 2023-01-11T21:38:06.8067112Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8067238Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8067437Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8067634Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8067756Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8067761Z 2023-01-11T21:38:06.8067765Z 2023-01-11T21:38:06.8067864Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8067939Z import torch 2023-01-11T21:38:06.8068007Z import random 2023-01-11T21:38:06.8068127Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8068279Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8068285Z 2023-01-11T21:38:06.8068368Z aten = torch.ops.aten 2023-01-11T21:38:06.8068506Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8068600Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8068606Z 2023-01-11T21:38:06.8068680Z import triton 2023-01-11T21:38:06.8068764Z import triton.language as tl 2023-01-11T21:38:06.8068889Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8069028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8069033Z 2023-01-11T21:38:06.8069042Z 2023-01-11T21:38:06.8069199Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8069275Z import triton 2023-01-11T21:38:06.8069367Z import triton.language as tl 2023-01-11T21:38:06.8069483Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8069585Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8069714Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8069842Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8069847Z 2023-01-11T21:38:06.8070264Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8070338Z @triton.jit 2023-01-11T21:38:06.8070480Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8070554Z xnumel = 35 2023-01-11T21:38:06.8070680Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8070810Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8070886Z xmask = xindex < xnumel 2023-01-11T21:38:06.8070957Z x0 = xindex 2023-01-11T21:38:06.8071054Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8071155Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8071227Z tmp1 = 1 2023-01-11T21:38:06.8071305Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8071383Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8071512Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8071598Z ''') 
2023-01-11T21:38:06.8071604Z 2023-01-11T21:38:06.8071608Z 2023-01-11T21:38:06.8071701Z async_compile.wait(globals()) 2023-01-11T21:38:06.8071778Z del async_compile 2023-01-11T21:38:06.8071784Z 2023-01-11T21:38:06.8071858Z def call(args): 2023-01-11T21:38:06.8071938Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8072015Z args.clear() 2023-01-11T21:38:06.8072103Z with torch.cuda.device(0): 2023-01-11T21:38:06.8072300Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8072393Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8072535Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8072613Z del arg0_1 2023-01-11T21:38:06.8072687Z del arg1_1 2023-01-11T21:38:06.8072764Z return (buf0, ) 2023-01-11T21:38:06.8072770Z 2023-01-11T21:38:06.8072774Z 2023-01-11T21:38:06.8072854Z if __name__ == "__main__": 2023-01-11T21:38:06.8072964Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8073088Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8073286Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8073483Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8073601Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8073610Z 2023-01-11T21:38:06.8073878Z [2023-01-11 21:36:21,155] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 952 2023-01-11T21:38:06.8074326Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8074460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8074717Z [2023-01-11 21:36:21,172] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 953 2023-01-11T21:38:06.8074971Z [2023-01-11 21:36:21,253] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 953 2023-01-11T21:38:06.8075415Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8075578Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8075834Z [2023-01-11 21:36:21,270] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 954 2023-01-11T21:38:06.8076100Z [2023-01-11 21:36:21,341] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 954 2023-01-11T21:38:06.8076514Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
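The UserWarning repeated throughout this run points at a concrete migration, using exactly the replacement the warning names. A minimal sketch of the fix for the test's buffer line, assuming a PyTorch build where Tensor.untyped_storage() is available (as the warning itself indicates):

    import torch

    x = torch.randn(5, 7)

    # Deprecated: TypedStorage reports its size in elements and triggers
    # the UserWarning above.
    n_old = x.storage().size()

    # Suggested replacement: UntypedStorage is sized in bytes, so divide by
    # element_size() to recover the element count passed to as_strided().
    n_new = x.untyped_storage().nbytes() // x.element_size()

    assert n_old == n_new
    buffer = torch.as_strided(x, (n_new,), (1,), 0).clone()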
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8076669Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8076922Z [2023-01-11 21:36:21,361] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 955 2023-01-11T21:38:06.8076928Z 2023-01-11T21:38:06.8077028Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8077104Z import torch 2023-01-11T21:38:06.8077180Z import random 2023-01-11T21:38:06.8077292Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8077415Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8077420Z 2023-01-11T21:38:06.8077502Z aten = torch.ops.aten 2023-01-11T21:38:06.8077639Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8077735Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8077740Z 2023-01-11T21:38:06.8077813Z import triton 2023-01-11T21:38:06.8077905Z import triton.language as tl 2023-01-11T21:38:06.8078026Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8078167Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8078172Z 2023-01-11T21:38:06.8078177Z 2023-01-11T21:38:06.8078335Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8078410Z import triton 2023-01-11T21:38:06.8078505Z import triton.language as tl 2023-01-11T21:38:06.8078618Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8078719Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8078852Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8078970Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8078975Z 2023-01-11T21:38:06.8079392Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8079471Z @triton.jit 2023-01-11T21:38:06.8079613Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8079686Z xnumel = 35 2023-01-11T21:38:06.8079785Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8079938Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8080026Z xmask = xindex < xnumel 2023-01-11T21:38:06.8080090Z x0 = xindex 2023-01-11T21:38:06.8080208Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8080324Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8080396Z tmp1 = 1 2023-01-11T21:38:06.8080476Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8080554Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8080690Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8080768Z ''') 2023-01-11T21:38:06.8080774Z 2023-01-11T21:38:06.8080778Z 2023-01-11T21:38:06.8080871Z async_compile.wait(globals()) 2023-01-11T21:38:06.8080953Z del async_compile 2023-01-11T21:38:06.8080958Z 2023-01-11T21:38:06.8081034Z def call(args): 2023-01-11T21:38:06.8081115Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8081193Z args.clear() 2023-01-11T21:38:06.8081286Z with torch.cuda.device(0): 2023-01-11T21:38:06.8081479Z buf0 = empty_strided((5, 7), (7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8081572Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8081716Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 
35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8081790Z del arg0_1 2023-01-11T21:38:06.8081863Z del arg1_1 2023-01-11T21:38:06.8081941Z return (buf0, ) 2023-01-11T21:38:06.8081946Z 2023-01-11T21:38:06.8081950Z 2023-01-11T21:38:06.8082030Z if __name__ == "__main__": 2023-01-11T21:38:06.8082149Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8082269Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8082505Z arg0_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8082704Z arg1_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8082824Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8082830Z 2023-01-11T21:38:06.8082837Z 2023-01-11T21:38:06.8082935Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8083009Z import torch 2023-01-11T21:38:06.8083083Z import random 2023-01-11T21:38:06.8083202Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8083319Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8083325Z 2023-01-11T21:38:06.8083407Z aten = torch.ops.aten 2023-01-11T21:38:06.8083543Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8083640Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8083645Z 2023-01-11T21:38:06.8083718Z import triton 2023-01-11T21:38:06.8083811Z import triton.language as tl 2023-01-11T21:38:06.8083941Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8084073Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8084086Z 2023-01-11T21:38:06.8084090Z 2023-01-11T21:38:06.8084237Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8084314Z import triton 2023-01-11T21:38:06.8084409Z import triton.language as tl 2023-01-11T21:38:06.8084525Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8084627Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8084761Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8084887Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8084892Z 2023-01-11T21:38:06.8085314Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8085384Z @triton.jit 2023-01-11T21:38:06.8085524Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8085599Z xnumel = 5040 2023-01-11T21:38:06.8085696Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8085881Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8085967Z xmask = xindex < xnumel 2023-01-11T21:38:06.8086040Z x0 = xindex 2023-01-11T21:38:06.8086129Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8086226Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8086304Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8086439Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8086525Z ''') 2023-01-11T21:38:06.8086532Z 2023-01-11T21:38:06.8086536Z 2023-01-11T21:38:06.8086630Z async_compile.wait(globals()) 2023-01-11T21:38:06.8086707Z del async_compile 2023-01-11T21:38:06.8086712Z 2023-01-11T21:38:06.8086789Z def call(args): 2023-01-11T21:38:06.8086861Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8086937Z 
args.clear() 2023-01-11T21:38:06.8087028Z with torch.cuda.device(0): 2023-01-11T21:38:06.8087266Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8087361Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8087506Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8087580Z del arg0_1 2023-01-11T21:38:06.8087645Z del arg1_1 2023-01-11T21:38:06.8087722Z return (buf0, ) 2023-01-11T21:38:06.8087728Z 2023-01-11T21:38:06.8087733Z 2023-01-11T21:38:06.8087814Z if __name__ == "__main__": 2023-01-11T21:38:06.8087933Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8088060Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8088262Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8088526Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8088646Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8088652Z 2023-01-11T21:38:06.8088656Z 2023-01-11T21:38:06.8088748Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8088824Z import torch 2023-01-11T21:38:06.8088897Z import random 2023-01-11T21:38:06.8089016Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8089142Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8089147Z 2023-01-11T21:38:06.8089229Z aten = torch.ops.aten 2023-01-11T21:38:06.8089364Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8089459Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8089464Z 2023-01-11T21:38:06.8089530Z import triton 2023-01-11T21:38:06.8089623Z import triton.language as tl 2023-01-11T21:38:06.8089754Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8089892Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8089898Z 2023-01-11T21:38:06.8089903Z 2023-01-11T21:38:06.8090056Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8090131Z import triton 2023-01-11T21:38:06.8090228Z import triton.language as tl 2023-01-11T21:38:06.8090336Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8090439Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8090573Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8090698Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8090703Z 2023-01-11T21:38:06.8091123Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8091200Z @triton.jit 2023-01-11T21:38:06.8091342Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8091419Z xnumel = 5040 2023-01-11T21:38:06.8091516Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8091637Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8091750Z xmask = xindex < xnumel 2023-01-11T21:38:06.8091823Z x0 = xindex 2023-01-11T21:38:06.8091939Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8092057Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8092139Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8092273Z tl.store(out_ptr0 
+ (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8092351Z ''') 2023-01-11T21:38:06.8092357Z 2023-01-11T21:38:06.8092361Z 2023-01-11T21:38:06.8092455Z async_compile.wait(globals()) 2023-01-11T21:38:06.8092532Z del async_compile 2023-01-11T21:38:06.8092538Z 2023-01-11T21:38:06.8092615Z def call(args): 2023-01-11T21:38:06.8092692Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8092768Z args.clear() 2023-01-11T21:38:06.8092860Z with torch.cuda.device(0): 2023-01-11T21:38:06.8093090Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8093188Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8093331Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8093408Z del arg0_1 2023-01-11T21:38:06.8093483Z del arg1_1 2023-01-11T21:38:06.8093560Z return (buf0, ) 2023-01-11T21:38:06.8093565Z 2023-01-11T21:38:06.8093569Z 2023-01-11T21:38:06.8093649Z if __name__ == "__main__": 2023-01-11T21:38:06.8093768Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8093887Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8094087Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8094349Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8094660Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8094665Z 2023-01-11T21:38:06.8094937Z [2023-01-11 21:36:21,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 955 2023-01-11T21:38:06.8095353Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8095489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8095781Z [2023-01-11 21:36:21,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 956 2023-01-11T21:38:06.8096066Z [2023-01-11 21:36:21,526] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 956 2023-01-11T21:38:06.8096480Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8096614Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8096860Z [2023-01-11 21:36:21,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 957 2023-01-11T21:38:06.8097119Z [2023-01-11 21:36:21,555] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 957 2023-01-11T21:38:06.8097604Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
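For orientation, the call() wrappers dumped above implement plain element-wise adds: both input pointers are read at the same linear offset x0, so a flat 5040-element operand lines up with a contiguous (2, 3, 4, 5, 6, 7) operand. An eager-mode sketch of the same computation (CPU tensors; the names arg0/arg1 are illustrative, not from the harness):

    import torch

    arg0 = torch.randn(5040)               # flat, like arg0_1 in the harness
    arg1 = torch.randn(2, 3, 4, 5, 6, 7)   # 2*3*4*5*6*7 == 5040 elements

    # Linear indexing over both inputs is equivalent to viewing the flat
    # operand with the other operand's shape and adding.
    out = arg0.view(arg1.shape) + arg1
    assert out.shape == arg1.shape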
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8097794Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8098050Z [2023-01-11 21:36:21,572] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 958 2023-01-11T21:38:06.8098056Z 2023-01-11T21:38:06.8098155Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8098233Z import torch 2023-01-11T21:38:06.8098308Z import random 2023-01-11T21:38:06.8098427Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8098543Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8098548Z 2023-01-11T21:38:06.8098632Z aten = torch.ops.aten 2023-01-11T21:38:06.8098774Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8098873Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8098879Z 2023-01-11T21:38:06.8098954Z import triton 2023-01-11T21:38:06.8099047Z import triton.language as tl 2023-01-11T21:38:06.8099172Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8099307Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8099320Z 2023-01-11T21:38:06.8099325Z 2023-01-11T21:38:06.8099475Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8099550Z import triton 2023-01-11T21:38:06.8099642Z import triton.language as tl 2023-01-11T21:38:06.8099757Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8099859Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8099991Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8100116Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8100121Z 2023-01-11T21:38:06.8100540Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8100645Z @triton.jit 2023-01-11T21:38:06.8100791Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8100870Z xnumel = 5040 2023-01-11T21:38:06.8100975Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8101107Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8101195Z xmask = xindex < xnumel 2023-01-11T21:38:06.8101270Z x0 = xindex 2023-01-11T21:38:06.8101363Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8101465Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8101539Z tmp1 = 1 2023-01-11T21:38:06.8101624Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8101707Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8101844Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8101936Z ''') 2023-01-11T21:38:06.8101941Z 2023-01-11T21:38:06.8101946Z 2023-01-11T21:38:06.8102034Z async_compile.wait(globals()) 2023-01-11T21:38:06.8102116Z del async_compile 2023-01-11T21:38:06.8102121Z 2023-01-11T21:38:06.8102198Z def call(args): 2023-01-11T21:38:06.8102283Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8102361Z args.clear() 2023-01-11T21:38:06.8102455Z with torch.cuda.device(0): 2023-01-11T21:38:06.8102694Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8102790Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8102932Z triton_fused_add_1_0.run(arg0_1, 
arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8103007Z del arg0_1 2023-01-11T21:38:06.8103089Z del arg1_1 2023-01-11T21:38:06.8103169Z return (buf0, ) 2023-01-11T21:38:06.8103174Z 2023-01-11T21:38:06.8103182Z 2023-01-11T21:38:06.8103270Z if __name__ == "__main__": 2023-01-11T21:38:06.8103391Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8103521Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8103720Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8103990Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8104112Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8104117Z 2023-01-11T21:38:06.8104121Z 2023-01-11T21:38:06.8104219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8104294Z import torch 2023-01-11T21:38:06.8104368Z import random 2023-01-11T21:38:06.8104487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8104612Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8104617Z 2023-01-11T21:38:06.8104692Z aten = torch.ops.aten 2023-01-11T21:38:06.8104833Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8104932Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8104937Z 2023-01-11T21:38:06.8105011Z import triton 2023-01-11T21:38:06.8105103Z import triton.language as tl 2023-01-11T21:38:06.8105230Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8105371Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8105376Z 2023-01-11T21:38:06.8105381Z 2023-01-11T21:38:06.8105540Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8105612Z import triton 2023-01-11T21:38:06.8105724Z import triton.language as tl 2023-01-11T21:38:06.8105855Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8105968Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8106100Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8106226Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8106269Z 2023-01-11T21:38:06.8106688Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8106762Z @triton.jit 2023-01-11T21:38:06.8106897Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8106974Z xnumel = 5040 2023-01-11T21:38:06.8107074Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8107204Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8107288Z xmask = xindex < xnumel 2023-01-11T21:38:06.8107361Z x0 = xindex 2023-01-11T21:38:06.8107479Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8107588Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8107662Z tmp1 = 1 2023-01-11T21:38:06.8107742Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8107825Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8107960Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8108044Z ''') 2023-01-11T21:38:06.8108050Z 2023-01-11T21:38:06.8108054Z 2023-01-11T21:38:06.8108147Z async_compile.wait(globals()) 2023-01-11T21:38:06.8108224Z 
del async_compile 2023-01-11T21:38:06.8108232Z 2023-01-11T21:38:06.8108300Z def call(args): 2023-01-11T21:38:06.8108379Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8108456Z args.clear() 2023-01-11T21:38:06.8108548Z with torch.cuda.device(0): 2023-01-11T21:38:06.8108785Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8108877Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8109026Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8109092Z del arg0_1 2023-01-11T21:38:06.8109165Z del arg1_1 2023-01-11T21:38:06.8109245Z return (buf0, ) 2023-01-11T21:38:06.8109251Z 2023-01-11T21:38:06.8109255Z 2023-01-11T21:38:06.8109337Z if __name__ == "__main__": 2023-01-11T21:38:06.8109455Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8109582Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8109815Z arg0_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8110050Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8110163Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8110168Z 2023-01-11T21:38:06.8110181Z 2023-01-11T21:38:06.8110271Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8110345Z import torch 2023-01-11T21:38:06.8110420Z import random 2023-01-11T21:38:06.8110538Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8110665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8110673Z 2023-01-11T21:38:06.8110756Z aten = torch.ops.aten 2023-01-11T21:38:06.8110892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8110981Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8110986Z 2023-01-11T21:38:06.8111058Z import triton 2023-01-11T21:38:06.8111154Z import triton.language as tl 2023-01-11T21:38:06.8111281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8111420Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8111426Z 2023-01-11T21:38:06.8111430Z 2023-01-11T21:38:06.8111583Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8111658Z import triton 2023-01-11T21:38:06.8111751Z import triton.language as tl 2023-01-11T21:38:06.8111858Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8111959Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8112092Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8112247Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8112253Z 2023-01-11T21:38:06.8112678Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8112752Z @triton.jit 2023-01-11T21:38:06.8112893Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8112968Z xnumel = 5040 2023-01-11T21:38:06.8113059Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8113188Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8113271Z xmask = xindex < xnumel 2023-01-11T21:38:06.8113341Z x0 = xindex 2023-01-11T21:38:06.8113439Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 
2023-01-11T21:38:06.8113535Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8113615Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8113745Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8113830Z ''') 2023-01-11T21:38:06.8113836Z 2023-01-11T21:38:06.8113840Z 2023-01-11T21:38:06.8113932Z async_compile.wait(globals()) 2023-01-11T21:38:06.8114008Z del async_compile 2023-01-11T21:38:06.8114013Z 2023-01-11T21:38:06.8114090Z def call(args): 2023-01-11T21:38:06.8114169Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8114247Z args.clear() 2023-01-11T21:38:06.8114332Z with torch.cuda.device(0): 2023-01-11T21:38:06.8114567Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8114659Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8114805Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8114878Z del arg0_1 2023-01-11T21:38:06.8114950Z del arg1_1 2023-01-11T21:38:06.8115027Z return (buf0, ) 2023-01-11T21:38:06.8115035Z 2023-01-11T21:38:06.8115040Z 2023-01-11T21:38:06.8115119Z if __name__ == "__main__": 2023-01-11T21:38:06.8115229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8115378Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8115656Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8115897Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8116017Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8116023Z 2023-01-11T21:38:06.8116290Z [2023-01-11 21:36:21,584] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 958 2023-01-11T21:38:06.8116708Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8116843Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8117103Z [2023-01-11 21:36:21,604] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 959 2023-01-11T21:38:06.8117365Z [2023-01-11 21:36:21,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 959 2023-01-11T21:38:06.8117770Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8117900Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8118180Z [2023-01-11 21:36:21,637] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 960 2023-01-11T21:38:06.8118444Z [2023-01-11 21:36:21,650] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 960 2023-01-11T21:38:06.8118857Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
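The triton_fused_add_1_0 variants above show inductor's pointwise fusion: a computation of the form x + 1 + y becomes a single kernel, with the constant inlined as tmp1 = 1 rather than materialized as a separate graph input. An eager sketch of the fused expression (shapes taken from the harness above; the function name f is illustrative):

    import torch

    def f(x, y):
        # Eagerly this is two separate adds; the generated kernel does both
        # in one pass: tmp2 = tmp0 + 1, then tmp4 = tmp2 + tmp3.
        return x + 1 + y

    x = torch.randn(5040)
    y = torch.randn(2, 3, 4, 5, 6, 7)
    out = f(x.view(y.shape), y)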
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8118988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8119242Z [2023-01-11 21:36:21,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 961 2023-01-11T21:38:06.8119247Z 2023-01-11T21:38:06.8119345Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8119422Z import torch 2023-01-11T21:38:06.8119497Z import random 2023-01-11T21:38:06.8119610Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8119733Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8119738Z 2023-01-11T21:38:06.8119822Z aten = torch.ops.aten 2023-01-11T21:38:06.8119961Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8120056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8120061Z 2023-01-11T21:38:06.8120137Z import triton 2023-01-11T21:38:06.8120229Z import triton.language as tl 2023-01-11T21:38:06.8120356Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8120488Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8120493Z 2023-01-11T21:38:06.8120498Z 2023-01-11T21:38:06.8120651Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8120726Z import triton 2023-01-11T21:38:06.8120818Z import triton.language as tl 2023-01-11T21:38:06.8120934Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8121040Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8121172Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8121290Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8121303Z 2023-01-11T21:38:06.8121737Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8121814Z @triton.jit 2023-01-11T21:38:06.8121956Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8122031Z xnumel = 5040 2023-01-11T21:38:06.8122128Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8122258Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8122341Z xmask = xindex < xnumel 2023-01-11T21:38:06.8122408Z x0 = xindex 2023-01-11T21:38:06.8122525Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8122640Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8122719Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8122856Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8122942Z ''') 2023-01-11T21:38:06.8122948Z 2023-01-11T21:38:06.8122952Z 2023-01-11T21:38:06.8123045Z async_compile.wait(globals()) 2023-01-11T21:38:06.8123124Z del async_compile 2023-01-11T21:38:06.8123129Z 2023-01-11T21:38:06.8123196Z def call(args): 2023-01-11T21:38:06.8123275Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8123349Z args.clear() 2023-01-11T21:38:06.8123442Z with torch.cuda.device(0): 2023-01-11T21:38:06.8123679Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8123772Z 
stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8123955Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8124021Z del arg0_1 2023-01-11T21:38:06.8124093Z del arg1_1 2023-01-11T21:38:06.8124172Z return (buf0, ) 2023-01-11T21:38:06.8124177Z 2023-01-11T21:38:06.8124182Z 2023-01-11T21:38:06.8124265Z if __name__ == "__main__": 2023-01-11T21:38:06.8124384Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8124515Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8124739Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8124974Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8125086Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8125098Z 2023-01-11T21:38:06.8125103Z 2023-01-11T21:38:06.8125193Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8125270Z import torch 2023-01-11T21:38:06.8125348Z import random 2023-01-11T21:38:06.8125465Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8125613Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8125618Z 2023-01-11T21:38:06.8125706Z aten = torch.ops.aten 2023-01-11T21:38:06.8125866Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8125954Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8125959Z 2023-01-11T21:38:06.8126032Z import triton 2023-01-11T21:38:06.8126124Z import triton.language as tl 2023-01-11T21:38:06.8126248Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8126386Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8126392Z 2023-01-11T21:38:06.8126396Z 2023-01-11T21:38:06.8126552Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8126628Z import triton 2023-01-11T21:38:06.8126720Z import triton.language as tl 2023-01-11T21:38:06.8126829Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8126931Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8127064Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8127188Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8127193Z 2023-01-11T21:38:06.8127640Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8127718Z @triton.jit 2023-01-11T21:38:06.8127861Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8127935Z xnumel = 5040 2023-01-11T21:38:06.8128025Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8128156Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8128240Z xmask = xindex < xnumel 2023-01-11T21:38:06.8128315Z x0 = xindex 2023-01-11T21:38:06.8128412Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8128507Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8128578Z tmp1 = 1 2023-01-11T21:38:06.8128650Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8128727Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8128867Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8128952Z ''') 2023-01-11T21:38:06.8128957Z 2023-01-11T21:38:06.8128962Z 
2023-01-11T21:38:06.8129055Z async_compile.wait(globals()) 2023-01-11T21:38:06.8129131Z del async_compile 2023-01-11T21:38:06.8129137Z 2023-01-11T21:38:06.8129211Z def call(args): 2023-01-11T21:38:06.8129284Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8129359Z args.clear() 2023-01-11T21:38:06.8129451Z with torch.cuda.device(0): 2023-01-11T21:38:06.8129688Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8129816Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8129960Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8130034Z del arg0_1 2023-01-11T21:38:06.8130107Z del arg1_1 2023-01-11T21:38:06.8130177Z return (buf0, ) 2023-01-11T21:38:06.8130185Z 2023-01-11T21:38:06.8130190Z 2023-01-11T21:38:06.8130270Z if __name__ == "__main__": 2023-01-11T21:38:06.8130390Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8130518Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8130738Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8130972Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8131091Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8131096Z 2023-01-11T21:38:06.8131104Z 2023-01-11T21:38:06.8131202Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8131270Z import torch 2023-01-11T21:38:06.8131344Z import random 2023-01-11T21:38:06.8131461Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8131587Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8131594Z 2023-01-11T21:38:06.8131680Z aten = torch.ops.aten 2023-01-11T21:38:06.8131817Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8131911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8131917Z 2023-01-11T21:38:06.8131989Z import triton 2023-01-11T21:38:06.8132074Z import triton.language as tl 2023-01-11T21:38:06.8132198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8132339Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8132344Z 2023-01-11T21:38:06.8132349Z 2023-01-11T21:38:06.8132505Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8132582Z import triton 2023-01-11T21:38:06.8132675Z import triton.language as tl 2023-01-11T21:38:06.8132789Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8132883Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8133016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8133170Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8133175Z 2023-01-11T21:38:06.8133597Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8133670Z @triton.jit 2023-01-11T21:38:06.8133810Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8133885Z xnumel = 5040 2023-01-11T21:38:06.8133982Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8134105Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8134190Z xmask = xindex < xnumel 
2023-01-11T21:38:06.8134260Z x0 = xindex 2023-01-11T21:38:06.8134379Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8134755Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8134830Z tmp1 = 1 2023-01-11T21:38:06.8134917Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8134988Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8135122Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8135211Z ''') 2023-01-11T21:38:06.8135216Z 2023-01-11T21:38:06.8135220Z 2023-01-11T21:38:06.8135313Z async_compile.wait(globals()) 2023-01-11T21:38:06.8135389Z del async_compile 2023-01-11T21:38:06.8135394Z 2023-01-11T21:38:06.8135471Z def call(args): 2023-01-11T21:38:06.8135549Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8135630Z args.clear() 2023-01-11T21:38:06.8135732Z with torch.cuda.device(0): 2023-01-11T21:38:06.8136038Z buf0 = empty_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8136130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8136277Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8136349Z del arg0_1 2023-01-11T21:38:06.8136423Z del arg1_1 2023-01-11T21:38:06.8136504Z return (buf0, ) 2023-01-11T21:38:06.8136509Z 2023-01-11T21:38:06.8136513Z 2023-01-11T21:38:06.8136594Z if __name__ == "__main__": 2023-01-11T21:38:06.8136705Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8136831Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8137050Z arg0_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8137338Z arg1_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8137464Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8137470Z 2023-01-11T21:38:06.8137737Z [2023-01-11 21:36:21,741] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 961 2023-01-11T21:38:06.8138155Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8138287Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8138541Z [2023-01-11 21:36:21,757] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 962 2023-01-11T21:38:06.8138796Z [2023-01-11 21:36:21,826] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 962 2023-01-11T21:38:06.8139209Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
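Note that the float16 kernels above load both inputs with .to(tl.float32) and store into a float16 output buffer: the add is computed in fp32 and rounded once on the store. A rough eager analogue of that numerics choice (a CPU sketch, not the generated code itself):

    import torch

    x = torch.randn(5040, dtype=torch.float16)
    y = torch.randn(2, 3, 4, 5, 6, 7, dtype=torch.float16)

    # Upcast for the arithmetic, downcast once for the result, mirroring the
    # .to(tl.float32) loads and the fp16 store in the kernel.
    out = (x.view(y.shape).float() + y.float()).to(torch.float16)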
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8139378Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8139636Z [2023-01-11 21:36:21,846] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 963 2023-01-11T21:38:06.8139898Z [2023-01-11 21:36:21,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 963 2023-01-11T21:38:06.8140312Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8140446Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8140699Z [2023-01-11 21:36:21,936] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 964 2023-01-11T21:38:06.8140710Z 2023-01-11T21:38:06.8140813Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8140888Z import torch 2023-01-11T21:38:06.8140962Z import random 2023-01-11T21:38:06.8141075Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8141197Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8141202Z 2023-01-11T21:38:06.8141284Z aten = torch.ops.aten 2023-01-11T21:38:06.8141420Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8141517Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8141522Z 2023-01-11T21:38:06.8141595Z import triton 2023-01-11T21:38:06.8141716Z import triton.language as tl 2023-01-11T21:38:06.8141837Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8141979Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8141984Z 2023-01-11T21:38:06.8141989Z 2023-01-11T21:38:06.8142145Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8142225Z import triton 2023-01-11T21:38:06.8142322Z import triton.language as tl 2023-01-11T21:38:06.8142440Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8142543Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8142678Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8142799Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8142804Z 2023-01-11T21:38:06.8143223Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8143306Z @triton.jit 2023-01-11T21:38:06.8143450Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8143527Z xnumel = 1000 2023-01-11T21:38:06.8143628Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8143764Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8143851Z xmask = xindex < xnumel 2023-01-11T21:38:06.8143917Z x0 = xindex 2023-01-11T21:38:06.8144020Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8144117Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8144199Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8144336Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 
2023-01-11T21:38:06.8144423Z ''') 2023-01-11T21:38:06.8144428Z 2023-01-11T21:38:06.8144432Z 2023-01-11T21:38:06.8144528Z async_compile.wait(globals()) 2023-01-11T21:38:06.8144601Z del async_compile 2023-01-11T21:38:06.8144615Z 2023-01-11T21:38:06.8144684Z def call(args): 2023-01-11T21:38:06.8144765Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8144844Z args.clear() 2023-01-11T21:38:06.8144940Z with torch.cuda.device(0): 2023-01-11T21:38:06.8145154Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8145285Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8145436Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8145504Z del arg0_1 2023-01-11T21:38:06.8145580Z del arg1_1 2023-01-11T21:38:06.8145662Z return (buf0, ) 2023-01-11T21:38:06.8145668Z 2023-01-11T21:38:06.8145672Z 2023-01-11T21:38:06.8145755Z if __name__ == "__main__": 2023-01-11T21:38:06.8145876Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8146006Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8146213Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8146421Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8146544Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8146549Z 2023-01-11T21:38:06.8146553Z 2023-01-11T21:38:06.8146653Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8146733Z import torch 2023-01-11T21:38:06.8146810Z import random 2023-01-11T21:38:06.8146930Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8147055Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8147060Z 2023-01-11T21:38:06.8147145Z aten = torch.ops.aten 2023-01-11T21:38:06.8147274Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8147371Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8147376Z 2023-01-11T21:38:06.8147452Z import triton 2023-01-11T21:38:06.8147547Z import triton.language as tl 2023-01-11T21:38:06.8147673Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8147841Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8147847Z 2023-01-11T21:38:06.8147851Z 2023-01-11T21:38:06.8148010Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8148090Z import triton 2023-01-11T21:38:06.8148178Z import triton.language as tl 2023-01-11T21:38:06.8148299Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8148404Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8148538Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8148666Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8148671Z 2023-01-11T21:38:06.8149089Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8149166Z @triton.jit 2023-01-11T21:38:06.8149311Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8149382Z xnumel = 1000 2023-01-11T21:38:06.8149481Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8149612Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8149700Z xmask = xindex 
< xnumel 2023-01-11T21:38:06.8149774Z x0 = xindex 2023-01-11T21:38:06.8149893Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8150013Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8150089Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8150227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8150316Z ''') 2023-01-11T21:38:06.8150322Z 2023-01-11T21:38:06.8150326Z 2023-01-11T21:38:06.8150421Z async_compile.wait(globals()) 2023-01-11T21:38:06.8150502Z del async_compile 2023-01-11T21:38:06.8150507Z 2023-01-11T21:38:06.8150584Z def call(args): 2023-01-11T21:38:06.8150669Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8150747Z args.clear() 2023-01-11T21:38:06.8150835Z with torch.cuda.device(0): 2023-01-11T21:38:06.8151051Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8151147Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8151324Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8151402Z del arg0_1 2023-01-11T21:38:06.8151478Z del arg1_1 2023-01-11T21:38:06.8151559Z return (buf0, ) 2023-01-11T21:38:06.8151564Z 2023-01-11T21:38:06.8151569Z 2023-01-11T21:38:06.8151644Z if __name__ == "__main__": 2023-01-11T21:38:06.8151765Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8151894Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8152108Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8152322Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8152443Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8152448Z 2023-01-11T21:38:06.8152452Z 2023-01-11T21:38:06.8152552Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8152629Z import torch 2023-01-11T21:38:06.8152701Z import random 2023-01-11T21:38:06.8152822Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8152948Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8152953Z 2023-01-11T21:38:06.8153037Z aten = torch.ops.aten 2023-01-11T21:38:06.8153177Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8153275Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8153280Z 2023-01-11T21:38:06.8153356Z import triton 2023-01-11T21:38:06.8153450Z import triton.language as tl 2023-01-11T21:38:06.8153571Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8153739Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8153745Z 2023-01-11T21:38:06.8153749Z 2023-01-11T21:38:06.8153909Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8153986Z import triton 2023-01-11T21:38:06.8154080Z import triton.language as tl 2023-01-11T21:38:06.8154199Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8154302Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8154432Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8154559Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8154564Z 2023-01-11T21:38:06.8154984Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
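# The @pointwise-decorated kernel that follows is Inductor's generated Triton
# code for the (x + 1) + y test case: 'signature' records the pointer dtypes,
# size_hints the padded problem size, and divisible_by_16 the alignment hints
# for the autotuner. Per element it computes roughly this eager-mode
# equivalent (illustrative sketch only; the names a and b are assumed, not
# taken from the log):
#   out = (a + 1) + b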
2023-01-11T21:38:06.8155061Z @triton.jit 2023-01-11T21:38:06.8155204Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8155287Z xnumel = 1000 2023-01-11T21:38:06.8155387Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8155544Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8155637Z xmask = xindex < xnumel 2023-01-11T21:38:06.8155721Z x0 = xindex 2023-01-11T21:38:06.8155826Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8155925Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8156000Z tmp1 = 1 2023-01-11T21:38:06.8156083Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8156164Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8156295Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8156384Z ''') 2023-01-11T21:38:06.8156389Z 2023-01-11T21:38:06.8156394Z 2023-01-11T21:38:06.8156491Z async_compile.wait(globals()) 2023-01-11T21:38:06.8156570Z del async_compile 2023-01-11T21:38:06.8156575Z 2023-01-11T21:38:06.8156658Z def call(args): 2023-01-11T21:38:06.8156742Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8156824Z args.clear() 2023-01-11T21:38:06.8156918Z with torch.cuda.device(0): 2023-01-11T21:38:06.8157124Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8157218Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8157392Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8157470Z del arg0_1 2023-01-11T21:38:06.8157546Z del arg1_1 2023-01-11T21:38:06.8157627Z return (buf0, ) 2023-01-11T21:38:06.8157632Z 2023-01-11T21:38:06.8157637Z 2023-01-11T21:38:06.8157720Z if __name__ == "__main__": 2023-01-11T21:38:06.8157840Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8157961Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8158170Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8158389Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8158510Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8158515Z 2023-01-11T21:38:06.8158784Z [2023-01-11 21:36:22,007] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 964 2023-01-11T21:38:06.8159201Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8159335Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8159593Z [2023-01-11 21:36:22,023] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 965 2023-01-11T21:38:06.8159882Z [2023-01-11 21:36:22,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 965 2023-01-11T21:38:06.8160298Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8160425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8160682Z [2023-01-11 21:36:22,105] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 966 2023-01-11T21:38:06.8160945Z [2023-01-11 21:36:22,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 966 2023-01-11T21:38:06.8161359Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8161495Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8161753Z [2023-01-11 21:36:22,193] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 967 2023-01-11T21:38:06.8161759Z 2023-01-11T21:38:06.8161859Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8161936Z import torch 2023-01-11T21:38:06.8162015Z import random 2023-01-11T21:38:06.8162138Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8162258Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8162263Z 2023-01-11T21:38:06.8162348Z aten = torch.ops.aten 2023-01-11T21:38:06.8162490Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8162591Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8162596Z 2023-01-11T21:38:06.8162672Z import triton 2023-01-11T21:38:06.8162767Z import triton.language as tl 2023-01-11T21:38:06.8162898Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8163033Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8163071Z 2023-01-11T21:38:06.8163077Z 2023-01-11T21:38:06.8163230Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8163311Z import triton 2023-01-11T21:38:06.8163406Z import triton.language as tl 2023-01-11T21:38:06.8163524Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8163628Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8163765Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8163896Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8163901Z 2023-01-11T21:38:06.8164312Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8164393Z @triton.jit 2023-01-11T21:38:06.8164535Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8164614Z xnumel = 1000 2023-01-11T21:38:06.8164713Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8164845Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8164931Z xmask = xindex < xnumel 2023-01-11T21:38:06.8165009Z x0 = xindex 2023-01-11T21:38:06.8165121Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8165241Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8165317Z tmp1 = 1 2023-01-11T21:38:06.8165400Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8165481Z tmp4 
= tmp2 + tmp3 2023-01-11T21:38:06.8165619Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8165745Z ''') 2023-01-11T21:38:06.8165750Z 2023-01-11T21:38:06.8165755Z 2023-01-11T21:38:06.8165844Z async_compile.wait(globals()) 2023-01-11T21:38:06.8165923Z del async_compile 2023-01-11T21:38:06.8165929Z 2023-01-11T21:38:06.8166005Z def call(args): 2023-01-11T21:38:06.8166089Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8166167Z args.clear() 2023-01-11T21:38:06.8166261Z with torch.cuda.device(0): 2023-01-11T21:38:06.8166475Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8166563Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8166710Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8166785Z del arg0_1 2023-01-11T21:38:06.8166860Z del arg1_1 2023-01-11T21:38:06.8166945Z return (buf0, ) 2023-01-11T21:38:06.8166950Z 2023-01-11T21:38:06.8166955Z 2023-01-11T21:38:06.8167042Z if __name__ == "__main__": 2023-01-11T21:38:06.8167164Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8167293Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8167493Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8167707Z arg1_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8167829Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8167834Z 2023-01-11T21:38:06.8167839Z 2023-01-11T21:38:06.8167940Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8168017Z import torch 2023-01-11T21:38:06.8168094Z import random 2023-01-11T21:38:06.8168214Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8168340Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8168345Z 2023-01-11T21:38:06.8168423Z aten = torch.ops.aten 2023-01-11T21:38:06.8168559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8168660Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8168666Z 2023-01-11T21:38:06.8168743Z import triton 2023-01-11T21:38:06.8168839Z import triton.language as tl 2023-01-11T21:38:06.8168967Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8169137Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8169143Z 2023-01-11T21:38:06.8169147Z 2023-01-11T21:38:06.8169305Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8169375Z import triton 2023-01-11T21:38:06.8169469Z import triton.language as tl 2023-01-11T21:38:06.8169583Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8169688Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8169822Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8169949Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8169954Z 2023-01-11T21:38:06.8170374Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8170454Z @triton.jit 2023-01-11T21:38:06.8170592Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8170668Z xnumel = 10 2023-01-11T21:38:06.8170769Z xoffset = tl.program_id(0) * XBLOCK 
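# Indexing idiom shared by all of these pointwise kernels: each Triton program
# covers XBLOCK consecutive elements, xindex enumerates the lanes it owns, and
# the xmask computed next disables the tail lanes whenever xnumel (10 here) is
# not a multiple of XBLOCK.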
2023-01-11T21:38:06.8170902Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8170989Z xmask = xindex < xnumel 2023-01-11T21:38:06.8171062Z x0 = xindex 2023-01-11T21:38:06.8171160Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8171252Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8171334Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8171471Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8171559Z ''') 2023-01-11T21:38:06.8171590Z 2023-01-11T21:38:06.8171595Z 2023-01-11T21:38:06.8171692Z async_compile.wait(globals()) 2023-01-11T21:38:06.8171772Z del async_compile 2023-01-11T21:38:06.8171778Z 2023-01-11T21:38:06.8171855Z def call(args): 2023-01-11T21:38:06.8171937Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8172008Z args.clear() 2023-01-11T21:38:06.8172104Z with torch.cuda.device(0): 2023-01-11T21:38:06.8172306Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8172403Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8172546Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8172623Z del arg0_1 2023-01-11T21:38:06.8172698Z del arg1_1 2023-01-11T21:38:06.8172771Z return (buf0, ) 2023-01-11T21:38:06.8172776Z 2023-01-11T21:38:06.8172780Z 2023-01-11T21:38:06.8172863Z if __name__ == "__main__": 2023-01-11T21:38:06.8172984Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8173114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8173326Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8173528Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8173652Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8173657Z 2023-01-11T21:38:06.8173662Z 2023-01-11T21:38:06.8173765Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8173834Z import torch 2023-01-11T21:38:06.8173910Z import random 2023-01-11T21:38:06.8174030Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8174156Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8174161Z 2023-01-11T21:38:06.8174246Z aten = torch.ops.aten 2023-01-11T21:38:06.8174383Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8174686Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8174695Z 2023-01-11T21:38:06.8174772Z import triton 2023-01-11T21:38:06.8174858Z import triton.language as tl 2023-01-11T21:38:06.8174984Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8175122Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8175128Z 2023-01-11T21:38:06.8175133Z 2023-01-11T21:38:06.8175331Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8175407Z import triton 2023-01-11T21:38:06.8175500Z import triton.language as tl 2023-01-11T21:38:06.8175614Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8175709Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8175842Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8175966Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8175972Z 2023-01-11T21:38:06.8176391Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8176469Z @triton.jit 2023-01-11T21:38:06.8176609Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8176682Z xnumel = 10 2023-01-11T21:38:06.8176781Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8176904Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8176988Z xmask = xindex < xnumel 2023-01-11T21:38:06.8177060Z x0 = xindex 2023-01-11T21:38:06.8177232Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8177368Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8177461Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8177607Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8177686Z ''') 2023-01-11T21:38:06.8177697Z 2023-01-11T21:38:06.8177702Z 2023-01-11T21:38:06.8177788Z async_compile.wait(globals()) 2023-01-11T21:38:06.8177932Z del async_compile 2023-01-11T21:38:06.8177937Z 2023-01-11T21:38:06.8178014Z def call(args): 2023-01-11T21:38:06.8178096Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8178177Z args.clear() 2023-01-11T21:38:06.8178274Z with torch.cuda.device(0): 2023-01-11T21:38:06.8178478Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8178567Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8178712Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8178788Z del arg0_1 2023-01-11T21:38:06.8178863Z del arg1_1 2023-01-11T21:38:06.8178943Z return (buf0, ) 2023-01-11T21:38:06.8178948Z 2023-01-11T21:38:06.8178952Z 2023-01-11T21:38:06.8179034Z if __name__ == "__main__": 2023-01-11T21:38:06.8179155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8179276Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8179489Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8179690Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8179812Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8179817Z 2023-01-11T21:38:06.8180090Z [2023-01-11 21:36:22,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 967 2023-01-11T21:38:06.8180507Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8180642Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8180899Z [2023-01-11 21:36:22,279] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 968 2023-01-11T21:38:06.8181169Z [2023-01-11 21:36:22,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 968 2023-01-11T21:38:06.8181609Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8181746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8181996Z [2023-01-11 21:36:22,371] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 969 2023-01-11T21:38:06.8182258Z [2023-01-11 21:36:22,382] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 969 2023-01-11T21:38:06.8182673Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8182815Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8183068Z [2023-01-11 21:36:22,397] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 970 2023-01-11T21:38:06.8183074Z 2023-01-11T21:38:06.8183179Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8183256Z import torch 2023-01-11T21:38:06.8183333Z import random 2023-01-11T21:38:06.8183458Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8183578Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8183590Z 2023-01-11T21:38:06.8183668Z aten = torch.ops.aten 2023-01-11T21:38:06.8183835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8183933Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8183939Z 2023-01-11T21:38:06.8184015Z import triton 2023-01-11T21:38:06.8184109Z import triton.language as tl 2023-01-11T21:38:06.8184239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8184384Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8184389Z 2023-01-11T21:38:06.8184394Z 2023-01-11T21:38:06.8184555Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8184626Z import triton 2023-01-11T21:38:06.8184720Z import triton.language as tl 2023-01-11T21:38:06.8184837Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8184943Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8185078Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8185207Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8185215Z 2023-01-11T21:38:06.8185674Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8185758Z @triton.jit 2023-01-11T21:38:06.8185895Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8185971Z xnumel = 10 2023-01-11T21:38:06.8186070Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8186202Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8186290Z xmask = xindex < xnumel 2023-01-11T21:38:06.8186364Z x0 = xindex 2023-01-11T21:38:06.8186457Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8186558Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8186630Z tmp1 = 1 2023-01-11T21:38:06.8186712Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8186794Z tmp4 = tmp2 + tmp3 
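# tmp0 and tmp3 are the two masked loads, tmp1 is the constant 1 folded in
# from the traced graph, so the adds above reproduce (arg0_1 + 1) + arg1_1.
# The fp16 variant of this kernel, dumped next in the log, differs only in
# loading with an explicit .to(tl.float32) upcast so the arithmetic still runs
# in fp32 before the masked tl.store below.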
2023-01-11T21:38:06.8186939Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8187027Z ''') 2023-01-11T21:38:06.8187032Z 2023-01-11T21:38:06.8187037Z 2023-01-11T21:38:06.8187134Z async_compile.wait(globals()) 2023-01-11T21:38:06.8187206Z del async_compile 2023-01-11T21:38:06.8187211Z 2023-01-11T21:38:06.8187317Z def call(args): 2023-01-11T21:38:06.8187401Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8187479Z args.clear() 2023-01-11T21:38:06.8187573Z with torch.cuda.device(0): 2023-01-11T21:38:06.8187774Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8187868Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8188007Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8188084Z del arg0_1 2023-01-11T21:38:06.8188160Z del arg1_1 2023-01-11T21:38:06.8188239Z return (buf0, ) 2023-01-11T21:38:06.8188248Z 2023-01-11T21:38:06.8188252Z 2023-01-11T21:38:06.8188334Z if __name__ == "__main__": 2023-01-11T21:38:06.8188455Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8188586Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8188800Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8188992Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8189114Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8189119Z 2023-01-11T21:38:06.8189124Z 2023-01-11T21:38:06.8189225Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8189303Z import torch 2023-01-11T21:38:06.8189380Z import random 2023-01-11T21:38:06.8189500Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8189626Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8189631Z 2023-01-11T21:38:06.8189716Z aten = torch.ops.aten 2023-01-11T21:38:06.8189874Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8189973Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8189979Z 2023-01-11T21:38:06.8190055Z import triton 2023-01-11T21:38:06.8190150Z import triton.language as tl 2023-01-11T21:38:06.8190278Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8190421Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8190427Z 2023-01-11T21:38:06.8190431Z 2023-01-11T21:38:06.8190590Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8190667Z import triton 2023-01-11T21:38:06.8190755Z import triton.language as tl 2023-01-11T21:38:06.8190875Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8190980Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8191116Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8191244Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8191252Z 2023-01-11T21:38:06.8191667Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8191743Z @triton.jit 2023-01-11T21:38:06.8191890Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8191959Z xnumel = 10 2023-01-11T21:38:06.8192059Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8192191Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8192279Z xmask = xindex < xnumel 2023-01-11T21:38:06.8192353Z x0 = xindex 2023-01-11T21:38:06.8192472Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8192591Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8192657Z tmp1 = 1 2023-01-11T21:38:06.8192740Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8192824Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8192963Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8193050Z ''') 2023-01-11T21:38:06.8193056Z 2023-01-11T21:38:06.8193060Z 2023-01-11T21:38:06.8193154Z async_compile.wait(globals()) 2023-01-11T21:38:06.8193233Z del async_compile 2023-01-11T21:38:06.8193238Z 2023-01-11T21:38:06.8193334Z def call(args): 2023-01-11T21:38:06.8193417Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8193495Z args.clear() 2023-01-11T21:38:06.8193589Z with torch.cuda.device(0): 2023-01-11T21:38:06.8193792Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8193886Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8194033Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8194103Z del arg0_1 2023-01-11T21:38:06.8194178Z del arg1_1 2023-01-11T21:38:06.8194259Z return (buf0, ) 2023-01-11T21:38:06.8194267Z 2023-01-11T21:38:06.8194271Z 2023-01-11T21:38:06.8194354Z if __name__ == "__main__": 2023-01-11T21:38:06.8194475Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8194604Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8194816Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8195015Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8195130Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8195135Z 2023-01-11T21:38:06.8195149Z 2023-01-11T21:38:06.8195242Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8195318Z import torch 2023-01-11T21:38:06.8195395Z import random 2023-01-11T21:38:06.8195515Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8195641Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8195646Z 2023-01-11T21:38:06.8195732Z aten = torch.ops.aten 2023-01-11T21:38:06.8195897Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8195988Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8195993Z 2023-01-11T21:38:06.8196068Z import triton 2023-01-11T21:38:06.8196168Z import triton.language as tl 2023-01-11T21:38:06.8196298Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8196439Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8196444Z 2023-01-11T21:38:06.8196449Z 2023-01-11T21:38:06.8196604Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8196682Z import triton 2023-01-11T21:38:06.8196777Z import triton.language as tl 2023-01-11T21:38:06.8196886Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8196989Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8197123Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8197255Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8197263Z 2023-01-11T21:38:06.8197684Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 
'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8197760Z @triton.jit 2023-01-11T21:38:06.8197907Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8197984Z xnumel = 1000 2023-01-11T21:38:06.8198076Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8198209Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8198295Z xmask = xindex < xnumel 2023-01-11T21:38:06.8198368Z x0 = xindex 2023-01-11T21:38:06.8198469Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8198568Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8198649Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8198778Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8198867Z ''') 2023-01-11T21:38:06.8198873Z 2023-01-11T21:38:06.8198877Z 2023-01-11T21:38:06.8198972Z async_compile.wait(globals()) 2023-01-11T21:38:06.8199050Z del async_compile 2023-01-11T21:38:06.8199056Z 2023-01-11T21:38:06.8199133Z def call(args): 2023-01-11T21:38:06.8199215Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8199325Z args.clear() 2023-01-11T21:38:06.8199413Z with torch.cuda.device(0): 2023-01-11T21:38:06.8199620Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8199715Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8199860Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8199936Z del arg0_1 2023-01-11T21:38:06.8200011Z del arg1_1 2023-01-11T21:38:06.8200091Z return (buf0, ) 2023-01-11T21:38:06.8200097Z 2023-01-11T21:38:06.8200101Z 2023-01-11T21:38:06.8200186Z if __name__ == "__main__": 2023-01-11T21:38:06.8200302Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8200431Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8200668Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8200877Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8200999Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8201004Z 2023-01-11T21:38:06.8201271Z [2023-01-11 21:36:22,408] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 970 2023-01-11T21:38:06.8201689Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8201849Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8202108Z [2023-01-11 21:36:22,427] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 971 2023-01-11T21:38:06.8202377Z [2023-01-11 21:36:22,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 971 2023-01-11T21:38:06.8202785Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8202917Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8203176Z [2023-01-11 21:36:22,459] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 972 2023-01-11T21:38:06.8203440Z [2023-01-11 21:36:22,471] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 972 2023-01-11T21:38:06.8203854Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8203988Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8204243Z [2023-01-11 21:36:22,486] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 973 2023-01-11T21:38:06.8204249Z 2023-01-11T21:38:06.8204349Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8204429Z import torch 2023-01-11T21:38:06.8204507Z import random 2023-01-11T21:38:06.8204621Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8204750Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8204756Z 2023-01-11T21:38:06.8204839Z aten = torch.ops.aten 2023-01-11T21:38:06.8204981Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8205077Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8205108Z 2023-01-11T21:38:06.8205189Z import triton 2023-01-11T21:38:06.8205283Z import triton.language as tl 2023-01-11T21:38:06.8205404Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8205562Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8205568Z 2023-01-11T21:38:06.8205574Z 2023-01-11T21:38:06.8205753Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8205840Z import triton 2023-01-11T21:38:06.8205936Z import triton.language as tl 2023-01-11T21:38:06.8206052Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8206156Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8206296Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8206416Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8206428Z 2023-01-11T21:38:06.8206845Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8206923Z @triton.jit 2023-01-11T21:38:06.8207066Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8207145Z xnumel = 1000 2023-01-11T21:38:06.8207248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8207382Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8207467Z xmask = xindex < xnumel 2023-01-11T21:38:06.8207535Z x0 = xindex 2023-01-11T21:38:06.8207656Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8207810Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8207896Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8208033Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], 
tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8208120Z ''') 2023-01-11T21:38:06.8208125Z 2023-01-11T21:38:06.8208130Z 2023-01-11T21:38:06.8208229Z async_compile.wait(globals()) 2023-01-11T21:38:06.8208310Z del async_compile 2023-01-11T21:38:06.8208315Z 2023-01-11T21:38:06.8208385Z def call(args): 2023-01-11T21:38:06.8208469Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8208547Z args.clear() 2023-01-11T21:38:06.8208642Z with torch.cuda.device(0): 2023-01-11T21:38:06.8208853Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8208948Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8209093Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8209162Z del arg0_1 2023-01-11T21:38:06.8209240Z del arg1_1 2023-01-11T21:38:06.8209322Z return (buf0, ) 2023-01-11T21:38:06.8209327Z 2023-01-11T21:38:06.8209331Z 2023-01-11T21:38:06.8209414Z if __name__ == "__main__": 2023-01-11T21:38:06.8209534Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8209666Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8209901Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8210109Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8210222Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8210227Z 2023-01-11T21:38:06.8210241Z 2023-01-11T21:38:06.8210333Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8210409Z import torch 2023-01-11T21:38:06.8210488Z import random 2023-01-11T21:38:06.8210609Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8210738Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8210743Z 2023-01-11T21:38:06.8210829Z aten = torch.ops.aten 2023-01-11T21:38:06.8210966Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8211056Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8211061Z 2023-01-11T21:38:06.8211163Z import triton 2023-01-11T21:38:06.8211258Z import triton.language as tl 2023-01-11T21:38:06.8211386Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8211529Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8211534Z 2023-01-11T21:38:06.8211539Z 2023-01-11T21:38:06.8211700Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8211779Z import triton 2023-01-11T21:38:06.8211880Z import triton.language as tl 2023-01-11T21:38:06.8211989Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8212092Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8212232Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8212358Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8212363Z 2023-01-11T21:38:06.8212783Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8212859Z @triton.jit 2023-01-11T21:38:06.8213002Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8213080Z xnumel = 1000 2023-01-11T21:38:06.8213173Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8213306Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 
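# size_hints=[1024] is xnumel (1000) rounded up to the next power of two for
# autotuning; the call() wrapper further down launches this kernel with
# grid(1000) on the stream returned by get_cuda_stream(0), and XBLOCK is the
# block size the @pointwise autotuner settles on.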
2023-01-11T21:38:06.8213392Z xmask = xindex < xnumel 2023-01-11T21:38:06.8213470Z x0 = xindex 2023-01-11T21:38:06.8213575Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8213674Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8213775Z tmp1 = 1 2023-01-11T21:38:06.8213850Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8213931Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8214072Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8214161Z ''') 2023-01-11T21:38:06.8214166Z 2023-01-11T21:38:06.8214175Z 2023-01-11T21:38:06.8214269Z async_compile.wait(globals()) 2023-01-11T21:38:06.8214349Z del async_compile 2023-01-11T21:38:06.8214354Z 2023-01-11T21:38:06.8214431Z def call(args): 2023-01-11T21:38:06.8214706Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8214791Z args.clear() 2023-01-11T21:38:06.8214886Z with torch.cuda.device(0): 2023-01-11T21:38:06.8215098Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8215192Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8215341Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8215421Z del arg0_1 2023-01-11T21:38:06.8215489Z del arg1_1 2023-01-11T21:38:06.8215570Z return (buf0, ) 2023-01-11T21:38:06.8215575Z 2023-01-11T21:38:06.8215579Z 2023-01-11T21:38:06.8215661Z if __name__ == "__main__": 2023-01-11T21:38:06.8215781Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8215912Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8216148Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8216355Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8216476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8216482Z 2023-01-11T21:38:06.8216486Z 2023-01-11T21:38:06.8216586Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8216656Z import torch 2023-01-11T21:38:06.8216731Z import random 2023-01-11T21:38:06.8216853Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8216982Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8216987Z 2023-01-11T21:38:06.8217071Z aten = torch.ops.aten 2023-01-11T21:38:06.8217266Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8217364Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8217370Z 2023-01-11T21:38:06.8217491Z import triton 2023-01-11T21:38:06.8217590Z import triton.language as tl 2023-01-11T21:38:06.8217720Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8217862Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8217868Z 2023-01-11T21:38:06.8217872Z 2023-01-11T21:38:06.8218033Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8218111Z import triton 2023-01-11T21:38:06.8218208Z import triton.language as tl 2023-01-11T21:38:06.8218324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8218422Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8218560Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8218689Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8218694Z 2023-01-11T21:38:06.8219116Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 
'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8219192Z @triton.jit 2023-01-11T21:38:06.8219335Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8219416Z xnumel = 1000 2023-01-11T21:38:06.8219515Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8219640Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8219725Z xmask = xindex < xnumel 2023-01-11T21:38:06.8219799Z x0 = xindex 2023-01-11T21:38:06.8219919Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8220075Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8220148Z tmp1 = 1 2023-01-11T21:38:06.8220229Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8220304Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8220443Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8220536Z ''') 2023-01-11T21:38:06.8220543Z 2023-01-11T21:38:06.8220548Z 2023-01-11T21:38:06.8220642Z async_compile.wait(globals()) 2023-01-11T21:38:06.8220723Z del async_compile 2023-01-11T21:38:06.8220728Z 2023-01-11T21:38:06.8220805Z def call(args): 2023-01-11T21:38:06.8220888Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8220958Z args.clear() 2023-01-11T21:38:06.8221052Z with torch.cuda.device(0): 2023-01-11T21:38:06.8221263Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8221357Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8221507Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8221585Z del arg0_1 2023-01-11T21:38:06.8221660Z del arg1_1 2023-01-11T21:38:06.8221733Z return (buf0, ) 2023-01-11T21:38:06.8221746Z 2023-01-11T21:38:06.8221750Z 2023-01-11T21:38:06.8221826Z if __name__ == "__main__": 2023-01-11T21:38:06.8221950Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8222079Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8222315Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8222522Z arg1_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8222645Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8222650Z 2023-01-11T21:38:06.8222916Z [2023-01-11 21:36:22,553] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 973 2023-01-11T21:38:06.8223329Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8223506Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8223757Z [2023-01-11 21:36:22,569] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 974 2023-01-11T21:38:06.8224025Z [2023-01-11 21:36:22,639] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 974 2023-01-11T21:38:06.8224441Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8224580Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8224838Z [2023-01-11 21:36:22,658] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 975 2023-01-11T21:38:06.8225107Z [2023-01-11 21:36:22,730] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 975 2023-01-11T21:38:06.8225520Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8225668Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8225956Z [2023-01-11 21:36:22,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 976 2023-01-11T21:38:06.8225988Z 2023-01-11T21:38:06.8226095Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8226173Z import torch 2023-01-11T21:38:06.8226243Z import random 2023-01-11T21:38:06.8226365Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8226497Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8226503Z 2023-01-11T21:38:06.8226588Z aten = torch.ops.aten 2023-01-11T21:38:06.8226728Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8226829Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8226834Z 2023-01-11T21:38:06.8226911Z import triton 2023-01-11T21:38:06.8226998Z import triton.language as tl 2023-01-11T21:38:06.8227129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8227270Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8227276Z 2023-01-11T21:38:06.8227280Z 2023-01-11T21:38:06.8227440Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8227517Z import triton 2023-01-11T21:38:06.8227613Z import triton.language as tl 2023-01-11T21:38:06.8227734Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8227838Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8227968Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8228097Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8228102Z 2023-01-11T21:38:06.8228523Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8228599Z @triton.jit 2023-01-11T21:38:06.8228743Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8228820Z xnumel = 16 2023-01-11T21:38:06.8228921Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8229057Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8229136Z xmask = xindex < xnumel 2023-01-11T21:38:06.8229211Z x0 = xindex 2023-01-11T21:38:06.8229311Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8229410Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8229518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8229657Z 
tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8229745Z ''') 2023-01-11T21:38:06.8229750Z 2023-01-11T21:38:06.8229754Z 2023-01-11T21:38:06.8229854Z async_compile.wait(globals()) 2023-01-11T21:38:06.8229927Z del async_compile 2023-01-11T21:38:06.8229932Z 2023-01-11T21:38:06.8230009Z def call(args): 2023-01-11T21:38:06.8230091Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8230168Z args.clear() 2023-01-11T21:38:06.8230261Z with torch.cuda.device(0): 2023-01-11T21:38:06.8230464Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8230565Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8230703Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8230781Z del arg0_1 2023-01-11T21:38:06.8230855Z del arg1_1 2023-01-11T21:38:06.8230943Z return (buf0, ) 2023-01-11T21:38:06.8230948Z 2023-01-11T21:38:06.8230953Z 2023-01-11T21:38:06.8231038Z if __name__ == "__main__": 2023-01-11T21:38:06.8231158Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8231288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8231497Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8231698Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8231821Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8231826Z 2023-01-11T21:38:06.8231856Z 2023-01-11T21:38:06.8231956Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8232035Z import torch 2023-01-11T21:38:06.8232111Z import random 2023-01-11T21:38:06.8232232Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8232358Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8232363Z 2023-01-11T21:38:06.8232442Z aten = torch.ops.aten 2023-01-11T21:38:06.8232580Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8232677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8232682Z 2023-01-11T21:38:06.8232760Z import triton 2023-01-11T21:38:06.8232854Z import triton.language as tl 2023-01-11T21:38:06.8232982Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8233123Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8233128Z 2023-01-11T21:38:06.8233132Z 2023-01-11T21:38:06.8233289Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8233363Z import triton 2023-01-11T21:38:06.8233458Z import triton.language as tl 2023-01-11T21:38:06.8233576Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8233680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8233815Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8233944Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8233949Z 2023-01-11T21:38:06.8234370Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8234445Z @triton.jit 2023-01-11T21:38:06.8234581Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8234659Z xnumel = 16 2023-01-11T21:38:06.8234758Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8234892Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.8234980Z xmask = xindex < xnumel 2023-01-11T21:38:06.8235055Z x0 = xindex 2023-01-11T21:38:06.8235175Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8235286Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8235370Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8235533Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8235624Z ''') 2023-01-11T21:38:06.8235629Z 2023-01-11T21:38:06.8235634Z 2023-01-11T21:38:06.8235728Z async_compile.wait(globals()) 2023-01-11T21:38:06.8235807Z del async_compile 2023-01-11T21:38:06.8235812Z 2023-01-11T21:38:06.8235889Z def call(args): 2023-01-11T21:38:06.8235970Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8236040Z args.clear() 2023-01-11T21:38:06.8236135Z with torch.cuda.device(0): 2023-01-11T21:38:06.8236338Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8236438Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8236582Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8236658Z del arg0_1 2023-01-11T21:38:06.8236733Z del arg1_1 2023-01-11T21:38:06.8236805Z return (buf0, ) 2023-01-11T21:38:06.8236811Z 2023-01-11T21:38:06.8236826Z 2023-01-11T21:38:06.8236903Z if __name__ == "__main__": 2023-01-11T21:38:06.8237025Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8237154Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8237369Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8237569Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8237690Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8237695Z 2023-01-11T21:38:06.8237700Z 2023-01-11T21:38:06.8237800Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8237901Z import torch 2023-01-11T21:38:06.8237969Z import random 2023-01-11T21:38:06.8238085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8238209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8238214Z 2023-01-11T21:38:06.8238295Z aten = torch.ops.aten 2023-01-11T21:38:06.8238434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8238529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8238534Z 2023-01-11T21:38:06.8238607Z import triton 2023-01-11T21:38:06.8238691Z import triton.language as tl 2023-01-11T21:38:06.8238816Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8238954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8238960Z 2023-01-11T21:38:06.8238964Z 2023-01-11T21:38:06.8239120Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8239194Z import triton 2023-01-11T21:38:06.8239286Z import triton.language as tl 2023-01-11T21:38:06.8239404Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8239507Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8239634Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8239757Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8239763Z 2023-01-11T21:38:06.8240187Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), 
equal_to_1=())]}) 2023-01-11T21:38:06.8240265Z @triton.jit 2023-01-11T21:38:06.8240406Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8240481Z xnumel = 16 2023-01-11T21:38:06.8240579Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8240709Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8240786Z xmask = xindex < xnumel 2023-01-11T21:38:06.8240864Z x0 = xindex 2023-01-11T21:38:06.8240962Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8241060Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8241131Z tmp1 = 1 2023-01-11T21:38:06.8241210Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8241281Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8241445Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8241532Z ''') 2023-01-11T21:38:06.8241537Z 2023-01-11T21:38:06.8241542Z 2023-01-11T21:38:06.8241634Z async_compile.wait(globals()) 2023-01-11T21:38:06.8241712Z del async_compile 2023-01-11T21:38:06.8241717Z 2023-01-11T21:38:06.8241791Z def call(args): 2023-01-11T21:38:06.8241870Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8241949Z args.clear() 2023-01-11T21:38:06.8242032Z with torch.cuda.device(0): 2023-01-11T21:38:06.8242232Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8242323Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8242470Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8242544Z del arg0_1 2023-01-11T21:38:06.8242617Z del arg1_1 2023-01-11T21:38:06.8242694Z return (buf0, ) 2023-01-11T21:38:06.8242700Z 2023-01-11T21:38:06.8242704Z 2023-01-11T21:38:06.8242787Z if __name__ == "__main__": 2023-01-11T21:38:06.8242897Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8243024Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8243235Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8243431Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8243555Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8243560Z 2023-01-11T21:38:06.8243827Z [2023-01-11 21:36:22,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 976 2023-01-11T21:38:06.8244276Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8244411Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8244667Z [2023-01-11 21:36:22,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 977 2023-01-11T21:38:06.8244920Z [2023-01-11 21:36:22,843] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 977 2023-01-11T21:38:06.8245333Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8245468Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8245724Z [2023-01-11 21:36:22,859] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 978 2023-01-11T21:38:06.8245986Z [2023-01-11 21:36:22,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 978 2023-01-11T21:38:06.8246396Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8246527Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8246781Z [2023-01-11 21:36:22,887] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 979 2023-01-11T21:38:06.8246787Z 2023-01-11T21:38:06.8246885Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8246961Z import torch 2023-01-11T21:38:06.8247034Z import random 2023-01-11T21:38:06.8247171Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8247297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8247302Z 2023-01-11T21:38:06.8247384Z aten = torch.ops.aten 2023-01-11T21:38:06.8247519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8247616Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8247621Z 2023-01-11T21:38:06.8247695Z import triton 2023-01-11T21:38:06.8247787Z import triton.language as tl 2023-01-11T21:38:06.8247904Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8248044Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8248053Z 2023-01-11T21:38:06.8248057Z 2023-01-11T21:38:06.8248216Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8248290Z import triton 2023-01-11T21:38:06.8248383Z import triton.language as tl 2023-01-11T21:38:06.8248498Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8248601Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8248735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8248853Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8248858Z 2023-01-11T21:38:06.8249277Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8249354Z @triton.jit 2023-01-11T21:38:06.8249492Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8249606Z xnumel = 16 2023-01-11T21:38:06.8249703Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8249833Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8249917Z xmask = xindex < xnumel 2023-01-11T21:38:06.8249981Z x0 = xindex 2023-01-11T21:38:06.8250103Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8250219Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8250290Z tmp1 = 1 2023-01-11T21:38:06.8250370Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8250447Z tmp4 = 
tmp2 + tmp3 2023-01-11T21:38:06.8250582Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8250661Z ''') 2023-01-11T21:38:06.8250666Z 2023-01-11T21:38:06.8250671Z 2023-01-11T21:38:06.8250765Z async_compile.wait(globals()) 2023-01-11T21:38:06.8250841Z del async_compile 2023-01-11T21:38:06.8250846Z 2023-01-11T21:38:06.8250921Z def call(args): 2023-01-11T21:38:06.8251004Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8251082Z args.clear() 2023-01-11T21:38:06.8251175Z with torch.cuda.device(0): 2023-01-11T21:38:06.8251368Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8251464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8251611Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8251685Z del arg0_1 2023-01-11T21:38:06.8251758Z del arg1_1 2023-01-11T21:38:06.8251835Z return (buf0, ) 2023-01-11T21:38:06.8251840Z 2023-01-11T21:38:06.8251845Z 2023-01-11T21:38:06.8251924Z if __name__ == "__main__": 2023-01-11T21:38:06.8252044Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8252163Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8252375Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8252575Z arg1_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8252698Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8252703Z 2023-01-11T21:38:06.8252708Z 2023-01-11T21:38:06.8252806Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8252882Z import torch 2023-01-11T21:38:06.8252957Z import random 2023-01-11T21:38:06.8253105Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8253222Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8253227Z 2023-01-11T21:38:06.8253310Z aten = torch.ops.aten 2023-01-11T21:38:06.8253446Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8253543Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8253548Z 2023-01-11T21:38:06.8253623Z import triton 2023-01-11T21:38:06.8253715Z import triton.language as tl 2023-01-11T21:38:06.8253841Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8253973Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8253988Z 2023-01-11T21:38:06.8253992Z 2023-01-11T21:38:06.8254139Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8254215Z import triton 2023-01-11T21:38:06.8254309Z import triton.language as tl 2023-01-11T21:38:06.8254424Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8254749Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8254884Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8255010Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8255015Z 2023-01-11T21:38:06.8255433Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8255509Z @triton.jit 2023-01-11T21:38:06.8255669Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8255825Z xnumel = 35 2023-01-11T21:38:06.8255927Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8256060Z xindex = 
xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8256148Z xmask = xindex < xnumel 2023-01-11T21:38:06.8256224Z x0 = xindex 2023-01-11T21:38:06.8256319Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8256418Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8256499Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8256638Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8256728Z ''') 2023-01-11T21:38:06.8256733Z 2023-01-11T21:38:06.8256738Z 2023-01-11T21:38:06.8256834Z async_compile.wait(globals()) 2023-01-11T21:38:06.8256915Z del async_compile 2023-01-11T21:38:06.8256920Z 2023-01-11T21:38:06.8256998Z def call(args): 2023-01-11T21:38:06.8257073Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8257219Z args.clear() 2023-01-11T21:38:06.8257338Z with torch.cuda.device(0): 2023-01-11T21:38:06.8257543Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8257640Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8257788Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8257869Z del arg0_1 2023-01-11T21:38:06.8257937Z del arg1_1 2023-01-11T21:38:06.8258016Z return (buf0, ) 2023-01-11T21:38:06.8258021Z 2023-01-11T21:38:06.8258026Z 2023-01-11T21:38:06.8258109Z if __name__ == "__main__": 2023-01-11T21:38:06.8258229Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8258359Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8258561Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8258762Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8258876Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8258891Z 2023-01-11T21:38:06.8258895Z 2023-01-11T21:38:06.8258989Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8259066Z import torch 2023-01-11T21:38:06.8259143Z import random 2023-01-11T21:38:06.8259264Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8259438Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8259444Z 2023-01-11T21:38:06.8259531Z aten = torch.ops.aten 2023-01-11T21:38:06.8259670Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8259759Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8259772Z 2023-01-11T21:38:06.8259841Z import triton 2023-01-11T21:38:06.8259936Z import triton.language as tl 2023-01-11T21:38:06.8260065Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8260205Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8260211Z 2023-01-11T21:38:06.8260215Z 2023-01-11T21:38:06.8260377Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8260456Z import triton 2023-01-11T21:38:06.8260550Z import triton.language as tl 2023-01-11T21:38:06.8260659Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8260763Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8260899Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8261030Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8261035Z 2023-01-11T21:38:06.8261454Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.8261531Z @triton.jit 2023-01-11T21:38:06.8261674Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8261752Z xnumel = 35 2023-01-11T21:38:06.8261844Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8262007Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8262093Z xmask = xindex < xnumel 2023-01-11T21:38:06.8262168Z x0 = xindex 2023-01-11T21:38:06.8262290Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8262412Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8262496Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8262624Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8262712Z ''') 2023-01-11T21:38:06.8262717Z 2023-01-11T21:38:06.8262721Z 2023-01-11T21:38:06.8262817Z async_compile.wait(globals()) 2023-01-11T21:38:06.8262897Z del async_compile 2023-01-11T21:38:06.8262902Z 2023-01-11T21:38:06.8262980Z def call(args): 2023-01-11T21:38:06.8263062Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8263139Z args.clear() 2023-01-11T21:38:06.8263238Z with torch.cuda.device(0): 2023-01-11T21:38:06.8263431Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8263530Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8263673Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8263750Z del arg0_1 2023-01-11T21:38:06.8263828Z del arg1_1 2023-01-11T21:38:06.8263913Z return (buf0, ) 2023-01-11T21:38:06.8263918Z 2023-01-11T21:38:06.8263923Z 2023-01-11T21:38:06.8264005Z if __name__ == "__main__": 2023-01-11T21:38:06.8264118Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8264246Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8264445Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8264644Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8264765Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8264770Z 2023-01-11T21:38:06.8265041Z [2023-01-11 21:36:22,899] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 979 2023-01-11T21:38:06.8265490Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8265631Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8265888Z [2023-01-11 21:36:22,917] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 980 2023-01-11T21:38:06.8266155Z [2023-01-11 21:36:22,929] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 980 2023-01-11T21:38:06.8266568Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8266697Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8266958Z [2023-01-11 21:36:22,944] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 981 2023-01-11T21:38:06.8267222Z [2023-01-11 21:36:22,954] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 981 2023-01-11T21:38:06.8267635Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8267799Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8268054Z [2023-01-11 21:36:22,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 982 2023-01-11T21:38:06.8268059Z 2023-01-11T21:38:06.8268161Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8268243Z import torch 2023-01-11T21:38:06.8268320Z import random 2023-01-11T21:38:06.8268435Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8268561Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8268566Z 2023-01-11T21:38:06.8268650Z aten = torch.ops.aten 2023-01-11T21:38:06.8268790Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8268888Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8268893Z 2023-01-11T21:38:06.8268969Z import triton 2023-01-11T21:38:06.8269063Z import triton.language as tl 2023-01-11T21:38:06.8269192Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8269330Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8269336Z 2023-01-11T21:38:06.8269347Z 2023-01-11T21:38:06.8269500Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8269577Z import triton 2023-01-11T21:38:06.8269673Z import triton.language as tl 2023-01-11T21:38:06.8269792Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8269896Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8270032Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8270153Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8270168Z 2023-01-11T21:38:06.8270576Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8270653Z @triton.jit 2023-01-11T21:38:06.8270798Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8270876Z xnumel = 35 2023-01-11T21:38:06.8270975Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8271108Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8271196Z xmask = xindex < xnumel 2023-01-11T21:38:06.8271333Z x0 = xindex 2023-01-11T21:38:06.8271437Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8271537Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8271609Z tmp1 = 1 2023-01-11T21:38:06.8271691Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8271772Z tmp4 = tmp2 + tmp3 
2023-01-11T21:38:06.8271912Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8271993Z ''') 2023-01-11T21:38:06.8272009Z 2023-01-11T21:38:06.8272013Z 2023-01-11T21:38:06.8272102Z async_compile.wait(globals()) 2023-01-11T21:38:06.8272182Z del async_compile 2023-01-11T21:38:06.8272187Z 2023-01-11T21:38:06.8272269Z def call(args): 2023-01-11T21:38:06.8272351Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8272429Z args.clear() 2023-01-11T21:38:06.8272527Z with torch.cuda.device(0): 2023-01-11T21:38:06.8272731Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8272818Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8272968Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8273045Z del arg0_1 2023-01-11T21:38:06.8273121Z del arg1_1 2023-01-11T21:38:06.8273202Z return (buf0, ) 2023-01-11T21:38:06.8273207Z 2023-01-11T21:38:06.8273212Z 2023-01-11T21:38:06.8273293Z if __name__ == "__main__": 2023-01-11T21:38:06.8273413Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8273534Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8273736Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8273967Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8274089Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8274094Z 2023-01-11T21:38:06.8274098Z 2023-01-11T21:38:06.8274199Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8274275Z import torch 2023-01-11T21:38:06.8274356Z import random 2023-01-11T21:38:06.8274482Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8274600Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8274613Z 2023-01-11T21:38:06.8274690Z aten = torch.ops.aten 2023-01-11T21:38:06.8274828Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8274927Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8274932Z 2023-01-11T21:38:06.8275009Z import triton 2023-01-11T21:38:06.8275104Z import triton.language as tl 2023-01-11T21:38:06.8275236Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8275382Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8275387Z 2023-01-11T21:38:06.8275391Z 2023-01-11T21:38:06.8275552Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8275622Z import triton 2023-01-11T21:38:06.8275721Z import triton.language as tl 2023-01-11T21:38:06.8275840Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8275945Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8276082Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8276208Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8276213Z 2023-01-11T21:38:06.8276626Z @pointwise(size_hints=[64], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8276695Z @triton.jit 2023-01-11T21:38:06.8276838Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8276919Z xnumel = 35 2023-01-11T21:38:06.8277018Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8277152Z xindex = xoffset + 
tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8277238Z xmask = xindex < xnumel 2023-01-11T21:38:06.8277311Z x0 = xindex 2023-01-11T21:38:06.8277452Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8277575Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8277648Z tmp1 = 1 2023-01-11T21:38:06.8277731Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8277813Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8277951Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8278039Z ''') 2023-01-11T21:38:06.8278045Z 2023-01-11T21:38:06.8278049Z 2023-01-11T21:38:06.8278145Z async_compile.wait(globals()) 2023-01-11T21:38:06.8278218Z del async_compile 2023-01-11T21:38:06.8278223Z 2023-01-11T21:38:06.8278303Z def call(args): 2023-01-11T21:38:06.8278383Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8278463Z args.clear() 2023-01-11T21:38:06.8278557Z with torch.cuda.device(0): 2023-01-11T21:38:06.8278756Z buf0 = empty_strided((35, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8278853Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8278992Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 35, grid=grid(35), stream=stream0) 2023-01-11T21:38:06.8279068Z del arg0_1 2023-01-11T21:38:06.8279145Z del arg1_1 2023-01-11T21:38:06.8279225Z return (buf0, ) 2023-01-11T21:38:06.8279231Z 2023-01-11T21:38:06.8279235Z 2023-01-11T21:38:06.8279319Z if __name__ == "__main__": 2023-01-11T21:38:06.8279441Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8279569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8279768Z arg0_1 = rand_strided((5, 7), (7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8279997Z arg1_1 = rand_strided((35, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8280118Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8280123Z 2023-01-11T21:38:06.8280128Z 2023-01-11T21:38:06.8280229Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8280306Z import torch 2023-01-11T21:38:06.8280387Z import random 2023-01-11T21:38:06.8280508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8280634Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8280640Z 2023-01-11T21:38:06.8280724Z aten = torch.ops.aten 2023-01-11T21:38:06.8280856Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8280955Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8280960Z 2023-01-11T21:38:06.8281037Z import triton 2023-01-11T21:38:06.8281131Z import triton.language as tl 2023-01-11T21:38:06.8281259Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8281404Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8281409Z 2023-01-11T21:38:06.8281414Z 2023-01-11T21:38:06.8281572Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8281650Z import triton 2023-01-11T21:38:06.8281737Z import triton.language as tl 2023-01-11T21:38:06.8281856Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8281960Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8282097Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8282225Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8282230Z 2023-01-11T21:38:06.8282652Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 
'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8282728Z @triton.jit 2023-01-11T21:38:06.8282870Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8282944Z xnumel = 5040 2023-01-11T21:38:06.8283044Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8283175Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8283261Z xmask = xindex < xnumel 2023-01-11T21:38:06.8283362Z x0 = xindex 2023-01-11T21:38:06.8283462Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8283562Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8283637Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8283771Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8283862Z ''') 2023-01-11T21:38:06.8283868Z 2023-01-11T21:38:06.8283872Z 2023-01-11T21:38:06.8283967Z async_compile.wait(globals()) 2023-01-11T21:38:06.8284046Z del async_compile 2023-01-11T21:38:06.8284051Z 2023-01-11T21:38:06.8284128Z def call(args): 2023-01-11T21:38:06.8284209Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8284283Z args.clear() 2023-01-11T21:38:06.8284377Z with torch.cuda.device(0): 2023-01-11T21:38:06.8284581Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8284679Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8284831Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8284909Z del arg0_1 2023-01-11T21:38:06.8284983Z del arg1_1 2023-01-11T21:38:06.8285056Z return (buf0, ) 2023-01-11T21:38:06.8285069Z 2023-01-11T21:38:06.8285074Z 2023-01-11T21:38:06.8285149Z if __name__ == "__main__": 2023-01-11T21:38:06.8285270Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8285397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8285638Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8285874Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8286039Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8286044Z 2023-01-11T21:38:06.8286312Z [2023-01-11 21:36:22,981] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 982 2023-01-11T21:38:06.8286729Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8286865Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8287116Z [2023-01-11 21:36:22,999] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 983 2023-01-11T21:38:06.8287378Z [2023-01-11 21:36:23,014] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 983 2023-01-11T21:38:06.8287794Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8287929Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8288185Z [2023-01-11 21:36:23,032] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 984 2023-01-11T21:38:06.8288448Z [2023-01-11 21:36:23,045] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 984 2023-01-11T21:38:06.8288866Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8289001Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8289287Z [2023-01-11 21:36:23,061] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 985 2023-01-11T21:38:06.8289293Z 2023-01-11T21:38:06.8289394Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8289473Z import torch 2023-01-11T21:38:06.8289544Z import random 2023-01-11T21:38:06.8289665Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8289791Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8289797Z 2023-01-11T21:38:06.8289882Z aten = torch.ops.aten 2023-01-11T21:38:06.8290021Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8290119Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8290127Z 2023-01-11T21:38:06.8290204Z import triton 2023-01-11T21:38:06.8290291Z import triton.language as tl 2023-01-11T21:38:06.8290420Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8290562Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8290567Z 2023-01-11T21:38:06.8290572Z 2023-01-11T21:38:06.8290733Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8290810Z import triton 2023-01-11T21:38:06.8290906Z import triton.language as tl 2023-01-11T21:38:06.8291023Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8291128Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8291256Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8291384Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8291389Z 2023-01-11T21:38:06.8291813Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8291940Z @triton.jit 2023-01-11T21:38:06.8292086Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8292164Z xnumel = 5040 2023-01-11T21:38:06.8292266Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8292398Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8292477Z xmask = xindex < xnumel 2023-01-11T21:38:06.8292551Z x0 = xindex 2023-01-11T21:38:06.8292670Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8292793Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8292877Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8293016Z tl.store(out_ptr0 + (x0 + 
tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8293104Z ''') 2023-01-11T21:38:06.8293110Z 2023-01-11T21:38:06.8293114Z 2023-01-11T21:38:06.8293211Z async_compile.wait(globals()) 2023-01-11T21:38:06.8293288Z del async_compile 2023-01-11T21:38:06.8293293Z 2023-01-11T21:38:06.8293371Z def call(args): 2023-01-11T21:38:06.8293454Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8293531Z args.clear() 2023-01-11T21:38:06.8293628Z with torch.cuda.device(0): 2023-01-11T21:38:06.8293834Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8293929Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8294069Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8294146Z del arg0_1 2023-01-11T21:38:06.8294220Z del arg1_1 2023-01-11T21:38:06.8294303Z return (buf0, ) 2023-01-11T21:38:06.8294308Z 2023-01-11T21:38:06.8294312Z 2023-01-11T21:38:06.8294394Z if __name__ == "__main__": 2023-01-11T21:38:06.8294883Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8295025Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8295266Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8295471Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8295618Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8295723Z 2023-01-11T21:38:06.8295730Z 2023-01-11T21:38:06.8295844Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8295921Z import torch 2023-01-11T21:38:06.8295998Z import random 2023-01-11T21:38:06.8296119Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8296245Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8296251Z 2023-01-11T21:38:06.8296340Z aten = torch.ops.aten 2023-01-11T21:38:06.8296472Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8296571Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8296576Z 2023-01-11T21:38:06.8296653Z import triton 2023-01-11T21:38:06.8296750Z import triton.language as tl 2023-01-11T21:38:06.8296877Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8297021Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8297026Z 2023-01-11T21:38:06.8297031Z 2023-01-11T21:38:06.8297265Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8297338Z import triton 2023-01-11T21:38:06.8297435Z import triton.language as tl 2023-01-11T21:38:06.8297552Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8297657Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8297798Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8297926Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8297931Z 2023-01-11T21:38:06.8298354Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8298478Z @triton.jit 2023-01-11T21:38:06.8298614Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8298693Z xnumel = 5040 2023-01-11T21:38:06.8298792Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8298927Z xindex = xoffset + tl.reshape(tl.arange(0, 
XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8299016Z xmask = xindex < xnumel 2023-01-11T21:38:06.8299090Z x0 = xindex 2023-01-11T21:38:06.8299189Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8299282Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8299355Z tmp1 = 1 2023-01-11T21:38:06.8299436Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8299516Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8299657Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8299749Z ''') 2023-01-11T21:38:06.8299754Z 2023-01-11T21:38:06.8299759Z 2023-01-11T21:38:06.8299862Z async_compile.wait(globals()) 2023-01-11T21:38:06.8299942Z del async_compile 2023-01-11T21:38:06.8299948Z 2023-01-11T21:38:06.8300017Z def call(args): 2023-01-11T21:38:06.8300098Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8300177Z args.clear() 2023-01-11T21:38:06.8300271Z with torch.cuda.device(0): 2023-01-11T21:38:06.8300475Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8300573Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8300720Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8300788Z del arg0_1 2023-01-11T21:38:06.8300863Z del arg1_1 2023-01-11T21:38:06.8300943Z return (buf0, ) 2023-01-11T21:38:06.8300948Z 2023-01-11T21:38:06.8300952Z 2023-01-11T21:38:06.8301035Z if __name__ == "__main__": 2023-01-11T21:38:06.8301155Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8301284Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8301527Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8301730Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8301877Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8301883Z 2023-01-11T21:38:06.8301896Z 2023-01-11T21:38:06.8301990Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8302072Z import torch 2023-01-11T21:38:06.8302149Z import random 2023-01-11T21:38:06.8302271Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8302399Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8302405Z 2023-01-11T21:38:06.8302489Z aten = torch.ops.aten 2023-01-11T21:38:06.8302626Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8302716Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8302721Z 2023-01-11T21:38:06.8302799Z import triton 2023-01-11T21:38:06.8302894Z import triton.language as tl 2023-01-11T21:38:06.8303021Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8303162Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8303167Z 2023-01-11T21:38:06.8303172Z 2023-01-11T21:38:06.8303334Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8303412Z import triton 2023-01-11T21:38:06.8303507Z import triton.language as tl 2023-01-11T21:38:06.8303616Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8303720Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8303855Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8303982Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8303987Z 2023-01-11T21:38:06.8304408Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': 
{}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8304512Z @triton.jit 2023-01-11T21:38:06.8304655Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8304732Z xnumel = 5040 2023-01-11T21:38:06.8304824Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8304959Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8305046Z xmask = xindex < xnumel 2023-01-11T21:38:06.8305121Z x0 = xindex 2023-01-11T21:38:06.8305242Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8305361Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8305434Z tmp1 = 1 2023-01-11T21:38:06.8305509Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8305590Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8305732Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8305822Z ''') 2023-01-11T21:38:06.8305830Z 2023-01-11T21:38:06.8305835Z 2023-01-11T21:38:06.8305931Z async_compile.wait(globals()) 2023-01-11T21:38:06.8306010Z del async_compile 2023-01-11T21:38:06.8306015Z 2023-01-11T21:38:06.8306092Z def call(args): 2023-01-11T21:38:06.8306167Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8306247Z args.clear() 2023-01-11T21:38:06.8306343Z with torch.cuda.device(0): 2023-01-11T21:38:06.8306546Z buf0 = empty_strided((5040, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8306641Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8306790Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8306865Z del arg0_1 2023-01-11T21:38:06.8306933Z del arg1_1 2023-01-11T21:38:06.8307013Z return (buf0, ) 2023-01-11T21:38:06.8307019Z 2023-01-11T21:38:06.8307023Z 2023-01-11T21:38:06.8307106Z if __name__ == "__main__": 2023-01-11T21:38:06.8307228Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8307362Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8307601Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8307802Z arg1_1 = rand_strided((5040, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8307959Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8307965Z 2023-01-11T21:38:06.8308230Z [2023-01-11 21:36:23,072] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 985 2023-01-11T21:38:06.8308635Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8308766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8309023Z [2023-01-11 21:36:23,088] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 986 2023-01-11T21:38:06.8309286Z [2023-01-11 21:36:23,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 986 2023-01-11T21:38:06.8309699Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. 
This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8309830Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8310082Z [2023-01-11 21:36:23,116] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 987 2023-01-11T21:38:06.8310347Z [2023-01-11 21:36:23,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 987 2023-01-11T21:38:06.8310794Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8310925Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8311177Z [2023-01-11 21:36:23,148] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 988 2023-01-11T21:38:06.8311183Z 2023-01-11T21:38:06.8311281Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8311348Z import torch 2023-01-11T21:38:06.8311422Z import random 2023-01-11T21:38:06.8311541Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8311665Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8311674Z 2023-01-11T21:38:06.8311757Z aten = torch.ops.aten 2023-01-11T21:38:06.8311894Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8311989Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8311994Z 2023-01-11T21:38:06.8312061Z import triton 2023-01-11T21:38:06.8312156Z import triton.language as tl 2023-01-11T21:38:06.8312283Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8312423Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8312428Z 2023-01-11T21:38:06.8312432Z 2023-01-11T21:38:06.8312587Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8312662Z import triton 2023-01-11T21:38:06.8312754Z import triton.language as tl 2023-01-11T21:38:06.8312869Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8312963Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8313098Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8313225Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8313230Z 2023-01-11T21:38:06.8313680Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8313758Z @triton.jit 2023-01-11T21:38:06.8313899Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8313973Z xnumel = 5040 2023-01-11T21:38:06.8314072Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8314194Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8314277Z xmask = xindex < xnumel 2023-01-11T21:38:06.8314348Z x0 = xindex 2023-01-11T21:38:06.8314446Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8314544Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8314624Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8314759Z 
tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8314837Z ''') 2023-01-11T21:38:06.8314843Z 2023-01-11T21:38:06.8314854Z 2023-01-11T21:38:06.8314939Z async_compile.wait(globals()) 2023-01-11T21:38:06.8315019Z del async_compile 2023-01-11T21:38:06.8315027Z 2023-01-11T21:38:06.8315101Z def call(args): 2023-01-11T21:38:06.8315180Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8315255Z args.clear() 2023-01-11T21:38:06.8315349Z with torch.cuda.device(0): 2023-01-11T21:38:06.8315570Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8315657Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8315801Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8315877Z del arg0_1 2023-01-11T21:38:06.8315949Z del arg1_1 2023-01-11T21:38:06.8316027Z return (buf0, ) 2023-01-11T21:38:06.8316059Z 2023-01-11T21:38:06.8316064Z 2023-01-11T21:38:06.8316145Z if __name__ == "__main__": 2023-01-11T21:38:06.8316264Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8316382Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8316621Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8316848Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8316972Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8316977Z 2023-01-11T21:38:06.8316982Z 2023-01-11T21:38:06.8317079Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8317153Z import torch 2023-01-11T21:38:06.8317228Z import random 2023-01-11T21:38:06.8317347Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8317463Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8317479Z 2023-01-11T21:38:06.8317554Z aten = torch.ops.aten 2023-01-11T21:38:06.8317691Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8317786Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8317791Z 2023-01-11T21:38:06.8317863Z import triton 2023-01-11T21:38:06.8317955Z import triton.language as tl 2023-01-11T21:38:06.8318084Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8318222Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8318228Z 2023-01-11T21:38:06.8318232Z 2023-01-11T21:38:06.8318385Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8318453Z import triton 2023-01-11T21:38:06.8318544Z import triton.language as tl 2023-01-11T21:38:06.8318661Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8318763Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8318894Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8319023Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8319028Z 2023-01-11T21:38:06.8319484Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8319561Z @triton.jit 2023-01-11T21:38:06.8319695Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8319770Z xnumel = 5040 2023-01-11T21:38:06.8319867Z xoffset = tl.program_id(0) * XBLOCK 
2023-01-11T21:38:06.8319998Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8320081Z xmask = xindex < xnumel 2023-01-11T21:38:06.8320153Z x0 = xindex 2023-01-11T21:38:06.8320263Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8320380Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8320463Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8320598Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8320683Z ''') 2023-01-11T21:38:06.8320689Z 2023-01-11T21:38:06.8320693Z 2023-01-11T21:38:06.8320786Z async_compile.wait(globals()) 2023-01-11T21:38:06.8320863Z del async_compile 2023-01-11T21:38:06.8320871Z 2023-01-11T21:38:06.8320947Z def call(args): 2023-01-11T21:38:06.8321019Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8321094Z args.clear() 2023-01-11T21:38:06.8321188Z with torch.cuda.device(0): 2023-01-11T21:38:06.8321409Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8321502Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8321650Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8321723Z del arg0_1 2023-01-11T21:38:06.8321789Z del arg1_1 2023-01-11T21:38:06.8321872Z return (buf0, ) 2023-01-11T21:38:06.8321906Z 2023-01-11T21:38:06.8321911Z 2023-01-11T21:38:06.8321992Z if __name__ == "__main__": 2023-01-11T21:38:06.8322110Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8322236Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8322477Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8322696Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8322815Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8322820Z 2023-01-11T21:38:06.8322824Z 2023-01-11T21:38:06.8322921Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8322988Z import torch 2023-01-11T21:38:06.8323066Z import random 2023-01-11T21:38:06.8323182Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8323306Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8323314Z 2023-01-11T21:38:06.8323397Z aten = torch.ops.aten 2023-01-11T21:38:06.8323532Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8323627Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8323633Z 2023-01-11T21:38:06.8323699Z import triton 2023-01-11T21:38:06.8323791Z import triton.language as tl 2023-01-11T21:38:06.8323919Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8324060Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8324065Z 2023-01-11T21:38:06.8324069Z 2023-01-11T21:38:06.8324226Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8324301Z import triton 2023-01-11T21:38:06.8324394Z import triton.language as tl 2023-01-11T21:38:06.8324507Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8324602Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8324735Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8324861Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8324866Z 2023-01-11T21:38:06.8325314Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 
'*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8325396Z @triton.jit 2023-01-11T21:38:06.8325548Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8325637Z xnumel = 5040 2023-01-11T21:38:06.8325750Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8325882Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8325967Z xmask = xindex < xnumel 2023-01-11T21:38:06.8326038Z x0 = xindex 2023-01-11T21:38:06.8326134Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8326233Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8326305Z tmp1 = 1 2023-01-11T21:38:06.8326388Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8326459Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8326595Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8326679Z ''') 2023-01-11T21:38:06.8326685Z 2023-01-11T21:38:06.8326689Z 2023-01-11T21:38:06.8326782Z async_compile.wait(globals()) 2023-01-11T21:38:06.8326866Z del async_compile 2023-01-11T21:38:06.8326871Z 2023-01-11T21:38:06.8326946Z def call(args): 2023-01-11T21:38:06.8327026Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8327094Z args.clear() 2023-01-11T21:38:06.8327190Z with torch.cuda.device(0): 2023-01-11T21:38:06.8327410Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8327503Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8327648Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8327721Z del arg0_1 2023-01-11T21:38:06.8327824Z del arg1_1 2023-01-11T21:38:06.8327902Z return (buf0, ) 2023-01-11T21:38:06.8327907Z 2023-01-11T21:38:06.8327912Z 2023-01-11T21:38:06.8327985Z if __name__ == "__main__": 2023-01-11T21:38:06.8328103Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8328230Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8328469Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8328689Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8328809Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8328814Z 2023-01-11T21:38:06.8329077Z [2023-01-11 21:36:23,159] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 988 2023-01-11T21:38:06.8329493Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8329627Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8329888Z [2023-01-11 21:36:23,175] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 989 2023-01-11T21:38:06.8330144Z [2023-01-11 21:36:23,186] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 989 2023-01-11T21:38:06.8330556Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8330692Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8330946Z [2023-01-11 21:36:23,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 990 2023-01-11T21:38:06.8331233Z [2023-01-11 21:36:23,212] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 990 2023-01-11T21:38:06.8331650Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8331783Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8332035Z [2023-01-11 21:36:23,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 991 2023-01-11T21:38:06.8332043Z 2023-01-11T21:38:06.8332141Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8332216Z import torch 2023-01-11T21:38:06.8332283Z import random 2023-01-11T21:38:06.8332401Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8332525Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8332530Z 2023-01-11T21:38:06.8332614Z aten = torch.ops.aten 2023-01-11T21:38:06.8332753Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8332848Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8332853Z 2023-01-11T21:38:06.8332931Z import triton 2023-01-11T21:38:06.8333022Z import triton.language as tl 2023-01-11T21:38:06.8333140Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8333281Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8333287Z 2023-01-11T21:38:06.8333291Z 2023-01-11T21:38:06.8333450Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8333525Z import triton 2023-01-11T21:38:06.8333672Z import triton.language as tl 2023-01-11T21:38:06.8333788Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8333890Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8334016Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8334143Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8334149Z 2023-01-11T21:38:06.8334837Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8334912Z @triton.jit 2023-01-11T21:38:06.8335054Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8335130Z xnumel = 5040 2023-01-11T21:38:06.8335227Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8335356Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8335444Z xmask = xindex < xnumel 2023-01-11T21:38:06.8335508Z x0 = xindex 2023-01-11T21:38:06.8335626Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8335742Z tmp3 = 
tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8335812Z tmp1 = 1 2023-01-11T21:38:06.8335898Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8335978Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8336105Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8336191Z ''') 2023-01-11T21:38:06.8336196Z 2023-01-11T21:38:06.8336201Z 2023-01-11T21:38:06.8336297Z async_compile.wait(globals()) 2023-01-11T21:38:06.8336372Z del async_compile 2023-01-11T21:38:06.8336377Z 2023-01-11T21:38:06.8336453Z def call(args): 2023-01-11T21:38:06.8336531Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8336607Z args.clear() 2023-01-11T21:38:06.8336700Z with torch.cuda.device(0): 2023-01-11T21:38:06.8336915Z buf0 = empty_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8337013Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8337225Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 5040, grid=grid(5040), stream=stream0) 2023-01-11T21:38:06.8337323Z del arg0_1 2023-01-11T21:38:06.8337490Z del arg1_1 2023-01-11T21:38:06.8337570Z return (buf0, ) 2023-01-11T21:38:06.8337575Z 2023-01-11T21:38:06.8337580Z 2023-01-11T21:38:06.8337661Z if __name__ == "__main__": 2023-01-11T21:38:06.8337778Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8337897Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8338135Z arg0_1 = rand_strided((2, 3, 4, 5, 6, 7), (2520, 840, 210, 42, 7, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8338357Z arg1_1 = rand_strided((6, 4, 5, 42), (840, 210, 42, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8338477Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8338485Z 2023-01-11T21:38:06.8338489Z 2023-01-11T21:38:06.8338588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8338663Z import torch 2023-01-11T21:38:06.8338736Z import random 2023-01-11T21:38:06.8338853Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8338972Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8338977Z 2023-01-11T21:38:06.8339061Z aten = torch.ops.aten 2023-01-11T21:38:06.8339199Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8339294Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8339299Z 2023-01-11T21:38:06.8339373Z import triton 2023-01-11T21:38:06.8339466Z import triton.language as tl 2023-01-11T21:38:06.8339590Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8339721Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8339733Z 2023-01-11T21:38:06.8339738Z 2023-01-11T21:38:06.8339941Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8340018Z import triton 2023-01-11T21:38:06.8340115Z import triton.language as tl 2023-01-11T21:38:06.8340230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8340335Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8340473Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8348593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8348603Z 2023-01-11T21:38:06.8349084Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8349161Z @triton.jit 
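# Annotation (assumes inductor/Triton behavior of this vintage): @pointwise, imported
# above from torch._inductor.triton_ops.autotune, wraps the @triton.jit kernel below
# and autotunes the XBLOCK launch parameter; divisible_by_16 in instance_descriptor
# marks the arguments assumed divisible by 16, which presumably lets Triton
# specialize for aligned, vectorizable memory access.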
2023-01-11T21:38:06.8349307Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8349376Z xnumel = 1000 2023-01-11T21:38:06.8349475Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8349611Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8349691Z xmask = xindex < xnumel 2023-01-11T21:38:06.8349757Z x0 = xindex 2023-01-11T21:38:06.8349856Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8349953Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8350036Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8350172Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8350259Z ''') 2023-01-11T21:38:06.8350265Z 2023-01-11T21:38:06.8350269Z 2023-01-11T21:38:06.8350364Z async_compile.wait(globals()) 2023-01-11T21:38:06.8350441Z del async_compile 2023-01-11T21:38:06.8350446Z 2023-01-11T21:38:06.8350514Z def call(args): 2023-01-11T21:38:06.8350595Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8350670Z args.clear() 2023-01-11T21:38:06.8350763Z with torch.cuda.device(0): 2023-01-11T21:38:06.8350970Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8351064Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8351209Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8351276Z del arg0_1 2023-01-11T21:38:06.8351347Z del arg1_1 2023-01-11T21:38:06.8351483Z return (buf0, ) 2023-01-11T21:38:06.8351489Z 2023-01-11T21:38:06.8351493Z 2023-01-11T21:38:06.8351576Z if __name__ == "__main__": 2023-01-11T21:38:06.8351698Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8351827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8352041Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8352244Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8352357Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8352362Z 2023-01-11T21:38:06.8352369Z 2023-01-11T21:38:06.8352467Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8352540Z import torch 2023-01-11T21:38:06.8352614Z import random 2023-01-11T21:38:06.8352733Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8352857Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8352862Z 2023-01-11T21:38:06.8352945Z aten = torch.ops.aten 2023-01-11T21:38:06.8353077Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8353174Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8353179Z 2023-01-11T21:38:06.8353253Z import triton 2023-01-11T21:38:06.8353347Z import triton.language as tl 2023-01-11T21:38:06.8353473Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8353612Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8353618Z 2023-01-11T21:38:06.8353622Z 2023-01-11T21:38:06.8353773Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8353847Z import triton 2023-01-11T21:38:06.8353967Z import triton.language as tl 2023-01-11T21:38:06.8354081Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8354185Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8354319Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8354448Z from torch._inductor.utils import instance_descriptor 
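# Annotation: this prologue recurs in every async_compile.triton('''...''') payload
# in this log; the ReductionHint/TileHint/pointwise/instance_descriptor imports
# appear to exist only so the @pointwise decorator below can be evaluated when the
# source string is compiled.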
2023-01-11T21:38:06.8354453Z 2023-01-11T21:38:06.8354874Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8354949Z @triton.jit 2023-01-11T21:38:06.8355091Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8355162Z xnumel = 1000 2023-01-11T21:38:06.8355259Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8355388Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8355475Z xmask = xindex < xnumel 2023-01-11T21:38:06.8355546Z x0 = xindex 2023-01-11T21:38:06.8355666Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8355783Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8355855Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8355995Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8356082Z ''') 2023-01-11T21:38:06.8356088Z 2023-01-11T21:38:06.8356092Z 2023-01-11T21:38:06.8356185Z async_compile.wait(globals()) 2023-01-11T21:38:06.8356261Z del async_compile 2023-01-11T21:38:06.8356266Z 2023-01-11T21:38:06.8356341Z def call(args): 2023-01-11T21:38:06.8356421Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8356495Z args.clear() 2023-01-11T21:38:06.8356581Z with torch.cuda.device(0): 2023-01-11T21:38:06.8356783Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8356879Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8357022Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8357096Z del arg0_1 2023-01-11T21:38:06.8357168Z del arg1_1 2023-01-11T21:38:06.8357245Z return (buf0, ) 2023-01-11T21:38:06.8357250Z 2023-01-11T21:38:06.8357287Z 2023-01-11T21:38:06.8357369Z if __name__ == "__main__": 2023-01-11T21:38:06.8357480Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8357607Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8357820Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8358021Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8358141Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8358146Z 2023-01-11T21:38:06.8358410Z [2023-01-11 21:36:23,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 991 2023-01-11T21:38:06.8358828Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8358963Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8359218Z [2023-01-11 21:36:23,263] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 992 2023-01-11T21:38:06.8359474Z [2023-01-11 21:36:23,275] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 992 2023-01-11T21:38:06.8359885Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8360045Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8360303Z [2023-01-11 21:36:23,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 993 2023-01-11T21:38:06.8360563Z [2023-01-11 21:36:23,301] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 993 2023-01-11T21:38:06.8360975Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8361105Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8361363Z [2023-01-11 21:36:23,317] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 994 2023-01-11T21:38:06.8361368Z 2023-01-11T21:38:06.8361466Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8361545Z import torch 2023-01-11T21:38:06.8361620Z import random 2023-01-11T21:38:06.8361737Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8361861Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8361867Z 2023-01-11T21:38:06.8361948Z aten = torch.ops.aten 2023-01-11T21:38:06.8362086Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8362184Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8362190Z 2023-01-11T21:38:06.8362265Z import triton 2023-01-11T21:38:06.8362360Z import triton.language as tl 2023-01-11T21:38:06.8362479Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8362619Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8362626Z 2023-01-11T21:38:06.8362631Z 2023-01-11T21:38:06.8362790Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8362865Z import triton 2023-01-11T21:38:06.8362957Z import triton.language as tl 2023-01-11T21:38:06.8363071Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8363198Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8363334Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8363453Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8363458Z 2023-01-11T21:38:06.8363893Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8363967Z @triton.jit 2023-01-11T21:38:06.8364109Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8364188Z xnumel = 1000 2023-01-11T21:38:06.8364286Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8364415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8364503Z xmask = xindex < xnumel 2023-01-11T21:38:06.8364567Z x0 = xindex 2023-01-11T21:38:06.8364666Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8364764Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 
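    # Annotation: fused "x + 1 + y" -- the scalar 1 is materialized as tmp1 and both
    # additions run inside this one kernel, so the intermediate (x + 1) is never
    # written back to memory.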
2023-01-11T21:38:06.8364837Z tmp1 = 1 2023-01-11T21:38:06.8364915Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8364992Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8365128Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8365206Z ''') 2023-01-11T21:38:06.8365212Z 2023-01-11T21:38:06.8365216Z 2023-01-11T21:38:06.8365309Z async_compile.wait(globals()) 2023-01-11T21:38:06.8365385Z del async_compile 2023-01-11T21:38:06.8365390Z 2023-01-11T21:38:06.8365465Z def call(args): 2023-01-11T21:38:06.8365543Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8365644Z args.clear() 2023-01-11T21:38:06.8365736Z with torch.cuda.device(0): 2023-01-11T21:38:06.8365934Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8366026Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8366173Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8366249Z del arg0_1 2023-01-11T21:38:06.8366321Z del arg1_1 2023-01-11T21:38:06.8366400Z return (buf0, ) 2023-01-11T21:38:06.8366405Z 2023-01-11T21:38:06.8366409Z 2023-01-11T21:38:06.8366490Z if __name__ == "__main__": 2023-01-11T21:38:06.8366608Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8366727Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8366938Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8367139Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8367261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8367266Z 2023-01-11T21:38:06.8367270Z 2023-01-11T21:38:06.8367366Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8367442Z import torch 2023-01-11T21:38:06.8367517Z import random 2023-01-11T21:38:06.8367637Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8367754Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8367759Z 2023-01-11T21:38:06.8367841Z aten = torch.ops.aten 2023-01-11T21:38:06.8367977Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8368072Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8368077Z 2023-01-11T21:38:06.8368153Z import triton 2023-01-11T21:38:06.8368245Z import triton.language as tl 2023-01-11T21:38:06.8368369Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8368500Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8368518Z 2023-01-11T21:38:06.8368522Z 2023-01-11T21:38:06.8368672Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8368747Z import triton 2023-01-11T21:38:06.8368839Z import triton.language as tl 2023-01-11T21:38:06.8368953Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8369083Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8369218Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8369341Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8369346Z 2023-01-11T21:38:06.8369765Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8369832Z @triton.jit 2023-01-11T21:38:06.8369974Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
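    # Annotation: canonical 1D pointwise indexing -- each program instance covers
    # XBLOCK consecutive flat indices, and xmask disables the lanes past xnumel in
    # the final block.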
2023-01-11T21:38:06.8370050Z xnumel = 1000 2023-01-11T21:38:06.8370149Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8370278Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8370362Z xmask = xindex < xnumel 2023-01-11T21:38:06.8370433Z x0 = xindex 2023-01-11T21:38:06.8370544Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8370660Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8370731Z tmp1 = 1 2023-01-11T21:38:06.8370809Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8370886Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8371024Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8371109Z ''') 2023-01-11T21:38:06.8371115Z 2023-01-11T21:38:06.8371119Z 2023-01-11T21:38:06.8371205Z async_compile.wait(globals()) 2023-01-11T21:38:06.8371285Z del async_compile 2023-01-11T21:38:06.8371290Z 2023-01-11T21:38:06.8371368Z def call(args): 2023-01-11T21:38:06.8371501Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8371575Z args.clear() 2023-01-11T21:38:06.8371668Z with torch.cuda.device(0): 2023-01-11T21:38:06.8371872Z buf0 = empty_strided((50, 20), (20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8371964Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8372106Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8372179Z del arg0_1 2023-01-11T21:38:06.8372253Z del arg1_1 2023-01-11T21:38:06.8372330Z return (buf0, ) 2023-01-11T21:38:06.8372335Z 2023-01-11T21:38:06.8372340Z 2023-01-11T21:38:06.8372419Z if __name__ == "__main__": 2023-01-11T21:38:06.8372537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8372662Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8372867Z arg0_1 = rand_strided((10, 5, 20), (100, 20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8373072Z arg1_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8373192Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8373198Z 2023-01-11T21:38:06.8373202Z 2023-01-11T21:38:06.8373299Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8373374Z import torch 2023-01-11T21:38:06.8373452Z import random 2023-01-11T21:38:06.8373570Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8373694Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8373700Z 2023-01-11T21:38:06.8373775Z aten = torch.ops.aten 2023-01-11T21:38:06.8373910Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8374005Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8374010Z 2023-01-11T21:38:06.8374089Z import triton 2023-01-11T21:38:06.8374180Z import triton.language as tl 2023-01-11T21:38:06.8374306Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8374447Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8374452Z 2023-01-11T21:38:06.8374457Z 2023-01-11T21:38:06.8374947Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8375016Z import triton 2023-01-11T21:38:06.8375108Z import triton.language as tl 2023-01-11T21:38:06.8375289Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8375392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8375527Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8375650Z from torch._inductor.utils import instance_descriptor 
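# Annotation: size_hints=[16] below appears to be the element count (10) rounded up
# to the next power of two; compare size_hints=[1024] for the 1000-element kernels
# and size_hints=[8192] for the 5040-element kernels earlier in this log.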
2023-01-11T21:38:06.8375656Z 2023-01-11T21:38:06.8376074Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8376148Z @triton.jit 2023-01-11T21:38:06.8376281Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8376357Z xnumel = 10 2023-01-11T21:38:06.8376454Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8376583Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8376665Z xmask = xindex < xnumel 2023-01-11T21:38:06.8376735Z x0 = xindex 2023-01-11T21:38:06.8376834Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8376924Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8377004Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8377195Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8377301Z ''') 2023-01-11T21:38:06.8377307Z 2023-01-11T21:38:06.8377311Z 2023-01-11T21:38:06.8377405Z async_compile.wait(globals()) 2023-01-11T21:38:06.8377481Z del async_compile 2023-01-11T21:38:06.8377486Z 2023-01-11T21:38:06.8377560Z def call(args): 2023-01-11T21:38:06.8377640Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8377708Z args.clear() 2023-01-11T21:38:06.8377847Z with torch.cuda.device(0): 2023-01-11T21:38:06.8378059Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8378153Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8378300Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8378375Z del arg0_1 2023-01-11T21:38:06.8378451Z del arg1_1 2023-01-11T21:38:06.8378524Z return (buf0, ) 2023-01-11T21:38:06.8378529Z 2023-01-11T21:38:06.8378533Z 2023-01-11T21:38:06.8378615Z if __name__ == "__main__": 2023-01-11T21:38:06.8378734Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8378862Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8379064Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8379272Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8379397Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8379402Z 2023-01-11T21:38:06.8379670Z [2023-01-11 21:36:23,326] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 994 2023-01-11T21:38:06.8380094Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8380220Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8380478Z [2023-01-11 21:36:23,345] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 995 2023-01-11T21:38:06.8380743Z [2023-01-11 21:36:23,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 995 2023-01-11T21:38:06.8381185Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8381324Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8381579Z [2023-01-11 21:36:23,374] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 996 2023-01-11T21:38:06.8381842Z [2023-01-11 21:36:23,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 996 2023-01-11T21:38:06.8382255Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8382390Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8382651Z [2023-01-11 21:36:23,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 997 2023-01-11T21:38:06.8382657Z 2023-01-11T21:38:06.8382759Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8382829Z import torch 2023-01-11T21:38:06.8382905Z import random 2023-01-11T21:38:06.8383026Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8383152Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8383157Z 2023-01-11T21:38:06.8383240Z aten = torch.ops.aten 2023-01-11T21:38:06.8383380Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8383478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8383483Z 2023-01-11T21:38:06.8383558Z import triton 2023-01-11T21:38:06.8383686Z import triton.language as tl 2023-01-11T21:38:06.8383813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8383954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8383959Z 2023-01-11T21:38:06.8383964Z 2023-01-11T21:38:06.8384123Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8384203Z import triton 2023-01-11T21:38:06.8384298Z import triton.language as tl 2023-01-11T21:38:06.8384413Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8384510Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8384647Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8384773Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8384779Z 2023-01-11T21:38:06.8385196Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8385278Z @triton.jit 2023-01-11T21:38:06.8385421Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8385500Z xnumel = 10 2023-01-11T21:38:06.8385601Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8385728Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8385815Z xmask = xindex < xnumel 2023-01-11T21:38:06.8385887Z x0 = xindex 2023-01-11T21:38:06.8386007Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8386125Z tmp1 = tl.load(in_ptr1 + 
(x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8386206Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8386344Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8386434Z ''') 2023-01-11T21:38:06.8386439Z 2023-01-11T21:38:06.8386443Z 2023-01-11T21:38:06.8386531Z async_compile.wait(globals()) 2023-01-11T21:38:06.8386613Z del async_compile 2023-01-11T21:38:06.8386618Z 2023-01-11T21:38:06.8386695Z def call(args): 2023-01-11T21:38:06.8386776Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8386853Z args.clear() 2023-01-11T21:38:06.8386948Z with torch.cuda.device(0): 2023-01-11T21:38:06.8387194Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8387283Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8387429Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8387505Z del arg0_1 2023-01-11T21:38:06.8387579Z del arg1_1 2023-01-11T21:38:06.8387659Z return (buf0, ) 2023-01-11T21:38:06.8387665Z 2023-01-11T21:38:06.8387669Z 2023-01-11T21:38:06.8387750Z if __name__ == "__main__": 2023-01-11T21:38:06.8387870Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8387999Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8388194Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8388407Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8388533Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8388539Z 2023-01-11T21:38:06.8388543Z 2023-01-11T21:38:06.8388645Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8388721Z import torch 2023-01-11T21:38:06.8388797Z import random 2023-01-11T21:38:06.8388917Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8389042Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8389047Z 2023-01-11T21:38:06.8389125Z aten = torch.ops.aten 2023-01-11T21:38:06.8389264Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8389360Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8389366Z 2023-01-11T21:38:06.8389443Z import triton 2023-01-11T21:38:06.8389536Z import triton.language as tl 2023-01-11T21:38:06.8389690Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8389831Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8389837Z 2023-01-11T21:38:06.8389841Z 2023-01-11T21:38:06.8389999Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8390070Z import triton 2023-01-11T21:38:06.8390172Z import triton.language as tl 2023-01-11T21:38:06.8390288Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8390392Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8390533Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8390660Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8390665Z 2023-01-11T21:38:06.8391081Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8391162Z @triton.jit 2023-01-11T21:38:06.8391298Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8391378Z xnumel = 10 2023-01-11T21:38:06.8391479Z xoffset = 
tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8391610Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8391698Z xmask = xindex < xnumel 2023-01-11T21:38:06.8391772Z x0 = xindex 2023-01-11T21:38:06.8391871Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8391963Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8392037Z tmp1 = 1 2023-01-11T21:38:06.8392118Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8392198Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8392334Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8392423Z ''') 2023-01-11T21:38:06.8392428Z 2023-01-11T21:38:06.8392433Z 2023-01-11T21:38:06.8392529Z async_compile.wait(globals()) 2023-01-11T21:38:06.8392601Z del async_compile 2023-01-11T21:38:06.8392608Z 2023-01-11T21:38:06.8392685Z def call(args): 2023-01-11T21:38:06.8392766Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8392843Z args.clear() 2023-01-11T21:38:06.8392936Z with torch.cuda.device(0): 2023-01-11T21:38:06.8393176Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8393272Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8393411Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8393486Z del arg0_1 2023-01-11T21:38:06.8393560Z del arg1_1 2023-01-11T21:38:06.8393640Z return (buf0, ) 2023-01-11T21:38:06.8393645Z 2023-01-11T21:38:06.8393649Z 2023-01-11T21:38:06.8393731Z if __name__ == "__main__": 2023-01-11T21:38:06.8393850Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8393979Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8394181Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8394385Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8394506Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8394511Z 2023-01-11T21:38:06.8394516Z 2023-01-11T21:38:06.8394618Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8394694Z import torch 2023-01-11T21:38:06.8394772Z import random 2023-01-11T21:38:06.8394896Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8395021Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8395026Z 2023-01-11T21:38:06.8395112Z aten = torch.ops.aten 2023-01-11T21:38:06.8395244Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8395359Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8395365Z 2023-01-11T21:38:06.8395450Z import triton 2023-01-11T21:38:06.8395562Z import triton.language as tl 2023-01-11T21:38:06.8395723Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8395864Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8395869Z 2023-01-11T21:38:06.8395874Z 2023-01-11T21:38:06.8396034Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8396110Z import triton 2023-01-11T21:38:06.8396200Z import triton.language as tl 2023-01-11T21:38:06.8396316Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8396421Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8396555Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8396682Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8396687Z 2023-01-11T21:38:06.8397104Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': 
{0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8397186Z @triton.jit 2023-01-11T21:38:06.8397330Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8397400Z xnumel = 10 2023-01-11T21:38:06.8397500Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8397632Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8397726Z xmask = xindex < xnumel 2023-01-11T21:38:06.8397800Z x0 = xindex 2023-01-11T21:38:06.8397918Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8398040Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8398107Z tmp1 = 1 2023-01-11T21:38:06.8398187Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8398269Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8398406Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8398493Z ''') 2023-01-11T21:38:06.8398499Z 2023-01-11T21:38:06.8398503Z 2023-01-11T21:38:06.8398599Z async_compile.wait(globals()) 2023-01-11T21:38:06.8398683Z del async_compile 2023-01-11T21:38:06.8398688Z 2023-01-11T21:38:06.8398758Z def call(args): 2023-01-11T21:38:06.8398839Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8398916Z args.clear() 2023-01-11T21:38:06.8399009Z with torch.cuda.device(0): 2023-01-11T21:38:06.8399251Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8399346Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8399490Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.8399568Z del arg0_1 2023-01-11T21:38:06.8399636Z del arg1_1 2023-01-11T21:38:06.8399719Z return (buf0, ) 2023-01-11T21:38:06.8399724Z 2023-01-11T21:38:06.8399728Z 2023-01-11T21:38:06.8399810Z if __name__ == "__main__": 2023-01-11T21:38:06.8399930Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8400059Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8400260Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8400467Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8400580Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8400592Z 2023-01-11T21:38:06.8400854Z [2023-01-11 21:36:23,413] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 997 2023-01-11T21:38:06.8401277Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8401415Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8401674Z [2023-01-11 21:36:23,429] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 998 2023-01-11T21:38:06.8401967Z [2023-01-11 21:36:23,440] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 998 2023-01-11T21:38:06.8402384Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8402518Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8402774Z [2023-01-11 21:36:23,460] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 999 2023-01-11T21:38:06.8403035Z [2023-01-11 21:36:23,472] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 999 2023-01-11T21:38:06.8403446Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8403587Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8403847Z [2023-01-11 21:36:23,491] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1000 2023-01-11T21:38:06.8403853Z 2023-01-11T21:38:06.8403946Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8404025Z import torch 2023-01-11T21:38:06.8404102Z import random 2023-01-11T21:38:06.8404223Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8404348Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8404354Z 2023-01-11T21:38:06.8404438Z aten = torch.ops.aten 2023-01-11T21:38:06.8404581Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8404672Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8404685Z 2023-01-11T21:38:06.8404755Z import triton 2023-01-11T21:38:06.8404851Z import triton.language as tl 2023-01-11T21:38:06.8404979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8405146Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8405152Z 2023-01-11T21:38:06.8405156Z 2023-01-11T21:38:06.8405314Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8405391Z import triton 2023-01-11T21:38:06.8405488Z import triton.language as tl 2023-01-11T21:38:06.8405597Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8405702Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8405862Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8406004Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8406014Z 2023-01-11T21:38:06.8406442Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8406523Z @triton.jit 2023-01-11T21:38:06.8406671Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8406749Z xnumel = 1000 2023-01-11T21:38:06.8406842Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8406974Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8407059Z xmask = xindex < xnumel 2023-01-11T21:38:06.8407132Z x0 = xindex 2023-01-11T21:38:06.8407230Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8407330Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 
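    # Annotation: both loads use the same flat offset x0 -- the (10, 100) input and
    # the (10, 1, 10, 1, 10) input are different contiguous views over the same
    # 1000 elements, so the add collapses to a single flat 1D iteration space.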
2023-01-11T21:38:06.8407412Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8407542Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8407658Z ''') 2023-01-11T21:38:06.8407665Z 2023-01-11T21:38:06.8407670Z 2023-01-11T21:38:06.8407764Z async_compile.wait(globals()) 2023-01-11T21:38:06.8407845Z del async_compile 2023-01-11T21:38:06.8407850Z 2023-01-11T21:38:06.8407929Z def call(args): 2023-01-11T21:38:06.8408009Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8408090Z args.clear() 2023-01-11T21:38:06.8408185Z with torch.cuda.device(0): 2023-01-11T21:38:06.8408412Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8408506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8408653Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8408727Z del arg0_1 2023-01-11T21:38:06.8408801Z del arg1_1 2023-01-11T21:38:06.8408884Z return (buf0, ) 2023-01-11T21:38:06.8408889Z 2023-01-11T21:38:06.8408894Z 2023-01-11T21:38:06.8408975Z if __name__ == "__main__": 2023-01-11T21:38:06.8409091Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8409218Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8409428Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8409665Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8409786Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8409792Z 2023-01-11T21:38:06.8409796Z 2023-01-11T21:38:06.8409896Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8409973Z import torch 2023-01-11T21:38:06.8410052Z import random 2023-01-11T21:38:06.8410166Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8410291Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8410296Z 2023-01-11T21:38:06.8410380Z aten = torch.ops.aten 2023-01-11T21:38:06.8410519Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8410621Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8410626Z 2023-01-11T21:38:06.8410703Z import triton 2023-01-11T21:38:06.8410798Z import triton.language as tl 2023-01-11T21:38:06.8410925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8411083Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8411089Z 2023-01-11T21:38:06.8411101Z 2023-01-11T21:38:06.8411251Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8411328Z import triton 2023-01-11T21:38:06.8411422Z import triton.language as tl 2023-01-11T21:38:06.8411538Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8411643Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8411778Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8411904Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8411909Z 2023-01-11T21:38:06.8412320Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8412400Z @triton.jit 2023-01-11T21:38:06.8412547Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8412625Z xnumel = 1000 
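    # Annotation: the shape is baked in -- xnumel is reassigned to the literal 1000
    # even though it is also a runtime argument (the .run() call visibly passes
    # 1000), which presumably specializes the compiled kernel to this exact size.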
2023-01-11T21:38:06.8412728Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8412860Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8412946Z xmask = xindex < xnumel 2023-01-11T21:38:06.8413021Z x0 = xindex 2023-01-11T21:38:06.8413133Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8413251Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8413332Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8413469Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8413583Z ''') 2023-01-11T21:38:06.8413588Z 2023-01-11T21:38:06.8413593Z 2023-01-11T21:38:06.8413689Z async_compile.wait(globals()) 2023-01-11T21:38:06.8413769Z del async_compile 2023-01-11T21:38:06.8413774Z 2023-01-11T21:38:06.8413845Z def call(args): 2023-01-11T21:38:06.8413928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8414005Z args.clear() 2023-01-11T21:38:06.8414102Z with torch.cuda.device(0): 2023-01-11T21:38:06.8414338Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8414434Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8414780Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8414849Z del arg0_1 2023-01-11T21:38:06.8414922Z del arg1_1 2023-01-11T21:38:06.8415000Z return (buf0, ) 2023-01-11T21:38:06.8415005Z 2023-01-11T21:38:06.8415009Z 2023-01-11T21:38:06.8415092Z if __name__ == "__main__": 2023-01-11T21:38:06.8415214Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8415341Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8415576Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8415867Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8415989Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8416001Z 2023-01-11T21:38:06.8416005Z 2023-01-11T21:38:06.8416101Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8416175Z import torch 2023-01-11T21:38:06.8416252Z import random 2023-01-11T21:38:06.8416380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8416512Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8416517Z 2023-01-11T21:38:06.8416601Z aten = torch.ops.aten 2023-01-11T21:38:06.8416750Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8416845Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8416850Z 2023-01-11T21:38:06.8416927Z import triton 2023-01-11T21:38:06.8417025Z import triton.language as tl 2023-01-11T21:38:06.8417220Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8417434Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8417440Z 2023-01-11T21:38:06.8417444Z 2023-01-11T21:38:06.8417606Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8417681Z import triton 2023-01-11T21:38:06.8417774Z import triton.language as tl 2023-01-11T21:38:06.8417881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8417983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8418116Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8418240Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8418245Z 2023-01-11T21:38:06.8418661Z @pointwise(size_hints=[1024], 
filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8418740Z @triton.jit 2023-01-11T21:38:06.8418881Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8418959Z xnumel = 1000 2023-01-11T21:38:06.8419049Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8419177Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8419263Z xmask = xindex < xnumel 2023-01-11T21:38:06.8419336Z x0 = xindex 2023-01-11T21:38:06.8419437Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8419534Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8419605Z tmp1 = 1 2023-01-11T21:38:06.8419677Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8419754Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8419889Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8420012Z ''') 2023-01-11T21:38:06.8420017Z 2023-01-11T21:38:06.8420022Z 2023-01-11T21:38:06.8420118Z async_compile.wait(globals()) 2023-01-11T21:38:06.8420199Z del async_compile 2023-01-11T21:38:06.8420204Z 2023-01-11T21:38:06.8420281Z def call(args): 2023-01-11T21:38:06.8420356Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8420438Z args.clear() 2023-01-11T21:38:06.8420531Z with torch.cuda.device(0): 2023-01-11T21:38:06.8420766Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8420862Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8421011Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8421089Z del arg0_1 2023-01-11T21:38:06.8421165Z del arg1_1 2023-01-11T21:38:06.8421238Z return (buf0, ) 2023-01-11T21:38:06.8421243Z 2023-01-11T21:38:06.8421247Z 2023-01-11T21:38:06.8421335Z if __name__ == "__main__": 2023-01-11T21:38:06.8421459Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8421587Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8421796Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8422032Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8422156Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8422161Z 2023-01-11T21:38:06.8422430Z [2023-01-11 21:36:23,503] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1000 2023-01-11T21:38:06.8422848Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8422979Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8423237Z [2023-01-11 21:36:23,519] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1001 2023-01-11T21:38:06.8423531Z [2023-01-11 21:36:23,531] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1001 2023-01-11T21:38:06.8423948Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8424085Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8424344Z [2023-01-11 21:36:23,547] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1002 2023-01-11T21:38:06.8424610Z [2023-01-11 21:36:23,558] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1002 2023-01-11T21:38:06.8425025Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8425159Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8425419Z [2023-01-11 21:36:23,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1003 2023-01-11T21:38:06.8425425Z 2023-01-11T21:38:06.8425528Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8425617Z import torch 2023-01-11T21:38:06.8425700Z import random 2023-01-11T21:38:06.8425882Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8426009Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8426015Z 2023-01-11T21:38:06.8426101Z aten = torch.ops.aten 2023-01-11T21:38:06.8426240Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8426342Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8426347Z 2023-01-11T21:38:06.8426426Z import triton 2023-01-11T21:38:06.8426514Z import triton.language as tl 2023-01-11T21:38:06.8426644Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8426786Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8426792Z 2023-01-11T21:38:06.8426796Z 2023-01-11T21:38:06.8426958Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8427036Z import triton 2023-01-11T21:38:06.8427131Z import triton.language as tl 2023-01-11T21:38:06.8427249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8427349Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8427487Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8427615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8427620Z 2023-01-11T21:38:06.8428044Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8428121Z @triton.jit 2023-01-11T21:38:06.8428263Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8428341Z xnumel = 1000 2023-01-11T21:38:06.8428442Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8428567Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8428654Z xmask = xindex < xnumel 2023-01-11T21:38:06.8428728Z x0 = xindex 2023-01-11T21:38:06.8428848Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8428971Z tmp3 = 
tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8429044Z tmp1 = 1 2023-01-11T21:38:06.8429127Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8429200Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8429367Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8429457Z ''') 2023-01-11T21:38:06.8429462Z 2023-01-11T21:38:06.8429467Z 2023-01-11T21:38:06.8429563Z async_compile.wait(globals()) 2023-01-11T21:38:06.8429643Z del async_compile 2023-01-11T21:38:06.8429648Z 2023-01-11T21:38:06.8429726Z def call(args): 2023-01-11T21:38:06.8429808Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8429885Z args.clear() 2023-01-11T21:38:06.8429973Z with torch.cuda.device(0): 2023-01-11T21:38:06.8430206Z buf0 = empty_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8430300Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8430450Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8430526Z del arg0_1 2023-01-11T21:38:06.8430603Z del arg1_1 2023-01-11T21:38:06.8430683Z return (buf0, ) 2023-01-11T21:38:06.8430688Z 2023-01-11T21:38:06.8430693Z 2023-01-11T21:38:06.8430778Z if __name__ == "__main__": 2023-01-11T21:38:06.8430892Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8431020Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8431230Z arg0_1 = rand_strided((10, 100), (100, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8431460Z arg1_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8431581Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8431586Z 2023-01-11T21:38:06.8431590Z 2023-01-11T21:38:06.8431691Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8431796Z import torch 2023-01-11T21:38:06.8431873Z import random 2023-01-11T21:38:06.8431986Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8432112Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8432118Z 2023-01-11T21:38:06.8432202Z aten = torch.ops.aten 2023-01-11T21:38:06.8432343Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8432440Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8432446Z 2023-01-11T21:38:06.8432522Z import triton 2023-01-11T21:38:06.8432617Z import triton.language as tl 2023-01-11T21:38:06.8432737Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8432878Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8432884Z 2023-01-11T21:38:06.8432888Z 2023-01-11T21:38:06.8433045Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8433125Z import triton 2023-01-11T21:38:06.8433220Z import triton.language as tl 2023-01-11T21:38:06.8433343Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8433449Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8433584Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8433705Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8433710Z 2023-01-11T21:38:06.8434133Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8434214Z @triton.jit 2023-01-11T21:38:06.8434356Z 
def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8434432Z xnumel = 16 2023-01-11T21:38:06.8434532Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8434664Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8434748Z xmask = xindex < xnumel 2023-01-11T21:38:06.8434817Z x0 = xindex 2023-01-11T21:38:06.8434917Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8435016Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8435098Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8435235Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8435344Z ''') 2023-01-11T21:38:06.8435380Z 2023-01-11T21:38:06.8435385Z 2023-01-11T21:38:06.8435495Z async_compile.wait(globals()) 2023-01-11T21:38:06.8435579Z del async_compile 2023-01-11T21:38:06.8435592Z 2023-01-11T21:38:06.8435662Z def call(args): 2023-01-11T21:38:06.8435746Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8435824Z args.clear() 2023-01-11T21:38:06.8435918Z with torch.cuda.device(0): 2023-01-11T21:38:06.8436133Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8436228Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8436365Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8436445Z del arg0_1 2023-01-11T21:38:06.8436520Z del arg1_1 2023-01-11T21:38:06.8436601Z return (buf0, ) 2023-01-11T21:38:06.8436606Z 2023-01-11T21:38:06.8436611Z 2023-01-11T21:38:06.8436694Z if __name__ == "__main__": 2023-01-11T21:38:06.8436818Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8436946Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8437148Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8437355Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8437477Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8437482Z 2023-01-11T21:38:06.8437487Z 2023-01-11T21:38:06.8437586Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8437663Z import torch 2023-01-11T21:38:06.8437739Z import random 2023-01-11T21:38:06.8437890Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8438017Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8438022Z 2023-01-11T21:38:06.8438106Z aten = torch.ops.aten 2023-01-11T21:38:06.8438238Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8438339Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8438344Z 2023-01-11T21:38:06.8438420Z import triton 2023-01-11T21:38:06.8438515Z import triton.language as tl 2023-01-11T21:38:06.8438643Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8438783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8438789Z 2023-01-11T21:38:06.8438793Z 2023-01-11T21:38:06.8438948Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8439025Z import triton 2023-01-11T21:38:06.8439113Z import triton.language as tl 2023-01-11T21:38:06.8439230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8439334Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8439473Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8439599Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8439604Z 
2023-01-11T21:38:06.8440031Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8440108Z @triton.jit 2023-01-11T21:38:06.8440249Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8440319Z xnumel = 16 2023-01-11T21:38:06.8440420Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8440557Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8440647Z xmask = xindex < xnumel 2023-01-11T21:38:06.8440720Z x0 = xindex 2023-01-11T21:38:06.8440839Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8440960Z tmp1 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8441034Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8441170Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8441257Z ''') 2023-01-11T21:38:06.8441263Z 2023-01-11T21:38:06.8441267Z 2023-01-11T21:38:06.8441388Z async_compile.wait(globals()) 2023-01-11T21:38:06.8441470Z del async_compile 2023-01-11T21:38:06.8441475Z 2023-01-11T21:38:06.8441552Z def call(args): 2023-01-11T21:38:06.8441633Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8441709Z args.clear() 2023-01-11T21:38:06.8441796Z with torch.cuda.device(0): 2023-01-11T21:38:06.8442013Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8442109Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8442257Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8442337Z del arg0_1 2023-01-11T21:38:06.8442413Z del arg1_1 2023-01-11T21:38:06.8442493Z return (buf0, ) 2023-01-11T21:38:06.8442499Z 2023-01-11T21:38:06.8442503Z 2023-01-11T21:38:06.8442578Z if __name__ == "__main__": 2023-01-11T21:38:06.8442697Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8442826Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8443030Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8443241Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8443361Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8443367Z 2023-01-11T21:38:06.8443639Z [2023-01-11 21:36:23,589] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1003 2023-01-11T21:38:06.8444059Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8444221Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8444482Z [2023-01-11 21:36:23,608] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1004 2023-01-11T21:38:06.8444741Z [2023-01-11 21:36:23,619] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1004 2023-01-11T21:38:06.8444748Z 2023-01-11T21:38:06.8444847Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8444926Z import torch 2023-01-11T21:38:06.8445003Z import random 2023-01-11T21:38:06.8445126Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8445254Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8445262Z 2023-01-11T21:38:06.8445346Z aten = torch.ops.aten 2023-01-11T21:38:06.8445503Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8445616Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8445622Z 2023-01-11T21:38:06.8445707Z import triton 2023-01-11T21:38:06.8445802Z import triton.language as tl 2023-01-11T21:38:06.8445933Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8446075Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8446080Z 2023-01-11T21:38:06.8446085Z 2023-01-11T21:38:06.8446246Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8446323Z import triton 2023-01-11T21:38:06.8446411Z import triton.language as tl 2023-01-11T21:38:06.8446527Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8446631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8446767Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8446893Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8446901Z 2023-01-11T21:38:06.8447319Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8447420Z @triton.jit 2023-01-11T21:38:06.8447564Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8447633Z xnumel = 16 2023-01-11T21:38:06.8447733Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8447865Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8447950Z xmask = xindex < xnumel 2023-01-11T21:38:06.8448022Z x0 = xindex 2023-01-11T21:38:06.8448122Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8448225Z tmp3 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.8448292Z tmp1 = 1 2023-01-11T21:38:06.8448373Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8448459Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8448596Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8448684Z ''') 2023-01-11T21:38:06.8448690Z 2023-01-11T21:38:06.8448694Z 2023-01-11T21:38:06.8448791Z async_compile.wait(globals()) 2023-01-11T21:38:06.8448873Z del async_compile 2023-01-11T21:38:06.8448879Z 2023-01-11T21:38:06.8448957Z def call(args): 2023-01-11T21:38:06.8449032Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8449110Z args.clear() 2023-01-11T21:38:06.8449204Z with torch.cuda.device(0): 2023-01-11T21:38:06.8449418Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float32) 
2023-01-11T21:38:06.8449514Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8449659Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8449736Z del arg0_1 2023-01-11T21:38:06.8449805Z del arg1_1 2023-01-11T21:38:06.8449915Z return (buf0, ) 2023-01-11T21:38:06.8449920Z 2023-01-11T21:38:06.8449925Z 2023-01-11T21:38:06.8450007Z if __name__ == "__main__": 2023-01-11T21:38:06.8450128Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8450256Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8450463Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8450676Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8450798Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8450803Z 2023-01-11T21:38:06.8450808Z 2023-01-11T21:38:06.8450900Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8450976Z import torch 2023-01-11T21:38:06.8451053Z import random 2023-01-11T21:38:06.8451172Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8451297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8451305Z 2023-01-11T21:38:06.8451390Z aten = torch.ops.aten 2023-01-11T21:38:06.8451528Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8451626Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8451631Z 2023-01-11T21:38:06.8451701Z import triton 2023-01-11T21:38:06.8451800Z import triton.language as tl 2023-01-11T21:38:06.8451932Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8452076Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8452081Z 2023-01-11T21:38:06.8452085Z 2023-01-11T21:38:06.8452245Z triton_fused_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8452322Z import triton 2023-01-11T21:38:06.8452418Z import triton.language as tl 2023-01-11T21:38:06.8452527Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8452631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8452765Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8452891Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8452900Z 2023-01-11T21:38:06.8453319Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8453421Z @triton.jit 2023-01-11T21:38:06.8453565Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8453643Z xnumel = 16 2023-01-11T21:38:06.8453735Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8453867Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8453952Z xmask = xindex < xnumel 2023-01-11T21:38:06.8454026Z x0 = xindex 2023-01-11T21:38:06.8454146Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8454265Z tmp3 = tl.load(in_ptr1 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8454338Z tmp1 = 1 2023-01-11T21:38:06.8454415Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8454768Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8454911Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8455002Z ''') 2023-01-11T21:38:06.8455007Z 
2023-01-11T21:38:06.8455011Z 2023-01-11T21:38:06.8455106Z async_compile.wait(globals()) 2023-01-11T21:38:06.8455185Z del async_compile 2023-01-11T21:38:06.8455190Z 2023-01-11T21:38:06.8455265Z def call(args): 2023-01-11T21:38:06.8455343Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8455411Z args.clear() 2023-01-11T21:38:06.8455505Z with torch.cuda.device(0): 2023-01-11T21:38:06.8455716Z buf0 = empty_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8455810Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8455952Z triton_fused_add_1_0.run(arg0_1, arg1_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8456025Z del arg0_1 2023-01-11T21:38:06.8456146Z del arg1_1 2023-01-11T21:38:06.8456220Z return (buf0, ) 2023-01-11T21:38:06.8456225Z 2023-01-11T21:38:06.8456237Z 2023-01-11T21:38:06.8456312Z if __name__ == "__main__": 2023-01-11T21:38:06.8456431Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8456559Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8456764Z arg0_1 = rand_strided((4, 4), (4, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8456978Z arg1_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8457099Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8457105Z 2023-01-11T21:38:06.8457259Z ok (2.813s) 2023-01-11T21:38:06.8457714Z test_views2_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8457844Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8458111Z [2023-01-11 21:36:23,636] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1005 2023-01-11T21:38:06.8458381Z [2023-01-11 21:36:23,702] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1005 2023-01-11T21:38:06.8458795Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8458927Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8459190Z [2023-01-11 21:36:23,717] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1006 2023-01-11T21:38:06.8459457Z [2023-01-11 21:36:23,783] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1006 2023-01-11T21:38:06.8459912Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8460046Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8460303Z [2023-01-11 21:36:23,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1007 2023-01-11T21:38:06.8460565Z [2023-01-11 21:36:23,868] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1007 2023-01-11T21:38:06.8460981Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8461113Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8461363Z [2023-01-11 21:36:23,885] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1008 2023-01-11T21:38:06.8461627Z [2023-01-11 21:36:23,953] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1008 2023-01-11T21:38:06.8462037Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8462193Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8462452Z [2023-01-11 21:36:23,967] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1009 2023-01-11T21:38:06.8462458Z 2023-01-11T21:38:06.8462558Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8462635Z import torch 2023-01-11T21:38:06.8462712Z import random 2023-01-11T21:38:06.8462833Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8462952Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8462957Z 2023-01-11T21:38:06.8463043Z aten = torch.ops.aten 2023-01-11T21:38:06.8463183Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8463279Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8463285Z 2023-01-11T21:38:06.8463364Z import triton 2023-01-11T21:38:06.8463458Z import triton.language as tl 2023-01-11T21:38:06.8463585Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8463720Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8463732Z 2023-01-11T21:38:06.8463736Z 2023-01-11T21:38:06.8463888Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8463965Z import triton 2023-01-11T21:38:06.8464059Z import triton.language as tl 2023-01-11T21:38:06.8464175Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8464279Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8464416Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8464543Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8464548Z 2023-01-11T21:38:06.8464955Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8465028Z @triton.jit 2023-01-11T21:38:06.8465162Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8465237Z xnumel = 16 2023-01-11T21:38:06.8465337Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8465535Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8465623Z xmask = xindex < xnumel 2023-01-11T21:38:06.8465696Z x0 = xindex 2023-01-11T21:38:06.8465789Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8465862Z tmp1 = 1 2023-01-11T21:38:06.8465943Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8466080Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8466167Z ''') 2023-01-11T21:38:06.8466173Z 2023-01-11T21:38:06.8466177Z 2023-01-11T21:38:06.8466274Z async_compile.wait(globals()) 2023-01-11T21:38:06.8466355Z del async_compile 2023-01-11T21:38:06.8466363Z 2023-01-11T21:38:06.8466434Z def call(args): 2023-01-11T21:38:06.8466510Z arg0_1, = args 2023-01-11T21:38:06.8466590Z args.clear() 2023-01-11T21:38:06.8466683Z with torch.cuda.device(0): 2023-01-11T21:38:06.8466887Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8466985Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8467125Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8467193Z del arg0_1 2023-01-11T21:38:06.8467274Z return (buf0, ) 2023-01-11T21:38:06.8467280Z 2023-01-11T21:38:06.8467284Z 2023-01-11T21:38:06.8467367Z if __name__ == "__main__": 2023-01-11T21:38:06.8467487Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8467614Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8467830Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8467944Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8467984Z 2023-01-11T21:38:06.8467988Z 2023-01-11T21:38:06.8468088Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8468163Z import torch 2023-01-11T21:38:06.8468233Z import random 2023-01-11T21:38:06.8468355Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8468482Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8468487Z 2023-01-11T21:38:06.8468571Z aten = torch.ops.aten 2023-01-11T21:38:06.8468708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8468804Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8468809Z 2023-01-11T21:38:06.8468885Z import triton 2023-01-11T21:38:06.8468973Z import triton.language as tl 2023-01-11T21:38:06.8469098Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8469238Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8469243Z 2023-01-11T21:38:06.8469248Z 2023-01-11T21:38:06.8469409Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8469486Z import triton 2023-01-11T21:38:06.8469581Z import triton.language as tl 2023-01-11T21:38:06.8469696Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8469801Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8469931Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8470058Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8470063Z 2023-01-11T21:38:06.8470469Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 
1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8470547Z @triton.jit 2023-01-11T21:38:06.8470681Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8470757Z xnumel = 16 2023-01-11T21:38:06.8470857Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8470992Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8471071Z xmask = xindex < xnumel 2023-01-11T21:38:06.8471143Z x0 = xindex 2023-01-11T21:38:06.8471262Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8471336Z tmp1 = 1 2023-01-11T21:38:06.8471443Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8471582Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8471669Z ''') 2023-01-11T21:38:06.8471674Z 2023-01-11T21:38:06.8471679Z 2023-01-11T21:38:06.8471766Z async_compile.wait(globals()) 2023-01-11T21:38:06.8471845Z del async_compile 2023-01-11T21:38:06.8471851Z 2023-01-11T21:38:06.8471927Z def call(args): 2023-01-11T21:38:06.8472006Z arg0_1, = args 2023-01-11T21:38:06.8472083Z args.clear() 2023-01-11T21:38:06.8472177Z with torch.cuda.device(0): 2023-01-11T21:38:06.8472380Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8472471Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8472611Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8472687Z del arg0_1 2023-01-11T21:38:06.8472767Z return (buf0, ) 2023-01-11T21:38:06.8472772Z 2023-01-11T21:38:06.8472777Z 2023-01-11T21:38:06.8472861Z if __name__ == "__main__": 2023-01-11T21:38:06.8472980Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8473111Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8473326Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8473434Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8473439Z 2023-01-11T21:38:06.8473452Z 2023-01-11T21:38:06.8473544Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8473622Z import torch 2023-01-11T21:38:06.8473699Z import random 2023-01-11T21:38:06.8473820Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8473973Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8473978Z 2023-01-11T21:38:06.8474062Z aten = torch.ops.aten 2023-01-11T21:38:06.8474200Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8474290Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8474295Z 2023-01-11T21:38:06.8474372Z import triton 2023-01-11T21:38:06.8474466Z import triton.language as tl 2023-01-11T21:38:06.8474594Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8474736Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8474742Z 2023-01-11T21:38:06.8474746Z 2023-01-11T21:38:06.8474902Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8474980Z import triton 2023-01-11T21:38:06.8475075Z import triton.language as tl 2023-01-11T21:38:06.8475184Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8475294Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8475453Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8475603Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8475608Z 
2023-01-11T21:38:06.8476018Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8476095Z @triton.jit 2023-01-11T21:38:06.8476230Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8476305Z xnumel = 16 2023-01-11T21:38:06.8476397Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8476528Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8476614Z xmask = xindex < xnumel 2023-01-11T21:38:06.8476687Z x0 = xindex 2023-01-11T21:38:06.8476786Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8476859Z tmp1 = 2 2023-01-11T21:38:06.8476943Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8477009Z tmp3 = 1 2023-01-11T21:38:06.8477090Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8477227Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8477315Z ''') 2023-01-11T21:38:06.8477321Z 2023-01-11T21:38:06.8477325Z 2023-01-11T21:38:06.8477451Z async_compile.wait(globals()) 2023-01-11T21:38:06.8477533Z del async_compile 2023-01-11T21:38:06.8477538Z 2023-01-11T21:38:06.8477616Z def call(args): 2023-01-11T21:38:06.8477684Z arg0_1, = args 2023-01-11T21:38:06.8477761Z args.clear() 2023-01-11T21:38:06.8477857Z with torch.cuda.device(0): 2023-01-11T21:38:06.8478061Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8478156Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8478298Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8478375Z del arg0_1 2023-01-11T21:38:06.8478451Z return (buf0, ) 2023-01-11T21:38:06.8478456Z 2023-01-11T21:38:06.8478467Z 2023-01-11T21:38:06.8478542Z if __name__ == "__main__": 2023-01-11T21:38:06.8478661Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8478790Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8479007Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8479126Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8479131Z 2023-01-11T21:38:06.8479136Z 2023-01-11T21:38:06.8479237Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8479314Z import torch 2023-01-11T21:38:06.8479383Z import random 2023-01-11T21:38:06.8479504Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8479631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8479636Z 2023-01-11T21:38:06.8479720Z aten = torch.ops.aten 2023-01-11T21:38:06.8479858Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8479986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8479991Z 2023-01-11T21:38:06.8480067Z import triton 2023-01-11T21:38:06.8480160Z import triton.language as tl 2023-01-11T21:38:06.8480281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8480425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8480430Z 2023-01-11T21:38:06.8480435Z 2023-01-11T21:38:06.8480590Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8480667Z import triton 2023-01-11T21:38:06.8480765Z import triton.language as tl 2023-01-11T21:38:06.8480881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8480986Z from torch._inductor.ir import TileHint 
2023-01-11T21:38:06.8481114Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8481243Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8481248Z 2023-01-11T21:38:06.8481652Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8481731Z @triton.jit 2023-01-11T21:38:06.8481866Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8481943Z xnumel = 16 2023-01-11T21:38:06.8482046Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8482178Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8482256Z xmask = xindex < xnumel 2023-01-11T21:38:06.8482331Z x0 = xindex 2023-01-11T21:38:06.8482449Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8482522Z tmp1 = 2 2023-01-11T21:38:06.8482605Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8482678Z tmp3 = 1 2023-01-11T21:38:06.8482757Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8482887Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8482978Z ''') 2023-01-11T21:38:06.8482984Z 2023-01-11T21:38:06.8482988Z 2023-01-11T21:38:06.8483083Z async_compile.wait(globals()) 2023-01-11T21:38:06.8483163Z del async_compile 2023-01-11T21:38:06.8483168Z 2023-01-11T21:38:06.8483246Z def call(args): 2023-01-11T21:38:06.8483323Z arg0_1, = args 2023-01-11T21:38:06.8483427Z args.clear() 2023-01-11T21:38:06.8483524Z with torch.cuda.device(0): 2023-01-11T21:38:06.8483721Z buf0 = empty_strided((4, 4), (4, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8483816Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8483953Z triton_fused_add_0.run(arg0_1, buf0, 16, grid=grid(16), stream=stream0) 2023-01-11T21:38:06.8484028Z del arg0_1 2023-01-11T21:38:06.8484108Z return (buf0, ) 2023-01-11T21:38:06.8484114Z 2023-01-11T21:38:06.8484118Z 2023-01-11T21:38:06.8484199Z if __name__ == "__main__": 2023-01-11T21:38:06.8484319Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8484443Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8484658Z arg0_1 = rand_strided((2, 2, 2, 2), (8, 4, 2, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8484774Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8484779Z 2023-01-11T21:38:06.8485049Z [2023-01-11 21:36:24,034] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1009 2023-01-11T21:38:06.8485463Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8485597Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8485856Z [2023-01-11 21:36:24,049] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1010 2023-01-11T21:38:06.8486150Z [2023-01-11 21:36:24,116] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1010 2023-01-11T21:38:06.8486567Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8486701Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8486957Z [2023-01-11 21:36:24,133] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1011 2023-01-11T21:38:06.8487215Z [2023-01-11 21:36:24,202] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1011 2023-01-11T21:38:06.8487627Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8487766Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8488022Z [2023-01-11 21:36:24,219] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1012 2023-01-11T21:38:06.8488028Z 2023-01-11T21:38:06.8488128Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8488205Z import torch 2023-01-11T21:38:06.8488281Z import random 2023-01-11T21:38:06.8488402Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8488527Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8488532Z 2023-01-11T21:38:06.8488610Z aten = torch.ops.aten 2023-01-11T21:38:06.8488752Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8488850Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8488855Z 2023-01-11T21:38:06.8488934Z import triton 2023-01-11T21:38:06.8489030Z import triton.language as tl 2023-01-11T21:38:06.8489156Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8489324Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8489330Z 2023-01-11T21:38:06.8489334Z 2023-01-11T21:38:06.8489492Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8489562Z import triton 2023-01-11T21:38:06.8489657Z import triton.language as tl 2023-01-11T21:38:06.8489775Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8489880Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8490014Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8490142Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8490147Z 2023-01-11T21:38:06.8490558Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8490635Z @triton.jit 2023-01-11T21:38:06.8490766Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8490844Z xnumel = 1000 2023-01-11T21:38:06.8490944Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8491075Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8491160Z xmask = xindex < xnumel 2023-01-11T21:38:06.8491236Z x0 = xindex 2023-01-11T21:38:06.8491334Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8491401Z tmp1 = 1 2023-01-11T21:38:06.8491483Z tmp2 = tmp0 + tmp1 
2023-01-11T21:38:06.8491620Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8491709Z ''') 2023-01-11T21:38:06.8491740Z 2023-01-11T21:38:06.8491744Z 2023-01-11T21:38:06.8491841Z async_compile.wait(globals()) 2023-01-11T21:38:06.8491919Z del async_compile 2023-01-11T21:38:06.8491924Z 2023-01-11T21:38:06.8492000Z def call(args): 2023-01-11T21:38:06.8492076Z arg0_1, = args 2023-01-11T21:38:06.8492147Z args.clear() 2023-01-11T21:38:06.8492242Z with torch.cuda.device(0): 2023-01-11T21:38:06.8492454Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8492550Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8492692Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8492769Z del arg0_1 2023-01-11T21:38:06.8492849Z return (buf0, ) 2023-01-11T21:38:06.8492854Z 2023-01-11T21:38:06.8492859Z 2023-01-11T21:38:06.8492934Z if __name__ == "__main__": 2023-01-11T21:38:06.8493053Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8493184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8493425Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8493541Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8493546Z 2023-01-11T21:38:06.8493551Z 2023-01-11T21:38:06.8493651Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8493732Z import torch 2023-01-11T21:38:06.8493810Z import random 2023-01-11T21:38:06.8493924Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8494050Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8494055Z 2023-01-11T21:38:06.8494139Z aten = torch.ops.aten 2023-01-11T21:38:06.8494281Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8494383Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8494388Z 2023-01-11T21:38:06.8494465Z import triton 2023-01-11T21:38:06.8494764Z import triton.language as tl 2023-01-11T21:38:06.8494891Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8495028Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8495034Z 2023-01-11T21:38:06.8495038Z 2023-01-11T21:38:06.8495193Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8495268Z import triton 2023-01-11T21:38:06.8495360Z import triton.language as tl 2023-01-11T21:38:06.8495519Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8495622Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8495758Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8495877Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8495889Z 2023-01-11T21:38:06.8496285Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8496359Z @triton.jit 2023-01-11T21:38:06.8496494Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8496569Z xnumel = 1000 2023-01-11T21:38:06.8496665Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8496794Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8496876Z xmask = xindex < xnumel 2023-01-11T21:38:06.8496942Z x0 = xindex 2023-01-11T21:38:06.8497063Z tmp0 = 
tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8497177Z tmp1 = 1 2023-01-11T21:38:06.8497272Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8497422Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8497526Z ''') 2023-01-11T21:38:06.8497532Z 2023-01-11T21:38:06.8497537Z 2023-01-11T21:38:06.8497630Z async_compile.wait(globals()) 2023-01-11T21:38:06.8497709Z del async_compile 2023-01-11T21:38:06.8497714Z 2023-01-11T21:38:06.8497782Z def call(args): 2023-01-11T21:38:06.8497854Z arg0_1, = args 2023-01-11T21:38:06.8497970Z args.clear() 2023-01-11T21:38:06.8498061Z with torch.cuda.device(0): 2023-01-11T21:38:06.8498271Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8498362Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8498505Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8498571Z del arg0_1 2023-01-11T21:38:06.8498648Z return (buf0, ) 2023-01-11T21:38:06.8498654Z 2023-01-11T21:38:06.8498658Z 2023-01-11T21:38:06.8498739Z if __name__ == "__main__": 2023-01-11T21:38:06.8498857Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8498984Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8499217Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8499330Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8499335Z 2023-01-11T21:38:06.8499342Z 2023-01-11T21:38:06.8499438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8499505Z import torch 2023-01-11T21:38:06.8499579Z import random 2023-01-11T21:38:06.8499697Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8499820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8499825Z 2023-01-11T21:38:06.8499908Z aten = torch.ops.aten 2023-01-11T21:38:06.8500044Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8500138Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8500143Z 2023-01-11T21:38:06.8500217Z import triton 2023-01-11T21:38:06.8500302Z import triton.language as tl 2023-01-11T21:38:06.8500426Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8500564Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8500569Z 2023-01-11T21:38:06.8500573Z 2023-01-11T21:38:06.8500728Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8500801Z import triton 2023-01-11T21:38:06.8500896Z import triton.language as tl 2023-01-11T21:38:06.8501009Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8501103Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8501237Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8501390Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8501396Z 2023-01-11T21:38:06.8501800Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8501875Z @triton.jit 2023-01-11T21:38:06.8502007Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8502083Z xnumel = 1000 2023-01-11T21:38:06.8502181Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8502301Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), 
[XBLOCK]) 2023-01-11T21:38:06.8502387Z xmask = xindex < xnumel 2023-01-11T21:38:06.8502458Z x0 = xindex 2023-01-11T21:38:06.8502557Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8502628Z tmp1 = 2 2023-01-11T21:38:06.8502710Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8502781Z tmp3 = 1 2023-01-11T21:38:06.8502852Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8502990Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8503075Z ''') 2023-01-11T21:38:06.8503081Z 2023-01-11T21:38:06.8503085Z 2023-01-11T21:38:06.8503176Z async_compile.wait(globals()) 2023-01-11T21:38:06.8503254Z del async_compile 2023-01-11T21:38:06.8503259Z 2023-01-11T21:38:06.8503334Z def call(args): 2023-01-11T21:38:06.8503409Z arg0_1, = args 2023-01-11T21:38:06.8503485Z args.clear() 2023-01-11T21:38:06.8503570Z with torch.cuda.device(0): 2023-01-11T21:38:06.8503776Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8503895Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8504031Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8504106Z del arg0_1 2023-01-11T21:38:06.8504184Z return (buf0, ) 2023-01-11T21:38:06.8504189Z 2023-01-11T21:38:06.8504193Z 2023-01-11T21:38:06.8504276Z if __name__ == "__main__": 2023-01-11T21:38:06.8504387Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8504512Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8504742Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8504852Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8504858Z 2023-01-11T21:38:06.8505124Z [2023-01-11 21:36:24,291] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1012 2023-01-11T21:38:06.8505536Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8505673Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8505930Z [2023-01-11 21:36:24,305] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1013 2023-01-11T21:38:06.8506192Z [2023-01-11 21:36:24,315] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1013 2023-01-11T21:38:06.8506604Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8506737Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8506985Z [2023-01-11 21:36:24,329] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1014 2023-01-11T21:38:06.8507270Z [2023-01-11 21:36:24,339] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1014 2023-01-11T21:38:06.8507682Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. 
It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8507812Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8508067Z [2023-01-11 21:36:24,356] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1015 2023-01-11T21:38:06.8508076Z 2023-01-11T21:38:06.8508173Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8508247Z import torch 2023-01-11T21:38:06.8508324Z import random 2023-01-11T21:38:06.8508442Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8508559Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8508573Z 2023-01-11T21:38:06.8508648Z aten = torch.ops.aten 2023-01-11T21:38:06.8508786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8508881Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8508886Z 2023-01-11T21:38:06.8508962Z import triton 2023-01-11T21:38:06.8509053Z import triton.language as tl 2023-01-11T21:38:06.8509179Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8509317Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8509323Z 2023-01-11T21:38:06.8509327Z 2023-01-11T21:38:06.8509474Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8509586Z import triton 2023-01-11T21:38:06.8509682Z import triton.language as tl 2023-01-11T21:38:06.8509797Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8509900Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8510031Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8510158Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8510164Z 2023-01-11T21:38:06.8510570Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8510636Z @triton.jit 2023-01-11T21:38:06.8510767Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8510842Z xnumel = 1000 2023-01-11T21:38:06.8510939Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8511068Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8511154Z xmask = xindex < xnumel 2023-01-11T21:38:06.8511225Z x0 = xindex 2023-01-11T21:38:06.8511335Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8511409Z tmp1 = 2 2023-01-11T21:38:06.8511486Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8511557Z tmp3 = 1 2023-01-11T21:38:06.8511637Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8511772Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8511859Z ''') 2023-01-11T21:38:06.8511865Z 2023-01-11T21:38:06.8511869Z 2023-01-11T21:38:06.8511965Z async_compile.wait(globals()) 2023-01-11T21:38:06.8512035Z del async_compile 2023-01-11T21:38:06.8512040Z 2023-01-11T21:38:06.8512114Z def call(args): 2023-01-11T21:38:06.8512186Z arg0_1, = args 2023-01-11T21:38:06.8512261Z args.clear() 2023-01-11T21:38:06.8512353Z with torch.cuda.device(0): 2023-01-11T21:38:06.8512559Z buf0 = empty_strided((10, 100), (100, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8512653Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.8512785Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8512860Z del arg0_1 2023-01-11T21:38:06.8512936Z return (buf0, ) 2023-01-11T21:38:06.8512942Z 2023-01-11T21:38:06.8512975Z 2023-01-11T21:38:06.8513057Z if __name__ == "__main__": 2023-01-11T21:38:06.8513175Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8513301Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8513533Z arg0_1 = rand_strided((10, 1, 10, 1, 10), (100, 100, 10, 10, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8513645Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8513650Z 2023-01-11T21:38:06.8513654Z 2023-01-11T21:38:06.8513744Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8513816Z import torch 2023-01-11T21:38:06.8513890Z import random 2023-01-11T21:38:06.8514012Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8514135Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8514141Z 2023-01-11T21:38:06.8514222Z aten = torch.ops.aten 2023-01-11T21:38:06.8514360Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8514450Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8514462Z 2023-01-11T21:38:06.8514529Z import triton 2023-01-11T21:38:06.8514620Z import triton.language as tl 2023-01-11T21:38:06.8514747Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8514884Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8514889Z 2023-01-11T21:38:06.8514894Z 2023-01-11T21:38:06.8515047Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8515122Z import triton 2023-01-11T21:38:06.8515218Z import triton.language as tl 2023-01-11T21:38:06.8515324Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8515455Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8515586Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8515711Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8515716Z 2023-01-11T21:38:06.8516118Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8516194Z @triton.jit 2023-01-11T21:38:06.8516326Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8516401Z xnumel = 1000 2023-01-11T21:38:06.8516491Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8516619Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8516703Z xmask = xindex < xnumel 2023-01-11T21:38:06.8516774Z x0 = xindex 2023-01-11T21:38:06.8516874Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8516948Z tmp1 = 1 2023-01-11T21:38:06.8517025Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8517153Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8517237Z ''') 2023-01-11T21:38:06.8517243Z 2023-01-11T21:38:06.8517247Z 2023-01-11T21:38:06.8517339Z async_compile.wait(globals()) 2023-01-11T21:38:06.8517418Z del async_compile 2023-01-11T21:38:06.8517423Z 2023-01-11T21:38:06.8517501Z def call(args): 2023-01-11T21:38:06.8517574Z arg0_1, = args 2023-01-11T21:38:06.8517648Z args.clear() 2023-01-11T21:38:06.8517739Z with torch.cuda.device(0): 2023-01-11T21:38:06.8517944Z 
buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8518036Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8518175Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8518247Z del arg0_1 2023-01-11T21:38:06.8518324Z return (buf0, ) 2023-01-11T21:38:06.8518333Z 2023-01-11T21:38:06.8518337Z 2023-01-11T21:38:06.8518417Z if __name__ == "__main__": 2023-01-11T21:38:06.8518535Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8518653Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8518885Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8518996Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8519001Z 2023-01-11T21:38:06.8519006Z 2023-01-11T21:38:06.8519103Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8519177Z import torch 2023-01-11T21:38:06.8519251Z import random 2023-01-11T21:38:06.8519369Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8519493Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8519498Z 2023-01-11T21:38:06.8519572Z aten = torch.ops.aten 2023-01-11T21:38:06.8519708Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8519805Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8519810Z 2023-01-11T21:38:06.8519886Z import triton 2023-01-11T21:38:06.8519977Z import triton.language as tl 2023-01-11T21:38:06.8520101Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8520242Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8520247Z 2023-01-11T21:38:06.8520252Z 2023-01-11T21:38:06.8520407Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8520474Z import triton 2023-01-11T21:38:06.8520568Z import triton.language as tl 2023-01-11T21:38:06.8520682Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8520784Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8520915Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8521040Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8521045Z 2023-01-11T21:38:06.8521446Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8521589Z @triton.jit 2023-01-11T21:38:06.8521714Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8521791Z xnumel = 1000 2023-01-11T21:38:06.8521889Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8522018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8522101Z xmask = xindex < xnumel 2023-01-11T21:38:06.8522173Z x0 = xindex 2023-01-11T21:38:06.8522290Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8522354Z tmp1 = 1 2023-01-11T21:38:06.8522433Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8522567Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8522654Z ''') 2023-01-11T21:38:06.8522660Z 2023-01-11T21:38:06.8522664Z 2023-01-11T21:38:06.8522766Z async_compile.wait(globals()) 2023-01-11T21:38:06.8522842Z del async_compile 2023-01-11T21:38:06.8522847Z 2023-01-11T21:38:06.8522922Z def call(args): 2023-01-11T21:38:06.8522994Z arg0_1, = args 
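# Aside, not part of the generated module: each test in this log dumps the same
# graph twice, first compiled for fp32 inputs and then for fp16. The two modules
# differ only in the pointer dtypes of the @pointwise signature ('*fp32' vs
# '*fp16') and in the .to(tl.float32) applied to the fp16 loads, so the
# arithmetic itself always runs in fp32.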
2023-01-11T21:38:06.8523062Z args.clear() 2023-01-11T21:38:06.8523154Z with torch.cuda.device(0): 2023-01-11T21:38:06.8523371Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8523464Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8523606Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8523679Z del arg0_1 2023-01-11T21:38:06.8523756Z return (buf0, ) 2023-01-11T21:38:06.8523761Z 2023-01-11T21:38:06.8523765Z 2023-01-11T21:38:06.8523838Z if __name__ == "__main__": 2023-01-11T21:38:06.8523955Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8524082Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8524288Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8524399Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8524405Z 2023-01-11T21:38:06.8524671Z [2023-01-11 21:36:24,367] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1015 2023-01-11T21:38:06.8525122Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8525256Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8525520Z [2023-01-11 21:36:24,384] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1016 2023-01-11T21:38:06.8525781Z [2023-01-11 21:36:24,394] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1016 2023-01-11T21:38:06.8525789Z 2023-01-11T21:38:06.8525879Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8525954Z import torch 2023-01-11T21:38:06.8526029Z import random 2023-01-11T21:38:06.8526147Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8526276Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8526282Z 2023-01-11T21:38:06.8526363Z aten = torch.ops.aten 2023-01-11T21:38:06.8526499Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8526587Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8526600Z 2023-01-11T21:38:06.8526667Z import triton 2023-01-11T21:38:06.8526757Z import triton.language as tl 2023-01-11T21:38:06.8526881Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8527020Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8527026Z 2023-01-11T21:38:06.8527055Z 2023-01-11T21:38:06.8527213Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8527288Z import triton 2023-01-11T21:38:06.8527383Z import triton.language as tl 2023-01-11T21:38:06.8527490Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8527591Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8527725Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8527852Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8527857Z 2023-01-11T21:38:06.8528256Z @pointwise(size_hints=[1024], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 
1), equal_to_1=())]}) 2023-01-11T21:38:06.8528329Z @triton.jit 2023-01-11T21:38:06.8528461Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8528537Z xnumel = 1000 2023-01-11T21:38:06.8528627Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8528759Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8528843Z xmask = xindex < xnumel 2023-01-11T21:38:06.8528914Z x0 = xindex 2023-01-11T21:38:06.8529011Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8529084Z tmp1 = 2 2023-01-11T21:38:06.8529167Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8529230Z tmp3 = 1 2023-01-11T21:38:06.8529308Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8529445Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8529530Z ''') 2023-01-11T21:38:06.8529536Z 2023-01-11T21:38:06.8529540Z 2023-01-11T21:38:06.8529633Z async_compile.wait(globals()) 2023-01-11T21:38:06.8529710Z del async_compile 2023-01-11T21:38:06.8529715Z 2023-01-11T21:38:06.8529790Z def call(args): 2023-01-11T21:38:06.8529855Z arg0_1, = args 2023-01-11T21:38:06.8529931Z args.clear() 2023-01-11T21:38:06.8530021Z with torch.cuda.device(0): 2023-01-11T21:38:06.8530235Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8530328Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8530466Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8530546Z del arg0_1 2023-01-11T21:38:06.8530647Z return (buf0, ) 2023-01-11T21:38:06.8530662Z 2023-01-11T21:38:06.8530667Z 2023-01-11T21:38:06.8530741Z if __name__ == "__main__": 2023-01-11T21:38:06.8530860Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8530986Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8531188Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8531303Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8531308Z 2023-01-11T21:38:06.8531313Z 2023-01-11T21:38:06.8531411Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8531488Z import torch 2023-01-11T21:38:06.8531565Z import random 2023-01-11T21:38:06.8531676Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8531799Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8531804Z 2023-01-11T21:38:06.8531885Z aten = torch.ops.aten 2023-01-11T21:38:06.8532023Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8532121Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8532127Z 2023-01-11T21:38:06.8532201Z import triton 2023-01-11T21:38:06.8532298Z import triton.language as tl 2023-01-11T21:38:06.8532414Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8532551Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8532556Z 2023-01-11T21:38:06.8532561Z 2023-01-11T21:38:06.8532714Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.8532789Z import triton 2023-01-11T21:38:06.8532884Z import triton.language as tl 2023-01-11T21:38:06.8532997Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8533130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8533265Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8533383Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8533388Z 2023-01-11T21:38:06.8533788Z @pointwise(size_hints=[1024], 
filename=__file__, meta={'signature': {0: '*fp16', 1: '*fp16', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8533867Z @triton.jit 2023-01-11T21:38:06.8533998Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8534074Z xnumel = 1000 2023-01-11T21:38:06.8534170Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8534297Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8534381Z xmask = xindex < xnumel 2023-01-11T21:38:06.8534445Z x0 = xindex 2023-01-11T21:38:06.8534761Z tmp0 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8534834Z tmp1 = 2 2023-01-11T21:38:06.8534913Z tmp2 = tmp0 * tmp1 2023-01-11T21:38:06.8534984Z tmp3 = 1 2023-01-11T21:38:06.8535062Z tmp4 = tmp2 + tmp3 2023-01-11T21:38:06.8535198Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8535283Z ''') 2023-01-11T21:38:06.8535288Z 2023-01-11T21:38:06.8535293Z 2023-01-11T21:38:06.8535386Z async_compile.wait(globals()) 2023-01-11T21:38:06.8535463Z del async_compile 2023-01-11T21:38:06.8535468Z 2023-01-11T21:38:06.8535541Z def call(args): 2023-01-11T21:38:06.8535614Z arg0_1, = args 2023-01-11T21:38:06.8535689Z args.clear() 2023-01-11T21:38:06.8535783Z with torch.cuda.device(0): 2023-01-11T21:38:06.8536023Z buf0 = empty_strided((10, 5, 20), (100, 20, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8536130Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8536266Z triton_fused_add_0.run(arg0_1, buf0, 1000, grid=grid(1000), stream=stream0) 2023-01-11T21:38:06.8536344Z del arg0_1 2023-01-11T21:38:06.8536422Z return (buf0, ) 2023-01-11T21:38:06.8536427Z 2023-01-11T21:38:06.8536432Z 2023-01-11T21:38:06.8536512Z if __name__ == "__main__": 2023-01-11T21:38:06.8536672Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8536800Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8536996Z arg0_1 = rand_strided((50, 20), (20, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8537110Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8537115Z 2023-01-11T21:38:06.8537252Z ok (0.774s) 2023-01-11T21:38:06.8537704Z test_views3_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8537841Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8538101Z [2023-01-11 21:36:24,452] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1017 2023-01-11T21:38:06.8538365Z [2023-01-11 21:36:24,564] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1017 2023-01-11T21:38:06.8538777Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8538908Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8539165Z [2023-01-11 21:36:24,620] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1018 2023-01-11T21:38:06.8539467Z [2023-01-11 21:36:24,714] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1018 2023-01-11T21:38:06.8539472Z 2023-01-11T21:38:06.8539566Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8539646Z import torch 2023-01-11T21:38:06.8539725Z import random 2023-01-11T21:38:06.8539851Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8539977Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8539982Z 2023-01-11T21:38:06.8540068Z aten = torch.ops.aten 2023-01-11T21:38:06.8540209Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8540300Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8540314Z 2023-01-11T21:38:06.8540384Z import triton 2023-01-11T21:38:06.8540479Z import triton.language as tl 2023-01-11T21:38:06.8540608Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8540753Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8540759Z 2023-01-11T21:38:06.8540763Z 2023-01-11T21:38:06.8540923Z triton_fused_view_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8541000Z import triton 2023-01-11T21:38:06.8541095Z import triton.language as tl 2023-01-11T21:38:06.8541207Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8541312Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8541447Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8541579Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8541584Z 2023-01-11T21:38:06.8542009Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8542086Z @triton.jit 2023-01-11T21:38:06.8542233Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8542312Z xnumel = 142848 2023-01-11T21:38:06.8542405Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8542538Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8542623Z xmask = xindex < xnumel 2023-01-11T21:38:06.8542730Z x0 = xindex % 192 2023-01-11T21:38:06.8542815Z x1 = (xindex // 192) 2023-01-11T21:38:06.8542889Z x2 = xindex 2023-01-11T21:38:06.8543005Z tmp0 = tl.load(in_ptr0 + ((3*x1) + (x0 // 64)), xmask) 2023-01-11T21:38:06.8543116Z tmp1 = tl.load(in_ptr1 + ((64*tmp0) + (x0 % 64)), xmask) 2023-01-11T21:38:06.8543254Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8543343Z ''') 2023-01-11T21:38:06.8543349Z 2023-01-11T21:38:06.8543354Z 2023-01-11T21:38:06.8543449Z async_compile.wait(globals()) 2023-01-11T21:38:06.8543528Z del async_compile 2023-01-11T21:38:06.8543536Z 2023-01-11T21:38:06.8543616Z def call(args): 2023-01-11T21:38:06.8543699Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8543776Z args.clear() 2023-01-11T21:38:06.8543864Z with torch.cuda.device(0): 2023-01-11T21:38:06.8544100Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.8544199Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8544351Z triton_fused_view_1_0.run(arg1_1, arg0_1, buf0, 142848, grid=grid(142848), stream=stream0) 2023-01-11T21:38:06.8544426Z del arg0_1 2023-01-11T21:38:06.8544501Z del arg1_1 2023-01-11T21:38:06.8544580Z return (buf0, ) 2023-01-11T21:38:06.8544585Z 2023-01-11T21:38:06.8544590Z 2023-01-11T21:38:06.8544665Z if __name__ == "__main__": 2023-01-11T21:38:06.8544788Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8544916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8545123Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8545353Z arg1_1 = rand_strided((2232, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.8545476Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8545481Z 2023-01-11T21:38:06.8545485Z 2023-01-11T21:38:06.8545588Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8545665Z import torch 2023-01-11T21:38:06.8545735Z import random 2023-01-11T21:38:06.8545856Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8545981Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8545986Z 2023-01-11T21:38:06.8546070Z aten = torch.ops.aten 2023-01-11T21:38:06.8546207Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8546308Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8546313Z 2023-01-11T21:38:06.8546388Z import triton 2023-01-11T21:38:06.8546481Z import triton.language as tl 2023-01-11T21:38:06.8546600Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8546745Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8546750Z 2023-01-11T21:38:06.8546755Z 2023-01-11T21:38:06.8546917Z triton_fused_view_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8546994Z import triton 2023-01-11T21:38:06.8547093Z import triton.language as tl 2023-01-11T21:38:06.8547208Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8547314Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8547450Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8547571Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8547576Z 2023-01-11T21:38:06.8547998Z @pointwise(size_hints=[262144], filename=__file__, meta={'signature': {0: '*i64', 1: '*fp16', 2: '*fp16', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2, 3), equal_to_1=())]}) 2023-01-11T21:38:06.8548078Z @triton.jit 2023-01-11T21:38:06.8548221Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8548300Z xnumel = 142848 2023-01-11T21:38:06.8548399Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8548529Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8548643Z xmask = xindex < xnumel 2023-01-11T21:38:06.8548717Z x0 = xindex % 192 2023-01-11T21:38:06.8548800Z x1 = (xindex // 192) 2023-01-11T21:38:06.8548873Z x2 = xindex 2023-01-11T21:38:06.8548989Z tmp0 = tl.load(in_ptr0 + ((3*x1) + (x0 // 64)), xmask) 2023-01-11T21:38:06.8549122Z tmp1 = tl.load(in_ptr1 + ((64*tmp0) + (x0 % 64)), xmask).to(tl.float32) 2023-01-11T21:38:06.8549258Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8549345Z ''') 2023-01-11T21:38:06.8549351Z 
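# Aside, not part of the generated module: triton_fused_view_1_0 is an
# index_select-style gather. in_ptr0 holds the int64 indices (arg1_1 in call()
# below) and in_ptr1 the (64, 64) table (arg0_1); tmp0 selects a row and
# (64*tmp0) + (x0 % 64) addresses one element of it. In this fp16 variant the
# gathered value is upcast with .to(tl.float32) before being stored back
# through the fp16 output pointer.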
2023-01-11T21:38:06.8549355Z 2023-01-11T21:38:06.8549443Z async_compile.wait(globals()) 2023-01-11T21:38:06.8549523Z del async_compile 2023-01-11T21:38:06.8549531Z 2023-01-11T21:38:06.8549609Z def call(args): 2023-01-11T21:38:06.8549692Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8549769Z args.clear() 2023-01-11T21:38:06.8549864Z with torch.cuda.device(0): 2023-01-11T21:38:06.8550101Z buf0 = empty_strided((1, 12, 62, 192), (142848, 11904, 192, 1), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8550189Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8550340Z triton_fused_view_1_0.run(arg1_1, arg0_1, buf0, 142848, grid=grid(142848), stream=stream0) 2023-01-11T21:38:06.8550416Z del arg0_1 2023-01-11T21:38:06.8550490Z del arg1_1 2023-01-11T21:38:06.8550572Z return (buf0, ) 2023-01-11T21:38:06.8550578Z 2023-01-11T21:38:06.8550582Z 2023-01-11T21:38:06.8550664Z if __name__ == "__main__": 2023-01-11T21:38:06.8550787Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8550915Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8551149Z arg0_1 = rand_strided((64, 64), (64, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8551349Z arg1_1 = rand_strided((2232, ), (1, ), device='cuda:0', dtype=torch.int64) 2023-01-11T21:38:06.8551472Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8551478Z 2023-01-11T21:38:06.8551552Z ok (0.319s) 2023-01-11T21:38:06.8552084Z test_zero_dim_reductions_cuda (__main__.CudaTests) ... /opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.8552167Z warnings.warn( 2023-01-11T21:38:06.8552431Z [2023-01-11 21:36:24,772] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1019 2023-01-11T21:38:06.8552697Z [2023-01-11 21:36:24,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1019 2023-01-11T21:38:06.8552956Z [2023-01-11 21:36:25,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1020 2023-01-11T21:38:06.8553211Z [2023-01-11 21:36:25,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1020 2023-01-11T21:38:06.8553225Z 2023-01-11T21:38:06.8553322Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8553398Z import torch 2023-01-11T21:38:06.8553475Z import random 2023-01-11T21:38:06.8553595Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8553723Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8553728Z 2023-01-11T21:38:06.8553814Z aten = torch.ops.aten 2023-01-11T21:38:06.8553951Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8554041Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8554046Z 2023-01-11T21:38:06.8554122Z import triton 2023-01-11T21:38:06.8554218Z import triton.language as tl 2023-01-11T21:38:06.8554346Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8554488Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8554494Z 2023-01-11T21:38:06.8554498Z 2023-01-11T21:38:06.8554670Z triton_fused_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8554749Z import triton 2023-01-11T21:38:06.8554871Z import triton.language as tl 2023-01-11T21:38:06.8554981Z from torch._inductor.ir import 
ReductionHint 2023-01-11T21:38:06.8555085Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8555220Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8555347Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8555352Z 2023-01-11T21:38:06.8555740Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i1', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8555816Z @triton.jit 2023-01-11T21:38:06.8555942Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8556019Z xnumel = 2 2023-01-11T21:38:06.8556112Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8556241Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8556327Z xmask = xindex < xnumel 2023-01-11T21:38:06.8556403Z x0 = xindex 2023-01-11T21:38:06.8556481Z tmp0 = False 2023-01-11T21:38:06.8556560Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.8556689Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8556780Z ''') 2023-01-11T21:38:06.8556785Z 2023-01-11T21:38:06.8556790Z 2023-01-11T21:38:06.8556884Z async_compile.wait(globals()) 2023-01-11T21:38:06.8556966Z del async_compile 2023-01-11T21:38:06.8556971Z 2023-01-11T21:38:06.8557047Z def call(args): 2023-01-11T21:38:06.8557122Z arg0_1, = args 2023-01-11T21:38:06.8557198Z args.clear() 2023-01-11T21:38:06.8557292Z with torch.cuda.device(0): 2023-01-11T21:38:06.8557538Z buf0 = empty_strided((2, 1), (1, 1), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8557636Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8557777Z triton_fused_logical_not_1_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.8557857Z return (buf0, ) 2023-01-11T21:38:06.8557862Z 2023-01-11T21:38:06.8557869Z 2023-01-11T21:38:06.8557952Z if __name__ == "__main__": 2023-01-11T21:38:06.8558074Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8558201Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8558403Z arg0_1 = rand_strided((2, 0), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8558510Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8558515Z 2023-01-11T21:38:06.8558527Z 2023-01-11T21:38:06.8558619Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8558694Z import torch 2023-01-11T21:38:06.8558771Z import random 2023-01-11T21:38:06.8558897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8559024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8559029Z 2023-01-11T21:38:06.8559113Z aten = torch.ops.aten 2023-01-11T21:38:06.8559251Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8559343Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8559348Z 2023-01-11T21:38:06.8559424Z import triton 2023-01-11T21:38:06.8559519Z import triton.language as tl 2023-01-11T21:38:06.8559646Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8559787Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8559793Z 2023-01-11T21:38:06.8559797Z 2023-01-11T21:38:06.8559968Z triton_fused_logical_not_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8560046Z import triton 2023-01-11T21:38:06.8560140Z import triton.language as tl 2023-01-11T21:38:06.8560249Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8560355Z from 
torch._inductor.ir import TileHint 2023-01-11T21:38:06.8560488Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8560615Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8560620Z 2023-01-11T21:38:06.8561028Z @pointwise(size_hints=[2], filename=__file__, meta={'signature': {0: '*i1', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8561106Z @triton.jit 2023-01-11T21:38:06.8561232Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8561300Z xnumel = 2 2023-01-11T21:38:06.8561400Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8561530Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8561616Z xmask = xindex < xnumel 2023-01-11T21:38:06.8561690Z x0 = xindex 2023-01-11T21:38:06.8561768Z tmp0 = False 2023-01-11T21:38:06.8561847Z tmp1 = tmp0 == 0 2023-01-11T21:38:06.8561979Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp1, xmask) 2023-01-11T21:38:06.8562066Z ''') 2023-01-11T21:38:06.8562072Z 2023-01-11T21:38:06.8562076Z 2023-01-11T21:38:06.8562171Z async_compile.wait(globals()) 2023-01-11T21:38:06.8562251Z del async_compile 2023-01-11T21:38:06.8562256Z 2023-01-11T21:38:06.8562336Z def call(args): 2023-01-11T21:38:06.8562411Z arg0_1, = args 2023-01-11T21:38:06.8562487Z args.clear() 2023-01-11T21:38:06.8562582Z with torch.cuda.device(0): 2023-01-11T21:38:06.8562768Z buf0 = empty_strided((2, ), (1, ), device='cuda', dtype=torch.bool) 2023-01-11T21:38:06.8562862Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8563002Z triton_fused_logical_not_1_0.run(buf0, 2, grid=grid(2), stream=stream0) 2023-01-11T21:38:06.8563083Z return (buf0, ) 2023-01-11T21:38:06.8563088Z 2023-01-11T21:38:06.8563093Z 2023-01-11T21:38:06.8563175Z if __name__ == "__main__": 2023-01-11T21:38:06.8563295Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8563452Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8563654Z arg0_1 = rand_strided((2, 0), (1, 1), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8563761Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8563766Z 2023-01-11T21:38:06.8563843Z ok (0.539s) 2023-01-11T21:38:06.8564298Z test_zeros_cuda (__main__.CudaTests) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8564431Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8564694Z [2023-01-11 21:36:25,360] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1021 2023-01-11T21:38:06.8564963Z [2023-01-11 21:36:27,038] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1021 2023-01-11T21:38:06.8565380Z /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8565513Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8565770Z [2023-01-11 21:36:27,146] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1022 2023-01-11T21:38:06.8565776Z 2023-01-11T21:38:06.8565877Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8565947Z import torch 2023-01-11T21:38:06.8566024Z import random 2023-01-11T21:38:06.8566145Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8566275Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8566280Z 2023-01-11T21:38:06.8566364Z aten = torch.ops.aten 2023-01-11T21:38:06.8566503Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8566601Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8566606Z 2023-01-11T21:38:06.8566704Z import triton 2023-01-11T21:38:06.8566793Z import triton.language as tl 2023-01-11T21:38:06.8566922Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8567065Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8567070Z 2023-01-11T21:38:06.8567075Z 2023-01-11T21:38:06.8567240Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8567316Z import triton 2023-01-11T21:38:06.8567411Z import triton.language as tl 2023-01-11T21:38:06.8567526Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8567631Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8567762Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8567888Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8567893Z 2023-01-11T21:38:06.8568310Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8568386Z @triton.jit 2023-01-11T21:38:06.8568531Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8568611Z xnumel = 8 2023-01-11T21:38:06.8568712Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8568843Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8568921Z xmask = xindex < xnumel 2023-01-11T21:38:06.8568995Z x0 = xindex 2023-01-11T21:38:06.8569188Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last') 2023-01-11T21:38:06.8569318Z tmp3 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.8569393Z tmp1 = 1 2023-01-11T21:38:06.8569474Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8569549Z tmp4 = tmp3 + tmp1 2023-01-11T21:38:06.8569685Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8569824Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp4, xmask) 2023-01-11T21:38:06.8569911Z ''') 2023-01-11T21:38:06.8569917Z 2023-01-11T21:38:06.8569921Z 2023-01-11T21:38:06.8570082Z triton_fused_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.8570159Z import triton 2023-01-11T21:38:06.8570254Z import triton.language as tl 2023-01-11T21:38:06.8570370Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8570467Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8570601Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8570728Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8570737Z 2023-01-11T21:38:06.8571131Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8571208Z @triton.jit 2023-01-11T21:38:06.8571334Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8571415Z xnumel = 32768 2023-01-11T21:38:06.8571515Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8571639Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8571725Z xmask = xindex < xnumel 2023-01-11T21:38:06.8571797Z x0 = xindex 2023-01-11T21:38:06.8571870Z tmp0 = 0 2023-01-11T21:38:06.8572008Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8572095Z ''') 2023-01-11T21:38:06.8572101Z 2023-01-11T21:38:06.8572105Z 2023-01-11T21:38:06.8572244Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.8572457Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8572571Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8572638Z { 2023-01-11T21:38:06.8572726Z #pragma GCC ivdep 2023-01-11T21:38:06.8572812Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.8572907Z { 2023-01-11T21:38:06.8572980Z { 2023-01-11T21:38:06.8573043Z { 2023-01-11T21:38:06.8573151Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.8573243Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.8573314Z } 2023-01-11T21:38:06.8573383Z } 2023-01-11T21:38:06.8573452Z } 2023-01-11T21:38:06.8573518Z } 2023-01-11T21:38:06.8573598Z ''') 2023-01-11T21:38:06.8573603Z 2023-01-11T21:38:06.8573607Z 2023-01-11T21:38:06.8573767Z triton_fused_full_3 = async_compile.triton(''' 2023-01-11T21:38:06.8573844Z import triton 2023-01-11T21:38:06.8573938Z import triton.language as tl 2023-01-11T21:38:06.8574058Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8574162Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8574295Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8574415Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8574427Z 2023-01-11T21:38:06.8575004Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8575083Z @triton.jit 2023-01-11T21:38:06.8575208Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8575284Z xnumel = 6 2023-01-11T21:38:06.8575382Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8575512Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8575596Z xmask = xindex < xnumel 2023-01-11T21:38:06.8575663Z x0 = xindex 2023-01-11T21:38:06.8575789Z tmp0 = 3.1416 2023-01-11T21:38:06.8575926Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8576013Z ''') 2023-01-11T21:38:06.8576019Z 2023-01-11T21:38:06.8576024Z 2023-01-11T21:38:06.8576119Z async_compile.wait(globals()) 2023-01-11T21:38:06.8576199Z del async_compile 2023-01-11T21:38:06.8576206Z 2023-01-11T21:38:06.8576286Z def call(args): 2023-01-11T21:38:06.8576362Z arg0_1, = args 2023-01-11T21:38:06.8576433Z args.clear() 2023-01-11T21:38:06.8576528Z with torch.cuda.device(0):
2023-01-11T21:38:06.8576731Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8576929Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8577025Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8577222Z triton_fused_add_add_1_0.run(arg0_1, buf0, buf4, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8577310Z del arg0_1 2023-01-11T21:38:06.8577559Z buf1 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8577697Z triton_fused_zeros_1.run(buf1, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8577920Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8578052Z triton_fused_zeros_1.run(buf2, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8578247Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8578357Z kernel_cpp_2(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.8578452Z with torch.cuda.device(0): 2023-01-11T21:38:06.8578655Z buf5 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8578781Z triton_fused_full_3.run(buf5, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.8578895Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.8578903Z 2023-01-11T21:38:06.8578909Z 2023-01-11T21:38:06.8578990Z if __name__ == "__main__": 2023-01-11T21:38:06.8579109Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8579237Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8579476Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.8579591Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8579597Z 2023-01-11T21:38:06.8579862Z [2023-01-11 21:36:27,234] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1022 2023-01-11T21:38:06.8579868Z 2023-01-11T21:38:06.8579968Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8580038Z import torch 2023-01-11T21:38:06.8580114Z import random 2023-01-11T21:38:06.8580235Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8580359Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8580364Z 2023-01-11T21:38:06.8580451Z aten = torch.ops.aten 2023-01-11T21:38:06.8580588Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8580685Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8580690Z 2023-01-11T21:38:06.8580760Z import triton 2023-01-11T21:38:06.8580853Z import triton.language as tl 2023-01-11T21:38:06.8580983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8581125Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8581131Z 2023-01-11T21:38:06.8581135Z 2023-01-11T21:38:06.8581301Z triton_fused_add_add_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.8581378Z import triton 2023-01-11T21:38:06.8581472Z import triton.language as tl 2023-01-11T21:38:06.8581587Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8581684Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8581817Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8581943Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8582006Z 2023-01-11T21:38:06.8582426Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp16', 1: 
'*fp16', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.8582510Z @triton.jit 2023-01-11T21:38:06.8582656Z def triton_(in_ptr0, out_ptr0, out_ptr1, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8582732Z xnumel = 8 2023-01-11T21:38:06.8582831Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8582955Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8583045Z xmask = xindex < xnumel 2023-01-11T21:38:06.8583118Z x0 = xindex 2023-01-11T21:38:06.8583335Z tmp0 = tl.load(in_ptr0 + (x0), xmask, eviction_policy='evict_last').to(tl.float32) 2023-01-11T21:38:06.8583456Z tmp3 = tl.load(in_ptr0 + (x0), xmask).to(tl.float32) 2023-01-11T21:38:06.8583532Z tmp1 = 1 2023-01-11T21:38:06.8583616Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.8583699Z tmp4 = tmp3.to(tl.float32) 2023-01-11T21:38:06.8583782Z tmp5 = tmp4 + tmp1 2023-01-11T21:38:06.8583916Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.8584055Z tl.store(out_ptr1 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.8584143Z ''') 2023-01-11T21:38:06.8584149Z 2023-01-11T21:38:06.8584153Z 2023-01-11T21:38:06.8584314Z triton_fused_zeros_1 = async_compile.triton(''' 2023-01-11T21:38:06.8584392Z import triton 2023-01-11T21:38:06.8584480Z import triton.language as tl 2023-01-11T21:38:06.8584598Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8584708Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8584843Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8584971Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8584977Z 2023-01-11T21:38:06.8585372Z @pointwise(size_hints=[32768], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.8585456Z @triton.jit 2023-01-11T21:38:06.8585597Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8585717Z xnumel = 32768 2023-01-11T21:38:06.8585820Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8585951Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8586038Z xmask = xindex < xnumel 2023-01-11T21:38:06.8586111Z x0 = xindex 2023-01-11T21:38:06.8586183Z tmp0 = 0 2023-01-11T21:38:06.8586317Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8586397Z ''') 2023-01-11T21:38:06.8586403Z 2023-01-11T21:38:06.8586418Z 2023-01-11T21:38:06.8586549Z kernel_cpp_2 = async_compile.cpp(''' 2023-01-11T21:38:06.8586758Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8586881Z extern "C" void kernel(float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8586953Z { 2023-01-11T21:38:06.8587036Z #pragma GCC ivdep 2023-01-11T21:38:06.8587127Z for(long i0=0; i0<6; i0+=1) 2023-01-11T21:38:06.8587197Z { 2023-01-11T21:38:06.8587262Z { 2023-01-11T21:38:06.8587333Z { 2023-01-11T21:38:06.8587440Z auto tmp0 = static_cast<float>(0); 2023-01-11T21:38:06.8587528Z out_ptr0[i0] = tmp0; 2023-01-11T21:38:06.8587601Z } 2023-01-11T21:38:06.8587669Z } 2023-01-11T21:38:06.8587730Z } 2023-01-11T21:38:06.8587798Z } 2023-01-11T21:38:06.8587883Z ''') 2023-01-11T21:38:06.8587889Z 2023-01-11T21:38:06.8587894Z 2023-01-11T21:38:06.8588053Z
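# Aside, not part of the generated module: this graph produces buffers on two
# devices, so inductor emits kernel_cpp_2 (an OpenMP C++ loop that zeroes the
# cpu tensor buf3) alongside the Triton kernels that fill the CUDA buffers.
# call() below dispatches the C++ kernel through ctypes
# (c_void_p(buf3.data_ptr())) and launches the Triton kernels on stream0.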
triton_fused_full_3 = async_compile.triton(''' 2023-01-11T21:38:06.8588131Z import triton 2023-01-11T21:38:06.8588226Z import triton.language as tl 2023-01-11T21:38:06.8588342Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.8588469Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.8588608Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.8588734Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.8588740Z 2023-01-11T21:38:06.8589127Z @pointwise(size_hints=[8], filename=__file__, meta={'signature': {0: '*fp32', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.8589203Z @triton.jit 2023-01-11T21:38:06.8589329Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.8589402Z xnumel = 6 2023-01-11T21:38:06.8589505Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.8589627Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.8589712Z xmask = xindex < xnumel 2023-01-11T21:38:06.8589785Z x0 = xindex 2023-01-11T21:38:06.8589862Z tmp0 = 3.1416 2023-01-11T21:38:06.8589999Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp0, xmask) 2023-01-11T21:38:06.8590086Z ''') 2023-01-11T21:38:06.8590091Z 2023-01-11T21:38:06.8590096Z 2023-01-11T21:38:06.8590192Z async_compile.wait(globals()) 2023-01-11T21:38:06.8590272Z del async_compile 2023-01-11T21:38:06.8590277Z 2023-01-11T21:38:06.8590349Z def call(args): 2023-01-11T21:38:06.8590425Z arg0_1, = args 2023-01-11T21:38:06.8590501Z args.clear() 2023-01-11T21:38:06.8590595Z with torch.cuda.device(0): 2023-01-11T21:38:06.8590798Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float16) 2023-01-11T21:38:06.8590995Z buf4 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8591091Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.8591232Z triton_fused_add_add_1_0.run(arg0_1, buf0, buf4, 8, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.8591310Z del arg0_1 2023-01-11T21:38:06.8591535Z buf1 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8591676Z triton_fused_zeros_1.run(buf1, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8591897Z buf2 = empty_strided((1, 8, 64, 64), (32768, 4096, 64, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8592055Z triton_fused_zeros_1.run(buf2, 32768, grid=grid(32768), stream=stream0) 2023-01-11T21:38:06.8592253Z buf3 = empty_strided((2, 3), (3, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8592364Z kernel_cpp_2(c_void_p(buf3.data_ptr())) 2023-01-11T21:38:06.8592453Z with torch.cuda.device(0): 2023-01-11T21:38:06.8592652Z buf5 = empty_strided((2, 3), (3, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.8592784Z triton_fused_full_3.run(buf5, 6, grid=grid(6), stream=stream0) 2023-01-11T21:38:06.8592894Z return (buf0, buf1, buf2, buf3, buf4, buf5, ) 2023-01-11T21:38:06.8592899Z 2023-01-11T21:38:06.8592907Z 2023-01-11T21:38:06.8592993Z if __name__ == "__main__": 2023-01-11T21:38:06.8593114Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8593241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8593441Z arg0_1 = rand_strided((8, ), (1, ), device='cuda:0', dtype=torch.float16) 2023-01-11T21:38:06.8593550Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.8593555Z 
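All of the Triton kernels dumped above share one pointwise skeleton: compute a per-program block offset, add the lane index, build a bounds mask, then do masked loads, a scalar op, and a masked store. For reference, here is that skeleton as a minimal standalone Triton program. This is a sketch only (kernel and variable names are invented, not inductor's, and it uses plain tl.arange rather than the tl.reshape form above); it assumes a CUDA machine with torch and triton installed.

import torch
import triton
import triton.language as tl

@triton.jit
def add_one_kernel(in_ptr, out_ptr, numel, BLOCK: tl.constexpr):
    # block offset + lane index, as in the generated kernels' xoffset/xindex
    offsets = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < numel  # guards the ragged last block, like xmask
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + 1, mask=mask)

x = torch.randn(1000, device="cuda")
out = torch.empty_like(x)
# one program per 256-element block, analogous to grid(1000) with an
# XBLOCK picked by the @pointwise autotuner in the generated code
add_one_kernel[(triton.cdiv(1000, 256),)](x, out, 1000, BLOCK=256)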
2023-01-11T21:38:06.8593629Z ok (1.983s) 2023-01-11T21:38:06.8593762Z test_print_pow (__main__.ExprPrinterTests) ... ok (0.003s) 2023-01-11T21:38:06.8594249Z test_cpu_broadcast1_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8594419Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8594683Z [2023-01-11 21:36:27,252] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1023 2023-01-11T21:38:06.8594950Z [2023-01-11 21:36:27,260] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1023 2023-01-11T21:38:06.8594956Z 2023-01-11T21:38:06.8595056Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8595134Z import torch 2023-01-11T21:38:06.8595204Z import random 2023-01-11T21:38:06.8595325Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8595451Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8595456Z 2023-01-11T21:38:06.8595542Z aten = torch.ops.aten 2023-01-11T21:38:06.8595683Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8595781Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8595786Z 2023-01-11T21:38:06.8595864Z import triton 2023-01-11T21:38:06.8595962Z import triton.language as tl 2023-01-11T21:38:06.8596082Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8596223Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8596228Z 2023-01-11T21:38:06.8596233Z 2023-01-11T21:38:06.8596376Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8596585Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8596710Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8596824Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8596930Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8596999Z { 2023-01-11T21:38:06.8597096Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8597165Z { 2023-01-11T21:38:06.8597251Z #pragma omp for 2023-01-11T21:38:06.8597342Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8597414Z { 2023-01-11T21:38:06.8597587Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8597728Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8597814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8597914Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8598010Z } 2023-01-11T21:38:06.8598112Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8598199Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8598265Z { 2023-01-11T21:38:06.8598352Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8598432Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8598522Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8598606Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8598672Z } 2023-01-11T21:38:06.8598737Z } 2023-01-11T21:38:06.8598801Z } 2023-01-11T21:38:06.8598879Z ''') 2023-01-11T21:38:06.8598894Z 2023-01-11T21:38:06.8598901Z 2023-01-11T21:38:06.8598987Z async_compile.wait(globals()) 2023-01-11T21:38:06.8599064Z del async_compile
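# Aside, not part of the generated module: kernel_cpp_0 above splits the
# 10-element add into one 8-lane at::vec::Vectorized<float> iteration plus a
# scalar tail loop for the remaining two elements. A plain-Python rendering of
# that split, illustrative only:
#
#     def add_split(a, b, out, lanes=8):
#         main = (len(a) // lanes) * lanes
#         for i in range(0, main, lanes):   # vector body: whole 8-wide chunks
#             out[i:i+lanes] = [p + q for p, q in zip(a[i:i+lanes], b[i:i+lanes])]
#         for i in range(main, len(a)):     # scalar tail
#             out[i] = a[i] + b[i]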
2023-01-11T21:38:06.8599069Z 2023-01-11T21:38:06.8599143Z def call(args): 2023-01-11T21:38:06.8599222Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8599298Z args.clear() 2023-01-11T21:38:06.8599497Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8599666Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8599732Z del arg0_1 2023-01-11T21:38:06.8599805Z del arg1_1 2023-01-11T21:38:06.8599881Z return (buf0, ) 2023-01-11T21:38:06.8599887Z 2023-01-11T21:38:06.8599891Z 2023-01-11T21:38:06.8599970Z if __name__ == "__main__": 2023-01-11T21:38:06.8600090Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8600216Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8600412Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8600633Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8600746Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8600752Z 2023-01-11T21:38:06.8600823Z ok (0.020s) 2023-01-11T21:38:06.8601301Z test_cpu_broadcast1_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8601433Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8601692Z [2023-01-11 21:36:27,273] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1024 2023-01-11T21:38:06.8601954Z [2023-01-11 21:36:28,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1024 2023-01-11T21:38:06.8601963Z 2023-01-11T21:38:06.8602062Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8602136Z import torch 2023-01-11T21:38:06.8602210Z import random 2023-01-11T21:38:06.8602322Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8602446Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8602451Z 2023-01-11T21:38:06.8602533Z aten = torch.ops.aten 2023-01-11T21:38:06.8602669Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8602765Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8602770Z 2023-01-11T21:38:06.8602844Z import triton 2023-01-11T21:38:06.8602937Z import triton.language as tl 2023-01-11T21:38:06.8603064Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8603197Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8603202Z 2023-01-11T21:38:06.8603209Z 2023-01-11T21:38:06.8603348Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8603553Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8603676Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8603816Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8603921Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8603987Z { 2023-01-11T21:38:06.8604088Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8604147Z { 2023-01-11T21:38:06.8604229Z #pragma omp for 2023-01-11T21:38:06.8604316Z for(long i0=0; i0<10; i0+=1) 
2023-01-11T21:38:06.8604383Z { 2023-01-11T21:38:06.8604471Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8604539Z { 2023-01-11T21:38:06.8604672Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.8604806Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[i0]); 2023-01-11T21:38:06.8604902Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8605012Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8605080Z } 2023-01-11T21:38:06.8605176Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8605268Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8605336Z { 2023-01-11T21:38:06.8605442Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8605539Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8605649Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8605747Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8605816Z } 2023-01-11T21:38:06.8605885Z } 2023-01-11T21:38:06.8605944Z } 2023-01-11T21:38:06.8606008Z } 2023-01-11T21:38:06.8606092Z ''') 2023-01-11T21:38:06.8606098Z 2023-01-11T21:38:06.8606103Z 2023-01-11T21:38:06.8606196Z async_compile.wait(globals()) 2023-01-11T21:38:06.8606300Z del async_compile 2023-01-11T21:38:06.8606306Z 2023-01-11T21:38:06.8606379Z def call(args): 2023-01-11T21:38:06.8606459Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8606534Z args.clear() 2023-01-11T21:38:06.8606735Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8606907Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8606979Z del arg0_1 2023-01-11T21:38:06.8607052Z del arg1_1 2023-01-11T21:38:06.8607127Z return (buf0, ) 2023-01-11T21:38:06.8607132Z 2023-01-11T21:38:06.8607137Z 2023-01-11T21:38:06.8607219Z if __name__ == "__main__": 2023-01-11T21:38:06.8607337Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8607457Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8607652Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8607859Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8607984Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8607989Z 2023-01-11T21:38:06.8608058Z ok (1.724s) 2023-01-11T21:38:06.8608537Z test_cpu_broadcast1_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8608667Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8608924Z [2023-01-11 21:36:28,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1025 2023-01-11T21:38:06.8609187Z [2023-01-11 21:36:30,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1025 2023-01-11T21:38:06.8609195Z 2023-01-11T21:38:06.8609292Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8609359Z import torch 2023-01-11T21:38:06.8609433Z import random 2023-01-11T21:38:06.8609552Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8609701Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8609707Z 2023-01-11T21:38:06.8609789Z aten = torch.ops.aten 2023-01-11T21:38:06.8609926Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8610020Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8610026Z 2023-01-11T21:38:06.8610099Z import triton 2023-01-11T21:38:06.8610184Z import triton.language as tl 2023-01-11T21:38:06.8610308Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8610447Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8610453Z 2023-01-11T21:38:06.8610457Z 2023-01-11T21:38:06.8610600Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8610806Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8610928Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8611036Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8611142Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8611200Z { 2023-01-11T21:38:06.8611302Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8611368Z { 2023-01-11T21:38:06.8611450Z #pragma omp for 2023-01-11T21:38:06.8611540Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8611611Z { 2023-01-11T21:38:06.8611743Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8611868Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:06.8611957Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8612053Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8612147Z } 2023-01-11T21:38:06.8612246Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8612334Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8612394Z { 2023-01-11T21:38:06.8612482Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8612571Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8612661Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8612746Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8612815Z } 2023-01-11T21:38:06.8612882Z } 2023-01-11T21:38:06.8612940Z } 2023-01-11T21:38:06.8613024Z ''') 2023-01-11T21:38:06.8613030Z 2023-01-11T21:38:06.8613034Z 2023-01-11T21:38:06.8613129Z async_compile.wait(globals()) 2023-01-11T21:38:06.8613205Z del async_compile 2023-01-11T21:38:06.8613210Z 2023-01-11T21:38:06.8613284Z def call(args): 2023-01-11T21:38:06.8613364Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8613439Z args.clear() 2023-01-11T21:38:06.8613634Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8613798Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8613870Z del arg0_1 2023-01-11T21:38:06.8613942Z del arg1_1 2023-01-11T21:38:06.8614017Z return (buf0, ) 
2023-01-11T21:38:06.8614022Z 2023-01-11T21:38:06.8614029Z 2023-01-11T21:38:06.8614110Z if __name__ == "__main__": 2023-01-11T21:38:06.8614228Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8614354Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8614655Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8614849Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8614969Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8614975Z 2023-01-11T21:38:06.8615046Z ok (1.678s) 2023-01-11T21:38:06.8615565Z test_cpu_broadcast1_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8615702Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8615960Z [2023-01-11 21:36:30,675] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1026 2023-01-11T21:38:06.8616219Z [2023-01-11 21:36:32,351] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1026 2023-01-11T21:38:06.8616225Z 2023-01-11T21:38:06.8616322Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8616398Z import torch 2023-01-11T21:38:06.8616466Z import random 2023-01-11T21:38:06.8616585Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8616711Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8616716Z 2023-01-11T21:38:06.8616797Z aten = torch.ops.aten 2023-01-11T21:38:06.8616932Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8617026Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8617031Z 2023-01-11T21:38:06.8617109Z import triton 2023-01-11T21:38:06.8617256Z import triton.language as tl 2023-01-11T21:38:06.8617383Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8617523Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8617528Z 2023-01-11T21:38:06.8617533Z 2023-01-11T21:38:06.8617671Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8617877Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8617999Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8618108Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8618255Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8618314Z { 2023-01-11T21:38:06.8618414Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8618480Z { 2023-01-11T21:38:06.8618563Z #pragma omp for 2023-01-11T21:38:06.8618652Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8618718Z { 2023-01-11T21:38:06.8618806Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8618867Z { 2023-01-11T21:38:06.8619009Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i1); 2023-01-11T21:38:06.8619157Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8619249Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8619359Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8619426Z } 2023-01-11T21:38:06.8619523Z #pragma omp simd simdlen(4) 
2023-01-11T21:38:06.8619609Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8619676Z { 2023-01-11T21:38:06.8619767Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8619873Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8619964Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8620065Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8620133Z } 2023-01-11T21:38:06.8620193Z } 2023-01-11T21:38:06.8620259Z } 2023-01-11T21:38:06.8620322Z } 2023-01-11T21:38:06.8620408Z ''') 2023-01-11T21:38:06.8620414Z 2023-01-11T21:38:06.8620418Z 2023-01-11T21:38:06.8620513Z async_compile.wait(globals()) 2023-01-11T21:38:06.8620590Z del async_compile 2023-01-11T21:38:06.8620595Z 2023-01-11T21:38:06.8620670Z def call(args): 2023-01-11T21:38:06.8620751Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8620819Z args.clear() 2023-01-11T21:38:06.8621018Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8621189Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8621261Z del arg0_1 2023-01-11T21:38:06.8621332Z del arg1_1 2023-01-11T21:38:06.8621407Z return (buf0, ) 2023-01-11T21:38:06.8621412Z 2023-01-11T21:38:06.8621417Z 2023-01-11T21:38:06.8621526Z if __name__ == "__main__": 2023-01-11T21:38:06.8621638Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8621767Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8621962Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8622159Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8622281Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8622286Z 2023-01-11T21:38:06.8622356Z ok (1.689s) 2023-01-11T21:38:06.8622837Z test_cpu_broadcast1_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8622972Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8623233Z [2023-01-11 21:36:32,365] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1027 2023-01-11T21:38:06.8623488Z [2023-01-11 21:36:34,024] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1027 2023-01-11T21:38:06.8623502Z 2023-01-11T21:38:06.8623593Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8623666Z import torch 2023-01-11T21:38:06.8623740Z import random 2023-01-11T21:38:06.8623859Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8624019Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8624024Z 2023-01-11T21:38:06.8624106Z aten = torch.ops.aten 2023-01-11T21:38:06.8624243Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8624331Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8624337Z 2023-01-11T21:38:06.8624411Z import triton 2023-01-11T21:38:06.8624505Z import triton.language as tl 2023-01-11T21:38:06.8624632Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8624775Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8624780Z 2023-01-11T21:38:06.8624785Z 2023-01-11T21:38:06.8624922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8625130Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8625252Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8625358Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8625467Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8625532Z { 2023-01-11T21:38:06.8625633Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8625699Z { 2023-01-11T21:38:06.8625794Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8625881Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8625944Z { 2023-01-11T21:38:06.8626034Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8626102Z { 2023-01-11T21:38:06.8626170Z { 2023-01-11T21:38:06.8626241Z { 2023-01-11T21:38:06.8626342Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8626451Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8626562Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8626660Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8626761Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8626834Z } 2023-01-11T21:38:06.8626903Z } 2023-01-11T21:38:06.8626971Z } 2023-01-11T21:38:06.8627037Z } 2023-01-11T21:38:06.8627096Z } 2023-01-11T21:38:06.8627160Z } 2023-01-11T21:38:06.8627245Z ''') 2023-01-11T21:38:06.8627250Z 2023-01-11T21:38:06.8627255Z 2023-01-11T21:38:06.8627377Z async_compile.wait(globals()) 2023-01-11T21:38:06.8627457Z del async_compile 2023-01-11T21:38:06.8627462Z 2023-01-11T21:38:06.8627536Z def call(args): 2023-01-11T21:38:06.8627615Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8627683Z args.clear() 2023-01-11T21:38:06.8627881Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8628048Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8628121Z del arg0_1 2023-01-11T21:38:06.8628192Z del arg1_1 2023-01-11T21:38:06.8628268Z return (buf0, ) 2023-01-11T21:38:06.8628277Z 2023-01-11T21:38:06.8628282Z 2023-01-11T21:38:06.8628363Z if __name__ ==
"__main__": 2023-01-11T21:38:06.8628482Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8628601Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8628795Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8628995Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8629113Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8629119Z 2023-01-11T21:38:06.8629190Z ok (1.673s) 2023-01-11T21:38:06.8629663Z test_cpu_broadcast1_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8629822Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8630084Z [2023-01-11 21:36:34,038] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1028 2023-01-11T21:38:06.8630349Z [2023-01-11 21:36:35,696] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1028 2023-01-11T21:38:06.8630355Z 2023-01-11T21:38:06.8630447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8630521Z import torch 2023-01-11T21:38:06.8630595Z import random 2023-01-11T21:38:06.8630713Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8630836Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8630841Z 2023-01-11T21:38:06.8630926Z aten = torch.ops.aten 2023-01-11T21:38:06.8631062Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8631159Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8631167Z 2023-01-11T21:38:06.8631234Z import triton 2023-01-11T21:38:06.8631326Z import triton.language as tl 2023-01-11T21:38:06.8631449Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8631589Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8631595Z 2023-01-11T21:38:06.8631599Z 2023-01-11T21:38:06.8631741Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8631946Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8632069Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8632177Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8632274Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8632339Z { 2023-01-11T21:38:06.8632440Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8632507Z { 2023-01-11T21:38:06.8632590Z #pragma omp for 2023-01-11T21:38:06.8632676Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8632739Z { 2023-01-11T21:38:06.8632807Z { 2023-01-11T21:38:06.8632875Z { 2023-01-11T21:38:06.8632973Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8633070Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8633213Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8633310Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8633393Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8633462Z } 2023-01-11T21:38:06.8633531Z } 2023-01-11T21:38:06.8633598Z } 2023-01-11T21:38:06.8633663Z } 2023-01-11T21:38:06.8633727Z } 2023-01-11T21:38:06.8633811Z ''') 2023-01-11T21:38:06.8633817Z 2023-01-11T21:38:06.8633821Z
2023-01-11T21:38:06.8633906Z async_compile.wait(globals()) 2023-01-11T21:38:06.8633984Z del async_compile 2023-01-11T21:38:06.8633989Z 2023-01-11T21:38:06.8634063Z def call(args): 2023-01-11T21:38:06.8634146Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8634220Z args.clear() 2023-01-11T21:38:06.8634415Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8634582Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8634655Z del arg0_1 2023-01-11T21:38:06.8634722Z del arg1_1 2023-01-11T21:38:06.8634798Z return (buf0, ) 2023-01-11T21:38:06.8634803Z 2023-01-11T21:38:06.8634807Z 2023-01-11T21:38:06.8634888Z if __name__ == "__main__": 2023-01-11T21:38:06.8635005Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8635132Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8635357Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8635562Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8635674Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8635723Z 2023-01-11T21:38:06.8635790Z ok (1.671s) 2023-01-11T21:38:06.8636267Z test_cpu_broadcast1_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8636399Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8636662Z [2023-01-11 21:36:35,710] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1029 2023-01-11T21:38:06.8636928Z [2023-01-11 21:36:37,377] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1029 2023-01-11T21:38:06.8636934Z 2023-01-11T21:38:06.8637031Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8637108Z import torch 2023-01-11T21:38:06.8637183Z import random 2023-01-11T21:38:06.8637302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8637418Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8637423Z 2023-01-11T21:38:06.8637504Z aten = torch.ops.aten 2023-01-11T21:38:06.8637642Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8637739Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8637744Z 2023-01-11T21:38:06.8637818Z import triton 2023-01-11T21:38:06.8637910Z import triton.language as tl 2023-01-11T21:38:06.8638034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8638166Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8638181Z 2023-01-11T21:38:06.8638185Z 2023-01-11T21:38:06.8638314Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8638521Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8638646Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8638755Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8638858Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8638926Z { 2023-01-11T21:38:06.8639028Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8639113Z { 
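// Annotation (editor's sketch, not emitted by Inductor): the collapse(2)
// below fuses the i0/i1 loops into a single 100-iteration space so all 8
// OpenMP threads get work even though each loop is only 10 long; the read
// in_ptr1[(2*i1) + (30*i0)] walks the (30, 2)-strided arg1_1 directly,
// with no intermediate contiguous copy.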
2023-01-11T21:38:06.8639210Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8639296Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8639364Z { 2023-01-11T21:38:06.8639455Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8639524Z { 2023-01-11T21:38:06.8639594Z { 2023-01-11T21:38:06.8639658Z { 2023-01-11T21:38:06.8639758Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8639869Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8639969Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8640074Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8640144Z } 2023-01-11T21:38:06.8640213Z } 2023-01-11T21:38:06.8640273Z } 2023-01-11T21:38:06.8640340Z } 2023-01-11T21:38:06.8640406Z } 2023-01-11T21:38:06.8640469Z } 2023-01-11T21:38:06.8640555Z ''') 2023-01-11T21:38:06.8640563Z 2023-01-11T21:38:06.8640568Z 2023-01-11T21:38:06.8640660Z async_compile.wait(globals()) 2023-01-11T21:38:06.8640738Z del async_compile 2023-01-11T21:38:06.8640744Z 2023-01-11T21:38:06.8640811Z def call(args): 2023-01-11T21:38:06.8640889Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8640964Z args.clear() 2023-01-11T21:38:06.8641165Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8641333Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8641407Z del arg0_1 2023-01-11T21:38:06.8641478Z del arg1_1 2023-01-11T21:38:06.8649722Z return (buf0, ) 2023-01-11T21:38:06.8649733Z 2023-01-11T21:38:06.8649737Z 2023-01-11T21:38:06.8649838Z if __name__ == "__main__": 2023-01-11T21:38:06.8649962Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8650084Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8650311Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8650511Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8650631Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8650637Z 2023-01-11T21:38:06.8650704Z ok (1.681s) 2023-01-11T21:38:06.8651193Z test_cpu_broadcast1_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8651329Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8651583Z [2023-01-11 21:36:37,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1030 2023-01-11T21:38:06.8651850Z [2023-01-11 21:36:39,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1030 2023-01-11T21:38:06.8651856Z 2023-01-11T21:38:06.8651954Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8652031Z import torch 2023-01-11T21:38:06.8652109Z import random 2023-01-11T21:38:06.8652228Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8652353Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8652358Z 2023-01-11T21:38:06.8652441Z aten = torch.ops.aten 2023-01-11T21:38:06.8652570Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8652670Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8652675Z 2023-01-11T21:38:06.8652751Z import triton 2023-01-11T21:38:06.8652843Z import triton.language as tl 2023-01-11T21:38:06.8652968Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8653108Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8653172Z 2023-01-11T21:38:06.8653178Z 2023-01-11T21:38:06.8653316Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8653524Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8653642Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8653751Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8653855Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8653922Z { 2023-01-11T21:38:06.8654025Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8654091Z { 2023-01-11T21:38:06.8654176Z #pragma omp for 2023-01-11T21:38:06.8654257Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8654325Z { 2023-01-11T21:38:06.8654413Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8654613Z { 2023-01-11T21:38:06.8654758Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8654910Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8655006Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8655110Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8655181Z } 2023-01-11T21:38:06.8655278Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8655368Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8655436Z { 2023-01-11T21:38:06.8655527Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8655629Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8655770Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8655868Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8655937Z } 2023-01-11T21:38:06.8656005Z } 2023-01-11T21:38:06.8656073Z } 2023-01-11T21:38:06.8656139Z } 2023-01-11T21:38:06.8656227Z ''') 2023-01-11T21:38:06.8656233Z 2023-01-11T21:38:06.8656240Z 2023-01-11T21:38:06.8656326Z async_compile.wait(globals()) 2023-01-11T21:38:06.8656407Z del async_compile 2023-01-11T21:38:06.8656413Z 2023-01-11T21:38:06.8656487Z def call(args): 2023-01-11T21:38:06.8656567Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8656642Z args.clear() 2023-01-11T21:38:06.8656845Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8657010Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8657085Z del arg0_1 2023-01-11T21:38:06.8657204Z del arg1_1 2023-01-11T21:38:06.8657294Z return (buf0, ) 2023-01-11T21:38:06.8657304Z 2023-01-11T21:38:06.8657309Z 2023-01-11T21:38:06.8657391Z if __name__ == "__main__": 2023-01-11T21:38:06.8657509Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8657635Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8657836Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8658032Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8658144Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8658157Z 2023-01-11T21:38:06.8658220Z ok (1.681s) 2023-01-11T21:38:06.8658698Z test_cpu_broadcast2_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8658834Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8659096Z [2023-01-11 21:36:39,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1031 2023-01-11T21:38:06.8659433Z [2023-01-11 21:36:40,735] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1031 2023-01-11T21:38:06.8659441Z 2023-01-11T21:38:06.8659542Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8659616Z import torch 2023-01-11T21:38:06.8659693Z import random 2023-01-11T21:38:06.8659812Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8659933Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8659938Z 2023-01-11T21:38:06.8660022Z aten = torch.ops.aten 2023-01-11T21:38:06.8660157Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8660253Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8660262Z 2023-01-11T21:38:06.8660337Z import triton 2023-01-11T21:38:06.8660429Z import triton.language as tl 2023-01-11T21:38:06.8660554Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8660685Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8660699Z 2023-01-11T21:38:06.8660706Z 2023-01-11T21:38:06.8660837Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8661043Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8661167Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8661276Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8661384Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8661450Z { 2023-01-11T21:38:06.8661554Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8661613Z { 2023-01-11T21:38:06.8661698Z #pragma omp for 2023-01-11T21:38:06.8661816Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8661883Z { 2023-01-11T21:38:06.8661973Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8662041Z { 2023-01-11T21:38:06.8662175Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8662309Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.8662404Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8662512Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8662581Z } 2023-01-11T21:38:06.8662676Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8662767Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8662835Z { 2023-01-11T21:38:06.8662918Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8663009Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8663100Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8663203Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8663271Z } 2023-01-11T21:38:06.8663339Z } 2023-01-11T21:38:06.8663405Z } 2023-01-11T21:38:06.8663462Z } 2023-01-11T21:38:06.8663548Z ''') 2023-01-11T21:38:06.8663554Z 2023-01-11T21:38:06.8663558Z 2023-01-11T21:38:06.8663657Z async_compile.wait(globals()) 2023-01-11T21:38:06.8663736Z del async_compile 2023-01-11T21:38:06.8663742Z 2023-01-11T21:38:06.8663818Z def call(args): 2023-01-11T21:38:06.8663898Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8663974Z args.clear() 2023-01-11T21:38:06.8664177Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8664347Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8664422Z del arg0_1 2023-01-11T21:38:06.8664493Z del arg1_1 2023-01-11T21:38:06.8664571Z return (buf0, ) 2023-01-11T21:38:06.8664576Z 2023-01-11T21:38:06.8664583Z 2023-01-11T21:38:06.8664665Z if __name__ == "__main__": 2023-01-11T21:38:06.8664783Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8664911Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8665109Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8665329Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8665449Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8665455Z 2023-01-11T21:38:06.8665528Z ok (1.676s) 2023-01-11T21:38:06.8666052Z test_cpu_broadcast2_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8666189Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8666451Z [2023-01-11 21:36:40,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1032 2023-01-11T21:38:06.8666716Z [2023-01-11 21:36:40,756] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1032 2023-01-11T21:38:06.8666722Z 2023-01-11T21:38:06.8666823Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8666899Z import torch 2023-01-11T21:38:06.8666966Z import random 2023-01-11T21:38:06.8667085Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8667209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8667214Z 2023-01-11T21:38:06.8667297Z aten = torch.ops.aten 2023-01-11T21:38:06.8667434Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8667530Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8667536Z 2023-01-11T21:38:06.8667643Z import triton 2023-01-11T21:38:06.8667729Z import triton.language as tl 2023-01-11T21:38:06.8667854Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8667995Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8668001Z 2023-01-11T21:38:06.8668005Z 2023-01-11T21:38:06.8668146Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8668353Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8668476Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8668592Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8668697Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8668756Z { 2023-01-11T21:38:06.8668856Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8668928Z { 2023-01-11T21:38:06.8669014Z #pragma omp for 2023-01-11T21:38:06.8669102Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8669175Z { 2023-01-11T21:38:06.8669315Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8669444Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8669535Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8669631Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8669699Z } 2023-01-11T21:38:06.8669798Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8669887Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8669956Z { 2023-01-11T21:38:06.8670037Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8670125Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8670213Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8670301Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8670368Z } 2023-01-11T21:38:06.8670434Z } 2023-01-11T21:38:06.8670500Z } 2023-01-11T21:38:06.8670577Z ''') 2023-01-11T21:38:06.8670585Z 2023-01-11T21:38:06.8670590Z 2023-01-11T21:38:06.8670688Z async_compile.wait(globals()) 2023-01-11T21:38:06.8670765Z del async_compile 2023-01-11T21:38:06.8670771Z 2023-01-11T21:38:06.8670848Z def call(args): 2023-01-11T21:38:06.8670928Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8671006Z args.clear() 2023-01-11T21:38:06.8671241Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8671405Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8671484Z del arg0_1 2023-01-11T21:38:06.8671558Z del arg1_1
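# Annotation (editor's note, not part of the generated wrapper): call() takes
# ownership of its inputs -- args.clear() plus the del statements above drop
# the last Python references as soon as kernel_cpp_0 has consumed them, so
# input storage can be freed before the output is returned.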
2023-01-11T21:38:06.8671636Z return (buf0, ) 2023-01-11T21:38:06.8671641Z 2023-01-11T21:38:06.8671646Z 2023-01-11T21:38:06.8671732Z if __name__ == "__main__": 2023-01-11T21:38:06.8671855Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8671986Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8672197Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8672394Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8672517Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8672522Z 2023-01-11T21:38:06.8672596Z ok (0.021s) 2023-01-11T21:38:06.8673078Z test_cpu_broadcast2_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8673212Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8673474Z [2023-01-11 21:36:40,770] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1033 2023-01-11T21:38:06.8673765Z [2023-01-11 21:36:40,777] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1033 2023-01-11T21:38:06.8673771Z 2023-01-11T21:38:06.8673869Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8673945Z import torch 2023-01-11T21:38:06.8674013Z import random 2023-01-11T21:38:06.8674135Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8674259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8674264Z 2023-01-11T21:38:06.8674350Z aten = torch.ops.aten 2023-01-11T21:38:06.8674485Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8674581Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8674586Z 2023-01-11T21:38:06.8674661Z import triton 2023-01-11T21:38:06.8674754Z import triton.language as tl 2023-01-11T21:38:06.8674873Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8675014Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8675022Z 2023-01-11T21:38:06.8675026Z 2023-01-11T21:38:06.8675164Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8675369Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8675496Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8675610Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8675715Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8675782Z { 2023-01-11T21:38:06.8675878Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8675944Z { 2023-01-11T21:38:06.8676026Z #pragma omp for 2023-01-11T21:38:06.8676115Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8676182Z { 2023-01-11T21:38:06.8676321Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8676447Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.8676532Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8676627Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8676695Z } 2023-01-11T21:38:06.8676796Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8676885Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.8676953Z { 2023-01-11T21:38:06.8677068Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8677150Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8677237Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8677324Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8677393Z } 2023-01-11T21:38:06.8677461Z } 2023-01-11T21:38:06.8677527Z } 2023-01-11T21:38:06.8677606Z ''') 2023-01-11T21:38:06.8677621Z 2023-01-11T21:38:06.8677625Z 2023-01-11T21:38:06.8677713Z async_compile.wait(globals()) 2023-01-11T21:38:06.8677793Z del async_compile 2023-01-11T21:38:06.8677798Z 2023-01-11T21:38:06.8677873Z def call(args): 2023-01-11T21:38:06.8677953Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8678034Z args.clear() 2023-01-11T21:38:06.8678238Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8678405Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8678470Z del arg0_1 2023-01-11T21:38:06.8678546Z del arg1_1 2023-01-11T21:38:06.8678622Z return (buf0, ) 2023-01-11T21:38:06.8678628Z 2023-01-11T21:38:06.8678632Z 2023-01-11T21:38:06.8678713Z if __name__ == "__main__": 2023-01-11T21:38:06.8678831Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8678959Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8679165Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8679356Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8679469Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8679503Z 2023-01-11T21:38:06.8679574Z ok (0.021s) 2023-01-11T21:38:06.8680051Z test_cpu_broadcast2_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8680186Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8680450Z [2023-01-11 21:36:40,790] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1034 2023-01-11T21:38:06.8680712Z [2023-01-11 21:36:40,799] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1034 2023-01-11T21:38:06.8680718Z 2023-01-11T21:38:06.8680816Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8680894Z import torch 2023-01-11T21:38:06.8680968Z import random 2023-01-11T21:38:06.8681079Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8681203Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8681208Z 2023-01-11T21:38:06.8681292Z aten = torch.ops.aten 2023-01-11T21:38:06.8681430Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8681526Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8681531Z 2023-01-11T21:38:06.8681605Z import triton 2023-01-11T21:38:06.8681699Z import triton.language as tl 2023-01-11T21:38:06.8681822Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8681954Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8681961Z 2023-01-11T21:38:06.8681972Z 2023-01-11T21:38:06.8682102Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8682308Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8682436Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8682547Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8682651Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8682718Z { 2023-01-11T21:38:06.8682820Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8682905Z { 2023-01-11T21:38:06.8682988Z #pragma omp for 2023-01-11T21:38:06.8683076Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8683143Z { 2023-01-11T21:38:06.8683231Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8683299Z { 2023-01-11T21:38:06.8683425Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[i0]); 2023-01-11T21:38:06.8683576Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8683669Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8683778Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8683853Z } 2023-01-11T21:38:06.8683951Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8684039Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8684107Z { 2023-01-11T21:38:06.8684190Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8684297Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8684388Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8684487Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8684555Z } 2023-01-11T21:38:06.8684621Z } 2023-01-11T21:38:06.8684688Z } 2023-01-11T21:38:06.8684745Z } 2023-01-11T21:38:06.8684830Z ''') 2023-01-11T21:38:06.8684837Z 2023-01-11T21:38:06.8684841Z 2023-01-11T21:38:06.8684934Z async_compile.wait(globals()) 2023-01-11T21:38:06.8685012Z del async_compile 2023-01-11T21:38:06.8685017Z 2023-01-11T21:38:06.8685093Z def call(args): 2023-01-11T21:38:06.8685173Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8685290Z args.clear() 2023-01-11T21:38:06.8685490Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8685683Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8685769Z del arg0_1 2023-01-11T21:38:06.8685851Z del arg1_1 2023-01-11T21:38:06.8685931Z return (buf0, ) 2023-01-11T21:38:06.8685937Z 2023-01-11T21:38:06.8685941Z 2023-01-11T21:38:06.8686028Z if __name__ == "__main__": 2023-01-11T21:38:06.8686148Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8686279Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8686477Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8686675Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8686796Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8686801Z 2023-01-11T21:38:06.8686876Z ok (0.021s) 2023-01-11T21:38:06.8687356Z test_cpu_broadcast2_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8687489Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8687749Z [2023-01-11 21:36:40,812] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1035 2023-01-11T21:38:06.8688016Z [2023-01-11 21:36:42,465] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1035 2023-01-11T21:38:06.8688022Z 2023-01-11T21:38:06.8688120Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8688187Z import torch 2023-01-11T21:38:06.8688264Z import random 2023-01-11T21:38:06.8688383Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8688508Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8688513Z 2023-01-11T21:38:06.8688595Z aten = torch.ops.aten 2023-01-11T21:38:06.8688733Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8688855Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8688861Z 2023-01-11T21:38:06.8688938Z import triton 2023-01-11T21:38:06.8689024Z import triton.language as tl 2023-01-11T21:38:06.8689148Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8689288Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8689293Z 2023-01-11T21:38:06.8689298Z 2023-01-11T21:38:06.8689442Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8689648Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8689772Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8689888Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8689995Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8690053Z { 2023-01-11T21:38:06.8690154Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8690221Z { 2023-01-11T21:38:06.8690321Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8690409Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8690476Z { 2023-01-11T21:38:06.8690560Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8690628Z { 2023-01-11T21:38:06.8690698Z { 2023-01-11T21:38:06.8690771Z { 2023-01-11T21:38:06.8690873Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8690980Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 
2023-01-11T21:38:06.8691098Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8691189Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8691319Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8691390Z } 2023-01-11T21:38:06.8691459Z } 2023-01-11T21:38:06.8691528Z } 2023-01-11T21:38:06.8691596Z } 2023-01-11T21:38:06.8691664Z } 2023-01-11T21:38:06.8691724Z } 2023-01-11T21:38:06.8691808Z ''') 2023-01-11T21:38:06.8691814Z 2023-01-11T21:38:06.8691818Z 2023-01-11T21:38:06.8691911Z async_compile.wait(globals()) 2023-01-11T21:38:06.8691990Z del async_compile 2023-01-11T21:38:06.8691995Z 2023-01-11T21:38:06.8692073Z def call(args): 2023-01-11T21:38:06.8692154Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8692228Z args.clear() 2023-01-11T21:38:06.8692430Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8692601Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8692677Z del arg0_1 2023-01-11T21:38:06.8692749Z del arg1_1 2023-01-11T21:38:06.8692825Z return (buf0, ) 2023-01-11T21:38:06.8692830Z 2023-01-11T21:38:06.8692835Z 2023-01-11T21:38:06.8692915Z if __name__ == "__main__": 2023-01-11T21:38:06.8693039Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8693166Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8693365Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8693571Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8693689Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8693695Z 2023-01-11T21:38:06.8693766Z ok (1.667s) 2023-01-11T21:38:06.8694238Z test_cpu_broadcast2_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8694374Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8694787Z [2023-01-11 21:36:42,479] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1036 2023-01-11T21:38:06.8695052Z [2023-01-11 21:36:44,140] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1036 2023-01-11T21:38:06.8695058Z 2023-01-11T21:38:06.8695156Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8695231Z import torch 2023-01-11T21:38:06.8695298Z import random 2023-01-11T21:38:06.8695421Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8695544Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8695550Z 2023-01-11T21:38:06.8695630Z aten = torch.ops.aten 2023-01-11T21:38:06.8695769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8695866Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8695871Z 2023-01-11T21:38:06.8695944Z import triton 2023-01-11T21:38:06.8696029Z import triton.language as tl 2023-01-11T21:38:06.8696153Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8696294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8696300Z 2023-01-11T21:38:06.8696304Z 2023-01-11T21:38:06.8696443Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8696646Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8696768Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8696877Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8696981Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8697040Z { 2023-01-11T21:38:06.8697189Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8697311Z { 2023-01-11T21:38:06.8697409Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8697495Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8697562Z { 2023-01-11T21:38:06.8697654Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8697714Z { 2023-01-11T21:38:06.8697787Z { 2023-01-11T21:38:06.8697862Z { 2023-01-11T21:38:06.8697962Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8698062Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8698178Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8698276Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8698369Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8698440Z } 2023-01-11T21:38:06.8698508Z } 2023-01-11T21:38:06.8698575Z } 2023-01-11T21:38:06.8698642Z } 2023-01-11T21:38:06.8698711Z } 2023-01-11T21:38:06.8698776Z } 2023-01-11T21:38:06.8698855Z ''') 2023-01-11T21:38:06.8698860Z 2023-01-11T21:38:06.8698864Z 2023-01-11T21:38:06.8698959Z async_compile.wait(globals()) 2023-01-11T21:38:06.8699034Z del async_compile 2023-01-11T21:38:06.8699039Z 2023-01-11T21:38:06.8699114Z def call(args): 2023-01-11T21:38:06.8699198Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8699274Z args.clear() 2023-01-11T21:38:06.8699484Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8699644Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8699717Z del arg0_1 2023-01-11T21:38:06.8699790Z del arg1_1 2023-01-11T21:38:06.8699866Z return (buf0, ) 2023-01-11T21:38:06.8699871Z 2023-01-11T21:38:06.8699876Z 2023-01-11T21:38:06.8699958Z if __name__ == "__main__":
2023-01-11T21:38:06.8700076Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8700207Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8700413Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8700596Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8700743Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8700749Z 2023-01-11T21:38:06.8700824Z ok (1.675s) 2023-01-11T21:38:06.8701300Z test_cpu_broadcast2_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8701431Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8701696Z [2023-01-11 21:36:44,154] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1037 2023-01-11T21:38:06.8701960Z [2023-01-11 21:36:45,813] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1037 2023-01-11T21:38:06.8701966Z 2023-01-11T21:38:06.8702064Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8702140Z import torch 2023-01-11T21:38:06.8702208Z import random 2023-01-11T21:38:06.8702328Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8702452Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8702457Z 2023-01-11T21:38:06.8702538Z aten = torch.ops.aten 2023-01-11T21:38:06.8702674Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8702771Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8702776Z 2023-01-11T21:38:06.8702850Z import triton 2023-01-11T21:38:06.8702943Z import triton.language as tl 2023-01-11T21:38:06.8703061Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8703229Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8703235Z 2023-01-11T21:38:06.8703240Z 2023-01-11T21:38:06.8703378Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8703586Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8703710Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8703819Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8703921Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8703986Z { 2023-01-11T21:38:06.8704080Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8704146Z { 2023-01-11T21:38:06.8704241Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8704328Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8704395Z { 2023-01-11T21:38:06.8704487Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8704551Z { 2023-01-11T21:38:06.8704620Z { 2023-01-11T21:38:06.8704690Z { 2023-01-11T21:38:06.8704791Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8704903Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8705004Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8705107Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8705171Z } 2023-01-11T21:38:06.8705240Z } 2023-01-11T21:38:06.8705307Z } 2023-01-11T21:38:06.8705377Z } 2023-01-11T21:38:06.8705447Z } 2023-01-11T21:38:06.8705518Z } 
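// Annotation (editor's note, not part of the generated kernel): the flat
// offsets are dot(strides, indices) -- in_ptr1[(2*i1) + (30*i0)] matches the
// (30, 2) strides of arg1_1 built below via rand_strided((10, 10), (30, 2)),
// while the contiguous output uses i1 + (10*i0) for strides (10, 1).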
2023-01-11T21:38:06.8705624Z ''') 2023-01-11T21:38:06.8705630Z 2023-01-11T21:38:06.8705634Z 2023-01-11T21:38:06.8705730Z async_compile.wait(globals()) 2023-01-11T21:38:06.8705822Z del async_compile 2023-01-11T21:38:06.8705828Z 2023-01-11T21:38:06.8705903Z def call(args): 2023-01-11T21:38:06.8705983Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8706062Z args.clear() 2023-01-11T21:38:06.8706267Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8706435Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8706509Z del arg0_1 2023-01-11T21:38:06.8706607Z del arg1_1 2023-01-11T21:38:06.8706685Z return (buf0, ) 2023-01-11T21:38:06.8706690Z 2023-01-11T21:38:06.8706694Z 2023-01-11T21:38:06.8706775Z if __name__ == "__main__": 2023-01-11T21:38:06.8706895Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8707020Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8707226Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8707425Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8707536Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8707551Z 2023-01-11T21:38:06.8707615Z ok (1.674s) 2023-01-11T21:38:06.8708100Z test_cpu_broadcast2_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8708233Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8708492Z [2023-01-11 21:36:45,827] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1038 2023-01-11T21:38:06.8708757Z [2023-01-11 21:36:45,836] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1038 2023-01-11T21:38:06.8708763Z 2023-01-11T21:38:06.8708861Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8708962Z import torch 2023-01-11T21:38:06.8709037Z import random 2023-01-11T21:38:06.8709155Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8709271Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8709277Z 2023-01-11T21:38:06.8709358Z aten = torch.ops.aten 2023-01-11T21:38:06.8709499Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8709595Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8709600Z 2023-01-11T21:38:06.8709675Z import triton 2023-01-11T21:38:06.8709767Z import triton.language as tl 2023-01-11T21:38:06.8709891Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8710023Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8710035Z 2023-01-11T21:38:06.8710040Z 2023-01-11T21:38:06.8710169Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8710375Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8710504Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8710613Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8710717Z float* __restrict__ out_ptr0) 
2023-01-11T21:38:06.8710784Z {
2023-01-11T21:38:06.8710885Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8710947Z {
2023-01-11T21:38:06.8711030Z #pragma omp for
2023-01-11T21:38:06.8711116Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.8711184Z {
2023-01-11T21:38:06.8711273Z for(long i1=0; i1<1; i1+=1)
2023-01-11T21:38:06.8711341Z {
2023-01-11T21:38:06.8711483Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i1);
2023-01-11T21:38:06.8711622Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + (8*i1) + (10*i0));
2023-01-11T21:38:06.8711715Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8711826Z tmp2.store(out_ptr0 + (8*i1) + (10*i0));
2023-01-11T21:38:06.8711896Z }
2023-01-11T21:38:06.8711993Z #pragma omp simd simdlen(4)
2023-01-11T21:38:06.8712083Z for(long i1=8; i1<10; i1+=1)
2023-01-11T21:38:06.8712151Z {
2023-01-11T21:38:06.8712234Z auto tmp0 = in_ptr0[i1];
2023-01-11T21:38:06.8712368Z auto tmp1 = in_ptr1[i1 + (10*i0)];
2023-01-11T21:38:06.8712461Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8712564Z out_ptr0[i1 + (10*i0)] = tmp2;
2023-01-11T21:38:06.8712634Z }
2023-01-11T21:38:06.8712703Z }
2023-01-11T21:38:06.8712768Z }
2023-01-11T21:38:06.8712826Z }
2023-01-11T21:38:06.8712914Z ''')
2023-01-11T21:38:06.8712920Z
2023-01-11T21:38:06.8712925Z
2023-01-11T21:38:06.8713019Z async_compile.wait(globals())
2023-01-11T21:38:06.8713103Z del async_compile
2023-01-11T21:38:06.8713108Z
2023-01-11T21:38:06.8713184Z def call(args):
2023-01-11T21:38:06.8713263Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8713341Z args.clear()
2023-01-11T21:38:06.8713549Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8713710Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8713782Z del arg0_1
2023-01-11T21:38:06.8713857Z del arg1_1
2023-01-11T21:38:06.8713932Z return (buf0, )
2023-01-11T21:38:06.8713937Z
2023-01-11T21:38:06.8713942Z
2023-01-11T21:38:06.8714025Z if __name__ == "__main__":
2023-01-11T21:38:06.8714143Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8714269Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8714468Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8714666Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8714786Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.8714820Z
2023-01-11T21:38:06.8714892Z ok (0.022s)
2023-01-11T21:38:06.8715375Z test_cpu_broadcast3_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.8715506Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.8715766Z [2023-01-11 21:36:45,849] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1039
2023-01-11T21:38:06.8716030Z [2023-01-11 21:36:47,508] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1039
2023-01-11T21:38:06.8716036Z
2023-01-11T21:38:06.8716134Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.8716208Z import torch
2023-01-11T21:38:06.8716278Z import random
2023-01-11T21:38:06.8716398Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.8716521Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.8716526Z
2023-01-11T21:38:06.8716610Z aten = torch.ops.aten
2023-01-11T21:38:06.8716750Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.8716846Z async_compile = AsyncCompile()
2023-01-11T21:38:06.8716851Z
2023-01-11T21:38:06.8716925Z import triton
2023-01-11T21:38:06.8717010Z import triton.language as tl
2023-01-11T21:38:06.8717136Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.8717276Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.8717282Z
2023-01-11T21:38:06.8717286Z
2023-01-11T21:38:06.8717422Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.8717628Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.8717757Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.8717865Z const float* __restrict__ in_ptr1,
2023-01-11T21:38:06.8717972Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.8718030Z {
2023-01-11T21:38:06.8718132Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8718228Z {
2023-01-11T21:38:06.8718312Z #pragma omp for
2023-01-11T21:38:06.8718402Z for(long i0=0; i0<1; i0+=1)
2023-01-11T21:38:06.8718469Z {
2023-01-11T21:38:06.8718599Z auto tmp0 = at::vec::Vectorized<float>(in_ptr0[0]);
2023-01-11T21:38:06.8718729Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0);
2023-01-11T21:38:06.8718819Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8718915Z tmp2.store(out_ptr0 + 8*i0);
2023-01-11T21:38:06.8718983Z }
2023-01-11T21:38:06.8719083Z #pragma omp for simd simdlen(4)
2023-01-11T21:38:06.8719169Z for(long i0=8; i0<10; i0+=1)
2023-01-11T21:38:06.8719241Z {
2023-01-11T21:38:06.8719321Z auto tmp0 = in_ptr0[0];
2023-01-11T21:38:06.8719408Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.8719497Z auto tmp2 = tmp0 + tmp1;
2023-01-11T21:38:06.8719583Z out_ptr0[i0] = tmp2;
2023-01-11T21:38:06.8719652Z }
2023-01-11T21:38:06.8719721Z }
2023-01-11T21:38:06.8719785Z }
2023-01-11T21:38:06.8719864Z ''')
2023-01-11T21:38:06.8719869Z
2023-01-11T21:38:06.8719874Z
2023-01-11T21:38:06.8719972Z async_compile.wait(globals())
2023-01-11T21:38:06.8720050Z del async_compile
2023-01-11T21:38:06.8720055Z
2023-01-11T21:38:06.8720130Z def call(args):
2023-01-11T21:38:06.8720209Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8720283Z args.clear()
2023-01-11T21:38:06.8720478Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8720645Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8720742Z del arg0_1
2023-01-11T21:38:06.8720815Z del arg1_1
2023-01-11T21:38:06.8720891Z return (buf0, )
2023-01-11T21:38:06.8720896Z 2023-01-11T21:38:06.8720901Z 2023-01-11T21:38:06.8720983Z if __name__ == "__main__": 2023-01-11T21:38:06.8721101Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8721231Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8721423Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8721610Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8721731Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8721736Z 2023-01-11T21:38:06.8721809Z ok (1.672s) 2023-01-11T21:38:06.8722288Z test_cpu_broadcast3_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8722421Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8722683Z [2023-01-11 21:36:47,521] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1040 2023-01-11T21:38:06.8722948Z [2023-01-11 21:36:47,528] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1040 2023-01-11T21:38:06.8722954Z 2023-01-11T21:38:06.8723051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8723126Z import torch 2023-01-11T21:38:06.8723200Z import random 2023-01-11T21:38:06.8723312Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8723434Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8723439Z 2023-01-11T21:38:06.8723522Z aten = torch.ops.aten 2023-01-11T21:38:06.8723661Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8723757Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8723762Z 2023-01-11T21:38:06.8723836Z import triton 2023-01-11T21:38:06.8723929Z import triton.language as tl 2023-01-11T21:38:06.8724046Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8724212Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8724218Z 2023-01-11T21:38:06.8724223Z 2023-01-11T21:38:06.8724360Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8724566Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8724692Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8724802Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8724908Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8724975Z { 2023-01-11T21:38:06.8725073Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8725138Z { 2023-01-11T21:38:06.8725220Z #pragma omp for 2023-01-11T21:38:06.8725306Z for(long i0=0; i0<1; i0+=1) 2023-01-11T21:38:06.8725374Z { 2023-01-11T21:38:06.8725501Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8725642Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8725725Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8725821Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8725892Z } 2023-01-11T21:38:06.8725991Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8726077Z for(long i0=8; i0<10; i0+=1) 2023-01-11T21:38:06.8726143Z { 2023-01-11T21:38:06.8726229Z auto tmp0 = 
in_ptr0[0]; 2023-01-11T21:38:06.8726311Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8726399Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8726485Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8726591Z } 2023-01-11T21:38:06.8726656Z } 2023-01-11T21:38:06.8726724Z } 2023-01-11T21:38:06.8726810Z ''') 2023-01-11T21:38:06.8726816Z 2023-01-11T21:38:06.8726820Z 2023-01-11T21:38:06.8726908Z async_compile.wait(globals()) 2023-01-11T21:38:06.8726985Z del async_compile 2023-01-11T21:38:06.8726990Z 2023-01-11T21:38:06.8727069Z def call(args): 2023-01-11T21:38:06.8727150Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8727225Z args.clear() 2023-01-11T21:38:06.8727433Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8727599Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8727665Z del arg0_1 2023-01-11T21:38:06.8727738Z del arg1_1 2023-01-11T21:38:06.8727816Z return (buf0, ) 2023-01-11T21:38:06.8727822Z 2023-01-11T21:38:06.8727826Z 2023-01-11T21:38:06.8727906Z if __name__ == "__main__": 2023-01-11T21:38:06.8728027Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8728154Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8728349Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8728556Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8728672Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8728677Z 2023-01-11T21:38:06.8728747Z ok (0.020s) 2023-01-11T21:38:06.8729229Z test_cpu_broadcast3_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8729361Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8729624Z [2023-01-11 21:36:47,542] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1041 2023-01-11T21:38:06.8729887Z [2023-01-11 21:36:49,187] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1041 2023-01-11T21:38:06.8729893Z 2023-01-11T21:38:06.8730017Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8730095Z import torch 2023-01-11T21:38:06.8730170Z import random 2023-01-11T21:38:06.8730283Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8730407Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8730413Z 2023-01-11T21:38:06.8730495Z aten = torch.ops.aten 2023-01-11T21:38:06.8730631Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8730726Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8730731Z 2023-01-11T21:38:06.8730806Z import triton 2023-01-11T21:38:06.8730901Z import triton.language as tl 2023-01-11T21:38:06.8731027Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8731160Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8731165Z 2023-01-11T21:38:06.8731176Z 2023-01-11T21:38:06.8731306Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8731513Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8731637Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8731748Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8731851Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8731917Z { 2023-01-11T21:38:06.8731983Z { 2023-01-11T21:38:06.8732043Z { 2023-01-11T21:38:06.8732131Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8732217Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8732307Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8732394Z out_ptr0[0] = tmp2; 2023-01-11T21:38:06.8732491Z } 2023-01-11T21:38:06.8732550Z } 2023-01-11T21:38:06.8732614Z } 2023-01-11T21:38:06.8732700Z ''') 2023-01-11T21:38:06.8732706Z 2023-01-11T21:38:06.8732710Z 2023-01-11T21:38:06.8732806Z async_compile.wait(globals()) 2023-01-11T21:38:06.8732883Z del async_compile 2023-01-11T21:38:06.8732888Z 2023-01-11T21:38:06.8732965Z def call(args): 2023-01-11T21:38:06.8733046Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8733122Z args.clear() 2023-01-11T21:38:06.8733307Z buf0 = empty_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8733475Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8733547Z del arg0_1 2023-01-11T21:38:06.8733619Z del arg1_1 2023-01-11T21:38:06.8733696Z return (buf0, ) 2023-01-11T21:38:06.8733702Z 2023-01-11T21:38:06.8733706Z 2023-01-11T21:38:06.8733790Z if __name__ == "__main__": 2023-01-11T21:38:06.8733909Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8734032Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8734227Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8734417Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8734654Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8734659Z 2023-01-11T21:38:06.8734733Z ok (1.659s) 
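The TypedStorage deprecation warning repeated throughout this run comes from the test helper at test_torchinductor.py:246, which reads the storage length through the deprecated x.storage().size(). A minimal sketch of the replacement the warning points at, assuming the UntypedStorage API it names (the helper name storage_flat_view is illustrative, not part of the test suite): an untyped storage reports its length in bytes, so the element count is nbytes() divided by the tensor's element size.

import torch

def storage_flat_view(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: builds the same flat copy of the backing storage
    # as the warned-about line, without the deprecated TypedStorage path.
    # x.storage().size() counted elements; UntypedStorage counts bytes.
    numel = x.untyped_storage().nbytes() // x.element_size()
    return torch.as_strided(x, (numel,), (1,), 0).clone()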
2023-01-11T21:38:06.8735297Z test_cpu_broadcast3_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8735442Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8735742Z [2023-01-11 21:36:49,200] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1042 2023-01-11T21:38:06.8736042Z [2023-01-11 21:36:50,877] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1042 2023-01-11T21:38:06.8736048Z 2023-01-11T21:38:06.8736196Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8736266Z import torch 2023-01-11T21:38:06.8736343Z import random 2023-01-11T21:38:06.8736472Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8736607Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8736612Z 2023-01-11T21:38:06.8736696Z aten = torch.ops.aten 2023-01-11T21:38:06.8736845Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8736946Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8736951Z 2023-01-11T21:38:06.8737027Z import triton 2023-01-11T21:38:06.8737117Z import triton.language as tl 2023-01-11T21:38:06.8737309Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8737452Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8737457Z 2023-01-11T21:38:06.8737462Z 2023-01-11T21:38:06.8737601Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8737807Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8737932Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8738041Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8738143Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8738202Z { 2023-01-11T21:38:06.8738304Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8738369Z { 2023-01-11T21:38:06.8738451Z #pragma omp for 2023-01-11T21:38:06.8738538Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8738605Z { 2023-01-11T21:38:06.8738726Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8738900Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8738992Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8739088Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8739156Z } 2023-01-11T21:38:06.8739259Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8739349Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8739409Z { 2023-01-11T21:38:06.8739497Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8739590Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8739678Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8739768Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8739836Z } 2023-01-11T21:38:06.8739903Z } 2023-01-11T21:38:06.8739960Z } 2023-01-11T21:38:06.8740046Z ''') 2023-01-11T21:38:06.8740052Z 2023-01-11T21:38:06.8740056Z 2023-01-11T21:38:06.8740149Z async_compile.wait(globals()) 2023-01-11T21:38:06.8740230Z del async_compile 2023-01-11T21:38:06.8740235Z 2023-01-11T21:38:06.8740311Z def call(args): 2023-01-11T21:38:06.8740391Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8740466Z args.clear() 
2023-01-11T21:38:06.8740665Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8740827Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8740899Z del arg0_1 2023-01-11T21:38:06.8740972Z del arg1_1 2023-01-11T21:38:06.8741048Z return (buf0, ) 2023-01-11T21:38:06.8741053Z 2023-01-11T21:38:06.8741057Z 2023-01-11T21:38:06.8741139Z if __name__ == "__main__": 2023-01-11T21:38:06.8741257Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8741384Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8741568Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8741766Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8741889Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8741895Z 2023-01-11T21:38:06.8741968Z ok (1.690s) 2023-01-11T21:38:06.8742477Z test_cpu_broadcast3_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8742612Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8742871Z [2023-01-11 21:36:50,891] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1043 2023-01-11T21:38:06.8743134Z [2023-01-11 21:36:52,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1043 2023-01-11T21:38:06.8743142Z 2023-01-11T21:38:06.8743243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8743318Z import torch 2023-01-11T21:38:06.8743386Z import random 2023-01-11T21:38:06.8743505Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8743629Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8743636Z 2023-01-11T21:38:06.8743718Z aten = torch.ops.aten 2023-01-11T21:38:06.8743854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8743950Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8743955Z 2023-01-11T21:38:06.8744029Z import triton 2023-01-11T21:38:06.8744114Z import triton.language as tl 2023-01-11T21:38:06.8744239Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8744381Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8744386Z 2023-01-11T21:38:06.8744390Z 2023-01-11T21:38:06.8744526Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8744759Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8744880Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8744994Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8745104Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8745162Z { 2023-01-11T21:38:06.8745265Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8745332Z { 2023-01-11T21:38:06.8745417Z #pragma omp for 2023-01-11T21:38:06.8745507Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8745574Z { 2023-01-11T21:38:06.8745644Z { 2023-01-11T21:38:06.8745705Z { 2023-01-11T21:38:06.8745803Z auto tmp0 = in_ptr0[0]; 
2023-01-11T21:38:06.8745904Z auto tmp2 = in_ptr1[i0];
2023-01-11T21:38:06.8746019Z auto tmp1 = static_cast<double>(tmp0);
2023-01-11T21:38:06.8746117Z auto tmp3 = tmp1 + tmp2;
2023-01-11T21:38:06.8746208Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.8746280Z }
2023-01-11T21:38:06.8746340Z }
2023-01-11T21:38:06.8746408Z }
2023-01-11T21:38:06.8746474Z }
2023-01-11T21:38:06.8746541Z }
2023-01-11T21:38:06.8746626Z ''')
2023-01-11T21:38:06.8746637Z
2023-01-11T21:38:06.8746642Z
2023-01-11T21:38:06.8746735Z async_compile.wait(globals())
2023-01-11T21:38:06.8746812Z del async_compile
2023-01-11T21:38:06.8746817Z
2023-01-11T21:38:06.8746885Z def call(args):
2023-01-11T21:38:06.8746965Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8747041Z args.clear()
2023-01-11T21:38:06.8747241Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.8747410Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8747483Z del arg0_1
2023-01-11T21:38:06.8747554Z del arg1_1
2023-01-11T21:38:06.8747625Z return (buf0, )
2023-01-11T21:38:06.8747631Z
2023-01-11T21:38:06.8747644Z
2023-01-11T21:38:06.8747717Z if __name__ == "__main__":
2023-01-11T21:38:06.8747836Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8747964Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8748186Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8748389Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64)
2023-01-11T21:38:06.8748510Z print_performance(lambda: call([arg0_1, arg1_1]))
2023-01-11T21:38:06.8748517Z
2023-01-11T21:38:06.8748588Z ok (1.674s)
2023-01-11T21:38:06.8749063Z test_cpu_broadcast3_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:06.8749190Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone()
2023-01-11T21:38:06.8749448Z [2023-01-11 21:36:52,565] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1044
2023-01-11T21:38:06.8749714Z [2023-01-11 21:36:54,231] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1044
2023-01-11T21:38:06.8749720Z
2023-01-11T21:38:06.8749821Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.8749896Z import torch
2023-01-11T21:38:06.8749970Z import random
2023-01-11T21:38:06.8750089Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.8750214Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.8750219Z
2023-01-11T21:38:06.8750304Z aten = torch.ops.aten
2023-01-11T21:38:06.8750433Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.8750591Z async_compile = AsyncCompile()
2023-01-11T21:38:06.8750597Z
2023-01-11T21:38:06.8750673Z import triton
2023-01-11T21:38:06.8750766Z import triton.language as tl
2023-01-11T21:38:06.8750892Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.8751032Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.8751040Z
2023-01-11T21:38:06.8751044Z
2023-01-11T21:38:06.8751181Z kernel_cpp_0 = async_compile.cpp('''
2023-01-11T21:38:06.8751386Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h"
2023-01-11T21:38:06.8751506Z extern "C" void kernel(const float* __restrict__ in_ptr0,
2023-01-11T21:38:06.8751615Z const int* __restrict__ in_ptr1,
2023-01-11T21:38:06.8751719Z float* __restrict__ out_ptr0)
2023-01-11T21:38:06.8751786Z {
2023-01-11T21:38:06.8751891Z #pragma omp parallel num_threads(8)
2023-01-11T21:38:06.8751956Z {
2023-01-11T21:38:06.8752040Z #pragma omp for
2023-01-11T21:38:06.8752121Z for(long i0=0; i0<10; i0+=1)
2023-01-11T21:38:06.8752187Z {
2023-01-11T21:38:06.8752256Z {
2023-01-11T21:38:06.8752325Z {
2023-01-11T21:38:06.8752421Z auto tmp0 = in_ptr0[0];
2023-01-11T21:38:06.8752522Z auto tmp1 = in_ptr1[i0];
2023-01-11T21:38:06.8752627Z auto tmp2 = static_cast<float>(tmp1);
2023-01-11T21:38:06.8752725Z auto tmp3 = tmp0 + tmp2;
2023-01-11T21:38:06.8752816Z out_ptr0[i0] = tmp3;
2023-01-11T21:38:06.8752886Z }
2023-01-11T21:38:06.8752953Z }
2023-01-11T21:38:06.8753021Z }
2023-01-11T21:38:06.8753087Z }
2023-01-11T21:38:06.8753144Z }
2023-01-11T21:38:06.8753228Z ''')
2023-01-11T21:38:06.8753233Z
2023-01-11T21:38:06.8753239Z
2023-01-11T21:38:06.8753332Z async_compile.wait(globals())
2023-01-11T21:38:06.8753411Z del async_compile
2023-01-11T21:38:06.8753418Z
2023-01-11T21:38:06.8753495Z def call(args):
2023-01-11T21:38:06.8753573Z arg0_1, arg1_1 = args
2023-01-11T21:38:06.8753649Z args.clear()
2023-01-11T21:38:06.8753835Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32)
2023-01-11T21:38:06.8754030Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr()))
2023-01-11T21:38:06.8754107Z del arg0_1
2023-01-11T21:38:06.8754182Z del arg1_1
2023-01-11T21:38:06.8754260Z return (buf0, )
2023-01-11T21:38:06.8754265Z
2023-01-11T21:38:06.8754270Z
2023-01-11T21:38:06.8754353Z if __name__ == "__main__":
2023-01-11T21:38:06.8754473Z from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.8754603Z from torch._inductor.utils import print_performance
2023-01-11T21:38:06.8754791Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8754985Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8755111Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8755116Z 2023-01-11T21:38:06.8755189Z ok (1.680s) 2023-01-11T21:38:06.8755671Z test_cpu_broadcast3_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8755808Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8756071Z [2023-01-11 21:36:54,245] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1045 2023-01-11T21:38:06.8756336Z [2023-01-11 21:36:55,904] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1045 2023-01-11T21:38:06.8756342Z 2023-01-11T21:38:06.8756468Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8756543Z import torch 2023-01-11T21:38:06.8756610Z import random 2023-01-11T21:38:06.8756730Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8756854Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8756859Z 2023-01-11T21:38:06.8756944Z aten = torch.ops.aten 2023-01-11T21:38:06.8757081Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8757178Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8757184Z 2023-01-11T21:38:06.8757257Z import triton 2023-01-11T21:38:06.8757342Z import triton.language as tl 2023-01-11T21:38:06.8757466Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8757605Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8757611Z 2023-01-11T21:38:06.8757615Z 2023-01-11T21:38:06.8757752Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8757957Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8758083Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8758192Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8758297Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8758355Z { 2023-01-11T21:38:06.8758462Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8758529Z { 2023-01-11T21:38:06.8758623Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8758711Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8758782Z { 2023-01-11T21:38:06.8758872Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8758932Z { 2023-01-11T21:38:06.8759000Z { 2023-01-11T21:38:06.8759071Z { 2023-01-11T21:38:06.8759171Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8759283Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8759387Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8759489Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8759553Z } 2023-01-11T21:38:06.8759621Z } 2023-01-11T21:38:06.8759689Z } 2023-01-11T21:38:06.8759755Z } 2023-01-11T21:38:06.8759822Z } 2023-01-11T21:38:06.8759916Z } 2023-01-11T21:38:06.8759994Z ''') 2023-01-11T21:38:06.8760000Z 2023-01-11T21:38:06.8760011Z 2023-01-11T21:38:06.8760099Z async_compile.wait(globals()) 2023-01-11T21:38:06.8760177Z del 
async_compile 2023-01-11T21:38:06.8760182Z 2023-01-11T21:38:06.8760257Z def call(args): 2023-01-11T21:38:06.8760336Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8760411Z args.clear() 2023-01-11T21:38:06.8760613Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8760779Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8760848Z del arg0_1 2023-01-11T21:38:06.8760919Z del arg1_1 2023-01-11T21:38:06.8760995Z return (buf0, ) 2023-01-11T21:38:06.8761000Z 2023-01-11T21:38:06.8761005Z 2023-01-11T21:38:06.8761086Z if __name__ == "__main__": 2023-01-11T21:38:06.8761203Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8761331Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8761522Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8761721Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8761833Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8761838Z 2023-01-11T21:38:06.8761910Z ok (1.673s) 2023-01-11T21:38:06.8762393Z test_cpu_broadcast3_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8762557Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8762818Z [2023-01-11 21:36:55,918] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1046 2023-01-11T21:38:06.8763081Z [2023-01-11 21:36:55,926] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1046 2023-01-11T21:38:06.8763087Z 2023-01-11T21:38:06.8763185Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8763260Z import torch 2023-01-11T21:38:06.8763336Z import random 2023-01-11T21:38:06.8763447Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8763571Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8763577Z 2023-01-11T21:38:06.8763658Z aten = torch.ops.aten 2023-01-11T21:38:06.8763798Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8763897Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8763903Z 2023-01-11T21:38:06.8763980Z import triton 2023-01-11T21:38:06.8764073Z import triton.language as tl 2023-01-11T21:38:06.8764197Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8764334Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8764339Z 2023-01-11T21:38:06.8764344Z 2023-01-11T21:38:06.8764480Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8764683Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8764806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8764914Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8765015Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8765084Z { 2023-01-11T21:38:06.8765178Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8765248Z { 2023-01-11T21:38:06.8765330Z #pragma omp for 2023-01-11T21:38:06.8765418Z for(long i0=0; 
i0<12; i0+=1) 2023-01-11T21:38:06.8765486Z { 2023-01-11T21:38:06.8765614Z auto tmp0 = at::vec::Vectorized(in_ptr0[0]); 2023-01-11T21:38:06.8765782Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8765872Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8765961Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8766030Z } 2023-01-11T21:38:06.8766129Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8766216Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8766282Z { 2023-01-11T21:38:06.8766371Z auto tmp0 = in_ptr0[0]; 2023-01-11T21:38:06.8766452Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8766540Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8766625Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8766696Z } 2023-01-11T21:38:06.8766764Z } 2023-01-11T21:38:06.8766829Z } 2023-01-11T21:38:06.8766914Z ''') 2023-01-11T21:38:06.8766919Z 2023-01-11T21:38:06.8766924Z 2023-01-11T21:38:06.8767011Z async_compile.wait(globals()) 2023-01-11T21:38:06.8767089Z del async_compile 2023-01-11T21:38:06.8767094Z 2023-01-11T21:38:06.8767171Z def call(args): 2023-01-11T21:38:06.8767251Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8767326Z args.clear() 2023-01-11T21:38:06.8767523Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8767688Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8767761Z del arg0_1 2023-01-11T21:38:06.8767825Z del arg1_1 2023-01-11T21:38:06.8767899Z return (buf0, ) 2023-01-11T21:38:06.8767904Z 2023-01-11T21:38:06.8767909Z 2023-01-11T21:38:06.8767988Z if __name__ == "__main__": 2023-01-11T21:38:06.8768108Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8768275Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8768466Z arg0_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8768663Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8768785Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8768791Z 2023-01-11T21:38:06.8768854Z ok (0.022s) 2023-01-11T21:38:06.8769329Z test_cpu_dense_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8769460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8769720Z [2023-01-11 21:36:55,939] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1047 2023-01-11T21:38:06.8769984Z [2023-01-11 21:36:57,602] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1047 2023-01-11T21:38:06.8769989Z 2023-01-11T21:38:06.8770090Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8770166Z import torch 2023-01-11T21:38:06.8770242Z import random 2023-01-11T21:38:06.8770363Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8770479Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8770484Z 2023-01-11T21:38:06.8770565Z aten = torch.ops.aten 2023-01-11T21:38:06.8770700Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8770796Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8770801Z 2023-01-11T21:38:06.8770878Z import triton 2023-01-11T21:38:06.8770971Z import triton.language as tl 2023-01-11T21:38:06.8771098Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8771230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8771245Z 2023-01-11T21:38:06.8771250Z 2023-01-11T21:38:06.8771378Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8771609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8771739Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8771850Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8771956Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8772026Z { 2023-01-11T21:38:06.8772132Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8772194Z { 2023-01-11T21:38:06.8772278Z #pragma omp for 2023-01-11T21:38:06.8772369Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8772438Z { 2023-01-11T21:38:06.8772529Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8772603Z { 2023-01-11T21:38:06.8772754Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8772889Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.8772986Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8773103Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8773174Z } 2023-01-11T21:38:06.8773273Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8773366Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8773439Z { 2023-01-11T21:38:06.8773536Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8773631Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8773725Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8773827Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8773899Z } 2023-01-11T21:38:06.8773969Z } 2023-01-11T21:38:06.8774063Z } 2023-01-11T21:38:06.8774120Z } 2023-01-11T21:38:06.8774205Z ''') 2023-01-11T21:38:06.8774210Z 2023-01-11T21:38:06.8774215Z 2023-01-11T21:38:06.8774308Z async_compile.wait(globals()) 2023-01-11T21:38:06.8774385Z del async_compile 2023-01-11T21:38:06.8774390Z 2023-01-11T21:38:06.8774465Z def call(args): 2023-01-11T21:38:06.8774656Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8774733Z args.clear() 2023-01-11T21:38:06.8774937Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8775097Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), 
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8775171Z del arg0_1 2023-01-11T21:38:06.8775241Z del arg1_1 2023-01-11T21:38:06.8775316Z return (buf0, ) 2023-01-11T21:38:06.8775321Z 2023-01-11T21:38:06.8775326Z 2023-01-11T21:38:06.8775408Z if __name__ == "__main__": 2023-01-11T21:38:06.8775525Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8775660Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8775853Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8776046Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8776170Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8776175Z 2023-01-11T21:38:06.8776247Z ok (1.678s) 2023-01-11T21:38:06.8776724Z test_cpu_dense_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8776856Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8777113Z [2023-01-11 21:36:57,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1048 2023-01-11T21:38:06.8777440Z [2023-01-11 21:36:59,284] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1048 2023-01-11T21:38:06.8777446Z 2023-01-11T21:38:06.8777543Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8777672Z import torch 2023-01-11T21:38:06.8777744Z import random 2023-01-11T21:38:06.8777864Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8777987Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8777992Z 2023-01-11T21:38:06.8778075Z aten = torch.ops.aten 2023-01-11T21:38:06.8778210Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8778307Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8778313Z 2023-01-11T21:38:06.8778391Z import triton 2023-01-11T21:38:06.8778476Z import triton.language as tl 2023-01-11T21:38:06.8778601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8778742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8778748Z 2023-01-11T21:38:06.8778752Z 2023-01-11T21:38:06.8778892Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8779095Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8779220Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8779332Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8779438Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8779497Z { 2023-01-11T21:38:06.8779600Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8779670Z { 2023-01-11T21:38:06.8779751Z #pragma omp for 2023-01-11T21:38:06.8779843Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8779913Z { 2023-01-11T21:38:06.8780002Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8780062Z { 2023-01-11T21:38:06.8780248Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8780384Z auto tmp1 = at::vec::Vectorized(in_ptr1[i0]); 2023-01-11T21:38:06.8780479Z auto tmp2 = tmp0 + tmp1; 
2023-01-11T21:38:06.8780599Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8780671Z } 2023-01-11T21:38:06.8780769Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8780854Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8780925Z { 2023-01-11T21:38:06.8781027Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8781121Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8781212Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8781312Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8781383Z } 2023-01-11T21:38:06.8781446Z } 2023-01-11T21:38:06.8781519Z } 2023-01-11T21:38:06.8781590Z } 2023-01-11T21:38:06.8781679Z ''') 2023-01-11T21:38:06.8781684Z 2023-01-11T21:38:06.8781689Z 2023-01-11T21:38:06.8781785Z async_compile.wait(globals()) 2023-01-11T21:38:06.8781868Z del async_compile 2023-01-11T21:38:06.8781873Z 2023-01-11T21:38:06.8781949Z def call(args): 2023-01-11T21:38:06.8782032Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8782106Z args.clear() 2023-01-11T21:38:06.8782319Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8782487Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8782562Z del arg0_1 2023-01-11T21:38:06.8782636Z del arg1_1 2023-01-11T21:38:06.8782714Z return (buf0, ) 2023-01-11T21:38:06.8782719Z 2023-01-11T21:38:06.8782723Z 2023-01-11T21:38:06.8782807Z if __name__ == "__main__": 2023-01-11T21:38:06.8782919Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8783051Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8783256Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8783466Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8783589Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8783626Z 2023-01-11T21:38:06.8783701Z ok (1.680s) 2023-01-11T21:38:06.8784177Z test_cpu_dense_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8784312Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8784573Z [2023-01-11 21:36:59,298] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1049 2023-01-11T21:38:06.8784844Z [2023-01-11 21:37:00,970] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1049 2023-01-11T21:38:06.8784850Z 2023-01-11T21:38:06.8784943Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8785020Z import torch 2023-01-11T21:38:06.8785101Z import random 2023-01-11T21:38:06.8785224Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8785351Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8785356Z 2023-01-11T21:38:06.8785442Z aten = torch.ops.aten 2023-01-11T21:38:06.8785583Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8785674Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8785686Z 2023-01-11T21:38:06.8785756Z import triton 2023-01-11T21:38:06.8785852Z import triton.language as tl 2023-01-11T21:38:06.8785979Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8786151Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8786156Z 2023-01-11T21:38:06.8786161Z 2023-01-11T21:38:06.8786301Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8786509Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8786637Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8786742Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8786849Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8786917Z { 2023-01-11T21:38:06.8787022Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8787091Z { 2023-01-11T21:38:06.8787175Z #pragma omp for 2023-01-11T21:38:06.8787264Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8787327Z { 2023-01-11T21:38:06.8787471Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8787600Z auto tmp1 = at::vec::Vectorized(in_ptr1[0]); 2023-01-11T21:38:06.8787696Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8787795Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8787863Z } 2023-01-11T21:38:06.8787964Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8788054Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8788125Z { 2023-01-11T21:38:06.8788221Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8788309Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8788400Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8788489Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8788561Z } 2023-01-11T21:38:06.8788623Z } 2023-01-11T21:38:06.8788691Z } 2023-01-11T21:38:06.8788779Z ''') 2023-01-11T21:38:06.8788785Z 2023-01-11T21:38:06.8788790Z 2023-01-11T21:38:06.8788885Z async_compile.wait(globals()) 2023-01-11T21:38:06.8788965Z del async_compile 2023-01-11T21:38:06.8788971Z 2023-01-11T21:38:06.8789053Z def call(args): 2023-01-11T21:38:06.8789134Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8789205Z args.clear() 2023-01-11T21:38:06.8789407Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8789576Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8789680Z del arg0_1 2023-01-11T21:38:06.8789756Z del arg1_1 2023-01-11T21:38:06.8789834Z return 
(buf0, ) 2023-01-11T21:38:06.8789840Z 2023-01-11T21:38:06.8789844Z 2023-01-11T21:38:06.8789927Z if __name__ == "__main__": 2023-01-11T21:38:06.8790047Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8790169Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8790370Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8790568Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8790692Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8790697Z 2023-01-11T21:38:06.8790773Z ok (1.687s) 2023-01-11T21:38:06.8791250Z test_cpu_dense_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8791386Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8791648Z [2023-01-11 21:37:00,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1050 2023-01-11T21:38:06.8791913Z [2023-01-11 21:37:00,995] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1050 2023-01-11T21:38:06.8791919Z 2023-01-11T21:38:06.8792019Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8792114Z import torch 2023-01-11T21:38:06.8792189Z import random 2023-01-11T21:38:06.8792308Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8792433Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8792438Z 2023-01-11T21:38:06.8792520Z aten = torch.ops.aten 2023-01-11T21:38:06.8792659Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8792756Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8792761Z 2023-01-11T21:38:06.8792828Z import triton 2023-01-11T21:38:06.8792920Z import triton.language as tl 2023-01-11T21:38:06.8793045Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8793185Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8793190Z 2023-01-11T21:38:06.8793195Z 2023-01-11T21:38:06.8793332Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8793537Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8793663Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8793772Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8793869Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8793934Z { 2023-01-11T21:38:06.8794037Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8794104Z { 2023-01-11T21:38:06.8794185Z #pragma omp for 2023-01-11T21:38:06.8794274Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.8794343Z { 2023-01-11T21:38:06.8794475Z auto tmp0 = at::vec::Vectorized::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.8794611Z auto tmp1 = at::vec::Vectorized::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.8794703Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8794799Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.8794867Z } 2023-01-11T21:38:06.8794965Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.8795057Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.8795116Z { 2023-01-11T21:38:06.8795205Z 
auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8795293Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8795380Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8795496Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8795564Z } 2023-01-11T21:38:06.8795632Z } 2023-01-11T21:38:06.8795689Z } 2023-01-11T21:38:06.8795777Z ''') 2023-01-11T21:38:06.8795783Z 2023-01-11T21:38:06.8795787Z 2023-01-11T21:38:06.8795882Z async_compile.wait(globals()) 2023-01-11T21:38:06.8795964Z del async_compile 2023-01-11T21:38:06.8795970Z 2023-01-11T21:38:06.8796045Z def call(args): 2023-01-11T21:38:06.8796125Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8796201Z args.clear() 2023-01-11T21:38:06.8796392Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8796557Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8796633Z del arg0_1 2023-01-11T21:38:06.8796704Z del arg1_1 2023-01-11T21:38:06.8796779Z return (buf0, ) 2023-01-11T21:38:06.8796785Z 2023-01-11T21:38:06.8796789Z 2023-01-11T21:38:06.8796868Z if __name__ == "__main__": 2023-01-11T21:38:06.8796987Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8797114Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8797306Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8797504Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8797624Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8797629Z 2023-01-11T21:38:06.8797701Z ok (0.024s) 2023-01-11T21:38:06.8798176Z test_cpu_dense_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8798336Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8798595Z [2023-01-11 21:37:01,009] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1051 2023-01-11T21:38:06.8798858Z [2023-01-11 21:37:02,675] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1051 2023-01-11T21:38:06.8798864Z 2023-01-11T21:38:06.8798963Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8799038Z import torch 2023-01-11T21:38:06.8799105Z import random 2023-01-11T21:38:06.8799225Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8799348Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8799356Z 2023-01-11T21:38:06.8799437Z aten = torch.ops.aten 2023-01-11T21:38:06.8799573Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8799669Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8799674Z 2023-01-11T21:38:06.8799748Z import triton 2023-01-11T21:38:06.8799833Z import triton.language as tl 2023-01-11T21:38:06.8799961Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8800103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8800108Z 2023-01-11T21:38:06.8800113Z 2023-01-11T21:38:06.8800248Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8800452Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8800574Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8800687Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8800793Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8800856Z { 2023-01-11T21:38:06.8800959Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8801025Z { 2023-01-11T21:38:06.8801109Z #pragma omp for 2023-01-11T21:38:06.8801197Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8801264Z { 2023-01-11T21:38:06.8801332Z { 2023-01-11T21:38:06.8801422Z { 2023-01-11T21:38:06.8801523Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8801624Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8801738Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8801834Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8801925Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8801996Z } 2023-01-11T21:38:06.8802056Z } 2023-01-11T21:38:06.8802123Z } 2023-01-11T21:38:06.8802188Z } 2023-01-11T21:38:06.8802252Z } 2023-01-11T21:38:06.8802338Z ''') 2023-01-11T21:38:06.8802346Z 2023-01-11T21:38:06.8802351Z 2023-01-11T21:38:06.8802444Z async_compile.wait(globals()) 2023-01-11T21:38:06.8802514Z del async_compile 2023-01-11T21:38:06.8802526Z 2023-01-11T21:38:06.8802593Z def call(args): 2023-01-11T21:38:06.8802675Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8802752Z args.clear() 2023-01-11T21:38:06.8802953Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8803120Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8803194Z del arg0_1 2023-01-11T21:38:06.8803266Z del arg1_1 2023-01-11T21:38:06.8803334Z return (buf0, ) 2023-01-11T21:38:06.8803340Z 2023-01-11T21:38:06.8803345Z 2023-01-11T21:38:06.8803425Z if __name__ == "__main__": 2023-01-11T21:38:06.8803543Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8803673Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.8803871Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8804096Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8804215Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8804220Z 2023-01-11T21:38:06.8804290Z ok (1.680s) 2023-01-11T21:38:06.8804755Z test_cpu_dense_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8804886Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8805145Z [2023-01-11 21:37:02,690] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1052 2023-01-11T21:38:06.8805412Z [2023-01-11 21:37:04,355] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1052 2023-01-11T21:38:06.8805418Z 2023-01-11T21:38:06.8805516Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8805592Z import torch 2023-01-11T21:38:06.8805666Z import random 2023-01-11T21:38:06.8805788Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8805912Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8805917Z 2023-01-11T21:38:06.8805992Z aten = torch.ops.aten 2023-01-11T21:38:06.8806129Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8806226Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8806231Z 2023-01-11T21:38:06.8806305Z import triton 2023-01-11T21:38:06.8806397Z import triton.language as tl 2023-01-11T21:38:06.8806522Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8806661Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8806669Z 2023-01-11T21:38:06.8806673Z 2023-01-11T21:38:06.8806813Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8807011Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8807136Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8807271Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8807377Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8807442Z { 2023-01-11T21:38:06.8807545Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8807611Z { 2023-01-11T21:38:06.8807699Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8807786Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8807854Z { 2023-01-11T21:38:06.8807944Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8808011Z { 2023-01-11T21:38:06.8808080Z { 2023-01-11T21:38:06.8808151Z { 2023-01-11T21:38:06.8808257Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8808360Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8808477Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8808575Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8808679Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8808751Z } 2023-01-11T21:38:06.8808818Z } 2023-01-11T21:38:06.8808878Z } 2023-01-11T21:38:06.8808945Z } 2023-01-11T21:38:06.8809011Z } 2023-01-11T21:38:06.8809076Z } 2023-01-11T21:38:06.8809167Z ''') 2023-01-11T21:38:06.8809172Z 2023-01-11T21:38:06.8809177Z
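# [editor's note, not part of the log] In the dense+int kernel dumped above, the
# (10,)-shaped int input broadcasts across the rows of the (10, 10) float input;
# that is why the inner load is in_ptr1[i1] (column index only) while the store is
# out_ptr0[i1 + (10*i0)]. A minimal sketch checking that broadcast against eager
# PyTorch; the tensor names here are illustrative, not from the test suite:
import torch
a = torch.randn(10, 10)
b = torch.arange(10, dtype=torch.int32)
out = a + b                                    # b is expanded along dim 0
torch.testing.assert_close(out[3], a[3] + b)   # every row sees the same b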
2023-01-11T21:38:06.8809269Z async_compile.wait(globals()) 2023-01-11T21:38:06.8809345Z del async_compile 2023-01-11T21:38:06.8809351Z 2023-01-11T21:38:06.8809418Z def call(args): 2023-01-11T21:38:06.8809496Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8809577Z args.clear() 2023-01-11T21:38:06.8809816Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8809982Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8810060Z del arg0_1 2023-01-11T21:38:06.8810133Z del arg1_1 2023-01-11T21:38:06.8810203Z return (buf0, ) 2023-01-11T21:38:06.8810208Z 2023-01-11T21:38:06.8810220Z 2023-01-11T21:38:06.8810294Z if __name__ == "__main__": 2023-01-11T21:38:06.8810414Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8810540Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8810737Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8810929Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8811049Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8811054Z 2023-01-11T21:38:06.8811130Z ok (1.680s) 2023-01-11T21:38:06.8811603Z test_cpu_dense_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8811734Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8811986Z [2023-01-11 21:37:04,369] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1053 2023-01-11T21:38:06.8812252Z [2023-01-11 21:37:06,026] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1053 2023-01-11T21:38:06.8812258Z 2023-01-11T21:38:06.8812356Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8812430Z import torch 2023-01-11T21:38:06.8812503Z import random 2023-01-11T21:38:06.8812624Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8812748Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8812754Z 2023-01-11T21:38:06.8812836Z aten = torch.ops.aten 2023-01-11T21:38:06.8812965Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8813088Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8813094Z 2023-01-11T21:38:06.8813168Z import triton 2023-01-11T21:38:06.8813262Z import triton.language as tl 2023-01-11T21:38:06.8813387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8813528Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8813533Z 2023-01-11T21:38:06.8813538Z 2023-01-11T21:38:06.8813674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8813880Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8813996Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8814111Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8814216Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8814284Z { 2023-01-11T21:38:06.8814386Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8814454Z { 
2023-01-11T21:38:06.8814668Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8814748Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8814817Z { 2023-01-11T21:38:06.8814908Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8814976Z { 2023-01-11T21:38:06.8815045Z { 2023-01-11T21:38:06.8815117Z { 2023-01-11T21:38:06.8815220Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8815334Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8815434Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8815537Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8815655Z } 2023-01-11T21:38:06.8815725Z } 2023-01-11T21:38:06.8815793Z } 2023-01-11T21:38:06.8815853Z } 2023-01-11T21:38:06.8815919Z } 2023-01-11T21:38:06.8815984Z } 2023-01-11T21:38:06.8816071Z ''') 2023-01-11T21:38:06.8816076Z 2023-01-11T21:38:06.8816081Z 2023-01-11T21:38:06.8816176Z async_compile.wait(globals()) 2023-01-11T21:38:06.8816252Z del async_compile 2023-01-11T21:38:06.8816257Z 2023-01-11T21:38:06.8816332Z def call(args): 2023-01-11T21:38:06.8816413Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8816481Z args.clear() 2023-01-11T21:38:06.8816679Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8816844Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8816917Z del arg0_1 2023-01-11T21:38:06.8816989Z del arg1_1 2023-01-11T21:38:06.8817063Z return (buf0, ) 2023-01-11T21:38:06.8817071Z 2023-01-11T21:38:06.8817075Z 2023-01-11T21:38:06.8817236Z if __name__ == "__main__": 2023-01-11T21:38:06.8817360Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8817487Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8817691Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8817893Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8818011Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8818016Z 2023-01-11T21:38:06.8818086Z ok (1.671s) 2023-01-11T21:38:06.8818561Z test_cpu_dense_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8818696Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8818954Z [2023-01-11 21:37:06,040] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1054 2023-01-11T21:38:06.8819256Z [2023-01-11 21:37:07,707] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1054 2023-01-11T21:38:06.8819262Z 2023-01-11T21:38:06.8819354Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8819430Z import torch 2023-01-11T21:38:06.8819504Z import random 2023-01-11T21:38:06.8819623Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8819747Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8819752Z 2023-01-11T21:38:06.8819833Z aten = torch.ops.aten 2023-01-11T21:38:06.8819971Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8820060Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8820076Z 2023-01-11T21:38:06.8820143Z import triton 2023-01-11T21:38:06.8820238Z import triton.language as tl 2023-01-11T21:38:06.8820365Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8820507Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8820513Z 2023-01-11T21:38:06.8820519Z 2023-01-11T21:38:06.8820657Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8820862Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8820986Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8821088Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8821192Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8821260Z { 2023-01-11T21:38:06.8821360Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8821426Z { 2023-01-11T21:38:06.8821522Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8821642Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8821702Z { 2023-01-11T21:38:06.8821792Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8821862Z { 2023-01-11T21:38:06.8821930Z { 2023-01-11T21:38:06.8822000Z { 2023-01-11T21:38:06.8822112Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8822221Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8822314Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8822415Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8822485Z } 2023-01-11T21:38:06.8822553Z } 2023-01-11T21:38:06.8822622Z } 2023-01-11T21:38:06.8822692Z } 2023-01-11T21:38:06.8822759Z } 2023-01-11T21:38:06.8822816Z } 2023-01-11T21:38:06.8822902Z ''') 2023-01-11T21:38:06.8822907Z 2023-01-11T21:38:06.8822912Z 2023-01-11T21:38:06.8823008Z async_compile.wait(globals()) 2023-01-11T21:38:06.8823088Z del async_compile 2023-01-11T21:38:06.8823093Z 2023-01-11T21:38:06.8823168Z def call(args): 2023-01-11T21:38:06.8823248Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8823323Z args.clear() 2023-01-11T21:38:06.8823512Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8823681Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8823754Z del arg0_1 2023-01-11T21:38:06.8823827Z del arg1_1 2023-01-11T21:38:06.8823902Z return (buf0, ) 2023-01-11T21:38:06.8823908Z 2023-01-11T21:38:06.8823912Z 2023-01-11T21:38:06.8823992Z if __name__ == "__main__": 2023-01-11T21:38:06.8824109Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8824239Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8824430Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8824627Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8824747Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8824752Z 2023-01-11T21:38:06.8824823Z ok (1.681s) 2023-01-11T21:38:06.8825329Z test_cpu_double_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8825465Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8825724Z [2023-01-11 21:37:07,721] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1055 2023-01-11T21:38:06.8825987Z [2023-01-11 21:37:09,378] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1055 2023-01-11T21:38:06.8825997Z 2023-01-11T21:38:06.8826094Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8826162Z import torch 2023-01-11T21:38:06.8826236Z import random 2023-01-11T21:38:06.8826355Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8826481Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8826486Z 2023-01-11T21:38:06.8826567Z aten = torch.ops.aten 2023-01-11T21:38:06.8826705Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8826801Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8826806Z 2023-01-11T21:38:06.8826880Z import triton 2023-01-11T21:38:06.8826965Z import triton.language as tl 2023-01-11T21:38:06.8827090Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8827230Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8827236Z 2023-01-11T21:38:06.8827240Z 2023-01-11T21:38:06.8827403Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8827609Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8827734Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8827843Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8827950Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8828009Z { 2023-01-11T21:38:06.8828110Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8828176Z { 2023-01-11T21:38:06.8828271Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8828358Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8828423Z { 2023-01-11T21:38:06.8828513Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8828574Z { 2023-01-11T21:38:06.8828642Z { 2023-01-11T21:38:06.8828711Z { 2023-01-11T21:38:06.8828820Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8828923Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8829042Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8829134Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8829234Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8829307Z } 2023-01-11T21:38:06.8829378Z } 2023-01-11T21:38:06.8829450Z } 2023-01-11T21:38:06.8829519Z } 2023-01-11T21:38:06.8829588Z } 2023-01-11T21:38:06.8829645Z }
2023-01-11T21:38:06.8829730Z ''') 2023-01-11T21:38:06.8829735Z 2023-01-11T21:38:06.8829740Z 2023-01-11T21:38:06.8829832Z async_compile.wait(globals()) 2023-01-11T21:38:06.8829908Z del async_compile 2023-01-11T21:38:06.8829913Z 2023-01-11T21:38:06.8829988Z def call(args): 2023-01-11T21:38:06.8830068Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8830143Z args.clear() 2023-01-11T21:38:06.8830343Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8830505Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8830577Z del arg0_1 2023-01-11T21:38:06.8830649Z del arg1_1 2023-01-11T21:38:06.8830725Z return (buf0, ) 2023-01-11T21:38:06.8830730Z 2023-01-11T21:38:06.8830762Z 2023-01-11T21:38:06.8830850Z if __name__ == "__main__": 2023-01-11T21:38:06.8830971Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8831100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8831295Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8831491Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8831612Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8831618Z 2023-01-11T21:38:06.8831690Z ok (1.671s) 2023-01-11T21:38:06.8832167Z test_cpu_double_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8832305Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8832566Z [2023-01-11 21:37:09,392] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1056 2023-01-11T21:38:06.8832830Z [2023-01-11 21:37:11,054] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1056 2023-01-11T21:38:06.8832836Z 2023-01-11T21:38:06.8832937Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8833014Z import torch 2023-01-11T21:38:06.8833085Z import random 2023-01-11T21:38:06.8833205Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8833356Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8833361Z 2023-01-11T21:38:06.8833442Z aten = torch.ops.aten 2023-01-11T21:38:06.8833578Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8833673Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8833678Z 2023-01-11T21:38:06.8833754Z import triton 2023-01-11T21:38:06.8833839Z import triton.language as tl 2023-01-11T21:38:06.8833964Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8834103Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8834108Z 2023-01-11T21:38:06.8834113Z 2023-01-11T21:38:06.8834248Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8834452Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8834577Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8834688Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8834796Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8834854Z { 
2023-01-11T21:38:06.8834955Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8835022Z { 2023-01-11T21:38:06.8835117Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8835205Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8835272Z { 2023-01-11T21:38:06.8835362Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8835423Z { 2023-01-11T21:38:06.8835492Z { 2023-01-11T21:38:06.8835563Z { 2023-01-11T21:38:06.8835671Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8835771Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8835886Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8835984Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8836079Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8836153Z } 2023-01-11T21:38:06.8836223Z } 2023-01-11T21:38:06.8836292Z } 2023-01-11T21:38:06.8836359Z } 2023-01-11T21:38:06.8836425Z } 2023-01-11T21:38:06.8836490Z } 2023-01-11T21:38:06.8836568Z ''') 2023-01-11T21:38:06.8836573Z 2023-01-11T21:38:06.8836609Z 2023-01-11T21:38:06.8836704Z async_compile.wait(globals()) 2023-01-11T21:38:06.8836780Z del async_compile 2023-01-11T21:38:06.8836785Z 2023-01-11T21:38:06.8836862Z def call(args): 2023-01-11T21:38:06.8836942Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8837017Z args.clear() 2023-01-11T21:38:06.8837228Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8837387Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8837458Z del arg0_1 2023-01-11T21:38:06.8837530Z del arg1_1 2023-01-11T21:38:06.8837606Z return (buf0, ) 2023-01-11T21:38:06.8837614Z 2023-01-11T21:38:06.8837618Z 2023-01-11T21:38:06.8837698Z if __name__ == "__main__": 2023-01-11T21:38:06.8837816Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8837943Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8838145Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8838343Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8838461Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8838466Z 2023-01-11T21:38:06.8838536Z ok (1.676s) 2023-01-11T21:38:06.8839013Z test_cpu_double_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8839172Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8839429Z [2023-01-11 21:37:11,068] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1057 2023-01-11T21:38:06.8839694Z [2023-01-11 21:37:12,731] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1057 2023-01-11T21:38:06.8839699Z 2023-01-11T21:38:06.8839798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8839871Z import torch 2023-01-11T21:38:06.8839938Z import random 2023-01-11T21:38:06.8840058Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8840183Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8840188Z 2023-01-11T21:38:06.8840271Z aten = torch.ops.aten 2023-01-11T21:38:06.8840409Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8840507Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8840512Z 2023-01-11T21:38:06.8840586Z import triton 2023-01-11T21:38:06.8840679Z import triton.language as tl 2023-01-11T21:38:06.8840797Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8840937Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8840945Z 2023-01-11T21:38:06.8840950Z 2023-01-11T21:38:06.8841086Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8841292Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8841417Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8841527Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8841632Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8841698Z { 2023-01-11T21:38:06.8841792Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8841857Z { 2023-01-11T21:38:06.8841942Z #pragma omp for 2023-01-11T21:38:06.8842032Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8842099Z { 2023-01-11T21:38:06.8842165Z { 2023-01-11T21:38:06.8842229Z { 2023-01-11T21:38:06.8842326Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8842423Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8842595Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8842696Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8842788Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8842856Z } 2023-01-11T21:38:06.8842917Z } 2023-01-11T21:38:06.8842983Z } 2023-01-11T21:38:06.8843048Z } 2023-01-11T21:38:06.8843112Z } 2023-01-11T21:38:06.8843197Z ''') 2023-01-11T21:38:06.8843202Z 2023-01-11T21:38:06.8843206Z 2023-01-11T21:38:06.8843301Z async_compile.wait(globals()) 2023-01-11T21:38:06.8843378Z del async_compile 2023-01-11T21:38:06.8843383Z 2023-01-11T21:38:06.8843463Z def call(args): 2023-01-11T21:38:06.8843536Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8843610Z args.clear() 2023-01-11T21:38:06.8843813Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8843982Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8844056Z del arg0_1 2023-01-11T21:38:06.8844128Z del arg1_1 2023-01-11T21:38:06.8844205Z return (buf0, ) 2023-01-11T21:38:06.8844210Z 2023-01-11T21:38:06.8844215Z 2023-01-11T21:38:06.8844288Z if __name__ == "__main__": 2023-01-11T21:38:06.8844406Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8844532Z from torch._inductor.utils import
print_performance 2023-01-11T21:38:06.8844731Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8844922Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8845065Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8845071Z 2023-01-11T21:38:06.8845142Z ok (1.677s) 2023-01-11T21:38:06.8845612Z test_cpu_double_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8845746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8845999Z [2023-01-11 21:37:12,745] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1058 2023-01-11T21:38:06.8846262Z [2023-01-11 21:37:14,419] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1058 2023-01-11T21:38:06.8846267Z 2023-01-11T21:38:06.8846370Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8846445Z import torch 2023-01-11T21:38:06.8846520Z import random 2023-01-11T21:38:06.8846638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8846761Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8846766Z 2023-01-11T21:38:06.8846848Z aten = torch.ops.aten 2023-01-11T21:38:06.8846980Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8847078Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8847083Z 2023-01-11T21:38:06.8847158Z import triton 2023-01-11T21:38:06.8847252Z import triton.language as tl 2023-01-11T21:38:06.8847376Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8847515Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8847521Z 2023-01-11T21:38:06.8847525Z 2023-01-11T21:38:06.8847662Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8847868Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8847989Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8848098Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8848204Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8848269Z { 2023-01-11T21:38:06.8848397Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8848466Z { 2023-01-11T21:38:06.8848546Z #pragma omp for 2023-01-11T21:38:06.8848627Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8848696Z { 2023-01-11T21:38:06.8848766Z { 2023-01-11T21:38:06.8848836Z { 2023-01-11T21:38:06.8848934Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8849030Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8849143Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8849231Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8849325Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8849392Z } 2023-01-11T21:38:06.8849459Z } 2023-01-11T21:38:06.8849526Z } 2023-01-11T21:38:06.8849591Z } 2023-01-11T21:38:06.8849654Z } 2023-01-11T21:38:06.8849733Z ''') 2023-01-11T21:38:06.8849738Z 2023-01-11T21:38:06.8849743Z 2023-01-11T21:38:06.8849838Z async_compile.wait(globals()) 2023-01-11T21:38:06.8849916Z del async_compile 2023-01-11T21:38:06.8849921Z 2023-01-11T21:38:06.8849996Z def
call(args): 2023-01-11T21:38:06.8850075Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8850150Z args.clear() 2023-01-11T21:38:06.8850350Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8850509Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8850582Z del arg0_1 2023-01-11T21:38:06.8850654Z del arg1_1 2023-01-11T21:38:06.8850728Z return (buf0, ) 2023-01-11T21:38:06.8850768Z 2023-01-11T21:38:06.8850773Z 2023-01-11T21:38:06.8850856Z if __name__ == "__main__": 2023-01-11T21:38:06.8850974Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8851100Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8851301Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8851492Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8851611Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8851616Z 2023-01-11T21:38:06.8851686Z ok (1.687s) 2023-01-11T21:38:06.8852157Z test_cpu_double_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8852292Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8852554Z [2023-01-11 21:37:14,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1059 2023-01-11T21:38:06.8852821Z [2023-01-11 21:37:16,095] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1059 2023-01-11T21:38:06.8852826Z 2023-01-11T21:38:06.8852924Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8852999Z import torch 2023-01-11T21:38:06.8853066Z import random 2023-01-11T21:38:06.8853185Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8853312Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8853317Z 2023-01-11T21:38:06.8853397Z aten = torch.ops.aten 2023-01-11T21:38:06.8853533Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8853631Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8853639Z 2023-01-11T21:38:06.8853713Z import triton 2023-01-11T21:38:06.8853807Z import triton.language as tl 2023-01-11T21:38:06.8853925Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8854064Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8854070Z 2023-01-11T21:38:06.8854074Z 2023-01-11T21:38:06.8854239Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8854446Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8854703Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8854817Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8854921Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8854985Z { 2023-01-11T21:38:06.8855081Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8855147Z { 2023-01-11T21:38:06.8855228Z #pragma omp for 2023-01-11T21:38:06.8855325Z for(long i0=0; i0<100; i0+=1) 2023-01-11T21:38:06.8855392Z { 2023-01-11T21:38:06.8855460Z { 
2023-01-11T21:38:06.8855522Z { 2023-01-11T21:38:06.8855619Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8855717Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8855814Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8855905Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8855975Z } 2023-01-11T21:38:06.8856043Z } 2023-01-11T21:38:06.8856102Z } 2023-01-11T21:38:06.8856167Z } 2023-01-11T21:38:06.8856230Z } 2023-01-11T21:38:06.8856317Z ''') 2023-01-11T21:38:06.8856323Z 2023-01-11T21:38:06.8856327Z 2023-01-11T21:38:06.8856422Z async_compile.wait(globals()) 2023-01-11T21:38:06.8856501Z del async_compile 2023-01-11T21:38:06.8856506Z 2023-01-11T21:38:06.8856580Z def call(args): 2023-01-11T21:38:06.8856652Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8856727Z args.clear() 2023-01-11T21:38:06.8856977Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8857196Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8857282Z del arg0_1 2023-01-11T21:38:06.8857355Z del arg1_1 2023-01-11T21:38:06.8857433Z return (buf0, ) 2023-01-11T21:38:06.8857438Z 2023-01-11T21:38:06.8857443Z 2023-01-11T21:38:06.8857523Z if __name__ == "__main__": 2023-01-11T21:38:06.8857634Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8857761Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8857958Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8858154Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8858273Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8858278Z 2023-01-11T21:38:06.8858349Z ok (1.676s) 2023-01-11T21:38:06.8858822Z test_cpu_double_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8858953Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8859210Z [2023-01-11 21:37:16,108] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1060 2023-01-11T21:38:06.8859463Z [2023-01-11 21:37:17,776] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1060 2023-01-11T21:38:06.8859477Z 2023-01-11T21:38:06.8859568Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8859642Z import torch 2023-01-11T21:38:06.8859717Z import random 2023-01-11T21:38:06.8859839Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8859963Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8859969Z 2023-01-11T21:38:06.8860050Z aten = torch.ops.aten 2023-01-11T21:38:06.8860185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8860312Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8860318Z 2023-01-11T21:38:06.8860392Z import triton 2023-01-11T21:38:06.8860483Z import triton.language as tl 2023-01-11T21:38:06.8860608Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8860748Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8860753Z 2023-01-11T21:38:06.8860758Z 2023-01-11T21:38:06.8860895Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8861103Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8861226Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8861328Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8861433Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8861498Z { 2023-01-11T21:38:06.8861598Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8861663Z { 2023-01-11T21:38:06.8861764Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8861850Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8861910Z { 2023-01-11T21:38:06.8862000Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8862068Z { 2023-01-11T21:38:06.8862138Z { 2023-01-11T21:38:06.8862209Z { 2023-01-11T21:38:06.8862317Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8862417Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8862528Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8862626Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8862757Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8862829Z } 2023-01-11T21:38:06.8862900Z } 2023-01-11T21:38:06.8862971Z } 2023-01-11T21:38:06.8863037Z } 2023-01-11T21:38:06.8863095Z } 2023-01-11T21:38:06.8863159Z } 2023-01-11T21:38:06.8863245Z ''') 2023-01-11T21:38:06.8863251Z 2023-01-11T21:38:06.8863255Z 2023-01-11T21:38:06.8863349Z async_compile.wait(globals()) 2023-01-11T21:38:06.8863427Z del async_compile 2023-01-11T21:38:06.8863432Z 2023-01-11T21:38:06.8863507Z def call(args): 2023-01-11T21:38:06.8863586Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8863654Z args.clear() 2023-01-11T21:38:06.8863853Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8864020Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8864092Z del arg0_1 2023-01-11T21:38:06.8864166Z del arg1_1 2023-01-11T21:38:06.8864242Z return (buf0, ) 2023-01-11T21:38:06.8864248Z 2023-01-11T21:38:06.8864252Z 2023-01-11T21:38:06.8864335Z if __name__ == "__main__":
2023-01-11T21:38:06.8864445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8864571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8864774Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8864965Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8865084Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8865089Z 2023-01-11T21:38:06.8865162Z ok (1.682s) 2023-01-11T21:38:06.8865642Z test_cpu_double_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8865777Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8866066Z [2023-01-11 21:37:17,791] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1061 2023-01-11T21:38:06.8866330Z [2023-01-11 21:37:19,455] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1061 2023-01-11T21:38:06.8866336Z 2023-01-11T21:38:06.8866427Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8866503Z import torch 2023-01-11T21:38:06.8866578Z import random 2023-01-11T21:38:06.8866696Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8866820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8866825Z 2023-01-11T21:38:06.8866908Z aten = torch.ops.aten 2023-01-11T21:38:06.8867045Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8867137Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8867152Z 2023-01-11T21:38:06.8867219Z import triton 2023-01-11T21:38:06.8867311Z import triton.language as tl 2023-01-11T21:38:06.8867436Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8867578Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8867583Z 2023-01-11T21:38:06.8867588Z 2023-01-11T21:38:06.8867724Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8867930Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8868054Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8868165Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8868264Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8868329Z { 2023-01-11T21:38:06.8868430Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8868497Z { 2023-01-11T21:38:06.8868621Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8868710Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8868770Z { 2023-01-11T21:38:06.8868862Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8868931Z { 2023-01-11T21:38:06.8869000Z { 2023-01-11T21:38:06.8869074Z { 2023-01-11T21:38:06.8869184Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8869295Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8869404Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8869507Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8869610Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8869682Z } 2023-01-11T21:38:06.8869752Z } 2023-01-11T21:38:06.8869819Z } 2023-01-11T21:38:06.8869887Z }
2023-01-11T21:38:06.8869951Z } 2023-01-11T21:38:06.8870013Z } 2023-01-11T21:38:06.8870098Z ''') 2023-01-11T21:38:06.8870104Z 2023-01-11T21:38:06.8870108Z 2023-01-11T21:38:06.8870203Z async_compile.wait(globals()) 2023-01-11T21:38:06.8870280Z del async_compile 2023-01-11T21:38:06.8870286Z 2023-01-11T21:38:06.8870359Z def call(args): 2023-01-11T21:38:06.8870439Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8870510Z args.clear() 2023-01-11T21:38:06.8870709Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8870874Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8870950Z del arg0_1 2023-01-11T21:38:06.8871024Z del arg1_1 2023-01-11T21:38:06.8871100Z return (buf0, ) 2023-01-11T21:38:06.8871106Z 2023-01-11T21:38:06.8871110Z 2023-01-11T21:38:06.8871191Z if __name__ == "__main__": 2023-01-11T21:38:06.8871309Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8871429Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8871629Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8871825Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8871946Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8871981Z 2023-01-11T21:38:06.8872053Z ok (1.679s) 2023-01-11T21:38:06.8872530Z test_cpu_double_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8872665Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8872920Z [2023-01-11 21:37:19,470] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1062 2023-01-11T21:38:06.8873186Z [2023-01-11 21:37:21,136] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1062 2023-01-11T21:38:06.8873191Z 2023-01-11T21:38:06.8873289Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8873357Z import torch 2023-01-11T21:38:06.8873435Z import random 2023-01-11T21:38:06.8873555Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8873678Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8873683Z 2023-01-11T21:38:06.8873764Z aten = torch.ops.aten 2023-01-11T21:38:06.8873900Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8873994Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8873999Z 2023-01-11T21:38:06.8874066Z import triton 2023-01-11T21:38:06.8874158Z import triton.language as tl 2023-01-11T21:38:06.8874282Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8874451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8874457Z 2023-01-11T21:38:06.8874461Z 2023-01-11T21:38:06.8874598Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8874802Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8874929Z extern "C" void kernel(const double* __restrict__ in_ptr0, 2023-01-11T21:38:06.8875039Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8875138Z 
double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8875203Z { 2023-01-11T21:38:06.8875303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8875369Z { 2023-01-11T21:38:06.8875465Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8875552Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8875619Z { 2023-01-11T21:38:06.8875703Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8875769Z { 2023-01-11T21:38:06.8875841Z { 2023-01-11T21:38:06.8875911Z { 2023-01-11T21:38:06.8876019Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.8876125Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8876243Z auto tmp2 = static_cast<double>(tmp1); 2023-01-11T21:38:06.8876339Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8876440Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8876513Z } 2023-01-11T21:38:06.8876582Z } 2023-01-11T21:38:06.8876650Z } 2023-01-11T21:38:06.8876717Z } 2023-01-11T21:38:06.8876783Z } 2023-01-11T21:38:06.8876840Z } 2023-01-11T21:38:06.8876924Z ''') 2023-01-11T21:38:06.8876929Z 2023-01-11T21:38:06.8876934Z 2023-01-11T21:38:06.8877028Z async_compile.wait(globals()) 2023-01-11T21:38:06.8877104Z del async_compile 2023-01-11T21:38:06.8877109Z 2023-01-11T21:38:06.8877186Z def call(args): 2023-01-11T21:38:06.8877264Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8877338Z args.clear() 2023-01-11T21:38:06.8877530Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8877692Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8877792Z del arg0_1 2023-01-11T21:38:06.8877865Z del arg1_1 2023-01-11T21:38:06.8877941Z return (buf0, ) 2023-01-11T21:38:06.8877946Z 2023-01-11T21:38:06.8877950Z 2023-01-11T21:38:06.8878030Z if __name__ == "__main__": 2023-01-11T21:38:06.8878146Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8878272Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8878464Z arg0_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8878659Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8878778Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8878786Z 2023-01-11T21:38:06.8878856Z ok (1.680s) 2023-01-11T21:38:06.8879330Z test_cpu_int_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8879463Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8879720Z [2023-01-11 21:37:21,149] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1063 2023-01-11T21:38:06.8879982Z [2023-01-11 21:37:22,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1063 2023-01-11T21:38:06.8879988Z 2023-01-11T21:38:06.8880087Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8880187Z import torch 2023-01-11T21:38:06.8880262Z import random 2023-01-11T21:38:06.8880380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8880505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8880510Z 2023-01-11T21:38:06.8880591Z aten = torch.ops.aten 2023-01-11T21:38:06.8880731Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8880829Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8880834Z 2023-01-11T21:38:06.8880908Z import triton 2023-01-11T21:38:06.8880993Z import triton.language as tl 2023-01-11T21:38:06.8881118Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8881258Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8881263Z 2023-01-11T21:38:06.8881268Z 2023-01-11T21:38:06.8881405Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8881608Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8881735Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8881845Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8881950Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8882009Z { 2023-01-11T21:38:06.8882111Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8882179Z { 2023-01-11T21:38:06.8882259Z #pragma omp for 2023-01-11T21:38:06.8882345Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8882414Z { 2023-01-11T21:38:06.8882474Z { 2023-01-11T21:38:06.8882543Z { 2023-01-11T21:38:06.8882640Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8882738Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8882851Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8882946Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8883037Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8883102Z } 2023-01-11T21:38:06.8883167Z } 2023-01-11T21:38:06.8883237Z } 2023-01-11T21:38:06.8883306Z } 2023-01-11T21:38:06.8883375Z } 2023-01-11T21:38:06.8883458Z ''') 2023-01-11T21:38:06.8883464Z 2023-01-11T21:38:06.8883468Z 2023-01-11T21:38:06.8883596Z async_compile.wait(globals()) 2023-01-11T21:38:06.8883668Z del async_compile 2023-01-11T21:38:06.8883680Z 2023-01-11T21:38:06.8883748Z def call(args): 2023-01-11T21:38:06.8883829Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8883904Z args.clear() 2023-01-11T21:38:06.8884098Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8884262Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8884339Z del arg0_1 2023-01-11T21:38:06.8884413Z del arg1_1 2023-01-11T21:38:06.8884481Z return (buf0, ) 2023-01-11T21:38:06.8884486Z 2023-01-11T21:38:06.8884494Z 2023-01-11T21:38:06.8884574Z if __name__ == "__main__": 2023-01-11T21:38:06.8884691Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8884817Z from torch._inductor.utils import print_performance
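# [editor's note, not part of the log] The static_cast<float>(tmp0) in the
# int_broadcast1 kernel above reflects PyTorch's type promotion: int32 + float32
# produces float32, so the generated C++ loads the int input and widens it to
# float before the add. A minimal check in eager mode (illustrative only):
import torch
a = torch.arange(10, dtype=torch.int32)
b = torch.randn(10, dtype=torch.float32)
assert (a + b).dtype == torch.float32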
2023-01-11T21:38:06.8885007Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8885202Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8885320Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8885326Z 2023-01-11T21:38:06.8885398Z ok (1.684s) 2023-01-11T21:38:06.8885852Z test_cpu_int_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8886010Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8886267Z [2023-01-11 21:37:22,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1064 2023-01-11T21:38:06.8886529Z [2023-01-11 21:37:24,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1064 2023-01-11T21:38:06.8886534Z 2023-01-11T21:38:06.8886631Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8886705Z import torch 2023-01-11T21:38:06.8886779Z import random 2023-01-11T21:38:06.8886897Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8887021Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8887026Z 2023-01-11T21:38:06.8887101Z aten = torch.ops.aten 2023-01-11T21:38:06.8887237Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8887332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8887338Z 2023-01-11T21:38:06.8887414Z import triton 2023-01-11T21:38:06.8887507Z import triton.language as tl 2023-01-11T21:38:06.8887631Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8887769Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8887775Z 2023-01-11T21:38:06.8887779Z 2023-01-11T21:38:06.8887922Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8888116Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8888235Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8888344Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8888447Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8888513Z { 2023-01-11T21:38:06.8888615Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8888681Z { 2023-01-11T21:38:06.8888768Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8888852Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8888924Z { 2023-01-11T21:38:06.8889014Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8889080Z { 2023-01-11T21:38:06.8889149Z { 2023-01-11T21:38:06.8889219Z { 2023-01-11T21:38:06.8889312Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8889444Z auto tmp2 = in_ptr1[i0]; 2023-01-11T21:38:06.8889561Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8889661Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8889763Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8889834Z } 2023-01-11T21:38:06.8889906Z } 2023-01-11T21:38:06.8889966Z } 2023-01-11T21:38:06.8890032Z } 2023-01-11T21:38:06.8890099Z } 2023-01-11T21:38:06.8890164Z } 2023-01-11T21:38:06.8890249Z ''') 2023-01-11T21:38:06.8890255Z 2023-01-11T21:38:06.8890259Z 2023-01-11T21:38:06.8890354Z async_compile.wait(globals())
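# NOTE (annotation): async_compile.cpp('''...''') above returns a handle while
# the C++ toolchain builds the kernel in a background worker; wait(globals())
# blocks until every pending kernel bound in this module's globals has finished
# compiling, after which the helper is dropped (del async_compile, next record).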
2023-01-11T21:38:06.8890430Z del async_compile 2023-01-11T21:38:06.8890435Z 2023-01-11T21:38:06.8890502Z def call(args): 2023-01-11T21:38:06.8890580Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8890655Z args.clear() 2023-01-11T21:38:06.8890866Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8891031Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8891105Z del arg0_1 2023-01-11T21:38:06.8891177Z del arg1_1 2023-01-11T21:38:06.8891245Z return (buf0, ) 2023-01-11T21:38:06.8891250Z 2023-01-11T21:38:06.8891261Z 2023-01-11T21:38:06.8891335Z if __name__ == "__main__": 2023-01-11T21:38:06.8891451Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8891576Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8891771Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8892015Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8892135Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8892140Z 2023-01-11T21:38:06.8892212Z ok (1.678s) 2023-01-11T21:38:06.8892688Z test_cpu_int_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8892813Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8893070Z [2023-01-11 21:37:24,512] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1065 2023-01-11T21:38:06.8893332Z [2023-01-11 21:37:26,168] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1065 2023-01-11T21:38:06.8893341Z 2023-01-11T21:38:06.8893438Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8893513Z import torch 2023-01-11T21:38:06.8893589Z import random 2023-01-11T21:38:06.8893708Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8893834Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8893839Z 2023-01-11T21:38:06.8893914Z aten = torch.ops.aten 2023-01-11T21:38:06.8894051Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8894146Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8894151Z 2023-01-11T21:38:06.8894228Z import triton 2023-01-11T21:38:06.8894320Z import triton.language as tl 2023-01-11T21:38:06.8894445Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8894694Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8894701Z 2023-01-11T21:38:06.8894708Z 2023-01-11T21:38:06.8894846Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8895044Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8895163Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8895273Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8895424Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8895490Z { 2023-01-11T21:38:06.8895593Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8895659Z { 2023-01-11T21:38:06.8895734Z #pragma omp for 
2023-01-11T21:38:06.8895821Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8895886Z { 2023-01-11T21:38:06.8895954Z { 2023-01-11T21:38:06.8896020Z { 2023-01-11T21:38:06.8896117Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8896213Z auto tmp2 = in_ptr1[0]; 2023-01-11T21:38:06.8896319Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8896419Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8896510Z out_ptr0[i0] = tmp3; 2023-01-11T21:38:06.8896579Z } 2023-01-11T21:38:06.8896645Z } 2023-01-11T21:38:06.8896711Z } 2023-01-11T21:38:06.8896777Z } 2023-01-11T21:38:06.8896836Z } 2023-01-11T21:38:06.8896921Z ''') 2023-01-11T21:38:06.8896927Z 2023-01-11T21:38:06.8896931Z 2023-01-11T21:38:06.8897023Z async_compile.wait(globals()) 2023-01-11T21:38:06.8897098Z del async_compile 2023-01-11T21:38:06.8897103Z 2023-01-11T21:38:06.8897245Z def call(args): 2023-01-11T21:38:06.8897330Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8897406Z args.clear() 2023-01-11T21:38:06.8897593Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8897759Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8897831Z del arg0_1 2023-01-11T21:38:06.8897941Z del arg1_1 2023-01-11T21:38:06.8898015Z return (buf0, ) 2023-01-11T21:38:06.8898021Z 2023-01-11T21:38:06.8898026Z 2023-01-11T21:38:06.8898106Z if __name__ == "__main__": 2023-01-11T21:38:06.8898222Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8898350Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8898536Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8898724Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8898842Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8898847Z 2023-01-11T21:38:06.8898917Z ok (1.671s) 2023-01-11T21:38:06.8899381Z test_cpu_int_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8899515Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8899774Z [2023-01-11 21:37:26,182] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1066 2023-01-11T21:38:06.8900036Z [2023-01-11 21:37:27,834] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1066 2023-01-11T21:38:06.8900041Z 2023-01-11T21:38:06.8900139Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8900213Z import torch 2023-01-11T21:38:06.8900280Z import random 2023-01-11T21:38:06.8900397Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8900521Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8900526Z 2023-01-11T21:38:06.8900607Z aten = torch.ops.aten 2023-01-11T21:38:06.8900741Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8900840Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8900845Z 2023-01-11T21:38:06.8900919Z import triton 2023-01-11T21:38:06.8901004Z import triton.language as tl 2023-01-11T21:38:06.8901129Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8901294Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8901300Z 2023-01-11T21:38:06.8901304Z 2023-01-11T21:38:06.8901441Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8901645Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8901764Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8901873Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8901978Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8902036Z { 2023-01-11T21:38:06.8902138Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8902207Z { 2023-01-11T21:38:06.8902301Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8902387Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8902453Z { 2023-01-11T21:38:06.8902544Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8902606Z { 2023-01-11T21:38:06.8902674Z { 2023-01-11T21:38:06.8902747Z { 2023-01-11T21:38:06.8902848Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8902957Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8903072Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8903169Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8903263Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8903333Z } 2023-01-11T21:38:06.8903401Z } 2023-01-11T21:38:06.8903469Z } 2023-01-11T21:38:06.8903534Z } 2023-01-11T21:38:06.8903628Z } 2023-01-11T21:38:06.8903685Z } 2023-01-11T21:38:06.8903769Z ''') 2023-01-11T21:38:06.8903774Z 2023-01-11T21:38:06.8903779Z 2023-01-11T21:38:06.8903871Z async_compile.wait(globals()) 2023-01-11T21:38:06.8903948Z del async_compile 2023-01-11T21:38:06.8903953Z 2023-01-11T21:38:06.8904027Z def call(args): 2023-01-11T21:38:06.8904108Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8904182Z args.clear() 2023-01-11T21:38:06.8904380Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8904537Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8904610Z del arg0_1 2023-01-11T21:38:06.8904683Z del arg1_1 2023-01-11T21:38:06.8904758Z return (buf0, ) 2023-01-11T21:38:06.8904763Z 2023-01-11T21:38:06.8904768Z 2023-01-11T21:38:06.8904846Z if __name__ == "__main__":
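# NOTE (annotation): per the kernel above, out[i0][i1] = float(a[i1]) + b[i0][i1],
# i.e. the (10,) int32 vector broadcasts across the rows of a contiguous
# (10, 10) float32 matrix, following torch's int32 + float32 -> float32 promotion.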
2023-01-11T21:38:06.8904962Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8905090Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8905273Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8905470Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8905589Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8905596Z 2023-01-11T21:38:06.8905667Z ok (1.665s) 2023-01-11T21:38:06.8906133Z test_cpu_int_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8906264Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8906522Z [2023-01-11 21:37:27,848] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1067 2023-01-11T21:38:06.8906784Z [2023-01-11 21:37:29,511] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1067 2023-01-11T21:38:06.8906789Z 2023-01-11T21:38:06.8906887Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8906961Z import torch 2023-01-11T21:38:06.8907057Z import random 2023-01-11T21:38:06.8907179Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8907302Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8907308Z 2023-01-11T21:38:06.8907388Z aten = torch.ops.aten 2023-01-11T21:38:06.8907525Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8907622Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8907627Z 2023-01-11T21:38:06.8907702Z import triton 2023-01-11T21:38:06.8907787Z import triton.language as tl 2023-01-11T21:38:06.8907915Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8908058Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8908063Z 2023-01-11T21:38:06.8908068Z 2023-01-11T21:38:06.8908204Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8908407Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8908532Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8908644Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8908748Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8908807Z { 2023-01-11T21:38:06.8908911Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8908977Z { 2023-01-11T21:38:06.8909070Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8909155Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8909223Z { 2023-01-11T21:38:06.8909315Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8909375Z { 2023-01-11T21:38:06.8909478Z { 2023-01-11T21:38:06.8909550Z { 2023-01-11T21:38:06.8909649Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8909756Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8909872Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8909973Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8910068Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8910139Z } 2023-01-11T21:38:06.8910209Z } 2023-01-11T21:38:06.8910275Z } 2023-01-11T21:38:06.8910345Z } 2023-01-11T21:38:06.8910412Z }
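// NOTE (annotation): int32 + float64 promotes to float64, so the integer
// operand is widened with static_cast<double> before the add and the output
// buffer is allocated as torch.float64 in call() below.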
2023-01-11T21:38:06.8910475Z } 2023-01-11T21:38:06.8910553Z ''') 2023-01-11T21:38:06.8910560Z 2023-01-11T21:38:06.8910564Z 2023-01-11T21:38:06.8910660Z async_compile.wait(globals()) 2023-01-11T21:38:06.8910738Z del async_compile 2023-01-11T21:38:06.8910743Z 2023-01-11T21:38:06.8910816Z def call(args): 2023-01-11T21:38:06.8910899Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8910975Z args.clear() 2023-01-11T21:38:06.8911175Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8911335Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8911410Z del arg0_1 2023-01-11T21:38:06.8911482Z del arg1_1 2023-01-11T21:38:06.8911558Z return (buf0, ) 2023-01-11T21:38:06.8911563Z 2023-01-11T21:38:06.8911568Z 2023-01-11T21:38:06.8911652Z if __name__ == "__main__": 2023-01-11T21:38:06.8911770Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8911895Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8912087Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8912279Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8912399Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8912407Z 2023-01-11T21:38:06.8912477Z ok (1.677s) 2023-01-11T21:38:06.8912967Z test_cpu_int_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8913103Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8913361Z [2023-01-11 21:37:29,525] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1068 2023-01-11T21:38:06.8913625Z [2023-01-11 21:37:31,192] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1068 2023-01-11T21:38:06.8913631Z 2023-01-11T21:38:06.8913728Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8913805Z import torch 2023-01-11T21:38:06.8913873Z import random 2023-01-11T21:38:06.8913995Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8914123Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8914128Z 2023-01-11T21:38:06.8914210Z aten = torch.ops.aten 2023-01-11T21:38:06.8914350Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8914447Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8914452Z 2023-01-11T21:38:06.8914525Z import triton 2023-01-11T21:38:06.8914617Z import triton.language as tl 2023-01-11T21:38:06.8914734Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8914873Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8914878Z 2023-01-11T21:38:06.8914883Z 2023-01-11T21:38:06.8915019Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8915225Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8915373Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8915480Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8915583Z int* __restrict__ out_ptr0) 
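// NOTE (annotation): both operands are int32 in this kernel, so unlike the
// mixed-dtype kernels no static_cast is emitted and the sum stays integral.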
2023-01-11T21:38:06.8915651Z { 2023-01-11T21:38:06.8915746Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8915814Z { 2023-01-11T21:38:06.8915894Z #pragma omp for 2023-01-11T21:38:06.8915980Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8916048Z { 2023-01-11T21:38:06.8916118Z { 2023-01-11T21:38:06.8916179Z { 2023-01-11T21:38:06.8916276Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8916374Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8916468Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8916559Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.8916631Z } 2023-01-11T21:38:06.8916699Z } 2023-01-11T21:38:06.8916762Z } 2023-01-11T21:38:06.8916829Z } 2023-01-11T21:38:06.8916893Z } 2023-01-11T21:38:06.8916977Z ''') 2023-01-11T21:38:06.8916983Z 2023-01-11T21:38:06.8916987Z 2023-01-11T21:38:06.8917079Z async_compile.wait(globals()) 2023-01-11T21:38:06.8917155Z del async_compile 2023-01-11T21:38:06.8917160Z 2023-01-11T21:38:06.8917238Z def call(args): 2023-01-11T21:38:06.8917310Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8917385Z args.clear() 2023-01-11T21:38:06.8917575Z buf0 = empty_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8917743Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8917816Z del arg0_1 2023-01-11T21:38:06.8917887Z del arg1_1 2023-01-11T21:38:06.8917962Z return (buf0, ) 2023-01-11T21:38:06.8917966Z 2023-01-11T21:38:06.8917971Z 2023-01-11T21:38:06.8918049Z if __name__ == "__main__": 2023-01-11T21:38:06.8918159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8918288Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8918479Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8918668Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8918817Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8918823Z 2023-01-11T21:38:06.8918896Z ok (1.681s) 2023-01-11T21:38:06.8919362Z test_cpu_int_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8919493Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8919752Z [2023-01-11 21:37:31,207] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1069 2023-01-11T21:38:06.8920006Z [2023-01-11 21:37:32,869] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1069 2023-01-11T21:38:06.8920019Z 2023-01-11T21:38:06.8920113Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8920186Z import torch 2023-01-11T21:38:06.8920261Z import random 2023-01-11T21:38:06.8920380Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8920505Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8920510Z 2023-01-11T21:38:06.8920591Z aten = torch.ops.aten 2023-01-11T21:38:06.8920727Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8920816Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8920821Z 2023-01-11T21:38:06.8920894Z import triton 2023-01-11T21:38:06.8920985Z import triton.language as tl 2023-01-11T21:38:06.8921111Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8921279Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8921285Z 2023-01-11T21:38:06.8921289Z 2023-01-11T21:38:06.8921428Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8921636Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8921760Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8921866Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8921977Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8922044Z { 2023-01-11T21:38:06.8922148Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8922217Z { 2023-01-11T21:38:06.8922315Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8922405Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8922467Z { 2023-01-11T21:38:06.8922561Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8922633Z { 2023-01-11T21:38:06.8922705Z { 2023-01-11T21:38:06.8922775Z { 2023-01-11T21:38:06.8922877Z auto tmp0 = in_ptr0[i1]; 2023-01-11T21:38:06.8922989Z auto tmp2 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8923105Z auto tmp1 = static_cast<float>(tmp0); 2023-01-11T21:38:06.8923207Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8923310Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8923382Z } 2023-01-11T21:38:06.8929541Z } 2023-01-11T21:38:06.8929626Z } 2023-01-11T21:38:06.8929697Z } 2023-01-11T21:38:06.8929769Z } 2023-01-11T21:38:06.8929838Z } 2023-01-11T21:38:06.8929939Z ''') 2023-01-11T21:38:06.8929944Z 2023-01-11T21:38:06.8929949Z 2023-01-11T21:38:06.8930049Z async_compile.wait(globals()) 2023-01-11T21:38:06.8930124Z del async_compile 2023-01-11T21:38:06.8930134Z 2023-01-11T21:38:06.8930214Z def call(args): 2023-01-11T21:38:06.8930294Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8930365Z args.clear() 2023-01-11T21:38:06.8930567Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8930785Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8930859Z del arg0_1 2023-01-11T21:38:06.8930931Z del arg1_1 2023-01-11T21:38:06.8931007Z return (buf0, ) 2023-01-11T21:38:06.8931012Z 2023-01-11T21:38:06.8931016Z 2023-01-11T21:38:06.8931098Z if __name__ ==
"__main__": 2023-01-11T21:38:06.8931217Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8931339Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8931531Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8931732Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8931859Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8931864Z 2023-01-11T21:38:06.8931935Z ok (1.677s) 2023-01-11T21:38:06.8932413Z test_cpu_int_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8932545Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8932805Z [2023-01-11 21:37:32,883] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1070 2023-01-11T21:38:06.8933069Z [2023-01-11 21:37:34,551] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1070 2023-01-11T21:38:06.8933075Z 2023-01-11T21:38:06.8933197Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8933272Z import torch 2023-01-11T21:38:06.8933349Z import random 2023-01-11T21:38:06.8933468Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8933591Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8933596Z 2023-01-11T21:38:06.8933679Z aten = torch.ops.aten 2023-01-11T21:38:06.8933815Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8933911Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8933917Z 2023-01-11T21:38:06.8933983Z import triton 2023-01-11T21:38:06.8934073Z import triton.language as tl 2023-01-11T21:38:06.8934198Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8934336Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8934341Z 2023-01-11T21:38:06.8934346Z 2023-01-11T21:38:06.8934622Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8934830Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8934954Z extern "C" void kernel(const int* __restrict__ in_ptr0, 2023-01-11T21:38:06.8935065Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8935162Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8935227Z { 2023-01-11T21:38:06.8935334Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8935400Z { 2023-01-11T21:38:06.8935499Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8935608Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8935668Z { 2023-01-11T21:38:06.8935783Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8935853Z { 2023-01-11T21:38:06.8935922Z { 2023-01-11T21:38:06.8935992Z { 2023-01-11T21:38:06.8936092Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.8936201Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8936313Z auto tmp1 = static_cast(tmp0); 2023-01-11T21:38:06.8936412Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8936517Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8936588Z } 2023-01-11T21:38:06.8936655Z } 2023-01-11T21:38:06.8936774Z } 2023-01-11T21:38:06.8936843Z } 2023-01-11T21:38:06.8936904Z 
} 2023-01-11T21:38:06.8936969Z } 2023-01-11T21:38:06.8937055Z ''') 2023-01-11T21:38:06.8937060Z 2023-01-11T21:38:06.8937065Z 2023-01-11T21:38:06.8937223Z async_compile.wait(globals()) 2023-01-11T21:38:06.8937315Z del async_compile 2023-01-11T21:38:06.8937321Z 2023-01-11T21:38:06.8937395Z def call(args): 2023-01-11T21:38:06.8937475Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8937543Z args.clear() 2023-01-11T21:38:06.8937749Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8937916Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8937993Z del arg0_1 2023-01-11T21:38:06.8938063Z del arg1_1 2023-01-11T21:38:06.8938138Z return (buf0, ) 2023-01-11T21:38:06.8938143Z 2023-01-11T21:38:06.8938147Z 2023-01-11T21:38:06.8938226Z if __name__ == "__main__": 2023-01-11T21:38:06.8938345Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8938464Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8938657Z arg0_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8938857Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8938976Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8938982Z 2023-01-11T21:38:06.8939053Z ok (1.682s) 2023-01-11T21:38:06.8939528Z test_cpu_strided_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8939749Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8940009Z [2023-01-11 21:37:34,565] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1071 2023-01-11T21:38:06.8940273Z [2023-01-11 21:37:36,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1071 2023-01-11T21:38:06.8940278Z 2023-01-11T21:38:06.8940376Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8940444Z import torch 2023-01-11T21:38:06.8940518Z import random 2023-01-11T21:38:06.8940638Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8940762Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8940771Z 2023-01-11T21:38:06.8940854Z aten = torch.ops.aten 2023-01-11T21:38:06.8940989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8941084Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8941089Z 2023-01-11T21:38:06.8941156Z import triton 2023-01-11T21:38:06.8941247Z import triton.language as tl 2023-01-11T21:38:06.8941374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8941513Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8941519Z 2023-01-11T21:38:06.8941523Z 2023-01-11T21:38:06.8941661Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8941870Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8941995Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8942104Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8942200Z float* __restrict__ out_ptr0) 
2023-01-11T21:38:06.8942267Z { 2023-01-11T21:38:06.8942367Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8942432Z { 2023-01-11T21:38:06.8942529Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8942614Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8942679Z { 2023-01-11T21:38:06.8942796Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8942866Z { 2023-01-11T21:38:06.8942934Z { 2023-01-11T21:38:06.8943005Z { 2023-01-11T21:38:06.8943117Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8943218Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8943316Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8943410Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8943480Z } 2023-01-11T21:38:06.8943549Z } 2023-01-11T21:38:06.8943616Z } 2023-01-11T21:38:06.8943684Z } 2023-01-11T21:38:06.8943749Z } 2023-01-11T21:38:06.8943807Z } 2023-01-11T21:38:06.8943892Z ''') 2023-01-11T21:38:06.8943897Z 2023-01-11T21:38:06.8943902Z 2023-01-11T21:38:06.8943993Z async_compile.wait(globals()) 2023-01-11T21:38:06.8944069Z del async_compile 2023-01-11T21:38:06.8944073Z 2023-01-11T21:38:06.8944147Z def call(args): 2023-01-11T21:38:06.8944228Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8944304Z args.clear() 2023-01-11T21:38:06.8944502Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8944662Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8944733Z del arg0_1 2023-01-11T21:38:06.8944803Z del arg1_1 2023-01-11T21:38:06.8944878Z return (buf0, ) 2023-01-11T21:38:06.8944883Z 2023-01-11T21:38:06.8944887Z 2023-01-11T21:38:06.8944968Z if __name__ == "__main__": 2023-01-11T21:38:06.8945084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8945239Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8945437Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8945622Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8945743Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8945749Z 2023-01-11T21:38:06.8945820Z ok (1.714s) 2023-01-11T21:38:06.8946293Z test_cpu_strided_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8946424Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8946686Z [2023-01-11 21:37:36,280] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1072 2023-01-11T21:38:06.8946949Z [2023-01-11 21:37:37,943] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1072 2023-01-11T21:38:06.8946955Z 2023-01-11T21:38:06.8947051Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8947126Z import torch 2023-01-11T21:38:06.8947193Z import random 2023-01-11T21:38:06.8947312Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8947436Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8947441Z 2023-01-11T21:38:06.8947521Z aten = torch.ops.aten 2023-01-11T21:38:06.8947656Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8947751Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8947757Z 2023-01-11T21:38:06.8947830Z import triton 2023-01-11T21:38:06.8947922Z import triton.language as tl 2023-01-11T21:38:06.8948038Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8948179Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8948184Z 2023-01-11T21:38:06.8948189Z 2023-01-11T21:38:06.8948326Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8948556Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8948681Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8948793Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8948900Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8948968Z { 2023-01-11T21:38:06.8949064Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8949132Z { 2023-01-11T21:38:06.8949231Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8949320Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8949389Z { 2023-01-11T21:38:06.8949480Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8949545Z { 2023-01-11T21:38:06.8949615Z { 2023-01-11T21:38:06.8949687Z { 2023-01-11T21:38:06.8949803Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8949905Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8950007Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8950109Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8950176Z } 2023-01-11T21:38:06.8950247Z } 2023-01-11T21:38:06.8950316Z } 2023-01-11T21:38:06.8950385Z } 2023-01-11T21:38:06.8950453Z } 2023-01-11T21:38:06.8950518Z } 2023-01-11T21:38:06.8950605Z ''') 2023-01-11T21:38:06.8950610Z 2023-01-11T21:38:06.8950615Z 2023-01-11T21:38:06.8950704Z async_compile.wait(globals()) 2023-01-11T21:38:06.8950783Z del async_compile 2023-01-11T21:38:06.8950788Z 2023-01-11T21:38:06.8950866Z def call(args): 2023-01-11T21:38:06.8950948Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8951058Z args.clear() 2023-01-11T21:38:06.8951267Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8951433Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8951505Z del arg0_1 2023-01-11T21:38:06.8951572Z del arg1_1 2023-01-11T21:38:06.8951647Z return (buf0, ) 2023-01-11T21:38:06.8951652Z 2023-01-11T21:38:06.8951657Z 2023-01-11T21:38:06.8951738Z if __name__ == "__main__": 2023-01-11T21:38:06.8951856Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8951983Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8952178Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8952381Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8952492Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8952509Z 2023-01-11T21:38:06.8952573Z ok (1.677s) 2023-01-11T21:38:06.8953045Z test_cpu_strided_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8953177Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8953434Z [2023-01-11 21:37:37,957] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1073 2023-01-11T21:38:06.8953695Z [2023-01-11 21:37:39,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1073 2023-01-11T21:38:06.8953701Z 2023-01-11T21:38:06.8953799Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8953875Z import torch 2023-01-11T21:38:06.8953951Z import random 2023-01-11T21:38:06.8954070Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8954186Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8954191Z 2023-01-11T21:38:06.8954272Z aten = torch.ops.aten 2023-01-11T21:38:06.8954432Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8954529Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8954534Z 2023-01-11T21:38:06.8954608Z import triton 2023-01-11T21:38:06.8954700Z import triton.language as tl 2023-01-11T21:38:06.8954823Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8954955Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8954967Z 2023-01-11T21:38:06.8954972Z 2023-01-11T21:38:06.8955101Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8955316Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8955456Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8955586Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8955688Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8955753Z { 2023-01-11T21:38:06.8955852Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8955915Z { 2023-01-11T21:38:06.8956010Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8956094Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8956160Z { 2023-01-11T21:38:06.8956249Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8956319Z { 2023-01-11T21:38:06.8956389Z { 2023-01-11T21:38:06.8956453Z { 2023-01-11T21:38:06.8956562Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8956663Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.8956762Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8956895Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8956966Z } 2023-01-11T21:38:06.8957034Z } 2023-01-11T21:38:06.8957094Z } 2023-01-11T21:38:06.8957160Z } 2023-01-11T21:38:06.8957225Z } 2023-01-11T21:38:06.8957290Z } 2023-01-11T21:38:06.8957375Z ''') 
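# NOTE (annotation): arg1_1 is a one-element tensor here, so the kernel reads
# the scalar once as in_ptr1[0] and adds it to every element of the strided
# input; the (30, 2) input strides are flattened into a dense (10, 1) output.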
2023-01-11T21:38:06.8957383Z 2023-01-11T21:38:06.8957388Z 2023-01-11T21:38:06.8957480Z async_compile.wait(globals()) 2023-01-11T21:38:06.8957555Z del async_compile 2023-01-11T21:38:06.8957560Z 2023-01-11T21:38:06.8957628Z def call(args): 2023-01-11T21:38:06.8957706Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8957780Z args.clear() 2023-01-11T21:38:06.8957978Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8958143Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8958216Z del arg0_1 2023-01-11T21:38:06.8958286Z del arg1_1 2023-01-11T21:38:06.8958358Z return (buf0, ) 2023-01-11T21:38:06.8958363Z 2023-01-11T21:38:06.8958367Z 2023-01-11T21:38:06.8958447Z if __name__ == "__main__": 2023-01-11T21:38:06.8958563Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8958688Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8958886Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8959075Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8959193Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8959198Z 2023-01-11T21:38:06.8959270Z ok (1.675s) 2023-01-11T21:38:06.8959744Z test_cpu_strided_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8959872Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8960133Z [2023-01-11 21:37:39,632] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1074 2023-01-11T21:38:06.8960425Z [2023-01-11 21:37:41,294] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1074 2023-01-11T21:38:06.8960432Z 2023-01-11T21:38:06.8960530Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8960605Z import torch 2023-01-11T21:38:06.8960681Z import random 2023-01-11T21:38:06.8960800Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8960924Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8960929Z 2023-01-11T21:38:06.8961003Z aten = torch.ops.aten 2023-01-11T21:38:06.8961139Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8961238Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8961243Z 2023-01-11T21:38:06.8961315Z import triton 2023-01-11T21:38:06.8961407Z import triton.language as tl 2023-01-11T21:38:06.8961530Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8961669Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8961677Z 2023-01-11T21:38:06.8961681Z 2023-01-11T21:38:06.8961818Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8962018Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8962141Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8962250Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8962354Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8962419Z { 2023-01-11T21:38:06.8962519Z #pragma omp 
parallel num_threads(8) 2023-01-11T21:38:06.8962585Z { 2023-01-11T21:38:06.8962702Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8962788Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8962854Z { 2023-01-11T21:38:06.8962946Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8963014Z { 2023-01-11T21:38:06.8963081Z { 2023-01-11T21:38:06.8963151Z { 2023-01-11T21:38:06.8963256Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8963366Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8963465Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8963566Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8963637Z } 2023-01-11T21:38:06.8963706Z } 2023-01-11T21:38:06.8963773Z } 2023-01-11T21:38:06.8963833Z } 2023-01-11T21:38:06.8963896Z } 2023-01-11T21:38:06.8963959Z } 2023-01-11T21:38:06.8964044Z ''') 2023-01-11T21:38:06.8964049Z 2023-01-11T21:38:06.8964057Z 2023-01-11T21:38:06.8964150Z async_compile.wait(globals()) 2023-01-11T21:38:06.8964226Z del async_compile 2023-01-11T21:38:06.8964231Z 2023-01-11T21:38:06.8964305Z def call(args): 2023-01-11T21:38:06.8964377Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8964450Z args.clear() 2023-01-11T21:38:06.8964649Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8964814Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8964886Z del arg0_1 2023-01-11T21:38:06.8964957Z del arg1_1 2023-01-11T21:38:06.8965031Z return (buf0, ) 2023-01-11T21:38:06.8965036Z 2023-01-11T21:38:06.8965041Z 2023-01-11T21:38:06.8965120Z if __name__ == "__main__": 2023-01-11T21:38:06.8965232Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8965358Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8965556Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8965757Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8965875Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8965880Z 2023-01-11T21:38:06.8965950Z ok (1.676s) 2023-01-11T21:38:06.8966451Z test_cpu_strided_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8966582Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8966843Z [2023-01-11 21:37:41,309] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1075 2023-01-11T21:38:06.8967100Z [2023-01-11 21:37:42,971] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1075 2023-01-11T21:38:06.8967117Z 2023-01-11T21:38:06.8967207Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8967281Z import torch 2023-01-11T21:38:06.8967357Z import random 2023-01-11T21:38:06.8967477Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8967600Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8967605Z 2023-01-11T21:38:06.8967687Z aten = torch.ops.aten 2023-01-11T21:38:06.8967821Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8967910Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8967915Z 2023-01-11T21:38:06.8967991Z import triton 2023-01-11T21:38:06.8968086Z import triton.language as tl 2023-01-11T21:38:06.8968208Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8968348Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8968381Z 2023-01-11T21:38:06.8968385Z 2023-01-11T21:38:06.8968528Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8968732Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8968856Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8968966Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.8969070Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.8969138Z { 2023-01-11T21:38:06.8969238Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8969305Z { 2023-01-11T21:38:06.8969400Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8969483Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8969543Z { 2023-01-11T21:38:06.8969633Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8969700Z { 2023-01-11T21:38:06.8969769Z { 2023-01-11T21:38:06.8969838Z { 2023-01-11T21:38:06.8969953Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8970060Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.8970170Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.8970270Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.8970373Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8970444Z } 2023-01-11T21:38:06.8970513Z } 2023-01-11T21:38:06.8970579Z } 2023-01-11T21:38:06.8970646Z } 2023-01-11T21:38:06.8970705Z } 2023-01-11T21:38:06.8970768Z } 2023-01-11T21:38:06.8970854Z ''') 2023-01-11T21:38:06.8970860Z 2023-01-11T21:38:06.8970864Z 2023-01-11T21:38:06.8970959Z async_compile.wait(globals()) 2023-01-11T21:38:06.8971038Z del async_compile 2023-01-11T21:38:06.8971043Z 2023-01-11T21:38:06.8971116Z def call(args): 2023-01-11T21:38:06.8971197Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8971265Z args.clear() 2023-01-11T21:38:06.8971468Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8971637Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8971711Z del arg0_1 2023-01-11T21:38:06.8971782Z del arg1_1 2023-01-11T21:38:06.8971883Z return (buf0, ) 2023-01-11T21:38:06.8971889Z 2023-01-11T21:38:06.8971893Z 2023-01-11T21:38:06.8971973Z if
__name__ == "__main__": 2023-01-11T21:38:06.8972084Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8972209Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8972411Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8972607Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.8972726Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8972732Z 2023-01-11T21:38:06.8972805Z ok (1.676s) 2023-01-11T21:38:06.8973275Z test_cpu_strided_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8973405Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8973662Z [2023-01-11 21:37:42,985] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1076 2023-01-11T21:38:06.8973923Z [2023-01-11 21:37:44,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1076 2023-01-11T21:38:06.8973929Z 2023-01-11T21:38:06.8974019Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8974095Z import torch 2023-01-11T21:38:06.8974170Z import random 2023-01-11T21:38:06.8974317Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8974440Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8974445Z 2023-01-11T21:38:06.8974631Z aten = torch.ops.aten 2023-01-11T21:38:06.8974769Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8974862Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8974874Z 2023-01-11T21:38:06.8974941Z import triton 2023-01-11T21:38:06.8975035Z import triton.language as tl 2023-01-11T21:38:06.8975160Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8975297Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8975303Z 2023-01-11T21:38:06.8975308Z 2023-01-11T21:38:06.8975451Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8975669Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8975806Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8975936Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.8976033Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8976098Z { 2023-01-11T21:38:06.8976198Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8976265Z { 2023-01-11T21:38:06.8976362Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8976446Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8976506Z { 2023-01-11T21:38:06.8976596Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8976665Z { 2023-01-11T21:38:06.8976735Z { 2023-01-11T21:38:06.8976805Z { 2023-01-11T21:38:06.8976918Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8977019Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.8977174Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.8977286Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.8977388Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.8977459Z } 2023-01-11T21:38:06.8977529Z } 2023-01-11T21:38:06.8977596Z } 2023-01-11T21:38:06.8977665Z }
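// NOTE (annotation): the dtype roles flip relative to the int_* kernels: the
// strided float32 tensor is in_ptr0 and the int32 vector in_ptr1 is cast to
// float before the add; reads follow the (30, 2) strides while writes land
// contiguously at i1 + (10*i0), so the compiled kernel also produces a dense result.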
2023-01-11T21:38:06.8977723Z } 2023-01-11T21:38:06.8977784Z } 2023-01-11T21:38:06.8977918Z ''') 2023-01-11T21:38:06.8977925Z 2023-01-11T21:38:06.8977929Z 2023-01-11T21:38:06.8978027Z async_compile.wait(globals()) 2023-01-11T21:38:06.8978102Z del async_compile 2023-01-11T21:38:06.8978107Z 2023-01-11T21:38:06.8978181Z def call(args): 2023-01-11T21:38:06.8978259Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8978327Z args.clear() 2023-01-11T21:38:06.8978527Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8978695Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8978770Z del arg0_1 2023-01-11T21:38:06.8978843Z del arg1_1 2023-01-11T21:38:06.8978918Z return (buf0, ) 2023-01-11T21:38:06.8978923Z 2023-01-11T21:38:06.8978927Z 2023-01-11T21:38:06.8979007Z if __name__ == "__main__": 2023-01-11T21:38:06.8979122Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8979240Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8979439Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8979631Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.8979749Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8979754Z 2023-01-11T21:38:06.8979824Z ok (1.681s) 2023-01-11T21:38:06.8980299Z test_cpu_strided_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8980477Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8980743Z [2023-01-11 21:37:44,666] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1077 2023-01-11T21:38:06.8981008Z [2023-01-11 21:37:46,321] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1077 2023-01-11T21:38:06.8981014Z 2023-01-11T21:38:06.8981115Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8981185Z import torch 2023-01-11T21:38:06.8981261Z import random 2023-01-11T21:38:06.8981382Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8981508Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8981513Z 2023-01-11T21:38:06.8981597Z aten = torch.ops.aten 2023-01-11T21:38:06.8981735Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8981838Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8981843Z 2023-01-11T21:38:06.8981913Z import triton 2023-01-11T21:38:06.8982009Z import triton.language as tl 2023-01-11T21:38:06.8982141Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8982287Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8982292Z 2023-01-11T21:38:06.8982296Z 2023-01-11T21:38:06.8982434Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8982640Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8982766Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8982877Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8982975Z float* 
__restrict__ out_ptr0) 2023-01-11T21:38:06.8983044Z { 2023-01-11T21:38:06.8983147Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8983216Z { 2023-01-11T21:38:06.8983316Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8983403Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8983473Z { 2023-01-11T21:38:06.8983559Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8983628Z { 2023-01-11T21:38:06.8983698Z { 2023-01-11T21:38:06.8983799Z { 2023-01-11T21:38:06.8983917Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8984029Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8984130Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8984227Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8984301Z } 2023-01-11T21:38:06.8984371Z } 2023-01-11T21:38:06.8984442Z } 2023-01-11T21:38:06.8984511Z } 2023-01-11T21:38:06.8984579Z } 2023-01-11T21:38:06.8984639Z } 2023-01-11T21:38:06.8984726Z ''') 2023-01-11T21:38:06.8984734Z 2023-01-11T21:38:06.8984738Z 2023-01-11T21:38:06.8984833Z async_compile.wait(globals()) 2023-01-11T21:38:06.8984911Z del async_compile 2023-01-11T21:38:06.8984916Z 2023-01-11T21:38:06.8984993Z def call(args): 2023-01-11T21:38:06.8985074Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8985152Z args.clear() 2023-01-11T21:38:06.8985356Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8985520Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8985595Z del arg0_1 2023-01-11T21:38:06.8985670Z del arg1_1 2023-01-11T21:38:06.8985748Z return (buf0, ) 2023-01-11T21:38:06.8985753Z 2023-01-11T21:38:06.8985757Z 2023-01-11T21:38:06.8985839Z if __name__ == "__main__": 2023-01-11T21:38:06.8985960Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8986089Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8986289Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8986511Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8986634Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8986639Z 2023-01-11T21:38:06.8986711Z ok (1.669s) 2023-01-11T21:38:06.8987198Z test_cpu_strided_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8987332Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8987593Z [2023-01-11 21:37:46,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1078 2023-01-11T21:38:06.8987862Z [2023-01-11 21:37:47,998] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1078 2023-01-11T21:38:06.8987868Z 2023-01-11T21:38:06.8987969Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8988046Z import torch 2023-01-11T21:38:06.8988117Z import random 2023-01-11T21:38:06.8988242Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8988367Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8988372Z 2023-01-11T21:38:06.8988455Z aten = torch.ops.aten 2023-01-11T21:38:06.8988595Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8988693Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8988698Z 2023-01-11T21:38:06.8988776Z import triton 2023-01-11T21:38:06.8988878Z import triton.language as tl 2023-01-11T21:38:06.8988999Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8989141Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8989148Z 2023-01-11T21:38:06.8989152Z 2023-01-11T21:38:06.8989292Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8989501Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8989627Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8989764Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8989872Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8989940Z { 2023-01-11T21:38:06.8990036Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8990104Z { 2023-01-11T21:38:06.8990201Z #pragma omp for collapse(2) 2023-01-11T21:38:06.8990289Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8990359Z { 2023-01-11T21:38:06.8990452Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.8990515Z { 2023-01-11T21:38:06.8990588Z { 2023-01-11T21:38:06.8990661Z { 2023-01-11T21:38:06.8990774Z auto tmp0 = in_ptr0[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.8990886Z auto tmp1 = in_ptr1[i0 + (10*i1)]; 2023-01-11T21:38:06.8990991Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8991097Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8991165Z } 2023-01-11T21:38:06.8991236Z } 2023-01-11T21:38:06.8991305Z } 2023-01-11T21:38:06.8991374Z } 2023-01-11T21:38:06.8991441Z } 2023-01-11T21:38:06.8991507Z } 2023-01-11T21:38:06.8991592Z ''') 2023-01-11T21:38:06.8991597Z 2023-01-11T21:38:06.8991602Z 2023-01-11T21:38:06.8991691Z async_compile.wait(globals()) 2023-01-11T21:38:06.8991772Z del async_compile 2023-01-11T21:38:06.8991777Z 2023-01-11T21:38:06.8991853Z def call(args): 2023-01-11T21:38:06.8991935Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8992012Z args.clear() 2023-01-11T21:38:06.8992213Z buf0 = empty_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8992411Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8992487Z del arg0_1 2023-01-11T21:38:06.8992554Z del arg1_1 2023-01-11T21:38:06.8992631Z return (buf0, ) 2023-01-11T21:38:06.8992636Z 2023-01-11T21:38:06.8992643Z 2023-01-11T21:38:06.8992730Z if __name__ == "__main__": 2023-01-11T21:38:06.8992848Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.8992976Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.8993180Z arg0_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8993377Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8993493Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.8993504Z 2023-01-11T21:38:06.8993571Z ok (1.677s) 2023-01-11T21:38:06.8994054Z test_cpu_transposed_broadcast1 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.8994189Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.8994449Z [2023-01-11 21:37:48,012] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1079 2023-01-11T21:38:06.8994711Z [2023-01-11 21:37:48,021] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1079 2023-01-11T21:38:06.8994717Z 2023-01-11T21:38:06.8994817Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.8994893Z import torch 2023-01-11T21:38:06.8994969Z import random 2023-01-11T21:38:06.8995091Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.8995212Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.8995217Z 2023-01-11T21:38:06.8995298Z aten = torch.ops.aten 2023-01-11T21:38:06.8995437Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.8995533Z async_compile = AsyncCompile() 2023-01-11T21:38:06.8995538Z 2023-01-11T21:38:06.8995640Z import triton 2023-01-11T21:38:06.8995737Z import triton.language as tl 2023-01-11T21:38:06.8995862Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.8995996Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.8996008Z 2023-01-11T21:38:06.8996012Z 2023-01-11T21:38:06.8996145Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.8996351Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.8996474Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.8996582Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.8996695Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.8996764Z { 2023-01-11T21:38:06.8996869Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.8996931Z { 2023-01-11T21:38:06.8997014Z #pragma omp for 2023-01-11T21:38:06.8997106Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.8997174Z { 2023-01-11T21:38:06.8997264Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.8997339Z { 2023-01-11T21:38:06.8997495Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8997624Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[i0]); 2023-01-11T21:38:06.8997716Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8997828Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.8997899Z } 2023-01-11T21:38:06.8997996Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.8998117Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.8998185Z { 2023-01-11T21:38:06.8998282Z auto tmp0 = in_ptr0[i1 + (10*i0)];
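// note: scalar remainder of the 8-lane vectorized loop above; i1 = 8..9
// re-reads the same per-row broadcast scalar in_ptr1[i0] that the main
// loop splatted into a Vectorized<float>.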
2023-01-11T21:38:06.8998373Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.8998469Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.8998575Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.8998643Z } 2023-01-11T21:38:06.8998712Z } 2023-01-11T21:38:06.8998780Z } 2023-01-11T21:38:06.8998840Z } 2023-01-11T21:38:06.8998926Z ''') 2023-01-11T21:38:06.8998931Z 2023-01-11T21:38:06.8998936Z 2023-01-11T21:38:06.8999031Z async_compile.wait(globals()) 2023-01-11T21:38:06.8999110Z del async_compile 2023-01-11T21:38:06.8999115Z 2023-01-11T21:38:06.8999192Z def call(args): 2023-01-11T21:38:06.8999273Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.8999350Z args.clear() 2023-01-11T21:38:06.8999546Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.8999718Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.8999795Z del arg0_1 2023-01-11T21:38:06.8999868Z del arg1_1 2023-01-11T21:38:06.8999945Z return (buf0, ) 2023-01-11T21:38:06.8999950Z 2023-01-11T21:38:06.8999954Z 2023-01-11T21:38:06.9000039Z if __name__ == "__main__": 2023-01-11T21:38:06.9000159Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9000287Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9000480Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9000675Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9000795Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9000800Z 2023-01-11T21:38:06.9000876Z ok (0.022s) 2023-01-11T21:38:06.9001357Z test_cpu_transposed_broadcast2 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9001520Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9001783Z [2023-01-11 21:37:48,035] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1080 2023-01-11T21:38:06.9002050Z [2023-01-11 21:37:48,044] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1080 2023-01-11T21:38:06.9002056Z 2023-01-11T21:38:06.9002157Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9002233Z import torch 2023-01-11T21:38:06.9002303Z import random 2023-01-11T21:38:06.9002425Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9002558Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9002563Z 2023-01-11T21:38:06.9002645Z aten = torch.ops.aten 2023-01-11T21:38:06.9002783Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9002882Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9002887Z 2023-01-11T21:38:06.9002967Z import triton 2023-01-11T21:38:06.9003054Z import triton.language as tl 2023-01-11T21:38:06.9003181Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9003321Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9003327Z 2023-01-11T21:38:06.9003332Z 2023-01-11T21:38:06.9003472Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9003679Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9003803Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9003912Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9004045Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9004104Z { 2023-01-11T21:38:06.9004203Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9004267Z { 2023-01-11T21:38:06.9004348Z #pragma omp for 2023-01-11T21:38:06.9004436Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9004505Z { 2023-01-11T21:38:06.9004593Z for(long i1=0; i1<1; i1+=1) 2023-01-11T21:38:06.9004654Z { 2023-01-11T21:38:06.9004800Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.9004939Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i1); 2023-01-11T21:38:06.9005032Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9005143Z tmp2.store(out_ptr0 + (8*i1) + (10*i0)); 2023-01-11T21:38:06.9005210Z } 2023-01-11T21:38:06.9005304Z #pragma omp simd simdlen(4) 2023-01-11T21:38:06.9005385Z for(long i1=8; i1<10; i1+=1) 2023-01-11T21:38:06.9005457Z { 2023-01-11T21:38:06.9005557Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.9005645Z auto tmp1 = in_ptr1[i1]; 2023-01-11T21:38:06.9005735Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9005833Z out_ptr0[i1 + (10*i0)] = tmp2; 2023-01-11T21:38:06.9005899Z } 2023-01-11T21:38:06.9005959Z } 2023-01-11T21:38:06.9006023Z } 2023-01-11T21:38:06.9006087Z } 2023-01-11T21:38:06.9006171Z ''') 2023-01-11T21:38:06.9006177Z 2023-01-11T21:38:06.9006182Z 2023-01-11T21:38:06.9006274Z async_compile.wait(globals()) 2023-01-11T21:38:06.9006353Z del async_compile 2023-01-11T21:38:06.9006359Z 2023-01-11T21:38:06.9006432Z def call(args): 2023-01-11T21:38:06.9006505Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9006580Z args.clear() 2023-01-11T21:38:06.9006790Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9006961Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()),
c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9007033Z del arg0_1 2023-01-11T21:38:06.9007103Z del arg1_1 2023-01-11T21:38:06.9007180Z return (buf0, ) 2023-01-11T21:38:06.9007185Z 2023-01-11T21:38:06.9007189Z 2023-01-11T21:38:06.9007296Z if __name__ == "__main__": 2023-01-11T21:38:06.9007407Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9007533Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9007731Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9007937Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9008058Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9008063Z 2023-01-11T21:38:06.9008134Z ok (0.023s) 2023-01-11T21:38:06.9008612Z test_cpu_transposed_broadcast3 (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9008746Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9009004Z [2023-01-11 21:37:48,058] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1081 2023-01-11T21:38:06.9009260Z [2023-01-11 21:37:48,065] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1081 2023-01-11T21:38:06.9009271Z 2023-01-11T21:38:06.9009362Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9009437Z import torch 2023-01-11T21:38:06.9009509Z import random 2023-01-11T21:38:06.9009628Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9009751Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9009785Z 2023-01-11T21:38:06.9009867Z aten = torch.ops.aten 2023-01-11T21:38:06.9010001Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9010089Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9010094Z 2023-01-11T21:38:06.9010166Z import triton 2023-01-11T21:38:06.9010262Z import triton.language as tl 2023-01-11T21:38:06.9010387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9010527Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9010533Z 2023-01-11T21:38:06.9010537Z 2023-01-11T21:38:06.9010674Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9010878Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9011001Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9011104Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9011209Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9011273Z { 2023-01-11T21:38:06.9011374Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9011439Z { 2023-01-11T21:38:06.9011520Z #pragma omp for 2023-01-11T21:38:06.9011607Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.9011667Z { 2023-01-11T21:38:06.9011812Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.9011941Z auto tmp1 = at::vec::Vectorized<float>(in_ptr1[0]); 2023-01-11T21:38:06.9012031Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9012126Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.9012192Z }
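// note: 12 iterations x 8 float lanes cover elements 0..95 of the
// flattened 10x10 output; the simd tail below finishes elements 96..99.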
2023-01-11T21:38:06.9012291Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.9012372Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.9012437Z { 2023-01-11T21:38:06.9012525Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.9012614Z auto tmp1 = in_ptr1[0]; 2023-01-11T21:38:06.9012704Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9012789Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.9012857Z } 2023-01-11T21:38:06.9012916Z } 2023-01-11T21:38:06.9012980Z } 2023-01-11T21:38:06.9013064Z ''') 2023-01-11T21:38:06.9013070Z 2023-01-11T21:38:06.9013074Z 2023-01-11T21:38:06.9013197Z async_compile.wait(globals()) 2023-01-11T21:38:06.9013277Z del async_compile 2023-01-11T21:38:06.9013282Z 2023-01-11T21:38:06.9013357Z def call(args): 2023-01-11T21:38:06.9013440Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9013507Z args.clear() 2023-01-11T21:38:06.9013710Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9013875Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9013946Z del arg0_1 2023-01-11T21:38:06.9014017Z del arg1_1 2023-01-11T21:38:06.9014091Z return (buf0, ) 2023-01-11T21:38:06.9014100Z 2023-01-11T21:38:06.9014105Z 2023-01-11T21:38:06.9014182Z if __name__ == "__main__": 2023-01-11T21:38:06.9014299Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9014419Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9014739Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9014931Z arg1_1 = rand_strided((1, ), (1, ), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9015050Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9015055Z 2023-01-11T21:38:06.9015124Z ok (0.021s) 2023-01-11T21:38:06.9015630Z test_cpu_transposed_dense (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9015832Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9016089Z [2023-01-11 21:37:48,079] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1082 2023-01-11T21:38:06.9016357Z [2023-01-11 21:37:49,738] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1082 2023-01-11T21:38:06.9016362Z 2023-01-11T21:38:06.9016453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9016526Z import torch 2023-01-11T21:38:06.9016598Z import random 2023-01-11T21:38:06.9016719Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9016842Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9016847Z 2023-01-11T21:38:06.9016929Z aten = torch.ops.aten 2023-01-11T21:38:06.9017063Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9017210Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9017219Z 2023-01-11T21:38:06.9017289Z import triton 2023-01-11T21:38:06.9017383Z import triton.language as tl 2023-01-11T21:38:06.9017508Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9017647Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9017652Z 2023-01-11T21:38:06.9017657Z 2023-01-11T21:38:06.9017802Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9018007Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9018132Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9018243Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9018342Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9018408Z { 2023-01-11T21:38:06.9018510Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9018578Z { 2023-01-11T21:38:06.9018675Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9018764Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9018831Z { 2023-01-11T21:38:06.9018917Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9018985Z { 2023-01-11T21:38:06.9019055Z { 2023-01-11T21:38:06.9019126Z { 2023-01-11T21:38:06.9019277Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9019389Z auto tmp1 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.9019490Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9019587Z out_ptr0[i0 + (10*i1)] = tmp2; 2023-01-11T21:38:06.9019658Z } 2023-01-11T21:38:06.9019727Z } 2023-01-11T21:38:06.9019795Z } 2023-01-11T21:38:06.9019863Z } 2023-01-11T21:38:06.9019930Z } 2023-01-11T21:38:06.9019988Z } 2023-01-11T21:38:06.9020077Z ''') 2023-01-11T21:38:06.9020083Z 2023-01-11T21:38:06.9020087Z 2023-01-11T21:38:06.9020181Z async_compile.wait(globals()) 2023-01-11T21:38:06.9020263Z del async_compile 2023-01-11T21:38:06.9020268Z 2023-01-11T21:38:06.9020343Z def call(args): 2023-01-11T21:38:06.9020423Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9020499Z args.clear() 2023-01-11T21:38:06.9020699Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9020864Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9020937Z del arg0_1 2023-01-11T21:38:06.9021009Z del arg1_1 2023-01-11T21:38:06.9021084Z return (buf0, ) 2023-01-11T21:38:06.9021089Z 2023-01-11T21:38:06.9021093Z 2023-01-11T21:38:06.9021175Z if __name__ == "__main__": 2023-01-11T21:38:06.9021294Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9021422Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9021616Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9021862Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9021982Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9021988Z 2023-01-11T21:38:06.9022057Z ok (1.673s) 2023-01-11T21:38:06.9022539Z test_cpu_transposed_double (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9022675Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9022935Z [2023-01-11 21:37:49,752] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1083 2023-01-11T21:38:06.9023199Z [2023-01-11 21:37:51,418] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1083 2023-01-11T21:38:06.9023207Z 2023-01-11T21:38:06.9023306Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9023385Z import torch 2023-01-11T21:38:06.9023454Z import random 2023-01-11T21:38:06.9023574Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9023699Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9023704Z 2023-01-11T21:38:06.9023786Z aten = torch.ops.aten 2023-01-11T21:38:06.9023922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9024019Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9024025Z 2023-01-11T21:38:06.9024099Z import triton 2023-01-11T21:38:06.9024185Z import triton.language as tl 2023-01-11T21:38:06.9024315Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9024457Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9024462Z 2023-01-11T21:38:06.9024467Z 2023-01-11T21:38:06.9024608Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9024814Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9024940Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9025053Z const double* __restrict__ in_ptr1, 2023-01-11T21:38:06.9025189Z double* __restrict__ out_ptr0) 2023-01-11T21:38:06.9025252Z { 2023-01-11T21:38:06.9025355Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9025435Z { 2023-01-11T21:38:06.9025545Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9025647Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9025723Z { 2023-01-11T21:38:06.9025814Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9025875Z { 2023-01-11T21:38:06.9025946Z { 2023-01-11T21:38:06.9026018Z { 2023-01-11T21:38:06.9026127Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9026240Z auto tmp2 = in_ptr1[i1 + (10*i0)]; 2023-01-11T21:38:06.9026357Z auto tmp1 = static_cast<double>(tmp0); 2023-01-11T21:38:06.9026457Z auto tmp3 = tmp1 + tmp2; 2023-01-11T21:38:06.9026553Z out_ptr0[i0 + (10*i1)] = tmp3; 2023-01-11T21:38:06.9026627Z } 2023-01-11T21:38:06.9026697Z } 2023-01-11T21:38:06.9026765Z } 2023-01-11T21:38:06.9026833Z } 2023-01-11T21:38:06.9026901Z }
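// note: mixed-dtype add; the float32 element is widened to double before
// the sum, matching float32 + float64 -> float64 type promotion, and the
// result is stored through the double* output.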
2023-01-11T21:38:06.9026968Z } 2023-01-11T21:38:06.9027048Z ''') 2023-01-11T21:38:06.9027053Z 2023-01-11T21:38:06.9027058Z 2023-01-11T21:38:06.9027151Z async_compile.wait(globals()) 2023-01-11T21:38:06.9027231Z del async_compile 2023-01-11T21:38:06.9027236Z 2023-01-11T21:38:06.9027310Z def call(args): 2023-01-11T21:38:06.9027390Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9027466Z args.clear() 2023-01-11T21:38:06.9027667Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.9027858Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9027936Z del arg0_1 2023-01-11T21:38:06.9028010Z del arg1_1 2023-01-11T21:38:06.9028088Z return (buf0, ) 2023-01-11T21:38:06.9028093Z 2023-01-11T21:38:06.9028099Z 2023-01-11T21:38:06.9028182Z if __name__ == "__main__": 2023-01-11T21:38:06.9028302Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9028432Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9028636Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9028829Z arg1_1 = rand_strided((10, 10), (10, 1), device='cpu', dtype=torch.float64) 2023-01-11T21:38:06.9028953Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9028958Z 2023-01-11T21:38:06.9029033Z ok (1.680s) 2023-01-11T21:38:06.9029511Z test_cpu_transposed_int (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9029649Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9029911Z [2023-01-11 21:37:51,433] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1084 2023-01-11T21:38:06.9030175Z [2023-01-11 21:37:53,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1084 2023-01-11T21:38:06.9030181Z 2023-01-11T21:38:06.9030284Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9030363Z import torch 2023-01-11T21:38:06.9030434Z import random 2023-01-11T21:38:06.9030558Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9030690Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9030695Z 2023-01-11T21:38:06.9030781Z aten = torch.ops.aten 2023-01-11T21:38:06.9030922Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9031022Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9031028Z 2023-01-11T21:38:06.9031167Z import triton 2023-01-11T21:38:06.9031264Z import triton.language as tl 2023-01-11T21:38:06.9031387Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9031532Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9031537Z 2023-01-11T21:38:06.9031542Z 2023-01-11T21:38:06.9031685Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9031891Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9032017Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9032129Z const int* __restrict__ in_ptr1, 2023-01-11T21:38:06.9032239Z float* __restrict__ out_ptr0) 
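// note: in_ptr1 here is the int32 operand; it is widened to float inside
// the loop before being added to the transposed float32 input.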
2023-01-11T21:38:06.9032307Z { 2023-01-11T21:38:06.9032406Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9032477Z { 2023-01-11T21:38:06.9032575Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9032668Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9032739Z { 2023-01-11T21:38:06.9032832Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9032903Z { 2023-01-11T21:38:06.9032969Z { 2023-01-11T21:38:06.9033043Z { 2023-01-11T21:38:06.9033157Z auto tmp0 = in_ptr0[i1 + (10*i0)]; 2023-01-11T21:38:06.9033261Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.9033382Z auto tmp2 = static_cast<float>(tmp1); 2023-01-11T21:38:06.9033484Z auto tmp3 = tmp0 + tmp2; 2023-01-11T21:38:06.9033589Z out_ptr0[i1 + (10*i0)] = tmp3; 2023-01-11T21:38:06.9033682Z } 2023-01-11T21:38:06.9033752Z } 2023-01-11T21:38:06.9033819Z } 2023-01-11T21:38:06.9033886Z } 2023-01-11T21:38:06.9033952Z } 2023-01-11T21:38:06.9034018Z } 2023-01-11T21:38:06.9034098Z ''') 2023-01-11T21:38:06.9034103Z 2023-01-11T21:38:06.9034117Z 2023-01-11T21:38:06.9034206Z async_compile.wait(globals()) 2023-01-11T21:38:06.9034283Z del async_compile 2023-01-11T21:38:06.9034288Z 2023-01-11T21:38:06.9034364Z def call(args): 2023-01-11T21:38:06.9034444Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9034519Z args.clear() 2023-01-11T21:38:06.9034720Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9034888Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9034955Z del arg0_1 2023-01-11T21:38:06.9035028Z del arg1_1 2023-01-11T21:38:06.9035102Z return (buf0, ) 2023-01-11T21:38:06.9035111Z 2023-01-11T21:38:06.9035117Z 2023-01-11T21:38:06.9035198Z if __name__ == "__main__": 2023-01-11T21:38:06.9035319Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9035447Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9035648Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9035834Z arg1_1 = rand_strided((10, ), (1, ), device='cpu', dtype=torch.int32) 2023-01-11T21:38:06.9035955Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9035960Z 2023-01-11T21:38:06.9036032Z ok (1.679s) 2023-01-11T21:38:06.9036512Z test_cpu_transposed_strided (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9036647Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9036907Z [2023-01-11 21:37:53,113] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1085 2023-01-11T21:38:06.9037200Z [2023-01-11 21:37:54,777] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1085 2023-01-11T21:38:06.9037206Z 2023-01-11T21:38:06.9037304Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9037379Z import torch 2023-01-11T21:38:06.9037455Z import random 2023-01-11T21:38:06.9037569Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9037693Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9037698Z 2023-01-11T21:38:06.9037781Z aten = torch.ops.aten 2023-01-11T21:38:06.9037919Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9038016Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9038023Z 2023-01-11T21:38:06.9038099Z import triton 2023-01-11T21:38:06.9038192Z import triton.language as tl 2023-01-11T21:38:06.9038310Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9038451Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9038456Z 2023-01-11T21:38:06.9038463Z 2023-01-11T21:38:06.9038599Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9038805Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9038929Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9039040Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9039143Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9039208Z { 2023-01-11T21:38:06.9039303Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9039368Z { 2023-01-11T21:38:06.9039464Z #pragma omp for collapse(2) 2023-01-11T21:38:06.9039576Z for(long i0=0; i0<10; i0+=1) 2023-01-11T21:38:06.9039643Z { 2023-01-11T21:38:06.9039737Z for(long i1=0; i1<10; i1+=1) 2023-01-11T21:38:06.9039805Z { 2023-01-11T21:38:06.9039870Z { 2023-01-11T21:38:06.9039941Z { 2023-01-11T21:38:06.9040053Z auto tmp0 = in_ptr0[i0 + (10*i1)]; 2023-01-11T21:38:06.9040167Z auto tmp1 = in_ptr1[(2*i1) + (30*i0)]; 2023-01-11T21:38:06.9040266Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9040370Z out_ptr0[i0 + (10*i1)] = tmp2; 2023-01-11T21:38:06.9040441Z } 2023-01-11T21:38:06.9040503Z } 2023-01-11T21:38:06.9040569Z } 2023-01-11T21:38:06.9040635Z } 2023-01-11T21:38:06.9040698Z } 2023-01-11T21:38:06.9040761Z } 2023-01-11T21:38:06.9040847Z ''') 2023-01-11T21:38:06.9040853Z 2023-01-11T21:38:06.9040858Z 2023-01-11T21:38:06.9040953Z async_compile.wait(globals()) 2023-01-11T21:38:06.9041024Z del async_compile 2023-01-11T21:38:06.9041029Z 2023-01-11T21:38:06.9041105Z def call(args): 2023-01-11T21:38:06.9041187Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9041261Z args.clear() 2023-01-11T21:38:06.9041461Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9041630Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9041704Z del arg0_1 2023-01-11T21:38:06.9041769Z del arg1_1 2023-01-11T21:38:06.9041848Z return (buf0, ) 2023-01-11T21:38:06.9041853Z 2023-01-11T21:38:06.9041857Z 2023-01-11T21:38:06.9041939Z if __name__ == "__main__": 2023-01-11T21:38:06.9042057Z from 
torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9042184Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9042384Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9042585Z arg1_1 = rand_strided((10, 10), (30, 2), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9042702Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9042708Z 2023-01-11T21:38:06.9042775Z ok (1.680s) 2023-01-11T21:38:06.9043285Z test_cpu_transposed_transposed (__main__.SweepInputsCpuTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9043422Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9043684Z [2023-01-11 21:37:54,792] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1086 2023-01-11T21:38:06.9043950Z [2023-01-11 21:37:54,800] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1086 2023-01-11T21:38:06.9043959Z 2023-01-11T21:38:06.9044058Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9044132Z import torch 2023-01-11T21:38:06.9044207Z import random 2023-01-11T21:38:06.9044327Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9044448Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9044461Z 2023-01-11T21:38:06.9044537Z aten = torch.ops.aten 2023-01-11T21:38:06.9044673Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9044769Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9044775Z 2023-01-11T21:38:06.9044848Z import triton 2023-01-11T21:38:06.9044942Z import triton.language as tl 2023-01-11T21:38:06.9045067Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9045206Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9045212Z 2023-01-11T21:38:06.9045216Z 2023-01-11T21:38:06.9045380Z kernel_cpp_0 = async_compile.cpp(''' 2023-01-11T21:38:06.9045580Z #include "/tmp/torchinductor_jenkins/77/c7773nj5pwikpmm2pwa62rcudlf7p3if7eyqb5k4sjsvewwje4le.h" 2023-01-11T21:38:06.9045705Z extern "C" void kernel(const float* __restrict__ in_ptr0, 2023-01-11T21:38:06.9045814Z const float* __restrict__ in_ptr1, 2023-01-11T21:38:06.9045922Z float* __restrict__ out_ptr0) 2023-01-11T21:38:06.9045988Z { 2023-01-11T21:38:06.9046090Z #pragma omp parallel num_threads(8) 2023-01-11T21:38:06.9046157Z { 2023-01-11T21:38:06.9046233Z #pragma omp for 2023-01-11T21:38:06.9046322Z for(long i0=0; i0<12; i0+=1) 2023-01-11T21:38:06.9046388Z { 2023-01-11T21:38:06.9046530Z auto tmp0 = at::vec::Vectorized<float>::loadu(in_ptr0 + 8*i0); 2023-01-11T21:38:06.9046670Z auto tmp1 = at::vec::Vectorized<float>::loadu(in_ptr1 + 8*i0); 2023-01-11T21:38:06.9046761Z auto tmp2 = tmp0 + tmp1; 2023-01-11T21:38:06.9046860Z tmp2.store(out_ptr0 + 8*i0); 2023-01-11T21:38:06.9046921Z } 2023-01-11T21:38:06.9047021Z #pragma omp for simd simdlen(4) 2023-01-11T21:38:06.9047110Z for(long i0=96; i0<100; i0+=1) 2023-01-11T21:38:06.9047176Z { 2023-01-11T21:38:06.9047264Z auto tmp0 = in_ptr0[i0]; 2023-01-11T21:38:06.9047356Z auto tmp1 = in_ptr1[i0]; 2023-01-11T21:38:06.9047442Z auto tmp2 = tmp0 + tmp1;
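// note: in_ptr0, in_ptr1 and out_ptr0 all share the transposed (1, 10)
// layout here, so the kernel can treat them as one flat 100-element array.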
2023-01-11T21:38:06.9047522Z out_ptr0[i0] = tmp2; 2023-01-11T21:38:06.9047588Z } 2023-01-11T21:38:06.9047655Z } 2023-01-11T21:38:06.9047721Z } 2023-01-11T21:38:06.9047806Z ''') 2023-01-11T21:38:06.9047812Z 2023-01-11T21:38:06.9047816Z 2023-01-11T21:38:06.9047910Z async_compile.wait(globals()) 2023-01-11T21:38:06.9047988Z del async_compile 2023-01-11T21:38:06.9047993Z 2023-01-11T21:38:06.9048061Z def call(args): 2023-01-11T21:38:06.9048140Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9048217Z args.clear() 2023-01-11T21:38:06.9048421Z buf0 = empty_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9048590Z kernel_cpp_0(c_void_p(arg0_1.data_ptr()), c_void_p(arg1_1.data_ptr()), c_void_p(buf0.data_ptr())) 2023-01-11T21:38:06.9048662Z del arg0_1 2023-01-11T21:38:06.9048734Z del arg1_1 2023-01-11T21:38:06.9048804Z return (buf0, ) 2023-01-11T21:38:06.9048835Z 2023-01-11T21:38:06.9048848Z 2023-01-11T21:38:06.9048925Z if __name__ == "__main__": 2023-01-11T21:38:06.9049047Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9049176Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9049381Z arg0_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9049581Z arg1_1 = rand_strided((10, 10), (1, 10), device='cpu', dtype=torch.float32) 2023-01-11T21:38:06.9049705Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9049710Z 2023-01-11T21:38:06.9049788Z ok (0.022s) 2023-01-11T21:38:06.9050279Z test_cuda_broadcast1_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9050414Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9050671Z [2023-01-11 21:37:54,813] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1087 2023-01-11T21:38:06.9050939Z [2023-01-11 21:37:54,822] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1087 2023-01-11T21:38:06.9050944Z 2023-01-11T21:38:06.9051046Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9051122Z import torch 2023-01-11T21:38:06.9051200Z import random 2023-01-11T21:38:06.9051350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9051475Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9051480Z 2023-01-11T21:38:06.9051563Z aten = torch.ops.aten 2023-01-11T21:38:06.9051694Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9051792Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9051797Z 2023-01-11T21:38:06.9051874Z import triton 2023-01-11T21:38:06.9051966Z import triton.language as tl 2023-01-11T21:38:06.9052089Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9052228Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9052233Z 2023-01-11T21:38:06.9052238Z 2023-01-11T21:38:06.9052393Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9052462Z import triton 2023-01-11T21:38:06.9052556Z import triton.language as tl 2023-01-11T21:38:06.9052670Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9052780Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9052914Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9053039Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9053044Z 2023-01-11T21:38:06.9053470Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9053546Z @triton.jit 2023-01-11T21:38:06.9053684Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9053760Z xnumel = 10 2023-01-11T21:38:06.9053858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9053991Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9054075Z xmask = xindex < xnumel 2023-01-11T21:38:06.9054147Z x0 = xindex 2023-01-11T21:38:06.9054245Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9054339Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9054420Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9054666Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9054753Z ''') 2023-01-11T21:38:06.9054759Z 2023-01-11T21:38:06.9054763Z 2023-01-11T21:38:06.9054899Z async_compile.wait(globals()) 2023-01-11T21:38:06.9054976Z del async_compile 2023-01-11T21:38:06.9054981Z 2023-01-11T21:38:06.9055057Z def call(args): 2023-01-11T21:38:06.9055135Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9055204Z args.clear() 2023-01-11T21:38:06.9055296Z with torch.cuda.device(0): 2023-01-11T21:38:06.9055499Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9055590Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9055736Z 
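# note: the generated wrapper takes ownership of its inputs (args.clear()
# drops the caller's references), allocates the output with empty_strided,
# and launches the Triton kernel on the current CUDA stream.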
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9055810Z del arg0_1 2023-01-11T21:38:06.9055889Z del arg1_1 2023-01-11T21:38:06.9055961Z return (buf0, ) 2023-01-11T21:38:06.9055966Z 2023-01-11T21:38:06.9055980Z 2023-01-11T21:38:06.9056054Z if __name__ == "__main__": 2023-01-11T21:38:06.9056172Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9056303Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9056505Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9056703Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9056823Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9056828Z 2023-01-11T21:38:06.9056898Z ok (0.023s) 2023-01-11T21:38:06.9057467Z test_cuda_broadcast1_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9057645Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9057907Z [2023-01-11 21:37:54,835] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1088 2023-01-11T21:38:06.9058172Z [2023-01-11 21:37:54,908] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1088 2023-01-11T21:38:06.9058178Z 2023-01-11T21:38:06.9058276Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9058351Z import torch 2023-01-11T21:38:06.9058424Z import random 2023-01-11T21:38:06.9058545Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9058670Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9058675Z 2023-01-11T21:38:06.9058758Z aten = torch.ops.aten 2023-01-11T21:38:06.9058892Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9058986Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9058992Z 2023-01-11T21:38:06.9059064Z import triton 2023-01-11T21:38:06.9059156Z import triton.language as tl 2023-01-11T21:38:06.9059281Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9059422Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9059428Z 2023-01-11T21:38:06.9059432Z 2023-01-11T21:38:06.9059586Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9059655Z import triton 2023-01-11T21:38:06.9059748Z import triton.language as tl 2023-01-11T21:38:06.9059864Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9059966Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9060102Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9060230Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9060235Z 2023-01-11T21:38:06.9060657Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9060731Z @triton.jit 2023-01-11T21:38:06.9060902Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
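# note: standard Inductor pointwise indexing; each program instance handles
# XBLOCK flat indices, masked against xnumel. Below, x0 = xindex % 10
# addresses the (10,) operand and x1 = xindex // 10 the (1, 10, 1) operand,
# implementing the broadcast without materializing either input.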
2023-01-11T21:38:06.9060972Z xnumel = 100 2023-01-11T21:38:06.9061070Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9061200Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9061280Z xmask = xindex < xnumel 2023-01-11T21:38:06.9061355Z x0 = xindex % 10 2023-01-11T21:38:06.9061436Z x1 = (xindex // 10) 2023-01-11T21:38:06.9061501Z x2 = xindex 2023-01-11T21:38:06.9061602Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9061701Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9061784Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9061923Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9062009Z ''') 2023-01-11T21:38:06.9062015Z 2023-01-11T21:38:06.9062019Z 2023-01-11T21:38:06.9062115Z async_compile.wait(globals()) 2023-01-11T21:38:06.9062191Z del async_compile 2023-01-11T21:38:06.9062196Z 2023-01-11T21:38:06.9062268Z def call(args): 2023-01-11T21:38:06.9062348Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9062423Z args.clear() 2023-01-11T21:38:06.9062514Z with torch.cuda.device(0): 2023-01-11T21:38:06.9062729Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9062821Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9062966Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9063033Z del arg0_1 2023-01-11T21:38:06.9063104Z del arg1_1 2023-01-11T21:38:06.9063182Z return (buf0, ) 2023-01-11T21:38:06.9063187Z 2023-01-11T21:38:06.9063229Z 2023-01-11T21:38:06.9063311Z if __name__ == "__main__": 2023-01-11T21:38:06.9063430Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9063561Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9063762Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9063976Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9064090Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9064096Z 2023-01-11T21:38:06.9064167Z ok (0.086s) 2023-01-11T21:38:06.9064643Z test_cuda_broadcast1_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9064778Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9065038Z [2023-01-11 21:37:54,922] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1089 2023-01-11T21:38:06.9065301Z [2023-01-11 21:37:54,989] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1089 2023-01-11T21:38:06.9065307Z 2023-01-11T21:38:06.9065406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9065480Z import torch 2023-01-11T21:38:06.9065555Z import random 2023-01-11T21:38:06.9065668Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9065791Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9065797Z 2023-01-11T21:38:06.9065881Z aten = torch.ops.aten 2023-01-11T21:38:06.9066018Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9066113Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9066121Z 2023-01-11T21:38:06.9066194Z import triton 2023-01-11T21:38:06.9066287Z import triton.language as tl 2023-01-11T21:38:06.9066413Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9066547Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9066552Z 2023-01-11T21:38:06.9066561Z 2023-01-11T21:38:06.9066734Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9066809Z import triton 2023-01-11T21:38:06.9066902Z import triton.language as tl 2023-01-11T21:38:06.9067015Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9067117Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9067250Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9067369Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9067381Z 2023-01-11T21:38:06.9067794Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9067871Z @triton.jit 2023-01-11T21:38:06.9068013Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9068087Z xnumel = 10 2023-01-11T21:38:06.9068186Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9068317Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9068399Z xmask = xindex < xnumel 2023-01-11T21:38:06.9068465Z x0 = xindex 2023-01-11T21:38:06.9068563Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9068696Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9068775Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9068910Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9068994Z ''') 2023-01-11T21:38:06.9069000Z 2023-01-11T21:38:06.9069004Z 2023-01-11T21:38:06.9069125Z async_compile.wait(globals()) 2023-01-11T21:38:06.9069203Z del async_compile 2023-01-11T21:38:06.9069208Z 2023-01-11T21:38:06.9069277Z def call(args): 2023-01-11T21:38:06.9069357Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9069432Z args.clear() 2023-01-11T21:38:06.9069525Z with torch.cuda.device(0): 2023-01-11T21:38:06.9069726Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9069818Z stream0 = get_cuda_stream(0) 
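# note: grid(10) builds the 1-D launch grid (presumably ceil(10 / XBLOCK)
# program instances) for whichever XBLOCK the pointwise autotuner selects;
# the (1,)-shaped operand is loaded with mask None since it is uniform.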
2023-01-11T21:38:06.9069961Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9070028Z del arg0_1 2023-01-11T21:38:06.9070101Z del arg1_1 2023-01-11T21:38:06.9070179Z return (buf0, ) 2023-01-11T21:38:06.9070185Z 2023-01-11T21:38:06.9070189Z 2023-01-11T21:38:06.9070268Z if __name__ == "__main__": 2023-01-11T21:38:06.9070385Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9070511Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9070713Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9070910Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9071023Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9071028Z 2023-01-11T21:38:06.9071101Z ok (0.081s) 2023-01-11T21:38:06.9071577Z test_cuda_broadcast1_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9071708Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9071966Z [2023-01-11 21:37:55,003] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1090 2023-01-11T21:38:06.9072232Z [2023-01-11 21:37:55,084] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1090 2023-01-11T21:38:06.9072238Z 2023-01-11T21:38:06.9072335Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9072411Z import torch 2023-01-11T21:38:06.9072512Z import random 2023-01-11T21:38:06.9072626Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9072753Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9072758Z 2023-01-11T21:38:06.9072842Z aten = torch.ops.aten 2023-01-11T21:38:06.9072979Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9073077Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9073082Z 2023-01-11T21:38:06.9073156Z import triton 2023-01-11T21:38:06.9073248Z import triton.language as tl 2023-01-11T21:38:06.9073373Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9073507Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9073515Z 2023-01-11T21:38:06.9073524Z 2023-01-11T21:38:06.9073673Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9073748Z import triton 2023-01-11T21:38:06.9073842Z import triton.language as tl 2023-01-11T21:38:06.9073958Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9074061Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9074193Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9074319Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9074324Z 2023-01-11T21:38:06.9074735Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9074809Z @triton.jit 2023-01-11T21:38:06.9074953Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.9075052Z xnumel = 100 2023-01-11T21:38:06.9075156Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9075287Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9075377Z xmask = xindex < xnumel 2023-01-11T21:38:06.9075452Z x0 = xindex % 10 2023-01-11T21:38:06.9075531Z x2 = xindex 2023-01-11T21:38:06.9075641Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9075759Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9075847Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9075980Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9076071Z ''') 2023-01-11T21:38:06.9076076Z 2023-01-11T21:38:06.9076081Z 2023-01-11T21:38:06.9076174Z async_compile.wait(globals()) 2023-01-11T21:38:06.9076245Z del async_compile 2023-01-11T21:38:06.9076250Z 2023-01-11T21:38:06.9076324Z def call(args): 2023-01-11T21:38:06.9076402Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9076482Z args.clear() 2023-01-11T21:38:06.9076575Z with torch.cuda.device(0): 2023-01-11T21:38:06.9076779Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9076872Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9077011Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9077087Z del arg0_1 2023-01-11T21:38:06.9077162Z del arg1_1 2023-01-11T21:38:06.9077238Z return (buf0, ) 2023-01-11T21:38:06.9077243Z 2023-01-11T21:38:06.9077248Z 2023-01-11T21:38:06.9077327Z if __name__ == "__main__": 2023-01-11T21:38:06.9077446Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9077572Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9077774Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9077971Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9078093Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9078099Z 2023-01-11T21:38:06.9078171Z ok (0.094s) 2023-01-11T21:38:06.9078673Z test_cuda_broadcast1_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9078808Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9079070Z [2023-01-11 21:37:55,097] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1091 2023-01-11T21:38:06.9079339Z [2023-01-11 21:37:55,167] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1091 2023-01-11T21:38:06.9079347Z 2023-01-11T21:38:06.9079447Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9079523Z import torch 2023-01-11T21:38:06.9079592Z import random 2023-01-11T21:38:06.9079714Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9079838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9079846Z 2023-01-11T21:38:06.9079928Z aten = torch.ops.aten 2023-01-11T21:38:06.9080066Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9080161Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9080166Z 2023-01-11T21:38:06.9080240Z import triton 2023-01-11T21:38:06.9080332Z import triton.language as tl 2023-01-11T21:38:06.9080451Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9080590Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9080596Z 2023-01-11T21:38:06.9080600Z 2023-01-11T21:38:06.9080757Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9080863Z import triton 2023-01-11T21:38:06.9080957Z import triton.language as tl 2023-01-11T21:38:06.9081072Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9081176Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9081312Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9081436Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9081441Z 2023-01-11T21:38:06.9081862Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9081935Z @triton.jit 2023-01-11T21:38:06.9082077Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9082150Z xnumel = 100 2023-01-11T21:38:06.9082248Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9082378Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9082464Z xmask = xindex < xnumel 2023-01-11T21:38:06.9082534Z x0 = xindex % 10 2023-01-11T21:38:06.9082605Z x2 = xindex 2023-01-11T21:38:06.9082702Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9082801Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9082893Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9082971Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9083102Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9083188Z ''') 2023-01-11T21:38:06.9083193Z 2023-01-11T21:38:06.9083198Z 2023-01-11T21:38:06.9083290Z async_compile.wait(globals()) 2023-01-11T21:38:06.9083367Z del async_compile 2023-01-11T21:38:06.9083372Z 2023-01-11T21:38:06.9083448Z def call(args): 2023-01-11T21:38:06.9083526Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9083604Z args.clear() 2023-01-11T21:38:06.9083695Z with torch.cuda.device(0): 2023-01-11T21:38:06.9083893Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9083989Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9084131Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9084206Z del arg0_1 2023-01-11T21:38:06.9084279Z del arg1_1 2023-01-11T21:38:06.9084380Z return (buf0, ) 2023-01-11T21:38:06.9084385Z 2023-01-11T21:38:06.9084390Z 2023-01-11T21:38:06.9084472Z if __name__ == "__main__": 2023-01-11T21:38:06.9084591Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9084711Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9084910Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9085115Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9085234Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9085242Z 2023-01-11T21:38:06.9085318Z ok (0.083s) 2023-01-11T21:38:06.9085852Z test_cuda_broadcast1_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9085983Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9086245Z [2023-01-11 21:37:55,180] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1092 2023-01-11T21:38:06.9086510Z [2023-01-11 21:37:55,247] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1092 2023-01-11T21:38:06.9086516Z 2023-01-11T21:38:06.9086607Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9086680Z import torch 2023-01-11T21:38:06.9086783Z import random 2023-01-11T21:38:06.9086902Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9087024Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9087029Z 2023-01-11T21:38:06.9087115Z aten = torch.ops.aten 2023-01-11T21:38:06.9087254Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9087351Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9087357Z 2023-01-11T21:38:06.9087424Z import triton 2023-01-11T21:38:06.9087516Z import triton.language as tl 2023-01-11T21:38:06.9087642Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9087783Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9087789Z 2023-01-11T21:38:06.9087793Z 2023-01-11T21:38:06.9087950Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9088024Z import triton 2023-01-11T21:38:06.9088117Z import triton.language as tl 2023-01-11T21:38:06.9088225Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9088332Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9088468Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9088592Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9088597Z 2023-01-11T21:38:06.9089014Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9089089Z @triton.jit 
2023-01-11T21:38:06.9089229Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9089300Z xnumel = 10 2023-01-11T21:38:06.9089391Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9089522Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9089606Z xmask = xindex < xnumel 2023-01-11T21:38:06.9089676Z x0 = xindex 2023-01-11T21:38:06.9089775Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9089869Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9089958Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9090030Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9090165Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9090275Z ''') 2023-01-11T21:38:06.9090281Z 2023-01-11T21:38:06.9090286Z 2023-01-11T21:38:06.9090377Z async_compile.wait(globals()) 2023-01-11T21:38:06.9090457Z del async_compile 2023-01-11T21:38:06.9090462Z 2023-01-11T21:38:06.9090537Z def call(args): 2023-01-11T21:38:06.9090620Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9090695Z args.clear() 2023-01-11T21:38:06.9090780Z with torch.cuda.device(0): 2023-01-11T21:38:06.9090978Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9091070Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9091212Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9091287Z del arg0_1 2023-01-11T21:38:06.9091361Z del arg1_1 2023-01-11T21:38:06.9091439Z return (buf0, ) 2023-01-11T21:38:06.9091444Z 2023-01-11T21:38:06.9091448Z 2023-01-11T21:38:06.9091530Z if __name__ == "__main__": 2023-01-11T21:38:06.9091644Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9091770Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9091970Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9092165Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9092284Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9092289Z 2023-01-11T21:38:06.9092359Z ok (0.080s) 2023-01-11T21:38:06.9092836Z test_cuda_broadcast1_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9093000Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9093261Z [2023-01-11 21:37:55,261] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1093 2023-01-11T21:38:06.9093519Z [2023-01-11 21:37:55,335] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1093 2023-01-11T21:38:06.9093525Z 2023-01-11T21:38:06.9093626Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9093698Z import torch 2023-01-11T21:38:06.9093771Z import random 2023-01-11T21:38:06.9093889Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9094013Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9094022Z 2023-01-11T21:38:06.9094104Z aten = torch.ops.aten 2023-01-11T21:38:06.9094243Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9094332Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9094337Z 2023-01-11T21:38:06.9094410Z import triton 2023-01-11T21:38:06.9094714Z import triton.language as tl 2023-01-11T21:38:06.9094846Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9094989Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9094995Z 2023-01-11T21:38:06.9094999Z 2023-01-11T21:38:06.9095157Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9095231Z import triton 2023-01-11T21:38:06.9095316Z import triton.language as tl 2023-01-11T21:38:06.9095431Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9095534Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9095670Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9095794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9095802Z 2023-01-11T21:38:06.9096219Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9096336Z @triton.jit 2023-01-11T21:38:06.9096481Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9096556Z xnumel = 100 2023-01-11T21:38:06.9096647Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9096775Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9096859Z xmask = xindex < xnumel 2023-01-11T21:38:06.9096935Z x0 = xindex % 10 2023-01-11T21:38:06.9097016Z x1 = (xindex // 10) 2023-01-11T21:38:06.9097087Z x2 = xindex 2023-01-11T21:38:06.9097237Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9097347Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9097434Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9097572Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9097657Z ''') 2023-01-11T21:38:06.9097662Z 2023-01-11T21:38:06.9097667Z 2023-01-11T21:38:06.9097762Z async_compile.wait(globals()) 2023-01-11T21:38:06.9097841Z del async_compile 2023-01-11T21:38:06.9097846Z 2023-01-11T21:38:06.9097919Z def call(args): 2023-01-11T21:38:06.9097991Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9098069Z args.clear() 2023-01-11T21:38:06.9098161Z with torch.cuda.device(0): 2023-01-11T21:38:06.9098364Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9098454Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9098598Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9098670Z del arg0_1 2023-01-11T21:38:06.9098735Z del arg1_1 2023-01-11T21:38:06.9098855Z return (buf0, ) 2023-01-11T21:38:06.9098860Z 2023-01-11T21:38:06.9098865Z 2023-01-11T21:38:06.9098946Z if __name__ == "__main__": 2023-01-11T21:38:06.9099063Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9099186Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9099388Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9099592Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9099711Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9099717Z 2023-01-11T21:38:06.9099780Z ok (0.088s) 2023-01-11T21:38:06.9100264Z test_cuda_broadcast1_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9100403Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9100668Z [2023-01-11 21:37:55,349] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1094 2023-01-11T21:38:06.9100933Z [2023-01-11 21:37:55,421] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1094 2023-01-11T21:38:06.9100939Z 2023-01-11T21:38:06.9101036Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9101110Z import torch 2023-01-11T21:38:06.9101184Z import random 2023-01-11T21:38:06.9101302Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9101419Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9101432Z 2023-01-11T21:38:06.9101508Z aten = torch.ops.aten 2023-01-11T21:38:06.9101644Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9101741Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9101747Z 2023-01-11T21:38:06.9101820Z import triton 2023-01-11T21:38:06.9101911Z import triton.language as tl 2023-01-11T21:38:06.9102034Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9102199Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9102205Z 2023-01-11T21:38:06.9102209Z 2023-01-11T21:38:06.9102357Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9102430Z import triton 2023-01-11T21:38:06.9102522Z import triton.language as tl 2023-01-11T21:38:06.9102638Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9102739Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9102873Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9102998Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9103004Z 2023-01-11T21:38:06.9103422Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9103492Z 
@triton.jit 2023-01-11T21:38:06.9103638Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9103713Z xnumel = 100 2023-01-11T21:38:06.9103810Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9103938Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9104020Z xmask = xindex < xnumel 2023-01-11T21:38:06.9104098Z x1 = (xindex // 10) 2023-01-11T21:38:06.9104162Z x2 = xindex 2023-01-11T21:38:06.9104261Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9104358Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9104437Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9104573Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9104701Z ''') 2023-01-11T21:38:06.9104707Z 2023-01-11T21:38:06.9104711Z 2023-01-11T21:38:06.9104804Z async_compile.wait(globals()) 2023-01-11T21:38:06.9104881Z del async_compile 2023-01-11T21:38:06.9104886Z 2023-01-11T21:38:06.9104954Z def call(args): 2023-01-11T21:38:06.9105033Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9105111Z args.clear() 2023-01-11T21:38:06.9105200Z with torch.cuda.device(0): 2023-01-11T21:38:06.9105407Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9105501Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9105644Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9105711Z del arg0_1 2023-01-11T21:38:06.9105783Z del arg1_1 2023-01-11T21:38:06.9105860Z return (buf0, ) 2023-01-11T21:38:06.9105865Z 2023-01-11T21:38:06.9105870Z 2023-01-11T21:38:06.9105949Z if __name__ == "__main__": 2023-01-11T21:38:06.9106075Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9106202Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9106401Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9106606Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9106721Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9106727Z 2023-01-11T21:38:06.9106796Z ok (0.086s) 2023-01-11T21:38:06.9107280Z test_cuda_broadcast2_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9107412Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9107674Z [2023-01-11 21:37:55,435] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1095 2023-01-11T21:38:06.9107939Z [2023-01-11 21:37:55,506] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1095 2023-01-11T21:38:06.9107944Z 2023-01-11T21:38:06.9108070Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9108146Z import torch 2023-01-11T21:38:06.9108223Z import random 2023-01-11T21:38:06.9108335Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9108459Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9108464Z 2023-01-11T21:38:06.9108544Z aten = torch.ops.aten 2023-01-11T21:38:06.9108680Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9108775Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9108780Z 2023-01-11T21:38:06.9108850Z import triton 2023-01-11T21:38:06.9108945Z import triton.language as tl 2023-01-11T21:38:06.9109072Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9109206Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9109211Z 2023-01-11T21:38:06.9109224Z 2023-01-11T21:38:06.9109371Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9109447Z import triton 2023-01-11T21:38:06.9109540Z import triton.language as tl 2023-01-11T21:38:06.9109653Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9109755Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9109887Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9110013Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9110018Z 2023-01-11T21:38:06.9110427Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9110527Z @triton.jit 2023-01-11T21:38:06.9110667Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9110740Z xnumel = 100 2023-01-11T21:38:06.9110836Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9110967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9111046Z xmask = xindex < xnumel 2023-01-11T21:38:06.9111125Z x1 = (xindex // 10) 2023-01-11T21:38:06.9111193Z x0 = xindex % 10 2023-01-11T21:38:06.9111263Z x2 = xindex 2023-01-11T21:38:06.9111361Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9111458Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9111535Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9111669Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9111747Z ''') 2023-01-11T21:38:06.9111759Z 2023-01-11T21:38:06.9111763Z 2023-01-11T21:38:06.9111849Z async_compile.wait(globals()) 2023-01-11T21:38:06.9111926Z del async_compile 2023-01-11T21:38:06.9111931Z 2023-01-11T21:38:06.9112004Z def call(args): 2023-01-11T21:38:06.9112082Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9112157Z args.clear() 2023-01-11T21:38:06.9112250Z with torch.cuda.device(0): 2023-01-11T21:38:06.9112465Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9112551Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9112694Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9112767Z del arg0_1 2023-01-11T21:38:06.9112839Z del arg1_1 2023-01-11T21:38:06.9112916Z return (buf0, ) 2023-01-11T21:38:06.9112921Z 2023-01-11T21:38:06.9112926Z 2023-01-11T21:38:06.9113005Z if __name__ == "__main__": 2023-01-11T21:38:06.9113122Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9113247Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9113454Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9113652Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9113773Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9113778Z 2023-01-11T21:38:06.9113876Z ok (0.085s) 2023-01-11T21:38:06.9114355Z test_cuda_broadcast2_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9114485Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9114747Z [2023-01-11 21:37:55,520] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1096 2023-01-11T21:38:06.9115014Z [2023-01-11 21:37:55,529] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1096 2023-01-11T21:38:06.9115020Z 2023-01-11T21:38:06.9115118Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9115186Z import torch 2023-01-11T21:38:06.9115261Z import random 2023-01-11T21:38:06.9115379Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9115503Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9115509Z 2023-01-11T21:38:06.9115591Z aten = torch.ops.aten 2023-01-11T21:38:06.9115728Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9115823Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9115829Z 2023-01-11T21:38:06.9115900Z import triton 2023-01-11T21:38:06.9115985Z import triton.language as tl 2023-01-11T21:38:06.9116110Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9116250Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9116283Z 2023-01-11T21:38:06.9116288Z 2023-01-11T21:38:06.9116442Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9116515Z import triton 2023-01-11T21:38:06.9116607Z import triton.language as tl 2023-01-11T21:38:06.9116723Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9116817Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9116950Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9117075Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9117080Z 2023-01-11T21:38:06.9117497Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9117570Z 
@triton.jit 2023-01-11T21:38:06.9117711Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9117788Z xnumel = 10 2023-01-11T21:38:06.9117887Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9118018Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9118095Z xmask = xindex < xnumel 2023-01-11T21:38:06.9118165Z x0 = xindex 2023-01-11T21:38:06.9118266Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9118364Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9118442Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9118578Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9118657Z ''') 2023-01-11T21:38:06.9118667Z 2023-01-11T21:38:06.9118671Z 2023-01-11T21:38:06.9118757Z async_compile.wait(globals()) 2023-01-11T21:38:06.9118833Z del async_compile 2023-01-11T21:38:06.9118839Z 2023-01-11T21:38:06.9118915Z def call(args): 2023-01-11T21:38:06.9118993Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9119068Z args.clear() 2023-01-11T21:38:06.9119162Z with torch.cuda.device(0): 2023-01-11T21:38:06.9119371Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9119456Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9119597Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9119701Z del arg0_1 2023-01-11T21:38:06.9119773Z del arg1_1 2023-01-11T21:38:06.9119849Z return (buf0, ) 2023-01-11T21:38:06.9119854Z 2023-01-11T21:38:06.9119858Z 2023-01-11T21:38:06.9119938Z if __name__ == "__main__": 2023-01-11T21:38:06.9120056Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9120183Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9120389Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9120597Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9120720Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9120725Z 2023-01-11T21:38:06.9120795Z ok (0.023s) 2023-01-11T21:38:06.9121270Z test_cuda_broadcast2_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9121401Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9121662Z [2023-01-11 21:37:55,543] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1097 2023-01-11T21:38:06.9121926Z [2023-01-11 21:37:55,552] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1097 2023-01-11T21:38:06.9121931Z 2023-01-11T21:38:06.9122091Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9122159Z import torch 2023-01-11T21:38:06.9122232Z import random 2023-01-11T21:38:06.9122351Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9122475Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9122481Z 2023-01-11T21:38:06.9122569Z aten = torch.ops.aten 2023-01-11T21:38:06.9122706Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9122803Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9122808Z 2023-01-11T21:38:06.9122883Z import triton 2023-01-11T21:38:06.9122969Z import triton.language as tl 2023-01-11T21:38:06.9123095Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9123235Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9123240Z 2023-01-11T21:38:06.9123245Z 2023-01-11T21:38:06.9123402Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9123475Z import triton 2023-01-11T21:38:06.9123574Z import triton.language as tl 2023-01-11T21:38:06.9123688Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9123791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9123918Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9124046Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9124051Z 2023-01-11T21:38:06.9124464Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9124537Z @triton.jit 2023-01-11T21:38:06.9124682Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9124756Z xnumel = 10 2023-01-11T21:38:06.9124855Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9124982Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9125061Z xmask = xindex < xnumel 2023-01-11T21:38:06.9125132Z x0 = xindex 2023-01-11T21:38:06.9125231Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9125363Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9125446Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9125606Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9125693Z ''') 2023-01-11T21:38:06.9125699Z 2023-01-11T21:38:06.9125703Z 2023-01-11T21:38:06.9125789Z async_compile.wait(globals()) 2023-01-11T21:38:06.9125868Z del async_compile 2023-01-11T21:38:06.9125873Z 2023-01-11T21:38:06.9125950Z def call(args): 2023-01-11T21:38:06.9126029Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9126103Z args.clear() 2023-01-11T21:38:06.9126194Z with torch.cuda.device(0): 2023-01-11T21:38:06.9126403Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9126489Z stream0 = get_cuda_stream(0) 
2023-01-11T21:38:06.9126632Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9126704Z del arg0_1 2023-01-11T21:38:06.9126777Z del arg1_1 2023-01-11T21:38:06.9126854Z return (buf0, ) 2023-01-11T21:38:06.9126860Z 2023-01-11T21:38:06.9126865Z 2023-01-11T21:38:06.9126946Z if __name__ == "__main__": 2023-01-11T21:38:06.9127064Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9127191Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9127396Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9127594Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9127713Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9127718Z 2023-01-11T21:38:06.9127787Z ok (0.023s) 2023-01-11T21:38:06.9128268Z test_cuda_broadcast2_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9128425Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9128683Z [2023-01-11 21:37:55,566] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1098 2023-01-11T21:38:06.9128947Z [2023-01-11 21:37:55,577] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1098 2023-01-11T21:38:06.9128952Z 2023-01-11T21:38:06.9129049Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9129122Z import torch 2023-01-11T21:38:06.9129190Z import random 2023-01-11T21:38:06.9129310Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9129437Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9129442Z 2023-01-11T21:38:06.9129524Z aten = torch.ops.aten 2023-01-11T21:38:06.9129662Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9129759Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9129764Z 2023-01-11T21:38:06.9129840Z import triton 2023-01-11T21:38:06.9129925Z import triton.language as tl 2023-01-11T21:38:06.9130050Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9130190Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9130195Z 2023-01-11T21:38:06.9130200Z 2023-01-11T21:38:06.9130352Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9130426Z import triton 2023-01-11T21:38:06.9130518Z import triton.language as tl 2023-01-11T21:38:06.9130632Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9130734Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9130864Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9130990Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9130995Z 2023-01-11T21:38:06.9131440Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9131518Z @triton.jit 2023-01-11T21:38:06.9131658Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, 
XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9131732Z xnumel = 100 2023-01-11T21:38:06.9131836Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9131964Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9132041Z xmask = xindex < xnumel 2023-01-11T21:38:06.9132120Z x1 = (xindex // 10) 2023-01-11T21:38:06.9132191Z x2 = xindex 2023-01-11T21:38:06.9132290Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9132388Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9132467Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9132602Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9132681Z ''') 2023-01-11T21:38:06.9132687Z 2023-01-11T21:38:06.9132691Z 2023-01-11T21:38:06.9132786Z async_compile.wait(globals()) 2023-01-11T21:38:06.9132861Z del async_compile 2023-01-11T21:38:06.9132866Z 2023-01-11T21:38:06.9132939Z def call(args): 2023-01-11T21:38:06.9133017Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9133093Z args.clear() 2023-01-11T21:38:06.9133184Z with torch.cuda.device(0): 2023-01-11T21:38:06.9133392Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9133483Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9133628Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9133702Z del arg0_1 2023-01-11T21:38:06.9133803Z del arg1_1 2023-01-11T21:38:06.9133880Z return (buf0, ) 2023-01-11T21:38:06.9133885Z 2023-01-11T21:38:06.9133889Z 2023-01-11T21:38:06.9133970Z if __name__ == "__main__": 2023-01-11T21:38:06.9134089Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9134212Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9134423Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9134768Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9134885Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9134890Z 2023-01-11T21:38:06.9134961Z ok (0.025s) 2023-01-11T21:38:06.9135430Z test_cuda_broadcast2_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9135590Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9135876Z [2023-01-11 21:37:55,591] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1099 2023-01-11T21:38:06.9136140Z [2023-01-11 21:37:55,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1099 2023-01-11T21:38:06.9136145Z 2023-01-11T21:38:06.9136243Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9136311Z import torch 2023-01-11T21:38:06.9136387Z import random 2023-01-11T21:38:06.9136508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9136631Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9136636Z 2023-01-11T21:38:06.9136718Z aten = torch.ops.aten 2023-01-11T21:38:06.9136855Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9136949Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9136954Z 2023-01-11T21:38:06.9137021Z import triton 2023-01-11T21:38:06.9137113Z import triton.language as tl 2023-01-11T21:38:06.9137307Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9137494Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9137500Z 2023-01-11T21:38:06.9137505Z 2023-01-11T21:38:06.9137665Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9137738Z import triton 2023-01-11T21:38:06.9137830Z import triton.language as tl 2023-01-11T21:38:06.9137945Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9138041Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9138176Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9138302Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9138309Z 2023-01-11T21:38:06.9138723Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9138796Z @triton.jit 2023-01-11T21:38:06.9138940Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9139014Z xnumel = 100 2023-01-11T21:38:06.9139113Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9139235Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9139318Z xmask = xindex < xnumel 2023-01-11T21:38:06.9139399Z x1 = (xindex // 10) 2023-01-11T21:38:06.9139469Z x2 = xindex 2023-01-11T21:38:06.9139566Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9139664Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9139752Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9139825Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9140000Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9140088Z ''') 2023-01-11T21:38:06.9140094Z 2023-01-11T21:38:06.9140098Z 2023-01-11T21:38:06.9140193Z async_compile.wait(globals()) 2023-01-11T21:38:06.9140272Z del async_compile 2023-01-11T21:38:06.9140277Z 2023-01-11T21:38:06.9140355Z def call(args): 2023-01-11T21:38:06.9140436Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9140507Z args.clear() 2023-01-11T21:38:06.9140600Z with torch.cuda.device(0): 2023-01-11T21:38:06.9140816Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9140911Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9141060Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9141136Z del arg0_1 2023-01-11T21:38:06.9141210Z del arg1_1 2023-01-11T21:38:06.9141284Z return (buf0, ) 2023-01-11T21:38:06.9141302Z 2023-01-11T21:38:06.9141306Z 2023-01-11T21:38:06.9141382Z if __name__ == "__main__": 2023-01-11T21:38:06.9141501Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9141630Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9141846Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9142053Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9142175Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9142181Z 2023-01-11T21:38:06.9142254Z ok (0.088s) 2023-01-11T21:38:06.9142731Z test_cuda_broadcast2_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9142871Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9143128Z [2023-01-11 21:37:55,678] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1100 2023-01-11T21:38:06.9143420Z [2023-01-11 21:37:55,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1100 2023-01-11T21:38:06.9143426Z 2023-01-11T21:38:06.9143527Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9143603Z import torch 2023-01-11T21:38:06.9143680Z import random 2023-01-11T21:38:06.9143800Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9143925Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9143930Z 2023-01-11T21:38:06.9144014Z aten = torch.ops.aten 2023-01-11T21:38:06.9144147Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9144240Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9144249Z 2023-01-11T21:38:06.9144323Z import triton 2023-01-11T21:38:06.9144419Z import triton.language as tl 2023-01-11T21:38:06.9144546Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9144690Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9144696Z 2023-01-11T21:38:06.9144702Z 2023-01-11T21:38:06.9144860Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9144940Z import triton 2023-01-11T21:38:06.9145028Z import triton.language as tl 2023-01-11T21:38:06.9145143Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9145248Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9145387Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9145525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9145532Z 2023-01-11T21:38:06.9145994Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9146108Z 
@triton.jit 2023-01-11T21:38:06.9146253Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9146324Z xnumel = 100 2023-01-11T21:38:06.9146425Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9146558Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9146643Z xmask = xindex < xnumel 2023-01-11T21:38:06.9146721Z x1 = (xindex // 10) 2023-01-11T21:38:06.9146797Z x0 = xindex % 10 2023-01-11T21:38:06.9146872Z x2 = xindex 2023-01-11T21:38:06.9146967Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9147067Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9147157Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9147238Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9147376Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9147465Z ''') 2023-01-11T21:38:06.9147470Z 2023-01-11T21:38:06.9147475Z 2023-01-11T21:38:06.9147570Z async_compile.wait(globals()) 2023-01-11T21:38:06.9147643Z del async_compile 2023-01-11T21:38:06.9147648Z 2023-01-11T21:38:06.9147724Z def call(args): 2023-01-11T21:38:06.9147809Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9147886Z args.clear() 2023-01-11T21:38:06.9147979Z with torch.cuda.device(0): 2023-01-11T21:38:06.9148194Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9148290Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9148430Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9148507Z del arg0_1 2023-01-11T21:38:06.9148582Z del arg1_1 2023-01-11T21:38:06.9148663Z return (buf0, ) 2023-01-11T21:38:06.9148668Z 2023-01-11T21:38:06.9148672Z 2023-01-11T21:38:06.9148755Z if __name__ == "__main__": 2023-01-11T21:38:06.9148875Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9149004Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9149217Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9149437Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9149561Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9149566Z 2023-01-11T21:38:06.9149641Z ok (0.086s) 2023-01-11T21:38:06.9150123Z test_cuda_broadcast2_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9150263Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9150526Z [2023-01-11 21:37:55,765] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1101 2023-01-11T21:38:06.9150796Z [2023-01-11 21:37:55,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1101 2023-01-11T21:38:06.9150802Z 2023-01-11T21:38:06.9150903Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9150979Z import torch 2023-01-11T21:38:06.9151056Z import random 2023-01-11T21:38:06.9151171Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9151297Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9151302Z 2023-01-11T21:38:06.9151387Z aten = torch.ops.aten 2023-01-11T21:38:06.9151527Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9151626Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9151631Z 2023-01-11T21:38:06.9151707Z import triton 2023-01-11T21:38:06.9151832Z import triton.language as tl 2023-01-11T21:38:06.9151953Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9152099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9152104Z 2023-01-11T21:38:06.9152109Z 2023-01-11T21:38:06.9152271Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9152350Z import triton 2023-01-11T21:38:06.9152445Z import triton.language as tl 2023-01-11T21:38:06.9152562Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9152667Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9152802Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9152925Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9152930Z 2023-01-11T21:38:06.9153353Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9153431Z @triton.jit 2023-01-11T21:38:06.9153574Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9153652Z xnumel = 100 2023-01-11T21:38:06.9153756Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9153894Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9153979Z xmask = xindex < xnumel 2023-01-11T21:38:06.9154054Z x1 = (xindex // 10) 2023-01-11T21:38:06.9154130Z x0 = xindex % 10 2023-01-11T21:38:06.9154203Z x2 = xindex 2023-01-11T21:38:06.9154307Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9154418Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9154501Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9154642Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9154722Z ''') 2023-01-11T21:38:06.9154728Z 2023-01-11T21:38:06.9154735Z 2023-01-11T21:38:06.9154830Z async_compile.wait(globals()) 2023-01-11T21:38:06.9154909Z del async_compile 2023-01-11T21:38:06.9154914Z 2023-01-11T21:38:06.9154991Z def call(args): 2023-01-11T21:38:06.9155071Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9155149Z args.clear() 2023-01-11T21:38:06.9155247Z with torch.cuda.device(0): 2023-01-11T21:38:06.9155525Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9155629Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9155772Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9155846Z del arg0_1 2023-01-11T21:38:06.9155918Z del arg1_1 2023-01-11T21:38:06.9155997Z return (buf0, ) 2023-01-11T21:38:06.9156002Z 2023-01-11T21:38:06.9156007Z 2023-01-11T21:38:06.9156088Z if __name__ == "__main__": 2023-01-11T21:38:06.9156207Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9156329Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9156540Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9156746Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9156866Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9156871Z 2023-01-11T21:38:06.9156941Z ok (0.088s) 2023-01-11T21:38:06.9157427Z test_cuda_broadcast2_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9157559Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9157849Z [2023-01-11 21:37:55,853] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1102 2023-01-11T21:38:06.9158116Z [2023-01-11 21:37:55,865] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1102 2023-01-11T21:38:06.9158121Z 2023-01-11T21:38:06.9158224Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9158294Z import torch 2023-01-11T21:38:06.9158368Z import random 2023-01-11T21:38:06.9158487Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9158609Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9158615Z 2023-01-11T21:38:06.9158697Z aten = torch.ops.aten 2023-01-11T21:38:06.9158835Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9158928Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9158933Z 2023-01-11T21:38:06.9159001Z import triton 2023-01-11T21:38:06.9159092Z import triton.language as tl 2023-01-11T21:38:06.9159218Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9159362Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9159368Z 2023-01-11T21:38:06.9159372Z 2023-01-11T21:38:06.9159527Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9159602Z import triton 2023-01-11T21:38:06.9159694Z import triton.language as tl 2023-01-11T21:38:06.9159811Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9159906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9160039Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9160166Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9160171Z 2023-01-11T21:38:06.9160588Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9160660Z 
@triton.jit 2023-01-11T21:38:06.9160803Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9160873Z xnumel = 100 2023-01-11T21:38:06.9160970Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9161093Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9161175Z xmask = xindex < xnumel 2023-01-11T21:38:06.9161286Z x0 = xindex % 10 2023-01-11T21:38:06.9161359Z x2 = xindex 2023-01-11T21:38:06.9161457Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9161556Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9161631Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9161762Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9161847Z ''') 2023-01-11T21:38:06.9161853Z 2023-01-11T21:38:06.9161857Z 2023-01-11T21:38:06.9161949Z async_compile.wait(globals()) 2023-01-11T21:38:06.9162025Z del async_compile 2023-01-11T21:38:06.9162031Z 2023-01-11T21:38:06.9162103Z def call(args): 2023-01-11T21:38:06.9162185Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9162260Z args.clear() 2023-01-11T21:38:06.9162346Z with torch.cuda.device(0): 2023-01-11T21:38:06.9162560Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9162653Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9162801Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9162874Z del arg0_1 2023-01-11T21:38:06.9162948Z del arg1_1 2023-01-11T21:38:06.9163027Z return (buf0, ) 2023-01-11T21:38:06.9163032Z 2023-01-11T21:38:06.9163036Z 2023-01-11T21:38:06.9163116Z if __name__ == "__main__": 2023-01-11T21:38:06.9163227Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9163354Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9163563Z arg0_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9163794Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9163914Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9163919Z 2023-01-11T21:38:06.9163991Z ok (0.026s) 2023-01-11T21:38:06.9164475Z test_cuda_broadcast3_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9164606Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9164865Z [2023-01-11 21:37:55,878] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1103 2023-01-11T21:38:06.9165122Z [2023-01-11 21:37:55,945] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1103 2023-01-11T21:38:06.9165138Z 2023-01-11T21:38:06.9165230Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9165304Z import torch 2023-01-11T21:38:06.9165380Z import random 2023-01-11T21:38:06.9165502Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9165628Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9165633Z 2023-01-11T21:38:06.9165717Z aten = torch.ops.aten 2023-01-11T21:38:06.9165854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9165942Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9165947Z 2023-01-11T21:38:06.9166017Z import triton 2023-01-11T21:38:06.9166107Z import triton.language as tl 2023-01-11T21:38:06.9166231Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9166368Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9166373Z 2023-01-11T21:38:06.9166380Z 2023-01-11T21:38:06.9166531Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9166606Z import triton 2023-01-11T21:38:06.9166699Z import triton.language as tl 2023-01-11T21:38:06.9166806Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9166906Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9167068Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9167197Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9167202Z 2023-01-11T21:38:06.9167620Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9167694Z @triton.jit 2023-01-11T21:38:06.9167835Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9167908Z xnumel = 10 2023-01-11T21:38:06.9167999Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9168133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9168217Z xmask = xindex < xnumel 2023-01-11T21:38:06.9168287Z x0 = xindex 2023-01-11T21:38:06.9168419Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9168520Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9168599Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9168727Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9168811Z ''') 2023-01-11T21:38:06.9168817Z 2023-01-11T21:38:06.9168821Z 2023-01-11T21:38:06.9168916Z async_compile.wait(globals()) 2023-01-11T21:38:06.9168993Z del async_compile 2023-01-11T21:38:06.9168998Z 2023-01-11T21:38:06.9169071Z def call(args): 2023-01-11T21:38:06.9169151Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9169229Z args.clear() 2023-01-11T21:38:06.9169314Z with torch.cuda.device(0): 2023-01-11T21:38:06.9169513Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9169635Z stream0 = get_cuda_stream(0) 
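        # The 1-D launch grid comes from inductor's grid() helper, which derives
        # the number of program instances from the element count (10 here) and
        # the autotuned XBLOCK; the kernel is enqueued on the current CUDA stream.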
2023-01-11T21:38:06.9169777Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9169851Z del arg0_1 2023-01-11T21:38:06.9169924Z del arg1_1 2023-01-11T21:38:06.9170003Z return (buf0, ) 2023-01-11T21:38:06.9170008Z 2023-01-11T21:38:06.9170013Z 2023-01-11T21:38:06.9170091Z if __name__ == "__main__": 2023-01-11T21:38:06.9170203Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9170331Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9170531Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9170730Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9170852Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9170857Z 2023-01-11T21:38:06.9170931Z ok (0.080s) 2023-01-11T21:38:06.9171418Z test_cuda_broadcast3_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9171550Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9171809Z [2023-01-11 21:37:55,958] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1104 2023-01-11T21:38:06.9172068Z [2023-01-11 21:37:55,969] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1104 2023-01-11T21:38:06.9172079Z 2023-01-11T21:38:06.9172171Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9172245Z import torch 2023-01-11T21:38:06.9172320Z import random 2023-01-11T21:38:06.9172440Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9172562Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9172567Z 2023-01-11T21:38:06.9172649Z aten = torch.ops.aten 2023-01-11T21:38:06.9172786Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9172912Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9172917Z 2023-01-11T21:38:06.9172994Z import triton 2023-01-11T21:38:06.9173086Z import triton.language as tl 2023-01-11T21:38:06.9173211Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9173350Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9173355Z 2023-01-11T21:38:06.9173360Z 2023-01-11T21:38:06.9173516Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9173587Z import triton 2023-01-11T21:38:06.9173679Z import triton.language as tl 2023-01-11T21:38:06.9173787Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9173897Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9174032Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9174160Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9174166Z 2023-01-11T21:38:06.9174708Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9174784Z @triton.jit 2023-01-11T21:38:06.9174926Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK 
: tl.constexpr): 2023-01-11T21:38:06.9174997Z xnumel = 10 2023-01-11T21:38:06.9175088Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9175223Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9175306Z xmask = xindex < xnumel 2023-01-11T21:38:06.9175377Z x0 = xindex 2023-01-11T21:38:06.9175573Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9175680Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9175776Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9175911Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9175996Z ''') 2023-01-11T21:38:06.9176005Z 2023-01-11T21:38:06.9176009Z 2023-01-11T21:38:06.9176102Z async_compile.wait(globals()) 2023-01-11T21:38:06.9176178Z del async_compile 2023-01-11T21:38:06.9176183Z 2023-01-11T21:38:06.9176259Z def call(args): 2023-01-11T21:38:06.9176338Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9176411Z args.clear() 2023-01-11T21:38:06.9176500Z with torch.cuda.device(0): 2023-01-11T21:38:06.9176705Z buf0 = empty_strided((1, 10, 1), (10, 1, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9176796Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9176939Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9177017Z del arg0_1 2023-01-11T21:38:06.9177089Z del arg1_1 2023-01-11T21:38:06.9177230Z return (buf0, ) 2023-01-11T21:38:06.9177237Z 2023-01-11T21:38:06.9177241Z 2023-01-11T21:38:06.9177330Z if __name__ == "__main__": 2023-01-11T21:38:06.9177445Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9177571Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9177771Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9177979Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9178100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9178105Z 2023-01-11T21:38:06.9178174Z ok (0.024s) 2023-01-11T21:38:06.9178656Z test_cuda_broadcast3_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9178789Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9179092Z [2023-01-11 21:37:55,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1105 2023-01-11T21:38:06.9179357Z [2023-01-11 21:37:56,046] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1105 2023-01-11T21:38:06.9179363Z 2023-01-11T21:38:06.9179456Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9179533Z import torch 2023-01-11T21:38:06.9179608Z import random 2023-01-11T21:38:06.9179729Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9179851Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9179860Z 2023-01-11T21:38:06.9179942Z aten = torch.ops.aten 2023-01-11T21:38:06.9180078Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9180167Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9180178Z 2023-01-11T21:38:06.9180245Z import triton 2023-01-11T21:38:06.9180337Z import triton.language as tl 2023-01-11T21:38:06.9180464Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9180604Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9180609Z 2023-01-11T21:38:06.9180614Z 2023-01-11T21:38:06.9180768Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9180842Z import triton 2023-01-11T21:38:06.9180934Z import triton.language as tl 2023-01-11T21:38:06.9181041Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9181142Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9181278Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9181403Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9181435Z 2023-01-11T21:38:06.9181853Z @pointwise(size_hints=[1], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9181929Z @triton.jit 2023-01-11T21:38:06.9182073Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9182145Z xnumel = 1 2023-01-11T21:38:06.9182236Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9182364Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9182447Z xmask = xindex < xnumel 2023-01-11T21:38:06.9182580Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9182709Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9182789Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9182925Z tl.store(out_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), tmp2, None) 2023-01-11T21:38:06.9183004Z ''') 2023-01-11T21:38:06.9183016Z 2023-01-11T21:38:06.9183021Z 2023-01-11T21:38:06.9183108Z async_compile.wait(globals()) 2023-01-11T21:38:06.9183186Z del async_compile 2023-01-11T21:38:06.9183192Z 2023-01-11T21:38:06.9183266Z def call(args): 2023-01-11T21:38:06.9183348Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9183423Z args.clear() 2023-01-11T21:38:06.9183514Z with torch.cuda.device(0): 2023-01-11T21:38:06.9183713Z buf0 = empty_strided((1, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9183799Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9183940Z 
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 1, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9184012Z del arg0_1 2023-01-11T21:38:06.9184084Z del arg1_1 2023-01-11T21:38:06.9184160Z return (buf0, ) 2023-01-11T21:38:06.9184167Z 2023-01-11T21:38:06.9184171Z 2023-01-11T21:38:06.9184253Z if __name__ == "__main__": 2023-01-11T21:38:06.9184371Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9184498Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9184690Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9184912Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9185033Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9185039Z 2023-01-11T21:38:06.9185109Z ok (0.077s) 2023-01-11T21:38:06.9185586Z test_cuda_broadcast3_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9185730Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9186031Z [2023-01-11 21:37:56,060] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1106 2023-01-11T21:38:06.9186300Z [2023-01-11 21:37:56,129] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1106 2023-01-11T21:38:06.9186306Z 2023-01-11T21:38:06.9186402Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9186470Z import torch 2023-01-11T21:38:06.9186544Z import random 2023-01-11T21:38:06.9186662Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9186787Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9186792Z 2023-01-11T21:38:06.9186874Z aten = torch.ops.aten 2023-01-11T21:38:06.9187011Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9187107Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9187112Z 2023-01-11T21:38:06.9187225Z import triton 2023-01-11T21:38:06.9187310Z import triton.language as tl 2023-01-11T21:38:06.9187435Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9187576Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9187581Z 2023-01-11T21:38:06.9187586Z 2023-01-11T21:38:06.9187743Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9187817Z import triton 2023-01-11T21:38:06.9187909Z import triton.language as tl 2023-01-11T21:38:06.9188023Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9188118Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9188252Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9188380Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9188385Z 2023-01-11T21:38:06.9188802Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9188877Z @triton.jit 2023-01-11T21:38:06.9189021Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
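    # Each program instance covers one XBLOCK-sized slice of the flattened
    # output; xmask turns off lanes past xnumel=100 in the final block, and the
    # scalar input is fetched through a zero offset vector so every lane reads
    # the same broadcast value.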
2023-01-11T21:38:06.9189096Z xnumel = 100 2023-01-11T21:38:06.9189194Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9189320Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9189403Z xmask = xindex < xnumel 2023-01-11T21:38:06.9189473Z x0 = xindex 2023-01-11T21:38:06.9189606Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9189703Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9189782Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9189918Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9189996Z ''') 2023-01-11T21:38:06.9190010Z 2023-01-11T21:38:06.9190014Z 2023-01-11T21:38:06.9190100Z async_compile.wait(globals()) 2023-01-11T21:38:06.9190177Z del async_compile 2023-01-11T21:38:06.9190183Z 2023-01-11T21:38:06.9190257Z def call(args): 2023-01-11T21:38:06.9190333Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9190408Z args.clear() 2023-01-11T21:38:06.9190499Z with torch.cuda.device(0): 2023-01-11T21:38:06.9190732Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9190819Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9190961Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9191035Z del arg0_1 2023-01-11T21:38:06.9191105Z del arg1_1 2023-01-11T21:38:06.9191183Z return (buf0, ) 2023-01-11T21:38:06.9191188Z 2023-01-11T21:38:06.9191192Z 2023-01-11T21:38:06.9191270Z if __name__ == "__main__": 2023-01-11T21:38:06.9191388Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9191514Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9191709Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9191914Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9192039Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9192044Z 2023-01-11T21:38:06.9192115Z ok (0.082s) 2023-01-11T21:38:06.9192596Z test_cuda_broadcast3_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9192729Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9192994Z [2023-01-11 21:37:56,143] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1107 2023-01-11T21:38:06.9193289Z [2023-01-11 21:37:56,211] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1107 2023-01-11T21:38:06.9193295Z 2023-01-11T21:38:06.9193393Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9193460Z import torch 2023-01-11T21:38:06.9193536Z import random 2023-01-11T21:38:06.9193656Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9193784Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9193790Z 2023-01-11T21:38:06.9193871Z aten = torch.ops.aten 2023-01-11T21:38:06.9194007Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9194106Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9194111Z 2023-01-11T21:38:06.9194184Z import triton 2023-01-11T21:38:06.9194269Z import triton.language as tl 2023-01-11T21:38:06.9194394Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9194536Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9194545Z 2023-01-11T21:38:06.9194549Z 2023-01-11T21:38:06.9194704Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9194778Z import triton 2023-01-11T21:38:06.9194870Z import triton.language as tl 2023-01-11T21:38:06.9194986Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9195084Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9195218Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9195346Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9195351Z 2023-01-11T21:38:06.9195769Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9195845Z @triton.jit 2023-01-11T21:38:06.9195986Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9196063Z xnumel = 100 2023-01-11T21:38:06.9196160Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9196290Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9196367Z xmask = xindex < xnumel 2023-01-11T21:38:06.9196438Z x0 = xindex 2023-01-11T21:38:06.9196596Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9196695Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9196783Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9196861Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9196998Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9197077Z ''') 2023-01-11T21:38:06.9197082Z 2023-01-11T21:38:06.9197087Z 2023-01-11T21:38:06.9197179Z async_compile.wait(globals()) 2023-01-11T21:38:06.9197255Z del async_compile 2023-01-11T21:38:06.9197260Z 2023-01-11T21:38:06.9197333Z def call(args): 2023-01-11T21:38:06.9197412Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9197489Z args.clear() 2023-01-11T21:38:06.9197579Z with torch.cuda.device(0): 2023-01-11T21:38:06.9197779Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 
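        # buf0 is float64: adding a float32 scalar to a float64 tensor promotes
        # to double, so the kernel upcasts the scalar with .to(tl.float64)
        # before the add.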
2023-01-11T21:38:06.9197874Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9198021Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9198094Z del arg0_1 2023-01-11T21:38:06.9198165Z del arg1_1 2023-01-11T21:38:06.9198242Z return (buf0, ) 2023-01-11T21:38:06.9198247Z 2023-01-11T21:38:06.9198252Z 2023-01-11T21:38:06.9198330Z if __name__ == "__main__": 2023-01-11T21:38:06.9198448Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9198569Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9198769Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9198973Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9199125Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9199131Z 2023-01-11T21:38:06.9199201Z ok (0.082s) 2023-01-11T21:38:06.9199682Z test_cuda_broadcast3_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9199815Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9200076Z [2023-01-11 21:37:56,225] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1108 2023-01-11T21:38:06.9200343Z [2023-01-11 21:37:56,290] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1108 2023-01-11T21:38:06.9200351Z 2023-01-11T21:38:06.9200453Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9200521Z import torch 2023-01-11T21:38:06.9200596Z import random 2023-01-11T21:38:06.9200715Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9200838Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9200843Z 2023-01-11T21:38:06.9200929Z aten = torch.ops.aten 2023-01-11T21:38:06.9201068Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9201162Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9201168Z 2023-01-11T21:38:06.9201234Z import triton 2023-01-11T21:38:06.9201325Z import triton.language as tl 2023-01-11T21:38:06.9201450Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9201589Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9201595Z 2023-01-11T21:38:06.9201599Z 2023-01-11T21:38:06.9201752Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9201828Z import triton 2023-01-11T21:38:06.9201920Z import triton.language as tl 2023-01-11T21:38:06.9202035Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9202130Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9202262Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9202413Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9202420Z 2023-01-11T21:38:06.9202836Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9202911Z @triton.jit 
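# Broadcast add of a float32 scalar onto an int32 vector: the integer lanes
# are upcast to float32 (tmp1.to(tl.float32)) before the add, matching eager
# type promotion, and the result is stored as float32.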
2023-01-11T21:38:06.9203052Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9203125Z xnumel = 10 2023-01-11T21:38:06.9203223Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9203346Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9203432Z xmask = xindex < xnumel 2023-01-11T21:38:06.9203503Z x0 = xindex 2023-01-11T21:38:06.9203634Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9203733Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9203825Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9203903Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9204032Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9204117Z ''') 2023-01-11T21:38:06.9204122Z 2023-01-11T21:38:06.9204127Z 2023-01-11T21:38:06.9204219Z async_compile.wait(globals()) 2023-01-11T21:38:06.9204295Z del async_compile 2023-01-11T21:38:06.9204301Z 2023-01-11T21:38:06.9204375Z def call(args): 2023-01-11T21:38:06.9204453Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9204527Z args.clear() 2023-01-11T21:38:06.9204611Z with torch.cuda.device(0): 2023-01-11T21:38:06.9204809Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9204927Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9205070Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9205144Z del arg0_1 2023-01-11T21:38:06.9205214Z del arg1_1 2023-01-11T21:38:06.9205292Z return (buf0, ) 2023-01-11T21:38:06.9205298Z 2023-01-11T21:38:06.9205302Z 2023-01-11T21:38:06.9205379Z if __name__ == "__main__": 2023-01-11T21:38:06.9205490Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9205617Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9205816Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9206012Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9206132Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9206137Z 2023-01-11T21:38:06.9206207Z ok (0.079s) 2023-01-11T21:38:06.9206695Z test_cuda_broadcast3_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9206826Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9207090Z [2023-01-11 21:37:56,304] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1109 2023-01-11T21:38:06.9207349Z [2023-01-11 21:37:56,376] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1109 2023-01-11T21:38:06.9207359Z 2023-01-11T21:38:06.9207451Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9207525Z import torch 2023-01-11T21:38:06.9207600Z import random 2023-01-11T21:38:06.9207722Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9207844Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9207849Z 2023-01-11T21:38:06.9207927Z aten = torch.ops.aten 2023-01-11T21:38:06.9208060Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9208176Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9208182Z 2023-01-11T21:38:06.9208255Z import triton 2023-01-11T21:38:06.9208346Z import triton.language as tl 2023-01-11T21:38:06.9208471Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9208608Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9208614Z 2023-01-11T21:38:06.9208618Z 2023-01-11T21:38:06.9208771Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9208843Z import triton 2023-01-11T21:38:06.9208937Z import triton.language as tl 2023-01-11T21:38:06.9209045Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9209150Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9209280Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9209405Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9209410Z 2023-01-11T21:38:06.9209828Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9209903Z @triton.jit 2023-01-11T21:38:06.9210046Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9210117Z xnumel = 100 2023-01-11T21:38:06.9210209Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9210337Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9210419Z xmask = xindex < xnumel 2023-01-11T21:38:06.9210497Z x0 = xindex % 10 2023-01-11T21:38:06.9210576Z x1 = (xindex // 10) 2023-01-11T21:38:06.9210675Z x2 = xindex 2023-01-11T21:38:06.9210807Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9210910Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9210988Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9211126Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9211209Z ''') 2023-01-11T21:38:06.9211215Z 2023-01-11T21:38:06.9211219Z 2023-01-11T21:38:06.9211314Z async_compile.wait(globals()) 2023-01-11T21:38:06.9211390Z del async_compile 2023-01-11T21:38:06.9211395Z 2023-01-11T21:38:06.9211468Z def call(args): 2023-01-11T21:38:06.9211540Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9211613Z args.clear() 2023-01-11T21:38:06.9211705Z with torch.cuda.device(0): 2023-01-11T21:38:06.9211909Z buf0 = empty_strided((10, 10), (10, 
1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9212000Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9212146Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9212219Z del arg0_1 2023-01-11T21:38:06.9212284Z del arg1_1 2023-01-11T21:38:06.9212360Z return (buf0, ) 2023-01-11T21:38:06.9212365Z 2023-01-11T21:38:06.9212369Z 2023-01-11T21:38:06.9212452Z if __name__ == "__main__": 2023-01-11T21:38:06.9212570Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9212697Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9212897Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9213101Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9213218Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9213223Z 2023-01-11T21:38:06.9213292Z ok (0.086s) 2023-01-11T21:38:06.9213767Z test_cuda_broadcast3_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9213956Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9214218Z [2023-01-11 21:37:56,391] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1110 2023-01-11T21:38:06.9214589Z [2023-01-11 21:37:56,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1110 2023-01-11T21:38:06.9214595Z 2023-01-11T21:38:06.9214695Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9214770Z import torch 2023-01-11T21:38:06.9214844Z import random 2023-01-11T21:38:06.9214961Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9215089Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9215094Z 2023-01-11T21:38:06.9215169Z aten = torch.ops.aten 2023-01-11T21:38:06.9215305Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9215400Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9215405Z 2023-01-11T21:38:06.9215489Z import triton 2023-01-11T21:38:06.9215595Z import triton.language as tl 2023-01-11T21:38:06.9215743Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9215880Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9215885Z 2023-01-11T21:38:06.9215890Z 2023-01-11T21:38:06.9216041Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9216109Z import triton 2023-01-11T21:38:06.9216200Z import triton.language as tl 2023-01-11T21:38:06.9216313Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9216415Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9216547Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9216714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9216719Z 2023-01-11T21:38:06.9223051Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
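# In the @pointwise meta above, size_hints buckets the problem size for
# autotuning XBLOCK, signature records the argument dtypes, and
# divisible_by_16 lists the arguments Triton may assume are divisible by 16,
# enabling aligned, vectorized access.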
2023-01-11T21:38:06.9223148Z @triton.jit 2023-01-11T21:38:06.9223296Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9223373Z xnumel = 100 2023-01-11T21:38:06.9223465Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9223598Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9223681Z xmask = xindex < xnumel 2023-01-11T21:38:06.9223749Z x0 = xindex 2023-01-11T21:38:06.9223887Z tmp0 = tl.load(in_ptr0 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9223978Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9224059Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9224192Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9224293Z ''') 2023-01-11T21:38:06.9224299Z 2023-01-11T21:38:06.9224304Z 2023-01-11T21:38:06.9224397Z async_compile.wait(globals()) 2023-01-11T21:38:06.9224477Z del async_compile 2023-01-11T21:38:06.9224483Z 2023-01-11T21:38:06.9224560Z def call(args): 2023-01-11T21:38:06.9224641Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9224710Z args.clear() 2023-01-11T21:38:06.9224801Z with torch.cuda.device(0): 2023-01-11T21:38:06.9225005Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9225096Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9225238Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9225310Z del arg0_1 2023-01-11T21:38:06.9225381Z del arg1_1 2023-01-11T21:38:06.9225455Z return (buf0, ) 2023-01-11T21:38:06.9225460Z 2023-01-11T21:38:06.9225465Z 2023-01-11T21:38:06.9225544Z if __name__ == "__main__": 2023-01-11T21:38:06.9225660Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9225785Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9226087Z arg0_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9226302Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9226421Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9226427Z 2023-01-11T21:38:06.9226497Z ok (0.025s) 2023-01-11T21:38:06.9226968Z test_cuda_dense_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9227099Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9227358Z [2023-01-11 21:37:56,415] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1111 2023-01-11T21:38:06.9227625Z [2023-01-11 21:37:56,485] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1111 2023-01-11T21:38:06.9227631Z 2023-01-11T21:38:06.9227728Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9227802Z import torch 2023-01-11T21:38:06.9227879Z import random 2023-01-11T21:38:06.9227993Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9228116Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9228121Z 2023-01-11T21:38:06.9228196Z aten = torch.ops.aten 2023-01-11T21:38:06.9228331Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9228458Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9228464Z 2023-01-11T21:38:06.9228539Z import triton 2023-01-11T21:38:06.9228635Z import triton.language as tl 2023-01-11T21:38:06.9228762Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9228904Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9228912Z 2023-01-11T21:38:06.9228916Z 2023-01-11T21:38:06.9229075Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9229145Z import triton 2023-01-11T21:38:06.9229238Z import triton.language as tl 2023-01-11T21:38:06.9229354Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9229460Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9229592Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9229720Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9229725Z 2023-01-11T21:38:06.9230142Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9230221Z @triton.jit 2023-01-11T21:38:06.9230358Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9230437Z xnumel = 100 2023-01-11T21:38:06.9230536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9230667Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9230752Z xmask = xindex < xnumel 2023-01-11T21:38:06.9230825Z x2 = xindex 2023-01-11T21:38:06.9230902Z x0 = xindex % 10 2023-01-11T21:38:06.9230995Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9231092Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9231174Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9231309Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9231396Z ''') 2023-01-11T21:38:06.9231402Z 2023-01-11T21:38:06.9231406Z 2023-01-11T21:38:06.9231501Z async_compile.wait(globals()) 2023-01-11T21:38:06.9231580Z del async_compile 2023-01-11T21:38:06.9231585Z 2023-01-11T21:38:06.9231663Z def call(args): 2023-01-11T21:38:06.9231738Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9231816Z args.clear() 2023-01-11T21:38:06.9231938Z with torch.cuda.device(0): 2023-01-11T21:38:06.9232147Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9232242Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9232384Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9232462Z del arg0_1 2023-01-11T21:38:06.9232530Z del arg1_1 2023-01-11T21:38:06.9232611Z return (buf0, ) 2023-01-11T21:38:06.9232616Z 2023-01-11T21:38:06.9232620Z 2023-01-11T21:38:06.9232704Z if __name__ == "__main__": 2023-01-11T21:38:06.9232822Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9232951Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9233156Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9233353Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9233469Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9233481Z 2023-01-11T21:38:06.9233548Z ok (0.084s) 2023-01-11T21:38:06.9234028Z test_cuda_dense_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9234164Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9234462Z [2023-01-11 21:37:56,499] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1112 2023-01-11T21:38:06.9234731Z [2023-01-11 21:37:56,570] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1112 2023-01-11T21:38:06.9234737Z 2023-01-11T21:38:06.9234841Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9234917Z import torch 2023-01-11T21:38:06.9234994Z import random 2023-01-11T21:38:06.9235115Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9235234Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9235239Z 2023-01-11T21:38:06.9235323Z aten = torch.ops.aten 2023-01-11T21:38:06.9235463Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9235561Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9235566Z 2023-01-11T21:38:06.9235641Z import triton 2023-01-11T21:38:06.9235736Z import triton.language as tl 2023-01-11T21:38:06.9235865Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9236000Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9236011Z 2023-01-11T21:38:06.9236015Z 2023-01-11T21:38:06.9236164Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9236241Z import triton 2023-01-11T21:38:06.9236338Z import triton.language as tl 2023-01-11T21:38:06.9236456Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9236559Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9236692Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9236817Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9236822Z 2023-01-11T21:38:06.9237240Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9237311Z @triton.jit 2023-01-11T21:38:06.9237454Z def triton_(in_ptr0, in_ptr1, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9237533Z xnumel = 100 2023-01-11T21:38:06.9237630Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9237761Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9237874Z xmask = xindex < xnumel 2023-01-11T21:38:06.9237951Z x2 = xindex 2023-01-11T21:38:06.9238026Z x1 = (xindex // 10) 2023-01-11T21:38:06.9238126Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9238223Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9238302Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9238437Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9238524Z ''') 2023-01-11T21:38:06.9238529Z 2023-01-11T21:38:06.9238534Z 2023-01-11T21:38:06.9238629Z async_compile.wait(globals()) 2023-01-11T21:38:06.9238702Z del async_compile 2023-01-11T21:38:06.9238716Z 2023-01-11T21:38:06.9238786Z def call(args): 2023-01-11T21:38:06.9238867Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9238945Z args.clear() 2023-01-11T21:38:06.9239039Z with torch.cuda.device(0): 2023-01-11T21:38:06.9239249Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9239345Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9239490Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9239559Z del arg0_1 2023-01-11T21:38:06.9239636Z del arg1_1 2023-01-11T21:38:06.9239716Z return (buf0, ) 2023-01-11T21:38:06.9239721Z 2023-01-11T21:38:06.9239726Z 2023-01-11T21:38:06.9239806Z if __name__ == "__main__": 2023-01-11T21:38:06.9239924Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9240053Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9240259Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9240487Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9240612Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9240618Z 2023-01-11T21:38:06.9240691Z ok (0.085s) 2023-01-11T21:38:06.9241169Z test_cuda_dense_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9241303Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9241562Z [2023-01-11 21:37:56,583] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1113 2023-01-11T21:38:06.9241829Z [2023-01-11 21:37:56,652] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1113 2023-01-11T21:38:06.9241835Z 2023-01-11T21:38:06.9241936Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9242014Z import torch 2023-01-11T21:38:06.9242089Z import random 2023-01-11T21:38:06.9242207Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9242331Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9242337Z 2023-01-11T21:38:06.9242420Z aten = torch.ops.aten 2023-01-11T21:38:06.9242559Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9242656Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9242661Z 2023-01-11T21:38:06.9242735Z import triton 2023-01-11T21:38:06.9242829Z import triton.language as tl 2023-01-11T21:38:06.9242950Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9243090Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9243099Z 2023-01-11T21:38:06.9243103Z 2023-01-11T21:38:06.9243257Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9243332Z import triton 2023-01-11T21:38:06.9243427Z import triton.language as tl 2023-01-11T21:38:06.9243541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9243646Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9243807Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9243927Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9243932Z 2023-01-11T21:38:06.9244348Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9244422Z @triton.jit 2023-01-11T21:38:06.9244563Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9244637Z xnumel = 100 2023-01-11T21:38:06.9244738Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9244868Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9244953Z xmask = xindex < xnumel 2023-01-11T21:38:06.9245020Z x0 = xindex 2023-01-11T21:38:06.9245117Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9245251Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9245335Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9245471Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9245559Z ''') 2023-01-11T21:38:06.9245564Z 2023-01-11T21:38:06.9245569Z 2023-01-11T21:38:06.9245664Z async_compile.wait(globals()) 2023-01-11T21:38:06.9245743Z del async_compile 2023-01-11T21:38:06.9245749Z 2023-01-11T21:38:06.9245819Z def call(args): 2023-01-11T21:38:06.9245900Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9245977Z args.clear() 2023-01-11T21:38:06.9246073Z with torch.cuda.device(0): 2023-01-11T21:38:06.9246308Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9246403Z stream0 = get_cuda_stream(0) 
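        # Same fused add as the scalar-first cases, with the operands mirrored:
        # here the dense (10, 10) tensor is in_ptr0 and the broadcast scalar is
        # loaded from in_ptr1.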
2023-01-11T21:38:06.9246547Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9246616Z del arg0_1 2023-01-11T21:38:06.9246694Z del arg1_1 2023-01-11T21:38:06.9246778Z return (buf0, ) 2023-01-11T21:38:06.9246784Z 2023-01-11T21:38:06.9246788Z 2023-01-11T21:38:06.9246870Z if __name__ == "__main__": 2023-01-11T21:38:06.9246990Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9247118Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9247323Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9247519Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9247634Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9247643Z 2023-01-11T21:38:06.9247714Z ok (0.082s) 2023-01-11T21:38:06.9248193Z test_cuda_dense_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9248325Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9248589Z [2023-01-11 21:37:56,665] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1114 2023-01-11T21:38:06.9248854Z [2023-01-11 21:37:56,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1114 2023-01-11T21:38:06.9248860Z 2023-01-11T21:38:06.9248960Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9249039Z import torch 2023-01-11T21:38:06.9249116Z import random 2023-01-11T21:38:06.9249229Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9249354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9249359Z 2023-01-11T21:38:06.9249443Z aten = torch.ops.aten 2023-01-11T21:38:06.9249609Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9249708Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9249714Z 2023-01-11T21:38:06.9249793Z import triton 2023-01-11T21:38:06.9249888Z import triton.language as tl 2023-01-11T21:38:06.9250009Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9250149Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9250155Z 2023-01-11T21:38:06.9250159Z 2023-01-11T21:38:06.9250315Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9250393Z import triton 2023-01-11T21:38:06.9250486Z import triton.language as tl 2023-01-11T21:38:06.9250603Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9250706Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9250840Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9250961Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9250972Z 2023-01-11T21:38:06.9251384Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9251460Z @triton.jit 2023-01-11T21:38:06.9251601Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : 
tl.constexpr): 2023-01-11T21:38:06.9251677Z xnumel = 100 2023-01-11T21:38:06.9251775Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9251906Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9251991Z xmask = xindex < xnumel 2023-01-11T21:38:06.9252087Z x0 = xindex 2023-01-11T21:38:06.9252186Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9252287Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9252367Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9252501Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9252589Z ''') 2023-01-11T21:38:06.9252597Z 2023-01-11T21:38:06.9252602Z 2023-01-11T21:38:06.9252699Z async_compile.wait(globals()) 2023-01-11T21:38:06.9252782Z del async_compile 2023-01-11T21:38:06.9252787Z 2023-01-11T21:38:06.9252857Z def call(args): 2023-01-11T21:38:06.9252938Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9253017Z args.clear() 2023-01-11T21:38:06.9253111Z with torch.cuda.device(0): 2023-01-11T21:38:06.9253315Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9253409Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9253550Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9253624Z del arg0_1 2023-01-11T21:38:06.9253699Z del arg1_1 2023-01-11T21:38:06.9253779Z return (buf0, ) 2023-01-11T21:38:06.9253785Z 2023-01-11T21:38:06.9253789Z 2023-01-11T21:38:06.9253871Z if __name__ == "__main__": 2023-01-11T21:38:06.9253992Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9254119Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9254326Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9254668Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9254786Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9254792Z 2023-01-11T21:38:06.9254865Z ok (0.081s) 2023-01-11T21:38:06.9255335Z test_cuda_dense_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9255471Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9255774Z [2023-01-11 21:37:56,746] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1115 2023-01-11T21:38:06.9256042Z [2023-01-11 21:37:56,815] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1115 2023-01-11T21:38:06.9256048Z 2023-01-11T21:38:06.9256148Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9256225Z import torch 2023-01-11T21:38:06.9256302Z import random 2023-01-11T21:38:06.9256416Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9256542Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9256550Z 2023-01-11T21:38:06.9256636Z aten = torch.ops.aten 2023-01-11T21:38:06.9256774Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9256871Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9256876Z 2023-01-11T21:38:06.9256954Z import triton 2023-01-11T21:38:06.9257049Z import triton.language as tl 2023-01-11T21:38:06.9257288Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9257425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9257431Z 2023-01-11T21:38:06.9257435Z 2023-01-11T21:38:06.9257591Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9257668Z import triton 2023-01-11T21:38:06.9257765Z import triton.language as tl 2023-01-11T21:38:06.9257881Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9257983Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9258117Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9258237Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9258288Z 2023-01-11T21:38:06.9258698Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9258778Z @triton.jit 2023-01-11T21:38:06.9258925Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9259001Z xnumel = 100 2023-01-11T21:38:06.9259099Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9259230Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9259316Z xmask = xindex < xnumel 2023-01-11T21:38:06.9259382Z x0 = xindex 2023-01-11T21:38:06.9259482Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9259583Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9259675Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9259758Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9259897Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9259987Z ''') 2023-01-11T21:38:06.9259992Z 2023-01-11T21:38:06.9259997Z 2023-01-11T21:38:06.9260091Z async_compile.wait(globals()) 2023-01-11T21:38:06.9260164Z del async_compile 2023-01-11T21:38:06.9260169Z 2023-01-11T21:38:06.9260253Z def call(args): 2023-01-11T21:38:06.9260335Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9260416Z args.clear() 2023-01-11T21:38:06.9260511Z with torch.cuda.device(0): 2023-01-11T21:38:06.9260715Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9260810Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9260946Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9261024Z del arg0_1 2023-01-11T21:38:06.9261099Z del arg1_1 2023-01-11T21:38:06.9261179Z return (buf0, ) 2023-01-11T21:38:06.9261187Z 2023-01-11T21:38:06.9261191Z 2023-01-11T21:38:06.9261274Z if __name__ == "__main__": 2023-01-11T21:38:06.9261394Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9261522Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9261753Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9261957Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9262080Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9262086Z 2023-01-11T21:38:06.9262159Z ok (0.083s) 2023-01-11T21:38:06.9262631Z test_cuda_dense_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9262765Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9263028Z [2023-01-11 21:37:56,829] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1116 2023-01-11T21:38:06.9263301Z [2023-01-11 21:37:56,901] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1116 2023-01-11T21:38:06.9263307Z 2023-01-11T21:38:06.9263406Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9263482Z import torch 2023-01-11T21:38:06.9263552Z import random 2023-01-11T21:38:06.9263672Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9263797Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9263802Z 2023-01-11T21:38:06.9263884Z aten = torch.ops.aten 2023-01-11T21:38:06.9264020Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9264115Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9264146Z 2023-01-11T21:38:06.9264225Z import triton 2023-01-11T21:38:06.9264321Z import triton.language as tl 2023-01-11T21:38:06.9264440Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9264582Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9264587Z 2023-01-11T21:38:06.9264595Z 2023-01-11T21:38:06.9264752Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9264828Z import triton 2023-01-11T21:38:06.9264923Z import triton.language as tl 2023-01-11T21:38:06.9265041Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9265145Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9265272Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9265401Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9265406Z 2023-01-11T21:38:06.9265824Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9265904Z @triton.jit 2023-01-11T21:38:06.9266047Z def triton_(in_ptr0, in_ptr1, 
out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9266123Z xnumel = 100 2023-01-11T21:38:06.9266225Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9266357Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9266436Z xmask = xindex < xnumel 2023-01-11T21:38:06.9266507Z x2 = xindex 2023-01-11T21:38:06.9266590Z x0 = xindex % 10 2023-01-11T21:38:06.9266688Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9266785Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9266875Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9266957Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9267090Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9267180Z ''') 2023-01-11T21:38:06.9267185Z 2023-01-11T21:38:06.9267190Z 2023-01-11T21:38:06.9267284Z async_compile.wait(globals()) 2023-01-11T21:38:06.9267363Z del async_compile 2023-01-11T21:38:06.9267368Z 2023-01-11T21:38:06.9267444Z def call(args): 2023-01-11T21:38:06.9267525Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9267602Z args.clear() 2023-01-11T21:38:06.9267727Z with torch.cuda.device(0): 2023-01-11T21:38:06.9267927Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9268022Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9268165Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9268240Z del arg0_1 2023-01-11T21:38:06.9268311Z del arg1_1 2023-01-11T21:38:06.9268394Z return (buf0, ) 2023-01-11T21:38:06.9268399Z 2023-01-11T21:38:06.9268403Z 2023-01-11T21:38:06.9268485Z if __name__ == "__main__": 2023-01-11T21:38:06.9268598Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9268732Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9268938Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9269136Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9269261Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9269267Z 2023-01-11T21:38:06.9269341Z ok (0.085s) 2023-01-11T21:38:06.9269813Z test_cuda_dense_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9269947Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9270235Z [2023-01-11 21:37:56,915] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1117 2023-01-11T21:38:06.9270500Z [2023-01-11 21:37:56,988] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1117 2023-01-11T21:38:06.9270506Z 2023-01-11T21:38:06.9270602Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9270679Z import torch 2023-01-11T21:38:06.9270758Z import random 2023-01-11T21:38:06.9270878Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9271004Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9271009Z 2023-01-11T21:38:06.9271093Z aten = torch.ops.aten 2023-01-11T21:38:06.9271230Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9271321Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9271331Z 2023-01-11T21:38:06.9271402Z import triton 2023-01-11T21:38:06.9271496Z import triton.language as tl 2023-01-11T21:38:06.9271626Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9271768Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9271773Z 2023-01-11T21:38:06.9271777Z 2023-01-11T21:38:06.9271934Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9272012Z import triton 2023-01-11T21:38:06.9272107Z import triton.language as tl 2023-01-11T21:38:06.9272217Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9272322Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9272456Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9272582Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9272587Z 2023-01-11T21:38:06.9273003Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9273083Z @triton.jit 2023-01-11T21:38:06.9273229Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9273306Z xnumel = 100 2023-01-11T21:38:06.9273398Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9273529Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9273644Z xmask = xindex < xnumel 2023-01-11T21:38:06.9273721Z x2 = xindex 2023-01-11T21:38:06.9273798Z x0 = xindex % 10 2023-01-11T21:38:06.9273879Z x1 = (xindex // 10) 2023-01-11T21:38:06.9273980Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9274085Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9274166Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9274303Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9274391Z ''') 2023-01-11T21:38:06.9274397Z 2023-01-11T21:38:06.9274401Z 2023-01-11T21:38:06.9274498Z async_compile.wait(globals()) 2023-01-11T21:38:06.9274581Z del async_compile 2023-01-11T21:38:06.9274587Z 2023-01-11T21:38:06.9274667Z def call(args): 2023-01-11T21:38:06.9274748Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9274819Z args.clear() 2023-01-11T21:38:06.9274911Z with torch.cuda.device(0): 2023-01-11T21:38:06.9275119Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9275210Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9275354Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9275432Z del arg0_1 2023-01-11T21:38:06.9275507Z del arg1_1 2023-01-11T21:38:06.9275581Z return (buf0, ) 2023-01-11T21:38:06.9275586Z 2023-01-11T21:38:06.9275590Z 2023-01-11T21:38:06.9275675Z if __name__ == "__main__": 2023-01-11T21:38:06.9275793Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9275920Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9276161Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9276361Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9276482Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9276487Z 2023-01-11T21:38:06.9276559Z ok (0.087s) 2023-01-11T21:38:06.9277038Z test_cuda_dense_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9277171Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9277433Z [2023-01-11 21:37:57,002] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1118 2023-01-11T21:38:06.9277704Z [2023-01-11 21:37:57,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1118 2023-01-11T21:38:06.9277710Z 2023-01-11T21:38:06.9277808Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9277886Z import torch 2023-01-11T21:38:06.9277963Z import random 2023-01-11T21:38:06.9278084Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9278209Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9278214Z 2023-01-11T21:38:06.9278291Z aten = torch.ops.aten 2023-01-11T21:38:06.9278428Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9278522Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9278528Z 2023-01-11T21:38:06.9278605Z import triton 2023-01-11T21:38:06.9278701Z import triton.language as tl 2023-01-11T21:38:06.9278826Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9278967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9278975Z 2023-01-11T21:38:06.9278980Z 2023-01-11T21:38:06.9279135Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9279206Z import triton 2023-01-11T21:38:06.9279301Z import triton.language as tl 2023-01-11T21:38:06.9279416Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9279547Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9279685Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9279813Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9279818Z 2023-01-11T21:38:06.9280302Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
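# NOTE (editorial annotation, not emitted by inductor; my reading of the
# internals in this snapshot): instance_descriptor tells the autotuner which
# argument positions it may specialize on. divisible_by_16=(0, 1, 2) marks the
# three pointer args (in_ptr0, in_ptr1, out_ptr0) as having addresses divisible
# by 16, enabling aligned vectorized access, and equal_to_1 lists integer
# arguments known to be exactly 1 (none here).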
equal_to_1=())]}) 2023-01-11T21:38:06.9280380Z @triton.jit 2023-01-11T21:38:06.9280552Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9280632Z xnumel = 10 2023-01-11T21:38:06.9280708Z ynumel = 10 2023-01-11T21:38:06.9280809Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9280947Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9281035Z xmask = xindex < xnumel 2023-01-11T21:38:06.9281133Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9281260Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9281345Z ymask = yindex < ynumel 2023-01-11T21:38:06.9281416Z x0 = xindex 2023-01-11T21:38:06.9281488Z y1 = yindex 2023-01-11T21:38:06.9281607Z tmp0 = tl.load(in_ptr0 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9281724Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9281806Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9281959Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9282073Z ''') 2023-01-11T21:38:06.9282079Z 2023-01-11T21:38:06.9282083Z 2023-01-11T21:38:06.9282181Z async_compile.wait(globals()) 2023-01-11T21:38:06.9282266Z del async_compile 2023-01-11T21:38:06.9282271Z 2023-01-11T21:38:06.9282347Z def call(args): 2023-01-11T21:38:06.9282430Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9282510Z args.clear() 2023-01-11T21:38:06.9282603Z with torch.cuda.device(0): 2023-01-11T21:38:06.9282805Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9282899Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9283047Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9283124Z del arg0_1 2023-01-11T21:38:06.9283198Z del arg1_1 2023-01-11T21:38:06.9283278Z return (buf0, ) 2023-01-11T21:38:06.9283284Z 2023-01-11T21:38:06.9283288Z 2023-01-11T21:38:06.9283370Z if __name__ == "__main__": 2023-01-11T21:38:06.9283492Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9283614Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9283821Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9284027Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9284147Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9284152Z 2023-01-11T21:38:06.9284225Z ok (0.238s) 2023-01-11T21:38:06.9284702Z test_cuda_double_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9284837Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9285097Z [2023-01-11 21:37:57,240] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1119 2023-01-11T21:38:06.9285390Z [2023-01-11 21:37:57,316] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1119 2023-01-11T21:38:06.9285397Z 2023-01-11T21:38:06.9285491Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9285579Z import torch 2023-01-11T21:38:06.9285670Z import random 2023-01-11T21:38:06.9285807Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9285941Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9285946Z 2023-01-11T21:38:06.9286030Z aten = torch.ops.aten 2023-01-11T21:38:06.9286168Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9286265Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9286270Z 2023-01-11T21:38:06.9286341Z import triton 2023-01-11T21:38:06.9286437Z import triton.language as tl 2023-01-11T21:38:06.9286563Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9286705Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9286711Z 2023-01-11T21:38:06.9286715Z 2023-01-11T21:38:06.9286874Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9286951Z import triton 2023-01-11T21:38:06.9287044Z import triton.language as tl 2023-01-11T21:38:06.9287154Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9287259Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9287391Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9287517Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9287522Z 2023-01-11T21:38:06.9287938Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9288040Z @triton.jit 2023-01-11T21:38:06.9288182Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9288259Z xnumel = 100 2023-01-11T21:38:06.9288359Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9288486Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9288569Z xmask = xindex < xnumel 2023-01-11T21:38:06.9288642Z x2 = xindex 2023-01-11T21:38:06.9288720Z x0 = xindex % 10 2023-01-11T21:38:06.9288820Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9288919Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9289003Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9289085Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9289228Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9289315Z ''') 2023-01-11T21:38:06.9289320Z 2023-01-11T21:38:06.9289327Z 2023-01-11T21:38:06.9289424Z async_compile.wait(globals()) 2023-01-11T21:38:06.9289502Z del async_compile 2023-01-11T21:38:06.9289507Z 2023-01-11T21:38:06.9289585Z def call(args): 2023-01-11T21:38:06.9289666Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9289736Z args.clear() 2023-01-11T21:38:06.9289828Z with torch.cuda.device(0): 2023-01-11T21:38:06.9290034Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9290128Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9290272Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9290347Z del arg0_1 2023-01-11T21:38:06.9290422Z del arg1_1 2023-01-11T21:38:06.9290495Z return (buf0, ) 2023-01-11T21:38:06.9290501Z 2023-01-11T21:38:06.9290511Z 2023-01-11T21:38:06.9290587Z if __name__ == "__main__": 2023-01-11T21:38:06.9290706Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9290833Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9291041Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9291243Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9291363Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9291397Z 2023-01-11T21:38:06.9291473Z ok (0.089s) 2023-01-11T21:38:06.9291952Z test_cuda_double_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9292085Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9292340Z [2023-01-11 21:37:57,331] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1120 2023-01-11T21:38:06.9292611Z [2023-01-11 21:37:57,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1120 2023-01-11T21:38:06.9292616Z 2023-01-11T21:38:06.9292716Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9292793Z import torch 2023-01-11T21:38:06.9292874Z import random 2023-01-11T21:38:06.9292996Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9293121Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9293127Z 2023-01-11T21:38:06.9293212Z aten = torch.ops.aten 2023-01-11T21:38:06.9293342Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9293441Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9293447Z 2023-01-11T21:38:06.9293521Z import triton 2023-01-11T21:38:06.9293615Z import triton.language as tl 2023-01-11T21:38:06.9293739Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9293915Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9293920Z 2023-01-11T21:38:06.9293925Z 2023-01-11T21:38:06.9294080Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9294150Z import triton 2023-01-11T21:38:06.9294245Z import triton.language as tl 2023-01-11T21:38:06.9294367Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9294472Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9294721Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9294846Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9294851Z 2023-01-11T21:38:06.9295269Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9295342Z @triton.jit 
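# NOTE (editorial annotation, not part of the generated source): this kernel
# adds the (10, 10) fp64 input to a (1, 10, 1)-shaped fp32 input. The flat
# index is split as x1 = xindex // 10, so all ten elements of an output row
# re-read the same value from in_ptr1 (the broadcast), and that value is
# upcast with tmp1.to(tl.float64) before the add, matching aten type promotion.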
2023-01-11T21:38:06.9295481Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9295552Z xnumel = 100 2023-01-11T21:38:06.9295649Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9295777Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9295866Z xmask = xindex < xnumel 2023-01-11T21:38:06.9295955Z x2 = xindex 2023-01-11T21:38:06.9296038Z x1 = (xindex // 10) 2023-01-11T21:38:06.9296152Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9296245Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9296334Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9296412Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9296547Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9296634Z ''') 2023-01-11T21:38:06.9296639Z 2023-01-11T21:38:06.9296644Z 2023-01-11T21:38:06.9296737Z async_compile.wait(globals()) 2023-01-11T21:38:06.9296811Z del async_compile 2023-01-11T21:38:06.9296816Z 2023-01-11T21:38:06.9296887Z def call(args): 2023-01-11T21:38:06.9296967Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9297042Z args.clear() 2023-01-11T21:38:06.9297192Z with torch.cuda.device(0): 2023-01-11T21:38:06.9297431Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9297587Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9297750Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9297828Z del arg0_1 2023-01-11T21:38:06.9297912Z del arg1_1 2023-01-11T21:38:06.9298083Z return (buf0, ) 2023-01-11T21:38:06.9298089Z 2023-01-11T21:38:06.9298093Z 2023-01-11T21:38:06.9298173Z if __name__ == "__main__": 2023-01-11T21:38:06.9298288Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9298413Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9298618Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9298826Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9298939Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9298944Z 2023-01-11T21:38:06.9299017Z ok (0.086s) 2023-01-11T21:38:06.9299493Z test_cuda_double_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9299623Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9299881Z [2023-01-11 21:37:57,416] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1121 2023-01-11T21:38:06.9300145Z [2023-01-11 21:37:57,484] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1121 2023-01-11T21:38:06.9300188Z 2023-01-11T21:38:06.9300291Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9300366Z import torch 2023-01-11T21:38:06.9300444Z import random 2023-01-11T21:38:06.9300561Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9300685Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9300690Z 2023-01-11T21:38:06.9300775Z aten = torch.ops.aten 2023-01-11T21:38:06.9300913Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9301009Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9301014Z 2023-01-11T21:38:06.9301091Z import triton 2023-01-11T21:38:06.9301189Z import triton.language as tl 2023-01-11T21:38:06.9301316Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9301450Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9301456Z 2023-01-11T21:38:06.9301468Z 2023-01-11T21:38:06.9301618Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9301693Z import triton 2023-01-11T21:38:06.9301787Z import triton.language as tl 2023-01-11T21:38:06.9301905Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9302011Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9302150Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9302276Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9302282Z 2023-01-11T21:38:06.9302693Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9302767Z @triton.jit 2023-01-11T21:38:06.9302909Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9302985Z xnumel = 100 2023-01-11T21:38:06.9303089Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9303218Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9303304Z xmask = xindex < xnumel 2023-01-11T21:38:06.9303377Z x0 = xindex 2023-01-11T21:38:06.9303470Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9303635Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9303728Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9303811Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9303948Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9304035Z ''') 2023-01-11T21:38:06.9304041Z 2023-01-11T21:38:06.9304045Z 2023-01-11T21:38:06.9304138Z async_compile.wait(globals()) 2023-01-11T21:38:06.9304211Z del async_compile 2023-01-11T21:38:06.9304216Z 2023-01-11T21:38:06.9304293Z def call(args): 2023-01-11T21:38:06.9304377Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9304453Z args.clear() 2023-01-11T21:38:06.9304549Z with torch.cuda.device(0): 2023-01-11T21:38:06.9304754Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 
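# NOTE (editorial annotation, not part of the generated source): this wrapper
# follows inductor's standard launch sequence: allocate the fp64 output with
# empty_strided, grab the current CUDA stream, launch the compiled kernel over
# all 100 flattened elements via grid(100), then `del` the input references so
# their storage can be reclaimed as early as possible.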
2023-01-11T21:38:06.9304848Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9304984Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9305063Z del arg0_1 2023-01-11T21:38:06.9305140Z del arg1_1 2023-01-11T21:38:06.9305220Z return (buf0, ) 2023-01-11T21:38:06.9305225Z 2023-01-11T21:38:06.9305230Z 2023-01-11T21:38:06.9305311Z if __name__ == "__main__": 2023-01-11T21:38:06.9305428Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9305555Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9305761Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9305950Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9306100Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9306105Z 2023-01-11T21:38:06.9306180Z ok (0.082s) 2023-01-11T21:38:06.9306659Z test_cuda_double_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9306791Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9307052Z [2023-01-11 21:37:57,498] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1122 2023-01-11T21:38:06.9307315Z [2023-01-11 21:37:57,567] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1122 2023-01-11T21:38:06.9307321Z 2023-01-11T21:38:06.9307422Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9307503Z import torch 2023-01-11T21:38:06.9307573Z import random 2023-01-11T21:38:06.9307694Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9307820Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9307825Z 2023-01-11T21:38:06.9307908Z aten = torch.ops.aten 2023-01-11T21:38:06.9308048Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9308146Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9308151Z 2023-01-11T21:38:06.9308228Z import triton 2023-01-11T21:38:06.9308323Z import triton.language as tl 2023-01-11T21:38:06.9308443Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9308585Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9308590Z 2023-01-11T21:38:06.9308595Z 2023-01-11T21:38:06.9308750Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9308825Z import triton 2023-01-11T21:38:06.9308922Z import triton.language as tl 2023-01-11T21:38:06.9309039Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9309143Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9309275Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9309396Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9309428Z 2023-01-11T21:38:06.9309847Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9309925Z @triton.jit 
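# NOTE (editorial annotation, not part of the generated source): the kernel
# below is the fused form of the eager computation
#   out = arg0_1 + arg1_1   # fp64 + fp32 promotes to fp64
# done in one elementwise pass: the fp32 operand is cast in registers via
# tmp1.to(tl.float64) rather than materialized by a separate cast kernel.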
2023-01-11T21:38:06.9310067Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9310145Z xnumel = 100 2023-01-11T21:38:06.9310244Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9310372Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9310461Z xmask = xindex < xnumel 2023-01-11T21:38:06.9310528Z x0 = xindex 2023-01-11T21:38:06.9310631Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9310731Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9310823Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9310905Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9311043Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9311130Z ''') 2023-01-11T21:38:06.9311135Z 2023-01-11T21:38:06.9311139Z 2023-01-11T21:38:06.9311228Z async_compile.wait(globals()) 2023-01-11T21:38:06.9311307Z del async_compile 2023-01-11T21:38:06.9311312Z 2023-01-11T21:38:06.9311389Z def call(args): 2023-01-11T21:38:06.9311470Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9311549Z args.clear() 2023-01-11T21:38:06.9311640Z with torch.cuda.device(0): 2023-01-11T21:38:06.9311843Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9311931Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9312137Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9312215Z del arg0_1 2023-01-11T21:38:06.9312288Z del arg1_1 2023-01-11T21:38:06.9312370Z return (buf0, ) 2023-01-11T21:38:06.9312375Z 2023-01-11T21:38:06.9312379Z 2023-01-11T21:38:06.9312464Z if __name__ == "__main__": 2023-01-11T21:38:06.9312581Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9312709Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9312910Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9313111Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9313235Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9313240Z 2023-01-11T21:38:06.9313313Z ok (0.083s) 2023-01-11T21:38:06.9313793Z test_cuda_double_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9313928Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9314190Z [2023-01-11 21:37:57,580] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1123 2023-01-11T21:38:06.9314454Z [2023-01-11 21:37:57,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1123 2023-01-11T21:38:06.9314460Z 2023-01-11T21:38:06.9314559Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9314637Z import torch 2023-01-11T21:38:06.9314707Z import random 2023-01-11T21:38:06.9314829Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9314958Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9314964Z 2023-01-11T21:38:06.9315048Z aten = torch.ops.aten 2023-01-11T21:38:06.9315185Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9315282Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9315287Z 2023-01-11T21:38:06.9315385Z import triton 2023-01-11T21:38:06.9315474Z import triton.language as tl 2023-01-11T21:38:06.9315601Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9315742Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9315747Z 2023-01-11T21:38:06.9315752Z 2023-01-11T21:38:06.9315909Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9315982Z import triton 2023-01-11T21:38:06.9316077Z import triton.language as tl 2023-01-11T21:38:06.9316192Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9316295Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9316427Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9316552Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9316557Z 2023-01-11T21:38:06.9316974Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9317050Z @triton.jit 2023-01-11T21:38:06.9317198Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9317274Z xnumel = 100 2023-01-11T21:38:06.9317373Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9317503Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9317582Z xmask = xindex < xnumel 2023-01-11T21:38:06.9317655Z x0 = xindex 2023-01-11T21:38:06.9317753Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9317853Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9317972Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9318114Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9318199Z ''') 2023-01-11T21:38:06.9318205Z 2023-01-11T21:38:06.9318209Z 2023-01-11T21:38:06.9318297Z async_compile.wait(globals()) 2023-01-11T21:38:06.9318380Z del async_compile 2023-01-11T21:38:06.9318386Z 2023-01-11T21:38:06.9318462Z def call(args): 2023-01-11T21:38:06.9318545Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9318621Z args.clear() 2023-01-11T21:38:06.9318714Z with torch.cuda.device(0): 2023-01-11T21:38:06.9318924Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9319011Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9319155Z 
triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9319228Z del arg0_1 2023-01-11T21:38:06.9319303Z del arg1_1 2023-01-11T21:38:06.9319387Z return (buf0, ) 2023-01-11T21:38:06.9319392Z 2023-01-11T21:38:06.9319396Z 2023-01-11T21:38:06.9319480Z if __name__ == "__main__": 2023-01-11T21:38:06.9319599Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9319727Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9319929Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9320129Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9320254Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9320260Z 2023-01-11T21:38:06.9320332Z ok (0.081s) 2023-01-11T21:38:06.9320805Z test_cuda_double_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9320942Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9321207Z [2023-01-11 21:37:57,662] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1124 2023-01-11T21:38:06.9321502Z [2023-01-11 21:37:57,737] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1124 2023-01-11T21:38:06.9321508Z 2023-01-11T21:38:06.9321609Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9321685Z import torch 2023-01-11T21:38:06.9321755Z import random 2023-01-11T21:38:06.9321875Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9322000Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9322007Z 2023-01-11T21:38:06.9322088Z aten = torch.ops.aten 2023-01-11T21:38:06.9322226Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9322331Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9322336Z 2023-01-11T21:38:06.9322413Z import triton 2023-01-11T21:38:06.9322501Z import triton.language as tl 2023-01-11T21:38:06.9322628Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9322771Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9322777Z 2023-01-11T21:38:06.9322781Z 2023-01-11T21:38:06.9322940Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9323018Z import triton 2023-01-11T21:38:06.9323115Z import triton.language as tl 2023-01-11T21:38:06.9323230Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9323336Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9323464Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9323593Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9323598Z 2023-01-11T21:38:06.9324015Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*i32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9324117Z @triton.jit 2023-01-11T21:38:06.9324259Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 
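    # NOTE (editorial annotation, not part of the generated source): x2 walks
    # the flat (10, 10) fp64 tensor while x0 = xindex % 10 indexes the
    # length-10 int32 vector by column, repeating it across rows (the
    # broadcast); the int32 value is widened to fp64 before the add.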
2023-01-11T21:38:06.9324339Z xnumel = 100 2023-01-11T21:38:06.9324439Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9324570Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9324649Z xmask = xindex < xnumel 2023-01-11T21:38:06.9324723Z x2 = xindex 2023-01-11T21:38:06.9324800Z x0 = xindex % 10 2023-01-11T21:38:06.9324898Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9324998Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9325090Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9325172Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9325304Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9325393Z ''') 2023-01-11T21:38:06.9325399Z 2023-01-11T21:38:06.9325403Z 2023-01-11T21:38:06.9325498Z async_compile.wait(globals()) 2023-01-11T21:38:06.9325577Z del async_compile 2023-01-11T21:38:06.9325582Z 2023-01-11T21:38:06.9325659Z def call(args): 2023-01-11T21:38:06.9325741Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9325817Z args.clear() 2023-01-11T21:38:06.9325905Z with torch.cuda.device(0): 2023-01-11T21:38:06.9326108Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9326202Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9326346Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9326424Z del arg0_1 2023-01-11T21:38:06.9326499Z del arg1_1 2023-01-11T21:38:06.9326580Z return (buf0, ) 2023-01-11T21:38:06.9326585Z 2023-01-11T21:38:06.9326590Z 2023-01-11T21:38:06.9326673Z if __name__ == "__main__": 2023-01-11T21:38:06.9326789Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9326916Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9327124Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9327348Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9327471Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9327476Z 2023-01-11T21:38:06.9327552Z ok (0.088s) 2023-01-11T21:38:06.9328031Z test_cuda_double_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9328167Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9328426Z [2023-01-11 21:37:57,751] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1125 2023-01-11T21:38:06.9328685Z [2023-01-11 21:37:57,824] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1125 2023-01-11T21:38:06.9328704Z 2023-01-11T21:38:06.9328798Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9328874Z import torch 2023-01-11T21:38:06.9328949Z import random 2023-01-11T21:38:06.9329068Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9329193Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9329199Z 2023-01-11T21:38:06.9329282Z aten = torch.ops.aten 2023-01-11T21:38:06.9329421Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9329512Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9329517Z 2023-01-11T21:38:06.9329592Z import triton 2023-01-11T21:38:06.9329713Z import triton.language as tl 2023-01-11T21:38:06.9329840Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9329982Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9329988Z 2023-01-11T21:38:06.9329992Z 2023-01-11T21:38:06.9330148Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9330224Z import triton 2023-01-11T21:38:06.9330318Z import triton.language as tl 2023-01-11T21:38:06.9330428Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9330531Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9330662Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9330788Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9330793Z 2023-01-11T21:38:06.9331208Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9331287Z @triton.jit 2023-01-11T21:38:06.9331431Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9331508Z xnumel = 100 2023-01-11T21:38:06.9331601Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9331734Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9331823Z xmask = xindex < xnumel 2023-01-11T21:38:06.9331896Z x2 = xindex 2023-01-11T21:38:06.9331973Z x0 = xindex % 10 2023-01-11T21:38:06.9332054Z x1 = (xindex // 10) 2023-01-11T21:38:06.9332147Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9332257Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9332348Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9332429Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9332569Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9332659Z ''') 2023-01-11T21:38:06.9332666Z 2023-01-11T21:38:06.9332670Z 2023-01-11T21:38:06.9332764Z async_compile.wait(globals()) 2023-01-11T21:38:06.9332844Z del async_compile 2023-01-11T21:38:06.9332849Z 2023-01-11T21:38:06.9332919Z def call(args): 2023-01-11T21:38:06.9333002Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9333080Z args.clear() 2023-01-11T21:38:06.9333201Z with torch.cuda.device(0): 2023-01-11T21:38:06.9333411Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9333506Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9333650Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9333719Z del arg0_1 2023-01-11T21:38:06.9333794Z del arg1_1 2023-01-11T21:38:06.9333873Z return (buf0, ) 2023-01-11T21:38:06.9333878Z 2023-01-11T21:38:06.9333883Z 2023-01-11T21:38:06.9333968Z if __name__ == "__main__": 2023-01-11T21:38:06.9334088Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9334219Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9334423Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9334733Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9334851Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9334856Z 2023-01-11T21:38:06.9334930Z ok (0.087s) 2023-01-11T21:38:06.9335405Z test_cuda_double_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9335539Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9335840Z [2023-01-11 21:37:57,838] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1126 2023-01-11T21:38:06.9336104Z [2023-01-11 21:37:58,043] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1126 2023-01-11T21:38:06.9336110Z 2023-01-11T21:38:06.9336212Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9336288Z import torch 2023-01-11T21:38:06.9336364Z import random 2023-01-11T21:38:06.9336479Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9336603Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9336608Z 2023-01-11T21:38:06.9336693Z aten = torch.ops.aten 2023-01-11T21:38:06.9336831Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9336929Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9336935Z 2023-01-11T21:38:06.9337010Z import triton 2023-01-11T21:38:06.9337101Z import triton.language as tl 2023-01-11T21:38:06.9337289Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9337425Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9337431Z 2023-01-11T21:38:06.9337440Z 2023-01-11T21:38:06.9337593Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9337669Z import triton 2023-01-11T21:38:06.9337768Z import triton.language as tl 2023-01-11T21:38:06.9337885Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9337990Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9338124Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9338253Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9338258Z 2023-01-11T21:38:06.9338728Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp64', 1: '*fp32', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': 
[instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9338807Z @triton.jit 2023-01-11T21:38:06.9338987Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9339064Z xnumel = 10 2023-01-11T21:38:06.9339140Z ynumel = 10 2023-01-11T21:38:06.9339241Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9339415Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9339505Z xmask = xindex < xnumel 2023-01-11T21:38:06.9339597Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9339731Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9339819Z ymask = yindex < ynumel 2023-01-11T21:38:06.9339891Z x0 = xindex 2023-01-11T21:38:06.9339961Z y1 = yindex 2023-01-11T21:38:06.9340078Z tmp0 = tl.load(in_ptr0 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9340193Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9340282Z tmp2 = tmp1.to(tl.float64) 2023-01-11T21:38:06.9340362Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9340519Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.9340609Z ''') 2023-01-11T21:38:06.9340614Z 2023-01-11T21:38:06.9340618Z 2023-01-11T21:38:06.9340715Z async_compile.wait(globals()) 2023-01-11T21:38:06.9340795Z del async_compile 2023-01-11T21:38:06.9340800Z 2023-01-11T21:38:06.9340878Z def call(args): 2023-01-11T21:38:06.9340960Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9341031Z args.clear() 2023-01-11T21:38:06.9341125Z with torch.cuda.device(0): 2023-01-11T21:38:06.9341332Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9341426Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9341573Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9341648Z del arg0_1 2023-01-11T21:38:06.9341753Z del arg1_1 2023-01-11T21:38:06.9341827Z return (buf0, ) 2023-01-11T21:38:06.9341832Z 2023-01-11T21:38:06.9341837Z 2023-01-11T21:38:06.9341916Z if __name__ == "__main__": 2023-01-11T21:38:06.9342034Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9342162Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9342374Z arg0_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9342573Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9342693Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9342699Z 2023-01-11T21:38:06.9342770Z ok (0.220s) 2023-01-11T21:38:06.9343240Z test_cuda_int_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9343375Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9343640Z [2023-01-11 21:37:58,057] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1127 2023-01-11T21:38:06.9343907Z [2023-01-11 21:37:58,126] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1127 2023-01-11T21:38:06.9343913Z 2023-01-11T21:38:06.9344013Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9344093Z import torch 2023-01-11T21:38:06.9344167Z import random 2023-01-11T21:38:06.9344288Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9344414Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9344420Z 2023-01-11T21:38:06.9344497Z aten = torch.ops.aten 2023-01-11T21:38:06.9344634Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9344734Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9344740Z 2023-01-11T21:38:06.9344815Z import triton 2023-01-11T21:38:06.9344908Z import triton.language as tl 2023-01-11T21:38:06.9345033Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9345199Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9345205Z 2023-01-11T21:38:06.9345210Z 2023-01-11T21:38:06.9345368Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9345438Z import triton 2023-01-11T21:38:06.9345532Z import triton.language as tl 2023-01-11T21:38:06.9345647Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9345751Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9345889Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9346015Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9346020Z 2023-01-11T21:38:06.9346435Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9346513Z @triton.jit 2023-01-11T21:38:06.9346653Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9346729Z xnumel = 10 2023-01-11T21:38:06.9346830Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9346962Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9347045Z xmask = xindex < xnumel 2023-01-11T21:38:06.9347121Z x0 = xindex 2023-01-11T21:38:06.9347220Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9347313Z tmp2 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9347403Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9347483Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9347621Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9347736Z ''') 2023-01-11T21:38:06.9347741Z 2023-01-11T21:38:06.9347746Z 2023-01-11T21:38:06.9347841Z async_compile.wait(globals()) 2023-01-11T21:38:06.9347920Z del async_compile 2023-01-11T21:38:06.9347926Z 2023-01-11T21:38:06.9348002Z def call(args): 2023-01-11T21:38:06.9348077Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9348156Z args.clear() 2023-01-11T21:38:06.9348251Z with torch.cuda.device(0): 2023-01-11T21:38:06.9348452Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9348547Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9348691Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9348767Z del arg0_1 2023-01-11T21:38:06.9348835Z del arg1_1 2023-01-11T21:38:06.9348915Z return (buf0, ) 2023-01-11T21:38:06.9348920Z 2023-01-11T21:38:06.9348925Z 2023-01-11T21:38:06.9349005Z if __name__ == "__main__": 2023-01-11T21:38:06.9349127Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9349255Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9349453Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9349657Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9349773Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9349783Z 2023-01-11T21:38:06.9349849Z ok (0.082s) 2023-01-11T21:38:06.9350326Z test_cuda_int_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9350460Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9350725Z [2023-01-11 21:37:58,139] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1128 2023-01-11T21:38:06.9350991Z [2023-01-11 21:37:58,213] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1128 2023-01-11T21:38:06.9351022Z 2023-01-11T21:38:06.9351124Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9351200Z import torch 2023-01-11T21:38:06.9351276Z import random 2023-01-11T21:38:06.9351398Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9351517Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9351523Z 2023-01-11T21:38:06.9351607Z aten = torch.ops.aten 2023-01-11T21:38:06.9351744Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9351843Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9351848Z 2023-01-11T21:38:06.9351927Z import triton 2023-01-11T21:38:06.9352020Z import triton.language as tl 2023-01-11T21:38:06.9352148Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9352283Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9352294Z 2023-01-11T21:38:06.9352298Z 2023-01-11T21:38:06.9352449Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9352526Z import triton 2023-01-11T21:38:06.9352621Z import triton.language as tl 2023-01-11T21:38:06.9352736Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9352840Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9352973Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9353099Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9353104Z 2023-01-11T21:38:06.9353521Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9353618Z @triton.jit 2023-01-11T21:38:06.9353761Z def triton_(in_ptr0, in_ptr1, out_ptr0, 
xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9353838Z xnumel = 100 2023-01-11T21:38:06.9353938Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9354073Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9354159Z xmask = xindex < xnumel 2023-01-11T21:38:06.9354235Z x0 = xindex % 10 2023-01-11T21:38:06.9354310Z x1 = (xindex // 10) 2023-01-11T21:38:06.9354383Z x2 = xindex 2023-01-11T21:38:06.9354481Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9354578Z tmp2 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9354668Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9354748Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9354887Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9354968Z ''') 2023-01-11T21:38:06.9354974Z 2023-01-11T21:38:06.9354981Z 2023-01-11T21:38:06.9355078Z async_compile.wait(globals()) 2023-01-11T21:38:06.9355157Z del async_compile 2023-01-11T21:38:06.9355162Z 2023-01-11T21:38:06.9355241Z def call(args): 2023-01-11T21:38:06.9355322Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9355399Z args.clear() 2023-01-11T21:38:06.9355490Z with torch.cuda.device(0): 2023-01-11T21:38:06.9355700Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9355794Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9355938Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9356013Z del arg0_1 2023-01-11T21:38:06.9356088Z del arg1_1 2023-01-11T21:38:06.9356171Z return (buf0, ) 2023-01-11T21:38:06.9356176Z 2023-01-11T21:38:06.9356180Z 2023-01-11T21:38:06.9356262Z if __name__ == "__main__": 2023-01-11T21:38:06.9356380Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9356503Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9356700Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9356911Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9357058Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9357064Z 2023-01-11T21:38:06.9357137Z ok (0.087s) 2023-01-11T21:38:06.9357607Z test_cuda_int_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9357741Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9358001Z [2023-01-11 21:37:58,226] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1129 2023-01-11T21:38:06.9358273Z [2023-01-11 21:37:58,293] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1129 2023-01-11T21:38:06.9358279Z 2023-01-11T21:38:06.9358382Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9358454Z import torch 2023-01-11T21:38:06.9358531Z import random 2023-01-11T21:38:06.9358656Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9358781Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9358787Z 2023-01-11T21:38:06.9358871Z aten = torch.ops.aten 2023-01-11T21:38:06.9359009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9359104Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9359109Z 2023-01-11T21:38:06.9359179Z import triton 2023-01-11T21:38:06.9359277Z import triton.language as tl 2023-01-11T21:38:06.9359405Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9359579Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9359585Z 2023-01-11T21:38:06.9359590Z 2023-01-11T21:38:06.9359746Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9359824Z import triton 2023-01-11T21:38:06.9359917Z import triton.language as tl 2023-01-11T21:38:06.9360036Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9360133Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9360266Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9360392Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9360397Z 2023-01-11T21:38:06.9360812Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9360887Z @triton.jit 2023-01-11T21:38:06.9361034Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9361109Z xnumel = 10 2023-01-11T21:38:06.9361209Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9361334Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9361416Z xmask = xindex < xnumel 2023-01-11T21:38:06.9361493Z x0 = xindex 2023-01-11T21:38:06.9361592Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9361725Z tmp2 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9361819Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9361901Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9362031Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9362119Z ''') 2023-01-11T21:38:06.9362124Z 2023-01-11T21:38:06.9362129Z 2023-01-11T21:38:06.9362223Z async_compile.wait(globals()) 2023-01-11T21:38:06.9362302Z del async_compile 2023-01-11T21:38:06.9362308Z 2023-01-11T21:38:06.9362386Z def call(args): 2023-01-11T21:38:06.9362467Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9362544Z args.clear() 2023-01-11T21:38:06.9362630Z with torch.cuda.device(0): 2023-01-11T21:38:06.9362830Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.float32) 
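        # [Editorial annotation -- not part of the captured log output.]
        # empty_strided hands back an *uninitialized* CUDA buffer described by an
        # explicit (size, stride) pair; the Triton kernel launched just below is
        # what actually fills it. A hypothetical sanity check that would hold at
        # this point in the generated wrapper:
        #   assert buf0.shape == (10,) and buf0.stride() == (1,)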
2023-01-11T21:38:06.9362950Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9363096Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9363173Z del arg0_1 2023-01-11T21:38:06.9363245Z del arg1_1 2023-01-11T21:38:06.9363324Z return (buf0, ) 2023-01-11T21:38:06.9363329Z 2023-01-11T21:38:06.9363333Z 2023-01-11T21:38:06.9363415Z if __name__ == "__main__": 2023-01-11T21:38:06.9363528Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9363654Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9363848Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9364046Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9364166Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9364171Z 2023-01-11T21:38:06.9364243Z ok (0.080s) 2023-01-11T21:38:06.9364717Z test_cuda_int_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9364848Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9365109Z [2023-01-11 21:37:58,306] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1130 2023-01-11T21:38:06.9365366Z [2023-01-11 21:37:58,383] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1130 2023-01-11T21:38:06.9365404Z 2023-01-11T21:38:06.9365499Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9365575Z import torch 2023-01-11T21:38:06.9365652Z import random 2023-01-11T21:38:06.9365774Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9365902Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9365907Z 2023-01-11T21:38:06.9365988Z aten = torch.ops.aten 2023-01-11T21:38:06.9366125Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9366217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9366222Z 2023-01-11T21:38:06.9366297Z import triton 2023-01-11T21:38:06.9366391Z import triton.language as tl 2023-01-11T21:38:06.9366517Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9366659Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9366665Z 2023-01-11T21:38:06.9366669Z 2023-01-11T21:38:06.9366831Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9366908Z import triton 2023-01-11T21:38:06.9367003Z import triton.language as tl 2023-01-11T21:38:06.9367113Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9367216Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9367353Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9367479Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9367484Z 2023-01-11T21:38:06.9367900Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9367975Z @triton.jit 2023-01-11T21:38:06.9368117Z def 
triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9368191Z xnumel = 100 2023-01-11T21:38:06.9368284Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9368417Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9368503Z xmask = xindex < xnumel 2023-01-11T21:38:06.9368580Z x0 = xindex % 10 2023-01-11T21:38:06.9368656Z x2 = xindex 2023-01-11T21:38:06.9368759Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9368884Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9368970Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9369053Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9369191Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9369278Z ''') 2023-01-11T21:38:06.9369284Z 2023-01-11T21:38:06.9369289Z 2023-01-11T21:38:06.9369385Z async_compile.wait(globals()) 2023-01-11T21:38:06.9369467Z del async_compile 2023-01-11T21:38:06.9369472Z 2023-01-11T21:38:06.9369547Z def call(args): 2023-01-11T21:38:06.9369622Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9369699Z args.clear() 2023-01-11T21:38:06.9369794Z with torch.cuda.device(0): 2023-01-11T21:38:06.9370000Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9370097Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9370240Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9370319Z del arg0_1 2023-01-11T21:38:06.9370387Z del arg1_1 2023-01-11T21:38:06.9370466Z return (buf0, ) 2023-01-11T21:38:06.9370471Z 2023-01-11T21:38:06.9370475Z 2023-01-11T21:38:06.9370556Z if __name__ == "__main__": 2023-01-11T21:38:06.9370675Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9370803Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9371002Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9371205Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9371327Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9371361Z 2023-01-11T21:38:06.9371432Z ok (0.090s) 2023-01-11T21:38:06.9371897Z test_cuda_int_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9372032Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9372290Z [2023-01-11 21:37:58,397] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1131 2023-01-11T21:38:06.9372554Z [2023-01-11 21:37:58,466] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1131 2023-01-11T21:38:06.9372560Z 2023-01-11T21:38:06.9372660Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9372738Z import torch 2023-01-11T21:38:06.9372813Z import random 2023-01-11T21:38:06.9372932Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9373056Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9373062Z 2023-01-11T21:38:06.9373139Z aten = torch.ops.aten 2023-01-11T21:38:06.9373279Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9373376Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9373381Z 2023-01-11T21:38:06.9373461Z import triton 2023-01-11T21:38:06.9373556Z import triton.language as tl 2023-01-11T21:38:06.9373683Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9373823Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9373828Z 2023-01-11T21:38:06.9373833Z 2023-01-11T21:38:06.9373988Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9374058Z import triton 2023-01-11T21:38:06.9374151Z import triton.language as tl 2023-01-11T21:38:06.9374271Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9374373Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9374616Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9374744Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9374749Z 2023-01-11T21:38:06.9375205Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9375283Z @triton.jit 2023-01-11T21:38:06.9375418Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9375492Z xnumel = 100 2023-01-11T21:38:06.9375588Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9375715Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9375796Z xmask = xindex < xnumel 2023-01-11T21:38:06.9375874Z x0 = xindex % 10 2023-01-11T21:38:06.9375943Z x2 = xindex 2023-01-11T21:38:06.9376034Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9376133Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9376220Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9376295Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9376432Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9376518Z ''') 2023-01-11T21:38:06.9376523Z 2023-01-11T21:38:06.9376528Z 2023-01-11T21:38:06.9376622Z async_compile.wait(globals()) 2023-01-11T21:38:06.9376692Z del async_compile 2023-01-11T21:38:06.9376697Z 2023-01-11T21:38:06.9376772Z def call(args): 2023-01-11T21:38:06.9376850Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9376927Z args.clear() 2023-01-11T21:38:06.9377018Z with torch.cuda.device(0): 2023-01-11T21:38:06.9377307Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9377443Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9377580Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9377658Z del arg0_1 2023-01-11T21:38:06.9377730Z del arg1_1 2023-01-11T21:38:06.9377810Z return (buf0, ) 2023-01-11T21:38:06.9377815Z 2023-01-11T21:38:06.9377822Z 2023-01-11T21:38:06.9377903Z if __name__ == "__main__": 2023-01-11T21:38:06.9378022Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9378147Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9378345Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9378543Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9378664Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9378669Z 2023-01-11T21:38:06.9378742Z ok (0.083s) 2023-01-11T21:38:06.9379214Z test_cuda_int_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9379349Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9379610Z [2023-01-11 21:37:58,480] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1132 2023-01-11T21:38:06.9379877Z [2023-01-11 21:37:58,544] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1132 2023-01-11T21:38:06.9379882Z 2023-01-11T21:38:06.9379981Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9380058Z import torch 2023-01-11T21:38:06.9380128Z import random 2023-01-11T21:38:06.9380250Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9380380Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9380385Z 2023-01-11T21:38:06.9380469Z aten = torch.ops.aten 2023-01-11T21:38:06.9380607Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9380705Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9380710Z 2023-01-11T21:38:06.9380816Z import triton 2023-01-11T21:38:06.9380911Z import triton.language as tl 2023-01-11T21:38:06.9381030Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9381173Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9381178Z 2023-01-11T21:38:06.9381183Z 2023-01-11T21:38:06.9381339Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9381418Z import triton 2023-01-11T21:38:06.9381511Z import triton.language as tl 2023-01-11T21:38:06.9381627Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9381731Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9381866Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9381987Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9381992Z 2023-01-11T21:38:06.9382405Z @pointwise(size_hints=[16], filename=__file__, meta={'signature': {0: '*i32', 1: '*i32', 2: '*i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9382482Z @triton.jit 
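# [Editorial annotation -- not emitted by Inductor.] @triton.jit compiles lazily:
# the function below is traced to Triton IR and lowered to PTX on first launch,
# once per distinct meta-parameter configuration. XBLOCK is a tl.constexpr, so
# every block size the @pointwise autotuner tries becomes its own specialized
# kernel binary.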
2023-01-11T21:38:06.9382625Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9382699Z xnumel = 10 2023-01-11T21:38:06.9382797Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9382929Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9383013Z xmask = xindex < xnumel 2023-01-11T21:38:06.9383080Z x0 = xindex 2023-01-11T21:38:06.9383179Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9383278Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9383390Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9383527Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9383616Z ''') 2023-01-11T21:38:06.9383621Z 2023-01-11T21:38:06.9383626Z 2023-01-11T21:38:06.9383720Z async_compile.wait(globals()) 2023-01-11T21:38:06.9383795Z del async_compile 2023-01-11T21:38:06.9383801Z 2023-01-11T21:38:06.9383878Z def call(args): 2023-01-11T21:38:06.9383959Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9384037Z args.clear() 2023-01-11T21:38:06.9384131Z with torch.cuda.device(0): 2023-01-11T21:38:06.9384327Z buf0 = empty_strided((10, ), (1, ), device='cuda', dtype=torch.int32) 2023-01-11T21:38:06.9384421Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9384560Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, grid=grid(10), stream=stream0) 2023-01-11T21:38:06.9384636Z del arg0_1 2023-01-11T21:38:06.9384709Z del arg1_1 2023-01-11T21:38:06.9384791Z return (buf0, ) 2023-01-11T21:38:06.9384797Z 2023-01-11T21:38:06.9384801Z 2023-01-11T21:38:06.9384882Z if __name__ == "__main__": 2023-01-11T21:38:06.9385002Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9385129Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9385331Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9385519Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9385641Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9385646Z 2023-01-11T21:38:06.9385717Z ok (0.077s) 2023-01-11T21:38:06.9386183Z test_cuda_int_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9386318Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9386578Z [2023-01-11 21:37:58,557] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1133 2023-01-11T21:38:06.9386871Z [2023-01-11 21:37:58,631] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1133 2023-01-11T21:38:06.9386877Z 2023-01-11T21:38:06.9386976Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9387051Z import torch 2023-01-11T21:38:06.9387121Z import random 2023-01-11T21:38:06.9387242Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9387367Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9387372Z 2023-01-11T21:38:06.9387456Z aten = torch.ops.aten 2023-01-11T21:38:06.9387592Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9387691Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9387696Z 2023-01-11T21:38:06.9387771Z import triton 2023-01-11T21:38:06.9387864Z import triton.language as tl 2023-01-11T21:38:06.9387983Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9388125Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9388133Z 2023-01-11T21:38:06.9388137Z 2023-01-11T21:38:06.9388292Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9388368Z import triton 2023-01-11T21:38:06.9388462Z import triton.language as tl 2023-01-11T21:38:06.9388578Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9388680Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9388813Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9388933Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9388938Z 2023-01-11T21:38:06.9389355Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9389463Z @triton.jit 2023-01-11T21:38:06.9389605Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9389685Z xnumel = 100 2023-01-11T21:38:06.9389784Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9389913Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9389999Z xmask = xindex < xnumel 2023-01-11T21:38:06.9390071Z x0 = xindex % 10 2023-01-11T21:38:06.9390152Z x1 = (xindex // 10) 2023-01-11T21:38:06.9390222Z x2 = xindex 2023-01-11T21:38:06.9390323Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9390433Z tmp2 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9390523Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9390598Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9390735Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9390819Z ''') 2023-01-11T21:38:06.9390824Z 2023-01-11T21:38:06.9390829Z 2023-01-11T21:38:06.9390923Z async_compile.wait(globals()) 2023-01-11T21:38:06.9391002Z del async_compile 2023-01-11T21:38:06.9391007Z 2023-01-11T21:38:06.9391087Z def call(args): 2023-01-11T21:38:06.9391167Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9391244Z args.clear() 2023-01-11T21:38:06.9391331Z with torch.cuda.device(0): 2023-01-11T21:38:06.9391534Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9391628Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9391769Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9391843Z del arg0_1 2023-01-11T21:38:06.9391917Z del arg1_1 2023-01-11T21:38:06.9391998Z return (buf0, ) 2023-01-11T21:38:06.9392003Z 2023-01-11T21:38:06.9392010Z 2023-01-11T21:38:06.9392093Z if __name__ == "__main__": 2023-01-11T21:38:06.9392206Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9392334Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9392531Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9392762Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9392884Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9392889Z 2023-01-11T21:38:06.9392962Z ok (0.088s) 2023-01-11T21:38:06.9393440Z test_cuda_int_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9393576Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9393840Z [2023-01-11 21:37:58,645] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1134 2023-01-11T21:38:06.9394102Z [2023-01-11 21:37:58,718] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1134 2023-01-11T21:38:06.9394114Z 2023-01-11T21:38:06.9394209Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9394283Z import torch 2023-01-11T21:38:06.9394364Z import random 2023-01-11T21:38:06.9394484Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9394610Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9394615Z 2023-01-11T21:38:06.9394698Z aten = torch.ops.aten 2023-01-11T21:38:06.9394831Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9394923Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9394928Z 2023-01-11T21:38:06.9395037Z import triton 2023-01-11T21:38:06.9395131Z import triton.language as tl 2023-01-11T21:38:06.9395258Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9395397Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9395402Z 2023-01-11T21:38:06.9395407Z 2023-01-11T21:38:06.9395567Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9395643Z import triton 2023-01-11T21:38:06.9395732Z import triton.language as tl 2023-01-11T21:38:06.9395848Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9395952Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9396087Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9396214Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9396219Z 2023-01-11T21:38:06.9396639Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*i32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
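# [Editorial annotation -- not part of the log.] In the meta dict above, '*i32'
# and '*fp32' describe pointer arguments and their element types, while the bare
# 'i32' is the scalar xnumel. divisible_by_16=(0, 1, 2) asserts that the three
# pointers are 16-byte aligned (xnumel, argument 3, carries no such guarantee),
# which lets Triton emit wider, vectorized loads and stores for them.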
2023-01-11T21:38:06.9396717Z @triton.jit 2023-01-11T21:38:06.9396860Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9396941Z xnumel = 100 2023-01-11T21:38:06.9397034Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9397170Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9397255Z xmask = xindex < xnumel 2023-01-11T21:38:06.9397339Z x1 = (xindex // 10) 2023-01-11T21:38:06.9397412Z x2 = xindex 2023-01-11T21:38:06.9397511Z tmp0 = tl.load(in_ptr0 + (x1), xmask) 2023-01-11T21:38:06.9397604Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9397695Z tmp1 = tmp0.to(tl.float32) 2023-01-11T21:38:06.9397775Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9397912Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9398000Z ''') 2023-01-11T21:38:06.9398008Z 2023-01-11T21:38:06.9398013Z 2023-01-11T21:38:06.9398108Z async_compile.wait(globals()) 2023-01-11T21:38:06.9398185Z del async_compile 2023-01-11T21:38:06.9398190Z 2023-01-11T21:38:06.9398265Z def call(args): 2023-01-11T21:38:06.9398340Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9398417Z args.clear() 2023-01-11T21:38:06.9398538Z with torch.cuda.device(0): 2023-01-11T21:38:06.9398743Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9398835Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9398974Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9399047Z del arg0_1 2023-01-11T21:38:06.9399113Z del arg1_1 2023-01-11T21:38:06.9399189Z return (buf0, ) 2023-01-11T21:38:06.9399194Z 2023-01-11T21:38:06.9399198Z 2023-01-11T21:38:06.9399279Z if __name__ == "__main__": 2023-01-11T21:38:06.9399395Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9399523Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9399718Z arg0_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9399922Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9400044Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9400049Z 2023-01-11T21:38:06.9400112Z ok (0.087s) 2023-01-11T21:38:06.9400589Z test_cuda_strided_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9400719Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9401015Z [2023-01-11 21:37:58,733] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1135 2023-01-11T21:38:06.9401279Z [2023-01-11 21:37:58,806] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1135 2023-01-11T21:38:06.9401285Z 2023-01-11T21:38:06.9401381Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9401455Z import torch 2023-01-11T21:38:06.9401528Z import random 2023-01-11T21:38:06.9401646Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9401763Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9401778Z 2023-01-11T21:38:06.9401853Z aten = torch.ops.aten 2023-01-11T21:38:06.9401988Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9402080Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9402086Z 2023-01-11T21:38:06.9402158Z import triton 2023-01-11T21:38:06.9402251Z import triton.language as tl 2023-01-11T21:38:06.9402374Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9402519Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9402525Z 2023-01-11T21:38:06.9402530Z 2023-01-11T21:38:06.9402676Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9402750Z import triton 2023-01-11T21:38:06.9402842Z import triton.language as tl 2023-01-11T21:38:06.9402961Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9403062Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9403199Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9403325Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9403330Z 2023-01-11T21:38:06.9403744Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9403810Z @triton.jit 2023-01-11T21:38:06.9403951Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9404025Z xnumel = 100 2023-01-11T21:38:06.9404122Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9404250Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9404395Z xmask = xindex < xnumel 2023-01-11T21:38:06.9404471Z x0 = xindex % 10 2023-01-11T21:38:06.9404544Z x1 = (xindex // 10) 2023-01-11T21:38:06.9404616Z x2 = xindex 2023-01-11T21:38:06.9404724Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9404821Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9404899Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9405033Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9405122Z ''') 2023-01-11T21:38:06.9405127Z 2023-01-11T21:38:06.9405132Z 2023-01-11T21:38:06.9405226Z async_compile.wait(globals()) 2023-01-11T21:38:06.9405296Z del async_compile 2023-01-11T21:38:06.9405304Z 2023-01-11T21:38:06.9405377Z def call(args): 2023-01-11T21:38:06.9405455Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9405530Z args.clear() 2023-01-11T21:38:06.9405645Z with torch.cuda.device(0): 2023-01-11T21:38:06.9405874Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', 
dtype=torch.float32) 2023-01-11T21:38:06.9405967Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9406101Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9406174Z del arg0_1 2023-01-11T21:38:06.9406246Z del arg1_1 2023-01-11T21:38:06.9406320Z return (buf0, ) 2023-01-11T21:38:06.9406325Z 2023-01-11T21:38:06.9406330Z 2023-01-11T21:38:06.9406409Z if __name__ == "__main__": 2023-01-11T21:38:06.9406524Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9406649Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9406853Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9407071Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9407189Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9407195Z 2023-01-11T21:38:06.9407265Z ok (0.088s) 2023-01-11T21:38:06.9407741Z test_cuda_strided_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9407872Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9408133Z [2023-01-11 21:37:58,820] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1136 2023-01-11T21:38:06.9408400Z [2023-01-11 21:37:58,895] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1136 2023-01-11T21:38:06.9408406Z 2023-01-11T21:38:06.9408505Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9408578Z import torch 2023-01-11T21:38:06.9408646Z import random 2023-01-11T21:38:06.9408767Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9408892Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9408897Z 2023-01-11T21:38:06.9408979Z aten = torch.ops.aten 2023-01-11T21:38:06.9409116Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9409211Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9409217Z 2023-01-11T21:38:06.9409289Z import triton 2023-01-11T21:38:06.9409383Z import triton.language as tl 2023-01-11T21:38:06.9409501Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9409639Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9409647Z 2023-01-11T21:38:06.9409652Z 2023-01-11T21:38:06.9409806Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9409882Z import triton 2023-01-11T21:38:06.9409977Z import triton.language as tl 2023-01-11T21:38:06.9410088Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9410223Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9410350Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9410475Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9410480Z 2023-01-11T21:38:06.9410902Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9410976Z @triton.jit 
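# [Editorial annotation -- not emitted by Inductor.] The kernel below linearizes
# the 100-element output: x0 = xindex % 10 is the fastest-moving axis and
# x1 = xindex // 10 the slower one. The (2*x0) + (30*x1) offset walks the
# non-contiguous input laid out with strides (30, 2), while the broadcast
# operand is indexed by x1 alone, so one loaded value is reused across each
# row of the output.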
2023-01-11T21:38:06.9411117Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9411193Z xnumel = 100 2023-01-11T21:38:06.9411291Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9411420Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9411500Z xmask = xindex < xnumel 2023-01-11T21:38:06.9411569Z x0 = xindex % 10 2023-01-11T21:38:06.9411649Z x1 = (xindex // 10) 2023-01-11T21:38:06.9411724Z x2 = xindex 2023-01-11T21:38:06.9411834Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9411935Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9412012Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9412141Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9412227Z ''') 2023-01-11T21:38:06.9412233Z 2023-01-11T21:38:06.9412238Z 2023-01-11T21:38:06.9412331Z async_compile.wait(globals()) 2023-01-11T21:38:06.9412408Z del async_compile 2023-01-11T21:38:06.9412413Z 2023-01-11T21:38:06.9412490Z def call(args): 2023-01-11T21:38:06.9412571Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9412676Z args.clear() 2023-01-11T21:38:06.9412769Z with torch.cuda.device(0): 2023-01-11T21:38:06.9412976Z buf0 = empty_strided((1, 10, 10), (100, 10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9413066Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9413209Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9413282Z del arg0_1 2023-01-11T21:38:06.9413353Z del arg1_1 2023-01-11T21:38:06.9413432Z return (buf0, ) 2023-01-11T21:38:06.9413437Z 2023-01-11T21:38:06.9413442Z 2023-01-11T21:38:06.9413526Z if __name__ == "__main__": 2023-01-11T21:38:06.9413643Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9413761Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9413964Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9414172Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9414295Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9414300Z 2023-01-11T21:38:06.9414370Z ok (0.088s) 2023-01-11T21:38:06.9414963Z test_cuda_strided_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9415094Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9415353Z [2023-01-11 21:37:58,909] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1137 2023-01-11T21:38:06.9415616Z [2023-01-11 21:37:58,982] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1137 2023-01-11T21:38:06.9415625Z 2023-01-11T21:38:06.9415719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9415803Z import torch 2023-01-11T21:38:06.9415890Z import random 2023-01-11T21:38:06.9416024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9416155Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9416206Z 2023-01-11T21:38:06.9416291Z aten = torch.ops.aten 2023-01-11T21:38:06.9416429Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9416525Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9416530Z 2023-01-11T21:38:06.9416597Z import triton 2023-01-11T21:38:06.9416688Z import triton.language as tl 2023-01-11T21:38:06.9416813Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9416950Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9416955Z 2023-01-11T21:38:06.9416960Z 2023-01-11T21:38:06.9417112Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9417244Z import triton 2023-01-11T21:38:06.9417339Z import triton.language as tl 2023-01-11T21:38:06.9417445Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9417546Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9417676Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9417805Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9417810Z 2023-01-11T21:38:06.9418225Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9418303Z @triton.jit 2023-01-11T21:38:06.9418443Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9418517Z xnumel = 100 2023-01-11T21:38:06.9418608Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9418736Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9418864Z xmask = xindex < xnumel 2023-01-11T21:38:06.9418940Z x0 = xindex % 10 2023-01-11T21:38:06.9419019Z x1 = (xindex // 10) 2023-01-11T21:38:06.9419089Z x2 = xindex 2023-01-11T21:38:06.9419196Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9419323Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9419403Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9419537Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9419623Z ''') 2023-01-11T21:38:06.9419629Z 2023-01-11T21:38:06.9419633Z 2023-01-11T21:38:06.9419728Z async_compile.wait(globals()) 2023-01-11T21:38:06.9419802Z del async_compile 2023-01-11T21:38:06.9419807Z 2023-01-11T21:38:06.9419879Z def call(args): 2023-01-11T21:38:06.9419959Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9420028Z args.clear() 2023-01-11T21:38:06.9420121Z with torch.cuda.device(0): 2023-01-11T21:38:06.9420328Z buf0 = empty_strided((10, 10), (10, 
1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9420422Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9420559Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9420633Z del arg0_1 2023-01-11T21:38:06.9420708Z del arg1_1 2023-01-11T21:38:06.9420779Z return (buf0, ) 2023-01-11T21:38:06.9420784Z 2023-01-11T21:38:06.9420795Z 2023-01-11T21:38:06.9420868Z if __name__ == "__main__": 2023-01-11T21:38:06.9420984Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9421108Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9421312Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9421507Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9421625Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9421633Z 2023-01-11T21:38:06.9421707Z ok (0.087s) 2023-01-11T21:38:06.9422204Z test_cuda_strided_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9422337Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9422590Z [2023-01-11 21:37:58,996] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1138 2023-01-11T21:38:06.9422854Z [2023-01-11 21:37:59,074] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1138 2023-01-11T21:38:06.9422859Z 2023-01-11T21:38:06.9422960Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9423038Z import torch 2023-01-11T21:38:06.9423112Z import random 2023-01-11T21:38:06.9423230Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9423354Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9423359Z 2023-01-11T21:38:06.9423439Z aten = torch.ops.aten 2023-01-11T21:38:06.9423571Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9423666Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9423671Z 2023-01-11T21:38:06.9423741Z import triton 2023-01-11T21:38:06.9423835Z import triton.language as tl 2023-01-11T21:38:06.9423960Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9424099Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9424104Z 2023-01-11T21:38:06.9424109Z 2023-01-11T21:38:06.9424261Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9424335Z import triton 2023-01-11T21:38:06.9424422Z import triton.language as tl 2023-01-11T21:38:06.9424563Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9424662Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9424795Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9424918Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9424923Z 2023-01-11T21:38:06.9425340Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 
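# [Editorial annotation -- not part of the log.] size_hints=[128] rounds the
# 100-element problem up to a power of two, so a launched block can overshoot
# xnumel. The xmask = xindex < xnumel guard below keeps the tail safe: lanes
# masked off in tl.load never touch memory (their result is unspecified unless
# an `other` default is supplied), and lanes masked off in tl.store write
# nothing.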
2023-01-11T21:38:06.9425416Z @triton.jit 2023-01-11T21:38:06.9425558Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9425633Z xnumel = 100 2023-01-11T21:38:06.9425748Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9425898Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9425985Z xmask = xindex < xnumel 2023-01-11T21:38:06.9426063Z x0 = xindex % 10 2023-01-11T21:38:06.9426142Z x1 = (xindex // 10) 2023-01-11T21:38:06.9426205Z x2 = xindex 2023-01-11T21:38:06.9426314Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9426410Z tmp1 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9426489Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9426625Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9426710Z ''') 2023-01-11T21:38:06.9426715Z 2023-01-11T21:38:06.9426720Z 2023-01-11T21:38:06.9426814Z async_compile.wait(globals()) 2023-01-11T21:38:06.9426892Z del async_compile 2023-01-11T21:38:06.9426897Z 2023-01-11T21:38:06.9426966Z def call(args): 2023-01-11T21:38:06.9427045Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9427122Z args.clear() 2023-01-11T21:38:06.9427215Z with torch.cuda.device(0): 2023-01-11T21:38:06.9427418Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9427516Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9427656Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9427723Z del arg0_1 2023-01-11T21:38:06.9427796Z del arg1_1 2023-01-11T21:38:06.9427873Z return (buf0, ) 2023-01-11T21:38:06.9427878Z 2023-01-11T21:38:06.9427909Z 2023-01-11T21:38:06.9427994Z if __name__ == "__main__": 2023-01-11T21:38:06.9428113Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9428241Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9428446Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9428647Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9428761Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9428766Z 2023-01-11T21:38:06.9428840Z ok (0.092s) 2023-01-11T21:38:06.9429316Z test_cuda_strided_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9429451Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9429712Z [2023-01-11 21:37:59,089] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1139 2023-01-11T21:38:06.9429975Z [2023-01-11 21:37:59,162] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1139 2023-01-11T21:38:06.9429980Z 2023-01-11T21:38:06.9430082Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9430159Z import torch 2023-01-11T21:38:06.9430235Z import random 2023-01-11T21:38:06.9430350Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9430501Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9430507Z 2023-01-11T21:38:06.9430586Z aten = torch.ops.aten 2023-01-11T21:38:06.9430722Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9430817Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9430825Z 2023-01-11T21:38:06.9430899Z import triton 2023-01-11T21:38:06.9430993Z import triton.language as tl 2023-01-11T21:38:06.9431116Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9431247Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9431253Z 2023-01-11T21:38:06.9431266Z 2023-01-11T21:38:06.9431414Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9431488Z import triton 2023-01-11T21:38:06.9431580Z import triton.language as tl 2023-01-11T21:38:06.9431691Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9431791Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9431929Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9432054Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9432059Z 2023-01-11T21:38:06.9432469Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9432543Z @triton.jit 2023-01-11T21:38:06.9432686Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9432759Z xnumel = 100 2023-01-11T21:38:06.9432858Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9432984Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9433068Z xmask = xindex < xnumel 2023-01-11T21:38:06.9433142Z x0 = xindex % 10 2023-01-11T21:38:06.9433215Z x1 = (xindex // 10) 2023-01-11T21:38:06.9433287Z x2 = xindex 2023-01-11T21:38:06.9433399Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9433495Z tmp2 = tl.load(in_ptr1 + (x2), xmask) 2023-01-11T21:38:06.9433582Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9433661Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9433816Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9433904Z ''') 2023-01-11T21:38:06.9433909Z 2023-01-11T21:38:06.9433914Z 2023-01-11T21:38:06.9434008Z async_compile.wait(globals()) 2023-01-11T21:38:06.9434086Z del async_compile 2023-01-11T21:38:06.9434092Z 2023-01-11T21:38:06.9434170Z def call(args): 2023-01-11T21:38:06.9434249Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9434324Z args.clear() 2023-01-11T21:38:06.9434419Z with torch.cuda.device(0): 2023-01-11T21:38:06.9434614Z buf0 = 
empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9434704Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9434845Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9434919Z del arg0_1 2023-01-11T21:38:06.9434991Z del arg1_1 2023-01-11T21:38:06.9435068Z return (buf0, ) 2023-01-11T21:38:06.9435074Z 2023-01-11T21:38:06.9435079Z 2023-01-11T21:38:06.9435157Z if __name__ == "__main__": 2023-01-11T21:38:06.9435279Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9435397Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9435600Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9435796Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9435913Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9435918Z 2023-01-11T21:38:06.9435989Z ok (0.088s) 2023-01-11T21:38:06.9436458Z test_cuda_strided_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9436622Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9436880Z [2023-01-11 21:37:59,176] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1140 2023-01-11T21:38:06.9437145Z [2023-01-11 21:37:59,251] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1140 2023-01-11T21:38:06.9437151Z 2023-01-11T21:38:06.9437242Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9437318Z import torch 2023-01-11T21:38:06.9437391Z import random 2023-01-11T21:38:06.9437508Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9437629Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9437638Z 2023-01-11T21:38:06.9437717Z aten = torch.ops.aten 2023-01-11T21:38:06.9437854Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9437947Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9437952Z 2023-01-11T21:38:06.9438019Z import triton 2023-01-11T21:38:06.9438117Z import triton.language as tl 2023-01-11T21:38:06.9438240Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9438378Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9438383Z 2023-01-11T21:38:06.9438388Z 2023-01-11T21:38:06.9438541Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9438616Z import triton 2023-01-11T21:38:06.9438713Z import triton.language as tl 2023-01-11T21:38:06.9438819Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9438921Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9439055Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9439181Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9439186Z 2023-01-11T21:38:06.9439626Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.9439704Z @triton.jit 2023-01-11T21:38:06.9439843Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9439917Z xnumel = 100 2023-01-11T21:38:06.9440008Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9440133Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9440215Z xmask = xindex < xnumel 2023-01-11T21:38:06.9440290Z x0 = xindex % 10 2023-01-11T21:38:06.9440370Z x1 = (xindex // 10) 2023-01-11T21:38:06.9440440Z x2 = xindex 2023-01-11T21:38:06.9440548Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9440642Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9440734Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9440813Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9440948Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9441032Z ''') 2023-01-11T21:38:06.9441040Z 2023-01-11T21:38:06.9441044Z 2023-01-11T21:38:06.9441137Z async_compile.wait(globals()) 2023-01-11T21:38:06.9441213Z del async_compile 2023-01-11T21:38:06.9441219Z 2023-01-11T21:38:06.9441293Z def call(args): 2023-01-11T21:38:06.9441366Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9441440Z args.clear() 2023-01-11T21:38:06.9441532Z with torch.cuda.device(0): 2023-01-11T21:38:06.9441734Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9441826Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9441968Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9442081Z del arg0_1 2023-01-11T21:38:06.9442147Z del arg1_1 2023-01-11T21:38:06.9442224Z return (buf0, ) 2023-01-11T21:38:06.9442230Z 2023-01-11T21:38:06.9442234Z 2023-01-11T21:38:06.9442313Z if __name__ == "__main__": 2023-01-11T21:38:06.9442431Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9442558Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9442763Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9442959Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9443076Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9443082Z 2023-01-11T21:38:06.9443145Z ok (0.089s) 2023-01-11T21:38:06.9443624Z test_cuda_strided_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9443757Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9444018Z [2023-01-11 21:37:59,265] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1141 2023-01-11T21:38:06.9444282Z [2023-01-11 21:37:59,339] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1141 2023-01-11T21:38:06.9444287Z 2023-01-11T21:38:06.9444385Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9444461Z import torch 2023-01-11T21:38:06.9444535Z import random 2023-01-11T21:38:06.9444653Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9444768Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9444780Z 2023-01-11T21:38:06.9444858Z aten = torch.ops.aten 2023-01-11T21:38:06.9444993Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9445086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9445091Z 2023-01-11T21:38:06.9445165Z import triton 2023-01-11T21:38:06.9445255Z import triton.language as tl 2023-01-11T21:38:06.9445411Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9445561Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9445568Z 2023-01-11T21:38:06.9445573Z 2023-01-11T21:38:06.9445743Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9445834Z import triton 2023-01-11T21:38:06.9445926Z import triton.language as tl 2023-01-11T21:38:06.9446039Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9446139Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9446271Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9446394Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9446401Z 2023-01-11T21:38:06.9446816Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9446883Z @triton.jit 2023-01-11T21:38:06.9447027Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9447101Z xnumel = 100 2023-01-11T21:38:06.9447202Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9447331Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9447415Z xmask = xindex < xnumel 2023-01-11T21:38:06.9447489Z x0 = xindex % 10 2023-01-11T21:38:06.9447562Z x1 = (xindex // 10) 2023-01-11T21:38:06.9447633Z x2 = xindex 2023-01-11T21:38:06.9447742Z tmp0 = tl.load(in_ptr0 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9447849Z tmp1 = tl.load(in_ptr1 + ((2*x0) + (30*x1)), xmask) 2023-01-11T21:38:06.9447956Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9448091Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9448174Z ''') 2023-01-11T21:38:06.9448180Z 2023-01-11T21:38:06.9448184Z 2023-01-11T21:38:06.9448276Z async_compile.wait(globals()) 2023-01-11T21:38:06.9448348Z del async_compile 2023-01-11T21:38:06.9448353Z 2023-01-11T21:38:06.9448428Z def call(args): 2023-01-11T21:38:06.9448506Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9448581Z args.clear() 2023-01-11T21:38:06.9448673Z with torch.cuda.device(0): 2023-01-11T21:38:06.9448877Z buf0 = empty_strided((10, 10), (10, 1), 
device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9448969Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9449102Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9449178Z del arg0_1 2023-01-11T21:38:06.9449250Z del arg1_1 2023-01-11T21:38:06.9449332Z return (buf0, ) 2023-01-11T21:38:06.9449337Z 2023-01-11T21:38:06.9449342Z 2023-01-11T21:38:06.9449422Z if __name__ == "__main__": 2023-01-11T21:38:06.9449537Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9449661Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9449867Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9450057Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9450176Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9450181Z 2023-01-11T21:38:06.9450251Z ok (0.089s) 2023-01-11T21:38:06.9450728Z test_cuda_strided_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9450860Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9451119Z [2023-01-11 21:37:59,354] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1142 2023-01-11T21:38:06.9451411Z [2023-01-11 21:37:59,573] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1142 2023-01-11T21:38:06.9451417Z 2023-01-11T21:38:06.9451515Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9451589Z import torch 2023-01-11T21:38:06.9451657Z import random 2023-01-11T21:38:06.9451775Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9451897Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9451902Z 2023-01-11T21:38:06.9451985Z aten = torch.ops.aten 2023-01-11T21:38:06.9452119Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9452217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9452222Z 2023-01-11T21:38:06.9452298Z import triton 2023-01-11T21:38:06.9452391Z import triton.language as tl 2023-01-11T21:38:06.9452508Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9452652Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9452660Z 2023-01-11T21:38:06.9452665Z 2023-01-11T21:38:06.9452820Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9452894Z import triton 2023-01-11T21:38:06.9452985Z import triton.language as tl 2023-01-11T21:38:06.9453099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9453202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9453329Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9453451Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9453456Z 2023-01-11T21:38:06.9453930Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), 
equal_to_1=())]}) 2023-01-11T21:38:06.9454032Z @triton.jit 2023-01-11T21:38:06.9454210Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9454287Z xnumel = 10 2023-01-11T21:38:06.9454359Z ynumel = 10 2023-01-11T21:38:06.9454457Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9454710Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9454788Z xmask = xindex < xnumel 2023-01-11T21:38:06.9454885Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9455016Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9455100Z ymask = yindex < ynumel 2023-01-11T21:38:06.9455171Z x0 = xindex 2023-01-11T21:38:06.9455243Z y1 = yindex 2023-01-11T21:38:06.9455363Z tmp0 = tl.load(in_ptr0 + ((2*y1) + (30*x0)), xmask & ymask) 2023-01-11T21:38:06.9455473Z tmp1 = tl.load(in_ptr1 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9455555Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9455715Z tl.store(out_ptr0 + (y1 + (10*x0) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9455806Z ''') 2023-01-11T21:38:06.9455811Z 2023-01-11T21:38:06.9455816Z 2023-01-11T21:38:06.9455909Z async_compile.wait(globals()) 2023-01-11T21:38:06.9455987Z del async_compile 2023-01-11T21:38:06.9455994Z 2023-01-11T21:38:06.9456068Z def call(args): 2023-01-11T21:38:06.9456140Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9456220Z args.clear() 2023-01-11T21:38:06.9456310Z with torch.cuda.device(0): 2023-01-11T21:38:06.9456516Z buf0 = empty_strided((10, 10), (10, 1), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9456607Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9456756Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9456827Z del arg0_1 2023-01-11T21:38:06.9456892Z del arg1_1 2023-01-11T21:38:06.9456969Z return (buf0, ) 2023-01-11T21:38:06.9456974Z 2023-01-11T21:38:06.9456978Z 2023-01-11T21:38:06.9457059Z if __name__ == "__main__": 2023-01-11T21:38:06.9457281Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9458700Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9458920Z arg0_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9459121Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9459242Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9459247Z 2023-01-11T21:38:06.9459319Z ok (0.234s) 2023-01-11T21:38:06.9459803Z test_cuda_transposed_broadcast1 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9459938Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9460201Z [2023-01-11 21:37:59,588] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1143 2023-01-11T21:38:06.9460492Z [2023-01-11 21:37:59,601] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1143 2023-01-11T21:38:06.9460497Z 2023-01-11T21:38:06.9460596Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9460673Z import torch 2023-01-11T21:38:06.9460748Z import random 2023-01-11T21:38:06.9460867Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9460993Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9460999Z 2023-01-11T21:38:06.9461083Z aten = torch.ops.aten 2023-01-11T21:38:06.9461221Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9461311Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9461316Z 2023-01-11T21:38:06.9461391Z import triton 2023-01-11T21:38:06.9461487Z import triton.language as tl 2023-01-11T21:38:06.9461612Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9461756Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9461762Z 2023-01-11T21:38:06.9461767Z 2023-01-11T21:38:06.9461925Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9462000Z import triton 2023-01-11T21:38:06.9462087Z import triton.language as tl 2023-01-11T21:38:06.9462206Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9462308Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9462443Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9462570Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9462575Z 2023-01-11T21:38:06.9462993Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9463068Z @triton.jit 2023-01-11T21:38:06.9463212Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9463281Z xnumel = 100 2023-01-11T21:38:06.9463380Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9463509Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9463593Z xmask = xindex < xnumel 2023-01-11T21:38:06.9463665Z x2 = xindex 2023-01-11T21:38:06.9463746Z x1 = (xindex // 10) 2023-01-11T21:38:06.9463845Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9463935Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9464015Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9464151Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9464240Z ''') 2023-01-11T21:38:06.9464246Z 2023-01-11T21:38:06.9464250Z 2023-01-11T21:38:06.9464343Z async_compile.wait(globals()) 2023-01-11T21:38:06.9464459Z del async_compile 2023-01-11T21:38:06.9464465Z 2023-01-11T21:38:06.9464543Z def call(args): 2023-01-11T21:38:06.9464668Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9464738Z args.clear() 2023-01-11T21:38:06.9464833Z with torch.cuda.device(0): 2023-01-11T21:38:06.9465041Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9465134Z stream0 = 
get_cuda_stream(0) 2023-01-11T21:38:06.9465276Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9465350Z del arg0_1 2023-01-11T21:38:06.9465421Z del arg1_1 2023-01-11T21:38:06.9465493Z return (buf0, ) 2023-01-11T21:38:06.9465498Z 2023-01-11T21:38:06.9465507Z 2023-01-11T21:38:06.9465582Z if __name__ == "__main__": 2023-01-11T21:38:06.9465701Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9465827Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9466035Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9466237Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9466357Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9466362Z 2023-01-11T21:38:06.9466433Z ok (0.028s) 2023-01-11T21:38:06.9466912Z test_cuda_transposed_broadcast2 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9467038Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9467301Z [2023-01-11 21:37:59,615] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1144 2023-01-11T21:38:06.9467565Z [2023-01-11 21:37:59,627] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1144 2023-01-11T21:38:06.9467574Z 2023-01-11T21:38:06.9467674Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9467748Z import torch 2023-01-11T21:38:06.9467823Z import random 2023-01-11T21:38:06.9467943Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9468068Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9468073Z 2023-01-11T21:38:06.9468156Z aten = torch.ops.aten 2023-01-11T21:38:06.9468287Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9468386Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9468391Z 2023-01-11T21:38:06.9468467Z import triton 2023-01-11T21:38:06.9468561Z import triton.language as tl 2023-01-11T21:38:06.9468687Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9468830Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9468835Z 2023-01-11T21:38:06.9468840Z 2023-01-11T21:38:06.9468997Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9469066Z import triton 2023-01-11T21:38:06.9469159Z import triton.language as tl 2023-01-11T21:38:06.9469273Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9469376Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9469507Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9469635Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9469640Z 2023-01-11T21:38:06.9470051Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9470129Z @triton.jit 2023-01-11T21:38:06.9470263Z def triton_(in_ptr0, 
in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9470372Z xnumel = 100 2023-01-11T21:38:06.9470471Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9470635Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9470719Z xmask = xindex < xnumel 2023-01-11T21:38:06.9470791Z x2 = xindex 2023-01-11T21:38:06.9470868Z x0 = xindex % 10 2023-01-11T21:38:06.9470959Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9471054Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9471132Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9471267Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9471352Z ''') 2023-01-11T21:38:06.9471358Z 2023-01-11T21:38:06.9471363Z 2023-01-11T21:38:06.9471453Z async_compile.wait(globals()) 2023-01-11T21:38:06.9471528Z del async_compile 2023-01-11T21:38:06.9471533Z 2023-01-11T21:38:06.9471609Z def call(args): 2023-01-11T21:38:06.9471682Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9471761Z args.clear() 2023-01-11T21:38:06.9471854Z with torch.cuda.device(0): 2023-01-11T21:38:06.9472069Z buf0 = empty_strided((1, 10, 10), (100, 1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9472166Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9472311Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9472387Z del arg0_1 2023-01-11T21:38:06.9472453Z del arg1_1 2023-01-11T21:38:06.9472535Z return (buf0, ) 2023-01-11T21:38:06.9472541Z 2023-01-11T21:38:06.9472545Z 2023-01-11T21:38:06.9472625Z if __name__ == "__main__": 2023-01-11T21:38:06.9472748Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9472874Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9473081Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9473289Z arg1_1 = rand_strided((1, 10, 1), (10, 1, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9473411Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9473420Z 2023-01-11T21:38:06.9473485Z ok (0.026s) 2023-01-11T21:38:06.9473962Z test_cuda_transposed_broadcast3 (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9474094Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9474355Z [2023-01-11 21:37:59,641] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1145 2023-01-11T21:38:06.9474617Z [2023-01-11 21:37:59,649] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1145 2023-01-11T21:38:06.9474626Z 2023-01-11T21:38:06.9474723Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9474799Z import torch 2023-01-11T21:38:06.9474878Z import random 2023-01-11T21:38:06.9474997Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9475118Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9475130Z 2023-01-11T21:38:06.9475208Z aten = torch.ops.aten 2023-01-11T21:38:06.9475344Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9475439Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9475444Z 2023-01-11T21:38:06.9475520Z import triton 2023-01-11T21:38:06.9475611Z import triton.language as tl 2023-01-11T21:38:06.9475735Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9475876Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9475882Z 2023-01-11T21:38:06.9475886Z 2023-01-11T21:38:06.9476040Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9476141Z import triton 2023-01-11T21:38:06.9476235Z import triton.language as tl 2023-01-11T21:38:06.9476352Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9476483Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9476617Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9476743Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9476748Z 2023-01-11T21:38:06.9477157Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9477225Z @triton.jit 2023-01-11T21:38:06.9477366Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9477440Z xnumel = 100 2023-01-11T21:38:06.9477536Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9477668Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9477750Z xmask = xindex < xnumel 2023-01-11T21:38:06.9477825Z x0 = xindex 2023-01-11T21:38:06.9477915Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9478047Z tmp1 = tl.load(in_ptr1 + (0 + tl.zeros([XBLOCK], tl.int32)), None) 2023-01-11T21:38:06.9478130Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9478265Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9478349Z ''') 2023-01-11T21:38:06.9478355Z 2023-01-11T21:38:06.9478359Z 2023-01-11T21:38:06.9478453Z async_compile.wait(globals()) 2023-01-11T21:38:06.9478532Z del async_compile 2023-01-11T21:38:06.9478538Z 2023-01-11T21:38:06.9478612Z def call(args): 2023-01-11T21:38:06.9478685Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9478759Z args.clear() 2023-01-11T21:38:06.9478851Z with torch.cuda.device(0): 2023-01-11T21:38:06.9479055Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9479152Z stream0 = get_cuda_stream(0) 
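# A hedged sketch of where load offsets like `(2*x0) + (30*x1)` in the strided
# listings above come from: for shape (10, 10) with strides (30, 2), element
# (x1, x0) lives at flat storage offset x1*30 + x0*2. The same arithmetic,
# checked on CPU (the log runs on CUDA, but the offset math is device-agnostic):
#
#     import torch
#     base = torch.arange(300.)                       # backing storage
#     strided = torch.as_strided(base, (10, 10), (30, 2))
#     x1, x0 = 3, 7
#     assert strided[x1, x0] == base[30*x1 + 2*x0]    # same storage element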
2023-01-11T21:38:06.9479295Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9479372Z del arg0_1 2023-01-11T21:38:06.9479438Z del arg1_1 2023-01-11T21:38:06.9479515Z return (buf0, ) 2023-01-11T21:38:06.9479520Z 2023-01-11T21:38:06.9479524Z 2023-01-11T21:38:06.9479604Z if __name__ == "__main__": 2023-01-11T21:38:06.9479722Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9479848Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9480052Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9480247Z arg1_1 = rand_strided((1, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9480367Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9480372Z 2023-01-11T21:38:06.9480437Z ok (0.022s) 2023-01-11T21:38:06.9480918Z test_cuda_transposed_dense (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9481052Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9481313Z [2023-01-11 21:37:59,663] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1146 2023-01-11T21:38:06.9481579Z [2023-01-11 21:37:59,867] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1146 2023-01-11T21:38:06.9481584Z 2023-01-11T21:38:06.9481684Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9481759Z import torch 2023-01-11T21:38:06.9481832Z import random 2023-01-11T21:38:06.9481979Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9482099Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9482151Z 2023-01-11T21:38:06.9482229Z aten = torch.ops.aten 2023-01-11T21:38:06.9482367Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9482464Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9482469Z 2023-01-11T21:38:06.9482546Z import triton 2023-01-11T21:38:06.9482639Z import triton.language as tl 2023-01-11T21:38:06.9482765Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9482907Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9482912Z 2023-01-11T21:38:06.9482917Z 2023-01-11T21:38:06.9483074Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9483143Z import triton 2023-01-11T21:38:06.9483235Z import triton.language as tl 2023-01-11T21:38:06.9483351Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9483451Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9483588Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9483714Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9483722Z 2023-01-11T21:38:06.9484196Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9484270Z @triton.jit 2023-01-11T21:38:06.9484441Z def 
triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9484516Z xnumel = 10 2023-01-11T21:38:06.9484589Z ynumel = 10 2023-01-11T21:38:06.9484688Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9484826Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9484910Z xmask = xindex < xnumel 2023-01-11T21:38:06.9485009Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9485136Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9485223Z ymask = yindex < ynumel 2023-01-11T21:38:06.9485293Z x0 = xindex 2023-01-11T21:38:06.9485368Z y1 = yindex 2023-01-11T21:38:06.9485484Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9485602Z tmp1 = tl.load(in_ptr1 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9485683Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9485860Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9485965Z ''') 2023-01-11T21:38:06.9485972Z 2023-01-11T21:38:06.9485978Z 2023-01-11T21:38:06.9486074Z async_compile.wait(globals()) 2023-01-11T21:38:06.9486155Z del async_compile 2023-01-11T21:38:06.9486161Z 2023-01-11T21:38:06.9486233Z def call(args): 2023-01-11T21:38:06.9486313Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9486389Z args.clear() 2023-01-11T21:38:06.9486479Z with torch.cuda.device(0): 2023-01-11T21:38:06.9486686Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9486783Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9486931Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9487003Z del arg0_1 2023-01-11T21:38:06.9487075Z del arg1_1 2023-01-11T21:38:06.9487154Z return (buf0, ) 2023-01-11T21:38:06.9487159Z 2023-01-11T21:38:06.9487164Z 2023-01-11T21:38:06.9487244Z if __name__ == "__main__": 2023-01-11T21:38:06.9487355Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9487482Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9487688Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9487887Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9488040Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9488046Z 2023-01-11T21:38:06.9488144Z ok (0.218s) 2023-01-11T21:38:06.9488627Z test_cuda_transposed_double (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9488759Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9489019Z [2023-01-11 21:37:59,881] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1147 2023-01-11T21:38:06.9489282Z [2023-01-11 21:38:00,085] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1147 2023-01-11T21:38:06.9489288Z 2023-01-11T21:38:06.9489383Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9489459Z import torch 2023-01-11T21:38:06.9489537Z import random 2023-01-11T21:38:06.9489659Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9489783Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9489788Z 2023-01-11T21:38:06.9489872Z aten = torch.ops.aten 2023-01-11T21:38:06.9490010Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9490099Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9490109Z 2023-01-11T21:38:06.9490177Z import triton 2023-01-11T21:38:06.9490272Z import triton.language as tl 2023-01-11T21:38:06.9490402Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9490540Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9490546Z 2023-01-11T21:38:06.9490550Z 2023-01-11T21:38:06.9490706Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9490782Z import triton 2023-01-11T21:38:06.9490878Z import triton.language as tl 2023-01-11T21:38:06.9490987Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9491091Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9491223Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9491352Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9491357Z 2023-01-11T21:38:06.9491833Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: '*fp64', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9491908Z @triton.jit 2023-01-11T21:38:06.9492085Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9492161Z xnumel = 10 2023-01-11T21:38:06.9492228Z ynumel = 10 2023-01-11T21:38:06.9492332Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9492466Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9492555Z xmask = xindex < xnumel 2023-01-11T21:38:06.9492656Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9492788Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9492873Z ymask = yindex < ynumel 2023-01-11T21:38:06.9492938Z x0 = xindex 2023-01-11T21:38:06.9493008Z y1 = yindex 2023-01-11T21:38:06.9493125Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9493243Z tmp2 = tl.load(in_ptr1 + (y1 + (10*x0)), xmask & ymask) 2023-01-11T21:38:06.9493334Z tmp1 = tmp0.to(tl.float64) 2023-01-11T21:38:06.9493414Z tmp3 = tmp1 + tmp2 2023-01-11T21:38:06.9493574Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp3, xmask & ymask) 2023-01-11T21:38:06.9493654Z ''') 2023-01-11T21:38:06.9493666Z 2023-01-11T21:38:06.9493671Z 2023-01-11T21:38:06.9493787Z 
async_compile.wait(globals()) 2023-01-11T21:38:06.9493869Z del async_compile 2023-01-11T21:38:06.9493927Z 2023-01-11T21:38:06.9494004Z def call(args): 2023-01-11T21:38:06.9494084Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9494161Z args.clear() 2023-01-11T21:38:06.9494254Z with torch.cuda.device(0): 2023-01-11T21:38:06.9494461Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9494674Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9494825Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9494898Z del arg0_1 2023-01-11T21:38:06.9494970Z del arg1_1 2023-01-11T21:38:06.9495050Z return (buf0, ) 2023-01-11T21:38:06.9495055Z 2023-01-11T21:38:06.9495059Z 2023-01-11T21:38:06.9495141Z if __name__ == "__main__": 2023-01-11T21:38:06.9495260Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9495390Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9495596Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9495831Z arg1_1 = rand_strided((10, 10), (10, 1), device='cuda:0', dtype=torch.float64) 2023-01-11T21:38:06.9495961Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9495967Z 2023-01-11T21:38:06.9496038Z ok (0.218s) 2023-01-11T21:38:06.9496513Z test_cuda_transposed_int (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9496646Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9496908Z [2023-01-11 21:38:00,099] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1148 2023-01-11T21:38:06.9497253Z [2023-01-11 21:38:00,169] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1148 2023-01-11T21:38:06.9497262Z 2023-01-11T21:38:06.9497368Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9497437Z import torch 2023-01-11T21:38:06.9497512Z import random 2023-01-11T21:38:06.9497634Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9497760Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9497765Z 2023-01-11T21:38:06.9497850Z aten = torch.ops.aten 2023-01-11T21:38:06.9497989Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9498086Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9498092Z 2023-01-11T21:38:06.9498165Z import triton 2023-01-11T21:38:06.9498252Z import triton.language as tl 2023-01-11T21:38:06.9498377Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9498520Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9498526Z 2023-01-11T21:38:06.9498532Z 2023-01-11T21:38:06.9498686Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9498763Z import triton 2023-01-11T21:38:06.9498858Z import triton.language as tl 2023-01-11T21:38:06.9498972Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9499068Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9499202Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9499328Z from 
torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9499333Z 2023-01-11T21:38:06.9499751Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*i32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9499825Z @triton.jit 2023-01-11T21:38:06.9500020Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9500099Z xnumel = 100 2023-01-11T21:38:06.9500234Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9500359Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9500444Z xmask = xindex < xnumel 2023-01-11T21:38:06.9500516Z x2 = xindex 2023-01-11T21:38:06.9500596Z x1 = (xindex // 10) 2023-01-11T21:38:06.9500694Z tmp0 = tl.load(in_ptr0 + (x2), xmask) 2023-01-11T21:38:06.9500791Z tmp1 = tl.load(in_ptr1 + (x1), xmask) 2023-01-11T21:38:06.9500880Z tmp2 = tmp1.to(tl.float32) 2023-01-11T21:38:06.9500952Z tmp3 = tmp0 + tmp2 2023-01-11T21:38:06.9501088Z tl.store(out_ptr0 + (x2 + tl.zeros([XBLOCK], tl.int32)), tmp3, xmask) 2023-01-11T21:38:06.9501175Z ''') 2023-01-11T21:38:06.9501181Z 2023-01-11T21:38:06.9501185Z 2023-01-11T21:38:06.9501278Z async_compile.wait(globals()) 2023-01-11T21:38:06.9501355Z del async_compile 2023-01-11T21:38:06.9501360Z 2023-01-11T21:38:06.9501438Z def call(args): 2023-01-11T21:38:06.9501518Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9501596Z args.clear() 2023-01-11T21:38:06.9501682Z with torch.cuda.device(0): 2023-01-11T21:38:06.9501884Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9501977Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9502119Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9502192Z del arg0_1 2023-01-11T21:38:06.9502265Z del arg1_1 2023-01-11T21:38:06.9502345Z return (buf0, ) 2023-01-11T21:38:06.9502350Z 2023-01-11T21:38:06.9502354Z 2023-01-11T21:38:06.9502428Z if __name__ == "__main__": 2023-01-11T21:38:06.9502547Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9502673Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9502880Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9503075Z arg1_1 = rand_strided((10, ), (1, ), device='cuda:0', dtype=torch.int32) 2023-01-11T21:38:06.9503197Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9503202Z 2023-01-11T21:38:06.9503272Z ok (0.084s) 2023-01-11T21:38:06.9503751Z test_cuda_transposed_strided (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. 
To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9503883Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9504142Z [2023-01-11 21:38:00,183] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1149 2023-01-11T21:38:06.9504403Z [2023-01-11 21:38:00,387] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1149 2023-01-11T21:38:06.9504408Z 2023-01-11T21:38:06.9504510Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9504584Z import torch 2023-01-11T21:38:06.9504659Z import random 2023-01-11T21:38:06.9504778Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9504902Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9504907Z 2023-01-11T21:38:06.9504990Z aten = torch.ops.aten 2023-01-11T21:38:06.9505121Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9505217Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9505222Z 2023-01-11T21:38:06.9505296Z import triton 2023-01-11T21:38:06.9505389Z import triton.language as tl 2023-01-11T21:38:06.9505514Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9505654Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9505660Z 2023-01-11T21:38:06.9505665Z 2023-01-11T21:38:06.9505849Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9505929Z import triton 2023-01-11T21:38:06.9506041Z import triton.language as tl 2023-01-11T21:38:06.9506157Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9506260Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9506397Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9506525Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9506530Z 2023-01-11T21:38:06.9507005Z @pointwise(size_hints=[16, 16], tile_hint=TileHint.DEFAULT,filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32', 4: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9507083Z @triton.jit 2023-01-11T21:38:06.9507260Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, ynumel, XBLOCK : tl.constexpr, YBLOCK : tl.constexpr): 2023-01-11T21:38:06.9507342Z xnumel = 10 2023-01-11T21:38:06.9507411Z ynumel = 10 2023-01-11T21:38:06.9507512Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9507655Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9507740Z xmask = xindex < xnumel 2023-01-11T21:38:06.9507840Z yoffset = tl.program_id(1) * YBLOCK 2023-01-11T21:38:06.9507975Z yindex = yoffset + tl.reshape(tl.arange(0, YBLOCK), [1, YBLOCK]) 2023-01-11T21:38:06.9508061Z ymask = yindex < ynumel 2023-01-11T21:38:06.9508128Z x0 = xindex 2023-01-11T21:38:06.9508199Z y1 = yindex 2023-01-11T21:38:06.9508317Z tmp0 = tl.load(in_ptr0 + (x0 + (10*y1)), xmask & ymask) 2023-01-11T21:38:06.9508436Z tmp1 = tl.load(in_ptr1 + ((2*y1) + (30*x0)), xmask & ymask) 2023-01-11T21:38:06.9508518Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9508678Z tl.store(out_ptr0 + (x0 + (10*y1) + tl.zeros([XBLOCK, YBLOCK], tl.int32)), tmp2, xmask & ymask) 2023-01-11T21:38:06.9508765Z ''') 2023-01-11T21:38:06.9508773Z 2023-01-11T21:38:06.9508777Z 2023-01-11T21:38:06.9508866Z async_compile.wait(globals()) 2023-01-11T21:38:06.9508951Z del 
async_compile 2023-01-11T21:38:06.9508956Z 2023-01-11T21:38:06.9509031Z def call(args): 2023-01-11T21:38:06.9509112Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9509190Z args.clear() 2023-01-11T21:38:06.9509283Z with torch.cuda.device(0): 2023-01-11T21:38:06.9509491Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9509579Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9509727Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 10, 10, grid=grid(10, 10), stream=stream0) 2023-01-11T21:38:06.9509802Z del arg0_1 2023-01-11T21:38:06.9509875Z del arg1_1 2023-01-11T21:38:06.9509952Z return (buf0, ) 2023-01-11T21:38:06.9509958Z 2023-01-11T21:38:06.9509962Z 2023-01-11T21:38:06.9510046Z if __name__ == "__main__": 2023-01-11T21:38:06.9510170Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9510300Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9510503Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9510704Z arg1_1 = rand_strided((10, 10), (30, 2), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9510825Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9510830Z 2023-01-11T21:38:06.9510905Z ok (0.218s) 2023-01-11T21:38:06.9511389Z test_cuda_transposed_transposed (__main__.SweepInputsCudaTest) ... /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:246: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T21:38:06.9511523Z buffer = torch.as_strided(x, (x.storage().size(),), (1,), 0).clone() 2023-01-11T21:38:06.9511814Z [2023-01-11 21:38:00,401] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1150 2023-01-11T21:38:06.9512112Z [2023-01-11 21:38:00,411] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1150 2023-01-11T21:38:06.9512118Z 2023-01-11T21:38:06.9512219Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9512296Z import torch 2023-01-11T21:38:06.9512367Z import random 2023-01-11T21:38:06.9512489Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9512614Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9512619Z 2023-01-11T21:38:06.9512706Z aten = torch.ops.aten 2023-01-11T21:38:06.9512844Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9512942Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9512947Z 2023-01-11T21:38:06.9513021Z import triton 2023-01-11T21:38:06.9513109Z import triton.language as tl 2023-01-11T21:38:06.9513234Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9513374Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9513383Z 2023-01-11T21:38:06.9513388Z 2023-01-11T21:38:06.9513542Z triton_fused_add_0 = async_compile.triton(''' 2023-01-11T21:38:06.9513617Z import triton 2023-01-11T21:38:06.9513713Z import triton.language as tl 2023-01-11T21:38:06.9513827Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9513929Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9514056Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9514182Z from torch._inductor.utils import instance_descriptor 
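# Where the explicit casts in the `_double` and `_int` listings above come
# from (`tmp1 = tmp0.to(tl.float64)`, `tmp2 = tmp1.to(tl.float32)`): the
# generated kernels bake in PyTorch's type-promotion rules for the fused add,
# which can be checked directly; a minimal sketch:
#
#     import torch
#     f32 = torch.empty((), dtype=torch.float32)
#     f64 = torch.empty((), dtype=torch.float64)
#     i32 = torch.empty((), dtype=torch.int32)
#     assert torch.result_type(f32, f64) == torch.float64  # fp32 + fp64 -> fp64
#     assert torch.result_type(f32, i32) == torch.float32  # fp32 + int32 -> fp32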
2023-01-11T21:38:06.9514187Z 2023-01-11T21:38:06.9514603Z @pointwise(size_hints=[128], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: '*fp32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]}) 2023-01-11T21:38:06.9514681Z @triton.jit 2023-01-11T21:38:06.9514822Z def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9514900Z xnumel = 100 2023-01-11T21:38:06.9515001Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9515135Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9515214Z xmask = xindex < xnumel 2023-01-11T21:38:06.9515287Z x0 = xindex 2023-01-11T21:38:06.9515385Z tmp0 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9515484Z tmp1 = tl.load(in_ptr1 + (x0), xmask) 2023-01-11T21:38:06.9515564Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9515701Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp2, xmask) 2023-01-11T21:38:06.9515788Z ''') 2023-01-11T21:38:06.9515794Z 2023-01-11T21:38:06.9515798Z 2023-01-11T21:38:06.9515886Z async_compile.wait(globals()) 2023-01-11T21:38:06.9515966Z del async_compile 2023-01-11T21:38:06.9515971Z 2023-01-11T21:38:06.9516047Z def call(args): 2023-01-11T21:38:06.9516130Z arg0_1, arg1_1 = args 2023-01-11T21:38:06.9516210Z args.clear() 2023-01-11T21:38:06.9516304Z with torch.cuda.device(0): 2023-01-11T21:38:06.9516509Z buf0 = empty_strided((10, 10), (1, 10), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9516596Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9516740Z triton_fused_add_0.run(arg0_1, arg1_1, buf0, 100, grid=grid(100), stream=stream0) 2023-01-11T21:38:06.9516815Z del arg0_1 2023-01-11T21:38:06.9516888Z del arg1_1 2023-01-11T21:38:06.9516968Z return (buf0, ) 2023-01-11T21:38:06.9516973Z 2023-01-11T21:38:06.9516978Z 2023-01-11T21:38:06.9517058Z if __name__ == "__main__": 2023-01-11T21:38:06.9517177Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9517306Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9517505Z arg0_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9517736Z arg1_1 = rand_strided((10, 10), (1, 10), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9517858Z print_performance(lambda: call([arg0_1, arg1_1])) 2023-01-11T21:38:06.9517887Z 2023-01-11T21:38:06.9517961Z ok (0.024s) 2023-01-11T21:38:06.9518118Z test_indexing_join (__main__.TestIndexingSimplification) ... ok (0.065s) 2023-01-11T21:38:06.9518290Z test_indexing_simplification (__main__.TestIndexingSimplification) ... ok (0.070s) 2023-01-11T21:38:06.9518832Z test_cant_optimize_compute (__main__.TritonCodeGenTests) ... 
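[editor's note] Each code dump above ends with an `if __name__ == "__main__"` harness that rebuilds the kernel's inputs with rand_strided and times call(...), so every dump is a self-contained repro. A minimal sketch of rebuilding such a strided input by hand, using only public torch APIs; the helper name strided_like_log is hypothetical (the tests themselves use torch._dynamo.testing.rand_strided), and CPU is used here only so the sketch runs anywhere:

import torch

def strided_like_log(shape, stride, device="cpu", dtype=torch.float32):
    # Hypothetical stand-in for torch._dynamo.testing.rand_strided:
    # allocate just enough storage for the given strides, then view into it.
    needed = 1 + sum((s - 1) * st for s, st in zip(shape, stride))
    storage = torch.randn(needed, device=device, dtype=dtype)
    return torch.as_strided(storage, shape, stride)

a = strided_like_log((10, 10), (1, 10))   # column-major, like arg0_1 above
b = strided_like_log((10, 10), (30, 2))   # the (30, 2)-strided arg1_1 above
out = a + b   # eager reference for what triton_fused_add_0 computes
assert out.shape == (10, 10)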
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled.Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.9518915Z warnings.warn( 2023-01-11T21:38:06.9519155Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9519468Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9519695Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9519953Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9520252Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9520588Z [2023-01-11 21:38:00,552] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9520962Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9521071Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9521171Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9521270Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9521360Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9521505Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9521659Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9521762Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9521844Z 14 RETURN_VALUE 2023-01-11T21:38:06.9521911Z 2023-01-11T21:38:06.9521916Z 2023-01-11T21:38:06.9522228Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9522448Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9522804Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9523121Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9523471Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9523810Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9524198Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9524644Z [2023-01-11 21:38:00,553] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9524931Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9531534Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9531892Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9532274Z [2023-01-11 21:38:00,559] 
torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8ac9d0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6083> 2023-01-11T21:38:06.9532374Z 6083 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9532480Z 2 LOAD_CONST 1 (2147483657) 2023-01-11T21:38:06.9532557Z 4 BINARY_ADD 2023-01-11T21:38:06.9532645Z 6 RETURN_VALUE 2023-01-11T21:38:06.9532703Z 2023-01-11T21:38:06.9532709Z 2023-01-11T21:38:06.9532920Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9533170Z [2023-01-11 21:38:00,559] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2147483657 [TensorVariable()] 2023-01-11T21:38:06.9533450Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9533699Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9534078Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8ac9d0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6083> 2023-01-11T21:38:06.9534366Z [2023-01-11 21:38:00,560] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9534894Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9534994Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9535085Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9535187Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9535281Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9535376Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9535471Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9535549Z 12 BINARY_ADD 2023-01-11T21:38:06.9535646Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9535744Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9535837Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9535930Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9536010Z 22 RETURN_VALUE 2023-01-11T21:38:06.9536076Z 2023-01-11T21:38:06.9536081Z 2023-01-11T21:38:06.9536398Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9536615Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9536853Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9537186Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9537716Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9538072Z [2023-01-11 21:38:00,561] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9538311Z [2023-01-11 21:38:00,567] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9538586Z [2023-01-11 21:38:00,567] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9538816Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9539084Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] 
TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9539501Z [2023-01-11 21:38:00,568] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9539819Z [2023-01-11 21:38:00,569] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9540068Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9540441Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9540687Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9540935Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9541165Z [2023-01-11 21:38:00,575] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9541387Z [2023-01-11 21:38:00,575] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9541632Z [2023-01-11 21:38:00,576] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9541888Z [2023-01-11 21:38:00,618] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1151 2023-01-11T21:38:06.9542127Z [2023-01-11 21:38:00,628] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9542392Z [2023-01-11 21:38:00,862] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1151 2023-01-11T21:38:06.9542810Z [2023-01-11 21:38:00,862] torch._inductor.debug: [WARNING] model__1151_inference_1197 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_0/run_2023_01_11_21_38_00_551246/aot_torchinductor/model__1151_inference_1197.1 2023-01-11T21:38:06.9543052Z [2023-01-11 21:38:00,863] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9543431Z [2023-01-11 21:38:00,864] torch._inductor.debug: [WARNING] model___1198 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_0/run_2023_01_11_21_38_00_551246/aot_torchinductor/model___1197.0 2023-01-11T21:38:06.9543659Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9543972Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9544192Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9544448Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9544776Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9545132Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9545500Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9545602Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9545700Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9545788Z 
4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9545882Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9546026Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9546176Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9546277Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9546361Z 14 RETURN_VALUE 2023-01-11T21:38:06.9546429Z 2023-01-11T21:38:06.9546434Z 2023-01-11T21:38:06.9546745Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9546957Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9547310Z [2023-01-11 21:38:00,867] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9547626Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9547977Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9548315Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9548702Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9549128Z [2023-01-11 21:38:00,868] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9549373Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9549748Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9550069Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9550439Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8aca80, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6084> 2023-01-11T21:38:06.9550541Z 6084 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9550634Z 2 LOAD_METHOD 1 (where) 2023-01-11T21:38:06.9550729Z 4 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9550824Z 6 LOAD_CONST 1 (0) 2023-01-11T21:38:06.9550921Z 8 COMPARE_OP 0 (<) 2023-01-11T21:38:06.9551019Z 10 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9551146Z 12 CALL_FUNCTION 0 2023-01-11T21:38:06.9551244Z 14 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9551358Z 16 CALL_FUNCTION 0 2023-01-11T21:38:06.9551455Z 18 LOAD_CONST 2 (2) 2023-01-11T21:38:06.9551543Z 20 BINARY_SUBTRACT 2023-01-11T21:38:06.9551637Z 22 CALL_METHOD 3 2023-01-11T21:38:06.9551800Z 24 LOAD_CONST 3 (-1099511627776) 2023-01-11T21:38:06.9551889Z 26 BINARY_MULTIPLY 2023-01-11T21:38:06.9551972Z 28 RETURN_VALUE 2023-01-11T21:38:06.9552032Z 2023-01-11T21:38:06.9552045Z 2023-01-11T21:38:06.9552259Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9552620Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR where [TorchVariable()] 2023-01-11T21:38:06.9552942Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable()] 2023-01-11T21:38:06.9553289Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 0 [TorchVariable(), 
TensorVariable()] 2023-01-11T21:38:06.9553667Z [2023-01-11 21:38:00,874] torch._dynamo.symbolic_convert: [DEBUG] TRACE COMPARE_OP < [TorchVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9554010Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable()] 2023-01-11T21:38:06.9554396Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9554762Z [2023-01-11 21:38:00,875] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9554870Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9554967Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9555056Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9555154Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9555316Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9555493Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9555604Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9555687Z 14 RETURN_VALUE 2023-01-11T21:38:06.9555754Z 2023-01-11T21:38:06.9555759Z 2023-01-11T21:38:06.9556082Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9556295Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9556654Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9556975Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9557324Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9557662Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9558075Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9558518Z [2023-01-11 21:38:00,876] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9558768Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9559142Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9559514Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9559928Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9560301Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9560405Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9560497Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9560596Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9560692Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9560836Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9560988Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9561089Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9561172Z 14 RETURN_VALUE 2023-01-11T21:38:06.9561235Z 2023-01-11T21:38:06.9561240Z 
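[editor's note] For orientation: the TRACE / INLINING records around here are TorchDynamo symbolically stepping through CPython bytecode, inlining the closed-over helpers (ones at test_torchinductor.py:6075, foo at :6078) rather than breaking the graph. The same instruction listing can be printed with the standard dis module; a minimal sketch, assuming Python 3.10 as in this job (opcodes differ across Python versions) and a helper mirroring the one traced above — dis only disassembles, so no GPU is needed to run it:

import dis
import torch

def ones():
    # mirrors the helper Dynamo inlines above (test_torchinductor.py:6075)
    return torch.ones([4], device="cuda")

# Prints the same sequence the TRACE records walk through:
# LOAD_GLOBAL (torch), LOAD_ATTR (ones), LOAD_CONST 4, BUILD_LIST 1,
# LOAD_CONST 'cuda', LOAD_CONST ('device',), CALL_FUNCTION_KW 2, RETURN_VALUE
dis.dis(ones)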
2023-01-11T21:38:06.9561550Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9561769Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9562125Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9562441Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9562793Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9563137Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9563521Z [2023-01-11 21:38:00,882] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9563946Z [2023-01-11 21:38:00,883] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9564194Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9564597Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9565011Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9565454Z [2023-01-11 21:38:00,888] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_SUBTRACT None [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9565894Z [2023-01-11 21:38:00,889] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 3 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9566151Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST -1099511627776 [TensorVariable()] 2023-01-11T21:38:06.9566444Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_MULTIPLY None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9566694Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9567074Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8aca80, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6084> 2023-01-11T21:38:06.9567363Z [2023-01-11 21:38:00,890] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9567732Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9567833Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9567936Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9568038Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9568133Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9568235Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9568336Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9568417Z 12 BINARY_ADD 2023-01-11T21:38:06.9568516Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9568616Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9568717Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9568805Z 20 
CALL_METHOD 1 2023-01-11T21:38:06.9568885Z 22 RETURN_VALUE 2023-01-11T21:38:06.9568953Z 2023-01-11T21:38:06.9568959Z 2023-01-11T21:38:06.9569278Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9569497Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9569739Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9570021Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9570437Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9570753Z [2023-01-11 21:38:00,891] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9570982Z [2023-01-11 21:38:00,897] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9571290Z [2023-01-11 21:38:00,897] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9571565Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9571840Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9572260Z [2023-01-11 21:38:00,898] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9572571Z [2023-01-11 21:38:00,899] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9572821Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9573201Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9573451Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9573710Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9573938Z [2023-01-11 21:38:00,905] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9574151Z [2023-01-11 21:38:00,905] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9574395Z [2023-01-11 21:38:00,906] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9574771Z [2023-01-11 21:38:00,976] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1152 2023-01-11T21:38:06.9575011Z [2023-01-11 21:38:00,986] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9575273Z [2023-01-11 21:38:01,206] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1152 2023-01-11T21:38:06.9575683Z [2023-01-11 21:38:01,206] torch._inductor.debug: [WARNING] model__1152_inference_1198 debug trace: 
/var/lib/jenkins/workspace/test/torch_compile_debug_tmp_1/run_2023_01_11_21_38_00_866432/aot_torchinductor/model__1152_inference_1198.3 2023-01-11T21:38:06.9575919Z [2023-01-11 21:38:01,206] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9576293Z [2023-01-11 21:38:01,208] torch._inductor.debug: [WARNING] model___1199 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_1/run_2023_01_11_21_38_00_866432/aot_torchinductor/model___1198.2 2023-01-11T21:38:06.9576527Z [2023-01-11 21:38:01,210] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9576839Z [2023-01-11 21:38:01,210] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9577054Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9577382Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9577677Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9578010Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9578379Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9578526Z 6075 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9578627Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9578757Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9578850Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9578988Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9579138Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9579236Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9579317Z 14 RETURN_VALUE 2023-01-11T21:38:06.9579382Z 2023-01-11T21:38:06.9579387Z 2023-01-11T21:38:06.9579698Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9579916Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9580267Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9580584Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9580931Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9581268Z [2023-01-11 21:38:01,211] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9581649Z [2023-01-11 21:38:01,212] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9582067Z [2023-01-11 21:38:01,212] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9582314Z [2023-01-11 21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9582688Z [2023-01-11 21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9583013Z [2023-01-11 
21:38:01,217] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9583381Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8acb30, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6085> 2023-01-11T21:38:06.9583483Z 6085 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9583581Z 2 LOAD_DEREF 0 (ten) 2023-01-11T21:38:06.9583656Z 4 BINARY_ADD 2023-01-11T21:38:06.9583739Z 6 RETURN_VALUE 2023-01-11T21:38:06.9583805Z 2023-01-11T21:38:06.9583810Z 2023-01-11T21:38:06.9584020Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9584263Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ten [TensorVariable()] 2023-01-11T21:38:06.9584537Z [2023-01-11 21:38:01,218] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9584781Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9585189Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8acb30, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6085> 2023-01-11T21:38:06.9585475Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9585916Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9586019Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9586119Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9586221Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9586317Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9586410Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9586508Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9586589Z 12 BINARY_ADD 2023-01-11T21:38:06.9586684Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9586786Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9586895Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9586990Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9587075Z 22 RETURN_VALUE 2023-01-11T21:38:06.9587141Z 2023-01-11T21:38:06.9587146Z 2023-01-11T21:38:06.9587460Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9587679Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9587913Z [2023-01-11 21:38:01,219] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9588193Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9588612Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9588929Z [2023-01-11 21:38:01,220] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9589166Z [2023-01-11 21:38:01,226] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9589440Z [2023-01-11 21:38:01,226] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9589672Z [2023-01-11 
21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9589946Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9590365Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9590682Z [2023-01-11 21:38:01,227] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9590931Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9591300Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9591545Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9591831Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9592061Z [2023-01-11 21:38:01,234] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9592302Z [2023-01-11 21:38:01,234] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9592549Z [2023-01-11 21:38:01,235] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9592808Z [2023-01-11 21:38:01,277] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1153 2023-01-11T21:38:06.9593071Z [2023-01-11 21:38:01,348] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1153 2023-01-11T21:38:06.9593483Z [2023-01-11 21:38:01,348] torch._inductor.debug: [WARNING] model__1153_inference_1199 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_2/run_2023_01_11_21_38_01_210061/aot_torchinductor/model__1153_inference_1199.5 2023-01-11T21:38:06.9593726Z [2023-01-11 21:38:01,349] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9594095Z [2023-01-11 21:38:01,500] torch._inductor.debug: [WARNING] model___1200 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_2/run_2023_01_11_21_38_01_210061/aot_torchinductor/model___1199.4 2023-01-11T21:38:06.9594334Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9594647Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6090 2023-01-11T21:38:06.9594871Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9595129Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9595451Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9595815Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9596186Z [2023-01-11 21:38:01,503] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9596290Z 6075 0 LOAD_GLOBAL 0 (torch) 
2023-01-11T21:38:06.9596389Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9596480Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9596577Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9596722Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9596878Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9596978Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9597064Z 14 RETURN_VALUE 2023-01-11T21:38:06.9597132Z 2023-01-11T21:38:06.9597137Z 2023-01-11T21:38:06.9597445Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6075 2023-01-11T21:38:06.9597665Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9598017Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9598338Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9598689Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9599059Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9599460Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9599882Z [2023-01-11 21:38:01,504] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9600132Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9600508Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9600835Z [2023-01-11 21:38:01,510] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9601213Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] INLINING at 0x7fcc3e8acbe0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6086> 2023-01-11T21:38:06.9601308Z 6086 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9601411Z 2 LOAD_DEREF 0 (ten) 2023-01-11T21:38:06.9601512Z 4 LOAD_METHOD 0 (sum) 2023-01-11T21:38:06.9601607Z 6 CALL_METHOD 0 2023-01-11T21:38:06.9601687Z 8 BINARY_ADD 2023-01-11T21:38:06.9601770Z 10 RETURN_VALUE 2023-01-11T21:38:06.9601837Z 2023-01-11T21:38:06.9601843Z 2023-01-11T21:38:06.9602061Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9602300Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ten [TensorVariable()] 2023-01-11T21:38:06.9602573Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR sum [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9602876Z [2023-01-11 21:38:01,511] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TensorVariable(), GetAttrVariable(TensorVariable(), sum)] 2023-01-11T21:38:06.9603148Z [2023-01-11 21:38:01,512] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9603397Z [2023-01-11 21:38:01,512] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9603776Z 
[2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING at 0x7fcc3e8acbe0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6086> 2023-01-11T21:38:06.9604062Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9604437Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9604539Z 6078 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9604639Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9604736Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9604834Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9604928Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9605026Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9605105Z 12 BINARY_ADD 2023-01-11T21:38:06.9605206Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9605363Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9605460Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9605593Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9605687Z 22 RETURN_VALUE 2023-01-11T21:38:06.9605768Z 2023-01-11T21:38:06.9605775Z 2023-01-11T21:38:06.9606104Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6078 2023-01-11T21:38:06.9606322Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9606565Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9606849Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9607259Z [2023-01-11 21:38:01,513] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9607576Z [2023-01-11 21:38:01,514] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9607812Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9608085Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9608317Z [2023-01-11 21:38:01,520] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9608588Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9609007Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9609321Z [2023-01-11 21:38:01,521] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9609568Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9609944Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9610192Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9610445Z [2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9610675Z 
[2023-01-11 21:38:01,527] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9610895Z [2023-01-11 21:38:01,527] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9611145Z [2023-01-11 21:38:01,528] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9611404Z [2023-01-11 21:38:01,572] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1154 2023-01-11T21:38:06.9611615Z [2023-01-11 21:38:01,584] torch._inductor.scheduler: [DEBUG] remove_buffer('buf0') 2023-01-11T21:38:06.9611621Z 2023-01-11T21:38:06.9611719Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9611797Z import torch 2023-01-11T21:38:06.9611873Z import random 2023-01-11T21:38:06.9611987Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9612114Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9612120Z 2023-01-11T21:38:06.9612204Z aten = torch.ops.aten 2023-01-11T21:38:06.9612376Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9612478Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9612506Z 2023-01-11T21:38:06.9612585Z import triton 2023-01-11T21:38:06.9612680Z import triton.language as tl 2023-01-11T21:38:06.9612802Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9612946Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9612951Z 2023-01-11T21:38:06.9612956Z 2023-01-11T21:38:06.9613145Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9613222Z import triton 2023-01-11T21:38:06.9613315Z import triton.language as tl 2023-01-11T21:38:06.9613434Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9613539Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9613676Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9613798Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9613803Z 2023-01-11T21:38:06.9614194Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9614272Z @triton.jit 2023-01-11T21:38:06.9614397Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9614583Z xnumel = 4 2023-01-11T21:38:06.9614685Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9614816Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9614900Z xmask = xindex < xnumel 2023-01-11T21:38:06.9614965Z x0 = xindex 2023-01-11T21:38:06.9615035Z tmp0 = 1 2023-01-11T21:38:06.9615109Z tmp1 = 2147483657 2023-01-11T21:38:06.9615187Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9615273Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.9615354Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9615439Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9615577Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9615664Z ''') 2023-01-11T21:38:06.9615673Z 2023-01-11T21:38:06.9615678Z 2023-01-11T21:38:06.9615771Z async_compile.wait(globals()) 2023-01-11T21:38:06.9615847Z del async_compile 2023-01-11T21:38:06.9615852Z 2023-01-11T21:38:06.9615926Z def call(args): 2023-01-11T21:38:06.9616020Z with torch.cuda.device(0): 2023-01-11T21:38:06.9616216Z buf0 = empty_strided((4, ), (1, ), device='cuda', 
dtype=torch.float64) 2023-01-11T21:38:06.9616302Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9616452Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9616529Z return (buf0, ) 2023-01-11T21:38:06.9616534Z 2023-01-11T21:38:06.9616539Z 2023-01-11T21:38:06.9616619Z if __name__ == "__main__": 2023-01-11T21:38:06.9616737Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9616864Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9616968Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9616973Z 2023-01-11T21:38:06.9616980Z 2023-01-11T21:38:06.9617080Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9617197Z import torch 2023-01-11T21:38:06.9617275Z import random 2023-01-11T21:38:06.9617395Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9617519Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9617524Z 2023-01-11T21:38:06.9617604Z aten = torch.ops.aten 2023-01-11T21:38:06.9617738Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9617830Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9617835Z 2023-01-11T21:38:06.9617909Z import triton 2023-01-11T21:38:06.9617995Z import triton.language as tl 2023-01-11T21:38:06.9618120Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9618259Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9618265Z 2023-01-11T21:38:06.9618322Z 2023-01-11T21:38:06.9618517Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9618625Z import triton 2023-01-11T21:38:06.9618718Z import triton.language as tl 2023-01-11T21:38:06.9618834Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9618938Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9619066Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9619193Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9619198Z 2023-01-11T21:38:06.9619585Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9619661Z @triton.jit 2023-01-11T21:38:06.9619786Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9619861Z xnumel = 4 2023-01-11T21:38:06.9619962Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9620094Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9620175Z xmask = xindex < xnumel 2023-01-11T21:38:06.9620251Z x0 = xindex 2023-01-11T21:38:06.9620321Z tmp0 = 1 2023-01-11T21:38:06.9620391Z tmp1 = 0 2023-01-11T21:38:06.9620470Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.9620540Z tmp3 = 2 2023-01-11T21:38:06.9620645Z tmp4 = tmp0 - tmp3 2023-01-11T21:38:06.9620743Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.9620857Z tmp6 = -1099511627776 2023-01-11T21:38:06.9620936Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.9621024Z tmp8 = tmp7.to(tl.int64) 2023-01-11T21:38:06.9621103Z tmp9 = tmp8 + tmp0 2023-01-11T21:38:06.9621197Z tmp10 = tmp9.to(tl.float64) 2023-01-11T21:38:06.9621330Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.9621417Z ''') 2023-01-11T21:38:06.9621423Z 2023-01-11T21:38:06.9621427Z 2023-01-11T21:38:06.9621524Z async_compile.wait(globals()) 2023-01-11T21:38:06.9621603Z del 
async_compile 2023-01-11T21:38:06.9621608Z 2023-01-11T21:38:06.9621691Z def call(args): 2023-01-11T21:38:06.9621785Z with torch.cuda.device(0): 2023-01-11T21:38:06.9621987Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9622083Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9622228Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9622309Z return (buf0, ) 2023-01-11T21:38:06.9622314Z 2023-01-11T21:38:06.9622319Z 2023-01-11T21:38:06.9622401Z if __name__ == "__main__": 2023-01-11T21:38:06.9622518Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9622649Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9622752Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9622758Z 2023-01-11T21:38:06.9622762Z 2023-01-11T21:38:06.9622864Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9622939Z import torch 2023-01-11T21:38:06.9623009Z import random 2023-01-11T21:38:06.9623131Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9623259Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9623264Z 2023-01-11T21:38:06.9623349Z aten = torch.ops.aten 2023-01-11T21:38:06.9623485Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9623582Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9623587Z 2023-01-11T21:38:06.9623663Z import triton 2023-01-11T21:38:06.9623751Z import triton.language as tl 2023-01-11T21:38:06.9623881Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9624022Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9624027Z 2023-01-11T21:38:06.9624032Z 2023-01-11T21:38:06.9624219Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9624297Z import triton 2023-01-11T21:38:06.9624422Z import triton.language as tl 2023-01-11T21:38:06.9624540Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9624676Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9624803Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9624931Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9624936Z 2023-01-11T21:38:06.9625339Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9625414Z @triton.jit 2023-01-11T21:38:06.9625550Z def triton_(in_ptr0, out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9625626Z xnumel = 4 2023-01-11T21:38:06.9625724Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9625854Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9625936Z xmask = xindex < xnumel 2023-01-11T21:38:06.9626007Z x0 = xindex 2023-01-11T21:38:06.9626107Z tmp1 = tl.load(in_ptr0 + (x0), xmask) 2023-01-11T21:38:06.9626182Z tmp0 = 1 2023-01-11T21:38:06.9626265Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9626351Z tmp3 = tmp2.to(tl.int64) 2023-01-11T21:38:06.9626432Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9626515Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9626654Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9626741Z ''') 2023-01-11T21:38:06.9626746Z 2023-01-11T21:38:06.9626750Z 2023-01-11T21:38:06.9626845Z async_compile.wait(globals()) 
2023-01-11T21:38:06.9626926Z del async_compile 2023-01-11T21:38:06.9626931Z 2023-01-11T21:38:06.9627007Z def call(args): 2023-01-11T21:38:06.9627080Z arg0_1, = args 2023-01-11T21:38:06.9627151Z args.clear() 2023-01-11T21:38:06.9627243Z with torch.cuda.device(0): 2023-01-11T21:38:06.9627441Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9627538Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9627697Z triton_fused_convert_element_type_1_0.run(arg0_1, buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9627775Z del arg0_1 2023-01-11T21:38:06.9627856Z return (buf0, ) 2023-01-11T21:38:06.9627861Z 2023-01-11T21:38:06.9627865Z 2023-01-11T21:38:06.9627946Z if __name__ == "__main__": 2023-01-11T21:38:06.9628060Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9628190Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9628391Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9628505Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9628511Z 2023-01-11T21:38:06.9628777Z [2023-01-11 21:38:01,749] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1154 2023-01-11T21:38:06.9629194Z [2023-01-11 21:38:01,749] torch._inductor.debug: [WARNING] model__1154_inference_1200 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_3/run_2023_01_11_21_38_01_502557/aot_torchinductor/model__1154_inference_1200.7 2023-01-11T21:38:06.9629443Z [2023-01-11 21:38:01,750] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9629818Z [2023-01-11 21:38:01,899] torch._inductor.debug: [WARNING] model___1201 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_3/run_2023_01_11_21_38_01_502557/aot_torchinductor/model___1200.6 2023-01-11T21:38:06.9629823Z 2023-01-11T21:38:06.9629924Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9629994Z import torch 2023-01-11T21:38:06.9630071Z import random 2023-01-11T21:38:06.9630193Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9630319Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9630324Z 2023-01-11T21:38:06.9630408Z aten = torch.ops.aten 2023-01-11T21:38:06.9630575Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9630677Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9630704Z 2023-01-11T21:38:06.9630782Z import triton 2023-01-11T21:38:06.9630871Z import triton.language as tl 2023-01-11T21:38:06.9631000Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9631141Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9631147Z 2023-01-11T21:38:06.9631151Z 2023-01-11T21:38:06.9631351Z triton_fused_convert_element_type_1_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9631431Z import triton 2023-01-11T21:38:06.9631526Z import triton.language as tl 2023-01-11T21:38:06.9631640Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9631742Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9631869Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.9631996Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9632002Z 2023-01-11T21:38:06.9632094Z @reduction(size_hints=[1, 4], 2023-01-11T21:38:06.9632209Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9632302Z filename=__file__, 
2023-01-11T21:38:06.9632662Z meta={'signature': {0: '*fp32', 1: '*fp64', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9632740Z @triton.jit 2023-01-11T21:38:06.9632908Z def triton_(in_ptr0, out_ptr1, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9632977Z xnumel = 1 2023-01-11T21:38:06.9633051Z rnumel = 4 2023-01-11T21:38:06.9633152Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9633289Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9633376Z xmask = xindex < xnumel 2023-01-11T21:38:06.9633497Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9633620Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9633722Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9633816Z rindex = roffset + rbase 2023-01-11T21:38:06.9633909Z rmask = rindex < rnumel 2023-01-11T21:38:06.9633982Z r0 = rindex 2023-01-11T21:38:06.9634086Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.9634207Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9634324Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9634424Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9634513Z rindex = roffset + rbase 2023-01-11T21:38:06.9634600Z rmask = rindex < rnumel 2023-01-11T21:38:06.9634672Z r0 = rindex 2023-01-11T21:38:06.9634743Z tmp2 = 1 2023-01-11T21:38:06.9634826Z tmp3 = tmp2 + tmp1 2023-01-11T21:38:06.9634910Z tmp4 = tmp3.to(tl.int64) 2023-01-11T21:38:06.9634993Z tmp5 = tmp4 + tmp2 2023-01-11T21:38:06.9635091Z tmp6 = tmp5.to(tl.float64) 2023-01-11T21:38:06.9635247Z tl.store(out_ptr1 + (r0 + tl.zeros([XBLOCK, RBLOCK], tl.int32)), tmp6, rmask & xmask) 2023-01-11T21:38:06.9635337Z ''') 2023-01-11T21:38:06.9635343Z 2023-01-11T21:38:06.9635347Z 2023-01-11T21:38:06.9635444Z async_compile.wait(globals()) 2023-01-11T21:38:06.9635523Z del async_compile 2023-01-11T21:38:06.9635528Z 2023-01-11T21:38:06.9635602Z def call(args): 2023-01-11T21:38:06.9635671Z arg0_1, = args 2023-01-11T21:38:06.9635749Z args.clear() 2023-01-11T21:38:06.9635841Z with torch.cuda.device(0): 2023-01-11T21:38:06.9636039Z buf1 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9636133Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9636297Z triton_fused_convert_element_type_1_sum_1_0.run(arg0_1, buf1, 1, 4, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9636374Z del arg0_1 2023-01-11T21:38:06.9636448Z return (buf1, ) 2023-01-11T21:38:06.9636488Z 2023-01-11T21:38:06.9636494Z 2023-01-11T21:38:06.9636570Z if __name__ == "__main__": 2023-01-11T21:38:06.9636713Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9636839Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9637041Z arg0_1 = rand_strided((4, ), (1, ), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9637156Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9637161Z 2023-01-11T21:38:06.9637236Z ok (1.351s) 2023-01-11T21:38:06.9637388Z test_divisibile_by_16_covers_numel_args (__main__.TritonCodeGenTests) ... 
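The module dumped below for this test contains two reduction kernels that together implement a split sum over a 256x256 input: triton_fused_sum_1_0 writes 8 partial sums, one per 8192-element slab (grid(8)), and triton_fused_sum_1_1 folds those 8 partials into a scalar (grid(1)). A minimal eager-mode sketch of the same two-stage strategy, with sizes taken from the kernels below (the helper name split_sum is invented for illustration):

import torch

def split_sum(x: torch.Tensor, num_partials: int = 8) -> torch.Tensor:
    # stage 1: one partial sum per contiguous slab -- what the grid(8) kernel below does
    partials = x.reshape(num_partials, -1).sum(dim=1)
    # stage 2: reduce the partials to a scalar -- what the grid(1) kernel below does
    return partials.sum()

x = torch.randn(256, 256, dtype=torch.float64)
assert torch.allclose(split_sum(x.flatten()), x.sum())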
2023-01-11T21:38:06.9637482Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9637560Z import torch 2023-01-11T21:38:06.9637634Z import random 2023-01-11T21:38:06.9637754Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9637883Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9637888Z 2023-01-11T21:38:06.9637973Z aten = torch.ops.aten 2023-01-11T21:38:06.9638112Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9638204Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9638219Z 2023-01-11T21:38:06.9638289Z import triton 2023-01-11T21:38:06.9638384Z import triton.language as tl 2023-01-11T21:38:06.9638510Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9638653Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9638658Z 2023-01-11T21:38:06.9638663Z 2023-01-11T21:38:06.9638818Z triton_fused_sum_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9638893Z import triton 2023-01-11T21:38:06.9638989Z import triton.language as tl 2023-01-11T21:38:06.9639099Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9639202Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9639333Z from torch._inductor.triton_ops.autotune import reduction 2023-01-11T21:38:06.9639460Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9639468Z 2023-01-11T21:38:06.9639561Z @reduction(size_hints=[8, 8192], 2023-01-11T21:38:06.9639677Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9639770Z filename=__file__, 2023-01-11T21:38:06.9640134Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1, 3), equal_to_1=())]}) 2023-01-11T21:38:06.9640204Z @triton.jit 2023-01-11T21:38:06.9640376Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9640449Z xnumel = 8 2023-01-11T21:38:06.9640526Z rnumel = 8192 2023-01-11T21:38:06.9640624Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9640764Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9640850Z xmask = xindex < xnumel 2023-01-11T21:38:06.9640965Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9641041Z x0 = xindex 2023-01-11T21:38:06.9641162Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9641273Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9641364Z rindex = roffset + rbase 2023-01-11T21:38:06.9641450Z rmask = rindex < rnumel 2023-01-11T21:38:06.9641521Z r1 = rindex 2023-01-11T21:38:06.9641633Z tmp0 = tl.load(in_ptr0 + (r1 + (8192*x0)), rmask & xmask) 2023-01-11T21:38:06.9641756Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9641873Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9641974Z tl.store(out_ptr0 + x0, tmp1, xmask) 2023-01-11T21:38:06.9642061Z ''') 2023-01-11T21:38:06.9642067Z 2023-01-11T21:38:06.9642071Z 2023-01-11T21:38:06.9642230Z triton_fused_sum_1_1 = async_compile.triton(''' 2023-01-11T21:38:06.9642308Z import triton 2023-01-11T21:38:06.9642402Z import triton.language as tl 2023-01-11T21:38:06.9642541Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9642647Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9642809Z from torch._inductor.triton_ops.autotune import reduction 
2023-01-11T21:38:06.9642936Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9642942Z 2023-01-11T21:38:06.9643032Z @reduction(size_hints=[1, 8], 2023-01-11T21:38:06.9643150Z reduction_hint=ReductionHint.INNER, 2023-01-11T21:38:06.9643234Z filename=__file__, 2023-01-11T21:38:06.9643590Z meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32', 3: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=())]}) 2023-01-11T21:38:06.9643665Z @triton.jit 2023-01-11T21:38:06.9643835Z def triton_(in_ptr0, out_ptr0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr): 2023-01-11T21:38:06.9643909Z xnumel = 1 2023-01-11T21:38:06.9643982Z rnumel = 8 2023-01-11T21:38:06.9644081Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9644219Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK, 1]) 2023-01-11T21:38:06.9644307Z xmask = xindex < xnumel 2023-01-11T21:38:06.9644422Z rbase = tl.reshape(tl.arange(0, RBLOCK), [1, RBLOCK]) 2023-01-11T21:38:06.9644539Z _tmp1 = tl.zeros([XBLOCK, RBLOCK], tl.float32) + 0 2023-01-11T21:38:06.9644644Z for roffset in range(0, rnumel, RBLOCK): 2023-01-11T21:38:06.9644734Z rindex = roffset + rbase 2023-01-11T21:38:06.9644820Z rmask = rindex < rnumel 2023-01-11T21:38:06.9644893Z r0 = rindex 2023-01-11T21:38:06.9644998Z tmp0 = tl.load(in_ptr0 + (r0), rmask) 2023-01-11T21:38:06.9645115Z _tmp1 = tl.where(xmask & rmask, _tmp1 + tmp0, _tmp1) 2023-01-11T21:38:06.9645230Z tmp1 = tl.reshape(tl.sum(_tmp1, 1), [XBLOCK, 1]) 2023-01-11T21:38:06.9645363Z tl.store(out_ptr0 + 0 + tl.zeros([XBLOCK, 1], tl.int32), tmp1, None) 2023-01-11T21:38:06.9645449Z ''') 2023-01-11T21:38:06.9645454Z 2023-01-11T21:38:06.9645461Z 2023-01-11T21:38:06.9645557Z async_compile.wait(globals()) 2023-01-11T21:38:06.9645638Z del async_compile 2023-01-11T21:38:06.9645644Z 2023-01-11T21:38:06.9645719Z def call(args): 2023-01-11T21:38:06.9645799Z arg0_1, = args 2023-01-11T21:38:06.9645870Z args.clear() 2023-01-11T21:38:06.9645967Z with torch.cuda.device(0): 2023-01-11T21:38:06.9646165Z buf0 = empty_strided((8, ), (1, ), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9646259Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9646400Z triton_fused_sum_1_0.run(arg0_1, buf0, 8, 8192, grid=grid(8), stream=stream0) 2023-01-11T21:38:06.9646476Z del arg0_1 2023-01-11T21:38:06.9646667Z buf1 = empty_strided((), (), device='cuda', dtype=torch.float32) 2023-01-11T21:38:06.9646800Z triton_fused_sum_1_1.run(buf0, buf1, 1, 8, grid=grid(1), stream=stream0) 2023-01-11T21:38:06.9646878Z return (buf1, ) 2023-01-11T21:38:06.9646884Z 2023-01-11T21:38:06.9646888Z 2023-01-11T21:38:06.9646973Z if __name__ == "__main__": 2023-01-11T21:38:06.9647093Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9647226Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9647437Z arg0_1 = rand_strided((256, 256), (256, 1), device='cuda:0', dtype=torch.float32) 2023-01-11T21:38:06.9647551Z print_performance(lambda: call([arg0_1])) 2023-01-11T21:38:06.9647556Z 2023-01-11T21:38:06.9647628Z ok (0.211s) 2023-01-11T21:38:06.9648152Z test_optimize_compute (__main__.TritonCodeGenTests) ... 
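In the trace that follows, dynamo inlines three small closures (x + 500; torch.where(x < 0, ones(), ones() - 2) * -1048576; x / 30) through a shared int64/float64 round-trip, and because the graph input is torch.ones(...) inductor folds each computation down to compile-time constants: the three emitted kernels take only out_ptr0, with the arithmetic baked in (tmp0 = 1, tmp1 = 500, ...) and the intermediate int64 narrowed to int32. A hedged sketch of the shape of such a test, reconstructed from the bytecode below (names are illustrative, not the actual test body; assumes a CUDA device and a build with torch.compile):

import torch

def ones():
    return torch.ones([4], device="cuda")

def suffix(inp):
    # int64/float64 round-trip; the kernels below narrow the int64 step to int32
    return (inp.to(torch.int64) + 1).to(torch.float64)

payloads = [
    lambda x: x + 500,
    lambda x: torch.where(x < 0, ones(), ones() - 2) * -1048576,
    lambda x: x / 30,
]

for foo in payloads:
    fn = torch.compile(lambda foo=foo: suffix(foo(ones())))  # default arg binds foo per iteration
    print(fn())  # each call compiles a kernel with the constants already folded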
/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:372: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` 2023-01-11T21:38:06.9648234Z warnings.warn( 2023-01-11T21:38:06.9648476Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9648817Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9649060Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9649319Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9649615Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9649947Z [2023-01-11 21:38:02,114] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9650321Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9650429Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9650522Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9650621Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9650719Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9650862Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9651013Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9651112Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9651194Z 14 RETURN_VALUE 2023-01-11T21:38:06.9651255Z 2023-01-11T21:38:06.9651261Z 2023-01-11T21:38:06.9651571Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9651794Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9652154Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9652471Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9652820Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9653160Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9653546Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9653968Z [2023-01-11 21:38:02,115] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9654218Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9654704Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9655030Z [2023-01-11 21:38:02,121] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9655400Z [2023-01-11 21:38:02,121]
torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad0b0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6107> 2023-01-11T21:38:06.9655544Z 6107 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9655646Z 2 LOAD_CONST 1 (500) 2023-01-11T21:38:06.9655755Z 4 BINARY_ADD 2023-01-11T21:38:06.9655838Z 6 RETURN_VALUE 2023-01-11T21:38:06.9655905Z 2023-01-11T21:38:06.9655910Z 2023-01-11T21:38:06.9656124Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9656366Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 500 [TensorVariable()] 2023-01-11T21:38:06.9656641Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9656886Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9657326Z [2023-01-11 21:38:02,122] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad0b0, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6107> 2023-01-11T21:38:06.9657615Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9657993Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9658094Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9658195Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9658295Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9658392Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9658481Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9658581Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9658662Z 12 BINARY_ADD 2023-01-11T21:38:06.9658764Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9658866Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9658971Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9659068Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9659145Z 22 RETURN_VALUE 2023-01-11T21:38:06.9659209Z 2023-01-11T21:38:06.9659214Z 2023-01-11T21:38:06.9659529Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9659748Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9659990Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9660271Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9660697Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9661021Z [2023-01-11 21:38:02,123] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9661260Z [2023-01-11 21:38:02,129] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9661539Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9661770Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9662047Z [2023-01-11 21:38:02,130] torch._dynamo.symbolic_convert: [DEBUG] TRACE
LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9662514Z [2023-01-11 21:38:02,131] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9662847Z [2023-01-11 21:38:02,131] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9663094Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9663471Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9663715Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9663972Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9664200Z [2023-01-11 21:38:02,137] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9664421Z [2023-01-11 21:38:02,137] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9664665Z [2023-01-11 21:38:02,138] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9664918Z [2023-01-11 21:38:02,179] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1155 2023-01-11T21:38:06.9665155Z [2023-01-11 21:38:02,187] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9665417Z [2023-01-11 21:38:02,402] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1155 2023-01-11T21:38:06.9665830Z [2023-01-11 21:38:02,402] torch._inductor.debug: [WARNING] model__1155_inference_1201 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_4/run_2023_01_11_21_38_02_113801/aot_torchinductor/model__1155_inference_1201.9 2023-01-11T21:38:06.9666070Z [2023-01-11 21:38:02,402] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9666449Z [2023-01-11 21:38:02,404] torch._inductor.debug: [WARNING] model___1202 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_4/run_2023_01_11_21_38_02_113801/aot_torchinductor/model___1201.8 2023-01-11T21:38:06.9666684Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9666998Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9667222Z [2023-01-11 21:38:02,406] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9667482Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9667780Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9668119Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9668495Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9668597Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9668696Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9668795Z 4 
LOAD_CONST 1 (4) 2023-01-11T21:38:06.9668892Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9669039Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9669187Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9669318Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9669403Z 14 RETURN_VALUE 2023-01-11T21:38:06.9669505Z 2023-01-11T21:38:06.9669510Z 2023-01-11T21:38:06.9669825Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9670047Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9670401Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9670722Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9671074Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9671413Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9671792Z [2023-01-11 21:38:02,407] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9672212Z [2023-01-11 21:38:02,408] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9672459Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9672838Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9673169Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9673545Z [2023-01-11 21:38:02,413] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad160, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6108> 2023-01-11T21:38:06.9673650Z 6108 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9673751Z 2 LOAD_METHOD 1 (where) 2023-01-11T21:38:06.9673851Z 4 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9673950Z 6 LOAD_CONST 1 (0) 2023-01-11T21:38:06.9674042Z 8 COMPARE_OP 0 (<) 2023-01-11T21:38:06.9674145Z 10 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9674245Z 12 CALL_FUNCTION 0 2023-01-11T21:38:06.9674346Z 14 LOAD_DEREF 0 (ones) 2023-01-11T21:38:06.9674446Z 16 CALL_FUNCTION 0 2023-01-11T21:38:06.9674541Z 18 LOAD_CONST 2 (2) 2023-01-11T21:38:06.9674628Z 20 BINARY_SUBTRACT 2023-01-11T21:38:06.9674716Z 22 CALL_METHOD 3 2023-01-11T21:38:06.9674867Z 24 LOAD_CONST 3 (-1048576) 2023-01-11T21:38:06.9674956Z 26 BINARY_MULTIPLY 2023-01-11T21:38:06.9675037Z 28 RETURN_VALUE 2023-01-11T21:38:06.9675110Z 2023-01-11T21:38:06.9675115Z 2023-01-11T21:38:06.9675341Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9675750Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR where [TorchVariable()] 2023-01-11T21:38:06.9676096Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable()] 2023-01-11T21:38:06.9676458Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 0 [TorchVariable(),
TensorVariable()] 2023-01-11T21:38:06.9676832Z [2023-01-11 21:38:02,414] torch._dynamo.symbolic_convert: [DEBUG] TRACE COMPARE_OP < [TorchVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9677179Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable()] 2023-01-11T21:38:06.9677566Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9677939Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9678046Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9678147Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9678244Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9678336Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9678481Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9678628Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9678726Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9678809Z 14 RETURN_VALUE 2023-01-11T21:38:06.9678877Z 2023-01-11T21:38:06.9678882Z 2023-01-11T21:38:06.9679196Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9679423Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9679785Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9680102Z [2023-01-11 21:38:02,415] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9680448Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9680788Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9681167Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9681588Z [2023-01-11 21:38:02,416] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9681834Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9682208Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9682615Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [TorchVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9683029Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [TorchVariable(), TensorVariable(), TensorVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9683423Z [2023-01-11 21:38:02,421] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9683528Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9683627Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9683724Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9683818Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9683957Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9684113Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9684214Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9684296Z 14 RETURN_VALUE 2023-01-11T21:38:06.9684367Z 2023-01-11T21:38:06.9684372Z 
2023-01-11T21:38:06.9684685Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9684909Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9685266Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9685619Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9685975Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9686314Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9686698Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9687118Z [2023-01-11 21:38:02,422] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9687365Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9687742Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9688144Z [2023-01-11 21:38:02,427] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9688587Z [2023-01-11 21:38:02,428] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_SUBTRACT None [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9688985Z [2023-01-11 21:38:02,428] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 3 [TorchVariable(), TensorVariable(), TensorVariable(), TensorVariable()] 2023-01-11T21:38:06.9689235Z [2023-01-11 21:38:02,429] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST -1048576 [TensorVariable()] 2023-01-11T21:38:06.9689552Z [2023-01-11 21:38:02,429] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_MULTIPLY None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9689818Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9690190Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad160, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6108> 2023-01-11T21:38:06.9690472Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9690845Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9690948Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9691051Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9691154Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9691258Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9691356Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9691454Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9691530Z 12 BINARY_ADD 2023-01-11T21:38:06.9691629Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9691730Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9691833Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9691928Z 20
CALL_METHOD 1 2023-01-11T21:38:06.9692012Z 22 RETURN_VALUE 2023-01-11T21:38:06.9692080Z 2023-01-11T21:38:06.9692086Z 2023-01-11T21:38:06.9692396Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9692616Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9692860Z [2023-01-11 21:38:02,430] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9693141Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9693555Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9693874Z [2023-01-11 21:38:02,431] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9694110Z [2023-01-11 21:38:02,437] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9694388Z [2023-01-11 21:38:02,437] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9694743Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9695015Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9695424Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9695733Z [2023-01-11 21:38:02,438] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9695980Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9696396Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9696674Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9696933Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9697207Z [2023-01-11 21:38:02,444] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9697456Z [2023-01-11 21:38:02,445] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9697724Z [2023-01-11 21:38:02,446] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9697982Z [2023-01-11 21:38:02,516] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1156 2023-01-11T21:38:06.9698213Z [2023-01-11 21:38:02,526] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9698479Z [2023-01-11 21:38:02,748] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1156 2023-01-11T21:38:06.9698892Z [2023-01-11 21:38:02,748] torch._inductor.debug: [WARNING] model__1156_inference_1202 debug trace: 
/var/lib/jenkins/workspace/test/torch_compile_debug_tmp_5/run_2023_01_11_21_38_02_406037/aot_torchinductor/model__1156_inference_1202.11 2023-01-11T21:38:06.9699134Z [2023-01-11 21:38:02,749] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9699510Z [2023-01-11 21:38:02,750] torch._inductor.debug: [WARNING] model___1203 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_5/run_2023_01_11_21_38_02_406037/aot_torchinductor/model___1202.10 2023-01-11T21:38:06.9699743Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9700055Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6113 2023-01-11T21:38:06.9700277Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF suffix [] 2023-01-11T21:38:06.9700536Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF foo [UserFunctionVariable()] 2023-01-11T21:38:06.9700833Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_DEREF ones [UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9701171Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 0 [UserFunctionVariable(), UserFunctionVariable(), UserFunctionVariable()] 2023-01-11T21:38:06.9701534Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9701643Z 6101 0 LOAD_GLOBAL 0 (torch) 2023-01-11T21:38:06.9701745Z 2 LOAD_ATTR 1 (ones) 2023-01-11T21:38:06.9701844Z 4 LOAD_CONST 1 (4) 2023-01-11T21:38:06.9701945Z 6 BUILD_LIST 1 2023-01-11T21:38:06.9702093Z 8 LOAD_CONST 2 ('cuda') 2023-01-11T21:38:06.9702250Z 10 LOAD_CONST 3 (('device',)) 2023-01-11T21:38:06.9702350Z 12 CALL_FUNCTION_KW 2 2023-01-11T21:38:06.9702427Z 14 RETURN_VALUE 2023-01-11T21:38:06.9702493Z 2023-01-11T21:38:06.9702499Z 2023-01-11T21:38:06.9702809Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6101 2023-01-11T21:38:06.9703031Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [] 2023-01-11T21:38:06.9703417Z [2023-01-11 21:38:02,753] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR ones [TorchVariable()] 2023-01-11T21:38:06.9703788Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 4 [TorchVariable()] 2023-01-11T21:38:06.9704138Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 1 [TorchVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9704476Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST cuda [TorchVariable(), ListVariable()] 2023-01-11T21:38:06.9704860Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST ('device',) [TorchVariable(), ListVariable(), ConstantVariable(str)] 2023-01-11T21:38:06.9705281Z [2023-01-11 21:38:02,754] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION_KW 2 [TorchVariable(), ListVariable(), ConstantVariable(str), ConstantVariable(tuple)] 2023-01-11T21:38:06.9705556Z [2023-01-11 21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9705950Z [2023-01-11 21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9706275Z [2023-01-11 
21:38:02,759] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9706657Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] INLINING <code object <lambda> at 0x7fcc3e8ad210, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6109> 2023-01-11T21:38:06.9706763Z 6109 0 LOAD_FAST 0 (x) 2023-01-11T21:38:06.9706863Z 2 LOAD_CONST 1 (30) 2023-01-11T21:38:06.9706957Z 4 BINARY_TRUE_DIVIDE 2023-01-11T21:38:06.9707040Z 6 RETURN_VALUE 2023-01-11T21:38:06.9707108Z 2023-01-11T21:38:06.9707113Z 2023-01-11T21:38:06.9707320Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [] 2023-01-11T21:38:06.9707569Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 30 [TensorVariable()] 2023-01-11T21:38:06.9707861Z [2023-01-11 21:38:02,760] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_TRUE_DIVIDE None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9708112Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9708495Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING <code object <lambda> at 0x7fcc3e8ad210, file "/var/lib/jenkins/workspace/test/inductor/test_torchinductor.py", line 6109> 2023-01-11T21:38:06.9708791Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [UserFunctionVariable(), TensorVariable()] 2023-01-11T21:38:06.9709160Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] INLINING 2023-01-11T21:38:06.9709260Z 6104 0 LOAD_FAST 0 (inp) 2023-01-11T21:38:06.9709362Z 2 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9709464Z 4 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9709555Z 6 LOAD_ATTR 2 (int64) 2023-01-11T21:38:06.9709649Z 8 CALL_METHOD 1 2023-01-11T21:38:06.9709745Z 10 LOAD_CONST 1 (1) 2023-01-11T21:38:06.9709824Z 12 BINARY_ADD 2023-01-11T21:38:06.9709955Z 14 LOAD_METHOD 0 (to) 2023-01-11T21:38:06.9710056Z 16 LOAD_GLOBAL 1 (torch) 2023-01-11T21:38:06.9710181Z 18 LOAD_ATTR 3 (float64) 2023-01-11T21:38:06.9710269Z 20 CALL_METHOD 1 2023-01-11T21:38:06.9710352Z 22 RETURN_VALUE 2023-01-11T21:38:06.9710419Z 2023-01-11T21:38:06.9710424Z 2023-01-11T21:38:06.9710738Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6104 2023-01-11T21:38:06.9710958Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST inp [] 2023-01-11T21:38:06.9711202Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9711481Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9711902Z [2023-01-11 21:38:02,761] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR int64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9712217Z [2023-01-11 21:38:02,762] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9712447Z [2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 1 [TensorVariable()] 2023-01-11T21:38:06.9712721Z [2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE BINARY_ADD None [TensorVariable(), ConstantVariable(int)] 2023-01-11T21:38:06.9712957Z
[2023-01-11 21:38:02,768] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR to [TensorVariable()] 2023-01-11T21:38:06.9713225Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL torch [GetAttrVariable(TensorVariable(), to)] 2023-01-11T21:38:06.9713646Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR float64 [GetAttrVariable(TensorVariable(), to), TorchVariable()] 2023-01-11T21:38:06.9713962Z [2023-01-11 21:38:02,769] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 1 [GetAttrVariable(TensorVariable(), to), ConstantVariable(dtype)] 2023-01-11T21:38:06.9714206Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9714583Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] DONE INLINING 2023-01-11T21:38:06.9714837Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9715096Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9715322Z [2023-01-11 21:38:02,775] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9715537Z [2023-01-11 21:38:02,775] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9715780Z [2023-01-11 21:38:02,776] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9716039Z [2023-01-11 21:38:02,818] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1157 2023-01-11T21:38:06.9716281Z [2023-01-11 21:38:02,825] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9716548Z [2023-01-11 21:38:03,042] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1157 2023-01-11T21:38:06.9716992Z [2023-01-11 21:38:03,042] torch._inductor.debug: [WARNING] model__1157_inference_1203 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_6/run_2023_01_11_21_38_02_752239/aot_torchinductor/model__1157_inference_1203.13 2023-01-11T21:38:06.9717267Z [2023-01-11 21:38:03,042] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx 2023-01-11T21:38:06.9717645Z [2023-01-11 21:38:03,043] torch._inductor.debug: [WARNING] model___1204 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_6/run_2023_01_11_21_38_02_752239/aot_torchinductor/model___1203.12 2023-01-11T21:38:06.9717651Z 2023-01-11T21:38:06.9717752Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9717829Z import torch 2023-01-11T21:38:06.9717901Z import random 2023-01-11T21:38:06.9718024Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9718151Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9718157Z 2023-01-11T21:38:06.9718242Z aten = torch.ops.aten 2023-01-11T21:38:06.9718383Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9718482Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9718488Z 2023-01-11T21:38:06.9718568Z import triton 2023-01-11T21:38:06.9718660Z import triton.language as tl 2023-01-11T21:38:06.9718789Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9718935Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9718941Z 2023-01-11T21:38:06.9718945Z 2023-01-11T21:38:06.9719134Z 
triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9719217Z import triton 2023-01-11T21:38:06.9719312Z import triton.language as tl 2023-01-11T21:38:06.9719430Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9719535Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9719664Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9719794Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9719799Z 2023-01-11T21:38:06.9720192Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9720273Z @triton.jit 2023-01-11T21:38:06.9720399Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9720472Z xnumel = 4 2023-01-11T21:38:06.9720573Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9720708Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9720787Z xmask = xindex < xnumel 2023-01-11T21:38:06.9720860Z x0 = xindex 2023-01-11T21:38:06.9720934Z tmp0 = 1 2023-01-11T21:38:06.9721009Z tmp1 = 500 2023-01-11T21:38:06.9721089Z tmp2 = tmp0 + tmp1 2023-01-11T21:38:06.9721177Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.9721251Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9721343Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9721482Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9721568Z ''') 2023-01-11T21:38:06.9721576Z 2023-01-11T21:38:06.9721581Z 2023-01-11T21:38:06.9721677Z async_compile.wait(globals()) 2023-01-11T21:38:06.9721761Z del async_compile 2023-01-11T21:38:06.9721767Z 2023-01-11T21:38:06.9721842Z def call(args): 2023-01-11T21:38:06.9721935Z with torch.cuda.device(0): 2023-01-11T21:38:06.9722127Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9722224Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9722373Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9722455Z return (buf0, ) 2023-01-11T21:38:06.9722461Z 2023-01-11T21:38:06.9722465Z 2023-01-11T21:38:06.9722546Z if __name__ == "__main__": 2023-01-11T21:38:06.9722665Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9722791Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9722895Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9722900Z 2023-01-11T21:38:06.9722940Z 2023-01-11T21:38:06.9723035Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9723134Z import torch 2023-01-11T21:38:06.9723211Z import random 2023-01-11T21:38:06.9723333Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9723460Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9723465Z 2023-01-11T21:38:06.9723549Z aten = torch.ops.aten 2023-01-11T21:38:06.9723686Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9723783Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9723789Z 2023-01-11T21:38:06.9723858Z import triton 2023-01-11T21:38:06.9723953Z import triton.language as tl 2023-01-11T21:38:06.9724079Z from torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9724219Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9724224Z 2023-01-11T21:38:06.9724228Z 2023-01-11T21:38:06.9724423Z 
triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9724499Z import triton 2023-01-11T21:38:06.9724595Z import triton.language as tl 2023-01-11T21:38:06.9724707Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9724813Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9724948Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9725080Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9725085Z 2023-01-11T21:38:06.9725470Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9725545Z @triton.jit 2023-01-11T21:38:06.9725666Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9725741Z xnumel = 4 2023-01-11T21:38:06.9725835Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9725967Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9726055Z xmask = xindex < xnumel 2023-01-11T21:38:06.9726129Z x0 = xindex 2023-01-11T21:38:06.9726206Z tmp0 = 1 2023-01-11T21:38:06.9726279Z tmp1 = 0 2023-01-11T21:38:06.9726360Z tmp2 = tmp0 < tmp1 2023-01-11T21:38:06.9726425Z tmp3 = 2 2023-01-11T21:38:06.9726536Z tmp4 = tmp0 - tmp3 2023-01-11T21:38:06.9726632Z tmp5 = tl.where(tmp2, tmp0, tmp4) 2023-01-11T21:38:06.9726740Z tmp6 = -1048576 2023-01-11T21:38:06.9726819Z tmp7 = tmp5 * tmp6 2023-01-11T21:38:06.9726909Z tmp8 = tmp7.to(tl.int32) 2023-01-11T21:38:06.9726990Z tmp9 = tmp8 + tmp0 2023-01-11T21:38:06.9727075Z tmp10 = tmp9.to(tl.float64) 2023-01-11T21:38:06.9727216Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp10, xmask) 2023-01-11T21:38:06.9727303Z ''') 2023-01-11T21:38:06.9727309Z 2023-01-11T21:38:06.9727313Z 2023-01-11T21:38:06.9727412Z async_compile.wait(globals()) 2023-01-11T21:38:06.9727493Z del async_compile 2023-01-11T21:38:06.9727498Z 2023-01-11T21:38:06.9727578Z def call(args): 2023-01-11T21:38:06.9727673Z with torch.cuda.device(0): 2023-01-11T21:38:06.9727868Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9727966Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9728117Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9728200Z return (buf0, ) 2023-01-11T21:38:06.9728205Z 2023-01-11T21:38:06.9728210Z 2023-01-11T21:38:06.9728293Z if __name__ == "__main__": 2023-01-11T21:38:06.9728412Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9728539Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9728643Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9728648Z 2023-01-11T21:38:06.9728652Z 2023-01-11T21:38:06.9728754Z from ctypes import c_void_p, c_long 2023-01-11T21:38:06.9728823Z import torch 2023-01-11T21:38:06.9728900Z import random 2023-01-11T21:38:06.9729053Z from torch import empty_strided, as_strided, device 2023-01-11T21:38:06.9729180Z from torch._inductor.codecache import AsyncCompile 2023-01-11T21:38:06.9729205Z 2023-01-11T21:38:06.9729291Z aten = torch.ops.aten 2023-01-11T21:38:06.9729429Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride 2023-01-11T21:38:06.9729527Z async_compile = AsyncCompile() 2023-01-11T21:38:06.9729532Z 2023-01-11T21:38:06.9729601Z import triton 2023-01-11T21:38:06.9729696Z import triton.language as tl 2023-01-11T21:38:06.9729824Z from 
torch._inductor.triton_ops.autotune import grid 2023-01-11T21:38:06.9729967Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream 2023-01-11T21:38:06.9729972Z 2023-01-11T21:38:06.9729976Z 2023-01-11T21:38:06.9730162Z triton_fused_convert_element_type_1_0 = async_compile.triton(''' 2023-01-11T21:38:06.9730244Z import triton 2023-01-11T21:38:06.9730337Z import triton.language as tl 2023-01-11T21:38:06.9730453Z from torch._inductor.ir import ReductionHint 2023-01-11T21:38:06.9730553Z from torch._inductor.ir import TileHint 2023-01-11T21:38:06.9730690Z from torch._inductor.triton_ops.autotune import pointwise 2023-01-11T21:38:06.9730819Z from torch._inductor.utils import instance_descriptor 2023-01-11T21:38:06.9730825Z 2023-01-11T21:38:06.9731211Z @pointwise(size_hints=[4], filename=__file__, meta={'signature': {0: '*fp64', 1: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': set(), 'configs': [instance_descriptor(divisible_by_16=(0,), equal_to_1=())]}) 2023-01-11T21:38:06.9731285Z @triton.jit 2023-01-11T21:38:06.9731409Z def triton_(out_ptr0, xnumel, XBLOCK : tl.constexpr): 2023-01-11T21:38:06.9731485Z xnumel = 4 2023-01-11T21:38:06.9731586Z xoffset = tl.program_id(0) * XBLOCK 2023-01-11T21:38:06.9731711Z xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK]) 2023-01-11T21:38:06.9731795Z xmask = xindex < xnumel 2023-01-11T21:38:06.9731867Z x0 = xindex 2023-01-11T21:38:06.9731938Z tmp0 = 1 2023-01-11T21:38:06.9732011Z tmp1 = 30 2023-01-11T21:38:06.9732095Z tmp2 = tmp0 / tmp1 2023-01-11T21:38:06.9732177Z tmp3 = tmp2.to(tl.int32) 2023-01-11T21:38:06.9732260Z tmp4 = tmp3 + tmp0 2023-01-11T21:38:06.9732349Z tmp5 = tmp4.to(tl.float64) 2023-01-11T21:38:06.9732485Z tl.store(out_ptr0 + (x0 + tl.zeros([XBLOCK], tl.int32)), tmp5, xmask) 2023-01-11T21:38:06.9732571Z ''') 2023-01-11T21:38:06.9732576Z 2023-01-11T21:38:06.9733676Z 2023-01-11T21:38:06.9733785Z async_compile.wait(globals()) 2023-01-11T21:38:06.9733865Z del async_compile 2023-01-11T21:38:06.9733870Z 2023-01-11T21:38:06.9733954Z def call(args): 2023-01-11T21:38:06.9734047Z with torch.cuda.device(0): 2023-01-11T21:38:06.9734256Z buf0 = empty_strided((4, ), (1, ), device='cuda', dtype=torch.float64) 2023-01-11T21:38:06.9734344Z stream0 = get_cuda_stream(0) 2023-01-11T21:38:06.9734624Z triton_fused_convert_element_type_1_0.run(buf0, 4, grid=grid(4), stream=stream0) 2023-01-11T21:38:06.9734708Z return (buf0, ) 2023-01-11T21:38:06.9734713Z 2023-01-11T21:38:06.9734717Z 2023-01-11T21:38:06.9734801Z if __name__ == "__main__": 2023-01-11T21:38:06.9734935Z from torch._dynamo.testing import rand_strided 2023-01-11T21:38:06.9735076Z from torch._inductor.utils import print_performance 2023-01-11T21:38:06.9735187Z print_performance(lambda: call([])) 2023-01-11T21:38:06.9735192Z 2023-01-11T21:38:06.9735263Z ok (0.933s) 2023-01-11T21:38:06.9735700Z test_optimize_indexing_dtype (__main__.TritonCodeGenTests) ... 
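The trace that follows calls the aten op directly rather than going through torch.nn.functional: aten.upsample_bilinear2d.vec(x, None, True, [2.0, 2.0]) on a (2, 4, 16, 16) input. Inductor lowers it to the single fused gather-and-lerp kernel dumped after the trace, where the constant 0.4838709677419355 is the align_corners=True scale (16 - 1) / (32 - 1) = 15/31. The equivalent eager call, reconstructed from the bytecode trace below (assumes a CUDA device, matching the rand_strided repro in the dump):

import torch

x = torch.randn(2, 4, 16, 16, device="cuda")
# positional args per the trace: (input, output_size=None, align_corners=True, scale_factors=[2.0, 2.0])
out = torch.ops.aten.upsample_bilinear2d.vec(x, None, True, [2.0, 2.0])
print(out.shape)  # torch.Size([2, 4, 32, 32])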
[2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo start tracing fn 2023-01-11T21:38:06.9736058Z [2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [DEBUG] TRACE starts_line /var/lib/jenkins/workspace/test/inductor/test_torchinductor.py:6063 2023-01-11T21:38:06.9736285Z [2023-01-11 21:38:03,047] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_GLOBAL aten [] 2023-01-11T21:38:06.9736616Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR upsample_bilinear2d [TorchVariable()] 2023-01-11T21:38:06.9736909Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_ATTR vec [TorchVariable(aten.upsample_bilinear2d)] 2023-01-11T21:38:06.9737359Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_FAST x [TorchVariable(aten.upsample_bilinear2d.vec)] 2023-01-11T21:38:06.9737677Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST None [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable()] 2023-01-11T21:38:06.9738038Z [2023-01-11 21:38:03,053] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST True [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType)] 2023-01-11T21:38:06.9738431Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2.0 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool)] 2023-01-11T21:38:06.9738862Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE LOAD_CONST 2.0 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ConstantVariable(float)] 2023-01-11T21:38:06.9739333Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE BUILD_LIST 2 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ConstantVariable(float), ConstantVariable(float)] 2023-01-11T21:38:06.9739753Z [2023-01-11 21:38:03,054] torch._dynamo.symbolic_convert: [DEBUG] TRACE CALL_FUNCTION 4 [TorchVariable(aten.upsample_bilinear2d.vec), TensorVariable(), ConstantVariable(NoneType), ConstantVariable(bool), ListVariable()] 2023-01-11T21:38:06.9740003Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [DEBUG] TRACE RETURN_VALUE None [TensorVariable()] 2023-01-11T21:38:06.9740261Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [INFO] Step 2: torchdynamo done tracing fn (RETURN_VALUE) 2023-01-11T21:38:06.9740494Z [2023-01-11 21:38:03,137] torch._dynamo.symbolic_convert: [DEBUG] RETURN_VALUE triggered compile 2023-01-11T21:38:06.9740719Z [2023-01-11 21:38:03,137] torch._dynamo.output_graph: [DEBUG] COMPILING GRAPH due to None 2023-01-11T21:38:06.9740959Z [2023-01-11 21:38:03,138] torch._dynamo.output_graph: [INFO] Step 3: calling compiler function compile_fx 2023-01-11T21:38:06.9741275Z [2023-01-11 21:38:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1158 2023-01-11T21:38:06.9741517Z [2023-01-11 21:38:03,412] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9741750Z [2023-01-11 21:38:03,426] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name 2023-01-11T21:38:06.9741964Z [2023-01-11 21:38:03,434] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1') 2023-01-11T21:38:06.9742231Z [2023-01-11 21:38:03,622] torch._inductor.compile_fx: [INFO] Step 1: 
2023-01-11T21:38:06.9741275Z [2023-01-11 21:38:03,319] torch._inductor.compile_fx: [INFO] Step 1: torchinductor compiling FORWARDS graph 1158
2023-01-11T21:38:06.9741517Z [2023-01-11 21:38:03,412] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
2023-01-11T21:38:06.9741750Z [2023-01-11 21:38:03,426] torch._inductor.optimize_indexing: [WARNING] unhandled ValueRange op name
2023-01-11T21:38:06.9741964Z [2023-01-11 21:38:03,434] torch._inductor.scheduler: [DEBUG] remove_buffer('buf1')
2023-01-11T21:38:06.9742231Z [2023-01-11 21:38:03,622] torch._inductor.compile_fx: [INFO] Step 1: torchinductor done compiling FORWARDS graph 1158
2023-01-11T21:38:06.9742647Z [2023-01-11 21:38:03,622] torch._inductor.debug: [WARNING] model__1158_inference_1204 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_7/run_2023_01_11_21_38_03_046733/aot_torchinductor/model__1158_inference_1204.15
2023-01-11T21:38:06.9742897Z [2023-01-11 21:38:03,623] torch._dynamo.output_graph: [INFO] Step 3: done compiler function compile_fx
2023-01-11T21:38:06.9743272Z [2023-01-11 21:38:03,779] torch._inductor.debug: [WARNING] model___1205 debug trace: /var/lib/jenkins/workspace/test/torch_compile_debug_tmp_7/run_2023_01_11_21_38_03_046733/aot_torchinductor/model___1204.14
2023-01-11T21:38:06.9743278Z 
2023-01-11T21:38:06.9743380Z from ctypes import c_void_p, c_long
2023-01-11T21:38:06.9743449Z import torch
2023-01-11T21:38:06.9743528Z import random
2023-01-11T21:38:06.9743652Z from torch import empty_strided, as_strided, device
2023-01-11T21:38:06.9743780Z from torch._inductor.codecache import AsyncCompile
2023-01-11T21:38:06.9743785Z 
2023-01-11T21:38:06.9743868Z aten = torch.ops.aten
2023-01-11T21:38:06.9744009Z assert_size_stride = torch._C._dynamo.guards.assert_size_stride
2023-01-11T21:38:06.9744138Z async_compile = AsyncCompile()
2023-01-11T21:38:06.9744143Z 
2023-01-11T21:38:06.9744223Z import triton
2023-01-11T21:38:06.9744312Z import triton.language as tl
2023-01-11T21:38:06.9744443Z from torch._inductor.triton_ops.autotune import grid
2023-01-11T21:38:06.9744584Z from torch._C import _cuda_getCurrentRawStream as get_cuda_stream
2023-01-11T21:38:06.9744590Z 
2023-01-11T21:38:06.9744594Z 
2023-01-11T21:38:06.9744896Z triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0 = async_compile.triton('''
2023-01-11T21:38:06.9744975Z import triton
2023-01-11T21:38:06.9745071Z import triton.language as tl
2023-01-11T21:38:06.9745187Z from torch._inductor.ir import ReductionHint
2023-01-11T21:38:06.9745296Z from torch._inductor.ir import TileHint
2023-01-11T21:38:06.9745425Z from torch._inductor.triton_ops.autotune import pointwise
2023-01-11T21:38:06.9745552Z from torch._inductor.utils import instance_descriptor
2023-01-11T21:38:06.9745560Z 
2023-01-11T21:38:06.9745985Z @pointwise(size_hints=[8192], filename=__file__, meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'constants': {}, 'mutated_arg_names': {'in_out_ptr0'}, 'configs': [instance_descriptor(divisible_by_16=(0, 1, 2), equal_to_1=())]})
2023-01-11T21:38:06.9746063Z @triton.jit
2023-01-11T21:38:06.9746202Z def triton_(in_out_ptr0, in_ptr0, xnumel, XBLOCK : tl.constexpr):
2023-01-11T21:38:06.9746281Z     xnumel = 8192
2023-01-11T21:38:06.9746381Z     xoffset = tl.program_id(0) * XBLOCK
2023-01-11T21:38:06.9746515Z     xindex = xoffset + tl.reshape(tl.arange(0, XBLOCK), [XBLOCK])
2023-01-11T21:38:06.9746594Z     xmask = xindex < xnumel
2023-01-11T21:38:06.9746680Z     x1 = (xindex // 32) % 32
2023-01-11T21:38:06.9746758Z     x0 = xindex % 32
2023-01-11T21:38:06.9746837Z     x2 = (xindex // 1024)
2023-01-11T21:38:06.9746908Z     x4 = xindex
2023-01-11T21:38:06.9746983Z     tmp0 = x1
2023-01-11T21:38:06.9747062Z     tmp1 = 0.4838709677419355
2023-01-11T21:38:06.9747143Z     tmp2 = tmp0 * tmp1
2023-01-11T21:38:06.9747244Z     tmp3 = tl.libdevice.floor(tmp2)
2023-01-11T21:38:06.9747333Z     tmp4 = tmp3.to(tl.int32)
2023-01-11T21:38:06.9747408Z     tmp5 = x0
2023-01-11T21:38:06.9747487Z     tmp6 = tmp5 * tmp1
2023-01-11T21:38:06.9747619Z     tmp7 = tl.libdevice.floor(tmp6)
2023-01-11T21:38:06.9747701Z     tmp8 = tmp7.to(tl.int32)
2023-01-11T21:38:06.9747930Z     tmp9 = tl.load(in_ptr0 + (tmp8 + (16*tmp4) + (256*x2)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.9748008Z     tmp10 = 1.0
2023-01-11T21:38:06.9748099Z     tmp11 = tmp4.to(tl.float32)
2023-01-11T21:38:06.9748214Z     tmp12 = tmp2 - tmp11
2023-01-11T21:38:06.9748330Z     tmp13 = tmp10 - tmp12
2023-01-11T21:38:06.9748415Z     tmp14 = tmp9 * tmp13
2023-01-11T21:38:06.9748507Z     tmp15 = tl.libdevice.ceil(tmp2)
2023-01-11T21:38:06.9748583Z     tmp16 = 15.0
2023-01-11T21:38:06.9748730Z     tmp17 = tl.where(tmp15 != tmp15, tmp15, tl.where(tmp15 < tmp16, tmp15, tmp16))
2023-01-11T21:38:06.9748824Z     tmp18 = tmp17.to(tl.int32)
2023-01-11T21:38:06.9749051Z     tmp19 = tl.load(in_ptr0 + (tmp8 + (16*tmp18) + (256*x2)), xmask, eviction_policy='evict_last')
2023-01-11T21:38:06.9749134Z     tmp20 = tmp19 * tmp12
2023-01-11T21:38:06.9749221Z     tmp21 = tmp14 + tmp20
2023-01-11T21:38:06.9749305Z     tmp22 = tmp8.to(tl.float32)
2023-01-11T21:38:06.9749417Z     tmp23 = tmp6 - tmp22
2023-01-11T21:38:06.9749534Z     tmp24 = tmp10 - tmp23
2023-01-11T21:38:06.9749615Z     tmp25 = tmp21 * tmp24
2023-01-11T21:38:06.9749713Z     tmp26 = tl.libdevice.ceil(tmp6)
2023-01-11T21:38:06.9749860Z     tmp27 = tl.where(tmp26 != tmp26, tmp26, tl.where(tmp26 < tmp16, tmp26, tmp16))
2023-01-11T21:38:06.9749949Z     tmp28 = tmp27.to(tl.int32)
2023-01-11T21:38:06.9750068Z     tmp29 = tl.load(in_ptr0 + (tmp28 + (16*tmp4) + (256*x2)), xmask)
2023-01-11T21:38:06.9750151Z     tmp30 = tmp29 * tmp13
2023-01-11T21:38:06.9750277Z     tmp31 = tl.load(in_ptr0 + (tmp28 + (16*tmp18) + (256*x2)), xmask)
2023-01-11T21:38:06.9750393Z     tmp32 = tmp31 * tmp12
2023-01-11T21:38:06.9750476Z     tmp33 = tmp30 + tmp32
2023-01-11T21:38:06.9750559Z     tmp34 = tmp33 * tmp23
2023-01-11T21:38:06.9750641Z     tmp35 = tmp25 + tmp34
2023-01-11T21:38:06.9750778Z     tl.store(in_out_ptr0 + (x4 + tl.zeros([XBLOCK], tl.int32)), tmp35, xmask)
2023-01-11T21:38:06.9750867Z ''')
2023-01-11T21:38:06.9750873Z 
2023-01-11T21:38:06.9750877Z 
2023-01-11T21:38:06.9750975Z async_compile.wait(globals())
2023-01-11T21:38:06.9751054Z del async_compile
2023-01-11T21:38:06.9751060Z 
2023-01-11T21:38:06.9751135Z def call(args):
2023-01-11T21:38:06.9751211Z     arg0_1, = args
2023-01-11T21:38:06.9751290Z     args.clear()
2023-01-11T21:38:06.9751385Z     with torch.cuda.device(0):
2023-01-11T21:38:06.9751606Z         buf0 = empty_strided((2, 4, 32, 32), (4096, 1024, 32, 1), device='cuda', dtype=torch.float32)
2023-01-11T21:38:06.9751699Z         buf2 = buf0; del buf0  # reuse
2023-01-11T21:38:06.9751794Z         stream0 = get_cuda_stream(0)
2023-01-11T21:38:06.9752026Z         triton_fused_add_add_1_add_2_arange_arange_1_convert_element_type_convert_element_type_2_floor_floor_1_index_0.run(buf2, arg0_1, 8192, grid=grid(8192), stream=stream0)
2023-01-11T21:38:06.9752102Z         del arg0_1
2023-01-11T21:38:06.9752188Z         return (buf2, )
2023-01-11T21:38:06.9752193Z 
2023-01-11T21:38:06.9752197Z 
2023-01-11T21:38:06.9752281Z if __name__ == "__main__":
2023-01-11T21:38:06.9752403Z     from torch._dynamo.testing import rand_strided
2023-01-11T21:38:06.9752527Z     from torch._inductor.utils import print_performance
2023-01-11T21:38:06.9752756Z     arg0_1 = rand_strided((2, 4, 16, 16), (1024, 256, 16, 1), device='cuda:0', dtype=torch.float32)
2023-01-11T21:38:06.9752872Z     print_performance(lambda: call([arg0_1]))
2023-01-11T21:38:06.9752877Z 
2023-01-11T21:38:06.9752954Z ok (0.736s)
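The kernel above is Inductor's fused implementation of that upsample call: one pass over the 2x4x32x32 output (xnumel = 8192) that, for each output pixel, gathers the four neighboring input pixels (floor and ceil of the scaled row/column index, clamped to 15.0, the last valid index of the 16x16 input) and blends them with the fractional weights tmp12/tmp13 and tmp23/tmp24. The constant 0.4838709677419355 is the align_corners=True scale (in - 1)/(out - 1) = 15/31. A hedged eager-mode equivalent of what call() computes:

    # Hedged sketch: eager equivalent of the fused bilinear-upsample kernel.
    import torch
    import torch.nn.functional as F

    x = torch.randn(2, 4, 16, 16, device="cuda")
    # align_corners=True: output index i maps to i * 15/31 ~= i * 0.48387
    y = F.interpolate(x, scale_factor=2.0, mode="bilinear", align_corners=True)
    print(y.shape)  # torch.Size([2, 4, 32, 32]), matching buf0 in call()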
2023-01-11T21:38:06.9752959Z 
2023-01-11T21:38:06.9753157Z ----------------------------------------------------------------------
2023-01-11T21:38:06.9753249Z Ran 731 tests in 867.685s
2023-01-11T21:38:06.9753253Z 
2023-01-11T21:38:06.9753333Z OK (skipped=33)
2023-01-11T21:38:06.9753339Z 
2023-01-11T21:38:06.9753431Z Generating XML reports...
2023-01-11T21:38:06.9753759Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CPUReproTests-20230111212336.xml
2023-01-11T21:38:06.9754050Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CpuTests-20230111212336.xml
2023-01-11T21:38:06.9754352Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaReproTests-20230111212336.xml
2023-01-11T21:38:06.9754637Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaTests-20230111212336.xml
2023-01-11T21:38:06.9754944Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-ExprPrinterTests-20230111212336.xml
2023-01-11T21:38:06.9755252Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCpuTest-20230111212336.xml
2023-01-11T21:38:06.9755620Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCudaTest-20230111212336.xml
2023-01-11T21:38:06.9755964Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-TestIndexingSimplification-20230111212336.xml
2023-01-11T21:38:06.9756276Z Generated XML report: test-reports/python-unittest/inductor.test_torchinductor/TEST-TritonCodeGenTests-20230111212336.xml
2023-01-11T21:38:06.9756282Z 
2023-01-11T21:38:06.9756619Z ##[endgroup]
2023-01-11T21:38:06.9756945Z FINISHED PRINTING LOG FILE of inductor/test_torchinductor (/var/lib/jenkins/workspace/test/test-reports/inductor-test_torchinductor_y59pervs)
2023-01-11T21:38:06.9756951Z 
2023-01-11T21:38:06.9757144Z Running test_fake_tensor ... [2023-01-11 21:38:06.158498]
2023-01-11T21:38:06.9757479Z Executing ['/opt/conda/bin/python', '-bb', 'test_fake_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:38:06.158901]
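test_fake_tensor.py, whose log follows, exercises fake tensors: tensors that carry shape, dtype, device, and stride metadata but own no storage, which is what lets Dynamo and Inductor reason about CUDA programs without launching kernels. The TypedStorage deprecation warnings sprinkled through the group come from the tests calling tensor.storage(); the replacement the warning itself recommends is tensor.untyped_storage(). A minimal hedged sketch of the mode these tests drive, using the import path of this era and assuming a CUDA build:

    # Hedged sketch: fake tensors propagate metadata without allocating memory.
    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode

    with FakeTensorMode():
        x = torch.empty(8, 8, device="cuda")  # no CUDA allocation happens
        y = x @ x                             # only shape/dtype/device inference
        print(type(y), y.shape, y.device)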
2023-01-11T21:38:16.3220740Z 
2023-01-11T21:38:16.3221454Z Expand the folded group to see the log file of test_fake_tensor
2023-01-11T21:38:16.3222528Z ##[group]PRINTING LOG FILE of test_fake_tensor (/var/lib/jenkins/workspace/test/test-reports/test_fake_tensor_jebacq4z)
2023-01-11T21:38:16.3222935Z 
2023-01-11T21:38:16.3223080Z Running tests...
2023-01-11T21:38:16.3223656Z ----------------------------------------------------------------------
2023-01-11T21:38:16.3224185Z Test results will be stored in test-reports/python-unittest/test_fake_tensor
2023-01-11T21:38:16.3224645Z test_aliased_const_write (__main__.FakeTensorConstHandling) ... ok (1.098s)
2023-01-11T21:38:16.3225219Z test_constant_invalidation (__main__.FakeTensorConstHandling) ... ok (0.004s)
2023-01-11T21:38:16.3225852Z test_fake_tensor_batch_norm_cpu (__main__.FakeTensorConstHandling) ... ok (0.083s)
2023-01-11T21:38:16.3226406Z test_fake_tensor_in_intlist_repro (__main__.FakeTensorConstHandling) ... ok (0.008s)
2023-01-11T21:38:16.3226924Z test_inplace_add (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3227428Z test_inplace_view_invalidation (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3227854Z test_shared_storage_invalidation (__main__.FakeTensorConstHandling) ... ok (0.004s)
2023-01-11T21:38:16.3228600Z test_shared_storages (__main__.FakeTensorConstHandling) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:513: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3229211Z   self.assertEqual(x.storage()._cdata, y.storage()._cdata)
2023-01-11T21:38:16.3229782Z /var/lib/jenkins/workspace/test/test_fake_tensor.py:514: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3230370Z   self.assertEqual(x.constant.storage()._cdata, y.constant.storage()._cdata)
2023-01-11T21:38:16.3230844Z ok (0.001s)
2023-01-11T21:38:16.3231090Z test_simple (__main__.FakeTensorConstHandling) ... ok (0.001s)
2023-01-11T21:38:16.3231406Z test_dead_key (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3231728Z test_dead_weak_ref (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232070Z test_memoized_conversion_from_meta (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232418Z test_memoized_conversion_to_meta (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3232755Z test_no_active_mode (__main__.FakeTensorConverterTest) ... ok (0.007s)
2023-01-11T21:38:16.3233080Z test_no_ref_cycle (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3233404Z test_separate_mode_error (__main__.FakeTensorConverterTest) ... ok (0.007s)
2023-01-11T21:38:16.3234068Z test_separate_tensor_storages_non_view (__main__.FakeTensorConverterTest) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:602: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3234639Z   y.set_(x.storage())
2023-01-11T21:38:16.3234832Z ok (0.001s)
2023-01-11T21:38:16.3235102Z test_separate_tensor_storages_view (__main__.FakeTensorConverterTest) ... ok (0.001s)
2023-01-11T21:38:16.3235445Z test_like_ops (__main__.FakeTensorOperatorInvariants) ... ok (0.012s)
2023-01-11T21:38:16.3235793Z test_non_kwarg_only_device (__main__.FakeTensorOperatorInvariants) ... ok (0.057s)
2023-01-11T21:38:16.3236226Z test_sparse_new (__main__.FakeTensorOperatorInvariants) ... expected failure (0.002s)
2023-01-11T21:38:16.3236607Z test_tensor_constructors_all_have_kwarg_device (__main__.FakeTensorOperatorInvariants) ... ok (0.109s)
2023-01-11T21:38:16.3237456Z test_fake_tensor_prop_on_nn_module (__main__.FakeTensorPropTest) ... /opt/conda/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py:564: UserWarning: Was not able to add assertion to guarantee correct input value to specialized function. It is up to the user to make sure that your inputs match the inputs you specialized the function with.
2023-01-11T21:38:16.3238002Z   warnings.warn(
2023-01-11T21:38:16.3238198Z ok (0.023s)
2023-01-11T21:38:16.3238424Z test_basic (__main__.FakeTensorTest) ... ok (0.002s)
2023-01-11T21:38:16.3238721Z test_binary_op_type_promotion (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3239024Z test_constructor (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3239308Z test_cpu_fallback (__main__.FakeTensorTest) ... ok (0.145s)
2023-01-11T21:38:16.3239591Z test_cuda_lstm (__main__.FakeTensorTest) ... ok (0.115s)
2023-01-11T21:38:16.3239886Z test_cudnn_rnn_with_fallback (__main__.FakeTensorTest) ... ok (3.489s)
2023-01-11T21:38:16.3240192Z test_cudnn_rnn_without_fallback (__main__.FakeTensorTest) ... ok (2.612s)
2023-01-11T21:38:16.3240511Z test_data_dependent_operator (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3241134Z test_deepcopy (__main__.FakeTensorTest) ... /var/lib/jenkins/workspace/test/test_fake_tensor.py:466: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:38:16.3241753Z   self.assertEqual(mod_copied.b.storage()._cdata, mod_copied.a.storage()._cdata)
2023-01-11T21:38:16.3242016Z ok (0.005s)
2023-01-11T21:38:16.3242259Z test_fake_dispatch_keys (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3242559Z test_fake_grad_copy (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3242849Z test_fake_mode_error (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3243181Z test_fallback_memory_prop (__main__.FakeTensorTest) ... ok (0.030s)
2023-01-11T21:38:16.3243484Z test_from_numpy (__main__.FakeTensorTest) ... ok (0.010s)
2023-01-11T21:38:16.3243778Z test_index_cuda_with_cpu (__main__.FakeTensorTest) ... ok (0.011s)
2023-01-11T21:38:16.3244069Z test_like_constructor (__main__.FakeTensorTest) ... ok (0.010s)
2023-01-11T21:38:16.3244351Z test_mode (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3244629Z test_nan_to_num (__main__.FakeTensorTest) ... ok (0.013s)
2023-01-11T21:38:16.3244896Z test_new (__main__.FakeTensorTest) ... ok (0.017s)
2023-01-11T21:38:16.3245168Z test_non_kwarg_device (__main__.FakeTensorTest) ... ok (0.021s)
2023-01-11T21:38:16.3245483Z test_non_overlapping_stride_zero (__main__.FakeTensorTest) ... ok (0.013s)
2023-01-11T21:38:16.3245793Z test_non_parameter_grad (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3246082Z test_normalize_device (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3246389Z test_parameter_instantiation (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3246698Z test_print_in_fake_mode (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3246982Z test_randperm (__main__.FakeTensorTest) ... ok (0.007s)
2023-01-11T21:38:16.3247280Z test_recursive_invocation (__main__.FakeTensorTest) ... ok (0.001s)
2023-01-11T21:38:16.3247580Z test_scalar_inputs (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3247865Z test_setitem (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3248172Z test_shape_take_not_device (__main__.FakeTensorTest) ... ok (0.011s)
2023-01-11T21:38:16.3248462Z test_throw (__main__.FakeTensorTest) ... ok (0.008s)
2023-01-11T21:38:16.3248774Z test_type_as (__main__.FakeTensorTest) ... ok (0.024s)
2023-01-11T21:38:16.3249068Z test_upsample_bilinear_small_channels (__main__.FakeTensorTest) ... ok (0.081s)
2023-01-11T21:38:16.3249370Z test_zero_dim (__main__.FakeTensorTest) ... ok (0.004s)
2023-01-11T21:38:16.3249529Z 
2023-01-11T21:38:16.3249742Z ----------------------------------------------------------------------
2023-01-11T21:38:16.3249996Z Ran 57 tests in 8.109s
2023-01-11T21:38:16.3250119Z 
2023-01-11T21:38:16.3250212Z OK (expected failures=1)
2023-01-11T21:38:16.3250341Z 
2023-01-11T21:38:16.3250436Z Generating XML reports...
2023-01-11T21:38:16.3250893Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConstHandling-20230111213807.xml
2023-01-11T21:38:16.3251447Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConverterTest-20230111213807.xml
2023-01-11T21:38:16.3252028Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorOperatorInvariants-20230111213807.xml
2023-01-11T21:38:16.3252589Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorPropTest-20230111213807.xml
2023-01-11T21:38:16.3253108Z Generated XML report: test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorTest-20230111213807.xml
2023-01-11T21:38:16.3253334Z 
2023-01-11T21:38:16.3253582Z ##[endgroup]
2023-01-11T21:38:16.3253973Z FINISHED PRINTING LOG FILE of test_fake_tensor (/var/lib/jenkins/workspace/test/test-reports/test_fake_tensor_jebacq4z)
2023-01-11T21:38:16.3254199Z 
2023-01-11T21:38:16.3254372Z Running test_sparse_csr ... [2023-01-11 21:38:16.321931]
2023-01-11T21:38:16.3255056Z Executing ['/opt/conda/bin/python', '-bb', 'test_sparse_csr.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:38:16.322120]
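test_sparse_csr.py, whose log follows, sweeps sparse compressed-layout operations (add, addmm, addmv, their blocked variants, and autograd through unary ops) across dtypes and index types on CUDA; the UserWarning at the top of the group notes that CSR support was still in beta at this point. A minimal hedged sketch of the CSR layout and the sparse-times-dense addmm pattern the bulk of these tests exercise:

    # Hedged sketch: a 2x3 CSR matrix and the addmm the tests sweep.
    import torch

    crow_indices = torch.tensor([0, 2, 3])  # row i's values: [crow[i], crow[i+1])
    col_indices = torch.tensor([0, 2, 1])   # column of each stored value
    values = torch.tensor([1.0, 2.0, 3.0])
    a = torch.sparse_csr_tensor(crow_indices, col_indices, values,
                                size=(2, 3), device="cuda")

    b = torch.randn(3, 4, device="cuda")
    c = torch.randn(2, 4, device="cuda")
    out = torch.addmm(c, a, b)              # c + a @ b with a in CSR layout
    print(out.shape)                        # torch.Size([2, 4])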
2023-01-11T21:45:10.4308675Z 
2023-01-11T21:45:10.4309303Z Expand the folded group to see the log file of test_sparse_csr
2023-01-11T21:45:10.4310042Z ##[group]PRINTING LOG FILE of test_sparse_csr (/var/lib/jenkins/workspace/test/test-reports/test_sparse_csr_txl3rn3o)
2023-01-11T21:45:10.4312454Z 
2023-01-11T21:45:10.4314394Z Running tests...
2023-01-11T21:45:10.4315020Z ----------------------------------------------------------------------
2023-01-11T21:45:10.4315826Z Test results will be stored in test-reports/python-unittest/test_sparse_csr
2023-01-11T21:45:10.4317050Z test_add_cuda_float32 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:2429: UserWarning: Sparse CSR tensor support is in beta state. If you miss a functionality in the sparse tensor support, please submit a feature request to https://github.com/pytorch/pytorch/issues. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/SparseCsrTensorImpl.cpp:56.)
2023-01-11T21:45:10.4318028Z   return torch.sparse_compressed_tensor(compressed_indices, plain_indices,
2023-01-11T21:45:10.4318424Z ok (0.126s)
2023-01-11T21:45:10.4318738Z test_add_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.104s)
2023-01-11T21:45:10.4327220Z test_addmm_all_sparse_csr_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.219s)
2023-01-11T21:45:10.4328127Z test_addmm_all_sparse_csr_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.052s)
2023-01-11T21:45:10.4329518Z test_addmm_all_sparse_csr_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.047s)
2023-01-11T21:45:10.4330057Z test_addmm_all_sparse_csr_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.047s)
2023-01-11T21:45:10.4330522Z test_addmm_all_sparse_csr_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ...
ok (0.046s) 2023-01-11T21:45:10.4330987Z test_addmm_all_sparse_csr_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.046s) 2023-01-11T21:45:10.4331492Z test_addmm_all_sparse_csr_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.040s) 2023-01-11T21:45:10.4332002Z test_addmm_all_sparse_csr_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.041s) 2023-01-11T21:45:10.4332690Z test_addmm_all_sparse_csr_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.040s) 2023-01-11T21:45:10.4333162Z test_addmm_all_sparse_csr_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.041s) 2023-01-11T21:45:10.4333565Z test_addmm_all_sparse_csr_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.039s) 2023-01-11T21:45:10.4333913Z test_addmm_all_sparse_csr_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.039s) 2023-01-11T21:45:10.4334379Z test_addmm_dense_result_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4335177Z test_addmm_dense_result_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4335681Z test_addmm_dense_result_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4336098Z test_addmm_dense_result_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4336623Z test_addmm_dense_result_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337128Z test_addmm_dense_result_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337524Z test_addmm_dense_result_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4337977Z test_addmm_dense_result_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T21:45:10.4338437Z test_addmm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.032s) 2023-01-11T21:45:10.4338971Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4339440Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4339816Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4340178Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4340623Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341035Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341462Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4341924Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4342364Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4342828Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4343296Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4343745Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4344198Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4344649Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4345116Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4345594Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4346055Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4346605Z test_addmm_sizes_all_sparse_csr_k_0_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347055Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347519Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4347989Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4348458Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4348945Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4349399Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4349853Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4350328Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4350793Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351160Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351511Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4351864Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4352249Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4352753Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4353253Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4353785Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4354327Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4354797Z test_addmm_sizes_all_sparse_csr_k_0_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4355265Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4355738Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4356401Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4356878Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4357347Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4357802Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4358261Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4358738Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4359214Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4359680Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4360128Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4361346Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4361882Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4362376Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4362868Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4363355Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4363819Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4364284Z test_addmm_sizes_all_sparse_csr_k_0_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4378636Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4379131Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4379512Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4379901Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4380304Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4380773Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4381320Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4381757Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4382179Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4382719Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4383261Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4383746Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4384109Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4384572Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4385099Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4385620Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4386108Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4386627Z test_addmm_sizes_all_sparse_csr_k_1_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4387130Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4387639Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4388132Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4388660Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4389149Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4389630Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4390206Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4390716Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4391218Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4391763Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4392187Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4392730Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4393304Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4393871Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4394337Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4394797Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4395253Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4395700Z test_addmm_sizes_all_sparse_csr_k_1_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4396159Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4396625Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4397092Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4397549Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398007Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398529Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4398988Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4399440Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4399912Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4400365Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4400814Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4401262Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4401727Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4402177Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4402635Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403088Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403549Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4403999Z test_addmm_sizes_all_sparse_csr_k_1_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4404482Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4404933Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4405397Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4405851Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4406284Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4406738Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.014s) 2023-01-11T21:45:10.4407188Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4407649Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4408108Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4408567Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409015Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409468Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4409924Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4410388Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4410847Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4411303Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4411750Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4412245Z test_addmm_sizes_all_sparse_csr_k_8_n_0_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4412703Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4413171Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4413678Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4414140Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4414884Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4415341Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4415788Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4416259Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4416731Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4417183Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4417646Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4418089Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4418654Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4419197Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.015s) 2023-01-11T21:45:10.4419682Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4420139Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4420591Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421022Z test_addmm_sizes_all_sparse_csr_k_8_n_10_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421478Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4421937Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4422405Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4422844Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4423303Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4423761Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_0_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s) 2023-01-11T21:45:10.4424198Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4424673Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4425143Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4425592Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4426042Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4426487Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427002Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427461Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4427915Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4428370Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4428828Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4429268Z test_addmm_sizes_all_sparse_csr_k_8_n_1_m_25_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s) 2023-01-11T21:45:10.4429712Z test_autograd_dense_output_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.711s) 2023-01-11T21:45:10.4430162Z test_autograd_dense_output_addmv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.043s) 2023-01-11T21:45:10.4430609Z test_autograd_dense_output_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4431046Z test_autograd_dense_output_mv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4431490Z test_autograd_sparse_csr_unary_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.202s) 2023-01-11T21:45:10.4431952Z test_autograd_sparse_csr_unary_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4432517Z test_autograd_sparse_csr_unary_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op angle not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4433084Z test_autograd_sparse_csr_unary_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op angle not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4433567Z test_autograd_sparse_csr_unary_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asin not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4434057Z test_autograd_sparse_csr_unary_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asin not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4434542Z test_autograd_sparse_csr_unary_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asinh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435021Z test_autograd_sparse_csr_unary_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op asinh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435495Z test_autograd_sparse_csr_unary_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4435985Z test_autograd_sparse_csr_unary_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4436465Z test_autograd_sparse_csr_unary_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atanh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4436954Z test_autograd_sparse_csr_unary_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op atanh not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4437430Z test_autograd_sparse_csr_unary_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op ceil not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4437853Z test_autograd_sparse_csr_unary_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4438234Z test_autograd_sparse_csr_unary_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4438607Z test_autograd_sparse_csr_unary_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4439054Z test_autograd_sparse_csr_unary_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op erf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4439537Z test_autograd_sparse_csr_unary_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op erfinv not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440023Z test_autograd_sparse_csr_unary_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op expm1 not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440494Z test_autograd_sparse_csr_unary_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op floor not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4440909Z test_autograd_sparse_csr_unary_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4441326Z test_autograd_sparse_csr_unary_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4441807Z test_autograd_sparse_csr_unary_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4442290Z test_autograd_sparse_csr_unary_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isnan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4442767Z test_autograd_sparse_csr_unary_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isnan not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4443252Z test_autograd_sparse_csr_unary_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isneginf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4443780Z test_autograd_sparse_csr_unary_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op isposinf not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4444796Z test_autograd_sparse_csr_unary_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: log1p_backward: received self with sparse layout, but backward requires materialization of a dense tensor with this shape (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/FunctionsManual.cpp:4679.) 2023-01-11T21:45:10.4445476Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T21:45:10.4445744Z ok (0.003s) 2023-01-11T21:45:10.4446016Z test_autograd_sparse_csr_unary_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4446381Z test_autograd_sparse_csr_unary_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.156s) 2023-01-11T21:45:10.4446739Z test_autograd_sparse_csr_unary_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4447117Z test_autograd_sparse_csr_unary_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4447488Z test_autograd_sparse_csr_unary_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4447857Z test_autograd_sparse_csr_unary_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4448222Z test_autograd_sparse_csr_unary_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4448647Z test_autograd_sparse_csr_unary_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op round not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4449130Z test_autograd_sparse_csr_unary_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sgn not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4449642Z test_autograd_sparse_csr_unary_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sgn not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4450126Z test_autograd_sparse_csr_unary_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sign not supported with CSR input and autograd (0.001s) 2023-01-11T21:45:10.4450614Z test_autograd_sparse_csr_unary_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
skip: Skipped! Unary op signbit not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4451095Z test_autograd_sparse_csr_unary_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sin not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4451570Z test_autograd_sparse_csr_unary_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sin not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4452057Z test_autograd_sparse_csr_unary_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sinh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4452544Z test_autograd_sparse_csr_unary_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sinh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453025Z test_autograd_sparse_csr_unary_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sqrt not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453498Z test_autograd_sparse_csr_unary_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op sqrt not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4453973Z test_autograd_sparse_csr_unary_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tan not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4454675Z test_autograd_sparse_csr_unary_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tan not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4455198Z test_autograd_sparse_csr_unary_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tanh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4455669Z test_autograd_sparse_csr_unary_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op tanh not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4456142Z test_autograd_sparse_csr_unary_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Unary op trunc not supported with CSR input and autograd (0.001s)
2023-01-11T21:45:10.4456556Z test_baddbmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4456914Z test_baddbmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457264Z test_baddbmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457614Z test_baddbmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: ROCm not available (0.002s)
2023-01-11T21:45:10.4457986Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4458378Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4458832Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4459219Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4459595Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4459986Z test_block_addmm_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4460416Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4460809Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4461195Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4461582Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4461952Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4462326Z test_block_addmm_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4462711Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4463101Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4463534Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4463914Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4464287Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4464656Z test_block_addmm_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.062s)
2023-01-11T21:45:10.4465026Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4465453Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4465840Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4466220Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4466597Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4466969Z test_block_addmm_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4467351Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4467729Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.064s)
2023-01-11T21:45:10.4468119Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4468502Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4468880Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4469251Z test_block_addmm_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4469627Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4470010Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4470391Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4470769Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4471143Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4471536Z test_block_addmm_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.065s)
2023-01-11T21:45:10.4471916Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.072s)
2023-01-11T21:45:10.4472295Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.067s)
2023-01-11T21:45:10.4472681Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.069s)
2023-01-11T21:45:10.4473066Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.071s)
2023-01-11T21:45:10.4473442Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.063s)
2023-01-11T21:45:10.4473812Z test_block_addmm_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.061s)
2023-01-11T21:45:10.4474191Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.074s)
2023-01-11T21:45:10.4474575Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.073s)
2023-01-11T21:45:10.4474949Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4475333Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4475706Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.070s)
2023-01-11T21:45:10.4476117Z test_block_addmm_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.068s)
2023-01-11T21:45:10.4476488Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4476877Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477258Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477629Z test_block_addmv_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4477997Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4478381Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
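The test_block_addmm / test_block_addmv runs above exercise matrix products whose sparse operand uses the blocked compressed layout (BSR). A minimal sketch of that call pattern, not the suite's own code: names, shapes, and the dense-to-CSR-to-BSR conversion path are illustrative assumptions, and these ops run here on a CUDA device.

    import torch

    device = "cuda"  # the suite above runs on linux.g5.4xlarge.nvidia.gpu
    a = torch.randn(4, 6, device=device)
    # Dense -> CSR -> BSR with 2x2 blocks, matching the blocksize_2 cases.
    a_bsr = a.to_sparse_csr().to_sparse_bsr((2, 2))

    b = torch.randn(6, 3, device=device)
    c = torch.randn(4, 3, device=device)
    out = torch.addmm(c, a_bsr, b)    # c + a_bsr @ b with a block-sparse mat1

    x = torch.randn(6, device=device)
    y = torch.randn(4, device=device)
    out_v = torch.addmv(y, a_bsr, x)  # y + a_bsr @ x, the matrix-vector variant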
2023-01-11T21:45:10.4478766Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4479140Z test_block_addmv_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4479520Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4479904Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4480286Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4480661Z test_block_addmv_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4481034Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4481423Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4481806Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4482223Z test_block_addmv_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4482601Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4482987Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4483404Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4483782Z test_block_addmv_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4484163Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4484548Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s)
2023-01-11T21:45:10.4484932Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4485297Z test_block_addmv_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4485683Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486072Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486453Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4486820Z test_block_addmv_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4487226Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4487618Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488000Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488367Z test_block_addmv_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4488951Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /var/lib/jenkins/workspace/test/test_sparse_csr.py:1616: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release.
2023-01-11T21:45:10.4489523Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
2023-01-11T21:45:10.4489835Z X = torch.triangular_solve(B, A).solution
2023-01-11T21:45:10.4490052Z should be replaced with
2023-01-11T21:45:10.4490415Z X = torch.linalg.solve_triangular(A, B). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2225.)
2023-01-11T21:45:10.4490772Z expected_X, _ = torch.triangular_solve(
2023-01-11T21:45:10.4490984Z ok (0.150s)
2023-01-11T21:45:10.4491286Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4491698Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4492103Z test_block_triangular_solve_block_size_2_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4492507Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4492951Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4493361Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4493765Z test_block_triangular_solve_block_size_2_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4494161Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4494787Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4495201Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4495607Z test_block_triangular_solve_block_size_2_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496005Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496414Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4496815Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4497213Z test_block_triangular_solve_block_size_2_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
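The UserWarning captured above carries its own migration guidance. As a minimal sketch of that migration (shapes invented here; A must be triangular, and torch.linalg.solve_triangular takes the arguments in the reversed order with an explicit upper keyword):

    import torch

    A = torch.triu(torch.randn(3, 3))   # upper-triangular system matrix
    B = torch.randn(3, 2)

    # Deprecated form flagged in the warning:
    X_old = torch.triangular_solve(B, A).solution

    # Replacement suggested by the warning; note A comes first and
    # `upper` must be stated (triangular_solve defaulted to upper=True).
    X_new = torch.linalg.solve_triangular(A, B, upper=True)

    assert torch.allclose(X_old, X_new)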
2023-01-11T21:45:10.4497611Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4498079Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4498482Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4498966Z test_block_triangular_solve_block_size_3_int32_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4499361Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4499768Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500167Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500563Z test_block_triangular_solve_block_size_3_int32_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4500964Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4501374Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4501779Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.058s)
2023-01-11T21:45:10.4502178Z test_block_triangular_solve_block_size_3_int64_noncontiguous_False_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.059s)
2023-01-11T21:45:10.4502578Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4502985Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4503388Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.057s)
2023-01-11T21:45:10.4503786Z test_block_triangular_solve_block_size_3_int64_noncontiguous_True_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.056s)
2023-01-11T21:45:10.4504162Z test_bmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4504471Z test_bmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.016s)
2023-01-11T21:45:10.4504771Z test_bmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4505063Z test_bmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4505412Z test_compressed_layout_conversions_coverage_SparseBSC_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4505777Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4506140Z test_compressed_layout_conversions_coverage_SparseBSC_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4506488Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4506847Z test_compressed_layout_conversions_coverage_SparseBSC_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4507207Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4507556Z test_compressed_layout_conversions_coverage_SparseBSC_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4507894Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4508253Z test_compressed_layout_conversions_coverage_SparseBSR_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4508608Z This test performs a smoke test for covered conversion and verifies ... ok (0.038s)
2023-01-11T21:45:10.4508948Z test_compressed_layout_conversions_coverage_SparseBSR_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4509327Z This test performs a smoke test for covered conversion and verifies ... ok (0.037s)
2023-01-11T21:45:10.4509685Z test_compressed_layout_conversions_coverage_SparseBSR_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4510047Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4510401Z test_compressed_layout_conversions_coverage_SparseBSR_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4510755Z This test performs a smoke test for covered conversion and verifies ... ok (0.016s)
2023-01-11T21:45:10.4511112Z test_compressed_layout_conversions_coverage_SparseCSC_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4511457Z This test performs a smoke test for covered conversion and verifies ... ok (0.020s)
2023-01-11T21:45:10.4511813Z test_compressed_layout_conversions_coverage_SparseCSC_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4512170Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4512526Z test_compressed_layout_conversions_coverage_SparseCSC_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4512868Z This test performs a smoke test for covered conversion and verifies ... ok (0.005s)
2023-01-11T21:45:10.4513225Z test_compressed_layout_conversions_coverage_SparseCSC_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4513577Z This test performs a smoke test for covered conversion and verifies ... ok (0.006s)
2023-01-11T21:45:10.4513929Z test_compressed_layout_conversions_coverage_SparseCSR_SparseBSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4514269Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4514621Z test_compressed_layout_conversions_coverage_SparseCSR_SparseBSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4514975Z This test performs a smoke test for covered conversion and verifies ... ok (0.019s)
2023-01-11T21:45:10.4515323Z test_compressed_layout_conversions_coverage_SparseCSR_SparseCSC_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4515674Z This test performs a smoke test for covered conversion and verifies ... ok (0.006s)
2023-01-11T21:45:10.4516055Z test_compressed_layout_conversions_coverage_SparseCSR_SparseCSR_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4516409Z This test performs a smoke test for covered conversion and verifies ... ok (0.004s)
2023-01-11T21:45:10.4516729Z test_coo_csr_conversion_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
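The test_compressed_layout_conversions_coverage_* cases above smoke-test every source/target pair of the four compressed layouts (CSR, CSC, BSR, BSC). A minimal sketch of that layout surface, assuming a recent PyTorch where each compressed layout can be built from a strided tensor and round-tripped through to_dense (block sizes here are invented):

    import torch

    dense = torch.eye(4)
    csr = dense.to_sparse_csr()
    csc = dense.to_sparse_csc()
    bsr = dense.to_sparse_bsr((2, 2))
    bsc = dense.to_sparse_bsc((2, 2))

    for t in (csr, csc, bsr, bsc):
        # Each compressed layout should round-trip through dense losslessly.
        assert torch.equal(t.to_dense(), dense)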
2023-01-11T21:45:10.4517066Z test_coo_csr_conversion_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4517401Z test_coo_csr_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4517732Z test_coo_csr_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518065Z test_coo_csr_conversion_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518394Z test_coo_csr_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4518717Z test_coo_csr_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4519037Z test_coo_csr_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4519364Z test_coo_csr_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4519687Z test_coo_csr_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520003Z test_coo_csr_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520329Z test_coo_csr_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4520659Z test_coo_to_csr_convert_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.015s)
2023-01-11T21:45:10.4520993Z test_csr_coo_conversion_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4521342Z test_csr_coo_conversion_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4521678Z test_csr_coo_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522021Z test_csr_coo_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522346Z test_csr_coo_conversion_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522669Z test_csr_coo_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4522997Z test_csr_coo_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4523325Z test_csr_coo_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4523640Z test_csr_coo_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4523962Z test_csr_coo_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524289Z test_csr_coo_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524606Z test_csr_coo_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4524931Z test_csr_double_to_sparse_csr_cuda (__main__.TestSparseCSRCUDA) ... ok (0.001s)
2023-01-11T21:45:10.4525252Z test_csr_is_contiguous_cuda (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4525573Z test_csr_matvec_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4525889Z test_csr_matvec_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.045s)
2023-01-11T21:45:10.4526213Z test_csr_matvec_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4526533Z test_csr_matvec_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4526840Z test_csr_matvec_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
2023-01-11T21:45:10.4527151Z test_csr_matvec_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.044s)
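The test_coo_csr_conversion / test_csr_coo_conversion cases above cover the COO-to-CSR round trip across dtypes. A minimal sketch of that round trip (indices and values invented here, assuming Tensor.to_sparse() on a CSR tensor yields COO as in recent PyTorch):

    import torch

    i = torch.tensor([[0, 1, 1], [2, 0, 2]])
    v = torch.tensor([3.0, 4.0, 5.0])
    coo = torch.sparse_coo_tensor(i, v, size=(2, 3)).coalesce()

    csr = coo.to_sparse_csr()   # COO -> CSR
    back = csr.to_sparse()      # CSR -> COO again

    assert torch.equal(coo.to_dense(), back.to_dense())
    # CSR exposes compressed row pointers plus column indices and values:
    print(csr.crow_indices(), csr.col_indices(), csr.values())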
2023-01-11T21:45:10.4527808Z test_csr_storage_cuda (__main__.TestSparseCSRCUDA) ... /var/lib/jenkins/workspace/test/test_sparse_csr.py:924: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2023-01-11T21:45:10.4528351Z a.storage()
2023-01-11T21:45:10.4528529Z ok (0.003s)
2023-01-11T21:45:10.4528769Z test_csr_stride_cuda (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4529100Z test_csr_to_block_csr_blocksize_2_cuda_float64_int32 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4529456Z test_csr_to_block_csr_blocksize_2_cuda_float64_int64 (__main__.TestSparseCSRCUDA) ... ok (0.014s)
2023-01-11T21:45:10.4529799Z test_csr_to_block_csr_blocksize_4_cuda_float64_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530149Z test_csr_to_block_csr_blocksize_4_cuda_float64_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530492Z test_csr_to_block_csr_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4530850Z test_dense_to_from_sparse_compressed_SparseBSC_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4531195Z This test tests conversion from dense to/from CSR and CSC ... ok (0.013s)
2023-01-11T21:45:10.4531546Z test_dense_to_from_sparse_compressed_SparseBSC_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4531891Z This test tests conversion from dense to/from CSR and CSC ... ok (0.378s)
2023-01-11T21:45:10.4532233Z test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4532578Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4532934Z test_dense_to_from_sparse_compressed_SparseBSC_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4533309Z This test tests conversion from dense to/from CSR and CSC ... ok (0.020s)
2023-01-11T21:45:10.4533643Z test_dense_to_from_sparse_compressed_SparseBSR_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4533983Z This test tests conversion from dense to/from CSR and CSC ... ok (0.012s)
2023-01-11T21:45:10.4534327Z test_dense_to_from_sparse_compressed_SparseBSR_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4534848Z This test tests conversion from dense to/from CSR and CSC ... ok (0.304s)
2023-01-11T21:45:10.4535201Z test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4535542Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4535893Z test_dense_to_from_sparse_compressed_SparseBSR_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4536229Z This test tests conversion from dense to/from CSR and CSC ... ok (0.019s)
2023-01-11T21:45:10.4536574Z test_dense_to_from_sparse_compressed_SparseCSC_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4536910Z This test tests conversion from dense to/from CSR and CSC ... ok (0.015s)
2023-01-11T21:45:10.4537252Z test_dense_to_from_sparse_compressed_SparseCSC_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4537594Z This test tests conversion from dense to/from CSR and CSC ... ok (0.083s)
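The TypedStorage deprecation captured above (from a.storage() at test_sparse_csr.py:924) also names its replacement. A minimal sketch of the two accessors (the tensor here is invented):

    import torch

    a = torch.arange(4.0)

    legacy = a.storage()       # deprecated TypedStorage accessor; emits the warning
    raw = a.untyped_storage()  # untyped, warning-free accessor named by the warning

    # The untyped storage aliases the tensor's underlying buffer.
    assert raw.data_ptr() == a.data_ptr()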
2023-01-11T21:45:10.4537940Z test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4538279Z This test tests conversion from dense to/from CSR and CSC ... ok (0.012s)
2023-01-11T21:45:10.4538678Z test_dense_to_from_sparse_compressed_SparseCSC_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4539025Z This test tests conversion from dense to/from CSR and CSC ... ok (0.009s)
2023-01-11T21:45:10.4539367Z test_dense_to_from_sparse_compressed_SparseCSR_Batched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4539693Z This test tests conversion from dense to/from CSR and CSC ... ok (0.013s)
2023-01-11T21:45:10.4540100Z test_dense_to_from_sparse_compressed_SparseCSR_Batched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4540443Z This test tests conversion from dense to/from CSR and CSC ... ok (0.080s)
2023-01-11T21:45:10.4540786Z test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_Hybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4541115Z This test tests conversion from dense to/from CSR and CSC ... ok (0.011s)
2023-01-11T21:45:10.4541469Z test_dense_to_from_sparse_compressed_SparseCSR_NonBatched_NonHybrid_cuda (__main__.TestSparseCSRCUDA)
2023-01-11T21:45:10.4541818Z This test tests conversion from dense to/from CSR and CSC ... ok (0.009s)
2023-01-11T21:45:10.4542140Z test_direct_coo_csr_conversion_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4542499Z test_direct_coo_csr_conversion_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4542854Z test_direct_coo_csr_conversion_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4543206Z test_direct_coo_csr_conversion_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4543539Z test_direct_coo_csr_conversion_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4543883Z test_direct_coo_csr_conversion_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544220Z test_direct_coo_csr_conversion_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544568Z test_direct_coo_csr_conversion_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4544901Z test_direct_coo_csr_conversion_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545235Z test_exercise_detach_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545601Z test_exercise_detach_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4545927Z test_exercise_detach_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546268Z test_exercise_detach_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546602Z test_exercise_detach_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4546931Z test_exercise_detach_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547252Z test_exercise_detach_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547574Z test_exercise_detach_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4547900Z test_exercise_detach_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548218Z test_exercise_detach_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548541Z test_exercise_detach_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4548868Z test_exercise_detach_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4549207Z test_matmul_device_mismatch_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4549522Z test_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.823s)
2023-01-11T21:45:10.4549833Z test_mm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4550145Z test_mul_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.261s)
2023-01-11T21:45:10.4550439Z test_mul_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.259s)
2023-01-11T21:45:10.4550788Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (5.375s)
2023-01-11T21:45:10.4551171Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.101s)
2023-01-11T21:45:10.4551560Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (5.841s)
2023-01-11T21:45:10.4551944Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (5.896s)
2023-01-11T21:45:10.4552353Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (4.886s)
2023-01-11T21:45:10.4552737Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (5.358s)
2023-01-11T21:45:10.4553106Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (5.336s)
2023-01-11T21:45:10.4553486Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (4.267s)
2023-01-11T21:45:10.4553853Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (4.272s)
2023-01-11T21:45:10.4554230Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (4.247s)
2023-01-11T21:45:10.4554596Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (4.279s)
2023-01-11T21:45:10.4554969Z test_mul_scalar_enable_hybrid_False_SparseBSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (4.260s)
2023-01-11T21:45:10.4555346Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (4.009s)
2023-01-11T21:45:10.4555723Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (2.309s)
2023-01-11T21:45:10.4556097Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (4.385s)
2023-01-11T21:45:10.4556484Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (4.418s)
2023-01-11T21:45:10.4556858Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.643s)
2023-01-11T21:45:10.4557231Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (4.012s)
2023-01-11T21:45:10.4557647Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.973s)
2023-01-11T21:45:10.4558027Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.164s)
2023-01-11T21:45:10.4558394Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.174s)
2023-01-11T21:45:10.4558760Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.133s)
2023-01-11T21:45:10.4559129Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.155s)
2023-01-11T21:45:10.4559495Z test_mul_scalar_enable_hybrid_False_SparseBSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.155s)
2023-01-11T21:45:10.4559870Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.456s)
2023-01-11T21:45:10.4560239Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.801s)
2023-01-11T21:45:10.4560623Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.602s)
2023-01-11T21:45:10.4561016Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.639s)
2023-01-11T21:45:10.4561396Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.324s)
2023-01-11T21:45:10.4561763Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.456s)
2023-01-11T21:45:10.4562134Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.420s)
2023-01-11T21:45:10.4562507Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.071s)
2023-01-11T21:45:10.4562883Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.073s)
2023-01-11T21:45:10.4563250Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.053s)
2023-01-11T21:45:10.4563648Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.072s)
2023-01-11T21:45:10.4564021Z test_mul_scalar_enable_hybrid_False_SparseCSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.069s)
2023-01-11T21:45:10.4564395Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.354s)
2023-01-11T21:45:10.4564769Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.740s)
2023-01-11T21:45:10.4565154Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.492s)
2023-01-11T21:45:10.4565543Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.531s)
2023-01-11T21:45:10.4565916Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.230s)
2023-01-11T21:45:10.4566293Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.356s)
2023-01-11T21:45:10.4566666Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.322s)
2023-01-11T21:45:10.4567041Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.997s)
2023-01-11T21:45:10.4567401Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.994s)
2023-01-11T21:45:10.4567776Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.977s)
2023-01-11T21:45:10.4568147Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.994s)
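The test_mul_scalar_* runs above cover scalar multiplication for all four compressed layouts (BSC, BSR, CSC, CSR) across dtypes. A minimal sketch of the property being checked (values invented, CSR shown for brevity): scaling touches only the values, leaving the layout and index tensors intact.

    import torch

    csr = torch.eye(3).to_sparse_csr()
    out = csr * 2.5

    assert out.layout == torch.sparse_csr
    assert torch.equal(out.crow_indices(), csr.crow_indices())
    assert torch.equal(out.to_dense(), torch.eye(3) * 2.5)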
2023-01-11T21:45:10.4568511Z test_mul_scalar_enable_hybrid_False_SparseCSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.992s)
2023-01-11T21:45:10.4569387Z test_resize_as_sparse_compressed_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py:167: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.)
2023-01-11T21:45:10.4569940Z result = torch.empty(shape, device=device, dtype=dtype)
2023-01-11T21:45:10.4570176Z ok (0.064s)
2023-01-11T21:45:10.4570456Z test_resize_as_sparse_compressed_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.063s)
2023-01-11T21:45:10.4570835Z test_resize_as_sparse_compressed_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4571212Z test_resize_as_sparse_compressed_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.060s)
2023-01-11T21:45:10.4571593Z test_resize_as_sparse_compressed_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4571965Z test_resize_as_sparse_compressed_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.075s)
2023-01-11T21:45:10.4572333Z test_resize_as_sparse_compressed_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.085s)
2023-01-11T21:45:10.4572706Z test_resize_as_sparse_compressed_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.086s)
2023-01-11T21:45:10.4573050Z test_resize_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573359Z test_resize_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573675Z test_resize_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4573993Z test_resize_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4574299Z test_resize_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4574791Z test_resize_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575109Z test_resize_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575409Z test_resize_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4575699Z test_resize_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576049Z test_resize_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576346Z test_resize_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576635Z test_resize_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4576945Z test_resize_errors_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.011s)
2023-01-11T21:45:10.4577265Z test_resize_errors_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4577576Z test_resize_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4577904Z test_resize_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4578318Z test_resize_errors_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4578763Z test_resize_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579105Z test_resize_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579486Z test_resize_errors_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4579801Z test_resize_errors_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580101Z test_resize_errors_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580408Z test_resize_errors_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4580718Z test_resize_errors_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.010s)
2023-01-11T21:45:10.4581053Z test_sampled_addmm_autograd_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4581397Z test_sampled_addmm_autograd_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4581806Z test_sampled_addmm_autograd_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4582145Z test_sampled_addmm_autograd_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4582477Z test_sampled_addmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.174s)
2023-01-11T21:45:10.4582808Z test_sampled_addmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.205s)
2023-01-11T21:45:10.4583133Z test_sampled_addmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.147s)
2023-01-11T21:45:10.4583454Z test_sampled_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.118s)
2023-01-11T21:45:10.4583783Z test_sampled_addmm_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584128Z test_sampled_addmm_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584470Z test_sampled_addmm_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4584799Z test_sampled_addmm_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.034s)
2023-01-11T21:45:10.4585249Z test_sampled_addmm_zero_sized_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4585774Z test_sampled_addmm_zero_sized_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4586291Z test_sampled_addmm_zero_sized_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4586802Z test_sampled_addmm_zero_sized_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Causes CUDA memory exception, see https://github.com/pytorch/pytorch/issues/72177 (0.001s)
2023-01-11T21:45:10.4587212Z test_select_SparseBSC_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4587559Z test_select_SparseBSC_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4587934Z test_select_SparseBSC_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588286Z test_select_SparseBSC_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588619Z test_select_SparseBSC_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4588955Z test_select_SparseBSC_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4589295Z test_select_SparseBSC_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4589619Z test_select_SparseBSC_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
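The test_sampled_addmm_* runs above exercise torch.sparse.sampled_addmm, which computes beta * input + alpha * (mat1 @ mat2) only at the positions present in the sparse CSR input's pattern (the zero-sized variants are skipped here per pytorch/pytorch#72177, as the log notes). A minimal sketch with invented shapes, run on CUDA as in this job:

    import torch

    device = "cuda"
    pattern = torch.eye(4, device=device).to_sparse_csr()  # sampling pattern
    mat1 = torch.randn(4, 8, device=device)
    mat2 = torch.randn(8, 4, device=device)

    out = torch.sparse.sampled_addmm(pattern, mat1, mat2)
    # The result is sparse CSR with the same sparsity pattern as `pattern`.
    assert out.layout == torch.sparse_csr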
2023-01-11T21:45:10.4589949Z test_select_SparseBSC_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590276Z test_select_SparseBSC_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590597Z test_select_SparseBSC_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4590924Z test_select_SparseBSC_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4591255Z test_select_SparseBSC_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4591592Z test_select_SparseBSC_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4591923Z test_select_SparseBSC_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592266Z test_select_SparseBSC_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592604Z test_select_SparseBSC_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4592941Z test_select_SparseBSC_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4593299Z test_select_SparseBSC_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4593632Z test_select_SparseBSC_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4593965Z test_select_SparseBSC_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4594279Z test_select_SparseBSC_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4594608Z test_select_SparseBSC_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4594936Z test_select_SparseBSC_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.022s)
2023-01-11T21:45:10.4595270Z test_select_SparseBSR_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4595597Z test_select_SparseBSR_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4595933Z test_select_SparseBSR_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4596280Z test_select_SparseBSR_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4596609Z test_select_SparseBSR_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4596955Z test_select_SparseBSR_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597292Z test_select_SparseBSR_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597627Z test_select_SparseBSR_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4597944Z test_select_SparseBSR_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598269Z test_select_SparseBSR_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598594Z test_select_SparseBSR_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4598913Z test_select_SparseBSR_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599258Z test_select_SparseBSR_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599592Z test_select_SparseBSR_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4599964Z test_select_SparseBSR_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600302Z test_select_SparseBSR_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600643Z test_select_SparseBSR_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4600974Z test_select_SparseBSR_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601313Z test_select_SparseBSR_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601637Z test_select_SparseBSR_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4601966Z test_select_SparseBSR_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4602293Z test_select_SparseBSR_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4602612Z test_select_SparseBSR_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4602940Z test_select_SparseBSR_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.019s)
2023-01-11T21:45:10.4603274Z test_select_SparseCSC_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4603611Z test_select_SparseCSC_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605287Z test_select_SparseCSC_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605633Z test_select_SparseCSC_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4605970Z test_select_SparseCSC_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4606328Z test_select_SparseCSC_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4606665Z test_select_SparseCSC_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4606999Z test_select_SparseCSC_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607325Z test_select_SparseCSC_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607638Z test_select_SparseCSC_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4607966Z test_select_SparseCSC_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4608295Z test_select_SparseCSC_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4608627Z test_select_SparseCSC_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4608949Z test_select_SparseCSC_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4609293Z test_select_SparseCSC_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4609636Z test_select_SparseCSC_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4609970Z test_select_SparseCSC_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4610307Z test_select_SparseCSC_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4610640Z test_select_SparseCSC_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.024s)
2023-01-11T21:45:10.4610977Z test_select_SparseCSC_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611297Z test_select_SparseCSC_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611625Z test_select_SparseCSC_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4611951Z test_select_SparseCSC_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4612275Z test_select_SparseCSC_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.023s)
2023-01-11T21:45:10.4612608Z test_select_SparseCSR_int32_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4612969Z test_select_SparseCSR_int32_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4613309Z test_select_SparseCSR_int32_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4613644Z test_select_SparseCSR_int32_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4613980Z test_select_SparseCSR_int32_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4614311Z test_select_SparseCSR_int32_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4614768Z test_select_SparseCSR_int32_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4615100Z test_select_SparseCSR_int32_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4615427Z test_select_SparseCSR_int32_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4615749Z test_select_SparseCSR_int32_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616066Z test_select_SparseCSR_int32_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616395Z test_select_SparseCSR_int32_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4616727Z test_select_SparseCSR_int64_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4617055Z test_select_SparseCSR_int64_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4617394Z test_select_SparseCSR_int64_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4617742Z test_select_SparseCSR_int64_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4618125Z test_select_SparseCSR_int64_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.021s)
2023-01-11T21:45:10.4618450Z test_select_SparseCSR_int64_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4618854Z test_select_SparseCSR_int64_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619189Z test_select_SparseCSR_int64_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619516Z test_select_SparseCSR_int64_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4619830Z test_select_SparseCSR_int64_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620159Z test_select_SparseCSR_int64_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620482Z test_select_SparseCSR_int64_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.020s)
2023-01-11T21:45:10.4620801Z test_sparse_add_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.030s)
2023-01-11T21:45:10.4621129Z test_sparse_add_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.030s)
2023-01-11T21:45:10.4621445Z test_sparse_add_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.029s)
2023-01-11T21:45:10.4621768Z test_sparse_add_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.029s)
2023-01-11T21:45:10.4622085Z test_sparse_add_errors_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.013s)
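The test_sparse_add_* cases above cover elementwise addition of two sparse CSR tensors (with the _errors variants checking shape/layout mismatches). A minimal sketch of the passing path, with invented values:

    import torch

    a = torch.tensor([[0., 1.], [2., 0.]]).to_sparse_csr()
    b = torch.tensor([[3., 0.], [0., 4.]]).to_sparse_csr()

    c = torch.add(a, b)   # equivalent to a + b
    assert c.layout == torch.sparse_csr
    assert torch.equal(c.to_dense(), a.to_dense() + b.to_dense())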
ok (0.013s) 2023-01-11T21:45:10.4622423Z test_sparse_add_errors_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4622755Z test_sparse_add_errors_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4623077Z test_sparse_add_errors_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.013s) 2023-01-11T21:45:10.4623406Z test_sparse_addmm_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4623735Z test_sparse_addmm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4624065Z test_sparse_addmm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.027s) 2023-01-11T21:45:10.4624382Z test_sparse_addmm_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4624699Z test_sparse_addmm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4625062Z test_sparse_addmm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4625380Z test_sparse_csc_to_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4625711Z test_sparse_csc_to_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4626045Z test_sparse_csc_to_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4626381Z test_sparse_csc_to_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4626711Z test_sparse_csc_to_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627036Z test_sparse_csc_to_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627370Z test_sparse_csc_to_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4627687Z test_sparse_csc_to_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628013Z test_sparse_csc_to_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628339Z test_sparse_csc_to_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628661Z test_sparse_csc_to_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4628980Z test_sparse_csc_to_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4629315Z test_sparse_csr_from_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4629650Z test_sparse_csr_from_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4629980Z test_sparse_csr_from_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4630356Z test_sparse_csr_from_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4630697Z test_sparse_csr_from_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631030Z test_sparse_csr_from_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631356Z test_sparse_csr_from_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4631690Z test_sparse_csr_from_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632020Z test_sparse_csr_from_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632339Z test_sparse_csr_from_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4632668Z test_sparse_csr_from_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4633002Z test_sparse_csr_from_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4633375Z test_sparse_csr_to_dense_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4633705Z test_sparse_csr_to_dense_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4634041Z test_sparse_csr_to_dense_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4634379Z test_sparse_csr_to_dense_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4634703Z test_sparse_csr_to_dense_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635032Z test_sparse_csr_to_dense_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635360Z test_sparse_csr_to_dense_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4635689Z test_sparse_csr_to_dense_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636003Z test_sparse_csr_to_dense_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636334Z test_sparse_csr_to_dense_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4636657Z test_sparse_csr_to_dense_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4637001Z test_sparse_csr_to_dense_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4637343Z test_sparse_csr_unary_inplace_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4637692Z test_sparse_csr_unary_inplace_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4638047Z test_sparse_csr_unary_inplace_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.006s) 2023-01-11T21:45:10.4638397Z test_sparse_csr_unary_inplace_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.175s) 2023-01-11T21:45:10.4638755Z test_sparse_csr_unary_inplace_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.007s) 2023-01-11T21:45:10.4639111Z test_sparse_csr_unary_inplace_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4639460Z test_sparse_csr_unary_inplace_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4639803Z test_sparse_csr_unary_inplace_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640152Z test_sparse_csr_unary_inplace_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640497Z test_sparse_csr_unary_inplace_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4640825Z test_sparse_csr_unary_inplace_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641167Z test_sparse_csr_unary_inplace_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641517Z test_sparse_csr_unary_inplace_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4641913Z test_sparse_csr_unary_inplace_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s) 2023-01-11T21:45:10.4642390Z test_sparse_csr_unary_inplace_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s) 2023-01-11T21:45:10.4642845Z test_sparse_csr_unary_inplace_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! 
2023-01-11T21:45:10.4643291Z test_sparse_csr_unary_inplace_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4643730Z test_sparse_csr_unary_inplace_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4644160Z test_sparse_csr_unary_inplace_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4644601Z test_sparse_csr_unary_inplace_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645032Z test_sparse_csr_unary_inplace_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645469Z test_sparse_csr_unary_inplace_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4645897Z test_sparse_csr_unary_inplace_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4646331Z test_sparse_csr_unary_inplace_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4646722Z test_sparse_csr_unary_inplace_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4647076Z test_sparse_csr_unary_inplace_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4647422Z test_sparse_csr_unary_inplace_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4647792Z test_sparse_csr_unary_inplace_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648178Z test_sparse_csr_unary_inplace_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648533Z test_sparse_csr_unary_inplace_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4648877Z test_sparse_csr_unary_inplace_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4649224Z test_sparse_csr_unary_inplace_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4649574Z test_sparse_csr_unary_inplace_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4649911Z test_sparse_csr_unary_inplace_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650261Z test_sparse_csr_unary_inplace_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650607Z test_sparse_csr_unary_inplace_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4650950Z test_sparse_csr_unary_inplace_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4651292Z test_sparse_csr_unary_inplace_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4651643Z test_sparse_csr_unary_inplace_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4651995Z test_sparse_csr_unary_inplace_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4652348Z test_sparse_csr_unary_inplace_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4652711Z test_sparse_csr_unary_inplace_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653069Z test_sparse_csr_unary_inplace_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653446Z test_sparse_csr_unary_inplace_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4653791Z test_sparse_csr_unary_inplace_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4654147Z test_sparse_csr_unary_inplace_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4654628Z test_sparse_csr_unary_inplace_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655030Z test_sparse_csr_unary_inplace_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655363Z test_sparse_csr_unary_inplace_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4655710Z test_sparse_csr_unary_inplace_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4656056Z test_sparse_csr_unary_inplace_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4656398Z test_sparse_csr_unary_inplace_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4656748Z test_sparse_csr_unary_inplace_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657106Z test_sparse_csr_unary_inplace_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657459Z test_sparse_csr_unary_inplace_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4657803Z test_sparse_csr_unary_inplace_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658148Z test_sparse_csr_unary_inplace_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658575Z test_sparse_csr_unary_inplace_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4658936Z test_sparse_csr_unary_inplace_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4659272Z test_sparse_csr_unary_inplace_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4659615Z test_sparse_csr_unary_inplace_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660005Z test_sparse_csr_unary_inplace_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660338Z test_sparse_csr_unary_inplace_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4660687Z test_sparse_csr_unary_inplace_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4661032Z test_sparse_csr_unary_inplace_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4661380Z test_sparse_csr_unary_inplace_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4661729Z test_sparse_csr_unary_inplace_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662088Z test_sparse_csr_unary_inplace_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662442Z test_sparse_csr_unary_inplace_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4662785Z test_sparse_csr_unary_inplace_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4663136Z test_sparse_csr_unary_inplace_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
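The test_sparse_csr_unary_inplace_* blocks check that zero-preserving unary ops (abs, asin, asinh, atan, atanh, ...) can be applied in place to a CSR tensor: only the values buffer is mutated, and the sparsity pattern is untouched because these functions map 0 to 0. A small illustration under those assumptions (my own example, not the harness code):

    import torch

    x = torch.tensor([[0.00, -0.50],
                      [0.25,  0.00]]).to_sparse_csr()
    pattern = x.crow_indices().clone()

    x.asin_()  # in-place variant; rewrites x.values() only

    assert torch.equal(x.crow_indices(), pattern)
    assert torch.allclose(x.to_dense(),
                          torch.asin(torch.tensor([[0.00, -0.50],
                                                   [0.25,  0.00]])))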
2023-01-11T21:45:10.4663518Z test_sparse_csr_unary_inplace_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4663868Z test_sparse_csr_unary_inplace_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664202Z test_sparse_csr_unary_inplace_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664540Z test_sparse_csr_unary_inplace_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4664883Z test_sparse_csr_unary_inplace_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4665275Z test_sparse_csr_unary_inplace_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4665617Z test_sparse_csr_unary_inplace_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4665970Z test_sparse_csr_unary_inplace_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4666318Z test_sparse_csr_unary_inplace_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4666656Z test_sparse_csr_unary_inplace_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667001Z test_sparse_csr_unary_inplace_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667341Z test_sparse_csr_unary_inplace_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4667679Z test_sparse_csr_unary_inplace_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668020Z test_sparse_csr_unary_inplace_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668382Z test_sparse_csr_unary_inplace_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4668757Z test_sparse_csr_unary_inplace_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4669138Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4669512Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.153s)
2023-01-11T21:45:10.4669888Z test_sparse_csr_unary_inplace_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4670262Z test_sparse_csr_unary_inplace_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4670624Z test_sparse_csr_unary_inplace_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671000Z test_sparse_csr_unary_inplace_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671367Z test_sparse_csr_unary_inplace_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4671770Z test_sparse_csr_unary_inplace_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672122Z test_sparse_csr_unary_inplace_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672489Z test_sparse_csr_unary_inplace_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4672852Z test_sparse_csr_unary_inplace_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4673216Z test_sparse_csr_unary_inplace_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4673567Z test_sparse_csr_unary_inplace_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4673927Z test_sparse_csr_unary_inplace_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674284Z test_sparse_csr_unary_inplace_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674636Z test_sparse_csr_unary_inplace_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4674993Z test_sparse_csr_unary_inplace_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4675347Z test_sparse_csr_unary_inplace_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4675702Z test_sparse_csr_unary_inplace_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676045Z test_sparse_csr_unary_inplace_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676406Z test_sparse_csr_unary_inplace_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4676790Z test_sparse_csr_unary_inplace_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4677138Z test_sparse_csr_unary_inplace_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4677482Z test_sparse_csr_unary_inplace_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4677828Z test_sparse_csr_unary_inplace_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4678173Z test_sparse_csr_unary_inplace_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4678507Z test_sparse_csr_unary_inplace_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4678848Z test_sparse_csr_unary_inplace_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679187Z test_sparse_csr_unary_inplace_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679525Z test_sparse_csr_unary_inplace_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4679856Z test_sparse_csr_unary_inplace_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4680209Z test_sparse_csr_unary_inplace_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.235s)
2023-01-11T21:45:10.4680561Z test_sparse_csr_unary_inplace_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.167s)
2023-01-11T21:45:10.4680909Z test_sparse_csr_unary_inplace_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.165s)
2023-01-11T21:45:10.4681271Z test_sparse_csr_unary_inplace_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.217s)
2023-01-11T21:45:10.4681621Z test_sparse_csr_unary_inplace_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4681977Z test_sparse_csr_unary_inplace_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4682325Z test_sparse_csr_unary_inplace_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4682672Z test_sparse_csr_unary_inplace_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4683022Z test_sparse_csr_unary_inplace_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4683403Z test_sparse_csr_unary_inplace_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4683751Z test_sparse_csr_unary_inplace_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4684102Z test_sparse_csr_unary_inplace_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4684458Z test_sparse_csr_unary_inplace_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4684800Z test_sparse_csr_unary_inplace_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4685152Z test_sparse_csr_unary_inplace_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4685503Z test_sparse_csr_unary_inplace_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4685849Z test_sparse_csr_unary_inplace_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686186Z test_sparse_csr_unary_inplace_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686535Z test_sparse_csr_unary_inplace_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4686886Z test_sparse_csr_unary_inplace_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687240Z test_sparse_csr_unary_inplace_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687581Z test_sparse_csr_unary_inplace_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4687929Z test_sparse_csr_unary_inplace_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688310Z test_sparse_csr_unary_inplace_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688648Z test_sparse_csr_unary_inplace_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4688995Z test_sparse_csr_unary_inplace_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4689339Z test_sparse_csr_unary_inplace_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4689688Z test_sparse_csr_unary_inplace_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690025Z test_sparse_csr_unary_inplace_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690375Z test_sparse_csr_unary_inplace_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4690727Z test_sparse_csr_unary_inplace_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4691071Z test_sparse_csr_unary_inplace_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4691478Z test_sparse_csr_unary_inplace_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4691930Z test_sparse_csr_unary_inplace_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4692374Z test_sparse_csr_unary_inplace_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4692813Z test_sparse_csr_unary_inplace_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4693274Z test_sparse_csr_unary_inplace_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4693750Z test_sparse_csr_unary_inplace_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4694187Z test_sparse_csr_unary_inplace_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4694763Z test_sparse_csr_unary_inplace_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4695186Z test_sparse_csr_unary_inplace_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4695617Z test_sparse_csr_unary_inplace_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696044Z test_sparse_csr_unary_inplace_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696477Z test_sparse_csr_unary_inplace_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4696908Z test_sparse_csr_unary_inplace_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4697351Z test_sparse_csr_unary_inplace_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4697789Z test_sparse_csr_unary_inplace_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4698233Z test_sparse_csr_unary_inplace_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4698719Z test_sparse_csr_unary_inplace_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4699156Z test_sparse_csr_unary_inplace_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4699625Z test_sparse_csr_unary_inplace_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700060Z test_sparse_csr_unary_inplace_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700480Z test_sparse_csr_unary_inplace_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4700910Z test_sparse_csr_unary_inplace_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4701342Z test_sparse_csr_unary_inplace_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4701773Z test_sparse_csr_unary_inplace_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4702194Z test_sparse_csr_unary_inplace_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4702639Z test_sparse_csr_unary_inplace_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
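The runs of `skip: Skipped! Inplace variant not supported!` above are expected behavior rather than failures: predicates such as isnan, isinf, isneginf and isposinf return a bool tensor, so an in-place variant is impossible (the result dtype cannot match the input dtype) and the OpInfo harness skips those dtype combinations. A quick check of that premise (my own snippet, not harness code):

    import torch

    # No in-place spellings exist for boolean-valued predicates.
    print(hasattr(torch.Tensor, "isnan_"))  # False
    print(hasattr(torch.Tensor, "isinf_"))  # False

    # The out-of-place forms return a new bool tensor.
    print(torch.isnan(torch.tensor([1.0, float("nan")])))  # tensor([False,  True])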
2023-01-11T21:45:10.4703086Z test_sparse_csr_unary_inplace_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4703528Z test_sparse_csr_unary_inplace_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4703965Z test_sparse_csr_unary_inplace_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4704407Z test_sparse_csr_unary_inplace_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4704853Z test_sparse_csr_unary_inplace_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4705315Z test_sparse_csr_unary_inplace_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4705743Z test_sparse_csr_unary_inplace_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4706175Z test_sparse_csr_unary_inplace_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4706613Z test_sparse_csr_unary_inplace_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707056Z test_sparse_csr_unary_inplace_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707495Z test_sparse_csr_unary_inplace_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4707942Z test_sparse_csr_unary_inplace_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4708382Z test_sparse_csr_unary_inplace_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4708822Z test_sparse_csr_unary_inplace_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4709251Z test_sparse_csr_unary_inplace_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4709688Z test_sparse_csr_unary_inplace_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4710147Z test_sparse_csr_unary_inplace_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4710584Z test_sparse_csr_unary_inplace_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4711022Z test_sparse_csr_unary_inplace_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4711407Z test_sparse_csr_unary_inplace_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4711765Z test_sparse_csr_unary_inplace_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4712131Z test_sparse_csr_unary_inplace_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4712484Z test_sparse_csr_unary_inplace_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4712845Z test_sparse_csr_unary_inplace_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713199Z test_sparse_csr_unary_inplace_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713558Z test_sparse_csr_unary_inplace_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4713900Z test_sparse_csr_unary_inplace_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4714247Z test_sparse_csr_unary_inplace_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4714590Z test_sparse_csr_unary_inplace_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4714933Z test_sparse_csr_unary_inplace_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4715270Z test_sparse_csr_unary_inplace_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4715626Z test_sparse_csr_unary_inplace_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4715988Z test_sparse_csr_unary_inplace_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4716370Z test_sparse_csr_unary_inplace_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.154s)
2023-01-11T21:45:10.4716726Z test_sparse_csr_unary_inplace_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.151s)
2023-01-11T21:45:10.4717079Z test_sparse_csr_unary_inplace_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4717426Z test_sparse_csr_unary_inplace_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4717760Z test_sparse_csr_unary_inplace_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718105Z test_sparse_csr_unary_inplace_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718451Z test_sparse_csr_unary_inplace_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4718784Z test_sparse_csr_unary_inplace_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719125Z test_sparse_csr_unary_inplace_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719473Z test_sparse_csr_unary_inplace_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4719888Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4720348Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4720809Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4721268Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4721768Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4722217Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4722663Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4723117Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4723567Z test_sparse_csr_unary_inplace_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724016Z test_sparse_csr_unary_inplace_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724463Z test_sparse_csr_unary_inplace_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4724918Z test_sparse_csr_unary_inplace_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4725368Z test_sparse_csr_unary_inplace_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4725815Z test_sparse_csr_unary_inplace_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4726248Z test_sparse_csr_unary_inplace_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4726686Z test_sparse_csr_unary_inplace_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4727155Z test_sparse_csr_unary_inplace_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4727597Z test_sparse_csr_unary_inplace_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728023Z test_sparse_csr_unary_inplace_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728461Z test_sparse_csr_unary_inplace_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4728896Z test_sparse_csr_unary_inplace_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4729295Z test_sparse_csr_unary_inplace_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4729652Z test_sparse_csr_unary_inplace_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4730011Z test_sparse_csr_unary_inplace_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4730372Z test_sparse_csr_unary_inplace_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4730733Z test_sparse_csr_unary_inplace_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4731081Z test_sparse_csr_unary_inplace_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4731433Z test_sparse_csr_unary_inplace_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4731788Z test_sparse_csr_unary_inplace_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732133Z test_sparse_csr_unary_inplace_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732513Z test_sparse_csr_unary_inplace_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.008s)
2023-01-11T21:45:10.4732873Z test_sparse_csr_unary_inplace_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733230Z test_sparse_csr_unary_inplace_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733583Z test_sparse_csr_unary_inplace_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4733973Z test_sparse_csr_unary_inplace_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4734324Z test_sparse_csr_unary_inplace_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4734853Z test_sparse_csr_unary_inplace_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735238Z test_sparse_csr_unary_inplace_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735589Z test_sparse_csr_unary_inplace_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4735934Z test_sparse_csr_unary_inplace_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736281Z test_sparse_csr_unary_inplace_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736635Z test_sparse_csr_unary_inplace_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4736989Z test_sparse_csr_unary_inplace_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4737349Z test_sparse_csr_unary_inplace_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.192s)
2023-01-11T21:45:10.4737696Z test_sparse_csr_unary_inplace_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.188s)
2023-01-11T21:45:10.4738046Z test_sparse_csr_unary_inplace_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4738392Z test_sparse_csr_unary_inplace_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4738794Z test_sparse_csr_unary_inplace_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739200Z test_sparse_csr_unary_inplace_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739558Z test_sparse_csr_unary_inplace_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4739903Z test_sparse_csr_unary_inplace_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740245Z test_sparse_csr_unary_inplace_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740597Z test_sparse_csr_unary_inplace_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4740950Z test_sparse_csr_unary_inplace_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741302Z test_sparse_csr_unary_inplace_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741465Z test_sparse_csr_unary_inplace_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741636Z test_sparse_csr_unary_inplace_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741805Z test_sparse_csr_unary_inplace_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4741972Z test_sparse_csr_unary_inplace_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742138Z test_sparse_csr_unary_inplace_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742300Z test_sparse_csr_unary_inplace_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742465Z test_sparse_csr_unary_inplace_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742629Z test_sparse_csr_unary_inplace_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4742876Z test_sparse_csr_unary_inplace_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743097Z test_sparse_csr_unary_inplace_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743316Z test_sparse_csr_unary_inplace_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743534Z test_sparse_csr_unary_inplace_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743747Z test_sparse_csr_unary_inplace_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4743961Z test_sparse_csr_unary_inplace_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744181Z test_sparse_csr_unary_inplace_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744399Z test_sparse_csr_unary_inplace_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744611Z test_sparse_csr_unary_inplace_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744825Z test_sparse_csr_unary_inplace_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Inplace variant not supported! (0.001s)
2023-01-11T21:45:10.4744992Z test_sparse_csr_unary_inplace_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745148Z test_sparse_csr_unary_inplace_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4745322Z test_sparse_csr_unary_inplace_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745497Z test_sparse_csr_unary_inplace_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745667Z test_sparse_csr_unary_inplace_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4745856Z test_sparse_csr_unary_inplace_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746023Z test_sparse_csr_unary_inplace_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746186Z test_sparse_csr_unary_inplace_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4746350Z test_sparse_csr_unary_inplace_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746507Z test_sparse_csr_unary_inplace_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746664Z test_sparse_csr_unary_inplace_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746832Z test_sparse_csr_unary_inplace_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4746995Z test_sparse_csr_unary_inplace_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4747166Z test_sparse_csr_unary_inplace_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747326Z test_sparse_csr_unary_inplace_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4747502Z test_sparse_csr_unary_inplace_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747673Z test_sparse_csr_unary_inplace_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4747845Z test_sparse_csr_unary_inplace_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748003Z test_sparse_csr_unary_inplace_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748172Z test_sparse_csr_unary_inplace_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748363Z test_sparse_csr_unary_inplace_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4748531Z test_sparse_csr_unary_inplace_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4748696Z test_sparse_csr_unary_inplace_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4748855Z test_sparse_csr_unary_inplace_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749019Z test_sparse_csr_unary_inplace_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749178Z test_sparse_csr_unary_inplace_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749340Z test_sparse_csr_unary_inplace_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4749505Z test_sparse_csr_unary_inplace_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4749679Z test_sparse_csr_unary_inplace_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.418s)
2023-01-11T21:45:10.4749854Z test_sparse_csr_unary_inplace_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.430s)
2023-01-11T21:45:10.4750029Z test_sparse_csr_unary_inplace_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.424s)
2023-01-11T21:45:10.4750193Z test_sparse_csr_unary_inplace_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750356Z test_sparse_csr_unary_inplace_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750524Z test_sparse_csr_unary_inplace_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4750690Z test_sparse_csr_unary_inplace_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4750847Z test_sparse_csr_unary_inplace_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751012Z test_sparse_csr_unary_inplace_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751171Z test_sparse_csr_unary_inplace_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751356Z test_sparse_csr_unary_inplace_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751521Z test_sparse_csr_unary_inplace_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4751685Z test_sparse_csr_unary_inplace_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4751856Z test_sparse_csr_unary_inplace_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752026Z test_sparse_csr_unary_inplace_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752195Z test_sparse_csr_unary_inplace_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752358Z test_sparse_csr_unary_inplace_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752524Z test_sparse_csr_unary_inplace_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752694Z test_sparse_csr_unary_inplace_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4752857Z test_sparse_csr_unary_inplace_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753018Z test_sparse_csr_unary_inplace_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753175Z test_sparse_csr_unary_inplace_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753338Z test_sparse_csr_unary_inplace_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753503Z test_sparse_csr_unary_inplace_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4753662Z test_sparse_csr_unary_inplace_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4753848Z test_sparse_csr_unary_inplace_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4754023Z test_sparse_csr_unary_inplace_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754196Z test_sparse_csr_unary_inplace_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754370Z test_sparse_csr_unary_inplace_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754535Z test_sparse_csr_unary_inplace_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754701Z test_sparse_csr_unary_inplace_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4754864Z test_sparse_csr_unary_inplace_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4755029Z test_sparse_csr_unary_inplace_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755185Z test_sparse_csr_unary_inplace_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755350Z test_sparse_csr_unary_inplace_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755515Z test_sparse_csr_unary_inplace_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755677Z test_sparse_csr_unary_inplace_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.007s)
2023-01-11T21:45:10.4755849Z test_sparse_csr_unary_inplace_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756019Z test_sparse_csr_unary_inplace_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756188Z test_sparse_csr_unary_inplace_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756357Z test_sparse_csr_unary_inplace_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756517Z test_sparse_csr_unary_inplace_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756682Z test_sparse_csr_unary_inplace_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4756873Z test_sparse_csr_unary_inplace_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757038Z test_sparse_csr_unary_inplace_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757201Z test_sparse_csr_unary_inplace_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4757364Z test_sparse_csr_unary_out_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4757525Z test_sparse_csr_unary_out_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758153Z test_sparse_csr_unary_out_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/opinfo/core.py:1068: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.)
2023-01-11T21:45:10.4758258Z return self.op(*args, **kwargs)
2023-01-11T21:45:10.4758334Z ok (0.006s)
2023-01-11T21:45:10.4758496Z test_sparse_csr_unary_out_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758656Z test_sparse_csr_unary_out_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758815Z test_sparse_csr_unary_out_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4758975Z test_sparse_csr_unary_out_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759133Z test_sparse_csr_unary_out_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759292Z test_sparse_csr_unary_out_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759444Z test_sparse_csr_unary_out_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759629Z test_sparse_csr_unary_out_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759782Z test_sparse_csr_unary_out_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4759941Z test_sparse_csr_unary_out_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4760102Z test_sparse_csr_unary_out_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4760269Z test_sparse_csr_unary_out_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.202s)
2023-01-11T21:45:10.4760435Z test_sparse_csr_unary_out_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.200s)
2023-01-11T21:45:10.4760598Z test_sparse_csr_unary_out_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.197s)
2023-01-11T21:45:10.4760761Z test_sparse_csr_unary_out_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4760927Z test_sparse_csr_unary_out_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
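The UserWarning interleaved with test_sparse_csr_unary_out_abs_cuda_complex128 above is informational, not a failure: abs of a complex tensor produces real values, and the out= test path appears to copy complex data into a real-dtype tensor at some point, which is what trips the warning in Copy.cpp. The same message can be reproduced with a dense one-liner (a minimal repro of the warning only, unrelated to the suite's inputs):

    import torch

    z = torch.tensor([3.0 + 4.0j])
    r = z.to(torch.float64)  # UserWarning: Casting complex values to real discards the imaginary part
    print(r)                 # tensor([3.], dtype=torch.float64)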
2023-01-11T21:45:10.4761077Z test_sparse_csr_unary_out_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761241Z test_sparse_csr_unary_out_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761400Z test_sparse_csr_unary_out_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761561Z test_sparse_csr_unary_out_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761719Z test_sparse_csr_unary_out_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4761884Z test_sparse_csr_unary_out_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762045Z test_sparse_csr_unary_out_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762212Z test_sparse_csr_unary_out_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762384Z test_sparse_csr_unary_out_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762564Z test_sparse_csr_unary_out_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762728Z test_sparse_csr_unary_out_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4762890Z test_sparse_csr_unary_out_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.006s)
2023-01-11T21:45:10.4763056Z test_sparse_csr_unary_out_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763219Z test_sparse_csr_unary_out_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763400Z test_sparse_csr_unary_out_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763577Z test_sparse_csr_unary_out_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763738Z test_sparse_csr_unary_out_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4763884Z test_sparse_csr_unary_out_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4764054Z test_sparse_csr_unary_out_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764211Z test_sparse_csr_unary_out_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764381Z test_sparse_csr_unary_out_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764546Z test_sparse_csr_unary_out_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764713Z test_sparse_csr_unary_out_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4764873Z test_sparse_csr_unary_out_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765035Z test_sparse_csr_unary_out_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765225Z test_sparse_csr_unary_out_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765375Z test_sparse_csr_unary_out_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765538Z test_sparse_csr_unary_out_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765696Z test_sparse_csr_unary_out_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4765857Z test_sparse_csr_unary_out_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766014Z test_sparse_csr_unary_out_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766177Z test_sparse_csr_unary_out_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766336Z test_sparse_csr_unary_out_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766505Z test_sparse_csr_unary_out_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766661Z test_sparse_csr_unary_out_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766829Z test_sparse_csr_unary_out_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4766990Z test_sparse_csr_unary_out_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767151Z test_sparse_csr_unary_out_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767309Z test_sparse_csr_unary_out_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767468Z test_sparse_csr_unary_out_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767625Z test_sparse_csr_unary_out_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767782Z test_sparse_csr_unary_out_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4767943Z test_sparse_csr_unary_out_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768093Z test_sparse_csr_unary_out_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768280Z test_sparse_csr_unary_out_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768439Z test_sparse_csr_unary_out_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768607Z test_sparse_csr_unary_out_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768773Z test_sparse_csr_unary_out_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4768937Z test_sparse_csr_unary_out_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769099Z test_sparse_csr_unary_out_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769264Z test_sparse_csr_unary_out_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769417Z test_sparse_csr_unary_out_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769578Z test_sparse_csr_unary_out_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769736Z test_sparse_csr_unary_out_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4769898Z test_sparse_csr_unary_out_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770054Z test_sparse_csr_unary_out_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770213Z test_sparse_csr_unary_out_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s)
2023-01-11T21:45:10.4770371Z test_sparse_csr_unary_out_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770531Z test_sparse_csr_unary_out_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770730Z test_sparse_csr_unary_out_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4770882Z test_sparse_csr_unary_out_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771043Z test_sparse_csr_unary_out_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771201Z test_sparse_csr_unary_out_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771360Z test_sparse_csr_unary_out_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771517Z test_sparse_csr_unary_out_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771672Z test_sparse_csr_unary_out_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4771849Z test_sparse_csr_unary_out_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772021Z test_sparse_csr_unary_out_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772194Z test_sparse_csr_unary_out_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772375Z test_sparse_csr_unary_out_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772553Z test_sparse_csr_unary_out_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772724Z test_sparse_csr_unary_out_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4772893Z test_sparse_csr_unary_out_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773060Z test_sparse_csr_unary_out_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773230Z test_sparse_csr_unary_out_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773397Z test_sparse_csr_unary_out_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773561Z test_sparse_csr_unary_out_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773744Z test_sparse_csr_unary_out_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4773908Z test_sparse_csr_unary_out_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774074Z test_sparse_csr_unary_out_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774239Z test_sparse_csr_unary_out_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774404Z test_sparse_csr_unary_out_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774690Z test_sparse_csr_unary_out_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4774855Z test_sparse_csr_unary_out_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775017Z test_sparse_csr_unary_out_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775179Z test_sparse_csr_unary_out_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775329Z test_sparse_csr_unary_out_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775489Z test_sparse_csr_unary_out_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
2023-01-11T21:45:10.4775646Z test_sparse_csr_unary_out_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s)
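conj_physical, which appears in both the inplace and out= lists, is distinct from torch.conj: conj() returns a lazy view with the conjugate bit set, while conj_physical() materializes the negated imaginary parts, which is why it gets its own OpInfo entry. An illustration of the difference (my own example, not from the suite):

    import torch

    z = torch.tensor([1 + 2j, 3 - 4j])

    lazy = z.conj()            # no copy; just flips the conjugate bit
    print(lazy.is_conj())      # True

    phys = z.conj_physical()   # new tensor with imaginary parts negated
    print(phys.is_conj())      # False
    print(phys)                # tensor([1.-2.j, 3.+4.j])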
ok (0.005s) 2023-01-11T21:45:10.4775805Z test_sparse_csr_unary_out_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4775963Z test_sparse_csr_unary_out_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776120Z test_sparse_csr_unary_out_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776323Z test_sparse_csr_unary_out_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776477Z test_sparse_csr_unary_out_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776629Z test_sparse_csr_unary_out_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776785Z test_sparse_csr_unary_out_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4776937Z test_sparse_csr_unary_out_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777091Z test_sparse_csr_unary_out_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777244Z test_sparse_csr_unary_out_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777400Z test_sparse_csr_unary_out_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777564Z test_sparse_csr_unary_out_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777728Z test_sparse_csr_unary_out_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4777891Z test_sparse_csr_unary_out_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778044Z test_sparse_csr_unary_out_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778203Z test_sparse_csr_unary_out_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778358Z test_sparse_csr_unary_out_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778573Z test_sparse_csr_unary_out_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4778730Z test_sparse_csr_unary_out_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.004s) 2023-01-11T21:45:10.4778892Z test_sparse_csr_unary_out_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779054Z test_sparse_csr_unary_out_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779212Z test_sparse_csr_unary_out_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779393Z test_sparse_csr_unary_out_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779552Z test_sparse_csr_unary_out_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779706Z test_sparse_csr_unary_out_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4779858Z test_sparse_csr_unary_out_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780011Z test_sparse_csr_unary_out_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780167Z test_sparse_csr_unary_out_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780320Z test_sparse_csr_unary_out_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780482Z test_sparse_csr_unary_out_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4780637Z test_sparse_csr_unary_out_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780794Z test_sparse_csr_unary_out_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4780951Z test_sparse_csr_unary_out_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781106Z test_sparse_csr_unary_out_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781260Z test_sparse_csr_unary_out_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781413Z test_sparse_csr_unary_out_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781572Z test_sparse_csr_unary_out_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781753Z test_sparse_csr_unary_out_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4781919Z test_sparse_csr_unary_out_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782075Z test_sparse_csr_unary_out_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782237Z test_sparse_csr_unary_out_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782397Z test_sparse_csr_unary_out_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4782603Z test_sparse_csr_unary_out_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4782800Z test_sparse_csr_unary_out_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783006Z test_sparse_csr_unary_out_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783211Z test_sparse_csr_unary_out_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783416Z test_sparse_csr_unary_out_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783614Z test_sparse_csr_unary_out_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783812Z test_sparse_csr_unary_out_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4783996Z test_sparse_csr_unary_out_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784190Z test_sparse_csr_unary_out_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784384Z test_sparse_csr_unary_out_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784575Z test_sparse_csr_unary_out_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784794Z test_sparse_csr_unary_out_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4784984Z test_sparse_csr_unary_out_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785181Z test_sparse_csr_unary_out_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! 
Out not supported (0.001s) 2023-01-11T21:45:10.4785375Z test_sparse_csr_unary_out_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785576Z test_sparse_csr_unary_out_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785766Z test_sparse_csr_unary_out_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4785966Z test_sparse_csr_unary_out_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786166Z test_sparse_csr_unary_out_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786359Z test_sparse_csr_unary_out_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786555Z test_sparse_csr_unary_out_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786748Z test_sparse_csr_unary_out_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4786936Z test_sparse_csr_unary_out_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787129Z test_sparse_csr_unary_out_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787349Z test_sparse_csr_unary_out_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4787522Z test_sparse_csr_unary_out_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4787680Z test_sparse_csr_unary_out_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4787849Z test_sparse_csr_unary_out_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788013Z test_sparse_csr_unary_out_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788174Z test_sparse_csr_unary_out_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788338Z test_sparse_csr_unary_out_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788504Z test_sparse_csr_unary_out_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788671Z test_sparse_csr_unary_out_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788835Z test_sparse_csr_unary_out_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4788989Z test_sparse_csr_unary_out_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789160Z test_sparse_csr_unary_out_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789322Z test_sparse_csr_unary_out_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789490Z test_sparse_csr_unary_out_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789653Z test_sparse_csr_unary_out_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4789813Z test_sparse_csr_unary_out_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4789978Z test_sparse_csr_unary_out_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790144Z test_sparse_csr_unary_out_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790329Z test_sparse_csr_unary_out_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790484Z test_sparse_csr_unary_out_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790647Z test_sparse_csr_unary_out_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790809Z test_sparse_csr_unary_out_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4790973Z test_sparse_csr_unary_out_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791136Z test_sparse_csr_unary_out_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791308Z test_sparse_csr_unary_out_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791472Z test_sparse_csr_unary_out_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791635Z test_sparse_csr_unary_out_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791798Z test_sparse_csr_unary_out_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4791951Z test_sparse_csr_unary_out_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792112Z test_sparse_csr_unary_out_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792268Z test_sparse_csr_unary_out_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792429Z test_sparse_csr_unary_out_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792586Z test_sparse_csr_unary_out_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792776Z test_sparse_csr_unary_out_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4792940Z test_sparse_csr_unary_out_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4807776Z test_sparse_csr_unary_out_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4807970Z test_sparse_csr_unary_out_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808131Z test_sparse_csr_unary_out_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808284Z test_sparse_csr_unary_out_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808443Z test_sparse_csr_unary_out_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808598Z test_sparse_csr_unary_out_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808756Z test_sparse_csr_unary_out_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4808912Z test_sparse_csr_unary_out_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809067Z test_sparse_csr_unary_out_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809222Z test_sparse_csr_unary_out_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4809426Z test_sparse_csr_unary_out_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4809633Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4809831Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810037Z test_sparse_csr_unary_out_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810243Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810498Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810702Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4810902Z test_sparse_csr_unary_out_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811099Z test_sparse_csr_unary_out_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811293Z test_sparse_csr_unary_out_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811490Z test_sparse_csr_unary_out_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811688Z test_sparse_csr_unary_out_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4811886Z test_sparse_csr_unary_out_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812081Z test_sparse_csr_unary_out_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812274Z test_sparse_csr_unary_out_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812467Z test_sparse_csr_unary_out_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812660Z test_sparse_csr_unary_out_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4812879Z test_sparse_csr_unary_out_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813073Z test_sparse_csr_unary_out_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813265Z test_sparse_csr_unary_out_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813447Z test_sparse_csr_unary_out_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... skip: Skipped! Out not supported (0.001s) 2023-01-11T21:45:10.4813612Z test_sparse_csr_unary_out_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4813770Z test_sparse_csr_unary_out_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4813932Z test_sparse_csr_unary_out_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4814095Z test_sparse_csr_unary_out_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814249Z test_sparse_csr_unary_out_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814409Z test_sparse_csr_unary_out_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4814905Z test_sparse_csr_unary_out_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815080Z test_sparse_csr_unary_out_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815238Z test_sparse_csr_unary_out_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815393Z test_sparse_csr_unary_out_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815554Z test_sparse_csr_unary_out_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815715Z test_sparse_csr_unary_out_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4815869Z test_sparse_csr_unary_out_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816081Z test_sparse_csr_unary_out_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816241Z test_sparse_csr_unary_out_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816396Z test_sparse_csr_unary_out_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816540Z test_sparse_csr_unary_out_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816697Z test_sparse_csr_unary_out_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4816852Z test_sparse_csr_unary_out_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817010Z test_sparse_csr_unary_out_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817169Z test_sparse_csr_unary_out_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817333Z test_sparse_csr_unary_out_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817494Z test_sparse_csr_unary_out_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817651Z test_sparse_csr_unary_out_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817799Z test_sparse_csr_unary_out_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4817952Z test_sparse_csr_unary_out_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818100Z test_sparse_csr_unary_out_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818255Z test_sparse_csr_unary_out_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818409Z test_sparse_csr_unary_out_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818706Z test_sparse_csr_unary_out_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4818871Z test_sparse_csr_unary_out_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819025Z test_sparse_csr_unary_out_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819179Z test_sparse_csr_unary_out_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4819326Z test_sparse_csr_unary_out_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819480Z test_sparse_csr_unary_out_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819635Z test_sparse_csr_unary_out_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819790Z test_sparse_csr_unary_out_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4819947Z test_sparse_csr_unary_out_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820102Z test_sparse_csr_unary_out_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820260Z test_sparse_csr_unary_out_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820409Z test_sparse_csr_unary_out_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820558Z test_sparse_csr_unary_out_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820722Z test_sparse_csr_unary_out_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4820878Z test_sparse_csr_unary_out_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821036Z test_sparse_csr_unary_out_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821195Z test_sparse_csr_unary_out_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821355Z test_sparse_csr_unary_out_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821511Z test_sparse_csr_unary_out_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821695Z test_sparse_csr_unary_out_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4821850Z test_sparse_csr_unary_out_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822001Z test_sparse_csr_unary_out_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822158Z test_sparse_csr_unary_out_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822314Z test_sparse_csr_unary_out_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822469Z test_sparse_csr_unary_out_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822632Z test_sparse_csr_unary_out_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822787Z test_sparse_csr_unary_out_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4822946Z test_sparse_csr_unary_out_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823103Z test_sparse_csr_unary_out_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823255Z test_sparse_csr_unary_out_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823417Z test_sparse_csr_unary_out_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823595Z test_sparse_csr_unary_out_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823763Z test_sparse_csr_unary_out_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4823916Z test_sparse_csr_unary_out_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4824093Z test_sparse_csr_unary_out_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824247Z test_sparse_csr_unary_out_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824406Z test_sparse_csr_unary_out_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824557Z test_sparse_csr_unary_out_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824712Z test_sparse_csr_unary_out_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4824869Z test_sparse_csr_unary_out_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825027Z test_sparse_csr_unary_out_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825183Z test_sparse_csr_unary_out_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825339Z test_sparse_csr_unary_out_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825494Z test_sparse_csr_unary_out_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825651Z test_sparse_csr_unary_out_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825806Z test_sparse_csr_unary_out_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4825955Z test_sparse_csr_unary_out_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826107Z test_sparse_csr_unary_out_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826259Z test_sparse_csr_unary_out_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826417Z test_sparse_csr_unary_out_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826566Z test_sparse_csr_unary_out_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826729Z test_sparse_csr_unary_out_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4826892Z test_sparse_csr_unary_out_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827076Z test_sparse_csr_unary_out_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827229Z test_sparse_csr_unary_out_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827377Z test_sparse_csr_unary_out_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827531Z test_sparse_csr_unary_out_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827682Z test_sparse_csr_unary_out_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827836Z test_sparse_csr_unary_out_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4827987Z test_sparse_csr_unary_out_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828145Z test_sparse_csr_unary_out_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828297Z test_sparse_csr_unary_out_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828454Z test_sparse_csr_unary_out_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828602Z test_sparse_csr_unary_out_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4828761Z test_sparse_csr_unary_out_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4828918Z test_sparse_csr_unary_out_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829075Z test_sparse_csr_unary_out_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829229Z test_sparse_csr_unary_out_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829408Z test_sparse_csr_unary_out_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829563Z test_sparse_csr_unary_out_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829718Z test_sparse_csr_unary_out_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4829868Z test_sparse_csr_unary_out_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830011Z test_sparse_csr_unary_out_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830162Z test_sparse_csr_unary_out_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830314Z test_sparse_csr_unary_out_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830467Z test_sparse_csr_unary_out_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830619Z test_sparse_csr_unary_out_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830781Z test_sparse_csr_unary_out_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4830941Z test_sparse_csr_unary_out_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831099Z test_sparse_csr_unary_out_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831247Z test_sparse_csr_unary_out_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831402Z test_sparse_csr_unary_out_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831557Z test_sparse_csr_unary_out_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831711Z test_sparse_csr_unary_out_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4831862Z test_sparse_csr_unary_out_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832019Z test_sparse_csr_unary_out_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832169Z test_sparse_csr_unary_out_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832344Z test_sparse_csr_unary_out_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832498Z test_sparse_csr_unary_out_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832655Z test_sparse_csr_unary_out_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832809Z test_sparse_csr_unary_out_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4832961Z test_sparse_csr_unary_out_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833115Z test_sparse_csr_unary_out_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833265Z test_sparse_csr_unary_out_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... 
ok (0.005s) 2023-01-11T21:45:10.4833423Z test_sparse_csr_unary_out_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833574Z test_sparse_csr_unary_out_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833731Z test_sparse_csr_unary_out_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.005s) 2023-01-11T21:45:10.4833869Z test_sparse_mm_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834014Z test_sparse_mm_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834158Z test_sparse_mm_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.009s) 2023-01-11T21:45:10.4834301Z test_sparse_mm_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.009s) 2023-01-11T21:45:10.4834440Z test_sparse_mm_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834577Z test_sparse_mm_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.008s) 2023-01-11T21:45:10.4834763Z test_sparse_to_sparse_compressed_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4834936Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... skip: NOT IMPL (0.002s) 2023-01-11T21:45:10.4835100Z test_sparse_to_sparse_compressed_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835271Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... skip: NOT IMPL (0.002s) 2023-01-11T21:45:10.4835435Z test_sparse_to_sparse_compressed_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835583Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... ok (0.015s) 2023-01-11T21:45:10.4835747Z test_sparse_to_sparse_compressed_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) 2023-01-11T21:45:10.4835896Z This test tests conversion from COO to CSR and CSC and CSC to CSR and CSC ... ok (0.016s) 2023-01-11T21:45:10.4836057Z test_sparse_triangular_solve_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.189s) 2023-01-11T21:45:10.4836220Z test_sparse_triangular_solve_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.188s) 2023-01-11T21:45:10.4836376Z test_sparse_triangular_solve_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.186s) 2023-01-11T21:45:10.4836528Z test_sparse_triangular_solve_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4836665Z test_sum_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.025s) 2023-01-11T21:45:10.4836793Z test_sum_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.024s) 2023-01-11T21:45:10.4836934Z test_sum_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4837072Z test_sum_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.025s) 2023-01-11T21:45:10.4837205Z test_sum_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.026s) 2023-01-11T21:45:10.4837335Z test_sum_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.031s) 2023-01-11T21:45:10.4837462Z test_sum_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.031s) 2023-01-11T21:45:10.4837593Z test_sum_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4837720Z test_sum_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4837869Z test_sum_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.024s) 2023-01-11T21:45:10.4837998Z test_sum_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.023s) 2023-01-11T21:45:10.4838126Z test_sum_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.023s) 2023-01-11T21:45:10.4838283Z test_transpose_SparseBSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (3.379s) 2023-01-11T21:45:10.4838433Z test_transpose_SparseBSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.283s) 2023-01-11T21:45:10.4838587Z test_transpose_SparseBSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (3.385s) 2023-01-11T21:45:10.4838745Z test_transpose_SparseBSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (3.394s) 2023-01-11T21:45:10.4838902Z test_transpose_SparseBSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.380s) 2023-01-11T21:45:10.4839054Z test_transpose_SparseBSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (3.378s) 2023-01-11T21:45:10.4839205Z test_transpose_SparseBSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.364s) 2023-01-11T21:45:10.4839356Z test_transpose_SparseBSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.295s) 2023-01-11T21:45:10.4839504Z test_transpose_SparseBSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.302s) 2023-01-11T21:45:10.4839650Z test_transpose_SparseBSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.282s) 2023-01-11T21:45:10.4839798Z test_transpose_SparseBSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.293s) 2023-01-11T21:45:10.4839945Z test_transpose_SparseBSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.295s) 2023-01-11T21:45:10.4840098Z test_transpose_SparseBSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (3.451s) 2023-01-11T21:45:10.4840270Z test_transpose_SparseBSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (3.361s) 2023-01-11T21:45:10.4840428Z test_transpose_SparseBSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (3.469s) 2023-01-11T21:45:10.4840586Z test_transpose_SparseBSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (3.474s) 2023-01-11T21:45:10.4840742Z test_transpose_SparseBSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (3.451s) 2023-01-11T21:45:10.4840894Z test_transpose_SparseBSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (3.452s) 2023-01-11T21:45:10.4841042Z test_transpose_SparseBSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (3.440s) 2023-01-11T21:45:10.4841186Z test_transpose_SparseBSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (3.370s) 2023-01-11T21:45:10.4841333Z test_transpose_SparseBSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (3.370s) 2023-01-11T21:45:10.4841477Z test_transpose_SparseBSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (3.363s) 2023-01-11T21:45:10.4841626Z test_transpose_SparseBSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (3.374s) 2023-01-11T21:45:10.4841775Z test_transpose_SparseBSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (3.369s) 2023-01-11T21:45:10.4841933Z test_transpose_SparseCSC_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.590s) 2023-01-11T21:45:10.4842081Z test_transpose_SparseCSC_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (1.545s) 2023-01-11T21:45:10.4842236Z test_transpose_SparseCSC_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.600s) 2023-01-11T21:45:10.4842387Z test_transpose_SparseCSC_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4842541Z test_transpose_SparseCSC_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.588s) 2023-01-11T21:45:10.4842689Z test_transpose_SparseCSC_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.586s) 2023-01-11T21:45:10.4842836Z test_transpose_SparseCSC_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (1.583s) 2023-01-11T21:45:10.4842989Z test_transpose_SparseCSC_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.545s) 2023-01-11T21:45:10.4843137Z test_transpose_SparseCSC_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.546s) 2023-01-11T21:45:10.4843304Z test_transpose_SparseCSC_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.543s) 2023-01-11T21:45:10.4843455Z test_transpose_SparseCSC_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.546s) 2023-01-11T21:45:10.4843602Z test_transpose_SparseCSC_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.550s) 2023-01-11T21:45:10.4843756Z test_transpose_SparseCSR_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (1.645s) 2023-01-11T21:45:10.4843907Z test_transpose_SparseCSR_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (1.597s) 2023-01-11T21:45:10.4844066Z test_transpose_SparseCSR_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (1.653s) 2023-01-11T21:45:10.4844223Z test_transpose_SparseCSR_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (1.657s) 2023-01-11T21:45:10.4844376Z test_transpose_SparseCSR_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (1.645s) 2023-01-11T21:45:10.4844523Z test_transpose_SparseCSR_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (1.640s) 2023-01-11T21:45:10.4844672Z test_transpose_SparseCSR_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (1.641s) 2023-01-11T21:45:10.4844821Z test_transpose_SparseCSR_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (1.603s) 2023-01-11T21:45:10.4844963Z test_transpose_SparseCSR_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4845109Z test_transpose_SparseCSR_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (1.596s) 2023-01-11T21:45:10.4845260Z test_transpose_SparseCSR_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (1.603s) 2023-01-11T21:45:10.4845410Z test_transpose_SparseCSR_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (1.601s) 2023-01-11T21:45:10.4845586Z test_zero_to_zero_correspondence_unary_abs_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4845794Z test_zero_to_zero_correspondence_unary_abs_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4845971Z test_zero_to_zero_correspondence_unary_abs_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846148Z test_zero_to_zero_correspondence_unary_abs_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846321Z test_zero_to_zero_correspondence_unary_abs_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846496Z test_zero_to_zero_correspondence_unary_abs_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846671Z test_zero_to_zero_correspondence_unary_abs_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4846838Z test_zero_to_zero_correspondence_unary_abs_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847009Z test_zero_to_zero_correspondence_unary_abs_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847176Z test_zero_to_zero_correspondence_unary_abs_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847340Z test_zero_to_zero_correspondence_unary_abs_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847511Z test_zero_to_zero_correspondence_unary_abs_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4847675Z test_zero_to_zero_correspondence_unary_abs_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4847843Z test_zero_to_zero_correspondence_unary_angle_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848024Z test_zero_to_zero_correspondence_unary_angle_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848205Z test_zero_to_zero_correspondence_unary_angle_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848382Z test_zero_to_zero_correspondence_unary_angle_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848563Z test_zero_to_zero_correspondence_unary_angle_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848770Z test_zero_to_zero_correspondence_unary_angle_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4848942Z test_zero_to_zero_correspondence_unary_angle_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849115Z test_zero_to_zero_correspondence_unary_angle_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849285Z test_zero_to_zero_correspondence_unary_angle_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849452Z test_zero_to_zero_correspondence_unary_angle_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849620Z test_zero_to_zero_correspondence_unary_angle_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849795Z test_zero_to_zero_correspondence_unary_asin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4849969Z test_zero_to_zero_correspondence_unary_asin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850153Z test_zero_to_zero_correspondence_unary_asin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850330Z test_zero_to_zero_correspondence_unary_asin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850503Z test_zero_to_zero_correspondence_unary_asin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850679Z test_zero_to_zero_correspondence_unary_asin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4850851Z test_zero_to_zero_correspondence_unary_asin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851019Z test_zero_to_zero_correspondence_unary_asin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851217Z test_zero_to_zero_correspondence_unary_asin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851389Z test_zero_to_zero_correspondence_unary_asin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851563Z test_zero_to_zero_correspondence_unary_asin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851732Z test_zero_to_zero_correspondence_unary_asin_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4851899Z test_zero_to_zero_correspondence_unary_asin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852074Z test_zero_to_zero_correspondence_unary_asinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852241Z test_zero_to_zero_correspondence_unary_asinh_cuda_bool (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4852422Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852596Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852779Z test_zero_to_zero_correspondence_unary_asinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4852956Z test_zero_to_zero_correspondence_unary_asinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853125Z test_zero_to_zero_correspondence_unary_asinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853301Z test_zero_to_zero_correspondence_unary_asinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853499Z test_zero_to_zero_correspondence_unary_asinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853695Z test_zero_to_zero_correspondence_unary_asinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4853868Z test_zero_to_zero_correspondence_unary_asinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854040Z test_zero_to_zero_correspondence_unary_asinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854224Z test_zero_to_zero_correspondence_unary_asinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854404Z test_zero_to_zero_correspondence_unary_atan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854686Z test_zero_to_zero_correspondence_unary_atan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4854866Z test_zero_to_zero_correspondence_unary_atan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855045Z test_zero_to_zero_correspondence_unary_atan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855221Z test_zero_to_zero_correspondence_unary_atan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855407Z test_zero_to_zero_correspondence_unary_atan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855584Z test_zero_to_zero_correspondence_unary_atan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855763Z test_zero_to_zero_correspondence_unary_atan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4855926Z test_zero_to_zero_correspondence_unary_atan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856091Z test_zero_to_zero_correspondence_unary_atan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856257Z test_zero_to_zero_correspondence_unary_atan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856430Z test_zero_to_zero_correspondence_unary_atan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856598Z test_zero_to_zero_correspondence_unary_atan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856829Z test_zero_to_zero_correspondence_unary_atanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4856999Z test_zero_to_zero_correspondence_unary_atanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857184Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4857364Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857535Z test_zero_to_zero_correspondence_unary_atanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857709Z test_zero_to_zero_correspondence_unary_atanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4857881Z test_zero_to_zero_correspondence_unary_atanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858054Z test_zero_to_zero_correspondence_unary_atanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858234Z test_zero_to_zero_correspondence_unary_atanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858406Z test_zero_to_zero_correspondence_unary_atanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858627Z test_zero_to_zero_correspondence_unary_atanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858801Z test_zero_to_zero_correspondence_unary_atanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4858971Z test_zero_to_zero_correspondence_unary_atanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859137Z test_zero_to_zero_correspondence_unary_ceil_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859314Z test_zero_to_zero_correspondence_unary_ceil_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859487Z test_zero_to_zero_correspondence_unary_ceil_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859659Z test_zero_to_zero_correspondence_unary_ceil_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4859863Z test_zero_to_zero_correspondence_unary_ceil_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860031Z test_zero_to_zero_correspondence_unary_ceil_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860195Z test_zero_to_zero_correspondence_unary_ceil_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860364Z test_zero_to_zero_correspondence_unary_ceil_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860534Z test_zero_to_zero_correspondence_unary_ceil_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860712Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4860898Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861088Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861277Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861456Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861639Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4861821Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float32 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4862005Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862213Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862389Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862571Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862750Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4862929Z test_zero_to_zero_correspondence_unary_conj_physical_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863116Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863312Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863519Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863700Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4863878Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864047Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864217Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864391Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864566Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864745Z test_zero_to_zero_correspondence_unary_deg2rad_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4864921Z test_zero_to_zero_correspondence_unary_erf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865091Z test_zero_to_zero_correspondence_unary_erf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865288Z test_zero_to_zero_correspondence_unary_erf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865460Z test_zero_to_zero_correspondence_unary_erf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865626Z test_zero_to_zero_correspondence_unary_erf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865796Z test_zero_to_zero_correspondence_unary_erf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4865967Z test_zero_to_zero_correspondence_unary_erf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866133Z test_zero_to_zero_correspondence_unary_erf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866303Z test_zero_to_zero_correspondence_unary_erf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866466Z test_zero_to_zero_correspondence_unary_erf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4866639Z test_zero_to_zero_correspondence_unary_erfinv_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866817Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4866995Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867164Z test_zero_to_zero_correspondence_unary_erfinv_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867338Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867511Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867707Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4867881Z test_zero_to_zero_correspondence_unary_erfinv_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868053Z test_zero_to_zero_correspondence_unary_erfinv_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868231Z test_zero_to_zero_correspondence_unary_expm1_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868403Z test_zero_to_zero_correspondence_unary_expm1_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868577Z test_zero_to_zero_correspondence_unary_expm1_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868745Z test_zero_to_zero_correspondence_unary_expm1_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4868921Z test_zero_to_zero_correspondence_unary_expm1_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869095Z test_zero_to_zero_correspondence_unary_expm1_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869268Z test_zero_to_zero_correspondence_unary_expm1_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869433Z test_zero_to_zero_correspondence_unary_expm1_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869602Z test_zero_to_zero_correspondence_unary_expm1_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869772Z test_zero_to_zero_correspondence_unary_expm1_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4869949Z test_zero_to_zero_correspondence_unary_floor_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870123Z test_zero_to_zero_correspondence_unary_floor_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870289Z test_zero_to_zero_correspondence_unary_floor_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870460Z test_zero_to_zero_correspondence_unary_floor_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870652Z test_zero_to_zero_correspondence_unary_floor_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870827Z test_zero_to_zero_correspondence_unary_floor_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4870996Z test_zero_to_zero_correspondence_unary_floor_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871166Z test_zero_to_zero_correspondence_unary_floor_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4871336Z test_zero_to_zero_correspondence_unary_floor_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871510Z test_zero_to_zero_correspondence_unary_frac_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871688Z test_zero_to_zero_correspondence_unary_frac_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4871854Z test_zero_to_zero_correspondence_unary_frac_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872024Z test_zero_to_zero_correspondence_unary_frac_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872200Z test_zero_to_zero_correspondence_unary_isinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872372Z test_zero_to_zero_correspondence_unary_isinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872551Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872728Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4872908Z test_zero_to_zero_correspondence_unary_isinf_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873110Z test_zero_to_zero_correspondence_unary_isinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873286Z test_zero_to_zero_correspondence_unary_isinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873450Z test_zero_to_zero_correspondence_unary_isinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873620Z test_zero_to_zero_correspondence_unary_isinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873786Z test_zero_to_zero_correspondence_unary_isinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4873957Z test_zero_to_zero_correspondence_unary_isinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874128Z test_zero_to_zero_correspondence_unary_isinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874300Z test_zero_to_zero_correspondence_unary_isinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874475Z test_zero_to_zero_correspondence_unary_isnan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874645Z test_zero_to_zero_correspondence_unary_isnan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874825Z test_zero_to_zero_correspondence_unary_isnan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4874996Z test_zero_to_zero_correspondence_unary_isnan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875167Z test_zero_to_zero_correspondence_unary_isnan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875336Z test_zero_to_zero_correspondence_unary_isnan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875507Z test_zero_to_zero_correspondence_unary_isnan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875677Z test_zero_to_zero_correspondence_unary_isnan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4875847Z test_zero_to_zero_correspondence_unary_isnan_cuda_int32 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4876039Z test_zero_to_zero_correspondence_unary_isnan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876208Z test_zero_to_zero_correspondence_unary_isnan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876375Z test_zero_to_zero_correspondence_unary_isnan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876551Z test_zero_to_zero_correspondence_unary_isneginf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876727Z test_zero_to_zero_correspondence_unary_isneginf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4876906Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877090Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877271Z test_zero_to_zero_correspondence_unary_isneginf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877448Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877629Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877804Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4877980Z test_zero_to_zero_correspondence_unary_isneginf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878150Z test_zero_to_zero_correspondence_unary_isneginf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878355Z test_zero_to_zero_correspondence_unary_isposinf_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878528Z test_zero_to_zero_correspondence_unary_isposinf_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878709Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4878886Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879062Z test_zero_to_zero_correspondence_unary_isposinf_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879234Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879407Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879580Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879748Z test_zero_to_zero_correspondence_unary_isposinf_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4879925Z test_zero_to_zero_correspondence_unary_isposinf_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880097Z test_zero_to_zero_correspondence_unary_log1p_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880269Z test_zero_to_zero_correspondence_unary_log1p_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880448Z test_zero_to_zero_correspondence_unary_log1p_cuda_complex128 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4880628Z test_zero_to_zero_correspondence_unary_log1p_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880800Z test_zero_to_zero_correspondence_unary_log1p_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4880975Z test_zero_to_zero_correspondence_unary_log1p_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881146Z test_zero_to_zero_correspondence_unary_log1p_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881333Z test_zero_to_zero_correspondence_unary_log1p_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881502Z test_zero_to_zero_correspondence_unary_log1p_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881676Z test_zero_to_zero_correspondence_unary_log1p_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4881847Z test_zero_to_zero_correspondence_unary_log1p_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882015Z test_zero_to_zero_correspondence_unary_log1p_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882188Z test_zero_to_zero_correspondence_unary_neg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882364Z test_zero_to_zero_correspondence_unary_neg_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882540Z test_zero_to_zero_correspondence_unary_neg_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882719Z test_zero_to_zero_correspondence_unary_neg_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4882880Z test_zero_to_zero_correspondence_unary_neg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883053Z test_zero_to_zero_correspondence_unary_neg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883229Z test_zero_to_zero_correspondence_unary_neg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883427Z test_zero_to_zero_correspondence_unary_neg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883609Z test_zero_to_zero_correspondence_unary_neg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883800Z test_zero_to_zero_correspondence_unary_neg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4883973Z test_zero_to_zero_correspondence_unary_neg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884144Z test_zero_to_zero_correspondence_unary_neg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884334Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884518Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884703Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4884892Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885082Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int16 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4885263Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885444Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885632Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885818Z test_zero_to_zero_correspondence_unary_nn_functional_relu_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4885999Z test_zero_to_zero_correspondence_unary_positive_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886187Z test_zero_to_zero_correspondence_unary_positive_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886365Z test_zero_to_zero_correspondence_unary_positive_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886545Z test_zero_to_zero_correspondence_unary_positive_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886749Z test_zero_to_zero_correspondence_unary_positive_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4886928Z test_zero_to_zero_correspondence_unary_positive_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887100Z test_zero_to_zero_correspondence_unary_positive_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887276Z test_zero_to_zero_correspondence_unary_positive_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887453Z test_zero_to_zero_correspondence_unary_positive_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887631Z test_zero_to_zero_correspondence_unary_positive_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887807Z test_zero_to_zero_correspondence_unary_positive_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4887977Z test_zero_to_zero_correspondence_unary_positive_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888156Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888333Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888509Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888685Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4888857Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889059Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889237Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889413Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889580Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4889749Z test_zero_to_zero_correspondence_unary_rad2deg_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4889920Z test_zero_to_zero_correspondence_unary_round_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890093Z test_zero_to_zero_correspondence_unary_round_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890265Z test_zero_to_zero_correspondence_unary_round_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890441Z test_zero_to_zero_correspondence_unary_round_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890614Z test_zero_to_zero_correspondence_unary_round_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890782Z test_zero_to_zero_correspondence_unary_round_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4890954Z test_zero_to_zero_correspondence_unary_round_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891119Z test_zero_to_zero_correspondence_unary_round_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891288Z test_zero_to_zero_correspondence_unary_round_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891461Z test_zero_to_zero_correspondence_unary_sgn_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891630Z test_zero_to_zero_correspondence_unary_sgn_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4891810Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892006Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892181Z test_zero_to_zero_correspondence_unary_sgn_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892351Z test_zero_to_zero_correspondence_unary_sgn_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892520Z test_zero_to_zero_correspondence_unary_sgn_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892680Z test_zero_to_zero_correspondence_unary_sgn_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4892854Z test_zero_to_zero_correspondence_unary_sgn_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893023Z test_zero_to_zero_correspondence_unary_sgn_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893189Z test_zero_to_zero_correspondence_unary_sgn_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893364Z test_zero_to_zero_correspondence_unary_sgn_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893534Z test_zero_to_zero_correspondence_unary_sgn_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893707Z test_zero_to_zero_correspondence_unary_sign_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4893876Z test_zero_to_zero_correspondence_unary_sign_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894049Z test_zero_to_zero_correspondence_unary_sign_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894211Z test_zero_to_zero_correspondence_unary_sign_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894420Z test_zero_to_zero_correspondence_unary_sign_cuda_float64 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4894765Z test_zero_to_zero_correspondence_unary_sign_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4894977Z test_zero_to_zero_correspondence_unary_sign_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895144Z test_zero_to_zero_correspondence_unary_sign_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895315Z test_zero_to_zero_correspondence_unary_sign_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895483Z test_zero_to_zero_correspondence_unary_sign_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895661Z test_zero_to_zero_correspondence_unary_signbit_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4895835Z test_zero_to_zero_correspondence_unary_signbit_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896008Z test_zero_to_zero_correspondence_unary_signbit_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896185Z test_zero_to_zero_correspondence_unary_signbit_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896362Z test_zero_to_zero_correspondence_unary_signbit_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896534Z test_zero_to_zero_correspondence_unary_signbit_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896711Z test_zero_to_zero_correspondence_unary_signbit_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4896882Z test_zero_to_zero_correspondence_unary_signbit_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897053Z test_zero_to_zero_correspondence_unary_signbit_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897224Z test_zero_to_zero_correspondence_unary_signbit_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897396Z test_zero_to_zero_correspondence_unary_sin_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897631Z test_zero_to_zero_correspondence_unary_sin_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897808Z test_zero_to_zero_correspondence_unary_sin_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4897980Z test_zero_to_zero_correspondence_unary_sin_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898152Z test_zero_to_zero_correspondence_unary_sin_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898321Z test_zero_to_zero_correspondence_unary_sin_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898493Z test_zero_to_zero_correspondence_unary_sin_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898735Z test_zero_to_zero_correspondence_unary_sin_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4898903Z test_zero_to_zero_correspondence_unary_sin_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899072Z test_zero_to_zero_correspondence_unary_sin_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899228Z test_zero_to_zero_correspondence_unary_sin_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899395Z test_zero_to_zero_correspondence_unary_sin_cuda_int8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4899562Z test_zero_to_zero_correspondence_unary_sin_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899740Z test_zero_to_zero_correspondence_unary_sinh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4899904Z test_zero_to_zero_correspondence_unary_sinh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900120Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900299Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900474Z test_zero_to_zero_correspondence_unary_sinh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900646Z test_zero_to_zero_correspondence_unary_sinh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900814Z test_zero_to_zero_correspondence_unary_sinh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4900989Z test_zero_to_zero_correspondence_unary_sinh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901158Z test_zero_to_zero_correspondence_unary_sinh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901328Z test_zero_to_zero_correspondence_unary_sinh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901498Z test_zero_to_zero_correspondence_unary_sinh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901672Z test_zero_to_zero_correspondence_unary_sinh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4901839Z test_zero_to_zero_correspondence_unary_sinh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902013Z test_zero_to_zero_correspondence_unary_sqrt_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902182Z test_zero_to_zero_correspondence_unary_sqrt_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902355Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902532Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902708Z test_zero_to_zero_correspondence_unary_sqrt_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4902884Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903078Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903254Z test_zero_to_zero_correspondence_unary_sqrt_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903427Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903593Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903758Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4903923Z test_zero_to_zero_correspondence_unary_sqrt_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904095Z test_zero_to_zero_correspondence_unary_sqrt_cuda_uint8 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4904269Z test_zero_to_zero_correspondence_unary_tan_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904445Z test_zero_to_zero_correspondence_unary_tan_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904624Z test_zero_to_zero_correspondence_unary_tan_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904804Z test_zero_to_zero_correspondence_unary_tan_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4904980Z test_zero_to_zero_correspondence_unary_tan_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905152Z test_zero_to_zero_correspondence_unary_tan_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905317Z test_zero_to_zero_correspondence_unary_tan_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905513Z test_zero_to_zero_correspondence_unary_tan_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905683Z test_zero_to_zero_correspondence_unary_tan_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4905852Z test_zero_to_zero_correspondence_unary_tan_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906017Z test_zero_to_zero_correspondence_unary_tan_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906185Z test_zero_to_zero_correspondence_unary_tan_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906355Z test_zero_to_zero_correspondence_unary_tan_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906530Z test_zero_to_zero_correspondence_unary_tanh_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906698Z test_zero_to_zero_correspondence_unary_tanh_cuda_bool (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4906874Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex128 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907051Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907227Z test_zero_to_zero_correspondence_unary_tanh_cuda_complex64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907399Z test_zero_to_zero_correspondence_unary_tanh_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907571Z test_zero_to_zero_correspondence_unary_tanh_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907742Z test_zero_to_zero_correspondence_unary_tanh_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4907914Z test_zero_to_zero_correspondence_unary_tanh_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908081Z test_zero_to_zero_correspondence_unary_tanh_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908254Z test_zero_to_zero_correspondence_unary_tanh_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908438Z test_zero_to_zero_correspondence_unary_tanh_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908607Z test_zero_to_zero_correspondence_unary_tanh_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4908784Z test_zero_to_zero_correspondence_unary_trunc_cuda_bfloat16 (__main__.TestSparseCSRCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4908960Z test_zero_to_zero_correspondence_unary_trunc_cuda_float16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909128Z test_zero_to_zero_correspondence_unary_trunc_cuda_float32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909301Z test_zero_to_zero_correspondence_unary_trunc_cuda_float64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909475Z test_zero_to_zero_correspondence_unary_trunc_cuda_int16 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909647Z test_zero_to_zero_correspondence_unary_trunc_cuda_int32 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909821Z test_zero_to_zero_correspondence_unary_trunc_cuda_int64 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4909983Z test_zero_to_zero_correspondence_unary_trunc_cuda_int8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4910150Z test_zero_to_zero_correspondence_unary_trunc_cuda_uint8 (__main__.TestSparseCSRCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4910296Z test_make_crow_indices (__main__.TestSparseCSRSampler) ... ok (0.411s) 2023-01-11T21:45:10.4910467Z test_clone_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.188s) 2023-01-11T21:45:10.4910630Z test_clone_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.177s) 2023-01-11T21:45:10.4910800Z test_clone_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4910993Z test_clone_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4911161Z test_clone_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.186s) 2023-01-11T21:45:10.4911327Z test_clone_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.185s) 2023-01-11T21:45:10.4911479Z test_clone_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.183s) 2023-01-11T21:45:10.4911642Z test_clone_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.178s) 2023-01-11T21:45:10.4911800Z test_clone_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.177s) 2023-01-11T21:45:10.4911955Z test_clone_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.174s) 2023-01-11T21:45:10.4912117Z test_clone_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.175s) 2023-01-11T21:45:10.4912275Z test_clone_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.178s) 2023-01-11T21:45:10.4912442Z test_clone_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4912603Z test_clone_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.134s) 2023-01-11T21:45:10.4912766Z test_clone_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.144s) 2023-01-11T21:45:10.4912933Z test_clone_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.145s) 2023-01-11T21:45:10.4913095Z test_clone_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4913256Z test_clone_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.143s) 2023-01-11T21:45:10.4913418Z test_clone_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.142s) 2023-01-11T21:45:10.4913580Z test_clone_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.135s) 2023-01-11T21:45:10.4913741Z test_clone_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.135s) 2023-01-11T21:45:10.4913894Z test_clone_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.133s) 2023-01-11T21:45:10.4914069Z test_clone_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.135s) 2023-01-11T21:45:10.4914226Z test_clone_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.134s) 2023-01-11T21:45:10.4914387Z test_clone_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.123s) 2023-01-11T21:45:10.4914549Z test_clone_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.4914714Z test_clone_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4914875Z test_clone_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.122s) 2023-01-11T21:45:10.4915034Z test_clone_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4915195Z test_clone_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s) 2023-01-11T21:45:10.4915356Z test_clone_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.4915511Z test_clone_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.4915665Z test_clone_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.4915817Z test_clone_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.4915974Z test_clone_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.113s) 2023-01-11T21:45:10.4916128Z test_clone_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.4916290Z test_clone_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.088s) 2023-01-11T21:45:10.4916447Z test_clone_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4916640Z test_clone_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.089s) 2023-01-11T21:45:10.4916797Z test_clone_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.4916957Z test_clone_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.087s) 2023-01-11T21:45:10.4917119Z test_clone_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.087s) 2023-01-11T21:45:10.4917275Z test_clone_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.086s) 2023-01-11T21:45:10.4917434Z test_clone_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.080s) 2023-01-11T21:45:10.4917588Z test_clone_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.080s) 2023-01-11T21:45:10.4917742Z test_clone_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.078s) 2023-01-11T21:45:10.4917900Z test_clone_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4918056Z test_clone_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.079s) 2023-01-11T21:45:10.4918230Z test_consistency_SparseBSC_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918401Z test_consistency_SparseBSC_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918585Z test_consistency_SparseBSC_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4918760Z test_consistency_SparseBSC_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4918932Z test_consistency_SparseBSC_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919109Z test_consistency_SparseBSC_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919283Z test_consistency_SparseBSC_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919459Z test_consistency_SparseBSC_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919650Z test_consistency_SparseBSC_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4919824Z test_consistency_SparseBSC_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920000Z test_consistency_SparseBSC_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4920170Z test_consistency_SparseBSC_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920338Z test_consistency_SparseBSC_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4920515Z test_consistency_SparseBSC_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920696Z test_consistency_SparseBSC_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4920877Z test_consistency_SparseBSC_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921053Z test_consistency_SparseBSC_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921224Z test_consistency_SparseBSC_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921394Z test_consistency_SparseBSC_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921566Z test_consistency_SparseBSC_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921739Z test_consistency_SparseBSC_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4921911Z test_consistency_SparseBSC_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922084Z test_consistency_SparseBSC_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922290Z test_consistency_SparseBSC_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922469Z test_consistency_SparseBSC_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922643Z test_consistency_SparseBSC_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922813Z test_consistency_SparseBSC_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4922989Z test_consistency_SparseBSC_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923162Z test_consistency_SparseBSC_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923346Z test_consistency_SparseBSC_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923553Z test_consistency_SparseBSC_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4923739Z test_consistency_SparseBSC_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4923915Z test_consistency_SparseBSC_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924088Z test_consistency_SparseBSC_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924258Z test_consistency_SparseBSC_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924422Z test_consistency_SparseBSC_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924590Z test_consistency_SparseBSC_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924771Z test_consistency_SparseBSC_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4924940Z test_consistency_SparseBSC_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925123Z test_consistency_SparseBSC_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925324Z test_consistency_SparseBSC_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925501Z test_consistency_SparseBSC_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925676Z test_consistency_SparseBSC_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4925839Z test_consistency_SparseBSC_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926013Z test_consistency_SparseBSC_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926185Z test_consistency_SparseBSC_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926355Z test_consistency_SparseBSC_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926532Z test_consistency_SparseBSC_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926705Z test_consistency_SparseBSC_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4926874Z test_consistency_SparseBSC_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927048Z test_consistency_SparseBSC_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927219Z test_consistency_SparseBSC_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927389Z test_consistency_SparseBSC_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927568Z test_consistency_SparseBSC_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927749Z test_consistency_SparseBSC_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4927947Z test_consistency_SparseBSC_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928118Z test_consistency_SparseBSC_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928285Z test_consistency_SparseBSC_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928455Z test_consistency_SparseBSC_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4928626Z test_consistency_SparseBSC_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928794Z test_consistency_SparseBSC_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4928958Z test_consistency_SparseBSC_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929127Z test_consistency_SparseBSC_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929308Z test_consistency_SparseBSC_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929479Z test_consistency_SparseBSC_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929661Z test_consistency_SparseBSC_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4929843Z test_consistency_SparseBSC_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930018Z test_consistency_SparseBSC_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930189Z test_consistency_SparseBSC_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930365Z test_consistency_SparseBSC_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930529Z test_consistency_SparseBSC_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930702Z test_consistency_SparseBSC_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4930903Z test_consistency_SparseBSC_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931077Z test_consistency_SparseBSC_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931244Z test_consistency_SparseBSC_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931411Z test_consistency_SparseBSC_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931586Z test_consistency_SparseBSC_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931758Z test_consistency_SparseBSC_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4931927Z test_consistency_SparseBSC_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4932098Z test_consistency_SparseBSC_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4932270Z test_consistency_SparseBSC_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932439Z test_consistency_SparseBSC_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932608Z test_consistency_SparseBSC_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932779Z test_consistency_SparseBSC_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4932944Z test_consistency_SparseBSC_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4933132Z test_consistency_SparseBSC_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933315Z test_consistency_SparseBSC_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4933534Z test_consistency_SparseBSC_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933739Z test_consistency_SparseBSC_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4933961Z test_consistency_SparseBSC_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934149Z test_consistency_SparseBSC_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934332Z test_consistency_SparseBSC_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934642Z test_consistency_SparseBSC_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4934834Z test_consistency_SparseBSC_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935022Z test_consistency_SparseBSC_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935206Z test_consistency_SparseBSC_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935388Z test_consistency_SparseBSC_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935567Z test_consistency_SparseBSC_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4935748Z test_consistency_SparseBSC_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4935925Z test_consistency_SparseBSC_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936105Z test_consistency_SparseBSC_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936279Z test_consistency_SparseBSC_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936454Z test_consistency_SparseBSC_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936629Z test_consistency_SparseBSC_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4936841Z test_consistency_SparseBSC_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937004Z test_consistency_SparseBSC_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937176Z test_consistency_SparseBSC_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937343Z test_consistency_SparseBSC_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937524Z test_consistency_SparseBSC_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937695Z test_consistency_SparseBSC_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4937870Z test_consistency_SparseBSC_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938042Z test_consistency_SparseBSC_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938215Z test_consistency_SparseBSC_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938385Z test_consistency_SparseBSC_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.4938607Z test_consistency_SparseBSC_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938786Z test_consistency_SparseBSC_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4938956Z test_consistency_SparseBSC_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939123Z test_consistency_SparseBSC_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939330Z test_consistency_SparseBSC_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939511Z test_consistency_SparseBSC_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939687Z test_consistency_SparseBSC_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4939859Z test_consistency_SparseBSC_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940032Z test_consistency_SparseBSC_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940200Z test_consistency_SparseBSC_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940368Z test_consistency_SparseBSC_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940539Z test_consistency_SparseBSC_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940716Z test_consistency_SparseBSC_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4940894Z test_consistency_SparseBSC_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941071Z test_consistency_SparseBSC_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941245Z test_consistency_SparseBSC_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941414Z test_consistency_SparseBSC_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941583Z test_consistency_SparseBSC_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941747Z test_consistency_SparseBSC_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4941920Z test_consistency_SparseBSC_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942090Z test_consistency_SparseBSC_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942264Z test_consistency_SparseBSC_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942459Z test_consistency_SparseBSC_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942637Z test_consistency_SparseBSC_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942809Z test_consistency_SparseBSC_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4942978Z test_consistency_SparseBSC_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943140Z test_consistency_SparseBSC_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943310Z test_consistency_SparseBSC_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4943505Z test_consistency_SparseBSC_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4943699Z test_consistency_SparseBSC_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4943873Z test_consistency_SparseBSC_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4944043Z test_consistency_SparseBSC_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944219Z test_consistency_SparseBSC_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944391Z test_consistency_SparseBSC_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944564Z test_consistency_SparseBSC_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944727Z test_consistency_SparseBSC_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4944927Z test_consistency_SparseBSC_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945097Z test_consistency_SparseBSC_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945280Z test_consistency_SparseBSC_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945458Z test_consistency_SparseBSC_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945636Z test_consistency_SparseBSC_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4945807Z test_consistency_SparseBSC_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4945977Z test_consistency_SparseBSC_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946143Z test_consistency_SparseBSC_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946309Z test_consistency_SparseBSC_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946480Z test_consistency_SparseBSC_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4946651Z test_consistency_SparseBSC_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946821Z test_consistency_SparseBSC_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4946989Z test_consistency_SparseBSC_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947166Z test_consistency_SparseBSC_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947337Z test_consistency_SparseBSC_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947517Z test_consistency_SparseBSC_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4947700Z test_consistency_SparseBSC_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4947865Z test_consistency_SparseBSC_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948058Z test_consistency_SparseBSC_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948229Z test_consistency_SparseBSC_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.4948399Z test_consistency_SparseBSC_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948570Z test_consistency_SparseBSC_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948738Z test_consistency_SparseBSC_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4948909Z test_consistency_SparseBSC_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949086Z test_consistency_SparseBSC_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949269Z test_consistency_SparseBSC_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949442Z test_consistency_SparseBSC_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949623Z test_consistency_SparseBSC_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4949802Z test_consistency_SparseBSC_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4949981Z test_consistency_SparseBSC_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950159Z test_consistency_SparseBSC_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950331Z test_consistency_SparseBSC_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4950529Z test_consistency_SparseBSC_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4950708Z test_consistency_SparseBSC_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4950885Z test_consistency_SparseBSC_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4951059Z test_consistency_SparseBSC_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951234Z test_consistency_SparseBSC_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951411Z test_consistency_SparseBSC_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4951585Z test_consistency_SparseBSC_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951765Z test_consistency_SparseBSC_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4951943Z test_consistency_SparseBSC_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952118Z test_consistency_SparseBSC_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952288Z test_consistency_SparseBSC_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4952456Z test_consistency_SparseBSC_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952638Z test_consistency_SparseBSC_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.4952812Z test_consistency_SparseBSC_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.4952985Z test_consistency_SparseBSC_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s)
2023-01-11T21:45:10.4953166Z test_consistency_SparseBSC_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953362Z test_consistency_SparseBSC_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953596Z test_consistency_SparseBSC_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953767Z test_consistency_SparseBSC_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4953935Z test_consistency_SparseBSC_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954099Z test_consistency_SparseBSC_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954269Z test_consistency_SparseBSC_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954441Z test_consistency_SparseBSC_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954612Z test_consistency_SparseBSC_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4954788Z test_consistency_SparseBSC_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4955046Z test_consistency_SparseBSC_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955301Z test_consistency_SparseBSC_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955556Z test_consistency_SparseBSC_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4955802Z test_consistency_SparseBSC_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956078Z test_consistency_SparseBSC_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956326Z test_consistency_SparseBSC_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956559Z test_consistency_SparseBSC_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4956805Z test_consistency_SparseBSC_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957045Z test_consistency_SparseBSC_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957296Z test_consistency_SparseBSC_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957546Z test_consistency_SparseBSC_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ...
skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4957790Z test_consistency_SparseBSC_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958037Z test_consistency_SparseBSC_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958284Z test_consistency_SparseBSC_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958528Z test_consistency_SparseBSC_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4958796Z test_consistency_SparseBSC_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959044Z test_consistency_SparseBSC_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959286Z test_consistency_SparseBSC_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959535Z test_consistency_SparseBSC_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4959769Z test_consistency_SparseBSC_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960030Z test_consistency_SparseBSC_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960282Z test_consistency_SparseBSC_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960527Z test_consistency_SparseBSC_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4960776Z test_consistency_SparseBSC_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961014Z test_consistency_SparseBSC_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961283Z test_consistency_SparseBSC_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961523Z test_consistency_SparseBSC_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961759Z test_consistency_SparseBSC_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ...
skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4961999Z test_consistency_SparseBSC_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962239Z test_consistency_SparseBSC_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962492Z test_consistency_SparseBSC_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962725Z test_consistency_SparseBSC_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4962977Z test_consistency_SparseBSC_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963231Z test_consistency_SparseBSC_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963475Z test_consistency_SparseBSC_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963723Z test_consistency_SparseBSC_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4963985Z test_consistency_SparseBSC_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964231Z test_consistency_SparseBSC_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964470Z test_consistency_SparseBSC_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964708Z test_consistency_SparseBSC_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4964953Z test_consistency_SparseBSC_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965196Z test_consistency_SparseBSC_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965446Z test_consistency_SparseBSC_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965691Z test_consistency_SparseBSC_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4965937Z test_consistency_SparseBSC_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966268Z test_consistency_SparseBSC_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966513Z test_consistency_SparseBSC_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966754Z test_consistency_SparseBSC_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4966991Z test_consistency_SparseBSC_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967233Z test_consistency_SparseBSC_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967479Z test_consistency_SparseBSC_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967725Z test_consistency_SparseBSC_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4967967Z test_consistency_SparseBSC_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4968212Z test_consistency_SparseBSC_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4968390Z test_consistency_SparseBSC_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968570Z test_consistency_SparseBSC_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968744Z test_consistency_SparseBSC_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4968948Z test_consistency_SparseBSC_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969122Z test_consistency_SparseBSC_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969298Z test_consistency_SparseBSC_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969473Z test_consistency_SparseBSC_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969644Z test_consistency_SparseBSC_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4969814Z test_consistency_SparseBSC_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4969984Z test_consistency_SparseBSC_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970157Z test_consistency_SparseBSC_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970320Z test_consistency_SparseBSC_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4970514Z test_consistency_SparseBSC_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4970703Z test_consistency_SparseBSC_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4970894Z test_consistency_SparseBSC_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971083Z test_consistency_SparseBSC_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971268Z test_consistency_SparseBSC_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4971492Z test_consistency_SparseBSC_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4971677Z test_consistency_SparseBSC_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4971865Z test_consistency_SparseBSC_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972042Z test_consistency_SparseBSC_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4972225Z test_consistency_SparseBSC_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972410Z test_consistency_SparseBSC_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972596Z test_consistency_SparseBSC_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972779Z test_consistency_SparseBSC_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4972960Z test_consistency_SparseBSC_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973143Z test_consistency_SparseBSC_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973321Z test_consistency_SparseBSC_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4973499Z test_consistency_SparseBSC_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4973665Z test_consistency_SparseBSC_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4973869Z test_consistency_SparseBSC_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974067Z test_consistency_SparseBSC_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974240Z test_consistency_SparseBSC_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4974420Z test_consistency_SparseBSC_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4974838Z test_consistency_SparseBSC_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975030Z test_consistency_SparseBSC_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975205Z test_consistency_SparseBSC_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975380Z test_consistency_SparseBSC_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975547Z test_consistency_SparseBSC_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4975717Z test_consistency_SparseBSC_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4975895Z test_consistency_SparseBSC_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976076Z test_consistency_SparseBSC_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976254Z test_consistency_SparseBSC_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4976506Z test_consistency_SparseBSC_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4976760Z test_consistency_SparseBSC_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977011Z test_consistency_SparseBSC_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977260Z test_consistency_SparseBSC_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977552Z test_consistency_SparseBSC_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4977793Z test_consistency_SparseBSC_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4978031Z test_consistency_SparseBSC_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4978212Z test_consistency_SparseBSC_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978389Z test_consistency_SparseBSC_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978629Z test_consistency_SparseBSC_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978810Z test_consistency_SparseBSC_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4978988Z test_consistency_SparseBSC_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4979162Z test_consistency_SparseBSC_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979335Z test_consistency_SparseBSC_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979511Z test_consistency_SparseBSC_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4979679Z test_consistency_SparseBSC_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4979856Z test_consistency_SparseBSC_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980030Z test_consistency_SparseBSC_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4980210Z test_consistency_SparseBSC_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980413Z test_consistency_SparseBSC_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4980587Z test_consistency_SparseBSC_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980761Z test_consistency_SparseBSC_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4980934Z test_consistency_SparseBSC_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981102Z test_consistency_SparseBSC_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981275Z test_consistency_SparseBSC_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981449Z test_consistency_SparseBSC_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981624Z test_consistency_SparseBSC_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4981798Z test_consistency_SparseBSC_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4981968Z test_consistency_SparseBSC_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4982146Z test_consistency_SparseBSC_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982317Z test_consistency_SparseBSC_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982493Z test_consistency_SparseBSC_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982663Z test_consistency_SparseBSC_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4982834Z test_consistency_SparseBSC_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4983037Z test_consistency_SparseBSC_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983213Z test_consistency_SparseBSC_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4983388Z test_consistency_SparseBSC_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983559Z test_consistency_SparseBSC_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983730Z test_consistency_SparseBSC_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4983913Z test_consistency_SparseBSC_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984095Z test_consistency_SparseBSC_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984270Z test_consistency_SparseBSC_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984453Z test_consistency_SparseBSC_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984631Z test_consistency_SparseBSC_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984811Z test_consistency_SparseBSC_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4984984Z test_consistency_SparseBSC_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985155Z test_consistency_SparseBSC_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985328Z test_consistency_SparseBSC_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s)
2023-01-11T21:45:10.4985504Z test_consistency_SparseBSC_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.4985674Z test_consistency_SparseBSC_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4985843Z test_consistency_SparseBSC_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986049Z test_consistency_SparseBSC_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986223Z test_consistency_SparseBSC_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986394Z test_consistency_SparseBSC_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986568Z test_consistency_SparseBSC_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986740Z test_consistency_SparseBSC_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4986918Z test_consistency_SparseBSC_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987089Z test_consistency_SparseBSC_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987258Z test_consistency_SparseBSC_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987431Z test_consistency_SparseBSC_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987603Z test_consistency_SparseBSC_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987775Z test_consistency_SparseBSC_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4987952Z test_consistency_SparseBSC_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988120Z test_consistency_SparseBSC_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988300Z test_consistency_SparseBSC_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988479Z test_consistency_SparseBSC_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988681Z test_consistency_SparseBSC_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4988853Z test_consistency_SparseBSC_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989024Z test_consistency_SparseBSC_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989194Z test_consistency_SparseBSC_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989369Z test_consistency_SparseBSC_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989542Z test_consistency_SparseBSC_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989713Z test_consistency_SparseBSC_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4989885Z test_consistency_SparseBSC_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990062Z test_consistency_SparseBSC_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990238Z test_consistency_SparseBSC_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4990406Z test_consistency_SparseBSC_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990588Z test_consistency_SparseBSC_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990767Z test_consistency_SparseBSC_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4990943Z test_consistency_SparseBSC_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991119Z test_consistency_SparseBSC_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991289Z test_consistency_SparseBSC_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991466Z test_consistency_SparseBSC_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991638Z test_consistency_SparseBSC_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991832Z test_consistency_SparseBSC_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4991999Z test_consistency_SparseBSC_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992172Z test_consistency_SparseBSC_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992342Z test_consistency_SparseBSC_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992516Z test_consistency_SparseBSC_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992689Z test_consistency_SparseBSC_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4992870Z test_consistency_SparseBSC_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993047Z test_consistency_SparseBSC_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993231Z test_consistency_SparseBSC_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993417Z test_consistency_SparseBSC_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993620Z test_consistency_SparseBSC_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993800Z test_consistency_SparseBSC_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4993971Z test_consistency_SparseBSC_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994144Z test_consistency_SparseBSC_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994346Z test_consistency_SparseBSC_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994517Z test_consistency_SparseBSC_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994688Z test_consistency_SparseBSC_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4994867Z test_consistency_SparseBSC_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995031Z test_consistency_SparseBSC_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995212Z test_consistency_SparseBSC_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.4995392Z test_consistency_SparseBSC_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995569Z test_consistency_SparseBSC_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995745Z test_consistency_SparseBSC_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4995922Z test_consistency_SparseBSC_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996094Z test_consistency_SparseBSC_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996267Z test_consistency_SparseBSC_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996440Z test_consistency_SparseBSC_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996606Z test_consistency_SparseBSC_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996774Z test_consistency_SparseBSC_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4996948Z test_consistency_SparseBSC_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.4997201Z test_consistency_SparseBSC_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997473Z test_consistency_SparseBSC_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997726Z test_consistency_SparseBSC_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4997977Z test_consistency_SparseBSC_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998223Z test_consistency_SparseBSC_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998464Z test_consistency_SparseBSC_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998714Z test_consistency_SparseBSC_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4998956Z test_consistency_SparseBSC_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999186Z test_consistency_SparseBSC_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999431Z test_consistency_SparseBSC_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999673Z test_consistency_SparseBSC_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.4999937Z test_consistency_SparseBSC_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ...
skip: to_sparse does not support input with torch.sparse_bsc layout (0.002s)
2023-01-11T21:45:10.5000116Z test_consistency_SparseBSC_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000290Z test_consistency_SparseBSC_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000460Z test_consistency_SparseBSC_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000637Z test_consistency_SparseBSC_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000807Z test_consistency_SparseBSC_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5000982Z test_consistency_SparseBSC_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001151Z test_consistency_SparseBSC_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001323Z test_consistency_SparseBSC_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001492Z test_consistency_SparseBSC_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5001665Z test_consistency_SparseBSR_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5001833Z test_consistency_SparseBSR_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5002009Z test_consistency_SparseBSR_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002182Z test_consistency_SparseBSR_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002354Z test_consistency_SparseBSR_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002531Z test_consistency_SparseBSR_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002719Z test_consistency_SparseBSR_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5002892Z test_consistency_SparseBSR_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5003064Z test_consistency_SparseBSR_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003234Z test_consistency_SparseBSR_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003401Z test_consistency_SparseBSR_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003569Z test_consistency_SparseBSR_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003737Z test_consistency_SparseBSR_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5003914Z test_consistency_SparseBSR_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004093Z test_consistency_SparseBSR_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004272Z test_consistency_SparseBSR_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004445Z test_consistency_SparseBSR_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004617Z test_consistency_SparseBSR_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5004787Z test_consistency_SparseBSR_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5004957Z test_consistency_SparseBSR_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005131Z test_consistency_SparseBSR_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005332Z test_consistency_SparseBSR_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005510Z test_consistency_SparseBSR_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005676Z test_consistency_SparseBSR_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5005854Z test_consistency_SparseBSR_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006027Z test_consistency_SparseBSR_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006206Z test_consistency_SparseBSR_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006385Z test_consistency_SparseBSR_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006559Z test_consistency_SparseBSR_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006739Z test_consistency_SparseBSR_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5006912Z test_consistency_SparseBSR_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007082Z test_consistency_SparseBSR_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007249Z test_consistency_SparseBSR_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007424Z test_consistency_SparseBSR_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007597Z test_consistency_SparseBSR_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007769Z test_consistency_SparseBSR_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5007940Z test_consistency_SparseBSR_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5008123Z test_consistency_SparseBSR_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008318Z test_consistency_SparseBSR_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008503Z test_consistency_SparseBSR_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008683Z test_consistency_SparseBSR_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5008855Z test_consistency_SparseBSR_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009033Z test_consistency_SparseBSR_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009205Z test_consistency_SparseBSR_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009384Z test_consistency_SparseBSR_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009559Z test_consistency_SparseBSR_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5009736Z test_consistency_SparseBSR_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5009911Z test_consistency_SparseBSR_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010084Z test_consistency_SparseBSR_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010259Z test_consistency_SparseBSR_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010429Z test_consistency_SparseBSR_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010601Z test_consistency_SparseBSR_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010780Z test_consistency_SparseBSR_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5010999Z test_consistency_SparseBSR_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011183Z test_consistency_SparseBSR_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011359Z test_consistency_SparseBSR_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011533Z test_consistency_SparseBSR_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011702Z test_consistency_SparseBSR_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5011868Z test_consistency_SparseBSR_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012039Z test_consistency_SparseBSR_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012209Z test_consistency_SparseBSR_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012385Z test_consistency_SparseBSR_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012562Z test_consistency_SparseBSR_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012742Z test_consistency_SparseBSR_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5012911Z test_consistency_SparseBSR_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013091Z test_consistency_SparseBSR_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013269Z test_consistency_SparseBSR_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013439Z test_consistency_SparseBSR_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013614Z test_consistency_SparseBSR_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013789Z test_consistency_SparseBSR_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5013987Z test_consistency_SparseBSR_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014165Z test_consistency_SparseBSR_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014340Z test_consistency_SparseBSR_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014631Z test_consistency_SparseBSR_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5014805Z test_consistency_SparseBSR_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5014973Z test_consistency_SparseBSR_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015142Z test_consistency_SparseBSR_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015319Z test_consistency_SparseBSR_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015491Z test_consistency_SparseBSR_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015661Z test_consistency_SparseBSR_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5015831Z test_consistency_SparseBSR_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5015999Z test_consistency_SparseBSR_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016165Z test_consistency_SparseBSR_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016334Z test_consistency_SparseBSR_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016502Z test_consistency_SparseBSR_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5016726Z test_consistency_SparseBSR_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5016913Z test_consistency_SparseBSR_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5017106Z test_consistency_SparseBSR_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017297Z test_consistency_SparseBSR_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017488Z test_consistency_SparseBSR_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017674Z test_consistency_SparseBSR_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5017858Z test_consistency_SparseBSR_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018046Z test_consistency_SparseBSR_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018232Z test_consistency_SparseBSR_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018409Z test_consistency_SparseBSR_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018652Z test_consistency_SparseBSR_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5018839Z test_consistency_SparseBSR_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5019020Z test_consistency_SparseBSR_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5019205Z test_consistency_SparseBSR_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019382Z test_consistency_SparseBSR_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019564Z test_consistency_SparseBSR_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5019740Z test_consistency_SparseBSR_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5019954Z test_consistency_SparseBSR_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020124Z test_consistency_SparseBSR_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020299Z test_consistency_SparseBSR_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020468Z test_consistency_SparseBSR_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020643Z test_consistency_SparseBSR_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020811Z test_consistency_SparseBSR_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5020985Z test_consistency_SparseBSR_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021158Z test_consistency_SparseBSR_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021330Z test_consistency_SparseBSR_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021501Z test_consistency_SparseBSR_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021665Z test_consistency_SparseBSR_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5021836Z test_consistency_SparseBSR_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022003Z test_consistency_SparseBSR_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022170Z test_consistency_SparseBSR_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022367Z test_consistency_SparseBSR_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022539Z test_consistency_SparseBSR_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022718Z test_consistency_SparseBSR_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5022897Z test_consistency_SparseBSR_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023065Z test_consistency_SparseBSR_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023242Z test_consistency_SparseBSR_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023447Z test_consistency_SparseBSR_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023637Z test_consistency_SparseBSR_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023812Z test_consistency_SparseBSR_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5023984Z test_consistency_SparseBSR_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024160Z test_consistency_SparseBSR_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5024338Z test_consistency_SparseBSR_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024513Z test_consistency_SparseBSR_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ...
ok (0.003s)
2023-01-11T21:45:10.5024683Z test_consistency_SparseBSR_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5024856Z test_consistency_SparseBSR_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025026Z test_consistency_SparseBSR_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025201Z test_consistency_SparseBSR_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025374Z test_consistency_SparseBSR_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025572Z test_consistency_SparseBSR_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025748Z test_consistency_SparseBSR_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5025917Z test_consistency_SparseBSR_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026097Z test_consistency_SparseBSR_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026265Z test_consistency_SparseBSR_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026442Z test_consistency_SparseBSR_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026618Z test_consistency_SparseBSR_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5026793Z test_consistency_SparseBSR_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5026968Z test_consistency_SparseBSR_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027139Z test_consistency_SparseBSR_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027316Z test_consistency_SparseBSR_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027485Z test_consistency_SparseBSR_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5027659Z test_consistency_SparseBSR_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5027827Z test_consistency_SparseBSR_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028026Z test_consistency_SparseBSR_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028202Z test_consistency_SparseBSR_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028383Z test_consistency_SparseBSR_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5028555Z test_consistency_SparseBSR_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5028739Z test_consistency_SparseBSR_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5028919Z test_consistency_SparseBSR_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5029101Z test_consistency_SparseBSR_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
2023-01-11T21:45:10.5029268Z test_consistency_SparseBSR_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029442Z test_consistency_SparseBSR_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s)
2023-01-11T21:45:10.5029617Z test_consistency_SparseBSR_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029790Z test_consistency_SparseBSR_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5029963Z test_consistency_SparseBSR_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030135Z test_consistency_SparseBSR_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030306Z test_consistency_SparseBSR_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030478Z test_consistency_SparseBSR_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030659Z test_consistency_SparseBSR_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5030828Z test_consistency_SparseBSR_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031010Z test_consistency_SparseBSR_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031218Z test_consistency_SparseBSR_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031396Z test_consistency_SparseBSR_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031567Z test_consistency_SparseBSR_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031741Z test_consistency_SparseBSR_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5031912Z test_consistency_SparseBSR_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032084Z test_consistency_SparseBSR_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032258Z test_consistency_SparseBSR_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032424Z test_consistency_SparseBSR_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032598Z test_consistency_SparseBSR_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032783Z test_consistency_SparseBSR_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5032959Z test_consistency_SparseBSR_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033141Z test_consistency_SparseBSR_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033324Z test_consistency_SparseBSR_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033505Z test_consistency_SparseBSR_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033740Z test_consistency_SparseBSR_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5033939Z test_consistency_SparseBSR_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034106Z test_consistency_SparseBSR_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034281Z test_consistency_SparseBSR_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s)
2023-01-11T21:45:10.5034460Z test_consistency_SparseBSR_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ...
ok (0.002s) 2023-01-11T21:45:10.5034643Z test_consistency_SparseBSR_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5034816Z test_consistency_SparseBSR_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5034992Z test_consistency_SparseBSR_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035178Z test_consistency_SparseBSR_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035359Z test_consistency_SparseBSR_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035537Z test_consistency_SparseBSR_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035704Z test_consistency_SparseBSR_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5035878Z test_consistency_SparseBSR_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036055Z test_consistency_SparseBSR_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036235Z test_consistency_SparseBSR_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5036412Z test_consistency_SparseBSR_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036590Z test_consistency_SparseBSR_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036794Z test_consistency_SparseBSR_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5036979Z test_consistency_SparseBSR_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037157Z test_consistency_SparseBSR_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037324Z test_consistency_SparseBSR_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037494Z test_consistency_SparseBSR_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037667Z test_consistency_SparseBSR_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5037843Z test_consistency_SparseBSR_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038015Z test_consistency_SparseBSR_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038192Z test_consistency_SparseBSR_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038364Z test_consistency_SparseBSR_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5038624Z test_consistency_SparseBSR_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5038880Z test_consistency_SparseBSR_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039128Z test_consistency_SparseBSR_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039401Z test_consistency_SparseBSR_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039655Z test_consistency_SparseBSR_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5039904Z test_consistency_SparseBSR_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040145Z test_consistency_SparseBSR_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040393Z test_consistency_SparseBSR_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040639Z test_consistency_SparseBSR_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5040892Z test_consistency_SparseBSR_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041140Z test_consistency_SparseBSR_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041381Z test_consistency_SparseBSR_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041623Z test_consistency_SparseBSR_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5041867Z test_consistency_SparseBSR_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042133Z test_consistency_SparseBSR_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042366Z test_consistency_SparseBSR_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042614Z test_consistency_SparseBSR_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5042854Z test_consistency_SparseBSR_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043109Z test_consistency_SparseBSR_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043355Z test_consistency_SparseBSR_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043615Z test_consistency_SparseBSR_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5043869Z test_consistency_SparseBSR_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044115Z test_consistency_SparseBSR_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044387Z test_consistency_SparseBSR_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044628Z test_consistency_SparseBSR_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5044866Z test_consistency_SparseBSR_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045106Z test_consistency_SparseBSR_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045343Z test_consistency_SparseBSR_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045578Z test_consistency_SparseBSR_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5045826Z test_consistency_SparseBSR_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046073Z test_consistency_SparseBSR_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046319Z test_consistency_SparseBSR_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046574Z test_consistency_SparseBSR_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5046827Z test_consistency_SparseBSR_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047109Z test_consistency_SparseBSR_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047360Z test_consistency_SparseBSR_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047604Z test_consistency_SparseBSR_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5047851Z test_consistency_SparseBSR_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048096Z test_consistency_SparseBSR_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048340Z test_consistency_SparseBSR_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048584Z test_consistency_SparseBSR_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5048817Z test_consistency_SparseBSR_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049062Z test_consistency_SparseBSR_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049310Z test_consistency_SparseBSR_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049601Z test_consistency_SparseBSR_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5049848Z test_consistency_SparseBSR_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050091Z test_consistency_SparseBSR_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050337Z test_consistency_SparseBSR_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050575Z test_consistency_SparseBSR_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5050823Z test_consistency_SparseBSR_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051072Z test_consistency_SparseBSR_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051314Z test_consistency_SparseBSR_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051558Z test_consistency_SparseBSR_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051794Z test_consistency_SparseBSR_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5051973Z test_consistency_SparseBSR_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052224Z test_consistency_SparseBSR_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5052406Z test_consistency_SparseBSR_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052578Z test_consistency_SparseBSR_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052753Z test_consistency_SparseBSR_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5052929Z test_consistency_SparseBSR_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5053102Z test_consistency_SparseBSR_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5053276Z test_consistency_SparseBSR_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053442Z test_consistency_SparseBSR_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053615Z test_consistency_SparseBSR_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053786Z test_consistency_SparseBSR_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5053954Z test_consistency_SparseBSR_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5054146Z test_consistency_SparseBSR_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054337Z test_consistency_SparseBSR_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054674Z test_consistency_SparseBSR_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5054911Z test_consistency_SparseBSR_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5055155Z test_consistency_SparseBSR_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055341Z test_consistency_SparseBSR_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5055532Z test_consistency_SparseBSR_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055720Z test_consistency_SparseBSR_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5055907Z test_consistency_SparseBSR_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056093Z test_consistency_SparseBSR_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056281Z test_consistency_SparseBSR_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056470Z test_consistency_SparseBSR_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056658Z test_consistency_SparseBSR_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5056840Z test_consistency_SparseBSR_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5057015Z test_consistency_SparseBSR_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5057199Z test_consistency_SparseBSR_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057378Z test_consistency_SparseBSR_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5057553Z test_consistency_SparseBSR_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057734Z test_consistency_SparseBSR_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5057915Z test_consistency_SparseBSR_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5058123Z test_consistency_SparseBSR_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5058310Z test_consistency_SparseBSR_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058490Z test_consistency_SparseBSR_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058743Z test_consistency_SparseBSR_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5058921Z test_consistency_SparseBSR_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059096Z test_consistency_SparseBSR_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059274Z test_consistency_SparseBSR_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059449Z test_consistency_SparseBSR_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059629Z test_consistency_SparseBSR_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059808Z test_consistency_SparseBSR_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5059984Z test_consistency_SparseBSR_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5060236Z test_consistency_SparseBSR_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5060487Z test_consistency_SparseBSR_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5060759Z test_consistency_SparseBSR_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061013Z test_consistency_SparseBSR_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061263Z test_consistency_SparseBSR_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061509Z test_consistency_SparseBSR_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061753Z test_consistency_SparseBSR_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5061933Z test_consistency_SparseBSR_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062114Z test_consistency_SparseBSR_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5062290Z test_consistency_SparseBSR_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062461Z test_consistency_SparseBSR_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5062635Z test_consistency_SparseBSR_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5062804Z test_consistency_SparseBSR_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5062978Z test_consistency_SparseBSR_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063152Z test_consistency_SparseBSR_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063329Z test_consistency_SparseBSR_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063508Z test_consistency_SparseBSR_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5063710Z test_consistency_SparseBSR_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5063894Z test_consistency_SparseBSR_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064069Z test_consistency_SparseBSR_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064241Z test_consistency_SparseBSR_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064407Z test_consistency_SparseBSR_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064580Z test_consistency_SparseBSR_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064754Z test_consistency_SparseBSR_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5064929Z test_consistency_SparseBSR_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065104Z test_consistency_SparseBSR_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065276Z test_consistency_SparseBSR_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065449Z test_consistency_SparseBSR_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065618Z test_consistency_SparseBSR_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5065793Z test_consistency_SparseBSR_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5065958Z test_consistency_SparseBSR_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5066135Z test_consistency_SparseBSR_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066344Z test_consistency_SparseBSR_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066520Z test_consistency_SparseBSR_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5066691Z test_consistency_SparseBSR_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5066863Z test_consistency_SparseBSR_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067038Z test_consistency_SparseBSR_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5067208Z test_consistency_SparseBSR_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067373Z test_consistency_SparseBSR_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067555Z test_consistency_SparseBSR_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5067737Z test_consistency_SparseBSR_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5067920Z test_consistency_SparseBSR_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068101Z test_consistency_SparseBSR_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068279Z test_consistency_SparseBSR_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068455Z test_consistency_SparseBSR_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068626Z test_consistency_SparseBSR_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068804Z test_consistency_SparseBSR_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5068972Z test_consistency_SparseBSR_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5069150Z test_consistency_SparseBSR_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5069347Z test_consistency_SparseBSR_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069519Z test_consistency_SparseBSR_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069700Z test_consistency_SparseBSR_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5069873Z test_consistency_SparseBSR_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070044Z test_consistency_SparseBSR_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070215Z test_consistency_SparseBSR_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070391Z test_consistency_SparseBSR_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070561Z test_consistency_SparseBSR_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070735Z test_consistency_SparseBSR_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5070909Z test_consistency_SparseBSR_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071081Z test_consistency_SparseBSR_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071254Z test_consistency_SparseBSR_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071422Z test_consistency_SparseBSR_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071597Z test_consistency_SparseBSR_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071766Z test_consistency_SparseBSR_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5071975Z test_consistency_SparseBSR_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5072151Z test_consistency_SparseBSR_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072329Z test_consistency_SparseBSR_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072506Z test_consistency_SparseBSR_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072677Z test_consistency_SparseBSR_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5072852Z test_consistency_SparseBSR_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073025Z test_consistency_SparseBSR_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073200Z test_consistency_SparseBSR_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073398Z test_consistency_SparseBSR_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073589Z test_consistency_SparseBSR_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073764Z test_consistency_SparseBSR_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5073942Z test_consistency_SparseBSR_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074112Z test_consistency_SparseBSR_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074293Z test_consistency_SparseBSR_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074473Z test_consistency_SparseBSR_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074651Z test_consistency_SparseBSR_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5074830Z test_consistency_SparseBSR_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075025Z test_consistency_SparseBSR_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075195Z test_consistency_SparseBSR_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075367Z test_consistency_SparseBSR_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075538Z test_consistency_SparseBSR_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075711Z test_consistency_SparseBSR_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5075882Z test_consistency_SparseBSR_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076052Z test_consistency_SparseBSR_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076229Z test_consistency_SparseBSR_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076405Z test_consistency_SparseBSR_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076583Z test_consistency_SparseBSR_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076754Z test_consistency_SparseBSR_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5076935Z test_consistency_SparseBSR_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5077110Z test_consistency_SparseBSR_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077285Z test_consistency_SparseBSR_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077460Z test_consistency_SparseBSR_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077662Z test_consistency_SparseBSR_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5077834Z test_consistency_SparseBSR_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078006Z test_consistency_SparseBSR_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078178Z test_consistency_SparseBSR_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078344Z test_consistency_SparseBSR_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078521Z test_consistency_SparseBSR_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078692Z test_consistency_SparseBSR_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5078876Z test_consistency_SparseBSR_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079057Z test_consistency_SparseBSR_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079230Z test_consistency_SparseBSR_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079407Z test_consistency_SparseBSR_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079584Z test_consistency_SparseBSR_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079748Z test_consistency_SparseBSR_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5079920Z test_consistency_SparseBSR_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080092Z test_consistency_SparseBSR_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080265Z test_consistency_SparseBSR_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080438Z test_consistency_SparseBSR_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080608Z test_consistency_SparseBSR_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5080883Z test_consistency_SparseBSR_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081132Z test_consistency_SparseBSR_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081382Z test_consistency_SparseBSR_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081631Z test_consistency_SparseBSR_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5081873Z test_consistency_SparseBSR_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082112Z test_consistency_SparseBSR_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082358Z test_consistency_SparseBSR_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082603Z test_consistency_SparseBSR_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5082845Z test_consistency_SparseBSR_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083113Z test_consistency_SparseBSR_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083359Z test_consistency_SparseBSR_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083640Z test_consistency_SparseBSR_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: to_sparse does not support input with torch.sparse_bsr layout (0.002s) 2023-01-11T21:45:10.5083832Z test_consistency_SparseBSR_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084007Z test_consistency_SparseBSR_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084178Z test_consistency_SparseBSR_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084352Z test_consistency_SparseBSR_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5084530Z test_consistency_SparseBSR_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5084707Z test_consistency_SparseBSR_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5084883Z test_consistency_SparseBSR_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085056Z test_consistency_SparseBSR_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085229Z test_consistency_SparseBSR_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085406Z test_consistency_SparseCSC_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5085581Z test_consistency_SparseCSC_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5085760Z test_consistency_SparseCSC_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5085932Z test_consistency_SparseCSC_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086136Z test_consistency_SparseCSC_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086312Z test_consistency_SparseCSC_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086484Z test_consistency_SparseCSC_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5086657Z test_consistency_SparseCSC_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5086831Z test_consistency_SparseCSC_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087002Z test_consistency_SparseCSC_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087174Z test_consistency_SparseCSC_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087346Z test_consistency_SparseCSC_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087515Z test_consistency_SparseCSC_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5087690Z test_consistency_SparseCSC_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5087875Z test_consistency_SparseCSC_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088056Z test_consistency_SparseCSC_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088232Z test_consistency_SparseCSC_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088410Z test_consistency_SparseCSC_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088583Z test_consistency_SparseCSC_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088790Z test_consistency_SparseCSC_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5088968Z test_consistency_SparseCSC_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089134Z test_consistency_SparseCSC_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089308Z test_consistency_SparseCSC_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089482Z test_consistency_SparseCSC_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089660Z test_consistency_SparseCSC_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5089832Z test_consistency_SparseCSC_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090011Z test_consistency_SparseCSC_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090193Z test_consistency_SparseCSC_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090371Z test_consistency_SparseCSC_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090547Z test_consistency_SparseCSC_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090713Z test_consistency_SparseCSC_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5090883Z test_consistency_SparseCSC_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091056Z test_consistency_SparseCSC_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091228Z test_consistency_SparseCSC_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091401Z test_consistency_SparseCSC_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5091576Z test_consistency_SparseCSC_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5091769Z test_consistency_SparseCSC_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5091952Z test_consistency_SparseCSC_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092120Z test_consistency_SparseCSC_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092299Z test_consistency_SparseCSC_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092479Z test_consistency_SparseCSC_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092655Z test_consistency_SparseCSC_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5092832Z test_consistency_SparseCSC_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093007Z test_consistency_SparseCSC_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093187Z test_consistency_SparseCSC_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093361Z test_consistency_SparseCSC_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093542Z test_consistency_SparseCSC_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093735Z test_consistency_SparseCSC_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5093924Z test_consistency_SparseCSC_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094093Z test_consistency_SparseCSC_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094270Z test_consistency_SparseCSC_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094580Z test_consistency_SparseCSC_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094764Z test_consistency_SparseCSC_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5094943Z test_consistency_SparseCSC_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095126Z test_consistency_SparseCSC_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095301Z test_consistency_SparseCSC_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095465Z test_consistency_SparseCSC_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095635Z test_consistency_SparseCSC_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095806Z test_consistency_SparseCSC_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5095979Z test_consistency_SparseCSC_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096150Z test_consistency_SparseCSC_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096321Z test_consistency_SparseCSC_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096493Z test_consistency_SparseCSC_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5096671Z test_consistency_SparseCSC_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5096841Z test_consistency_SparseCSC_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097016Z test_consistency_SparseCSC_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097197Z test_consistency_SparseCSC_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097374Z test_consistency_SparseCSC_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097582Z test_consistency_SparseCSC_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097761Z test_consistency_SparseCSC_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5097931Z test_consistency_SparseCSC_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098102Z test_consistency_SparseCSC_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098276Z test_consistency_SparseCSC_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098440Z test_consistency_SparseCSC_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098662Z test_consistency_SparseCSC_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5098836Z test_consistency_SparseCSC_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5099012Z test_consistency_SparseCSC_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099184Z test_consistency_SparseCSC_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099354Z test_consistency_SparseCSC_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099526Z test_consistency_SparseCSC_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5099695Z test_consistency_SparseCSC_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5099863Z test_consistency_SparseCSC_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100027Z test_consistency_SparseCSC_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100230Z test_consistency_SparseCSC_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100404Z test_consistency_SparseCSC_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100594Z test_consistency_SparseCSC_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100779Z test_consistency_SparseCSC_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5100968Z test_consistency_SparseCSC_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5101156Z test_consistency_SparseCSC_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5101343Z test_consistency_SparseCSC_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5101531Z test_consistency_SparseCSC_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5101708Z test_consistency_SparseCSC_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5101893Z test_consistency_SparseCSC_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5102075Z test_consistency_SparseCSC_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102256Z test_consistency_SparseCSC_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102434Z test_consistency_SparseCSC_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102617Z test_consistency_SparseCSC_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102798Z test_consistency_SparseCSC_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5102984Z test_consistency_SparseCSC_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103157Z test_consistency_SparseCSC_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103352Z test_consistency_SparseCSC_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103554Z test_consistency_SparseCSC_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103749Z test_consistency_SparseCSC_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5103924Z test_consistency_SparseCSC_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104094Z test_consistency_SparseCSC_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104264Z test_consistency_SparseCSC_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104441Z test_consistency_SparseCSC_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104613Z test_consistency_SparseCSC_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104789Z test_consistency_SparseCSC_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5104955Z test_consistency_SparseCSC_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105130Z test_consistency_SparseCSC_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105302Z test_consistency_SparseCSC_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105473Z test_consistency_SparseCSC_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105643Z test_consistency_SparseCSC_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5105838Z test_consistency_SparseCSC_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106009Z test_consistency_SparseCSC_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106180Z test_consistency_SparseCSC_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5106348Z test_consistency_SparseCSC_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106514Z test_consistency_SparseCSC_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106694Z test_consistency_SparseCSC_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5106872Z test_consistency_SparseCSC_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107047Z test_consistency_SparseCSC_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107225Z test_consistency_SparseCSC_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107398Z test_consistency_SparseCSC_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107572Z test_consistency_SparseCSC_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107743Z test_consistency_SparseCSC_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5107915Z test_consistency_SparseCSC_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108086Z test_consistency_SparseCSC_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108258Z test_consistency_SparseCSC_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108431Z test_consistency_SparseCSC_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108603Z test_consistency_SparseCSC_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108772Z test_consistency_SparseCSC_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5108968Z test_consistency_SparseCSC_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109139Z test_consistency_SparseCSC_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109311Z test_consistency_SparseCSC_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109476Z test_consistency_SparseCSC_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109647Z test_consistency_SparseCSC_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5109822Z test_consistency_SparseCSC_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110004Z test_consistency_SparseCSC_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110175Z test_consistency_SparseCSC_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110345Z test_consistency_SparseCSC_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5110514Z test_consistency_SparseCSC_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5110684Z test_consistency_SparseCSC_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5110853Z test_consistency_SparseCSC_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5111017Z test_consistency_SparseCSC_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5111184Z test_consistency_SparseCSC_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5111385Z test_consistency_SparseCSC_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111557Z test_consistency_SparseCSC_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111731Z test_consistency_SparseCSC_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5111900Z test_consistency_SparseCSC_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5112077Z test_consistency_SparseCSC_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112248Z test_consistency_SparseCSC_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112430Z test_consistency_SparseCSC_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112602Z test_consistency_SparseCSC_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112780Z test_consistency_SparseCSC_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5112955Z test_consistency_SparseCSC_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113122Z test_consistency_SparseCSC_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113291Z test_consistency_SparseCSC_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113486Z test_consistency_SparseCSC_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113680Z test_consistency_SparseCSC_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5113849Z test_consistency_SparseCSC_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114022Z test_consistency_SparseCSC_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114187Z test_consistency_SparseCSC_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114362Z test_consistency_SparseCSC_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114559Z test_consistency_SparseCSC_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114742Z test_consistency_SparseCSC_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5114922Z test_consistency_SparseCSC_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115094Z test_consistency_SparseCSC_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115265Z test_consistency_SparseCSC_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115434Z test_consistency_SparseCSC_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115607Z test_consistency_SparseCSC_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115772Z test_consistency_SparseCSC_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5115943Z test_consistency_SparseCSC_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5116111Z test_consistency_SparseCSC_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116276Z test_consistency_SparseCSC_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116459Z test_consistency_SparseCSC_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116637Z test_consistency_SparseCSC_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5116818Z test_consistency_SparseCSC_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117018Z test_consistency_SparseCSC_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117194Z test_consistency_SparseCSC_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117370Z test_consistency_SparseCSC_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117541Z test_consistency_SparseCSC_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117712Z test_consistency_SparseCSC_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5117886Z test_consistency_SparseCSC_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118064Z test_consistency_SparseCSC_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118244Z test_consistency_SparseCSC_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118420Z test_consistency_SparseCSC_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118600Z test_consistency_SparseCSC_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118769Z test_consistency_SparseCSC_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5118948Z test_consistency_SparseCSC_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119122Z test_consistency_SparseCSC_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119295Z test_consistency_SparseCSC_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119464Z test_consistency_SparseCSC_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119639Z test_consistency_SparseCSC_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5119816Z test_consistency_SparseCSC_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5120016Z test_consistency_SparseCSC_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120192Z test_consistency_SparseCSC_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120363Z test_consistency_SparseCSC_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120541Z test_consistency_SparseCSC_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5120716Z test_consistency_SparseCSC_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5120888Z test_consistency_SparseCSC_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121065Z test_consistency_SparseCSC_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121237Z test_consistency_SparseCSC_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121412Z test_consistency_SparseCSC_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121584Z test_consistency_SparseCSC_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121755Z test_consistency_SparseCSC_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5121918Z test_consistency_SparseCSC_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5122173Z test_consistency_SparseCSC_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122422Z test_consistency_SparseCSC_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122700Z test_consistency_SparseCSC_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5122945Z test_consistency_SparseCSC_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123195Z test_consistency_SparseCSC_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123443Z test_consistency_SparseCSC_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123680Z test_consistency_SparseCSC_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5123930Z test_consistency_SparseCSC_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124170Z test_consistency_SparseCSC_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124419Z test_consistency_SparseCSC_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124661Z test_consistency_SparseCSC_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5124896Z test_consistency_SparseCSC_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125145Z test_consistency_SparseCSC_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125410Z test_consistency_SparseCSC_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125655Z test_consistency_SparseCSC_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5125892Z test_consistency_SparseCSC_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126136Z test_consistency_SparseCSC_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126384Z test_consistency_SparseCSC_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126636Z test_consistency_SparseCSC_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5126882Z test_consistency_SparseCSC_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127137Z test_consistency_SparseCSC_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127393Z test_consistency_SparseCSC_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127673Z test_consistency_SparseCSC_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5127927Z test_consistency_SparseCSC_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128161Z test_consistency_SparseCSC_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128401Z test_consistency_SparseCSC_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128644Z test_consistency_SparseCSC_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5128881Z test_consistency_SparseCSC_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129124Z test_consistency_SparseCSC_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129361Z test_consistency_SparseCSC_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129609Z test_consistency_SparseCSC_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5129852Z test_consistency_SparseCSC_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130107Z test_consistency_SparseCSC_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130380Z test_consistency_SparseCSC_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130627Z test_consistency_SparseCSC_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5130879Z test_consistency_SparseCSC_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131120Z test_consistency_SparseCSC_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131356Z test_consistency_SparseCSC_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131599Z test_consistency_SparseCSC_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5131838Z test_consistency_SparseCSC_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132078Z test_consistency_SparseCSC_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132316Z test_consistency_SparseCSC_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.prod does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132590Z test_consistency_SparseCSC_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5132840Z test_consistency_SparseCSC_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133094Z test_consistency_SparseCSC_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133344Z test_consistency_SparseCSC_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133591Z test_consistency_SparseCSC_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5133832Z test_consistency_SparseCSC_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134077Z test_consistency_SparseCSC_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134320Z test_consistency_SparseCSC_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134654Z test_consistency_SparseCSC_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5134946Z test_consistency_SparseCSC_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135225Z test_consistency_SparseCSC_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135525Z test_consistency_SparseCSC_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.sum does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5135706Z test_consistency_SparseCSC_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5135886Z test_consistency_SparseCSC_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136062Z test_consistency_SparseCSC_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136235Z test_consistency_SparseCSC_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136405Z test_consistency_SparseCSC_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136580Z test_consistency_SparseCSC_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136749Z test_consistency_SparseCSC_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5136922Z test_consistency_SparseCSC_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137093Z test_consistency_SparseCSC_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137263Z test_consistency_SparseCSC_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137433Z test_consistency_SparseCSC_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137601Z test_consistency_SparseCSC_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5137793Z test_consistency_SparseCSC_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5137982Z test_consistency_SparseCSC_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5138270Z test_consistency_SparseCSC_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
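A note on the block of skips above: every masked reduction (masked.amax, masked.amin, masked.mean, masked.prod, masked.sum) reports the same gap, namely that inputs with torch.sparse_csc layout are rejected, while the SparseCSR_masked_amax runs further down in this shard do execute for the floating dtypes. A minimal sketch of the gap and a conservative workaround, assuming the prototype torch.masked API and the Tensor.to_sparse_csc/to_sparse_csr conversions; the exact exception type raised for the unsupported layout is an assumption here, hence the broad catch:

    import torch

    dense = torch.randn(4, 4)
    mask = dense > 0

    # Dense baseline: reduce over dim=1, honoring only the masked-in elements.
    expected = torch.masked.amax(dense, dim=1, mask=mask)

    csc = dense.to_sparse_csc()
    try:
        # The call the skipped tests above would exercise; the CSC layout
        # is rejected in this build.
        torch.masked.amax(csc, dim=1, mask=mask.to_sparse_csc())
    except Exception:
        # Workaround sketch: round-trip through strided into the CSR layout,
        # which the SparseCSR masked_amax float runs below show is accepted.
        csr = csc.to_dense().to_sparse_csr()
        result = torch.masked.amax(csr, dim=1, mask=mask.to_sparse_csr())
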
2023-01-11T21:45:10.5138459Z test_consistency_SparseCSC_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5138724Z test_consistency_SparseCSC_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5138913Z test_consistency_SparseCSC_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5139102Z test_consistency_SparseCSC_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139294Z test_consistency_SparseCSC_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139480Z test_consistency_SparseCSC_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139668Z test_consistency_SparseCSC_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5139858Z test_consistency_SparseCSC_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5140044Z test_consistency_SparseCSC_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5140223Z test_consistency_SparseCSC_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140404Z test_consistency_SparseCSC_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140587Z test_consistency_SparseCSC_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140768Z test_consistency_SparseCSC_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5140948Z test_consistency_SparseCSC_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141128Z test_consistency_SparseCSC_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141338Z test_consistency_SparseCSC_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141517Z test_consistency_SparseCSC_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141696Z test_consistency_SparseCSC_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5141869Z test_consistency_SparseCSC_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142052Z test_consistency_SparseCSC_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142231Z test_consistency_SparseCSC_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142406Z test_consistency_SparseCSC_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142584Z test_consistency_SparseCSC_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142763Z test_consistency_SparseCSC_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5142937Z test_consistency_SparseCSC_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143117Z test_consistency_SparseCSC_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143289Z test_consistency_SparseCSC_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
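For context on what an "ok" certifies in these runs: each test_consistency_* case checks, roughly, that the operator produces the same values through the compressed layout as it does on the equivalent strided tensor. An editorial sketch of that check under those assumptions, not the actual TestSparseCompressed harness:

    import torch

    def check_consistency(op, dense, to_layout):
        # The op applied through the compressed layout must match the strided
        # reference elementwise (for zero-preserving unary ops like these).
        sparse = to_layout(dense)
        torch.testing.assert_close(op(sparse).to_dense(), op(dense))

    t = torch.randn(4, 4).abs()
    t[t < 0.5] = 0  # introduce explicit zeros so the compressed form is sparse
    check_consistency(torch.sqrt, t, torch.Tensor.to_sparse_csc)
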
2023-01-11T21:45:10.5143488Z test_consistency_SparseCSC_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5143765Z test_consistency_SparseCSC_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144047Z test_consistency_SparseCSC_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144302Z test_consistency_SparseCSC_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144553Z test_consistency_SparseCSC_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5144804Z test_consistency_SparseCSC_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145052Z test_consistency_SparseCSC_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145294Z test_consistency_SparseCSC_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: randn_like does not support input with torch.sparse_csc layout (0.002s) 2023-01-11T21:45:10.5145478Z test_consistency_SparseCSC_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145657Z test_consistency_SparseCSC_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145823Z test_consistency_SparseCSC_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5145995Z test_consistency_SparseCSC_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5146174Z test_consistency_SparseCSC_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146350Z test_consistency_SparseCSC_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146530Z test_consistency_SparseCSC_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146710Z test_consistency_SparseCSC_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5146905Z test_consistency_SparseCSC_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5147085Z test_consistency_SparseCSC_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147260Z test_consistency_SparseCSC_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5147436Z test_consistency_SparseCSC_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147609Z test_consistency_SparseCSC_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147782Z test_consistency_SparseCSC_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5147957Z test_consistency_SparseCSC_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5148129Z test_consistency_SparseCSC_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
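The randn_like skips above carry the same message as the masked ones: the factory does not accept inputs with torch.sparse_csc layout. A sketch of the failure and a dense round-trip workaround, with the exception type again an assumption:

    import torch

    csc = torch.eye(3).to_sparse_csc()
    try:
        noise = torch.randn_like(csc)  # rejected for this layout, per the skips above
    except Exception:
        # Round-trip through a strided tensor, then restore the compressed layout.
        noise = torch.randn_like(csc.to_dense()).to_sparse_csc()
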
2023-01-11T21:45:10.5148303Z test_consistency_SparseCSC_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5148478Z test_consistency_SparseCSC_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148651Z test_consistency_SparseCSC_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148822Z test_consistency_SparseCSC_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5148989Z test_consistency_SparseCSC_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149158Z test_consistency_SparseCSC_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149334Z test_consistency_SparseCSC_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5149533Z test_consistency_SparseCSC_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5149714Z test_consistency_SparseCSC_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5149891Z test_consistency_SparseCSC_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150062Z test_consistency_SparseCSC_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150238Z test_consistency_SparseCSC_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150412Z test_consistency_SparseCSC_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150580Z test_consistency_SparseCSC_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150751Z test_consistency_SparseCSC_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5150923Z test_consistency_SparseCSC_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151113Z test_consistency_SparseCSC_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151289Z test_consistency_SparseCSC_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151472Z test_consistency_SparseCSC_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151653Z test_consistency_SparseCSC_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5151828Z test_consistency_SparseCSC_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152005Z test_consistency_SparseCSC_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152171Z test_consistency_SparseCSC_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152345Z test_consistency_SparseCSC_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152552Z test_consistency_SparseCSC_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152734Z test_consistency_SparseCSC_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5152908Z test_consistency_SparseCSC_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153083Z test_consistency_SparseCSC_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153262Z test_consistency_SparseCSC_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ...
ok (0.003s) 2023-01-11T21:45:10.5153437Z test_consistency_SparseCSC_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153602Z test_consistency_SparseCSC_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5153782Z test_consistency_SparseCSC_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5153962Z test_consistency_SparseCSC_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154137Z test_consistency_SparseCSC_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154310Z test_consistency_SparseCSC_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154484Z test_consistency_SparseCSC_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154655Z test_consistency_SparseCSC_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154827Z test_consistency_SparseCSC_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5154996Z test_consistency_SparseCSC_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155198Z test_consistency_SparseCSC_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155375Z test_consistency_SparseCSC_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155553Z test_consistency_SparseCSC_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155733Z test_consistency_SparseCSC_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5155913Z test_consistency_SparseCSC_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156092Z test_consistency_SparseCSC_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156262Z test_consistency_SparseCSC_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156435Z test_consistency_SparseCSC_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156610Z test_consistency_SparseCSC_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156779Z test_consistency_SparseCSC_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5156952Z test_consistency_SparseCSC_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157124Z test_consistency_SparseCSC_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157296Z test_consistency_SparseCSC_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157471Z test_consistency_SparseCSC_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157641Z test_consistency_SparseCSC_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5157821Z test_consistency_SparseCSC_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158003Z test_consistency_SparseCSC_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158203Z test_consistency_SparseCSC_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5158374Z test_consistency_SparseCSC_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158546Z test_consistency_SparseCSC_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158724Z test_consistency_SparseCSC_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5158899Z test_consistency_SparseCSC_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159070Z test_consistency_SparseCSC_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159243Z test_consistency_SparseCSC_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159421Z test_consistency_SparseCSC_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159593Z test_consistency_SparseCSC_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159762Z test_consistency_SparseCSC_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5159936Z test_consistency_SparseCSC_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160113Z test_consistency_SparseCSC_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160290Z test_consistency_SparseCSC_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160466Z test_consistency_SparseCSC_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160639Z test_consistency_SparseCSC_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5160843Z test_consistency_SparseCSC_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161027Z test_consistency_SparseCSC_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161200Z test_consistency_SparseCSC_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161365Z test_consistency_SparseCSC_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161546Z test_consistency_SparseCSC_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161719Z test_consistency_SparseCSC_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5161889Z test_consistency_SparseCSC_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162067Z test_consistency_SparseCSC_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162241Z test_consistency_SparseCSC_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162423Z test_consistency_SparseCSC_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162602Z test_consistency_SparseCSC_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162778Z test_consistency_SparseCSC_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5162947Z test_consistency_SparseCSC_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163122Z test_consistency_SparseCSC_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.003s) 2023-01-11T21:45:10.5163300Z test_consistency_SparseCSC_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163475Z test_consistency_SparseCSC_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163650Z test_consistency_SparseCSC_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5163825Z test_consistency_SparseCSC_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164020Z test_consistency_SparseCSC_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164191Z test_consistency_SparseCSC_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164372Z test_consistency_SparseCSC_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5164553Z test_consistency_SparseCSC_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5164739Z test_consistency_SparseCSC_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5164927Z test_consistency_SparseCSC_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5165117Z test_consistency_SparseCSC_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165303Z test_consistency_SparseCSC_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165483Z test_consistency_SparseCSC_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165666Z test_consistency_SparseCSC_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5165847Z test_consistency_SparseCSC_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166014Z test_consistency_SparseCSC_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166194Z test_consistency_SparseCSC_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166368Z test_consistency_SparseCSC_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166589Z test_consistency_SparseCSC_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166772Z test_consistency_SparseCSC_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5166946Z test_consistency_SparseCSC_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5167121Z test_consistency_SparseCSC_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5167297Z test_consistency_SparseCSC_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167473Z test_consistency_SparseCSC_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167639Z test_consistency_SparseCSC_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167814Z test_consistency_SparseCSC_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5167988Z test_consistency_SparseCSC_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5168170Z test_consistency_SparseCSR_abs_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5168343Z test_consistency_SparseCSR_abs_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5168522Z test_consistency_SparseCSR_abs_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5168699Z test_consistency_SparseCSR_abs_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5168871Z test_consistency_SparseCSR_abs_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5169046Z test_consistency_SparseCSR_abs_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169212Z test_consistency_SparseCSR_abs_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169392Z test_consistency_SparseCSR_abs_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169594Z test_consistency_SparseCSR_abs_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169770Z test_consistency_SparseCSR_abs_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5169944Z test_consistency_SparseCSR_abs_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170119Z test_consistency_SparseCSR_abs_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170297Z test_consistency_SparseCSR_abs_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170475Z test_consistency_SparseCSR_angle_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170660Z test_consistency_SparseCSR_angle_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5170835Z test_consistency_SparseCSR_angle_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5171021Z test_consistency_SparseCSR_angle_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171200Z test_consistency_SparseCSR_angle_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171373Z test_consistency_SparseCSR_angle_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171554Z test_consistency_SparseCSR_angle_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171735Z test_consistency_SparseCSR_angle_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5171910Z test_consistency_SparseCSR_angle_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172086Z test_consistency_SparseCSR_angle_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5172288Z test_consistency_SparseCSR_angle_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172464Z test_consistency_SparseCSR_asin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172640Z test_consistency_SparseCSR_asin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5172824Z test_consistency_SparseCSR_asin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5173009Z test_consistency_SparseCSR_asin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5173189Z test_consistency_SparseCSR_asin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5173371Z test_consistency_SparseCSR_asin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173550Z test_consistency_SparseCSR_asin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173728Z test_consistency_SparseCSR_asin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5173899Z test_consistency_SparseCSR_asin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174078Z test_consistency_SparseCSR_asin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174261Z test_consistency_SparseCSR_asin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5174438Z test_consistency_SparseCSR_asin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174732Z test_consistency_SparseCSR_asin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5174914Z test_consistency_SparseCSR_asinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175088Z test_consistency_SparseCSR_asinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5175279Z test_consistency_SparseCSR_asinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175499Z test_consistency_SparseCSR_asinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175665Z test_consistency_SparseCSR_asinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5175844Z test_consistency_SparseCSR_asinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176021Z test_consistency_SparseCSR_asinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176195Z test_consistency_SparseCSR_asinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176371Z test_consistency_SparseCSR_asinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176546Z test_consistency_SparseCSR_asinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176726Z test_consistency_SparseCSR_asinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5176903Z test_consistency_SparseCSR_asinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177072Z test_consistency_SparseCSR_asinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177243Z test_consistency_SparseCSR_atan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177419Z test_consistency_SparseCSR_atan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177599Z test_consistency_SparseCSR_atan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5177776Z test_consistency_SparseCSR_atan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5177949Z test_consistency_SparseCSR_atan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178157Z test_consistency_SparseCSR_atan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5178333Z test_consistency_SparseCSR_atan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5178577Z test_consistency_SparseCSR_atan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178758Z test_consistency_SparseCSR_atan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5178923Z test_consistency_SparseCSR_atan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179093Z test_consistency_SparseCSR_atan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179267Z test_consistency_SparseCSR_atan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5179435Z test_consistency_SparseCSR_atan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179618Z test_consistency_SparseCSR_atanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179795Z test_consistency_SparseCSR_atanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5179982Z test_consistency_SparseCSR_atanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180164Z test_consistency_SparseCSR_atanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5180334Z test_consistency_SparseCSR_atanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5180510Z test_consistency_SparseCSR_atanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180684Z test_consistency_SparseCSR_atanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5180850Z test_consistency_SparseCSR_atanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181024Z test_consistency_SparseCSR_atanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181224Z test_consistency_SparseCSR_atanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181398Z test_consistency_SparseCSR_atanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181570Z test_consistency_SparseCSR_atanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181746Z test_consistency_SparseCSR_atanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5181914Z test_consistency_SparseCSR_ceil_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182087Z test_consistency_SparseCSR_ceil_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182256Z test_consistency_SparseCSR_ceil_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182427Z test_consistency_SparseCSR_ceil_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182607Z test_consistency_SparseCSR_ceil_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182781Z test_consistency_SparseCSR_ceil_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5182949Z test_consistency_SparseCSR_ceil_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183120Z test_consistency_SparseCSR_ceil_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5183291Z test_consistency_SparseCSR_ceil_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183473Z test_consistency_SparseCSR_conj_physical_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183659Z test_consistency_SparseCSR_conj_physical_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5183876Z test_consistency_SparseCSR_conj_physical_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184067Z test_consistency_SparseCSR_conj_physical_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5184259Z test_consistency_SparseCSR_conj_physical_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5184447Z test_consistency_SparseCSR_conj_physical_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184634Z test_consistency_SparseCSR_conj_physical_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5184821Z test_consistency_SparseCSR_conj_physical_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185007Z test_consistency_SparseCSR_conj_physical_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185188Z test_consistency_SparseCSR_conj_physical_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185369Z test_consistency_SparseCSR_conj_physical_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185555Z test_consistency_SparseCSR_conj_physical_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185741Z test_consistency_SparseCSR_conj_physical_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5185924Z test_consistency_SparseCSR_deg2rad_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186103Z test_consistency_SparseCSR_deg2rad_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5186285Z test_consistency_SparseCSR_deg2rad_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186462Z test_consistency_SparseCSR_deg2rad_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5186637Z test_consistency_SparseCSR_deg2rad_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5186829Z test_consistency_SparseCSR_deg2rad_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187005Z test_consistency_SparseCSR_deg2rad_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187184Z test_consistency_SparseCSR_deg2rad_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187362Z test_consistency_SparseCSR_deg2rad_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5187537Z test_consistency_SparseCSR_deg2rad_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5187711Z test_consistency_SparseCSR_erf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5187880Z test_consistency_SparseCSR_erf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5188056Z test_consistency_SparseCSR_erf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5188232Z test_consistency_SparseCSR_erf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188397Z test_consistency_SparseCSR_erf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188568Z test_consistency_SparseCSR_erf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188737Z test_consistency_SparseCSR_erf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5188906Z test_consistency_SparseCSR_erf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189076Z test_consistency_SparseCSR_erf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189241Z test_consistency_SparseCSR_erf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189441Z test_consistency_SparseCSR_erfinv_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5189620Z test_consistency_SparseCSR_erfinv_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189801Z test_consistency_SparseCSR_erfinv_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5189969Z test_consistency_SparseCSR_erfinv_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190146Z test_consistency_SparseCSR_erfinv_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190316Z test_consistency_SparseCSR_erfinv_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190485Z test_consistency_SparseCSR_erfinv_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190656Z test_consistency_SparseCSR_erfinv_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5190834Z test_consistency_SparseCSR_erfinv_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191017Z test_consistency_SparseCSR_expm1_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5191193Z test_consistency_SparseCSR_expm1_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191359Z test_consistency_SparseCSR_expm1_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191531Z test_consistency_SparseCSR_expm1_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191702Z test_consistency_SparseCSR_expm1_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5191872Z test_consistency_SparseCSR_expm1_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192042Z test_consistency_SparseCSR_expm1_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192214Z test_consistency_SparseCSR_expm1_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192384Z test_consistency_SparseCSR_expm1_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5192583Z test_consistency_SparseCSR_expm1_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192762Z test_consistency_SparseCSR_floor_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5192927Z test_consistency_SparseCSR_floor_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5193092Z test_consistency_SparseCSR_floor_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193269Z test_consistency_SparseCSR_floor_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193464Z test_consistency_SparseCSR_floor_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193661Z test_consistency_SparseCSR_floor_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5193829Z test_consistency_SparseCSR_floor_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194004Z test_consistency_SparseCSR_floor_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194170Z test_consistency_SparseCSR_floor_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194345Z test_consistency_SparseCSR_frac_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194512Z test_consistency_SparseCSR_frac_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194680Z test_consistency_SparseCSR_frac_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5194849Z test_consistency_SparseCSR_frac_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195051Z test_consistency_SparseCSR_isinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195222Z test_consistency_SparseCSR_isinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195406Z test_consistency_SparseCSR_isinf_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195585Z test_consistency_SparseCSR_isinf_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195761Z test_consistency_SparseCSR_isinf_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5195935Z test_consistency_SparseCSR_isinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196102Z test_consistency_SparseCSR_isinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196272Z test_consistency_SparseCSR_isinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196443Z test_consistency_SparseCSR_isinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196615Z test_consistency_SparseCSR_isinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196788Z test_consistency_SparseCSR_isinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5196955Z test_consistency_SparseCSR_isinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197124Z test_consistency_SparseCSR_isinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197298Z test_consistency_SparseCSR_isnan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197472Z test_consistency_SparseCSR_isnan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197647Z test_consistency_SparseCSR_isnan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5197829Z test_consistency_SparseCSR_isnan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5198025Z test_consistency_SparseCSR_isnan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198200Z test_consistency_SparseCSR_isnan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198368Z test_consistency_SparseCSR_isnan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198537Z test_consistency_SparseCSR_isnan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198706Z test_consistency_SparseCSR_isnan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5198877Z test_consistency_SparseCSR_isnan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199043Z test_consistency_SparseCSR_isnan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199215Z test_consistency_SparseCSR_isnan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199399Z test_consistency_SparseCSR_isneginf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199577Z test_consistency_SparseCSR_isneginf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199758Z test_consistency_SparseCSR_isneginf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5199937Z test_consistency_SparseCSR_isneginf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200117Z test_consistency_SparseCSR_isneginf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200294Z test_consistency_SparseCSR_isneginf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200467Z test_consistency_SparseCSR_isneginf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200665Z test_consistency_SparseCSR_isneginf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5200840Z test_consistency_SparseCSR_isneginf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201017Z test_consistency_SparseCSR_isneginf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201199Z test_consistency_SparseCSR_isposinf_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201376Z test_consistency_SparseCSR_isposinf_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201554Z test_consistency_SparseCSR_isposinf_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201732Z test_consistency_SparseCSR_isposinf_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5201911Z test_consistency_SparseCSR_isposinf_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202089Z test_consistency_SparseCSR_isposinf_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202259Z test_consistency_SparseCSR_isposinf_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202430Z test_consistency_SparseCSR_isposinf_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202607Z test_consistency_SparseCSR_isposinf_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5202780Z test_consistency_SparseCSR_isposinf_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5202956Z test_consistency_SparseCSR_log1p_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203130Z test_consistency_SparseCSR_log1p_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203313Z test_consistency_SparseCSR_log1p_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5203493Z test_consistency_SparseCSR_log1p_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5203721Z test_consistency_SparseCSR_log1p_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5203911Z test_consistency_SparseCSR_log1p_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204082Z test_consistency_SparseCSR_log1p_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204251Z test_consistency_SparseCSR_log1p_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204421Z test_consistency_SparseCSR_log1p_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204591Z test_consistency_SparseCSR_log1p_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204762Z test_consistency_SparseCSR_log1p_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5204933Z test_consistency_SparseCSR_log1p_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5205677Z test_consistency_SparseCSR_masked_amax_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/masked/_ops.py:767: UserWarning: scatter_reduce() is in beta and the API may change at any time. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorAdvancedIndexing.cpp:1739.) 2023-01-11T21:45:10.5205772Z new_values.scatter_reduce_( 2023-01-11T21:45:10.5205837Z ok (0.023s) 2023-01-11T21:45:10.5206021Z test_consistency_SparseCSR_masked_amax_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5206207Z test_consistency_SparseCSR_masked_amax_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5206390Z test_consistency_SparseCSR_masked_amax_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5206682Z test_consistency_SparseCSR_masked_amax_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5206942Z test_consistency_SparseCSR_masked_amax_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207191Z test_consistency_SparseCSR_masked_amax_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207438Z test_consistency_SparseCSR_masked_amax_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5207680Z test_consistency_SparseCSR_masked_amax_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amax does not support input with torch.sparse_csr layout (0.002s)
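The UserWarning interleaved above comes from torch/masked/_ops.py: for a sparse CSR input, masked.amax reduces the stored values with Tensor.scatter_reduce_, which is still a beta API in this build. A standalone sketch of that reduction pattern (illustrative tensors only, not PyTorch's internal code):

    import torch

    # A small 3x5 CSR tensor; crow compresses the per-row entry counts.
    crow = torch.tensor([0, 2, 3, 5])
    col = torch.tensor([0, 3, 2, 1, 4])
    val = torch.tensor([1.0, 5.0, 2.0, 4.0, 3.0])
    csr = torch.sparse_csr_tensor(crow, col, val, size=(3, 5))

    # Recover one row id per stored value from the compressed pointers.
    rows = torch.repeat_interleave(torch.arange(3), crow.diff())

    # Row-wise amax over the stored values via the beta scatter_reduce_ API;
    # include_self=False keeps the -inf initializers out of the reduction.
    out = torch.full((3,), float("-inf"))
    out.scatter_reduce_(0, rows, csr.values(), reduce="amax", include_self=False)
    print(out)  # tensor([5., 2., 4.])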
2023-01-11T21:45:10.5207869Z test_consistency_SparseCSR_masked_amin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208053Z test_consistency_SparseCSR_masked_amin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208228Z test_consistency_SparseCSR_masked_amin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208406Z test_consistency_SparseCSR_masked_amin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5208653Z test_consistency_SparseCSR_masked_amin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5208893Z test_consistency_SparseCSR_masked_amin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209133Z test_consistency_SparseCSR_masked_amin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209404Z test_consistency_SparseCSR_masked_amin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209644Z test_consistency_SparseCSR_masked_amin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.amin does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5209828Z test_consistency_SparseCSR_masked_mean_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5210069Z test_consistency_SparseCSR_masked_mean_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210395Z test_consistency_SparseCSR_masked_mean_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210669Z test_consistency_SparseCSR_masked_mean_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5210847Z test_consistency_SparseCSR_masked_mean_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211041Z test_consistency_SparseCSR_masked_mean_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211288Z test_consistency_SparseCSR_masked_mean_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s) 2023-01-11T21:45:10.5211551Z test_consistency_SparseCSR_masked_mean_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5211812Z test_consistency_SparseCSR_masked_mean_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212099Z test_consistency_SparseCSR_masked_mean_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212479Z test_consistency_SparseCSR_masked_mean_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212737Z test_consistency_SparseCSR_masked_mean_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: masked.mean does not support input with torch.sparse_csr layout (0.002s) 2023-01-11T21:45:10.5212935Z test_consistency_SparseCSR_masked_prod_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s) 2023-01-11T21:45:10.5213138Z test_consistency_SparseCSR_masked_prod_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.646s) 2023-01-11T21:45:10.5213350Z test_consistency_SparseCSR_masked_prod_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (2.104s) 2023-01-11T21:45:10.5213562Z test_consistency_SparseCSR_masked_prod_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (1.421s) 2023-01-11T21:45:10.5213770Z test_consistency_SparseCSR_masked_prod_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.676s) 2023-01-11T21:45:10.5213964Z test_consistency_SparseCSR_masked_prod_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.646s) 2023-01-11T21:45:10.5214182Z test_consistency_SparseCSR_masked_prod_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.649s) 2023-01-11T21:45:10.5214359Z test_consistency_SparseCSR_masked_prod_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5214727Z test_consistency_SparseCSR_masked_prod_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5214980Z test_consistency_SparseCSR_masked_prod_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215234Z test_consistency_SparseCSR_masked_prod_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215429Z test_consistency_SparseCSR_masked_prod_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5215702Z test_consistency_SparseCSR_masked_sum_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5215906Z test_consistency_SparseCSR_masked_sum_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.009s) 2023-01-11T21:45:10.5216107Z test_consistency_SparseCSR_masked_sum_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5216339Z test_consistency_SparseCSR_masked_sum_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5216516Z test_consistency_SparseCSR_masked_sum_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5216711Z test_consistency_SparseCSR_masked_sum_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5216908Z test_consistency_SparseCSR_masked_sum_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5217103Z test_consistency_SparseCSR_masked_sum_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217307Z test_consistency_SparseCSR_masked_sum_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217507Z test_consistency_SparseCSR_masked_sum_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217704Z test_consistency_SparseCSR_masked_sum_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5217901Z test_consistency_SparseCSR_masked_sum_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.008s) 2023-01-11T21:45:10.5218107Z test_consistency_SparseCSR_neg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
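The pattern in the masked_* results above: masked.amax, masked.amin, and masked.mean on SparseCSR pass for floating-point dtypes but are skipped as unsupported for bool, integer, and complex inputs, while masked.prod and masked.sum cover every dtype. A hedged sketch of that support matrix, assuming the prototype torch.masked namespace of this build exposes these reductions as callables, as the test names suggest:

    import torch

    dense = torch.tensor([[0.0, 1.0, 5.0], [2.0, 0.0, 3.0]])
    csr = dense.to_sparse_csr()

    # Floating-point CSR input: all of the reductions above are exercised.
    print(torch.masked.sum(csr, dim=1))
    print(torch.masked.amax(csr, dim=1))

    # Integer CSR input: amax/amin/mean are skipped as unsupported in this
    # build (that is what the skip messages record), but sum still passes.
    icsr = dense.to(torch.int32).to_sparse_csr()
    print(torch.masked.sum(icsr, dim=1))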
2023-01-11T21:45:10.5218281Z test_consistency_SparseCSR_neg_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5218575Z test_consistency_SparseCSR_neg_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5218832Z test_consistency_SparseCSR_neg_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219022Z test_consistency_SparseCSR_neg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219212Z test_consistency_SparseCSR_neg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5219402Z test_consistency_SparseCSR_neg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219587Z test_consistency_SparseCSR_neg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219777Z test_consistency_SparseCSR_neg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5219990Z test_consistency_SparseCSR_neg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220164Z test_consistency_SparseCSR_neg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220359Z test_consistency_SparseCSR_neg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5220572Z test_consistency_SparseCSR_nn_functional_relu_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5220780Z test_consistency_SparseCSR_nn_functional_relu_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5220990Z test_consistency_SparseCSR_nn_functional_relu_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5221193Z test_consistency_SparseCSR_nn_functional_relu_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5221404Z test_consistency_SparseCSR_nn_functional_relu_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5221613Z test_consistency_SparseCSR_nn_functional_relu_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5221837Z test_consistency_SparseCSR_nn_functional_relu_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222049Z test_consistency_SparseCSR_nn_functional_relu_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222299Z test_consistency_SparseCSR_nn_functional_relu_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222497Z test_consistency_SparseCSR_positive_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222700Z test_consistency_SparseCSR_positive_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5222901Z test_consistency_SparseCSR_positive_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223094Z test_consistency_SparseCSR_positive_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5223296Z test_consistency_SparseCSR_positive_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223500Z test_consistency_SparseCSR_positive_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5223714Z test_consistency_SparseCSR_positive_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5223889Z test_consistency_SparseCSR_positive_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224085Z test_consistency_SparseCSR_positive_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224278Z test_consistency_SparseCSR_positive_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224472Z test_consistency_SparseCSR_positive_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224662Z test_consistency_SparseCSR_positive_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5224890Z test_consistency_SparseCSR_rad2deg_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225089Z test_consistency_SparseCSR_rad2deg_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225288Z test_consistency_SparseCSR_rad2deg_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225534Z test_consistency_SparseCSR_rad2deg_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225703Z test_consistency_SparseCSR_rad2deg_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5225898Z test_consistency_SparseCSR_rad2deg_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5226091Z test_consistency_SparseCSR_rad2deg_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226278Z test_consistency_SparseCSR_rad2deg_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226482Z test_consistency_SparseCSR_rad2deg_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5226677Z test_consistency_SparseCSR_rad2deg_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5226878Z test_consistency_SparseCSR_randn_like_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227079Z test_consistency_SparseCSR_randn_like_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227295Z test_consistency_SparseCSR_randn_like_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227476Z test_consistency_SparseCSR_randn_like_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227675Z test_consistency_SparseCSR_randn_like_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5227879Z test_consistency_SparseCSR_randn_like_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228080Z test_consistency_SparseCSR_randn_like_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228304Z test_consistency_SparseCSR_round_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228491Z test_consistency_SparseCSR_round_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5228675Z test_consistency_SparseCSR_round_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5228897Z test_consistency_SparseCSR_round_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5229096Z test_consistency_SparseCSR_round_cuda_int16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5229262Z test_consistency_SparseCSR_round_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229452Z test_consistency_SparseCSR_round_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229649Z test_consistency_SparseCSR_round_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5229841Z test_consistency_SparseCSR_round_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230035Z test_consistency_SparseCSR_sgn_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230226Z test_consistency_SparseCSR_sgn_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5230423Z test_consistency_SparseCSR_sgn_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5230612Z test_consistency_SparseCSR_sgn_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5230778Z test_consistency_SparseCSR_sgn_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5231013Z test_consistency_SparseCSR_sgn_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231199Z test_consistency_SparseCSR_sgn_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231388Z test_consistency_SparseCSR_sgn_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5231572Z test_consistency_SparseCSR_sgn_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231756Z test_consistency_SparseCSR_sgn_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5231940Z test_consistency_SparseCSR_sgn_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5232163Z test_consistency_SparseCSR_sgn_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5232336Z test_consistency_SparseCSR_sgn_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239087Z test_consistency_SparseCSR_sign_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5239284Z test_consistency_SparseCSR_sign_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239469Z test_consistency_SparseCSR_sign_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5239643Z test_consistency_SparseCSR_sign_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239817Z test_consistency_SparseCSR_sign_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5239986Z test_consistency_SparseCSR_sign_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240161Z test_consistency_SparseCSR_sign_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240334Z test_consistency_SparseCSR_sign_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240512Z test_consistency_SparseCSR_sign_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240684Z test_consistency_SparseCSR_sign_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5240930Z test_consistency_SparseCSR_signbit_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5241112Z test_consistency_SparseCSR_signbit_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241295Z test_consistency_SparseCSR_signbit_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241464Z test_consistency_SparseCSR_signbit_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241639Z test_consistency_SparseCSR_signbit_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241814Z test_consistency_SparseCSR_signbit_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5241990Z test_consistency_SparseCSR_signbit_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242166Z test_consistency_SparseCSR_signbit_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242349Z test_consistency_SparseCSR_signbit_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242518Z test_consistency_SparseCSR_signbit_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5242686Z test_consistency_SparseCSR_sin_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5242856Z test_consistency_SparseCSR_sin_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5243030Z test_consistency_SparseCSR_sin_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243205Z test_consistency_SparseCSR_sin_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243403Z test_consistency_SparseCSR_sin_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243601Z test_consistency_SparseCSR_sin_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243804Z test_consistency_SparseCSR_sin_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5243976Z test_consistency_SparseCSR_sin_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244147Z test_consistency_SparseCSR_sin_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244319Z test_consistency_SparseCSR_sin_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244488Z test_consistency_SparseCSR_sin_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244654Z test_consistency_SparseCSR_sin_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5244826Z test_consistency_SparseCSR_sin_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5244999Z test_consistency_SparseCSR_sinh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245173Z test_consistency_SparseCSR_sinh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5245353Z test_consistency_SparseCSR_sinh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245535Z test_consistency_SparseCSR_sinh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245717Z test_consistency_SparseCSR_sinh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5245892Z test_consistency_SparseCSR_sinh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5246063Z test_consistency_SparseCSR_sinh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5246225Z test_consistency_SparseCSR_sinh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246395Z test_consistency_SparseCSR_sinh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246589Z test_consistency_SparseCSR_sinh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5246764Z test_consistency_SparseCSR_sinh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5246931Z test_consistency_SparseCSR_sinh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5247102Z test_consistency_SparseCSR_sinh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247273Z test_consistency_SparseCSR_sqrt_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247444Z test_consistency_SparseCSR_sqrt_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247615Z test_consistency_SparseCSR_sqrt_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247797Z test_consistency_SparseCSR_sqrt_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5247974Z test_consistency_SparseCSR_sqrt_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248150Z test_consistency_SparseCSR_sqrt_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248326Z test_consistency_SparseCSR_sqrt_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5248496Z test_consistency_SparseCSR_sqrt_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5248666Z test_consistency_SparseCSR_sqrt_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5248834Z test_consistency_SparseCSR_sqrt_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249029Z test_consistency_SparseCSR_sqrt_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5249191Z test_consistency_SparseCSR_sqrt_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249361Z test_consistency_SparseCSR_sqrt_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249533Z test_consistency_SparseCSR_tan_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5249704Z test_consistency_SparseCSR_tan_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5249883Z test_consistency_SparseCSR_tan_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250058Z test_consistency_SparseCSR_tan_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250230Z test_consistency_SparseCSR_tan_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250402Z test_consistency_SparseCSR_tan_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250579Z test_consistency_SparseCSR_tan_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5250748Z test_consistency_SparseCSR_tan_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.002s) 2023-01-11T21:45:10.5250916Z test_consistency_SparseCSR_tan_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251086Z test_consistency_SparseCSR_tan_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251255Z test_consistency_SparseCSR_tan_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5251427Z test_consistency_SparseCSR_tan_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251595Z test_consistency_SparseCSR_tan_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5251772Z test_consistency_SparseCSR_tanh_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5251941Z test_consistency_SparseCSR_tanh_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252144Z test_consistency_SparseCSR_tanh_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5252314Z test_consistency_SparseCSR_tanh_cuda_complex32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252494Z test_consistency_SparseCSR_tanh_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5252669Z test_consistency_SparseCSR_tanh_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5252840Z test_consistency_SparseCSR_tanh_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253009Z test_consistency_SparseCSR_tanh_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253181Z test_consistency_SparseCSR_tanh_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253354Z test_consistency_SparseCSR_tanh_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5253530Z test_consistency_SparseCSR_tanh_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253717Z test_consistency_SparseCSR_tanh_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5253910Z test_consistency_SparseCSR_tanh_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5254096Z test_consistency_SparseCSR_to_sparse_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5254301Z test_consistency_SparseCSR_to_sparse_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5254689Z test_consistency_SparseCSR_to_sparse_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5254945Z test_consistency_SparseCSR_to_sparse_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255289Z test_consistency_SparseCSR_to_sparse_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255483Z test_consistency_SparseCSR_to_sparse_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255665Z test_consistency_SparseCSR_to_sparse_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5255836Z test_consistency_SparseCSR_to_sparse_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256014Z test_consistency_SparseCSR_to_sparse_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256184Z test_consistency_SparseCSR_to_sparse_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.003s)
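The test_consistency_SparseCSR_to_sparse_* cases compare Tensor.to_sparse applied to a CSR input against the strided reference. A small sketch of the round trip these tests exercise, under the assumption that to_sparse() maps a compressed input to the COO layout as in current PyTorch:

    import torch

    dense = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
    csr = dense.to_sparse_csr()   # strided -> CSR
    coo = csr.to_sparse()         # CSR -> COO, same entries
    assert coo.layout == torch.sparse_coo
    assert torch.equal(coo.to_dense(), dense)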
2023-01-11T21:45:10.5256360Z test_consistency_SparseCSR_to_sparse_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256534Z test_consistency_SparseCSR_to_sparse_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5256710Z test_consistency_SparseCSR_trunc_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5256885Z test_consistency_SparseCSR_trunc_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257056Z test_consistency_SparseCSR_trunc_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.003s) 2023-01-11T21:45:10.5257227Z test_consistency_SparseCSR_trunc_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257393Z test_consistency_SparseCSR_trunc_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257568Z test_consistency_SparseCSR_trunc_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257737Z test_consistency_SparseCSR_trunc_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5257914Z test_consistency_SparseCSR_trunc_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5258083Z test_consistency_SparseCSR_trunc_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.002s) 2023-01-11T21:45:10.5258286Z test_copy_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.030s) 2023-01-11T21:45:10.5258450Z test_copy_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5258707Z test_copy_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5258877Z test_copy_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259034Z test_copy_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259197Z test_copy_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259353Z test_copy_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259515Z test_copy_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259668Z test_copy_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259823Z test_copy_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5259980Z test_copy_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5260133Z test_copy_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.029s) 2023-01-11T21:45:10.5260289Z test_copy_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260444Z test_copy_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5260606Z test_copy_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260769Z test_copy_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5260962Z test_copy_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5261122Z test_copy_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5261287Z test_copy_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.020s) 2023-01-11T21:45:10.5261444Z test_copy_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261593Z test_copy_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261744Z test_copy_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5261904Z test_copy_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5262069Z test_copy_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262234Z test_copy_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262395Z test_copy_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5262560Z test_copy_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5262728Z test_copy_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.022s) 2023-01-11T21:45:10.5262894Z test_copy_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263046Z test_copy_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263210Z test_copy_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263366Z test_copy_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263528Z test_copy_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263685Z test_copy_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5263847Z test_copy_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264004Z test_copy_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5264203Z test_copy_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264356Z test_copy_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264524Z test_copy_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264692Z test_copy_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5264854Z test_copy_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265015Z test_copy_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265181Z test_copy_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265338Z test_copy_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.020s) 2023-01-11T21:45:10.5265498Z test_copy_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265647Z test_copy_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265805Z test_copy_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5265961Z test_copy_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.021s) 2023-01-11T21:45:10.5266134Z test_copy_errors_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.039s) 2023-01-11T21:45:10.5266302Z test_copy_errors_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.038s) 2023-01-11T21:45:10.5266478Z test_copy_errors_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5266680Z test_copy_errors_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5266851Z test_copy_errors_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267022Z test_copy_errors_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267187Z test_copy_errors_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267357Z test_copy_errors_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267524Z test_copy_errors_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267688Z test_copy_errors_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5267856Z test_copy_errors_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268024Z test_copy_errors_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268202Z test_copy_errors_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268372Z test_copy_errors_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268540Z test_copy_errors_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268715Z test_copy_errors_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5268883Z test_copy_errors_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269053Z test_copy_errors_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269219Z test_copy_errors_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269386Z test_copy_errors_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269554Z test_copy_errors_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269719Z test_copy_errors_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5269909Z test_copy_errors_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5270068Z test_copy_errors_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.038s) 2023-01-11T21:45:10.5270242Z test_copy_errors_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270410Z test_copy_errors_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270584Z test_copy_errors_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270758Z test_copy_errors_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5270932Z test_copy_errors_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271101Z test_copy_errors_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271268Z test_copy_errors_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.026s) 2023-01-11T21:45:10.5271433Z test_copy_errors_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271591Z test_copy_errors_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271752Z test_copy_errors_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5271918Z test_copy_errors_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272082Z test_copy_errors_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272254Z test_copy_errors_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272443Z test_copy_errors_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272619Z test_copy_errors_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272789Z test_copy_errors_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5272950Z test_copy_errors_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273118Z test_copy_errors_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273294Z test_copy_errors_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273482Z test_copy_errors_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273663Z test_copy_errors_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273829Z test_copy_errors_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5273994Z test_copy_errors_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5274158Z test_copy_errors_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5274336Z test_dim_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274507Z test_dim_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274683Z test_dim_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5274854Z test_dim_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:45:10.5275023Z test_empty_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275191Z test_empty_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275363Z test_empty_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275559Z test_empty_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275723Z test_empty_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5275881Z test_empty_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276045Z test_empty_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276207Z test_empty_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.018s)
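Note that test_empty_* here covers only the CSR and CSC layouts, while the empty_like tests that follow cover all four compressed layouts. A hedged sketch of what these constructor tests exercise, assuming torch.empty accepts the sparse compressed layouts the way the passing results suggest (CPU used here for a self-contained run; the tests above run on CUDA):

    import torch

    # Assumption: torch.empty supports layout=torch.sparse_csr in this build,
    # producing a CSR tensor with no stored values yet.
    x = torch.empty((2, 3), layout=torch.sparse_csr)
    print(x.layout, x.shape)

    # Same-layout empty_like, as the later test_empty_like_* results cover.
    y = torch.empty_like(x)
    assert y.layout == torch.sparse_csr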
2023-01-11T21:45:10.5276366Z test_empty_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276529Z test_empty_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276688Z test_empty_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5276848Z test_empty_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277015Z test_empty_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277173Z test_empty_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277336Z test_empty_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.018s) 2023-01-11T21:45:10.5277504Z test_empty_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277666Z test_empty_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277830Z test_empty_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5277992Z test_empty_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278175Z test_empty_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278329Z test_empty_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278485Z test_empty_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278635Z test_empty_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278788Z test_empty_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.019s) 2023-01-11T21:45:10.5278960Z test_empty_errors_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279124Z test_empty_errors_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279294Z test_empty_errors_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279469Z test_empty_errors_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279643Z test_empty_errors_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279814Z test_empty_errors_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5279980Z test_empty_errors_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280141Z test_empty_errors_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280308Z test_empty_errors_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280471Z test_empty_errors_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280637Z test_empty_errors_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280803Z test_empty_errors_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5280975Z test_empty_errors_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281141Z test_empty_errors_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... 
ok (0.004s) 2023-01-11T21:45:10.5281353Z test_empty_errors_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281521Z test_empty_errors_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281694Z test_empty_errors_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5281861Z test_empty_errors_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282029Z test_empty_errors_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282195Z test_empty_errors_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282364Z test_empty_errors_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282530Z test_empty_errors_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282695Z test_empty_errors_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5282857Z test_empty_errors_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.004s) 2023-01-11T21:45:10.5283034Z test_empty_like_SparseBSC_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5283213Z test_empty_like_SparseBSC_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283400Z test_empty_like_SparseBSC_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5283585Z test_empty_like_SparseBSC_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283792Z test_empty_like_SparseBSC_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5283974Z test_empty_like_SparseBSC_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284156Z test_empty_like_SparseBSC_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284333Z test_empty_like_SparseBSC_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s) 2023-01-11T21:45:10.5284508Z test_empty_like_SparseBSC_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284673Z test_empty_like_SparseBSC_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5284849Z test_empty_like_SparseBSC_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5285020Z test_empty_like_SparseBSC_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s) 2023-01-11T21:45:10.5285203Z test_empty_like_SparseBSC_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285377Z test_empty_like_SparseBSC_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285562Z test_empty_like_SparseBSC_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285743Z test_empty_like_SparseBSC_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5285920Z test_empty_like_SparseBSC_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286087Z test_empty_like_SparseBSC_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.111s) 2023-01-11T21:45:10.5286265Z test_empty_like_SparseBSC_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286439Z test_empty_like_SparseBSC_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286612Z test_empty_like_SparseBSC_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286805Z test_empty_like_SparseBSC_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5286982Z test_empty_like_SparseBSC_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287159Z test_empty_like_SparseBSC_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287337Z test_empty_like_SparseBSC_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287512Z test_empty_like_SparseBSC_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287686Z test_empty_like_SparseBSC_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5287868Z test_empty_like_SparseBSC_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288051Z test_empty_like_SparseBSC_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288231Z test_empty_like_SparseBSC_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288406Z test_empty_like_SparseBSC_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288578Z test_empty_like_SparseBSC_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288753Z test_empty_like_SparseBSC_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5288924Z test_empty_like_SparseBSC_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289099Z test_empty_like_SparseBSC_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289289Z test_empty_like_SparseBSC_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289470Z test_empty_like_SparseBSC_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289646Z test_empty_like_SparseBSC_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5289830Z test_empty_like_SparseBSC_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290010Z test_empty_like_SparseBSC_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290186Z test_empty_like_SparseBSC_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290360Z test_empty_like_SparseBSC_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290532Z test_empty_like_SparseBSC_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290708Z test_empty_like_SparseBSC_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5290879Z test_empty_like_SparseBSC_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... 
2023-01-11T21:45:10.5291056Z test_empty_like_SparseBSC_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291233Z test_empty_like_SparseBSC_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291413Z test_empty_like_SparseBSC_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291591Z test_empty_like_SparseBSR_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291765Z test_empty_like_SparseBSR_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5291949Z test_empty_like_SparseBSR_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292133Z test_empty_like_SparseBSR_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292307Z test_empty_like_SparseBSR_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292502Z test_empty_like_SparseBSR_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292684Z test_empty_like_SparseBSR_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5292858Z test_empty_like_SparseBSR_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293030Z test_empty_like_SparseBSR_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293206Z test_empty_like_SparseBSR_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5293382Z test_empty_like_SparseBSR_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.112s)
2023-01-11T21:45:10.5293561Z test_empty_like_SparseBSR_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.112s)
2023-01-11T21:45:10.5293751Z test_empty_like_SparseBSR_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.026s)
2023-01-11T21:45:10.5293952Z test_empty_like_SparseBSR_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294142Z test_empty_like_SparseBSR_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294327Z test_empty_like_SparseBSR_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294652Z test_empty_like_SparseBSR_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5294868Z test_empty_like_SparseBSR_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295047Z test_empty_like_SparseBSR_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295278Z test_empty_like_SparseBSR_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295456Z test_empty_like_SparseBSR_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295627Z test_empty_like_SparseBSR_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295807Z test_empty_like_SparseBSR_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5295979Z test_empty_like_SparseBSR_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.025s)
2023-01-11T21:45:10.5296162Z test_empty_like_SparseBSR_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296338Z test_empty_like_SparseBSR_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5296530Z test_empty_like_SparseBSR_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296713Z test_empty_like_SparseBSR_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5296897Z test_empty_like_SparseBSR_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297076Z test_empty_like_SparseBSR_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297253Z test_empty_like_SparseBSR_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297426Z test_empty_like_SparseBSR_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297592Z test_empty_like_SparseBSR_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297764Z test_empty_like_SparseBSR_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5297944Z test_empty_like_SparseBSR_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298125Z test_empty_like_SparseBSR_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298335Z test_empty_like_SparseBSR_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298577Z test_empty_like_SparseBSR_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298777Z test_empty_like_SparseBSR_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5298957Z test_empty_like_SparseBSR_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299133Z test_empty_like_SparseBSR_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299301Z test_empty_like_SparseBSR_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299477Z test_empty_like_SparseBSR_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299648Z test_empty_like_SparseBSR_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299822Z test_empty_like_SparseBSR_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5299992Z test_empty_like_SparseBSR_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300166Z test_empty_like_SparseBSR_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300341Z test_empty_like_SparseBSR_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.111s)
2023-01-11T21:45:10.5300519Z test_empty_like_SparseCSC_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5300688Z test_empty_like_SparseCSC_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5300906Z test_empty_like_SparseCSC_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301090Z test_empty_like_SparseCSC_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301268Z test_empty_like_SparseCSC_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301442Z test_empty_like_SparseCSC_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301618Z test_empty_like_SparseCSC_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301792Z test_empty_like_SparseCSC_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5301962Z test_empty_like_SparseCSC_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302131Z test_empty_like_SparseCSC_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302300Z test_empty_like_SparseCSC_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302477Z test_empty_like_SparseCSC_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302658Z test_empty_like_SparseCSC_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5302829Z test_empty_like_SparseCSC_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303010Z test_empty_like_SparseCSC_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303189Z test_empty_like_SparseCSC_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303364Z test_empty_like_SparseCSC_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303544Z test_empty_like_SparseCSC_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303717Z test_empty_like_SparseCSC_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5303905Z test_empty_like_SparseCSC_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.108s)
2023-01-11T21:45:10.5304075Z test_empty_like_SparseCSC_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304249Z test_empty_like_SparseCSC_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304422Z test_empty_like_SparseCSC_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304595Z test_empty_like_SparseCSC_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5304771Z test_empty_like_SparseCSC_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5304950Z test_empty_like_SparseCSC_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305131Z test_empty_like_SparseCSC_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305317Z test_empty_like_SparseCSC_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305485Z test_empty_like_SparseCSC_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305663Z test_empty_like_SparseCSC_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5305837Z test_empty_like_SparseCSC_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306007Z test_empty_like_SparseCSC_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306176Z test_empty_like_SparseCSC_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306430Z test_empty_like_SparseCSC_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306622Z test_empty_like_SparseCSC_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306796Z test_empty_like_SparseCSC_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5306976Z test_empty_like_SparseCSC_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307141Z test_empty_like_SparseCSC_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307323Z test_empty_like_SparseCSC_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307505Z test_empty_like_SparseCSC_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307681Z test_empty_like_SparseCSC_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5307855Z test_empty_like_SparseCSC_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308031Z test_empty_like_SparseCSC_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308202Z test_empty_like_SparseCSC_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308371Z test_empty_like_SparseCSC_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308540Z test_empty_like_SparseCSC_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308709Z test_empty_like_SparseCSC_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5308885Z test_empty_like_SparseCSC_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309063Z test_empty_like_SparseCSR_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309236Z test_empty_like_SparseCSR_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309454Z test_empty_like_SparseCSR_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309636Z test_empty_like_SparseCSR_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309811Z test_empty_like_SparseCSR_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5309988Z test_empty_like_SparseCSR_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310164Z test_empty_like_SparseCSR_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310331Z test_empty_like_SparseCSR_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310506Z test_empty_like_SparseCSR_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310674Z test_empty_like_SparseCSR_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5310852Z test_empty_like_SparseCSR_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311020Z test_empty_like_SparseCSR_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311199Z test_empty_like_SparseCSR_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311373Z test_empty_like_SparseCSR_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311556Z test_empty_like_SparseCSR_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311738Z test_empty_like_SparseCSR_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5311935Z test_empty_like_SparseCSR_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.108s)
2023-01-11T21:45:10.5312108Z test_empty_like_SparseCSR_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312288Z test_empty_like_SparseCSR_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312459Z test_empty_like_SparseCSR_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312629Z test_empty_like_SparseCSR_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312799Z test_empty_like_SparseCSR_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5312973Z test_empty_like_SparseCSR_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313145Z test_empty_like_SparseCSR_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313325Z test_empty_like_SparseCSR_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313492Z test_empty_like_SparseCSR_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313674Z test_empty_like_SparseCSR_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5313854Z test_empty_like_SparseCSR_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314031Z test_empty_like_SparseCSR_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314203Z test_empty_like_SparseCSR_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314374Z test_empty_like_SparseCSR_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314547Z test_empty_like_SparseCSR_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314719Z test_empty_like_SparseCSR_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5314907Z test_empty_like_SparseCSR_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315081Z test_empty_like_SparseCSR_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315254Z test_empty_like_SparseCSR_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.109s)
2023-01-11T21:45:10.5315432Z test_empty_like_SparseCSR_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315603Z test_empty_like_SparseCSR_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315784Z test_empty_like_SparseCSR_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5315965Z test_empty_like_SparseCSR_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316144Z test_empty_like_SparseCSR_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316321Z test_empty_like_SparseCSR_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316491Z test_empty_like_SparseCSR_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316661Z test_empty_like_SparseCSR_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316829Z test_empty_like_SparseCSR_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5316998Z test_empty_like_SparseCSR_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317173Z test_empty_like_SparseCSR_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317375Z test_empty_like_SparseCSR_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.023s)
2023-01-11T21:45:10.5317577Z test_invalid_input_SparseBSC_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.034s)
2023-01-11T21:45:10.5317790Z test_invalid_input_SparseBSC_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5318009Z test_invalid_input_SparseBSC_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.024s)
2023-01-11T21:45:10.5318209Z test_invalid_input_SparseBSR_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.033s)
2023-01-11T21:45:10.5318409Z test_invalid_input_SparseBSR_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5318619Z test_invalid_input_SparseBSR_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.024s)
2023-01-11T21:45:10.5318822Z test_invalid_input_SparseCSC_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.030s)
2023-01-11T21:45:10.5319029Z test_invalid_input_SparseCSC_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5319244Z test_invalid_input_SparseCSC_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.022s)
2023-01-11T21:45:10.5319439Z test_invalid_input_SparseCSR_target_sparse_compressed_tensor_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.030s)
2023-01-11T21:45:10.5319645Z test_invalid_input_SparseCSR_target_sparse_compressed_tensor_no_size_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.020s)
2023-01-11T21:45:10.5319860Z test_invalid_input_SparseCSR_target_validate_sparse_compressed_tensor_args_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.022s)
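[Editor's annotation, not part of the captured log] The test_empty_like_* entries above exercise torch.empty_like over every ordered pair of compressed sparse layouts (CSR, CSC, BSR, BSC) and all dtypes, and the test_invalid_input_* entries exercise constructor argument validation. A minimal sketch of the kind of call under test, assuming a recent PyTorch build; the shapes and values are illustrative, not taken from the suite:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # A 2x3 CSR tensor: crow_indices has len(rows) + 1 entries and marks
    # where each row's column indices/values begin and end.
    crow = torch.tensor([0, 2, 3], device=device)
    col = torch.tensor([0, 2, 1], device=device)
    val = torch.tensor([1.0, 2.0, 3.0], device=device)
    csr = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))

    # empty_like preserves shape, dtype, device, and the compressed layout.
    out = torch.empty_like(csr)
    assert out.shape == csr.shape and out.layout == torch.sparse_csr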
2023-01-11T21:45:10.5320036Z test_layout_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:45:10.5320206Z test_layout_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320463Z test_layout_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320638Z test_layout_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... skip: Only runs on cpu (0.000s)
2023-01-11T21:45:10.5320804Z test_pickle_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.220s)
2023-01-11T21:45:10.5320968Z test_pickle_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.170s)
2023-01-11T21:45:10.5321132Z test_pickle_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.153s)
2023-01-11T21:45:10.5321288Z test_pickle_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5321446Z test_print_SparseBSC_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.253s)
2023-01-11T21:45:10.5321606Z test_print_SparseBSR_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.246s)
2023-01-11T21:45:10.5321754Z test_print_SparseCSC_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.223s)
2023-01-11T21:45:10.5321914Z test_print_SparseCSR_cuda (__main__.TestSparseCompressedCUDA) ... ok (0.223s)
2023-01-11T21:45:10.5322092Z test_select_copy_SparseBSC_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.722s)
2023-01-11T21:45:10.5322267Z test_select_copy_SparseBSC_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5322446Z test_select_copy_SparseBSC_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.728s)
2023-01-11T21:45:10.5322627Z test_select_copy_SparseBSC_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.735s)
2023-01-11T21:45:10.5322799Z test_select_copy_SparseBSC_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.722s)
2023-01-11T21:45:10.5322997Z test_select_copy_SparseBSC_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.723s)
2023-01-11T21:45:10.5323169Z test_select_copy_SparseBSC_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.718s)
2023-01-11T21:45:10.5323336Z test_select_copy_SparseBSC_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5323505Z test_select_copy_SparseBSC_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5323671Z test_select_copy_SparseBSC_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.666s)
2023-01-11T21:45:10.5323842Z test_select_copy_SparseBSC_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5324018Z test_select_copy_SparseBSC_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5324197Z test_select_copy_SparseBSC_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.721s)
2023-01-11T21:45:10.5324369Z test_select_copy_SparseBSC_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.673s)
2023-01-11T21:45:10.5324550Z test_select_copy_SparseBSC_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.724s)
2023-01-11T21:45:10.5324723Z test_select_copy_SparseBSC_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.732s)
2023-01-11T21:45:10.5324895Z test_select_copy_SparseBSC_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.720s)
2023-01-11T21:45:10.5325064Z test_select_copy_SparseBSC_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.724s)
2023-01-11T21:45:10.5325230Z test_select_copy_SparseBSC_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.714s)
2023-01-11T21:45:10.5325399Z test_select_copy_SparseBSC_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5325567Z test_select_copy_SparseBSC_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.675s)
2023-01-11T21:45:10.5325736Z test_select_copy_SparseBSC_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.667s)
2023-01-11T21:45:10.5325908Z test_select_copy_SparseBSC_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.674s)
2023-01-11T21:45:10.5326102Z test_select_copy_SparseBSC_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.673s)
2023-01-11T21:45:10.5326274Z test_select_copy_SparseBSR_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.640s)
2023-01-11T21:45:10.5326441Z test_select_copy_SparseBSR_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5326619Z test_select_copy_SparseBSR_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.643s)
2023-01-11T21:45:10.5326797Z test_select_copy_SparseBSR_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.650s)
2023-01-11T21:45:10.5326971Z test_select_copy_SparseBSR_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.639s)
2023-01-11T21:45:10.5327141Z test_select_copy_SparseBSR_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.640s)
2023-01-11T21:45:10.5327309Z test_select_copy_SparseBSR_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.633s)
2023-01-11T21:45:10.5327479Z test_select_copy_SparseBSR_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5327646Z test_select_copy_SparseBSR_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.591s)
2023-01-11T21:45:10.5327807Z test_select_copy_SparseBSR_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.583s)
2023-01-11T21:45:10.5327977Z test_select_copy_SparseBSR_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5328142Z test_select_copy_SparseBSR_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.592s)
2023-01-11T21:45:10.5328316Z test_select_copy_SparseBSR_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.624s)
2023-01-11T21:45:10.5328513Z test_select_copy_SparseBSR_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.577s)
2023-01-11T21:45:10.5328691Z test_select_copy_SparseBSR_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.630s)
2023-01-11T21:45:10.5328871Z test_select_copy_SparseBSR_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.637s)
2023-01-11T21:45:10.5329043Z test_select_copy_SparseBSR_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.628s)
2023-01-11T21:45:10.5329210Z test_select_copy_SparseBSR_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.628s)
2023-01-11T21:45:10.5329371Z test_select_copy_SparseBSR_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.620s)
2023-01-11T21:45:10.5329539Z test_select_copy_SparseBSR_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5329707Z test_select_copy_SparseBSR_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5329876Z test_select_copy_SparseBSR_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.572s)
2023-01-11T21:45:10.5330044Z test_select_copy_SparseBSR_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.578s)
2023-01-11T21:45:10.5330212Z test_select_copy_SparseBSR_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.579s)
2023-01-11T21:45:10.5330393Z test_select_copy_SparseCSC_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5330563Z test_select_copy_SparseCSC_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.537s)
2023-01-11T21:45:10.5330739Z test_select_copy_SparseCSC_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.590s)
2023-01-11T21:45:10.5330908Z test_select_copy_SparseCSC_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.596s)
2023-01-11T21:45:10.5331081Z test_select_copy_SparseCSC_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.585s)
2023-01-11T21:45:10.5331253Z test_select_copy_SparseCSC_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.586s)
2023-01-11T21:45:10.5331420Z test_select_copy_SparseCSC_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.577s)
2023-01-11T21:45:10.5331614Z test_select_copy_SparseCSC_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5331784Z test_select_copy_SparseCSC_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.537s)
2023-01-11T21:45:10.5331952Z test_select_copy_SparseCSC_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5332120Z test_select_copy_SparseCSC_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332280Z test_select_copy_SparseCSC_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332453Z test_select_copy_SparseCSC_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.583s)
2023-01-11T21:45:10.5332623Z test_select_copy_SparseCSC_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5332803Z test_select_copy_SparseCSC_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.587s)
2023-01-11T21:45:10.5332982Z test_select_copy_SparseCSC_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.597s)
2023-01-11T21:45:10.5333150Z test_select_copy_SparseCSC_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5333320Z test_select_copy_SparseCSC_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.584s)
2023-01-11T21:45:10.5333494Z test_select_copy_SparseCSC_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.575s)
2023-01-11T21:45:10.5333680Z test_select_copy_SparseCSC_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.534s)
2023-01-11T21:45:10.5333863Z test_select_copy_SparseCSC_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.536s)
2023-01-11T21:45:10.5334056Z test_select_copy_SparseCSC_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5334225Z test_select_copy_SparseCSC_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.534s)
2023-01-11T21:45:10.5334392Z test_select_copy_SparseCSC_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.535s)
2023-01-11T21:45:10.5334803Z test_select_copy_SparseCSR_int32_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.523s)
2023-01-11T21:45:10.5334974Z test_select_copy_SparseCSR_int32_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.474s)
2023-01-11T21:45:10.5335151Z test_select_copy_SparseCSR_int32_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.527s)
2023-01-11T21:45:10.5335326Z test_select_copy_SparseCSR_int32_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.535s)
2023-01-11T21:45:10.5335495Z test_select_copy_SparseCSR_int32_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.524s)
2023-01-11T21:45:10.5335661Z test_select_copy_SparseCSR_int32_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.523s)
2023-01-11T21:45:10.5335832Z test_select_copy_SparseCSR_int32_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.518s)
2023-01-11T21:45:10.5336004Z test_select_copy_SparseCSR_int32_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.475s)
2023-01-11T21:45:10.5336169Z test_select_copy_SparseCSR_int32_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.476s)
2023-01-11T21:45:10.5336335Z test_select_copy_SparseCSR_int32_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.468s)
2023-01-11T21:45:10.5336503Z test_select_copy_SparseCSR_int32_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.475s)
2023-01-11T21:45:10.5336669Z test_select_copy_SparseCSR_int32_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.474s)
2023-01-11T21:45:10.5336846Z test_select_copy_SparseCSR_int64_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.517s)
2023-01-11T21:45:10.5337017Z test_select_copy_SparseCSR_int64_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5337189Z test_select_copy_SparseCSR_int64_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.521s)
2023-01-11T21:45:10.5337415Z test_select_copy_SparseCSR_int64_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.528s)
2023-01-11T21:45:10.5337588Z test_select_copy_SparseCSR_int64_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.516s)
2023-01-11T21:45:10.5337759Z test_select_copy_SparseCSR_int64_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.518s)
2023-01-11T21:45:10.5337927Z test_select_copy_SparseCSR_int64_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.511s)
2023-01-11T21:45:10.5338097Z test_select_copy_SparseCSR_int64_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5338262Z test_select_copy_SparseCSR_int64_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.470s)
2023-01-11T21:45:10.5338435Z test_select_copy_SparseCSR_int64_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.462s)
2023-01-11T21:45:10.5338649Z test_select_copy_SparseCSR_int64_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.469s)
2023-01-11T21:45:10.5338812Z test_select_copy_SparseCSR_int64_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.470s)
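[Editor's annotation, not part of the captured log] The test_select_copy_* entries above, now complete for all four compressed layouts and both index dtypes (int32/int64), exercise torch.select_copy on compressed sparse inputs. A rough sketch of the operation, under the same assumptions as the previous note (illustrative values; the suite's actual checks compare against the densified tensor):

    import torch

    crow = torch.tensor([0, 2, 3])
    col = torch.tensor([0, 2, 1])
    val = torch.tensor([1.0, 2.0, 3.0])
    csr = torch.sparse_csr_tensor(crow, col, val, size=(2, 3))

    # select_copy returns an out-of-place copy of one slice along a dim;
    # the result's layout depends on the input layout and the chosen dim.
    row = torch.select_copy(csr, 0, 1)
    # Dense reference for the same slice.
    ref = csr.to_dense()[1]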
2023-01-11T21:45:10.5339021Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5339222Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5339432Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5339636Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5339874Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5340076Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5340277Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5340474Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5340666Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5340859Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5341055Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341257Z test_sparse_compressed_constructor_____from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341460Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5341656Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5341859Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342061Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5342258Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342455Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5342646Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5342864Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343066Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343262Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5343459Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343661Z test_sparse_compressed_constructor_____from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5343863Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5344063Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5344269Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5344479Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5344669Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5344866Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5345059Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5345284Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5345480Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.050s)
2023-01-11T21:45:10.5345669Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5345865Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346059Z test_sparse_compressed_constructor_____from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346256Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5346454Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5346653Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5346860Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5347058Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5347256Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5347449Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5347642Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5347841Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348056Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5348253Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348450Z test_sparse_compressed_constructor_____from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5348649Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5348848Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5349054Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5349263Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5349467Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5349671Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5349870Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5350066Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5350263Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5350484Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5350677Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5350876Z test_sparse_compressed_constructor_____from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5351078Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5351276Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5351484Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5351691Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5351891Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5352092Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5352293Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5352490Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5352681Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5352876Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5353078Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5353276Z test_sparse_compressed_constructor_____from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5353505Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s)
2023-01-11T21:45:10.5353710Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s)
2023-01-11T21:45:10.5353915Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5354116Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5354315Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5354516Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s)
2023-01-11T21:45:10.5354713Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s)
2023-01-11T21:45:10.5354906Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355105Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355298Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.096s)
2023-01-11T21:45:10.5355497Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355692Z test_sparse_compressed_constructor_____from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5355918Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5356120Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5356325Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.118s)
2023-01-11T21:45:10.5356528Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5356720Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5356916Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.116s)
2023-01-11T21:45:10.5357116Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s)
2023-01-11T21:45:10.5357313Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5357511Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.099s)
2023-01-11T21:45:10.5357704Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.096s)
2023-01-11T21:45:10.5357900Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
2023-01-11T21:45:10.5358094Z test_sparse_compressed_constructor_____from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s)
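[Editor's annotation, not part of the captured log] The two test_sparse_compressed_constructor_* blocks above parameterize construction from Python lists (____from_list) and from pre-made index/value tensors (____from_tensor) for each layout. A small sketch of tensor-based construction for the blocked layouts; the block size and values are made up for illustration:

    import torch

    # BSR: values holds one dense block per specified element, so with 2x2
    # blocks a (4, 4) tensor has 2 block rows and crow_indices of length 3.
    crow = torch.tensor([0, 1, 2])
    col = torch.tensor([1, 0])
    blocks = torch.ones(2, 2, 2)
    bsr = torch.sparse_bsr_tensor(crow, col, blocks, size=(4, 4))

    # The same index/value tensors, read as ccol/row indices, build the
    # column-compressed blocked variant.
    bsc = torch.sparse_bsc_tensor(crow, col, blocks, size=(4, 4))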
2023-01-11T21:45:10.5358304Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5358512Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5358758Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.063s)
2023-01-11T21:45:10.5358971Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5359178Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5359381Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5359582Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.060s)
2023-01-11T21:45:10.5359792Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360004Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360206Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5360410Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5360611Z test_sparse_compressed_constructor___factory_from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5360811Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361040Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.054s)
2023-01-11T21:45:10.5361250Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361461Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.064s)
2023-01-11T21:45:10.5361662Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5361862Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.062s)
2023-01-11T21:45:10.5362059Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5362262Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5362467Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5362671Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s)
2023-01-11T21:45:10.5362868Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5363067Z test_sparse_compressed_constructor___factory_from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.053s)
2023-01-11T21:45:10.5363273Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5363499Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5363736Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5363964Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5364167Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5364366Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5364564Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5364765Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5364964Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365164Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5365371Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365573Z test_sparse_compressed_constructor___factory_from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5365779Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5365981Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5366191Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5366424Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.061s)
2023-01-11T21:45:10.5366631Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5366831Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.059s)
2023-01-11T21:45:10.5367031Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.058s)
2023-01-11T21:45:10.5367225Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5367425Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5367632Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s)
2023-01-11T21:45:10.5367835Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
2023-01-11T21:45:10.5368035Z test_sparse_compressed_constructor___factory_from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.051s)
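[Editor's annotation, not part of the captured log] The ___factory_* variants above appear to route construction through the generic torch.sparse_compressed_tensor factory, which selects the layout via an explicit layout= keyword rather than a layout-specific function; treat this sketch as an assumption about the parameterization, not a statement of the suite's internals:

    import torch

    crow = torch.tensor([0, 2, 3])
    col = torch.tensor([0, 2, 1])
    val = torch.tensor([1.0, 2.0, 3.0])

    # One factory covers all four compressed layouts, chosen via layout=.
    csr = torch.sparse_compressed_tensor(
        crow, col, val, size=(2, 3), layout=torch.sparse_csr
    )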
2023-01-11T21:45:10.5368246Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5368452Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.102s)
2023-01-11T21:45:10.5368666Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5368876Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.124s)
2023-01-11T21:45:10.5369089Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5369312Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
2023-01-11T21:45:10.5369517Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.117s)
2023-01-11T21:45:10.5369725Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5369925Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370124Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s)
2023-01-11T21:45:10.5370331Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370538Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5370744Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5370947Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.103s)
2023-01-11T21:45:10.5371158Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.122s)
2023-01-11T21:45:10.5371361Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.125s)
2023-01-11T21:45:10.5371589Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.121s)
2023-01-11T21:45:10.5371794Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.120s)
ok (0.120s) 2023-01-11T21:45:10.5371995Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5372196Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5372396Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5372593Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.100s) 2023-01-11T21:45:10.5372795Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.103s) 2023-01-11T21:45:10.5373004Z test_sparse_compressed_constructor___factory_from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.102s) 2023-01-11T21:45:10.5373210Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5373413Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5373649Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5373877Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.120s) 2023-01-11T21:45:10.5374084Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5374293Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5374641Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5374846Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5375048Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5375243Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5375447Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5375654Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5375860Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5376064Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5376280Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.117s) 2023-01-11T21:45:10.5376491Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.119s) 2023-01-11T21:45:10.5376695Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5376895Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5377125Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.113s) 2023-01-11T21:45:10.5377329Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5377528Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5377727Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5377923Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5378126Z test_sparse_compressed_constructor___factory_from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5378376Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5378743Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5378993Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379238Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379479Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379716Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5379990Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380232Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380466Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380709Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5380949Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381183Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381433Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381674Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5381922Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382194Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382438Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382676Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5382909Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383146Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383378Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383622Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5383860Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384088Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384332Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384574Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5384824Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385092Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385334Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385568Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5385811Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386052Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386289Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386527Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5386763Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387000Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387263Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387505Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387748Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5387993Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388230Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388468Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388708Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5388944Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389174Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389410Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389649Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5389912Z test_sparse_compressed_constructor_shape_and_device_inference___from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5390139Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5390368Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5390602Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5390833Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.118s) 2023-01-11T21:45:10.5391065Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5391291Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5391521Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5391748Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5391965Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5392192Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5392452Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5392674Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5392896Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5393122Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5393353Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5393583Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5393810Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5394029Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.114s) 2023-01-11T21:45:10.5394254Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5394475Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5394692Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5394916Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5395175Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5395391Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5395621Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5395844Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5396070Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5396303Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5396523Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5396743Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5396967Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.108s) 2023-01-11T21:45:10.5397185Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5397428Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5397651Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5397873Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5398086Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5398312Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.110s) 2023-01-11T21:45:10.5398536Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5398764Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5398992Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5399215Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5399435Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5399659Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.108s) 2023-01-11T21:45:10.5399881Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5400121Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5400338Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.5400556Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5400772Z test_sparse_compressed_constructor_shape_and_device_inference___from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5401027Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401279Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401535Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5401784Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402033Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402286Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402606Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5402852Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403099Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403347Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403620Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5403886Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404386Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404638Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5404887Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405140Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405406Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405651Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5405895Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406137Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406381Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406628Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5406872Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407115Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407363Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407640Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5407892Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408381Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408630Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5408872Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409113Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409355Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409597Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5409842Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410094Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410356Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410610Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5410859Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411103Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... 
skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411351Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411605Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5411846Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412089Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412330Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412599Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5412844Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_list_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... skip: nothing to test (0.002s) 2023-01-11T21:45:10.5413081Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5413319Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5413559Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5413845Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5414082Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5414316Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5414662Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5414899Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5415138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5415377Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5415645Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... 
ok (0.098s) 2023-01-11T21:45:10.5415880Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5416115Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5416348Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5416580Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.116s) 2023-01-11T21:45:10.5416818Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.119s) 2023-01-11T21:45:10.5417051Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5417283Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.115s) 2023-01-11T21:45:10.5417516Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.112s) 2023-01-11T21:45:10.5417745Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5418001Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5418234Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.095s) 2023-01-11T21:45:10.5418467Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5418763Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5419025Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5419275Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5419566Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5419802Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5420034Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5420268Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... 
ok (0.110s) 2023-01-11T21:45:10.5420501Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.107s) 2023-01-11T21:45:10.5420730Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5420989Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5421216Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.090s) 2023-01-11T21:45:10.5421444Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5421676Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5421913Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5422138Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5422375Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.111s) 2023-01-11T21:45:10.5422613Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.114s) 2023-01-11T21:45:10.5422844Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.109s) 2023-01-11T21:45:10.5423075Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.110s) 2023-01-11T21:45:10.5423335Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.107s) 2023-01-11T21:45:10.5423565Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5423793Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424021Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5424248Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424480Z test_sparse_compressed_constructor_shape_and_device_inference_factory_from_tensor_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5424652Z test_to_dtype_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.166s) 2023-01-11T21:45:10.5424810Z test_to_dtype_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5424981Z test_to_dtype_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425148Z test_to_dtype_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.167s) 2023-01-11T21:45:10.5425313Z test_to_dtype_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425477Z test_to_dtype_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5425643Z test_to_dtype_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5425809Z test_to_dtype_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5425971Z test_to_dtype_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426144Z test_to_dtype_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426307Z test_to_dtype_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426472Z test_to_dtype_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5426636Z test_to_dtype_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.163s) 2023-01-11T21:45:10.5426796Z test_to_dtype_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.162s) 2023-01-11T21:45:10.5426965Z test_to_dtype_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427134Z test_to_dtype_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5427297Z test_to_dtype_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5427461Z test_to_dtype_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.166s) 2023-01-11T21:45:10.5427614Z test_to_dtype_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427778Z test_to_dtype_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5427939Z test_to_dtype_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.165s) 2023-01-11T21:45:10.5428099Z test_to_dtype_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.162s) 2023-01-11T21:45:10.5428260Z test_to_dtype_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5428418Z test_to_dtype_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.164s) 2023-01-11T21:45:10.5428612Z test_to_dtype_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5428772Z test_to_dtype_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5428938Z test_to_dtype_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429104Z test_to_dtype_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429264Z test_to_dtype_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.099s) 2023-01-11T21:45:10.5429424Z test_to_dtype_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5429585Z test_to_dtype_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.098s) 2023-01-11T21:45:10.5429743Z test_to_dtype_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5429904Z test_to_dtype_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.098s) 2023-01-11T21:45:10.5430065Z test_to_dtype_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.097s) 2023-01-11T21:45:10.5430221Z test_to_dtype_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5430382Z test_to_dtype_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.096s) 2023-01-11T21:45:10.5430546Z test_to_dtype_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5430707Z test_to_dtype_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5430876Z test_to_dtype_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.094s) 2023-01-11T21:45:10.5431045Z test_to_dtype_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431206Z test_to_dtype_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431369Z test_to_dtype_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.093s) 2023-01-11T21:45:10.5431527Z test_to_dtype_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5431716Z test_to_dtype_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5431879Z test_to_dtype_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5432034Z test_to_dtype_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5432194Z test_to_dtype_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.092s) 2023-01-11T21:45:10.5432356Z test_to_dtype_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.091s) 2023-01-11T21:45:10.5432523Z test_validate_SparseBSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5432686Z test_validate_SparseBSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5432862Z test_validate_SparseBSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433022Z test_validate_SparseBSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433188Z test_validate_SparseBSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433366Z test_validate_SparseBSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5433553Z test_validate_SparseBSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5433721Z test_validate_SparseBSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5433882Z test_validate_SparseBSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434041Z test_validate_SparseBSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5434227Z test_validate_SparseBSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434390Z test_validate_SparseBSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434551Z test_validate_SparseBSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... 
ok (0.052s) 2023-01-11T21:45:10.5434712Z test_validate_SparseBSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5434880Z test_validate_SparseBSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.053s) 2023-01-11T21:45:10.5435047Z test_validate_SparseBSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.054s) 2023-01-11T21:45:10.5435206Z test_validate_SparseBSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435369Z test_validate_SparseBSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435530Z test_validate_SparseBSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.052s) 2023-01-11T21:45:10.5435691Z test_validate_SparseBSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5435844Z test_validate_SparseBSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436003Z test_validate_SparseBSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5436163Z test_validate_SparseBSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436323Z test_validate_SparseBSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.046s) 2023-01-11T21:45:10.5436488Z test_validate_SparseCSC_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5436648Z test_validate_SparseCSC_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5436819Z test_validate_SparseCSC_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5436986Z test_validate_SparseCSC_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5437151Z test_validate_SparseCSC_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437328Z test_validate_SparseCSC_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437488Z test_validate_SparseCSC_cuda_float64 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5437647Z test_validate_SparseCSC_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5437806Z test_validate_SparseCSC_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5437966Z test_validate_SparseCSC_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5438123Z test_validate_SparseCSC_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438284Z test_validate_SparseCSC_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438452Z test_validate_SparseCSR_cuda_bfloat16 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5438609Z test_validate_SparseCSR_cuda_bool (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5438781Z test_validate_SparseCSR_cuda_complex128 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5438950Z test_validate_SparseCSR_cuda_complex64 (__main__.TestSparseCompressedCUDA) ... ok (0.049s) 2023-01-11T21:45:10.5439109Z test_validate_SparseCSR_cuda_float16 (__main__.TestSparseCompressedCUDA) ... ok (0.047s) 2023-01-11T21:45:10.5439274Z test_validate_SparseCSR_cuda_float32 (__main__.TestSparseCompressedCUDA) ... ok (0.048s) 2023-01-11T21:45:10.5439435Z test_validate_SparseCSR_cuda_float64 (__main__.TestSparseCompressedCUDA) ... 
ok (0.047s) 2023-01-11T21:45:10.5439598Z test_validate_SparseCSR_cuda_int16 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5439757Z test_validate_SparseCSR_cuda_int32 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5439946Z test_validate_SparseCSR_cuda_int64 (__main__.TestSparseCompressedCUDA) ... ok (0.041s) 2023-01-11T21:45:10.5440099Z test_validate_SparseCSR_cuda_int8 (__main__.TestSparseCompressedCUDA) ... ok (0.043s) 2023-01-11T21:45:10.5440264Z test_validate_SparseCSR_cuda_uint8 (__main__.TestSparseCompressedCUDA) ... ok (0.042s) 2023-01-11T21:45:10.5440272Z 2023-01-11T21:45:10.5440526Z ---------------------------------------------------------------------- 2023-01-11T21:45:10.5440616Z Ran 4617 tests in 409.424s 2023-01-11T21:45:10.5440622Z 2023-01-11T21:45:10.5440701Z OK (skipped=517) 2023-01-11T21:45:10.5440707Z 2023-01-11T21:45:10.5440794Z Generating XML reports... 2023-01-11T21:45:10.5441102Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20230111213819.xml 2023-01-11T21:45:10.5441406Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20230111213819.xml 2023-01-11T21:45:10.5441714Z Generated XML report: test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20230111213819.xml 2023-01-11T21:45:10.5441728Z 2023-01-11T21:45:10.5442098Z ##[endgroup] 2023-01-11T21:45:10.5442377Z FINISHED PRINTING LOG FILE of test_sparse_csr (/var/lib/jenkins/workspace/test/test-reports/test_sparse_csr_txl3rn3o) 2023-01-11T21:45:10.5442383Z
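For orientation: the TestSparseCompressedCUDA cases above parametrize a few constructor paths over the four compressed layouts (SparseCSR, SparseCSC, SparseBSR, SparseBSC) and the usual dtypes: building from Python lists ("factory_from_list"), building from pre-made index/value tensors ("factory_from_tensor"), shape-and-device inference, dtype conversion ("to_dtype"), and invariant validation ("validate"). Below is a minimal sketch of those paths through the public factory functions, run on a CUDA-enabled build; the shapes and values are made up for illustration and this is not code from test_sparse_csr.py.

import torch

# "factory_from_list": a 2x2 CSR tensor built directly from Python lists.
# Omitting size/dtype/device exercises the shape-and-device-inference paths.
a = torch.sparse_csr_tensor(
    [0, 1, 2],    # crow_indices: compressed row pointers, length nrows + 1
    [1, 0],       # col_indices: one column index per stored value
    [1.0, 2.0],   # values
    size=(2, 2),
    dtype=torch.float32,
    device="cuda",
)

# "factory_from_tensor": the same constructor fed pre-built tensors.
crow = torch.tensor([0, 1, 2], device="cuda")
col = torch.tensor([1, 0], device="cuda")
vals = torch.tensor([1.0, 2.0], device="cuda")
b = torch.sparse_csr_tensor(crow, col, vals, size=(2, 2))

# "to_dtype": conversion preserves the compressed layout.
assert a.to(torch.float64).layout == torch.sparse_csr

# "validate": recent builds expose eager checking of the compressed-index
# invariants (monotone crow_indices, in-range col_indices, matching lengths).
c = torch.sparse_csr_tensor(crow, col, vals, size=(2, 2), check_invariants=True)

The SparseCSC, SparseBSR and SparseBSC variants go through torch.sparse_csc_tensor, torch.sparse_bsr_tensor and torch.sparse_bsc_tensor; the blocked layouts take block-shaped values (e.g. shape (nnz, 2, 2)) in place of the flat values above.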
2023-01-11T21:45:10.5442581Z Running test_cpp_extensions_aot_no_ninja ... [2023-01-11 21:45:10.435941] 2023-01-11T21:45:12.0501046Z running install 2023-01-11T21:45:12.0501915Z /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. 2023-01-11T21:45:12.0502310Z warnings.warn( 2023-01-11T21:45:12.0604262Z running build 2023-01-11T21:45:12.0604828Z running build_py 2023-01-11T21:45:12.0645343Z creating build 2023-01-11T21:45:12.0646159Z creating build/lib.linux-x86_64-cpython-310 2023-01-11T21:45:12.0646948Z creating build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension 2023-01-11T21:45:12.0647818Z copying torch_test_cpp_extension/__init__.py -> build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension 2023-01-11T21:45:12.0648389Z running build_ext 2023-01-11T21:45:12.0676497Z building 'torch_test_cpp_extension.cpp' extension 2023-01-11T21:45:12.0677302Z creating build/temp.linux-x86_64-cpython-310 2023-01-11T21:45:12.0681062Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c extension.cpp -o build/temp.linux-x86_64-cpython-310/extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cpp -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:12.9823337Z In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/Exceptions.h:14:0, 2023-01-11T21:45:12.9824153Z from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/python.h:11, 2023-01-11T21:45:12.9824693Z from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:6, 2023-01-11T21:45:12.9824978Z from extension.cpp:1: 2023-01-11T21:45:12.9826394Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h: In instantiation of ‘class pybind11::class_<MatrixMultiplier>’: 2023-01-11T21:45:12.9826747Z extension.cpp:40:53: required from here 2023-01-11T21:45:12.9827480Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<MatrixMultiplier>’ declared with greater visibility than the type of its field ‘pybind11::class_<MatrixMultiplier>::<anonymous>’ [-Wattributes] 2023-01-11T21:45:12.9828251Z class class_ : public detail::generic_type { 2023-01-11T21:45:12.9828470Z ^~~~~~ 2023-01-11T21:45:12.9829088Z /opt/conda/lib/python3.10/site-packages/torch/include/pybind11/pybind11.h:1479:7: warning: ‘pybind11::class_<MatrixMultiplier>’ declared with greater visibility than its base ‘pybind11::detail::generic_type’ [-Wattributes] 2023-01-11T21:45:12.9833682Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cpp.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:13.3393235Z building 'torch_test_cpp_extension.ort' extension 2023-01-11T21:45:13.3395304Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include
-I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c ort_extension.cpp -o build/temp.linux-x86_64-cpython-310/ort_extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=ort -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:14.3347780Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/ort_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/ort.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:14.6620768Z building 'torch_test_cpp_extension.rng' extension 2023-01-11T21:45:14.6622466Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c rng_extension.cpp -o build/temp.linux-x86_64-cpython-310/rng_extension.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=rng -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17 2023-01-11T21:45:15.8051636Z In file included from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec256/vec256.h:8:0, 2023-01-11T21:45:15.8052946Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec.h:6, 2023-01-11T21:45:15.8053858Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/native/cpu/Loops.h:37, 2023-01-11T21:45:15.8055024Z from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/native/cpu/DistributionTemplates.h:8, 2023-01-11T21:45:15.8055368Z from rng_extension.cpp:6: 2023-01-11T21:45:15.8055816Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1008:0: warning: ignoring #pragma unroll [-Wunknown-pragmas] 2023-01-11T21:45:15.8056303Z # pragma unroll 2023-01-11T21:45:15.8056485Z 2023-01-11T21:45:15.8059702Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/rng_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/rng.cpython-310-x86_64-linux-gnu.so 2023-01-11T21:45:16.1552202Z building 'torch_test_cpp_extension.cuda' extension 2023-01-11T21:45:16.1554042Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include 
2023-01-11T21:45:17.1416779Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cuda_extension_kernel.cu -o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:45:20.8485706Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:20.8486109Z           detected during:
2023-01-11T21:45:20.8486771Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:20.8487173Z (61): here
2023-01-11T21:45:20.8487711Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:20.8488243Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:20.8488442Z
2023-01-11T21:45:20.8989327Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:20.8989687Z           detected during:
2023-01-11T21:45:20.8990436Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:20.8990851Z (61): here
2023-01-11T21:45:20.8991396Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:20.8992198Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:20.8992403Z
2023-01-11T21:45:27.1778869Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:27.1779753Z           detected during:
2023-01-11T21:45:27.1781170Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:27.1781956Z (61): here
2023-01-11T21:45:27.1783029Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:27.1784097Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:27.1784478Z
2023-01-11T21:45:27.2285697Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:27.2286648Z           detected during:
2023-01-11T21:45:27.2288038Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:27.2288597Z (61): here
2023-01-11T21:45:27.2289140Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:27.2289685Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:27.2289889Z
2023-01-11T21:45:36.3905525Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cuda_extension_kernel2.cu -o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel2.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cuda -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:45:40.0969467Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:40.0970417Z           detected during:
2023-01-11T21:45:40.0971639Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:40.0972095Z (61): here
2023-01-11T21:45:40.0972634Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:40.0973424Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:40.0973620Z
2023-01-11T21:45:40.1473095Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:40.1473657Z           detected during:
2023-01-11T21:45:40.1474301Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:40.1474707Z (61): here
2023-01-11T21:45:40.1475260Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:40.1475790Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:40.1475995Z
2023-01-11T21:45:46.4372198Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:46.4372581Z           detected during:
2023-01-11T21:45:46.4373373Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:46.4373770Z (61): here
2023-01-11T21:45:46.4374314Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:45:46.4375075Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:45:46.4375272Z
2023-01-11T21:45:46.4874350Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:45:46.4874719Z           detected during:
2023-01-11T21:45:46.4875451Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:46.4875856Z (61): here
2023-01-11T21:45:46.4876408Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:45:46.4876943Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:45:46.4877149Z
2023-01-11T21:45:55.6657231Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cuda_extension.o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel.o build/temp.linux-x86_64-cpython-310/cuda_extension_kernel2.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cuda.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:45:56.0536909Z building 'torch_test_cpp_extension.torch_library' extension
2023-01-11T21:45:56.0538848Z /usr/local/cuda/bin/nvcc -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c torch_library.cu -o build/temp.linux-x86_64-cpython-310/torch_library.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O2 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=torch_library -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_52,code=compute_52 -gencode=arch=compute_52,code=sm_52 -std=c++17
2023-01-11T21:46:03.4096591Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:03.4097450Z           detected during:
2023-01-11T21:46:03.4098795Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:03.4099362Z (61): here
2023-01-11T21:46:03.4099962Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:03.4100489Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:46:03.4100692Z
2023-01-11T21:46:03.5384066Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:03.5384949Z           detected during:
2023-01-11T21:46:03.5386146Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:03.5387305Z (61): here
2023-01-11T21:46:03.5388413Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:03.5389020Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:46:03.5389221Z
2023-01-11T21:46:27.2342296Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:27.2342888Z           detected during:
2023-01-11T21:46:27.2343891Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:27.2344577Z (61): here
2023-01-11T21:46:27.2345495Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=size_t, one_sided=false, <anonymous>=0]"
2023-01-11T21:46:27.2346483Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/TensorImpl.h(77): here
2023-01-11T21:46:27.2346793Z
2023-01-11T21:46:27.3632420Z /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
2023-01-11T21:46:27.3633008Z           detected during:
2023-01-11T21:46:27.3633966Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator==(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:27.3634816Z (61): here
2023-01-11T21:46:27.3635386Z             instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, <anonymous>>::operator!=(const c10::detail::integer_iterator<I, one_sided, <anonymous>> &) const [with I=std::size_t, one_sided=true, <anonymous>=0]"
2023-01-11T21:46:27.3635913Z /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/qualified_name.h(73): here
2023-01-11T21:46:27.3636115Z
2023-01-11T21:46:59.6070642Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/torch_library.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/torch_library.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:46:59.8416846Z building 'torch_test_cpp_extension.cublas_extension' extension
2023-01-11T21:46:59.8418538Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cublas_extension.cpp -o build/temp.linux-x86_64-cpython-310/cublas_extension.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cublas_extension -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:00.6737421Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cublas_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lcublas -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cublas_extension.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:47:00.9284969Z building 'torch_test_cpp_extension.cusolver_extension' extension
2023-01-11T21:47:00.9286899Z gcc -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -Iself_compiler_include_dirs_test -I/opt/conda/include/python3.10 -c cusolver_extension.cpp -o build/temp.linux-x86_64-cpython-310/cusolver_extension.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=cusolver_extension -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:01.7701225Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib build/temp.linux-x86_64-cpython-310/cusolver_extension.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lcusolver -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cusolver_extension.cpython-310-x86_64-linux-gnu.so
2023-01-11T21:47:02.0159500Z running install_lib
2023-01-11T21:47:02.0207028Z creating install
2023-01-11T21:47:02.0207296Z creating install/opt
2023-01-11T21:47:02.0207593Z creating install/opt/conda
2023-01-11T21:47:02.0207928Z creating install/opt/conda/lib
2023-01-11T21:47:02.0208267Z creating install/opt/conda/lib/python3.10
2023-01-11T21:47:02.0208735Z creating install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:02.0212991Z creating install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0213641Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/__init__.py -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0214284Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cpp.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0269192Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/ort.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0325243Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/rng.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0382134Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cuda.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0432693Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/torch_library.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0434212Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cublas_extension.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0436097Z copying build/lib.linux-x86_64-cpython-310/torch_test_cpp_extension/cusolver_extension.cpython-310-x86_64-linux-gnu.so -> ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension
2023-01-11T21:47:02.0443178Z byte-compiling ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension/__init__.py to __init__.cpython-310.pyc
2023-01-11T21:47:02.0444169Z running install_egg_info
2023-01-11T21:47:02.0541780Z running egg_info
2023-01-11T21:47:02.0542119Z creating torch_test_cpp_extension.egg-info
2023-01-11T21:47:02.0575720Z writing torch_test_cpp_extension.egg-info/PKG-INFO
2023-01-11T21:47:02.0576228Z writing dependency_links to torch_test_cpp_extension.egg-info/dependency_links.txt
2023-01-11T21:47:02.0578619Z writing top-level names to torch_test_cpp_extension.egg-info/top_level.txt
2023-01-11T21:47:02.0579178Z writing manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0616893Z reading manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0624084Z writing manifest file 'torch_test_cpp_extension.egg-info/SOURCES.txt'
2023-01-11T21:47:02.0625112Z Copying torch_test_cpp_extension.egg-info to ./install/opt/conda/lib/python3.10/site-packages/torch_test_cpp_extension-0.0.0-py3.10.egg-info
2023-01-11T21:47:02.0626413Z running install_scripts
2023-01-11T21:47:03.9471606Z running install
2023-01-11T21:47:03.9472336Z /opt/conda/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
2023-01-11T21:47:03.9472726Z   warnings.warn(
2023-01-11T21:47:03.9567733Z running build
2023-01-11T21:47:03.9568134Z running build_ext
2023-01-11T21:47:03.9846487Z building 'no_python_abi_suffix_test' extension
2023-01-11T21:47:03.9847013Z creating /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build
2023-01-11T21:47:03.9847801Z creating /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310
2023-01-11T21:47:04.0116079Z Emitting ninja build file /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/build.ninja...
2023-01-11T21:47:04.0116580Z Compiling objects...
2023-01-11T21:47:04.0116828Z Using envvar MAX_JOBS (14) as the number of workers...
2023-01-11T21:47:04.0712693Z [1/1] c++ -MMD -MF /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o.d -pthread -B /opt/conda/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /opt/conda/include -fPIC -O2 -isystem /opt/conda/include -fPIC -I/opt/conda/lib/python3.10/site-packages/torch/include -I/opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.10/site-packages/torch/include/TH -I/opt/conda/lib/python3.10/site-packages/torch/include/THC -I/opt/conda/include/python3.10 -c -c /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/no_python_abi_suffix_test.cpp -o /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=no_python_abi_suffix_test -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
2023-01-11T21:47:04.0754193Z creating build/lib.linux-x86_64-cpython-310
2023-01-11T21:47:04.0756579Z g++ -pthread -B /opt/conda/compiler_compat -shared -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib -Wl,-rpath,/opt/conda/lib -Wl,-rpath-link,/opt/conda/lib -L/opt/conda/lib /var/lib/jenkins/workspace/test/cpp_extensions/no_python_abi_suffix_test/build/temp.linux-x86_64-cpython-310/no_python_abi_suffix_test.o -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch -ltorch_cpu -ltorch_python -o build/lib.linux-x86_64-cpython-310/no_python_abi_suffix_test.so
2023-01-11T21:47:04.1243306Z running install_lib
2023-01-11T21:47:04.1282339Z creating install
2023-01-11T21:47:04.1282879Z creating install/opt
2023-01-11T21:47:04.1283139Z creating install/opt/conda
2023-01-11T21:47:04.1283410Z creating install/opt/conda/lib
2023-01-11T21:47:04.1283645Z creating install/opt/conda/lib/python3.10
2023-01-11T21:47:04.1284123Z creating install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:04.1284653Z copying build/lib.linux-x86_64-cpython-310/no_python_abi_suffix_test.so -> ./install/opt/conda/lib/python3.10/site-packages
2023-01-11T21:47:04.1290275Z running install_egg_info
2023-01-11T21:47:04.1384153Z running egg_info
2023-01-11T21:47:04.1384626Z creating no_python_abi_suffix_test.egg-info
2023-01-11T21:47:04.1417881Z writing no_python_abi_suffix_test.egg-info/PKG-INFO
2023-01-11T21:47:04.1418915Z writing dependency_links to no_python_abi_suffix_test.egg-info/dependency_links.txt
2023-01-11T21:47:04.1422603Z writing top-level names to no_python_abi_suffix_test.egg-info/top_level.txt
2023-01-11T21:47:04.1423137Z writing manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1457122Z reading manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1463381Z writing manifest file 'no_python_abi_suffix_test.egg-info/SOURCES.txt'
2023-01-11T21:47:04.1464041Z Copying no_python_abi_suffix_test.egg-info to ./install/opt/conda/lib/python3.10/site-packages/no_python_abi_suffix_test-0.0.0-py3.10.egg-info
2023-01-11T21:47:04.1466224Z running install_scripts
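The point of this second build is the artifact name: the g++ -o argument above produces no_python_abi_suffix_test.so rather than no_python_abi_suffix_test.cpython-310-x86_64-linux-gnu.so, which is what test_no_python_abi_suffix_sets_the_correct_library_name later verifies. A hedged sketch of the setup() call that requests this behavior (file names mirror the log but are assumptions):

    # Hedged sketch; not the actual test script.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CppExtension

    setup(
        name="no_python_abi_suffix_test",
        ext_modules=[
            CppExtension("no_python_abi_suffix_test",
                         sources=["no_python_abi_suffix_test.cpp"]),
        ],
        # Drops the ".cpython-310-x86_64-linux-gnu" ABI tag from the .so name.
        cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
    )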
2023-01-11T21:47:04.4664431Z Executing ['/opt/conda/bin/python', '-bb', 'test_cpp_extensions_aot_no_ninja.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:04.465798]
2023-01-11T21:47:08.5466928Z
2023-01-11T21:47:08.5467645Z Expand the folded group to see the log file of test_cpp_extensions_aot_no_ninja
2023-01-11T21:47:08.5468885Z ##[group]PRINTING LOG FILE of test_cpp_extensions_aot_no_ninja (/var/lib/jenkins/workspace/test/test-reports/test_cpp_extensions_aot_no_ninja_hsmbdv80)
2023-01-11T21:47:08.5469334Z
2023-01-11T21:47:08.5469449Z Running tests...
2023-01-11T21:47:08.5470019Z ----------------------------------------------------------------------
2023-01-11T21:47:08.5472014Z Test results will be stored in test-reports/python-unittest/test_cpp_extensions_aot_no_ninja
2023-01-11T21:47:08.5472716Z test_backward (__main__.TestCppExtensionAOT) ... ok (0.011s)
2023-01-11T21:47:08.5473095Z test_cublas_extension (__main__.TestCppExtensionAOT) ... ok (0.653s)
2023-01-11T21:47:08.5473655Z test_cuda_dlink_libs (__main__.TestCppExtensionAOT) ... skip: cuda extension with dlink requires ninja to build (0.001s)
2023-01-11T21:47:08.5474018Z test_cuda_extension (__main__.TestCppExtensionAOT) ... ok (0.070s)
2023-01-11T21:47:08.5474325Z test_cusolver_extension (__main__.TestCppExtensionAOT) ... ok (0.060s)
2023-01-11T21:47:08.5474646Z test_extension_function (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5474958Z test_extension_module (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5475309Z test_no_python_abi_suffix_sets_the_correct_library_name (__main__.TestCppExtensionAOT) ... ok (0.001s)
2023-01-11T21:47:08.5476548Z test_optional (__main__.TestCppExtensionAOT) ... ok (0.000s)
2023-01-11T21:47:08.5476826Z test_add (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5477196Z test_conv_backend_override (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5477619Z test_unregistered (__main__.TestORTTensor) ... ok (0.005s)
2023-01-11T21:47:08.5477885Z test_zeros (__main__.TestORTTensor) ... ok (0.001s)
2023-01-11T21:47:08.5478181Z test_pybind_return_types (__main__.TestPybindTypeCasters) ... ok (0.001s)
2023-01-11T21:47:08.5478483Z test_rng (__main__.TestRNGExtension) ... ok (0.002s)
2023-01-11T21:47:08.5478764Z test_torch_library (__main__.TestTorchLibrary) ... ok (0.045s)
2023-01-11T21:47:08.5478923Z
2023-01-11T21:47:08.5479153Z ----------------------------------------------------------------------
2023-01-11T21:47:08.5480569Z Ran 16 tests in 0.883s
2023-01-11T21:47:08.5480759Z
2023-01-11T21:47:08.5481032Z OK (skipped=1)
2023-01-11T21:47:08.5481230Z
2023-01-11T21:47:08.5481363Z Generating XML reports...
2023-01-11T21:47:08.5482066Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestCppExtensionAOT-20230111214707.xml
2023-01-11T21:47:08.5482637Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestORTTensor-20230111214707.xml
2023-01-11T21:47:08.5483204Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestPybindTypeCasters-20230111214707.xml
2023-01-11T21:47:08.5483764Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestRNGExtension-20230111214707.xml
2023-01-11T21:47:08.5484315Z Generated XML report: test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestTorchLibrary-20230111214707.xml
2023-01-11T21:47:08.5484555Z
2023-01-11T21:47:08.5484845Z ##[endgroup]
2023-01-11T21:47:08.5485296Z FINISHED PRINTING LOG FILE of test_cpp_extensions_aot_no_ninja (/var/lib/jenkins/workspace/test/test-reports/test_cpp_extensions_aot_no_ninja_hsmbdv80)
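What the TestCppExtensionAOT cases above exercise is simply importing the ahead-of-time-built package and calling into its pybind11 bindings. A minimal sketch of that pattern, assuming the sigmoid_add binding that extension.cpp is conventionally given in these tests (the binding name and its semantics are assumptions here):

    # Hedged sketch of the AOT usage pattern, not the actual test body.
    import torch  # import torch first so the extension can resolve libtorch symbols
    import torch_test_cpp_extension.cpp as cpp_extension

    x = torch.randn(4)
    y = torch.randn(4)
    z = cpp_extension.sigmoid_add(x, y)  # C++ kernel bound via pybind11
    assert torch.allclose(z, x.sigmoid() + y.sigmoid())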
2023-01-11T21:47:08.5485551Z
2023-01-11T21:47:08.5485738Z Running test_cuda_nvml_based_avail ... [2023-01-11 21:47:08.546680]
2023-01-11T21:47:08.5486256Z Executing ['/opt/conda/bin/python', '-bb', 'test_cuda_nvml_based_avail.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:08.546872]
2023-01-11T21:47:19.7502141Z
2023-01-11T21:47:19.7502719Z Expand the folded group to see the log file of test_cuda_nvml_based_avail
2023-01-11T21:47:19.7503684Z ##[group]PRINTING LOG FILE of test_cuda_nvml_based_avail (/var/lib/jenkins/workspace/test/test-reports/test_cuda_nvml_based_avail_1g39c3hx)
2023-01-11T21:47:19.7504457Z
2023-01-11T21:47:19.7505703Z <unittest.suite.TestSuite tests=[<__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_0>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_1>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_False_avoid_init_None>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_0>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_1>, <__main__.TestExtendedCUDAIsAvail testMethod=test_cuda_is_available_nvml_avail_True_avoid_init_None>]>
2023-01-11T21:47:19.7506749Z test_cuda_is_available_nvml_avail_False_avoid_init_0 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7507195Z test_cuda_is_available_nvml_avail_False_avoid_init_1 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7507691Z test_cuda_is_available_nvml_avail_False_avoid_init_None (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7508106Z test_cuda_is_available_nvml_avail_True_avoid_init_0 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7508674Z test_cuda_is_available_nvml_avail_True_avoid_init_1 (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7509243Z test_cuda_is_available_nvml_avail_True_avoid_init_None (__main__.TestExtendedCUDAIsAvail)
2023-01-11T21:47:19.7509918Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7510368Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7511149Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7511743Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7512018Z
2023-01-11T21:47:19.7512147Z Running tests...
2023-01-11T21:47:19.7512656Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7513296Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7513841Z test_cuda_is_available_nvml_avail_False_avoid_init_0 (__main__.TestExtendedCUDAIsAvail) ... ok (0.032s)
2023-01-11T21:47:19.7514191Z
2023-01-11T21:47:19.7514588Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7515004Z Ran 1 test in 0.032s
2023-01-11T21:47:19.7515201Z
2023-01-11T21:47:19.7515311Z OK
2023-01-11T21:47:19.7515483Z
2023-01-11T21:47:19.7515645Z Generating XML reports...
2023-01-11T21:47:19.7516243Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214711.xml
2023-01-11T21:47:19.7517136Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7517517Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7518298Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7518825Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7519142Z
2023-01-11T21:47:19.7519287Z Running tests...
2023-01-11T21:47:19.7519802Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7520289Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7520675Z test_cuda_is_available_nvml_avail_False_avoid_init_1 (__main__.TestExtendedCUDAIsAvail) ... ok (0.029s)
2023-01-11T21:47:19.7520888Z
2023-01-11T21:47:19.7521091Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7521350Z Ran 1 test in 0.029s
2023-01-11T21:47:19.7521475Z
2023-01-11T21:47:19.7521539Z OK
2023-01-11T21:47:19.7521642Z
2023-01-11T21:47:19.7521736Z Generating XML reports...
2023-01-11T21:47:19.7522264Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214712.xml
2023-01-11T21:47:19.7522810Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7523158Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7523603Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7523972Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7524148Z
2023-01-11T21:47:19.7524222Z Running tests...
2023-01-11T21:47:19.7524541Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7524948Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7525336Z test_cuda_is_available_nvml_avail_False_avoid_init_None (__main__.TestExtendedCUDAIsAvail) ... ok (0.029s)
2023-01-11T21:47:19.7525544Z
2023-01-11T21:47:19.7525743Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7526001Z Ran 1 test in 0.029s
2023-01-11T21:47:19.7526121Z
2023-01-11T21:47:19.7526191Z OK
2023-01-11T21:47:19.7526293Z
2023-01-11T21:47:19.7526380Z Generating XML reports...
2023-01-11T21:47:19.7526857Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214714.xml
2023-01-11T21:47:19.7527397Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7527754Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7528195Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7528564Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7539869Z
2023-01-11T21:47:19.7539984Z Running tests...
2023-01-11T21:47:19.7540336Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7540755Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7541244Z test_cuda_is_available_nvml_avail_True_avoid_init_0 (__main__.TestExtendedCUDAIsAvail) ... ok (0.024s)
2023-01-11T21:47:19.7541459Z
2023-01-11T21:47:19.7541654Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7541917Z Ran 1 test in 0.024s
2023-01-11T21:47:19.7542040Z
2023-01-11T21:47:19.7542111Z OK
2023-01-11T21:47:19.7542212Z
2023-01-11T21:47:19.7542307Z Generating XML reports...
2023-01-11T21:47:19.7542762Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214715.xml
2023-01-11T21:47:19.7543315Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7543677Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7544116Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7544495Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7544671Z
2023-01-11T21:47:19.7544752Z Running tests...
2023-01-11T21:47:19.7545068Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7545466Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7545848Z test_cuda_is_available_nvml_avail_True_avoid_init_1 (__main__.TestExtendedCUDAIsAvail) ... ok (0.013s)
2023-01-11T21:47:19.7546060Z
2023-01-11T21:47:19.7546262Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7546514Z Ran 1 test in 0.013s
2023-01-11T21:47:19.7546635Z
2023-01-11T21:47:19.7546755Z OK
2023-01-11T21:47:19.7546859Z
2023-01-11T21:47:19.7546957Z Generating XML reports...
2023-01-11T21:47:19.7547417Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214717.xml
2023-01-11T21:47:19.7547957Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests
2023-01-11T21:47:19.7548316Z   warnings.warn(f"loaded {len(slow_tests_dict)} slow tests")
2023-01-11T21:47:19.7548764Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests
2023-01-11T21:47:19.7549128Z   warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests")
2023-01-11T21:47:19.7549306Z
2023-01-11T21:47:19.7549404Z Running tests...
2023-01-11T21:47:19.7549754Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7550162Z Test results will be stored in test-reports/python-unittest/test_cuda_nvml_based_avail
2023-01-11T21:47:19.7550546Z test_cuda_is_available_nvml_avail_True_avoid_init_None (__main__.TestExtendedCUDAIsAvail) ... ok (0.028s)
2023-01-11T21:47:19.7550758Z
2023-01-11T21:47:19.7550956Z ----------------------------------------------------------------------
2023-01-11T21:47:19.7551218Z Ran 1 test in 0.028s
2023-01-11T21:47:19.7551342Z
2023-01-11T21:47:19.7551403Z OK
2023-01-11T21:47:19.7551508Z
2023-01-11T21:47:19.7551601Z Generating XML reports...
2023-01-11T21:47:19.7552061Z Generated XML report: test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214719.xml
2023-01-11T21:47:19.7552327Z
2023-01-11T21:47:19.7552646Z ##[endgroup]
2023-01-11T21:47:19.7553077Z FINISHED PRINTING LOG FILE of test_cuda_nvml_based_avail (/var/lib/jenkins/workspace/test/test-reports/test_cuda_nvml_based_avail_1g39c3hx)
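Each block above runs in its own subprocess (the --subprocess flag in the Executing line), because the property under test is process-global: whether torch.cuda.is_available() answers via an NVML query instead of initializing a CUDA context. A hedged sketch of what one parametrization does, assuming only the documented PYTORCH_NVML_BASED_CUDA_CHECK switch; the exact test body is not reproduced here:

    # Hedged sketch; must run in a fresh process, as the harness above arranges.
    import os
    os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"] = "1"  # opt in before torch is imported

    import torch

    assert torch.cuda.is_available()        # answered via NVML
    assert not torch.cuda.is_initialized()  # no CUDA context was created ("avoid_init")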
2023-01-11T21:47:19.7553323Z
2023-01-11T21:47:19.7553495Z Running test_dispatch ... [2023-01-11 21:47:19.750185]
2023-01-11T21:47:19.7553978Z Executing ['/opt/conda/bin/python', '-bb', 'test_dispatch.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:47:19.750383]
2023-01-11T21:48:07.9746943Z
2023-01-11T21:48:07.9747629Z Expand the folded group to see the log file of test_dispatch
2023-01-11T21:48:07.9748839Z ##[group]PRINTING LOG FILE of test_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_dispatch_riow3h5c)
2023-01-11T21:48:07.9749103Z
2023-01-11T21:48:07.9749190Z Running tests...
2023-01-11T21:48:07.9749670Z ----------------------------------------------------------------------
2023-01-11T21:48:07.9750209Z Test results will be stored in test-reports/python-unittest/test_dispatch
2023-01-11T21:48:07.9750622Z test_all_invariants (__main__.TestDispatch) ... ok (1.121s)
2023-01-11T21:48:07.9751027Z test_computed_table (__main__.TestDispatch) ... ok (8.818s)
2023-01-11T21:48:07.9751449Z test_computed_table_with_ambiguous_autogradother (__main__.TestDispatch) ... ok (0.013s)
2023-01-11T21:48:07.9754850Z test_computed_table_with_autograd (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9755351Z test_computed_table_with_cpu_autograd_defaultbackend (__main__.TestDispatch) ... ok (0.244s)
2023-01-11T21:48:07.9755911Z test_computed_table_with_cpu_autograd_math (__main__.TestDispatch) ... ok (0.264s)
2023-01-11T21:48:07.9756509Z test_computed_table_with_cpu_autograd_math_defaultbackend (__main__.TestDispatch) ... ok (8.273s)
2023-01-11T21:48:07.9757049Z test_computed_table_with_cpu_defaultbackend (__main__.TestDispatch) ... ok (0.012s)
2023-01-11T21:48:07.9757431Z test_computed_table_with_cpu_math (__main__.TestDispatch) ... ok (0.013s)
2023-01-11T21:48:07.9757879Z test_computed_table_with_cpu_math_autogradcpu_fallthrough (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9758397Z test_computed_table_with_math (__main__.TestDispatch) ... ok (0.002s)
2023-01-11T21:48:07.9758699Z test_def (__main__.TestDispatch) ... ok (8.489s)
2023-01-11T21:48:07.9758977Z test_def_impl_schema_mismatch (__main__.TestDispatch) ... ok (0.010s)
2023-01-11T21:48:07.9759388Z test_def_only (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9760335Z test_def_with_explicit_alias (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9760659Z test_def_with_inference (__main__.TestDispatch) ... ok (0.282s)
2023-01-11T21:48:07.9761126Z test_dispatch_print_registrations_for_dispatch_key_invalid (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9761539Z test_find_dangling_impls (__main__.TestDispatch) ... ok (0.001s)
2023-01-11T21:48:07.9762010Z test_find_dangling_impls_ext (__main__.TestDispatch) ... Using /var/lib/jenkins/.cache/torch_extensions/py310_cu116 as PyTorch extensions root...
2023-01-11T21:48:07.9762430Z Creating extension directory /var/lib/jenkins/.cache/torch_extensions/py310_cu116/dangling_impl_extension...
2023-01-11T21:48:07.9762880Z Emitting ninja build file /var/lib/jenkins/.cache/torch_extensions/py310_cu116/dangling_impl_extension/build.ninja...
2023-01-11T21:48:07.9763457Z Building extension module dangling_impl_extension...
2023-01-11T21:48:07.9763745Z Using envvar MAX_JOBS (14) as the number of workers...
2023-01-11T21:48:07.9765672Z [1/2] c++ -MMD -MF dangling_impl_extension.o.d -DTORCH_EXTENSION_NAME=dangling_impl_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/conda/lib/python3.10/site-packages/torch/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /opt/conda/lib/python3.10/site-packages/torch/include/TH -isystem /opt/conda/lib/python3.10/site-packages/torch/include/THC -isystem /opt/conda/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -g -c /var/lib/jenkins/workspace/test/cpp_extensions/dangling_impl_extension.cpp -o dangling_impl_extension.o
2023-01-11T21:48:07.9767323Z [2/2] c++ dangling_impl_extension.o -shared -L/opt/conda/lib/python3.10/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o dangling_impl_extension.so
2023-01-11T21:48:07.9767862Z Loading extension module dangling_impl_extension...
2023-01-11T21:48:07.9768130Z ok (18.290s)
2023-01-11T21:48:07.9768386Z test_impl_only (__main__.TestDispatch) ... ok (0.270s)
2023-01-11T21:48:07.9768843Z test_multiple_def_alias_defaulting (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9769239Z test_multiple_def_alias_mismatch (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9769648Z test_multiple_def_error (__main__.TestDispatch) ... ok (0.007s)
2023-01-11T21:48:07.9770047Z test_multiple_fallback (__main__.TestDispatch) ... ok (0.008s)
2023-01-11T21:48:07.9770608Z test_overwrite_math (__main__.TestDispatch) ... [W OperatorEntry.cpp:159] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
2023-01-11T21:48:07.9771126Z   operator: __test45643__::foo
2023-01-11T21:48:07.9771400Z   no debug info
2023-01-11T21:48:07.9771651Z   dispatch key: (catch all)
2023-01-11T21:48:07.9771871Z   previous kernel: fn1
2023-01-11T21:48:07.9772106Z   new kernel: fn2 (function registerKernel)
2023-01-11T21:48:07.9772453Z [W OperatorEntry.cpp:159] Warning: Overriding a previously registered kernel for the same operator and the same dispatch key
2023-01-11T21:48:07.9772769Z   operator: __test45644__::foo
2023-01-11T21:48:07.9772965Z   no debug info
2023-01-11T21:48:07.9773153Z   dispatch key: (catch all)
2023-01-11T21:48:07.9773362Z   previous kernel: fn1
2023-01-11T21:48:07.9773592Z   new kernel: fn2 (function registerKernel)
2023-01-11T21:48:07.9773796Z ok (0.001s)
2023-01-11T21:48:07.9774052Z test_autogradother (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9774356Z test_basic (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9774931Z test_defaultbackend_autogradcpu (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9775258Z test_defaultbackend_math (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9775722Z test_duplicate_registrations (__main__.TestPythonDispatcher) ... ok (0.000s)
2023-01-11T21:48:07.9776063Z test_math_autogradcpu (__main__.TestPythonDispatcher) ... ok (0.001s)
2023-01-11T21:48:07.9776398Z test_quantized_structured_not_implemented (__main__.TestPythonDispatcher) ... ok (0.028s)
2023-01-11T21:48:07.9776600Z
2023-01-11T21:48:07.9776824Z ----------------------------------------------------------------------
2023-01-11T21:48:07.9777084Z Ran 32 tests in 46.172s
2023-01-11T21:48:07.9777207Z
2023-01-11T21:48:07.9777277Z OK
2023-01-11T21:48:07.9777369Z
2023-01-11T21:48:07.9777462Z Generating XML reports...
2023-01-11T21:48:07.9777868Z Generated XML report: test-reports/python-unittest/test_dispatch/TEST-TestDispatch-20230111214721.xml
2023-01-11T21:48:07.9778390Z Generated XML report: test-reports/python-unittest/test_dispatch/TEST-TestPythonDispatcher-20230111214721.xml
2023-01-11T21:48:07.9778639Z
2023-01-11T21:48:07.9778893Z ##[endgroup]
2023-01-11T21:48:07.9779380Z FINISHED PRINTING LOG FILE of test_dispatch (/var/lib/jenkins/workspace/test/test-reports/test_dispatch_riow3h5c)
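One detail worth noting in the group above: test_find_dangling_impls_ext compiles its helper extension just in time, which is where the "Emitting ninja build file ... Loading extension module dangling_impl_extension" lines come from. That path is torch.utils.cpp_extension.load(); a hedged sketch, with the source path mirroring the log but treated as an assumption:

    # Hedged sketch of the JIT build path seen in test_find_dangling_impls_ext.
    from torch.utils.cpp_extension import load

    module = load(
        name="dangling_impl_extension",
        sources=["cpp_extensions/dangling_impl_extension.cpp"],
        verbose=True,  # prints the "Emitting ninja build file ..." lines seen above
    )

The build is cached under ~/.cache/torch_extensions and parallelized by ninja; the "Using envvar MAX_JOBS (14)" line shows this job capping the worker count via the MAX_JOBS environment variable.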
2023-01-11T21:48:07.9779602Z
2023-01-11T21:48:07.9779771Z Running test_linalg ... [2023-01-11 21:48:07.974645]
2023-01-11T21:48:07.9780237Z Executing ['/opt/conda/bin/python', '-bb', 'test_linalg.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:48:07.974838]
2023-01-11T21:53:15.8029629Z
2023-01-11T21:53:15.8032703Z Expand the folded group to see the log file of test_linalg
2023-01-11T21:53:15.8047619Z ##[group]PRINTING LOG FILE of test_linalg (/var/lib/jenkins/workspace/test/test-reports/test_linalg_lzuigkoh)
2023-01-11T21:53:15.8047841Z
2023-01-11T21:53:15.8049779Z Running tests...
2023-01-11T21:53:15.8050487Z ----------------------------------------------------------------------
2023-01-11T21:53:15.8051017Z Test results will be stored in test-reports/python-unittest/test_linalg
2023-01-11T21:53:15.8051347Z test_addbmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (1.152s)
2023-01-11T21:53:15.8051680Z test_addbmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.525s)
2023-01-11T21:53:15.8051979Z test_addbmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.104s)
2023-01-11T21:53:15.8097072Z test_addbmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.003s)
2023-01-11T21:53:15.8097436Z test_addbmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.476s)
2023-01-11T21:53:15.8097753Z test_addmm_activation_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8098083Z test_addmm_activation_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8098399Z test_addmm_activation_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8098721Z test_addmm_baddbmm_overflow_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8099045Z test_addmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8099354Z test_addmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8099662Z test_addmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.030s)
2023-01-11T21:53:15.8099968Z test_addmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8100352Z test_addmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8100654Z test_addmm_sizes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.218s)
2023-01-11T21:53:15.8100969Z test_addmm_sizes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.434s)
2023-01-11T21:53:15.8101281Z test_addmm_sizes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.429s)
2023-01-11T21:53:15.8101589Z test_addmm_sizes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.216s)
2023-01-11T21:53:15.8101883Z test_addmv_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8102181Z test_addmv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8103446Z test_addmv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.021s)
2023-01-11T21:53:15.8104143Z test_addmv_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8104557Z test_addmv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8104877Z test_addmv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.020s)
2023-01-11T21:53:15.8105198Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8105553Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8105908Z test_addmv_rowmajor_colmajor_incx_incy_lda_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s)
2023-01-11T21:53:15.8106304Z test_addr_bool_cuda_bool (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8106757Z test_addr_float_and_complex_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8107218Z test_addr_float_and_complex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8107688Z test_addr_float_and_complex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8108143Z test_addr_float_and_complex_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8108618Z test_addr_float_and_complex_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8109099Z test_addr_float_and_complex_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s)
2023-01-11T21:53:15.8109439Z test_addr_integral_cuda_int16 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8109776Z test_addr_integral_cuda_int32 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110076Z test_addr_integral_cuda_int64 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110370Z test_addr_integral_cuda_int8 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110667Z test_addr_integral_cuda_uint8 (__main__.TestLinalgCUDA) ... ok (0.022s)
2023-01-11T21:53:15.8110958Z test_addr_type_promotion_cuda (__main__.TestLinalgCUDA) ... ok (0.330s)
2023-01-11T21:53:15.8111257Z test_baddbmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.995s)
2023-01-11T21:53:15.8111557Z test_baddbmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.049s)
2023-01-11T21:53:15.8111927Z test_baddbmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (2.191s)
2023-01-11T21:53:15.8112230Z test_baddbmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.994s)
2023-01-11T21:53:15.8112523Z test_baddbmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.938s)
2023-01-11T21:53:15.8144992Z test_blas_alpha_beta_empty_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8145355Z test_blas_alpha_beta_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8145690Z test_blas_alpha_beta_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146024Z test_blas_alpha_beta_empty_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146361Z test_blas_alpha_beta_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8146679Z test_blas_alpha_beta_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.003s)
2023-01-11T21:53:15.8147065Z test_blas_empty_cuda (__main__.TestLinalgCUDA) ... ok (0.014s)
2023-01-11T21:53:15.8147388Z test_blas_mv_large_input_cuda (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:53:15.8148164Z test_blas_nan_out_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4652: UserWarning: An output with one or more elements was resized since it had shape [7], which does not match the required output shape [5]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8149038Z   self.assertEqual(torch.mv(nm, _m), torch.mv(nm, _m, out=_m_out))
2023-01-11T21:53:15.8149276Z ok (0.003s)
2023-01-11T21:53:15.8149653Z test_blas_nan_out_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8149971Z test_blas_nan_out_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150277Z test_blas_nan_out_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150577Z test_blas_nan_out_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8150865Z test_blas_nan_out_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8151607Z test_bmm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8152274Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8152933Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8153547Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8154234Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [1, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8154841Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8155486Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 23, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8156101Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8156749Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8157349Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8157978Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8158603Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8159237Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [1, 23, 12], which does not match the required output shape [0, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8159833Z   torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8160481Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8161098Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8161746Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8162366Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8163030Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [10, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8163641Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8164280Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 23, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8164885Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8165534Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 23, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8166129Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8166766Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 0, 12]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8167390Z torch.bmm(b1, b2, out=res2) 2023-01-11T21:53:15.8168022Z /var/lib/jenkins/workspace/test/test_linalg.py:5739: UserWarning: An output with one or more elements was resized since it had shape [10, 23, 12], which does not match the required output shape [0, 0, 0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). 
(Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8168637Z torch.bmm(b1, b2, out=res2)
2023-01-11T21:53:15.8168862Z ok (15.701s)
2023-01-11T21:53:15.8169107Z test_bmm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (15.734s)
2023-01-11T21:53:15.8169398Z test_bmm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (31.414s)
2023-01-11T21:53:15.8169694Z test_bmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (31.163s)
2023-01-11T21:53:15.8169979Z test_bmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (15.592s)
2023-01-11T21:53:15.8170714Z test_broadcast_batched_matmul_cuda (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [3, 8], which does not match the required output shape [3, 8, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8171381Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8172053Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [3, 1], which does not match the required output shape [3, 1, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8172665Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8173297Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [1, 2], which does not match the required output shape [1, 2, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8173920Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8174852Z /var/lib/jenkins/workspace/test/test_linalg.py:6856: UserWarning: An output with one or more elements was resized since it had shape [1, 6], which does not match the required output shape [1, 1, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8175489Z torch.matmul(l, r, out=out)
2023-01-11T21:53:15.8175674Z ok (0.048s)
2023-01-11T21:53:15.8175924Z test_broadcast_fused_matmul_cuda (__main__.TestLinalgCUDA) ... ok (0.003s)
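The repeated UserWarnings above are all one deprecation: passing an out= tensor whose shape does not match the result, which PyTorch currently resizes implicitly. A minimal sketch of the reuse pattern the warning text itself recommends, assuming only a recent PyTorch build; the shapes and names (b1, b2, out) are illustrative and not taken from the test source:

    import torch

    b1 = torch.randn(10, 3, 4)
    b2 = torch.randn(10, 4, 5)

    # An out tensor whose shape does not match the result (10, 3, 5)
    # triggers the implicit-resize warning seen in this log.
    out = torch.empty(10, 23, 12)

    # Pattern recommended by the warning: resize the tensor to zero
    # elements in place before reuse; zero-element outputs may always
    # be resized, so no warning is emitted.
    out.resize_(0)
    torch.bmm(b1, b2, out=out)
    print(out.shape)  # torch.Size([10, 3, 5])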
2023-01-11T21:53:15.8176943Z test_chain_matmul_cuda_float64 (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/functional.py:1626: UserWarning: torch.chain_matmul is deprecated and will be removed in a future PyTorch release. Use torch.linalg.multi_dot instead, which accepts a list of two or more tensors rather than multiple parameters. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/LinearAlgebra.cpp:1077.)
2023-01-11T21:53:15.8177700Z return _VF.chain_matmul(matrices) # type: ignore[attr-defined]
2023-01-11T21:53:15.8177933Z ok (0.005s)
2023-01-11T21:53:15.8178191Z test_cholesky_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.159s)
2023-01-11T21:53:15.8178503Z test_cholesky_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s)
2023-01-11T21:53:15.8178812Z test_cholesky_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s)
2023-01-11T21:53:15.8179107Z test_cholesky_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.011s)
2023-01-11T21:53:15.8179436Z test_cholesky_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.040s)
2023-01-11T21:53:15.8179792Z test_cholesky_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180243Z test_cholesky_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180592Z test_cholesky_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8180914Z test_cholesky_ex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8181231Z test_cholesky_ex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8181533Z test_cholesky_ex_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8181834Z test_cholesky_ex_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8182149Z test_cholesky_ex_non_pd_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8182470Z test_cholesky_ex_non_pd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8182797Z test_cholesky_ex_non_pd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8183109Z test_cholesky_ex_non_pd_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8183496Z test_cholesky_inverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.222s)
2023-01-11T21:53:15.8183812Z test_cholesky_inverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.032s)
2023-01-11T21:53:15.8184126Z test_cholesky_inverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8184442Z test_cholesky_inverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.027s)
2023-01-11T21:53:15.8184778Z test_cholesky_inverse_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185143Z test_cholesky_inverse_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185506Z test_cholesky_inverse_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8185864Z test_cholesky_inverse_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s)
2023-01-11T21:53:15.8186224Z test_cholesky_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8186593Z test_cholesky_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8186957Z test_cholesky_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8187310Z test_cholesky_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s)
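The torch.chain_matmul deprecation warning above names its replacement directly. A minimal sketch of that migration, assuming only PyTorch itself; the operand names A, B, C are illustrative:

    import torch

    A = torch.randn(3, 4, dtype=torch.double)
    B = torch.randn(4, 5, dtype=torch.double)
    C = torch.randn(5, 6, dtype=torch.double)

    # Deprecated form: two or more tensors as separate parameters.
    old = torch.chain_matmul(A, B, C)

    # Replacement named in the warning: a single list of tensors.
    new = torch.linalg.multi_dot([A, B, C])

    # Both pick an efficient multiplication order; results agree up
    # to floating-point round-off.
    assert torch.allclose(old, new)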
2023-01-11T21:53:15.8187824Z test_cholesky_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:2483: UserWarning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release.
2023-01-11T21:53:15.8188236Z L = torch.cholesky(A)
2023-01-11T21:53:15.8188520Z should be replaced with
2023-01-11T21:53:15.8188806Z L = torch.linalg.cholesky(A)
2023-01-11T21:53:15.8189029Z and
2023-01-11T21:53:15.8189224Z U = torch.cholesky(A, upper=True)
2023-01-11T21:53:15.8189436Z should be replaced with
2023-01-11T21:53:15.8189658Z U = torch.linalg.cholesky(A).mH().
2023-01-11T21:53:15.8190083Z This transform will produce equivalent results for all valid (symmetric positive definite) inputs. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:1730.)
2023-01-11T21:53:15.8190481Z L = torch.cholesky(A, upper=upper)
2023-01-11T21:53:15.8190682Z ok (0.007s)
2023-01-11T21:53:15.8190963Z test_cholesky_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8191300Z test_cholesky_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8191621Z test_cholesky_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8192034Z test_cholesky_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8192516Z test_cholesky_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8192990Z test_cholesky_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8193443Z test_cholesky_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8193832Z test_cholesky_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194152Z test_cholesky_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194467Z test_cholesky_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8194771Z test_cholesky_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8195152Z test_cholesky_solve_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8195520Z test_cholesky_solve_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8195882Z test_cholesky_solve_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8196238Z test_cholesky_solve_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8196564Z test_cond_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.254s)
2023-01-11T21:53:15.8196864Z test_cond_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.065s)
2023-01-11T21:53:15.8197148Z test_cond_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.061s)
2023-01-11T21:53:15.8197435Z test_cond_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.113s)
2023-01-11T21:53:15.8197761Z test_cond_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.047s)
2023-01-11T21:53:15.8198092Z test_cond_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ...
ok (0.046s) 2023-01-11T21:53:15.8198430Z test_cond_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.046s) 2023-01-11T21:53:15.8198781Z test_cond_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.046s) 2023-01-11T21:53:15.8199149Z test_corner_cases_of_cublasltmatmul_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.316s) 2023-01-11T21:53:15.8199495Z test_corner_cases_of_cublasltmatmul_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.451s) 2023-01-11T21:53:15.8199846Z test_corner_cases_of_cublasltmatmul_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.390s) 2023-01-11T21:53:15.8200192Z test_corner_cases_of_cublasltmatmul_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.318s) 2023-01-11T21:53:15.8200567Z test_corner_cases_of_cublasltmatmul_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.309s) 2023-01-11T21:53:15.8200904Z test_corner_cases_of_cublasltmatmul_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.385s) 2023-01-11T21:53:15.8201231Z test_cross_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8201528Z test_cross_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8201843Z test_cross_with_and_without_dim_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8202183Z test_cross_with_and_without_dim_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8202501Z test_det_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8202794Z test_det_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8203102Z test_det_logdet_slogdet_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.188s) 2023-01-11T21:53:15.8203588Z test_det_logdet_slogdet_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6457: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8204137Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T21:53:15.8204401Z Q, R = torch.qr(A, some) 2023-01-11T21:53:15.8204611Z should be replaced with 2023-01-11T21:53:15.8205089Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2459.) 2023-01-11T21:53:15.8205437Z q, _ = torch.qr(mat) 2023-01-11T21:53:15.8205618Z ok (2.621s) 2023-01-11T21:53:15.8205861Z test_dot_invalid_args_cuda (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8206173Z test_dot_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8206479Z test_dot_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8206863Z test_eig_check_magma_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8207246Z test_eig_compare_backends_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8207620Z test_eig_compare_backends_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8207948Z test_eig_compare_backends_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8208278Z test_eig_compare_backends_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8208618Z test_eig_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... 
ok (0.036s) 2023-01-11T21:53:15.8208954Z test_eig_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.036s) 2023-01-11T21:53:15.8209289Z test_eig_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.047s) 2023-01-11T21:53:15.8209624Z test_eig_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8209941Z test_eig_numpy_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8210244Z test_eig_numpy_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8210553Z test_eig_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8210864Z test_eig_with_nan_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8211173Z test_eig_with_nan_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8211483Z test_eig_with_nan_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8211788Z test_eig_with_nan_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8212092Z test_eigh_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.034s) 2023-01-11T21:53:15.8212389Z test_eigh_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.034s) 2023-01-11T21:53:15.8212719Z test_eigh_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8213013Z test_eigh_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8213327Z test_eigh_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8213674Z test_eigh_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214012Z test_eigh_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214349Z test_eigh_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8214914Z test_eigh_lower_uplo_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8215325Z test_eigh_lower_uplo_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8215710Z test_eigh_lower_uplo_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8216012Z test_eigh_lower_uplo_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8216339Z test_eigvals_compare_backends_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8216679Z test_eigvals_compare_backends_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8217017Z test_eigvals_compare_backends_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8217335Z test_eigvals_compare_backends_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8217668Z test_eigvals_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8218008Z test_eigvals_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8218334Z test_eigvals_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8218686Z test_eigvals_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8219033Z test_eigvals_numpy_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8219337Z test_eigvals_numpy_cuda_float64 (__main__.TestLinalgCUDA) ... 
ok (0.006s) 2023-01-11T21:53:15.8219634Z test_eigvalsh_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8219995Z test_eigvalsh_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8220400Z test_eigvalsh_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8220700Z test_eigvalsh_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8221226Z test_eigvalsh_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1042: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.) 2023-01-11T21:53:15.8221695Z out = torch.empty_like(t).to(real_dtype) 2023-01-11T21:53:15.8221911Z ok (0.029s) 2023-01-11T21:53:15.8222170Z test_eigvalsh_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8222510Z test_eigvalsh_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8222852Z test_eigvalsh_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8223159Z test_einsum_corner_cases_cuda (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8223464Z test_einsum_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8223761Z test_einsum_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8224062Z test_einsum_error_cases_cuda (__main__.TestLinalgCUDA) ... ok (0.036s) 2023-01-11T21:53:15.8224367Z test_einsum_random_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (3.122s) 2023-01-11T21:53:15.8224678Z test_einsum_random_cuda_float64 (__main__.TestLinalgCUDA) ... ok (2.959s) 2023-01-11T21:53:15.8225003Z test_einsum_sublist_format_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8225365Z test_einsum_sublist_format_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8225672Z test_geqrf_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8225971Z test_geqrf_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8226259Z test_geqrf_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8226535Z test_geqrf_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8226843Z test_householder_product_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8227174Z test_householder_product_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8227491Z test_householder_product_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8227802Z test_householder_product_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8228136Z test_householder_product_errors_and_warnings_cuda (__main__.TestLinalgCUDA) ... ok (0.032s) 2023-01-11T21:53:15.8228454Z test_inner_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8228738Z test_inner_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8229046Z test_inv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.050s) 2023-01-11T21:53:15.8229378Z test_inv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8229694Z test_inv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... 
ok (0.049s) 2023-01-11T21:53:15.8230013Z test_inv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.049s) 2023-01-11T21:53:15.8230328Z test_inv_ex_info_device_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8230646Z test_inv_ex_info_device_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8230956Z test_inv_ex_info_device_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8231260Z test_inv_ex_info_device_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8231610Z test_inv_ex_singular_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8231918Z test_inv_ex_singular_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232224Z test_inv_ex_singular_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232530Z test_inv_ex_singular_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8232870Z test_invariance_error_spectral_decompositions_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8233199Z test_inverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8233495Z test_inverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.103s) 2023-01-11T21:53:15.8233790Z test_inverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.095s) 2023-01-11T21:53:15.8234077Z test_inverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.101s) 2023-01-11T21:53:15.8234378Z test_inverse_errors_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.039s) 2023-01-11T21:53:15.8234692Z test_inverse_errors_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235002Z test_inverse_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235300Z test_inverse_errors_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8235662Z test_inverse_errors_large_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236059Z test_inverse_errors_large_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236445Z test_inverse_errors_large_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8236867Z test_inverse_errors_large_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8237288Z test_inverse_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8237733Z test_inverse_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238173Z test_inverse_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238601Z test_inverse_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8238959Z test_kron_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8239250Z test_kron_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8239533Z test_kron_cuda_float32 (__main__.TestLinalgCUDA) ... 
ok (0.014s) 2023-01-11T21:53:15.8239817Z test_kron_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8240112Z test_kron_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8240420Z test_kron_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8240721Z test_kron_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8241018Z test_kron_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8241776Z test_kron_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8242466Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8242652Z ok (0.006s) 2023-01-11T21:53:15.8243343Z test_kron_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8244010Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8244202Z ok (0.005s) 2023-01-11T21:53:15.8244887Z test_kron_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8245537Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8245721Z ok (0.005s) 2023-01-11T21:53:15.8246385Z test_kron_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:1117: UserWarning: An output with one or more elements was resized since it had shape [3, 3], which does not match the required output shape [6, 6]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8247082Z torch.kron(a, b, out=out) 2023-01-11T21:53:15.8247276Z ok (0.005s) 2023-01-11T21:53:15.8247507Z test_lapack_empty_cuda (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8247896Z test_ldl_factor_cuda_complex128 (__main__.TestLinalgCUDA) ... 
/var/lib/jenkins/workspace/test/test_linalg.py:7290: ComplexWarning: scipy.linalg.ldl():
2023-01-11T21:53:15.8248348Z The imaginary parts of the diagonal are ignored. Use "hermitian=False" for factorization of complex symmetric arrays.
2023-01-11T21:53:15.8248740Z lambda x: scipy_ldl(x, hermitian=hermitian, lower=True),
2023-01-11T21:53:15.8248963Z ok (0.047s)
2023-01-11T21:53:15.8249211Z test_ldl_factor_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s)
2023-01-11T21:53:15.8249520Z test_ldl_factor_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8249817Z test_ldl_factor_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8250123Z test_ldl_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8250432Z test_ldl_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8250727Z test_ldl_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8251024Z test_ldl_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.008s)
2023-01-11T21:53:15.8251329Z test_linalg_cross_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8251638Z test_linalg_cross_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8251964Z test_linalg_cross_with_and_without_dim_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8252323Z test_linalg_cross_with_and_without_dim_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8252675Z test_linalg_lstsq_batch_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.252s)
2023-01-11T21:53:15.8253053Z test_linalg_lstsq_batch_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.060s)
2023-01-11T21:53:15.8253407Z test_linalg_lstsq_batch_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.050s)
2023-01-11T21:53:15.8253752Z test_linalg_lstsq_batch_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.091s)
2023-01-11T21:53:15.8254081Z test_linalg_lstsq_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.505s)
2023-01-11T21:53:15.8254388Z test_linalg_lstsq_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.146s)
2023-01-11T21:53:15.8254958Z test_linalg_lstsq_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.127s)
2023-01-11T21:53:15.8255291Z test_linalg_lstsq_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.197s)
2023-01-11T21:53:15.8255612Z test_linalg_lstsq_input_checks_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8255957Z test_linalg_lstsq_input_checks_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8256297Z test_linalg_lstsq_input_checks_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.039s)
2023-01-11T21:53:15.8256629Z test_linalg_lstsq_input_checks_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.038s)
2023-01-11T21:53:15.8256975Z test_linalg_lu_cpu_errors_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8257345Z test_linalg_lu_cpu_errors_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8257708Z test_linalg_lu_cpu_errors_cuda_float32 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8258070Z test_linalg_lu_cpu_errors_cuda_float64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.002s)
2023-01-11T21:53:15.8258897Z test_linalg_lu_family_cuda_complex128 (__main__.TestLinalgCUDA) ...
/opt/conda/lib/python3.10/site-packages/torch/functional.py:1728: UserWarning: torch.lu is deprecated in favor of torch.linalg.lu_factor / torch.linalg.lu_factor_ex and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8259365Z LU, pivots = torch.lu(A, compute_pivots) 2023-01-11T21:53:15.8259590Z should be replaced with 2023-01-11T21:53:15.8259838Z LU, pivots = torch.linalg.lu_factor(A, compute_pivots) 2023-01-11T21:53:15.8260130Z and 2023-01-11T21:53:15.8260371Z LU, pivots, info = torch.lu(A, compute_pivots, get_infos=True) 2023-01-11T21:53:15.8260615Z should be replaced with 2023-01-11T21:53:15.8260983Z LU, pivots, info = torch.linalg.lu_factor_ex(A, compute_pivots) (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2029.) 2023-01-11T21:53:15.8261393Z return torch._lu_with_info(A, pivot=pivot, check_errors=(not get_infos)) 2023-01-11T21:53:15.8261637Z ok (1.124s) 2023-01-11T21:53:15.8261884Z test_linalg_lu_family_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.994s) 2023-01-11T21:53:15.8262207Z test_linalg_lu_family_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.901s) 2023-01-11T21:53:15.8262522Z test_linalg_lu_family_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.285s) 2023-01-11T21:53:15.8263059Z test_linalg_lu_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... [W Context.cpp:241] Warning: torch.backends.cuda.preferred_linalg_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator()) 2023-01-11T21:53:15.8263491Z ok (1.968s) 2023-01-11T21:53:15.8263743Z test_linalg_lu_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.926s) 2023-01-11T21:53:15.8264057Z test_linalg_lu_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.919s) 2023-01-11T21:53:15.8264361Z test_linalg_lu_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.925s) 2023-01-11T21:53:15.8264756Z test_linalg_matrix_exp_analytic_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8265280Z test_linalg_matrix_exp_analytic_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8265740Z test_linalg_matrix_exp_analytic_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8266190Z test_linalg_matrix_exp_analytic_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.003s) 2023-01-11T21:53:15.8266573Z test_linalg_matrix_exp_batch_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.045s) 2023-01-11T21:53:15.8266900Z test_linalg_matrix_exp_batch_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.045s) 2023-01-11T21:53:15.8267239Z test_linalg_matrix_exp_boundary_cases_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8267589Z test_linalg_matrix_exp_boundary_cases_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8267943Z test_linalg_matrix_exp_boundary_cases_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8268282Z test_linalg_matrix_exp_boundary_cases_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s) 2023-01-11T21:53:15.8268637Z test_linalg_matrix_exp_compare_with_taylor_cuda_complex128 (__main__.TestLinalgCUDA) ... 
ok (0.100s) 2023-01-11T21:53:15.8268991Z test_linalg_matrix_exp_compare_with_taylor_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.099s) 2023-01-11T21:53:15.8269348Z test_linalg_matrix_exp_compare_with_taylor_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.100s) 2023-01-11T21:53:15.8269703Z test_linalg_matrix_exp_compare_with_taylor_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.093s) 2023-01-11T21:53:15.8270073Z test_linalg_matrix_exp_no_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:53:15.8270460Z test_linalg_matrix_exp_utils_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8270801Z test_linalg_matrix_exp_utils_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8271140Z test_linalg_qr_autograd_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8271488Z test_linalg_solve_triangular_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.918s) 2023-01-11T21:53:15.8271861Z test_linalg_solve_triangular_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.917s) 2023-01-11T21:53:15.8272231Z test_linalg_solve_triangular_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8272596Z test_linalg_solve_triangular_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8272941Z test_linalg_solve_triangular_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.954s) 2023-01-11T21:53:15.8273285Z test_linalg_solve_triangular_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.991s) 2023-01-11T21:53:15.8273621Z test_linalg_solve_triangular_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.442s) 2023-01-11T21:53:15.8273947Z test_linalg_solve_triangular_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.422s) 2023-01-11T21:53:15.8274327Z test_linalg_solve_triangular_large_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8274751Z test_linalg_solve_triangular_large_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275170Z test_linalg_solve_triangular_large_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275577Z test_linalg_solve_triangular_large_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8275954Z test_linear_algebra_scalar_raises_cuda (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8276470Z test_lobpcg_basic_cuda_float64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:5098: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8277010Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 
2023-01-11T21:53:15.8277346Z L, _ = torch.symeig(A, upper=upper)
2023-01-11T21:53:15.8277560Z should be replaced with
2023-01-11T21:53:15.8277868Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
2023-01-11T21:53:15.8278099Z and
2023-01-11T21:53:15.8278304Z L, V = torch.symeig(A, eigenvectors=True)
2023-01-11T21:53:15.8278530Z should be replaced with
2023-01-11T21:53:15.8278997Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2910.)
2023-01-11T21:53:15.8279341Z e = torch.symeig(A)[0]
2023-01-11T21:53:15.8279536Z ok (2.730s)
2023-01-11T21:53:15.8279782Z test_lobpcg_ortho_cuda_float64 (__main__.TestLinalgCUDA) ... ok (12.638s)
2023-01-11T21:53:15.8280083Z test_lobpcg_scipy_cuda_float64 (__main__.TestLinalgCUDA)
2023-01-11T21:53:15.8280427Z Compare torch and scipy.sparse.linalg implementations of lobpcg ... skip: Only runs on cpu (0.005s)
2023-01-11T21:53:15.8280807Z test_lobpcg_torchscript_cuda_float64 (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s)
2023-01-11T21:53:15.8281141Z test_lstsq_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8281635Z test_lu_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6953: UserWarning: torch.lu_solve is deprecated in favor of torch.linalg.lu_solve and will be removed in a future PyTorch release.
2023-01-11T21:53:15.8282147Z Note that torch.linalg.lu_solve has its arguments reversed.
2023-01-11T21:53:15.8282403Z X = torch.lu_solve(B, LU, pivots)
2023-01-11T21:53:15.8282613Z should be replaced with
2023-01-11T21:53:15.8282972Z X = torch.linalg.lu_solve(LU, pivots, B) (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2183.)
2023-01-11T21:53:15.8283332Z x = torch.lu_solve(b, LU_data, LU_pivots)
2023-01-11T21:53:15.8283541Z ok (0.012s)
2023-01-11T21:53:15.8283804Z test_lu_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8284155Z test_lu_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s)
2023-01-11T21:53:15.8284498Z test_lu_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8284821Z test_lu_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s)
2023-01-11T21:53:15.8285147Z test_lu_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.011s)
2023-01-11T21:53:15.8285469Z test_lu_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.010s)
2023-01-11T21:53:15.8285780Z test_lu_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.012s)
2023-01-11T21:53:15.8286177Z test_lu_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8286641Z test_lu_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8287103Z test_lu_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
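The torch.lu_solve warning above is the one migration in this run where the call does not map one-to-one: the argument order flips. A minimal sketch of the change it describes, assuming a PyTorch build that already ships torch.linalg.lu_solve (as the warning implies); the names A, b, LU, pivots are illustrative:

    import torch

    A = torch.randn(3, 3, dtype=torch.double)
    b = torch.randn(3, 2, dtype=torch.double)
    LU, pivots = torch.linalg.lu_factor(A)

    # Deprecated call: the right-hand side comes first.
    x_old = torch.lu_solve(b, LU, pivots)

    # Replacement named in the warning: the factorization comes first.
    x_new = torch.linalg.lu_solve(LU, pivots, b)

    assert torch.allclose(x_old, x_new)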
2023-01-11T21:53:15.8287557Z test_lu_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s)
2023-01-11T21:53:15.8287922Z test_lu_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.021s)
2023-01-11T21:53:15.8288230Z test_lu_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s)
2023-01-11T21:53:15.8288534Z test_lu_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s)
2023-01-11T21:53:15.8288856Z test_lu_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s)
2023-01-11T21:53:15.8289181Z test_lu_solve_large_matrices_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8289524Z test_lu_solve_large_matrices_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s)
2023-01-11T21:53:15.8289858Z test_lu_solve_large_matrices_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8290179Z test_lu_solve_large_matrices_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s)
2023-01-11T21:53:15.8290510Z test_lu_unpack_check_input_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s)
2023-01-11T21:53:15.8290816Z test_matmul_45724_cuda (__main__.TestLinalgCUDA) ... ok (0.283s)
2023-01-11T21:53:15.8291585Z test_matmul_small_brute_force_1d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [1], which does not match the required output shape [1, 1]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8292273Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8292928Z /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [2], which does not match the required output shape [1, 2]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8293573Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8294244Z /var/lib/jenkins/workspace/test/test_linalg.py:4287: UserWarning: An output with one or more elements was resized since it had shape [3], which does not match the required output shape [1, 3]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.)
2023-01-11T21:53:15.8295159Z ans = torch.matmul(x, y, out=out)
2023-01-11T21:53:15.8295362Z ok (0.464s)
2023-01-11T21:53:15.8295627Z test_matmul_small_brute_force_1d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.425s)
2023-01-11T21:53:15.8295974Z test_matmul_small_brute_force_2d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.815s)
2023-01-11T21:53:15.8296310Z test_matmul_small_brute_force_2d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.668s)
2023-01-11T21:53:15.8296647Z test_matmul_small_brute_force_3d_Nd_cuda_complex64 (__main__.TestLinalgCUDA) ...
ok (2.067s) 2023-01-11T21:53:15.8296970Z test_matmul_small_brute_force_3d_Nd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.896s) 2023-01-11T21:53:15.8297279Z test_matrix_norm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.011s) 2023-01-11T21:53:15.8297575Z test_matrix_norm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8297896Z test_matrix_power_negative_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8298215Z test_matrix_power_negative_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8298552Z test_matrix_power_non_negative_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.010s) 2023-01-11T21:53:15.8298912Z test_matrix_power_non_negative_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8299255Z test_matrix_rank_atol_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.158s) 2023-01-11T21:53:15.8299668Z test_matrix_rank_atol_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8300075Z test_matrix_rank_atol_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8300415Z test_matrix_rank_atol_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.053s) 2023-01-11T21:53:15.8300759Z test_matrix_rank_atol_rtol_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8301095Z test_matrix_rank_basic_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8301411Z test_matrix_rank_basic_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8301718Z test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8302028Z test_matrix_rank_basic_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8302338Z test_matrix_rank_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.086s) 2023-01-11T21:53:15.8302649Z test_matrix_rank_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8302945Z test_matrix_rank_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8303242Z test_matrix_rank_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8303589Z test_matrix_rank_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8303978Z test_matrix_rank_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8304364Z test_matrix_rank_empty_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8304793Z test_matrix_rank_empty_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test skipped for CUDA version (11, 6) (0.001s) 2023-01-11T21:53:15.8305165Z test_matrix_rank_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8305518Z test_matrix_rank_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8305869Z test_matrix_rank_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8306214Z test_matrix_rank_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8306544Z test_matrix_rank_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8306845Z test_mm_bmm_non_memory_dense_cuda (__main__.TestLinalgCUDA) ... 
ok (0.004s) 2023-01-11T21:53:15.8307205Z test_mm_cuda_complex128 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8307625Z test_mm_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308034Z test_mm_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308423Z test_mm_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.002s) 2023-01-11T21:53:15.8308769Z test_multi_dot_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8309070Z test_multi_dot_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8309364Z test_multi_dot_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8309676Z test_norm_complex_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.116s) 2023-01-11T21:53:15.8309987Z test_norm_complex_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8310297Z test_norm_complex_old_cuda (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8310591Z test_norm_dtype_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.072s) 2023-01-11T21:53:15.8310896Z test_norm_dtype_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8311243Z test_norm_dtype_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.172s) 2023-01-11T21:53:15.8311544Z test_norm_dtype_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.072s) 2023-01-11T21:53:15.8311846Z test_norm_dtype_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.116s) 2023-01-11T21:53:15.8312142Z test_norm_dtype_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.070s) 2023-01-11T21:53:15.8312437Z test_norm_errors_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.144s) 2023-01-11T21:53:15.8312730Z test_norm_errors_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.144s) 2023-01-11T21:53:15.8313283Z test_norm_extreme_values_cuda (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2570: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8313637Z ret **= (1 / ord) 2023-01-11T21:53:15.8314029Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in reciprocal 2023-01-11T21:53:15.8314330Z absx **= ord 2023-01-11T21:53:15.8314722Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8315016Z absx **= ord 2023-01-11T21:53:15.8315187Z ok (0.052s) 2023-01-11T21:53:15.8315429Z test_norm_fastpaths_cuda (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8315747Z test_norm_fro_2_equivalence_old_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.126s) 2023-01-11T21:53:15.8316283Z test_norm_fused_type_promotion_cuda_bfloat16 (__main__.TestLinalgCUDA) ... 
STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8316832Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8317296Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8317758Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8318200Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8318709Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8318987Z ok (0.361s) 2023-01-11T21:53:15.8319455Z test_norm_fused_type_promotion_cuda_float16 (__main__.TestLinalgCUDA) ... STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8319955Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8320413Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8320861Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:300] Completed Stage: Warm Up 2023-01-11T21:53:15.8321306Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:306] Completed Stage: Collection 2023-01-11T21:53:15.8321754Z STAGE:2023-01-11 21:51:16 18927:18927 ActivityProfilerController.cpp:310] Completed Stage: Post Processing 2023-01-11T21:53:15.8322030Z ok (0.004s) 2023-01-11T21:53:15.8322281Z test_norm_matrix_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.117s) 2023-01-11T21:53:15.8322579Z test_norm_matrix_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.237s) 2023-01-11T21:53:15.8322914Z test_norm_matrix_degenerate_shapes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8323268Z test_norm_matrix_degenerate_shapes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.523s) 2023-01-11T21:53:15.8323632Z test_norm_matrix_degenerate_shapes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8323967Z test_norm_matrix_degenerate_shapes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.521s) 2023-01-11T21:53:15.8324331Z test_norm_old_cuda (__main__.TestLinalgCUDA) ... ok (0.040s) 2023-01-11T21:53:15.8324640Z test_norm_old_nan_propagation_cuda (__main__.TestLinalgCUDA) ... ok (0.008s) 2023-01-11T21:53:15.8324946Z test_norm_vector_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.099s) 2023-01-11T21:53:15.8325246Z test_norm_vector_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.641s) 2023-01-11T21:53:15.8325577Z test_norm_vector_degenerate_shapes_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.195s) 2023-01-11T21:53:15.8325930Z test_norm_vector_degenerate_shapes_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326268Z test_norm_vector_degenerate_shapes_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326619Z test_norm_vector_degenerate_shapes_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.194s) 2023-01-11T21:53:15.8326966Z test_nuclear_norm_axes_small_brute_force_old_cuda (__main__.TestLinalgCUDA) ... ok (0.075s) 2023-01-11T21:53:15.8327295Z test_nuclear_norm_exceptions_old_cuda (__main__.TestLinalgCUDA) ... 
ok (0.026s) 2023-01-11T21:53:15.8327618Z test_nuclear_norm_out_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8327930Z test_nuclear_norm_out_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.069s) 2023-01-11T21:53:15.8328256Z test_old_cholesky_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.014s) 2023-01-11T21:53:15.8328588Z test_old_cholesky_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8328922Z test_old_cholesky_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8329246Z test_old_cholesky_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8329673Z test_old_cholesky_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8330084Z test_old_cholesky_batched_upper_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8330437Z test_old_cholesky_batched_upper_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8330784Z test_old_cholesky_batched_upper_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8331116Z test_old_cholesky_batched_upper_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8331446Z test_old_cholesky_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8331768Z test_old_cholesky_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8332081Z test_old_cholesky_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8332380Z test_old_cholesky_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8332702Z test_old_cholesky_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333030Z test_old_cholesky_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333352Z test_old_cholesky_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333670Z test_old_cholesky_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8333979Z test_ormqr_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.143s) 2023-01-11T21:53:15.8334285Z test_ormqr_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.150s) 2023-01-11T21:53:15.8334800Z test_ormqr_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.136s) 2023-01-11T21:53:15.8335158Z test_ormqr_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.129s) 2023-01-11T21:53:15.8335472Z test_ormqr_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.027s) 2023-01-11T21:53:15.8335809Z test_ormqr_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8336138Z test_ormqr_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8336523Z test_ormqr_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8337087Z test_outer_cuda_bfloat16 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:132: UserWarning: torch.ger is deprecated and will be removed in a future PyTorch release. Use torch.outer instead. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/LinearAlgebra.cpp:1242.) 2023-01-11T21:53:15.8337556Z torch.ger(a, b, out=out) 2023-01-11T21:53:15.8337747Z ok (0.006s) 2023-01-11T21:53:15.8337983Z test_outer_cuda_bool (__main__.TestLinalgCUDA) ... 
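
The UserWarning above is the torch.ger deprecation notice; the replacement it names, torch.outer, is a drop-in for the call the test exercises. A minimal sketch:

    import torch

    a = torch.randn(3)
    b = torch.randn(4)
    out = torch.empty(3, 4)

    # Deprecated spelling from the warning:
    # torch.ger(a, b, out=out)
    # Replacement -- same outer product, same out= support:
    torch.outer(a, b, out=out)
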
ok (0.005s) 2023-01-11T21:53:15.8338277Z test_outer_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8338566Z test_outer_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8338859Z test_outer_cuda_float16 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339147Z test_outer_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339433Z test_outer_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8339713Z test_outer_cuda_int16 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8339991Z test_outer_cuda_int32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340354Z test_outer_cuda_int64 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340636Z test_outer_cuda_int8 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8340914Z test_outer_cuda_uint8 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8341213Z test_outer_ger_addr_legacy_tests_cuda (__main__.TestLinalgCUDA) ... ok (0.022s) 2023-01-11T21:53:15.8341537Z test_outer_type_promotion_cuda_bfloat16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8341923Z test_outer_type_promotion_cuda_bfloat16_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342281Z test_outer_type_promotion_cuda_bfloat16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342627Z test_outer_type_promotion_cuda_bfloat16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8342973Z test_outer_type_promotion_cuda_bfloat16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343317Z test_outer_type_promotion_cuda_bfloat16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343656Z test_outer_type_promotion_cuda_bfloat16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8343986Z test_outer_type_promotion_cuda_bfloat16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344322Z test_outer_type_promotion_cuda_bfloat16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344659Z test_outer_type_promotion_cuda_bfloat16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8344991Z test_outer_type_promotion_cuda_bfloat16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345318Z test_outer_type_promotion_cuda_bfloat16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345653Z test_outer_type_promotion_cuda_bool_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8345992Z test_outer_type_promotion_cuda_bool_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346319Z test_outer_type_promotion_cuda_bool_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346661Z test_outer_type_promotion_cuda_bool_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8346998Z test_outer_type_promotion_cuda_bool_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347334Z test_outer_type_promotion_cuda_bool_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347660Z test_outer_type_promotion_cuda_bool_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8347994Z test_outer_type_promotion_cuda_bool_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8348363Z test_outer_type_promotion_cuda_bool_int32 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8348686Z test_outer_type_promotion_cuda_bool_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349059Z test_outer_type_promotion_cuda_bool_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349388Z test_outer_type_promotion_cuda_bool_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8349733Z test_outer_type_promotion_cuda_complex128_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350069Z test_outer_type_promotion_cuda_complex128_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350423Z test_outer_type_promotion_cuda_complex128_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8350782Z test_outer_type_promotion_cuda_complex128_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351131Z test_outer_type_promotion_cuda_complex128_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351469Z test_outer_type_promotion_cuda_complex128_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8351821Z test_outer_type_promotion_cuda_complex128_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352165Z test_outer_type_promotion_cuda_complex128_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352499Z test_outer_type_promotion_cuda_complex128_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8352838Z test_outer_type_promotion_cuda_complex128_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353171Z test_outer_type_promotion_cuda_complex128_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353540Z test_outer_type_promotion_cuda_complex128_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8353875Z test_outer_type_promotion_cuda_complex64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354218Z test_outer_type_promotion_cuda_complex64_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354567Z test_outer_type_promotion_cuda_complex64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8354923Z test_outer_type_promotion_cuda_complex64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355261Z test_outer_type_promotion_cuda_complex64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355604Z test_outer_type_promotion_cuda_complex64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8355950Z test_outer_type_promotion_cuda_complex64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356280Z test_outer_type_promotion_cuda_complex64_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356624Z test_outer_type_promotion_cuda_complex64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8356963Z test_outer_type_promotion_cuda_complex64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357299Z test_outer_type_promotion_cuda_complex64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357627Z test_outer_type_promotion_cuda_complex64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8357964Z test_outer_type_promotion_cuda_float16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8358302Z test_outer_type_promotion_cuda_float16_bool (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8358640Z test_outer_type_promotion_cuda_float16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8358976Z test_outer_type_promotion_cuda_float16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8359319Z test_outer_type_promotion_cuda_float16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8359656Z test_outer_type_promotion_cuda_float16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360008Z test_outer_type_promotion_cuda_float16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360346Z test_outer_type_promotion_cuda_float16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8360678Z test_outer_type_promotion_cuda_float16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361011Z test_outer_type_promotion_cuda_float16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361331Z test_outer_type_promotion_cuda_float16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8361672Z test_outer_type_promotion_cuda_float16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362004Z test_outer_type_promotion_cuda_float32_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362334Z test_outer_type_promotion_cuda_float32_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8362677Z test_outer_type_promotion_cuda_float32_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363027Z test_outer_type_promotion_cuda_float32_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363371Z test_outer_type_promotion_cuda_float32_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8363700Z test_outer_type_promotion_cuda_float32_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8364033Z test_outer_type_promotion_cuda_float32_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8364366Z test_outer_type_promotion_cuda_float32_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8380909Z test_outer_type_promotion_cuda_float32_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8381384Z test_outer_type_promotion_cuda_float32_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8381717Z test_outer_type_promotion_cuda_float32_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382064Z test_outer_type_promotion_cuda_float32_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382404Z test_outer_type_promotion_cuda_float64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8382735Z test_outer_type_promotion_cuda_float64_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383092Z test_outer_type_promotion_cuda_float64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383436Z test_outer_type_promotion_cuda_float64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8383786Z test_outer_type_promotion_cuda_float64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384119Z test_outer_type_promotion_cuda_float64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384462Z test_outer_type_promotion_cuda_float64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8384794Z test_outer_type_promotion_cuda_float64_int16 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8385128Z test_outer_type_promotion_cuda_float64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8385458Z test_outer_type_promotion_cuda_float64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8385787Z test_outer_type_promotion_cuda_float64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386124Z test_outer_type_promotion_cuda_float64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386449Z test_outer_type_promotion_cuda_int16_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8386785Z test_outer_type_promotion_cuda_int16_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387127Z test_outer_type_promotion_cuda_int16_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387467Z test_outer_type_promotion_cuda_int16_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8387838Z test_outer_type_promotion_cuda_int16_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8388176Z test_outer_type_promotion_cuda_int16_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8388599Z test_outer_type_promotion_cuda_int16_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389010Z test_outer_type_promotion_cuda_int16_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389399Z test_outer_type_promotion_cuda_int16_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8389730Z test_outer_type_promotion_cuda_int16_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390060Z test_outer_type_promotion_cuda_int16_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390387Z test_outer_type_promotion_cuda_int16_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8390720Z test_outer_type_promotion_cuda_int32_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391057Z test_outer_type_promotion_cuda_int32_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391384Z test_outer_type_promotion_cuda_int32_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8391730Z test_outer_type_promotion_cuda_int32_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392067Z test_outer_type_promotion_cuda_int32_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392397Z test_outer_type_promotion_cuda_int32_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8392719Z test_outer_type_promotion_cuda_int32_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393049Z test_outer_type_promotion_cuda_int32_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393428Z test_outer_type_promotion_cuda_int32_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8393762Z test_outer_type_promotion_cuda_int32_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394081Z test_outer_type_promotion_cuda_int32_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394416Z test_outer_type_promotion_cuda_int32_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8394746Z test_outer_type_promotion_cuda_int64_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8395078Z test_outer_type_promotion_cuda_int64_bool (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8395418Z test_outer_type_promotion_cuda_int64_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8395767Z test_outer_type_promotion_cuda_int64_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396110Z test_outer_type_promotion_cuda_int64_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396446Z test_outer_type_promotion_cuda_int64_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8396783Z test_outer_type_promotion_cuda_int64_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397118Z test_outer_type_promotion_cuda_int64_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397435Z test_outer_type_promotion_cuda_int64_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8397770Z test_outer_type_promotion_cuda_int64_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398099Z test_outer_type_promotion_cuda_int64_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398434Z test_outer_type_promotion_cuda_int64_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8398758Z test_outer_type_promotion_cuda_int8_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399099Z test_outer_type_promotion_cuda_int8_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399438Z test_outer_type_promotion_cuda_int8_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8399807Z test_outer_type_promotion_cuda_int8_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400142Z test_outer_type_promotion_cuda_int8_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400481Z test_outer_type_promotion_cuda_int8_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8400817Z test_outer_type_promotion_cuda_int8_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401138Z test_outer_type_promotion_cuda_int8_int16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401470Z test_outer_type_promotion_cuda_int8_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8401793Z test_outer_type_promotion_cuda_int8_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402124Z test_outer_type_promotion_cuda_int8_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402447Z test_outer_type_promotion_cuda_int8_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8402785Z test_outer_type_promotion_cuda_uint8_bfloat16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403117Z test_outer_type_promotion_cuda_uint8_bool (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403445Z test_outer_type_promotion_cuda_uint8_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8403794Z test_outer_type_promotion_cuda_uint8_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404133Z test_outer_type_promotion_cuda_uint8_float16 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404470Z test_outer_type_promotion_cuda_uint8_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8404800Z test_outer_type_promotion_cuda_uint8_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8405163Z test_outer_type_promotion_cuda_uint8_int16 (__main__.TestLinalgCUDA) ... 
ok (0.001s) 2023-01-11T21:53:15.8405498Z test_outer_type_promotion_cuda_uint8_int32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8405836Z test_outer_type_promotion_cuda_uint8_int64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406158Z test_outer_type_promotion_cuda_uint8_int8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406482Z test_outer_type_promotion_cuda_uint8_uint8 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8406796Z test_pca_lowrank_cuda (__main__.TestLinalgCUDA) ... ok (19.177s) 2023-01-11T21:53:15.8407092Z test_permute_matmul_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8407402Z test_pinv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.448s) 2023-01-11T21:53:15.8407708Z test_pinv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.240s) 2023-01-11T21:53:15.8408000Z test_pinv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.211s) 2023-01-11T21:53:15.8408298Z test_pinv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.237s) 2023-01-11T21:53:15.8408627Z test_pinv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8408985Z test_pinv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8409316Z test_pinv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.030s) 2023-01-11T21:53:15.8409651Z test_pinv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8409975Z test_pinverse_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.043s) 2023-01-11T21:53:15.8410290Z test_pinverse_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8410594Z test_pinverse_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8410896Z test_pinverse_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8411220Z test_preferred_linalg_library_cuda (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8411524Z test_qr_batched_cuda_complex128 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8411875Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8412195Z test_qr_batched_cuda_complex64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8412507Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8412819Z test_qr_batched_cuda_float32 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8413130Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.004s) 2023-01-11T21:53:15.8413440Z test_qr_batched_cuda_float64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8413742Z test torch.linalg.qr vs numpy.linalg.qr. We need some special logic ... ok (0.003s) 2023-01-11T21:53:15.8414225Z test_qr_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:3516: UserWarning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8415125Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T21:53:15.8415401Z Q, R = torch.qr(A, some) 2023-01-11T21:53:15.8415598Z should be replaced with 2023-01-11T21:53:15.8416079Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2471.) 
2023-01-11T21:53:15.8416455Z torch.qr(A, some=some, out=(Q_out, R_out)) 2023-01-11T21:53:15.8416659Z ok (0.077s) 2023-01-11T21:53:15.8416897Z test_qr_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.079s) 2023-01-11T21:53:15.8417181Z test_qr_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.070s) 2023-01-11T21:53:15.8417453Z test_qr_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.069s) 2023-01-11T21:53:15.8417750Z test_qr_error_cases_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8418112Z test_qr_vs_numpy_cuda_complex128 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8418395Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8418676Z test_qr_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8418958Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8419231Z test_qr_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8419496Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8419774Z test_qr_vs_numpy_cuda_float64 (__main__.TestLinalgCUDA) 2023-01-11T21:53:15.8420124Z test torch.linalg.qr vs numpy.linalg.qr ... ok (0.007s) 2023-01-11T21:53:15.8420410Z test_renorm_cuda (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8420714Z test_renorm_ps_cuda (__main__.TestLinalgCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T21:53:15.8421037Z test_slogdet_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (1.151s) 2023-01-11T21:53:15.8421351Z test_slogdet_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.050s) 2023-01-11T21:53:15.8421645Z test_slogdet_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.121s) 2023-01-11T21:53:15.8421945Z test_slogdet_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.128s) 2023-01-11T21:53:15.8422261Z test_slogdet_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8422597Z test_slogdet_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8422954Z test_slogdet_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8423293Z test_slogdet_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8423635Z test_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.013s) 2023-01-11T21:53:15.8423970Z test_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8424318Z test_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8424663Z test_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.006s) 2023-01-11T21:53:15.8425027Z test_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.179s) 2023-01-11T21:53:15.8425321Z test_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.040s) 2023-01-11T21:53:15.8425611Z test_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.035s) 2023-01-11T21:53:15.8425896Z test_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.079s) 2023-01-11T21:53:15.8426184Z test_solve_removed_error_cuda (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8426495Z test_strided_mm_bmm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.002s) 2023-01-11T21:53:15.8426800Z test_strided_mm_bmm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8427089Z test_svd_cuda_complex128 (__main__.TestLinalgCUDA) ... 
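
The deprecation notice above spells out the torch.qr migration: the boolean `some` argument becomes the string `mode` argument of torch.linalg.qr. A minimal sketch of the mapping it describes:

    import torch

    A = torch.randn(5, 3)

    # Deprecated: Q, R = torch.qr(A, some=some)
    Q, R = torch.linalg.qr(A, mode='reduced')   # some=True (the old default)
    Q, R = torch.linalg.qr(A, mode='complete')  # some=False
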
ok (3.709s) 2023-01-11T21:53:15.8427382Z test_svd_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (2.658s) 2023-01-11T21:53:15.8427669Z test_svd_cuda_float32 (__main__.TestLinalgCUDA) ... ok (2.408s) 2023-01-11T21:53:15.8427952Z test_svd_cuda_float64 (__main__.TestLinalgCUDA) ... ok (2.644s) 2023-01-11T21:53:15.8429010Z test_svd_lowrank_cuda_float64 (__main__.TestLinalgCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_lowrank.py:184: UserWarning: torch.linalg.svd: During SVD computation with the selected cusolver driver, batches 0 failed to converge. A more accurate method will be used to compute the SVD as a fallback. Check doc at https://pytorch.org/docs/stable/generated/torch.linalg.svd.html (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp:907.) 2023-01-11T21:53:15.8429709Z U, S, Vh = torch.linalg.svd(B_t, full_matrices=False) 2023-01-11T21:53:15.8429935Z ok (64.932s) 2023-01-11T21:53:15.8430199Z test_svd_memory_allocation_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.059s) 2023-01-11T21:53:15.8430572Z test_svd_memory_allocation_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8430906Z test_svd_memory_allocation_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8431236Z test_svd_memory_allocation_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.026s) 2023-01-11T21:53:15.8431719Z test_symeig_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T21:53:15.8432248Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 2023-01-11T21:53:15.8432588Z L, _ = torch.symeig(A, upper=upper) 2023-01-11T21:53:15.8432813Z should be replaced with 2023-01-11T21:53:15.8433112Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L') 2023-01-11T21:53:15.8433348Z and 2023-01-11T21:53:15.8433551Z L, V = torch.symeig(A, eigenvectors=True) 2023-01-11T21:53:15.8433781Z should be replaced with 2023-01-11T21:53:15.8434253Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2928.) 2023-01-11T21:53:15.8434675Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8435403Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8436109Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8436863Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [3, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. 
You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8437545Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8438251Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [3, 5, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8438943Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8439645Z /var/lib/jenkins/workspace/test/test_linalg.py:6990: UserWarning: An output with one or more elements was resized since it had shape [5, 3, 5, 5, 5], which does not match the required output shape [0]. This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Resize.cpp:33.) 2023-01-11T21:53:15.8440314Z torch.symeig(x, eigenvectors=eigenvectors, upper=upper, out=(oute, outv)) 2023-01-11T21:53:15.8440627Z ok (1.392s) 2023-01-11T21:53:15.8440930Z test_symeig_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.392s) 2023-01-11T21:53:15.8441247Z test_symeig_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.561s) 2023-01-11T21:53:15.8441562Z test_symeig_cuda_float64 (__main__.TestLinalgCUDA) ... ok (1.377s) 2023-01-11T21:53:15.8441887Z test_symeig_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8442249Z test_symeig_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8442599Z test_symeig_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8442946Z test_symeig_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.023s) 2023-01-11T21:53:15.8443250Z test_tensordot_cuda (__main__.TestLinalgCUDA) ... ok (0.005s) 2023-01-11T21:53:15.8443552Z test_tensorinv_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8443870Z test_tensorinv_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8444173Z test_tensorinv_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8444476Z test_tensorinv_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.015s) 2023-01-11T21:53:15.8444799Z test_tensorinv_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445126Z test_tensorinv_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445441Z test_tensorinv_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8445760Z test_tensorinv_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8446100Z test_tensorinv_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.018s) 2023-01-11T21:53:15.8446446Z test_tensorinv_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... 
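
The block above is the torch.symeig deprecation notice plus a series of out= resize warnings (the preallocated out tensors have the wrong shape; resizing them to zero elements with t.resize_(0) beforehand would silence those, as the message says). A minimal sketch of the eigh/eigvalsh replacement the notice describes; note that torch.linalg.eigh defaults to the lower triangle (UPLO='L'), where symeig defaulted to the upper:

    import torch

    A = torch.randn(5, 5, dtype=torch.complex128)
    A = A + A.mH  # Hermitian input, as symeig assumed

    # Deprecated: L, V = torch.symeig(A, eigenvectors=True, upper=True)
    L = torch.linalg.eigvalsh(A, UPLO='U')   # eigenvalues only
    L, V = torch.linalg.eigh(A, UPLO='U')    # eigenvalues and eigenvectors
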
ok (0.017s) 2023-01-11T21:53:15.8446799Z test_tensorinv_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8447148Z test_tensorinv_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8447488Z test_tensorinv_singular_input_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8447863Z test_tensorinv_singular_input_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448205Z test_tensorinv_singular_input_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448543Z test_tensorinv_singular_input_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.009s) 2023-01-11T21:53:15.8448862Z test_tensorsolve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449203Z test_tensorsolve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449543Z test_tensorsolve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8449849Z test_tensorsolve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8450165Z test_tensorsolve_empty_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8450493Z test_tensorsolve_empty_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8450824Z test_tensorsolve_empty_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8451139Z test_tensorsolve_empty_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.001s) 2023-01-11T21:53:15.8451473Z test_tensorsolve_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.007s) 2023-01-11T21:53:15.8452017Z test_triangular_solve_batched_broadcasting_cuda_complex128 (__main__.TestLinalgCUDA) ... /var/lib/jenkins/workspace/test/test_linalg.py:4212: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangularand will be removed in a future PyTorch release. 2023-01-11T21:53:15.8452571Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs. 2023-01-11T21:53:15.8452917Z X = torch.triangular_solve(B, A).solution 2023-01-11T21:53:15.8453140Z should be replaced with 2023-01-11T21:53:15.8453503Z X = torch.linalg.solve_triangular(A, B). (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/BatchLinearAlgebra.cpp:2225.) 2023-01-11T21:53:15.8453943Z x = torch.triangular_solve(b, A, upper=upper, transpose=transpose, unitriangular=unitriangular)[0] 2023-01-11T21:53:15.8454218Z ok (0.031s) 2023-01-11T21:53:15.8454671Z test_triangular_solve_batched_broadcasting_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.031s) 2023-01-11T21:53:15.8455174Z test_triangular_solve_batched_broadcasting_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.029s) 2023-01-11T21:53:15.8455533Z test_triangular_solve_batched_broadcasting_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.028s) 2023-01-11T21:53:15.8455884Z test_triangular_solve_batched_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.021s) 2023-01-11T21:53:15.8456228Z test_triangular_solve_batched_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.021s) 2023-01-11T21:53:15.8456559Z test_triangular_solve_batched_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.020s) 2023-01-11T21:53:15.8456883Z test_triangular_solve_batched_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.019s) 2023-01-11T21:53:15.8457297Z test_triangular_solve_batched_many_batches_cuda_complex128 (__main__.TestLinalgCUDA) ... 
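
The warning at the top of this block documents the torch.triangular_solve migration: torch.linalg.solve_triangular takes its operands in the reverse order (matrix first) and returns the solution directly rather than a named tuple. The *_many_batches variants that follow are marked slow and skipped; setting PYTORCH_TEST_WITH_SLOW=1 in the environment enables them, as the skip message says. A minimal sketch of the migration:

    import torch

    A = torch.randn(3, 3).triu()  # upper-triangular coefficient matrix
    B = torch.randn(3, 2)

    # Deprecated: X = torch.triangular_solve(B, A, upper=True).solution
    X = torch.linalg.solve_triangular(A, B, upper=True)  # operand order reversed
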
skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8457777Z test_triangular_solve_batched_many_batches_cuda_complex64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8458253Z test_triangular_solve_batched_many_batches_cuda_float32 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8458707Z test_triangular_solve_batched_many_batches_cuda_float64 (__main__.TestLinalgCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T21:53:15.8459098Z test_triangular_solve_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8459497Z test_triangular_solve_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.017s) 2023-01-11T21:53:15.8459821Z test_triangular_solve_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8460185Z test_triangular_solve_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.016s) 2023-01-11T21:53:15.8460525Z test_triangular_solve_out_errors_and_warnings_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8460895Z test_triangular_solve_out_errors_and_warnings_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8461259Z test_triangular_solve_out_errors_and_warnings_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.024s) 2023-01-11T21:53:15.8461608Z test_triangular_solve_out_errors_and_warnings_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.025s) 2023-01-11T21:53:15.8461941Z test_vdot_invalid_args_cuda (__main__.TestLinalgCUDA) ... ok (0.033s) 2023-01-11T21:53:15.8462251Z test_vdot_vs_numpy_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (0.004s) 2023-01-11T21:53:15.8462554Z test_vdot_vs_numpy_cuda_float32 (__main__.TestLinalgCUDA) ... ok (0.003s) 2023-01-11T21:53:15.8462857Z test_vector_norm_cuda_bfloat16 (__main__.TestLinalgCUDA) ... ok (1.339s) 2023-01-11T21:53:15.8463163Z test_vector_norm_cuda_complex128 (__main__.TestLinalgCUDA) ... ok (0.845s) 2023-01-11T21:53:15.8463471Z test_vector_norm_cuda_complex64 (__main__.TestLinalgCUDA) ... ok (1.342s) 2023-01-11T21:53:15.8463765Z test_vector_norm_cuda_float16 (__main__.TestLinalgCUDA) ... ok (1.338s) 2023-01-11T21:53:15.8464064Z test_vector_norm_cuda_float32 (__main__.TestLinalgCUDA) ... ok (1.339s) 2023-01-11T21:53:15.8464368Z test_vector_norm_cuda_float64 (__main__.TestLinalgCUDA) ... ok (0.844s) 2023-01-11T21:53:15.8464667Z test_vector_norm_dim_tuple_arg_cuda (__main__.TestLinalgCUDA) ... ok (0.038s) 2023-01-11T21:53:15.8465281Z test_vector_norm_extreme_values_cuda (__main__.TestLinalgCUDA) ... 
/opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2570: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8465638Z ret **= (1 / ord) 2023-01-11T21:53:15.8466042Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in reciprocal 2023-01-11T21:53:15.8466336Z absx **= ord 2023-01-11T21:53:15.8466733Z /opt/conda/lib/python3.10/site-packages/numpy/linalg/linalg.py:2568: RuntimeWarning: divide by zero encountered in power 2023-01-11T21:53:15.8467030Z absx **= ord 2023-01-11T21:53:15.8467200Z ok (0.025s) 2023-01-11T21:53:15.8467309Z 2023-01-11T21:53:15.8467512Z ---------------------------------------------------------------------- 2023-01-11T21:53:15.8467772Z Ran 718 tests in 304.716s 2023-01-11T21:53:15.8467895Z 2023-01-11T21:53:15.8467967Z OK (skipped=47) 2023-01-11T21:53:15.8468087Z 2023-01-11T21:53:15.8468177Z Generating XML reports... 2023-01-11T21:53:15.8468576Z Generated XML report: test-reports/python-unittest/test_linalg/TEST-TestLinalgCUDA-20230111214810.xml 2023-01-11T21:53:15.8468807Z 2023-01-11T21:53:15.8469178Z ##[endgroup] 2023-01-11T21:53:15.8469595Z FINISHED PRINTING LOG FILE of test_linalg (/var/lib/jenkins/workspace/test/test-reports/test_linalg_lzuigkoh) 2023-01-11T21:53:15.8469830Z 2023-01-11T21:53:15.8470042Z Running test_multiprocessing_spawn ... [2023-01-11 21:53:15.804278] 2023-01-11T21:53:15.8470618Z Executing ['/opt/conda/bin/python', '-bb', 'test_multiprocessing_spawn.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:15.804502] 2023-01-11T21:53:35.8970148Z 2023-01-11T21:53:35.8970669Z Expand the folded group to see the log file of test_multiprocessing_spawn 2023-01-11T21:53:35.8971665Z ##[group]PRINTING LOG FILE of test_multiprocessing_spawn (/var/lib/jenkins/workspace/test/test-reports/test_multiprocessing_spawn_a6hjddgv) 2023-01-11T21:53:35.8976318Z 2023-01-11T21:53:35.8976965Z Running tests... 2023-01-11T21:53:35.8977596Z ---------------------------------------------------------------------- 2023-01-11T21:53:35.8978488Z Test results will be stored in test-reports/python-unittest/test_multiprocessing_spawn 2023-01-11T21:53:35.8979010Z test_errors_pickleable (__main__.ErrorTest) ... ok (1.085s) 2023-01-11T21:53:35.8979431Z test_exception_all (__main__.ForkTest) ... ok (0.057s) 2023-01-11T21:53:35.8979801Z test_exception_single (__main__.ForkTest) ... ok (0.113s) 2023-01-11T21:53:35.8980270Z test_first_argument_index (__main__.ForkTest) ... ok (0.050s) 2023-01-11T21:53:35.8980709Z test_success (__main__.ForkTest) ... ok (0.049s) 2023-01-11T21:53:35.8981175Z test_success_first_then_exception (__main__.ForkTest) ... ok (0.150s) 2023-01-11T21:53:35.8982886Z test_success_non_blocking (__main__.ForkTest) ... ok (0.050s) 2023-01-11T21:53:35.8983295Z test_terminate_exit (__main__.ForkTest) ... ok (0.060s) 2023-01-11T21:53:35.8983682Z test_terminate_signal (__main__.ForkTest) ... ok (0.739s) 2023-01-11T21:53:35.8984283Z test_exception_all (__main__.SpawnTest) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8984727Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8985178Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8985601Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8986045Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8986439Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8987060Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8987711Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8987965Z ok (1.664s) 2023-01-11T21:53:35.8988472Z test_exception_raises (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8988872Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8989343Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8989717Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8989985Z ok (1.547s) 2023-01-11T21:53:35.8990429Z test_exception_single (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8990863Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8991302Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8991702Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8992146Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8992539Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8992964Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8993362Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8993799Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8994191Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8994661Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8995071Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8995566Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8995956Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8996384Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8996787Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 
2023-01-11T21:53:35.8997024Z ok (3.138s) 2023-01-11T21:53:35.8997494Z test_first_argument_index (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8997893Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.8998361Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.8998730Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.8999203Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.8999547Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9000003Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9000369Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9000602Z ok (1.615s) 2023-01-11T21:53:35.9000865Z test_signal_raises (__main__.SpawnTest) ... ok (0.001s) 2023-01-11T21:53:35.9001354Z test_success (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9001820Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9002260Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9002668Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9003101Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9003487Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9003927Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9004329Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9004559Z ok (1.599s) 2023-01-11T21:53:35.9005046Z test_success_first_then_exception (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9005466Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9005934Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9006307Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9006790Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9007138Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9007600Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9007968Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9008207Z ok (1.693s) 2023-01-11T21:53:35.9008714Z test_success_non_blocking (__main__.SpawnTest) ... 
/opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9009104Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9009630Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9010002Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9010479Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9010839Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9011311Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9011678Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9011943Z ok (1.603s) 2023-01-11T21:53:35.9012383Z test_terminate_exit (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9012818Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9013244Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9013649Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9014083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9014441Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9015095Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9015448Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9015683Z ok (1.605s) 2023-01-11T21:53:35.9016270Z test_terminate_signal (__main__.SpawnTest) ... /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9016645Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9017083Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9017433Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9017858Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:122: UserWarning: loaded 76 slow tests 2023-01-11T21:53:35.9018235Z warnings.warn(f"loaded {len(slow_tests_dict)} slow tests") 2023-01-11T21:53:35.9018683Z /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py:126: UserWarning: loaded 210 disabled tests 2023-01-11T21:53:35.9019034Z warnings.warn(f"loaded {len(disabled_tests_dict)} disabled tests") 2023-01-11T21:53:35.9019262Z ok (1.515s) 2023-01-11T21:53:35.9019369Z 2023-01-11T21:53:35.9019572Z ---------------------------------------------------------------------- 2023-01-11T21:53:35.9019828Z Ran 19 tests in 18.338s 2023-01-11T21:53:35.9019947Z 2023-01-11T21:53:35.9020007Z OK 2023-01-11T21:53:35.9020172Z 2023-01-11T21:53:35.9020295Z Generating XML reports... 
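
The ForkTest/SpawnTest cases above exercise torch.multiprocessing's process-launching API: the worker always receives the process index as its first argument (test_first_argument_index), and join=False returns a context that can be polled instead of blocking (test_success_non_blocking). A minimal usage sketch, assuming a toy worker function:

    import torch.multiprocessing as mp

    def worker(i, msg):
        # `i`, the process index, is always prepended to args
        print(f"process {i}: {msg}")

    if __name__ == '__main__':
        # Blocking form: returns once all processes exit cleanly.
        mp.spawn(worker, args=('hello',), nprocs=2)

        # Non-blocking form: poll the returned context.
        ctx = mp.spawn(worker, args=('hello',), nprocs=2, join=False)
        while not ctx.join(timeout=1):
            pass  # ctx.join returns True once every process has exited

        # The ForkTest variants drive the same machinery via
        # mp.start_processes(..., start_method='fork').
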
2023-01-11T21:53:35.9023199Z Running test_ops ... [2023-01-11 21:53:35.896996]
2023-01-11T21:53:38.4215613Z Ignoring disabled issues: []
2023-01-11T21:53:38.4370306Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=0', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:38.436388]
2023-01-11T21:53:38.4744365Z Ignoring disabled issues: []
2023-01-11T21:53:38.4896607Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '--shard-id=1', '--num-shards=2', '-k=not _linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 21:53:38.489123]
2023-01-11T23:10:16.6583981Z
2023-01-11T23:10:16.6584774Z Expand the folded group to see the log file of test_ops
2023-01-11T23:10:16.6586080Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_xj2vdchj)
2023-01-11T23:10:16.6601333Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml
2023-01-11T23:10:16.6601794Z ============================= test session starts ==============================
2023-01-11T23:10:16.6602396Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:10:16.6604173Z cachedir: .pytest_cache
2023-01-11T23:10:16.6604976Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:10:16.6605596Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:10:16.6606392Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:10:16.6606965Z collecting ...
collected 30861 items / 25 deselected / 30836 selected 2023-01-11T23:10:16.8170649Z Running 15672 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addcmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmin_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_area_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_h_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atan_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_repeat_interleave_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rand___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_all_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_frac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igamma_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_all_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_inverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_combinations_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumulative_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_equal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_float_power_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pca_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_qr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resize__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_roll_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_short_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_slice_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_symeig_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_where_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_errors___radd___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_errors_amax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_amin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_complex_cuda, test/test_ops.py::TestCommonCUDA::test_errors_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cov_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gather_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_errors_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_le_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mean_cuda, test/test_ops.py::TestCommonCUDA::test_errors_median_cuda, test/test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_errors_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_errors_mul_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ne_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_pow_cuda, test/test_ops.py::TestCommonCUDA::test_errors_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_errors_roll_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_tril_cuda, test/test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_errors_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional__scaled_dot_product_attention_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___getitem___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rand___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rxor___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argwhere_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_equal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmin_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_kron_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_mean_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nansum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_constant_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_unshuffle_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_outer_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_interleave_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_neg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_laguerre_polynomial_l_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_polygamma_special_polygamma_n_0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_v_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_imag_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_exponential_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_elu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tan_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cross_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_int_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_int_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addr_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_all_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_positive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__softmax_backward_data_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_inverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_combinations_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumulative_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_inner_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_int_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_2inputs_2outputs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ldexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_singular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvalsh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vander_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_long_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_no_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional__scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool2d_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_linear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_smooth_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_positive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize_as__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_searchsorted_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_short_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_symeig_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_topk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_where_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_arange_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmin_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lgamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_H_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sin_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isclose_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_erfcx_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_log_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rdiv___cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argsort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gather_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_geqrf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_qr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_or_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_msort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_celu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_glu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_selu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_number_mean_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resize_as__cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_short_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_airy_ai_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_erfcx_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i0e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_t_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_true_divide_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rsub___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_half_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_long_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_clone_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_imag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svdvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addcdiv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chalf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_physical_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_constant_pad_nd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diff_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_double_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_imag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isnan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_istft_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_kron_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eig_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_power_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_multi_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorsolve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log1p_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_long_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nonzero_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ravel_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapz_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_triu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_H_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_T_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_T_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_column_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_imag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log10_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ravel_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unbind_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_abs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_decomposed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_baddbmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chunk_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_einsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flipud_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_power_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_half_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_imag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_put_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isinf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_kron_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_singular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_slogdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log1p_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mT_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_strided_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_zeros_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_linear_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pinverse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_put_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ravel_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scalar_tensor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sparse_sampled_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_symeig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_sparse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view___getitem___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmod___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__native_batch_norm_legit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcmul_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_min_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clone_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expm1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isreal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log10_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_pow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ravel_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reciprocal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_remainder_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i0e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unflatten_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_xlogy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_acosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_decomposed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cartesian_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_copysign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_count_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cov_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eye_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fliplr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_flipud_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_heaviside_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ldexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lgamma_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvalsh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_hermitian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vander_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log1p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logcumsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_mT_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_normalize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_median_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_minimum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_dropout_backward_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_similarity_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_instance_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_logsigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_circular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_reflect_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_unshuffle_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rrelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_soft_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_nuc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_permute_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polar_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_positive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_offsets_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_select_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sgn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_short_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_cosine_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_v_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_he_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_laguerre_polynomial_l_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_v_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_xlog1py_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensordot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tril_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_consecutive_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zero__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_like_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_alpha_dropout_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_lowrank_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_sparse_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_reflect_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_contiguous_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matrix_exp_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ne_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nonzero_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_remainder_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize_as__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_nuttall_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_uniform_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bool_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_allclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_dstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_round_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_xlogy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcdiv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmm_decomposed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_all_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bfloat16_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_bucketize_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ceil_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cfloat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_count_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diff_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_dstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gather_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_half_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_index_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_reduce_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isreal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_kthvalue_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logcumsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mH_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mT_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_binary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_minimum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_multinomial_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_linear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool1d_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softplus_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_nuc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ones_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polar_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_quantile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sgn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_erfcx_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_log_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i1_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_xlog1py_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_square_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_stft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_symeig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zero__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 2023-01-11T23:10:16.9480845Z 2023-01-11T23:10:16.9489663Z test_ops.py::TestCommonCUDA::test_compare_cpu_T_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9490164Z test_ops.py::TestCommonCUDA::test_compare_cpu___radd___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9490649Z test_ops.py::TestCommonCUDA::test_compare_cpu___rdiv___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9491103Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmod___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9491556Z test_ops.py::TestCommonCUDA::test_compare_cpu___rxor___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492021Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_T_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492502Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bfloat16_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9492988Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_byte_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9493468Z 
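The long comma-separated run above is the list of tests collected for this shard, and each entry is a standard pytest node ID (file::class::test), so any single case can be reproduced outside CI. A minimal sketch, assuming a local PyTorch checkout with a CUDA device; the node ID is copied verbatim from the list above:

    # Re-run one collected test in isolation; the node ID comes from the log.
    import subprocess

    subprocess.run(
        [
            "python", "-m", "pytest", "-v",
            "test/test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32",
        ],
        check=False,  # inspect the output and exit code rather than raising
    )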
test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cdouble_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9493958Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_chalf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9494441Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_char_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9495332Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_double_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9495812Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_half_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9496295Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_short_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9496765Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcdiv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9497228Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addcmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9497694Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_addr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9498145Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_arange_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9498615Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9499084Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atan2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9499541Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9500046Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_left_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:10:16.9500503Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bucketize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9500972Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_column_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9501443Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_constant_pad_nd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9501903Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_contiguous_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9502366Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diag_embed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9502833Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_copy_cuda_float32 SKIPPED 
(test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9503309Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_floor_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9503785Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_no_rounding_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9504245Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9504882Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_like_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9505327Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_eye_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9505790Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_ifftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9506279Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flipud_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9506734Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9507191Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9507648Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hypot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9508093Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igamma_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9508563Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_igammac_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9509113Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9509577Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9510036Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9510529Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9511061Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9511538Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512002Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512462Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__refs_masked_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9512930Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_movedim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9513384Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_mul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9513841Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9514296Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_narrow_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9514819Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9515333Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_empty_strided_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9515778Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9516228Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9516768Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_dropout_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9517252Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_glu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9517736Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9518227Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9518725Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9519215Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9519716Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9520223Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9520691Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_randn_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9521148Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_repeat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9521606Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9522089Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rsub_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9522545Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9523037Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9523527Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524008Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_xlog1py_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524488Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_special_zeta_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9524947Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9525399Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_to_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9525848Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_tril_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9526304Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unflatten_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9526759Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9527256Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unsqueeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9527711Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9528164Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9528608Z test_ops.py::TestCommonCUDA::test_compare_cpu_addcdiv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529058Z test_ops.py::TestCommonCUDA::test_compare_cpu_addcmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529514Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9529964Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9530451Z test_ops.py::TestCommonCUDA::test_compare_cpu_addr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9530899Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_arange_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9531353Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9531816Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9532286Z test_ops.py::TestCommonCUDA::test_compare_cpu_atan2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9532739Z test_ops.py::TestCommonCUDA::test_compare_cpu_baddbmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9533252Z test_ops.py::TestCommonCUDA::test_compare_cpu_bernoulli_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9533687Z test_ops.py::TestCommonCUDA::test_compare_cpu_bfloat16_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9534123Z test_ops.py::TestCommonCUDA::test_compare_cpu_bincount_cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9534682Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_left_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:10:16.9535124Z test_ops.py::TestCommonCUDA::test_compare_cpu_bmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9535580Z test_ops.py::TestCommonCUDA::test_compare_cpu_bucketize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9536027Z test_ops.py::TestCommonCUDA::test_compare_cpu_byte_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9536463Z test_ops.py::TestCommonCUDA::test_compare_cpu_cdist_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9536910Z test_ops.py::TestCommonCUDA::test_compare_cpu_cdouble_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9537355Z test_ops.py::TestCommonCUDA::test_compare_cpu_char_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:10:16.9537800Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9538291Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9538752Z test_ops.py::TestCommonCUDA::test_compare_cpu_chunk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9539211Z test_ops.py::TestCommonCUDA::test_compare_cpu_combinations_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9539678Z test_ops.py::TestCommonCUDA::test_compare_cpu_constant_pad_nd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9540127Z test_ops.py::TestCommonCUDA::test_compare_cpu_cov_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
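Several entries above (byte, char, and the other narrowing conversions) are skipped with "Overflow when downcasting signed type is undefined": PyTorch documents float-to-integer conversion as undefined when the value does not fit the target type, so CPU and CUDA are free to disagree and an elementwise comparison would be meaningless. A minimal illustration, using a hypothetical out-of-range value:

    import torch

    # 3e9 does not fit in int8, so the result of .char() is unspecified and
    # may differ between backends; this is why the comparison test is skipped.
    x = torch.tensor([3.0e9], dtype=torch.float32)
    print(x.char())
    if torch.cuda.is_available():
        print(x.cuda().char().cpu())  # may legitimately differ from the CPU value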
2023-01-11T23:10:16.9540580Z test_ops.py::TestCommonCUDA::test_compare_cpu_cummax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541027Z test_ops.py::TestCommonCUDA::test_compare_cpu_cummin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541481Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumprod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9541921Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9542387Z test_ops.py::TestCommonCUDA::test_compare_cpu_cumulative_trapezoid_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9542865Z test_ops.py::TestCommonCUDA::test_compare_cpu_diag_embed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9543353Z test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9543817Z test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9544291Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_floor_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9544771Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_no_rounding_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9545248Z test_ops.py::TestCommonCUDA::test_compare_cpu_div_trunc_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9545710Z test_ops.py::TestCommonCUDA::test_compare_cpu_dsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9546150Z test_ops.py::TestCommonCUDA::test_compare_cpu_dstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9546654Z test_ops.py::TestCommonCUDA::test_compare_cpu_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:10:16.9580087Z test_ops.py::TestCommonCUDA::test_compare_cpu_expand_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9580561Z test_ops.py::TestCommonCUDA::test_compare_cpu_expand_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581023Z test_ops.py::TestCommonCUDA::test_compare_cpu_fft_fftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581475Z test_ops.py::TestCommonCUDA::test_compare_cpu_fliplr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9581935Z test_ops.py::TestCommonCUDA::test_compare_cpu_fmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9582507Z test_ops.py::TestCommonCUDA::test_compare_cpu_fmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9582963Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9583404Z test_ops.py::TestCommonCUDA::test_compare_cpu_full_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9583862Z test_ops.py::TestCommonCUDA::test_compare_cpu_gather_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9584318Z test_ops.py::TestCommonCUDA::test_compare_cpu_geqrf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9584781Z test_ops.py::TestCommonCUDA::test_compare_cpu_gradient_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9585229Z test_ops.py::TestCommonCUDA::test_compare_cpu_half_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9585674Z test_ops.py::TestCommonCUDA::test_compare_cpu_hypot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9586125Z test_ops.py::TestCommonCUDA::test_compare_cpu_igamma_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9586587Z test_ops.py::TestCommonCUDA::test_compare_cpu_igammac_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9587046Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9587534Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_put_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588003Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588464Z test_ops.py::TestCommonCUDA::test_compare_cpu_inner_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9588995Z test_ops.py::TestCommonCUDA::test_compare_cpu_kron_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9589440Z test_ops.py::TestCommonCUDA::test_compare_cpu_kthvalue_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9589898Z test_ops.py::TestCommonCUDA::test_compare_cpu_lerp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9590402Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9590871Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9591341Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigvalsh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9591828Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9592315Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9592797Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_factor_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9593299Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9593778Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_power_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9594267Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9594746Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_multi_dot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9595231Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9595726Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_hermitian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9596211Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:10:16.9596676Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_qr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9597142Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9597616Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_triangular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9598121Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9598593Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_svdvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9599058Z test_ops.py::TestCommonCUDA::test_compare_cpu_linspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9599514Z test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp2_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9600022Z test_ops.py::TestCommonCUDA::test_compare_cpu_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9600526Z test_ops.py::TestCommonCUDA::test_compare_cpu_long_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 1%] 2023-01-11T23:10:16.9600977Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9601430Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_solve_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9601880Z test_ops.py::TestCommonCUDA::test_compare_cpu_lu_unpack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9602329Z test_ops.py::TestCommonCUDA::test_compare_cpu_mH_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9602770Z test_ops.py::TestCommonCUDA::test_compare_cpu_mT_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9603222Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumprod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9603692Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logaddexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9604202Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_median_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9604679Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9605148Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9605602Z test_ops.py::TestCommonCUDA::test_compare_cpu_matmul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9606083Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9606575Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_no_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607055Z test_ops.py::TestCommonCUDA::test_compare_cpu_max_reduction_with_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607534Z test_ops.py::TestCommonCUDA::test_compare_cpu_meshgrid_list_of_tensors_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9607995Z test_ops.py::TestCommonCUDA::test_compare_cpu_mm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9608447Z test_ops.py::TestCommonCUDA::test_compare_cpu_movedim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9608930Z test_ops.py::TestCommonCUDA::test_compare_cpu_msort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9609383Z test_ops.py::TestCommonCUDA::test_compare_cpu_mul_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9609822Z test_ops.py::TestCommonCUDA::test_compare_cpu_mv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9610282Z test_ops.py::TestCommonCUDA::test_compare_cpu_nanmedian_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9610797Z test_ops.py::TestCommonCUDA::test_compare_cpu_nanquantile_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9611280Z test_ops.py::TestCommonCUDA::test_compare_cpu_native_dropout_backward_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9611871Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_strided_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9612322Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9612852Z test_ops.py::TestCommonCUDA::test_compare_cpu_nextafter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9613370Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9613864Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9614358Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615020Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615515Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9615989Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_alpha_dropout_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9616472Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9616950Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9617443Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9617942Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cross_entropy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9618484Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout2d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9619037Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9619539Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9620181Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9620745Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (output is 
non-deterministic) [ 1%] 2023-01-11T23:10:16.9621219Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9621709Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_grid_sample_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9622188Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9622666Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardtanh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9623154Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9623640Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_huber_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9624121Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_instance_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9624610Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_area_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9625098Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9625599Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_linear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9626109Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_kl_div_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9626587Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_leaky_relu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9627073Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9627549Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9628030Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9628514Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9629732Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9630228Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_circular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9630731Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_reflect_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9631217Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9631740Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9632213Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_relu6_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9632749Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_rrelu_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9633206Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9633683Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9634171Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_nearest_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9634639Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_fro_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635081Z test_ops.py::TestCommonCUDA::test_compare_cpu_ones_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635540Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9635998Z test_ops.py::TestCommonCUDA::test_compare_cpu_outer_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9636435Z test_ops.py::TestCommonCUDA::test_compare_cpu_put_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9636951Z test_ops.py::TestCommonCUDA::test_compare_cpu_randint_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9637413Z test_ops.py::TestCommonCUDA::test_compare_cpu_randn_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9637922Z test_ops.py::TestCommonCUDA::test_compare_cpu_randn_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9638351Z test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_conj_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9638805Z test_ops.py::TestCommonCUDA::test_compare_cpu_rot90_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9639262Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9639742Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_amin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9640202Z 
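The "output is non-deterministic" skips above (bernoulli, the dropout variants, rrelu, randint_like, randn_like, and the empty/new_empty family) cover ops whose results depend on random sampling or uninitialized memory, so a CPU/CUDA elementwise comparison cannot be expected to pass. A short sketch of the sampling case: the default CPU generator (MT19937) and the CUDA generator (Philox) are different algorithms, so equal seeds still produce different streams.

    import torch

    # Equal seeds do not align CPU and CUDA RNG streams, because the two
    # devices use different generator algorithms.
    torch.manual_seed(0)
    cpu = torch.randn(4)
    if torch.cuda.is_available():
        torch.manual_seed(0)
        gpu = torch.randn(4, device="cuda").cpu()
        print(torch.equal(cpu, gpu))  # expected False despite identical seeds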
test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_sum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9640703Z test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_lengths_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9641199Z test_ops.py::TestCommonCUDA::test_compare_cpu_select_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9641651Z test_ops.py::TestCommonCUDA::test_compare_cpu_short_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 1%] 2023-01-11T23:10:16.9642095Z test_ops.py::TestCommonCUDA::test_compare_cpu_slice_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9642578Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9643026Z test_ops.py::TestCommonCUDA::test_compare_cpu_sort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9643486Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9643958Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9644535Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9645031Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_h_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9645509Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_hermite_polynomial_he_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9645976Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_laguerre_polynomial_l_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9646432Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_legendre_polynomial_p_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9646913Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9647392Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%] 2023-01-11T23:10:16.9647871Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_xlog1py_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9648353Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_zeta_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9648817Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_list_args_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9649271Z test_ops.py::TestCommonCUDA::test_compare_cpu_squeeze_cuda_float32 SKIPPED (test is slow; run 
with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9649720Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9650203Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_mean_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9650693Z test_ops.py::TestCommonCUDA::test_compare_cpu_std_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9651152Z test_ops.py::TestCommonCUDA::test_compare_cpu_sum_to_size_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9651597Z test_ops.py::TestCommonCUDA::test_compare_cpu_svd_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652044Z test_ops.py::TestCommonCUDA::test_compare_cpu_symeig_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652492Z test_ops.py::TestCommonCUDA::test_compare_cpu_take_along_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9652978Z test_ops.py::TestCommonCUDA::test_compare_cpu_tensordot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9653432Z test_ops.py::TestCommonCUDA::test_compare_cpu_to_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9653876Z test_ops.py::TestCommonCUDA::test_compare_cpu_trace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9654319Z test_ops.py::TestCommonCUDA::test_compare_cpu_trapezoid_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9654874Z test_ops.py::TestCommonCUDA::test_compare_cpu_trapz_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9655320Z test_ops.py::TestCommonCUDA::test_compare_cpu_tril_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9655772Z test_ops.py::TestCommonCUDA::test_compare_cpu_true_divide_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9656218Z test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9656731Z test_ops.py::TestCommonCUDA::test_compare_cpu_uniform_cuda_float32 SKIPPED (output is non-deterministic) [ 1%] 2023-01-11T23:10:16.9657178Z test_ops.py::TestCommonCUDA::test_compare_cpu_unique_consecutive_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9657640Z test_ops.py::TestCommonCUDA::test_compare_cpu_unsqueeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9658091Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9658545Z test_ops.py::TestCommonCUDA::test_compare_cpu_vdot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659038Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659509Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_real_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9659966Z test_ops.py::TestCommonCUDA::test_compare_cpu_vsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9660410Z test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%] 2023-01-11T23:10:16.9660819Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_T_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661200Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing___getitem___cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661569Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_abs_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9661944Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acosh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9662316Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_add_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9662685Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_angle_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9663102Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_cuda_complex32 SKIPPED (Errors when storage_offset is included) [ 1%] 2023-01-11T23:10:16.9663555Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_scatter_cuda_complex32 SKIPPED (Fails on cuda + rocm) [ 1%] 2023-01-11T23:10:16.9663996Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asinh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9664372Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atan_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9664740Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atanh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665120Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_1d_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665499Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_2d_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9665881Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bfloat16_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9666256Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_block_diag_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9666634Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cdouble_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667012Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_char_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667379Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9667762Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_contiguous_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668138Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cosh_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668508Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_double_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9668948Z 
test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dstack_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9669364Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 1%] 2023-01-11T23:10:16.9669796Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_empty_like_cuda_complex32 SKIPPED (Skipped!) [ 1%] 2023-01-11T23:10:16.9670192Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_exp_cuda_complex32 PASSED [ 1%] 2023-01-11T23:10:16.9670590Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9670971Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9671352Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfft_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9671732Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672109Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifftshift_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672497Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft2_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9672874Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fill_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673240Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_flatten_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673622Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_float_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9673993Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_full_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9674368Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hsplit_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9674733Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_imag_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675108Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_add_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675494Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_select_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9675906Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isinf_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9676272Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isreal_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9676655Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_long_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677024Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mH_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677383Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_neg_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9677760Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_zeros_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678155Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv1d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678560Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv2d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9678970Z 
test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose1d_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9679369Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9679745Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680136Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ravel_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680516Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_repeat_interleave_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9680902Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_roll_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9681275Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_select_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9681644Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sgn_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682008Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sin_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682416Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9682791Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sqrt_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683153Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_stack_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683519Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sum_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9683886Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tanh_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684254Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tril_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684612Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_triu_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9684993Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unflatten_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9685379Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_copy_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9685759Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vstack_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9686131Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_like_cuda_complex32 PASSED [ 2%] 2023-01-11T23:10:16.9686486Z test_ops.py::TestCommonCUDA::test_dtypes___getitem___cuda PASSED [ 2%] 2023-01-11T23:10:16.9686806Z test_ops.py::TestCommonCUDA::test_dtypes___radd___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687118Z test_ops.py::TestCommonCUDA::test_dtypes___rand___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687429Z test_ops.py::TestCommonCUDA::test_dtypes___rmod___cuda PASSED [ 2%] 2023-01-11T23:10:16.9687769Z test_ops.py::TestCommonCUDA::test_dtypes___rmul___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688084Z test_ops.py::TestCommonCUDA::test_dtypes___ror___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688393Z test_ops.py::TestCommonCUDA::test_dtypes___rsub___cuda PASSED [ 2%] 2023-01-11T23:10:16.9688723Z test_ops.py::TestCommonCUDA::test_dtypes__native_batch_norm_legit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689054Z test_ops.py::TestCommonCUDA::test_dtypes__refs_T_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689380Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_chalf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9689726Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_double_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690072Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_float_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690411Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_half_cuda PASSED [ 2%] 2023-01-11T23:10:16.9690746Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_short_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691083Z test_ops.py::TestCommonCUDA::test_dtypes__refs_acos_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691407Z test_ops.py::TestCommonCUDA::test_dtypes__refs_acosh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9691716Z test_ops.py::TestCommonCUDA::test_dtypes__refs_add_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692029Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addr_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692346Z test_ops.py::TestCommonCUDA::test_dtypes__refs_all_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692668Z test_ops.py::TestCommonCUDA::test_dtypes__refs_allclose_cuda PASSED [ 2%] 2023-01-11T23:10:16.9692979Z test_ops.py::TestCommonCUDA::test_dtypes__refs_amax_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693293Z test_ops.py::TestCommonCUDA::test_dtypes__refs_amin_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693613Z test_ops.py::TestCommonCUDA::test_dtypes__refs_arange_cuda PASSED [ 2%] 2023-01-11T23:10:16.9693921Z test_ops.py::TestCommonCUDA::test_dtypes__refs_asinh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9694233Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9694731Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atanh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695054Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_2d_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695365Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_3d_cuda PASSED [ 2%] 2023-01-11T23:10:16.9695682Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_and_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696015Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_left_shift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_not_cuda PASSED [ 2%] 2023-01-11T23:10:16.9696661Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_or_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697005Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_right_shift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697350Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_shapes_cuda PASSED [ 2%] 2023-01-11T23:10:16.9697687Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_tensors_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698019Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bucketize_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698333Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cat_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698637Z test_ops.py::TestCommonCUDA::test_dtypes__refs_chunk_cuda PASSED [ 2%] 2023-01-11T23:10:16.9698952Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_max_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699274Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_min_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699584Z test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_cuda PASSED [ 2%] 2023-01-11T23:10:16.9699925Z test_ops.py::TestCommonCUDA::test_dtypes__refs_conj_physical_cuda PASSED [ 2%] 2023-01-11T23:10:16.9700262Z test_ops.py::TestCommonCUDA::test_dtypes__refs_constant_pad_nd_cuda PASSED [ 2%] 
2023-01-11T23:10:16.9700628Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cos_cuda PASSED [ 2%] 2023-01-11T23:10:16.9700933Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_embed_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_floor_rounding_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701595Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_trunc_rounding_cuda PASSED [ 2%] 2023-01-11T23:10:16.9701923Z test_ops.py::TestCommonCUDA::test_dtypes__refs_dsplit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702227Z test_ops.py::TestCommonCUDA::test_dtypes__refs_dstack_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702533Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erfc_cuda PASSED [ 2%] 2023-01-11T23:10:16.9702847Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_as_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703157Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expm1_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703472Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9703784Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704100Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftshift_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704412Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9704724Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfftn_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705304Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9705919Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftn_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706233Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706546Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9706887Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfft_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707201Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707668Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9707975Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fliplr_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708280Z test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708586Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmin_cuda PASSED [ 2%] 2023-01-11T23:10:16.9708956Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmod_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709258Z test_ops.py::TestCommonCUDA::test_dtypes__refs_frac_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709562Z test_ops.py::TestCommonCUDA::test_dtypes__refs_gcd_cuda PASSED [ 2%] 2023-01-11T23:10:16.9709865Z test_ops.py::TestCommonCUDA::test_dtypes__refs_gt_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710165Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hsplit_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710473Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hstack_cuda PASSED [ 2%] 2023-01-11T23:10:16.9710779Z test_ops.py::TestCommonCUDA::test_dtypes__refs_hypot_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711080Z test_ops.py::TestCommonCUDA::test_dtypes__refs_i0_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711379Z test_ops.py::TestCommonCUDA::test_dtypes__refs_igamma_cuda PASSED [ 2%] 2023-01-11T23:10:16.9711688Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_index_copy_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712000Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712342Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_select_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712652Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isfinite_cuda PASSED [ 2%] 2023-01-11T23:10:16.9712965Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isinf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713268Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isnan_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713570Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isneginf_cuda PASSED [ 2%] 2023-01-11T23:10:16.9713878Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isreal_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714181Z test_ops.py::TestCommonCUDA::test_dtypes__refs_le_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714480Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lerp_cuda PASSED [ 2%] 2023-01-11T23:10:16.9714787Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lgamma_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715106Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_matrix_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715427Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svd_cuda PASSED [ 2%] 2023-01-11T23:10:16.9715753Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_vector_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716076Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linspace_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716384Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log2_cuda PASSED [ 2%] 2023-01-11T23:10:16.9716681Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717006Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log_softmax_with_dtype_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_not_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717650Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_or_cuda PASSED [ 2%] 2023-01-11T23:10:16.9717965Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_xor_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718294Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logsumexp_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718636Z test_ops.py::TestCommonCUDA::test_dtypes__refs_masked_fill_cuda PASSED [ 2%] 2023-01-11T23:10:16.9718944Z test_ops.py::TestCommonCUDA::test_dtypes__refs_movedim_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719253Z test_ops.py::TestCommonCUDA::test_dtypes__refs_mul_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719556Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nan_to_num_cuda PASSED [ 2%] 2023-01-11T23:10:16.9719857Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ne_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720161Z test_ops.py::TestCommonCUDA::test_dtypes__refs_neg_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720507Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_cuda PASSED [ 2%] 2023-01-11T23:10:16.9720836Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_empty_strided_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721155Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_full_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721465Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nextafter_cuda PASSED [ 2%] 2023-01-11T23:10:16.9721794Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_dropout_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722131Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_elu_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722474Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_group_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9722821Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardtanh_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723169Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_huber_loss_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723514Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_layer_norm_cuda PASSED [ 2%] 2023-01-11T23:10:16.9723886Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_leaky_relu_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724221Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_nll_loss_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724579Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pairwise_distance_cuda PASSED [ 3%] 2023-01-11T23:10:16.9724924Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725263Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softplus_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725607Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softshrink_cuda PASSED [ 3%] 2023-01-11T23:10:16.9725959Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726298Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ones_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ravel_cuda PASSED [ 3%] 2023-01-11T23:10:16.9726930Z test_ops.py::TestCommonCUDA::test_dtypes__refs_remainder_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727243Z test_ops.py::TestCommonCUDA::test_dtypes__refs_repeat_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727555Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_cuda PASSED [ 3%] 2023-01-11T23:10:16.9727870Z test_ops.py::TestCommonCUDA::test_dtypes__refs_roll_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728173Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rsqrt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728477Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sigmoid_cuda PASSED [ 3%] 2023-01-11T23:10:16.9728789Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sign_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729097Z test_ops.py::TestCommonCUDA::test_dtypes__refs_signbit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729401Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9729708Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sinh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730031Z test_ops.py::TestCommonCUDA::test_dtypes__refs_softmax_with_dtype_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730385Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9730763Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_bessel_j1_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731088Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_entr_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731405Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_erfcx_cuda PASSED [ 3%] 2023-01-11T23:10:16.9731716Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i0e_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732051Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_softmax_with_dtype_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732387Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_logit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9732727Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_1_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733098Z 
test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_3_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733440Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtri_cuda PASSED [ 3%] 2023-01-11T23:10:16.9733778Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_spherical_bessel_j0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734116Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_xlog1py_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734443Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_zeta_cuda PASSED [ 3%] 2023-01-11T23:10:16.9734854Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sqrt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735159Z test_ops.py::TestCommonCUDA::test_dtypes__refs_squeeze_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735468Z test_ops.py::TestCommonCUDA::test_dtypes__refs_stack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9735821Z test_ops.py::TestCommonCUDA::test_dtypes__refs_std_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736124Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sub_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736423Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_cuda PASSED [ 3%] 2023-01-11T23:10:16.9736729Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sum_to_size_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737036Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737336Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tanh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737640Z test_ops.py::TestCommonCUDA::test_dtypes__refs_to_cuda PASSED [ 3%] 2023-01-11T23:10:16.9737945Z test_ops.py::TestCommonCUDA::test_dtypes__refs_transpose_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_indices_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738567Z test_ops.py::TestCommonCUDA::test_dtypes__refs_trunc_cuda PASSED [ 3%] 2023-01-11T23:10:16.9738877Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unflatten_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739191Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_copy_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739500Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unfold_cuda PASSED [ 3%] 2023-01-11T23:10:16.9739801Z test_ops.py::TestCommonCUDA::test_dtypes__refs_view_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740104Z test_ops.py::TestCommonCUDA::test_dtypes__refs_vstack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740409Z test_ops.py::TestCommonCUDA::test_dtypes__refs_where_cuda PASSED [ 3%] 2023-01-11T23:10:16.9740710Z test_ops.py::TestCommonCUDA::test_dtypes__refs_xlogy_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741011Z test_ops.py::TestCommonCUDA::test_dtypes_acosh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741314Z test_ops.py::TestCommonCUDA::test_dtypes_add_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741611Z test_ops.py::TestCommonCUDA::test_dtypes_addcmul_cuda PASSED [ 3%] 2023-01-11T23:10:16.9741913Z test_ops.py::TestCommonCUDA::test_dtypes_addmv_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742242Z test_ops.py::TestCommonCUDA::test_dtypes_all_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742544Z test_ops.py::TestCommonCUDA::test_dtypes_amax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9742839Z test_ops.py::TestCommonCUDA::test_dtypes_aminmax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743138Z test_ops.py::TestCommonCUDA::test_dtypes_any_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743435Z test_ops.py::TestCommonCUDA::test_dtypes_arange_cuda PASSED [ 3%] 2023-01-11T23:10:16.9743733Z test_ops.py::TestCommonCUDA::test_dtypes_argmin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744034Z 
test_ops.py::TestCommonCUDA::test_dtypes_argsort_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744337Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744665Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_partial_views_cuda PASSED [ 3%] 2023-01-11T23:10:16.9744991Z test_ops.py::TestCommonCUDA::test_dtypes_as_strided_scatter_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745300Z test_ops.py::TestCommonCUDA::test_dtypes_asinh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745597Z test_ops.py::TestCommonCUDA::test_dtypes_atan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9745896Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_1d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746201Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_2d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746506Z test_ops.py::TestCommonCUDA::test_dtypes_bernoulli_cuda PASSED [ 3%] 2023-01-11T23:10:16.9746811Z test_ops.py::TestCommonCUDA::test_dtypes_bfloat16_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747118Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_left_shift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747464Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_right_shift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9747787Z test_ops.py::TestCommonCUDA::test_dtypes_bmm_cuda PASSED [ 3%] 2023-01-11T23:10:16.9748083Z test_ops.py::TestCommonCUDA::test_dtypes_bool_cuda PASSED [ 3%] 2023-01-11T23:10:16.9748408Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_shapes_cuda SKIPPED (Skipped!) [ 3%] 2023-01-11T23:10:16.9748811Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_to_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749117Z test_ops.py::TestCommonCUDA::test_dtypes_bucketize_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749415Z test_ops.py::TestCommonCUDA::test_dtypes_byte_cuda PASSED [ 3%] 2023-01-11T23:10:16.9749722Z test_ops.py::TestCommonCUDA::test_dtypes_cartesian_prod_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750059Z test_ops.py::TestCommonCUDA::test_dtypes_cat_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750384Z test_ops.py::TestCommonCUDA::test_dtypes_cdist_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750687Z test_ops.py::TestCommonCUDA::test_dtypes_ceil_cuda PASSED [ 3%] 2023-01-11T23:10:16.9750992Z test_ops.py::TestCommonCUDA::test_dtypes_chalf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751291Z test_ops.py::TestCommonCUDA::test_dtypes_char_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751595Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_inverse_cuda PASSED [ 3%] 2023-01-11T23:10:16.9751903Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_min_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752206Z test_ops.py::TestCommonCUDA::test_dtypes_clone_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752505Z test_ops.py::TestCommonCUDA::test_dtypes_combinations_cuda PASSED [ 3%] 2023-01-11T23:10:16.9752816Z test_ops.py::TestCommonCUDA::test_dtypes_conj_physical_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753124Z test_ops.py::TestCommonCUDA::test_dtypes_copysign_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753434Z test_ops.py::TestCommonCUDA::test_dtypes_cosh_cuda PASSED [ 3%] 2023-01-11T23:10:16.9753733Z test_ops.py::TestCommonCUDA::test_dtypes_count_nonzero_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754066Z test_ops.py::TestCommonCUDA::test_dtypes_cov_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754364Z test_ops.py::TestCommonCUDA::test_dtypes_cross_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754660Z test_ops.py::TestCommonCUDA::test_dtypes_cummax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9754963Z test_ops.py::TestCommonCUDA::test_dtypes_cumprod_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755284Z 
test_ops.py::TestCommonCUDA::test_dtypes_cumulative_trapezoid_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755604Z test_ops.py::TestCommonCUDA::test_dtypes_diag_embed_cuda PASSED [ 3%] 2023-01-11T23:10:16.9755905Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756213Z test_ops.py::TestCommonCUDA::test_dtypes_digamma_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756521Z test_ops.py::TestCommonCUDA::test_dtypes_div_floor_rounding_cuda PASSED [ 3%] 2023-01-11T23:10:16.9756826Z test_ops.py::TestCommonCUDA::test_dtypes_dot_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757127Z test_ops.py::TestCommonCUDA::test_dtypes_dsplit_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757433Z test_ops.py::TestCommonCUDA::test_dtypes_empty_cuda PASSED [ 3%] 2023-01-11T23:10:16.9757735Z test_ops.py::TestCommonCUDA::test_dtypes_equal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758030Z test_ops.py::TestCommonCUDA::test_dtypes_erf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758327Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758631Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9758958Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759267Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifftshift_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759572Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft2_cuda PASSED [ 3%] 2023-01-11T23:10:16.9759884Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760184Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760491Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfftn_cuda PASSED [ 3%] 2023-01-11T23:10:16.9760840Z test_ops.py::TestCommonCUDA::test_dtypes_flatten_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761139Z test_ops.py::TestCommonCUDA::test_dtypes_fliplr_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761440Z test_ops.py::TestCommonCUDA::test_dtypes_flipud_cuda PASSED [ 3%] 2023-01-11T23:10:16.9761746Z test_ops.py::TestCommonCUDA::test_dtypes_float_power_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762053Z test_ops.py::TestCommonCUDA::test_dtypes_floor_divide_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762357Z test_ops.py::TestCommonCUDA::test_dtypes_fmax_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762656Z test_ops.py::TestCommonCUDA::test_dtypes_fmin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9762954Z test_ops.py::TestCommonCUDA::test_dtypes_frac_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763248Z test_ops.py::TestCommonCUDA::test_dtypes_full_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763545Z test_ops.py::TestCommonCUDA::test_dtypes_full_like_cuda PASSED [ 3%] 2023-01-11T23:10:16.9763847Z test_ops.py::TestCommonCUDA::test_dtypes_geqrf_cuda XFAIL [ 3%] 2023-01-11T23:10:16.9764142Z test_ops.py::TestCommonCUDA::test_dtypes_gradient_cuda PASSED [ 3%] 2023-01-11T23:10:16.9764449Z test_ops.py::TestCommonCUDA::test_dtypes_grid_sampler_2d_cuda PASSED [ 3%] 2023-01-11T23:10:16.9764754Z test_ops.py::TestCommonCUDA::test_dtypes_gt_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765057Z test_ops.py::TestCommonCUDA::test_dtypes_histogram_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765391Z test_ops.py::TestCommonCUDA::test_dtypes_hstack_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765690Z test_ops.py::TestCommonCUDA::test_dtypes_hypot_cuda PASSED [ 3%] 2023-01-11T23:10:16.9765987Z test_ops.py::TestCommonCUDA::test_dtypes_i0_cuda PASSED [ 3%] 2023-01-11T23:10:16.9766280Z test_ops.py::TestCommonCUDA::test_dtypes_index_add_cuda PASSED [ 3%] 
2023-01-11T23:10:16.9766585Z test_ops.py::TestCommonCUDA::test_dtypes_index_reduce_cuda PASSED [ 3%] 2023-01-11T23:10:16.9766891Z test_ops.py::TestCommonCUDA::test_dtypes_int_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767191Z test_ops.py::TestCommonCUDA::test_dtypes_isfinite_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767492Z test_ops.py::TestCommonCUDA::test_dtypes_isin_cuda PASSED [ 3%] 2023-01-11T23:10:16.9767796Z test_ops.py::TestCommonCUDA::test_dtypes_isinf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768096Z test_ops.py::TestCommonCUDA::test_dtypes_isnan_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768400Z test_ops.py::TestCommonCUDA::test_dtypes_isneginf_cuda PASSED [ 3%] 2023-01-11T23:10:16.9768700Z test_ops.py::TestCommonCUDA::test_dtypes_isreal_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769008Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769343Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_binary_return_by_ref_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769669Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_unary_cuda PASSED [ 3%] 2023-01-11T23:10:16.9769978Z test_ops.py::TestCommonCUDA::test_dtypes_kron_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770294Z test_ops.py::TestCommonCUDA::test_dtypes_kthvalue_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770667Z test_ops.py::TestCommonCUDA::test_dtypes_lcm_cuda PASSED [ 3%] 2023-01-11T23:10:16.9770970Z test_ops.py::TestCommonCUDA::test_dtypes_ldexp_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771276Z test_ops.py::TestCommonCUDA::test_dtypes_lerp_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771586Z test_ops.py::TestCommonCUDA::test_dtypes_lgamma_cuda PASSED [ 3%] 2023-01-11T23:10:16.9771889Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cross_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772194Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772504Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eig_cuda PASSED [ 3%] 2023-01-11T23:10:16.9772807Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvalsh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773134Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_grad_oriented_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773457Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9773762Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774072Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_factor_ex_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774383Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lu_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9774796Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775108Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_power_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775425Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_cuda PASSED [ 4%] 2023-01-11T23:10:16.9775753Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_matrix_rank_hermitian_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776078Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776393Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_triangular_cuda PASSED [ 4%] 2023-01-11T23:10:16.9776718Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_svd_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777027Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_svdvals_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777378Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorinv_cuda PASSED [ 4%] 
2023-01-11T23:10:16.9777694Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vander_cuda PASSED [ 4%] 2023-01-11T23:10:16.9777999Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vecdot_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778313Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_vector_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778620Z test_ops.py::TestCommonCUDA::test_dtypes_logaddexp2_cuda PASSED [ 4%] 2023-01-11T23:10:16.9778926Z test_ops.py::TestCommonCUDA::test_dtypes_logaddexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779239Z test_ops.py::TestCommonCUDA::test_dtypes_logcumsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779547Z test_ops.py::TestCommonCUDA::test_dtypes_logdet_cuda PASSED [ 4%] 2023-01-11T23:10:16.9779860Z test_ops.py::TestCommonCUDA::test_dtypes_logical_and_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780213Z test_ops.py::TestCommonCUDA::test_dtypes_logical_not_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780518Z test_ops.py::TestCommonCUDA::test_dtypes_logical_xor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9780825Z test_ops.py::TestCommonCUDA::test_dtypes_logsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781127Z test_ops.py::TestCommonCUDA::test_dtypes_long_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781423Z test_ops.py::TestCommonCUDA::test_dtypes_lt_cuda PASSED [ 4%] 2023-01-11T23:10:16.9781714Z test_ops.py::TestCommonCUDA::test_dtypes_lu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782010Z test_ops.py::TestCommonCUDA::test_dtypes_lu_solve_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782348Z test_ops.py::TestCommonCUDA::test_dtypes_masked_amin_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782661Z test_ops.py::TestCommonCUDA::test_dtypes_masked_fill_cuda PASSED [ 4%] 2023-01-11T23:10:16.9782971Z test_ops.py::TestCommonCUDA::test_dtypes_masked_log_softmax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783285Z test_ops.py::TestCommonCUDA::test_dtypes_masked_logaddexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783605Z test_ops.py::TestCommonCUDA::test_dtypes_masked_logsumexp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9783909Z test_ops.py::TestCommonCUDA::test_dtypes_masked_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784216Z test_ops.py::TestCommonCUDA::test_dtypes_masked_median_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784521Z test_ops.py::TestCommonCUDA::test_dtypes_masked_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9784829Z test_ops.py::TestCommonCUDA::test_dtypes_masked_softmax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785137Z test_ops.py::TestCommonCUDA::test_dtypes_masked_softmin_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785445Z test_ops.py::TestCommonCUDA::test_dtypes_masked_std_cuda PASSED [ 4%] 2023-01-11T23:10:16.9785749Z test_ops.py::TestCommonCUDA::test_dtypes_masked_sum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786050Z test_ops.py::TestCommonCUDA::test_dtypes_masked_var_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786354Z test_ops.py::TestCommonCUDA::test_dtypes_matrix_exp_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786673Z test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_no_dim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9786989Z test_ops.py::TestCommonCUDA::test_dtypes_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787308Z test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_variadic_tensors_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787632Z test_ops.py::TestCommonCUDA::test_dtypes_min_binary_cuda PASSED [ 4%] 2023-01-11T23:10:16.9787944Z test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_no_dim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788261Z test_ops.py::TestCommonCUDA::test_dtypes_minimum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788561Z 
test_ops.py::TestCommonCUDA::test_dtypes_mm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9788972Z test_ops.py::TestCommonCUDA::test_dtypes_movedim_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789278Z test_ops.py::TestCommonCUDA::test_dtypes_msort_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789590Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_5_cuda PASSED [ 4%] 2023-01-11T23:10:16.9789908Z test_ops.py::TestCommonCUDA::test_dtypes_nanmean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790212Z test_ops.py::TestCommonCUDA::test_dtypes_narrow_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790515Z test_ops.py::TestCommonCUDA::test_dtypes_native_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9790840Z test_ops.py::TestCommonCUDA::test_dtypes_native_dropout_backward_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791164Z test_ops.py::TestCommonCUDA::test_dtypes_new_empty_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791469Z test_ops.py::TestCommonCUDA::test_dtypes_new_full_cuda PASSED [ 4%] 2023-01-11T23:10:16.9791766Z test_ops.py::TestCommonCUDA::test_dtypes_new_ones_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792065Z test_ops.py::TestCommonCUDA::test_dtypes_new_zeros_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792371Z test_ops.py::TestCommonCUDA::test_dtypes_nextafter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9792699Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793062Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793417Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9793763Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_alpha_dropout_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794127Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794459Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9794800Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_bilinear_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795142Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795514Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_binary_cross_entropy_with_logits_cuda PASSED [ 4%] 2023-01-11T23:10:16.9795868Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv1d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796207Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796555Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9796910Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_embedding_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797264Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cross_entropy_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797605Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_ctc_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9797940Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798274Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout3d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798611Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_cuda PASSED [ 4%] 2023-01-11T23:10:16.9798953Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799307Z 
test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gaussian_nll_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799643Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_gelu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9799964Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_glu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9800290Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardtanh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9800659Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hinge_embedding_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801096Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_instance_norm_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801441Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_area_cuda PASSED [ 4%] 2023-01-11T23:10:16.9801793Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bicubic_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802152Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_linear_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802493Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_l1_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9802825Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_leaky_relu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803170Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_margin_ranking_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803522Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool2d_cuda PASSED [ 4%] 2023-01-11T23:10:16.9803861Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_grad_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804208Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_grad_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804544Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mish_cuda PASSED [ 4%] 2023-01-11T23:10:16.9804885Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_margin_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805228Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_nll_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805568Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_circular_cuda PASSED [ 4%] 2023-01-11T23:10:16.9805907Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_reflect_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806274Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pairwise_distance_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806611Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_prelu_cuda PASSED [ 4%] 2023-01-11T23:10:16.9806948Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_complex_cuda XFAIL [ 4%] 2023-01-11T23:10:16.9807286Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_smooth_l1_loss_cuda PASSED [ 4%] 2023-01-11T23:10:16.9807620Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softplus_cuda PASSED [ 4%] 2023-01-11T23:10:16.9807955Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_threshold_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808283Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_unfold_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808597Z test_ops.py::TestCommonCUDA::test_dtypes_norm_inf_cuda PASSED [ 4%] 2023-01-11T23:10:16.9808910Z test_ops.py::TestCommonCUDA::test_dtypes_normal_number_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809223Z test_ops.py::TestCommonCUDA::test_dtypes_ones_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809541Z test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_var_mean_cuda PASSED [ 4%] 2023-01-11T23:10:16.9809868Z 
test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_view_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810212Z test_ops.py::TestCommonCUDA::test_dtypes_pca_lowrank_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810551Z test_ops.py::TestCommonCUDA::test_dtypes_pinverse_cuda PASSED [ 4%] 2023-01-11T23:10:16.9810895Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_1_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:10:16.9811254Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_4_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:10:16.9811575Z test_ops.py::TestCommonCUDA::test_dtypes_qr_cuda PASSED [ 4%] 2023-01-11T23:10:16.9811879Z test_ops.py::TestCommonCUDA::test_dtypes_quantile_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812186Z test_ops.py::TestCommonCUDA::test_dtypes_rad2deg_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812493Z test_ops.py::TestCommonCUDA::test_dtypes_rand_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9812825Z test_ops.py::TestCommonCUDA::test_dtypes_randint_like_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813129Z test_ops.py::TestCommonCUDA::test_dtypes_real_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813439Z test_ops.py::TestCommonCUDA::test_dtypes_reciprocal_cuda PASSED [ 4%] 2023-01-11T23:10:16.9813752Z test_ops.py::TestCommonCUDA::test_dtypes_remainder_cuda PASSED [ 4%] 2023-01-11T23:10:16.9814063Z test_ops.py::TestCommonCUDA::test_dtypes_repeat_cuda PASSED [ 4%] 2023-01-11T23:10:16.9814363Z test_ops.py::TestCommonCUDA::test_dtypes_resize__cuda XFAIL [ 4%] 2023-01-11T23:10:16.9814769Z test_ops.py::TestCommonCUDA::test_dtypes_resolve_conj_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815083Z test_ops.py::TestCommonCUDA::test_dtypes_resolve_neg_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815387Z test_ops.py::TestCommonCUDA::test_dtypes_roll_cuda PASSED [ 4%] 2023-01-11T23:10:16.9815691Z test_ops.py::TestCommonCUDA::test_dtypes_rot90_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816022Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_3_cuda SKIPPED (Skipped!) 
[ 4%] 2023-01-11T23:10:16.9816357Z test_ops.py::TestCommonCUDA::test_dtypes_scalar_tensor_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816662Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_add_cuda PASSED [ 4%] 2023-01-11T23:10:16.9816975Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817291Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amax_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817609Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_prod_cuda PASSED [ 4%] 2023-01-11T23:10:16.9817934Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_sum_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818300Z test_ops.py::TestCommonCUDA::test_dtypes_searchsorted_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818640Z test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_lengths_cuda PASSED [ 4%] 2023-01-11T23:10:16.9818970Z test_ops.py::TestCommonCUDA::test_dtypes_select_scatter_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819287Z test_ops.py::TestCommonCUDA::test_dtypes_short_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819599Z test_ops.py::TestCommonCUDA::test_dtypes_sigmoid_cuda PASSED [ 4%] 2023-01-11T23:10:16.9819924Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_bartlett_cuda PASSED [ 4%] 2023-01-11T23:10:16.9820323Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_cosine_cuda PASSED [ 4%] 2023-01-11T23:10:16.9820683Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_general_hamming_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821030Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hann_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821346Z test_ops.py::TestCommonCUDA::test_dtypes_sinc_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821656Z test_ops.py::TestCommonCUDA::test_dtypes_sinh_cuda PASSED [ 4%] 2023-01-11T23:10:16.9821966Z test_ops.py::TestCommonCUDA::test_dtypes_slice_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822279Z test_ops.py::TestCommonCUDA::test_dtypes_softmax_with_dtype_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822598Z test_ops.py::TestCommonCUDA::test_dtypes_sort_cuda PASSED [ 4%] 2023-01-11T23:10:16.9822913Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j1_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823259Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_u_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823597Z test_ops.py::TestCommonCUDA::test_dtypes_special_erfcx_cuda PASSED [ 4%] 2023-01-11T23:10:16.9823920Z test_ops.py::TestCommonCUDA::test_dtypes_special_i0e_cuda PASSED [ 4%] 2023-01-11T23:10:16.9824241Z test_ops.py::TestCommonCUDA::test_dtypes_special_i1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9824555Z test_ops.py::TestCommonCUDA::test_dtypes_special_i1e_cuda PASSED [ 5%] 2023-01-11T23:10:16.9824894Z test_ops.py::TestCommonCUDA::test_dtypes_special_laguerre_polynomial_l_cuda PASSED [ 5%] 2023-01-11T23:10:16.9825474Z test_ops.py::TestCommonCUDA::test_dtypes_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9825874Z test_ops.py::TestCommonCUDA::test_dtypes_special_log_ndtr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826202Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826533Z test_ops.py::TestCommonCUDA::test_dtypes_special_ndtr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9826873Z test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k1_cuda PASSED [ 5%] 2023-01-11T23:10:16.9827412Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_u_cuda 
SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9827999Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9828397Z test_ops.py::TestCommonCUDA::test_dtypes_special_xlog1py_cuda PASSED [ 5%] 2023-01-11T23:10:16.9828778Z test_ops.py::TestCommonCUDA::test_dtypes_split_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829090Z test_ops.py::TestCommonCUDA::test_dtypes_sqrt_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829394Z test_ops.py::TestCommonCUDA::test_dtypes_square_cuda PASSED [ 5%] 2023-01-11T23:10:16.9829698Z test_ops.py::TestCommonCUDA::test_dtypes_std_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830015Z test_ops.py::TestCommonCUDA::test_dtypes_std_mean_unbiased_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830326Z test_ops.py::TestCommonCUDA::test_dtypes_sub_cuda PASSED [ 5%] 2023-01-11T23:10:16.9830693Z test_ops.py::TestCommonCUDA::test_dtypes_svd_lowrank_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831028Z test_ops.py::TestCommonCUDA::test_dtypes_symeig_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831338Z test_ops.py::TestCommonCUDA::test_dtypes_take_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831636Z test_ops.py::TestCommonCUDA::test_dtypes_tanh_cuda PASSED [ 5%] 2023-01-11T23:10:16.9831946Z test_ops.py::TestCommonCUDA::test_dtypes_tensor_split_cuda PASSED [ 5%] 2023-01-11T23:10:16.9832259Z test_ops.py::TestCommonCUDA::test_dtypes_tensordot_cuda PASSED [ 5%] 2023-01-11T23:10:16.9832576Z test_ops.py::TestCommonCUDA::test_dtypes_to_sparse_cuda SKIPPED (Skipped!) [ 5%] 2023-01-11T23:10:16.9832900Z test_ops.py::TestCommonCUDA::test_dtypes_transpose_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833215Z test_ops.py::TestCommonCUDA::test_dtypes_triangular_solve_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833529Z test_ops.py::TestCommonCUDA::test_dtypes_tril_cuda PASSED [ 5%] 2023-01-11T23:10:16.9833826Z test_ops.py::TestCommonCUDA::test_dtypes_triu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834137Z test_ops.py::TestCommonCUDA::test_dtypes_unfold_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834450Z test_ops.py::TestCommonCUDA::test_dtypes_unfold_cuda PASSED [ 5%] 2023-01-11T23:10:16.9834750Z test_ops.py::TestCommonCUDA::test_dtypes_uniform_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835073Z test_ops.py::TestCommonCUDA::test_dtypes_unique_consecutive_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835391Z test_ops.py::TestCommonCUDA::test_dtypes_unsqueeze_cuda PASSED [ 5%] 2023-01-11T23:10:16.9835703Z test_ops.py::TestCommonCUDA::test_dtypes_var_mean_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836006Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_real_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836316Z test_ops.py::TestCommonCUDA::test_dtypes_vsplit_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836624Z test_ops.py::TestCommonCUDA::test_dtypes_vstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9836921Z test_ops.py::TestCommonCUDA::test_dtypes_where_cuda PASSED [ 5%] 2023-01-11T23:10:16.9837254Z test_ops.py::TestCommonCUDA::test_dtypes_xlogy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9837562Z test_ops.py::TestCommonCUDA::test_dtypes_zero__cuda PASSED [ 5%] 2023-01-11T23:10:16.9837861Z test_ops.py::TestCommonCUDA::test_dtypes_zeros_like_cuda PASSED [ 5%] 2023-01-11T23:10:16.9838170Z test_ops.py::TestCommonCUDA::test_errors___radd___cuda PASSED [ 5%] 2023-01-11T23:10:16.9838473Z test_ops.py::TestCommonCUDA::test_errors___rdiv___cuda PASSED [ 5%] 2023-01-11T23:10:16.9838775Z 
test_ops.py::TestCommonCUDA::test_errors___rmul___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839070Z test_ops.py::TestCommonCUDA::test_errors___rpow___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839376Z test_ops.py::TestCommonCUDA::test_errors___rsub___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839676Z test_ops.py::TestCommonCUDA::test_errors___rxor___cuda PASSED [ 5%] 2023-01-11T23:10:16.9839974Z test_ops.py::TestCommonCUDA::test_errors_amax_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840275Z test_ops.py::TestCommonCUDA::test_errors_amin_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840575Z test_ops.py::TestCommonCUDA::test_errors_atan2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9840885Z test_ops.py::TestCommonCUDA::test_errors_bernoulli_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841192Z test_ops.py::TestCommonCUDA::test_errors_bitwise_left_shift_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841504Z test_ops.py::TestCommonCUDA::test_errors_bitwise_or_cuda PASSED [ 5%] 2023-01-11T23:10:16.9841817Z test_ops.py::TestCommonCUDA::test_errors_bitwise_right_shift_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842151Z test_ops.py::TestCommonCUDA::test_errors_bitwise_xor_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842462Z test_ops.py::TestCommonCUDA::test_errors_complex_cuda PASSED [ 5%] 2023-01-11T23:10:16.9842770Z test_ops.py::TestCommonCUDA::test_errors_copysign_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843079Z test_ops.py::TestCommonCUDA::test_errors_cov_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843374Z test_ops.py::TestCommonCUDA::test_errors_diag_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843680Z test_ops.py::TestCommonCUDA::test_errors_diag_embed_cuda PASSED [ 5%] 2023-01-11T23:10:16.9843990Z test_ops.py::TestCommonCUDA::test_errors_diagonal_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844309Z test_ops.py::TestCommonCUDA::test_errors_div_no_rounding_mode_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844632Z test_ops.py::TestCommonCUDA::test_errors_dstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9844934Z test_ops.py::TestCommonCUDA::test_errors_eq_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845242Z test_ops.py::TestCommonCUDA::test_errors_fft_fft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845542Z test_ops.py::TestCommonCUDA::test_errors_fft_fft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9845849Z test_ops.py::TestCommonCUDA::test_errors_fft_hfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846159Z test_ops.py::TestCommonCUDA::test_errors_fft_ifftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846459Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9846767Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847073Z test_ops.py::TestCommonCUDA::test_errors_fft_irfft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847380Z test_ops.py::TestCommonCUDA::test_errors_fft_irfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847681Z test_ops.py::TestCommonCUDA::test_errors_fft_irfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9847992Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft2_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848300Z test_ops.py::TestCommonCUDA::test_errors_fft_rfft_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848662Z test_ops.py::TestCommonCUDA::test_errors_fft_rfftn_cuda PASSED [ 5%] 2023-01-11T23:10:16.9848973Z test_ops.py::TestCommonCUDA::test_errors_floor_divide_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849284Z test_ops.py::TestCommonCUDA::test_errors_fmin_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849590Z test_ops.py::TestCommonCUDA::test_errors_gather_cuda PASSED [ 5%] 2023-01-11T23:10:16.9849891Z 
test_ops.py::TestCommonCUDA::test_errors_gradient_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850228Z test_ops.py::TestCommonCUDA::test_errors_hsplit_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850559Z test_ops.py::TestCommonCUDA::test_errors_hstack_cuda PASSED [ 5%] 2023-01-11T23:10:16.9850855Z test_ops.py::TestCommonCUDA::test_errors_hypot_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851165Z test_ops.py::TestCommonCUDA::test_errors_isclose_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851476Z test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9851793Z test_ops.py::TestCommonCUDA::test_errors_ldexp_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852089Z test_ops.py::TestCommonCUDA::test_errors_le_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852394Z test_ops.py::TestCommonCUDA::test_errors_linalg_cross_cuda PASSED [ 5%] 2023-01-11T23:10:16.9852705Z test_ops.py::TestCommonCUDA::test_errors_linspace_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853011Z test_ops.py::TestCommonCUDA::test_errors_logcumsumexp_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853328Z test_ops.py::TestCommonCUDA::test_errors_logical_and_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853636Z test_ops.py::TestCommonCUDA::test_errors_logical_or_cuda PASSED [ 5%] 2023-01-11T23:10:16.9853970Z test_ops.py::TestCommonCUDA::test_errors_logical_xor_cuda PASSED [ 5%] 2023-01-11T23:10:16.9854272Z test_ops.py::TestCommonCUDA::test_errors_masked_select_cuda PASSED [ 5%] 2023-01-11T23:10:16.9854708Z test_ops.py::TestCommonCUDA::test_errors_max_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855015Z test_ops.py::TestCommonCUDA::test_errors_mean_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855311Z test_ops.py::TestCommonCUDA::test_errors_median_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855616Z test_ops.py::TestCommonCUDA::test_errors_min_binary_cuda PASSED [ 5%] 2023-01-11T23:10:16.9855924Z test_ops.py::TestCommonCUDA::test_errors_minimum_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856228Z test_ops.py::TestCommonCUDA::test_errors_movedim_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856523Z test_ops.py::TestCommonCUDA::test_errors_mul_cuda PASSED [ 5%] 2023-01-11T23:10:16.9856832Z test_ops.py::TestCommonCUDA::test_errors_narrow_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857139Z test_ops.py::TestCommonCUDA::test_errors_narrow_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857445Z test_ops.py::TestCommonCUDA::test_errors_native_layer_norm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9857751Z test_ops.py::TestCommonCUDA::test_errors_ne_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858066Z test_ops.py::TestCommonCUDA::test_errors_nextafter_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858394Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool1d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9858729Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool2d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859071Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_embedding_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859401Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_gelu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9859726Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_group_norm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860065Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_huber_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860402Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_l1_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9860777Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool1d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861104Z 
test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool2d_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861446Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_poisson_nll_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9861785Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_prelu_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862138Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862483Z test_ops.py::TestCommonCUDA::test_errors_ormqr_cuda PASSED [ 5%] 2023-01-11T23:10:16.9862793Z test_ops.py::TestCommonCUDA::test_errors_pow_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863100Z test_ops.py::TestCommonCUDA::test_errors_renorm_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863411Z test_ops.py::TestCommonCUDA::test_errors_reshape_as_cuda PASSED [ 5%] 2023-01-11T23:10:16.9863723Z test_ops.py::TestCommonCUDA::test_errors_reshape_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864032Z test_ops.py::TestCommonCUDA::test_errors_roll_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864329Z test_ops.py::TestCommonCUDA::test_errors_rot90_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864641Z test_ops.py::TestCommonCUDA::test_errors_scatter_add_cuda PASSED [ 5%] 2023-01-11T23:10:16.9864951Z test_ops.py::TestCommonCUDA::test_errors_scatter_cuda PASSED [ 5%] 2023-01-11T23:10:16.9865275Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_cosine_cuda PASSED [ 5%] 2023-01-11T23:10:16.9865622Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_gaussian_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866006Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_cosine_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866349Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_kaiser_cuda PASSED [ 5%] 2023-01-11T23:10:16.9866683Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_nuttall_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867030Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_u_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867385Z test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_h_cuda PASSED [ 5%] 2023-01-11T23:10:16.9867914Z test_ops.py::TestCommonCUDA::test_errors_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9868490Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9869155Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:10:16.9869560Z test_ops.py::TestCommonCUDA::test_errors_special_xlog1py_cuda PASSED [ 5%] 2023-01-11T23:10:16.9869883Z test_ops.py::TestCommonCUDA::test_errors_sub_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870184Z test_ops.py::TestCommonCUDA::test_errors_t_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870486Z test_ops.py::TestCommonCUDA::test_errors_tril_cuda PASSED [ 5%] 2023-01-11T23:10:16.9870808Z test_ops.py::TestCommonCUDA::test_errors_true_divide_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871157Z test_ops.py::TestCommonCUDA::test_errors_unbind_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871466Z test_ops.py::TestCommonCUDA::test_errors_view_as_cuda PASSED [ 5%] 2023-01-11T23:10:16.9871771Z test_ops.py::TestCommonCUDA::test_errors_view_copy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9872083Z test_ops.py::TestCommonCUDA::test_errors_xlogy_cuda PASSED [ 5%] 2023-01-11T23:10:16.9872432Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9872857Z test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9873258Z test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9873660Z test_ops.py::TestCommonCUDA::test_multiple_devices___radd___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874048Z test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874441Z test_ops.py::TestCommonCUDA::test_multiple_devices___rdiv___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9874844Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmatmul___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9875251Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9875645Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmul___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876037Z test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876453Z test_ops.py::TestCommonCUDA::test_multiple_devices__softmax_backward_data_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:10:16.9876867Z test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9877253Z test_ops.py::TestCommonCUDA::test_multiple_devices_abs_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9877670Z test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878064Z test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878452Z test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9878854Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcdiv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9879255Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9879652Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880043Z test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880438Z test_ops.py::TestCommonCUDA::test_multiple_devices_addr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9880827Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9881227Z test_ops.py::TestCommonCUDA::test_multiple_devices_allclose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9881622Z test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882012Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882401Z test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9882800Z test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9883199Z test_ops.py::TestCommonCUDA::test_multiple_devices_aminmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9883594Z test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884025Z test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884418Z test_ops.py::TestCommonCUDA::test_multiple_devices_any_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9884809Z test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885206Z test_ops.py::TestCommonCUDA::test_multiple_devices_arange_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885599Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9885987Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9886379Z test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9886781Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9887199Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9887625Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_partial_views_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888039Z test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888438Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9888859Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9889249Z test_ops.py::TestCommonCUDA::test_multiple_devices_atanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9889652Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890054Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_1d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890459Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9890902Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_2d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9891302Z test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9891704Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_atleast_3d_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892108Z test_ops.py::TestCommonCUDA::test_multiple_devices_bernoulli_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892509Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_and_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9892925Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_right_shift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9893340Z test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9893743Z test_ops.py::TestCommonCUDA::test_multiple_devices_block_diag_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9894135Z test_ops.py::TestCommonCUDA::test_multiple_devices_bmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9894692Z test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895143Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895576Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9895992Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9896406Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_to_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9896815Z test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9897218Z test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9897624Z test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898037Z test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898434Z test_ops.py::TestCommonCUDA::test_multiple_devices_ceil_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9898837Z test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9899229Z test_ops.py::TestCommonCUDA::test_multiple_devices_cfloat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9899635Z test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900055Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_inverse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900549Z test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9900947Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9901355Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:10:16.9901765Z test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902168Z test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902581Z test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9902988Z test_ops.py::TestCommonCUDA::test_multiple_devices_complex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9903390Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9903793Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9904210Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_physical_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9904621Z test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905027Z test_ops.py::TestCommonCUDA::test_multiple_devices_corrcoef_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905419Z test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9905818Z test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9906217Z test_ops.py::TestCommonCUDA::test_multiple_devices_cosh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9906644Z test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907042Z test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907442Z test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9907846Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9908244Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9908634Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909148Z test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909555Z test_ops.py::TestCommonCUDA::test_multiple_devices_deg2rad_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9909951Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910348Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910762Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9910963Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:10:16.9911188Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911395Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911605Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9911813Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912006Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_trunc_rounding_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912201Z test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912392Z test_ops.py::TestCommonCUDA::test_multiple_devices_double_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912586Z test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912782Z test_ops.py::TestCommonCUDA::test_multiple_devices_dsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9912976Z test_ops.py::TestCommonCUDA::test_multiple_devices_einsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913167Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913365Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913552Z test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913746Z test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9913933Z test_ops.py::TestCommonCUDA::test_multiple_devices_equal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914144Z test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914335Z test_ops.py::TestCommonCUDA::test_multiple_devices_erf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914524Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914712Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9914905Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915098Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915289Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915476Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915670Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9915865Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_expand_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916061Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916254Z test_ops.py::TestCommonCUDA::test_multiple_devices_expand_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916446Z test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916672Z test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9916864Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917057Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917247Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917450Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917639Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9917832Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918027Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918224Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918419Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918613Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9918806Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919009Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919211Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifftshift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919405Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919628Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9919827Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920023Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920238Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920463Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920656Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9920851Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921051Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921245Z test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921431Z test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921624Z test_ops.py::TestCommonCUDA::test_multiple_devices_flipud_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9921819Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922018Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922229Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922417Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922605Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922795Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9922985Z test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923170Z test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923375Z test_ops.py::TestCommonCUDA::test_multiple_devices_grid_sampler_2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923568Z test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923761Z test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9923950Z test_ops.py::TestCommonCUDA::test_multiple_devices_half_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924149Z test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924343Z test_ops.py::TestCommonCUDA::test_multiple_devices_heaviside_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924539Z test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924729Z test_ops.py::TestCommonCUDA::test_multiple_devices_histc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9924924Z test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9925124Z test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:10:16.9925312Z test_ops.py::TestCommonCUDA::test_multiple_devices_i0_cuda_int64 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925507Z test_ops.py::TestCommonCUDA::test_multiple_devices_igamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925704Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9925899Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926095Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926298Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926500Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_reduce_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926695Z test_ops.py::TestCommonCUDA::test_multiple_devices_inner_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9926886Z test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927072Z test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927258Z test_ops.py::TestCommonCUDA::test_multiple_devices_isin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927449Z test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927665Z test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9927864Z test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928059Z test_ops.py::TestCommonCUDA::test_multiple_devices_isneginf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928248Z test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928441Z test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928634Z test_ops.py::TestCommonCUDA::test_multiple_devices_isreal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9928850Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929076Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_4inputs_with_extra_args_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929285Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929503Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929720Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_return_by_ref_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9929923Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930116Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930315Z test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930527Z test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930754Z test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9930973Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_singular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931170Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eig_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931377Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931582Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931799Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9931999Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932202Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932410Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932627Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9932832Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_multi_dot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933049Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933272Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933469Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933682Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_hermitian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9933881Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_qr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934084Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_slogdet_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934284Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934586Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934794Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9934997Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vecdot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935199Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vector_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935393Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935584Z test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935774Z test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9935965Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936204Z test_ops.py::TestCommonCUDA::test_multiple_devices_logcumsumexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936400Z test_ops.py::TestCommonCUDA::test_multiple_devices_logdet_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936597Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936795Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9936994Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_xor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937188Z test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937379Z test_ops.py::TestCommonCUDA::test_multiple_devices_logit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937577Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937772Z test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9937963Z test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938155Z test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938351Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938569Z test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938767Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9938957Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939147Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939348Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939549Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939753Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9939958Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumprod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940158Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940363Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940558Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940763Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_normalize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9940955Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941157Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941359Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941586Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941786Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9941982Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_var_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942177Z test_ops.py::TestCommonCUDA::test_multiple_devices_matmul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942367Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942574Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942774Z test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9942965Z test_ops.py::TestCommonCUDA::test_multiple_devices_maximum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943178Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943395Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943595Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9943794Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944028Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944243Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944439Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944632Z test_ops.py::TestCommonCUDA::test_multiple_devices_minimum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9944824Z test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945014Z test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945213Z test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945404Z test_ops.py::TestCommonCUDA::test_multiple_devices_mv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945616Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9945826Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946029Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_5_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946225Z test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946422Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946622Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanquantile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9946816Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947023Z test_ops.py::TestCommonCUDA::test_multiple_devices_nansum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947226Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947623Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9947880Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_batch_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948106Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_layer_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948313Z test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948524Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9948817Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949012Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949218Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_full_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949426Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 
7%] 2023-01-11T23:10:16.9949662Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_ones_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9949901Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950150Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950387Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950624Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9950856Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951085Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_alpha_dropout_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951295Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951530Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951772Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_avg_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9951991Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952215Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952462Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952700Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9952964Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_similarity_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953195Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_ctc_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953421Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953659Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9953860Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_elu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954110Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954401Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954635Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9954867Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955086Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_glu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955329Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955566Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_area_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9955849Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_linear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956087Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956294Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_layer_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956518Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_leaky_relu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956746Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_linear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9956982Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_local_response_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957227Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957463Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957692Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9957939Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958207Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958434Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mish_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:10:16.9958662Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_mse_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9958914Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959146Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_reflect_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959378Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_replicate_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959609Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9959838Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pairwise_distance_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960085Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pdist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960320Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960554Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960775Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9960995Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961224Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_rrelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961451Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_selu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961676Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_silu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9961915Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_smooth_l1_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962211Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962452Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962683Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9962914Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963143Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softsign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963375Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963607Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_tanhshrink_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9963816Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964047Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_threshold_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964314Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964569Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_unfold_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9964806Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965017Z test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965234Z test_ops.py::TestCommonCUDA::test_multiple_devices_nonzero_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965440Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965646Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_fro_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9965898Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966091Z test_ops.py::TestCommonCUDA::test_multiple_devices_ormqr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966315Z test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966526Z test_ops.py::TestCommonCUDA::test_multiple_devices_pinverse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966732Z test_ops.py::TestCommonCUDA::test_multiple_devices_polar_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9966970Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9967180Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_int64 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967412Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_int64 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967622Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9967836Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_4_cuda_int64 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:10:16.9968056Z test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968241Z test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968444Z test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968657Z test_ops.py::TestCommonCUDA::test_multiple_devices_put_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9968865Z test_ops.py::TestCommonCUDA::test_multiple_devices_qr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969077Z test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969289Z test_ops.py::TestCommonCUDA::test_multiple_devices_rand_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969548Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969764Z test_ops.py::TestCommonCUDA::test_multiple_devices_randn_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9969986Z test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970173Z test_ops.py::TestCommonCUDA::test_multiple_devices_real_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970402Z test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970671Z test_ops.py::TestCommonCUDA::test_multiple_devices_remainder_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9970905Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971129Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971342Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971554Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9971771Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972003Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972217Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize_as__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972413Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972627Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9972836Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_neg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973039Z test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 
2023-01-11T23:10:16.9973285Z test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973523Z test_ops.py::TestCommonCUDA::test_multiple_devices_rot90_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973734Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9973956Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974179Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974363Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:10:16.9974667Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9974876Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975098Z test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975326Z test_ops.py::TestCommonCUDA::test_multiple_devices_scalar_tensor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975541Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975754Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9975984Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976208Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976432Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976634Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9976947Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977158Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977376Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977584Z test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9977791Z test_ops.py::TestCommonCUDA::test_multiple_devices_sigmoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978021Z test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978224Z test_ops.py::TestCommonCUDA::test_multiple_devices_sign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978459Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_bartlett_cuda_float32 SKIPPED (fewer than 2 devices 
detected) [ 8%] 2023-01-11T23:10:16.9978690Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_exponential_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9978904Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_cosine_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979128Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hamming_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979349Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_hann_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979607Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_kaiser_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9979830Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_nuttall_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980047Z test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980258Z test_ops.py::TestCommonCUDA::test_multiple_devices_sin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980498Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980710Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9980922Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981119Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981330Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981539Z test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981743Z test_ops.py::TestCommonCUDA::test_multiple_devices_sort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9981981Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982196Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_airy_ai_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982419Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982660Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9982883Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983093Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983311Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9983714Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an 
unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9984084Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9984319Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9984565Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_erfcx_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9984797Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985028Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985240Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985459Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985698Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9985900Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986128Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986357Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986573Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9986788Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtri_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987030Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987273Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9987662Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988123Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988497Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:10:16.9988858Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989062Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989308Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_xlog1py_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 
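The chebyshev_polynomial and shifted_chebyshev_polynomial entries above are skipped unconditionally, with a reason string that cites tracking issue #79528 ("testing takes an unreasonably long time"). A minimal sketch of what an unconditional, issue-linked skip looks like in pytest; illustrative only, since the real suite applies these skips through its OpInfo skip machinery rather than a bare decorator:

import pytest

@pytest.mark.skip(reason="Skipping - testing takes an unreasonably long time, #79528")
def test_special_chebyshev_polynomial_v_example():
    ...  # body never runs; pytest reports the test as SKIPPED with the reason above
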
2023-01-11T23:10:16.9989538Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989757Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9989965Z test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990180Z test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990391Z test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990605Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9990840Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991063Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991252Z test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991453Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991670Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9991907Z test_ops.py::TestCommonCUDA::test_multiple_devices_symeig_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992111Z test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992331Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992584Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_along_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9992812Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993017Z test_ops.py::TestCommonCUDA::test_multiple_devices_take_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993203Z test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993409Z test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993612Z test_ops.py::TestCommonCUDA::test_multiple_devices_tanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9993826Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994038Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensordot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994239Z test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994461Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:10:16.9994686Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9994902Z test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995141Z test_ops.py::TestCommonCUDA::test_multiple_devices_transpose_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995349Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995557Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995762Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapz_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9995991Z test_ops.py::TestCommonCUDA::test_multiple_devices_triangular_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996228Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996435Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996664Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_indices_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9996878Z test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997089Z test_ops.py::TestCommonCUDA::test_multiple_devices_true_divide_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997294Z test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997480Z test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997695Z test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9997938Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998148Z test_ops.py::TestCommonCUDA::test_multiple_devices_uniform_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998370Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998604Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_consecutive_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9998807Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999020Z test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999246Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999447Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:16.9999670Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_complex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 
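Every test_multiple_devices_* case in the long run above (which finishes just below) reports SKIPPED with "fewer than 2 devices detected": the runner for this shard exposes only one CUDA device, so the cross-device variants cannot execute. A minimal sketch of such a device-count guard, assuming a plain pytest skipif rather than the exact decorator test_ops.py uses:

import pytest
import torch

@pytest.mark.skipif(torch.cuda.device_count() < 2,
                    reason="fewer than 2 devices detected")
def test_minimum_on_two_devices():
    # Illustrative cross-device check: the same op, run on cuda:0 and
    # cuda:1, must produce identical results.
    x0 = torch.randn(8, device="cuda:0")
    x1 = x0.to("cuda:1")
    torch.testing.assert_close(torch.minimum(x0, x0).cpu(),
                               torch.minimum(x1, x1).cpu())
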
2023-01-11T23:10:16.9999913Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000131Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000377Z test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000605Z test_ops.py::TestCommonCUDA::test_multiple_devices_vsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0000810Z test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001022Z test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001252Z test_ops.py::TestCommonCUDA::test_multiple_devices_xlogy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001465Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:10:17.0001633Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___getitem___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0001814Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rand___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0001992Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rmul___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002171Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rxor___cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002373Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_abs_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002549Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acos_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002727Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_add_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0002907Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_aminmax_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003067Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_angle_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003284Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_argwhere_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003531Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_partial_views_cuda_bool SKIPPED (Modifies input strides and storage_offset) [ 9%] 2023-01-11T23:10:17.0003727Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0003940Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asin_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004130Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan2_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004310Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atan_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004495Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_2d_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004674Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bfloat16_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0004841Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_and_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005026Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_not_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005210Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_xor_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005400Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_block_diag_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005591Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_to_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005788Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_byte_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0005969Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cdouble_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006155Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_min_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006373Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clone_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006537Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_column_stack_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006724Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_combinations_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0006924Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_physical_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007121Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_constant_pad_nd_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007309Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_contiguous_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007531Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_copysign_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007711Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cos_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0007898Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_count_nonzero_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008060Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_deg2rad_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008255Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_copy_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008438Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diff_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008625Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_digamma_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0008826Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_div_no_rounding_mode_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009011Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_double_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009207Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dsplit_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009407Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_like_cuda_bool SKIPPED (Skipped!) 
[ 9%] 2023-01-11T23:10:17.0009627Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eq_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009789Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_equal_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0009966Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010164Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_as_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010400Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expm1_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010575Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_eye_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010760Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft2_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0010952Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011138Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011299Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011484Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011664Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0011857Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012035Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012218Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012397Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfftn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012595Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flip_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012807Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fliplr_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0012968Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013146Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmin_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013321Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013507Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_full_like_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013688Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gather_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0013896Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_gt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014075Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_heaviside_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014265Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hstack_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014420Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_i0_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014714Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0014897Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isclose_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015084Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isfinite_cuda_bool 
PASSED [ 9%] 2023-01-11T23:10:17.0015261Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isinf_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015447Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isnan_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015626Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isreal_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0015831Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_return_by_ref_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016078Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_unary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016236Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_kron_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016421Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_and_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016602Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logsumexp_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0016824Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_long_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017006Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017195Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_mean_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017381Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017574Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017746Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_select_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0017946Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018141Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_no_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018343Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_list_of_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018555Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_meshgrid_variadic_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018742Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0018938Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_no_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019140Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_min_reduction_with_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019351Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mode_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019513Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nan_to_num_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019711Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nansum_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0019898Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ne_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020183Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_strided_cuda_bool SKIPPED (Expected: new_empty_strided is not comparable) [ 9%] 2023-01-11T23:10:17.0020399Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_cosine_embedding_loss_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020630Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_feature_alpha_dropout_without_train_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0020833Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_constant_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021039Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_unshuffle_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021221Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nonzero_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021430Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_like_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021592Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_outer_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021776Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_permute_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0021979Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_0_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022191Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_1_cuda_bool SKIPPED (Skipped!) [ 9%] 2023-01-11T23:10:17.0022370Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022550Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_put_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022761Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ravel_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0022974Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023151Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_repeat_interleave_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023354Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_as_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023535Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reshape_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023713Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize__cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0023900Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_conj_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024085Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resolve_neg_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024264Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_roll_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024447Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rot90_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024630Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rsqrt_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024797Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_add_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0024991Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0025196Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0025374Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sgn_cuda_bool PASSED [ 9%] 2023-01-11T23:10:17.0051635Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_short_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0051862Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sigmoid_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052032Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sin_cuda_bool PASSED [ 
10%] 2023-01-11T23:10:17.0052204Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052389Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052659Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_y1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0052854Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_t_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053271Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_v_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0053451Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_entr_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053645Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_h_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0053837Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_hermite_polynomial_he_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054011Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i0e_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054182Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0054371Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_laguerre_polynomial_l_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055103Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_legendre_polynomial_p_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0055295Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055478Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k1_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055648Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtri_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0055969Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_polygamma_special_polygamma_n_0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0056167Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0056544Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_v_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0056912Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_w_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:10:17.0057099Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_spherical_bessel_j0_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057262Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057426Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sqrt_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057585Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_stack_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057748Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0057918Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tensor_split_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058082Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_trace_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058241Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_triu_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058409Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_true_divide_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058570Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unbind_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058734Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_as_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0058899Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059111Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vsplit_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059274Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zero__cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059456Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_like_cuda_bool PASSED [ 10%] 2023-01-11T23:10:17.0059620Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0059783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0059943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060119Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0060313Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060502Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___getitem___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0060672Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0060838Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061000Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmod___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0061162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___ror___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0061331Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0061493Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061654Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0061836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rxor___cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0062027Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples__native_batch_norm_legit_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_complex64 PASSED [ 10%] 2023-01-11T23:10:17.0062355Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062515Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_abs_cuda_int64 PASSED [ 10%] 2023-01-11T23:10:17.0062677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_float32 PASSED [ 10%] 2023-01-11T23:10:17.0062845Z 
2023-01-11T23:10:17.0062845Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063008Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0063169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0063334Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addbmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0063671Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0063999Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0064184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0064346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0064512Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0064673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0064836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0065069Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065238Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0065558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0065716Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_any_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0065877Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0066043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0066202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0066373Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0066547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_complex64 XFAIL [ 10%]
2023-01-11T23:10:17.0066709Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_int64 XFAIL [ 10%]
2023-01-11T23:10:17.0066893Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_float32 XFAIL [ 10%]
2023-01-11T23:10:17.0067074Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_int64 XFAIL [ 10%]
2023-01-11T23:10:17.0067299Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_int64 SKIPPED (Works for int64, fails for everything else) [ 10%]
2023-01-11T23:10:17.0067491Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0067659Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0067824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0067987Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0068145Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asinh_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0068310Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0068474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0068635Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0068881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069053Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0069213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0069549Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0069715Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0069874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0070200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070363Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bincount_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070526Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_and_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070723Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_left_shift_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0070889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0071052Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0071212Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0071369Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bmm_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0071530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0071688Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0071873Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072035Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0072201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0072360Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072520Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0072697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0072868Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073030Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0073219Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0073383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0073699Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ceil_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0073864Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074028Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0074346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chalf_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0074504Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0074673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0074847Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0075021Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_inverse_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075191Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075355Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0075515Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_chunk_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0075834Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0075994Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076155Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076335Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0076495Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0076651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clone_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0076821Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0076985Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0077156Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0077333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0077506Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0077674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_physical_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0077848Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0078188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0078356Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078521Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_contiguous_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0078684Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0078873Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0079036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_float32 PASSED [ 10%]
2023-01-11T23:10:17.0079205Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0079369Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_complex64 PASSED [ 10%]
2023-01-11T23:10:17.0079529Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_int64 PASSED [ 10%]
2023-01-11T23:10:17.0079692Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0079855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080025Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0080193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080371Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0080587Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0080776Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0080955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumulative_trapezoid_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0081125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0081290Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0081468Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0081641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0081813Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_copy_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0082004Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0082333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082497Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0082662Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0082842Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_trunc_rounding_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0083186Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083348Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083518Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0083676Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_einsum_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0083859Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084045Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084225Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_int64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0084389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0084573Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0084731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eq_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0084900Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0085058Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0085376Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0085712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0085874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0086056Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_complex64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0086228Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_int64 SKIPPED (Skipped!) [ 11%]
2023-01-11T23:10:17.0086395Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0086554Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0086717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0086878Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0087212Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087385Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0087548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0087740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0087899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088068Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0088233Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0088555Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0088724Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0088898Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0089067Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089245Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0089409Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089572Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0089737Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0089898Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0090265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0090432Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0090596Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0090975Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft2_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0091301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091463Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfft_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0091628Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0091791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0091955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fill_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0092123Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0092284Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0092447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0092617Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0092778Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0092940Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flipud_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093106Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093282Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0093474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0093641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_power_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093803Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0093972Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0094131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0094293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frac_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094457Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_frexp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094743Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0094902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0095074Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0095235Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0095391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0095547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ge_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0095714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0095874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0096084Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0096246Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096579Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0096740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_histc_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0096909Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0097072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hsplit_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0097395Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0097719Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hypot_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0097880Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igammac_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_imag_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0098374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0098539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0098701Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_add_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0098879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0099048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0099244Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_copy_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0099409Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0099577Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0099751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0099917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0100459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_int_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0100627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0100791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0100949Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0101109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isin_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0101434Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101596Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0101757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0101950Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102116Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isneginf_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0102281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102442Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0102602Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0102766Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_istft_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0102955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0103140Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0103339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0103539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0103729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0103913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_return_by_ref_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104091Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0104264Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104429Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0104595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0104754Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lcm_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0104951Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0105117Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_int64 PASSED [ 11%]
2023-01-11T23:10:17.0105280Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0105442Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0105616Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0105784Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_float32 PASSED [ 11%]
2023-01-11T23:10:17.0106011Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_complex64 SKIPPED (The backward may give different results) [ 11%]
2023-01-11T23:10:17.0106187Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0106361Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_complex64 PASSED [ 11%]
2023-01-11T23:10:17.0106531Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigh_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0106710Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0106889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107061Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvalsh_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107443Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_householder_product_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107641Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0107812Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0107988Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0108155Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_inv_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108508Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0108749Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0108925Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109099Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0109807Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0109988Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110161Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0110524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0110733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0110915Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_complex64 SKIPPED (67470!) [ 12%]
2023-01-11T23:10:17.0111490Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0111683Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0111855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0112041Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112208Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0112731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_ex_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0112919Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0113287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svdvals_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0113634Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0113990Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorsolve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0114162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114330Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114509Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0114687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vector_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0114858Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0115020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115182Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0115343Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115501Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log2_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0115660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0115836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0116007Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp2_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116178Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logcumsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116363Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116528Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0116693Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0116865Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117029Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0117201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117368Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0117538Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0117704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0117864Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0118032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118343Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118510Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0118673Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_solve_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0118866Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0119030Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0119193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119357Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0119700Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmax_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0119872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0120050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0120223Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0120403Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0120601Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0120791Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumsum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0120963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0121134Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0121302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_fill_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0121480Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0121648Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0121818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_median_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0122199Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0122366Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122540Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0122710Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0122879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123045Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0123218Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmax_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0123561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0123727Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_sum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0123896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124062Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0124230Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124393Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matmul_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0124589Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0124757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_matrix_exp_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0124922Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125117Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0125291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125472Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125640Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0125804Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_maximum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0125969Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0126135Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0126301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_median_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126672Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0126850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127028Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_no_dim_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0127210Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127545Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_minimum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0127728Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0127892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128052Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mode_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128375Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0128537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_msort_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128699Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0128865Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0129048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0129229Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0129399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129565Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmean_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0129900Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanmedian_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0130078Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nanquantile_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130298Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130472Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nansum_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0130650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0130818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0130986Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_copy_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0131154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0131314Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0131474Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_narrow_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0131655Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_batch_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0131843Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_dropout_backward_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0132023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_native_layer_norm_cuda_float32 PASSED [ 12%]
2023-01-11T23:10:17.0132184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0132349Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0132533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_float32 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:10:17.0132711Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_int64 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:10:17.0132946Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:10:17.0133179Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_int64 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:10:17.0133372Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0133536Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0133707Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_complex64 PASSED [ 12%]
2023-01-11T23:10:17.0133871Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_int64 PASSED [ 12%]
2023-01-11T23:10:17.0134038Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0134202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0134422Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:10:17.0134726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0134921Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135296Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135667Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0135854Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136261Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_avg_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136435Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0136815Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_bilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137197Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_celu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0137558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137739Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0137932Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138115Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0138309Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0138691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0138879Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139098Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_similarity_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cross_entropy_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139467Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_ctc_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0139836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140024Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_bag_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140243Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_embedding_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0140470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 13%]
2023-01-11T23:10:17.0140677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0140870Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141064Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0141431Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_gelu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143160Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardswish_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_int64 PASSED [ 13%]
2023-01-11T23:10:17.0143521Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_huber_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143706Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_instance_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0143896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_area_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144279Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_layer_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_leaky_relu_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0144830Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_local_response_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool3d_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 13%]
2023-01-11T23:10:17.0145756Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0145941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146370Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0146547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_one_hot_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0146736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0146918Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0147100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_circular_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0147291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0147476Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0147658Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_constant_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0147853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_replicate_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0148236Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0148637Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0148892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149085Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0149454Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_prelu_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_silu_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0149820Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150012Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150198Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0150396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0150612Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0150821Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151014Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0151200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0151394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151607Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0151818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0152002Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_unfold_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152190Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152362Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0152531Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0152696Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nonzero_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0152871Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_fro_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0153704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0153876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0154210Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154436Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0154602Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0154768Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_outer_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0154941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0155100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0155266Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polar_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0155452Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_float32 PASSED [ 13%] 
2023-01-11T23:10:17.0155652Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0155855Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_3_cuda_int64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156246Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_int64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0156419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0156587Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0156748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0156912Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0157253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157414Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_put_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0157606Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_quantile_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157775Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0157945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0158172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_float32 SKIPPED (Test expects tensor input) [ 13%] 2023-01-11T23:10:17.0158337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0158537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_complex64 SKIPPED (Test expects tensor input) [ 13%] 2023-01-11T23:10:17.0158714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0158885Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_like_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159054Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0159221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159564Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0159732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_remainder_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0159893Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0160082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0160289Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0160494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_int64 PASSED [ 13%] 
2023-01-11T23:10:17.0160669Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0160838Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0161001Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_as_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0161166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0161325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0161496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0161669Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0161847Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0162193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0162533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0162697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0162853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_0_cuda_float32 PASSED [ 13%] 2023-01-11T23:10:17.0163415Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0163588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0163750Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0163917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_complex64 PASSED [ 13%] 2023-01-11T23:10:17.0164079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_int64 PASSED [ 13%] 2023-01-11T23:10:17.0164265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0164457Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_float32 SKIPPED (Skipped!) [ 13%] 2023-01-11T23:10:17.0164643Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scalar_tensor_cuda_int64 SKIPPED (Skipped!) 
[ 14%] 2023-01-11T23:10:17.0164816Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0164986Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0165321Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0165876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166056Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_prod_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166229Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166417Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_offsets_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0166748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0166925Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167251Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0167416Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0167581Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0167742Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_short_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0167941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_bartlett_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_blackman_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168333Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_cosine_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168532Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_gaussian_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hann_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0168923Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_kaiser_cuda_float32 SKIPPED (Skipped!) [ 14%] 2023-01-11T23:10:17.0169136Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_nuttall_cuda_float32 SKIPPED (Skipped!) 
[ 14%] 2023-01-11T23:10:17.0169304Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0169469Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signbit_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0169632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0169794Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0169962Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0170131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinc_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0170468Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170655Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0170836Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0170998Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0171165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171328Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171490Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sort_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0171698Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0171872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_airy_ai_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0172044Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0172221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172393Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0172783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0173162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0173527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0173704Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0173878Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_erfcx_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0174239Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174431Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_he_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0174717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0174889Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1e_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175119Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175300Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0175476Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_log_ndtr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175664Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0175845Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176019Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176194Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtr_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0176367Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0176769Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0176963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0177150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0177528Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0177928Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_t_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0178292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0178660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_u_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0179020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%] 2023-01-11T23:10:17.0179213Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0179408Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_spherical_bessel_j0_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0179574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0179742Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_int64 PASSED [ 
14%] 2023-01-11T23:10:17.0179924Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0180100Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0180284Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0180456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0180629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_with_sizes_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0180794Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0180969Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0181165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0181329Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_square_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0181496Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0181659Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0181831Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0181992Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0182178Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182360Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_mean_unbiased_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0182537Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182706Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0182874Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0183036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0183200Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0183365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0183533Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0183721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0183891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0184060Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_symeig_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0184224Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0184389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0184548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_t_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0184726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_complex64 PASSED [ 14%] 
2023-01-11T23:10:17.0184891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0185064Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_along_dim_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185233Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0185399Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185569Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0185733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0185907Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186082Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0186240Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186403Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0186574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_float32 SKIPPED [ 14%] 2023-01-11T23:10:17.0186751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0186957Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0187131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0187300Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapezoid_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0187642Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0187962Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trunc_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0188460Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0188632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0188872Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189044Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0189201Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189367Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0189534Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0189731Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0189905Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190073Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0190256Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190600Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0190765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_complex_cuda_float32 XFAIL [ 14%] 2023-01-11T23:10:17.0190936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191105Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191266Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0191435Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191598Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0191767Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0191931Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_int64 PASSED [ 14%] 2023-01-11T23:10:17.0192087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0192250Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_float32 PASSED [ 14%] 2023-01-11T23:10:17.0192422Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_complex64 PASSED [ 14%] 2023-01-11T23:10:17.0192583Z test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_complex128 PASSED [ 14%] 2023-01-11T23:10:17.0192771Z test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_complex128 PASSED [ 14%] 2023-01-11T23:10:17.0192927Z test_ops.py::TestCommonCUDA::test_numpy_ref_allclose_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193083Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193236Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_float64 PASSED [ 14%] 2023-01-11T23:10:17.0193398Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0193541Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0193688Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0193836Z test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0193990Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_float64 XFAIL [ 15%] 2023-01-11T23:10:17.0194138Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0194299Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0194450Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0194597Z test_ops.py::TestCommonCUDA::test_numpy_ref_diagflat_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0194742Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_complex128 PASSED [ 15%] 
2023-01-11T23:10:17.0194888Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0195033Z test_ops.py::TestCommonCUDA::test_numpy_ref_diff_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195181Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0195353Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0195500Z test_ops.py::TestCommonCUDA::test_numpy_ref_equal_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195652Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0195831Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0195998Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0196182Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0196364Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0196542Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_4inputs_with_extra_args_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0196708Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0196873Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197040Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorinv_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0197202Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197361Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0197531Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0197704Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0197864Z test_ops.py::TestCommonCUDA::test_numpy_ref_native_layer_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198047Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0198231Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose1d_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198433Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_complex128 XFAIL [ 15%] 2023-01-11T23:10:17.0198607Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_group_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198772Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0198933Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_layer_norm_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199100Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_mse_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199264Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pdist_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199437Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_smooth_l1_loss_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199599Z test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0199751Z 
test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0199906Z test_ops.py::TestCommonCUDA::test_numpy_ref_permute_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0200060Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0200246Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0200412Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0200573Z test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0200731Z test_ops.py::TestCommonCUDA::test_numpy_ref_searchsorted_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0200901Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_blackman_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201093Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_cosine_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201269Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_exponential_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201450Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_cosine_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201627Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_general_hamming_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201788Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hamming_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0201952Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202106Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0202257Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202404Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0202569Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0202727Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0202883Z test_ops.py::TestCommonCUDA::test_numpy_ref_triu_indices_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203033Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_complex128 PASSED [ 15%] 2023-01-11T23:10:17.0203183Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_float64 PASSED [ 15%] 2023-01-11T23:10:17.0203333Z test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203483Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0203625Z test_ops.py::TestCommonCUDA::test_out_T_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0203768Z test_ops.py::TestCommonCUDA::test_out___radd___cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0203914Z test_ops.py::TestCommonCUDA::test_out___rand___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204058Z test_ops.py::TestCommonCUDA::test_out___rdiv___cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204216Z test_ops.py::TestCommonCUDA::test_out___ror___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204359Z test_ops.py::TestCommonCUDA::test_out___rxor___cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0204521Z test_ops.py::TestCommonCUDA::test_out__native_batch_norm_legit_cuda_float32 XFAIL [ 15%] 2023-01-11T23:10:17.0204665Z test_ops.py::TestCommonCUDA::test_out__refs_T_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204832Z 
test_ops.py::TestCommonCUDA::test_out__refs__conversions_bfloat16_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0204994Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_byte_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205160Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_cdouble_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205328Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_cfloat_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205490Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_chalf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205647Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_char_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205816Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_complex_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0205978Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_float_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206142Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_int_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206289Z test_ops.py::TestCommonCUDA::test_out__refs_abs_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206437Z test_ops.py::TestCommonCUDA::test_out__refs_acosh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206670Z test_ops.py::TestCommonCUDA::test_out__refs_allclose_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 15%] 2023-01-11T23:10:17.0206843Z test_ops.py::TestCommonCUDA::test_out__refs_amax_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0206987Z test_ops.py::TestCommonCUDA::test_out__refs_any_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207137Z test_ops.py::TestCommonCUDA::test_out__refs_arange_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207285Z test_ops.py::TestCommonCUDA::test_out__refs_asinh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207429Z test_ops.py::TestCommonCUDA::test_out__refs_atan_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207576Z test_ops.py::TestCommonCUDA::test_out__refs_atanh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207731Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_2d_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0207884Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_3d_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208036Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_and_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208191Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_left_shift_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208353Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_right_shift_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208506Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_xor_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0208667Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_shapes_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208827Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_tensors_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0208984Z test_ops.py::TestCommonCUDA::test_out__refs_broadcast_to_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209137Z test_ops.py::TestCommonCUDA::test_out__refs_bucketize_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209285Z test_ops.py::TestCommonCUDA::test_out__refs_chunk_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209437Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_max_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209592Z test_ops.py::TestCommonCUDA::test_out__refs_column_stack_cuda_float32 PASSED [ 15%] 
2023-01-11T23:10:17.0209753Z test_ops.py::TestCommonCUDA::test_out__refs_constant_pad_nd_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0209931Z test_ops.py::TestCommonCUDA::test_out__refs_contiguous_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210081Z test_ops.py::TestCommonCUDA::test_out__refs_cosh_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210231Z test_ops.py::TestCommonCUDA::test_out__refs_diag_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210384Z test_ops.py::TestCommonCUDA::test_out__refs_diag_embed_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210538Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210705Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_scatter_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0210867Z test_ops.py::TestCommonCUDA::test_out__refs_digamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211055Z test_ops.py::TestCommonCUDA::test_out__refs_div_no_rounding_mode_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211203Z test_ops.py::TestCommonCUDA::test_out__refs_dstack_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211398Z test_ops.py::TestCommonCUDA::test_out__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%] 2023-01-11T23:10:17.0211595Z test_ops.py::TestCommonCUDA::test_out__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%] 2023-01-11T23:10:17.0211741Z test_ops.py::TestCommonCUDA::test_out__refs_eq_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0211888Z test_ops.py::TestCommonCUDA::test_out__refs_erf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212037Z test_ops.py::TestCommonCUDA::test_out__refs_expand_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212177Z test_ops.py::TestCommonCUDA::test_out__refs_expm1_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212346Z test_ops.py::TestCommonCUDA::test_out__refs_eye_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212494Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212644Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212799Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0212949Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft2_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213095Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213247Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213400Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ifftshift_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213556Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213705Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0213862Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214011Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214162Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfftn_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214309Z test_ops.py::TestCommonCUDA::test_out__refs_fill_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214458Z test_ops.py::TestCommonCUDA::test_out__refs_flipud_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214719Z 
test_ops.py::TestCommonCUDA::test_out__refs_float_power_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0214859Z test_ops.py::TestCommonCUDA::test_out__refs_floor_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215017Z test_ops.py::TestCommonCUDA::test_out__refs_floor_divide_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215163Z test_ops.py::TestCommonCUDA::test_out__refs_fmax_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215313Z test_ops.py::TestCommonCUDA::test_out__refs_fmin_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215456Z test_ops.py::TestCommonCUDA::test_out__refs_gcd_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0215637Z test_ops.py::TestCommonCUDA::test_out__refs_gt_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215796Z test_ops.py::TestCommonCUDA::test_out__refs_hsplit_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0215941Z test_ops.py::TestCommonCUDA::test_out__refs_hstack_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216080Z test_ops.py::TestCommonCUDA::test_out__refs_hypot_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216222Z test_ops.py::TestCommonCUDA::test_out__refs_i0_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216368Z test_ops.py::TestCommonCUDA::test_out__refs_igamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216515Z test_ops.py::TestCommonCUDA::test_out__refs_imag_cuda_complex64 PASSED [ 15%] 2023-01-11T23:10:17.0216671Z test_ops.py::TestCommonCUDA::test_out__refs_index_add_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216832Z test_ops.py::TestCommonCUDA::test_out__refs_index_select_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0216981Z test_ops.py::TestCommonCUDA::test_out__refs_isinf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217129Z test_ops.py::TestCommonCUDA::test_out__refs_isneginf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217271Z test_ops.py::TestCommonCUDA::test_out__refs_isposinf_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217417Z test_ops.py::TestCommonCUDA::test_out__refs_isreal_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217560Z test_ops.py::TestCommonCUDA::test_out__refs_lcm_cuda_int64 PASSED [ 15%] 2023-01-11T23:10:17.0217705Z test_ops.py::TestCommonCUDA::test_out__refs_lgamma_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0217868Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_matrix_norm_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218021Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_svd_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218213Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_vector_norm_cuda_float32 PASSED [ 15%] 2023-01-11T23:10:17.0218362Z test_ops.py::TestCommonCUDA::test_out__refs_linspace_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218514Z test_ops.py::TestCommonCUDA::test_out__refs_logical_not_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218660Z test_ops.py::TestCommonCUDA::test_out__refs_lt_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218805Z test_ops.py::TestCommonCUDA::test_out__refs_mean_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0218976Z test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219149Z test_ops.py::TestCommonCUDA::test_out__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219298Z test_ops.py::TestCommonCUDA::test_out__refs_minimum_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219442Z test_ops.py::TestCommonCUDA::test_out__refs_mul_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219617Z 
test_ops.py::TestCommonCUDA::test_out__refs_narrow_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219772Z test_ops.py::TestCommonCUDA::test_out__refs_native_layer_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0219921Z test_ops.py::TestCommonCUDA::test_out__refs_neg_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220093Z test_ops.py::TestCommonCUDA::test_out__refs_new_full_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220263Z test_ops.py::TestCommonCUDA::test_out__refs_new_ones_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220417Z test_ops.py::TestCommonCUDA::test_out__refs_nextafter_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220592Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_alpha_dropout_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220755Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_celu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0220920Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_elu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221085Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_gelu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221264Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_glu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221438Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221605Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221769Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0221935Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222097Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mish_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222263Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222430Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_prelu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222584Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_selu_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222765Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0222944Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223112Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softplus_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223278Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223457Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223605Z test_ops.py::TestCommonCUDA::test_out__refs_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223776Z test_ops.py::TestCommonCUDA::test_out__refs_ones_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0223927Z test_ops.py::TestCommonCUDA::test_out__refs_permute_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224071Z test_ops.py::TestCommonCUDA::test_out__refs_real_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224225Z test_ops.py::TestCommonCUDA::test_out__refs_remainder_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224373Z test_ops.py::TestCommonCUDA::test_out__refs_repeat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224528Z 
test_ops.py::TestCommonCUDA::test_out__refs_reshape_as_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224680Z test_ops.py::TestCommonCUDA::test_out__refs_reshape_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224824Z test_ops.py::TestCommonCUDA::test_out__refs_rot90_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0224968Z test_ops.py::TestCommonCUDA::test_out__refs_rsub_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225116Z test_ops.py::TestCommonCUDA::test_out__refs_signbit_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225256Z test_ops.py::TestCommonCUDA::test_out__refs_sin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225414Z test_ops.py::TestCommonCUDA::test_out__refs_sinc_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225631Z test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j0_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0225844Z test_ops.py::TestCommonCUDA::test_out__refs_special_entr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226047Z test_ops.py::TestCommonCUDA::test_out__refs_special_i0e_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226217Z test_ops.py::TestCommonCUDA::test_out__refs_special_i1e_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226375Z test_ops.py::TestCommonCUDA::test_out__refs_special_logit_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226556Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226714Z test_ops.py::TestCommonCUDA::test_out__refs_special_ndtri_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0226887Z test_ops.py::TestCommonCUDA::test_out__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227095Z test_ops.py::TestCommonCUDA::test_out__refs_special_xlog1py_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227253Z test_ops.py::TestCommonCUDA::test_out__refs_special_zeta_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227402Z test_ops.py::TestCommonCUDA::test_out__refs_square_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227551Z test_ops.py::TestCommonCUDA::test_out__refs_squeeze_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227698Z test_ops.py::TestCommonCUDA::test_out__refs_std_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227842Z test_ops.py::TestCommonCUDA::test_out__refs_sub_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0227990Z test_ops.py::TestCommonCUDA::test_out__refs_sum_to_size_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228133Z test_ops.py::TestCommonCUDA::test_out__refs_t_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228280Z test_ops.py::TestCommonCUDA::test_out__refs_tan_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228436Z test_ops.py::TestCommonCUDA::test_out__refs_tensor_split_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228595Z test_ops.py::TestCommonCUDA::test_out__refs_transpose_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228828Z test_ops.py::TestCommonCUDA::test_out__refs_tril_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0228978Z test_ops.py::TestCommonCUDA::test_out__refs_unbind_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229131Z test_ops.py::TestCommonCUDA::test_out__refs_unflatten_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229285Z test_ops.py::TestCommonCUDA::test_out__refs_unfold_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229426Z test_ops.py::TestCommonCUDA::test_out__refs_unfold_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229579Z test_ops.py::TestCommonCUDA::test_out__refs_unsqueeze_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229758Z 
test_ops.py::TestCommonCUDA::test_out__refs_view_as_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0229907Z test_ops.py::TestCommonCUDA::test_out__refs_vstack_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230058Z test_ops.py::TestCommonCUDA::test_out__refs_zeros_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230204Z test_ops.py::TestCommonCUDA::test_out_abs_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230350Z test_ops.py::TestCommonCUDA::test_out_addcmul_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230496Z test_ops.py::TestCommonCUDA::test_out_addmv_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230635Z test_ops.py::TestCommonCUDA::test_out_addr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0230864Z test_ops.py::TestCommonCUDA::test_out_allclose_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 16%] 2023-01-11T23:10:17.0231008Z test_ops.py::TestCommonCUDA::test_out_amax_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231154Z test_ops.py::TestCommonCUDA::test_out_any_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231298Z test_ops.py::TestCommonCUDA::test_out_arange_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231446Z test_ops.py::TestCommonCUDA::test_out_argmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231592Z test_ops.py::TestCommonCUDA::test_out_argwhere_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231736Z test_ops.py::TestCommonCUDA::test_out_asinh_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0231873Z test_ops.py::TestCommonCUDA::test_out_atan_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232019Z test_ops.py::TestCommonCUDA::test_out_bitwise_and_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232164Z test_ops.py::TestCommonCUDA::test_out_bitwise_or_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232321Z test_ops.py::TestCommonCUDA::test_out_bitwise_right_shift_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232471Z test_ops.py::TestCommonCUDA::test_out_bitwise_xor_cuda_int64 PASSED [ 16%] 2023-01-11T23:10:17.0232617Z test_ops.py::TestCommonCUDA::test_out_block_diag_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232784Z test_ops.py::TestCommonCUDA::test_out_bmm_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0232934Z test_ops.py::TestCommonCUDA::test_out_broadcast_to_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233071Z test_ops.py::TestCommonCUDA::test_out_byte_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233221Z test_ops.py::TestCommonCUDA::test_out_cartesian_prod_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233364Z test_ops.py::TestCommonCUDA::test_out_cat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233508Z test_ops.py::TestCommonCUDA::test_out_cdist_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233653Z test_ops.py::TestCommonCUDA::test_out_cdouble_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233800Z test_ops.py::TestCommonCUDA::test_out_ceil_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0233944Z test_ops.py::TestCommonCUDA::test_out_cfloat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234092Z test_ops.py::TestCommonCUDA::test_out_chalf_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234230Z test_ops.py::TestCommonCUDA::test_out_char_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234379Z test_ops.py::TestCommonCUDA::test_out_cholesky_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234532Z test_ops.py::TestCommonCUDA::test_out_cholesky_inverse_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0234677Z test_ops.py::TestCommonCUDA::test_out_chunk_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234822Z 
test_ops.py::TestCommonCUDA::test_out_clamp_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0234972Z test_ops.py::TestCommonCUDA::test_out_clamp_max_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235120Z test_ops.py::TestCommonCUDA::test_out_clamp_min_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235290Z test_ops.py::TestCommonCUDA::test_out_clone_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235434Z test_ops.py::TestCommonCUDA::test_out_combinations_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235584Z test_ops.py::TestCommonCUDA::test_out_complex_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235738Z test_ops.py::TestCommonCUDA::test_out_constant_pad_nd_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0235887Z test_ops.py::TestCommonCUDA::test_out_contiguous_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236036Z test_ops.py::TestCommonCUDA::test_out_copysign_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236180Z test_ops.py::TestCommonCUDA::test_out_cos_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236327Z test_ops.py::TestCommonCUDA::test_out_cross_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236474Z test_ops.py::TestCommonCUDA::test_out_cummin_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236613Z test_ops.py::TestCommonCUDA::test_out_cumsum_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0236757Z test_ops.py::TestCommonCUDA::test_out_diag_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0236908Z test_ops.py::TestCommonCUDA::test_out_diagflat_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237059Z test_ops.py::TestCommonCUDA::test_out_diagonal_copy_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237203Z test_ops.py::TestCommonCUDA::test_out_diff_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237345Z test_ops.py::TestCommonCUDA::test_out_dist_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237509Z test_ops.py::TestCommonCUDA::test_out_div_no_rounding_mode_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237669Z test_ops.py::TestCommonCUDA::test_out_div_trunc_rounding_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237813Z test_ops.py::TestCommonCUDA::test_out_dot_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0237955Z test_ops.py::TestCommonCUDA::test_out_dstack_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238144Z test_ops.py::TestCommonCUDA::test_out_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 16%] 2023-01-11T23:10:17.0238309Z test_ops.py::TestCommonCUDA::test_out_eq_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238453Z test_ops.py::TestCommonCUDA::test_out_erf_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238599Z test_ops.py::TestCommonCUDA::test_out_erfc_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238742Z test_ops.py::TestCommonCUDA::test_out_exp2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0238886Z test_ops.py::TestCommonCUDA::test_out_expm1_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239029Z test_ops.py::TestCommonCUDA::test_out_eye_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239168Z test_ops.py::TestCommonCUDA::test_out_fft_fft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239315Z test_ops.py::TestCommonCUDA::test_out_fft_fft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239465Z test_ops.py::TestCommonCUDA::test_out_fft_fftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239611Z test_ops.py::TestCommonCUDA::test_out_fft_hfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239759Z test_ops.py::TestCommonCUDA::test_out_fft_hfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0239906Z 
test_ops.py::TestCommonCUDA::test_out_fft_hfftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240051Z test_ops.py::TestCommonCUDA::test_out_fft_ifft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240222Z test_ops.py::TestCommonCUDA::test_out_fft_ifft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240384Z test_ops.py::TestCommonCUDA::test_out_fft_ifftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240537Z test_ops.py::TestCommonCUDA::test_out_fft_ifftshift_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0240683Z test_ops.py::TestCommonCUDA::test_out_fft_ihfft2_cuda_float32 XFAIL [ 16%] 2023-01-11T23:10:17.0240854Z test_ops.py::TestCommonCUDA::test_out_fft_ihfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241001Z test_ops.py::TestCommonCUDA::test_out_fft_irfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241149Z test_ops.py::TestCommonCUDA::test_out_fft_irfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241295Z test_ops.py::TestCommonCUDA::test_out_fft_irfftn_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241442Z test_ops.py::TestCommonCUDA::test_out_fft_rfft2_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241581Z test_ops.py::TestCommonCUDA::test_out_fft_rfft_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241726Z test_ops.py::TestCommonCUDA::test_out_fill_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0241873Z test_ops.py::TestCommonCUDA::test_out_flatten_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242016Z test_ops.py::TestCommonCUDA::test_out_flip_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242166Z test_ops.py::TestCommonCUDA::test_out_fliplr_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242310Z test_ops.py::TestCommonCUDA::test_out_flipud_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242457Z test_ops.py::TestCommonCUDA::test_out_float_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242603Z test_ops.py::TestCommonCUDA::test_out_float_power_cuda_float32 PASSED [ 16%] 2023-01-11T23:10:17.0242741Z test_ops.py::TestCommonCUDA::test_out_fmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0242885Z test_ops.py::TestCommonCUDA::test_out_fmod_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243031Z test_ops.py::TestCommonCUDA::test_out_gather_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243175Z test_ops.py::TestCommonCUDA::test_out_geqrf_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0243331Z test_ops.py::TestCommonCUDA::test_out_grid_sampler_2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243475Z test_ops.py::TestCommonCUDA::test_out_half_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243626Z test_ops.py::TestCommonCUDA::test_out_heaviside_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0243768Z test_ops.py::TestCommonCUDA::test_out_histc_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0243966Z test_ops.py::TestCommonCUDA::test_out_hstack_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244112Z test_ops.py::TestCommonCUDA::test_out_hypot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244262Z test_ops.py::TestCommonCUDA::test_out_index_reduce_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244407Z test_ops.py::TestCommonCUDA::test_out_int_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244554Z test_ops.py::TestCommonCUDA::test_out_isclose_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244701Z test_ops.py::TestCommonCUDA::test_out_isfinite_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244845Z test_ops.py::TestCommonCUDA::test_out_isin_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0244992Z test_ops.py::TestCommonCUDA::test_out_isnan_cuda_float32 PASSED [ 17%] 
2023-01-11T23:10:17.0245135Z test_ops.py::TestCommonCUDA::test_out_isposinf_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245282Z test_ops.py::TestCommonCUDA::test_out_isreal_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245452Z test_ops.py::TestCommonCUDA::test_out_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245630Z test_ops.py::TestCommonCUDA::test_out_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245802Z test_ops.py::TestCommonCUDA::test_out_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0245958Z test_ops.py::TestCommonCUDA::test_out_jiterator_unary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246101Z test_ops.py::TestCommonCUDA::test_out_le_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246248Z test_ops.py::TestCommonCUDA::test_out_lgamma_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246422Z test_ops.py::TestCommonCUDA::test_out_linalg_cross_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246563Z test_ops.py::TestCommonCUDA::test_out_linalg_det_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246729Z test_ops.py::TestCommonCUDA::test_out_linalg_det_singular_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0246900Z test_ops.py::TestCommonCUDA::test_out_linalg_householder_product_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247059Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247220Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_factor_ex_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247374Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_solve_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247534Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247689Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0247857Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248013Z test_ops.py::TestCommonCUDA::test_out_linalg_multi_dot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248181Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_hermitian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248406Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 17%] 2023-01-11T23:10:17.0248556Z test_ops.py::TestCommonCUDA::test_out_linalg_qr_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248707Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0248862Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_ex_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249020Z test_ops.py::TestCommonCUDA::test_out_linalg_tensorinv_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249165Z test_ops.py::TestCommonCUDA::test_out_linalg_vecdot_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249317Z test_ops.py::TestCommonCUDA::test_out_linspace_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249463Z test_ops.py::TestCommonCUDA::test_out_log2_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249636Z test_ops.py::TestCommonCUDA::test_out_logaddexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249791Z test_ops.py::TestCommonCUDA::test_out_logcumsumexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0249937Z test_ops.py::TestCommonCUDA::test_out_logdet_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250086Z 
test_ops.py::TestCommonCUDA::test_out_logical_or_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250235Z test_ops.py::TestCommonCUDA::test_out_logical_xor_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250380Z test_ops.py::TestCommonCUDA::test_out_logit_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250521Z test_ops.py::TestCommonCUDA::test_out_logspace_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250672Z test_ops.py::TestCommonCUDA::test_out_logsumexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250819Z test_ops.py::TestCommonCUDA::test_out_masked_amax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0250975Z test_ops.py::TestCommonCUDA::test_out_masked_cumsum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251125Z test_ops.py::TestCommonCUDA::test_out_masked_fill_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251287Z test_ops.py::TestCommonCUDA::test_out_masked_log_softmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251445Z test_ops.py::TestCommonCUDA::test_out_masked_logaddexp_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251597Z test_ops.py::TestCommonCUDA::test_out_masked_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251749Z test_ops.py::TestCommonCUDA::test_out_masked_normalize_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0251897Z test_ops.py::TestCommonCUDA::test_out_masked_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252073Z test_ops.py::TestCommonCUDA::test_out_masked_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252224Z test_ops.py::TestCommonCUDA::test_out_masked_std_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252374Z test_ops.py::TestCommonCUDA::test_out_max_binary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252540Z test_ops.py::TestCommonCUDA::test_out_max_reduction_no_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252705Z test_ops.py::TestCommonCUDA::test_out_max_reduction_with_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252851Z test_ops.py::TestCommonCUDA::test_out_maximum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0252992Z test_ops.py::TestCommonCUDA::test_out_median_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253158Z test_ops.py::TestCommonCUDA::test_out_meshgrid_list_of_tensors_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253306Z test_ops.py::TestCommonCUDA::test_out_min_binary_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253467Z test_ops.py::TestCommonCUDA::test_out_min_reduction_no_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253633Z test_ops.py::TestCommonCUDA::test_out_min_reduction_with_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253779Z test_ops.py::TestCommonCUDA::test_out_minimum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0253926Z test_ops.py::TestCommonCUDA::test_out_mode_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254078Z test_ops.py::TestCommonCUDA::test_out_movedim_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254216Z test_ops.py::TestCommonCUDA::test_out_msort_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254384Z test_ops.py::TestCommonCUDA::test_out_multinomial_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:10:17.0254631Z test_ops.py::TestCommonCUDA::test_out_mv_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254796Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0254939Z test_ops.py::TestCommonCUDA::test_out_nan_to_num_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255090Z test_ops.py::TestCommonCUDA::test_out_nanmedian_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255237Z test_ops.py::TestCommonCUDA::test_out_nansum_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255426Z test_ops.py::TestCommonCUDA::test_out_narrow_copy_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0255565Z test_ops.py::TestCommonCUDA::test_out_narrow_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0255722Z test_ops.py::TestCommonCUDA::test_out_native_batch_norm_cuda_float32 XFAIL [ 17%] 2023-01-11T23:10:17.0255878Z test_ops.py::TestCommonCUDA::test_out_native_layer_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256018Z test_ops.py::TestCommonCUDA::test_out_ne_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256159Z test_ops.py::TestCommonCUDA::test_out_neg_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256317Z test_ops.py::TestCommonCUDA::test_out_new_empty_strided_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256466Z test_ops.py::TestCommonCUDA::test_out_nextafter_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256655Z test_ops.py::TestCommonCUDA::test_out_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256835Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0256998Z test_ops.py::TestCommonCUDA::test_out_nn_functional_alpha_dropout_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257161Z test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257342Z test_ops.py::TestCommonCUDA::test_out_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257504Z test_ops.py::TestCommonCUDA::test_out_nn_functional_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257662Z test_ops.py::TestCommonCUDA::test_out_nn_functional_celu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0257858Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258031Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258207Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_similarity_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258366Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258529Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258720Z test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0258899Z test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259057Z test_ops.py::TestCommonCUDA::test_out_nn_functional_glu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259227Z test_ops.py::TestCommonCUDA::test_out_nn_functional_grid_sample_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259396Z test_ops.py::TestCommonCUDA::test_out_nn_functional_group_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259562Z 
test_ops.py::TestCommonCUDA::test_out_nn_functional_hardshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259730Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardswish_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0259888Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardtanh_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260062Z test_ops.py::TestCommonCUDA::test_out_nn_functional_huber_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260261Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_area_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260448Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260627Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260805Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_linear_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0260965Z test_ops.py::TestCommonCUDA::test_out_nn_functional_kl_div_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261153Z test_ops.py::TestCommonCUDA::test_out_nn_functional_l1_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261310Z test_ops.py::TestCommonCUDA::test_out_nn_functional_leaky_relu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261475Z test_ops.py::TestCommonCUDA::test_out_nn_functional_logsigmoid_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261651Z test_ops.py::TestCommonCUDA::test_out_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261815Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0261982Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262155Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262319Z test_ops.py::TestCommonCUDA::test_out_nn_functional_one_hot_cuda_int64 PASSED [ 17%] 2023-01-11T23:10:17.0262490Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_circular_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262658Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_reflect_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262822Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0262981Z test_ops.py::TestCommonCUDA::test_out_nn_functional_relu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263138Z test_ops.py::TestCommonCUDA::test_out_nn_functional_silu_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263305Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263482Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263671Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0263834Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softsign_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264013Z test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264179Z test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_nearest_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264329Z test_ops.py::TestCommonCUDA::test_out_norm_inf_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264478Z test_ops.py::TestCommonCUDA::test_out_norm_nuc_cuda_float32 PASSED [ 
17%] 2023-01-11T23:10:17.0264639Z test_ops.py::TestCommonCUDA::test_out_normal_number_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264789Z test_ops.py::TestCommonCUDA::test_out_ones_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0264959Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265122Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_var_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265279Z test_ops.py::TestCommonCUDA::test_out_ops_nvprims_view_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265423Z test_ops.py::TestCommonCUDA::test_out_outer_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265590Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0265770Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0265946Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266118Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266290Z test_ops.py::TestCommonCUDA::test_out_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:10:17.0266443Z test_ops.py::TestCommonCUDA::test_out_positive_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266592Z test_ops.py::TestCommonCUDA::test_out_qr_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266744Z test_ops.py::TestCommonCUDA::test_out_quantile_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0266929Z test_ops.py::TestCommonCUDA::test_out_randint_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267076Z test_ops.py::TestCommonCUDA::test_out_randn_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267224Z test_ops.py::TestCommonCUDA::test_out_randn_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267370Z test_ops.py::TestCommonCUDA::test_out_real_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267518Z test_ops.py::TestCommonCUDA::test_out_reciprocal_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267663Z test_ops.py::TestCommonCUDA::test_out_renorm_cuda_float32 PASSED [ 17%] 2023-01-11T23:10:17.0267809Z test_ops.py::TestCommonCUDA::test_out_repeat_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0267975Z test_ops.py::TestCommonCUDA::test_out_repeat_interleave_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268115Z test_ops.py::TestCommonCUDA::test_out_resize__cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268265Z test_ops.py::TestCommonCUDA::test_out_resize_as__cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268412Z test_ops.py::TestCommonCUDA::test_out_resolve_neg_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268556Z test_ops.py::TestCommonCUDA::test_out_round_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0268792Z test_ops.py::TestCommonCUDA::test_out_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 18%] 2023-01-11T23:10:17.0268988Z test_ops.py::TestCommonCUDA::test_out_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) 
[ 18%] 2023-01-11T23:10:17.0269134Z test_ops.py::TestCommonCUDA::test_out_rsqrt_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269284Z test_ops.py::TestCommonCUDA::test_out_scatter_add_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269456Z test_ops.py::TestCommonCUDA::test_out_scatter_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269618Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amax_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269781Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_sum_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0269935Z test_ops.py::TestCommonCUDA::test_out_searchsorted_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270103Z test_ops.py::TestCommonCUDA::test_out_segment_reduce_lengths_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270251Z test_ops.py::TestCommonCUDA::test_out_select_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270398Z test_ops.py::TestCommonCUDA::test_out_short_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270547Z test_ops.py::TestCommonCUDA::test_out_sigmoid_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270712Z test_ops.py::TestCommonCUDA::test_out_signal_windows_exponential_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0270879Z test_ops.py::TestCommonCUDA::test_out_signal_windows_hann_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271042Z test_ops.py::TestCommonCUDA::test_out_signal_windows_kaiser_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271192Z test_ops.py::TestCommonCUDA::test_out_signbit_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271337Z test_ops.py::TestCommonCUDA::test_out_sin_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271484Z test_ops.py::TestCommonCUDA::test_out_sinc_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271631Z test_ops.py::TestCommonCUDA::test_out_slice_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271775Z test_ops.py::TestCommonCUDA::test_out_sort_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0271932Z test_ops.py::TestCommonCUDA::test_out_special_airy_ai_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272087Z test_ops.py::TestCommonCUDA::test_out_special_bessel_j1_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272267Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272443Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0272827Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0272984Z test_ops.py::TestCommonCUDA::test_out_special_entr_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273159Z test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_h_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273310Z test_ops.py::TestCommonCUDA::test_out_special_i0e_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273487Z test_ops.py::TestCommonCUDA::test_out_special_laguerre_polynomial_l_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273650Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0273824Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_k1_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0274001Z test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0274177Z test_ops.py::TestCommonCUDA::test_out_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 18%] 
2023-01-11T23:10:17.0274530Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0274870Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0275204Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 18%] 2023-01-11T23:10:17.0275380Z test_ops.py::TestCommonCUDA::test_out_special_spherical_bessel_j0_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275561Z test_ops.py::TestCommonCUDA::test_out_special_zeta_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275712Z test_ops.py::TestCommonCUDA::test_out_split_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0275868Z test_ops.py::TestCommonCUDA::test_out_split_with_sizes_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276015Z test_ops.py::TestCommonCUDA::test_out_sqrt_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276166Z test_ops.py::TestCommonCUDA::test_out_square_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276313Z test_ops.py::TestCommonCUDA::test_out_stack_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276460Z test_ops.py::TestCommonCUDA::test_out_std_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276612Z test_ops.py::TestCommonCUDA::test_out_std_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276756Z test_ops.py::TestCommonCUDA::test_out_sub_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0276911Z test_ops.py::TestCommonCUDA::test_out_sum_to_size_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277051Z test_ops.py::TestCommonCUDA::test_out_svd_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277204Z test_ops.py::TestCommonCUDA::test_out_svd_lowrank_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277354Z test_ops.py::TestCommonCUDA::test_out_tensor_split_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277506Z test_ops.py::TestCommonCUDA::test_out_tensordot_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277652Z test_ops.py::TestCommonCUDA::test_out_tile_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277798Z test_ops.py::TestCommonCUDA::test_out_topk_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0277944Z test_ops.py::TestCommonCUDA::test_out_trace_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278095Z test_ops.py::TestCommonCUDA::test_out_transpose_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278240Z test_ops.py::TestCommonCUDA::test_out_trapezoid_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278402Z test_ops.py::TestCommonCUDA::test_out_triangular_solve_cuda_float32 XFAIL [ 18%] 2023-01-11T23:10:17.0278548Z test_ops.py::TestCommonCUDA::test_out_tril_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278718Z test_ops.py::TestCommonCUDA::test_out_triu_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0278870Z test_ops.py::TestCommonCUDA::test_out_true_divide_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279020Z test_ops.py::TestCommonCUDA::test_out_unflatten_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279169Z test_ops.py::TestCommonCUDA::test_out_unfold_copy_cuda_float32 XFAIL [ 18%] 2023-01-11T23:10:17.0279316Z test_ops.py::TestCommonCUDA::test_out_unfold_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279473Z test_ops.py::TestCommonCUDA::test_out_unique_consecutive_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279621Z 
test_ops.py::TestCommonCUDA::test_out_unique_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279769Z test_ops.py::TestCommonCUDA::test_out_var_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0279921Z test_ops.py::TestCommonCUDA::test_out_var_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280070Z test_ops.py::TestCommonCUDA::test_out_view_as_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280216Z test_ops.py::TestCommonCUDA::test_out_view_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280362Z test_ops.py::TestCommonCUDA::test_out_vstack_cuda_float32 PASSED [ 18%] 2023-01-11T23:10:17.0280524Z test_ops.py::TestCommonCUDA::test_out_warning___rand___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280684Z test_ops.py::TestCommonCUDA::test_out_warning___rdiv___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280850Z test_ops.py::TestCommonCUDA::test_out_warning___rmatmul___cuda PASSED [ 18%] 2023-01-11T23:10:17.0280995Z test_ops.py::TestCommonCUDA::test_out_warning___ror___cuda PASSED [ 18%] 2023-01-11T23:10:17.0281190Z test_ops.py::TestCommonCUDA::test_out_warning__native_batch_norm_legit_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281337Z test_ops.py::TestCommonCUDA::test_out_warning__refs_T_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281514Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bfloat16_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281684Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_byte_cuda PASSED [ 18%] 2023-01-11T23:10:17.0281857Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cdouble_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282026Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_chalf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282190Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_double_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282355Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_float_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282522Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_int_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282691Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_long_cuda PASSED [ 18%] 2023-01-11T23:10:17.0282855Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_short_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283013Z test_ops.py::TestCommonCUDA::test_out_warning__refs_abs_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283166Z test_ops.py::TestCommonCUDA::test_out_warning__refs_addr_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283316Z test_ops.py::TestCommonCUDA::test_out_warning__refs_all_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283544Z test_ops.py::TestCommonCUDA::test_out_warning__refs_allclose_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) 
[ 18%] 2023-01-11T23:10:17.0283695Z test_ops.py::TestCommonCUDA::test_out_warning__refs_any_cuda PASSED [ 18%] 2023-01-11T23:10:17.0283849Z test_ops.py::TestCommonCUDA::test_out_warning__refs_arange_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284015Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_scatter_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284172Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atan2_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284321Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atan_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284513Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_1d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284673Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_2d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284826Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atleast_3d_cuda PASSED [ 18%] 2023-01-11T23:10:17.0284985Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_or_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285154Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_right_shift_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285323Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_tensors_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285490Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_to_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285644Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cat_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285797Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_cuda PASSED [ 18%] 2023-01-11T23:10:17.0285958Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_min_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286121Z test_ops.py::TestCommonCUDA::test_out_warning__refs_column_stack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286279Z test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_physical_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286443Z test_ops.py::TestCommonCUDA::test_out_warning__refs_constant_pad_nd_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286598Z test_ops.py::TestCommonCUDA::test_out_warning__refs_copysign_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286752Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cumsum_cuda PASSED [ 18%] 2023-01-11T23:10:17.0286921Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_scatter_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287100Z test_ops.py::TestCommonCUDA::test_out_warning__refs_dstack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287306Z test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_like_cuda SKIPPED (Expected: empty is not comparable) [ 18%] 2023-01-11T23:10:17.0287457Z test_ops.py::TestCommonCUDA::test_out_warning__refs_eq_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287602Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287754Z test_ops.py::TestCommonCUDA::test_out_warning__refs_exp2_cuda PASSED [ 18%] 2023-01-11T23:10:17.0287907Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288056Z test_ops.py::TestCommonCUDA::test_out_warning__refs_eye_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288207Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288360Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftn_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288524Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fftshift_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288681Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft2_cuda PASSED [ 
18%] 2023-01-11T23:10:17.0288831Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0288990Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289150Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289303Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289459Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfftn_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289611Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fill_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289762Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flatten_cuda PASSED [ 18%] 2023-01-11T23:10:17.0289915Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flip_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290060Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fliplr_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290232Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmin_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290383Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmod_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290531Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ge_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290677Z test_ops.py::TestCommonCUDA::test_out_warning__refs_gt_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290836Z test_ops.py::TestCommonCUDA::test_out_warning__refs_heaviside_cuda PASSED [ 18%] 2023-01-11T23:10:17.0290987Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hsplit_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291136Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hstack_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291282Z test_ops.py::TestCommonCUDA::test_out_warning__refs_hypot_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291436Z test_ops.py::TestCommonCUDA::test_out_warning__refs_igamma_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291588Z test_ops.py::TestCommonCUDA::test_out_warning__refs_igammac_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291742Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isfinite_cuda PASSED [ 18%] 2023-01-11T23:10:17.0291891Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isinf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292044Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isneginf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292196Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isposinf_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292350Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isreal_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292492Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lcm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292641Z test_ops.py::TestCommonCUDA::test_out_warning__refs_le_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292816Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lgamma_cuda PASSED [ 18%] 2023-01-11T23:10:17.0292986Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_matrix_norm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293151Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_norm_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293307Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linspace_cuda PASSED [ 18%] 2023-01-11T23:10:17.0293456Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log_cuda PASSED [ 19%] 2023-01-11T23:10:17.0293616Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_or_cuda PASSED [ 19%] 2023-01-11T23:10:17.0293777Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_xor_cuda PASSED [ 
19%] 2023-01-11T23:10:17.0293930Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logsumexp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294078Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294256Z test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_list_of_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294436Z test_ops.py::TestCommonCUDA::test_out_warning__refs_meshgrid_variadic_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294691Z test_ops.py::TestCommonCUDA::test_out_warning__refs_minimum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294844Z test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_cuda PASSED [ 19%] 2023-01-11T23:10:17.0294992Z test_ops.py::TestCommonCUDA::test_out_warning__refs_neg_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295194Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_cuda SKIPPED (Expected: empty is not comparable) [ 19%] 2023-01-11T23:10:17.0295341Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_ones_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295500Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nextafter_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295683Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_alpha_dropout_cuda PASSED [ 19%] 2023-01-11T23:10:17.0295855Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_celu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296116Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_dropout_cuda SKIPPED (Expected: dropout is not comparable) [ 19%] 2023-01-11T23:10:17.0296288Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_elu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296459Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_gelu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296625Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_glu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296801Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardshrink_cuda PASSED [ 19%] 2023-01-11T23:10:17.0296983Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297156Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_huber_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297333Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_leaky_relu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297508Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_nll_loss_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297675Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0297851Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softshrink_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298026Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_threshold_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298180Z test_ops.py::TestCommonCUDA::test_out_warning__refs_norm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298327Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ones_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298483Z test_ops.py::TestCommonCUDA::test_out_warning__refs_positive_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298669Z test_ops.py::TestCommonCUDA::test_out_warning__refs_prod_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298822Z test_ops.py::TestCommonCUDA::test_out_warning__refs_randn_cuda PASSED [ 19%] 2023-01-11T23:10:17.0298975Z test_ops.py::TestCommonCUDA::test_out_warning__refs_real_cuda PASSED [ 19%] 
2023-01-11T23:10:17.0299128Z test_ops.py::TestCommonCUDA::test_out_warning__refs_repeat_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299279Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rot90_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299431Z test_ops.py::TestCommonCUDA::test_out_warning__refs_round_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299574Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rsqrt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299725Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sgn_cuda PASSED [ 19%] 2023-01-11T23:10:17.0299874Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sign_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300026Z test_ops.py::TestCommonCUDA::test_out_warning__refs_signbit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300192Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_entr_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300356Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_erfcx_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300519Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300682Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i1e_cuda PASSED [ 19%] 2023-01-11T23:10:17.0300842Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_ndtr_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301025Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_log_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301191Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_logit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301380Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_5_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301546Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtri_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301700Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sqrt_cuda PASSED [ 19%] 2023-01-11T23:10:17.0301854Z test_ops.py::TestCommonCUDA::test_out_warning__refs_square_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302029Z test_ops.py::TestCommonCUDA::test_out_warning__refs_stack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302182Z test_ops.py::TestCommonCUDA::test_out_warning__refs_std_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302325Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sub_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302484Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_to_size_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302633Z test_ops.py::TestCommonCUDA::test_out_warning__refs_t_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302783Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tan_cuda PASSED [ 19%] 2023-01-11T23:10:17.0302933Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tanh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303100Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tensor_split_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303249Z test_ops.py::TestCommonCUDA::test_out_warning__refs_trace_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303416Z test_ops.py::TestCommonCUDA::test_out_warning__refs_transpose_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303559Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303709Z test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_cuda PASSED [ 19%] 2023-01-11T23:10:17.0303872Z test_ops.py::TestCommonCUDA::test_out_warning__refs_triu_indices_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304024Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unbind_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304183Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_copy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304341Z test_ops.py::TestCommonCUDA::test_out_warning__refs_view_as_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304519Z test_ops.py::TestCommonCUDA::test_out_warning__refs_vsplit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304671Z test_ops.py::TestCommonCUDA::test_out_warning__refs_where_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304820Z test_ops.py::TestCommonCUDA::test_out_warning__refs_xlogy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0304969Z test_ops.py::TestCommonCUDA::test_out_warning__refs_zeros_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305137Z test_ops.py::TestCommonCUDA::test_out_warning__softmax_backward_data_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305287Z test_ops.py::TestCommonCUDA::test_out_warning_acos_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305436Z test_ops.py::TestCommonCUDA::test_out_warning_acosh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305586Z test_ops.py::TestCommonCUDA::test_out_warning_addmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305750Z test_ops.py::TestCommonCUDA::test_out_warning_addmm_decomposed_cuda PASSED [ 19%] 2023-01-11T23:10:17.0305901Z test_ops.py::TestCommonCUDA::test_out_warning_addmv_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306126Z test_ops.py::TestCommonCUDA::test_out_warning_allclose_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 19%] 2023-01-11T23:10:17.0306280Z test_ops.py::TestCommonCUDA::test_out_warning_aminmax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306430Z test_ops.py::TestCommonCUDA::test_out_warning_argmax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306578Z test_ops.py::TestCommonCUDA::test_out_warning_argsort_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306729Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_cuda PASSED [ 19%] 2023-01-11T23:10:17.0306899Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_partial_views_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307046Z test_ops.py::TestCommonCUDA::test_out_warning_asin_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307193Z test_ops.py::TestCommonCUDA::test_out_warning_asinh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307342Z test_ops.py::TestCommonCUDA::test_out_warning_atan2_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307486Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_1d_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307669Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_3d_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307822Z test_ops.py::TestCommonCUDA::test_out_warning_baddbmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0307971Z test_ops.py::TestCommonCUDA::test_out_warning_bfloat16_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308120Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_or_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308286Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_right_shift_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308432Z test_ops.py::TestCommonCUDA::test_out_warning_bmm_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308595Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_shapes_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308820Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_tensors_cuda PASSED [ 19%] 2023-01-11T23:10:17.0308985Z test_ops.py::TestCommonCUDA::test_out_warning_broadcast_to_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309137Z test_ops.py::TestCommonCUDA::test_out_warning_bucketize_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309294Z test_ops.py::TestCommonCUDA::test_out_warning_cartesian_prod_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309444Z 
test_ops.py::TestCommonCUDA::test_out_warning_cdouble_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309590Z test_ops.py::TestCommonCUDA::test_out_warning_chalf_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309736Z test_ops.py::TestCommonCUDA::test_out_warning_char_cuda PASSED [ 19%] 2023-01-11T23:10:17.0309887Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310041Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_inverse_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310198Z test_ops.py::TestCommonCUDA::test_out_warning_cholesky_solve_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310373Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310524Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_max_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310676Z test_ops.py::TestCommonCUDA::test_out_warning_clamp_min_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310826Z test_ops.py::TestCommonCUDA::test_out_warning_column_stack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0310981Z test_ops.py::TestCommonCUDA::test_out_warning_combinations_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311130Z test_ops.py::TestCommonCUDA::test_out_warning_complex_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311271Z test_ops.py::TestCommonCUDA::test_out_warning_conj_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311420Z test_ops.py::TestCommonCUDA::test_out_warning_cos_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311568Z test_ops.py::TestCommonCUDA::test_out_warning_cosh_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311717Z test_ops.py::TestCommonCUDA::test_out_warning_cross_cuda PASSED [ 19%] 2023-01-11T23:10:17.0311865Z test_ops.py::TestCommonCUDA::test_out_warning_cummax_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312015Z test_ops.py::TestCommonCUDA::test_out_warning_cumsum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312184Z test_ops.py::TestCommonCUDA::test_out_warning_cumulative_trapezoid_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312332Z test_ops.py::TestCommonCUDA::test_out_warning_deg2rad_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312473Z test_ops.py::TestCommonCUDA::test_out_warning_diag_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312623Z test_ops.py::TestCommonCUDA::test_out_warning_diag_embed_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312775Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_copy_cuda PASSED [ 19%] 2023-01-11T23:10:17.0312924Z test_ops.py::TestCommonCUDA::test_out_warning_digamma_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313070Z test_ops.py::TestCommonCUDA::test_out_warning_dist_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313237Z test_ops.py::TestCommonCUDA::test_out_warning_div_floor_rounding_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313402Z test_ops.py::TestCommonCUDA::test_out_warning_div_no_rounding_mode_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313573Z test_ops.py::TestCommonCUDA::test_out_warning_dot_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313717Z test_ops.py::TestCommonCUDA::test_out_warning_dsplit_cuda PASSED [ 19%] 2023-01-11T23:10:17.0313867Z test_ops.py::TestCommonCUDA::test_out_warning_dstack_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314015Z test_ops.py::TestCommonCUDA::test_out_warning_einsum_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314209Z test_ops.py::TestCommonCUDA::test_out_warning_empty_cuda SKIPPED (Expected: empty is not comparable) [ 19%] 2023-01-11T23:10:17.0314356Z test_ops.py::TestCommonCUDA::test_out_warning_erfc_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314502Z test_ops.py::TestCommonCUDA::test_out_warning_exp_cuda PASSED [ 19%] 2023-01-11T23:10:17.0314655Z 
2023-01-11T23:10:17.0314655Z test_ops.py::TestCommonCUDA::test_out_warning_expand_as_cuda PASSED [ 19%]
2023-01-11T23:10:17.0314802Z test_ops.py::TestCommonCUDA::test_out_warning_expand_cuda PASSED [ 19%]
2023-01-11T23:10:17.0314945Z test_ops.py::TestCommonCUDA::test_out_warning_expm1_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315095Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fftn_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315243Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315392Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315539Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftn_cuda PASSED [ 19%]
2023-01-11T23:10:17.0315688Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft2_cuda XFAIL [ 19%]
2023-01-11T23:10:17.0315837Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316011Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfft_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316159Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft2_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316305Z test_ops.py::TestCommonCUDA::test_out_warning_flatten_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316453Z test_ops.py::TestCommonCUDA::test_out_warning_fliplr_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316604Z test_ops.py::TestCommonCUDA::test_out_warning_float_power_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316750Z test_ops.py::TestCommonCUDA::test_out_warning_fmax_cuda PASSED [ 19%]
2023-01-11T23:10:17.0316895Z test_ops.py::TestCommonCUDA::test_out_warning_fmin_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317043Z test_ops.py::TestCommonCUDA::test_out_warning_fmod_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317188Z test_ops.py::TestCommonCUDA::test_out_warning_full_cuda XFAIL [ 19%]
2023-01-11T23:10:17.0317340Z test_ops.py::TestCommonCUDA::test_out_warning_gather_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317479Z test_ops.py::TestCommonCUDA::test_out_warning_ge_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317628Z test_ops.py::TestCommonCUDA::test_out_warning_geqrf_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317790Z test_ops.py::TestCommonCUDA::test_out_warning_grid_sampler_2d_cuda PASSED [ 19%]
2023-01-11T23:10:17.0317934Z test_ops.py::TestCommonCUDA::test_out_warning_gt_cuda PASSED [ 19%]
2023-01-11T23:10:17.0318155Z test_ops.py::TestCommonCUDA::test_out_warning_histogramdd_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 19%]
2023-01-11T23:10:17.0318303Z test_ops.py::TestCommonCUDA::test_out_warning_hstack_cuda PASSED [ 19%]
2023-01-11T23:10:17.0318450Z test_ops.py::TestCommonCUDA::test_out_warning_hypot_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318598Z test_ops.py::TestCommonCUDA::test_out_warning_igamma_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318742Z test_ops.py::TestCommonCUDA::test_out_warning_imag_cuda PASSED [ 20%]
2023-01-11T23:10:17.0318891Z test_ops.py::TestCommonCUDA::test_out_warning_index_add_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319066Z test_ops.py::TestCommonCUDA::test_out_warning_index_copy_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319222Z test_ops.py::TestCommonCUDA::test_out_warning_index_select_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319367Z test_ops.py::TestCommonCUDA::test_out_warning_inner_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319511Z test_ops.py::TestCommonCUDA::test_out_warning_int_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319661Z test_ops.py::TestCommonCUDA::test_out_warning_isfinite_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319813Z test_ops.py::TestCommonCUDA::test_out_warning_isposinf_cuda PASSED [ 20%]
2023-01-11T23:10:17.0319953Z test_ops.py::TestCommonCUDA::test_out_warning_istft_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320126Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_2inputs_2outputs_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320293Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_unary_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320438Z test_ops.py::TestCommonCUDA::test_out_warning_kron_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320618Z test_ops.py::TestCommonCUDA::test_out_warning_kthvalue_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320788Z test_ops.py::TestCommonCUDA::test_out_warning_lcm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0320935Z test_ops.py::TestCommonCUDA::test_out_warning_ldexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321082Z test_ops.py::TestCommonCUDA::test_out_warning_lerp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321224Z test_ops.py::TestCommonCUDA::test_out_warning_lgamma_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321376Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cond_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321526Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321716Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_det_singular_cuda PASSED [ 20%]
2023-01-11T23:10:17.0321867Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigh_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322025Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvals_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322186Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eigvalsh_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322337Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322494Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322666Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_grad_oriented_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322831Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_ex_cuda PASSED [ 20%]
2023-01-11T23:10:17.0322992Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323147Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323318Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_hermitian_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323550Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_pinv_singular_cuda SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 20%]
2023-01-11T23:10:17.0323703Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_svd_cuda PASSED [ 20%]
2023-01-11T23:10:17.0323857Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_svdvals_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324012Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorinv_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324165Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vander_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324315Z test_ops.py::TestCommonCUDA::test_out_warning_linspace_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0324462Z test_ops.py::TestCommonCUDA::test_out_warning_log1p_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324622Z test_ops.py::TestCommonCUDA::test_out_warning_logcumsumexp_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0324772Z test_ops.py::TestCommonCUDA::test_out_warning_logdet_cuda PASSED [ 20%]
2023-01-11T23:10:17.0324949Z test_ops.py::TestCommonCUDA::test_out_warning_logical_and_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325104Z test_ops.py::TestCommonCUDA::test_out_warning_logical_xor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325247Z test_ops.py::TestCommonCUDA::test_out_warning_logspace_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0325402Z test_ops.py::TestCommonCUDA::test_out_warning_logsumexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325550Z test_ops.py::TestCommonCUDA::test_out_warning_long_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325700Z test_ops.py::TestCommonCUDA::test_out_warning_masked_amax_cuda PASSED [ 20%]
2023-01-11T23:10:17.0325853Z test_ops.py::TestCommonCUDA::test_out_warning_masked_argmin_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326006Z test_ops.py::TestCommonCUDA::test_out_warning_masked_fill_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326170Z test_ops.py::TestCommonCUDA::test_out_warning_masked_logaddexp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326325Z test_ops.py::TestCommonCUDA::test_out_warning_masked_scatter_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326470Z test_ops.py::TestCommonCUDA::test_out_warning_masked_std_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326620Z test_ops.py::TestCommonCUDA::test_out_warning_masked_sum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326771Z test_ops.py::TestCommonCUDA::test_out_warning_matrix_exp_cuda PASSED [ 20%]
2023-01-11T23:10:17.0326950Z test_ops.py::TestCommonCUDA::test_out_warning_max_pool2d_with_indices_backward_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0327117Z test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_no_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327284Z test_ops.py::TestCommonCUDA::test_out_warning_max_reduction_with_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327461Z test_ops.py::TestCommonCUDA::test_out_warning_maximum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327609Z test_ops.py::TestCommonCUDA::test_out_warning_mean_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327755Z test_ops.py::TestCommonCUDA::test_out_warning_min_binary_cuda PASSED [ 20%]
2023-01-11T23:10:17.0327922Z test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_no_dim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328071Z test_ops.py::TestCommonCUDA::test_out_warning_minimum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328216Z test_ops.py::TestCommonCUDA::test_out_warning_mm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328364Z test_ops.py::TestCommonCUDA::test_out_warning_movedim_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328519Z test_ops.py::TestCommonCUDA::test_out_warning_multinomial_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0328688Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_3_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328840Z test_ops.py::TestCommonCUDA::test_out_warning_nanmedian_cuda PASSED [ 20%]
2023-01-11T23:10:17.0328987Z test_ops.py::TestCommonCUDA::test_out_warning_nanquantile_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329135Z test_ops.py::TestCommonCUDA::test_out_warning_nansum_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329286Z test_ops.py::TestCommonCUDA::test_out_warning_narrow_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329450Z test_ops.py::TestCommonCUDA::test_out_warning_native_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329596Z test_ops.py::TestCommonCUDA::test_out_warning_ne_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329758Z test_ops.py::TestCommonCUDA::test_out_warning_new_empty_strided_cuda PASSED [ 20%]
2023-01-11T23:10:17.0329907Z test_ops.py::TestCommonCUDA::test_out_warning_new_full_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330060Z test_ops.py::TestCommonCUDA::test_out_warning_new_ones_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330213Z test_ops.py::TestCommonCUDA::test_out_warning_nextafter_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330437Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional__scaled_dot_product_attention_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330656Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0330835Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331011Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_alpha_dropout_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331181Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331345Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_celu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331510Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331671Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0331844Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332023Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332210Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_similarity_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332380Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332542Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_elu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332714Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_bag_cuda PASSED [ 20%]
2023-01-11T23:10:17.0332910Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_with_train_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333092Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333311Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_glu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333476Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_grid_sample_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333646Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardswish_cuda PASSED [ 20%]
2023-01-11T23:10:17.0333832Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bilinear_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334014Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_nearest_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334178Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_linear_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334355Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_local_response_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334618Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_logsigmoid_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334797Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_margin_ranking_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0334966Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335143Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool1d_grad_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335316Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335494Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool2d_grad_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335663Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335827Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mse_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0335990Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336153Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_one_hot_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336331Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pairwise_distance_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336495Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_shuffle_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336699Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_prelu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0336858Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_rrelu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337018Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_selu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337252Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_complex_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 20%]
2023-01-11T23:10:17.0337410Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_silu_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337581Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_smooth_l1_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337752Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_soft_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0337920Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softplus_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338084Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softshrink_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338252Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_threshold_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338430Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338592Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_unfold_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338765Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_nearest_cuda PASSED [ 20%]
2023-01-11T23:10:17.0338915Z test_ops.py::TestCommonCUDA::test_out_warning_norm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339065Z test_ops.py::TestCommonCUDA::test_out_warning_norm_fro_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339249Z test_ops.py::TestCommonCUDA::test_out_warning_norm_nuc_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339389Z test_ops.py::TestCommonCUDA::test_out_warning_normal_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0339553Z test_ops.py::TestCommonCUDA::test_out_warning_normal_number_mean_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339697Z test_ops.py::TestCommonCUDA::test_out_warning_ones_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0339847Z test_ops.py::TestCommonCUDA::test_out_warning_ones_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0339993Z test_ops.py::TestCommonCUDA::test_out_warning_outer_cuda PASSED [ 20%]
2023-01-11T23:10:17.0340142Z test_ops.py::TestCommonCUDA::test_out_warning_permute_cuda PASSED [ 20%]
2023-01-11T23:10:17.0340325Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_1_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340500Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_2_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340671Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0340844Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_4_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0341001Z test_ops.py::TestCommonCUDA::test_out_warning_positive_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341148Z test_ops.py::TestCommonCUDA::test_out_warning_prod_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341296Z test_ops.py::TestCommonCUDA::test_out_warning_rad2deg_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341444Z test_ops.py::TestCommonCUDA::test_out_warning_rand_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341590Z test_ops.py::TestCommonCUDA::test_out_warning_randint_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0341740Z test_ops.py::TestCommonCUDA::test_out_warning_randint_like_cuda PASSED [ 20%]
2023-01-11T23:10:17.0341877Z test_ops.py::TestCommonCUDA::test_out_warning_randn_cuda XFAIL [ 20%]
2023-01-11T23:10:17.0342031Z test_ops.py::TestCommonCUDA::test_out_warning_reciprocal_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342181Z test_ops.py::TestCommonCUDA::test_out_warning_remainder_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342352Z test_ops.py::TestCommonCUDA::test_out_warning_renorm_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342503Z test_ops.py::TestCommonCUDA::test_out_warning_repeat_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342652Z test_ops.py::TestCommonCUDA::test_out_warning_reshape_as_cuda PASSED [ 20%]
2023-01-11T23:10:17.0342799Z test_ops.py::TestCommonCUDA::test_out_warning_resize_as__cuda PASSED [ 20%]
2023-01-11T23:10:17.0342947Z test_ops.py::TestCommonCUDA::test_out_warning_resolve_conj_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343090Z test_ops.py::TestCommonCUDA::test_out_warning_roll_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343256Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0343435Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_neg_3_cuda SKIPPED (Skipped!) [ 20%]
2023-01-11T23:10:17.0343581Z test_ops.py::TestCommonCUDA::test_out_warning_rsqrt_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343733Z test_ops.py::TestCommonCUDA::test_out_warning_scalar_tensor_cuda PASSED [ 20%]
2023-01-11T23:10:17.0343882Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_add_cuda PASSED [ 20%]
2023-01-11T23:10:17.0344045Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344197Z test_ops.py::TestCommonCUDA::test_out_warning_searchsorted_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344365Z test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_offsets_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344503Z test_ops.py::TestCommonCUDA::test_out_warning_sgn_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344649Z test_ops.py::TestCommonCUDA::test_out_warning_short_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344793Z test_ops.py::TestCommonCUDA::test_out_warning_sign_cuda PASSED [ 21%]
2023-01-11T23:10:17.0344985Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_blackman_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345154Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_cosine_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345325Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_exponential_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345489Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_gaussian_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345662Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_cosine_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345829Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_general_hamming_cuda PASSED [ 21%]
2023-01-11T23:10:17.0345996Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hamming_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346145Z test_ops.py::TestCommonCUDA::test_out_warning_signbit_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346292Z test_ops.py::TestCommonCUDA::test_out_warning_sin_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346437Z test_ops.py::TestCommonCUDA::test_out_warning_sinh_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346591Z test_ops.py::TestCommonCUDA::test_out_warning_slice_scatter_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346739Z test_ops.py::TestCommonCUDA::test_out_warning_softmax_cuda PASSED [ 21%]
2023-01-11T23:10:17.0346900Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347050Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_y1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347226Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_u_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347573Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0347726Z test_ops.py::TestCommonCUDA::test_out_warning_special_erfcx_cuda PASSED [ 21%]
2023-01-11T23:10:17.0347903Z test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_h_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348054Z test_ops.py::TestCommonCUDA::test_out_warning_special_i0e_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348237Z test_ops.py::TestCommonCUDA::test_out_warning_special_i1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348392Z test_ops.py::TestCommonCUDA::test_out_warning_special_i1e_cuda PASSED [ 21%]
2023-01-11T23:10:17.0348784Z test_ops.py::TestCommonCUDA::test_out_warning_special_legendre_polynomial_p_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0348946Z test_ops.py::TestCommonCUDA::test_out_warning_special_log_ndtr_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349118Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349288Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_i1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349456Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349606Z test_ops.py::TestCommonCUDA::test_out_warning_special_ndtr_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349762Z test_ops.py::TestCommonCUDA::test_out_warning_special_ndtri_cuda PASSED [ 21%]
2023-01-11T23:10:17.0349949Z test_ops.py::TestCommonCUDA::test_out_warning_special_polygamma_special_polygamma_n_0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350131Z test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k1_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350535Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:10:17.0350701Z test_ops.py::TestCommonCUDA::test_out_warning_special_spherical_bessel_j0_cuda PASSED [ 21%]
2023-01-11T23:10:17.0350849Z test_ops.py::TestCommonCUDA::test_out_warning_sqrt_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351028Z test_ops.py::TestCommonCUDA::test_out_warning_squeeze_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351177Z test_ops.py::TestCommonCUDA::test_out_warning_stack_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351329Z test_ops.py::TestCommonCUDA::test_out_warning_std_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351481Z test_ops.py::TestCommonCUDA::test_out_warning_std_unbiased_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351628Z test_ops.py::TestCommonCUDA::test_out_warning_stft_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351775Z test_ops.py::TestCommonCUDA::test_out_warning_svd_cuda PASSED [ 21%]
2023-01-11T23:10:17.0351915Z test_ops.py::TestCommonCUDA::test_out_warning_symeig_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352061Z test_ops.py::TestCommonCUDA::test_out_warning_take_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352205Z test_ops.py::TestCommonCUDA::test_out_warning_tanh_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352355Z test_ops.py::TestCommonCUDA::test_out_warning_tensordot_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352501Z test_ops.py::TestCommonCUDA::test_out_warning_tile_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352648Z test_ops.py::TestCommonCUDA::test_out_warning_topk_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352794Z test_ops.py::TestCommonCUDA::test_out_warning_trace_cuda PASSED [ 21%]
2023-01-11T23:10:17.0352942Z test_ops.py::TestCommonCUDA::test_out_warning_transpose_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353084Z test_ops.py::TestCommonCUDA::test_out_warning_trapezoid_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353244Z test_ops.py::TestCommonCUDA::test_out_warning_triangular_solve_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353395Z test_ops.py::TestCommonCUDA::test_out_warning_tril_indices_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353538Z test_ops.py::TestCommonCUDA::test_out_warning_triu_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353687Z test_ops.py::TestCommonCUDA::test_out_warning_triu_indices_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353834Z test_ops.py::TestCommonCUDA::test_out_warning_trunc_cuda PASSED [ 21%]
2023-01-11T23:10:17.0353981Z test_ops.py::TestCommonCUDA::test_out_warning_unflatten_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354158Z test_ops.py::TestCommonCUDA::test_out_warning_unfold_copy_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0354299Z test_ops.py::TestCommonCUDA::test_out_warning_unfold_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354447Z test_ops.py::TestCommonCUDA::test_out_warning_uniform_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354613Z test_ops.py::TestCommonCUDA::test_out_warning_unique_consecutive_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354758Z test_ops.py::TestCommonCUDA::test_out_warning_unique_cuda PASSED [ 21%]
2023-01-11T23:10:17.0354905Z test_ops.py::TestCommonCUDA::test_out_warning_var_mean_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355055Z test_ops.py::TestCommonCUDA::test_out_warning_var_unbiased_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355203Z test_ops.py::TestCommonCUDA::test_out_warning_vdot_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355362Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_complex_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355510Z test_ops.py::TestCommonCUDA::test_out_warning_view_copy_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0355649Z test_ops.py::TestCommonCUDA::test_out_warning_view_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355795Z test_ops.py::TestCommonCUDA::test_out_warning_vsplit_cuda PASSED [ 21%]
2023-01-11T23:10:17.0355941Z test_ops.py::TestCommonCUDA::test_out_warning_where_cuda PASSED [ 21%]
2023-01-11T23:10:17.0356084Z test_ops.py::TestCommonCUDA::test_out_warning_zeros_cuda XFAIL [ 21%]
2023-01-11T23:10:17.0356230Z test_ops.py::TestCommonCUDA::test_out_where_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356373Z test_ops.py::TestCommonCUDA::test_out_xlogy_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356520Z test_ops.py::TestCommonCUDA::test_out_zeros_like_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0356695Z test_ops.py::TestCommonCUDA::test_pointwise_tag_coverage_cuda PASSED [ 21%]
2023-01-11T23:10:17.0356846Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0357010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0357167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0357323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0357472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0357623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0357771Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0357919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0358062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0358209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0358359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0358534Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0358718Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0358899Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0359078Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0359251Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0359422Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0359598Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0359794Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0359968Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0360152Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0360322Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0360490Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0360661Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0360834Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0361005Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0361180Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0361348Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0361515Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0361680Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0361857Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0362030Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0362203Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0362402Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0362575Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0362753Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0362926Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0363097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0363267Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0363438Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0363614Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0363790Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0363957Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0364127Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0364296Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0364464Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0364635Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0364808Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex32 PASSED [ 21%]
2023-01-11T23:10:17.0364975Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0365145Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0365310Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0365489Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_uint8 PASSED [ 21%]
2023-01-11T23:10:17.0365668Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0365843Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0366014Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_complex_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0366184Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0366351Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0366528Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0366706Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0366881Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0367046Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0367218Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0367387Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int16 PASSED [ 21%]
2023-01-11T23:10:17.0367553Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0367720Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0367912Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bool PASSED [ 21%]
2023-01-11T23:10:17.0368089Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex128 PASSED [ 21%]
2023-01-11T23:10:17.0368263Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_complex64 PASSED [ 21%]
2023-01-11T23:10:17.0368435Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0368597Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0368766Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0368932Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int64 PASSED [ 21%]
2023-01-11T23:10:17.0369097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int8 PASSED [ 21%]
2023-01-11T23:10:17.0369272Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:10:17.0369442Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float16 PASSED [ 21%]
2023-01-11T23:10:17.0369612Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float32 PASSED [ 21%]
2023-01-11T23:10:17.0369778Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_float64 PASSED [ 21%]
2023-01-11T23:10:17.0369938Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int32 PASSED [ 21%]
2023-01-11T23:10:17.0370108Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0370277Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0370450Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0370616Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0370791Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0370988Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0371163Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0371330Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0371491Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0371651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0371805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0371962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0372122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0372278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0372435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0372598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0372745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0372907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0373065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0373218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0373373Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0373554Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0373710Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0373878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0374032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0374189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0374344Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0374597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0374758Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0374909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0375070Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0375235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0375393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0375555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0375717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0375873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0376029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0376189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0376349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0376506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0376670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex128 XFAIL [ 22%]
2023-01-11T23:10:17.0376821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float32 XFAIL [ 22%]
2023-01-11T23:10:17.0377016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float64 XFAIL [ 22%]
2023-01-11T23:10:17.0377171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int16 XFAIL [ 22%]
2023-01-11T23:10:17.0377323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int32 XFAIL [ 22%]
2023-01-11T23:10:17.0377474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int8 XFAIL [ 22%]
2023-01-11T23:10:17.0377632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0377794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0377953Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0378102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0378253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0378421Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0378589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0378751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0378904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0379062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0379215Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0379364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0379545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0379700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0379860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0380011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0380166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0380321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0380472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0380622Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0380776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0380934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0381093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0381252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0381406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0381562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0381722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0381890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0382048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0382211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0382374Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0382552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0382762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0382944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0383124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0383298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0383462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0383637Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0383809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0383985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0384159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0384329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0384489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0384644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0384806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0384960Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0385117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0385296Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0385449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0385607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0385761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0385925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0386087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0386238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0386391Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0386546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0386703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0386859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0387021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0387173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0387327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0387475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0387630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0387788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0387948Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0388108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0388266Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0388451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0388606Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0388821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0388987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0389147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0389303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0389457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0389623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0389792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0389961Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0390127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0390284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0390446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0390605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0390768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0390933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:10:17.0391124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0391283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0391443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0391603Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0391769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:10:17.0391929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0392091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0392253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0392413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int8 PASSED [ 22%]
2023-01-11T23:10:17.0392577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0392749Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0392910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0393065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0393228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0393387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_bool PASSED [ 22%]
2023-01-11T23:10:17.0393545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int16 PASSED [ 22%]
2023-01-11T23:10:17.0393700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0393873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0394044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0394228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0394393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_shapes_cuda_float32 PASSED [ 22%]
2023-01-11T23:10:17.0394571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex128 PASSED [ 22%]
2023-01-11T23:10:17.0394741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float16 PASSED [ 22%]
2023-01-11T23:10:17.0394916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float64 PASSED [ 22%]
2023-01-11T23:10:17.0395083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int32 PASSED [ 22%]
2023-01-11T23:10:17.0395250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int64 PASSED [ 22%]
2023-01-11T23:10:17.0395418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_uint8 PASSED [ 22%]
2023-01-11T23:10:17.0395586Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:10:17.0395751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex128 PASSED [ 23%]
2023-01-11T23:10:17.0395910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0396073Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int16 PASSED [ 23%]
2023-01-11T23:10:17.0396234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int8 PASSED [ 23%]
2023-01-11T23:10:17.0396397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_uint8 PASSED [ 23%]
2023-01-11T23:10:17.0396559Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0396754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float64 PASSED [ 23%]
2023-01-11T23:10:17.0396914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int32 PASSED [ 23%]
2023-01-11T23:10:17.0397079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int8 PASSED [ 23%]
2023-01-11T23:10:17.0397230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_uint8 PASSED [ 23%]
2023-01-11T23:10:17.0397384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bool PASSED [ 23%]
2023-01-11T23:10:17.0397545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex128 PASSED [ 23%]
2023-01-11T23:10:17.0397703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex32 PASSED [ 23%]
2023-01-11T23:10:17.0397860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_complex64 PASSED [ 23%]
2023-01-11T23:10:17.0398016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float32 PASSED [ 23%]
2023-01-11T23:10:17.0398173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int16 PASSED [ 23%]
2023-01-11T23:10:17.0398326Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int32 PASSED [ 23%]
2023-01-11T23:10:17.0398474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_uint8 PASSED [ 23%]
test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0398786Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0398939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0399093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0399250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0399414Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0399577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0399729Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0399910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0400067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0400222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0400377Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0400556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0400734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0400895Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0401055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0401202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0401357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0401510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0401664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0401829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0401992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0402152Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0402310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0402492Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0402652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0402816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0402973Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0403130Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0403288Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0403442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0403602Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0403750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0403907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0404061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0404218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0404389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0404561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0404722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0404883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0405043Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0405195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0405354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0405512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0405689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0405849Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0406003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0406169Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0406335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0406500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0406671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0406842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0407008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0407174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0407337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0407499Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0407663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0407827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0407993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0408167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0408357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0408527Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0408695Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0408857Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0409019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0409181Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0409338Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0409501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0409663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0409819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0409975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0410135Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0410290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0410444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0410593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0410744Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0410901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0411064Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0411223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0411403Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0411564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0411719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0411866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0412028Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0412190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0412349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0412508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0412663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0412819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0412978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0413137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0413287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0413464Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0413629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0413794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0413985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0414148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0414313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0414571Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0414728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0414889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0415061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0415232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0415398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0415565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0415727Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0415894Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0416053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_uint8 PASSED [ 23%] 2023-01-11T23:10:17.0416216Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0416381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0416542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0416704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0416864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0417042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex128 PASSED [ 23%] 2023-01-11T23:10:17.0417255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0417428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float16 PASSED [ 23%] 2023-01-11T23:10:17.0417593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0417761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0417927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0418090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0418249Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0418425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float16 
PASSED [ 23%] 2023-01-11T23:10:17.0418598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0418772Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_float64 PASSED [ 23%] 2023-01-11T23:10:17.0418941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0419111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:10:17.0419283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex32 XFAIL [ 23%] 2023-01-11T23:10:17.0419461Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0419635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0419836Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0420005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0420175Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int64 PASSED [ 23%] 2023-01-11T23:10:17.0420346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float32 PASSED [ 23%] 2023-01-11T23:10:17.0420507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int32 PASSED [ 23%] 2023-01-11T23:10:17.0420676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int8 PASSED [ 23%] 2023-01-11T23:10:17.0420835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bool PASSED [ 23%] 2023-01-11T23:10:17.0421003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex32 PASSED [ 23%] 2023-01-11T23:10:17.0421169Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex64 PASSED [ 23%] 2023-01-11T23:10:17.0421327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int16 PASSED [ 23%] 2023-01-11T23:10:17.0421483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0421646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0421805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0421963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0422123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0422284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0422443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0422604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0422760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0422970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0423127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0423324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423523Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423724Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0423926Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424526Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424720Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0424916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0425952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:10:17.0426504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0426663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0426820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0426969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0427125Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0427277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0427428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0427578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0427734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0427890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0428039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0428182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int8 PASSED [ 24%] 
2023-01-11T23:10:17.0428346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0428501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0428743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0428917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0429080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0429242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0429398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0429546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0429703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0429863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0430016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0430173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0430329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0430482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0430634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0430781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0430944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0431104Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0431287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0431444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0431602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0431756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0431911Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0432065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0432213Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0432380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0432539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0432709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0432873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0433038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0433199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0433359Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0433509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0433666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0433822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0433980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0434143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0434302Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0434480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0434640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0434788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0434942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0435100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0435261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0435417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0435577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0435732Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0435892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:10:17.0436041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0436193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0436346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0436498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0436649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0436803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0436991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0437154Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0437320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0437474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0437627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0437782Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0437938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0438100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0438257Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0438413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0438565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0438722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0438877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0439030Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0439182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0439348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0439519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0439686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0439853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0440009Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0440193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0440361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0440527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0440689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0440847Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0441006Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0441163Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0441324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0441480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:10:17.0441644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0441804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0441964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0442120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0442274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0447932Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0448129Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0448346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0448509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0448671Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0448831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0448986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0449139Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0449291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0449445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0449592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0449757Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0449919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0450075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0450229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int32 PASSED [ 24%] 2023-01-11T23:10:17.0450382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:10:17.0450536Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0450721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0450906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0451071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0451228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:10:17.0451407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int64 PASSED [ 24%] 2023-01-11T23:10:17.0451563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:10:17.0451726Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bool PASSED [ 24%] 2023-01-11T23:10:17.0451898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex128 PASSED [ 24%] 2023-01-11T23:10:17.0452070Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex64 PASSED [ 24%] 2023-01-11T23:10:17.0452235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float16 PASSED [ 24%] 2023-01-11T23:10:17.0452395Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float32 PASSED [ 24%] 2023-01-11T23:10:17.0452562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_float64 PASSED [ 24%] 2023-01-11T23:10:17.0452722Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0452887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0453049Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0453211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0453373Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0453535Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0453688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0453847Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0454031Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0454189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0454347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0454723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0454888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0455050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0455214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0455372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0455534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0455700Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0455860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0456016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0456183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0456346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex32 PASSED [ 25%] 2023-01-11T23:10:17.0456508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0456660Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0456817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0456976Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0457132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0457333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0457490Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0457643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0457795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0457959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0458117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0458274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0458431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0458587Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0458748Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0458900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0459055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0459211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0459356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0459512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0459668Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0459855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0460012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0460171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0460321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0460470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0460613Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0460767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0460922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0461077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0461233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0461387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0461546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0461697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0461845Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0462003Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0462161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0462322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0462480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0462638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0462792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0462968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0463119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0463262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0463417Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0463569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0463720Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0463871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0464027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0464188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0464347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0464498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0464649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0464801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0464953Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0465107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0465262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0465440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0465596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0465761Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0465923Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0466084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0466241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0466398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0466554Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0466704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0466860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0467011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0467188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_bfloat16 SKIPPED (Skipped!) 
[ 25%] 2023-01-11T23:10:17.0467349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0467508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0467667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0467824Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0467978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0468128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0468283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0468428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0468604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0468839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0468996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0469151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0469300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0469454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0469605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0469751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0469900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0470051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0470205Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0470356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0470505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0470652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0470801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0470946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0471122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0471274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0471423Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0471570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int64 PASSED [ 25%] 2023-01-11T23:10:17.0471714Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0471866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0472012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0472166Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0472322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0472480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0472633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float64 PASSED [ 25%] 2023-01-11T23:10:17.0472789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0472940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int32 PASSED [ 25%] 2023-01-11T23:10:17.0473091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_uint8 PASSED [ 25%] 2023-01-11T23:10:17.0473250Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex64 PASSED [ 25%] 2023-01-11T23:10:17.0473407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0473556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0473709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0473866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:10:17.0474014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_bool PASSED [ 25%] 2023-01-11T23:10:17.0474199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float16 PASSED [ 25%] 2023-01-11T23:10:17.0474352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int16 PASSED [ 25%] 2023-01-11T23:10:17.0474500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int8 PASSED [ 25%] 2023-01-11T23:10:17.0474652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0474802Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float32 PASSED [ 25%] 2023-01-11T23:10:17.0474954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex128 PASSED [ 25%] 2023-01-11T23:10:17.0475106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex32 PASSED [ 25%] 2023-01-11T23:10:17.0475264Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0475419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0475580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float32 XFAIL [ 25%] 2023-01-11T23:10:17.0475733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int32 XFAIL [ 25%] 2023-01-11T23:10:17.0475885Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int8 XFAIL [ 25%] 2023-01-11T23:10:17.0476041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex128 XFAIL [ 25%] 2023-01-11T23:10:17.0476197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float64 XFAIL [ 25%] 2023-01-11T23:10:17.0476354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int8 XFAIL [ 25%] 2023-01-11T23:10:17.0476514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0476696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0476851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float32 XFAIL [ 25%] 2023-01-11T23:10:17.0477012Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float64 XFAIL [ 25%] 2023-01-11T23:10:17.0477165Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int64 XFAIL [ 25%] 2023-01-11T23:10:17.0477325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bfloat16 XFAIL [ 25%] 2023-01-11T23:10:17.0477476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_bool XFAIL [ 25%] 2023-01-11T23:10:17.0477640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex32 XFAIL [ 26%] 2023-01-11T23:10:17.0477802Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:10:17.0477962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float32 XFAIL [ 26%] 2023-01-11T23:10:17.0478118Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int64 XFAIL [ 26%] 2023-01-11T23:10:17.0478273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int8 XFAIL [ 26%] 2023-01-11T23:10:17.0478431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:10:17.0478585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0478740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0478900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0479056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0479210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0479361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0479510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0479665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0479845Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0480002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0480170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0480352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0480531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0480686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0480841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0480998Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0481151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0481310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0481459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0481611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0481763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int8 PASSED [ 26%] 
2023-01-11T23:10:17.0481914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0482069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0482229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0482407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0482562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0482710Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0482866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0483024Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0483180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0483330Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0483482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0483632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0483783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0483925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0484075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0484223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0484382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0484541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0484695Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0484849Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0484999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0485147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0485299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0485470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0485646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0485819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0485987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0486153Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0486318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0486477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0486633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float64 PASSED [ 26%] 
2023-01-11T23:10:17.0486796Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0486957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0487116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0487286Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0487450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0487622Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0487794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0487982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0488145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0488307Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0488467Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0488627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0488785Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int16 XFAIL [ 26%] 2023-01-11T23:10:17.0488936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int32 XFAIL [ 26%] 2023-01-11T23:10:17.0489089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int8 XFAIL [ 26%] 2023-01-11T23:10:17.0489238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:10:17.0489392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0489550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0489706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0489855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0490010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0490187Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0490370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0490521Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0490663Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0490813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0490966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0491145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0491300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0491449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0491604Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0491755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0491907Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0492057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0492212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0492363Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0492516Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0492667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0492816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0492966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0493111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0493260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0493430Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0493630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0493808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0493986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0494157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0494324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0494596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0494751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0494919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0495085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0495246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0495407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0495565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0495723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0495887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0496038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0496208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0496367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0496531Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0496689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0496883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0497044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0497204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0497364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0497513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0497669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0497832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0497989Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0498146Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0498306Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0498465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0498623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0498776Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0498933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0499093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex64 PASSED [ 26%] 2023-01-11T23:10:17.0499279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0499437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0499597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int8 PASSED [ 26%] 2023-01-11T23:10:17.0499754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0499913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0500073Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float32 PASSED [ 26%] 2023-01-11T23:10:17.0500224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0500381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int16 PASSED [ 26%] 2023-01-11T23:10:17.0500532Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int32 PASSED [ 26%] 2023-01-11T23:10:17.0500688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0500840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_uint8 PASSED [ 26%] 2023-01-11T23:10:17.0500996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0501147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0501301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float16 
PASSED [ 26%] 2023-01-11T23:10:17.0501448Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float64 PASSED [ 26%] 2023-01-11T23:10:17.0501598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0501760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:10:17.0501924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex128 PASSED [ 26%] 2023-01-11T23:10:17.0502090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex32 PASSED [ 26%] 2023-01-11T23:10:17.0502247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float16 PASSED [ 26%] 2023-01-11T23:10:17.0502426Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int64 PASSED [ 26%] 2023-01-11T23:10:17.0502583Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bool PASSED [ 26%] 2023-01-11T23:10:17.0502733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0502887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0503038Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0503192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0503343Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0503496Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0503650Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0503809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0503954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0504106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0504280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0504456Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0504633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0504804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0504999Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0505183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0505357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0505529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0505706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0505866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0506019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0506175Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0506337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0506500Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0506664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0506823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0506971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0507123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0507273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0507428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0507581Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0507737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0507892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex32 XFAIL [ 27%] 2023-01-11T23:10:17.0508067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0508214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0508365Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0508513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0508723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0508887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0509046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0509208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0509367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0509520Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0509679Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0509843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0510002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0510160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0510316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0510474Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0510652Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0510810Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0510963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0511122Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511275Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0511428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0511600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0511905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0512058Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0512201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0512354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0512506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0512653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0512808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0512957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0513103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0513252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0513397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0513600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0513843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0514832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0515029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:10:17.0515246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515678Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0515892Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0516794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0517007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:10:17.0517166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0517328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0517485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0517644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0517803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0517957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0518115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0518274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0518428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0518596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0518750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0518909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0519063Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0519240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0519396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0519556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0519716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0519872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int32 PASSED [ 27%] 2023-01-11T23:10:17.0520026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0520179Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0520404Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0520627Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0520871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0521240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0521458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521698Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:10:17.0521868Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0522033Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0522367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0522692Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0522870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0523226Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0523570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0523744Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0523915Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0524084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0524272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0524463Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0524675Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0524855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 27%] 
2023-01-11T23:10:17.0525370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0525542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0525718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0525908Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0526095Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 27%] 2023-01-11T23:10:17.0526291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0526483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 27%] 2023-01-11T23:10:17.0526669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 27%] 2023-01-11T23:10:17.0526851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0527034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 27%] 2023-01-11T23:10:17.0527246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 27%] 2023-01-11T23:10:17.0527433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0527615Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0527799Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0527970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:10:17.0528140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0528315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 27%] 2023-01-11T23:10:17.0528481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0528651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 27%] 2023-01-11T23:10:17.0528817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0529002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 27%] 2023-01-11T23:10:17.0529185Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 27%] 2023-01-11T23:10:17.0529366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 27%] 2023-01-11T23:10:17.0529545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 27%] 2023-01-11T23:10:17.0529713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float64 XFAIL [ 27%] 2023-01-11T23:10:17.0529896Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED 
[ 28%] 2023-01-11T23:10:17.0530096Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0530298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0530502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0530681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0530854Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0531032Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0531210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0531382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0531555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0531717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0531889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0532056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0532224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0532388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0532580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0532746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0532916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0533082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0533244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0533409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0533594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0533775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0533963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0534145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0534325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0534626Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0534819Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0534997Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0535180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0535361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0535542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0535814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0535995Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0536171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0536350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0536525Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0536696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0536882Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0537058Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0537233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0537406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0537574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0537749Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0537919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0538087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0538282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0538455Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0538627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0538815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0539005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0539192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0539376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0539558Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0539742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0539918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0540075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0540234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0540390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0540542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0540696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0540855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0541010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0541160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0541337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0541490Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0541641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0541803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0541958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0542122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0542283Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0542446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0542598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0542759Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0542913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0543063Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0543225Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0543389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0543549Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0543707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0543878Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0544034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0544188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0544338Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int64 PASSED [ 
28%] 2023-01-11T23:10:17.0544489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0544641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0544797Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0544952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0545108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0545268Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0545425Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0545579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0545734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0545887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0546049Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0546203Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0546353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0546502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0546661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0546821Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0547005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0547160Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0547309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0547462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0547618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0547765Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0547917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0548071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0548222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0548375Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0548534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0548769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0548934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0549090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0549253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0549411Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0549599Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0549760Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0549925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0550083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0550238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0550384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0550540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0550696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0550881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0551066Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0551223Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0551379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0551530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0551685Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0551831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0551982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0552140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bool PASSED [ 28%] 2023-01-11T23:10:17.0552305Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex128 PASSED [ 28%] 2023-01-11T23:10:17.0552472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0552635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_complex64 PASSED [ 28%] 2023-01-11T23:10:17.0552829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0552987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float64 PASSED [ 28%] 2023-01-11T23:10:17.0553141Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0553299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0553454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0553612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0553766Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0553925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex32 PASSED [ 28%] 2023-01-11T23:10:17.0554080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float32 PASSED [ 28%] 2023-01-11T23:10:17.0554233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int8 PASSED [ 28%] 
2023-01-11T23:10:17.0554379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0554533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:10:17.0554684Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0554835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int64 PASSED [ 28%] 2023-01-11T23:10:17.0554985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int8 PASSED [ 28%] 2023-01-11T23:10:17.0555137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_uint8 PASSED [ 28%] 2023-01-11T23:10:17.0555320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0555471Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int16 PASSED [ 28%] 2023-01-11T23:10:17.0555619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int32 PASSED [ 28%] 2023-01-11T23:10:17.0555777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float16 PASSED [ 28%] 2023-01-11T23:10:17.0555931Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0556083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int64 PASSED [ 29%] 2023-01-11T23:10:17.0556232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_uint8 PASSED [ 29%] 2023-01-11T23:10:17.0556387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0556545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex128 PASSED [ 29%] 2023-01-11T23:10:17.0556705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_complex64 PASSED [ 29%] 2023-01-11T23:10:17.0556859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float16 PASSED [ 29%] 2023-01-11T23:10:17.0557007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0557157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0557308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int64 PASSED [ 29%] 2023-01-11T23:10:17.0557461Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_uint8 PASSED [ 29%] 2023-01-11T23:10:17.0557613Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0557768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex32 PASSED [ 29%] 2023-01-11T23:10:17.0557917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0558069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int32 PASSED [ 29%] 2023-01-11T23:10:17.0558214Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int8 PASSED [ 29%] 2023-01-11T23:10:17.0558401Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex64 PASSED [ 29%] 2023-01-11T23:10:17.0558561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float64 PASSED [ 29%] 2023-01-11T23:10:17.0558718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int16 PASSED [ 29%] 2023-01-11T23:10:17.0558872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int8 PASSED [ 29%] 2023-01-11T23:10:17.0559025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:10:17.0559174Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0559325Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0559469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0559629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0559784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0559937Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0560092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0560244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0560399Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0560550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0560701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0560855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex32 PASSED [ 29%]
2023-01-11T23:10:17.0561050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0561224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0561385Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0561533Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0561685Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0561841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0561989Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0562140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0562287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0562441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0562597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0562756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0562906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0563053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0563204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0563349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0563519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0563697Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0563873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0564068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0564236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0564400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0564564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0564721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0564886Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0565048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0565212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0565372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0565542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0565704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0565864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0566021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0566174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0566334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0566524Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0566686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0566852Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0567014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0567178Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0567342Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0567497Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0567658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0567817Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0567980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0568141Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0568302Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0568458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0568618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0568775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0568927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0569086Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0569244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0569404Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0569563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0569745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0569905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0570072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0570252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0570444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0570615Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0570781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0570942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0571126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0571314Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 29%]
2023-01-11T23:10:17.0571498Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0571681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0571856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0572036Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0572238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0572402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0572567Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0572728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0572888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0573077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0573263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0573440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0573628Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0573808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0573986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0574170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0574354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0574653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0574833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0575019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0575176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bfloat16 PASSED [ 29%]
2023-01-11T23:10:17.0575376Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0575545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0575706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0575870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0576029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0576191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0576355Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0576512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0576673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0576836Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0577022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0577202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0577379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0577556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0577736Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0577944Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0578119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0578300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0578478Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0578659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 29%]
2023-01-11T23:10:17.0578823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0578990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0579155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0579321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int64 PASSED [ 29%]
2023-01-11T23:10:17.0579487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0579641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0579801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0579954Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0580112Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex128 PASSED [ 29%]
2023-01-11T23:10:17.0580269Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0580424Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0580577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float64 PASSED [ 29%]
2023-01-11T23:10:17.0580730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int16 PASSED [ 29%]
2023-01-11T23:10:17.0580875Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int32 PASSED [ 29%]
2023-01-11T23:10:17.0581054Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_uint8 PASSED [ 29%]
2023-01-11T23:10:17.0581211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bool PASSED [ 29%]
2023-01-11T23:10:17.0581372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex64 PASSED [ 29%]
2023-01-11T23:10:17.0581528Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float16 PASSED [ 29%]
2023-01-11T23:10:17.0581680Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float32 PASSED [ 29%]
2023-01-11T23:10:17.0581830Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0581984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0582138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0582282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0582442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0582603Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0582764Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0582922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0583075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0583229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0583385Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0583556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0583715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0583871Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0584023Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0584176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0584332Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0584489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0584643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0584788Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0584951Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0585107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0585270Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0585424Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0585575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0585725Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0585877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0586022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0586174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0586328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0586483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0586665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0586820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0586972Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0587122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0587271Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0587413Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0587580Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0587745Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0587906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0588065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0588218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0588372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0588523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0588724Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0588892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0589052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0589233Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0589390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0589543Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0589701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0589855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0590000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0590150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0590308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0590468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0590625Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0590782Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0590962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0591147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bool XFAIL [ 30%]
2023-01-11T23:10:17.0591309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex128 XFAIL [ 30%]
2023-01-11T23:10:17.0591476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_complex64 XFAIL [ 30%]
2023-01-11T23:10:17.0591639Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float32 XFAIL [ 30%]
2023-01-11T23:10:17.0591798Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int16 XFAIL [ 30%]
2023-01-11T23:10:17.0591958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int32 XFAIL [ 30%]
2023-01-11T23:10:17.0592119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int64 XFAIL [ 30%]
2023-01-11T23:10:17.0592279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_int8 XFAIL [ 30%]
2023-01-11T23:10:17.0592470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_uint8 XFAIL [ 30%]
2023-01-11T23:10:17.0592625Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0592768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0592925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0593078Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0593229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0593382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0593534Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0593682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0593833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0593987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0594148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0594304Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0594457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0594621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0594783Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0594967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0595125Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0595277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0595428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0595588Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0595743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0595897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0596047Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0596199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0596350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0596495Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0596659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0596808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0596962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0597124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0597277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0597427Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0597578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0597728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0597879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0598062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0598230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0598392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0598555Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0598718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0598877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0599034Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0599190Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0599348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0599508Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0599666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0599816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0599967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0600123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0600278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0600431Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0600611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0600767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0600919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0601069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0601230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0601396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0601556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0601715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0601868Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0602027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0602183Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int32 PASSED [ 30%]
2023-01-11T23:10:17.0602341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int8 PASSED [ 30%]
2023-01-11T23:10:17.0602504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0602669Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0602827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0602983Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0603134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0603292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0603453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0603607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0603822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0603975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int64 PASSED [ 30%]
2023-01-11T23:10:17.0604127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_uint8 PASSED [ 30%]
2023-01-11T23:10:17.0604290Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0604448Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0604604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0604764Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float32 PASSED [ 30%]
2023-01-11T23:10:17.0604924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int16 PASSED [ 30%]
2023-01-11T23:10:17.0605079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex128 PASSED [ 30%]
2023-01-11T23:10:17.0605236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0605387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0605546Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex64 PASSED [ 30%]
2023-01-11T23:10:17.0605702Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0605853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float64 PASSED [ 30%]
2023-01-11T23:10:17.0606013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bfloat16 PASSED [ 30%]
2023-01-11T23:10:17.0606167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_bool PASSED [ 30%]
2023-01-11T23:10:17.0606346Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex32 PASSED [ 30%]
2023-01-11T23:10:17.0606505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float16 PASSED [ 30%]
2023-01-11T23:10:17.0606657Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0606807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0606957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0607104Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0607254Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0607406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0607564Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0607713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0607867Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0608019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0608172Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0608324Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0608476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0608635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0608794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0608949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0609107Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0609262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0609442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex32 PASSED [ 31%]
2023-01-11T23:10:17.0609597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0609750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0609902Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0610051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0610200Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0610348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0610504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0610657Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0610816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0610963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0611122Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0611280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0611433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0611587Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0611741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0611919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0612069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0612217Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0612372Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0612523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0612682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0612840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0612993Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0613143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0613298Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_add_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613450Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_arange_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613622Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_right_shift_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613784Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_xor_cuda PASSED [ 31%]
2023-01-11T23:10:17.0613941Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_min_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0614097Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_copysign_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614257Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_embed_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614423Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_copy_cuda PASSED [ 31%]
2023-01-11T23:10:17.0614935Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diagonal_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615143Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_floor_rounding_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615296Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dsplit_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615507Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_dstack_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0615671Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eq_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615830Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0615985Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616144Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616299Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616455Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_hfftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616611Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft2_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616771Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0616928Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ifftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617083Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617241Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfftn_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617399Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fliplr_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617565Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_floor_divide_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617719Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmax_cuda PASSED [ 31%]
2023-01-11T23:10:17.0617902Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmin_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618055Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fmod_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618210Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ge_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618363Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gt_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618521Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hypot_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618675Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igamma_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618833Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_igammac_cuda PASSED [ 31%]
2023-01-11T23:10:17.0618993Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_index_select_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0619139Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lcm_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619298Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_and_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619464Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_or_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619615Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_lt_cuda PASSED [ 31%]
2023-01-11T23:10:17.0619772Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_masked_fill_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0619926Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mean_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620081Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_minimum_cuda XFAIL [ 31%]
2023-01-11T23:10:17.0620232Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_mul_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620384Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_neg_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620538Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nextafter_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620707Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_gelu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0620895Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621094Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_huber_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621282Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621453Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_prelu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621612Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_as_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621774Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_reshape_cuda PASSED [ 31%]
2023-01-11T23:10:17.0621929Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_zeta_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622085Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sub_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622248Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_sum_to_size_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622407Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_triu_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622565Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_unbind_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622720Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vsplit_cuda PASSED [ 31%]
2023-01-11T23:10:17.0622876Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_xlogy_cuda PASSED [ 31%]
2023-01-11T23:10:17.0623055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0623231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0623402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0623611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0623787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0623960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0624133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0624337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0624517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0624719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0624894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0625245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0625447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0625647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0625844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0626038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0626229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0626426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_uint8 PASSED [ 31%]
2023-01-11T23:10:17.0626672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0626891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0627110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0627478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.bfloat16 doesn't support nvfuser) [ 31%]
2023-01-11T23:10:17.0627697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0628047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 31%]
2023-01-11T23:10:17.0628418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.bfloat16 doesn't support nvfuser) [ 31%]
2023-01-11T23:10:17.0628835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0629070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0629264Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0629456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0629681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex32 PASSED [ 31%]
2023-01-11T23:10:17.0629877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0630072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0630256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0630440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0630627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0630843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0631649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 31%]
2023-01-11T23:10:17.0631869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0632086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0632425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 31%]
2023-01-11T23:10:17.0632650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0632846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float16 PASSED [ 31%]
2023-01-11T23:10:17.0633035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float32 PASSED [ 31%]
2023-01-11T23:10:17.0633227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int32 PASSED [ 31%]
2023-01-11T23:10:17.0633416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int8 PASSED [ 31%]
2023-01-11T23:10:17.0633634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0633850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 31%]
2023-01-11T23:10:17.0634050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bfloat16 PASSED [ 31%]
2023-01-11T23:10:17.0634240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_bool PASSED [ 31%]
2023-01-11T23:10:17.0634445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex128 PASSED [ 31%]
2023-01-11T23:10:17.0634640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex64 PASSED [ 31%]
2023-01-11T23:10:17.0634837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float64 PASSED [ 31%]
2023-01-11T23:10:17.0635029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int16 PASSED [ 31%]
2023-01-11T23:10:17.0635246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int64 PASSED [ 31%]
2023-01-11T23:10:17.0635438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0635661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0635882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0636254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.cdouble doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0636599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%]
2023-01-11T23:10:17.0636970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.cdouble doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0637185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0637398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0637593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0637787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0637978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int64 PASSED [ 32%]
2023-01-11T23:10:17.0638172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0638418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0638784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.cfloat doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0639005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.cfloat doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0639585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0639992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0640180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0640374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0640566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int16 PASSED [ 32%]
2023-01-11T23:10:17.0640756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0640969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0641193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0641848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0642059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0642405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%]
2023-01-11T23:10:17.0642767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.chalf doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0643109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0643297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0643493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0643686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0643878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0644095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int16 PASSED [ 32%]
2023-01-11T23:10:17.0644280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0644491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0644709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0644928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0645296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0645519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0645873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.char doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0646223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.char doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0646436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0646633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0646859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0647229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.complex doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0647425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0647613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0647813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex128 PASSED [ 32%]
2023-01-11T23:10:17.0648011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex32 PASSED [ 32%]
2023-01-11T23:10:17.0648208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0648403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0648596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int64 PASSED [ 32%]
2023-01-11T23:10:17.0648787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0649008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0649367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0649586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0649971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.double doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0650334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.double doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0650681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0650877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0651097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0651320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0651514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0651709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float64 PASSED [ 32%]
2023-01-11T23:10:17.0651897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0652083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int8 PASSED [ 32%]
2023-01-11T23:10:17.0652296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0652656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%]
2023-01-11T23:10:17.0652905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.float doesn't support nvfuser) [ 32%]
2023-01-11T23:10:17.0653698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0653913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0654251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0654449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bfloat16 PASSED [ 32%]
2023-01-11T23:10:17.0654762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0654958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0655144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float16 PASSED [ 32%]
2023-01-11T23:10:17.0655336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float32 PASSED [ 32%]
2023-01-11T23:10:17.0655526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int32 PASSED [ 32%]
2023-01-11T23:10:17.0655716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0655973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:10:17.0656972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%]
2023-01-11T23:10:17.0657165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bool PASSED [ 32%]
2023-01-11T23:10:17.0657357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex128 PASSED [ 32%]
2023-01-11T23:10:17.0657551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_complex64 PASSED [ 32%]
2023-01-11T23:10:17.0657742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_uint8 PASSED [ 32%]
2023-01-11T23:10:17.0657959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658171Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0658859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0659415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0659816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:10:17.0660012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0660207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0660397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0660584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0660799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:10:17.0661676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0661894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0662238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.long doesn't support nvfuser) [ 32%] 2023-01-11T23:10:17.0662579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0662791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int64 
SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0663138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:10:17.0663357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0663549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0663739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:10:17.0663931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:10:17.0664121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0664338Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:10:17.0664528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:10:17.0664706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0664889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0665111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0665936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0666121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:10:17.0666305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0666477Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:10:17.0666654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:10:17.0666828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0667004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0667231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:10:17.0667440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0667779Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:10:17.0667988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0668191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0668517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0668754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0668958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0669276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:10:17.0669457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0669638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:10:17.0669815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0670021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:10:17.0670232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0670436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0670810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:10:17.0670984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int32 PASSED [ 32%] 2023-01-11T23:10:17.0671186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:10:17.0671366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:10:17.0671548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:10:17.0671728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0671905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0672083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0672260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0672437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0672637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for 
speed) [ 33%] 2023-01-11T23:10:17.0672841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0673810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int32 SKIPPED (_refs.acosh doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0673996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0674178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex32 PASSED [ 33%] 2023-01-11T23:10:17.0674363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0674542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0674714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0674890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0675066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0675269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0675895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0676096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0676474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0676865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0677052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0677265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0677479Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0677666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0677852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0678035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0678216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0678388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0678589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0678801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0679010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0679344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0679526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0679706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0679890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0680072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0680250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0680422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0680599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0680805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float32 SKIPPED (_refs.addr doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0681610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0681938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0682250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int32 SKIPPED (_refs.addr doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0682428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bfloat16 PASSED [ 
33%] 2023-01-11T23:10:17.0682605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0682782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0682992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0683176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0683499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0683699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0683896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0684083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0684272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0684453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0684650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0684870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0685616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float32 SKIPPED (_refs.allclose doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0685826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0686007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bool PASSED [ 33%] 2023-01-11T23:10:17.0686185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0686366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0686541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0686709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0686883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0687110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0687316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float16 SKIPPED 
(skipped for speed) [ 33%] 2023-01-11T23:10:17.0687498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0687703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0688028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0688229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0688411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0688592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0688766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0688942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0689119Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0689444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0689765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0689966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0690148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0690345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_bool PASSED [ 33%] 2023-01-11T23:10:17.0690526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0690701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0690878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0691058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0691232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0691414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0691589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int64 PASSED [ 33%] 2023-01-11T23:10:17.0691798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 
2023-01-11T23:10:17.0692502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0692695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0692956Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0693141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0693323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0693499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0693677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0693888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0694211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float32 SKIPPED (_refs.arange doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0694635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int32 SKIPPED (_refs.arange doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0694957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0695146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0695331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0695514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0695697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0695874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0696091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0696776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0697308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int64 SKIPPED (skipped 
for speed) [ 33%] 2023-01-11T23:10:17.0697723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0697923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0698128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0698330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_complex64 PASSED [ 33%] 2023-01-11T23:10:17.0698530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0698761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0698990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0699795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0700147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%] 2023-01-11T23:10:17.0700397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0700623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bfloat16 PASSED [ 33%] 2023-01-11T23:10:17.0700821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex128 PASSED [ 33%] 2023-01-11T23:10:17.0701013Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float16 PASSED [ 33%] 2023-01-11T23:10:17.0701204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float32 PASSED [ 33%] 2023-01-11T23:10:17.0701396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0701619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0701853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0702076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0702292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%] 
2023-01-11T23:10:17.0702652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float32 SKIPPED (_refs.as_strided_scatter doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0702870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0703215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%] 2023-01-11T23:10:17.0703568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided_scatter doesn't support nvfuser) [ 33%] 2023-01-11T23:10:17.0703782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0703965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex32 PASSED [ 33%] 2023-01-11T23:10:17.0704145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float64 PASSED [ 33%] 2023-01-11T23:10:17.0704321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int16 PASSED [ 33%] 2023-01-11T23:10:17.0704523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int32 PASSED [ 33%] 2023-01-11T23:10:17.0704699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int8 PASSED [ 33%] 2023-01-11T23:10:17.0704878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_uint8 PASSED [ 33%] 2023-01-11T23:10:17.0705084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%] 2023-01-11T23:10:17.0705284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0705492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0705833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0706038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0706362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0706544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0706722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0706901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0707078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0707252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0707574Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float32 SKIPPED (_refs.asinh doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0707808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0708133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0708450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int32 SKIPPED (_refs.asinh doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0708715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0708904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0709076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0709253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0709458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0709658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0709842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0710166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0710346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0710555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0710731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0710903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0711080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0711259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0711464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0711664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0711871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0712051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0712376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0712577Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0712760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0712938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0713124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0713309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0713489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0713700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0713881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0714059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0714265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0714609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0714815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0715412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0715598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0715922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0716126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0716339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0716534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex32 PASSED [ 34%] 2023-01-11T23:10:17.0716723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0716902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0717087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0717267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0717447Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0717624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0717803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0717982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0718193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0718818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0719442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0719623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0719810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex128 PASSED [ 34%] 2023-01-11T23:10:17.0719993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0720176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0720391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0720587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int64 PASSED [ 34%] 2023-01-11T23:10:17.0720765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0720940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0721153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0721368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0721706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_2d doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0722072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0722283Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0722469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bfloat16 PASSED [ 34%] 2023-01-11T23:10:17.0722654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_bool PASSED [ 34%] 2023-01-11T23:10:17.0722834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex64 PASSED [ 34%] 2023-01-11T23:10:17.0723017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float16 PASSED [ 34%] 2023-01-11T23:10:17.0723201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float32 PASSED [ 34%] 2023-01-11T23:10:17.0723384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_float64 PASSED [ 34%] 2023-01-11T23:10:17.0723565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0723739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0723916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_uint8 PASSED [ 34%] 2023-01-11T23:10:17.0724127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0724481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%] 2023-01-11T23:10:17.0724692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0725051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_3d doesn't support nvfuser) [ 34%] 2023-01-11T23:10:17.0725255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0725585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%] 2023-01-11T23:10:17.0725792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0726122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%] 2023-01-11T23:10:17.0726331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%] 2023-01-11T23:10:17.0726517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int16 PASSED [ 34%] 2023-01-11T23:10:17.0726702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int32 PASSED [ 34%] 2023-01-11T23:10:17.0726884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int8 PASSED [ 34%] 2023-01-11T23:10:17.0727064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_uint8 PASSED [ 34%] 
2023-01-11T23:10:17.0727391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0727607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0727814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0728028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0728219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0728401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0728581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0728929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0729145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0729330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0729512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0729714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0730044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0730250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0730436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0730626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0730867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0731078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0731412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0731744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0731948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0732137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0732331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0732523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0732867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:10:17.0733210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0733392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0733600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0733787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0733996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0734201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0734656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_nvfuser_cuda_float32 SKIPPED (_refs.broadcast_shapes doesn't support nvfuser) [ 34%]
2023-01-11T23:10:17.0734857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:10:17.0735049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:10:17.0735244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:10:17.0735440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:10:17.0735631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:10:17.0735815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:10:17.0736002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:10:17.0736185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0736403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0736662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0736888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0737081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0737296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0737481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:10:17.0737666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:10:17.0737854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:10:17.0738041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:10:17.0738254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:10:17.0738867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:10:17.0739236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:10:17.0739451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0739634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float16 XFAIL [ 35%]
2023-01-11T23:10:17.0739814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int16 XFAIL [ 35%]
2023-01-11T23:10:17.0739991Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int32 XFAIL [ 35%]
2023-01-11T23:10:17.0740162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int64 XFAIL [ 35%]
2023-01-11T23:10:17.0740337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_uint8 XFAIL [ 35%]
2023-01-11T23:10:17.0740524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float32 XFAIL [ 35%]
2023-01-11T23:10:17.0740703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int32 XFAIL [ 35%]
2023-01-11T23:10:17.0741036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0741244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0741422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0741599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0741782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:10:17.0741966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0742140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0742342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0742512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0742689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0743029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0743236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0743558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0743762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0744085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0744283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0744464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0744645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0744815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0745021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0745202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0745378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0745581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0745896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float32 SKIPPED (_refs.ceil doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0746099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0746418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0746731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int32 SKIPPED (_refs.ceil doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0746930Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0747243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0747443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0747623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0747807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0747988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0748166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0748367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0748545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0748797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0749006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0749359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0749572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0749891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float32 SKIPPED (_refs.chunk doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0750214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0750527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int32 SKIPPED (_refs.chunk doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0750728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0750911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0751118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0751293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0751468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0751644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0751967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0752174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0752487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0752810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0753017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0753203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0753387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0753570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0753746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0753924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0754138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0754380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0754714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp_max doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0755040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp_max doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0755246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0755448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0755638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0755819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0755997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0756179Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0756388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0756596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0756921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int32 SKIPPED (_refs.clamp_min doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0757153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0757354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0757534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0757711Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0757888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0758090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0758678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0758881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0759084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0759268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0759461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:10:17.0759646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0759833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0760062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0760876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float32 SKIPPED (_refs.column_stack doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0761209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int32 SKIPPED (_refs.column_stack doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0761549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0761738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0761918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0762099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0762275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0762445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0762655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0762885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0763204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float32 SKIPPED (_refs.conj doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0763525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0763836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int32 SKIPPED (_refs.conj doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0764035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0764226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0764420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:10:17.0764611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0764787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0764971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:10:17.0765155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0765340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0765554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0765771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0766137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:10:17.0766484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int32 SKIPPED (_refs.conj_physical doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0766818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0767029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0767223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0767409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0767601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0767792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0767978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:10:17.0768196Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0768407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0768651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float32 SKIPPED (_refs.constant_pad_nd doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0769218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0769764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:10:17.0769950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:10:17.0770135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:10:17.0770323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:10:17.0770512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:10:17.0770698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0770883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0771067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:10:17.0771247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:10:17.0771457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0771664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:10:17.0772252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0772804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float32 SKIPPED (_refs.contiguous doesn't support nvfuser) [ 35%]
2023-01-11T23:10:17.0772990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:10:17.0773172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:10:17.0773354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:10:17.0773541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:10:17.0773749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:10:17.0773949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0774277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float32 SKIPPED (_refs.copysign doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0774614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0774947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0775155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0775331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0775513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0775697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0775877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0776051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0776232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0776404Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0776609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0776811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0777332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0777754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0777943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0778117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0778296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0778475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0778680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0778888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0779072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0779253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0779575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0779755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0779936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0780106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0780318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0780530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0780741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0781066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float32 SKIPPED (_refs.cumsum doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0781270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0781596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0781923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int32 SKIPPED (_refs.cumsum doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0782249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0782435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0782618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0782802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0782984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0783166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0783352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0783590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0783945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%]
2023-01-11T23:10:17.0784156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0784490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float32 SKIPPED (_refs.diag_embed doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0784671Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0784853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0785033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0785217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0785398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0785575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0785751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0785929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0786104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0786305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0786472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0786683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0786886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0787205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float32 SKIPPED (_refs.diag doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0787524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0787726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0787918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0788106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0788301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0788491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0788733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0788922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0789136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0789943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0790286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal_copy doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0790497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0790834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0791022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0791204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0791386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0791595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0791803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0792149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%]
2023-01-11T23:10:17.0792386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0792717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0792925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0793458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0793844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0794043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0794237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0794422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0794611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0794792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0794979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0795164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0795415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0795767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal_scatter doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0795987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0796336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal_scatter doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0796675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0796893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0797079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_bool PASSED [ 36%]
2023-01-11T23:10:17.0797265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0797448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0797629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0797810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0798014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0798368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float32 SKIPPED (_refs.digamma doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0798701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%]
2023-01-11T23:10:17.0799026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int32 SKIPPED (_refs.digamma doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0799352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0799559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0799755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0799950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0800146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0800337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0800521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0800705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int8 PASSED [ 36%]
2023-01-11T23:10:17.0800922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0801716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0801931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0802281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0802495Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0802694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0802906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex32 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0803113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex64 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0803300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0803493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0803684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_float64 PASSED [ 36%]
2023-01-11T23:10:17.0803899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int64 PASSED [ 36%]
2023-01-11T23:10:17.0804088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0804311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0804523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex128 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0804737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex64 SKIPPED (Skipped!) [ 36%]
2023-01-11T23:10:17.0804955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0805298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0805509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0805853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0806069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0806261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float32 PASSED [ 36%]
2023-01-11T23:10:17.0806450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0806636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int32 PASSED [ 36%]
2023-01-11T23:10:17.0806822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_uint8 PASSED [ 36%]
2023-01-11T23:10:17.0807060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0807392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 36%]
2023-01-11T23:10:17.0807604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%]
2023-01-11T23:10:17.0807941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%]
2023-01-11T23:10:17.0808118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bfloat16 PASSED [ 36%]
2023-01-11T23:10:17.0808308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex128 PASSED [ 36%]
2023-01-11T23:10:17.0808498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex32 PASSED [ 36%]
2023-01-11T23:10:17.0808680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_complex64 PASSED [ 36%]
2023-01-11T23:10:17.0808862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float16 PASSED [ 36%]
2023-01-11T23:10:17.0809044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int16 PASSED [ 36%]
2023-01-11T23:10:17.0809228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int64 PASSED [ 37%]
2023-01-11T23:10:17.0809405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_uint8 PASSED [ 37%]
2023-01-11T23:10:17.0809636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0809960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.dsplit doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0810285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.dsplit doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0810606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0810812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0810997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bfloat16 PASSED [ 37%]
2023-01-11T23:10:17.0811187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex128 PASSED [ 37%]
2023-01-11T23:10:17.0811370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex32 PASSED [ 37%]
2023-01-11T23:10:17.0811557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_complex64 PASSED [ 37%]
2023-01-11T23:10:17.0811739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0811919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_uint8 PASSED [ 37%]
2023-01-11T23:10:17.0812123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812533Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0812738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0813084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.dstack doesn't support nvfuser) [ 37%]
2023-01-11T23:10:17.0813287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0813618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0813815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0814118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bool SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0814428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex128 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0814835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_complex64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0815742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int32 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_uint8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0816997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0817294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for empty) [ 37%]
2023-01-11T23:10:17.0817614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0817925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_bool SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int32 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0818862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_uint8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0819796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0820835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for empty_like) [ 37%]
2023-01-11T23:10:17.0821641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bool PASSED [ 37%]
2023-01-11T23:10:17.0821823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex32 PASSED [ 37%]
2023-01-11T23:10:17.0822003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0822180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float64 PASSED [ 37%]
2023-01-11T23:10:17.0822359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int16 PASSED [ 37%]
2023-01-11T23:10:17.0822528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int64 PASSED [ 37%]
2023-01-11T23:10:17.0822732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0822940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0823169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%]
2023-01-11T23:10:17.0823346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int32 PASSED [ 37%]
2023-01-11T23:10:17.0823665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%]
2023-01-11T23:10:17.0823846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bfloat16 PASSED [ 37%]
2023-01-11T23:10:17.0824021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_bool PASSED [ 37%]
2023-01-11T23:10:17.0824198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float16 PASSED [ 37%]
2023-01-11T23:10:17.0824376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float32 PASSED [ 37%]
2023-01-11T23:10:17.0824551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0824729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0824901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0825106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0825283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0825489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0825667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0825871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0826217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0826394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0826576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0826750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0826928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0827126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0827330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0827517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0827844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0828162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0828364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0828545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0828786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0828994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0829173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0829343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0829516Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0829724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0829926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0830253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0830458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0830634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0830811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0830987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0831164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0831334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0831536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0831743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0832407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int32 SKIPPED (_refs.exp2 doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0832609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0832994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0833176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0833347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0833525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0833702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0833906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0834243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:10:17.0834450Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0834658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0834864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0835186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0835365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0835680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0835882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0836070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0836263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0836450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:10:17.0836634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0836816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0836994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0837174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0837387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0837590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0837831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0838167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.expand_as doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0838500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0838827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.expand_as doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0839035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0839223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0839411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_bool PASSED [ 37%] 
2023-01-11T23:10:17.0839597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0839773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0839943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0840120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:10:17.0840327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0840560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0840897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float32 SKIPPED (_refs.expand doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0841266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:10:17.0841586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int32 SKIPPED (_refs.expand doesn't support nvfuser) [ 37%] 2023-01-11T23:10:17.0841786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0841992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0842175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:10:17.0842348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0842528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:10:17.0842706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0842910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0843091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0843293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0843621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:10:17.0843844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:10:17.0844026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:10:17.0844202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:10:17.0844380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:10:17.0844556Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:10:17.0844732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:10:17.0844909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:10:17.0845082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:10:17.0845260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0845460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0845779Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float32 SKIPPED (_refs.eye doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0846087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int32 SKIPPED (_refs.eye doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0846282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0846626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0846831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0847021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0847203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0847383Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0847559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0847737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0847916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0848087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0848434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0848640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0848967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0849172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0849502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0849735Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0849917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0850105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0850289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0850471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0850645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0850824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0851029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0851797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0851998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0852561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0852935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0853121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0853307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0853491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0853673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0853851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0854034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0854212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0854388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_uint8 PASSED [ 38%] 
2023-01-11T23:10:17.0854806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0855015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0855345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0855537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:10:17.0855774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0855966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0856150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0856333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0856520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0856739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0856943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fftshift doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0857709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0857918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0858135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0858324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0858510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0858693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0858872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0859047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0859228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0859440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) 
[ 38%] 2023-01-11T23:10:17.0859794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0860001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0860206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0860535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0860721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0860905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0861090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0861292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0861478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0861658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0861863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0862066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0862392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0862581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0862769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0862950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0863131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0863306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0863516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0863727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0863960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0864294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0864501Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0864830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0865035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0865364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0865557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0865739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0865924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0866104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0866282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:10:17.0866462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0866637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0866989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0867223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0867554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0867763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0868082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0868407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0868616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0868859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0869045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0869232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0869416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0869599Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0869809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0869988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0870336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:10:17.0870570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0870922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:10:17.0871249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifft doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0871454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0871782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0871965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:10:17.0872153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:10:17.0872340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0872524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0872701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0872883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0873061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0873292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0873505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0873837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifftn doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0874045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0874646Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:10:17.0874832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:10:17.0875016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:10:17.0875202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0875384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0875567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0875812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:10:17.0876002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:10:17.0876218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0876847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0877187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0877397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0877581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:10:17.0877767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:10:17.0877951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:10:17.0878134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:10:17.0878313Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:10:17.0878518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0878750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0879087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:10:17.0879417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfft2 doesn't support nvfuser) [ 38%] 
2023-01-11T23:10:17.0879628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:10:17.0879954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:10:17.0880164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0880347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0880529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0880737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0880941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0881122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0881293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0881650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfft doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0881859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0882044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0882228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0882409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0882613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0882820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883356Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0883562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0883742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0883929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0884115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0884301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0884480Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0884686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0884867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0885046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0885215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0885393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0885567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0885774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0886131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0886342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0886678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0887005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0887218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0887405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0887586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0887768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0887948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0888128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0888306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0888516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0888725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0888934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0889118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0889297Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0889468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0889641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0889855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0890229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0890466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0890700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0891029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0891210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0891393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0891576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0891754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0891929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0892139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0892343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0892676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0892909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0893238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0893423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0893605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0893785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0893982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0894307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfft doesn't 
support nvfuser) [ 39%] 2023-01-11T23:10:17.0894614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0894942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfft doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0895266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0895453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0895635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0909596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0909863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0910312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0910643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.rfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0910860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0911079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0911290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:10:17.0911484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0911679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0911870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0912060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0912246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0912425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0912611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0912827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0913641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 
2023-01-11T23:10:17.0913861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0914075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0914398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.fill doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0914731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0915050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.fill doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0915242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0915441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:10:17.0915640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0915832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0916024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0916212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0916402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0916648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0916863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:10:17.0917433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0917981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float32 SKIPPED (_refs.flatten doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0918321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0918649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int32 SKIPPED (_refs.flatten doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0918864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0919058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0919245Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0919460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0919642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0919858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0920881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float32 SKIPPED (_refs.flip doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0921097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0921429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0921750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int32 SKIPPED (_refs.flip doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0922075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0922269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0922459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0922659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0922875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0923067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0923252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0923441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0923631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:10:17.0923818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:10:17.0924038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0924248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0924582Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:10:17.0924910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int32 SKIPPED (_refs.fliplr doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0925109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:10:17.0925304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:10:17.0925496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0925714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0925907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:10:17.0926097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:10:17.0926315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0926962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0927175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0927503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int32 SKIPPED (_refs.flipud doesn't support nvfuser) [ 39%] 2023-01-11T23:10:17.0927717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:10:17.0928048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:10:17.0928246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:10:17.0928442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:10:17.0928641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:10:17.0928868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:10:17.0929063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:10:17.0929252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0929472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0929693Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0929914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0930345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int32 SKIPPED (_refs.float_power doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0930557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0930752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0930972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0931307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float32 SKIPPED (_refs.floor_divide doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0931641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0932012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int32 SKIPPED (_refs.floor_divide doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0932236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0932459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0932648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0932836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0933052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0933377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float32 SKIPPED (_refs.floor doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0933707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0934082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int32 SKIPPED (_refs.floor doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0934782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0935032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0938754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0938946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0939128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float32 PASSED [ 40%] 
2023-01-11T23:10:17.0939368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0939552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0939724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0939927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0940238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int32 SKIPPED (_refs.fmax doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0940419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0940597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0940770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0940941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0941109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0941311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0941510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0941820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float32 SKIPPED (_refs.fmin doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0942054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0942231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0942406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0942583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0942756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0942926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0943125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0943507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0943876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0944048Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0944248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0944558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float32 SKIPPED (_refs.frac doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0944733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0944931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0945105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0945423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0945727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int32 SKIPPED (_refs.gcd doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0945904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0946074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0946249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0946425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0946596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0946767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0946934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0947099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0947300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0947503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0947817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0947994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0948306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0948502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0969426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0969606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0969781Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0969949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0970125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0970287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0970487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0970685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0971055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0971366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0971600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0971789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0971975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:10:17.0972158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0972369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0972572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0972902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float32 SKIPPED (_refs.heaviside doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0973114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0973444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0973650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0973828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0974014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex128 PASSED [ 40%] 2023-01-11T23:10:17.0974225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0974403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0974705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0974879Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:10:17.0975086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0975427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%] 2023-01-11T23:10:17.0975632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0975946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.hsplit doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0976264Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0976466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0976641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0976819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0976995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:10:17.0977165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0977373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0977626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0977835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0978170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%] 2023-01-11T23:10:17.0978372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0978685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.hstack doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0979008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0979323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.hstack doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0979502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0979680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0979878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0980053Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0980261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0980433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:10:17.0980631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0980828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0981133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float32 SKIPPED (_refs.i0 doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0981330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0981631Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int32 SKIPPED (_refs.i0 doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0981826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0982140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0982321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0982503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:10:17.0982827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float32 SKIPPED (_refs.igammac doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0983037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0983218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex32 PASSED [ 40%] 2023-01-11T23:10:17.0983402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0983634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0983815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0984005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex32 PASSED [ 40%] 2023-01-11T23:10:17.0984185Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0984373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0984556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:10:17.0984771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0984983Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0985189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0985517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:10:17.0985709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:10:17.0985896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:10:17.0986109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0986317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0986955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0987165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:10:17.0987503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:10:17.0987836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_copy doesn't support nvfuser) [ 40%] 2023-01-11T23:10:17.0988023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:10:17.0988211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:10:17.0988402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex128 PASSED [ 40%] 2023-01-11T23:10:17.0988581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:10:17.0988847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.0989038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0989278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0989492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0989704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0990036Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_fill doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0990363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.0990690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_fill doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0990927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0991148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.0991330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0991516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.0991702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.0991887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0992126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0992764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0993098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_select doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0993307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0993629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.0993841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0994031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0994212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.0994392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0994574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0994784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0994997Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0995973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.0996178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0996365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.0996556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.0996739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.0996922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.0997104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0997314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0997527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0997755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0998084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float32 SKIPPED (_refs.isfinite doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0998294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0998614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int32 SKIPPED (_refs.isfinite doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.0998795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.0998975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.0999155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.0999335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.0999517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.0999723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.0999925Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1000128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1000496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1000699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1001077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1001390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1001573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1001752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1001934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1002115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1002289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.1002469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1002677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1002885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1003088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1003273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.1003593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1003801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1004125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1004306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1004491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1004674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1004853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int16 PASSED [ 41%] 2023-01-11T23:10:17.1005030Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int64 PASSED [ 41%] 2023-01-11T23:10:17.1005212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1005390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1005600Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1005806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1006014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1006332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1006520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1006700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1006950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1007279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isposinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1007603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isposinf doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1007808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1008129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1008320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1008504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float32 PASSED [ 41%] 2023-01-11T23:10:17.1008676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1008856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1009063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1009277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1009625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1009834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1010015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int16 PASSED [ 41%] 
2023-01-11T23:10:17.1010213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1010416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_uint8 PASSED [ 41%] 2023-01-11T23:10:17.1010737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1011036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int32 SKIPPED (_refs.lcm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1011243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1011562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1011741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1011919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1012124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1012303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1012616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%] 2023-01-11T23:10:17.1012817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1013026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1013205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1013387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex32 PASSED [ 41%] 2023-01-11T23:10:17.1013568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1013773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1013976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1014159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_bool PASSED [ 41%] 2023-01-11T23:10:17.1014343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1014621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int32 PASSED [ 41%] 2023-01-11T23:10:17.1014800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int8 PASSED [ 41%] 2023-01-11T23:10:17.1014998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1015187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float32 PASSED [ 41%] 
2023-01-11T23:10:17.1015515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%] 2023-01-11T23:10:17.1015764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1015975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1016174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1016369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1016590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1016814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.matrix_norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1017605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1017799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1017987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1018199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1018418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1018661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1019002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1019214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1019403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1019590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1019770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1019985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1020326Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.svd doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1020522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1020739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1020958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1021310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.svdvals doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1021560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1021755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1021976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.linalg.vector_norm doesn't support nvfuser) [ 41%] 2023-01-11T23:10:17.1022761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%] 2023-01-11T23:10:17.1022951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_bfloat16 PASSED [ 41%] 2023-01-11T23:10:17.1023142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex128 PASSED [ 41%] 2023-01-11T23:10:17.1023333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_complex64 PASSED [ 41%] 2023-01-11T23:10:17.1023518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float16 PASSED [ 41%] 2023-01-11T23:10:17.1023700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float64 PASSED [ 41%] 2023-01-11T23:10:17.1023880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int16 XFAIL [ 42%] 2023-01-11T23:10:17.1024058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int64 XFAIL [ 42%] 2023-01-11T23:10:17.1024229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_uint8 XFAIL [ 42%] 2023-01-11T23:10:17.1024465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1024684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1025015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float32 SKIPPED (_refs.linspace doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1025341Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1025662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int32 SKIPPED (_refs.linspace doesn't support nvfuser) [ 42%]
2023-01-11T23:10:17.1025875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1026202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:10:17.1026388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:10:17.1026570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:10:17.1026748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1026920Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1027099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:10:17.1027329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1027928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1028131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1028312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1028517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1028787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:10:17.1028977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1029163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:10:17.1029344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1029524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:10:17.1029702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1029907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1030239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1030473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1030657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1030831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:10:17.1031011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:10:17.1031191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1031366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1031548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1031758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1031966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:10:17.1032689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1032895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:10:17.1033072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:10:17.1033257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:10:17.1033441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex32 PASSED [ 42%]
2023-01-11T23:10:17.1033618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:10:17.1033796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:10:17.1033971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:10:17.1034149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:10:17.1034318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:10:17.1034494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:10:17.1034702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1034908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:10:17.1035110Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1035293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1035614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1035818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1036054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1036256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 42%] 2023-01-11T23:10:17.1036447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1036641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1036835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1037065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1037711Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1037911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1038129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1038351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1038565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1038746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1038941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1039129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_complex64 PASSED [ 42%] 2023-01-11T23:10:17.1039315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1039499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1039683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int32 PASSED [ 42%] 
2023-01-11T23:10:17.1039869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1040088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1040728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1040968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1041153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1041365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1041551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1041739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float64 PASSED [ 42%] 2023-01-11T23:10:17.1041922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1042104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1042320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1042955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1043143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1043355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1043687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1043903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1044090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1044280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float32 PASSED [ 42%] 2023-01-11T23:10:17.1044461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1044636Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1044845Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1045819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1046005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1046333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1046539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1046730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1046906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1047123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1047314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_complex64 PASSED [ 42%] 2023-01-11T23:10:17.1047502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1047687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1047873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1048056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1048241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1048460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1048677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1048880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1049066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1049247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int16 XFAIL [ 42%] 2023-01-11T23:10:17.1049456Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int8 PASSED [ 42%] 2023-01-11T23:10:17.1049639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1049855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float32 SKIPPED (_refs.logspace doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1050601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1050990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bfloat16 PASSED [ 42%] 2023-01-11T23:10:17.1051180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float16 PASSED [ 42%] 2023-01-11T23:10:17.1051367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1051551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1051730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1051908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_uint8 PASSED [ 42%] 2023-01-11T23:10:17.1052120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1052894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1053073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float64 PASSED [ 42%] 2023-01-11T23:10:17.1053249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int16 PASSED [ 42%] 2023-01-11T23:10:17.1053419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1053589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1053909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%] 2023-01-11T23:10:17.1054112Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1054294Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bool PASSED [ 42%] 2023-01-11T23:10:17.1054597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex128 PASSED [ 42%] 2023-01-11T23:10:17.1054792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex32 PASSED [ 42%] 2023-01-11T23:10:17.1054967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int32 PASSED [ 42%] 2023-01-11T23:10:17.1055149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int64 PASSED [ 42%] 2023-01-11T23:10:17.1055411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1055770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%] 2023-01-11T23:10:17.1056106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float32 SKIPPED (_refs.masked_fill doesn't support nvfuser) [ 42%] 2023-01-11T23:10:17.1056317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%] 2023-01-11T23:10:17.1056653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%] 2023-01-11T23:10:17.1056863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1057071Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1057257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1057431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1057611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1057789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1057969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1058178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1058505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float32 SKIPPED (_refs.maximum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1058745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1059276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059478Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1059652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1059838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1060021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1060229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1060432Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1060615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1060842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1061068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1061296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1061502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1061692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1061889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1062086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1062276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1062505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1062729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1062951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1063737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1063932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1064160Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1064367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1064571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1064770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1064966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1065196Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1065422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1065775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1066123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1066347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1066525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1066733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1066918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1067103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1067283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1067464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1067647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1067828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1068150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float32 SKIPPED (_refs.minimum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1068485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1068876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int32 SKIPPED (_refs.minimum doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1069085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1069266Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1069445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1069654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1069868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1070242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%] 2023-01-11T23:10:17.1070455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1070665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1071029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int32 SKIPPED (_refs.movedim doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1071203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1071382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1071566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1071745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1071923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1072098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1072272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1072457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1072660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1072994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1073183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1073366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1073551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1073730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1073911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1074090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1074300Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1074510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1074840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float32 SKIPPED (_refs.nan_to_num doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1075042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1075248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1075441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1075636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1075859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1076044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1076231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1076445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1076656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1076995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.narrow_copy doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1077322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1077659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.narrow_copy doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1077870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1078198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1078409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1078618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1078805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1078993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1079175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1079351Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1079523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1079701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1079882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:10:17.1080063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1080275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1080618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%] 2023-01-11T23:10:17.1080835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1081074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1081397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float32 SKIPPED (_refs.narrow doesn't support nvfuser) [ 43%] 2023-01-11T23:10:17.1081722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:10:17.1082065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:10:17.1082273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1082470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1082667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1082859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1083081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1083276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1083493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1083674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:10:17.1083850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:10:17.1084025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:10:17.1084202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1084400Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:10:17.1084602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1084810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1085403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1085781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1085973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1086158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:10:17.1086337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:10:17.1086510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1086689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:10:17.1086863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:10:17.1087039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1087216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:10:17.1087449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1087649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1087854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float32 PASSED [ 43%] 2023-01-11T23:10:17.1088241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int32 PASSED [ 43%] 2023-01-11T23:10:17.1088623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1088831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:10:17.1089141Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bool SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1089451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float16 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1089758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int32 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1090773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for new_empty) [ 43%] 2023-01-11T23:10:17.1091757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int8 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_uint8 SKIPPED (Can't check result for new_empty) [ 44%] 2023-01-11T23:10:17.1092620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1092854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1093841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094660Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1094895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 44%] 2023-01-11T23:10:17.1095084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1095273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex64 PASSED [ 44%] 2023-01-11T23:10:17.1095459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float16 PASSED [ 44%] 2023-01-11T23:10:17.1095642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1095815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int16 PASSED [ 44%] 2023-01-11T23:10:17.1095994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int32 PASSED [ 44%] 2023-01-11T23:10:17.1096273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_uint8 PASSED [ 44%] 2023-01-11T23:10:17.1096488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1096696Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1096903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1097638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1097825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1098003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_bool PASSED [ 44%] 2023-01-11T23:10:17.1098190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex128 PASSED [ 44%] 2023-01-11T23:10:17.1098369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int16 PASSED [ 44%] 2023-01-11T23:10:17.1098548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int64 PASSED [ 44%] 2023-01-11T23:10:17.1098729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int8 PASSED [ 44%] 2023-01-11T23:10:17.1098938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099397Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1099722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_ones doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1100048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1100244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1100441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1100670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex128 PASSED [ 44%] 2023-01-11T23:10:17.1100865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex32 PASSED [ 44%] 2023-01-11T23:10:17.1101050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_complex64 PASSED [ 44%] 2023-01-11T23:10:17.1101235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1101420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1101604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int32 PASSED [ 44%] 2023-01-11T23:10:17.1101784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int64 PASSED [ 44%] 2023-01-11T23:10:17.1101980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int8 PASSED [ 44%] 2023-01-11T23:10:17.1102186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1102399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1102750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%] 2023-01-11T23:10:17.1102960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1103285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_zeros doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1103612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%] 2023-01-11T23:10:17.1103939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_zeros doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1104125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1104313Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1104637Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float32 SKIPPED (_refs.nextafter doesn't support nvfuser) [ 44%] 2023-01-11T23:10:17.1104875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 44%] 2023-01-11T23:10:17.1105615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1105811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float64 PASSED [ 44%] 2023-01-11T23:10:17.1106030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1106672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_bfloat16 XFAIL [ 44%] 2023-01-11T23:10:17.1106874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float16 XFAIL [ 44%] 2023-01-11T23:10:17.1107074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float64 XFAIL [ 44%] 2023-01-11T23:10:17.1107291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1107510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1107743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float32 XFAIL [ 44%] 2023-01-11T23:10:17.1107937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_bfloat16 PASSED [ 44%] 2023-01-11T23:10:17.1108133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float16 PASSED [ 44%] 2023-01-11T23:10:17.1108324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1108539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1108837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%] 2023-01-11T23:10:17.1109034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float32 PASSED [ 44%] 2023-01-11T23:10:17.1109232Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1109447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1109665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1110504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1110733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1110927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1111123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1111384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1111601Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1111817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1112018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1112211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1112438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1112830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.group_norm doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1113054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1113255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1113456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1113658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1113856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1114266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hardshrink doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1114467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1114667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1114853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:10:17.1115047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:10:17.1115243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:10:17.1115473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1115694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1116068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hardtanh doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1116293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1116638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1116986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:10:17.1117236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1117474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1117701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1117936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1118335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1118561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1118984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1119205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1119402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1119633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:10:17.1119834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:10:17.1120047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1120910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.l1_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1121130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1121333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1121530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:10:17.1121754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1122381Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1122653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:10:17.1122900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1123164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1123369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1123588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1123800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:10:17.1124016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:10:17.1124234Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:10:17.1124453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:10:17.1124663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1124874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:10:17.1125084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:10:17.1125319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1125744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:10:17.1125962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:10:17.1126198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1126562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1126775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 44%]
2023-01-11T23:10:17.1127009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1127223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:10:17.1127423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:10:17.1127621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:10:17.1127982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:10:17.1128385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.margin_ranking_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:10:17.1128618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1128994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:10:17.1129224Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:10:17.1129419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1129615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1129833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1130058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1130261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1130459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1130655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1130878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.mse_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1131709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1131905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float16 XFAIL [ 45%]
2023-01-11T23:10:17.1132126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1132344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1132709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.nll_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1132930Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1133279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1133494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1133703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1133900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1134106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1134343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1134747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1134982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1135351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1135581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1135786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1135986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1136194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1136397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1136587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1136790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1136996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1137233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1137445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1137673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1137897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1138253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1138474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1138674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1138901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1139258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.prelu doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1139480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1139675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1139873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1140065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1140283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1140528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1140761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1140980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1141323Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1141517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1141737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1142079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1142295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1142489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1142683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1142906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1143098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1143285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1143645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.relu doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1143987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1144195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1144831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1145047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1145257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1145469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1145679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1146373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1146588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1146778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1146984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1147184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1147419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1147655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1147885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1148095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1148324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1148531Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1148865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1149080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1149279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1149480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1149684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1149884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1150079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:10:17.1150319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1150577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1150827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1151035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1151408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1151639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1151862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1152061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1152259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1152482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1152682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1152885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1153083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1153310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1153515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1153717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1153909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1154107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1154357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1154713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1154918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:10:17.1155141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1155488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1155689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1155891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1156087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1156284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1156503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1156723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1156940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1157292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1157551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1157765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:10:17.1157974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1158182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1158390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1158598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1158834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159299Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159531Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1159970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:10:17.1160155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1160334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1160555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1160800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1161519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float32 SKIPPED (_refs.norm doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1161703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex64 PASSED [ 45%]
2023-01-11T23:10:17.1161881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:10:17.1162061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1162240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1162419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:10:17.1162599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1162834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 45%]
2023-01-11T23:10:17.1163371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:10:17.1163889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float32 SKIPPED (_refs.ones doesn't support nvfuser) [ 45%]
2023-01-11T23:10:17.1164212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:10:17.1164529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:10:17.1164712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bool PASSED [ 45%]
2023-01-11T23:10:17.1164898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:10:17.1165084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:10:17.1165266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:10:17.1165446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:10:17.1165809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 45%]
2023-01-11T23:10:17.1166025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1166629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1166958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1167148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1167353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1167543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1167730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1167906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1168089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1168271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1168616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1168976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float32 SKIPPED (_refs.positive doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1169305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1169511Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1169833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1170038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1170223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1170397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1170586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1170761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1170939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1171116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1171452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1171657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1171993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float32 SKIPPED (_refs.pow doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1172201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1172508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int32 SKIPPED (_refs.pow doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1172701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1172883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1173063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1173247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1173426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1173610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1173819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1174028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1174340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float32 SKIPPED (_refs.prod doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1174723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1175092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int32 SKIPPED (_refs.prod doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1175471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1175804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_bfloat16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex128 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1176763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_complex64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_bfloat16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1177732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex128 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1178045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float16 SKIPPED (make_traced() doesn't set seed properly!) [ 46%]
2023-01-11T23:10:17.1178231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1178409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1178630Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1178817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1178995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1179176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1179359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1179539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1179745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1179959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1180157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1180475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int32 SKIPPED (_refs.ravel doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1180794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1181000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1181190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1181378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1181560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1181824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1182010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1182351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1182662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float32 SKIPPED (_refs.real doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1182869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1183078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1183402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1183593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1183781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1183972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1184161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1184346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1184590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1184798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1184990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1185203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1185538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1185725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1185916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1186099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1186284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1186465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1186677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1186879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1187068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1187277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1187457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1187665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1187848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1188025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1188209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1188414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1188612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1188915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1189132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1189455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float32 SKIPPED (_refs.repeat doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1189774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int32 SKIPPED (_refs.repeat doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1189980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1190170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1190392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1190583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1190767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1190942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1191123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1191304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1191520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1191855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.reshape_as doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1192188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1192371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1192556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1192741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1192923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1193094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1193279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_uint8 PASSED [ 46%]
2023-01-11T23:10:17.1193647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%]
2023-01-11T23:10:17.1193966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int32 SKIPPED (_refs.reshape doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1194175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1194359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1194541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex32 PASSED [ 46%]
2023-01-11T23:10:17.1194725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_complex64 PASSED [ 46%]
2023-01-11T23:10:17.1194902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1195083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1195254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1195430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1195639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1195840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1196069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1196396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%]
2023-01-11T23:10:17.1196705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int32 SKIPPED (_refs.roll doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1196886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bfloat16 PASSED [ 46%]
2023-01-11T23:10:17.1197065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1197238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float16 PASSED [ 46%]
2023-01-11T23:10:17.1197418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1197598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1197777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int32 PASSED [ 46%]
2023-01-11T23:10:17.1197989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1198307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float32 SKIPPED (_refs.rot90 doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1198618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int32 SKIPPED (_refs.rot90 doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1198934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1199116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float32 PASSED [ 46%]
2023-01-11T23:10:17.1199301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float64 PASSED [ 46%]
2023-01-11T23:10:17.1199512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int16 PASSED [ 46%]
2023-01-11T23:10:17.1199691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int64 PASSED [ 46%]
2023-01-11T23:10:17.1199869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int8 PASSED [ 46%]
2023-01-11T23:10:17.1200076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1200397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float32 SKIPPED (_refs.round doesn't support nvfuser) [ 46%]
2023-01-11T23:10:17.1200605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%]
2023-01-11T23:10:17.1200927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%]
2023-01-11T23:10:17.1201108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bool PASSED [ 46%]
2023-01-11T23:10:17.1201295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex128 PASSED [ 46%]
2023-01-11T23:10:17.1201483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex32 PASSED [ 47%]
2023-01-11T23:10:17.1201661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1201841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float16 PASSED [ 47%]
2023-01-11T23:10:17.1202020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int16 PASSED [ 47%]
2023-01-11T23:10:17.1202222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1202402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int64 PASSED [ 47%]
2023-01-11T23:10:17.1202576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1202903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1203660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1203988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1204311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1204496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1204678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1204859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1205038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float64 PASSED [ 47%]
2023-01-11T23:10:17.1205216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1205395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1205633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1205949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1206152Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1206354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1206537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bfloat16 PASSED [ 47%]
2023-01-11T23:10:17.1206723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1206907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1207090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex32 PASSED [ 47%]
2023-01-11T23:10:17.1207269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1207449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int32 PASSED [ 47%]
2023-01-11T23:10:17.1207619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int64 PASSED [ 47%]
2023-01-11T23:10:17.1207794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int8 PASSED [ 47%]
2023-01-11T23:10:17.1207997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1208362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1208574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1208892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1209095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1209278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1209466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex128 PASSED [ 47%]
2023-01-11T23:10:17.1209656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex64 PASSED [ 47%]
2023-01-11T23:10:17.1209836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float16 PASSED [ 47%]
2023-01-11T23:10:17.1210022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float64 PASSED [ 47%]
2023-01-11T23:10:17.1210205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_uint8 PASSED [ 47%]
2023-01-11T23:10:17.1210411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1210628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%]
2023-01-11T23:10:17.1211234Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1211795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%]
2023-01-11T23:10:17.1211975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bfloat16 PASSED [ 47%]
2023-01-11T23:10:17.1212146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_bool PASSED [ 47%]
2023-01-11T23:10:17.1212331Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1212510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int16 PASSED [ 47%]
2023-01-11T23:10:17.1212691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_uint8 PASSED [ 47%]
2023-01-11T23:10:17.1212897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1213083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float32 PASSED [ 47%]
2023-01-11T23:10:17.1213287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%]
2023-01-11T23:10:17.1213610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%]
2023-01-11T23:10:17.1213928Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1214150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1214325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1214682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1214868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1215077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1215285Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1215609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float32 SKIPPED (_refs.signbit doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1215937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1216145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1216467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1216672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1216843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1217024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:10:17.1217205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1217385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1217827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1218082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1218293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1218502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1218856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:10:17.1219066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1219242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1219561Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1219743Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1219943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1220261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1220439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1220675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:10:17.1220859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:10:17.1221039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1221211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:10:17.1221393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1221576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1221782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1221988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1222594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1222799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1223121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1223317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1223638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1223868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1224050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1224235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:10:17.1224417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float32 
PASSED [ 47%] 2023-01-11T23:10:17.1224597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1224775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1224983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1225324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:10:17.1225518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1225705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1226024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1226205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1226431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1226626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:10:17.1226825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1227044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1227304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1227541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1227734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:10:17.1227922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:10:17.1228146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1228865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1229085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1229435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1229669Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1229864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1230048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1230240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1230429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1230786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.bessel_j0 doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1231130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1231468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:10:17.1231662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1231850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1232065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1232283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1232648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1232840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:10:17.1233029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1233217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:10:17.1233403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:10:17.1233616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1233951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1234164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1234352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:10:17.1234542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1234717Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1235025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:10:17.1235238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1235457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1235832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:10:17.1236176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.erfcx doesn't support nvfuser) [ 47%] 2023-01-11T23:10:17.1236389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:10:17.1236579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:10:17.1236764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:10:17.1236952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1237155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1237493Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i0e doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1237824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1238035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1238364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1238574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1238763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1238949Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1239134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1239318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1239649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i1 doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1239827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1240013Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1240199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1240407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1240741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1241078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i1e doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1241273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1241466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1241653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1241879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1242056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1242406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.log_ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1242627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1242834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1243048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:10:17.1243260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1243464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1243666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1243869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1244102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1244357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1244568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1244799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped 
for speed) [ 48%] 2023-01-11T23:10:17.1245157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1245347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1245536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1245723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1245908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1246096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1246311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1246519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1246869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.logit doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1247210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1247581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.logit doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1247910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1248122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1248328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1248562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1248800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1249165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1249552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1249782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1249984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1250222Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1250431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1250828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1251102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1251478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1251860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1252093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1252453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1252682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1252893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1253098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1253296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1253527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1253732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1253963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1254348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1254826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1255213Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1255445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 
2023-01-11T23:10:17.1255814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:10:17.1256000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1256192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1256375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1256611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1256802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1257017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1257351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1257695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1257910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1258104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1258298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1258486Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1258669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1258887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1259232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.ndtri doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1259574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.ndtri doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1259826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1260039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:10:17.1260249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1260452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1260653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 
2023-01-11T23:10:17.1260885Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1261557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1261761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1261963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1262172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1262393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1262593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1262795Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1263019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1263377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1263766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.spherical_bessel_j0 doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1263990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1264186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1264383Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1264567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1264753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1264941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:10:17.1265131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1265320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1265563Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1265924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.xlog1py doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1266265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.xlog1py doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1266481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1266698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1266889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1267077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1267423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.zeta doesn't support nvfuser) [ 48%] 2023-01-11T23:10:17.1267635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1267972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1268206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1268419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1268597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex32 PASSED [ 48%] 2023-01-11T23:10:17.1268858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:10:17.1269044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1269223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1269568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 48%] 2023-01-11T23:10:17.1269777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1269983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1270169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1270534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1270747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 
48%] 2023-01-11T23:10:17.1270940Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1271121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:10:17.1271327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex64 SKIPPED (Skipped!) [ 48%] 2023-01-11T23:10:17.1271538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:10:17.1271721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1271899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:10:17.1272078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:10:17.1272256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1272434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:10:17.1272608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:10:17.1272821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1273024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1273209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float32 PASSED [ 48%] 2023-01-11T23:10:17.1273541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:10:17.1273726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:10:17.1273931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1274200Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:10:17.1274389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:10:17.1274579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex32 PASSED [ 49%] 2023-01-11T23:10:17.1274754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:10:17.1274938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:10:17.1275114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:10:17.1275297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:10:17.1275506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1275714Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:10:17.1276251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1276637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:10:17.1276813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:10:17.1276997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1277204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:10:17.1277385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:10:17.1277568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:10:17.1277746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:10:17.1277925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:10:17.1278137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1278486Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%] 2023-01-11T23:10:17.1278803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float32 SKIPPED (_refs.stack doesn't support nvfuser) [ 49%] 2023-01-11T23:10:17.1279010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1279336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:10:17.1279653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int32 SKIPPED (_refs.stack doesn't support nvfuser) [ 49%] 2023-01-11T23:10:17.1279857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1280065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:10:17.1280255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1280470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:10:17.1280682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:10:17.1280877Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1281048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1281225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1281405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1281582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1281762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1281937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1282114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1282321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1282664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1282874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1283638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1283961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1284161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1284340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1284525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1284709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1284888Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1285068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1285236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1285436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1285644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1286222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1286822Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1287145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1287351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1287533Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1287719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1287909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1288095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1288277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1288459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1288645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1288881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1289488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1289676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1289883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1290244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1290454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1290628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1290808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1290992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1291167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1291366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1291592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1291775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1291976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1292175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1292362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1292543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex32 PASSED [ 49%]
2023-01-11T23:10:17.1292724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1292903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1293076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1293252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1293428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1293633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1293842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294045Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1294757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1294941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1295113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1295293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1295471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1295651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1295860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1296414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1296798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1297124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1297343Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1297548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1297878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1298064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1298253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1298440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1298627Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1298807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1299025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1299793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float32 SKIPPED (_refs.tensor_split doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1300128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1300512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int32 SKIPPED (_refs.tensor_split doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1300693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1300875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex128 PASSED [ 49%]
2023-01-11T23:10:17.1301053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float16 PASSED [ 49%]
2023-01-11T23:10:17.1301231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float32 PASSED [ 49%]
2023-01-11T23:10:17.1301407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1301570Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int64 PASSED [ 49%]
2023-01-11T23:10:17.1301748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1301957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1302687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1302870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bfloat16 PASSED [ 49%]
2023-01-11T23:10:17.1303076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1303255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1303434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int8 PASSED [ 49%]
2023-01-11T23:10:17.1303607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1303814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1304574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%]
2023-01-11T23:10:17.1304786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1305104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int32 SKIPPED (_refs.trace doesn't support nvfuser) [ 49%]
2023-01-11T23:10:17.1305430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%]
2023-01-11T23:10:17.1305618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bool PASSED [ 49%]
2023-01-11T23:10:17.1305809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex64 PASSED [ 49%]
2023-01-11T23:10:17.1305989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float64 PASSED [ 49%]
2023-01-11T23:10:17.1306177Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int16 PASSED [ 49%]
2023-01-11T23:10:17.1306388Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_uint8 PASSED [ 49%]
2023-01-11T23:10:17.1306602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1306809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:10:17.1307576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:10:17.1307765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int32 PASSED [ 49%]
2023-01-11T23:10:17.1307974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1308146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1308328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1308503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1308742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1308965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1309168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1309894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int32 SKIPPED (_refs.tril doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1310095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1310408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1310610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1310787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1310972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1311176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1311378Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1311553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1311730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1311933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1312157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1312483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1312796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int32 SKIPPED (_refs.triu doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1313003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1313193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1313386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1313571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1313760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1313943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1314154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1314998Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1315181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1315360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1315539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1315719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1315927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1316251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float32 SKIPPED (_refs.trunc doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1316451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1316777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1317095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int32 SKIPPED (_refs.trunc doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1317417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1317602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1317792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1317978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1318183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1318365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1318544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1318746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1318952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1319301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%]
2023-01-11T23:10:17.1319629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float32 SKIPPED (_refs.unbind doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1319836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1320370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1320783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1320973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1321157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1321341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1321521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1321733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1321938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1322689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int32 SKIPPED (_refs.unflatten doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1322879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1323066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1323247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1323432Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1323622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1323868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1324480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1324664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1324849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1325029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1325215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1325394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1325576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1325756Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1325935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1326142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1326374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1326702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1327018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int32 SKIPPED (_refs.unfold doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1327215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1327398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1327585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1327772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1327958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1328142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1328354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1328566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1328919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%]
2023-01-11T23:10:17.1329130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1329314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1329527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1329737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1330068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1330316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1330502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1330680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1330860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1331070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1331655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1331843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1332029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1332242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1332428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float64 PASSED [ 50%]
2023-01-11T23:10:17.1332639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1332848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1333031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1333210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex128 PASSED [ 50%]
2023-01-11T23:10:17.1333395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1333586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1333769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1333951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int32 PASSED [ 50%]
2023-01-11T23:10:17.1334158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1334369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1334680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.view_as doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1335258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1335784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1335963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1336142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1336325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1336508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1336688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int64 PASSED [ 50%]
2023-01-11T23:10:17.1336868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int8 PASSED [ 50%]
2023-01-11T23:10:17.1337190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%]
2023-01-11T23:10:17.1337506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1337683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bfloat16 PASSED [ 50%]
2023-01-11T23:10:17.1337865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1338089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex32 PASSED [ 50%]
2023-01-11T23:10:17.1338275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1338455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1338638Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1338817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int16 PASSED [ 50%]
2023-01-11T23:10:17.1338996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_uint8 PASSED [ 50%]
2023-01-11T23:10:17.1339203Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1339814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1340132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int32 SKIPPED (_refs.vsplit doesn't support nvfuser) [ 50%]
2023-01-11T23:10:17.1340360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:10:17.1340707Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%]
2023-01-11T23:10:17.1340891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bool PASSED [ 50%]
2023-01-11T23:10:17.1341075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex64 PASSED [ 50%]
2023-01-11T23:10:17.1341289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float16 PASSED [ 50%]
2023-01-11T23:10:17.1341471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float32 PASSED [ 50%]
2023-01-11T23:10:17.1341641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1341819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1341996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1342174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1342386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1342594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1342806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int32 SKIPPED (_refs.vstack doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1343765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1343952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1344140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1344320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1344528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1344849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1345162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int32 SKIPPED (_refs.where doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1345368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1345573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1345755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1345929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1346116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1346296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1346470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1346677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1347019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float32 SKIPPED (_refs.xlogy doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1347225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1347550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1347862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int32 SKIPPED (_refs.xlogy doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1348184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%]
2023-01-11T23:10:17.1348360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1348549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1348793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1348988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1349165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1349371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1349574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1349821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1350031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1350341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float32 SKIPPED (_refs.zeros doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1350660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1350979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int32 SKIPPED (_refs.zeros doesn't support nvfuser) [ 51%]
2023-01-11T23:10:17.1351184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1351391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1351598Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1351799Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1351994Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1352223Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1352426Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1352620Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1352831Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1353046Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1353233Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1353429Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1353617Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1353804Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1353990Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1354177Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1354364Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1354537Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1354752Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1354963Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355178Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355394Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355628Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1355824Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1356157Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%]
2023-01-11T23:10:17.1356349Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1356560Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%]
2023-01-11T23:10:17.1356715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1356881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1357044Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1357207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1357362Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1357519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1357675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1357831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1358013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1358194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1358385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1358590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1358769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1358946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1359122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1359296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1359471Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1359648Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1359817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1359996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1360168Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1360361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1360562Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1360735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1360904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1361101Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1361262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1361432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1361603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1361786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1361969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1362149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1362328Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1362508Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1362681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1362846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1363022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1363193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1363368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1363549Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1363729Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1363914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1364093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1364341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1364511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1364682Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1364866Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1365043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1365220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1365394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1365569Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1365739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1365911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1366079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float64 PASSED [ 51%]
2023-01-11T23:10:17.1366250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1366422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1366596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1366799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1366978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1367155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1367339Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1367520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1367691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1367865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1368038Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1368209Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1368391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1368570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex32 PASSED [ 51%]
2023-01-11T23:10:17.1368752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1368928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float16 PASSED [ 51%]
2023-01-11T23:10:17.1369099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1369265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1369436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1369606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int8 PASSED [ 51%]
2023-01-11T23:10:17.1369799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1369972Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1370170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex128 PASSED [ 51%]
2023-01-11T23:10:17.1370368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int32 PASSED [ 51%]
2023-01-11T23:10:17.1370541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bool PASSED [ 51%]
2023-01-11T23:10:17.1370715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1370879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float32 PASSED [ 51%]
2023-01-11T23:10:17.1371047Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int16 PASSED [ 51%]
2023-01-11T23:10:17.1371220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int64 PASSED [ 51%]
2023-01-11T23:10:17.1371390Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1371567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1371746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex64 PASSED [ 51%]
2023-01-11T23:10:17.1371918Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_uint8 PASSED [ 51%]
2023-01-11T23:10:17.1372099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bfloat16 PASSED [ 51%]
2023-01-11T23:10:17.1372301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1372474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1372651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1372819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1372986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1373151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1373321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1373488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1373651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1373809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1373973Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1374133Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1374291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1374445Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1374704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1374874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1375036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex32 PASSED [ 52%]
2023-01-11T23:10:17.1375200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1375352Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1375553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1375712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1375879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1376046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1376210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex32 PASSED [ 52%]
2023-01-11T23:10:17.1376374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1376538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1376691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1376853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1377014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1377172Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1377334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1377495Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1377655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1377812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1377999Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1378146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1378317Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1378481Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1378647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1378817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1378980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1379142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1379305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int32 PASSED [ 52%]
2023-01-11T23:10:17.1379462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1379619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1379786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex128 PASSED [ 52%]
2023-01-11T23:10:17.1379948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1380108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1380268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1380429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1380587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1380738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1380899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1381081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1381244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1381403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1381556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1381709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1381879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1382046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1382207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1382363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1382525Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1382682Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1382847Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1383006Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1383164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_bool PASSED [ 52%]
2023-01-11T23:10:17.1383324Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float16 PASSED [ 52%]
2023-01-11T23:10:17.1383474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float32 PASSED [ 52%]
2023-01-11T23:10:17.1383654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1383811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1383971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int8 PASSED [ 52%]
2023-01-11T23:10:17.1384128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_uint8 PASSED [ 52%]
2023-01-11T23:10:17.1384290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bfloat16 PASSED [ 52%]
2023-01-11T23:10:17.1384451Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex64 PASSED [ 52%]
2023-01-11T23:10:17.1384609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float64 PASSED [ 52%]
2023-01-11T23:10:17.1384759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int16 PASSED [ 52%]
2023-01-11T23:10:17.1384913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int64 PASSED [ 52%]
2023-01-11T23:10:17.1385070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_uint8
PASSED [ 52%] 2023-01-11T23:10:17.1385235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1385400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1385557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1385715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1385873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1386031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1386197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1386368Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1386537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1386723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1386910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1387090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1387277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1387457Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1387636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1387810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1387986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1388162Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1388334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1388507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1388746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1388925Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1389090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1389246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1389436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1389593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1389758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1389923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1390095Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1390281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1390466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1390625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1390779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1390939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1391102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1391262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1391418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1391572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1391728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1391885Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1392034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1392202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1392365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1392548Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1392706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1392859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1393013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1393178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex128 PASSED [ 52%] 2023-01-11T23:10:17.1393334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1393494Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float64 PASSED [ 52%] 2023-01-11T23:10:17.1393653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1393810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1393968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1394136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1394299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1394461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1394620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1394772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1394930Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1395123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1395292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1395456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1395624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1395783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1395943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1396105Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:10:17.1396266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_bool PASSED [ 52%] 2023-01-11T23:10:17.1396440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex32 PASSED [ 52%] 2023-01-11T23:10:17.1396612Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex64 PASSED [ 52%] 2023-01-11T23:10:17.1396779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float16 PASSED [ 52%] 2023-01-11T23:10:17.1396942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float32 PASSED [ 52%] 2023-01-11T23:10:17.1397104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1397265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1397431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int32 PASSED [ 52%] 2023-01-11T23:10:17.1397583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int8 PASSED [ 52%] 2023-01-11T23:10:17.1397746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_uint8 PASSED [ 52%] 2023-01-11T23:10:17.1397919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int16 PASSED [ 52%] 2023-01-11T23:10:17.1398108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int64 PASSED [ 52%] 2023-01-11T23:10:17.1398272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1398435Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1398595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1398756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1398920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1399071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1399250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1399413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1399570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int64 PASSED [ 53%] 
2023-01-11T23:10:17.1399747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1399926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1400105Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1400275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1400441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1400637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1400814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1400984Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1401152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1401319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1401485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1401650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1401817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1401976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1402137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1402301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1402459Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1402621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1402782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1402946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1403107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1403256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1403419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1403611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1403771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1403924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1404093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1404257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1404415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1404576Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1404733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1404896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1405063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1405219Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1405377Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1405541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1405706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1405868Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1406022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1406206Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1406365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1406528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1406692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1406858Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1407021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1407180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1407339Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1407491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1407657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1407830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1408004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1408171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1408341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1408504Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1408670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1408826Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1408992Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1409154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1409333Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int16 PASSED [ 
53%] 2023-01-11T23:10:17.1409507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1409675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1409846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1410016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1410183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1410349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1410528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1410708Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1410878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1411046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1411216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1411382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1411550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1411734Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1411905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1412073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1412238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1412401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1412567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1412730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1412895Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1413053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1413210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1413369Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1413531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1413693Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1413851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1414009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1414180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex128 PASSED [ 53%] 
2023-01-11T23:10:17.1414341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1414582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1414750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1414911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1415113Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1415277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1415434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1415602Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1415772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1415940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1416098Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int16 PASSED [ 53%] 2023-01-11T23:10:17.1416261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1416423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1416589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1416749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1416923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1417103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1417274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1417438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1417636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1417808Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1417974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1418144Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1418307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1418470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1418631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1418791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1418948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1419110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1419293Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1419466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1419634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1419805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1419977Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1420138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1420302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1420456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1420640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1420820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1420996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1421167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1421342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1421510Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1421695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1421864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_bool PASSED [ 53%] 2023-01-11T23:10:17.1422049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex32 PASSED [ 53%] 2023-01-11T23:10:17.1422231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 53%] 2023-01-11T23:10:17.1429391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float16 PASSED [ 53%] 2023-01-11T23:10:17.1429601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float32 PASSED [ 53%] 2023-01-11T23:10:17.1429784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1429962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int64 PASSED [ 53%] 2023-01-11T23:10:17.1430194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 53%] 2023-01-11T23:10:17.1430364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:10:17.1430544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float64 PASSED [ 53%] 2023-01-11T23:10:17.1430717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1430888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int8 PASSED [ 53%] 2023-01-11T23:10:17.1431053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bool 
PASSED [ 53%] 2023-01-11T23:10:17.1431221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex128 PASSED [ 53%] 2023-01-11T23:10:17.1431382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int32 PASSED [ 53%] 2023-01-11T23:10:17.1431546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1431711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1431872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1432035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1432198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1432356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1432515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1432675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1432834Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1433003Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1433160Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1433349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1433512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1433673Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1433833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1433995Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1434159Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1434335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1434503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1434659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1434816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1434978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1435139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1435296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1435451Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1435604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1435783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1435935Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bfloat16 PASSED [ 54%] 
2023-01-11T23:10:17.1436099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1436259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1436415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1436574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1436731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1436886Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1437042Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1437190Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1437349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1437513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1437669Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1437822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1437979Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1438136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1438290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1438443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1438596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1438752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1438933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1439094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1439256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1439417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1439574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1439731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1439882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1440040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1440196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1440351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1440513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1440679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1440843Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1441005Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1441164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1441344Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1441512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1441674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1441833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1441997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1442155Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1442312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1442472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1442621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1442784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1442944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1443100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1443256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1443413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1443572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1443725Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1443876Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1444032Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1444189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1444350Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1444544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1444711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1444874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1445035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1445192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1445343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1445505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1445664Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1445831Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1445991Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1446148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1446308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1446463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1446621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1446786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1446974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1447136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1447295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1447453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1447619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1447788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1447956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1448112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1448279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1448438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1448598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1448768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1448932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1449098Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1449258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1449409Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1449571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1449732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1449932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1450124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1450303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1450462Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1450621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1450780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1450942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1451109Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1451269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1451430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1451592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1451753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1451916Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1452083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1452235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1452397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1452582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1452746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1452905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1453063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1453222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1453389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1453556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1453715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_complex64 PASSED [ 54%] 2023-01-11T23:10:17.1453882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1454046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1454207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int64 PASSED [ 54%] 2023-01-11T23:10:17.1454365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1454738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1454902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_bool PASSED [ 54%] 2023-01-11T23:10:17.1455070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1455232Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float16 PASSED [ 54%] 2023-01-11T23:10:17.1455384Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1455547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1455755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int16 PASSED [ 54%] 2023-01-11T23:10:17.1455917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int32 PASSED [ 54%] 2023-01-11T23:10:17.1456076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int8 PASSED [ 54%] 2023-01-11T23:10:17.1456235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_uint8 PASSED [ 54%] 2023-01-11T23:10:17.1456408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 54%] 2023-01-11T23:10:17.1456582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex128 PASSED [ 54%] 2023-01-11T23:10:17.1456747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex32 PASSED [ 54%] 2023-01-11T23:10:17.1456919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float32 PASSED [ 54%] 2023-01-11T23:10:17.1457088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float64 PASSED [ 54%] 2023-01-11T23:10:17.1457257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1457426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1457589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1457752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1457913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1458073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1458298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1458456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1458625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1458788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1458947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1459108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1459267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1459427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1459581Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1459749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1459911Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1460069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1460228Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1460399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1460567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1460727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1460884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1461038Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1461204Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1461430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1461596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1461755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1461915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1462085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1462254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1462417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1462582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1462741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1462902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1463062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1463221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1463381Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1463539Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1463694Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1463876Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1464033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1464196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1464354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1464511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1464668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1464829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1464989Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1465145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1465307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1465472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1465631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1465788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1465943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1466096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1466267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1466430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1466585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1466744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1466933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1467093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1467253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1467411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1467567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1467722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1467877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1468040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1468203Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1468364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1468524Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1468757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1468937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1469117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1469295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1469462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1469660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1469825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1469985Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1470153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1470321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1470486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1470646Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1470798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1470960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1471119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1471283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1471455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1471623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1471787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1471951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1472112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1472264Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1472423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1472578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1472758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1472920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1473079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1473235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1473389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1473541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1473703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1473859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1474019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1474176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1474332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1474485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1474643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1474794Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1474948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1475129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1475282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1475438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1475589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1475741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1475899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1476048Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1476190Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1476338Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1476491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1476660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1476823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1476989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1477152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1477316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1477467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1477629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1477793Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1477956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1478118Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1478300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1478460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1478622Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1478777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1478937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1479100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex64 PASSED [ 55%] 2023-01-11T23:10:17.1479259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1479420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1479578Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int32 PASSED [ 55%] 2023-01-11T23:10:17.1479738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1479897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1480059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1480210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bool PASSED [ 55%] 2023-01-11T23:10:17.1480366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float16 PASSED [ 55%] 2023-01-11T23:10:17.1480520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1480696Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_float64 PASSED [ 55%] 2023-01-11T23:10:17.1480853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1481009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int8 PASSED [ 55%] 2023-01-11T23:10:17.1481164Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_uint8 PASSED [ 55%] 2023-01-11T23:10:17.1481325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1481480Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1481642Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1481807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bfloat16 PASSED [ 55%] 2023-01-11T23:10:17.1481976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex128 PASSED [ 55%] 2023-01-11T23:10:17.1482148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex32 PASSED [ 55%] 2023-01-11T23:10:17.1482309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float32 PASSED [ 55%] 2023-01-11T23:10:17.1482470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int16 PASSED [ 55%] 2023-01-11T23:10:17.1482631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int64 PASSED [ 55%] 2023-01-11T23:10:17.1482789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1482940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1483110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1483282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1483448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1483614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1483798Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1483959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1484119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1484276Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bfloat16 PASSED [ 56%] 
2023-01-11T23:10:17.1484436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1484604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1484769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1484930Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1485087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1485248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1485405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1485576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1485743Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1485915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex32 PASSED [ 56%] 2023-01-11T23:10:17.1486082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1486275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1486441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1486604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1486769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1486933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1487086Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1487248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1487409Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1487574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1487742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1487910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1488076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1488238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1488398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1488550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1488711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1488869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1489037Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1489196Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1489389Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1489551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1489709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1489859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1490017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1490177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1490340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1490500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1490659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1490817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1490974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1491139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1491294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1491452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1491610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1491773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1491955Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1492114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1492278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1492439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1492588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1492745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1492904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1493062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1493223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1493383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1493540Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1493697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1493849Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1493993Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1494150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1494304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1494466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1494733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1494892Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex32 PASSED [ 56%] 2023-01-11T23:10:17.1495089Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1495250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1495400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1495566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1495723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1495901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1496074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1496246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1496424Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1496596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1496764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1496926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1497093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1497269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1497443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1497647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1497830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1498008Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1498182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1498343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1498509Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1498680Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1498845Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float32 PASSED [ 56%] 
2023-01-11T23:10:17.1499009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1499171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1499335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1499498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1499657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1499809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1499968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1500128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1500284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1500449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1500632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1500792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1500948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1501095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1501251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1501417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1501577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1501739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1501899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1502060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1502214Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1502366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1502514Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1502672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1502829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1502993Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1503181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1503340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1503500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1503659Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1503809Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1503989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1504174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1504354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1504537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1504714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1504894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1505069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 56%] 2023-01-11T23:10:17.1505238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1505395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_bool PASSED [ 56%] 2023-01-11T23:10:17.1505567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1505734Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1505903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1506069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1506256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1506420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int32 PASSED [ 56%] 2023-01-11T23:10:17.1506579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1506751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1506915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_complex64 PASSED [ 56%] 2023-01-11T23:10:17.1507081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1507245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1507411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int64 PASSED [ 56%] 2023-01-11T23:10:17.1507574Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_uint8 PASSED [ 56%] 2023-01-11T23:10:17.1507739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float64 PASSED [ 56%] 2023-01-11T23:10:17.1507901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int16 PASSED [ 56%] 2023-01-11T23:10:17.1508069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bfloat16 PASSED [ 56%] 2023-01-11T23:10:17.1508234Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex128 PASSED [ 56%] 2023-01-11T23:10:17.1508400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float16 PASSED [ 56%] 2023-01-11T23:10:17.1508565Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float32 PASSED [ 56%] 2023-01-11T23:10:17.1508828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1509011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1509178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1509337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1509498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1509664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1509817Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1509980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1510146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1510307Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1510476Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1510637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1510800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1510961Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1511111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1511269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1511426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1511582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1511736Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1511912Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1512082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1512248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1512418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1512582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1512750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1512915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1513080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1513244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1513404Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1513567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1513728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1513881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1514043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1514208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1514397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1514557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1514722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1514908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1515092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1515279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1515453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1515637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1515819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1515997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1516176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1516355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1516537Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1516722Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1516906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1517080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1517267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1517473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1517658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1517819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1517981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1518144Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1518303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1518464Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1518619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1518785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1518956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1519123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1519287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1519450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1519614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1519773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1519953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1520138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1520326Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1520485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1520647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1520803Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1520955Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1521107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1521275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1521433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1521596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1521756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1521914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1522075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1522247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1522417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1522585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1522744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1522908Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1523093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1523256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1523415Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1523577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1523738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1523902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1524062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1524216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1524375Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1524552Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1524723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1524891Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1525051Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1525210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1525364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1525538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1525725Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1525879Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1526039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1526202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1526363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1526522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1526681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1526829Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1526985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1527142Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1527304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1527473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1527641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex32 PASSED [ 57%] 
2023-01-11T23:10:17.1527809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1527971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1528132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1528283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1528465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1528676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1528855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1529024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1529195Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1529364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1529529Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1529701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1529860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1530026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1530210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1530394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1530555Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int8 PASSED [ 57%] 2023-01-11T23:10:17.1530716Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_uint8 PASSED [ 57%] 2023-01-11T23:10:17.1530880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:10:17.1531039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_bool PASSED [ 57%] 2023-01-11T23:10:17.1531223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1531383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1531544Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1531713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex128 PASSED [ 57%] 2023-01-11T23:10:17.1531884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex32 PASSED [ 57%] 2023-01-11T23:10:17.1532049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_complex64 PASSED [ 57%] 2023-01-11T23:10:17.1532213Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1532375Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1532533Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_float64 PASSED [ 57%]
2023-01-11T23:10:17.1532687Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int16 PASSED [ 57%] 2023-01-11T23:10:17.1532850Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int32 PASSED [ 57%] 2023-01-11T23:10:17.1533009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int64 PASSED [ 57%] 2023-01-11T23:10:17.1533174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1533549Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533726Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1533898Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1534081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float64 PASSED [ 57%] 2023-01-11T23:10:17.1534269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1534444Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1534715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float32 PASSED [ 57%] 2023-01-11T23:10:17.1534888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float16 PASSED [ 57%] 2023-01-11T23:10:17.1535056Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1535413Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1535771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1535950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1536130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1536308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1536502Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1536695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1536922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1537110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1537291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1537471Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1537644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1537822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1537999Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1538185Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1538363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1538542Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1538718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1538917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1539115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1539305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1539505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1539697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1539918Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1540104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1540287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1540477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1540661Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1540851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1541027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1541200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1541382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1541560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1541753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1541944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1542134Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1542351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1542546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1542733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1542913Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1543092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1543281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1543468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1543653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1543839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1544022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1544202Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1544380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1544547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1544719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1544897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1545067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1545305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1545486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1545658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1545830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1546000Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_selu_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1546183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1546380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1546571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1546762Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 58%]
2023-01-11T23:10:17.1546943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1547129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1547312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1547499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1547721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1547905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1548096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1548279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1548460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1548644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1548928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1549119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1549300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1549488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1549671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1549843Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1550030Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1550211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1550391Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1550572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1550778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1550961Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1551140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1551318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1551486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1551664Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1551857Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1552043Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1552236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1552419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1552606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1552767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1552930Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1553084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1553268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1553427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1553590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1553755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1553917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1554078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1554237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1554385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1554546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1554714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1554884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1555052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1555216Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1555380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1555546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1555713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1555873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1556045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1556211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1556400Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1556562Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1556723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1556884Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1557045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1557200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1557363Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1557522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1557679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1557835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1557988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1558154Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1558316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1558478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1558631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1558813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1558971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1559127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1559279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1559432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1559596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1559757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1559910Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1560071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:10:17.1560231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1560395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1560554Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float16 PASSED [ 58%] 2023-01-11T23:10:17.1560713Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1560874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1561033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_uint8 PASSED [ 58%] 2023-01-11T23:10:17.1561191Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bool PASSED [ 58%] 2023-01-11T23:10:17.1561347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex128 PASSED [ 58%] 2023-01-11T23:10:17.1561507Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex32 PASSED [ 58%] 2023-01-11T23:10:17.1561672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_complex64 PASSED [ 58%] 2023-01-11T23:10:17.1561835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float32 PASSED [ 58%] 2023-01-11T23:10:17.1562017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float64 PASSED [ 58%] 2023-01-11T23:10:17.1562179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int16 PASSED [ 58%] 2023-01-11T23:10:17.1562331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int32 PASSED [ 58%] 2023-01-11T23:10:17.1562482Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int64 PASSED [ 58%] 2023-01-11T23:10:17.1562630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_int8 PASSED [ 58%] 2023-01-11T23:10:17.1562800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1562964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1563140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1563306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1563470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1563637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1563797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1563951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1564109Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1564265Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1564448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1564607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1564771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1564932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1565102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1565274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1565432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1565598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1565763Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1565923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bfloat16 PASSED [ 
59%] 2023-01-11T23:10:17.1566085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1566253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1566418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex32 PASSED [ 59%] 2023-01-11T23:10:17.1566582Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1566745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1566899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1567061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1567226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1567387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1567580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1567739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1567897Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1568052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1568204Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1568361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1568520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1568679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1568840Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1569001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1569158Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1569314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1569462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1569626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1569782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1569939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1570122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1570289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1570449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1570605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1570774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bool PASSED [ 59%] 
2023-01-11T23:10:17.1570946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1571115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1571271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1571430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1571594Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1571765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1571932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1572094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1572247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1572405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1572565Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1572724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1572885Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1573045Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1573233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1573394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1573548Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1573710Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1573869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1574028Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1574184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1574345Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1574608Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1574773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1574924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1575077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex128 PASSED [ 59%] 2023-01-11T23:10:17.1575235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1575393Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1575546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1575755Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1575916Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1576081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1576238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1576385Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1576540Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1576721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1576901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1577075Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1577247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1577414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1577590Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1577759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1577921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1578085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1578250Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1578419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1578597Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1578765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1578958Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1579123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1579290Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1579453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1579621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1579793Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1579964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1580132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1580301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1580466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1580634Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1580792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1580956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1581126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1581314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1581478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1581644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1581805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1581965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1582124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1582281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1582449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1582611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1582773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1582944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1583116Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1583285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1583454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1583640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1583825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:10:17.1584014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1584205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1584417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1584604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1584790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1584960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1585131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1585302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float32 PASSED 
[ 59%] 2023-01-11T23:10:17.1585465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1585634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1585833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1586022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1586211Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1586395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 59%] 2023-01-11T23:10:17.1586576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1586781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1586968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1587148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1587334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1587518Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 59%] 2023-01-11T23:10:17.1587703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 59%] 2023-01-11T23:10:17.1587890Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1588058Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:10:17.1588228Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_bool PASSED [ 59%] 2023-01-11T23:10:17.1588399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float16 PASSED [ 59%] 2023-01-11T23:10:17.1588566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float32 PASSED [ 59%] 2023-01-11T23:10:17.1588799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int16 PASSED [ 59%] 2023-01-11T23:10:17.1588965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int8 PASSED [ 59%] 2023-01-11T23:10:17.1589128Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_uint8 PASSED [ 59%] 2023-01-11T23:10:17.1589299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1589470Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1589638Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1589807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1590015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1590206Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1590387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1590573Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1590754Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1590939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1591119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1591293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1591461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1591637Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1591806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1591968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1592141Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1592334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1592500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1592672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1592838Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1593003Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1593167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1593322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1593485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1593649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1593810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1593966Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1594121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1594285Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1594446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1594609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1594766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1594924Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1595085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1595245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1595433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1595600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1595764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1595925Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1596077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1596238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1596397Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1596566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1596728Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1596888Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1597046Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1597209Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1597370Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1597521Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1597679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1597846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1598034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1598197Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1598355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1598513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1598670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1598822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1598980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1599137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1599293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1599466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1599629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1599801Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1599967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1600131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1600283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1600440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1600592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1600747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1600904Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1601081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1601232Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1601382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1601531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1601692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1601851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1602009Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1602169Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1602328Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1602484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1602634Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1602788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1602947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1603107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1603263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1603418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1603611Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bfloat16 XFAIL [ 60%] 2023-01-11T23:10:17.1603784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex128 XFAIL [ 60%] 2023-01-11T23:10:17.1603959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_complex64 XFAIL [ 60%] 2023-01-11T23:10:17.1604126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float16 XFAIL [ 60%] 2023-01-11T23:10:17.1604283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int16 XFAIL [ 60%] 2023-01-11T23:10:17.1604447Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int8 XFAIL [ 60%] 2023-01-11T23:10:17.1604608Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_uint8 XFAIL [ 60%] 2023-01-11T23:10:17.1604774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1604936Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1605091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1605248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1605398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1605547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1605700Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1605854Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1606007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1606170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1606332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1606490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1606674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1606825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1606985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1607152Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1607316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1607479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1607641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1607800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1607962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1608126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1608279Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1608436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1608593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1608747Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1608914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1609096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1609258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex32 PASSED [ 60%] 2023-01-11T23:10:17.1609419Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1609571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1609730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1609893Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1610059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_indices_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1610227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1610393Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1610555Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float16 PASSED [ 60%] 2023-01-11T23:10:17.1610721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1610906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_float64 PASSED [ 60%] 2023-01-11T23:10:17.1611073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1611237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1611395Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1611561Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1611719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1611880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1612040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1612200Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1612382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:10:17.1612551Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1612715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1612875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int32 PASSED [ 60%] 2023-01-11T23:10:17.1613034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int64 PASSED [ 60%] 2023-01-11T23:10:17.1613193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_int8 PASSED [ 60%] 2023-01-11T23:10:17.1613361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1613523Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1613698Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1613860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float32 PASSED [ 60%] 2023-01-11T23:10:17.1614026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int16 PASSED [ 60%] 2023-01-11T23:10:17.1614191Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_uint8 PASSED [ 60%] 2023-01-11T23:10:17.1614349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bool PASSED [ 60%] 2023-01-11T23:10:17.1614610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex128 PASSED [ 60%] 2023-01-11T23:10:17.1614777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex64 PASSED [ 60%] 2023-01-11T23:10:17.1614980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1615146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1615299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1615466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1615633Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1615803Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1615968Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1616131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1616297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1616461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1616624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1616780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1616943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1617107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1617272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1617440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1617605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1617773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1617965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1618120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1618283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1618443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1618603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1618767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1618926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1619092Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1619254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1619417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1619571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1619729Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1619883Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1620040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1620205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1620365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1620553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1620717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1620873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1621032Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1621193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1621358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1621523Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1621684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1621842Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1622010Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1622175Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1622330Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1622491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1622652Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1622810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1622971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1623130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1623291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1623450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1623626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1623787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1623951Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1624108Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1624271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1624433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1624593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1624758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1624937Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1625126Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1625300Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1625474Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1625640Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1625813Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1625985Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1626177Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1626341Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1626497Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1626660Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1626838Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_bfloat16 XFAIL [ 61%] 2023-01-11T23:10:17.1627014Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float16 XFAIL [ 61%] 2023-01-11T23:10:17.1627186Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1627355Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_native_batch_norm_cuda_float64 XFAIL [ 61%] 2023-01-11T23:10:17.1627521Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float16 XFAIL [ 61%] 2023-01-11T23:10:17.1627679Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1627844Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_float64 XFAIL [ 61%] 2023-01-11T23:10:17.1627999Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bool XFAIL [ 61%] 2023-01-11T23:10:17.1628164Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex32 XFAIL [ 61%] 2023-01-11T23:10:17.1628325Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float32 XFAIL [ 61%] 2023-01-11T23:10:17.1628485Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int64 XFAIL [ 61%] 2023-01-11T23:10:17.1628643Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int8 XFAIL [ 61%] 2023-01-11T23:10:17.1628885Z 
test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_uint8 XFAIL [ 61%] 2023-01-11T23:10:17.1629063Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1629231Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1629416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1629583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1629748Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1629911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1630106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1630298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1630489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1630679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1630867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1631048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1631236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1631422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1631607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1631861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1632045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1632229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1632407Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1632596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1632773Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1632954Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1633138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1633319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1633514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1633702Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1633893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1634081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1634268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int16 PASSED [ 61%] 2023-01-11T23:10:17.1634457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1634641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1634854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1635050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1635236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1635422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1635606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1635792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1635987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1636177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1636358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1636543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1636730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_uint8 PASSED [ 61%] 2023-01-11T23:10:17.1636915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1637103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex128 PASSED [ 61%] 2023-01-11T23:10:17.1637293Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1637504Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1637693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1637880Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int32 PASSED [ 61%] 2023-01-11T23:10:17.1638053Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1638236Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1638423Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1638615Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1638806Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1638994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1639196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1639379Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1639563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex32 PASSED [ 61%] 2023-01-11T23:10:17.1639754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1639934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1640123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1640312Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int64 PASSED [ 61%] 2023-01-11T23:10:17.1640519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1640708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1640920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1641132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1641319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int8 PASSED [ 61%] 2023-01-11T23:10:17.1641505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:10:17.1641682Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_bool PASSED [ 61%] 2023-01-11T23:10:17.1641875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex64 PASSED [ 61%] 2023-01-11T23:10:17.1642060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float16 PASSED [ 61%] 2023-01-11T23:10:17.1642244Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float32 PASSED [ 61%] 2023-01-11T23:10:17.1642425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_float64 PASSED [ 61%] 2023-01-11T23:10:17.1642609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1642796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1643008Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1643201Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1643380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1643564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1643747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1643934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1644123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1644307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1644491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1644685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1644871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1645047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1645229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1645409Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1645588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1645762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1645967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1646140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1646314Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1646485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1646646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1646817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1646988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1647172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1647346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1647518Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1647688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1647870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex128 PASSED 
[ 62%] 2023-01-11T23:10:17.1648048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1648210Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1648377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1648574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1648749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1648918Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1649092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1649262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1649431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1649597Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1649757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1649942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1650125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1650307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1650484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1650665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1650844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1651019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1651191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1651361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1651553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1651725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1651892Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1652066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1652242Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1652414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1652583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1652748Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1652921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1653093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1653261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1653425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1653594Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1653761Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1654450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1654844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1655071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1655283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1655455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1655626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1655796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1655970Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1656143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1656325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1656499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1656664Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1656833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1657001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1657181Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1657357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1657530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1657712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1657884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1658124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1658297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1658469Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1658665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1658858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1659058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1659255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1659450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1659639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1659828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1660011Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1660197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1660380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1660587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1660769Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1660946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1661142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1661341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1661507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1661669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1661840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1662022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1662193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1662364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1662534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1662698Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1662864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1663032Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1663193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float64 
PASSED [ 62%] 2023-01-11T23:10:17.1663361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1663550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1663722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1663889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1664066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1664239Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1664402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1664563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1664733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1664911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1665091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1665264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1665438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1665606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1665772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1665944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1666133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1666303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1666473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1666641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1666823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:10:17.1667008Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex128 PASSED [ 62%] 2023-01-11T23:10:17.1667192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1667373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1667557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1667732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float64 PASSED [ 62%] 2023-01-11T23:10:17.1667909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1668088Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1668266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int16 PASSED [ 62%] 2023-01-11T23:10:17.1668436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1668606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1668881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1669058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bool PASSED [ 62%] 2023-01-11T23:10:17.1669266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex32 PASSED [ 62%] 2023-01-11T23:10:17.1669444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex64 PASSED [ 62%] 2023-01-11T23:10:17.1669622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float16 PASSED [ 62%] 2023-01-11T23:10:17.1669800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float32 PASSED [ 62%] 2023-01-11T23:10:17.1669974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1670146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1670319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_uint8 PASSED [ 62%] 2023-01-11T23:10:17.1670495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1670676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int64 PASSED [ 62%] 2023-01-11T23:10:17.1670853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1671031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int32 PASSED [ 62%] 2023-01-11T23:10:17.1671215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int8 PASSED [ 62%] 2023-01-11T23:10:17.1671395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1671576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1671802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1671982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1672156Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1672328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1672519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1672699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1672886Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int8 
PASSED [ 63%] 2023-01-11T23:10:17.1673071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1673247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1673423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1673598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1673784Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_shapes_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1673974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1674158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1674335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1674524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1674733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1674919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1675103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1675287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1675464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1675644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1675826Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1676001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1676185Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1676363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1676541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1676717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1676889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1677059Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1677257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1677432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1677599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1677772Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1677941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1678108Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1678281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1678448Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1678621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1678795Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1678974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1679135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1679304Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1679473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1679648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1679815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1679988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1680160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1680350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1680523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1680697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1680869Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1681037Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1681213Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1681393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1681564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1681738Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1681915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1682081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1682251Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1682420Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int16 PASSED [ 63%] 
2023-01-11T23:10:17.1682589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1682781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1682951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1683139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1683317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1683503Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1683678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1683856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1684033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1684216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1684386Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1684559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1684731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1684898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1685064Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1685235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1685424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1685613Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1685798Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1686023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1686205Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1686386Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1686564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1686753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1686930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1687118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1687300Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1687479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1687655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1687835Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1688019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1688201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1688410Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1688583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1688763Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1688943Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1689120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1689299Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1689471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1689641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1689821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1689998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1690165Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1690361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1690554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1690725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1690894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1691062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1691234Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1691400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1691589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1691749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1691928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1692103Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1692281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1692455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1692628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1692804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1692980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1693142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1693309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_float64 PASSED [ 63%] 2023-01-11T23:10:17.1693479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1693653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1693823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1694020Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1694190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1694364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1694652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1694817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1695001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1695188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1695369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1695550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1695731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1695923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex128 PASSED [ 63%] 2023-01-11T23:10:17.1696106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1696290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1696462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1696640Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int8 PASSED [ 63%] 2023-01-11T23:10:17.1696824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1697002Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bool PASSED [ 63%] 2023-01-11T23:10:17.1697224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex32 PASSED [ 63%] 2023-01-11T23:10:17.1697407Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1697582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1697755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float32 PASSED [ 63%] 2023-01-11T23:10:17.1697928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1698092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1698287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 63%] 2023-01-11T23:10:17.1698480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex64 PASSED [ 63%] 2023-01-11T23:10:17.1698665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int32 PASSED [ 63%] 2023-01-11T23:10:17.1698849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int64 PASSED [ 63%] 2023-01-11T23:10:17.1699030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_uint8 PASSED [ 63%] 2023-01-11T23:10:17.1699207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float16 PASSED [ 63%] 2023-01-11T23:10:17.1699380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int16 PASSED [ 63%] 2023-01-11T23:10:17.1699550Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int32 PASSED [ 64%] 2023-01-11T23:10:17.1699745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int64 PASSED [ 64%] 2023-01-11T23:10:17.1699920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_int8 PASSED [ 64%] 2023-01-11T23:10:17.1700113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float16 PASSED [ 64%] 2023-01-11T23:10:17.1700303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float64 PASSED [ 64%] 2023-01-11T23:10:17.1700490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int64 PASSED [ 64%] 2023-01-11T23:10:17.1700677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_uint8 PASSED [ 64%] 2023-01-11T23:10:17.1700871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bfloat16 PASSED [ 64%] 2023-01-11T23:10:17.1701059Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_bool PASSED [ 64%] 2023-01-11T23:10:17.1701254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex32 XFAIL [ 64%] 2023-01-11T23:10:17.1701438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float32 PASSED [ 64%] 2023-01-11T23:10:17.1701628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float64 PASSED [ 64%] 2023-01-11T23:10:17.1701814Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1701996Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1702188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1702376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1702562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1702772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1702959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1703137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1703305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1703490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1703671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1703847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1704023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1704195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1704367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1704547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1704719Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1704893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1705067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1705265Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1705434Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1705609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1705781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1705952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1706124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1706287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1706509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1706734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1706955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1707804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1708982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 64%]
2023-01-11T23:10:17.1709592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1709775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1709949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1710119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1710290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1710454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1710623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1710816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1710983Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1711155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1711328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1711496Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1711668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1711838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1712000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1712180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1712349Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1712522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1712689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1712854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1713030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1713200Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1713374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1713545Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1713714Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1713909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1714083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1714254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1714422Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1714590Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1714761Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1714930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1715103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1715274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1715440Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1715625Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1715809Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1715997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1716177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1716356Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1716573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1716750Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1716923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1717093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1717267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1717439Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1717612Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1717785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1717957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1718125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1718294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1718465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1718633Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1718805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1718972Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1719146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1719322Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1719552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1719729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1719910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1720087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1720259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1720436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1720614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1720796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1720980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1721147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1721317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1721488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1721661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1721832Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1722040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1722219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1722399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1722572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1722737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1722910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1723096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:10:17.1723274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1723465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1723656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1723840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1724025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1724205Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1724374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1724551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1724725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_bool PASSED [ 64%]
2023-01-11T23:10:17.1724908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1725113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1725290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float32 PASSED [ 64%]
2023-01-11T23:10:17.1725466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1725643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1725817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int64 PASSED [ 64%]
2023-01-11T23:10:17.1725981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int8 PASSED [ 64%]
2023-01-11T23:10:17.1726155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1726336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex32 PASSED [ 64%]
2023-01-11T23:10:17.1726516Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1726692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float64 PASSED [ 64%]
2023-01-11T23:10:17.1726867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int16 PASSED [ 64%]
2023-01-11T23:10:17.1727038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int32 PASSED [ 64%]
2023-01-11T23:10:17.1727206Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_uint8 PASSED [ 64%]
2023-01-11T23:10:17.1727389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex128 PASSED [ 64%]
2023-01-11T23:10:17.1727593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex64 PASSED [ 64%]
2023-01-11T23:10:17.1727772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float16 PASSED [ 64%]
2023-01-11T23:10:17.1727950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1728128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1728303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1728473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1728649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1728826Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1729001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1729164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1729336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1729505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1729680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1729863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1730041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1730218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1730394Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1730565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1730762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1730946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1731123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1731301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1731490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1731676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1731863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1732041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1732220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1732388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1732565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1732740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1732917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1733118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1733293Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1733469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1733652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1733832Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1734000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1734176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1734347Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1734662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1734903Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1735094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1735274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1735450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1735628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1735797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1735974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1736166Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex32 PASSED [ 65%]
2023-01-11T23:10:17.1736350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1736586Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1736765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1736940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1737116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1737287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1737454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1737626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1737810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1737988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1738167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1738342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1738514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1738693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1738866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1739062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1739235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1739405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1739574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1739742Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1739919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1740093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1740264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1740431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1740606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1740777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1740948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1741127Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1741301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1741482Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1741659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1741834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1742024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1742198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1742371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1742541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1742717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1742891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1743067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1743237Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1743413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1743576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1743749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1743924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1744102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1744282Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1744479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1744652Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1744828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1745001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1745177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1745366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex128 PASSED [ 65%]
2023-01-11T23:10:17.1745551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_complex64 PASSED [ 65%]
2023-01-11T23:10:17.1745732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1745915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1746096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1746278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1746457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1746635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1746803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1746973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1747143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1747316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1747489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1747712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_bfloat16 SKIPPED (Skipped!) [ 65%]
2023-01-11T23:10:17.1747895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1748080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1748263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1748433Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1748603Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1748850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1749026Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1749197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1749369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1749539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1749709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1749870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1750038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1750237Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1750406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1750584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1750754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1750926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1751092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_frac_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1751261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1751424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1751600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1751770Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1751943Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1752111Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1752276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1752449Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1752619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1752785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1752945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int16 PASSED [ 65%]
2023-01-11T23:10:17.1753112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1753300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1753486Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:10:17.1753669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1753851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1754034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1754211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1754391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1754562Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bool PASSED [ 65%]
2023-01-11T23:10:17.1754744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float16 PASSED [ 65%]
2023-01-11T23:10:17.1754915Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float32 PASSED [ 65%]
2023-01-11T23:10:17.1755086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_float64 PASSED [ 65%]
2023-01-11T23:10:17.1755258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int32 PASSED [ 65%]
2023-01-11T23:10:17.1755432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int64 PASSED [ 65%]
2023-01-11T23:10:17.1755609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int8 PASSED [ 65%]
2023-01-11T23:10:17.1755807Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_uint8 PASSED [ 65%]
2023-01-11T23:10:17.1755973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1756158Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1756335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1756508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1756680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1756856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1757030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1757204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1757374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1757539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1757706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1757876Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1758041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1758216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1758392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igamma_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1758578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1758765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1758973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1759150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1759333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1759513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1759683Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1759858Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1760033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1760218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1760410Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1760592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1760766Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1760945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1761147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1761343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1761537Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1761711Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1761902Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1762087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1762266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1762442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1762628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1762808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1762984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1763162Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1763342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1763531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1763720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1763906Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1764084Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1764269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1764447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1764661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1764843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1765021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1765203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1765383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1765564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1765737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1765910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1766089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1766263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1766444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1766623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1766810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1766992Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1767195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1767363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1767542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1767717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1767894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1768071Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1768245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1768421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1768598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1768765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1768942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1769117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1769289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1769470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1769649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1769829Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1770025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1770202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1770405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1770581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1770753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1770931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1771102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1771274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1771454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1771630Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_bool PASSED [ 66%]
2023-01-11T23:10:17.1771812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1771982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1772153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1772325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1772499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1772675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1772871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1773045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1773219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1773389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1773552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1773721Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1773899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1774072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1774248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1774427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1774722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1774900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1775067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1775267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1775458Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1775646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1775843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1776066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1776250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1776435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1776617Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1776811Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1776995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1777184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1777379Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1777574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1777765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1777949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1778132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1778309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float32 PASSED [ 66%]
2023-01-11T23:10:17.1778485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1778687Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int16 XFAIL [ 66%]
2023-01-11T23:10:17.1778863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int32 XFAIL [ 66%]
2023-01-11T23:10:17.1779038Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int8 XFAIL [ 66%]
2023-01-11T23:10:17.1779218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1779398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1779573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1779747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1779931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1780110Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1780281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1780460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1780632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1780814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1780988Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1781160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1781333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1781509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1781697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1781870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1782042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_int8 PASSED [ 66%]
2023-01-11T23:10:17.1782221Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1782396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1782570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_complex64 PASSED [ 66%]
2023-01-11T23:10:17.1782744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int16 PASSED [ 66%]
2023-01-11T23:10:17.1782919Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int32 PASSED [ 66%]
2023-01-11T23:10:17.1783092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int64 PASSED [ 66%]
2023-01-11T23:10:17.1783257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_uint8 PASSED [ 66%]
2023-01-11T23:10:17.1783456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:10:17.1783656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 66%]
2023-01-11T23:10:17.1783854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 66%]
2023-01-11T23:10:17.1784051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float16 PASSED [ 66%]
2023-01-11T23:10:17.1784268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 66%]
2023-01-11T23:10:17.1784461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1784655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1784841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1785022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1785209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1785392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1785574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1785757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1785938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1786118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1786303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1786488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1786659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1786839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1787019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1787202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1787408Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1787596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1787776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1787957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1788138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1788309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int16 PASSED [ 67%]
2023-01-11T23:10:17.1788491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1788731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1788932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1789129Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1789307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1789481Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1789657Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int64 XFAIL [ 67%]
2023-01-11T23:10:17.1789833Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1790030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1790214Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1790397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1790576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1790754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1790926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1791096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1791283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1791462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1791641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1791827Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1792009Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1792188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1792371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1792547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1792723Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1792904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1793106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1793272Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1793446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1793620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1793791Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1793967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1794140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1794346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1794548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1794746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1794926Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int64 PASSED [ 67%]
2023-01-11T23:10:17.1795116Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1795318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1795513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1795747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1795949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1796146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1796340Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1796532Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1796710Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1796878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1797054Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1797233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1797418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1797601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex32 PASSED [ 67%]
2023-01-11T23:10:17.1797779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1797956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1798128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1798297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1798477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1798673Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex32 XFAIL [ 67%]
2023-01-11T23:10:17.1798851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1799024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1799197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1799374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1799554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int32 PASSED [ 67%]
2023-01-11T23:10:17.1799730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1799900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_uint8 PASSED [ 67%]
2023-01-11T23:10:17.1800087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bfloat16 PASSED [ 67%]
2023-01-11T23:10:17.1800269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_bool PASSED [ 67%]
2023-01-11T23:10:17.1800480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex64 PASSED [ 67%]
2023-01-11T23:10:17.1800694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float16 PASSED [ 67%]
2023-01-11T23:10:17.1800875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float32 PASSED [ 67%]
2023-01-11T23:10:17.1801056Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_float64 PASSED [ 67%]
2023-01-11T23:10:17.1801261Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int8 PASSED [ 67%]
2023-01-11T23:10:17.1801457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex128 PASSED [ 67%]
2023-01-11T23:10:17.1801631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex32 PASSED [ 67%]
2023-01-11T23:10:17.1801808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_complex64 PASSED [ 67%] 2023-01-11T23:10:17.1801984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1802154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1802325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1802517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1802708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1802882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1803055Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_bool PASSED [ 67%] 2023-01-11T23:10:17.1803224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1803396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1803566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1803734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1803899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_uint8 PASSED [ 67%] 2023-01-11T23:10:17.1804075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1804251Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1804455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1804630Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1804793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1804963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1805185Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1805859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int32 
SKIPPED (Expected: empty is not comparable) [ 67%] 2023-01-11T23:10:17.1806745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 67%] 2023-01-11T23:10:17.1807903Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bool PASSED [ 67%] 2023-01-11T23:10:17.1808087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1808268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1808445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1808620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1808793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1808958Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1809128Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_uint8 PASSED [ 67%] 2023-01-11T23:10:17.1809309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1809490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex64 PASSED [ 67%] 2023-01-11T23:10:17.1809671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1809868Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1810045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1810218Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int8 PASSED [ 67%] 2023-01-11T23:10:17.1810398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1810577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex128 PASSED [ 67%] 2023-01-11T23:10:17.1810790Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex32 PASSED [ 67%] 2023-01-11T23:10:17.1810990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1811176Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int16 PASSED [ 67%] 2023-01-11T23:10:17.1811352Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int32 PASSED [ 67%] 2023-01-11T23:10:17.1811522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int64 PASSED [ 67%] 2023-01-11T23:10:17.1811701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1811882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1812124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1812308Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1812528Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float64 PASSED [ 67%] 2023-01-11T23:10:17.1812767Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1813000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 67%] 2023-01-11T23:10:17.1813190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:10:17.1813380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float16 PASSED [ 67%] 2023-01-11T23:10:17.1813568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float32 PASSED [ 67%] 2023-01-11T23:10:17.1813756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_elu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1813948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1814137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1814318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1814610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1814810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1815006Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1815203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1815402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1815634Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1815827Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1816021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1816223Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1816431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1816636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1816836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1817030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1817219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1817411Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1817596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1817786Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1817974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1818183Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1818376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1818607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1818819Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1819029Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1819235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1819445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1819649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1819852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1820056Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1820253Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1820450Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1820651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1820853Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1821081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1821285Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1821488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1821680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1821870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1822456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1822661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1822865Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1823067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1823267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1823488Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1823693Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1823891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1824083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1824275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1824466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1824655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1824844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1825031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1825219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1825406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1825597Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1825782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1825961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1826155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1826362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1826587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1826794Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1826995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1827193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1827392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1827588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1827781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1827977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1828186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1828390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1828589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1828863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1829091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1829289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1829485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1829675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1829855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1830057Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1830253Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1830447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1830656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softshrink_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1830888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1831085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1831279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1831469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1831655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1831842Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1832053Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1832243Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1832437Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1832648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1832855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1833060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1833271Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1833452Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1833624Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1833800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1833976Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1839689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1839889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1840122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1840297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1840480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1840682Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1840855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1841024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1841203Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1841374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1841557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1841732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1841908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1842080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1842258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1842436Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1842619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1842799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1842974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1843142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1843338Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1843509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1843678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex32 XFAIL [ 68%] 2023-01-11T23:10:17.1843850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1844010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1844177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1844348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1844527Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex128 PASSED [ 68%] 2023-01-11T23:10:17.1844704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1844877Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1845050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1845220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1845380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int8 PASSED [ 
68%] 2023-01-11T23:10:17.1845548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1845754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1845928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1846104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1846279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1846456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex32 PASSED [ 68%] 2023-01-11T23:10:17.1846626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex64 PASSED [ 68%] 2023-01-11T23:10:17.1846799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1846964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1847138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1847307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1847493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1847677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_bool PASSED [ 68%] 2023-01-11T23:10:17.1847861Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float64 PASSED [ 68%] 2023-01-11T23:10:17.1848041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int16 PASSED [ 68%] 2023-01-11T23:10:17.1848216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int32 PASSED [ 68%] 2023-01-11T23:10:17.1848388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int64 PASSED [ 68%] 2023-01-11T23:10:17.1848561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_int8 PASSED [ 68%] 2023-01-11T23:10:17.1848735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_uint8 PASSED [ 68%] 2023-01-11T23:10:17.1848952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:10:17.1849134Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float16 PASSED [ 68%] 2023-01-11T23:10:17.1849313Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float32 PASSED [ 68%] 2023-01-11T23:10:17.1849492Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1849667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1849839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1850018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1850184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1850358Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1850539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1850736Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1850940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1851118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1851290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1851499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1851680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1851859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1852041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1852221Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1852396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1852570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1852739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1852914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1853097Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1853272Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1853435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1853606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1853780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1853949Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1854122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1854297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1854737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1854917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1855083Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1855250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1855415Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1855582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1855757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1855934Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1856106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1856275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1856442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1856604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1856775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1856941Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1857113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1857316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1857487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1857655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1857823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1857994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1858153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1858320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1858493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1858663Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1858829Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1859001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1859171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1859343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1859511Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1859674Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1859843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1860016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1860187Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1860396Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1860593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1860776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:10:17.1860945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1861113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1861283Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1861455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1861624Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1861797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1861967Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1862135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1862306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1862484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1862648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1862843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1863012Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1863180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1863349Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1863517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1863694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex32 PASSED [ 69%] 2023-01-11T23:10:17.1863866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1864036Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1864199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1864363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1864531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1864700Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1864889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1865085Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex128 PASSED [ 69%] 2023-01-11T23:10:17.1865279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_complex64 PASSED [ 69%] 2023-01-11T23:10:17.1865466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1865655Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1865859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1866051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1866235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1866418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1866601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1866781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1866964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1867142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int64 PASSED [ 69%] 2023-01-11T23:10:17.1867322Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int8 PASSED [ 69%] 2023-01-11T23:10:17.1867494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1867675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bool PASSED [ 69%] 2023-01-11T23:10:17.1867859Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float16 PASSED [ 69%] 2023-01-11T23:10:17.1868041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1868223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1868432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1868621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_float64 PASSED [ 69%] 2023-01-11T23:10:17.1868887Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1869073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1869246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float32 PASSED [ 69%] 2023-01-11T23:10:17.1869425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int16 PASSED [ 69%] 2023-01-11T23:10:17.1869604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int32 PASSED [ 69%] 2023-01-11T23:10:17.1869786Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_uint8 PASSED [ 69%] 2023-01-11T23:10:17.1869969Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1870756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1870936Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1871109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1871280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1871445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1871626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float64 PASSED [ 69%]
2023-01-11T23:10:17.1871803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1871982Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1872186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1872372Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1872557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1872739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1872940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bool PASSED [ 69%]
2023-01-11T23:10:17.1873136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1873341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 69%]
2023-01-11T23:10:17.1873539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1873735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1873932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1874114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bfloat16 PASSED [ 69%]
2023-01-11T23:10:17.1874297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1874479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1874683Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1874883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1875089Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1875288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1875499Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 69%]
2023-01-11T23:10:17.1875704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1875899Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1876104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1876305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1876502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1876706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1876900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1877100Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1877302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1877485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float16 PASSED [ 69%]
2023-01-11T23:10:17.1877691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float32 PASSED [ 69%]
2023-01-11T23:10:17.1877877Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int16 PASSED [ 69%]
2023-01-11T23:10:17.1878062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int32 PASSED [ 69%]
2023-01-11T23:10:17.1878247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int64 PASSED [ 69%]
2023-01-11T23:10:17.1878427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_int8 PASSED [ 69%]
2023-01-11T23:10:17.1878604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_uint8 PASSED [ 69%]
2023-01-11T23:10:17.1878787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1878971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1879169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1879374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1879568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1879764Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1879962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1880184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1880383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1880589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1880802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1880985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1881171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1881415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 70%]
2023-01-11T23:10:17.1881599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1881781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1881961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1882135Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1882306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1882481Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1882653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1882825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1883001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1883201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1883383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1883561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1883737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1883900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1884070Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1884241Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1884416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1884591Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1884774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1884957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1885131Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1885301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1885469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1885671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1885852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1886028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1886202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1886376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1886551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1886724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1886898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1887076Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1887256Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1887432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1887603Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1887779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1887953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1888122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1888294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1888469Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1888635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1888844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1889022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1889198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1889369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1889538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1889709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1889879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1890045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1890233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1890440Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1890643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1890821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1891000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1891177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1891375Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1891549Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1891718Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1891892Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1892064Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1892231Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1892400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1892573Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1892751Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1892924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1893096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1893259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1893427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1893594Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1893774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1893946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1894119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1894292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1894616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1894789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1894948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1895114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1895300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1895478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1895662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1895851Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1896031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1896208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1896370Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1896540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1896707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1896874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1897040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1897255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1897424Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1897601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1897776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1897940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1898109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1898279Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1898451Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1898635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1898812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1898996Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1899182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1899361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1899534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1899712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1899888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1900061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1900302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1900474Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1900646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1900818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1901000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1901170Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_indices_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1901341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1901517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1901692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1901863Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1902031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1902199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1902367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1902543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1902745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1902931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1903114Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1903294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1903473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1903651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1903828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1904004Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1904172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1904343Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1904519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int8 PASSED [ 70%]
2023-01-11T23:10:17.1904697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bfloat16 PASSED [ 70%]
2023-01-11T23:10:17.1904875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex64 PASSED [ 70%]
2023-01-11T23:10:17.1905047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float16 PASSED [ 70%]
2023-01-11T23:10:17.1905291Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1905461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_float64 PASSED [ 70%]
2023-01-11T23:10:17.1905636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int16 PASSED [ 70%]
2023-01-11T23:10:17.1905800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int32 PASSED [ 70%]
2023-01-11T23:10:17.1905999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int64 PASSED [ 70%]
2023-01-11T23:10:17.1906175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_uint8 PASSED [ 70%]
2023-01-11T23:10:17.1906348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bool PASSED [ 70%]
2023-01-11T23:10:17.1906532Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex128 PASSED [ 70%]
2023-01-11T23:10:17.1906717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex32 PASSED [ 70%]
2023-01-11T23:10:17.1906896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float32 PASSED [ 70%]
2023-01-11T23:10:17.1907081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1907255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1907425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1907612Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1907791Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1907977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1908160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1908341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1908547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1908804Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1908987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1909157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1909336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1909514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1909689Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1909864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1910040Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1910211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1910384Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1910559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1910735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1910920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1911104Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1911284Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1911461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1911670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1911841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1912013Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1912182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1912348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1912523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1912699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1912881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1913060Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1913233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1913404Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1913581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1913755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1913921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1914122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1914294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1914472Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1914648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1914819Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1914990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1915172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1915341Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1915518Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1915692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1915867Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1916039Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1916209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1916380Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1916553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1916733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1916907Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1917080Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1917280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1917454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1917625Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1917799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1917971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1918141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1918320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1918489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1918671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex32 PASSED [ 71%]
2023-01-11T23:10:17.1918845Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1919018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1919191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1919363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1919538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bool PASSED [ 71%]
2023-01-11T23:10:17.1919741Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1919911Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1920078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int32 PASSED [ 71%]
2023-01-11T23:10:17.1920254Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int8 PASSED [ 71%]
2023-01-11T23:10:17.1920454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1920649Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1920824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1920995Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int16 PASSED [ 71%]
2023-01-11T23:10:17.1921167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_uint8 PASSED [ 71%]
2023-01-11T23:10:17.1921364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1921558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1921739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 71%]
2023-01-11T23:10:17.1921928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_bfloat16 PASSED [ 71%]
2023-01-11T23:10:17.1922115Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1922301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1922489Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex128 PASSED [ 71%]
2023-01-11T23:10:17.1922677Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float16 PASSED [ 71%]
2023-01-11T23:10:17.1922883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1923061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int64 PASSED [ 71%]
2023-01-11T23:10:17.1923234Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923407Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923581Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1923753Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1923934Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924103Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924279Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1924471Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager__native_batch_norm_legit_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1924813Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_abs_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1924979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1925147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acos_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1925320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1925534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_complex64 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1925727Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addbmm_cuda_float32 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1925902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1926075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_complex64 XFAIL [ 71%]
2023-01-11T23:10:17.1926266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1926428Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1926614Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_complex64 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1926798Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addr_cuda_float32 SKIPPED (Skipped!) [ 71%]
2023-01-11T23:10:17.1926968Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927323Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_allclose_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1927498Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1927671Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_angle_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1927841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1928002Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_any_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1928173Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_arange_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1928401Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 71%]
2023-01-11T23:10:17.1928633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_complex64 XFAIL [ 71%]
2023-01-11T23:10:17.1928826Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_partial_views_cuda_float32 XFAIL [ 71%]
2023-01-11T23:10:17.1929001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929342Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asinh_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1929511Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan2_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1929678Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1929850Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930020Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atanh_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930194Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930397Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bernoulli_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1930598Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1930938Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bmm_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931105Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1931304Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931478Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1931648Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_byte_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931830Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1931999Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cat_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1932343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932514Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ceil_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1932685Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1932856Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933027Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933206Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933379Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933551Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1933718Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chunk_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1933891Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_min_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934064Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1934223Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clone_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934427Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_complex64 PASSED [ 71%]
2023-01-11T23:10:17.1934710Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1934882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_complex_cuda_float32 PASSED [ 71%]
2023-01-11T23:10:17.1935047Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935231Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1935409Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_physical_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935593Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1935771Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_contiguous_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1935938Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_copysign_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1936284Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1936464Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936631Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1936800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937013Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937184Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumsum_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1937349Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1937702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_embed_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1937873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938051Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1938221Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1938595Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_scatter_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938767Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1938932Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_digamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939101Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1939289Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_floor_rounding_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939476Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939648Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1939815Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_double_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1939989Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1940193Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dsplit_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1940364Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1940542Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:10:17.1940719Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1940883Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941055Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1941222Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_equal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941392Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfinv_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941572Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1941747Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_as_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1941910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:10:17.1942271Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1942443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942617Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1942812Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1942983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1943164Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943336Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943504Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1943673Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1943851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1944025Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944200Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1944561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944733Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1944908Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1945244Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945414Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945589Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfftn_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1945769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1945961Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946137Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1946485Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946643Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmin_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946808Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1946976Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:10:17.1947143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_cuda_float32 XFAIL [ 72%]
2023-01-11T23:10:17.1947316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1947490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1947664Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1947847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_grid_sampler_2d_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948008Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1948181Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1948374Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hsplit_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948543Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948785Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hypot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1948954Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_i0_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949121Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_igammac_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949459Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_imag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1949633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1949806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1949985Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_reduce_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950161Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950327Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950495Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1950664Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1950844Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951016Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951179Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isnan_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951355Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isneginf_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951521Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1951743Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1951928Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952128Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1952464Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:10:17.1952623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ldexp_cuda_float32 XFAIL [ 72%]
2023-01-11T23:10:17.1952791Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_le_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1952963Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1953134Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lerp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953302Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lgamma_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953480Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953666Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1953848Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954031Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954230Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvals_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1954430Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954606Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1954795Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1954981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955162Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1955343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1955518Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955713Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1955886Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1956360Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956548Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1956734Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1956915Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957107Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957503Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957679Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1957861Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_multi_dot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958248Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958428Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1958607Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958797Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1958973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1959148Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959323Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1959506Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_ex_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1959897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960078Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svdvals_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960627Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1960810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1960979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vecdot_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961159Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961335Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1961504Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log1p_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1961675Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1961841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1962009Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962185Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp2_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962363Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logaddexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962537Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logcumsumexp_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1962740Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_complex64 PASSED [ 72%]
2023-01-11T23:10:17.1962929Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logdet_cuda_float32 PASSED [ 72%]
2023-01-11T23:10:17.1963131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1963309Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_and_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1963499Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_float32 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:10:17.1963675Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1963847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_or_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964024Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logit_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964360Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_float32 XFAIL [ 73%]
2023-01-11T23:10:17.1964530Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lt_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1964699Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1964862Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965031Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965203Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1965367Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mH_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1965735Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1965915Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1966097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1966273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_fill_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966456Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_log_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logsumexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966815Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1966994Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_median_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1967176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967359Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_prod_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1967715Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1967895Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmin_cuda_float32 PASSED [ 73%]
2023-01-11T23:10:17.1968072Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1968250Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_complex64 PASSED [ 73%]
2023-01-11T23:10:17.1968428Z
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1968625Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1968841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_no_dim_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969017Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_maximum_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969189Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_median_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1969578Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1969745Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1969918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mode_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1970270Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_movedim_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1970662Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_multinomial_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1970848Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971210Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nan_to_num_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmean_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971591Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanquantile_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971762Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nansum_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1971941Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972106Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972275Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_batch_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_layer_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1972812Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1972984Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1973177Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_float32 SKIPPED (Skipped!) 
[ 73%] 2023-01-11T23:10:17.1973419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 73%] 2023-01-11T23:10:17.1973655Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 73%] 2023-01-11T23:10:17.1973820Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1973998Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1974176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974727Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1974932Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975509Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1975895Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976111Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976309Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1976507Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1976703Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1976902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977124Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_ctc_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977505Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977687Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_dropout_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1977873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 73%] 
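Note: the test_variant_consistency_eager_* cases above verify that the functional, method, and (where defined) in-place variants of each operator agree in eager mode. A minimal sketch of the idea follows, using torch.add as a stand-in; check_variant_consistency is a hypothetical helper for illustration only, not the actual TestCommonCUDA harness.

    import torch

    def check_variant_consistency(a, b):
        # Illustration: the three variants of an op should produce
        # identical results when run eagerly.
        functional = torch.add(a, b)     # functional variant: torch.add
        method = a.add(b)                # method variant: Tensor.add
        inplace = a.clone().add_(b)      # in-place variant: Tensor.add_ (mutates its input)
        assert torch.equal(functional, method)
        assert torch.equal(functional, inplace)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    check_variant_consistency(torch.randn(3, device=device),
                              torch.randn(3, device=device))
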
2023-01-11T23:10:17.1978303Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978503Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978686Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1978878Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_group_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979068Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardswish_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardtanh_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979644Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_instance_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1979847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980037Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1980252Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_layer_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980440Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980639Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1980832Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981029Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981221Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981408Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981598Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981786Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mish_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1981973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_mse_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982169Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982371Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_nll_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982773Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1982973Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983158Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983353Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983557Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1983756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pairwise_distance_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1983948Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984149Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1984337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_prelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984522Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_rrelu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_selu_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1984890Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1985073Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softplus_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1985264Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1985885Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986102Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1986291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986493Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986690Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_upsample_nearest_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1986858Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987027Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987201Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1987548Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_nuc_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987719Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1987896Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1988068Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_like_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988310Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1988472Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ormqr_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1988886Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1989061Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_permute_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1989265Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 73%] 2023-01-11T23:10:17.1989470Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 73%] 2023-01-11T23:10:17.1989638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1989811Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pow_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1989983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990147Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1990484Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rad2deg_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1990663Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1990831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_cuda_float32 XFAIL [ 73%] 2023-01-11T23:10:17.1991001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_complex64 XFAIL [ 73%] 2023-01-11T23:10:17.1991170Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_cuda_float32 XFAIL [ 73%] 2023-01-11T23:10:17.1991346Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991537Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991706Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1991890Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1992066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_remainder_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1992240Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_complex64 PASSED [ 73%] 2023-01-11T23:10:17.1992411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_cuda_float32 PASSED [ 73%] 2023-01-11T23:10:17.1992600Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1992789Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_repeat_interleave_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1992967Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993132Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1993305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993474Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize__cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1993651Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1993830Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994034Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994203Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994376Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1994546Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rot90_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1994721Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1994893Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995063Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) 
[ 74%] 2023-01-11T23:10:17.1995444Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_add_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1995801Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1995987Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_prod_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996170Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_sum_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996352Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_lengths_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996542Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_segment_reduce_offsets_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1996716Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1996888Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sgn_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997055Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997251Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1997421Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sign_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1997623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_bartlett_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1997817Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_blackman_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998015Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_cosine_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998222Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_exponential_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_gaussian_cuda_float32 SKIPPED (Skipped!) [ 74%] 2023-01-11T23:10:17.1998624Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hamming_cuda_float32 SKIPPED (Skipped!) 
[ 74%] 2023-01-11T23:10:17.1998796Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signbit_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1998965Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1999133Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999316Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_scatter_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999484Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.1999691Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.1999875Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_softmax_with_dtype_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000062Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000243Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_j1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000623Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000804Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_entr_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2000981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_erfcx_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001181Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_he_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001374Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i0e_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2001575Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1e_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002065Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 74%] 2023-01-11T23:10:17.2002255Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_i1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002638Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2002849Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003073Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003272Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003445Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_xlog1py_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003622Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_zeta_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003790Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2003973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004148Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sqrt_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2004490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004661Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2004827Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stack_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2004987Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2005152Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005325Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005507Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2005708Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_unbiased_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2005877Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006043Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006209Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sub_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2006718Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2006882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2007066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007403Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007567Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2007747Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2007919Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008095Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensordot_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2008267Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008434Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008596Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2008806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2008979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_to_sparse_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009145Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_topk_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009311Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009479Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009646Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2009831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010003Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010176Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010344Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triu_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010551Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2010742Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_true_divide_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2010909Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trunc_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011079Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2011272Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011440Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2011616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2011788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unsqueeze_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2011955Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012121Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012297Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012646Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2012823Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2012994Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013161Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vdot_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2013336Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013505Z 
test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2013683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_real_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2013853Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2014011Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014179Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014345Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2014720Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_complex64 PASSED [ 74%] 2023-01-11T23:10:17.2014889Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_xlogy_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015054Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015219Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2015390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015564Z test_ops.py::TestCompositeComplianceCUDA::test_backward_H_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015729Z test_ops.py::TestCompositeComplianceCUDA::test_backward_T_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2015905Z test_ops.py::TestCompositeComplianceCUDA::test_backward___radd___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016091Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmatmul___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016262Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmul___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016433Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rsub___cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016627Z test_ops.py::TestCompositeComplianceCUDA::test_backward__softmax_backward_data_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016800Z test_ops.py::TestCompositeComplianceCUDA::test_backward_abs_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2016976Z test_ops.py::TestCompositeComplianceCUDA::test_backward_acos_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017145Z test_ops.py::TestCompositeComplianceCUDA::test_backward_acosh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017346Z test_ops.py::TestCompositeComplianceCUDA::test_backward_add_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017526Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addcdiv_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017716Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_decomposed_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2017918Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmv_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018091Z test_ops.py::TestCompositeComplianceCUDA::test_backward_angle_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018270Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018464Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_partial_views_cuda_float32 XFAIL [ 74%] 2023-01-11T23:10:17.2018653Z test_ops.py::TestCompositeComplianceCUDA::test_backward_as_strided_scatter_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018821Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_asin_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2018993Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atan_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019166Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atanh_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019345Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_1d_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019521Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_3d_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_baddbmm_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2019877Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bernoulli_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020048Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bmm_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_tensors_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020422Z test_ops.py::TestCompositeComplianceCUDA::test_backward_broadcast_to_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020656Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cartesian_prod_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2020855Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cat_cuda_float32 PASSED [ 74%] 2023-01-11T23:10:17.2021030Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cdist_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021204Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ceil_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021378Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cfloat_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021555Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021731Z test_ops.py::TestCompositeComplianceCUDA::test_backward_chunk_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2021903Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_min_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022083Z test_ops.py::TestCompositeComplianceCUDA::test_backward_column_stack_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022269Z test_ops.py::TestCompositeComplianceCUDA::test_backward_combinations_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022441Z test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022625Z test_ops.py::TestCompositeComplianceCUDA::test_backward_conj_physical_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022802Z test_ops.py::TestCompositeComplianceCUDA::test_backward_contiguous_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2022975Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cosh_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023149Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023349Z test_ops.py::TestCompositeComplianceCUDA::test_backward_deg2rad_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023522Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023702Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagflat_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2023880Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024071Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_scatter_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024242Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dist_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024436Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_no_rounding_mode_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024609Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024783Z test_ops.py::TestCompositeComplianceCUDA::test_backward_double_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2024950Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erfc_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025124Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erfinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025300Z test_ops.py::TestCompositeComplianceCUDA::test_backward_exp2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025477Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_as_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025652Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025825Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2025998Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftn_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026171Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026342Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026513Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026717Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftshift_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2026898Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027075Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027252Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027425Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027599Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfftn_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fill_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2027940Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flipud_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028115Z test_ops.py::TestCompositeComplianceCUDA::test_backward_float_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028288Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028459Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmod_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028630Z test_ops.py::TestCompositeComplianceCUDA::test_backward_gather_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2028876Z test_ops.py::TestCompositeComplianceCUDA::test_backward_half_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029052Z test_ops.py::TestCompositeComplianceCUDA::test_backward_hstack_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029227Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_hypot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029424Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_add_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_copy_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029782Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_fill_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2029959Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_reduce_cuda_float32 XFAIL [ 75%] 2023-01-11T23:10:17.2030141Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_select_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_inner_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030484Z test_ops.py::TestCompositeComplianceCUDA::test_backward_kron_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030658Z test_ops.py::TestCompositeComplianceCUDA::test_backward_kthvalue_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2030829Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lerp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031006Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cond_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031184Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cross_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031365Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031558Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_det_singular_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eig_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2031909Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigh_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032111Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_householder_product_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032504Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_factor_ex_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032677Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2032884Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_qr_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033068Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033255Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_ex_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033450Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_solve_triangular_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svd_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033811Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorinv_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2033997Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vecdot_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034179Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vector_norm_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034354Z 
test_ops.py::TestCompositeComplianceCUDA::test_backward_log10_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034528Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log1p_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034699Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log2_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2034871Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035048Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logcumsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035432Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logit_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035613Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035786Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_solve_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2035957Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mH_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036126Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mT_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036311Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumsum_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logsumexp_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036683Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_mean_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2036872Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_median_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037063Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_normalize_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037243Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_scatter_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037426Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_select_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037611Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmin_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037793Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_sum_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2037971Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_var_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038147Z test_ops.py::TestCompositeComplianceCUDA::test_backward_matmul_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038324Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_binary_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038535Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038757Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_with_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2038926Z test_ops.py::TestCompositeComplianceCUDA::test_backward_median_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_no_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039296Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mode_cuda_float32 PASSED [ 75%] 2023-01-11T23:10:17.2039474Z test_ops.py::TestCompositeComplianceCUDA::test_backward_movedim_cuda_float32 PASSED [ 75%] 
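Note: the TestCompositeComplianceCUDA::test_backward_* cases in this block roughly check that composite operators remain well-behaved under tensor-subclass dispatch in the backward pass. To rerun a single case from this shard locally, standard pytest filtering works; the sketch below assumes a pytorch checkout with the tests at test/test_ops.py and uses the real pytest.main API.

    import pytest

    # -k selects tests whose id matches the expression; -v mirrors the
    # verbose per-test output seen in this log.
    pytest.main([
        "test/test_ops.py",
        "-v",
        "-k", "TestCompositeComplianceCUDA and test_backward_linalg_svd_cuda_float32",
    ])
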
2023-01-11T23:10:17.2039647Z test_ops.py::TestCompositeComplianceCUDA::test_backward_msort_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2039842Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040018Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmean_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040201Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanquantile_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040381Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nansum_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040589Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_batch_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040812Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_dropout_backward_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2040985Z test_ops.py::TestCompositeComplianceCUDA::test_backward_neg_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041194Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041398Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041630Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_alpha_dropout_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2041834Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042027Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042236Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042450Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042639Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_celu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2042829Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv1d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043024Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043228Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043441Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043647Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cosine_similarity_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2043838Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_ctc_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044035Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044225Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044420Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_dropout_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044614Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_elu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2044856Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045082Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045501Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045693Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_glu_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2045889Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_group_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046094Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardtanh_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046287Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_huber_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046491Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_instance_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_area_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2046905Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047113Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047319Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_kl_div_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047731Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_l1_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2047926Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_linear_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048121Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048317Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048515Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048719Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2048919Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049314Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mish_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049506Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_mse_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049714Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2049920Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050116Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_normalize_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050311Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_circular_cuda_float32 PASSED [ 75%]
2023-01-11T23:10:17.2050509Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_constant_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2050762Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_reflect_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2050992Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pad_replicate_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051199Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pairwise_distance_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051392Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pdist_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051584Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_prelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu6_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2051963Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_relu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052147Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_selu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_silu_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_tanhshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2052931Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_threshold_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053140Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053361Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053590Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053787Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_upsample_nearest_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2053963Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054150Z test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_number_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054324Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ormqr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054594Z test_ops.py::TestCompositeComplianceCUDA::test_backward_outer_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2054845Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pca_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:10:17.2055022Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polar_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055220Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055419Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055795Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_4_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2055970Z test_ops.py::TestCompositeComplianceCUDA::test_backward_put_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056150Z test_ops.py::TestCompositeComplianceCUDA::test_backward_quantile_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056329Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rad2deg_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056506Z test_ops.py::TestCompositeComplianceCUDA::test_backward_real_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056683Z test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2056906Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_as_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057079Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reshape_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057254Z test_ops.py::TestCompositeComplianceCUDA::test_backward_roll_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057431Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057621Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057811Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_neg_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2057988Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rsub_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058180Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058371Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_prod_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2058556Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_sum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058744Z test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_lengths_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2058944Z test_ops.py::TestCompositeComplianceCUDA::test_backward_segment_reduce_offsets_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059119Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sgn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059299Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059508Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sinc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059685Z test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2059866Z test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060036Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sort_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_erfcx_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060399Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i0e_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060579Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060760Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_i1e_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2060948Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_log_ndtr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061131Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061320Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_xlog1py_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061496Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061672Z test_ops.py::TestCompositeComplianceCUDA::test_backward_squeeze_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2061840Z test_ops.py::TestCompositeComplianceCUDA::test_backward_stack_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062031Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062204Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sub_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062375Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062550Z test_ops.py::TestCompositeComplianceCUDA::test_backward_t_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062724Z test_ops.py::TestCompositeComplianceCUDA::test_backward_take_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2062922Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tan_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063109Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tensor_split_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2063289Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tensordot_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063454Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tile_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063627Z test_ops.py::TestCompositeComplianceCUDA::test_backward_to_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063798Z test_ops.py::TestCompositeComplianceCUDA::test_backward_topk_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2063975Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trapezoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064159Z test_ops.py::TestCompositeComplianceCUDA::test_backward_true_divide_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trunc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064520Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unflatten_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064697Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unsqueeze_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2064861Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065054Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065231Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065410Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_copy_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065610Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065787Z test_ops.py::TestCompositeComplianceCUDA::test_backward_zero__cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2065974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___getitem___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066152Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rdiv___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066331Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmod___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066498Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmul___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066671Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rpow___cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2066870Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__softmax_backward_data_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_abs_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acos_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067401Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_add_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067580Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addbmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067758Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcdiv_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2067927Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068099Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addr_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068317Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_allclose_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2068491Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2068775Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_any_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069017Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_arange_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmax_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069442Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argsort_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069657Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argwhere_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2069833Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070035Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_partial_views_cuda_float32 XFAIL [ 76%]
2023-01-11T23:10:17.2070230Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_as_strided_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070407Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asin_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070590Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070767Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atanh_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2070948Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071311Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bernoulli_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071477Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bmm_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cartesian_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2071867Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cat_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2072080Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdist_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072290Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cfloat_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072505Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2072693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_solve_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2072878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_min_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073054Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clone_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073225Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073414Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_conj_physical_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073605Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_constant_pad_nd_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073777Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cos_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2073951Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cross_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumsum_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074306Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_deg2rad_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074488Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_embed_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074674Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2074843Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diff_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_digamma_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075220Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dist_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075413Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_floor_rounding_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075610Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_no_rounding_mode_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2075791Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_double_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076000Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2076215Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_empty_like_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2076387Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfc_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076562Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076737Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expm1_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2076913Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077093Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077272Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077452Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077658Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2077839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfftn_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078011Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft2_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078189Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfft_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flip_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078575Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_cuda_float32 SKIPPED (Does not support forward_ad) [ 76%]
2023-01-11T23:10:17.2078759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_float_power_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2078974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_divide_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2079150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmod_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079501Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frac_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079671Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_frexp_cuda_float32 PASSED [ 76%]
2023-01-11T23:10:17.2079878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_cuda_float32 SKIPPED (Does not support autograd) [ 76%]
2023-01-11T23:10:17.2080091Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_full_like_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2080273Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gather_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2080481Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_geqrf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2080664Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gradient_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2080945Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_grid_sampler_2d_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2081150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_gt_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2081361Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_histc_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2081532Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hsplit_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2081718Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_copy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2081897Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_put_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2082084Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_select_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2082294Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_int_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082503Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isclose_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082716Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isfinite_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2082926Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083130Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isinf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083328Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isnan_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083538Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isposinf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2083797Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084019Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084250Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084470Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_unary_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2084651Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kthvalue_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2084828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lgamma_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085016Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cond_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085211Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_singular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085396Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvalsh_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085599Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_householder_product_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2085787Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2086007Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_factor_ex_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086449Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_ldl_solve_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2086647Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2086850Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087042Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087420Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_power_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087609Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_multi_dot_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2087792Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088001Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088202Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_hermitian_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088585Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_triangular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088772Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vander_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2088975Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linspace_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2089151Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log10_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089326Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089552Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_with_dtype_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089741Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp2_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2089963Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logcumsumexp_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2090146Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logdet_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2090361Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_and_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2090608Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_not_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2090831Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_xor_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091007Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_solve_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091188Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_unpack_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amin_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2091589Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmax_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091804Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_argmin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2091994Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumprod_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092187Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logsumexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092372Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_mean_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092554Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_median_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092772Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_normalize_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2092956Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_var_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093137Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matmul_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093318Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_binary_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093514Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_no_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093710Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_reduction_with_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2093892Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_maximum_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_binary_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094256Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_no_dim_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_msort_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2094919Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mul_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095096Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mv_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095292Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095487Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095678Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2095908Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nan_to_num_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096092Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmean_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096266Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanmedian_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2096482Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_copy_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2096712Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_dropout_backward_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2096935Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_layer_norm_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2097141Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ne_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097363Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_strided_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097576Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_ones_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2097825Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2098034Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098243Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098445Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098657Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2098860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099093Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099337Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2099535Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_bilinear_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099744Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2099939Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_celu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100148Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100347Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100546Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_similarity_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100746Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cross_entropy_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2100971Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_ctc_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2101173Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101590Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_dropout_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2101824Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_bag_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2102021Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_embedding_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102241Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102442Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2102855Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 77%]
2023-01-11T23:10:17.2103047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_gelu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103239Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_glu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103438Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardswish_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103635Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardtanh_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2103838Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_area_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104257Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_linear_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104447Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_kl_div_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104644Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_leaky_relu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2104864Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_logsigmoid_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105074Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105274Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105488Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_cuda_float32 SKIPPED (Skipped!) [ 77%]
2023-01-11T23:10:17.2105693Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2105887Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mish_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106125Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2106315Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_nll_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106517Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_circular_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_constant_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2106914Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_reflect_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107142Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pdist_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2107368Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107557Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu6_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107754Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_rrelu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2107943Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_selu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108127Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_silu_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108525Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_threshold_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2108790Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109017Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109214Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_unfold_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109421Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_nearest_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109598Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_fro_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109781Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_inf_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2109953Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_nuc_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2110158Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2110371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2110585Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ormqr_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:10:17.2110793Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_permute_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2110990Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_1_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111186Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_3_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111362Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_positive_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111535Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_qr_cuda_float32 PASSED [ 77%]
2023-01-11T23:10:17.2111747Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2111958Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randint_like_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:10:17.2112166Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2112377Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_randn_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2112559Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reciprocal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2112740Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2112921Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113128Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize__cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2113364Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resize_as__cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2113541Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_roll_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113728Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_3_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2113922Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_neg_3_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114100Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsqrt_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114283Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_add_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114477Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114668Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_sum_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2114845Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115054Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_short_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2115233Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sigmoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115400Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sign_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2115627Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_bartlett_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2115857Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_exponential_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116088Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_hamming_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116316Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hamming_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116548Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signbit_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2116727Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2116902Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinc_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117092Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117278Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_with_dtype_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2117495Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_airy_ai_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2117717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y1_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2117956Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118144Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_erfcx_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2118379Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_h_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118612Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_hermite_polynomial_he_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2118796Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i0e_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2118980Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119193Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_i1e_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119377Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_log_ndtr_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2119610Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2119795Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtr_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2120009Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2120242Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2120686Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%]
2023-01-11T23:10:17.2121085Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%]
2023-01-11T23:10:17.2121313Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_spherical_bessel_j0_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:10:17.2121499Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_xlog1py_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2121675Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2121856Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_with_sizes_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122029Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sqrt_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122208Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stack_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122385Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122580Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_stft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2122835Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 78%]
2023-01-11T23:10:17.2123047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_symeig_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:10:17.2123217Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_t_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123404Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_along_dim_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123572Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_take_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tan_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2123929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensor_split_cuda_float32 XFAIL [ 78%]
2023-01-11T23:10:17.2124142Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_sparse_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:10:17.2124318Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trace_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124500Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_transpose_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124683Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapezoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2124875Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triangular_solve_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_triu_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125248Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_true_divide_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125423Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unbind_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125606Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_copy_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125781Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unfold_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2125955Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126144Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_unbiased_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126329Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_unbiased_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126515Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_complex_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126682Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2126857Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zero__cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127031Z test_ops.py::TestCompositeComplianceCUDA::test_operator_H_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127197Z test_ops.py::TestCompositeComplianceCUDA::test_operator_T_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127374Z test_ops.py::TestCompositeComplianceCUDA::test_operator___radd___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127550Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rdiv___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127723Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmul___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2127894Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rpow___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128066Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rsub___cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128257Z test_ops.py::TestCompositeComplianceCUDA::test_operator__softmax_backward_data_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128430Z test_ops.py::TestCompositeComplianceCUDA::test_operator_abs_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128629Z test_ops.py::TestCompositeComplianceCUDA::test_operator_acos_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128805Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addcmul_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2128979Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129170Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmm_decomposed_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129344Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addmv_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129520Z test_ops.py::TestCompositeComplianceCUDA::test_operator_allclose_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_angle_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2129861Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argmax_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130037Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argsort_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130214Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argwhere_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130393Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130581Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130762Z test_ops.py::TestCompositeComplianceCUDA::test_operator_asin_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2130964Z test_ops.py::TestCompositeComplianceCUDA::test_operator_asinh_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131149Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atan_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131349Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131529Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131708Z test_ops.py::TestCompositeComplianceCUDA::test_operator_baddbmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2131883Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bfloat16_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132059Z test_ops.py::TestCompositeComplianceCUDA::test_operator_block_diag_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bmm_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132402Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bool_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132592Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_shapes_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132786Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2132959Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bucketize_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_byte_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cartesian_prod_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133490Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133668Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cdouble_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2133840Z test_ops.py::TestCompositeComplianceCUDA::test_operator_char_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134016Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134191Z test_ops.py::TestCompositeComplianceCUDA::test_operator_chunk_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134365Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_max_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134646Z test_ops.py::TestCompositeComplianceCUDA::test_operator_combinations_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2134876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_complex_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135055Z test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135242Z test_ops.py::TestCompositeComplianceCUDA::test_operator_constant_pad_nd_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_copysign_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135600Z test_ops.py::TestCompositeComplianceCUDA::test_operator_count_nonzero_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135788Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cov_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2135965Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cross_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136136Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumprod_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136331Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumulative_trapezoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136505Z test_ops.py::TestCompositeComplianceCUDA::test_operator_deg2rad_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136676Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2136851Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagflat_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137036Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_copy_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137211Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diagonal_scatter_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137590Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diff_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_digamma_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2137943Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dist_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_no_rounding_mode_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138325Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_trunc_rounding_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138500Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dot_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138674Z test_ops.py::TestCompositeComplianceCUDA::test_operator_einsum_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2138892Z test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 78%]
2023-01-11T23:10:17.2139121Z test_ops.py::TestCompositeComplianceCUDA::test_operator_empty_like_cuda_float32 SKIPPED (Expected: empty_like is not comparable) [ 78%]
2023-01-11T23:10:17.2139290Z test_ops.py::TestCompositeComplianceCUDA::test_operator_equal_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139463Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erf_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139637Z test_ops.py::TestCompositeComplianceCUDA::test_operator_exp_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2139983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expm1_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140159Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140333Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140513Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140685Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2140902Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141077Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftn_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141257Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141434Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft2_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141607Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfft_cuda_float32 PASSED [ 78%]
2023-01-11T23:10:17.2141779Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_irfftn_cuda_float32
PASSED [ 78%] 2023-01-11T23:10:17.2141956Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft2_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142129Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfftn_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flatten_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142475Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flip_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142650Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fliplr_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2142827Z test_ops.py::TestCompositeComplianceCUDA::test_operator_float_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143006Z test_ops.py::TestCompositeComplianceCUDA::test_operator_float_power_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143178Z test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143363Z test_ops.py::TestCompositeComplianceCUDA::test_operator_floor_divide_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143562Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143732Z test_ops.py::TestCompositeComplianceCUDA::test_operator_frac_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2143899Z test_ops.py::TestCompositeComplianceCUDA::test_operator_full_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144075Z test_ops.py::TestCompositeComplianceCUDA::test_operator_full_like_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144248Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gather_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144423Z test_ops.py::TestCompositeComplianceCUDA::test_operator_geqrf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144595Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gt_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_half_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2144942Z test_ops.py::TestCompositeComplianceCUDA::test_operator_histc_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145115Z test_ops.py::TestCompositeComplianceCUDA::test_operator_i0_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145291Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145468Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_put_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145641Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isnan_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2145819Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isneginf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146026Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:10:17.2146238Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:10:17.2146413Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ldexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_le_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146786Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigh_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2146983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_householder_product_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2147159Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147341Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_inv_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147531Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147719Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2147900Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2148614Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_cuda_float32 SKIPPED (Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/91685 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure CI is not set and you are not using the flag --import-disabled-tests.) [ 79%] 2023-01-11T23:10:17.2148886Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149076Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149268Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_power_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149451Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2149869Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_multi_dot_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150050Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150422Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_hermitian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2150677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 79%] 2023-01-11T23:10:17.2150853Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_qr_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151037Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_slogdet_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151219Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151404Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svdvals_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151586Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vander_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151761Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log1p_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2151954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_with_dtype_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152135Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152318Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logcumsumexp_cuda_float32 PASSED 
[ 79%] 2023-01-11T23:10:17.2152499Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_and_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152683Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_or_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2152852Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logspace_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153054Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logsumexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_long_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153397Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153574Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153751Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lu_unpack_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2153919Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mT_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154102Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154279Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154465Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_argmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154651Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumprod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2154838Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logsumexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155019Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_median_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155205Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_normalize_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155387Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_prod_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155569Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_scatter_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155784Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_select_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2155961Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156140Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_std_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_sum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156494Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_var_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156671Z test_ops.py::TestCompositeComplianceCUDA::test_operator_matmul_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2156849Z test_ops.py::TestCompositeComplianceCUDA::test_operator_matrix_exp_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157026Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_binary_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157233Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157430Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_no_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157600Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_maximum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157790Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_no_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2157983Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_reduction_with_dim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158155Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158330Z test_ops.py::TestCompositeComplianceCUDA::test_operator_movedim_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158504Z test_ops.py::TestCompositeComplianceCUDA::test_operator_msort_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158682Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mul_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2158852Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mv_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159062Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159257Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159437Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmedian_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159619Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanquantile_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159792Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nansum_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2159969Z test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160144Z test_ops.py::TestCompositeComplianceCUDA::test_operator_narrow_cuda_float32 XFAIL [ 79%] 2023-01-11T23:10:17.2160334Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_batch_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160539Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_layer_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160702Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ne_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2160874Z test_ops.py::TestCompositeComplianceCUDA::test_operator_neg_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161050Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_ones_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161226Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_zeros_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161404Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nextafter_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161622Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2161856Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162060Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162264Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162446Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_celu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2162634Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv2d_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2162837Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163038Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163241Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cross_entropy_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163437Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_ctc_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163631Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout3d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2163822Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_elu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164225Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_bag_cuda_float32 SKIPPED (Allowed exemption) [ 79%] 2023-01-11T23:10:17.2164440Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_embedding_cuda_float32 SKIPPED (Allowed exemption) [ 79%] 2023-01-11T23:10:17.2164660Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2164908Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165099Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_glu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165291Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardtanh_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165494Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165687Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_huber_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2165883Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_instance_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166082Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_area_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166477Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_kl_div_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166673Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_layer_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2166876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_local_response_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167072Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167263Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool3d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167487Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2167677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mish_cuda_float32 PASSED [ 79%] 
2023-01-11T23:10:17.2167871Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_mse_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168066Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168272Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168465Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_nll_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168662Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_circular_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2168858Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_reflect_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169052Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pdist_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169256Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169444Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_prelu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169636Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu6_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2169817Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_selu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170001Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_silu_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170193Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170591Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softplus_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2170812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softshrink_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171005Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_softsign_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171208Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171432Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171629Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2171829Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_upsample_nearest_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2172012Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_inf_cuda_float32 PASSED [ 79%] 2023-01-11T23:10:17.2172203Z test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_number_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172378Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ormqr_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172553Z test_ops.py::TestCompositeComplianceCUDA::test_operator_outer_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172727Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polar_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2172924Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173122Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pow_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173522Z test_ops.py::TestCompositeComplianceCUDA::test_operator_prod_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173697Z test_ops.py::TestCompositeComplianceCUDA::test_operator_put_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2173875Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rand_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174052Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174230Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174410Z test_ops.py::TestCompositeComplianceCUDA::test_operator_remainder_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174677Z test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2174876Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resize__cuda_float32 SKIPPED (Allowed exception) [ 80%] 2023-01-11T23:10:17.2175084Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resize_as__cuda_float32 SKIPPED (Allowed exemption) [ 80%] 2023-01-11T23:10:17.2175265Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_neg_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_roll_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175614Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rot90_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175804Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2175978Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rsqrt_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scalar_tensor_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176350Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_sum_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176540Z test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_lengths_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176713Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sgn_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2176935Z test_ops.py::TestCompositeComplianceCUDA::test_operator_short_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177115Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177288Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sign_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_cosine_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177685Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_hamming_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2177877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hamming_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178071Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_hann_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178256Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_kaiser_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178431Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sinc_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178604Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sinh_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178793Z test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_with_dtype_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2178978Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_airy_ai_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179165Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179365Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2179793Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2179982Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_erfcx_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180192Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_he_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180403Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i0e_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180590Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180772Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_i1e_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2180972Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_laguerre_polynomial_l_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181346Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2181549Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181746Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_k1_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2181928Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtri_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2182305Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2182689Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:10:17.2182893Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_spherical_bessel_j0_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183081Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_xlog1py_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183283Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sqrt_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183461Z test_ops.py::TestCompositeComplianceCUDA::test_operator_stack_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183639Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2183827Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_std_mean_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184181Z test_ops.py::TestCompositeComplianceCUDA::test_operator_stft_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184345Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sub_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184530Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_to_size_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184706Z test_ops.py::TestCompositeComplianceCUDA::test_operator_symeig_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2184878Z test_ops.py::TestCompositeComplianceCUDA::test_operator_t_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185051Z test_ops.py::TestCompositeComplianceCUDA::test_operator_take_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185222Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tan_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tensor_split_cuda_float32 XFAIL [ 80%] 2023-01-11T23:10:17.2185580Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tensordot_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185744Z test_ops.py::TestCompositeComplianceCUDA::test_operator_topk_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2185953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_transpose_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186141Z test_ops.py::TestCompositeComplianceCUDA::test_operator_triangular_solve_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_triu_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186497Z test_ops.py::TestCompositeComplianceCUDA::test_operator_true_divide_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186672Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trunc_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2186844Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unbind_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187022Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unflatten_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_copy_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187371Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unfold_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187565Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_consecutive_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187741Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unique_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2187918Z test_ops.py::TestCompositeComplianceCUDA::test_operator_unsqueeze_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188089Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188279Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_unbiased_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188452Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vdot_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188638Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_complex_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2188878Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_as_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189062Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_view_copy_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_where_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189624Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zeros_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:10:17.2189781Z test_ops.py::TestMathBitsCUDA::test_conj_view_H_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2189939Z test_ops.py::TestMathBitsCUDA::test_conj_view___rsub___cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190097Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_T_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190280Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bfloat16_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190451Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cdouble_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190626Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_chalf_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190801Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_char_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2190976Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_double_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191145Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_float_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191314Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_half_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191479Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_long_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191647Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_short_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191799Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_abs_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2191984Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acosh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192152Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_allclose_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192365Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 80%] 2023-01-11T23:10:17.2192525Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atanh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192690Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_2d_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2192854Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_3d_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193022Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_to_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193180Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_chunk_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193333Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_clone_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193496Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_column_stack_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193657Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2193827Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_conj_physical_cuda_complex64 PASSED [ 80%] 
2023-01-11T23:10:17.2194000Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_constant_pad_nd_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194160Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cos_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194319Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cosh_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194477Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cumsum_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194642Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194803Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_copy_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2194991Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195155Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dsplit_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195319Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_dstack_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195484Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_as_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195644Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195807Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2195969Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftn_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196130Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196289Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196449Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196609Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftn_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196773Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2196940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_float_power_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197101Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hsplit_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197261Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_imag_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197419Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_add_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197605Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isreal_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197765Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_lerp_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2197941Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198108Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198278Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svdvals_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198449Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_vector_norm_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198606Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log10_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198768Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log1p_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2198922Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log2_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199099Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199268Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_and_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199434Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_or_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199596Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_masked_fill_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199783Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2199945Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_movedim_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200103Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_neg_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200309Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 80%] 2023-01-11T23:10:17.2200527Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 80%] 2023-01-11T23:10:17.2200719Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_zeros_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2200898Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 80%] 2023-01-11T23:10:17.2201093Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201283Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201473Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201665Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2201852Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202046Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202198Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ones_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2202363Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_positive_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202524Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ravel_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202685Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_real_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2202850Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reciprocal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203012Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_repeat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203204Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_as_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203367Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_reshape_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203522Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_roll_cuda_complex64 PASSED [ 
81%] 2023-01-11T23:10:17.2203681Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rot90_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203839Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsqrt_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2203998Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sgn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204150Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204307Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinc_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204461Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204656Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204815Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sub_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2204965Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_t_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205123Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205281Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_trace_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205447Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_transpose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205612Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_true_divide_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205773Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unbind_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2205946Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206106Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unfold_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206289Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206448Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206607Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2206764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_zeros_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2206918Z test_ops.py::TestMathBitsCUDA::test_conj_view_abs_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207070Z test_ops.py::TestMathBitsCUDA::test_conj_view_acos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207225Z test_ops.py::TestMathBitsCUDA::test_conj_view_add_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207384Z test_ops.py::TestMathBitsCUDA::test_conj_view_addbmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207544Z test_ops.py::TestMathBitsCUDA::test_conj_view_addcdiv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207695Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2207856Z test_ops.py::TestMathBitsCUDA::test_conj_view_allclose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208006Z test_ops.py::TestMathBitsCUDA::test_conj_view_any_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208158Z test_ops.py::TestMathBitsCUDA::test_conj_view_asin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208312Z test_ops.py::TestMathBitsCUDA::test_conj_view_atan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208465Z test_ops.py::TestMathBitsCUDA::test_conj_view_atanh_cuda_complex64 
PASSED [ 81%] 2023-01-11T23:10:17.2208623Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208805Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_2d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2208952Z test_ops.py::TestMathBitsCUDA::test_conj_view_atleast_3d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209110Z test_ops.py::TestMathBitsCUDA::test_conj_view_bfloat16_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209263Z test_ops.py::TestMathBitsCUDA::test_conj_view_bmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209412Z test_ops.py::TestMathBitsCUDA::test_conj_view_bool_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209582Z test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209734Z test_ops.py::TestMathBitsCUDA::test_conj_view_byte_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2209892Z test_ops.py::TestMathBitsCUDA::test_conj_view_cfloat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210047Z test_ops.py::TestMathBitsCUDA::test_conj_view_chalf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210195Z test_ops.py::TestMathBitsCUDA::test_conj_view_char_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210378Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210557Z test_ops.py::TestMathBitsCUDA::test_conj_view_chunk_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210732Z test_ops.py::TestMathBitsCUDA::test_conj_view_conj_physical_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2210899Z test_ops.py::TestMathBitsCUDA::test_conj_view_constant_pad_nd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211051Z test_ops.py::TestMathBitsCUDA::test_conj_view_cosh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211214Z test_ops.py::TestMathBitsCUDA::test_conj_view_count_nonzero_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211364Z test_ops.py::TestMathBitsCUDA::test_conj_view_cov_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211512Z test_ops.py::TestMathBitsCUDA::test_conj_view_cumsum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211690Z test_ops.py::TestMathBitsCUDA::test_conj_view_cumulative_trapezoid_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2211840Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212014Z test_ops.py::TestMathBitsCUDA::test_conj_view_diff_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212187Z test_ops.py::TestMathBitsCUDA::test_conj_view_div_no_rounding_mode_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212344Z test_ops.py::TestMathBitsCUDA::test_conj_view_double_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212500Z test_ops.py::TestMathBitsCUDA::test_conj_view_dsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212657Z test_ops.py::TestMathBitsCUDA::test_conj_view_dstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212804Z test_ops.py::TestMathBitsCUDA::test_conj_view_einsum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2212973Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_cuda_complex64 SKIPPED (Skipped!) [ 81%] 2023-01-11T23:10:17.2213152Z test_ops.py::TestMathBitsCUDA::test_conj_view_empty_like_cuda_complex64 SKIPPED (Skipped!) 
[ 81%] 2023-01-11T23:10:17.2213307Z test_ops.py::TestMathBitsCUDA::test_conj_view_equal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213463Z test_ops.py::TestMathBitsCUDA::test_conj_view_expand_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213630Z test_ops.py::TestMathBitsCUDA::test_conj_view_eye_cuda_complex64 SKIPPED (Skipped!) [ 81%] 2023-01-11T23:10:17.2213791Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2213953Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214115Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214265Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214424Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214703Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2214873Z test_ops.py::TestMathBitsCUDA::test_conj_view_flatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215033Z test_ops.py::TestMathBitsCUDA::test_conj_view_fliplr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215197Z test_ops.py::TestMathBitsCUDA::test_conj_view_float_power_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215348Z test_ops.py::TestMathBitsCUDA::test_conj_view_full_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2215502Z test_ops.py::TestMathBitsCUDA::test_conj_view_half_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215651Z test_ops.py::TestMathBitsCUDA::test_conj_view_hsplit_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215803Z test_ops.py::TestMathBitsCUDA::test_conj_view_imag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2215960Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216124Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_put_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216279Z test_ops.py::TestMathBitsCUDA::test_conj_view_isinf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216434Z test_ops.py::TestMathBitsCUDA::test_conj_view_isnan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216591Z test_ops.py::TestMathBitsCUDA::test_conj_view_isreal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216747Z test_ops.py::TestMathBitsCUDA::test_conj_view_istft_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2216920Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_2inputs_2outputs_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217099Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_return_by_ref_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217262Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_unary_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2217416Z test_ops.py::TestMathBitsCUDA::test_conj_view_kron_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217572Z test_ops.py::TestMathBitsCUDA::test_conj_view_lerp_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217734Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cond_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2217942Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eig_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218103Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218268Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvals_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:10:17.2218421Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218590Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218758Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2218924Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219108Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219272Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219444Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219614Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_power_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219786Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2219951Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_multi_dot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220112Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220297Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_norm_subgradients_at_zero_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220490Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220655Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_ex_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220833Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_solve_triangular_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2220992Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221157Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorinv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221319Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_tensorsolve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221482Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vander_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221645Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vecdot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221802Z test_ops.py::TestMathBitsCUDA::test_conj_view_log1p_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2221958Z test_ops.py::TestMathBitsCUDA::test_conj_view_log2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222111Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222270Z test_ops.py::TestMathBitsCUDA::test_conj_view_logdet_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222434Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_and_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222589Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_not_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222748Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_or_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2222907Z test_ops.py::TestMathBitsCUDA::test_conj_view_logical_xor_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223069Z test_ops.py::TestMathBitsCUDA::test_conj_view_logspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:10:17.2223221Z test_ops.py::TestMathBitsCUDA::test_conj_view_long_cuda_complex64 
PASSED [ 81%] 2023-01-11T23:10:17.2223383Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_solve_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223540Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_unpack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223719Z test_ops.py::TestMathBitsCUDA::test_conj_view_mH_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2223875Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224040Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_normalize_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224205Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_select_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224362Z test_ops.py::TestMathBitsCUDA::test_conj_view_matmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224539Z test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224693Z test_ops.py::TestMathBitsCUDA::test_conj_view_mm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224844Z test_ops.py::TestMathBitsCUDA::test_conj_view_mul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2224994Z test_ops.py::TestMathBitsCUDA::test_conj_view_mv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225150Z test_ops.py::TestMathBitsCUDA::test_conj_view_ne_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225364Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_strided_cuda_complex64 SKIPPED (Expected: new_empty_strided is not comparable) [ 81%] 2023-01-11T23:10:17.2225523Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_ones_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225706Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2225886Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_circular_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226066Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_reflect_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226280Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226469Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226646Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_tanhshrink_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2226835Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 81%] 2023-01-11T23:10:17.2227030Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227201Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_unfold_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227360Z test_ops.py::TestMathBitsCUDA::test_conj_view_nonzero_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227516Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_inf_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227669Z test_ops.py::TestMathBitsCUDA::test_conj_view_ones_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2227828Z test_ops.py::TestMathBitsCUDA::test_conj_view_ormqr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2227993Z test_ops.py::TestMathBitsCUDA::test_conj_view_pinverse_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228146Z test_ops.py::TestMathBitsCUDA::test_conj_view_prod_cuda_complex64 PASSED [ 82%] 
2023-01-11T23:10:17.2228291Z test_ops.py::TestMathBitsCUDA::test_conj_view_put_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228464Z test_ops.py::TestMathBitsCUDA::test_conj_view_rand_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2228711Z test_ops.py::TestMathBitsCUDA::test_conj_view_randn_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2228956Z test_ops.py::TestMathBitsCUDA::test_conj_view_randn_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229149Z test_ops.py::TestMathBitsCUDA::test_conj_view_ravel_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229353Z test_ops.py::TestMathBitsCUDA::test_conj_view_renorm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229515Z test_ops.py::TestMathBitsCUDA::test_conj_view_resize_as__cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229719Z test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_conj_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2229867Z test_ops.py::TestMathBitsCUDA::test_conj_view_roll_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230019Z test_ops.py::TestMathBitsCUDA::test_conj_view_rsub_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230178Z test_ops.py::TestMathBitsCUDA::test_conj_view_sigmoid_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230328Z test_ops.py::TestMathBitsCUDA::test_conj_view_sin_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230482Z test_ops.py::TestMathBitsCUDA::test_conj_view_sinh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230666Z test_ops.py::TestMathBitsCUDA::test_conj_view_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 82%] 2023-01-11T23:10:17.2230824Z test_ops.py::TestMathBitsCUDA::test_conj_view_sqrt_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2230981Z test_ops.py::TestMathBitsCUDA::test_conj_view_squeeze_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231133Z test_ops.py::TestMathBitsCUDA::test_conj_view_stack_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2231276Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231440Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231612Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_mean_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231760Z test_ops.py::TestMathBitsCUDA::test_conj_view_sub_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2231910Z test_ops.py::TestMathBitsCUDA::test_conj_view_sum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232074Z test_ops.py::TestMathBitsCUDA::test_conj_view_sum_to_size_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232263Z test_ops.py::TestMathBitsCUDA::test_conj_view_take_along_dim_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232411Z test_ops.py::TestMathBitsCUDA::test_conj_view_take_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232555Z test_ops.py::TestMathBitsCUDA::test_conj_view_tan_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232719Z test_ops.py::TestMathBitsCUDA::test_conj_view_tensor_split_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2232867Z test_ops.py::TestMathBitsCUDA::test_conj_view_tile_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233017Z test_ops.py::TestMathBitsCUDA::test_conj_view_to_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233173Z test_ops.py::TestMathBitsCUDA::test_conj_view_trace_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233327Z test_ops.py::TestMathBitsCUDA::test_conj_view_trapz_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233493Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_triangular_solve_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233646Z test_ops.py::TestMathBitsCUDA::test_conj_view_tril_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233788Z test_ops.py::TestMathBitsCUDA::test_conj_view_triu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2233952Z test_ops.py::TestMathBitsCUDA::test_conj_view_true_divide_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234113Z test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234270Z test_ops.py::TestMathBitsCUDA::test_conj_view_unfold_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234430Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234599Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_mean_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2234758Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:10:17.2235077Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_as_real_cuda_complex64 SKIPPED (Operation doesn't support conjugated inputs.) [ 82%] 2023-01-11T23:10:17.2235233Z test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:10:17.2235383Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_H_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235563Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_T_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235734Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___getitem___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2235898Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___radd___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236061Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rsub___cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236224Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_T_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236411Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bfloat16_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236594Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_bool_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236766Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_byte_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2236957Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cdouble_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237138Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_chalf_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237320Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_float_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237498Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_long_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237682Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_short_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2237848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_abs_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238037Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acos_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238199Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_add_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238362Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcdiv_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238524Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addr_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238683Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_all_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2238898Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_cuda_complex128 SKIPPED (Errors when storage_offset is included) [ 82%] 2023-01-11T23:10:17.2239079Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_scatter_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239245Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asinh_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239409Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239579Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_1d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239753Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_2d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2239917Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atleast_3d_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240082Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_chunk_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_clone_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240420Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_column_stack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240582Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240764Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_constant_pad_nd_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2240934Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_contiguous_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241120Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cos_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241280Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241454Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diag_embed_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241622Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_scatter_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2241967Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dsplit_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242135Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_dstack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242341Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 82%] 2023-01-11T23:10:17.2242505Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eq_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242668Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_exp_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_as_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2242995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_expand_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243173Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_eye_cuda_complex128 SKIPPED (Skipped!) 
[ 82%] 2023-01-11T23:10:17.2243340Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243504Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fft_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243695Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2243866Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfftn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifft_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244352Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244521Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244684Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fill_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2244851Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flatten_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245013Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flip_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245180Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fliplr_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245349Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_flipud_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245520Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_float_power_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245677Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hstack_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2245840Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_imag_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_fill_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246185Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_select_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246353Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isclose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246519Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isinf_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246707Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isnan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2246875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_norm_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247052Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_vector_norm_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247211Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linspace_cuda_complex128 XFAIL [ 82%] 2023-01-11T23:10:17.2247368Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log10_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247531Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log1p_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247696Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log2_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2247858Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248029Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_or_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248354Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ne_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248581Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 82%] 2023-01-11T23:10:17.2248741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_full_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2248909Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_zeros_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249125Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249318Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249518Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249681Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_pow_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2249845Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ravel_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250005Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_real_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250177Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reciprocal_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250340Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_as_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250510Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_reshape_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250673Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_roll_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250834Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsub_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2250995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sgn_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251162Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sigmoid_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251319Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sin_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinc_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251654Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_softmax_with_dtype_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251818Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sqrt_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2251982Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_squeeze_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252172Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_mean_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252344Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_to_size_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tan_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2252668Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tanh_cuda_complex128 PASSED [ 82%] 
2023-01-11T23:10:17.2252830Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_trace_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253003Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_transpose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253160Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unbind_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253334Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unflatten_cuda_complex128 PASSED [ 82%] 2023-01-11T23:10:17.2253508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253671Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unfold_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_unsqueeze_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2253999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254164Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_var_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254333Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_as_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254604Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_view_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254817Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_abs_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2254978Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acos_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255139Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_acosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255304Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcdiv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255466Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addcmul_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255642Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_decomposed_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addr_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2255960Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_angle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256115Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_argwhere_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256280Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256501Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_partial_views_cuda_complex128 SKIPPED (Test changes in memory layout) [ 83%] 2023-01-11T23:10:17.2256675Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_as_strided_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atanh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2256994Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_1d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257156Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_baddbmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257483Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257637Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_byte_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2257797Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258043Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cfloat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258208Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chunk_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258370Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258538Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_conj_physical_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258702Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_contiguous_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2258864Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_corrcoef_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259024Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_count_nonzero_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259184Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cross_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259343Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259527Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumulative_trapezoid_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2259859Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260017Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dist_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260191Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_div_no_rounding_mode_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260347Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260535Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260698Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_einsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2260902Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_equal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261067Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_as_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261231Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261392Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261553Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261720Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2261872Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262038Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flatten_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262198Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flip_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262364Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_flipud_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262523Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262694Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_float_power_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2262852Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2263013Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_full_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263170Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gradient_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263328Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_half_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263485Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_imag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2263648Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269343Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_fill_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269600Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_put_cuda_complex128 SKIPPED (Operation not tested with tensors with negative bit.) [ 83%] 2023-01-11T23:10:17.2269774Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_select_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2269939Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_inner_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270099Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_int_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270277Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isclose_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270472Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isinf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270636Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isreal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2270820Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_2inputs_2outputs_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_4inputs_with_extra_args_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271182Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271367Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_binary_return_by_ref_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_jiterator_unary_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2271699Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_kron_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2271877Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ldexp_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2272045Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cond_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272218Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272391Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvalsh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272581Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_householder_product_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272755Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2272929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273100Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273267Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_solve_cuda_complex128 PASSED [ 
83%] 2023-01-11T23:10:17.2273443Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273614Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273803Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_rank_hermitian_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2273969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274135Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_hermitian_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274553Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_pinv_singular_cuda_complex128 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 83%] 2023-01-11T23:10:17.2274729Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_slogdet_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2274890Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275080Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275255Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorinv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275434Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_tensorsolve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275603Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vander_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2275768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2275929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log1p_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276094Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276275Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276438Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_and_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276600Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2276762Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_long_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2276921Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mH_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277073Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mT_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277245Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumprod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277412Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277610Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_normalize_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277769Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2277937Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_std_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278103Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_sum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278264Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_var_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278424Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matrix_exp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278610Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2278936Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279116Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:10:17.2279342Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_empty_strided_cuda_complex128 SKIPPED (Expected: new_empty_strided is not comparable) [ 83%] 2023-01-11T23:10:17.2279508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_zeros_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279690Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2279881Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280275Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_linear_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280486Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_normalize_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280722Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_replicate_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2280912Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281091Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_shuffle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281281Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pixel_unshuffle_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281464Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_tanhshrink_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281655Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2281837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_unfold_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282001Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282167Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_inf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282333Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_nuc_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282490Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2282646Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ones_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282810Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_permute_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2282973Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pinverse_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283134Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_put_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283482Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rand_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283644Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:10:17.2283811Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_randn_like_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2283964Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ravel_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284124Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_real_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284288Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reciprocal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284451Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2284674Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize__cuda_complex128 SKIPPED (Operation not tested with tensors with negative bit.) [ 83%] 2023-01-11T23:10:17.2284848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_conj_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285015Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resolve_neg_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285177Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rot90_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285332Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsub_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285508Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scalar_tensor_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:10:17.2285675Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285838Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_scatter_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2285995Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:10:17.2286155Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinc_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286308Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sinh_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286493Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_slice_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286671Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_softmax_with_dtype_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2286855Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sparse_sampled_addmm_cuda_complex128 SKIPPED (Skipped!) 
[ 84%] 2023-01-11T23:10:17.2287023Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_list_args_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287195Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_with_sizes_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287356Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287518Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287686Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2287846Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_stft_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288002Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sub_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288157Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288316Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sum_to_size_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_symeig_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288638Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_t_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288798Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tan_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2288986Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289154Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_to_sparse_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trace_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289482Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_transpose_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289638Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unbind_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289803Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unflatten_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2289970Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_copy_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290132Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unfold_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290681Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2290836Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_cuda_complex128 XFAIL [ 84%] 2023-01-11T23:10:17.2291000Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zeros_like_cuda_complex128 PASSED [ 84%] 2023-01-11T23:10:17.2291151Z test_ops.py::TestMathBitsCUDA::test_neg_view___getitem___cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291306Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmod___cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291479Z test_ops.py::TestMathBitsCUDA::test_neg_view__native_batch_norm_legit_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291655Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bfloat16_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2291829Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_bool_cuda_float64 PASSED [ 84%] 
2023-01-11T23:10:17.2292006Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cdouble_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292201Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_chalf_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292379Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_complex_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292543Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_double_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292715Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_float_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2292882Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_long_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293041Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_abs_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293204Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acos_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293359Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_acosh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293514Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_add_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293676Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcdiv_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293826Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addcmul_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2293985Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amin_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2294142Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_any_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2294297Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_arange_cuda_float64 XFAIL [ 84%] 2023-01-11T23:10:17.2294760Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 84%] 2023-01-11T23:10:17.2295110Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_partial_views_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 84%] 2023-01-11T23:10:17.2295273Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atanh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295443Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_1d_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295606Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_2d_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295772Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2295934Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_bucketize_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296091Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_chunk_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296257Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_min_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296415Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clone_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296584Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_constant_pad_nd_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296748Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_contiguous_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2296915Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_copysign_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297070Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cos_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297221Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cosh_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297384Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_embed_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297554Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_scatter_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297729Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_no_rounding_mode_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2297903Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_trunc_rounding_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298106Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_like_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:10:17.2298299Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eq_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298459Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298607Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_exp_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298770Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2298927Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expm1_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299094Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_eye_cuda_float64 SKIPPED (Skipped!) [ 84%] 2023-01-11T23:10:17.2299260Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftshift_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299423Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299594Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftshift_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299760Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2299922Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300077Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300263Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300450Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300607Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_rfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300790Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flip_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2300950Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flipud_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301109Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_gt_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301267Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_heaviside_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301419Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hypot_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301571Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_i0_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301728Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2301886Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_igammac_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302043Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_add_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302209Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302371Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_select_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302531Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isclose_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302683Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isfinite_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2302845Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isposinf_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303005Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isreal_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303159Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_le_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303318Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lgamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303489Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_matrix_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303655Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2303816Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log10_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304005Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log2_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304154Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304330Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304493Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_or_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304656Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_masked_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304814Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mean_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2304993Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305178Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305341Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_minimum_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305490Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_movedim_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305645Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_mul_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305809Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_copy_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2305977Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_native_layer_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306134Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_neg_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306350Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:10:17.2306596Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_full_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2306866Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 84%] 2023-01-11T23:10:17.2307049Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_gelu_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307221Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_group_norm_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307400Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307577Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307768Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2307943Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mish_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308125Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308314Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308505Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308762Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softplus_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2308937Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softshrink_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309123Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309310Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309474Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ones_cuda_float64 XFAIL [ 84%] 2023-01-11T23:10:17.2309634Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_permute_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309837Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_positive_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2309998Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_pow_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310187Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_randn_cuda_float64 SKIPPED (Test expects tensor input) [ 84%] 2023-01-11T23:10:17.2310340Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ravel_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310506Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reciprocal_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310668Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_remainder_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310830Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_repeat_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2310995Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_as_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311156Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_reshape_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311316Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_round_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311474Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sgn_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311636Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sigmoid_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311786Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_signbit_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2311942Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sin_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312101Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinc_cuda_float64 PASSED [ 84%] 
2023-01-11T23:10:17.2312271Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j0_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312460Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i0e_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312623Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312793Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_i1e_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2312964Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_ndtr_cuda_float64 PASSED [ 84%] 2023-01-11T23:10:17.2313143Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_log_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313314Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_logit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313506Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313694Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2313858Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtri_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314041Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314212Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_xlog1py_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314371Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_squeeze_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314527Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314678Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314843Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tensor_split_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2314998Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_to_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315155Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trace_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315313Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tril_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315466Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_triu_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315657Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_true_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315821Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unbind_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2315976Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unflatten_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316143Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316304Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unfold_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316466Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_unsqueeze_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316628Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_mean_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316791Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_as_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2316952Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_view_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317112Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317267Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_vstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317419Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_where_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317575Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_xlogy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317731Z test_ops.py::TestMathBitsCUDA::test_neg_view_acos_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2317886Z test_ops.py::TestMathBitsCUDA::test_neg_view_acosh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318039Z test_ops.py::TestMathBitsCUDA::test_neg_view_add_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318219Z test_ops.py::TestMathBitsCUDA::test_neg_view_addcmul_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318387Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_decomposed_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318543Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmv_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318687Z test_ops.py::TestMathBitsCUDA::test_neg_view_all_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318842Z test_ops.py::TestMathBitsCUDA::test_neg_view_allclose_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2318993Z test_ops.py::TestMathBitsCUDA::test_neg_view_amin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319143Z test_ops.py::TestMathBitsCUDA::test_neg_view_angle_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319293Z test_ops.py::TestMathBitsCUDA::test_neg_view_arange_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2319444Z test_ops.py::TestMathBitsCUDA::test_neg_view_argmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319599Z test_ops.py::TestMathBitsCUDA::test_neg_view_argmin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319750Z test_ops.py::TestMathBitsCUDA::test_neg_view_argsort_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2319956Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_partial_views_cuda_float64 SKIPPED (Test changes in memory layout) [ 85%] 2023-01-11T23:10:17.2320106Z test_ops.py::TestMathBitsCUDA::test_neg_view_asin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320254Z test_ops.py::TestMathBitsCUDA::test_neg_view_atan_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320404Z test_ops.py::TestMathBitsCUDA::test_neg_view_atanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320564Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_2d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320720Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_3d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2320876Z test_ops.py::TestMathBitsCUDA::test_neg_view_bfloat16_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321052Z test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_tensors_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321207Z test_ops.py::TestMathBitsCUDA::test_neg_view_cartesian_prod_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2321383Z test_ops.py::TestMathBitsCUDA::test_neg_view_ceil_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321537Z test_ops.py::TestMathBitsCUDA::test_neg_view_char_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321695Z test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2321844Z test_ops.py::TestMathBitsCUDA::test_neg_view_chunk_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322005Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_max_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322159Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_clone_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322317Z test_ops.py::TestMathBitsCUDA::test_neg_view_column_stack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322471Z test_ops.py::TestMathBitsCUDA::test_neg_view_combinations_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322625Z test_ops.py::TestMathBitsCUDA::test_neg_view_complex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322784Z test_ops.py::TestMathBitsCUDA::test_neg_view_copysign_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2322941Z test_ops.py::TestMathBitsCUDA::test_neg_view_corrcoef_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323089Z test_ops.py::TestMathBitsCUDA::test_neg_view_cos_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323252Z test_ops.py::TestMathBitsCUDA::test_neg_view_count_nonzero_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323401Z test_ops.py::TestMathBitsCUDA::test_neg_view_cov_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323551Z test_ops.py::TestMathBitsCUDA::test_neg_view_cross_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323704Z test_ops.py::TestMathBitsCUDA::test_neg_view_cummin_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2323870Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumprod_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324019Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumsum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324194Z test_ops.py::TestMathBitsCUDA::test_neg_view_cumulative_trapezoid_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324345Z test_ops.py::TestMathBitsCUDA::test_neg_view_deg2rad_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324491Z test_ops.py::TestMathBitsCUDA::test_neg_view_diag_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324654Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324819Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_scatter_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2324970Z test_ops.py::TestMathBitsCUDA::test_neg_view_digamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325132Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_floor_rounding_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325304Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_no_rounding_mode_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325457Z test_ops.py::TestMathBitsCUDA::test_neg_view_dstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325615Z test_ops.py::TestMathBitsCUDA::test_neg_view_einsum_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325764Z test_ops.py::TestMathBitsCUDA::test_neg_view_exp2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2325912Z test_ops.py::TestMathBitsCUDA::test_neg_view_exp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326064Z test_ops.py::TestMathBitsCUDA::test_neg_view_expand_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326233Z test_ops.py::TestMathBitsCUDA::test_neg_view_eye_cuda_float64 SKIPPED (Skipped!) 
[ 85%] 2023-01-11T23:10:17.2326380Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326532Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326696Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftshift_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2326850Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327029Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327188Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327351Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftshift_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327510Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327660Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327823Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2327977Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328133Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328288Z test_ops.py::TestMathBitsCUDA::test_neg_view_flatten_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328440Z test_ops.py::TestMathBitsCUDA::test_neg_view_flip_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328597Z test_ops.py::TestMathBitsCUDA::test_neg_view_fliplr_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328749Z test_ops.py::TestMathBitsCUDA::test_neg_view_flipud_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2328891Z test_ops.py::TestMathBitsCUDA::test_neg_view_float_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329038Z test_ops.py::TestMathBitsCUDA::test_neg_view_frac_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329189Z test_ops.py::TestMathBitsCUDA::test_neg_view_frexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329336Z test_ops.py::TestMathBitsCUDA::test_neg_view_full_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2329482Z test_ops.py::TestMathBitsCUDA::test_neg_view_ge_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329671Z test_ops.py::TestMathBitsCUDA::test_neg_view_grid_sampler_2d_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329820Z test_ops.py::TestMathBitsCUDA::test_neg_view_half_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2329984Z test_ops.py::TestMathBitsCUDA::test_neg_view_heaviside_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330134Z test_ops.py::TestMathBitsCUDA::test_neg_view_hsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330276Z test_ops.py::TestMathBitsCUDA::test_neg_view_hstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330425Z test_ops.py::TestMathBitsCUDA::test_neg_view_hypot_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330574Z test_ops.py::TestMathBitsCUDA::test_neg_view_i0_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330724Z test_ops.py::TestMathBitsCUDA::test_neg_view_igamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2330884Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331044Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_fill_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331267Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_put_cuda_float64 SKIPPED (Operation not tested with tensors with negative 
bit.) [ 85%] 2023-01-11T23:10:17.2331429Z test_ops.py::TestMathBitsCUDA::test_neg_view_isfinite_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2331597Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_2inputs_2outputs_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2331780Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_4inputs_with_extra_args_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2331943Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332122Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_binary_return_by_ref_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332285Z test_ops.py::TestMathBitsCUDA::test_neg_view_jiterator_unary_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332446Z test_ops.py::TestMathBitsCUDA::test_neg_view_kthvalue_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2332599Z test_ops.py::TestMathBitsCUDA::test_neg_view_ldexp_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2332773Z test_ops.py::TestMathBitsCUDA::test_neg_view_le_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2332918Z test_ops.py::TestMathBitsCUDA::test_neg_view_lerp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333070Z test_ops.py::TestMathBitsCUDA::test_neg_view_lgamma_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333225Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eig_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333386Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvals_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333551Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigvalsh_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333712Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2333878Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334050Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_factor_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334219Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_ldl_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334368Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334652Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_ex_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334816Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2334981Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335162Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_rank_hermitian_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335319Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335549Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_norm_subgradients_at_zero_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335721Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_hermitian_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2335880Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_slogdet_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336047Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorinv_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336209Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vander_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336367Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vecdot_cuda_float64 PASSED [ 85%] 
2023-01-11T23:10:17.2336524Z test_ops.py::TestMathBitsCUDA::test_neg_view_linspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:10:17.2336678Z test_ops.py::TestMathBitsCUDA::test_neg_view_log1p_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336829Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2336992Z test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp2_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337143Z test_ops.py::TestMathBitsCUDA::test_neg_view_logaddexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337308Z test_ops.py::TestMathBitsCUDA::test_neg_view_logcumsumexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337460Z test_ops.py::TestMathBitsCUDA::test_neg_view_logdet_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337618Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_and_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337776Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_not_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2337929Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_xor_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2338081Z test_ops.py::TestMathBitsCUDA::test_neg_view_logit_cuda_float64 PASSED [ 85%] 2023-01-11T23:10:17.2338231Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338391Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_solve_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338532Z test_ops.py::TestMathBitsCUDA::test_neg_view_mH_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338709Z test_ops.py::TestMathBitsCUDA::test_neg_view_mT_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2338872Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_fill_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339037Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logaddexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339201Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_logsumexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339368Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_normalize_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339526Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339689Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2339846Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_select_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340007Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340170Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_softmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340333Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_std_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340509Z test_ops.py::TestMathBitsCUDA::test_neg_view_matmul_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340715Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_pool2d_with_indices_backward_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2340884Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_no_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341057Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_reduction_with_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341202Z test_ops.py::TestMathBitsCUDA::test_neg_view_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341382Z test_ops.py::TestMathBitsCUDA::test_neg_view_median_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341557Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_variadic_tensors_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341730Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_no_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2341901Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_reduction_with_dim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342054Z test_ops.py::TestMathBitsCUDA::test_neg_view_minimum_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342206Z test_ops.py::TestMathBitsCUDA::test_neg_view_mode_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342359Z test_ops.py::TestMathBitsCUDA::test_neg_view_movedim_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342510Z test_ops.py::TestMathBitsCUDA::test_neg_view_mul_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342662Z test_ops.py::TestMathBitsCUDA::test_neg_view_multinomial_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2342835Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343005Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_3_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343172Z test_ops.py::TestMathBitsCUDA::test_neg_view_mvlgamma_mvlgamma_p_5_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343325Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanmean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343474Z test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_cuda_float64 XFAIL [ 86%] 2023-01-11T23:10:17.2343638Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343811Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_dropout_backward_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2343965Z test_ops.py::TestMathBitsCUDA::test_neg_view_native_layer_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344117Z test_ops.py::TestMathBitsCUDA::test_neg_view_ne_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344275Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_full_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344494Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344675Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_alpha_dropout_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2344847Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345021Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345208Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_batch_norm_without_cudnn_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345394Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345585Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_binary_cross_entropy_with_logits_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345755Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2345939Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346125Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_embedding_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346306Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cosine_similarity_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346484Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_cross_entropy_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2346668Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2346835Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_elu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347036Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_bag_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347215Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_embedding_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2347418Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_without_train_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2347660Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool3d_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:10:17.2347837Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_grid_sample_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348012Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardsigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348187Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_instance_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348367Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_area_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348552Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_nearest_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348794Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_l1_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2348962Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_leaky_relu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349129Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_linear_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349309Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_local_response_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349485Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_logsigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349658Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2349835Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350021Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool1d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350227Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350394Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mish_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350556Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_mse_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350736Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_circular_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2350907Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_constant_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351081Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_reflect_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351263Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pairwise_distance_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351445Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_shuffle_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351627Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pixel_unshuffle_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351795Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_prelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2351954Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu6_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352116Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_rrelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352296Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_smooth_l1_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352472Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_soft_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352640Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2352834Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softplus_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353013Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353186Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_tanhshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353358Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_threshold_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353534Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353732Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_triplet_margin_with_distance_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2353915Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354073Z test_ops.py::TestMathBitsCUDA::test_neg_view_nonzero_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354225Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354381Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_nuc_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354551Z test_ops.py::TestMathBitsCUDA::test_neg_view_normal_number_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354708Z test_ops.py::TestMathBitsCUDA::test_neg_view_ones_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2354888Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355048Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_var_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355202Z test_ops.py::TestMathBitsCUDA::test_neg_view_ormqr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355363Z test_ops.py::TestMathBitsCUDA::test_neg_view_pca_lowrank_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355520Z test_ops.py::TestMathBitsCUDA::test_neg_view_permute_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355674Z test_ops.py::TestMathBitsCUDA::test_neg_view_polar_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2355870Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356046Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_3_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356204Z test_ops.py::TestMathBitsCUDA::test_neg_view_positive_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356347Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_pow_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356501Z test_ops.py::TestMathBitsCUDA::test_neg_view_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356650Z test_ops.py::TestMathBitsCUDA::test_neg_view_qr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356807Z test_ops.py::TestMathBitsCUDA::test_neg_view_quantile_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2356962Z test_ops.py::TestMathBitsCUDA::test_neg_view_rad2deg_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357117Z test_ops.py::TestMathBitsCUDA::test_neg_view_rand_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357284Z test_ops.py::TestMathBitsCUDA::test_neg_view_randint_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357435Z test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357644Z test_ops.py::TestMathBitsCUDA::test_neg_view_resize__cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:10:17.2357804Z test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_conj_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2357985Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358166Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_3_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2358317Z test_ops.py::TestMathBitsCUDA::test_neg_view_rsub_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358541Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amax_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358706Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_prod_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2358873Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_sum_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359044Z test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_offsets_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359200Z test_ops.py::TestMathBitsCUDA::test_neg_view_select_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359348Z test_ops.py::TestMathBitsCUDA::test_neg_view_sgn_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359498Z test_ops.py::TestMathBitsCUDA::test_neg_view_short_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359652Z test_ops.py::TestMathBitsCUDA::test_neg_view_sigmoid_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359804Z test_ops.py::TestMathBitsCUDA::test_neg_view_sign_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2359991Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_bartlett_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360181Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_gaussian_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360368Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_cosine_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360543Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hann_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360738Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_kaiser_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:10:17.2360957Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_nuttall_cuda_float64 SKIPPED (Skipped!) 
[ 86%] 2023-01-11T23:10:17.2361110Z test_ops.py::TestMathBitsCUDA::test_neg_view_sinc_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361262Z test_ops.py::TestMathBitsCUDA::test_neg_view_sinh_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361418Z test_ops.py::TestMathBitsCUDA::test_neg_view_slice_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361578Z test_ops.py::TestMathBitsCUDA::test_neg_view_slice_scatter_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361765Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2361931Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_j1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362108Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_u_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362486Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_v_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2362649Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_erfcx_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362828Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_he_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2362989Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1e_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363175Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_laguerre_polynomial_l_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363350Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363519Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_i1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363685Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2363846Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_modified_bessel_k1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364007Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtr_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364166Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_ndtri_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364384Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_polygamma_special_polygamma_n_0_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364569Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k1_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2364932Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_u_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2365290Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_v_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 86%] 2023-01-11T23:10:17.2365452Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_xlog1py_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2365611Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_zeta_cuda_float64 PASSED [ 86%] 2023-01-11T23:10:17.2365764Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_list_args_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2365917Z test_ops.py::TestMathBitsCUDA::test_neg_view_stack_cuda_float64 XFAIL [ 87%] 2023-01-11T23:10:17.2366067Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366225Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366387Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366537Z test_ops.py::TestMathBitsCUDA::test_neg_view_sub_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366684Z test_ops.py::TestMathBitsCUDA::test_neg_view_sum_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366831Z test_ops.py::TestMathBitsCUDA::test_neg_view_take_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2366982Z test_ops.py::TestMathBitsCUDA::test_neg_view_tensordot_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367130Z test_ops.py::TestMathBitsCUDA::test_neg_view_tile_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367282Z test_ops.py::TestMathBitsCUDA::test_neg_view_topk_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367428Z test_ops.py::TestMathBitsCUDA::test_neg_view_trace_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367608Z test_ops.py::TestMathBitsCUDA::test_neg_view_trapezoid_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367756Z test_ops.py::TestMathBitsCUDA::test_neg_view_tril_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2367917Z test_ops.py::TestMathBitsCUDA::test_neg_view_true_divide_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368067Z test_ops.py::TestMathBitsCUDA::test_neg_view_unbind_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368219Z test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_copy_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368385Z test_ops.py::TestMathBitsCUDA::test_neg_view_unique_consecutive_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368542Z test_ops.py::TestMathBitsCUDA::test_neg_view_unsqueeze_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368693Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368840Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2368994Z test_ops.py::TestMathBitsCUDA::test_neg_view_vsplit_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369143Z test_ops.py::TestMathBitsCUDA::test_neg_view_vstack_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369291Z test_ops.py::TestMathBitsCUDA::test_neg_view_zero__cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369450Z test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_like_cuda_float64 PASSED [ 87%] 2023-01-11T23:10:17.2369593Z test_ops.py::TestFakeTensorCUDA::test_fake_T_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2369744Z test_ops.py::TestFakeTensorCUDA::test_fake___radd___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2369891Z test_ops.py::TestFakeTensorCUDA::test_fake___rand___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2370047Z test_ops.py::TestFakeTensorCUDA::test_fake___rmatmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370218Z test_ops.py::TestFakeTensorCUDA::test_fake___rmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370363Z test_ops.py::TestFakeTensorCUDA::test_fake___ror___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2370518Z test_ops.py::TestFakeTensorCUDA::test_fake___rsub___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370689Z test_ops.py::TestFakeTensorCUDA::test_fake__native_batch_norm_legit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370831Z test_ops.py::TestFakeTensorCUDA::test_fake_acos_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2370980Z test_ops.py::TestFakeTensorCUDA::test_fake_addbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371147Z 
test_ops.py::TestFakeTensorCUDA::test_fake_addmm_decomposed_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371295Z test_ops.py::TestFakeTensorCUDA::test_fake_addr_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371444Z test_ops.py::TestFakeTensorCUDA::test_fake_amax_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371626Z test_ops.py::TestFakeTensorCUDA::test_fake_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:10:17.2371776Z test_ops.py::TestFakeTensorCUDA::test_fake_angle_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2371931Z test_ops.py::TestFakeTensorCUDA::test_fake_argwhere_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372080Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372247Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372396Z test_ops.py::TestFakeTensorCUDA::test_fake_asinh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372545Z test_ops.py::TestFakeTensorCUDA::test_fake_atan2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372692Z test_ops.py::TestFakeTensorCUDA::test_fake_atan_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2372846Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373003Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_3d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373169Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___getitem___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rand___cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2373518Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rdiv___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373686Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmatmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2373850Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmod___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374008Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rpow___cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374191Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__native_batch_norm_legit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_abs_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374621Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acos_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374788Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcdiv_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2374946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addr_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375104Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_all_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375264Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amax_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375420Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_amin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375577Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_any_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375739Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_arange_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2375897Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376101Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argsort_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376256Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argwhere_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_partial_views_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376602Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asinh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376758Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2376920Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bincount_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377092Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_left_shift_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377257Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_not_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377435Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_right_shift_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377589Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_xor_cuda_int64 PASSED [ 87%] 2023-01-11T23:10:17.2377758Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_block_diag_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2377918Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_shapes_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378259Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_to_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378423Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bucketize_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378595Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cartesian_prod_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378750Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ceil_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2378912Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cfloat_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379066Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chalf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379251Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_char_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379445Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:10:17.2379613Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_solve_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2379942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_min_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380102Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clone_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380271Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_combinations_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_constant_pad_nd_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380609Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_contiguous_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_copysign_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2380931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381114Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cov_cuda_float32 SKIPPED (Skip failing test) [ 
87%] 2023-01-11T23:10:17.2381277Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cross_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381601Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumprod_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381788Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2381946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagflat_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382114Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382287Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382447Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diff_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382606Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dist_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382767Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dot_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2382929Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_double_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dsplit_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383243Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_dstack_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383399Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_einsum_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383563Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383728Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_empty_like_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2383885Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eq_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfc_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384200Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erfinv_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384359Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384517Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_as_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384682Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expand_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2384841Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_expm1_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385202Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385369Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fftshift_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385534Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385701Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2385870Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifftshift_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386027Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft2_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386191Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfft_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386358Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ihfftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386524Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfftn_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2386848Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flatten_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387010Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387182Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_float_power_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387518Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387673Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frac_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_frexp_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2387994Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388150Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_geqrf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388313Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gradient_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388486Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_grid_sampler_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388639Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gt_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2388875Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_half_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_heaviside_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389203Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hstack_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389365Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hypot_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389521Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_i0_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igammac_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2389850Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390018Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_reduce_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390178Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_select_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390336Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_inner_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390496Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_int_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390657Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isin_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2390838Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isinf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2391003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isneginf_cuda_float32 PASSED [ 87%] 2023-01-11T23:10:17.2391167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isreal_cuda_float32 PASSED [ 
87%] 2023-01-11T23:10:17.2391375Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391582Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391780Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2391978Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2392140Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kron_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392298Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lcm_cuda_int64 PASSED [ 88%] 2023-01-11T23:10:17.2392459Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ldexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392616Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_le_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392779Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lgamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2392944Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393105Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigh_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393292Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_householder_product_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393491Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393668Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2393845Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_factor_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394015Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_solve_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394191Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394392Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2394556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_qr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394719Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_slogdet_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2394892Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395080Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_triangular_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395246Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linspace_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395407Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395574Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395741Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2395913Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logcumsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396072Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_not_cuda_float32 PASSED [ 88%] 
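The "SKIPPED (Skip failing test)" entries above are pytest's standard rendering of a skipped test: the skip decorator's message becomes the parenthesized reason. A minimal sketch of that mechanism, using hypothetical stand-in names rather than the OpInfo-generated tests actually running in this job:

    import unittest

    class TestFakeTensorExample(unittest.TestCase):
        # pytest reports this as "SKIPPED (Skip failing test)"; the body is
        # never executed because the decorator short-circuits the test.
        @unittest.skip("Skip failing test")
        def test_jiterator_unary(self):
            self.fail("unreachable: the skip decorator prevents execution")

    if __name__ == "__main__":
        unittest.main()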
2023-01-11T23:10:17.2396239Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_or_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396402Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_xor_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396586Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logit_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396752Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2396914Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_long_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397073Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lt_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397231Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2397566Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mH_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397723Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mT_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2397888Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398061Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398229Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_argmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398395Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumsum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398731Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_log_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2398899Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logaddexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399090Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_median_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399258Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399428Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_softmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399591Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_binary_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_with_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2399942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_maximum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400102Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400262Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_median_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400439Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_list_of_tensors_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400608Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_binary_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_minimum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2400934Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401117Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mul_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401296Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mv_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2401499Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2401701Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2401895Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402068Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nan_to_num_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmedian_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402450Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402614Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nansum_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2402801Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2402992Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_batch_norm_cuda_float32 SKIPPED (Skipped!) [ 88%] 2023-01-11T23:10:17.2403147Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ne_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403308Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_neg_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403472Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403634Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_full_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403799Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_ones_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2403993Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404185Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404379Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404567Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_alpha_dropout_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2404952Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_avg_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405510Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_celu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405700Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2405894Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406094Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2406280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406464Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406646Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_dropout_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2406813Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_elu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407025Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2407204Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_embedding_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407591Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407768Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_gelu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2407969Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_glu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408155Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_grid_sample_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408339Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_group_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408517Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardsigmoid_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408699Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardswish_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2408879Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardtanh_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409075Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409265Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_area_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409463Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409641Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_l1_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2409822Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_layer_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410002Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_leaky_relu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410172Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_linear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410354Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410561Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410747Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2410969Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411187Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411384Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411589Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2411769Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_normalize_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2411957Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_circular_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412135Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_reflect_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412325Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_replicate_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412516Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pairwise_distance_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412694Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pdist_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2412883Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413058Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_selu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413232Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_silu_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413421Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413632Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413807Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2413997Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414184Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414365Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softsign_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_tanhshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2414841Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_threshold_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_unfold_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415411Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415569Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nonzero_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415730Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ormqr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2415894Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_permute_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416078Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_2_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416263Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_3_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416462Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_qr_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_quantile_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2416820Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2416979Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ravel_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417143Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_real_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417311Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_remainder_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2417680Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2417854Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_as_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418019Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reshape_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418192Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize_as__cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418355Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rot90_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418509Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_3_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2418852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_add_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419031Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amin_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419206Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:10:17.2419414Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:10:17.2419607Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2419778Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_select_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2419931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sgn_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420091Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_short_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420274Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_blackman_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420454Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_cosine_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420637Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_gaussian_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2420830Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_cosine_cuda_float32 PASSED [ 89%] 
2023-01-11T23:10:17.2421021Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_general_hamming_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421200Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hamming_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421381Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_hann_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421551Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_kaiser_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421727Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_nuttall_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2421887Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422044Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_slice_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422400Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sort_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422600Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2422774Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_airy_ai_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2422946Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j1_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423112Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y1_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423304Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2423671Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2424031Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2424201Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_entr_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424374Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_erfcx_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424562Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_h_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2424727Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1e_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425081Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2425275Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425453Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425660Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtri_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2425853Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2426223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_t_cuda_float32 
SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2426580Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:10:17.2426766Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_spherical_bessel_j0_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2426941Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_xlog1py_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427111Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_zeta_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427284Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_with_sizes_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427440Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_squeeze_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427602Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427764Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2427933Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428092Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_stft_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428434Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428598Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_symeig_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428832Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_along_dim_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2428992Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tan_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429160Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensordot_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429320Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tile_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429642Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_transpose_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429815Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapezoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2429977Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trapz_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430139Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430321Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_indices_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2430507Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unbind_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430679Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_copy_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2430842Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_uniform_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unsqueeze_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_cuda_float32 
PASSED [ 89%] 2023-01-11T23:10:17.2431509Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431697Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2431861Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vdot_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432023Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432195Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_real_cuda_complex64 PASSED [ 89%] 2023-01-11T23:10:17.2432361Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_copy_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432523Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vsplit_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_vstack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2432844Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_where_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433011Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433159Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_and_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433323Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_left_shift_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433486Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_right_shift_cuda_int64 PASSED [ 89%] 2023-01-11T23:10:17.2433646Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_to_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433804Z test_ops.py::TestFakeTensorCUDA::test_fake_bucketize_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2433966Z test_ops.py::TestFakeTensorCUDA::test_fake_cartesian_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434144Z test_ops.py::TestFakeTensorCUDA::test_fake_cat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434297Z test_ops.py::TestFakeTensorCUDA::test_fake_cdist_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434443Z test_ops.py::TestFakeTensorCUDA::test_fake_ceil_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434601Z test_ops.py::TestFakeTensorCUDA::test_fake_cfloat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434751Z test_ops.py::TestFakeTensorCUDA::test_fake_chalf_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2434930Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2435121Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2435272Z test_ops.py::TestFakeTensorCUDA::test_fake_chunk_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435425Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435582Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_max_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435727Z test_ops.py::TestFakeTensorCUDA::test_fake_clamp_min_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2435886Z test_ops.py::TestFakeTensorCUDA::test_fake_column_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436046Z test_ops.py::TestFakeTensorCUDA::test_fake_conj_physical_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436203Z test_ops.py::TestFakeTensorCUDA::test_fake_contiguous_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436363Z test_ops.py::TestFakeTensorCUDA::test_fake_count_nonzero_cuda_float32 PASSED [ 89%] 
2023-01-11T23:10:17.2436532Z test_ops.py::TestFakeTensorCUDA::test_fake_cov_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:10:17.2436702Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_H_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2436885Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___getitem___cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437061Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmod___cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437233Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_abs_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437434Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acosh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437613Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437788Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmv_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2437964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addr_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438139Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amax_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438313Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_amin_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438516Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_angle_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438702Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2438878Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asinh_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439049Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439229Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_2d_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_baddbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_block_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439778Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_tensors_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2439991Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cartesian_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440166Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdist_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440340Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cdouble_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cfloat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2440693Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2440882Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_inverse_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441067Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cholesky_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441245Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_max_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441421Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clone_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_column_stack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441790Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_combinations_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2441968Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_physical_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442154Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_constant_pad_nd_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442337Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_contiguous_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442517Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_copysign_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442693Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummin_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2442876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumprod_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443067Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_deg2rad_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443241Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443418Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diag_embed_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagflat_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2443945Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444124Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diff_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444315Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_floor_rounding_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444509Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_no_rounding_mode_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444700Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_div_trunc_rounding_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2444878Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dstack_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445050Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_einsum_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445215Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erf_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445394Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfinv_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp2_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445768Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_exp_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2445947Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2446117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446290Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftn_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft2_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446639Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446804Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft2_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2446975Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447146Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftn_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447331Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ifftshift_cuda_float32 PASSED [ 89%] 2023-01-11T23:10:17.2447504Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft_cuda_float32 XFAIL [ 89%] 2023-01-11T23:10:17.2447683Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2447857Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2448030Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2448204Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fill_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flatten_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448545Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fliplr_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448775Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2448949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449126Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gradient_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449472Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_half_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hsplit_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hstack_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2449989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_hypot_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450170Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_add_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450350Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_copy_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450559Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_reduce_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450757Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kthvalue_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2450928Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ldexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451101Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lerp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cond_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2451486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvalsh_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451670Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2451855Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_inv_ex_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452039Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452236Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452421Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_solve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2452606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_norm_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2452814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skipped!) [ 90%] 2023-01-11T23:10:17.2453002Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_slogdet_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453191Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_triangular_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svdvals_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2453561Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorinv_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453752Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_tensorsolve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2453939Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vector_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log10_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454315Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log1p_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454592Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_with_dtype_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2454964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455142Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logsumexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455319Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_solve_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455490Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_unpack_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455664Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mH_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2455849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456031Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_amin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456215Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumsum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456404Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_log_softmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456584Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logaddexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456771Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_logsumexp_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2456992Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_median_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457177Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_scatter_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457729Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_sum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2457909Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_var_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458088Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matmul_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458271Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_no_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458449Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_maximum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458623Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mean_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458799Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_median_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2458995Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_list_of_tensors_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459172Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_binary_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459364Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_no_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459556Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_min_reduction_with_dim_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_minimum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2459899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mul_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460325Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460527Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmedian_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460701Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_narrow_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2460890Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_batch_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461086Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_dropout_backward_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461294Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461503Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461703Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2461893Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462101Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462295Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_bilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462510Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_celu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2462913Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463103Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463308Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463512Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463711Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2463912Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cross_entropy_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464133Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464335Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464523Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_gelu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardshrink_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2464916Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardswish_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465109Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardtanh_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465314Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465511Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_huber_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2465949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466152Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_linear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466362Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466553Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_l1_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466753Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_layer_norm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2466943Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_leaky_relu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467139Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_logsigmoid_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467326Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467514Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool2d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467704Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2467904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468128Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468326Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468517Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mish_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468770Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_mse_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2468978Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469185Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469400Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469600Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_circular_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469792Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_constant_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2469991Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_reflect_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470184Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pdist_cuda_float32 PASSED 
[ 90%] 2023-01-11T23:10:17.2470382Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470570Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_prelu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470759Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_selu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2470949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_silu_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471194Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471404Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471591Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softplus_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2471787Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softshrink_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472004Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472196Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_unfold_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472376Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_fro_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472554Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_nuc_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2472732Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_outer_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2472908Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polar_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_1_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473288Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_4_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473637Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_put_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2473834Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_qr_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474014Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rad2deg_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474195Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reciprocal_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474374Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_remainder_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474548Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_renorm_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2474906Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_repeat_interleave_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475091Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_neg_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475262Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_roll_cuda_float32 XFAIL [ 90%] 2023-01-11T23:10:17.2475441Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rot90_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475631Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_neg_3_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475806Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsqrt_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2475980Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_rsub_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476157Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476348Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amax_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476531Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_amin_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476721Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_mean_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2476933Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_prod_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_reduce_sum_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sign_cuda_float32 PASSED [ 90%] 2023-01-11T23:10:17.2477646Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2477823Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478016Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_softmax_with_dtype_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478185Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sort_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478371Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_erfcx_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i0e_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtr_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2478920Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_ndtri_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479095Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479280Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_list_args_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479479Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_squeeze_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479656Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2479821Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_cuda_float32 PASSED [ 91%] 
2023-01-11T23:10:17.2480009Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480191Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480363Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2480533Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2480716Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_svd_lowrank_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2480891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_symeig_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_along_dim_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481251Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_take_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481424Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_sparse_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481599Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_topk_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481780Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_transpose_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2481956Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapezoid_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482134Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tril_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482308Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triu_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_true_divide_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482700Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2482877Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unfold_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483221Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483407Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_mean_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483592Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_var_unbiased_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483766Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vdot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2483955Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_complex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484135Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484311Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vstack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484483Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_zero__cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_H_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2484819Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_T_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485027Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___radd___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rdiv___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmul___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485562Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rsub___cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__softmax_backward_data_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2485933Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acos_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486110Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_acosh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486283Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcdiv_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486461Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addr_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486641Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_angle_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2486832Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487008Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asinh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487367Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_2d_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_3d_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487733Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_baddbmm_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2487913Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bernoulli_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488115Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bfloat16_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_bmm_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488483Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_tensors_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cartesian_prod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2488849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdist_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489029Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cdouble_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cfloat_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_inverse_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489574Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_max_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2489940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clamp_min_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_column_stack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_complex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490490Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_physical_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_contiguous_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2490881Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_copysign_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cos_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491226Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cosh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cross_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491584Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumprod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491760Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_deg2rad_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2491940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_embed_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagflat_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492312Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_copy_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492676Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diagonal_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2492865Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_floor_rounding_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_trunc_rounding_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493234Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_double_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493411Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dsplit_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493585Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erf_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493784Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfc_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2493960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494145Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_as_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494325Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expm1_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2494587Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2494768Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2494949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495130Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495309Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495489Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2495678Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ifftshift_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2495856Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496025Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496205Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfftn_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496416Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft2_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496593Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfft_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2496772Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fill_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2496945Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flip_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flipud_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497302Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497487Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_float_power_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_floor_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2497835Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498011Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmod_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_half_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498360Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hstack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498536Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hypot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498723Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_add_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2498899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lerp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lgamma_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499253Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499465Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvalsh_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2499863Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_householder_product_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500050Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_ex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500234Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500423Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500615Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2500813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_power_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501003Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_multi_dot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501201Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_subgradients_at_zero_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2501457Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 91%] 2023-01-11T23:10:17.2501644Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2501858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_ex_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502037Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svd_cuda_float32 XFAIL [ 91%] 2023-01-11T23:10:17.2502230Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorinv_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502428Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_tensorsolve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502613Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vecdot_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log10_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2502957Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log1p_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503134Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503323Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp2_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503512Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logaddexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503692Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logdet_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2503866Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504047Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_solve_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504230Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_lu_unpack_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504402Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mT_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504580Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504765Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_amin_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2504989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logaddexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505180Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_logsumexp_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505368Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_scatter_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505556Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_select_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmax_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2505929Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_softmin_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506116Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_std_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matmul_cuda_float32 PASSED [ 91%] 2023-01-11T23:10:17.2506477Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_matrix_exp_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2506674Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_with_dim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2506853Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_maximum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_list_of_tensors_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507258Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_meshgrid_variadic_tensors_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_no_dim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507659Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_minimum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2507835Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508005Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mode_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508187Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_movedim_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508367Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_msort_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508541Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mul_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2508789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mv_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509009Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509199Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509386Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509570Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nan_to_num_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509743Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2509922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nansum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510117Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_dropout_backward_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_neg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510513Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2510960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511571Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_alpha_dropout_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511763Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2511964Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512176Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512369Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_bilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512561Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2512959Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513188Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513398Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skipped!) 
[ 92%] 2023-01-11T23:10:17.2513798Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout3d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2513996Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_bag_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514193Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_embedding_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514642Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2514849Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515040Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_glu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_group_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515438Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardshrink_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515638Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardsigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2515828Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardswish_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516051Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hardtanh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_instance_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516461Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_linear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2516883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517077Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_layer_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517271Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_linear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_local_response_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_logsigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2517866Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518065Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518269Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 92%] 
2023-01-11T23:10:17.2518463Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mse_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518692Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2518891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_normalize_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519089Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_circular_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519289Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_constant_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pairwise_distance_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519679Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pdist_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2519872Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_prelu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520058Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu6_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520254Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_relu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520446Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_rrelu_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520646Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2520847Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521239Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softplus_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_threshold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2521856Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_unfold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522267Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_upsample_nearest_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522627Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_inf_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2522807Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_nuc_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2522987Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ormqr_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523171Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pca_lowrank_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2523345Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_permute_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523524Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pinverse_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523725Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2523922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_1_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524142Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_2_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524343Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524543Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polygamma_polygamma_n_4_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524719Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_qr_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2524899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rad2deg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_real_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reciprocal_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525438Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_remainder_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525616Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_renorm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525792Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2525975Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_as_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_conj_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526351Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_resolve_neg_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526525Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_roll_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2526705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2526890Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_3_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527090Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsub_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527277Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_add_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527458Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527655Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_offsets_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2527833Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528021Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_select_scatter_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528199Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sgn_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2528380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sigmoid_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528552Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sign_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528722Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinc_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2528891Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sinh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529071Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529265Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_softmax_with_dtype_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529463Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sparse_sampled_addmm_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529675Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_erfcx_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2529863Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1e_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530045Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtri_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_xlog1py_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530628Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2530820Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sqrt_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531026Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531218Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531410Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_mean_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531597Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_std_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531778Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2531954Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sum_to_size_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532129Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2532311Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_svd_lowrank_cuda_float32 XFAIL [ 92%] 2023-01-11T23:10:17.2532489Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tan_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532666Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tanh_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2532876Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensor_split_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533060Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_sparse_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_topk_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trace_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533590Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_transpose_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533782Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triangular_solve_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2533958Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tril_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534136Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trunc_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534317Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unflatten_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534662Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_copy_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2534846Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unfold_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535019Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535200Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_unbiased_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_complex_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2535847Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_copy_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536031Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vstack_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536207Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_zero__cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536363Z test_ops.py::TestFakeTensorCUDA::test_fake_cummax_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536518Z test_ops.py::TestFakeTensorCUDA::test_fake_cummin_cuda_float32 PASSED [ 92%] 2023-01-11T23:10:17.2536672Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2536821Z test_ops.py::TestFakeTensorCUDA::test_fake_diagflat_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2536981Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537134Z test_ops.py::TestFakeTensorCUDA::test_fake_diff_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537290Z test_ops.py::TestFakeTensorCUDA::test_fake_digamma_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537457Z test_ops.py::TestFakeTensorCUDA::test_fake_div_no_rounding_mode_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537604Z test_ops.py::TestFakeTensorCUDA::test_fake_dot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537753Z test_ops.py::TestFakeTensorCUDA::test_fake_double_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2537904Z test_ops.py::TestFakeTensorCUDA::test_fake_dstack_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2538055Z test_ops.py::TestFakeTensorCUDA::test_fake_equal_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538197Z test_ops.py::TestFakeTensorCUDA::test_fake_exp2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538353Z test_ops.py::TestFakeTensorCUDA::test_fake_expand_as_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538504Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538691Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538842Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2538992Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539155Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftshift_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539306Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539457Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539615Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539768Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2539924Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_irfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540078Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540227Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfftn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540377Z test_ops.py::TestFakeTensorCUDA::test_fake_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540529Z test_ops.py::TestFakeTensorCUDA::test_fake_flatten_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540672Z test_ops.py::TestFakeTensorCUDA::test_fake_fliplr_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540823Z test_ops.py::TestFakeTensorCUDA::test_fake_flipud_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2540986Z test_ops.py::TestFakeTensorCUDA::test_fake_floor_divide_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541133Z test_ops.py::TestFakeTensorCUDA::test_fake_fmin_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541308Z test_ops.py::TestFakeTensorCUDA::test_fake_fmod_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541457Z test_ops.py::TestFakeTensorCUDA::test_fake_frac_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541607Z test_ops.py::TestFakeTensorCUDA::test_fake_full_like_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541755Z test_ops.py::TestFakeTensorCUDA::test_fake_ge_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2541897Z test_ops.py::TestFakeTensorCUDA::test_fake_geqrf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542051Z test_ops.py::TestFakeTensorCUDA::test_fake_gradient_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542200Z test_ops.py::TestFakeTensorCUDA::test_fake_half_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542353Z test_ops.py::TestFakeTensorCUDA::test_fake_heaviside_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542502Z test_ops.py::TestFakeTensorCUDA::test_fake_histc_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542654Z test_ops.py::TestFakeTensorCUDA::test_fake_hsplit_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542802Z test_ops.py::TestFakeTensorCUDA::test_fake_hstack_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2542954Z test_ops.py::TestFakeTensorCUDA::test_fake_hypot_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2543099Z test_ops.py::TestFakeTensorCUDA::test_fake_index_add_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543256Z test_ops.py::TestFakeTensorCUDA::test_fake_index_copy_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543415Z test_ops.py::TestFakeTensorCUDA::test_fake_index_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543568Z test_ops.py::TestFakeTensorCUDA::test_fake_index_put_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543730Z test_ops.py::TestFakeTensorCUDA::test_fake_index_reduce_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2543889Z test_ops.py::TestFakeTensorCUDA::test_fake_index_select_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544042Z test_ops.py::TestFakeTensorCUDA::test_fake_inner_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544193Z test_ops.py::TestFakeTensorCUDA::test_fake_isinf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544338Z test_ops.py::TestFakeTensorCUDA::test_fake_isneginf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544526Z test_ops.py::TestFakeTensorCUDA::test_fake_isposinf_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2544736Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2544929Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2545130Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2545278Z test_ops.py::TestFakeTensorCUDA::test_fake_le_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545429Z test_ops.py::TestFakeTensorCUDA::test_fake_lerp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545594Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cross_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545751Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eig_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2545934Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2546093Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546255Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_inv_ex_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546413Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546576Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_solve_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546743Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2546945Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2547136Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_multi_dot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547291Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547473Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547631Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2547826Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2547981Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_qr_cuda_float32 PASSED [ 93%] 
2023-01-11T23:10:17.2548144Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_slogdet_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548325Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2548500Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_triangular_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548740Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vecdot_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2548905Z test_ops.py::TestFakeTensorCUDA::test_fake_log10_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549075Z test_ops.py::TestFakeTensorCUDA::test_fake_log1p_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549225Z test_ops.py::TestFakeTensorCUDA::test_fake_log2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549385Z test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp2_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549542Z test_ops.py::TestFakeTensorCUDA::test_fake_logaddexp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549694Z test_ops.py::TestFakeTensorCUDA::test_fake_logdet_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2549853Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_not_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550007Z test_ops.py::TestFakeTensorCUDA::test_fake_logit_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550153Z test_ops.py::TestFakeTensorCUDA::test_fake_logspace_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550327Z test_ops.py::TestFakeTensorCUDA::test_fake_lt_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550481Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_unpack_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550629Z test_ops.py::TestFakeTensorCUDA::test_fake_mH_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550776Z test_ops.py::TestFakeTensorCUDA::test_fake_mT_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2550938Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_amax_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551101Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmin_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551258Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumsum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551412Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_fill_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551579Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_log_softmax_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551750Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_logsumexp_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2551915Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_normalize_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552073Z test_ops.py::TestFakeTensorCUDA::test_fake_max_binary_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552244Z test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_with_dim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552398Z test_ops.py::TestFakeTensorCUDA::test_fake_maximum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552548Z test_ops.py::TestFakeTensorCUDA::test_fake_median_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552717Z test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_variadic_tensors_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2552901Z test_ops.py::TestFakeTensorCUDA::test_fake_min_binary_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553072Z test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_no_dim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553230Z test_ops.py::TestFakeTensorCUDA::test_fake_minimum_cuda_float32 
PASSED [ 93%] 2023-01-11T23:10:17.2553381Z test_ops.py::TestFakeTensorCUDA::test_fake_mm_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553533Z test_ops.py::TestFakeTensorCUDA::test_fake_mode_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553684Z test_ops.py::TestFakeTensorCUDA::test_fake_movedim_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2553835Z test_ops.py::TestFakeTensorCUDA::test_fake_msort_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554018Z test_ops.py::TestFakeTensorCUDA::test_fake_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554158Z test_ops.py::TestFakeTensorCUDA::test_fake_mv_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554354Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554532Z test_ops.py::TestFakeTensorCUDA::test_fake_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2554689Z test_ops.py::TestFakeTensorCUDA::test_fake_nanmedian_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2554841Z test_ops.py::TestFakeTensorCUDA::test_fake_nansum_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555020Z test_ops.py::TestFakeTensorCUDA::test_fake_native_batch_norm_cuda_float32 SKIPPED (Skipped!) [ 93%] 2023-01-11T23:10:17.2555169Z test_ops.py::TestFakeTensorCUDA::test_fake_ne_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555321Z test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555481Z test_ops.py::TestFakeTensorCUDA::test_fake_new_empty_strided_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555633Z test_ops.py::TestFakeTensorCUDA::test_fake_new_full_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555788Z test_ops.py::TestFakeTensorCUDA::test_fake_new_zeros_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2555938Z test_ops.py::TestFakeTensorCUDA::test_fake_nextafter_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556148Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556335Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556519Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556706Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2556880Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557046Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557239Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557408Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_celu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557594Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557773Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2557960Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558135Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cross_entropy_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558329Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 93%] 2023-01-11T23:10:17.2558495Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558693Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2558861Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_elu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559035Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559221Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559385Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_glu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559561Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_grid_sample_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559739Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardsigmoid_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2559929Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560095Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_huber_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560281Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_area_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560471Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560640Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_l1_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560809Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_leaky_relu_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2560978Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_linear_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561157Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561335Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561520Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561689Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2561878Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mish_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562050Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_mse_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562231Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562420Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 93%] 2023-01-11T23:10:17.2562614Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2562804Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2562984Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_circular_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563158Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_reflect_cuda_float32 PASSED [ 94%] 
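A minimal sketch of what these TestFakeTensorCUDA cases exercise, not taken from the log itself: under FakeTensorMode, tensor factories yield "fake" tensors that carry shape/dtype/device metadata without allocating real CUDA memory, and each test_fake_* case checks that running an op on fake inputs reproduces the metadata of the real run. The import path and the 8x8 mm example are assumptions about a torch build of this vintage, not the suite's own code.

    import torch
    from torch._subclasses.fake_tensor import FakeTensorMode  # assumed import path for builds of this era

    with FakeTensorMode():
        # Factories called under the mode return FakeTensors: correct
        # shape/dtype/device metadata, no real CUDA allocation behind them.
        a = torch.empty(8, 8, device="cuda")
        b = torch.empty(8, 8, device="cuda")
        out = torch.mm(a, b)  # roughly the shape of test_fake_mm_cuda_float32's check
        print(out.shape, out.dtype, out.device)  # torch.Size([8, 8]) torch.float32 cuda:0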
2023-01-11T23:10:17.2563330Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_replicate_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563508Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pairwise_distance_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563675Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pdist_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2563847Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564028Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564200Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_prelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564391Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_silu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564573Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564740Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softplus_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2564915Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565087Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_tanhshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565272Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565471Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565640Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_unfold_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565796Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2565952Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_nuc_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566109Z test_ops.py::TestFakeTensorCUDA::test_fake_normal_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566271Z test_ops.py::TestFakeTensorCUDA::test_fake_normal_number_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566422Z test_ops.py::TestFakeTensorCUDA::test_fake_ones_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566577Z test_ops.py::TestFakeTensorCUDA::test_fake_outer_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566737Z test_ops.py::TestFakeTensorCUDA::test_fake_pca_lowrank_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2566896Z test_ops.py::TestFakeTensorCUDA::test_fake_pinverse_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567069Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567242Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_2_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567416Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_3_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567601Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_4_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567754Z test_ops.py::TestFakeTensorCUDA::test_fake_pow_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2567909Z test_ops.py::TestFakeTensorCUDA::test_fake_prod_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568060Z test_ops.py::TestFakeTensorCUDA::test_fake_qr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568238Z test_ops.py::TestFakeTensorCUDA::test_fake_quantile_cuda_float32 SKIPPED (Skip failing 
test) [ 94%] 2023-01-11T23:10:17.2568390Z test_ops.py::TestFakeTensorCUDA::test_fake_rad2deg_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568541Z test_ops.py::TestFakeTensorCUDA::test_fake_randint_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568708Z test_ops.py::TestFakeTensorCUDA::test_fake_randint_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2568860Z test_ops.py::TestFakeTensorCUDA::test_fake_randn_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569006Z test_ops.py::TestFakeTensorCUDA::test_fake_ravel_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569162Z test_ops.py::TestFakeTensorCUDA::test_fake_remainder_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569313Z test_ops.py::TestFakeTensorCUDA::test_fake_renorm_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569474Z test_ops.py::TestFakeTensorCUDA::test_fake_reshape_as_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569625Z test_ops.py::TestFakeTensorCUDA::test_fake_reshape_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569783Z test_ops.py::TestFakeTensorCUDA::test_fake_resize_as__cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2569942Z test_ops.py::TestFakeTensorCUDA::test_fake_resolve_neg_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570094Z test_ops.py::TestFakeTensorCUDA::test_fake_roll_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570263Z test_ops.py::TestFakeTensorCUDA::test_fake_rot90_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570430Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570602Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_neg_3_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570769Z test_ops.py::TestFakeTensorCUDA::test_fake_scalar_tensor_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2570954Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_add_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571148Z test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_offsets_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571300Z test_ops.py::TestFakeTensorCUDA::test_fake_select_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571472Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_bartlett_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571633Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_cosine_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571815Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_exponential_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2571988Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_gaussian_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572166Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_hamming_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572337Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hamming_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572501Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_nuttall_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572652Z test_ops.py::TestFakeTensorCUDA::test_fake_sin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572804Z test_ops.py::TestFakeTensorCUDA::test_fake_slice_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2572959Z test_ops.py::TestFakeTensorCUDA::test_fake_slice_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573127Z test_ops.py::TestFakeTensorCUDA::test_fake_softmax_with_dtype_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573278Z test_ops.py::TestFakeTensorCUDA::test_fake_sort_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573465Z 
test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j1_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573635Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573819Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2573994Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574357Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:10:17.2574616Z test_ops.py::TestFakeTensorCUDA::test_fake_special_erfcx_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574792Z test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_h_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2574970Z test_ops.py::TestFakeTensorCUDA::test_fake_special_hermite_polynomial_he_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575146Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575309Z test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575469Z test_ops.py::TestFakeTensorCUDA::test_fake_special_ndtri_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575664Z test_ops.py::TestFakeTensorCUDA::test_fake_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2575850Z test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576207Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:10:17.2576426Z test_ops.py::TestFakeTensorCUDA::test_fake_special_spherical_bessel_j0_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576583Z test_ops.py::TestFakeTensorCUDA::test_fake_special_xlog1py_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576738Z test_ops.py::TestFakeTensorCUDA::test_fake_split_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2576904Z test_ops.py::TestFakeTensorCUDA::test_fake_split_with_sizes_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577054Z test_ops.py::TestFakeTensorCUDA::test_fake_sqrt_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577207Z test_ops.py::TestFakeTensorCUDA::test_fake_squeeze_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577358Z test_ops.py::TestFakeTensorCUDA::test_fake_stack_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577510Z test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577676Z test_ops.py::TestFakeTensorCUDA::test_fake_std_mean_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577831Z test_ops.py::TestFakeTensorCUDA::test_fake_std_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2577980Z test_ops.py::TestFakeTensorCUDA::test_fake_stft_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578130Z test_ops.py::TestFakeTensorCUDA::test_fake_sum_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578292Z test_ops.py::TestFakeTensorCUDA::test_fake_take_along_dim_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578443Z test_ops.py::TestFakeTensorCUDA::test_fake_take_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578593Z test_ops.py::TestFakeTensorCUDA::test_fake_tan_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578745Z 
test_ops.py::TestFakeTensorCUDA::test_fake_tensordot_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2578893Z test_ops.py::TestFakeTensorCUDA::test_fake_to_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579035Z test_ops.py::TestFakeTensorCUDA::test_fake_trace_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579192Z test_ops.py::TestFakeTensorCUDA::test_fake_trapezoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579339Z test_ops.py::TestFakeTensorCUDA::test_fake_trapz_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579529Z test_ops.py::TestFakeTensorCUDA::test_fake_tril_indices_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2579676Z test_ops.py::TestFakeTensorCUDA::test_fake_triu_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2579834Z test_ops.py::TestFakeTensorCUDA::test_fake_triu_indices_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2579986Z test_ops.py::TestFakeTensorCUDA::test_fake_unflatten_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580134Z test_ops.py::TestFakeTensorCUDA::test_fake_unfold_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580284Z test_ops.py::TestFakeTensorCUDA::test_fake_unique_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580438Z test_ops.py::TestFakeTensorCUDA::test_fake_var_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580604Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_complex_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580757Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2580921Z test_ops.py::TestFakeTensorCUDA::test_fake_view_as_real_cuda_complex64 PASSED [ 94%] 2023-01-11T23:10:17.2581073Z test_ops.py::TestFakeTensorCUDA::test_fake_view_copy_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581222Z test_ops.py::TestFakeTensorCUDA::test_fake_vsplit_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581370Z test_ops.py::TestFakeTensorCUDA::test_fake_vstack_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581521Z test_ops.py::TestFakeTensorCUDA::test_fake_where_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581670Z test_ops.py::TestFakeTensorCUDA::test_fake_zeros_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_H_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2581982Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_T_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582174Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___radd___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582346Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmatmul___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582508Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___ror___cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2582676Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rpow___cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2582858Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__native_batch_norm_legit_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583035Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops__softmax_backward_data_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583197Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_abs_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583359Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acos_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583525Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_acosh_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2583690Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addbmm_cuda_float32 PASSED [ 94%] 
2023-01-11T23:10:17.2583856Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcmul_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584038Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_decomposed_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584198Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmv_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584351Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addr_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_all_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584678Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_allclose_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2584840Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585031Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:10:17.2585195Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_angle_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585401Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmin_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585570Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argsort_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argwhere_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2585899Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586085Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_partial_views_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586247Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586414Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_1d_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586589Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bernoulli_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2586754Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bincount_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2586928Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_left_shift_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587095Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_not_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587265Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_right_shift_cuda_int64 PASSED [ 94%] 2023-01-11T23:10:17.2587432Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_block_diag_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587592Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bool_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587759Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bucketize_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2587945Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_byte_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588105Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cat_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588271Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdist_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588437Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cdouble_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588599Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ceil_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2588856Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_cuda_float32 SKIPPED (Skip failing test) [ 94%] 
2023-01-11T23:10:17.2589030Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chunk_cuda_float32 PASSED [ 94%] 2023-01-11T23:10:17.2589194Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_max_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_min_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589536Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_column_stack_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589706Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_combinations_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2589869Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590045Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_constant_pad_nd_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590206Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_contiguous_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590376Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_copysign_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590538Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cosh_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590708Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_count_nonzero_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2590869Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cross_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591038Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591204Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumprod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591410Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumulative_trapezoid_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591579Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_embed_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591743Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_copy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2591907Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_digamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592068Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dist_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592231Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_double_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592398Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_like_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eq_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592723Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erf_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2592883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfc_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593045Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftshift_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593210Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593375Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593542Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593705Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2593897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ihfftn_cuda_float32 PASSED [ 95%] 
2023-01-11T23:10:17.2594063Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594230Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594388Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594549Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594714Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flatten_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2594879Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flip_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595047Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_power_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_divide_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595384Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595543Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595700Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frac_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2595855Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_frexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596020Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_like_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596182Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ge_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596347Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_geqrf_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gradient_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596685Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_grid_sampler_2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2596848Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gt_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597012Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hstack_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597202Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igammac_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597369Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_imag_cuda_complex64 PASSED [ 95%] 2023-01-11T23:10:17.2597534Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_put_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597702Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_reduce_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2597866Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_inner_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598029Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_int_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598191Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isclose_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isfinite_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2598563Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2598762Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2598924Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kron_cuda_float32 PASSED [ 95%] 
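Several cases in the stretch below are gated as slow and skipped with "run with PYTORCH_TEST_WITH_SLOW to enable test" (e.g. test_pointwise_ops_linalg_pinv_singular_cuda_float32). A hedged sketch of re-running one such case locally; driving the file through pytest.main and the -k keyword filter are assumptions about a typical local setup, not the CI invocation used here.

    import os
    # torch's test harness reads this flag at import time, so set it before importing the tests.
    os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

    import pytest
    # Select the slow-gated case by keyword; exit with pytest's status code.
    raise SystemExit(pytest.main(["test_ops.py", "-v", "-k", "linalg_pinv_singular"]))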
2023-01-11T23:10:17.2599085Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lerp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599246Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lgamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599415Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cond_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599582Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599776Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eig_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2599969Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_householder_product_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600134Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600309Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600487Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_factor_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600661Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_ldl_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2600848Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601028Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601228Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601466Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2601638Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_multi_dot_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601802Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2601994Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602161Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602363Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2602604Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 95%] 2023-01-11T23:10:17.2602775Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_qr_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2602992Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2603178Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_triangular_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603345Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svd_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_svdvals_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2603859Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vander_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604032Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vecdot_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604192Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604357Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604527Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp2_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604700Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logcumsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2604863Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logdet_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605025Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_and_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605191Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_not_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logspace_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605551Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_long_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605714Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2605883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606046Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_amin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606218Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606381Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumsum_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606727Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_median_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2606897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_prod_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607068Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_scatter_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607241Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607407Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_softmin_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607572Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_std_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607740Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_var_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2607899Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matrix_exp_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608067Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_binary_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608263Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608445Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_no_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608650Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_max_reduction_with_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608815Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_median_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2608983Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_minimum_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609144Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609299Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mode_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609462Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_movedim_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609629Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_msort_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2609824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2609988Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mv_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2610193Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2610946Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_batch_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611113Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ne_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611333Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_neg_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_full_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611679Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_ones_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2611847Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_zeros_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612051Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612246Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612443Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612807Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2612991Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613175Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_bilinear_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613370Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613547Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv1d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613738Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2613935Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614126Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cosine_similarity_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614311Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_cross_entropy_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2614644Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2614833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615011Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_elu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615219Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:10:17.2615403Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_embedding_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615608Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2615807Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616173Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_gelu_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616362Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_grid_sample_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616542Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_group_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616729Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardsigmoid_cuda_float32 PASSED [ 95%] 2023-01-11T23:10:17.2616910Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardtanh_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617135Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617331Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617515Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_l1_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617693Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_linear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2617878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_local_response_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618061Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_logsigmoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618251Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618431Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2618814Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 
96%] 2023-01-11T23:10:17.2619003Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619190Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619376Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619546Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mish_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2619749Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_nll_loss_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2619937Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_circular_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_reflect_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620333Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pdist_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_prelu_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620734Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu6_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2620910Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_relu_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621098Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_tanhshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621275Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_threshold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621486Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621675Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_unfold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2621870Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622036Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nonzero_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622199Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622368Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_inf_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622534Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_nuc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622698Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2622883Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ones_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623048Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ormqr_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623223Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pca_lowrank_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623387Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pinverse_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623553Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polar_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2623920Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_2_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624083Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_pow_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624240Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_put_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_qr_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_quantile_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2624753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rad2deg_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2624919Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625092Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randint_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625257Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625425Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_randn_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625588Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ravel_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625747Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_real_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2625921Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reciprocal_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626159Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_remainder_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2626526Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626690Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize__cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2626855Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resize_as__cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627024Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_conj_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627188Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_roll_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627346Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rot90_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627519Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2627862Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628040Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amax_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628392Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_sum_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628563Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_searchsorted_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2628839Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_gaussian_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629029Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_cosine_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629216Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hamming_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629395Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_nuttall_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629556Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sin_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629718Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2629878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sinh_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630043Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630204Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sort_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630377Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2630925Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_erfcx_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631115Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_h_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631309Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_hermite_polynomial_he_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631480Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i0e_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2631647Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632021Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:10:17.2632230Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632419Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_i1_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632610Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2632983Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:10:17.2633171Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_spherical_bessel_j0_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633343Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_zeta_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633507Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633674Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_square_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633835Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2633986Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634154Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_svd_lowrank_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634318Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_symeig_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634479Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_t_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634651Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_along_dim_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2634837Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tan_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635031Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:10:17.2635193Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635348Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_topk_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635509Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trace_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapezoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2635853Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triangular_solve_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636020Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_indices_cuda_int64 PASSED [ 96%] 2023-01-11T23:10:17.2636186Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_indices_cuda_int64 PASSED [ 96%] 2023-01-11T23:10:17.2636353Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trunc_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636521Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_copy_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unfold_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2636843Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_uniform_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637022Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_consecutive_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637188Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unsqueeze_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637348Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637514Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_mean_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2637865Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_var_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638052Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vdot_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638217Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_real_cuda_complex64 PASSED [ 96%] 2023-01-11T23:10:17.2638377Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638544Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vsplit_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638703Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_vstack_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2638867Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2639034Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zeros_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:10:17.2639197Z 
test_ops.py::TestTagsCUDA::test_tags_T_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639366Z test_ops.py::TestTagsCUDA::test_tags___rmatmul___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639533Z test_ops.py::TestTagsCUDA::test_tags___rmod___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639684Z test_ops.py::TestTagsCUDA::test_tags___ror___cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2639848Z test_ops.py::TestTagsCUDA::test_tags___rsub___cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640009Z test_ops.py::TestTagsCUDA::test_tags___rxor___cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640193Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bfloat16_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640375Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_bool_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640585Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_byte_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640771Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cdouble_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2640956Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_cfloat_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641132Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641317Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_double_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641502Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_float_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641683Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_short_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2641848Z test_ops.py::TestTagsCUDA::test_tags__refs_acos_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642015Z test_ops.py::TestTagsCUDA::test_tags__refs_add_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642188Z test_ops.py::TestTagsCUDA::test_tags__refs_addcdiv_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642356Z test_ops.py::TestTagsCUDA::test_tags__refs_addcmul_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642519Z test_ops.py::TestTagsCUDA::test_tags__refs_all_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642683Z test_ops.py::TestTagsCUDA::test_tags__refs_allclose_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2642848Z test_ops.py::TestTagsCUDA::test_tags__refs_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643012Z test_ops.py::TestTagsCUDA::test_tags__refs_any_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643200Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_partial_views_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643386Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643578Z test_ops.py::TestTagsCUDA::test_tags__refs_atanh_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643752Z test_ops.py::TestTagsCUDA::test_tags__refs_atleast_1d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2643924Z 
test_ops.py::TestTagsCUDA::test_tags__refs_atleast_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644086Z test_ops.py::TestTagsCUDA::test_tags__refs_atleast_3d_cuda_float32 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644249Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_and_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644427Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_left_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644597Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_not_cuda_int64 SKIPPED (Only runs on cpu) [ 96%] 2023-01-11T23:10:17.2644782Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_right_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2644959Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_to_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645126Z test_ops.py::TestTagsCUDA::test_tags__refs_ceil_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645298Z test_ops.py::TestTagsCUDA::test_tags__refs_chunk_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645472Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_max_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645649Z test_ops.py::TestTagsCUDA::test_tags__refs_conj_physical_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2645820Z test_ops.py::TestTagsCUDA::test_tags__refs_constant_pad_nd_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646014Z test_ops.py::TestTagsCUDA::test_tags__refs_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646180Z test_ops.py::TestTagsCUDA::test_tags__refs_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646362Z test_ops.py::TestTagsCUDA::test_tags__refs_div_floor_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646529Z test_ops.py::TestTagsCUDA::test_tags__refs_dsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646698Z test_ops.py::TestTagsCUDA::test_tags__refs_dstack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2646873Z test_ops.py::TestTagsCUDA::test_tags__refs_empty_like_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647039Z test_ops.py::TestTagsCUDA::test_tags__refs_eq_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647205Z test_ops.py::TestTagsCUDA::test_tags__refs_erfinv_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647363Z test_ops.py::TestTagsCUDA::test_tags__refs_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647535Z test_ops.py::TestTagsCUDA::test_tags__refs_expand_as_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647704Z test_ops.py::TestTagsCUDA::test_tags__refs_expm1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2647871Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648046Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648219Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648389Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648558Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ihfftn_cuda_float32 
SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648728Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2648890Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649078Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649245Z test_ops.py::TestTagsCUDA::test_tags__refs_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649414Z test_ops.py::TestTagsCUDA::test_tags__refs_flatten_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649578Z test_ops.py::TestTagsCUDA::test_tags__refs_flip_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649748Z test_ops.py::TestTagsCUDA::test_tags__refs_fliplr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2649914Z test_ops.py::TestTagsCUDA::test_tags__refs_flipud_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650078Z test_ops.py::TestTagsCUDA::test_tags__refs_fmax_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650238Z test_ops.py::TestTagsCUDA::test_tags__refs_gt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650410Z test_ops.py::TestTagsCUDA::test_tags__refs_heaviside_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650599Z test_ops.py::TestTagsCUDA::test_tags__refs_hsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650787Z test_ops.py::TestTagsCUDA::test_tags__refs_hypot_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2650948Z test_ops.py::TestTagsCUDA::test_tags__refs_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651115Z test_ops.py::TestTagsCUDA::test_tags__refs_igamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651284Z test_ops.py::TestTagsCUDA::test_tags__refs_igammac_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651455Z test_ops.py::TestTagsCUDA::test_tags__refs_imag_cuda_complex64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651653Z test_ops.py::TestTagsCUDA::test_tags__refs_index_add_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651820Z test_ops.py::TestTagsCUDA::test_tags__refs_index_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2651987Z test_ops.py::TestTagsCUDA::test_tags__refs_isclose_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652154Z test_ops.py::TestTagsCUDA::test_tags__refs_isinf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652321Z test_ops.py::TestTagsCUDA::test_tags__refs_isreal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652482Z test_ops.py::TestTagsCUDA::test_tags__refs_lcm_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652650Z test_ops.py::TestTagsCUDA::test_tags__refs_lerp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652815Z test_ops.py::TestTagsCUDA::test_tags__refs_lgamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2652999Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_matrix_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653172Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653342Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_svdvals_cuda_float32 
SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653523Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_vector_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653686Z test_ops.py::TestTagsCUDA::test_tags__refs_log10_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2653850Z test_ops.py::TestTagsCUDA::test_tags__refs_log_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654025Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_and_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654194Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_not_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654369Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_or_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654664Z test_ops.py::TestTagsCUDA::test_tags__refs_lt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2654834Z test_ops.py::TestTagsCUDA::test_tags__refs_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655015Z test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_list_of_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655203Z test_ops.py::TestTagsCUDA::test_tags__refs_meshgrid_variadic_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655370Z test_ops.py::TestTagsCUDA::test_tags__refs_movedim_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655534Z test_ops.py::TestTagsCUDA::test_tags__refs_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655705Z test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2655881Z test_ops.py::TestTagsCUDA::test_tags__refs_new_full_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656050Z test_ops.py::TestTagsCUDA::test_tags__refs_new_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656215Z test_ops.py::TestTagsCUDA::test_tags__refs_new_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656374Z test_ops.py::TestTagsCUDA::test_tags__refs_nextafter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656566Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656751Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2656931Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_glu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657149Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_group_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657334Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657515Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_huber_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657714Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2657914Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658097Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mish_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:10:17.2658274Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_mse_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658473Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pairwise_distance_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658655Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_pdist_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2658844Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu6_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659023Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_selu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659218Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659403Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_threshold_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659596Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659771Z test_ops.py::TestTagsCUDA::test_tags__refs_positive_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2659928Z test_ops.py::TestTagsCUDA::test_tags__refs_ravel_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660119Z test_ops.py::TestTagsCUDA::test_tags__refs_real_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660295Z test_ops.py::TestTagsCUDA::test_tags__refs_reciprocal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660463Z test_ops.py::TestTagsCUDA::test_tags__refs_repeat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660635Z test_ops.py::TestTagsCUDA::test_tags__refs_reshape_as_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660801Z test_ops.py::TestTagsCUDA::test_tags__refs_round_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2660968Z test_ops.py::TestTagsCUDA::test_tags__refs_rsqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661136Z test_ops.py::TestTagsCUDA::test_tags__refs_rsub_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661320Z test_ops.py::TestTagsCUDA::test_tags__refs_sigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661480Z test_ops.py::TestTagsCUDA::test_tags__refs_sinc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661643Z test_ops.py::TestTagsCUDA::test_tags__refs_sinh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2661823Z test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662004Z test_ops.py::TestTagsCUDA::test_tags__refs_special_bessel_j1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662179Z test_ops.py::TestTagsCUDA::test_tags__refs_special_erfcx_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662358Z test_ops.py::TestTagsCUDA::test_tags__refs_special_log_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662578Z test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662782Z test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2662972Z 
test_ops.py::TestTagsCUDA::test_tags__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663143Z test_ops.py::TestTagsCUDA::test_tags__refs_special_xlog1py_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663309Z test_ops.py::TestTagsCUDA::test_tags__refs_sqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663478Z test_ops.py::TestTagsCUDA::test_tags__refs_square_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663650Z test_ops.py::TestTagsCUDA::test_tags__refs_squeeze_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663816Z test_ops.py::TestTagsCUDA::test_tags__refs_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2663984Z test_ops.py::TestTagsCUDA::test_tags__refs_std_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664148Z test_ops.py::TestTagsCUDA::test_tags__refs_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664308Z test_ops.py::TestTagsCUDA::test_tags__refs_t_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664472Z test_ops.py::TestTagsCUDA::test_tags__refs_trace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664637Z test_ops.py::TestTagsCUDA::test_tags__refs_transpose_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664799Z test_ops.py::TestTagsCUDA::test_tags__refs_tril_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2664962Z test_ops.py::TestTagsCUDA::test_tags__refs_triu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665125Z test_ops.py::TestTagsCUDA::test_tags__refs_trunc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665299Z test_ops.py::TestTagsCUDA::test_tags__refs_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665463Z test_ops.py::TestTagsCUDA::test_tags__refs_where_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665659Z test_ops.py::TestTagsCUDA::test_tags__refs_xlogy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665825Z test_ops.py::TestTagsCUDA::test_tags__refs_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2665985Z test_ops.py::TestTagsCUDA::test_tags_acos_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666137Z test_ops.py::TestTagsCUDA::test_tags_add_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666302Z test_ops.py::TestTagsCUDA::test_tags_addcdiv_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666464Z test_ops.py::TestTagsCUDA::test_tags_addmm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666645Z test_ops.py::TestTagsCUDA::test_tags_addmm_decomposed_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666803Z test_ops.py::TestTagsCUDA::test_tags_all_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2666966Z test_ops.py::TestTagsCUDA::test_tags_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667127Z test_ops.py::TestTagsCUDA::test_tags_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667283Z test_ops.py::TestTagsCUDA::test_tags_any_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667439Z test_ops.py::TestTagsCUDA::test_tags_argmin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667604Z test_ops.py::TestTagsCUDA::test_tags_argsort_cuda_float32 SKIPPED (Only 
runs on cpu) [ 97%] 2023-01-11T23:10:17.2667770Z test_ops.py::TestTagsCUDA::test_tags_argwhere_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2667935Z test_ops.py::TestTagsCUDA::test_tags_as_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668144Z test_ops.py::TestTagsCUDA::test_tags_as_strided_partial_views_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668325Z test_ops.py::TestTagsCUDA::test_tags_as_strided_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668488Z test_ops.py::TestTagsCUDA::test_tags_atanh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668655Z test_ops.py::TestTagsCUDA::test_tags_atleast_1d_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2668894Z test_ops.py::TestTagsCUDA::test_tags_atleast_3d_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669053Z test_ops.py::TestTagsCUDA::test_tags_bernoulli_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669216Z test_ops.py::TestTagsCUDA::test_tags_bfloat16_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669378Z test_ops.py::TestTagsCUDA::test_tags_bincount_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669556Z test_ops.py::TestTagsCUDA::test_tags_bitwise_right_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669722Z test_ops.py::TestTagsCUDA::test_tags_bitwise_xor_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2669890Z test_ops.py::TestTagsCUDA::test_tags_block_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670051Z test_ops.py::TestTagsCUDA::test_tags_bool_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670228Z test_ops.py::TestTagsCUDA::test_tags_broadcast_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670386Z test_ops.py::TestTagsCUDA::test_tags_bucketize_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670544Z test_ops.py::TestTagsCUDA::test_tags_cat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670707Z test_ops.py::TestTagsCUDA::test_tags_cdist_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2670875Z test_ops.py::TestTagsCUDA::test_tags_ceil_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671036Z test_ops.py::TestTagsCUDA::test_tags_cfloat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671235Z test_ops.py::TestTagsCUDA::test_tags_cholesky_inverse_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671409Z test_ops.py::TestTagsCUDA::test_tags_cholesky_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671571Z test_ops.py::TestTagsCUDA::test_tags_chunk_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671734Z test_ops.py::TestTagsCUDA::test_tags_clamp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2671893Z test_ops.py::TestTagsCUDA::test_tags_clamp_max_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:10:17.2672047Z test_ops.py::TestTagsCUDA::test_tags_clone_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672217Z test_ops.py::TestTagsCUDA::test_tags_combinations_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672379Z test_ops.py::TestTagsCUDA::test_tags_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672555Z 
test_ops.py::TestTagsCUDA::test_tags_constant_pad_nd_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672721Z test_ops.py::TestTagsCUDA::test_tags_copysign_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2672886Z test_ops.py::TestTagsCUDA::test_tags_corrcoef_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673048Z test_ops.py::TestTagsCUDA::test_tags_cos_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673202Z test_ops.py::TestTagsCUDA::test_tags_cosh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673373Z test_ops.py::TestTagsCUDA::test_tags_count_nonzero_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673532Z test_ops.py::TestTagsCUDA::test_tags_cov_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673720Z test_ops.py::TestTagsCUDA::test_tags_cross_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2673886Z test_ops.py::TestTagsCUDA::test_tags_cumprod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674070Z test_ops.py::TestTagsCUDA::test_tags_cumulative_trapezoid_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674232Z test_ops.py::TestTagsCUDA::test_tags_diag_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674399Z test_ops.py::TestTagsCUDA::test_tags_diag_embed_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674566Z test_ops.py::TestTagsCUDA::test_tags_diagonal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674718Z test_ops.py::TestTagsCUDA::test_tags_diff_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2674878Z test_ops.py::TestTagsCUDA::test_tags_dist_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675058Z test_ops.py::TestTagsCUDA::test_tags_div_no_rounding_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675218Z test_ops.py::TestTagsCUDA::test_tags_dot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675384Z test_ops.py::TestTagsCUDA::test_tags_dsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675546Z test_ops.py::TestTagsCUDA::test_tags_dstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675707Z test_ops.py::TestTagsCUDA::test_tags_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2675866Z test_ops.py::TestTagsCUDA::test_tags_eq_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2676018Z test_ops.py::TestTagsCUDA::test_tags_equal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681471Z test_ops.py::TestTagsCUDA::test_tags_erf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681636Z test_ops.py::TestTagsCUDA::test_tags_erfc_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681808Z test_ops.py::TestTagsCUDA::test_tags_erfinv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2681968Z test_ops.py::TestTagsCUDA::test_tags_exp2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682185Z test_ops.py::TestTagsCUDA::test_tags_expand_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682347Z test_ops.py::TestTagsCUDA::test_tags_expm1_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682505Z test_ops.py::TestTagsCUDA::test_tags_eye_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682668Z 
test_ops.py::TestTagsCUDA::test_tags_fft_fftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682826Z test_ops.py::TestTagsCUDA::test_tags_fft_hfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2682985Z test_ops.py::TestTagsCUDA::test_tags_fft_hfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683145Z test_ops.py::TestTagsCUDA::test_tags_fft_hfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683309Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683478Z test_ops.py::TestTagsCUDA::test_tags_fft_irfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683643Z test_ops.py::TestTagsCUDA::test_tags_fft_irfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683806Z test_ops.py::TestTagsCUDA::test_tags_fft_rfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2683971Z test_ops.py::TestTagsCUDA::test_tags_fft_rfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684124Z test_ops.py::TestTagsCUDA::test_tags_fft_rfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684285Z test_ops.py::TestTagsCUDA::test_tags_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684450Z test_ops.py::TestTagsCUDA::test_tags_flatten_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684637Z test_ops.py::TestTagsCUDA::test_tags_float_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684792Z test_ops.py::TestTagsCUDA::test_tags_fmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2684953Z test_ops.py::TestTagsCUDA::test_tags_fmod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685117Z test_ops.py::TestTagsCUDA::test_tags_frexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685277Z test_ops.py::TestTagsCUDA::test_tags_full_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685444Z test_ops.py::TestTagsCUDA::test_tags_full_like_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685600Z test_ops.py::TestTagsCUDA::test_tags_gather_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685756Z test_ops.py::TestTagsCUDA::test_tags_gcd_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2685917Z test_ops.py::TestTagsCUDA::test_tags_ge_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686092Z test_ops.py::TestTagsCUDA::test_tags_grid_sampler_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686255Z test_ops.py::TestTagsCUDA::test_tags_half_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686415Z test_ops.py::TestTagsCUDA::test_tags_hsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686569Z test_ops.py::TestTagsCUDA::test_tags_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686735Z test_ops.py::TestTagsCUDA::test_tags_imag_cuda_complex64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2686892Z test_ops.py::TestTagsCUDA::test_tags_index_add_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687057Z test_ops.py::TestTagsCUDA::test_tags_index_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687222Z test_ops.py::TestTagsCUDA::test_tags_index_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687395Z 
test_ops.py::TestTagsCUDA::test_tags_index_reduce_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687556Z test_ops.py::TestTagsCUDA::test_tags_int_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687740Z test_ops.py::TestTagsCUDA::test_tags_isin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2687904Z test_ops.py::TestTagsCUDA::test_tags_isreal_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688071Z test_ops.py::TestTagsCUDA::test_tags_istft_cuda_complex64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688257Z test_ops.py::TestTagsCUDA::test_tags_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688441Z test_ops.py::TestTagsCUDA::test_tags_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688618Z test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688808Z test_ops.py::TestTagsCUDA::test_tags_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2688967Z test_ops.py::TestTagsCUDA::test_tags_kron_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689135Z test_ops.py::TestTagsCUDA::test_tags_kthvalue_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689292Z test_ops.py::TestTagsCUDA::test_tags_lcm_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689454Z test_ops.py::TestTagsCUDA::test_tags_ldexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689618Z test_ops.py::TestTagsCUDA::test_tags_lgamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689787Z test_ops.py::TestTagsCUDA::test_tags_linalg_cross_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2689949Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690145Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigvals_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690330Z test_ops.py::TestTagsCUDA::test_tags_linalg_householder_product_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690508Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690685Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_factor_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2690856Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691030Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691210Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_power_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691383Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691565Z test_ops.py::TestTagsCUDA::test_tags_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691745Z test_ops.py::TestTagsCUDA::test_tags_linalg_multi_dot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2691910Z test_ops.py::TestTagsCUDA::test_tags_linalg_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692080Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_cuda_float32 SKIPPED (Only runs on cpu) 
[ 98%] 2023-01-11T23:10:17.2692245Z test_ops.py::TestTagsCUDA::test_tags_linalg_qr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692411Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692583Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692764Z test_ops.py::TestTagsCUDA::test_tags_linalg_solve_triangular_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2692942Z test_ops.py::TestTagsCUDA::test_tags_linalg_tensorinv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693128Z test_ops.py::TestTagsCUDA::test_tags_linalg_vander_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693297Z test_ops.py::TestTagsCUDA::test_tags_linalg_vecdot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693462Z test_ops.py::TestTagsCUDA::test_tags_linspace_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693621Z test_ops.py::TestTagsCUDA::test_tags_log1p_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693780Z test_ops.py::TestTagsCUDA::test_tags_log2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2693937Z test_ops.py::TestTagsCUDA::test_tags_log_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694102Z test_ops.py::TestTagsCUDA::test_tags_log_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694286Z test_ops.py::TestTagsCUDA::test_tags_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694448Z test_ops.py::TestTagsCUDA::test_tags_logaddexp2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2694841Z test_ops.py::TestTagsCUDA::test_tags_logaddexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695013Z test_ops.py::TestTagsCUDA::test_tags_logcumsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695177Z test_ops.py::TestTagsCUDA::test_tags_logdet_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695337Z test_ops.py::TestTagsCUDA::test_tags_logit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695503Z test_ops.py::TestTagsCUDA::test_tags_logspace_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695670Z test_ops.py::TestTagsCUDA::test_tags_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2695927Z test_ops.py::TestTagsCUDA::test_tags_mH_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696092Z test_ops.py::TestTagsCUDA::test_tags_mT_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696254Z test_ops.py::TestTagsCUDA::test_tags_masked_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696423Z test_ops.py::TestTagsCUDA::test_tags_masked_argmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696599Z test_ops.py::TestTagsCUDA::test_tags_masked_log_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696777Z test_ops.py::TestTagsCUDA::test_tags_masked_logaddexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2696948Z test_ops.py::TestTagsCUDA::test_tags_masked_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697114Z test_ops.py::TestTagsCUDA::test_tags_masked_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697281Z 
test_ops.py::TestTagsCUDA::test_tags_masked_median_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697449Z test_ops.py::TestTagsCUDA::test_tags_masked_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697619Z test_ops.py::TestTagsCUDA::test_tags_masked_normalize_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697785Z test_ops.py::TestTagsCUDA::test_tags_masked_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2697956Z test_ops.py::TestTagsCUDA::test_tags_masked_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698121Z test_ops.py::TestTagsCUDA::test_tags_masked_std_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698284Z test_ops.py::TestTagsCUDA::test_tags_max_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698446Z test_ops.py::TestTagsCUDA::test_tags_maximum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698603Z test_ops.py::TestTagsCUDA::test_tags_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698787Z test_ops.py::TestTagsCUDA::test_tags_meshgrid_list_of_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2698981Z test_ops.py::TestTagsCUDA::test_tags_min_binary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699152Z test_ops.py::TestTagsCUDA::test_tags_minimum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699314Z test_ops.py::TestTagsCUDA::test_tags_mm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699478Z test_ops.py::TestTagsCUDA::test_tags_multinomial_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699631Z test_ops.py::TestTagsCUDA::test_tags_mv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699812Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2699993Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700161Z test_ops.py::TestTagsCUDA::test_tags_nan_to_num_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700319Z test_ops.py::TestTagsCUDA::test_tags_nanmean_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700491Z test_ops.py::TestTagsCUDA::test_tags_nanquantile_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700646Z test_ops.py::TestTagsCUDA::test_tags_nansum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700810Z test_ops.py::TestTagsCUDA::test_tags_narrow_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2700988Z test_ops.py::TestTagsCUDA::test_tags_native_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701158Z test_ops.py::TestTagsCUDA::test_tags_native_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701313Z test_ops.py::TestTagsCUDA::test_tags_ne_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701500Z test_ops.py::TestTagsCUDA::test_tags_new_full_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701666Z test_ops.py::TestTagsCUDA::test_tags_nextafter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2701860Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702046Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702222Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702403Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702585Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702766Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_adaptive_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2702954Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703141Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703325Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703504Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_avg_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703701Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2703887Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2704064Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_celu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:10:17.2704247Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704460Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704648Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2704842Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705033Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cosine_similarity_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705215Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_cross_entropy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705392Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_ctc_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705570Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705752Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2705933Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_bag_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706115Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_embedding_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706308Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706502Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_fractional_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:10:17.2706688Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2706891Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_gelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707075Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_group_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707262Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707441Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardsigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707625Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardswish_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2707807Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hardtanh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708000Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708183Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_huber_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708377Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_area_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708570Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2708828Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709024Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_linear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709211Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709405Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709581Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709765Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_linear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2709987Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_logsigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710178Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710355Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skipped!) 
[ 99%] 2023-01-11T23:10:17.2710535Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710721Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2710900Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711092Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool2d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711277Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711471Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711655Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2711852Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712032Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712214Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_normalize_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712426Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_one_hot_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712609Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_constant_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712790Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_reflect_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2712967Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pdist_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713153Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713339Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pixel_unshuffle_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713518Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_prelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713694Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2713876Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_silu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714062Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_smooth_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714250Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714425Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softplus_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714606Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714790Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_tanhshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2714972Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_threshold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715166Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715396Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715588Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715776Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_upsample_nearest_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2715937Z test_ops.py::TestTagsCUDA::test_tags_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716093Z test_ops.py::TestTagsCUDA::test_tags_norm_fro_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716256Z test_ops.py::TestTagsCUDA::test_tags_norm_nuc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716419Z test_ops.py::TestTagsCUDA::test_tags_normal_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716582Z test_ops.py::TestTagsCUDA::test_tags_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716750Z test_ops.py::TestTagsCUDA::test_tags_ones_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2716925Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_view_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717091Z test_ops.py::TestTagsCUDA::test_tags_pinverse_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717255Z test_ops.py::TestTagsCUDA::test_tags_polar_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717440Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717615Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2717780Z test_ops.py::TestTagsCUDA::test_tags_positive_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718006Z test_ops.py::TestTagsCUDA::test_tags_pow_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718167Z test_ops.py::TestTagsCUDA::test_tags_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718329Z test_ops.py::TestTagsCUDA::test_tags_qr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718492Z test_ops.py::TestTagsCUDA::test_tags_quantile_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718656Z test_ops.py::TestTagsCUDA::test_tags_rand_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718827Z test_ops.py::TestTagsCUDA::test_tags_randint_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2718986Z test_ops.py::TestTagsCUDA::test_tags_randn_like_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719147Z test_ops.py::TestTagsCUDA::test_tags_ravel_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719320Z test_ops.py::TestTagsCUDA::test_tags_resolve_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719485Z test_ops.py::TestTagsCUDA::test_tags_resolve_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719648Z test_ops.py::TestTagsCUDA::test_tags_rot90_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719806Z test_ops.py::TestTagsCUDA::test_tags_round_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:10:17.2719981Z 
test_ops.py::TestTagsCUDA::test_tags_round_decimals_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720158Z test_ops.py::TestTagsCUDA::test_tags_round_decimals_neg_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720319Z test_ops.py::TestTagsCUDA::test_tags_rsqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720472Z test_ops.py::TestTagsCUDA::test_tags_rsub_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720669Z test_ops.py::TestTagsCUDA::test_tags_scalar_tensor_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2720860Z test_ops.py::TestTagsCUDA::test_tags_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721060Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amax_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721242Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721421Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721598Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721768Z test_ops.py::TestTagsCUDA::test_tags_searchsorted_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2721950Z test_ops.py::TestTagsCUDA::test_tags_segment_reduce_lengths_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722110Z test_ops.py::TestTagsCUDA::test_tags_select_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722271Z test_ops.py::TestTagsCUDA::test_tags_sgn_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722436Z test_ops.py::TestTagsCUDA::test_tags_sigmoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722618Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_bartlett_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722799Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_gaussian_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2722986Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_cosine_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723168Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_hamming_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723351Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_kaiser_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723552Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_nuttall_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723711Z test_ops.py::TestTagsCUDA::test_tags_sin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2723874Z test_ops.py::TestTagsCUDA::test_tags_slice_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724034Z test_ops.py::TestTagsCUDA::test_tags_softmax_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724215Z test_ops.py::TestTagsCUDA::test_tags_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724392Z test_ops.py::TestTagsCUDA::test_tags_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 99%]
2023-01-11T23:10:17.2724566Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724737Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_y0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2724911Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_y1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725094Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725280Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725637Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2725808Z test_ops.py::TestTagsCUDA::test_tags_special_entr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2725979Z test_ops.py::TestTagsCUDA::test_tags_special_erfcx_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726149Z test_ops.py::TestTagsCUDA::test_tags_special_i0e_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726316Z test_ops.py::TestTagsCUDA::test_tags_special_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726483Z test_ops.py::TestTagsCUDA::test_tags_special_i1e_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726697Z test_ops.py::TestTagsCUDA::test_tags_special_laguerre_polynomial_l_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2726869Z test_ops.py::TestTagsCUDA::test_tags_special_log_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727055Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727238Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727429Z test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2727779Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2728123Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%]
2023-01-11T23:10:17.2728302Z test_ops.py::TestTagsCUDA::test_tags_special_xlog1py_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728468Z test_ops.py::TestTagsCUDA::test_tags_special_zeta_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728640Z test_ops.py::TestTagsCUDA::test_tags_split_list_args_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728806Z test_ops.py::TestTagsCUDA::test_tags_square_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2728962Z test_ops.py::TestTagsCUDA::test_tags_std_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729135Z test_ops.py::TestTagsCUDA::test_tags_std_mean_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729329Z test_ops.py::TestTagsCUDA::test_tags_std_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729490Z test_ops.py::TestTagsCUDA::test_tags_stft_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729662Z test_ops.py::TestTagsCUDA::test_tags_sum_to_size_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729819Z test_ops.py::TestTagsCUDA::test_tags_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2729981Z test_ops.py::TestTagsCUDA::test_tags_symeig_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730140Z test_ops.py::TestTagsCUDA::test_tags_take_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730287Z test_ops.py::TestTagsCUDA::test_tags_tan_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730454Z test_ops.py::TestTagsCUDA::test_tags_tanh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730651Z test_ops.py::TestTagsCUDA::test_tags_tensor_split_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2730829Z test_ops.py::TestTagsCUDA::test_tags_to_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731004Z test_ops.py::TestTagsCUDA::test_tags_triangular_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731166Z test_ops.py::TestTagsCUDA::test_tags_trunc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731328Z test_ops.py::TestTagsCUDA::test_tags_unbind_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731489Z test_ops.py::TestTagsCUDA::test_tags_unique_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731655Z test_ops.py::TestTagsCUDA::test_tags_unsqueeze_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731821Z test_ops.py::TestTagsCUDA::test_tags_var_mean_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2731992Z test_ops.py::TestTagsCUDA::test_tags_var_unbiased_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732161Z test_ops.py::TestTagsCUDA::test_tags_vsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732325Z test_ops.py::TestTagsCUDA::test_tags_vstack_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732505Z test_ops.py::TestTagsCUDA::test_tags_zero__cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:10:17.2732668Z test_ops.py::TestTagsCUDA::test_tags_zeros_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:10:17.2732675Z 
2023-01-11T23:10:17.2732798Z =============================== warnings summary ===============================
2023-01-11T23:10:17.2733029Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:10:17.2733392Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:10:17.2733490Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:10:17.2733499Z 
2023-01-11T23:10:17.2733729Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:10:17.2734025Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml -
2023-01-11T23:10:17.2734179Z = 12240 passed, 3207 skipped, 25 deselected, 225 xfailed, 1 warning in 4589.34s (1:16:29) =
2023-01-11T23:10:17.2734358Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:10:17.2734364Z 
2023-01-11T23:10:17.2734834Z ##[endgroup]
2023-01-11T23:10:17.2735104Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_xj2vdchj)
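The run summary above notes that skip details live in the generated junit-xml report. A minimal sketch for tallying skip reasons offline, once the report has been downloaded (Python stdlib only; the report path is copied from the log line above, and reading the reason from the "message" attribute of the <skipped> element is an assumption based on the junit format pytest emits):

    import xml.etree.ElementTree as ET
    from collections import Counter

    # Hypothetical local copy of the report named in the log above.
    report = "test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml"
    reasons = Counter(
        skipped.get("message", "<no message>")           # skip reason, if recorded
        for case in ET.parse(report).getroot().iter("testcase")
        for skipped in case.iter("skipped")               # only skipped test cases
    )
    for reason, count in reasons.most_common(10):
        print(f"{count:5d}  {reason}")

The session header that follows reports hypothesis profile 'pytorch_ci' with database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]. A sketch of how such a profile is typically registered with the hypothesis settings API (where exactly the PyTorch test harness does this is not shown in this log):

    from hypothesis import HealthCheck, settings

    # Register and activate a CI profile matching the values printed below.
    settings.register_profile(
        "pytorch_ci",
        database=None,                                   # no example database on CI
        max_examples=50,                                 # cap examples per property
        derandomize=True,                                 # deterministic example generation
        suppress_health_check=[HealthCheck.too_slow],     # tolerate slow properties
    )
    settings.load_profile("pytorch_ci")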
2023-01-11T23:10:17.2735109Z 
2023-01-11T23:13:46.7161921Z 
2023-01-11T23:13:46.7162736Z Expand the folded group to see the log file of test_ops
2023-01-11T23:13:46.7163497Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_oa0bw8mk)
2023-01-11T23:13:46.7178787Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml
2023-01-11T23:13:46.7179139Z ============================= test session starts ==============================
2023-01-11T23:13:46.7179536Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:13:46.7179816Z cachedir: .pytest_cache
2023-01-11T23:13:46.7180238Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:13:46.7182221Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:13:46.7183013Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:13:46.7183494Z collecting ... collected 30861 items / 17 deselected / 30844 selected
2023-01-11T23:13:46.8672140Z Running 15147 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_singular_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional__scaled_dot_product_attention_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_partial_views_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_clone_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_div_no_rounding_mode_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rand_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_scalar_tensor_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_slice_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_dtypes_H_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_T_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rdiv___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bfloat16_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_int_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_scatter_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_column_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log10_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_lt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_minimum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_threshold_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_positive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes__softmax_backward_data_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_addr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_broadcast_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_conj_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagflat_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_double_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_einsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_eq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_float_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_floor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ge_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_half_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_2inputs_2outputs_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_le_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cond_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigh_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_multi_dot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_qr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log10_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_amax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_masked_select_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_median_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mode_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanmedian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nanquantile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ne_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional__scaled_dot_product_attention_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_elu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_linear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_nearest_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_norm_nuc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polar_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_positive_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_prod_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_put_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_renorm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_resize_as__cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_neg_3_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_offsets_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_select_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sgn_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_slice_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_h_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_he_cuda, 
test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_polygamma_special_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_t_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_to_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_triu_indices_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_unique_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_var_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_complex_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_copy_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_view_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_errors_T_cuda, test/test_ops.py::TestCommonCUDA::test_errors___rand___cuda, test/test_ops.py::TestCommonCUDA::test_errors___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_errors___ror___cuda, test/test_ops.py::TestCommonCUDA::test_errors_add_cuda, test/test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_arange_cuda, test/test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_errors_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_errors_cat_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda, test/test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_eye_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda, test/test_ops.py::TestCommonCUDA::test_errors_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_errors_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_errors_fmod_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_errors_ge_cuda, test/test_ops.py::TestCommonCUDA::test_errors_gt_cuda, test/test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igamma_cuda, test/test_ops.py::TestCommonCUDA::test_errors_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_errors_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda, test/test_ops.py::TestCommonCUDA::test_errors_lcm_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda, test/test_ops.py::TestCommonCUDA::test_errors_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_lt_cuda, test/test_ops.py::TestCommonCUDA::test_errors_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_errors_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_errors_multinomial_cuda, test/test_ops.py::TestCommonCUDA::test_errors_neg_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv1d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_rrelu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_errors_polar_cuda, test/test_ops.py::TestCommonCUDA::test_errors_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_errors_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda, 
test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda, test/test_ops.py::TestCommonCUDA::test_errors_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_errors_take_cuda, test/test_ops.py::TestCommonCUDA::test_errors_trace_cuda, test/test_ops.py::TestCommonCUDA::test_errors_triu_cuda, test/test_ops.py::TestCommonCUDA::test_errors_uniform_cuda, test/test_ops.py::TestCommonCUDA::test_errors_view_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda, test/test_ops.py::TestCommonCUDA::test_errors_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_errors_where_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___ror___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_triangular_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_H_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values___ror___cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asinh_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_3d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cartesian_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flipud_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_put_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_4inputs_with_extra_args_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mH_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_with_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_shuffle_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reciprocal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scalar_tensor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_airy_ai_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_w_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_log_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_u_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_with_sizes_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_sparse_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_xlogy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rand___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_bartlett_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hann_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_nuttall_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_int64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_out_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_acos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cov_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_double_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_frexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_inner_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mH_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_median_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_softmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_matrix_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nanmean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_nonzero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_norm_fro_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ormqr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_bessel_y1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_special_xlog1py_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_std_mean_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_symeig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_to_sparse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_out_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning_H_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_T_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___getitem___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmod___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_char_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_partial_views_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_asin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_asinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cos_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_digamma_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_dsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erfc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_imag_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_lerp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_masked_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_prelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_selu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmin_with_dtype_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sinh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i0e_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_std_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_abs_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_add_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addcmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_all_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_angle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_any_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_arange_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_argwhere_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ceil_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cfloat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_clone_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_conj_physical_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_constant_pad_nd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_contiguous_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_copysign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_corrcoef_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagflat_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_double_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_empty_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_erfinv_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_exp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_irfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_float_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_full_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_half_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_i0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isinf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_4inputs_with_extra_args_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_return_by_ref_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_le_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_eig_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_power_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_subgradients_at_zero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_qr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_slogdet_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorsolve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vecdot_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_vector_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log10_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_logit_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mH_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_logsumexp_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_masked_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_median_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_msort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mul_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mv_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_5_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nan_to_num_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nanmean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_dropout_backward_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_new_zeros_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_with_logits_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardsigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardtanh_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_instance_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bicubic_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_trilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_grad_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_soft_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_replicate_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pdist_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_unshuffle_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_tanhshrink_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_with_distance_loss_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_bilinear_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_norm_inf_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_native_batch_norm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_var_mean_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pca_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polar_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_pow_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_put_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_qr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_randn_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_ravel_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_resolve_neg_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amax_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_lengths_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sigmoid_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_bartlett_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_nuttall_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_softmax_with_dtype_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sort_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_he_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_laguerre_polynomial_l_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_u_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_v_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_special_zeta_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_square_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_std_cuda, 
test/test_ops.py::TestCommonCUDA::test_out_warning_std_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_t_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tan_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_trapz_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_tril_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zero__cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_zeros_like_cuda, test/test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amin_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_atan2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eye_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda, 
test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft2_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_float_power_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gcd_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_heaviside_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hsplit_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_xor_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_maximum_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_copy_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_native_layer_norm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_group_norm_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_poisson_nll_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_pow_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_roll_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rot90_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rsub_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_true_divide_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vstack_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_where_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_errors_ops_nvprims_view_cuda, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex128, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bool, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_uint8, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bfloat16, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex128, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bfloat16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bool, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex32, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float64, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int16, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int32, 
test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int8, test/test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_uint8, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmod___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager__softmax_backward_data_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_histc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isposinf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logaddexp_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_minimum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_msort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nextafter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_elu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polar_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_quantile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_like_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_searchsorted_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_scatter_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sort_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_airy_ai_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_log_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtr_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtri_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_lowrank_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_complex64, 
test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_consecutive_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_complex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_copy_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward___rpow___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_constant_pad_nd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_diff_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_float_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_index_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vander_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_unpack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool2d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_inf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_rsqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtri_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sqrt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_symeig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_H_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___radd___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcmul_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_all_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_angle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_baddbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_shapes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_byte_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdouble_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_combinations_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_contiguous_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumprod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagflat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_trunc_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_equal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igamma_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isneginf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ldexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mT_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanquantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nansum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_full_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_elu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_prelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softplus_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softsign_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_put_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rand_like_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_neg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scalar_tensor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sgn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_t_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsqueeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_where_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___getitem___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addbmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_all_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_argmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_partial_views_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_bernoulli_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ceil_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_min_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_clone_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_physical_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cos_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_div_floor_rounding_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_double_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erfc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_erfinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_eye_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftshift_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfftn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_flipud_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_fmin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_frexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ge_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_gradient_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_heaviside_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_igammac_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_index_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_inner_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_unary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_kron_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_kthvalue_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lgamma_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cross_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_singular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvals_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorinv_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vecdot_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_xor_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mH_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumsum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_log_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_multinomial_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nan_to_num_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_strided_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_new_full_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_grad_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_threshold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_fro_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_positive_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_qr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rad2deg_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_like_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_ravel_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reciprocal_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_renorm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_interleave_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_conj_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_searchsorted_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_exponential_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sin_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sort_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tile_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_to_sparse_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trace_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trapezoid_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_trapz_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_var_unbiased_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_view_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vsplit_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_T_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___getitem___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rdiv___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmatmul___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_any_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_embed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_div_no_rounding_mode_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fliplr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hstack_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isnan_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_not_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_permute_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_pow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sigmoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tensor_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_triu_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_acosh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addcmul_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_addr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_angle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_partial_views_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_block_diag_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cdouble_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_inverse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_clone_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_contiguous_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_corrcoef_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagflat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_dot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft2_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftshift_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flip_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_geqrf_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_gradient_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_index_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isclose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_isfinite_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_4inputs_with_extra_args_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cross_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvalsh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_householder_product_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_hermitian_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svdvals_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vector_norm_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_lu_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mT_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumsum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_std_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_full_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_new_zeros_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_replicate_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_shuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_unshuffle_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_norm_nuc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_ops_nvprims_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_outer_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_positive_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_pow_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_qr_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_real_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_as_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resize__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_add_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_short_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_softmax_with_dtype_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_list_args_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_stft_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_symeig_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tanh_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_to_sparse_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_transpose_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_vstack_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_where_cuda_complex64, 
test/test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_like_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rdiv___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cfloat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_half_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_int_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_allclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_div_no_rounding_mode_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_and_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rot90_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_zeros_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_add_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addbmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_all_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_allclose_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_any_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asin_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_2d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cdouble_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_inverse_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_column_stack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cosh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cov_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumprod_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diff_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_like_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfftn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gather_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_copy_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lerp_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_singular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_power_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_multi_dot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vecdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log10_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log2_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_or_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumsum_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_fill_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_movedim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mul_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_neg_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_ones_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_constant_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_reflect_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softsign_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_fro_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ops_nvprims_view_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ormqr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_qr_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_renorm_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sqrt_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_square_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_along_dim_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapezoid_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triangular_solve_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triu_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_cuda_complex128, 
test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_unbiased_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_H_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___radd___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_T_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_short_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_allclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_max_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fliplr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_not_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_maximum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_alpha_dropout_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu6_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rot90_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_entr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_zeta_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trunc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__refs_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_addr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_aminmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_chalf_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_min_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_contiguous_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_cummax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_diff_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_dsplit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_empty_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expand_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_expm1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfftn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fill_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_float_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_full_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gather_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_histc_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_igammac_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_index_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isposinf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_singular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_householder_product_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_grad_oriented_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_power_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_singular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_qr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_ex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_logsumexp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_long_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_lu_unpack_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumsum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_median_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_sum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_matrix_exp_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_maximum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_list_of_tensors_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_mv_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanmedian_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nanquantile_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nansum_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_new_zeros_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional__scaled_dot_product_attention_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose2d_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_ctc_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_with_train_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gaussian_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gelu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardshrink_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardswish_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bicubic_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_kl_div_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_soft_margin_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_silu_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_view_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randint_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_randn_like_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_ravel_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_real_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reciprocal_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_remainder_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_as_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resize_as__cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_neg_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_roll_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rot90_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_neg_3_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_searchsorted_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_lengths_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_select_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_blackman_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_cosine_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_hamming_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sin_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_entr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_h_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_log_ndtr_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_square_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_sum_to_size_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_symeig_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_t_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_take_along_dim_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tan_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_to_sparse_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64, 
test/test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_uniform_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_allclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_column_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_count_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_deg2rad_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ge_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_multinomial_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_prelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_with_distance_loss_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_offsets_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triangular_solve_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_bool_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___radd___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rsub___cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_householder_product_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_multi_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vecdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_ctc_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_glu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pinverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_log_ndtr_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_abs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_decomposed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_partial_views_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_block_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_combinations_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_constant_pad_nd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_digamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mean_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pow_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_quantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_cumulative_trapezoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dist_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_eq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_frexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_igammac_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_imag_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_int_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_kron_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cond_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svd_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svdvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vander_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logcumsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_bag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gaussian_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool1d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_threshold_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_norm_inf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_ormqr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_real_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_reciprocal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_repeat_interleave_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_kaiser_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sinc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i0e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_symeig_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_trunc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unique_consecutive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_vdot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_view_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rdiv___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rsub___cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_any_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asinh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_3d_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_and_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_or_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_xor_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_shapes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cartesian_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chalf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_inverse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clone_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cos_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_trunc_rounding_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_einsum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftshift_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft2_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfftn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_heaviside_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hsplit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hypot_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isneginf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_istft_cuda_complex64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_unary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lcm_cuda_int64, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ldexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cross_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_singular_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvals_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_slogdet_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vector_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log1p_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_xor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logsumexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mH_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logaddexp_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_select_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_maximum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_list_of_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_binary_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nan_to_num_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmedian_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nextafter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_alpha_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_without_cudnn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_one_hot_cuda_int64, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_constant_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_unshuffle_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softsign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_outer_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rand_like_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_lengths_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_short_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_bartlett_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hann_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_kaiser_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signbit_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_airy_ai_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1e_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_log_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k1_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_v_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stack_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tile_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_complex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_copy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_half_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_int_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs__conversions_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_arange_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_asinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_clamp_min_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_clone_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_conj_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cos_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_cosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_expand_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_eye_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_fmod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isneginf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_log1p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_maximum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_celu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hinge_embedding_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softplus_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_randn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_roll_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sgn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_i1e_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_softmax_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tensor_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags__refs_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unbind_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags__softmax_backward_data_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_abs_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_allclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_arange_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_asinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_atleast_2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_left_shift_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_not_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_bmm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_broadcast_to_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_byte_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cdouble_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_char_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_deg2rad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagflat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_floor_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_empty_like_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_expand_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ifftshift_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_floor_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_histc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_inner_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isneginf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cond_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_det_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_inv_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_grad_oriented_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_hermitian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_or_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_logical_xor_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_lu_unpack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_fill_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_softmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_masked_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matmul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_meshgrid_variadic_tensors_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_min_reduction_with_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mode_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_movedim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_msort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_neg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_ones_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_new_zeros_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional__scaled_dot_product_attention_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose1d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_with_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool2d_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mish_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_soft_margin_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_circular_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_rrelu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_selu_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_with_dtype_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_softsign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_nonzero_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_norm_inf_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_native_batch_norm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_ormqr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_outer_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_2_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_randn_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reciprocal_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_renorm_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resize__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_resize_as__cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_roll_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_round_decimals_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_add_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_segment_reduce_offsets_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_select_scatter_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_hamming_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_signbit_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sinc_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sinh_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_w_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_he_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_legendre_polynomial_p_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_sum_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_take_along_dim_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tile_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_transpose_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_trapz_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tril_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_tril_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_triu_cuda_float32, 
test/test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64, test/test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unflatten_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_complex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64, test/test_ops.py::TestTagsCUDA::test_tags_view_copy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_xlogy_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32 2023-01-11T23:13:46.9946776Z 2023-01-11T23:13:46.9947071Z test_ops.py::TestCommonCUDA::test_compare_cpu_H_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9947542Z test_ops.py::TestCommonCUDA::test_compare_cpu___getitem___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9948009Z test_ops.py::TestCommonCUDA::test_compare_cpu___rand___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9948462Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmatmul___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9949082Z test_ops.py::TestCommonCUDA::test_compare_cpu___rmul___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9949529Z test_ops.py::TestCommonCUDA::test_compare_cpu___ror___cuda_int64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950064Z test_ops.py::TestCommonCUDA::test_compare_cpu___rpow___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950502Z test_ops.py::TestCommonCUDA::test_compare_cpu___rsub___cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9950965Z test_ops.py::TestCommonCUDA::test_compare_cpu__native_batch_norm_legit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9951439Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_bool_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9951923Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_cfloat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9952413Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9952890Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_float_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 
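The skip reasons in the pytest output below are informative in their own right. The vast majority of entries are SKIPPED as "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test", i.e. these tests are collected by this shard but run only when that environment variable is set. The remaining skips fall into two groups for which a CPU-vs-CUDA output comparison cannot be meaningful: ops whose output is non-deterministic (dropout, multinomial, empty/new_empty, normal) and ops whose sample inputs hit undefined behavior (overflow when downcasting signed types, bitwise shifts). Below is a minimal sketch of re-running one of the skipped slow node IDs locally; the node ID, the PYTORCH_TEST_WITH_SLOW gate, and the test/ working directory are taken from this log, while the CUDA-enabled local build is an assumption:

import os
import subprocess

# Re-enable tests gated as "test is slow". Assumes the current directory is
# the pytorch checkout's test/ directory (matching the node IDs in this log)
# and a CUDA-enabled build, so the *_cuda_* variants are not device-skipped.
env = dict(os.environ, PYTORCH_TEST_WITH_SLOW="1")
subprocess.run(
    ["python", "-m", "pytest", "-v",
     "test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32"],
    env=env,
    check=False,  # the test may still be skipped/xfailed for other reasons
)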
2023-01-11T23:13:46.9953357Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_int_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9953818Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs__conversions_long_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9954285Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9954812Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_as_strided_partial_views_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9955286Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9955739Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_atleast_3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9956198Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_bitwise_right_shift_cuda_int64 SKIPPED (Skipped some inputs produce undefined outputs) [ 0%] 2023-01-11T23:13:46.9956657Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_chunk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9957126Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_copysign_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9957579Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958037Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958510Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_diagonal_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9958989Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_div_trunc_rounding_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9959504Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_dsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9960155Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:13:46.9960603Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961062Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_expand_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961519Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fft_fftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9961963Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_flip_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9962417Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fliplr_cuda_float32 SKIPPED (test is slow; run with 
PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9962871Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_fmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9963323Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_hsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9963769Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_index_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9964225Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_lerp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9964687Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_matrix_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9965161Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9965665Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_svdvals_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9966143Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linalg_vector_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9966605Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_linspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9967080Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_meshgrid_list_of_tensors_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9967556Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_new_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9968016Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nextafter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9968483Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 0%] 2023-01-11T23:13:46.9968984Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_hardtanh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9969491Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_huber_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9969979Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9970516Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_margin_ranking_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971005Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_nll_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971485Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_relu6_cuda_float32 SKIPPED (test is slow; 
run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9971979Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9972464Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_softshrink_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9972966Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9973443Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9973901Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_reshape_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9974349Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_rot90_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9974921Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_squeeze_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9975384Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_std_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9975887Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_sum_to_size_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9976337Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_t_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9976788Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_trace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9977238Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_triu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9977692Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_true_divide_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9978152Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9978606Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979061Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_view_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979514Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9979969Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_vstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9980408Z test_ops.py::TestCommonCUDA::test_compare_cpu__refs_xlogy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9980913Z 
test_ops.py::TestCommonCUDA::test_compare_cpu__softmax_backward_data_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9981389Z test_ops.py::TestCommonCUDA::test_compare_cpu_addmm_decomposed_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9981847Z test_ops.py::TestCommonCUDA::test_compare_cpu_argsort_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9982305Z test_ops.py::TestCommonCUDA::test_compare_cpu_as_strided_partial_views_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9982769Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9983224Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9983675Z test_ops.py::TestCommonCUDA::test_compare_cpu_atleast_3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9984110Z test_ops.py::TestCommonCUDA::test_compare_cpu_bitwise_right_shift_cuda_int64 SKIPPED (Some inputs produce undefined outputs) [ 0%] 2023-01-11T23:13:46.9984559Z test_ops.py::TestCommonCUDA::test_compare_cpu_block_diag_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985008Z test_ops.py::TestCommonCUDA::test_compare_cpu_bool_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985459Z test_ops.py::TestCommonCUDA::test_compare_cpu_cartesian_prod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9985918Z test_ops.py::TestCommonCUDA::test_compare_cpu_cfloat_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9986380Z test_ops.py::TestCommonCUDA::test_compare_cpu_chalf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9986836Z test_ops.py::TestCommonCUDA::test_compare_cpu_cholesky_inverse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9987299Z test_ops.py::TestCommonCUDA::test_compare_cpu_column_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9987753Z test_ops.py::TestCommonCUDA::test_compare_cpu_complex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9988198Z test_ops.py::TestCommonCUDA::test_compare_cpu_contiguous_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9988652Z test_ops.py::TestCommonCUDA::test_compare_cpu_copysign_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9989097Z test_ops.py::TestCommonCUDA::test_compare_cpu_corrcoef_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9989544Z test_ops.py::TestCommonCUDA::test_compare_cpu_cross_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990080Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_diagonal_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990529Z test_ops.py::TestCommonCUDA::test_compare_cpu_dist_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9990970Z test_ops.py::TestCommonCUDA::test_compare_cpu_dot_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9991455Z test_ops.py::TestCommonCUDA::test_compare_cpu_double_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9991896Z test_ops.py::TestCommonCUDA::test_compare_cpu_einsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9992422Z test_ops.py::TestCommonCUDA::test_compare_cpu_empty_like_cuda_float32 SKIPPED (output is non-deterministic) [ 0%] 2023-01-11T23:13:46.9992857Z test_ops.py::TestCommonCUDA::test_compare_cpu_eye_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9993314Z test_ops.py::TestCommonCUDA::test_compare_cpu_fft_ifftshift_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9993762Z test_ops.py::TestCommonCUDA::test_compare_cpu_flip_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9994215Z test_ops.py::TestCommonCUDA::test_compare_cpu_flipud_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9994664Z test_ops.py::TestCommonCUDA::test_compare_cpu_float_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9995128Z test_ops.py::TestCommonCUDA::test_compare_cpu_grid_sampler_2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9995579Z test_ops.py::TestCommonCUDA::test_compare_cpu_histc_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996026Z test_ops.py::TestCommonCUDA::test_compare_cpu_hsplit_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996471Z test_ops.py::TestCommonCUDA::test_compare_cpu_hstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9996931Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9997465Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9997921Z test_ops.py::TestCommonCUDA::test_compare_cpu_index_reduce_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9998375Z test_ops.py::TestCommonCUDA::test_compare_cpu_int_cuda_float32 SKIPPED (Overflow when downcasting signed type is undefined) [ 0%] 2023-01-11T23:13:46.9998825Z test_ops.py::TestCommonCUDA::test_compare_cpu_isin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:46.9999328Z test_ops.py::TestCommonCUDA::test_compare_cpu_istft_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable 
test) [ 0%] 2023-01-11T23:13:46.9999778Z test_ops.py::TestCommonCUDA::test_compare_cpu_ldexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0000232Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cond_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0000705Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_det_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0001175Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eig_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0001629Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_eigh_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0002107Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_householder_product_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0002609Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0003072Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_inv_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0003529Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004005Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_factor_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004479Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_ldl_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0004945Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lstsq_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0005402Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0005856Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_lu_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0006325Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_matrix_rank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0006790Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0007247Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_pinv_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0007706Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_slogdet_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0008195Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0008664Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_vector_norm_cuda_float32 SKIPPED (test is 
slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0009131Z test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0009591Z test_ops.py::TestCommonCUDA::test_compare_cpu_log_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010058Z test_ops.py::TestCommonCUDA::test_compare_cpu_logaddexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010526Z test_ops.py::TestCommonCUDA::test_compare_cpu_logcumsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0010987Z test_ops.py::TestCommonCUDA::test_compare_cpu_logdet_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0011433Z test_ops.py::TestCommonCUDA::test_compare_cpu_logspace_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0011895Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_cumsum_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0012357Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_fill_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0012846Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_log_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0013321Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_logsumexp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0013778Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0014239Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_normalize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0014804Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0015264Z test_ops.py::TestCommonCUDA::test_compare_cpu_masked_softmax_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0015713Z test_ops.py::TestCommonCUDA::test_compare_cpu_matrix_exp_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0016167Z test_ops.py::TestCommonCUDA::test_compare_cpu_median_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0016624Z test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_no_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0017100Z test_ops.py::TestCommonCUDA::test_compare_cpu_min_reduction_with_dim_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0017548Z test_ops.py::TestCommonCUDA::test_compare_cpu_mode_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 0%] 2023-01-11T23:13:47.0018063Z 
test_ops.py::TestCommonCUDA::test_compare_cpu_multinomial_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0018503Z test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_copy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0018994Z test_ops.py::TestCommonCUDA::test_compare_cpu_narrow_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0019444Z test_ops.py::TestCommonCUDA::test_compare_cpu_native_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0019957Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_empty_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0020386Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_full_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0020836Z test_ops.py::TestCommonCUDA::test_compare_cpu_new_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0021418Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (output is non-deterministic (when dropout_p > 0)) [ 1%]
2023-01-11T23:13:47.0021923Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_adaptive_avg_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0022413Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0022883Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_avg_pool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0023353Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0023876Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0024390Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0024884Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0025360Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_conv2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0025841Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_cosine_similarity_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0026331Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_ctc_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0026878Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout3d_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0027399Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_dropout_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0027864Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_bag_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0028303Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_embedding_cuda_float32 SKIPPED (Skipped!) [ 1%]
2023-01-11T23:13:47.0028739Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_glu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0029218Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_hardswish_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0029817Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0030318Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0030827Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_interpolate_trilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0031321Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_linear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0031806Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_local_response_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0032293Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_pool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0032773Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0033267Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0033760Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool2d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0034244Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0034764Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0035257Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0035749Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0036252Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0036736Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_normalize_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0037212Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_constant_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0037700Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pad_replicate_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0038184Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_pixel_shuffle_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0038666Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_soft_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0039186Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0039675Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0040205Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0040715Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_triplet_margin_with_distance_loss_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0041214Z test_ops.py::TestCommonCUDA::test_compare_cpu_nn_functional_upsample_bilinear_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0041692Z test_ops.py::TestCommonCUDA::test_compare_cpu_nonzero_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0042138Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0042587Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_inf_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0043034Z test_ops.py::TestCommonCUDA::test_compare_cpu_norm_nuc_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0043538Z test_ops.py::TestCommonCUDA::test_compare_cpu_normal_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0044038Z test_ops.py::TestCommonCUDA::test_compare_cpu_normal_number_mean_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0044476Z test_ops.py::TestCommonCUDA::test_compare_cpu_ones_like_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0044951Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_native_batch_norm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0045452Z test_ops.py::TestCommonCUDA::test_compare_cpu_ops_nvprims_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0045916Z test_ops.py::TestCommonCUDA::test_compare_cpu_ormqr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0046430Z test_ops.py::TestCommonCUDA::test_compare_cpu_pca_lowrank_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0046869Z test_ops.py::TestCommonCUDA::test_compare_cpu_pinverse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0047318Z test_ops.py::TestCommonCUDA::test_compare_cpu_polar_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0047768Z test_ops.py::TestCommonCUDA::test_compare_cpu_qr_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0048226Z test_ops.py::TestCommonCUDA::test_compare_cpu_quantile_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0048743Z test_ops.py::TestCommonCUDA::test_compare_cpu_rand_like_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0049170Z test_ops.py::TestCommonCUDA::test_compare_cpu_randint_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0049622Z test_ops.py::TestCommonCUDA::test_compare_cpu_renorm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0050085Z test_ops.py::TestCommonCUDA::test_compare_cpu_repeat_interleave_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0050557Z test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051010Z test_ops.py::TestCommonCUDA::test_compare_cpu_reshape_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051485Z test_ops.py::TestCommonCUDA::test_compare_cpu_resize__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0051935Z test_ops.py::TestCommonCUDA::test_compare_cpu_resize_as__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0052399Z test_ops.py::TestCommonCUDA::test_compare_cpu_resolve_neg_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0052842Z test_ops.py::TestCommonCUDA::test_compare_cpu_rsub_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0053301Z test_ops.py::TestCommonCUDA::test_compare_cpu_scalar_tensor_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0053776Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_add_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0054235Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0054800Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0055278Z test_ops.py::TestCommonCUDA::test_compare_cpu_scatter_reduce_prod_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0055760Z test_ops.py::TestCommonCUDA::test_compare_cpu_segment_reduce_offsets_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0056274Z test_ops.py::TestCommonCUDA::test_compare_cpu_select_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0056726Z test_ops.py::TestCommonCUDA::test_compare_cpu_slice_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0057182Z test_ops.py::TestCommonCUDA::test_compare_cpu_softmax_with_dtype_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0057657Z test_ops.py::TestCommonCUDA::test_compare_cpu_sparse_sampled_addmm_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0058265Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0058766Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0059302Z test_ops.py::TestCommonCUDA::test_compare_cpu_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (testing takes an unreasonably long time, #79528) [ 1%]
2023-01-11T23:13:47.0059774Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0060236Z test_ops.py::TestCommonCUDA::test_compare_cpu_split_with_sizes_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0060694Z test_ops.py::TestCommonCUDA::test_compare_cpu_stack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0061134Z test_ops.py::TestCommonCUDA::test_compare_cpu_stft_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0061653Z test_ops.py::TestCommonCUDA::test_compare_cpu_svd_lowrank_cuda_float32 SKIPPED (output is non-deterministic) [ 1%]
2023-01-11T23:13:47.0062083Z test_ops.py::TestCommonCUDA::test_compare_cpu_t_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0062559Z test_ops.py::TestCommonCUDA::test_compare_cpu_take_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063003Z test_ops.py::TestCommonCUDA::test_compare_cpu_to_sparse_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063452Z test_ops.py::TestCommonCUDA::test_compare_cpu_topk_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0063910Z test_ops.py::TestCommonCUDA::test_compare_cpu_triangular_solve_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0064364Z test_ops.py::TestCommonCUDA::test_compare_cpu_triu_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0064822Z test_ops.py::TestCommonCUDA::test_compare_cpu_unflatten_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0065272Z test_ops.py::TestCommonCUDA::test_compare_cpu_unfold_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0065713Z test_ops.py::TestCommonCUDA::test_compare_cpu_unique_cuda_float32 SKIPPED (Output order is undefined when sorted=False) [ 1%]
2023-01-11T23:13:47.0066153Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0066615Z test_ops.py::TestCommonCUDA::test_compare_cpu_var_mean_unbiased_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067092Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_as_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067542Z test_ops.py::TestCommonCUDA::test_compare_cpu_view_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0067990Z test_ops.py::TestCommonCUDA::test_compare_cpu_vstack_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0068438Z test_ops.py::TestCommonCUDA::test_compare_cpu_xlogy_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0068872Z test_ops.py::TestCommonCUDA::test_compare_cpu_zero__cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0069305Z test_ops.py::TestCommonCUDA::test_compare_cpu_zeros_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 1%]
2023-01-11T23:13:47.0069777Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_H_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0070169Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_acos_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0070560Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_as_strided_partial_views_cuda_complex32 XFAIL [ 1%]
2023-01-11T23:13:47.0070955Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_asin_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0071343Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_atleast_3d_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0071723Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_bool_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072090Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cat_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072466Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cfloat_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0072848Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chalf_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0108747Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_chunk_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0109303Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_clone_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0109786Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_column_stack_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110204Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_conj_physical_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110595Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_cos_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0110968Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0111358Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diag_embed_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0111756Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_copy_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112138Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_diagonal_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112544Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_div_no_rounding_mode_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0112943Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_dsplit_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0113322Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_eq_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0113694Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114072Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114460Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_fftshift_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0114901Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_hfftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0115287Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft2_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0115673Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_ifft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116059Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfft_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116439Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_fft_irfftn_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0116832Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_hstack_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117220Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_index_put_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117612Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_isfinite_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0117989Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_lerp_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0118369Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0118768Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_log_softmax_with_dtype_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119166Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mT_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119547Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_masked_fill_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0119935Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_movedim_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0120314Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_mul_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0120700Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_copy_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0121078Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_narrow_cuda_complex32 PASSED [ 1%]
2023-01-11T23:13:47.0121543Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_cuda_complex32 SKIPPED (Expected: new_empty is not comparable) [ 1%]
2023-01-11T23:13:47.0122038Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_empty_strided_cuda_complex32 SKIPPED (Expected: new_empty_strided is not comparable) [ 1%]
2023-01-11T23:13:47.0122483Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_full_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0122865Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_new_ones_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0123275Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose2d_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0123717Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nn_functional_conv_transpose3d_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124119Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_nonzero_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124509Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_ones_like_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0124893Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_permute_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0125280Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_positive_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0125654Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_pow_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0126036Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_prod_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0126480Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rand_like_cuda_complex32 SKIPPED (Expected: randn_like is not comparable between dtypes) [ 2%]
2023-01-11T23:13:47.0127016Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_randn_like_cuda_complex32 SKIPPED (Expected: randn_like is not comparable between dtypes) [ 2%]
2023-01-11T23:13:47.0127447Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_real_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0127838Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_as_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0128225Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_reshape_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0128616Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_resolve_neg_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129002Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_rsqrt_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129440Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_scalar_tensor_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0129837Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sigmoid_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130218Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sinh_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130600Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_slice_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0130995Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_split_with_sizes_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0131388Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_squeeze_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0131759Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_sub_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132136Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_tan_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132518Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_trace_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0132905Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_transpose_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0133318Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_true_divide_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0133709Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unbind_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134092Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unfold_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134601Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_unsqueeze_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0134982Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_as_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0135364Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_view_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0135748Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_vsplit_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136125Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_where_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136512Z test_ops.py::TestCommonCUDA::test_complex_half_reference_testing_zeros_cuda_complex32 PASSED [ 2%]
2023-01-11T23:13:47.0136863Z test_ops.py::TestCommonCUDA::test_dtypes_H_cuda PASSED [ 2%]
2023-01-11T23:13:47.0137189Z test_ops.py::TestCommonCUDA::test_dtypes_T_cuda PASSED [ 2%]
2023-01-11T23:13:47.0137505Z test_ops.py::TestCommonCUDA::test_dtypes___rdiv___cuda PASSED [ 2%]
2023-01-11T23:13:47.0137827Z test_ops.py::TestCommonCUDA::test_dtypes___rmatmul___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138157Z test_ops.py::TestCommonCUDA::test_dtypes___rpow___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138472Z test_ops.py::TestCommonCUDA::test_dtypes___rxor___cuda PASSED [ 2%]
2023-01-11T23:13:47.0138875Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bfloat16_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139224Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_bool_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139567Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_byte_cuda PASSED [ 2%]
2023-01-11T23:13:47.0139910Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cdouble_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140261Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_cfloat_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140604Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_char_cuda PASSED [ 2%]
2023-01-11T23:13:47.0140942Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_complex_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141291Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_int_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141633Z test_ops.py::TestCommonCUDA::test_dtypes__refs__conversions_long_cuda PASSED [ 2%]
2023-01-11T23:13:47.0141969Z test_ops.py::TestCommonCUDA::test_dtypes__refs_abs_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142286Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addcdiv_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142616Z test_ops.py::TestCommonCUDA::test_dtypes__refs_addcmul_cuda PASSED [ 2%]
2023-01-11T23:13:47.0142939Z test_ops.py::TestCommonCUDA::test_dtypes__refs_any_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143256Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143606Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_partial_views_cuda PASSED [ 2%]
2023-01-11T23:13:47.0143967Z test_ops.py::TestCommonCUDA::test_dtypes__refs_as_strided_scatter_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144304Z test_ops.py::TestCommonCUDA::test_dtypes__refs_asin_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144618Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atan_cuda PASSED [ 2%]
2023-01-11T23:13:47.0144945Z test_ops.py::TestCommonCUDA::test_dtypes__refs_atleast_1d_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145270Z test_ops.py::TestCommonCUDA::test_dtypes__refs_bitwise_xor_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145625Z test_ops.py::TestCommonCUDA::test_dtypes__refs_broadcast_to_cuda PASSED [ 2%]
2023-01-11T23:13:47.0145953Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ceil_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146270Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clamp_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146593Z test_ops.py::TestCommonCUDA::test_dtypes__refs_clone_cuda PASSED [ 2%]
2023-01-11T23:13:47.0146912Z test_ops.py::TestCommonCUDA::test_dtypes__refs_column_stack_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147239Z test_ops.py::TestCommonCUDA::test_dtypes__refs_contiguous_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147564Z test_ops.py::TestCommonCUDA::test_dtypes__refs_copysign_cuda PASSED [ 2%]
2023-01-11T23:13:47.0147876Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cosh_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148202Z test_ops.py::TestCommonCUDA::test_dtypes__refs_cumsum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148516Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diag_cuda PASSED [ 2%]
2023-01-11T23:13:47.0148841Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_copy_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149158Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149491Z test_ops.py::TestCommonCUDA::test_dtypes__refs_diagonal_scatter_cuda PASSED [ 2%]
2023-01-11T23:13:47.0149908Z test_ops.py::TestCommonCUDA::test_dtypes__refs_digamma_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150238Z test_ops.py::TestCommonCUDA::test_dtypes__refs_div_no_rounding_mode_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150573Z test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_cuda PASSED [ 2%]
2023-01-11T23:13:47.0150896Z test_ops.py::TestCommonCUDA::test_dtypes__refs_empty_like_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151259Z test_ops.py::TestCommonCUDA::test_dtypes__refs_eq_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151571Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erf_cuda PASSED [ 2%]
2023-01-11T23:13:47.0151894Z test_ops.py::TestCommonCUDA::test_dtypes__refs_erfinv_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152220Z test_ops.py::TestCommonCUDA::test_dtypes__refs_exp2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152534Z test_ops.py::TestCommonCUDA::test_dtypes__refs_exp_cuda PASSED [ 2%]
2023-01-11T23:13:47.0152859Z test_ops.py::TestCommonCUDA::test_dtypes__refs_expand_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153180Z test_ops.py::TestCommonCUDA::test_dtypes__refs_eye_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153504Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_fftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0153823Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_hfft2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154157Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ifftshift_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154484Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfft2_cuda PASSED [ 2%]
2023-01-11T23:13:47.0154805Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_ihfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155129Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_irfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155452Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfft_cuda PASSED [ 2%]
2023-01-11T23:13:47.0155776Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fft_rfftn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156095Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flatten_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156417Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flip_cuda PASSED [ 2%]
2023-01-11T23:13:47.0156741Z test_ops.py::TestCommonCUDA::test_dtypes__refs_flipud_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157059Z test_ops.py::TestCommonCUDA::test_dtypes__refs_float_power_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157394Z test_ops.py::TestCommonCUDA::test_dtypes__refs_floor_divide_cuda PASSED [ 2%]
2023-01-11T23:13:47.0157719Z test_ops.py::TestCommonCUDA::test_dtypes__refs_fmax_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158065Z test_ops.py::TestCommonCUDA::test_dtypes__refs_ge_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158383Z test_ops.py::TestCommonCUDA::test_dtypes__refs_heaviside_cuda PASSED [ 2%]
2023-01-11T23:13:47.0158708Z test_ops.py::TestCommonCUDA::test_dtypes__refs_igammac_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159030Z test_ops.py::TestCommonCUDA::test_dtypes__refs_imag_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159344Z test_ops.py::TestCommonCUDA::test_dtypes__refs_index_add_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isclose_cuda PASSED [ 2%]
2023-01-11T23:13:47.0159997Z test_ops.py::TestCommonCUDA::test_dtypes__refs_isposinf_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160322Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lcm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160635Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0160975Z test_ops.py::TestCommonCUDA::test_dtypes__refs_linalg_svdvals_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161312Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log10_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161629Z test_ops.py::TestCommonCUDA::test_dtypes__refs_log1p_cuda PASSED [ 2%]
2023-01-11T23:13:47.0161953Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logical_and_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162281Z test_ops.py::TestCommonCUDA::test_dtypes__refs_logspace_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162602Z test_ops.py::TestCommonCUDA::test_dtypes__refs_lt_cuda PASSED [ 2%]
2023-01-11T23:13:47.0162917Z test_ops.py::TestCommonCUDA::test_dtypes__refs_maximum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163266Z test_ops.py::TestCommonCUDA::test_dtypes__refs_mean_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163610Z test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_list_of_tensors_cuda PASSED [ 2%]
2023-01-11T23:13:47.0163970Z test_ops.py::TestCommonCUDA::test_dtypes__refs_meshgrid_variadic_tensors_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164317Z test_ops.py::TestCommonCUDA::test_dtypes__refs_minimum_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164645Z test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_copy_cuda PASSED [ 2%]
2023-01-11T23:13:47.0164973Z test_ops.py::TestCommonCUDA::test_dtypes__refs_narrow_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165299Z test_ops.py::TestCommonCUDA::test_dtypes__refs_native_layer_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165636Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_ones_cuda PASSED [ 2%]
2023-01-11T23:13:47.0165961Z test_ops.py::TestCommonCUDA::test_dtypes__refs_new_zeros_cuda PASSED [ 2%]
2023-01-11T23:13:47.0166303Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_alpha_dropout_cuda PASSED [ 2%]
2023-01-11T23:13:47.0166670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_celu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167023Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_gelu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167377Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_glu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0167727Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hardshrink_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168102Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_hinge_embedding_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168479Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_l1_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0168846Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_log_softmax_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169232Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169602Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mish_cuda PASSED [ 2%]
2023-01-11T23:13:47.0169956Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_mse_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0170333Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_pdist_cuda PASSED [ 2%]
2023-01-11T23:13:47.0170702Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171066Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_prelu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171417Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_relu6_cuda PASSED [ 2%]
2023-01-11T23:13:47.0171756Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_selu_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172124Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmax_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172499Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_softmin_with_dtype_cuda PASSED [ 2%]
2023-01-11T23:13:47.0172869Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_tanhshrink_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173237Z test_ops.py::TestCommonCUDA::test_dtypes__refs_nn_functional_threshold_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173581Z test_ops.py::TestCommonCUDA::test_dtypes__refs_norm_cuda PASSED [ 2%]
2023-01-11T23:13:47.0173915Z test_ops.py::TestCommonCUDA::test_dtypes__refs_permute_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174242Z test_ops.py::TestCommonCUDA::test_dtypes__refs_positive_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174670Z test_ops.py::TestCommonCUDA::test_dtypes__refs_pow_cuda PASSED [ 2%]
2023-01-11T23:13:47.0174995Z test_ops.py::TestCommonCUDA::test_dtypes__refs_prod_cuda PASSED [ 2%]
2023-01-11T23:13:47.0175314Z test_ops.py::TestCommonCUDA::test_dtypes__refs_randn_cuda PASSED [ 2%]
2023-01-11T23:13:47.0175634Z test_ops.py::TestCommonCUDA::test_dtypes__refs_real_cuda PASSED [ 2%]
2023-01-11T23:13:47.0176015Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reciprocal_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176350Z test_ops.py::TestCommonCUDA::test_dtypes__refs_reshape_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176674Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rot90_cuda PASSED [ 3%]
2023-01-11T23:13:47.0176996Z test_ops.py::TestCommonCUDA::test_dtypes__refs_round_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177318Z test_ops.py::TestCommonCUDA::test_dtypes__refs_rsub_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177633Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sgn_cuda PASSED [ 3%]
2023-01-11T23:13:47.0177954Z test_ops.py::TestCommonCUDA::test_dtypes__refs_sinc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178280Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178612Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_i1e_cuda PASSED [ 3%]
2023-01-11T23:13:47.0178950Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_log_ndtr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0179317Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_multigammaln_mvlgamma_p_5_cuda PASSED [ 3%]
2023-01-11T23:13:47.0179677Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_ndtr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180020Z test_ops.py::TestCommonCUDA::test_dtypes__refs_special_softmax_with_dtype_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180368Z test_ops.py::TestCommonCUDA::test_dtypes__refs_square_cuda PASSED [ 3%]
2023-01-11T23:13:47.0180692Z test_ops.py::TestCommonCUDA::test_dtypes__refs_std_mean_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181017Z test_ops.py::TestCommonCUDA::test_dtypes__refs_t_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181337Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tensor_split_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181662Z test_ops.py::TestCommonCUDA::test_dtypes__refs_trace_cuda PASSED [ 3%]
2023-01-11T23:13:47.0181987Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182303Z test_ops.py::TestCommonCUDA::test_dtypes__refs_tril_indices_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182663Z test_ops.py::TestCommonCUDA::test_dtypes__refs_triu_cuda PASSED [ 3%]
2023-01-11T23:13:47.0182984Z test_ops.py::TestCommonCUDA::test_dtypes__refs_true_divide_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183305Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unbind_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183618Z test_ops.py::TestCommonCUDA::test_dtypes__refs_unsqueeze_cuda PASSED [ 3%]
2023-01-11T23:13:47.0183938Z test_ops.py::TestCommonCUDA::test_dtypes__refs_var_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184253Z test_ops.py::TestCommonCUDA::test_dtypes__refs_var_mean_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184566Z test_ops.py::TestCommonCUDA::test_dtypes__refs_view_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0184883Z test_ops.py::TestCommonCUDA::test_dtypes__refs_vsplit_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185205Z test_ops.py::TestCommonCUDA::test_dtypes__refs_zeros_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185535Z test_ops.py::TestCommonCUDA::test_dtypes__softmax_backward_data_cuda PASSED [ 3%]
2023-01-11T23:13:47.0185856Z test_ops.py::TestCommonCUDA::test_dtypes_abs_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186168Z test_ops.py::TestCommonCUDA::test_dtypes_acos_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186480Z test_ops.py::TestCommonCUDA::test_dtypes_addbmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0186786Z test_ops.py::TestCommonCUDA::test_dtypes_addcdiv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187100Z test_ops.py::TestCommonCUDA::test_dtypes_addmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187421Z test_ops.py::TestCommonCUDA::test_dtypes_addmm_decomposed_cuda PASSED [ 3%]
2023-01-11T23:13:47.0187742Z test_ops.py::TestCommonCUDA::test_dtypes_addr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188076Z test_ops.py::TestCommonCUDA::test_dtypes_allclose_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188387Z test_ops.py::TestCommonCUDA::test_dtypes_amin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0188700Z test_ops.py::TestCommonCUDA::test_dtypes_angle_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189035Z test_ops.py::TestCommonCUDA::test_dtypes_argmax_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189373Z test_ops.py::TestCommonCUDA::test_dtypes_argwhere_cuda PASSED [ 3%]
2023-01-11T23:13:47.0189757Z test_ops.py::TestCommonCUDA::test_dtypes_asin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190077Z test_ops.py::TestCommonCUDA::test_dtypes_atan2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190383Z test_ops.py::TestCommonCUDA::test_dtypes_atanh_cuda PASSED [ 3%]
2023-01-11T23:13:47.0190702Z test_ops.py::TestCommonCUDA::test_dtypes_atleast_3d_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191023Z test_ops.py::TestCommonCUDA::test_dtypes_baddbmm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191332Z test_ops.py::TestCommonCUDA::test_dtypes_bincount_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191651Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_and_cuda PASSED [ 3%]
2023-01-11T23:13:47.0191973Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_not_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192294Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_or_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192604Z test_ops.py::TestCommonCUDA::test_dtypes_bitwise_xor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0192923Z test_ops.py::TestCommonCUDA::test_dtypes_block_diag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193247Z test_ops.py::TestCommonCUDA::test_dtypes_broadcast_tensors_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193563Z test_ops.py::TestCommonCUDA::test_dtypes_cdouble_cuda PASSED [ 3%]
2023-01-11T23:13:47.0193876Z test_ops.py::TestCommonCUDA::test_dtypes_cfloat_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194194Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194517Z test_ops.py::TestCommonCUDA::test_dtypes_cholesky_solve_cuda PASSED [ 3%]
2023-01-11T23:13:47.0194860Z test_ops.py::TestCommonCUDA::test_dtypes_chunk_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195179Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195496Z test_ops.py::TestCommonCUDA::test_dtypes_clamp_max_cuda PASSED [ 3%]
2023-01-11T23:13:47.0195807Z test_ops.py::TestCommonCUDA::test_dtypes_column_stack_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196123Z test_ops.py::TestCommonCUDA::test_dtypes_complex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196439Z test_ops.py::TestCommonCUDA::test_dtypes_conj_cuda PASSED [ 3%]
2023-01-11T23:13:47.0196750Z test_ops.py::TestCommonCUDA::test_dtypes_constant_pad_nd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197077Z test_ops.py::TestCommonCUDA::test_dtypes_contiguous_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197397Z test_ops.py::TestCommonCUDA::test_dtypes_corrcoef_cuda PASSED [ 3%]
2023-01-11T23:13:47.0197719Z test_ops.py::TestCommonCUDA::test_dtypes_cos_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198022Z test_ops.py::TestCommonCUDA::test_dtypes_cummin_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198335Z test_ops.py::TestCommonCUDA::test_dtypes_cumsum_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198650Z test_ops.py::TestCommonCUDA::test_dtypes_deg2rad_cuda PASSED [ 3%]
2023-01-11T23:13:47.0198957Z test_ops.py::TestCommonCUDA::test_dtypes_diag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199273Z test_ops.py::TestCommonCUDA::test_dtypes_diagflat_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199593Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_copy_cuda PASSED [ 3%]
2023-01-11T23:13:47.0199996Z test_ops.py::TestCommonCUDA::test_dtypes_diagonal_scatter_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200307Z test_ops.py::TestCommonCUDA::test_dtypes_diff_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200618Z test_ops.py::TestCommonCUDA::test_dtypes_dist_cuda PASSED [ 3%]
2023-01-11T23:13:47.0200944Z test_ops.py::TestCommonCUDA::test_dtypes_div_no_rounding_mode_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201272Z test_ops.py::TestCommonCUDA::test_dtypes_div_trunc_rounding_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201588Z test_ops.py::TestCommonCUDA::test_dtypes_double_cuda PASSED [ 3%]
2023-01-11T23:13:47.0201899Z test_ops.py::TestCommonCUDA::test_dtypes_dstack_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202209Z test_ops.py::TestCommonCUDA::test_dtypes_einsum_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202517Z test_ops.py::TestCommonCUDA::test_dtypes_empty_like_cuda PASSED [ 3%]
2023-01-11T23:13:47.0202833Z test_ops.py::TestCommonCUDA::test_dtypes_eq_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203139Z test_ops.py::TestCommonCUDA::test_dtypes_erfc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203441Z test_ops.py::TestCommonCUDA::test_dtypes_erfinv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0203751Z test_ops.py::TestCommonCUDA::test_dtypes_exp2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204054Z test_ops.py::TestCommonCUDA::test_dtypes_exp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204361Z test_ops.py::TestCommonCUDA::test_dtypes_expand_as_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204668Z test_ops.py::TestCommonCUDA::test_dtypes_expand_cuda PASSED [ 3%]
2023-01-11T23:13:47.0204978Z test_ops.py::TestCommonCUDA::test_dtypes_expm1_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205289Z test_ops.py::TestCommonCUDA::test_dtypes_eye_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205592Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0205904Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206213Z test_ops.py::TestCommonCUDA::test_dtypes_fft_fftshift_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206559Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0206864Z test_ops.py::TestCommonCUDA::test_dtypes_fft_hfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207174Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207486Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ifft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0207789Z test_ops.py::TestCommonCUDA::test_dtypes_fft_ihfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208104Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208411Z test_ops.py::TestCommonCUDA::test_dtypes_fft_irfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0208720Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft2_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209029Z test_ops.py::TestCommonCUDA::test_dtypes_fft_rfft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209378Z test_ops.py::TestCommonCUDA::test_dtypes_fill_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209686Z test_ops.py::TestCommonCUDA::test_dtypes_flip_cuda PASSED [ 3%]
2023-01-11T23:13:47.0209989Z test_ops.py::TestCommonCUDA::test_dtypes_float_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210296Z test_ops.py::TestCommonCUDA::test_dtypes_floor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210602Z test_ops.py::TestCommonCUDA::test_dtypes_fmod_cuda PASSED [ 3%]
2023-01-11T23:13:47.0210912Z test_ops.py::TestCommonCUDA::test_dtypes_frexp_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211215Z test_ops.py::TestCommonCUDA::test_dtypes_gather_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211519Z test_ops.py::TestCommonCUDA::test_dtypes_gcd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0211850Z test_ops.py::TestCommonCUDA::test_dtypes_ge_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212151Z test_ops.py::TestCommonCUDA::test_dtypes_half_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212462Z test_ops.py::TestCommonCUDA::test_dtypes_heaviside_cuda PASSED [ 3%]
2023-01-11T23:13:47.0212770Z test_ops.py::TestCommonCUDA::test_dtypes_histc_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213083Z test_ops.py::TestCommonCUDA::test_dtypes_histogramdd_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213392Z test_ops.py::TestCommonCUDA::test_dtypes_hsplit_cuda PASSED [ 3%]
2023-01-11T23:13:47.0213699Z test_ops.py::TestCommonCUDA::test_dtypes_igamma_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214009Z test_ops.py::TestCommonCUDA::test_dtypes_igammac_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214311Z test_ops.py::TestCommonCUDA::test_dtypes_imag_cuda PASSED [ 3%]
2023-01-11T23:13:47.0214746Z test_ops.py::TestCommonCUDA::test_dtypes_index_copy_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215057Z test_ops.py::TestCommonCUDA::test_dtypes_index_fill_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215365Z test_ops.py::TestCommonCUDA::test_dtypes_index_put_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215674Z test_ops.py::TestCommonCUDA::test_dtypes_index_select_cuda PASSED [ 3%]
2023-01-11T23:13:47.0215980Z test_ops.py::TestCommonCUDA::test_dtypes_inner_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216285Z test_ops.py::TestCommonCUDA::test_dtypes_isclose_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216585Z test_ops.py::TestCommonCUDA::test_dtypes_isposinf_cuda PASSED [ 3%]
2023-01-11T23:13:47.0216892Z test_ops.py::TestCommonCUDA::test_dtypes_istft_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217219Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_2inputs_2outputs_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217574Z test_ops.py::TestCommonCUDA::test_dtypes_jiterator_4inputs_with_extra_args_cuda PASSED [ 3%]
2023-01-11T23:13:47.0217900Z test_ops.py::TestCommonCUDA::test_dtypes_le_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218202Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cond_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218563Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_det_singular_cuda PASSED [ 3%]
2023-01-11T23:13:47.0218884Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigh_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219220Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_eigvals_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219576Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_householder_product_cuda PASSED [ 3%]
2023-01-11T23:13:47.0219906Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220211Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_inv_ex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220524Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_cuda PASSED [ 3%]
2023-01-11T23:13:47.0220853Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_factor_ex_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221172Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_ldl_solve_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221488Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_lstsq_cuda PASSED [ 3%]
2023-01-11T23:13:47.0221801Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_multi_dot_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222111Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222438Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_norm_subgradients_at_zero_cuda PASSED [ 3%]
2023-01-11T23:13:47.0222771Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_cuda PASSED [ 3%]
2023-01-11T23:13:47.0223095Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_hermitian_cuda PASSED [ 3%]
2023-01-11T23:13:47.0223492Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_pinv_singular_cuda SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 3%]
2023-01-11T23:13:47.0223932Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_qr_cuda PASSED [ 3%]
2023-01-11T23:13:47.0224245Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_slogdet_cuda PASSED [ 3%]
2023-01-11T23:13:47.0224568Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_solve_ex_cuda PASSED [ 4%]
2023-01-11T23:13:47.0224882Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_tensorsolve_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225200Z test_ops.py::TestCommonCUDA::test_dtypes_linspace_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225507Z test_ops.py::TestCommonCUDA::test_dtypes_log10_cuda PASSED [ 4%]
2023-01-11T23:13:47.0225803Z test_ops.py::TestCommonCUDA::test_dtypes_log1p_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226104Z test_ops.py::TestCommonCUDA::test_dtypes_log2_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226403Z test_ops.py::TestCommonCUDA::test_dtypes_log_cuda PASSED [ 4%]
2023-01-11T23:13:47.0226710Z test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227030Z test_ops.py::TestCommonCUDA::test_dtypes_log_softmax_with_dtype_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227361Z test_ops.py::TestCommonCUDA::test_dtypes_logical_or_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227663Z test_ops.py::TestCommonCUDA::test_dtypes_logit_cuda PASSED [ 4%]
2023-01-11T23:13:47.0227963Z test_ops.py::TestCommonCUDA::test_dtypes_logspace_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228271Z test_ops.py::TestCommonCUDA::test_dtypes_lu_unpack_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228575Z test_ops.py::TestCommonCUDA::test_dtypes_mH_cuda PASSED [ 4%]
2023-01-11T23:13:47.0228873Z test_ops.py::TestCommonCUDA::test_dtypes_mT_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229197Z test_ops.py::TestCommonCUDA::test_dtypes_masked_amax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229534Z test_ops.py::TestCommonCUDA::test_dtypes_masked_argmax_cuda PASSED [ 4%]
2023-01-11T23:13:47.0229931Z test_ops.py::TestCommonCUDA::test_dtypes_masked_argmin_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230273Z test_ops.py::TestCommonCUDA::test_dtypes_masked_cumprod_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230586Z test_ops.py::TestCommonCUDA::test_dtypes_masked_cumsum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0230900Z test_ops.py::TestCommonCUDA::test_dtypes_masked_normalize_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231218Z test_ops.py::TestCommonCUDA::test_dtypes_masked_prod_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231526Z test_ops.py::TestCommonCUDA::test_dtypes_masked_scatter_cuda PASSED [ 4%]
2023-01-11T23:13:47.0231837Z test_ops.py::TestCommonCUDA::test_dtypes_masked_select_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232148Z test_ops.py::TestCommonCUDA::test_dtypes_matmul_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232449Z test_ops.py::TestCommonCUDA::test_dtypes_max_binary_cuda PASSED [ 4%]
2023-01-11T23:13:47.0232785Z test_ops.py::TestCommonCUDA::test_dtypes_max_pool2d_with_indices_backward_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233138Z test_ops.py::TestCommonCUDA::test_dtypes_max_reduction_with_dim_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233461Z test_ops.py::TestCommonCUDA::test_dtypes_maximum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0233762Z test_ops.py::TestCommonCUDA::test_dtypes_median_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234084Z test_ops.py::TestCommonCUDA::test_dtypes_meshgrid_list_of_tensors_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234422Z test_ops.py::TestCommonCUDA::test_dtypes_min_reduction_with_dim_cuda PASSED [ 4%]
2023-01-11T23:13:47.0234736Z test_ops.py::TestCommonCUDA::test_dtypes_mode_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235041Z test_ops.py::TestCommonCUDA::test_dtypes_mul_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235347Z test_ops.py::TestCommonCUDA::test_dtypes_multinomial_cuda PASSED [ 4%]
2023-01-11T23:13:47.0235691Z test_ops.py::TestCommonCUDA::test_dtypes_mv_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236005Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_1_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236338Z test_ops.py::TestCommonCUDA::test_dtypes_mvlgamma_mvlgamma_p_3_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236659Z test_ops.py::TestCommonCUDA::test_dtypes_nan_to_num_cuda PASSED [ 4%]
2023-01-11T23:13:47.0236963Z test_ops.py::TestCommonCUDA::test_dtypes_nanmedian_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237279Z test_ops.py::TestCommonCUDA::test_dtypes_nanquantile_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237585Z test_ops.py::TestCommonCUDA::test_dtypes_nansum_cuda PASSED [ 4%]
2023-01-11T23:13:47.0237895Z test_ops.py::TestCommonCUDA::test_dtypes_narrow_copy_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238206Z test_ops.py::TestCommonCUDA::test_dtypes_native_layer_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238516Z test_ops.py::TestCommonCUDA::test_dtypes_ne_cuda PASSED [ 4%]
2023-01-11T23:13:47.0238819Z test_ops.py::TestCommonCUDA::test_dtypes_neg_cuda PASSED [ 4%]
2023-01-11T23:13:47.0239126Z test_ops.py::TestCommonCUDA::test_dtypes_new_empty_strided_cuda PASSED [ 4%]
2023-01-11T23:13:47.0239492Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional__scaled_dot_product_attention_cuda SKIPPED (Skipped!) [ 4%]
2023-01-11T23:13:47.0239874Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240232Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_avg_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240583Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_adaptive_max_pool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0240930Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241264Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_avg_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241619Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_batch_norm_without_cudnn_cuda PASSED [ 4%]
2023-01-11T23:13:47.0241961Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_celu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0242315Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0242663Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_conv_transpose1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243011Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_cosine_similarity_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243354Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_dropout_cuda PASSED [ 4%]
2023-01-11T23:13:47.0243684Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_elu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244020Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_embedding_bag_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244383Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_with_train_cuda PASSED [ 4%]
2023-01-11T23:13:47.0244776Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_feature_alpha_dropout_without_train_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245153Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_fractional_max_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245499Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_grid_sample_cuda PASSED [ 4%]
2023-01-11T23:13:47.0245838Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_group_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246174Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246514Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardsigmoid_cuda PASSED [ 4%]
2023-01-11T23:13:47.0246847Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_hardswish_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247180Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_huber_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247555Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_bilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0247913Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_nearest_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248279Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_interpolate_trilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248627Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_kl_div_cuda PASSED [ 4%]
2023-01-11T23:13:47.0248961Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_layer_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249288Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_linear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249630Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_local_response_norm_cuda PASSED [ 4%]
2023-01-11T23:13:47.0249975Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_logsigmoid_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250303Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250635Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_pool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0250978Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0251329Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool1d_grad_cuda PASSED [ 4%]
2023-01-11T23:13:47.0251672Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool2d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252051Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_max_unpool3d_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252451Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_mse_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0252795Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multi_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253153Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_multilabel_soft_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253506Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_normalize_cuda PASSED [ 4%]
2023-01-11T23:13:47.0253844Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_one_hot_cuda PASSED [ 4%]
2023-01-11T23:13:47.0254177Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_constant_cuda PASSED [ 4%]
2023-01-11T23:13:47.0254686Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pad_replicate_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255031Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pdist_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255366Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_shuffle_cuda PASSED [ 4%]
2023-01-11T23:13:47.0255706Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_pixel_unshuffle_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256051Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_poisson_nll_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256389Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu6_cuda PASSED [ 4%]
2023-01-11T23:13:47.0256707Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_relu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257033Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_rrelu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257356Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_selu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0257680Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_silu_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258013Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_soft_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258355Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_cuda PASSED [ 4%]
2023-01-11T23:13:47.0258701Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softmin_with_dtype_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259039Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259383Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_softsign_cuda PASSED [ 4%]
2023-01-11T23:13:47.0259768Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_tanhshrink_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260156Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260529Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 4%]
2023-01-11T23:13:47.0260901Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_bilinear_cuda PASSED [ 4%]
2023-01-11T23:13:47.0261255Z test_ops.py::TestCommonCUDA::test_dtypes_nn_functional_upsample_nearest_cuda PASSED [ 4%] 2023-01-11T23:13:47.0261586Z test_ops.py::TestCommonCUDA::test_dtypes_nonzero_cuda PASSED [ 4%] 2023-01-11T23:13:47.0261900Z test_ops.py::TestCommonCUDA::test_dtypes_norm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262210Z test_ops.py::TestCommonCUDA::test_dtypes_norm_fro_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262526Z test_ops.py::TestCommonCUDA::test_dtypes_norm_nuc_cuda PASSED [ 4%] 2023-01-11T23:13:47.0262838Z test_ops.py::TestCommonCUDA::test_dtypes_normal_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263147Z test_ops.py::TestCommonCUDA::test_dtypes_ones_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263483Z test_ops.py::TestCommonCUDA::test_dtypes_ops_nvprims_native_batch_norm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0263806Z test_ops.py::TestCommonCUDA::test_dtypes_ormqr_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264112Z test_ops.py::TestCommonCUDA::test_dtypes_outer_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264417Z test_ops.py::TestCommonCUDA::test_dtypes_permute_cuda PASSED [ 4%] 2023-01-11T23:13:47.0264726Z test_ops.py::TestCommonCUDA::test_dtypes_polar_cuda PASSED [ 4%] 2023-01-11T23:13:47.0265050Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_0_cuda PASSED [ 4%] 2023-01-11T23:13:47.0265406Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_2_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:13:47.0265774Z test_ops.py::TestCommonCUDA::test_dtypes_polygamma_polygamma_n_3_cuda SKIPPED (Skipped!) [ 4%] 2023-01-11T23:13:47.0266112Z test_ops.py::TestCommonCUDA::test_dtypes_positive_cuda PASSED [ 4%] 2023-01-11T23:13:47.0266422Z test_ops.py::TestCommonCUDA::test_dtypes_pow_cuda PASSED [ 4%] 2023-01-11T23:13:47.0266757Z test_ops.py::TestCommonCUDA::test_dtypes_prod_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267064Z test_ops.py::TestCommonCUDA::test_dtypes_put_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267363Z test_ops.py::TestCommonCUDA::test_dtypes_randint_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267669Z test_ops.py::TestCommonCUDA::test_dtypes_randn_cuda PASSED [ 4%] 2023-01-11T23:13:47.0267976Z test_ops.py::TestCommonCUDA::test_dtypes_randn_like_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268275Z test_ops.py::TestCommonCUDA::test_dtypes_ravel_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268575Z test_ops.py::TestCommonCUDA::test_dtypes_renorm_cuda PASSED [ 4%] 2023-01-11T23:13:47.0268899Z test_ops.py::TestCommonCUDA::test_dtypes_repeat_interleave_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269214Z test_ops.py::TestCommonCUDA::test_dtypes_reshape_as_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269520Z test_ops.py::TestCommonCUDA::test_dtypes_reshape_cuda PASSED [ 4%] 2023-01-11T23:13:47.0269889Z test_ops.py::TestCommonCUDA::test_dtypes_resize_as__cuda XFAIL [ 4%] 2023-01-11T23:13:47.0270193Z test_ops.py::TestCommonCUDA::test_dtypes_round_cuda PASSED [ 4%] 2023-01-11T23:13:47.0270495Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_0_cuda PASSED [ 4%] 2023-01-11T23:13:47.0270828Z test_ops.py::TestCommonCUDA::test_dtypes_round_decimals_neg_3_cuda SKIPPED (Skipped!) 
[ 4%] 2023-01-11T23:13:47.0271157Z test_ops.py::TestCommonCUDA::test_dtypes_rsqrt_cuda PASSED [ 4%] 2023-01-11T23:13:47.0271457Z test_ops.py::TestCommonCUDA::test_dtypes_rsub_cuda PASSED [ 4%] 2023-01-11T23:13:47.0271792Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_amin_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272108Z test_ops.py::TestCommonCUDA::test_dtypes_scatter_reduce_mean_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272439Z test_ops.py::TestCommonCUDA::test_dtypes_segment_reduce_offsets_cuda PASSED [ 4%] 2023-01-11T23:13:47.0272751Z test_ops.py::TestCommonCUDA::test_dtypes_select_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273050Z test_ops.py::TestCommonCUDA::test_dtypes_sgn_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273347Z test_ops.py::TestCommonCUDA::test_dtypes_sign_cuda PASSED [ 4%] 2023-01-11T23:13:47.0273667Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_blackman_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274001Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_cosine_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274338Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_exponential_cuda PASSED [ 4%] 2023-01-11T23:13:47.0274682Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_gaussian_cuda PASSED [ 4%] 2023-01-11T23:13:47.0275013Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0275347Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_kaiser_cuda PASSED [ 5%] 2023-01-11T23:13:47.0275680Z test_ops.py::TestCommonCUDA::test_dtypes_signal_windows_nuttall_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276000Z test_ops.py::TestCommonCUDA::test_dtypes_signbit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276298Z test_ops.py::TestCommonCUDA::test_dtypes_sin_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276605Z test_ops.py::TestCommonCUDA::test_dtypes_slice_scatter_cuda PASSED [ 5%] 2023-01-11T23:13:47.0276918Z test_ops.py::TestCommonCUDA::test_dtypes_softmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277236Z test_ops.py::TestCommonCUDA::test_dtypes_sparse_sampled_addmm_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277556Z test_ops.py::TestCommonCUDA::test_dtypes_special_airy_ai_cuda PASSED [ 5%] 2023-01-11T23:13:47.0277874Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_j0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278188Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278528Z test_ops.py::TestCommonCUDA::test_dtypes_special_bessel_y1_cuda PASSED [ 5%] 2023-01-11T23:13:47.0278866Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0279470Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0280031Z test_ops.py::TestCommonCUDA::test_dtypes_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0280416Z test_ops.py::TestCommonCUDA::test_dtypes_special_entr_cuda PASSED [ 5%] 2023-01-11T23:13:47.0280752Z test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_h_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281104Z test_ops.py::TestCommonCUDA::test_dtypes_special_hermite_polynomial_he_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281448Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_i0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0281789Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k0_cuda PASSED [ 5%] 
2023-01-11T23:13:47.0282131Z test_ops.py::TestCommonCUDA::test_dtypes_special_modified_bessel_k1_cuda PASSED [ 5%] 2023-01-11T23:13:47.0282458Z test_ops.py::TestCommonCUDA::test_dtypes_special_ndtri_cuda PASSED [ 5%] 2023-01-11T23:13:47.0282803Z test_ops.py::TestCommonCUDA::test_dtypes_special_polygamma_special_polygamma_n_0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0283166Z test_ops.py::TestCommonCUDA::test_dtypes_special_scaled_modified_bessel_k0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0283688Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0284302Z test_ops.py::TestCommonCUDA::test_dtypes_special_shifted_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0284708Z test_ops.py::TestCommonCUDA::test_dtypes_special_spherical_bessel_j0_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285041Z test_ops.py::TestCommonCUDA::test_dtypes_special_zeta_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285359Z test_ops.py::TestCommonCUDA::test_dtypes_split_list_args_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285673Z test_ops.py::TestCommonCUDA::test_dtypes_split_with_sizes_cuda PASSED [ 5%] 2023-01-11T23:13:47.0285986Z test_ops.py::TestCommonCUDA::test_dtypes_squeeze_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286297Z test_ops.py::TestCommonCUDA::test_dtypes_stack_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286600Z test_ops.py::TestCommonCUDA::test_dtypes_std_mean_cuda PASSED [ 5%] 2023-01-11T23:13:47.0286909Z test_ops.py::TestCommonCUDA::test_dtypes_std_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287210Z test_ops.py::TestCommonCUDA::test_dtypes_stft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287512Z test_ops.py::TestCommonCUDA::test_dtypes_sum_cuda PASSED [ 5%] 2023-01-11T23:13:47.0287809Z test_ops.py::TestCommonCUDA::test_dtypes_sum_to_size_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288110Z test_ops.py::TestCommonCUDA::test_dtypes_svd_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288405Z test_ops.py::TestCommonCUDA::test_dtypes_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0288706Z test_ops.py::TestCommonCUDA::test_dtypes_take_along_dim_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289022Z test_ops.py::TestCommonCUDA::test_dtypes_tan_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289354Z test_ops.py::TestCommonCUDA::test_dtypes_tile_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289649Z test_ops.py::TestCommonCUDA::test_dtypes_to_cuda PASSED [ 5%] 2023-01-11T23:13:47.0289946Z test_ops.py::TestCommonCUDA::test_dtypes_topk_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290246Z test_ops.py::TestCommonCUDA::test_dtypes_trace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290577Z test_ops.py::TestCommonCUDA::test_dtypes_trapezoid_cuda PASSED [ 5%] 2023-01-11T23:13:47.0290884Z test_ops.py::TestCommonCUDA::test_dtypes_trapz_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291185Z test_ops.py::TestCommonCUDA::test_dtypes_tril_indices_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291492Z test_ops.py::TestCommonCUDA::test_dtypes_triu_indices_cuda PASSED [ 5%] 2023-01-11T23:13:47.0291798Z test_ops.py::TestCommonCUDA::test_dtypes_true_divide_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292097Z test_ops.py::TestCommonCUDA::test_dtypes_trunc_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292396Z test_ops.py::TestCommonCUDA::test_dtypes_unbind_cuda PASSED [ 5%] 2023-01-11T23:13:47.0292702Z test_ops.py::TestCommonCUDA::test_dtypes_unflatten_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293005Z test_ops.py::TestCommonCUDA::test_dtypes_unique_cuda 
PASSED [ 5%] 2023-01-11T23:13:47.0293300Z test_ops.py::TestCommonCUDA::test_dtypes_var_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293605Z test_ops.py::TestCommonCUDA::test_dtypes_var_mean_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0293916Z test_ops.py::TestCommonCUDA::test_dtypes_var_unbiased_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294212Z test_ops.py::TestCommonCUDA::test_dtypes_vdot_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294618Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_complex_cuda PASSED [ 5%] 2023-01-11T23:13:47.0294928Z test_ops.py::TestCommonCUDA::test_dtypes_view_as_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295230Z test_ops.py::TestCommonCUDA::test_dtypes_view_copy_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295576Z test_ops.py::TestCommonCUDA::test_dtypes_view_cuda PASSED [ 5%] 2023-01-11T23:13:47.0295877Z test_ops.py::TestCommonCUDA::test_dtypes_zeros_cuda PASSED [ 5%] 2023-01-11T23:13:47.0296177Z test_ops.py::TestCommonCUDA::test_errors_T_cuda PASSED [ 5%] 2023-01-11T23:13:47.0296470Z test_ops.py::TestCommonCUDA::test_errors___rand___cuda PASSED [ 5%] 2023-01-11T23:13:47.0296765Z test_ops.py::TestCommonCUDA::test_errors___rmod___cuda PASSED [ 5%] 2023-01-11T23:13:47.0297060Z test_ops.py::TestCommonCUDA::test_errors___ror___cuda PASSED [ 5%] 2023-01-11T23:13:47.0297360Z test_ops.py::TestCommonCUDA::test_errors_add_cuda PASSED [ 5%] 2023-01-11T23:13:47.0297654Z test_ops.py::TestCommonCUDA::test_errors_aminmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0297951Z test_ops.py::TestCommonCUDA::test_errors_arange_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298259Z test_ops.py::TestCommonCUDA::test_errors_as_strided_scatter_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298568Z test_ops.py::TestCommonCUDA::test_errors_bitwise_and_cuda PASSED [ 5%] 2023-01-11T23:13:47.0298867Z test_ops.py::TestCommonCUDA::test_errors_cat_cuda PASSED [ 5%] 2023-01-11T23:13:47.0299196Z test_ops.py::TestCommonCUDA::test_errors_clamp_max_cuda XFAIL [ 5%] 2023-01-11T23:13:47.0299520Z test_ops.py::TestCommonCUDA::test_errors_clamp_min_cuda XFAIL [ 5%] 2023-01-11T23:13:47.0299822Z test_ops.py::TestCommonCUDA::test_errors_diagonal_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300130Z test_ops.py::TestCommonCUDA::test_errors_div_floor_rounding_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300444Z test_ops.py::TestCommonCUDA::test_errors_div_trunc_rounding_cuda PASSED [ 5%] 2023-01-11T23:13:47.0300750Z test_ops.py::TestCommonCUDA::test_errors_dsplit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301051Z test_ops.py::TestCommonCUDA::test_errors_eye_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301350Z test_ops.py::TestCommonCUDA::test_errors_fft_fftn_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301652Z test_ops.py::TestCommonCUDA::test_errors_fft_hfft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0301985Z test_ops.py::TestCommonCUDA::test_errors_fft_hfft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302288Z test_ops.py::TestCommonCUDA::test_errors_fft_ifft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302586Z test_ops.py::TestCommonCUDA::test_errors_fft_ifft_cuda PASSED [ 5%] 2023-01-11T23:13:47.0302882Z test_ops.py::TestCommonCUDA::test_errors_fft_ihfft2_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303182Z test_ops.py::TestCommonCUDA::test_errors_fliplr_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303482Z test_ops.py::TestCommonCUDA::test_errors_flipud_cuda PASSED [ 5%] 2023-01-11T23:13:47.0303784Z test_ops.py::TestCommonCUDA::test_errors_float_power_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304086Z test_ops.py::TestCommonCUDA::test_errors_fmax_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304382Z 
test_ops.py::TestCommonCUDA::test_errors_fmod_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304678Z test_ops.py::TestCommonCUDA::test_errors_gcd_cuda PASSED [ 5%] 2023-01-11T23:13:47.0304973Z test_ops.py::TestCommonCUDA::test_errors_ge_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305267Z test_ops.py::TestCommonCUDA::test_errors_gt_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305566Z test_ops.py::TestCommonCUDA::test_errors_heaviside_cuda PASSED [ 5%] 2023-01-11T23:13:47.0305869Z test_ops.py::TestCommonCUDA::test_errors_igamma_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306164Z test_ops.py::TestCommonCUDA::test_errors_igammac_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306466Z test_ops.py::TestCommonCUDA::test_errors_index_select_cuda PASSED [ 5%] 2023-01-11T23:13:47.0306793Z test_ops.py::TestCommonCUDA::test_errors_jiterator_binary_return_by_ref_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307153Z test_ops.py::TestCommonCUDA::test_errors_kthvalue_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307453Z test_ops.py::TestCommonCUDA::test_errors_lcm_cuda PASSED [ 5%] 2023-01-11T23:13:47.0307757Z test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308082Z test_ops.py::TestCommonCUDA::test_errors_linalg_lstsq_grad_oriented_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308400Z test_ops.py::TestCommonCUDA::test_errors_logspace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308698Z test_ops.py::TestCommonCUDA::test_errors_lt_cuda PASSED [ 5%] 2023-01-11T23:13:47.0308999Z test_ops.py::TestCommonCUDA::test_errors_masked_fill_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309298Z test_ops.py::TestCommonCUDA::test_errors_maximum_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309601Z test_ops.py::TestCommonCUDA::test_errors_multinomial_cuda PASSED [ 5%] 2023-01-11T23:13:47.0309977Z test_ops.py::TestCommonCUDA::test_errors_neg_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310300Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_avg_pool3d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310635Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv1d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0310965Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_conv2d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0311301Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_gaussian_nll_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0311649Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_hinge_embedding_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312001Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_margin_ranking_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312340Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_max_pool3d_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312667Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_rrelu_cuda PASSED [ 5%] 2023-01-11T23:13:47.0312998Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_soft_margin_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0313337Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_softshrink_cuda PASSED [ 5%] 2023-01-11T23:13:47.0313724Z test_ops.py::TestCommonCUDA::test_errors_nn_functional_triplet_margin_loss_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314057Z test_ops.py::TestCommonCUDA::test_errors_polar_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314373Z test_ops.py::TestCommonCUDA::test_errors_remainder_cuda PASSED [ 5%] 2023-01-11T23:13:47.0314684Z test_ops.py::TestCommonCUDA::test_errors_rsub_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315012Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_bartlett_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315343Z 
test_ops.py::TestCommonCUDA::test_errors_signal_windows_blackman_cuda PASSED [ 5%] 2023-01-11T23:13:47.0315689Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_exponential_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316045Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_general_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316382Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_hamming_cuda PASSED [ 5%] 2023-01-11T23:13:47.0316718Z test_ops.py::TestCommonCUDA::test_errors_signal_windows_hann_cuda PASSED [ 5%] 2023-01-11T23:13:47.0317058Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_t_cuda PASSED [ 5%] 2023-01-11T23:13:47.0317586Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0318150Z test_ops.py::TestCommonCUDA::test_errors_special_chebyshev_polynomial_w_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0318561Z test_ops.py::TestCommonCUDA::test_errors_special_hermite_polynomial_he_cuda PASSED [ 5%] 2023-01-11T23:13:47.0318915Z test_ops.py::TestCommonCUDA::test_errors_special_laguerre_polynomial_l_cuda PASSED [ 5%] 2023-01-11T23:13:47.0319463Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_u_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0320041Z test_ops.py::TestCommonCUDA::test_errors_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 5%] 2023-01-11T23:13:47.0320440Z test_ops.py::TestCommonCUDA::test_errors_special_zeta_cuda PASSED [ 5%] 2023-01-11T23:13:47.0320763Z test_ops.py::TestCommonCUDA::test_errors_sum_to_size_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321077Z test_ops.py::TestCommonCUDA::test_errors_take_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321384Z test_ops.py::TestCommonCUDA::test_errors_trace_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321687Z test_ops.py::TestCommonCUDA::test_errors_triu_cuda PASSED [ 5%] 2023-01-11T23:13:47.0321999Z test_ops.py::TestCommonCUDA::test_errors_uniform_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322303Z test_ops.py::TestCommonCUDA::test_errors_view_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322611Z test_ops.py::TestCommonCUDA::test_errors_vsplit_cuda PASSED [ 5%] 2023-01-11T23:13:47.0322920Z test_ops.py::TestCommonCUDA::test_errors_vstack_cuda PASSED [ 5%] 2023-01-11T23:13:47.0323231Z test_ops.py::TestCommonCUDA::test_errors_where_cuda PASSED [ 5%] 2023-01-11T23:13:47.0323577Z test_ops.py::TestCommonCUDA::test_multiple_devices_H_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0323967Z test_ops.py::TestCommonCUDA::test_multiple_devices_T_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0324363Z test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0324760Z test_ops.py::TestCommonCUDA::test_multiple_devices___getitem___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0325162Z test_ops.py::TestCommonCUDA::test_multiple_devices___rand___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 5%] 2023-01-11T23:13:47.0325600Z test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0325997Z 
test_ops.py::TestCommonCUDA::test_multiple_devices___rmod___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0326388Z test_ops.py::TestCommonCUDA::test_multiple_devices___ror___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0326785Z test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327176Z test_ops.py::TestCommonCUDA::test_multiple_devices___rpow___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327565Z test_ops.py::TestCommonCUDA::test_multiple_devices___rsub___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0327943Z test_ops.py::TestCommonCUDA::test_multiple_devices___rxor___cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0328366Z test_ops.py::TestCommonCUDA::test_multiple_devices__native_batch_norm_legit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0328781Z test_ops.py::TestCommonCUDA::test_multiple_devices_acos_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0329224Z test_ops.py::TestCommonCUDA::test_multiple_devices_acosh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0329614Z test_ops.py::TestCommonCUDA::test_multiple_devices_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330009Z test_ops.py::TestCommonCUDA::test_multiple_devices_addbmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330409Z test_ops.py::TestCommonCUDA::test_multiple_devices_addcmul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0330853Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmm_decomposed_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0331261Z test_ops.py::TestCommonCUDA::test_multiple_devices_addmv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0331658Z test_ops.py::TestCommonCUDA::test_multiple_devices_all_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332053Z test_ops.py::TestCommonCUDA::test_multiple_devices_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332450Z test_ops.py::TestCommonCUDA::test_multiple_devices_angle_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0332842Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0333239Z test_ops.py::TestCommonCUDA::test_multiple_devices_argmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0333640Z test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334041Z test_ops.py::TestCommonCUDA::test_multiple_devices_argsort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334428Z test_ops.py::TestCommonCUDA::test_multiple_devices_argwhere_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0334933Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0335347Z test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0335764Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_as_strided_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0336177Z test_ops.py::TestCommonCUDA::test_multiple_devices_asin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0336569Z test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337004Z test_ops.py::TestCommonCUDA::test_multiple_devices_asinh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337389Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0337783Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338171Z test_ops.py::TestCommonCUDA::test_multiple_devices_atan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338570Z test_ops.py::TestCommonCUDA::test_multiple_devices_baddbmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0338969Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0339414Z test_ops.py::TestCommonCUDA::test_multiple_devices_bfloat16_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0339810Z test_ops.py::TestCommonCUDA::test_multiple_devices_bincount_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0340213Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_left_shift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0340616Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_not_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341015Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_or_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341421Z test_ops.py::TestCommonCUDA::test_multiple_devices_bitwise_xor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0341852Z test_ops.py::TestCommonCUDA::test_multiple_devices_bool_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0342254Z test_ops.py::TestCommonCUDA::test_multiple_devices_broadcast_shapes_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0342672Z test_ops.py::TestCommonCUDA::test_multiple_devices_bucketize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343070Z test_ops.py::TestCommonCUDA::test_multiple_devices_byte_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343475Z test_ops.py::TestCommonCUDA::test_multiple_devices_cartesian_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0343870Z test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0344262Z test_ops.py::TestCommonCUDA::test_multiple_devices_cat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0344660Z test_ops.py::TestCommonCUDA::test_multiple_devices_cdist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345062Z test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345459Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_cdouble_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0345855Z test_ops.py::TestCommonCUDA::test_multiple_devices_chalf_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0346249Z test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0346641Z test_ops.py::TestCommonCUDA::test_multiple_devices_char_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347030Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347440Z test_ops.py::TestCommonCUDA::test_multiple_devices_cholesky_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0347841Z test_ops.py::TestCommonCUDA::test_multiple_devices_chunk_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0348256Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0348654Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349057Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_max_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349462Z test_ops.py::TestCommonCUDA::test_multiple_devices_clamp_min_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0349902Z test_ops.py::TestCommonCUDA::test_multiple_devices_clone_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0350304Z test_ops.py::TestCommonCUDA::test_multiple_devices_column_stack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0350722Z test_ops.py::TestCommonCUDA::test_multiple_devices_combinations_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351128Z test_ops.py::TestCommonCUDA::test_multiple_devices_conj_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351527Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0351942Z test_ops.py::TestCommonCUDA::test_multiple_devices_constant_pad_nd_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0352355Z test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0352764Z test_ops.py::TestCommonCUDA::test_multiple_devices_contiguous_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353191Z test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353594Z test_ops.py::TestCommonCUDA::test_multiple_devices_copysign_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0353986Z test_ops.py::TestCommonCUDA::test_multiple_devices_cos_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0354384Z test_ops.py::TestCommonCUDA::test_multiple_devices_count_nonzero_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0354770Z test_ops.py::TestCommonCUDA::test_multiple_devices_cov_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355159Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_cross_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355556Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0355955Z test_ops.py::TestCommonCUDA::test_multiple_devices_cummin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0356350Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumprod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0356750Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357140Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumsum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357548Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0357968Z test_ops.py::TestCommonCUDA::test_multiple_devices_cumulative_trapezoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0358377Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0358778Z test_ops.py::TestCommonCUDA::test_multiple_devices_diag_embed_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0359233Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0359660Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagflat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360058Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360465Z test_ops.py::TestCommonCUDA::test_multiple_devices_diagonal_scatter_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0360860Z test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0361253Z test_ops.py::TestCommonCUDA::test_multiple_devices_diff_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0361658Z test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362064Z test_ops.py::TestCommonCUDA::test_multiple_devices_digamma_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362456Z test_ops.py::TestCommonCUDA::test_multiple_devices_dist_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0362864Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0363283Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_floor_rounding_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0363710Z test_ops.py::TestCommonCUDA::test_multiple_devices_div_no_rounding_mode_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0364133Z test_ops.py::TestCommonCUDA::test_multiple_devices_dot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0364531Z test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:13:47.0364933Z test_ops.py::TestCommonCUDA::test_multiple_devices_dstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0365331Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0365729Z test_ops.py::TestCommonCUDA::test_multiple_devices_empty_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366130Z test_ops.py::TestCommonCUDA::test_multiple_devices_eq_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366520Z test_ops.py::TestCommonCUDA::test_multiple_devices_erfinv_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0366915Z test_ops.py::TestCommonCUDA::test_multiple_devices_exp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0367295Z test_ops.py::TestCommonCUDA::test_multiple_devices_expm1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0367691Z test_ops.py::TestCommonCUDA::test_multiple_devices_eye_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368086Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368489Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0368884Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0369338Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_fftshift_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0369745Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370146Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370567Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_hfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0370972Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0371370Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ifft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0371769Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372161Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_ihfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372360Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372552Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft2_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372753Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0372945Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_irfftn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373136Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 
2023-01-11T23:13:47.0373327Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfft_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373520Z test_ops.py::TestCommonCUDA::test_multiple_devices_fft_rfftn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373726Z test_ops.py::TestCommonCUDA::test_multiple_devices_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0373921Z test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374114Z test_ops.py::TestCommonCUDA::test_multiple_devices_flatten_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374307Z test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374596Z test_ops.py::TestCommonCUDA::test_multiple_devices_flip_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374790Z test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0374981Z test_ops.py::TestCommonCUDA::test_multiple_devices_fliplr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375175Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375364Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375557Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375753Z test_ops.py::TestCommonCUDA::test_multiple_devices_float_power_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0375942Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376138Z test_ops.py::TestCommonCUDA::test_multiple_devices_floor_divide_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376326Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmax_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376519Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376788Z test_ops.py::TestCommonCUDA::test_multiple_devices_fmin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0376983Z test_ops.py::TestCommonCUDA::test_multiple_devices_frac_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377175Z test_ops.py::TestCommonCUDA::test_multiple_devices_frexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377363Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377551Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377743Z test_ops.py::TestCommonCUDA::test_multiple_devices_full_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0377937Z test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378127Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_gather_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378315Z test_ops.py::TestCommonCUDA::test_multiple_devices_gcd_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378501Z test_ops.py::TestCommonCUDA::test_multiple_devices_ge_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378692Z test_ops.py::TestCommonCUDA::test_multiple_devices_geqrf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0378890Z test_ops.py::TestCommonCUDA::test_multiple_devices_gradient_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379077Z test_ops.py::TestCommonCUDA::test_multiple_devices_gt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379330Z test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 6%] 2023-01-11T23:13:47.0379544Z test_ops.py::TestCommonCUDA::test_multiple_devices_hsplit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0379732Z test_ops.py::TestCommonCUDA::test_multiple_devices_hstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0379923Z test_ops.py::TestCommonCUDA::test_multiple_devices_hypot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380116Z test_ops.py::TestCommonCUDA::test_multiple_devices_igammac_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380303Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_add_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380500Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380698Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0380895Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381087Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_put_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381279Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381478Z test_ops.py::TestCommonCUDA::test_multiple_devices_index_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381667Z test_ops.py::TestCommonCUDA::test_multiple_devices_int_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0381862Z test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382055Z test_ops.py::TestCommonCUDA::test_multiple_devices_isclose_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382274Z test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382468Z test_ops.py::TestCommonCUDA::test_multiple_devices_isfinite_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382655Z test_ops.py::TestCommonCUDA::test_multiple_devices_isinf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0382842Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_isnan_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383031Z test_ops.py::TestCommonCUDA::test_multiple_devices_isposinf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383240Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383460Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_2inputs_2outputs_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383663Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_binary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0383861Z test_ops.py::TestCommonCUDA::test_multiple_devices_jiterator_unary_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384051Z test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384239Z test_ops.py::TestCommonCUDA::test_multiple_devices_kron_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384430Z test_ops.py::TestCommonCUDA::test_multiple_devices_kthvalue_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384617Z test_ops.py::TestCommonCUDA::test_multiple_devices_lcm_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0384830Z test_ops.py::TestCommonCUDA::test_multiple_devices_ldexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385013Z test_ops.py::TestCommonCUDA::test_multiple_devices_le_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385200Z test_ops.py::TestCommonCUDA::test_multiple_devices_lerp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385385Z test_ops.py::TestCommonCUDA::test_multiple_devices_lgamma_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385582Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cond_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385782Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0385978Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cross_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386175Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_det_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386375Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386574Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvals_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386773Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_eigvalsh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0386982Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_householder_product_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387173Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387374Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_inv_ex_cuda_float32 SKIPPED (fewer than 2 
devices detected) [ 7%] 2023-01-11T23:13:47.0387606Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_ldl_factor_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0387803Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lstsq_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388008Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_factor_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388209Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_lu_solve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388415Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_power_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388616Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_matrix_rank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0388862Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 7%] 2023-01-11T23:13:47.0389069Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_solve_triangular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389266Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389464Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_svdvals_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389667Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorinv_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0389927Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_tensorsolve_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390156Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_vander_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390354Z test_ops.py::TestCommonCUDA::test_multiple_devices_linspace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390544Z test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390733Z test_ops.py::TestCommonCUDA::test_multiple_devices_log10_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0390927Z test_ops.py::TestCommonCUDA::test_multiple_devices_log1p_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391111Z test_ops.py::TestCommonCUDA::test_multiple_devices_log2_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391298Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391496Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391709Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0391915Z test_ops.py::TestCommonCUDA::test_multiple_devices_log_softmax_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392113Z test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp2_cuda_float32 SKIPPED (fewer than 
2 devices detected) [ 7%] 2023-01-11T23:13:47.0392317Z test_ops.py::TestCommonCUDA::test_multiple_devices_logaddexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392515Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392710Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_and_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0392913Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393122Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_not_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393320Z test_ops.py::TestCommonCUDA::test_multiple_devices_logical_or_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393513Z test_ops.py::TestCommonCUDA::test_multiple_devices_logspace_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393714Z test_ops.py::TestCommonCUDA::test_multiple_devices_logsumexp_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0393904Z test_ops.py::TestCommonCUDA::test_multiple_devices_long_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394092Z test_ops.py::TestCommonCUDA::test_multiple_devices_lt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394282Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394483Z test_ops.py::TestCommonCUDA::test_multiple_devices_lu_unpack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394673Z test_ops.py::TestCommonCUDA::test_multiple_devices_mH_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0394859Z test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395038Z test_ops.py::TestCommonCUDA::test_multiple_devices_mT_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395233Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_amin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395434Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395659Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_argmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0395856Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_cumsum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396053Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396248Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_fill_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396454Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_log_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396657Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logaddexp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0396856Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_logsumexp_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397055Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397252Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_median_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397444Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397642Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_prod_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0397836Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398037Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmax_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398237Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_softmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398455Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398648Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_std_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0398835Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399028Z test_ops.py::TestCommonCUDA::test_multiple_devices_masked_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399224Z test_ops.py::TestCommonCUDA::test_multiple_devices_matrix_exp_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399414Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_binary_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399637Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0399847Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_no_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400054Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400259Z test_ops.py::TestCommonCUDA::test_multiple_devices_max_reduction_with_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400451Z test_ops.py::TestCommonCUDA::test_multiple_devices_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400640Z test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0400843Z test_ops.py::TestCommonCUDA::test_multiple_devices_median_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401056Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_list_of_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401268Z test_ops.py::TestCommonCUDA::test_multiple_devices_meshgrid_variadic_tensors_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401473Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_no_dim_cuda_int64 SKIPPED (fewer than 2 
devices detected) [ 7%] 2023-01-11T23:13:47.0401680Z test_ops.py::TestCommonCUDA::test_multiple_devices_min_reduction_with_dim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0401866Z test_ops.py::TestCommonCUDA::test_multiple_devices_mm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402057Z test_ops.py::TestCommonCUDA::test_multiple_devices_mode_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402252Z test_ops.py::TestCommonCUDA::test_multiple_devices_movedim_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402442Z test_ops.py::TestCommonCUDA::test_multiple_devices_msort_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402631Z test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0402809Z test_ops.py::TestCommonCUDA::test_multiple_devices_mul_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403006Z test_ops.py::TestCommonCUDA::test_multiple_devices_multinomial_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403217Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403424Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403633Z test_ops.py::TestCommonCUDA::test_multiple_devices_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0403854Z test_ops.py::TestCommonCUDA::test_multiple_devices_nan_to_num_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404051Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404249Z test_ops.py::TestCommonCUDA::test_multiple_devices_nanmedian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404443Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404631Z test_ops.py::TestCommonCUDA::test_multiple_devices_narrow_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0404836Z test_ops.py::TestCommonCUDA::test_multiple_devices_native_dropout_backward_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405020Z test_ops.py::TestCommonCUDA::test_multiple_devices_ne_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405211Z test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405398Z test_ops.py::TestCommonCUDA::test_multiple_devices_neg_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405589Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405790Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_empty_strided_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0405982Z test_ops.py::TestCommonCUDA::test_multiple_devices_new_zeros_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406199Z test_ops.py::TestCommonCUDA::test_multiple_devices_nextafter_cuda_float32 
SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406425Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406647Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_avg_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0406860Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_adaptive_max_pool1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407083Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_batch_norm_without_cudnn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407302Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407508Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_celu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407716Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv1d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0407915Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408132Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408350Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_conv_transpose3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408570Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0408792Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cosine_embedding_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409056Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_cross_entropy_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409292Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_dropout2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409506Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_bag_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409714Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_embedding_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 7%] 2023-01-11T23:13:47.0409947Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410185Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410404Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_gaussian_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410615Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_grid_sample_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0410830Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_group_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411036Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411248Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardsigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411475Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardswish_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411684Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0411895Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_hardtanh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412105Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_huber_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412316Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_instance_norm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412536Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bicubic_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412756Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_bilinear_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0412982Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_interpolate_nearest_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413189Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_kl_div_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413395Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_l1_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413603Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_logsigmoid_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0413812Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_margin_ranking_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414014Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414225Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_pool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414454Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool2d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414775Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0414993Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_max_unpool3d_grad_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415208Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multi_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415431Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415640Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0415854Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_normalize_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416050Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_one_hot_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416263Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416472Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_circular_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416685Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0416945Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pad_constant_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417155Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_shuffle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417369Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_pixel_unshuffle_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417586Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0417798Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_poisson_nll_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418003Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_prelu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418203Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu6_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418402Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_relu_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418609Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softmin_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0418814Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softplus_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419049Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_softshrink_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419288Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419504Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419762Z test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_triplet_margin_with_distance_loss_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0419978Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_nn_functional_upsample_nearest_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420171Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_inf_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420366Z test_ops.py::TestCommonCUDA::test_multiple_devices_norm_nuc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420551Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420756Z test_ops.py::TestCommonCUDA::test_multiple_devices_normal_number_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0420947Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421134Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421323Z test_ops.py::TestCommonCUDA::test_multiple_devices_ones_like_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421515Z test_ops.py::TestCommonCUDA::test_multiple_devices_outer_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421715Z test_ops.py::TestCommonCUDA::test_multiple_devices_pca_lowrank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0421908Z test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422100Z test_ops.py::TestCommonCUDA::test_multiple_devices_permute_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422331Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0422519Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0422709Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0422898Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 8%] 2023-01-11T23:13:47.0423088Z test_ops.py::TestCommonCUDA::test_multiple_devices_polygamma_polygamma_n_3_cuda_int64 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:13:47.0423282Z test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423474Z test_ops.py::TestCommonCUDA::test_multiple_devices_positive_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423666Z test_ops.py::TestCommonCUDA::test_multiple_devices_pow_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0423851Z test_ops.py::TestCommonCUDA::test_multiple_devices_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424046Z test_ops.py::TestCommonCUDA::test_multiple_devices_quantile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424232Z test_ops.py::TestCommonCUDA::test_multiple_devices_rad2deg_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424419Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424609Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0424808Z test_ops.py::TestCommonCUDA::test_multiple_devices_randint_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425004Z test_ops.py::TestCommonCUDA::test_multiple_devices_randn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425219Z test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425406Z test_ops.py::TestCommonCUDA::test_multiple_devices_ravel_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425607Z test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425802Z test_ops.py::TestCommonCUDA::test_multiple_devices_reciprocal_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0425993Z test_ops.py::TestCommonCUDA::test_multiple_devices_renorm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426176Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426381Z test_ops.py::TestCommonCUDA::test_multiple_devices_repeat_interleave_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426576Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426764Z test_ops.py::TestCommonCUDA::test_multiple_devices_reshape_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0426946Z test_ops.py::TestCommonCUDA::test_multiple_devices_resize__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427139Z test_ops.py::TestCommonCUDA::test_multiple_devices_resolve_conj_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427326Z test_ops.py::TestCommonCUDA::test_multiple_devices_roll_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427537Z test_ops.py::TestCommonCUDA::test_multiple_devices_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) 
[ 8%] 2023-01-11T23:13:47.0427725Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsqrt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0427910Z test_ops.py::TestCommonCUDA::test_multiple_devices_rsub_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428108Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_add_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428311Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_amin_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428515Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428720Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_mean_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0428925Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_prod_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429135Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429337Z test_ops.py::TestCommonCUDA::test_multiple_devices_scatter_reduce_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429536Z test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429791Z test_ops.py::TestCommonCUDA::test_multiple_devices_searchsorted_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0429998Z test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_lengths_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430198Z test_ops.py::TestCommonCUDA::test_multiple_devices_segment_reduce_offsets_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430392Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430614Z test_ops.py::TestCommonCUDA::test_multiple_devices_select_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430805Z test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0430991Z test_ops.py::TestCommonCUDA::test_multiple_devices_sgn_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431178Z test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431364Z test_ops.py::TestCommonCUDA::test_multiple_devices_short_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431575Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_blackman_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431784Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_cosine_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0431987Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_gaussian_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432204Z test_ops.py::TestCommonCUDA::test_multiple_devices_signal_windows_general_hamming_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 
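Note on the long run of skips above: this runner evidently exposes a single CUDA device (every test_multiple_devices_* case reports "fewer than 2 devices detected"), and slow-gated tests skip unless PYTORCH_TEST_WITH_SLOW is set, exactly as the skip reasons say. Below is a minimal sketch of how such guards can be written; require_devices and slow are hypothetical helpers for illustration, not PyTorch's actual decorators (the real ones live under torch.testing._internal), and the exact "1" semantics of the env-var check is an assumption.

    import os
    import unittest

    import torch


    def require_devices(n):
        # Mirrors the "fewer than 2 devices detected" skips: on a
        # single-GPU host every @require_devices(2) test is skipped.
        return unittest.skipUnless(
            torch.cuda.is_available() and torch.cuda.device_count() >= n,
            f"fewer than {n} devices detected",
        )


    def slow(fn):
        # Mirrors the slow-test gate; assumes the flag is the string "1"
        # when enabled.
        return unittest.skipUnless(
            os.environ.get("PYTORCH_TEST_WITH_SLOW") == "1",
            "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test",
        )(fn)


    class TestMultiDevice(unittest.TestCase):
        @require_devices(2)
        def test_add_on_second_device(self):
            # Only runs on hosts with >= 2 GPUs; skipped on this runner.
            x = torch.ones(4, device="cuda:1")
            self.assertTrue(torch.equal(x + x, torch.full((4,), 2.0, device="cuda:1")))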
2023-01-11T23:13:47.0432396Z test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432586Z test_ops.py::TestCommonCUDA::test_multiple_devices_signbit_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432777Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0432987Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433177Z test_ops.py::TestCommonCUDA::test_multiple_devices_sinh_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433377Z test_ops.py::TestCommonCUDA::test_multiple_devices_slice_scatter_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433582Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433784Z test_ops.py::TestCommonCUDA::test_multiple_devices_softmax_with_dtype_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0433983Z test_ops.py::TestCommonCUDA::test_multiple_devices_sparse_sampled_addmm_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434184Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434394Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434595Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_j1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0434791Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_bessel_y0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435012Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435230Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435447Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_u_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0435818Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0436195Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0436398Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0436588Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_entr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0436810Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_h_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437029Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_hermite_polynomial_he_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437229Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437427Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i0e_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437627Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_i1e_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0437845Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0438061Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_laguerre_polynomial_l_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0438411Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0438787Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_legendre_polynomial_p_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0438985Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_log_ndtr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439203Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439417Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439628Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0439842Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_i1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440055Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440268Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440477Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_modified_bessel_k1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440675Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_ndtr_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0440903Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441130Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_polygamma_special_polygamma_n_0_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441345Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441584Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_scaled_modified_bessel_k1_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 8%] 2023-01-11T23:13:47.0441956Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 8%] 2023-01-11T23:13:47.0442310Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0442668Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_u_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443032Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_v_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443396Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_shifted_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 9%] 2023-01-11T23:13:47.0443614Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_spherical_bessel_j0_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0443819Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444017Z test_ops.py::TestCommonCUDA::test_multiple_devices_special_zeta_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444213Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444395Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444618Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_list_args_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0444819Z test_ops.py::TestCommonCUDA::test_multiple_devices_split_with_sizes_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445008Z test_ops.py::TestCommonCUDA::test_multiple_devices_sqrt_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445201Z test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445398Z test_ops.py::TestCommonCUDA::test_multiple_devices_square_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445590Z test_ops.py::TestCommonCUDA::test_multiple_devices_squeeze_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445779Z test_ops.py::TestCommonCUDA::test_multiple_devices_stack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0445985Z test_ops.py::TestCommonCUDA::test_multiple_devices_std_mean_unbiased_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446179Z test_ops.py::TestCommonCUDA::test_multiple_devices_stft_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446360Z test_ops.py::TestCommonCUDA::test_multiple_devices_sub_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446548Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446741Z test_ops.py::TestCommonCUDA::test_multiple_devices_sum_to_size_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0446936Z test_ops.py::TestCommonCUDA::test_multiple_devices_svd_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447134Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_svd_lowrank_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447322Z test_ops.py::TestCommonCUDA::test_multiple_devices_t_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447533Z test_ops.py::TestCommonCUDA::test_multiple_devices_tan_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447736Z test_ops.py::TestCommonCUDA::test_multiple_devices_tensor_split_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0447926Z test_ops.py::TestCommonCUDA::test_multiple_devices_tile_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448106Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448293Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448486Z test_ops.py::TestCommonCUDA::test_multiple_devices_to_sparse_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448681Z test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0448873Z test_ops.py::TestCommonCUDA::test_multiple_devices_topk_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449092Z test_ops.py::TestCommonCUDA::test_multiple_devices_trace_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449306Z test_ops.py::TestCommonCUDA::test_multiple_devices_trapezoid_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449502Z test_ops.py::TestCommonCUDA::test_multiple_devices_tril_indices_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449691Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0449875Z test_ops.py::TestCommonCUDA::test_multiple_devices_triu_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450084Z test_ops.py::TestCommonCUDA::test_multiple_devices_trunc_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450280Z test_ops.py::TestCommonCUDA::test_multiple_devices_unbind_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450473Z test_ops.py::TestCommonCUDA::test_multiple_devices_unflatten_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450669Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0450862Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451051Z test_ops.py::TestCommonCUDA::test_multiple_devices_unfold_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451237Z test_ops.py::TestCommonCUDA::test_multiple_devices_unique_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451430Z test_ops.py::TestCommonCUDA::test_multiple_devices_unsqueeze_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451620Z test_ops.py::TestCommonCUDA::test_multiple_devices_var_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451805Z 
test_ops.py::TestCommonCUDA::test_multiple_devices_var_mean_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0451995Z test_ops.py::TestCommonCUDA::test_multiple_devices_vdot_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452183Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_as_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452370Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_copy_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452556Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452744Z test_ops.py::TestCommonCUDA::test_multiple_devices_view_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0452955Z test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453149Z test_ops.py::TestCommonCUDA::test_multiple_devices_vstack_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453337Z test_ops.py::TestCommonCUDA::test_multiple_devices_where_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453525Z test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453703Z test_ops.py::TestCommonCUDA::test_multiple_devices_zero__cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0453894Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454084Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_cuda_int64 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454280Z test_ops.py::TestCommonCUDA::test_multiple_devices_zeros_like_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 9%] 2023-01-11T23:13:47.0454443Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_H_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0454713Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_T_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0454886Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___radd___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455049Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___rdiv___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455212Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values___ror___cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455368Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_acosh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455590Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_addr_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455754Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_all_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0455919Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456081Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_amin_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456241Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_any_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456414Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_as_strided_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456576Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_asinh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456732Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atanh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0456905Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_1d_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457074Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_atleast_3d_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457239Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bitwise_or_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457401Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_bool_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457582Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_broadcast_tensors_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457755Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cartesian_prod_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0457917Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458080Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cfloat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458233Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chalf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458398Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_char_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458562Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_chunk_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458764Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_clamp_max_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0458924Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_conj_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459081Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cosh_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459244Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459408Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_cummin_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459561Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459735Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diag_embed_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0459901Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagflat_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460069Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460245Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_diagonal_scatter_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460407Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_dstack_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460589Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_empty_cuda_bool SKIPPED (Skipped!) 
[ 9%] 2023-01-11T23:13:47.0460749Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0460914Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfc_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461068Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_erfinv_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461293Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_exp2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461453Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_expand_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461630Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_fftshift_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461798Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_hfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0461968Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462133Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462308Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ifftshift_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462467Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462633Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_ihfft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462795Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_irfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0462959Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463121Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fft_rfft_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463285Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463450Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flatten_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463617Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_flipud_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463786Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_float_power_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0463938Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_fmax_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464100Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ge_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464258Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_half_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464443Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_hsplit_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464609Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_add_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464776Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_copy_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0464944Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_put_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465116Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_index_select_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465270Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_int_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465442Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isneginf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465606Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_isposinf_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465800Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_2inputs_2outputs_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0465995Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_4inputs_with_extra_args_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466175Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_jiterator_binary_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466338Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ldexp_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466500Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_le_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466663Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_lgamma_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0466844Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log10_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467009Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log1p_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467172Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log2_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467333Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467517Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_log_softmax_with_dtype_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467690Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_not_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0467857Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_or_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468028Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logical_xor_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468190Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_logit_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468343Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mH_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468499Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mT_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468667Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_fill_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0468837Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_masked_sum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469038Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_max_reduction_with_dim_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469229Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_maximum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469404Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_minimum_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469565Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_movedim_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469785Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_mul_cuda_bool PASSED [ 9%] 2023-01-11T23:13:47.0469955Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470146Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_narrow_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470333Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_empty_cuda_bool SKIPPED (Skipped!) 
[ 10%] 2023-01-11T23:13:47.0470498Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_full_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470661Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_ones_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0470827Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_new_zeros_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471020Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pad_circular_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471215Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_pixel_shuffle_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471391Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_nn_functional_softsign_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471558Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_ones_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0471758Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_2_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0471949Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_3_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0472145Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_polygamma_polygamma_n_4_cuda_bool SKIPPED (Skipped!) [ 10%] 2023-01-11T23:13:47.0472311Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_rad2deg_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472476Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_real_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472675Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_reciprocal_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0472853Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_resize_as__cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473075Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scalar_tensor_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473275Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_scatter_reduce_sum_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473440Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_select_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473600Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sign_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473764Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_signbit_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0473923Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinc_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474086Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sinh_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474259Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_slice_scatter_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474431Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_softmax_with_dtype_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474608Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_airy_ai_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474784Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j0_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0474956Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_bessel_j1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475151Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_u_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475531Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_chebyshev_polynomial_w_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0475710Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_erfcx_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0475920Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_i1e_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476104Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_log_ndtr_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476295Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_i1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476476Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_modified_bessel_k0_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476650Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_ndtr_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0476849Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_scaled_modified_bessel_k1_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0477224Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_t_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0477599Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_shifted_chebyshev_polynomial_u_cuda_bool SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 10%] 2023-01-11T23:13:47.0477779Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_xlog1py_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0477952Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_special_zeta_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478132Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_list_args_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478310Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_split_with_sizes_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478469Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_square_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478666Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_squeeze_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0478842Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_sum_to_size_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479034Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_t_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479231Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_along_dim_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479396Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_take_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479561Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tan_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479725Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tanh_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0479888Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tile_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480044Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480216Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_to_sparse_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480389Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_transpose_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480553Z 
test_ops.py::TestCommonCUDA::test_non_standard_bool_values_tril_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480721Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unflatten_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0480894Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_copy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481060Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unfold_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481242Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_consecutive_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481398Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unique_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481571Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_unsqueeze_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481735Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_view_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0481922Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_vstack_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482091Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_where_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482257Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_xlogy_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482422Z test_ops.py::TestCommonCUDA::test_non_standard_bool_values_zeros_cuda_bool PASSED [ 10%] 2023-01-11T23:13:47.0482590Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0482758Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_H_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0482913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_T_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483090Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0483257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___radd___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483424Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rand___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0483594Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0483766Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0483934Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rdiv___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0484111Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0484277Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmatmul___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0484475Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0484646Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0484814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rmul___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0484977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rpow___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0485150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0485311Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples___rsub___cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0485502Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples__softmax_backward_data_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0485670Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0485829Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acos_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0486000Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_acosh_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0486169Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0486339Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_add_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0486514Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0486687Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcdiv_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0486859Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487027Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addcmul_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0487204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmm_decomposed_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487378Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addmv_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0487548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_addr_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0487736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_all_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0487912Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0488084Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_allclose_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488249Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488416Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0488582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0488741Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_amin_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0488930Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_aminmax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0489302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0489467Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_angle_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_arange_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0489806Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmax_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0489977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argmin_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0490138Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0490305Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argsort_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0490508Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_complex64 PASSED [ 10%] 
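Note: the long run of test_non_standard_bool_values_* cases that ends just above exercises ops on bool tensors whose storage bytes are not the canonical 0/1. A minimal sketch of the idea, assuming the harness checks something of this shape (not the exact code in test_ops.py):

    import torch

    # Reinterpret raw bytes as bool: bytes 0, 3 and 7 yield a bool tensor
    # whose True values are "non-standard" (not stored as 1).
    raw = torch.tensor([0, 3, 7], dtype=torch.uint8)
    weird = raw.view(torch.bool)
    # Canonical 0/1 reference tensor with the same truth values.
    std = torch.tensor([False, True, True])
    # Ops are expected to treat any nonzero byte as True, so results agree.
    assert torch.equal(torch.logical_not(weird), torch.logical_not(std))
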
2023-01-11T23:13:47.0490676Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_argwhere_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0490850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_cuda_float32 XFAIL [ 10%] 2023-01-11T23:13:47.0491046Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_partial_views_cuda_complex64 XFAIL [ 10%] 2023-01-11T23:13:47.0491281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_complex64 SKIPPED (Works for int64, fails for everything else) [ 10%] 2023-01-11T23:13:47.0491507Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_as_strided_scatter_cuda_float32 SKIPPED (Works for int64, fails for everything else) [ 10%] 2023-01-11T23:13:47.0491674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_asin_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0491837Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492002Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atan2_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0492179Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0492344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atanh_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492523Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_1d_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0492696Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_2d_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0492875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_atleast_3d_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0493046Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_baddbmm_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493220Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bernoulli_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0493588Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bfloat16_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0493760Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_not_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0493927Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_or_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494112Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_right_shift_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494282Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bitwise_xor_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0494460Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_block_diag_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0494747Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bool_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0494943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_shapes_cuda_float32 SKIPPED (Skipped!) 
[ 10%] 2023-01-11T23:13:47.0495130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0495315Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_tensors_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0495502Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_complex64 PASSED [ 10%] 2023-01-11T23:13:47.0495677Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_broadcast_to_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0495850Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_bucketize_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_byte_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496193Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cartesian_prod_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496420Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0496575Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cat_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0496747Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdist_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0496913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cdouble_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0497078Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_float32 PASSED [ 10%] 2023-01-11T23:13:47.0497242Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cfloat_cuda_int64 PASSED [ 10%] 2023-01-11T23:13:47.0497410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0497574Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_char_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0497741Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0497917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cholesky_solve_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0498087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_max_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498261Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498428Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_clamp_min_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0498605Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_column_stack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0498782Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0498980Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_combinations_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0499174Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_complex_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0499344Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0499501Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0499697Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_conj_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0499882Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_constant_pad_nd_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0500217Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_copysign_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0500388Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500554Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_corrcoef_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0500721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0500881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501042Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cos_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0501207Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cosh_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0501387Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0501563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_count_nonzero_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501728Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cov_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0501897Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0502065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cross_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummax_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502413Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0502585Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cummin_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502751Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumprod_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0502924Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0503087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_cumsum_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503419Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_deg2rad_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0503582Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0503740Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0503902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504079Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diag_embed_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0504257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0504427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0504592Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagflat_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504757Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0504943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0505125Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_float32 
PASSED [ 11%] 2023-01-11T23:13:47.0505293Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diagonal_scatter_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0505485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0505651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_diff_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0505820Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_digamma_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0505982Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dist_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506165Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506342Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_floor_rounding_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0506530Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0506708Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0506892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_div_no_rounding_mode_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0507061Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507227Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dot_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0507398Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_double_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0507732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0507899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0508086Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dsplit_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0508247Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0508414Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_dstack_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0508598Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_float32 SKIPPED (Skipped!) [ 11%] 2023-01-11T23:13:47.0508782Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_cuda_int64 SKIPPED (Skipped!) [ 11%] 2023-01-11T23:13:47.0508991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_empty_like_cuda_float32 SKIPPED (Skipped!) 
[ 11%] 2023-01-11T23:13:47.0509182Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0509345Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_equal_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0509514Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0509729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erf_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0509903Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0510065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfc_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510231Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0510396Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_erfinv_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0510729Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_exp_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0510899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0511071Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_as_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0511232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expand_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0511424Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0511593Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_expm1_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0511772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_eye_cuda_float32 SKIPPED (Skipped!) 
[ 11%] 2023-01-11T23:13:47.0511945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft2_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512118Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fft_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512291Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0512629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0512806Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_fftshift_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0512979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft2_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513147Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513322Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_hfftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0513494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft2_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0513660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0513824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifft_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514184Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0514357Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ifftshift_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0514693Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_ihfft_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0514860Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft2_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0515033Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfft_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0515208Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0515381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_irfftn_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0515551Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0515714Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fft_rfftn_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0515881Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flatten_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0516050Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0516216Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0516379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_flip_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0516548Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0516713Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fliplr_cuda_int64 PASSED [ 11%] 
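Note: the test_noncontiguous_samples_* cases running through this stretch apply each op to noncontiguous inputs and compare against the contiguous equivalent; the XFAIL entries (e.g. as_strided above) are pytest's expected-failure marker, i.e. known, tracked failures rather than regressions. A minimal sketch of the comparison, assuming the check takes roughly this shape:

    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(4, 4, device=device).t()  # transpose -> noncontiguous view
    assert not a.is_contiguous()
    # The op on the noncontiguous view should match it on a contiguous copy.
    assert torch.allclose(torch.cos(a), torch.cos(a.contiguous()))
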
2023-01-11T23:13:47.0516883Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0517040Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_float_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517231Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517407Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_floor_divide_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517571Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0517735Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmax_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0517896Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0518060Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmin_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0518222Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_fmod_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0518394Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0518555Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_full_like_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0518725Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0518891Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gather_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0519222Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gcd_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0519391Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0519558Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_geqrf_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gradient_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0519936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_grid_sampler_2d_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0520102Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0520265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_gt_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_half_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520593Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_heaviside_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0520761Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_hstack_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0520920Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_i0_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0521086Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_igamma_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0521254Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0521426Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0521599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_fill_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0521775Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_complex64 
PASSED [ 11%] 2023-01-11T23:13:47.0521941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_put_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0522116Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_reduce_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522288Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522461Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_index_select_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0522631Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_inner_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522790Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_float32 PASSED [ 11%] 2023-01-11T23:13:47.0522980Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isclose_cuda_int64 PASSED [ 11%] 2023-01-11T23:13:47.0523157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_complex64 PASSED [ 11%] 2023-01-11T23:13:47.0523325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isfinite_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0523491Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isinf_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0523657Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isnan_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0523824Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isposinf_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0523992Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_isreal_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524190Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_2inputs_2outputs_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524382Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0524577Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_4inputs_with_extra_args_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0524758Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0524936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0525114Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_binary_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525297Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_jiterator_unary_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0525487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kron_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525657Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0525818Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_kthvalue_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0525991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ldexp_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0526157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0526319Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_le_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0526485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lerp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0526650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lgamma_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0526825Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cond_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0527006Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0527177Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0527342Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cross_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0527517Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0527743Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_det_singular_cuda_float32 SKIPPED (The backward may give different results) [ 12%] 2023-01-11T23:13:47.0527911Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eig_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_eigvals_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0528461Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0528666Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_ldl_factor_ex_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0528862Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529058Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0529263Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_factor_ex_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_lu_solve_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0529629Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_power_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0529813Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530006Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530194Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_multi_dot_cuda_float32 SKIPPED (67470!) 
[ 12%] 2023-01-11T23:13:47.0530374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0530545Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_norm_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0530902Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0531150Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 12%] 2023-01-11T23:13:47.0541264Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 12%] 2023-01-11T23:13:47.0541481Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_qr_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0541667Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0541842Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_slogdet_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542020Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0542206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_solve_triangular_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542374Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_svd_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0542556Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_tensorinv_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542730Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0542904Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vander_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543081Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_vecdot_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0543249Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linspace_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543418Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0543583Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0543745Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log10_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0543905Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0544072Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0544316Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log1p_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0544485Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0544650Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0544822Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545012Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545200Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_log_softmax_with_dtype_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545365Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logaddexp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0545541Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logdet_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545715Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_and_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0545888Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_not_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_or_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546230Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546401Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logical_xor_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546567Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0546732Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logit_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0546953Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logspace_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0547122Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_logsumexp_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547294Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0547462Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0547627Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_long_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547790Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lt_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0547957Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0548129Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_lu_unpack_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548286Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mH_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548458Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0548618Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0548783Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mT_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0548955Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549127Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amax_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0549302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_amin_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549473Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_argmin_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0549651Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_cumprod_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0549922Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_log_softmax_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550109Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logaddexp_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550322Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_logsumexp_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0550503Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0550679Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_mean_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0550852Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_norm_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551032Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_normalize_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551204Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_prod_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0551385Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_scatter_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0551561Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_select_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0551738Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_softmin_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0551915Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0552087Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552256Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_std_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0552427Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_masked_var_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552779Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_no_dim_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0552989Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_max_reduction_with_dim_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0553161Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mean_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553354Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553543Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_list_of_tensors_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0553737Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0553929Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_meshgrid_variadic_tensors_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0554102Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_float32 PASSED [ 12%] 2023-01-11T23:13:47.0554275Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_binary_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0554459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_min_reduction_with_dim_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0554622Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mm_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0554796Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0554965Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_movedim_cuda_int64 PASSED [ 12%] 2023-01-11T23:13:47.0555133Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_complex64 PASSED [ 12%] 2023-01-11T23:13:47.0555300Z 
test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mul_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555475Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_multinomial_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555640Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mv_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0555831Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556018Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556217Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_3_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0556402Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556568Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nan_to_num_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0556730Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0556892Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ne_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0557055Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0557221Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_neg_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0557410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_cuda_complex64 SKIPPED (Skipped!) [ 12%]
2023-01-11T23:13:47.0557653Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 12%]
2023-01-11T23:13:47.0557816Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_full_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0557984Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_ones_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558157Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_new_zeros_cuda_complex64 PASSED [ 12%]
2023-01-11T23:13:47.0558330Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nextafter_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_alpha_dropout_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558745Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0558949Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_conv2d_cuda_complex64 PASSED [ 12%]
2023-01-11T23:13:47.0559181Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559379Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_cosine_embedding_loss_cuda_int64 PASSED [ 12%]
2023-01-11T23:13:47.0559557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_dropout_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559736Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_elu_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0559946Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560166Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560346Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_glu_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_grid_sample_cuda_float32 PASSED [ 12%]
2023-01-11T23:13:47.0560727Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_group_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0560917Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561107Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardsigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561287Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hardtanh_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561483Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561684Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0561945Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562142Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_interpolate_linear_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562325Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_kl_div_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562509Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0562691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_l1_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0562876Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_linear_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0563059Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_logsigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563257Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563451Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0563639Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_pool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0563830Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool1d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool2d_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564207Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564415Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mish_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_mse_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564786Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0564970Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_nll_loss_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565162Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0565347Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_normalize_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565539Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0565733Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pad_reflect_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0565936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0566130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566321Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pairwise_distance_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0566511Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pdist_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0566882Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_pixel_unshuffle_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567066Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567247Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu6_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567449Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567631Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_relu_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0567814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_rrelu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0567991Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_selu_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568373Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0568564Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softplus_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568752Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0568940Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_softsign_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0569153Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_tanhshrink_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0569362Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0569547Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_threshold_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0569744Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0569941Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0570172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0570383Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_triplet_margin_with_distance_loss_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0570581Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_nn_functional_upsample_nearest_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0570748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0570923Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_inf_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0571101Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0571272Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_norm_nuc_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571440Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571620Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_normal_number_mean_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0571778Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0571948Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ones_like_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572113Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ormqr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572285Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pca_lowrank_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572451Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_permute_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0572622Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0572792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pinverse_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0572981Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_0_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0573197Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_1_cuda_int64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573398Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573595Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_2_cuda_int64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573793Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_polygamma_polygamma_n_4_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0573959Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_positive_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574126Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_pow_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574299Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0574465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0574749Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_prod_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0574905Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0575065Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_qr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0575232Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rad2deg_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0575404Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rand_like_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0575601Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_cuda_int64 SKIPPED (Test expects tensor input) [ 13%]
2023-01-11T23:13:47.0575777Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randint_like_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0576066Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_randn_cuda_float32 SKIPPED (Test expects tensor input) [ 13%]
2023-01-11T23:13:47.0576239Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0576402Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_ravel_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0576563Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0576723Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_real_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0576899Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577069Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reciprocal_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0577238Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577405Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_renorm_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0577567Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0577756Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_repeat_interleave_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0577927Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_reshape_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0578088Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578251Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize__cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0578423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578589Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resize_as__cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0578762Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_conj_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0578936Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_resolve_neg_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579132Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_roll_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579302Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0579456Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rot90_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579621Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0579815Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0579978Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsqrt_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580141Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_rsub_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580320Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_add_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0580487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580674Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amax_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0580861Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_amin_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581031Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581211Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_mean_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0581389Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581566Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_scatter_reduce_sum_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0581765Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_searchsorted_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0581950Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_segment_reduce_lengths_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0582126Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582301Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_select_scatter_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0582459Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sgn_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0582801Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0582963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sigmoid_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0583131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0583292Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sign_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0583500Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_exponential_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0583702Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_cosine_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0583907Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_general_hamming_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0584095Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_signal_windows_hamming_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0584262Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sin_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0584433Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0584599Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sinh_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0584776Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0584977Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_slice_scatter_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0585168Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_complex64 PASSED [ 13%]
2023-01-11T23:13:47.0585347Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0585524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_softmax_with_dtype_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0585717Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0585913Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 13%]
2023-01-11T23:13:47.0586096Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j0_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0586274Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0586447Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_j1_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586617Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y0_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586785Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_bessel_y1_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0586979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_t_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0587172Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_u_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0587565Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 13%]
2023-01-11T23:13:47.0587964Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 13%]
2023-01-11T23:13:47.0588144Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_entr_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588527Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_hermite_polynomial_h_cuda_int64 PASSED [ 13%]
2023-01-11T23:13:47.0588700Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_float32 PASSED [ 13%]
2023-01-11T23:13:47.0588870Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i0e_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0589043Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0589214Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_i1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0589410Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_laguerre_polynomial_l_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0589823Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0590180Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_legendre_polynomial_p_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0590372Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0590557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i0_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0590746Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0590964Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_i1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591154Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0591337Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_modified_bessel_k1_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591512Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_ndtri_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0591712Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0591909Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0592286Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0592660Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0593023Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_shifted_chebyshev_polynomial_w_cuda_int64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 14%]
2023-01-11T23:13:47.0593202Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0593376Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_xlog1py_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0593550Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0593748Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_special_zeta_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0593921Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594096Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_split_list_args_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0594265Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594428Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sqrt_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0594604Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0594772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0594939Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_squeeze_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0595112Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stack_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0595281Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0595444Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595616Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_std_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595779Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_stft_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0595943Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sub_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596106Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596267Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0596445Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_sum_to_size_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0596619Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0596792Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_svd_lowrank_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0596979Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0597146Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597307Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_take_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0597470Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tan_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597636Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tanh_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0597814Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0597984Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensor_split_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598159Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tensordot_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0598328Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0598487Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tile_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598654Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0598819Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0598995Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_complex64 SKIPPED [ 14%]
2023-01-11T23:13:47.0599164Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_to_sparse_cuda_int64 SKIPPED [ 14%]
2023-01-11T23:13:47.0599329Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0599494Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_topk_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0599691Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0599849Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600016Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trace_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0600188Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600356Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_transpose_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0600524Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0600690Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0600853Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_trapz_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0601036Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triangular_solve_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0601205Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0601364Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0601551Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_tril_indices_cuda_int64 SKIPPED (Skipped!) [ 14%]
2023-01-11T23:13:47.0601718Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0601887Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0602075Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_triu_indices_cuda_int64 SKIPPED (Skipped!) [ 14%]
2023-01-11T23:13:47.0602253Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0602423Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_true_divide_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0602594Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0602772Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unbind_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0602947Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unflatten_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603124Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0603297Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603465Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_copy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0603632Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0603798Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0603963Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unfold_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0604131Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_uniform_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604311Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604489Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_consecutive_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0604658Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unique_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0604834Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0605003Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_unsqueeze_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605183Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_mean_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605381Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_var_unbiased_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605546Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vdot_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605709Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0605875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0606051Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_as_real_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0606223Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0606388Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_copy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0606550Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_view_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0606721Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0606888Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607048Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vsplit_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0607211Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_vstack_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607377Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_where_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607543Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0607707Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_xlogy_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0607875Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0608035Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zero__cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608206Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_complex64 PASSED [ 14%]
2023-01-11T23:13:47.0608368Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608557Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_float32 PASSED [ 14%]
2023-01-11T23:13:47.0608726Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_zeros_like_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0608880Z test_ops.py::TestCommonCUDA::test_numpy_ref_addbmm_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0609031Z test_ops.py::TestCommonCUDA::test_numpy_ref_aminmax_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609191Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609343Z test_ops.py::TestCommonCUDA::test_numpy_ref_argwhere_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609509Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609676Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_tensors_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0609837Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0609992Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610148Z test_ops.py::TestCommonCUDA::test_numpy_ref_broadcast_to_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0610299Z test_ops.py::TestCommonCUDA::test_numpy_ref_cat_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610450Z test_ops.py::TestCommonCUDA::test_numpy_ref_clamp_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0610600Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_complex128 XFAIL [ 14%]
2023-01-11T23:13:47.0610750Z test_ops.py::TestCommonCUDA::test_numpy_ref_clone_cuda_int64 XFAIL [ 14%]
2023-01-11T23:13:47.0610901Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0611075Z test_ops.py::TestCommonCUDA::test_numpy_ref_diag_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611225Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0611379Z test_ops.py::TestCommonCUDA::test_numpy_ref_flatten_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611552Z test_ops.py::TestCommonCUDA::test_numpy_ref_jiterator_2inputs_2outputs_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0611713Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0611865Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_cross_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0612038Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0612203Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_tensorsolve_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612364Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vander_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612521Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0612678Z test_ops.py::TestCommonCUDA::test_numpy_ref_linalg_vecdot_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0612851Z test_ops.py::TestCommonCUDA::test_numpy_ref_meshgrid_variadic_tensors_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0613031Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose2d_cuda_float64 XFAIL [ 14%]
2023-01-11T23:13:47.0613212Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_complex128 XFAIL [ 14%]
2023-01-11T23:13:47.0613392Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_conv_transpose3d_cuda_float64 XFAIL [ 14%]
2023-01-11T23:13:47.0613556Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_gelu_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0613724Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_l1_loss_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0613889Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_one_hot_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0614071Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0614278Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0614459Z test_ops.py::TestCommonCUDA::test_numpy_ref_nn_functional_pairwise_distance_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0614719Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0614872Z test_ops.py::TestCommonCUDA::test_numpy_ref_ravel_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615025Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0615176Z test_ops.py::TestCommonCUDA::test_numpy_ref_repeat_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615327Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_complex128 PASSED [ 14%]
2023-01-11T23:13:47.0615470Z test_ops.py::TestCommonCUDA::test_numpy_ref_roll_cuda_int64 PASSED [ 14%]
2023-01-11T23:13:47.0615642Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_bartlett_cuda_float64 PASSED [ 14%]
2023-01-11T23:13:47.0615819Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_gaussian_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0615985Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_hann_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616153Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_kaiser_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616321Z test_ops.py::TestCommonCUDA::test_numpy_ref_signal_windows_nuttall_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616480Z test_ops.py::TestCommonCUDA::test_numpy_ref_tensor_split_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616630Z test_ops.py::TestCommonCUDA::test_numpy_ref_tile_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0616784Z test_ops.py::TestCommonCUDA::test_numpy_ref_transpose_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0616974Z test_ops.py::TestCommonCUDA::test_numpy_ref_tril_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0617125Z test_ops.py::TestCommonCUDA::test_numpy_ref_unbind_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0617285Z test_ops.py::TestCommonCUDA::test_numpy_ref_view_copy_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0617437Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_complex128 PASSED [ 15%]
2023-01-11T23:13:47.0617587Z test_ops.py::TestCommonCUDA::test_numpy_ref_where_cuda_float64 PASSED [ 15%]
2023-01-11T23:13:47.0617729Z test_ops.py::TestCommonCUDA::test_out_H_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0617876Z test_ops.py::TestCommonCUDA::test_out___getitem___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618021Z test_ops.py::TestCommonCUDA::test_out___rmatmul___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618157Z test_ops.py::TestCommonCUDA::test_out___rmod___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618300Z test_ops.py::TestCommonCUDA::test_out___rmul___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618443Z test_ops.py::TestCommonCUDA::test_out___rpow___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618586Z test_ops.py::TestCommonCUDA::test_out___rsub___cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618751Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_bool_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0618917Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_double_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619085Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_half_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619248Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_long_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619403Z test_ops.py::TestCommonCUDA::test_out__refs__conversions_short_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619553Z test_ops.py::TestCommonCUDA::test_out__refs_acos_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619703Z test_ops.py::TestCommonCUDA::test_out__refs_add_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0619851Z test_ops.py::TestCommonCUDA::test_out__refs_addcdiv_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620031Z test_ops.py::TestCommonCUDA::test_out__refs_addcmul_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620177Z test_ops.py::TestCommonCUDA::test_out__refs_addr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620323Z test_ops.py::TestCommonCUDA::test_out__refs_all_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620467Z test_ops.py::TestCommonCUDA::test_out__refs_amin_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620616Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620788Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_partial_views_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0620950Z test_ops.py::TestCommonCUDA::test_out__refs_as_strided_scatter_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621100Z test_ops.py::TestCommonCUDA::test_out__refs_asin_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621245Z test_ops.py::TestCommonCUDA::test_out__refs_atan2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621399Z test_ops.py::TestCommonCUDA::test_out__refs_atleast_1d_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621558Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_not_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0621704Z test_ops.py::TestCommonCUDA::test_out__refs_bitwise_or_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0621843Z test_ops.py::TestCommonCUDA::test_out__refs_cat_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0621989Z test_ops.py::TestCommonCUDA::test_out__refs_ceil_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622138Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622292Z test_ops.py::TestCommonCUDA::test_out__refs_clamp_min_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622437Z test_ops.py::TestCommonCUDA::test_out__refs_clone_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622607Z test_ops.py::TestCommonCUDA::test_out__refs_conj_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622767Z test_ops.py::TestCommonCUDA::test_out__refs_conj_physical_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0622920Z test_ops.py::TestCommonCUDA::test_out__refs_copysign_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623057Z test_ops.py::TestCommonCUDA::test_out__refs_cos_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623205Z test_ops.py::TestCommonCUDA::test_out__refs_cumsum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623365Z test_ops.py::TestCommonCUDA::test_out__refs_diagonal_copy_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623528Z test_ops.py::TestCommonCUDA::test_out__refs_div_floor_rounding_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623687Z test_ops.py::TestCommonCUDA::test_out__refs_div_trunc_rounding_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623833Z test_ops.py::TestCommonCUDA::test_out__refs_dsplit_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0623982Z test_ops.py::TestCommonCUDA::test_out__refs_erfc_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624129Z test_ops.py::TestCommonCUDA::test_out__refs_erfinv_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624278Z test_ops.py::TestCommonCUDA::test_out__refs_exp2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624417Z test_ops.py::TestCommonCUDA::test_out__refs_exp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624570Z test_ops.py::TestCommonCUDA::test_out__refs_expand_as_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624716Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0624872Z test_ops.py::TestCommonCUDA::test_out__refs_fft_fftshift_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625020Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625174Z test_ops.py::TestCommonCUDA::test_out__refs_fft_hfftn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625332Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625485Z test_ops.py::TestCommonCUDA::test_out__refs_fft_ihfft_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625656Z test_ops.py::TestCommonCUDA::test_out__refs_fft_irfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625808Z test_ops.py::TestCommonCUDA::test_out__refs_fft_rfft2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0625958Z test_ops.py::TestCommonCUDA::test_out__refs_flatten_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626104Z test_ops.py::TestCommonCUDA::test_out__refs_flip_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626254Z test_ops.py::TestCommonCUDA::test_out__refs_fliplr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626399Z test_ops.py::TestCommonCUDA::test_out__refs_fmod_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626543Z test_ops.py::TestCommonCUDA::test_out__refs_frac_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626687Z test_ops.py::TestCommonCUDA::test_out__refs_ge_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626836Z test_ops.py::TestCommonCUDA::test_out__refs_heaviside_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0626984Z test_ops.py::TestCommonCUDA::test_out__refs_igammac_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627141Z test_ops.py::TestCommonCUDA::test_out__refs_index_copy_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627295Z test_ops.py::TestCommonCUDA::test_out__refs_index_fill_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627441Z test_ops.py::TestCommonCUDA::test_out__refs_isclose_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627589Z test_ops.py::TestCommonCUDA::test_out__refs_isfinite_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627734Z test_ops.py::TestCommonCUDA::test_out__refs_isnan_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0627878Z test_ops.py::TestCommonCUDA::test_out__refs_le_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628016Z test_ops.py::TestCommonCUDA::test_out__refs_lerp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628206Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_norm_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628365Z test_ops.py::TestCommonCUDA::test_out__refs_linalg_svdvals_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628514Z test_ops.py::TestCommonCUDA::test_out__refs_log10_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628664Z test_ops.py::TestCommonCUDA::test_out__refs_log1p_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628809Z test_ops.py::TestCommonCUDA::test_out__refs_log2_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0628955Z test_ops.py::TestCommonCUDA::test_out__refs_log_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629123Z test_ops.py::TestCommonCUDA::test_out__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629275Z test_ops.py::TestCommonCUDA::test_out__refs_logical_and_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629429Z test_ops.py::TestCommonCUDA::test_out__refs_logical_or_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629587Z test_ops.py::TestCommonCUDA::test_out__refs_logical_xor_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629801Z test_ops.py::TestCommonCUDA::test_out__refs_logspace_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0629967Z test_ops.py::TestCommonCUDA::test_out__refs_logsumexp_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630120Z test_ops.py::TestCommonCUDA::test_out__refs_masked_fill_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630269Z test_ops.py::TestCommonCUDA::test_out__refs_maximum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630417Z test_ops.py::TestCommonCUDA::test_out__refs_movedim_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630561Z test_ops.py::TestCommonCUDA::test_out__refs_nan_to_num_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630708Z test_ops.py::TestCommonCUDA::test_out__refs_narrow_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0630854Z test_ops.py::TestCommonCUDA::test_out__refs_ne_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631053Z test_ops.py::TestCommonCUDA::test_out__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 15%]
2023-01-11T23:13:47.0631214Z test_ops.py::TestCommonCUDA::test_out__refs_new_empty_strided_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631395Z test_ops.py::TestCommonCUDA::test_out__refs_new_zeros_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631613Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 15%]
2023-01-11T23:13:47.0631782Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_group_norm_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0631950Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632124Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632297Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632482Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632663Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0632829Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633007Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633174Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_pdist_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633349Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633512Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu6_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633667Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_relu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0633861Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_softshrink_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634029Z test_ops.py::TestCommonCUDA::test_out__refs_nn_functional_threshold_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634183Z test_ops.py::TestCommonCUDA::test_out__refs_positive_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634330Z test_ops.py::TestCommonCUDA::test_out__refs_pow_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634479Z test_ops.py::TestCommonCUDA::test_out__refs_prod_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634626Z test_ops.py::TestCommonCUDA::test_out__refs_randn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634770Z test_ops.py::TestCommonCUDA::test_out__refs_ravel_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0634920Z test_ops.py::TestCommonCUDA::test_out__refs_reciprocal_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635066Z test_ops.py::TestCommonCUDA::test_out__refs_roll_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635215Z test_ops.py::TestCommonCUDA::test_out__refs_round_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635362Z test_ops.py::TestCommonCUDA::test_out__refs_rsqrt_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635505Z test_ops.py::TestCommonCUDA::test_out__refs_sgn_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635656Z test_ops.py::TestCommonCUDA::test_out__refs_sigmoid_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635800Z test_ops.py::TestCommonCUDA::test_out__refs_sign_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0635945Z test_ops.py::TestCommonCUDA::test_out__refs_sinh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636100Z test_ops.py::TestCommonCUDA::test_out__refs_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636264Z test_ops.py::TestCommonCUDA::test_out__refs_special_bessel_j1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636425Z test_ops.py::TestCommonCUDA::test_out__refs_special_erfcx_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636577Z test_ops.py::TestCommonCUDA::test_out__refs_special_i1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636740Z test_ops.py::TestCommonCUDA::test_out__refs_special_log_ndtr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0636941Z test_ops.py::TestCommonCUDA::test_out__refs_special_log_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637123Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637303Z test_ops.py::TestCommonCUDA::test_out__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637461Z test_ops.py::TestCommonCUDA::test_out__refs_special_ndtr_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637625Z test_ops.py::TestCommonCUDA::test_out__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637773Z test_ops.py::TestCommonCUDA::test_out__refs_sqrt_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0637918Z test_ops.py::TestCommonCUDA::test_out__refs_stack_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638069Z test_ops.py::TestCommonCUDA::test_out__refs_std_mean_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638215Z test_ops.py::TestCommonCUDA::test_out__refs_sum_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638365Z test_ops.py::TestCommonCUDA::test_out__refs_tanh_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638510Z test_ops.py::TestCommonCUDA::test_out__refs_to_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638654Z test_ops.py::TestCommonCUDA::test_out__refs_trace_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0638802Z test_ops.py::TestCommonCUDA::test_out__refs_tril_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0638945Z test_ops.py::TestCommonCUDA::test_out__refs_triu_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0639100Z test_ops.py::TestCommonCUDA::test_out__refs_triu_indices_cuda_int64 PASSED [ 15%]
2023-01-11T23:13:47.0639258Z test_ops.py::TestCommonCUDA::test_out__refs_true_divide_cuda_float32 PASSED [ 15%]
2023-01-11T23:13:47.0639427Z test_ops.py::TestCommonCUDA::test_out__refs_trunc_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639572Z test_ops.py::TestCommonCUDA::test_out__refs_var_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639721Z test_ops.py::TestCommonCUDA::test_out__refs_var_mean_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0639869Z test_ops.py::TestCommonCUDA::test_out__refs_view_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640009Z test_ops.py::TestCommonCUDA::test_out__refs_vsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640154Z test_ops.py::TestCommonCUDA::test_out__refs_where_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640299Z test_ops.py::TestCommonCUDA::test_out__refs_xlogy_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640462Z test_ops.py::TestCommonCUDA::test_out__softmax_backward_data_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640606Z test_ops.py::TestCommonCUDA::test_out_acos_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640747Z test_ops.py::TestCommonCUDA::test_out_acosh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0640892Z test_ops.py::TestCommonCUDA::test_out_add_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641035Z test_ops.py::TestCommonCUDA::test_out_addbmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641175Z test_ops.py::TestCommonCUDA::test_out_addcdiv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641320Z test_ops.py::TestCommonCUDA::test_out_addmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641474Z test_ops.py::TestCommonCUDA::test_out_addmm_decomposed_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641617Z test_ops.py::TestCommonCUDA::test_out_all_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641763Z test_ops.py::TestCommonCUDA::test_out_amin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0641907Z test_ops.py::TestCommonCUDA::test_out_aminmax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642049Z test_ops.py::TestCommonCUDA::test_out_angle_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642196Z test_ops.py::TestCommonCUDA::test_out_argmax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642333Z test_ops.py::TestCommonCUDA::test_out_argsort_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642513Z test_ops.py::TestCommonCUDA::test_out_as_strided_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642679Z test_ops.py::TestCommonCUDA::test_out_as_strided_partial_views_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642836Z test_ops.py::TestCommonCUDA::test_out_as_strided_scatter_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0642978Z test_ops.py::TestCommonCUDA::test_out_asin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643122Z test_ops.py::TestCommonCUDA::test_out_atan2_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643267Z test_ops.py::TestCommonCUDA::test_out_atanh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643416Z test_ops.py::TestCommonCUDA::test_out_atleast_1d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643567Z test_ops.py::TestCommonCUDA::test_out_atleast_2d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643705Z test_ops.py::TestCommonCUDA::test_out_atleast_3d_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643851Z test_ops.py::TestCommonCUDA::test_out_baddbmm_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0643996Z test_ops.py::TestCommonCUDA::test_out_bernoulli_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0644140Z test_ops.py::TestCommonCUDA::test_out_bfloat16_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0644283Z test_ops.py::TestCommonCUDA::test_out_bincount_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644436Z test_ops.py::TestCommonCUDA::test_out_bitwise_left_shift_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644582Z test_ops.py::TestCommonCUDA::test_out_bitwise_not_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0644723Z test_ops.py::TestCommonCUDA::test_out_bool_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0644877Z test_ops.py::TestCommonCUDA::test_out_broadcast_shapes_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645062Z test_ops.py::TestCommonCUDA::test_out_broadcast_tensors_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645208Z test_ops.py::TestCommonCUDA::test_out_bucketize_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645359Z test_ops.py::TestCommonCUDA::test_out_cholesky_solve_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645507Z test_ops.py::TestCommonCUDA::test_out_column_stack_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645648Z test_ops.py::TestCommonCUDA::test_out_conj_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645796Z test_ops.py::TestCommonCUDA::test_out_conj_physical_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0645942Z test_ops.py::TestCommonCUDA::test_out_corrcoef_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646078Z test_ops.py::TestCommonCUDA::test_out_cosh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646225Z test_ops.py::TestCommonCUDA::test_out_count_nonzero_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646370Z test_ops.py::TestCommonCUDA::test_out_cov_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646513Z test_ops.py::TestCommonCUDA::test_out_cummax_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646657Z test_ops.py::TestCommonCUDA::test_out_cumprod_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0646821Z test_ops.py::TestCommonCUDA::test_out_cumulative_trapezoid_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0646965Z test_ops.py::TestCommonCUDA::test_out_deg2rad_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647112Z test_ops.py::TestCommonCUDA::test_out_diag_embed_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647251Z test_ops.py::TestCommonCUDA::test_out_diagonal_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647408Z test_ops.py::TestCommonCUDA::test_out_diagonal_scatter_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647554Z test_ops.py::TestCommonCUDA::test_out_digamma_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647714Z test_ops.py::TestCommonCUDA::test_out_div_floor_rounding_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0647859Z test_ops.py::TestCommonCUDA::test_out_double_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648029Z test_ops.py::TestCommonCUDA::test_out_dsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648173Z test_ops.py::TestCommonCUDA::test_out_einsum_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648318Z test_ops.py::TestCommonCUDA::test_out_empty_like_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648534Z test_ops.py::TestCommonCUDA::test_out_equal_cuda_float32 SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 16%]
2023-01-11T23:13:47.0648680Z test_ops.py::TestCommonCUDA::test_out_erfinv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648821Z test_ops.py::TestCommonCUDA::test_out_exp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0648969Z test_ops.py::TestCommonCUDA::test_out_expand_as_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649115Z test_ops.py::TestCommonCUDA::test_out_expand_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649285Z test_ops.py::TestCommonCUDA::test_out_fft_fftshift_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649456Z test_ops.py::TestCommonCUDA::test_out_fft_ihfftn_cuda_float32 XFAIL [ 16%]
2023-01-11T23:13:47.0649600Z test_ops.py::TestCommonCUDA::test_out_fft_rfftn_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649737Z test_ops.py::TestCommonCUDA::test_out_floor_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0649884Z test_ops.py::TestCommonCUDA::test_out_floor_divide_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650027Z test_ops.py::TestCommonCUDA::test_out_fmin_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650170Z test_ops.py::TestCommonCUDA::test_out_frac_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650313Z test_ops.py::TestCommonCUDA::test_out_frexp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650457Z test_ops.py::TestCommonCUDA::test_out_full_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650675Z test_ops.py::TestCommonCUDA::test_out_full_like_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0650814Z test_ops.py::TestCommonCUDA::test_out_gcd_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0650956Z test_ops.py::TestCommonCUDA::test_out_ge_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651096Z test_ops.py::TestCommonCUDA::test_out_gradient_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651238Z test_ops.py::TestCommonCUDA::test_out_gt_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651378Z test_ops.py::TestCommonCUDA::test_out_hsplit_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651520Z test_ops.py::TestCommonCUDA::test_out_i0_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651664Z test_ops.py::TestCommonCUDA::test_out_igamma_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651809Z test_ops.py::TestCommonCUDA::test_out_igammac_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0651956Z test_ops.py::TestCommonCUDA::test_out_imag_cuda_complex64 PASSED [ 16%]
2023-01-11T23:13:47.0652103Z test_ops.py::TestCommonCUDA::test_out_index_add_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652245Z test_ops.py::TestCommonCUDA::test_out_index_copy_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652391Z test_ops.py::TestCommonCUDA::test_out_index_fill_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652536Z test_ops.py::TestCommonCUDA::test_out_index_put_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652682Z test_ops.py::TestCommonCUDA::test_out_index_select_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652824Z test_ops.py::TestCommonCUDA::test_out_inner_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0652964Z test_ops.py::TestCommonCUDA::test_out_isinf_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653109Z test_ops.py::TestCommonCUDA::test_out_isneginf_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653256Z test_ops.py::TestCommonCUDA::test_out_istft_cuda_complex64 PASSED [ 16%]
2023-01-11T23:13:47.0653403Z test_ops.py::TestCommonCUDA::test_out_jiterator_binary_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653571Z test_ops.py::TestCommonCUDA::test_out_kron_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653718Z test_ops.py::TestCommonCUDA::test_out_kthvalue_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0653857Z test_ops.py::TestCommonCUDA::test_out_lcm_cuda_int64 PASSED [ 16%]
2023-01-11T23:13:47.0654003Z test_ops.py::TestCommonCUDA::test_out_ldexp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654147Z test_ops.py::TestCommonCUDA::test_out_lerp_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654296Z test_ops.py::TestCommonCUDA::test_out_linalg_cond_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654442Z test_ops.py::TestCommonCUDA::test_out_linalg_eig_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654689Z test_ops.py::TestCommonCUDA::test_out_linalg_eigh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654843Z test_ops.py::TestCommonCUDA::test_out_linalg_eigvals_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0654997Z test_ops.py::TestCommonCUDA::test_out_linalg_eigvalsh_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655146Z test_ops.py::TestCommonCUDA::test_out_linalg_inv_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655292Z test_ops.py::TestCommonCUDA::test_out_linalg_inv_ex_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655449Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655610Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_factor_ex_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655770Z test_ops.py::TestCommonCUDA::test_out_linalg_ldl_solve_cuda_float32 PASSED [ 16%]
2023-01-11T23:13:47.0655931Z test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_cuda_float32 SKIPPED (Skipped!)
[ 16%] 2023-01-11T23:13:47.0656098Z test_ops.py::TestCommonCUDA::test_out_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656284Z test_ops.py::TestCommonCUDA::test_out_linalg_lu_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656446Z test_ops.py::TestCommonCUDA::test_out_linalg_matrix_power_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656597Z test_ops.py::TestCommonCUDA::test_out_linalg_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656772Z test_ops.py::TestCommonCUDA::test_out_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0656919Z test_ops.py::TestCommonCUDA::test_out_linalg_pinv_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657069Z test_ops.py::TestCommonCUDA::test_out_linalg_slogdet_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657227Z test_ops.py::TestCommonCUDA::test_out_linalg_solve_triangular_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657374Z test_ops.py::TestCommonCUDA::test_out_linalg_svd_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657524Z test_ops.py::TestCommonCUDA::test_out_linalg_svdvals_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657688Z test_ops.py::TestCommonCUDA::test_out_linalg_tensorsolve_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657838Z test_ops.py::TestCommonCUDA::test_out_linalg_vander_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0657998Z test_ops.py::TestCommonCUDA::test_out_linalg_vector_norm_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658140Z test_ops.py::TestCommonCUDA::test_out_log10_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658284Z test_ops.py::TestCommonCUDA::test_out_log1p_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658421Z test_ops.py::TestCommonCUDA::test_out_log_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658567Z test_ops.py::TestCommonCUDA::test_out_log_softmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658727Z test_ops.py::TestCommonCUDA::test_out_log_softmax_with_dtype_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0658875Z test_ops.py::TestCommonCUDA::test_out_logaddexp2_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659029Z test_ops.py::TestCommonCUDA::test_out_logical_and_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659178Z test_ops.py::TestCommonCUDA::test_out_logical_not_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659351Z test_ops.py::TestCommonCUDA::test_out_long_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659493Z test_ops.py::TestCommonCUDA::test_out_lt_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659633Z test_ops.py::TestCommonCUDA::test_out_lu_cuda_float32 XFAIL [ 16%] 2023-01-11T23:13:47.0659773Z test_ops.py::TestCommonCUDA::test_out_lu_solve_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0659918Z test_ops.py::TestCommonCUDA::test_out_lu_unpack_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660059Z test_ops.py::TestCommonCUDA::test_out_mH_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660201Z test_ops.py::TestCommonCUDA::test_out_mT_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660345Z test_ops.py::TestCommonCUDA::test_out_masked_amin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660496Z test_ops.py::TestCommonCUDA::test_out_masked_argmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660644Z test_ops.py::TestCommonCUDA::test_out_masked_argmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660798Z test_ops.py::TestCommonCUDA::test_out_masked_cumprod_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0660949Z test_ops.py::TestCommonCUDA::test_out_masked_logsumexp_cuda_float32 PASSED [ 16%] 
2023-01-11T23:13:47.0661094Z test_ops.py::TestCommonCUDA::test_out_masked_mean_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661242Z test_ops.py::TestCommonCUDA::test_out_masked_median_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661389Z test_ops.py::TestCommonCUDA::test_out_masked_select_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661537Z test_ops.py::TestCommonCUDA::test_out_masked_softmax_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661685Z test_ops.py::TestCommonCUDA::test_out_masked_softmin_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0661856Z test_ops.py::TestCommonCUDA::test_out_masked_sum_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0662004Z test_ops.py::TestCommonCUDA::test_out_masked_var_cuda_float32 PASSED [ 16%] 2023-01-11T23:13:47.0662141Z test_ops.py::TestCommonCUDA::test_out_matmul_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0662286Z test_ops.py::TestCommonCUDA::test_out_matrix_exp_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662476Z test_ops.py::TestCommonCUDA::test_out_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (Skipped!) [ 17%] 2023-01-11T23:13:47.0662618Z test_ops.py::TestCommonCUDA::test_out_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662784Z test_ops.py::TestCommonCUDA::test_out_meshgrid_variadic_tensors_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0662925Z test_ops.py::TestCommonCUDA::test_out_mm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663067Z test_ops.py::TestCommonCUDA::test_out_mul_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663232Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663387Z test_ops.py::TestCommonCUDA::test_out_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663533Z test_ops.py::TestCommonCUDA::test_out_nanmean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663681Z test_ops.py::TestCommonCUDA::test_out_nanquantile_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663845Z test_ops.py::TestCommonCUDA::test_out_native_dropout_backward_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0663992Z test_ops.py::TestCommonCUDA::test_out_new_empty_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664135Z test_ops.py::TestCommonCUDA::test_out_new_full_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664279Z test_ops.py::TestCommonCUDA::test_out_new_ones_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664423Z test_ops.py::TestCommonCUDA::test_out_new_zeros_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664597Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664769Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0664970Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665144Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665314Z test_ops.py::TestCommonCUDA::test_out_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665479Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665639Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool2d_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0665802Z test_ops.py::TestCommonCUDA::test_out_nn_functional_avg_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0665982Z 
test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666162Z test_ops.py::TestCommonCUDA::test_out_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666326Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666500Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666670Z test_ops.py::TestCommonCUDA::test_out_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0666847Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667013Z test_ops.py::TestCommonCUDA::test_out_nn_functional_cross_entropy_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667175Z test_ops.py::TestCommonCUDA::test_out_nn_functional_ctc_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667337Z test_ops.py::TestCommonCUDA::test_out_nn_functional_dropout3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667532Z test_ops.py::TestCommonCUDA::test_out_nn_functional_elu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667698Z test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_bag_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0667863Z test_ops.py::TestCommonCUDA::test_out_nn_functional_embedding_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668058Z test_ops.py::TestCommonCUDA::test_out_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668235Z test_ops.py::TestCommonCUDA::test_out_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668408Z test_ops.py::TestCommonCUDA::test_out_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668577Z test_ops.py::TestCommonCUDA::test_out_nn_functional_gelu_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:13:47.0668744Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hardsigmoid_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0668923Z test_ops.py::TestCommonCUDA::test_out_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669084Z test_ops.py::TestCommonCUDA::test_out_nn_functional_instance_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669259Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669437Z test_ops.py::TestCommonCUDA::test_out_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669599Z test_ops.py::TestCommonCUDA::test_out_nn_functional_layer_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0669826Z test_ops.py::TestCommonCUDA::test_out_nn_functional_linear_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0670004Z test_ops.py::TestCommonCUDA::test_out_nn_functional_local_response_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670163Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670330Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_pool3d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670496Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670692Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0670863Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671031Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671203Z test_ops.py::TestCommonCUDA::test_out_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671363Z test_ops.py::TestCommonCUDA::test_out_nn_functional_mish_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671524Z test_ops.py::TestCommonCUDA::test_out_nn_functional_mse_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671703Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0671890Z test_ops.py::TestCommonCUDA::test_out_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672052Z test_ops.py::TestCommonCUDA::test_out_nn_functional_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672211Z test_ops.py::TestCommonCUDA::test_out_nn_functional_normalize_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672376Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_constant_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672543Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pad_replicate_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672717Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pairwise_distance_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0672878Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pdist_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673048Z test_ops.py::TestCommonCUDA::test_out_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673247Z test_ops.py::TestCommonCUDA::test_out_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673405Z test_ops.py::TestCommonCUDA::test_out_nn_functional_prelu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673560Z test_ops.py::TestCommonCUDA::test_out_nn_functional_relu6_cuda_float32 PASSED [ 17%] 
2023-01-11T23:13:47.0673719Z test_ops.py::TestCommonCUDA::test_out_nn_functional_rrelu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0673877Z test_ops.py::TestCommonCUDA::test_out_nn_functional_selu_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674043Z test_ops.py::TestCommonCUDA::test_out_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674210Z test_ops.py::TestCommonCUDA::test_out_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674375Z test_ops.py::TestCommonCUDA::test_out_nn_functional_softplus_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674541Z test_ops.py::TestCommonCUDA::test_out_nn_functional_tanhshrink_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674707Z test_ops.py::TestCommonCUDA::test_out_nn_functional_threshold_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0674895Z test_ops.py::TestCommonCUDA::test_out_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675058Z test_ops.py::TestCommonCUDA::test_out_nn_functional_unfold_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675229Z test_ops.py::TestCommonCUDA::test_out_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675377Z test_ops.py::TestCommonCUDA::test_out_nonzero_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0675526Z test_ops.py::TestCommonCUDA::test_out_norm_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675675Z test_ops.py::TestCommonCUDA::test_out_norm_fro_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0675821Z test_ops.py::TestCommonCUDA::test_out_normal_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0675968Z test_ops.py::TestCommonCUDA::test_out_ones_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676112Z test_ops.py::TestCommonCUDA::test_out_ormqr_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0676283Z test_ops.py::TestCommonCUDA::test_out_pca_lowrank_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676432Z test_ops.py::TestCommonCUDA::test_out_permute_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676579Z test_ops.py::TestCommonCUDA::test_out_pinverse_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676723Z test_ops.py::TestCommonCUDA::test_out_polar_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0676867Z test_ops.py::TestCommonCUDA::test_out_pow_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677012Z test_ops.py::TestCommonCUDA::test_out_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677154Z test_ops.py::TestCommonCUDA::test_out_put_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677299Z test_ops.py::TestCommonCUDA::test_out_rad2deg_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677442Z test_ops.py::TestCommonCUDA::test_out_rand_like_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677585Z test_ops.py::TestCommonCUDA::test_out_randint_cuda_float32 XFAIL [ 17%] 2023-01-11T23:13:47.0677731Z test_ops.py::TestCommonCUDA::test_out_ravel_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0677877Z test_ops.py::TestCommonCUDA::test_out_remainder_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678023Z test_ops.py::TestCommonCUDA::test_out_reshape_as_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678169Z test_ops.py::TestCommonCUDA::test_out_reshape_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678316Z test_ops.py::TestCommonCUDA::test_out_resolve_conj_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678457Z test_ops.py::TestCommonCUDA::test_out_roll_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678592Z test_ops.py::TestCommonCUDA::test_out_rot90_cuda_float32 PASSED [ 17%] 
2023-01-11T23:13:47.0678777Z test_ops.py::TestCommonCUDA::test_out_round_decimals_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0678923Z test_ops.py::TestCommonCUDA::test_out_rsub_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679097Z test_ops.py::TestCommonCUDA::test_out_scalar_tensor_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679282Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_amin_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679439Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679594Z test_ops.py::TestCommonCUDA::test_out_scatter_reduce_prod_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679757Z test_ops.py::TestCommonCUDA::test_out_segment_reduce_offsets_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0679901Z test_ops.py::TestCommonCUDA::test_out_select_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680043Z test_ops.py::TestCommonCUDA::test_out_sgn_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680187Z test_ops.py::TestCommonCUDA::test_out_sign_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680352Z test_ops.py::TestCommonCUDA::test_out_signal_windows_bartlett_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680518Z test_ops.py::TestCommonCUDA::test_out_signal_windows_blackman_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680681Z test_ops.py::TestCommonCUDA::test_out_signal_windows_cosine_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0680843Z test_ops.py::TestCommonCUDA::test_out_signal_windows_gaussian_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681014Z test_ops.py::TestCommonCUDA::test_out_signal_windows_general_cosine_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681177Z test_ops.py::TestCommonCUDA::test_out_signal_windows_general_hamming_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681337Z test_ops.py::TestCommonCUDA::test_out_signal_windows_hamming_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681493Z test_ops.py::TestCommonCUDA::test_out_signal_windows_nuttall_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681642Z test_ops.py::TestCommonCUDA::test_out_sinh_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681792Z test_ops.py::TestCommonCUDA::test_out_slice_scatter_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0681968Z test_ops.py::TestCommonCUDA::test_out_softmax_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682130Z test_ops.py::TestCommonCUDA::test_out_softmax_with_dtype_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682306Z test_ops.py::TestCommonCUDA::test_out_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) 
[ 17%] 2023-01-11T23:13:47.0682463Z test_ops.py::TestCommonCUDA::test_out_special_bessel_j0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682612Z test_ops.py::TestCommonCUDA::test_out_special_bessel_y0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0682766Z test_ops.py::TestCommonCUDA::test_out_special_bessel_y1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683114Z test_ops.py::TestCommonCUDA::test_out_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0683269Z test_ops.py::TestCommonCUDA::test_out_special_erfcx_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683444Z test_ops.py::TestCommonCUDA::test_out_special_hermite_polynomial_he_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683592Z test_ops.py::TestCommonCUDA::test_out_special_i1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0683743Z test_ops.py::TestCommonCUDA::test_out_special_i1e_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684077Z test_ops.py::TestCommonCUDA::test_out_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0684235Z test_ops.py::TestCommonCUDA::test_out_special_log_ndtr_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684393Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684559Z test_ops.py::TestCommonCUDA::test_out_special_modified_bessel_i1_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684739Z test_ops.py::TestCommonCUDA::test_out_special_ndtr_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0684890Z test_ops.py::TestCommonCUDA::test_out_special_ndtri_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685078Z test_ops.py::TestCommonCUDA::test_out_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685426Z test_ops.py::TestCommonCUDA::test_out_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 17%] 2023-01-11T23:13:47.0685581Z test_ops.py::TestCommonCUDA::test_out_special_xlog1py_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685735Z test_ops.py::TestCommonCUDA::test_out_split_list_args_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0685875Z test_ops.py::TestCommonCUDA::test_out_squeeze_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686021Z test_ops.py::TestCommonCUDA::test_out_std_mean_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686183Z test_ops.py::TestCommonCUDA::test_out_std_mean_unbiased_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686328Z test_ops.py::TestCommonCUDA::test_out_stft_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686470Z test_ops.py::TestCommonCUDA::test_out_sum_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686615Z test_ops.py::TestCommonCUDA::test_out_symeig_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686758Z test_ops.py::TestCommonCUDA::test_out_t_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0686907Z test_ops.py::TestCommonCUDA::test_out_take_along_dim_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0687043Z test_ops.py::TestCommonCUDA::test_out_take_cuda_float32 PASSED [ 17%] 2023-01-11T23:13:47.0687184Z test_ops.py::TestCommonCUDA::test_out_tan_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687326Z test_ops.py::TestCommonCUDA::test_out_tanh_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687471Z test_ops.py::TestCommonCUDA::test_out_to_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687615Z 
test_ops.py::TestCommonCUDA::test_out_to_sparse_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687787Z test_ops.py::TestCommonCUDA::test_out_trapz_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0687934Z test_ops.py::TestCommonCUDA::test_out_tril_indices_cuda_int64 PASSED [ 18%] 2023-01-11T23:13:47.0688080Z test_ops.py::TestCommonCUDA::test_out_triu_indices_cuda_int64 PASSED [ 18%] 2023-01-11T23:13:47.0688218Z test_ops.py::TestCommonCUDA::test_out_trunc_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688362Z test_ops.py::TestCommonCUDA::test_out_unbind_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688508Z test_ops.py::TestCommonCUDA::test_out_uniform_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688652Z test_ops.py::TestCommonCUDA::test_out_unsqueeze_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688797Z test_ops.py::TestCommonCUDA::test_out_var_mean_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0688957Z test_ops.py::TestCommonCUDA::test_out_var_mean_unbiased_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689099Z test_ops.py::TestCommonCUDA::test_out_vdot_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689256Z test_ops.py::TestCommonCUDA::test_out_view_as_complex_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689403Z test_ops.py::TestCommonCUDA::test_out_view_as_real_cuda_complex64 PASSED [ 18%] 2023-01-11T23:13:47.0689542Z test_ops.py::TestCommonCUDA::test_out_view_copy_cuda_float32 XFAIL [ 18%] 2023-01-11T23:13:47.0689687Z test_ops.py::TestCommonCUDA::test_out_vsplit_cuda_float32 PASSED [ 18%] 2023-01-11T23:13:47.0689828Z test_ops.py::TestCommonCUDA::test_out_warning_H_cuda PASSED [ 18%] 2023-01-11T23:13:47.0689969Z test_ops.py::TestCommonCUDA::test_out_warning_T_cuda PASSED [ 18%] 2023-01-11T23:13:47.0690114Z test_ops.py::TestCommonCUDA::test_out_warning___getitem___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690285Z test_ops.py::TestCommonCUDA::test_out_warning___radd___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690428Z test_ops.py::TestCommonCUDA::test_out_warning___rmod___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690573Z test_ops.py::TestCommonCUDA::test_out_warning___rmul___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690710Z test_ops.py::TestCommonCUDA::test_out_warning___rpow___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690854Z test_ops.py::TestCommonCUDA::test_out_warning___rsub___cuda PASSED [ 18%] 2023-01-11T23:13:47.0690997Z test_ops.py::TestCommonCUDA::test_out_warning___rxor___cuda PASSED [ 18%] 2023-01-11T23:13:47.0691162Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_bool_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691329Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_cfloat_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691493Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_char_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691664Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_complex_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691827Z test_ops.py::TestCommonCUDA::test_out_warning__refs__conversions_half_cuda PASSED [ 18%] 2023-01-11T23:13:47.0691970Z test_ops.py::TestCommonCUDA::test_out_warning__refs_acos_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692120Z test_ops.py::TestCommonCUDA::test_out_warning__refs_acosh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692266Z test_ops.py::TestCommonCUDA::test_out_warning__refs_add_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692413Z test_ops.py::TestCommonCUDA::test_out_warning__refs_addcdiv_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692562Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_addcmul_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692708Z test_ops.py::TestCommonCUDA::test_out_warning__refs_amax_cuda PASSED [ 18%] 2023-01-11T23:13:47.0692854Z test_ops.py::TestCommonCUDA::test_out_warning__refs_amin_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693012Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693178Z test_ops.py::TestCommonCUDA::test_out_warning__refs_as_strided_partial_views_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693349Z test_ops.py::TestCommonCUDA::test_out_warning__refs_asin_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693498Z test_ops.py::TestCommonCUDA::test_out_warning__refs_asinh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693644Z test_ops.py::TestCommonCUDA::test_out_warning__refs_atanh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693801Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_and_cuda PASSED [ 18%] 2023-01-11T23:13:47.0693968Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_left_shift_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694123Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_not_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694274Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bitwise_xor_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694433Z test_ops.py::TestCommonCUDA::test_out_warning__refs_broadcast_shapes_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694696Z test_ops.py::TestCommonCUDA::test_out_warning__refs_bucketize_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694849Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ceil_cuda PASSED [ 18%] 2023-01-11T23:13:47.0694997Z test_ops.py::TestCommonCUDA::test_out_warning__refs_chunk_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695151Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clamp_max_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695300Z test_ops.py::TestCommonCUDA::test_out_warning__refs_clone_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695446Z test_ops.py::TestCommonCUDA::test_out_warning__refs_conj_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695602Z test_ops.py::TestCommonCUDA::test_out_warning__refs_contiguous_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695752Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cos_cuda PASSED [ 18%] 2023-01-11T23:13:47.0695892Z test_ops.py::TestCommonCUDA::test_out_warning__refs_cosh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696082Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696238Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diag_embed_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696400Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696551Z test_ops.py::TestCommonCUDA::test_out_warning__refs_diagonal_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696698Z test_ops.py::TestCommonCUDA::test_out_warning__refs_digamma_cuda PASSED [ 18%] 2023-01-11T23:13:47.0696863Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_floor_rounding_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697028Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_no_rounding_mode_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697183Z test_ops.py::TestCommonCUDA::test_out_warning__refs_div_trunc_rounding_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697332Z test_ops.py::TestCommonCUDA::test_out_warning__refs_dsplit_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697529Z test_ops.py::TestCommonCUDA::test_out_warning__refs_empty_cuda SKIPPED (Expected: empty is not comparable) [ 18%] 
2023-01-11T23:13:47.0697680Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erfc_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697828Z test_ops.py::TestCommonCUDA::test_out_warning__refs_erfinv_cuda PASSED [ 18%] 2023-01-11T23:13:47.0697979Z test_ops.py::TestCommonCUDA::test_out_warning__refs_exp_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698135Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expand_as_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698283Z test_ops.py::TestCommonCUDA::test_out_warning__refs_expm1_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698424Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_fft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698578Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_hfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698730Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0698880Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifft_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699031Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699230Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ifftshift_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699399Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699573Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_ihfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699735Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0699888Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_irfftn_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fft_rfft2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700193Z test_ops.py::TestCommonCUDA::test_out_warning__refs_flipud_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700352Z test_ops.py::TestCommonCUDA::test_out_warning__refs_float_power_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700499Z test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700659Z test_ops.py::TestCommonCUDA::test_out_warning__refs_floor_divide_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700808Z test_ops.py::TestCommonCUDA::test_out_warning__refs_fmax_cuda PASSED [ 18%] 2023-01-11T23:13:47.0700948Z test_ops.py::TestCommonCUDA::test_out_warning__refs_frac_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701096Z test_ops.py::TestCommonCUDA::test_out_warning__refs_gcd_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701242Z test_ops.py::TestCommonCUDA::test_out_warning__refs_i0_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701388Z test_ops.py::TestCommonCUDA::test_out_warning__refs_imag_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701540Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_add_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701718Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0701869Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_fill_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702031Z test_ops.py::TestCommonCUDA::test_out_warning__refs_index_select_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702180Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isclose_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702322Z test_ops.py::TestCommonCUDA::test_out_warning__refs_isnan_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702468Z test_ops.py::TestCommonCUDA::test_out_warning__refs_lerp_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702622Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svd_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702783Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_svdvals_cuda PASSED [ 18%] 2023-01-11T23:13:47.0702944Z test_ops.py::TestCommonCUDA::test_out_warning__refs_linalg_vector_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log10_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703240Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log1p_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703386Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log2_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703551Z test_ops.py::TestCommonCUDA::test_out_warning__refs_log_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703708Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_and_cuda PASSED [ 18%] 2023-01-11T23:13:47.0703864Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logical_not_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704016Z test_ops.py::TestCommonCUDA::test_out_warning__refs_logspace_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704173Z test_ops.py::TestCommonCUDA::test_out_warning__refs_masked_fill_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704323Z test_ops.py::TestCommonCUDA::test_out_warning__refs_maximum_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704470Z test_ops.py::TestCommonCUDA::test_out_warning__refs_mean_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704621Z test_ops.py::TestCommonCUDA::test_out_warning__refs_movedim_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704761Z test_ops.py::TestCommonCUDA::test_out_warning__refs_mul_cuda PASSED [ 18%] 2023-01-11T23:13:47.0704940Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nan_to_num_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705098Z test_ops.py::TestCommonCUDA::test_out_warning__refs_narrow_copy_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705262Z test_ops.py::TestCommonCUDA::test_out_warning__refs_native_layer_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705407Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ne_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705567Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_empty_strided_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705714Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_full_cuda PASSED [ 18%] 2023-01-11T23:13:47.0705869Z test_ops.py::TestCommonCUDA::test_out_warning__refs_new_zeros_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_group_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706207Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_hardtanh_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706377Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_l1_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706548Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_layer_norm_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706738Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_log_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0706920Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_margin_ranking_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707086Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mish_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707253Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_mse_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707472Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pairwise_distance_cuda PASSED [ 18%] 
2023-01-11T23:13:47.0707633Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_pdist_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707814Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0707981Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_prelu_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708144Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_relu6_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708307Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_selu_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708488Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmax_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708668Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softmin_with_dtype_cuda PASSED [ 18%] 2023-01-11T23:13:47.0708838Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_softplus_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709009Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_tanhshrink_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709186Z test_ops.py::TestCommonCUDA::test_out_warning__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709339Z test_ops.py::TestCommonCUDA::test_out_warning__refs_permute_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709488Z test_ops.py::TestCommonCUDA::test_out_warning__refs_pow_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709638Z test_ops.py::TestCommonCUDA::test_out_warning__refs_ravel_cuda PASSED [ 18%] 2023-01-11T23:13:47.0709853Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reciprocal_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710014Z test_ops.py::TestCommonCUDA::test_out_warning__refs_remainder_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710172Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_as_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710328Z test_ops.py::TestCommonCUDA::test_out_warning__refs_reshape_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710470Z test_ops.py::TestCommonCUDA::test_out_warning__refs_roll_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710646Z test_ops.py::TestCommonCUDA::test_out_warning__refs_rsub_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710797Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sigmoid_cuda PASSED [ 18%] 2023-01-11T23:13:47.0710943Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sinc_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711239Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sinh_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711406Z test_ops.py::TestCommonCUDA::test_out_warning__refs_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711575Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j0_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711733Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_bessel_j1_cuda PASSED [ 19%] 2023-01-11T23:13:47.0711896Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_i0e_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712083Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_1_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712266Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_multigammaln_mvlgamma_p_3_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712424Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_ndtr_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712597Z 
test_ops.py::TestCommonCUDA::test_out_warning__refs_special_softmax_with_dtype_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712777Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_spherical_bessel_j0_cuda PASSED [ 19%] 2023-01-11T23:13:47.0712944Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_xlog1py_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713101Z test_ops.py::TestCommonCUDA::test_out_warning__refs_special_zeta_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713276Z test_ops.py::TestCommonCUDA::test_out_warning__refs_squeeze_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713426Z test_ops.py::TestCommonCUDA::test_out_warning__refs_std_mean_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713577Z test_ops.py::TestCommonCUDA::test_out_warning__refs_sum_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713723Z test_ops.py::TestCommonCUDA::test_out_warning__refs_to_cuda PASSED [ 19%] 2023-01-11T23:13:47.0713883Z test_ops.py::TestCommonCUDA::test_out_warning__refs_tril_indices_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714038Z test_ops.py::TestCommonCUDA::test_out_warning__refs_true_divide_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714187Z test_ops.py::TestCommonCUDA::test_out_warning__refs_trunc_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714344Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unflatten_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714486Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unfold_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714645Z test_ops.py::TestCommonCUDA::test_out_warning__refs_unsqueeze_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714791Z test_ops.py::TestCommonCUDA::test_out_warning__refs_var_cuda PASSED [ 19%] 2023-01-11T23:13:47.0714946Z test_ops.py::TestCommonCUDA::test_out_warning__refs_var_mean_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715093Z test_ops.py::TestCommonCUDA::test_out_warning__refs_view_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715240Z test_ops.py::TestCommonCUDA::test_out_warning__refs_vstack_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715384Z test_ops.py::TestCommonCUDA::test_out_warning_abs_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715529Z test_ops.py::TestCommonCUDA::test_out_warning_add_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715667Z test_ops.py::TestCommonCUDA::test_out_warning_addbmm_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0715815Z test_ops.py::TestCommonCUDA::test_out_warning_addcdiv_cuda PASSED [ 19%] 2023-01-11T23:13:47.0715963Z test_ops.py::TestCommonCUDA::test_out_warning_addcmul_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716105Z test_ops.py::TestCommonCUDA::test_out_warning_addr_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716271Z test_ops.py::TestCommonCUDA::test_out_warning_all_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716417Z test_ops.py::TestCommonCUDA::test_out_warning_amax_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716560Z test_ops.py::TestCommonCUDA::test_out_warning_amin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716704Z test_ops.py::TestCommonCUDA::test_out_warning_angle_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716839Z test_ops.py::TestCommonCUDA::test_out_warning_any_cuda PASSED [ 19%] 2023-01-11T23:13:47.0716979Z test_ops.py::TestCommonCUDA::test_out_warning_arange_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0717126Z test_ops.py::TestCommonCUDA::test_out_warning_argmin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717272Z test_ops.py::TestCommonCUDA::test_out_warning_argwhere_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717436Z test_ops.py::TestCommonCUDA::test_out_warning_as_strided_scatter_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717579Z 
test_ops.py::TestCommonCUDA::test_out_warning_atan_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717724Z test_ops.py::TestCommonCUDA::test_out_warning_atanh_cuda PASSED [ 19%] 2023-01-11T23:13:47.0717870Z test_ops.py::TestCommonCUDA::test_out_warning_atleast_2d_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718013Z test_ops.py::TestCommonCUDA::test_out_warning_bernoulli_cuda XFAIL [ 19%] 2023-01-11T23:13:47.0718160Z test_ops.py::TestCommonCUDA::test_out_warning_bincount_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718306Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_and_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718465Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_left_shift_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718612Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_not_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718785Z test_ops.py::TestCommonCUDA::test_out_warning_bitwise_xor_cuda PASSED [ 19%] 2023-01-11T23:13:47.0718932Z test_ops.py::TestCommonCUDA::test_out_warning_block_diag_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719078Z test_ops.py::TestCommonCUDA::test_out_warning_bool_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719214Z test_ops.py::TestCommonCUDA::test_out_warning_byte_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719355Z test_ops.py::TestCommonCUDA::test_out_warning_cat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719498Z test_ops.py::TestCommonCUDA::test_out_warning_cdist_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719640Z test_ops.py::TestCommonCUDA::test_out_warning_ceil_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719784Z test_ops.py::TestCommonCUDA::test_out_warning_cfloat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0719928Z test_ops.py::TestCommonCUDA::test_out_warning_chunk_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720073Z test_ops.py::TestCommonCUDA::test_out_warning_clone_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720222Z test_ops.py::TestCommonCUDA::test_out_warning_conj_physical_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720373Z test_ops.py::TestCommonCUDA::test_out_warning_constant_pad_nd_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720522Z test_ops.py::TestCommonCUDA::test_out_warning_contiguous_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720668Z test_ops.py::TestCommonCUDA::test_out_warning_copysign_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720815Z test_ops.py::TestCommonCUDA::test_out_warning_corrcoef_cuda PASSED [ 19%] 2023-01-11T23:13:47.0720965Z test_ops.py::TestCommonCUDA::test_out_warning_count_nonzero_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721105Z test_ops.py::TestCommonCUDA::test_out_warning_cov_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721252Z test_ops.py::TestCommonCUDA::test_out_warning_cummin_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721400Z test_ops.py::TestCommonCUDA::test_out_warning_cumprod_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721543Z test_ops.py::TestCommonCUDA::test_out_warning_diagflat_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721690Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_cuda PASSED [ 19%] 2023-01-11T23:13:47.0721882Z test_ops.py::TestCommonCUDA::test_out_warning_diagonal_scatter_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722033Z test_ops.py::TestCommonCUDA::test_out_warning_diff_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722191Z test_ops.py::TestCommonCUDA::test_out_warning_div_trunc_rounding_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722336Z test_ops.py::TestCommonCUDA::test_out_warning_double_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722485Z test_ops.py::TestCommonCUDA::test_out_warning_empty_like_cuda PASSED [ 19%] 2023-01-11T23:13:47.0722627Z 
test_ops.py::TestCommonCUDA::test_out_warning_eq_cuda PASSED [ 19%]
2023-01-11T23:13:47.0722849Z test_ops.py::TestCommonCUDA::test_out_warning_equal_cuda SKIPPED (Skipped! Only supports single tensor or iterable of tensor outputs.) [ 19%]
2023-01-11T23:13:47.0722989Z test_ops.py::TestCommonCUDA::test_out_warning_erf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723133Z test_ops.py::TestCommonCUDA::test_out_warning_erfinv_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723275Z test_ops.py::TestCommonCUDA::test_out_warning_exp2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723416Z test_ops.py::TestCommonCUDA::test_out_warning_eye_cuda XFAIL [ 19%]
2023-01-11T23:13:47.0723560Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723705Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723853Z test_ops.py::TestCommonCUDA::test_out_warning_fft_fftshift_cuda PASSED [ 19%]
2023-01-11T23:13:47.0723997Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfft2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724136Z test_ops.py::TestCommonCUDA::test_out_warning_fft_hfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724310Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724459Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ifftshift_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724608Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0724753Z test_ops.py::TestCommonCUDA::test_out_warning_fft_ihfftn_cuda XFAIL [ 19%]
2023-01-11T23:13:47.0724898Z test_ops.py::TestCommonCUDA::test_out_warning_fft_irfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725042Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfft_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725189Z test_ops.py::TestCommonCUDA::test_out_warning_fft_rfftn_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725325Z test_ops.py::TestCommonCUDA::test_out_warning_fill_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725469Z test_ops.py::TestCommonCUDA::test_out_warning_flip_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725617Z test_ops.py::TestCommonCUDA::test_out_warning_flipud_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725759Z test_ops.py::TestCommonCUDA::test_out_warning_float_cuda PASSED [ 19%]
2023-01-11T23:13:47.0725904Z test_ops.py::TestCommonCUDA::test_out_warning_floor_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726055Z test_ops.py::TestCommonCUDA::test_out_warning_floor_divide_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726197Z test_ops.py::TestCommonCUDA::test_out_warning_frac_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726340Z test_ops.py::TestCommonCUDA::test_out_warning_frexp_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726478Z test_ops.py::TestCommonCUDA::test_out_warning_full_like_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726619Z test_ops.py::TestCommonCUDA::test_out_warning_gcd_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726765Z test_ops.py::TestCommonCUDA::test_out_warning_gradient_cuda PASSED [ 19%]
2023-01-11T23:13:47.0726908Z test_ops.py::TestCommonCUDA::test_out_warning_half_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727058Z test_ops.py::TestCommonCUDA::test_out_warning_heaviside_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727200Z test_ops.py::TestCommonCUDA::test_out_warning_histc_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727442Z test_ops.py::TestCommonCUDA::test_out_warning_histogram_cuda SKIPPED (Skipped! Op has not supported dtypes on this device.) [ 19%]
2023-01-11T23:13:47.0727589Z test_ops.py::TestCommonCUDA::test_out_warning_hsplit_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727725Z test_ops.py::TestCommonCUDA::test_out_warning_i0_cuda PASSED [ 19%]
2023-01-11T23:13:47.0727870Z test_ops.py::TestCommonCUDA::test_out_warning_igammac_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728017Z test_ops.py::TestCommonCUDA::test_out_warning_index_fill_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728163Z test_ops.py::TestCommonCUDA::test_out_warning_index_put_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728310Z test_ops.py::TestCommonCUDA::test_out_warning_index_reduce_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728458Z test_ops.py::TestCommonCUDA::test_out_warning_isclose_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728600Z test_ops.py::TestCommonCUDA::test_out_warning_isin_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728746Z test_ops.py::TestCommonCUDA::test_out_warning_isinf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0728882Z test_ops.py::TestCommonCUDA::test_out_warning_isnan_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729029Z test_ops.py::TestCommonCUDA::test_out_warning_isneginf_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729195Z test_ops.py::TestCommonCUDA::test_out_warning_isreal_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729396Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_4inputs_with_extra_args_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729551Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729724Z test_ops.py::TestCommonCUDA::test_out_warning_jiterator_binary_return_by_ref_cuda PASSED [ 19%]
2023-01-11T23:13:47.0729893Z test_ops.py::TestCommonCUDA::test_out_warning_le_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730041Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cross_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730192Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_eig_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730356Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_householder_product_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730504Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_inv_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730665Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_factor_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730821Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_ldl_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0730971Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lstsq_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731116Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731276Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_factor_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731431Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_lu_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731586Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731748Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_power_cuda PASSED [ 19%]
2023-01-11T23:13:47.0731917Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_matrix_rank_hermitian_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732070Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_multi_dot_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732220Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732396Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_norm_subgradients_at_zero_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732545Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_qr_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732698Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_slogdet_cuda PASSED [ 19%]
2023-01-11T23:13:47.0732841Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733019Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_ex_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733186Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_solve_triangular_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733345Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_tensorsolve_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733494Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vecdot_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733651Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_vector_norm_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733796Z test_ops.py::TestCommonCUDA::test_out_warning_log10_cuda PASSED [ 19%]
2023-01-11T23:13:47.0733940Z test_ops.py::TestCommonCUDA::test_out_warning_log2_cuda PASSED [ 19%]
2023-01-11T23:13:47.0734081Z test_ops.py::TestCommonCUDA::test_out_warning_log_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734230Z test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734396Z test_ops.py::TestCommonCUDA::test_out_warning_log_softmax_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734670Z test_ops.py::TestCommonCUDA::test_out_warning_logaddexp2_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734822Z test_ops.py::TestCommonCUDA::test_out_warning_logaddexp_cuda PASSED [ 20%]
2023-01-11T23:13:47.0734972Z test_ops.py::TestCommonCUDA::test_out_warning_logical_not_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735119Z test_ops.py::TestCommonCUDA::test_out_warning_logical_or_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735265Z test_ops.py::TestCommonCUDA::test_out_warning_logit_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735407Z test_ops.py::TestCommonCUDA::test_out_warning_lt_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735581Z test_ops.py::TestCommonCUDA::test_out_warning_lu_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0735726Z test_ops.py::TestCommonCUDA::test_out_warning_lu_solve_cuda PASSED [ 20%]
2023-01-11T23:13:47.0735875Z test_ops.py::TestCommonCUDA::test_out_warning_lu_unpack_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736025Z test_ops.py::TestCommonCUDA::test_out_warning_mH_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736168Z test_ops.py::TestCommonCUDA::test_out_warning_mT_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736315Z test_ops.py::TestCommonCUDA::test_out_warning_masked_amin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736465Z test_ops.py::TestCommonCUDA::test_out_warning_masked_argmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736615Z test_ops.py::TestCommonCUDA::test_out_warning_masked_cumprod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736758Z test_ops.py::TestCommonCUDA::test_out_warning_masked_cumsum_cuda PASSED [ 20%]
2023-01-11T23:13:47.0736918Z test_ops.py::TestCommonCUDA::test_out_warning_masked_log_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737079Z test_ops.py::TestCommonCUDA::test_out_warning_masked_logsumexp_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737226Z test_ops.py::TestCommonCUDA::test_out_warning_masked_mean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737377Z test_ops.py::TestCommonCUDA::test_out_warning_masked_median_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737525Z test_ops.py::TestCommonCUDA::test_out_warning_masked_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737684Z test_ops.py::TestCommonCUDA::test_out_warning_masked_normalize_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737831Z test_ops.py::TestCommonCUDA::test_out_warning_masked_prod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0737974Z test_ops.py::TestCommonCUDA::test_out_warning_masked_select_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738131Z test_ops.py::TestCommonCUDA::test_out_warning_masked_softmax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738279Z test_ops.py::TestCommonCUDA::test_out_warning_masked_softmin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738429Z test_ops.py::TestCommonCUDA::test_out_warning_masked_var_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738575Z test_ops.py::TestCommonCUDA::test_out_warning_matmul_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738802Z test_ops.py::TestCommonCUDA::test_out_warning_max_binary_cuda PASSED [ 20%]
2023-01-11T23:13:47.0738950Z test_ops.py::TestCommonCUDA::test_out_warning_median_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739115Z test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_list_of_tensors_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739277Z test_ops.py::TestCommonCUDA::test_out_warning_meshgrid_variadic_tensors_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739441Z test_ops.py::TestCommonCUDA::test_out_warning_min_reduction_with_dim_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739584Z test_ops.py::TestCommonCUDA::test_out_warning_mode_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0739728Z test_ops.py::TestCommonCUDA::test_out_warning_msort_cuda PASSED [ 20%]
2023-01-11T23:13:47.0739873Z test_ops.py::TestCommonCUDA::test_out_warning_mul_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740016Z test_ops.py::TestCommonCUDA::test_out_warning_mv_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740183Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_1_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740345Z test_ops.py::TestCommonCUDA::test_out_warning_mvlgamma_mvlgamma_p_5_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740485Z test_ops.py::TestCommonCUDA::test_out_warning_nan_to_num_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740631Z test_ops.py::TestCommonCUDA::test_out_warning_nanmean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0740779Z test_ops.py::TestCommonCUDA::test_out_warning_narrow_copy_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0740946Z test_ops.py::TestCommonCUDA::test_out_warning_native_dropout_backward_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741104Z test_ops.py::TestCommonCUDA::test_out_warning_native_layer_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741273Z test_ops.py::TestCommonCUDA::test_out_warning_neg_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741421Z test_ops.py::TestCommonCUDA::test_out_warning_new_empty_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741566Z test_ops.py::TestCommonCUDA::test_out_warning_new_zeros_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741742Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0741917Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_avg_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742089Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742259Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_adaptive_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742425Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742586Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742753Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_avg_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0742935Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_batch_norm_without_cudnn_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743104Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_bilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743275Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743468Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_binary_cross_entropy_with_logits_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743640Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_conv_transpose1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743822Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cosine_embedding_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0743993Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_cross_entropy_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744160Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_ctc_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744327Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744515Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_dropout_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744683Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_embedding_cuda PASSED [ 20%]
2023-01-11T23:13:47.0744871Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_feature_alpha_dropout_without_train_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745050Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_fractional_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745229Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gaussian_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745387Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_gelu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745552Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_group_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745718Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardshrink_cuda PASSED [ 20%]
2023-01-11T23:13:47.0745887Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardsigmoid_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746049Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hardtanh_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746218Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_hinge_embedding_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746379Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_huber_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746549Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_instance_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746724Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_area_cuda PASSED [ 20%]
2023-01-11T23:13:47.0746904Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_bicubic_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747113Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_linear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747298Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_interpolate_trilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747459Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_kl_div_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747619Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_l1_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747772Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_layer_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0747936Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_leaky_relu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748099Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool1d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748264Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool2d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748425Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_pool3d_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748599Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_max_unpool3d_grad_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748760Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_mish_cuda PASSED [ 20%]
2023-01-11T23:13:47.0748934Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multi_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749107Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749295Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_multilabel_soft_margin_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749460Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_normalize_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749628Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_circular_cuda PASSED [ 20%]
2023-01-11T23:13:47.0749875Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_constant_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750048Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_reflect_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750250Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pad_replicate_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750415Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pdist_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750588Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_pixel_unshuffle_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750753Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_poisson_nll_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0750915Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu6_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751077Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_relu_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751239Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751417Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softmin_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751582Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_softsign_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751750Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_tanhshrink_cuda PASSED [ 20%]
2023-01-11T23:13:47.0751944Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_triplet_margin_with_distance_loss_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752120Z test_ops.py::TestCommonCUDA::test_out_warning_nn_functional_upsample_bilinear_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752263Z test_ops.py::TestCommonCUDA::test_out_warning_nonzero_cuda XFAIL [ 20%]
2023-01-11T23:13:47.0752412Z test_ops.py::TestCommonCUDA::test_out_warning_norm_inf_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752585Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_native_batch_norm_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752750Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_var_mean_cuda PASSED [ 20%]
2023-01-11T23:13:47.0752944Z test_ops.py::TestCommonCUDA::test_out_warning_ops_nvprims_view_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753090Z test_ops.py::TestCommonCUDA::test_out_warning_ormqr_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753247Z test_ops.py::TestCommonCUDA::test_out_warning_pca_lowrank_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753398Z test_ops.py::TestCommonCUDA::test_out_warning_pinverse_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753541Z test_ops.py::TestCommonCUDA::test_out_warning_polar_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753708Z test_ops.py::TestCommonCUDA::test_out_warning_polygamma_polygamma_n_0_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753851Z test_ops.py::TestCommonCUDA::test_out_warning_pow_cuda PASSED [ 20%]
2023-01-11T23:13:47.0753998Z test_ops.py::TestCommonCUDA::test_out_warning_put_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754141Z test_ops.py::TestCommonCUDA::test_out_warning_qr_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754292Z test_ops.py::TestCommonCUDA::test_out_warning_quantile_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754439Z test_ops.py::TestCommonCUDA::test_out_warning_randn_like_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754587Z test_ops.py::TestCommonCUDA::test_out_warning_ravel_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754725Z test_ops.py::TestCommonCUDA::test_out_warning_real_cuda PASSED [ 20%]
2023-01-11T23:13:47.0754886Z test_ops.py::TestCommonCUDA::test_out_warning_repeat_interleave_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755034Z test_ops.py::TestCommonCUDA::test_out_warning_reshape_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755180Z test_ops.py::TestCommonCUDA::test_out_warning_resize__cuda PASSED [ 20%]
2023-01-11T23:13:47.0755328Z test_ops.py::TestCommonCUDA::test_out_warning_resolve_neg_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755472Z test_ops.py::TestCommonCUDA::test_out_warning_rot90_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755619Z test_ops.py::TestCommonCUDA::test_out_warning_round_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755776Z test_ops.py::TestCommonCUDA::test_out_warning_round_decimals_0_cuda PASSED [ 20%]
2023-01-11T23:13:47.0755937Z test_ops.py::TestCommonCUDA::test_out_warning_rsub_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756086Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756249Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amax_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756406Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_amin_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756570Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_prod_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756730Z test_ops.py::TestCommonCUDA::test_out_warning_scatter_reduce_sum_cuda PASSED [ 20%]
2023-01-11T23:13:47.0756895Z test_ops.py::TestCommonCUDA::test_out_warning_segment_reduce_lengths_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757047Z test_ops.py::TestCommonCUDA::test_out_warning_select_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757192Z test_ops.py::TestCommonCUDA::test_out_warning_select_scatter_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757344Z test_ops.py::TestCommonCUDA::test_out_warning_sigmoid_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757511Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_bartlett_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757674Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_hann_cuda PASSED [ 20%]
2023-01-11T23:13:47.0757836Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_kaiser_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758000Z test_ops.py::TestCommonCUDA::test_out_warning_signal_windows_nuttall_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758146Z test_ops.py::TestCommonCUDA::test_out_warning_sinc_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758292Z test_ops.py::TestCommonCUDA::test_out_warning_slice_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758472Z test_ops.py::TestCommonCUDA::test_out_warning_softmax_with_dtype_cuda PASSED [ 20%]
2023-01-11T23:13:47.0758618Z test_ops.py::TestCommonCUDA::test_out_warning_sort_cuda PASSED [ 21%]
2023-01-11T23:13:47.0758798Z test_ops.py::TestCommonCUDA::test_out_warning_sparse_sampled_addmm_cuda SKIPPED (Skipped!) [ 21%]
2023-01-11T23:13:47.0758957Z test_ops.py::TestCommonCUDA::test_out_warning_special_airy_ai_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759113Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759276Z test_ops.py::TestCommonCUDA::test_out_warning_special_bessel_j1_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759477Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_t_cuda PASSED [ 21%]
2023-01-11T23:13:47.0759851Z test_ops.py::TestCommonCUDA::test_out_warning_special_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0760006Z test_ops.py::TestCommonCUDA::test_out_warning_special_entr_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760176Z test_ops.py::TestCommonCUDA::test_out_warning_special_hermite_polynomial_he_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760350Z test_ops.py::TestCommonCUDA::test_out_warning_special_laguerre_polynomial_l_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760519Z test_ops.py::TestCommonCUDA::test_out_warning_special_modified_bessel_k0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0760697Z test_ops.py::TestCommonCUDA::test_out_warning_special_scaled_modified_bessel_k0_cuda PASSED [ 21%]
2023-01-11T23:13:47.0761039Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_t_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761384Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_u_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761719Z test_ops.py::TestCommonCUDA::test_out_warning_special_shifted_chebyshev_polynomial_v_cuda SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 21%]
2023-01-11T23:13:47.0761881Z test_ops.py::TestCommonCUDA::test_out_warning_special_xlog1py_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762060Z test_ops.py::TestCommonCUDA::test_out_warning_special_zeta_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762203Z test_ops.py::TestCommonCUDA::test_out_warning_split_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762359Z test_ops.py::TestCommonCUDA::test_out_warning_split_list_args_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762516Z test_ops.py::TestCommonCUDA::test_out_warning_split_with_sizes_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762666Z test_ops.py::TestCommonCUDA::test_out_warning_square_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762809Z test_ops.py::TestCommonCUDA::test_out_warning_std_cuda PASSED [ 21%]
2023-01-11T23:13:47.0762967Z test_ops.py::TestCommonCUDA::test_out_warning_std_mean_unbiased_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763112Z test_ops.py::TestCommonCUDA::test_out_warning_sub_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763260Z test_ops.py::TestCommonCUDA::test_out_warning_sum_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763406Z test_ops.py::TestCommonCUDA::test_out_warning_sum_to_size_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763553Z test_ops.py::TestCommonCUDA::test_out_warning_svd_lowrank_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763694Z test_ops.py::TestCommonCUDA::test_out_warning_t_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763845Z test_ops.py::TestCommonCUDA::test_out_warning_take_along_dim_cuda PASSED [ 21%]
2023-01-11T23:13:47.0763992Z test_ops.py::TestCommonCUDA::test_out_warning_tan_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764140Z test_ops.py::TestCommonCUDA::test_out_warning_tensor_split_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764281Z test_ops.py::TestCommonCUDA::test_out_warning_to_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764427Z test_ops.py::TestCommonCUDA::test_out_warning_to_sparse_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764599Z test_ops.py::TestCommonCUDA::test_out_warning_trapz_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764734Z test_ops.py::TestCommonCUDA::test_out_warning_tril_cuda PASSED [ 21%]
2023-01-11T23:13:47.0764884Z test_ops.py::TestCommonCUDA::test_out_warning_true_divide_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765030Z test_ops.py::TestCommonCUDA::test_out_warning_unbind_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765176Z test_ops.py::TestCommonCUDA::test_out_warning_unsqueeze_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765318Z test_ops.py::TestCommonCUDA::test_out_warning_var_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765478Z test_ops.py::TestCommonCUDA::test_out_warning_var_mean_unbiased_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765624Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765771Z test_ops.py::TestCommonCUDA::test_out_warning_view_as_real_cuda PASSED [ 21%]
2023-01-11T23:13:47.0765911Z test_ops.py::TestCommonCUDA::test_out_warning_vstack_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766055Z test_ops.py::TestCommonCUDA::test_out_warning_xlogy_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766201Z test_ops.py::TestCommonCUDA::test_out_warning_zero__cuda PASSED [ 21%]
2023-01-11T23:13:47.0766346Z test_ops.py::TestCommonCUDA::test_out_warning_zeros_like_cuda PASSED [ 21%]
2023-01-11T23:13:47.0766489Z test_ops.py::TestCommonCUDA::test_out_zero__cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0766631Z test_ops.py::TestCommonCUDA::test_out_zeros_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0766777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0766924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_T_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0767097Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0767276Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0767456Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0767651Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0767825Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0767995Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0768165Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bfloat16_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0768334Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0768513Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0768676Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0768852Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0769023Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0769190Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0769357Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_bool_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0769524Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0769694Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0769862Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0770021Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0770212Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_byte_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0770392Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0770570Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0770744Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0770923Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0771093Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0771266Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0771435Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0771602Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0771771Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cdouble_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0771942Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0772123Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0772297Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0772469Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0772641Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0772810Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0772983Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0773148Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_cfloat_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0773345Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0773523Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0773693Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0773864Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0774035Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0774205Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_chalf_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0774376Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0774681Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0774850Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0775020Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0775189Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0775359Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0775525Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_char_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0775696Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0775926Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0776106Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0776278Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0776451Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0776622Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0776794Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0776965Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0777134Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0777306Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_double_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0777474Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0777645Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0777813Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0777984Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0778152Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0778322Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_float_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0778495Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_bfloat16 PASSED [ 21%]
2023-01-11T23:13:47.0778666Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0778837Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0779055Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_half_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0779249Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0779417Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0779589Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0779756Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0779921Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0780088Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0780254Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_int_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0780428Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0780602Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0780766Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0780935Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0781103Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0781266Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0781431Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0781620Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0781782Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_long_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0781953Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_bool PASSED [ 21%]
2023-01-11T23:13:47.0782130Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0782293Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float32 PASSED [ 21%]
2023-01-11T23:13:47.0782464Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0782631Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int16 PASSED [ 21%]
2023-01-11T23:13:47.0782800Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0782967Z test_ops.py::TestCommonCUDA::test_python_ref__refs__conversions_short_cuda_int8 PASSED [ 21%]
2023-01-11T23:13:47.0783131Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0783294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex32 PASSED [ 21%]
2023-01-11T23:13:47.0783450Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0783600Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_float16 PASSED [ 21%]
2023-01-11T23:13:47.0783753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int32 PASSED [ 21%]
2023-01-11T23:13:47.0783905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_int64 PASSED [ 21%]
2023-01-11T23:13:47.0784057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_abs_cuda_uint8 PASSED [ 21%]
2023-01-11T23:13:47.0784219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex128 PASSED [ 21%]
2023-01-11T23:13:47.0784383Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_complex64 PASSED [ 21%]
2023-01-11T23:13:47.0784541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_float64 PASSED [ 21%]
2023-01-11T23:13:47.0784719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0784876Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0785025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acos_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0785182Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0785341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0785496Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0785655Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0785813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0785971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0786133Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0786280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_acosh_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0786436Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0786589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0786754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0786913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0787067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0787222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0787410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0787553Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0787706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0787860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_add_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0788026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0788191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0788352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcdiv_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0788520Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0788683Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0788839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0788992Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addcmul_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0789153Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bfloat16 XFAIL [ 22%]
2023-01-11T23:13:47.0789304Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_bool XFAIL [ 22%]
2023-01-11T23:13:47.0789460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_complex64 XFAIL [ 22%]
2023-01-11T23:13:47.0789617Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_float16 XFAIL [ 22%]
2023-01-11T23:13:47.0789837Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_int64 XFAIL [ 22%]
2023-01-11T23:13:47.0790005Z test_ops.py::TestCommonCUDA::test_python_ref__refs_addr_cuda_uint8 XFAIL [ 22%]
2023-01-11T23:13:47.0790151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0790310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0790465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0790649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0790806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0790956Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0791106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_all_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0791272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0791437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0791590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_allclose_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0791751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0791906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0792062Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0792218Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amax_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0792375Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0792531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0792682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0792828Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0792979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0793156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_amin_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0793311Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0793476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0793634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0793790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0793940Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0794084Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0794234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0794382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_any_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0794547Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0794707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0794865Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0795021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0795177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_arange_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0795333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0795504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0795671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0795831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0795994Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0796150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0796333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0796518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0796703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0796876Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0797050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0797230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0797409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_partial_views_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0797582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0797753Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0797927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0798098Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0798266Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0798429Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0798597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0798763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_as_strided_scatter_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0798947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0799108Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0799262Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0799416Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0799570Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asin_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0799723Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0799884Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0800043Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0800199Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0800354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_asinh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0800513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0800671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0800823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0800969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0801119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan2_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0801273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0801433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0801589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0801746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0801904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0802079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0802224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atan_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0802379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0802541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0802701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0802858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0803012Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0803170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0803327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0803485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atanh_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0803641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0803806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0803966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0804127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0804287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0804445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_1d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0804629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_bool PASSED [ 22%]
2023-01-11T23:13:47.0804795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex128 PASSED [ 22%]
2023-01-11T23:13:47.0804955Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_complex64 PASSED [ 22%]
2023-01-11T23:13:47.0805120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0805281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_float64 PASSED [ 22%]
2023-01-11T23:13:47.0805440Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0805595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0805751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0805910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_2d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0806077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_bfloat16 PASSED [ 22%]
2023-01-11T23:13:47.0806240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_complex32 PASSED [ 22%]
2023-01-11T23:13:47.0806403Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float16 PASSED [ 22%]
2023-01-11T23:13:47.0806561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_float32 PASSED [ 22%]
2023-01-11T23:13:47.0806719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0806875Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0807029Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0807186Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0807345Z test_ops.py::TestCommonCUDA::test_python_ref__refs_atleast_3d_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0807509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0807688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_and_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0807859Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0808026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int32 PASSED [ 22%]
2023-01-11T23:13:47.0808193Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0808358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_left_shift_cuda_uint8 PASSED [ 22%]
2023-01-11T23:13:47.0808518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int16 PASSED [ 22%]
2023-01-11T23:13:47.0808677Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int64 PASSED [ 22%]
2023-01-11T23:13:47.0808839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_not_cuda_int8 PASSED [ 22%]
2023-01-11T23:13:47.0808987Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0809146Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0809300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_or_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0809472Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0809641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0809808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_right_shift_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0809968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0810156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0810317Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0810469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0810627Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bitwise_xor_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0810801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0810971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0811147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0811318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0811487Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0811658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_tensors_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0811813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0811981Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0812149Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0812313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0812477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0812638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_broadcast_to_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0812797Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0812962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0813116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_bucketize_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0813292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0813449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0813602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0813759Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0813920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cat_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0814079Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0814235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0814387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0814647Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0814801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ceil_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0814968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0815124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0815279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_chunk_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0815438Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0815594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0815750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0815903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0816110Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0816267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0816428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0816581Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0816741Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_max_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0816903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0817059Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0817211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0817367Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0817523Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clamp_min_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0817681Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0817839Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0818000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0818156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0818310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0818465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_clone_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0818623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0818781Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0818952Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0819155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0819347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0819524Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0819686Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0819842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0819996Z test_ops.py::TestCommonCUDA::test_python_ref__refs_column_stack_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0820155Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0820310Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0820468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0820623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0820775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0820929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0821100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0821260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0821420Z test_ops.py::TestCommonCUDA::test_python_ref__refs_conj_physical_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0821589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0821784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0821951Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0822115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0822281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0822441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0822602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0822755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_constant_pad_nd_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0822917Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0823080Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0823248Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_complex32 PASSED [ 23%]
2023-01-11T23:13:47.0823410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0823579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0823738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0823897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0824048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_contiguous_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0824211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0824368Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0824529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0824694Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0824879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int32 PASSED [ 23%]
2023-01-11T23:13:47.0825040Z test_ops.py::TestCommonCUDA::test_python_ref__refs_copysign_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0825198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_bfloat16 PASSED [ 23%]
2023-01-11T23:13:47.0825356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex128 PASSED [ 23%]
2023-01-11T23:13:47.0825504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_complex32 PASSED [ 23%]
2023-01-11T23:13:47.0825659Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0825814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_float64 PASSED [ 23%]
2023-01-11T23:13:47.0825970Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_int8 PASSED [ 23%]
2023-01-11T23:13:47.0826124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cos_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0826279Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_bool PASSED [ 23%]
2023-01-11T23:13:47.0826438Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float16 PASSED [ 23%]
2023-01-11T23:13:47.0826591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_float32 PASSED [ 23%]
2023-01-11T23:13:47.0826738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int16 PASSED [ 23%]
2023-01-11T23:13:47.0826890Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_int64 PASSED [ 23%]
2023-01-11T23:13:47.0827046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cosh_cuda_uint8 PASSED [ 23%]
2023-01-11T23:13:47.0827207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_complex64 PASSED [ 23%]
2023-01-11T23:13:47.0827365Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0827594Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0827746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0827901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0828048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_cumsum_cuda_int8 PASSED [ 23%] 2023-01-11T23:13:47.0828206Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0828364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_complex64 PASSED [ 23%] 2023-01-11T23:13:47.0828522Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0828679Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0828831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0828986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0829139Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0829289Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0829441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_cuda_int8 PASSED [ 23%] 2023-01-11T23:13:47.0829605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0829837Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0830011Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0830172Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diag_embed_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0830340Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0830509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0830707Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_complex32 PASSED [ 23%] 2023-01-11T23:13:47.0830863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0831027Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0831187Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_copy_cuda_int64 PASSED [ 23%] 2023-01-11T23:13:47.0831343Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0831502Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0831662Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float32 PASSED [ 23%] 2023-01-11T23:13:47.0831826Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0831985Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0832138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int32 PASSED [ 23%] 2023-01-11T23:13:47.0832293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_int8 PASSED [ 23%] 
2023-01-11T23:13:47.0832449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0832621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 23%] 2023-01-11T23:13:47.0832790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0832959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_float64 PASSED [ 23%] 2023-01-11T23:13:47.0833127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_int16 PASSED [ 23%] 2023-01-11T23:13:47.0833320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_diagonal_scatter_cuda_uint8 PASSED [ 23%] 2023-01-11T23:13:47.0833478Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_bool PASSED [ 23%] 2023-01-11T23:13:47.0833633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float16 PASSED [ 23%] 2023-01-11T23:13:47.0833791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0833946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0834102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0834253Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0834410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0834565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_digamma_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0834742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0834901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0835071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0835237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0835398Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_floor_rounding_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0835566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0835746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0835922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0836093Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0836263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0836452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_no_rounding_mode_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0836623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0836793Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0836963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0837132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0837301Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0837470Z test_ops.py::TestCommonCUDA::test_python_ref__refs_div_trunc_rounding_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0837631Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0837791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0837950Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0838109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0838264Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0838422Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0838579Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0838740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dsplit_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0838922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0839071Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0839227Z test_ops.py::TestCommonCUDA::test_python_ref__refs_dstack_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0839433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0839634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0839833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_cuda_int32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_bool SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840660Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0840863Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841056Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841258Z test_ops.py::TestCommonCUDA::test_python_ref__refs_empty_like_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 24%] 2023-01-11T23:13:47.0841412Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0841572Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0841733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0841889Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0842068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eq_cuda_uint8 PASSED [ 24%] 
2023-01-11T23:13:47.0842228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0842380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0842530Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0842682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0842833Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0842986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erf_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0843142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0843300Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0843456Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0843609Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0843756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0843914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfc_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0844075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0844230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0844384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0844562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0844717Z test_ops.py::TestCommonCUDA::test_python_ref__refs_erfinv_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0844870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0845019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0845170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0845323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0845475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0845623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_exp_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0845791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0845957Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0846114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_as_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0846272Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0846432Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0846586Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expand_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0846743Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0846895Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0847048Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0847202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0847357Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0847507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_expm1_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0847658Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0847843Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0848001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0848156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0848308Z test_ops.py::TestCommonCUDA::test_python_ref__refs_eye_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0848466Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0848630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0848794Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0848977Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0849157Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0849315Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft2_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0849476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0849640Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0849800Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0849958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0850113Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0850260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0850442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fft_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0850598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0850763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0850927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0851090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0851249Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0851407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0851557Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0851716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftn_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0851883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_bfloat16 PASSED [ 24%] 2023-01-11T23:13:47.0852056Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0852224Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0852390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0852552Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0852715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_fftshift_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0852872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0853025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0853189Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0853345Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0853525Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int16 PASSED [ 24%] 2023-01-11T23:13:47.0853680Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0853838Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft2_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0854001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0854159Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0854312Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0854467Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfft_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0854739Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0854905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0855072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0855232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0855392Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0855549Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_int32 PASSED [ 24%] 2023-01-11T23:13:47.0855696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_hfftn_cuda_uint8 PASSED [ 24%] 2023-01-11T23:13:47.0855856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_bool PASSED [ 24%] 2023-01-11T23:13:47.0856019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0856219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0856379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0856537Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0856693Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float32 PASSED [ 24%] 2023-01-11T23:13:47.0856853Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft2_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0857016Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex128 PASSED [ 24%] 2023-01-11T23:13:47.0857171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0857333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0857494Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0857653Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_float64 PASSED [ 24%] 2023-01-11T23:13:47.0857815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int64 PASSED [ 24%] 2023-01-11T23:13:47.0857974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifft_cuda_int8 PASSED [ 24%] 2023-01-11T23:13:47.0858134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex32 PASSED [ 24%] 2023-01-11T23:13:47.0858292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_complex64 PASSED [ 24%] 2023-01-11T23:13:47.0858445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_float16 PASSED [ 24%] 2023-01-11T23:13:47.0858605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0858762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0858947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0859142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0859341Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0859506Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ifftshift_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0859667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0859829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0859980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0860137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0860294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0860457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0860616Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0860777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0860934Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0861092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0861242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_ihfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0861406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0861563Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0861719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0861903Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft2_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0862072Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0862234Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0862393Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0862544Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0862704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0862860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0863013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0863178Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0863335Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0863501Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0863662Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0863822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0863972Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0864128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0864288Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0864446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0864608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_irfftn_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0864769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0864961Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft2_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0865127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0865278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0865435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0865593Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0865746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0865900Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfft_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0866055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0866216Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0866384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0866531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fft_rfftn_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0866688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_bfloat16 PASSED [ 25%]
2023-01-11T23:13:47.0866842Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0867000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0867156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0867309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fill_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0867511Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0867670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0867828Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0867979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0868137Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flatten_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0868292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0868451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0868611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0868770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0868925Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0869082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0869230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flip_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0869388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0869545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0869751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fliplr_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0869909Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0870074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0870232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0870396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0870545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0870730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0870887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0871041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0871201Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0871361Z test_ops.py::TestCommonCUDA::test_python_ref__refs_flipud_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0871529Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0871694Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0871854Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0872021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0872185Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0872347Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0872511Z test_ops.py::TestCommonCUDA::test_python_ref__refs_float_power_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0872670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0872829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0872990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0873147Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0873326Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0873493Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0873661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0873823Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0873982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_floor_divide_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0874145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0874301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0874460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0874608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0874768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0874927Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmax_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0875087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0875240Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0875396Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0875551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0875706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmin_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0875860Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876014Z test_ops.py::TestCommonCUDA::test_python_ref__refs_fmod_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0876176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_frac_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0876512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0876664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gcd_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0876818Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0876967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0877111Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0877265Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0877418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0877566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ge_cuda_uint8 PASSED [ 25%] 2023-01-11T23:13:47.0877721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0877873Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0878028Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0878176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0878316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0878465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0878612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0878763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_gt_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0878950Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0879161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0879322Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0879489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0879649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0879803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0879959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0880121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_heaviside_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0880276Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0880434Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_bool PASSED [ 25%] 2023-01-11T23:13:47.0880596Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0880755Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0880924Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_complex64 PASSED [ 25%] 2023-01-11T23:13:47.0881075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_float16 PASSED [ 25%] 2023-01-11T23:13:47.0881231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0881388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hsplit_cuda_int8 PASSED [ 25%] 2023-01-11T23:13:47.0881551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bfloat16 PASSED [ 25%] 2023-01-11T23:13:47.0881706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_bool PASSED [ 25%]
2023-01-11T23:13:47.0881864Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex128 PASSED [ 25%] 2023-01-11T23:13:47.0882026Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_complex32 PASSED [ 25%] 2023-01-11T23:13:47.0882186Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float32 PASSED [ 25%] 2023-01-11T23:13:47.0882360Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_float64 PASSED [ 25%] 2023-01-11T23:13:47.0882519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int16 PASSED [ 25%] 2023-01-11T23:13:47.0882670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int32 PASSED [ 25%] 2023-01-11T23:13:47.0882820Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int64 PASSED [ 25%] 2023-01-11T23:13:47.0882971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0883128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hstack_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0883287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0883445Z test_ops.py::TestCommonCUDA::test_python_ref__refs_hypot_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0883598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0883747Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0883897Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0884046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0884198Z test_ops.py::TestCommonCUDA::test_python_ref__refs_i0_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0884353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igamma_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0884510Z test_ops.py::TestCommonCUDA::test_python_ref__refs_igammac_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0884665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_imag_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0884851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0885007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex32 XFAIL [ 26%] 2023-01-11T23:13:47.0885170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0885329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0885489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_float64 XFAIL [ 26%] 2023-01-11T23:13:47.0885641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0885792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0885948Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_add_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0886109Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bfloat16 XFAIL [ 26%] 2023-01-11T23:13:47.0886257Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_bool XFAIL [ 26%] 2023-01-11T23:13:47.0886419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0886582Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float16 XFAIL [ 26%]
2023-01-11T23:13:47.0886740Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_float32 XFAIL [ 26%] 2023-01-11T23:13:47.0886898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0887052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0887202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0887358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_copy_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0887514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0887682Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_complex64 XFAIL [ 26%] 2023-01-11T23:13:47.0887862Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0888021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0888176Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0888333Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_int8 XFAIL [ 26%] 2023-01-11T23:13:47.0888489Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_fill_cuda_uint8 XFAIL [ 26%] 2023-01-11T23:13:47.0888656Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_complex128 XFAIL [ 26%] 2023-01-11T23:13:47.0888816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float16 XFAIL [ 26%] 2023-01-11T23:13:47.0888973Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_float64 XFAIL [ 26%] 2023-01-11T23:13:47.0889129Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int16 XFAIL [ 26%] 2023-01-11T23:13:47.0889284Z test_ops.py::TestCommonCUDA::test_python_ref__refs_index_select_cuda_int32 XFAIL [ 26%] 2023-01-11T23:13:47.0889446Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0889602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0889756Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0889914Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isclose_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0890074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0890231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0890414Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0890577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0890737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0890899Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0891055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0891212Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isfinite_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0891366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0891518Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0891678Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0891835Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0891990Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0892144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0892299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0892453Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isinf_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0892608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0892767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0892919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0893075Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0893231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0893382Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0893565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0893719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isnan_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0893880Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0894041Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0894191Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0894350Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0894610Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0894770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isneginf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0894931Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0895090Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0895252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0895409Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0895561Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0895721Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0895883Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0896039Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0896237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isposinf_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0896395Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0896556Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0896719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0896874Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0897020Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0897173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0897329Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0897480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0897638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0897795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_isreal_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0897949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0898103Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0898247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0898397Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lcm_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0898550Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0898706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0898856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0899008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_le_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0899166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0899358Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lerp_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0899509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0899664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0899815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0899969Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0900123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0900280Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lgamma_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0900454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0900631Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0900799Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_matrix_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0900965Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0901126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0901291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svd_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0901463Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0901630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_svdvals_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0901805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0902000Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0902173Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linalg_vector_norm_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0902334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0902503Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0902661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_linspace_cuda_int64 XFAIL [ 26%] 2023-01-11T23:13:47.0902815Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0902967Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0903123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_float16 PASSED [ 26%] 2023-01-11T23:13:47.0903278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0903434Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0903578Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0903733Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log10_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0903887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0904054Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0904207Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0904359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0904512Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log1p_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0904671Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex128 PASSED [ 26%] 2023-01-11T23:13:47.0904826Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_complex64 PASSED [ 26%] 2023-01-11T23:13:47.0904981Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0905156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int16 PASSED [ 26%] 2023-01-11T23:13:47.0905306Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int32 PASSED [ 26%] 2023-01-11T23:13:47.0905457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int64 PASSED [ 26%] 2023-01-11T23:13:47.0905609Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0905763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log2_cuda_uint8 PASSED [ 26%] 2023-01-11T23:13:47.0905921Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0906074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_bool PASSED [ 26%] 2023-01-11T23:13:47.0906219Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_cuda_int8 PASSED [ 26%] 2023-01-11T23:13:47.0906401Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 26%] 2023-01-11T23:13:47.0906585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 26%] 2023-01-11T23:13:47.0906762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 26%] 2023-01-11T23:13:47.0906933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 26%] 2023-01-11T23:13:47.0907106Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0907274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_log_softmax_with_dtype_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0907441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0907639Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0907803Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0907966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0908126Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_and_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0908291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0908452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0908608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0908763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0908926Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0909082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_not_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0909241Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0909407Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0909569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0909784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0909946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0910105Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_or_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0910273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0910433Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0910591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0910775Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logical_xor_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0910938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0911102Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0911261Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0911419Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int16 XFAIL [ 27%] 2023-01-11T23:13:47.0911572Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int32 XFAIL [ 27%] 2023-01-11T23:13:47.0911719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logspace_cuda_int64 XFAIL [ 27%] 2023-01-11T23:13:47.0911885Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0912046Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0912206Z test_ops.py::TestCommonCUDA::test_python_ref__refs_logsumexp_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0912362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0912514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0912666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0912816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0912968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_lt_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0913121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0913314Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0913476Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0913643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0913801Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0913959Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0914120Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0914277Z test_ops.py::TestCommonCUDA::test_python_ref__refs_masked_fill_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0914429Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0914590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0914747Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0914905Z test_ops.py::TestCommonCUDA::test_python_ref__refs_maximum_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0915064Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mean_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0915244Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0915428Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0915612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0915792Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0915964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0916145Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0916318Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_list_of_tensors_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0916566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0916752Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0916936Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0917118Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0917293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0917465Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0917637Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0917807Z test_ops.py::TestCommonCUDA::test_python_ref__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0917968Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0918127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0918286Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0918441Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0918595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0918750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0918919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_minimum_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0919128Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0919293Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0919451Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0919605Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0919758Z test_ops.py::TestCommonCUDA::test_python_ref__refs_movedim_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0919916Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0920074Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0920222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0920374Z test_ops.py::TestCommonCUDA::test_python_ref__refs_mul_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0920539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0920696Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0920856Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0921009Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0921167Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0921320Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nan_to_num_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0921485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0921646Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0946166Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0946360Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0946645Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0946811Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0946974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_copy_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0947134Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0947295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0947449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0947604Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0947762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0947920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0948077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_narrow_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0948247Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0948417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0948583Z test_ops.py::TestCommonCUDA::test_python_ref__refs_native_layer_norm_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0948734Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0948887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_bool PASSED [ 27%] 2023-01-11T23:13:47.0949044Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0949238Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0949391Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0949545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0949785Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0949939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ne_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0950100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0950251Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0950402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0950552Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_neg_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0950754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0950958Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951162Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951556Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 27%] 2023-01-11T23:13:47.0951769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0951984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952203Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_empty_strided_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 27%] 2023-01-11T23:13:47.0952607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0952769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0952928Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0953083Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0953232Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int32 PASSED [ 27%] 2023-01-11T23:13:47.0953381Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_full_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0953542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_complex32 PASSED [ 27%] 2023-01-11T23:13:47.0953704Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float16 PASSED [ 27%] 2023-01-11T23:13:47.0953862Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0954017Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0954167Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0954316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_ones_cuda_uint8 PASSED [ 27%] 2023-01-11T23:13:47.0954475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_bfloat16 PASSED [ 27%] 2023-01-11T23:13:47.0954638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex128 PASSED [ 27%] 2023-01-11T23:13:47.0954825Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_complex64 PASSED [ 27%] 2023-01-11T23:13:47.0954986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0955150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0955309Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int16 PASSED [ 27%] 2023-01-11T23:13:47.0955463Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int64 PASSED [ 27%] 2023-01-11T23:13:47.0955619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_new_zeros_cuda_int8 PASSED [ 27%] 2023-01-11T23:13:47.0955779Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0955942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nextafter_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0956168Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_alpha_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0956340Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float32 PASSED [ 27%] 2023-01-11T23:13:47.0956513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_celu_cuda_float64 PASSED [ 27%] 2023-01-11T23:13:47.0956731Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0956946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 27%] 2023-01-11T23:13:47.0957117Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_elu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0957289Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0957458Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_gelu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0957629Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0957795Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0957988Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_glu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0958170Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0958348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_group_norm_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0958527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0958705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0958882Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0959057Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0959230Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0959400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0959587Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0959767Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0959943Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0960115Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0960292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0960492Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0960670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0960841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0961008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0961180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0961354Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0961531Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0961703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0961898Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0962086Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0962274Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0962460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0962647Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0962832Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0963016Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0963204Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0963415Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0963595Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0963773Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0963946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0964116Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mish_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0964295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0964468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0964641Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0964814Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0965002Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0965188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0965370Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0965553Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0965739Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0965946Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0966124Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pairwise_distance_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0966297Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_pdist_cuda_float32 XFAIL [ 28%] 2023-01-11T23:13:47.0966481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0966664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0966834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_prelu_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0967008Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0967177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0967348Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu6_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0967521Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0967688Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0967858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0968025Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0968192Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0968356Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0968518Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_relu_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0968690Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_selu_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0968879Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0969091Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0969275Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0969459Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int16 PASSED [ 28%] 
2023-01-11T23:13:47.0969642Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0969819Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0970001Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmax_with_dtype_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0970202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0970447Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0970673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0970891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0971069Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softmin_with_dtype_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0971246Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0971418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softplus_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0971654Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_softshrink_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0971841Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0972019Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0972194Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0972368Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0972541Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0972716Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0972893Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0973065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_threshold_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0973260Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0973447Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0973630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0973813Z test_ops.py::TestCommonCUDA::test_python_ref__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0973974Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0974132Z test_ops.py::TestCommonCUDA::test_python_ref__refs_norm_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0974287Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0974449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0974732Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0974949Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0975156Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0975349Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ones_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0975509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0975667Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0975818Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0975975Z test_ops.py::TestCommonCUDA::test_python_ref__refs_permute_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0976142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0976303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0976464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0976621Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0976777Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0976929Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0977077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_positive_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0977231Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0977384Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0977585Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex32 XFAIL [ 28%] 2023-01-11T23:13:47.0977737Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0977891Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0978042Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_float64 PASSED [ 28%] 2023-01-11T23:13:47.0978197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_pow_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0978352Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0978505Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0978654Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0978806Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0978962Z test_ops.py::TestCommonCUDA::test_python_ref__refs_prod_cuda_int8 PASSED [ 28%] 2023-01-11T23:13:47.0979138Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0979321Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0979483Z test_ops.py::TestCommonCUDA::test_python_ref__refs_randn_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0979634Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0979786Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_complex32 PASSED [ 28%] 2023-01-11T23:13:47.0979938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0980087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0980237Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0980390Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0980538Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0980713Z test_ops.py::TestCommonCUDA::test_python_ref__refs_ravel_cuda_uint8 PASSED [ 28%] 2023-01-11T23:13:47.0980866Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0981021Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0981177Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0981331Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float16 PASSED [ 28%] 2023-01-11T23:13:47.0981482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_float32 PASSED [ 28%] 2023-01-11T23:13:47.0981633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_real_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0981804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0981971Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex128 PASSED [ 28%] 2023-01-11T23:13:47.0982142Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_complex64 PASSED [ 28%] 2023-01-11T23:13:47.0982299Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int16 PASSED [ 28%] 2023-01-11T23:13:47.0982460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reciprocal_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0982623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0982780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_remainder_cuda_int64 PASSED [ 28%] 2023-01-11T23:13:47.0982933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bfloat16 PASSED [ 28%] 2023-01-11T23:13:47.0983087Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_bool PASSED [ 28%] 2023-01-11T23:13:47.0983273Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int32 PASSED [ 28%] 2023-01-11T23:13:47.0983426Z test_ops.py::TestCommonCUDA::test_python_ref__refs_repeat_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0983592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0983750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0983910Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0984065Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0984221Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0984377Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_as_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0984535Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0984693Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0984855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0985015Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0985171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0985323Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0985480Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0985636Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0985790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_reshape_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0985942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0986097Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0986252Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0986435Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0986591Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0986742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0986894Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0987048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0987200Z test_ops.py::TestCommonCUDA::test_python_ref__refs_roll_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0987351Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0987509Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0987664Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0987816Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0987966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0988114Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0988267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rot90_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0988422Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0988577Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0988729Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0988901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0989053Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0989208Z test_ops.py::TestCommonCUDA::test_python_ref__refs_round_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0989365Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0989518Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0989676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0989906Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0990060Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0990209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0990366Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0990519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0990673Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsqrt_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0990827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0990978Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0991130Z test_ops.py::TestCommonCUDA::test_python_ref__refs_rsub_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0991282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0991437Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0991590Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0991746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0991899Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0992077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0992228Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0992379Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sgn_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0992537Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0992689Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0992851Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0993013Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0993174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0993327Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0993486Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0993638Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0993790Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sigmoid_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0993941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0994092Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0994243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0994391Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0994565Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0994718Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sign_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0994877Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0995030Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0995184Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0995331Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0995482Z test_ops.py::TestCommonCUDA::test_python_ref__refs_signbit_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0995632Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.0995784Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0995938Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0996088Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0996239Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sin_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0996389Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.0996540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0996691Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.0996840Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0996991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0997143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0997295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinc_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0997452Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0997608Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.0997791Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_complex32 PASSED [ 29%] 2023-01-11T23:13:47.0997941Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0998089Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0998236Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sinh_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.0998406Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.0998574Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.0998742Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0998911Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0999076Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.0999238Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.0999402Z test_ops.py::TestCommonCUDA::test_python_ref__refs_softmax_with_dtype_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.0999566Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.0999727Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j0_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.0999888Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_bessel_j1_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1000048Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1000242Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1000400Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1000558Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1000715Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1000870Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_entr_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1001033Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1001195Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_erfcx_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1001353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1001513Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1001670Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1001829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1001986Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1002143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i0e_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1002298Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1002454Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1002607Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1002762Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1002920Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1003077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1003255Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1003411Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_i1e_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1003575Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1003738Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_ndtr_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1003922Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.1004106Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 29%] 2023-01-11T23:13:47.1004294Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 29%] 2023-01-11T23:13:47.1004475Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1004655Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1004827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1004991Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bfloat16 PASSED [ 29%] 2023-01-11T23:13:47.1005148Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_bool PASSED [ 29%] 2023-01-11T23:13:47.1005309Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1005468Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1005651Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1005809Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_logit_cuda_int8 PASSED [ 29%] 2023-01-11T23:13:47.1005998Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1006180Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1006362Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1006540Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float16 PASSED [ 29%] 2023-01-11T23:13:47.1006719Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float32 PASSED [ 29%] 2023-01-11T23:13:47.1006901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 29%] 2023-01-11T23:13:47.1007085Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 29%] 2023-01-11T23:13:47.1007267Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 29%] 2023-01-11T23:13:47.1007444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 29%] 2023-01-11T23:13:47.1007618Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 29%] 2023-01-11T23:13:47.1007800Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1007976Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1008151Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1008316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1008477Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1008665Z
test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1008831Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtr_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1009018Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1009197Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1009351Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_ndtri_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1009528Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1009703Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1009887Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1010067Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1010243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1010417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1010592Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1010770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1010942Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1011150Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1011316Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1011481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1011644Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1011804Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1011964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1012123Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_xlog1py_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1012278Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1012504Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 30%] 2023-01-11T23:13:47.1012666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1012827Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1012984Z test_ops.py::TestCommonCUDA::test_python_ref__refs_special_zeta_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1013140Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1013295Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1013448Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1013598Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1013746Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sqrt_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1013903Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1014061Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1014235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1014388Z test_ops.py::TestCommonCUDA::test_python_ref__refs_square_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1014649Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1014808Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1014964Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1015115Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1015261Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1015418Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1015568Z test_ops.py::TestCommonCUDA::test_python_ref__refs_squeeze_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1015728Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1015884Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1016037Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1016188Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1016337Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1016485Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1016635Z test_ops.py::TestCommonCUDA::test_python_ref__refs_stack_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1016834Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1016982Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1017145Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1017305Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1017460Z test_ops.py::TestCommonCUDA::test_python_ref__refs_std_mean_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1017612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1017763Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1017913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1018060Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1018211Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sub_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1018359Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1018515Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1018665Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1018814Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1018980Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1019143Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1019313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1019469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1019633Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1019789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1019979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1020136Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1020291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1020443Z test_ops.py::TestCommonCUDA::test_python_ref__refs_sum_to_size_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1020589Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1020735Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1020881Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1021031Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1021174Z test_ops.py::TestCommonCUDA::test_python_ref__refs_t_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1021334Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1021488Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1021636Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_complex64 PASSED [ 30%] 2023-01-11T23:13:47.1021786Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1021933Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1022077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1022222Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tan_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1022394Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1022551Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1022709Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1022858Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1023007Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1023154Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_float64 
PASSED [ 30%] 2023-01-11T23:13:47.1023303Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1023449Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tanh_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1023610Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_bfloat16 XFAIL [ 30%] 2023-01-11T23:13:47.1023768Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float16 XFAIL [ 30%] 2023-01-11T23:13:47.1023928Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tensor_split_cuda_float64 XFAIL [ 30%] 2023-01-11T23:13:47.1024082Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1024225Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1024371Z test_ops.py::TestCommonCUDA::test_python_ref__refs_to_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1024527Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1024676Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1024829Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1024979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1025127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1025281Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1025427Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1025602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1025751Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trace_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1025913Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1026068Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1026229Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1026387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1026545Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1026701Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1026855Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1027010Z test_ops.py::TestCommonCUDA::test_python_ref__refs_transpose_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1027161Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1027313Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1027462Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1027612Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1027770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_tril_indices_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1027919Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_complex64 PASSED [ 30%] 
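[editor's note] One entry above was skipped with the reason "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test". The environment variable name and the test id come verbatim from the log; everything else below is a minimal sketch of how that single test could be re-run locally from a pytorch checkout, assuming pytest is used as the runner (the log's output format is pytest's), not something this log itself shows.

    import os
    # PYTORCH_TEST_WITH_SLOW must be set before pytest imports the test module,
    # since PyTorch's test harness reads it at import time.
    os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

    import pytest
    # Re-run only the slow-skipped test above, selected by its name substring.
    pytest.main(["test_ops.py", "-k", "test_python_ref__refs_special_zeta_cuda_float64", "-v"])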
2023-01-11T23:13:47.1028144Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1028291Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1028444Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1028602Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1028757Z test_ops.py::TestCommonCUDA::test_python_ref__refs_triu_indices_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1028918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1029078Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_complex32 XFAIL [ 30%] 2023-01-11T23:13:47.1029235Z test_ops.py::TestCommonCUDA::test_python_ref__refs_true_divide_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1029387Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_bfloat16 PASSED [ 30%] 2023-01-11T23:13:47.1029539Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1029750Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1029904Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int32 PASSED [ 30%] 2023-01-11T23:13:47.1030052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1030202Z test_ops.py::TestCommonCUDA::test_python_ref__refs_trunc_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1030359Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_complex128 PASSED [ 30%] 2023-01-11T23:13:47.1030514Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float16 PASSED [ 30%] 2023-01-11T23:13:47.1030661Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float32 PASSED [ 30%] 2023-01-11T23:13:47.1030810Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_float64 PASSED [ 30%] 2023-01-11T23:13:47.1030963Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_int8 PASSED [ 30%] 2023-01-11T23:13:47.1031119Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unbind_cuda_uint8 PASSED [ 30%] 2023-01-11T23:13:47.1031301Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_bool PASSED [ 30%] 2023-01-11T23:13:47.1031464Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_complex32 PASSED [ 30%] 2023-01-11T23:13:47.1031619Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int16 PASSED [ 30%] 2023-01-11T23:13:47.1031770Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_int64 PASSED [ 30%] 2023-01-11T23:13:47.1031918Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unflatten_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1032077Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1032243Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1032405Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1032569Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1032730Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1032892Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1033051Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1033210Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1033364Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_copy_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1033519Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1033706Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1033861Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1034018Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1034175Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1034330Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unfold_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1034488Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1034643Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1034805Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1034966Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1035127Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1035282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1035442Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1035597Z test_ops.py::TestCommonCUDA::test_python_ref__refs_unsqueeze_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1035754Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1035901Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1036055Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1036217Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1036380Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1036542Z test_ops.py::TestCommonCUDA::test_python_ref__refs_var_mean_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1036705Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1036892Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1037052Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1037209Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1037353Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_as_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1037507Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1037666Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_complex32 PASSED [ 31%] 
2023-01-11T23:13:47.1037822Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1037979Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1038133Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1038292Z test_ops.py::TestCommonCUDA::test_python_ref__refs_view_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1038457Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1038611Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1038769Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1038947Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1039121Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1039282Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vsplit_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1039469Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1039623Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1039789Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1039939Z test_ops.py::TestCommonCUDA::test_python_ref__refs_vstack_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1040097Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1040254Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1040410Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1040562Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1040714Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1040872Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1041022Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1041171Z test_ops.py::TestCommonCUDA::test_python_ref__refs_where_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1041328Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1041481Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1041630Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1041780Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1041937Z test_ops.py::TestCommonCUDA::test_python_ref__refs_xlogy_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1042100Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1042263Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1042417Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1042588Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1042743Z 
test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int16 PASSED [ 31%] 2023-01-11T23:13:47.1042896Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1043050Z test_ops.py::TestCommonCUDA::test_python_ref__refs_zeros_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1043204Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_T_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043379Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs__conversions_complex_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043533Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amax_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043689Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_amin_cuda PASSED [ 31%] 2023-01-11T23:13:47.1043852Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_as_strided_scatter_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044012Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_atan2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044176Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_and_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044342Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_left_shift_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044503Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_bitwise_or_cuda PASSED [ 31%] 2023-01-11T23:13:47.1044655Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_cat_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1044813Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_clamp_max_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1044967Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_diag_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045159Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_no_rounding_mode_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045331Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_div_trunc_rounding_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045490Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_eye_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045648Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_fft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045807Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1045967Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046129Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_ihfftn_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046288Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_irfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046448Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft2_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046602Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfft_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046761Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_fft_rfftn_cuda PASSED [ 31%] 2023-01-11T23:13:47.1046923Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_flipud_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047084Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_float_power_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047240Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_gcd_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047401Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_heaviside_cuda PASSED [ 31%] 2023-01-11T23:13:47.1047560Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hsplit_cuda PASSED [ 31%] 
2023-01-11T23:13:47.1047717Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_hstack_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1047867Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_isclose_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048024Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_le_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048187Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_linspace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048375Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logical_xor_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048538Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_logspace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1048694Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_maximum_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1048849Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_movedim_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049011Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_copy_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049161Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_narrow_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049333Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_native_layer_norm_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049490Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_ne_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049669Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_group_norm_cuda PASSED [ 31%] 2023-01-11T23:13:47.1049848Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_l1_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050032Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_poisson_nll_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050211Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_softshrink_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050401Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_nn_functional_triplet_margin_loss_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050557Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_pow_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050711Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_remainder_cuda PASSED [ 31%] 2023-01-11T23:13:47.1050904Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_roll_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051058Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rot90_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051216Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_rsub_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051385Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_special_xlog1py_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051537Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_t_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051693Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_trace_cuda PASSED [ 31%] 2023-01-11T23:13:47.1051847Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_tril_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052002Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_true_divide_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052164Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_as_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052319Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_view_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052475Z test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_vstack_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1052630Z 
test_ops.py::TestCommonCUDA::test_python_ref_errors__refs_where_cuda XFAIL [ 31%] 2023-01-11T23:13:47.1052793Z test_ops.py::TestCommonCUDA::test_python_ref_errors_ops_nvprims_view_cuda PASSED [ 31%] 2023-01-11T23:13:47.1052978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex128 PASSED [ 31%] 2023-01-11T23:13:47.1053160Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1053341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_complex64 PASSED [ 31%] 2023-01-11T23:13:47.1053513Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1053692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_int64 PASSED [ 31%] 2023-01-11T23:13:47.1053863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_aten_cuda_uint8 PASSED [ 31%] 2023-01-11T23:13:47.1054095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1054305Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1054889Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 31%] 2023-01-11T23:13:47.1055099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1055301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1055629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 31%] 2023-01-11T23:13:47.1055825Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_T_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 31%] 2023-01-11T23:13:47.1056025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bfloat16 PASSED [ 31%] 2023-01-11T23:13:47.1056221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_bool PASSED [ 31%] 2023-01-11T23:13:47.1056420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_complex32 PASSED [ 31%] 2023-01-11T23:13:47.1056615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float16 PASSED [ 31%] 2023-01-11T23:13:47.1056811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float32 PASSED [ 31%] 2023-01-11T23:13:47.1057047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_float64 PASSED [ 31%] 2023-01-11T23:13:47.1057245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int32 PASSED [ 31%] 2023-01-11T23:13:47.1057439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_aten_cuda_int8 PASSED [ 31%] 2023-01-11T23:13:47.1057666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 
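[editor's note] The test_python_ref_errors entries above feed invalid inputs to each op. A minimal sketch of the idea being checked (an assumption about intent, not this suite's actual code): the eager op and its Python reference under torch._refs should reject the same bad input with a comparable error. torch.t / torch._refs.t is used here only because its error test appears in the listing above.

    import torch
    import torch._refs as refs  # the Python reference implementations exercised above

    x = torch.ones(2, 2, 2)  # invalid for t(), which expects a tensor with <= 2 dimensions
    for name, fn in (("torch.t", torch.t), ("torch._refs.t", refs.t)):
        try:
            fn(x)
        except RuntimeError as err:
            # both the eager op and its reference are expected to reject this input
            print(name, "raised RuntimeError:", err)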
2023-01-11T23:13:47.1058026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1058255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1058475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bfloat16_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1058673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1058867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1059059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1059252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1059439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1059662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1060056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.bool doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1060402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1060754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.bool doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1060972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1061186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_bool_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1061384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1061580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:13:47.1061776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1061969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1062161Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1062349Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1062532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1062774Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1062995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1063658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1064017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.byte doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1064367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1064721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.byte doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1064938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1065278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_byte_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1065484Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1065678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1065902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1066099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1066294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1066516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1066740Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1067339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1067910Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cdouble_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1068110Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1068301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1068497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1068719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1068952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1069163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1069355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1069546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1069808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1070035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1070839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1071062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1071410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1071759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_cfloat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1071989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1072189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1072386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1072581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1072775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_float32 PASSED [ 
32%] 2023-01-11T23:13:47.1072963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1073147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1073512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 32%] 2023-01-11T23:13:47.1073733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.chalf doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1074312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_chalf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1074760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1074959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1075155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1075353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1075538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1075727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1075920Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1076146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1076368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1076712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1076928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1077277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_char_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1077478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1077729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1077951Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_complex_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1078145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1078341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1078534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1078729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1078923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1079146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1079807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1080395Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_double_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1080813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1081010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_complex32 PASSED [ 32%] 2023-01-11T23:13:47.1081207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1081397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1081584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1081772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_aten_cuda_uint8 PASSED [ 32%] 2023-01-11T23:13:47.1081989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1082210Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1082548Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) 
[ 32%] 2023-01-11T23:13:47.1082907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.float doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1083125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_float_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1083350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1083545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1083738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1083925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1084114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1084335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1084559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1084782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1085134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.half doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1085479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 32%] 2023-01-11T23:13:47.1085835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.half doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1086079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_half_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1086278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1086470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1086657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float32 PASSED [ 32%] 2023-01-11T23:13:47.1086842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1087028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1087212Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1087389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1087580Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1087939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.int doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1088153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1088500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.int doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1088846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_int_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 32%] 2023-01-11T23:13:47.1089046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bfloat16 PASSED [ 32%] 2023-01-11T23:13:47.1089269Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_bool PASSED [ 32%] 2023-01-11T23:13:47.1089471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1089666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float16 PASSED [ 32%] 2023-01-11T23:13:47.1089851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_float64 PASSED [ 32%] 2023-01-11T23:13:47.1090043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int16 PASSED [ 32%] 2023-01-11T23:13:47.1090232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int32 PASSED [ 32%] 2023-01-11T23:13:47.1090419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1090640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1090859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1091216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_long_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.long doesn't support nvfuser) [ 32%] 2023-01-11T23:13:47.1091416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex128 PASSED [ 32%] 2023-01-11T23:13:47.1091612Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_complex64 PASSED [ 32%] 2023-01-11T23:13:47.1091842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int64 PASSED [ 32%] 2023-01-11T23:13:47.1092030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_aten_cuda_int8 PASSED [ 32%] 2023-01-11T23:13:47.1092250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 32%] 2023-01-11T23:13:47.1092467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 32%] 
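[editor's note] Most executor_nvfuser variants above are skipped, either "skipped for speed" or because "nvfuser doesn't support dtype ..." (torch.int8, torch.int16, and torch.complex32 all appear as reasons). A hedged sketch for reproducing just those outcomes locally with standard pytest options; -k keyword selection and the -rs skip-reason report are stock pytest behavior, but running this outside CI is an assumption.

    import pytest
    # "-k executor_nvfuser" narrows collection to the nvfuser-executor parametrizations;
    # "-rs" prints the reason for every skipped test in the summary.
    pytest.main(["test_ops.py", "-k", "executor_nvfuser", "-rs"])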
2023-01-11T23:13:47.1092689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 32%]
2023-01-11T23:13:47.1093057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_float32 SKIPPED (_refs._conversions.short doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1093405Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1093763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int32 SKIPPED (_refs._conversions.short doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1093979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1094324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs__conversions_short_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1094629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1094817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1094994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1095221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1095399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1095608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1095817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1096418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_abs_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1096602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1096787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1096960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_complex32 PASSED [ 33%]
2023-01-11T23:13:47.1097141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1097324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1097503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1097714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1097893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1098077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1098288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1098492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1098831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1099042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1099437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1099964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acos_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1100148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1100335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1100522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex32 PASSED [ 33%]
2023-01-11T23:13:47.1100710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1100909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1101091Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1101307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1101657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1101983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_float32 SKIPPED (_refs.acosh doesn't support nvfuser) [ 33%]
2023-01-11T23:13:47.1102314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1102522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1102847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1103049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_acosh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1103232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1103403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1103583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1103788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1103970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1104147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1104351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1104562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1104955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 33%]
2023-01-11T23:13:47.1105283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1105610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_add_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1105793Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1105984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1106170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1106353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1106566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1106778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1106981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcdiv_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1107193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1107384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1107562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1107745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1107927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1108108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1108318Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1108532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1108721Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1108925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1109107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1109315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1109664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1109950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addcmul_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1110154Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1110336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1110524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1110706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1110884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1111096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1111709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1112034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1112238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_addr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1112416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1112604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1112826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1113007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1113183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1113359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1113527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1113704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1113883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1114090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1114913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1115093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1115443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_all_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1115628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1115819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1116033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_allclose_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1116214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_bfloat16 PASSED [ 33%]
2023-01-11T23:13:47.1116394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1116573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1116754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1116965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1117294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1117470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1117675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amax_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1117852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1118033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_float64 PASSED [ 33%]
2023-01-11T23:13:47.1118216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int16 PASSED [ 33%]
2023-01-11T23:13:47.1118397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1118640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1118848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1119692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1119877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1120084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_amin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1120266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1120443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_int8 PASSED [ 33%]
2023-01-11T23:13:47.1120620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1120828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1121655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1121980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_any_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 33%]
2023-01-11T23:13:47.1122165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1122350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1122530Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int32 PASSED [ 33%]
2023-01-11T23:13:47.1122709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_int64 PASSED [ 33%]
2023-01-11T23:13:47.1122894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_aten_cuda_uint8 PASSED [ 33%]
2023-01-11T23:13:47.1123101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1123294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1123625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 33%]
2023-01-11T23:13:47.1123831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1124053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_arange_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 33%]
2023-01-11T23:13:47.1124332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_bool PASSED [ 33%]
2023-01-11T23:13:47.1124585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex128 PASSED [ 33%]
2023-01-11T23:13:47.1124826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_complex64 PASSED [ 33%]
2023-01-11T23:13:47.1125049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float16 PASSED [ 33%]
2023-01-11T23:13:47.1125265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float32 PASSED [ 33%]
2023-01-11T23:13:47.1125475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1125693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1125972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1126375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1126738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1127067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1127268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1127520Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1127722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1127924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1128122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1128361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1128611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1128866Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1129811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1130165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int32 SKIPPED (_refs.as_strided doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1130386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_partial_views_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1130628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1130890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1131120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1131312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1131497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1131681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1131870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1132061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1132429Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1132651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1132869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1133215Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_as_strided_scatter_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1133403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1133623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1133812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1133995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1134174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1134352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1134708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1134967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1135157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1135364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1135543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1135746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1136074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1136278Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1136461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1136645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1136829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1137067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1137251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1137427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1137599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1137779Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1137988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1138750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1138987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1139776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_asinh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1139957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1140137Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1140319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1140500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1140673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1140849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1141032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1141241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1141440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1141764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1141947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1142150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1142356Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1142540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1142750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1142936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1143117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1143298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1143476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1143648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1143863Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1144421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1144811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1145132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1145365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1145546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1145731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float16 PASSED [ 34%]
2023-01-11T23:13:47.1145906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1146087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1146266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1146471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1146680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1147010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1147216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1147403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_bfloat16 PASSED [ 34%]
2023-01-11T23:13:47.1147595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1147774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_aten_cuda_float32 PASSED [ 34%]
2023-01-11T23:13:47.1148121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1148337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1148575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1148911Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_float32 SKIPPED (_refs.atleast_1d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1149249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1149579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_1d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1150025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_1d_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1150222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1150421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_complex64 PASSED [ 34%]
2023-01-11T23:13:47.1150605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_float64 PASSED [ 34%]
2023-01-11T23:13:47.1150782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1150993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1151207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1151591Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 34%]
2023-01-11T23:13:47.1151807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1152018Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1152347Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 34%]
2023-01-11T23:13:47.1152682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_2d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1152887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_2d_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1153081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex128 PASSED [ 34%]
2023-01-11T23:13:47.1153263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_complex32 PASSED [ 34%]
2023-01-11T23:13:47.1153449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1153635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1153848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1154609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_atleast_3d_executor_nvfuser_cuda_int32 SKIPPED (_refs.atleast_3d doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1154819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1155009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1155219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1155542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_and_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1155739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1155935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1156293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int32 SKIPPED (_refs.bitwise_left_shift doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1156511Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1156849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_left_shift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1157035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1157221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1157407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int64 PASSED [ 34%]
2023-01-11T23:13:47.1157615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_aten_cuda_int8 PASSED [ 34%]
2023-01-11T23:13:47.1157796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1158126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 34%]
2023-01-11T23:13:47.1158342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_not_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1158529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_bool PASSED [ 34%]
2023-01-11T23:13:47.1158713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1158894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_aten_cuda_uint8 PASSED [ 34%]
2023-01-11T23:13:47.1159080Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1159293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_or_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1159491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int16 PASSED [ 34%]
2023-01-11T23:13:47.1159684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_aten_cuda_int32 PASSED [ 34%]
2023-01-11T23:13:47.1160037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int32 SKIPPED (_refs.bitwise_right_shift doesn't support nvfuser) [ 34%]
2023-01-11T23:13:47.1160257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 34%]
2023-01-11T23:13:47.1160480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_right_shift_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1160693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1160879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1161064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1161275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1161608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1161802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1162132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bitwise_xor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1162328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_shapes_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1162519Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1162715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1162912Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1163107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1163358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1163584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1163782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1163996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1164340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1164559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1164894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1165093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1165280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1165474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1165667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1165853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1166041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1166231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1166440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1166655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1166864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1167081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1167418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1167614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1167829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_broadcast_to_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1168014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float32 XFAIL [ 35%]
2023-01-11T23:13:47.1168199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_float64 XFAIL [ 35%]
2023-01-11T23:13:47.1168384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_aten_cuda_int8 XFAIL [ 35%]
2023-01-11T23:13:47.1168596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1168810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1169207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1169426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_bucketize_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1169613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1169796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1169976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1170156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1170336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1170545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1170749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1170952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float32 SKIPPED (_refs.cat doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1171676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1171989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cat_executor_nvfuser_cuda_int32 SKIPPED (_refs.cat doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1172172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1172382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1172565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1172773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ceil_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1172954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1173133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1173325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1173507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1173690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1173869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1174078Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1174283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1174781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1175778Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_chunk_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1175962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1176145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1176324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1176503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1176716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1176921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1177250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1177456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1177636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1177824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1178007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1178194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1178413Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1178628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1178839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1179171Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1179497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_max_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1179687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1179864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int16 PASSED [ 35%]
2023-01-11T23:13:47.1180047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1180227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1180409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1180589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1180800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1181008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1181367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_float32 SKIPPED (_refs.clamp_min doesn't support nvfuser) [ 35%]
2023-01-11T23:13:47.1181695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1182016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clamp_min_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1182195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1182375Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1182562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1182746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1182928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float16 PASSED [ 35%]
2023-01-11T23:13:47.1183107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1183284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1183463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1183642Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1183813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1184025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1184229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1184599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:13:47.1184810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1185132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1185317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1185637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_clone_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%]
2023-01-11T23:13:47.1185834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1186028Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex32 PASSED [ 35%]
2023-01-11T23:13:47.1186206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_complex64 PASSED [ 35%]
2023-01-11T23:13:47.1186394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float32 PASSED [ 35%]
2023-01-11T23:13:47.1186582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_float64 PASSED [ 35%]
2023-01-11T23:13:47.1186767Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int32 PASSED [ 35%]
2023-01-11T23:13:47.1186954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int64 PASSED [ 35%]
2023-01-11T23:13:47.1187175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_int8 PASSED [ 35%]
2023-01-11T23:13:47.1187367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_aten_cuda_uint8 PASSED [ 35%]
2023-01-11T23:13:47.1187717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%]
2023-01-11T23:13:47.1187936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188150Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1188688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 35%]
2023-01-11T23:13:47.1188901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1189111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_column_stack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 35%]
2023-01-11T23:13:47.1189295Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bfloat16 PASSED [ 35%]
2023-01-11T23:13:47.1189474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_bool PASSED [ 35%]
2023-01-11T23:13:47.1189658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex128 PASSED [ 35%]
2023-01-11T23:13:47.1189939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_complex32 PASSED [ 
35%] 2023-01-11T23:13:47.1190125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_float64 PASSED [ 35%] 2023-01-11T23:13:47.1190339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int64 PASSED [ 35%] 2023-01-11T23:13:47.1190512Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_int8 PASSED [ 35%] 2023-01-11T23:13:47.1190693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_aten_cuda_uint8 PASSED [ 35%] 2023-01-11T23:13:47.1190904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 35%] 2023-01-11T23:13:47.1191454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1191871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1192074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 35%] 2023-01-11T23:13:47.1192393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 35%] 2023-01-11T23:13:47.1192581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_bool PASSED [ 35%] 2023-01-11T23:13:47.1192769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex128 PASSED [ 35%] 2023-01-11T23:13:47.1192992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_complex64 PASSED [ 35%] 2023-01-11T23:13:47.1193189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_float64 PASSED [ 35%] 2023-01-11T23:13:47.1193381Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int32 PASSED [ 35%] 2023-01-11T23:13:47.1193571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_aten_cuda_int8 PASSED [ 35%] 2023-01-11T23:13:47.1193789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1194586Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1194931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float32 SKIPPED (_refs.conj_physical doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1195140Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1195358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_conj_physical_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1195559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1195759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1195957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1196173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1196367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1196553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1196741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1196968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1197181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1197527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1197872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_int32 SKIPPED (_refs.constant_pad_nd doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1198093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_constant_pad_nd_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1198286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1198478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1198665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1198880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1199070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1199289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1199498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1199839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1200172Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int32 SKIPPED (_refs.contiguous doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1200386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1200717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1200929Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_contiguous_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1201120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1201307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1201496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1201682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1201859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1202067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1202286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1202615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int32 SKIPPED (_refs.copysign doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1202827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1203155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_copysign_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1203344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1203529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1203714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1203896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1204068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1204249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1204456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1204662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1205245Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1205640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1205826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1206148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cos_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1206327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_bfloat16 PASSED [ 36%] 2023-01-11T23:13:47.1206516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1206698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1206881Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1207062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1207242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1207422Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1207603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1207816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1208597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1208806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1209391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cosh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1209789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1209978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1210164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float16 PASSED [ 36%] 
2023-01-11T23:13:47.1210342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1210524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1210755Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1210943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1211155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_cumsum_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1211972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1212169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1212355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1212545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1212734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1212922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1213140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1213816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1214031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1214366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1214977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int32 SKIPPED (_refs.diag_embed doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1215193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1215532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support 
dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1215748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_embed_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1215928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1216113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1216315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1216527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1217288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1217811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1218123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int32 SKIPPED (_refs.diag doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1218327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diag_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1218526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1218725Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1218919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1219106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1219298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1219488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int64 PASSED [ 36%] 2023-01-11T23:13:47.1219670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1219890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1220284Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 36%] 2023-01-11T23:13:47.1220506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1220849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.diagonal_copy doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1221067Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1221280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1221467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1221661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex128 PASSED [ 36%] 2023-01-11T23:13:47.1221854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_complex32 PASSED [ 36%] 2023-01-11T23:13:47.1222031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1222217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float32 PASSED [ 36%] 2023-01-11T23:13:47.1222396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_float64 PASSED [ 36%] 2023-01-11T23:13:47.1222579Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1222789Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1222975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1223156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1223372Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1223584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1223910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int32 SKIPPED (_refs.diagonal doesn't support nvfuser) [ 36%] 2023-01-11T23:13:47.1224227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 36%] 2023-01-11T23:13:47.1224427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_bool PASSED [ 36%] 2023-01-11T23:13:47.1224628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_complex64 PASSED [ 36%] 2023-01-11T23:13:47.1224823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1225013Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_aten_cuda_int32 PASSED [ 36%] 2023-01-11T23:13:47.1225233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1225902Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1226272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 36%] 2023-01-11T23:13:47.1226480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_diagonal_scatter_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1226665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_float16 PASSED [ 36%] 2023-01-11T23:13:47.1226851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int16 PASSED [ 36%] 2023-01-11T23:13:47.1227033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_int8 PASSED [ 36%] 2023-01-11T23:13:47.1227223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_aten_cuda_uint8 PASSED [ 36%] 2023-01-11T23:13:47.1227438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1227643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1227851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_digamma_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 36%] 2023-01-11T23:13:47.1228046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1228240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1228420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1228802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1229146Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_floor_rounding_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1229341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1229554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_complex128 SKIPPED (Skipped!) [ 37%] 2023-01-11T23:13:47.1229846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1230049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1230245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1230468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1230681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_complex32 SKIPPED (Skipped!) 
[ 37%] 2023-01-11T23:13:47.1230892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1231236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1231565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_no_rounding_mode_executor_nvfuser_cuda_int32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1231766Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1231995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1232188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1232380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1232571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1232792Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1233130Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float32 SKIPPED (_refs.div doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1233351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1233685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1233903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_div_trunc_rounding_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1234087Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1234273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1234457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1234663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1234848Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1235057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1235811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype 
torch.complex32) [ 37%] 2023-01-11T23:13:47.1236026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1236765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1236947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1237134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1237358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1237582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1237804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1238001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1238181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1238396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1238753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:13:47.1238963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1239286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.dstack doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1239618Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_dstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1239921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_float64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1240816Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_aten_cuda_int8 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1241749Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_bool SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1242976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_complex64 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_float32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1243883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1244174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for empty) [ 37%] 2023-01-11T23:13:47.1244492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex128 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1244821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_complex32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1245775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_float64 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_aten_cuda_int16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bfloat16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1246722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_bool SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_float64 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1247997Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int16 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1248339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_empty_like_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for empty_like) [ 37%] 2023-01-11T23:13:47.1248524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1248709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1248892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:13:47.1249070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1249248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1249426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1249605Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1249806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1250144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 37%] 2023-01-11T23:13:47.1250348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1250528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1250730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1251250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eq_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1251661Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1251840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1252009Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1252211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1252415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1252738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1252943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erf_executor_nvfuser_cuda_uint8 
SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1253125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1253304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1253483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1253663Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1253831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1254037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1254248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1254452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1254867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int32 PASSED [ 37%] 2023-01-11T23:13:47.1255077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1255258Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1255442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1255622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1255833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1256034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1256364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_float32 SKIPPED (_refs.erfinv doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1256688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 37%] 2023-01-11T23:13:47.1257015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_erfinv_executor_nvfuser_cuda_int32 SKIPPED (_refs.erfinv doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1257259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1257514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1257818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1258002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int16 PASSED [ 37%] 2023-01-11T23:13:47.1258180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_aten_cuda_int8 PASSED [ 37%] 2023-01-11T23:13:47.1258380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 
2023-01-11T23:13:47.1258710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float32 SKIPPED (_refs.exp2 doesn't support nvfuser) [ 37%] 2023-01-11T23:13:47.1258916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1259262Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:13:47.1259468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bfloat16 PASSED [ 37%] 2023-01-11T23:13:47.1259645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1259830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1260011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_complex32 PASSED [ 37%] 2023-01-11T23:13:47.1260192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1260434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_float64 PASSED [ 37%] 2023-01-11T23:13:47.1260606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_int64 PASSED [ 37%] 2023-01-11T23:13:47.1260791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1261001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_exp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1261806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_bool PASSED [ 37%] 2023-01-11T23:13:47.1262001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1262194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float16 PASSED [ 37%] 2023-01-11T23:13:47.1262384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_float32 PASSED [ 37%] 2023-01-11T23:13:47.1262565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_aten_cuda_uint8 PASSED [ 37%] 2023-01-11T23:13:47.1262781Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1262999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1263216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 37%] 2023-01-11T23:13:47.1263454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int64 SKIPPED (skipped 
for speed) [ 37%] 2023-01-11T23:13:47.1263786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 37%] 2023-01-11T23:13:47.1263977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex128 PASSED [ 37%] 2023-01-11T23:13:47.1264168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_complex64 PASSED [ 37%] 2023-01-11T23:13:47.1264352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1264529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1264715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1264899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1265114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1265748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1266072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expand_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1266283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1266472Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1266649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1266831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1267012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1267192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1267401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1267607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1267798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1268006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1268336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_expm1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1268520Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_bfloat16 PASSED [ 38%] 2023-01-11T23:13:47.1268700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1268880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1269061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1269298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1269510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1269805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1270578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_eye_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1270765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1270951Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1271142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1271325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1271508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1271714Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1271962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1272180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1272509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1272719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1273038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1273227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1273416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1273604Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1273787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1273969Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1274151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1274362Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1274687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1275010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.fft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1275360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1275553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1275738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1275923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1276134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1276348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1276700Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1276916Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1277665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1277986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1278220Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1278413Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1278604Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1278797Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1278987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1279174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1279361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1279551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1279898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1280117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1280332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1280681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.fftshift doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1280896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1281236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1281592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_fftshift_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1281780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1281974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1282158Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1282344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1282522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1282732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1282948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1283283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1283498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1283829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1284156Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfft2 doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1284390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1284578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1284764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1284942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1285124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1285303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1285483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1285696Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1286251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1286782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.hfft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1286990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1287312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1287660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1287867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1288049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1288235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1288424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1288615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1288802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1288985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int32 PASSED [ 38%] 
2023-01-11T23:13:47.1289167Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1289368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1289713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1290036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.hfftn doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1290274Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_hfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1290459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1290645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_complex32 PASSED [ 38%] 2023-01-11T23:13:47.1290829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1291014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1291194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1291399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1291606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1291824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1292152Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1292359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1292541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float32 PASSED [ 38%] 2023-01-11T23:13:47.1292726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1292913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int16 PASSED [ 38%] 2023-01-11T23:13:47.1293094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1293297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_aten_cuda_uint8 PASSED [ 38%] 2023-01-11T23:13:47.1293504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1293708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1293919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 
2023-01-11T23:13:47.1294243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifft doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1294453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1294981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1295174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_complex64 PASSED [ 38%] 2023-01-11T23:13:47.1295358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float16 PASSED [ 38%] 2023-01-11T23:13:47.1295541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1295722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int32 PASSED [ 38%] 2023-01-11T23:13:47.1295896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1296186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1296550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 38%] 2023-01-11T23:13:47.1296760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 38%] 2023-01-11T23:13:47.1297092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 38%] 2023-01-11T23:13:47.1297417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifftn doesn't support nvfuser) [ 38%] 2023-01-11T23:13:47.1297748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 38%] 2023-01-11T23:13:47.1297942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_bool PASSED [ 38%] 2023-01-11T23:13:47.1298139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_complex128 PASSED [ 38%] 2023-01-11T23:13:47.1298333Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_float64 PASSED [ 38%] 2023-01-11T23:13:47.1298521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int64 PASSED [ 38%] 2023-01-11T23:13:47.1298703Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_aten_cuda_int8 PASSED [ 38%] 2023-01-11T23:13:47.1298922Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1299281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:13:47.1299537Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 
2023-01-11T23:13:47.1299886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ifftshift doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1300225Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1300566Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ifftshift doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1300777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ifftshift_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1300963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1301147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1301322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1301500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1301713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1302047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1302417Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1302602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1302788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1302996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303412Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1303731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1304057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1304266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1304589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1304775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1304959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int16 PASSED [ 39%] 
2023-01-11T23:13:47.1305141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1305317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1305500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1305728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1305934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1306265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.ihfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1306594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1306925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_ihfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.ihfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1307120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1307334Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1307546Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1307875Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1308081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1308407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1308641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1308839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1309021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1309211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1309399Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1309578Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1309839Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1310055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1310403Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 39%] 2023-01-11T23:13:47.1310614Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1310939Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1311140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1311463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1311788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfft doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1312022Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1312344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfft_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1312526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1312717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1312904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex32 PASSED [ 39%] 2023-01-11T23:13:47.1313100Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:13:47.1313291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1313470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1313652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1313831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1314038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1314371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.irfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1314610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1314936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int32 SKIPPED (_refs.fft.irfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1315144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1315469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype 
torch.int8) [ 39%] 2023-01-11T23:13:47.1315677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_irfftn_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1315861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1316041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1316221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1316404Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1316613Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1316943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfft2 doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1317271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1317480Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1317667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1317882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1318059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1318237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1318415Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1318593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1318800Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319133Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1319342Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319543Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfft_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1319720Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1319907Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1320086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1320296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1320475Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1320654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1320862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float32 SKIPPED (_refs.fft.rfftn doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1321407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1321937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fft_rfftn_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1322123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1322298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_complex64 PASSED [ 39%] 2023-01-11T23:13:47.1322479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1322659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1322864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1323620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fill_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1323807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1323990Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1324173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1324348Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int16 PASSED [ 39%] 2023-01-11T23:13:47.1324527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1324708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1324926Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325462Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1325673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flatten_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1325854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1326071Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1326250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1326433Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float16 PASSED [ 39%] 2023-01-11T23:13:47.1326615Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_float64 PASSED [ 39%] 2023-01-11T23:13:47.1326797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1326978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1327156Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1327363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flip_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1327963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1328134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1328315Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_aten_cuda_int8 PASSED [ 39%] 2023-01-11T23:13:47.1328523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1328729Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1328942Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1329176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1329501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_float32 SKIPPED (_refs.fliplr doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1329706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1330027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 39%] 2023-01-11T23:13:47.1330229Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fliplr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1330410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bfloat16 PASSED [ 39%] 2023-01-11T23:13:47.1330596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1330780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_float32 PASSED [ 39%] 2023-01-11T23:13:47.1330962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int32 PASSED [ 39%] 2023-01-11T23:13:47.1331143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_int64 PASSED [ 39%] 2023-01-11T23:13:47.1331325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_aten_cuda_uint8 PASSED [ 39%] 2023-01-11T23:13:47.1331646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float32 SKIPPED (_refs.flipud doesn't support nvfuser) [ 39%] 2023-01-11T23:13:47.1331883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1332208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 39%] 2023-01-11T23:13:47.1332407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_flipud_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 39%] 2023-01-11T23:13:47.1332597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_bool PASSED [ 39%] 2023-01-11T23:13:47.1332788Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex128 PASSED [ 39%] 2023-01-11T23:13:47.1332977Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_complex64 PASSED [ 40%] 2023-01-11T23:13:47.1333163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1333351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1333539Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1333748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1333964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1334181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1334782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_float32 SKIPPED (_refs.float_power doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1335164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1335437Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1335777Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1335989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_float_power_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1336180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1336370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1336560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1336748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1336937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1337116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1337301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1337516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1337732Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1338102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_divide_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1338291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1338476Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1338655Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1338836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1339020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1339193Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1339377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1339587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1339797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1340006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_floor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1340187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_bool PASSED [ 40%] 2023-01-11T23:13:47.1340370Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1340553Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1340737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1340943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1341705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_float32 SKIPPED (_refs.fmax doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1342031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1342237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1342563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1342770Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmax_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1342952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float16 PASSED [ 40%] 2023-01-11T23:13:47.1343136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1343311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_float64 PASSED [ 40%] 2023-01-11T23:13:47.1343490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1343670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_aten_cuda_uint8 PASSED [ 40%] 2023-01-11T23:13:47.1343968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1344300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1344616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int32 SKIPPED (_refs.fmin doesn't support nvfuser) [ 40%] 2023-01-11T23:13:47.1344820Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1345140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1345344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1345529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1345702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int16 PASSED [ 40%] 2023-01-11T23:13:47.1345885Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1346065Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1346272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1346459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1346665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1346994Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%] 2023-01-11T23:13:47.1347365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_fmod_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1347555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:13:47.1347737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_aten_cuda_float32 PASSED [ 40%] 2023-01-11T23:13:47.1347933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1348142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_frac_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1348324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1348509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_aten_cuda_int64 PASSED [ 40%] 2023-01-11T23:13:47.1348716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1349042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%] 2023-01-11T23:13:47.1349246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gcd_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1349427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int32 PASSED [ 40%] 2023-01-11T23:13:47.1349606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_aten_cuda_int8 PASSED [ 40%] 2023-01-11T23:13:47.1349893Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350119Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350526Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ge_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%] 2023-01-11T23:13:47.1350708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bfloat16 PASSED [ 40%] 2023-01-11T23:13:47.1350886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_bool PASSED [ 40%] 
2023-01-11T23:13:47.1351067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1351242Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1351448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1351633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1351826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1352006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1352207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_gt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1352397Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1352582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_bool PASSED [ 40%]
2023-01-11T23:13:47.1352773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1352960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1353173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1353359Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1353529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1353742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1353954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1354293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1354629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_heaviside_executor_nvfuser_cuda_int32 SKIPPED (_refs.heaviside doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1354809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1354996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_complex32 PASSED [ 40%]
2023-01-11T23:13:47.1355178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1355358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1355540Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1355741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1355925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_aten_cuda_uint8 PASSED [ 40%]
2023-01-11T23:13:47.1356136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1356886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.hsplit doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1357094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1357423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1357634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hsplit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1357821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1357999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1358186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex32 PASSED [ 40%]
2023-01-11T23:13:47.1358371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_complex64 PASSED [ 40%]
2023-01-11T23:13:47.1358555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1358737Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1358946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1359155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1359352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1359565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1359771Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1359973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1360300Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 40%]
2023-01-11T23:13:47.1360501Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hstack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1360685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1360870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1361077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1361394Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float32 SKIPPED (_refs.hypot doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1361621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_hypot_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1361801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_bool PASSED [ 40%]
2023-01-11T23:13:47.1361983Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float16 PASSED [ 40%]
2023-01-11T23:13:47.1362155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1362330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int16 PASSED [ 40%]
2023-01-11T23:13:47.1362499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int32 PASSED [ 40%]
2023-01-11T23:13:47.1362667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int64 PASSED [ 40%]
2023-01-11T23:13:47.1362844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_aten_cuda_int8 PASSED [ 40%]
2023-01-11T23:13:47.1363048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1363373Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 40%]
2023-01-11T23:13:47.1363574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_i0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1363895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float32 SKIPPED (_refs.igamma doesn't support nvfuser) [ 40%]
2023-01-11T23:13:47.1364096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1364282Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1364467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_igammac_executor_aten_cuda_float64 PASSED [ 40%]
2023-01-11T23:13:47.1364681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1365021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 40%]
2023-01-11T23:13:47.1365231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_imag_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 40%]
2023-01-11T23:13:47.1365419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_bfloat16 PASSED [ 40%]
2023-01-11T23:13:47.1365610Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_complex128 PASSED [ 40%]
2023-01-11T23:13:47.1365796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float32 PASSED [ 40%]
2023-01-11T23:13:47.1365985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1366163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1366346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1366527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1366706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1366919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367128Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1367722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1367936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1368265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_add doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1368585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1368914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_int32 SKIPPED (_refs.index_add doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1369122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_add_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1369312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1369500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1369691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1369879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1370063Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1370250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1370430Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1370633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1370813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1371024Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1371360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_copy doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1371572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1371905Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1372118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_copy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1372308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1372496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1372680Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1372857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1373035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1373246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1373459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1373669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1373879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1374088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1374418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1374899Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1375106Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1375298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1375489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1375679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1375864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1376051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1376245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1376463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1376751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1377114Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1377444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_float32 SKIPPED (_refs.index_select doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1377775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_index_select_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1377966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1378148Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1378337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1378522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1378710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1378891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1379075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1379256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1379492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1379709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1380038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_float32 SKIPPED (_refs.isclose doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1380368Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1380693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isclose_executor_nvfuser_cuda_int32 SKIPPED (_refs.isclose doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1380883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1381077Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1381273Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1381460Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1381646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1381821Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1382005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1382192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1382407Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1382783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1383000Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1383327Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1383535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1383859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1384073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isfinite_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1384253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1384436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1384624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1384811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1384995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1385178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1385427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1385606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1385814Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1386569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1386891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1387093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isinf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1387280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1387465Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_complex64 PASSED [ 41%]
2023-01-11T23:13:47.1387649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1387828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1388007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1388178Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1388387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1388626Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1388836Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isnan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1389439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1389624Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1389913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1390143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1390468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_float32 SKIPPED (_refs.isneginf doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1390797Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1391121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int32 SKIPPED (_refs.isneginf doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1391329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1391562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isneginf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1391753Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1391936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1392122Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1392302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1392483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1392658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1392838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1393021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_aten_cuda_uint8 PASSED [ 41%]
2023-01-11T23:13:47.1393235Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1393980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 41%]
2023-01-11T23:13:47.1394182Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isposinf_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1394369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bfloat16 PASSED [ 41%]
2023-01-11T23:13:47.1394575Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1394758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex128 PASSED [ 41%]
2023-01-11T23:13:47.1394941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_complex32 PASSED [ 41%]
2023-01-11T23:13:47.1395123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_float16 PASSED [ 41%]
2023-01-11T23:13:47.1395303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int16 PASSED [ 41%]
2023-01-11T23:13:47.1395483Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int32 PASSED [ 41%]
2023-01-11T23:13:47.1395667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1395851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1396056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1396400Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 41%]
2023-01-11T23:13:47.1396603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1396810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1397181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float32 SKIPPED (_refs.isreal doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1397392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1397709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int32 SKIPPED (_refs.isreal doesn't support nvfuser) [ 41%]
2023-01-11T23:13:47.1398031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 41%]
2023-01-11T23:13:47.1398237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_isreal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1398416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int64 PASSED [ 41%]
2023-01-11T23:13:47.1398595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_aten_cuda_int8 PASSED [ 41%]
2023-01-11T23:13:47.1398806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lcm_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 41%]
2023-01-11T23:13:47.1399004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_bool PASSED [ 41%]
2023-01-11T23:13:47.1399209Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float32 PASSED [ 41%]
2023-01-11T23:13:47.1399387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_float64 PASSED [ 41%]
2023-01-11T23:13:47.1399563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1399736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1399906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1400084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1400255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1400482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1400690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1400865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1401066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1401385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1401589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_le_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1401773Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1401957Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1402138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1402345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1402555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1402887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1403121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1403440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lerp_executor_nvfuser_cuda_float32 SKIPPED (_refs.lerp doesn't support nvfuser) [ 42%]
2023-01-11T23:13:47.1403625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1403808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1403987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1404166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1404345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1404555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1404758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1404937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1405261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lgamma_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1405459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1405656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1405854Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1406049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1406297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_matrix_norm_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1406489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1406683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1406871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1407051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1407266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1407455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1407673Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1407886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svd_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1408082Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1408275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1408462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_svdvals_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1408682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1408883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1409073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1409266Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1409458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1409678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1409896Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linalg_vector_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1410085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1410270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int32 XFAIL [ 42%]
2023-01-11T23:13:47.1410451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_aten_cuda_int8 XFAIL [ 42%]
2023-01-11T23:13:47.1410664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1410877Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411073Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_linspace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1411466Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1411668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:13:47.1411851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1412031Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1412205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1412382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1412561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1412765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1412970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1413302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1413507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1413827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log10_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1414014Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1414228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1414409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1414903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1415093Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1415265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1415473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1415686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1415898Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416105Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1416496Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1416679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1417004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1417208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log1p_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1417384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1417634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1417818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1417997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1418175Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1418351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1418556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1418762Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1418950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1419149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1419328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1419648Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1419850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log2_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1420033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1420251Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1420435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1420639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1420979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1421187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1421384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1429001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1429364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1429568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1429900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 42%]
2023-01-11T23:13:47.1430108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1430308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1430505Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1430704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1430975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1431166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1431363Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1431590Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1431963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 42%]
2023-01-11T23:13:47.1432191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1432416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1432768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1433117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1433308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1433497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1433723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1433910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1434125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434340Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1434891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1435079Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1435291Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1435623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_and_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 42%]
2023-01-11T23:13:47.1435812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_bfloat16 PASSED [ 42%]
2023-01-11T23:13:47.1436004Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1436183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_float32 PASSED [ 42%]
2023-01-11T23:13:47.1436367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1436552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_int64 PASSED [ 42%]
2023-01-11T23:13:47.1436736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1436974Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1437309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 42%]
2023-01-11T23:13:47.1437498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int32 PASSED [ 42%]
2023-01-11T23:13:47.1437705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1437915Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_not_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1438104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex128 PASSED [ 42%]
2023-01-11T23:13:47.1438296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_complex64 PASSED [ 42%]
2023-01-11T23:13:47.1438481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float16 PASSED [ 42%]
2023-01-11T23:13:47.1438668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_float64 PASSED [ 42%]
2023-01-11T23:13:47.1438850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int16 PASSED [ 42%]
2023-01-11T23:13:47.1439033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_int8 PASSED [ 42%]
2023-01-11T23:13:47.1439211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_aten_cuda_uint8 PASSED [ 42%]
2023-01-11T23:13:47.1439448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 42%]
2023-01-11T23:13:47.1439665Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1439853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1440055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_or_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1440239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1440423Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1440606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_aten_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1440824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1441439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1441774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1441965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1442170Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1442535Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1442747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logical_xor_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1442932Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1443121Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1443306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1443487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1443670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1443853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int32 XFAIL [ 43%]
2023-01-11T23:13:47.1444034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_aten_cuda_int64 XFAIL [ 43%]
2023-01-11T23:13:47.1444243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1444455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1444787Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1445115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int32 SKIPPED (_refs.logspace doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1445350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1445683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logspace_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1445870Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1446058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1446245Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1446424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1446629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1446966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_float32 SKIPPED (_refs.logsumexp doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1447298Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1447629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int32 SKIPPED (_refs.logsumexp doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1447837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1448042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_logsumexp_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1448223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1448402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1448609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1448790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1448961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1449163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1449392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1449985Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1450187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1450508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1450688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1450880Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_lt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1451098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1451290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1451479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1451667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1451851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1452036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1452221Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1452403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1452621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1452824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453252Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1453591Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int32 SKIPPED (_refs.masked_fill doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1453925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_masked_fill_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1454108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_bool PASSED [ 43%]
2023-01-11T23:13:47.1454325Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1454747Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1454988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1455173Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_aten_cuda_uint8 PASSED [ 43%]
2023-01-11T23:13:47.1455377Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1455584Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1455925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int32 SKIPPED (_refs.maximum doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1456263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_maximum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%]
2023-01-11T23:13:47.1456447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1456628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_complex64 PASSED [ 43%]
2023-01-11T23:13:47.1456808Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1457015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1457736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_bfloat16 PASSED [ 43%]
2023-01-11T23:13:47.1457938Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_float32 PASSED [ 43%]
2023-01-11T23:13:47.1458140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1458345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1458576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1458809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1459169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_float32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1459522Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%]
2023-01-11T23:13:47.1459744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1460095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_list_of_tensors_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1460306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:13:47.1460545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:13:47.1460754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1460954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int16 PASSED [ 43%] 2023-01-11T23:13:47.1461155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:13:47.1461354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:13:47.1461580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1461813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462047Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1462640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_float32 SKIPPED (_refs.meshgrid doesn't support nvfuser) [ 43%] 2023-01-11T23:13:47.1462999Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:13:47.1463244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_meshgrid_variadic_tensors_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1463435Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1463619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:13:47.1463826Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464649Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1464977Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_minimum_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1465169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bfloat16 PASSED [ 43%] 2023-01-11T23:13:47.1465344Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_bool PASSED [ 43%] 2023-01-11T23:13:47.1465534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex128 PASSED [ 43%] 2023-01-11T23:13:47.1465723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex32 PASSED [ 43%] 2023-01-11T23:13:47.1465913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:13:47.1466097Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float32 PASSED [ 43%] 2023-01-11T23:13:47.1466314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1466498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int32 PASSED [ 43%] 2023-01-11T23:13:47.1466678Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int64 PASSED [ 43%] 2023-01-11T23:13:47.1466857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_int8 PASSED [ 43%] 2023-01-11T23:13:47.1467030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_aten_cuda_uint8 PASSED [ 43%] 2023-01-11T23:13:47.1467236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1467568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float32 SKIPPED (_refs.movedim doesn't support nvfuser) [ 43%] 2023-01-11T23:13:47.1467776Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1468107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%] 2023-01-11T23:13:47.1468312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1468640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 43%] 2023-01-11T23:13:47.1468855Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_movedim_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%] 2023-01-11T23:13:47.1469084Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex32 XFAIL [ 43%] 2023-01-11T23:13:47.1469271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_complex64 PASSED [ 43%] 2023-01-11T23:13:47.1469443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float16 PASSED [ 43%] 2023-01-11T23:13:47.1469623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_float64 PASSED [ 43%] 2023-01-11T23:13:47.1469887Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int32 PASSED [ 43%] 
2023-01-11T23:13:47.1470067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_aten_cuda_int64 PASSED [ 43%]
2023-01-11T23:13:47.1470271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1470475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1470689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 43%]
2023-01-11T23:13:47.1471236Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471441Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1471963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 43%]
2023-01-11T23:13:47.1472144Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_int32 PASSED [ 43%]
2023-01-11T23:13:47.1472374Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_mul_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1472563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float16 PASSED [ 43%]
2023-01-11T23:13:47.1472749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_float64 PASSED [ 43%]
2023-01-11T23:13:47.1472931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int16 PASSED [ 43%]
2023-01-11T23:13:47.1473113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_aten_cuda_int8 PASSED [ 43%]
2023-01-11T23:13:47.1473329Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 43%]
2023-01-11T23:13:47.1473666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1473987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int32 SKIPPED (_refs.nan_to_num doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1474198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1474525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nan_to_num_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1474715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1474945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1475139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1475336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1475521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1475706Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1475890Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1476067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1476276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1476494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1476853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1477075Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1477287Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1477473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1477660Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1477847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1478058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1478260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1478471Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1478677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int32 SKIPPED (_refs.narrow doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1479259Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_narrow_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1479748Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_native_layer_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1479933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:13:47.1480112Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1480281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1480459Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1480682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:13:47.1480852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1481030Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1481237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1481443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1481775Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1482096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ne_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1482279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1482458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1482643Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1482818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1482997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1483202Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1483544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1483873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 44%]
2023-01-11T23:13:47.1484232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_neg_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1484550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_bfloat16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1484868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex128 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485176Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_complex64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1485801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_float64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486419Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1486730Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_int8 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_aten_cuda_uint8 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_bool SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1487704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex128 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_complex32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488351Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_float16 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int32 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1488961Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_executor_nvfuser_cuda_int64 SKIPPED (Can't check result for new_empty) [ 44%]
2023-01-11T23:13:47.1489200Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489691Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1489924Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490387Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490617Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1490883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491592Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_aten_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1491842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_complex32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492090Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_float16 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492567Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_int8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1492802Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_empty_strided_executor_nvfuser_cuda_uint8 SKIPPED (Expected: empty_strided is not comparable) [ 44%]
2023-01-11T23:13:47.1493012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1493201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex128 PASSED [ 44%]
2023-01-11T23:13:47.1493392Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1493577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1493757Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1493928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_aten_cuda_int8 PASSED [ 44%]
2023-01-11T23:13:47.1494141Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1494777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1495037Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1495380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_float32 SKIPPED (_refs.new_full doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1495709Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_full doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1495925Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1496256Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_full_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1496451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex32 PASSED [ 44%]
2023-01-11T23:13:47.1496639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_complex64 PASSED [ 44%]
2023-01-11T23:13:47.1496950Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1497139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1497316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1497497Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_int32 PASSED [ 44%]
2023-01-11T23:13:47.1497676Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1497883Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 44%]
2023-01-11T23:13:47.1498443Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1498982Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int32 SKIPPED (_refs.new_ones doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1499208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1499553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_ones_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1499777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_bool PASSED [ 44%]
2023-01-11T23:13:47.1499967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1500153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_int16 PASSED [ 44%]
2023-01-11T23:13:47.1500337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_aten_cuda_uint8 PASSED [ 44%]
2023-01-11T23:13:47.1500545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1500760Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501025Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 44%]
2023-01-11T23:13:47.1501785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_new_zeros_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1501971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1502180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1502391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nextafter_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1502640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1502937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_aten_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503679Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_alpha_dropout_executor_nvfuser_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 44%]
2023-01-11T23:13:47.1503878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1504067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1504271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_celu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1504469Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_aten_cuda_float32 XFAIL [ 44%]
2023-01-11T23:13:47.1504694Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_dropout_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1504886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1505107Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_elu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1505330Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1505528Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1505719Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1505919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1506131Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_gelu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1506494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_glu_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.glu doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1506699Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1506901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1507124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_group_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507349Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1507794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1508001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1508227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_float64 PASSED [ 44%]
2023-01-11T23:13:47.1508424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_aten_cuda_int64 PASSED [ 44%]
2023-01-11T23:13:47.1508799Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.hardtanh doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1509023Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hardtanh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 44%]
2023-01-11T23:13:47.1509238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_bfloat16 PASSED [ 44%]
2023-01-11T23:13:47.1509455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float16 PASSED [ 44%]
2023-01-11T23:13:47.1509669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_aten_cuda_float32 PASSED [ 44%]
2023-01-11T23:13:47.1510184Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_hinge_embedding_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.hinge_embedding_loss doesn't support nvfuser) [ 44%]
2023-01-11T23:13:47.1510393Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1510594Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_huber_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1510791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1511021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1511217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1511428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_l1_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1511625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1511823Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1512046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1512254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_layer_norm_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1512457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1512656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1512882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_leaky_relu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1513095Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1513309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1513517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1513751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1513962Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1514172Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1514406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1514653Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1514892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1515727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_log_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1515941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1516151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1516396Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1516606Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1516813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1517016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1517250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1517482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1517901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.margin_ranking_loss doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1518132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_margin_ranking_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1518332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1518529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1518751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1518943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mish_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1519145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_mse_loss_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1519376Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_bfloat16 XFAIL [ 45%]
2023-01-11T23:13:47.1519571Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float32 XFAIL [ 45%]
2023-01-11T23:13:47.1519763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_nll_loss_executor_aten_cuda_float64 XFAIL [ 45%]
2023-01-11T23:13:47.1519972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1520179Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1520384Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1520587Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1520791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1520984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1521216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1521452Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1521693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1521927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1522297Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1522506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pairwise_distance_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1522878Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.pdist doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1523101Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_pdist_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1523307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1523514Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1523712Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_aten_cuda_uint8 PASSED [ 45%]
2023-01-11T23:13:47.1523941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1524168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1524532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1524745Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_poisson_nll_loss_executor_nvfuser_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1524968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1525169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1525365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1525585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_prelu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1525780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1525970Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1526166Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1526357Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int32 PASSED [ 45%]
2023-01-11T23:13:47.1526549Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1526768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu6_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1526960Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1527157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1527382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1527574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_aten_cuda_int64 PASSED [ 45%]
2023-01-11T23:13:47.1527796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1528589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.relu doesn't support nvfuser) [ 45%]
2023-01-11T23:13:47.1528937Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_relu_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1529136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_bfloat16 PASSED [ 45%]
2023-01-11T23:13:47.1529337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1529532Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1529723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_aten_cuda_float64 PASSED [ 45%]
2023-01-11T23:13:47.1529941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_selu_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1530153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 45%]
2023-01-11T23:13:47.1530391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 45%]
2023-01-11T23:13:47.1530593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1530796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1531027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1531723Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1532098Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1532328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1532692Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%]
2023-01-11T23:13:47.1532902Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1533136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int16 PASSED [ 45%]
2023-01-11T23:13:47.1533336Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_aten_cuda_int8 PASSED [ 45%]
2023-01-11T23:13:47.1533568Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1533801Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1534012Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_float32 PASSED [ 45%]
2023-01-11T23:13:47.1534380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%]
2023-01-11T23:13:47.1534900Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softmin_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 45%]
2023-01-11T23:13:47.1535123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_aten_cuda_float16 PASSED [ 45%]
2023-01-11T23:13:47.1535352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1535559Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float32 PASSED [ 45%] 2023-01-11T23:13:47.1535782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softplus_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1535987Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_aten_cuda_float64 PASSED [ 45%] 2023-01-11T23:13:47.1536317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1536717Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.softshrink doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1536941Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_softshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1537143Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_bfloat16 PASSED [ 45%] 2023-01-11T23:13:47.1537345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1537550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_float64 PASSED [ 45%] 2023-01-11T23:13:47.1537749Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1537946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int64 PASSED [ 45%] 2023-01-11T23:13:47.1538142Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_int8 PASSED [ 45%] 2023-01-11T23:13:47.1538335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1538562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1538782Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float32 PASSED [ 45%] 2023-01-11T23:13:47.1539485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539705Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_tanhshrink_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1539904Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_bfloat16 PASSED [ 45%] 
2023-01-11T23:13:47.1540099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1540299Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int32 PASSED [ 45%] 2023-01-11T23:13:47.1540503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_int8 PASSED [ 45%] 2023-01-11T23:13:47.1540701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1541081Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.threshold doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1541463Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.threshold doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1541812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%] 2023-01-11T23:13:47.1542067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_threshold_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1542280Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_bfloat16 PASSED [ 45%] 2023-01-11T23:13:47.1542492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_complex64 PASSED [ 45%] 2023-01-11T23:13:47.1542702Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1542908Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int16 PASSED [ 45%] 2023-01-11T23:13:47.1543109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_int32 PASSED [ 45%] 2023-01-11T23:13:47.1543314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_aten_cuda_uint8 PASSED [ 45%] 2023-01-11T23:13:47.1543738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float32 SKIPPED (_refs.nn.functional.triplet_margin_loss doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1543973Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1544339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 45%] 2023-01-11T23:13:47.1544746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int32 SKIPPED (_refs.nn.functional.triplet_margin_loss doesn't support nvfuser) [ 45%] 2023-01-11T23:13:47.1545140Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 45%] 2023-01-11T23:13:47.1545372Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_nn_functional_triplet_margin_loss_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 45%] 2023-01-11T23:13:47.1545558Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_complex128 PASSED [ 45%] 2023-01-11T23:13:47.1545739Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float16 PASSED [ 45%] 2023-01-11T23:13:47.1545919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1546125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1546308Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1546491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1546670Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1546853Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1547029Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1547208Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1547386Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1547595Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1547796Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548026Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int32 SKIPPED (_refs.ones doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1548545Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548740Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ones_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1548933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1549149Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1549364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1549552Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1549818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1550001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1550185Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1550360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1550596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1550804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551017Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1551555Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_permute_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1551741Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1551928Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1552118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1552310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1552492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1552667Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1552849Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1553060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553276Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553490Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1553933Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1554265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_positive_executor_nvfuser_cuda_int32 SKIPPED (_refs.positive doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1554447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1554633Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_complex32 XFAIL [ 46%] 2023-01-11T23:13:47.1554812Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1554992Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1555174Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1555350Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1555554Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1555763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1555971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1556332Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1556538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1556859Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_pow_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1557036Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1557217Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1557403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1557585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1557769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1557948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1558129Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1558307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1558517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1558713Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1559055Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1559263Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1559623Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1559830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1560032Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_prod_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1560353Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1560674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_aten_cuda_float64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_complex64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float32 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1561972Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_randn_executor_nvfuser_cuda_float64 SKIPPED (make_traced() doesn't set seed properly!) [ 46%] 2023-01-11T23:13:47.1562155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1562341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1562553Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1562733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1562917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1563127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1563335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1563677Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1563886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1564201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_float32 SKIPPED (_refs.ravel doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1564529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1564734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_ravel_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1564917Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1565096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1565279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1565461Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1565639Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1565842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1566016Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1566195Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1566401Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1566602Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1566811Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1567550Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1567865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_int32 SKIPPED (_refs.real doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1568068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_real_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1568250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1568468Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1568662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1568851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1569039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1569226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_float64 PASSED [ 46%] 2023-01-11T23:13:47.1569411Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1569628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1569846Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570061Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570241Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1570450Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1570786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1570996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reciprocal_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1571186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float16 PASSED [ 46%] 2023-01-11T23:13:47.1571369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1571577Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1571764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1571979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1572168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1572494Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1572708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1573039Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_remainder_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1573222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_bfloat16 PASSED [ 46%] 2023-01-11T23:13:47.1573408Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex128 PASSED [ 46%] 2023-01-11T23:13:47.1573593Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1573772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int16 PASSED [ 46%] 2023-01-11T23:13:47.1573954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int32 PASSED [ 46%] 2023-01-11T23:13:47.1574162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1574335Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_aten_cuda_uint8 PASSED [ 46%] 2023-01-11T23:13:47.1574722Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1574993Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1575341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1575548Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1575873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_repeat_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 46%] 2023-01-11T23:13:47.1576060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_bool PASSED [ 46%] 2023-01-11T23:13:47.1576255Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex32 PASSED [ 46%] 2023-01-11T23:13:47.1576442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_complex64 PASSED [ 46%] 2023-01-11T23:13:47.1576628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_float32 PASSED [ 46%] 2023-01-11T23:13:47.1576805Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int64 PASSED [ 46%] 2023-01-11T23:13:47.1576988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_aten_cuda_int8 PASSED [ 46%] 2023-01-11T23:13:47.1577199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1577410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1577830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 46%] 2023-01-11T23:13:47.1578054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1578268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1578607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float32 SKIPPED (_refs.reshape_as doesn't support nvfuser) [ 46%] 2023-01-11T23:13:47.1578815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1579169Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 46%] 2023-01-11T23:13:47.1579418Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 46%] 2023-01-11T23:13:47.1579621Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1579809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1580001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1580190Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1580410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1580597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1580778Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1580959Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1581163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581572Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581786Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1581995Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1582326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float32 SKIPPED (_refs.reshape doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1582536Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1582862Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1583194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1583403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_reshape_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1583625Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1583806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1583981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1584164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1584341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1584517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1584728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1585074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 47%] 2023-01-11T23:13:47.1585281Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1585598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float32 SKIPPED (_refs.roll doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1585801Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586365Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1586573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_roll_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1586759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1586943Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1587125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1587304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1587479Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1587657Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1587864Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588272Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588475Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1588672Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1589207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589436Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rot90_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1589622Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1589872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1590053Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1590226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1590431Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1590633Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1590966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1591288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_int32 SKIPPED (_refs.round doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1591492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_round_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1591674Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1591856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1592066Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1592244Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1592445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1592658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1592841Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1593042Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593224Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1593427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593632Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsqrt_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1593813Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1593992Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1594163Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1594339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1594517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1594724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1594931Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595345Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float32 PASSED [ 47%] 
2023-01-11T23:13:47.1595551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1595733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1596057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_rsub_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 47%] 2023-01-11T23:13:47.1596238Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1596416Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1596597Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1596774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1596952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1597157Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597616Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1597798Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1597997Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1598324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1598504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1598708Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sgn_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1598894Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1599085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1599271Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1599456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1599637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1599815Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1599988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1600199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1600409Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1600596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1600958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 47%] 2023-01-11T23:13:47.1601151Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1601360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1601563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sigmoid_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1601742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1601927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1602102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1602286Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1602467Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_aten_cuda_int8 PASSED [ 47%] 2023-01-11T23:13:47.1602675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1602879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1603294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sign_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1603682Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1603861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1604048Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float32 PASSED [ 47%] 2023-01-11T23:13:47.1604227Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1604410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1604589Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1604803Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1605015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1605345Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_signbit_executor_nvfuser_cuda_int32 SKIPPED (_refs.signbit doesn't support nvfuser) [ 47%] 2023-01-11T23:13:47.1605525Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_bool PASSED [ 47%] 2023-01-11T23:13:47.1605710Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1605884Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1606067Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1606246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1606455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1606637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1606842Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607052Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607260Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607461Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sin_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1607641Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1607831Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_complex128 PASSED [ 47%] 2023-01-11T23:13:47.1608010Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1608188Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1608367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_aten_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1608582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1608764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinc_executor_nvfuser_cuda_int32 PASSED [ 47%] 2023-01-11T23:13:47.1608996Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_bfloat16 PASSED [ 47%] 2023-01-11T23:13:47.1609207Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex32 PASSED [ 47%] 2023-01-11T23:13:47.1609382Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_complex64 PASSED [ 47%] 2023-01-11T23:13:47.1609565Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float16 PASSED [ 47%] 2023-01-11T23:13:47.1609744Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_float64 PASSED [ 47%] 2023-01-11T23:13:47.1609921Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int16 PASSED [ 47%] 2023-01-11T23:13:47.1610095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_int64 PASSED [ 47%] 2023-01-11T23:13:47.1610275Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_aten_cuda_uint8 PASSED [ 47%] 2023-01-11T23:13:47.1610482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 47%] 2023-01-11T23:13:47.1610693Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1610901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611104Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611301Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1611629Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sinh_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1611833Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex128 PASSED [ 48%] 2023-01-11T23:13:47.1612062Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_complex64 PASSED [ 48%] 2023-01-11T23:13:47.1612262Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1612454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1612645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1612866Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1613092Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1613442Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1613635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1613856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1614072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1614265Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1614456Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1614873Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1615095Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1615314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1615683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.bessel_j0 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1615903Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1616113Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1616310Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1616510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1616698Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1616882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1617064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1617247Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1617611Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.bessel_j1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1618021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.bessel_j1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1618241Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1618585Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1618794Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_bessel_j1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1618986Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1619177Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1619366Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1619557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1619746Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1619934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1620152Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1620367Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1620754Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.entr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1620963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1621314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.entr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1621650Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1621865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_entr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1622058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1622250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1622439Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1622630Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1622984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.erfcx doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1623199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1623529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_erfcx_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1623724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1623913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1624127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1624317Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1624503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1624688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1624869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1625086Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625301Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625509Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1625856Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.i0e doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1626069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i0e_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1626253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1626438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1626647Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1626867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1627206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i1 doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1627420Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1627751Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1627954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1628290Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1628503Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1628689Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1628901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1629118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1629303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1629488Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1629910Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.i1e doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1630153Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1630360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1630700Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1630913Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_i1e_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1631108Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1631311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1631507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1631727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1632096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.log_ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1632320Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1632668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1633043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1633261Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_ndtr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1633470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 48%] 2023-01-11T23:13:47.1633685Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_complex32 PASSED [ 48%] 2023-01-11T23:13:47.1633892Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1634096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1634303Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1634515Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1634724Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1634954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1635194Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1635581Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 48%] 
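The skip reasons in this stretch of the log fall into two groups: blanket "skipped for speed" skips and dtype gates such as "nvfuser doesn't support dtype torch.complex32". Below is a hypothetical Python sketch of how a dtype gate like that can be expressed with pytest parametrization; the UNSUPPORTED set and the toy test body are illustrative assumptions, not PyTorch's actual test harness.

```python
# Hypothetical sketch (not PyTorch's real machinery): a dtype-gated skip in the
# style of the "nvfuser doesn't support dtype torch.int8" messages above.
import pytest
import torch

# Illustrative assumption: the dtypes the log shows being skipped for nvfuser.
UNSUPPORTED_NVFUSER_DTYPES = {torch.int8, torch.int16, torch.complex32}

@pytest.mark.parametrize("dtype", [torch.int8, torch.int32, torch.float32])
def test_nvfuser_dtype_gate(dtype):
    # Emit a skip message matching the wording seen in the log.
    if dtype in UNSUPPORTED_NVFUSER_DTYPES:
        pytest.skip(f"nvfuser doesn't support dtype {dtype}")
    t = torch.ones(4, dtype=dtype)  # stand-in for the real op under test
    assert t.sum().item() == 4
```

Run under pytest, the int8 variant reports SKIPPED with the same reason string as the records above, while the int32 and float32 variants run to PASSED.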
2023-01-11T23:13:47.1635809Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1636206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 48%] 2023-01-11T23:13:47.1636421Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1636651Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1636879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_log_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1637074Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1637270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1637464Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1637654Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1637843Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_aten_cuda_uint8 PASSED [ 48%] 2023-01-11T23:13:47.1638051Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638521Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638738Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_logit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1638978Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1639214Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1639424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1639635Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1639840Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1640050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1640455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.multigammaln doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1640820Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1641058Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_1_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1641268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1641481Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_float64 PASSED [ 48%] 2023-01-11T23:13:47.1641715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1641923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1642126Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1642361Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_3_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1642569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float16 PASSED [ 48%] 2023-01-11T23:13:47.1642780Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1642988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1643218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1643449Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_multigammaln_mvlgamma_p_5_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1643640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_bfloat16 PASSED [ 48%] 2023-01-11T23:13:47.1643829Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_float32 PASSED [ 48%] 2023-01-11T23:13:47.1644019Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int32 PASSED [ 48%] 2023-01-11T23:13:47.1644233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int64 PASSED [ 48%] 2023-01-11T23:13:47.1644425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_aten_cuda_int8 PASSED [ 48%] 2023-01-11T23:13:47.1644640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1644857Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1645211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.ndtr doesn't support nvfuser) [ 48%] 2023-01-11T23:13:47.1645421Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1645761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 48%] 2023-01-11T23:13:47.1645976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtr_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 48%] 2023-01-11T23:13:47.1646165Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int16 PASSED [ 48%] 2023-01-11T23:13:47.1646355Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1646542Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1646734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1646952Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1647296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1647544Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1647879Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_ndtri_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1648088Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1648293Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1648498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1648704Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1648909Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1649111Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1649309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1649506Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1649733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1649981Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1650219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%]
2023-01-11T23:13:47.1650580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1650804Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1651164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1651390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_softmax_with_dtype_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1651599Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1651807Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1652011Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1652409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float32 SKIPPED (_refs.special.spherical_bessel_j0 doesn't support nvfuser) [ 49%] 2023-01-11T23:13:47.1652640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1653001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1653253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_spherical_bessel_j0_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1653447Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1653640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1653828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1654049Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1654267Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1654688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1655040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1655380Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_xlog1py_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1655573Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1655821Z
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 49%] 2023-01-11T23:13:47.1656060Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1656253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1656444Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1656634Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1656850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1657198Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int32 SKIPPED (_refs.special.zeta doesn't support nvfuser) [ 49%] 2023-01-11T23:13:47.1657538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_special_zeta_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1657731Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1657919Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1658102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1658288Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1658470Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1658652Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1658834Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1659021Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1659201Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1659448Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1659656Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1659869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1660068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1660254Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1660582Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sqrt_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1660770Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1660958Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1661139Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1661354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1661557Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_complex64 SKIPPED (Skipped!) [ 49%] 2023-01-11T23:13:47.1661765Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1661988Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1662319Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_square_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1662507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_bool PASSED [ 49%] 2023-01-11T23:13:47.1662697Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1662886Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1663072Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1663257Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1663440Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1663619Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1663827Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664035Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664385Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 49%] 2023-01-11T23:13:47.1664598Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1664810Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1665001Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1665211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1665541Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_squeeze_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1665732Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_complex32 PASSED [ 49%] 2023-01-11T23:13:47.1665918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1666099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1666277Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1666458Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1666669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1666874Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667085Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1667620Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 49%] 2023-01-11T23:13:47.1667852Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_stack_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1668044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1668230Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1668403Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1668583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1668761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1668968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669181Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1669847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1670054Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1670237Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1670428Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1670618Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1670832Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1671015Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1671229Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1671850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1672064Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_std_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1672243Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1672427Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_complex64 PASSED [ 49%] 2023-01-11T23:13:47.1672608Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1672819Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1673027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1673211Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1673424Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sub_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1673609Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_bfloat16 PASSED [ 49%] 2023-01-11T23:13:47.1673791Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1673963Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1674145Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1674324Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1674504Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1674686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_aten_cuda_uint8 PASSED [ 49%] 2023-01-11T23:13:47.1674895Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1675083Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1675414Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%]
2023-01-11T23:13:47.1675596Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_executor_nvfuser_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1675784Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1675966Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float32 PASSED [ 49%] 2023-01-11T23:13:47.1676159Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1676346Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1676574Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1676790Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677007Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677219Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677556Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 49%] 2023-01-11T23:13:47.1677769Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_sum_to_size_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 49%] 2023-01-11T23:13:47.1677954Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_complex128 PASSED [ 49%] 2023-01-11T23:13:47.1678127Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float16 PASSED [ 49%] 2023-01-11T23:13:47.1678306Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_float64 PASSED [ 49%] 2023-01-11T23:13:47.1678487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int16 PASSED [ 49%] 2023-01-11T23:13:47.1678658Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int32 PASSED [ 49%] 2023-01-11T23:13:47.1678828Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int64 PASSED [ 49%] 2023-01-11T23:13:47.1679043Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_aten_cuda_int8 PASSED [ 49%] 2023-01-11T23:13:47.1679248Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679453Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1679979Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1680162Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1680364Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%]
2023-01-11T23:13:47.1680686Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_t_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1680872Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1681050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1681232Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1681410Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1681588Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1681758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1681965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1682205Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1682547Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1682733Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1683059Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1683240Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1683564Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1683772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tan_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1683955Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1684132Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1684316Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1684498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1684681Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1684860Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1685069Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1685279Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1685492Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%]
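One skip reason seen earlier in this shard, "test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test" on the _refs_special_zeta aten float64 variant, names its own opt-in. A minimal sketch for re-running that single test locally follows, assuming a PyTorch source checkout with test/test_ops.py present and a CUDA device; the node ID is copied verbatim from the log.

```python
# Minimal sketch: re-run one slow-skipped test from this log, assuming a local
# PyTorch checkout (run from its test/ directory) and an available CUDA GPU.
import os
import pytest

# Opt in to tests the harness skips as slow; set before pytest imports the
# test module so the flag is seen at collection time.
os.environ["PYTORCH_TEST_WITH_SLOW"] = "1"

pytest.main([
    "test_ops.py::TestCommonCUDA::"
    "test_python_ref_executor__refs_special_zeta_executor_aten_cuda_float64",
    "-v",  # verbose per-test lines, matching the format of this log
])
```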
2023-01-11T23:13:47.1685695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1685901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tanh_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1686089Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1686283Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1686478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1686668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1686858Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1687044Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1687233Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1687446Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1687666Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1687876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688455Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1688669Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tensor_split_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1688851Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1689034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1689216Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1689398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1689576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1689783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1689980Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1690186Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1690499Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_float32 SKIPPED (_refs.to doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1690835Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int32 SKIPPED (_refs.to doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1691040Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1691360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1691562Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_to_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1691750Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1691936Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1692124Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1692302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1692489Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1692668Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1692847Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1693027Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1693239Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1693445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1693777Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_float32 SKIPPED (_refs.trace doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1694134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1694341Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1694727Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trace_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1694923Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_bfloat16 PASSED [ 50%] 2023-01-11T23:13:47.1695116Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1695312Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1695500Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1695692Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1695876Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1696070Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1696253Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1696462Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1696882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1697102Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1697304Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1697527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1697861Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_transpose_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1698056Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex128 PASSED [ 50%] 2023-01-11T23:13:47.1698249Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1698438Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1698628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1698806Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1699008Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1699222Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1699434Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1699783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1700003Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1700360Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_float32 SKIPPED (_refs.tril doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1700690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1700882Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1701072Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1701406Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int32 SKIPPED (_refs.tril_indices doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1701628Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_tril_indices_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1701818Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex32 PASSED [ 50%] 2023-01-11T23:13:47.1702005Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1702187Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float16 PASSED [ 50%] 2023-01-11T23:13:47.1702371Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1702551Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1702734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1702975Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1703322Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1703524Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1703728Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_float32 SKIPPED (_refs.triu doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1704250Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704576Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1704783Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1704971Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1705160Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1705499Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_triu_indices_executor_nvfuser_cuda_int32 SKIPPED (_refs.triu_indices doesn't support nvfuser) [ 50%] 2023-01-11T23:13:47.1705688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1705871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex32 XFAIL [ 50%] 2023-01-11T23:13:47.1706068Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1706284Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1706474Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1706664Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1706850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1707032Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1707246Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1707603Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 50%] 2023-01-11T23:13:47.1707824Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1708033Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1708228Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1708569Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 50%] 2023-01-11T23:13:47.1708759Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1709123Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 50%] 2023-01-11T23:13:47.1709339Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_true_divide_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1709523Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_float32 PASSED [ 50%] 2023-01-11T23:13:47.1709761Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1709965Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int32 PASSED [ 50%] 2023-01-11T23:13:47.1710147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1710352Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710560Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710768Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_trunc_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1710946Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_bool PASSED [ 50%] 2023-01-11T23:13:47.1711136Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_complex64 PASSED [ 50%] 2023-01-11T23:13:47.1711323Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_float64 PASSED [ 50%] 2023-01-11T23:13:47.1711507Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int16 PASSED [ 50%] 2023-01-11T23:13:47.1711688Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int64 PASSED [ 50%] 2023-01-11T23:13:47.1711871Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_int8 PASSED [ 50%] 2023-01-11T23:13:47.1712046Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_aten_cuda_uint8 PASSED [ 50%] 2023-01-11T23:13:47.1712289Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 50%] 2023-01-11T23:13:47.1712508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1712715Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1713041Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int32 SKIPPED (_refs.unbind doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1713369Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unbind_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1713563Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1713758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1713947Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1714135Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1714314Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1714502Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1714687Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1714897Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1715120Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1715473Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1715684Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1716020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_float32 SKIPPED (_refs.unflatten doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1716354Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1716564Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1716891Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1717103Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unflatten_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1717292Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1717482Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1717675Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1717869Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1718057Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1718268Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1718457Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1718644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1718850Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719204Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1719426Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719644Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1719989Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float32 SKIPPED (_refs.unfold_copy doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1720199Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1720534Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1720868Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int32 SKIPPED (_refs.unfold_copy doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1721118Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1721451Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_copy_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1721636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1721817Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1722002Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1722183Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1722391Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1722607Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1722956Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1723168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1723491Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float32 SKIPPED (_refs.unfold doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1723701Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1723906Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1724226Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unfold_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1724445Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1724640Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1724830Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1725020Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1725206Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1725389Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1725580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1725764Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1725968Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726180Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726390Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1726736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_unsqueeze_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 
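Annotation on the block above: every test_python_ref_executor ID is one test expanded over an (op, executor, dtype) grid, with the executor ("aten" or "nvfuser") and dtype suffixed onto the name, and unsupported combinations reported as SKIPPED with messages like "nvfuser doesn't support dtype torch.int16". A minimal sketch of how such IDs and conditional skips can be produced with plain pytest; the EXECUTORS and UNSUPPORTED tables are illustrative assumptions drawn from the skip messages in this log, not PyTorch's actual OpInfo machinery.

import pytest
import torch

EXECUTORS = ("aten", "nvfuser")                    # illustrative executor names, as seen in the log
DTYPES = (torch.float32, torch.int16, torch.int8)  # a few of the dtypes exercised above

# dtypes the illustrative "nvfuser" executor rejects, mirroring the
# "nvfuser doesn't support dtype torch.int16/int8" skip reasons in the log
UNSUPPORTED = {"nvfuser": {torch.int16, torch.int8}}

@pytest.mark.parametrize("executor", EXECUTORS)
@pytest.mark.parametrize("dtype", DTYPES, ids=lambda d: str(d).replace("torch.", ""))
def test_python_ref_executor_unsqueeze(executor, dtype):
    # skip before doing any work, so the reason shows up in the report
    if dtype in UNSUPPORTED.get(executor, ()):
        pytest.skip(f"{executor} doesn't support dtype {dtype}")
    x = torch.zeros(2, 3, dtype=dtype)
    # the op under test must produce the expected result regardless of executor
    assert torch.unsqueeze(x, 0).shape == (1, 2, 3)

Running this sketch with pytest -v yields one PASSED or SKIPPED line per dtype/executor combination, analogous to the suffixed entries above; the "skipped for speed" entries in the log reflect an additional sampling policy in the real suite that this sketch does not model.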
2023-01-11T23:13:47.1726948Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1727134Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1727321Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1727527Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1727734Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1727918Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_executor_nvfuser_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728096Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1728529Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1728716Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1728927Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_var_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1729109Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1729296Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1729478Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1729659Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1729838Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1730034Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1730218Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1730425Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1730774Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1730984Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1731311Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int32 SKIPPED (_refs.view_as doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1731645Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1731854Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_as_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1732038Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1732223Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1732398Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1732580Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1732785Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1732967Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1733147Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1733358Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1733561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1733772Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734115Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1734326Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734637Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1734964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float32 SKIPPED (_refs.view doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1735168Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1735485Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int32 SKIPPED (_refs.view doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1735690Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1735901Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_view_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1736138Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1736328Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1736510Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1736683Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1736867Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_aten_cuda_int8 
PASSED [ 51%] 2023-01-11T23:13:47.1737076Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1737294Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1737646Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1737976Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_float32 SKIPPED (_refs.vsplit doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1738302Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1738508Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vsplit_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1738695Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1738935Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex128 PASSED [ 51%] 2023-01-11T23:13:47.1739155Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1739337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_float64 PASSED [ 51%] 2023-01-11T23:13:47.1739516Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1739865Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1740192Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float32 SKIPPED (_refs.vstack doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1740402Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1740736Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int16 SKIPPED (nvfuser doesn't support dtype torch.int16) [ 51%] 2023-01-11T23:13:47.1740945Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1741270Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_vstack_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1741454Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bfloat16 PASSED [ 51%] 2023-01-11T23:13:47.1741636Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_bool PASSED [ 51%] 2023-01-11T23:13:47.1741817Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex32 PASSED [ 51%] 2023-01-11T23:13:47.1742006Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_complex64 PASSED [ 51%] 2023-01-11T23:13:47.1742189Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float16 PASSED [ 51%] 2023-01-11T23:13:47.1742398Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1742583Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1742763Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1742944Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_int64 PASSED [ 51%] 2023-01-11T23:13:47.1743125Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1743337Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1743538Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_bool SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1743752Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex128 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744099Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 51%] 2023-01-11T23:13:47.1744309Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_complex64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744517Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1744837Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_float32 SKIPPED (_refs.where doesn't support nvfuser) [ 51%] 2023-01-11T23:13:47.1745191Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_where_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 51%] 2023-01-11T23:13:47.1745379Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_float32 PASSED [ 51%] 2023-01-11T23:13:47.1745561Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int16 PASSED [ 51%] 2023-01-11T23:13:47.1745742Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int32 PASSED [ 51%] 2023-01-11T23:13:47.1745914Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_int8 PASSED [ 51%] 2023-01-11T23:13:47.1746094Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_aten_cuda_uint8 PASSED [ 51%] 2023-01-11T23:13:47.1746307Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_bfloat16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746518Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746726Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 51%] 2023-01-11T23:13:47.1746934Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_xlogy_executor_nvfuser_cuda_uint8 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1747117Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1747305Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1747487Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1747662Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1747844Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1748050Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1748231Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1748409Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_aten_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1748758Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 52%] 2023-01-11T23:13:47.1748964Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1749164Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1749498Z test_ops.py::TestCommonCUDA::test_python_ref_executor__refs_zeros_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 52%] 2023-01-11T23:13:47.1749775Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1749998Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1750220Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_native_batch_norm_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1750416Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1750611Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1750833Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1751055Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float16 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1751252Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1751470Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_var_mean_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1751664Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1751855Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1752039Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1752229Z 
test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_aten_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1752586Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_complex32 SKIPPED (nvfuser doesn't support dtype torch.complex32) [ 52%] 2023-01-11T23:13:47.1752800Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_float64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1753011Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int64 SKIPPED (skipped for speed) [ 52%] 2023-01-11T23:13:47.1753350Z test_ops.py::TestCommonCUDA::test_python_ref_executor_ops_nvprims_view_executor_nvfuser_cuda_int8 SKIPPED (nvfuser doesn't support dtype torch.int8) [ 52%] 2023-01-11T23:13:47.1753511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1753679Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1753841Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1754025Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1754174Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_T_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1754361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1754547Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1754726Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1754906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1755086Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bfloat16_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1755261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1755443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1755621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1755789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1755962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1756132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1756305Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_bool_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1756516Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1756689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1756874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1757050Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1757214Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1757384Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_byte_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1757566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1757749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1757933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cdouble_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1758112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1758295Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1758472Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1758645Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1758810Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_cfloat_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1759014Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1759215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1759401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1759607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1759783Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1759953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1760121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_chalf_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1760294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1760466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1760645Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1760825Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1761004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1761176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1761347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1761515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_char_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1761695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1761873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_complex_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1762048Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1762258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1762441Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1762617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1762789Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_double_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1762967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1763145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1763326Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1763505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_float_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1763672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1763851Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1764029Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1764201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1764373Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1764546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1764716Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1764890Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1765062Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_half_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1765261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1765440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1765614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1765785Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1765953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1766126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_int_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1766302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1766481Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1766666Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1766834Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1767007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1767177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1767348Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1767519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1767724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1767896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_long_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1768082Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1768253Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1768429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs__conversions_short_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1768589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1768758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1768937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1769124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1769292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1769452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_abs_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1769616Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1769768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1769933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_complex64 PASSED [ 52%] 2023-01-11T23:13:47.1770095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_float64 PASSED [ 52%] 2023-01-11T23:13:47.1770255Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1770411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1770566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acos_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1770721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1770912Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1771067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1771230Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1771392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_acosh_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1771556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1771717Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_complex32 PASSED [ 52%] 2023-01-11T23:13:47.1771878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1772040Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int64 PASSED [ 52%] 2023-01-11T23:13:47.1772199Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1772361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_add_cuda_uint8 PASSED [ 52%] 2023-01-11T23:13:47.1772522Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1772693Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1772859Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1773027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcdiv_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1773198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_complex128 PASSED [ 52%] 2023-01-11T23:13:47.1773359Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float16 PASSED [ 52%] 2023-01-11T23:13:47.1773550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_float32 PASSED [ 52%] 2023-01-11T23:13:47.1773718Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addcmul_cuda_int8 PASSED [ 52%] 2023-01-11T23:13:47.1773874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bfloat16 PASSED [ 52%] 2023-01-11T23:13:47.1774033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_bool PASSED [ 52%] 2023-01-11T23:13:47.1774192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int16 PASSED [ 52%] 2023-01-11T23:13:47.1774347Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int32 PASSED [ 52%] 2023-01-11T23:13:47.1774681Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_addr_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1774848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1775015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1775177Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_complex64 PASSED [ 53%] 2023-01-11T23:13:47.1775331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1775488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1775647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_all_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1775816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1775988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1776153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_allclose_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1776315Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1776478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1776636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1776825Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1776980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1777140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amax_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1777293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1777446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_amin_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1777603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_bool PASSED [ 53%] 2023-01-11T23:13:47.1777767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1777933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1778084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1778244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1778400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_any_cuda_int8 PASSED [ 53%] 2023-01-11T23:13:47.1778563Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1778724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1778881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_arange_cuda_int64 XFAIL [ 53%] 2023-01-11T23:13:47.1779053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bfloat16 PASSED [ 53%] 2023-01-11T23:13:47.1779217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_bool PASSED [ 53%] 2023-01-11T23:13:47.1779427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float16 PASSED [ 53%] 2023-01-11T23:13:47.1779591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1779756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1779921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1780087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_int32 PASSED [ 53%] 2023-01-11T23:13:47.1780249Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_cuda_uint8 PASSED [ 53%] 2023-01-11T23:13:47.1780439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex128 PASSED [ 53%] 2023-01-11T23:13:47.1780627Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_complex64 PASSED [ 53%] 2023-01-11T23:13:47.1780813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float32 PASSED [ 53%] 2023-01-11T23:13:47.1780991Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_float64 PASSED [ 53%] 2023-01-11T23:13:47.1781173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int16 PASSED [ 53%] 2023-01-11T23:13:47.1781355Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_int64 PASSED [ 53%] 2023-01-11T23:13:47.1781535Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_partial_views_cuda_uint8 PASSED [ 53%] 2023-01-11T23:13:47.1781714Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1781887Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1782073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1782254Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1782457Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1782623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1782797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1782969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_as_strided_scatter_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1783134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1783296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1783463Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1783623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1783780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1783932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1784079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asin_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1784241Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1784408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1784576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1784739Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1784928Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_asinh_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1785092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1785259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1785411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1785567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan2_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1785730Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1785894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1786055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1786217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1786379Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1786538Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atan_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1786701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1786854Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1787021Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1787181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1787340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1787497Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1787658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atanh_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1787833Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1787997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1788188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1788365Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1788536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1788704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1788869Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1789035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_1d_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1789205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1789382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1789579Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1789818Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1790001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1790167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_2d_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1790335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_complex128 PASSED [ 53%]
2023-01-11T23:13:47.1790500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1790695Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1790860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1791024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_atleast_3d_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1791181Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1791351Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1791516Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_and_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1791692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1791865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1792036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_left_shift_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1792201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1792369Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_not_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1792530Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1792684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1792848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1793011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_or_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1793191Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1793364Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1793536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1793710Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_right_shift_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1793903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1794069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1794224Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1794386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bitwise_xor_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1794560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_shapes_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1794736Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1794920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1795093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1795268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1795439Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1795601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1795776Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_tensors_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1795946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1796121Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1796322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_float32 PASSED [ 53%]
2023-01-11T23:13:47.1796490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1796658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1796824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_broadcast_to_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1796988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1797146Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_int32 PASSED [ 53%]
2023-01-11T23:13:47.1797311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_bucketize_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1797474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1797639Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1797802Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1797964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1798120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int64 PASSED [ 53%]
2023-01-11T23:13:47.1798277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1798423Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cat_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1798588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1798749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_float64 PASSED [ 53%]
2023-01-11T23:13:47.1798909Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int16 PASSED [ 53%]
2023-01-11T23:13:47.1799072Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_int8 PASSED [ 53%]
2023-01-11T23:13:47.1799228Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ceil_cuda_uint8 PASSED [ 53%]
2023-01-11T23:13:47.1799418Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bfloat16 PASSED [ 53%]
2023-01-11T23:13:47.1799583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_bool PASSED [ 53%]
2023-01-11T23:13:47.1799748Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex32 PASSED [ 53%]
2023-01-11T23:13:47.1799905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_complex64 PASSED [ 53%]
2023-01-11T23:13:47.1800067Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_float16 PASSED [ 53%]
2023-01-11T23:13:47.1800227Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1800387Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1800550Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1800709Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1800870Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_chunk_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1801036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1801193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1801349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1801520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1801684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1801850Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1802051Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1802213Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_max_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1802383Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1802545Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1802702Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1802868Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1803031Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1803192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1803356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1803515Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clamp_min_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1803678Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1803837Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1803989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1804149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1804309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1804468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_clone_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1804636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1804813Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1804987Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1805184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1805353Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1805512Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1805678Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1805841Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_column_stack_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1806005Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1806170Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1806335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1806499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1806658Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1806807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1806966Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1807122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1807299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1807477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1807676Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1807849Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1808022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1808193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1808358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1808531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_conj_physical_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1808708Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1808880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1809055Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1809226Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1809400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1809571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_constant_pad_nd_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1809742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1809903Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1810074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1810239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1810408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1810576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1810772Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_contiguous_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1810942Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1811103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1811259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1811422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1811587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1811753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_copysign_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1811917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1812076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1812243Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1812403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1812566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1812717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1812874Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1813024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1813182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cos_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1813373Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1813532Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1813703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1813867Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1814022Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1814183Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1814342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1814596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1814756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1814908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cosh_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1815074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1815239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1815392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1815552Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1815715Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1815873Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1816035Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_cumsum_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1816198Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1816359Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1816557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1816721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1816872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1817033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1817188Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1817344Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1817500Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1817668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1817842Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1818016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1818178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1818346Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diag_embed_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1818520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1818687Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1818863Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1819070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1819245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1819420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_copy_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1819588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1819746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1819917Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1820081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1820247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1820410Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1820591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1820766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1820950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1821118Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1821297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1821471Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1821643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_diagonal_scatter_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1821806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_bool PASSED [ 54%]
2023-01-11T23:13:47.1821976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1822171Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1822336Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1822499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_digamma_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1822672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_float64 PASSED [ 54%]
2023-01-11T23:13:47.1822847Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1823020Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1823194Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_floor_rounding_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1823382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 54%]
2023-01-11T23:13:47.1823561Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1823733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int32 PASSED [ 54%]
2023-01-11T23:13:47.1823908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_no_rounding_mode_cuda_int8 PASSED [ 54%]
2023-01-11T23:13:47.1824080Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1824243Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_float32 PASSED [ 54%]
2023-01-11T23:13:47.1824417Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int16 PASSED [ 54%]
2023-01-11T23:13:47.1824588Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_int64 PASSED [ 54%]
2023-01-11T23:13:47.1824786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_div_trunc_rounding_cuda_uint8 PASSED [ 54%]
2023-01-11T23:13:47.1824956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_bfloat16 PASSED [ 54%]
2023-01-11T23:13:47.1825125Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex32 PASSED [ 54%]
2023-01-11T23:13:47.1825291Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_complex64 PASSED [ 54%]
2023-01-11T23:13:47.1825456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float16 PASSED [ 54%]
2023-01-11T23:13:47.1825623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1825779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1825939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1826102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1826264Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dsplit_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1826429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1826592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1826760Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1826921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1827076Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1827238Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1827402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_dstack_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1827570Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1827732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1827916Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1828078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1828246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1828414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1828585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1828753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1828937Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1829129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1829313Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1829478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1829638Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_empty_like_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1829860Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1830017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1830178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1830334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1830525Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1830686Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eq_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1830853Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1831016Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1831179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1831331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1831491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1831649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erf_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1831806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1831971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1832132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1832293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1832455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfc_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1832625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1832786Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1832949Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1833112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_erfinv_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1833274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1833434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1833591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1833775Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp2_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1833939Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1834104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1834267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1834430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_exp_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1834604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1834778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1834956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1835130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1835300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1835460Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_as_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1835628Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1835792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1835963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1836129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expand_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1836323Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1836488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1836656Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1836820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1836974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1837136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_expm1_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1837300Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1837465Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1837631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1837800Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1837962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1838123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_eye_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1838288Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1838454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1838617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1838779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft2_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1838950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1839122Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1839289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1839483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1839650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1839807Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fft_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1839971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1840144Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1840310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1840476Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1840642Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1840806Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftn_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1840986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_bfloat16 PASSED [ 55%]
2023-01-11T23:13:47.1841157Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1841334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1841510Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1841683Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1841855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1842066Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_fftshift_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1842233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1842401Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1842569Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1842727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1842893Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft2_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1843065Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex128 PASSED [ 55%]
2023-01-11T23:13:47.1843237Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1843406Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1843573Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1843741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfft_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1843907Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1844073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1844246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1844416Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1844584Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1844751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1844920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_hfftn_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1845087Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1845278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1845443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1845603Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1845769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1845932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifft_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1846102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1846274Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1846440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftn_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1846613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1846791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1846969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1847130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1847299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1847468Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ifftshift_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1847636Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float32 PASSED [ 55%]
2023-01-11T23:13:47.1847828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1847996Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1848165Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1848331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1848485Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfft_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1848654Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1848820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1848985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int16 PASSED [ 55%]
2023-01-11T23:13:47.1849148Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1849312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_ihfftn_cuda_uint8 PASSED [ 55%]
2023-01-11T23:13:47.1849488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1849657Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1849823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_float64 PASSED [ 55%]
2023-01-11T23:13:47.1849983Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int32 PASSED [ 55%]
2023-01-11T23:13:47.1850149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int64 PASSED [ 55%]
2023-01-11T23:13:47.1850312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft2_cuda_int8 PASSED [ 55%]
2023-01-11T23:13:47.1850477Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_bool PASSED [ 55%]
2023-01-11T23:13:47.1850649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex32 PASSED [ 55%]
2023-01-11T23:13:47.1850845Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_complex64 PASSED [ 55%]
2023-01-11T23:13:47.1851015Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_float16 PASSED [ 55%]
2023-01-11T23:13:47.1851180Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1851335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1851501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfft_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1851664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1851832Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1852004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1852168Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1852337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_irfftn_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1852501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1852668Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1852823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1852985Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1853147Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1853313Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft2_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1853508Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1853680Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1853843Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1854008Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1854167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1854330Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfft_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1854619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1854790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1854960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1855123Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1855286Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fft_rfftn_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1855448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1855613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1855771Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1855932Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fill_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1856100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1856262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1856436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_complex32 PASSED [ 56%]
2023-01-11T23:13:47.1856600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1856815Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1856982Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1857138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1857304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flatten_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1857469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1857630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1857799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1857963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1858129Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1858289Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1858446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flip_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1858604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1858768Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1858940Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1859102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1859298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1859461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1859628Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fliplr_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1859799Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1859959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1860120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1860282Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_flipud_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1860448Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1860625Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1860801Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1860974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1861145Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1861312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1861473Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1861640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1861805Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_float_power_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1861969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1862136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1862299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1862484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1862655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1862819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1862986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1863150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1863316Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_floor_divide_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1863475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1863641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1863804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1863964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1864120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmax_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1864271Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1864433Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1864595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1864752Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1864934Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1865088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmin_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1865248Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1865408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1865559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1865717Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1865871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1866026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_fmod_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1866189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1866349Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_frac_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1866509Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1866669Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1866818Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gcd_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1866981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1867137Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1867294Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ge_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1867453Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1867609Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1867765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1867923Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_gt_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1868114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1868277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1868442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1868607Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_heaviside_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1868767Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1868936Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1869101Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1869268Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1869426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1869583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hsplit_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1869809Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1869976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_complex32 PASSED [ 56%]
2023-01-11T23:13:47.1870138Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1870299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1870456Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hstack_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1870646Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1870808Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1870969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_hypot_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1871130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1871288Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1871442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_i0_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1871604Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igamma_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1871769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1871934Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_igammac_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1872099Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_imag_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1872259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1872425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1872594Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1872758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1872921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1873078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_add_cuda_int8 PASSED [ 56%]
2023-01-11T23:13:47.1873247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1873414Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1873585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1873774Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_copy_cuda_uint8 PASSED [ 56%]
2023-01-11T23:13:47.1873951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_complex128 PASSED [ 56%]
2023-01-11T23:13:47.1874117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1874281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1874449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_fill_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1874617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_bool PASSED [ 56%]
2023-01-11T23:13:47.1874790Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1874965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float32 PASSED [ 56%]
2023-01-11T23:13:47.1875136Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_float64 PASSED [ 56%]
2023-01-11T23:13:47.1875297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int16 PASSED [ 56%]
2023-01-11T23:13:47.1875461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int32 PASSED [ 56%]
2023-01-11T23:13:47.1875623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_index_select_cuda_int64 PASSED [ 56%]
2023-01-11T23:13:47.1875787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_bfloat16 PASSED [ 56%]
2023-01-11T23:13:47.1875954Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_complex64 PASSED [ 56%]
2023-01-11T23:13:47.1876119Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float16 PASSED [ 56%]
2023-01-11T23:13:47.1876311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1876475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int32 PASSED [ 57%]
2023-01-11T23:13:47.1876639Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isclose_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1876796Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1876967Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1877132Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_complex32 PASSED [ 57%]
2023-01-11T23:13:47.1877297Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1877459Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1877624Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isfinite_cuda_uint8 PASSED [ 57%]
2023-01-11T23:13:47.1877791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1877959Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_complex32 PASSED [ 57%]
2023-01-11T23:13:47.1878115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float32 PASSED [ 57%]
2023-01-11T23:13:47.1878278Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_float64 PASSED [ 57%]
2023-01-11T23:13:47.1878438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isinf_cuda_int32 PASSED [ 57%]
2023-01-11T23:13:47.1878598Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1878766Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_complex128 PASSED [ 57%]
2023-01-11T23:13:47.1878926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float16 PASSED [ 57%]
2023-01-11T23:13:47.1879090Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_float32 PASSED [ 57%]
2023-01-11T23:13:47.1879247Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isnan_cuda_int8 PASSED [ 57%]
2023-01-11T23:13:47.1879436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bfloat16 PASSED [ 57%]
2023-01-11T23:13:47.1879591Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_bool PASSED [ 57%]
2023-01-11T23:13:47.1879758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float16 PASSED [ 57%]
2023-01-11T23:13:47.1879922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_float64
PASSED [ 57%] 2023-01-11T23:13:47.1880084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1880245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isneginf_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1880411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1880577Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1880740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1880896Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1881059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1881223Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isposinf_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1881386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1881546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1881712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1881877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1882079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1882235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1882400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_isreal_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1882559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1882714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1882872Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lcm_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1883033Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1883196Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1883357Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1883513Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1883665Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_le_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1883830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1883992Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1884153Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1884314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lerp_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1884474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1884635Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1884795Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_int16 PASSED [ 57%] 
2023-01-11T23:13:47.1884946Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lgamma_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1885151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1885340Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1885520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1885689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1885862Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1886029Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_norm_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886203Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1886374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svd_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886542Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_svdvals_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1886727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1886902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linalg_vector_norm_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1887071Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1887236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1887400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1887587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1887750Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1887915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_linspace_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1888070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1888231Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1888392Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1888553Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1888714Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log10_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1888881Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1889049Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1889210Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1889366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1889527Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1889684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_int64 PASSED [ 57%] 
2023-01-11T23:13:47.1889844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log1p_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1890004Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1890163Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1890318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log2_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1890483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1890633Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1890822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1890978Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1891131Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1891314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1891503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1891689Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_complex32 PASSED [ 57%] 2023-01-11T23:13:47.1891871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1892050Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1892225Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1892399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1892566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1892731Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_and_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1892901Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1893066Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1893261Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1893430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1893600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1893757Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_not_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1893927Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1894094Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1894267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1894436Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1894708Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1909820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1910060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1910233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1910407Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1910576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_or_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1910745Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1910926Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1911095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1911266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logical_xor_cuda_int8 PASSED [ 57%] 2023-01-11T23:13:47.1911442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1911697Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1911867Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1912036Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1912207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logspace_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1912380Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1912557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1912735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1912908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1913081Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1913242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_logsumexp_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1913411Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1913578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1913741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_float64 PASSED [ 57%] 2023-01-11T23:13:47.1913906Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_lt_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1914091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1914310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1914487Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1914663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_masked_fill_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1914827Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_bool PASSED [ 57%] 2023-01-11T23:13:47.1914997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_float32 PASSED [ 57%] 2023-01-11T23:13:47.1915167Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int16 PASSED [ 57%] 2023-01-11T23:13:47.1915334Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int32 PASSED [ 57%] 2023-01-11T23:13:47.1915505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_maximum_cuda_int64 PASSED [ 57%] 2023-01-11T23:13:47.1915674Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1915846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1916017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mean_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1916207Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 57%] 2023-01-11T23:13:47.1916400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 57%] 2023-01-11T23:13:47.1916592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 57%] 2023-01-11T23:13:47.1916787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_bfloat16 PASSED [ 57%] 2023-01-11T23:13:47.1916986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 57%] 2023-01-11T23:13:47.1917187Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1917381Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1917599Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_meshgrid_variadic_tensors_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1917773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1917935Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1918104Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1918272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_minimum_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1918440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1918608Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1918781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1918950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_movedim_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1919115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1919284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1919445Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1919614Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1919782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_float64 PASSED [ 58%] 
2023-01-11T23:13:47.1919947Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1920140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_mul_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1920311Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1920490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1920664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1920824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1920993Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nan_to_num_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1921172Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1921353Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1921528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1921705Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1921880Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1922053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_copy_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1922224Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1922386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1922556Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1922724Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1922889Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_narrow_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1923077Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_native_layer_norm_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1923242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1923434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1923605Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1923765Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1923927Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1924091Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ne_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1924258Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1924425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1924592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1924754Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_neg_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1924935Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1925103Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1925276Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1925450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1925620Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1925791Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1925971Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1926189Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1926379Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1926563Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1926735Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1926914Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1927097Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_empty_strided_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1927267Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1927444Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1927623Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_complex64 PASSED [ 58%] 2023-01-11T23:13:47.1927794Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1927964Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_full_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1928140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1928308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_complex32 PASSED [ 58%] 2023-01-11T23:13:47.1928479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1928650Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1928820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1929017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1929208Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1929402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_ones_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1929578Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1929738Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1929902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_int8 PASSED [ 58%] 
2023-01-11T23:13:47.1930074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_new_zeros_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1930252Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1930425Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nextafter_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1930630Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1930875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_alpha_dropout_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1931061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1931246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_celu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1931430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1931621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1931812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_dropout_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1932039Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1932218Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1932400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_elu_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1932581Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1932764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_gelu_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1932941Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1933511Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_group_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1933706Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardshrink_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1933894Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1934139Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1934331Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1934891Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1935113Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1939835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_hardtanh_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1940025Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1940284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1940467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1940641Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 58%] 2023-01-11T23:13:47.1940819Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_l1_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1941001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1941182Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1941361Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1941536Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1941727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1941921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_bool PASSED [ 58%] 2023-01-11T23:13:47.1942114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1942306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1942498Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1963514Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_log_softmax_with_dtype_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1963742Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1963943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1964124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1964302Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1964474Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1964651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1964830Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1965001Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mish_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1965186Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1965366Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1965543Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_mse_loss_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1965721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1965898Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1966079Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1966266Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1966520Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1966711Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1966889Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_pdist_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1967074Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int32 PASSED [ 58%] 2023-01-11T23:13:47.1967256Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1967440Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1967626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1967797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1967974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1968149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_prelu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1968322Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float16 PASSED [ 58%] 2023-01-11T23:13:47.1968499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1968670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_float64 PASSED [ 58%] 2023-01-11T23:13:47.1968858Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1969088Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1969263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu6_cuda_uint8 PASSED [ 58%] 2023-01-11T23:13:47.1969432Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 58%] 2023-01-11T23:13:47.1969606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_float32 PASSED [ 58%] 2023-01-11T23:13:47.1969778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int16 PASSED [ 58%] 2023-01-11T23:13:47.1969944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int64 PASSED [ 58%] 2023-01-11T23:13:47.1970114Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_int8 PASSED [ 58%] 2023-01-11T23:13:47.1970280Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_relu_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1970469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1970662Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1970848Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1971027Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmax_with_dtype_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1971217Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1971402Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1971587Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softmin_with_dtype_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1971773Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1971980Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1972161Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softplus_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1972343Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1972524Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_softshrink_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1972701Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1972882Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1973063Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1973242Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1973420Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_tanhshrink_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1973600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1973780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1973963Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1974205Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_threshold_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1974412Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1974908Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1975111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1975304Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1975489Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1975673Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_nn_functional_triplet_margin_loss_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1975836Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_norm_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1975998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1976160Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1976312Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1976469Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1976626Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1976780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ones_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1976945Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1977107Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1977270Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1977429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1977595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1977749Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1977965Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1978127Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_permute_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1978292Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1978452Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1978617Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_positive_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1978778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1978941Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1979117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1979298Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1979461Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1979619Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_pow_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1979779Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1979945Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1980102Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_prod_cuda_int8 
PASSED [ 59%] 2023-01-11T23:13:47.1980263Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1980429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1980663Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1980827Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_randn_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1980990Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1981151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1981309Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1981466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1981622Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1981781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_ravel_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1981933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1982092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1982251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_real_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1982428Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1982595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1982760Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1982921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1983089Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1983244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1983408Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1983567Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reciprocal_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1983756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1983924Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1984083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_remainder_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1984240Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1984403Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1984566Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1984733Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1984950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1985169Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1985354Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1985557Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_repeat_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1985723Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1985892Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1986061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1986220Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1986434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1986601Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1986769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1986933Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_as_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1987096Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1987259Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1987422Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1987585Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1987744Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_reshape_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1987905Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1988073Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1988239Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1988399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1988559Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1988719Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1988877Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_roll_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1989023Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1989192Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1989356Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1989543Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1989787Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1989951Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1990111Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_int64 PASSED [ 59%] 
2023-01-11T23:13:47.1990270Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rot90_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1990429Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1990580Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1990740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1990900Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_round_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1991070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1991233Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1991398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1991560Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1991721Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1991875Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1992061Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1992221Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_int8 PASSED [ 59%] 2023-01-11T23:13:47.1992382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsqrt_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1992541Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1992704Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1992864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1993023Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1993178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1993325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1993480Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1993632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_rsub_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1993795Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_bfloat16 PASSED [ 59%] 2023-01-11T23:13:47.1993957Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex128 PASSED [ 59%] 2023-01-11T23:13:47.1994120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1994284Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_complex64 PASSED [ 59%] 2023-01-11T23:13:47.1994443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1994595Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_float64 PASSED [ 59%] 2023-01-11T23:13:47.1994758Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int16 PASSED [ 59%] 2023-01-11T23:13:47.1994920Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sgn_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1995082Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1995283Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_complex32 PASSED [ 59%] 2023-01-11T23:13:47.1995450Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_float32 PASSED [ 59%] 2023-01-11T23:13:47.1995610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int32 PASSED [ 59%] 2023-01-11T23:13:47.1995769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_int64 PASSED [ 59%] 2023-01-11T23:13:47.1995922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sigmoid_cuda_uint8 PASSED [ 59%] 2023-01-11T23:13:47.1996078Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_bool PASSED [ 59%] 2023-01-11T23:13:47.1996235Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float16 PASSED [ 59%] 2023-01-11T23:13:47.1996399Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.1996554Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.1996712Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1996865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1997017Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sign_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.1997176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.1997329Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1997490Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1997649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.1997846Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_signbit_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.1998013Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.1998173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.1998341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.1998501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.1998653Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.1998815Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.1998976Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.1999134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sin_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.1999299Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.1999464Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.1999629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.1999788Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.1999940Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2000095Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2000251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinc_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2000407Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2000576Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2000741Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2000929Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2001093Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2001251Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2001400Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sinh_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2001583Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2001759Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2001943Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2002124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2002301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2002478Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2002652Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_softmax_with_dtype_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2002824Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2002988Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2003158Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j0_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2003335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_bessel_j1_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2003531Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2003703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2003871Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2004034Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2004201Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2004357Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_entr_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2004528Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2004702Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float32 PASSED [ 60%] 
2023-01-11T23:13:47.2004878Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2005044Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2005215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_erfcx_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2005386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2005558Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2005727Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2005886Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2006053Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2006219Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i0e_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2006382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2006568Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2006737Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2006899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2007070Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2007230Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2007394Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_i1e_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2007571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2007746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2007919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2008092Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_ndtr_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2008286Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2008483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2008675Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2008855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2009069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2009260Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2009430Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2009604Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2009775Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2009944Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2010115Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_logit_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2010310Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2010496Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_1_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2010691Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2010883Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2011069Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2011252Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2011442Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2011632Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_3_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2011828Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2012037Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2012222Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2012398Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2012568Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2012740Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2012907Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtr_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2013085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2013257Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2013426Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2013592Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_ndtri_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2013778Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2013956Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2014143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2014326Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2014672Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2014861Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2015052Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2015236Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2015421Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2015606Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2015777Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2015960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2016140Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2016318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2016491Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2016664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_xlog1py_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2016832Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2016998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2017166Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_special_zeta_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2017319Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2017488Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2017651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2017855Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2018019Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2018179Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2018338Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sqrt_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2018503Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2018660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2018822Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2018986Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2019151Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2019310Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_square_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2019479Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2019647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.2019812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2019969Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2020134Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int64 PASSED [ 60%] 2023-01-11T23:13:47.2020325Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_squeeze_cuda_int8 PASSED [ 60%] 2023-01-11T23:13:47.2020484Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_bool PASSED [ 60%] 2023-01-11T23:13:47.2020651Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_complex32 PASSED [ 60%] 2023-01-11T23:13:47.2020811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float32 PASSED [ 60%] 2023-01-11T23:13:47.2020974Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2021133Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int16 PASSED [ 60%] 2023-01-11T23:13:47.2021293Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_int32 PASSED [ 60%] 2023-01-11T23:13:47.2021447Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_stack_cuda_uint8 PASSED [ 60%] 2023-01-11T23:13:47.2021610Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_complex64 PASSED [ 60%] 2023-01-11T23:13:47.2021769Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float16 PASSED [ 60%] 2023-01-11T23:13:47.2021929Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_cuda_float64 PASSED [ 60%] 2023-01-11T23:13:47.2022100Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_bfloat16 PASSED [ 60%] 2023-01-11T23:13:47.2022269Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_complex128 PASSED [ 60%] 2023-01-11T23:13:47.2022434Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2022600Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_std_mean_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2022753Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2022919Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2023083Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2023245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2023431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2023589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2023746Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2023902Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sub_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2024060Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2024218Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2024382Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2024545Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2024703Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2024865Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2025024Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2025178Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2025335Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2025501Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2025671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2025839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2026028Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2026193Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2026358Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2026519Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_sum_to_size_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2026684Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2026834Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2026997Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2027150Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_t_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2027308Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2027475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2027640Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2027802Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2027960Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2028117Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tan_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2028272Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2028438Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2028602Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2028764Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2028921Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2029102Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2029262Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tanh_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2029428Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_bool XFAIL [ 61%] 2023-01-11T23:13:47.2029589Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float32 XFAIL [ 61%] 2023-01-11T23:13:47.2029844Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_float64 XFAIL [ 61%] 2023-01-11T23:13:47.2030011Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int32 XFAIL [ 61%] 2023-01-11T23:13:47.2030176Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tensor_split_cuda_int64 XFAIL [ 61%] 2023-01-11T23:13:47.2030337Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2030502Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2030664Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2030820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_to_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2031157Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2031318Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2031483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2031644Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2031804Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2031998Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2032159Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trace_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2032332Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2032506Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2032670Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2032839Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2033007Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2033173Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2033341Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2033505Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_transpose_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2033671Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2033835Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2033994Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2034143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_int8 PASSED [ 61%] 
2023-01-11T23:13:47.2034301Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2034467Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_tril_indices_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2034631Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2034797Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2034962Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2035149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2035306Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2035454Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2035613Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_triu_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2035780Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2035953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2036126Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2036303Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2036475Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2036643Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2036812Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2036970Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2037130Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2037296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2037455Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_true_divide_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2037655Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2037816Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2037981Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2038143Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2038296Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_trunc_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2038462Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2038629Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2038792Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2038957Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2039121Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2039281Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unbind_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2039443Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2039621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2039784Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2039950Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2040112Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2040275Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unflatten_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2040446Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2040621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2040820Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2040989Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2041149Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2041314Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2041483Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2041647Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_copy_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2041814Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2041977Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2042141Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2042304Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2042466Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2042621Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2042781Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unfold_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2042953Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2043120Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2043321Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2043486Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_unsqueeze_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2043649Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2043811Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2043964Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2044124Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2044287Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_var_mean_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2044449Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_bool PASSED [ 61%] 2023-01-11T23:13:47.2044615Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2044782Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2044948Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_float16 PASSED [ 61%] 2023-01-11T23:13:47.2045110Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2045277Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2045431Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_int8 PASSED [ 61%] 2023-01-11T23:13:47.2045593Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_as_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2045751Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_bfloat16 PASSED [ 61%] 2023-01-11T23:13:47.2045915Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2046085Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2046245Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_int16 PASSED [ 61%] 2023-01-11T23:13:47.2046427Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_view_cuda_uint8 PASSED [ 61%] 2023-01-11T23:13:47.2046596Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex128 PASSED [ 61%] 2023-01-11T23:13:47.2046756Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex32 PASSED [ 61%] 2023-01-11T23:13:47.2046922Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_complex64 PASSED [ 61%] 2023-01-11T23:13:47.2047084Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float32 PASSED [ 61%] 2023-01-11T23:13:47.2047246Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_float64 PASSED [ 61%] 2023-01-11T23:13:47.2047405Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int32 PASSED [ 61%] 2023-01-11T23:13:47.2047571Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vsplit_cuda_int64 PASSED [ 61%] 2023-01-11T23:13:47.2047732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2047899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2048054Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2048215Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2048374Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2048532Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2048692Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2048880Z 
test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_vstack_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2049059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2049244Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2049419Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2049572Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2049732Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_where_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2049899Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2050059Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2050225Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2050386Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2050546Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2050705Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_xlogy_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2050864Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2051026Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2051184Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2051342Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2051499Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2051660Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2051823Z test_ops.py::TestCommonCUDA::test_python_ref_meta__refs_zeros_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2052009Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2052227Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_native_batch_norm_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2052398Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2052572Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_var_mean_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2052742Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2052917Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2053087Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2053258Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2053425Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2053594Z test_ops.py::TestCommonCUDA::test_python_ref_meta_ops_nvprims_view_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2053753Z 
test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_var_mean_cuda_bfloat16 XFAIL [ 62%] 2023-01-11T23:13:47.2053916Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_bfloat16 XFAIL [ 62%] 2023-01-11T23:13:47.2054085Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex128 XFAIL [ 62%] 2023-01-11T23:13:47.2054249Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_complex64 XFAIL [ 62%] 2023-01-11T23:13:47.2054410Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float16 XFAIL [ 62%] 2023-01-11T23:13:47.2054706Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_float64 XFAIL [ 62%] 2023-01-11T23:13:47.2054868Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int16 XFAIL [ 62%] 2023-01-11T23:13:47.2055029Z test_ops.py::TestCommonCUDA::test_python_ref_ops_nvprims_view_cuda_int32 XFAIL [ 62%] 2023-01-11T23:13:47.2055201Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2055362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2055535Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2055708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2055879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2056045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2056217Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_T_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2056413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2056607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2056803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2056989Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2057178Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2057365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2057557Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bfloat16_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2057747Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2057985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2058177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2058364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2058551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2058730Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2058921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_bool_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2059106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2059300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2059490Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2059676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2059860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2060043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_byte_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2060238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2060456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2060648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2060836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2061025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2061211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cdouble_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2061399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2061591Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2061782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2061972Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2062160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2062337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2062524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2062707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_cfloat_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2062895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2063086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2063271Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2063483Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2063668Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2063852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2064023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_chalf_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2064207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2064393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2064579Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2064767Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2064950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_char_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2065136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2065325Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_complex_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2065510Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2065695Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2065914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex32 PASSED [ 62%] 2023-01-11T23:13:47.2066102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_complex64 PASSED [ 62%] 2023-01-11T23:13:47.2066289Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2066475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2066657Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2066838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2067023Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2067208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_double_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2067400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2067578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2067759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2067942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2068126Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2068303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_float_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2068493Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2068680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2068922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_float64 PASSED [ 62%] 2023-01-11T23:13:47.2069106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2069281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2069458Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2069639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_half_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2069896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2070084Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2070269Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int32 PASSED [ 62%] 2023-01-11T23:13:47.2070448Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2070627Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_int_cuda_uint8 PASSED [ 62%] 2023-01-11T23:13:47.2070805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_bool PASSED [ 62%] 2023-01-11T23:13:47.2070987Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2071164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2071374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_long_cuda_int8 PASSED [ 62%] 2023-01-11T23:13:47.2071561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2071753Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_complex128 PASSED [ 62%] 2023-01-11T23:13:47.2071938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float16 PASSED [ 62%] 2023-01-11T23:13:47.2072126Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_float32 PASSED [ 62%] 2023-01-11T23:13:47.2072307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int16 PASSED [ 62%] 2023-01-11T23:13:47.2072487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs__conversions_short_cuda_int64 PASSED [ 62%] 2023-01-11T23:13:47.2072654Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bfloat16 PASSED [ 62%] 2023-01-11T23:13:47.2072822Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_bool 
PASSED [ 62%]
2023-01-11T23:13:47.2072999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_float64 PASSED [ 62%]
2023-01-11T23:13:47.2073164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int32 PASSED [ 62%]
2023-01-11T23:13:47.2073327Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_int8 PASSED [ 62%]
2023-01-11T23:13:47.2073494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_abs_cuda_uint8 PASSED [ 62%]
2023-01-11T23:13:47.2073670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex32 PASSED [ 62%]
2023-01-11T23:13:47.2073840Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_complex64 PASSED [ 62%]
2023-01-11T23:13:47.2074010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_float16 PASSED [ 62%]
2023-01-11T23:13:47.2074172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int16 PASSED [ 62%]
2023-01-11T23:13:47.2074374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int32 PASSED [ 62%]
2023-01-11T23:13:47.2074543Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int64 PASSED [ 62%]
2023-01-11T23:13:47.2074708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acos_cuda_int8 PASSED [ 62%]
2023-01-11T23:13:47.2074881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bfloat16 PASSED [ 62%]
2023-01-11T23:13:47.2075048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2075226Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2075399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2075570Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2075734Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2075907Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_acosh_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2076076Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2076250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2076421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2076589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2076755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2076956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2077118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_add_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2077295Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2077467Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcdiv_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2077643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2077823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2077993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2078162Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addcmul_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2078332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2078506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2078669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2078837Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2079005Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_addr_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2079172Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2079345Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2079515Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2079687Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2079879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2080048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2080208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_all_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2080387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2080567Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2080743Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2080917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_allclose_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2081087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2081258Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2081431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2081600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2081757Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amax_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2081924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2082091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2082256Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2082453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2082619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2082788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2082957Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2083121Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2083278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_amin_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2083445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2083618Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2083788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2083956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2084124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2084290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_any_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2084462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2084621Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2084789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2084955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_arange_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2085141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2085317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2085524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2085706Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2085882Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2086058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2086224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2086393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2086565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2086768Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2086963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2087153Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2087337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2087523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2087709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2087924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2088102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_partial_views_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2088286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2088476Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2088662Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2088847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2089030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2089215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2089397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_as_strided_scatter_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2089566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2089733Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2089906Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2090073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2090240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asin_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2090412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2090583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2090762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2090963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2091137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2091298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2091467Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2091637Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_asinh_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2091810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2091985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2092154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2092324Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan2_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2092492Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2092659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2092828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2092999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2093168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2093357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2093526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atan_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2093701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2093871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atanh_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2094045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2094215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2094387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2094749Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2094924Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2095098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2095268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_1d_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2095446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2095632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2095814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_complex32 PASSED [ 63%]
2023-01-11T23:13:47.2095983Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2096163Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2096342Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2096559Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_2d_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2096739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_bfloat16 PASSED [ 63%]
2023-01-11T23:13:47.2096920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2097093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_float64 PASSED [ 63%]
2023-01-11T23:13:47.2097267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2097437Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_atleast_3d_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2097600Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2097777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2097955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_and_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2098141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2098324Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2098505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_left_shift_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2098678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2098849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_not_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2099058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2099223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int16 PASSED [ 63%]
2023-01-11T23:13:47.2099399Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_int8 PASSED [ 63%]
2023-01-11T23:13:47.2099568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_or_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2099756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_right_shift_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2099928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2100099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int32 PASSED [ 63%]
2023-01-11T23:13:47.2100276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bitwise_xor_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2100473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2100667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_complex64 PASSED [ 63%]
2023-01-11T23:13:47.2100847Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_float16 PASSED [ 63%]
2023-01-11T23:13:47.2101034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_int64 PASSED [ 63%]
2023-01-11T23:13:47.2101220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_tensors_cuda_uint8 PASSED [ 63%]
2023-01-11T23:13:47.2101400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_bool PASSED [ 63%]
2023-01-11T23:13:47.2101589Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_complex128 PASSED [ 63%]
2023-01-11T23:13:47.2101774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_float32 PASSED [ 63%]
2023-01-11T23:13:47.2101955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2102159Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2102335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_broadcast_to_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2102506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2102685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_bucketize_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2102855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2103021Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2103197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2103369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2103540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2103708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2103874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2104034Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cat_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2104204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2104375Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2104568Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2104738Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2104908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2105075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ceil_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2105250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2105416Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2105586Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2105762Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2105935Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2106103Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2106276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2106447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_chunk_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2106619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2106787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2106948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2107117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2107300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2107475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2107675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2107856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2108030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2108199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2108368Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_max_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2108538Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2108720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2108896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2109072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2109257Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2109454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clamp_min_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2109634Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2109869Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2110041Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2110228Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2110397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_clone_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2110587Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2110771Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2110951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2111131Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2111306Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2111480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_column_stack_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2111653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2111825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2111997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2112167Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2112337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2112508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2112675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2112844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2113030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2113239Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2113415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2113593Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2113775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_conj_physical_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2113960Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2114141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2114333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2114514Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2114697Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2114872Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_constant_pad_nd_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2115043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2115219Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2115395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2115567Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2115779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2115950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_contiguous_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2116126Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2116296Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2116464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_copysign_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2116626Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2116793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2116964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2117133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2117300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cos_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2117475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2117647Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2117814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2117981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2118142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2118308Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2118477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cosh_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2118656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2118857Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2119025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2119193Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2119363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_cumsum_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2119524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2119700Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2119875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2120043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2120211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2120377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2120542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2120721Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2120895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2121070Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2121276Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2121452Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2121627Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2121796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int64 PASSED [ 64%]
2023-01-11T23:13:47.2121969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2122138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diag_embed_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2122320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2122505Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex32 PASSED [ 64%]
2023-01-11T23:13:47.2122684Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2122868Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2123048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2123227Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_copy_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2123402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2123581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2123756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2123929Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2124101Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2124287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2124471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2124661Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2124845Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2125028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2125207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2125391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2125574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_diagonal_scatter_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2125746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_bool PASSED [ 64%]
2023-01-11T23:13:47.2125914Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2126085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2126259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_digamma_cuda_uint8 PASSED [ 64%]
2023-01-11T23:13:47.2126446Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_bfloat16 PASSED [ 64%]
2023-01-11T23:13:47.2126632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_float32 PASSED [ 64%]
2023-01-11T23:13:47.2126842Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2127026Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2127211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_floor_rounding_cuda_int8 PASSED [ 64%]
2023-01-11T23:13:47.2127404Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 64%]
2023-01-11T23:13:47.2127588Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 64%]
2023-01-11T23:13:47.2127776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2127962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int16 PASSED [ 64%]
2023-01-11T23:13:47.2128150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_no_rounding_mode_cuda_int32 PASSED [ 64%]
2023-01-11T23:13:47.2128336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float16 PASSED [ 64%]
2023-01-11T23:13:47.2128523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_float64 PASSED [ 64%]
2023-01-11T23:13:47.2128703Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2128886Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2129069Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_div_trunc_rounding_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2129246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2129412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2129583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2129778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2129950Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dsplit_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2130122Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2130294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_dstack_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2130506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_bool SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2130724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2130940Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_bool SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_complex32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2131794Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float16 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int16 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2132884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_empty_like_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 65%]
2023-01-11T23:13:47.2133051Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2133217Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2133385Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eq_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2133558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2133730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2133897Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2134062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2134229Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erf_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2134400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2134737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2134902Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2135075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2135245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfc_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2135457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2135632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2135798Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2135965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2136132Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2136301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_erfinv_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2136464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2136631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2136799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp2_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2136966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2137143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2137316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2137487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2137659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2137854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2138022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2138191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_exp_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2138373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2138547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2138724Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2138898Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2139068Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_as_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2139240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2139409Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2139590Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2139763Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2139932Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2140102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expand_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2140273Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2140445Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2140616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2140785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_expm1_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2140974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2141150Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2141316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2141484Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2141651Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2141815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2141984Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2142147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_eye_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2142315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2142495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2142669Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2142839Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2143010Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2143177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2143369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2143539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2143709Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2143874Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2144043Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2144213Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2144391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2144563Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2144731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2144895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2145062Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2145247Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2145421Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2145597Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2145775Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_fftshift_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2145956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2146139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2146336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2146508Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2146685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2146855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2147016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2147187Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2147362Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfft_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2147534Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2147716Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2147893Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2148066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2148240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2148412Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_hfftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2148584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2148792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2148968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2149146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2149321Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft2_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2149502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_complex64 PASSED [ 65%]
2023-01-11T23:13:47.2149759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2149938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2150110Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2150274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2150442Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2150614Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2150793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2150971Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2151147Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2151319Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2151487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2151658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2151818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2152015Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftn_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2152204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_bfloat16 PASSED [ 65%]
2023-01-11T23:13:47.2152391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex128 PASSED [ 65%]
2023-01-11T23:13:47.2152576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_complex32 PASSED [ 65%]
2023-01-11T23:13:47.2152758Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2152939Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2153118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2153300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2153471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ifftshift_cuda_int8 PASSED [ 65%]
2023-01-11T23:13:47.2153644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2153822Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2153994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2154164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2154405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft2_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2154577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_bool PASSED [ 65%]
2023-01-11T23:13:47.2154754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_float64 PASSED [ 65%]
2023-01-11T23:13:47.2154928Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int16 PASSED [ 65%]
2023-01-11T23:13:47.2155091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int32 PASSED [ 65%]
2023-01-11T23:13:47.2155260Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_int64 PASSED [ 65%]
2023-01-11T23:13:47.2155430Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfft_cuda_uint8 PASSED [ 65%]
2023-01-11T23:13:47.2155607Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float16 PASSED [ 65%]
2023-01-11T23:13:47.2155784Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_float32 PASSED [ 65%]
2023-01-11T23:13:47.2155958Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_ihfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2156141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2156320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2156491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2156654Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2156821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft2_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2156998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2157175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfft_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2157369Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2157553Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2157732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2157908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2158085Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2158252Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2158423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2158596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_irfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2158771Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2158944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2159117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2159287Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2159460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft2_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2159632Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2159799Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2159993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2160163Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2160333Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2160500Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfft_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2160679Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2160857Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2161028Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2161192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fft_rfftn_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2161367Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2161539Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2161713Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2161883Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2162052Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2162222Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2162390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2162560Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2162720Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2162912Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fill_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2163095Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex32 PASSED [ 66%]
2023-01-11T23:13:47.2163274Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2163449Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2163620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2163790Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2163964Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2164137Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flatten_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2164302Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2164475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2164653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2164825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2165000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2165171Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2165366Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2165536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2165707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flip_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2165881Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex128 PASSED [ 66%]
2023-01-11T23:13:47.2166061Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_complex64 PASSED [ 66%]
2023-01-11T23:13:47.2166233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2166402Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2166572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fliplr_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2166744Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2166922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2167094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2167263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2167428Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_flipud_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2167605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_bool PASSED [ 66%]
2023-01-11T23:13:47.2167780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2167963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_float_power_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2168138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2168309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2168506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2168678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2168853Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2169033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2169216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2169395Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2169577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_floor_divide_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2169748Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2169922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2170094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2170263Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmax_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2170429Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2170599Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2170769Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_float64 PASSED [ 66%]
2023-01-11T23:13:47.2170963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2171130Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2171301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmin_cuda_uint8 PASSED [ 66%]
2023-01-11T23:13:47.2171478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_bfloat16 PASSED [ 66%]
2023-01-11T23:13:47.2171646Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float16 PASSED [ 66%]
2023-01-11T23:13:47.2171816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_float32 PASSED [ 66%]
2023-01-11T23:13:47.2171975Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int16 PASSED [ 66%]
2023-01-11T23:13:47.2172139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2172307Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int64 PASSED [ 66%]
2023-01-11T23:13:47.2172477Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_fmod_cuda_int8 PASSED [ 66%]
2023-01-11T23:13:47.2172644Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int32 PASSED [ 66%]
2023-01-11T23:13:47.2172812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int64
PASSED [ 66%] 2023-01-11T23:13:47.2172978Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gcd_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2173148Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2173315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2173475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2173641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2173808Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ge_cuda_uint8 PASSED [ 66%] 2023-01-11T23:13:47.2174001Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2174169Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2174332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2174608Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_gt_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2174789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2174962Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2175142Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2175320Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_heaviside_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2175494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2175676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2175854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2176027Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2176197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hsplit_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2176368Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2176537Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2176755Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2176930Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_float32 PASSED [ 66%] 2023-01-11T23:13:47.2177098Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2177268Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int32 PASSED [ 66%] 2023-01-11T23:13:47.2177438Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2177609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2177775Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hstack_cuda_uint8 PASSED [ 66%] 2023-01-11T23:13:47.2177952Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2178117Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_hypot_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2178288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2178455Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2178622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_i0_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2178797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float32 PASSED [ 66%] 2023-01-11T23:13:47.2178968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_igammac_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2179138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex128 PASSED [ 66%] 2023-01-11T23:13:47.2179316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex32 PASSED [ 66%] 2023-01-11T23:13:47.2179517Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_imag_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2179731Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2180019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2180235Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2180426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_add_cuda_int16 PASSED [ 66%] 2023-01-11T23:13:47.2180629Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_bool PASSED [ 66%] 2023-01-11T23:13:47.2180814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_complex64 PASSED [ 66%] 2023-01-11T23:13:47.2180994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2181179Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_copy_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2181363Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2181530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int64 PASSED [ 66%] 2023-01-11T23:13:47.2181793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_fill_cuda_int8 PASSED [ 66%] 2023-01-11T23:13:47.2181992Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2182196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float16 PASSED [ 66%] 2023-01-11T23:13:47.2182377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_index_select_cuda_float64 PASSED [ 66%] 2023-01-11T23:13:47.2182640Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_bfloat16 PASSED [ 66%] 2023-01-11T23:13:47.2182817Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2182990Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2183159Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2183321Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isclose_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2183495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2183671Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2183843Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2184015Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2184184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2184355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isfinite_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2184523Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2184692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2184866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2185035Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2185202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2185374Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2185540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2185737Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isinf_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2185908Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2186075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2186243Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2186414Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2186581Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2186750Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2186917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isnan_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2187092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2187262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2187431Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isneginf_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2187608Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2187776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2187946Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2188143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2188310Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2188478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isposinf_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2188650Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2188816Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2188980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2189149Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2189312Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_isreal_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2189480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2189645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2189879Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2190044Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lcm_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2190215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2190381Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2190547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2190704Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2190872Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_le_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2191048Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2191244Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2191415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2191584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lerp_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2191752Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2191923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2192093Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2192259Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_int8 PASSED [ 67%] 
2023-01-11T23:13:47.2192426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lgamma_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2192619Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2192812Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2192999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2193184Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_matrix_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2193364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2193542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_norm_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2193759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2193931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svd_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2194112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_svdvals_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2194387Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2194592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2194806Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linalg_vector_norm_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2194989Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2195177Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2195350Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_int64 XFAIL [ 67%] 2023-01-11T23:13:47.2195522Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_linspace_cuda_uint8 XFAIL [ 67%] 2023-01-11T23:13:47.2195686Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2195855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2196025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2196195Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2196364Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2196540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2196712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2196922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log10_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2197097Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2197257Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2197435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2197606Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2197772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2197942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2198111Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2198284Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log1p_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2198453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2198622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2198781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log2_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2198951Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2199120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2199318Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2199486Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2199656Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2199825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2200016Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2200211Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2200394Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2200583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2200774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2200963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_log_softmax_with_dtype_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2201140Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2201323Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2201507Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2201686Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_and_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2201860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2202036Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2202224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2202428Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2202609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2202785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_not_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2202968Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2203149Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2203328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2203502Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2203664Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_or_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2203849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2204031Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2204207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2204383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2204561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2204740Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2204942Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2205120Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logical_xor_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2205286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2205466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2205648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_complex64 PASSED [ 67%] 2023-01-11T23:13:47.2205825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float16 PASSED [ 67%] 2023-01-11T23:13:47.2205994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2206165Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int16 XFAIL [ 67%] 2023-01-11T23:13:47.2206339Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int32 XFAIL [ 67%] 2023-01-11T23:13:47.2206509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logspace_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2206690Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_bfloat16 PASSED [ 67%] 
2023-01-11T23:13:47.2206855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2207030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2207202Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int32 PASSED [ 67%] 2023-01-11T23:13:47.2207371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_logsumexp_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2207540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bfloat16 PASSED [ 67%] 2023-01-11T23:13:47.2207712Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_bool PASSED [ 67%] 2023-01-11T23:13:47.2207904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float32 PASSED [ 67%] 2023-01-11T23:13:47.2208072Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2208232Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int16 PASSED [ 67%] 2023-01-11T23:13:47.2208398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int64 PASSED [ 67%] 2023-01-11T23:13:47.2208560Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_int8 PASSED [ 67%] 2023-01-11T23:13:47.2208723Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_lt_cuda_uint8 PASSED [ 67%] 2023-01-11T23:13:47.2208910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex128 PASSED [ 67%] 2023-01-11T23:13:47.2209109Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_complex32 PASSED [ 67%] 2023-01-11T23:13:47.2209316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_float64 PASSED [ 67%] 2023-01-11T23:13:47.2209501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2209678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_masked_fill_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2209844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2210019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2210191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2210360Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_maximum_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2210554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2210727Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2210895Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mean_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2211095Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2211286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2211471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2211659Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2211844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2212037Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2212225Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_list_of_tensors_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2212423Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2212620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2212813Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2213000Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2213188Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_meshgrid_variadic_tensors_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2213383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2213556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2213730Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2213900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2214067Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2214233Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2214401Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_minimum_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2214781Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2214965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2215130Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2215297Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2215462Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2215628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_movedim_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2215800Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2215965Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2216191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2216361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2216523Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2216690Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2216856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2217024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2217186Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_mul_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2217365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2217544Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2217719Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2217894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2218057Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nan_to_num_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2218241Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2218425Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_complex32 PASSED [ 68%] 2023-01-11T23:13:47.2218601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2218777Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2218953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2219160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_copy_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2219335Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2219503Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2219670Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2219838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2220007Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2220179Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2220352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_narrow_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2220541Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2220729Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_native_layer_norm_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2220901Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2221068Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2221227Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2221392Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2221558Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ne_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2221756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex32 PASSED [ 68%] 2023-01-11T23:13:47.2221929Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2222099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2222267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2222435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2222602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_neg_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2222814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_bfloat16 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223033Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float32 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223249Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int64 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223674Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_int8 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2223885Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_cuda_uint8 SKIPPED (Expected: empty is not comparable) [ 68%] 2023-01-11T23:13:47.2224119Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bfloat16 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_bool SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex128 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2224852Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_complex64 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225079Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int16 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int32 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int64 SKIPPED (Expected: empty_strided is not comparable) [ 68%] 2023-01-11T23:13:47.2225756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_empty_strided_cuda_int8 SKIPPED (Expected: empty_strided is 
not comparable) [ 68%] 2023-01-11T23:13:47.2225937Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2226118Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2226300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2226475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2226648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_full_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2226824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2227011Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2227191Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_complex128 PASSED [ 68%] 2023-01-11T23:13:47.2227372Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2227542Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2227708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2227875Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_ones_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2228042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_bool PASSED [ 68%] 2023-01-11T23:13:47.2228220Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2228398Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2228564Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2228742Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_int8 PASSED [ 68%] 2023-01-11T23:13:47.2228917Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_new_zeros_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2229094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nextafter_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2229337Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_bfloat16 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2229576Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2229900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_alpha_dropout_cuda_float32 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2230143Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2230355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_celu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2230637Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float16 SKIPPED (Expected: dropout is not comparable) [ 68%] 
2023-01-11T23:13:47.2230860Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 68%] 2023-01-11T23:13:47.2231046Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_gelu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2231232Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2231420Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2231605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2231789Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_glu_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2231981Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2232168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_group_norm_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2232355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2232582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2232765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2232955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2233141Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int32 PASSED [ 68%] 2023-01-11T23:13:47.2233328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hardtanh_cuda_int64 PASSED [ 68%] 2023-01-11T23:13:47.2233531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2233725Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2233912Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_huber_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2234102Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2234294Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2234473Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_l1_loss_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2234667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2234856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_layer_norm_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2235047Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2235240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_leaky_relu_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2235449Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2235685Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_complex64 PASSED [ 68%] 2023-01-11T23:13:47.2235890Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_float64 PASSED [ 68%] 2023-01-11T23:13:47.2236092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_log_softmax_with_dtype_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2236292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2236485Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_int16 PASSED [ 68%] 2023-01-11T23:13:47.2236691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_margin_ranking_loss_cuda_uint8 PASSED [ 68%] 2023-01-11T23:13:47.2236878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2237065Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mish_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2237255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2237444Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2237631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_mse_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2237818Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_bfloat16 PASSED [ 68%] 2023-01-11T23:13:47.2238024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float16 PASSED [ 68%] 2023-01-11T23:13:47.2238209Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_nll_loss_cuda_float32 PASSED [ 68%] 2023-01-11T23:13:47.2238405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2238604Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2238802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2238999Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2239190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2239389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pairwise_distance_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2239582Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2239768Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_pdist_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2239964Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2240154Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2240346Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2240536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2240732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_poisson_nll_loss_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2240985Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2241174Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2241359Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_prelu_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2241547Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2241732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2241921Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu6_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2242099Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2242286Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2242470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2242653Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2242835Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2243018Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_relu_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2243200Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2243405Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2243595Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_selu_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2243793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2243994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2244192Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2244391Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2244594Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2244793Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2244997Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2245196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softmin_with_dtype_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2245390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_softplus_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2245584Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2245774Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2245978Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2246228Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2246496Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2246694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_tanhshrink_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2246885Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2247077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2247266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2247451Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2247645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_threshold_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2247846Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2248058Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2248264Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2248466Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2248666Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2248889Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2249086Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_nn_functional_triplet_margin_loss_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2249266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_norm_cuda_complex128 
PASSED [ 69%] 2023-01-11T23:13:47.2249435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2249601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2249770Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2249944Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2250113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2250280Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ones_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2250460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2250643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2250823Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2250994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2251157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2251328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2251501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_permute_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2251680Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2251878Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2252050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2252224Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2252393Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_positive_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2252566Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2252732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2252904Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2253078Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2253248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2253417Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_pow_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2253592Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2253760Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2253938Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2254100Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2254267Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2254464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_prod_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2254780Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2254961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2255139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2255309Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2255475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_randn_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2255645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2255807Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2255986Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2256161Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2256332Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2256501Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2256667Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2256854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2257082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2257304Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2257471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_ravel_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2257641Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2257866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2258045Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2258216Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2258383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2258551Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2258718Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_real_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2258909Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2259088Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_complex64 PASSED [ 69%] 
2023-01-11T23:13:47.2259275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2259460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reciprocal_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2259642Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_remainder_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2259814Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2259990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2260160Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2260365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2260540Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2260705Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_repeat_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2260887Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2261066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2261246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2261426Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2261601Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_as_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2261776Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_bfloat16 PASSED [ 69%] 2023-01-11T23:13:47.2261955Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2262133Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_complex32 PASSED [ 69%] 2023-01-11T23:13:47.2262303Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2262475Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2262647Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2262821Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2262993Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2263168Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_reshape_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2263358Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2263533Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2263696Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_complex64 PASSED [ 69%] 2023-01-11T23:13:47.2263864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_float32 PASSED [ 69%] 2023-01-11T23:13:47.2264030Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2264198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2264365Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_roll_cuda_int8 PASSED [ 69%] 2023-01-11T23:13:47.2264536Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_bool PASSED [ 69%] 2023-01-11T23:13:47.2264717Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_complex128 PASSED [ 69%] 2023-01-11T23:13:47.2264888Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2265055Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int16 PASSED [ 69%] 2023-01-11T23:13:47.2265215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2265382Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_int64 PASSED [ 69%] 2023-01-11T23:13:47.2265549Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rot90_cuda_uint8 PASSED [ 69%] 2023-01-11T23:13:47.2265722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float16 PASSED [ 69%] 2023-01-11T23:13:47.2265918Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_float64 PASSED [ 69%] 2023-01-11T23:13:47.2266087Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_round_cuda_int32 PASSED [ 69%] 2023-01-11T23:13:47.2266265Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2266439Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2266609Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2266772Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2266939Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2267106Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsqrt_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2267278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2267457Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2267631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2267801Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2267969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2268138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2268298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2268465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_rsub_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2268636Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bfloat16 PASSED [ 70%] 
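(Aside: each test_python_ref_torch_fallback entry above exercises one torch._refs Python reference implementation for one op and dtype, checking that the reference, executed with ordinary torch operations, agrees with the eager op. A minimal sketch of that comparison, simplified and hypothetical rather than the actual harness in test_ops.py:

    import torch
    import torch._refs as refs

    # Pick CUDA when available, matching the TestCommonCUDA runs above.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(4, 5, device=device)

    eager = torch.nn.functional.relu(x)  # the eager ATen op
    ref = refs.nn.functional.relu(x)     # the Python reference, run via torch ops
    torch.testing.assert_close(ref, eager)

The real tests iterate over OpInfo sample inputs per dtype, which is why the same op appears once per dtype in this log.)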
2023-01-11T23:13:47.2268825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2269002Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2269196Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2269389Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2269556Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2269795Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2269969Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2270139Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sgn_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2270316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2270494Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2270675Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2270854Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2271027Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2271197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2271357Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2271565Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2271735Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sigmoid_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2271900Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2272066Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2272238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2272406Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2272574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sign_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2272746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2272910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2273081Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2273248Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2273418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2273585Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_signbit_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2273751Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2273920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2274092Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2274262Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2274453Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2274622Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2274787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sin_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2274961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2275125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2275298Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2275471Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2275643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2275803Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinc_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2275970Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2276144Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2276311Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2276479Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2276645Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sinh_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2276834Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2277049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2277240Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2277419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2277605Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2277788Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2277973Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_softmax_with_dtype_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2278155Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2278341Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j0_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2278526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2278716Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2278896Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2279077Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_bessel_j1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2279278Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2279480Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2279658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2279841Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2280042Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2280223Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_entr_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2280397Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2280577Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2280756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2280923Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_erfcx_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2281107Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2281285Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2281464Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2281639Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2281815Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2281990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i0e_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2282164Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2282336Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2282531Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2282708Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2282890Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2283060Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_i1e_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2283246Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2283432Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2283616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2283797Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2283976Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_ndtr_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2284173Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2284376Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2284578Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex32 PASSED [ 70%] 2023-01-11T23:13:47.2284778Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_complex64 PASSED [ 70%] 2023-01-11T23:13:47.2284980Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2285175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2285373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_log_softmax_with_dtype_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2285572Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2285756Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2285931Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2286112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2286292Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2286470Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_logit_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2286676Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2286880Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2287082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2287281Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2287478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_1_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2287678Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2287871Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_3_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2288096Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2288296Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2288491Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2288691Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_multigammaln_mvlgamma_p_5_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2288870Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2289049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2289238Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtr_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2289419Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2289596Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2289779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2289959Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2290136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_int8 PASSED [ 70%] 2023-01-11T23:13:47.2290314Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_ndtri_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2290506Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2290707Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 70%] 2023-01-11T23:13:47.2290936Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float16 PASSED [ 70%] 2023-01-11T23:13:47.2291134Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2291328Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2291513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int16 PASSED [ 70%] 2023-01-11T23:13:47.2291701Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2291884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2292073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_softmax_with_dtype_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2292260Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2292456Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_float32 PASSED [ 70%] 2023-01-11T23:13:47.2292650Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2292844Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_spherical_bessel_j0_cuda_uint8 PASSED [ 70%] 2023-01-11T23:13:47.2293029Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bfloat16 PASSED [ 70%] 2023-01-11T23:13:47.2293204Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_bool PASSED [ 70%] 2023-01-11T23:13:47.2293415Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_float64 PASSED [ 70%] 2023-01-11T23:13:47.2293598Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int32 PASSED [ 70%] 2023-01-11T23:13:47.2293779Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_int64 PASSED [ 70%] 2023-01-11T23:13:47.2293956Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_xlog1py_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2294136Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2294316Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2294643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2294824Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_special_zeta_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2294994Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2295175Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2295348Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2295519Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2295688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2295855Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2296025Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sqrt_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2296197Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2296371Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2296574Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2296745Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_square_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2296922Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2297103Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2297282Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2297454Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2297623Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2297792Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2297963Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_squeeze_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2298124Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2298301Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2298472Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2298643Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2298810Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2298977Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2299182Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2299355Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2299526Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_stack_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2299688Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2299856Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2300032Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2300208Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2300383Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_std_mean_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2300554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2300722Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2300891Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2301049Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2301215Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2301377Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sub_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2301548Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2301715Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float16 PASSED [ 71%] 
2023-01-11T23:13:47.2301884Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2302075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2302252Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2302435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2302610Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2302787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2302961Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_sum_to_size_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2303125Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2303295Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2303461Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2303628Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2303787Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2303953Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_t_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2304113Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2304290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2304465Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2304660Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2304825Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2304990Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2305157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tan_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2305329Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2305495Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2305665Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2305836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tanh_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2306024Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2306207Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2306388Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2306567Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2306746Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2306920Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tensor_split_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2307082Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2307255Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2307427Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2307616Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2307782Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2307948Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2308112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_to_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2308290Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2308460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2308620Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2308796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2308966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trace_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2309157Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2309361Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2309561Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2309796Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_transpose_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2309966Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2310138Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2310340Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2310513Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2310681Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2310849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2311014Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2311180Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_tril_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2311352Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_complex64 PASSED [ 71%] 
2023-01-11T23:13:47.2311521Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2311692Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2311849Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2312019Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2312198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_triu_indices_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2312373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2312554Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2312732Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_complex32 XFAIL [ 71%] 2023-01-11T23:13:47.2312910Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2313091Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2313288Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2313459Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_true_divide_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2313635Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2313805Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2313975Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2314144Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_trunc_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2314317Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2314498Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2314678Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_complex32 PASSED [ 71%] 2023-01-11T23:13:47.2314838Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unbind_cuda_int8 PASSED [ 71%] 2023-01-11T23:13:47.2315017Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2315198Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2315373Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2315552Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2315752Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2315925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unflatten_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2316112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2316293Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2316460Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_copy_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2316631Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2316802Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2316974Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2317146Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int16 PASSED [ 71%] 2023-01-11T23:13:47.2317315Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int32 PASSED [ 71%] 2023-01-11T23:13:47.2317487Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_int64 PASSED [ 71%] 2023-01-11T23:13:47.2317658Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unfold_cuda_uint8 PASSED [ 71%] 2023-01-11T23:13:47.2317836Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2318006Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2318183Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_unsqueeze_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2318354Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2318524Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2318694Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2318894Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2319075Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_var_mean_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2319250Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bfloat16 PASSED [ 71%] 2023-01-11T23:13:47.2319418Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_bool PASSED [ 71%] 2023-01-11T23:13:47.2319585Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex128 PASSED [ 71%] 2023-01-11T23:13:47.2319759Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_complex64 PASSED [ 71%] 2023-01-11T23:13:47.2319937Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float16 PASSED [ 71%] 2023-01-11T23:13:47.2320105Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float32 PASSED [ 71%] 2023-01-11T23:13:47.2320275Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_float64 PASSED [ 71%] 2023-01-11T23:13:47.2320447Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int16 PASSED [ 72%] 2023-01-11T23:13:47.2320617Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2320785Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2320953Z 
test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_as_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2321112Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2321305Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2321482Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex128 PASSED [ 72%] 2023-01-11T23:13:47.2321659Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_complex32 PASSED [ 72%] 2023-01-11T23:13:47.2321828Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_float64 PASSED [ 72%] 2023-01-11T23:13:47.2321998Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_view_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2322170Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2322339Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2322509Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2322673Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vsplit_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2322850Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2323022Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2323190Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int32 PASSED [ 72%] 2023-01-11T23:13:47.2323359Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_vstack_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2323530Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2323699Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float16 PASSED [ 72%] 2023-01-11T23:13:47.2323866Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_float64 PASSED [ 72%] 2023-01-11T23:13:47.2324030Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int64 PASSED [ 72%] 2023-01-11T23:13:47.2324199Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_int8 PASSED [ 72%] 2023-01-11T23:13:47.2324390Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_where_cuda_uint8 PASSED [ 72%] 2023-01-11T23:13:47.2324569Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2324739Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_float16 PASSED [ 72%] 2023-01-11T23:13:47.2324905Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int16 PASSED [ 72%] 2023-01-11T23:13:47.2325073Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_xlogy_cuda_int64 PASSED [ 72%] 2023-01-11T23:13:47.2325245Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bfloat16 PASSED [ 72%] 2023-01-11T23:13:47.2325413Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_bool PASSED [ 72%] 2023-01-11T23:13:47.2325579Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex32 PASSED [ 72%] 
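(Aside: besides PASSED, this shard reports XFAIL for tests the suite expects to fail, e.g. the true_divide complex32 entry above, and SKIPPED with the skip reason in parentheses, as in the as_strided entries below. A minimal, hypothetical pytest sketch, not from PyTorch's suite, that yields all three statuses under pytest -v:

    import pytest

    def test_passes():                    # reported as PASSED
        assert 1 + 1 == 2

    @pytest.mark.xfail(reason="known divergence", strict=False)
    def test_expected_failure():          # reported as XFAIL when it fails
        assert 1 + 1 == 3

    @pytest.mark.skip(reason="Errors when storage_offset is included")
    def test_skipped():                   # reported as SKIPPED (reason shown)
        pass

PyTorch's suite typically applies the equivalent skip/expected-failure decorators through its OpInfo metadata rather than bare pytest marks.)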
2023-01-11T23:13:47.2325754Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2325925Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float16 PASSED [ 72%]
2023-01-11T23:13:47.2326094Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2326266Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int32 PASSED [ 72%]
2023-01-11T23:13:47.2326435Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int64 PASSED [ 72%]
2023-01-11T23:13:47.2326602Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback__refs_zeros_cuda_int8 PASSED [ 72%]
2023-01-11T23:13:47.2326864Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_native_batch_norm_cuda_bfloat16 PASSED [ 72%]
2023-01-11T23:13:47.2327050Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_var_mean_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2327227Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bfloat16 PASSED [ 72%]
2023-01-11T23:13:47.2327400Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_bool PASSED [ 72%]
2023-01-11T23:13:47.2327583Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex32 PASSED [ 72%]
2023-01-11T23:13:47.2327765Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2327945Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_float64 PASSED [ 72%]
2023-01-11T23:13:47.2328123Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int16 PASSED [ 72%]
2023-01-11T23:13:47.2328300Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int32 PASSED [ 72%]
2023-01-11T23:13:47.2328478Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_int8 PASSED [ 72%]
2023-01-11T23:13:47.2328648Z test_ops.py::TestCommonCUDA::test_python_ref_torch_fallback_ops_nvprims_view_cuda_uint8 PASSED [ 72%]
2023-01-11T23:13:47.2328810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2328978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_H_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_T_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___getitem___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329490Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___radd___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2329665Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rdiv___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2329841Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmatmul___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmod___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330202Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rmul___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330375Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2330543Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rpow___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330708Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager___rsub___cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2330897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager__softmax_backward_data_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331070Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_acosh_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331239Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2331408Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_add_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2331578Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcdiv_cuda_complex64 XFAIL [ 72%]
2023-01-11T23:13:47.2331738Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addcmul_cuda_float32 XFAIL [ 72%]
2023-01-11T23:13:47.2331910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2332076Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmm_decomposed_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332456Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_addmv_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2332619Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_all_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2332952Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_amin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333122Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_aminmax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333281Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333448Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argmin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333618Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argsort_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2333793Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2333964Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_argwhere_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2334185Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_cuda_float32 SKIPPED (Errors when storage_offset is included) [ 72%]
2023-01-11T23:13:47.2334430Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_complex64 SKIPPED (Fails in most cases, passes on LAZY for some reason) [ 72%]
2023-01-11T23:13:47.2334911Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_as_strided_scatter_cuda_float32 SKIPPED (Fails in most cases, passes on LAZY for some reason) [ 72%]
2023-01-11T23:13:47.2335083Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_asin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335245Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atan_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335429Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2335603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_1d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2335828Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336005Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_2d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336183Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336355Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_atleast_3d_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336532Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_baddbmm_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2336702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bfloat16_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2336873Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2337045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_block_diag_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337215Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bool_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337415Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_shapes_cuda_float32 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:13:47.2337603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2337787Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_tensors_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2337969Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_broadcast_to_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2338143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_bucketize_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338359Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cartesian_prod_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2338520Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdist_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338696Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cdouble_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2338866Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cfloat_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339034Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_chalf_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339202Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_char_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339388Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339571Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_inverse_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2339756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2339937Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cholesky_solve_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340101Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_clamp_max_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340449Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_column_stack_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2340629Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_combinations_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2340800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_conj_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2340983Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2341167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_constant_pad_nd_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341338Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_corrcoef_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341524Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2341693Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cos_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2341862Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cosh_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_count_nonzero_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2342215Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cov_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2342384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cross_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342553Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummax_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342722Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cummin_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2342902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2343067Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumprod_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2343449Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_cumulative_trapezoid_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343619Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_deg2rad_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343787Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diag_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2343963Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagflat_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344172Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344354Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diagonal_copy_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2344525Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_diff_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2344683Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dist_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2344874Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_no_rounding_mode_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345060Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_div_trunc_rounding_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2345230Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345399Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dot_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2345574Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_dstack_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345750Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2345919Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_einsum_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2346112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_cuda_complex64 SKIPPED (Skipped!) [ 72%]
2023-01-11T23:13:47.2346281Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_empty_like_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354216Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eq_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2354399Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erf_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354561Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_erfc_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354733Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp2_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2354954Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_complex64 PASSED [ 72%]
2023-01-11T23:13:47.2355119Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_exp_cuda_float32 PASSED [ 72%]
2023-01-11T23:13:47.2355295Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expand_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2355457Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_expm1_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2355640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_eye_cuda_complex64 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:13:47.2355814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2355997Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_fftshift_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356178Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356351Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_hfftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2356534Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356709Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2356883Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifft_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357054Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftn_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2357234Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ifftshift_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357409Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_ihfftn_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357633Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft2_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2357810Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfft_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2357992Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_irfftn_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2358169Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fft_rfft2_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358343Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2358512Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fill_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358680Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flatten_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2358852Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flip_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359030Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fliplr_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359204Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359376Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_flipud_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2359550Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2359722Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2359902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_float_power_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360072Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_floor_divide_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360242Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360418Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_fmod_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360583Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_frac_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2360790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_full_like_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2360967Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2361137Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gather_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361303Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ge_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361473Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_geqrf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361645Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gradient_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2361814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_gt_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2361988Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_half_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362167Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_heaviside_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_histc_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362514Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_hstack_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2362687Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_add_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2362867Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363035Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_copy_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2363216Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_fill_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363594Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_put_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2363778Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_index_select_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2363952Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_inner_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364122Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_int_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364297Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isclose_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364469Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isfinite_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364630Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2364806Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2364979Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isinf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365154Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isposinf_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365329Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_isreal_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2365502Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_istft_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2365697Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_2inputs_2outputs_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2365902Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2366106Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_4inputs_with_extra_args_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366289Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366512Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_binary_return_by_ref_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2366702Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2366887Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_jiterator_unary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kron_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367229Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_kthvalue_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367411Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2367587Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cond_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2367776Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cross_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2367950Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368126Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2368317Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_det_singular_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368496Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2368669Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eig_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2368847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigh_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2369245Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_eigvalsh_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369444Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_householder_product_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2369790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2369973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_inv_ex_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370159Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2370352Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_factor_ex_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370536Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_ldl_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2370734Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lstsq_grad_oriented_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2370918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_factor_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371100Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_lu_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371468Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_power_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371653Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2371851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2372052Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_matrix_rank_hermitian_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372252Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372457Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2372652Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_hermitian_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2372910Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 73%]
2023-01-11T23:13:47.2373160Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 73%]
2023-01-11T23:13:47.2373334Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_qr_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2373520Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_slogdet_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2373704Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2373881Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2374076Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_solve_triangular_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_svd_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374445Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2374756Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorinv_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2375000Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_tensorsolve_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2375180Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vander_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2375369Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_vector_norm_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2375549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_complex64 XFAIL [ 73%]
2023-01-11T23:13:47.2375723Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linspace_cuda_float32 XFAIL [ 73%]
2023-01-11T23:13:47.2375897Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2376066Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log10_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log2_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376607Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2376799Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_log_softmax_with_dtype_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2376995Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_not_cuda_complex64 SKIPPED (Skipped!) [ 73%]
2023-01-11T23:13:47.2377178Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logical_xor_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2377356Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logspace_cuda_complex64 XFAIL [ 73%]
2023-01-11T23:13:47.2377535Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_logsumexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2377712Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2377881Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_long_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_solve_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2378269Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2378447Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_lu_unpack_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378613Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mT_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378788Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2378960Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_amin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379326Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_argmin_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379516Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumprod_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2379698Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_cumsum_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2379875Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_logaddexp_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_mean_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2380235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_norm_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380419Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_normalize_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380631Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2380814Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_scatter_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2380995Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_select_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381172Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_softmax_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381347Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_std_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381519Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2381692Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_sum_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2381870Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_masked_var_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382047Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2382220Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matmul_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382405Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_matrix_exp_cuda_complex64 PASSED [ 73%]
2023-01-11T23:13:47.2382577Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_binary_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382768Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_max_reduction_with_dim_cuda_float32 PASSED [ 73%]
2023-01-11T23:13:47.2382939Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2383100Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383291Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_list_of_tensors_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383494Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2383667Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_binary_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2383879Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_no_dim_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384070Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_min_reduction_with_dim_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384242Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_minimum_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384412Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384583Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_msort_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384744Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mul_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2384918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2385087Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mv_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385458Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nanmedian_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385640Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_narrow_copy_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385831Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_native_dropout_backward_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2385997Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ne_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386158Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_neg_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386377Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_empty_cuda_complex64 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2386552Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_full_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2386732Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_ones_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2386912Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_new_zeros_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2387090Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nextafter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387508Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387710Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2387911Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_alpha_dropout_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_avg_pool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388305Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388495Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_bilinear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388681Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_celu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2388871Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389058Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2389251Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389461Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2389669Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose1d_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2389936Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390131Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390329Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cosine_similarity_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390523Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_cross_entropy_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390711Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_elu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2390908Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_embedding_bag_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391134Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2391339Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391539Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391719Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_glu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2391914Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_grid_sample_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392133Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392326Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_hardsigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392517Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_huber_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392713Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_area_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2392916Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393115Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_linear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393522Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393699Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_kl_div_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2393892Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_l1_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394082Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_leaky_relu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394273Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_linear_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2394471Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_local_response_norm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394661Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_logsigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2394855Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395041Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_pool3d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395258Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool1d_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395451Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395662Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2395856Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_normalize_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396059Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_circular_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2396453Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_constant_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396650Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_reflect_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2396849Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pad_replicate_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2397036Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pdist_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397235Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2397420Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397616Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2397838Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu6_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398026Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_relu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398214Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_silu_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398406Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398599Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2398802Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2399002Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399190Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399386Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_softsign_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399579Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_tanhshrink_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399769Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_threshold_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2399985Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2400175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nn_functional_unfold_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400353Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_nonzero_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400529Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400707Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_fro_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2400898Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2401074Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_norm_inf_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2401260Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_normal_number_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2401434Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_complex64 XFAIL [ 74%]
2023-01-11T23:13:47.2401602Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ones_cuda_float32 XFAIL [ 74%]
2023-01-11T23:13:47.2401779Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_outer_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2401961Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pca_lowrank_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402138Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2402315Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_pinverse_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402480Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polar_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402672Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_0_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2402882Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2403088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2403266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2403468Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_positive_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403643Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_prod_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403811Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_put_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2403978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_qr_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2404143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_quantile_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404320Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rand_like_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404501Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randint_like_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2404676Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_randn_like_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2404851Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_ravel_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405025Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_real_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405206Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reciprocal_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2405380Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405549Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_renorm_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2405723Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2405898Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_reshape_as_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406075Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resize_as__cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406256Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_conj_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406435Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_resolve_neg_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406627Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_roll_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406803Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2406998Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407190Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_round_decimals_neg_3_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407361Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsqrt_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2407531Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_rsub_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2407728Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scalar_tensor_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2407905Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408092Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amax_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408278Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_amin_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408460Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_scatter_reduce_mean_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408641Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_searchsorted_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2408809Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2408978Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409195Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_select_scatter_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409373Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_short_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2409544Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sigmoid_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2409758Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_cosine_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2409973Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_general_hamming_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410175Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_hann_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410375Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_kaiser_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410577Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_signal_windows_nuttall_cuda_float32 SKIPPED (Skipped!) [ 74%]
2023-01-11T23:13:47.2410742Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sin_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2410917Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411088Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinc_cuda_float32 PASSED [ 74%]
2023-01-11T23:13:47.2411259Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sinh_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411431Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_complex64 PASSED [ 74%]
2023-01-11T23:13:47.2411603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_slice_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2411774Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sort_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2411981Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_complex64 SKIPPED (Skipped!) [ 75%]
2023-01-11T23:13:47.2412199Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 75%]
2023-01-11T23:13:47.2412384Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_airy_ai_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2412570Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_bessel_y0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2412770Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2413277Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2413644Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2413847Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_hermite_polynomial_h_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414029Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_i1_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414229Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_laguerre_polynomial_l_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414414Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_log_ndtr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414790Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_modified_bessel_k0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2414974Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2415156Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_ndtri_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2415603Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2415991Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
2023-01-11T23:13:47.2416366Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%]
an unreasonably long time, #79528) [ 75%] 2023-01-11T23:13:47.2416738Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 75%] 2023-01-11T23:13:47.2416935Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_special_spherical_bessel_j0_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417112Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2417294Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_list_args_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417485Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2417660Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_split_with_sizes_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2417832Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418001Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_square_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418171Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_squeeze_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418348Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418538Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2418757Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_std_mean_unbiased_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2418933Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_stft_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2419104Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2419264Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_sum_to_size_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419608Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419784Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_svd_lowrank_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2419962Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_symeig_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420128Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_t_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420308Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_along_dim_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420476Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_take_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420630Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tan_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2420800Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tanh_cuda_complex64 PASSED [ 75%] 2023-01-11T23:13:47.2420976Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tensor_split_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2421142Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tile_cuda_float32 PASSED [ 75%] 2023-01-11T23:13:47.2421337Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trace_cuda_complex64 PASSED [ 75%] 
2023-01-11T23:13:47.2421519Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_transpose_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2421697Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2421874Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapezoid_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422045Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_trapz_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422224Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_triangular_solve_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422390Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_tril_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422568Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2422740Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unbind_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2422914Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unflatten_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423097Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_copy_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2423266Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2423433Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unfold_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423606Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_uniform_cuda_complex64 XFAIL [ 75%]
2023-01-11T23:13:47.2423784Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_consecutive_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2423955Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_unique_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424143Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_mean_unbiased_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424322Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_var_unbiased_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2424568Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_as_complex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424746Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_view_copy_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2424918Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vsplit_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425093Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_vstack_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425255Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_where_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2425424Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zero__cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425591Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_cuda_complex64 XFAIL [ 75%]
2023-01-11T23:13:47.2425773Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_zeros_like_cuda_complex64 PASSED [ 75%]
2023-01-11T23:13:47.2425958Z test_ops.py::TestCompositeComplianceCUDA::test_backward___getitem___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426135Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rdiv___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426311Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rmod___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426487Z test_ops.py::TestCompositeComplianceCUDA::test_backward___rpow___cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426687Z test_ops.py::TestCompositeComplianceCUDA::test_backward__native_batch_norm_legit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2426855Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addbmm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427035Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addcmul_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427238Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addmm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427414Z test_ops.py::TestCompositeComplianceCUDA::test_backward_addr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427592Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427766Z test_ops.py::TestCompositeComplianceCUDA::test_backward_amin_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2427940Z test_ops.py::TestCompositeComplianceCUDA::test_backward_asinh_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428116Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atan2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_atleast_2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428470Z test_ops.py::TestCompositeComplianceCUDA::test_backward_bfloat16_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428651Z test_ops.py::TestCompositeComplianceCUDA::test_backward_block_diag_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2428834Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cdouble_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429012Z test_ops.py::TestCompositeComplianceCUDA::test_backward_chalf_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429205Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_inverse_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429416Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cholesky_solve_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429619Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2429872Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clamp_max_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430042Z test_ops.py::TestCompositeComplianceCUDA::test_backward_clone_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430223Z test_ops.py::TestCompositeComplianceCUDA::test_backward_complex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430417Z test_ops.py::TestCompositeComplianceCUDA::test_backward_constant_pad_nd_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430596Z test_ops.py::TestCompositeComplianceCUDA::test_backward_copysign_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430805Z test_ops.py::TestCompositeComplianceCUDA::test_backward_corrcoef_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2430985Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cos_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431160Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cov_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431335Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cross_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431504Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cummax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumprod_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2431861Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumsum_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432058Z test_ops.py::TestCompositeComplianceCUDA::test_backward_cumulative_trapezoid_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432242Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diag_embed_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432428Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diagonal_copy_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_diff_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432780Z test_ops.py::TestCompositeComplianceCUDA::test_backward_digamma_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2432973Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_floor_rounding_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433153Z test_ops.py::TestCompositeComplianceCUDA::test_backward_div_trunc_rounding_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433331Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dsplit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433534Z test_ops.py::TestCompositeComplianceCUDA::test_backward_dstack_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433711Z test_ops.py::TestCompositeComplianceCUDA::test_backward_einsum_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2433889Z test_ops.py::TestCompositeComplianceCUDA::test_backward_erf_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434064Z test_ops.py::TestCompositeComplianceCUDA::test_backward_exp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434239Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expand_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434413Z test_ops.py::TestCompositeComplianceCUDA::test_backward_expm1_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434598Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_fftshift_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434771Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfft2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2434954Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_hfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435129Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ifftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435313Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_ihfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435493Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfft2_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435669Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_irfftn_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2435846Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fft_rfft_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436023Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flatten_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436191Z test_ops.py::TestCompositeComplianceCUDA::test_backward_flip_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436370Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fliplr_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436557Z test_ops.py::TestCompositeComplianceCUDA::test_backward_float_power_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436758Z test_ops.py::TestCompositeComplianceCUDA::test_backward_floor_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2436936Z test_ops.py::TestCompositeComplianceCUDA::test_backward_fmax_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437114Z test_ops.py::TestCompositeComplianceCUDA::test_backward_frac_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437291Z test_ops.py::TestCompositeComplianceCUDA::test_backward_frexp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437472Z test_ops.py::TestCompositeComplianceCUDA::test_backward_gradient_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437662Z test_ops.py::TestCompositeComplianceCUDA::test_backward_grid_sampler_2d_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2437830Z test_ops.py::TestCompositeComplianceCUDA::test_backward_hsplit_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438008Z test_ops.py::TestCompositeComplianceCUDA::test_backward_i0_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438187Z test_ops.py::TestCompositeComplianceCUDA::test_backward_index_put_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438366Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ldexp_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438541Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lgamma_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438729Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvals_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2438915Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_eigvalsh_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439096Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439273Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_inv_ex_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439491Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439691Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2439873Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440057Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_lu_solve_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440246Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_norm_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440438Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_matrix_power_cuda_float32 PASSED [ 75%]
2023-01-11T23:13:47.2440627Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_multi_dot_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2440810Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441011Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441207Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_hermitian_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441468Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:13:47.2441656Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_slogdet_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2441841Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_svdvals_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442034Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_tensorsolve_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442220Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_vander_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442404Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442603Z test_ops.py::TestCompositeComplianceCUDA::test_backward_log_softmax_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442802Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logaddexp2_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2442979Z test_ops.py::TestCompositeComplianceCUDA::test_backward_logdet_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443151Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443329Z test_ops.py::TestCompositeComplianceCUDA::test_backward_lu_unpack_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443509Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443688Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_amin_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2443874Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_cumprod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444061Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_fill_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444250Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_log_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444434Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_logaddexp_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444612Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444792Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2444973Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_softmax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445156Z test_ops.py::TestCompositeComplianceCUDA::test_backward_masked_std_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445338Z test_ops.py::TestCompositeComplianceCUDA::test_backward_matrix_exp_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445558Z test_ops.py::TestCompositeComplianceCUDA::test_backward_max_reduction_no_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445736Z test_ops.py::TestCompositeComplianceCUDA::test_backward_maximum_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2445914Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446106Z test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_list_of_tensors_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446307Z test_ops.py::TestCompositeComplianceCUDA::test_backward_meshgrid_variadic_tensors_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446486Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_binary_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_min_reduction_with_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2446865Z test_ops.py::TestCompositeComplianceCUDA::test_backward_minimum_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447041Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mul_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447392Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mv_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447579Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447774Z test_ops.py::TestCompositeComplianceCUDA::test_backward_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2447956Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nan_to_num_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448141Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nanmedian_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448317Z test_ops.py::TestCompositeComplianceCUDA::test_backward_narrow_cuda_float32 XFAIL [ 76%]
2023-01-11T23:13:47.2448508Z test_ops.py::TestCompositeComplianceCUDA::test_backward_native_layer_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448734Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2448968Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449175Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449369Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449571Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449769Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2449970Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_avg_pool3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450216Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_bilinear_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450432Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450682Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2450892Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451096Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_cross_entropy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451299Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_bag_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451492Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_embedding_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2451824Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 76%]
2023-01-11T23:13:47.2452037Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_gelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452276Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_grid_sample_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452479Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardshrink_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452679Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardsigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2452880Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hardswish_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453093Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453307Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453510Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_interpolate_linear_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453711Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_layer_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2453908Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_leaky_relu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454115Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_local_response_norm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454312Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_logsigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2454823Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_pool2d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455028Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455297Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455494Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455692Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_nll_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2455893Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456098Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456300Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_rrelu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456704Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2456911Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457106Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457305Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457500Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softplus_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457694Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_softsign_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2457922Z test_ops.py::TestCompositeComplianceCUDA::test_backward_nn_functional_unfold_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458103Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_fro_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458286Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_inf_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458463Z test_ops.py::TestCompositeComplianceCUDA::test_backward_norm_nuc_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458639Z test_ops.py::TestCompositeComplianceCUDA::test_backward_normal_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458819Z test_ops.py::TestCompositeComplianceCUDA::test_backward_permute_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2458990Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pinverse_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459192Z test_ops.py::TestCompositeComplianceCUDA::test_backward_polygamma_polygamma_n_1_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459391Z test_ops.py::TestCompositeComplianceCUDA::test_backward_positive_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459598Z test_ops.py::TestCompositeComplianceCUDA::test_backward_pow_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459781Z test_ops.py::TestCompositeComplianceCUDA::test_backward_prod_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2459958Z test_ops.py::TestCompositeComplianceCUDA::test_backward_qr_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460134Z test_ops.py::TestCompositeComplianceCUDA::test_backward_ravel_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460316Z test_ops.py::TestCompositeComplianceCUDA::test_backward_reciprocal_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460498Z test_ops.py::TestCompositeComplianceCUDA::test_backward_remainder_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460665Z test_ops.py::TestCompositeComplianceCUDA::test_backward_renorm_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2460858Z test_ops.py::TestCompositeComplianceCUDA::test_backward_repeat_interleave_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461051Z test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_conj_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461235Z test_ops.py::TestCompositeComplianceCUDA::test_backward_resolve_neg_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461437Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rot90_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461628Z test_ops.py::TestCompositeComplianceCUDA::test_backward_round_decimals_3_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461806Z test_ops.py::TestCompositeComplianceCUDA::test_backward_rsqrt_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2461991Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_add_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462161Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462353Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_amax_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462547Z test_ops.py::TestCompositeComplianceCUDA::test_backward_scatter_reduce_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462721Z test_ops.py::TestCompositeComplianceCUDA::test_backward_select_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2462911Z test_ops.py::TestCompositeComplianceCUDA::test_backward_select_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463087Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sigmoid_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463260Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sign_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463437Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sinh_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463622Z test_ops.py::TestCompositeComplianceCUDA::test_backward_slice_scatter_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2463806Z test_ops.py::TestCompositeComplianceCUDA::test_backward_softmax_with_dtype_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464014Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 76%]
2023-01-11T23:13:47.2464223Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_entr_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464411Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_ndtri_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464628Z test_ops.py::TestCompositeComplianceCUDA::test_backward_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2464814Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_list_args_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465003Z test_ops.py::TestCompositeComplianceCUDA::test_backward_split_with_sizes_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465177Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sqrt_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465355Z test_ops.py::TestCompositeComplianceCUDA::test_backward_square_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465523Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465702Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2465888Z test_ops.py::TestCompositeComplianceCUDA::test_backward_std_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466061Z test_ops.py::TestCompositeComplianceCUDA::test_backward_stft_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466240Z test_ops.py::TestCompositeComplianceCUDA::test_backward_sum_to_size_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466414Z test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2466753Z test_ops.py::TestCompositeComplianceCUDA::test_backward_svd_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 76%]
2023-01-11T23:13:47.2466941Z test_ops.py::TestCompositeComplianceCUDA::test_backward_symeig_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467149Z test_ops.py::TestCompositeComplianceCUDA::test_backward_take_along_dim_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467336Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tanh_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467584Z test_ops.py::TestCompositeComplianceCUDA::test_backward_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 76%]
2023-01-11T23:13:47.2467762Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trace_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2467944Z test_ops.py::TestCompositeComplianceCUDA::test_backward_transpose_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468120Z test_ops.py::TestCompositeComplianceCUDA::test_backward_trapz_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468309Z test_ops.py::TestCompositeComplianceCUDA::test_backward_triangular_solve_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468479Z test_ops.py::TestCompositeComplianceCUDA::test_backward_tril_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468654Z test_ops.py::TestCompositeComplianceCUDA::test_backward_triu_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2468827Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unbind_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469011Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_copy_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469188Z test_ops.py::TestCompositeComplianceCUDA::test_backward_unfold_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469367Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_mean_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469550Z test_ops.py::TestCompositeComplianceCUDA::test_backward_var_unbiased_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469789Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vdot_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2469979Z test_ops.py::TestCompositeComplianceCUDA::test_backward_view_as_complex_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2470154Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vsplit_cuda_float32 PASSED [ 76%]
2023-01-11T23:13:47.2470356Z test_ops.py::TestCompositeComplianceCUDA::test_backward_vstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470524Z test_ops.py::TestCompositeComplianceCUDA::test_backward_where_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470703Z test_ops.py::TestCompositeComplianceCUDA::test_backward_xlogy_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2470878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_H_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471053Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_T_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471235Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___radd___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471419Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rmatmul___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471597Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad___rsub___cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471795Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad__native_batch_norm_legit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2471968Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_acosh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472150Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addcmul_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472341Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmm_decomposed_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472516Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_addmv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2472726Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_all_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2472899Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_amax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473113Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_aminmax_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2473285Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_angle_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473500Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_argmin_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2473694Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_asinh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2473867Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atan_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_atleast_1d_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474228Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_baddbmm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474440Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bfloat16_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2474626Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_block_diag_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2474835Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bool_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475061Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_shapes_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475256Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_tensors_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2475435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_broadcast_to_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2475646Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_bucketize_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2475853Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_byte_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2476033Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cdouble_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2476205Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ceil_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2476444Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chalf_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2476654Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_char_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2476846Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cholesky_inverse_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477023Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_chunk_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477192Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477373Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_clamp_max_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477556Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_column_stack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_combinations_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2477996Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_complex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478224Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_contiguous_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478408Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_copysign_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478589Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_corrcoef_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478764Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cosh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2478974Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_count_nonzero_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2479149Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cov_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479328Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479509Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cummin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479689Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumprod_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2479929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_cumulative_trapezoid_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480109Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diag_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480295Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagflat_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480486Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_copy_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480673Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_diagonal_scatter_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2480869Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_div_trunc_rounding_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481047Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481226Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dsplit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481407Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_dstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481583Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_einsum_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2481799Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eq_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2482009Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_equal_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2482178Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erf_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482353Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_erfinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482566Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482736Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_exp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2482921Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_expand_as_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483127Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_eye_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2483305Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483483Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483667Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_fftshift_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2483839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_hfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484019Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484199Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ifftshift_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484567Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484748Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_ihfftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2484929Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485109Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_irfft_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485287Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fft_rfftn_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485459Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485638Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flatten_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2485841Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fliplr_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486021Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_flipud_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486198Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_floor_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486371Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_fmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2486577Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ge_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2486787Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_half_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2487005Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_heaviside_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2487174Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hstack_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487359Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_hypot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487534Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_i0_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2487746Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igamma_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2487955Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_igammac_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2488135Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_add_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2488317Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2488576Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_index_reduce_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2488798Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_inner_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2489088Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isneginf_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489358Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_isreal_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489592Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2489767Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_kron_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2489942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ldexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490148Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_le_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2490324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lerp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cross_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490690Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_det_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2490864Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eig_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491048Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigh_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491236Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_eigvals_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491417Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_inv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2491638Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lstsq_cuda_float32 SKIPPED (Does not support forward_ad) [ 77%]
2023-01-11T23:13:47.2491828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492059Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_lu_factor_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492282Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2492515Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2492692Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2492952Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 77%]
2023-01-11T23:13:47.2493138Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_qr_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493324Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_slogdet_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493517Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_solve_ex_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493699Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svd_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2493884Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_svdvals_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorinv_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494265Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_tensorsolve_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494443Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vecdot_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2494878Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_vector_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495058Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log1p_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495240Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log2_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495423Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_log_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495607Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logaddexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2495822Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logical_or_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2495998Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logit_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2496211Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logspace_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496389Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_logsumexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2496598Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_long_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496804Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lt_cuda_float32 SKIPPED (Does not support autograd) [ 77%]
2023-01-11T23:13:47.2496978Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_lu_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497152Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mH_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497327Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mT_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_amax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497696Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_cumsum_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2497882Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_fill_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498113Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_log_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498309Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_logaddexp_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498490Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_norm_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498675Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_prod_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2498860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_scatter_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499045Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_select_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499227Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmax_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499413Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_softmin_cuda_float32 PASSED [ 77%]
2023-01-11T23:13:47.2499615Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_std_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2499821Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_masked_sum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500002Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_matrix_exp_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500208Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500383Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mean_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500562Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_median_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500759Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_list_of_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2500991Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_meshgrid_variadic_tensors_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501192Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_min_reduction_with_dim_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_minimum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501544Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501721Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_mode_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2501900Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_movedim_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502118Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_multinomial_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2502307Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nanquantile_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502484Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nansum_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2502665Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_narrow_cuda_float32 XFAIL [ 78%]
2023-01-11T23:13:47.2502858Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_native_batch_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2503025Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_neg_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2503239Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_empty_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503447Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_full_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503654Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_new_zeros_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2503870Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nextafter_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2504106Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504313Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_alpha_dropout_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504510Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504707Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_avg_pool3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2504892Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_batch_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505112Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505308Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505507Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505717Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2505925Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506149Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_elu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506428Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2506726Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_grid_sample_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2507034Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_group_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507231Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507439Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hardsigmoid_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2507851Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_huber_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508052Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_instance_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508263Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508475Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508687Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2508884Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_l1_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509073Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_layer_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509275Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_linear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509485Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_local_response_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509779Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool2d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2509997Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_pool3d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510202Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510445Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2510664Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_cuda_float32 SKIPPED (Skipped!) [ 78%]
2023-01-11T23:13:47.2510876Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511076Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_mse_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511314Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_margin_loss_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2511538Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511740Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_normalize_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2511949Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pad_replicate_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512161Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pairwise_distance_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512364Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512572Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512768Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_prelu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2512966Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_relu_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513189Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513403Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513613Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2513816Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softplus_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514016Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514215Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_softsign_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514411Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_tanhshrink_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514627Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2514846Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_nonzero_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2515031Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_norm_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2515251Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_normal_number_mean_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%]
2023-01-11T23:13:47.2515466Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ones_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%]
2023-01-11T23:13:47.2515648Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_outer_cuda_float32 PASSED [ 78%]
2023-01-11T23:13:47.2515900Z
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pca_lowrank_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 78%] 2023-01-11T23:13:47.2516090Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pinverse_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516304Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polar_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516506Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_0_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516709Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_2_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2516907Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_polygamma_polygamma_n_4_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517077Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_pow_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517253Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_prod_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517430Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_put_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517619Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_quantile_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2517801Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rad2deg_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518013Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rand_like_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2518190Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_ravel_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518367Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_real_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518552Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_remainder_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2518756Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_renorm_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2518950Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_repeat_interleave_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519205Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_reshape_as_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519411Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_conj_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519626Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_resolve_neg_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519805Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rot90_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2519981Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520173Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_round_decimals_0_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520340Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_rsub_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520561Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scalar_tensor_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2520745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2520942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_amax_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2521139Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_mean_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2521365Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_scatter_reduce_prod_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2521586Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_searchsorted_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2521816Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_lengths_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2522046Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_segment_reduce_offsets_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2522238Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_select_scatter_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2522432Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sgn_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2522661Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_blackman_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2522887Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_cosine_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523116Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_gaussian_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523352Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_general_cosine_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523577Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_hann_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2523803Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_kaiser_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2524028Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_signal_windows_nuttall_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2524206Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sinh_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524384Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_slice_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524557Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_softmax_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524732Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sort_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2524960Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sparse_sampled_addmm_cuda_float32 SKIPPED (Does not support forward_ad) [ 78%] 2023-01-11T23:13:47.2525217Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525433Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_j1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525645Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_bessel_y0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2525883Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_t_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2526334Z 
test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2526713Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2526903Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_entr_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2527131Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_laguerre_polynomial_l_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2527507Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2527737Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i0_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2527966Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_i1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2528193Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_modified_bessel_k1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2528381Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_ndtri_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2528651Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_scaled_modified_bessel_k1_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2529040Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2529435Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 78%] 2023-01-11T23:13:47.2529653Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_special_zeta_cuda_float32 SKIPPED (Does not support autograd) [ 78%] 2023-01-11T23:13:47.2529849Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_split_list_args_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530023Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_square_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530207Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_squeeze_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530387Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530579Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_mean_unbiased_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530766Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_std_unbiased_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2530942Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sub_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531116Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531329Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_sum_to_size_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531492Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_svd_cuda_float32 PASSED [ 78%] 
2023-01-11T23:13:47.2531673Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tanh_cuda_float32 PASSED [ 78%] 2023-01-11T23:13:47.2531860Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tensordot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532035Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tile_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532210Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_to_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532388Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_topk_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532567Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trapz_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532745Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_tril_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2532926Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_trunc_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2533102Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unflatten_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2533319Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_uniform_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533546Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_consecutive_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533754Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unique_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2533938Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_unsqueeze_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534120Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_var_mean_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534298Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vdot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534612Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_as_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2534839Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_view_copy_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535014Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535194Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_vstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535374Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_where_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535548Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_xlogy_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2535758Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2535972Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_zeros_like_cuda_float32 SKIPPED (Does not support autograd) [ 79%] 2023-01-11T23:13:47.2536156Z test_ops.py::TestCompositeComplianceCUDA::test_operator___getitem___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536339Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmatmul___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536517Z test_ops.py::TestCompositeComplianceCUDA::test_operator___rmod___cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536706Z test_ops.py::TestCompositeComplianceCUDA::test_operator__native_batch_norm_legit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2536882Z test_ops.py::TestCompositeComplianceCUDA::test_operator_acosh_cuda_float32 
PASSED [ 79%] 2023-01-11T23:13:47.2537056Z test_ops.py::TestCompositeComplianceCUDA::test_operator_add_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537231Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addbmm_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537410Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addcdiv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537621Z test_ops.py::TestCompositeComplianceCUDA::test_operator_addr_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537796Z test_ops.py::TestCompositeComplianceCUDA::test_operator_all_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2537977Z test_ops.py::TestCompositeComplianceCUDA::test_operator_amax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538143Z test_ops.py::TestCompositeComplianceCUDA::test_operator_amin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538326Z test_ops.py::TestCompositeComplianceCUDA::test_operator_aminmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538500Z test_ops.py::TestCompositeComplianceCUDA::test_operator_any_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538679Z test_ops.py::TestCompositeComplianceCUDA::test_operator_arange_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2538856Z test_ops.py::TestCompositeComplianceCUDA::test_operator_argmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539058Z test_ops.py::TestCompositeComplianceCUDA::test_operator_as_strided_partial_views_cuda_float32 XFAIL [ 79%] 2023-01-11T23:13:47.2539234Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atan2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539439Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atanh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539646Z test_ops.py::TestCompositeComplianceCUDA::test_operator_atleast_2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2539820Z test_ops.py::TestCompositeComplianceCUDA::test_operator_bernoulli_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540005Z test_ops.py::TestCompositeComplianceCUDA::test_operator_broadcast_to_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540180Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cdist_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540355Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ceil_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540532Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cfloat_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540714Z test_ops.py::TestCompositeComplianceCUDA::test_operator_chalf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2540905Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_inverse_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541116Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cholesky_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541285Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541466Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clamp_min_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541644Z test_ops.py::TestCompositeComplianceCUDA::test_operator_clone_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2541828Z test_ops.py::TestCompositeComplianceCUDA::test_operator_column_stack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_conj_physical_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542199Z test_ops.py::TestCompositeComplianceCUDA::test_operator_contiguous_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542380Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_corrcoef_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542557Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cos_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542731Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cosh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2542900Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cummax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543075Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cummin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543250Z test_ops.py::TestCompositeComplianceCUDA::test_operator_cumsum_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543436Z test_ops.py::TestCompositeComplianceCUDA::test_operator_diag_embed_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543627Z test_ops.py::TestCompositeComplianceCUDA::test_operator_div_floor_rounding_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2543832Z test_ops.py::TestCompositeComplianceCUDA::test_operator_double_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544008Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544184Z test_ops.py::TestCompositeComplianceCUDA::test_operator_dstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_eq_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544526Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erfc_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544701Z test_ops.py::TestCompositeComplianceCUDA::test_operator_erfinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2544877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_exp2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545060Z test_ops.py::TestCompositeComplianceCUDA::test_operator_expand_as_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_eye_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545414Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545599Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_fftshift_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545777Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_hfft2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2545949Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546136Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ifftshift_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546317Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546497Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_ihfftn_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546675Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fft_rfft_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2546853Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547053Z test_ops.py::TestCompositeComplianceCUDA::test_operator_flipud_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_fmin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547575Z test_ops.py::TestCompositeComplianceCUDA::test_operator_frexp_cuda_float32 
PASSED [ 79%] 2023-01-11T23:13:47.2547748Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ge_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2547926Z test_ops.py::TestCompositeComplianceCUDA::test_operator_gradient_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548117Z test_ops.py::TestCompositeComplianceCUDA::test_operator_grid_sampler_2d_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_heaviside_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548474Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hsplit_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548650Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hstack_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548827Z test_ops.py::TestCompositeComplianceCUDA::test_operator_hypot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2548996Z test_ops.py::TestCompositeComplianceCUDA::test_operator_igamma_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549174Z test_ops.py::TestCompositeComplianceCUDA::test_operator_igammac_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549351Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_add_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549533Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2549779Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_reduce_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550002Z test_ops.py::TestCompositeComplianceCUDA::test_operator_index_select_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550184Z test_ops.py::TestCompositeComplianceCUDA::test_operator_inner_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550360Z test_ops.py::TestCompositeComplianceCUDA::test_operator_int_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550532Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isclose_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550710Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isfinite_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2550886Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isin_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551062Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isinf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551237Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isposinf_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551418Z test_ops.py::TestCompositeComplianceCUDA::test_operator_isreal_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2551619Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2551831Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2552034Z test_ops.py::TestCompositeComplianceCUDA::test_operator_jiterator_unary_cuda_float32 SKIPPED (skip) [ 79%] 2023-01-11T23:13:47.2552202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_kron_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552381Z test_ops.py::TestCompositeComplianceCUDA::test_operator_kthvalue_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552555Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lerp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552729Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lgamma_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2552917Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cond_cuda_float32 PASSED 
[ 79%] 2023-01-11T23:13:47.2553102Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cross_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553310Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553507Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_det_singular_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553690Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eig_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2553870Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvals_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554061Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_eigvalsh_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554251Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_ldl_factor_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554456Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554635Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2554833Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_lu_factor_ex_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555045Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555229Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555429Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_solve_triangular_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555603Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_svd_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2555794Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorinv_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_tensorsolve_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556202Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vecdot_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556396Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_vector_norm_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linspace_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556760Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log10_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2556935Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557103Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557287Z test_ops.py::TestCompositeComplianceCUDA::test_operator_log_softmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557469Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logaddexp2_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557648Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logdet_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2557834Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_not_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558017Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logical_xor_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558195Z test_ops.py::TestCompositeComplianceCUDA::test_operator_logit_cuda_float32 PASSED [ 79%] 
2023-01-11T23:13:47.2558369Z test_ops.py::TestCompositeComplianceCUDA::test_operator_lt_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558539Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mH_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558713Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_amax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2558902Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_cumsum_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559090Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_fill_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559281Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_log_softmax_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559527Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_logaddexp_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559726Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_mean_cuda_float32 PASSED [ 79%] 2023-01-11T23:13:47.2559906Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560094Z test_ops.py::TestCompositeComplianceCUDA::test_operator_masked_softmax_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560284Z test_ops.py::TestCompositeComplianceCUDA::test_operator_max_reduction_with_dim_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560460Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560641Z test_ops.py::TestCompositeComplianceCUDA::test_operator_median_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2560841Z test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_list_of_tensors_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561046Z test_ops.py::TestCompositeComplianceCUDA::test_operator_meshgrid_variadic_tensors_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561227Z test_ops.py::TestCompositeComplianceCUDA::test_operator_min_binary_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561406Z test_ops.py::TestCompositeComplianceCUDA::test_operator_minimum_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561582Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mode_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561767Z test_ops.py::TestCompositeComplianceCUDA::test_operator_multinomial_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2561956Z test_ops.py::TestCompositeComplianceCUDA::test_operator_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nan_to_num_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562341Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nanmean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562542Z test_ops.py::TestCompositeComplianceCUDA::test_operator_native_dropout_backward_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2562770Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_cuda_float32 SKIPPED (Expected: new_empty is not comparable) [ 80%] 2023-01-11T23:13:47.2563015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_empty_strided_cuda_float32 SKIPPED (Expected: new_empty_strided is not comparable) [ 80%] 2023-01-11T23:13:47.2563196Z test_ops.py::TestCompositeComplianceCUDA::test_operator_new_full_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563408Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563616Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2563812Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564015Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_alpha_dropout_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564616Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2564809Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_avg_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565002Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565444Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_bilinear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565667Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2565855Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv1d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566061Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566272Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_embedding_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_cosine_similarity_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566684Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_dropout2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2566896Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567104Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567305Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gaussian_nll_loss_cuda_float32 XFAIL [ 80%] 2023-01-11T23:13:47.2567499Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_gelu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567693Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_grid_sample_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2567893Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_group_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568128Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardshrink_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568330Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardsigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568530Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_hardswish_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568740Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2568950Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_linear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569155Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569367Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569557Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_l1_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569754Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_leaky_relu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2569954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_linear_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570151Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_logsigmoid_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570554Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_pool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570760Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2570960Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571169Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571390Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571585Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_max_unpool3d_grad_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571801Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2571998Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_normalize_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572197Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_constant_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572399Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pad_replicate_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572609Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pairwise_distance_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2572813Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573016Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573207Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_relu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573393Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_rrelu_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573596Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2573797Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574022Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_tanhshrink_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_threshold_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574419Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nn_functional_unfold_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574699Z test_ops.py::TestCompositeComplianceCUDA::test_operator_nonzero_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2574877Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575056Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_fro_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575225Z test_ops.py::TestCompositeComplianceCUDA::test_operator_norm_nuc_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575403Z test_ops.py::TestCompositeComplianceCUDA::test_operator_normal_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575586Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575766Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ones_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2575954Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pca_lowrank_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576134Z test_ops.py::TestCompositeComplianceCUDA::test_operator_permute_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576313Z test_ops.py::TestCompositeComplianceCUDA::test_operator_pinverse_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576512Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576710Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_2_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2576894Z test_ops.py::TestCompositeComplianceCUDA::test_operator_polygamma_polygamma_n_4_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577071Z test_ops.py::TestCompositeComplianceCUDA::test_operator_positive_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577251Z test_ops.py::TestCompositeComplianceCUDA::test_operator_qr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577471Z test_ops.py::TestCompositeComplianceCUDA::test_operator_quantile_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577652Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rad2deg_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2577837Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randint_like_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578014Z test_ops.py::TestCompositeComplianceCUDA::test_operator_randn_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578191Z test_ops.py::TestCompositeComplianceCUDA::test_operator_ravel_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578358Z test_ops.py::TestCompositeComplianceCUDA::test_operator_real_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578544Z test_ops.py::TestCompositeComplianceCUDA::test_operator_reciprocal_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578724Z test_ops.py::TestCompositeComplianceCUDA::test_operator_renorm_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2578920Z test_ops.py::TestCompositeComplianceCUDA::test_operator_repeat_interleave_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579107Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_as_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579287Z test_ops.py::TestCompositeComplianceCUDA::test_operator_reshape_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579472Z test_ops.py::TestCompositeComplianceCUDA::test_operator_resolve_conj_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579647Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2579834Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580019Z test_ops.py::TestCompositeComplianceCUDA::test_operator_round_decimals_neg_3_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580228Z test_ops.py::TestCompositeComplianceCUDA::test_operator_rsub_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580413Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_add_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580596Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580790Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amax_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2580981Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_amin_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581174Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_mean_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581366Z test_ops.py::TestCompositeComplianceCUDA::test_operator_scatter_reduce_prod_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581544Z test_ops.py::TestCompositeComplianceCUDA::test_operator_searchsorted_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581747Z test_ops.py::TestCompositeComplianceCUDA::test_operator_segment_reduce_offsets_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2581927Z test_ops.py::TestCompositeComplianceCUDA::test_operator_select_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582116Z test_ops.py::TestCompositeComplianceCUDA::test_operator_select_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582314Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_bartlett_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582512Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_blackman_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582714Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_exponential_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2582907Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_gaussian_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583111Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_general_cosine_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signal_windows_nuttall_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583483Z test_ops.py::TestCompositeComplianceCUDA::test_operator_signbit_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583681Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sin_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2583862Z test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584048Z test_ops.py::TestCompositeComplianceCUDA::test_operator_slice_scatter_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584227Z test_ops.py::TestCompositeComplianceCUDA::test_operator_softmax_cuda_float32 PASSED [ 80%] 
2023-01-11T23:13:47.2584400Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sort_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584610Z test_ops.py::TestCompositeComplianceCUDA::test_operator_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 80%] 2023-01-11T23:13:47.2584805Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_j1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2584984Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585172Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_bessel_y1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585381Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2585773Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2585958Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_entr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586161Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_hermite_polynomial_h_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586378Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_log_ndtr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586576Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586776Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_modified_bessel_i1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2586953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_ndtr_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587168Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587374Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k0_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587574Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2587962Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2588357Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 80%] 2023-01-11T23:13:47.2588541Z test_ops.py::TestCompositeComplianceCUDA::test_operator_special_zeta_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2588720Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2588907Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_list_args_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589095Z test_ops.py::TestCompositeComplianceCUDA::test_operator_split_with_sizes_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589263Z test_ops.py::TestCompositeComplianceCUDA::test_operator_square_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589443Z test_ops.py::TestCompositeComplianceCUDA::test_operator_squeeze_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589622Z test_ops.py::TestCompositeComplianceCUDA::test_operator_std_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2589858Z 
test_ops.py::TestCompositeComplianceCUDA::test_operator_sum_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2590064Z test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_cuda_float32 PASSED [ 80%] 2023-01-11T23:13:47.2590249Z test_ops.py::TestCompositeComplianceCUDA::test_operator_svd_lowrank_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590435Z test_ops.py::TestCompositeComplianceCUDA::test_operator_take_along_dim_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590609Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tanh_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590785Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tile_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2590951Z test_ops.py::TestCompositeComplianceCUDA::test_operator_to_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591158Z test_ops.py::TestCompositeComplianceCUDA::test_operator_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 81%] 2023-01-11T23:13:47.2591335Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trace_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591518Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trapezoid_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591692Z test_ops.py::TestCompositeComplianceCUDA::test_operator_trapz_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2591865Z test_ops.py::TestCompositeComplianceCUDA::test_operator_tril_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592043Z test_ops.py::TestCompositeComplianceCUDA::test_operator_uniform_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_mean_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592397Z test_ops.py::TestCompositeComplianceCUDA::test_operator_var_unbiased_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592567Z test_ops.py::TestCompositeComplianceCUDA::test_operator_view_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592774Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vsplit_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2592953Z test_ops.py::TestCompositeComplianceCUDA::test_operator_vstack_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593127Z test_ops.py::TestCompositeComplianceCUDA::test_operator_xlogy_cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593301Z test_ops.py::TestCompositeComplianceCUDA::test_operator_zero__cuda_float32 PASSED [ 81%] 2023-01-11T23:13:47.2593458Z test_ops.py::TestMathBitsCUDA::test_conj_view_T_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593620Z test_ops.py::TestMathBitsCUDA::test_conj_view___getitem___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593779Z test_ops.py::TestMathBitsCUDA::test_conj_view___radd___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2593926Z test_ops.py::TestMathBitsCUDA::test_conj_view___rdiv___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594090Z test_ops.py::TestMathBitsCUDA::test_conj_view___rmatmul___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594251Z test_ops.py::TestMathBitsCUDA::test_conj_view___rmul___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594407Z test_ops.py::TestMathBitsCUDA::test_conj_view___rpow___cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594586Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_bool_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594760Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_byte_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2594940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_cfloat_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:13:47.2595115Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs__conversions_int_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595267Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_acos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595427Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_add_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595595Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcdiv_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595762Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addcmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2595944Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_addr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596104Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_all_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596259Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_any_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596489Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_partial_views_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 81%] 2023-01-11T23:13:47.2596667Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_as_strided_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596819Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asin_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2596984Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_asinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597144Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597313Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_atleast_1d_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597488Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_broadcast_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597649Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_cat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597817Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_contiguous_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2597982Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diag_embed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598150Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_diagonal_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598330Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_div_no_rounding_mode_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2598563Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 81%] 2023-01-11T23:13:47.2598777Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_empty_like_cuda_complex64 SKIPPED (Expected: empty is not comparable) [ 81%] 2023-01-11T23:13:47.2598937Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eq_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599099Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_exp_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599261Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_expand_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599437Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_eye_cuda_complex64 SKIPPED (Skipped!) 
[ 81%] 2023-01-11T23:13:47.2599605Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_fftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599762Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_hfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2599929Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifft2_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600106Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_ifftshift_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600272Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfft_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600438Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fft_irfftn_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600598Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600763Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2600923Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flip_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601079Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_fliplr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601249Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_flipud_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601411Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_hstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601607Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601772Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_fill_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2601940Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_index_select_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602104Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isclose_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602269Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isfinite_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602431Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isinf_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602586Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_isnan_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602752Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linalg_svd_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2602917Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_linspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2603081Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_log_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603250Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_not_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603419Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logical_xor_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603583Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_logspace_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2603743Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2603919Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_meshgrid_list_of_tensors_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604105Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_mul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604271Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604438Z 
test_ops.py::TestMathBitsCUDA::test_conj_view__refs_narrow_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604597Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_ne_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604759Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_full_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2604922Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_new_ones_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605084Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_norm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605247Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_permute_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605400Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_pow_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605563Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_prod_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2605761Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_randn_cuda_complex64 SKIPPED (Test expects tensor input) [ 81%] 2023-01-11T23:13:47.2605924Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_rsub_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606088Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sigmoid_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606266Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606451Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_special_softmax_with_dtype_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606610Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sqrt_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_square_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2606928Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_squeeze_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607092Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_stack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607273Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607438Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_std_mean_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607598Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607764Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_sum_to_size_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2607924Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tanh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608093Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tensor_split_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608244Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_to_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608404Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_tril_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608560Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_triu_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608733Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unflatten_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2608902Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_unsqueeze_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609065Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_var_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609225Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_view_as_cuda_complex64 PASSED [ 81%] 
2023-01-11T23:13:47.2609387Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_vstack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609542Z test_ops.py::TestMathBitsCUDA::test_conj_view__refs_where_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609700Z test_ops.py::TestMathBitsCUDA::test_conj_view_acosh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2609898Z test_ops.py::TestMathBitsCUDA::test_conj_view_addcmul_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610057Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610230Z test_ops.py::TestMathBitsCUDA::test_conj_view_addmm_decomposed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610388Z test_ops.py::TestMathBitsCUDA::test_conj_view_addr_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610547Z test_ops.py::TestMathBitsCUDA::test_conj_view_all_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610704Z test_ops.py::TestMathBitsCUDA::test_conj_view_angle_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2610857Z test_ops.py::TestMathBitsCUDA::test_conj_view_argwhere_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611066Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_cuda_complex64 SKIPPED (Errors when storage_offset is included) [ 81%] 2023-01-11T23:13:47.2611280Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_partial_views_cuda_complex64 SKIPPED (Test changes in memory layout) [ 81%] 2023-01-11T23:13:47.2611454Z test_ops.py::TestMathBitsCUDA::test_conj_view_as_strided_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611617Z test_ops.py::TestMathBitsCUDA::test_conj_view_asinh_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611779Z test_ops.py::TestMathBitsCUDA::test_conj_view_baddbmm_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2611940Z test_ops.py::TestMathBitsCUDA::test_conj_view_block_diag_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612104Z test_ops.py::TestMathBitsCUDA::test_conj_view_broadcast_to_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612270Z test_ops.py::TestMathBitsCUDA::test_conj_view_cartesian_prod_cuda_complex64 XFAIL [ 81%] 2023-01-11T23:13:47.2612417Z test_ops.py::TestMathBitsCUDA::test_conj_view_cat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612579Z test_ops.py::TestMathBitsCUDA::test_conj_view_cdouble_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612743Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2612911Z test_ops.py::TestMathBitsCUDA::test_conj_view_cholesky_inverse_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613139Z test_ops.py::TestMathBitsCUDA::test_conj_view_clone_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613304Z test_ops.py::TestMathBitsCUDA::test_conj_view_column_stack_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613471Z test_ops.py::TestMathBitsCUDA::test_conj_view_combinations_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613627Z test_ops.py::TestMathBitsCUDA::test_conj_view_conj_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613782Z test_ops.py::TestMathBitsCUDA::test_conj_view_contiguous_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2613944Z test_ops.py::TestMathBitsCUDA::test_conj_view_corrcoef_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614098Z test_ops.py::TestMathBitsCUDA::test_conj_view_cos_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614257Z test_ops.py::TestMathBitsCUDA::test_conj_view_cross_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614418Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_cumprod_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614684Z test_ops.py::TestMathBitsCUDA::test_conj_view_diag_embed_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2614846Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagflat_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615012Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_copy_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615165Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615334Z test_ops.py::TestMathBitsCUDA::test_conj_view_diagonal_scatter_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615490Z test_ops.py::TestMathBitsCUDA::test_conj_view_dist_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615643Z test_ops.py::TestMathBitsCUDA::test_conj_view_dot_cuda_complex64 PASSED [ 81%] 2023-01-11T23:13:47.2615843Z test_ops.py::TestMathBitsCUDA::test_conj_view_eq_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2615999Z test_ops.py::TestMathBitsCUDA::test_conj_view_exp_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616165Z test_ops.py::TestMathBitsCUDA::test_conj_view_expand_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616325Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616486Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_fftn_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616638Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft2_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616796Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_hfft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2616953Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft2_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617110Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617278Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_ifftshift_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617440Z test_ops.py::TestMathBitsCUDA::test_conj_view_fft_irfft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617598Z test_ops.py::TestMathBitsCUDA::test_conj_view_fill_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617754Z test_ops.py::TestMathBitsCUDA::test_conj_view_flip_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2617909Z test_ops.py::TestMathBitsCUDA::test_conj_view_flipud_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618068Z test_ops.py::TestMathBitsCUDA::test_conj_view_float_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618229Z test_ops.py::TestMathBitsCUDA::test_conj_view_full_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618388Z test_ops.py::TestMathBitsCUDA::test_conj_view_gather_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618546Z test_ops.py::TestMathBitsCUDA::test_conj_view_geqrf_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618710Z test_ops.py::TestMathBitsCUDA::test_conj_view_gradient_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2618867Z test_ops.py::TestMathBitsCUDA::test_conj_view_hstack_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619065Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_add_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619221Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619410Z test_ops.py::TestMathBitsCUDA::test_conj_view_index_select_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619592Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_inner_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619747Z test_ops.py::TestMathBitsCUDA::test_conj_view_int_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2619908Z test_ops.py::TestMathBitsCUDA::test_conj_view_isclose_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620070Z test_ops.py::TestMathBitsCUDA::test_conj_view_isfinite_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620258Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_4inputs_with_extra_args_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620431Z test_ops.py::TestMathBitsCUDA::test_conj_view_jiterator_binary_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620579Z test_ops.py::TestMathBitsCUDA::test_conj_view_ldexp_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2620747Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cross_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2620909Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621082Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_det_singular_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621250Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_eigvalsh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621434Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_householder_product_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621599Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_inv_ex_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621795Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_ldl_factor_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2621962Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622126Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622298Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_lu_factor_ex_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622469Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_matrix_rank_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622635Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2622815Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_hermitian_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623056Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_pinv_singular_cuda_complex64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 82%] 2023-01-11T23:13:47.2623222Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_qr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623389Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_slogdet_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623560Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_svdvals_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623724Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_vector_norm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2623886Z test_ops.py::TestMathBitsCUDA::test_conj_view_linspace_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2624042Z test_ops.py::TestMathBitsCUDA::test_conj_view_log10_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624220Z test_ops.py::TestMathBitsCUDA::test_conj_view_log_softmax_with_dtype_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624375Z test_ops.py::TestMathBitsCUDA::test_conj_view_lu_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624530Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_mT_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624702Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumprod_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2624869Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_cumsum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625055Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625221Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_prod_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625385Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_scatter_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625548Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_std_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625708Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_sum_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2625866Z test_ops.py::TestMathBitsCUDA::test_conj_view_masked_var_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626026Z test_ops.py::TestMathBitsCUDA::test_conj_view_matrix_exp_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626186Z test_ops.py::TestMathBitsCUDA::test_conj_view_mean_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626366Z test_ops.py::TestMathBitsCUDA::test_conj_view_meshgrid_variadic_tensors_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626527Z test_ops.py::TestMathBitsCUDA::test_conj_view_movedim_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626692Z test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_copy_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2626851Z test_ops.py::TestMathBitsCUDA::test_conj_view_narrow_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2627008Z test_ops.py::TestMathBitsCUDA::test_conj_view_neg_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627186Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_empty_cuda_complex64 SKIPPED (Skipped!) 
[ 82%] 2023-01-11T23:13:47.2627348Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_full_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627543Z test_ops.py::TestMathBitsCUDA::test_conj_view_new_zeros_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627717Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv1d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2627887Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv2d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628079Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose2d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628268Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_conv_transpose3d_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628479Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_feature_alpha_dropout_without_train_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628655Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_l1_loss_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2628830Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_linear_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629013Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_normalize_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629198Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_constant_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629386Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pad_replicate_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629563Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_shuffle_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629817Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_pixel_unshuffle_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2629998Z test_ops.py::TestMathBitsCUDA::test_conj_view_nn_functional_softsign_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630156Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630318Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_fro_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630480Z test_ops.py::TestMathBitsCUDA::test_conj_view_norm_nuc_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630644Z test_ops.py::TestMathBitsCUDA::test_conj_view_ones_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630842Z test_ops.py::TestMathBitsCUDA::test_conj_view_ops_nvprims_view_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2630995Z test_ops.py::TestMathBitsCUDA::test_conj_view_outer_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631159Z test_ops.py::TestMathBitsCUDA::test_conj_view_permute_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631321Z test_ops.py::TestMathBitsCUDA::test_conj_view_positive_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631479Z test_ops.py::TestMathBitsCUDA::test_conj_view_pow_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631634Z test_ops.py::TestMathBitsCUDA::test_conj_view_qr_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631792Z test_ops.py::TestMathBitsCUDA::test_conj_view_real_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2631958Z test_ops.py::TestMathBitsCUDA::test_conj_view_reciprocal_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632122Z test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632290Z test_ops.py::TestMathBitsCUDA::test_conj_view_repeat_interleave_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632455Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_as_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632618Z test_ops.py::TestMathBitsCUDA::test_conj_view_reshape_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632776Z test_ops.py::TestMathBitsCUDA::test_conj_view_resize__cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2632942Z test_ops.py::TestMathBitsCUDA::test_conj_view_resolve_neg_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633099Z test_ops.py::TestMathBitsCUDA::test_conj_view_rot90_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633259Z test_ops.py::TestMathBitsCUDA::test_conj_view_rsqrt_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633443Z test_ops.py::TestMathBitsCUDA::test_conj_view_scalar_tensor_cuda_complex64 SKIPPED (Skipped!) [ 82%] 2023-01-11T23:13:47.2633635Z test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_add_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633789Z test_ops.py::TestMathBitsCUDA::test_conj_view_scatter_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2633950Z test_ops.py::TestMathBitsCUDA::test_conj_view_select_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634107Z test_ops.py::TestMathBitsCUDA::test_conj_view_sgn_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634266Z test_ops.py::TestMathBitsCUDA::test_conj_view_short_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634424Z test_ops.py::TestMathBitsCUDA::test_conj_view_sinc_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634580Z test_ops.py::TestMathBitsCUDA::test_conj_view_slice_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634753Z test_ops.py::TestMathBitsCUDA::test_conj_view_softmax_with_dtype_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2634914Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635076Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_list_args_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635251Z test_ops.py::TestMathBitsCUDA::test_conj_view_split_with_sizes_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635410Z test_ops.py::TestMathBitsCUDA::test_conj_view_square_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635578Z test_ops.py::TestMathBitsCUDA::test_conj_view_std_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635734Z test_ops.py::TestMathBitsCUDA::test_conj_view_stft_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2635891Z test_ops.py::TestMathBitsCUDA::test_conj_view_svd_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636051Z test_ops.py::TestMathBitsCUDA::test_conj_view_symeig_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636205Z test_ops.py::TestMathBitsCUDA::test_conj_view_t_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636352Z test_ops.py::TestMathBitsCUDA::test_conj_view_tanh_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636523Z test_ops.py::TestMathBitsCUDA::test_conj_view_tensordot_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636682Z test_ops.py::TestMathBitsCUDA::test_conj_view_to_sparse_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2636867Z test_ops.py::TestMathBitsCUDA::test_conj_view_transpose_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637028Z test_ops.py::TestMathBitsCUDA::test_conj_view_trapezoid_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637191Z test_ops.py::TestMathBitsCUDA::test_conj_view_unbind_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637356Z test_ops.py::TestMathBitsCUDA::test_conj_view_unflatten_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637514Z 
test_ops.py::TestMathBitsCUDA::test_conj_view_uniform_cuda_complex64 XFAIL [ 82%] 2023-01-11T23:13:47.2637669Z test_ops.py::TestMathBitsCUDA::test_conj_view_unsqueeze_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637823Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2637992Z test_ops.py::TestMathBitsCUDA::test_conj_view_var_unbiased_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638149Z test_ops.py::TestMathBitsCUDA::test_conj_view_vdot_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638304Z test_ops.py::TestMathBitsCUDA::test_conj_view_view_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638463Z test_ops.py::TestMathBitsCUDA::test_conj_view_vsplit_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638620Z test_ops.py::TestMathBitsCUDA::test_conj_view_vstack_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638777Z test_ops.py::TestMathBitsCUDA::test_conj_view_where_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2638933Z test_ops.py::TestMathBitsCUDA::test_conj_view_zero__cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2639089Z test_ops.py::TestMathBitsCUDA::test_conj_view_zeros_like_cuda_complex64 PASSED [ 82%] 2023-01-11T23:13:47.2639254Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rdiv___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639451Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmatmul___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639615Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rmul___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639781Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view___rpow___cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2639969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_cfloat_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640152Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_char_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640337Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_double_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640514Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_half_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640696Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs__conversions_int_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2640867Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_acosh_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641038Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_addcmul_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641213Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_allclose_cuda_complex128 PASSED [ 82%] 2023-01-11T23:13:47.2641379Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_any_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2641616Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_as_strided_partial_views_cuda_complex128 SKIPPED (Errors when storage_offset is included) [ 83%] 2023-01-11T23:13:47.2641786Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_asin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2641953Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_atanh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642131Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_broadcast_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642508Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_conj_physical_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2642859Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643029Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_cumsum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643205Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_diagonal_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643392Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_div_no_rounding_mode_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643609Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_empty_like_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 83%] 2023-01-11T23:13:47.2643777Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2643957Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_fftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644131Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_hfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644306Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_ifftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644478Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644652Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_fft_irfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644826Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_hsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2644997Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645206Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_index_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645372Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isfinite_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645543Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_isreal_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645712Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_lerp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2645896Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_matrix_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646070Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svd_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646248Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_linalg_svdvals_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646435Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646613Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_and_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646783Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_not_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2646961Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logical_xor_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647133Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_logspace_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2647305Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_masked_fill_cuda_complex128 PASSED [ 83%] 
2023-01-11T23:13:47.2647473Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mean_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647663Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2647856Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_meshgrid_variadic_tensors_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_movedim_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648225Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_mul_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648394Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_narrow_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648558Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_neg_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2648775Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_empty_cuda_complex128 SKIPPED (Expected: empty is not comparable) [ 83%] 2023-01-11T23:13:47.2648947Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_new_ones_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649133Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_l1_loss_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649335Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649729Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_nn_functional_tanhshrink_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2649896Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_norm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650053Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_ones_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2650226Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_permute_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650397Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_positive_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650561Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_prod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2650801Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_randn_cuda_complex128 SKIPPED (Test expects tensor input) [ 83%] 2023-01-11T23:13:47.2650973Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_repeat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651144Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rot90_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651313Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_rsqrt_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651481Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sinh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651670Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_log_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2651863Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_special_softmax_with_dtype_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652031Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_square_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652203Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_stack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652369Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_std_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652537Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sub_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652698Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_sum_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2652862Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_t_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653041Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tensor_split_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653198Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_tril_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653530Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_triu_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653709Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_true_divide_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2653904Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654074Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_vstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654240Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_where_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654404Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view__refs_zeros_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2654662Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_add_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654829Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addbmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2654993Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmm_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655156Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_addmv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655317Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_all_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655486Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_allclose_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655648Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_any_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655811Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asin_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2655972Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_asinh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656124Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atan_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656291Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_2d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656460Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_atleast_3d_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656669Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bfloat16_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656834Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_block_diag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2656999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_bool_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657170Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_broadcast_to_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657344Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cartesian_prod_cuda_complex128 PASSED [ 83%] 
2023-01-11T23:13:47.2657503Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cdouble_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657665Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_chalf_cuda_complex128 XFAIL [ 83%] 2023-01-11T23:13:47.2657825Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_char_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2657991Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658171Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_inverse_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658341Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cholesky_solve_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658506Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_clone_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658674Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_column_stack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2658837Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_combinations_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659012Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_constant_pad_nd_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659174Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cos_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659336Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cosh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659495Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cov_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659664Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_cumprod_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2659865Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660033Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diag_embed_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660199Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagflat_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diagonal_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660525Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_diff_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660687Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_dot_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2660850Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_double_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661030Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:13:47.2661215Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_empty_like_cuda_complex128 SKIPPED (Skipped!) [ 83%] 2023-01-11T23:13:47.2661379Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eq_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661541Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_exp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661699Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_expand_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2661875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_eye_cuda_complex128 SKIPPED (Skipped!) 
[ 83%] 2023-01-11T23:13:47.2662040Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662204Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662368Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662569Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_fftshift_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662735Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2662903Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_hfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663070Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_ifftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663230Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfft2_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663396Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fft_irfftn_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663557Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fill_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663719Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_fliplr_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2663885Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_gather_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664052Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_geqrf_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664218Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hsplit_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664384Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_hstack_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664543Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_index_copy_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664708Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isfinite_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2664875Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_isnan_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665033Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_istft_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665194Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lerp_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665367Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cross_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665546Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_det_singular_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eig_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2665914Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigh_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666080Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_eigvals_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666246Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666419Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_inv_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666599Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_ldl_factor_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666770Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2666958Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lstsq_grad_oriented_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667129Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667308Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_lu_factor_ex_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667476Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_matrix_power_cuda_complex128 PASSED [ 83%] 2023-01-11T23:13:47.2667652Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_multi_dot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2667848Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_norm_subgradients_at_zero_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668015Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_qr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668225Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_solve_triangular_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668393Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svd_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668570Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_svdvals_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668741Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vecdot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2668919Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_vector_norm_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669074Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log10_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669235Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_log2_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669399Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logdet_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669568Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_not_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669795Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_or_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2669961Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_logical_xor_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670127Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670295Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_solve_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670455Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_lu_unpack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670629Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_cumsum_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670797Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_fill_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2670969Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_scatter_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671145Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_masked_select_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671310Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_matmul_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671499Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671688Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_meshgrid_list_of_tensors_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2671856Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_movedim_cuda_complex128 PASSED [ 84%] 
2023-01-11T23:13:47.2672011Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mul_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672172Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_mv_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672338Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_narrow_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672502Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ne_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672667Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_neg_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672832Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_full_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2672999Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_new_ones_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673180Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv1d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673365Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose1d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673560Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_conv_transpose3d_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673743Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_l1_loss_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2673929Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_circular_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674144Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_constant_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674330Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_pad_reflect_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674528Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softmin_with_dtype_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674712Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_softsign_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2674922Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nn_functional_triplet_margin_with_distance_loss_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675082Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_nonzero_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_norm_fro_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675424Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ops_nvprims_view_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675593Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_ormqr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675753Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_outer_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2675920Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_positive_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676084Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_pow_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676247Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_qr_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676412Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_renorm_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676583Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_repeat_interleave_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676752Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_as_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2676919Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_reshape_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677089Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_resize_as__cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677279Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_roll_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677443Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_rsqrt_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677607Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_select_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677768Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sgn_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2677923Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_short_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678088Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sigmoid_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678251Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_split_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678414Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_sqrt_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678578Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_square_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678746Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_squeeze_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2678913Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679092Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_std_mean_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679253Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_svd_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679418Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_along_dim_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679580Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_take_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679739Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tanh_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2679934Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensor_split_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680103Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tensordot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680266Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tile_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680435Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapezoid_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680597Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_trapz_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680767Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triangular_solve_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2680927Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_tril_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681083Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_triu_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681253Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_true_divide_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681422Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_uniform_cuda_complex128 XFAIL [ 84%] 2023-01-11T23:13:47.2681591Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_unsqueeze_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681757Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2681921Z 
test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_mean_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682083Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_var_unbiased_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682246Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vdot_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682576Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_view_as_real_cuda_complex128 SKIPPED (Operation doesn't support conjugated inputs.) [ 84%] 2023-01-11T23:13:47.2682743Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vsplit_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2682910Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_vstack_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683072Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_where_cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683259Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_zero__cuda_complex128 PASSED [ 84%] 2023-01-11T23:13:47.2683417Z test_ops.py::TestMathBitsCUDA::test_neg_view_H_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683570Z test_ops.py::TestMathBitsCUDA::test_neg_view_T_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683719Z test_ops.py::TestMathBitsCUDA::test_neg_view___radd___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2683876Z test_ops.py::TestMathBitsCUDA::test_neg_view___rdiv___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684036Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmatmul___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684191Z test_ops.py::TestMathBitsCUDA::test_neg_view___rmul___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684349Z test_ops.py::TestMathBitsCUDA::test_neg_view___rpow___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684500Z test_ops.py::TestMathBitsCUDA::test_neg_view___rsub___cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684658Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_T_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2684834Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_byte_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685001Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_cfloat_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685176Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_char_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685347Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_half_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685517Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_int_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685690Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs__conversions_short_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2685876Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_addr_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686032Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_all_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686198Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_allclose_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686355Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_amax_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686520Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_as_strided_scatter_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686679Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asin_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686838Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_asinh_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2686993Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687149Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atan_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687317Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_atleast_3d_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687486Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_broadcast_to_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687646Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cat_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687796Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ceil_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2687957Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688119Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_clamp_max_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688284Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_column_stack_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688440Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688608Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_conj_physical_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688772Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_cumsum_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2688929Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diag_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689115Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_copy_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689282Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_diagonal_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689445Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_digamma_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689615Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_div_floor_rounding_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689776Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dsplit_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2689932Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_dstack_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690139Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_empty_cuda_float64 SKIPPED (Expected: empty is not comparable) [ 84%] 2023-01-11T23:13:47.2690299Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erf_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690458Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfc_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690608Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_erfinv_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690770Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_expand_as_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2690933Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691092Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691252Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_fftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691414Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691613Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_hfftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691775Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2691928Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692088Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ifftn_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692252Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692410Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_ihfft_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692576Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fft_irfft2_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692733Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fill_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2692893Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_flatten_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2693056Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fliplr_cuda_float64 PASSED [ 84%] 2023-01-11T23:13:47.2693213Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_float_power_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693376Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693544Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_floor_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693702Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2693856Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmin_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694009Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_fmod_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694161Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_frac_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694318Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ge_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694589Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694741Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_hstack_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2694945Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_index_copy_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695108Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isinf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695262Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isnan_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695424Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_isneginf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695587Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lerp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695751Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svd_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2695918Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_svdvals_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696086Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linalg_vector_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696251Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_linspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2696410Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_log1p_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696574Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_and_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696737Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_not_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2696899Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logical_xor_cuda_float64 PASSED [ 
85%] 2023-01-11T23:13:47.2697060Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logspace_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2697224Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_logsumexp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697409Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_lt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697568Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_maximum_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697734Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nan_to_num_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2697894Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_narrow_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698047Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_ne_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698274Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_empty_strided_cuda_float64 SKIPPED (Expected: empty_strided is not comparable) [ 85%] 2023-01-11T23:13:47.2698435Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_ones_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698597Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_new_zeros_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698759Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nextafter_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2698983Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_alpha_dropout_cuda_float64 SKIPPED (Expected: dropout is not comparable) [ 85%] 2023-01-11T23:13:47.2699159Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_celu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699331Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_elu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699502Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_glu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699685Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardshrink_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2699863Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hardtanh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700054Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700234Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_layer_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700416Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_leaky_relu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700624Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700801Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_nll_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2700975Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_pdist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701163Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701338Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_prelu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701508Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu6_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701688Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_relu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2701862Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_selu_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702054Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702226Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_nn_functional_threshold_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702383Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_norm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702540Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_prod_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702692Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_real_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2702845Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_roll_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703001Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rot90_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703231Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsqrt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703386Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_rsub_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703533Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sign_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703686Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sinh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2703859Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_softmax_with_dtype_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704030Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_bessel_j1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704197Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_entr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704365Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_erfcx_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704559Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_multigammaln_mvlgamma_p_1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704728Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_ndtr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2704914Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_spherical_bessel_j0_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705073Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_special_zeta_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705230Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sqrt_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705390Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_square_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705550Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_stack_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705710Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2705872Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_std_mean_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706031Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sub_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706196Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_sum_to_size_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706372Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_t_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706532Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_tan_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706696Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_transpose_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2706858Z 
test_ops.py::TestMathBitsCUDA::test_neg_view__refs_trunc_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707013Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_var_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707168Z test_ops.py::TestMathBitsCUDA::test_neg_view__refs_zeros_cuda_float64 XFAIL [ 85%] 2023-01-11T23:13:47.2707340Z test_ops.py::TestMathBitsCUDA::test_neg_view__softmax_backward_data_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707495Z test_ops.py::TestMathBitsCUDA::test_neg_view_abs_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707642Z test_ops.py::TestMathBitsCUDA::test_neg_view_addbmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707803Z test_ops.py::TestMathBitsCUDA::test_neg_view_addcdiv_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2707959Z test_ops.py::TestMathBitsCUDA::test_neg_view_addmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708111Z test_ops.py::TestMathBitsCUDA::test_neg_view_addr_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708260Z test_ops.py::TestMathBitsCUDA::test_neg_view_amax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708413Z test_ops.py::TestMathBitsCUDA::test_neg_view_aminmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708563Z test_ops.py::TestMathBitsCUDA::test_neg_view_any_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708726Z test_ops.py::TestMathBitsCUDA::test_neg_view_argwhere_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2708958Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_cuda_float64 SKIPPED (Errors when storage_offset is included) [ 85%] 2023-01-11T23:13:47.2709137Z test_ops.py::TestMathBitsCUDA::test_neg_view_as_strided_scatter_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709314Z test_ops.py::TestMathBitsCUDA::test_neg_view_asinh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709472Z test_ops.py::TestMathBitsCUDA::test_neg_view_atan2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709630Z test_ops.py::TestMathBitsCUDA::test_neg_view_atleast_1d_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2709846Z test_ops.py::TestMathBitsCUDA::test_neg_view_baddbmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710007Z test_ops.py::TestMathBitsCUDA::test_neg_view_bernoulli_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710166Z test_ops.py::TestMathBitsCUDA::test_neg_view_block_diag_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710314Z test_ops.py::TestMathBitsCUDA::test_neg_view_bmm_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710462Z test_ops.py::TestMathBitsCUDA::test_neg_view_bool_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710628Z test_ops.py::TestMathBitsCUDA::test_neg_view_broadcast_to_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710789Z test_ops.py::TestMathBitsCUDA::test_neg_view_bucketize_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2710941Z test_ops.py::TestMathBitsCUDA::test_neg_view_byte_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711089Z test_ops.py::TestMathBitsCUDA::test_neg_view_cat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711239Z test_ops.py::TestMathBitsCUDA::test_neg_view_cdist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711391Z test_ops.py::TestMathBitsCUDA::test_neg_view_cdouble_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711541Z test_ops.py::TestMathBitsCUDA::test_neg_view_cfloat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711684Z test_ops.py::TestMathBitsCUDA::test_neg_view_chalf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2711855Z test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_inverse_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712020Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_cholesky_solve_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712197Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712354Z test_ops.py::TestMathBitsCUDA::test_neg_view_clamp_min_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712505Z test_ops.py::TestMathBitsCUDA::test_neg_view_conj_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712668Z test_ops.py::TestMathBitsCUDA::test_neg_view_conj_physical_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712833Z test_ops.py::TestMathBitsCUDA::test_neg_view_constant_pad_nd_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2712987Z test_ops.py::TestMathBitsCUDA::test_neg_view_contiguous_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713136Z test_ops.py::TestMathBitsCUDA::test_neg_view_cosh_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713291Z test_ops.py::TestMathBitsCUDA::test_neg_view_cummax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713451Z test_ops.py::TestMathBitsCUDA::test_neg_view_diag_embed_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713611Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagflat_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713770Z test_ops.py::TestMathBitsCUDA::test_neg_view_diagonal_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2713920Z test_ops.py::TestMathBitsCUDA::test_neg_view_diff_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714070Z test_ops.py::TestMathBitsCUDA::test_neg_view_dist_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714230Z test_ops.py::TestMathBitsCUDA::test_neg_view_div_trunc_rounding_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714380Z test_ops.py::TestMathBitsCUDA::test_neg_view_dot_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714532Z test_ops.py::TestMathBitsCUDA::test_neg_view_double_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714685Z test_ops.py::TestMathBitsCUDA::test_neg_view_dsplit_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2714880Z test_ops.py::TestMathBitsCUDA::test_neg_view_empty_cuda_float64 SKIPPED (Skipped!) [ 85%] 2023-01-11T23:13:47.2715056Z test_ops.py::TestMathBitsCUDA::test_neg_view_empty_like_cuda_float64 SKIPPED (Skipped!) 
[ 85%] 2023-01-11T23:13:47.2715208Z test_ops.py::TestMathBitsCUDA::test_neg_view_eq_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715359Z test_ops.py::TestMathBitsCUDA::test_neg_view_equal_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715508Z test_ops.py::TestMathBitsCUDA::test_neg_view_erf_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715651Z test_ops.py::TestMathBitsCUDA::test_neg_view_erfc_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715803Z test_ops.py::TestMathBitsCUDA::test_neg_view_erfinv_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2715961Z test_ops.py::TestMathBitsCUDA::test_neg_view_expand_as_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716112Z test_ops.py::TestMathBitsCUDA::test_neg_view_expm1_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716274Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_fftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716433Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_hfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716592Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716752Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ifftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2716904Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_ihfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717061Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_irfftn_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717218Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft2_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717375Z test_ops.py::TestMathBitsCUDA::test_neg_view_fft_rfft_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717526Z test_ops.py::TestMathBitsCUDA::test_neg_view_fill_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717692Z test_ops.py::TestMathBitsCUDA::test_neg_view_float_power_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2717844Z test_ops.py::TestMathBitsCUDA::test_neg_view_floor_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718033Z test_ops.py::TestMathBitsCUDA::test_neg_view_floor_divide_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718179Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmax_cuda_float64 PASSED [ 85%] 2023-01-11T23:13:47.2718330Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718482Z test_ops.py::TestMathBitsCUDA::test_neg_view_fmod_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718639Z test_ops.py::TestMathBitsCUDA::test_neg_view_full_like_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718791Z test_ops.py::TestMathBitsCUDA::test_neg_view_gather_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2718944Z test_ops.py::TestMathBitsCUDA::test_neg_view_geqrf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719106Z test_ops.py::TestMathBitsCUDA::test_neg_view_gradient_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719256Z test_ops.py::TestMathBitsCUDA::test_neg_view_gt_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719401Z test_ops.py::TestMathBitsCUDA::test_neg_view_histc_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719555Z test_ops.py::TestMathBitsCUDA::test_neg_view_igammac_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719711Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_add_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2719875Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_reduce_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720035Z test_ops.py::TestMathBitsCUDA::test_neg_view_index_select_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720187Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_inner_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720338Z test_ops.py::TestMathBitsCUDA::test_neg_view_int_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720489Z test_ops.py::TestMathBitsCUDA::test_neg_view_isclose_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720656Z test_ops.py::TestMathBitsCUDA::test_neg_view_isin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720810Z test_ops.py::TestMathBitsCUDA::test_neg_view_isinf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2720963Z test_ops.py::TestMathBitsCUDA::test_neg_view_isnan_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721121Z test_ops.py::TestMathBitsCUDA::test_neg_view_isneginf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721280Z test_ops.py::TestMathBitsCUDA::test_neg_view_isposinf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721433Z test_ops.py::TestMathBitsCUDA::test_neg_view_isreal_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721584Z test_ops.py::TestMathBitsCUDA::test_neg_view_kron_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721748Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cond_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2721902Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cross_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722064Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722234Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_det_singular_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722398Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_eigh_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722578Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_householder_product_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722738Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_inv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2722898Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723075Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lstsq_grad_oriented_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723241Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_lu_factor_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723400Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723574Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_matrix_power_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723761Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_multi_dot_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2723922Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724159Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_pinv_singular_cuda_float64 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 86%] 2023-01-11T23:13:47.2724317Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_qr_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724481Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724644Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_ex_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724811Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_solve_triangular_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2724976Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svd_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725143Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_svdvals_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725315Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_tensorsolve_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725481Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_vector_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725634Z test_ops.py::TestMathBitsCUDA::test_neg_view_log10_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725788Z test_ops.py::TestMathBitsCUDA::test_neg_view_log2_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2725948Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726120Z test_ops.py::TestMathBitsCUDA::test_neg_view_log_softmax_with_dtype_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726274Z test_ops.py::TestMathBitsCUDA::test_neg_view_logical_or_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726459Z test_ops.py::TestMathBitsCUDA::test_neg_view_logspace_cuda_float64 XFAIL [ 86%] 2023-01-11T23:13:47.2726620Z test_ops.py::TestMathBitsCUDA::test_neg_view_logsumexp_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726782Z test_ops.py::TestMathBitsCUDA::test_neg_view_long_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2726939Z test_ops.py::TestMathBitsCUDA::test_neg_view_lt_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727100Z test_ops.py::TestMathBitsCUDA::test_neg_view_lu_unpack_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727264Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727424Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_amin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727581Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727745Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_argmin_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2727916Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumprod_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728080Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_cumsum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728253Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_log_softmax_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728414Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_mean_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728580Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_median_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728740Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2728892Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_sum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729049Z test_ops.py::TestMathBitsCUDA::test_neg_view_masked_var_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729204Z test_ops.py::TestMathBitsCUDA::test_neg_view_matrix_exp_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729361Z test_ops.py::TestMathBitsCUDA::test_neg_view_max_binary_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729520Z test_ops.py::TestMathBitsCUDA::test_neg_view_maximum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729722Z test_ops.py::TestMathBitsCUDA::test_neg_view_meshgrid_list_of_tensors_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2729883Z test_ops.py::TestMathBitsCUDA::test_neg_view_min_binary_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730035Z test_ops.py::TestMathBitsCUDA::test_neg_view_mm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730190Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_msort_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730334Z test_ops.py::TestMathBitsCUDA::test_neg_view_mv_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730493Z test_ops.py::TestMathBitsCUDA::test_neg_view_nan_to_num_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730652Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanmedian_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730818Z test_ops.py::TestMathBitsCUDA::test_neg_view_nanquantile_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2730972Z test_ops.py::TestMathBitsCUDA::test_neg_view_nansum_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731135Z test_ops.py::TestMathBitsCUDA::test_neg_view_narrow_copy_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731287Z test_ops.py::TestMathBitsCUDA::test_neg_view_neg_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731461Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_cuda_float64 SKIPPED (Skipped!) [ 86%] 2023-01-11T23:13:47.2731678Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_empty_strided_cuda_float64 SKIPPED (Expected: new_empty_strided is not comparable) [ 86%] 2023-01-11T23:13:47.2731836Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_ones_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2731995Z test_ops.py::TestMathBitsCUDA::test_neg_view_new_zeros_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732190Z test_ops.py::TestMathBitsCUDA::test_neg_view_nextafter_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732389Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional__scaled_dot_product_attention_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732578Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732767Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_avg_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2732952Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733136Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733312Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_adaptive_max_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733489Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733667Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_avg_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2733845Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734015Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_celu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734184Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734368Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734746Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_conv_transpose3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2734921Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_ctc_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735089Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735266Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_dropout3d_cuda_float64 PASSED [ 86%] 
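[Editor's sketch] The test_neg_view_* entries for float64 cover the same machinery restricted to real dtypes, where only the negative bit applies, since conjugation is the identity on real tensors. A sketch under the same torch._neg_view assumption:

```python
import torch

# Same torch._neg_view assumption as above, restricted to a real dtype:
# only the negative bit is in play for float64.
base = torch.arange(4, dtype=torch.float64)
view = torch._neg_view(base)          # shares storage, negative bit set
assert view.is_neg() and not view.is_conj()

# Reads see the negated values lazily; resolve_neg() makes an eager copy.
torch.testing.assert_close(view.abs(), base.abs())
torch.testing.assert_close(view.resolve_neg(), -base)
```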
2023-01-11T23:13:47.2735468Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_feature_alpha_dropout_with_train_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2735753Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_fractional_max_pool2d_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) [ 86%] 2023-01-11T23:13:47.2735939Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gaussian_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736108Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_gelu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736277Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_glu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736453Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_group_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736630Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardshrink_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736801Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardswish_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2736974Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hardtanh_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737162Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_hinge_embedding_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737336Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_huber_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737522Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bicubic_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737711Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_bilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2737901Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_linear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738091Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_interpolate_trilinear_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738286Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_kl_div_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738464Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_layer_norm_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738651Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_margin_ranking_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2738831Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool1d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739020Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_pool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739227Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool2d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739410Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739594Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_max_unpool3d_grad_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739778Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multi_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2739962Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740156Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_multilabel_soft_margin_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740329Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740503Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_normalize_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740686Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pad_replicate_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2740855Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_pdist_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741042Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_poisson_nll_loss_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741213Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_relu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741405Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_selu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741564Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_silu_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741751Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softmin_with_dtype_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2741923Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_softsign_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742095Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_unfold_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742279Z test_ops.py::TestMathBitsCUDA::test_neg_view_nn_functional_upsample_nearest_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742439Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_fro_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742601Z test_ops.py::TestMathBitsCUDA::test_neg_view_norm_inf_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742759Z test_ops.py::TestMathBitsCUDA::test_neg_view_normal_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2742907Z test_ops.py::TestMathBitsCUDA::test_neg_view_ones_cuda_float64 XFAIL [ 86%] 2023-01-11T23:13:47.2743076Z test_ops.py::TestMathBitsCUDA::test_neg_view_ops_nvprims_view_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743229Z test_ops.py::TestMathBitsCUDA::test_neg_view_outer_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743386Z test_ops.py::TestMathBitsCUDA::test_neg_view_pinverse_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743563Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_1_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743740Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_2_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2743912Z test_ops.py::TestMathBitsCUDA::test_neg_view_polygamma_polygamma_n_4_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2744094Z test_ops.py::TestMathBitsCUDA::test_neg_view_put_cuda_float64 PASSED [ 86%] 2023-01-11T23:13:47.2744251Z test_ops.py::TestMathBitsCUDA::test_neg_view_randint_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2744400Z test_ops.py::TestMathBitsCUDA::test_neg_view_randn_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2744565Z test_ops.py::TestMathBitsCUDA::test_neg_view_randn_like_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2744717Z test_ops.py::TestMathBitsCUDA::test_neg_view_ravel_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2744869Z test_ops.py::TestMathBitsCUDA::test_neg_view_real_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745033Z test_ops.py::TestMathBitsCUDA::test_neg_view_reciprocal_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745192Z test_ops.py::TestMathBitsCUDA::test_neg_view_remainder_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745345Z 
test_ops.py::TestMathBitsCUDA::test_neg_view_renorm_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745501Z test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745662Z test_ops.py::TestMathBitsCUDA::test_neg_view_repeat_interleave_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745825Z test_ops.py::TestMathBitsCUDA::test_neg_view_reshape_as_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2745980Z test_ops.py::TestMathBitsCUDA::test_neg_view_resize_as__cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746142Z test_ops.py::TestMathBitsCUDA::test_neg_view_resolve_neg_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746293Z test_ops.py::TestMathBitsCUDA::test_neg_view_roll_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746449Z test_ops.py::TestMathBitsCUDA::test_neg_view_rot90_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746603Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2746790Z test_ops.py::TestMathBitsCUDA::test_neg_view_round_decimals_neg_3_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2746938Z test_ops.py::TestMathBitsCUDA::test_neg_view_rsqrt_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747116Z test_ops.py::TestMathBitsCUDA::test_neg_view_scalar_tensor_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2747301Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_add_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747456Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747625Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_amin_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747795Z test_ops.py::TestMathBitsCUDA::test_neg_view_scatter_reduce_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2747963Z test_ops.py::TestMathBitsCUDA::test_neg_view_searchsorted_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748137Z test_ops.py::TestMathBitsCUDA::test_neg_view_segment_reduce_lengths_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748290Z test_ops.py::TestMathBitsCUDA::test_neg_view_select_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2748475Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_blackman_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2748663Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_cosine_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2748861Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_exponential_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749057Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_general_hamming_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749243Z test_ops.py::TestMathBitsCUDA::test_neg_view_signal_windows_hamming_cuda_float64 SKIPPED (Skipped!) [ 87%] 2023-01-11T23:13:47.2749399Z test_ops.py::TestMathBitsCUDA::test_neg_view_signbit_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749550Z test_ops.py::TestMathBitsCUDA::test_neg_view_sin_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749768Z test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2749975Z test_ops.py::TestMathBitsCUDA::test_neg_view_softmax_with_dtype_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750129Z test_ops.py::TestMathBitsCUDA::test_neg_view_sort_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750322Z test_ops.py::TestMathBitsCUDA::test_neg_view_sparse_sampled_addmm_cuda_float64 SKIPPED (Skipped!) 
[ 87%] 2023-01-11T23:13:47.2750490Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_airy_ai_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750657Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2750823Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_bessel_y1_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751010Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_t_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751379Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_chebyshev_polynomial_w_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2751546Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_entr_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751725Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_hermite_polynomial_h_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2751887Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i0e_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752045Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_i1_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752395Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_legendre_polynomial_p_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2752562Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_log_ndtr_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2752747Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_scaled_modified_bessel_k0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753107Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_t_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2753496Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_shifted_chebyshev_polynomial_w_cuda_float64 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 87%] 2023-01-11T23:13:47.2753675Z test_ops.py::TestMathBitsCUDA::test_neg_view_special_spherical_bessel_j0_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753833Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2753993Z test_ops.py::TestMathBitsCUDA::test_neg_view_split_with_sizes_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754144Z test_ops.py::TestMathBitsCUDA::test_neg_view_sqrt_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754301Z test_ops.py::TestMathBitsCUDA::test_neg_view_square_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754455Z test_ops.py::TestMathBitsCUDA::test_neg_view_squeeze_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754625Z test_ops.py::TestMathBitsCUDA::test_neg_view_std_mean_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754775Z test_ops.py::TestMathBitsCUDA::test_neg_view_stft_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2754941Z test_ops.py::TestMathBitsCUDA::test_neg_view_sum_to_size_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755093Z test_ops.py::TestMathBitsCUDA::test_neg_view_svd_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755248Z test_ops.py::TestMathBitsCUDA::test_neg_view_svd_lowrank_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755402Z test_ops.py::TestMathBitsCUDA::test_neg_view_symeig_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755553Z test_ops.py::TestMathBitsCUDA::test_neg_view_t_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2755717Z test_ops.py::TestMathBitsCUDA::test_neg_view_take_along_dim_cuda_float64 
PASSED [ 87%] 2023-01-11T23:13:47.2755865Z test_ops.py::TestMathBitsCUDA::test_neg_view_tan_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756042Z test_ops.py::TestMathBitsCUDA::test_neg_view_tanh_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756206Z test_ops.py::TestMathBitsCUDA::test_neg_view_tensor_split_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756362Z test_ops.py::TestMathBitsCUDA::test_neg_view_to_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756515Z test_ops.py::TestMathBitsCUDA::test_neg_view_to_sparse_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756676Z test_ops.py::TestMathBitsCUDA::test_neg_view_transpose_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756829Z test_ops.py::TestMathBitsCUDA::test_neg_view_trapz_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2756997Z test_ops.py::TestMathBitsCUDA::test_neg_view_triangular_solve_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757149Z test_ops.py::TestMathBitsCUDA::test_neg_view_triu_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757301Z test_ops.py::TestMathBitsCUDA::test_neg_view_trunc_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757460Z test_ops.py::TestMathBitsCUDA::test_neg_view_unflatten_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757613Z test_ops.py::TestMathBitsCUDA::test_neg_view_unfold_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2757759Z test_ops.py::TestMathBitsCUDA::test_neg_view_uniform_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2757910Z test_ops.py::TestMathBitsCUDA::test_neg_view_unique_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758061Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758221Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758392Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_mean_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758556Z test_ops.py::TestMathBitsCUDA::test_neg_view_var_unbiased_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758708Z test_ops.py::TestMathBitsCUDA::test_neg_view_vdot_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2758937Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_as_complex_cuda_float64 SKIPPED (Operation not tested with tensors with negative bit.) 
[ 87%] 2023-01-11T23:13:47.2759095Z test_ops.py::TestMathBitsCUDA::test_neg_view_view_copy_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759273Z test_ops.py::TestMathBitsCUDA::test_neg_view_where_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759427Z test_ops.py::TestMathBitsCUDA::test_neg_view_xlogy_cuda_float64 PASSED [ 87%] 2023-01-11T23:13:47.2759582Z test_ops.py::TestMathBitsCUDA::test_neg_view_zeros_cuda_float64 XFAIL [ 87%] 2023-01-11T23:13:47.2759733Z test_ops.py::TestFakeTensorCUDA::test_fake_H_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2759890Z test_ops.py::TestFakeTensorCUDA::test_fake___getitem___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760042Z test_ops.py::TestFakeTensorCUDA::test_fake___rdiv___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760197Z test_ops.py::TestFakeTensorCUDA::test_fake___rmod___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760351Z test_ops.py::TestFakeTensorCUDA::test_fake___rpow___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760493Z test_ops.py::TestFakeTensorCUDA::test_fake___rxor___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2760670Z test_ops.py::TestFakeTensorCUDA::test_fake__softmax_backward_data_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760823Z test_ops.py::TestFakeTensorCUDA::test_fake_abs_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2760975Z test_ops.py::TestFakeTensorCUDA::test_fake_acosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761125Z test_ops.py::TestFakeTensorCUDA::test_fake_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761278Z test_ops.py::TestFakeTensorCUDA::test_fake_addcdiv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761429Z test_ops.py::TestFakeTensorCUDA::test_fake_addcmul_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761580Z test_ops.py::TestFakeTensorCUDA::test_fake_addmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761724Z test_ops.py::TestFakeTensorCUDA::test_fake_addmv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2761904Z test_ops.py::TestFakeTensorCUDA::test_fake_all_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762056Z test_ops.py::TestFakeTensorCUDA::test_fake_allclose_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762212Z test_ops.py::TestFakeTensorCUDA::test_fake_amin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762366Z test_ops.py::TestFakeTensorCUDA::test_fake_any_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762518Z test_ops.py::TestFakeTensorCUDA::test_fake_arange_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762669Z test_ops.py::TestFakeTensorCUDA::test_fake_argmax_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762819Z test_ops.py::TestFakeTensorCUDA::test_fake_argmin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2762965Z test_ops.py::TestFakeTensorCUDA::test_fake_argsort_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763143Z test_ops.py::TestFakeTensorCUDA::test_fake_as_strided_partial_views_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763294Z test_ops.py::TestFakeTensorCUDA::test_fake_asin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763445Z test_ops.py::TestFakeTensorCUDA::test_fake_atanh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763607Z test_ops.py::TestFakeTensorCUDA::test_fake_atleast_1d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763766Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_H_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2763925Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_T_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764095Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___radd___cuda_float32 PASSED [ 
87%] 2023-01-11T23:13:47.2764253Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rmul___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___ror___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2764586Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rsub___cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2764751Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast___rxor___cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2764935Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast__softmax_backward_data_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765124Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_acosh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765288Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_add_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765452Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765620Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addcmul_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765778Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2765955Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmm_decomposed_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766117Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_addmv_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766285Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_allclose_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766478Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_aminmax_cuda_float32 SKIPPED (Skip failing test) [ 87%] 2023-01-11T23:13:47.2766646Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_angle_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766811Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_argmax_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2766980Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767151Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_as_strided_scatter_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767314Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_asin_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767475Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atan2_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767659Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atanh_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767828Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_1d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2767998Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_2d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_atleast_3d_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768335Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_baddbmm_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768496Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bernoulli_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768663Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bfloat16_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2768830Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_and_cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2768995Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bitwise_or_cuda_int64 PASSED [ 87%] 2023-01-11T23:13:47.2769163Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_bool_cuda_float32 PASSED [ 87%] 
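[annotation] The TestMathBitsCUDA::test_neg_view_* entries above exercise PyTorch's negative-bit views: rather than materializing a negated copy, a view records the pending negation in a tensor flag and each kernel resolves it lazily. A minimal sketch of the mechanism follows; note that _neg_view() is a private helper used by these tests (the public surface is Tensor.is_neg() and torch.resolve_neg), so treat the exact spelling as version-dependent.

```python
import torch

x = torch.tensor([1.0, -2.0, 3.0])

# _neg_view() is the private helper the MathBits tests use: it returns a view
# that shares x's storage and merely sets the "neg bit" -- no data is copied.
v = x._neg_view()
print(v.is_neg())                      # True: the negation is still pending

# Any kernel that consumes v resolves the bit on the fly ...
print(v + 0)                           # tensor([-1.,  2., -3.])

# ... or it can be materialized explicitly via the public API.
print(torch.resolve_neg(v).is_neg())   # False: values are now physically negated
```

The suite runs each op once on a plain input and once on such a view, then checks the results agree; XFAIL entries (e.g. test_neg_view_uniform_cuda_float64 above) mark ops where this check is expected to fail.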
2023-01-11T23:13:47.2769343Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_broadcast_tensors_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769508Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_byte_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769674Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cat_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769838Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdist_cuda_float32 PASSED [ 87%] 2023-01-11T23:13:47.2769996Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cdouble_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770198Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2770360Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_chunk_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770527Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_clamp_max_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770700Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_column_stack_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2770889Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_complex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771053Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771228Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_conj_physical_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771387Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_corrcoef_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771549Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cos_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771722Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_count_nonzero_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2771886Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cummax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772053Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumsum_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772238Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_cumulative_trapezoid_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772407Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_deg2rad_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772575Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diag_embed_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772749Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_diagonal_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2772906Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_digamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773086Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_floor_rounding_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773265Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_no_rounding_mode_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773442Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_div_trunc_rounding_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773648Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_equal_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773811Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_erf_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2773976Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_exp2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774134Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_eye_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774293Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_fft_cuda_float32 
PASSED [ 88%] 2023-01-11T23:13:47.2774459Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774750Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_hfftn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2774916Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775084Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_ifft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775254Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775419Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_irfft_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775584Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfft2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775748Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fft_rfftn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2775902Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2782735Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flip_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2782922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fliplr_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783083Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_flipud_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783260Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_floor_divide_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783425Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783655Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_fmod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783825Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_full_like_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2783984Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gather_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784147Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_gcd_cuda_int64 PASSED [ 88%] 2023-01-11T23:13:47.2784310Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ge_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784474Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_histc_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784637Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_hsplit_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784798Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_igamma_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2784958Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_imag_cuda_complex64 PASSED [ 88%] 2023-01-11T23:13:47.2785129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785289Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_fill_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785455Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_index_put_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785618Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isclose_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785786Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isfinite_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2785948Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isnan_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786113Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_isposinf_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786342Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_istft_cuda_complex64 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2786567Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2786736Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_kthvalue_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2786893Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lerp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787067Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cond_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787240Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cross_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787423Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_det_singular_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787592Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eig_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2787794Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2787995Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2788167Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_inv_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788516Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_factor_ex_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788691Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_ldl_solve_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2788869Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789060Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789236Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_lu_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789419Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2789662Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_matrix_rank_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2789929Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_multi_dot_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790093Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790289Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_norm_subgradients_at_zero_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790460Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2790665Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_hermitian_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2790914Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 88%] 2023-01-11T23:13:47.2791112Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_solve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2791281Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svd_cuda_float32 PASSED [ 88%] 
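[annotation] The test_fake_* names come from TestFakeTensorCUDA, which reruns each op on "fake" tensors: stand-ins that carry shape, dtype, and device metadata but no storage, so the test validates shape and dtype propagation without doing real work. A rough sketch, assuming the internal FakeTensorMode API of this era (torch._subclasses.fake_tensor); the import path and behavior are not stable across releases:

```python
import torch
from torch._subclasses.fake_tensor import FakeTensorMode  # internal API; assumed import path

with FakeTensorMode():
    # Factory calls made under the mode produce FakeTensors: metadata only, no storage.
    a = torch.empty(8, 16)
    w = torch.empty(16, 4)
    y = a @ w                  # dispatched through meta kernels; no arithmetic actually runs
    print(type(y).__name__)    # FakeTensor
    print(y.shape, y.dtype)    # torch.Size([8, 4]) torch.float32
```

The test_fake_autocast_* variants repeat the same check with torch.autocast active, so autocast's dtype promotion rules are exercised through the fake kernels as well.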
2023-01-11T23:13:47.2791455Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_svdvals_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2791632Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorinv_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2791837Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2792003Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vander_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792206Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vecdot_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792387Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_vector_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792556Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log10_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792719Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log1p_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2792887Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793075Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_log_softmax_with_dtype_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793248Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logaddexp2_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793406Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logdet_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793578Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logical_and_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793752Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_logspace_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2793922Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_lu_unpack_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794094Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_amax_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794269Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_cumprod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794449Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_logsumexp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794620Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_mean_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794793Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2794963Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_normalize_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795139Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_prod_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795313Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_scatter_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795514Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_select_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795684Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_std_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2795852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_sum_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796019Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_masked_var_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796187Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matmul_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796348Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_matrix_exp_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796547Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796734Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_max_reduction_no_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2796928Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_meshgrid_variadic_tensors_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797109Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_no_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797292Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_min_reduction_with_dim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797456Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_mode_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797626Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_movedim_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797794Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_msort_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2797984Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_multinomial_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2798205Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 88%] 2023-01-11T23:13:47.2798379Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_narrow_copy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798565Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_dropout_backward_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798744Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_native_layer_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2798924Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_empty_strided_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799093Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_new_zeros_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799262Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nextafter_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799471Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799666Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2799866Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800064Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_adaptive_max_pool2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800251Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800452Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800639Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_bilinear_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2800836Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801020Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801196Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801451Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801645Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_conv_transpose2d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2801839Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cosine_similarity_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802030Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_cross_entropy_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802241Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802458Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802661Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2802852Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_hardshrink_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803039Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_huber_loss_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803223Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_instance_norm_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803615Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_linear_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2803814Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 88%] 2023-01-11T23:13:47.2804041Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_interpolate_trilinear_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804226Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_kl_div_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804420Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_local_response_norm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804607Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_logsigmoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804796Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2804997Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2805190Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805384Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805573Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805761Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_max_unpool3d_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2805942Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mish_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806127Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_mse_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806320Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806525Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2806703Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pad_constant_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2806896Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807117Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807300Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_prelu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807476Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu6_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807652Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_relu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2807832Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_rrelu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808018Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_softplus_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808237Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808431Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_nn_functional_upsample_nearest_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808592Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808762Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_fro_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2808931Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_inf_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809100Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_norm_nuc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809280Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809488Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_normal_number_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809691Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2809947Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_ones_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810143Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_outer_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810317Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pca_lowrank_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810482Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pinverse_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810644Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polar_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2810828Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811010Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811193Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_polygamma_polygamma_n_4_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811358Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_positive_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811519Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_pow_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811675Z 
test_ops.py::TestFakeTensorCUDA::test_fake_autocast_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_put_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2811999Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rad2deg_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812164Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rand_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812333Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randint_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812498Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812664Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_randn_like_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812835Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_reciprocal_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2812991Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_renorm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813188Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resize__cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813364Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_conj_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813531Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_resolve_neg_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813693Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_roll_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2813866Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814043Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_round_decimals_neg_3_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814208Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsqrt_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814362Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_rsub_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814656Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scalar_tensor_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2814822Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815001Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_amax_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815174Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_prod_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815349Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_scatter_reduce_sum_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815521Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_searchsorted_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815706Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_segment_reduce_offsets_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2815976Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sigmoid_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816129Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sign_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816317Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_bartlett_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816506Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signal_windows_exponential_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816672Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_signbit_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2816834Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sin_cuda_float32 
PASSED [ 89%] 2023-01-11T23:13:47.2816994Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sinc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817156Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817338Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_softmax_with_dtype_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817506Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_j0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817685Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_bessel_y0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2817878Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_chebyshev_polynomial_t_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818068Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_hermite_polynomial_he_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818238Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i0e_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_i1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818594Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_laguerre_polynomial_l_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818773Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_log_ndtr_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2818959Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_i1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819168Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_modified_bessel_k1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819339Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_ndtr_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819540Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2819733Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2820140Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:13:47.2820533Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 89%] 2023-01-11T23:13:47.2820755Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2820981Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_split_list_args_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821195Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sqrt_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821403Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_square_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821594Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821775Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_std_mean_unbiased_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2821937Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sub_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822172Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_sum_to_size_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822343Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_svd_lowrank_cuda_float32 PASSED [ 
89%] 2023-01-11T23:13:47.2822505Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_t_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822673Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_take_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2822835Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tanh_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823023Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2823214Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 89%] 2023-01-11T23:13:47.2823377Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_topk_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823539Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trace_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823716Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triangular_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2823885Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_tril_indices_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2824048Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_triu_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824216Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_true_divide_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824378Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_trunc_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824538Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unflatten_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824704Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unfold_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2824883Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_unique_consecutive_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825051Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_var_mean_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825230Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_as_complex_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825418Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_view_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825581Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_xlogy_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825740Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zero__cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2825890Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_zeros_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826049Z test_ops.py::TestFakeTensorCUDA::test_fake_baddbmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826209Z test_ops.py::TestFakeTensorCUDA::test_fake_bernoulli_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826367Z test_ops.py::TestFakeTensorCUDA::test_fake_bfloat16_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2826527Z test_ops.py::TestFakeTensorCUDA::test_fake_bincount_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826681Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_not_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826836Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_or_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2826992Z test_ops.py::TestFakeTensorCUDA::test_fake_bitwise_xor_cuda_int64 PASSED [ 89%] 2023-01-11T23:13:47.2827145Z test_ops.py::TestFakeTensorCUDA::test_fake_block_diag_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827295Z test_ops.py::TestFakeTensorCUDA::test_fake_bmm_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827446Z test_ops.py::TestFakeTensorCUDA::test_fake_bool_cuda_float32 
PASSED [ 89%] 2023-01-11T23:13:47.2827614Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_shapes_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827782Z test_ops.py::TestFakeTensorCUDA::test_fake_broadcast_tensors_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2827932Z test_ops.py::TestFakeTensorCUDA::test_fake_byte_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828113Z test_ops.py::TestFakeTensorCUDA::test_fake_cdouble_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828266Z test_ops.py::TestFakeTensorCUDA::test_fake_char_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828426Z test_ops.py::TestFakeTensorCUDA::test_fake_cholesky_solve_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828583Z test_ops.py::TestFakeTensorCUDA::test_fake_clone_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828748Z test_ops.py::TestFakeTensorCUDA::test_fake_combinations_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2828902Z test_ops.py::TestFakeTensorCUDA::test_fake_complex_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829052Z test_ops.py::TestFakeTensorCUDA::test_fake_conj_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829218Z test_ops.py::TestFakeTensorCUDA::test_fake_constant_pad_nd_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829372Z test_ops.py::TestFakeTensorCUDA::test_fake_copysign_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829529Z test_ops.py::TestFakeTensorCUDA::test_fake_corrcoef_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829672Z test_ops.py::TestFakeTensorCUDA::test_fake_cos_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2829911Z test_ops.py::TestFakeTensorCUDA::test_fake_cosh_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830064Z test_ops.py::TestFakeTensorCUDA::test_fake_cross_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830242Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_T_cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830424Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___radd___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830606Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rdiv___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830789Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmatmul___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2830969Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rmul___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831150Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rpow___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831345Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp___rsub___cuda_float32 PASSED [ 89%] 2023-01-11T23:13:47.2831547Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__native_batch_norm_legit_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2831745Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp__softmax_backward_data_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2831922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_acos_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832100Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_add_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832284Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcdiv_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832466Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addcmul_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832642Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2832842Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_addmm_decomposed_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833021Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833219Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_as_strided_partial_views_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833395Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_asin_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833572Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atan2_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833747Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atanh_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2833960Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_1d_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834145Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_atleast_3d_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834334Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bernoulli_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bfloat16_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834682Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_bmm_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2834871Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_broadcast_to_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835044Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cat_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835222Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ceil_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835403Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chalf_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835580Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_chunk_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2835942Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_clamp_min_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836122Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_complex_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_conj_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_corrcoef_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836648Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cos_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2836827Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cosh_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837003Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cov_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837202Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cross_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cummax_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837563Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumsum_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837753Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_cumulative_trapezoid_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2837942Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_diagonal_copy_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_digamma_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dist_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dot_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838660Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_double_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2838836Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_dsplit_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839018Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_erfc_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839209Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expand_as_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839403Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_expm1_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2839811Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_fftshift_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2839995Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_hfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840173Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840348Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_ihfftn_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840524Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_irfft_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840698Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft2_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2840874Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fft_rfft_cuda_float32 XFAIL [ 90%] 2023-01-11T23:13:47.2841043Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flip_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841231Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_flipud_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841596Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_float_power_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841770Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_floor_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2841949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_fmod_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842125Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_frac_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_gather_cuda_float32 PASSED [ 90%] 2023-01-11T23:13:47.2842497Z 
test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_grid_sampler_2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2842666Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_i0_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2842876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_fill_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843060Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_put_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843245Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_index_select_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843420Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_inner_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843596Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_kron_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843775Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lgamma_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2843961Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cross_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844147Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844338Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_det_singular_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eig_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844707Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigh_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2844899Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_eigvals_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845104Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_householder_product_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845287Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845503Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845698Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_lu_factor_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2845895Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_matrix_power_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2846075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_multi_dot_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2846260Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2846467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_norm_subgradients_at_zero_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2846668Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2846879Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_pinv_singular_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2847062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_qr_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847250Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_solve_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847620Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_svd_cuda_float32 XFAIL [ 90%]
2023-01-11T23:13:47.2847801Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vander_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2847989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_vecdot_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848167Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log2_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848354Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_log_softmax_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848540Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logaddexp2_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848754Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logcumsumexp_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2848937Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logdet_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849136Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_logit_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_lu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849504Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mT_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_cumprod_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2849883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_fill_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850072Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_mean_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850446Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_normalize_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_select_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2850823Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_softmin_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851011Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_masked_std_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851189Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_matrix_exp_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851406Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_binary_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2851816Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_max_reduction_with_dim_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852017Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_meshgrid_variadic_tensors_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852195Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mode_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852376Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_movedim_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852554Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_msort_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852731Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mv_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2852920Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_3_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853120Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_mvlgamma_mvlgamma_p_5_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nan_to_num_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853487Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanmean_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853674Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nanquantile_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2853854Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nansum_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854044Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_native_layer_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_neg_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854470Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2854806Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855018Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855222Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_adaptive_max_pool1d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855429Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_alpha_dropout_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2855841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_avg_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856039Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_batch_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856265Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_binary_cross_entropy_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2856476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856685Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_cosine_similarity_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2856890Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_ctc_loss_cuda_float32 SKIPPED (Skipped!) [ 90%]
2023-01-11T23:13:47.2857138Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857335Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857535Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_dropout_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857730Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_elu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2857935Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_bag_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858135Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_embedding_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858363Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858579Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858794Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2858980Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_glu_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859181Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_grid_sample_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_group_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859579Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_hardsigmoid_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859782Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_instance_norm_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2859993Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_area_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2860233Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 90%]
2023-01-11T23:13:47.2860433Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_kl_div_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2860628Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_linear_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2860827Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_local_response_norm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861038Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861237Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_pool3d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861443Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool2d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861650Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_max_unpool3d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2861851Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_nll_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862052Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_normalize_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862259Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pad_replicate_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862466Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pairwise_distance_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862671Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2862904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863102Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu6_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863296Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_relu_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863491Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_rrelu_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2863897Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864099Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_softsign_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_tanhshrink_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864500Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_threshold_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864704Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2864904Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865111Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_nn_functional_upsample_nearest_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865474Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_norm_inf_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865658Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2865880Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_normal_number_mean_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866062Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ormqr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866248Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pca_lowrank_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2866431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_permute_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2866621Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pinverse_cuda_float32 SKIPPED (Skipped!) [ 91%]
2023-01-11T23:13:47.2866821Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867015Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_2_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867211Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_polygamma_polygamma_n_3_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867397Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_positive_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867574Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_pow_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867759Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_quantile_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2867940Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_ravel_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868118Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_real_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_as_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868507Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_reshape_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868696Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_resolve_conj_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2868883Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869075Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869266Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_round_decimals_3_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869455Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_scatter_add_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2869652Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_lengths_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2869924Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_segment_reduce_offsets_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870123Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_select_scatter_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870303Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sgn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2870485Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sigmoid_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870663Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2870841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sinc_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871021Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871212Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_slice_scatter_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871425Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sparse_sampled_addmm_cuda_float32 SKIPPED (Skipped!) [ 91%]
2023-01-11T23:13:47.2871618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_entr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2871824Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872015Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_i1e_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_log_ndtr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872634Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_special_xlog1py_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2872824Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_split_with_sizes_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873010Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sqrt_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873196Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_square_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873387Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_std_mean_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873559Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_stft_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sub_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2873930Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_sum_to_size_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874110Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_t_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874288Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tan_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874494Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tanh_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874686Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensor_split_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2874875Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tensordot_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875053Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_tile_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_to_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875406Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trace_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trapz_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875784Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_triangular_solve_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2875965Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_trunc_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876148Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unbind_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876337Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unflatten_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876519Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_unsqueeze_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_as_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2876875Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_view_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877059Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_vsplit_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877238Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_where_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877426Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_xlogy_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877645Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___getitem___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2877841Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmatmul___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878026Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rmod___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878213Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp___rpow___cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878414Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp__native_batch_norm_legit_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878598Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_abs_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878776Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_add_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2878962Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addbmm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879163Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addcmul_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879588Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmm_decomposed_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879772Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_addmv_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2879953Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amax_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880123Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_amin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880347Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_as_strided_partial_views_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880735Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_asin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2880910Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atan_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881091Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atanh_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881279Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_atleast_1d_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_block_diag_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881656Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_broadcast_to_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2881830Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cat_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882010Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ceil_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882190Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chalf_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882373Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2882567Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882749Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_chunk_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2882927Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_clone_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883121Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_combinations_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883305Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_conj_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883516Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_constant_pad_nd_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_corrcoef_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2883886Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cov_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884070Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummax_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884249Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cummin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884427Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumsum_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884633Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_cumulative_trapezoid_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diag_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2884993Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_diff_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885172Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_digamma_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885351Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dist_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885550Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_div_no_rounding_mode_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885729Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dot_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2885914Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_dstack_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886120Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_einsum_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_erfinv_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886481Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_exp2_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886659Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_expand_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2886842Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_fftshift_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2887024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft2_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887206Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfft_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887389Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_hfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887574Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfft_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887762Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_ihfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2887949Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_irfft2_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2888129Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fft_rfftn_cuda_float32 XFAIL [ 91%]
2023-01-11T23:13:47.2888316Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_flatten_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888493Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fliplr_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888672Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_fmin_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2888848Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frac_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889033Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_frexp_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889236Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gather_cuda_float32 PASSED [ 91%]
2023-01-11T23:13:47.2889421Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_gradient_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889618Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_grid_sampler_2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889797Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_hsplit_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2889973Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_i0_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890155Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_copy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890346Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_fill_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890536Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_put_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890728Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_reduce_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2890922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_index_select_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_inner_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891283Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kron_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891467Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_kthvalue_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891650Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ldexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2891907Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cond_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2892095Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cross_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892300Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_det_singular_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eig_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892680Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_eigvals_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2892867Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_inv_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893073Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893264Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893460Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_lu_factor_ex_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2893647Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_matrix_norm_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2893833Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_norm_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2894024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894224Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_pinv_hermitian_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894409Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_qr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894707Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_slogdet_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2894916Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_solve_triangular_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895146Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_svdvals_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2895336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vander_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895532Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_vector_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895705Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2895894Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896092Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_log_softmax_with_dtype_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896285Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logcumsumexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896472Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logit_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896662Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_logsumexp_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2896838Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mH_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897029Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumprod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897219Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_cumsum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897400Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_fill_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897594Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_log_softmax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2897814Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898005Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_median_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898189Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898384Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_normalize_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898572Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898758Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_sum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2898944Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_masked_var_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899127Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_binary_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899343Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899540Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_max_reduction_no_dim_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899723Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2899905Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_median_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900092Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_binary_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900292Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_min_reduction_with_dim_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900482Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanmedian_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900669Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nanquantile_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2900868Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_narrow_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901064Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_batch_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901256Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_native_layer_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901691Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2901895Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_avg_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902100Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_batch_norm_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902314Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902537Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902734Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_celu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2902934Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903143Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cosine_similarity_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903389Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_cross_entropy_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903595Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903795Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_dropout_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2903989Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_elu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904203Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904412Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904607Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_gelu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2904813Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_grid_sample_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905022Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905223Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_huber_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_area_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905644Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2905858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_kl_div_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906254Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_l1_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906476Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_leaky_relu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906680Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2906874Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool2d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907078Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_pool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907282Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907491Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool1d_grad_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907703Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool2d_grad_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2907911Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_max_unpool3d_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908105Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_mish_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908323Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908542Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908740Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2908961Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_reflect_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909173Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pad_replicate_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909380Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909589Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2909858Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910055Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_selu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910250Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_silu_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910464Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910669Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softshrink_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2910869Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_softsign_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911063Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_tanhshrink_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911291Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_nn_functional_triplet_margin_with_distance_loss_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_norm_fro_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911664Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2911861Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_normal_number_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912071Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_outer_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912253Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_polar_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912439Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_positive_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912622Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_pow_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912796Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2912978Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_put_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913165Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_quantile_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913352Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_ravel_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913551Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_repeat_interleave_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913736Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_reshape_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2913917Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rot90_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914115Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_0_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914312Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_round_decimals_neg_3_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914486Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_rsqrt_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914709Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amax_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2914901Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_amin_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915099Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_mean_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915290Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_prod_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915481Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_scatter_reduce_sum_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2915683Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_segment_reduce_lengths_cuda_float32 XFAIL [ 92%]
2023-01-11T23:13:47.2915865Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sin_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916048Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916232Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_slice_scatter_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916417Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sort_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916610Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_entr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916800Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i0e_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2916988Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_i1_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917183Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_log_ndtr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917371Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_special_ndtr_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917569Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_list_args_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917785Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_split_with_sizes_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2917965Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_square_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918153Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_squeeze_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918336Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stack_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918515Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_stft_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918694Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_sub_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2918876Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_symeig_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2919056Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_t_cuda_float32 PASSED [ 92%]
2023-01-11T23:13:47.2919251Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_along_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919431Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_take_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919614Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tensordot_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919797Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_tile_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2919973Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_to_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920158Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapezoid_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920366Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_trapz_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920545Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_triu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920738Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_true_divide_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2920922Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unbind_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921109Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_unsqueeze_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921297Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_var_mean_unbiased_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921478Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vdot_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921663Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_as_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2921845Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_view_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922028Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_vsplit_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922214Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_where_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922391Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_xlogy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922552Z test_ops.py::TestFakeTensorCUDA::test_fake_cumprod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922711Z test_ops.py::TestFakeTensorCUDA::test_fake_cumsum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2922879Z test_ops.py::TestFakeTensorCUDA::test_fake_cumulative_trapezoid_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923037Z test_ops.py::TestFakeTensorCUDA::test_fake_deg2rad_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923199Z test_ops.py::TestFakeTensorCUDA::test_fake_diag_embed_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923369Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_copy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923539Z test_ops.py::TestFakeTensorCUDA::test_fake_diagonal_scatter_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923720Z test_ops.py::TestFakeTensorCUDA::test_fake_dist_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2923893Z test_ops.py::TestFakeTensorCUDA::test_fake_div_floor_rounding_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924062Z test_ops.py::TestFakeTensorCUDA::test_fake_div_trunc_rounding_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924211Z test_ops.py::TestFakeTensorCUDA::test_fake_dsplit_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924366Z test_ops.py::TestFakeTensorCUDA::test_fake_einsum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924525Z test_ops.py::TestFakeTensorCUDA::test_fake_empty_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924687Z test_ops.py::TestFakeTensorCUDA::test_fake_empty_like_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924843Z test_ops.py::TestFakeTensorCUDA::test_fake_eq_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2924995Z test_ops.py::TestFakeTensorCUDA::test_fake_erf_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925152Z test_ops.py::TestFakeTensorCUDA::test_fake_erfc_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925309Z test_ops.py::TestFakeTensorCUDA::test_fake_erfinv_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925453Z test_ops.py::TestFakeTensorCUDA::test_fake_exp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925607Z test_ops.py::TestFakeTensorCUDA::test_fake_expand_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925761Z test_ops.py::TestFakeTensorCUDA::test_fake_expm1_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2925912Z test_ops.py::TestFakeTensorCUDA::test_fake_eye_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926070Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fft_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926248Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926413Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_fftshift_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926570Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926717Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_hfftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2926877Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ifftn_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927038Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_ihfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927191Z test_ops.py::TestFakeTensorCUDA::test_fake_fft_rfft2_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927342Z test_ops.py::TestFakeTensorCUDA::test_fake_flip_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927497Z test_ops.py::TestFakeTensorCUDA::test_fake_float_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927660Z test_ops.py::TestFakeTensorCUDA::test_fake_float_power_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927814Z test_ops.py::TestFakeTensorCUDA::test_fake_floor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2927965Z test_ops.py::TestFakeTensorCUDA::test_fake_fmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928112Z test_ops.py::TestFakeTensorCUDA::test_fake_frexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928264Z test_ops.py::TestFakeTensorCUDA::test_fake_full_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928419Z test_ops.py::TestFakeTensorCUDA::test_fake_gather_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928567Z test_ops.py::TestFakeTensorCUDA::test_fake_gcd_cuda_int64 PASSED [ 93%]
2023-01-11T23:13:47.2928735Z test_ops.py::TestFakeTensorCUDA::test_fake_grid_sampler_2d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2928887Z test_ops.py::TestFakeTensorCUDA::test_fake_gt_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929041Z test_ops.py::TestFakeTensorCUDA::test_fake_i0_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929196Z test_ops.py::TestFakeTensorCUDA::test_fake_igamma_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929342Z test_ops.py::TestFakeTensorCUDA::test_fake_igammac_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929525Z test_ops.py::TestFakeTensorCUDA::test_fake_imag_cuda_complex64 PASSED [ 93%]
2023-01-11T23:13:47.2929677Z test_ops.py::TestFakeTensorCUDA::test_fake_int_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929829Z test_ops.py::TestFakeTensorCUDA::test_fake_isclose_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2929981Z test_ops.py::TestFakeTensorCUDA::test_fake_isfinite_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930133Z test_ops.py::TestFakeTensorCUDA::test_fake_isin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930287Z test_ops.py::TestFakeTensorCUDA::test_fake_isnan_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930438Z test_ops.py::TestFakeTensorCUDA::test_fake_isreal_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2930618Z test_ops.py::TestFakeTensorCUDA::test_fake_istft_cuda_complex64 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2930821Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_2inputs_2outputs_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2931016Z test_ops.py::TestFakeTensorCUDA::test_fake_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2931168Z test_ops.py::TestFakeTensorCUDA::test_fake_kron_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931322Z test_ops.py::TestFakeTensorCUDA::test_fake_kthvalue_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931475Z test_ops.py::TestFakeTensorCUDA::test_fake_lcm_cuda_int64 PASSED [ 93%]
2023-01-11T23:13:47.2931627Z test_ops.py::TestFakeTensorCUDA::test_fake_ldexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931779Z test_ops.py::TestFakeTensorCUDA::test_fake_lgamma_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2931935Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cond_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932121Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932293Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_det_singular_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932460Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigh_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2932651Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2932832Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_householder_product_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933002Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933171Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_factor_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933338Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_ldl_solve_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933513Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lstsq_grad_oriented_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933667Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2933836Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934006Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_lu_factor_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934174Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2934368Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2934711Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 93%]
2023-01-11T23:13:47.2934877Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_solve_ex_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935032Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svd_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935196Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_svdvals_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935400Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorinv_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935593Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2935756Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vander_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2935923Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_vector_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936080Z test_ops.py::TestFakeTensorCUDA::test_fake_linspace_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936232Z test_ops.py::TestFakeTensorCUDA::test_fake_log_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936392Z test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936558Z test_ops.py::TestFakeTensorCUDA::test_fake_log_softmax_with_dtype_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936723Z test_ops.py::TestFakeTensorCUDA::test_fake_logcumsumexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2936882Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_and_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937039Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_or_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937196Z test_ops.py::TestFakeTensorCUDA::test_fake_logical_xor_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937353Z test_ops.py::TestFakeTensorCUDA::test_fake_logsumexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937507Z test_ops.py::TestFakeTensorCUDA::test_fake_long_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937657Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2937831Z test_ops.py::TestFakeTensorCUDA::test_fake_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2938024Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_amin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938188Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_argmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938355Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_cumprod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938520Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_logaddexp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938678Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_mean_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2938839Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_median_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939000Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939171Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_prod_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939359Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_scatter_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939526Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_select_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939684Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmax_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2939844Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_softmin_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940001Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_std_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940162Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_sum_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940319Z test_ops.py::TestFakeTensorCUDA::test_fake_masked_var_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940471Z test_ops.py::TestFakeTensorCUDA::test_fake_matmul_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940619Z test_ops.py::TestFakeTensorCUDA::test_fake_matrix_exp_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940804Z test_ops.py::TestFakeTensorCUDA::test_fake_max_pool2d_with_indices_backward_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2940977Z test_ops.py::TestFakeTensorCUDA::test_fake_max_reduction_no_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941128Z test_ops.py::TestFakeTensorCUDA::test_fake_mean_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941328Z test_ops.py::TestFakeTensorCUDA::test_fake_meshgrid_list_of_tensors_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941499Z test_ops.py::TestFakeTensorCUDA::test_fake_min_reduction_with_dim_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941649Z test_ops.py::TestFakeTensorCUDA::test_fake_mul_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2941844Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_1_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942029Z test_ops.py::TestFakeTensorCUDA::test_fake_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942183Z test_ops.py::TestFakeTensorCUDA::test_fake_nan_to_num_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2942368Z test_ops.py::TestFakeTensorCUDA::test_fake_nanquantile_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942528Z test_ops.py::TestFakeTensorCUDA::test_fake_narrow_copy_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2942703Z test_ops.py::TestFakeTensorCUDA::test_fake_narrow_cuda_float32 SKIPPED (Skip failing test) [ 93%]
2023-01-11T23:13:47.2942878Z test_ops.py::TestFakeTensorCUDA::test_fake_native_dropout_backward_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943043Z test_ops.py::TestFakeTensorCUDA::test_fake_native_layer_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943199Z test_ops.py::TestFakeTensorCUDA::test_fake_neg_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943354Z test_ops.py::TestFakeTensorCUDA::test_fake_new_ones_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943542Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional__scaled_dot_product_attention_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943727Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2943948Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944126Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_alpha_dropout_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944301Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_avg_pool1d_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944473Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_batch_norm_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944645Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_bilinear_cuda_float32 PASSED [ 93%]
2023-01-11T23:13:47.2944832Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945021Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945190Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv1d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945358Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv2d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945540Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945721Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_cosine_similarity_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2945892Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_dropout3d_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2946094Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_embedding_bag_cuda_float32 SKIPPED (Skip failing test) [ 94%]
2023-01-11T23:13:47.2946288Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_with_train_cuda_float32 PASSED [ 94%]
2023-01-11T23:13:47.2946486Z
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2946664Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_fractional_max_pool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2946847Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gaussian_nll_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947036Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_gelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947209Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_group_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947383Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardshrink_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947553Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardswish_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947726Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_hardtanh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2947902Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_instance_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948085Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948267Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_bilinear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948452Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_linear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948639Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948805Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_kl_div_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2948977Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_layer_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949166Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_local_response_norm_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949360Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_logsigmoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949556Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_margin_ranking_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2949841Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2950007Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool2d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950181Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_pool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950356Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool1d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950532Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_max_unpool3d_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950722Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2950895Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_normalize_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951067Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pad_constant_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951248Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951416Z 
test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu6_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951575Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_relu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951741Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_rrelu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2951907Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_selu_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952083Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952252Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952433Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952607Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_softsign_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952780Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_threshold_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2952984Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_bilinear_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953168Z test_ops.py::TestFakeTensorCUDA::test_fake_nn_functional_upsample_nearest_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953322Z test_ops.py::TestFakeTensorCUDA::test_fake_nonzero_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953477Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_fro_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953630Z test_ops.py::TestFakeTensorCUDA::test_fake_norm_inf_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953785Z test_ops.py::TestFakeTensorCUDA::test_fake_ones_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2953936Z test_ops.py::TestFakeTensorCUDA::test_fake_ormqr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954094Z test_ops.py::TestFakeTensorCUDA::test_fake_permute_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954247Z test_ops.py::TestFakeTensorCUDA::test_fake_polar_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954416Z test_ops.py::TestFakeTensorCUDA::test_fake_polygamma_polygamma_n_1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954567Z test_ops.py::TestFakeTensorCUDA::test_fake_positive_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954715Z test_ops.py::TestFakeTensorCUDA::test_fake_put_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2954868Z test_ops.py::TestFakeTensorCUDA::test_fake_rand_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955027Z test_ops.py::TestFakeTensorCUDA::test_fake_randn_like_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955174Z test_ops.py::TestFakeTensorCUDA::test_fake_real_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955333Z test_ops.py::TestFakeTensorCUDA::test_fake_reciprocal_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955511Z test_ops.py::TestFakeTensorCUDA::test_fake_repeat_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2955693Z test_ops.py::TestFakeTensorCUDA::test_fake_repeat_interleave_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2955846Z test_ops.py::TestFakeTensorCUDA::test_fake_resize__cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956007Z test_ops.py::TestFakeTensorCUDA::test_fake_resolve_conj_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956155Z test_ops.py::TestFakeTensorCUDA::test_fake_round_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956324Z test_ops.py::TestFakeTensorCUDA::test_fake_round_decimals_3_cuda_float32 PASSED [ 94%] 
2023-01-11T23:13:47.2956472Z test_ops.py::TestFakeTensorCUDA::test_fake_rsqrt_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956624Z test_ops.py::TestFakeTensorCUDA::test_fake_rsub_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956773Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2956936Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amax_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957103Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_amin_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957271Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957437Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_prod_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957600Z test_ops.py::TestFakeTensorCUDA::test_fake_scatter_reduce_sum_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957763Z test_ops.py::TestFakeTensorCUDA::test_fake_searchsorted_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2957956Z test_ops.py::TestFakeTensorCUDA::test_fake_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2958121Z test_ops.py::TestFakeTensorCUDA::test_fake_select_scatter_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958263Z test_ops.py::TestFakeTensorCUDA::test_fake_sgn_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958417Z test_ops.py::TestFakeTensorCUDA::test_fake_short_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958571Z test_ops.py::TestFakeTensorCUDA::test_fake_sigmoid_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958748Z test_ops.py::TestFakeTensorCUDA::test_fake_sign_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2958921Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_blackman_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959103Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_general_cosine_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959293Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_hann_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959485Z test_ops.py::TestFakeTensorCUDA::test_fake_signal_windows_kaiser_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959636Z test_ops.py::TestFakeTensorCUDA::test_fake_signbit_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959783Z test_ops.py::TestFakeTensorCUDA::test_fake_sinc_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2959932Z test_ops.py::TestFakeTensorCUDA::test_fake_sinh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960087Z test_ops.py::TestFakeTensorCUDA::test_fake_softmax_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960280Z test_ops.py::TestFakeTensorCUDA::test_fake_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2960442Z test_ops.py::TestFakeTensorCUDA::test_fake_special_airy_ai_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960607Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_j0_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2960769Z test_ops.py::TestFakeTensorCUDA::test_fake_special_bessel_y1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961141Z test_ops.py::TestFakeTensorCUDA::test_fake_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2961296Z test_ops.py::TestFakeTensorCUDA::test_fake_special_entr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961481Z test_ops.py::TestFakeTensorCUDA::test_fake_special_i0e_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961638Z 
test_ops.py::TestFakeTensorCUDA::test_fake_special_i1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961799Z test_ops.py::TestFakeTensorCUDA::test_fake_special_i1e_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2961979Z test_ops.py::TestFakeTensorCUDA::test_fake_special_laguerre_polynomial_l_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962327Z test_ops.py::TestFakeTensorCUDA::test_fake_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2962494Z test_ops.py::TestFakeTensorCUDA::test_fake_special_log_ndtr_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962670Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i0_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2962848Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_i1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963017Z test_ops.py::TestFakeTensorCUDA::test_fake_special_modified_bessel_k1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963208Z test_ops.py::TestFakeTensorCUDA::test_fake_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2963567Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2963927Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2964275Z test_ops.py::TestFakeTensorCUDA::test_fake_special_shifted_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 94%] 2023-01-11T23:13:47.2964435Z test_ops.py::TestFakeTensorCUDA::test_fake_special_zeta_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964602Z test_ops.py::TestFakeTensorCUDA::test_fake_split_list_args_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964757Z test_ops.py::TestFakeTensorCUDA::test_fake_square_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2964940Z test_ops.py::TestFakeTensorCUDA::test_fake_std_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965092Z test_ops.py::TestFakeTensorCUDA::test_fake_sub_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965243Z test_ops.py::TestFakeTensorCUDA::test_fake_sum_to_size_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965393Z test_ops.py::TestFakeTensorCUDA::test_fake_svd_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965550Z test_ops.py::TestFakeTensorCUDA::test_fake_svd_lowrank_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965703Z test_ops.py::TestFakeTensorCUDA::test_fake_symeig_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2965854Z test_ops.py::TestFakeTensorCUDA::test_fake_t_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966008Z test_ops.py::TestFakeTensorCUDA::test_fake_tanh_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966194Z test_ops.py::TestFakeTensorCUDA::test_fake_tensor_split_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2966345Z test_ops.py::TestFakeTensorCUDA::test_fake_tile_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966519Z test_ops.py::TestFakeTensorCUDA::test_fake_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 94%] 2023-01-11T23:13:47.2966672Z test_ops.py::TestFakeTensorCUDA::test_fake_topk_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966825Z test_ops.py::TestFakeTensorCUDA::test_fake_transpose_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2966992Z 
test_ops.py::TestFakeTensorCUDA::test_fake_triangular_solve_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967139Z test_ops.py::TestFakeTensorCUDA::test_fake_tril_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967297Z test_ops.py::TestFakeTensorCUDA::test_fake_true_divide_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967472Z test_ops.py::TestFakeTensorCUDA::test_fake_trunc_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967623Z test_ops.py::TestFakeTensorCUDA::test_fake_unbind_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967778Z test_ops.py::TestFakeTensorCUDA::test_fake_unfold_copy_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2967931Z test_ops.py::TestFakeTensorCUDA::test_fake_uniform_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968100Z test_ops.py::TestFakeTensorCUDA::test_fake_unique_consecutive_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968253Z test_ops.py::TestFakeTensorCUDA::test_fake_unsqueeze_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968402Z test_ops.py::TestFakeTensorCUDA::test_fake_var_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968556Z test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968723Z test_ops.py::TestFakeTensorCUDA::test_fake_var_mean_unbiased_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2968877Z test_ops.py::TestFakeTensorCUDA::test_fake_vdot_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969024Z test_ops.py::TestFakeTensorCUDA::test_fake_view_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969195Z test_ops.py::TestFakeTensorCUDA::test_fake_xlogy_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969369Z test_ops.py::TestFakeTensorCUDA::test_fake_zero__cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969517Z test_ops.py::TestFakeTensorCUDA::test_fake_zeros_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___getitem___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2969849Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rand___cuda_int64 PASSED [ 94%] 2023-01-11T23:13:47.2970014Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rdiv___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970178Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmod___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970338Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rmul___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970498Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rsub___cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2970686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops___rxor___cuda_int64 PASSED [ 94%] 2023-01-11T23:13:47.2970852Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_add_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2971018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addcdiv_cuda_float32 PASSED [ 94%] 2023-01-11T23:13:47.2971183Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_addmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971345Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_amax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971510Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_any_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971673Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_arange_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2971831Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_argmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972008Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_as_strided_scatter_cuda_float32 PASSED [ 95%] 
2023-01-11T23:13:47.2972172Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972335Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_asinh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972494Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atan2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972650Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atanh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972818Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_2d_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2972986Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_atleast_3d_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973144Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_baddbmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973336Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bfloat16_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2973501Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_and_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2973669Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_or_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2973839Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bitwise_xor_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2974002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_bmm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974179Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_shapes_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974636Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_broadcast_to_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974805Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cartesian_prod_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2974974Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cfloat_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975138Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_chalf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975303Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_char_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975501Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_inverse_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2975677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cholesky_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2975842Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clamp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976002Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_clone_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976161Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_complex_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976334Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_conj_physical_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976500Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_corrcoef_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976702Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cos_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2976889Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cov_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2977059Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cummin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977223Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_cumsum_cuda_float32 PASSED [ 95%] 
2023-01-11T23:13:47.2977391Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_deg2rad_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977556Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diag_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977718Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagflat_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2977888Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diagonal_scatter_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978240Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_diff_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978418Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_floor_rounding_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_no_rounding_mode_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978778Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_div_trunc_rounding_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2978941Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dot_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979099Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dsplit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979260Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_dstack_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979454Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_einsum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979620Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_empty_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979786Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_equal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2979950Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_erfinv_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980116Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980280Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_exp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980445Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_as_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980613Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expand_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980779Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_expm1_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2980947Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_eye_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981117Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981284Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981455Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_fftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981623Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_hfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981794Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2981953Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982301Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_ifftshift_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982473Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft2_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982719Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_irfft_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2982888Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fft_rfftn_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983053Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fliplr_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983215Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_flipud_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983371Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_float_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_floor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983695Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_fmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2983858Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_full_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984022Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gather_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984186Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_gcd_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2984349Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_half_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984518Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_heaviside_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984671Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_histc_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hsplit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2984995Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_hypot_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985157Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_i0_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985357Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_igamma_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985525Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_add_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985698Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_copy_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2985866Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986040Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_index_select_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986195Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isin_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isinf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986514Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isnan_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986680Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isneginf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2986852Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isposinf_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2987018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_isreal_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2987211Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_istft_cuda_complex64 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987428Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_4inputs_with_extra_args_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987631Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_binary_return_by_ref_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987827Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_jiterator_unary_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2987996Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_kthvalue_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988156Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lcm_cuda_int64 PASSED [ 95%] 2023-01-11T23:13:47.2988319Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_ldexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988481Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_le_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988678Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cross_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2988865Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_det_singular_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989036Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigh_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvals_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2989422Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_eigvalsh_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2989590Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_inv_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2989837Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lstsq_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990007Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990184Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_factor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_lu_solve_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990561Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_power_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2990738Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_matrix_rank_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2990904Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_slogdet_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991077Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_solve_ex_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991310Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_tensorsolve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2991493Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_vector_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991661Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linspace_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991824Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log10_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2991984Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log1p_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992156Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992341Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_log_softmax_with_dtype_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992511Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logaddexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992734Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_or_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2992960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logical_xor_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993135Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logit_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993302Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_logsumexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993467Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lt_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993655Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_solve_cuda_float32 SKIPPED (Skip failing test) [ 95%] 2023-01-11T23:13:47.2993823Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_lu_unpack_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2993977Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mH_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994138Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mT_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994316Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_argmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994489Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_cumprod_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994709Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_fill_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2994889Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_log_softmax_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995067Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_logaddexp_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995236Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_mean_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995405Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_norm_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995575Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_normalize_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995748Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_select_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2995916Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_masked_sum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996083Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_matmul_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996251Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_maximum_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996414Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mean_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_list_of_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_meshgrid_variadic_tensors_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2996947Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_binary_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997130Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_no_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997339Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_min_reduction_with_dim_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997505Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mul_cuda_float32 PASSED [ 95%] 2023-01-11T23:13:47.2997709Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_mvlgamma_mvlgamma_p_5_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.2997876Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nan_to_num_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998066Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmean_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.2998235Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nanmedian_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998400Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nansum_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998562Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_narrow_copy_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998753Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_dropout_backward_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2998935Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_native_layer_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999103Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999276Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_new_empty_strided_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999441Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nextafter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999637Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.2999833Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000029Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_avg_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000219Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_adaptive_max_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000434Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_alpha_dropout_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000621Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_avg_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3000820Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_batch_norm_without_cudnn_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001027Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_binary_cross_entropy_with_logits_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001205Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_celu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001386Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001582Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose1d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_conv_transpose3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3001955Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002139Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_dropout_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002350Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_feature_alpha_dropout_without_train_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002547Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_fractional_max_pool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3002724Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_glu_cuda_float32 
PASSED [ 96%] 2023-01-11T23:13:47.3002912Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003124Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hardswish_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003323Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_hinge_embedding_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003507Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_huber_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003688Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_instance_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3003881Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_area_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_bicubic_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004268Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_linear_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004465Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_interpolate_nearest_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004648Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_kl_div_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3004831Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_layer_norm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005014Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_leaky_relu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005222Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool1d_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3005396Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_pool3d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005584Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_max_unpool2d_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005770Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_mse_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3005960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multi_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006180Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006384Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_multilabel_soft_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006568Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_normalize_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3006769Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_one_hot_cuda_int64 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3006960Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_constant_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007142Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pad_replicate_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007339Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pairwise_distance_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007532Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_shuffle_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007724Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_pixel_unshuffle_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3007913Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_poisson_nll_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008094Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_rrelu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008272Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_selu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008449Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_silu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008663Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_smooth_l1_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3008845Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_soft_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009033Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softmin_with_dtype_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009410Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softplus_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009595Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softshrink_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009780Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_softsign_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3009975Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_triplet_margin_loss_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010169Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_nn_functional_upsample_nearest_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010338Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_norm_fro_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010500Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010677Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_normal_number_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3010842Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_outer_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011009Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_permute_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011194Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011383Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011570Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_polygamma_polygamma_n_4_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011737Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_positive_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3011940Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_prod_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012102Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rand_like_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012269Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_renorm_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012431Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_repeat_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012602Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_reshape_as_cuda_float32 PASSED [ 96%] 
2023-01-11T23:13:47.3012771Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_resolve_neg_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3012936Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013121Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_round_decimals_neg_3_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013283Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsqrt_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013442Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_rsub_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013619Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scalar_tensor_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013788Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_add_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3013968Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_amin_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014143Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_scatter_reduce_prod_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014356Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_lengths_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3014772Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_segment_reduce_offsets_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3014977Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015155Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_select_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015310Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sgn_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015473Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_short_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015642Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sigmoid_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015808Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sign_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3015992Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_bartlett_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016175Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_blackman_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016360Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_cosine_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016548Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_exponential_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016739Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_general_hamming_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3016909Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_hann_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017089Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signal_windows_kaiser_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017258Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_signbit_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017428Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_slice_scatter_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017595Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3017770Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_softmax_with_dtype_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018030Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sparse_sampled_addmm_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3018210Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_airy_ai_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018380Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_j1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018558Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_bessel_y0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3018751Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_u_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3019140Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3019508Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3019686Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_entr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3019857Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_i1e_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020052Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_laguerre_polynomial_l_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020231Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_log_ndtr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020420Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020600Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_modified_bessel_k1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020810Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtr_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3020987Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_ndtri_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021196Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_polygamma_special_polygamma_n_0_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021390Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_scaled_modified_bessel_k1_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3021766Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022129Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022487Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_shifted_chebyshev_polynomial_v_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 96%] 2023-01-11T23:13:47.3022668Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_special_xlog1py_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3022847Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_list_args_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023018Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_split_with_sizes_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023187Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sqrt_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023354Z 
test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_squeeze_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023524Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stack_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023693Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3023875Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_mean_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024053Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_std_unbiased_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024250Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_stft_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024406Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sub_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024564Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024735Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_sum_to_size_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3024897Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_take_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025056Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tanh_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025226Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tensordot_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025393Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tile_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025583Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_to_sparse_cuda_float32 SKIPPED (Skip failing test) [ 96%] 2023-01-11T23:13:47.3025756Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_transpose_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3025914Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_trapz_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026074Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_tril_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026235Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_triu_cuda_float32 PASSED [ 96%] 2023-01-11T23:13:47.3026407Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_true_divide_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026573Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unbind_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026742Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unflatten_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3026946Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_unique_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027121Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_complex_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027282Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_as_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027449Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_view_copy_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027613Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_where_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027774Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_xlogy_cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3027932Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_zero__cuda_float32 PASSED [ 97%] 2023-01-11T23:13:47.3028094Z test_ops.py::TestTagsCUDA::test_tags_H_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028262Z test_ops.py::TestTagsCUDA::test_tags___getitem___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028434Z 
test_ops.py::TestTagsCUDA::test_tags___radd___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028588Z test_ops.py::TestTagsCUDA::test_tags___rand___cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028751Z test_ops.py::TestTagsCUDA::test_tags___rdiv___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3028914Z test_ops.py::TestTagsCUDA::test_tags___rmul___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029070Z test_ops.py::TestTagsCUDA::test_tags___rpow___cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029258Z test_ops.py::TestTagsCUDA::test_tags__native_batch_norm_legit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029445Z test_ops.py::TestTagsCUDA::test_tags__refs_T_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029649Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_chalf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3029920Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_char_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030103Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_half_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030305Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_int_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030483Z test_ops.py::TestTagsCUDA::test_tags__refs__conversions_long_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030649Z test_ops.py::TestTagsCUDA::test_tags__refs_abs_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030816Z test_ops.py::TestTagsCUDA::test_tags__refs_acosh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3030985Z test_ops.py::TestTagsCUDA::test_tags__refs_addr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031149Z test_ops.py::TestTagsCUDA::test_tags__refs_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031324Z test_ops.py::TestTagsCUDA::test_tags__refs_arange_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031498Z test_ops.py::TestTagsCUDA::test_tags__refs_as_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031665Z test_ops.py::TestTagsCUDA::test_tags__refs_asin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031826Z test_ops.py::TestTagsCUDA::test_tags__refs_asinh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3031992Z test_ops.py::TestTagsCUDA::test_tags__refs_atan2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032160Z test_ops.py::TestTagsCUDA::test_tags__refs_atan_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032331Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_or_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032501Z test_ops.py::TestTagsCUDA::test_tags__refs_bitwise_xor_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032711Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_shapes_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3032893Z test_ops.py::TestTagsCUDA::test_tags__refs_broadcast_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033067Z test_ops.py::TestTagsCUDA::test_tags__refs_bucketize_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033232Z test_ops.py::TestTagsCUDA::test_tags__refs_cat_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3033390Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033566Z test_ops.py::TestTagsCUDA::test_tags__refs_clamp_min_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033735Z test_ops.py::TestTagsCUDA::test_tags__refs_clone_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3033911Z test_ops.py::TestTagsCUDA::test_tags__refs_column_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034082Z test_ops.py::TestTagsCUDA::test_tags__refs_conj_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034257Z test_ops.py::TestTagsCUDA::test_tags__refs_contiguous_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034430Z test_ops.py::TestTagsCUDA::test_tags__refs_copysign_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034592Z test_ops.py::TestTagsCUDA::test_tags__refs_cos_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034750Z test_ops.py::TestTagsCUDA::test_tags__refs_cosh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3034928Z test_ops.py::TestTagsCUDA::test_tags__refs_diag_embed_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035107Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035279Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035465Z test_ops.py::TestTagsCUDA::test_tags__refs_diagonal_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035636Z test_ops.py::TestTagsCUDA::test_tags__refs_digamma_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3035849Z test_ops.py::TestTagsCUDA::test_tags__refs_div_no_rounding_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036032Z test_ops.py::TestTagsCUDA::test_tags__refs_div_trunc_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036198Z test_ops.py::TestTagsCUDA::test_tags__refs_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036356Z test_ops.py::TestTagsCUDA::test_tags__refs_erf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036521Z test_ops.py::TestTagsCUDA::test_tags__refs_erfc_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036689Z test_ops.py::TestTagsCUDA::test_tags__refs_exp2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3036864Z test_ops.py::TestTagsCUDA::test_tags__refs_expand_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037028Z test_ops.py::TestTagsCUDA::test_tags__refs_eye_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037199Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037364Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037539Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_fftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037715Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3037878Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038045Z 
test_ops.py::TestTagsCUDA::test_tags__refs_fft_hfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038232Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038403Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifft_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038570Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_ifftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038741Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3038911Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_irfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039076Z test_ops.py::TestTagsCUDA::test_tags__refs_fft_rfft2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039253Z test_ops.py::TestTagsCUDA::test_tags__refs_float_power_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039413Z test_ops.py::TestTagsCUDA::test_tags__refs_floor_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039588Z test_ops.py::TestTagsCUDA::test_tags__refs_floor_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039755Z test_ops.py::TestTagsCUDA::test_tags__refs_fmin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3039921Z test_ops.py::TestTagsCUDA::test_tags__refs_fmod_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040088Z test_ops.py::TestTagsCUDA::test_tags__refs_frac_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040252Z test_ops.py::TestTagsCUDA::test_tags__refs_gcd_cuda_int64 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040417Z test_ops.py::TestTagsCUDA::test_tags__refs_ge_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040587Z test_ops.py::TestTagsCUDA::test_tags__refs_hstack_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040761Z test_ops.py::TestTagsCUDA::test_tags__refs_index_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3040934Z test_ops.py::TestTagsCUDA::test_tags__refs_index_select_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041108Z test_ops.py::TestTagsCUDA::test_tags__refs_isfinite_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041303Z test_ops.py::TestTagsCUDA::test_tags__refs_isnan_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041475Z test_ops.py::TestTagsCUDA::test_tags__refs_isneginf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041645Z test_ops.py::TestTagsCUDA::test_tags__refs_isposinf_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041812Z test_ops.py::TestTagsCUDA::test_tags__refs_le_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3041990Z test_ops.py::TestTagsCUDA::test_tags__refs_linalg_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042168Z test_ops.py::TestTagsCUDA::test_tags__refs_linspace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042332Z test_ops.py::TestTagsCUDA::test_tags__refs_log1p_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042498Z test_ops.py::TestTagsCUDA::test_tags__refs_log2_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3042689Z test_ops.py::TestTagsCUDA::test_tags__refs_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3042865Z test_ops.py::TestTagsCUDA::test_tags__refs_logical_xor_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043034Z test_ops.py::TestTagsCUDA::test_tags__refs_logspace_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043207Z test_ops.py::TestTagsCUDA::test_tags__refs_logsumexp_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043379Z test_ops.py::TestTagsCUDA::test_tags__refs_masked_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043547Z test_ops.py::TestTagsCUDA::test_tags__refs_maximum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043746Z test_ops.py::TestTagsCUDA::test_tags__refs_minimum_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3043902Z test_ops.py::TestTagsCUDA::test_tags__refs_mul_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044077Z test_ops.py::TestTagsCUDA::test_tags__refs_nan_to_num_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044248Z test_ops.py::TestTagsCUDA::test_tags__refs_narrow_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044417Z test_ops.py::TestTagsCUDA::test_tags__refs_narrow_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044602Z test_ops.py::TestTagsCUDA::test_tags__refs_native_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044768Z test_ops.py::TestTagsCUDA::test_tags__refs_ne_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3044948Z test_ops.py::TestTagsCUDA::test_tags__refs_new_empty_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045137Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_celu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045315Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_elu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045487Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_gelu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045678Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hardtanh_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3045878Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_hinge_embedding_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046063Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_l1_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046253Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046442Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_leaky_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046654Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3046851Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047038Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_prelu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047209Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047408Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3047596Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softplus_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047790Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_softshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3047977Z test_ops.py::TestTagsCUDA::test_tags__refs_nn_functional_tanhshrink_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048145Z test_ops.py::TestTagsCUDA::test_tags__refs_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048314Z test_ops.py::TestTagsCUDA::test_tags__refs_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048487Z test_ops.py::TestTagsCUDA::test_tags__refs_permute_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048655Z test_ops.py::TestTagsCUDA::test_tags__refs_pow_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048821Z test_ops.py::TestTagsCUDA::test_tags__refs_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3048981Z test_ops.py::TestTagsCUDA::test_tags__refs_randn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049184Z test_ops.py::TestTagsCUDA::test_tags__refs_remainder_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049355Z test_ops.py::TestTagsCUDA::test_tags__refs_reshape_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049522Z test_ops.py::TestTagsCUDA::test_tags__refs_roll_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049689Z test_ops.py::TestTagsCUDA::test_tags__refs_rot90_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3049854Z test_ops.py::TestTagsCUDA::test_tags__refs_sgn_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050017Z test_ops.py::TestTagsCUDA::test_tags__refs_sign_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050188Z test_ops.py::TestTagsCUDA::test_tags__refs_signbit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050341Z test_ops.py::TestTagsCUDA::test_tags__refs_sin_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050526Z test_ops.py::TestTagsCUDA::test_tags__refs_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050704Z test_ops.py::TestTagsCUDA::test_tags__refs_special_entr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3050882Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i0e_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051056Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i1_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051228Z test_ops.py::TestTagsCUDA::test_tags__refs_special_i1e_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051425Z test_ops.py::TestTagsCUDA::test_tags__refs_special_log_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051605Z test_ops.py::TestTagsCUDA::test_tags__refs_special_logit_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051783Z test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3051957Z test_ops.py::TestTagsCUDA::test_tags__refs_special_ndtri_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052176Z test_ops.py::TestTagsCUDA::test_tags__refs_special_softmax_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 
2023-01-11T23:13:47.3052373Z test_ops.py::TestTagsCUDA::test_tags__refs_special_spherical_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052549Z test_ops.py::TestTagsCUDA::test_tags__refs_special_zeta_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052719Z test_ops.py::TestTagsCUDA::test_tags__refs_std_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3052883Z test_ops.py::TestTagsCUDA::test_tags__refs_sub_cuda_float32 SKIPPED (Only runs on cpu) [ 97%] 2023-01-11T23:13:47.3053056Z test_ops.py::TestTagsCUDA::test_tags__refs_sum_to_size_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053220Z test_ops.py::TestTagsCUDA::test_tags__refs_tan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053383Z test_ops.py::TestTagsCUDA::test_tags__refs_tanh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053551Z test_ops.py::TestTagsCUDA::test_tags__refs_tensor_split_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053716Z test_ops.py::TestTagsCUDA::test_tags__refs_to_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3053891Z test_ops.py::TestTagsCUDA::test_tags__refs_tril_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054061Z test_ops.py::TestTagsCUDA::test_tags__refs_triu_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054233Z test_ops.py::TestTagsCUDA::test_tags__refs_true_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054404Z test_ops.py::TestTagsCUDA::test_tags__refs_unbind_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054703Z test_ops.py::TestTagsCUDA::test_tags__refs_unflatten_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3054931Z test_ops.py::TestTagsCUDA::test_tags__refs_unfold_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055104Z test_ops.py::TestTagsCUDA::test_tags__refs_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055272Z test_ops.py::TestTagsCUDA::test_tags__refs_unsqueeze_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055443Z test_ops.py::TestTagsCUDA::test_tags__refs_var_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055615Z test_ops.py::TestTagsCUDA::test_tags__refs_view_as_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055783Z test_ops.py::TestTagsCUDA::test_tags__refs_view_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3055954Z test_ops.py::TestTagsCUDA::test_tags__refs_vsplit_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056123Z test_ops.py::TestTagsCUDA::test_tags__refs_vstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056312Z test_ops.py::TestTagsCUDA::test_tags__softmax_backward_data_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056476Z test_ops.py::TestTagsCUDA::test_tags_abs_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056642Z test_ops.py::TestTagsCUDA::test_tags_acosh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056801Z test_ops.py::TestTagsCUDA::test_tags_addbmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3056968Z test_ops.py::TestTagsCUDA::test_tags_addcmul_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057131Z test_ops.py::TestTagsCUDA::test_tags_addmv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3057295Z test_ops.py::TestTagsCUDA::test_tags_addr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057460Z test_ops.py::TestTagsCUDA::test_tags_allclose_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057629Z test_ops.py::TestTagsCUDA::test_tags_aminmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057792Z test_ops.py::TestTagsCUDA::test_tags_angle_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3057993Z test_ops.py::TestTagsCUDA::test_tags_arange_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058150Z test_ops.py::TestTagsCUDA::test_tags_argmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058310Z test_ops.py::TestTagsCUDA::test_tags_asin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058466Z test_ops.py::TestTagsCUDA::test_tags_asinh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058626Z test_ops.py::TestTagsCUDA::test_tags_atan2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058787Z test_ops.py::TestTagsCUDA::test_tags_atan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3058954Z test_ops.py::TestTagsCUDA::test_tags_atleast_2d_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059121Z test_ops.py::TestTagsCUDA::test_tags_baddbmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059292Z test_ops.py::TestTagsCUDA::test_tags_bitwise_and_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059466Z test_ops.py::TestTagsCUDA::test_tags_bitwise_left_shift_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059621Z test_ops.py::TestTagsCUDA::test_tags_bitwise_not_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059783Z test_ops.py::TestTagsCUDA::test_tags_bitwise_or_cuda_int64 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3059941Z test_ops.py::TestTagsCUDA::test_tags_bmm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060120Z test_ops.py::TestTagsCUDA::test_tags_broadcast_shapes_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060290Z test_ops.py::TestTagsCUDA::test_tags_broadcast_to_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060477Z test_ops.py::TestTagsCUDA::test_tags_byte_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060655Z test_ops.py::TestTagsCUDA::test_tags_cartesian_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060824Z test_ops.py::TestTagsCUDA::test_tags_cdouble_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3060978Z test_ops.py::TestTagsCUDA::test_tags_chalf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061137Z test_ops.py::TestTagsCUDA::test_tags_char_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061303Z test_ops.py::TestTagsCUDA::test_tags_cholesky_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061470Z test_ops.py::TestTagsCUDA::test_tags_clamp_min_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061640Z test_ops.py::TestTagsCUDA::test_tags_column_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061804Z test_ops.py::TestTagsCUDA::test_tags_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3061978Z test_ops.py::TestTagsCUDA::test_tags_conj_physical_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3062148Z test_ops.py::TestTagsCUDA::test_tags_contiguous_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062315Z test_ops.py::TestTagsCUDA::test_tags_cummax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062472Z test_ops.py::TestTagsCUDA::test_tags_cummin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062631Z test_ops.py::TestTagsCUDA::test_tags_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062796Z test_ops.py::TestTagsCUDA::test_tags_deg2rad_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3062959Z test_ops.py::TestTagsCUDA::test_tags_diagflat_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063130Z test_ops.py::TestTagsCUDA::test_tags_diagonal_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063313Z test_ops.py::TestTagsCUDA::test_tags_diagonal_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063502Z test_ops.py::TestTagsCUDA::test_tags_digamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063680Z test_ops.py::TestTagsCUDA::test_tags_div_floor_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3063859Z test_ops.py::TestTagsCUDA::test_tags_div_trunc_rounding_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064016Z test_ops.py::TestTagsCUDA::test_tags_double_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064177Z test_ops.py::TestTagsCUDA::test_tags_einsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064344Z test_ops.py::TestTagsCUDA::test_tags_empty_like_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064504Z test_ops.py::TestTagsCUDA::test_tags_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064675Z test_ops.py::TestTagsCUDA::test_tags_expand_as_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3064839Z test_ops.py::TestTagsCUDA::test_tags_fft_fft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065002Z test_ops.py::TestTagsCUDA::test_tags_fft_fft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065172Z test_ops.py::TestTagsCUDA::test_tags_fft_fftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065329Z test_ops.py::TestTagsCUDA::test_tags_fft_ifft2_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065490Z test_ops.py::TestTagsCUDA::test_tags_fft_ifft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065657Z test_ops.py::TestTagsCUDA::test_tags_fft_ifftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3065827Z test_ops.py::TestTagsCUDA::test_tags_fft_ifftshift_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066029Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066199Z test_ops.py::TestTagsCUDA::test_tags_fft_ihfftn_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066366Z test_ops.py::TestTagsCUDA::test_tags_fft_irfft_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066526Z test_ops.py::TestTagsCUDA::test_tags_flip_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066690Z test_ops.py::TestTagsCUDA::test_tags_fliplr_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3066846Z test_ops.py::TestTagsCUDA::test_tags_flipud_cuda_float32 SKIPPED (Only runs on cpu) [ 98%]
2023-01-11T23:13:47.3067018Z test_ops.py::TestTagsCUDA::test_tags_float_power_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067179Z test_ops.py::TestTagsCUDA::test_tags_floor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067349Z test_ops.py::TestTagsCUDA::test_tags_floor_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067511Z test_ops.py::TestTagsCUDA::test_tags_fmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067674Z test_ops.py::TestTagsCUDA::test_tags_frac_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3067835Z test_ops.py::TestTagsCUDA::test_tags_geqrf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068002Z test_ops.py::TestTagsCUDA::test_tags_gradient_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068155Z test_ops.py::TestTagsCUDA::test_tags_gt_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068320Z test_ops.py::TestTagsCUDA::test_tags_heaviside_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068477Z test_ops.py::TestTagsCUDA::test_tags_histc_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068638Z test_ops.py::TestTagsCUDA::test_tags_hstack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068801Z test_ops.py::TestTagsCUDA::test_tags_hypot_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3068963Z test_ops.py::TestTagsCUDA::test_tags_igamma_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069152Z test_ops.py::TestTagsCUDA::test_tags_igammac_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069320Z test_ops.py::TestTagsCUDA::test_tags_index_put_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069491Z test_ops.py::TestTagsCUDA::test_tags_index_select_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069645Z test_ops.py::TestTagsCUDA::test_tags_inner_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3069873Z test_ops.py::TestTagsCUDA::test_tags_isclose_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070040Z test_ops.py::TestTagsCUDA::test_tags_isfinite_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070198Z test_ops.py::TestTagsCUDA::test_tags_isinf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070361Z test_ops.py::TestTagsCUDA::test_tags_isnan_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070529Z test_ops.py::TestTagsCUDA::test_tags_isneginf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070693Z test_ops.py::TestTagsCUDA::test_tags_isposinf_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3070868Z test_ops.py::TestTagsCUDA::test_tags_jiterator_unary_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071019Z test_ops.py::TestTagsCUDA::test_tags_le_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071178Z test_ops.py::TestTagsCUDA::test_tags_lerp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071350Z test_ops.py::TestTagsCUDA::test_tags_linalg_cond_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071520Z test_ops.py::TestTagsCUDA::test_tags_linalg_det_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3071775Z test_ops.py::TestTagsCUDA::test_tags_linalg_det_singular_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 
2023-01-11T23:13:47.3071942Z test_ops.py::TestTagsCUDA::test_tags_linalg_eig_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072117Z test_ops.py::TestTagsCUDA::test_tags_linalg_eigvalsh_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072286Z test_ops.py::TestTagsCUDA::test_tags_linalg_inv_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072457Z test_ops.py::TestTagsCUDA::test_tags_linalg_inv_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072625Z test_ops.py::TestTagsCUDA::test_tags_linalg_ldl_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072795Z test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3072981Z test_ops.py::TestTagsCUDA::test_tags_linalg_lstsq_grad_oriented_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073151Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073328Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_factor_ex_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073505Z test_ops.py::TestTagsCUDA::test_tags_linalg_lu_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073699Z test_ops.py::TestTagsCUDA::test_tags_linalg_norm_subgradients_at_zero_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3073881Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_hermitian_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074105Z test_ops.py::TestTagsCUDA::test_tags_linalg_pinv_singular_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 98%] 2023-01-11T23:13:47.3074271Z test_ops.py::TestTagsCUDA::test_tags_linalg_slogdet_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074440Z test_ops.py::TestTagsCUDA::test_tags_linalg_svd_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074611Z test_ops.py::TestTagsCUDA::test_tags_linalg_svdvals_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074818Z test_ops.py::TestTagsCUDA::test_tags_linalg_tensorsolve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3074994Z test_ops.py::TestTagsCUDA::test_tags_linalg_vector_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075155Z test_ops.py::TestTagsCUDA::test_tags_log10_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075324Z test_ops.py::TestTagsCUDA::test_tags_logical_and_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075494Z test_ops.py::TestTagsCUDA::test_tags_logical_not_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075658Z test_ops.py::TestTagsCUDA::test_tags_logical_or_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075824Z test_ops.py::TestTagsCUDA::test_tags_logical_xor_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3075987Z test_ops.py::TestTagsCUDA::test_tags_long_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076149Z test_ops.py::TestTagsCUDA::test_tags_lt_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076309Z test_ops.py::TestTagsCUDA::test_tags_lu_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076475Z test_ops.py::TestTagsCUDA::test_tags_lu_solve_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076648Z 
test_ops.py::TestTagsCUDA::test_tags_lu_unpack_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076814Z test_ops.py::TestTagsCUDA::test_tags_masked_amin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3076988Z test_ops.py::TestTagsCUDA::test_tags_masked_argmax_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077155Z test_ops.py::TestTagsCUDA::test_tags_masked_cumprod_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077351Z test_ops.py::TestTagsCUDA::test_tags_masked_cumsum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077523Z test_ops.py::TestTagsCUDA::test_tags_masked_fill_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077697Z test_ops.py::TestTagsCUDA::test_tags_masked_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3077868Z test_ops.py::TestTagsCUDA::test_tags_masked_select_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078040Z test_ops.py::TestTagsCUDA::test_tags_masked_softmin_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078205Z test_ops.py::TestTagsCUDA::test_tags_masked_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078373Z test_ops.py::TestTagsCUDA::test_tags_masked_var_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078537Z test_ops.py::TestTagsCUDA::test_tags_matmul_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078694Z test_ops.py::TestTagsCUDA::test_tags_matrix_exp_cuda_float32 SKIPPED (Only runs on cpu) [ 98%] 2023-01-11T23:13:47.3078888Z test_ops.py::TestTagsCUDA::test_tags_max_pool2d_with_indices_backward_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079071Z test_ops.py::TestTagsCUDA::test_tags_max_reduction_no_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079255Z test_ops.py::TestTagsCUDA::test_tags_max_reduction_with_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079442Z test_ops.py::TestTagsCUDA::test_tags_median_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079653Z test_ops.py::TestTagsCUDA::test_tags_meshgrid_variadic_tensors_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3079833Z test_ops.py::TestTagsCUDA::test_tags_min_reduction_no_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080014Z test_ops.py::TestTagsCUDA::test_tags_min_reduction_with_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080179Z test_ops.py::TestTagsCUDA::test_tags_mode_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080361Z test_ops.py::TestTagsCUDA::test_tags_movedim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080523Z test_ops.py::TestTagsCUDA::test_tags_msort_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080684Z test_ops.py::TestTagsCUDA::test_tags_mul_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3080865Z test_ops.py::TestTagsCUDA::test_tags_mvlgamma_mvlgamma_p_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081035Z test_ops.py::TestTagsCUDA::test_tags_nanmedian_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081198Z test_ops.py::TestTagsCUDA::test_tags_narrow_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081382Z test_ops.py::TestTagsCUDA::test_tags_native_dropout_backward_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:13:47.3081542Z test_ops.py::TestTagsCUDA::test_tags_neg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081711Z test_ops.py::TestTagsCUDA::test_tags_new_empty_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3081882Z test_ops.py::TestTagsCUDA::test_tags_new_empty_strided_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082046Z test_ops.py::TestTagsCUDA::test_tags_new_ones_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082208Z test_ops.py::TestTagsCUDA::test_tags_new_zeros_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082413Z test_ops.py::TestTagsCUDA::test_tags_nn_functional__scaled_dot_product_attention_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082601Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3082784Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_bilinear_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083019Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_binary_cross_entropy_with_logits_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083205Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083400Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_conv_transpose1d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083578Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_dropout3d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083758Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_elu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3083967Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_with_train_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084174Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_feature_alpha_dropout_without_train_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084351Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_glu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084539Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_grid_sample_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084726Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_instance_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3084910Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_kl_div_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085093Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_layer_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085268Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_leaky_relu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085463Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_local_response_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085645Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_pool2d_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3085837Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_max_unpool1d_grad_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086042Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_mish_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086225Z 
test_ops.py::TestTagsCUDA::test_tags_nn_functional_mse_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086428Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_multilabel_soft_margin_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086620Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_circular_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086809Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pad_replicate_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3086998Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_pairwise_distance_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087187Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_poisson_nll_loss_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087369Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_relu6_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087547Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_rrelu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087726Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_selu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3087906Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088100Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softmin_with_dtype_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088281Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_softsign_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088486Z test_ops.py::TestTagsCUDA::test_tags_nn_functional_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088654Z test_ops.py::TestTagsCUDA::test_tags_nonzero_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088812Z test_ops.py::TestTagsCUDA::test_tags_norm_inf_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3088990Z test_ops.py::TestTagsCUDA::test_tags_normal_number_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089196Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_native_batch_norm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089409Z test_ops.py::TestTagsCUDA::test_tags_ops_nvprims_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089575Z test_ops.py::TestTagsCUDA::test_tags_ormqr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089737Z test_ops.py::TestTagsCUDA::test_tags_outer_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3089911Z test_ops.py::TestTagsCUDA::test_tags_pca_lowrank_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090076Z test_ops.py::TestTagsCUDA::test_tags_permute_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090256Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090439Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_2_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090617Z test_ops.py::TestTagsCUDA::test_tags_polygamma_polygamma_n_3_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090777Z test_ops.py::TestTagsCUDA::test_tags_put_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3090946Z test_ops.py::TestTagsCUDA::test_tags_rad2deg_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3091113Z test_ops.py::TestTagsCUDA::test_tags_randint_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091279Z test_ops.py::TestTagsCUDA::test_tags_randn_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091443Z test_ops.py::TestTagsCUDA::test_tags_real_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091640Z test_ops.py::TestTagsCUDA::test_tags_reciprocal_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091802Z test_ops.py::TestTagsCUDA::test_tags_remainder_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3091965Z test_ops.py::TestTagsCUDA::test_tags_renorm_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092132Z test_ops.py::TestTagsCUDA::test_tags_repeat_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092312Z test_ops.py::TestTagsCUDA::test_tags_repeat_interleave_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092483Z test_ops.py::TestTagsCUDA::test_tags_reshape_as_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092657Z test_ops.py::TestTagsCUDA::test_tags_reshape_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092823Z test_ops.py::TestTagsCUDA::test_tags_resize__cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3092990Z test_ops.py::TestTagsCUDA::test_tags_resize_as__cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093152Z test_ops.py::TestTagsCUDA::test_tags_roll_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093324Z test_ops.py::TestTagsCUDA::test_tags_round_decimals_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093493Z test_ops.py::TestTagsCUDA::test_tags_scatter_add_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093674Z test_ops.py::TestTagsCUDA::test_tags_scatter_reduce_prod_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3093857Z test_ops.py::TestTagsCUDA::test_tags_segment_reduce_offsets_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094059Z test_ops.py::TestTagsCUDA::test_tags_select_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094219Z test_ops.py::TestTagsCUDA::test_tags_short_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094385Z test_ops.py::TestTagsCUDA::test_tags_sign_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094674Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_blackman_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3094851Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_cosine_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095042Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_exponential_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095234Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_general_hamming_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095414Z test_ops.py::TestTagsCUDA::test_tags_signal_windows_hann_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095588Z test_ops.py::TestTagsCUDA::test_tags_signbit_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095752Z test_ops.py::TestTagsCUDA::test_tags_sinc_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3095916Z test_ops.py::TestTagsCUDA::test_tags_sinh_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 
2023-01-11T23:13:47.3096088Z test_ops.py::TestTagsCUDA::test_tags_slice_scatter_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096247Z test_ops.py::TestTagsCUDA::test_tags_sort_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096415Z test_ops.py::TestTagsCUDA::test_tags_special_airy_ai_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096594Z test_ops.py::TestTagsCUDA::test_tags_special_bessel_j1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3096937Z test_ops.py::TestTagsCUDA::test_tags_special_chebyshev_polynomial_w_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3097130Z test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_h_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3097361Z test_ops.py::TestTagsCUDA::test_tags_special_hermite_polynomial_he_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3097697Z test_ops.py::TestTagsCUDA::test_tags_special_legendre_polynomial_p_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3097882Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098068Z test_ops.py::TestTagsCUDA::test_tags_special_modified_bessel_k1_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098240Z test_ops.py::TestTagsCUDA::test_tags_special_ndtr_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098412Z test_ops.py::TestTagsCUDA::test_tags_special_ndtri_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098609Z test_ops.py::TestTagsCUDA::test_tags_special_polygamma_special_polygamma_n_0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3098802Z test_ops.py::TestTagsCUDA::test_tags_special_scaled_modified_bessel_k0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3099146Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_t_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3099479Z test_ops.py::TestTagsCUDA::test_tags_special_shifted_chebyshev_polynomial_u_cuda_float32 SKIPPED (Skipping - testing takes an unreasonably long time, #79528) [ 99%] 2023-01-11T23:13:47.3099666Z test_ops.py::TestTagsCUDA::test_tags_special_spherical_bessel_j0_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3099828Z test_ops.py::TestTagsCUDA::test_tags_split_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100005Z test_ops.py::TestTagsCUDA::test_tags_split_with_sizes_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100199Z test_ops.py::TestTagsCUDA::test_tags_sqrt_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100369Z test_ops.py::TestTagsCUDA::test_tags_squeeze_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100521Z test_ops.py::TestTagsCUDA::test_tags_stack_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100682Z test_ops.py::TestTagsCUDA::test_tags_std_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100840Z test_ops.py::TestTagsCUDA::test_tags_sub_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3100999Z test_ops.py::TestTagsCUDA::test_tags_sum_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101169Z 
test_ops.py::TestTagsCUDA::test_tags_svd_lowrank_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101329Z test_ops.py::TestTagsCUDA::test_tags_t_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101505Z test_ops.py::TestTagsCUDA::test_tags_take_along_dim_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101672Z test_ops.py::TestTagsCUDA::test_tags_tensordot_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101834Z test_ops.py::TestTagsCUDA::test_tags_tile_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3101997Z test_ops.py::TestTagsCUDA::test_tags_to_sparse_cuda_float32 SKIPPED (Allowed exception) [ 99%] 2023-01-11T23:13:47.3102155Z test_ops.py::TestTagsCUDA::test_tags_topk_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102317Z test_ops.py::TestTagsCUDA::test_tags_trace_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102483Z test_ops.py::TestTagsCUDA::test_tags_transpose_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102651Z test_ops.py::TestTagsCUDA::test_tags_trapezoid_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102814Z test_ops.py::TestTagsCUDA::test_tags_trapz_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3102979Z test_ops.py::TestTagsCUDA::test_tags_tril_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103174Z test_ops.py::TestTagsCUDA::test_tags_tril_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103325Z test_ops.py::TestTagsCUDA::test_tags_triu_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103486Z test_ops.py::TestTagsCUDA::test_tags_triu_indices_cuda_int64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103654Z test_ops.py::TestTagsCUDA::test_tags_true_divide_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103819Z test_ops.py::TestTagsCUDA::test_tags_unflatten_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3103990Z test_ops.py::TestTagsCUDA::test_tags_unfold_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104152Z test_ops.py::TestTagsCUDA::test_tags_unfold_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104319Z test_ops.py::TestTagsCUDA::test_tags_uniform_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104498Z test_ops.py::TestTagsCUDA::test_tags_unique_consecutive_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104662Z test_ops.py::TestTagsCUDA::test_tags_var_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104820Z test_ops.py::TestTagsCUDA::test_tags_var_mean_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3104979Z test_ops.py::TestTagsCUDA::test_tags_vdot_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105154Z test_ops.py::TestTagsCUDA::test_tags_view_as_complex_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105316Z test_ops.py::TestTagsCUDA::test_tags_view_as_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105488Z test_ops.py::TestTagsCUDA::test_tags_view_as_real_cuda_complex64 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105699Z test_ops.py::TestTagsCUDA::test_tags_view_copy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3105858Z test_ops.py::TestTagsCUDA::test_tags_view_cuda_float32 SKIPPED (Only runs on cpu) [ 99%] 2023-01-11T23:13:47.3106022Z 
test_ops.py::TestTagsCUDA::test_tags_where_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3106174Z test_ops.py::TestTagsCUDA::test_tags_xlogy_cuda_float32 SKIPPED (Only runs on cpu) [ 99%]
2023-01-11T23:13:47.3106343Z test_ops.py::TestTagsCUDA::test_tags_zeros_like_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:13:47.3106351Z 
2023-01-11T23:13:47.3106472Z =============================== warnings summary ===============================
2023-01-11T23:13:47.3106697Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:13:47.3107053Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:13:47.3107162Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:13:47.3107168Z 
2023-01-11T23:13:47.3107401Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:13:47.3107695Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml -
2023-01-11T23:13:47.3107847Z = 11898 passed, 3070 skipped, 17 deselected, 179 xfailed, 1 warning in 4799.59s (1:19:59) =
2023-01-11T23:13:47.3108027Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:13:47.3108033Z 
2023-01-11T23:13:47.3108397Z ##[endgroup]
2023-01-11T23:13:47.3108654Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_oa0bw8mk)
2023-01-11T23:13:47.3108660Z 
2023-01-11T23:13:47.3109119Z Executing ['/opt/conda/bin/python', '-bb', 'test_ops.py', '-v', '--use-pytest', '-vv', '-rfEX', '-x', '--reruns=2', '-k=_linalg_cholesky_', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:13:47.250262]
2023-01-11T23:14:02.8990976Z 
2023-01-11T23:14:02.8991624Z Expand the folded group to see the log file of test_ops
2023-01-11T23:14:02.8993583Z ##[group]PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_3nn1zy0z)
2023-01-11T23:14:02.8994311Z Test results will be stored in test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml
2023-01-11T23:14:02.8994645Z ============================= test session starts ==============================
2023-01-11T23:14:02.8995016Z platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0 -- /opt/conda/bin/python
2023-01-11T23:14:02.8995291Z cachedir: .pytest_cache
2023-01-11T23:14:02.8995718Z hypothesis profile 'pytorch_ci' -> database=None, max_examples=50, derandomize=True, suppress_health_check=[HealthCheck.too_slow]
2023-01-11T23:14:02.8996090Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini
2023-01-11T23:14:02.8996517Z plugins: hypothesis-5.35.1, flakefinder-1.1.0, rerunfailures-10.3, shard-0.1.2, xdist-3.1.0, xdoctest-1.1.0
2023-01-11T23:14:02.8996874Z collecting ... 
collected 30861 items / 30819 deselected / 42 selected 2023-01-11T23:14:02.9000780Z Running 42 items in this shard: test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_cuda, test/test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32, test/test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_ex_cuda_complex64, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_ex_cuda_complex128, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_cuda_float64, test/test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_ex_cuda_float32, 
test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32, test/test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32, test/test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32
2023-01-11T23:14:02.9004453Z 
2023-01-11T23:14:02.9004687Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 2%]
2023-01-11T23:14:02.9005171Z test_ops.py::TestCommonCUDA::test_compare_cpu_linalg_cholesky_ex_cuda_float32 SKIPPED (test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test) [ 4%]
2023-01-11T23:14:02.9005577Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_cuda PASSED [ 7%]
2023-01-11T23:14:02.9005912Z test_ops.py::TestCommonCUDA::test_dtypes_linalg_cholesky_ex_cuda PASSED [ 9%]
2023-01-11T23:14:02.9006294Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 11%]
2023-01-11T23:14:02.9006724Z test_ops.py::TestCommonCUDA::test_multiple_devices_linalg_cholesky_ex_cuda_float32 SKIPPED (fewer than 2 devices detected) [ 14%]
2023-01-11T23:14:02.9007130Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_complex64 PASSED [ 16%]
2023-01-11T23:14:02.9007513Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_cuda_float32 PASSED [ 19%]
2023-01-11T23:14:02.9007918Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_complex64 PASSED [ 21%]
2023-01-11T23:14:02.9008307Z test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_cholesky_ex_cuda_float32 PASSED [ 23%]
2023-01-11T23:14:02.9008671Z test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_cuda_float32 PASSED [ 26%]
2023-01-11T23:14:02.9009002Z test_ops.py::TestCommonCUDA::test_out_linalg_cholesky_ex_cuda_float32 PASSED [ 28%]
2023-01-11T23:14:02.9009345Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_cuda PASSED [ 30%]
2023-01-11T23:14:02.9009688Z test_ops.py::TestCommonCUDA::test_out_warning_linalg_cholesky_ex_cuda PASSED [ 33%]
2023-01-11T23:14:02.9010060Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_complex64 PASSED [ 35%]
2023-01-11T23:14:02.9010443Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_cuda_float32 PASSED [ 38%]
2023-01-11T23:14:02.9010835Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_complex64 PASSED [ 40%]
2023-01-11T23:14:02.9011231Z test_ops.py::TestCommonCUDA::test_variant_consistency_eager_linalg_cholesky_ex_cuda_float32 PASSED [ 42%]
2023-01-11T23:14:02.9011626Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_cuda_float32 PASSED [ 45%]
2023-01-11T23:14:02.9012021Z test_ops.py::TestCompositeComplianceCUDA::test_backward_linalg_cholesky_ex_cuda_float32 PASSED [ 47%]
2023-01-11T23:14:02.9012421Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_cuda_float32 PASSED [ 50%]
2023-01-11T23:14:02.9012828Z test_ops.py::TestCompositeComplianceCUDA::test_forward_ad_linalg_cholesky_ex_cuda_float32 PASSED [ 52%]
2023-01-11T23:14:02.9013220Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_cuda_float32 PASSED [ 54%]
2023-01-11T23:14:02.9013621Z test_ops.py::TestCompositeComplianceCUDA::test_operator_linalg_cholesky_ex_cuda_float32 PASSED [ 57%]
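
For context on what this shard exercises: torch.linalg.cholesky factors a Hermitian positive-definite matrix and raises on a non-PD input, while torch.linalg.cholesky_ex reports failure through an `info` tensor instead of an exception. A minimal sketch of the two APIs (illustrative only, not taken from the test suite; it falls back to CPU when no GPU is present):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 4, device=device)
a = x @ x.mT + 4 * torch.eye(4, device=device)  # symmetric positive definite

L = torch.linalg.cholesky(a)                    # lower-triangular factor
assert torch.allclose(L @ L.mT, a, atol=1e-4)   # a == L @ L^T (up to fp32 rounding)

L_ex, info = torch.linalg.cholesky_ex(a)        # no exception path:
assert info.item() == 0                         # info == 0 signals success
```
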
2023-01-11T23:14:02.9014001Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_cuda_complex64 PASSED [ 59%]
2023-01-11T23:14:02.9014364Z test_ops.py::TestMathBitsCUDA::test_conj_view_linalg_cholesky_ex_cuda_complex64 PASSED [ 61%]
2023-01-11T23:14:02.9015022Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_cuda_complex128 PASSED [ 64%]
2023-01-11T23:14:02.9015399Z test_ops.py::TestMathBitsCUDA::test_neg_conj_view_linalg_cholesky_ex_cuda_complex128 PASSED [ 66%]
2023-01-11T23:14:02.9015759Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_cuda_float64 PASSED [ 69%]
2023-01-11T23:14:02.9016112Z test_ops.py::TestMathBitsCUDA::test_neg_view_linalg_cholesky_ex_cuda_float64 PASSED [ 71%]
2023-01-11T23:14:02.9016468Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_cuda_float32 PASSED [ 73%]
2023-01-11T23:14:02.9016840Z test_ops.py::TestFakeTensorCUDA::test_fake_autocast_linalg_cholesky_ex_cuda_float32 PASSED [ 76%]
2023-01-11T23:14:02.9017226Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_cuda_float32 PASSED [ 78%]
2023-01-11T23:14:02.9017620Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_amp_linalg_cholesky_ex_cuda_float32 PASSED [ 80%]
2023-01-11T23:14:02.9018024Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_cuda_float32 PASSED [ 83%]
2023-01-11T23:14:02.9018422Z test_ops.py::TestFakeTensorCUDA::test_fake_crossref_backward_no_amp_linalg_cholesky_ex_cuda_float32 PASSED [ 85%]
2023-01-11T23:14:02.9018799Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_cuda_float32 PASSED [ 88%]
2023-01-11T23:14:02.9019144Z test_ops.py::TestFakeTensorCUDA::test_fake_linalg_cholesky_ex_cuda_float32 PASSED [ 90%]
2023-01-11T23:14:02.9019509Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_cuda_float32 PASSED [ 92%]
2023-01-11T23:14:02.9019878Z test_ops.py::TestFakeTensorCUDA::test_pointwise_ops_linalg_cholesky_ex_cuda_float32 PASSED [ 95%]
2023-01-11T23:14:02.9020300Z test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_cuda_float32 SKIPPED (Only runs on cpu) [ 97%]
2023-01-11T23:14:02.9020670Z test_ops.py::TestTagsCUDA::test_tags_linalg_cholesky_ex_cuda_float32 SKIPPED (Only runs on cpu) [100%]
2023-01-11T23:14:02.9020873Z 
2023-01-11T23:14:02.9020999Z =============================== warnings summary ===============================
2023-01-11T23:14:02.9021379Z ../../../../../opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171
2023-01-11T23:14:02.9021913Z /opt/conda/lib/python3.10/site-packages/_pytest/config/__init__.py:1171: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: hypothesis
2023-01-11T23:14:02.9022275Z self._mark_plugins_for_rewrite(hook)
2023-01-11T23:14:02.9022416Z 
2023-01-11T23:14:02.9022649Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
2023-01-11T23:14:02.9023128Z - generated xml file: /var/lib/jenkins/workspace/test/test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml -
2023-01-11T23:14:02.9023494Z ========== 36 passed, 6 skipped, 30819 deselected, 1 warning in 9.16s ==========
2023-01-11T23:14:02.9023871Z If in CI, skip info is located in the xml test reports, please either go to s3 or the hud to download them
2023-01-11T23:14:02.9024086Z 
2023-01-11T23:14:02.9024327Z ##[endgroup]
2023-01-11T23:14:02.9024686Z FINISHED PRINTING LOG FILE of test_ops (/var/lib/jenkins/workspace/test/test-reports/test_ops_3nn1zy0z)
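
The 42-item selection in the session above comes from pytest keyword filtering (`-k=_linalg_cholesky_`) combined with the rerunfailures plugin (`--reruns=2`), per the `Executing [...]` line that launched it. A rough local equivalent is sketched below; note it invokes pytest directly rather than going through PyTorch's own test harness, so the `--use-pytest`/`--import-*-tests` plumbing is omitted, and `cwd` is assumed to be a PyTorch checkout:

```python
import subprocess
import sys

cmd = [
    sys.executable, "-bb",      # -bb: raise on implicit bytes/str comparisons
    "-m", "pytest", "test_ops.py",
    "-v", "-rfEX",              # verbose; summarize failures/errors/xfail/xpass
    "-x",                       # stop at the first failure
    "--reruns=2",               # pytest-rerunfailures: retry flaky tests twice
    "-k", "_linalg_cholesky_",  # keyword-select only the cholesky tests
]
subprocess.run(cmd, cwd="test", check=True)  # raises if the run fails
```
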
2023-01-11T23:14:02.9024894Z 2023-01-11T23:14:02.9025057Z Running test_prims ... [2023-01-11 23:14:02.899032] 2023-01-11T23:14:02.9025514Z Executing ['/opt/conda/bin/python', '-bb', 'test_prims.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:14:02.899225] 2023-01-11T23:14:49.4054788Z 2023-01-11T23:14:49.4055243Z Expand the folded group to see the log file of test_prims 2023-01-11T23:14:49.4056183Z ##[group]PRINTING LOG FILE of test_prims (/var/lib/jenkins/workspace/test/test-reports/test_prims_f_kgo3dr) 2023-01-11T23:14:49.4059068Z 2023-01-11T23:14:49.4059382Z Running tests... 2023-01-11T23:14:49.4059927Z ---------------------------------------------------------------------- 2023-01-11T23:14:49.4060409Z Test results will be stored in test-reports/python-unittest/test_prims 2023-01-11T23:14:49.4060947Z test_decomposition_method_vararg_ones_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.019s) 2023-01-11T23:14:49.4061320Z test_decomposition_method_vararg_permute_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.015s) 2023-01-11T23:14:49.4061699Z test_decomposition_type_promotion_nvprim_amp_cuda_float16 (__main__.TestDecompCUDA) ... ok (1.316s) 2023-01-11T23:14:49.4064281Z test_decomposition_type_promotion_nvprim_amp_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.679s) 2023-01-11T23:14:49.4064829Z test_masked_fill_decomposition_under_nvprim_context_cuda_float16 (__main__.TestDecompCUDA) ... ok (0.118s) 2023-01-11T23:14:49.4065369Z test_masked_fill_decomposition_under_nvprim_context_cuda_float32 (__main__.TestDecompCUDA) ... ok (0.120s) 2023-01-11T23:14:49.4065709Z test_mul_complex (__main__.TestPrimsBasic) ... ok (0.001s) 2023-01-11T23:14:49.4066088Z test_torch_ops (__main__.TestPrimsBasic) ... ok (0.002s) 2023-01-11T23:14:49.4066486Z test_aten_overload_to_prims_cuda (__main__.TestPrimsCUDA) ... ok (0.038s) 2023-01-11T23:14:49.4066931Z test_batch_norm_backward_nvprims_cuda_float16 (__main__.TestPrimsCUDA) ... ok (3.815s) 2023-01-11T23:14:49.4067385Z test_batch_norm_backward_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (3.585s) 2023-01-11T23:14:49.4067804Z test_broadcast_in_dim_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.927s) 2023-01-11T23:14:49.4068177Z test_broadcast_in_dim_sum_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.258s) 2023-01-11T23:14:49.4068486Z test_cbrt_prim_cuda_float64 (__main__.TestPrimsCUDA) ... ok (0.010s) 2023-01-11T23:14:49.4068777Z test_cbrt_prim_cuda_int64 (__main__.TestPrimsCUDA) ... ok (0.009s) 2023-01-11T23:14:49.4069075Z test_cpu_tensor_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.434s) 2023-01-11T23:14:49.4069501Z test_cpu_tensor_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.365s) 2023-01-11T23:14:49.4069818Z test_cudnn_batch_norm_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (6.611s) 2023-01-11T23:14:49.4070232Z test_cudnn_batch_norm_nvprims_cuda_float64 (__main__.TestPrimsCUDA) ... ok (6.536s) 2023-01-11T23:14:49.4070567Z test_full_cuda_float32 (__main__.TestPrimsCUDA) ... ok (2.199s) 2023-01-11T23:14:49.4070871Z test_memory_format_strides_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.020s) 2023-01-11T23:14:49.4071190Z test_native_batch_norm_nvprims_cuda_float32 (__main__.TestPrimsCUDA) ... ok (5.339s) 2023-01-11T23:14:49.4071528Z test_native_batch_norm_nvprims_cuda_float64 (__main__.TestPrimsCUDA) ... ok (5.271s) 2023-01-11T23:14:49.4071849Z test_nvfuser_capability_context_cuda (__main__.TestPrimsCUDA) ... 
ok (0.147s) 2023-01-11T23:14:49.4072167Z test_nvfuser_constant_tensors_cuda (__main__.TestPrimsCUDA) ... ok (0.057s) 2023-01-11T23:14:49.4072929Z test_nvfuser_empty_fusion_cuda (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/nvfuser_executor.py:414: RuntimeWarning: No partition found for the graph. This is likely because the graph is not supported by nvFuser. Please use the eager ATen mode to execute the graph. 2023-01-11T23:14:49.4073427Z warn( 2023-01-11T23:14:49.4073598Z ok (0.016s) 2023-01-11T23:14:49.4073868Z test_nvfuser_executor_cached_noncontiguous_cuda (__main__.TestPrimsCUDA) ... ok (0.450s) 2023-01-11T23:14:49.4074198Z test_nvfuser_executor_parameters_cuda (__main__.TestPrimsCUDA) ... ok (0.199s) 2023-01-11T23:14:49.4074523Z test_nvfuser_executor_partitioned_cuda (__main__.TestPrimsCUDA) ... ok (0.548s) 2023-01-11T23:14:49.4077380Z test_nvfuser_executor_partitioned_no_partitions_error_cuda (__main__.TestPrimsCUDA) ... ok (0.048s) 2023-01-11T23:14:49.4077776Z test_nvfuser_impl_is_used_cuda (__main__.TestPrimsCUDA) ... ok (0.001s) 2023-01-11T23:14:49.4078345Z test_nvfuser_no_args_cuda (__main__.TestPrimsCUDA) ... ok (0.068s) 2023-01-11T23:14:49.4078796Z test_nvfuser_rand_like_fusion_cuda (__main__.TestPrimsCUDA) ... ok (0.208s) 2023-01-11T23:14:49.4079910Z test_nvprim_convert_element_type_cuda_float16 (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py:238: UserWarning: get_isolated_graphmodule failed on decomposition: empty_like with error message: Unexpected type when checking for same shape, ! 2023-01-11T23:14:49.4080582Z warn( 2023-01-11T23:14:49.4081126Z /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py:238: UserWarning: get_isolated_graphmodule failed on decomposition: fill_scalar with error message: full_like(): argument 'fill_value' (position 2) must be Number, not Proxy 2023-01-11T23:14:49.4081519Z warn( 2023-01-11T23:14:49.4081688Z ok (0.251s) 2023-01-11T23:14:49.4081950Z test_nvprim_convert_element_type_cuda_uint8 (__main__.TestPrimsCUDA) ... ok (0.251s) 2023-01-11T23:14:49.4082252Z test_nvprims_cuda (__main__.TestPrimsCUDA) ... ok (0.012s) 2023-01-11T23:14:49.4082539Z test_nvprims_view_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.329s) 2023-01-11T23:14:49.4082837Z test_nvprims_view_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.330s) 2023-01-11T23:14:49.4083548Z test_nvprims_view_partitioner_cuda_float16 (__main__.TestPrimsCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/_prims/nvfuser_executor.py:414: RuntimeWarning: No partition found for the graph. This is likely because the graph is not supported by nvFuser. Please use the eager ATen mode to execute the graph. 2023-01-11T23:14:49.4083993Z warn( 2023-01-11T23:14:49.4084162Z ok (0.084s) 2023-01-11T23:14:49.4084417Z test_nvprims_view_partitioner_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.073s) 2023-01-11T23:14:49.4084732Z test_partitioner_tuple_output_cuda (__main__.TestPrimsCUDA) ... ok (0.038s) 2023-01-11T23:14:49.4085047Z test_pytree_input_output_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.258s) 2023-01-11T23:14:49.4085406Z test_reshape_view_method_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.002s) 2023-01-11T23:14:49.4085738Z test_silu_backward_no_filled_tensor_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.829s) 2023-01-11T23:14:49.4086068Z test_skip_ops_nvfuser_capability_mode_cuda (__main__.TestPrimsCUDA) ... 
ok (0.022s)
2023-01-11T23:14:49.4086388Z test_skip_ops_nvfuser_prims_mode_cuda (__main__.TestPrimsCUDA) ... ok (0.011s)
2023-01-11T23:14:49.4086696Z test_var_correction_0_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.273s)
2023-01-11T23:14:49.4087012Z test_var_correction_1_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.273s)
2023-01-11T23:14:49.4087371Z test_var_mean_correction_0_keepdim_False_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.055s)
2023-01-11T23:14:49.4087710Z test_var_mean_correction_0_keepdim_False_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.040s)
2023-01-11T23:14:49.4088055Z test_var_mean_correction_0_keepdim_True_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.065s)
2023-01-11T23:14:49.4088387Z test_var_mean_correction_0_keepdim_True_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.049s)
2023-01-11T23:14:49.4088721Z test_var_mean_correction_1_keepdim_False_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.054s)
2023-01-11T23:14:49.4089063Z test_var_mean_correction_1_keepdim_False_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.039s)
2023-01-11T23:14:49.4089391Z test_var_mean_correction_1_keepdim_True_cuda_float16 (__main__.TestPrimsCUDA) ... ok (0.064s)
2023-01-11T23:14:49.4089724Z test_var_mean_correction_1_keepdim_True_cuda_float32 (__main__.TestPrimsCUDA) ... ok (0.049s)
2023-01-11T23:14:49.4090051Z test_constant_pad_nd_memory_format_cuda_float32 (__main__.TestRefsCUDA) ... ok (0.003s)
2023-01-11T23:14:49.4090236Z 
2023-01-11T23:14:49.4090443Z ----------------------------------------------------------------------
2023-01-11T23:14:49.4090690Z Ran 57 tests in 42.887s
2023-01-11T23:14:49.4090813Z 
2023-01-11T23:14:49.4090879Z OK
2023-01-11T23:14:49.4090982Z 
2023-01-11T23:14:49.4091071Z Generating XML reports...
2023-01-11T23:14:49.4091469Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestDecompCUDA-20230111231405.xml
2023-01-11T23:14:49.4092007Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestPrimsBasic-20230111231405.xml
2023-01-11T23:14:49.4092495Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestPrimsCUDA-20230111231405.xml
2023-01-11T23:14:49.4092976Z Generated XML report: test-reports/python-unittest/test_prims/TEST-TestRefsCUDA-20230111231405.xml
2023-01-11T23:14:49.4093193Z 
2023-01-11T23:14:49.4093447Z ##[endgroup]
2023-01-11T23:14:49.4093812Z FINISHED PRINTING LOG FILE of test_prims (/var/lib/jenkins/workspace/test/test-reports/test_prims_f_kgo3dr)
2023-01-11T23:14:49.4094023Z 
2023-01-11T23:14:49.4094205Z Running test_tensor_creation_ops ... [2023-01-11 23:14:49.402429]
2023-01-11T23:14:49.4094905Z Executing ['/opt/conda/bin/python', '-bb', 'test_tensor_creation_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2023-01-11 23:14:49.402619]
2023-01-11T23:16:18.3492561Z 
2023-01-11T23:16:18.3492786Z Expand the folded group to see the log file of test_tensor_creation_ops
2023-01-11T23:16:18.3496061Z ##[group]PRINTING LOG FILE of test_tensor_creation_ops (/var/lib/jenkins/workspace/test/test-reports/test_tensor_creation_ops_hsc_5c7d)
2023-01-11T23:16:18.3496399Z 
2023-01-11T23:16:18.3496527Z Running tests...
2023-01-11T23:16:18.3496996Z ----------------------------------------------------------------------
2023-01-11T23:16:18.3497473Z Test results will be stored in test-reports/python-unittest/test_tensor_creation_ops
2023-01-11T23:16:18.3497970Z test_alias_from_buffer_cuda_bool (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3499458Z test_alias_from_buffer_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3499997Z test_alias_from_buffer_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3500687Z test_alias_from_buffer_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501059Z test_alias_from_buffer_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501429Z test_alias_from_buffer_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3501821Z test_alias_from_buffer_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502171Z test_alias_from_buffer_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502516Z test_alias_from_buffer_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3502849Z test_alias_from_buffer_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3536871Z test_alias_from_buffer_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3537365Z test_alias_from_dlpack_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.006s) 2023-01-11T23:16:18.3537806Z test_alias_from_dlpack_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3538149Z test_alias_from_dlpack_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3538465Z test_alias_from_dlpack_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3538796Z test_alias_from_dlpack_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3539101Z test_alias_from_dlpack_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3542702Z test_alias_from_dlpack_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3543934Z test_alias_from_dlpack_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3544324Z test_alias_from_dlpack_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3544688Z test_alias_from_dlpack_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3545032Z test_alias_from_dlpack_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3545580Z test_alias_from_numpy_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3545954Z test_alias_from_numpy_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3546326Z test_alias_from_numpy_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3546693Z test_alias_from_numpy_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547044Z test_alias_from_numpy_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547448Z test_alias_from_numpy_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3547822Z test_alias_from_numpy_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548175Z test_alias_from_numpy_cuda_int32 (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548525Z test_alias_from_numpy_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3548873Z test_alias_from_numpy_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3549216Z test_alias_from_numpy_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3549554Z test_alias_from_tensor_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3549867Z test_alias_from_tensor_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3550192Z test_alias_from_tensor_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3550643Z test_alias_from_tensor_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551056Z test_alias_from_tensor_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551362Z test_alias_from_tensor_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3551670Z test_alias_from_tensor_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3552031Z test_alias_from_tensor_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3552335Z test_alias_from_tensor_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3552641Z test_alias_from_tensor_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3552948Z test_alias_from_tensor_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3553275Z test_alias_from_tensor_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3553636Z test_astensor_consistency_cuda (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3554114Z test_copy_from_buffer_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3554611Z test_copy_from_buffer_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3555074Z test_copy_from_buffer_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3555561Z test_copy_from_buffer_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556033Z test_copy_from_buffer_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556489Z test_copy_from_buffer_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3556872Z test_copy_from_buffer_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557225Z test_copy_from_buffer_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557588Z test_copy_from_buffer_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3557934Z test_copy_from_buffer_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3558338Z test_copy_from_buffer_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3558682Z test_copy_from_dlpack_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3559220Z test_copy_from_dlpack_cuda_complex128 (__main__.TestAsArrayCUDA) ... 
/var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:3701: UserWarning: Casting complex values to real discards the imaginary part (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/Copy.cpp:276.) 2023-01-11T23:16:18.3559765Z result = torch.asarray(cvt(original), **kwargs) 2023-01-11T23:16:18.3560006Z ok (0.010s) 2023-01-11T23:16:18.3560272Z test_copy_from_dlpack_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3560612Z test_copy_from_dlpack_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3560995Z test_copy_from_dlpack_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3561404Z test_copy_from_dlpack_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3561816Z test_copy_from_dlpack_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3562245Z test_copy_from_dlpack_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3562656Z test_copy_from_dlpack_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563044Z test_copy_from_dlpack_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563420Z test_copy_from_dlpack_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3563789Z test_copy_from_dlpack_mult_devices_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3564251Z test_copy_from_dlpack_mult_devices_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3564663Z test_copy_from_dlpack_mult_devices_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565065Z test_copy_from_dlpack_mult_devices_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565466Z test_copy_from_dlpack_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3565863Z test_copy_from_dlpack_mult_devices_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3566260Z test_copy_from_dlpack_mult_devices_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3566655Z test_copy_from_dlpack_mult_devices_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567040Z test_copy_from_dlpack_mult_devices_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567420Z test_copy_from_dlpack_mult_devices_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3567809Z test_copy_from_dlpack_mult_devices_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3568176Z test_copy_from_numpy_cuda_bool (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3568526Z test_copy_from_numpy_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3568880Z test_copy_from_numpy_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569237Z test_copy_from_numpy_cuda_float16 (__main__.TestAsArrayCUDA) ... 
skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569590Z test_copy_from_numpy_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3569938Z test_copy_from_numpy_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570312Z test_copy_from_numpy_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570655Z test_copy_from_numpy_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3570996Z test_copy_from_numpy_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3571332Z test_copy_from_numpy_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3571684Z test_copy_from_numpy_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3572094Z test_copy_from_tensor_mult_devices_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3572505Z test_copy_from_tensor_mult_devices_cuda_complex128 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3572909Z test_copy_from_tensor_mult_devices_cuda_complex64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3573309Z test_copy_from_tensor_mult_devices_cuda_float16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3573708Z test_copy_from_tensor_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3574102Z test_copy_from_tensor_mult_devices_cuda_float64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3574684Z test_copy_from_tensor_mult_devices_cuda_int16 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3575158Z test_copy_from_tensor_mult_devices_cuda_int32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3575753Z test_copy_from_tensor_mult_devices_cuda_int64 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576146Z test_copy_from_tensor_mult_devices_cuda_int8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576533Z test_copy_from_tensor_mult_devices_cuda_uint8 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.000s) 2023-01-11T23:16:18.3576880Z test_copy_list_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577186Z test_copy_list_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577495Z test_copy_list_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3577802Z test_copy_list_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578117Z test_copy_list_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578425Z test_copy_list_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3578722Z test_copy_list_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579028Z test_copy_list_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579333Z test_copy_list_cuda_int32 (__main__.TestAsArrayCUDA) ... 
ok (0.002s) 2023-01-11T23:16:18.3579625Z test_copy_list_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3579927Z test_copy_list_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3580227Z test_copy_list_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3580532Z test_copy_tensor_cuda_bfloat16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3580835Z test_copy_tensor_cuda_bool (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3581148Z test_copy_tensor_cuda_complex128 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3581472Z test_copy_tensor_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3581779Z test_copy_tensor_cuda_float16 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582130Z test_copy_tensor_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582442Z test_copy_tensor_cuda_float64 (__main__.TestAsArrayCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3582749Z test_copy_tensor_cuda_int16 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583050Z test_copy_tensor_cuda_int32 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583352Z test_copy_tensor_cuda_int64 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583659Z test_copy_tensor_cuda_int8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3583954Z test_copy_tensor_cuda_uint8 (__main__.TestAsArrayCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3584285Z test_retain_autograd_history_cuda_complex64 (__main__.TestAsArrayCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3584627Z test_retain_autograd_history_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3584969Z test_unsupported_alias_cuda_float32 (__main__.TestAsArrayCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3585340Z test_unsupported_alias_mult_devices_cuda_float32 (__main__.TestAsArrayCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3585735Z test_empty_like_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3586100Z test_full_like_inference_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3586434Z test_ones_like_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3586819Z test_ones_like_multiple_device_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3587195Z test_zeros_like_cuda (__main__.TestLikeTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3587613Z test_zeros_like_multiple_device_cuda (__main__.TestLikeTensorCreationCUDA) ... skip: fewer than 2 devices detected (0.001s) 2023-01-11T23:16:18.3587995Z test_normal_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3588347Z test_normal_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3588701Z test_normal_std_error_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3589059Z test_rand_cuda_complex128 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3589932Z test_rand_cuda_complex32 (__main__.TestRandomTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:3349: UserWarning: ComplexHalf support is experimental and many operators don't support it yet. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/EmptyTensor.cpp:32.) 
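
The TestAsArrayCUDA pattern above is worth decoding: the numpy- and buffer-backed cases (both `alias_from_*` and `copy_from_*`) are skipped with "Only runs on cpu" because those sources live in host memory, while the dlpack-, tensor-, and list-backed cases run on the GPU. A minimal sketch of the aliasing-versus-copying semantics of torch.asarray that these tests probe (illustrative, not the test code):

```python
import numpy as np
import torch

arr = np.arange(4, dtype=np.float32)

alias = torch.asarray(arr)              # CPU: shares memory with `arr`
arr[0] = 42.0
assert alias[0].item() == 42.0          # mutation is visible through the alias

copied = torch.asarray(arr, copy=True)  # always a fresh buffer
arr[0] = 0.0
assert copied[0].item() == 42.0         # unaffected by later mutation

if torch.cuda.is_available():
    # Host memory cannot be aliased by a CUDA tensor, so asarray must copy --
    # which is why the alias-from-numpy/buffer tests only run on CPU.
    on_gpu = torch.asarray(arr, device="cuda")
    assert on_gpu.device.type == "cuda"
```
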
2023-01-11T23:16:18.3590554Z res1 = torch.rand(size, size, dtype=dtype, device=device) 2023-01-11T23:16:18.3590802Z ok (0.001s) 2023-01-11T23:16:18.3591064Z test_rand_cuda_complex64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3591417Z test_rand_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3591759Z test_rand_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3592170Z test_randint_cuda (__main__.TestRandomTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3592551Z test_randint_inference_cuda (__main__.TestRandomTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3592921Z test_randn_cuda_bfloat16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593271Z test_randn_cuda_complex128 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593623Z test_randn_cuda_complex32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3593970Z test_randn_cuda_complex64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3594315Z test_randn_cuda_float16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3594692Z test_randn_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595028Z test_randn_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595376Z test_random_neg_values_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3595723Z test_randperm_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.044s) 2023-01-11T23:16:18.3596091Z test_randperm_device_compatibility_cuda (__main__.TestRandomTensorCreationCUDA) ... ok (0.167s) 2023-01-11T23:16:18.3596458Z test_uniform_from_to_cuda_bfloat16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.017s) 2023-01-11T23:16:18.3596819Z test_uniform_from_to_cuda_float16 (__main__.TestRandomTensorCreationCUDA) ... ok (0.018s) 2023-01-11T23:16:18.3597181Z test_uniform_from_to_cuda_float32 (__main__.TestRandomTensorCreationCUDA) ... ok (0.058s) 2023-01-11T23:16:18.3597530Z test_uniform_from_to_cuda_float64 (__main__.TestRandomTensorCreationCUDA) ... ok (0.112s) 2023-01-11T23:16:18.3597867Z test_arange_bfloat16_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3598186Z test_arange_cuda (__main__.TestTensorCreationCUDA) ... ok (0.063s) 2023-01-11T23:16:18.3598520Z test_arange_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3598865Z test_arange_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599213Z test_arange_device_vs_cpu_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599563Z test_arange_device_vs_cpu_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3599920Z test_arange_inference_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3600312Z test_as_strided_neg_cuda (__main__.TestTensorCreationCUDA) ... ok (0.012s) 2023-01-11T23:16:18.3600653Z test_as_tensor_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3600994Z test_block_diag_cuda (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3601313Z test_block_diag_scipy_cuda (__main__.TestTensorCreationCUDA) ... 
ok (0.031s) 2023-01-11T23:16:18.3601643Z test_cartesian_prod_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3602017Z test_cat2_cuda_float16 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3602386Z test_cat2_cuda_float64 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3602737Z test_cat2_cuda_int32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3603088Z test_cat_all_dtypes_and_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3603488Z test_cat_big_cuda (__main__.TestTensorCreationCUDA) ... skip: test is slow; run with PYTORCH_TEST_WITH_SLOW to enable test (0.001s) 2023-01-11T23:16:18.3603845Z test_cat_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3604154Z test_cat_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3604477Z test_cat_empty_legacy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3604807Z test_cat_in_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3605141Z test_cat_mem_overlap_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3605475Z test_cat_out_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3605797Z test_cat_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3606114Z test_cat_out_memory_format_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3606463Z test_cat_preserve_channels_last_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3606810Z test_cat_stack_cross_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3607167Z test_combinations_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3607556Z test_complex_type_conversions_cuda (__main__.TestTensorCreationCUDA) ... skip: real and imag not implemented for complex (0.001s) 2023-01-11T23:16:18.3607972Z test_constructor_device_legacy_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3608358Z test_constructor_dtypes_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3608987Z test_ctor_with_numpy_array_cuda (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1447: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here. 2023-01-11T23:16:18.3609756Z Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations 2023-01-11T23:16:18.3610060Z np.float, 2023-01-11T23:16:18.3610538Z /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1454: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here. 
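
The DeprecationWarning above is mechanical to fix: NumPy 1.20 deprecated the `np.float`/`np.bool` aliases (removed in NumPy 1.24) in favor of the builtins or the explicit scalar types, exactly as the warning text suggests. A sketch of the substitution:

```python
import numpy as np

# Before (warns on NumPy >= 1.20, breaks on >= 1.24):
#   a = np.zeros(3, dtype=np.float)
#   b = np.ones(3, dtype=np.bool)

# After: the builtin types, or the explicit NumPy scalar types.
a = np.zeros(3, dtype=float)   # or np.float64 for the exact scalar type
b = np.ones(3, dtype=bool)     # or np.bool_
```
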
2023-01-11T23:16:18.3611201Z Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations 2023-01-11T23:16:18.3611486Z np.bool, 2023-01-11T23:16:18.3611668Z ok (0.007s) 2023-01-11T23:16:18.3611964Z test_device_rounding_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3612366Z test_device_rounding_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3612710Z test_device_rounding_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3613077Z test_diag_embed_cuda_float32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3613419Z test_diagflat_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3613740Z test_dsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3614069Z test_dsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3614394Z test_dsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.019s) 2023-01-11T23:16:18.3614836Z test_dstack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615171Z test_dstack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615504Z test_dstack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3615829Z test_dstack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.040s) 2023-01-11T23:16:18.3616146Z test_dstack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.039s) 2023-01-11T23:16:18.3616463Z test_dstack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3616782Z test_dstack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617084Z test_dstack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617406Z test_dstack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3617722Z test_dstack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3618035Z test_empty_full_cuda (__main__.TestTensorCreationCUDA) ... ok (0.052s) 2023-01-11T23:16:18.3618352Z test_empty_overflow_cuda (__main__.TestTensorCreationCUDA) ... ok (0.016s) 2023-01-11T23:16:18.3619069Z test_empty_strided_cuda (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:2445: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage() 2023-01-11T23:16:18.3619688Z as_strided = torch.empty(empty_strided.storage().size(), 2023-01-11T23:16:18.3619923Z ok (0.002s) 2023-01-11T23:16:18.3620179Z test_empty_tensor_props_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3620497Z test_eye_cuda (__main__.TestTensorCreationCUDA) ... ok (0.184s) 2023-01-11T23:16:18.3620827Z test_fill_all_dtypes_and_devices_cuda (__main__.TestTensorCreationCUDA) ... ok (0.363s) 2023-01-11T23:16:18.3621180Z test_float_to_int_conversion_finite_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3621551Z test_float_to_int_conversion_finite_cuda_int16 (__main__.TestTensorCreationCUDA) ... 
ok (0.001s) 2023-01-11T23:16:18.3621949Z test_float_to_int_conversion_finite_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3622335Z test_float_to_int_conversion_finite_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3622688Z test_float_to_int_conversion_finite_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3623054Z test_float_to_int_conversion_finite_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3623439Z test_float_to_int_conversion_nonfinite_cuda_bool (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3623849Z test_float_to_int_conversion_nonfinite_cuda_int16 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3624333Z test_float_to_int_conversion_nonfinite_cuda_int32 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3624730Z test_float_to_int_conversion_nonfinite_cuda_int64 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625138Z test_float_to_int_conversion_nonfinite_cuda_int8 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625537Z test_float_to_int_conversion_nonfinite_cuda_uint8 (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3625902Z test_full_inference_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626249Z test_full_inference_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626590Z test_full_inference_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3626907Z test_full_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3627236Z test_hsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3627564Z test_hsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3627892Z test_hsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3628228Z test_hstack_column_stack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.081s) 2023-01-11T23:16:18.3628586Z test_hstack_column_stack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.081s) 2023-01-11T23:16:18.3628935Z test_hstack_column_stack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629274Z test_hstack_column_stack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629610Z test_hstack_column_stack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.080s) 2023-01-11T23:16:18.3629956Z test_hstack_column_stack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3630300Z test_hstack_column_stack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3630730Z test_hstack_column_stack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3631126Z test_hstack_column_stack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.060s) 2023-01-11T23:16:18.3631504Z test_hstack_column_stack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.059s) 2023-01-11T23:16:18.3631851Z test_kaiser_window_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.790s) 2023-01-11T23:16:18.3632188Z test_kaiser_window_cuda_float16 (__main__.TestTensorCreationCUDA) ... 
ok (0.626s) 2023-01-11T23:16:18.3632525Z test_kaiser_window_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.624s) 2023-01-11T23:16:18.3632860Z test_kaiser_window_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.653s) 2023-01-11T23:16:18.3633184Z test_kaiser_window_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.177s) 2023-01-11T23:16:18.3633523Z test_large_linspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3633863Z test_large_linspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3634245Z test_like_fn_stride_proparation_vs_tensoriterator_unary_op_cuda (__main__.TestTensorCreationCUDA) ... ok (0.106s) 2023-01-11T23:16:18.3634627Z test_linlogspace_mem_overlap_cuda (__main__.TestTensorCreationCUDA) ... ok (0.014s) 2023-01-11T23:16:18.3634970Z test_linspace_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3635308Z test_linspace_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (12.919s) 2023-01-11T23:16:18.3635643Z test_linspace_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (12.564s) 2023-01-11T23:16:18.3635975Z test_linspace_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (12.549s) 2023-01-11T23:16:18.3636300Z test_linspace_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (12.579s) 2023-01-11T23:16:18.3636655Z test_linspace_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3636984Z test_linspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (12.714s) 2023-01-11T23:16:18.3637313Z test_linspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (12.611s) 2023-01-11T23:16:18.3637637Z test_linspace_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3637954Z test_linspace_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3638288Z test_linspace_deduction_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3638638Z test_linspace_device_vs_cpu_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3638996Z test_linspace_device_vs_cpu_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3639360Z test_linspace_device_vs_cpu_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3639719Z test_linspace_device_vs_cpu_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640078Z test_linspace_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640425Z test_linspace_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3640782Z test_linspace_special_steps_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641145Z test_linspace_special_steps_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641508Z test_linspace_special_steps_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3641866Z test_linspace_special_steps_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642224Z test_linspace_special_steps_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642575Z test_linspace_special_steps_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3642930Z test_linspace_vs_numpy_complex_cuda_complex64 (__main__.TestTensorCreationCUDA) ... 
ok (0.087s) 2023-01-11T23:16:18.3643326Z test_linspace_vs_numpy_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.076s) 2023-01-11T23:16:18.3643682Z test_linspace_vs_numpy_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.087s) 2023-01-11T23:16:18.3644033Z test_linspace_vs_numpy_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.047s) 2023-01-11T23:16:18.3644372Z test_linspace_vs_numpy_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.042s) 2023-01-11T23:16:18.3644727Z test_linspace_vs_numpy_integral_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3645081Z test_linspace_vs_numpy_integral_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3645428Z test_linspace_vs_numpy_integral_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3645793Z test_linspace_vs_numpy_integral_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646152Z test_linspace_vs_numpy_integral_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646501Z test_logspace_base2_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3646838Z test_logspace_base2_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3647179Z test_logspace_base2_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3647518Z test_logspace_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3647850Z test_logspace_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3648171Z test_logspace_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.017s) 2023-01-11T23:16:18.3648493Z test_logspace_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3648856Z test_logspace_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649175Z test_logspace_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649502Z test_logspace_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3649824Z test_logspace_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3650141Z test_logspace_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.007s) 2023-01-11T23:16:18.3650471Z test_logspace_deduction_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3650818Z test_logspace_device_vs_cpu_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651174Z test_logspace_device_vs_cpu_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651522Z test_logspace_device_vs_cpu_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3651883Z test_logspace_special_steps_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652236Z test_logspace_special_steps_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652589Z test_logspace_special_steps_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3652941Z test_logspace_vs_numpy_complex_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.090s) 2023-01-11T23:16:18.3653300Z test_logspace_vs_numpy_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.216s) 2023-01-11T23:16:18.3653647Z test_logspace_vs_numpy_cuda_float64 (__main__.TestTensorCreationCUDA) ... 
ok (0.211s) 2023-01-11T23:16:18.3654447Z test_meshgrid_default_indexing_cuda (__main__.TestTensorCreationCUDA) ... /opt/conda/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/TensorShape.cpp:3452.) 2023-01-11T23:16:18.3655257Z return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined] 2023-01-11T23:16:18.3655506Z ok (0.003s) 2023-01-11T23:16:18.3655819Z test_meshgrid_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3656169Z test_meshgrid_ij_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3656557Z test_meshgrid_ij_indexing_is_default_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3656955Z test_meshgrid_inconsistent_device_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3657349Z test_meshgrid_inconsistent_dtype_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3657725Z test_meshgrid_non_1d_tensor_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3658113Z test_meshgrid_unsupported_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3658503Z test_meshgrid_vs_numpy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.013s) 2023-01-11T23:16:18.3658876Z test_meshgrid_warns_if_no_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3659264Z test_meshgrid_xy_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3659628Z test_new_empty_strided_cuda (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3660031Z test_new_methods_requires_grad_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3660431Z test_new_tensor_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3660834Z test_offset_scalar_cast_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.000s) 2023-01-11T23:16:18.3661201Z test_ones_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3661535Z test_random_bool_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3661948Z test_random_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3662333Z test_random_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3662685Z test_random_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663027Z test_random_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663377Z test_random_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3663722Z test_random_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664077Z test_random_default_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664457Z test_random_default_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3664829Z test_random_default_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3665205Z test_random_default_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3665565Z test_random_default_cuda_int16 (__main__.TestTensorCreationCUDA) ... 
ok (0.001s) 2023-01-11T23:16:18.3665936Z test_random_default_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3666305Z test_random_default_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3666663Z test_random_default_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3667029Z test_random_default_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3667390Z test_random_from_to_bool_cuda (__main__.TestTensorCreationCUDA) ... ok (0.217s) 2023-01-11T23:16:18.3667756Z test_random_from_to_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.047s) 2023-01-11T23:16:18.3668119Z test_random_from_to_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3668493Z test_random_from_to_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3668862Z test_random_from_to_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.046s) 2023-01-11T23:16:18.3669250Z test_random_from_to_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.233s) 2023-01-11T23:16:18.3669615Z test_random_from_to_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.232s) 2023-01-11T23:16:18.3669983Z test_random_from_to_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.031s) 2023-01-11T23:16:18.3670347Z test_random_from_to_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.233s) 2023-01-11T23:16:18.3670795Z test_random_from_to_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.203s) 2023-01-11T23:16:18.3671135Z test_random_full_range_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3671482Z test_random_full_range_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3671823Z test_random_full_range_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672167Z test_random_full_range_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672511Z test_random_full_range_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3672849Z test_random_full_range_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673175Z test_random_full_range_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673510Z test_random_full_range_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3673848Z test_random_full_range_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3674972Z test_random_to_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^8), 2^8]. Due to precision limitations c10::BFloat16 can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3675630Z t.random_(to_) 2023-01-11T23:16:18.3675820Z ok (0.011s) 2023-01-11T23:16:18.3676792Z test_random_to_cuda_float16 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^11), 2^11]. Due to precision limitations c10::Half can support discrete uniform distribution only within this range. 
This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3677391Z t.random_(to_) 2023-01-11T23:16:18.3677575Z ok (0.011s) 2023-01-11T23:16:18.3678457Z test_random_to_cuda_float32 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^24), 2^24]. Due to precision limitations float can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3679049Z t.random_(to_) 2023-01-11T23:16:18.3679227Z ok (0.011s) 2023-01-11T23:16:18.3680111Z test_random_to_cuda_float64 (__main__.TestTensorCreationCUDA) ... /var/lib/jenkins/workspace/test/test_tensor_creation_ops.py:1714: UserWarning: to - 1 is out of bounds [-(2^53), 2^53]. Due to precision limitations double can support discrete uniform distribution only within this range. This warning will become an error in version 1.7 release, please fix the code in advance (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/native/DistributionTemplates.h:106.) 2023-01-11T23:16:18.3680709Z t.random_(to_) 2023-01-11T23:16:18.3680893Z ok (0.011s) 2023-01-11T23:16:18.3681142Z test_random_to_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3681500Z test_random_to_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3681830Z test_random_to_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3682148Z test_random_to_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.025s) 2023-01-11T23:16:18.3682474Z test_random_to_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.021s) 2023-01-11T23:16:18.3682792Z test_range_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3683127Z test_range_factories_64bit_indexing_cuda (__main__.TestTensorCreationCUDA) ... ok (0.240s) 2023-01-11T23:16:18.3683461Z test_range_warning_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3683791Z test_repeat_interleave_cuda (__main__.TestTensorCreationCUDA) ... ok (0.020s) 2023-01-11T23:16:18.3684114Z test_roll_cuda (__main__.TestTensorCreationCUDA) ... ok (0.023s) 2023-01-11T23:16:18.3684461Z test_signal_window_functions_window_bartlett_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3684854Z test_signal_window_functions_window_bartlett_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685241Z test_signal_window_functions_window_bartlett_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685626Z test_signal_window_functions_window_bartlett_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3685999Z test_signal_window_functions_window_bartlett_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3686384Z test_signal_window_functions_window_blackman_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3686774Z test_signal_window_functions_window_blackman_cuda_float16 (__main__.TestTensorCreationCUDA) ... 
ok (0.010s) 2023-01-11T23:16:18.3687191Z test_signal_window_functions_window_blackman_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3687566Z test_signal_window_functions_window_blackman_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3687952Z test_signal_window_functions_window_blackman_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3688335Z test_signal_window_functions_window_hamming_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3688728Z test_signal_window_functions_window_hamming_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689111Z test_signal_window_functions_window_hamming_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689496Z test_signal_window_functions_window_hamming_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.010s) 2023-01-11T23:16:18.3689876Z test_signal_window_functions_window_hamming_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3690259Z test_signal_window_functions_window_hann_cuda_bfloat16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3690627Z test_signal_window_functions_window_hann_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691005Z test_signal_window_functions_window_hann_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691377Z test_signal_window_functions_window_hann_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.009s) 2023-01-11T23:16:18.3691748Z test_signal_window_functions_window_hann_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3692131Z test_simple_scalar_cast_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3692493Z test_stack_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3692850Z test_stack_out_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3693238Z test_strided_mismatched_stride_shape_cuda (__main__.TestTensorCreationCUDA) ... ok (0.022s) 2023-01-11T23:16:18.3693603Z test_tensor_ctor_device_inference_cuda (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3693942Z test_tensor_device_cuda (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3694273Z test_tensor_factories_empty_cuda (__main__.TestTensorCreationCUDA) ... ok (0.198s) 2023-01-11T23:16:18.3694761Z test_tensor_factory_copy_var_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3695138Z test_tensor_factory_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.003s) 2023-01-11T23:16:18.3695492Z test_tensor_factory_gpu_type_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3695855Z test_tensor_factory_gpu_type_inference_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3696236Z test_tensor_factory_type_inference_cuda (__main__.TestTensorCreationCUDA) ... skip: Only runs on cpu (0.002s) 2023-01-11T23:16:18.3696610Z test_tensor_from_non_writable_numpy_cuda (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3696978Z test_tensor_from_sequence_cuda (__main__.TestTensorCreationCUDA) ... 
skip: Only runs on cpu (0.001s) 2023-01-11T23:16:18.3697331Z test_torch_complex_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3697668Z test_torch_complex_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3698004Z test_torch_complex_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3698356Z test_torch_complex_floating_dtype_error_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3698788Z test_torch_complex_floating_dtype_error_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699184Z test_torch_complex_floating_dtype_error_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699567Z test_torch_complex_floating_dtype_error_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3699939Z test_torch_complex_floating_dtype_error_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3700309Z test_torch_complex_floating_dtype_error_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3700684Z test_torch_complex_floating_dtype_error_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3701059Z test_torch_complex_floating_dtype_error_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3701426Z test_torch_complex_out_dtype_error_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3701796Z test_torch_complex_out_dtype_error_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.008s) 2023-01-11T23:16:18.3702167Z test_torch_complex_same_dtype_error_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3702534Z test_torch_complex_same_dtype_error_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.011s) 2023-01-11T23:16:18.3702884Z test_torch_polar_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3703216Z test_torch_polar_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3703555Z test_unpack_double_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3703885Z test_unpack_double_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.002s) 2023-01-11T23:16:18.3704209Z test_vander_cuda (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3704534Z test_vander_types_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3704871Z test_vander_types_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705215Z test_vander_types_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705604Z test_vander_types_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3705947Z test_vander_types_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706280Z test_vander_types_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706607Z test_vander_types_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3706935Z test_vander_types_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3707261Z test_vander_types_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.005s) 2023-01-11T23:16:18.3707593Z test_vander_types_cuda_uint8 (__main__.TestTensorCreationCUDA) ... 
ok (0.005s) 2023-01-11T23:16:18.3707927Z test_vsplit_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.016s) 2023-01-11T23:16:18.3708256Z test_vsplit_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.015s) 2023-01-11T23:16:18.3708574Z test_vsplit_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.014s) 2023-01-11T23:16:18.3708911Z test_vstack_row_stack_cuda_complex128 (__main__.TestTensorCreationCUDA) ... ok (0.089s) 2023-01-11T23:16:18.3709260Z test_vstack_row_stack_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.089s) 2023-01-11T23:16:18.3709601Z test_vstack_row_stack_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3709942Z test_vstack_row_stack_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3710279Z test_vstack_row_stack_cuda_float64 (__main__.TestTensorCreationCUDA) ... ok (0.088s) 2023-01-11T23:16:18.3710697Z test_vstack_row_stack_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711084Z test_vstack_row_stack_cuda_int32 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711420Z test_vstack_row_stack_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3711758Z test_vstack_row_stack_cuda_int8 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3712090Z test_vstack_row_stack_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.057s) 2023-01-11T23:16:18.3712409Z test_zeros_cuda (__main__.TestTensorCreationCUDA) ... ok (0.003s) 2023-01-11T23:16:18.3712748Z test_zeros_dtype_layout_device_match_cuda_bool (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713125Z test_zeros_dtype_layout_device_match_cuda_complex64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713496Z test_zeros_dtype_layout_device_match_cuda_float16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3713867Z test_zeros_dtype_layout_device_match_cuda_float32 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714236Z test_zeros_dtype_layout_device_match_cuda_int16 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714602Z test_zeros_dtype_layout_device_match_cuda_int64 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3714961Z test_zeros_dtype_layout_device_match_cuda_uint8 (__main__.TestTensorCreationCUDA) ... ok (0.001s) 2023-01-11T23:16:18.3715300Z test_zeros_out_cuda (__main__.TestTensorCreationCUDA) ... ok (0.004s) 2023-01-11T23:16:18.3715476Z 2023-01-11T23:16:18.3715691Z ---------------------------------------------------------------------- 2023-01-11T23:16:18.3715947Z Ran 459 tests in 85.967s 2023-01-11T23:16:18.3716073Z 2023-01-11T23:16:18.3716156Z OK (skipped=99) 2023-01-11T23:16:18.3716273Z 2023-01-11T23:16:18.3716366Z Generating XML reports... 
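The *_cuda_<dtype> names throughout the run above are not hand-written tests: PyTorch's device/dtype parametrization expands a single test template once per enabled device and dtype. A minimal sketch of that pattern, assuming the torch.testing._internal helpers (illustrative, not taken from this log):

    import torch
    from torch.testing._internal.common_device_type import (
        dtypes,
        instantiate_device_type_tests,
    )
    from torch.testing._internal.common_utils import TestCase, run_tests

    class TestTensorCreation(TestCase):
        # One template; the framework expands it into names like
        # TestTensorCreationCUDA.test_hsplit_cuda_float32 / ..._cuda_int64.
        @dtypes(torch.float32, torch.int64)
        def test_hsplit(self, device, dtype):
            t = torch.arange(16, device=device, dtype=dtype).reshape(4, 4)
            parts = torch.hsplit(t, 2)
            self.assertEqual(len(parts), 2)

    # Generates per-device classes (CPU, CUDA, ...) into this module's globals.
    instantiate_device_type_tests(TestTensorCreation, globals())

    if __name__ == "__main__":
        run_tests()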
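The torch.meshgrid UserWarning recorded in this run fires whenever a call omits the indexing argument. A short sketch of the call pattern that triggers it and the explicit form that does not (illustrative only):

    import torch

    x = torch.arange(3)
    y = torch.arange(2)

    # Omitting indexing= emits the UserWarning seen in the log:
    # gx, gy = torch.meshgrid(x, y)

    # Passing it explicitly is silent; "ij" matches the current default behavior.
    gx, gy = torch.meshgrid(x, y, indexing="ij")
    assert gx.shape == (3, 2) and gy.shape == (3, 2)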
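The t.random_(to_) warnings above concern discrete uniform sampling bounds: for reduced-precision floating dtypes, to - 1 must lie inside the exactly representable integer range the warning quotes (2^8 for bfloat16, 2^11 for float16, 2^24 for float32, 2^53 for float64). A hedged sketch of that boundary (illustrative, not from this log):

    import torch

    t = torch.empty(8, dtype=torch.bfloat16)

    # In range: to - 1 == 255 <= 2**8, so no warning.
    t.random_(0, 2**8)

    # Out of range: to - 1 == 257 > 2**8, emits the UserWarning quoted above.
    t.random_(0, 2**8 + 2)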
2023-01-11T23:16:18.3716796Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestAsArrayCUDA-20230111231451.xml 2023-01-11T23:16:18.3717365Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestLikeTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718025Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestRandomTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718611Z Generated XML report: test-reports/python-unittest/test_tensor_creation_ops/TEST-TestTensorCreationCUDA-20230111231451.xml 2023-01-11T23:16:18.3718870Z 2023-01-11T23:16:18.3719216Z ##[endgroup] 2023-01-11T23:16:18.3719638Z FINISHED PRINTING LOG FILE of test_tensor_creation_ops (/var/lib/jenkins/workspace/test/test-reports/test_tensor_creation_ops_hsc_5c7d) 2023-01-11T23:16:18.3719880Z 2023-01-11T23:16:18.3720038Z Running doctests ... [2023-01-11 23:16:18.349520] 2023-01-11T23:16:18.3720388Z Start doctest_module('/opt/conda/lib/python3.10/site-packages/torch') 2023-01-11T23:16:18.3720634Z Listing tests 2023-01-11T23:16:23.9128805Z gathering tests 2023-01-11T23:16:23.9143112Z running 663 test(s) 2023-01-11T23:16:23.9150245Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0, line 429 <- wrt source file 2023-01-11T23:16:23.9155646Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::is_tensor:0 2023-01-11T23:16:23.9156323Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0, line 458 <- wrt source file 2023-01-11T23:16:23.9156926Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_tensor_type:0 2023-01-11T23:16:23.9157480Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0, line 496 <- wrt source file 2023-01-11T23:16:23.9159956Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::set_default_dtype:0 2023-01-11T23:16:23.9160723Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0, line 629 <- wrt source file 2023-01-11T23:16:23.9161810Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::use_deterministic_algorithms:0 2023-01-11T23:16:23.9162470Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/__init__.py::compile:0, line 1221 <- wrt source file 2023-01-11T23:16:23.9162953Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/__init__.py::compile:0 2023-01-11T23:16:23.9163590Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0, line 15 <- wrt source file 2023-01-11T23:16:23.9164647Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::Generator:0 2023-01-11T23:16:23.9165343Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0, line 5 <- wrt source file 2023-01-11T23:16:23.9166024Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_C.cpython-310-x86_64-linux-gnu.so::_LinAlgError:0 2023-01-11T23:16:23.9166641Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0, line 125 <- wrt source file 2023-01-11T23:16:23.9168738Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_namedtensor_internals.py::update_names:0 2023-01-11T23:16:23.9169321Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0, line 508 <- wrt 
source file 2023-01-11T23:16:23.9388816Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.register_hook:0 2023-01-11T23:16:23.9389556Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0, line 1096 <- wrt source file 2023-01-11T23:16:23.9498355Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.refine_names:0 2023-01-11T23:16:23.9499847Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0, line 1141 <- wrt source file 2023-01-11T23:16:23.9505071Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.align_to:0 2023-01-11T23:16:23.9505614Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0, line 1214 <- wrt source file 2023-01-11T23:16:23.9511003Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.rename:0 2023-01-11T23:16:23.9511568Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0, line 1244 <- wrt source file 2023-01-11T23:16:23.9521271Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor.py::Tensor.to_sparse_coo:0 2023-01-11T23:16:23.9521913Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0, line 49 <- wrt source file 2023-01-11T23:16:23.9548588Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_tensor_str.py::set_printoptions:0 2023-01-11T23:16:23.9549224Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0, line 61 <- wrt source file 2023-01-11T23:16:23.9554985Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_tensors:0 2023-01-11T23:16:23.9555604Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0, line 89 <- wrt source file 2023-01-11T23:16:23.9557796Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::broadcast_shapes:0 2023-01-11T23:16:23.9558403Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::split:0, line 161 <- wrt source file 2023-01-11T23:16:23.9571502Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::split:0 2023-01-11T23:16:23.9572234Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::einsum:0, line 269 <- wrt source file 2023-01-11T23:16:23.9590221Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::einsum:0 2023-01-11T23:16:23.9590889Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::meshgrid:0, line 450 <- wrt source file 2023-01-11T23:16:23.9625764Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::meshgrid:0 2023-01-11T23:16:23.9626374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0, line 764 <- wrt source file 2023-01-11T23:16:23.9640047Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_impl:0 2023-01-11T23:16:23.9640672Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0, line 842 <- wrt source file 2023-01-11T23:16:23.9651895Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_unique_consecutive_impl:0 2023-01-11T23:16:23.9652444Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::tensordot:0, line 1040 <- wrt source file 2023-01-11T23:16:23.9663179Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/functional.py::tensordot:0 2023-01-11T23:16:23.9663707Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0, line 1118 <- wrt source file 2023-01-11T23:16:23.9670660Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::cartesian_prod:0 2023-01-11T23:16:23.9671190Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::block_diag:0, line 1152 <- wrt source file 2023-01-11T23:16:23.9681500Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::block_diag:0 2023-01-11T23:16:23.9682030Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::cdist:0, line 1203 <- wrt source file 2023-01-11T23:16:23.9696106Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::cdist:0 2023-01-11T23:16:23.9696652Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0, line 1243 <- wrt source file 2023-01-11T23:16:23.9713443Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_1d:0 2023-01-11T23:16:23.9713973Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0, line 1279 <- wrt source file 2023-01-11T23:16:23.9729084Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_2d:0 2023-01-11T23:16:23.9729629Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0, line 1317 <- wrt source file 2023-01-11T23:16:23.9750875Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::atleast_3d:0 2023-01-11T23:16:23.9751408Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::norm:0, line 1455 <- wrt source file 2023-01-11T23:16:23.9785539Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/functional.py::norm:0 2023-01-11T23:16:23.9786173Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0, line 1606 <- wrt source file 2023-01-11T23:16:23.9786794Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/functional.py::chain_matmul:0 2023-01-11T23:16:23.9787348Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0, line 1706 <- wrt source file 2023-01-11T23:16:23.9789623Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/functional.py::_lu_impl:0 2023-01-11T23:16:23.9790571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::list:0, line 391 <- wrt source file 2023-01-11T23:16:23.9791222Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::list:0 2023-01-11T23:16:23.9791844Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::help:0, line 444 <- wrt source file 2023-01-11T23:16:23.9792374Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::help:0 2023-01-11T23:16:23.9793031Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::load:0, line 524 <- wrt source file 2023-01-11T23:16:23.9793685Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::load:0 2023-01-11T23:16:23.9794351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::_load_local:0, line 563 <- wrt source file 2023-01-11T23:16:23.9795007Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::_load_local:0 2023-01-11T23:16:23.9795630Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0, line 592 <- wrt source file 2023-01-11T23:16:23.9796136Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/hub.py::download_url_to_file:0 2023-01-11T23:16:23.9796658Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0, line 701 <- wrt source file 2023-01-11T23:16:23.9797237Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/hub.py::load_state_dict_from_url:0 2023-01-11T23:16:23.9797766Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.define:0, line 61 <- wrt source file 2023-01-11T23:16:23.9798342Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.define:0 2023-01-11T23:16:23.9799042Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.impl:0, line 81 <- wrt source file 2023-01-11T23:16:23.9799718Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/library.py::Library.impl:0 2023-01-11T23:16:23.9800366Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0, line 67 <- wrt source file 2023-01-11T23:16:23.9800905Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_ignored_functions:0 2023-01-11T23:16:23.9801449Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0, line 336 <- wrt source file 2023-01-11T23:16:23.9839047Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::get_testing_overrides:0 2023-01-11T23:16:23.9839690Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0, line 1391 <- wrt source file 2023-01-11T23:16:23.9842961Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::wrap_torch_function:0 2023-01-11T23:16:23.9843608Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0, line 1508 <- wrt source file 2023-01-11T23:16:23.9846540Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::handle_torch_function:0 2023-01-11T23:16:23.9847202Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0, line 1732 <- wrt source file 2023-01-11T23:16:23.9889522Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_method_or_property:0 2023-01-11T23:16:23.9890170Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0, line 1750 <- wrt source file 2023-01-11T23:16:23.9898075Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/overrides.py::is_tensor_like:0 2023-01-11T23:16:23.9898943Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0, line 37 <- wrt source file 2023-01-11T23:16:23.9899529Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/quasirandom.py::SobolEngine:0 2023-01-11T23:16:23.9900062Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/serialization.py::save:0, line 429 <- wrt source file 2023-01-11T23:16:23.9900558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/serialization.py::save:0 2023-01-11T23:16:23.9901069Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/serialization.py::load:0, line 754 <- wrt source file 2023-01-11T23:16:23.9904409Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/serialization.py::load:0 2023-01-11T23:16:23.9905157Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0, line 49 <- wrt source file 2023-01-11T23:16:23.9905758Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/torch_version.py::TorchVersion:0 2023-01-11T23:16:23.9906336Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0, line 1495 <- wrt source file 2023-01-11T23:16:23.9913350Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_prims_common/__init__.py::compute_required_storage_length:0 2023-01-11T23:16:23.9914046Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0, line 147 <- wrt source file 2023-01-11T23:16:23.9914590Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.then:0 2023-01-11T23:16:23.9915256Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0, line 195 <- wrt source file 2023-01-11T23:16:23.9915872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.add_done_callback:0 2023-01-11T23:16:23.9916618Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0, line 228 <- wrt source file 2023-01-11T23:16:23.9917299Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_result:0 2023-01-11T23:16:23.9918025Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0, line 257 <- wrt source file 2023-01-11T23:16:23.9918745Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::Future.set_exception:0 2023-01-11T23:16:23.9919423Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0, line 288 <- wrt source file 2023-01-11T23:16:23.9920132Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/futures/__init__.py::collect_all:0 2023-01-11T23:16:23.9920850Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0, line 103 <- wrt source file 2023-01-11T23:16:23.9921533Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::annotate:0 2023-01-11T23:16:23.9922209Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::strict_fusion:0, line 202 <- wrt source file 2023-01-11T23:16:23.9922879Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/__init__.py::strict_fusion:0 2023-01-11T23:16:23.9923435Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0, line 21 <- wrt source file 2023-01-11T23:16:23.9938528Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/monitor/__init__.py::TensorboardEventHandler:0 2023-01-11T23:16:23.9939279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0, line 39 <- wrt source file 2023-01-11T23:16:23.9953555Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py::as_nested_tensor:0 2023-01-11T23:16:23.9954116Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0, line 175 <- wrt source file 2023-01-11T23:16:23.9963310Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/sparse/__init__.py::sum:0 2023-01-11T23:16:23.9963867Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0, line 2139 <- wrt source file 2023-01-11T23:16:24.1259916Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py::aot_function:0 2023-01-11T23:16:24.1261101Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0, line 162 <- wrt source file 2023-01-11T23:16:24.1262265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/benchmark_utils.py::benchmark_utilization:0 2023-01-11T23:16:24.1263167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0, line 195 <- wrt source file 2023-01-11T23:16:24.1327748Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::vjp:0 2023-01-11T23:16:24.1328338Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0, line 382 <- wrt source file 2023-01-11T23:16:24.1457091Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacrev:0 2023-01-11T23:16:24.1458255Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0, line 882 <- wrt source file 2023-01-11T23:16:24.2111479Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jvp:0 2023-01-11T23:16:24.2112523Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0, line 1024 <- wrt source file 2023-01-11T23:16:24.2233333Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::jacfwd:0 2023-01-11T23:16:24.2234032Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0, line 1173 <- wrt source file 2023-01-11T23:16:24.2270227Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::hessian:0 2023-01-11T23:16:24.2271705Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::grad:0, line 1290 <- wrt source file 2023-01-11T23:16:24.2273076Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::grad:0 2023-01-11T23:16:24.2273689Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0, line 1441 <- wrt source file 2023-01-11T23:16:24.2274265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/eager_transforms.py::functionalize:0 2023-01-11T23:16:24.2274815Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0, line 72 <- wrt source file 2023-01-11T23:16:24.2275345Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_functorch/fx_minifier.py::minifier:0 2023-01-11T23:16:24.2275870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_functorch/vmap.py::vmap:0, line 306 <- wrt source file 2023-01-11T23:16:24.2317414Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/_functorch/vmap.py::vmap:0 2023-01-11T23:16:24.2318924Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::NvfuserPrimsMode:0, line 90 <- wrt source file 2023-01-11T23:16:24.2320673Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::NvfuserPrimsMode:0 2023-01-11T23:16:24.2322031Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0, line 141 <- wrt source file 2023-01-11T23:16:24.2323265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/_prims/context.py::TorchRefsMode:0 2023-01-11T23:16:24.2323833Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0, line 21 <- wrt source file 2023-01-11T23:16:24.2324410Z * 
SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/qat/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2325022Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0, line 21 <- wrt source file 2023-01-11T23:16:24.2325644Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/dynamic/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2326252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0, line 22 <- wrt source file 2023-01-11T23:16:24.2326857Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearReLU:0 2023-01-11T23:16:24.2327468Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0, line 59 <- wrt source file 2023-01-11T23:16:24.2328075Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearLeakyReLU:0 2023-01-11T23:16:24.2328687Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0, line 126 <- wrt source file 2023-01-11T23:16:24.2329338Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/intrinsic/quantized/modules/linear_relu.py::LinearTanh:0 2023-01-11T23:16:24.2329916Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0, line 24 <- wrt source file 2023-01-11T23:16:24.2347656Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTMCell:0 2023-01-11T23:16:24.2348813Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0, line 274 <- wrt source file 2023-01-11T23:16:24.2376574Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantizable/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2377641Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0, line 166 <- wrt source file 2023-01-11T23:16:24.2378354Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv1d:0 2023-01-11T23:16:24.2379134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0, line 226 <- wrt source file 2023-01-11T23:16:24.2379734Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv2d:0 2023-01-11T23:16:24.2380426Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0, line 287 <- wrt source file 2023-01-11T23:16:24.2381069Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/functional.py::conv3d:0 2023-01-11T23:16:24.2381626Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0, line 74 <- wrt source file 2023-01-11T23:16:24.2386859Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::Quantize:0 2023-01-11T23:16:24.2387466Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0, line 114 <- wrt source file 2023-01-11T23:16:24.2394299Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/__init__.py::DeQuantize:0 2023-01-11T23:16:24.2395100Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0, line 34 <- wrt source file 2023-01-11T23:16:24.2395864Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv1d:0 2023-01-11T23:16:24.2396514Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0, line 105 <- wrt source file 2023-01-11T23:16:24.2397131Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv2d:0 2023-01-11T23:16:24.2397878Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0, line 170 <- wrt source file 2023-01-11T23:16:24.2398533Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::Conv3d:0 2023-01-11T23:16:24.2399265Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0, line 236 <- wrt source file 2023-01-11T23:16:24.2399998Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose1d:0 2023-01-11T23:16:24.2400842Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0, line 297 <- wrt source file 2023-01-11T23:16:24.2401686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose2d:0 2023-01-11T23:16:24.2402609Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0, line 358 <- wrt source file 2023-01-11T23:16:24.2403455Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/conv.py::ConvTranspose3d:0 2023-01-11T23:16:24.2404279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0, line 28 <- wrt source file 2023-01-11T23:16:24.2404846Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/linear.py::Linear:0 2023-01-11T23:16:24.2405413Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0, line 391 <- wrt source file 2023-01-11T23:16:24.2405956Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2406519Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0, line 618 <- wrt source file 2023-01-11T23:16:24.2407075Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRU:0 2023-01-11T23:16:24.2407639Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0, line 940 <- wrt source file 2023-01-11T23:16:24.2408353Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::RNNCell:0 2023-01-11T23:16:24.2409113Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0, line 993 <- wrt source file 2023-01-11T23:16:24.2409864Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::LSTMCell:0 2023-01-11T23:16:24.2410559Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0, line 1036 <- wrt source file 2023-01-11T23:16:24.2411186Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/dynamic/modules/rnn.py::GRUCell:0 2023-01-11T23:16:24.2411752Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0, line 31 <- wrt source file 2023-01-11T23:16:24.2412440Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/activation.py::ReLU6:0 2023-01-11T23:16:24.2413108Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0, line 295 <- wrt source file 2023-01-11T23:16:24.2413709Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv1d:0 2023-01-11T23:16:24.2414386Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0, line 403 <- wrt source file 2023-01-11T23:16:24.2415127Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv2d:0 2023-01-11T23:16:24.2415774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0, line 502 <- wrt source file 2023-01-11T23:16:24.2416314Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::Conv3d:0 2023-01-11T23:16:24.2416874Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0, line 685 <- wrt source file 2023-01-11T23:16:24.2417452Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose1d:0 2023-01-11T23:16:24.2418038Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0, line 775 <- wrt source file 2023-01-11T23:16:24.2418678Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose2d:0 2023-01-11T23:16:24.2419255Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0, line 869 <- wrt source file 2023-01-11T23:16:24.2419813Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/conv.py::ConvTranspose3d:0 2023-01-11T23:16:24.2420386Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0, line 84 <- wrt source file 2023-01-11T23:16:24.2439845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::Embedding:0 2023-01-11T23:16:24.2440467Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0, line 209 <- wrt source file 2023-01-11T23:16:24.2460790Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/embedding_ops.py::EmbeddingBag:0 2023-01-11T23:16:24.2461414Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0, line 21 <- wrt source file 2023-01-11T23:16:24.2465696Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::FloatFunctional:0 2023-01-11T23:16:24.2466320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0, line 141 <- wrt source file 2023-01-11T23:16:24.2469564Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/functional_modules.py::QFunctional:0 2023-01-11T23:16:24.2470517Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0, line 117 <- wrt source file 2023-01-11T23:16:24.2471295Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/linear.py::Linear:0 2023-01-11T23:16:24.2471857Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0, line 20 <- wrt source file 2023-01-11T23:16:24.2472486Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/nn/quantized/modules/rnn.py::LSTM:0 2023-01-11T23:16:24.2473168Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py::ActivationSparsifier:0, line 59 <- wrt source file 2023-01-11T23:16:24.2473975Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/activation_sparsifier/activation_sparsifier.py::ActivationSparsifier:0 2023-01-11T23:16:24.2474706Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0, line 92 <- wrt source file 2023-01-11T23:16:24.2504229Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_scheduler/base_data_scheduler.py::BaseDataScheduler.get_schedule_param:0 2023-01-11T23:16:24.2505069Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0, line 54 <- wrt source file 2023-01-11T23:16:24.2505836Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/_experimental/data_sparsifier/base_data_sparsifier.py::BaseDataSparsifier:0 2023-01-11T23:16:24.2506567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0, line 19 <- wrt source file 2023-01-11T23:16:24.2508255Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/scheduler/lambda_scheduler.py::LambdaSL:0 2023-01-11T23:16:24.2509067Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0, line 45 <- wrt source file 2023-01-11T23:16:24.2509762Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier:0 2023-01-11T23:16:24.2510541Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0, line 237 <- wrt source file 2023-01-11T23:16:24.2514951Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/pruning/sparsifier/base_sparsifier.py::BaseSparsifier.squash_mask:0 2023-01-11T23:16:24.2515653Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0, line 143 <- wrt source file 2023-01-11T23:16:24.2516437Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuse_modules.py::fuse_modules:0 2023-01-11T23:16:24.2517105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0, line 27 <- wrt source file 2023-01-11T23:16:24.2524453Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn:0 2023-01-11T23:16:24.2525117Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0, line 64 <- wrt source file 2023-01-11T23:16:24.2529460Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_conv_bn_relu:0 2023-01-11T23:16:24.2530320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0, line 111 <- wrt source file 2023-01-11T23:16:24.2534713Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_linear_bn:0 2023-01-11T23:16:24.2535340Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0, line 139 <- wrt source file 2023-01-11T23:16:24.2540259Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fuser_method_mappings.py::fuse_convtranspose_bn:0 2023-01-11T23:16:24.2540843Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0, line 85 <- wrt source file 2023-01-11T23:16:24.2541535Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_args:0 2023-01-11T23:16:24.2542117Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0, line 106 <- wrt source file 2023-01-11T23:16:24.2542686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/observer.py::_with_callable_args:0 2023-01-11T23:16:24.2543252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0, line 242 <- wrt source file 2023-01-11T23:16:24.2543795Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::fuse_fx:0 2023-01-11T23:16:24.2546573Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0, line 301 <- wrt source file 2023-01-11T23:16:24.2547134Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_fx:0 2023-01-11T23:16:24.2549155Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0, line 439 <- wrt source file 2023-01-11T23:16:24.2549823Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::prepare_qat_fx:0 2023-01-11T23:16:24.2550571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0, line 604 <- wrt source file 2023-01-11T23:16:24.2551148Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_fx:0 2023-01-11T23:16:24.2551735Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0, line 663 <- wrt source file 2023-01-11T23:16:24.2552589Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::convert_to_reference_fx:0 2023-01-11T23:16:24.2553249Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0, line 715 <- wrt source file 2023-01-11T23:16:24.2554113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/quantize_fx.py::_convert_to_reference_decomposed_fx:0 2023-01-11T23:16:24.2554927Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0, line 408 <- wrt source file 2023-01-11T23:16:24.2555663Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_path_of_module:0 2023-01-11T23:16:24.2556404Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0, line 429 <- wrt source file 2023-01-11T23:16:24.2556966Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_signature_locals:0 2023-01-11T23:16:24.2557536Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0, line 442 <- wrt source file 2023-01-11T23:16:24.2558163Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_default_kwargs:0 2023-01-11T23:16:24.2558742Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0, line 463 <- wrt source file 2023-01-11T23:16:24.2559292Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_normalize_kwargs:0 2023-01-11T23:16:24.2559921Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0, line 483 <- wrt source file 2023-01-11T23:16:24.2560606Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/utils.py::_get_num_pos_args:0 2023-01-11T23:16:24.2561259Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0, line 131 <- wrt source file 2023-01-11T23:16:24.2561877Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/backend_config.py::DTypeConfig:0 2023-01-11T23:16:24.2562506Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0, line 80 <- wrt source file 2023-01-11T23:16:24.2563113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/backend_config/onednn.py::_fuse_linear_bn_leaky_relu:0 2023-01-11T23:16:24.2563764Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0, line 79 <- wrt source file 2023-01-11T23:16:24.2565799Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report.py::ModelReport:0 2023-01-11T23:16:24.2566600Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0, line 324 <- wrt source file 2023-01-11T23:16:24.2567472Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_filtered_tables:0 2023-01-11T23:16:24.2568233Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0, line 407 <- wrt source file 2023-01-11T23:16:24.2568970Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_table_visualization:0 2023-01-11T23:16:24.2569727Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0, line 557 <- wrt source file 2023-01-11T23:16:24.2570482Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_plot_visualization:0 2023-01-11T23:16:24.2571445Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0, line 619 <- wrt source file 2023-01-11T23:16:24.2572254Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/ao/quantization/fx/_model_report/model_report_visualizer.py::ModelReportVisualizer.generate_histogram_visualization:0 2023-01-11T23:16:24.2573081Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0, line 25 <- wrt source file 2023-01-11T23:16:24.2573693Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/anomaly_mode.py::detect_anomaly:0 2023-01-11T23:16:24.2574789Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0, line 63 <- wrt source file 2023-01-11T23:16:24.2575516Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::make_dual:0 2023-01-11T23:16:24.2576238Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0, line 126 <- wrt source file 2023-01-11T23:16:24.2576801Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::unpack_dual:0 2023-01-11T23:16:24.2577342Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0, line 163 <- wrt source file 2023-01-11T23:16:24.2577866Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/forward_ad.py::dual_level:0 2023-01-11T23:16:24.2578436Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0, line 51 <- wrt source file 2023-01-11T23:16:24.2579006Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_backward:0 2023-01-11T23:16:24.2579591Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0, line 93 <- wrt source file 2023-01-11T23:16:24.2580153Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.save_for_forward:0 2023-01-11T23:16:24.2580725Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0, line 143 <- wrt source file 2023-01-11T23:16:24.2581265Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_dirty:0 2023-01-11T23:16:24.2581869Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0, line 187 <- wrt source file 2023-01-11T23:16:24.2582758Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.mark_non_differentiable:0 2023-01-11T23:16:24.2583632Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0, line 215 <- wrt source file 2023-01-11T23:16:24.2584301Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::FunctionCtx.set_materialize_grads:0 2023-01-11T23:16:24.2584860Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::Function:0, line 387 <- wrt source file 2023-01-11T23:16:24.2585381Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/function.py::Function:0 2023-01-11T23:16:24.2585911Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0, line 248 <- wrt source file 2023-01-11T23:16:24.2587954Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vjp:0 2023-01-11T23:16:24.2588623Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0, line 346 <- wrt source file 2023-01-11T23:16:24.2593092Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jvp:0 2023-01-11T23:16:24.2593778Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0, line 548 <- wrt source file 2023-01-11T23:16:24.2594832Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::jacobian:0 2023-01-11T23:16:24.2595458Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0, line 760 <- wrt source file 2023-01-11T23:16:24.2598491Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hessian:0 2023-01-11T23:16:24.2599167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0, line 864 <- wrt source file 2023-01-11T23:16:24.2602499Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::vhp:0 2023-01-11T23:16:24.2603167Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0, line 955 <- wrt source file 2023-01-11T23:16:24.2606467Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/functional.py::hvp:0 2023-01-11T23:16:24.2607153Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0, line 120 <- wrt source file 2023-01-11T23:16:24.2607992Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::no_grad:0 2023-01-11T23:16:24.2608614Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0, line 166 <- wrt source file 2023-01-11T23:16:24.2610571Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::enable_grad:0 2023-01-11T23:16:24.2611268Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0, line 216 <- wrt source file 2023-01-11T23:16:24.2612362Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::set_grad_enabled:0 2023-01-11T23:16:24.2613047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0, line 273 <- wrt source file 2023-01-11T23:16:24.2614952Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py::inference_mode:0 2023-01-11T23:16:24.2615700Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0, line 50 <- wrt source file 2023-01-11T23:16:24.2616458Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::saved_tensors_hooks:0 2023-01-11T23:16:24.2617230Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0, line 109 <- wrt source file 2023-01-11T23:16:24.2617942Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::save_on_cpu:0 2023-01-11T23:16:24.2618524Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0, line 164 <- wrt source file 2023-01-11T23:16:24.2619080Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::disable_saved_tensors_hooks:0 2023-01-11T23:16:24.2619643Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0, line 204 <- wrt source file 
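The run of records above covers the torch.autograd.functional and grad-mode doctests (vjp, jvp, jacobian, hessian, vhp, hvp, no_grad, enable_grad, set_grad_enabled, inference_mode). As a hedged sketch of the call pattern those doctests exercise, not the doctest bodies themselves (those live at the listed source lines), something like the following should run on any recent PyTorch build; the function f and the shapes are arbitrary illustrations, while the torch calls are the public API the records name:

    import torch
    from torch.autograd.functional import jacobian

    # Full Jacobian of a vector-valued function, as functional.py documents.
    def f(x):
        return 2 * x.exp()

    x = torch.randn(3, requires_grad=True)
    J = jacobian(f, x)        # shape (3, 3); the diagonal holds 2 * exp(x)

    # The grad_mode helpers listed above gate autograd recording:
    with torch.no_grad():
        y = f(x)              # no graph is built; y.requires_grad is False
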
2023-01-11T23:16:24.2633333Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::register_multi_grad_hook:0 2023-01-11T23:16:24.2634045Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0, line 406 <- wrt source file 2023-01-11T23:16:24.2647389Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/autograd/graph.py::allow_mutation_on_saved_tensors:0 2023-01-11T23:16:24.2648088Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0, line 123 <- wrt source file 2023-01-11T23:16:24.2648866Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::profile:0 2023-01-11T23:16:24.2649560Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0, line 457 <- wrt source file 2023-01-11T23:16:24.2650365Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::record_function:0 2023-01-11T23:16:24.2651270Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0, line 582 <- wrt source file 2023-01-11T23:16:24.2651887Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_itt:0 2023-01-11T23:16:24.2652585Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0, line 651 <- wrt source file 2023-01-11T23:16:24.2653548Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py::emit_nvtx:0 2023-01-11T23:16:24.2654416Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0, line 95 <- wrt source file 2023-01-11T23:16:24.2655272Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:0 2023-01-11T23:16:24.2655993Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1, line 106 <- wrt source file 2023-01-11T23:16:24.2656721Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:1 2023-01-11T23:16:24.2657492Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2, line 119 <- wrt source file 2023-01-11T23:16:24.2658067Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_jit_fn:2 2023-01-11T23:16:24.2658747Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0, line 151 <- wrt source file 2023-01-11T23:16:24.2659431Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/cuda/jiterator.py::_create_multi_output_jit_fn:0 2023-01-11T23:16:24.2659991Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::env:0, line 23 <- wrt source file 2023-01-11T23:16:24.2660530Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::env:0 2023-01-11T23:16:24.2661156Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::check_env:0, line 73 <- wrt source file 2023-01-11T23:16:24.2661808Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/argparse_util.py::check_env:0 2023-01-11T23:16:24.2662634Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0, line 1377 <- wrt source file 2023-01-11T23:16:24.2663358Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::batch_isend_irecv:0 2023-01-11T23:16:24.2664211Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0, line 1633 <- wrt source file 2023-01-11T23:16:24.2664794Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_reduce:0 2023-01-11T23:16:24.2665556Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0, line 1997 <- wrt source file 2023-01-11T23:16:24.2666213Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_object:0 2023-01-11T23:16:24.2666792Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0, line 2085 <- wrt source file 2023-01-11T23:16:24.2667352Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::gather_object:0 2023-01-11T23:16:24.2667930Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0, line 2192 <- wrt source file 2023-01-11T23:16:24.2681383Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::broadcast_object_list:0 2023-01-11T23:16:24.2682428Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0, line 2288 <- wrt source file 2023-01-11T23:16:24.2683059Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter_object_list:0 2023-01-11T23:16:24.2683653Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0, line 2375 <- wrt source file 2023-01-11T23:16:24.2684229Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather:0 2023-01-11T23:16:24.2684812Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0, line 2455 <- wrt source file 2023-01-11T23:16:24.2685392Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_into_tensor:0 2023-01-11T23:16:24.2685976Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0, line 2564 <- wrt source file 2023-01-11T23:16:24.2686549Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_gather_coalesced:0 2023-01-11T23:16:24.2687116Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0, line 2722 <- wrt source file 2023-01-11T23:16:24.2687664Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::scatter:0 2023-01-11T23:16:24.2688237Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0, line 2929 <- wrt source file 2023-01-11T23:16:24.2688817Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::reduce_scatter_tensor:0 2023-01-11T23:16:24.2689501Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0, line 3046 <- wrt source file 2023-01-11T23:16:24.2690164Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all_single:0 2023-01-11T23:16:24.2690725Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0, line 3164 <- wrt source file 2023-01-11T23:16:24.2691290Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::all_to_all:0 2023-01-11T23:16:24.2691866Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0, line 3346 <- wrt source file 2023-01-11T23:16:24.2692494Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::monitored_barrier:0 2023-01-11T23:16:24.2693099Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0, line 3598 <- wrt source file 2023-01-11T23:16:24.2693690Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups:0 2023-01-11T23:16:24.2694285Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0, line 3713 <- wrt source file 2023-01-11T23:16:24.2695025Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py::new_subgroups_by_enumeration:0 2023-01-11T23:16:24.2695582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0, line 81 <- wrt source file 2023-01-11T23:16:24.2696311Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/launch.py::__doc__:0 2023-01-11T23:16:24.2696914Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0, line 297 <- wrt source file 2023-01-11T23:16:24.2697593Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/run.py::__doc__:0 2023-01-11T23:16:24.2698262Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0, line 39 <- wrt source file 2023-01-11T23:16:24.2698805Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/autograd/__init__.py::context:0 2023-01-11T23:16:24.2699409Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.no_sync:0, line 509 <- wrt source file 2023-01-11T23:16:24.2700036Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.no_sync:0 2023-01-11T23:16:24.2700702Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:0, line 826 <- wrt source file 2023-01-11T23:16:24.2722142Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:0 2023-01-11T23:16:24.2722916Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:1, line 838 <- wrt source file 2023-01-11T23:16:24.2723580Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel.register_comm_hook:1 2023-01-11T23:16:24.2724405Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_builtin_comm_hook:0, line 874 <- wrt source file 2023-01-11T23:16:24.2725175Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_builtin_comm_hook:0 2023-01-11T23:16:24.2725922Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_fused_optim:0, line 930 <- wrt source file 2023-01-11T23:16:24.2726577Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/_ddp.py::DistributedDataParallel._register_fused_optim:0 2023-01-11T23:16:24.2727212Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0, line 198 <- wrt source file 2023-01-11T23:16:24.2727820Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/checkpoint_activation.py::checkpoint:0 2023-01-11T23:16:24.2728410Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0, line 44 <- wrt source file 2023-01-11T23:16:24.2728962Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/contract.py::contract:0 2023-01-11T23:16:24.2729541Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0, line 21 <- wrt source file 2023-01-11T23:16:24.2730236Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_composable/replicate.py::replicate:0 2023-01-11T23:16:24.2730830Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/partial_tensor.py::_PartialTensor:0, line 61 <- wrt source file 2023-01-11T23:16:24.2731397Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/partial_tensor.py::_PartialTensor:0 2023-01-11T23:16:24.2732118Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0, line 32 <- wrt source file 2023-01-11T23:16:24.2732827Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_optim/__init__.py::named_params_with_sharded_tensor:0 2023-01-11T23:16:24.2733458Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0, line 366 <- wrt source file 2023-01-11T23:16:24.2734053Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::init_from_local_shards:0 2023-01-11T23:16:24.2734931Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0, line 430 <- wrt source file 2023-01-11T23:16:24.2736017Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/__init__.py::custom_sharded_op_impl:0 2023-01-11T23:16:24.2736697Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0, line 808 <- wrt source file 2023-01-11T23:16:24.2737361Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor._init_from_local_tensor:0 2023-01-11T23:16:24.2738002Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0, line 958 <- wrt source file 2023-01-11T23:16:24.2739598Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/api.py::ShardedTensor.reshard:0 2023-01-11T23:16:24.2740244Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0, line 15 <- wrt source file 2023-01-11T23:16:24.2740863Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharded_tensor/_ops/_common.py::_sharded_op_common:0 2023-01-11T23:16:24.2741603Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0, line 36 <- wrt source file 2023-01-11T23:16:24.2742283Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_shard/sharding_plan/api.py::ShardingPlan:0 2023-01-11T23:16:24.2742881Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0, line 57 <- wrt source file 2023-01-11T23:16:24.2743605Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/_tools/memory_tracker.py::MemoryTracker:0 2023-01-11T23:16:24.2744166Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0, line 148 <- wrt source file 2023-01-11T23:16:24.2747650Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/join.py::Join:0 2023-01-11T23:16:24.2748515Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0, line 99 <- wrt source file 2023-01-11T23:16:24.2749147Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/__init__.py::register_ddp_comm_hook:0 2023-01-11T23:16:24.2749909Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0, line 23 <- wrt source file 2023-01-11T23:16:24.2750683Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/debugging_hooks.py::noop_hook:0 2023-01-11T23:16:24.2751387Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0, line 37 <- wrt source file 2023-01-11T23:16:24.2752473Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::allreduce_hook:0 2023-01-11T23:16:24.2753243Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0, line 54 <- wrt source file 2023-01-11T23:16:24.2754085Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_hook:0 2023-01-11T23:16:24.2754770Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0, line 90 <- wrt source file 2023-01-11T23:16:24.2755508Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_hook:0 2023-01-11T23:16:24.2756162Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0, line 123 <- wrt source file 2023-01-11T23:16:24.2756958Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::fp16_compress_wrapper:0 2023-01-11T23:16:24.2757695Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0, line 161 <- wrt source file 2023-01-11T23:16:24.2758408Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.py::bf16_compress_wrapper:0 2023-01-11T23:16:24.2759282Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0, line 85 <- wrt source file 
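The ddp_comm_hooks records above (allreduce_hook, the fp16/bf16 compress hooks, and their wrappers) all follow one registration pattern. A hedged single-process sketch of that pattern, assuming a gloo-enabled build; the MASTER_ADDR/MASTER_PORT values are arbitrary placeholders, not taken from the log:

    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Single-process group so the sketch runs without a launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = DDP(torch.nn.Linear(4, 4))
    model.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)

    model(torch.randn(2, 4)).sum().backward()   # gradients pass through the hook
    dist.destroy_process_group()
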
2023-01-11T23:16:24.2759935Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/post_localSGD_hook.py::post_localSGD_hook:0 2023-01-11T23:16:24.2760873Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0, line 382 <- wrt source file 2023-01-11T23:16:24.2761563Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::powerSGD_hook:0 2023-01-11T23:16:24.2762202Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0, line 691 <- wrt source file 2023-01-11T23:16:24.2763044Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py::batched_powerSGD_hook:0 2023-01-11T23:16:24.2763937Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0, line 62 <- wrt source file 2023-01-11T23:16:24.2764752Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_pertensor_hook:0 2023-01-11T23:16:24.2765585Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0, line 142 <- wrt source file 2023-01-11T23:16:24.2766256Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/ddp_comm_hooks/quantization_hooks.py::quantization_perchannel_hook:0 2023-01-11T23:16:24.2767040Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0, line 51 <- wrt source file 2023-01-11T23:16:24.2767790Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/averagers.py::PeriodicModelAverager:0 2023-01-11T23:16:24.2768777Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0, line 50 <- wrt source file 2023-01-11T23:16:24.2769654Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/algorithms/model_averaging/hierarchical_model_averager.py::HierarchicalModelAverager:0 2023-01-11T23:16:24.2770492Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0, line 205 <- wrt source file 2023-01-11T23:16:24.2771281Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/optimizer.py::load_sharded_optimizer_state_dict:0 2023-01-11T23:16:24.2771957Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::SavePlanner:0, line 126 <- wrt source file 2023-01-11T23:16:24.2772662Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::SavePlanner:0 2023-01-11T23:16:24.2773407Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::LoadPlanner:0, line 281 <- wrt source file 2023-01-11T23:16:24.2774076Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/planner.py::LoadPlanner:0 2023-01-11T23:16:24.2775046Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load_state_dict:0, line 63 <- wrt source file 2023-01-11T23:16:24.2775799Z 
* SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_loader.py::load_state_dict:0 2023-01-11T23:16:24.2776471Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save_state_dict:0, line 60 <- wrt source file 2023-01-11T23:16:24.2777193Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/checkpoint/state_dict_saver.py::save_state_dict:0 2023-01-11T23:16:24.2777914Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/__init__.py::start_processes:0, line 132 <- wrt source file 2023-01-11T23:16:24.2778661Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/__init__.py::start_processes:0 2023-01-11T23:16:24.2779264Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::Std.from_str:0, line 110 <- wrt source file 2023-01-11T23:16:24.2779853Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::Std.from_str:0 2023-01-11T23:16:24.2780448Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::to_map:0, line 150 <- wrt source file 2023-01-11T23:16:24.2781031Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py::to_map:0 2023-01-11T23:16:24.2781757Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py::ChildFailedError:0, line 203 <- wrt source file 2023-01-11T23:16:24.2782399Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py::ChildFailedError:0 2023-01-11T23:16:24.2783161Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0, line 112 <- wrt source file 2023-01-11T23:16:24.2783846Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/rendezvous/api.py::RendezvousHandler.shutdown:0 2023-01-11T23:16:24.2784547Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0, line 221 <- wrt source file 2023-01-11T23:16:24.2785196Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::StateDictType:0 2023-01-11T23:16:24.2785774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::FullStateDictConfig:0, line 258 <- wrt source file 2023-01-11T23:16:24.2786506Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/api.py::FullStateDictConfig:0 2023-01-11T23:16:24.2787307Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0, line 125 <- wrt source file 2023-01-11T23:16:24.2788085Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel:0 2023-01-11T23:16:24.2788965Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0, line 551 <- wrt source file 2023-01-11T23:16:24.2789917Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.set_state_dict_type:0 2023-01-11T23:16:24.2790764Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0, line 619 <- wrt source file 2023-01-11T23:16:24.2791549Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.state_dict_type:0 2023-01-11T23:16:24.2792540Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0, line 1346 <- wrt source file 2023-01-11T23:16:24.2793493Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.shard_full_optim_state_dict:0 2023-01-11T23:16:24.2794238Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0, line 1455 <- wrt source file 2023-01-11T23:16:24.2795067Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.scatter_full_optim_state_dict:0 2023-01-11T23:16:24.2795809Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0, line 1586 <- wrt source file 2023-01-11T23:16:24.2796504Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py::FullyShardedDataParallel.rekey_optim_state_dict:0 2023-01-11T23:16:24.2797168Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0, line 45 <- wrt source file 2023-01-11T23:16:24.2797764Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/fsdp/sharded_grad_scaler.py::ShardedGradScaler:0 2023-01-11T23:16:24.2798349Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0, line 130 <- wrt source file 2023-01-11T23:16:24.2798895Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/functional.py::_all_gather_base:0 2023-01-11T23:16:24.2799480Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0, line 201 <- wrt source file 2023-01-11T23:16:24.2800192Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.__init__:0 2023-01-11T23:16:24.2800870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0, line 524 <- wrt source file 2023-01-11T23:16:24.2801540Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::_RemoteModule.init_from_module_rref:0 2023-01-11T23:16:24.2802142Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0, line 646 <- wrt source file 2023-01-11T23:16:24.2802731Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/nn/api/remote_module.py::RemoteModule:0 2023-01-11T23:16:24.2803368Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0, line 27 <- wrt source file 2023-01-11T23:16:24.2804010Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/apply_optimizer_in_backward.py::_apply_optimizer_in_backward:0 
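The records that follow move on to torch.distributed.optim and the torch.distributed.rpc doctests (_wait_all, shutdown, remote, rpc_sync, rpc_async). As a hedged sketch of the rpc_async pattern those doctests exercise, runnable in a single process that acts as its own peer; the worker name and port are chosen arbitrarily:

    import os
    import torch
    import torch.distributed.rpc as rpc

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    rpc.init_rpc("worker0", rank=0, world_size=1)

    # Dispatch torch.add "remotely" (here, back to the same worker).
    fut = rpc.rpc_async("worker0", torch.add, args=(torch.ones(2), 1))
    print(fut.wait())          # tensor([2., 2.])
    rpc.shutdown()
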
2023-01-11T23:16:24.2804724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0, line 38 <- wrt source file 2023-01-11T23:16:24.2805396Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/named_optimizer.py::_NamedOptimizer:0 2023-01-11T23:16:24.2806097Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0, line 160 <- wrt source file 2023-01-11T23:16:24.2806686Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/optimizer.py::DistributedOptimizer:0 2023-01-11T23:16:24.2807304Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0, line 18 <- wrt source file 2023-01-11T23:16:24.2807998Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/post_localSGD_optimizer.py::PostLocalSGDOptimizer:0 2023-01-11T23:16:24.2808607Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0, line 35 <- wrt source file 2023-01-11T23:16:24.2809176Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/utils.py::register_functional_optim:0 2023-01-11T23:16:24.2809874Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0, line 325 <- wrt source file 2023-01-11T23:16:24.2810570Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/optim/zero_redundancy_optimizer.py::ZeroRedundancyOptimizer:0 2023-01-11T23:16:24.2811243Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::WithDevice:0, line 152 <- wrt source file 2023-01-11T23:16:24.2811805Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::WithDevice:0 2023-01-11T23:16:24.2812372Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::Pipe:0, line 274 <- wrt source file 2023-01-11T23:16:24.2812915Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/pipeline/sync/pipe.py::Pipe:0 2023-01-11T23:16:24.2813462Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0, line 160 <- wrt source file 2023-01-11T23:16:24.2813979Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::_wait_all:0 2023-01-11T23:16:24.2814649Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0, line 333 <- wrt source file 2023-01-11T23:16:24.2815240Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::shutdown:0 2023-01-11T23:16:24.2815778Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0, line 582 <- wrt source file 2023-01-11T23:16:24.2816295Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::remote:0 2023-01-11T23:16:24.2816824Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0, line 766 <- wrt source file 2023-01-11T23:16:24.2817341Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_sync:0 2023-01-11T23:16:24.2817877Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0, line 858 <- wrt source file 2023-01-11T23:16:24.2818400Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/api.py::rpc_async:0 2023-01-11T23:16:24.2818956Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0, line 33 <- wrt source file 2023-01-11T23:16:24.2830020Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/functions.py::async_execution:0 2023-01-11T23:16:24.2830724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0, line 117 <- wrt source file 2023-01-11T23:16:24.2831378Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/options.py::TensorPipeRpcBackendOptions.set_device_map:0 2023-01-11T23:16:24.2832039Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0, line 58 <- wrt source file 2023-01-11T23:16:24.2835175Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/rpc/server_process_global_profiler.py::_server_process_global_profile:0 2023-01-11T23:16:24.2835912Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_input_validate:0, line 33 <- wrt source file 2023-01-11T23:16:24.2836512Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_input_validate:0 2023-01-11T23:16:24.2837126Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_output_validate:0, line 78 <- wrt source file 2023-01-11T23:16:24.2837721Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/_utils.py::_prepare_output_validate:0 2023-01-11T23:16:24.2838320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0, line 63 <- wrt source file 2023-01-11T23:16:24.2838901Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributed/tensor/parallel/api.py::parallelize_module:0 2023-01-11T23:16:24.2839510Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0, line 21 <- wrt source file 2023-01-11T23:16:24.2841221Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/bernoulli.py::Bernoulli:0 2023-01-11T23:16:24.2841753Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0, line 16 <- wrt source file 2023-01-11T23:16:24.2846957Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/beta.py::Beta:0 2023-01-11T23:16:24.2847499Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0, line 20 <- wrt source file 2023-01-11T23:16:24.2855046Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/binomial.py::Binomial:0 2023-01-11T23:16:24.2855616Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0, line 37 <- wrt source file 2023-01-11T23:16:24.2861005Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/categorical.py::Categorical:0 2023-01-11T23:16:24.2861562Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0, line 19 <- wrt source file 2023-01-11T23:16:24.2865505Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/cauchy.py::Cauchy:0 2023-01-11T23:16:24.2866037Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0, line 12 <- wrt source file 2023-01-11T23:16:24.2870193Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/chi2.py::Chi2:0 2023-01-11T23:16:24.2870847Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0, line 152 <- wrt source file 2023-01-11T23:16:24.2873151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/constraints.py::_DependentProperty:0 2023-01-11T23:16:24.2873771Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0, line 24 <- wrt source file 2023-01-11T23:16:24.2878581Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/continuous_bernoulli.py::ContinuousBernoulli:0 2023-01-11T23:16:24.2879162Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0, line 35 <- wrt source file 2023-01-11T23:16:24.2882880Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/dirichlet.py::Dirichlet:0 2023-01-11T23:16:24.2883456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0, line 15 <- wrt source file 2023-01-11T23:16:24.2887158Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/exponential.py::Exponential:0 2023-01-11T23:16:24.2887761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0, line 16 <- wrt source file 2023-01-11T23:16:24.2893020Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/fishersnedecor.py::FisherSnedecor:0 2023-01-11T23:16:24.2893616Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0, line 19 <- wrt source file 2023-01-11T23:16:24.2897379Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/gamma.py::Gamma:0 2023-01-11T23:16:24.2897938Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0, line 21 <- wrt source file 2023-01-11T23:16:24.2902070Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/geometric.py::Geometric:0 2023-01-11T23:16:24.2902621Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0, line 17 <- wrt source file 2023-01-11T23:16:24.2908121Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/gumbel.py::Gumbel:0 2023-01-11T23:16:24.2908682Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0, line 20 <- wrt source file 2023-01-11T23:16:24.2913900Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/half_cauchy.py::HalfCauchy:0 2023-01-11T23:16:24.2914456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0, line 20 <- wrt source file 2023-01-11T23:16:24.2917983Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/half_normal.py::HalfNormal:0 2023-01-11T23:16:24.2918551Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0, line 18 <- wrt source file 2023-01-11T23:16:24.2929707Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/independent.py::Independent:0 2023-01-11T23:16:24.2930374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0, line 25 <- wrt source file 
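The surrounding records sweep the torch.distributions doctests, and nearly all of them share the same construct/sample/log_prob shape. A minimal sketch of that shape; the parameter values here are arbitrary and not taken from the doctests:

    import torch
    from torch.distributions import Bernoulli, Normal

    b = Bernoulli(torch.tensor([0.3]))
    s = b.sample()                  # 0. or 1., with P(1) = 0.3
    lp = b.log_prob(s)              # log-probability of the drawn value

    n = Normal(torch.tensor([0.0]), torch.tensor([1.0]))
    draws = n.sample((5,))          # five draws from N(0, 1)
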
2023-01-11T23:16:24.2935864Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/kumaraswamy.py::Kumaraswamy:0
2023-01-11T23:16:24.2937381Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0, line 14 <- wrt source file
2023-01-11T23:16:24.2941222Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/laplace.py::Laplace:0
2023-01-11T23:16:24.2942098Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0, line 38 <- wrt source file
2023-01-11T23:16:24.2950178Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/lkj_cholesky.py::LKJCholesky:0
2023-01-11T23:16:24.2951216Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0, line 17 <- wrt source file
2023-01-11T23:16:24.2956328Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/log_normal.py::LogNormal:0
2023-01-11T23:16:24.2957279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0, line 22 <- wrt source file
2023-01-11T23:16:24.2964280Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/logistic_normal.py::LogisticNormal:0
2023-01-11T23:16:24.2965313Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0, line 56 <- wrt source file
2023-01-11T23:16:24.2966558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/lowrank_multivariate_normal.py::LowRankMultivariateNormal:0
2023-01-11T23:16:24.2967559Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0, line 19 <- wrt source file
2023-01-11T23:16:24.2970358Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/mixture_same_family.py::MixtureSameFamily:0
2023-01-11T23:16:24.2971279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0, line 34 <- wrt source file
2023-01-11T23:16:24.2972176Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/multinomial.py::Multinomial:0
2023-01-11T23:16:24.2973119Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0, line 94 <- wrt source file
2023-01-11T23:16:24.2974084Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/multivariate_normal.py::MultivariateNormal:0
2023-01-11T23:16:24.2975438Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0, line 18 <- wrt source file
2023-01-11T23:16:24.2979631Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/normal.py::Normal:0
2023-01-11T23:16:24.2980542Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0, line 27 <- wrt source file
2023-01-11T23:16:24.2987031Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/one_hot_categorical.py::OneHotCategorical:0
2023-01-11T23:16:24.2988112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0, line 14 <- wrt source file
2023-01-11T23:16:24.2993883Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/pareto.py::Pareto:0
2023-01-11T23:16:24.2994763Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0, line 20 <- wrt source file
2023-01-11T23:16:24.2995613Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/poisson.py::Poisson:0
2023-01-11T23:16:24.2996554Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0, line 102 <- wrt source file
2023-01-11T23:16:24.3001127Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_bernoulli.py::RelaxedBernoulli:0
2023-01-11T23:16:24.3002119Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0, line 96 <- wrt source file
2023-01-11T23:16:24.3009514Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/relaxed_categorical.py::RelaxedOneHotCategorical:0
2023-01-11T23:16:24.3010460Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0, line 17 <- wrt source file
2023-01-11T23:16:24.3016776Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/studentT.py::StudentT:0
2023-01-11T23:16:24.3017685Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0, line 1002 <- wrt source file
2023-01-11T23:16:24.3018573Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CatTransform:0
2023-01-11T23:16:24.3019456Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0, line 1105 <- wrt source file
2023-01-11T23:16:24.3020372Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::StackTransform:0
2023-01-11T23:16:24.3021483Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0, line 1178 <- wrt source file
2023-01-11T23:16:24.3022495Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/transforms.py::CumulativeDistributionTransform:0
2023-01-11T23:16:24.3023444Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0, line 17 <- wrt source file
2023-01-11T23:16:24.3025041Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/uniform.py::Uniform:0
2023-01-11T23:16:24.3025930Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0, line 79 <- wrt source file
2023-01-11T23:16:24.3035101Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/von_mises.py::VonMises:0
2023-01-11T23:16:24.3036007Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0, line 16 <- wrt source file
2023-01-11T23:16:24.3040440Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/distributions/weibull.py::Weibull:0
2023-01-11T23:16:24.3041328Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0, line 36 <- wrt source file
2023-01-11T23:16:24.3042163Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/distributions/wishart.py::Wishart:0
2023-01-11T23:16:24.3042983Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0, line 79 <- wrt source file
2023-01-11T23:16:24.3043934Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::_snake_case:0
2023-01-11T23:16:24.3044818Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0, line 1363 <- wrt source file
2023-01-11T23:16:24.3045683Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.eliminate_dead_code:0
2023-01-11T23:16:24.3046530Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0, line 1431 <- wrt source file
2023-01-11T23:16:24.3047349Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/graph.py::Graph.on_generate_code:0
2023-01-11T23:16:24.3048197Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0, line 37 <- wrt source file
2023-01-11T23:16:24.3049036Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Interpreter:0
2023-01-11T23:16:24.3049897Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0, line 380 <- wrt source file
2023-01-11T23:16:24.3050710Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/interpreter.py::Transformer:0
2023-01-11T23:16:24.3051611Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0, line 108 <- wrt source file
2023-01-11T23:16:24.3052441Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/subgraph_rewriter.py::replace_pattern:0
2023-01-11T23:16:24.3053278Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0, line 11 <- wrt source file
2023-01-11T23:16:24.3054151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::TensorType:0
2023-01-11T23:16:24.3055258Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0, line 62 <- wrt source file
2023-01-11T23:16:24.3056061Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_consistent:0
2023-01-11T23:16:24.3057037Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0, line 88 <- wrt source file
2023-01-11T23:16:24.3057872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/tensor_type.py::is_more_precise:0
2023-01-11T23:16:24.3058793Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0, line 87 <- wrt source file
2023-01-11T23:16:24.3059720Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/rewriter.py::AST_Rewriter.visit_AnnAssign:0
2023-01-11T23:16:24.3060635Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0, line 42 <- wrt source file
2023-01-11T23:16:24.3061526Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/core.py::reify:0
2023-01-11T23:16:24.3062415Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0, line 42 <- wrt source file
2023-01-11T23:16:24.3063355Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/match.py::VarDispatcher:0
2023-01-11T23:16:24.3064279Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0, line 10 <- wrt source file
2023-01-11T23:16:24.3065146Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unifiable:0
2023-01-11T23:16:24.3066025Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0, line 36 <- wrt source file
2023-01-11T23:16:24.3067024Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::reify_object:0
2023-01-11T23:16:24.3067893Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0, line 91 <- wrt source file
2023-01-11T23:16:24.3068732Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/more.py::unify_object:0
2023-01-11T23:16:24.3069598Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0, line 22 <- wrt source file
2023-01-11T23:16:24.3092780Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge:0
2023-01-11T23:16:24.3093794Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0, line 49 <- wrt source file
2023-01-11T23:16:24.3097851Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::merge_with:0
2023-01-11T23:16:24.3098816Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0, line 75 <- wrt source file
2023-01-11T23:16:24.3101234Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valmap:0
2023-01-11T23:16:24.3102232Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0, line 91 <- wrt source file
2023-01-11T23:16:24.3105717Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keymap:0
2023-01-11T23:16:24.3106690Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0, line 107 <- wrt source file
2023-01-11T23:16:24.3109444Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemmap:0
2023-01-11T23:16:24.3110652Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0, line 123 <- wrt source file
2023-01-11T23:16:24.3115032Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::valfilter:0
2023-01-11T23:16:24.3116041Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0, line 143 <- wrt source file
2023-01-11T23:16:24.3119898Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::keyfilter:0
2023-01-11T23:16:24.3120896Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0, line 163 <- wrt source file
2023-01-11T23:16:24.3125792Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::itemfilter:0
2023-01-11T23:16:24.3126801Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0, line 189 <- wrt source file
2023-01-11T23:16:24.3129422Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc:0
2023-01-11T23:16:24.3130402Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0, line 206 <- wrt source file
2023-01-11T23:16:24.3135386Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::dissoc:0
2023-01-11T23:16:24.3136358Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0, line 232 <- wrt source file
2023-01-11T23:16:24.3140004Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::assoc_in:0
2023-01-11T23:16:24.3141014Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0, line 259 <- wrt source file
2023-01-11T23:16:24.3148206Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::update_in:0
2023-01-11T23:16:24.3149221Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0, line 311 <- wrt source file
2023-01-11T23:16:24.3158244Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::get_in:0
2023-01-11T23:16:24.3159244Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0, line 357 <- wrt source file
2023-01-11T23:16:24.3162844Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::groupby:0
2023-01-11T23:16:24.3163543Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0, line 393 <- wrt source file
2023-01-11T23:16:24.3165603Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/unification_tools.py::first:0
2023-01-11T23:16:24.3166198Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0, line 12 <- wrt source file
2023-01-11T23:16:24.3169308Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::transitive_get:0
2023-01-11T23:16:24.3169908Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0, line 39 <- wrt source file
2023-01-11T23:16:24.3170607Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::_toposort:0
2023-01-11T23:16:24.3171205Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0, line 67 <- wrt source file
2023-01-11T23:16:24.3173589Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::reverse_dict:0
2023-01-11T23:16:24.3174230Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0, line 92 <- wrt source file
2023-01-11T23:16:24.3177353Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/utils.py::freeze:0
2023-01-11T23:16:24.3177948Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0, line 62 <- wrt source file
2023-01-11T23:16:24.3179971Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/variable.py::variables:0
2023-01-11T23:16:24.3180590Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0, line 18 <- wrt source file
2023-01-11T23:16:24.3183906Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/core.py::dispatch:0
2023-01-11T23:16:24.3184567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0, line 100 <- wrt source file
2023-01-11T23:16:24.3185212Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher:0
2023-01-11T23:16:24.3185972Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0, line 124 <- wrt source file
2023-01-11T23:16:24.3186659Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.register:0
2023-01-11T23:16:24.3187345Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0, line 176 <- wrt source file
2023-01-11T23:16:24.3188006Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.add:0
2023-01-11T23:16:24.3188686Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0, line 288 <- wrt source file
2023-01-11T23:16:24.3189363Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::Dispatcher.dispatch:0
2023-01-11T23:16:24.3190035Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0, line 418 <- wrt source file
2023-01-11T23:16:24.3193546Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/dispatcher.py::str_signature:0
2023-01-11T23:16:24.3194196Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0, line 15 <- wrt source file
2023-01-11T23:16:24.3195845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::expand_tuples:0
2023-01-11T23:16:24.3196487Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0, line 38 <- wrt source file
2023-01-11T23:16:24.3199541Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::_toposort:0
2023-01-11T23:16:24.3200187Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0, line 66 <- wrt source file
2023-01-11T23:16:24.3202684Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::reverse_dict:0
2023-01-11T23:16:24.3203320Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0, line 85 <- wrt source file
2023-01-11T23:16:24.3206459Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::groupby:0
2023-01-11T23:16:24.3207085Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0, line 115 <- wrt source file
2023-01-11T23:16:24.3209596Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/utils.py::typename:0
2023-01-11T23:16:24.3210237Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0, line 47 <- wrt source file
2023-01-11T23:16:24.3210886Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::isvariadic:0
2023-01-11T23:16:24.3211533Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0, line 80 <- wrt source file
2023-01-11T23:16:24.3213470Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/experimental/unification/multipledispatch/variadic.py::Variadic:0
2023-01-11T23:16:24.3214663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0, line 76 <- wrt source file
2023-01-11T23:16:24.3215216Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/passes/shape_prop.py::ShapeProp:0
2023-01-11T23:16:24.3217290Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0, line 68 <- wrt source file
2023-01-11T23:16:24.3217838Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/fx/passes/split_module.py::split_module:0
2023-01-11T23:16:24.3218413Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0, line 35 <- wrt source file
2023-01-11T23:16:24.3219010Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_check.py::AttributeTypeIsSupportedChecker:0
2023-01-11T23:16:24.3219783Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save:0, line 53 <- wrt source file
2023-01-11T23:16:24.3220384Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save:0
2023-01-11T23:16:24.3221068Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::load:0, line 111 <- wrt source file
2023-01-11T23:16:24.3221663Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::load:0
2023-01-11T23:16:24.3222558Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save_jit_module_to_flatbuffer:0, line 235 <- wrt source file
2023-01-11T23:16:24.3223261Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/_serialization.py::save_jit_module_to_flatbuffer:0
2023-01-11T23:16:24.3224054Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0, line 23 <- wrt source file
2023-01-11T23:16:24.3224772Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_load_for_lite_interpreter:0
2023-01-11T23:16:24.3225515Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_bytecode_version:0, line 89 <- wrt source file
2023-01-11T23:16:24.3226217Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_bytecode_version:0
2023-01-11T23:16:24.3226846Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0, line 119 <- wrt source file
2023-01-11T23:16:24.3227616Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_mobile_model_contained_types:0
2023-01-11T23:16:24.3228274Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0, line 199 <- wrt source file
2023-01-11T23:16:24.3228830Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/jit/mobile/__init__.py::_get_model_ops_and_info:0
2023-01-11T23:16:24.3229406Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0, line 22 <- wrt source file
2023-01-11T23:16:24.3229962Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/masked/maskedtensor/core.py::is_masked_tensor:0
2023-01-11T23:16:24.3230607Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0, line 452 <- wrt source file
2023-01-11T23:16:24.3275972Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool2d_with_indices:0
2023-01-11T23:16:24.3276670Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0, line 553 <- wrt source file
2023-01-11T23:16:24.4076984Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::fractional_max_pool3d_with_indices:0
2023-01-11T23:16:24.4089803Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0, line 1878 <- wrt source file
2023-01-11T23:16:24.4099145Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::gumbel_softmax:0
2023-01-11T23:16:24.4099765Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0, line 2149 <- wrt source file
2023-01-11T23:16:24.4108542Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding:0
2023-01-11T23:16:24.4109171Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0, line 2286 <- wrt source file
2023-01-11T23:16:24.4119947Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::embedding_bag:0
2023-01-11T23:16:24.4120571Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0, line 2615 <- wrt source file
2023-01-11T23:16:24.4145943Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::ctc_loss:0
2023-01-11T23:16:24.4146544Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0, line 2681 <- wrt source file
2023-01-11T23:16:24.4153454Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::nll_loss:0
2023-01-11T23:16:24.4154056Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0, line 3000 <- wrt source file
2023-01-11T23:16:24.4161087Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::cross_entropy:0
2023-01-11T23:16:24.4161727Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0, line 3066 <- wrt source file
2023-01-11T23:16:24.4168267Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy:0
2023-01-11T23:16:24.4169600Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0, line 3138 <- wrt source file
2023-01-11T23:16:24.4175384Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/functional.py::binary_cross_entropy_with_logits:0
2023-01-11T23:16:24.4176251Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0, line 23 <- wrt source file
2023-01-11T23:16:24.4183763Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_input:0
2023-01-11T23:16:24.4184608Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0, line 53 <- wrt source file
2023-01-11T23:16:24.4189241Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv1d_weight:0
2023-01-11T23:16:24.4190073Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0, line 86 <- wrt source file
2023-01-11T23:16:24.4197590Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_input:0
2023-01-11T23:16:24.4198437Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0, line 116 <- wrt source file
2023-01-11T23:16:24.4203200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv2d_weight:0
2023-01-11T23:16:24.4204033Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0, line 149 <- wrt source file
2023-01-11T23:16:24.4229988Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_input:0
2023-01-11T23:16:24.4231096Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0, line 179 <- wrt source file
2023-01-11T23:16:24.4249884Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/grad.py::conv3d_weight:0
2023-01-11T23:16:24.4250708Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0, line 96 <- wrt source file
2023-01-11T23:16:24.4253172Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::calculate_gain:0
2023-01-11T23:16:24.4253992Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0, line 132 <- wrt source file
2023-01-11T23:16:24.4258100Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::uniform_:0
2023-01-11T23:16:24.4258881Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::normal_:0, line 150 <- wrt source file
2023-01-11T23:16:24.4261892Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::normal_:0
2023-01-11T23:16:24.4262522Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0, line 173 <- wrt source file
2023-01-11T23:16:24.4265971Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::trunc_normal_:0
2023-01-11T23:16:24.4266566Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::constant_:0, line 187 <- wrt source file
2023-01-11T23:16:24.4269815Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::constant_:0
2023-01-11T23:16:24.4270391Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::ones_:0, line 202 <- wrt source file
2023-01-11T23:16:24.4274136Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::ones_:0
2023-01-11T23:16:24.4274724Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0, line 215 <- wrt source file
2023-01-11T23:16:24.4276397Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::zeros_:0
2023-01-11T23:16:24.4277112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::eye_:0, line 230 <- wrt source file
2023-01-11T23:16:24.4279955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::eye_:0
2023-01-11T23:16:24.4280527Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0, line 251 <- wrt source file
2023-01-11T23:16:24.4285336Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::dirac_:0
2023-01-11T23:16:24.4285917Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0, line 320 <- wrt source file
2023-01-11T23:16:24.4289288Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_uniform_:0
2023-01-11T23:16:24.4289892Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0, line 347 <- wrt source file
2023-01-11T23:16:24.4292834Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::xavier_normal_:0
2023-01-11T23:16:24.4293504Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0, line 392 <- wrt source file
2023-01-11T23:16:24.4296720Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_uniform_:0
2023-01-11T23:16:24.4297351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0, line 441 <- wrt source file
2023-01-11T23:16:24.4300836Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::kaiming_normal_:0
2023-01-11T23:16:24.4301452Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0, line 466 <- wrt source file
2023-01-11T23:16:24.4302131Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::orthogonal_:0
2023-01-11T23:16:24.4302736Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0, line 512 <- wrt source file
2023-01-11T23:16:24.4306524Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/init.py::sparse_:0
2023-01-11T23:16:24.4307140Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0, line 40 <- wrt source file
2023-01-11T23:16:24.4311850Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Threshold:0
2023-01-11T23:16:24.4313776Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0, line 83 <- wrt source file
2023-01-11T23:16:24.4318553Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU:0
2023-01-11T23:16:24.4319425Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0, line 142 <- wrt source file
2023-01-11T23:16:24.4324046Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::RReLU:0
2023-01-11T23:16:24.4324961Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0, line 202 <- wrt source file
2023-01-11T23:16:24.4328794Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardtanh:0
2023-01-11T23:16:24.4329688Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0, line 260 <- wrt source file
2023-01-11T23:16:24.4333363Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ReLU6:0
2023-01-11T23:16:24.4334270Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0, line 288 <- wrt source file
2023-01-11T23:16:24.4338074Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Sigmoid:0
2023-01-11T23:16:24.4338977Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0, line 320 <- wrt source file
2023-01-11T23:16:24.4342437Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardsigmoid:0
2023-01-11T23:16:24.4343325Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0, line 352 <- wrt source file
2023-01-11T23:16:24.4346816Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanh:0
2023-01-11T23:16:24.4347681Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0, line 383 <- wrt source file
2023-01-11T23:16:24.4351479Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SiLU:0
2023-01-11T23:16:24.4352463Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0, line 419 <- wrt source file
2023-01-11T23:16:24.4355839Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Mish:0
2023-01-11T23:16:24.4356731Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0, line 461 <- wrt source file
2023-01-11T23:16:24.4360481Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardswish:0
2023-01-11T23:16:24.4361349Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0, line 502 <- wrt source file
2023-01-11T23:16:24.4365011Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::ELU:0
2023-01-11T23:16:24.4365998Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0, line 543 <- wrt source file
2023-01-11T23:16:24.4369643Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::CELU:0
2023-01-11T23:16:24.4370507Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0, line 595 <- wrt source file
2023-01-11T23:16:24.4374096Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::SELU:0
2023-01-11T23:16:24.4375218Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0, line 631 <- wrt source file
2023-01-11T23:16:24.4378880Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GLU:0
2023-01-11T23:16:24.4379755Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0, line 672 <- wrt source file
2023-01-11T23:16:24.4385522Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::GELU:0
2023-01-11T23:16:24.4386416Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0, line 714 <- wrt source file
2023-01-11T23:16:24.4390225Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Hardshrink:0
2023-01-11T23:16:24.4391221Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0, line 761 <- wrt source file
2023-01-11T23:16:24.4395480Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LeakyReLU:0
2023-01-11T23:16:24.4396323Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0, line 796 <- wrt source file
2023-01-11T23:16:24.4399172Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSigmoid:0
2023-01-11T23:16:24.4400172Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0, line 827 <- wrt source file
2023-01-11T23:16:24.4403703Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softplus:0
2023-01-11T23:16:24.4404592Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0, line 869 <- wrt source file
2023-01-11T23:16:24.4408869Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softshrink:0
2023-01-11T23:16:24.4409780Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0, line 938 <- wrt source file
2023-01-11T23:16:24.4410649Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::MultiheadAttention:0
2023-01-11T23:16:24.4411520Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0, line 1274 <- wrt source file
2023-01-11T23:16:24.4414280Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::PReLU:0
2023-01-11T23:16:24.4415468Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0, line 1309 <- wrt source file
2023-01-11T23:16:24.4419469Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softsign:0
2023-01-11T23:16:24.4420351Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0, line 1332 <- wrt source file
2023-01-11T23:16:24.4423758Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Tanhshrink:0
2023-01-11T23:16:24.4424808Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0, line 1366 <- wrt source file
2023-01-11T23:16:24.4428696Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmin:0
2023-01-11T23:16:24.4429595Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0, line 1421 <- wrt source file
2023-01-11T23:16:24.4434862Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax:0
2023-01-11T23:16:24.4435761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0, line 1461 <- wrt source file
2023-01-11T23:16:24.4437287Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::Softmax2d:0
2023-01-11T23:16:24.4438184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0, line 1493 <- wrt source file
2023-01-11T23:16:24.4442367Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/activation.py::LogSoftmax:0
2023-01-11T23:16:24.4443291Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0, line 290 <- wrt source file
2023-01-11T23:16:24.4450994Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm1d:0
2023-01-11T23:16:24.4451870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0, line 399 <- wrt source file
2023-01-11T23:16:24.4672352Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm2d:0
2023-01-11T23:16:24.4673360Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0, line 505 <- wrt source file
2023-01-11T23:16:24.6851095Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::BatchNorm3d:0
2023-01-11T23:16:24.6961369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0, line 627 <- wrt source file
2023-01-11T23:16:24.6963198Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm:0
2023-01-11T23:16:24.6963845Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0, line 782 <- wrt source file
2023-01-11T23:16:24.6964474Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py::SyncBatchNorm.convert_sync_batchnorm:0
2023-01-11T23:16:24.6965076Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0, line 17 <- wrt source file
2023-01-11T23:16:24.6984367Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/channelshuffle.py::ChannelShuffle:0
2023-01-11T23:16:24.6985057Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0, line 63 <- wrt source file
2023-01-11T23:16:24.6985672Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::Sequential:0
2023-01-11T23:16:24.6986557Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0, line 261 <- wrt source file
2023-01-11T23:16:24.6987183Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleList:0
2023-01-11T23:16:24.6987963Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0, line 433 <- wrt source file
2023-01-11T23:16:24.6988567Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ModuleDict:0
2023-01-11T23:16:24.6989374Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0, line 567 <- wrt source file
2023-01-11T23:16:24.6990013Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterList:0
2023-01-11T23:16:24.6993044Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0, line 707 <- wrt source file
2023-01-11T23:16:24.6993729Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/container.py::ParameterDict:0
2023-01-11T23:16:24.6994368Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0, line 36 <- wrt source file
2023-01-11T23:16:24.6999428Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::PairwiseDistance:0
2023-01-11T23:16:24.7000083Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0, line 72 <- wrt source file
2023-01-11T23:16:24.7006884Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/distance.py::CosineSimilarity:0
2023-01-11T23:16:24.7007510Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0, line 49 <- wrt source file
2023-01-11T23:16:24.7012459Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout:0
2023-01-11T23:16:24.7013148Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0, line 91 <- wrt source file
2023-01-11T23:16:24.7017810Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout1d:0
2023-01-11T23:16:24.7018665Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0, line 140 <- wrt source file
2023-01-11T23:16:24.7040891Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout2d:0
2023-01-11T23:16:24.7041549Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0, line 182 <- wrt source file
2023-01-11T23:16:24.7128084Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::Dropout3d:0
2023-01-11T23:16:24.7128761Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0, line 225 <- wrt source file
2023-01-11T23:16:24.7133076Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::AlphaDropout:0
2023-01-11T23:16:24.7133838Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0, line 272 <- wrt source file
2023-01-11T23:16:24.7249745Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/dropout.py::FeatureAlphaDropout:0
2023-01-11T23:16:24.7250423Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0, line 24 <- wrt source file
2023-01-11T23:16:24.7257427Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Flatten:0
2023-01-11T23:16:24.7258041Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0, line 76 <- wrt source file
2023-01-11T23:16:24.7274309Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/flatten.py::Unflatten:0
2023-01-11T23:16:24.7274917Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0, line 111 <- wrt source file
2023-01-11T23:16:24.7279071Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Fold:0
2023-01-11T23:16:24.7279662Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0, line 253 <- wrt source file
2023-01-11T23:16:24.7299441Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/fold.py::Unfold:0
2023-01-11T23:16:24.7300079Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0, line 135 <- wrt source file
2023-01-11T23:16:24.7313209Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm1d:0
2023-01-11T23:16:24.7313921Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0, line 251 <- wrt source file
2023-01-11T23:16:24.7541725Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm2d:0
2023-01-11T23:16:24.7542419Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0, line 367 <- wrt source file
2023-01-11T23:16:24.9689311Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/instancenorm.py::InstanceNorm3d:0
2023-01-11T23:16:24.9785801Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0, line 77 <- wrt source file
2023-01-11T23:16:24.9790913Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/lazy.py::LazyModuleMixin:0
2023-01-11T23:16:24.9791716Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0, line 33 <- wrt source file
2023-01-11T23:16:24.9798888Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Identity:0
2023-01-11T23:16:24.9799759Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0, line 78 <- wrt source file
2023-01-11T23:16:24.9807786Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Linear:0
2023-01-11T23:16:24.9808635Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0, line 164 <- wrt source file
2023-01-11T23:16:24.9831811Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/linear.py::Bilinear:0
2023-01-11T23:16:24.9832591Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0, line 88 <- wrt source file
2023-01-11T23:16:24.9839547Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::L1Loss:0
2023-01-11T23:16:24.9840161Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0, line 184 <- wrt source file
2023-01-11T23:16:24.9865627Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::NLLLoss:0
2023-01-11T23:16:24.9866259Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0, line 271 <- wrt source file
2023-01-11T23:16:24.9872824Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::PoissonNLLLoss:0
2023-01-11T23:16:24.9873538Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0, line 343 <- wrt source file
2023-01-11T23:16:24.9887056Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::GaussianNLLLoss:0
2023-01-11T23:16:24.9887707Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0, line 451 <- wrt source file
2023-01-11T23:16:24.9896475Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::KLDivLoss:0
2023-01-11T23:16:24.9897091Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0, line 523 <- wrt source file
2023-01-11T23:16:24.9902917Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MSELoss:0
2023-01-11T23:16:24.9903751Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0, line 605 <- wrt source file
2023-01-11T23:16:24.9909623Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCELoss:0
2023-01-11T23:16:24.9910249Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0, line 668 <- wrt source file
2023-01-11T23:16:24.9921369Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::BCEWithLogitsLoss:0
2023-01-11T23:16:24.9922014Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0, line 831 <- wrt source file
2023-01-11T23:16:24.9929610Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiLabelMarginLoss:0
2023-01-11T23:16:24.9930187Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0, line 1149 <- wrt source file
2023-01-11T23:16:24.9939955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CrossEntropyLoss:0
2023-01-11T23:16:24.9940619Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0, line 1317 <- wrt source file
2023-01-11T23:16:24.9947105Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MarginRankingLoss:0
2023-01-11T23:16:24.9947910Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0, line 1388 <- wrt source file
2023-01-11T23:16:24.9955242Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::MultiMarginLoss:0
2023-01-11T23:16:24.9955885Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0, line 1468 <- wrt source file
2023-01-11T23:16:24.9966793Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginLoss:0
2023-01-11T23:16:24.9967603Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0, line 1559 <- wrt source file
2023-01-11T23:16:24.9987797Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::TripletMarginWithDistanceLoss:0
2023-01-11T23:16:24.9988448Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0, line 1670 <- wrt source file
2023-01-11T23:16:25.0015246Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/loss.py::CTCLoss:0
2023-01-11T23:16:25.0015879Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0, line 491 <- wrt source file
2023-01-11T23:16:25.0016519Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.register_buffer:0
2023-01-11T23:16:25.0017160Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0, line 847 <- wrt source file
2023-01-11T23:16:25.0030509Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.apply:0
2023-01-11T23:16:25.0031134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0, line 1072 <- wrt source file
2023-01-11T23:16:25.0039341Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.to:0
2023-01-11T23:16:25.0039986Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0, line 1763 <- wrt source file
2023-01-11T23:16:25.0040616Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.state_dict:0
2023-01-11T23:16:25.0041407Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0, line 2050 <- wrt source file
2023-01-11T23:16:25.0042023Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.parameters:0
2023-01-11T23:16:25.0042673Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0, line 2082 <- wrt source file
2023-01-11T23:16:25.0043313Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_parameters:0
2023-01-11T23:16:25.0043953Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0, line 2107 <- wrt source file
2023-01-11T23:16:25.0044553Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.buffers:0
2023-01-11T23:16:25.0045201Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0, line 2133 <- wrt source file
2023-01-11T23:16:25.0045831Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_buffers:0
2023-01-11T23:16:25.0046477Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0, line 2163 <- wrt source file
2023-01-11T23:16:25.0047090Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_children:0
2023-01-11T23:16:25.0047722Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0, line 2187 <- wrt source file
2023-01-11T23:16:25.0050817Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.modules:0
2023-01-11T23:16:25.0051460Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0, line 2221 <- wrt source file
2023-01-11T23:16:25.0056898Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py::Module.named_modules:0
2023-01-11T23:16:25.0057638Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0, line 34 <- wrt source file
2023-01-11T23:16:25.0096339Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LocalResponseNorm:0
2023-01-11T23:16:25.0097252Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0, line 140 <- wrt source file
2023-01-11T23:16:25.0106275Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::LayerNorm:0
2023-01-11T23:16:25.0106940Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0, line 230 <- wrt source file
2023-01-11T23:16:25.0114062Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/normalization.py::GroupNorm:0
2023-01-11T23:16:25.0114755Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0, line 48 <- wrt source file
2023-01-11T23:16:25.0123120Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad1d:0
2023-01-11T23:16:25.0123771Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0, line 101 <- wrt source file
2023-01-11T23:16:25.0129200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad2d:0
2023-01-11T23:16:25.0129850Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0, line 157 <- wrt source file
2023-01-11T23:16:25.0155212Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ConstantPad3d:0
2023-01-11T23:16:25.0155871Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0, line 201 <- wrt source file
2023-01-11T23:16:25.0161500Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad1d:0
2023-01-11T23:16:25.0162406Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0, line 244 <- wrt source file
2023-01-11T23:16:25.0168368Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad2d:0
2023-01-11T23:16:25.0169032Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0, line 301 <- wrt source file
2023-01-11T23:16:25.0172250Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReflectionPad3d:0
2023-01-11T23:16:25.0172925Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0, line 358 <- wrt source file
2023-01-11T23:16:25.0178159Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad1d:0
2023-01-11T23:16:25.0178826Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0, line 401 <- wrt source file
2023-01-11T23:16:25.0184370Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad2d:0
2023-01-11T23:16:25.0185023Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0, line 458 <- wrt source file
2023-01-11T23:16:25.4954538Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ReplicationPad3d:0
2023-01-11T23:16:25.5128459Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0, line 494 <- wrt source file
2023-01-11T23:16:25.5136467Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/padding.py::ZeroPad2d:0
2023-01-11T23:16:25.5137335Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0, line 36 <- wrt source file
2023-01-11T23:16:25.5143322Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelShuffle:0
2023-01-11T23:16:25.5144084Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0, line 86 <- wrt source file
2023-01-11T23:16:25.5149529Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pixelshuffle.py::PixelUnshuffle:0
2023-01-11T23:16:25.5150179Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0, line 76 <- wrt source file
2023-01-11T23:16:25.5156042Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool1d:0
2023-01-11T23:16:25.5156660Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0, line 148 <- wrt source file
2023-01-11T23:16:25.5199855Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool2d:0
2023-01-11T23:16:25.5200540Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0, line 226 <- wrt source file
2023-01-11T23:16:25.7005103Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxPool3d:0
2023-01-11T23:16:25.7042663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0, line 293 <- wrt source file
2023-01-11T23:16:25.7057485Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool1d:0
2023-01-11T23:16:25.7058493Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0, line 366 <- wrt source file
2023-01-11T23:16:25.7077513Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool2d:0
2023-01-11T23:16:25.7078146Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0, line 451 <- wrt source file
2023-01-11T23:16:25.7692868Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::MaxUnpool3d:0
2023-01-11T23:16:25.7693573Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0, line 524 <- wrt source file
2023-01-11T23:16:25.7703183Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool1d:0
2023-01-11T23:16:25.7703870Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0, line 600 <- wrt source file
2023-01-11T23:16:25.7742998Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool2d:0
2023-01-11T23:16:25.7743630Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0, line 686 <- wrt source file
2023-01-11T23:16:25.9248715Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AvgPool3d:0
2023-01-11T23:16:25.9284484Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0, line 749 <- wrt source file
2023-01-11T23:16:25.9329498Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool2d:0
2023-01-11T23:16:25.9330191Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0, line 819 <- wrt source file
2023-01-11T23:16:25.9904942Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::FractionalMaxPool3d:0
2023-01-11T23:16:25.9905972Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0, line 909 <- wrt source file
2023-01-11T23:16:25.9914538Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool1d:0
2023-01-11T23:16:25.9915174Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0, line 960 <- wrt source file
2023-01-11T23:16:25.9959820Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::LPPool2d:0
2023-01-11T23:16:25.9960463Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0, line 1011 <- wrt source file
2023-01-11T23:16:25.9964978Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool1d:0
2023-01-11T23:16:25.9965663Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0, line 1045 <- wrt source file
2023-01-11T23:16:25.9973426Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool2d:0
2023-01-11T23:16:25.9974095Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0, line 1088 <- wrt source file
2023-01-11T23:16:25.9993458Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveMaxPool3d:0
2023-01-11T23:16:25.9994184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0, line 1135 <- wrt source file
2023-01-11T23:16:25.9997691Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool1d:0
2023-01-11T23:16:25.9998503Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0, line 1166 <- wrt source file
2023-01-11T23:16:26.0005937Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool2d:0
2023-01-11T23:16:26.0006597Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0, line 1205 <- wrt source file
2023-01-11T23:16:26.0025896Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/pooling.py::AdaptiveAvgPool3d:0
2023-01-11T23:16:26.0026524Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0, line 436 <- wrt source file
2023-01-11T23:16:26.0038204Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNN:0
2023-01-11T23:16:26.0038798Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0, line 702 <- wrt source file
2023-01-11T23:16:26.0054269Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTM:0
2023-01-11T23:16:26.0055005Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0, line 933 <- wrt source file
2023-01-11T23:16:26.0072240Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRU:0
2023-01-11T23:16:26.0073001Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0, line 1106 <- wrt source file
2023-01-11T23:16:26.0081979Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::RNNCell:0
2023-01-11T23:16:26.0082572Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0, line 1207 <- wrt source file
2023-01-11T23:16:26.0092617Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::LSTMCell:0
2023-01-11T23:16:26.0093240Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0, line 1300 <- wrt source file
2023-01-11T23:16:26.0105762Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/rnn.py::GRUCell:0
2023-01-11T23:16:26.0106397Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0, line 67 <- wrt source file
2023-01-11T23:16:26.0119849Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding:0
2023-01-11T23:16:26.0120503Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0, line 200 <- wrt source file
2023-01-11T23:16:26.0125963Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::Embedding.from_pretrained:0
2023-01-11T23:16:26.0126615Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0, line 278 <- wrt source file
2023-01-11T23:16:26.0141587Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag:0
2023-01-11T23:16:26.0142264Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0, line 429 <- wrt source file
2023-01-11T23:16:26.0148519Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py::EmbeddingBag.from_pretrained:0
2023-01-11T23:16:26.0149183Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0, line 42 <- wrt source file
2023-01-11T23:16:26.5985142Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer:0
2023-01-11T23:16:26.5997447Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0, line 134 <- wrt source file
2023-01-11T23:16:26.5998777Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::Transformer.forward:0
2023-01-11T23:16:26.5999683Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0, line 181 <- wrt source file
2023-01-11T23:16:26.6689092Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoder:0
2023-01-11T23:16:26.6691749Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0, line 325 <- wrt source file
2023-01-11T23:16:26.8154586Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoder:0
2023-01-11T23:16:26.8160305Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0, line 391 <- wrt source file
2023-01-11T23:16:26.8409200Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerEncoderLayer:0
2023-01-11T23:16:26.8450693Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0, line 608 <- wrt source file
2023-01-11T23:16:26.8963546Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py::TransformerDecoderLayer:0
2023-01-11T23:16:26.8964316Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0, line 74 <- wrt source file
2023-01-11T23:16:26.8990184Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::Upsample:0
2023-01-11T23:16:26.8990928Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0, line 196 <- wrt source file
2023-01-11T23:16:26.9004264Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingNearest2d:0
2023-01-11T23:16:26.9005239Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0, line 242 <- wrt source file
2023-01-11T23:16:26.9014226Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/modules/upsampling.py::UpsamplingBilinear2d:0
2023-01-11T23:16:26.9015208Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0, line 116 <- wrt source file
2023-01-11T23:16:26.9015874Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/data_parallel.py::DataParallel:0
2023-01-11T23:16:26.9016560Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0, line 534 <- wrt source file
2023-01-11T23:16:26.9017263Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel:0
2023-01-11T23:16:26.9017994Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0, line 1051 <- wrt source file
2023-01-11T23:16:26.9018770Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.no_sync:0
2023-01-11T23:16:26.9019435Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0, line 1377 <- wrt source file
2023-01-11T23:16:26.9020151Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.join:0
2023-01-11T23:16:26.9020898Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0, line 1550 <- wrt source file
2023-01-11T23:16:26.9021732Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:0
2023-01-11T23:16:26.9022411Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1, line 1560 <- wrt source file
2023-01-11T23:16:26.9023053Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel.register_comm_hook:1
2023-01-11T23:16:26.9023734Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0, line 1594 <- wrt source file
2023-01-11T23:16:26.9024460Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_builtin_comm_hook:0
2023-01-11T23:16:26.9025139Z * DOCTEST :
/opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0, line 1652 <- wrt source file 2023-01-11T23:16:26.9025793Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/parallel/distributed.py::DistributedDataParallel._register_fused_optim:0 2023-01-11T23:16:26.9026509Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0, line 32 <- wrt source file 2023-01-11T23:16:26.9027177Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_per_sample_grad.py::call_for_per_sample_grads:0 2023-01-11T23:16:26.9027805Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0, line 30 <- wrt source file 2023-01-11T23:16:26.9033948Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/init.py::skip_init:0 2023-01-11T23:16:26.9035786Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0, line 54 <- wrt source file 2023-01-11T23:16:26.9036872Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/memory_format.py::convert_conv2d_weight_memory_format:0 2023-01-11T23:16:26.9037481Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0, line 245 <- wrt source file 2023-01-11T23:16:26.9038049Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::orthogonal:0 2023-01-11T23:16:26.9038634Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0, line 462 <- wrt source file 2023-01-11T23:16:26.9039204Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrizations.py::spectral_norm:0 2023-01-11T23:16:26.9039803Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrize.py::register_parametrization:0, line 463 <- wrt source file 2023-01-11T23:16:26.9044427Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/parametrize.py::register_parametrization:0 2023-01-11T23:16:26.9045050Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0, line 846 <- wrt source file 2023-01-11T23:16:26.9045561Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::identity:0 2023-01-11T23:16:26.9046098Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0, line 880 <- wrt source file 2023-01-11T23:16:26.9046640Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_unstructured:0 2023-01-11T23:16:26.9047190Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0, line 921 <- wrt source file 2023-01-11T23:16:26.9048035Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::l1_unstructured:0 2023-01-11T23:16:26.9048583Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0, line 959 <- wrt source file 2023-01-11T23:16:26.9050327Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::random_structured:0 2023-01-11T23:16:26.9050879Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0, line 1005 <- wrt source file 2023-01-11T23:16:26.9062955Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::ln_structured:0 2023-01-11T23:16:26.9063506Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0, line 1058 <- wrt source file 2023-01-11T23:16:26.9079697Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::global_unstructured:0 2023-01-11T23:16:26.9080256Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0, line 1160 <- wrt source file 2023-01-11T23:16:26.9088763Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::custom_from_mask:0 2023-01-11T23:16:26.9089302Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0, line 1188 <- wrt source file 2023-01-11T23:16:26.9094950Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::remove:0 2023-01-11T23:16:26.9095485Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0, line 1215 <- wrt source file 2023-01-11T23:16:26.9103446Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/prune.py::is_pruned:0 2023-01-11T23:16:26.9104017Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0, line 282 <- wrt source file 2023-01-11T23:16:26.9118934Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_packed_sequence:0 2023-01-11T23:16:26.9119473Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0, line 359 <- wrt source file 2023-01-11T23:16:26.9124803Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pad_sequence:0 2023-01-11T23:16:26.9125335Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0, line 412 <- wrt source file 2023-01-11T23:16:26.9139825Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpad_sequence:0 2023-01-11T23:16:26.9140443Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0, line 467 <- wrt source file 2023-01-11T23:16:26.9147236Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::pack_sequence:0 2023-01-11T23:16:26.9147863Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0, line 495 <- wrt source file 2023-01-11T23:16:26.9163848Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/rnn.py::unpack_sequence:0 2023-01-11T23:16:26.9164545Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0, line 267 <- wrt source file 2023-01-11T23:16:26.9170835Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::spectral_norm:0 2023-01-11T23:16:26.9171502Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0, line 294 <- wrt source file 2023-01-11T23:16:26.9177853Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/spectral_norm.py::remove_spectral_norm:0 2023-01-11T23:16:26.9178599Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0, line 123 <- wrt source file 2023-01-11T23:16:26.9181269Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/stateless.py::functional_call:0 2023-01-11T23:16:26.9181907Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0, line 99 <- wrt source file 2023-01-11T23:16:26.9188407Z * SUCCESS: 
/opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::weight_norm:0 2023-01-11T23:16:26.9189052Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0, line 121 <- wrt source file 2023-01-11T23:16:26.9194496Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py::remove_weight_norm:0 2023-01-11T23:16:26.9195163Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0, line 203 <- wrt source file 2023-01-11T23:16:26.9195838Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/conv_utils.py::unfold3d:0 2023-01-11T23:16:26.9196582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0, line 108 <- wrt source file 2023-01-11T23:16:26.9217045Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/nn/utils/_expanded_weights/expanded_weights_utils.py::sum_over_all_but_batch_and_last_n:0 2023-01-11T23:16:26.9217741Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/_type_utils.py::JitScalarType:0, line 66 <- wrt source file 2023-01-11T23:16:26.9219925Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/_type_utils.py::JitScalarType:0 2023-01-11T23:16:26.9220586Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/verification.py::find_mismatch:0, line 1746 <- wrt source file 2023-01-11T23:16:26.9221285Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/verification.py::find_mismatch:0 2023-01-11T23:16:26.9221976Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/diagnostics/infra/engine.py::DiagnosticEngine:0, line 20 <- wrt source file 2023-01-11T23:16:26.9222965Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/onnx/_internal/diagnostics/infra/engine.py::DiagnosticEngine:0 2023-01-11T23:16:26.9223639Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0, line 200 <- wrt source file 2023-01-11T23:16:26.9224238Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LambdaLR:0 2023-01-11T23:16:26.9224873Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0, line 286 <- wrt source file 2023-01-11T23:16:26.9225499Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiplicativeLR:0 2023-01-11T23:16:26.9226114Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0, line 369 <- wrt source file 2023-01-11T23:16:26.9226705Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::StepLR:0 2023-01-11T23:16:26.9227315Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0, line 418 <- wrt source file 2023-01-11T23:16:26.9227917Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::MultiStepLR:0 2023-01-11T23:16:26.9228527Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0, line 467 <- wrt source file 2023-01-11T23:16:26.9229184Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ConstantLR:0 2023-01-11T23:16:26.9229798Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0, line 529 <- wrt source file 2023-01-11T23:16:26.9230406Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::LinearLR:0 2023-01-11T23:16:26.9231109Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0, line 621 <- wrt source file 2023-01-11T23:16:26.9231710Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::SequentialLR:0 2023-01-11T23:16:26.9232332Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0, line 729 <- wrt source file 2023-01-11T23:16:26.9232942Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::PolynomialLR:0 2023-01-11T23:16:26.9233568Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0, line 849 <- wrt source file 2023-01-11T23:16:26.9234187Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ChainedScheduler:0 2023-01-11T23:16:26.9234840Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ReduceLROnPlateau:0, line 953 <- wrt source file 2023-01-11T23:16:26.9235466Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::ReduceLROnPlateau:0 2023-01-11T23:16:26.9236105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0, line 1168 <- wrt source file 2023-01-11T23:16:26.9236696Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CyclicLR:0 2023-01-11T23:16:26.9237369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0, line 1389 <- wrt source file 2023-01-11T23:16:26.9238112Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:0 2023-01-11T23:16:26.9238822Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1, line 1405 <- wrt source file 2023-01-11T23:16:26.9239517Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::CosineAnnealingWarmRestarts.step:1 2023-01-11T23:16:26.9240172Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0, line 1547 <- wrt source file 2023-01-11T23:16:26.9240767Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/lr_scheduler.py::OneCycleLR:0 2023-01-11T23:16:26.9241343Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py::SGD:0, line 58 <- wrt source file 2023-01-11T23:16:26.9241889Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py::SGD:0 2023-01-11T23:16:26.9242487Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0, line 38 <- wrt source file 2023-01-11T23:16:26.9243082Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:0 2023-01-11T23:16:26.9243698Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1, line 64 <- wrt source file 2023-01-11T23:16:26.9244289Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::AveragedModel:1 2023-01-11T23:16:26.9244894Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0, line 161 <- wrt source file 2023-01-11T23:16:26.9245520Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::update_bn:0 2023-01-11T23:16:26.9246111Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0, line 222 <- wrt source file 2023-01-11T23:16:26.9246679Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/optim/swa_utils.py::SWALR:0 2023-01-11T23:16:26.9247283Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0, line 19 <- wrt source file 2023-01-11T23:16:26.9247884Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/package/glob_group.py::GlobGroup:0 2023-01-11T23:16:26.9248495Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0, line 363 <- wrt source file 2023-01-11T23:16:26.9249081Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py::profile:0 2023-01-11T23:16:26.9249705Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0, line 1395 <- wrt source file 2023-01-11T23:16:26.9291845Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/testing/_comparison.py::assert_close:0 2023-01-11T23:16:26.9292436Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0, line 93 <- wrt source file 2023-01-11T23:16:26.9293155Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_creation.py::make_tensor:0 2023-01-11T23:16:26.9293774Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0, line 305 <- wrt source file 2023-01-11T23:16:26.9294342Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::parametrize:0 2023-01-11T23:16:26.9295147Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0, line 3447 <- wrt source file 2023-01-11T23:16:26.9296041Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_symmetric_psd_matrix:0 2023-01-11T23:16:26.9296649Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0, line 3461 <- wrt source file 2023-01-11T23:16:26.9297343Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_psd_matrix:0 2023-01-11T23:16:26.9298091Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0, line 3491 <- wrt source file 2023-01-11T23:16:26.9298706Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py::random_hermitian_pd_matrix:0 2023-01-11T23:16:26.9299424Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0, line 57 <- wrt source file 2023-01-11T23:16:26.9300168Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/testing/_internal/distributed/_tensor/common_dtensor.py::skip_unless_torch_gpu:0 2023-01-11T23:16:26.9300995Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0, line 24 <- wrt source file 2023-01-11T23:16:26.9301749Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/backend_registration.py::rename_privateuse1_backend:0 2023-01-11T23:16:26.9302459Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0, line 306 <- wrt source file 2023-01-11T23:16:26.9303021Z * SKIPPED: 
/opt/conda/lib/python3.10/site-packages/torch/utils/checkpoint.py::checkpoint_sequential:0 2023-01-11T23:16:26.9303717Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0, line 912 <- wrt source file 2023-01-11T23:16:26.9304273Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CppExtension:0 2023-01-11T23:16:26.9304829Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0, line 960 <- wrt source file 2023-01-11T23:16:26.9305366Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:0 2023-01-11T23:16:26.9305918Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1, line 1037 <- wrt source file 2023-01-11T23:16:26.9306442Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::CUDAExtension:1 2023-01-11T23:16:26.9306992Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0, line 1273 <- wrt source file 2023-01-11T23:16:26.9307510Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load:0 2023-01-11T23:16:26.9308055Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0, line 1364 <- wrt source file 2023-01-11T23:16:26.9308576Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py::load_inline:0 2023-01-11T23:16:26.9309105Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0, line 71 <- wrt source file 2023-01-11T23:16:26.9320070Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/dlpack.py::from_dlpack:0 2023-01-11T23:16:26.9320651Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0, line 77 <- wrt source file 2023-01-11T23:16:26.9321247Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/throughput_benchmark.py::ThroughputBenchmark:0 2023-01-11T23:16:26.9321911Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0, line 84 <- wrt source file 2023-01-11T23:16:26.9329113Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::IterableDataset:0 2023-01-11T23:16:26.9329720Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0, line 320 <- wrt source file 2023-01-11T23:16:26.9330239Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/dataset.py::random_split:0 2023-01-11T23:16:26.9330802Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0, line 51 <- wrt source file 2023-01-11T23:16:26.9331377Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/distributed.py::DistributedSampler:0 2023-01-11T23:16:26.9331961Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0, line 172 <- wrt source file 2023-01-11T23:16:26.9337035Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::WeightedRandomSampler:0 2023-01-11T23:16:26.9337603Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0, line 220 <- wrt source file 2023-01-11T23:16:26.9341713Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/sampler.py::BatchSampler:0 2023-01-11T23:16:26.9342283Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0, line 36 <- wrt source file 2023-01-11T23:16:26.9345404Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_convert:0 2023-01-11T23:16:26.9346047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0, line 102 <- wrt source file 2023-01-11T23:16:26.9350409Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::collate:0 2023-01-11T23:16:26.9351055Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0, line 231 <- wrt source file 2023-01-11T23:16:26.9357199Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/_utils/collate.py::default_collate:0 2023-01-11T23:16:26.9357821Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0, line 84 <- wrt source file 2023-01-11T23:16:26.9360919Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::IterDataPipe:0 2023-01-11T23:16:26.9361553Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0, line 232 <- wrt source file 2023-01-11T23:16:26.9362444Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/datapipe.py::MapDataPipe:0 2023-01-11T23:16:26.9363047Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0, line 46 <- wrt source file 2023-01-11T23:16:26.9365274Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::MapperIterDataPipe:0 2023-01-11T23:16:26.9365909Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0, line 187 <- wrt source file 2023-01-11T23:16:26.9367919Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/callable.py::CollatorIterDataPipe:0 2023-01-11T23:16:26.9368602Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0, line 80 <- wrt source file 2023-01-11T23:16:26.9369531Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combinatorics.py::ShufflerIterDataPipe:0 2023-01-11T23:16:26.9370163Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0, line 33 <- wrt source file 2023-01-11T23:16:26.9395239Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ConcaterIterDataPipe:0 2023-01-11T23:16:26.9395906Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0, line 75 <- wrt source file 2023-01-11T23:16:26.9396506Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ForkerIterDataPipe:0 2023-01-11T23:16:26.9397112Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0, line 250 <- wrt source file 2023-01-11T23:16:26.9397695Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::_ChildDataPipe:0 2023-01-11T23:16:26.9398369Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0, line 329 <- wrt 
source file 2023-01-11T23:16:26.9399231Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::DemultiplexerIterDataPipe:0 2023-01-11T23:16:26.9399872Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0, line 507 <- wrt source file 2023-01-11T23:16:26.9400888Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::MultiplexerIterDataPipe:0 2023-01-11T23:16:26.9401592Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0, line 572 <- wrt source file 2023-01-11T23:16:26.9402368Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/combining.py::ZipperIterDataPipe:0 2023-01-11T23:16:26.9402983Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0, line 29 <- wrt source file 2023-01-11T23:16:26.9404149Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/filelister.py::FileListerIterDataPipe:0 2023-01-11T23:16:26.9404967Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0, line 33 <- wrt source file 2023-01-11T23:16:26.9405606Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/fileopener.py::FileOpenerIterDataPipe:0 2023-01-11T23:16:26.9406345Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0, line 102 <- wrt source file 2023-01-11T23:16:26.9407146Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::BatcherIterDataPipe:0 2023-01-11T23:16:26.9407767Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0, line 159 <- wrt source file 2023-01-11T23:16:26.9408369Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::UnBatcherIterDataPipe:0 2023-01-11T23:16:26.9408993Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0, line 226 <- wrt source file 2023-01-11T23:16:26.9412776Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/grouping.py::GrouperIterDataPipe:0 2023-01-11T23:16:26.9413567Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0, line 34 <- wrt source file 2023-01-11T23:16:26.9414201Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/selecting.py::FilterIterDataPipe:0 2023-01-11T23:16:26.9415145Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0, line 20 <- wrt source file 2023-01-11T23:16:26.9415798Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/streamreader.py::StreamReaderIterDataPipe:0 2023-01-11T23:16:26.9416553Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9417188Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/iter/utils.py::IterableWrapperIterDataPipe:0 2023-01-11T23:16:26.9417875Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0, line 30 <- wrt source file 2023-01-11T23:16:26.9418871Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/callable.py::MapperMapDataPipe:0 2023-01-11T23:16:26.9419684Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0, line 31 <- wrt source file 2023-01-11T23:16:26.9420847Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combinatorics.py::ShufflerIterDataPipe:0 2023-01-11T23:16:26.9421582Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0, line 24 <- wrt source file 2023-01-11T23:16:26.9422286Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ConcaterMapDataPipe:0 2023-01-11T23:16:26.9422932Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0, line 66 <- wrt source file 2023-01-11T23:16:26.9423566Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/combining.py::ZipperMapDataPipe:0 2023-01-11T23:16:26.9424181Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9424845Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/grouping.py::BatcherMapDataPipe:0 2023-01-11T23:16:26.9425538Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0, line 23 <- wrt source file 2023-01-11T23:16:26.9426170Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/map/utils.py::SequenceWrapperMapDataPipe:0 2023-01-11T23:16:26.9426782Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0, line 33 <- wrt source file 2023-01-11T23:16:26.9427362Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/data/datapipes/utils/common.py::validate_input_col:0 2023-01-11T23:16:26.9427936Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0, line 415 <- wrt source file 2023-01-11T23:16:26.9430139Z * SUCCESS: /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::find_closure_group:0 2023-01-11T23:16:26.9430987Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0, line 511 <- wrt source file 2023-01-11T23:16:26.9431578Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/hipify/hipify_python.py::replace_extern_shared:0 2023-01-11T23:16:26.9432200Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0, line 213 <- wrt source file 2023-01-11T23:16:26.9433540Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.__init__:0 2023-01-11T23:16:26.9434247Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0, line 320 <- wrt source file 2023-01-11T23:16:26.9434976Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_hparams:0 2023-01-11T23:16:26.9435777Z * DOCTEST : 
/opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0, line 368 <- wrt source file 2023-01-11T23:16:26.9436491Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalar:0 2023-01-11T23:16:26.9437199Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0, line 404 <- wrt source file 2023-01-11T23:16:26.9437940Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_scalars:0 2023-01-11T23:16:26.9438536Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0, line 462 <- wrt source file 2023-01-11T23:16:26.9439352Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram:0 2023-01-11T23:16:26.9440015Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0, line 519 <- wrt source file 2023-01-11T23:16:26.9440731Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_histogram_raw:0 2023-01-11T23:16:26.9441365Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0, line 585 <- wrt source file 2023-01-11T23:16:26.9442007Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_image:0 2023-01-11T23:16:26.9442659Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0, line 638 <- wrt source file 2023-01-11T23:16:26.9443322Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_images:0 2023-01-11T23:16:26.9443913Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0, line 810 <- wrt source file 2023-01-11T23:16:26.9444558Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_text:0 2023-01-11T23:16:26.9445184Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0, line 896 <- wrt source file 2023-01-11T23:16:26.9445811Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_embedding:0 2023-01-11T23:16:26.9446489Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0, line 1001 <- wrt source file 2023-01-11T23:16:26.9447072Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_pr_curve:0 2023-01-11T23:16:26.9447757Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0, line 1076 <- wrt source file 2023-01-11T23:16:26.9448390Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_multilinechart:0 2023-01-11T23:16:26.9449134Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0, line 1095 <- wrt source file 2023-01-11T23:16:26.9449817Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars_marginchart:0 
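(All of the SummaryWriter doctests above report SKIPPED; they are presumably gated in this environment since they assume at least a writable log directory and the separate tensorboard package. A minimal sketch of the add_scalar pattern they cover, where "runs/demo" is an arbitrary log directory chosen for this sketch:

    from torch.utils.tensorboard import SummaryWriter  # requires the tensorboard package

    writer = SummaryWriter("runs/demo")  # arbitrary log directory
    for step in range(100):
        writer.add_scalar("loss/train", 1.0 / (step + 1), step)  # tag, value, global step
    writer.close()

)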
2023-01-11T23:16:26.9450550Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0, line 1117 <- wrt source file
2023-01-11T23:16:26.9451143Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_custom_scalars:0
2023-01-11T23:16:26.9451746Z * DOCTEST : /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0, line 1161 <- wrt source file
2023-01-11T23:16:26.9452421Z * SKIPPED: /opt/conda/lib/python3.10/site-packages/torch/utils/tensorboard/writer.py::SummaryWriter.add_mesh:0
2023-01-11T23:16:26.9452763Z ============
2023-01-11T23:16:26.9452954Z Finished doctests
2023-01-11T23:16:26.9453165Z 287 / 663 passed
2023-01-11T23:16:26.9453421Z
2023-01-11T23:16:26.9453723Z === Found 3 run-time warnings ===
2023-01-11T23:16:26.9454093Z --- Runtime Warning: 1 / 3 ---
2023-01-11T23:16:26.9454452Z example =
2023-01-11T23:16:26.9455410Z /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:1114: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /var/lib/jenkins/workspace/c10/core/TensorImpl.h:1816.)
2023-01-11T23:16:26.9456007Z return super(Tensor, self).refine_names(names)
2023-01-11T23:16:26.9456217Z
2023-01-11T23:16:26.9456508Z --- Runtime Warning: 2 / 3 ---
2023-01-11T23:16:26.9456857Z example =
2023-01-11T23:16:26.9457527Z /opt/conda/lib/python3.10/site-packages/torch/nested/__init__.py:58: UserWarning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. (Triggered internally at /var/lib/jenkins/workspace/aten/src/ATen/NestedTensorImpl.cpp:179.)
2023-01-11T23:16:26.9458047Z return torch._nested_tensor_from_tensor_list(tensor_list, dtype, None, device, None)
2023-01-11T23:16:26.9458300Z
2023-01-11T23:16:26.9458547Z --- Runtime Warning: 3 / 3 ---
2023-01-11T23:16:26.9458842Z example =
2023-01-11T23:16:26.9459687Z /opt/conda/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:921: UserWarning: Your compiler for AOTAutograd is returning a function that doesn't take boxed arguments. Please wrap it with functorch.compile.make_boxed_func or handle the boxed arguments yourself. See https://github.com/pytorch/pytorch/pull/83137#issuecomment-1211320670 for rationale.
2023-01-11T23:16:26.9460220Z warnings.warn(
2023-01-11T23:16:26.9460404Z
2023-01-11T23:16:26.9460687Z === 287 passed, 376 skipped, 3 warnings in 8.58 seconds ===
2023-01-11T23:16:27.3529291Z
2023-01-11T23:16:27.3529846Z real 113m44.950s
2023-01-11T23:16:27.3530179Z user 222m9.096s
2023-01-11T23:16:27.3530459Z sys 10m41.357s
2023-01-11T23:16:27.3530715Z + assert_git_not_dirty
2023-01-11T23:16:27.3531185Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]]
2023-01-11T23:16:27.3531886Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]]
2023-01-11T23:16:27.3532187Z ++ git status --porcelain
2023-01-11T23:16:28.5459094Z + git_status=
2023-01-11T23:16:28.5459701Z + [[ -n '' ]]
2023-01-11T23:16:28.5460071Z + test_libtorch
2023-01-11T23:16:28.5460700Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]]
2023-01-11T23:16:28.5461262Z + echo 'Testing libtorch'
2023-01-11T23:16:28.5461659Z Testing libtorch
2023-01-11T23:16:28.5462499Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libbackend_with_compiler.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5471376Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libjitbackend_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5480779Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10d_cuda_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5487348Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libshm.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5495126Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5502134Z + ln -sf '/opt/conda/lib/python3.10/site-packages/torch/lib/libtbb*' /opt/conda/lib/python3.10/site-packages/torch/bin
2023-01-11T23:16:28.5511148Z + TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_libtorch
2023-01-11T23:16:28.5513071Z + mkdir -p test/test-reports/cpp-unittest/test_libtorch
2023-01-11T23:16:28.5513427Z + python tools/download_mnist.py --quiet -d test/cpp/api/mnist
2023-01-11T23:16:28.5521614Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-tsan* ]]
2023-01-11T23:16:28.5521921Z + python test/cpp/jit/tests_setup.py setup
2023-01-11T23:16:28.5903864Z Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
2023-01-11T23:16:28.9486862Z Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
2023-01-11T23:16:28.9673841Z Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
2023-01-11T23:16:29.0232727Z Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
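(Two of the three runtime warnings in the doctest summary above come from prototype APIs that the examples touch: named tensors and nested tensors. A minimal sketch that exercises the nested-tensor path on a build from this era, and so would emit the "prototype stage" UserWarning quoted above:

    import torch

    # building a nested tensor from variably-sized constituents triggers the
    # prototype-stage UserWarning shown in the doctest summary
    nt = torch.nested.nested_tensor([torch.randn(2, 5), torch.randn(3, 5)])
    print(nt.is_nested)  # True

)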
2023-01-11T23:16:30.0466737Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]] 2023-01-11T23:16:30.0467285Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_jit --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_jit.xml 2023-01-11T23:16:30.4573488Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:16:30.4581096Z Note: Google Test filter = *-*_MultiCUDA 2023-01-11T23:16:30.4581466Z [==========] Running 1340 tests from 122 test suites. 2023-01-11T23:16:30.4581841Z [----------] Global test environment set-up. 2023-01-11T23:16:30.4582168Z [----------] 2 tests from AddIfThenElseOpTest 2023-01-11T23:16:30.4582524Z [ RUN ] AddIfThenElseOpTest.AddIfThenElseOpSimple 2023-01-11T23:16:30.4648517Z [ OK ] AddIfThenElseOpTest.AddIfThenElseOpSimple (6 ms) 2023-01-11T23:16:30.4649106Z [ RUN ] AddIfThenElseOpTest.NoIfThenElseOpMultipleOutputs 2023-01-11T23:16:30.4649662Z [ OK ] AddIfThenElseOpTest.NoIfThenElseOpMultipleOutputs (0 ms) 2023-01-11T23:16:30.4650240Z [----------] 2 tests from AddIfThenElseOpTest (6 ms total) 2023-01-11T23:16:30.4650492Z 2023-01-11T23:16:30.4650689Z [----------] 15 tests from TopologicalMoveTest 2023-01-11T23:16:30.4660614Z [ RUN ] TopologicalMoveTest.SplitsDeps 2023-01-11T23:16:30.4661140Z [ OK ] TopologicalMoveTest.SplitsDeps (0 ms) 2023-01-11T23:16:30.4661610Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardSimple 2023-01-11T23:16:30.4662184Z [ OK ] TopologicalMoveTest.MoveAfterBackwardSimple (0 ms) 2023-01-11T23:16:30.4662751Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardInvalid 2023-01-11T23:16:30.4663317Z [ OK ] TopologicalMoveTest.MoveAfterBackwardInvalid (0 ms) 2023-01-11T23:16:30.4663751Z [ RUN ] TopologicalMoveTest.MoveAfterNoOp 2023-01-11T23:16:30.4664103Z [ OK ] TopologicalMoveTest.MoveAfterNoOp (0 ms) 2023-01-11T23:16:30.4664507Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardMultipleDeps 2023-01-11T23:16:30.4664981Z [ OK ] TopologicalMoveTest.MoveAfterBackwardMultipleDeps (0 ms) 2023-01-11T23:16:30.4665511Z [ RUN ] TopologicalMoveTest.MoveAfterBackwardNonZeroWorkingSet 2023-01-11T23:16:30.4665997Z [ OK ] TopologicalMoveTest.MoveAfterBackwardNonZeroWorkingSet (0 ms) 2023-01-11T23:16:30.4666562Z [ RUN ] TopologicalMoveTest.MoveAfterForwardSimple 2023-01-11T23:16:30.4666987Z [ OK ] TopologicalMoveTest.MoveAfterForwardSimple (0 ms) 2023-01-11T23:16:30.4667418Z [ RUN ] TopologicalMoveTest.MoveAfterForwardNonZeroWorkingSet 2023-01-11T23:16:30.4667893Z [ OK ] TopologicalMoveTest.MoveAfterForwardNonZeroWorkingSet (0 ms) 2023-01-11T23:16:30.4668324Z [ RUN ] TopologicalMoveTest.MoveBeforeForwardSimple 2023-01-11T23:16:30.4668735Z [ OK ] TopologicalMoveTest.MoveBeforeForwardSimple (0 ms) 2023-01-11T23:16:30.4669239Z [ RUN ] TopologicalMoveTest.MoveBeforeBackwardSimple 2023-01-11T23:16:30.4669656Z [ OK ] TopologicalMoveTest.MoveBeforeBackwardSimple (0 ms) 2023-01-11T23:16:30.4670040Z [ RUN ] TopologicalMoveTest.MoveBeforeNoOp 2023-01-11T23:16:30.4670399Z [ OK ] TopologicalMoveTest.MoveBeforeNoOp (0 ms) 2023-01-11T23:16:30.4670896Z [ RUN ] TopologicalMoveTest.MoveBeforeForwardWithDeps 2023-01-11T23:16:30.4671321Z [ OK ] TopologicalMoveTest.MoveBeforeForwardWithDeps (0 ms) 2023-01-11T23:16:30.4671743Z [ RUN ] TopologicalMoveTest.MoveBeforeBackwardWithDeps 2023-01-11T23:16:30.4672164Z [ OK ] TopologicalMoveTest.MoveBeforeBackwardWithDeps (0 ms) 2023-01-11T23:16:30.4672557Z [ RUN ] TopologicalMoveTest.DepsDisallowMove 2023-01-11T23:16:30.4672927Z [ OK ] TopologicalMoveTest.DepsDisallowMove (0 ms) 2023-01-11T23:16:30.4673311Z [ RUN ] 
TopologicalMoveTest.MoveAfterBeforeWithDeps 2023-01-11T23:16:30.4673723Z [ OK ] TopologicalMoveTest.MoveAfterBeforeWithDeps (0 ms) 2023-01-11T23:16:30.4674119Z [----------] 15 tests from TopologicalMoveTest (1 ms total) 2023-01-11T23:16:30.4674292Z 2023-01-11T23:16:30.4674462Z [----------] 6 tests from AliasAnalysisTest 2023-01-11T23:16:30.4674820Z [ RUN ] AliasAnalysisTest.AliasingMutationBlocksMoves 2023-01-11T23:16:30.4705083Z [ OK ] AliasAnalysisTest.AliasingMutationBlocksMoves (3 ms) 2023-01-11T23:16:30.4705548Z [ RUN ] AliasAnalysisTest.AliasingMutationBlocksMoves2 2023-01-11T23:16:30.4706118Z [ OK ] AliasAnalysisTest.AliasingMutationBlocksMoves2 (0 ms) 2023-01-11T23:16:30.4706557Z [ RUN ] AliasAnalysisTest.SideEffectsBlockMoves 2023-01-11T23:16:30.4707061Z [ OK ] AliasAnalysisTest.SideEffectsBlockMoves (0 ms) 2023-01-11T23:16:30.4707533Z [ RUN ] AliasAnalysisTest.MovingAcrossInnerBlocks 2023-01-11T23:16:30.4707936Z [ OK ] AliasAnalysisTest.MovingAcrossInnerBlocks (0 ms) 2023-01-11T23:16:30.4708311Z [ RUN ] AliasAnalysisTest.NoneHasNoWriters 2023-01-11T23:16:30.4708694Z [ OK ] AliasAnalysisTest.NoneHasNoWriters (0 ms) 2023-01-11T23:16:30.4709356Z [ RUN ] AliasAnalysisTest.SafeToChangeAliasingRelationship 2023-01-11T23:16:30.4710018Z [ OK ] AliasAnalysisTest.SafeToChangeAliasingRelationship (0 ms) 2023-01-11T23:16:30.4710566Z [----------] 6 tests from AliasAnalysisTest (4 ms total) 2023-01-11T23:16:30.4710756Z 2023-01-11T23:16:30.4710947Z [----------] 4 tests from WriteTrackingTest 2023-01-11T23:16:30.4711273Z [ RUN ] WriteTrackingTest.Basic 2023-01-11T23:16:30.4711653Z [ OK ] WriteTrackingTest.Basic (0 ms) 2023-01-11T23:16:30.4712075Z [ RUN ] WriteTrackingTest.IsMutable 2023-01-11T23:16:30.4712483Z [ OK ] WriteTrackingTest.IsMutable (0 ms) 2023-01-11T23:16:30.4712885Z [ RUN ] WriteTrackingTest.IsImmutable 2023-01-11T23:16:30.4713231Z [ OK ] WriteTrackingTest.IsImmutable (0 ms) 2023-01-11T23:16:30.4713582Z [ RUN ] WriteTrackingTest.HasWriters 2023-01-11T23:16:30.4714027Z [ OK ] WriteTrackingTest.HasWriters (0 ms) 2023-01-11T23:16:30.4714497Z [----------] 4 tests from WriteTrackingTest (0 ms total) 2023-01-11T23:16:30.4714728Z 2023-01-11T23:16:30.4714990Z [----------] 13 tests from ContainerAliasingTest 2023-01-11T23:16:30.4715476Z [ RUN ] ContainerAliasingTest.MayContainAlias 2023-01-11T23:16:30.4716004Z [ OK ] ContainerAliasingTest.MayContainAlias (0 ms) 2023-01-11T23:16:30.4716489Z [ RUN ] ContainerAliasingTest.MayContainAlias_cast 2023-01-11T23:16:30.4716937Z [ OK ] ContainerAliasingTest.MayContainAlias_cast (0 ms) 2023-01-11T23:16:30.4717499Z [ RUN ] ContainerAliasingTest.PrimitveValuesDontAliasContainers 2023-01-11T23:16:30.4718174Z [ OK ] ContainerAliasingTest.PrimitveValuesDontAliasContainers (0 ms) 2023-01-11T23:16:30.4718594Z [ RUN ] ContainerAliasingTest.UnionAliasing 2023-01-11T23:16:30.4718954Z [ OK ] ContainerAliasingTest.UnionAliasing (0 ms) 2023-01-11T23:16:30.4719341Z [ RUN ] ContainerAliasingTest.InputsCanAliasOutputs 2023-01-11T23:16:30.4719831Z [ OK ] ContainerAliasingTest.InputsCanAliasOutputs (0 ms) 2023-01-11T23:16:30.4720244Z [ RUN ] ContainerAliasingTest.NestedTupleConstruct 2023-01-11T23:16:30.4720672Z [ OK ] ContainerAliasingTest.NestedTupleConstruct (0 ms) 2023-01-11T23:16:30.4721165Z [ RUN ] ContainerAliasingTest.NestedTypes 2023-01-11T23:16:30.4721539Z [ OK ] ContainerAliasingTest.NestedTypes (0 ms) 2023-01-11T23:16:30.4721876Z [ RUN ] ContainerAliasingTest.Simple 2023-01-11T23:16:30.4722280Z [ OK ] ContainerAliasingTest.Simple (0 ms) 
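(The TopologicalMoveTest, AliasAnalysisTest, WriteTrackingTest, and ContainerAliasingTest suites are C++ unit tests over TorchScript IR: they check when nodes may be reordered given aliasing and in-place mutation. For orientation only, the IR they reason about can be inspected from Python; a minimal sketch:

    import torch

    @torch.jit.script
    def f(x: torch.Tensor) -> torch.Tensor:
        y = x + 1
        y.add_(2)   # in-place op: a write the alias analysis must account for
        return y

    print(f.graph)  # the TorchScript IR graph these alias-analysis tests operate on

)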
2023-01-11T23:16:30.4722648Z [ RUN ] ContainerAliasingTest.Lists
2023-01-11T23:16:30.4723047Z [ OK ] ContainerAliasingTest.Lists (0 ms)
2023-01-11T23:16:30.4723483Z [ RUN ] ContainerAliasingTest.Lists2
2023-01-11T23:16:30.4723875Z [ OK ] ContainerAliasingTest.Lists2 (0 ms)
2023-01-11T23:16:30.4724328Z [ RUN ] ContainerAliasingTest.Conservative
2023-01-11T23:16:30.4724793Z [ OK ] ContainerAliasingTest.Conservative (0 ms)
2023-01-11T23:16:30.4725194Z [ RUN ] ContainerAliasingTest.MovesAcrossContainedWrites
2023-01-11T23:16:30.4725761Z [ OK ] ContainerAliasingTest.MovesAcrossContainedWrites (0 ms)
2023-01-11T23:16:30.4726364Z [ RUN ] ContainerAliasingTest.MovesAcrossContainedWritesNested
2023-01-11T23:16:30.4726943Z [ OK ] ContainerAliasingTest.MovesAcrossContainedWritesNested (0 ms)
2023-01-11T23:16:30.4727444Z [----------] 13 tests from ContainerAliasingTest (1 ms total)
2023-01-11T23:16:30.4727639Z
2023-01-11T23:16:30.4727853Z [----------] 3 tests from WildcardsTest
2023-01-11T23:16:30.4728226Z [ RUN ] WildcardsTest.Basic
2023-01-11T23:16:30.4728660Z [ OK ] WildcardsTest.Basic (0 ms)
2023-01-11T23:16:30.4728989Z [ RUN ] WildcardsTest.TypeIsolation
2023-01-11T23:16:30.4729430Z [ OK ] WildcardsTest.TypeIsolation (0 ms)
2023-01-11T23:16:30.4729881Z [ RUN ] WildcardsTest.InvariantContainerAliasing
2023-01-11T23:16:30.4730420Z [ OK ] WildcardsTest.InvariantContainerAliasing (0 ms)
2023-01-11T23:16:30.4730841Z [----------] 3 tests from WildcardsTest (0 ms total)
2023-01-11T23:16:30.4730998Z
2023-01-11T23:16:30.4731178Z [----------] 18 tests from AliasRegistrationTest
2023-01-11T23:16:30.4731572Z [ RUN ] AliasRegistrationTest.ConservativeWithInferredSchema
2023-01-11T23:16:30.4732109Z [ OK ] AliasRegistrationTest.ConservativeWithInferredSchema (0 ms)
2023-01-11T23:16:30.4732583Z [ RUN ] AliasRegistrationTest.ConservativeWithSpecifiedSchema
2023-01-11T23:16:30.4733062Z [ OK ] AliasRegistrationTest.ConservativeWithSpecifiedSchema (0 ms)
2023-01-11T23:16:30.4733587Z [ RUN ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError
2023-01-11T23:16:30.4768990Z [ OK ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError (3 ms)
2023-01-11T23:16:30.4769569Z [ RUN ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError2
2023-01-11T23:16:30.4802993Z [ OK ] AliasRegistrationTest.ConservativeWithAliasingAnnotationsShouldError2 (3 ms)
2023-01-11T23:16:30.4803548Z [ RUN ] AliasRegistrationTest.FromSchemaWithInferredSchemaShouldError
2023-01-11T23:16:30.4816892Z [ OK ] AliasRegistrationTest.FromSchemaWithInferredSchemaShouldError (1 ms)
2023-01-11T23:16:30.4817598Z [ RUN ] AliasRegistrationTest.FromSchemaInferredPure
2023-01-11T23:16:30.4818028Z [ OK ] AliasRegistrationTest.FromSchemaInferredPure (0 ms)
2023-01-11T23:16:30.4818434Z [ RUN ] AliasRegistrationTest.FromSchemaAliased
2023-01-11T23:16:30.4819164Z [ OK ] AliasRegistrationTest.FromSchemaAliased (0 ms)
2023-01-11T23:16:30.4819984Z [ RUN ] AliasRegistrationTest.FromSchemaPure
2023-01-11T23:16:30.4820503Z [ OK ] AliasRegistrationTest.FromSchemaPure (0 ms)
2023-01-11T23:16:30.4820967Z [ RUN ] AliasRegistrationTest.PureNoSchema
2023-01-11T23:16:30.4821819Z [ OK ] AliasRegistrationTest.PureNoSchema (0 ms)
2023-01-11T23:16:30.4822393Z [ RUN ] AliasRegistrationTest.PureWithSchema
2023-01-11T23:16:30.4822883Z [ OK ] AliasRegistrationTest.PureWithSchema (0 ms)
2023-01-11T23:16:30.4823300Z [ RUN ] AliasRegistrationTest.PureWithAnnotationsShouldError
2023-01-11T23:16:30.4860672Z [ OK ] AliasRegistrationTest.PureWithAnnotationsShouldError (3 ms)
2023-01-11T23:16:30.4861320Z [ RUN ] AliasRegistrationTest.AliasMoveAtenListOp
2023-01-11T23:16:30.4861880Z [ OK ] AliasRegistrationTest.AliasMoveAtenListOp (0 ms)
2023-01-11T23:16:30.4862507Z [ RUN ] AliasRegistrationTest.AliasMoveForTupleConstructWithSingleUseAsGraphOutput
2023-01-11T23:16:30.4863133Z [ OK ] AliasRegistrationTest.AliasMoveForTupleConstructWithSingleUseAsGraphOutput (0 ms)
2023-01-11T23:16:30.4863686Z [ RUN ] AliasRegistrationTest.RecursiveSubgraphTupleContainment
2023-01-11T23:16:30.4864198Z [ OK ] AliasRegistrationTest.RecursiveSubgraphTupleContainment (0 ms)
2023-01-11T23:16:30.4864693Z [ RUN ] AliasRegistrationTest.WildcardAliasForTupleConstructWithUses
2023-01-11T23:16:30.4865250Z [ OK ] AliasRegistrationTest.WildcardAliasForTupleConstructWithUses (0 ms)
2023-01-11T23:16:30.4865722Z [ RUN ] AliasRegistrationTest.ATenSplitIntListAliasCheck
2023-01-11T23:16:30.4866167Z [ OK ] AliasRegistrationTest.ATenSplitIntListAliasCheck (0 ms)
2023-01-11T23:16:30.4866699Z [ RUN ] AliasRegistrationTest.ATenSplitIntAliasCheck
2023-01-11T23:16:30.4867117Z [ OK ] AliasRegistrationTest.ATenSplitIntAliasCheck (0 ms)
2023-01-11T23:16:30.4867556Z [ RUN ] AliasRegistrationTest.PureWithAnnotationsShouldError2
2023-01-11T23:16:30.4897984Z [ OK ] AliasRegistrationTest.PureWithAnnotationsShouldError2 (3 ms)
2023-01-11T23:16:30.4898598Z [----------] 18 tests from AliasRegistrationTest (16 ms total)
2023-01-11T23:16:30.4898956Z
2023-01-11T23:16:30.4899202Z [----------] 2 tests from IRNonDeterminismTest
2023-01-11T23:16:30.4899539Z [ RUN ] IRNonDeterminismTest.Basic
2023-01-11T23:16:30.4899858Z [ OK ] IRNonDeterminismTest.Basic (0 ms)
2023-01-11T23:16:30.4900276Z [ RUN ] IRNonDeterminismTest.DropoutSpecialCase
2023-01-11T23:16:30.4900668Z [ OK ] IRNonDeterminismTest.DropoutSpecialCase (0 ms)
2023-01-11T23:16:30.4901138Z [----------] 2 tests from IRNonDeterminismTest (0 ms total)
2023-01-11T23:16:30.4901368Z
2023-01-11T23:16:30.4901654Z [----------] 1 test from NonDeterminismBackwardsCompatibility
2023-01-11T23:16:30.4902109Z [ RUN ] NonDeterminismBackwardsCompatibility.BackwardsCompatibility
2023-01-11T23:16:30.4902628Z [ OK ] NonDeterminismBackwardsCompatibility.BackwardsCompatibility (0 ms)
2023-01-11T23:16:30.4903107Z [----------] 1 test from NonDeterminismBackwardsCompatibility (0 ms total)
2023-01-11T23:16:30.4903308Z
2023-01-11T23:16:30.4903474Z [----------] 2 tests from ArgumentSpecTest
2023-01-11T23:16:30.4903832Z [ RUN ] ArgumentSpecTest.CompleteArgumentSpec_CUDA
2023-01-11T23:16:31.3385778Z [ OK ] ArgumentSpecTest.CompleteArgumentSpec_CUDA (847 ms)
2023-01-11T23:16:31.3386454Z [ RUN ] ArgumentSpecTest.Basic_CUDA
2023-01-11T23:16:31.3386796Z [ OK ] ArgumentSpecTest.Basic_CUDA (0 ms)
2023-01-11T23:16:31.3387251Z [----------] 2 tests from ArgumentSpecTest (848 ms total)
2023-01-11T23:16:31.3387431Z
2023-01-11T23:16:31.3387588Z [----------] 3 tests from AutodiffTest
2023-01-11T23:16:31.3387906Z [ RUN ] AutodiffTest.ADFormulas
2023-01-11T23:16:31.3862934Z [ OK ] AutodiffTest.ADFormulas (47 ms)
2023-01-11T23:16:31.3863741Z [ RUN ] AutodiffTest.Differentiate
2023-01-11T23:16:31.3865563Z [ OK ] AutodiffTest.Differentiate (0 ms)
2023-01-11T23:16:31.3866466Z [ RUN ] AutodiffTest.DifferentiateWithRequiresGrad
2023-01-11T23:16:31.3885800Z [ OK ] AutodiffTest.DifferentiateWithRequiresGrad (2 ms)
2023-01-11T23:16:31.3886339Z [----------] 3 tests from AutodiffTest (50 ms total)
2023-01-11T23:16:31.3886511Z
2023-01-11T23:16:31.3886724Z [----------] 1 test from AutodiffRemoveUnusedGradientsTest
2023-01-11T23:16:31.3887107Z [ RUN ] AutodiffRemoveUnusedGradientsTest.Linear
2023-01-11T23:16:31.3898284Z [ OK ] AutodiffRemoveUnusedGradientsTest.Linear (1 ms)
2023-01-11T23:16:31.3898740Z [----------] 1 test from AutodiffRemoveUnusedGradientsTest (1 ms total)
2023-01-11T23:16:31.3898945Z
2023-01-11T23:16:31.3899109Z [----------] 1 test from UpgraderLoad
2023-01-11T23:16:31.3899446Z [ RUN ] UpgraderLoad.CanPopulateUpgradersGraph
2023-01-11T23:16:31.3946977Z [ OK ] UpgraderLoad.CanPopulateUpgradersGraph (4 ms)
2023-01-11T23:16:31.3947941Z [----------] 1 test from UpgraderLoad (4 ms total)
2023-01-11T23:16:31.3948269Z
2023-01-11T23:16:31.3948649Z [----------] 4 tests from OpReplacementTest
2023-01-11T23:16:31.3949490Z [ RUN ] OpReplacementTest.ReplaceDivInSimpleFunction
2023-01-11T23:16:31.3950336Z [ OK ] OpReplacementTest.ReplaceDivInSimpleFunction (0 ms)
2023-01-11T23:16:31.3951425Z [ RUN ] OpReplacementTest.ReplaceTwoOpsInSimpleFunction
2023-01-11T23:16:31.3952643Z [ OK ] OpReplacementTest.ReplaceTwoOpsInSimpleFunction (0 ms)
2023-01-11T23:16:31.3953729Z [ RUN ] OpReplacementTest.ReplaceDivInNestedFunction
2023-01-11T23:16:31.3955004Z [ OK ] OpReplacementTest.ReplaceDivInNestedFunction (0 ms)
2023-01-11T23:16:31.3955485Z [ RUN ] OpReplacementTest.ReplaceTestSubcmulInSimpleFunction
2023-01-11T23:16:31.3955945Z [ OK ] OpReplacementTest.ReplaceTestSubcmulInSimpleFunction (0 ms)
2023-01-11T23:16:31.3956360Z [----------] 4 tests from OpReplacementTest (0 ms total)
2023-01-11T23:16:31.3956531Z
2023-01-11T23:16:31.3956694Z [----------] 4 tests from UpgraderUtils
2023-01-11T23:16:31.3957029Z [ RUN ] UpgraderUtils.FindCorrectUpgrader
2023-01-11T23:16:31.3957383Z [ OK ] UpgraderUtils.FindCorrectUpgrader (0 ms)
2023-01-11T23:16:31.3957735Z [ RUN ] UpgraderUtils.IsVersionMapSorted
2023-01-11T23:16:31.3958094Z [ OK ] UpgraderUtils.IsVersionMapSorted (0 ms)
2023-01-11T23:16:31.3958430Z [ RUN ] UpgraderUtils.FindIfOpIsCurrent
2023-01-11T23:16:31.3958778Z [ OK ] UpgraderUtils.FindIfOpIsCurrent (0 ms)
2023-01-11T23:16:31.3959120Z [ RUN ] UpgraderUtils.CanLoadHistoricOp
2023-01-11T23:16:31.3959468Z [ OK ] UpgraderUtils.CanLoadHistoricOp (0 ms)
2023-01-11T23:16:31.3959809Z [----------] 4 tests from UpgraderUtils (0 ms total)
2023-01-11T23:16:31.3959972Z
2023-01-11T23:16:31.3960123Z [----------] 9 tests from BackendTest
2023-01-11T23:16:31.3960408Z [ RUN ] BackendTest.ToBackend
2023-01-11T23:16:31.4008991Z [ OK ] BackendTest.ToBackend (5 ms)
2023-01-11T23:16:31.4009487Z [ RUN ] BackendTest.ToBackendNotAvailable
2023-01-11T23:16:31.4035727Z [W backend_detail.cpp:393] Warning: Backend [test_backend_unavailable] is not available. Execution of this Module is still possible by saving and loading on a device where the backend is available. (function codegen_backend_module)
2023-01-11T23:16:31.4052385Z [ OK ] BackendTest.ToBackendNotAvailable (4 ms)
2023-01-11T23:16:31.4052750Z [ RUN ] BackendTest.TestCompiler
2023-01-11T23:16:31.4117008Z [ OK ] BackendTest.TestCompiler (6 ms)
2023-01-11T23:16:31.4117399Z [ RUN ] BackendTest.TestCompilerWithStringTable
2023-01-11T23:16:31.4175021Z [ OK ] BackendTest.TestCompilerWithStringTable (5 ms)
2023-01-11T23:16:31.4175462Z [ RUN ] BackendTest.TestComposite
2023-01-11T23:16:31.4290612Z [ OK ] BackendTest.TestComposite (11 ms)
2023-01-11T23:16:31.4290986Z [ RUN ] BackendTest.TestPrimDtype
2023-01-11T23:16:31.4296661Z [ OK ] BackendTest.TestPrimDtype (0 ms)
2023-01-11T23:16:31.4297053Z [ RUN ] BackendTest.TestCompositeWithSetStates
2023-01-11T23:16:31.4411695Z [ OK ] BackendTest.TestCompositeWithSetStates (11 ms)
2023-01-11T23:16:31.4412146Z [ RUN ] BackendTest.TestConsistencyOfCompositeWithSetStates
2023-01-11T23:16:31.4615128Z [ OK ] BackendTest.TestConsistencyOfCompositeWithSetStates (20 ms)
2023-01-11T23:16:31.4615977Z [ RUN ] BackendTest.TestCompilerNotSupport
2023-01-11T23:16:31.4635680Z [ OK ] BackendTest.TestCompilerNotSupport (1 ms)
2023-01-11T23:16:31.4636078Z [----------] 9 tests from BackendTest (67 ms total)
2023-01-11T23:16:31.4636247Z
2023-01-11T23:16:31.4636446Z [----------] 6 tests from BackendTestDebugInfo
2023-01-11T23:16:31.4636887Z [ RUN ] BackendTestDebugInfo.TestCompiler
2023-01-11T23:16:31.4766215Z [ OK ] BackendTestDebugInfo.TestCompiler (13 ms)
2023-01-11T23:16:31.4766661Z [ RUN ] BackendTestDebugInfo.TestCompilerWithStringTable
2023-01-11T23:16:31.4899983Z [ OK ] BackendTestDebugInfo.TestCompilerWithStringTable (13 ms)
2023-01-11T23:16:31.4900780Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithModuleHierarchy
2023-01-11T23:16:31.5035699Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithModuleHierarchy (13 ms)
2023-01-11T23:16:31.5036365Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithTwoLevelModuleHierarchy
2023-01-11T23:16:31.5168402Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithTwoLevelModuleHierarchy (13 ms)
2023-01-11T23:16:31.5169051Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithLoweredSubModule
2023-01-11T23:16:31.5307867Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithLoweredSubModule (13 ms)
2023-01-11T23:16:31.5309178Z [ RUN ] BackendTestDebugInfo.TestExceptionStackForCompilerWithSelectiveLoweredSubModule
2023-01-11T23:16:31.5446897Z [ OK ] BackendTestDebugInfo.TestExceptionStackForCompilerWithSelectiveLoweredSubModule (13 ms)
2023-01-11T23:16:31.5448168Z [----------] 6 tests from BackendTestDebugInfo (81 ms total)
2023-01-11T23:16:31.5448514Z
2023-01-11T23:16:31.5448834Z [----------] 4 tests from ClassImportTest
2023-01-11T23:16:31.5449430Z [ RUN ] ClassImportTest.Basic
2023-01-11T23:16:31.5453405Z [ OK ] ClassImportTest.Basic (0 ms)
2023-01-11T23:16:31.5454281Z [ RUN ] ClassImportTest.ScriptObject
2023-01-11T23:16:31.5478513Z [ OK ] ClassImportTest.ScriptObject (2 ms)
2023-01-11T23:16:31.5478989Z [ RUN ] ClassImportTest.ClassDerive
2023-01-11T23:16:31.5479429Z [ OK ] ClassImportTest.ClassDerive (0 ms)
2023-01-11T23:16:31.5479960Z [ RUN ] ClassImportTest.CustomClass
2023-01-11T23:16:31.5480413Z [ OK ] ClassImportTest.CustomClass (0 ms)
2023-01-11T23:16:31.5480904Z [----------] 4 tests from ClassImportTest (3 ms total)
2023-01-11T23:16:31.5481123Z
2023-01-11T23:16:31.5481345Z [----------] 1 test from ClassParserTest
2023-01-11T23:16:31.5481705Z [ RUN ] ClassParserTest.Basic
2023-01-11T23:16:31.5482020Z [ OK ] ClassParserTest.Basic (0 ms)
2023-01-11T23:16:31.5482365Z [----------] 1 test from ClassParserTest (0 ms total)
2023-01-11T23:16:31.5482530Z
2023-01-11T23:16:31.5482680Z [----------] 3 tests from ClassTypeTest
2023-01-11T23:16:31.5482988Z [ RUN ] ClassTypeTest.AddRemoveAttr
2023-01-11T23:16:31.5483325Z [ OK ] ClassTypeTest.AddRemoveAttr (0 ms)
2023-01-11T23:16:31.5483655Z [ RUN ] ClassTypeTest.AddRemoveConstant
2023-01-11T23:16:31.5484009Z [ OK ] ClassTypeTest.AddRemoveConstant (0 ms)
2023-01-11T23:16:31.5484385Z [ RUN ] ClassTypeTest.IdenticalTypesDifferentCus
2023-01-11T23:16:31.5494095Z [ OK ] ClassTypeTest.IdenticalTypesDifferentCus (1 ms)
2023-01-11T23:16:31.5494869Z [----------] 3 tests from ClassTypeTest (1 ms total)
2023-01-11T23:16:31.5495144Z
2023-01-11T23:16:31.5495385Z [----------] 2 tests from TestCodeTemplate
2023-01-11T23:16:31.5495795Z [ RUN ] TestCodeTemplate.Copying
2023-01-11T23:16:31.5496115Z [ OK ] TestCodeTemplate.Copying (0 ms)
2023-01-11T23:16:31.5496425Z [ RUN ] TestCodeTemplate.Formatting
2023-01-11T23:16:31.5496751Z [ OK ] TestCodeTemplate.Formatting (0 ms)
2023-01-11T23:16:31.5497094Z [----------] 2 tests from TestCodeTemplate (0 ms total)
2023-01-11T23:16:31.5497255Z
2023-01-11T23:16:31.5497413Z [----------] 13 tests from ConcatOptTest
2023-01-11T23:16:31.5497788Z [ RUN ] ConcatOptTest.SimpleCommonInputsEliminationPrefix
2023-01-11T23:16:31.5531333Z [ OK ] ConcatOptTest.SimpleCommonInputsEliminationPrefix (3 ms)
2023-01-11T23:16:31.5531854Z [ RUN ] ConcatOptTest.SimpleCommonInputsEliminationSuffix
2023-01-11T23:16:31.5569671Z [ OK ] ConcatOptTest.SimpleCommonInputsEliminationSuffix (3 ms)
2023-01-11T23:16:31.5570244Z [ RUN ] ConcatOptTest.CommonInputsEliminationWithDifferentOrderInputs
2023-01-11T23:16:31.5608227Z [ OK ] ConcatOptTest.CommonInputsEliminationWithDifferentOrderInputs (3 ms)
2023-01-11T23:16:31.5608749Z [ RUN ] ConcatOptTest.MoreCommonInputsElimination
2023-01-11T23:16:31.5673938Z [ OK ] ConcatOptTest.MoreCommonInputsElimination (6 ms)
2023-01-11T23:16:31.5674384Z [ RUN ] ConcatOptTest.ExpandConcat
2023-01-11T23:16:31.5702642Z [ OK ] ConcatOptTest.ExpandConcat (2 ms)
2023-01-11T23:16:31.5703705Z [ RUN ] ConcatOptTest.ConcatWithoutResultShape
2023-01-11T23:16:31.5725101Z [ OK ] ConcatOptTest.ConcatWithoutResultShape (2 ms)
2023-01-11T23:16:31.5725567Z [ RUN ] ConcatOptTest.ConcatWithoutInputShape
2023-01-11T23:16:31.5755320Z [ OK ] ConcatOptTest.ConcatWithoutInputShape (2 ms)
2023-01-11T23:16:31.5755739Z [ RUN ] ConcatOptTest.UseVariadicCat
2023-01-11T23:16:31.5818803Z [ OK ] ConcatOptTest.UseVariadicCat (6 ms)
2023-01-11T23:16:31.5819275Z [ RUN ] ConcatOptTest.UseVariadicCatWithMultipleListUses
2023-01-11T23:16:31.5848277Z [ OK ] ConcatOptTest.UseVariadicCatWithMultipleListUses (2 ms)
2023-01-11T23:16:31.5849226Z [ RUN ] ConcatOptTest.UseVariadicCatWithListMutationAfterCat
2023-01-11T23:16:31.5884752Z [ OK ] ConcatOptTest.UseVariadicCatWithListMutationAfterCat (3 ms)
2023-01-11T23:16:31.5885311Z [ RUN ] ConcatOptTest.UseVariadicCatWithListMutationBeforeCat
2023-01-11T23:16:31.5921501Z [ OK ] ConcatOptTest.UseVariadicCatWithListMutationBeforeCat (3 ms)
2023-01-11T23:16:31.5922058Z [ RUN ] ConcatOptTest.UseVariadicCatWithMultipleListMutations
2023-01-11T23:16:31.5988253Z [ OK ] ConcatOptTest.UseVariadicCatWithMultipleListMutations (6 ms)
2023-01-11T23:16:31.5988885Z [ RUN ] ConcatOptTest.RemoveListMutationUseVariadicCatAndCommonInputsElimination
2023-01-11T23:16:31.6027899Z [ OK ] ConcatOptTest.RemoveListMutationUseVariadicCatAndCommonInputsElimination (3 ms)
2023-01-11T23:16:31.6028457Z [----------] 13 tests from ConcatOptTest (53 ms total)
2023-01-11T23:16:31.6028660Z
2023-01-11T23:16:31.6028858Z [----------] 1 test from OptimizeConcatTest
2023-01-11T23:16:31.6029303Z [ RUN ] OptimizeConcatTest.UseVariadicCatReplaceMultiple
2023-01-11T23:16:31.6079894Z [ OK ] OptimizeConcatTest.UseVariadicCatReplaceMultiple (5 ms)
2023-01-11T23:16:31.6080371Z [----------] 1 test from OptimizeConcatTest (5 ms total)
2023-01-11T23:16:31.6080576Z
2023-01-11T23:16:31.6080752Z [----------] 3 tests from ConcatOpt
2023-01-11T23:16:31.6081136Z [ RUN ] ConcatOpt.CombineConcatsSimpleCase
2023-01-11T23:16:31.6081559Z [ OK ] ConcatOpt.CombineConcatsSimpleCase (0 ms)
2023-01-11T23:16:31.6081980Z [ RUN ] ConcatOpt.CombineConcatsLongChain
2023-01-11T23:16:31.6084864Z [ OK ] ConcatOpt.CombineConcatsLongChain (0 ms)
2023-01-11T23:16:31.6085478Z [ RUN ] ConcatOpt.CombineConcatsMutation
2023-01-11T23:16:31.6085882Z [ OK ] ConcatOpt.CombineConcatsMutation (0 ms)
2023-01-11T23:16:31.6086286Z [----------] 3 tests from ConcatOpt (0 ms total)
2023-01-11T23:16:31.6086470Z
2023-01-11T23:16:31.6086691Z [----------] 4 tests from ConstantPoolingTest
2023-01-11T23:16:31.6087044Z [ RUN ] ConstantPoolingTest.Int
2023-01-11T23:16:31.6087411Z [ OK ] ConstantPoolingTest.Int (0 ms)
2023-01-11T23:16:31.6087831Z [ RUN ] ConstantPoolingTest.PoolingAcrossBlocks
2023-01-11T23:16:31.6088281Z [ OK ] ConstantPoolingTest.PoolingAcrossBlocks (0 ms)
2023-01-11T23:16:31.6088946Z [ RUN ] ConstantPoolingTest.PoolingDifferentDevices
2023-01-11T23:16:31.6090340Z [ OK ] ConstantPoolingTest.PoolingDifferentDevices (0 ms)
2023-01-11T23:16:31.6090806Z [ RUN ] ConstantPoolingTest.DictConstantPooling
2023-01-11T23:16:31.6091252Z [ OK ] ConstantPoolingTest.DictConstantPooling (0 ms)
2023-01-11T23:16:31.6095395Z [----------] 4 tests from ConstantPoolingTest (0 ms total)
2023-01-11T23:16:31.6095645Z
2023-01-11T23:16:31.6095855Z [----------] 1 test from CleanupPassTest
2023-01-11T23:16:31.6096213Z [ RUN ] CleanupPassTest.Basic
2023-01-11T23:16:31.6096578Z [ OK ] CleanupPassTest.Basic (0 ms)
2023-01-11T23:16:31.6096956Z [----------] 1 test from CleanupPassTest (0 ms total)
2023-01-11T23:16:31.6097151Z
2023-01-11T23:16:31.6097374Z [----------] 1 test from CreateAutodiffSubgraphsTest
2023-01-11T23:16:31.6097782Z [ RUN ] CreateAutodiffSubgraphsTest.Basic
2023-01-11T23:16:31.6098481Z [ OK ] CreateAutodiffSubgraphsTest.Basic (0 ms)
2023-01-11T23:16:31.6098930Z [----------] 1 test from CreateAutodiffSubgraphsTest (0 ms total)
2023-01-11T23:16:31.6099150Z
2023-01-11T23:16:31.6099334Z [----------] 4 tests from CustomClassTest
2023-01-11T23:16:31.6099714Z [ RUN ] CustomClassTest.TorchbindIValueAPI
2023-01-11T23:16:31.6105798Z [ OK ] CustomClassTest.TorchbindIValueAPI (0 ms)
2023-01-11T23:16:31.6106199Z [ RUN ] CustomClassTest.ScalarTypeClass
2023-01-11T23:16:31.6108342Z [ OK ] CustomClassTest.ScalarTypeClass (0 ms)
2023-01-11T23:16:31.6108725Z [ RUN ] CustomClassTest.TestDocString
2023-01-11T23:16:31.6109298Z [ OK ] CustomClassTest.TestDocString (0 ms)
2023-01-11T23:16:31.6109677Z [ RUN ] CustomClassTest.Serialization
2023-01-11T23:16:31.6123649Z [ OK ] CustomClassTest.Serialization (1 ms)
2023-01-11T23:16:31.6124055Z [----------] 4 tests from CustomClassTest (2 ms total)
2023-01-11T23:16:31.6124246Z
2023-01-11T23:16:31.6124444Z [----------] 5 tests from CustomOperatorTest
2023-01-11T23:16:31.6124833Z [ RUN ] CustomOperatorTest.InferredSchema
2023-01-11T23:16:31.6126336Z [ OK ] CustomOperatorTest.InferredSchema (0 ms)
2023-01-11T23:16:31.6126767Z [ RUN ] CustomOperatorTest.ExplicitSchema
2023-01-11T23:16:31.6129166Z [ OK ] CustomOperatorTest.ExplicitSchema (0 ms)
2023-01-11T23:16:31.6130980Z [ RUN ] CustomOperatorTest.ListParameters
2023-01-11T23:16:31.6131550Z [ OK ] CustomOperatorTest.ListParameters (0 ms)
2023-01-11T23:16:31.6132098Z [ RUN ] CustomOperatorTest.ListParameters2
2023-01-11T23:16:31.6132670Z [ OK ] CustomOperatorTest.ListParameters2 (0 ms)
2023-01-11T23:16:31.6133158Z [ RUN ] CustomOperatorTest.Aliasing
2023-01-11T23:16:31.6135336Z [ OK ] CustomOperatorTest.Aliasing (0 ms)
2023-01-11T23:16:31.6136074Z [----------] 5 tests from CustomOperatorTest (1 ms total)
2023-01-11T23:16:31.6136260Z
2023-01-11T23:16:31.6136441Z [----------] 2 tests from TestCustomOperator
2023-01-11T23:16:31.6136905Z [ RUN ] TestCustomOperator.OperatorGeneratorUndeclared
2023-01-11T23:16:31.6137344Z [ OK ] TestCustomOperator.OperatorGeneratorUndeclared (0 ms)
2023-01-11T23:16:31.6137764Z [ RUN ] TestCustomOperator.OperatorGeneratorBasic
2023-01-11T23:16:31.6138171Z [ OK ] TestCustomOperator.OperatorGeneratorBasic (0 ms)
2023-01-11T23:16:31.6138559Z [----------] 2 tests from TestCustomOperator (0 ms total)
2023-01-11T23:16:31.6138750Z
2023-01-11T23:16:31.6138973Z [----------] 1 test from EliminateDeadCodeTest
2023-01-11T23:16:31.6139301Z [ RUN ] EliminateDeadCodeTest.Basic
2023-01-11T23:16:31.6139657Z [ OK ] EliminateDeadCodeTest.Basic (0 ms)
2023-01-11T23:16:31.6140341Z [----------] 1 test from EliminateDeadCodeTest (0 ms total)
2023-01-11T23:16:31.6140526Z
2023-01-11T23:16:31.6140677Z [----------] 5 tests from FuserTest
2023-01-11T23:16:31.6140980Z [ RUN ] FuserTest.TestSimple_CUDA
2023-01-11T23:16:31.7485821Z [ OK ] FuserTest.TestSimple_CUDA (134 ms)
2023-01-11T23:16:31.7486176Z [ RUN ] FuserTest.TestOne_CUDA
2023-01-11T23:16:32.3000667Z [ OK ] FuserTest.TestOne_CUDA (551 ms)
2023-01-11T23:16:32.3001034Z [ RUN ] FuserTest.FusedConcat_CUDA
2023-01-11T23:16:32.7379216Z [ OK ] FuserTest.FusedConcat_CUDA (437 ms)
2023-01-11T23:16:32.7379633Z [ RUN ] FuserTest.FusionAliasing
2023-01-11T23:16:32.7384753Z [ OK ] FuserTest.FusionAliasing (0 ms)
2023-01-11T23:16:32.7385896Z [ RUN ] FuserTest.KernelCaching
2023-01-11T23:16:32.7386779Z [ OK ] FuserTest.KernelCaching (0 ms)
2023-01-11T23:16:32.7387270Z [----------] 5 tests from FuserTest (1124 ms total)
2023-01-11T23:16:32.7387504Z
2023-01-11T23:16:32.7387747Z [----------] 2 tests from GraphExecutorTest
2023-01-11T23:16:32.7388182Z [ RUN ] GraphExecutorTest.Basic_CUDA
2023-01-11T23:16:33.3595824Z [ OK ] GraphExecutorTest.Basic_CUDA (620 ms)
2023-01-11T23:16:33.3596424Z [ RUN ] GraphExecutorTest.runAsync_executor
2023-01-11T23:16:33.3640659Z [ OK ] GraphExecutorTest.runAsync_executor (4 ms)
2023-01-11T23:16:33.3641219Z [----------] 2 tests from GraphExecutorTest (625 ms total)
2023-01-11T23:16:33.3641483Z
2023-01-11T23:16:33.3641728Z [----------] 5 tests from GraphIteratorTest
2023-01-11T23:16:33.3642481Z [ RUN ] GraphIteratorTest.ConstantReturnGraph
2023-01-11T23:16:33.3642913Z [ OK ] GraphIteratorTest.ConstantReturnGraph (0 ms)
2023-01-11T23:16:33.3643440Z [ RUN ] GraphIteratorTest.GraphWithParameters
2023-01-11T23:16:33.3643868Z [ OK ] GraphIteratorTest.GraphWithParameters (0 ms)
2023-01-11T23:16:33.3644232Z [ RUN ] GraphIteratorTest.GraphWithIf
2023-01-11T23:16:33.3644586Z [ OK ] GraphIteratorTest.GraphWithIf (0 ms)
2023-01-11T23:16:33.3644931Z [ RUN ] GraphIteratorTest.GraphWithNestedIf
2023-01-11T23:16:33.3645298Z [ OK ] GraphIteratorTest.GraphWithNestedIf (0 ms)
2023-01-11T23:16:33.3645651Z [ RUN ] GraphIteratorTest.GraphWithLoop
2023-01-11T23:16:33.3645998Z [ OK ] GraphIteratorTest.GraphWithLoop (0 ms)
2023-01-11T23:16:33.3646350Z [----------] 5 tests from GraphIteratorTest (0 ms total)
2023-01-11T23:16:33.3646526Z
2023-01-11T23:16:33.3646723Z [----------] 1 test from CSDebugInfoSerializaitionTest
2023-01-11T23:16:33.3647102Z [ RUN ] CSDebugInfoSerializaitionTest.TwoSubmodules
2023-01-11T23:16:33.3650661Z [ OK ] CSDebugInfoSerializaitionTest.TwoSubmodules (0 ms)
2023-01-11T23:16:33.3651180Z [----------] 1 test from CSDebugInfoSerializaitionTest (0 ms total)
2023-01-11T23:16:33.3651382Z
2023-01-11T23:16:33.3651532Z [----------] 1 test from InlinerTest
2023-01-11T23:16:33.3651813Z [ RUN ] InlinerTest.Basic
2023-01-11T23:16:33.3652679Z [ OK ] InlinerTest.Basic (0 ms)
2023-01-11T23:16:33.3653071Z [----------] 1 test from InlinerTest (0 ms total)
2023-01-11T23:16:33.3653293Z
2023-01-11T23:16:33.3653473Z [----------] 1 test from InterfaceTest
2023-01-11T23:16:33.3653823Z [ RUN ] InterfaceTest.ModuleInterfaceSerialization
2023-01-11T23:16:33.3668181Z [ OK ] InterfaceTest.ModuleInterfaceSerialization (1 ms)
2023-01-11T23:16:33.3668593Z [----------] 1 test from InterfaceTest (1 ms total)
2023-01-11T23:16:33.3668855Z
2023-01-11T23:16:33.3669154Z [----------] 5 tests from TypeCheckTest
2023-01-11T23:16:33.3669740Z [ RUN ] TypeCheckTest.MatchingType
2023-01-11T23:16:33.3670187Z [ OK ] TypeCheckTest.MatchingType (0 ms)
2023-01-11T23:16:33.3670590Z [ RUN ] TypeCheckTest.SizeMismatch
2023-01-11T23:16:33.3670944Z [ OK ] TypeCheckTest.SizeMismatch (0 ms)
2023-01-11T23:16:33.3671310Z [ RUN ] TypeCheckTest.GradientMismatch
2023-01-11T23:16:33.3671823Z [ OK ] TypeCheckTest.GradientMismatch (0 ms)
2023-01-11T23:16:33.3672423Z [ RUN ] TypeCheckTest.ScalarTypeMismatch
2023-01-11T23:16:33.3673645Z [ OK ] TypeCheckTest.ScalarTypeMismatch (0 ms)
2023-01-11T23:16:33.3674094Z [ RUN ] TypeCheckTest.DeviceMismatch_CUDA
2023-01-11T23:16:33.3675008Z [ OK ] TypeCheckTest.DeviceMismatch_CUDA (0 ms)
2023-01-11T23:16:33.3675403Z [----------] 5 tests from TypeCheckTest (0 ms total)
2023-01-11T23:16:33.3675609Z
2023-01-11T23:16:33.3675834Z [----------] 4 tests from InterpreterTest
2023-01-11T23:16:33.3676160Z [ RUN ] InterpreterTest.Basic_CUDA
2023-01-11T23:16:33.3681836Z [ OK ] InterpreterTest.Basic_CUDA (0 ms)
2023-01-11T23:16:33.3682197Z [ RUN ] InterpreterTest.IgnorableArgsInSchema
2023-01-11T23:16:33.3685238Z [ OK ] InterpreterTest.IgnorableArgsInSchema (0 ms)
2023-01-11T23:16:33.3685673Z [ RUN ] InterpreterTest.IgnorableArgsInSchemaWithOut
2023-01-11T23:16:33.3686177Z [ OK ] InterpreterTest.IgnorableArgsInSchemaWithOut (0 ms)
2023-01-11T23:16:33.3686559Z [ RUN ] InterpreterTest.runAsyncBasicTest
2023-01-11T23:16:33.3718780Z [ OK ] InterpreterTest.runAsyncBasicTest (3 ms)
2023-01-11T23:16:33.3719325Z [----------] 4 tests from InterpreterTest (4 ms total)
2023-01-11T23:16:33.3719536Z
2023-01-11T23:16:33.3719757Z [----------] 1 test from EnableRethrowCaughtExceptionTest
2023-01-11T23:16:33.3720324Z [ RUN ] EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException
2023-01-11T23:16:33.3921075Z [ OK ] EnableRethrowCaughtExceptionTest.EnableRethrowCaughtExceptionTestRethrowsCaughtException (20 ms)
2023-01-11T23:16:33.3921809Z [----------] 1 test from EnableRethrowCaughtExceptionTest (20 ms total)
2023-01-11T23:16:33.3922094Z
2023-01-11T23:16:33.3922302Z [----------] 4 tests from IRTest
2023-01-11T23:16:33.3922663Z [ RUN ] IRTest.Attributes
2023-01-11T23:16:33.3923035Z [ OK ] IRTest.Attributes (0 ms)
2023-01-11T23:16:33.3923405Z [ RUN ] IRTest.Blocks
2023-01-11T23:16:33.3923805Z [ OK ] IRTest.Blocks (0 ms)
2023-01-11T23:16:33.3924110Z [ RUN ] IRTest.CommonAncestor
2023-01-11T23:16:33.3924417Z [ OK ] IRTest.CommonAncestor (0 ms)
2023-01-11T23:16:33.3924709Z [ RUN ] IRTest.OperatorMap
2023-01-11T23:16:33.3924994Z [ OK ] IRTest.OperatorMap (0 ms)
2023-01-11T23:16:33.3925378Z [----------] 4 tests from IRTest (0 ms total)
2023-01-11T23:16:33.3925537Z
2023-01-11T23:16:33.3925701Z [----------] 21 tests from IRParserTest
2023-01-11T23:16:33.3926040Z [ RUN ] IRParserTest.Basic
2023-01-11T23:16:33.3926336Z [ OK ] IRParserTest.Basic (0 ms)
2023-01-11T23:16:33.3926638Z [ RUN ] IRParserTest.NestedBlock
2023-01-11T23:16:33.3927015Z [ OK ] IRParserTest.NestedBlock (0 ms)
2023-01-11T23:16:33.3927403Z [ RUN ] IRParserTest.If
2023-01-11T23:16:33.3927798Z [ OK ] IRParserTest.If (0 ms)
2023-01-11T23:16:33.3928081Z [ RUN ] IRParserTest.If2
2023-01-11T23:16:33.3928354Z [ OK ] IRParserTest.If2 (0 ms)
2023-01-11T23:16:33.3928688Z [ RUN ] IRParserTest.InferredTypeIsTensor
2023-01-11T23:16:33.3929046Z [ OK ] IRParserTest.InferredTypeIsTensor (0 ms)
2023-01-11T23:16:33.3929368Z [ RUN ] IRParserTest.ValueReuse
2023-01-11T23:16:33.3929929Z [ OK ] IRParserTest.ValueReuse (0 ms)
2023-01-11T23:16:33.3930380Z [ RUN ] IRParserTest.Attributes
2023-01-11T23:16:33.3930796Z [ OK ] IRParserTest.Attributes (0 ms)
2023-01-11T23:16:33.3931233Z [ RUN ] IRParserTest.OptionalTypes
2023-01-11T23:16:33.3931626Z [ OK ] IRParserTest.OptionalTypes (0 ms)
2023-01-11T23:16:33.3932046Z [ RUN ] IRParserTest.StarTensor
2023-01-11T23:16:33.3932417Z [ OK ] IRParserTest.StarTensor (0 ms)
2023-01-11T23:16:33.3932730Z [ RUN ] IRParserTest.UnshapedTensor
2023-01-11T23:16:33.3933062Z [ OK ] IRParserTest.UnshapedTensor (0 ms)
2023-01-11T23:16:33.3933402Z [ RUN ] IRParserTest.ShapedTensor
2023-01-11T23:16:33.3933862Z [ OK ] IRParserTest.ShapedTensor (0 ms)
2023-01-11T23:16:33.3934310Z [ RUN ] IRParserTest.NestedContrainer
2023-01-11T23:16:33.3934962Z [ OK ] IRParserTest.NestedContrainer (0 ms)
2023-01-11T23:16:33.3935318Z [ RUN ] IRParserTest.MalformedShapeAnnotation
2023-01-11T23:16:33.3935888Z [ OK ] IRParserTest.MalformedShapeAnnotation (0 ms)
2023-01-11T23:16:33.3936339Z [ RUN ] IRParserTest.FileCheck
2023-01-11T23:16:33.3936650Z [ OK ] IRParserTest.FileCheck (0 ms)
2023-01-11T23:16:33.3936943Z [ RUN ] IRParserTest.Strides
2023-01-11T23:16:33.3937238Z [ OK ] IRParserTest.Strides (0 ms)
2023-01-11T23:16:33.3937544Z [ RUN ] IRParserTest.MalformedStrides
2023-01-11T23:16:33.3937879Z [ OK ] IRParserTest.MalformedStrides (0 ms)
2023-01-11T23:16:33.3938284Z [ RUN ] IRParserTest.TensorShapes
2023-01-11T23:16:33.3938595Z [ OK ] IRParserTest.TensorShapes (0 ms)
2023-01-11T23:16:33.3938952Z [ RUN ] IRParserTest.DeviceAndRequiresGradTensors
2023-01-11T23:16:33.3939372Z [ OK ] IRParserTest.DeviceAndRequiresGradTensors (0 ms)
2023-01-11T23:16:33.3939723Z [ RUN ] IRParserTest.ListConstant
2023-01-11T23:16:33.3940049Z [ OK ] IRParserTest.ListConstant (0 ms)
2023-01-11T23:16:33.3940430Z [ RUN ] IRParserTest.PartialStarTensor
2023-01-11T23:16:33.3940776Z [ OK ] IRParserTest.PartialStarTensor (0 ms)
2023-01-11T23:16:33.3941135Z [ RUN ] IRParserTest.ComplexTensorAttributes
2023-01-11T23:16:33.3941559Z [ OK ] IRParserTest.ComplexTensorAttributes (0 ms)
2023-01-11T23:16:33.3941930Z [----------] 21 tests from IRParserTest (1 ms total)
2023-01-11T23:16:33.3942094Z
2023-01-11T23:16:33.3942240Z [----------] 2 tests from JitTypeTest
2023-01-11T23:16:33.3942539Z [ RUN ] JitTypeTest.IsComplete
2023-01-11T23:16:33.3942845Z [ OK ] JitTypeTest.IsComplete (0 ms)
2023-01-11T23:16:33.3943149Z [ RUN ] JitTypeTest.UnifyTypes
2023-01-11T23:16:33.3943446Z [ OK ] JitTypeTest.UnifyTypes (0 ms)
2023-01-11T23:16:33.3943774Z [----------] 2 tests from JitTypeTest (0 ms total)
2023-01-11T23:16:33.3943944Z
2023-01-11T23:16:33.3944136Z [----------] 42 tests from LiteInterpreterTest
2023-01-11T23:16:33.3944517Z [ RUN ] LiteInterpreterTest.UpsampleNearest2d
2023-01-11T23:16:33.3950902Z [ OK ] LiteInterpreterTest.UpsampleNearest2d (1 ms)
2023-01-11T23:16:33.3951323Z [ RUN ] LiteInterpreterTest.CheckAttrAccess
2023-01-11T23:16:33.3952697Z [ OK ] LiteInterpreterTest.CheckAttrAccess (0 ms)
2023-01-11T23:16:33.3953129Z [ RUN ] LiteInterpreterTest.MethodInvocation
2023-01-11T23:16:33.3981699Z [ OK ] LiteInterpreterTest.MethodInvocation (2 ms)
2023-01-11T23:16:33.3982425Z [ RUN ] LiteInterpreterTest.Conv
2023-01-11T23:16:33.4014251Z [ OK ] LiteInterpreterTest.Conv (3 ms)
2023-01-11T23:16:33.4015019Z [ RUN ] LiteInterpreterTest.Inline
2023-01-11T23:16:33.4022721Z [ OK ] LiteInterpreterTest.Inline (0 ms)
2023-01-11T23:16:33.4023074Z [ RUN ] LiteInterpreterTest.Tuple
2023-01-11T23:16:33.4028834Z [ OK ] LiteInterpreterTest.Tuple (0 ms)
2023-01-11T23:16:33.4029178Z [ RUN ] LiteInterpreterTest.AtenFormat
2023-01-11T23:16:33.4035231Z [ OK ] LiteInterpreterTest.AtenFormat (0 ms)
2023-01-11T23:16:33.4035608Z [ RUN ] LiteInterpreterTest.PrimDevice
2023-01-11T23:16:33.4039395Z [ OK ] LiteInterpreterTest.PrimDevice (0 ms)
2023-01-11T23:16:33.4039728Z [ RUN ] LiteInterpreterTest.Dict
2023-01-11T23:16:33.4046196Z [ OK ] LiteInterpreterTest.Dict (0 ms)
2023-01-11T23:16:33.4046526Z [ RUN ] LiteInterpreterTest.List
2023-01-11T23:16:33.4054654Z [ OK ] LiteInterpreterTest.List (0 ms)
2023-01-11T23:16:33.4055010Z [ RUN ] LiteInterpreterTest.PrimOverload
2023-01-11T23:16:33.4055374Z [ OK ] LiteInterpreterTest.PrimOverload (0 ms)
2023-01-11T23:16:33.4055698Z [ RUN ] LiteInterpreterTest.Prim
2023-01-11T23:16:33.4059394Z [ OK ] LiteInterpreterTest.Prim (0 ms)
2023-01-11T23:16:33.4059731Z [ RUN ] LiteInterpreterTest.PrimScalar
2023-01-11T23:16:33.4063948Z [ OK ] LiteInterpreterTest.PrimScalar (0 ms)
2023-01-11T23:16:33.4064289Z [ RUN ] LiteInterpreterTest.LoadOrigJit
2023-01-11T23:16:33.4121650Z [ OK ] LiteInterpreterTest.LoadOrigJit (5 ms)
2023-01-11T23:16:33.4122010Z [ RUN ] LiteInterpreterTest.WrongMethodName
2023-01-11T23:16:33.4142432Z [ OK ] LiteInterpreterTest.WrongMethodName (2 ms)
2023-01-11T23:16:33.4142908Z [ RUN ] LiteInterpreterTest.SetState
2023-01-11T23:16:33.4169027Z [ OK ] LiteInterpreterTest.SetState (2 ms)
2023-01-11T23:16:33.4169378Z [ RUN ] LiteInterpreterTest.BuiltinClass
2023-01-11T23:16:33.4176967Z [ OK ] LiteInterpreterTest.BuiltinClass (0 ms)
2023-01-11T23:16:33.4177341Z [ RUN ] LiteInterpreterTest.BuiltinFunction
2023-01-11T23:16:33.4180283Z [ OK ] LiteInterpreterTest.BuiltinFunction (0 ms)
2023-01-11T23:16:33.4180687Z [ RUN ] LiteInterpreterTest.GetRuntimeByteCodeVersion
2023-01-11T23:16:33.4181194Z [ OK ] LiteInterpreterTest.GetRuntimeByteCodeVersion (0 ms)
2023-01-11T23:16:33.4181671Z [ RUN ] LiteInterpreterTest.GetRuntimeOperatorsVersion
2023-01-11T23:16:33.4182103Z [ OK ] LiteInterpreterTest.GetRuntimeOperatorsVersion (0 ms)
2023-01-11T23:16:33.4182507Z [ RUN ] LiteInterpreterTest.GetByteCodeVersion
2023-01-11T23:16:33.4182891Z [ OK ] LiteInterpreterTest.GetByteCodeVersion (0 ms)
2023-01-11T23:16:33.4183263Z [ RUN ] LiteInterpreterTest.GetContainTypes
2023-01-11T23:16:33.4183836Z [ OK ] LiteInterpreterTest.GetContainTypes (0 ms)
2023-01-11T23:16:33.4184300Z [ RUN ] LiteInterpreterTest.BackPortByteCodeModelAllVersions
2023-01-11T23:16:33.5107250Z [ OK ] LiteInterpreterTest.BackPortByteCodeModelAllVersions (92 ms)
2023-01-11T23:16:33.5107714Z [ RUN ] LiteInterpreterTest.GetRuntimeOpsAndInfo
2023-01-11T23:16:33.5164467Z [ OK ] LiteInterpreterTest.GetRuntimeOpsAndInfo (5 ms)
2023-01-11T23:16:33.5164864Z [ RUN ] LiteInterpreterTest.isCompatibleSuccess
2023-01-11T23:16:33.5214165Z [ OK ] LiteInterpreterTest.isCompatibleSuccess (4 ms)
2023-01-11T23:16:33.5215408Z [ RUN ] LiteInterpreterTest.isCompatibleFail
2023-01-11T23:16:33.5304884Z [ OK ] LiteInterpreterTest.isCompatibleFail (8 ms)
2023-01-11T23:16:33.5305599Z [ RUN ] LiteInterpreterTest.Eval
2023-01-11T23:16:33.5315644Z [ OK ] LiteInterpreterTest.Eval (1 ms)
2023-01-11T23:16:33.5316356Z [ RUN ] LiteInterpreterTest.FindWrongMethodName
2023-01-11T23:16:33.5318032Z [ OK ] LiteInterpreterTest.FindWrongMethodName (0 ms)
2023-01-11T23:16:33.5318417Z [ RUN ] LiteInterpreterTest.FindAndRunMethod
2023-01-11T23:16:33.5326173Z [ OK ] LiteInterpreterTest.FindAndRunMethod (0 ms)
2023-01-11T23:16:33.5326560Z [ RUN ] LiteInterpreterTest.RunMethodVariadic
2023-01-11T23:16:33.5331607Z [ OK ] LiteInterpreterTest.RunMethodVariadic (0 ms)
2023-01-11T23:16:33.5332006Z [ RUN ] LiteInterpreterTest.DuplicateSetState
2023-01-11T23:16:33.5343244Z [ OK ] LiteInterpreterTest.DuplicateSetState (1 ms)
2023-01-11T23:16:33.5343806Z [ RUN ] LiteInterpreterTest.ExtraFiles
2023-01-11T23:16:33.5347876Z [ OK ] LiteInterpreterTest.ExtraFiles (0 ms)
2023-01-11T23:16:33.5348290Z [ RUN ] LiteInterpreterTest.OpNameExportFetchRootOperators
2023-01-11T23:16:33.5356698Z [ OK ] LiteInterpreterTest.OpNameExportFetchRootOperators (0 ms)
2023-01-11T23:16:33.5357240Z [ RUN ] LiteInterpreterTest.DefaultArgsConv
2023-01-11T23:16:33.5375207Z [ OK ] LiteInterpreterTest.DefaultArgsConv (1 ms)
2023-01-11T23:16:33.5375645Z [ RUN ] LiteInterpreterTest.DefaultArgsPinv
2023-01-11T23:16:33.5470175Z [ OK ] LiteInterpreterTest.DefaultArgsPinv (9 ms)
2023-01-11T23:16:33.5470710Z [ RUN ] LiteInterpreterTest.DefaultArgsTensorinvSpecifyDefault
2023-01-11T23:16:33.5480126Z [ OK ] LiteInterpreterTest.DefaultArgsTensorinvSpecifyDefault (1 ms)
2023-01-11T23:16:33.5480589Z [ RUN ] LiteInterpreterTest.DefaultArgsPinvWithOutArg
2023-01-11T23:16:33.5513966Z [ OK ] LiteInterpreterTest.DefaultArgsPinvWithOutArg (3 ms)
2023-01-11T23:16:33.5514395Z [ RUN ] LiteInterpreterTest.DefaultArgsWithOutArg
2023-01-11T23:16:33.5518323Z [ OK ] LiteInterpreterTest.DefaultArgsWithOutArg (0 ms)
2023-01-11T23:16:33.5518811Z [ RUN ] LiteInterpreterTest.TestExceptionStackWithTwoLevelModuleHierarchy
2023-01-11T23:16:33.5619252Z [ OK ] LiteInterpreterTest.TestExceptionStackWithTwoLevelModuleHierarchy (10 ms)
2023-01-11T23:16:33.5619852Z [ RUN ] LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs
2023-01-11T23:16:33.5652508Z [ OK ] LiteInterpreterTest.OperatorCacheDifferentiatesDefaultArgs (3 ms)
2023-01-11T23:16:33.5653006Z [ RUN ] LiteInterpreterTest.OperatorSize1
2023-01-11T23:16:33.5656437Z [ OK ] LiteInterpreterTest.OperatorSize1 (0 ms)
2023-01-11T23:16:33.5656873Z [ RUN ] LiteInterpreterTest.OperatorTest2
2023-01-11T23:16:33.5673703Z [ OK ] LiteInterpreterTest.OperatorTest2 (1 ms)
2023-01-11T23:16:33.5674220Z [----------] 42 tests from LiteInterpreterTest (173 ms total)
2023-01-11T23:16:33.5674457Z
2023-01-11T23:16:33.5674668Z [----------] 3 tests from RunTimeTest
2023-01-11T23:16:33.5675051Z [ RUN ] RunTimeTest.ParseBytecode
2023-01-11T23:16:33.5675428Z [ OK ] RunTimeTest.ParseBytecode (0 ms)
2023-01-11T23:16:33.5675743Z [ RUN ] RunTimeTest.ParseOperator
2023-01-11T23:16:33.5676151Z [ OK ] RunTimeTest.ParseOperator (0 ms)
2023-01-11T23:16:33.5676568Z [ RUN ] RunTimeTest.RuntimeCall
2023-01-11T23:16:33.5676978Z [ OK ] RunTimeTest.RuntimeCall (0 ms)
2023-01-11T23:16:33.5677402Z [----------] 3 tests from RunTimeTest (0 ms total)
2023-01-11T23:16:33.5677617Z
2023-01-11T23:16:33.5677878Z [----------] 11 tests from LiteInterpreterUpgraderTest
2023-01-11T23:16:33.5678336Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorV2
2023-01-11T23:16:33.5678740Z [ OK ] LiteInterpreterUpgraderTest.DivTensorV2 (0 ms)
2023-01-11T23:16:33.5679259Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorOutV2
2023-01-11T23:16:33.5679668Z [ OK ] LiteInterpreterUpgraderTest.DivTensorOutV2 (0 ms)
2023-01-11T23:16:33.5680171Z [ RUN ] LiteInterpreterUpgraderTest.DivTensorInplaceV2
2023-01-11T23:16:33.5680764Z [ OK ] LiteInterpreterUpgraderTest.DivTensorInplaceV2 (0 ms)
2023-01-11T23:16:33.5681281Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarFloatV2
2023-01-11T23:16:33.5681842Z [ OK ] LiteInterpreterUpgraderTest.DivScalarFloatV2 (0 ms)
2023-01-11T23:16:33.5682446Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarReciprocalFloatV2
2023-01-11T23:16:33.5682755Z expect output: 0.5000
2023-01-11T23:16:33.5682975Z [ CPUFloatType{1} ]actual output: 0.5000
2023-01-11T23:16:33.5683506Z [ CPUFloatType{1} ][ OK ] LiteInterpreterUpgraderTest.DivScalarReciprocalFloatV2 (0 ms)
2023-01-11T23:16:33.5684160Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarReciprocalIntV2
2023-01-11T23:16:33.5684636Z [ OK ] LiteInterpreterUpgraderTest.DivScalarReciprocalIntV2 (0 ms)
2023-01-11T23:16:33.5685076Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarScalarV2
2023-01-11T23:16:33.5685503Z [ OK ] LiteInterpreterUpgraderTest.DivScalarScalarV2 (0 ms)
2023-01-11T23:16:33.5685938Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarIntV2
2023-01-11T23:16:33.5686496Z [ OK ] LiteInterpreterUpgraderTest.DivScalarIntV2 (0 ms)
2023-01-11T23:16:33.5687047Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarInplaceFloatV2
2023-01-11T23:16:33.5687859Z [ OK ] LiteInterpreterUpgraderTest.DivScalarInplaceFloatV2 (0 ms)
2023-01-11T23:16:33.5688586Z [ RUN ] LiteInterpreterUpgraderTest.DivScalarInplaceIntV2
2023-01-11T23:16:33.5689134Z [ OK ] LiteInterpreterUpgraderTest.DivScalarInplaceIntV2 (0 ms)
2023-01-11T23:16:33.5689543Z [ RUN ] LiteInterpreterUpgraderTest.Upgrader
2023-01-11T23:16:33.5689922Z [ OK ] LiteInterpreterUpgraderTest.Upgrader (0 ms)
2023-01-11T23:16:33.5690318Z [----------] 11 tests from LiteInterpreterUpgraderTest (1 ms total)
2023-01-11T23:16:33.5690511Z
2023-01-11T23:16:33.5690706Z [----------] 29 tests from LiteInterpreterDirectTest
2023-01-11T23:16:33.5691083Z [ RUN ] LiteInterpreterDirectTest.UpsampleNearest2d
2023-01-11T23:16:33.5700127Z [ OK ] LiteInterpreterDirectTest.UpsampleNearest2d (1 ms)
2023-01-11T23:16:33.5700658Z [ RUN ] LiteInterpreterDirectTest.CheckAttrAccess
2023-01-11T23:16:33.5701153Z [ OK ] LiteInterpreterDirectTest.CheckAttrAccess (0 ms)
2023-01-11T23:16:33.5701553Z [ RUN ] LiteInterpreterDirectTest.MethodInvocation
2023-01-11T23:16:33.5705695Z hello
2023-01-11T23:16:33.5705959Z hello 3
2023-01-11T23:16:33.5713482Z hello
2023-01-11T23:16:33.5713925Z hello 3
2023-01-11T23:16:33.5718804Z hello
2023-01-11T23:16:33.5719254Z hello 3
2023-01-11T23:16:33.5719738Z [ OK ] LiteInterpreterDirectTest.MethodInvocation (1 ms)
2023-01-11T23:16:33.5720246Z [ RUN ] LiteInterpreterDirectTest.Conv
2023-01-11T23:16:33.5740275Z [ OK ] LiteInterpreterDirectTest.Conv (2 ms)
2023-01-11T23:16:33.5740766Z [ RUN ] LiteInterpreterDirectTest.Inline
2023-01-11T23:16:33.5745394Z [ OK ] LiteInterpreterDirectTest.Inline (0 ms)
2023-01-11T23:16:33.5745922Z [ RUN ] LiteInterpreterDirectTest.Tuple
2023-01-11T23:16:33.5749957Z [ OK ] LiteInterpreterDirectTest.Tuple (0 ms)
2023-01-11T23:16:33.5750433Z [ RUN ] LiteInterpreterDirectTest.Dict
2023-01-11T23:16:33.5754721Z [ OK ] LiteInterpreterDirectTest.Dict (0 ms)
2023-01-11T23:16:33.5755221Z [ RUN ] LiteInterpreterDirectTest.Prim
2023-01-11T23:16:33.5756199Z [ OK ] LiteInterpreterDirectTest.Prim (0 ms)
2023-01-11T23:16:33.5756832Z [ RUN ] LiteInterpreterDirectTest.PrimScalar
2023-01-11T23:16:33.5759548Z [ OK ] LiteInterpreterDirectTest.PrimScalar (0 ms)
2023-01-11T23:16:33.5760109Z [ RUN ] LiteInterpreterDirectTest.WrongMethodName
2023-01-11T23:16:33.5778379Z [ OK ] LiteInterpreterDirectTest.WrongMethodName (1 ms)
2023-01-11T23:16:33.5778888Z [ RUN ] LiteInterpreterDirectTest.SetState
2023-01-11T23:16:33.5796282Z [ OK ] LiteInterpreterDirectTest.SetState (1 ms)
2023-01-11T23:16:33.5796884Z [ RUN ] LiteInterpreterDirectTest.BuiltinFunction
2023-01-11T23:16:33.5797816Z [ OK ] LiteInterpreterDirectTest.BuiltinFunction (0 ms)
2023-01-11T23:16:33.5798477Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeByteCodeVersion
2023-01-11T23:16:33.5798961Z [ OK ] LiteInterpreterDirectTest.GetRuntimeByteCodeVersion (0 ms)
2023-01-11T23:16:33.5799422Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeOperatorsVersion
2023-01-11T23:16:33.5799886Z [ OK ] LiteInterpreterDirectTest.GetRuntimeOperatorsVersion (0 ms)
2023-01-11T23:16:33.5800322Z [ RUN ] LiteInterpreterDirectTest.GetByteCodeVersion
2023-01-11T23:16:33.5800738Z [ OK ] LiteInterpreterDirectTest.GetByteCodeVersion (0 ms)
2023-01-11T23:16:33.5801160Z [ RUN ] LiteInterpreterDirectTest.GetRuntimeOpsAndInfo
2023-01-11T23:16:33.5854140Z [ OK ] LiteInterpreterDirectTest.GetRuntimeOpsAndInfo (5 ms)
2023-01-11T23:16:33.5854959Z [ RUN ] LiteInterpreterDirectTest.Eval
2023-01-11T23:16:33.5860767Z [ OK ] LiteInterpreterDirectTest.Eval (0 ms)
2023-01-11T23:16:33.5861295Z [ RUN ] LiteInterpreterDirectTest.FindWrongMethodName
2023-01-11T23:16:33.5862536Z [ OK ] LiteInterpreterDirectTest.FindWrongMethodName (0 ms)
2023-01-11T23:16:33.5863100Z [ RUN ] LiteInterpreterDirectTest.FindAndRunMethod
2023-01-11T23:16:33.5868854Z [ OK ] LiteInterpreterDirectTest.FindAndRunMethod (0 ms)
2023-01-11T23:16:33.5869410Z [ RUN ] LiteInterpreterDirectTest.RunMethodVariadic
2023-01-11T23:16:33.5874840Z [ OK ] LiteInterpreterDirectTest.RunMethodVariadic (0 ms)
2023-01-11T23:16:33.5875396Z [ RUN ] LiteInterpreterDirectTest.DuplicateSetState
2023-01-11T23:16:33.5879170Z [ OK ] LiteInterpreterDirectTest.DuplicateSetState (0 ms)
2023-01-11T23:16:33.5879798Z [ RUN ] LiteInterpreterDirectTest.OpNameExportFetchRootOperators
2023-01-11T23:16:33.5884098Z [ OK ] LiteInterpreterDirectTest.OpNameExportFetchRootOperators (0 ms)
2023-01-11T23:16:33.5884711Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsConv
2023-01-11T23:16:33.5897423Z [ OK ] LiteInterpreterDirectTest.DefaultArgsConv (1 ms)
2023-01-11T23:16:33.5897979Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsPinv
2023-01-11T23:16:33.5983820Z [ OK ] LiteInterpreterDirectTest.DefaultArgsPinv (8 ms)
2023-01-11T23:16:33.5984308Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsTensorinvSpecifyDefault
2023-01-11T23:16:33.5990220Z [ OK ] LiteInterpreterDirectTest.DefaultArgsTensorinvSpecifyDefault (0 ms)
2023-01-11T23:16:33.5990780Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsPinvWithOutArg
2023-01-11T23:16:33.6019564Z [ OK ] LiteInterpreterDirectTest.DefaultArgsPinvWithOutArg (2 ms)
2023-01-11T23:16:33.6020027Z [ RUN ] LiteInterpreterDirectTest.DefaultArgsWithOutArg
2023-01-11T23:16:33.6024095Z [ OK ] LiteInterpreterDirectTest.DefaultArgsWithOutArg (0 ms)
2023-01-11T23:16:33.6024623Z [ RUN ] LiteInterpreterDirectTest.TestExceptionStackWithTwoLevelModuleHierarchy
2023-01-11T23:16:33.6119584Z [ OK ] LiteInterpreterDirectTest.TestExceptionStackWithTwoLevelModuleHierarchy (9 ms)
2023-01-11T23:16:33.6120373Z [ RUN ] LiteInterpreterDirectTest.OperatorCacheDifferentiatesDefaultArgs
2023-01-11T23:16:33.6140562Z [ OK ] LiteInterpreterDirectTest.OperatorCacheDifferentiatesDefaultArgs (2 ms)
2023-01-11T23:16:33.6141058Z [----------] 29 tests from LiteInterpreterDirectTest (45 ms total)
2023-01-11T23:16:33.6141239Z
2023-01-11T23:16:33.6141402Z [----------] 7 tests from LiteTrainerTest
2023-01-11T23:16:33.6141706Z [ RUN ] LiteTrainerTest.Params
2023-01-11T23:16:33.6224224Z [ OK ] LiteTrainerTest.Params (8 ms)
2023-01-11T23:16:33.6224525Z [ RUN ] LiteTrainerTest.SGD
2023-01-11T23:16:33.6302632Z [ OK ] LiteTrainerTest.SGD (7 ms)
2023-01-11T23:16:33.6302970Z [ RUN ] LiteTrainerTest.SequentialSampler
2023-01-11T23:16:33.6303358Z [ OK ] LiteTrainerTest.SequentialSampler (0 ms)
2023-01-11T23:16:33.6303779Z [ RUN ] LiteTrainerTest.RandomSamplerReturnsIndicesInCorrectRange
2023-01-11T23:16:33.6304283Z [ OK ] LiteTrainerTest.RandomSamplerReturnsIndicesInCorrectRange (0 ms)
2023-01-11T23:16:33.6304779Z [ RUN ] LiteTrainerTest.RandomSamplerReturnsLessValuesForLastBatch
2023-01-11T23:16:33.6305276Z [ OK ] LiteTrainerTest.RandomSamplerReturnsLessValuesForLastBatch (0 ms)
2023-01-11T23:16:33.6305716Z [ RUN ] LiteTrainerTest.RandomSamplerResetsWell
2023-01-11T23:16:33.6306104Z [ OK ] LiteTrainerTest.RandomSamplerResetsWell (0 ms)
2023-01-11T23:16:33.6306531Z [ RUN ] LiteTrainerTest.RandomSamplerResetsWithNewSizeWell
2023-01-11T23:16:33.6306980Z [ OK ] LiteTrainerTest.RandomSamplerResetsWithNewSizeWell (0 ms)
2023-01-11T23:16:33.6307590Z [----------] 7 tests from LiteTrainerTest (16 ms total)
2023-01-11T23:16:33.6307765Z
2023-01-11T23:16:33.6307917Z [----------] 6 tests from MobileTest
2023-01-11T23:16:33.6308232Z [ RUN ] MobileTest.SaveLoadParametersEmpty
2023-01-11T23:16:33.6308786Z [ OK ] MobileTest.SaveLoadParametersEmpty (0 ms)
2023-01-11T23:16:33.6309158Z [ RUN ] MobileTest.SaveParametersDefaultsToZip
2023-01-11T23:16:33.6309602Z [ OK ] MobileTest.SaveParametersDefaultsToZip (0 ms)
2023-01-11T23:16:33.6310015Z [ RUN ] MobileTest.SaveParametersCanUseFlatbuffer
2023-01-11T23:16:33.6310419Z [ OK ] MobileTest.SaveParametersCanUseFlatbuffer (0 ms)
2023-01-11T23:16:33.6310899Z [ RUN ] MobileTest.SaveLoadParametersUsingFlatbuffers
2023-01-11T23:16:33.6311710Z [ OK ] MobileTest.SaveLoadParametersUsingFlatbuffers (0 ms)
2023-01-11T23:16:33.6312360Z [ RUN ] MobileTest.LoadParametersUnexpectedFormatShouldThrow
2023-01-11T23:16:33.6333215Z [ OK ] MobileTest.LoadParametersUnexpectedFormatShouldThrow (2 ms)
2023-01-11T23:16:33.6333667Z [ RUN ] MobileTest.LoadParametersEmptyDataShouldThrow
2023-01-11T23:16:33.6355558Z [ OK ] MobileTest.LoadParametersEmptyDataShouldThrow (2 ms)
2023-01-11T23:16:33.6356138Z [----------] 6 tests from MobileTest (4 ms total)
2023-01-11T23:16:33.6356356Z
2023-01-11T23:16:33.6356522Z [----------] 1 test from MemoryDAGTest
2023-01-11T23:16:33.6356874Z [ RUN ] MemoryDAGTest.Basic
2023-01-11T23:16:33.6357208Z [ OK ] MemoryDAGTest.Basic (0 ms)
2023-01-11T23:16:33.6357570Z [----------] 1 test from MemoryDAGTest (0 ms total)
2023-01-11T23:16:33.6357769Z
2023-01-11T23:16:33.6357973Z [----------] 1 test from InternedStringsTest
2023-01-11T23:16:33.6358285Z [ RUN ] InternedStringsTest.Basic
2023-01-11T23:16:33.6358597Z [ OK ] InternedStringsTest.Basic (0 ms)
2023-01-11T23:16:33.6358952Z [----------] 1 test from InternedStringsTest (0 ms total)
2023-01-11T23:16:33.6359149Z
2023-01-11T23:16:33.6359350Z [----------] 1 test from FromQualStringTest
2023-01-11T23:16:33.6359818Z [ RUN ] FromQualStringTest.Basic
2023-01-11T23:16:33.6360178Z [ OK ] FromQualStringTest.Basic (0 ms)
2023-01-11T23:16:33.6360520Z [----------] 1 test from FromQualStringTest (0 ms total)
2023-01-11T23:16:33.6360709Z
2023-01-11T23:16:33.6360905Z [----------] 1 test from THNNConvTest
2023-01-11T23:16:33.6361186Z [ RUN ] THNNConvTest.Basic
2023-01-11T23:16:33.6383926Z [ OK ] THNNConvTest.Basic (2 ms)
2023-01-11T23:16:33.6384335Z [----------] 1 test from THNNConvTest (2 ms total)
2023-01-11T23:16:33.6384539Z
2023-01-11T23:16:33.6384760Z [----------] 1 test from ATenNativeBatchNormTest
2023-01-11T23:16:33.6385172Z [ RUN ] ATenNativeBatchNormTest.Basic
2023-01-11T23:16:33.6401275Z [ OK ] ATenNativeBatchNormTest.Basic (1 ms)
2023-01-11T23:16:33.6401669Z [----------] 1 test from ATenNativeBatchNormTest (1 ms total)
2023-01-11T23:16:33.6401849Z
2023-01-11T23:16:33.6402023Z [----------] 2 tests from CustomFusionTest
2023-01-11T23:16:33.6402331Z [ RUN ] CustomFusionTest.Basic
2023-01-11T23:16:33.6402797Z [ OK ] CustomFusionTest.Basic (0 ms)
2023-01-11T23:16:33.6403111Z [ RUN ] CustomFusionTest.NestedBlocks
2023-01-11T23:16:33.6404098Z [ OK ] CustomFusionTest.NestedBlocks (0 ms)
2023-01-11T23:16:33.6404458Z [----------] 2 tests from CustomFusionTest (0 ms total)
2023-01-11T23:16:33.6404629Z
2023-01-11T23:16:33.6404787Z [----------] 1 test from ControlFlowTest
2023-01-11T23:16:33.6405075Z [ RUN ] ControlFlowTest.Basic
2023-01-11T23:16:33.6411239Z [ OK ] ControlFlowTest.Basic (0 ms)
2023-01-11T23:16:33.6411702Z [----------] 1 test from ControlFlowTest (0 ms total)
2023-01-11T23:16:33.6411924Z
2023-01-11T23:16:33.6412108Z [----------] 1 test from ProtoTest
2023-01-11T23:16:33.6412445Z [ RUN ] ProtoTest.Basic
2023-01-11T23:16:33.6412803Z [ OK ] ProtoTest.Basic (0 ms)
2023-01-11T23:16:33.6413199Z [----------] 1 test from ProtoTest (0 ms total)
2023-01-11T23:16:33.6413381Z
2023-01-11T23:16:33.6413589Z [----------] 9 tests from SchemaParserTest
2023-01-11T23:16:33.6414015Z [ RUN ] SchemaParserTest.NestedArrays
2023-01-11T23:16:33.6414431Z [ OK ] SchemaParserTest.NestedArrays (0 ms)
2023-01-11T23:16:33.6414966Z [ RUN ] SchemaParserTest.OutVariant
2023-01-11T23:16:33.6415290Z [ OK ] SchemaParserTest.OutVariant (0 ms)
2023-01-11T23:16:33.6415629Z [ RUN ] SchemaParserTest.NamedReturns
2023-01-11T23:16:33.6416047Z [ OK ] SchemaParserTest.NamedReturns (0 ms)
2023-01-11T23:16:33.6416392Z [ RUN ] SchemaParserTest.Futures
2023-01-11T23:16:33.6416765Z [ OK ] SchemaParserTest.Futures (0 ms)
2023-01-11T23:16:33.6417154Z [ RUN ] SchemaParserTest.AnnotatedAliasSets
2023-01-11T23:16:33.6417543Z [ OK ] SchemaParserTest.AnnotatedAliasSets (0 ms)
2023-01-11T23:16:33.6418078Z [ RUN ] SchemaParserTest.TensorListAnnotatedAliasSets
2023-01-11T23:16:33.6418627Z [ OK ] SchemaParserTest.TensorListAnnotatedAliasSets (0 ms)
2023-01-11T23:16:33.6419060Z [ RUN ] SchemaParserTest.AnnotatedAliasWithoutBeforeSet
2023-01-11T23:16:33.6419490Z [ OK ] SchemaParserTest.AnnotatedAliasWithoutBeforeSet (0 ms)
2023-01-11T23:16:33.6419877Z [ RUN ] SchemaParserTest.BeforeAfterSets
2023-01-11T23:16:33.6420231Z [ OK ] SchemaParserTest.BeforeAfterSets (0 ms)
2023-01-11T23:16:33.6420577Z [ RUN ] SchemaParserTest.BeforeAfterSets2
2023-01-11T23:16:33.6420939Z [ OK ] SchemaParserTest.BeforeAfterSets2 (0 ms)
2023-01-11T23:16:33.6421301Z [----------] 9 tests from SchemaParserTest (0 ms total)
2023-01-11T23:16:33.6421470Z
2023-01-11T23:16:33.6421728Z [----------] 2 tests from TopologicalIndexTest
2023-01-11T23:16:33.6422064Z [ RUN ] TopologicalIndexTest.Basic
2023-01-11T23:16:33.6422439Z [ OK ] TopologicalIndexTest.Basic (0 ms)
2023-01-11T23:16:33.6422785Z [ RUN ] TopologicalIndexTest.Reindex
2023-01-11T23:16:33.6423189Z [ OK ] TopologicalIndexTest.Reindex (0 ms)
2023-01-11T23:16:33.6423553Z [----------] 2 tests from TopologicalIndexTest (0 ms total)
2023-01-11T23:16:33.6423730Z
2023-01-11T23:16:33.6423900Z [----------] 7 tests from RecordFunctionTest
2023-01-11T23:16:33.6424258Z [ RUN ] RecordFunctionTest.TracedTestInputsOutputs
2023-01-11T23:16:33.6424665Z [ OK ] RecordFunctionTest.TracedTestInputsOutputs (0 ms)
2023-01-11T23:16:33.6425053Z [ RUN ] RecordFunctionTest.SampledCallbacks
2023-01-11T23:16:33.6499564Z [ OK ] RecordFunctionTest.SampledCallbacks (7 ms)
2023-01-11T23:16:33.6500168Z [ RUN ] RecordFunctionTest.RecordFunctionGuard
2023-01-11T23:16:33.6500655Z [ OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
2023-01-11T23:16:33.6501011Z [ RUN ] RecordFunctionTest.Callbacks
2023-01-11T23:16:33.6503098Z [ OK ] RecordFunctionTest.Callbacks (0 ms)
2023-01-11T23:16:33.6503512Z [ RUN ] RecordFunctionTest.ShouldRun
2023-01-11T23:16:33.6503857Z [ OK ] RecordFunctionTest.ShouldRun (0 ms)
2023-01-11T23:16:33.6504173Z [ RUN ] RecordFunctionTest.Basic
2023-01-11T23:16:33.6505045Z [ OK ] RecordFunctionTest.Basic (0 ms)
2023-01-11T23:16:33.6505551Z [ RUN ] RecordFunctionTest.OperatorNameOverload
2023-01-11T23:16:33.6506168Z [ OK ] RecordFunctionTest.OperatorNameOverload (0 ms)
2023-01-11T23:16:33.6506590Z [----------] 7 tests from RecordFunctionTest (8 ms total)
2023-01-11T23:16:33.6506777Z
2023-01-11T23:16:33.6507004Z [----------] 1 test from ThreadLocalDebugInfoTest
2023-01-11T23:16:33.6507473Z [ RUN ] ThreadLocalDebugInfoTest.Basic
2023-01-11T23:16:33.6507917Z [ OK ] ThreadLocalDebugInfoTest.Basic (0 ms)
2023-01-11T23:16:33.6508291Z [----------] 1 test from ThreadLocalDebugInfoTest (0 ms total)
2023-01-11T23:16:33.6508475Z
2023-01-11T23:16:33.6508651Z [----------] 1 test from TestSymIntArrayRef
2023-01-11T23:16:33.6508988Z [ RUN ] TestSymIntArrayRef.BasicConversion
2023-01-11T23:16:33.6509413Z [ OK ] TestSymIntArrayRef.BasicConversion (0 ms)
2023-01-11T23:16:33.6509902Z [----------] 1 test from TestSymIntArrayRef (0 ms total)
2023-01-11T23:16:33.6510128Z
2023-01-11T23:16:33.6510309Z [----------] 4 tests from TestSymInt
2023-01-11T23:16:33.6510778Z [ RUN ] TestSymInt.NarrowCopyWithSymbolicInt
2023-01-11T23:16:33.6511159Z [ OK ] TestSymInt.NarrowCopyWithSymbolicInt (0 ms)
2023-01-11T23:16:33.6511503Z [ RUN ] TestSymInt.NarrowCopy
2023-01-11T23:16:33.6511844Z [ OK ] TestSymInt.NarrowCopy (0 ms)
2023-01-11T23:16:33.6512175Z [ RUN ] TestSymInt.AddSymbolicInt
2023-01-11T23:16:33.6512559Z [ OK ] TestSymInt.AddSymbolicInt (0 ms)
2023-01-11T23:16:33.6512912Z [ RUN ] TestSymInt.TestSymIntToSymNodeDispatch
2023-01-11T23:16:33.6513303Z [ OK ] TestSymInt.TestSymIntToSymNodeDispatch (0 ms)
2023-01-11T23:16:33.6513659Z [----------] 4 tests from TestSymInt (0 ms total)
2023-01-11T23:16:33.6513816Z
2023-01-11T23:16:33.6513983Z [----------] 1 test from FallbackGraphsTest
2023-01-11T23:16:33.6514291Z [ RUN ] FallbackGraphsTest.Basic
2023-01-11T23:16:33.6519260Z [ OK ] FallbackGraphsTest.Basic (0 ms)
2023-01-11T23:16:33.6519633Z [----------] 1 test from FallbackGraphsTest (0 ms total)
2023-01-11T23:16:33.6519838Z
2023-01-11T23:16:33.6520055Z [----------] 1 test from NoneSchemaMatchTest
2023-01-11T23:16:33.6520473Z [ RUN ] NoneSchemaMatchTest.Basic
2023-01-11T23:16:33.6520798Z [ OK ] NoneSchemaMatchTest.Basic (0 ms)
2023-01-11T23:16:33.6521148Z [----------] 1 test from NoneSchemaMatchTest (0 ms total)
2023-01-11T23:16:33.6521321Z
2023-01-11T23:16:33.6521489Z [----------] 1 test from PassManagementTest
2023-01-11T23:16:33.6521794Z [ RUN ] PassManagementTest.Basic
2023-01-11T23:16:33.6522118Z [ OK ] PassManagementTest.Basic (0 ms)
2023-01-11T23:16:33.6522463Z [----------] 1 test from PassManagementTest (0 ms total)
2023-01-11T23:16:33.6522669Z
2023-01-11T23:16:33.6522885Z [----------] 5 tests from LoopPeelerTest
2023-01-11T23:16:33.6523233Z [ RUN ] LoopPeelerTest.NoInductionVariableUse
2023-01-11T23:16:33.6524967Z [ OK ] LoopPeelerTest.NoInductionVariableUse (0 ms)
2023-01-11T23:16:33.6525359Z [ RUN ] LoopPeelerTest.YesInductionVariableUse
2023-01-11T23:16:33.6528281Z [ OK ] LoopPeelerTest.YesInductionVariableUse (0 ms)
2023-01-11T23:16:33.6528684Z [ RUN ] LoopPeelerTest.LoopWithTerminationCondition
2023-01-11T23:16:33.6533027Z [ OK ] LoopPeelerTest.LoopWithTerminationCondition (0 ms)
2023-01-11T23:16:33.6533413Z [ RUN ] LoopPeelerTest.SimpleNestedLoops
2023-01-11T23:16:33.6539779Z [ OK ] LoopPeelerTest.SimpleNestedLoops (0 ms)
2023-01-11T23:16:33.6540144Z [ RUN ] LoopPeelerTest.SimpleNestedLoops2
2023-01-11T23:16:33.6548310Z [ OK ] LoopPeelerTest.SimpleNestedLoops2 (0 ms)
2023-01-11T23:16:33.6548679Z [----------] 5 tests from LoopPeelerTest (2 ms total)
2023-01-11T23:16:33.6548934Z
2023-01-11T23:16:33.6549075Z [----------] 1 test from JitTracing
2023-01-11T23:16:33.6549348Z [ RUN ] JitTracing.Basic
2023-01-11T23:16:33.6681478Z [ OK ] JitTracing.Basic (13 ms)
2023-01-11T23:16:33.6681806Z [----------] 1 test from JitTracing (13 ms total)
2023-01-11T23:16:33.6681954Z
2023-01-11T23:16:33.6682175Z [----------] 1 test from InsertAndEliminateRedundantGuardsTest
2023-01-11T23:16:33.6682580Z [ RUN ] InsertAndEliminateRedundantGuardsTest.Basic
2023-01-11T23:16:33.6687411Z [ OK ] InsertAndEliminateRedundantGuardsTest.Basic (0 ms)
2023-01-11T23:16:33.6687859Z [----------] 1 test from InsertAndEliminateRedundantGuardsTest (0 ms total)
2023-01-11T23:16:33.6688073Z
2023-01-11T23:16:33.6688238Z [----------] 1 test from InsertBailOutsTest
2023-01-11T23:16:33.6688544Z [ RUN ] InsertBailOutsTest.Basic
2023-01-11T23:16:33.6697274Z [ OK ] InsertBailOutsTest.Basic (0 ms)
2023-01-11T23:16:33.6697623Z [----------] 1 test from InsertBailOutsTest (0 ms total)
2023-01-11T23:16:33.6697795Z
2023-01-11T23:16:33.6697950Z [----------] 2 tests from ProfilerTest
2023-01-11T23:16:33.6698233Z [ RUN ] ProfilerTest.Basic
2023-01-11T23:16:33.6795995Z [ OK ] ProfilerTest.Basic (9 ms)
2023-01-11T23:16:33.6796450Z [ RUN ] ProfilerTest.OptionalProfiling
2023-01-11T23:16:33.6797466Z [ OK ] ProfilerTest.OptionalProfiling (0 ms)
2023-01-11T23:16:33.6797824Z [----------] 2 tests from ProfilerTest (10 ms total)
2023-01-11T23:16:33.6797987Z
2023-01-11T23:16:33.6798143Z [----------] 2 tests from CallStackTest
2023-01-11T23:16:33.6798418Z [ RUN ] CallStackTest.Basic
2023-01-11T23:16:33.6802930Z [ OK ] CallStackTest.Basic (0 ms)
2023-01-11T23:16:33.6803228Z [ RUN ] CallStackTest.Caching
2023-01-11T23:16:33.6806650Z [ OK ] CallStackTest.Caching (0 ms)
2023-01-11T23:16:33.6806995Z [----------] 2 tests from CallStackTest (0 ms total)
2023-01-11T23:16:33.6807155Z
2023-01-11T23:16:33.6807330Z [----------] 2 tests from InlinedCallStackTest
2023-01-11T23:16:33.6807773Z [ RUN ] InlinedCallStackTest.BlockAnnotation
2023-01-11T23:16:33.6813331Z [ OK ] InlinedCallStackTest.BlockAnnotation (0 ms)
2023-01-11T23:16:33.6813708Z [ RUN ] InlinedCallStackTest.SelfCallMethods
2023-01-11T23:16:33.6822932Z [ OK ] InlinedCallStackTest.SelfCallMethods (0 ms)
2023-01-11T23:16:33.6823353Z [----------] 2 tests from InlinedCallStackTest (1 ms total)
2023-01-11T23:16:33.6823575Z
2023-01-11T23:16:33.6823756Z [----------] 1 test from AutogradSymbolsTest
2023-01-11T23:16:33.6824131Z [ RUN ] AutogradSymbolsTest.Basic
2023-01-11T23:16:33.6824452Z [ OK ] AutogradSymbolsTest.Basic (0 ms)
2023-01-11T23:16:33.6824791Z [----------] 1 test from AutogradSymbolsTest (0 ms total)
2023-01-11T23:16:33.6824969Z
2023-01-11T23:16:33.6825159Z [----------] 1 test from DefaultArgTypeHintingTest
2023-01-11T23:16:33.6825502Z [ RUN ] DefaultArgTypeHintingTest.Basic
2023-01-11T23:16:33.6825853Z [ OK ] DefaultArgTypeHintingTest.Basic (0 ms)
2023-01-11T23:16:33.6826221Z [----------] 1 test from DefaultArgTypeHintingTest (0 ms total)
2023-01-11T23:16:33.6826412Z
2023-01-11T23:16:33.6826563Z [----------] 5 tests from FuturesTest
2023-01-11T23:16:33.6826839Z [ RUN ] FuturesTest.Basic
2023-01-11T23:16:33.6827115Z [ OK ] FuturesTest.Basic (0 ms)
2023-01-11T23:16:33.6827390Z [ RUN ] FuturesTest.Error
2023-01-11T23:16:33.6838050Z [ OK ] FuturesTest.Error (1 ms)
2023-01-11T23:16:33.6838334Z [ RUN ] FuturesTest.Then
2023-01-11T23:16:33.6839270Z [ OK ] FuturesTest.Then (0 ms)
2023-01-11T23:16:33.6839976Z [ RUN ] FuturesTest.CollectAll
2023-01-11T23:16:33.6840290Z [ OK ] FuturesTest.CollectAll (0 ms)
2023-01-11T23:16:33.6840627Z [ RUN ] FuturesTest.CollectAny
2023-01-11T23:16:33.6841071Z [ OK ] FuturesTest.CollectAny (0 ms)
2023-01-11T23:16:33.6841494Z [----------] 5 tests from FuturesTest (1 ms total)
2023-01-11T23:16:33.6841662Z
2023-01-11T23:16:33.6841845Z [----------] 1 test from TLSFutureCallbacksTest
2023-01-11T23:16:33.6842167Z [ RUN ] TLSFutureCallbacksTest.Basic
2023-01-11T23:16:33.6842864Z [ OK ] TLSFutureCallbacksTest.Basic (0 ms)
2023-01-11T23:16:33.6843385Z [----------] 1 test from TLSFutureCallbacksTest (0 ms total)
2023-01-11T23:16:33.6843649Z
2023-01-11T23:16:33.6844077Z [----------] 1 test from ProfilerDisableInCallbackTest
2023-01-11T23:16:33.6844500Z [ RUN ] ProfilerDisableInCallbackTest.Basic
2023-01-11T23:16:33.6845323Z [ OK ] ProfilerDisableInCallbackTest.Basic (0 ms)
2023-01-11T23:16:33.6845957Z [----------] 1 test from ProfilerDisableInCallbackTest (0 ms total)
2023-01-11T23:16:33.6846242Z
2023-01-11T23:16:33.6846427Z [----------] 2 tests from RecordDebugHandles
2023-01-11T23:16:33.6846746Z [ RUN ] RecordDebugHandles.Basic
2023-01-11T23:16:33.9639832Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9640929Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9641507Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9642004Z [ OK ] RecordDebugHandles.Basic (279 ms)
2023-01-11T23:16:33.9642342Z [ RUN ] RecordDebugHandles.ScopedCallbacks
2023-01-11T23:16:33.9642765Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9647796Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9648295Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9648890Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9652661Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9653156Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9653613Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:300] Completed Stage: Warm Up
2023-01-11T23:16:33.9658818Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:306] Completed Stage: Collection
2023-01-11T23:16:33.9659315Z STAGE:2023-01-11 23:16:33 19737:19737 ActivityProfilerController.cpp:310] Completed Stage: Post Processing
2023-01-11T23:16:33.9659734Z [ OK ] RecordDebugHandles.ScopedCallbacks (1 ms)
2023-01-11T23:16:33.9660108Z [----------] 2 tests from RecordDebugHandles (281 ms total)
2023-01-11T23:16:33.9660286Z
2023-01-11T23:16:33.9660457Z [----------] 1 test from IValueKWargsTest
2023-01-11T23:16:33.9660757Z [ RUN ] IValueKWargsTest.Basic
2023-01-11T23:16:33.9666054Z [ OK ] IValueKWargsTest.Basic (0 ms)
2023-01-11T23:16:33.9666416Z [----------] 1 test from IValueKWargsTest (0 ms total)
2023-01-11T23:16:33.9666590Z
2023-01-11T23:16:33.9666757Z [----------] 1 test from ComputeFlopsTest
2023-01-11T23:16:33.9667064Z [ RUN ] ComputeFlopsTest.Basic
2023-01-11T23:16:33.9667548Z [W util.cpp:501] Warning: Failed to compute flops for op aten::conv2d because both input and weight must be size 4. (function computeFlops)
2023-01-11T23:16:33.9668080Z [W util.cpp:516] Warning: Failed to compute flops for op aten::conv2d because stride must be size 2 and cannot be 0. (function computeFlops)
2023-01-11T23:16:33.9668685Z [W util.cpp:472] Warning: Calculating flops for aten::conv2d requires groups, padding, stride, dilation, input_size, and weight_size in saved arguments. (function computeFlops)
2023-01-11T23:16:33.9669213Z [W util.cpp:545] Warning: Calculating flops for aten::mm requires mat1_size and mat2_size in saved arguments. (function computeFlops)
2023-01-11T23:16:33.9669690Z [ OK ] ComputeFlopsTest.Basic (0 ms)
2023-01-11T23:16:33.9670025Z [----------] 1 test from ComputeFlopsTest (0 ms total)
2023-01-11T23:16:33.9670196Z
2023-01-11T23:16:33.9670348Z [----------] 1 test from TestConstant
2023-01-11T23:16:33.9670751Z [ RUN ] TestConstant.TensorGrad
2023-01-11T23:16:33.9671055Z [ OK ] TestConstant.TensorGrad (0 ms)
2023-01-11T23:16:33.9671389Z [----------] 1 test from TestConstant (0 ms total)
2023-01-11T23:16:33.9671550Z
2023-01-11T23:16:33.9671713Z [----------] 1 test from TestMutation
2023-01-11T23:16:33.9672043Z [ RUN ] TestMutation.Basic
2023-01-11T23:16:33.9672355Z [ OK ] TestMutation.Basic (0 ms)
2023-01-11T23:16:33.9672789Z [----------] 1 test from TestMutation (0 ms total)
2023-01-11T23:16:33.9672972Z
2023-01-11T23:16:33.9673183Z [----------] 1 test from TestInplaceToFunctionalActivation
2023-01-11T23:16:33.9673564Z [ RUN ] TestInplaceToFunctionalActivation.Basic
2023-01-11T23:16:33.9673955Z [ OK ] TestInplaceToFunctionalActivation.Basic (0 ms)
2023-01-11T23:16:33.9674375Z [----------] 1 test from TestInplaceToFunctionalActivation (0 ms total)
2023-01-11T23:16:33.9674579Z
2023-01-11T23:16:33.9674747Z [----------] 1 test from TestRegisterShapeOp
2023-01-11T23:16:33.9675053Z [ RUN ] TestRegisterShapeOp.Basic
2023-01-11T23:16:34.0818388Z [ OK ] TestRegisterShapeOp.Basic (114 ms)
2023-01-11T23:16:34.0819163Z [----------] 1 test from TestRegisterShapeOp (114 ms total)
2023-01-11T23:16:34.0819520Z
2023-01-11T23:16:34.0819944Z [----------] 1 test from TestFunctionalToInplaceActivation
2023-01-11T23:16:34.0821040Z [ RUN ] TestFunctionalToInplaceActivation.Basic
2023-01-11T23:16:34.0822146Z [ OK ] TestFunctionalToInplaceActivation.Basic (0 ms)
2023-01-11T23:16:34.0822994Z [----------] 1 test from TestFunctionalToInplaceActivation (0 ms total)
2023-01-11T23:16:34.0823399Z
2023-01-11T23:16:34.0823735Z [----------] 2 tests from TestFunctionExecutor
2023-01-11T23:16:34.0824440Z [ RUN ] TestFunctionExecutor.SimpleExecutorTest
2023-01-11T23:16:34.0825222Z [ OK ] TestFunctionExecutor.SimpleExecutorTest (0 ms)
2023-01-11T23:16:34.0825999Z [ RUN ] TestFunctionExecutor.RunDecompositionTest
2023-01-11T23:16:34.0839371Z [ OK ] TestFunctionExecutor.RunDecompositionTest (1 ms)
2023-01-11T23:16:34.0839807Z [----------] 2 tests from TestFunctionExecutor (2 ms total)
2023-01-11T23:16:34.0839982Z
2023-01-11T23:16:34.0840159Z [----------] 1 test from TestShapeGraphLinting
2023-01-11T23:16:34.0840493Z [ RUN ] TestShapeGraphLinting.Basic
2023-01-11T23:16:34.0843085Z [ OK ] TestShapeGraphLinting.Basic (0 ms)
2023-01-11T23:16:34.0843470Z [----------] 1 test from TestShapeGraphLinting (0 ms total)
2023-01-11T23:16:34.0843657Z
2023-01-11T23:16:34.0843802Z [----------] 1 test from Composed
2023-01-11T23:16:34.0844085Z [ RUN ] Composed.ComposedOp
2023-01-11T23:16:34.3429621Z [ OK ] Composed.ComposedOp (258 ms)
2023-01-11T23:16:34.3430628Z [----------] 1 test from Composed (258 ms total)
2023-01-11T23:16:34.3430980Z
2023-01-11T23:16:34.3431512Z [----------] 1 test from ConstantPropagation
2023-01-11T23:16:34.3432415Z [ RUN ] ConstantPropagation.CustomClassesCanBePropagated
2023-01-11T23:16:34.3434061Z [ OK ] ConstantPropagation.CustomClassesCanBePropagated (0 ms)
2023-01-11T23:16:34.3435018Z [----------] 1 test from ConstantPropagation (0 ms total)
2023-01-11T23:16:34.3435372Z
2023-01-11T23:16:34.3435712Z [----------] 19 tests from MobileTypeParserTest
2023-01-11T23:16:34.3436071Z [ RUN ] MobileTypeParserTest.Int
2023-01-11T23:16:34.3436401Z [ OK ] MobileTypeParserTest.Int (0 ms) 2023-01-11T23:16:34.3436793Z [ RUN ] MobileTypeParserTest.NestedContainersAnnotationStr 2023-01-11T23:16:34.3437259Z [ OK ] MobileTypeParserTest.NestedContainersAnnotationStr (0 ms) 2023-01-11T23:16:34.3437663Z [ RUN ] MobileTypeParserTest.TorchBindClass 2023-01-11T23:16:34.3438033Z [ OK ] MobileTypeParserTest.TorchBindClass (0 ms) 2023-01-11T23:16:34.3438419Z [ RUN ] MobileTypeParserTest.ListOfTorchBindClass 2023-01-11T23:16:34.3438820Z [ OK ] MobileTypeParserTest.ListOfTorchBindClass (0 ms) 2023-01-11T23:16:34.3439285Z [ RUN ] MobileTypeParserTest.NestedContainersAnnotationStrWithSpaces 2023-01-11T23:16:34.3439807Z [ OK ] MobileTypeParserTest.NestedContainersAnnotationStrWithSpaces (0 ms) 2023-01-11T23:16:34.3440223Z [ RUN ] MobileTypeParserTest.NamedTuple 2023-01-11T23:16:34.3440577Z [ OK ] MobileTypeParserTest.NamedTuple (0 ms) 2023-01-11T23:16:34.3440978Z [ RUN ] MobileTypeParserTest.DictNestedNamedTupleTypeList 2023-01-11T23:16:34.3441430Z [ OK ] MobileTypeParserTest.DictNestedNamedTupleTypeList (0 ms) 2023-01-11T23:16:34.3441889Z [ RUN ] MobileTypeParserTest.NamedTupleNestedNamedTupleTypeList 2023-01-11T23:16:34.3442372Z [ OK ] MobileTypeParserTest.NamedTupleNestedNamedTupleTypeList (0 ms) 2023-01-11T23:16:34.3442830Z [ RUN ] MobileTypeParserTest.NamedTupleNestedNamedTuple 2023-01-11T23:16:34.3443265Z [ OK ] MobileTypeParserTest.NamedTupleNestedNamedTuple (0 ms) 2023-01-11T23:16:34.3443638Z [ RUN ] MobileTypeParserTest.Empty 2023-01-11T23:16:34.3468316Z [ OK ] MobileTypeParserTest.Empty (3 ms) 2023-01-11T23:16:34.3469278Z [ RUN ] MobileTypeParserTest.TypoRaises 2023-01-11T23:16:34.3505197Z [ OK ] MobileTypeParserTest.TypoRaises (3 ms) 2023-01-11T23:16:34.3505901Z [ RUN ] MobileTypeParserTest.MismatchBracketRaises 2023-01-11T23:16:34.3542646Z [ OK ] MobileTypeParserTest.MismatchBracketRaises (3 ms) 2023-01-11T23:16:34.3543079Z [ RUN ] MobileTypeParserTest.MismatchBracketRaises2 2023-01-11T23:16:34.3581789Z [ OK ] MobileTypeParserTest.MismatchBracketRaises2 (3 ms) 2023-01-11T23:16:34.3582224Z [ RUN ] MobileTypeParserTest.DictWithoutValueRaises 2023-01-11T23:16:34.3615884Z [ OK ] MobileTypeParserTest.DictWithoutValueRaises (3 ms) 2023-01-11T23:16:34.3616363Z [ RUN ] MobileTypeParserTest.ListArgCountMismatchRaises 2023-01-11T23:16:34.3655913Z [ OK ] MobileTypeParserTest.ListArgCountMismatchRaises (3 ms) 2023-01-11T23:16:34.3656429Z [ RUN ] MobileTypeParserTest.DictArgCountMismatchRaises 2023-01-11T23:16:34.3685424Z [ OK ] MobileTypeParserTest.DictArgCountMismatchRaises (3 ms) 2023-01-11T23:16:34.3685994Z [ RUN ] MobileTypeParserTest.ValidTypeWithExtraStuffRaises 2023-01-11T23:16:34.3708316Z [ OK ] MobileTypeParserTest.ValidTypeWithExtraStuffRaises (2 ms) 2023-01-11T23:16:34.3709320Z [ RUN ] MobileTypeParserTest.NonIdentifierRaises 2023-01-11T23:16:34.3729404Z [ OK ] MobileTypeParserTest.NonIdentifierRaises (2 ms) 2023-01-11T23:16:34.3730462Z [ RUN ] MobileTypeParserTest.DictNestedNamedTupleTypeListRaises 2023-01-11T23:16:34.3773671Z [ OK ] MobileTypeParserTest.DictNestedNamedTupleTypeListRaises (4 ms) 2023-01-11T23:16:34.3774420Z [----------] 19 tests from MobileTypeParserTest (34 ms total) 2023-01-11T23:16:34.3774807Z 2023-01-11T23:16:34.3775038Z [----------] 14 tests from ModuleAPITest 2023-01-11T23:16:34.3775389Z [ RUN ] ModuleAPITest.MethodRunAsync 2023-01-11T23:16:34.3810649Z [ OK ] ModuleAPITest.MethodRunAsync (3 ms) 2023-01-11T23:16:34.3811391Z [ RUN ] ModuleAPITest.Clone 2023-01-11T23:16:34.3811983Z [ OK ] 
ModuleAPITest.Clone (0 ms) 2023-01-11T23:16:34.3812644Z [ RUN ] ModuleAPITest.CloneWithModuleInterface 2023-01-11T23:16:34.3817596Z [ OK ] ModuleAPITest.CloneWithModuleInterface (0 ms) 2023-01-11T23:16:34.3818113Z [ RUN ] ModuleAPITest.Copy 2023-01-11T23:16:34.3818489Z [ OK ] ModuleAPITest.Copy (0 ms) 2023-01-11T23:16:34.3818914Z [ RUN ] ModuleAPITest.DeepCopy 2023-01-11T23:16:34.3819343Z [ OK ] ModuleAPITest.DeepCopy (0 ms) 2023-01-11T23:16:34.3819678Z [ RUN ] ModuleAPITest.DeepCopyString 2023-01-11T23:16:34.3820017Z [ OK ] ModuleAPITest.DeepCopyString (0 ms) 2023-01-11T23:16:34.3820346Z [ RUN ] ModuleAPITest.DeepCopyEnum 2023-01-11T23:16:34.3820675Z [ OK ] ModuleAPITest.DeepCopyEnum (0 ms) 2023-01-11T23:16:34.3821026Z [ RUN ] ModuleAPITest.DeepCopyPreservesAliasing 2023-01-11T23:16:34.3821422Z [ OK ] ModuleAPITest.DeepCopyPreservesAliasing (0 ms) 2023-01-11T23:16:34.3821765Z [ RUN ] ModuleAPITest.Constants 2023-01-11T23:16:34.3822068Z [ OK ] ModuleAPITest.Constants (0 ms) 2023-01-11T23:16:34.3822378Z [ RUN ] ModuleAPITest.Parameters 2023-01-11T23:16:34.3822692Z [ OK ] ModuleAPITest.Parameters (0 ms) 2023-01-11T23:16:34.3822992Z [ RUN ] ModuleAPITest.Define 2023-01-11T23:16:34.3824883Z [ OK ] ModuleAPITest.Define (0 ms) 2023-01-11T23:16:34.3825248Z [ RUN ] ModuleAPITest.Freezing 2023-01-11T23:16:34.3855818Z [ OK ] ModuleAPITest.Freezing (3 ms) 2023-01-11T23:16:34.3856268Z [ RUN ] ModuleAPITest.OfiFreezesTraining 2023-01-11T23:16:34.3888648Z [ OK ] ModuleAPITest.OfiFreezesTraining (3 ms) 2023-01-11T23:16:34.3889370Z [ RUN ] ModuleAPITest.To_CUDA 2023-01-11T23:16:34.3890791Z [ OK ] ModuleAPITest.To_CUDA (0 ms) 2023-01-11T23:16:34.3891701Z [----------] 14 tests from ModuleAPITest (11 ms total) 2023-01-11T23:16:34.3892165Z 2023-01-11T23:16:34.3892646Z [----------] 6 tests from PeepholeOptimizeTest 2023-01-11T23:16:34.3893496Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot 2023-01-11T23:16:34.3894403Z [ OK ] PeepholeOptimizeTest.IsAndIsNot (0 ms) 2023-01-11T23:16:34.3895615Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot2 2023-01-11T23:16:34.3896141Z [ OK ] PeepholeOptimizeTest.IsAndIsNot2 (0 ms) 2023-01-11T23:16:34.3896532Z [ RUN ] PeepholeOptimizeTest.IsAndIsNot3 2023-01-11T23:16:34.3896878Z [ OK ] PeepholeOptimizeTest.IsAndIsNot3 (0 ms) 2023-01-11T23:16:34.3897238Z [ RUN ] PeepholeOptimizeTest.UnwrapOptional 2023-01-11T23:16:34.3897609Z [ OK ] PeepholeOptimizeTest.UnwrapOptional (0 ms) 2023-01-11T23:16:34.3897969Z [ RUN ] PeepholeOptimizeTest.UnwrapOptional2 2023-01-11T23:16:34.3898343Z [ OK ] PeepholeOptimizeTest.UnwrapOptional2 (0 ms) 2023-01-11T23:16:34.3898698Z [ RUN ] PeepholeOptimizeTest.AddMMFusion 2023-01-11T23:16:34.3899048Z [ OK ] PeepholeOptimizeTest.AddMMFusion (0 ms) 2023-01-11T23:16:34.3899409Z [----------] 6 tests from PeepholeOptimizeTest (0 ms total) 2023-01-11T23:16:34.3899589Z 2023-01-11T23:16:34.3899753Z [----------] 5 tests from QualifiedNameTest 2023-01-11T23:16:34.3900091Z [ RUN ] QualifiedNameTest.PrefixConstruction 2023-01-11T23:16:34.3900551Z [ OK ] QualifiedNameTest.PrefixConstruction (0 ms) 2023-01-11T23:16:34.3900918Z [ RUN ] QualifiedNameTest.DottedConstruction 2023-01-11T23:16:34.3901292Z [ OK ] QualifiedNameTest.DottedConstruction (0 ms) 2023-01-11T23:16:34.3901650Z [ RUN ] QualifiedNameTest.BadInputRaises 2023-01-11T23:16:34.3943770Z [ OK ] QualifiedNameTest.BadInputRaises (4 ms) 2023-01-11T23:16:34.3945089Z [ RUN ] QualifiedNameTest.Equality 2023-01-11T23:16:34.3946006Z [ OK ] QualifiedNameTest.Equality (0 ms) 2023-01-11T23:16:34.3946867Z [ RUN ] 
QualifiedNameTest.IsPrefixOf 2023-01-11T23:16:34.3947526Z [ OK ] QualifiedNameTest.IsPrefixOf (0 ms) 2023-01-11T23:16:34.3948231Z [----------] 5 tests from QualifiedNameTest (4 ms total) 2023-01-11T23:16:34.3948570Z 2023-01-11T23:16:34.3948895Z [----------] 7 tests from SerializationTest 2023-01-11T23:16:34.3949613Z [ RUN ] SerializationTest.ExtraFilesHookPreference 2023-01-11T23:16:34.3950851Z [W export_module.cpp:587] Warning: An extra files hook attempted to write metadata.json but this is already written in extra files and so will be skipped. This warning will only appear once per process. (function operator()) 2023-01-11T23:16:34.3952117Z [ OK ] SerializationTest.ExtraFilesHookPreference (0 ms) 2023-01-11T23:16:34.3952997Z [ RUN ] SerializationTest.ExtraFileHooksNoSecret 2023-01-11T23:16:34.3953871Z [ OK ] SerializationTest.ExtraFileHooksNoSecret (0 ms) 2023-01-11T23:16:34.3954738Z [ RUN ] SerializationTest.ExtraFileHooksWithSecret 2023-01-11T23:16:34.3955856Z [ OK ] SerializationTest.ExtraFileHooksWithSecret (0 ms) 2023-01-11T23:16:34.3956428Z [ RUN ] SerializationTest.TypeTags 2023-01-11T23:16:34.3956752Z [ OK ] SerializationTest.TypeTags (0 ms) 2023-01-11T23:16:34.3957101Z [ RUN ] SerializationTest.TestJitStream_CUDA 2023-01-11T23:16:34.4075869Z [ OK ] SerializationTest.TestJitStream_CUDA (11 ms) 2023-01-11T23:16:34.4076417Z [ RUN ] SerializationTest.ParentDirNotExist 2023-01-11T23:16:34.4108324Z [ OK ] SerializationTest.ParentDirNotExist (3 ms) 2023-01-11T23:16:34.4108876Z [ RUN ] SerializationTest.CalculateNecessaryArgsTest 2023-01-11T23:16:34.4109387Z [ OK ] SerializationTest.CalculateNecessaryArgsTest (0 ms) 2023-01-11T23:16:34.4109787Z [----------] 7 tests from SerializationTest (16 ms total) 2023-01-11T23:16:34.4109953Z 2023-01-11T23:16:34.4110125Z [----------] 3 tests from TestSourceRoundTrip 2023-01-11T23:16:34.4110472Z [ RUN ] TestSourceRoundTrip.UpsampleNearest2d 2023-01-11T23:16:34.4126316Z [ OK ] TestSourceRoundTrip.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.4126837Z [ RUN ] TestSourceRoundTrip.CheckAttrAccess 2023-01-11T23:16:34.4127309Z [ OK ] TestSourceRoundTrip.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.4127679Z [ RUN ] TestSourceRoundTrip.MethodInvocation 2023-01-11T23:16:34.4181249Z [ OK ] TestSourceRoundTrip.MethodInvocation (5 ms) 2023-01-11T23:16:34.4181774Z [----------] 3 tests from TestSourceRoundTrip (7 ms total) 2023-01-11T23:16:34.4182014Z 2023-01-11T23:16:34.4182170Z [----------] 1 test from TestSaveLoad 2023-01-11T23:16:34.4182493Z [ RUN ] TestSaveLoad.LoadWithoutDebugInfo 2023-01-11T23:16:34.4201938Z [ OK ] TestSaveLoad.LoadWithoutDebugInfo (2 ms) 2023-01-11T23:16:34.4202504Z [----------] 1 test from TestSaveLoad (2 ms total) 2023-01-11T23:16:34.4202738Z 2023-01-11T23:16:34.4203025Z [----------] 2 tests from FunctionSchemaIsAliasingTest 2023-01-11T23:16:34.4203466Z [ RUN ] FunctionSchemaIsAliasingTest.Basic 2023-01-11T23:16:34.4203950Z [ OK ] FunctionSchemaIsAliasingTest.Basic (0 ms) 2023-01-11T23:16:34.4204344Z [ RUN ] FunctionSchemaIsAliasingTest.InvalidArgument 2023-01-11T23:16:34.4225299Z [ OK ] FunctionSchemaIsAliasingTest.InvalidArgument (2 ms) 2023-01-11T23:16:34.4225731Z [----------] 2 tests from FunctionSchemaIsAliasingTest (2 ms total) 2023-01-11T23:16:34.4225930Z 2023-01-11T23:16:34.4226185Z [----------] 2 tests from FunctionSchemaIsMutableTest 2023-01-11T23:16:34.4226568Z [ RUN ] FunctionSchemaIsMutableTest.Basic 2023-01-11T23:16:34.4226926Z [ OK ] FunctionSchemaIsMutableTest.Basic (0 ms) 2023-01-11T23:16:34.4227296Z [ RUN ] 
FunctionSchemaIsMutableTest.InvalidArgument 2023-01-11T23:16:34.4257536Z [ OK ] FunctionSchemaIsMutableTest.InvalidArgument (3 ms) 2023-01-11T23:16:34.4257958Z [----------] 2 tests from FunctionSchemaIsMutableTest (3 ms total) 2023-01-11T23:16:34.4258153Z 2023-01-11T23:16:34.4258338Z [----------] 5 tests from SchemaInfoIsMutableTest 2023-01-11T23:16:34.4258662Z [ RUN ] SchemaInfoIsMutableTest.Basic 2023-01-11T23:16:34.4259417Z [ OK ] SchemaInfoIsMutableTest.Basic (0 ms) 2023-01-11T23:16:34.4259780Z [ RUN ] SchemaInfoIsMutableTest.InvalidArgument 2023-01-11T23:16:34.4293857Z [ OK ] SchemaInfoIsMutableTest.InvalidArgument (3 ms) 2023-01-11T23:16:34.4294243Z [ RUN ] SchemaInfoIsMutableTest.AliasingInputs 2023-01-11T23:16:34.4294961Z [ OK ] SchemaInfoIsMutableTest.AliasingInputs (0 ms) 2023-01-11T23:16:34.4295453Z [ RUN ] SchemaInfoIsMutableTest.InstanceNorm 2023-01-11T23:16:34.4295871Z [ OK ] SchemaInfoIsMutableTest.InstanceNorm (0 ms) 2023-01-11T23:16:34.4296226Z [ RUN ] SchemaInfoIsMutableTest.BatchNorm 2023-01-11T23:16:34.4296591Z [ OK ] SchemaInfoIsMutableTest.BatchNorm (0 ms) 2023-01-11T23:16:34.4296968Z [----------] 5 tests from SchemaInfoIsMutableTest (3 ms total) 2023-01-11T23:16:34.4297151Z 2023-01-11T23:16:34.4297352Z [----------] 2 tests from SchemaInfoIsNonDeterministicTest 2023-01-11T23:16:34.4297727Z [ RUN ] SchemaInfoIsNonDeterministicTest.Basic 2023-01-11T23:16:34.4298194Z [ OK ] SchemaInfoIsNonDeterministicTest.Basic (0 ms) 2023-01-11T23:16:34.4298583Z [ RUN ] SchemaInfoIsNonDeterministicTest.Dropout 2023-01-11T23:16:34.4298967Z [ OK ] SchemaInfoIsNonDeterministicTest.Dropout (0 ms) 2023-01-11T23:16:34.4299382Z [----------] 2 tests from SchemaInfoIsNonDeterministicTest (0 ms total) 2023-01-11T23:16:34.4299583Z 2023-01-11T23:16:34.4299773Z [----------] 3 tests from FunctionSchemaMayAliasTest 2023-01-11T23:16:34.4300107Z [ RUN ] FunctionSchemaMayAliasTest.Basic 2023-01-11T23:16:34.4300456Z [ OK ] FunctionSchemaMayAliasTest.Basic (0 ms) 2023-01-11T23:16:34.4300832Z [ RUN ] FunctionSchemaMayAliasTest.InvalidArgument 2023-01-11T23:16:34.4322873Z [ OK ] FunctionSchemaMayAliasTest.InvalidArgument (2 ms) 2023-01-11T23:16:34.4323353Z [ RUN ] FunctionSchemaMayAliasTest.Wildcard 2023-01-11T23:16:34.4323791Z [ OK ] FunctionSchemaMayAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4324289Z [----------] 3 tests from FunctionSchemaMayAliasTest (2 ms total) 2023-01-11T23:16:34.4324538Z 2023-01-11T23:16:34.4324809Z [----------] 7 tests from SchemaInfoMayAliasTest 2023-01-11T23:16:34.4325205Z [ RUN ] SchemaInfoMayAliasTest.AliasingInputs 2023-01-11T23:16:34.4325673Z [ OK ] SchemaInfoMayAliasTest.AliasingInputs (0 ms) 2023-01-11T23:16:34.4326060Z [ RUN ] SchemaInfoMayAliasTest.AliasingOutputs 2023-01-11T23:16:34.4326569Z [ OK ] SchemaInfoMayAliasTest.AliasingOutputs (0 ms) 2023-01-11T23:16:34.4327107Z [ RUN ] SchemaInfoMayAliasTest.AliasingInputOutput 2023-01-11T23:16:34.4327617Z [ OK ] SchemaInfoMayAliasTest.AliasingInputOutput (0 ms) 2023-01-11T23:16:34.4328112Z [ RUN ] SchemaInfoMayAliasTest.MultipleWildcardInputs 2023-01-11T23:16:34.4328728Z [ OK ] SchemaInfoMayAliasTest.MultipleWildcardInputs (0 ms) 2023-01-11T23:16:34.4329224Z [ RUN ] SchemaInfoMayAliasTest.MultipleNonWildcardInputs 2023-01-11T23:16:34.4329693Z [W schema_info.cpp:333] Warning: alias::a appears twice in same argument list which will make aliasing checks more conservative. 
(function operator()) 2023-01-11T23:16:34.4330174Z [ OK ] SchemaInfoMayAliasTest.MultipleNonWildcardInputs (0 ms) 2023-01-11T23:16:34.4330622Z [ RUN ] SchemaInfoMayAliasTest.MultipleNonWildcardOutputs 2023-01-11T23:16:34.4331050Z [W schema_info.cpp:333] Warning: alias::a appears twice in same argument list which will make aliasing checks more conservative. (function operator()) 2023-01-11T23:16:34.4331547Z [ OK ] SchemaInfoMayAliasTest.MultipleNonWildcardOutputs (0 ms) 2023-01-11T23:16:34.4331958Z [ RUN ] SchemaInfoMayAliasTest.MismatchingTypes 2023-01-11T23:16:34.4332350Z [ OK ] SchemaInfoMayAliasTest.MismatchingTypes (0 ms) 2023-01-11T23:16:34.4332750Z [----------] 7 tests from SchemaInfoMayAliasTest (0 ms total) 2023-01-11T23:16:34.4332931Z 2023-01-11T23:16:34.4333135Z [----------] 3 tests from FunctionSchemaMayContainAliasTest 2023-01-11T23:16:34.4333518Z [ RUN ] FunctionSchemaMayContainAliasTest.Basic 2023-01-11T23:16:34.4333927Z [ OK ] FunctionSchemaMayContainAliasTest.Basic (0 ms) 2023-01-11T23:16:34.4334330Z [ RUN ] FunctionSchemaMayContainAliasTest.Wildcard 2023-01-11T23:16:34.4335094Z [ OK ] FunctionSchemaMayContainAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4335744Z [ RUN ] FunctionSchemaMayContainAliasTest.InputAndOutputContainers 2023-01-11T23:16:34.4336334Z [ OK ] FunctionSchemaMayContainAliasTest.InputAndOutputContainers (0 ms) 2023-01-11T23:16:34.4336819Z [----------] 3 tests from FunctionSchemaMayContainAliasTest (0 ms total) 2023-01-11T23:16:34.4337024Z 2023-01-11T23:16:34.4337309Z [----------] 6 tests from SchemaInfoMayContainAliasTest 2023-01-11T23:16:34.4337741Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputsEqual 2023-01-11T23:16:34.4338220Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputsEqual (0 ms) 2023-01-11T23:16:34.4338821Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputsContained 2023-01-11T23:16:34.4339446Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputsContained (0 ms) 2023-01-11T23:16:34.4339932Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasOutputs 2023-01-11T23:16:34.4349662Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasOutputs (0 ms) 2023-01-11T23:16:34.4350178Z [ RUN ] SchemaInfoMayContainAliasTest.ContainAliasInputOutput 2023-01-11T23:16:34.4350741Z [ OK ] SchemaInfoMayContainAliasTest.ContainAliasInputOutput (0 ms) 2023-01-11T23:16:34.4351226Z [ RUN ] SchemaInfoMayContainAliasTest.InputAndOutputContainers 2023-01-11T23:16:34.4351721Z [ OK ] SchemaInfoMayContainAliasTest.InputAndOutputContainers (0 ms) 2023-01-11T23:16:34.4352154Z [ RUN ] SchemaInfoMayContainAliasTest.Wildcard 2023-01-11T23:16:34.4352537Z [ OK ] SchemaInfoMayContainAliasTest.Wildcard (0 ms) 2023-01-11T23:16:34.4352958Z [----------] 6 tests from SchemaInfoMayContainAliasTest (0 ms total) 2023-01-11T23:16:34.4353158Z 2023-01-11T23:16:34.4353329Z [----------] 2 tests from SchemaMatchingTest 2023-01-11T23:16:34.4353645Z [ RUN ] SchemaMatchingTest.VarType 2023-01-11T23:16:34.4353992Z [ OK ] SchemaMatchingTest.VarType (0 ms) 2023-01-11T23:16:34.4354319Z [ RUN ] SchemaMatchingTest.VarType2 2023-01-11T23:16:34.4354762Z [ OK ] SchemaMatchingTest.VarType2 (0 ms) 2023-01-11T23:16:34.4355116Z [----------] 2 tests from SchemaMatchingTest (0 ms total) 2023-01-11T23:16:34.4355291Z 2023-01-11T23:16:34.4355451Z [----------] 6 tests from StackOptTest 2023-01-11T23:16:34.4355771Z [ RUN ] StackOptTest.UseVariadicStack 2023-01-11T23:16:34.4422262Z [ OK ] StackOptTest.UseVariadicStack (7 ms) 2023-01-11T23:16:34.4422726Z [ RUN ] 
StackOptTest.UseVariadicStackReplaceMultiple 2023-01-11T23:16:34.4470437Z [ OK ] StackOptTest.UseVariadicStackReplaceMultiple (4 ms) 2023-01-11T23:16:34.4471019Z [ RUN ] StackOptTest.UseVariadicStackWithMultipleListUses 2023-01-11T23:16:34.4498373Z [ OK ] StackOptTest.UseVariadicStackWithMultipleListUses (2 ms) 2023-01-11T23:16:34.4498878Z [ RUN ] StackOptTest.UseVariadicStackWithListMutationAfterCat 2023-01-11T23:16:34.4535402Z [ OK ] StackOptTest.UseVariadicStackWithListMutationAfterCat (3 ms) 2023-01-11T23:16:34.4535991Z [ RUN ] StackOptTest.UseVariadicStackWithListMutationBeforeCat 2023-01-11T23:16:34.4584533Z [ OK ] StackOptTest.UseVariadicStackWithListMutationBeforeCat (4 ms) 2023-01-11T23:16:34.4585051Z [ RUN ] StackOptTest.UseVariadicStackWithMultipleListMutations 2023-01-11T23:16:34.4671288Z [ OK ] StackOptTest.UseVariadicStackWithMultipleListMutations (8 ms) 2023-01-11T23:16:34.4671856Z [----------] 6 tests from StackOptTest (33 ms total) 2023-01-11T23:16:34.4672068Z 2023-01-11T23:16:34.4672314Z [----------] 16 tests from SubgraphMatcherTest 2023-01-11T23:16:34.4672696Z [ RUN ] SubgraphMatcherTest.Trivial1 2023-01-11T23:16:34.4673037Z [ OK ] SubgraphMatcherTest.Trivial1 (0 ms) 2023-01-11T23:16:34.4673360Z [ RUN ] SubgraphMatcherTest.Trivial2 2023-01-11T23:16:34.4673713Z [ OK ] SubgraphMatcherTest.Trivial2 (0 ms) 2023-01-11T23:16:34.4674050Z [ RUN ] SubgraphMatcherTest.Trivial3 2023-01-11T23:16:34.4674382Z [ OK ] SubgraphMatcherTest.Trivial3 (0 ms) 2023-01-11T23:16:34.4675041Z [ RUN ] SubgraphMatcherTest.Trivial4 2023-01-11T23:16:34.4675464Z [ OK ] SubgraphMatcherTest.Trivial4 (0 ms) 2023-01-11T23:16:34.4675788Z [ RUN ] SubgraphMatcherTest.Linear1 2023-01-11T23:16:34.4676241Z [ OK ] SubgraphMatcherTest.Linear1 (0 ms) 2023-01-11T23:16:34.4676571Z [ RUN ] SubgraphMatcherTest.Linear2 2023-01-11T23:16:34.4676898Z [ OK ] SubgraphMatcherTest.Linear2 (0 ms) 2023-01-11T23:16:34.4677217Z [ RUN ] SubgraphMatcherTest.Diamond1 2023-01-11T23:16:34.4677598Z [ OK ] SubgraphMatcherTest.Diamond1 (0 ms) 2023-01-11T23:16:34.4677968Z [ RUN ] SubgraphMatcherTest.Diamond2 2023-01-11T23:16:34.4678319Z [ OK ] SubgraphMatcherTest.Diamond2 (0 ms) 2023-01-11T23:16:34.4678708Z [ RUN ] SubgraphMatcherTest.XPattern 2023-01-11T23:16:34.4679047Z [ OK ] SubgraphMatcherTest.XPattern (0 ms) 2023-01-11T23:16:34.4679408Z [ RUN ] SubgraphMatcherTest.MultipleMatches 2023-01-11T23:16:34.4679769Z [ OK ] SubgraphMatcherTest.MultipleMatches (0 ms) 2023-01-11T23:16:34.4680148Z [ RUN ] SubgraphMatcherTest.OverlappingMatches 2023-01-11T23:16:34.4680543Z [ OK ] SubgraphMatcherTest.OverlappingMatches (0 ms) 2023-01-11T23:16:34.4680927Z [ RUN ] SubgraphMatcherTest.MatchInBasicBlocks1 2023-01-11T23:16:34.4681321Z [ OK ] SubgraphMatcherTest.MatchInBasicBlocks1 (0 ms) 2023-01-11T23:16:34.4681711Z [ RUN ] SubgraphMatcherTest.MatchInBasicBlocks2 2023-01-11T23:16:34.4682102Z [ OK ] SubgraphMatcherTest.MatchInBasicBlocks2 (0 ms) 2023-01-11T23:16:34.4682480Z [ RUN ] SubgraphMatcherTest.MatchesAttributes 2023-01-11T23:16:34.4682953Z [ OK ] SubgraphMatcherTest.MatchesAttributes (0 ms) 2023-01-11T23:16:34.4683311Z [ RUN ] SubgraphMatcherTest.BadPattern 2023-01-11T23:16:34.4713042Z [ OK ] SubgraphMatcherTest.BadPattern (3 ms) 2023-01-11T23:16:34.4714114Z [ RUN ] SubgraphMatcherTest.MultiOutput 2023-01-11T23:16:34.4715023Z [ OK ] SubgraphMatcherTest.MultiOutput (0 ms) 2023-01-11T23:16:34.4715906Z [----------] 16 tests from SubgraphMatcherTest (4 ms total) 2023-01-11T23:16:34.4716115Z 2023-01-11T23:16:34.4716327Z [----------] 4 tests from 
SubgraphRewriterTest 2023-01-11T23:16:34.4716651Z [ RUN ] SubgraphRewriterTest.FilterMatch 2023-01-11T23:16:34.4717008Z [ OK ] SubgraphRewriterTest.FilterMatch (0 ms) 2023-01-11T23:16:34.4717470Z [ RUN ] SubgraphRewriterTest.FilterNoMatch 2023-01-11T23:16:34.4717968Z [ OK ] SubgraphRewriterTest.FilterNoMatch (0 ms) 2023-01-11T23:16:34.4718452Z [ RUN ] SubgraphRewriterTest.MultiOutput 2023-01-11T23:16:34.4718976Z [ OK ] SubgraphRewriterTest.MultiOutput (0 ms) 2023-01-11T23:16:34.4719326Z [ RUN ] SubgraphRewriterTest.OutputType 2023-01-11T23:16:34.4719677Z [ OK ] SubgraphRewriterTest.OutputType (0 ms) 2023-01-11T23:16:34.4720041Z [----------] 4 tests from SubgraphRewriterTest (0 ms total) 2023-01-11T23:16:34.4720216Z 2023-01-11T23:16:34.4720384Z [----------] 3 tests from SubgraphUtilsTest 2023-01-11T23:16:34.4720681Z [ RUN ] SubgraphUtilsTest.Basic 2023-01-11T23:16:34.4723866Z [ OK ] SubgraphUtilsTest.Basic (0 ms) 2023-01-11T23:16:34.4724281Z [ RUN ] SubgraphUtilsTest.MergeSubgraphs 2023-01-11T23:16:34.4726025Z [ OK ] SubgraphUtilsTest.MergeSubgraphs (0 ms) 2023-01-11T23:16:34.4726563Z [ RUN ] SubgraphUtilsTest.GraphName 2023-01-11T23:16:34.4727044Z [ OK ] SubgraphUtilsTest.GraphName (0 ms) 2023-01-11T23:16:34.4727535Z [----------] 3 tests from SubgraphUtilsTest (0 ms total) 2023-01-11T23:16:34.4727790Z 2023-01-11T23:16:34.4727955Z [----------] 8 tests from UnionTypeTest 2023-01-11T23:16:34.4728399Z [ RUN ] UnionTypeTest.UnionOperatorEquals 2023-01-11T23:16:34.4728763Z [ OK ] UnionTypeTest.UnionOperatorEquals (0 ms) 2023-01-11T23:16:34.4729150Z [ RUN ] UnionTypeTest.UnionCreate_OptionalT1AndOptionalT2 2023-01-11T23:16:34.4729551Z [ OK ] UnionTypeTest.UnionCreate_OptionalT1AndOptionalT2 (0 ms) 2023-01-11T23:16:34.4729934Z [ RUN ] UnionTypeTest.UnionCreate_OptionalTAndT 2023-01-11T23:16:34.4730297Z [ OK ] UnionTypeTest.UnionCreate_OptionalTAndT (0 ms) 2023-01-11T23:16:34.4730690Z [ RUN ] UnionTypeTest.UnionCreate_TupleWithSubtypingRelationship 2023-01-11T23:16:34.4731126Z [ OK ] UnionTypeTest.UnionCreate_TupleWithSubtypingRelationship (0 ms) 2023-01-11T23:16:34.4731522Z [ RUN ] UnionTypeTest.UnionCreate_ContainerTAndT 2023-01-11T23:16:34.4731890Z [ OK ] UnionTypeTest.UnionCreate_ContainerTAndT (0 ms) 2023-01-11T23:16:34.4732305Z [ RUN ] UnionTypeTest.UnionCreate_OptionalContainerTAndContainerTAndT 2023-01-11T23:16:34.4732762Z [ OK ] UnionTypeTest.UnionCreate_OptionalContainerTAndContainerTAndT (0 ms) 2023-01-11T23:16:34.4733153Z [ RUN ] UnionTypeTest.Subtyping_NumberType 2023-01-11T23:16:34.4733491Z [ OK ] UnionTypeTest.Subtyping_NumberType (0 ms) 2023-01-11T23:16:34.4733842Z [ RUN ] UnionTypeTest.Subtyping_OptionalType 2023-01-11T23:16:34.4734193Z [ OK ] UnionTypeTest.Subtyping_OptionalType (0 ms) 2023-01-11T23:16:34.4734729Z [----------] 8 tests from UnionTypeTest (0 ms total) 2023-01-11T23:16:34.4734889Z 2023-01-11T23:16:34.4735124Z [----------] 2 tests from ScriptProfileTest 2023-01-11T23:16:34.4735433Z [ RUN ] ScriptProfileTest.Basic 2023-01-11T23:16:34.4735748Z [ OK ] ScriptProfileTest.Basic (0 ms) 2023-01-11T23:16:34.4736069Z [ RUN ] ScriptProfileTest.CallingOrder 2023-01-11T23:16:34.4760906Z [ OK ] ScriptProfileTest.CallingOrder (3 ms) 2023-01-11T23:16:34.4761393Z [----------] 2 tests from ScriptProfileTest (3 ms total) 2023-01-11T23:16:34.4761603Z 2023-01-11T23:16:34.4761814Z [----------] 7 tests from ShapeAnalysisTest 2023-01-11T23:16:34.4762193Z [ RUN ] ShapeAnalysisTest.DynamicShapesFusion 2023-01-11T23:16:34.4839518Z [ OK ] ShapeAnalysisTest.DynamicShapesFusion (7 ms) 
2023-01-11T23:16:34.4839949Z [ RUN ] ShapeAnalysisTest.MovingConstantOutOfFusionGroups 2023-01-11T23:16:34.4856401Z [ OK ] ShapeAnalysisTest.MovingConstantOutOfFusionGroups (1 ms) 2023-01-11T23:16:34.4856830Z [ RUN ] ShapeAnalysisTest.SymbolicShapeAPI 2023-01-11T23:16:34.4933714Z [ OK ] ShapeAnalysisTest.SymbolicShapeAPI (7 ms) 2023-01-11T23:16:34.4934108Z [ RUN ] ShapeAnalysisTest.BoundedSymbolicShapes 2023-01-11T23:16:34.4940968Z [ OK ] ShapeAnalysisTest.BoundedSymbolicShapes (0 ms) 2023-01-11T23:16:34.4941375Z [ RUN ] ShapeAnalysisTest.SymbolicShapeCaching 2023-01-11T23:16:34.4949201Z [ OK ] ShapeAnalysisTest.SymbolicShapeCaching (0 ms) 2023-01-11T23:16:34.4949603Z [ RUN ] ShapeAnalysisTest.ShapeCacheMultipleFns 2023-01-11T23:16:34.4983529Z [ OK ] ShapeAnalysisTest.ShapeCacheMultipleFns (3 ms) 2023-01-11T23:16:34.4983947Z [ RUN ] ShapeAnalysisTest.TestShapeMultipleReturns 2023-01-11T23:16:34.4998269Z [ OK ] ShapeAnalysisTest.TestShapeMultipleReturns (1 ms) 2023-01-11T23:16:34.4998683Z [----------] 7 tests from ShapeAnalysisTest (23 ms total) 2023-01-11T23:16:34.4998863Z 2023-01-11T23:16:34.4999014Z [----------] 5 tests from JitLoggingTest 2023-01-11T23:16:34.4999349Z [ RUN ] JitLoggingTest.CheckSetLoggingLevel 2023-01-11T23:16:34.4999717Z [ OK ] JitLoggingTest.CheckSetLoggingLevel (0 ms) 2023-01-11T23:16:34.5000177Z [ RUN ] JitLoggingTest.CheckSetMultipleLogLevels 2023-01-11T23:16:34.5000600Z [ OK ] JitLoggingTest.CheckSetMultipleLogLevels (0 ms) 2023-01-11T23:16:34.5000996Z [ RUN ] JitLoggingTest.CheckLoggingLevelAfterUnset 2023-01-11T23:16:34.5001402Z [ OK ] JitLoggingTest.CheckLoggingLevelAfterUnset (0 ms) 2023-01-11T23:16:34.5001784Z [ RUN ] JitLoggingTest.CheckAfterChangingLevel 2023-01-11T23:16:34.5002168Z [ OK ] JitLoggingTest.CheckAfterChangingLevel (0 ms) 2023-01-11T23:16:34.5002549Z [ RUN ] JitLoggingTest.CheckOutputStreamSetting 2023-01-11T23:16:34.5002938Z [ OK ] JitLoggingTest.CheckOutputStreamSetting (0 ms) 2023-01-11T23:16:34.5003307Z [----------] 5 tests from JitLoggingTest (0 ms total) 2023-01-11T23:16:34.5003470Z 2023-01-11T23:16:34.5003628Z [----------] 9 tests from FileFormatTest 2023-01-11T23:16:34.5003979Z [ RUN ] FileFormatTest.IdentifiesFlatbufferStream 2023-01-11T23:16:34.5004370Z [ OK ] FileFormatTest.IdentifiesFlatbufferStream (0 ms) 2023-01-11T23:16:34.5004746Z [ RUN ] FileFormatTest.IdentifiesZipStream 2023-01-11T23:16:34.5005110Z [ OK ] FileFormatTest.IdentifiesZipStream (0 ms) 2023-01-11T23:16:34.5005475Z [ RUN ] FileFormatTest.FlatbufferTakesPrecedence 2023-01-11T23:16:34.5005876Z [ OK ] FileFormatTest.FlatbufferTakesPrecedence (0 ms) 2023-01-11T23:16:34.5006309Z [ RUN ] FileFormatTest.HandlesUnknownStream 2023-01-11T23:16:34.5006673Z [ OK ] FileFormatTest.HandlesUnknownStream (0 ms) 2023-01-11T23:16:34.5007026Z [ RUN ] FileFormatTest.ShortStreamIsUnknown 2023-01-11T23:16:34.5007439Z [ OK ] FileFormatTest.ShortStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5007805Z [ RUN ] FileFormatTest.EmptyStreamIsUnknown 2023-01-11T23:16:34.5008168Z [ OK ] FileFormatTest.EmptyStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5008529Z [ RUN ] FileFormatTest.BadStreamIsUnknown 2023-01-11T23:16:34.5008888Z [ OK ] FileFormatTest.BadStreamIsUnknown (0 ms) 2023-01-11T23:16:34.5009279Z [ RUN ] FileFormatTest.StreamOffsetIsObservedAndRestored 2023-01-11T23:16:34.5009725Z [ OK ] FileFormatTest.StreamOffsetIsObservedAndRestored (0 ms) 2023-01-11T23:16:34.5010116Z [ RUN ] FileFormatTest.HandlesMissingFile 2023-01-11T23:16:34.5010479Z [ OK ] FileFormatTest.HandlesMissingFile (0 ms) 
2023-01-11T23:16:34.5010829Z [----------] 9 tests from FileFormatTest (0 ms total) 2023-01-11T23:16:34.5010988Z 2023-01-11T23:16:34.5011151Z [----------] 35 tests from FlatbufferTest 2023-01-11T23:16:34.5011479Z [ RUN ] FlatbufferTest.UpsampleNearest2d 2023-01-11T23:16:34.5012831Z [ OK ] FlatbufferTest.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.5013259Z [ RUN ] FlatbufferTest.UpsampleNearest2dWithCopyTensorMemory 2023-01-11T23:16:34.5023119Z [ OK ] FlatbufferTest.UpsampleNearest2dWithCopyTensorMemory (1 ms) 2023-01-11T23:16:34.5023528Z [ RUN ] FlatbufferTest.CheckAttrAccess 2023-01-11T23:16:34.5023865Z [ OK ] FlatbufferTest.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.5024204Z [ RUN ] FlatbufferTest.MethodInvocation 2023-01-11T23:16:34.5044438Z [ OK ] FlatbufferTest.MethodInvocation (2 ms) 2023-01-11T23:16:34.5044807Z [ RUN ] FlatbufferTest.FlatbufferBackPortTest 2023-01-11T23:16:34.5074501Z [ OK ] FlatbufferTest.FlatbufferBackPortTest (2 ms) 2023-01-11T23:16:34.5074863Z [ RUN ] FlatbufferTest.ExtraFiles 2023-01-11T23:16:34.5076918Z [ OK ] FlatbufferTest.ExtraFiles (0 ms) 2023-01-11T23:16:34.5077210Z [ RUN ] FlatbufferTest.Conv 2023-01-11T23:16:34.5107787Z [ OK ] FlatbufferTest.Conv (3 ms) 2023-01-11T23:16:34.5108218Z [ RUN ] FlatbufferTest.ConvWithCopyTensorMemory 2023-01-11T23:16:34.5138231Z [ OK ] FlatbufferTest.ConvWithCopyTensorMemory (3 ms) 2023-01-11T23:16:34.5138570Z [ RUN ] FlatbufferTest.Inline 2023-01-11T23:16:34.5144088Z [ OK ] FlatbufferTest.Inline (0 ms) 2023-01-11T23:16:34.5144450Z [ RUN ] FlatbufferTest.InlineWithCopyTensorMemory 2023-01-11T23:16:34.5150151Z [ OK ] FlatbufferTest.InlineWithCopyTensorMemory (0 ms) 2023-01-11T23:16:34.5150541Z [ RUN ] FlatbufferTest.Tuple 2023-01-11T23:16:34.5154825Z [ OK ] FlatbufferTest.Tuple (0 ms) 2023-01-11T23:16:34.5155124Z [ RUN ] FlatbufferTest.Dict 2023-01-11T23:16:34.5158915Z [ OK ] FlatbufferTest.Dict (0 ms) 2023-01-11T23:16:34.5159206Z [ RUN ] FlatbufferTest.Prim 2023-01-11T23:16:34.5162084Z [ OK ] FlatbufferTest.Prim (0 ms) 2023-01-11T23:16:34.5162394Z [ RUN ] FlatbufferTest.PrimScalar 2023-01-11T23:16:34.5165925Z [ OK ] FlatbufferTest.PrimScalar (0 ms) 2023-01-11T23:16:34.5166297Z [ RUN ] FlatbufferTest.WrongMethodName 2023-01-11T23:16:34.5200826Z [ OK ] FlatbufferTest.WrongMethodName (3 ms) 2023-01-11T23:16:34.5201152Z [ RUN ] FlatbufferTest.SetState 2023-01-11T23:16:34.5221931Z [ OK ] FlatbufferTest.SetState (2 ms) 2023-01-11T23:16:34.5222247Z [ RUN ] FlatbufferTest.BuiltinClass 2023-01-11T23:16:34.5227621Z [ OK ] FlatbufferTest.BuiltinClass (0 ms) 2023-01-11T23:16:34.5227962Z [ RUN ] FlatbufferTest.BuiltinFunction 2023-01-11T23:16:34.5229991Z [ OK ] FlatbufferTest.BuiltinFunction (0 ms) 2023-01-11T23:16:34.5230303Z [ RUN ] FlatbufferTest.Eval 2023-01-11T23:16:34.5236425Z [ OK ] FlatbufferTest.Eval (0 ms) 2023-01-11T23:16:34.5236752Z [ RUN ] FlatbufferTest.FindWrongMethodName 2023-01-11T23:16:34.5239339Z [ OK ] FlatbufferTest.FindWrongMethodName (0 ms) 2023-01-11T23:16:34.5239694Z [ RUN ] FlatbufferTest.FindAndRunMethod 2023-01-11T23:16:34.5246120Z [ OK ] FlatbufferTest.FindAndRunMethod (0 ms) 2023-01-11T23:16:34.5246460Z [ RUN ] FlatbufferTest.RunMethodVariadic 2023-01-11T23:16:34.5252441Z [ OK ] FlatbufferTest.RunMethodVariadic (0 ms) 2023-01-11T23:16:34.5252792Z [ RUN ] FlatbufferTest.DuplicateSetState 2023-01-11T23:16:34.5261674Z [ OK ] FlatbufferTest.DuplicateSetState (0 ms) 2023-01-11T23:16:34.5262075Z [ RUN ] FlatbufferTest.OpNameExportFetchRootOperators 2023-01-11T23:16:34.5267953Z [ OK ] 
FlatbufferTest.OpNameExportFetchRootOperators (0 ms) 2023-01-11T23:16:34.5268342Z [ RUN ] FlatbufferTest.DefaultArgsConv 2023-01-11T23:16:34.5285522Z [ OK ] FlatbufferTest.DefaultArgsConv (1 ms) 2023-01-11T23:16:34.5285863Z [ RUN ] FlatbufferTest.DefaultArgsPinv 2023-01-11T23:16:34.5427160Z [ OK ] FlatbufferTest.DefaultArgsPinv (13 ms) 2023-01-11T23:16:34.5427595Z [ RUN ] FlatbufferTest.DefaultArgsTensorinvSpecifyDefault 2023-01-11T23:16:34.5434712Z [ OK ] FlatbufferTest.DefaultArgsTensorinvSpecifyDefault (0 ms) 2023-01-11T23:16:34.5435144Z [ RUN ] FlatbufferTest.DefaultArgsPinvWithOutArg 2023-01-11T23:16:34.5462723Z [ OK ] FlatbufferTest.DefaultArgsPinvWithOutArg (2 ms) 2023-01-11T23:16:34.5463116Z [ RUN ] FlatbufferTest.DefaultArgsWithOutArg 2023-01-11T23:16:34.5472123Z [ OK ] FlatbufferTest.DefaultArgsWithOutArg (0 ms) 2023-01-11T23:16:34.5473019Z [ RUN ] FlatbufferTest.OperatorCacheDifferentiatesDefaultArgs 2023-01-11T23:16:34.5496059Z [ OK ] FlatbufferTest.OperatorCacheDifferentiatesDefaultArgs (2 ms) 2023-01-11T23:16:34.5496685Z [ RUN ] FlatbufferTest.OperatorSize1 2023-01-11T23:16:34.5498896Z [ OK ] FlatbufferTest.OperatorSize1 (0 ms) 2023-01-11T23:16:34.5499252Z [ RUN ] FlatbufferTest.BoolAndDoubleList 2023-01-11T23:16:34.5499606Z [ OK ] FlatbufferTest.BoolAndDoubleList (0 ms) 2023-01-11T23:16:34.5499939Z [ RUN ] FlatbufferTest.OperatorTest2 2023-01-11T23:16:34.5509637Z [ OK ] FlatbufferTest.OperatorTest2 (0 ms) 2023-01-11T23:16:34.5509990Z [ RUN ] FlatbufferTest.DetachedBufferSmoke 2023-01-11T23:16:34.5510355Z [ OK ] FlatbufferTest.DetachedBufferSmoke (0 ms) 2023-01-11T23:16:34.5510800Z [ RUN ] FlatbufferTest.DetachedBufferNullOwner 2023-01-11T23:16:34.5511190Z [ OK ] FlatbufferTest.DetachedBufferNullOwner (0 ms) 2023-01-11T23:16:34.5511562Z [----------] 35 tests from FlatbufferTest (51 ms total) 2023-01-11T23:16:34.5511728Z 2023-01-11T23:16:34.5511910Z [----------] 3 tests from TestSourceFlatbuffer 2023-01-11T23:16:34.5512256Z [ RUN ] TestSourceFlatbuffer.UpsampleNearest2d 2023-01-11T23:16:34.5527991Z [ OK ] TestSourceFlatbuffer.UpsampleNearest2d (1 ms) 2023-01-11T23:16:34.5528495Z [ RUN ] TestSourceFlatbuffer.CheckAttrAccess 2023-01-11T23:16:34.5528872Z [ OK ] TestSourceFlatbuffer.CheckAttrAccess (0 ms) 2023-01-11T23:16:34.5529237Z [ RUN ] TestSourceFlatbuffer.MethodInvocation 2023-01-11T23:16:34.5586117Z [ OK ] TestSourceFlatbuffer.MethodInvocation (5 ms) 2023-01-11T23:16:34.5586654Z [----------] 3 tests from TestSourceFlatbuffer (7 ms total) 2023-01-11T23:16:34.5586838Z 2023-01-11T23:16:34.5587254Z [----------] 10 tests from FlatbufferUpgraderTest 2023-01-11T23:16:34.5587688Z [ RUN ] FlatbufferUpgraderTest.DivTensorV2 2023-01-11T23:16:34.5588147Z [ OK ] FlatbufferUpgraderTest.DivTensorV2 (0 ms) 2023-01-11T23:16:34.5588584Z [ RUN ] FlatbufferUpgraderTest.DivTensorOutV2 2023-01-11T23:16:34.5588979Z [ OK ] FlatbufferUpgraderTest.DivTensorOutV2 (0 ms) 2023-01-11T23:16:34.5589486Z [ RUN ] FlatbufferUpgraderTest.DivTensorInplaceV2 2023-01-11T23:16:34.5589899Z [ OK ] FlatbufferUpgraderTest.DivTensorInplaceV2 (0 ms) 2023-01-11T23:16:34.5590424Z [ RUN ] FlatbufferUpgraderTest.DivScalarFloatV2 2023-01-11T23:16:34.5591034Z [ OK ] FlatbufferUpgraderTest.DivScalarFloatV2 (0 ms) 2023-01-11T23:16:34.5591572Z [ RUN ] FlatbufferUpgraderTest.DivScalarReciprocalFloatV2 2023-01-11T23:16:34.5592036Z [ OK ] FlatbufferUpgraderTest.DivScalarReciprocalFloatV2 (0 ms) 2023-01-11T23:16:34.5592519Z [ RUN ] FlatbufferUpgraderTest.DivScalarReciprocalIntV2 2023-01-11T23:16:34.5593102Z [ OK ] 
FlatbufferUpgraderTest.DivScalarReciprocalIntV2 (0 ms) 2023-01-11T23:16:34.5593615Z [ RUN ] FlatbufferUpgraderTest.DivScalarScalarV2 2023-01-11T23:16:34.5594145Z [ OK ] FlatbufferUpgraderTest.DivScalarScalarV2 (0 ms) 2023-01-11T23:16:34.5594557Z [ RUN ] FlatbufferUpgraderTest.DivScalarIntV2 2023-01-11T23:16:34.5594945Z [ OK ] FlatbufferUpgraderTest.DivScalarIntV2 (0 ms) 2023-01-11T23:16:34.5595346Z [ RUN ] FlatbufferUpgraderTest.DivScalarInplaceFloatV2 2023-01-11T23:16:34.5595769Z [ OK ] FlatbufferUpgraderTest.DivScalarInplaceFloatV2 (0 ms) 2023-01-11T23:16:34.5596186Z [ RUN ] FlatbufferUpgraderTest.DivScalarInplaceIntV2 2023-01-11T23:16:34.5596601Z [ OK ] FlatbufferUpgraderTest.DivScalarInplaceIntV2 (0 ms) 2023-01-11T23:16:34.5597008Z [----------] 10 tests from FlatbufferUpgraderTest (0 ms total) 2023-01-11T23:16:34.5597189Z 2023-01-11T23:16:34.5597345Z [----------] 759 tests from NVFuserTest 2023-01-11T23:16:34.5597662Z [ RUN ] NVFuserTest.FusionDefinition_CUDA 2023-01-11T23:16:34.5657114Z [ OK ] NVFuserTest.FusionDefinition_CUDA (6 ms) 2023-01-11T23:16:34.5657487Z [ RUN ] NVFuserTest.PyFusionCache_CUDA 2023-01-11T23:16:34.5771702Z [ OK ] NVFuserTest.PyFusionCache_CUDA (11 ms) 2023-01-11T23:16:34.5772132Z [ RUN ] NVFuserTest.RecordFunctorEquality_CUDA 2023-01-11T23:16:34.5772514Z [ OK ] NVFuserTest.RecordFunctorEquality_CUDA (0 ms) 2023-01-11T23:16:34.5772885Z [ RUN ] NVFuserTest.FusionIrGraphGenerator_CUDA 2023-01-11T23:16:34.5781369Z [ OK ] NVFuserTest.FusionIrGraphGenerator_CUDA (0 ms) 2023-01-11T23:16:34.5781860Z [ RUN ] NVFuserTest.FusionDispatch_CUDA 2023-01-11T23:16:34.5782274Z [ OK ] NVFuserTest.FusionDispatch_CUDA (0 ms) 2023-01-11T23:16:34.5782637Z [ RUN ] NVFuserTest.FusionExprEvalConstants_CUDA 2023-01-11T23:16:34.5783021Z [ OK ] NVFuserTest.FusionExprEvalConstants_CUDA (0 ms) 2023-01-11T23:16:34.5783407Z [ RUN ] NVFuserTest.FusionExprEvalDouble_CUDA 2023-01-11T23:16:34.5783775Z [ OK ] NVFuserTest.FusionExprEvalDouble_CUDA (0 ms) 2023-01-11T23:16:34.5784144Z [ RUN ] NVFuserTest.FusionExprEvalBindings_CUDA 2023-01-11T23:16:34.5811252Z [ OK ] NVFuserTest.FusionExprEvalBindings_CUDA (2 ms) 2023-01-11T23:16:34.5811669Z [ RUN ] NVFuserTest.FusionExprEvalBasic_CUDA 2023-01-11T23:16:34.5818675Z [ OK ] NVFuserTest.FusionExprEvalBasic_CUDA (0 ms) 2023-01-11T23:16:34.5819124Z [ RUN ] NVFuserTest.FusionExprEvalComplex_CUDA 2023-01-11T23:16:34.5819504Z [ OK ] NVFuserTest.FusionExprEvalComplex_CUDA (0 ms) 2023-01-11T23:16:34.5819960Z [ RUN ] NVFuserTest.FusionExprEvalPostLower_CUDA 2023-01-11T23:16:34.5855570Z [ OK ] NVFuserTest.FusionExprEvalPostLower_CUDA (3 ms) 2023-01-11T23:16:34.5856004Z [ RUN ] NVFuserTest.FusionKernelExprEvalConstants_CUDA 2023-01-11T23:16:34.5856436Z [ OK ] NVFuserTest.FusionKernelExprEvalConstants_CUDA (0 ms) 2023-01-11T23:16:34.5856844Z [ RUN ] NVFuserTest.FusionKernelExprEvalBindings_CUDA 2023-01-11T23:16:34.5884321Z [ OK ] NVFuserTest.FusionKernelExprEvalBindings_CUDA (2 ms) 2023-01-11T23:16:34.5884689Z [ RUN ] NVFuserTest.FusionClear_CUDA 2023-01-11T23:16:34.7691445Z [ OK ] NVFuserTest.FusionClear_CUDA (180 ms) 2023-01-11T23:16:34.7692131Z [ RUN ] NVFuserTest.FusionCopy_CUDA 2023-01-11T23:16:34.7786425Z [ OK ] NVFuserTest.FusionCopy_CUDA (9 ms) 2023-01-11T23:16:34.7788072Z [ RUN ] NVFuserTest.FusionMove_CUDA 2023-01-11T23:16:34.7838279Z [ OK ] NVFuserTest.FusionMove_CUDA (5 ms) 2023-01-11T23:16:34.7838650Z [ RUN ] NVFuserTest.FusionSimpleArith_CUDA 2023-01-11T23:16:34.7839011Z [ OK ] NVFuserTest.FusionSimpleArith_CUDA (0 ms) 2023-01-11T23:16:34.7839385Z [ RUN 
] NVFuserTest.FusionScalarTypePromote_CUDA 2023-01-11T23:16:34.7839773Z [ OK ] NVFuserTest.FusionScalarTypePromote_CUDA (0 ms) 2023-01-11T23:16:34.7840152Z [ RUN ] NVFuserTest.FusionComplexAbsTypes_CUDA 2023-01-11T23:16:34.7865894Z [ OK ] NVFuserTest.FusionComplexAbsTypes_CUDA (2 ms) 2023-01-11T23:16:34.7866273Z [ RUN ] NVFuserTest.FusionRegister_CUDA 2023-01-11T23:16:34.7866627Z [ OK ] NVFuserTest.FusionRegister_CUDA (0 ms) 2023-01-11T23:16:34.7867010Z [ RUN ] NVFuserTest.FusionTopoSort_CUDA 2023-01-11T23:16:34.7867344Z [ OK ] NVFuserTest.FusionTopoSort_CUDA (0 ms) 2023-01-11T23:16:34.7867685Z [ RUN ] NVFuserTest.FusionTensor_CUDA 2023-01-11T23:16:34.7868016Z [ OK ] NVFuserTest.FusionTensor_CUDA (0 ms) 2023-01-11T23:16:34.7868342Z [ RUN ] NVFuserTest.FusionFilterVals_CUDA 2023-01-11T23:16:34.7868861Z [ OK ] NVFuserTest.FusionFilterVals_CUDA (0 ms) 2023-01-11T23:16:34.7869205Z [ RUN ] NVFuserTest.FusionTVSplit_CUDA 2023-01-11T23:16:34.7869551Z [ OK ] NVFuserTest.FusionTVSplit_CUDA (0 ms) 2023-01-11T23:16:34.7869876Z [ RUN ] NVFuserTest.FusionTVMerge_CUDA 2023-01-11T23:16:34.7870214Z [ OK ] NVFuserTest.FusionTVMerge_CUDA (0 ms) 2023-01-11T23:16:34.7870661Z [ RUN ] NVFuserTest.FusionTVReorder_CUDA 2023-01-11T23:16:34.7871003Z [ OK ] NVFuserTest.FusionTVReorder_CUDA (0 ms) 2023-01-11T23:16:34.7871336Z [ RUN ] NVFuserTest.FusionEquality_CUDA 2023-01-11T23:16:34.7871678Z [ OK ] NVFuserTest.FusionEquality_CUDA (0 ms) 2023-01-11T23:16:34.7872013Z [ RUN ] NVFuserTest.FusionDependency_CUDA 2023-01-11T23:16:34.7872363Z [ OK ] NVFuserTest.FusionDependency_CUDA (0 ms) 2023-01-11T23:16:34.7872700Z [ RUN ] NVFuserTest.FusionParser_CUDA 2023-01-11T23:16:34.9679934Z [ OK ] NVFuserTest.FusionParser_CUDA (181 ms) 2023-01-11T23:16:34.9681381Z [ RUN ] NVFuserTest.FusionOuterSplit_CUDA 2023-01-11T23:16:35.1497269Z [ OK ] NVFuserTest.FusionOuterSplit_CUDA (181 ms) 2023-01-11T23:16:35.1497853Z [ RUN ] NVFuserTest.FusionCodeGen_CUDA 2023-01-11T23:16:35.3329368Z [ OK ] NVFuserTest.FusionCodeGen_CUDA (183 ms) 2023-01-11T23:16:35.3329945Z [ RUN ] NVFuserTest.FusionCodeGen2_CUDA 2023-01-11T23:16:35.5211775Z [ OK ] NVFuserTest.FusionCodeGen2_CUDA (188 ms) 2023-01-11T23:16:35.5212370Z [ RUN ] NVFuserTest.FusionSimplePWise_CUDA 2023-01-11T23:16:35.7042093Z [ OK ] NVFuserTest.FusionSimplePWise_CUDA (183 ms) 2023-01-11T23:16:35.7043227Z [ RUN ] NVFuserTest.FusionSimplePWiseDtypeComplex_CUDA 2023-01-11T23:16:35.8937026Z [ OK ] NVFuserTest.FusionSimplePWiseDtypeComplex_CUDA (189 ms) 2023-01-11T23:16:35.8937712Z [ RUN ] NVFuserTest.FusionExecKernel_CUDA 2023-01-11T23:16:36.0993617Z [ OK ] NVFuserTest.FusionExecKernel_CUDA (205 ms) 2023-01-11T23:16:36.0994062Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt1_CUDA 2023-01-11T23:16:36.3142123Z [ OK ] NVFuserTest.FusionAdvancedComputeAt1_CUDA (214 ms) 2023-01-11T23:16:36.3142956Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt2_CUDA 2023-01-11T23:16:36.5576750Z [ OK ] NVFuserTest.FusionAdvancedComputeAt2_CUDA (243 ms) 2023-01-11T23:16:36.5577528Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt3_CUDA 2023-01-11T23:16:36.8182527Z [ OK ] NVFuserTest.FusionAdvancedComputeAt3_CUDA (260 ms) 2023-01-11T23:16:36.8183415Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt4_CUDA 2023-01-11T23:16:37.1170848Z [ OK ] NVFuserTest.FusionAdvancedComputeAt4_CUDA (298 ms) 2023-01-11T23:16:37.1171689Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt5_CUDA 2023-01-11T23:16:37.3187026Z [ OK ] NVFuserTest.FusionAdvancedComputeAt5_CUDA (201 ms) 2023-01-11T23:16:37.3187524Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt6_CUDA 
2023-01-11T23:16:37.5188241Z [ OK ] NVFuserTest.FusionAdvancedComputeAt6_CUDA (200 ms) 2023-01-11T23:16:37.5188692Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt7_CUDA 2023-01-11T23:16:37.7307637Z [ OK ] NVFuserTest.FusionAdvancedComputeAt7_CUDA (211 ms) 2023-01-11T23:16:37.7308067Z [ RUN ] NVFuserTest.FusionAdvancedComputeAt8_CUDA 2023-01-11T23:16:37.9156484Z [ OK ] NVFuserTest.FusionAdvancedComputeAt8_CUDA (184 ms) 2023-01-11T23:16:37.9157141Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith1_CUDA 2023-01-11T23:16:38.1479810Z [ OK ] NVFuserTest.FusionAdvancedComputeWith1_CUDA (232 ms) 2023-01-11T23:16:38.1480260Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith2_CUDA 2023-01-11T23:16:38.4079363Z [ OK ] NVFuserTest.FusionAdvancedComputeWith2_CUDA (259 ms) 2023-01-11T23:16:38.4079827Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith3_CUDA 2023-01-11T23:16:38.6678113Z [ OK ] NVFuserTest.FusionAdvancedComputeWith3_CUDA (259 ms) 2023-01-11T23:16:38.6678564Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith4_CUDA 2023-01-11T23:16:38.9657229Z [ OK ] NVFuserTest.FusionAdvancedComputeWith4_CUDA (297 ms) 2023-01-11T23:16:38.9657735Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith5_CUDA 2023-01-11T23:16:39.1661624Z [ OK ] NVFuserTest.FusionAdvancedComputeWith5_CUDA (200 ms) 2023-01-11T23:16:39.1662526Z [ RUN ] NVFuserTest.FusionAdvancedComputeWith6_CUDA 2023-01-11T23:16:39.3659477Z [ OK ] NVFuserTest.FusionAdvancedComputeWith6_CUDA (199 ms) 2023-01-11T23:16:39.3659956Z [ RUN ] NVFuserTest.FusionComputeAtMultiConsumers_CUDA 2023-01-11T23:16:39.5378716Z [ OK ] NVFuserTest.FusionComputeAtMultiConsumers_CUDA (171 ms) 2023-01-11T23:16:39.5379230Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer1_CUDA 2023-01-11T23:16:39.7109687Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer1_CUDA (173 ms) 2023-01-11T23:16:39.7110196Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer2_CUDA 2023-01-11T23:16:39.9122725Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer2_CUDA (201 ms) 2023-01-11T23:16:39.9123205Z [ RUN ] NVFuserTest.FusionComputeAtCommonConsumer3_CUDA 2023-01-11T23:16:40.1222227Z [ OK ] NVFuserTest.FusionComputeAtCommonConsumer3_CUDA (209 ms) 2023-01-11T23:16:40.1223184Z [ RUN ] NVFuserTest.FusionComputeAtNoCommonConsumer_CUDA 2023-01-11T23:16:40.2967152Z [ OK ] NVFuserTest.FusionComputeAtNoCommonConsumer_CUDA (174 ms) 2023-01-11T23:16:40.2967763Z [ RUN ] NVFuserTest.FusionRootMappingBasic_CUDA 2023-01-11T23:16:40.2968186Z [ OK ] NVFuserTest.FusionRootMappingBasic_CUDA (0 ms) 2023-01-11T23:16:40.2968574Z [ RUN ] NVFuserTest.FusionRootMappingRfactor_CUDA 2023-01-11T23:16:40.2975409Z [ OK ] NVFuserTest.FusionRootMappingRfactor_CUDA (0 ms) 2023-01-11T23:16:40.2976265Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency1_CUDA 2023-01-11T23:16:40.2977146Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency1_CUDA (0 ms) 2023-01-11T23:16:40.2977810Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency2_CUDA 2023-01-11T23:16:40.2978270Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency2_CUDA (0 ms) 2023-01-11T23:16:40.2978731Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency3_CUDA 2023-01-11T23:16:40.2980556Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency3_CUDA (0 ms) 2023-01-11T23:16:40.2981021Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency4_CUDA 2023-01-11T23:16:40.2985619Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency4_CUDA (0 ms) 2023-01-11T23:16:40.2986122Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency5_CUDA_CUDA 2023-01-11T23:16:40.2989556Z [ OK ] 
NVFuserTest.FusionRootMappingReductionDependency5_CUDA_CUDA (0 ms) 2023-01-11T23:16:40.2990124Z [ RUN ] NVFuserTest.FusionRootMappingReductionDependency6_CUDA_CUDA 2023-01-11T23:16:40.2997615Z [ OK ] NVFuserTest.FusionRootMappingReductionDependency6_CUDA_CUDA (0 ms) 2023-01-11T23:16:40.2998207Z [ RUN ] NVFuserTest.FusionRootMappingMultipleBroadcastWithNoCommonConsumer_CUDA 2023-01-11T23:16:40.2998820Z [ OK ] NVFuserTest.FusionRootMappingMultipleBroadcastWithNoCommonConsumer_CUDA (0 ms) 2023-01-11T23:16:40.2999344Z [ RUN ] NVFuserTest.FusionRootMappingBroadcastNonUniqueSize_CUDA 2023-01-11T23:16:40.3000373Z [ OK ] NVFuserTest.FusionRootMappingBroadcastNonUniqueSize_CUDA (0 ms) 2023-01-11T23:16:40.3000825Z [ RUN ] NVFuserTest.FusionRootMappingBroadcast_CUDA 2023-01-11T23:16:40.3001234Z [ OK ] NVFuserTest.FusionRootMappingBroadcast_CUDA (0 ms) 2023-01-11T23:16:40.3001646Z [ RUN ] NVFuserTest.FusionRootMappingTrivialReduction_CUDA 2023-01-11T23:16:40.4740307Z [ OK ] NVFuserTest.FusionRootMappingTrivialReduction_CUDA (173 ms) 2023-01-11T23:16:40.4741566Z [ RUN ] NVFuserTest.FusionRootMappingRepro1950_CUDA 2023-01-11T23:16:40.4742535Z [ OK ] NVFuserTest.FusionRootMappingRepro1950_CUDA (0 ms) 2023-01-11T23:16:40.4743464Z [ RUN ] NVFuserTest.FusionDetectSelfMappedDomains_CUDA 2023-01-11T23:16:40.4755867Z [ OK ] NVFuserTest.FusionDetectSelfMappedDomains_CUDA (1 ms) 2023-01-11T23:16:40.4756381Z [ RUN ] NVFuserTest.FusionScalarInputs_CUDA 2023-01-11T23:16:40.6890942Z [ OK ] NVFuserTest.FusionScalarInputs_CUDA (213 ms) 2023-01-11T23:16:40.6891724Z [ RUN ] NVFuserTest.FusionLoopUnroll_CUDA 2023-01-11T23:16:40.9147962Z [ OK ] NVFuserTest.FusionLoopUnroll_CUDA (225 ms) 2023-01-11T23:16:40.9148434Z [ RUN ] NVFuserTest.FusionUnaryOps_CUDA 2023-01-11T23:16:51.4682647Z [W Copy.cpp:276] Warning: Casting complex values to real discards the imaginary part (function operator()) 2023-01-11T23:16:56.7834009Z [ OK ] NVFuserTest.FusionUnaryOps_CUDA (15868 ms) 2023-01-11T23:16:56.7834511Z [ RUN ] NVFuserTest.FusionBinaryOps_CUDA 2023-01-11T23:17:05.0885093Z [ OK ] NVFuserTest.FusionBinaryOps_CUDA (8305 ms) 2023-01-11T23:17:05.0885512Z [ RUN ] NVFuserTest.FusionTernaryOps_CUDA 2023-01-11T23:17:06.4424365Z [ OK ] NVFuserTest.FusionTernaryOps_CUDA (1353 ms) 2023-01-11T23:17:06.4425076Z [ RUN ] NVFuserTest.FusionCompoundOps_CUDA 2023-01-11T23:17:07.9387583Z [ OK ] NVFuserTest.FusionCompoundOps_CUDA (1496 ms) 2023-01-11T23:17:07.9387989Z [ RUN ] NVFuserTest.FusionCastOps_CUDA 2023-01-11T23:17:08.1073864Z [ OK ] NVFuserTest.FusionCastOps_CUDA (168 ms) 2023-01-11T23:17:08.1074681Z [ RUN ] NVFuserTest.FusionReduction1_CUDA 2023-01-11T23:17:08.3095523Z [ OK ] NVFuserTest.FusionReduction1_CUDA (202 ms) 2023-01-11T23:17:08.3095929Z [ RUN ] NVFuserTest.FusionReduction2_CUDA 2023-01-11T23:17:08.5123733Z [ OK ] NVFuserTest.FusionReduction2_CUDA (202 ms) 2023-01-11T23:17:08.5124128Z [ RUN ] NVFuserTest.FusionReduction3_CUDA 2023-01-11T23:17:08.6999406Z [ OK ] NVFuserTest.FusionReduction3_CUDA (187 ms) 2023-01-11T23:17:08.6999826Z [ RUN ] NVFuserTest.FusionReduction4_CUDA 2023-01-11T23:17:08.8885910Z [ OK ] NVFuserTest.FusionReduction4_CUDA (188 ms) 2023-01-11T23:17:08.8886323Z [ RUN ] NVFuserTest.FusionReduction5_CUDA 2023-01-11T23:17:09.0754521Z [ OK ] NVFuserTest.FusionReduction5_CUDA (186 ms) 2023-01-11T23:17:09.0755422Z [ RUN ] NVFuserTest.FusionReduction6_CUDA 2023-01-11T23:17:09.2664697Z [ OK ] NVFuserTest.FusionReduction6_CUDA (191 ms) 2023-01-11T23:17:09.2665152Z [ RUN ] NVFuserTest.FusionMultiGridReduction_CUDA 
2023-01-11T23:17:09.5059842Z [ OK ] NVFuserTest.FusionMultiGridReduction_CUDA (239 ms) 2023-01-11T23:17:09.5060298Z [ RUN ] NVFuserTest.FusionMultiGridReduction2_CUDA 2023-01-11T23:17:09.5102718Z [ OK ] NVFuserTest.FusionMultiGridReduction2_CUDA (4 ms) 2023-01-11T23:17:09.5103275Z [ RUN ] NVFuserTest.FusionReductionTFT_CUDA 2023-01-11T23:17:09.7042825Z [ OK ] NVFuserTest.FusionReductionTFT_CUDA (193 ms) 2023-01-11T23:17:09.7043249Z [ RUN ] NVFuserTest.FusionReductionOuterSplit_CUDA 2023-01-11T23:17:09.9174937Z [ OK ] NVFuserTest.FusionReductionOuterSplit_CUDA (212 ms) 2023-01-11T23:17:09.9175379Z [ RUN ] NVFuserTest.FusionBranches_CUDA 2023-01-11T23:17:10.1472011Z [ OK ] NVFuserTest.FusionBranches_CUDA (229 ms) 2023-01-11T23:17:10.1473511Z [ RUN ] NVFuserTest.FusionSimpleBCast1_CUDA 2023-01-11T23:17:10.3641186Z [ OK ] NVFuserTest.FusionSimpleBCast1_CUDA (217 ms) 2023-01-11T23:17:10.3641676Z [ RUN ] NVFuserTest.FusionSimpleBCast2_CUDA 2023-01-11T23:17:10.5415611Z [ OK ] NVFuserTest.FusionSimpleBCast2_CUDA (177 ms) 2023-01-11T23:17:10.5416267Z [ RUN ] NVFuserTest.FusionSimpleBCast3_CUDA 2023-01-11T23:17:10.7163825Z [ OK ] NVFuserTest.FusionSimpleBCast3_CUDA (175 ms) 2023-01-11T23:17:10.7164631Z [ RUN ] NVFuserTest.FusionSimpleBCast4_CUDA 2023-01-11T23:17:10.9262201Z [ OK ] NVFuserTest.FusionSimpleBCast4_CUDA (209 ms) 2023-01-11T23:17:10.9262647Z [ RUN ] NVFuserTest.FusionSimpleBCast5_CUDA 2023-01-11T23:17:11.1457002Z [ OK ] NVFuserTest.FusionSimpleBCast5_CUDA (219 ms) 2023-01-11T23:17:11.1457517Z [ RUN ] NVFuserTest.FusionComplexBCast1_CUDA 2023-01-11T23:17:11.3816614Z [ OK ] NVFuserTest.FusionComplexBCast1_CUDA (235 ms) 2023-01-11T23:17:11.3817122Z [ RUN ] NVFuserTest.FusionComplexBCast2_CUDA 2023-01-11T23:17:11.5663638Z [ OK ] NVFuserTest.FusionComplexBCast2_CUDA (184 ms) 2023-01-11T23:17:11.5664702Z [ RUN ] NVFuserTest.FusionAdvancedIndexing1_CUDA 2023-01-11T23:17:11.8023978Z [ OK ] NVFuserTest.FusionAdvancedIndexing1_CUDA (235 ms) 2023-01-11T23:17:11.8025013Z [ RUN ] NVFuserTest.FusionAdvancedIndexing2_CUDA 2023-01-11T23:17:12.0402543Z [ OK ] NVFuserTest.FusionAdvancedIndexing2_CUDA (238 ms) 2023-01-11T23:17:12.0403034Z [ RUN ] NVFuserTest.FusionAdvancedIndexing3_CUDA 2023-01-11T23:17:12.2358010Z [ OK ] NVFuserTest.FusionAdvancedIndexing3_CUDA (195 ms) 2023-01-11T23:17:12.2358443Z [ RUN ] NVFuserTest.FusionAdvancedIndexing4_CUDA 2023-01-11T23:17:12.6260083Z [ OK ] NVFuserTest.FusionAdvancedIndexing4_CUDA (390 ms) 2023-01-11T23:17:12.6260589Z [ RUN ] NVFuserTest.FusionAdvancedIndexing5_CUDA 2023-01-11T23:17:12.8428877Z [ OK ] NVFuserTest.FusionAdvancedIndexing5_CUDA (216 ms) 2023-01-11T23:17:12.8429437Z [ RUN ] NVFuserTest.FusionAdvancedIndexing6_CUDA 2023-01-11T23:17:13.0942453Z [ OK ] NVFuserTest.FusionAdvancedIndexing6_CUDA (251 ms) 2023-01-11T23:17:13.0943269Z [ RUN ] NVFuserTest.FusionAdvancedIndexing7_CUDA 2023-01-11T23:17:13.2997273Z [ OK ] NVFuserTest.FusionAdvancedIndexing7_CUDA (205 ms) 2023-01-11T23:17:13.2997707Z [ RUN ] NVFuserTest.FusionAdvancedIndexing8_CUDA 2023-01-11T23:17:14.2782686Z [ OK ] NVFuserTest.FusionAdvancedIndexing8_CUDA (978 ms) 2023-01-11T23:17:14.2783537Z [ RUN ] NVFuserTest.FusionAdvancedIndexing9_CUDA 2023-01-11T23:17:14.5280418Z [ OK ] NVFuserTest.FusionAdvancedIndexing9_CUDA (249 ms) 2023-01-11T23:17:14.5280850Z [ RUN ] NVFuserTest.FusionAdvancedIndexing10_CUDA 2023-01-11T23:17:14.7217755Z [ OK ] NVFuserTest.FusionAdvancedIndexing10_CUDA (193 ms) 2023-01-11T23:17:14.7218440Z [ RUN ] NVFuserTest.FusionAdvancedIndexing11_CUDA 2023-01-11T23:17:14.9001324Z [ 
OK ] NVFuserTest.FusionAdvancedIndexing11_CUDA (178 ms) 2023-01-11T23:17:14.9001968Z [ RUN ] NVFuserTest.FusionAdvancedLowering1_CUDA 2023-01-11T23:17:15.1918816Z [ OK ] NVFuserTest.FusionAdvancedLowering1_CUDA (291 ms) 2023-01-11T23:17:15.1919515Z [ RUN ] NVFuserTest.FusionAdvancedLowering2_CUDA 2023-01-11T23:17:15.3736828Z [ OK ] NVFuserTest.FusionAdvancedLowering2_CUDA (181 ms) 2023-01-11T23:17:15.3737955Z [ RUN ] NVFuserTest.FusionAdvancedLowering3_CUDA 2023-01-11T23:17:15.5507692Z [ OK ] NVFuserTest.FusionAdvancedLowering3_CUDA (177 ms) 2023-01-11T23:17:15.5508340Z [ RUN ] NVFuserTest.FusionAdvancedLowering4_CUDA 2023-01-11T23:17:15.7677333Z [ OK ] NVFuserTest.FusionAdvancedLowering4_CUDA (216 ms) 2023-01-11T23:17:15.7677970Z [ RUN ] NVFuserTest.FusionAdvancedLowering5_CUDA 2023-01-11T23:17:16.0147738Z [ OK ] NVFuserTest.FusionAdvancedLowering5_CUDA (247 ms) 2023-01-11T23:17:16.0148337Z [ RUN ] NVFuserTest.FusionAdvancedLowering6_CUDA 2023-01-11T23:17:16.2678712Z [ OK ] NVFuserTest.FusionAdvancedLowering6_CUDA (253 ms) 2023-01-11T23:17:16.2679139Z [ RUN ] NVFuserTest.FusionSimpleGemm_CUDA 2023-01-11T23:17:16.4681704Z [ OK ] NVFuserTest.FusionSimpleGemm_CUDA (200 ms) 2023-01-11T23:17:16.4683294Z [ RUN ] NVFuserTest.FusionSoftmax1D_CUDA 2023-01-11T23:17:16.6706327Z [ OK ] NVFuserTest.FusionSoftmax1D_CUDA (202 ms) 2023-01-11T23:17:16.6707059Z [ RUN ] NVFuserTest.FusionSoftmax1DNormalized_CUDA 2023-01-11T23:17:16.8979592Z [ OK ] NVFuserTest.FusionSoftmax1DNormalized_CUDA (227 ms) 2023-01-11T23:17:16.8980358Z [ RUN ] NVFuserTest.FusionSoftmax3D_CUDA 2023-01-11T23:17:17.0929909Z [ OK ] NVFuserTest.FusionSoftmax3D_CUDA (195 ms) 2023-01-11T23:17:17.0930372Z [ RUN ] NVFuserTest.FusionSoftmax3DNormalized_CUDA 2023-01-11T23:17:17.3048968Z [ OK ] NVFuserTest.FusionSoftmax3DNormalized_CUDA (211 ms) 2023-01-11T23:17:17.3049592Z [ RUN ] NVFuserTest.FusionSoftmaxComputeAt_CUDA 2023-01-11T23:17:17.3084722Z [ OK ] NVFuserTest.FusionSoftmaxComputeAt_CUDA (3 ms) 2023-01-11T23:17:17.3085268Z [ RUN ] NVFuserTest.FusionGridReduction1_CUDA 2023-01-11T23:17:17.5560196Z [ OK ] NVFuserTest.FusionGridReduction1_CUDA (247 ms) 2023-01-11T23:17:17.5560599Z [ RUN ] NVFuserTest.FusionGridReduction2_CUDA 2023-01-11T23:17:17.8040299Z [ OK ] NVFuserTest.FusionGridReduction2_CUDA (247 ms) 2023-01-11T23:17:17.8040692Z [ RUN ] NVFuserTest.FusionGridReduction3dim1_CUDA 2023-01-11T23:17:18.0161262Z [ OK ] NVFuserTest.FusionGridReduction3dim1_CUDA (212 ms) 2023-01-11T23:17:18.0161701Z [ RUN ] NVFuserTest.FusionGridReduction3dim0_CUDA 2023-01-11T23:17:18.2275224Z [ OK ] NVFuserTest.FusionGridReduction3dim0_CUDA (211 ms) 2023-01-11T23:17:18.2275947Z [ RUN ] NVFuserTest.FusionGridReduction4_CUDA 2023-01-11T23:17:18.4526313Z [ OK ] NVFuserTest.FusionGridReduction4_CUDA (225 ms) 2023-01-11T23:17:18.4526746Z [ RUN ] NVFuserTest.FusionGridReduction5_CUDA 2023-01-11T23:17:18.6667660Z [ OK ] NVFuserTest.FusionGridReduction5_CUDA (213 ms) 2023-01-11T23:17:18.6668385Z [ RUN ] NVFuserTest.FusionGridReduction6_CUDA 2023-01-11T23:17:18.8983119Z [ OK ] NVFuserTest.FusionGridReduction6_CUDA (231 ms) 2023-01-11T23:17:18.8983906Z [ RUN ] NVFuserTest.FusionGridReduction7_CUDA 2023-01-11T23:17:19.1041619Z [ OK ] NVFuserTest.FusionGridReduction7_CUDA (206 ms) 2023-01-11T23:17:19.1042032Z [ RUN ] NVFuserTest.FusionGridReduction8_CUDA 2023-01-11T23:17:19.3081267Z [ OK ] NVFuserTest.FusionGridReduction8_CUDA (203 ms) 2023-01-11T23:17:19.3081673Z [ RUN ] NVFuserTest.FusionGridReduction9_CUDA 2023-01-11T23:17:19.5229240Z [ OK ] 
NVFuserTest.FusionGridReduction9_CUDA (214 ms) 2023-01-11T23:17:19.5229658Z [ RUN ] NVFuserTest.FusionGridReduction10_CUDA 2023-01-11T23:17:19.7519099Z [ OK ] NVFuserTest.FusionGridReduction10_CUDA (228 ms) 2023-01-11T23:17:19.7519508Z [ RUN ] NVFuserTest.FusionNonRedAxisBind_CUDA 2023-01-11T23:17:19.9229605Z [ OK ] NVFuserTest.FusionNonRedAxisBind_CUDA (170 ms) 2023-01-11T23:17:19.9230042Z [ RUN ] NVFuserTest.FusionSplitBCast_CUDA 2023-01-11T23:17:20.1129679Z [ OK ] NVFuserTest.FusionSplitBCast_CUDA (190 ms) 2023-01-11T23:17:20.1130085Z [ RUN ] NVFuserTest.FusionBCastInnerDim_CUDA 2023-01-11T23:17:20.1130449Z [ OK ] NVFuserTest.FusionBCastInnerDim_CUDA (0 ms) 2023-01-11T23:17:20.1130804Z [ RUN ] NVFuserTest.FusionBCastReduce_CUDA 2023-01-11T23:17:20.1131161Z [ OK ] NVFuserTest.FusionBCastReduce_CUDA (0 ms) 2023-01-11T23:17:20.1134002Z [ RUN ] NVFuserTest.FusionReductionMultiConsumer_CUDA 2023-01-11T23:17:20.1135237Z [ OK ] NVFuserTest.FusionReductionMultiConsumer_CUDA (0 ms) 2023-01-11T23:17:20.1136065Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder1_CUDA 2023-01-11T23:17:20.4582809Z [ OK ] NVFuserTest.FusionComputeAtExprOrder1_CUDA (344 ms) 2023-01-11T23:17:20.4583335Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder2_CUDA 2023-01-11T23:17:20.7298450Z [ OK ] NVFuserTest.FusionComputeAtExprOrder2_CUDA (271 ms) 2023-01-11T23:17:20.7298874Z [ RUN ] NVFuserTest.FusionComputeAtExprOrder3_CUDA 2023-01-11T23:17:21.4906425Z [ OK ] NVFuserTest.FusionComputeAtExprOrder3_CUDA (760 ms) 2023-01-11T23:17:21.4907296Z [ RUN ] NVFuserTest.FusionZeroDimComputeAt_CUDA 2023-01-11T23:17:21.6597543Z [ OK ] NVFuserTest.FusionZeroDimComputeAt_CUDA (169 ms) 2023-01-11T23:17:21.6598356Z [ RUN ] NVFuserTest.FusionZeroDimBroadcast_CUDA 2023-01-11T23:17:21.8411430Z [ OK ] NVFuserTest.FusionZeroDimBroadcast_CUDA (181 ms) 2023-01-11T23:17:21.8412681Z [ RUN ] NVFuserTest.FusionZeroDimReduction_CUDA 2023-01-11T23:17:22.0533418Z [ OK ] NVFuserTest.FusionZeroDimReduction_CUDA (212 ms) 2023-01-11T23:17:22.0534241Z [ RUN ] NVFuserTest.FusionBCastAfterReduce_CUDA 2023-01-11T23:17:22.2475883Z [ OK ] NVFuserTest.FusionBCastAfterReduce_CUDA (194 ms) 2023-01-11T23:17:22.2476725Z [ RUN ] NVFuserTest.FusionOutputBroadcast_CUDA 2023-01-11T23:17:22.4204328Z [ OK ] NVFuserTest.FusionOutputBroadcast_CUDA (173 ms) 2023-01-11T23:17:22.4204827Z [ RUN ] NVFuserTest.FusionReductionKeepDimBasic_CUDA 2023-01-11T23:17:26.1359962Z [ OK ] NVFuserTest.FusionReductionKeepDimBasic_CUDA (3715 ms) 2023-01-11T23:17:26.1360680Z [ RUN ] NVFuserTest.FusionReductionKeepDimScheduler_CUDA 2023-01-11T23:17:26.3538212Z [ OK ] NVFuserTest.FusionReductionKeepDimScheduler_CUDA (217 ms) 2023-01-11T23:17:26.3539047Z [ RUN ] NVFuserTest.FusionSumTo_CUDA 2023-01-11T23:17:30.0736290Z [ OK ] NVFuserTest.FusionSumTo_CUDA (3719 ms) 2023-01-11T23:17:30.0736832Z [ RUN ] NVFuserTest.FusionSumToNoop_CUDA 2023-01-11T23:17:30.4026884Z [ OK ] NVFuserTest.FusionSumToNoop_CUDA (329 ms) 2023-01-11T23:17:30.4027488Z [ RUN ] NVFuserTest.FusionReductionScheduler_CUDA 2023-01-11T23:17:30.6199089Z [ OK ] NVFuserTest.FusionReductionScheduler_CUDA (217 ms) 2023-01-11T23:17:30.6200031Z [ RUN ] NVFuserTest.FusionReductionWithTrivialReduction_CUDA 2023-01-11T23:17:35.0989311Z [ OK ] NVFuserTest.FusionReductionWithTrivialReduction_CUDA (4478 ms) 2023-01-11T23:17:35.0990236Z [ RUN ] NVFuserTest.FusionSymbolicReduction_CUDA 2023-01-11T23:17:35.2864778Z [ OK ] NVFuserTest.FusionSymbolicReduction_CUDA (187 ms) 2023-01-11T23:17:35.2865283Z [ RUN ] 
NVFuserTest.FusionReductionSchedulerMultiDimNonFastest_CUDA 2023-01-11T23:17:35.5094744Z [ OK ] NVFuserTest.FusionReductionSchedulerMultiDimNonFastest_CUDA (223 ms) 2023-01-11T23:17:35.5095463Z [ RUN ] NVFuserTest.FusionReductionSchedulerMultiDimFastest_CUDA 2023-01-11T23:17:35.7274522Z [ OK ] NVFuserTest.FusionReductionSchedulerMultiDimFastest_CUDA (217 ms) 2023-01-11T23:17:35.7275077Z [ RUN ] NVFuserTest.FusionReductionSchedulerNoODimShmoo_CUDA 2023-01-11T23:17:45.1729172Z [ OK ] NVFuserTest.FusionReductionSchedulerNoODimShmoo_CUDA (9445 ms) 2023-01-11T23:17:45.1729663Z [ RUN ] NVFuserTest.FusionReductionSchedulerDimShmoo_CUDA 2023-01-11T23:18:25.7474117Z [ OK ] NVFuserTest.FusionReductionSchedulerDimShmoo_CUDA (40574 ms) 2023-01-11T23:18:25.7475087Z [ RUN ] NVFuserTest.FusionCacheBefore_CUDA 2023-01-11T23:18:25.9231969Z [ OK ] NVFuserTest.FusionCacheBefore_CUDA (175 ms) 2023-01-11T23:18:25.9232933Z [ RUN ] NVFuserTest.FusionCacheAfter_CUDA 2023-01-11T23:18:26.0982720Z [ OK ] NVFuserTest.FusionCacheAfter_CUDA (174 ms) 2023-01-11T23:18:26.0983515Z [ RUN ] NVFuserTest.FusionCacheFork_CUDA 2023-01-11T23:18:26.2759374Z [ OK ] NVFuserTest.FusionCacheFork_CUDA (177 ms) 2023-01-11T23:18:26.2759757Z [ RUN ] NVFuserTest.FusionCacheIndirect_CUDA 2023-01-11T23:18:26.4572162Z [ OK ] NVFuserTest.FusionCacheIndirect_CUDA (181 ms) 2023-01-11T23:18:26.4572983Z [ RUN ] NVFuserTest.FusionCacheBcast_CUDA 2023-01-11T23:18:26.9200552Z [ OK ] NVFuserTest.FusionCacheBcast_CUDA (462 ms) 2023-01-11T23:18:26.9201030Z [ RUN ] NVFuserTest.FusionCacheMultiConsumer_CUDA 2023-01-11T23:18:27.1002785Z [ OK ] NVFuserTest.FusionCacheMultiConsumer_CUDA (180 ms) 2023-01-11T23:18:27.1003180Z [ RUN ] NVFuserTest.FusionSmem_CUDA 2023-01-11T23:18:27.3894992Z [ OK ] NVFuserTest.FusionSmem_CUDA (288 ms) 2023-01-11T23:18:27.3896324Z [ RUN ] NVFuserTest.FusionSmemReduce_CUDA 2023-01-11T23:18:28.6751796Z [ OK ] NVFuserTest.FusionSmemReduce_CUDA (1285 ms) 2023-01-11T23:18:28.6752300Z [ RUN ] NVFuserTest.FusionSmemBlockGemm_CUDA 2023-01-11T23:18:28.9314097Z [ OK ] NVFuserTest.FusionSmemBlockGemm_CUDA (256 ms) 2023-01-11T23:18:28.9315115Z [ RUN ] NVFuserTest.FusionSmemBlockGemmCache_CUDA 2023-01-11T23:18:29.1845230Z [ OK ] NVFuserTest.FusionSmemBlockGemmCache_CUDA (253 ms) 2023-01-11T23:18:29.1845764Z [ RUN ] NVFuserTest.FusionSmemDynamicPersistentSoftmax2D_CUDA 2023-01-11T23:18:29.4242309Z [ OK ] NVFuserTest.FusionSmemDynamicPersistentSoftmax2D_CUDA (239 ms) 2023-01-11T23:18:29.4242820Z [ RUN ] NVFuserTest.FusionMagicSchedulerSoftmax_CUDA 2023-01-11T23:18:29.8499499Z [ OK ] NVFuserTest.FusionMagicSchedulerSoftmax_CUDA (425 ms) 2023-01-11T23:18:29.8500031Z [ RUN ] NVFuserTest.FusionTestMaskSoftmax_CUDA 2023-01-11T23:18:30.1598962Z [ OK ] NVFuserTest.FusionTestMaskSoftmax_CUDA (309 ms) 2023-01-11T23:18:30.1599472Z [ RUN ] NVFuserTest.FusionMagicSchedulerLayerNormBackward_CUDA 2023-01-11T23:18:31.2048010Z [ OK ] NVFuserTest.FusionMagicSchedulerLayerNormBackward_CUDA (1045 ms) 2023-01-11T23:18:31.2048509Z [ RUN ] NVFuserTest.FusionMagicSchedulerRMSNormBackward_CUDA 2023-01-11T23:18:31.8839860Z [ OK ] NVFuserTest.FusionMagicSchedulerRMSNormBackward_CUDA (678 ms) 2023-01-11T23:18:31.8840841Z [ RUN ] NVFuserTest.FusionMagicSchedulerLayerNormalization_CUDA 2023-01-11T23:18:32.4080621Z [ OK ] NVFuserTest.FusionMagicSchedulerLayerNormalization_CUDA (524 ms) 2023-01-11T23:18:32.4081142Z [ RUN ] NVFuserTest.FusionMagicSchedulerRMSNormalization_CUDA 2023-01-11T23:18:32.6479052Z [ OK ] NVFuserTest.FusionMagicSchedulerRMSNormalization_CUDA (239 ms) 
2023-01-11T23:18:32.6479616Z [ RUN ] NVFuserTest.FusionMagicSchedulerBatchNormalization_CUDA 2023-01-11T23:18:34.9093561Z [ OK ] NVFuserTest.FusionMagicSchedulerBatchNormalization_CUDA (2261 ms) 2023-01-11T23:18:34.9095467Z [ RUN ] NVFuserTest.FusionMagicSchedulerInstanceNormalization_CUDA 2023-01-11T23:18:36.1120765Z [ OK ] NVFuserTest.FusionMagicSchedulerInstanceNormalization_CUDA (1202 ms) 2023-01-11T23:18:36.1121408Z [ RUN ] NVFuserTest.FusionMagicSchedulerInstanceNormalizationBackward_CUDA 2023-01-11T23:18:37.6478406Z [ OK ] NVFuserTest.FusionMagicSchedulerInstanceNormalizationBackward_CUDA (1535 ms) 2023-01-11T23:18:37.6479537Z [ RUN ] NVFuserTest.FusionPersistentSoftmaxLocalShared_CUDA 2023-01-11T23:18:40.6363863Z [ OK ] NVFuserTest.FusionPersistentSoftmaxLocalShared_CUDA (2988 ms) 2023-01-11T23:18:40.6364357Z [ RUN ] NVFuserTest.FusionPersistentNormLocalShared_CUDA 2023-01-11T23:18:41.1261651Z [ OK ] NVFuserTest.FusionPersistentNormLocalShared_CUDA (489 ms) 2023-01-11T23:18:41.1262145Z [ RUN ] NVFuserTest.FusionSmemDynamicPersistentNorm_CUDA 2023-01-11T23:18:41.3656038Z [ OK ] NVFuserTest.FusionSmemDynamicPersistentNorm_CUDA (239 ms) 2023-01-11T23:18:41.3657002Z [ RUN ] NVFuserTest.FusionSmemDynamicReductionSymbolic_CUDA 2023-01-11T23:18:41.5608517Z [ OK ] NVFuserTest.FusionSmemDynamicReductionSymbolic_CUDA (195 ms) 2023-01-11T23:18:41.5609058Z [ RUN ] NVFuserTest.FusionSmemDynamicReductionSymbolicArg_CUDA 2023-01-11T23:18:42.8851544Z [ OK ] NVFuserTest.FusionSmemDynamicReductionSymbolicArg_CUDA (1324 ms) 2023-01-11T23:18:42.8852117Z [ RUN ] NVFuserTest.FusionSmemDynamicPwiseMulSymbolicArgWAR_CUDA 2023-01-11T23:18:45.8315205Z [ OK ] NVFuserTest.FusionSmemDynamicPwiseMulSymbolicArgWAR_CUDA (2946 ms) 2023-01-11T23:18:45.8316109Z [ RUN ] NVFuserTest.FusionSmemDynamicTiledGemm_CUDA 2023-01-11T23:18:46.0798515Z [ OK ] NVFuserTest.FusionSmemDynamicTiledGemm_CUDA (248 ms) 2023-01-11T23:18:46.0799553Z [ RUN ] NVFuserTest.FusionGlobalIntermediate_CUDA 2023-01-11T23:18:46.2765075Z [ OK ] NVFuserTest.FusionGlobalIntermediate_CUDA (197 ms) 2023-01-11T23:18:46.2765635Z [ RUN ] NVFuserTest.FusionGlobalIntermediateDefaultSchedule_CUDA 2023-01-11T23:18:46.4631536Z [ OK ] NVFuserTest.FusionGlobalIntermediateDefaultSchedule_CUDA (186 ms) 2023-01-11T23:18:46.4632560Z [ RUN ] NVFuserTest.FusionConstCheck_CUDA 2023-01-11T23:18:46.4633263Z [ OK ] NVFuserTest.FusionConstCheck_CUDA (0 ms) 2023-01-11T23:18:46.4633682Z [ RUN ] NVFuserTest.FusionUnrollWithAlloc_CUDA 2023-01-11T23:18:46.6661322Z [ OK ] NVFuserTest.FusionUnrollWithAlloc_CUDA (202 ms) 2023-01-11T23:18:46.6662242Z [ RUN ] NVFuserTest.FusionIsZeroInt_CUDA 2023-01-11T23:18:46.6662993Z [ OK ] NVFuserTest.FusionIsZeroInt_CUDA (0 ms) 2023-01-11T23:18:46.6664397Z [ RUN ] NVFuserTest.FusionIsOneInt_CUDA 2023-01-11T23:18:46.6664775Z [ OK ] NVFuserTest.FusionIsOneInt_CUDA (0 ms) 2023-01-11T23:18:46.6665178Z [ RUN ] NVFuserTest.FusionComputeAtNonterminatingOutput_CUDA 2023-01-11T23:18:46.8390415Z [ OK ] NVFuserTest.FusionComputeAtNonterminatingOutput_CUDA (172 ms) 2023-01-11T23:18:46.8391595Z [ RUN ] NVFuserTest.FusionTraversalOrder1_CUDA 2023-01-11T23:18:47.0163554Z [ OK ] NVFuserTest.FusionTraversalOrder1_CUDA (177 ms) 2023-01-11T23:18:47.0163977Z [ RUN ] NVFuserTest.FusionTraversalOrder2_CUDA 2023-01-11T23:18:47.1924465Z [ OK ] NVFuserTest.FusionTraversalOrder2_CUDA (175 ms) 2023-01-11T23:18:47.1924901Z [ RUN ] NVFuserTest.FusionTraversalOrder3_CUDA 2023-01-11T23:18:47.7282917Z [ OK ] NVFuserTest.FusionTraversalOrder3_CUDA (535 ms) 
2023-01-11T23:18:47.7283382Z [ RUN ] NVFuserTest.FusionTraversalOrder4_CUDA 2023-01-11T23:18:47.9045386Z [ OK ] NVFuserTest.FusionTraversalOrder4_CUDA (176 ms) 2023-01-11T23:18:47.9045850Z [ RUN ] NVFuserTest.FusionTraversalOrder5_CUDA 2023-01-11T23:18:48.0805266Z [ OK ] NVFuserTest.FusionTraversalOrder5_CUDA (176 ms) 2023-01-11T23:18:48.0805690Z [ RUN ] NVFuserTest.FusionTraversalOrder6_CUDA 2023-01-11T23:18:48.3424198Z [ OK ] NVFuserTest.FusionTraversalOrder6_CUDA (261 ms) 2023-01-11T23:18:48.3424650Z [ RUN ] NVFuserTest.FusionTraversalOrder7_CUDA 2023-01-11T23:18:48.7228864Z [ OK ] NVFuserTest.FusionTraversalOrder7_CUDA (380 ms) 2023-01-11T23:18:48.7229687Z [ RUN ] NVFuserTest.FusionThreadPredicate_CUDA 2023-01-11T23:18:48.9446106Z [ OK ] NVFuserTest.FusionThreadPredicate_CUDA (221 ms) 2023-01-11T23:18:48.9446538Z [ RUN ] NVFuserTest.FusionLSTMCell_CUDA 2023-01-11T23:18:49.5344288Z [ OK ] NVFuserTest.FusionLSTMCell_CUDA (589 ms) 2023-01-11T23:18:49.5344706Z [ RUN ] NVFuserTest.FusionReductionHalf_CUDA 2023-01-11T23:18:49.7639027Z [ OK ] NVFuserTest.FusionReductionHalf_CUDA (229 ms) 2023-01-11T23:18:49.7639805Z [ RUN ] NVFuserTest.FusionReduceSingle_CUDA 2023-01-11T23:18:50.0720471Z [ OK ] NVFuserTest.FusionReduceSingle_CUDA (307 ms) 2023-01-11T23:18:50.0721333Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast_CUDA 2023-01-11T23:18:50.2914387Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast_CUDA (219 ms) 2023-01-11T23:18:50.2915023Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast2_CUDA 2023-01-11T23:18:50.5123824Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast2_CUDA (220 ms) 2023-01-11T23:18:50.5125068Z [ RUN ] NVFuserTest.FusionReduceImplicitBroadcast3_CUDA 2023-01-11T23:18:50.7325543Z [ OK ] NVFuserTest.FusionReduceImplicitBroadcast3_CUDA (220 ms) 2023-01-11T23:18:50.7326081Z [ RUN ] NVFuserTest.FusionTrivialReduction_CUDA 2023-01-11T23:18:51.2474858Z [ OK ] NVFuserTest.FusionTrivialReduction_CUDA (514 ms) 2023-01-11T23:18:51.2475298Z [ RUN ] NVFuserTest.FusionTrivialReduction2_CUDA 2023-01-11T23:18:51.4323924Z [ OK ] NVFuserTest.FusionTrivialReduction2_CUDA (184 ms) 2023-01-11T23:18:51.4324438Z [ RUN ] NVFuserTest.FusionTrivialReduction3_CUDA 2023-01-11T23:18:51.6157785Z [ OK ] NVFuserTest.FusionTrivialReduction3_CUDA (183 ms) 2023-01-11T23:18:51.6158609Z [ RUN ] NVFuserTest.FusionDetectTrivialReduction1_CUDA 2023-01-11T23:18:51.8109956Z [ OK ] NVFuserTest.FusionDetectTrivialReduction1_CUDA (194 ms) 2023-01-11T23:18:51.8111174Z [ RUN ] NVFuserTest.FusionDetectTrivialReduction2_CUDA 2023-01-11T23:18:51.8132102Z [ OK ] NVFuserTest.FusionDetectTrivialReduction2_CUDA (2 ms) 2023-01-11T23:18:51.8132683Z [ RUN ] NVFuserTest.FusionInputsIdLookup_CUDA 2023-01-11T23:18:51.8133154Z [ OK ] NVFuserTest.FusionInputsIdLookup_CUDA (0 ms) 2023-01-11T23:18:51.8133632Z [ RUN ] NVFuserTest.FusionGroupGuardSimpleTensor_CUDA 2023-01-11T23:18:51.8134293Z [ OK ] NVFuserTest.FusionGroupGuardSimpleTensor_CUDA (0 ms) 2023-01-11T23:18:51.8135096Z [ RUN ] NVFuserTest.FusionGroupGuardBroadcastTensor_CUDA 2023-01-11T23:18:51.8135623Z [ OK ] NVFuserTest.FusionGroupGuardBroadcastTensor_CUDA (0 ms) 2023-01-11T23:18:51.8136197Z [ RUN ] NVFuserTest.FusionGroupGuardPermutedTensor_CUDA 2023-01-11T23:18:51.8136663Z [ OK ] NVFuserTest.FusionGroupGuardPermutedTensor_CUDA (0 ms) 2023-01-11T23:18:51.8140940Z [ RUN ] NVFuserTest.FusionGroupGuardRelaxedCheck_CUDA 2023-01-11T23:18:51.8141553Z [ OK ] NVFuserTest.FusionGroupGuardRelaxedCheck_CUDA (0 ms) 2023-01-11T23:18:51.8142017Z [ RUN ] NVFuserTest.FusionDisjointSet_CUDA 
2023-01-11T23:18:51.8142627Z [ OK ] NVFuserTest.FusionDisjointSet_CUDA (0 ms) 2023-01-11T23:18:51.8143010Z [ RUN ] NVFuserTest.FusionNonUniqueBroadcastSize_CUDA 2023-01-11T23:18:51.8167773Z [ OK ] NVFuserTest.FusionNonUniqueBroadcastSize_CUDA (3 ms) 2023-01-11T23:18:51.8168185Z [ RUN ] NVFuserTest.FusionBiasGeluFwd_CUDA 2023-01-11T23:18:52.0920124Z [ OK ] NVFuserTest.FusionBiasGeluFwd_CUDA (274 ms) 2023-01-11T23:18:52.0920912Z [ RUN ] NVFuserTest.FusionBiasGeluBwd_CUDA 2023-01-11T23:18:52.4220230Z [ OK ] NVFuserTest.FusionBiasGeluBwd_CUDA (330 ms) 2023-01-11T23:18:52.4220985Z [ RUN ] NVFuserTest.FusionIssue459_CUDA 2023-01-11T23:18:52.6028123Z [ OK ] NVFuserTest.FusionIssue459_CUDA (180 ms) 2023-01-11T23:18:52.6028918Z [ RUN ] NVFuserTest.FusionSmemIndexingSimple_CUDA 2023-01-11T23:18:52.7737186Z [ OK ] NVFuserTest.FusionSmemIndexingSimple_CUDA (171 ms) 2023-01-11T23:18:52.7737612Z [ RUN ] NVFuserTest.FusionSmemIndexing_CUDA 2023-01-11T23:18:53.0096690Z [ OK ] NVFuserTest.FusionSmemIndexing_CUDA (235 ms) 2023-01-11T23:18:53.0097524Z [ RUN ] NVFuserTest.FusionCacheBeforeReduction_CUDA 2023-01-11T23:18:53.2121512Z [ OK ] NVFuserTest.FusionCacheBeforeReduction_CUDA (202 ms) 2023-01-11T23:18:53.2122434Z [ RUN ] NVFuserTest.FusionCacheBeforeReduction2_CUDA 2023-01-11T23:18:53.3869256Z [ OK ] NVFuserTest.FusionCacheBeforeReduction2_CUDA (174 ms) 2023-01-11T23:18:53.3870067Z [ RUN ] NVFuserTest.FusionIssue367_CUDA 2023-01-11T23:18:53.6461683Z [ OK ] NVFuserTest.FusionIssue367_CUDA (259 ms) 2023-01-11T23:18:53.6462808Z [ RUN ] NVFuserTest.FusionIssue468_CUDA 2023-01-11T23:18:53.8375648Z [ OK ] NVFuserTest.FusionIssue468_CUDA (191 ms) 2023-01-11T23:18:53.8376034Z [ RUN ] NVFuserTest.FusionIssue363_CUDA 2023-01-11T23:18:54.0510888Z [ OK ] NVFuserTest.FusionIssue363_CUDA (213 ms) 2023-01-11T23:18:54.0511652Z [ RUN ] NVFuserTest.FusionIssue484_CUDA 2023-01-11T23:18:54.2388292Z [ OK ] NVFuserTest.FusionIssue484_CUDA (187 ms) 2023-01-11T23:18:54.2389229Z [ RUN ] NVFuserTest.FusionIssue329_CUDA 2023-01-11T23:18:54.4152288Z [ OK ] NVFuserTest.FusionIssue329_CUDA (176 ms) 2023-01-11T23:18:54.4153024Z [ RUN ] NVFuserTest.FusionIssue382_CUDA 2023-01-11T23:18:54.5959659Z [ OK ] NVFuserTest.FusionIssue382_CUDA (181 ms) 2023-01-11T23:18:54.5960044Z [ RUN ] NVFuserTest.FusionIssue507_CUDA 2023-01-11T23:18:54.7655716Z [ OK ] NVFuserTest.FusionIssue507_CUDA (169 ms) 2023-01-11T23:18:54.7656533Z [ RUN ] NVFuserTest.FusionIssue532_CUDA 2023-01-11T23:18:55.0485066Z [ OK ] NVFuserTest.FusionIssue532_CUDA (282 ms) 2023-01-11T23:18:55.0485505Z [ RUN ] NVFuserTest.FusionLoopUnswitch_CUDA 2023-01-11T23:18:55.2276736Z [ OK ] NVFuserTest.FusionLoopUnswitch_CUDA (178 ms) 2023-01-11T23:18:55.2277126Z [ RUN ] NVFuserTest.FusionIssue549_CUDA 2023-01-11T23:18:55.4317511Z [ OK ] NVFuserTest.FusionIssue549_CUDA (204 ms) 2023-01-11T23:18:55.4318014Z [ RUN ] NVFuserTest.FusionSimpleCompileRtc_CUDA 2023-01-11T23:18:55.6092416Z [ OK ] NVFuserTest.FusionSimpleCompileRtc_CUDA (177 ms) 2023-01-11T23:18:55.6092861Z [ RUN ] NVFuserTest.FusionSerialWelford_CUDA 2023-01-11T23:18:55.8769532Z [ OK ] NVFuserTest.FusionSerialWelford_CUDA (267 ms) 2023-01-11T23:18:55.8770000Z [ RUN ] NVFuserTest.FusionBlockWelford_CUDA 2023-01-11T23:18:56.0833601Z [ OK ] NVFuserTest.FusionBlockWelford_CUDA (206 ms) 2023-01-11T23:18:56.0834661Z [ RUN ] NVFuserTest.FusionBlockWelfordNoInit_CUDA 2023-01-11T23:18:56.2864472Z [ OK ] NVFuserTest.FusionBlockWelfordNoInit_CUDA (203 ms) 2023-01-11T23:18:56.2864957Z [ RUN ] NVFuserTest.FusionGridWelfordNoInit_CUDA 
2023-01-11T23:18:56.5270157Z [ OK ] NVFuserTest.FusionGridWelfordNoInit_CUDA (240 ms) 2023-01-11T23:18:56.5271270Z [ RUN ] NVFuserTest.FusionWelfordOp_CUDA 2023-01-11T23:18:57.5592186Z [ OK ] NVFuserTest.FusionWelfordOp_CUDA (1032 ms) 2023-01-11T23:18:57.5593653Z [ RUN ] NVFuserTest.FusionBlockWelfordOp_CUDA 2023-01-11T23:18:57.7703999Z [ OK ] NVFuserTest.FusionBlockWelfordOp_CUDA (211 ms) 2023-01-11T23:18:57.7704810Z [ RUN ] NVFuserTest.FusionGridWelfordOp_CUDA 2023-01-11T23:18:58.0075063Z [ OK ] NVFuserTest.FusionGridWelfordOp_CUDA (236 ms) 2023-01-11T23:18:58.0075718Z [ RUN ] NVFuserTest.FusionRfactorWelfordOp_CUDA 2023-01-11T23:18:58.2157979Z [ OK ] NVFuserTest.FusionRfactorWelfordOp_CUDA (208 ms) 2023-01-11T23:18:58.2158502Z [ RUN ] NVFuserTest.FusionWelfordSchedule_CUDA 2023-01-11T23:18:58.4905580Z [ OK ] NVFuserTest.FusionWelfordSchedule_CUDA (274 ms) 2023-01-11T23:18:58.4906580Z [ RUN ] NVFuserTest.FusionWelfordShmoo_CUDA 2023-01-11T23:19:43.9584038Z [ OK ] NVFuserTest.FusionWelfordShmoo_CUDA (45467 ms) 2023-01-11T23:19:43.9584808Z [ RUN ] NVFuserTest.FusionVarMean_CUDA 2023-01-11T23:19:46.1965125Z [ OK ] NVFuserTest.FusionVarMean_CUDA (2238 ms) 2023-01-11T23:19:46.1965561Z [ RUN ] NVFuserTest.FusionSimpleGemmTransposed_CUDA 2023-01-11T23:19:46.4019419Z [ OK ] NVFuserTest.FusionSimpleGemmTransposed_CUDA (205 ms) 2023-01-11T23:19:46.4020328Z [ RUN ] NVFuserTest.FusionSoftmax3DTransposed_CUDA 2023-01-11T23:19:46.6015994Z [ OK ] NVFuserTest.FusionSoftmax3DTransposed_CUDA (199 ms) 2023-01-11T23:19:46.6016901Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed1_CUDA 2023-01-11T23:19:46.8189681Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed1_CUDA (217 ms) 2023-01-11T23:19:46.8190234Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed2_CUDA 2023-01-11T23:19:47.0363171Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed2_CUDA (217 ms) 2023-01-11T23:19:47.0363679Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed3_CUDA 2023-01-11T23:19:47.3277644Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed3_CUDA (291 ms) 2023-01-11T23:19:47.3278593Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed4_CUDA 2023-01-11T23:19:47.7201773Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed4_CUDA (392 ms) 2023-01-11T23:19:47.7202322Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed5_CUDA 2023-01-11T23:19:47.9308892Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed5_CUDA (210 ms) 2023-01-11T23:19:47.9309800Z [ RUN ] NVFuserTest.FusionAdvancedComputeAtTransposed6_CUDA 2023-01-11T23:19:48.1402285Z [ OK ] NVFuserTest.FusionAdvancedComputeAtTransposed6_CUDA (209 ms) 2023-01-11T23:19:48.1402771Z [ RUN ] NVFuserTest.FusionSegmentReducePointwise_CUDA 2023-01-11T23:19:48.5594730Z [ OK ] NVFuserTest.FusionSegmentReducePointwise_CUDA (418 ms) 2023-01-11T23:19:48.5595614Z [ RUN ] NVFuserTest.FusionMultipleVectorize_CUDA 2023-01-11T23:19:48.9346776Z [ OK ] NVFuserTest.FusionMultipleVectorize_CUDA (375 ms) 2023-01-11T23:19:48.9347583Z [ RUN ] NVFuserTest.FusionVectorizeSimple_CUDA 2023-01-11T23:19:49.1549003Z [ OK ] NVFuserTest.FusionVectorizeSimple_CUDA (220 ms) 2023-01-11T23:19:49.1549544Z [ RUN ] NVFuserTest.FusionSimpleVectorizeUnroll_CUDA 2023-01-11T23:19:49.3753554Z [ OK ] NVFuserTest.FusionSimpleVectorizeUnroll_CUDA (220 ms) 2023-01-11T23:19:49.3754823Z [ RUN ] NVFuserTest.FusionSegmentReduceSoftmax_CUDA 2023-01-11T23:19:49.9001229Z [ OK ] NVFuserTest.FusionSegmentReduceSoftmax_CUDA (524 ms) 2023-01-11T23:19:49.9001659Z [ RUN ] NVFuserTest.FusionSwizzle1_CUDA 
2023-01-11T23:19:50.0742051Z [ OK ] NVFuserTest.FusionSwizzle1_CUDA (173 ms) 2023-01-11T23:19:50.0742481Z [ RUN ] NVFuserTest.FusionSwizzle2_CUDA 2023-01-11T23:19:50.2489118Z [ OK ] NVFuserTest.FusionSwizzle2_CUDA (174 ms) 2023-01-11T23:19:50.2489532Z [ RUN ] NVFuserTest.FusionGridPersistence_CUDA 2023-01-11T23:19:50.4784870Z [ OK ] NVFuserTest.FusionGridPersistence_CUDA (229 ms) 2023-01-11T23:19:50.4785728Z [ RUN ] NVFuserTest.FusionGridPersistence2_CUDA 2023-01-11T23:19:50.7138676Z [ OK ] NVFuserTest.FusionGridPersistence2_CUDA (235 ms) 2023-01-11T23:19:50.7139499Z [ RUN ] NVFuserTest.FusionWelfordPersistence_CUDA 2023-01-11T23:19:50.9855300Z [ OK ] NVFuserTest.FusionWelfordPersistence_CUDA (271 ms) 2023-01-11T23:19:50.9856167Z [ RUN ] NVFuserTest.FusionWelfordPersistence2_CUDA 2023-01-11T23:19:51.2707036Z [ OK ] NVFuserTest.FusionWelfordPersistence2_CUDA (285 ms) 2023-01-11T23:19:51.2707866Z [ RUN ] NVFuserTest.FusionIssue633_CUDA 2023-01-11T23:19:51.4438124Z [ OK ] NVFuserTest.FusionIssue633_CUDA (173 ms) 2023-01-11T23:19:51.4438972Z [ RUN ] NVFuserTest.FusionBroadcastAcrossComputeAt_CUDA 2023-01-11T23:19:51.6215697Z [ OK ] NVFuserTest.FusionBroadcastAcrossComputeAt_CUDA (177 ms) 2023-01-11T23:19:51.6216235Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwise_CUDA 2023-01-11T23:19:51.8445535Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwise_CUDA (223 ms) 2023-01-11T23:19:51.8446062Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeContig_CUDA 2023-01-11T23:19:52.0704178Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeContig_CUDA (225 ms) 2023-01-11T23:19:52.0705299Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicPass_CUDA 2023-01-11T23:19:52.2954896Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicPass_CUDA (224 ms) 2023-01-11T23:19:52.2956039Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicFail_CUDA 2023-01-11T23:19:52.3038417Z [ OK ] NVFuserTest.FusionVectorizeMisalignedPointwiseMergeSymbolicFail_CUDA (8 ms) 2023-01-11T23:19:52.3039442Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedRFactor_CUDA 2023-01-11T23:19:52.5359941Z [ OK ] NVFuserTest.FusionVectorizeMisalignedRFactor_CUDA (231 ms) 2023-01-11T23:19:52.5360843Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedWrongDimFail_CUDA 2023-01-11T23:19:52.5410701Z [ OK ] NVFuserTest.FusionVectorizeMisalignedWrongDimFail_CUDA (5 ms) 2023-01-11T23:19:52.5411684Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedStride_CUDA 2023-01-11T23:19:52.7635510Z [ OK ] NVFuserTest.FusionVectorizeMisalignedStride_CUDA (221 ms) 2023-01-11T23:19:52.7636456Z [ RUN ] NVFuserTest.FusionVectorizeMisalignedStrideFail_CUDA 2023-01-11T23:19:52.9920421Z [ OK ] NVFuserTest.FusionVectorizeMisalignedStrideFail_CUDA (228 ms) 2023-01-11T23:19:52.9921111Z [ RUN ] NVFuserTest.FusionVectorization1_CUDA 2023-01-11T23:19:53.2052154Z [ OK ] NVFuserTest.FusionVectorization1_CUDA (213 ms) 2023-01-11T23:19:53.2052568Z [ RUN ] NVFuserTest.FusionVectorization2_CUDA 2023-01-11T23:19:53.2108799Z [ OK ] NVFuserTest.FusionVectorization2_CUDA (5 ms) 2023-01-11T23:19:53.2109600Z [ RUN ] NVFuserTest.FusionVectorization3_CUDA 2023-01-11T23:19:53.4280210Z [ OK ] NVFuserTest.FusionVectorization3_CUDA (216 ms) 2023-01-11T23:19:53.4281398Z [ RUN ] NVFuserTest.FusionVectorizationRFactor_CUDA 2023-01-11T23:19:53.6333745Z [ OK ] NVFuserTest.FusionVectorizationRFactor_CUDA (205 ms) 2023-01-11T23:19:53.6335192Z [ RUN ] NVFuserTest.FusionSizeOneLoop1_CUDA 2023-01-11T23:19:53.8210088Z [ OK ] 
NVFuserTest.FusionSizeOneLoop1_CUDA (187 ms) 2023-01-11T23:19:53.8210511Z [ RUN ] NVFuserTest.FusionSizeOneLoop2_CUDA 2023-01-11T23:19:54.0130692Z [ OK ] NVFuserTest.FusionSizeOneLoop2_CUDA (191 ms) 2023-01-11T23:19:54.0131242Z [ RUN ] NVFuserTest.FusionValidateParallelize1_CUDA 2023-01-11T23:19:54.0162204Z [ OK ] NVFuserTest.FusionValidateParallelize1_CUDA (3 ms) 2023-01-11T23:19:54.0162781Z [ RUN ] NVFuserTest.FusionValidateParallelize2_CUDA 2023-01-11T23:19:54.1878959Z [ OK ] NVFuserTest.FusionValidateParallelize2_CUDA (171 ms) 2023-01-11T23:19:54.1880068Z [ RUN ] NVFuserTest.FusionValidateParallelize3_CUDA 2023-01-11T23:19:54.3640662Z [ OK ] NVFuserTest.FusionValidateParallelize3_CUDA (176 ms) 2023-01-11T23:19:54.3641697Z [ RUN ] NVFuserTest.FusionValidateParallelize4_CUDA 2023-01-11T23:19:54.5408008Z [ OK ] NVFuserTest.FusionValidateParallelize4_CUDA (177 ms) 2023-01-11T23:19:54.5408540Z [ RUN ] NVFuserTest.FusionValidateParallelize5_CUDA 2023-01-11T23:19:54.7184625Z [ OK ] NVFuserTest.FusionValidateParallelize5_CUDA (177 ms) 2023-01-11T23:19:54.7185153Z [ RUN ] NVFuserTest.FusionValidateParallelize6_CUDA 2023-01-11T23:19:54.7249386Z [ OK ] NVFuserTest.FusionValidateParallelize6_CUDA (6 ms) 2023-01-11T23:19:54.7249813Z [ RUN ] NVFuserTest.FusionValidateParallelize7_CUDA 2023-01-11T23:19:54.7262156Z [ OK ] NVFuserTest.FusionValidateParallelize7_CUDA (1 ms) 2023-01-11T23:19:54.7262574Z [ RUN ] NVFuserTest.FusionDAGMerging_CUDA 2023-01-11T23:19:54.7433856Z [ OK ] NVFuserTest.FusionDAGMerging_CUDA (16 ms) 2023-01-11T23:19:54.7434662Z [ RUN ] NVFuserTest.FusionDAGScalarMerging_CUDA 2023-01-11T23:19:55.1639087Z [ OK ] NVFuserTest.FusionDAGScalarMerging_CUDA (420 ms) 2023-01-11T23:19:55.1639962Z [ RUN ] NVFuserTest.FusionBlockReduceInSerialLoop_CUDA 2023-01-11T23:19:55.3482840Z [ OK ] NVFuserTest.FusionBlockReduceInSerialLoop_CUDA (184 ms) 2023-01-11T23:19:55.3483317Z [ RUN ] NVFuserTest.FusionBlockWelfordInSerialLoop_CUDA 2023-01-11T23:19:55.5533818Z [ OK ] NVFuserTest.FusionBlockWelfordInSerialLoop_CUDA (205 ms) 2023-01-11T23:19:55.5535272Z [ RUN ] NVFuserTest.FusionIOTensorTrivialReductionRepro_CUDA 2023-01-11T23:19:55.7235136Z [ OK ] NVFuserTest.FusionIOTensorTrivialReductionRepro_CUDA (170 ms) 2023-01-11T23:19:55.7235634Z [ RUN ] NVFuserTest.FusionReductionPredicate_CUDA 2023-01-11T23:19:56.0479731Z [ OK ] NVFuserTest.FusionReductionPredicate_CUDA (324 ms) 2023-01-11T23:19:56.0480726Z [ RUN ] NVFuserTest.FusionIssue728_CUDA 2023-01-11T23:19:56.0481419Z [ OK ] NVFuserTest.FusionIssue728_CUDA (0 ms) 2023-01-11T23:19:56.0481912Z [ RUN ] NVFuserTest.FusionIssue757_CUDA 2023-01-11T23:19:56.2388274Z [ OK ] NVFuserTest.FusionIssue757_CUDA (190 ms) 2023-01-11T23:19:56.2389323Z [ RUN ] NVFuserTest.FusionPredicatedBlockBroadcast_CUDA 2023-01-11T23:19:56.4315718Z [ OK ] NVFuserTest.FusionPredicatedBlockBroadcast_CUDA (192 ms) 2023-01-11T23:19:56.4316785Z [ RUN ] NVFuserTest.FusionSegmentVerticalMerge_CUDA 2023-01-11T23:19:56.4586180Z [ OK ] NVFuserTest.FusionSegmentVerticalMerge_CUDA (27 ms) 2023-01-11T23:19:56.4587054Z [ RUN ] NVFuserTest.FusionSegmentHorizontalMerge_CUDA 2023-01-11T23:19:56.4646287Z [ OK ] NVFuserTest.FusionSegmentHorizontalMerge_CUDA (6 ms) 2023-01-11T23:19:56.4647116Z [ RUN ] NVFuserTest.FusionSegmentMixReduction_CUDA 2023-01-11T23:19:56.4769251Z [ OK ] NVFuserTest.FusionSegmentMixReduction_CUDA (12 ms) 2023-01-11T23:19:56.4769634Z [ RUN ] NVFuserTest.FusionSBAR_CUDA 2023-01-11T23:19:56.7109465Z [ OK ] NVFuserTest.FusionSBAR_CUDA (233 ms) 2023-01-11T23:19:56.7109904Z [ RUN ] 
NVFuserTest.FusionSingleElement_CUDA 2023-01-11T23:19:56.8788686Z [ OK ] NVFuserTest.FusionSingleElement_CUDA (167 ms) 2023-01-11T23:19:56.8789504Z [ RUN ] NVFuserTest.FusionBNBackwardRepro_CUDA 2023-01-11T23:19:57.4563712Z [ OK ] NVFuserTest.FusionBNBackwardRepro_CUDA (577 ms) 2023-01-11T23:19:57.4564233Z [ RUN ] NVFuserTest.FusionBNBackwardRepro2_CUDA 2023-01-11T23:19:58.1802141Z [ OK ] NVFuserTest.FusionBNBackwardRepro2_CUDA (723 ms) 2023-01-11T23:19:58.1811115Z [ RUN ] NVFuserTest.FusionBNRepro_CUDA 2023-01-11T23:19:58.6199018Z [ OK ] NVFuserTest.FusionBNRepro_CUDA (439 ms) 2023-01-11T23:19:58.6199774Z [ RUN ] NVFuserTest.FusionBNRepro2_CUDA 2023-01-11T23:19:59.0030428Z [ OK ] NVFuserTest.FusionBNRepro2_CUDA (383 ms) 2023-01-11T23:19:59.0031441Z [ RUN ] NVFuserTest.FusionZeroSizeTensorPW_CUDA 2023-01-11T23:19:59.0051442Z [ OK ] NVFuserTest.FusionZeroSizeTensorPW_CUDA (2 ms) 2023-01-11T23:19:59.0051924Z [ RUN ] NVFuserTest.FusionZeroSizeTensorReduction_CUDA 2023-01-11T23:19:59.2246155Z [ OK ] NVFuserTest.FusionZeroSizeTensorReduction_CUDA (219 ms) 2023-01-11T23:19:59.2246645Z [ RUN ] NVFuserTest.FusionZeroSizeTensorNormalization_CUDA 2023-01-11T23:19:59.4450150Z [ OK ] NVFuserTest.FusionZeroSizeTensorNormalization_CUDA (220 ms) 2023-01-11T23:19:59.4450612Z [ RUN ] NVFuserTest.FusionSegmentIoAlias_CUDA 2023-01-11T23:19:59.8713460Z [ OK ] NVFuserTest.FusionSegmentIoAlias_CUDA (426 ms) 2023-01-11T23:19:59.8713892Z [ RUN ] NVFuserTest.FusionWelford1Output_CUDA 2023-01-11T23:20:00.1419731Z [ OK ] NVFuserTest.FusionWelford1Output_CUDA (270 ms) 2023-01-11T23:20:00.1420559Z [ RUN ] NVFuserTest.FusionTranslate1Welford_CUDA 2023-01-11T23:20:00.9497475Z [ OK ] NVFuserTest.FusionTranslate1Welford_CUDA (807 ms) 2023-01-11T23:20:00.9498288Z [ RUN ] NVFuserTest.FusionTranslate2Welford_CUDA 2023-01-11T23:20:02.2006705Z [ OK ] NVFuserTest.FusionTranslate2Welford_CUDA (1250 ms) 2023-01-11T23:20:02.2007170Z [ RUN ] NVFuserTest.FusionLargeWelfordNormalization_CUDA 2023-01-11T23:20:02.5163135Z [ OK ] NVFuserTest.FusionLargeWelfordNormalization_CUDA (315 ms) 2023-01-11T23:20:02.5163712Z [ RUN ] NVFuserTest.FusionWelfordOuterPersistence_CUDA 2023-01-11T23:20:04.6091037Z [ OK ] NVFuserTest.FusionWelfordOuterPersistence_CUDA (2093 ms) 2023-01-11T23:20:04.6091505Z [ RUN ] NVFuserTest.FusionSegmentIslands_CUDA 2023-01-11T23:20:05.0194767Z [ OK ] NVFuserTest.FusionSegmentIslands_CUDA (409 ms) 2023-01-11T23:20:05.0195251Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast_CUDA 2023-01-11T23:20:05.0224213Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast_CUDA (3 ms) 2023-01-11T23:20:05.0224786Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast2_CUDA 2023-01-11T23:20:05.0230454Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast2_CUDA (0 ms) 2023-01-11T23:20:05.0231092Z [ RUN ] NVFuserTest.FusionBackOffInnerBroadcast3_CUDA 2023-01-11T23:20:05.0237599Z [ OK ] NVFuserTest.FusionBackOffInnerBroadcast3_CUDA (0 ms) 2023-01-11T23:20:05.0238162Z [ RUN ] NVFuserTest.FusionSimpleWarp_CUDA 2023-01-11T23:20:05.2114334Z [ OK ] NVFuserTest.FusionSimpleWarp_CUDA (186 ms) 2023-01-11T23:20:05.2115229Z [ RUN ] NVFuserTest.FusionSimpleWarpPad_CUDA 2023-01-11T23:20:05.4124701Z [ OK ] NVFuserTest.FusionSimpleWarpPad_CUDA (201 ms) 2023-01-11T23:20:05.4125244Z [ RUN ] NVFuserTest.FusionWarpPadMergeSplit_CUDA 2023-01-11T23:20:05.6366850Z [ OK ] NVFuserTest.FusionWarpPadMergeSplit_CUDA (224 ms) 2023-01-11T23:20:05.6367296Z [ RUN ] NVFuserTest.FusionSerialWarpReduction_CUDA 2023-01-11T23:20:05.9149370Z [ OK ] NVFuserTest.FusionSerialWarpReduction_CUDA 
(278 ms) 2023-01-11T23:20:05.9150221Z [ RUN ] NVFuserTest.FusionTrivialWarpReduction_CUDA 2023-01-11T23:20:06.1255135Z [ OK ] NVFuserTest.FusionTrivialWarpReduction_CUDA (210 ms) 2023-01-11T23:20:06.1255597Z [ RUN ] NVFuserTest.FusionMultipleDimBinding_CUDA 2023-01-11T23:20:06.3528159Z [ OK ] NVFuserTest.FusionMultipleDimBinding_CUDA (227 ms) 2023-01-11T23:20:06.3528597Z [ RUN ] NVFuserTest.FusionPadNoWarpReduce_CUDA 2023-01-11T23:20:06.5394767Z [ OK ] NVFuserTest.FusionPadNoWarpReduce_CUDA (186 ms) 2023-01-11T23:20:06.5395264Z [ RUN ] NVFuserTest.FusionWarpMutipleThreadDim_CUDA 2023-01-11T23:20:06.7389833Z [ OK ] NVFuserTest.FusionWarpMutipleThreadDim_CUDA (199 ms) 2023-01-11T23:20:06.7390404Z [ RUN ] NVFuserTest.FusionWarpReduceUnrollOuterLoop_CUDA 2023-01-11T23:20:07.1016361Z [ OK ] NVFuserTest.FusionWarpReduceUnrollOuterLoop_CUDA (362 ms) 2023-01-11T23:20:07.1017390Z [ RUN ] NVFuserTest.FusionWarpReducePredication_CUDA 2023-01-11T23:20:07.3840946Z [ OK ] NVFuserTest.FusionWarpReducePredication_CUDA (282 ms) 2023-01-11T23:20:07.3841777Z [ RUN ] NVFuserTest.FusionSegfaultReduction_CUDA 2023-01-11T23:20:07.6038297Z [ OK ] NVFuserTest.FusionSegfaultReduction_CUDA (219 ms) 2023-01-11T23:20:07.6038867Z [ RUN ] NVFuserTest.FusionPredicateElimination1_CUDA 2023-01-11T23:20:07.6082038Z [ OK ] NVFuserTest.FusionPredicateElimination1_CUDA (4 ms) 2023-01-11T23:20:07.6082619Z [ RUN ] NVFuserTest.FusionPredicateElimination2_CUDA 2023-01-11T23:20:07.9126311Z [ OK ] NVFuserTest.FusionPredicateElimination2_CUDA (304 ms) 2023-01-11T23:20:07.9126858Z [ RUN ] NVFuserTest.FusionPredicateElimination3_CUDA 2023-01-11T23:20:09.2922166Z [ OK ] NVFuserTest.FusionPredicateElimination3_CUDA (1379 ms) 2023-01-11T23:20:09.2922684Z [ RUN ] NVFuserTest.FusionPredicateElimination4_CUDA 2023-01-11T23:20:40.0803713Z [ OK ] NVFuserTest.FusionPredicateElimination4_CUDA (30787 ms) 2023-01-11T23:20:40.0804167Z [ RUN ] NVFuserTest.FusionPredicateElimination5_CUDA 2023-01-11T23:20:41.3785530Z [ OK ] NVFuserTest.FusionPredicateElimination5_CUDA (1298 ms) 2023-01-11T23:20:41.3785993Z [ RUN ] NVFuserTest.FusionPredicateElimination6_CUDA 2023-01-11T23:20:41.5761160Z [ OK ] NVFuserTest.FusionPredicateElimination6_CUDA (197 ms) 2023-01-11T23:20:41.5762166Z [ RUN ] NVFuserTest.FusionPredicateElimination7_CUDA 2023-01-11T23:20:41.7904674Z [ OK ] NVFuserTest.FusionPredicateElimination7_CUDA (214 ms) 2023-01-11T23:20:41.7905127Z [ RUN ] NVFuserTest.FusionForceFp16Simple_CUDA 2023-01-11T23:20:42.2089510Z [ OK ] NVFuserTest.FusionForceFp16Simple_CUDA (418 ms) 2023-01-11T23:20:42.2089982Z [ RUN ] NVFuserTest.FusionForceBf16Simple_CUDA 2023-01-11T23:20:42.6288585Z [ OK ] NVFuserTest.FusionForceBf16Simple_CUDA (419 ms) 2023-01-11T23:20:42.6289034Z [ RUN ] NVFuserTest.FusionForceFp16NotAllCast_CUDA 2023-01-11T23:20:43.3020978Z [ OK ] NVFuserTest.FusionForceFp16NotAllCast_CUDA (673 ms) 2023-01-11T23:20:43.3021960Z [ RUN ] NVFuserTest.FusionForceBf16NotAllCast_CUDA 2023-01-11T23:20:43.9762426Z [ OK ] NVFuserTest.FusionForceBf16NotAllCast_CUDA (674 ms) 2023-01-11T23:20:43.9762918Z [ RUN ] NVFuserTest.FusionBufferReuseBroadCastMultiVisit_CUDA 2023-01-11T23:20:44.1598794Z [ OK ] NVFuserTest.FusionBufferReuseBroadCastMultiVisit_CUDA (183 ms) 2023-01-11T23:20:44.1599270Z [ RUN ] NVFuserTest.FusionBufferReuseStressTest_CUDA 2023-01-11T23:20:44.3560706Z [ OK ] NVFuserTest.FusionBufferReuseStressTest_CUDA (196 ms) 2023-01-11T23:20:44.3561196Z [ RUN ] NVFuserTest.FusionBufferReuseLargeBuffer_CUDA 2023-01-11T23:20:47.3522891Z [ OK ] 
NVFuserTest.FusionBufferReuseLargeBuffer_CUDA (2996 ms) 2023-01-11T23:20:47.3523556Z [ RUN ] NVFuserTest.FusionBufferReuseNo2hop_CUDA 2023-01-11T23:20:47.5377657Z [ OK ] NVFuserTest.FusionBufferReuseNo2hop_CUDA (185 ms) 2023-01-11T23:20:47.5378340Z [ RUN ] NVFuserTest.FusionBufferReuseAllocationOrder_CUDA 2023-01-11T23:20:47.7197372Z [ OK ] NVFuserTest.FusionBufferReuseAllocationOrder_CUDA (181 ms) 2023-01-11T23:20:47.7197862Z [ RUN ] NVFuserTest.FusionBufferReuseLiveInterval_CUDA 2023-01-11T23:20:48.4541034Z [ OK ] NVFuserTest.FusionBufferReuseLiveInterval_CUDA (734 ms) 2023-01-11T23:20:48.4541970Z [ RUN ] NVFuserTest.FusionBufferReuseNoAcrossBroadcast_CUDA 2023-01-11T23:20:48.6392207Z [ OK ] NVFuserTest.FusionBufferReuseNoAcrossBroadcast_CUDA (185 ms) 2023-01-11T23:20:48.6393053Z [ RUN ] NVFuserTest.FusionIssue970_CUDA 2023-01-11T23:20:49.0423855Z [ OK ] NVFuserTest.FusionIssue970_CUDA (403 ms) 2023-01-11T23:20:49.0424608Z [ RUN ] NVFuserTest.FusionIssue1016_CUDA 2023-01-11T23:20:49.2323962Z [ OK ] NVFuserTest.FusionIssue1016_CUDA (190 ms) 2023-01-11T23:20:49.2324365Z [ RUN ] NVFuserTest.FusionIssue1021_CUDA 2023-01-11T23:20:49.4102758Z [ OK ] NVFuserTest.FusionIssue1021_CUDA (177 ms) 2023-01-11T23:20:49.4103699Z [ RUN ] NVFuserTest.FusionNonUniqueThreadDim_CUDA 2023-01-11T23:20:49.5991292Z [ OK ] NVFuserTest.FusionNonUniqueThreadDim_CUDA (188 ms) 2023-01-11T23:20:49.5992389Z [ RUN ] NVFuserTest.FusionParallelDimensionMap1_CUDA 2023-01-11T23:20:49.7879127Z [ OK ] NVFuserTest.FusionParallelDimensionMap1_CUDA (188 ms) 2023-01-11T23:20:49.7879629Z [ RUN ] NVFuserTest.FusionParallelDimensionMap2_CUDA 2023-01-11T23:20:49.9762394Z [ OK ] NVFuserTest.FusionParallelDimensionMap2_CUDA (188 ms) 2023-01-11T23:20:49.9762862Z [ RUN ] NVFuserTest.FusionParallelDimensionMap3_CUDA 2023-01-11T23:20:50.1645028Z [ OK ] NVFuserTest.FusionParallelDimensionMap3_CUDA (188 ms) 2023-01-11T23:20:50.1645510Z [ RUN ] NVFuserTest.FusionParallelDimensionMap4_CUDA 2023-01-11T23:20:50.3524390Z [ OK ] NVFuserTest.FusionParallelDimensionMap4_CUDA (187 ms) 2023-01-11T23:20:50.3524930Z [ RUN ] NVFuserTest.FusionParallelDimensionMap5_CUDA 2023-01-11T23:20:50.5283827Z [ OK ] NVFuserTest.FusionParallelDimensionMap5_CUDA (175 ms) 2023-01-11T23:20:50.5284611Z [ RUN ] NVFuserTest.FusionSegmenterCombineReductionsCycleRepro_CUDA 2023-01-11T23:20:52.6284064Z [ OK ] NVFuserTest.FusionSegmenterCombineReductionsCycleRepro_CUDA (2100 ms) 2023-01-11T23:20:52.6284850Z [ RUN ] NVFuserTest.FusionSerialAndParallelIndexing_CUDA 2023-01-11T23:20:52.8322809Z [ OK ] NVFuserTest.FusionSerialAndParallelIndexing_CUDA (203 ms) 2023-01-11T23:20:52.8323262Z [ RUN ] NVFuserTest.FusionWARSyncAliasedSmem_CUDA 2023-01-11T23:20:53.0130551Z [ OK ] NVFuserTest.FusionWARSyncAliasedSmem_CUDA (180 ms) 2023-01-11T23:20:53.0131340Z [ RUN ] NVFuserTest.FusionIssue1099_CUDA 2023-01-11T23:20:53.2040208Z [ OK ] NVFuserTest.FusionIssue1099_CUDA (190 ms) 2023-01-11T23:20:53.2041010Z [ RUN ] NVFuserTest.FusionUnswitchPredicate_CUDA 2023-01-11T23:20:53.3919761Z [ OK ] NVFuserTest.FusionUnswitchPredicate_CUDA (187 ms) 2023-01-11T23:20:53.3920165Z [ RUN ] NVFuserTest.FusionIssue1189_CUDA 2023-01-11T23:20:53.5989427Z [ OK ] NVFuserTest.FusionIssue1189_CUDA (206 ms) 2023-01-11T23:20:53.5989835Z [ RUN ] NVFuserTest.FusionIssue1052_CUDA 2023-01-11T23:20:53.7700778Z [ OK ] NVFuserTest.FusionIssue1052_CUDA (171 ms) 2023-01-11T23:20:53.7701346Z [ RUN ] NVFuserTest.FusionPointwiseBroadcast_CUDA 2023-01-11T23:20:53.9800544Z [ OK ] NVFuserTest.FusionPointwiseBroadcast_CUDA (210 ms) 
2023-01-11T23:20:53.9801005Z [ RUN ] NVFuserTest.FusionPointwiseVectorize_CUDA 2023-01-11T23:20:53.9824042Z [ OK ] NVFuserTest.FusionPointwiseVectorize_CUDA (2 ms) 2023-01-11T23:20:53.9825006Z [ RUN ] NVFuserTest.FusionSmemAliasSerial_CUDA 2023-01-11T23:20:54.1617482Z [ OK ] NVFuserTest.FusionSmemAliasSerial_CUDA (179 ms) 2023-01-11T23:20:54.1618105Z [ RUN ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions_CUDA 2023-01-11T23:20:54.3706124Z [ OK ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions_CUDA (208 ms) 2023-01-11T23:20:54.3707253Z [ RUN ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions_CUDA 2023-01-11T23:20:54.6016663Z [ OK ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions_CUDA (230 ms) 2023-01-11T23:20:54.6017778Z [ RUN ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions2_CUDA 2023-01-11T23:20:54.6018386Z [ OK ] NVFuserTest.FusionGridReductionWithNonExactParallelDimensions2_CUDA (0 ms) 2023-01-11T23:20:54.6019237Z [ RUN ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions2_CUDA 2023-01-11T23:20:54.6019782Z [ OK ] NVFuserTest.FusionGridWelfordWithNonExactParallelDimensions2_CUDA (0 ms) 2023-01-11T23:20:54.6020257Z [ RUN ] NVFuserTest.FusionPredicateParallelizedDomains_CUDA 2023-01-11T23:20:54.8407214Z [ OK ] NVFuserTest.FusionPredicateParallelizedDomains_CUDA (239 ms) 2023-01-11T23:20:54.8407686Z [ RUN ] NVFuserTest.FusionSmemPredicateUnswitch_CUDA 2023-01-11T23:20:55.0504030Z [ OK ] NVFuserTest.FusionSmemPredicateUnswitch_CUDA (209 ms) 2023-01-11T23:20:55.0504827Z [ RUN ] NVFuserTest.FusionFloatPow_CUDA 2023-01-11T23:20:55.2428591Z [ OK ] NVFuserTest.FusionFloatPow_CUDA (192 ms) 2023-01-11T23:20:55.2429350Z [ RUN ] NVFuserTest.FusionIssue1127_CUDA 2023-01-11T23:20:55.2459797Z [ OK ] NVFuserTest.FusionIssue1127_CUDA (3 ms) 2023-01-11T23:20:55.2460194Z [ RUN ] NVFuserTest.FusionChannelsLastParser_CUDA 2023-01-11T23:20:55.2736027Z [ OK ] NVFuserTest.FusionChannelsLastParser_CUDA (27 ms) 2023-01-11T23:20:55.2736893Z [ RUN ] NVFuserTest.FusionThreadPredicateUnswitch_CUDA 2023-01-11T23:20:55.5038235Z [ OK ] NVFuserTest.FusionThreadPredicateUnswitch_CUDA (229 ms) 2023-01-11T23:20:55.5038690Z [ RUN ] NVFuserTest.FusionNonContigOutputs_CUDA 2023-01-11T23:20:55.6746162Z [ OK ] NVFuserTest.FusionNonContigOutputs_CUDA (170 ms) 2023-01-11T23:20:55.6746975Z [ RUN ] NVFuserTest.FusionTestWarpSoftMax_CUDA 2023-01-11T23:20:55.9783994Z [ OK ] NVFuserTest.FusionTestWarpSoftMax_CUDA (303 ms) 2023-01-11T23:20:55.9784778Z [ RUN ] NVFuserTest.FusionIssue1133_CUDA 2023-01-11T23:20:56.1907568Z [ OK ] NVFuserTest.FusionIssue1133_CUDA (212 ms) 2023-01-11T23:20:56.1908304Z [ RUN ] NVFuserTest.FusionRfactorContigIDs_CUDA 2023-01-11T23:20:56.3852285Z [ OK ] NVFuserTest.FusionRfactorContigIDs_CUDA (194 ms) 2023-01-11T23:20:56.3853912Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation1_CUDA 2023-01-11T23:20:56.3855378Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation1_CUDA (0 ms) 2023-01-11T23:20:56.3856412Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation2_CUDA 2023-01-11T23:20:56.3859881Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation2_CUDA (0 ms) 2023-01-11T23:20:56.3860331Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation3_CUDA 2023-01-11T23:20:56.3869654Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation3_CUDA (0 ms) 2023-01-11T23:20:56.3870144Z [ RUN ] NVFuserTest.FusionPersistentBufferCalculation4_CUDA 2023-01-11T23:20:56.3876148Z [ OK ] NVFuserTest.FusionPersistentBufferCalculation4_CUDA (0 ms) 
2023-01-11T23:20:56.3876636Z [ RUN ] NVFuserTest.FusionPersistentBufferProjection_CUDA 2023-01-11T23:20:56.6586074Z [ OK ] NVFuserTest.FusionPersistentBufferProjection_CUDA (270 ms) 2023-01-11T23:20:56.6586852Z [ RUN ] NVFuserTest.FusionIssue1223_CUDA 2023-01-11T23:20:56.8718724Z [ OK ] NVFuserTest.FusionIssue1223_CUDA (213 ms) 2023-01-11T23:20:56.8719189Z [ RUN ] NVFuserTest.FusionRfactorPredication1_CUDA 2023-01-11T23:20:57.0724661Z [ OK ] NVFuserTest.FusionRfactorPredication1_CUDA (200 ms) 2023-01-11T23:20:57.0725164Z [ RUN ] NVFuserTest.FusionRfactorPredication2_CUDA 2023-01-11T23:20:57.2961208Z [ OK ] NVFuserTest.FusionRfactorPredication2_CUDA (223 ms) 2023-01-11T23:20:57.2961650Z [ RUN ] NVFuserTest.FusionRfactorIndirectRoot_CUDA 2023-01-11T23:20:57.4861657Z [ OK ] NVFuserTest.FusionRfactorIndirectRoot_CUDA (189 ms) 2023-01-11T23:20:57.4862989Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit1_CUDA 2023-01-11T23:20:57.6758373Z [ OK ] NVFuserTest.FusionNonDivisibleSplit1_CUDA (189 ms) 2023-01-11T23:20:57.6759043Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit2_CUDA 2023-01-11T23:20:57.8823071Z [ OK ] NVFuserTest.FusionNonDivisibleSplit2_CUDA (206 ms) 2023-01-11T23:20:57.8823923Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit3_CUDA 2023-01-11T23:20:58.0699573Z [ OK ] NVFuserTest.FusionNonDivisibleSplit3_CUDA (187 ms) 2023-01-11T23:20:58.0700055Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit4_CUDA 2023-01-11T23:20:58.2624685Z [ OK ] NVFuserTest.FusionNonDivisibleSplit4_CUDA (192 ms) 2023-01-11T23:20:58.2625526Z [ RUN ] NVFuserTest.FusionNonDivisibleSplit5_CUDA 2023-01-11T23:20:58.4568200Z [ OK ] NVFuserTest.FusionNonDivisibleSplit5_CUDA (194 ms) 2023-01-11T23:20:58.4568703Z [ RUN ] NVFuserTest.FusionNonDivisibleSplitVectorize1_CUDA 2023-01-11T23:20:58.6578589Z [ OK ] NVFuserTest.FusionNonDivisibleSplitVectorize1_CUDA (200 ms) 2023-01-11T23:20:58.6579276Z [ RUN ] NVFuserTest.FusionNonDivisibleSplitVectorize2_CUDA 2023-01-11T23:20:58.9985899Z [ OK ] NVFuserTest.FusionNonDivisibleSplitVectorize2_CUDA (340 ms) 2023-01-11T23:20:58.9986929Z [ RUN ] NVFuserTest.FusionIssue1284Repro_CUDA 2023-01-11T23:20:59.3491099Z [ OK ] NVFuserTest.FusionIssue1284Repro_CUDA (350 ms) 2023-01-11T23:20:59.3491883Z [ RUN ] NVFuserTest.FusionIssue1284Repro2_CUDA 2023-01-11T23:20:59.7436331Z [ OK ] NVFuserTest.FusionIssue1284Repro2_CUDA (394 ms) 2023-01-11T23:20:59.7437265Z [ RUN ] NVFuserTest.FusionIssue1305Repro_CUDA 2023-01-11T23:20:59.7440661Z [ OK ] NVFuserTest.FusionIssue1305Repro_CUDA (0 ms) 2023-01-11T23:20:59.7441119Z [ RUN ] NVFuserTest.FusionDoubleBuffering1_CUDA 2023-01-11T23:20:59.9256988Z [ OK ] NVFuserTest.FusionDoubleBuffering1_CUDA (181 ms) 2023-01-11T23:20:59.9257961Z [ RUN ] NVFuserTest.FusionDoubleBuffering2_CUDA 2023-01-11T23:21:00.1060248Z [ OK ] NVFuserTest.FusionDoubleBuffering2_CUDA (180 ms) 2023-01-11T23:21:00.1060701Z [ RUN ] NVFuserTest.FusionDoubleBuffering3_CUDA 2023-01-11T23:21:00.2966301Z [ OK ] NVFuserTest.FusionDoubleBuffering3_CUDA (190 ms) 2023-01-11T23:21:00.2966775Z [ RUN ] NVFuserTest.FusionDoubleBuffering4_CUDA 2023-01-11T23:21:00.5075995Z [ OK ] NVFuserTest.FusionDoubleBuffering4_CUDA (210 ms) 2023-01-11T23:21:00.5076971Z [ RUN ] NVFuserTest.FusionDoubleBuffering5_CUDA 2023-01-11T23:21:00.7113893Z [ OK ] NVFuserTest.FusionDoubleBuffering5_CUDA (203 ms) 2023-01-11T23:21:00.7114890Z [ RUN ] NVFuserTest.FusionDoubleBuffering6_CUDA 2023-01-11T23:21:01.0722189Z [ OK ] NVFuserTest.FusionDoubleBuffering6_CUDA (361 ms) 2023-01-11T23:21:01.0722646Z [ RUN ] NVFuserTest.FusionDoubleBuffering7_CUDA 
2023-01-11T23:21:01.2602869Z [ OK ] NVFuserTest.FusionDoubleBuffering7_CUDA (188 ms) 2023-01-11T23:21:01.2603349Z [ RUN ] NVFuserTest.FusionDoubleBuffering8_CUDA 2023-01-11T23:21:01.4656427Z [ OK ] NVFuserTest.FusionDoubleBuffering8_CUDA (205 ms) 2023-01-11T23:21:01.4657378Z [ RUN ] NVFuserTest.FusionDoubleBuffering9_CUDA 2023-01-11T23:21:01.6641413Z [ OK ] NVFuserTest.FusionDoubleBuffering9_CUDA (198 ms) 2023-01-11T23:21:01.6641919Z [ RUN ] NVFuserTest.FusionSmemBlockGemmCacheDoubleBuffer_CUDA 2023-01-11T23:21:04.1540820Z [ OK ] NVFuserTest.FusionSmemBlockGemmCacheDoubleBuffer_CUDA (2489 ms) 2023-01-11T23:21:04.1541438Z [ RUN ] NVFuserTest.FusionIntermediateTensorVectorize_CUDA 2023-01-11T23:21:04.5258798Z [ OK ] NVFuserTest.FusionIntermediateTensorVectorize_CUDA (371 ms) 2023-01-11T23:21:04.5259777Z [ RUN ] NVFuserTest.FusionBroadcastConcretization1_CUDA 2023-01-11T23:21:04.8476275Z [ OK ] NVFuserTest.FusionBroadcastConcretization1_CUDA (321 ms) 2023-01-11T23:21:04.8476780Z [ RUN ] NVFuserTest.FusionBroadcastConcretization2_CUDA 2023-01-11T23:21:05.0316669Z [ OK ] NVFuserTest.FusionBroadcastConcretization2_CUDA (183 ms) 2023-01-11T23:21:05.0317602Z [ RUN ] NVFuserTest.FusionBroadcastConcretization3_CUDA 2023-01-11T23:21:05.3933581Z [ OK ] NVFuserTest.FusionBroadcastConcretization3_CUDA (361 ms) 2023-01-11T23:21:05.3935321Z [ RUN ] NVFuserTest.FusionBroadcastConcretization5_CUDA 2023-01-11T23:21:05.3936739Z [ OK ] NVFuserTest.FusionBroadcastConcretization5_CUDA (0 ms) 2023-01-11T23:21:05.3937549Z [ RUN ] NVFuserTest.FusionIssue1430_CUDA 2023-01-11T23:21:06.0404717Z [ OK ] NVFuserTest.FusionIssue1430_CUDA (646 ms) 2023-01-11T23:21:06.0405156Z [ RUN ] NVFuserTest.FusionCodegenAllocatedScalars_CUDA 2023-01-11T23:21:06.0409000Z [ OK ] NVFuserTest.FusionCodegenAllocatedScalars_CUDA (0 ms) 2023-01-11T23:21:06.0409422Z [ RUN ] NVFuserTest.FusionIndexHoist1_CUDA 2023-01-11T23:21:06.2549106Z [ OK ] NVFuserTest.FusionIndexHoist1_CUDA (213 ms) 2023-01-11T23:21:06.2549862Z [ RUN ] NVFuserTest.FusionIndexHoist2_CUDA 2023-01-11T23:21:06.4425919Z [ OK ] NVFuserTest.FusionIndexHoist2_CUDA (187 ms) 2023-01-11T23:21:06.4426713Z [ RUN ] NVFuserTest.FusionTestGridComm_CUDA 2023-01-11T23:21:06.6447877Z [ OK ] NVFuserTest.FusionTestGridComm_CUDA (202 ms) 2023-01-11T23:21:06.6448312Z [ RUN ] NVFuserTest.FusionTestGridComm2_CUDA 2023-01-11T23:21:06.8298453Z [ OK ] NVFuserTest.FusionTestGridComm2_CUDA (184 ms) 2023-01-11T23:21:06.8299289Z [ RUN ] NVFuserTest.FusionDoubleBufferVector_CUDA 2023-01-11T23:21:07.0537881Z [ OK ] NVFuserTest.FusionDoubleBufferVector_CUDA (223 ms) 2023-01-11T23:21:07.0538453Z [ RUN ] NVFuserTest.FusionLargeSmem_CUDA 2023-01-11T23:21:07.4832353Z [ OK ] NVFuserTest.FusionLargeSmem_CUDA (429 ms) 2023-01-11T23:21:07.4832842Z [ RUN ] NVFuserTest.FusionTooLargeSmem_CUDA 2023-01-11T23:21:08.4867954Z [ OK ] NVFuserTest.FusionTooLargeSmem_CUDA (1003 ms) 2023-01-11T23:21:08.4868894Z [ RUN ] NVFuserTest.FusionSmemAlignment_CUDA 2023-01-11T23:21:13.6333436Z [ OK ] NVFuserTest.FusionSmemAlignment_CUDA (5146 ms) 2023-01-11T23:21:13.6333921Z [ RUN ] NVFuserTest.FusionImmediateValueAsInput_CUDA 2023-01-11T23:21:13.8047761Z [ OK ] NVFuserTest.FusionImmediateValueAsInput_CUDA (171 ms) 2023-01-11T23:21:13.8048248Z [ RUN ] NVFuserTest.FusionVectorizeContigIndex_CUDA 2023-01-11T23:21:13.9789433Z [ OK ] NVFuserTest.FusionVectorizeContigIndex_CUDA (173 ms) 2023-01-11T23:21:13.9790503Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexFail_CUDA 2023-01-11T23:21:14.1563421Z [ OK ] 
NVFuserTest.FusionVectorizeContigIndexFail_CUDA (177 ms) 2023-01-11T23:21:14.1564059Z [ RUN ] NVFuserTest.FusionVectorizeInputToOutput_CUDA 2023-01-11T23:21:14.3360997Z [ OK ] NVFuserTest.FusionVectorizeInputToOutput_CUDA (179 ms) 2023-01-11T23:21:14.3362098Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexValidationFail_CUDA 2023-01-11T23:21:14.5144874Z [ OK ] NVFuserTest.FusionVectorizeContigIndexValidationFail_CUDA (178 ms) 2023-01-11T23:21:14.5145372Z [ RUN ] NVFuserTest.FusionContigIndexingWithBroadcast_CUDA 2023-01-11T23:21:14.8910050Z [ OK ] NVFuserTest.FusionContigIndexingWithBroadcast_CUDA (376 ms) 2023-01-11T23:21:14.8911351Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexValidationFail2_CUDA 2023-01-11T23:21:15.0852314Z [ OK ] NVFuserTest.FusionVectorizeContigIndexValidationFail2_CUDA (194 ms) 2023-01-11T23:21:15.0852862Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexWithBroadcast_CUDA 2023-01-11T23:21:15.2682411Z [ OK ] NVFuserTest.FusionVectorizeContigIndexWithBroadcast_CUDA (182 ms) 2023-01-11T23:21:15.2683461Z [ RUN ] NVFuserTest.FusionVectorizeContigIndexPointwiseSchedule_CUDA 2023-01-11T23:21:15.4925574Z [ OK ] NVFuserTest.FusionVectorizeContigIndexPointwiseSchedule_CUDA (224 ms) 2023-01-11T23:21:15.4926089Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding1_CUDA 2023-01-11T23:21:15.4933228Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding1_CUDA (0 ms) 2023-01-11T23:21:15.4933918Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding2_CUDA 2023-01-11T23:21:15.4934984Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding2_CUDA (0 ms) 2023-01-11T23:21:15.4935876Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding3_CUDA 2023-01-11T23:21:15.4936676Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding3_CUDA (0 ms) 2023-01-11T23:21:15.4937139Z [ RUN ] NVFuserTest.FusionTrivialReductionForwarding4_CUDA 2023-01-11T23:21:15.6723928Z [ OK ] NVFuserTest.FusionTrivialReductionForwarding4_CUDA (178 ms) 2023-01-11T23:21:15.6724413Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace1_CUDA 2023-01-11T23:21:15.8606791Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace1_CUDA (188 ms) 2023-01-11T23:21:15.8607249Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace2_CUDA 2023-01-11T23:21:16.1478122Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace2_CUDA (287 ms) 2023-01-11T23:21:16.1478644Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace3_CUDA 2023-01-11T23:21:16.3797097Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace3_CUDA (231 ms) 2023-01-11T23:21:16.3797675Z [ RUN ] NVFuserTest.FusionRAWSyncInsertionPlace4_CUDA 2023-01-11T23:21:16.3836182Z [ OK ] NVFuserTest.FusionRAWSyncInsertionPlace4_CUDA (4 ms) 2023-01-11T23:21:16.3837609Z [ RUN ] NVFuserTest.FusionSerialSmemWriteParallelRead1_CUDA 2023-01-11T23:21:17.4333278Z [ OK ] NVFuserTest.FusionSerialSmemWriteParallelRead1_CUDA (1049 ms) 2023-01-11T23:21:17.4334075Z [ RUN ] NVFuserTest.FusionSerialSmemWriteParallelRead2_CUDA 2023-01-11T23:21:18.4980429Z [ OK ] NVFuserTest.FusionSerialSmemWriteParallelRead2_CUDA (1064 ms) 2023-01-11T23:21:18.4981436Z [ RUN ] NVFuserTest.FusionSimpleCpAsync_CUDA 2023-01-11T23:21:18.7520520Z [ OK ] NVFuserTest.FusionSimpleCpAsync_CUDA (254 ms) 2023-01-11T23:21:18.7520967Z [ RUN ] NVFuserTest.FusionDoubleBufferCpAsync1_CUDA 2023-01-11T23:21:19.1060138Z [ OK ] NVFuserTest.FusionDoubleBufferCpAsync1_CUDA (353 ms) 2023-01-11T23:21:19.1061112Z [ RUN ] NVFuserTest.FusionDoubleBufferCpAsync2_CUDA 2023-01-11T23:21:19.6731677Z [ OK ] NVFuserTest.FusionDoubleBufferCpAsync2_CUDA (567 ms) 2023-01-11T23:21:19.6732435Z [ RUN ] 
NVFuserTest.FusionDoubleBufferNoSync_CUDA
2023-01-11T23:21:20.1594440Z [ OK ] NVFuserTest.FusionDoubleBufferNoSync_CUDA (486 ms)
2023-01-11T23:21:20.1595094Z [ RUN ] NVFuserTest.FusionCpAsyncPredicate_CUDA
2023-01-11T23:21:22.3711439Z [ OK ] NVFuserTest.FusionCpAsyncPredicate_CUDA (2211 ms)
2023-01-11T23:21:22.3712305Z [ RUN ] NVFuserTest.FusionPredRemovalCheck_CUDA
2023-01-11T23:21:22.3744834Z [ OK ] NVFuserTest.FusionPredRemovalCheck_CUDA (3 ms)
2023-01-11T23:21:22.3745558Z [ RUN ] NVFuserTest.FusionPropagateParallelTypesToSiblings_CUDA
2023-01-11T23:21:22.6241469Z [ OK ] NVFuserTest.FusionPropagateParallelTypesToSiblings_CUDA (249 ms)
2023-01-11T23:21:22.6242282Z [ RUN ] NVFuserTest.FusionExactRootDomainMap_CUDA
2023-01-11T23:21:22.6242680Z [ OK ] NVFuserTest.FusionExactRootDomainMap_CUDA (0 ms)
2023-01-11T23:21:22.6243082Z [ RUN ] NVFuserTest.FusionIncompleteConcreteID_CUDA
2023-01-11T23:21:22.6300677Z [ OK ] NVFuserTest.FusionIncompleteConcreteID_CUDA (5 ms)
2023-01-11T23:21:22.6301541Z [ RUN ] NVFuserTest.FusionTestReEntrantGridWelford_CUDA
2023-01-11T23:21:23.8598034Z [ OK ] NVFuserTest.FusionTestReEntrantGridWelford_CUDA (1229 ms)
2023-01-11T23:21:23.8598499Z [ RUN ] NVFuserTest.FusionRedundantPredSync_CUDA
2023-01-11T23:21:24.0357429Z [ OK ] NVFuserTest.FusionRedundantPredSync_CUDA (175 ms)
2023-01-11T23:21:24.0357855Z [ RUN ] NVFuserTest.FusionRedundantPredSync2_CUDA
2023-01-11T23:21:24.2096082Z [ OK ] NVFuserTest.FusionRedundantPredSync2_CUDA (173 ms)
2023-01-11T23:21:24.2096594Z [ RUN ] NVFuserTest.FusionRedundantPredSync3_CUDA
2023-01-11T23:21:35.9968772Z [ OK ] NVFuserTest.FusionRedundantPredSync3_CUDA (11787 ms)
2023-01-11T23:21:35.9969256Z [ RUN ] NVFuserTest.FusionRedundantUseCheck_CUDA
2023-01-11T23:21:35.9997297Z [ OK ] NVFuserTest.FusionRedundantUseCheck_CUDA (2 ms)
2023-01-11T23:21:35.9998090Z [ RUN ] NVFuserTest.FusionSimpleSwizzle0_CUDA
2023-01-11T23:21:36.2909487Z [ OK ] NVFuserTest.FusionSimpleSwizzle0_CUDA (290 ms)
2023-01-11T23:21:36.2909921Z [ RUN ] NVFuserTest.FusionSimpleSwizzle1_CUDA
2023-01-11T23:21:36.6012300Z [ OK ] NVFuserTest.FusionSimpleSwizzle1_CUDA (310 ms)
2023-01-11T23:21:36.6013204Z [ RUN ] NVFuserTest.FusionSimpleSwizzle2_CUDA
2023-01-11T23:21:36.9478454Z [ OK ] NVFuserTest.FusionSimpleSwizzle2_CUDA (346 ms)
2023-01-11T23:21:36.9479258Z [ RUN ] NVFuserTest.FusionSwizzleMapping_CUDA
2023-01-11T23:21:36.9487339Z [ OK ] NVFuserTest.FusionSwizzleMapping_CUDA (1 ms)
2023-01-11T23:21:36.9487726Z [ RUN ] NVFuserTest.FusionLoopSwizzle0_CUDA
2023-01-11T23:21:37.2099017Z [ OK ] NVFuserTest.FusionLoopSwizzle0_CUDA (260 ms)
2023-01-11T23:21:37.2099599Z [ RUN ] NVFuserTest.FusionLoopSwizzle1_CUDA
2023-01-11T23:21:37.4731272Z [ OK ] NVFuserTest.FusionLoopSwizzle1_CUDA (263 ms)
2023-01-11T23:21:37.4731757Z [ RUN ] NVFuserTest.FusionLoopSwizzleCheck0_CUDA
2023-01-11T23:21:37.4777549Z [ OK ] NVFuserTest.FusionLoopSwizzleCheck0_CUDA (4 ms)
2023-01-11T23:21:37.4778358Z [ RUN ] NVFuserTest.FusionLoopSwizzleCheck1_CUDA
2023-01-11T23:21:37.4794067Z [W lower_validation.cpp:1081] Warning: Ignored loop swizzle :ZShape(2D): iS10{( ceilDiv(16, 4) )} , iS11{4} -> iS16{( ceilDiv(16, 4) )} , iS17{4}
2023-01-11T23:21:37.4794414Z (function operator())
2023-01-11T23:21:37.4822743Z [ OK ] NVFuserTest.FusionLoopSwizzleCheck1_CUDA (4 ms)
2023-01-11T23:21:37.4823684Z [ RUN ] NVFuserTest.FusionUnsqueeze1_CUDA
2023-01-11T23:21:37.8197987Z [ OK ] NVFuserTest.FusionUnsqueeze1_CUDA (337 ms)
2023-01-11T23:21:37.8198931Z [ RUN ] NVFuserTest.FusionSqueeze1_CUDA
2023-01-11T23:21:38.1569303Z [ OK ] NVFuserTest.FusionSqueeze1_CUDA (337 ms)
2023-01-11T23:21:38.1569767Z [ RUN ] NVFuserTest.FusionContigPredicate_CUDA
2023-01-11T23:21:38.6067429Z [ OK ] NVFuserTest.FusionContigPredicate_CUDA (449 ms)
2023-01-11T23:21:38.6068274Z [ RUN ] NVFuserTest.FusionDivScalarLhs_CUDA
2023-01-11T23:21:38.7842202Z [ OK ] NVFuserTest.FusionDivScalarLhs_CUDA (177 ms)
2023-01-11T23:21:38.7842627Z [ RUN ] NVFuserTest.FusionRepro1713_CUDA
2023-01-11T23:21:39.4258313Z [ OK ] NVFuserTest.FusionRepro1713_CUDA (641 ms)
2023-01-11T23:21:39.4259462Z [ RUN ] NVFuserTest.FusionExpand_CUDA
2023-01-11T23:21:39.8244993Z [ OK ] NVFuserTest.FusionExpand_CUDA (398 ms)
2023-01-11T23:21:39.8245415Z [ RUN ] NVFuserTest.FusionExpandIssue1751_CUDA
2023-01-11T23:21:40.0158475Z [ OK ] NVFuserTest.FusionExpandIssue1751_CUDA (190 ms)
2023-01-11T23:21:40.0159462Z [ RUN ] NVFuserTest.FusionExpandToConcrete_CUDA
2023-01-11T23:21:40.1939153Z [ OK ] NVFuserTest.FusionExpandToConcrete_CUDA (178 ms)
2023-01-11T23:21:40.1940165Z [ RUN ] NVFuserTest.FusionReproNoncontigBroadcast_CUDA
2023-01-11T23:21:40.4404810Z [ OK ] NVFuserTest.FusionReproNoncontigBroadcast_CUDA (246 ms)
2023-01-11T23:21:40.4405288Z [ RUN ] NVFuserTest.FusionTransformPropagateSibling_CUDA
2023-01-11T23:21:40.4416051Z [ OK ] NVFuserTest.FusionTransformPropagateSibling_CUDA (1 ms)
2023-01-11T23:21:40.4416691Z [ RUN ] NVFuserTest.FusionTransformPropagateSelectorSibling_CUDA
2023-01-11T23:21:40.4442306Z [ OK ] NVFuserTest.FusionTransformPropagateSelectorSibling_CUDA (2 ms)
2023-01-11T23:21:40.4442905Z [ RUN ] NVFuserTest.FusionTransformPropagatePosition_CUDA
2023-01-11T23:21:40.4447360Z [ OK ] NVFuserTest.FusionTransformPropagatePosition_CUDA (0 ms)
2023-01-11T23:21:40.4447939Z [ RUN ] NVFuserTest.FusionIgnoreZeroDimReduction_CUDA
2023-01-11T23:21:40.6636808Z [ OK ] NVFuserTest.FusionIgnoreZeroDimReduction_CUDA (218 ms)
2023-01-11T23:21:40.6637795Z [ RUN ] NVFuserTest.FusionIssue1770Repro_CUDA
2023-01-11T23:21:40.8438048Z [ OK ] NVFuserTest.FusionIssue1770Repro_CUDA (180 ms)
2023-01-11T23:21:40.8439149Z [ RUN ] NVFuserTest.FusionTransformPropagatorSelector_CUDA
2023-01-11T23:21:40.8440298Z [ OK ] NVFuserTest.FusionTransformPropagatorSelector_CUDA (0 ms)
2023-01-11T23:21:40.8440972Z [ RUN ] NVFuserTest.FusionTransformPropagatorPos_CUDA
2023-01-11T23:21:40.8441564Z [ OK ] NVFuserTest.FusionTransformPropagatorPos_CUDA (0 ms)
2023-01-11T23:21:40.8442265Z [ RUN ] NVFuserTest.FusionMaxRootDomainInfoSpanningTreePrintTwice_CUDA
2023-01-11T23:21:40.8443168Z [ OK ] NVFuserTest.FusionMaxRootDomainInfoSpanningTreePrintTwice_CUDA (0 ms)
2023-01-11T23:21:40.8443722Z [ RUN ] NVFuserTest.FusionTransformPropagatorNoOverwrite_CUDA
2023-01-11T23:21:40.8444184Z [ OK ] NVFuserTest.FusionTransformPropagatorNoOverwrite_CUDA (0 ms)
2023-01-11T23:21:40.8444595Z [ RUN ] NVFuserTest.FusionIssue1785Repro_CUDA
2023-01-11T23:21:41.0456452Z [ OK ] NVFuserTest.FusionIssue1785Repro_CUDA (200 ms)
2023-01-11T23:21:41.0457323Z [ RUN ] NVFuserTest.FusionSkipReplay_CUDA
2023-01-11T23:21:41.0458569Z [ OK ] NVFuserTest.FusionSkipReplay_CUDA (0 ms)
2023-01-11T23:21:41.0459430Z [ RUN ] NVFuserTest.FusionInlineRepro1803_CUDA
2023-01-11T23:21:41.0468318Z [ OK ] NVFuserTest.FusionInlineRepro1803_CUDA (1 ms)
2023-01-11T23:21:41.0468939Z [ RUN ] NVFuserTest.FusionBoundedDirectionSelection1_CUDA
2023-01-11T23:21:41.0470254Z [ OK ] NVFuserTest.FusionBoundedDirectionSelection1_CUDA (0 ms)
2023-01-11T23:21:41.0470847Z [ RUN ] NVFuserTest.FusionIssueRepro1844_CUDA
2023-01-11T23:21:41.2835431Z [ OK ] NVFuserTest.FusionIssueRepro1844_CUDA (235 ms)
2023-01-11T23:21:41.2836449Z [ RUN ] NVFuserTest.FusionInsertMagicZero1_CUDA
2023-01-11T23:21:41.2866023Z [ OK ] NVFuserTest.FusionInsertMagicZero1_CUDA (3 ms)
2023-01-11T23:21:41.2866953Z [ RUN ] NVFuserTest.FusionRepro1860_CUDA
2023-01-11T23:21:41.5110893Z [ OK ] NVFuserTest.FusionRepro1860_CUDA (224 ms)
2023-01-11T23:21:41.5111359Z [ RUN ] NVFuserTest.FusionExpandReduce_CUDA
2023-01-11T23:21:41.7150698Z [ OK ] NVFuserTest.FusionExpandReduce_CUDA (203 ms)
2023-01-11T23:21:41.7151124Z [ RUN ] NVFuserTest.FusionExpandReduce2_CUDA
2023-01-11T23:21:41.9358501Z [ OK ] NVFuserTest.FusionExpandReduce2_CUDA (220 ms)
2023-01-11T23:21:41.9359325Z [ RUN ] NVFuserTest.FusionExpandBadShapeTest_CUDA
2023-01-11T23:21:41.9394432Z [ OK ] NVFuserTest.FusionExpandBadShapeTest_CUDA (4 ms)
2023-01-11T23:21:41.9394953Z [ RUN ] NVFuserTest.FusionPointwiseScheduleWithBroadcastAndTrivialReduction_CUDA
2023-01-11T23:21:42.1966242Z [ OK ] NVFuserTest.FusionPointwiseScheduleWithBroadcastAndTrivialReduction_CUDA (257 ms)
2023-01-11T23:21:42.1966805Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims1_CUDA
2023-01-11T23:21:43.0544072Z [ OK ] NVFuserTest.FusionInliningMismatchedDims1_CUDA (857 ms)
2023-01-11T23:21:43.0545003Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims2_CUDA
2023-01-11T23:21:43.9150537Z [ OK ] NVFuserTest.FusionInliningMismatchedDims2_CUDA (860 ms)
2023-01-11T23:21:43.9151629Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims3_CUDA
2023-01-11T23:21:44.8634079Z [ OK ] NVFuserTest.FusionInliningMismatchedDims3_CUDA (948 ms)
2023-01-11T23:21:44.8634612Z [ RUN ] NVFuserTest.FusionInliningMismatchedDims4_CUDA
2023-01-11T23:21:45.7278230Z [ OK ] NVFuserTest.FusionInliningMismatchedDims4_CUDA (864 ms)
2023-01-11T23:21:45.7279376Z [ RUN ] NVFuserTest.FusionInliningBroadcast_CUDA
2023-01-11T23:21:46.5611453Z [ OK ] NVFuserTest.FusionInliningBroadcast_CUDA (833 ms)
2023-01-11T23:21:46.5612141Z [ RUN ] NVFuserTest.FusionInliningBroadcastTrivialReduction_CUDA
2023-01-11T23:21:47.4949209Z [ OK ] NVFuserTest.FusionInliningBroadcastTrivialReduction_CUDA (933 ms)
2023-01-11T23:21:47.4950606Z [ RUN ] NVFuserTest.FusionMatchedLeafPosWithoutReplayTrivialReduction_CUDA
2023-01-11T23:21:47.4951908Z [ OK ] NVFuserTest.FusionMatchedLeafPosWithoutReplayTrivialReduction_CUDA (0 ms)
2023-01-11T23:21:47.4952648Z [ RUN ] NVFuserTest.FusionMatchedLeafPosWithoutReplayBroadcast_CUDA
2023-01-11T23:21:47.4953488Z [ OK ] NVFuserTest.FusionMatchedLeafPosWithoutReplayBroadcast_CUDA (0 ms)
2023-01-11T23:21:47.4953952Z [ RUN ] NVFuserTest.FusionIdGraphTrivialReduction_CUDA
2023-01-11T23:21:47.4961161Z [ OK ] NVFuserTest.FusionIdGraphTrivialReduction_CUDA (1 ms)
2023-01-11T23:21:47.4961558Z [ RUN ] NVFuserTest.FusionPrint_CUDA
2023-01-11T23:21:47.6887600Z T3[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8913703Z T3[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8914113Z T3[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:47.8914411Z T3[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.0867408Z T4[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.0868127Z T4[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.2809869Z T4[0] = 0.000000 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.2810269Z T4[0] = 1.000000 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.4754803Z T4[0] = 0 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.4755172Z T4[0] = 1 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.6696779Z T4[0] = 0 @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.6697447Z T4[0] = 1 @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8641628Z T4[0] = false @ threadIdx=(0,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8642177Z T4[0] = true @ threadIdx=(1,0,0), blockIdx=(0,0,0)
2023-01-11T23:21:48.8642820Z [ OK ] NVFuserTest.FusionPrint_CUDA (1367 ms)
2023-01-11T23:21:48.8643188Z [ RUN ] NVFuserTest.FusionCheckedSymbolicShape_CUDA
2023-01-11T23:21:49.0751146Z [ OK ] NVFuserTest.FusionCheckedSymbolicShape_CUDA (210 ms)
2023-01-11T23:21:49.0752030Z [ RUN ] NVFuserTest.FusionSizeDependentData_CUDA
2023-01-11T23:21:49.2493044Z [ OK ] NVFuserTest.FusionSizeDependentData_CUDA (174 ms)
2023-01-11T23:21:49.2493627Z [ RUN ] NVFuserTest.FusionDependencyCheck_CUDA
2023-01-11T23:21:49.2494008Z [ OK ] NVFuserTest.FusionDependencyCheck_CUDA (0 ms)
2023-01-11T23:21:49.2494400Z [ RUN ] NVFuserTest.FusionScheduleTransposeRepro1_CUDA
2023-01-11T23:21:49.5647994Z [ OK ] NVFuserTest.FusionScheduleTransposeRepro1_CUDA (315 ms)
2023-01-11T23:21:49.5648475Z [ RUN ] NVFuserTest.FusionInlineBroadcastIndexing0_CUDA
2023-01-11T23:21:49.8601936Z [ OK ] NVFuserTest.FusionInlineBroadcastIndexing0_CUDA (295 ms)
2023-01-11T23:21:49.8602490Z [ RUN ] NVFuserTest.FusionPredicateUnshare_CUDA
2023-01-11T23:21:50.0431040Z [ OK ] NVFuserTest.FusionPredicateUnshare_CUDA (182 ms)
2023-01-11T23:21:50.0431986Z [ RUN ] NVFuserTest.AsyncCompilation_CUDA
2023-01-11T23:21:50.4804803Z .....................[ OK ] NVFuserTest.AsyncCompilation_CUDA (437 ms)
2023-01-11T23:21:50.4805626Z [ RUN ] NVFuserTest.FusionMergeBroadcastingTrivialReduction1_CUDA
2023-01-11T23:21:50.6528343Z [ OK ] NVFuserTest.FusionMergeBroadcastingTrivialReduction1_CUDA (172 ms)
2023-01-11T23:21:50.6528942Z [ RUN ] NVFuserTest.FusionMergeBroadcastingTrivialReduction2_CUDA
2023-01-11T23:21:50.8279437Z [ OK ] NVFuserTest.FusionMergeBroadcastingTrivialReduction2_CUDA (174 ms)
2023-01-11T23:21:50.8279919Z [ RUN ] NVFuserTest.FusionNullScheduler_CUDA
2023-01-11T23:21:50.9952371Z [ OK ] NVFuserTest.FusionNullScheduler_CUDA (167 ms)
2023-01-11T23:21:50.9952991Z [ RUN ] NVFuserTest.FusionNullScheduler2_CUDA
2023-01-11T23:21:51.1647089Z [ OK ] NVFuserTest.FusionNullScheduler2_CUDA (169 ms)
2023-01-11T23:21:51.1647503Z [ RUN ] NVFuserTest.FusionNullScheduler3_CUDA
2023-01-11T23:21:51.3321090Z [ OK ] NVFuserTest.FusionNullScheduler3_CUDA (167 ms)
2023-01-11T23:21:51.3322546Z [ RUN ] NVFuserTest.FusionEmpty_CUDA
2023-01-11T23:21:51.3325050Z [ OK ] NVFuserTest.FusionEmpty_CUDA (0 ms)
2023-01-11T23:21:51.3325403Z [ RUN ] NVFuserTest.FusionMappingRelation_CUDA
2023-01-11T23:21:51.5068961Z [ OK ] NVFuserTest.FusionMappingRelation_CUDA (174 ms)
2023-01-11T23:21:51.5069409Z [ RUN ] NVFuserTest.FusionInlineAt_CUDA
2023-01-11T23:21:51.7064669Z [ OK ] NVFuserTest.FusionInlineAt_CUDA (199 ms)
2023-01-11T23:21:51.7065131Z [ RUN ] NVFuserTest.FusionTrivialInputForwarding_CUDA
2023-01-11T23:21:51.7073372Z [ OK ] NVFuserTest.FusionTrivialInputForwarding_CUDA (0 ms)
2023-01-11T23:21:51.7073867Z [ RUN ] NVFuserTest.FusionTrivialInputForwarding2_CUDA
2023-01-11T23:21:51.8733900Z [ OK ] NVFuserTest.FusionTrivialInputForwarding2_CUDA (166 ms)
2023-01-11T23:21:51.8735258Z [ RUN ] NVFuserTest.FusionReplayTrivialReductionAndBroadcast2_CUDA
2023-01-11T23:21:52.0638920Z [ OK ] NVFuserTest.FusionReplayTrivialReductionAndBroadcast2_CUDA (190 ms)
2023-01-11T23:21:52.0639894Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity2D_CUDA
2023-01-11T23:21:52.6404875Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity2D_CUDA (576 ms)
2023-01-11T23:21:52.6405349Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity3D_CUDA
2023-01-11T23:21:53.2327077Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity3D_CUDA (592 ms)
2023-01-11T23:21:53.2327540Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguity5D_CUDA
2023-01-11T23:21:53.8572535Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguity5D_CUDA (624 ms)
2023-01-11T23:21:53.8573269Z [ RUN ] NVFuserTest.FusionVectorizeStrideContiguitySelfOverlapping_CUDA
2023-01-11T23:21:54.4827267Z [ OK ] NVFuserTest.FusionVectorizeStrideContiguitySelfOverlapping_CUDA (625 ms)
2023-01-11T23:21:54.4828179Z [ RUN ] NVFuserTest.FusionSimpleAmperePipeline_CUDA
2023-01-11T23:21:54.7708560Z [ OK ] NVFuserTest.FusionSimpleAmperePipeline_CUDA (287 ms)
2023-01-11T23:21:54.7709386Z [ RUN ] NVFuserTest.FusionStandaloneFull_CUDA
2023-01-11T23:21:59.3800056Z [ OK ] NVFuserTest.FusionStandaloneFull_CUDA (4609 ms)
2023-01-11T23:21:59.3800457Z [ RUN ] NVFuserTest.FusionStandaloneZeros_CUDA
2023-01-11T23:22:03.9906654Z [ OK ] NVFuserTest.FusionStandaloneZeros_CUDA (4610 ms)
2023-01-11T23:22:03.9907391Z [ RUN ] NVFuserTest.FusionStandaloneOnes_CUDA
2023-01-11T23:22:08.6056109Z [ OK ] NVFuserTest.FusionStandaloneOnes_CUDA (4614 ms)
2023-01-11T23:22:08.6056908Z [ RUN ] NVFuserTest.FusionStandaloneARange_CUDA
2023-01-11T23:22:10.8239509Z [ OK ] NVFuserTest.FusionStandaloneARange_CUDA (2218 ms)
2023-01-11T23:22:10.8239946Z [ RUN ] NVFuserTest.FusionStandaloneEye_CUDA
2023-01-11T23:22:14.0315100Z [ OK ] NVFuserTest.FusionStandaloneEye_CUDA (3207 ms)
2023-01-11T23:22:14.0315503Z [ RUN ] NVFuserTest.FusionGridAllreduce1_CUDA
2023-01-11T23:22:14.3045924Z [ OK ] NVFuserTest.FusionGridAllreduce1_CUDA (273 ms)
2023-01-11T23:22:14.3046606Z [ RUN ] NVFuserTest.FusionGridAllreduce2_CUDA
2023-01-11T23:22:14.5400114Z [ OK ] NVFuserTest.FusionGridAllreduce2_CUDA (235 ms)
2023-01-11T23:22:14.5401572Z [ RUN ] NVFuserTest.FusionGridAllreduce3_CUDA
2023-01-11T23:22:14.7806327Z [ OK ] NVFuserTest.FusionGridAllreduce3_CUDA (240 ms)
2023-01-11T23:22:14.7806734Z [ RUN ] NVFuserTest.FusionGridAllreduce4_CUDA
2023-01-11T23:22:15.0137826Z [ OK ] NVFuserTest.FusionGridAllreduce4_CUDA (232 ms)
2023-01-11T23:22:15.0138613Z [ RUN ] NVFuserTest.FusionGridAllreduce5_CUDA
2023-01-11T23:22:15.2785663Z [ OK ] NVFuserTest.FusionGridAllreduce5_CUDA (264 ms)
2023-01-11T23:22:15.2786479Z [ RUN ] NVFuserTest.FusionGridAllreduce6_CUDA
2023-01-11T23:22:15.5727470Z [ OK ] NVFuserTest.FusionGridAllreduce6_CUDA (294 ms)
2023-01-11T23:22:15.5727984Z [ RUN ] NVFuserTest.FusionGridAllreduceWelford1_CUDA
2023-01-11T23:22:15.8581358Z [ OK ] NVFuserTest.FusionGridAllreduceWelford1_CUDA (285 ms)
2023-01-11T23:22:15.8581811Z [ RUN ] NVFuserTest.FusionGridAllreduceWelford2_CUDA
2023-01-11T23:22:16.1559456Z [ OK ] NVFuserTest.FusionGridAllreduceWelford2_CUDA (297 ms)
2023-01-11T23:22:16.1560406Z [ RUN ] NVFuserTest.FusionFusedReductionBatchnorm_CUDA
2023-01-11T23:22:22.3870336Z [ OK ] NVFuserTest.FusionFusedReductionBatchnorm_CUDA (6231 ms)
2023-01-11T23:22:22.3871377Z [ RUN ] NVFuserTest.FusionGroupedReduction1_CUDA
2023-01-11T23:22:22.5801141Z [ OK ] NVFuserTest.FusionGroupedReduction1_CUDA (193 ms)
2023-01-11T23:22:22.5801987Z [ RUN ] NVFuserTest.FusionGroupedReduction2_CUDA
2023-01-11T23:22:22.8262557Z [ OK ] NVFuserTest.FusionGroupedReduction2_CUDA (246 ms)
2023-01-11T23:22:22.8263061Z [ RUN ] NVFuserTest.FusionGroupedReduction3_CUDA
2023-01-11T23:22:23.0668374Z [ OK ] NVFuserTest.FusionGroupedReduction3_CUDA (240 ms)
2023-01-11T23:22:23.0669240Z [ RUN ] NVFuserTest.FusionGroupedReduction4_CUDA
2023-01-11T23:22:23.0685131Z [ OK ] NVFuserTest.FusionGroupedReduction4_CUDA (2 ms)
2023-01-11T23:22:23.0685556Z [ RUN ] NVFuserTest.FusionGroupedReduction5_CUDA
2023-01-11T23:22:23.0705981Z [ OK ] NVFuserTest.FusionGroupedReduction5_CUDA (1 ms)
2023-01-11T23:22:23.0707522Z [ RUN ] NVFuserTest.FusionGroupedReduction6_CUDA
2023-01-11T23:22:23.2700702Z [ OK ] NVFuserTest.FusionGroupedReduction6_CUDA (199 ms)
2023-01-11T23:22:23.2701731Z [ RUN ] NVFuserTest.FusionGroupedReduction7_CUDA
2023-01-11T23:22:23.2717967Z [ OK ] NVFuserTest.FusionGroupedReduction7_CUDA (2 ms)
2023-01-11T23:22:23.2718412Z [ RUN ] NVFuserTest.FusionGroupedReductionRfactor1_CUDA
2023-01-11T23:22:23.5125845Z [ OK ] NVFuserTest.FusionGroupedReductionRfactor1_CUDA (240 ms)
2023-01-11T23:22:23.5126317Z [ RUN ] NVFuserTest.FusionGroupedReductionRfactor2_CUDA
2023-01-11T23:22:23.7542600Z [ OK ] NVFuserTest.FusionGroupedReductionRfactor2_CUDA (241 ms)
2023-01-11T23:22:23.7543143Z [ RUN ] NVFuserTest.FusionGroupedReductionAfterComputeAt_CUDA
2023-01-11T23:22:23.9573926Z [ OK ] NVFuserTest.FusionGroupedReductionAfterComputeAt_CUDA (203 ms)
2023-01-11T23:22:23.9574399Z [ RUN ] NVFuserTest.FusionGroupAllreduce1_CUDA
2023-01-11T23:22:24.2438410Z [ OK ] NVFuserTest.FusionGroupAllreduce1_CUDA (286 ms)
2023-01-11T23:22:24.2439236Z [ RUN ] NVFuserTest.FusionGroupAllreduce2_CUDA
2023-01-11T23:22:24.5446394Z [ OK ] NVFuserTest.FusionGroupAllreduce2_CUDA (301 ms)
2023-01-11T23:22:24.5446805Z [ RUN ] NVFuserTest.FusionGroupAllreduce3_CUDA
2023-01-11T23:22:24.8907788Z [ OK ] NVFuserTest.FusionGroupAllreduce3_CUDA (345 ms)
2023-01-11T23:22:24.8908586Z [ RUN ] NVFuserTest.FusionGroupAllreduce4_CUDA
2023-01-11T23:22:25.4693988Z [ OK ] NVFuserTest.FusionGroupAllreduce4_CUDA (578 ms)
2023-01-11T23:22:25.4694900Z [ RUN ] NVFuserTest.FusionGroupAllreduce5_CUDA
2023-01-11T23:22:25.8278854Z [ OK ] NVFuserTest.FusionGroupAllreduce5_CUDA (358 ms)
2023-01-11T23:22:25.8279956Z [ RUN ] NVFuserTest.FusionPersistentBNBackwardAllreduce_CUDA
2023-01-11T23:22:26.2747532Z [ OK ] NVFuserTest.FusionPersistentBNBackwardAllreduce_CUDA (446 ms)
2023-01-11T23:22:26.2748616Z [ RUN ] NVFuserTest.FusionGroupedReductionReEntrant1_CUDA
2023-01-11T23:22:26.5572250Z [ OK ] NVFuserTest.FusionGroupedReductionReEntrant1_CUDA (282 ms)
2023-01-11T23:22:26.5572806Z [ RUN ] NVFuserTest.FusionGroupedReductionChannelsLastBatchNormLike_CUDA
2023-01-11T23:22:26.9072349Z [ OK ] NVFuserTest.FusionGroupedReductionChannelsLastBatchNormLike_CUDA (349 ms)
2023-01-11T23:22:26.9073698Z [ RUN ] NVFuserTest.FusionGroupedReductionPersistentChannelsLastBatchNormLike_CUDA
2023-01-11T23:22:27.4449924Z [ OK ] NVFuserTest.FusionGroupedReductionPersistentChannelsLastBatchNormLike_CUDA (537 ms)
2023-01-11T23:22:27.4450555Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce1_CUDA
2023-01-11T23:22:27.7518673Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce1_CUDA (306 ms)
2023-01-11T23:22:27.7519921Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce2_CUDA
2023-01-11T23:22:28.2360249Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce2_CUDA (484 ms)
2023-01-11T23:22:28.2361404Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce3_CUDA
2023-01-11T23:22:28.6357350Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce3_CUDA (399 ms)
2023-01-11T23:22:28.6357946Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce4_CUDA
2023-01-11T23:22:28.9431109Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduce4_CUDA (307 ms)
2023-01-11T23:22:28.9433362Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford1_CUDA
2023-01-11T23:22:29.3792714Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford1_CUDA (436 ms)
2023-01-11T23:22:29.3794019Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford2_CUDA
2023-01-11T23:22:30.2701468Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelford2_CUDA (890 ms)
2023-01-11T23:22:30.2702834Z [ RUN ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelfordShmoo_CUDA
2023-01-11T23:23:02.5874991Z [ OK ] NVFuserTest.FusionCrossIterationGroupedGridAllreduceWelfordShmoo_CUDA (32317 ms)
2023-01-11T23:23:02.5875503Z [ RUN ] NVFuserTest.FusionShift1_CUDA
2023-01-11T23:23:02.7809665Z [ OK ] NVFuserTest.FusionShift1_CUDA (193 ms)
2023-01-11T23:23:02.7810107Z [ RUN ] NVFuserTest.FusionShift2_CUDA
2023-01-11T23:23:03.0094414Z [ OK ] NVFuserTest.FusionShift2_CUDA (228 ms)
2023-01-11T23:23:03.0096296Z [ RUN ] NVFuserTest.FusionShiftRightOfCA_CUDA
2023-01-11T23:23:03.1914005Z [ OK ] NVFuserTest.FusionShiftRightOfCA_CUDA (181 ms)
2023-01-11T23:23:03.1914440Z [ RUN ] NVFuserTest.FusionShiftLeftOfCA_CUDA
2023-01-11T23:23:03.1953889Z [ OK ] NVFuserTest.FusionShiftLeftOfCA_CUDA (4 ms)
2023-01-11T23:23:03.1954349Z [ RUN ] NVFuserTest.FusionShiftSplit1_CUDA
2023-01-11T23:23:03.4001930Z [ OK ] NVFuserTest.FusionShiftSplit1_CUDA (204 ms)
2023-01-11T23:23:03.4002703Z [ RUN ] NVFuserTest.FusionShiftSplit2_CUDA
2023-01-11T23:23:03.6314327Z [ OK ] NVFuserTest.FusionShiftSplit2_CUDA (231 ms)
2023-01-11T23:23:03.6314745Z [ RUN ] NVFuserTest.FusionShiftDoubleSplit_CUDA
2023-01-11T23:23:03.8444755Z [ OK ] NVFuserTest.FusionShiftDoubleSplit_CUDA (213 ms)
2023-01-11T23:23:03.8445198Z [ RUN ] NVFuserTest.FusionShift3ptStencil_CUDA
2023-01-11T23:23:04.0716664Z [ OK ] NVFuserTest.FusionShift3ptStencil_CUDA (227 ms)
2023-01-11T23:23:04.0717136Z [ RUN ] NVFuserTest.FusionShift5ptStencil_CUDA
2023-01-11T23:23:04.9088184Z [ OK ] NVFuserTest.FusionShift5ptStencil_CUDA (837 ms)
2023-01-11T23:23:04.9088613Z [ RUN ] NVFuserTest.FusionShift9ptStencil_CUDA
2023-01-11T23:23:05.9640969Z [ OK ] NVFuserTest.FusionShift9ptStencil_CUDA (1054 ms)
2023-01-11T23:23:05.9641853Z [ RUN ] NVFuserTest.FusionShiftSmemBlocking_CUDA
2023-01-11T23:23:06.1486178Z [ OK ] NVFuserTest.FusionShiftSmemBlocking_CUDA (184 ms)
2023-01-11T23:23:06.1486659Z [ RUN ] NVFuserTest.FusionShift3ptStencilParallel_CUDA
2023-01-11T23:23:06.3476149Z [ OK ] NVFuserTest.FusionShift3ptStencilParallel_CUDA (198 ms)
2023-01-11T23:23:06.3476617Z [ RUN ] NVFuserTest.FusionShift5ptStencilParallel_CUDA
2023-01-11T23:23:06.5567158Z [ OK ] NVFuserTest.FusionShift5ptStencilParallel_CUDA (209 ms)
2023-01-11T23:23:06.5567637Z [ RUN ] NVFuserTest.FusionShiftMerge1_CUDA
2023-01-11T23:23:06.8006602Z [ OK ] NVFuserTest.FusionShiftMerge1_CUDA (243 ms)
2023-01-11T23:23:06.8007018Z [ RUN ] NVFuserTest.FusionShiftMerge2_CUDA
2023-01-11T23:23:07.1162948Z [ OK ] NVFuserTest.FusionShiftMerge2_CUDA (315 ms)
2023-01-11T23:23:07.1163877Z [ RUN ] NVFuserTest.FusionShiftGlobal_CUDA
2023-01-11T23:23:07.3389080Z [ OK ] NVFuserTest.FusionShiftGlobal_CUDA (222 ms)
2023-01-11T23:23:07.3389725Z [ RUN ] NVFuserTest.FusionShiftDoubleSplitMerge1_CUDA
2023-01-11T23:23:07.5583055Z [ OK ] NVFuserTest.FusionShiftDoubleSplitMerge1_CUDA (219 ms)
2023-01-11T23:23:07.5584055Z [ RUN ] NVFuserTest.FusionShiftDoubleSplitMerge2_CUDA
2023-01-11T23:23:07.7655684Z [ OK ] NVFuserTest.FusionShiftDoubleSplitMerge2_CUDA (207 ms)
2023-01-11T23:23:07.7656232Z [ RUN ] NVFuserTest.FusionShift5ptStencilParallel1DThreadBlock_CUDA
2023-01-11T23:23:11.6979363Z [ OK ] NVFuserTest.FusionShift5ptStencilParallel1DThreadBlock_CUDA (3932 ms)
2023-01-11T23:23:11.6980712Z [ RUN ] NVFuserTest.FusionShiftChain1_CUDA
2023-01-11T23:23:11.8921037Z [ OK ] NVFuserTest.FusionShiftChain1_CUDA (194 ms)
2023-01-11T23:23:11.8921443Z [ RUN ] NVFuserTest.FusionShiftChain2_CUDA
2023-01-11T23:23:12.0850932Z [ OK ] NVFuserTest.FusionShiftChain2_CUDA (192 ms)
2023-01-11T23:23:12.0851330Z [ RUN ] NVFuserTest.FusionShiftChain3_CUDA
2023-01-11T23:23:12.2847509Z [ OK ] NVFuserTest.FusionShiftChain3_CUDA (199 ms)
2023-01-11T23:23:12.2848284Z [ RUN ] NVFuserTest.FusionShiftChain4_CUDA
2023-01-11T23:23:13.0901967Z [ OK ] NVFuserTest.FusionShiftChain4_CUDA (805 ms)
2023-01-11T23:23:13.0902785Z [ RUN ] NVFuserTest.FusionShift5ptStencilChain_CUDA
2023-01-11T23:23:13.3659316Z [ OK ] NVFuserTest.FusionShift5ptStencilChain_CUDA (275 ms)
2023-01-11T23:23:13.3660172Z [ RUN ] NVFuserTest.FusionShiftReduction1_CUDA
2023-01-11T23:23:13.5679270Z [ OK ] NVFuserTest.FusionShiftReduction1_CUDA (202 ms)
2023-01-11T23:23:13.5679742Z [ RUN ] NVFuserTest.FusionShiftReduction2_CUDA
2023-01-11T23:23:13.8046259Z [ OK ] NVFuserTest.FusionShiftReduction2_CUDA (236 ms)
2023-01-11T23:23:13.8046757Z [ RUN ] NVFuserTest.FusionShiftRfactor1_CUDA
2023-01-11T23:23:14.0450536Z [ OK ] NVFuserTest.FusionShiftRfactor1_CUDA (240 ms)
2023-01-11T23:23:14.0451507Z [ RUN ] NVFuserTest.FusionShiftBcast1_CUDA
2023-01-11T23:23:14.2225570Z [ OK ] NVFuserTest.FusionShiftBcast1_CUDA (177 ms)
2023-01-11T23:23:14.2226280Z [ RUN ] NVFuserTest.FusionShiftBcast2_CUDA
2023-01-11T23:23:14.4189971Z [ OK ] NVFuserTest.FusionShiftBcast2_CUDA (196 ms)
2023-01-11T23:23:14.4190979Z [ RUN ] NVFuserTest.FusionShiftBcast3_CUDA
2023-01-11T23:23:14.6122932Z [ OK ] NVFuserTest.FusionShiftBcast3_CUDA (193 ms)
2023-01-11T23:23:14.6123387Z [ RUN ] NVFuserTest.FusionShiftSyncPlacement1_CUDA
2023-01-11T23:23:14.8045274Z [ OK ] NVFuserTest.FusionShiftSyncPlacement1_CUDA (192 ms)
2023-01-11T23:23:14.8046074Z [ RUN ] NVFuserTest.FusionShiftSyncPlacement2_CUDA
2023-01-11T23:23:14.9844786Z [ OK ] NVFuserTest.FusionShiftSyncPlacement2_CUDA (180 ms)
2023-01-11T23:23:14.9845194Z [ RUN ] NVFuserTest.FusionHdiff_CUDA
2023-01-11T23:23:15.3442103Z [ OK ] NVFuserTest.FusionHdiff_CUDA (359 ms)
2023-01-11T23:23:15.3442578Z [ RUN ] NVFuserTest.FusionHdiffPartialSplitUnswitch_CUDA
2023-01-11T23:23:15.8072935Z [ OK ] NVFuserTest.FusionHdiffPartialSplitUnswitch_CUDA (462 ms)
2023-01-11T23:23:15.8074040Z [ RUN ] NVFuserTest.FusionMaxPooling_CUDA
2023-01-11T23:23:16.0781301Z [ OK ] NVFuserTest.FusionMaxPooling_CUDA (270 ms)
2023-01-11T23:23:16.0782164Z [ RUN ] NVFuserTest.FusionGather1_CUDA
2023-01-11T23:23:16.2615724Z [ OK ] NVFuserTest.FusionGather1_CUDA (183 ms)
2023-01-11T23:23:16.2616579Z [ RUN ] NVFuserTest.FusionGather2_CUDA
2023-01-11T23:23:16.4495606Z [ OK ] NVFuserTest.FusionGather2_CUDA (187 ms)
2023-01-11T23:23:16.4496380Z [ RUN ] NVFuserTest.FusionGather3_CUDA
2023-01-11T23:23:16.6298711Z [ OK ] NVFuserTest.FusionGather3_CUDA (180 ms)
2023-01-11T23:23:16.6299131Z [ RUN ] NVFuserTest.FusionGather4_CUDA
2023-01-11T23:23:16.8188402Z [ OK ] NVFuserTest.FusionGather4_CUDA (188 ms)
2023-01-11T23:23:16.8188827Z [ RUN ] NVFuserTest.FusionGather5_CUDA
2023-01-11T23:23:17.0221597Z [ OK ] NVFuserTest.FusionGather5_CUDA (203 ms)
2023-01-11T23:23:17.0222416Z [ RUN ] NVFuserTest.FusionGather6_CUDA
2023-01-11T23:23:17.2178643Z [ OK ] NVFuserTest.FusionGather6_CUDA (195 ms)
2023-01-11T23:23:17.2179398Z [ RUN ] NVFuserTest.FusionGather7_CUDA
2023-01-11T23:23:17.4211367Z [ OK ] NVFuserTest.FusionGather7_CUDA (203 ms)
2023-01-11T23:23:17.4212109Z [ RUN ] NVFuserTest.FusionGather8_CUDA
2023-01-11T23:23:17.6108327Z [ OK ] NVFuserTest.FusionGather8_CUDA (189 ms)
2023-01-11T23:23:17.6108706Z [ RUN ] NVFuserTest.FusionGather9_CUDA
2023-01-11T23:23:17.8263812Z [ OK ] NVFuserTest.FusionGather9_CUDA (215 ms)
2023-01-11T23:23:17.8264532Z [ RUN ] NVFuserTest.FusionConv2D_CUDA
2023-01-11T23:23:19.3064346Z [ OK ] NVFuserTest.FusionConv2D_CUDA (1480 ms)
2023-01-11T23:23:19.3065106Z [ RUN ] NVFuserTest.FusionConv2DNoPadding_CUDA
2023-01-11T23:23:19.5934005Z [ OK ] NVFuserTest.FusionConv2DNoPadding_CUDA (286 ms)
2023-01-11T23:23:19.5935375Z [ RUN ] NVFuserTest.FusionConv2DNoPaddingStrided_CUDA
2023-01-11T23:23:19.8666686Z [ OK ] NVFuserTest.FusionConv2DNoPaddingStrided_CUDA (273 ms)
2023-01-11T23:23:19.8667165Z [ RUN ] NVFuserTest.FusionConv2DChain_CUDA
2023-01-11T23:23:20.2391554Z [ OK ] NVFuserTest.FusionConv2DChain_CUDA (372 ms)
2023-01-11T23:23:20.2392527Z [ RUN ] NVFuserTest.FusionConv2DStaticEvenSizedWindow_CUDA
2023-01-11T23:23:20.4912270Z [ OK ] NVFuserTest.FusionConv2DStaticEvenSizedWindow_CUDA (252 ms)
2023-01-11T23:23:20.4913707Z [ RUN ] NVFuserTest.FusionConv4x4Pad1x1_CUDA
2023-01-11T23:23:20.8196777Z [ OK ] NVFuserTest.FusionConv4x4Pad1x1_CUDA (328 ms)
2023-01-11T23:23:20.8197594Z [ RUN ] NVFuserTest.FusionConv4x5Pad1x2_CUDA
2023-01-11T23:23:21.1680539Z [ OK ] NVFuserTest.FusionConv4x5Pad1x2_CUDA (348 ms)
2023-01-11T23:23:21.1680960Z [ RUN ] NVFuserTest.FusionConv4x4Pad1x1Stride4_CUDA
2023-01-11T23:23:21.7198673Z [ OK ] NVFuserTest.FusionConv4x4Pad1x1Stride4_CUDA (551 ms)
2023-01-11T23:23:21.7199078Z [ RUN ] NVFuserTest.FusionIm2Col_CUDA
2023-01-11T23:23:21.9282132Z [ OK ] NVFuserTest.FusionIm2Col_CUDA (208 ms)
2023-01-11T23:23:21.9282974Z [ RUN ] NVFuserTest.FusionShiftNoPadding1_CUDA
2023-01-11T23:23:22.1385590Z [ OK ] NVFuserTest.FusionShiftNoPadding1_CUDA (210 ms)
2023-01-11T23:23:22.1386476Z [ RUN ] NVFuserTest.FusionShiftNoPadding2_CUDA
2023-01-11T23:23:22.3481735Z [ OK ] NVFuserTest.FusionShiftNoPadding2_CUDA (209 ms)
2023-01-11T23:23:22.3482235Z [ RUN ] NVFuserTest.FusionShiftNoPadding3_CUDA
2023-01-11T23:23:22.5831279Z [ OK ] NVFuserTest.FusionShiftNoPadding3_CUDA (234 ms)
2023-01-11T23:23:22.5832161Z [ RUN ] NVFuserTest.FusionShiftNoPaddingContigMerge_CUDA
2023-01-11T23:23:22.7796587Z [ OK ] NVFuserTest.FusionShiftNoPaddingContigMerge_CUDA (196 ms)
2023-01-11T23:23:22.7797693Z [ RUN ] NVFuserTest.FusionShiftNoPaddingChain_CUDA
2023-01-11T23:23:23.0120046Z [ OK ] NVFuserTest.FusionShiftNoPaddingChain_CUDA (232 ms)
2023-01-11T23:23:23.0120550Z [ RUN ] NVFuserTest.FusionShiftNoPaddingRfactor_CUDA
2023-01-11T23:23:23.0142228Z [ OK ] NVFuserTest.FusionShiftNoPaddingRfactor_CUDA (2 ms)
2023-01-11T23:23:23.0142680Z [ RUN ] NVFuserTest.FusionShiftPadding1_CUDA
2023-01-11T23:23:23.2274640Z [ OK ] NVFuserTest.FusionShiftPadding1_CUDA (212 ms)
2023-01-11T23:23:23.2275458Z [ RUN ] NVFuserTest.FusionPartialSplit1_CUDA
2023-01-11T23:23:23.4047710Z [ OK ] NVFuserTest.FusionPartialSplit1_CUDA (177 ms)
2023-01-11T23:23:23.4048131Z [ RUN ] NVFuserTest.FusionPartialSplit2_CUDA
2023-01-11T23:23:23.4102883Z [ OK ] NVFuserTest.FusionPartialSplit2_CUDA (5 ms)
2023-01-11T23:23:23.4103680Z [ RUN ] NVFuserTest.FusionPartialSplit3_CUDA
2023-01-11T23:23:23.5942149Z [ OK ] NVFuserTest.FusionPartialSplit3_CUDA (183 ms)
2023-01-11T23:23:23.5942951Z [ RUN ] NVFuserTest.FusionPartialSplit4_CUDA
2023-01-11T23:23:23.8778305Z [ OK ] NVFuserTest.FusionPartialSplit4_CUDA (283 ms)
2023-01-11T23:23:23.8778940Z [ RUN ] NVFuserTest.FusionPartialSplit5_CUDA
2023-01-11T23:23:24.1031827Z [ OK ] NVFuserTest.FusionPartialSplit5_CUDA (225 ms)
2023-01-11T23:23:24.1033145Z [ RUN ] NVFuserTest.FusionPartialSplit6_CUDA
2023-01-11T23:23:24.3019668Z [ OK ] NVFuserTest.FusionPartialSplit6_CUDA (198 ms)
2023-01-11T23:23:24.3020089Z [ RUN ] NVFuserTest.FusionShiftUnswitch1_CUDA
2023-01-11T23:23:24.5127551Z [ OK ] NVFuserTest.FusionShiftUnswitch1_CUDA (210 ms)
2023-01-11T23:23:24.5127970Z [ RUN ] NVFuserTest.FusionGatherUnswitch1_CUDA
2023-01-11T23:23:25.7845950Z [ OK ] NVFuserTest.FusionGatherUnswitch1_CUDA (1271 ms)
2023-01-11T23:23:25.7846600Z [ RUN ] NVFuserTest.FusionGatherStrided1_CUDA
2023-01-11T23:23:25.9714959Z [ OK ] NVFuserTest.FusionGatherStrided1_CUDA (186 ms)
2023-01-11T23:23:25.9715635Z [ RUN ] NVFuserTest.FusionGatherStrided2_CUDA
2023-01-11T23:23:26.1588228Z [ OK ] NVFuserTest.FusionGatherStrided2_CUDA (187 ms)
2023-01-11T23:23:26.1588885Z [ RUN ] NVFuserTest.FusionGatherStrided3_CUDA
2023-01-11T23:23:26.3499777Z [ OK ] NVFuserTest.FusionGatherStrided3_CUDA (191 ms)
2023-01-11T23:23:26.3500419Z [ RUN ] NVFuserTest.FusionGatherStrided4_CUDA
2023-01-11T23:23:26.5704229Z [ OK ] NVFuserTest.FusionGatherStrided4_CUDA (220 ms)
2023-01-11T23:23:26.5704814Z [ RUN ] NVFuserTest.FusionGatherStrided5_CUDA
2023-01-11T23:23:26.7566814Z [ OK ] NVFuserTest.FusionGatherStrided5_CUDA (186 ms)
2023-01-11T23:23:26.7567443Z [ RUN ] NVFuserTest.FusionGatherStrided6_CUDA
2023-01-11T23:23:26.9442045Z [ OK ] NVFuserTest.FusionGatherStrided6_CUDA (187 ms)
2023-01-11T23:23:26.9442613Z [ RUN ] NVFuserTest.FusionGatherStrided7_CUDA
2023-01-11T23:23:26.9491829Z [ OK ] NVFuserTest.FusionGatherStrided7_CUDA (4 ms)
2023-01-11T23:23:26.9492474Z [ RUN ] NVFuserTest.FusionGatherStrided8_CUDA
2023-01-11T23:23:27.1476825Z [ OK ] NVFuserTest.FusionGatherStrided8_CUDA (199 ms)
2023-01-11T23:23:27.1477455Z [ RUN ] NVFuserTest.FusionGatherStridedChain_CUDA
2023-01-11T23:23:27.1513827Z [ OK ] NVFuserTest.FusionGatherStridedChain_CUDA (3 ms)
2023-01-11T23:23:27.1514430Z [ RUN ] NVFuserTest.FusionMaxPoolingStrided_CUDA
2023-01-11T23:23:27.4789261Z [ OK ] NVFuserTest.FusionMaxPoolingStrided_CUDA (327 ms)
2023-01-11T23:23:27.4789919Z [ RUN ] NVFuserTest.FusionConv2DStaticStrided_CUDA
2023-01-11T23:23:27.9185083Z [ OK ] NVFuserTest.FusionConv2DStaticStrided_CUDA (439 ms)
2023-01-11T23:23:27.9185711Z [ RUN ] NVFuserTest.FusionNonDivisibleHalo1_CUDA
2023-01-11T23:23:28.1160128Z [ OK ] NVFuserTest.FusionNonDivisibleHalo1_CUDA (197 ms)
2023-01-11T23:23:28.1160846Z [ RUN ] NVFuserTest.FusionNonDivisibleHalo2_CUDA
2023-01-11T23:23:29.0930667Z [ OK ] NVFuserTest.FusionNonDivisibleHalo2_CUDA (977 ms)
2023-01-11T23:23:29.0931498Z [ RUN ] NVFuserTest.FusionGather9ptStencilDoubleBuffering_CUDA
2023-01-11T23:23:29.3122181Z [ OK ] NVFuserTest.FusionGather9ptStencilDoubleBuffering_CUDA (218 ms)
2023-01-11T23:23:29.3122950Z [ RUN ] NVFuserTest.FusionValidateParallelizeShift_CUDA
2023-01-11T23:23:29.5092129Z [ OK ] NVFuserTest.FusionValidateParallelizeShift_CUDA (197 ms)
2023-01-11T23:23:29.5092876Z [ RUN ] NVFuserTest.FusionGatherIterTypePromotion_CUDA
2023-01-11T23:23:29.7325874Z [ OK ] NVFuserTest.FusionGatherIterTypePromotion_CUDA (223 ms)
2023-01-11T23:23:29.7326564Z [ RUN ] NVFuserTest.FusionContigPredicateShift_CUDA
2023-01-11T23:23:29.9215238Z [ OK ] NVFuserTest.FusionContigPredicateShift_CUDA (188 ms)
2023-01-11T23:23:29.9215876Z [ RUN ] NVFuserTest.FusionVoltaMMATT_CUDA
2023-01-11T23:23:30.2762209Z [ OK ] NVFuserTest.FusionVoltaMMATT_CUDA (354 ms)
2023-01-11T23:23:30.2762651Z [ RUN ] NVFuserTest.FusionVoltaMMATN_CUDA
2023-01-11T23:23:30.6435551Z [ OK ] NVFuserTest.FusionVoltaMMATN_CUDA (367 ms)
2023-01-11T23:23:30.6436542Z [ RUN ] NVFuserTest.FusionVoltaMMANT_CUDA
2023-01-11T23:23:30.9841339Z [ OK ] NVFuserTest.FusionVoltaMMANT_CUDA (340 ms)
2023-01-11T23:23:30.9841753Z [ RUN ] NVFuserTest.FusionVoltaMatmul_CUDA
2023-01-11T23:24:54.7088126Z [ OK ] NVFuserTest.FusionVoltaMatmul_CUDA (83724 ms)
2023-01-11T23:24:54.7088589Z [ RUN ] NVFuserTest.FusionVoltaMatmulRegDoubleBuffer_CUDA
2023-01-11T23:26:19.9360640Z [ OK ] NVFuserTest.FusionVoltaMatmulRegDoubleBuffer_CUDA (85227 ms)
2023-01-11T23:26:19.9361089Z [ RUN ] NVFuserTest.FusionAmpereMMATN_CUDA
2023-01-11T23:26:20.8708707Z [ OK ] NVFuserTest.FusionAmpereMMATN_CUDA (934 ms)
2023-01-11T23:26:20.8709637Z [ RUN ] NVFuserTest.FusionAmpereMMATT_CUDA
2023-01-11T23:26:21.8750973Z [ OK ] NVFuserTest.FusionAmpereMMATT_CUDA (1004 ms)
2023-01-11T23:26:21.8752240Z [ RUN ] NVFuserTest.FusionAmpereMMANT_CUDA
2023-01-11T23:26:22.8820168Z [ OK ] NVFuserTest.FusionAmpereMMANT_CUDA (1006 ms)
2023-01-11T23:26:22.8820876Z [ RUN ] NVFuserTest.FusionAmpereMatmul_CUDA
2023-01-11T23:26:25.7296383Z [ OK ] NVFuserTest.FusionAmpereMatmul_CUDA (2847 ms)
2023-01-11T23:26:25.7297163Z [ RUN ] NVFuserTest.FusionAmpereMatmulPipelineGmem_CUDA
2023-01-11T23:26:31.2761813Z [ OK ] NVFuserTest.FusionAmpereMatmulPipelineGmem_CUDA (5546 ms)
2023-01-11T23:26:31.2762285Z [ RUN ] NVFuserTest.FusionAmpereMatmulRegDbouleBuffer_CUDA
2023-01-11T23:26:36.9900815Z [ OK ] NVFuserTest.FusionAmpereMatmulRegDbouleBuffer_CUDA (5713 ms)
2023-01-11T23:26:36.9901662Z [ RUN ] NVFuserTest.FusionMatmulMatmulAmpere_CUDA
2023-01-11T23:26:38.3271730Z [ OK ] NVFuserTest.FusionMatmulMatmulAmpere_CUDA (1337 ms)
2023-01-11T23:26:38.3272792Z [ RUN ] NVFuserTest.FusionMatmulSoftmaxMatmulAmpere_CUDA
2023-01-11T23:26:40.9613157Z [ OK ] NVFuserTest.FusionMatmulSoftmaxMatmulAmpere_CUDA (2634 ms)
2023-01-11T23:26:40.9614050Z [ RUN ] NVFuserTest.FusionTuringMMATN_CUDA
2023-01-11T23:26:41.8960895Z [ OK ] NVFuserTest.FusionTuringMMATN_CUDA (934 ms)
2023-01-11T23:26:41.8961661Z [ RUN ] NVFuserTest.FusionTuringMMATT_CUDA
2023-01-11T23:26:42.9028416Z [ OK ] NVFuserTest.FusionTuringMMATT_CUDA (1006 ms)
2023-01-11T23:26:42.9029179Z [ RUN ] NVFuserTest.FusionTuringMMANT_CUDA
2023-01-11T23:26:43.9108509Z [ OK ] NVFuserTest.FusionTuringMMANT_CUDA (1008 ms)
2023-01-11T23:26:43.9109543Z [ RUN ] NVFuserTest.FusionTuringMatmul_CUDA
2023-01-11T23:26:46.7068631Z [ OK ] NVFuserTest.FusionTuringMatmul_CUDA (2795 ms)
2023-01-11T23:26:46.7069461Z [ RUN ] NVFuserTest.FusionAmpereMatmulTNcpAsync_CUDA
2023-01-11T23:26:47.2297141Z [ OK ] NVFuserTest.FusionAmpereMatmulTNcpAsync_CUDA (523 ms)
2023-01-11T23:26:47.2298150Z [ RUN ] NVFuserTest.FusionAmpereStridedBatchedMatmulTN_CUDA
2023-01-11T23:26:48.0353237Z [ OK ] NVFuserTest.FusionAmpereStridedBatchedMatmulTN_CUDA (805 ms)
2023-01-11T23:26:48.0354154Z [ RUN ] NVFuserTest.FusionAmpereViewMatmulTN_CUDA
2023-01-11T23:26:48.6133573Z [ OK ] NVFuserTest.FusionAmpereViewMatmulTN_CUDA (578 ms)
2023-01-11T23:26:48.6134142Z [ RUN ] NVFuserTest.FusionVoltaMatMulTNCrossWarp_CUDA
2023-01-11T23:26:53.2347609Z [ OK ] NVFuserTest.FusionVoltaMatMulTNCrossWarp_CUDA (4621 ms)
2023-01-11T23:26:53.2348442Z [ RUN ] NVFuserTest.FusionVoltaMatMulTNCrossCTA_CUDA
2023-01-11T23:27:35.6292914Z [ OK ] NVFuserTest.FusionVoltaMatMulTNCrossCTA_CUDA (42394 ms)
2023-01-11T23:27:35.6294042Z [ RUN ] NVFuserTest.FusionAmpereMatmulTNSwizzled_CUDA
2023-01-11T23:27:37.0102020Z [ OK ] NVFuserTest.FusionAmpereMatmulTNSwizzled_CUDA (1380 ms)
2023-01-11T23:27:37.0103183Z [ RUN ] NVFuserTest.FusionAmpereMatmulLargeLoad_CUDA
2023-01-11T23:27:39.8418597Z [ OK ] NVFuserTest.FusionAmpereMatmulLargeLoad_CUDA (2831 ms)
2023-01-11T23:27:39.8419368Z [ RUN ] NVFuserTest.FusionTuringMatmulLargeLoad_CUDA
2023-01-11T23:27:42.6118277Z [ OK ] NVFuserTest.FusionTuringMatmulLargeLoad_CUDA (2769 ms)
2023-01-11T23:27:42.6119422Z [ RUN ] NVFuserTest.FusionViewDtypeSameSizeOutput_CUDA
2023-01-11T23:27:42.8008929Z [ OK ] NVFuserTest.FusionViewDtypeSameSizeOutput_CUDA (189 ms)
2023-01-11T23:27:42.8009945Z [ RUN ] NVFuserTest.FusionViewDtypeFailMismatchSize_CUDA
2023-01-11T23:27:42.8040316Z [ OK ] NVFuserTest.FusionViewDtypeFailMismatchSize_CUDA (3 ms)
2023-01-11T23:27:42.8040938Z [ RUN ] NVFuserTest.FusionViewAsRealOutput_CUDA
2023-01-11T23:27:42.9842097Z [ OK ] NVFuserTest.FusionViewAsRealOutput_CUDA (180 ms)
2023-01-11T23:27:42.9842684Z [ RUN ] NVFuserTest.FusionViewRfactorExtentReplacement_CUDA
2023-01-11T23:27:43.5888767Z [ OK ] NVFuserTest.FusionViewRfactorExtentReplacement_CUDA (604 ms)
2023-01-11T23:27:43.5890411Z [ RUN ] NVFuserTest.FusionViewOutput_CUDA
2023-01-11T23:27:43.7837411Z [ OK ] NVFuserTest.FusionViewOutput_CUDA (194 ms)
2023-01-11T23:27:43.7838229Z [ RUN ] NVFuserTest.FusionViewFailMismatchSize_CUDA
2023-01-11T23:27:43.7855980Z [ OK ] NVFuserTest.FusionViewFailMismatchSize_CUDA (2 ms)
2023-01-11T23:27:43.7856754Z [ RUN ] NVFuserTest.FusionViewFailMulitDimInference_CUDA
2023-01-11T23:27:43.7876782Z [ OK ] NVFuserTest.FusionViewFailMulitDimInference_CUDA (2 ms)
2023-01-11T23:27:43.7877193Z [ RUN ] NVFuserTest.FusionViewReductionShmoo_CUDA
2023-01-11T23:28:21.8919232Z [ OK ] NVFuserTest.FusionViewReductionShmoo_CUDA (38103 ms)
2023-01-11T23:28:21.8920071Z [ RUN ] NVFuserTest.FusionViewPersistentShmoo_CUDA
2023-01-11T23:29:32.6822915Z [ OK ] NVFuserTest.FusionViewPersistentShmoo_CUDA (70790 ms)
2023-01-11T23:29:32.6823785Z [ RUN ] NVFuserTest.FusionViewSplit_CUDA
2023-01-11T23:29:33.0993193Z [ OK ] NVFuserTest.FusionViewSplit_CUDA (417 ms)
2023-01-11T23:29:33.0994744Z [ RUN ] NVFuserTest.FusionViewBroadcast_CUDA
2023-01-11T23:29:33.4943493Z [ OK ] NVFuserTest.FusionViewBroadcast_CUDA (394 ms)
2023-01-11T23:29:33.4944185Z [ RUN ] NVFuserTest.FusionViewMerge_CUDA
2023-01-11T23:29:33.8960689Z [ OK ] NVFuserTest.FusionViewMerge_CUDA (401 ms)
2023-01-11T23:29:33.8961065Z [ RUN ] NVFuserTest.FusionViewAllShmoo_CUDA
2023-01-11T23:29:49.6735408Z [ OK ] NVFuserTest.FusionViewAllShmoo_CUDA (15777 ms)
2023-01-11T23:29:49.6735794Z [ RUN ] NVFuserTest.FusionViewStride_CUDA
2023-01-11T23:30:05.7922591Z [ OK ] NVFuserTest.FusionViewStride_CUDA (16118 ms)
2023-01-11T23:30:05.7923012Z [ RUN ] NVFuserTest.FusionViewBinary_CUDA
2023-01-11T23:30:06.3081621Z [ OK ] NVFuserTest.FusionViewBinary_CUDA (515 ms)
2023-01-11T23:30:06.3082310Z [ RUN ] NVFuserTest.FusionViewConcreteDomain_CUDA
2023-01-11T23:30:06.4898785Z [ OK ] NVFuserTest.FusionViewConcreteDomain_CUDA (181 ms)
2023-01-11T23:30:06.4899576Z [ RUN ] NVFuserTest.FusionViewConcreteDomain2_CUDA
2023-01-11T23:30:07.2470015Z [ OK ] NVFuserTest.FusionViewConcreteDomain2_CUDA (757 ms)
2023-01-11T23:30:07.2470470Z [ RUN ] NVFuserTest.FusionViewConcreteDomain3_CUDA
2023-01-11T23:30:07.5920216Z [ OK ] NVFuserTest.FusionViewConcreteDomain3_CUDA (345 ms)
2023-01-11T23:30:07.5920679Z [ RUN ] NVFuserTest.FusionViewConcreteDomain4_CUDA
2023-01-11T23:30:07.5935763Z [ OK ] NVFuserTest.FusionViewConcreteDomain4_CUDA (1 ms)
2023-01-11T23:30:07.5936152Z [ RUN ] NVFuserTest.FusionViewConcreteDomain5_CUDA
2023-01-11T23:30:07.5993675Z [ OK ] NVFuserTest.FusionViewConcreteDomain5_CUDA (5 ms)
2023-01-11T23:30:07.5994559Z [ RUN ] NVFuserTest.FusionFlattenAfterUnsqueezeOutput_CUDA
2023-01-11T23:30:07.7848967Z [ OK ] NVFuserTest.FusionFlattenAfterUnsqueezeOutput_CUDA (185 ms)
2023-01-11T23:30:07.7849628Z [ RUN ] NVFuserTest.FusionComputeAtRootDomainMapWithView_CUDA
2023-01-11T23:30:07.7850265Z [ OK ] NVFuserTest.FusionComputeAtRootDomainMapWithView_CUDA (0 ms)
2023-01-11T23:30:07.7850728Z [ RUN ] NVFuserTest.FusionExpandRepro_CUDA
2023-01-11T23:30:07.9592275Z [ OK ] NVFuserTest.FusionExpandRepro_CUDA (173 ms)
2023-01-11T23:30:07.9593017Z [ RUN ] NVFuserTest.FusionExpandView1_CUDA
2023-01-11T23:30:08.1503469Z [ OK ] NVFuserTest.FusionExpandView1_CUDA (191 ms)
2023-01-11T23:30:08.1504165Z [ RUN ] NVFuserTest.FusionExpandView2_CUDA
2023-01-11T23:30:08.3434337Z [ OK ] NVFuserTest.FusionExpandView2_CUDA (193 ms)
2023-01-11T23:30:08.3435371Z [ RUN ] NVFuserTest.FusionViewTransformCache_CUDA
2023-01-11T23:30:08.3436381Z [ OK ] NVFuserTest.FusionViewTransformCache_CUDA (0 ms)
2023-01-11T23:30:08.3437124Z [ RUN ] NVFuserTest.FusionViewIdGraph_CUDA
2023-01-11T23:30:08.3447687Z [ OK ] NVFuserTest.FusionViewIdGraph_CUDA (1 ms)
2023-01-11T23:30:08.3448350Z [ RUN ] NVFuserTest.FusionViewVectorize_CUDA
2023-01-11T23:30:08.7482209Z [ OK ] NVFuserTest.FusionViewVectorize_CUDA (403 ms)
2023-01-11T23:30:08.7482786Z [ RUN ] NVFuserTest.FusionExpandFlatten_CUDA
2023-01-11T23:30:09.1622221Z [ OK ] NVFuserTest.FusionExpandFlatten_CUDA (414 ms)
2023-01-11T23:30:09.1622792Z [ RUN ] NVFuserTest.FusionIllegalReductionFlatten_CUDA
2023-01-11T23:30:09.1642346Z [ OK ] NVFuserTest.FusionIllegalReductionFlatten_CUDA (1 ms)
2023-01-11T23:30:09.1642924Z [ RUN ] NVFuserTest.FusionReductionFlatten1_CUDA
2023-01-11T23:30:09.5864400Z [ OK ] NVFuserTest.FusionReductionFlatten1_CUDA (421 ms)
2023-01-11T23:30:09.5865430Z [ RUN ] NVFuserTest.FusionPwiseViewSchedule_CUDA
2023-01-11T23:30:09.9040169Z [ OK ] NVFuserTest.FusionPwiseViewSchedule_CUDA (317 ms)
2023-01-11T23:30:09.9040712Z [ RUN ] NVFuserTest.FusionSumViewSchedule_CUDA
2023-01-11T23:30:12.3139628Z [ OK ] NVFuserTest.FusionSumViewSchedule_CUDA (2409 ms)
2023-01-11T23:30:12.3140654Z [ RUN ] NVFuserTest.FusionViewMagicSchedule1_CUDA
2023-01-11T23:30:12.6559171Z [ OK ] NVFuserTest.FusionViewMagicSchedule1_CUDA (341 ms)
2023-01-11T23:30:12.6560162Z [ RUN ] NVFuserTest.FusionViewMagicSchedule2_CUDA
2023-01-11T23:30:13.5389255Z [ OK ] NVFuserTest.FusionViewMagicSchedule2_CUDA (882 ms)
2023-01-11T23:30:13.5390360Z [ RUN ] NVFuserTest.FusionViewMagicSchedule3_CUDA
2023-01-11T23:30:13.8678728Z [ OK ] NVFuserTest.FusionViewMagicSchedule3_CUDA (329 ms)
2023-01-11T23:30:13.8679788Z [ RUN ] NVFuserTest.FusionViewMagicSchedule4_CUDA
2023-01-11T23:30:14.2166461Z [ OK ] NVFuserTest.FusionViewMagicSchedule4_CUDA (348 ms)
2023-01-11T23:30:14.2166995Z [ RUN ] NVFuserTest.FusionViewMagicSchedule5_CUDA
2023-01-11T23:30:14.6455954Z [ OK ] NVFuserTest.FusionViewMagicSchedule5_CUDA (428 ms)
2023-01-11T23:30:14.6456983Z [ RUN ] NVFuserTest.FusionViewMapping_CUDA
2023-01-11T23:30:15.0560739Z [ OK ] NVFuserTest.FusionViewMapping_CUDA (410 ms)
2023-01-11T23:30:15.0561606Z [ RUN ] NVFuserTest.FusionLowerDivisibleSplits_CUDA
2023-01-11T23:30:15.0592090Z [ OK ] NVFuserTest.FusionLowerDivisibleSplits_CUDA (3 ms)
2023-01-11T23:30:15.0592621Z [ RUN ] NVFuserTest.FusionTranspose1_CUDA
2023-01-11T23:30:15.2301800Z [ OK ] NVFuserTest.FusionTranspose1_CUDA (170 ms)
2023-01-11T23:30:15.2302275Z [ RUN ] NVFuserTest.FusionTranspose2_CUDA
2023-01-11T23:30:15.4039871Z [ OK ] NVFuserTest.FusionTranspose2_CUDA (173 ms)
2023-01-11T23:30:15.4040852Z [ RUN ] NVFuserTest.FusionTransposeWithSwizzle_CUDA
2023-01-11T23:30:15.5812919Z [ OK ] NVFuserTest.FusionTransposeWithSwizzle_CUDA (177 ms)
2023-01-11T23:30:15.5814052Z [ RUN ] NVFuserTest.FusionTransposeWithSwizzle1DThreadBlock_CUDA
2023-01-11T23:30:15.7780691Z [ OK ] NVFuserTest.FusionTransposeWithSwizzle1DThreadBlock_CUDA (196 ms)
2023-01-11T23:30:15.7781735Z [ RUN ] NVFuserTest.FusionScheduleTransposeSimple_CUDA
2023-01-11T23:30:16.4472624Z [ OK ] NVFuserTest.FusionScheduleTransposeSimple_CUDA (669 ms)
2023-01-11T23:30:16.4473283Z [ RUN ] NVFuserTest.FusionScheduleTransposeSinTransposeCos_CUDA
2023-01-11T23:30:17.1689992Z [ OK ] NVFuserTest.FusionScheduleTransposeSinTransposeCos_CUDA (721 ms)
2023-01-11T23:30:17.1690591Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleInput_CUDA
2023-01-11T23:30:17.8590808Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleInput_CUDA (689 ms)
2023-01-11T23:30:17.8591898Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleOutput_CUDA
2023-01-11T23:30:18.8833261Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleOutput_CUDA (1024 ms)
2023-01-11T23:30:18.8833924Z [ RUN ] NVFuserTest.FusionScheduleTransposeMultipleInputOutput_CUDA
2023-01-11T23:30:19.8207664Z [ OK ] NVFuserTest.FusionScheduleTransposeMultipleInputOutput_CUDA (937 ms)
2023-01-11T23:30:19.8208223Z [ RUN ] NVFuserTest.FusionScheduleTransposeMatchingSkipConnection_CUDA
2023-01-11T23:30:20.5547039Z [ OK ] NVFuserTest.FusionScheduleTransposeMatchingSkipConnection_CUDA (733 ms)
2023-01-11T23:30:20.5547955Z [ RUN ] NVFuserTest.FusionScheduleTransposeBroadcast_CUDA
2023-01-11T23:30:20.9181647Z [ OK ] NVFuserTest.FusionScheduleTransposeBroadcast_CUDA (363 ms)
2023-01-11T23:30:20.9182754Z [ RUN ] NVFuserTest.FusionScheduleTransposeNoReference_CUDA
2023-01-11T23:30:20.9213568Z [ OK ] NVFuserTest.FusionScheduleTransposeNoReference_CUDA (3 ms)
2023-01-11T23:30:20.9214150Z [ RUN ] NVFuserTest.FusionScheduleBroadcastOnly_CUDA
2023-01-11T23:30:22.2719541Z [ OK ] NVFuserTest.FusionScheduleBroadcastOnly_CUDA (1350 ms)
2023-01-11T23:30:22.2720666Z [ RUN ] NVFuserTest.FusionScheduleTransposeComplexDAG1_CUDA
2023-01-11T23:30:23.1530448Z [ OK ] NVFuserTest.FusionScheduleTransposeComplexDAG1_CUDA (881 ms)
2023-01-11T23:30:23.1531056Z [ RUN ] NVFuserTest.FusionManualScheduleTransposeComplexDAG1_CUDA
2023-01-11T23:30:24.0074375Z [ OK ] NVFuserTest.FusionManualScheduleTransposeComplexDAG1_CUDA (854 ms)
2023-01-11T23:30:24.0074957Z [ RUN ] NVFuserTest.FusionViewNoTranspose_CUDA
2023-01-11T23:30:24.0075752Z [ OK ] NVFuserTest.FusionViewNoTranspose_CUDA (0 ms)
2023-01-11T23:30:24.0076376Z [ RUN ] NVFuserTest.FusionTransposeSelfMapping_CUDA
2023-01-11T23:30:24.3715061Z [ OK ] NVFuserTest.FusionTransposeSelfMapping_CUDA (363 ms)
2023-01-11T23:30:24.3715749Z [ RUN ] NVFuserTest.FusionScheduleTransposeMissingDim_CUDA
2023-01-11T23:30:24.7850875Z [ OK ] NVFuserTest.FusionScheduleTransposeMissingDim_CUDA (413 ms)
2023-01-11T23:30:24.7851349Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmall_CUDA
2023-01-11T23:30:25.5209506Z [ OK ] NVFuserTest.FusionScheduleTransposeSmall_CUDA (735 ms)
2023-01-11T23:30:25.5210045Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize1_CUDA
2023-01-11T23:30:26.2922853Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize1_CUDA (771 ms)
2023-01-11T23:30:26.2923372Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize2_CUDA
2023-01-11T23:30:27.0790826Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize2_CUDA (786 ms)
2023-01-11T23:30:27.0791791Z [ RUN ] NVFuserTest.FusionScheduleTransposeSmallInnerSize3_CUDA
2023-01-11T23:30:28.1598076Z [ OK ] NVFuserTest.FusionScheduleTransposeSmallInnerSize3_CUDA (1081 ms)
2023-01-11T23:30:28.1598616Z [ RUN ] NVFuserTest.FusionScheduleTranspose2DSmallInnerSize_CUDA
2023-01-11T23:30:29.5287948Z [ OK ] NVFuserTest.FusionScheduleTranspose2DSmallInnerSize_CUDA (1368 ms)
2023-01-11T23:30:29.5288836Z [ RUN ] NVFuserTest.FusionTransposeBankConflict1_CUDA
2023-01-11T23:30:29.5308963Z [ OK ] NVFuserTest.FusionTransposeBankConflict1_CUDA (2 ms)
2023-01-11T23:30:29.5310105Z [ RUN ] NVFuserTest.FusionTransposeBankConflict2_CUDA
2023-01-11T23:30:29.5321110Z [ OK ] NVFuserTest.FusionTransposeBankConflict2_CUDA (1 ms)
2023-01-11T23:30:29.5321569Z [ RUN ] NVFuserTest.FusionTransposeBankConflict3_CUDA
2023-01-11T23:30:29.5335635Z [ OK ] NVFuserTest.FusionTransposeBankConflict3_CUDA (1 ms)
2023-01-11T23:30:29.5336067Z [ RUN ] NVFuserTest.FusionTransposeBankConflict4_CUDA
2023-01-11T23:30:29.5376594Z [ OK ] NVFuserTest.FusionTransposeBankConflict4_CUDA (3 ms)
2023-01-11T23:30:29.5377224Z [ RUN ] NVFuserTest.FusionTransposeBankConflict5_CUDA
2023-01-11T23:30:29.5393514Z [ OK ] NVFuserTest.FusionTransposeBankConflict5_CUDA (1 ms)
2023-01-11T23:30:29.5393943Z [ RUN ] NVFuserTest.FusionTransposeBankConflict6_CUDA
2023-01-11T23:30:29.5409954Z [ OK ] NVFuserTest.FusionTransposeBankConflict6_CUDA (1 ms)
2023-01-11T23:30:29.5410396Z [ RUN ] NVFuserTest.FusionTransposeBankConflict7_CUDA
2023-01-11T23:30:29.5428211Z [ OK ] NVFuserTest.FusionTransposeBankConflict7_CUDA (1 ms)
2023-01-11T23:30:29.5428994Z [ RUN ] NVFuserTest.FusionTransposeBankConflict8_CUDA
2023-01-11T23:30:29.5443819Z [ OK ] NVFuserTest.FusionTransposeBankConflict8_CUDA (1 ms)
2023-01-11T23:30:29.5444253Z [ RUN ] NVFuserTest.FusionRNGValidateWithCURand_CUDA
2023-01-11T23:30:29.9306960Z [ OK ] NVFuserTest.FusionRNGValidateWithCURand_CUDA (385 ms)
2023-01-11T23:30:29.9307441Z [ RUN ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand_CUDA
2023-01-11T23:30:30.2319284Z [ OK ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand_CUDA (301 ms)
2023-01-11T23:30:30.2319805Z [ RUN ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand2_CUDA
2023-01-11T23:30:30.4429286Z [ OK ] NVFuserTest.FusionRNGManualScheduleValidateWithCURand2_CUDA (210 ms)
2023-01-11T23:30:30.4430353Z [ RUN ] NVFuserTest.FusionBroadcastingRNG_CUDA
2023-01-11T23:30:30.8479402Z [ OK ] NVFuserTest.FusionBroadcastingRNG_CUDA (405 ms)
2023-01-11T23:30:30.8479806Z [ RUN ] NVFuserTest.FusionBroadcastingRNG2_CUDA
2023-01-11T23:30:33.5678722Z [ OK ] NVFuserTest.FusionBroadcastingRNG2_CUDA (2719 ms)
2023-01-11T23:30:33.5679130Z [ RUN ] NVFuserTest.FusionBroadcastingRNGSmem_CUDA
2023-01-11T23:30:34.7446120Z [ OK ] NVFuserTest.FusionBroadcastingRNGSmem_CUDA (1176 ms)
2023-01-11T23:30:34.7446610Z [ RUN ] NVFuserTest.FusionBroadcastingRNGSmemNonSquareTile_CUDA
2023-01-11T23:30:34.9995220Z [ OK ] NVFuserTest.FusionBroadcastingRNGSmemNonSquareTile_CUDA (254 ms)
2023-01-11T23:30:34.9996084Z [ RUN ] NVFuserTest.FusionUniform_CUDA
2023-01-11T23:30:35.3880307Z [ OK ] NVFuserTest.FusionUniform_CUDA (388 ms)
2023-01-11T23:30:35.3880758Z [ RUN ] NVFuserTest.FusionRandLikeReduction_CUDA
2023-01-11T23:30:35.5988278Z [ OK ] NVFuserTest.FusionRandLikeReduction_CUDA (210 ms)
2023-01-11T23:30:35.5988801Z [ RUN ] NVFuserTest.FusionSplitDims_CUDA
2023-01-11T23:30:35.5994450Z [ OK ] NVFuserTest.FusionSplitDims_CUDA (0 ms)
2023-01-11T23:30:35.5994974Z [ RUN ] NVFuserTest.FusionMergeDims_CUDA
2023-01-11T23:30:35.5995920Z [ OK ] NVFuserTest.FusionMergeDims_CUDA (0 ms)
2023-01-11T23:30:35.5996964Z [ RUN ] NVFuserTest.FusionReorderAsRFactor_CUDA
2023-01-11T23:30:35.5997519Z [ OK ] NVFuserTest.FusionReorderAsRFactor_CUDA (0 ms)
2023-01-11T23:30:35.5997931Z [ RUN ] NVFuserTest.FusionDisjointViewSet_CUDA
2023-01-11T23:30:35.5998305Z [ OK ] NVFuserTest.FusionDisjointViewSet_CUDA (0 ms)
2023-01-11T23:30:35.5998677Z [ RUN ] NVFuserTest.FusionMatchingViews_CUDA
2023-01-11T23:30:35.5999115Z [ OK ] NVFuserTest.FusionMatchingViews_CUDA (0 ms)
2023-01-11T23:30:35.5999612Z [ RUN ] NVFuserTest.FusionBroadcastViewMultiples_CUDA
2023-01-11T23:30:35.6013114Z [ OK ] NVFuserTest.FusionBroadcastViewMultiples_CUDA (1 ms)
2023-01-11T23:30:35.6013689Z [ RUN ] NVFuserTest.FusionTVDomainGuard_CUDA
2023-01-11T23:30:35.6014175Z [ OK ] NVFuserTest.FusionTVDomainGuard_CUDA (0 ms)
2023-01-11T23:30:35.6014868Z [----------] 759 tests from NVFuserTest (841041 ms total)
2023-01-11T23:30:35.6015091Z
2023-01-11T23:30:35.6015587Z [----------] 2 tests from NVFuserMultithreadedTest
2023-01-11T23:30:35.6016150Z [ RUN ] NVFuserMultithreadedTest.SingleFunction_CUDA
2023-01-11T23:30:35.8954138Z [ OK ] NVFuserMultithreadedTest.SingleFunction_CUDA (293 ms)
2023-01-11T23:30:35.8955046Z [ RUN ] NVFuserMultithreadedTest.MultipleFunctions_CUDA
2023-01-11T23:30:35.9088787Z [ OK ] NVFuserMultithreadedTest.MultipleFunctions_CUDA (13 ms)
2023-01-11T23:30:35.9090041Z [----------] 2 tests from NVFuserMultithreadedTest (307 ms total)
2023-01-11T23:30:35.9090493Z
2023-01-11T23:30:35.9091072Z [----------] 12 tests from AliasAnalysisTest/BatchAndInstanceNormFixture
2023-01-11T23:30:35.9092364Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/0
2023-01-11T23:30:35.9093737Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/0 (0 ms)
2023-01-11T23:30:35.9095352Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/1
2023-01-11T23:30:35.9096476Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/1 (0 ms)
2023-01-11T23:30:35.9097262Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/2
2023-01-11T23:30:35.9097796Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/2 (0 ms)
2023-01-11T23:30:35.9098282Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/3
2023-01-11T23:30:35.9098772Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNorm/3 (0 ms)
2023-01-11T23:30:35.9099579Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/0
2023-01-11T23:30:35.9100171Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/0 (0 ms)
2023-01-11T23:30:35.9100768Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/1
2023-01-11T23:30:35.9101363Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/1 (0 ms)
2023-01-11T23:30:35.9101946Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/2
2023-01-11T23:30:35.9102542Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/2 (0 ms)
2023-01-11T23:30:35.9103131Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/3
2023-01-11T23:30:35.9103726Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchAndInstanceNormTrainingUnknown/3 (0 ms)
2023-01-11T23:30:35.9104301Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/0
2023-01-11T23:30:35.9104877Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/0 (0 ms)
2023-01-11T23:30:35.9105447Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/1
2023-01-11T23:30:35.9106020Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/1 (0 ms)
2023-01-11T23:30:35.9106577Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/2
2023-01-11T23:30:35.9107155Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/2 (0 ms)
2023-01-11T23:30:35.9107762Z [ RUN ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/3
2023-01-11T23:30:35.9108338Z [ OK ] AliasAnalysisTest/BatchAndInstanceNormFixture.BatchNormTrainingWithNoMeanOrVar/3 (0 ms)
2023-01-11T23:30:35.9108841Z [----------] 12 tests from AliasAnalysisTest/BatchAndInstanceNormFixture (0 ms total)
2023-01-11T23:30:35.9109109Z
2023-01-11T23:30:35.9109357Z [----------] 10 tests from PyTorch/LiteInterpreterDynamicTypeTestFixture
2023-01-11T23:30:35.9109816Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/0
2023-01-11T23:30:36.8078668Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/0 (897 ms)
2023-01-11T23:30:36.8079370Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/1
2023-01-11T23:30:38.0435500Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/1 (1235 ms)
2023-01-11T23:30:38.0436502Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/2
2023-01-11T23:30:39.5062956Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/2 (1462 ms)
2023-01-11T23:30:39.5063451Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/3
2023-01-11T23:30:40.9623399Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/3 (1456 ms)
2023-01-11T23:30:40.9624627Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/4
2023-01-11T23:30:42.3999487Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/4 (1437 ms)
2023-01-11T23:30:42.4000118Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/5
2023-01-11T23:30:43.8749728Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/5 (1474 ms)
2023-01-11T23:30:43.8750420Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/6
2023-01-11T23:30:45.3362963Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/6 (1461 ms)
2023-01-11T23:30:45.3363804Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/7
2023-01-11T23:30:46.7752326Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/7 (1438 ms)
2023-01-11T23:30:46.7753370Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/8
2023-01-11T23:30:48.1138870Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/8 (1338 ms)
2023-01-11T23:30:48.1139876Z [ RUN ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/9
2023-01-11T23:30:49.4522468Z [ OK ] PyTorch/LiteInterpreterDynamicTypeTestFixture.Conformance/9 (1338 ms)
2023-01-11T23:30:49.4523141Z [----------] 10 tests from PyTorch/LiteInterpreterDynamicTypeTestFixture (13542 ms total)
2023-01-11T23:30:49.4523373Z
2023-01-11T23:30:49.4523547Z [----------] Global test environment tear-down
2023-01-11T23:30:49.4698239Z [==========] 1340 tests from 122 test suites ran. (858993 ms total)
2023-01-11T23:30:49.4698861Z [ PASSED ] 1340 tests.
2023-01-11T23:30:50.0729308Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 == *cuda* ]]
2023-01-11T23:30:50.0729703Z + [[ default != *nogpu* ]]
2023-01-11T23:30:50.0729970Z + LTC_TS_CUDA=1
2023-01-11T23:30:50.0730439Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_lazy --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_lazy.xml
2023-01-11T23:30:50.4864408Z Only one CUDA device detected. Disabling MultiCUDA tests
2023-01-11T23:30:50.4866447Z Note: Google Test filter = *-*_MultiCUDA
2023-01-11T23:30:50.4866809Z [==========] Running 611 tests from 10 test suites.
2023-01-11T23:30:50.4867248Z [----------] Global test environment set-up.
2023-01-11T23:30:50.4867634Z [----------] 11 tests from BackendDeviceTest
2023-01-11T23:30:50.4868085Z [ RUN ] BackendDeviceTest.BackendDeviceType
2023-01-11T23:30:50.4868511Z [ OK ] BackendDeviceTest.BackendDeviceType (0 ms)
2023-01-11T23:30:50.4868945Z [ RUN ] BackendDeviceTest.Basic1
2023-01-11T23:30:50.4869405Z [ OK ] BackendDeviceTest.Basic1 (0 ms)
2023-01-11T23:30:50.4869799Z [ RUN ] BackendDeviceTest.Basic2
2023-01-11T23:30:50.4870358Z [ OK ] BackendDeviceTest.Basic2 (0 ms)
2023-01-11T23:30:50.4870716Z [ RUN ] BackendDeviceTest.Basic3
2023-01-11T23:30:50.4871061Z [ OK ] BackendDeviceTest.Basic3 (0 ms)
2023-01-11T23:30:50.4871555Z [ RUN ] BackendDeviceTest.Basic4
2023-01-11T23:30:50.4871895Z [ OK ] BackendDeviceTest.Basic4 (0 ms)
2023-01-11T23:30:50.4872341Z [ RUN ] BackendDeviceTest.Compare
2023-01-11T23:30:50.4872760Z [ OK ] BackendDeviceTest.Compare (0 ms)
2023-01-11T23:30:50.4873069Z [ RUN ] BackendDeviceTest.Ostream
2023-01-11T23:30:50.4873387Z [ OK ] BackendDeviceTest.Ostream (0 ms)
2023-01-11T23:30:50.4873713Z [ RUN ] BackendDeviceTest.FromAten
2023-01-11T23:30:50.4879405Z [ OK ] BackendDeviceTest.FromAten (1 ms)
2023-01-11T23:30:50.4880092Z [ RUN ] BackendDeviceTest.ToAten
2023-01-11T23:30:50.4880531Z [ OK ] BackendDeviceTest.ToAten (0 ms)
2023-01-11T23:30:50.4880902Z [ RUN ] BackendDeviceTest.GetBackendDevice1
2023-01-11T23:30:50.4886405Z [ OK ] BackendDeviceTest.GetBackendDevice1 (0 ms)
2023-01-11T23:30:50.4886930Z [ RUN ] BackendDeviceTest.GetBackendDevice2
2023-01-11T23:30:50.4887438Z [ OK ] BackendDeviceTest.GetBackendDevice2 (0 ms)
2023-01-11T23:30:50.4887932Z [----------] 11 tests from BackendDeviceTest (1 ms total)
2023-01-11T23:30:50.4888157Z
2023-01-11T23:30:50.4888309Z [----------] 2 tests from CacheTest
2023-01-11T23:30:50.4888582Z [ RUN ] CacheTest.BasicTest
2023-01-11T23:30:50.4888875Z [ OK ] CacheTest.BasicTest (0 ms)
2023-01-11T23:30:50.4889356Z [ RUN ] CacheTest.ShapeCacheTestForDynamicShape
2023-01-11T23:30:50.4889744Z [ OK ] CacheTest.ShapeCacheTestForDynamicShape (0 ms)
2023-01-11T23:30:50.4890108Z [----------] 2 tests from CacheTest (0 ms total)
2023-01-11T23:30:50.4890267Z
2023-01-11T23:30:50.4890407Z [----------] 5 tests from IrTest
2023-01-11T23:30:50.4890673Z [ RUN ] IrTest.BasicTest
2023-01-11T23:30:50.4890943Z [ OK ] IrTest.BasicTest (0 ms)
2023-01-11T23:30:50.4891223Z [ RUN ] IrTest.MetaDataTest
2023-01-11T23:30:50.4891513Z [ OK ] IrTest.MetaDataTest (0 ms)
2023-01-11T23:30:50.4891795Z [ RUN ] IrTest.TsNodeTest
2023-01-11T23:30:50.4892078Z [ OK ] IrTest.TsNodeTest (0 ms)
2023-01-11T23:30:50.4892370Z [ RUN ] IrTest.DimensionNodeTest
2023-01-11T23:30:50.4892671Z [ OK ] IrTest.DimensionNodeTest (0 ms)
2023-01-11T23:30:50.4892993Z [ RUN ] IrTest.DimensionIsDynamicTest
2023-01-11T23:30:50.4893329Z [ OK ] IrTest.DimensionIsDynamicTest (0 ms)
2023-01-11T23:30:50.4893645Z [----------] 5 tests from IrTest (0 ms total)
2023-01-11T23:30:50.4893795Z
2023-01-11T23:30:50.4893945Z [----------] 2 tests from IrUtilTest
2023-01-11T23:30:50.4894225Z [ RUN ] IrUtilTest.BasicTest
2023-01-11T23:30:50.4894776Z [ OK ] IrUtilTest.BasicTest (0 ms)
2023-01-11T23:30:50.4895202Z [ RUN ] IrUtilTest.TestCircle
2023-01-11T23:30:50.4909580Z [ OK ] IrUtilTest.TestCircle (2 ms)
2023-01-11T23:30:50.4910046Z [----------] 2 tests from IrUtilTest (2 ms total)
2023-01-11T23:30:50.4910211Z
2023-01-11T23:30:50.4912102Z [----------] 2 tests from HashTest
2023-01-11T23:30:50.4912552Z [ RUN ] HashTest.Scalar
2023-01-11T23:30:50.4912951Z [ OK ] HashTest.Scalar (0 ms)
2023-01-11T23:30:50.4913323Z [ RUN ] HashTest.Sanity
2023-01-11T23:30:50.4913668Z [ OK ] HashTest.Sanity (0 ms)
2023-01-11T23:30:50.4913977Z [----------] 2 tests from HashTest (0 ms total)
2023-01-11T23:30:50.4914134Z
2023-01-11T23:30:50.4914308Z [----------] 3 tests from PermutationUtilTest
2023-01-11T23:30:50.4914758Z [ RUN ] PermutationUtilTest.TestInversePermutation
2023-01-11T23:30:50.4931149Z [ OK ] PermutationUtilTest.TestInversePermutation (2 ms)
2023-01-11T23:30:50.4931986Z [ RUN ] PermutationUtilTest.TestIsPermutation
2023-01-11T23:30:50.4932744Z [ OK ] PermutationUtilTest.TestIsPermutation (0 ms)
2023-01-11T23:30:50.4933449Z [ RUN ] PermutationUtilTest.TestPermute
2023-01-11T23:30:50.4949545Z [ OK ] PermutationUtilTest.TestPermute (1 ms)
2023-01-11T23:30:50.4950123Z [----------] 3 tests from PermutationUtilTest (3 ms total)
2023-01-11T23:30:50.4951908Z
2023-01-11T23:30:50.4952525Z [----------] 7 tests from ShapeTest
2023-01-11T23:30:50.4952915Z [ RUN ] ShapeTest.Basic1
2023-01-11T23:30:50.4953315Z [ OK ] ShapeTest.Basic1 (0 ms)
2023-01-11T23:30:50.4953694Z [ RUN ] ShapeTest.Basic2
2023-01-11T23:30:50.4954070Z [ OK ] ShapeTest.Basic2 (0 ms)
2023-01-11T23:30:50.4954437Z [ RUN ] ShapeTest.Basic3
2023-01-11T23:30:50.4954813Z [ OK ] ShapeTest.Basic3 (0 ms)
2023-01-11T23:30:50.4955197Z [ RUN ] ShapeTest.SetScalarType
2023-01-11T23:30:50.4955607Z [ OK ] ShapeTest.SetScalarType (0 ms)
2023-01-11T23:30:50.4956007Z [ RUN ] ShapeTest.SetSize
2023-01-11T23:30:50.4956335Z [ OK ] ShapeTest.SetSize (0 ms)
2023-01-11T23:30:50.4956600Z [ RUN ] ShapeTest.Equal
2023-01-11T23:30:50.4956879Z [ OK ] ShapeTest.Equal (0 ms)
2023-01-11T23:30:50.4957155Z [ RUN ] ShapeTest.Ostream
2023-01-11T23:30:50.4957432Z [ OK ] ShapeTest.Ostream (0 ms)
2023-01-11T23:30:50.4957850Z [----------] 7 tests from ShapeTest (0 ms total)
2023-01-11T23:30:50.4958011Z
2023-01-11T23:30:50.4958168Z [----------] 2 tests from TrieCacheTest
2023-01-11T23:30:50.4958485Z [ RUN ] TrieCacheTest.TestSinglePath
2023-01-11T23:30:50.4958819Z [ OK ] TrieCacheTest.TestSinglePath (0 ms)
2023-01-11T23:30:50.4959146Z [ RUN ] TrieCacheTest.TestTwoPaths
2023-01-11T23:30:50.4959472Z [ OK ] TrieCacheTest.TestTwoPaths (0 ms)
2023-01-11T23:30:50.4959808Z [----------] 2 tests from TrieCacheTest (0 ms total)
2023-01-11T23:30:50.4959976Z
2023-01-11T23:30:50.4960122Z [----------] 3 tests from UtilTest
2023-01-11T23:30:50.4960421Z [ RUN ] UtilTest.ExceptionCleanup
2023-01-11T23:30:50.4960736Z [ OK ] UtilTest.ExceptionCleanup (0 ms)
2023-01-11T23:30:50.4961037Z [ RUN ] UtilTest.MaybeRef
2023-01-11T23:30:50.4961367Z [ OK ] UtilTest.MaybeRef (0 ms)
2023-01-11T23:30:50.4961637Z [ RUN ] UtilTest.Iota
2023-01-11T23:30:50.4961900Z [ OK ] UtilTest.Iota (0 ms)
2023-01-11T23:30:50.4962204Z [----------] 3 tests from UtilTest (0 ms total)
2023-01-11T23:30:50.4962362Z
2023-01-11T23:30:50.4962518Z [----------] 574 tests from LazyOpsTest
2023-01-11T23:30:50.4962819Z [ RUN ] LazyOpsTest.TestScalarTensor
2023-01-11T23:30:51.3512088Z [ OK ] LazyOpsTest.TestScalarTensor (855 ms)
2023-01-11T23:30:51.3512543Z [ RUN ] LazyOpsTest.TestClone
2023-01-11T23:30:51.3523253Z [ OK ] LazyOpsTest.TestClone (1 ms)
2023-01-11T23:30:51.3523570Z [ RUN ] LazyOpsTest.TestTo
2023-01-11T23:30:51.3523938Z [ OK ] LazyOpsTest.TestTo (0 ms)
2023-01-11T23:30:51.3524406Z [ RUN ] LazyOpsTest.TestIsFloatingPoint
2023-01-11T23:30:51.3524871Z [ OK ] LazyOpsTest.TestIsFloatingPoint (0 ms)
2023-01-11T23:30:51.3525214Z [ RUN ] LazyOpsTest.TestIsSigned
2023-01-11T23:30:51.3525532Z [ OK ] LazyOpsTest.TestIsSigned (0 ms)
2023-01-11T23:30:51.3525843Z [ RUN ] LazyOpsTest.TestCastByte
2023-01-11T23:30:51.3557914Z [ OK ] LazyOpsTest.TestCastByte (3 ms)
2023-01-11T23:30:51.3558703Z [ RUN ] LazyOpsTest.TestCastChar
2023-01-11T23:30:51.3560723Z [ OK ] LazyOpsTest.TestCastChar (0 ms)
2023-01-11T23:30:51.3561387Z [ RUN ] LazyOpsTest.TestCastShort
2023-01-11T23:30:51.3563580Z [ OK ] LazyOpsTest.TestCastShort (0 ms)
2023-01-11T23:30:51.3563904Z [ RUN ] LazyOpsTest.TestCastInt
2023-01-11T23:30:51.3567787Z [ OK ] LazyOpsTest.TestCastInt (0 ms)
2023-01-11T23:30:51.3568115Z [ RUN ] LazyOpsTest.TestCastLong
2023-01-11T23:30:51.3570948Z [ OK ] LazyOpsTest.TestCastLong (0 ms)
2023-01-11T23:30:51.3571441Z [ RUN ] LazyOpsTest.TestCastFloat
2023-01-11T23:30:51.3571885Z [ OK ] LazyOpsTest.TestCastFloat (0 ms)
2023-01-11T23:30:51.3572204Z [ RUN ] LazyOpsTest.TestRetainType
2023-01-11T23:30:51.3574164Z [ OK ] LazyOpsTest.TestRetainType (0 ms)
2023-01-11T23:30:51.3576116Z [ RUN ] LazyOpsTest.TestLogicalTypeWithInterop
2023-01-11T23:30:51.3580533Z [ OK ] LazyOpsTest.TestLogicalTypeWithInterop (0 ms)
2023-01-11T23:30:51.3592304Z [ RUN ] LazyOpsTest.TestAdd
2023-01-11T23:30:51.3592750Z [ OK ] LazyOpsTest.TestAdd (0 ms)
2023-01-11T23:30:51.3593150Z [ RUN ] LazyOpsTest.TestAddHalf
2023-01-11T23:30:51.3593551Z [ OK ] LazyOpsTest.TestAddHalf (0 ms)
2023-01-11T23:30:51.3593998Z [ RUN ] LazyOpsTest.TestAddMixedPrecision
2023-01-11T23:30:51.3594851Z [ OK ] LazyOpsTest.TestAddMixedPrecision (0 ms)
2023-01-11T23:30:51.3595297Z [ RUN ] LazyOpsTest.TestAddInPlace
2023-01-11T23:30:51.3600779Z [ OK ] LazyOpsTest.TestAddInPlace (0 ms)
2023-01-11T23:30:51.3601193Z [ RUN ] LazyOpsTest.TestAddScalar
2023-01-11T23:30:51.3604677Z [ OK ] LazyOpsTest.TestAddScalar (0 ms)
2023-01-11T23:30:51.3605066Z [ RUN ] LazyOpsTest.TestAddScalarInPlace
2023-01-11T23:30:51.3610474Z [ OK ] LazyOpsTest.TestAddScalarInPlace (0 ms)
2023-01-11T23:30:51.3611047Z [ RUN ] LazyOpsTest.TestAddZeroSizeDim
2023-01-11T23:30:51.3613600Z [ OK ] LazyOpsTest.TestAddZeroSizeDim (0 ms)
2023-01-11T23:30:51.3613929Z [ RUN ] LazyOpsTest.TestSub
2023-01-11T23:30:51.3618256Z [ OK ] LazyOpsTest.TestSub (0 ms)
2023-01-11T23:30:51.3618568Z [ RUN ] LazyOpsTest.TestSubInPlace
2023-01-11T23:30:51.3624201Z [ OK ]
LazyOpsTest.TestSubInPlace (0 ms) 2023-01-11T23:30:51.3624535Z [ RUN ] LazyOpsTest.TestSubScalar 2023-01-11T23:30:51.3628154Z [ OK ] LazyOpsTest.TestSubScalar (0 ms) 2023-01-11T23:30:51.3628508Z [ RUN ] LazyOpsTest.TestSubScalarInPlace 2023-01-11T23:30:51.3634028Z [ OK ] LazyOpsTest.TestSubScalarInPlace (0 ms) 2023-01-11T23:30:51.3634361Z [ RUN ] LazyOpsTest.TestMul 2023-01-11T23:30:51.3638245Z [ OK ] LazyOpsTest.TestMul (0 ms) 2023-01-11T23:30:51.3638573Z [ RUN ] LazyOpsTest.TestMulInPlace 2023-01-11T23:30:51.3644199Z [ OK ] LazyOpsTest.TestMulInPlace (0 ms) 2023-01-11T23:30:51.3644528Z [ RUN ] LazyOpsTest.TestMulScalar 2023-01-11T23:30:51.3648192Z [ OK ] LazyOpsTest.TestMulScalar (0 ms) 2023-01-11T23:30:51.3648532Z [ RUN ] LazyOpsTest.TestMulScalarInPlace 2023-01-11T23:30:51.3653899Z [ OK ] LazyOpsTest.TestMulScalarInPlace (0 ms) 2023-01-11T23:30:51.3654244Z [ RUN ] LazyOpsTest.TestDiv 2023-01-11T23:30:51.3778463Z [ OK ] LazyOpsTest.TestDiv (12 ms) 2023-01-11T23:30:51.3779738Z [ RUN ] LazyOpsTest.TestDivWithRoundingMode 2023-01-11T23:30:51.4181553Z [ OK ] LazyOpsTest.TestDivWithRoundingMode (40 ms) 2023-01-11T23:30:51.4181930Z [ RUN ] LazyOpsTest.TestDivInPlace 2023-01-11T23:30:51.4184297Z [ OK ] LazyOpsTest.TestDivInPlace (0 ms) 2023-01-11T23:30:51.4184892Z [ RUN ] LazyOpsTest.TestDivInPlaceWithRoundingMode 2023-01-11T23:30:51.4197815Z [ OK ] LazyOpsTest.TestDivInPlaceWithRoundingMode (1 ms) 2023-01-11T23:30:51.4198371Z [ RUN ] LazyOpsTest.TestDivScalar 2023-01-11T23:30:51.4244116Z [ OK ] LazyOpsTest.TestDivScalar (4 ms) 2023-01-11T23:30:51.4244636Z [ RUN ] LazyOpsTest.TestDivScalarInPlace 2023-01-11T23:30:51.4252350Z [ OK ] LazyOpsTest.TestDivScalarInPlace (0 ms) 2023-01-11T23:30:51.4252829Z [ RUN ] LazyOpsTest.TestDivOut 2023-01-11T23:30:51.4258734Z [ OK ] LazyOpsTest.TestDivOut (0 ms) 2023-01-11T23:30:51.4259178Z [ RUN ] LazyOpsTest.TestRsubScalar 2023-01-11T23:30:51.4263109Z [ OK ] LazyOpsTest.TestRsubScalar (0 ms) 2023-01-11T23:30:51.4263522Z [ RUN ] LazyOpsTest.TestNe 2023-01-11T23:30:51.4267144Z [ OK ] LazyOpsTest.TestNe (0 ms) 2023-01-11T23:30:51.4267561Z [ RUN ] LazyOpsTest.TestNeInplace 2023-01-11T23:30:51.4274828Z [ OK ] LazyOpsTest.TestNeInplace (0 ms) 2023-01-11T23:30:51.4275491Z [ RUN ] LazyOpsTest.TestEq 2023-01-11T23:30:51.4277102Z [ OK ] LazyOpsTest.TestEq (0 ms) 2023-01-11T23:30:51.4277853Z [ RUN ] LazyOpsTest.TestEqInplace 2023-01-11T23:30:51.4283264Z [ OK ] LazyOpsTest.TestEqInplace (0 ms) 2023-01-11T23:30:51.4283684Z [ RUN ] LazyOpsTest.TestGe 2023-01-11T23:30:51.4286572Z [ OK ] LazyOpsTest.TestGe (0 ms) 2023-01-11T23:30:51.4286917Z [ RUN ] LazyOpsTest.TestGeInplace 2023-01-11T23:30:51.4293045Z [ OK ] LazyOpsTest.TestGeInplace (0 ms) 2023-01-11T23:30:51.4293363Z [ RUN ] LazyOpsTest.TestLe 2023-01-11T23:30:51.4296374Z [ OK ] LazyOpsTest.TestLe (0 ms) 2023-01-11T23:30:51.4296702Z [ RUN ] LazyOpsTest.TestLeInplace 2023-01-11T23:30:51.4302009Z [ OK ] LazyOpsTest.TestLeInplace (0 ms) 2023-01-11T23:30:51.4302405Z [ RUN ] LazyOpsTest.TestGt 2023-01-11T23:30:51.4306275Z [ OK ] LazyOpsTest.TestGt (0 ms) 2023-01-11T23:30:51.4306651Z [ RUN ] LazyOpsTest.TestGtInplace 2023-01-11T23:30:51.4312432Z [ OK ] LazyOpsTest.TestGtInplace (0 ms) 2023-01-11T23:30:51.4312850Z [ RUN ] LazyOpsTest.TestLt 2023-01-11T23:30:51.4315851Z [ OK ] LazyOpsTest.TestLt (0 ms) 2023-01-11T23:30:51.4316208Z [ RUN ] LazyOpsTest.TestLtInplace 2023-01-11T23:30:51.4321928Z [ OK ] LazyOpsTest.TestLtInplace (0 ms) 2023-01-11T23:30:51.4322282Z [ RUN ] LazyOpsTest.TestNeScalar 2023-01-11T23:30:51.4325249Z [ OK ] 
LazyOpsTest.TestNeScalar (0 ms) 2023-01-11T23:30:51.4325575Z [ RUN ] LazyOpsTest.TestEqScalar 2023-01-11T23:30:51.4328392Z [ OK ] LazyOpsTest.TestEqScalar (0 ms) 2023-01-11T23:30:51.4328717Z [ RUN ] LazyOpsTest.TestGeScalar 2023-01-11T23:30:51.4331683Z [ OK ] LazyOpsTest.TestGeScalar (0 ms) 2023-01-11T23:30:51.4332053Z [ RUN ] LazyOpsTest.TestGeScalarInplace 2023-01-11T23:30:51.4337191Z [ OK ] LazyOpsTest.TestGeScalarInplace (0 ms) 2023-01-11T23:30:51.4337520Z [ RUN ] LazyOpsTest.TestLeScalar 2023-01-11T23:30:51.4340337Z [ OK ] LazyOpsTest.TestLeScalar (0 ms) 2023-01-11T23:30:51.4340684Z [ RUN ] LazyOpsTest.TestLeScalarInplace 2023-01-11T23:30:51.4345573Z [ OK ] LazyOpsTest.TestLeScalarInplace (0 ms) 2023-01-11T23:30:51.4345939Z [ RUN ] LazyOpsTest.TestGtScalar 2023-01-11T23:30:51.4348819Z [ OK ] LazyOpsTest.TestGtScalar (0 ms) 2023-01-11T23:30:51.4349166Z [ RUN ] LazyOpsTest.TestGtScalarInplace 2023-01-11T23:30:51.4354382Z [ OK ] LazyOpsTest.TestGtScalarInplace (0 ms) 2023-01-11T23:30:51.4354740Z [ RUN ] LazyOpsTest.TestLtScalar 2023-01-11T23:30:51.4357436Z [ OK ] LazyOpsTest.TestLtScalar (0 ms) 2023-01-11T23:30:51.4357783Z [ RUN ] LazyOpsTest.TestLtScalarInplace 2023-01-11T23:30:51.4362307Z [ OK ] LazyOpsTest.TestLtScalarInplace (0 ms) 2023-01-11T23:30:51.4362664Z [ RUN ] LazyOpsTest.TestIntegerAdd 2023-01-11T23:30:51.4379042Z [ OK ] LazyOpsTest.TestIntegerAdd (1 ms) 2023-01-11T23:30:51.4379393Z [ RUN ] LazyOpsTest.TestSVD 2023-01-11T23:30:52.2476706Z [ OK ] LazyOpsTest.TestSVD (809 ms) 2023-01-11T23:30:52.2477507Z [ RUN ] LazyOpsTest.TestQR 2023-01-11T23:30:52.2478385Z [W BatchLinearAlgebra.cpp:2459] Warning: torch.qr is deprecated in favor of torch.linalg.qr and will be removed in a future PyTorch release. 2023-01-11T23:30:52.2479258Z The boolean parameter 'some' has been replaced with a string parameter 'mode'. 2023-01-11T23:30:52.2479957Z Q, R = torch.qr(A, some) 2023-01-11T23:30:52.2480344Z should be replaced with 2023-01-11T23:30:52.2480976Z Q, R = torch.linalg.qr(A, 'reduced' if some else 'complete') (function operator()) 2023-01-11T23:30:52.2509088Z [ OK ] LazyOpsTest.TestQR (3 ms) 2023-01-11T23:30:52.2509810Z [ RUN ] LazyOpsTest.TestSymEig 2023-01-11T23:30:52.2512306Z [W BatchLinearAlgebra.cpp:2910] Warning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release. 2023-01-11T23:30:52.2512877Z The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion. 2023-01-11T23:30:52.2513417Z L, _ = torch.symeig(A, upper=upper) 2023-01-11T23:30:52.2513730Z should be replaced with 2023-01-11T23:30:52.2514175Z L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L') 2023-01-11T23:30:52.2514485Z and 2023-01-11T23:30:52.2514703Z L, V = torch.symeig(A, eigenvectors=True) 2023-01-11T23:30:52.2514939Z should be replaced with 2023-01-11T23:30:52.2515274Z L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (function operator()) 2023-01-11T23:30:52.4682361Z [ OK ] LazyOpsTest.TestSymEig (217 ms) 2023-01-11T23:30:52.4682875Z [ RUN ] LazyOpsTest.TestCholesky 2023-01-11T23:30:52.4683467Z [W BatchLinearAlgebra.cpp:1730] Warning: torch.cholesky is deprecated in favor of torch.linalg.cholesky and will be removed in a future PyTorch release. 
2023-01-11T23:30:52.4683958Z L = torch.cholesky(A) 2023-01-11T23:30:52.4684279Z should be replaced with 2023-01-11T23:30:52.4684603Z L = torch.linalg.cholesky(A) 2023-01-11T23:30:52.4684877Z and 2023-01-11T23:30:52.4685162Z U = torch.cholesky(A, upper=True) 2023-01-11T23:30:52.4685484Z should be replaced with 2023-01-11T23:30:52.4685798Z U = torch.linalg.cholesky(A).mH(). 2023-01-11T23:30:52.4686302Z This transform will produce equivalent results for all valid (symmetric positive definite) inputs. (function operator()) 2023-01-11T23:30:52.4698763Z [ OK ] LazyOpsTest.TestCholesky (1 ms) 2023-01-11T23:30:52.4699186Z [ RUN ] LazyOpsTest.TestLogDet 2023-01-11T23:30:52.4749285Z [ OK ] LazyOpsTest.TestLogDet (4 ms) 2023-01-11T23:30:52.4750363Z [ RUN ] LazyOpsTest.TestTriangularSolve 2023-01-11T23:30:52.4751265Z [W BatchLinearAlgebra.cpp:2225] Warning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release. 2023-01-11T23:30:52.4752068Z torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs. 2023-01-11T23:30:52.4752422Z X = torch.triangular_solve(B, A).solution 2023-01-11T23:30:52.4752653Z should be replaced with 2023-01-11T23:30:52.4752917Z X = torch.linalg.solve_triangular(A, B). (function operator()) 2023-01-11T23:30:52.5260888Z [ OK ] LazyOpsTest.TestTriangularSolve (51 ms) 2023-01-11T23:30:52.5261667Z [ RUN ] LazyOpsTest.TestKthValue 2023-01-11T23:30:52.5343745Z [ OK ] LazyOpsTest.TestKthValue (8 ms) 2023-01-11T23:30:52.5344403Z [ RUN ] LazyOpsTest.TestTopK 2023-01-11T23:30:52.5601981Z [ OK ] LazyOpsTest.TestTopK (25 ms) 2023-01-11T23:30:52.5602342Z [ RUN ] LazyOpsTest.TestSort 2023-01-11T23:30:52.5720786Z [ OK ] LazyOpsTest.TestSort (11 ms) 2023-01-11T23:30:52.5721498Z [ RUN ] LazyOpsTest.TestSortDescWithMinValue 2023-01-11T23:30:52.5726498Z [ OK ] LazyOpsTest.TestSortDescWithMinValue (0 ms) 2023-01-11T23:30:52.5726860Z [ RUN ] LazyOpsTest.TestArgSort 2023-01-11T23:30:52.5763404Z [ OK ] LazyOpsTest.TestArgSort (3 ms) 2023-01-11T23:30:52.5763727Z [ RUN ] LazyOpsTest.TestMin 2023-01-11T23:30:52.5766694Z [ OK ] LazyOpsTest.TestMin (0 ms) 2023-01-11T23:30:52.5767501Z [ RUN ] LazyOpsTest.TestMax 2023-01-11T23:30:52.5770446Z [ OK ] LazyOpsTest.TestMax (0 ms) 2023-01-11T23:30:52.5771025Z [ RUN ] LazyOpsTest.TestUnaryMin 2023-01-11T23:30:52.5774414Z [ OK ] LazyOpsTest.TestUnaryMin (0 ms) 2023-01-11T23:30:52.5775116Z [ RUN ] LazyOpsTest.TestUnaryMax 2023-01-11T23:30:52.5778234Z [ OK ] LazyOpsTest.TestUnaryMax (0 ms) 2023-01-11T23:30:52.5778648Z [ RUN ] LazyOpsTest.TestAll 2023-01-11T23:30:52.5796057Z [ OK ] LazyOpsTest.TestAll (1 ms) 2023-01-11T23:30:52.5796508Z [ RUN ] LazyOpsTest.TestAllDim 2023-01-11T23:30:52.5803645Z [ OK ] LazyOpsTest.TestAllDim (0 ms) 2023-01-11T23:30:52.5804115Z [ RUN ] LazyOpsTest.TestAllDimKeep 2023-01-11T23:30:52.5810686Z [ OK ] LazyOpsTest.TestAllDimKeep (0 ms) 2023-01-11T23:30:52.5811144Z [ RUN ] LazyOpsTest.TestAmax 2023-01-11T23:30:52.5914057Z [ OK ] LazyOpsTest.TestAmax (10 ms) 2023-01-11T23:30:52.5914909Z [ RUN ] LazyOpsTest.TestAmin 2023-01-11T23:30:52.6010967Z [ OK ] LazyOpsTest.TestAmin (9 ms) 2023-01-11T23:30:52.6011409Z [ RUN ] LazyOpsTest.TestAny 2023-01-11T23:30:52.6028417Z [ OK ] LazyOpsTest.TestAny (1 ms) 2023-01-11T23:30:52.6028852Z [ RUN ] LazyOpsTest.TestAnyDim 2023-01-11T23:30:52.6037072Z [ OK ] LazyOpsTest.TestAnyDim (0 ms) 2023-01-11T23:30:52.6037528Z [ RUN ] LazyOpsTest.TestAnyDimKeep 2023-01-11T23:30:52.6044571Z [ OK ]
LazyOpsTest.TestAnyDimKeep (0 ms) 2023-01-11T23:30:52.6045001Z [ RUN ] LazyOpsTest.TestMean 2023-01-11T23:30:52.6048373Z [ OK ] LazyOpsTest.TestMean (0 ms) 2023-01-11T23:30:52.6048809Z [ RUN ] LazyOpsTest.TestMeanCast 2023-01-11T23:30:52.6052236Z [ OK ] LazyOpsTest.TestMeanCast (0 ms) 2023-01-11T23:30:52.6052575Z [ RUN ] LazyOpsTest.TestMeanInDim 2023-01-11T23:30:52.6079257Z [ OK ] LazyOpsTest.TestMeanInDim (2 ms) 2023-01-11T23:30:52.6079959Z [ RUN ] LazyOpsTest.TestMeanInDims 2023-01-11T23:30:52.6084432Z [ OK ] LazyOpsTest.TestMeanInDims (0 ms) 2023-01-11T23:30:52.6084797Z [ RUN ] LazyOpsTest.TestMeanInDimsKeepCast 2023-01-11T23:30:52.6092939Z [ OK ] LazyOpsTest.TestMeanInDimsKeepCast (0 ms) 2023-01-11T23:30:52.6093292Z [ RUN ] LazyOpsTest.TestMeanInDimOut 2023-01-11T23:30:52.6117555Z [ OK ] LazyOpsTest.TestMeanInDimOut (2 ms) 2023-01-11T23:30:52.6117883Z [ RUN ] LazyOpsTest.TestStd 2023-01-11T23:30:52.6125437Z [ OK ] LazyOpsTest.TestStd (0 ms) 2023-01-11T23:30:52.6125749Z [ RUN ] LazyOpsTest.TestStdInDim 2023-01-11T23:30:52.6224568Z [ OK ] LazyOpsTest.TestStdInDim (9 ms) 2023-01-11T23:30:52.6224960Z [ RUN ] LazyOpsTest.TestStdWithCorrection 2023-01-11T23:30:52.6275134Z [ OK ] LazyOpsTest.TestStdWithCorrection (5 ms) 2023-01-11T23:30:52.6275528Z [ RUN ] LazyOpsTest.TestStdMeanWithCorrection 2023-01-11T23:30:52.6308805Z [ OK ] LazyOpsTest.TestStdMeanWithCorrection (3 ms) 2023-01-11T23:30:52.6309142Z [ RUN ] LazyOpsTest.TestSum 2023-01-11T23:30:52.6313676Z [ OK ] LazyOpsTest.TestSum (0 ms) 2023-01-11T23:30:52.6314054Z [ RUN ] LazyOpsTest.TestSumCast 2023-01-11T23:30:52.6316168Z [ OK ] LazyOpsTest.TestSumCast (0 ms) 2023-01-11T23:30:52.6316517Z [ RUN ] LazyOpsTest.TestSumU8 2023-01-11T23:30:52.6319686Z [ OK ] LazyOpsTest.TestSumU8 (0 ms) 2023-01-11T23:30:52.6319997Z [ RUN ] LazyOpsTest.TestSumInDim 2023-01-11T23:30:52.6342787Z [ OK ] LazyOpsTest.TestSumInDim (2 ms) 2023-01-11T23:30:52.6343114Z [ RUN ] LazyOpsTest.TestSumInDims 2023-01-11T23:30:52.6350655Z [ OK ] LazyOpsTest.TestSumInDims (0 ms) 2023-01-11T23:30:52.6351723Z [ RUN ] LazyOpsTest.TestSumInDimsKeep 2023-01-11T23:30:52.6358906Z [ OK ] LazyOpsTest.TestSumInDimsKeep (0 ms) 2023-01-11T23:30:52.6359302Z [ RUN ] LazyOpsTest.TestSumInDimsKeepCast 2023-01-11T23:30:52.6367199Z [ OK ] LazyOpsTest.TestSumInDimsKeepCast (0 ms) 2023-01-11T23:30:52.6367535Z [ RUN ] LazyOpsTest.TestVar 2023-01-11T23:30:52.6370999Z [ OK ] LazyOpsTest.TestVar (0 ms) 2023-01-11T23:30:52.6371316Z [ RUN ] LazyOpsTest.TestVarWithDim 2023-01-11T23:30:52.6385550Z [ OK ] LazyOpsTest.TestVarWithDim (1 ms) 2023-01-11T23:30:52.6385905Z [ RUN ] LazyOpsTest.TestVarWithCorrection 2023-01-11T23:30:52.6406467Z [ OK ] LazyOpsTest.TestVarWithCorrection (2 ms) 2023-01-11T23:30:52.6406845Z [ RUN ] LazyOpsTest.TestVarMeanWithCorrection 2023-01-11T23:30:52.6439458Z [ OK ] LazyOpsTest.TestVarMeanWithCorrection (3 ms) 2023-01-11T23:30:52.6439810Z [ RUN ] LazyOpsTest.TestMaxInDim 2023-01-11T23:30:52.6521258Z [ OK ] LazyOpsTest.TestMaxInDim (8 ms) 2023-01-11T23:30:52.6521571Z [ RUN ] LazyOpsTest.TestMinInDim 2023-01-11T23:30:52.6550377Z [ OK ] LazyOpsTest.TestMinInDim (2 ms) 2023-01-11T23:30:52.6552220Z [ RUN ] LazyOpsTest.TestNorm 2023-01-11T23:30:52.6554680Z [ OK ] LazyOpsTest.TestNorm (0 ms) 2023-01-11T23:30:52.6555018Z [ RUN ] LazyOpsTest.TestNormInDim 2023-01-11T23:30:52.6563308Z [ OK ] LazyOpsTest.TestNormInDim (0 ms) 2023-01-11T23:30:52.6563643Z [ RUN ] LazyOpsTest.TestNormInDims 2023-01-11T23:30:52.6571281Z [ OK ] LazyOpsTest.TestNormInDims (0 ms) 2023-01-11T23:30:52.6571630Z [ 
RUN ] LazyOpsTest.TestNormInDimsKeep 2023-01-11T23:30:52.6579505Z [ OK ] LazyOpsTest.TestNormInDimsKeep (0 ms) 2023-01-11T23:30:52.6579865Z [ RUN ] LazyOpsTest.TestNormalTwoTensor 2023-01-11T23:30:52.6587311Z [ OK ] LazyOpsTest.TestNormalTwoTensor (0 ms) 2023-01-11T23:30:52.6587661Z [ RUN ] LazyOpsTest.TestNormalDoubleMean 2023-01-11T23:30:52.6596136Z [ OK ] LazyOpsTest.TestNormalDoubleMean (0 ms) 2023-01-11T23:30:52.6596494Z [ RUN ] LazyOpsTest.TestNormalDoubleStd 2023-01-11T23:30:52.6598765Z [ OK ] LazyOpsTest.TestNormalDoubleStd (0 ms) 2023-01-11T23:30:52.6599122Z [ RUN ] LazyOpsTest.TestNormalInPlace 2023-01-11T23:30:52.6602589Z [ OK ] LazyOpsTest.TestNormalInPlace (0 ms) 2023-01-11T23:30:52.6602994Z [ RUN ] LazyOpsTest.TestUniformInPlace 2023-01-11T23:30:52.6605693Z [ OK ] LazyOpsTest.TestUniformInPlace (0 ms) 2023-01-11T23:30:52.6606048Z [ RUN ] LazyOpsTest.TestRandomInPlace 2023-01-11T23:30:52.6677267Z [ OK ] LazyOpsTest.TestRandomInPlace (7 ms) 2023-01-11T23:30:52.6677770Z [ RUN ] LazyOpsTest.TestRandomInPlaceDefaultFrom 2023-01-11T23:30:52.6746736Z [ OK ] LazyOpsTest.TestRandomInPlaceDefaultFrom (6 ms) 2023-01-11T23:30:52.6747134Z [ RUN ] LazyOpsTest.TestRandomInPlaceDefault 2023-01-11T23:30:52.6766050Z [ OK ] LazyOpsTest.TestRandomInPlaceDefault (1 ms) 2023-01-11T23:30:52.6766411Z [ RUN ] LazyOpsTest.TestNormGeneral 2023-01-11T23:30:52.6770503Z [ OK ] LazyOpsTest.TestNormGeneral (0 ms) 2023-01-11T23:30:52.6770842Z [ RUN ] LazyOpsTest.TestNormNuclear 2023-01-11T23:30:52.6775319Z [ OK ] LazyOpsTest.TestNormNuclear (0 ms) 2023-01-11T23:30:52.6775664Z [ RUN ] LazyOpsTest.TestFrobeniusNormInDim 2023-01-11T23:30:52.6783696Z [ OK ] LazyOpsTest.TestFrobeniusNormInDim (0 ms) 2023-01-11T23:30:52.6784072Z [ RUN ] LazyOpsTest.TestFrobeniusNormInDims 2023-01-11T23:30:52.6792605Z [ OK ] LazyOpsTest.TestFrobeniusNormInDims (0 ms) 2023-01-11T23:30:52.6793035Z [ RUN ] LazyOpsTest.TestGroupNorm 2023-01-11T23:30:52.6828615Z [ OK ] LazyOpsTest.TestGroupNorm (3 ms) 2023-01-11T23:30:52.6828977Z [ RUN ] LazyOpsTest.TestGroupNormBackward 2023-01-11T23:30:52.7435220Z [ OK ] LazyOpsTest.TestGroupNormBackward (60 ms) 2023-01-11T23:30:52.7435643Z [ RUN ] LazyOpsTest.TestInstanceNorm 2023-01-11T23:30:52.7464596Z [ OK ] LazyOpsTest.TestInstanceNorm (2 ms) 2023-01-11T23:30:52.7465052Z [ RUN ] LazyOpsTest.TestLayerNorm 2023-01-11T23:30:52.7494897Z [ OK ] LazyOpsTest.TestLayerNorm (3 ms) 2023-01-11T23:30:52.7495367Z [ RUN ] LazyOpsTest.TestLayerNormBackward 2023-01-11T23:30:52.7833135Z [ OK ] LazyOpsTest.TestLayerNormBackward (33 ms) 2023-01-11T23:30:52.7833605Z [ RUN ] LazyOpsTest.TestNuclearNorm 2023-01-11T23:30:52.7842841Z [ OK ] LazyOpsTest.TestNuclearNorm (1 ms) 2023-01-11T23:30:52.7843352Z [ RUN ] LazyOpsTest.TestPairwiseDistance 2023-01-11T23:30:53.1055248Z [ OK ] LazyOpsTest.TestPairwiseDistance (320 ms) 2023-01-11T23:30:53.1056228Z [ RUN ] LazyOpsTest.TestCosineSimilarity 2023-01-11T23:30:53.1113139Z [ OK ] LazyOpsTest.TestCosineSimilarity (5 ms) 2023-01-11T23:30:53.1113576Z [ RUN ] LazyOpsTest.TestCosineEmbeddingLoss 2023-01-11T23:30:54.4101390Z [ OK ] LazyOpsTest.TestCosineEmbeddingLoss (1298 ms) 2023-01-11T23:30:54.4102093Z [ RUN ] LazyOpsTest.TestHingeEmbeddingLoss 2023-01-11T23:30:54.4156987Z [ OK ] LazyOpsTest.TestHingeEmbeddingLoss (5 ms) 2023-01-11T23:30:54.4158165Z [ RUN ] LazyOpsTest.TestTripletMarginLoss 2023-01-11T23:30:55.1078729Z [ OK ] LazyOpsTest.TestTripletMarginLoss (691 ms) 2023-01-11T23:30:55.1079563Z [ RUN ] LazyOpsTest.TestBinaryCrossEntropy 2023-01-11T23:30:55.1102335Z [ OK ] 
LazyOpsTest.TestBinaryCrossEntropy (2 ms) 2023-01-11T23:30:55.1103346Z [ RUN ] LazyOpsTest.TestMarginRankingLoss 2023-01-11T23:30:55.3220131Z [ OK ] LazyOpsTest.TestMarginRankingLoss (211 ms) 2023-01-11T23:30:55.3220804Z [ RUN ] LazyOpsTest.TestBCEWithLogits 2023-01-11T23:30:55.3241299Z [ OK ] LazyOpsTest.TestBCEWithLogits (2 ms) 2023-01-11T23:30:55.3241649Z [ RUN ] LazyOpsTest.TestKlDiv 2023-01-11T23:30:55.3270228Z [ OK ] LazyOpsTest.TestKlDiv (2 ms) 2023-01-11T23:30:55.3270888Z [ RUN ] LazyOpsTest.TestProd 2023-01-11T23:30:55.3271655Z [ OK ] LazyOpsTest.TestProd (0 ms) 2023-01-11T23:30:55.3272321Z [ RUN ] LazyOpsTest.TestProdCast 2023-01-11T23:30:55.3281615Z [ OK ] LazyOpsTest.TestProdCast (1 ms) 2023-01-11T23:30:55.3282093Z [ RUN ] LazyOpsTest.TestProdInDim 2023-01-11T23:30:55.3301878Z [ OK ] LazyOpsTest.TestProdInDim (2 ms) 2023-01-11T23:30:55.3302423Z [ RUN ] LazyOpsTest.TestProdInDimKeepCast 2023-01-11T23:30:55.3322584Z [ OK ] LazyOpsTest.TestProdInDimKeepCast (2 ms) 2023-01-11T23:30:55.3323119Z [ RUN ] LazyOpsTest.TestProdInDimKeep 2023-01-11T23:30:55.3332401Z [ OK ] LazyOpsTest.TestProdInDimKeep (0 ms) 2023-01-11T23:30:55.3332882Z [ RUN ] LazyOpsTest.TestCumSum 2023-01-11T23:30:55.3355734Z [ OK ] LazyOpsTest.TestCumSum (2 ms) 2023-01-11T23:30:55.3356187Z [ RUN ] LazyOpsTest.TestCumSumCast 2023-01-11T23:30:55.3377805Z [ OK ] LazyOpsTest.TestCumSumCast (2 ms) 2023-01-11T23:30:55.3378703Z [ RUN ] LazyOpsTest.TestCumSumLong 2023-01-11T23:30:55.3395712Z [ OK ] LazyOpsTest.TestCumSumLong (1 ms) 2023-01-11T23:30:55.3396192Z [ RUN ] LazyOpsTest.TestCumSumCastLong 2023-01-11T23:30:55.3414276Z [ OK ] LazyOpsTest.TestCumSumCastLong (1 ms) 2023-01-11T23:30:55.3415884Z [ RUN ] LazyOpsTest.TestCumProd 2023-01-11T23:30:55.3424766Z [ OK ] LazyOpsTest.TestCumProd (1 ms) 2023-01-11T23:30:55.3425231Z [ RUN ] LazyOpsTest.TestCumProdCast 2023-01-11T23:30:55.3434066Z [ OK ] LazyOpsTest.TestCumProdCast (1 ms) 2023-01-11T23:30:55.3434537Z [ RUN ] LazyOpsTest.TestCumProdLong 2023-01-11T23:30:55.3446863Z [ OK ] LazyOpsTest.TestCumProdLong (1 ms) 2023-01-11T23:30:55.3447343Z [ RUN ] LazyOpsTest.TestCumProdCastLong 2023-01-11T23:30:55.3459721Z [ OK ] LazyOpsTest.TestCumProdCastLong (1 ms) 2023-01-11T23:30:55.3460211Z [ RUN ] LazyOpsTest.TestArgMin 2023-01-11T23:30:55.3460881Z [ OK ] LazyOpsTest.TestArgMin (0 ms) 2023-01-11T23:30:55.3461301Z [ RUN ] LazyOpsTest.TestArgMinDim 2023-01-11T23:30:55.3464245Z [ OK ] LazyOpsTest.TestArgMinDim (0 ms) 2023-01-11T23:30:55.3464730Z [ RUN ] LazyOpsTest.TestArgMinDimKeep 2023-01-11T23:30:55.3466833Z [ OK ] LazyOpsTest.TestArgMinDimKeep (0 ms) 2023-01-11T23:30:55.3467278Z [ RUN ] LazyOpsTest.TestArgMinSameValue 2023-01-11T23:30:55.3468712Z [ OK ] LazyOpsTest.TestArgMinSameValue (0 ms) 2023-01-11T23:30:55.3469169Z [ RUN ] LazyOpsTest.TestArgMinWrapper 2023-01-11T23:30:55.3472428Z [ OK ] LazyOpsTest.TestArgMinWrapper (0 ms) 2023-01-11T23:30:55.3472938Z [ RUN ] LazyOpsTest.TestArgMax 2023-01-11T23:30:55.3474122Z [ OK ] LazyOpsTest.TestArgMax (0 ms) 2023-01-11T23:30:55.3474529Z [ RUN ] LazyOpsTest.TestArgMaxDim 2023-01-11T23:30:55.3476979Z [ OK ] LazyOpsTest.TestArgMaxDim (0 ms) 2023-01-11T23:30:55.3477400Z [ RUN ] LazyOpsTest.TestArgMaxDimKeep 2023-01-11T23:30:55.3479268Z [ OK ] LazyOpsTest.TestArgMaxDimKeep (0 ms) 2023-01-11T23:30:55.3479743Z [ RUN ] LazyOpsTest.TestArgMaxSameValue 2023-01-11T23:30:55.3480650Z [ OK ] LazyOpsTest.TestArgMaxSameValue (0 ms) 2023-01-11T23:30:55.3481058Z [ RUN ] LazyOpsTest.TestArgMaxWrapper 2023-01-11T23:30:55.3483775Z [ OK ] 
LazyOpsTest.TestArgMaxWrapper (0 ms) 2023-01-11T23:30:55.3484191Z [ RUN ] LazyOpsTest.TestAsin 2023-01-11T23:30:55.3485780Z [ OK ] LazyOpsTest.TestAsin (0 ms) 2023-01-11T23:30:55.3486149Z [ RUN ] LazyOpsTest.TestAsinh 2023-01-11T23:30:55.3487944Z [ OK ] LazyOpsTest.TestAsinh (0 ms) 2023-01-11T23:30:55.3488291Z [ RUN ] LazyOpsTest.TestAsinhInPlace 2023-01-11T23:30:55.3490851Z [ OK ] LazyOpsTest.TestAsinhInPlace (0 ms) 2023-01-11T23:30:55.3491216Z [ RUN ] LazyOpsTest.TestSin 2023-01-11T23:30:55.3493063Z [ OK ] LazyOpsTest.TestSin (0 ms) 2023-01-11T23:30:55.3493368Z [ RUN ] LazyOpsTest.TestSinh 2023-01-11T23:30:55.3495492Z [ OK ] LazyOpsTest.TestSinh (0 ms) 2023-01-11T23:30:55.3495918Z [ RUN ] LazyOpsTest.TestAcos 2023-01-11T23:30:55.3498023Z [ OK ] LazyOpsTest.TestAcos (0 ms) 2023-01-11T23:30:55.3498425Z [ RUN ] LazyOpsTest.TestAcosh 2023-01-11T23:30:55.3499892Z [ OK ] LazyOpsTest.TestAcosh (0 ms) 2023-01-11T23:30:55.3500219Z [ RUN ] LazyOpsTest.TestAcoshInPlace 2023-01-11T23:30:55.3502673Z [ OK ] LazyOpsTest.TestAcoshInPlace (0 ms) 2023-01-11T23:30:55.3502981Z [ RUN ] LazyOpsTest.TestCos 2023-01-11T23:30:55.3506377Z [ OK ] LazyOpsTest.TestCos (0 ms) 2023-01-11T23:30:55.3506743Z [ RUN ] LazyOpsTest.TestCosh 2023-01-11T23:30:55.3508604Z [ OK ] LazyOpsTest.TestCosh (0 ms) 2023-01-11T23:30:55.3508938Z [ RUN ] LazyOpsTest.TestAtan 2023-01-11T23:30:55.3512104Z [ OK ] LazyOpsTest.TestAtan (0 ms) 2023-01-11T23:30:55.3512695Z [ RUN ] LazyOpsTest.TestAtanh 2023-01-11T23:30:55.3514098Z [ OK ] LazyOpsTest.TestAtanh (0 ms) 2023-01-11T23:30:55.3514546Z [ RUN ] LazyOpsTest.TestAtanhInPlace 2023-01-11T23:30:55.3516220Z [ OK ] LazyOpsTest.TestAtanhInPlace (0 ms) 2023-01-11T23:30:55.3516658Z [ RUN ] LazyOpsTest.TestAtan2 2023-01-11T23:30:55.3518858Z [ OK ] LazyOpsTest.TestAtan2 (0 ms) 2023-01-11T23:30:55.3519270Z [ RUN ] LazyOpsTest.TestTan 2023-01-11T23:30:55.3521217Z [ OK ] LazyOpsTest.TestTan (0 ms) 2023-01-11T23:30:55.3521622Z [ RUN ] LazyOpsTest.TestTanh 2023-01-11T23:30:55.3524704Z [ OK ] LazyOpsTest.TestTanh (0 ms) 2023-01-11T23:30:55.3525134Z [ RUN ] LazyOpsTest.TestClampMinMax 2023-01-11T23:30:55.3528719Z [ OK ] LazyOpsTest.TestClampMinMax (0 ms) 2023-01-11T23:30:55.3529036Z [ RUN ] LazyOpsTest.TestClampMin 2023-01-11T23:30:55.3532916Z [ OK ] LazyOpsTest.TestClampMin (0 ms) 2023-01-11T23:30:55.3533252Z [ RUN ] LazyOpsTest.TestClampMax 2023-01-11T23:30:55.3536526Z [ OK ] LazyOpsTest.TestClampMax (0 ms) 2023-01-11T23:30:55.3536861Z [ RUN ] LazyOpsTest.TestClampMinExplicit 2023-01-11T23:30:55.3540278Z [ OK ] LazyOpsTest.TestClampMinExplicit (0 ms) 2023-01-11T23:30:55.3540747Z [ RUN ] LazyOpsTest.TestClampMaxExplicit 2023-01-11T23:30:55.3542705Z [ OK ] LazyOpsTest.TestClampMaxExplicit (0 ms) 2023-01-11T23:30:55.3543074Z [ RUN ] LazyOpsTest.TestClampMinExplicitInPlace 2023-01-11T23:30:55.3547204Z [ OK ] LazyOpsTest.TestClampMinExplicitInPlace (0 ms) 2023-01-11T23:30:55.3547621Z [ RUN ] LazyOpsTest.TestClampMaxExplicitInPlace 2023-01-11T23:30:55.3550312Z [ OK ] LazyOpsTest.TestClampMaxExplicitInPlace (0 ms) 2023-01-11T23:30:55.3552295Z [ RUN ] LazyOpsTest.TestCeil 2023-01-11T23:30:55.3553932Z [ OK ] LazyOpsTest.TestCeil (0 ms) 2023-01-11T23:30:55.3554337Z [ RUN ] LazyOpsTest.TestFloor 2023-01-11T23:30:55.3556616Z [ OK ] LazyOpsTest.TestFloor (0 ms) 2023-01-11T23:30:55.3556949Z [ RUN ] LazyOpsTest.TestRound 2023-01-11T23:30:55.3559418Z [ OK ] LazyOpsTest.TestRound (0 ms) 2023-01-11T23:30:55.3559755Z [ RUN ] LazyOpsTest.TestTrunc 2023-01-11T23:30:55.3563173Z [ OK ] LazyOpsTest.TestTrunc (0 ms) 
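[editor's note] The deprecation warnings printed above while TestQR, TestSymEig, TestCholesky, and TestTriangularSolve ran all point at the same torch.linalg migration. A minimal sketch of the suggested replacements, runnable outside the test suite (the tensors A and B and the mode/UPLO choices here are illustrative assumptions, not values taken from the tests):

    import torch

    A = torch.randn(4, 4)
    A = A @ A.mT + 4 * torch.eye(4)  # symmetric positive definite, valid for cholesky/eigh
    B = torch.randn(4, 2)

    # torch.qr(A, some) -> torch.linalg.qr(A, mode='reduced' if some else 'complete')
    Q, R = torch.linalg.qr(A, mode="reduced")

    # torch.symeig(A, ...) -> torch.linalg.eigvalsh / torch.linalg.eigh
    L = torch.linalg.eigvalsh(A, UPLO="L")   # eigenvalues only
    L, V = torch.linalg.eigh(A, UPLO="L")    # eigenvalues and eigenvectors

    # torch.cholesky(A) -> torch.linalg.cholesky(A); upper=True becomes the .mH adjoint
    C = torch.linalg.cholesky(A)
    U = torch.linalg.cholesky(A).mH

    # torch.triangular_solve(B, A).solution -> torch.linalg.solve_triangular(A, B, upper=...)
    X = torch.linalg.solve_triangular(C, B, upper=False)

Note the argument-order flip in solve_triangular, which the warning calls out: the triangular matrix comes first, and upper is a required keyword.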
2023-01-11T23:30:55.3563595Z [ RUN ] LazyOpsTest.TestFrac 2023-01-11T23:30:55.3566905Z [ OK ] LazyOpsTest.TestFrac (0 ms) 2023-01-11T23:30:55.3567254Z [ RUN ] LazyOpsTest.TestNeg 2023-01-11T23:30:55.3570397Z [ OK ] LazyOpsTest.TestNeg (0 ms) 2023-01-11T23:30:55.3570753Z [ RUN ] LazyOpsTest.TestBitwiseNot 2023-01-11T23:30:55.3576052Z [ OK ] LazyOpsTest.TestBitwiseNot (0 ms) 2023-01-11T23:30:55.3576420Z [ RUN ] LazyOpsTest.TestBitwiseNotInPlace 2023-01-11T23:30:55.3581026Z [ OK ] LazyOpsTest.TestBitwiseNotInPlace (0 ms) 2023-01-11T23:30:55.3581378Z [ RUN ] LazyOpsTest.TestSign 2023-01-11T23:30:55.3583254Z [ OK ] LazyOpsTest.TestSign (0 ms) 2023-01-11T23:30:55.3583673Z [ RUN ] LazyOpsTest.TestSignByte 2023-01-11T23:30:55.3584837Z [ OK ] LazyOpsTest.TestSignByte (0 ms) 2023-01-11T23:30:55.3585180Z [ RUN ] LazyOpsTest.TestAbs 2023-01-11T23:30:55.3588079Z [ OK ] LazyOpsTest.TestAbs (0 ms) 2023-01-11T23:30:55.3588388Z [ RUN ] LazyOpsTest.TestAbsByte 2023-01-11T23:30:55.3592882Z [ OK ] LazyOpsTest.TestAbsByte (0 ms) 2023-01-11T23:30:55.3593344Z [ RUN ] LazyOpsTest.TestEmptyLike 2023-01-11T23:30:55.3593696Z [ OK ] LazyOpsTest.TestEmptyLike (0 ms) 2023-01-11T23:30:55.3594029Z [ RUN ] LazyOpsTest.TestEmptyLikeOptions 2023-01-11T23:30:55.3594480Z [ OK ] LazyOpsTest.TestEmptyLikeOptions (0 ms) 2023-01-11T23:30:55.3594805Z [ RUN ] LazyOpsTest.TestEmpty 2023-01-11T23:30:55.3595104Z [ OK ] LazyOpsTest.TestEmpty (0 ms) 2023-01-11T23:30:55.3595502Z [ RUN ] LazyOpsTest.TestZeroInPlace 2023-01-11T23:30:55.3596111Z [ OK ] LazyOpsTest.TestZeroInPlace (0 ms) 2023-01-11T23:30:55.3596427Z [ RUN ] LazyOpsTest.TestZerosLike 2023-01-11T23:30:55.3598960Z [ OK ] LazyOpsTest.TestZerosLike (0 ms) 2023-01-11T23:30:55.3599478Z [ RUN ] LazyOpsTest.TestZerosLikeOptions 2023-01-11T23:30:55.3599865Z [ OK ] LazyOpsTest.TestZerosLikeOptions (0 ms) 2023-01-11T23:30:55.3600179Z [ RUN ] LazyOpsTest.TestZeros 2023-01-11T23:30:55.3603544Z [ OK ] LazyOpsTest.TestZeros (0 ms) 2023-01-11T23:30:55.3603856Z [ RUN ] LazyOpsTest.TestOnes 2023-01-11T23:30:55.3607988Z [ OK ] LazyOpsTest.TestOnes (0 ms) 2023-01-11T23:30:55.3608296Z [ RUN ] LazyOpsTest.TestOnesLike 2023-01-11T23:30:55.3610050Z [ OK ] LazyOpsTest.TestOnesLike (0 ms) 2023-01-11T23:30:55.3610542Z [ RUN ] LazyOpsTest.TestOnesLikeOptions 2023-01-11T23:30:55.3611376Z [ OK ] LazyOpsTest.TestOnesLikeOptions (0 ms) 2023-01-11T23:30:55.3611703Z [ RUN ] LazyOpsTest.TestFull 2023-01-11T23:30:55.3617795Z [ OK ] LazyOpsTest.TestFull (0 ms) 2023-01-11T23:30:55.3618151Z [ RUN ] LazyOpsTest.TestFullLike 2023-01-11T23:30:55.3619326Z [ OK ] LazyOpsTest.TestFullLike (0 ms) 2023-01-11T23:30:55.3619666Z [ RUN ] LazyOpsTest.TestFullLikeOptions 2023-01-11T23:30:55.3621260Z [ OK ] LazyOpsTest.TestFullLikeOptions (0 ms) 2023-01-11T23:30:55.3621592Z [ RUN ] LazyOpsTest.TestARange 2023-01-11T23:30:55.3629079Z [ OK ] LazyOpsTest.TestARange (0 ms) 2023-01-11T23:30:55.3629488Z [ RUN ] LazyOpsTest.TestARangeOut 2023-01-11T23:30:55.3630058Z [W RangeFactories.cu:270] Warning: The number of elements in the out tensor of shape [4] is 4 which does not match the computed number of elements 200. Note that this may occur as a result of rounding error. The out tensor will be resized to a tensor of shape (200,). 
(function operator()) 2023-01-11T23:30:55.3634060Z [ OK ] LazyOpsTest.TestARangeOut (0 ms) 2023-01-11T23:30:55.3634381Z [ RUN ] LazyOpsTest.TestDimARange 2023-01-11T23:30:55.3638625Z [ OK ] LazyOpsTest.TestDimARange (0 ms) 2023-01-11T23:30:55.3638966Z [ RUN ] LazyOpsTest.TestBartlettWindow 2023-01-11T23:30:55.3676696Z [ OK ] LazyOpsTest.TestBartlettWindow (3 ms) 2023-01-11T23:30:55.3677202Z [ RUN ] LazyOpsTest.TestBlackmanWindow 2023-01-11T23:30:55.3701709Z [ OK ] LazyOpsTest.TestBlackmanWindow (2 ms) 2023-01-11T23:30:55.3702057Z [ RUN ] LazyOpsTest.TestHammingWindow 2023-01-11T23:30:55.3719646Z [ OK ] LazyOpsTest.TestHammingWindow (1 ms) 2023-01-11T23:30:55.3719971Z [ RUN ] LazyOpsTest.TestHannWindow 2023-01-11T23:30:55.7837702Z [ OK ] LazyOpsTest.TestHannWindow (411 ms) 2023-01-11T23:30:55.7838469Z [ RUN ] LazyOpsTest.TestLogSigmoid 2023-01-11T23:30:55.7843234Z [ OK ] LazyOpsTest.TestLogSigmoid (0 ms) 2023-01-11T23:30:55.7843599Z [ RUN ] LazyOpsTest.TestLogSigmoidForward 2023-01-11T23:30:55.7846906Z [ OK ] LazyOpsTest.TestLogSigmoidForward (0 ms) 2023-01-11T23:30:55.7847271Z [ RUN ] LazyOpsTest.TestLogsumexp 2023-01-11T23:30:55.7898911Z [ OK ] LazyOpsTest.TestLogsumexp (4 ms) 2023-01-11T23:30:55.7899693Z [ RUN ] LazyOpsTest.TestSiLU 2023-01-11T23:30:55.7902328Z [ OK ] LazyOpsTest.TestSiLU (0 ms) 2023-01-11T23:30:55.7902759Z [ RUN ] LazyOpsTest.TestSigmoid 2023-01-11T23:30:55.7905290Z [ OK ] LazyOpsTest.TestSigmoid (0 ms) 2023-01-11T23:30:55.7905758Z [ RUN ] LazyOpsTest.TestMatmul_1x1 2023-01-11T23:30:55.7907841Z [ OK ] LazyOpsTest.TestMatmul_1x1 (0 ms) 2023-01-11T23:30:55.7908161Z [ RUN ] LazyOpsTest.TestMatmul_2x1 2023-01-11T23:30:55.7914596Z [ OK ] LazyOpsTest.TestMatmul_2x1 (0 ms) 2023-01-11T23:30:55.7914944Z [ RUN ] LazyOpsTest.TestMatmul_1x2 2023-01-11T23:30:55.7920563Z [ OK ] LazyOpsTest.TestMatmul_1x2 (0 ms) 2023-01-11T23:30:55.7920904Z [ RUN ] LazyOpsTest.TestMatmul_2x2 2023-01-11T23:30:55.7924532Z [ OK ] LazyOpsTest.TestMatmul_2x2 (0 ms) 2023-01-11T23:30:55.7924880Z [ RUN ] LazyOpsTest.TestMatmulBcast 2023-01-11T23:30:55.7933668Z [ OK ] LazyOpsTest.TestMatmulBcast (0 ms) 2023-01-11T23:30:55.7934139Z [ RUN ] LazyOpsTest.TestDot 2023-01-11T23:30:55.7936296Z [ OK ] LazyOpsTest.TestDot (0 ms) 2023-01-11T23:30:55.7936632Z [ RUN ] LazyOpsTest.TestTensorDot 2023-01-11T23:30:55.7944936Z [ OK ] LazyOpsTest.TestTensorDot (0 ms) 2023-01-11T23:30:55.7945277Z [ RUN ] LazyOpsTest.TestGer 2023-01-11T23:30:55.7949840Z [ OK ] LazyOpsTest.TestGer (0 ms) 2023-01-11T23:30:55.7950243Z [ RUN ] LazyOpsTest.TestMv 2023-01-11T23:30:55.7954690Z [ OK ] LazyOpsTest.TestMv (0 ms) 2023-01-11T23:30:55.7955009Z [ RUN ] LazyOpsTest.TestMvOut 2023-01-11T23:30:55.7959696Z [ OK ] LazyOpsTest.TestMvOut (0 ms) 2023-01-11T23:30:55.7960039Z [ RUN ] LazyOpsTest.TestBatchAddBatchMatMul 2023-01-11T23:30:55.7964446Z [ OK ] LazyOpsTest.TestBatchAddBatchMatMul (0 ms) 2023-01-11T23:30:55.7964843Z [ RUN ] LazyOpsTest.TestBatchAddBatchMatMulInPlace 2023-01-11T23:30:55.7970747Z [ OK ] LazyOpsTest.TestBatchAddBatchMatMulInPlace (0 ms) 2023-01-11T23:30:55.7971120Z [ RUN ] LazyOpsTest.TestBatchMatMul 2023-01-11T23:30:55.7975033Z [ OK ] LazyOpsTest.TestBatchMatMul (0 ms) 2023-01-11T23:30:55.7975373Z [ RUN ] LazyOpsTest.TestChainMatMul 2023-01-11T23:30:55.7975871Z [W LinearAlgebra.cpp:1077] Warning: torch.chain_matmul is deprecated and will be removed in a future PyTorch release. Use torch.linalg.multi_dot instead, which accepts a list of two or more tensors rather than multiple parameters. 
(function operator()) 2023-01-11T23:30:55.7982949Z [ OK ] LazyOpsTest.TestChainMatMul (0 ms) 2023-01-11T23:30:55.7983316Z [ RUN ] LazyOpsTest.TestLinear 2023-01-11T23:30:55.7996024Z [ OK ] LazyOpsTest.TestLinear (1 ms) 2023-01-11T23:30:55.7996349Z [ RUN ] LazyOpsTest.TestPinverse 2023-01-11T23:30:55.8023975Z [ OK ] LazyOpsTest.TestPinverse (2 ms) 2023-01-11T23:30:55.8024430Z [ RUN ] LazyOpsTest.TestEinsumOuter 2023-01-11T23:30:55.8032087Z [ OK ] LazyOpsTest.TestEinsumOuter (0 ms) 2023-01-11T23:30:55.8032464Z [ RUN ] LazyOpsTest.TestEinsumOuterBackward 2023-01-11T23:30:55.8060409Z [ OK ] LazyOpsTest.TestEinsumOuterBackward (3 ms) 2023-01-11T23:30:55.8060803Z [ RUN ] LazyOpsTest.TestEinsumBatchMatMul 2023-01-11T23:30:55.8074041Z [ OK ] LazyOpsTest.TestEinsumBatchMatMul (1 ms) 2023-01-11T23:30:55.8074437Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBilinear 2023-01-11T23:30:55.8095066Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBilinear (2 ms) 2023-01-11T23:30:55.8095491Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerDiagonal 2023-01-11T23:30:55.8101288Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerDiagonal (0 ms) 2023-01-11T23:30:55.8101751Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBatchDiagonal 2023-01-11T23:30:55.8107006Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBatchDiagonal (0 ms) 2023-01-11T23:30:55.8107573Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerBatchPermute 2023-01-11T23:30:55.8112402Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerBatchPermute (0 ms) 2023-01-11T23:30:55.8112941Z [ RUN ] LazyOpsTest.TestEinsumPyTorchLowerRepeatedAxis 2023-01-11T23:30:55.8123128Z [ OK ] LazyOpsTest.TestEinsumPyTorchLowerRepeatedAxis (1 ms) 2023-01-11T23:30:55.8123534Z [ RUN ] LazyOpsTest.TestBilinear 2023-01-11T23:30:55.8300681Z [ OK ] LazyOpsTest.TestBilinear (17 ms) 2023-01-11T23:30:55.8301034Z [ RUN ] LazyOpsTest.TestUpsampleNearest2D 2023-01-11T23:30:55.8305299Z [ OK ] LazyOpsTest.TestUpsampleNearest2D (0 ms) 2023-01-11T23:30:55.8305715Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DBackward 2023-01-11T23:30:55.8318987Z [ OK ] LazyOpsTest.TestUpsampleNearest2DBackward (1 ms) 2023-01-11T23:30:55.8319399Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DWithScale 2023-01-11T23:30:55.8323615Z [ OK ] LazyOpsTest.TestUpsampleNearest2DWithScale (0 ms) 2023-01-11T23:30:55.8324070Z [ RUN ] LazyOpsTest.TestUpsampleNearest2DBackwardWithScale 2023-01-11T23:30:55.8336756Z [ OK ] LazyOpsTest.TestUpsampleNearest2DBackwardWithScale (1 ms) 2023-01-11T23:30:55.8337168Z [ RUN ] LazyOpsTest.TestUpsampleBilinear2D 2023-01-11T23:30:55.8345577Z [ OK ] LazyOpsTest.TestUpsampleBilinear2D (0 ms) 2023-01-11T23:30:55.8345993Z [ RUN ] LazyOpsTest.TestUpsampleBilinear2DBackward 2023-01-11T23:30:55.8372951Z [ OK ] LazyOpsTest.TestUpsampleBilinear2DBackward (2 ms) 2023-01-11T23:30:55.8373316Z [ RUN ] LazyOpsTest.TestAddCMul 2023-01-11T23:30:55.8377759Z [ OK ] LazyOpsTest.TestAddCMul (0 ms) 2023-01-11T23:30:55.8378111Z [ RUN ] LazyOpsTest.TestAddCDiv 2023-01-11T23:30:55.8382702Z [ OK ] LazyOpsTest.TestAddCDiv (0 ms) 2023-01-11T23:30:55.8383102Z [ RUN ] LazyOpsTest.TestAddCDivWithBroadcast 2023-01-11T23:30:55.8387446Z [ OK ] LazyOpsTest.TestAddCDivWithBroadcast (0 ms) 2023-01-11T23:30:55.8387815Z [ RUN ] LazyOpsTest.TestSize 2023-01-11T23:30:55.8388102Z [ OK ] LazyOpsTest.TestSize (0 ms) 2023-01-11T23:30:55.8388403Z [ RUN ] LazyOpsTest.TestSelect 2023-01-11T23:30:55.8484709Z [ OK ] LazyOpsTest.TestSelect (9 ms) 2023-01-11T23:30:55.8485098Z [ RUN ] LazyOpsTest.TestBernoulliScalarProb 2023-01-11T23:30:55.8488976Z [ OK ] LazyOpsTest.TestBernoulliScalarProb (0 
ms) 2023-01-11T23:30:55.8489366Z [ RUN ] LazyOpsTest.TestBernoulliTensorProb 2023-01-11T23:30:55.8492815Z [ OK ] LazyOpsTest.TestBernoulliTensorProb (0 ms) 2023-01-11T23:30:55.8493317Z [ RUN ] LazyOpsTest.TestBernoulliScalarProbInPlace 2023-01-11T23:30:55.8497668Z [ OK ] LazyOpsTest.TestBernoulliScalarProbInPlace (0 ms) 2023-01-11T23:30:55.8498094Z [ RUN ] LazyOpsTest.TestBernoulliTensorProbInPlace 2023-01-11T23:30:55.8502535Z [ OK ] LazyOpsTest.TestBernoulliTensorProbInPlace (0 ms) 2023-01-11T23:30:55.8502948Z [ RUN ] LazyOpsTest.TestDropout 2023-01-11T23:30:55.8506061Z [ OK ] LazyOpsTest.TestDropout (0 ms) 2023-01-11T23:30:55.8506435Z [ RUN ] LazyOpsTest.TestDropoutInPlace 2023-01-11T23:30:55.8513002Z [ OK ] LazyOpsTest.TestDropoutInPlace (0 ms) 2023-01-11T23:30:55.8513469Z [ RUN ] LazyOpsTest.TestRandperm 2023-01-11T23:30:55.8514438Z [ OK ] LazyOpsTest.TestRandperm (0 ms) 2023-01-11T23:30:55.8514813Z [ RUN ] LazyOpsTest.TestSlice 2023-01-11T23:30:55.8521714Z [ OK ] LazyOpsTest.TestSlice (0 ms) 2023-01-11T23:30:55.8522042Z [ RUN ] LazyOpsTest.TestTake 2023-01-11T23:30:55.8524470Z [ OK ] LazyOpsTest.TestTake (0 ms) 2023-01-11T23:30:55.8524820Z [ RUN ] LazyOpsTest.TestTakeBackward 2023-01-11T23:30:55.8540363Z [ OK ] LazyOpsTest.TestTakeBackward (1 ms) 2023-01-11T23:30:55.8540973Z [ RUN ] LazyOpsTest.TestStack 2023-01-11T23:30:55.8577560Z [ OK ] LazyOpsTest.TestStack (3 ms) 2023-01-11T23:30:55.8577879Z [ RUN ] LazyOpsTest.TestCat 2023-01-11T23:30:55.8587083Z [ OK ] LazyOpsTest.TestCat (0 ms) 2023-01-11T23:30:55.8587457Z [ RUN ] LazyOpsTest.TestUnbind 2023-01-11T23:30:55.8616775Z [ OK ] LazyOpsTest.TestUnbind (2 ms) 2023-01-11T23:30:55.8617094Z [ RUN ] LazyOpsTest.TestRepeat 2023-01-11T23:30:55.8632703Z [ OK ] LazyOpsTest.TestRepeat (1 ms) 2023-01-11T23:30:55.8633025Z [ RUN ] LazyOpsTest.TestGather 2023-01-11T23:30:55.8641124Z [ OK ] LazyOpsTest.TestGather (0 ms) 2023-01-11T23:30:55.8641445Z [ RUN ] LazyOpsTest.TestScatter 2023-01-11T23:30:55.8649001Z [ OK ] LazyOpsTest.TestScatter (0 ms) 2023-01-11T23:30:55.8649327Z [ RUN ] LazyOpsTest.TestScatterR1 2023-01-11T23:30:55.8652111Z [ OK ] LazyOpsTest.TestScatterR1 (0 ms) 2023-01-11T23:30:55.8652464Z [ RUN ] LazyOpsTest.TestScatterR3 2023-01-11T23:30:55.8657286Z [ OK ] LazyOpsTest.TestScatterR3 (0 ms) 2023-01-11T23:30:55.8657639Z [ RUN ] LazyOpsTest.TestScatterBiggerSource 2023-01-11T23:30:55.8663828Z [ OK ] LazyOpsTest.TestScatterBiggerSource (0 ms) 2023-01-11T23:30:55.8664178Z [ RUN ] LazyOpsTest.TestScatterScalar 2023-01-11T23:30:55.8669705Z [ OK ] LazyOpsTest.TestScatterScalar (0 ms) 2023-01-11T23:30:55.8670153Z [ RUN ] LazyOpsTest.TestScatterReduceAdd 2023-01-11T23:30:55.8677721Z [ OK ] LazyOpsTest.TestScatterReduceAdd (0 ms) 2023-01-11T23:30:55.8678057Z [ RUN ] LazyOpsTest.TestScatterAdd 2023-01-11T23:30:55.8688973Z [ OK ] LazyOpsTest.TestScatterAdd (1 ms) 2023-01-11T23:30:55.8689319Z [ RUN ] LazyOpsTest.TestScatterAddInPlace 2023-01-11T23:30:55.8700840Z [ OK ] LazyOpsTest.TestScatterAddInPlace (1 ms) 2023-01-11T23:30:55.8701180Z [ RUN ] LazyOpsTest.TestIndexSelect 2023-01-11T23:30:55.8843321Z [ OK ] LazyOpsTest.TestIndexSelect (14 ms) 2023-01-11T23:30:55.8843738Z [ RUN ] LazyOpsTest.TestIndexSelectRank0 2023-01-11T23:30:55.8881467Z [ OK ] LazyOpsTest.TestIndexSelectRank0 (3 ms) 2023-01-11T23:30:55.8881959Z [ RUN ] LazyOpsTest.TestInverse 2023-01-11T23:30:55.8882260Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:4728: Skipped 2023-01-11T23:30:55.8882519Z 2023-01-11T23:30:55.8883214Z [ SKIPPED ] LazyOpsTest.TestInverse (0 ms) 
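[editor's note] Two warnings a little earlier in this shard call for call-site updates rather than signalling failures: TestARangeOut trips the out= auto-resize warning from RangeFactories.cu, and TestChainMatMul hits the torch.chain_matmul deprecation. A hedged sketch of both fixes (the arange arguments below are an assumption chosen only to reproduce the 200-element result the warning mentions; the test's actual arguments are not shown in the log):

    import torch

    # torch.chain_matmul(a, b, c) -> torch.linalg.multi_dot([a, b, c])
    a, b, c = torch.randn(3, 4), torch.randn(4, 5), torch.randn(5, 6)
    prod = torch.linalg.multi_dot([a, b, c])

    # The out= warning: a result tensor whose element count does not match the
    # computed result is resized (with a warning), which is what the test exercises.
    out = torch.empty(4)                 # deliberately the wrong shape
    torch.arange(0, 100, 0.5, out=out)   # warns, then resizes out to shape (200,)
    assert out.shape == (200,)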
2023-01-11T23:30:55.8883803Z [ RUN ] LazyOpsTest.TestIsnan 2023-01-11T23:30:55.8884213Z [ OK ] LazyOpsTest.TestIsnan (0 ms) 2023-01-11T23:30:55.8884519Z [ RUN ] LazyOpsTest.TestExpand 2023-01-11T23:30:55.8887624Z [ OK ] LazyOpsTest.TestExpand (0 ms) 2023-01-11T23:30:55.8888082Z [ RUN ] LazyOpsTest.TestExpandBack 2023-01-11T23:30:55.8891809Z [ OK ] LazyOpsTest.TestExpandBack (0 ms) 2023-01-11T23:30:55.8892276Z [ RUN ] LazyOpsTest.TestExpandAs 2023-01-11T23:30:55.8897041Z [ OK ] LazyOpsTest.TestExpandAs (0 ms) 2023-01-11T23:30:55.8897430Z [ RUN ] LazyOpsTest.TestEye 2023-01-11T23:30:55.8899004Z [ OK ] LazyOpsTest.TestEye (0 ms) 2023-01-11T23:30:55.8899408Z [ RUN ] LazyOpsTest.TestEyeWide 2023-01-11T23:30:55.8900368Z [ OK ] LazyOpsTest.TestEyeWide (0 ms) 2023-01-11T23:30:55.8900785Z [ RUN ] LazyOpsTest.TestEyeNarrow 2023-01-11T23:30:55.8902425Z [ OK ] LazyOpsTest.TestEyeNarrow (0 ms) 2023-01-11T23:30:55.8903058Z [ RUN ] LazyOpsTest.TestBroadcastTensors 2023-01-11T23:30:55.8910247Z [ OK ] LazyOpsTest.TestBroadcastTensors (0 ms) 2023-01-11T23:30:55.8910720Z [ RUN ] LazyOpsTest.TestOneIndex 2023-01-11T23:30:55.8926221Z [ OK ] LazyOpsTest.TestOneIndex (1 ms) 2023-01-11T23:30:55.8926691Z [ RUN ] LazyOpsTest.TestOneIndexTransfer 2023-01-11T23:30:55.8941801Z [ OK ] LazyOpsTest.TestOneIndexTransfer (1 ms) 2023-01-11T23:30:55.8942244Z [ RUN ] LazyOpsTest.TestNonzero 2023-01-11T23:30:55.8945085Z [ OK ] LazyOpsTest.TestNonzero (0 ms) 2023-01-11T23:30:55.8945516Z [ RUN ] LazyOpsTest.TestMaskedSelect 2023-01-11T23:30:55.8949356Z [ OK ] LazyOpsTest.TestMaskedSelect (0 ms) 2023-01-11T23:30:55.8949804Z [ RUN ] LazyOpsTest.TestMaskedScatter 2023-01-11T23:30:55.8954280Z [ OK ] LazyOpsTest.TestMaskedScatter (0 ms) 2023-01-11T23:30:55.8954760Z [ RUN ] LazyOpsTest.TestMultiIndexHeadNull 2023-01-11T23:30:55.8970449Z [ OK ] LazyOpsTest.TestMultiIndexHeadNull (1 ms) 2023-01-11T23:30:55.8970990Z [ RUN ] LazyOpsTest.TestMultiIndexMiddleNull 2023-01-11T23:30:55.8986610Z [ OK ] LazyOpsTest.TestMultiIndexMiddleNull (1 ms) 2023-01-11T23:30:55.8987127Z [ RUN ] LazyOpsTest.TestMultiIndexTailNull 2023-01-11T23:30:55.9002826Z [ OK ] LazyOpsTest.TestMultiIndexTailNull (1 ms) 2023-01-11T23:30:55.9003386Z [ RUN ] LazyOpsTest.TestMultiIndexMiddleBroadcast 2023-01-11T23:30:55.9020710Z [ OK ] LazyOpsTest.TestMultiIndexMiddleBroadcast (1 ms) 2023-01-11T23:30:55.9021248Z [ RUN ] LazyOpsTest.TestMultiIndexTailBroadcast 2023-01-11T23:30:55.9038908Z [ OK ] LazyOpsTest.TestMultiIndexTailBroadcast (1 ms) 2023-01-11T23:30:55.9039393Z [ RUN ] LazyOpsTest.TestMaskIndex 2023-01-11T23:30:55.9055740Z [ OK ] LazyOpsTest.TestMaskIndex (1 ms) 2023-01-11T23:30:55.9056179Z [ RUN ] LazyOpsTest.TestOneIndexPut 2023-01-11T23:30:55.9060007Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5230: Skipped 2023-01-11T23:30:55.9060264Z 2023-01-11T23:30:55.9060552Z [ SKIPPED ] LazyOpsTest.TestOneIndexPut (0 ms) 2023-01-11T23:30:55.9060994Z [ RUN ] LazyOpsTest.TestOneIndexPutInPlace 2023-01-11T23:30:55.9065364Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5264: Skipped 2023-01-11T23:30:55.9065618Z 2023-01-11T23:30:55.9065922Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutInPlace (0 ms) 2023-01-11T23:30:55.9066349Z [ RUN ] LazyOpsTest.TestOneIndexPutTransfer 2023-01-11T23:30:55.9068265Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5315: Skipped 2023-01-11T23:30:55.9068523Z 2023-01-11T23:30:55.9068934Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutTransfer (0 ms) 2023-01-11T23:30:55.9069398Z [ RUN ] LazyOpsTest.TestMultiIndexPut 
2023-01-11T23:30:55.9073402Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5360: Skipped 2023-01-11T23:30:55.9073649Z 2023-01-11T23:30:55.9073921Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPut (0 ms) 2023-01-11T23:30:55.9074381Z [ RUN ] LazyOpsTest.TestMultiIndexPutHeadNull 2023-01-11T23:30:55.9076571Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5411: Skipped 2023-01-11T23:30:55.9076813Z 2023-01-11T23:30:55.9077115Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutHeadNull (0 ms) 2023-01-11T23:30:55.9077535Z [ RUN ] LazyOpsTest.TestMultiIndexPutMiddleNull 2023-01-11T23:30:55.9081672Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5462: Skipped 2023-01-11T23:30:55.9081915Z 2023-01-11T23:30:55.9082231Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleNull (0 ms) 2023-01-11T23:30:55.9082727Z [ RUN ] LazyOpsTest.TestMultiIndexPutTailNull 2023-01-11T23:30:55.9085334Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5513: Skipped 2023-01-11T23:30:55.9085579Z 2023-01-11T23:30:55.9085885Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailNull (0 ms) 2023-01-11T23:30:55.9086339Z [ RUN ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast 2023-01-11T23:30:55.9089437Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5563: Skipped 2023-01-11T23:30:55.9089682Z 2023-01-11T23:30:55.9090008Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast (0 ms) 2023-01-11T23:30:55.9090500Z [ RUN ] LazyOpsTest.TestMultiIndexPutTailBroadcast 2023-01-11T23:30:55.9093399Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5613: Skipped 2023-01-11T23:30:55.9093639Z 2023-01-11T23:30:55.9093957Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailBroadcast (0 ms) 2023-01-11T23:30:55.9094361Z [ RUN ] LazyOpsTest.TestMaskIndexPut 2023-01-11T23:30:55.9156733Z [ OK ] LazyOpsTest.TestMaskIndexPut (6 ms) 2023-01-11T23:30:55.9157209Z [ RUN ] LazyOpsTest.TestIndexPutImpl 2023-01-11T23:30:55.9161359Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:5687: Skipped 2023-01-11T23:30:55.9161628Z 2023-01-11T23:30:55.9161921Z [ SKIPPED ] LazyOpsTest.TestIndexPutImpl (0 ms) 2023-01-11T23:30:55.9162367Z [ RUN ] LazyOpsTest.TestIndexFillWithScalar 2023-01-11T23:30:55.9241269Z [ OK ] LazyOpsTest.TestIndexFillWithScalar (7 ms) 2023-01-11T23:30:55.9241714Z [ RUN ] LazyOpsTest.TestIndexFillWithScalarInPlace 2023-01-11T23:30:55.9330823Z [ OK ] LazyOpsTest.TestIndexFillWithScalarInPlace (8 ms) 2023-01-11T23:30:55.9331273Z [ RUN ] LazyOpsTest.TestIndexFillWithTensor 2023-01-11T23:30:55.9418967Z [ OK ] LazyOpsTest.TestIndexFillWithTensor (8 ms) 2023-01-11T23:30:55.9419361Z [ RUN ] LazyOpsTest.TestIndexFillWithTensorInPlace 2023-01-11T23:30:55.9527331Z [ OK ] LazyOpsTest.TestIndexFillWithTensorInPlace (10 ms) 2023-01-11T23:30:55.9527720Z [ RUN ] LazyOpsTest.TestIndexFillRank0 2023-01-11T23:30:55.9614063Z [ OK ] LazyOpsTest.TestIndexFillRank0 (8 ms) 2023-01-11T23:30:55.9614401Z [ RUN ] LazyOpsTest.TestIndexAdd 2023-01-11T23:30:55.9805888Z [ OK ] LazyOpsTest.TestIndexAdd (19 ms) 2023-01-11T23:30:55.9806237Z [ RUN ] LazyOpsTest.TestIndexAddInPlace 2023-01-11T23:30:55.9940473Z [ OK ] LazyOpsTest.TestIndexAddInPlace (13 ms) 2023-01-11T23:30:55.9940882Z [ RUN ] LazyOpsTest.TestIndexAddRank0 2023-01-11T23:30:56.0019282Z [ OK ] LazyOpsTest.TestIndexAddRank0 (7 ms) 2023-01-11T23:30:56.0019628Z [ RUN ] LazyOpsTest.TestIndexCopy 2023-01-11T23:30:56.0113531Z [ OK ] LazyOpsTest.TestIndexCopy (9 ms) 2023-01-11T23:30:56.0114067Z [ RUN ] LazyOpsTest.TestIndexCopyInPlace 2023-01-11T23:30:56.0114388Z 
/var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:6082: Skipped 2023-01-11T23:30:56.0114570Z 2023-01-11T23:30:56.0114772Z [ SKIPPED ] LazyOpsTest.TestIndexCopyInPlace (0 ms) 2023-01-11T23:30:56.0115118Z [ RUN ] LazyOpsTest.TestIndexCopyRank0 2023-01-11T23:30:56.0194395Z [ OK ] LazyOpsTest.TestIndexCopyRank0 (8 ms) 2023-01-11T23:30:56.0194724Z [ RUN ] LazyOpsTest.TestRelu 2023-01-11T23:30:56.0198113Z [ OK ] LazyOpsTest.TestRelu (0 ms) 2023-01-11T23:30:56.0198433Z [ RUN ] LazyOpsTest.TestReluInPlace 2023-01-11T23:30:56.0203317Z [ OK ] LazyOpsTest.TestReluInPlace (0 ms) 2023-01-11T23:30:56.0203655Z [ RUN ] LazyOpsTest.TestHardshrink 2023-01-11T23:30:56.0205438Z [ OK ] LazyOpsTest.TestHardshrink (0 ms) 2023-01-11T23:30:56.0205836Z [ RUN ] LazyOpsTest.TestHardSigmoid 2023-01-11T23:30:56.0209469Z [ OK ] LazyOpsTest.TestHardSigmoid (0 ms) 2023-01-11T23:30:56.0209932Z [ RUN ] LazyOpsTest.TestHardSigmoidInPlace 2023-01-11T23:30:56.0214410Z [ OK ] LazyOpsTest.TestHardSigmoidInPlace (0 ms) 2023-01-11T23:30:56.0214937Z [ RUN ] LazyOpsTest.TestHardSigmoidBackward 2023-01-11T23:30:56.0228416Z [ OK ] LazyOpsTest.TestHardSigmoidBackward (1 ms) 2023-01-11T23:30:56.0228900Z [ RUN ] LazyOpsTest.TestSoftshrink 2023-01-11T23:30:56.0230672Z [ OK ] LazyOpsTest.TestSoftshrink (0 ms) 2023-01-11T23:30:56.0232335Z [ RUN ] LazyOpsTest.TestHardtanh 2023-01-11T23:30:56.0233168Z [ OK ] LazyOpsTest.TestHardtanh (0 ms) 2023-01-11T23:30:56.0233648Z [ RUN ] LazyOpsTest.TestHardtanhInPlace 2023-01-11T23:30:56.0235800Z [ OK ] LazyOpsTest.TestHardtanhInPlace (0 ms) 2023-01-11T23:30:56.0236284Z [ RUN ] LazyOpsTest.TestLeakyRelu 2023-01-11T23:30:56.0239726Z [ OK ] LazyOpsTest.TestLeakyRelu (0 ms) 2023-01-11T23:30:56.0240220Z [ RUN ] LazyOpsTest.TestLeakyReluInPlace 2023-01-11T23:30:56.0244910Z [ OK ] LazyOpsTest.TestLeakyReluInPlace (0 ms) 2023-01-11T23:30:56.0245354Z [ RUN ] LazyOpsTest.TestExp 2023-01-11T23:30:56.0248400Z [ OK ] LazyOpsTest.TestExp (0 ms) 2023-01-11T23:30:56.0248805Z [ RUN ] LazyOpsTest.TestExpm1 2023-01-11T23:30:56.0250299Z [ OK ] LazyOpsTest.TestExpm1 (0 ms) 2023-01-11T23:30:56.0250707Z [ RUN ] LazyOpsTest.TestLog 2023-01-11T23:30:56.0253886Z [ OK ] LazyOpsTest.TestLog (0 ms) 2023-01-11T23:30:56.0254306Z [ RUN ] LazyOpsTest.TestLog2 2023-01-11T23:30:56.0257758Z [ OK ] LazyOpsTest.TestLog2 (0 ms) 2023-01-11T23:30:56.0258196Z [ RUN ] LazyOpsTest.TestLog10 2023-01-11T23:30:56.0259942Z [ OK ] LazyOpsTest.TestLog10 (0 ms) 2023-01-11T23:30:56.0260378Z [ RUN ] LazyOpsTest.TestLog1p 2023-01-11T23:30:56.0262128Z [ OK ] LazyOpsTest.TestLog1p (0 ms) 2023-01-11T23:30:56.0262568Z [ RUN ] LazyOpsTest.TestErf 2023-01-11T23:30:56.0264673Z [ OK ] LazyOpsTest.TestErf (0 ms) 2023-01-11T23:30:56.0265098Z [ RUN ] LazyOpsTest.TestErfc 2023-01-11T23:30:56.0273141Z [ OK ] LazyOpsTest.TestErfc (0 ms) 2023-01-11T23:30:56.0273569Z [ RUN ] LazyOpsTest.TestErfinv 2023-01-11T23:30:56.0278228Z [ OK ] LazyOpsTest.TestErfinv (0 ms) 2023-01-11T23:30:56.0278654Z [ RUN ] LazyOpsTest.TestSqrt 2023-01-11T23:30:56.0281951Z [ OK ] LazyOpsTest.TestSqrt (0 ms) 2023-01-11T23:30:56.0282274Z [ RUN ] LazyOpsTest.TestRsqrt 2023-01-11T23:30:56.0285526Z [ OK ] LazyOpsTest.TestRsqrt (0 ms) 2023-01-11T23:30:56.0285853Z [ RUN ] LazyOpsTest.TestReciprocal 2023-01-11T23:30:56.0289097Z [ OK ] LazyOpsTest.TestReciprocal (0 ms) 2023-01-11T23:30:56.0289448Z [ RUN ] LazyOpsTest.TestPowTensorScalar 2023-01-11T23:30:56.0292952Z [ OK ] LazyOpsTest.TestPowTensorScalar (0 ms) 2023-01-11T23:30:56.0293326Z [ RUN ] LazyOpsTest.TestPowTensorScalarInPlace 
2023-01-11T23:30:56.0298502Z [ OK ] LazyOpsTest.TestPowTensorScalarInPlace (0 ms) 2023-01-11T23:30:56.0298865Z [ RUN ] LazyOpsTest.TestPowTensorTensor 2023-01-11T23:30:56.0302955Z [ OK ] LazyOpsTest.TestPowTensorTensor (0 ms) 2023-01-11T23:30:56.0303324Z [ RUN ] LazyOpsTest.TestPowTensorTensorInPlace 2023-01-11T23:30:56.0308767Z [ OK ] LazyOpsTest.TestPowTensorTensorInPlace (0 ms) 2023-01-11T23:30:56.0309156Z [ RUN ] LazyOpsTest.TestPowTensorTensorBroadcast 2023-01-11T23:30:56.0313959Z [ OK ] LazyOpsTest.TestPowTensorTensorBroadcast (0 ms) 2023-01-11T23:30:56.0314454Z [ RUN ] LazyOpsTest.TestPowScalarTensor 2023-01-11T23:30:56.0315062Z [ OK ] LazyOpsTest.TestPowScalarTensor (0 ms) 2023-01-11T23:30:56.0315402Z [ RUN ] LazyOpsTest.TestPowIntExponent 2023-01-11T23:30:56.0319197Z [ OK ] LazyOpsTest.TestPowIntExponent (0 ms) 2023-01-11T23:30:56.0319535Z [ RUN ] LazyOpsTest.TestFmodScalar 2023-01-11T23:30:56.0321797Z [ OK ] LazyOpsTest.TestFmodScalar (0 ms) 2023-01-11T23:30:56.0322151Z [ RUN ] LazyOpsTest.TestFmodScalarInPlace 2023-01-11T23:30:56.0325279Z [ OK ] LazyOpsTest.TestFmodScalarInPlace (0 ms) 2023-01-11T23:30:56.0325626Z [ RUN ] LazyOpsTest.TestFmodTensor 2023-01-11T23:30:56.0327612Z [ OK ] LazyOpsTest.TestFmodTensor (0 ms) 2023-01-11T23:30:56.0327962Z [ RUN ] LazyOpsTest.TestFmodTensorInPlace 2023-01-11T23:30:56.0331272Z [ OK ] LazyOpsTest.TestFmodTensorInPlace (0 ms) 2023-01-11T23:30:56.0331874Z [ RUN ] LazyOpsTest.TestRemainderScalar 2023-01-11T23:30:56.0335588Z [ OK ] LazyOpsTest.TestRemainderScalar (0 ms) 2023-01-11T23:30:56.0335959Z [ RUN ] LazyOpsTest.TestRemainderScalarInPlace 2023-01-11T23:30:56.0341544Z [ OK ] LazyOpsTest.TestRemainderScalarInPlace (0 ms) 2023-01-11T23:30:56.0341918Z [ RUN ] LazyOpsTest.TestRemainderTensor 2023-01-11T23:30:56.0345883Z [ OK ] LazyOpsTest.TestRemainderTensor (0 ms) 2023-01-11T23:30:56.0346249Z [ RUN ] LazyOpsTest.TestRemainderTensorInPlace 2023-01-11T23:30:56.0353158Z [ OK ] LazyOpsTest.TestRemainderTensorInPlace (0 ms) 2023-01-11T23:30:56.0353639Z [ RUN ] LazyOpsTest.TestWhere 2023-01-11T23:30:56.0354130Z [W TensorCompare.cpp:493] Warning: where received a uint8 condition tensor. This behavior is deprecated and will be removed in a future version of PyTorch. Use a boolean condition instead. 
(function operator()) 2023-01-11T23:30:56.0356229Z [ OK ] LazyOpsTest.TestWhere (0 ms) 2023-01-11T23:30:56.0356582Z [ RUN ] LazyOpsTest.TestWhereBroadcast 2023-01-11T23:30:56.0359831Z [ OK ] LazyOpsTest.TestWhereBroadcast (0 ms) 2023-01-11T23:30:56.0360193Z [ RUN ] LazyOpsTest.TestThreshold 2023-01-11T23:30:56.0363839Z [ OK ] LazyOpsTest.TestThreshold (0 ms) 2023-01-11T23:30:56.0364211Z [ RUN ] LazyOpsTest.TestThresholdBackward 2023-01-11T23:30:56.0376663Z [ OK ] LazyOpsTest.TestThresholdBackward (1 ms) 2023-01-11T23:30:56.0377038Z [ RUN ] LazyOpsTest.TestThresholdInPlace 2023-01-11T23:30:56.0381168Z [ OK ] LazyOpsTest.TestThresholdInPlace (0 ms) 2023-01-11T23:30:56.0381564Z [ RUN ] LazyOpsTest.TestElu 2023-01-11T23:30:56.0385332Z [ OK ] LazyOpsTest.TestElu (0 ms) 2023-01-11T23:30:56.0385673Z [ RUN ] LazyOpsTest.TestEluInPlace 2023-01-11T23:30:56.0390611Z [ OK ] LazyOpsTest.TestEluInPlace (0 ms) 2023-01-11T23:30:56.0391534Z [ RUN ] LazyOpsTest.TestSelu 2023-01-11T23:30:56.0394508Z [ OK ] LazyOpsTest.TestSelu (0 ms) 2023-01-11T23:30:56.0394851Z [ RUN ] LazyOpsTest.TestSeluInPlace 2023-01-11T23:30:56.0399627Z [ OK ] LazyOpsTest.TestSeluInPlace (0 ms) 2023-01-11T23:30:56.0399957Z [ RUN ] LazyOpsTest.TestCelu 2023-01-11T23:30:56.0402021Z [ OK ] LazyOpsTest.TestCelu (0 ms) 2023-01-11T23:30:56.0402382Z [ RUN ] LazyOpsTest.TestCeluInPlace 2023-01-11T23:30:56.0405102Z [ OK ] LazyOpsTest.TestCeluInPlace (0 ms) 2023-01-11T23:30:56.0405465Z [ RUN ] LazyOpsTest.TestGelu 2023-01-11T23:30:56.0408933Z [ OK ] LazyOpsTest.TestGelu (0 ms) 2023-01-11T23:30:56.0409307Z [ RUN ] LazyOpsTest.TestAddMatMul 2023-01-11T23:30:56.0420744Z [ OK ] LazyOpsTest.TestAddMatMul (1 ms) 2023-01-11T23:30:56.0421143Z [ RUN ] LazyOpsTest.TestEmbedding 2023-01-11T23:30:56.0425670Z [ OK ] LazyOpsTest.TestEmbedding (0 ms) 2023-01-11T23:30:56.0426067Z [ RUN ] LazyOpsTest.TestOneHot 2023-01-11T23:30:56.0436243Z [ OK ] LazyOpsTest.TestOneHot (1 ms) 2023-01-11T23:30:56.0436644Z [ RUN ] LazyOpsTest.TestTranspose 2023-01-11T23:30:56.0440045Z [ OK ] LazyOpsTest.TestTranspose (0 ms) 2023-01-11T23:30:56.0440448Z [ RUN ] LazyOpsTest.TestTransposeInPlace 2023-01-11T23:30:56.0444914Z [ OK ] LazyOpsTest.TestTransposeInPlace (0 ms) 2023-01-11T23:30:56.0445340Z [ RUN ] LazyOpsTest.TestReshape 2023-01-11T23:30:56.0450139Z [ OK ] LazyOpsTest.TestReshape (0 ms) 2023-01-11T23:30:56.0450540Z [ RUN ] LazyOpsTest.TestResize 2023-01-11T23:30:56.0456301Z [ OK ] LazyOpsTest.TestResize (0 ms) 2023-01-11T23:30:56.0456677Z [ RUN ] LazyOpsTest.TestViewResize 2023-01-11T23:30:56.0464664Z [ OK ] LazyOpsTest.TestViewResize (0 ms) 2023-01-11T23:30:56.0465039Z [ RUN ] LazyOpsTest.TestView 2023-01-11T23:30:56.0469844Z [ OK ] LazyOpsTest.TestView (0 ms) 2023-01-11T23:30:56.0470295Z [ RUN ] LazyOpsTest.TestViewMod 2023-01-11T23:30:56.0485442Z [ OK ] LazyOpsTest.TestViewMod (1 ms) 2023-01-11T23:30:56.0485846Z [ RUN ] LazyOpsTest.TestViewModComplex 2023-01-11T23:30:56.0503975Z [ OK ] LazyOpsTest.TestViewModComplex (1 ms) 2023-01-11T23:30:56.0504384Z [ RUN ] LazyOpsTest.TestViewOfViewMod 2023-01-11T23:30:56.0525271Z [ OK ] LazyOpsTest.TestViewOfViewMod (2 ms) 2023-01-11T23:30:56.0525642Z [ RUN ] LazyOpsTest.TestViewSqueezeAddInPlace 2023-01-11T23:30:56.0540458Z [ OK ] LazyOpsTest.TestViewSqueezeAddInPlace (1 ms) 2023-01-11T23:30:56.0540860Z [ RUN ] LazyOpsTest.TestUnsafeView 2023-01-11T23:30:56.0546001Z [ OK ] LazyOpsTest.TestUnsafeView (0 ms) 2023-01-11T23:30:56.0546374Z [ RUN ] LazyOpsTest.TestNarrow 2023-01-11T23:30:56.0567776Z [ OK ] LazyOpsTest.TestNarrow (2 ms) 
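The TestWhere run above logs a deprecation warning because the test hands where() a uint8 condition tensor. A minimal Python sketch of the migration the warning asks for (shapes and values are invented for illustration, not taken from the test):

    import torch

    a = torch.randn(3, 3)
    b = torch.randn(3, 3)

    # Deprecated: a uint8 (byte) condition triggers the TensorCompare.cpp warning.
    cond_u8 = (a > 0).to(torch.uint8)
    out_old = torch.where(cond_u8, a, b)

    # Recommended: pass a boolean condition instead.
    out_new = torch.where(a > 0, a, b)

    assert torch.equal(out_old, out_new)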
2023-01-11T23:30:56.0568128Z [ RUN ] LazyOpsTest.TestNarrowUpdate 2023-01-11T23:30:56.0608354Z [ OK ] LazyOpsTest.TestNarrowUpdate (4 ms) 2023-01-11T23:30:56.0608773Z [ RUN ] LazyOpsTest.TestNarrowUpdateBaseCheck 2023-01-11T23:30:56.0642653Z [ OK ] LazyOpsTest.TestNarrowUpdateBaseCheck (3 ms) 2023-01-11T23:30:56.0643112Z [ RUN ] LazyOpsTest.TestNarrowUpdateTwoSlices 2023-01-11T23:30:56.0827822Z [ OK ] LazyOpsTest.TestNarrowUpdateTwoSlices (18 ms) 2023-01-11T23:30:56.0828212Z [ RUN ] LazyOpsTest.TestNarrowUpdateView 2023-01-11T23:30:56.0877299Z [ OK ] LazyOpsTest.TestNarrowUpdateView (4 ms) 2023-01-11T23:30:56.0877691Z [ RUN ] LazyOpsTest.TestNarrowInNarrowUpdate 2023-01-11T23:30:56.0943817Z [ OK ] LazyOpsTest.TestNarrowInNarrowUpdate (6 ms) 2023-01-11T23:30:56.0944357Z [ RUN ] LazyOpsTest.TestNarrowCopy 2023-01-11T23:30:56.0954464Z [ OK ] LazyOpsTest.TestNarrowCopy (1 ms) 2023-01-11T23:30:56.0954846Z [ RUN ] LazyOpsTest.TestViewAs 2023-01-11T23:30:56.0960391Z [ OK ] LazyOpsTest.TestViewAs (0 ms) 2023-01-11T23:30:56.0960795Z [ RUN ] LazyOpsTest.TestLogSoftmax 2023-01-11T23:30:56.0986832Z [ OK ] LazyOpsTest.TestLogSoftmax (2 ms) 2023-01-11T23:30:56.0987240Z [ RUN ] LazyOpsTest.TestLogSoftmaxCast 2023-01-11T23:30:56.1022565Z [ OK ] LazyOpsTest.TestLogSoftmaxCast (3 ms) 2023-01-11T23:30:56.1023016Z [ RUN ] LazyOpsTest.TestLogSoftmaxWrapper 2023-01-11T23:30:56.1048568Z [ OK ] LazyOpsTest.TestLogSoftmaxWrapper (2 ms) 2023-01-11T23:30:56.1048973Z [ RUN ] LazyOpsTest.TestSoftmax 2023-01-11T23:30:56.1075612Z [ OK ] LazyOpsTest.TestSoftmax (2 ms) 2023-01-11T23:30:56.1076127Z [ RUN ] LazyOpsTest.TestSoftmaxCast 2023-01-11T23:30:56.1112326Z [ OK ] LazyOpsTest.TestSoftmaxCast (3 ms) 2023-01-11T23:30:56.1112739Z [ RUN ] LazyOpsTest.TestSoftmaxWrapper 2023-01-11T23:30:56.1145062Z [ OK ] LazyOpsTest.TestSoftmaxWrapper (3 ms) 2023-01-11T23:30:56.1145404Z [ RUN ] LazyOpsTest.TestSoftplus 2023-01-11T23:30:56.1148881Z [ OK ] LazyOpsTest.TestSoftplus (0 ms) 2023-01-11T23:30:56.1149271Z [ RUN ] LazyOpsTest.TestMaxPool1D 2023-01-11T23:30:56.1249836Z [ OK ] LazyOpsTest.TestMaxPool1D (10 ms) 2023-01-11T23:30:56.1250222Z [ RUN ] LazyOpsTest.TestMaxPool2D 2023-01-11T23:30:56.1322629Z [ OK ] LazyOpsTest.TestMaxPool2D (7 ms) 2023-01-11T23:30:56.1323067Z [ RUN ] LazyOpsTest.TestMaxPool2DWithIndices 2023-01-11T23:30:56.1478637Z [ OK ] LazyOpsTest.TestMaxPool2DWithIndices (15 ms) 2023-01-11T23:30:56.1479215Z [ RUN ] LazyOpsTest.TestMaxPool2DNonSquare 2023-01-11T23:30:56.1550588Z [ OK ] LazyOpsTest.TestMaxPool2DNonSquare (7 ms) 2023-01-11T23:30:56.1551855Z [ RUN ] LazyOpsTest.TestMaxPool3D 2023-01-11T23:30:56.1581934Z [ OK ] LazyOpsTest.TestMaxPool3D (3 ms) 2023-01-11T23:30:56.1582355Z [ RUN ] LazyOpsTest.TestMaxPool3DWithIndices 2023-01-11T23:30:56.1626836Z [ OK ] LazyOpsTest.TestMaxPool3DWithIndices (4 ms) 2023-01-11T23:30:56.1627274Z [ RUN ] LazyOpsTest.TestMaxPool3DIncompleteAttributes 2023-01-11T23:30:56.1658202Z [ OK ] LazyOpsTest.TestMaxPool3DIncompleteAttributes (3 ms) 2023-01-11T23:30:56.1658625Z [ RUN ] LazyOpsTest.TestMaxPool3DNonSquare 2023-01-11T23:30:56.1689109Z [ OK ] LazyOpsTest.TestMaxPool3DNonSquare (3 ms) 2023-01-11T23:30:56.1689505Z [ RUN ] LazyOpsTest.TestMaxPool2DNoBatch 2023-01-11T23:30:56.1760305Z [ OK ] LazyOpsTest.TestMaxPool2DNoBatch (7 ms) 2023-01-11T23:30:56.1760689Z [ RUN ] LazyOpsTest.TestMaxPool3DNoBatch 2023-01-11T23:30:56.1794379Z [ OK ] LazyOpsTest.TestMaxPool3DNoBatch (3 ms) 2023-01-11T23:30:56.1794749Z [ RUN ] LazyOpsTest.TestAvgPool1D 2023-01-11T23:30:56.1887047Z [ OK ] 
LazyOpsTest.TestAvgPool1D (9 ms) 2023-01-11T23:30:56.1887385Z [ RUN ] LazyOpsTest.TestAvgPool2D 2023-01-11T23:30:56.1954565Z [ OK ] LazyOpsTest.TestAvgPool2D (6 ms) 2023-01-11T23:30:56.1954939Z [ RUN ] LazyOpsTest.TestAvgPool2DNonSquare 2023-01-11T23:30:56.2023183Z [ OK ] LazyOpsTest.TestAvgPool2DNonSquare (6 ms) 2023-01-11T23:30:56.2023538Z [ RUN ] LazyOpsTest.TestAvgPool3D 2023-01-11T23:30:56.2052315Z [ OK ] LazyOpsTest.TestAvgPool3D (2 ms) 2023-01-11T23:30:56.2052829Z [ RUN ] LazyOpsTest.TestAvgPool3DIncompleteAttributes 2023-01-11T23:30:56.2081330Z [ OK ] LazyOpsTest.TestAvgPool3DIncompleteAttributes (2 ms) 2023-01-11T23:30:56.2081746Z [ RUN ] LazyOpsTest.TestAvgPool3DNonSquare 2023-01-11T23:30:56.2109996Z [ OK ] LazyOpsTest.TestAvgPool3DNonSquare (2 ms) 2023-01-11T23:30:56.2110420Z [ RUN ] LazyOpsTest.TestAvgPool2DNoBatch 2023-01-11T23:30:56.2179283Z [ OK ] LazyOpsTest.TestAvgPool2DNoBatch (6 ms) 2023-01-11T23:30:56.2179651Z [ RUN ] LazyOpsTest.TestAvgPool3DNoBatch 2023-01-11T23:30:56.2206561Z [ OK ] LazyOpsTest.TestAvgPool3DNoBatch (2 ms) 2023-01-11T23:30:56.2207033Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2D 2023-01-11T23:30:56.2214432Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2D (0 ms) 2023-01-11T23:30:56.2214997Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3D 2023-01-11T23:30:56.2225829Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool3D (1 ms) 2023-01-11T23:30:56.2226342Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatch 2023-01-11T23:30:56.2232555Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatch (0 ms) 2023-01-11T23:30:56.2233032Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatch 2023-01-11T23:30:56.2238767Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatch (0 ms) 2023-01-11T23:30:56.2239193Z [ RUN ] LazyOpsTest.TestMaxUnpool2D 2023-01-11T23:30:56.2277252Z [ OK ] LazyOpsTest.TestMaxUnpool2D (3 ms) 2023-01-11T23:30:56.2277617Z [ RUN ] LazyOpsTest.TestMaxUnpool3D 2023-01-11T23:30:56.2316136Z [ OK ] LazyOpsTest.TestMaxUnpool3D (3 ms) 2023-01-11T23:30:56.2316477Z [ RUN ] LazyOpsTest.TestNllLoss 2023-01-11T23:30:56.2316848Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:8173: Skipped 2023-01-11T23:30:56.2317065Z 2023-01-11T23:30:56.2317251Z [ SKIPPED ] LazyOpsTest.TestNllLoss (0 ms) 2023-01-11T23:30:56.2317587Z [ RUN ] LazyOpsTest.TestNllLoss2d 2023-01-11T23:30:56.2430164Z [ OK ] LazyOpsTest.TestNllLoss2d (11 ms) 2023-01-11T23:30:56.2430490Z [ RUN ] LazyOpsTest.TestSmoothL1Loss 2023-01-11T23:30:56.2454728Z [ OK ] LazyOpsTest.TestSmoothL1Loss (2 ms) 2023-01-11T23:30:56.2455059Z [ RUN ] LazyOpsTest.TestL1Loss 2023-01-11T23:30:56.2470119Z [ OK ] LazyOpsTest.TestL1Loss (1 ms) 2023-01-11T23:30:56.2470487Z [ RUN ] LazyOpsTest.TestL1LossBackward 2023-01-11T23:30:56.2523072Z [ OK ] LazyOpsTest.TestL1LossBackward (5 ms) 2023-01-11T23:30:56.2523434Z [ RUN ] LazyOpsTest.TestMseLoss 2023-01-11T23:30:56.2529714Z [ OK ] LazyOpsTest.TestMseLoss (0 ms) 2023-01-11T23:30:56.2530118Z [ RUN ] LazyOpsTest.TestMseLossBackward 2023-01-11T23:30:56.2568649Z [ OK ] LazyOpsTest.TestMseLossBackward (3 ms) 2023-01-11T23:30:56.2569011Z [ RUN ] LazyOpsTest.TestBatchNorm1D 2023-01-11T23:30:56.2594071Z [ OK ] LazyOpsTest.TestBatchNorm1D (2 ms) 2023-01-11T23:30:56.2594415Z [ RUN ] LazyOpsTest.TestBatchNorm2D 2023-01-11T23:30:56.2617682Z [ OK ] LazyOpsTest.TestBatchNorm2D (2 ms) 2023-01-11T23:30:56.2618001Z [ RUN ] LazyOpsTest.TestDim 2023-01-11T23:30:56.2618363Z [ OK ] LazyOpsTest.TestDim (0 ms) 2023-01-11T23:30:56.2618727Z [ RUN ] LazyOpsTest.TestContiguous 2023-01-11T23:30:56.2619258Z [ OK ] LazyOpsTest.TestContiguous (0 ms) 
2023-01-11T23:30:56.2619648Z [ RUN ] LazyOpsTest.TestSqueezeAll 2023-01-11T23:30:56.2623225Z [ OK ] LazyOpsTest.TestSqueezeAll (0 ms) 2023-01-11T23:30:56.2623582Z [ RUN ] LazyOpsTest.TestSqueezeAllInPlace 2023-01-11T23:30:56.2628005Z [ OK ] LazyOpsTest.TestSqueezeAllInPlace (0 ms) 2023-01-11T23:30:56.2628441Z [ RUN ] LazyOpsTest.TestSqueezeOne 2023-01-11T23:30:56.2656716Z [ OK ] LazyOpsTest.TestSqueezeOne (2 ms) 2023-01-11T23:30:56.2657080Z [ RUN ] LazyOpsTest.TestSqueezeOneInPlace 2023-01-11T23:30:56.2695067Z [ OK ] LazyOpsTest.TestSqueezeOneInPlace (3 ms) 2023-01-11T23:30:56.2695419Z [ RUN ] LazyOpsTest.TestUnsqueeze 2023-01-11T23:30:56.2716097Z [ OK ] LazyOpsTest.TestUnsqueeze (2 ms) 2023-01-11T23:30:56.2716498Z [ RUN ] LazyOpsTest.TestUnsqueezeInPlace 2023-01-11T23:30:56.2742924Z [ OK ] LazyOpsTest.TestUnsqueezeInPlace (2 ms) 2023-01-11T23:30:56.2743294Z [ RUN ] LazyOpsTest.TestMaskedFill 2023-01-11T23:30:56.2747214Z [ OK ] LazyOpsTest.TestMaskedFill (0 ms) 2023-01-11T23:30:56.2747616Z [ RUN ] LazyOpsTest.TestMaskedFillInPlace 2023-01-11T23:30:56.2753571Z [ OK ] LazyOpsTest.TestMaskedFillInPlace (0 ms) 2023-01-11T23:30:56.2753993Z [ RUN ] LazyOpsTest.TestMaskedFillBroadcast 2023-01-11T23:30:56.2757261Z [ OK ] LazyOpsTest.TestMaskedFillBroadcast (0 ms) 2023-01-11T23:30:56.2757647Z [ RUN ] LazyOpsTest.TestFill 2023-01-11T23:30:56.2762778Z [ OK ] LazyOpsTest.TestFill (0 ms) 2023-01-11T23:30:56.2763164Z [ RUN ] LazyOpsTest.TestFillWithRank0 2023-01-11T23:30:56.2766118Z [ OK ] LazyOpsTest.TestFillWithRank0 (0 ms) 2023-01-11T23:30:56.2766490Z [ RUN ] LazyOpsTest.TestPermute 2023-01-11T23:30:56.2810846Z [ OK ] LazyOpsTest.TestPermute (4 ms) 2023-01-11T23:30:56.2811217Z [ RUN ] LazyOpsTest.TestPermuteMod 2023-01-11T23:30:56.2971169Z [ OK ] LazyOpsTest.TestPermuteMod (16 ms) 2023-01-11T23:30:56.2971536Z [ RUN ] LazyOpsTest.TestFlip 2023-01-11T23:30:56.3020939Z [ OK ] LazyOpsTest.TestFlip (4 ms) 2023-01-11T23:30:56.3021313Z [ RUN ] LazyOpsTest.TestPixelShuffle 2023-01-11T23:30:56.3027301Z [ OK ] LazyOpsTest.TestPixelShuffle (0 ms) 2023-01-11T23:30:56.3027678Z [ RUN ] LazyOpsTest.TestSumToSize 2023-01-11T23:30:56.3032744Z [ OK ] LazyOpsTest.TestSumToSize (0 ms) 2023-01-11T23:30:56.3033137Z [ RUN ] LazyOpsTest.TestTransposeDims 2023-01-11T23:30:56.3036200Z [ OK ] LazyOpsTest.TestTransposeDims (0 ms) 2023-01-11T23:30:56.3036552Z [ RUN ] LazyOpsTest.TestTransposeDimsMod 2023-01-11T23:30:56.3049534Z [ OK ] LazyOpsTest.TestTransposeDimsMod (1 ms) 2023-01-11T23:30:56.3049901Z [ RUN ] LazyOpsTest.TestTransposeDimsInPlace 2023-01-11T23:30:56.3055089Z [ OK ] LazyOpsTest.TestTransposeDimsInPlace (0 ms) 2023-01-11T23:30:56.3055435Z [ RUN ] LazyOpsTest.TestSplit 2023-01-11T23:30:56.3106791Z [ OK ] LazyOpsTest.TestSplit (5 ms) 2023-01-11T23:30:56.3107118Z [ RUN ] LazyOpsTest.TestSplitEmpty 2023-01-11T23:30:56.3107743Z [ OK ] LazyOpsTest.TestSplitEmpty (0 ms) 2023-01-11T23:30:56.3108117Z [ RUN ] LazyOpsTest.TestSplitWithSizes 2023-01-11T23:30:56.3131922Z [ OK ] LazyOpsTest.TestSplitWithSizes (2 ms) 2023-01-11T23:30:56.3132273Z [ RUN ] LazyOpsTest.TestCrossImplicitDim 2023-01-11T23:30:56.3139232Z [ OK ] LazyOpsTest.TestCrossImplicitDim (0 ms) 2023-01-11T23:30:56.3139589Z [ RUN ] LazyOpsTest.TestCrossExplicitDim 2023-01-11T23:30:56.3147786Z [ OK ] LazyOpsTest.TestCrossExplicitDim (0 ms) 2023-01-11T23:30:56.3148195Z [ RUN ] LazyOpsTest.TestCrossZeroDim 2023-01-11T23:30:56.3148618Z [ OK ] LazyOpsTest.TestCrossZeroDim (0 ms) 2023-01-11T23:30:56.3148940Z [ RUN ] LazyOpsTest.TestTriu 2023-01-11T23:30:56.3186659Z [ OK 
] LazyOpsTest.TestTriu (3 ms) 2023-01-11T23:30:56.3186987Z [ RUN ] LazyOpsTest.TestTriuNonSquare 2023-01-11T23:30:56.3223573Z [ OK ] LazyOpsTest.TestTriuNonSquare (3 ms) 2023-01-11T23:30:56.3223921Z [ RUN ] LazyOpsTest.TestTriuBatch 2023-01-11T23:30:56.3261076Z [ OK ] LazyOpsTest.TestTriuBatch (3 ms) 2023-01-11T23:30:56.3261389Z [ RUN ] LazyOpsTest.TestTril 2023-01-11T23:30:56.3298849Z [ OK ] LazyOpsTest.TestTril (3 ms) 2023-01-11T23:30:56.3299171Z [ RUN ] LazyOpsTest.TestTrilNonSquare 2023-01-11T23:30:56.3336000Z [ OK ] LazyOpsTest.TestTrilNonSquare (3 ms) 2023-01-11T23:30:56.3336334Z [ RUN ] LazyOpsTest.TestTrilBatch 2023-01-11T23:30:56.3376166Z [ OK ] LazyOpsTest.TestTrilBatch (3 ms) 2023-01-11T23:30:56.3376487Z [ RUN ] LazyOpsTest.TestTriuInPlace 2023-01-11T23:30:56.3427835Z [ OK ] LazyOpsTest.TestTriuInPlace (5 ms) 2023-01-11T23:30:56.3428169Z [ RUN ] LazyOpsTest.TestTrilInPlace 2023-01-11T23:30:56.3479820Z [ OK ] LazyOpsTest.TestTrilInPlace (5 ms) 2023-01-11T23:30:56.3480211Z [ RUN ] LazyOpsTest.TestTrace 2023-01-11T23:30:56.3483556Z [ OK ] LazyOpsTest.TestTrace (0 ms) 2023-01-11T23:30:56.3483875Z [ RUN ] LazyOpsTest.TestTraceWide 2023-01-11T23:30:56.3487072Z [ OK ] LazyOpsTest.TestTraceWide (0 ms) 2023-01-11T23:30:56.3487408Z [ RUN ] LazyOpsTest.TestTraceNarrow 2023-01-11T23:30:56.3490757Z [ OK ] LazyOpsTest.TestTraceNarrow (0 ms) 2023-01-11T23:30:56.3491086Z [ RUN ] LazyOpsTest.TestDiagRank1 2023-01-11T23:30:56.3735005Z [ OK ] LazyOpsTest.TestDiagRank1 (24 ms) 2023-01-11T23:30:56.3735360Z [ RUN ] LazyOpsTest.TestDiagRank2 2023-01-11T23:30:56.3789659Z [ OK ] LazyOpsTest.TestDiagRank2 (5 ms) 2023-01-11T23:30:56.3790080Z [ RUN ] LazyOpsTest.TestDiagFlat 2023-01-11T23:30:56.4223054Z [ OK ] LazyOpsTest.TestDiagFlat (43 ms) 2023-01-11T23:30:56.4223478Z [ RUN ] LazyOpsTest.TestDiagonal 2023-01-11T23:30:56.4261921Z [ OK ] LazyOpsTest.TestDiagonal (4 ms) 2023-01-11T23:30:56.4262298Z [ RUN ] LazyOpsTest.TestDiagonalUpdate 2023-01-11T23:30:56.4388622Z [ OK ] LazyOpsTest.TestDiagonalUpdate (12 ms) 2023-01-11T23:30:56.4389006Z [ RUN ] LazyOpsTest.TestDiagonalNonSquare 2023-01-11T23:30:56.4428705Z [ OK ] LazyOpsTest.TestDiagonalNonSquare (4 ms) 2023-01-11T23:30:56.4429106Z [ RUN ] LazyOpsTest.TestDiagonalBatch 2023-01-11T23:30:56.4469435Z [ OK ] LazyOpsTest.TestDiagonalBatch (4 ms) 2023-01-11T23:30:56.4469767Z [ RUN ] LazyOpsTest.TestFlatten 2023-01-11T23:30:56.4549047Z [ OK ] LazyOpsTest.TestFlatten (7 ms) 2023-01-11T23:30:56.4549378Z [ RUN ] LazyOpsTest.TestLogicalAnd 2023-01-11T23:30:56.4593982Z [ OK ] LazyOpsTest.TestLogicalAnd (4 ms) 2023-01-11T23:30:56.4594379Z [ RUN ] LazyOpsTest.TestBitwiseAnd 2023-01-11T23:30:56.4596564Z [ OK ] LazyOpsTest.TestBitwiseAnd (0 ms) 2023-01-11T23:30:56.4596976Z [ RUN ] LazyOpsTest.TestBitwiseAndInPlace 2023-01-11T23:30:56.4601189Z [ OK ] LazyOpsTest.TestBitwiseAndInPlace (0 ms) 2023-01-11T23:30:56.4601599Z [ RUN ] LazyOpsTest.TestBitwiseAndScalar 2023-01-11T23:30:56.4604582Z [ OK ] LazyOpsTest.TestBitwiseAndScalar (0 ms) 2023-01-11T23:30:56.4605039Z [ RUN ] LazyOpsTest.TestBitwiseAndScalarInPlace 2023-01-11T23:30:56.4609226Z [ OK ] LazyOpsTest.TestBitwiseAndScalarInPlace (0 ms) 2023-01-11T23:30:56.4609671Z [ RUN ] LazyOpsTest.TestBitwiseAndPromotion 2023-01-11T23:30:56.4615412Z [ OK ] LazyOpsTest.TestBitwiseAndPromotion (0 ms) 2023-01-11T23:30:56.4615759Z [ RUN ] LazyOpsTest.TestBitwiseOr 2023-01-11T23:30:56.4618822Z [ OK ] LazyOpsTest.TestBitwiseOr (0 ms) 2023-01-11T23:30:56.4619539Z [ RUN ] LazyOpsTest.TestBitwiseOrInPlace 2023-01-11T23:30:56.4623397Z [ OK 
] LazyOpsTest.TestBitwiseOrInPlace (0 ms) 2023-01-11T23:30:56.4623862Z [ RUN ] LazyOpsTest.TestBitwiseOrScalar 2023-01-11T23:30:56.4626759Z [ OK ] LazyOpsTest.TestBitwiseOrScalar (0 ms) 2023-01-11T23:30:56.4627222Z [ RUN ] LazyOpsTest.TestBitwiseOrScalarInPlace 2023-01-11T23:30:56.4632562Z [ OK ] LazyOpsTest.TestBitwiseOrScalarInPlace (0 ms) 2023-01-11T23:30:56.4633059Z [ RUN ] LazyOpsTest.TestBitwiseXor 2023-01-11T23:30:56.4634073Z [ OK ] LazyOpsTest.TestBitwiseXor (0 ms) 2023-01-11T23:30:56.4634500Z [ RUN ] LazyOpsTest.TestBitwiseXorInPlace 2023-01-11T23:30:56.4635894Z [ OK ] LazyOpsTest.TestBitwiseXorInPlace (0 ms) 2023-01-11T23:30:56.4636351Z [ RUN ] LazyOpsTest.TestBitwiseXorScalar 2023-01-11T23:30:56.4637013Z [ OK ] LazyOpsTest.TestBitwiseXorScalar (0 ms) 2023-01-11T23:30:56.4637604Z [ RUN ] LazyOpsTest.TestBitwiseXorScalarInPlace 2023-01-11T23:30:56.4638882Z [ OK ] LazyOpsTest.TestBitwiseXorScalarInPlace (0 ms) 2023-01-11T23:30:56.4639238Z [ RUN ] LazyOpsTest.TestLshift 2023-01-11T23:30:56.4641565Z [ OK ] LazyOpsTest.TestLshift (0 ms) 2023-01-11T23:30:56.4641961Z [ RUN ] LazyOpsTest.TestLshiftInPlace 2023-01-11T23:30:56.4645567Z [ OK ] LazyOpsTest.TestLshiftInPlace (0 ms) 2023-01-11T23:30:56.4645965Z [ RUN ] LazyOpsTest.TestLshiftScalar 2023-01-11T23:30:56.4647376Z [ OK ] LazyOpsTest.TestLshiftScalar (0 ms) 2023-01-11T23:30:56.4647763Z [ RUN ] LazyOpsTest.TestLshiftScalarInPlace 2023-01-11T23:30:56.4650463Z [ OK ] LazyOpsTest.TestLshiftScalarInPlace (0 ms) 2023-01-11T23:30:56.4650807Z [ RUN ] LazyOpsTest.TestRshift 2023-01-11T23:30:56.4652735Z [ OK ] LazyOpsTest.TestRshift (0 ms) 2023-01-11T23:30:56.4653079Z [ RUN ] LazyOpsTest.TestRshiftInPlace 2023-01-11T23:30:56.4656563Z [ OK ] LazyOpsTest.TestRshiftInPlace (0 ms) 2023-01-11T23:30:56.4656905Z [ RUN ] LazyOpsTest.TestRshiftScalar 2023-01-11T23:30:56.4658599Z [ OK ] LazyOpsTest.TestRshiftScalar (0 ms) 2023-01-11T23:30:56.4658962Z [ RUN ] LazyOpsTest.TestRshiftScalarInPlace 2023-01-11T23:30:56.4661745Z [ OK ] LazyOpsTest.TestRshiftScalarInPlace (0 ms) 2023-01-11T23:30:56.4662088Z [ RUN ] LazyOpsTest.TestMeshgrid 2023-01-11T23:30:56.4662461Z [W TensorShape.cpp:3452] Warning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. 
(function operator()) 2023-01-11T23:30:56.4676421Z [ OK ] LazyOpsTest.TestMeshgrid (1 ms) 2023-01-11T23:30:56.4676757Z [ RUN ] LazyOpsTest.TestConstantPad 2023-01-11T23:30:56.4681686Z [ OK ] LazyOpsTest.TestConstantPad (0 ms) 2023-01-11T23:30:56.4682045Z [ RUN ] LazyOpsTest.TestConstantPadIncomplete 2023-01-11T23:30:56.4686156Z [ OK ] LazyOpsTest.TestConstantPadIncomplete (0 ms) 2023-01-11T23:30:56.4686590Z [ RUN ] LazyOpsTest.TestReflectionPad2dRank3 2023-01-11T23:30:56.4688308Z [ OK ] LazyOpsTest.TestReflectionPad2dRank3 (0 ms) 2023-01-11T23:30:56.4688688Z [ RUN ] LazyOpsTest.TestReflectionPad2dRank4 2023-01-11T23:30:56.4690687Z [ OK ] LazyOpsTest.TestReflectionPad2dRank4 (0 ms) 2023-01-11T23:30:56.4691084Z [ RUN ] LazyOpsTest.TestReflectionPad2dBackward 2023-01-11T23:30:56.4704576Z [ OK ] LazyOpsTest.TestReflectionPad2dBackward (1 ms) 2023-01-11T23:30:56.4704951Z [ RUN ] LazyOpsTest.TestReplicationPad1d 2023-01-11T23:30:56.4706982Z [ OK ] LazyOpsTest.TestReplicationPad1d (0 ms) 2023-01-11T23:30:56.4707548Z [ RUN ] LazyOpsTest.TestReplicationPad1dZeroPad 2023-01-11T23:30:56.4709519Z [ OK ] LazyOpsTest.TestReplicationPad1dZeroPad (0 ms) 2023-01-11T23:30:56.4710009Z [ RUN ] LazyOpsTest.TestReplicationPad1dBackward 2023-01-11T23:30:56.4720296Z [ OK ] LazyOpsTest.TestReplicationPad1dBackward (1 ms) 2023-01-11T23:30:56.4720741Z [ RUN ] LazyOpsTest.TestReplicationPad2d 2023-01-11T23:30:56.4722484Z [ OK ] LazyOpsTest.TestReplicationPad2d (0 ms) 2023-01-11T23:30:56.4723048Z [ RUN ] LazyOpsTest.TestReplicationPad2dZeroPad 2023-01-11T23:30:56.4724698Z [ OK ] LazyOpsTest.TestReplicationPad2dZeroPad (0 ms) 2023-01-11T23:30:56.4725155Z [ RUN ] LazyOpsTest.TestReplicationPad2dBackward 2023-01-11T23:30:56.4736825Z [ OK ] LazyOpsTest.TestReplicationPad2dBackward (1 ms) 2023-01-11T23:30:56.4737192Z [ RUN ] LazyOpsTest.TestAsStrided 2023-01-11T23:30:56.4745494Z [ OK ] LazyOpsTest.TestAsStrided (0 ms) 2023-01-11T23:30:56.4746007Z [ RUN ] LazyOpsTest.TestAsStridedInPlace 2023-01-11T23:30:56.4775218Z [ OK ] LazyOpsTest.TestAsStridedInPlace (1 ms) 2023-01-11T23:30:56.4775811Z [ RUN ] LazyOpsTest.TestAsStridedWithOffset 2023-01-11T23:30:56.4776230Z [ OK ] LazyOpsTest.TestAsStridedWithOffset (0 ms) 2023-01-11T23:30:56.4776640Z [ RUN ] LazyOpsTest.TestAsStridedWithInplaceCopy 2023-01-11T23:30:56.4777081Z [ OK ] LazyOpsTest.TestAsStridedWithInplaceCopy (0 ms) 2023-01-11T23:30:56.4777480Z [ RUN ] LazyOpsTest.TestEmptyStrided 2023-01-11T23:30:56.4777846Z [ OK ] LazyOpsTest.TestEmptyStrided (0 ms) 2023-01-11T23:30:56.4778235Z [ RUN ] LazyOpsTest.TestAvgPool2DBackward 2023-01-11T23:30:56.5001793Z [ OK ] LazyOpsTest.TestAvgPool2DBackward (22 ms) 2023-01-11T23:30:56.5002167Z [ RUN ] LazyOpsTest.TestAvgPool3DBackward 2023-01-11T23:30:56.5154783Z [ OK ] LazyOpsTest.TestAvgPool3DBackward (15 ms) 2023-01-11T23:30:56.5155200Z [ RUN ] LazyOpsTest.TestAvgPool2DNoBatchBackward 2023-01-11T23:30:56.5383169Z [ OK ] LazyOpsTest.TestAvgPool2DNoBatchBackward (22 ms) 2023-01-11T23:30:56.5383605Z [ RUN ] LazyOpsTest.TestAvgPool3DNoBatchBackward 2023-01-11T23:30:56.5538040Z [ OK ] LazyOpsTest.TestAvgPool3DNoBatchBackward (15 ms) 2023-01-11T23:30:56.5538533Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward 2023-01-11T23:30:56.5538984Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10048: Skipped 2023-01-11T23:30:56.5539226Z 2023-01-11T23:30:56.5539489Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward (0 ms) 2023-01-11T23:30:56.5539926Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool3DBackward 
2023-01-11T23:30:56.5540254Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10071: Skipped 2023-01-11T23:30:56.5540430Z 2023-01-11T23:30:56.5540660Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DBackward (0 ms) 2023-01-11T23:30:56.5541063Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DBackward 2023-01-11T23:30:56.5569482Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DBackward (3 ms) 2023-01-11T23:30:56.5569903Z [ RUN ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatchBackward 2023-01-11T23:30:56.5596859Z [ OK ] LazyOpsTest.TestAdaptiveAvgPool2DNoBatchBackward (2 ms) 2023-01-11T23:30:56.5597237Z [ RUN ] LazyOpsTest.TestConv2D 2023-01-11T23:30:57.6765825Z [ OK ] LazyOpsTest.TestConv2D (1116 ms) 2023-01-11T23:30:57.6766278Z [ RUN ] LazyOpsTest.TestConv2DBackward 2023-01-11T23:30:58.2265676Z [ OK ] LazyOpsTest.TestConv2DBackward (549 ms) 2023-01-11T23:30:58.2266471Z [ RUN ] LazyOpsTest.TestTransposedConv2DBackward 2023-01-11T23:30:58.5929696Z [ OK ] LazyOpsTest.TestTransposedConv2DBackward (366 ms) 2023-01-11T23:30:58.5930194Z [ RUN ] LazyOpsTest.TestConv3DBackward 2023-01-11T23:30:58.8327166Z [ OK ] LazyOpsTest.TestConv3DBackward (239 ms) 2023-01-11T23:30:58.8327561Z [ RUN ] LazyOpsTest.TestTransposedConv3DBackward 2023-01-11T23:30:59.2616517Z [ OK ] LazyOpsTest.TestTransposedConv3DBackward (428 ms) 2023-01-11T23:30:59.2616967Z [ RUN ] LazyOpsTest.TestMaxPool2DBackward 2023-01-11T23:30:59.2746560Z [ OK ] LazyOpsTest.TestMaxPool2DBackward (13 ms) 2023-01-11T23:30:59.2746941Z [ RUN ] LazyOpsTest.TestMaxPool3DBackward 2023-01-11T23:30:59.2836087Z [ OK ] LazyOpsTest.TestMaxPool3DBackward (8 ms) 2023-01-11T23:30:59.2836473Z [ RUN ] LazyOpsTest.TestMaxPool2DNoBatchBackward 2023-01-11T23:30:59.2965010Z [ OK ] LazyOpsTest.TestMaxPool2DNoBatchBackward (12 ms) 2023-01-11T23:30:59.2965475Z [ RUN ] LazyOpsTest.TestMaxPool3DNoBatchBackward 2023-01-11T23:30:59.3052022Z [ OK ] LazyOpsTest.TestMaxPool3DNoBatchBackward (8 ms) 2023-01-11T23:30:59.3052414Z [ RUN ] LazyOpsTest.TestMaxUnpool2DBackward 2023-01-11T23:30:59.3312798Z [ OK ] LazyOpsTest.TestMaxUnpool2DBackward (25 ms) 2023-01-11T23:30:59.3313193Z [ RUN ] LazyOpsTest.TestMaxUnpool3DBackward 2023-01-11T23:30:59.3536531Z [ OK ] LazyOpsTest.TestMaxUnpool3DBackward (22 ms) 2023-01-11T23:30:59.3536924Z [ RUN ] LazyOpsTest.TestTanhBackward 2023-01-11T23:30:59.3550063Z [ OK ] LazyOpsTest.TestTanhBackward (1 ms) 2023-01-11T23:30:59.3550466Z [ RUN ] LazyOpsTest.TestSigmoidBackward 2023-01-11T23:30:59.3563625Z [ OK ] LazyOpsTest.TestSigmoidBackward (1 ms) 2023-01-11T23:30:59.3564092Z [ RUN ] LazyOpsTest.TestLogSigmoidBackward 2023-01-11T23:30:59.3577295Z [ OK ] LazyOpsTest.TestLogSigmoidBackward (1 ms) 2023-01-11T23:30:59.3577716Z [ RUN ] LazyOpsTest.TestLogSoftmaxBackward 2023-01-11T23:30:59.3683535Z [ OK ] LazyOpsTest.TestLogSoftmaxBackward (10 ms) 2023-01-11T23:30:59.3683949Z [ RUN ] LazyOpsTest.TestSoftmaxBackward 2023-01-11T23:30:59.3791390Z [ OK ] LazyOpsTest.TestSoftmaxBackward (10 ms) 2023-01-11T23:30:59.3792077Z [ RUN ] LazyOpsTest.TestSoftplusBackward 2023-01-11T23:30:59.3804632Z [ OK ] LazyOpsTest.TestSoftplusBackward (1 ms) 2023-01-11T23:30:59.3804996Z [ RUN ] LazyOpsTest.TestReluBackward 2023-01-11T23:30:59.3818233Z [ OK ] LazyOpsTest.TestReluBackward (1 ms) 2023-01-11T23:30:59.3818583Z [ RUN ] LazyOpsTest.TestRreluBackward 2023-01-11T23:30:59.3832728Z [ OK ] LazyOpsTest.TestRreluBackward (1 ms) 2023-01-11T23:30:59.3833102Z [ RUN ] LazyOpsTest.TestHardshrinkBackward 2023-01-11T23:30:59.3844436Z [ OK ] LazyOpsTest.TestHardshrinkBackward (1 ms) 
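The TestMeshgrid entry earlier in this block logs that torch.meshgrid will soon require an explicit indexing argument. A minimal sketch of the forward-compatible call (sizes invented for illustration):

    import torch

    x = torch.arange(3)
    y = torch.arange(4)

    # Calling meshgrid without indexing emits the TensorShape.cpp warning;
    # the implicit default matches "ij".
    gx_old, gy_old = torch.meshgrid(x, y)

    # Forward-compatible: pass indexing explicitly.
    gx, gy = torch.meshgrid(x, y, indexing="ij")

    assert torch.equal(gx, gx_old) and torch.equal(gy, gy_old)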
2023-01-11T23:30:59.3844822Z [ RUN ] LazyOpsTest.TestSoftshrinkBackward 2023-01-11T23:30:59.3857044Z [ OK ] LazyOpsTest.TestSoftshrinkBackward (1 ms) 2023-01-11T23:30:59.3857419Z [ RUN ] LazyOpsTest.TestHardtanhBackward 2023-01-11T23:30:59.3865668Z [ OK ] LazyOpsTest.TestHardtanhBackward (0 ms) 2023-01-11T23:30:59.3866011Z [ RUN ] LazyOpsTest.TestEluBackward 2023-01-11T23:30:59.3880322Z [ OK ] LazyOpsTest.TestEluBackward (1 ms) 2023-01-11T23:30:59.3880661Z [ RUN ] LazyOpsTest.TestGeluBackward 2023-01-11T23:30:59.3893742Z [ OK ] LazyOpsTest.TestGeluBackward (1 ms) 2023-01-11T23:30:59.3894093Z [ RUN ] LazyOpsTest.TestLeakyReluBackward 2023-01-11T23:30:59.3907597Z [ OK ] LazyOpsTest.TestLeakyReluBackward (1 ms) 2023-01-11T23:30:59.3907970Z [ RUN ] LazyOpsTest.TestTransposeBackward 2023-01-11T23:30:59.3920333Z [ OK ] LazyOpsTest.TestTransposeBackward (1 ms) 2023-01-11T23:30:59.3920707Z [ RUN ] LazyOpsTest.TestAddMatMulBackward 2023-01-11T23:30:59.3994469Z [ OK ] LazyOpsTest.TestAddMatMulBackward (7 ms) 2023-01-11T23:30:59.3994869Z [ RUN ] LazyOpsTest.TestBinaryCrossEntropyBackward 2023-01-11T23:30:59.4074640Z [ OK ] LazyOpsTest.TestBinaryCrossEntropyBackward (8 ms) 2023-01-11T23:30:59.4075083Z [ RUN ] LazyOpsTest.TestNllLossBackward 2023-01-11T23:30:59.4075498Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:10954: Skipped 2023-01-11T23:30:59.4075738Z 2023-01-11T23:30:59.4075948Z [ SKIPPED ] LazyOpsTest.TestNllLossBackward (0 ms) 2023-01-11T23:30:59.4076294Z [ RUN ] LazyOpsTest.TestNllLoss2dBackward 2023-01-11T23:30:59.4478136Z [ OK ] LazyOpsTest.TestNllLoss2dBackward (40 ms) 2023-01-11T23:30:59.4478662Z [ RUN ] LazyOpsTest.TestSmoothL1LossBackward 2023-01-11T23:30:59.4565467Z [ OK ] LazyOpsTest.TestSmoothL1LossBackward (8 ms) 2023-01-11T23:30:59.4566057Z [ RUN ] LazyOpsTest.TestViewBackward 2023-01-11T23:30:59.4584397Z [ OK ] LazyOpsTest.TestViewBackward (1 ms) 2023-01-11T23:30:59.4584764Z [ RUN ] LazyOpsTest.TestBatchNorm2DBackward 2023-01-11T23:30:59.4651264Z [ OK ] LazyOpsTest.TestBatchNorm2DBackward (6 ms) 2023-01-11T23:30:59.4651641Z [ RUN ] LazyOpsTest.TestBatchNorm3DBackward 2023-01-11T23:30:59.4717170Z [ OK ] LazyOpsTest.TestBatchNorm3DBackward (6 ms) 2023-01-11T23:30:59.4717558Z [ RUN ] LazyOpsTest.TestBCEWithLogitsBackward 2023-01-11T23:30:59.5181059Z [ OK ] LazyOpsTest.TestBCEWithLogitsBackward (46 ms) 2023-01-11T23:30:59.5181432Z [ RUN ] LazyOpsTest.TestKlDivBackward 2023-01-11T23:30:59.5291887Z [ OK ] LazyOpsTest.TestKlDivBackward (11 ms) 2023-01-11T23:30:59.5292264Z [ RUN ] LazyOpsTest.TestEmbeddingBackward 2023-01-11T23:30:59.6342171Z [ OK ] LazyOpsTest.TestEmbeddingBackward (104 ms) 2023-01-11T23:30:59.6343189Z [ RUN ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale 2023-01-11T23:30:59.6343905Z /var/lib/jenkins/workspace/test/cpp/lazy/test_lazy_ops.cpp:11331: Skipped 2023-01-11T23:30:59.6344229Z 2023-01-11T23:30:59.6344526Z [ SKIPPED ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale (0 ms) 2023-01-11T23:30:59.6344940Z [ RUN ] LazyOpsTest.TestAmpUpdateScale 2023-01-11T23:30:59.6358951Z [ OK ] LazyOpsTest.TestAmpUpdateScale (1 ms) 2023-01-11T23:30:59.6359339Z [ RUN ] LazyOpsTest.TestEarlySyncLiveTensors 2023-01-11T23:30:59.6359738Z [ OK ] LazyOpsTest.TestEarlySyncLiveTensors (0 ms) 2023-01-11T23:30:59.6360096Z [ RUN ] LazyOpsTest.TestLerp 2023-01-11T23:30:59.6363294Z [ OK ] LazyOpsTest.TestLerp (0 ms) 2023-01-11T23:30:59.6363675Z [ RUN ] LazyOpsTest.TestLerpScalar 2023-01-11T23:30:59.6365795Z [ OK ] LazyOpsTest.TestLerpScalar (0 ms) 2023-01-11T23:30:59.6366172Z 
[ RUN ] LazyOpsTest.TestLerpInplace 2023-01-11T23:30:59.6368626Z [ OK ] LazyOpsTest.TestLerpInplace (0 ms) 2023-01-11T23:30:59.6369047Z [ RUN ] LazyOpsTest.TestLerpScalarInplace 2023-01-11T23:30:59.6371770Z [ OK ] LazyOpsTest.TestLerpScalarInplace (0 ms) 2023-01-11T23:30:59.6372216Z [ RUN ] LazyOpsTest.TestLerpOut 2023-01-11T23:30:59.6375052Z [ OK ] LazyOpsTest.TestLerpOut (0 ms) 2023-01-11T23:30:59.6375454Z [ RUN ] LazyOpsTest.TestLerpScalarOut 2023-01-11T23:30:59.6378084Z [ OK ] LazyOpsTest.TestLerpScalarOut (0 ms) 2023-01-11T23:30:59.6378563Z [ RUN ] LazyOpsTest.IsAliasOf 2023-01-11T23:30:59.6378975Z [ OK ] LazyOpsTest.IsAliasOf (0 ms) 2023-01-11T23:30:59.6379513Z [----------] 574 tests from LazyOpsTest (9142 ms total) 2023-01-11T23:30:59.6379690Z 2023-01-11T23:30:59.6379867Z [----------] Global test environment tear-down 2023-01-11T23:30:59.6456031Z [==========] 611 tests from 10 test suites ran. (9151 ms total) 2023-01-11T23:30:59.6456833Z [ PASSED ] 594 tests. 2023-01-11T23:30:59.6457569Z [ SKIPPED ] 17 tests, listed below: 2023-01-11T23:30:59.6458339Z [ SKIPPED ] LazyOpsTest.TestInverse 2023-01-11T23:30:59.6459074Z [ SKIPPED ] LazyOpsTest.TestOneIndexPut 2023-01-11T23:30:59.6459751Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutInPlace 2023-01-11T23:30:59.6460490Z [ SKIPPED ] LazyOpsTest.TestOneIndexPutTransfer 2023-01-11T23:30:59.6461177Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPut 2023-01-11T23:30:59.6461885Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutHeadNull 2023-01-11T23:30:59.6462626Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleNull 2023-01-11T23:30:59.6463391Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailNull 2023-01-11T23:30:59.6464134Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutMiddleBroadcast 2023-01-11T23:30:59.6464555Z [ SKIPPED ] LazyOpsTest.TestMultiIndexPutTailBroadcast 2023-01-11T23:30:59.6464914Z [ SKIPPED ] LazyOpsTest.TestIndexPutImpl 2023-01-11T23:30:59.6465255Z [ SKIPPED ] LazyOpsTest.TestIndexCopyInPlace 2023-01-11T23:30:59.6465567Z [ SKIPPED ] LazyOpsTest.TestNllLoss 2023-01-11T23:30:59.6465945Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DNoBatchBackward 2023-01-11T23:30:59.6466363Z [ SKIPPED ] LazyOpsTest.TestAdaptiveAvgPool3DBackward 2023-01-11T23:30:59.6466727Z [ SKIPPED ] LazyOpsTest.TestNllLossBackward 2023-01-11T23:30:59.6467119Z [ SKIPPED ] LazyOpsTest.TestAmpForeachNonFiniteCheckAndUnscale 2023-01-11T23:31:00.2592945Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *-tsan* ]] 2023-01-11T23:31:00.2593720Z + python test/cpp/jit/tests_setup.py shutdown 2023-01-11T23:31:01.6112935Z + wait 2023-01-11T23:31:01.6113191Z + OMP_NUM_THREADS=2 2023-01-11T23:31:01.6113443Z + TORCH_CPP_TEST_MNIST_PATH=test/cpp/api/mnist 2023-01-11T23:31:01.6114134Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_api '--gtest_filter=-IMethodTest.*' --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_api.xml 2023-01-11T23:31:01.9924415Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:31:01.9930805Z Note: Google Test filter = -IMethodTest.*:*_MultiCUDA 2023-01-11T23:31:01.9931241Z [==========] Running 1035 tests from 49 test suites. 2023-01-11T23:31:01.9931557Z [----------] Global test environment set-up. 
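The test_api binary above runs with the negative filter -IMethodTest.*, and because only one CUDA device is visible, gtest additionally excludes *_MultiCUDA. A rough Python analogue of that device-count gating, offered only to illustrate the check involved (not how the C++ harness is implemented):

    import torch

    # With fewer than two visible GPUs, multi-device cases are skipped
    # (the harness above achieves this by appending :*_MultiCUDA to the
    # negative --gtest_filter).
    if torch.cuda.device_count() < 2:
        print("Only one CUDA device detected. Disabling MultiCUDA tests")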
2023-01-11T23:31:01.9931880Z [----------] 9 tests from AutogradAPITests 2023-01-11T23:31:01.9932219Z [ RUN ] AutogradAPITests.BackwardSimpleTest 2023-01-11T23:31:01.9947043Z [ OK ] AutogradAPITests.BackwardSimpleTest (1 ms) 2023-01-11T23:31:01.9947421Z [ RUN ] AutogradAPITests.BackwardTest 2023-01-11T23:31:01.9948056Z [W engine.cpp:1134] Warning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (function operator()) 2023-01-11T23:31:01.9952414Z [ OK ] AutogradAPITests.BackwardTest (0 ms) 2023-01-11T23:31:01.9952786Z [ RUN ] AutogradAPITests.GradSimpleTest 2023-01-11T23:31:01.9953239Z [ OK ] AutogradAPITests.GradSimpleTest (0 ms) 2023-01-11T23:31:01.9953581Z [ RUN ] AutogradAPITests.GradTest 2023-01-11T23:31:01.9956943Z [ OK ] AutogradAPITests.GradTest (0 ms) 2023-01-11T23:31:01.9957413Z [ RUN ] AutogradAPITests.GradNonLeafTest 2023-01-11T23:31:01.9959522Z [W TensorBody.h:485] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. (function grad) [two more identical TensorBody.h:485 warnings at 23:31:01.9960932Z and 23:31:01.9962328Z omitted] 2023-01-11T23:31:01.9963829Z [W TensorBody.h:485] Warning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the .grad field to be populated for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
(function grad) 2023-01-11T23:31:01.9965206Z [ OK ] AutogradAPITests.GradNonLeafTest (0 ms) 2023-01-11T23:31:01.9965580Z [ RUN ] AutogradAPITests.GradUnreachableTest 2023-01-11T23:31:01.9993561Z [ OK ] AutogradAPITests.GradUnreachableTest (2 ms) 2023-01-11T23:31:01.9993941Z [ RUN ] AutogradAPITests.EmptyInput 2023-01-11T23:31:02.0021880Z [ OK ] AutogradAPITests.EmptyInput (2 ms) 2023-01-11T23:31:02.0022737Z [ RUN ] AutogradAPITests.RetainGrad 2023-01-11T23:31:02.0024202Z [ OK ] AutogradAPITests.RetainGrad (0 ms) 2023-01-11T23:31:02.0024847Z [ RUN ] AutogradAPITests.AnomalyMode 2023-01-11T23:31:02.0025364Z [W anomaly_mode.cpp:27] Warning: This mode should be enabled only for debugging as the different tests will slow down your program execution. (function operator()) 2023-01-11T23:31:02.1628487Z [ OK ] AutogradAPITests.AnomalyMode (160 ms) 2023-01-11T23:31:02.1629022Z [----------] 9 tests from AutogradAPITests (169 ms total) 2023-01-11T23:31:02.1629217Z 2023-01-11T23:31:02.1629395Z [----------] 33 tests from CustomAutogradTest 2023-01-11T23:31:02.1629785Z [ RUN ] CustomAutogradTest.GradUnreachableDiscoveryTest 2023-01-11T23:31:02.1630374Z [ OK ] CustomAutogradTest.GradUnreachableDiscoveryTest (0 ms) 2023-01-11T23:31:02.1630808Z [ RUN ] CustomAutogradTest.CustomFunction 2023-01-11T23:31:02.1631214Z [ OK ] CustomAutogradTest.CustomFunction (0 ms) 2023-01-11T23:31:02.1631803Z [ RUN ] CustomAutogradTest.CustomFunctionWithTensorList 2023-01-11T23:31:02.1632379Z [ OK ] CustomAutogradTest.CustomFunctionWithTensorList (0 ms) 2023-01-11T23:31:02.1632892Z [ RUN ] CustomAutogradTest.GraphTaskTrimEdges 2023-01-11T23:31:02.1636381Z [ OK ] CustomAutogradTest.GraphTaskTrimEdges (0 ms) 2023-01-11T23:31:02.1637037Z [ RUN ] CustomAutogradTest.FunctionReturnsInput 2023-01-11T23:31:02.1637596Z [ OK ] CustomAutogradTest.FunctionReturnsInput (0 ms) 2023-01-11T23:31:02.1638273Z [ RUN ] CustomAutogradTest.FunctionReturnsUndefined 2023-01-11T23:31:02.1638701Z [ OK ] CustomAutogradTest.FunctionReturnsUndefined (0 ms) 2023-01-11T23:31:02.1639107Z [ RUN ] CustomAutogradTest.MaterializeGrads 2023-01-11T23:31:02.1639470Z [ OK ] CustomAutogradTest.MaterializeGrads (0 ms) 2023-01-11T23:31:02.1639967Z [ RUN ] CustomAutogradTest.DontMaterializeGrads 2023-01-11T23:31:02.1640594Z [ OK ] CustomAutogradTest.DontMaterializeGrads (0 ms) 2023-01-11T23:31:02.1641088Z [ RUN ] CustomAutogradTest.NoGradCustomFunction 2023-01-11T23:31:02.1641475Z [ OK ] CustomAutogradTest.NoGradCustomFunction (0 ms) 2023-01-11T23:31:02.1641841Z [ RUN ] CustomAutogradTest.MarkDirty 2023-01-11T23:31:02.1642352Z [ OK ] CustomAutogradTest.MarkDirty (0 ms) 2023-01-11T23:31:02.1642890Z [ RUN ] CustomAutogradTest.MarkNonDifferentiable 2023-01-11T23:31:02.1643403Z [ OK ] CustomAutogradTest.MarkNonDifferentiable (0 ms) 2023-01-11T23:31:02.1643825Z [ RUN ] CustomAutogradTest.MarkNonDifferentiableMixed 2023-01-11T23:31:02.1644264Z [ OK ] CustomAutogradTest.MarkNonDifferentiableMixed (0 ms) 2023-01-11T23:31:02.1644690Z [ RUN ] CustomAutogradTest.MarkNonDifferentiableNone 2023-01-11T23:31:02.1645115Z [ OK ] CustomAutogradTest.MarkNonDifferentiableNone (0 ms) 2023-01-11T23:31:02.1645516Z [ RUN ] CustomAutogradTest.ReturnLeafInplace 2023-01-11T23:31:02.1645891Z [ OK ] CustomAutogradTest.ReturnLeafInplace (0 ms) 2023-01-11T23:31:02.1646286Z [ RUN ] CustomAutogradTest.ReturnDuplicateInplace 2023-01-11T23:31:02.1667868Z [ OK ] CustomAutogradTest.ReturnDuplicateInplace (2 ms) 2023-01-11T23:31:02.1668962Z [ RUN ] CustomAutogradTest.ReturnDuplicate 
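The GradNonLeafTest and BackwardTest warnings above recommend two patterns: retain_grad() when a non-leaf tensor's .grad is genuinely needed, and autograd.grad instead of backward(create_graph=True) for higher-order gradients. A minimal Python sketch of both (names invented for illustration):

    import torch

    x = torch.randn(3, requires_grad=True)  # leaf tensor
    y = x * 2                               # non-leaf: produced by an op

    # Without retain_grad(), y.grad stays None and merely reading it
    # emits the TensorBody.h:485 warning.
    y.retain_grad()
    y.sum().backward()
    print(y.grad)  # populated: all ones

    # For higher-order gradients, autograd.grad avoids the reference
    # cycle that backward(create_graph=True) creates (engine.cpp:1134).
    z = (x ** 2).sum()
    (g,) = torch.autograd.grad(z, x, create_graph=True)
    (h,) = torch.autograd.grad(g.sum(), x)
    print(h)  # second derivative: 2 everywhere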
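AnomalyMode, the slowest test in the AutogradAPITests suite above (160 ms), exercises the debugging mode that the anomaly_mode.cpp warning says not to leave enabled. A small sketch of the Python-side equivalent, assuming the goal is simply to trace a NaN gradient back to the op that produced it:

    import torch

    # Anomaly detection adds NaN checks and traceback bookkeeping to every
    # backward op, which is why the warning restricts it to debugging.
    with torch.autograd.detect_anomaly():
        x = torch.tensor([-1.0, 2.0], requires_grad=True)
        y = torch.sqrt(x)       # NaN for the negative entry
        try:
            y.sum().backward()  # raises once a backward op returns NaN
        except RuntimeError as err:
            print(err)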
2023-01-11T23:31:02.1669704Z [ OK ] CustomAutogradTest.ReturnDuplicate (0 ms) 2023-01-11T23:31:02.1670663Z [ RUN ] CustomAutogradTest.SaveEmptyForBackward 2023-01-11T23:31:02.1671443Z [ OK ] CustomAutogradTest.SaveEmptyForBackward (0 ms) 2023-01-11T23:31:02.1672164Z [ RUN ] CustomAutogradTest.InvalidGradients 2023-01-11T23:31:02.1728591Z [ OK ] CustomAutogradTest.InvalidGradients (5 ms) 2023-01-11T23:31:02.1728999Z [ RUN ] CustomAutogradTest.NoGradInput 2023-01-11T23:31:02.1729592Z [ OK ] CustomAutogradTest.NoGradInput (0 ms) 2023-01-11T23:31:02.1730056Z [ RUN ] CustomAutogradTest.TooManyGrads 2023-01-11T23:31:02.1730405Z [ OK ] CustomAutogradTest.TooManyGrads (0 ms) 2023-01-11T23:31:02.1730736Z [ RUN ] CustomAutogradTest.DepNoGrad 2023-01-11T23:31:02.1731064Z [ OK ] CustomAutogradTest.DepNoGrad (0 ms) 2023-01-11T23:31:02.1731388Z [ RUN ] CustomAutogradTest.Reentrant 2023-01-11T23:31:02.1731738Z [ OK ] CustomAutogradTest.Reentrant (0 ms) 2023-01-11T23:31:02.1732071Z [ RUN ] CustomAutogradTest.DeepReentrant 2023-01-11T23:31:02.6057955Z [ OK ] CustomAutogradTest.DeepReentrant (432 ms) 2023-01-11T23:31:02.6058377Z [ RUN ] CustomAutogradTest.ReentrantPriority 2023-01-11T23:31:02.6063071Z [ OK ] CustomAutogradTest.ReentrantPriority (0 ms) 2023-01-11T23:31:02.6063429Z [ RUN ] CustomAutogradTest.Hooks 2023-01-11T23:31:02.6084874Z [ OK ] CustomAutogradTest.Hooks (2 ms) 2023-01-11T23:31:02.6085208Z [ RUN ] CustomAutogradTest.HooksInplace 2023-01-11T23:31:02.6087138Z [ OK ] CustomAutogradTest.HooksInplace (0 ms) 2023-01-11T23:31:02.6087551Z [ RUN ] CustomAutogradTest.HooksInplaceWithRetainsGrad 2023-01-11T23:31:02.6089298Z [ OK ] CustomAutogradTest.HooksInplaceWithRetainsGrad (0 ms) 2023-01-11T23:31:02.6090121Z [ RUN ] CustomAutogradTest.HooksInplaceTwiceWithRetainsGrad 2023-01-11T23:31:02.6092372Z [ OK ] CustomAutogradTest.HooksInplaceTwiceWithRetainsGrad (0 ms) 2023-01-11T23:31:02.6092770Z [ RUN ] CustomAutogradTest.HookNone 2023-01-11T23:31:02.6093234Z [ OK ] CustomAutogradTest.HookNone (0 ms) 2023-01-11T23:31:02.6093580Z [ RUN ] CustomAutogradTest.BackwardWithInputs 2023-01-11T23:31:02.6095437Z [ OK ] CustomAutogradTest.BackwardWithInputs (0 ms) 2023-01-11T23:31:02.6095842Z [ RUN ] CustomAutogradTest.BackwardWithEmptyInputs 2023-01-11T23:31:02.6108008Z [ OK ] CustomAutogradTest.BackwardWithEmptyInputs (1 ms) 2023-01-11T23:31:02.6108420Z [ RUN ] CustomAutogradTest.BackwardWithNonLeafInputs 2023-01-11T23:31:02.6109877Z [ OK ] CustomAutogradTest.BackwardWithNonLeafInputs (0 ms) 2023-01-11T23:31:02.6110417Z [ RUN ] CustomAutogradTest.BackwardWithCreateGraphWarns 2023-01-11T23:31:02.6110958Z [ OK ] CustomAutogradTest.BackwardWithCreateGraphWarns (0 ms) 2023-01-11T23:31:02.6111374Z [----------] 33 tests from CustomAutogradTest (448 ms total) 2023-01-11T23:31:02.6111554Z 2023-01-11T23:31:02.6111777Z [----------] 13 tests from TestAutogradNotImplementedFallback 2023-01-11T23:31:02.6112219Z [ RUN ] TestAutogradNotImplementedFallback.RetSingleNonTensor 2023-01-11T23:31:02.6114540Z [ OK ] TestAutogradNotImplementedFallback.RetSingleNonTensor (0 ms) 2023-01-11T23:31:02.6115021Z [ RUN ] TestAutogradNotImplementedFallback.InplaceOp 2023-01-11T23:31:02.6158742Z [ OK ] TestAutogradNotImplementedFallback.InplaceOp (4 ms) 2023-01-11T23:31:02.6159194Z [ RUN ] TestAutogradNotImplementedFallback.DoubleInplaceOp 2023-01-11T23:31:02.6195735Z [ OK ] TestAutogradNotImplementedFallback.DoubleInplaceOp (3 ms) 2023-01-11T23:31:02.6196177Z [ RUN ] TestAutogradNotImplementedFallback.OptOp 2023-01-11T23:31:02.6198283Z [ OK ] 
TestAutogradNotImplementedFallback.OptOp (0 ms) 2023-01-11T23:31:02.6198738Z [ RUN ] TestAutogradNotImplementedFallback.OutOfPlaceAddition 2023-01-11T23:31:02.6234561Z [ OK ] TestAutogradNotImplementedFallback.OutOfPlaceAddition (3 ms) 2023-01-11T23:31:02.6235055Z [ RUN ] TestAutogradNotImplementedFallback.RetTupleNonTensor 2023-01-11T23:31:02.6266758Z [ OK ] TestAutogradNotImplementedFallback.RetTupleNonTensor (3 ms) 2023-01-11T23:31:02.6267197Z [ RUN ] TestAutogradNotImplementedFallback.ViewOp 2023-01-11T23:31:02.6332258Z [ OK ] TestAutogradNotImplementedFallback.ViewOp (6 ms) 2023-01-11T23:31:02.6332711Z [ RUN ] TestAutogradNotImplementedFallback.ViewOpWithExtraArg 2023-01-11T23:31:02.6366710Z [ OK ] TestAutogradNotImplementedFallback.ViewOpWithExtraArg (3 ms) 2023-01-11T23:31:02.6367207Z [ RUN ] TestAutogradNotImplementedFallback.RetTensorVectorView 2023-01-11T23:31:02.6367699Z [ OK ] TestAutogradNotImplementedFallback.RetTensorVectorView (0 ms) 2023-01-11T23:31:02.6368160Z [ RUN ] TestAutogradNotImplementedFallback.DoubleViewOP 2023-01-11T23:31:02.6387569Z [ OK ] TestAutogradNotImplementedFallback.DoubleViewOP (1 ms) 2023-01-11T23:31:02.6388027Z [ RUN ] TestAutogradNotImplementedFallback.NonFirstViewOP 2023-01-11T23:31:02.6418006Z [ OK ] TestAutogradNotImplementedFallback.NonFirstViewOP (3 ms) 2023-01-11T23:31:02.6418471Z [ RUN ] TestAutogradNotImplementedFallback.RetTensorVector 2023-01-11T23:31:02.6451806Z [ OK ] TestAutogradNotImplementedFallback.RetTensorVector (3 ms) 2023-01-11T23:31:02.6452261Z [ RUN ] TestAutogradNotImplementedFallback.TensorlistOp 2023-01-11T23:31:02.6476212Z [ OK ] TestAutogradNotImplementedFallback.TensorlistOp (2 ms) 2023-01-11T23:31:02.6476762Z [----------] 13 tests from TestAutogradNotImplementedFallback (36 ms total) 2023-01-11T23:31:02.6476974Z 2023-01-11T23:31:02.6477133Z [----------] 18 tests from AnyModuleTest 2023-01-11T23:31:02.6477447Z [ RUN ] AnyModuleTest.SimpleReturnType 2023-01-11T23:31:02.6477807Z [ OK ] AnyModuleTest.SimpleReturnType (0 ms) 2023-01-11T23:31:02.6478227Z [ RUN ] AnyModuleTest.SimpleReturnTypeAndSingleArgument 2023-01-11T23:31:02.6478668Z [ OK ] AnyModuleTest.SimpleReturnTypeAndSingleArgument (0 ms) 2023-01-11T23:31:02.6479096Z [ RUN ] AnyModuleTest.StringLiteralReturnTypeAndArgument 2023-01-11T23:31:02.6479538Z [ OK ] AnyModuleTest.StringLiteralReturnTypeAndArgument (0 ms) 2023-01-11T23:31:02.6479988Z [ RUN ] AnyModuleTest.StringReturnTypeWithConstArgument 2023-01-11T23:31:02.6480414Z [ OK ] AnyModuleTest.StringReturnTypeWithConstArgument (0 ms) 2023-01-11T23:31:02.6480986Z [ RUN ] AnyModuleTest.TensorReturnTypeAndStringArgumentsWithFunkyQualifications 2023-01-11T23:31:02.6481586Z [ OK ] AnyModuleTest.TensorReturnTypeAndStringArgumentsWithFunkyQualifications (0 ms) 2023-01-11T23:31:02.6482046Z [ RUN ] AnyModuleTest.WrongArgumentType 2023-01-11T23:31:02.6491743Z [ OK ] AnyModuleTest.WrongArgumentType (1 ms) 2023-01-11T23:31:02.6492112Z [ RUN ] AnyModuleTest.WrongNumberOfArguments 2023-01-11T23:31:02.6528620Z [ OK ] AnyModuleTest.WrongNumberOfArguments (3 ms) 2023-01-11T23:31:02.6529138Z [ RUN ] AnyModuleTest.PassingArgumentsToModuleWithDefaultArgumentsInForwardMethod 2023-01-11T23:31:02.6602464Z [ OK ] AnyModuleTest.PassingArgumentsToModuleWithDefaultArgumentsInForwardMethod (7 ms) 2023-01-11T23:31:02.6602983Z [ RUN ] AnyModuleTest.GetWithCorrectTypeSucceeds 2023-01-11T23:31:02.6603386Z [ OK ] AnyModuleTest.GetWithCorrectTypeSucceeds (0 ms) 2023-01-11T23:31:02.6603787Z [ RUN ] AnyModuleTest.GetWithIncorrectTypeThrows 
2023-01-11T23:31:02.6613070Z [ OK ] AnyModuleTest.GetWithIncorrectTypeThrows (1 ms) 2023-01-11T23:31:02.6613471Z [ RUN ] AnyModuleTest.PtrWithBaseClassSucceeds 2023-01-11T23:31:02.6613860Z [ OK ] AnyModuleTest.PtrWithBaseClassSucceeds (0 ms) 2023-01-11T23:31:02.6614245Z [ RUN ] AnyModuleTest.PtrWithGoodDowncastSuccceeds 2023-01-11T23:31:02.6614789Z [ OK ] AnyModuleTest.PtrWithGoodDowncastSuccceeds (0 ms) 2023-01-11T23:31:02.6615182Z [ RUN ] AnyModuleTest.PtrWithBadDowncastThrows 2023-01-11T23:31:02.6624264Z [ OK ] AnyModuleTest.PtrWithBadDowncastThrows (1 ms) 2023-01-11T23:31:02.6624638Z [ RUN ] AnyModuleTest.DefaultStateIsEmpty 2023-01-11T23:31:02.6625043Z [ OK ] AnyModuleTest.DefaultStateIsEmpty (0 ms) 2023-01-11T23:31:02.6625448Z [ RUN ] AnyModuleTest.AllMethodsThrowForEmptyAnyModule 2023-01-11T23:31:02.6677115Z [ OK ] AnyModuleTest.AllMethodsThrowForEmptyAnyModule (5 ms) 2023-01-11T23:31:02.6677637Z [ RUN ] AnyModuleTest.CanMoveAssignDifferentModules 2023-01-11T23:31:02.6678208Z [ OK ] AnyModuleTest.CanMoveAssignDifferentModules (0 ms) 2023-01-11T23:31:02.6678706Z [ RUN ] AnyModuleTest.ConstructsFromModuleHolder 2023-01-11T23:31:02.6679095Z [ OK ] AnyModuleTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:31:02.6679506Z [ RUN ] AnyModuleTest.ConvertsVariableToTensorCorrectly 2023-01-11T23:31:02.6679943Z [ OK ] AnyModuleTest.ConvertsVariableToTensorCorrectly (0 ms) 2023-01-11T23:31:02.6680353Z [----------] 18 tests from AnyModuleTest (20 ms total) 2023-01-11T23:31:02.6680511Z 2023-01-11T23:31:02.6680666Z [----------] 12 tests from AnyValueTest 2023-01-11T23:31:02.6681127Z [ RUN ] AnyValueTest.CorrectlyAccessesIntWhenCorrectType 2023-01-11T23:31:02.6681581Z [ OK ] AnyValueTest.CorrectlyAccessesIntWhenCorrectType (0 ms) 2023-01-11T23:31:02.6682044Z [ RUN ] AnyValueTest.CorrectlyAccessesStringLiteralWhenCorrectType 2023-01-11T23:31:02.6682557Z [ OK ] AnyValueTest.CorrectlyAccessesStringLiteralWhenCorrectType (0 ms) 2023-01-11T23:31:02.6683031Z [ RUN ] AnyValueTest.CorrectlyAccessesStringWhenCorrectType 2023-01-11T23:31:02.6683493Z [ OK ] AnyValueTest.CorrectlyAccessesStringWhenCorrectType (0 ms) 2023-01-11T23:31:02.6683945Z [ RUN ] AnyValueTest.CorrectlyAccessesPointersWhenCorrectType 2023-01-11T23:31:02.6684421Z [ OK ] AnyValueTest.CorrectlyAccessesPointersWhenCorrectType (0 ms) 2023-01-11T23:31:02.6684894Z [ RUN ] AnyValueTest.CorrectlyAccessesReferencesWhenCorrectType 2023-01-11T23:31:02.6685376Z [ OK ] AnyValueTest.CorrectlyAccessesReferencesWhenCorrectType (0 ms) 2023-01-11T23:31:02.6685877Z [ RUN ] AnyValueTest.TryGetReturnsNullptrForTheWrongType 2023-01-11T23:31:02.6686317Z [ OK ] AnyValueTest.TryGetReturnsNullptrForTheWrongType (0 ms) 2023-01-11T23:31:02.6686724Z [ RUN ] AnyValueTest.GetThrowsForTheWrongType 2023-01-11T23:31:02.6701469Z [ OK ] AnyValueTest.GetThrowsForTheWrongType (2 ms) 2023-01-11T23:31:02.6701987Z [ RUN ] AnyValueTest.MoveConstructionIsAllowed 2023-01-11T23:31:02.6702488Z [ OK ] AnyValueTest.MoveConstructionIsAllowed (0 ms) 2023-01-11T23:31:02.6702959Z [ RUN ] AnyValueTest.MoveAssignmentIsAllowed 2023-01-11T23:31:02.6703335Z [ OK ] AnyValueTest.MoveAssignmentIsAllowed (0 ms) 2023-01-11T23:31:02.6703719Z [ RUN ] AnyValueTest.TypeInfoIsCorrectForInt 2023-01-11T23:31:02.6704092Z [ OK ] AnyValueTest.TypeInfoIsCorrectForInt (0 ms) 2023-01-11T23:31:02.6704488Z [ RUN ] AnyValueTest.TypeInfoIsCorrectForStringLiteral 2023-01-11T23:31:02.6704942Z [ OK ] AnyValueTest.TypeInfoIsCorrectForStringLiteral (0 ms) 2023-01-11T23:31:02.6705396Z [ RUN ] 
AnyValueTest.TypeInfoIsCorrectForString 2023-01-11T23:31:02.6705776Z [ OK ] AnyValueTest.TypeInfoIsCorrectForString (0 ms) 2023-01-11T23:31:02.6706148Z [----------] 12 tests from AnyValueTest (2 ms total) 2023-01-11T23:31:02.6706307Z 2023-01-11T23:31:02.6706457Z [----------] 50 tests from DataTest 2023-01-11T23:31:02.6706773Z [ RUN ] DataTest.DatasetCallsGetCorrectly 2023-01-11T23:31:02.6707123Z [ OK ] DataTest.DatasetCallsGetCorrectly (0 ms) 2023-01-11T23:31:02.6707496Z [ RUN ] DataTest.TransformCallsGetApplyCorrectly 2023-01-11T23:31:02.6707894Z [ OK ] DataTest.TransformCallsGetApplyCorrectly (0 ms) 2023-01-11T23:31:02.6708294Z [ RUN ] DataTest.ChunkDataSetWithInvalidInitParameter 2023-01-11T23:31:02.6762780Z [ OK ] DataTest.ChunkDataSetWithInvalidInitParameter (6 ms) 2023-01-11T23:31:02.6763194Z [ RUN ] DataTest.InfiniteStreamDataset 2023-01-11T23:31:02.6763625Z [ OK ] DataTest.InfiniteStreamDataset (0 ms) 2023-01-11T23:31:02.6764042Z [ RUN ] DataTest.NoSequencerIsIdentity 2023-01-11T23:31:02.6764509Z [ OK ] DataTest.NoSequencerIsIdentity (0 ms) 2023-01-11T23:31:02.6764902Z [ RUN ] DataTest.OrderedSequencerIsSetUpWell 2023-01-11T23:31:02.6765314Z [ OK ] DataTest.OrderedSequencerIsSetUpWell (0 ms) 2023-01-11T23:31:02.6765696Z [ RUN ] DataTest.OrderedSequencerReOrdersValues 2023-01-11T23:31:02.6766084Z [ OK ] DataTest.OrderedSequencerReOrdersValues (0 ms) 2023-01-11T23:31:02.6766479Z [ RUN ] DataTest.BatchLambdaAppliesFunctionToBatch 2023-01-11T23:31:02.6766880Z [ OK ] DataTest.BatchLambdaAppliesFunctionToBatch (0 ms) 2023-01-11T23:31:02.6767362Z [ RUN ] DataTest.LambdaAppliesFunctionToExample 2023-01-11T23:31:02.6767762Z [ OK ] DataTest.LambdaAppliesFunctionToExample (0 ms) 2023-01-11T23:31:02.6768110Z [ RUN ] DataTest.CollateReducesBatch 2023-01-11T23:31:02.6768445Z [ OK ] DataTest.CollateReducesBatch (0 ms) 2023-01-11T23:31:02.6768774Z [ RUN ] DataTest.CollationReducesBatch 2023-01-11T23:31:02.6769106Z [ OK ] DataTest.CollationReducesBatch (0 ms) 2023-01-11T23:31:02.6769494Z [ RUN ] DataTest.SequentialSamplerReturnsIndicesInOrder 2023-01-11T23:31:02.6769930Z [ OK ] DataTest.SequentialSamplerReturnsIndicesInOrder (0 ms) 2023-01-11T23:31:02.6770390Z [ RUN ] DataTest.SequentialSamplerReturnsLessValuesForLastBatch 2023-01-11T23:31:02.6770867Z [ OK ] DataTest.SequentialSamplerReturnsLessValuesForLastBatch (0 ms) 2023-01-11T23:31:02.6771292Z [ RUN ] DataTest.SequentialSamplerResetsWell 2023-01-11T23:31:02.6771711Z [ OK ] DataTest.SequentialSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6772109Z [ RUN ] DataTest.SequentialSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6772549Z [ OK ] DataTest.SequentialSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6772962Z [ RUN ] DataTest.CanSaveAndLoadSequentialSampler 2023-01-11T23:31:02.6904231Z [ OK ] DataTest.CanSaveAndLoadSequentialSampler (13 ms) 2023-01-11T23:31:02.6904779Z [ RUN ] DataTest.RandomSamplerReturnsIndicesInCorrectRange 2023-01-11T23:31:02.6905716Z [ OK ] DataTest.RandomSamplerReturnsIndicesInCorrectRange (0 ms) 2023-01-11T23:31:02.6906786Z [ RUN ] DataTest.RandomSamplerReturnsLessValuesForLastBatch 2023-01-11T23:31:02.6907679Z [ OK ] DataTest.RandomSamplerReturnsLessValuesForLastBatch (0 ms) 2023-01-11T23:31:02.6908498Z [ RUN ] DataTest.RandomSamplerResetsWell 2023-01-11T23:31:02.6909373Z [ OK ] DataTest.RandomSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6910285Z [ RUN ] DataTest.RandomSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6911162Z [ OK ] DataTest.RandomSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6912029Z 
[ RUN ] DataTest.SavingAndLoadingRandomSamplerYieldsSameSequence 2023-01-11T23:31:02.6912925Z [ OK ] DataTest.SavingAndLoadingRandomSamplerYieldsSameSequence (0 ms) 2023-01-11T23:31:02.6913818Z [ RUN ] DataTest.StreamSamplerReturnsTheBatchSizeAndThenRemainder 2023-01-11T23:31:02.6914690Z [ OK ] DataTest.StreamSamplerReturnsTheBatchSizeAndThenRemainder (0 ms) 2023-01-11T23:31:02.6915132Z [ RUN ] DataTest.StreamSamplerResetsWell 2023-01-11T23:31:02.6915651Z [ OK ] DataTest.StreamSamplerResetsWell (0 ms) 2023-01-11T23:31:02.6916142Z [ RUN ] DataTest.StreamSamplerResetsWithNewSizeWell 2023-01-11T23:31:02.6916565Z [ OK ] DataTest.StreamSamplerResetsWithNewSizeWell (0 ms) 2023-01-11T23:31:02.6916988Z [ RUN ] DataTest.TensorDatasetConstructsFromSingleTensor 2023-01-11T23:31:02.6917421Z [ OK ] DataTest.TensorDatasetConstructsFromSingleTensor (0 ms) 2023-01-11T23:31:02.6917899Z [ RUN ] DataTest.TensorDatasetConstructsFromInitializerListOfTensors 2023-01-11T23:31:02.6918430Z [ OK ] DataTest.TensorDatasetConstructsFromInitializerListOfTensors (0 ms) 2023-01-11T23:31:02.6918871Z [ RUN ] DataTest.StackTransformWorksForExample 2023-01-11T23:31:02.6919249Z [ OK ] DataTest.StackTransformWorksForExample (0 ms) 2023-01-11T23:31:02.6919646Z [ RUN ] DataTest.StackTransformWorksForTensorExample 2023-01-11T23:31:02.6920063Z [ OK ] DataTest.StackTransformWorksForTensorExample (0 ms) 2023-01-11T23:31:02.6920480Z [ RUN ] DataTest.TensorTransformWorksForAnyTargetType 2023-01-11T23:31:02.6920990Z [ OK ] DataTest.TensorTransformWorksForAnyTargetType (0 ms) 2023-01-11T23:31:02.6921404Z [ RUN ] DataTest.TensorLambdaWorksforAnyTargetType 2023-01-11T23:31:02.6921810Z [ OK ] DataTest.TensorLambdaWorksforAnyTargetType (0 ms) 2023-01-11T23:31:02.6922162Z [ RUN ] DataTest.NormalizeTransform 2023-01-11T23:31:02.6922550Z [ OK ] DataTest.NormalizeTransform (0 ms) 2023-01-11T23:31:02.6922891Z [ RUN ] DataTest.MapDoesNotCopy 2023-01-11T23:31:02.6923197Z [ OK ] DataTest.MapDoesNotCopy (0 ms) 2023-01-11T23:31:02.6923547Z [ RUN ] DataTest.QueuePushAndPopFromSameThread 2023-01-11T23:31:02.6923936Z [ OK ] DataTest.QueuePushAndPopFromSameThread (0 ms) 2023-01-11T23:31:02.6924342Z [ RUN ] DataTest.QueuePopWithTimeoutThrowsUponTimeout 2023-01-11T23:31:02.7035180Z [ OK ] DataTest.QueuePopWithTimeoutThrowsUponTimeout (11 ms) 2023-01-11T23:31:02.7035686Z [ RUN ] DataTest.QueuePushAndPopFromDifferentThreads 2023-01-11T23:31:02.7240897Z [ OK ] DataTest.QueuePushAndPopFromDifferentThreads (20 ms) 2023-01-11T23:31:02.7241295Z [ RUN ] DataTest.QueueClearEmptiesTheQueue 2023-01-11T23:31:02.7263599Z [ OK ] DataTest.QueueClearEmptiesTheQueue (2 ms) 2023-01-11T23:31:02.7263986Z [ RUN ] DataTest.DataShuttleCanPushAndPopJob 2023-01-11T23:31:02.7264384Z [ OK ] DataTest.DataShuttleCanPushAndPopJob (0 ms) 2023-01-11T23:31:02.7264830Z [ RUN ] DataTest.DataShuttleCanPushAndPopResult 2023-01-11T23:31:02.7265221Z [ OK ] DataTest.DataShuttleCanPushAndPopResult (0 ms) 2023-01-11T23:31:02.7265695Z [ RUN ] DataTest.DataShuttlePopResultReturnsNulloptWhenNoJobsInFlight 2023-01-11T23:31:02.7266232Z [ OK ] DataTest.DataShuttlePopResultReturnsNulloptWhenNoJobsInFlight (0 ms) 2023-01-11T23:31:02.7266727Z [ RUN ] DataTest.DataShuttleDrainMeansPopResultReturnsNullopt 2023-01-11T23:31:02.7267208Z [ OK ] DataTest.DataShuttleDrainMeansPopResultReturnsNullopt (0 ms) 2023-01-11T23:31:02.7267628Z [ RUN ] DataTest.DataShuttlePopResultTimesOut 2023-01-11T23:31:02.7376037Z [ OK ] DataTest.DataShuttlePopResultTimesOut (11 ms) 2023-01-11T23:31:02.7376448Z [ RUN ] 
DataTest.SharedBatchDatasetReallyIsShared 2023-01-11T23:31:02.7402950Z [ OK ] DataTest.SharedBatchDatasetReallyIsShared (2 ms) 2023-01-11T23:31:02.7403500Z [ RUN ] DataTest.SharedBatchDatasetDoesNotIncurCopyWhenPassedDatasetObject 2023-01-11T23:31:02.7404135Z [ OK ] DataTest.SharedBatchDatasetDoesNotIncurCopyWhenPassedDatasetObject (0 ms) 2023-01-11T23:31:02.7404601Z [ RUN ] DataTest.CanUseCustomTypeAsIndexType 2023-01-11T23:31:02.7404974Z [ OK ] DataTest.CanUseCustomTypeAsIndexType (0 ms) 2023-01-11T23:31:02.7405465Z [ RUN ] DataTest.DistributedRandomSamplerSingleReplicaProduceCorrectSamples 2023-01-11T23:31:02.7406035Z [ OK ] DataTest.DistributedRandomSamplerSingleReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7406596Z [ RUN ] DataTest.DistributedRandomSamplerMultiReplicaProduceCorrectSamples 2023-01-11T23:31:02.7407160Z [ OK ] DataTest.DistributedRandomSamplerMultiReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7407644Z [ RUN ] DataTest.CanSaveAndLoadDistributedRandomSampler 2023-01-11T23:31:02.7410242Z [ OK ] DataTest.CanSaveAndLoadDistributedRandomSampler (0 ms) 2023-01-11T23:31:02.7410791Z [ RUN ] DataTest.DistributedSequentialSamplerSingleReplicaProduceCorrectSamples 2023-01-11T23:31:02.7411475Z [ OK ] DataTest.DistributedSequentialSamplerSingleReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7412144Z [ RUN ] DataTest.DistributedSequentialSamplerMultiReplicaProduceCorrectSamples 2023-01-11T23:31:02.7412744Z [ OK ] DataTest.DistributedSequentialSamplerMultiReplicaProduceCorrectSamples (0 ms) 2023-01-11T23:31:02.7413263Z [ RUN ] DataTest.CanSaveAndLoadDistributedSequentialSampler 2023-01-11T23:31:02.7413892Z [ OK ] DataTest.CanSaveAndLoadDistributedSequentialSampler (0 ms) 2023-01-11T23:31:02.7414375Z [----------] 50 tests from DataTest (71 ms total) 2023-01-11T23:31:02.7414655Z 2023-01-11T23:31:02.7414819Z [----------] 37 tests from DataLoaderTest 2023-01-11T23:31:02.7415198Z [ RUN ] DataLoaderTest.DataLoaderOptionsDefaultAsExpected 2023-01-11T23:31:02.7415654Z [ OK ] DataLoaderTest.DataLoaderOptionsDefaultAsExpected (0 ms) 2023-01-11T23:31:02.7416112Z [ RUN ] DataLoaderTest.DataLoaderOptionsCoalesceOptionalValues 2023-01-11T23:31:02.7416599Z [ OK ] DataLoaderTest.DataLoaderOptionsCoalesceOptionalValues (0 ms) 2023-01-11T23:31:02.7417059Z [ RUN ] DataLoaderTest.MakeDataLoaderDefaultsAsExpected 2023-01-11T23:31:02.7417550Z [ OK ] DataLoaderTest.MakeDataLoaderDefaultsAsExpected (0 ms) 2023-01-11T23:31:02.7418098Z [ RUN ] DataLoaderTest.MakeDataLoaderThrowsWhenConstructingSamplerWithUnsizedDataset 2023-01-11T23:31:02.7425293Z [ OK ] DataLoaderTest.MakeDataLoaderThrowsWhenConstructingSamplerWithUnsizedDataset (1 ms) 2023-01-11T23:31:02.7426016Z [ RUN ] DataLoaderTest.IteratorsCompareEqualToThemselves 2023-01-11T23:31:02.7426577Z [ OK ] DataLoaderTest.IteratorsCompareEqualToThemselves (0 ms) 2023-01-11T23:31:02.7427104Z [ RUN ] DataLoaderTest.ValidIteratorsCompareUnequalToEachOther 2023-01-11T23:31:02.7427585Z [ OK ] DataLoaderTest.ValidIteratorsCompareUnequalToEachOther (0 ms) 2023-01-11T23:31:02.7428066Z [ RUN ] DataLoaderTest.SentinelIteratorsCompareEqualToEachOther 2023-01-11T23:31:02.7428552Z [ OK ] DataLoaderTest.SentinelIteratorsCompareEqualToEachOther (0 ms) 2023-01-11T23:31:02.7429055Z [ RUN ] DataLoaderTest.IteratorsCompareEqualToSentinelWhenExhausted 2023-01-11T23:31:02.7429580Z [ OK ] DataLoaderTest.IteratorsCompareEqualToSentinelWhenExhausted (0 ms) 2023-01-11T23:31:02.7430074Z [ RUN ] DataLoaderTest.IteratorsShareState 2023-01-11T23:31:02.7430439Z [ 
OK ] DataLoaderTest.IteratorsShareState (0 ms) 2023-01-11T23:31:02.7430848Z [ RUN ] DataLoaderTest.CanDereferenceIteratorMultipleTimes 2023-01-11T23:31:02.7431306Z [ OK ] DataLoaderTest.CanDereferenceIteratorMultipleTimes (0 ms) 2023-01-11T23:31:02.7431720Z [ RUN ] DataLoaderTest.CanUseIteratorAlgorithms 2023-01-11T23:31:02.7432109Z [ OK ] DataLoaderTest.CanUseIteratorAlgorithms (0 ms) 2023-01-11T23:31:02.7432571Z [ RUN ] DataLoaderTest.CallingBeginWhileOtherIteratorIsInFlightThrows 2023-01-11T23:31:02.7440690Z [ OK ] DataLoaderTest.CallingBeginWhileOtherIteratorIsInFlightThrows (1 ms) 2023-01-11T23:31:02.7441204Z [ RUN ] DataLoaderTest.IncrementingExhaustedValidIteratorThrows 2023-01-11T23:31:02.7452419Z [ OK ] DataLoaderTest.IncrementingExhaustedValidIteratorThrows (1 ms) 2023-01-11T23:31:02.7452912Z [ RUN ] DataLoaderTest.DereferencingExhaustedValidIteratorThrows 2023-01-11T23:31:02.7464055Z [ OK ] DataLoaderTest.DereferencingExhaustedValidIteratorThrows (1 ms) 2023-01-11T23:31:02.7464529Z [ RUN ] DataLoaderTest.IncrementingSentinelIteratorThrows 2023-01-11T23:31:02.7475541Z [ OK ] DataLoaderTest.IncrementingSentinelIteratorThrows (1 ms) 2023-01-11T23:31:02.7475999Z [ RUN ] DataLoaderTest.DereferencingSentinelIteratorThrows 2023-01-11T23:31:02.7486994Z [ OK ] DataLoaderTest.DereferencingSentinelIteratorThrows (1 ms) 2023-01-11T23:31:02.7487506Z [ RUN ] DataLoaderTest.YieldsCorrectBatchSize 2023-01-11T23:31:02.7487909Z [ OK ] DataLoaderTest.YieldsCorrectBatchSize (0 ms) 2023-01-11T23:31:02.7488414Z [ RUN ] DataLoaderTest.ReturnsLastBatchWhenSmallerThanBatchSizeWhenDropLastIsFalse 2023-01-11T23:31:02.7489029Z [ OK ] DataLoaderTest.ReturnsLastBatchWhenSmallerThanBatchSizeWhenDropLastIsFalse (0 ms) 2023-01-11T23:31:02.7489664Z [ RUN ] DataLoaderTest.DoesNotReturnLastBatchWhenSmallerThanBatchSizeWhenDropLastIsTrue 2023-01-11T23:31:02.7490325Z [ OK ] DataLoaderTest.DoesNotReturnLastBatchWhenSmallerThanBatchSizeWhenDropLastIsTrue (0 ms) 2023-01-11T23:31:02.7490809Z [ RUN ] DataLoaderTest.RespectsTimeout 2023-01-11T23:31:02.7604634Z [ OK ] DataLoaderTest.RespectsTimeout (11 ms) 2023-01-11T23:31:02.7605154Z [ RUN ] DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured 2023-01-11T23:31:02.7628844Z [ OK ] DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured (2 ms) 2023-01-11T23:31:02.7629328Z [ RUN ] DataLoaderTest.Reset 2023-01-11T23:31:02.7629637Z [ OK ] DataLoaderTest.Reset (0 ms) 2023-01-11T23:31:02.7630113Z [ RUN ] DataLoaderTest.TestExceptionsArePropagatedFromWorkers 2023-01-11T23:31:02.7635116Z [ OK ] DataLoaderTest.TestExceptionsArePropagatedFromWorkers (0 ms) 2023-01-11T23:31:02.7635576Z [ RUN ] DataLoaderTest.StatefulDatasetWithNoWorkers 2023-01-11T23:31:02.7635992Z [ OK ] DataLoaderTest.StatefulDatasetWithNoWorkers (0 ms) 2023-01-11T23:31:02.7636407Z [ RUN ] DataLoaderTest.StatefulDatasetWithManyWorkers 2023-01-11T23:31:02.7669066Z [ OK ] DataLoaderTest.StatefulDatasetWithManyWorkers (3 ms) 2023-01-11T23:31:02.7669483Z [ RUN ] DataLoaderTest.StatefulDatasetWithMap 2023-01-11T23:31:02.7669880Z [ OK ] DataLoaderTest.StatefulDatasetWithMap (0 ms) 2023-01-11T23:31:02.7670466Z [ RUN ] DataLoaderTest.StatefulDatasetWithCollate 2023-01-11T23:31:02.7673049Z [ OK ] DataLoaderTest.StatefulDatasetWithCollate (0 ms) 2023-01-11T23:31:02.7673438Z [ RUN ] DataLoaderTest.ChunkDataSetGetBatch 2023-01-11T23:31:02.7805012Z [ OK ] DataLoaderTest.ChunkDataSetGetBatch (13 ms) 2023-01-11T23:31:02.7805449Z [ RUN ] DataLoaderTest.ChunkDataSetWithBatchSizeMismatch 2023-01-11T23:31:02.7818709Z [ OK ] 
DataLoaderTest.ChunkDataSetWithBatchSizeMismatch (1 ms) 2023-01-11T23:31:02.7819154Z [ RUN ] DataLoaderTest.ChunkDataSetWithEmptyBatch 2023-01-11T23:31:02.7821677Z [ OK ] DataLoaderTest.ChunkDataSetWithEmptyBatch (0 ms) 2023-01-11T23:31:02.7822126Z [ RUN ] DataLoaderTest.ChunkDataSetGetBatchWithUnevenBatchSize 2023-01-11T23:31:02.7827630Z [ OK ] DataLoaderTest.ChunkDataSetGetBatchWithUnevenBatchSize (0 ms) 2023-01-11T23:31:02.7828227Z [ RUN ] DataLoaderTest.CanAccessChunkSamplerWithChunkDataSet 2023-01-11T23:31:02.7833209Z [ OK ] DataLoaderTest.CanAccessChunkSamplerWithChunkDataSet (0 ms) 2023-01-11T23:31:02.7833633Z [ RUN ] DataLoaderTest.ChunkDatasetDoesNotHang 2023-01-11T23:31:02.7835079Z [ OK ] DataLoaderTest.ChunkDatasetDoesNotHang (0 ms) 2023-01-11T23:31:02.7835450Z [ RUN ] DataLoaderTest.ChunkDatasetSave 2023-01-11T23:31:02.7985217Z [ OK ] DataLoaderTest.ChunkDatasetSave (14 ms) 2023-01-11T23:31:02.7985567Z [ RUN ] DataLoaderTest.ChunkDatasetLoad 2023-01-11T23:31:02.7990524Z [ OK ] DataLoaderTest.ChunkDatasetLoad (0 ms) 2023-01-11T23:31:02.7991263Z [ RUN ] DataLoaderTest.ChunkDatasetCrossChunkShuffle 2023-01-11T23:31:02.8000424Z [ OK ] DataLoaderTest.ChunkDatasetCrossChunkShuffle (0 ms) 2023-01-11T23:31:02.8000825Z [ RUN ] DataLoaderTest.CustomPreprocessPolicy 2023-01-11T23:31:02.8005723Z [ OK ] DataLoaderTest.CustomPreprocessPolicy (0 ms) 2023-01-11T23:31:02.8006137Z [----------] 37 tests from DataLoaderTest (59 ms total) 2023-01-11T23:31:02.8006377Z 2023-01-11T23:31:02.8006588Z [----------] 1 test from EnumTest 2023-01-11T23:31:02.8006915Z [ RUN ] EnumTest.AllEnums 2023-01-11T23:31:02.8007235Z [ OK ] EnumTest.AllEnums (0 ms) 2023-01-11T23:31:02.8007548Z [----------] 1 test from EnumTest (0 ms total) 2023-01-11T23:31:02.8007694Z 2023-01-11T23:31:02.8007881Z [----------] 6 tests from ExpandingArrayTest 2023-01-11T23:31:02.8008270Z [ RUN ] ExpandingArrayTest.CanConstructFromInitializerList 2023-01-11T23:31:02.8008735Z [ OK ] ExpandingArrayTest.CanConstructFromInitializerList (0 ms) 2023-01-11T23:31:02.8009163Z [ RUN ] ExpandingArrayTest.CanConstructFromVector 2023-01-11T23:31:02.8009566Z [ OK ] ExpandingArrayTest.CanConstructFromVector (0 ms) 2023-01-11T23:31:02.8010035Z [ RUN ] ExpandingArrayTest.CanConstructFromArray 2023-01-11T23:31:02.8010436Z [ OK ] ExpandingArrayTest.CanConstructFromArray (0 ms) 2023-01-11T23:31:02.8010845Z [ RUN ] ExpandingArrayTest.CanConstructFromSingleValue 2023-01-11T23:31:02.8011287Z [ OK ] ExpandingArrayTest.CanConstructFromSingleValue (0 ms) 2023-01-11T23:31:02.8011881Z [ RUN ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInInitializerList 2023-01-11T23:31:02.8018291Z [ OK ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInInitializerList (1 ms) 2023-01-11T23:31:02.8018991Z [ RUN ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInVector 2023-01-11T23:31:02.8028580Z [ OK ] ExpandingArrayTest.ThrowsWhenConstructedWithIncorrectNumberOfArgumentsInVector (1 ms) 2023-01-11T23:31:02.8029193Z [----------] 6 tests from ExpandingArrayTest (2 ms total) 2023-01-11T23:31:02.8029376Z 2023-01-11T23:31:02.8029528Z [----------] 10 tests from FFTTest 2023-01-11T23:31:02.8029785Z [ RUN ] FFTTest.fft 2023-01-11T23:31:02.8033831Z [ OK ] FFTTest.fft (0 ms) 2023-01-11T23:31:02.8034112Z [ RUN ] FFTTest.fft_real 2023-01-11T23:31:02.8034982Z [ OK ] FFTTest.fft_real (0 ms) 2023-01-11T23:31:02.8035972Z [ RUN ] FFTTest.fft_pad 2023-01-11T23:31:02.8037492Z [ OK ] FFTTest.fft_pad (0 ms) 
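The FFTTest cases running here exercise the torch::fft namespace of the C++ API. A minimal sketch of the round-trip property that FFTTest.fft_ifft and FFTTest.rfft_irfft check (illustrative only, under assumed shapes; this is not the test source):

  #include <torch/torch.h>

  int main() {
    // A complex signal survives fft followed by ifft up to numerical error.
    auto x = torch::randn({8}, torch::kComplexDouble);
    auto y = torch::fft::ifft(torch::fft::fft(x));
    TORCH_CHECK(torch::allclose(x, y, /*rtol=*/1e-5, /*atol=*/1e-8));

    // For real input, rfft keeps only the non-redundant half of the
    // spectrum, which is why the rfft/irfft pair is cheaper than fft/ifft.
    auto r = torch::randn({8});
    TORCH_CHECK(torch::fft::rfft(r).size(0) == 8 / 2 + 1);
  }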
2023-01-11T23:31:02.8037783Z [ RUN ] FFTTest.fft_norm 2023-01-11T23:31:02.8039564Z [ OK ] FFTTest.fft_norm (0 ms) 2023-01-11T23:31:02.8039831Z [ RUN ] FFTTest.ifft 2023-01-11T23:31:02.8042059Z [ OK ] FFTTest.ifft (0 ms) 2023-01-11T23:31:02.8042444Z [ RUN ] FFTTest.fft_ifft 2023-01-11T23:31:02.8043099Z [ OK ] FFTTest.fft_ifft (0 ms) 2023-01-11T23:31:02.8043406Z [ RUN ] FFTTest.rfft 2023-01-11T23:31:02.8043954Z [ OK ] FFTTest.rfft (0 ms) 2023-01-11T23:31:02.8044436Z [ RUN ] FFTTest.rfft_irfft 2023-01-11T23:31:02.8045329Z [ OK ] FFTTest.rfft_irfft (0 ms) 2023-01-11T23:31:02.8045734Z [ RUN ] FFTTest.ihfft 2023-01-11T23:31:02.8046997Z [ OK ] FFTTest.ihfft (0 ms) 2023-01-11T23:31:02.8047275Z [ RUN ] FFTTest.hfft_ihfft 2023-01-11T23:31:02.8048912Z [ OK ] FFTTest.hfft_ihfft (0 ms) 2023-01-11T23:31:02.8049243Z [----------] 10 tests from FFTTest (2 ms total) 2023-01-11T23:31:02.8049406Z 2023-01-11T23:31:02.8049573Z [----------] 135 tests from FunctionalTest 2023-01-11T23:31:02.8049868Z [ RUN ] FunctionalTest.Conv1d 2023-01-11T23:31:02.8058992Z [ OK ] FunctionalTest.Conv1d (0 ms) 2023-01-11T23:31:02.8059307Z [ RUN ] FunctionalTest.Conv2dEven 2023-01-11T23:31:02.8063221Z [ OK ] FunctionalTest.Conv2dEven (0 ms) 2023-01-11T23:31:02.8063560Z [ RUN ] FunctionalTest.Conv2dUneven 2023-01-11T23:31:02.8064992Z [ OK ] FunctionalTest.Conv2dUneven (0 ms) 2023-01-11T23:31:02.8065313Z [ RUN ] FunctionalTest.Conv3d 2023-01-11T23:31:02.8069168Z [ OK ] FunctionalTest.Conv3d (0 ms) 2023-01-11T23:31:02.8069505Z [ RUN ] FunctionalTest.MaxPool1d 2023-01-11T23:31:02.8069982Z [ OK ] FunctionalTest.MaxPool1d (0 ms) 2023-01-11T23:31:02.8070322Z [ RUN ] FunctionalTest.MaxPool2d 2023-01-11T23:31:02.8070859Z [ OK ] FunctionalTest.MaxPool2d (0 ms) 2023-01-11T23:31:02.8072089Z [ RUN ] FunctionalTest.MaxPool2dBackward 2023-01-11T23:31:02.8073994Z [ OK ] FunctionalTest.MaxPool2dBackward (0 ms) 2023-01-11T23:31:02.8074436Z [ RUN ] FunctionalTest.MaxPool3d 2023-01-11T23:31:02.8074867Z [ OK ] FunctionalTest.MaxPool3d (0 ms) 2023-01-11T23:31:02.8075309Z [ RUN ] FunctionalTest.AvgPool1d 2023-01-11T23:31:02.8075681Z [ OK ] FunctionalTest.AvgPool1d (0 ms) 2023-01-11T23:31:02.8076019Z [ RUN ] FunctionalTest.AvgPool2d 2023-01-11T23:31:02.8076320Z [ OK ] FunctionalTest.AvgPool2d (0 ms) 2023-01-11T23:31:02.8076628Z [ RUN ] FunctionalTest.AvgPool3d 2023-01-11T23:31:02.8076944Z [ OK ] FunctionalTest.AvgPool3d (0 ms) 2023-01-11T23:31:02.8077270Z [ RUN ] FunctionalTest.FractionalMaxPool2d 2023-01-11T23:31:02.8079576Z [ OK ] FunctionalTest.FractionalMaxPool2d (0 ms) 2023-01-11T23:31:02.8079942Z [ RUN ] FunctionalTest.FractionalMaxPool3d 2023-01-11T23:31:02.8081488Z [ OK ] FunctionalTest.FractionalMaxPool3d (0 ms) 2023-01-11T23:31:02.8081820Z [ RUN ] FunctionalTest.LPPool1d 2023-01-11T23:31:02.8083425Z [ OK ] FunctionalTest.LPPool1d (0 ms) 2023-01-11T23:31:02.8083773Z [ RUN ] FunctionalTest.LPPool2d 2023-01-11T23:31:02.8084151Z [ OK ] FunctionalTest.LPPool2d (0 ms) 2023-01-11T23:31:02.8084472Z [ RUN ] FunctionalTest.CosineSimilarity 2023-01-11T23:31:02.8085925Z [ OK ] FunctionalTest.CosineSimilarity (0 ms) 2023-01-11T23:31:02.8086306Z [ RUN ] FunctionalTest.SmoothL1LossDefaultOptions 2023-01-11T23:31:02.8088049Z [ OK ] FunctionalTest.SmoothL1LossDefaultOptions (0 ms) 2023-01-11T23:31:02.8088520Z [ RUN ] FunctionalTest.SmoothL1LossBeta 2023-01-11T23:31:02.8089216Z [ OK ] FunctionalTest.SmoothL1LossBeta (0 ms) 2023-01-11T23:31:02.8089622Z [ RUN ] FunctionalTest.SmoothL1LossNoReduction 2023-01-11T23:31:02.8090562Z [ OK ] 
FunctionalTest.SmoothL1LossNoReduction (0 ms) 2023-01-11T23:31:02.8091025Z [ RUN ] FunctionalTest.HuberLossDefaultOptions 2023-01-11T23:31:02.8092122Z [ OK ] FunctionalTest.HuberLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8092566Z [ RUN ] FunctionalTest.HuberLossDelta 2023-01-11T23:31:02.8092964Z [ OK ] FunctionalTest.HuberLossDelta (0 ms) 2023-01-11T23:31:02.8093369Z [ RUN ] FunctionalTest.HuberLossNoReduction 2023-01-11T23:31:02.8094168Z [ OK ] FunctionalTest.HuberLossNoReduction (0 ms) 2023-01-11T23:31:02.8094917Z [ RUN ] FunctionalTest.SoftMarginLossDefaultOptions 2023-01-11T23:31:02.8097170Z [ OK ] FunctionalTest.SoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8097624Z [ RUN ] FunctionalTest.MultiLabelSoftMarginLossDefaultOptions 2023-01-11T23:31:02.8100449Z [ OK ] FunctionalTest.MultiLabelSoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8100895Z [ RUN ] FunctionalTest.SoftMarginLossNoReduction 2023-01-11T23:31:02.8101850Z [ OK ] FunctionalTest.SoftMarginLossNoReduction (0 ms) 2023-01-11T23:31:02.8102529Z [ RUN ] FunctionalTest.MultiLabelSoftMarginLossWeightedNoReduction 2023-01-11T23:31:02.8104545Z [ OK ] FunctionalTest.MultiLabelSoftMarginLossWeightedNoReduction (0 ms) 2023-01-11T23:31:02.8105060Z [ RUN ] FunctionalTest.PairwiseDistance 2023-01-11T23:31:02.8105503Z [ OK ] FunctionalTest.PairwiseDistance (0 ms) 2023-01-11T23:31:02.8105856Z [ RUN ] FunctionalTest.PDist 2023-01-11T23:31:02.8106885Z [ OK ] FunctionalTest.PDist (0 ms) 2023-01-11T23:31:02.8107219Z [ RUN ] FunctionalTest.AdaptiveMaxPool1d 2023-01-11T23:31:02.8107667Z [ OK ] FunctionalTest.AdaptiveMaxPool1d (0 ms) 2023-01-11T23:31:02.8108016Z [ RUN ] FunctionalTest.AdaptiveMaxPool2d 2023-01-11T23:31:02.8108422Z [ OK ] FunctionalTest.AdaptiveMaxPool2d (0 ms) 2023-01-11T23:31:02.8108786Z [ RUN ] FunctionalTest.AdaptiveMaxPool3d 2023-01-11T23:31:02.8109457Z [ OK ] FunctionalTest.AdaptiveMaxPool3d (0 ms) 2023-01-11T23:31:02.8110058Z [ RUN ] FunctionalTest.AdaptiveAvgPool1d 2023-01-11T23:31:02.8110460Z [ OK ] FunctionalTest.AdaptiveAvgPool1d (0 ms) 2023-01-11T23:31:02.8110809Z [ RUN ] FunctionalTest.AdaptiveAvgPool2d 2023-01-11T23:31:02.8111460Z [ OK ] FunctionalTest.AdaptiveAvgPool2d (0 ms) 2023-01-11T23:31:02.8112160Z [ RUN ] FunctionalTest.AdaptiveAvgPool3d 2023-01-11T23:31:02.8112777Z [ OK ] FunctionalTest.AdaptiveAvgPool3d (0 ms) 2023-01-11T23:31:02.8113190Z [ RUN ] FunctionalTest.L1Loss 2023-01-11T23:31:02.8113696Z [ OK ] FunctionalTest.L1Loss (0 ms) 2023-01-11T23:31:02.8113988Z [ RUN ] FunctionalTest.MSELoss 2023-01-11T23:31:02.8115123Z [ OK ] FunctionalTest.MSELoss (0 ms) 2023-01-11T23:31:02.8115460Z [ RUN ] FunctionalTest.BCELoss 2023-01-11T23:31:02.8116787Z [ OK ] FunctionalTest.BCELoss (0 ms) 2023-01-11T23:31:02.8117101Z [ RUN ] FunctionalTest.KLDivLoss 2023-01-11T23:31:02.8117772Z [W loss.h:57] Warning: reduction: 'mean' divides the total loss by both the batch size and the support size.'batchmean' divides only by the batch size, and aligns with the KL div math definition.'mean' will be changed to behave the same as 'batchmean' in the next major release. (function kl_div) 2023-01-11T23:31:02.8118415Z [ OK ] FunctionalTest.KLDivLoss (0 ms) 2023-01-11T23:31:02.8118758Z [ RUN ] FunctionalTest.HingeEmbeddingLoss 2023-01-11T23:31:02.8120038Z [ OK ] FunctionalTest.HingeEmbeddingLoss (0 ms) 2023-01-11T23:31:02.8120396Z [ RUN ] FunctionalTest.GridSample 2023-01-11T23:31:02.8122874Z [W vision.h:87] Warning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. 
Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details. (function grid_sample) 2023-01-11T23:31:02.8125128Z [ OK ] FunctionalTest.GridSample (0 ms) 2023-01-11T23:31:02.8125454Z [ RUN ] FunctionalTest.AffineGrid 2023-01-11T23:31:02.8268921Z [ OK ] FunctionalTest.AffineGrid (14 ms) 2023-01-11T23:31:02.8269359Z [ RUN ] FunctionalTest.MultiMarginLoss 2023-01-11T23:31:02.8269711Z [ OK ] FunctionalTest.MultiMarginLoss (0 ms) 2023-01-11T23:31:02.8270207Z [ RUN ] FunctionalTest.CosineEmbeddingLoss 2023-01-11T23:31:02.8273585Z [ OK ] FunctionalTest.CosineEmbeddingLoss (0 ms) 2023-01-11T23:31:02.8274004Z [ RUN ] FunctionalTest.MultiLabelMarginLossDefaultOptions 2023-01-11T23:31:02.8274577Z [ OK ] FunctionalTest.MultiLabelMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:02.8275104Z [ RUN ] FunctionalTest.MultiLabelMarginLossNoReduction 2023-01-11T23:31:02.8276197Z [ OK ] FunctionalTest.MultiLabelMarginLossNoReduction (0 ms) 2023-01-11T23:31:02.8276815Z [ RUN ] FunctionalTest.TripletMarginLoss 2023-01-11T23:31:02.8277434Z [ OK ] FunctionalTest.TripletMarginLoss (0 ms) 2023-01-11T23:31:02.8277875Z [ RUN ] FunctionalTest.TripletMarginWithDistanceLossDefaultParity 2023-01-11T23:31:02.8395044Z [ OK ] FunctionalTest.TripletMarginWithDistanceLossDefaultParity (11 ms) 2023-01-11T23:31:02.8395482Z [ RUN ] FunctionalTest.NLLLoss 2023-01-11T23:31:02.8395849Z [ OK ] FunctionalTest.NLLLoss (0 ms) 2023-01-11T23:31:02.8396163Z [ RUN ] FunctionalTest.CrossEntropy 2023-01-11T23:31:02.8399956Z [ OK ] FunctionalTest.CrossEntropy (0 ms) 2023-01-11T23:31:02.8400295Z [ RUN ] FunctionalTest.MaxUnpool1d 2023-01-11T23:31:02.8403549Z [ OK ] FunctionalTest.MaxUnpool1d (0 ms) 2023-01-11T23:31:02.8403977Z [ RUN ] FunctionalTest.MaxUnpool2d 2023-01-11T23:31:02.8406286Z [ OK ] FunctionalTest.MaxUnpool2d (0 ms) 2023-01-11T23:31:02.8406817Z [ RUN ] FunctionalTest.MaxUnpool3d 2023-01-11T23:31:02.8407552Z [ OK ] FunctionalTest.MaxUnpool3d (0 ms) 2023-01-11T23:31:02.8407897Z [ RUN ] FunctionalTest.ELU 2023-01-11T23:31:02.8421892Z [ OK ] FunctionalTest.ELU (1 ms) 2023-01-11T23:31:02.8422189Z [ RUN ] FunctionalTest.SELU 2023-01-11T23:31:02.8425534Z [ OK ] FunctionalTest.SELU (0 ms) 2023-01-11T23:31:02.8425871Z [ RUN ] FunctionalTest.GLU 2023-01-11T23:31:02.8426783Z [ OK ] FunctionalTest.GLU (0 ms) 2023-01-11T23:31:02.8427077Z [ RUN ] FunctionalTest.GELU 2023-01-11T23:31:02.8431306Z [ OK ] FunctionalTest.GELU (0 ms) 2023-01-11T23:31:02.8431970Z [ RUN ] FunctionalTest.TanhGELU 2023-01-11T23:31:02.8433384Z [ OK ] FunctionalTest.TanhGELU (0 ms) 2023-01-11T23:31:02.8433711Z [ RUN ] FunctionalTest.Hardshrink 2023-01-11T23:31:02.8440329Z [ OK ] FunctionalTest.Hardshrink (0 ms) 2023-01-11T23:31:02.8440650Z [ RUN ] FunctionalTest.OneHot 2023-01-11T23:31:02.8444022Z [ OK ] FunctionalTest.OneHot (0 ms) 2023-01-11T23:31:02.8444334Z [ RUN ] FunctionalTest.Hardtanh 2023-01-11T23:31:02.8473323Z [ OK ] FunctionalTest.Hardtanh (2 ms) 2023-01-11T23:31:02.8473646Z [ RUN ] FunctionalTest.LeakyReLU 2023-01-11T23:31:02.8481591Z [ OK ] FunctionalTest.LeakyReLU (0 ms) 2023-01-11T23:31:02.8481918Z [ RUN ] FunctionalTest.LogSigmoid 2023-01-11T23:31:02.8482436Z [ OK ] FunctionalTest.LogSigmoid (0 ms) 2023-01-11T23:31:02.8482779Z [ RUN ] FunctionalTest.GumbelSoftmax 2023-01-11T23:31:02.8514344Z [ OK ] FunctionalTest.GumbelSoftmax (3 ms) 2023-01-11T23:31:02.8514685Z [ RUN ] FunctionalTest.Softmax 2023-01-11T23:31:02.8515113Z [ OK ] FunctionalTest.Softmax (0 ms) 2023-01-11T23:31:02.8515427Z [ RUN ] 
FunctionalTest.Softmin 2023-01-11T23:31:02.8516059Z [ OK ] FunctionalTest.Softmin (0 ms) 2023-01-11T23:31:02.8516435Z [ RUN ] FunctionalTest.LogSoftmax 2023-01-11T23:31:02.8517370Z [ OK ] FunctionalTest.LogSoftmax (0 ms) 2023-01-11T23:31:02.8517698Z [ RUN ] FunctionalTest.PReLU 2023-01-11T23:31:02.8520818Z [ OK ] FunctionalTest.PReLU (0 ms) 2023-01-11T23:31:02.8521128Z [ RUN ] FunctionalTest.LayerNorm 2023-01-11T23:31:02.8521472Z [ OK ] FunctionalTest.LayerNorm (0 ms) 2023-01-11T23:31:02.8521851Z [ RUN ] FunctionalTest.GroupNorm 2023-01-11T23:31:02.8522164Z [ OK ] FunctionalTest.GroupNorm (0 ms) 2023-01-11T23:31:02.8522500Z [ RUN ] FunctionalTest.LocalResponseNorm 2023-01-11T23:31:02.8524144Z [ OK ] FunctionalTest.LocalResponseNorm (0 ms) 2023-01-11T23:31:02.8524485Z [ RUN ] FunctionalTest.Linear 2023-01-11T23:31:02.8527136Z [ OK ] FunctionalTest.Linear (0 ms) 2023-01-11T23:31:02.8527455Z [ RUN ] FunctionalTest.Embedding 2023-01-11T23:31:02.8527870Z [ OK ] FunctionalTest.Embedding (0 ms) 2023-01-11T23:31:02.8528189Z [ RUN ] FunctionalTest.EmbeddingBag 2023-01-11T23:31:02.8532577Z [ OK ] FunctionalTest.EmbeddingBag (0 ms) 2023-01-11T23:31:02.8532899Z [ RUN ] FunctionalTest.Bilinear 2023-01-11T23:31:02.8535406Z [ OK ] FunctionalTest.Bilinear (0 ms) 2023-01-11T23:31:02.8535716Z [ RUN ] FunctionalTest.Normalize 2023-01-11T23:31:02.8539915Z [ OK ] FunctionalTest.Normalize (0 ms) 2023-01-11T23:31:02.8540221Z [ RUN ] FunctionalTest.ReLU 2023-01-11T23:31:02.8542761Z [ OK ] FunctionalTest.ReLU (0 ms) 2023-01-11T23:31:02.8543098Z [ RUN ] FunctionalTest.ReLUDefaultOptions 2023-01-11T23:31:02.8543544Z [ OK ] FunctionalTest.ReLUDefaultOptions (0 ms) 2023-01-11T23:31:02.8543871Z [ RUN ] FunctionalTest.ReLU6 2023-01-11T23:31:02.8547282Z [ OK ] FunctionalTest.ReLU6 (0 ms) 2023-01-11T23:31:02.8547684Z [ RUN ] FunctionalTest.ReLU6DefaultOptions 2023-01-11T23:31:02.8548094Z [ OK ] FunctionalTest.ReLU6DefaultOptions (0 ms) 2023-01-11T23:31:02.8548415Z [ RUN ] FunctionalTest.RReLU 2023-01-11T23:31:02.8584301Z [ OK ] FunctionalTest.RReLU (3 ms) 2023-01-11T23:31:02.8585252Z [ RUN ] FunctionalTest.RReLUDefaultOptions 2023-01-11T23:31:02.8586256Z [ OK ] FunctionalTest.RReLUDefaultOptions (0 ms) 2023-01-11T23:31:02.8586763Z [ RUN ] FunctionalTest.CELU 2023-01-11T23:31:02.8599203Z [ OK ] FunctionalTest.CELU (1 ms) 2023-01-11T23:31:02.8599698Z [ RUN ] FunctionalTest.CELUDefaultOptions 2023-01-11T23:31:02.8600271Z [ OK ] FunctionalTest.CELUDefaultOptions (0 ms) 2023-01-11T23:31:02.8600782Z [ RUN ] FunctionalTest.PixelShuffle 2023-01-11T23:31:02.8602663Z [ OK ] FunctionalTest.PixelShuffle (0 ms) 2023-01-11T23:31:02.8603186Z [ RUN ] FunctionalTest.PixelUnshuffle 2023-01-11T23:31:02.8604361Z [ OK ] FunctionalTest.PixelUnshuffle (0 ms) 2023-01-11T23:31:02.8604870Z [ RUN ] FunctionalTest.Softplus 2023-01-11T23:31:02.8612493Z [ OK ] FunctionalTest.Softplus (0 ms) 2023-01-11T23:31:02.8613036Z [ RUN ] FunctionalTest.SoftplusDefaultOptions 2023-01-11T23:31:02.8613591Z [ OK ] FunctionalTest.SoftplusDefaultOptions (0 ms) 2023-01-11T23:31:02.8614080Z [ RUN ] FunctionalTest.Fold 2023-01-11T23:31:02.8614867Z [ OK ] FunctionalTest.Fold (0 ms) 2023-01-11T23:31:02.8615260Z [ RUN ] FunctionalTest.Unfold 2023-01-11T23:31:02.8616446Z [ OK ] FunctionalTest.Unfold (0 ms) 2023-01-11T23:31:02.8616760Z [ RUN ] FunctionalTest.Softshrink 2023-01-11T23:31:02.8623523Z [ OK ] FunctionalTest.Softshrink (0 ms) 2023-01-11T23:31:02.8623992Z [ RUN ] FunctionalTest.SoftshrinkDefaultOptions 2023-01-11T23:31:02.8624400Z [ OK ] 
FunctionalTest.SoftshrinkDefaultOptions (0 ms) 2023-01-11T23:31:02.8624868Z [ RUN ] FunctionalTest.Softsign 2023-01-11T23:31:02.8625231Z [ OK ] FunctionalTest.Softsign (0 ms) 2023-01-11T23:31:02.8625543Z [ RUN ] FunctionalTest.Mish 2023-01-11T23:31:02.8626287Z [ OK ] FunctionalTest.Mish (0 ms) 2023-01-11T23:31:02.8626603Z [ RUN ] FunctionalTest.Tanhshrink 2023-01-11T23:31:02.8626961Z [ OK ] FunctionalTest.Tanhshrink (0 ms) 2023-01-11T23:31:02.8627326Z [ RUN ] FunctionalTest.Threshold 2023-01-11T23:31:02.8640658Z [ OK ] FunctionalTest.Threshold (1 ms) 2023-01-11T23:31:02.8640997Z [ RUN ] FunctionalTest.BatchNorm1d 2023-01-11T23:31:02.8641425Z [ OK ] FunctionalTest.BatchNorm1d (0 ms) 2023-01-11T23:31:02.8641784Z [ RUN ] FunctionalTest.BatchNorm1dDefaultOptions 2023-01-11T23:31:02.8642553Z [ OK ] FunctionalTest.BatchNorm1dDefaultOptions (0 ms) 2023-01-11T23:31:02.8643110Z [ RUN ] FunctionalTest.BatchNorm2d 2023-01-11T23:31:02.8644029Z [ OK ] FunctionalTest.BatchNorm2d (0 ms) 2023-01-11T23:31:02.8644583Z [ RUN ] FunctionalTest.BatchNorm2dDefaultOptions 2023-01-11T23:31:02.8645480Z [ OK ] FunctionalTest.BatchNorm2dDefaultOptions (0 ms) 2023-01-11T23:31:02.8646017Z [ RUN ] FunctionalTest.BatchNorm3d 2023-01-11T23:31:02.8647004Z [ OK ] FunctionalTest.BatchNorm3d (0 ms) 2023-01-11T23:31:02.8647544Z [ RUN ] FunctionalTest.BatchNorm3dDefaultOptions 2023-01-11T23:31:02.8648516Z [ OK ] FunctionalTest.BatchNorm3dDefaultOptions (0 ms) 2023-01-11T23:31:02.8649212Z [ RUN ] FunctionalTest.InstanceNorm1d 2023-01-11T23:31:02.8652111Z [ OK ] FunctionalTest.InstanceNorm1d (0 ms) 2023-01-11T23:31:02.8652701Z [ RUN ] FunctionalTest.InstanceNorm1dDefaultOptions 2023-01-11T23:31:02.8653675Z [ OK ] FunctionalTest.InstanceNorm1dDefaultOptions (0 ms) 2023-01-11T23:31:02.8654218Z [ RUN ] FunctionalTest.InstanceNorm2d 2023-01-11T23:31:02.8657545Z [ OK ] FunctionalTest.InstanceNorm2d (0 ms) 2023-01-11T23:31:02.8658101Z [ RUN ] FunctionalTest.InstanceNorm2dDefaultOptions 2023-01-11T23:31:02.8659503Z [ OK ] FunctionalTest.InstanceNorm2dDefaultOptions (0 ms) 2023-01-11T23:31:02.8660046Z [ RUN ] FunctionalTest.InstanceNorm3d 2023-01-11T23:31:02.8664404Z [ OK ] FunctionalTest.InstanceNorm3d (0 ms) 2023-01-11T23:31:02.8664987Z [ RUN ] FunctionalTest.InstanceNorm3dDefaultOptions 2023-01-11T23:31:02.8667646Z [ OK ] FunctionalTest.InstanceNorm3dDefaultOptions (0 ms) 2023-01-11T23:31:02.8668223Z [ RUN ] FunctionalTest.Interpolate 2023-01-11T23:31:02.8669133Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
(function _interp_output_size) 2023-01-11T23:31:02.8756429Z [ OK ] FunctionalTest.Interpolate (8 ms) 2023-01-11T23:31:02.8756931Z [ RUN ] FunctionalTest.Pad1 2023-01-11T23:31:02.8757359Z [ OK ] FunctionalTest.Pad1 (0 ms) 2023-01-11T23:31:02.8757803Z [ RUN ] FunctionalTest.Pad2 2023-01-11T23:31:02.8759521Z [ OK ] FunctionalTest.Pad2 (0 ms) 2023-01-11T23:31:02.8759959Z [ RUN ] FunctionalTest.Pad3 2023-01-11T23:31:02.8766241Z [ OK ] FunctionalTest.Pad3 (0 ms) 2023-01-11T23:31:02.8766682Z [ RUN ] FunctionalTest.Pad4 2023-01-11T23:31:02.8768508Z [ OK ] FunctionalTest.Pad4 (0 ms) 2023-01-11T23:31:02.8768939Z [ RUN ] FunctionalTest.Pad5 2023-01-11T23:31:02.8772999Z [ OK ] FunctionalTest.Pad5 (0 ms) 2023-01-11T23:31:02.8773442Z [ RUN ] FunctionalTest.Pad6 2023-01-11T23:31:02.8775513Z [ OK ] FunctionalTest.Pad6 (0 ms) 2023-01-11T23:31:02.8775920Z [ RUN ] FunctionalTest.Pad7 2023-01-11T23:31:02.8776332Z [ OK ] FunctionalTest.Pad7 (0 ms) 2023-01-11T23:31:02.8776729Z [ RUN ] FunctionalTest.Pad8 2023-01-11T23:31:02.8777120Z [ OK ] FunctionalTest.Pad8 (0 ms) 2023-01-11T23:31:02.8777531Z [ RUN ] FunctionalTest.CTCLoss 2023-01-11T23:31:02.8888008Z [ OK ] FunctionalTest.CTCLoss (11 ms) 2023-01-11T23:31:02.8888545Z [ RUN ] FunctionalTest.PoissonNLLLoss 2023-01-11T23:31:02.8889560Z [ OK ] FunctionalTest.PoissonNLLLoss (0 ms) 2023-01-11T23:31:02.8890100Z [ RUN ] FunctionalTest.MarginRankingLoss 2023-01-11T23:31:02.8894131Z [ OK ] FunctionalTest.MarginRankingLoss (0 ms) 2023-01-11T23:31:02.8896577Z [ RUN ] FunctionalTest.ConvTranspose1d 2023-01-11T23:31:02.8897156Z [ OK ] FunctionalTest.ConvTranspose1d (0 ms) 2023-01-11T23:31:02.8897672Z [ RUN ] FunctionalTest.ConvTranspose2dEven 2023-01-11T23:31:02.8902308Z [ OK ] FunctionalTest.ConvTranspose2dEven (0 ms) 2023-01-11T23:31:02.8902889Z [ RUN ] FunctionalTest.ConvTranspose2dUneven 2023-01-11T23:31:02.8905760Z [ OK ] FunctionalTest.ConvTranspose2dUneven (0 ms) 2023-01-11T23:31:02.8906333Z [ RUN ] FunctionalTest.ConvTranspose3d 2023-01-11T23:31:02.8909064Z [ OK ] FunctionalTest.ConvTranspose3d (0 ms) 2023-01-11T23:31:02.8909606Z [ RUN ] FunctionalTest.AlphaDropout 2023-01-11T23:31:02.8918849Z [ OK ] FunctionalTest.AlphaDropout (0 ms) 2023-01-11T23:31:02.8919403Z [ RUN ] FunctionalTest.FeatureAlphaDropout 2023-01-11T23:31:02.8928000Z [ OK ] FunctionalTest.FeatureAlphaDropout (0 ms) 2023-01-11T23:31:02.8928552Z [ RUN ] FunctionalTest.Dropout 2023-01-11T23:31:02.8933094Z [ OK ] FunctionalTest.Dropout (0 ms) 2023-01-11T23:31:02.8933594Z [ RUN ] FunctionalTest.Dropout2d 2023-01-11T23:31:02.8938589Z [ OK ] FunctionalTest.Dropout2d (0 ms) 2023-01-11T23:31:02.8939069Z [ RUN ] FunctionalTest.Dropout3d 2023-01-11T23:31:02.8943946Z [ OK ] FunctionalTest.Dropout3d (0 ms) 2023-01-11T23:31:02.8944438Z [ RUN ] FunctionalTest.isfinite 2023-01-11T23:31:02.8950146Z [ OK ] FunctionalTest.isfinite (0 ms) 2023-01-11T23:31:02.8950590Z [ RUN ] FunctionalTest.isfinite_CUDA 2023-01-11T23:31:03.7402210Z [ OK ] FunctionalTest.isfinite_CUDA (844 ms) 2023-01-11T23:31:03.7402546Z [ RUN ] FunctionalTest.isinf 2023-01-11T23:31:03.7406634Z [ OK ] FunctionalTest.isinf (0 ms) 2023-01-11T23:31:03.7406958Z [ RUN ] FunctionalTest.isinf_CUDA 2023-01-11T23:31:03.7436609Z [ OK ] FunctionalTest.isinf_CUDA (2 ms) 2023-01-11T23:31:03.7436950Z [ RUN ] FunctionalTest.AllClose 2023-01-11T23:31:03.7510913Z [ OK ] FunctionalTest.AllClose (7 ms) 2023-01-11T23:31:03.7511480Z [ RUN ] FunctionalTest.AllClose_CUDA 2023-01-11T23:31:03.7836886Z [ OK ] FunctionalTest.AllClose_CUDA (32 ms) 2023-01-11T23:31:03.7837255Z [ 
RUN ] FunctionalTest.BCEWithLogitsLoss 2023-01-11T23:31:03.7875095Z [ OK ] FunctionalTest.BCEWithLogitsLoss (3 ms) 2023-01-11T23:31:03.7875755Z [----------] 135 tests from FunctionalTest (982 ms total) 2023-01-11T23:31:03.7875939Z 2023-01-11T23:31:03.7876110Z [----------] 3 tests from IntegrationTest 2023-01-11T23:31:03.7876425Z [ RUN ] IntegrationTest.CartPole 2023-01-11T23:31:15.8182055Z [ OK ] IntegrationTest.CartPole (12030 ms) 2023-01-11T23:31:15.8182711Z [ RUN ] IntegrationTest.MNIST_CUDA 2023-01-11T23:31:25.2905575Z [ OK ] IntegrationTest.MNIST_CUDA (9472 ms) 2023-01-11T23:31:25.2905990Z [ RUN ] IntegrationTest.MNISTBatchNorm_CUDA 2023-01-11T23:31:34.0412462Z [ OK ] IntegrationTest.MNISTBatchNorm_CUDA (8750 ms) 2023-01-11T23:31:34.0413088Z [----------] 3 tests from IntegrationTest (30253 ms total) 2023-01-11T23:31:34.0413269Z 2023-01-11T23:31:34.0413419Z [----------] 9 tests from InitTest 2023-01-11T23:31:34.0413812Z [ RUN ] InitTest.ProducesPyTorchValues_XavierUniform 2023-01-11T23:31:34.0427265Z [ OK ] InitTest.ProducesPyTorchValues_XavierUniform (1 ms) 2023-01-11T23:31:34.0427687Z [ RUN ] InitTest.ProducesPyTorchValues_XavierNormal 2023-01-11T23:31:34.0441345Z [ OK ] InitTest.ProducesPyTorchValues_XavierNormal (1 ms) 2023-01-11T23:31:34.0441904Z [ RUN ] InitTest.ProducesPyTorchValues_KaimingNormal 2023-01-11T23:31:34.0456364Z [ OK ] InitTest.ProducesPyTorchValues_KaimingNormal (1 ms) 2023-01-11T23:31:34.0456780Z [ RUN ] InitTest.ProducesPyTorchValues_KaimingUniform 2023-01-11T23:31:34.0471467Z [ OK ] InitTest.ProducesPyTorchValues_KaimingUniform (1 ms) 2023-01-11T23:31:34.0472272Z [ RUN ] InitTest.CanInitializeTensorThatRequiresGrad 2023-01-11T23:31:34.0499219Z [ OK ] InitTest.CanInitializeTensorThatRequiresGrad (2 ms) 2023-01-11T23:31:34.0499628Z [ RUN ] InitTest.CalculateGainWithTanh 2023-01-11T23:31:34.0500029Z [ OK ] InitTest.CalculateGainWithTanh (0 ms) 2023-01-11T23:31:34.0500442Z [ RUN ] InitTest.CalculateGainWithRelu 2023-01-11T23:31:34.0500876Z [ OK ] InitTest.CalculateGainWithRelu (0 ms) 2023-01-11T23:31:34.0501371Z [ RUN ] InitTest.CalculateGainWithLeakyRelu 2023-01-11T23:31:34.0503453Z [ OK ] InitTest.CalculateGainWithLeakyRelu (0 ms) 2023-01-11T23:31:34.0504025Z [ RUN ] InitTest.CanInitializeCnnWithOrthogonal 2023-01-11T23:31:34.0504594Z [ OK ] InitTest.CanInitializeCnnWithOrthogonal (0 ms) 2023-01-11T23:31:34.0504970Z [----------] 9 tests from InitTest (9 ms total) 2023-01-11T23:31:34.0505130Z 2023-01-11T23:31:34.0505297Z [----------] 6 tests from TorchScriptTest 2023-01-11T23:31:34.0505653Z [ RUN ] TorchScriptTest.CanCompileMultipleFunctions 2023-01-11T23:31:34.0917602Z [ OK ] TorchScriptTest.CanCompileMultipleFunctions (41 ms) 2023-01-11T23:31:34.0918713Z [ RUN ] TorchScriptTest.TestNestedIValueModuleArgMatching 2023-01-11T23:31:34.0939205Z [ OK ] TorchScriptTest.TestNestedIValueModuleArgMatching (2 ms) 2023-01-11T23:31:34.0939756Z [ RUN ] TorchScriptTest.TestDictArgMatching 2023-01-11T23:31:34.0942651Z [ OK ] TorchScriptTest.TestDictArgMatching (0 ms) 2023-01-11T23:31:34.0943204Z [ RUN ] TorchScriptTest.TestTupleArgMatching 2023-01-11T23:31:34.0943695Z [ OK ] TorchScriptTest.TestTupleArgMatching (0 ms) 2023-01-11T23:31:34.0944105Z [ RUN ] TorchScriptTest.TestOptionalArgMatching 2023-01-11T23:31:34.0950028Z [ OK ] TorchScriptTest.TestOptionalArgMatching (0 ms) 2023-01-11T23:31:34.0950559Z [ RUN ] TorchScriptTest.TestPickle 2023-01-11T23:31:34.0951013Z [ OK ] TorchScriptTest.TestPickle (0 ms) 2023-01-11T23:31:34.0952543Z [----------] 6 tests from TorchScriptTest 
(44 ms total) 2023-01-11T23:31:34.0952786Z 2023-01-11T23:31:34.0953033Z [----------] 3 tests from MakeUniqueTest 2023-01-11T23:31:34.0953532Z [ RUN ] MakeUniqueTest.ForwardRvaluesCorrectly 2023-01-11T23:31:34.0954059Z [ OK ] MakeUniqueTest.ForwardRvaluesCorrectly (0 ms) 2023-01-11T23:31:34.0954590Z [ RUN ] MakeUniqueTest.ForwardLvaluesCorrectly 2023-01-11T23:31:34.0955092Z [ OK ] MakeUniqueTest.ForwardLvaluesCorrectly (0 ms) 2023-01-11T23:31:34.0955578Z [ RUN ] MakeUniqueTest.CanConstructUniquePtrOfArray 2023-01-11T23:31:34.0956190Z [ OK ] MakeUniqueTest.CanConstructUniquePtrOfArray (0 ms) 2023-01-11T23:31:34.0956682Z [----------] 3 tests from MakeUniqueTest (0 ms total) 2023-01-11T23:31:34.0956855Z 2023-01-11T23:31:34.0957067Z [----------] 2 tests from MetaTensorTest 2023-01-11T23:31:34.0957502Z [ RUN ] MetaTensorTest.MetaDeviceApi 2023-01-11T23:31:34.0957965Z [ OK ] MetaTensorTest.MetaDeviceApi (0 ms) 2023-01-11T23:31:34.0958361Z [ RUN ] MetaTensorTest.MetaNamespaceApi 2023-01-11T23:31:34.0958717Z [ OK ] MetaTensorTest.MetaNamespaceApi (0 ms) 2023-01-11T23:31:34.0959067Z [----------] 2 tests from MetaTensorTest (0 ms total) 2023-01-11T23:31:34.0959230Z 2023-01-11T23:31:34.0959376Z [----------] 2 tests from UtilsTest 2023-01-11T23:31:34.0959787Z [ RUN ] UtilsTest.WarnOnce 2023-01-11T23:31:34.0960075Z [ OK ] UtilsTest.WarnOnce (0 ms) 2023-01-11T23:31:34.0960407Z [ RUN ] UtilsTest.AmbiguousOperatorDefaults 2023-01-11T23:31:34.0960775Z [ OK ] UtilsTest.AmbiguousOperatorDefaults (0 ms) 2023-01-11T23:31:34.0961127Z [----------] 2 tests from UtilsTest (0 ms total) 2023-01-11T23:31:34.0961274Z 2023-01-11T23:31:34.0961419Z [----------] 1 test from NoGradTest 2023-01-11T23:31:34.0961731Z [ RUN ] NoGradTest.SetsGradModeCorrectly 2023-01-11T23:31:34.0983322Z [ OK ] NoGradTest.SetsGradModeCorrectly (3 ms) 2023-01-11T23:31:34.0983890Z [----------] 1 test from NoGradTest (3 ms total) 2023-01-11T23:31:34.0984117Z 2023-01-11T23:31:34.0984344Z [----------] 3 tests from AutogradTest 2023-01-11T23:31:34.0984672Z [ RUN ] AutogradTest.CanTakeDerivatives 2023-01-11T23:31:34.0985103Z [ OK ] AutogradTest.CanTakeDerivatives (0 ms) 2023-01-11T23:31:34.0985518Z [ RUN ] AutogradTest.CanTakeDerivativesOfZeroDimTensors 2023-01-11T23:31:34.0985977Z [ OK ] AutogradTest.CanTakeDerivativesOfZeroDimTensors (0 ms) 2023-01-11T23:31:34.0986394Z [ RUN ] AutogradTest.CanPassCustomGradientInputs 2023-01-11T23:31:34.0987977Z [ OK ] AutogradTest.CanPassCustomGradientInputs (0 ms) 2023-01-11T23:31:34.0988491Z [----------] 3 tests from AutogradTest (0 ms total) 2023-01-11T23:31:34.0988703Z 2023-01-11T23:31:34.0988908Z [----------] 1 test from OptionalArrayRefTest 2023-01-11T23:31:34.0989333Z [ RUN ] OptionalArrayRefTest.DanglingPointerFix 2023-01-11T23:31:34.0989857Z [ OK ] OptionalArrayRefTest.DanglingPointerFix (0 ms) 2023-01-11T23:31:34.0990472Z [----------] 1 test from OptionalArrayRefTest (0 ms total) 2023-01-11T23:31:34.0990659Z 2023-01-11T23:31:34.0990816Z [----------] 55 tests from ModuleTest 2023-01-11T23:31:34.0991167Z [ RUN ] ModuleTest.CanEnableAndDisableTrainingMode 2023-01-11T23:31:34.0991568Z [ OK ] ModuleTest.CanEnableAndDisableTrainingMode (0 ms) 2023-01-11T23:31:34.0991911Z [ RUN ] ModuleTest.ZeroGrad 2023-01-11T23:31:34.0992213Z [ OK ] ModuleTest.ZeroGrad (0 ms) 2023-01-11T23:31:34.0992584Z [ RUN ] ModuleTest.ZeroGradWithUndefined 2023-01-11T23:31:34.0992956Z [ OK ] ModuleTest.ZeroGradWithUndefined (0 ms) 2023-01-11T23:31:34.0993365Z [ RUN ] ModuleTest.RegisterModuleThrowsForEmptyOrDottedName 
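The [W loss.h:57] and [W upsampling.h:66] warnings captured earlier in this run describe defaults that changed (interpolate, in 1.6.0) or are scheduled to change (kl_div's 'mean' reduction); both go quiet once the option is passed explicitly. A minimal C++ sketch of doing that through the functional API (illustrative only, with made-up tensors; not part of this test run):

  #include <torch/torch.h>
  namespace F = torch::nn::functional;

  int main() {
    // kl_div: choose the reduction explicitly instead of the 'mean' default.
    auto input  = torch::log_softmax(torch::randn({4, 10}), /*dim=*/1);
    auto target = torch::softmax(torch::randn({4, 10}), /*dim=*/1);
    auto loss = F::kl_div(
        input, target, F::KLDivFuncOptions().reduction(torch::kBatchMean));

    // interpolate: state recompute_scale_factor rather than relying on it.
    auto x = torch::randn({1, 1, 8});
    auto y = F::interpolate(
        x, F::InterpolateFuncOptions()
               .scale_factor(std::vector<double>{2.0})
               .mode(torch::kLinear)
               .recompute_scale_factor(false));
    TORCH_CHECK(y.size(2) == 16);
  }

The grid_sample/affine_grid warning seen under FunctionalTest.GridSample follows the same pattern: passing GridSampleFuncOptions().align_corners(...) makes the choice explicit.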
2023-01-11T23:31:34.1016032Z [ OK ] ModuleTest.RegisterModuleThrowsForEmptyOrDottedName (2 ms) 2023-01-11T23:31:34.1017018Z [ RUN ] ModuleTest.RegisterModuleThrowsForDuplicateModuleName 2023-01-11T23:31:34.1026322Z [ OK ] ModuleTest.RegisterModuleThrowsForDuplicateModuleName (1 ms) 2023-01-11T23:31:34.1026807Z [ RUN ] ModuleTest.ReplaceModuleThrowsForUnknownModuleName 2023-01-11T23:31:34.1038449Z [ OK ] ModuleTest.ReplaceModuleThrowsForUnknownModuleName (1 ms) 2023-01-11T23:31:34.1038922Z [ RUN ] ModuleTest.ReplaceModule 2023-01-11T23:31:34.1039242Z [ OK ] ModuleTest.ReplaceModule (0 ms) 2023-01-11T23:31:34.1039559Z [ RUN ] ModuleTest.UnregisterModule 2023-01-11T23:31:34.1055022Z [ OK ] ModuleTest.UnregisterModule (1 ms) 2023-01-11T23:31:34.1055606Z [ RUN ] ModuleTest.RegisterParameterThrowsForEmptyOrDottedName 2023-01-11T23:31:34.1088021Z [ OK ] ModuleTest.RegisterParameterThrowsForEmptyOrDottedName (3 ms) 2023-01-11T23:31:34.1088656Z [ RUN ] ModuleTest.RegisterParameterThrowsForDuplicateModuleName 2023-01-11T23:31:34.1110124Z [ OK ] ModuleTest.RegisterParameterThrowsForDuplicateModuleName (2 ms) 2023-01-11T23:31:34.1110766Z [ RUN ] ModuleTest.RegisterParameterUndefinedTensor 2023-01-11T23:31:34.1111302Z [ OK ] ModuleTest.RegisterParameterUndefinedTensor (0 ms) 2023-01-11T23:31:34.1111746Z [ RUN ] ModuleTest.RegisterBufferThrowsForEmptyOrDottedName 2023-01-11T23:31:34.1142908Z [ OK ] ModuleTest.RegisterBufferThrowsForEmptyOrDottedName (3 ms) 2023-01-11T23:31:34.1143473Z [ RUN ] ModuleTest.RegisterBufferThrowsForDuplicateModuleName 2023-01-11T23:31:34.1164226Z [ OK ] ModuleTest.RegisterBufferThrowsForDuplicateModuleName (2 ms) 2023-01-11T23:31:34.1164856Z [ RUN ] ModuleTest.CanGetName 2023-01-11T23:31:34.1165242Z [ OK ] ModuleTest.CanGetName (0 ms) 2023-01-11T23:31:34.1165578Z [ RUN ] ModuleTest.AsCastsModulesCorrectly 2023-01-11T23:31:34.1165941Z [ OK ] ModuleTest.AsCastsModulesCorrectly (0 ms) 2023-01-11T23:31:34.1166425Z [ RUN ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor 2023-01-11T23:31:34.1167052Z [ OK ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor (0 ms) 2023-01-11T23:31:34.1167630Z [ RUN ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor_CUDA 2023-01-11T23:31:34.1168123Z [ OK ] ModuleTest.DeviceOrDtypeConversionSkipsUndefinedTensor_CUDA (0 ms) 2023-01-11T23:31:34.1168619Z [ RUN ] ModuleTest.ParametersAndBuffersAccessorSkipsUndefinedTensor 2023-01-11T23:31:34.1169124Z [ OK ] ModuleTest.ParametersAndBuffersAccessorSkipsUndefinedTensor (0 ms) 2023-01-11T23:31:34.1169637Z [ RUN ] ModuleTest.CallingCloneOnModuleThatDoesNotOverrideCloneThrows 2023-01-11T23:31:34.1184596Z [ OK ] ModuleTest.CallingCloneOnModuleThatDoesNotOverrideCloneThrows (1 ms) 2023-01-11T23:31:34.1185142Z [ RUN ] ModuleTest.CallingCloneOnModuleThatDoesOverrideCloneDoesNotThrow 2023-01-11T23:31:34.1185682Z [ OK ] ModuleTest.CallingCloneOnModuleThatDoesOverrideCloneDoesNotThrow (0 ms) 2023-01-11T23:31:34.1186148Z [ RUN ] ModuleTest.CloneCreatesDistinctParameters 2023-01-11T23:31:34.1192678Z [ OK ] ModuleTest.CloneCreatesDistinctParameters (0 ms) 2023-01-11T23:31:34.1193140Z [ RUN ] ModuleTest.CloneCreatesDistinctParametersExplicitDevice_CUDA 2023-01-11T23:31:34.1209299Z [ OK ] ModuleTest.CloneCreatesDistinctParametersExplicitDevice_CUDA (1 ms) 2023-01-11T23:31:34.1209747Z [ RUN ] ModuleTest.ClonePreservesExternalReferences 2023-01-11T23:31:34.1210683Z [ OK ] ModuleTest.ClonePreservesExternalReferences (0 ms) 2023-01-11T23:31:34.1211230Z [ RUN ] 
ModuleTest.CloneCopiesTheValuesOfVariablesOfSubmodules 2023-01-11T23:31:34.1211912Z [ OK ] ModuleTest.CloneCopiesTheValuesOfVariablesOfSubmodules (0 ms) 2023-01-11T23:31:34.1212402Z [ RUN ] ModuleTest.CloneToDevicePreservesTheDeviceOfParameters_CUDA 2023-01-11T23:31:34.1216484Z [ OK ] ModuleTest.CloneToDevicePreservesTheDeviceOfParameters_CUDA (0 ms) 2023-01-11T23:31:34.1217059Z [ RUN ] ModuleTest.HasCorrectNumberOfParameters 2023-01-11T23:31:34.1217602Z [ OK ] ModuleTest.HasCorrectNumberOfParameters (0 ms) 2023-01-11T23:31:34.1218154Z [ RUN ] ModuleTest.ContainsParametersWithTheCorrectName 2023-01-11T23:31:34.1218612Z [ OK ] ModuleTest.ContainsParametersWithTheCorrectName (0 ms) 2023-01-11T23:31:34.1219016Z [ RUN ] ModuleTest.HasCorrectNumberOfBuffers 2023-01-11T23:31:34.1219400Z [ OK ] ModuleTest.HasCorrectNumberOfBuffers (0 ms) 2023-01-11T23:31:34.1219785Z [ RUN ] ModuleTest.ContainsBuffersWithTheCorrectName 2023-01-11T23:31:34.1220209Z [ OK ] ModuleTest.ContainsBuffersWithTheCorrectName (0 ms) 2023-01-11T23:31:34.1220723Z [ RUN ] ModuleTest.DefaultConstructorOfModuleHolderCallsDefaultConstructorOfImpl 2023-01-11T23:31:34.1221404Z [ OK ] ModuleTest.DefaultConstructorOfModuleHolderCallsDefaultConstructorOfImpl (0 ms) 2023-01-11T23:31:34.1221999Z [ RUN ] ModuleTest.ValueConstructorOfModuleHolderCallsCorrectConstructorInImpl 2023-01-11T23:31:34.1222588Z [ OK ] ModuleTest.ValueConstructorOfModuleHolderCallsCorrectConstructorInImpl (0 ms) 2023-01-11T23:31:34.1223149Z [ RUN ] ModuleTest.NullptrConstructorLeavesTheModuleHolderInEmptyState 2023-01-11T23:31:34.1229616Z [ OK ] ModuleTest.NullptrConstructorLeavesTheModuleHolderInEmptyState (1 ms) 2023-01-11T23:31:34.1230361Z [ RUN ] ModuleTest.ModulesReturnsExpectedSubmodulesForFlatModel 2023-01-11T23:31:34.1230903Z [ OK ] ModuleTest.ModulesReturnsExpectedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1231516Z [ RUN ] ModuleTest.ModulesExcludesSelfWhenIncludeSelfSetToFalse 2023-01-11T23:31:34.1232020Z [ OK ] ModuleTest.ModulesExcludesSelfWhenIncludeSelfSetToFalse (0 ms) 2023-01-11T23:31:34.1232585Z [ RUN ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForFlatModel 2023-01-11T23:31:34.1233302Z [ OK ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1234034Z [ RUN ] ModuleTest.NamedModulesExcludesSelfWhenIncludeSelfSetToFalse 2023-01-11T23:31:34.1234738Z [ OK ] ModuleTest.NamedModulesExcludesSelfWhenIncludeSelfSetToFalse (0 ms) 2023-01-11T23:31:34.1235269Z [ RUN ] ModuleTest.ChildrenReturnsExpectedSubmodulesForFlatModel 2023-01-11T23:31:34.1235926Z [ OK ] ModuleTest.ChildrenReturnsExpectedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1236573Z [ RUN ] ModuleTest.NamedChildrenReturnsExpectedNamedSubmodulesForFlatModel 2023-01-11T23:31:34.1237201Z [ OK ] ModuleTest.NamedChildrenReturnsExpectedNamedSubmodulesForFlatModel (0 ms) 2023-01-11T23:31:34.1237717Z [ RUN ] ModuleTest.ParametersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1238208Z [ OK ] ModuleTest.ParametersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1238699Z [ RUN ] ModuleTest.NamedParametersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1239216Z [ OK ] ModuleTest.NamedParametersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1239701Z [ RUN ] ModuleTest.BuffersReturnsExpectedTensorsForFlatModel 2023-01-11T23:31:34.1240168Z [ OK ] ModuleTest.BuffersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1240641Z [ RUN ] ModuleTest.NamedBuffersReturnsExpectedTensorsForFlatModel 
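The block of ModuleTest cases above pins down the registration contract: anything attached through register_module, register_parameter or register_buffer becomes visible, under a dotted path, to the parameters()/buffers()/modules() family of accessors. A minimal sketch of a module exercising all three (illustrative only, not the test source):

  #include <torch/torch.h>
  #include <iostream>

  struct Net : torch::nn::Module {
    Net() {
      // Each register_* call takes a name and hands back the stored handle.
      fc = register_module("fc", torch::nn::Linear(4, 3));
      scale = register_parameter("scale", torch::ones({3}));
      running = register_buffer("running", torch::zeros({3}));
    }
    torch::nn::Linear fc{nullptr};
    torch::Tensor scale, running;
  };

  int main() {
    Net net;
    for (const auto& p : net.named_parameters())
      std::cout << p.key() << '\n';  // e.g. scale, fc.weight, fc.bias
    std::cout << net.named_buffers().size() << '\n';  // 1 (just "running")
  }

Empty and dotted names are rejected at registration time, which is what the RegisterModuleThrowsForEmptyOrDottedName and RegisterParameterThrowsForEmptyOrDottedName cases assert.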
2023-01-11T23:31:34.1241142Z [ OK ] ModuleTest.NamedBuffersReturnsExpectedTensorsForFlatModel (0 ms) 2023-01-11T23:31:34.1241625Z [ RUN ] ModuleTest.ModulesReturnsExpectedSubmodulesForDeepModel 2023-01-11T23:31:34.1242103Z [ OK ] ModuleTest.ModulesReturnsExpectedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1242621Z [ RUN ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForDeepModel 2023-01-11T23:31:34.1243171Z [ OK ] ModuleTest.NamedModulesReturnsExpectedNamedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1243690Z [ RUN ] ModuleTest.ChildrensReturnsExpectedSubmodulesForDeepModel 2023-01-11T23:31:34.1244180Z [ OK ] ModuleTest.ChildrensReturnsExpectedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1244710Z [ RUN ] ModuleTest.NamedChildrensReturnsExpectedNamedSubmodulesForDeepModel 2023-01-11T23:31:34.1245276Z [ OK ] ModuleTest.NamedChildrensReturnsExpectedNamedSubmodulesForDeepModel (0 ms) 2023-01-11T23:31:34.1245741Z [ RUN ] ModuleTest.ModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1246132Z [ OK ] ModuleTest.ModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1246533Z [ RUN ] ModuleTest.ConstModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1246996Z [ OK ] ModuleTest.ConstModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1247406Z [ RUN ] ModuleTest.NamedModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1247824Z [ OK ] ModuleTest.NamedModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1248255Z [ RUN ] ModuleTest.ConstNamedModuleApplyIteratesCorreclty 2023-01-11T23:31:34.1248698Z [ OK ] ModuleTest.ConstNamedModuleApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1249135Z [ RUN ] ModuleTest.ModulePointerApplyIteratesCorreclty 2023-01-11T23:31:34.1249566Z [ OK ] ModuleTest.ModulePointerApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1250011Z [ RUN ] ModuleTest.NamedModulePointerApplyIteratesCorreclty 2023-01-11T23:31:34.1250463Z [ OK ] ModuleTest.NamedModulePointerApplyIteratesCorreclty (0 ms) 2023-01-11T23:31:34.1250955Z [ RUN ] ModuleTest.ThrowsWhenAttemptingtoGetTopLevelModuleAsSharedPtr 2023-01-11T23:31:34.1266834Z [ OK ] ModuleTest.ThrowsWhenAttemptingtoGetTopLevelModuleAsSharedPtr (2 ms) 2023-01-11T23:31:34.1267279Z [ RUN ] ModuleTest.PrettyPrint 2023-01-11T23:31:34.1267650Z [ OK ] ModuleTest.PrettyPrint (0 ms) 2023-01-11T23:31:34.1268059Z [ RUN ] ModuleTest.CanCallForwardOnNonTensorForwardThroughPimpl 2023-01-11T23:31:34.1268545Z [ OK ] ModuleTest.CanCallForwardOnNonTensorForwardThroughPimpl (0 ms) 2023-01-11T23:31:34.1268953Z [----------] 55 tests from ModuleTest (27 ms total) 2023-01-11T23:31:34.1269113Z 2023-01-11T23:31:34.1269273Z [----------] 13 tests from ModuleDictTest 2023-01-11T23:31:34.1269678Z [ RUN ] ModuleDictTest.ConstructsFromList 2023-01-11T23:31:34.1270102Z [ OK ] ModuleDictTest.ConstructsFromList (0 ms) 2023-01-11T23:31:34.1270474Z [ RUN ] ModuleDictTest.ConstructsFromordereddict 2023-01-11T23:31:34.1270869Z [ OK ] ModuleDictTest.ConstructsFromordereddict (0 ms) 2023-01-11T23:31:34.1271259Z [ RUN ] ModuleDictTest.UpdatePopClearContains 2023-01-11T23:31:34.1280693Z [ OK ] ModuleDictTest.UpdatePopClearContains (1 ms) 2023-01-11T23:31:34.1281052Z [ RUN ] ModuleDictTest.UpdateExist 2023-01-11T23:31:34.1281382Z [ OK ] ModuleDictTest.UpdateExist (0 ms) 2023-01-11T23:31:34.1281674Z [ RUN ] ModuleDictTest.Keys 2023-01-11T23:31:34.1293381Z [ OK ] ModuleDictTest.Keys (1 ms) 2023-01-11T23:31:34.1294063Z [ RUN ] ModuleDictTest.Values 2023-01-11T23:31:34.1294619Z [ OK ] ModuleDictTest.Values (0 ms) 2023-01-11T23:31:34.1295046Z [ RUN ] 
ModuleDictTest.SanityCheckForHoldingStandardModules 2023-01-11T23:31:34.1295810Z [ OK ] ModuleDictTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:31:34.1296332Z [ RUN ] ModuleDictTest.HasReferenceSemantics 2023-01-11T23:31:34.1296720Z [ OK ] ModuleDictTest.HasReferenceSemantics (0 ms) 2023-01-11T23:31:34.1297063Z [ RUN ] ModuleDictTest.IsCloneable 2023-01-11T23:31:34.1300229Z [ OK ] ModuleDictTest.IsCloneable (0 ms) 2023-01-11T23:31:34.1300671Z [ RUN ] ModuleDictTest.IsCloneable_CUDA 2023-01-11T23:31:34.1312614Z [ OK ] ModuleDictTest.IsCloneable_CUDA (1 ms) 2023-01-11T23:31:34.1313156Z [ RUN ] ModuleDictTest.RegistersElementsAsSubmodules 2023-01-11T23:31:34.1313933Z [ OK ] ModuleDictTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:31:34.1314461Z [ RUN ] ModuleDictTest.CloneToDevice_CUDA 2023-01-11T23:31:34.1314951Z [ OK ] ModuleDictTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:31:34.1315322Z [ RUN ] ModuleDictTest.PrettyPrintModuleDict 2023-01-11T23:31:34.1316980Z [ OK ] ModuleDictTest.PrettyPrintModuleDict (0 ms) 2023-01-11T23:31:34.1327999Z [----------] 13 tests from ModuleDictTest (5 ms total) 2023-01-11T23:31:34.1328492Z 2023-01-11T23:31:34.1328972Z [----------] 16 tests from ModuleListTest 2023-01-11T23:31:34.1329483Z [ RUN ] ModuleListTest.ConstructsFromSharedPointer 2023-01-11T23:31:34.1330040Z [ OK ] ModuleListTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:31:34.1330536Z [ RUN ] ModuleListTest.ConstructsFromConcreteType 2023-01-11T23:31:34.1330935Z [ OK ] ModuleListTest.ConstructsFromConcreteType (0 ms) 2023-01-11T23:31:34.1331329Z [ RUN ] ModuleListTest.ConstructsFromModuleHolder 2023-01-11T23:31:34.1331735Z [ OK ] ModuleListTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:31:34.1332108Z [ RUN ] ModuleListTest.PushBackAddsAnElement 2023-01-11T23:31:34.1332484Z [ OK ] ModuleListTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:31:34.1332823Z [ RUN ] ModuleListTest.Insertion 2023-01-11T23:31:34.1333142Z [ OK ] ModuleListTest.Insertion (0 ms) 2023-01-11T23:31:34.1333453Z [ RUN ] ModuleListTest.AccessWithAt 2023-01-11T23:31:34.1341387Z [ OK ] ModuleListTest.AccessWithAt (2 ms) 2023-01-11T23:31:34.1341835Z [ RUN ] ModuleListTest.AccessWithPtr 2023-01-11T23:31:34.1364387Z [ OK ] ModuleListTest.AccessWithPtr (2 ms) 2023-01-11T23:31:34.1365022Z [ RUN ] ModuleListTest.SanityCheckForHoldingStandardModules 2023-01-11T23:31:34.1365768Z [ OK ] ModuleListTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:31:34.1366422Z [ RUN ] ModuleListTest.ExtendPushesModulesFromOtherModuleList 2023-01-11T23:31:34.1367002Z [ OK ] ModuleListTest.ExtendPushesModulesFromOtherModuleList (0 ms) 2023-01-11T23:31:34.1367418Z [ RUN ] ModuleListTest.HasReferenceSemantics 2023-01-11T23:31:34.1367798Z [ OK ] ModuleListTest.HasReferenceSemantics (0 ms) 2023-01-11T23:31:34.1368133Z [ RUN ] ModuleListTest.IsCloneable 2023-01-11T23:31:34.1370071Z [ OK ] ModuleListTest.IsCloneable (0 ms) 2023-01-11T23:31:34.1370669Z [ RUN ] ModuleListTest.RegistersElementsAsSubmodules 2023-01-11T23:31:34.1371181Z [ OK ] ModuleListTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:31:34.1371554Z [ RUN ] ModuleListTest.NestingIsPossible 2023-01-11T23:31:34.1371908Z [ OK ] ModuleListTest.NestingIsPossible (0 ms) 2023-01-11T23:31:34.1372255Z [ RUN ] ModuleListTest.CloneToDevice_CUDA 2023-01-11T23:31:34.1372904Z [ OK ] ModuleListTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:31:34.1373479Z [ RUN ] ModuleListTest.PrettyPrintModuleList 2023-01-11T23:31:34.1374924Z [ OK ] 
ModuleListTest.PrettyPrintModuleList (0 ms) 2023-01-11T23:31:34.1375429Z [ RUN ] ModuleListTest.RangeBasedForLoop 2023-01-11T23:31:34.1375795Z [ OK ] ModuleListTest.RangeBasedForLoop (0 ms) 2023-01-11T23:31:34.1376158Z [----------] 16 tests from ModuleListTest (5 ms total) 2023-01-11T23:31:34.1376326Z 2023-01-11T23:31:34.1376479Z [----------] 258 tests from ModulesTest 2023-01-11T23:31:34.1376762Z [ RUN ] ModulesTest.Conv1d 2023-01-11T23:31:34.1385348Z [ OK ] ModulesTest.Conv1d (0 ms) 2023-01-11T23:31:34.1385673Z [ RUN ] ModulesTest.Conv1dSameStrided 2023-01-11T23:31:34.1407527Z [ OK ] ModulesTest.Conv1dSameStrided (2 ms) 2023-01-11T23:31:34.1407849Z [ RUN ] ModulesTest.Conv2dEven 2023-01-11T23:31:34.1410625Z [ OK ] ModulesTest.Conv2dEven (0 ms) 2023-01-11T23:31:34.1410956Z [ RUN ] ModulesTest.Conv2dUneven 2023-01-11T23:31:34.1413250Z [ OK ] ModulesTest.Conv2dUneven (0 ms) 2023-01-11T23:31:34.1413607Z [ RUN ] ModulesTest.Conv2dSameStrided 2023-01-11T23:31:34.1459666Z [ OK ] ModulesTest.Conv2dSameStrided (4 ms) 2023-01-11T23:31:34.1459986Z [ RUN ] ModulesTest.Conv3d 2023-01-11T23:31:34.1463696Z [ OK ] ModulesTest.Conv3d (0 ms) 2023-01-11T23:31:34.1464007Z [ RUN ] ModulesTest.Conv3dSameStrided 2023-01-11T23:31:34.1508889Z [ OK ] ModulesTest.Conv3dSameStrided (4 ms) 2023-01-11T23:31:34.1509229Z [ RUN ] ModulesTest.ConvTranspose1d 2023-01-11T23:31:34.1512814Z [ OK ] ModulesTest.ConvTranspose1d (0 ms) 2023-01-11T23:31:34.1513197Z [ RUN ] ModulesTest.ConvTranspose2dEven 2023-01-11T23:31:34.1516215Z [ OK ] ModulesTest.ConvTranspose2dEven (0 ms) 2023-01-11T23:31:34.1516583Z [ RUN ] ModulesTest.ConvTranspose2dUneven 2023-01-11T23:31:34.1519924Z [ OK ] ModulesTest.ConvTranspose2dUneven (0 ms) 2023-01-11T23:31:34.1520274Z [ RUN ] ModulesTest.ConvTranspose3d 2023-01-11T23:31:34.1523038Z [ OK ] ModulesTest.ConvTranspose3d (0 ms) 2023-01-11T23:31:34.1523535Z [ RUN ] ModulesTest.MaxPool1d 2023-01-11T23:31:34.1524890Z [ OK ] ModulesTest.MaxPool1d (0 ms) 2023-01-11T23:31:34.1525231Z [ RUN ] ModulesTest.MaxPool1dReturnIndices 2023-01-11T23:31:34.1526058Z [ OK ] ModulesTest.MaxPool1dReturnIndices (0 ms) 2023-01-11T23:31:34.1526401Z [ RUN ] ModulesTest.MaxPool2dEven 2023-01-11T23:31:34.1527756Z [ OK ] ModulesTest.MaxPool2dEven (0 ms) 2023-01-11T23:31:34.1528087Z [ RUN ] ModulesTest.MaxPool2dUneven 2023-01-11T23:31:34.1528643Z [ OK ] ModulesTest.MaxPool2dUneven (0 ms) 2023-01-11T23:31:34.1529191Z [ RUN ] ModulesTest.MaxPool2dReturnIndices 2023-01-11T23:31:34.1530766Z [ OK ] ModulesTest.MaxPool2dReturnIndices (0 ms) 2023-01-11T23:31:34.1531154Z [ RUN ] ModulesTest.MaxPool3d 2023-01-11T23:31:34.1532508Z [ OK ] ModulesTest.MaxPool3d (0 ms) 2023-01-11T23:31:34.1532905Z [ RUN ] ModulesTest.MaxPool3dReturnIndices 2023-01-11T23:31:34.1534763Z [ OK ] ModulesTest.MaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1535140Z [ RUN ] ModulesTest.AvgPool1d 2023-01-11T23:31:34.1535924Z [ OK ] ModulesTest.AvgPool1d (0 ms) 2023-01-11T23:31:34.1536283Z [ RUN ] ModulesTest.AvgPool2dEven 2023-01-11T23:31:34.1537841Z [ OK ] ModulesTest.AvgPool2dEven (0 ms) 2023-01-11T23:31:34.1538197Z [ RUN ] ModulesTest.AvgPool2dUneven 2023-01-11T23:31:34.1539413Z [ OK ] ModulesTest.AvgPool2dUneven (0 ms) 2023-01-11T23:31:34.1539777Z [ RUN ] ModulesTest.AvgPool3d 2023-01-11T23:31:34.1540264Z [ OK ] ModulesTest.AvgPool3d (0 ms) 2023-01-11T23:31:34.1540627Z [ RUN ] ModulesTest.FractionalMaxPool2d 2023-01-11T23:31:34.1542407Z [ OK ] ModulesTest.FractionalMaxPool2d (0 ms) 2023-01-11T23:31:34.1542850Z [ RUN ] 
ModulesTest.FractionalMaxPool2dReturnIndices 2023-01-11T23:31:34.1543776Z [ OK ] ModulesTest.FractionalMaxPool2dReturnIndices (0 ms) 2023-01-11T23:31:34.1544204Z [ RUN ] ModulesTest.FractionalMaxPool3d 2023-01-11T23:31:34.1545780Z [ OK ] ModulesTest.FractionalMaxPool3d (0 ms) 2023-01-11T23:31:34.1546208Z [ RUN ] ModulesTest.FractionalMaxPool3dReturnIndices 2023-01-11T23:31:34.1547578Z [ OK ] ModulesTest.FractionalMaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1547978Z [ RUN ] ModulesTest.LPPool1d 2023-01-11T23:31:34.1548470Z [ OK ] ModulesTest.LPPool1d (0 ms) 2023-01-11T23:31:34.1548805Z [ RUN ] ModulesTest.LPPool2d 2023-01-11T23:31:34.1551688Z [ OK ] ModulesTest.LPPool2d (0 ms) 2023-01-11T23:31:34.1552366Z [ RUN ] ModulesTest.Identity 2023-01-11T23:31:34.1552782Z [ OK ] ModulesTest.Identity (0 ms) 2023-01-11T23:31:34.1553236Z [ RUN ] ModulesTest.Flatten 2023-01-11T23:31:34.1553961Z [ OK ] ModulesTest.Flatten (0 ms) 2023-01-11T23:31:34.1554374Z [ RUN ] ModulesTest.Unflatten 2023-01-11T23:31:34.1555005Z [W TensorImpl.h:1816] Warning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (function operator()) 2023-01-11T23:31:34.1555572Z [ OK ] ModulesTest.Unflatten (0 ms) 2023-01-11T23:31:34.1555896Z [ RUN ] ModulesTest.AdaptiveMaxPool1d 2023-01-11T23:31:34.1556562Z [ OK ] ModulesTest.AdaptiveMaxPool1d (0 ms) 2023-01-11T23:31:34.1557046Z [ RUN ] ModulesTest.AdaptiveMaxPool1dReturnIndices 2023-01-11T23:31:34.1558612Z [ OK ] ModulesTest.AdaptiveMaxPool1dReturnIndices (0 ms) 2023-01-11T23:31:34.1559119Z [ RUN ] ModulesTest.AdaptiveMaxPool2dEven 2023-01-11T23:31:34.1560571Z [ OK ] ModulesTest.AdaptiveMaxPool2dEven (0 ms) 2023-01-11T23:31:34.1561072Z [ RUN ] ModulesTest.AdaptiveMaxPool2dUneven 2023-01-11T23:31:34.1561754Z [ OK ] ModulesTest.AdaptiveMaxPool2dUneven (0 ms) 2023-01-11T23:31:34.1562251Z [ RUN ] ModulesTest.AdaptiveMaxPool2dReturnIndicesEven 2023-01-11T23:31:34.1564785Z [ OK ] ModulesTest.AdaptiveMaxPool2dReturnIndicesEven (0 ms) 2023-01-11T23:31:34.1565369Z [ RUN ] ModulesTest.AdaptiveMaxPool2dReturnIndicesUneven 2023-01-11T23:31:34.1567273Z [ OK ] ModulesTest.AdaptiveMaxPool2dReturnIndicesUneven (0 ms) 2023-01-11T23:31:34.1567883Z [ RUN ] ModulesTest.AdaptiveMaxPool3d 2023-01-11T23:31:34.1569219Z [ OK ] ModulesTest.AdaptiveMaxPool3d (0 ms) 2023-01-11T23:31:34.1569718Z [ RUN ] ModulesTest.AdaptiveMaxPool3dReturnIndices 2023-01-11T23:31:34.1572331Z [ OK ] ModulesTest.AdaptiveMaxPool3dReturnIndices (0 ms) 2023-01-11T23:31:34.1572826Z [ RUN ] ModulesTest.AdaptiveAvgPool1d 2023-01-11T23:31:34.1573770Z [ OK ] ModulesTest.AdaptiveAvgPool1d (0 ms) 2023-01-11T23:31:34.1574233Z [ RUN ] ModulesTest.AdaptiveAvgPool2dEven 2023-01-11T23:31:34.1576066Z [ OK ] ModulesTest.AdaptiveAvgPool2dEven (0 ms) 2023-01-11T23:31:34.1576568Z [ RUN ] ModulesTest.AdaptiveAvgPool2dUneven 2023-01-11T23:31:34.1577456Z [ OK ] ModulesTest.AdaptiveAvgPool2dUneven (0 ms) 2023-01-11T23:31:34.1577927Z [ RUN ] ModulesTest.AdaptiveAvgPool3d 2023-01-11T23:31:34.1579778Z [ OK ] ModulesTest.AdaptiveAvgPool3d (0 ms) 2023-01-11T23:31:34.1580235Z [ RUN ] ModulesTest.MaxUnpool1d 2023-01-11T23:31:34.1581993Z [ OK ] ModulesTest.MaxUnpool1d (0 ms) 2023-01-11T23:31:34.1582442Z [ RUN ] ModulesTest.MaxPool1d_MaxUnpool1d 2023-01-11T23:31:34.1585102Z [ OK ] ModulesTest.MaxPool1d_MaxUnpool1d (0 ms) 2023-01-11T23:31:34.1585548Z [ RUN ] ModulesTest.MaxUnpool2d 2023-01-11T23:31:34.1587519Z [ OK ] 
ModulesTest.MaxUnpool2d (0 ms) 2023-01-11T23:31:34.1587971Z [ RUN ] ModulesTest.MaxPool2d_MaxUnpool2d 2023-01-11T23:31:34.1590198Z [ OK ] ModulesTest.MaxPool2d_MaxUnpool2d (0 ms) 2023-01-11T23:31:34.1590637Z [ RUN ] ModulesTest.MaxUnpool3d 2023-01-11T23:31:34.1592602Z [ OK ] ModulesTest.MaxUnpool3d (0 ms) 2023-01-11T23:31:34.1593012Z [ RUN ] ModulesTest.MaxUnpool3dOutputSize 2023-01-11T23:31:34.1594404Z [ OK ] ModulesTest.MaxUnpool3dOutputSize (0 ms) 2023-01-11T23:31:34.1594856Z [ RUN ] ModulesTest.MaxPool3d_MaxUnpool3d 2023-01-11T23:31:34.2473561Z [ OK ] ModulesTest.MaxPool3d_MaxUnpool3d (87 ms) 2023-01-11T23:31:34.2473975Z [ RUN ] ModulesTest.Linear 2023-01-11T23:31:34.2475880Z [ OK ] ModulesTest.Linear (0 ms) 2023-01-11T23:31:34.2476521Z [ RUN ] ModulesTest.LocalResponseNorm 2023-01-11T23:31:34.2480563Z [ OK ] ModulesTest.LocalResponseNorm (0 ms) 2023-01-11T23:31:34.2480928Z [ RUN ] ModulesTest.LayerNorm 2023-01-11T23:31:34.2481889Z [ OK ] ModulesTest.LayerNorm (0 ms) 2023-01-11T23:31:34.2482200Z [ RUN ] ModulesTest.GroupNorm 2023-01-11T23:31:34.2484599Z [ OK ] ModulesTest.GroupNorm (0 ms) 2023-01-11T23:31:34.2484891Z [ RUN ] ModulesTest.Bilinear 2023-01-11T23:31:34.2488500Z [ OK ] ModulesTest.Bilinear (0 ms) 2023-01-11T23:31:34.2488795Z [ RUN ] ModulesTest.Fold 2023-01-11T23:31:34.2514806Z [ OK ] ModulesTest.Fold (2 ms) 2023-01-11T23:31:34.2515455Z [ RUN ] ModulesTest.Unfold 2023-01-11T23:31:34.2580153Z [ OK ] ModulesTest.Unfold (6 ms) 2023-01-11T23:31:34.2580577Z [ RUN ] ModulesTest.SimpleContainer 2023-01-11T23:31:34.2592381Z [ OK ] ModulesTest.SimpleContainer (1 ms) 2023-01-11T23:31:34.2592786Z [ RUN ] ModulesTest.EmbeddingBasic 2023-01-11T23:31:34.2593869Z [ OK ] ModulesTest.EmbeddingBasic (0 ms) 2023-01-11T23:31:34.2594299Z [ RUN ] ModulesTest.EmbeddingList 2023-01-11T23:31:34.2594713Z [ OK ] ModulesTest.EmbeddingList (0 ms) 2023-01-11T23:31:34.2595059Z [ RUN ] ModulesTest.EmbeddingFromPretrained 2023-01-11T23:31:34.2595420Z [ OK ] ModulesTest.EmbeddingFromPretrained (0 ms) 2023-01-11T23:31:34.2595791Z [ RUN ] ModulesTest.EmbeddingBagFromPretrained 2023-01-11T23:31:34.2596719Z [ OK ] ModulesTest.EmbeddingBagFromPretrained (0 ms) 2023-01-11T23:31:34.2597073Z [ RUN ] ModulesTest.AlphaDropout 2023-01-11T23:31:34.2597521Z [ OK ] ModulesTest.AlphaDropout (0 ms) 2023-01-11T23:31:34.2597862Z [ RUN ] ModulesTest.FeatureAlphaDropout 2023-01-11T23:31:34.2599085Z [ OK ] ModulesTest.FeatureAlphaDropout (0 ms) 2023-01-11T23:31:34.2599409Z [ RUN ] ModulesTest.Dropout 2023-01-11T23:31:34.2600963Z [ OK ] ModulesTest.Dropout (0 ms) 2023-01-11T23:31:34.2601265Z [ RUN ] ModulesTest.Dropout2d 2023-01-11T23:31:34.2605743Z [ OK ] ModulesTest.Dropout2d (0 ms) 2023-01-11T23:31:34.2606041Z [ RUN ] ModulesTest.Dropout3d 2023-01-11T23:31:34.2611468Z [ OK ] ModulesTest.Dropout3d (0 ms) 2023-01-11T23:31:34.2611796Z [ RUN ] ModulesTest.Parameters 2023-01-11T23:31:34.2612203Z [ OK ] ModulesTest.Parameters (0 ms) 2023-01-11T23:31:34.2612636Z [ RUN ] ModulesTest.FunctionalCallsSuppliedFunction 2023-01-11T23:31:34.2613068Z [ OK ] ModulesTest.FunctionalCallsSuppliedFunction (0 ms) 2023-01-11T23:31:34.2613525Z [ RUN ] ModulesTest.FunctionalWithTorchFunction 2023-01-11T23:31:34.2614004Z [ OK ] ModulesTest.FunctionalWithTorchFunction (0 ms) 2023-01-11T23:31:34.2614629Z [ RUN ] ModulesTest.FunctionalArgumentBinding 2023-01-11T23:31:34.2615014Z [ OK ] ModulesTest.FunctionalArgumentBinding (0 ms) 2023-01-11T23:31:34.2615378Z [ RUN ] ModulesTest.BatchNorm1dStateful 2023-01-11T23:31:34.2615715Z [ OK ] 
ModulesTest.BatchNorm1dStateful (0 ms) 2023-01-11T23:31:34.2616063Z [ RUN ] ModulesTest.BatchNorm1dStateless 2023-01-11T23:31:34.2616434Z [ OK ] ModulesTest.BatchNorm1dStateless (0 ms) 2023-01-11T23:31:34.2616863Z [ RUN ] ModulesTest.BatchNorm1d 2023-01-11T23:31:34.2617295Z [ OK ] ModulesTest.BatchNorm1d (0 ms) 2023-01-11T23:31:34.2617736Z [ RUN ] ModulesTest.BatchNorm2dStateful 2023-01-11T23:31:34.2618178Z [ OK ] ModulesTest.BatchNorm2dStateful (0 ms) 2023-01-11T23:31:34.2618517Z [ RUN ] ModulesTest.BatchNorm2dStateless 2023-01-11T23:31:34.2618956Z [ OK ] ModulesTest.BatchNorm2dStateless (0 ms) 2023-01-11T23:31:34.2619283Z [ RUN ] ModulesTest.BatchNorm2d 2023-01-11T23:31:34.2620991Z [ OK ] ModulesTest.BatchNorm2d (0 ms) 2023-01-11T23:31:34.2621447Z [ RUN ] ModulesTest.BatchNorm3dStateful 2023-01-11T23:31:34.2621914Z [ OK ] ModulesTest.BatchNorm3dStateful (0 ms) 2023-01-11T23:31:34.2622373Z [ RUN ] ModulesTest.BatchNorm3dStateless 2023-01-11T23:31:34.2622833Z [ OK ] ModulesTest.BatchNorm3dStateless (0 ms) 2023-01-11T23:31:34.2623192Z [ RUN ] ModulesTest.BatchNorm3d 2023-01-11T23:31:34.2626116Z [ OK ] ModulesTest.BatchNorm3d (0 ms) 2023-01-11T23:31:34.2627271Z [ RUN ] ModulesTest.InstanceNorm1dStateful 2023-01-11T23:31:34.2627865Z [ OK ] ModulesTest.InstanceNorm1dStateful (0 ms) 2023-01-11T23:31:34.2628396Z [ RUN ] ModulesTest.InstanceNorm1dStateless 2023-01-11T23:31:34.2628946Z [ OK ] ModulesTest.InstanceNorm1dStateless (0 ms) 2023-01-11T23:31:34.2629442Z [ RUN ] ModulesTest.InstanceNorm1d 2023-01-11T23:31:34.2629920Z [ OK ] ModulesTest.InstanceNorm1d (0 ms) 2023-01-11T23:31:34.2630500Z [ RUN ] ModulesTest.InstanceNorm2dStateful 2023-01-11T23:31:34.2631032Z [ OK ] ModulesTest.InstanceNorm2dStateful (0 ms) 2023-01-11T23:31:34.2631561Z [ RUN ] ModulesTest.InstanceNorm2dStateless 2023-01-11T23:31:34.2632096Z [ OK ] ModulesTest.InstanceNorm2dStateless (0 ms) 2023-01-11T23:31:34.2632915Z [ RUN ] ModulesTest.InstanceNorm2d 2023-01-11T23:31:34.2634028Z [ OK ] ModulesTest.InstanceNorm2d (0 ms) 2023-01-11T23:31:34.2634539Z [ RUN ] ModulesTest.InstanceNorm3dStateful 2023-01-11T23:31:34.2635084Z [ OK ] ModulesTest.InstanceNorm3dStateful (0 ms) 2023-01-11T23:31:34.2635612Z [ RUN ] ModulesTest.InstanceNorm3dStateless 2023-01-11T23:31:34.2636143Z [ OK ] ModulesTest.InstanceNorm3dStateless (0 ms) 2023-01-11T23:31:34.2636647Z [ RUN ] ModulesTest.InstanceNorm3d 2023-01-11T23:31:34.2640144Z [ OK ] ModulesTest.InstanceNorm3d (0 ms) 2023-01-11T23:31:34.2640613Z [ RUN ] ModulesTest.Linear_CUDA 2023-01-11T23:31:34.2647081Z [ OK ] ModulesTest.Linear_CUDA (0 ms) 2023-01-11T23:31:34.2647523Z [ RUN ] ModulesTest.Linear2_CUDA 2023-01-11T23:31:34.2649575Z [ OK ] ModulesTest.Linear2_CUDA (0 ms) 2023-01-11T23:31:34.2649893Z [ RUN ] ModulesTest.L1Loss 2023-01-11T23:31:34.2650894Z [ OK ] ModulesTest.L1Loss (0 ms) 2023-01-11T23:31:34.2651189Z [ RUN ] ModulesTest.MSELoss 2023-01-11T23:31:34.2652901Z [ OK ] ModulesTest.MSELoss (0 ms) 2023-01-11T23:31:34.2653297Z [ RUN ] ModulesTest.BCELoss 2023-01-11T23:31:34.2653595Z [ OK ] ModulesTest.BCELoss (0 ms) 2023-01-11T23:31:34.2653883Z [ RUN ] ModulesTest.KLDivLoss 2023-01-11T23:31:34.2654794Z [W loss.h:57] Warning: reduction: 'mean' divides the total loss by both the batch size and the support size.'batchmean' divides only by the batch size, and aligns with the KL div math definition.'mean' will be changed to behave the same as 'batchmean' in the next major release. 
(function kl_div) 2023-01-11T23:31:34.2655656Z [ OK ] ModulesTest.KLDivLoss (0 ms) 2023-01-11T23:31:34.2655991Z [ RUN ] ModulesTest.HingeEmbeddingLoss 2023-01-11T23:31:34.2658518Z [ OK ] ModulesTest.HingeEmbeddingLoss (0 ms) 2023-01-11T23:31:34.2658869Z [ RUN ] ModulesTest.MultiMarginLoss 2023-01-11T23:31:34.2660380Z [ OK ] ModulesTest.MultiMarginLoss (0 ms) 2023-01-11T23:31:34.2660732Z [ RUN ] ModulesTest.CosineEmbeddingLoss 2023-01-11T23:31:34.2664728Z [ OK ] ModulesTest.CosineEmbeddingLoss (0 ms) 2023-01-11T23:31:34.2665309Z [ RUN ] ModulesTest.SmoothL1LossDefaultOptions 2023-01-11T23:31:34.2666224Z [ OK ] ModulesTest.SmoothL1LossDefaultOptions (0 ms) 2023-01-11T23:31:34.2666619Z [ RUN ] ModulesTest.HuberLossDefaultOptions 2023-01-11T23:31:34.2667474Z [ OK ] ModulesTest.HuberLossDefaultOptions (0 ms) 2023-01-11T23:31:34.2667899Z [ RUN ] ModulesTest.MultiLabelMarginLossDefaultOptions 2023-01-11T23:31:34.2669343Z [ OK ] ModulesTest.MultiLabelMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.2669771Z [ RUN ] ModulesTest.SmoothL1LossNoReduction 2023-01-11T23:31:34.2670507Z [ OK ] ModulesTest.SmoothL1LossNoReduction (0 ms) 2023-01-11T23:31:34.2671170Z [ RUN ] ModulesTest.HuberLossNoReduction 2023-01-11T23:31:34.2673440Z [ OK ] ModulesTest.HuberLossNoReduction (0 ms) 2023-01-11T23:31:34.2673909Z [ RUN ] ModulesTest.MultiLabelMarginLossNoReduction 2023-01-11T23:31:34.2674327Z [ OK ] ModulesTest.MultiLabelMarginLossNoReduction (0 ms) 2023-01-11T23:31:34.2674703Z [ RUN ] ModulesTest.SmoothL1LossBeta 2023-01-11T23:31:34.2675084Z [ OK ] ModulesTest.SmoothL1LossBeta (0 ms) 2023-01-11T23:31:34.2675407Z [ RUN ] ModulesTest.HuberLossDelta 2023-01-11T23:31:34.2676133Z [ OK ] ModulesTest.HuberLossDelta (0 ms) 2023-01-11T23:31:34.2676476Z [ RUN ] ModulesTest.TripletMarginLoss 2023-01-11T23:31:34.2679534Z [ OK ] ModulesTest.TripletMarginLoss (0 ms) 2023-01-11T23:31:34.2680056Z [ RUN ] ModulesTest.TripletMarginWithDistanceLossDefaultParity 2023-01-11T23:31:34.2819330Z [ OK ] ModulesTest.TripletMarginWithDistanceLossDefaultParity (13 ms) 2023-01-11T23:31:34.2819826Z [ RUN ] ModulesTest.TripletMarginWithDistanceLossFunctionalParity 2023-01-11T23:31:34.3061864Z [ OK ] ModulesTest.TripletMarginWithDistanceLossFunctionalParity (24 ms) 2023-01-11T23:31:34.3062271Z [ RUN ] ModulesTest.NLLLoss 2023-01-11T23:31:34.3064094Z [ OK ] ModulesTest.NLLLoss (0 ms) 2023-01-11T23:31:34.3064643Z [ RUN ] ModulesTest.CrossEntropyLoss 2023-01-11T23:31:34.3070961Z [ OK ] ModulesTest.CrossEntropyLoss (0 ms) 2023-01-11T23:31:34.3072234Z [ RUN ] ModulesTest.CosineSimilarity 2023-01-11T23:31:34.3074146Z [ OK ] ModulesTest.CosineSimilarity (0 ms) 2023-01-11T23:31:34.3074633Z [ RUN ] ModulesTest.SoftMarginLossDefaultOptions 2023-01-11T23:31:34.3075686Z [ OK ] ModulesTest.SoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.3076126Z [ RUN ] ModulesTest.MultiLabelSoftMarginLossDefaultOptions 2023-01-11T23:31:34.3078759Z [ OK ] ModulesTest.MultiLabelSoftMarginLossDefaultOptions (0 ms) 2023-01-11T23:31:34.3079177Z [ RUN ] ModulesTest.SoftMarginLossNoReduction 2023-01-11T23:31:34.3080636Z [ OK ] ModulesTest.SoftMarginLossNoReduction (0 ms) 2023-01-11T23:31:34.3081081Z [ RUN ] ModulesTest.MultiLabelSoftMarginLossWeightedNoReduction 2023-01-11T23:31:34.3083661Z [ OK ] ModulesTest.MultiLabelSoftMarginLossWeightedNoReduction (0 ms) 2023-01-11T23:31:34.3084061Z [ RUN ] ModulesTest.PairwiseDistance 2023-01-11T23:31:34.3085499Z [ OK ] ModulesTest.PairwiseDistance (0 ms) 2023-01-11T23:31:34.3085987Z [ RUN ] ModulesTest.ELU 
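The KLDivLoss run above emits a warning from loss.h: reduction 'mean' divides the total loss by both the batch size and the support size, while 'batchmean' divides only by the batch size and matches the mathematical definition of KL divergence. A minimal C++-frontend sketch of opting into 'batchmean' explicitly; the shapes and values are illustrative, not taken from the test:

#include <torch/torch.h>
#include <iostream>

int main() {
  // kl_div expects log-probabilities as input and probabilities as target.
  auto input  = torch::log_softmax(torch::randn({4, 5}), /*dim=*/1);
  auto target = torch::softmax(torch::randn({4, 5}), /*dim=*/1);

  // 'batchmean' divides only by the batch size (4 here), matching the KL
  // divergence definition; the default 'mean' also divides by the support
  // size (5 here) and, per the warning, is slated to change.
  torch::nn::KLDivLoss loss(
      torch::nn::KLDivLossOptions().reduction(torch::kBatchMean));
  std::cout << loss(input, target) << std::endl;
}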
2023-01-11T23:31:34.3096974Z [ OK ] ModulesTest.ELU (1 ms) 2023-01-11T23:31:34.3097276Z [ RUN ] ModulesTest.SELU 2023-01-11T23:31:34.3099760Z [ OK ] ModulesTest.SELU (0 ms) 2023-01-11T23:31:34.3100060Z [ RUN ] ModulesTest.Hardshrink 2023-01-11T23:31:34.3107363Z [ OK ] ModulesTest.Hardshrink (0 ms) 2023-01-11T23:31:34.3107671Z [ RUN ] ModulesTest.Hardtanh 2023-01-11T23:31:34.3136114Z [ OK ] ModulesTest.Hardtanh (2 ms) 2023-01-11T23:31:34.3136455Z [ RUN ] ModulesTest.HardtanhMinValGEMaxVal 2023-01-11T23:31:34.3214200Z [ OK ] ModulesTest.HardtanhMinValGEMaxVal (7 ms) 2023-01-11T23:31:34.3214774Z [ RUN ] ModulesTest.LeakyReLU 2023-01-11T23:31:34.3227426Z [ OK ] ModulesTest.LeakyReLU (1 ms) 2023-01-11T23:31:34.3227734Z [ RUN ] ModulesTest.LogSigmoid 2023-01-11T23:31:34.3229458Z [ OK ] ModulesTest.LogSigmoid (0 ms) 2023-01-11T23:31:34.3229822Z [ RUN ] ModulesTest.Softmax 2023-01-11T23:31:34.3230679Z [ OK ] ModulesTest.Softmax (0 ms) 2023-01-11T23:31:34.3231161Z [ RUN ] ModulesTest.Softmin 2023-01-11T23:31:34.3233715Z [ OK ] ModulesTest.Softmin (0 ms) 2023-01-11T23:31:34.3234032Z [ RUN ] ModulesTest.LogSoftmax 2023-01-11T23:31:34.3234341Z [ OK ] ModulesTest.LogSoftmax (0 ms) 2023-01-11T23:31:34.3234694Z [ RUN ] ModulesTest.AdaptiveLogSoftmaxWithLoss 2023-01-11T23:31:34.3257190Z [ OK ] ModulesTest.AdaptiveLogSoftmaxWithLoss (2 ms) 2023-01-11T23:31:34.3257538Z [ RUN ] ModulesTest.Softmax2d 2023-01-11T23:31:34.3268099Z [ OK ] ModulesTest.Softmax2d (1 ms) 2023-01-11T23:31:34.3268631Z [ RUN ] ModulesTest.PReLU 2023-01-11T23:31:34.3273211Z [ OK ] ModulesTest.PReLU (0 ms) 2023-01-11T23:31:34.3273580Z [ RUN ] ModulesTest.ReLU 2023-01-11T23:31:34.3274808Z [ OK ] ModulesTest.ReLU (0 ms) 2023-01-11T23:31:34.3275211Z [ RUN ] ModulesTest.ReLU6 2023-01-11T23:31:34.3278040Z [ OK ] ModulesTest.ReLU6 (0 ms) 2023-01-11T23:31:34.3278330Z [ RUN ] ModulesTest.RReLU 2023-01-11T23:31:34.3320954Z [ OK ] ModulesTest.RReLU (4 ms) 2023-01-11T23:31:34.3321296Z [ RUN ] ModulesTest.CELU 2023-01-11T23:31:34.3330759Z [ OK ] ModulesTest.CELU (0 ms) 2023-01-11T23:31:34.3331041Z [ RUN ] ModulesTest.GLU 2023-01-11T23:31:34.3332923Z [ OK ] ModulesTest.GLU (0 ms) 2023-01-11T23:31:34.3333292Z [ RUN ] ModulesTest.GELU 2023-01-11T23:31:34.3335047Z [ OK ] ModulesTest.GELU (0 ms) 2023-01-11T23:31:34.3335347Z [ RUN ] ModulesTest.TanhGELU 2023-01-11T23:31:34.3335859Z [ OK ] ModulesTest.TanhGELU (0 ms) 2023-01-11T23:31:34.3336187Z [ RUN ] ModulesTest.Mish 2023-01-11T23:31:34.3337337Z [ OK ] ModulesTest.Mish (0 ms) 2023-01-11T23:31:34.3337627Z [ RUN ] ModulesTest.Sigmoid 2023-01-11T23:31:34.3337924Z [ OK ] ModulesTest.Sigmoid (0 ms) 2023-01-11T23:31:34.3338239Z [ RUN ] ModulesTest.PixelShuffle 2023-01-11T23:31:34.3340385Z [ OK ] ModulesTest.PixelShuffle (0 ms) 2023-01-11T23:31:34.3340712Z [ RUN ] ModulesTest.PixelUnshuffle 2023-01-11T23:31:34.3341398Z [ OK ] ModulesTest.PixelUnshuffle (0 ms) 2023-01-11T23:31:34.3341712Z [ RUN ] ModulesTest.Softplus 2023-01-11T23:31:34.3348616Z [ OK ] ModulesTest.Softplus (0 ms) 2023-01-11T23:31:34.3348922Z [ RUN ] ModulesTest.Softshrink 2023-01-11T23:31:34.3355355Z [ OK ] ModulesTest.Softshrink (0 ms) 2023-01-11T23:31:34.3355717Z [ RUN ] ModulesTest.Softsign 2023-01-11T23:31:34.3356073Z [ OK ] ModulesTest.Softsign (0 ms) 2023-01-11T23:31:34.3356360Z [ RUN ] ModulesTest.Tanh 2023-01-11T23:31:34.3356653Z [ OK ] ModulesTest.Tanh (0 ms) 2023-01-11T23:31:34.3357012Z [ RUN ] ModulesTest.Tanhshrink 2023-01-11T23:31:34.3357325Z [ OK ] ModulesTest.Tanhshrink (0 ms) 2023-01-11T23:31:34.3357625Z [ RUN ] 
ModulesTest.Threshold 2023-01-11T23:31:34.3370644Z [ OK ] ModulesTest.Threshold (1 ms) 2023-01-11T23:31:34.3371075Z [ RUN ] ModulesTest.Upsampling1D 2023-01-11T23:31:34.3372029Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3380983Z [ OK ] ModulesTest.Upsampling1D (1 ms) 2023-01-11T23:31:34.3381333Z [ RUN ] ModulesTest.Upsampling2D 2023-01-11T23:31:34.3382084Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3396282Z [ OK ] ModulesTest.Upsampling2D (1 ms) 2023-01-11T23:31:34.3396719Z [ RUN ] ModulesTest.Upsampling3D 2023-01-11T23:31:34.3403276Z [W upsampling.h:66] Warning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and uses scale_factor directly, instead of relying on the computed output size. 
If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details. (function _interp_output_size) 2023-01-11T23:31:34.3405760Z [ OK ] ModulesTest.Upsampling3D (0 ms) 2023-01-11T23:31:34.3406080Z [ RUN ] ModulesTest.CTCLoss 2023-01-11T23:31:34.3408897Z [ OK ] ModulesTest.CTCLoss (0 ms) 2023-01-11T23:31:34.3409309Z [ RUN ] ModulesTest.PoissonNLLLoss 2023-01-11T23:31:34.3410665Z [ OK ] ModulesTest.PoissonNLLLoss (0 ms) 2023-01-11T23:31:34.3411089Z [ RUN ] ModulesTest.MarginRankingLoss 2023-01-11T23:31:34.3414203Z [ OK ] ModulesTest.MarginRankingLoss (0 ms) 2023-01-11T23:31:34.3414780Z [ RUN ] ModulesTest.BCEWithLogitsLoss 2023-01-11T23:31:34.3472502Z [ OK ] ModulesTest.BCEWithLogitsLoss (5 ms) 2023-01-11T23:31:34.3472936Z [ RUN ] ModulesTest.MultiheadAttention 2023-01-11T23:31:43.6439465Z [ OK ] ModulesTest.MultiheadAttention (9296 ms) 2023-01-11T23:31:43.6440173Z [ RUN ] ModulesTest.PrettyPrintIdentity 2023-01-11T23:31:43.6440632Z [ OK ] ModulesTest.PrettyPrintIdentity (0 ms) 2023-01-11T23:31:43.6441043Z [ RUN ] ModulesTest.PrettyPrintFlatten 2023-01-11T23:31:43.6441508Z [ OK ] ModulesTest.PrettyPrintFlatten (0 ms) 2023-01-11T23:31:43.6441934Z [ RUN ] ModulesTest.PrettyPrintUnflatten 2023-01-11T23:31:43.6442296Z [ OK ] ModulesTest.PrettyPrintUnflatten (0 ms) 2023-01-11T23:31:43.6442632Z [ RUN ] ModulesTest.ReflectionPad1d 2023-01-11T23:31:43.6442970Z [ OK ] ModulesTest.ReflectionPad1d (0 ms) 2023-01-11T23:31:43.6443400Z [ RUN ] ModulesTest.ReflectionPad2d 2023-01-11T23:31:43.6443732Z [ OK ] ModulesTest.ReflectionPad2d (0 ms) 2023-01-11T23:31:43.6444050Z [ RUN ] ModulesTest.ReflectionPad3d 2023-01-11T23:31:43.6446478Z [ OK ] ModulesTest.ReflectionPad3d (0 ms) 2023-01-11T23:31:43.6446940Z [ RUN ] ModulesTest.ReplicationPad1d 2023-01-11T23:31:43.6447609Z [ OK ] ModulesTest.ReplicationPad1d (0 ms) 2023-01-11T23:31:43.6447934Z [ RUN ] ModulesTest.ReplicationPad2d 2023-01-11T23:31:43.6453419Z [ OK ] ModulesTest.ReplicationPad2d (0 ms) 2023-01-11T23:31:43.6453779Z [ RUN ] ModulesTest.ReplicationPad3d 2023-01-11T23:31:43.6456979Z [ OK ] ModulesTest.ReplicationPad3d (0 ms) 2023-01-11T23:31:43.6457305Z [ RUN ] ModulesTest.ZeroPad2d 2023-01-11T23:31:43.6460857Z [ OK ] ModulesTest.ZeroPad2d (0 ms) 2023-01-11T23:31:43.6461329Z [ RUN ] ModulesTest.ConstantPad1d 2023-01-11T23:31:43.6462453Z [ OK ] ModulesTest.ConstantPad1d (0 ms) 2023-01-11T23:31:43.6462787Z [ RUN ] ModulesTest.ConstantPad2d 2023-01-11T23:31:43.6465743Z [ OK ] ModulesTest.ConstantPad2d (0 ms) 2023-01-11T23:31:43.6466120Z [ RUN ] ModulesTest.ConstantPad3d 2023-01-11T23:31:43.6472221Z [ OK ] ModulesTest.ConstantPad3d (0 ms) 2023-01-11T23:31:43.6472567Z [ RUN ] ModulesTest.CrossMapLRN2d 2023-01-11T23:31:43.6478728Z [ OK ] ModulesTest.CrossMapLRN2d (0 ms) 2023-01-11T23:31:43.6479031Z [ RUN ] ModulesTest.RNNCell 2023-01-11T23:31:43.6481943Z [ OK ] ModulesTest.RNNCell (0 ms) 2023-01-11T23:31:43.6482249Z [ RUN ] ModulesTest.LSTMCell 2023-01-11T23:31:43.6486504Z [ OK ] ModulesTest.LSTMCell (0 ms) 2023-01-11T23:31:43.6486797Z [ RUN ] ModulesTest.GRUCell 2023-01-11T23:31:43.6490136Z [ OK ] ModulesTest.GRUCell (0 ms) 2023-01-11T23:31:43.6490583Z [ RUN ] ModulesTest.PrettyPrintLinear 2023-01-11T23:31:43.6490947Z [ OK ] ModulesTest.PrettyPrintLinear (0 ms) 2023-01-11T23:31:43.6491283Z [ RUN ] ModulesTest.PrettyPrintBilinear 2023-01-11T23:31:43.6491640Z [ OK ] ModulesTest.PrettyPrintBilinear (0 ms) 2023-01-11T23:31:43.6492076Z [ RUN ] ModulesTest.PrettyPrintConv 2023-01-11T23:31:43.6492795Z [ 
OK ] ModulesTest.PrettyPrintConv (0 ms) 2023-01-11T23:31:43.6493914Z [ RUN ] ModulesTest.PrettyPrintConvTranspose 2023-01-11T23:31:43.6494686Z [ OK ] ModulesTest.PrettyPrintConvTranspose (0 ms) 2023-01-11T23:31:43.6495157Z [ RUN ] ModulesTest.PrettyPrintUpsample 2023-01-11T23:31:43.6495581Z [ OK ] ModulesTest.PrettyPrintUpsample (0 ms) 2023-01-11T23:31:43.6496064Z [ RUN ] ModulesTest.PrettyPrintFold 2023-01-11T23:31:43.6496536Z [ OK ] ModulesTest.PrettyPrintFold (0 ms) 2023-01-11T23:31:43.6496979Z [ RUN ] ModulesTest.PrettyPrintUnfold 2023-01-11T23:31:43.6497417Z [ OK ] ModulesTest.PrettyPrintUnfold (0 ms) 2023-01-11T23:31:43.6497887Z [ RUN ] ModulesTest.PrettyPrintMaxPool 2023-01-11T23:31:43.6498673Z [ OK ] ModulesTest.PrettyPrintMaxPool (0 ms) 2023-01-11T23:31:43.6499064Z [ RUN ] ModulesTest.PrettyPrintAvgPool 2023-01-11T23:31:43.6499556Z [ OK ] ModulesTest.PrettyPrintAvgPool (0 ms) 2023-01-11T23:31:43.6500059Z [ RUN ] ModulesTest.PrettyPrinFractionalMaxPool 2023-01-11T23:31:43.6500600Z [ OK ] ModulesTest.PrettyPrinFractionalMaxPool (0 ms) 2023-01-11T23:31:43.6501145Z [ RUN ] ModulesTest.PrettyPrintLPPool 2023-01-11T23:31:43.6501611Z [ OK ] ModulesTest.PrettyPrintLPPool (0 ms) 2023-01-11T23:31:43.6502108Z [ RUN ] ModulesTest.PrettyPrintAdaptiveMaxPool 2023-01-11T23:31:43.6502635Z [ OK ] ModulesTest.PrettyPrintAdaptiveMaxPool (0 ms) 2023-01-11T23:31:43.6503151Z [ RUN ] ModulesTest.PrettyPrintAdaptiveAvgPool 2023-01-11T23:31:43.6503671Z [ OK ] ModulesTest.PrettyPrintAdaptiveAvgPool (0 ms) 2023-01-11T23:31:43.6504150Z [ RUN ] ModulesTest.PrettyPrintMaxUnpool 2023-01-11T23:31:43.6504597Z [ OK ] ModulesTest.PrettyPrintMaxUnpool (0 ms) 2023-01-11T23:31:43.6505011Z [ RUN ] ModulesTest.PrettyPrintDropout 2023-01-11T23:31:43.6505481Z [ OK ] ModulesTest.PrettyPrintDropout (0 ms) 2023-01-11T23:31:43.6505910Z [ RUN ] ModulesTest.PrettyPrintDropout2d 2023-01-11T23:31:43.6506337Z [ OK ] ModulesTest.PrettyPrintDropout2d (0 ms) 2023-01-11T23:31:43.6506826Z [ RUN ] ModulesTest.PrettyPrintDropout3d 2023-01-11T23:31:43.6507270Z [ OK ] ModulesTest.PrettyPrintDropout3d (0 ms) 2023-01-11T23:31:43.6507806Z [ RUN ] ModulesTest.PrettyPrintFunctional 2023-01-11T23:31:43.6508282Z [ OK ] ModulesTest.PrettyPrintFunctional (0 ms) 2023-01-11T23:31:43.6508747Z [ RUN ] ModulesTest.PrettyPrintBatchNorm1d 2023-01-11T23:31:43.6509217Z [ OK ] ModulesTest.PrettyPrintBatchNorm1d (0 ms) 2023-01-11T23:31:43.6509615Z [ RUN ] ModulesTest.PrettyPrintBatchNorm2d 2023-01-11T23:31:43.6510088Z [ OK ] ModulesTest.PrettyPrintBatchNorm2d (0 ms) 2023-01-11T23:31:43.6510499Z [ RUN ] ModulesTest.PrettyPrintBatchNorm3d 2023-01-11T23:31:43.6510982Z [ OK ] ModulesTest.PrettyPrintBatchNorm3d (0 ms) 2023-01-11T23:31:43.6511411Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm1d 2023-01-11T23:31:43.6511790Z [ OK ] ModulesTest.PrettyPrintInstanceNorm1d (0 ms) 2023-01-11T23:31:43.6512252Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm2d 2023-01-11T23:31:43.6512753Z [ OK ] ModulesTest.PrettyPrintInstanceNorm2d (0 ms) 2023-01-11T23:31:43.6513180Z [ RUN ] ModulesTest.PrettyPrintInstanceNorm3d 2023-01-11T23:31:43.6513672Z [ OK ] ModulesTest.PrettyPrintInstanceNorm3d (0 ms) 2023-01-11T23:31:43.6514163Z [ RUN ] ModulesTest.PrettyPrintLayerNorm 2023-01-11T23:31:43.6514636Z [ OK ] ModulesTest.PrettyPrintLayerNorm (0 ms) 2023-01-11T23:31:43.6515080Z [ RUN ] ModulesTest.PrettyPrintGroupNorm 2023-01-11T23:31:43.6515522Z [ OK ] ModulesTest.PrettyPrintGroupNorm (0 ms) 2023-01-11T23:31:43.6515914Z [ RUN ] ModulesTest.PrettyPrintLocalResponseNorm 
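The upsampling warnings a few entries above say that, since 1.6.0, interpolate/upsample uses a float scale_factor directly instead of recomputing it from the output size, and that recompute_scale_factor=True restores the old behavior. A short sketch of silencing the warning through the C++ functional API, assuming InterpolateFuncOptions mirrors the Python argument named in the warning:

#include <torch/torch.h>
#include <iostream>
namespace F = torch::nn::functional;

int main() {
  auto x = torch::arange(8, torch::kFloat).reshape({1, 1, 8});
  // Passing recompute_scale_factor explicitly opts into the pre-1.6
  // behavior and avoids the _interp_output_size warning seen in this log.
  auto y = F::interpolate(
      x,
      F::InterpolateFuncOptions()
          .scale_factor(std::vector<double>{2.0})
          .mode(torch::kNearest)
          .recompute_scale_factor(true));
  std::cout << y.sizes() << std::endl;  // [1, 1, 16]
}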
2023-01-11T23:31:43.6516315Z [ OK ] ModulesTest.PrettyPrintLocalResponseNorm (0 ms) 2023-01-11T23:31:43.6516758Z [ RUN ] ModulesTest.PrettyPrintEmbedding 2023-01-11T23:31:43.6517133Z [ OK ] ModulesTest.PrettyPrintEmbedding (0 ms) 2023-01-11T23:31:43.6517509Z [ RUN ] ModulesTest.PrettyPrintEmbeddingBag 2023-01-11T23:31:43.6518008Z [ OK ] ModulesTest.PrettyPrintEmbeddingBag (0 ms) 2023-01-11T23:31:43.6518492Z [ RUN ] ModulesTest.PrettyPrintL1Loss 2023-01-11T23:31:43.6518840Z [ OK ] ModulesTest.PrettyPrintL1Loss (0 ms) 2023-01-11T23:31:43.6519176Z [ RUN ] ModulesTest.PrettyPrintKLDivLoss 2023-01-11T23:31:43.6519702Z [ OK ] ModulesTest.PrettyPrintKLDivLoss (0 ms) 2023-01-11T23:31:43.6520200Z [ RUN ] ModulesTest.PrettyPrintMSELoss 2023-01-11T23:31:43.6520651Z [ OK ] ModulesTest.PrettyPrintMSELoss (0 ms) 2023-01-11T23:31:43.6521074Z [ RUN ] ModulesTest.PrettyPrintBCELoss 2023-01-11T23:31:43.6521459Z [ OK ] ModulesTest.PrettyPrintBCELoss (0 ms) 2023-01-11T23:31:43.6521830Z [ RUN ] ModulesTest.PrettyPrintHingeEmbeddingLoss 2023-01-11T23:31:43.6522224Z [ OK ] ModulesTest.PrettyPrintHingeEmbeddingLoss (0 ms) 2023-01-11T23:31:43.6522630Z [ RUN ] ModulesTest.PrettyPrintCosineEmbeddingLoss 2023-01-11T23:31:43.6523039Z [ OK ] ModulesTest.PrettyPrintCosineEmbeddingLoss (0 ms) 2023-01-11T23:31:43.6523437Z [ RUN ] ModulesTest.PrettyPrintTripletMarginLoss 2023-01-11T23:31:43.6523826Z [ OK ] ModulesTest.PrettyPrintTripletMarginLoss (0 ms) 2023-01-11T23:31:43.6524257Z [ RUN ] ModulesTest.PrettyPrintTripletMarginWithDistanceLoss 2023-01-11T23:31:43.6524724Z [ OK ] ModulesTest.PrettyPrintTripletMarginWithDistanceLoss (0 ms) 2023-01-11T23:31:43.6525110Z [ RUN ] ModulesTest.PrettyPrintNLLLoss 2023-01-11T23:31:43.6525453Z [ OK ] ModulesTest.PrettyPrintNLLLoss (0 ms) 2023-01-11T23:31:43.6525816Z [ RUN ] ModulesTest.PrettyPrinCrossEntropyLoss 2023-01-11T23:31:43.6526251Z [ OK ] ModulesTest.PrettyPrinCrossEntropyLoss (0 ms) 2023-01-11T23:31:43.6526638Z [ RUN ] ModulesTest.PrettyPrintMultiLabelMarginLoss 2023-01-11T23:31:43.6527108Z [ OK ] ModulesTest.PrettyPrintMultiLabelMarginLoss (0 ms) 2023-01-11T23:31:43.6527537Z [ RUN ] ModulesTest.PrettyPrintMultiLabelSoftMarginLoss 2023-01-11T23:31:43.6527971Z [ OK ] ModulesTest.PrettyPrintMultiLabelSoftMarginLoss (0 ms) 2023-01-11T23:31:43.6528378Z [ RUN ] ModulesTest.PrettyPrintSoftMarginLoss 2023-01-11T23:31:43.6528761Z [ OK ] ModulesTest.PrettyPrintSoftMarginLoss (0 ms) 2023-01-11T23:31:43.6529137Z [ RUN ] ModulesTest.PrettyPrintCosineSimilarity 2023-01-11T23:31:43.6529531Z [ OK ] ModulesTest.PrettyPrintCosineSimilarity (0 ms) 2023-01-11T23:31:43.6529921Z [ RUN ] ModulesTest.PrettyPrintPairwiseDistance 2023-01-11T23:31:43.6530312Z [ OK ] ModulesTest.PrettyPrintPairwiseDistance (0 ms) 2023-01-11T23:31:43.6530691Z [ RUN ] ModulesTest.PrettyPrintReflectionPad 2023-01-11T23:31:43.6531077Z [ OK ] ModulesTest.PrettyPrintReflectionPad (0 ms) 2023-01-11T23:31:43.6531457Z [ RUN ] ModulesTest.PrettyPrintReplicationPad 2023-01-11T23:31:43.6531833Z [ OK ] ModulesTest.PrettyPrintReplicationPad (0 ms) 2023-01-11T23:31:43.6532202Z [ RUN ] ModulesTest.PrettyPrintZeroPad2d 2023-01-11T23:31:43.6532566Z [ OK ] ModulesTest.PrettyPrintZeroPad2d (0 ms) 2023-01-11T23:31:43.6532919Z [ RUN ] ModulesTest.PrettyPrintConstantPad 2023-01-11T23:31:43.6533293Z [ OK ] ModulesTest.PrettyPrintConstantPad (0 ms) 2023-01-11T23:31:43.6533658Z [ RUN ] ModulesTest.PrettyPrintNestedModel 2023-01-11T23:31:43.6534021Z [ OK ] ModulesTest.PrettyPrintNestedModel (0 ms) 2023-01-11T23:31:43.6534356Z [ RUN ] 
ModulesTest.PrettyPrintELU 2023-01-11T23:31:43.6534928Z [ OK ] ModulesTest.PrettyPrintELU (0 ms) 2023-01-11T23:31:43.6535307Z [ RUN ] ModulesTest.PrettyPrintSELU 2023-01-11T23:31:43.6535638Z [ OK ] ModulesTest.PrettyPrintSELU (0 ms) 2023-01-11T23:31:43.6535956Z [ RUN ] ModulesTest.PrettyPrintGLU 2023-01-11T23:31:43.6536281Z [ OK ] ModulesTest.PrettyPrintGLU (0 ms) 2023-01-11T23:31:43.6536610Z [ RUN ] ModulesTest.PrettyPrintHardshrink 2023-01-11T23:31:43.6537030Z [ OK ] ModulesTest.PrettyPrintHardshrink (0 ms) 2023-01-11T23:31:43.6537387Z [ RUN ] ModulesTest.PrettyPrintHardtanh 2023-01-11T23:31:43.6537744Z [ OK ] ModulesTest.PrettyPrintHardtanh (0 ms) 2023-01-11T23:31:43.6538087Z [ RUN ] ModulesTest.PrettyPrintLeakyReLU 2023-01-11T23:31:43.6538447Z [ OK ] ModulesTest.PrettyPrintLeakyReLU (0 ms) 2023-01-11T23:31:43.6538805Z [ RUN ] ModulesTest.PrettyPrintLogSigmoid 2023-01-11T23:31:43.6539158Z [ OK ] ModulesTest.PrettyPrintLogSigmoid (0 ms) 2023-01-11T23:31:43.6539507Z [ RUN ] ModulesTest.PrettyPrintSoftmax 2023-01-11T23:31:43.6539857Z [ OK ] ModulesTest.PrettyPrintSoftmax (0 ms) 2023-01-11T23:31:43.6540188Z [ RUN ] ModulesTest.PrettyPrintSoftmin 2023-01-11T23:31:43.6540535Z [ OK ] ModulesTest.PrettyPrintSoftmin (0 ms) 2023-01-11T23:31:43.6540884Z [ RUN ] ModulesTest.PrettyPrintLogSoftmax 2023-01-11T23:31:43.6541247Z [ OK ] ModulesTest.PrettyPrintLogSoftmax (0 ms) 2023-01-11T23:31:43.6541592Z [ RUN ] ModulesTest.PrettyPrintSoftmax2d 2023-01-11T23:31:43.6541950Z [ OK ] ModulesTest.PrettyPrintSoftmax2d (0 ms) 2023-01-11T23:31:43.6542288Z [ RUN ] ModulesTest.PrettyPrintPReLU 2023-01-11T23:31:43.6542622Z [ OK ] ModulesTest.PrettyPrintPReLU (0 ms) 2023-01-11T23:31:43.6542948Z [ RUN ] ModulesTest.PrettyPrintReLU 2023-01-11T23:31:43.6543280Z [ OK ] ModulesTest.PrettyPrintReLU (0 ms) 2023-01-11T23:31:43.6543643Z [ RUN ] ModulesTest.PrettyPrintReLU6 2023-01-11T23:31:43.6543986Z [ OK ] ModulesTest.PrettyPrintReLU6 (0 ms) 2023-01-11T23:31:43.6544316Z [ RUN ] ModulesTest.PrettyPrintRReLU 2023-01-11T23:31:43.6544716Z [ OK ] ModulesTest.PrettyPrintRReLU (0 ms) 2023-01-11T23:31:43.6545072Z [ RUN ] ModulesTest.PrettyPrintCELU 2023-01-11T23:31:43.6545441Z [ OK ] ModulesTest.PrettyPrintCELU (0 ms) 2023-01-11T23:31:43.6545808Z [ RUN ] ModulesTest.PrettyPrintSigmoid 2023-01-11T23:31:43.6546194Z [ OK ] ModulesTest.PrettyPrintSigmoid (0 ms) 2023-01-11T23:31:43.6546582Z [ RUN ] ModulesTest.PrettyPrintPixelShuffle 2023-01-11T23:31:43.6546995Z [ OK ] ModulesTest.PrettyPrintPixelShuffle (0 ms) 2023-01-11T23:31:43.6547441Z [ RUN ] ModulesTest.PrettyPrintPixelUnshuffle 2023-01-11T23:31:43.6547925Z [ OK ] ModulesTest.PrettyPrintPixelUnshuffle (0 ms) 2023-01-11T23:31:43.6548372Z [ RUN ] ModulesTest.PrettyPrintSoftplus 2023-01-11T23:31:43.6548855Z [ OK ] ModulesTest.PrettyPrintSoftplus (0 ms) 2023-01-11T23:31:43.6549299Z [ RUN ] ModulesTest.PrettyPrintSoftshrink 2023-01-11T23:31:43.6549679Z [ OK ] ModulesTest.PrettyPrintSoftshrink (0 ms) 2023-01-11T23:31:43.6550119Z [ RUN ] ModulesTest.PrettyPrintSoftsign 2023-01-11T23:31:43.6550471Z [ OK ] ModulesTest.PrettyPrintSoftsign (0 ms) 2023-01-11T23:31:43.6550796Z [ RUN ] ModulesTest.PrettyPrintTanh 2023-01-11T23:31:43.6551128Z [ OK ] ModulesTest.PrettyPrintTanh (0 ms) 2023-01-11T23:31:43.6551467Z [ RUN ] ModulesTest.PrettyPrintTanhshrink 2023-01-11T23:31:43.6551884Z [ OK ] ModulesTest.PrettyPrintTanhshrink (0 ms) 2023-01-11T23:31:43.6552344Z [ RUN ] ModulesTest.PrettyPrintThreshold 2023-01-11T23:31:43.6552746Z [ OK ] ModulesTest.PrettyPrintThreshold (0 ms) 
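The long run of PrettyPrint* tests here checks each module's printed representation; in the C++ frontend a module holder can be streamed to an ostream directly. A minimal sketch, with the exact output format shown only as an example:

#include <torch/torch.h>
#include <iostream>

int main() {
  torch::nn::Linear linear(3, 4);
  torch::nn::Sequential net(torch::nn::Linear(3, 4), torch::nn::ReLU());
  // Prints a Python-style summary, e.g.
  //   torch::nn::Linear(in_features=3, out_features=4, bias=true)
  std::cout << linear << "\n" << net << std::endl;
}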
2023-01-11T23:31:43.6553091Z [ RUN ] ModulesTest.PrettyPrintCTCLoss 2023-01-11T23:31:43.6553435Z [ OK ] ModulesTest.PrettyPrintCTCLoss (0 ms) 2023-01-11T23:31:43.6553791Z [ RUN ] ModulesTest.PrettyPrintPoissonNLLLoss 2023-01-11T23:31:43.6554165Z [ OK ] ModulesTest.PrettyPrintPoissonNLLLoss (0 ms) 2023-01-11T23:31:43.6554617Z [ RUN ] ModulesTest.PrettyPrintMarginRankingLoss 2023-01-11T23:31:43.6555016Z [ OK ] ModulesTest.PrettyPrintMarginRankingLoss (0 ms) 2023-01-11T23:31:43.6555394Z [ RUN ] ModulesTest.PrettyPrintCrossMapLRN2d 2023-01-11T23:31:43.6555764Z [ OK ] ModulesTest.PrettyPrintCrossMapLRN2d (0 ms) 2023-01-11T23:31:43.6556178Z [ RUN ] ModulesTest.PrettyPrintAlphaDropout 2023-01-11T23:31:43.6556542Z [ OK ] ModulesTest.PrettyPrintAlphaDropout (0 ms) 2023-01-11T23:31:43.6556916Z [ RUN ] ModulesTest.PrettyPrintFeatureAlphaDropout 2023-01-11T23:31:43.6557328Z [ OK ] ModulesTest.PrettyPrintFeatureAlphaDropout (0 ms) 2023-01-11T23:31:43.6557723Z [ RUN ] ModulesTest.PrettyPrintBCEWithLogitsLoss 2023-01-11T23:31:43.6558115Z [ OK ] ModulesTest.PrettyPrintBCEWithLogitsLoss (0 ms) 2023-01-11T23:31:43.6558502Z [ RUN ] ModulesTest.PrettyPrintMultiheadAttention 2023-01-11T23:31:43.6558901Z [ OK ] ModulesTest.PrettyPrintMultiheadAttention (0 ms) 2023-01-11T23:31:43.6559263Z [ RUN ] ModulesTest.PrettyPrintRNNCell 2023-01-11T23:31:43.6559595Z [ OK ] ModulesTest.PrettyPrintRNNCell (0 ms) 2023-01-11T23:31:43.6559931Z [ RUN ] ModulesTest.PrettyPrintLSTMCell 2023-01-11T23:31:43.6560277Z [ OK ] ModulesTest.PrettyPrintLSTMCell (0 ms) 2023-01-11T23:31:43.6560609Z [ RUN ] ModulesTest.PrettyPrintGRUCell 2023-01-11T23:31:43.6560946Z [ OK ] ModulesTest.PrettyPrintGRUCell (0 ms) 2023-01-11T23:31:43.6561370Z [ RUN ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss 2023-01-11T23:31:43.6561820Z [ OK ] ModulesTest.PrettyPrintAdaptiveLogSoftmaxWithLoss (0 ms) 2023-01-11T23:31:43.6562216Z [----------] 258 tests from ModulesTest (9514 ms total) 2023-01-11T23:31:43.6562381Z 2023-01-11T23:31:43.6562531Z [----------] 1 test from NestedTest 2023-01-11T23:31:43.6562802Z [ RUN ] NestedTest.Nested 2023-01-11T23:31:43.6563165Z [W NestedTensorImpl.cpp:179] Warning: The PyTorch API of nested tensors is in prototype stage and will change in the near future. 
(function operator()) 2023-01-11T23:31:43.6563569Z [ OK ] NestedTest.Nested (0 ms) 2023-01-11T23:31:43.6563877Z [----------] 1 test from NestedTest (0 ms total) 2023-01-11T23:31:43.6564030Z 2023-01-11T23:31:43.6564194Z [----------] 10 tests from ParameterDictTest 2023-01-11T23:31:43.6564534Z [ RUN ] ParameterDictTest.ConstructFromTensor 2023-01-11T23:31:43.6564915Z [ OK ] ParameterDictTest.ConstructFromTensor (0 ms) 2023-01-11T23:31:43.6565298Z [ RUN ] ParameterDictTest.ConstructFromOrderedDict 2023-01-11T23:31:43.6565697Z [ OK ] ParameterDictTest.ConstructFromOrderedDict (0 ms) 2023-01-11T23:31:43.6566077Z [ RUN ] ParameterDictTest.InsertAndContains 2023-01-11T23:31:43.6566439Z [ OK ] ParameterDictTest.InsertAndContains (0 ms) 2023-01-11T23:31:43.6566780Z [ RUN ] ParameterDictTest.InsertAndClear 2023-01-11T23:31:43.6567134Z [ OK ] ParameterDictTest.InsertAndClear (0 ms) 2023-01-11T23:31:43.6567471Z [ RUN ] ParameterDictTest.InsertAndPop 2023-01-11T23:31:43.6567811Z [ OK ] ParameterDictTest.InsertAndPop (1 ms) 2023-01-11T23:31:43.6568137Z [ RUN ] ParameterDictTest.SimpleUpdate 2023-01-11T23:31:43.6568496Z [ OK ] ParameterDictTest.SimpleUpdate (1 ms) 2023-01-11T23:31:43.6568816Z [ RUN ] ParameterDictTest.Keys 2023-01-11T23:31:43.6569110Z [ OK ] ParameterDictTest.Keys (0 ms) 2023-01-11T23:31:43.6569414Z [ RUN ] ParameterDictTest.Values 2023-01-11T23:31:43.6569725Z [ OK ] ParameterDictTest.Values (0 ms) 2023-01-11T23:31:43.6570024Z [ RUN ] ParameterDictTest.Get 2023-01-11T23:31:43.6570354Z [ OK ] ParameterDictTest.Get (0 ms) 2023-01-11T23:31:43.6570708Z [ RUN ] ParameterDictTest.PrettyPrintParameterDict 2023-01-11T23:31:43.6571112Z [ OK ] ParameterDictTest.PrettyPrintParameterDict (0 ms) 2023-01-11T23:31:43.6571490Z [----------] 10 tests from ParameterDictTest (2 ms total) 2023-01-11T23:31:43.6571659Z 2023-01-11T23:31:43.6571825Z [----------] 8 tests from ParameterListTest 2023-01-11T23:31:43.6572194Z [ RUN ] ParameterListTest.ConstructsFromSharedPointer 2023-01-11T23:31:43.6572609Z [ OK ] ParameterListTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:31:43.6572972Z [ RUN ] ParameterListTest.isEmpty 2023-01-11T23:31:43.6573296Z [ OK ] ParameterListTest.isEmpty (0 ms) 2023-01-11T23:31:43.6573648Z [ RUN ] ParameterListTest.PushBackAddsAnElement 2023-01-11T23:31:43.6574031Z [ OK ] ParameterListTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:31:43.6574386Z [ RUN ] ParameterListTest.ForEachLoop 2023-01-11T23:31:43.6575061Z [ OK ] ParameterListTest.ForEachLoop (0 ms) 2023-01-11T23:31:43.6575397Z [ RUN ] ParameterListTest.AccessWithAt 2023-01-11T23:31:43.6587866Z [ OK ] ParameterListTest.AccessWithAt (3 ms) 2023-01-11T23:31:43.6588486Z [ RUN ] ParameterListTest.ExtendPushesParametersFromOtherParameterList 2023-01-11T23:31:43.6589112Z [ OK ] ParameterListTest.ExtendPushesParametersFromOtherParameterList (0 ms) 2023-01-11T23:31:43.6589655Z [ RUN ] ParameterListTest.PrettyPrintParameterList 2023-01-11T23:31:43.6590393Z [ OK ] ParameterListTest.PrettyPrintParameterList (0 ms) 2023-01-11T23:31:43.6590776Z [ RUN ] ParameterListTest.IncrementAdd 2023-01-11T23:31:43.6591125Z [ OK ] ParameterListTest.IncrementAdd (0 ms) 2023-01-11T23:31:43.6591486Z [----------] 8 tests from ParameterListTest (4 ms total) 2023-01-11T23:31:43.6591658Z 2023-01-11T23:31:43.6591817Z [----------] 1 test from NamespaceTests 2023-01-11T23:31:43.6592223Z [ RUN ] NamespaceTests.NotLeakingSymbolsFromTorchAutogradNamespace 2023-01-11T23:31:43.6592726Z [ OK ] NamespaceTests.NotLeakingSymbolsFromTorchAutogradNamespace (0 ms) 
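Immediately below, the NNUtilsTest cases (ClipGradNorm, ClipGradNormErrorIfNonfinite, ClipGradValue) exercise the gradient-clipping helpers. A minimal sketch of calling them from the C++ frontend; the model and the threshold values are illustrative only:

#include <torch/torch.h>

int main() {
  torch::nn::Linear linear(10, 2);
  auto loss = linear(torch::randn({4, 10})).sum();
  loss.backward();

  // Rescale the gradients so their total 2-norm is at most 1.0, then cap
  // individual gradient entries at +/- 0.5.
  torch::nn::utils::clip_grad_norm_(linear->parameters(), /*max_norm=*/1.0);
  torch::nn::utils::clip_grad_value_(linear->parameters(), /*clip_value=*/0.5);
}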
2023-01-11T23:31:43.6593150Z [----------] 1 test from NamespaceTests (0 ms total) 2023-01-11T23:31:43.6593312Z 2023-01-11T23:31:43.6593462Z [----------] 7 tests from NNUtilsTest 2023-01-11T23:31:43.6593749Z [ RUN ] NNUtilsTest.ClipGradNorm 2023-01-11T23:31:43.6605565Z [ OK ] NNUtilsTest.ClipGradNorm (1 ms) 2023-01-11T23:31:43.6605981Z [ RUN ] NNUtilsTest.ClipGradNormErrorIfNonfinite 2023-01-11T23:31:44.1766817Z [ OK ] NNUtilsTest.ClipGradNormErrorIfNonfinite (516 ms) 2023-01-11T23:31:44.1767282Z [ RUN ] NNUtilsTest.ClipGradValue 2023-01-11T23:31:44.1767674Z [ OK ] NNUtilsTest.ClipGradValue (0 ms) 2023-01-11T23:31:44.1768006Z [ RUN ] NNUtilsTest.ConvertParameters 2023-01-11T23:31:44.1773296Z [ OK ] NNUtilsTest.ConvertParameters (0 ms) 2023-01-11T23:31:44.1773655Z [ RUN ] NNUtilsTest.PackSequence 2023-01-11T23:31:44.2198239Z [ OK ] NNUtilsTest.PackSequence (42 ms) 2023-01-11T23:31:44.2198617Z [ RUN ] NNUtilsTest.PackPaddedSequence 2023-01-11T23:31:44.2363819Z [ OK ] NNUtilsTest.PackPaddedSequence (16 ms) 2023-01-11T23:31:44.2364180Z [ RUN ] NNUtilsTest.PadSequence 2023-01-11T23:31:44.2433074Z [ OK ] NNUtilsTest.PadSequence (6 ms) 2023-01-11T23:31:44.2433800Z [----------] 7 tests from NNUtilsTest (583 ms total) 2023-01-11T23:31:44.2434242Z 2023-01-11T23:31:44.2434615Z [----------] 3 tests from PackedSequenceTest 2023-01-11T23:31:44.2435254Z [ RUN ] PackedSequenceTest.WrongOrder 2023-01-11T23:31:44.2472791Z [ OK ] PackedSequenceTest.WrongOrder (4 ms) 2023-01-11T23:31:44.2473548Z [ RUN ] PackedSequenceTest.TotalLength 2023-01-11T23:31:44.2549554Z [ OK ] PackedSequenceTest.TotalLength (7 ms) 2023-01-11T23:31:44.2550476Z [ RUN ] PackedSequenceTest.To 2023-01-11T23:31:44.2563053Z [ OK ] PackedSequenceTest.To (1 ms) 2023-01-11T23:31:44.2563594Z [----------] 3 tests from PackedSequenceTest (13 ms total) 2023-01-11T23:31:44.2563851Z 2023-01-11T23:31:44.2564073Z [----------] 34 tests from OptimTest 2023-01-11T23:31:44.2564395Z [ RUN ] OptimTest.OptimizerAccessors 2023-01-11T23:31:44.2581736Z [ OK ] OptimTest.OptimizerAccessors (1 ms) 2023-01-11T23:31:44.2582196Z [ RUN ] OptimTest.OldInterface 2023-01-11T23:31:44.2582697Z [ OK ] OptimTest.OldInterface (0 ms) 2023-01-11T23:31:44.2583144Z [ RUN ] OptimTest.XORConvergence_SGD 2023-01-11T23:31:45.7378963Z [ OK ] OptimTest.XORConvergence_SGD (1479 ms) 2023-01-11T23:31:45.7379912Z [ RUN ] OptimTest.XORConvergence_LBFGS 2023-01-11T23:31:46.7543165Z [ OK ] OptimTest.XORConvergence_LBFGS (1016 ms) 2023-01-11T23:31:46.7543907Z [ RUN ] OptimTest.XORConvergence_Adagrad 2023-01-11T23:31:47.3218432Z [ OK ] OptimTest.XORConvergence_Adagrad (567 ms) 2023-01-11T23:31:47.3219406Z [ RUN ] OptimTest.XORConvergence_RMSprop 2023-01-11T23:31:47.8874723Z [ OK ] OptimTest.XORConvergence_RMSprop (565 ms) 2023-01-11T23:31:47.8875775Z [ RUN ] OptimTest.XORConvergence_RMSpropWithMomentum 2023-01-11T23:31:49.4775519Z [ OK ] OptimTest.XORConvergence_RMSpropWithMomentum (1590 ms) 2023-01-11T23:31:49.4776283Z [ RUN ] OptimTest.XORConvergence_Adam 2023-01-11T23:31:50.0958545Z [ OK ] OptimTest.XORConvergence_Adam (618 ms) 2023-01-11T23:31:50.0959083Z [ RUN ] OptimTest.XORConvergence_AdamWithAmsgrad 2023-01-11T23:31:50.7185203Z [ OK ] OptimTest.XORConvergence_AdamWithAmsgrad (622 ms) 2023-01-11T23:31:50.7186304Z [ RUN ] OptimTest.ProducesPyTorchValues_Adam 2023-01-11T23:31:50.9174428Z [ OK ] OptimTest.ProducesPyTorchValues_Adam (198 ms) 2023-01-11T23:31:50.9176058Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecay 2023-01-11T23:31:51.1227299Z [ OK ] 
OptimTest.ProducesPyTorchValues_AdamWithWeightDecay (205 ms) 2023-01-11T23:31:51.1228272Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad 2023-01-11T23:31:51.3335677Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWithWeightDecayAndAMSGrad (210 ms) 2023-01-11T23:31:51.3336682Z [ RUN ] OptimTest.XORConvergence_AdamW 2023-01-11T23:31:51.9663484Z [ OK ] OptimTest.XORConvergence_AdamW (632 ms) 2023-01-11T23:31:51.9664021Z [ RUN ] OptimTest.XORConvergence_AdamWWithAmsgrad 2023-01-11T23:31:52.6005764Z [ OK ] OptimTest.XORConvergence_AdamWWithAmsgrad (634 ms) 2023-01-11T23:31:52.6006341Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamW 2023-01-11T23:31:52.8059696Z [ OK ] OptimTest.ProducesPyTorchValues_AdamW (205 ms) 2023-01-11T23:31:52.8060263Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay 2023-01-11T23:31:53.0033276Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWWithoutWeightDecay (197 ms) 2023-01-11T23:31:53.0034402Z [ RUN ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad 2023-01-11T23:31:53.2150041Z [ OK ] OptimTest.ProducesPyTorchValues_AdamWWithAMSGrad (211 ms) 2023-01-11T23:31:53.2150582Z [ RUN ] OptimTest.ProducesPyTorchValues_Adagrad 2023-01-11T23:31:53.3810999Z [ OK ] OptimTest.ProducesPyTorchValues_Adagrad (166 ms) 2023-01-11T23:31:53.3812347Z [ RUN ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay 2023-01-11T23:31:53.5543662Z [ OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecay (173 ms) 2023-01-11T23:31:53.5544882Z [ RUN ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay 2023-01-11T23:31:53.7273102Z [ OK ] OptimTest.ProducesPyTorchValues_AdagradWithWeightDecayAndLRDecay (173 ms) 2023-01-11T23:31:53.7273703Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSprop 2023-01-11T23:31:53.9033412Z [ OK ] OptimTest.ProducesPyTorchValues_RMSprop (175 ms) 2023-01-11T23:31:53.9034621Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay 2023-01-11T23:31:54.0849033Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecay (181 ms) 2023-01-11T23:31:54.0849691Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered 2023-01-11T23:31:54.2843575Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCentered (199 ms) 2023-01-11T23:31:54.2844324Z [ RUN ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum 2023-01-11T23:31:54.4937341Z [ OK ] OptimTest.ProducesPyTorchValues_RMSpropWithWeightDecayAndCenteredAndMomentum (208 ms) 2023-01-11T23:31:54.4938283Z [ RUN ] OptimTest.ProducesPyTorchValues_SGD 2023-01-11T23:31:54.6270469Z [ OK ] OptimTest.ProducesPyTorchValues_SGD (133 ms) 2023-01-11T23:31:54.6271337Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay 2023-01-11T23:31:54.7695433Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecay (142 ms) 2023-01-11T23:31:54.7696366Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum 2023-01-11T23:31:54.9303594Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndMomentum (160 ms) 2023-01-11T23:31:54.9304629Z [ RUN ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum 2023-01-11T23:31:55.0974847Z [ OK ] OptimTest.ProducesPyTorchValues_SGDWithWeightDecayAndNesterovMomentum (167 ms) 2023-01-11T23:31:55.0975423Z [ RUN ] OptimTest.ProducesPyTorchValues_LBFGS 2023-01-11T23:31:55.2478640Z [ OK ] OptimTest.ProducesPyTorchValues_LBFGS (150 ms) 2023-01-11T23:31:55.2479500Z [ RUN ] OptimTest.ProducesPyTorchValues_LBFGS_with_line_search 2023-01-11T23:31:55.9391407Z [ OK ] 
OptimTest.ProducesPyTorchValues_LBFGS_with_line_search (691 ms) 2023-01-11T23:31:55.9392386Z [ RUN ] OptimTest.ZeroGrad 2023-01-11T23:31:55.9393175Z [ OK ] OptimTest.ZeroGrad (0 ms) 2023-01-11T23:31:55.9394085Z [ RUN ] OptimTest.ExternalVectorOfParameters 2023-01-11T23:31:55.9395120Z [ OK ] OptimTest.ExternalVectorOfParameters (0 ms) 2023-01-11T23:31:55.9395957Z [ RUN ] OptimTest.AddParameter_LBFGS 2023-01-11T23:31:55.9396613Z [ OK ] OptimTest.AddParameter_LBFGS (0 ms) 2023-01-11T23:31:55.9397266Z [ RUN ] OptimTest.CheckLRChange_StepLR_Adam 2023-01-11T23:31:55.9397925Z [ OK ] OptimTest.CheckLRChange_StepLR_Adam (0 ms) 2023-01-11T23:31:55.9398615Z [----------] 34 tests from OptimTest (11682 ms total) 2023-01-11T23:31:55.9398937Z 2023-01-11T23:31:55.9399258Z [----------] 29 tests from OrderedDictTest 2023-01-11T23:31:55.9399667Z [ RUN ] OrderedDictTest.IsEmptyAfterDefaultConstruction 2023-01-11T23:31:55.9400106Z [ OK ] OrderedDictTest.IsEmptyAfterDefaultConstruction (0 ms) 2023-01-11T23:31:55.9400582Z [ RUN ] OrderedDictTest.InsertAddsElementsWhenTheyAreYetNotPresent 2023-01-11T23:31:55.9401087Z [ OK ] OrderedDictTest.InsertAddsElementsWhenTheyAreYetNotPresent (0 ms) 2023-01-11T23:31:55.9401557Z [ RUN ] OrderedDictTest.GetReturnsValuesWhenTheyArePresent 2023-01-11T23:31:55.9402174Z [ OK ] OrderedDictTest.GetReturnsValuesWhenTheyArePresent (0 ms) 2023-01-11T23:31:55.9402646Z [ RUN ] OrderedDictTest.GetThrowsWhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9421925Z [ OK ] OrderedDictTest.GetThrowsWhenPassedKeysThatAreNotPresent (2 ms) 2023-01-11T23:31:55.9422536Z [ RUN ] OrderedDictTest.CanInitializeFromList 2023-01-11T23:31:55.9423008Z [ OK ] OrderedDictTest.CanInitializeFromList (0 ms) 2023-01-11T23:31:55.9423512Z [ RUN ] OrderedDictTest.InsertThrowsWhenPassedElementsThatArePresent 2023-01-11T23:31:55.9445498Z [ OK ] OrderedDictTest.InsertThrowsWhenPassedElementsThatArePresent (2 ms) 2023-01-11T23:31:55.9446036Z [ RUN ] OrderedDictTest.FrontReturnsTheFirstItem 2023-01-11T23:31:55.9446437Z [ OK ] OrderedDictTest.FrontReturnsTheFirstItem (0 ms) 2023-01-11T23:31:55.9446821Z [ RUN ] OrderedDictTest.FrontThrowsWhenEmpty 2023-01-11T23:31:55.9456403Z [ OK ] OrderedDictTest.FrontThrowsWhenEmpty (1 ms) 2023-01-11T23:31:55.9456799Z [ RUN ] OrderedDictTest.BackReturnsTheLastItem 2023-01-11T23:31:55.9457265Z [ OK ] OrderedDictTest.BackReturnsTheLastItem (0 ms) 2023-01-11T23:31:55.9457637Z [ RUN ] OrderedDictTest.BackThrowsWhenEmpty 2023-01-11T23:31:55.9467297Z [ OK ] OrderedDictTest.BackThrowsWhenEmpty (1 ms) 2023-01-11T23:31:55.9467862Z [ RUN ] OrderedDictTest.FindReturnsPointersToValuesWhenPresent 2023-01-11T23:31:55.9468476Z [ OK ] OrderedDictTest.FindReturnsPointersToValuesWhenPresent (0 ms) 2023-01-11T23:31:55.9469146Z [ RUN ] OrderedDictTest.FindReturnsNullPointersWhenPasesdKeysThatAreNotPresent 2023-01-11T23:31:55.9469747Z [ OK ] OrderedDictTest.FindReturnsNullPointersWhenPasesdKeysThatAreNotPresent (0 ms) 2023-01-11T23:31:55.9470430Z [ RUN ] OrderedDictTest.SubscriptOperatorThrowsWhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9471028Z [ OK ] OrderedDictTest.SubscriptOperatorThrowsWhenPassedKeysThatAreNotPresent (0 ms) 2023-01-11T23:31:55.9471624Z [ RUN ] OrderedDictTest.SubscriptOperatorReturnsItemsPositionallyWhenPassedIntegers 2023-01-11T23:31:55.9472254Z [ OK ] OrderedDictTest.SubscriptOperatorReturnsItemsPositionallyWhenPassedIntegers (0 ms) 2023-01-11T23:31:55.9472861Z [ RUN ] OrderedDictTest.SubscriptOperatorsThrowswhenPassedKeysThatAreNotPresent 2023-01-11T23:31:55.9491859Z [ OK ] 
OrderedDictTest.SubscriptOperatorsThrowswhenPassedKeysThatAreNotPresent (2 ms) 2023-01-11T23:31:55.9492638Z [ RUN ] OrderedDictTest.UpdateInsertsAllItemsFromAnotherOrderedDict 2023-01-11T23:31:55.9493193Z [ OK ] OrderedDictTest.UpdateInsertsAllItemsFromAnotherOrderedDict (0 ms) 2023-01-11T23:31:55.9493666Z [ RUN ] OrderedDictTest.UpdateAlsoChecksForDuplicates 2023-01-11T23:31:55.9503224Z [ OK ] OrderedDictTest.UpdateAlsoChecksForDuplicates (1 ms) 2023-01-11T23:31:55.9503705Z [ RUN ] OrderedDictTest.CanIterateItems 2023-01-11T23:31:55.9504178Z [ OK ] OrderedDictTest.CanIterateItems (0 ms) 2023-01-11T23:31:55.9504591Z [ RUN ] OrderedDictTest.EraseWorks 2023-01-11T23:31:55.9505104Z [ OK ] OrderedDictTest.EraseWorks (0 ms) 2023-01-11T23:31:55.9505577Z [ RUN ] OrderedDictTest.ClearMakesTheDictEmpty 2023-01-11T23:31:55.9505973Z [ OK ] OrderedDictTest.ClearMakesTheDictEmpty (0 ms) 2023-01-11T23:31:55.9506359Z [ RUN ] OrderedDictTest.CanCopyConstruct 2023-01-11T23:31:55.9506705Z [ OK ] OrderedDictTest.CanCopyConstruct (0 ms) 2023-01-11T23:31:55.9507044Z [ RUN ] OrderedDictTest.CanCopyAssign 2023-01-11T23:31:55.9507382Z [ OK ] OrderedDictTest.CanCopyAssign (0 ms) 2023-01-11T23:31:55.9507800Z [ RUN ] OrderedDictTest.CanMoveConstruct 2023-01-11T23:31:55.9508154Z [ OK ] OrderedDictTest.CanMoveConstruct (0 ms) 2023-01-11T23:31:55.9508495Z [ RUN ] OrderedDictTest.CanMoveAssign 2023-01-11T23:31:55.9508826Z [ OK ] OrderedDictTest.CanMoveAssign (0 ms) 2023-01-11T23:31:55.9509198Z [ RUN ] OrderedDictTest.CanInsertWithBraces 2023-01-11T23:31:55.9509601Z [ OK ] OrderedDictTest.CanInsertWithBraces (0 ms) 2023-01-11T23:31:55.9510134Z [ RUN ] OrderedDictTest.ErrorMessagesIncludeTheKeyDescription 2023-01-11T23:31:55.9526907Z [ OK ] OrderedDictTest.ErrorMessagesIncludeTheKeyDescription (2 ms) 2023-01-11T23:31:55.9527413Z [ RUN ] OrderedDictTest.KeysReturnsAllKeys 2023-01-11T23:31:55.9527912Z [ OK ] OrderedDictTest.KeysReturnsAllKeys (0 ms) 2023-01-11T23:31:55.9528312Z [ RUN ] OrderedDictTest.ValuesReturnsAllValues 2023-01-11T23:31:55.9528711Z [ OK ] OrderedDictTest.ValuesReturnsAllValues (0 ms) 2023-01-11T23:31:55.9529084Z [ RUN ] OrderedDictTest.ItemsReturnsAllItems 2023-01-11T23:31:55.9529450Z [ OK ] OrderedDictTest.ItemsReturnsAllItems (0 ms) 2023-01-11T23:31:55.9529813Z [----------] 29 tests from OrderedDictTest (13 ms total) 2023-01-11T23:31:55.9529981Z 2023-01-11T23:31:55.9530128Z [----------] 25 tests from RNNTest 2023-01-11T23:31:55.9530417Z [ RUN ] RNNTest.CheckOutputSizes 2023-01-11T23:31:55.9597441Z [ OK ] RNNTest.CheckOutputSizes (6 ms) 2023-01-11T23:31:55.9598152Z [ RUN ] RNNTest.CheckOutputSizesProj 2023-01-11T23:31:55.9670350Z [ OK ] RNNTest.CheckOutputSizesProj (7 ms) 2023-01-11T23:31:55.9671153Z [ RUN ] RNNTest.CheckOutputValuesMatchPyTorch 2023-01-11T23:31:55.9675337Z [ OK ] RNNTest.CheckOutputValuesMatchPyTorch (0 ms) 2023-01-11T23:31:55.9676027Z [ RUN ] RNNTest.EndToEndLSTM 2023-01-11T23:31:57.5905261Z [ OK ] RNNTest.EndToEndLSTM (1622 ms) 2023-01-11T23:31:57.5906110Z [ RUN ] RNNTest.EndToEndLSTMProj 2023-01-11T23:31:59.2580022Z [ OK ] RNNTest.EndToEndLSTMProj (1667 ms) 2023-01-11T23:31:59.2580730Z [ RUN ] RNNTest.EndToEndGRU 2023-01-11T23:32:00.6770577Z [ OK ] RNNTest.EndToEndGRU (1419 ms) 2023-01-11T23:32:00.6770931Z [ RUN ] RNNTest.EndToEndRNNRelu 2023-01-11T23:32:01.4737435Z [ OK ] RNNTest.EndToEndRNNRelu (796 ms) 2023-01-11T23:32:01.4738143Z [ RUN ] RNNTest.EndToEndRNNTanh 2023-01-11T23:32:02.3808416Z [ OK ] RNNTest.EndToEndRNNTanh (907 ms) 2023-01-11T23:32:02.3808762Z [ RUN ] 
RNNTest.Sizes_CUDA 2023-01-11T23:32:02.4830202Z [ OK ] RNNTest.Sizes_CUDA (101 ms) 2023-01-11T23:32:02.4830992Z [ RUN ] RNNTest.SizesProj_CUDA 2023-01-11T23:32:02.4857150Z [ OK ] RNNTest.SizesProj_CUDA (2 ms) 2023-01-11T23:32:02.4857816Z [ RUN ] RNNTest.EndToEndLSTM_CUDA 2023-01-11T23:32:03.5556834Z [ OK ] RNNTest.EndToEndLSTM_CUDA (1069 ms) 2023-01-11T23:32:03.5557603Z [ RUN ] RNNTest.EndToEndLSTMProj_CUDA 2023-01-11T23:32:04.7278438Z [ OK ] RNNTest.EndToEndLSTMProj_CUDA (1172 ms) 2023-01-11T23:32:04.7279491Z [ RUN ] RNNTest.EndToEndGRU_CUDA 2023-01-11T23:32:05.5923773Z [ OK ] RNNTest.EndToEndGRU_CUDA (864 ms) 2023-01-11T23:32:05.5924130Z [ RUN ] RNNTest.EndToEndRNNRelu_CUDA 2023-01-11T23:32:06.4318599Z [ OK ] RNNTest.EndToEndRNNRelu_CUDA (839 ms) 2023-01-11T23:32:06.4319320Z [ RUN ] RNNTest.EndToEndRNNTanh_CUDA 2023-01-11T23:32:07.3722961Z [ OK ] RNNTest.EndToEndRNNTanh_CUDA (940 ms) 2023-01-11T23:32:07.3723622Z [ RUN ] RNNTest.PrettyPrintRNNs 2023-01-11T23:32:07.3740175Z [ OK ] RNNTest.PrettyPrintRNNs (2 ms) 2023-01-11T23:32:07.3740934Z [ RUN ] RNNTest.BidirectionalFlattenParameters 2023-01-11T23:32:07.3827756Z [ OK ] RNNTest.BidirectionalFlattenParameters (8 ms) 2023-01-11T23:32:07.3828819Z [ RUN ] RNNTest.BidirectionalGRUReverseForward 2023-01-11T23:32:07.3839948Z [ OK ] RNNTest.BidirectionalGRUReverseForward (1 ms) 2023-01-11T23:32:07.3840472Z [ RUN ] RNNTest.BidirectionalGRUReverseForward_CUDA 2023-01-11T23:32:07.3851310Z [ OK ] RNNTest.BidirectionalGRUReverseForward_CUDA (1 ms) 2023-01-11T23:32:07.3851919Z [ RUN ] RNNTest.BidirectionalLSTMReverseForward 2023-01-11T23:32:07.3862903Z [ OK ] RNNTest.BidirectionalLSTMReverseForward (1 ms) 2023-01-11T23:32:07.3863430Z [ RUN ] RNNTest.BidirectionalLSTMReverseForward_CUDA 2023-01-11T23:32:07.3875099Z [ OK ] RNNTest.BidirectionalLSTMReverseForward_CUDA (1 ms) 2023-01-11T23:32:07.3875600Z [ RUN ] RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA 2023-01-11T23:32:07.3915336Z [ OK ] RNNTest.BidirectionalMultilayerGRU_CPU_vs_CUDA (3 ms) 2023-01-11T23:32:07.3915836Z [ RUN ] RNNTest.BidirectionalMultilayerLSTM_CPU_vs_CUDA 2023-01-11T23:32:07.3953348Z [ OK ] RNNTest.BidirectionalMultilayerLSTM_CPU_vs_CUDA (3 ms) 2023-01-11T23:32:07.3953946Z [ RUN ] RNNTest.BidirectionalMultilayerLSTMProj_CPU_vs_CUDA 2023-01-11T23:32:07.3999482Z [ OK ] RNNTest.BidirectionalMultilayerLSTMProj_CPU_vs_CUDA (4 ms) 2023-01-11T23:32:07.4000451Z [ RUN ] RNNTest.UsePackedSequenceAsInput 2023-01-11T23:32:07.4014894Z [ OK ] RNNTest.UsePackedSequenceAsInput (1 ms) 2023-01-11T23:32:07.4015331Z [----------] 25 tests from RNNTest (11448 ms total) 2023-01-11T23:32:07.4015490Z 2023-01-11T23:32:07.4015646Z [----------] 20 tests from SequentialTest 2023-01-11T23:32:07.4015972Z [ RUN ] SequentialTest.CanContainThings 2023-01-11T23:32:07.4016320Z [ OK ] SequentialTest.CanContainThings (0 ms) 2023-01-11T23:32:07.4016692Z [ RUN ] SequentialTest.ConstructsFromSharedPointer 2023-01-11T23:32:07.4017105Z [ OK ] SequentialTest.ConstructsFromSharedPointer (0 ms) 2023-01-11T23:32:07.4017507Z [ RUN ] SequentialTest.ConstructsFromConcreteType 2023-01-11T23:32:07.4017906Z [ OK ] SequentialTest.ConstructsFromConcreteType (0 ms) 2023-01-11T23:32:07.4018289Z [ RUN ] SequentialTest.ConstructsFromModuleHolder 2023-01-11T23:32:07.4018684Z [ OK ] SequentialTest.ConstructsFromModuleHolder (0 ms) 2023-01-11T23:32:07.4019069Z [ RUN ] SequentialTest.PushBackAddsAnElement 2023-01-11T23:32:07.4019440Z [ OK ] SequentialTest.PushBackAddsAnElement (0 ms) 2023-01-11T23:32:07.4019775Z [ RUN ] 
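The RNNTest suite above checks the recurrent layers of the C++ frontend (LSTM/GRU/RNN, with and without projections, on CPU and CUDA). A minimal sketch of the shape contract the CheckOutputSizes tests assert, with illustrative sizes:

    #include <torch/torch.h>

    int main() {
      // Two-layer LSTM: input_size=10, hidden_size=20.
      torch::nn::LSTM lstm(
          torch::nn::LSTMOptions(/*input_size=*/10, /*hidden_size=*/20).num_layers(2));
      torch::Tensor input = torch::randn({5, 3, 10});  // (seq_len, batch, input_size)
      auto [output, state] = lstm->forward(input);     // output: (5, 3, 20)
      auto [h_n, c_n] = state;                         // each: (num_layers, batch, 20)
    }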
SequentialTest.AccessWithAt 2023-01-11T23:32:07.4042348Z [ OK ] SequentialTest.AccessWithAt (2 ms) 2023-01-11T23:32:07.4042793Z [ RUN ] SequentialTest.AccessWithPtr 2023-01-11T23:32:07.4065707Z [ OK ] SequentialTest.AccessWithPtr (2 ms) 2023-01-11T23:32:07.4066220Z [ RUN ] SequentialTest.CallingForwardOnEmptySequentialIsDisallowed 2023-01-11T23:32:07.4076930Z [ OK ] SequentialTest.CallingForwardOnEmptySequentialIsDisallowed (1 ms) 2023-01-11T23:32:07.4077528Z [ RUN ] SequentialTest.CallingForwardChainsCorrectly 2023-01-11T23:32:07.4077941Z [ OK ] SequentialTest.CallingForwardChainsCorrectly (0 ms) 2023-01-11T23:32:07.4078399Z [ RUN ] SequentialTest.CallingForwardWithTheWrongReturnTypeThrows 2023-01-11T23:32:07.4088067Z [ OK ] SequentialTest.CallingForwardWithTheWrongReturnTypeThrows (1 ms) 2023-01-11T23:32:07.4088729Z [ RUN ] SequentialTest.TheReturnTypeOfForwardDefaultsToTensor 2023-01-11T23:32:07.4089384Z [ OK ] SequentialTest.TheReturnTypeOfForwardDefaultsToTensor (0 ms) 2023-01-11T23:32:07.4089822Z [ RUN ] SequentialTest.ForwardReturnsTheLastValue 2023-01-11T23:32:07.4091431Z [ OK ] SequentialTest.ForwardReturnsTheLastValue (0 ms) 2023-01-11T23:32:07.4091976Z [ RUN ] SequentialTest.SanityCheckForHoldingStandardModules 2023-01-11T23:32:07.4092600Z [ OK ] SequentialTest.SanityCheckForHoldingStandardModules (0 ms) 2023-01-11T23:32:07.4093135Z [ RUN ] SequentialTest.ExtendPushesModulesFromOtherSequential 2023-01-11T23:32:07.4093672Z [ OK ] SequentialTest.ExtendPushesModulesFromOtherSequential (0 ms) 2023-01-11T23:32:07.4094134Z [ RUN ] SequentialTest.HasReferenceSemantics 2023-01-11T23:32:07.4094796Z [ OK ] SequentialTest.HasReferenceSemantics (0 ms) 2023-01-11T23:32:07.4095141Z [ RUN ] SequentialTest.IsCloneable 2023-01-11T23:32:07.4098117Z [ OK ] SequentialTest.IsCloneable (0 ms) 2023-01-11T23:32:07.4098583Z [ RUN ] SequentialTest.RegistersElementsAsSubmodules 2023-01-11T23:32:07.4099083Z [ OK ] SequentialTest.RegistersElementsAsSubmodules (0 ms) 2023-01-11T23:32:07.4099466Z [ RUN ] SequentialTest.CloneToDevice_CUDA 2023-01-11T23:32:07.4101379Z [ OK ] SequentialTest.CloneToDevice_CUDA (0 ms) 2023-01-11T23:32:07.4101826Z [ RUN ] SequentialTest.PrettyPrintSequential 2023-01-11T23:32:07.4104875Z [ OK ] SequentialTest.PrettyPrintSequential (0 ms) 2023-01-11T23:32:07.4105370Z [ RUN ] SequentialTest.ModuleForwardMethodOptionalArg 2023-01-11T23:32:07.4137734Z [ OK ] SequentialTest.ModuleForwardMethodOptionalArg (3 ms) 2023-01-11T23:32:07.4138317Z [----------] 20 tests from SequentialTest (12 ms total) 2023-01-11T23:32:07.4138535Z 2023-01-11T23:32:07.4138706Z [----------] 17 tests from TransformerTest 2023-01-11T23:32:07.4139063Z [ RUN ] TransformerTest.TransformerEncoderLayer 2023-01-11T23:32:07.4214157Z [ OK ] TransformerTest.TransformerEncoderLayer (7 ms) 2023-01-11T23:32:07.4215544Z [ RUN ] TransformerTest.TransformerEncoderLayer_CUDA 2023-01-11T23:32:07.4376053Z [ OK ] TransformerTest.TransformerEncoderLayer_CUDA (16 ms) 2023-01-11T23:32:07.4376874Z [ RUN ] TransformerTest.TransformerDecoderLayer 2023-01-11T23:32:07.4452835Z [ OK ] TransformerTest.TransformerDecoderLayer (7 ms) 2023-01-11T23:32:07.4453737Z [ RUN ] TransformerTest.TransformerDecoderLayer_CUDA 2023-01-11T23:32:07.4627738Z [ OK ] TransformerTest.TransformerDecoderLayer_CUDA (17 ms) 2023-01-11T23:32:07.4628584Z [ RUN ] TransformerTest.TransformerDecoderLayer_gelu 2023-01-11T23:32:07.4670148Z [ OK ] TransformerTest.TransformerDecoderLayer_gelu (4 ms) 2023-01-11T23:32:07.4671399Z [ RUN ] 
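SequentialTest above exercises torch::nn::Sequential, whose forward() threads the input through each contained module and defaults to returning a Tensor. A brief sketch:

    #include <torch/torch.h>

    int main() {
      torch::nn::Sequential seq(
          torch::nn::Linear(10, 5),
          torch::nn::ReLU(),
          torch::nn::Linear(5, 1));
      torch::Tensor y = seq->forward(torch::randn({4, 10}));  // shape (4, 1)
      seq->push_back(torch::nn::Sigmoid());  // cf. SequentialTest.PushBackAddsAnElement
    }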
TransformerTest.TransformerDecoderLayer_gelu_CUDA 2023-01-11T23:32:07.4763119Z [ OK ] TransformerTest.TransformerDecoderLayer_gelu_CUDA (9 ms) 2023-01-11T23:32:07.4763530Z [ RUN ] TransformerTest.TransformerEncoder 2023-01-11T23:32:07.4910929Z [ OK ] TransformerTest.TransformerEncoder (14 ms) 2023-01-11T23:32:07.4912102Z [ RUN ] TransformerTest.TransformerEncoder_CUDA 2023-01-11T23:32:07.5205763Z [ OK ] TransformerTest.TransformerEncoder_CUDA (29 ms) 2023-01-11T23:32:07.5206207Z [ RUN ] TransformerTest.PrettyPrintTransformerEncoderLayer 2023-01-11T23:32:07.5206664Z [ OK ] TransformerTest.PrettyPrintTransformerEncoderLayer (0 ms) 2023-01-11T23:32:07.5207116Z [ RUN ] TransformerTest.PrettyPrintTransformerEncoder 2023-01-11T23:32:07.5215225Z [ OK ] TransformerTest.PrettyPrintTransformerEncoder (0 ms) 2023-01-11T23:32:07.5215696Z [ RUN ] TransformerTest.PrettyPrintTransformerDecoderLayer 2023-01-11T23:32:07.5216282Z [ OK ] TransformerTest.PrettyPrintTransformerDecoderLayer (0 ms) 2023-01-11T23:32:07.5216688Z [ RUN ] TransformerTest.TransformerDecoder 2023-01-11T23:32:07.5644816Z [ OK ] TransformerTest.TransformerDecoder (42 ms) 2023-01-11T23:32:07.5645203Z [ RUN ] TransformerTest.TransformerDecoder_CUDA 2023-01-11T23:32:07.6513734Z [ OK ] TransformerTest.TransformerDecoder_CUDA (86 ms) 2023-01-11T23:32:07.6514583Z [ RUN ] TransformerTest.PrettyPrintTransformerDecoder 2023-01-11T23:32:07.6521103Z [ OK ] TransformerTest.PrettyPrintTransformerDecoder (0 ms) 2023-01-11T23:32:07.6521882Z [ RUN ] TransformerTest.Transformer 2023-01-11T23:32:07.6675519Z [ OK ] TransformerTest.Transformer (15 ms) 2023-01-11T23:32:07.6676220Z [ RUN ] TransformerTest.Transformer_CUDA 2023-01-11T23:32:07.7005886Z [ OK ] TransformerTest.Transformer_CUDA (33 ms) 2023-01-11T23:32:07.7006286Z [ RUN ] TransformerTest.TransformerArgsCorrectness 2023-01-11T23:32:07.7066266Z [ OK ] TransformerTest.TransformerArgsCorrectness (5 ms) 2023-01-11T23:32:07.7067020Z [----------] 17 tests from TransformerTest (292 ms total) 2023-01-11T23:32:07.7067331Z 2023-01-11T23:32:07.7067621Z [----------] 24 tests from SerializeTest 2023-01-11T23:32:07.7068157Z [ RUN ] SerializeTest.KeysFunc 2023-01-11T23:32:07.7069078Z [ OK ] SerializeTest.KeysFunc (0 ms) 2023-01-11T23:32:07.7069645Z [ RUN ] SerializeTest.TryReadFunc 2023-01-11T23:32:07.7072956Z [ OK ] SerializeTest.TryReadFunc (0 ms) 2023-01-11T23:32:07.7073275Z [ RUN ] SerializeTest.Basic 2023-01-11T23:32:07.7073912Z [ OK ] SerializeTest.Basic (0 ms) 2023-01-11T23:32:07.7074362Z [ RUN ] SerializeTest.MathBits 2023-01-11T23:32:07.7155161Z [ OK ] SerializeTest.MathBits (7 ms) 2023-01-11T23:32:07.7156119Z [ RUN ] SerializeTest.BasicToFile 2023-01-11T23:32:07.7156991Z [ OK ] SerializeTest.BasicToFile (0 ms) 2023-01-11T23:32:07.7157624Z [ RUN ] SerializeTest.BasicViaFunc 2023-01-11T23:32:07.7159120Z [ OK ] SerializeTest.BasicViaFunc (0 ms) 2023-01-11T23:32:07.7159879Z [ RUN ] SerializeTest.Resized 2023-01-11T23:32:07.7160969Z [ OK ] SerializeTest.Resized (0 ms) 2023-01-11T23:32:07.7161749Z [ RUN ] SerializeTest.Sliced 2023-01-11T23:32:07.7162784Z [ OK ] SerializeTest.Sliced (0 ms) 2023-01-11T23:32:07.7163189Z [ RUN ] SerializeTest.NonContiguous 2023-01-11T23:32:07.7165800Z [ OK ] SerializeTest.NonContiguous (0 ms) 2023-01-11T23:32:07.7166273Z [ RUN ] SerializeTest.ErrorOnMissingKey 2023-01-11T23:32:07.7245757Z [ OK ] SerializeTest.ErrorOnMissingKey (7 ms) 2023-01-11T23:32:07.7246212Z [ RUN ] SerializeTest.XOR 2023-01-11T23:32:07.9141809Z [ OK ] SerializeTest.XOR (189 ms) 2023-01-11T23:32:07.9142470Z 
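TransformerTest above covers the C++ transformer modules. A minimal encoder-layer sketch; the d_model/nhead values are illustrative, not the ones the tests use:

    #include <torch/torch.h>

    int main() {
      // d_model=32 with 4 attention heads.
      torch::nn::TransformerEncoderLayer layer(
          torch::nn::TransformerEncoderLayerOptions(/*d_model=*/32, /*nhead=*/4));
      torch::Tensor src = torch::randn({10, 2, 32});  // (seq, batch, d_model)
      torch::Tensor out = layer->forward(src);        // same shape as src
    }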
[ RUN ] SerializeTest.Optim 2023-01-11T23:32:07.9158375Z [ OK ] SerializeTest.Optim (2 ms) 2023-01-11T23:32:07.9158724Z [ RUN ] SerializeTest.Optim_Adagrad 2023-01-11T23:32:07.9188732Z [ OK ] SerializeTest.Optim_Adagrad (3 ms) 2023-01-11T23:32:07.9189059Z [ RUN ] SerializeTest.Optim_SGD 2023-01-11T23:32:07.9220104Z [ OK ] SerializeTest.Optim_SGD (2 ms) 2023-01-11T23:32:07.9220758Z [ RUN ] SerializeTest.Optim_Adam 2023-01-11T23:32:07.9249867Z [ OK ] SerializeTest.Optim_Adam (3 ms) 2023-01-11T23:32:07.9250201Z [ RUN ] SerializeTest.Optim_AdamW 2023-01-11T23:32:07.9282198Z [ OK ] SerializeTest.Optim_AdamW (3 ms) 2023-01-11T23:32:07.9282539Z [ RUN ] SerializeTest.Optim_RMSprop 2023-01-11T23:32:07.9315072Z [ OK ] SerializeTest.Optim_RMSprop (3 ms) 2023-01-11T23:32:07.9315407Z [ RUN ] SerializeTest.Optim_LBFGS 2023-01-11T23:32:07.9344573Z [ OK ] SerializeTest.Optim_LBFGS (3 ms) 2023-01-11T23:32:07.9344903Z [ RUN ] SerializeTest.XOR_CUDA 2023-01-11T23:32:08.1537372Z [ OK ] SerializeTest.XOR_CUDA (218 ms) 2023-01-11T23:32:08.1538345Z [ RUN ] SerializeTest.CanSerializeModulesWithIntermediateModulesWithoutParametersOrBuffers 2023-01-11T23:32:08.1540389Z [ OK ] SerializeTest.CanSerializeModulesWithIntermediateModulesWithoutParametersOrBuffers (0 ms) 2023-01-11T23:32:08.1541293Z [ RUN ] SerializeTest.VectorOfTensors 2023-01-11T23:32:08.1543681Z [ OK ] SerializeTest.VectorOfTensors (0 ms) 2023-01-11T23:32:08.1544289Z [ RUN ] SerializeTest.IValue 2023-01-11T23:32:08.1544610Z [ OK ] SerializeTest.IValue (0 ms) 2023-01-11T23:32:08.1545048Z [ RUN ] SerializeTest.UnserializableSubmoduleIsSkippedWhenSavingModule 2023-01-11T23:32:08.1545637Z [ OK ] SerializeTest.UnserializableSubmoduleIsSkippedWhenSavingModule (0 ms) 2023-01-11T23:32:08.1546171Z [ RUN ] SerializeTest.UnserializableSubmoduleIsIgnoredWhenLoadingModule 2023-01-11T23:32:08.1553245Z [ OK ] SerializeTest.UnserializableSubmoduleIsIgnoredWhenLoadingModule (0 ms) 2023-01-11T23:32:08.1553720Z [----------] 24 tests from SerializeTest (448 ms total) 2023-01-11T23:32:08.1553888Z 2023-01-11T23:32:08.1554044Z [----------] 1 test from SpecialTest 2023-01-11T23:32:08.1554480Z [ RUN ] SpecialTest.special 2023-01-11T23:32:08.1554768Z [ OK ] SpecialTest.special (0 ms) 2023-01-11T23:32:08.1555089Z [----------] 1 test from SpecialTest (0 ms total) 2023-01-11T23:32:08.1555251Z 2023-01-11T23:32:08.1555400Z [----------] 5 tests from TestStatic 2023-01-11T23:32:08.1555678Z [ RUN ] TestStatic.AllOf 2023-01-11T23:32:08.1555963Z [ OK ] TestStatic.AllOf (0 ms) 2023-01-11T23:32:08.1556236Z [ RUN ] TestStatic.AnyOf 2023-01-11T23:32:08.1556504Z [ OK ] TestStatic.AnyOf (0 ms) 2023-01-11T23:32:08.1556797Z [ RUN ] TestStatic.EnableIfModule 2023-01-11T23:32:08.1557118Z [ OK ] TestStatic.EnableIfModule (0 ms) 2023-01-11T23:32:08.1557446Z [ RUN ] TestStatic.ReturnTypeOfForward 2023-01-11T23:32:08.1557782Z [ OK ] TestStatic.ReturnTypeOfForward (0 ms) 2023-01-11T23:32:08.1558080Z [ RUN ] TestStatic.Apply 2023-01-11T23:32:08.1558365Z [ OK ] TestStatic.Apply (0 ms) 2023-01-11T23:32:08.1558670Z [----------] 5 tests from TestStatic (0 ms total) 2023-01-11T23:32:08.1558825Z 2023-01-11T23:32:08.1558974Z [----------] 49 tests from TensorTest 2023-01-11T23:32:08.1559351Z [ RUN ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame_CUDA 2023-01-11T23:32:08.1559810Z [ OK ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame_CUDA (0 ms) 2023-01-11T23:32:08.1560228Z [ RUN ] TensorTest.MagmaInitializesCorrectly_CUDA 2023-01-11T23:32:08.2986685Z [ OK ] TensorTest.MagmaInitializesCorrectly_CUDA (142 ms) 
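The SerializeTest suite above round-trips modules, plain tensors, and optimizer state through torch::save/torch::load. A sketch of those three paths (the .pt file names are illustrative):

    #include <torch/torch.h>

    int main() {
      torch::nn::Linear model(3, 1);
      torch::save(model, "model.pt");   // serialize module parameters/buffers
      torch::load(model, "model.pt");

      torch::Tensor t = torch::rand({2, 2});
      torch::save(t, "tensor.pt");      // plain tensors work too
      torch::Tensor t2;
      torch::load(t2, "tensor.pt");

      torch::optim::SGD opt(model->parameters(), /*lr=*/0.01);
      torch::save(opt, "opt.pt");       // cf. SerializeTest.Optim_SGD
    }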
2023-01-11T23:32:08.2987338Z [ RUN ] TensorTest.ToDtype 2023-01-11T23:32:08.2987872Z [ OK ] TensorTest.ToDtype (0 ms) 2023-01-11T23:32:08.2988475Z [ RUN ] TensorTest.ToTensorAndTensorAttributes 2023-01-11T23:32:08.2989174Z [ OK ] TensorTest.ToTensorAndTensorAttributes (0 ms) 2023-01-11T23:32:08.2989871Z [ RUN ] TensorTest.ToOptionsWithRequiresGrad 2023-01-11T23:32:08.3007254Z [ OK ] TensorTest.ToOptionsWithRequiresGrad (2 ms) 2023-01-11T23:32:08.3008575Z [ RUN ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame 2023-01-11T23:32:08.3009098Z [ OK ] TensorTest.ToDoesNotCopyWhenOptionsAreAllTheSame (0 ms) 2023-01-11T23:32:08.3009791Z [ RUN ] TensorTest.AtTensorCtorScalar 2023-01-11T23:32:08.3010145Z [ OK ] TensorTest.AtTensorCtorScalar (0 ms) 2023-01-11T23:32:08.3010476Z [ RUN ] TensorTest.AtTensorCtorSingleDim 2023-01-11T23:32:08.3010830Z [ OK ] TensorTest.AtTensorCtorSingleDim (0 ms) 2023-01-11T23:32:08.3011187Z [ RUN ] TensorTest.AtTensorCastRealToComplex 2023-01-11T23:32:08.3011560Z [ OK ] TensorTest.AtTensorCastRealToComplex (0 ms) 2023-01-11T23:32:08.3011956Z [ RUN ] TensorTest.AtTensorCastComplexToRealErrorChecks 2023-01-11T23:32:08.3064537Z [ OK ] TensorTest.AtTensorCastComplexToRealErrorChecks (5 ms) 2023-01-11T23:32:08.3065200Z [ RUN ] TensorTest.TorchTensorCtorScalarIntegralType 2023-01-11T23:32:08.3065863Z [ OK ] TensorTest.TorchTensorCtorScalarIntegralType (0 ms) 2023-01-11T23:32:08.3066553Z [ RUN ] TensorTest.TorchTensorCtorScalarFloatingType 2023-01-11T23:32:08.3067231Z [ OK ] TensorTest.TorchTensorCtorScalarFloatingType (0 ms) 2023-01-11T23:32:08.3067869Z [ RUN ] TensorTest.TorchTensorCtorScalarBoolType 2023-01-11T23:32:08.3068474Z [ OK ] TensorTest.TorchTensorCtorScalarBoolType (0 ms) 2023-01-11T23:32:08.3069102Z [ RUN ] TensorTest.TorchTensorCtorSingleDimIntegralType 2023-01-11T23:32:08.3069748Z [ OK ] TensorTest.TorchTensorCtorSingleDimIntegralType (0 ms) 2023-01-11T23:32:08.3070485Z [ RUN ] TensorTest.TorchTensorCtorSingleDimFloatingType 2023-01-11T23:32:08.3071344Z [ OK ] TensorTest.TorchTensorCtorSingleDimFloatingType (0 ms) 2023-01-11T23:32:08.3071979Z [ RUN ] TensorTest.TorchTensorCtorSingleDimBoolType 2023-01-11T23:32:08.3072634Z [ OK ] TensorTest.TorchTensorCtorSingleDimBoolType (0 ms) 2023-01-11T23:32:08.3073307Z [ RUN ] TensorTest.TorchTensorCtorMultiDimIntegralType 2023-01-11T23:32:08.3074395Z [ OK ] TensorTest.TorchTensorCtorMultiDimIntegralType (0 ms) 2023-01-11T23:32:08.3075062Z [ RUN ] TensorTest.TorchTensorCtorMultiDimFloatingType 2023-01-11T23:32:08.3075953Z [ OK ] TensorTest.TorchTensorCtorMultiDimFloatingType (0 ms) 2023-01-11T23:32:08.3076565Z [ RUN ] TensorTest.TorchTensorCtorMultiDimBoolType 2023-01-11T23:32:08.3077135Z [ OK ] TensorTest.TorchTensorCtorMultiDimBoolType (0 ms) 2023-01-11T23:32:08.3077719Z [ RUN ] TensorTest.TorchTensorCtorMultiDimWithOptions 2023-01-11T23:32:08.3078642Z [ OK ] TensorTest.TorchTensorCtorMultiDimWithOptions (0 ms) 2023-01-11T23:32:08.3079228Z [ RUN ] TensorTest.TorchTensorCtorMultiDimErrorChecks 2023-01-11T23:32:08.3139448Z [ OK ] TensorTest.TorchTensorCtorMultiDimErrorChecks (5 ms) 2023-01-11T23:32:08.3140816Z [ RUN ] TensorTest.TorchTensorCastRealToComplex 2023-01-11T23:32:08.3141892Z [ OK ] TensorTest.TorchTensorCastRealToComplex (0 ms) 2023-01-11T23:32:08.3142568Z [ RUN ] TensorTest.TorchTensorCastComplexToRealErrorChecks 2023-01-11T23:32:08.3142963Z [W Copy.cpp:276] Warning: Casting complex values to real discards the imaginary part (function operator()) 2023-01-11T23:32:08.3151916Z [ OK ] 
TensorTest.TorchTensorCastComplexToRealErrorChecks (1 ms) 2023-01-11T23:32:08.3152565Z [ RUN ] TensorTest.TorchTensorCtorMultiDim_CUDA 2023-01-11T23:32:08.3156983Z [ OK ] TensorTest.TorchTensorCtorMultiDim_CUDA (0 ms) 2023-01-11T23:32:08.3157547Z [ RUN ] TensorTest.TorchTensorCtorZeroSizedDim 2023-01-11T23:32:08.3158093Z [ OK ] TensorTest.TorchTensorCtorZeroSizedDim (0 ms) 2023-01-11T23:32:08.3158583Z [ RUN ] TensorTest.TorchTensorCtorWithoutSpecifyingDtype 2023-01-11T23:32:08.3159041Z [ OK ] TensorTest.TorchTensorCtorWithoutSpecifyingDtype (0 ms) 2023-01-11T23:32:08.3159768Z [ RUN ] TensorTest.TorchTensorCtorWithNonDtypeOptions 2023-01-11T23:32:08.3160402Z [ OK ] TensorTest.TorchTensorCtorWithNonDtypeOptions (0 ms) 2023-01-11T23:32:08.3160800Z [ RUN ] TensorTest.Arange 2023-01-11T23:32:08.3161080Z [ OK ] TensorTest.Arange (0 ms) 2023-01-11T23:32:08.3161438Z [ RUN ] TensorTest.PrettyPrintTensorDataContainer 2023-01-11T23:32:08.3161844Z [ OK ] TensorTest.PrettyPrintTensorDataContainer (0 ms) 2023-01-11T23:32:08.3162300Z [ RUN ] TensorTest.TensorDataContainerCallingAccessorOfWrongType 2023-01-11T23:32:08.3225780Z [ OK ] TensorTest.TensorDataContainerCallingAccessorOfWrongType (6 ms) 2023-01-11T23:32:08.3226315Z [ RUN ] TensorTest.FromBlob 2023-01-11T23:32:08.3226703Z [ OK ] TensorTest.FromBlob (0 ms) 2023-01-11T23:32:08.3227087Z [ RUN ] TensorTest.FromBlobUsesDeleter 2023-01-11T23:32:08.3227446Z [ OK ] TensorTest.FromBlobUsesDeleter (0 ms) 2023-01-11T23:32:08.3227785Z [ RUN ] TensorTest.FromBlobWithStrides 2023-01-11T23:32:08.3228127Z [ OK ] TensorTest.FromBlobWithStrides (0 ms) 2023-01-11T23:32:08.3228418Z [ RUN ] TensorTest.Item 2023-01-11T23:32:08.3228700Z [ OK ] TensorTest.Item (0 ms) 2023-01-11T23:32:08.3228981Z [ RUN ] TensorTest.Item_CUDA 2023-01-11T23:32:08.3229264Z [ OK ] TensorTest.Item_CUDA (0 ms) 2023-01-11T23:32:08.3234423Z [ RUN ] TensorTest.DataPtr 2023-01-11T23:32:08.3234859Z [ OK ] TensorTest.DataPtr (0 ms) 2023-01-11T23:32:08.3235149Z [ RUN ] TensorTest.Data 2023-01-11T23:32:08.3235443Z [ OK ] TensorTest.Data (0 ms) 2023-01-11T23:32:08.3235736Z [ RUN ] TensorTest.BackwardAndGrad 2023-01-11T23:32:08.3236058Z [ OK ] TensorTest.BackwardAndGrad (0 ms) 2023-01-11T23:32:08.3236407Z [ RUN ] TensorTest.BackwardCreatesOnesGrad 2023-01-11T23:32:08.3236768Z [ OK ] TensorTest.BackwardCreatesOnesGrad (0 ms) 2023-01-11T23:32:08.3237123Z [ RUN ] TensorTest.BackwardNonScalarOutputs 2023-01-11T23:32:08.3262106Z [ OK ] TensorTest.BackwardNonScalarOutputs (3 ms) 2023-01-11T23:32:08.3263323Z [ RUN ] TensorTest.IsLeaf 2023-01-11T23:32:08.3263783Z [ OK ] TensorTest.IsLeaf (0 ms) 2023-01-11T23:32:08.3264204Z [ RUN ] TensorTest.OutputNr 2023-01-11T23:32:08.3264602Z [ OK ] TensorTest.OutputNr (0 ms) 2023-01-11T23:32:08.3274531Z [ RUN ] TensorTest.Version 2023-01-11T23:32:08.3274822Z [ OK ] TensorTest.Version (0 ms) 2023-01-11T23:32:08.3275100Z [ RUN ] TensorTest.Detach 2023-01-11T23:32:08.3275379Z [ OK ] TensorTest.Detach (0 ms) 2023-01-11T23:32:08.3275663Z [ RUN ] TensorTest.DetachInplace 2023-01-11T23:32:08.3275975Z [ OK ] TensorTest.DetachInplace (0 ms) 2023-01-11T23:32:08.3276271Z [ RUN ] TensorTest.SetData 2023-01-11T23:32:08.3276550Z [ OK ] TensorTest.SetData (0 ms) 2023-01-11T23:32:08.3276862Z [ RUN ] TensorTest.RequiresGradInplace 2023-01-11T23:32:08.3287847Z [ OK ] TensorTest.RequiresGradInplace (2 ms) 2023-01-11T23:32:08.3288225Z [ RUN ] TensorTest.StdDimension 2023-01-11T23:32:08.3288585Z [ OK ] TensorTest.StdDimension (0 ms) 2023-01-11T23:32:08.3288902Z [ RUN ] TensorTest.ReshapeAlias 
2023-01-11T23:32:08.3291226Z [ OK ] TensorTest.ReshapeAlias (0 ms) 2023-01-11T23:32:08.3291921Z [----------] 49 tests from TensorTest (173 ms total) 2023-01-11T23:32:08.3292109Z 2023-01-11T23:32:08.3292547Z [----------] 40 tests from TensorIndexingTest 2023-01-11T23:32:08.3292863Z [ RUN ] TensorIndexingTest.Slice 2023-01-11T23:32:08.3293296Z [ OK ] TensorIndexingTest.Slice (0 ms) 2023-01-11T23:32:08.3293653Z [ RUN ] TensorIndexingTest.TensorIndex 2023-01-11T23:32:08.3304668Z [ OK ] TensorIndexingTest.TensorIndex (1 ms) 2023-01-11T23:32:08.3305026Z [ RUN ] TensorIndexingTest.TestNoIndices 2023-01-11T23:32:08.3394733Z [ OK ] TensorIndexingTest.TestNoIndices (8 ms) 2023-01-11T23:32:08.3395682Z [ RUN ] TensorIndexingTest.TestAdvancedIndexingWithListOfTensor 2023-01-11T23:32:08.3396698Z [ OK ] TensorIndexingTest.TestAdvancedIndexingWithListOfTensor (0 ms) 2023-01-11T23:32:08.3397523Z [ RUN ] TensorIndexingTest.TestSingleInt 2023-01-11T23:32:08.3398211Z [ OK ] TensorIndexingTest.TestSingleInt (0 ms) 2023-01-11T23:32:08.3398927Z [ RUN ] TensorIndexingTest.TestMultipleInt 2023-01-11T23:32:08.3399645Z [ OK ] TensorIndexingTest.TestMultipleInt (0 ms) 2023-01-11T23:32:08.3400319Z [ RUN ] TensorIndexingTest.TestNone 2023-01-11T23:32:08.3400951Z [ OK ] TensorIndexingTest.TestNone (0 ms) 2023-01-11T23:32:08.3401582Z [ RUN ] TensorIndexingTest.TestStep 2023-01-11T23:32:08.3402351Z [ OK ] TensorIndexingTest.TestStep (0 ms) 2023-01-11T23:32:08.3402862Z [ RUN ] TensorIndexingTest.TestStepAssignment 2023-01-11T23:32:08.3403372Z [ OK ] TensorIndexingTest.TestStepAssignment (0 ms) 2023-01-11T23:32:08.3403835Z [ RUN ] TensorIndexingTest.TestBoolIndices 2023-01-11T23:32:08.3404187Z [ OK ] TensorIndexingTest.TestBoolIndices (0 ms) 2023-01-11T23:32:08.3404684Z [ RUN ] TensorIndexingTest.TestBoolIndicesAccumulate 2023-01-11T23:32:08.3405252Z [ OK ] TensorIndexingTest.TestBoolIndicesAccumulate (0 ms) 2023-01-11T23:32:08.3405768Z [ RUN ] TensorIndexingTest.TestMultipleBoolIndices 2023-01-11T23:32:08.3406251Z [ OK ] TensorIndexingTest.TestMultipleBoolIndices (0 ms) 2023-01-11T23:32:08.3406764Z [ RUN ] TensorIndexingTest.TestByteMask 2023-01-11T23:32:08.3407136Z [ OK ] TensorIndexingTest.TestByteMask (0 ms) 2023-01-11T23:32:08.3407493Z [ RUN ] TensorIndexingTest.TestByteMaskAccumulate 2023-01-11T23:32:08.3407888Z [ OK ] TensorIndexingTest.TestByteMaskAccumulate (0 ms) 2023-01-11T23:32:08.3408275Z [ RUN ] TensorIndexingTest.TestMultipleByteMask 2023-01-11T23:32:08.3408658Z [ OK ] TensorIndexingTest.TestMultipleByteMask (0 ms) 2023-01-11T23:32:08.3409020Z [ RUN ] TensorIndexingTest.TestByteMask2d 2023-01-11T23:32:08.3409383Z [ OK ] TensorIndexingTest.TestByteMask2d (0 ms) 2023-01-11T23:32:08.3409727Z [ RUN ] TensorIndexingTest.TestIntIndices 2023-01-11T23:32:08.3410069Z [ OK ] TensorIndexingTest.TestIntIndices (0 ms) 2023-01-11T23:32:08.3410426Z [ RUN ] TensorIndexingTest.TestIntIndices2d 2023-01-11T23:32:08.3410786Z [ OK ] TensorIndexingTest.TestIntIndices2d (0 ms) 2023-01-11T23:32:08.3411155Z [ RUN ] TensorIndexingTest.TestIntIndicesBroadcast 2023-01-11T23:32:08.3411557Z [ OK ] TensorIndexingTest.TestIntIndicesBroadcast (0 ms) 2023-01-11T23:32:08.3411930Z [ RUN ] TensorIndexingTest.TestEmptyIndex 2023-01-11T23:32:08.3412282Z [ OK ] TensorIndexingTest.TestEmptyIndex (0 ms) 2023-01-11T23:32:08.3412659Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndex 2023-01-11T23:32:08.3466725Z [ OK ] TensorIndexingTest.TestEmptyNdimIndex (5 ms) 2023-01-11T23:32:08.3467317Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndex_CUDA 
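The TensorTest suite summarized above covers core tensor construction and conversion in the C++ API. A small sketch of the ctor/to/from_blob/item paths it asserts:

    #include <torch/torch.h>

    int main() {
      torch::Tensor a = torch::tensor({{1, 2}, {3, 4}});  // 2x2, int64
      torch::Tensor b = a.to(torch::kFloat32);            // cf. TensorTest.ToDtype
      float data[] = {1.f, 2.f, 3.f};
      torch::Tensor c = torch::from_blob(data, {3});      // wraps memory, no copy
      float first = c[0].item<float>();                   // cf. TensorTest.Item
    }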
2023-01-11T23:32:08.3467752Z [ OK ] TensorIndexingTest.TestEmptyNdimIndex_CUDA (0 ms) 2023-01-11T23:32:08.3468150Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndexBool 2023-01-11T23:32:08.3487653Z [ OK ] TensorIndexingTest.TestEmptyNdimIndexBool (2 ms) 2023-01-11T23:32:08.3488228Z [ RUN ] TensorIndexingTest.TestEmptyNdimIndexBool_CUDA 2023-01-11T23:32:08.3509176Z [ OK ] TensorIndexingTest.TestEmptyNdimIndexBool_CUDA (2 ms) 2023-01-11T23:32:08.3510402Z [ RUN ] TensorIndexingTest.TestEmptySlice 2023-01-11T23:32:08.3511151Z [ OK ] TensorIndexingTest.TestEmptySlice (0 ms) 2023-01-11T23:32:08.3511866Z [ RUN ] TensorIndexingTest.TestEmptySlice_CUDA 2023-01-11T23:32:08.3512584Z [ OK ] TensorIndexingTest.TestEmptySlice_CUDA (0 ms) 2023-01-11T23:32:08.3513124Z [ RUN ] TensorIndexingTest.TestIndexGetitemCopyBoolsSlices 2023-01-11T23:32:08.3513574Z [ OK ] TensorIndexingTest.TestIndexGetitemCopyBoolsSlices (0 ms) 2023-01-11T23:32:08.3514009Z [ RUN ] TensorIndexingTest.TestIndexSetitemBoolsSlices 2023-01-11T23:32:08.3601685Z [ OK ] TensorIndexingTest.TestIndexSetitemBoolsSlices (9 ms) 2023-01-11T23:32:08.3602798Z [ RUN ] TensorIndexingTest.TestIndexScalarWithBoolMask 2023-01-11T23:32:08.3603286Z [ OK ] TensorIndexingTest.TestIndexScalarWithBoolMask (0 ms) 2023-01-11T23:32:08.3603735Z [ RUN ] TensorIndexingTest.TestIndexScalarWithBoolMask_CUDA 2023-01-11T23:32:08.3607547Z [ OK ] TensorIndexingTest.TestIndexScalarWithBoolMask_CUDA (0 ms) 2023-01-11T23:32:08.3607997Z [ RUN ] TensorIndexingTest.TestSetitemExpansionError 2023-01-11T23:32:08.3727703Z [ OK ] TensorIndexingTest.TestSetitemExpansionError (11 ms) 2023-01-11T23:32:08.3728283Z [ RUN ] TensorIndexingTest.TestGetitemScalars 2023-01-11T23:32:08.3839039Z [ OK ] TensorIndexingTest.TestGetitemScalars (10 ms) 2023-01-11T23:32:08.3839816Z [ RUN ] TensorIndexingTest.TestSetitemScalars 2023-01-11T23:32:08.3946927Z [ OK ] TensorIndexingTest.TestSetitemScalars (10 ms) 2023-01-11T23:32:08.3947344Z [ RUN ] TensorIndexingTest.TestBasicAdvancedCombined 2023-01-11T23:32:08.3947759Z [ OK ] TensorIndexingTest.TestBasicAdvancedCombined (0 ms) 2023-01-11T23:32:08.3948155Z [ RUN ] TensorIndexingTest.TestIntAssignment 2023-01-11T23:32:08.3948532Z [ OK ] TensorIndexingTest.TestIntAssignment (0 ms) 2023-01-11T23:32:08.3948918Z [ RUN ] TensorIndexingTest.TestByteTensorAssignment 2023-01-11T23:32:08.3949531Z [ OK ] TensorIndexingTest.TestByteTensorAssignment (0 ms) 2023-01-11T23:32:08.3949983Z [ RUN ] TensorIndexingTest.TestVariableSlicing 2023-01-11T23:32:08.3950386Z [ OK ] TensorIndexingTest.TestVariableSlicing (0 ms) 2023-01-11T23:32:08.3950764Z [ RUN ] TensorIndexingTest.TestEllipsisTensor 2023-01-11T23:32:08.3952979Z [ OK ] TensorIndexingTest.TestEllipsisTensor (0 ms) 2023-01-11T23:32:08.3953363Z [ RUN ] TensorIndexingTest.TestOutOfBoundIndex 2023-01-11T23:32:08.4051402Z [ OK ] TensorIndexingTest.TestOutOfBoundIndex (9 ms) 2023-01-11T23:32:08.4051793Z [ RUN ] TensorIndexingTest.TestZeroDimIndex 2023-01-11T23:32:08.4073394Z [ OK ] TensorIndexingTest.TestZeroDimIndex (2 ms) 2023-01-11T23:32:08.4074213Z [----------] 40 tests from TensorIndexingTest (78 ms total) 2023-01-11T23:32:08.4074727Z 2023-01-11T23:32:08.4075125Z [----------] 18 tests from NumpyTests 2023-01-11T23:32:08.4075714Z [ RUN ] NumpyTests.TestNoneIndex 2023-01-11T23:32:08.4076346Z [ OK ] NumpyTests.TestNoneIndex (0 ms) 2023-01-11T23:32:08.4077006Z [ RUN ] NumpyTests.TestEmptyFancyIndex 2023-01-11T23:32:08.4126491Z [ OK ] NumpyTests.TestEmptyFancyIndex (5 ms) 2023-01-11T23:32:08.4126987Z [ RUN ] 
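TensorIndexingTest above checks the C++ equivalents of Python's __getitem__/__setitem__. A sketch of the Slice/Ellipsis API the suite drives:

    #include <torch/torch.h>

    int main() {
      using namespace torch::indexing;
      torch::Tensor t = torch::arange(12).reshape({3, 4});
      // Python: t[0:2, ::2]
      torch::Tensor s = t.index({Slice(0, 2), Slice(None, None, 2)});
      // Python: t[..., -1] = 0
      t.index_put_({Ellipsis, -1}, 0);
    }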
NumpyTests.TestEllipsisIndex 2023-01-11T23:32:08.4127474Z [ OK ] NumpyTests.TestEllipsisIndex (0 ms) 2023-01-11T23:32:08.4127988Z [ RUN ] NumpyTests.TestSingleIntIndex 2023-01-11T23:32:08.4152736Z [ OK ] NumpyTests.TestSingleIntIndex (2 ms) 2023-01-11T23:32:08.4153766Z [ RUN ] NumpyTests.TestSingleBoolIndex 2023-01-11T23:32:08.4154718Z [ OK ] NumpyTests.TestSingleBoolIndex (0 ms) 2023-01-11T23:32:08.4155447Z [ RUN ] NumpyTests.TestBooleanShapeMismatch 2023-01-11T23:32:08.4364317Z [ OK ] NumpyTests.TestBooleanShapeMismatch (21 ms) 2023-01-11T23:32:08.4364864Z [ RUN ] NumpyTests.TestBooleanIndexingOnedim 2023-01-11T23:32:08.4365377Z [ OK ] NumpyTests.TestBooleanIndexingOnedim (0 ms) 2023-01-11T23:32:08.4365788Z [ RUN ] NumpyTests.TestBooleanAssignmentValueMismatch 2023-01-11T23:32:08.4518214Z [ OK ] NumpyTests.TestBooleanAssignmentValueMismatch (15 ms) 2023-01-11T23:32:08.4518779Z [ RUN ] NumpyTests.TestBooleanIndexingTwodim 2023-01-11T23:32:08.4519265Z [ OK ] NumpyTests.TestBooleanIndexingTwodim (0 ms) 2023-01-11T23:32:08.4519644Z [ RUN ] NumpyTests.TestBooleanIndexingWeirdness 2023-01-11T23:32:08.4631597Z [ OK ] NumpyTests.TestBooleanIndexingWeirdness (11 ms) 2023-01-11T23:32:08.4633143Z [ RUN ] NumpyTests.TestBooleanIndexingWeirdnessTensors 2023-01-11T23:32:08.4743918Z [ OK ] NumpyTests.TestBooleanIndexingWeirdnessTensors (11 ms) 2023-01-11T23:32:08.4744566Z [ RUN ] NumpyTests.TestBooleanIndexingAlldims 2023-01-11T23:32:08.4745243Z [ OK ] NumpyTests.TestBooleanIndexingAlldims (0 ms) 2023-01-11T23:32:08.4745892Z [ RUN ] NumpyTests.TestBooleanListIndexing 2023-01-11T23:32:08.4746276Z [ OK ] NumpyTests.TestBooleanListIndexing (0 ms) 2023-01-11T23:32:08.4746653Z [ RUN ] NumpyTests.TestEverythingReturnsViews 2023-01-11T23:32:08.4747028Z [ OK ] NumpyTests.TestEverythingReturnsViews (0 ms) 2023-01-11T23:32:08.4747396Z [ RUN ] NumpyTests.TestBroaderrorsIndexing 2023-01-11T23:32:08.4964507Z [ OK ] NumpyTests.TestBroaderrorsIndexing (21 ms) 2023-01-11T23:32:08.4964904Z [ RUN ] NumpyTests.TestTrivialFancyOutOfBounds 2023-01-11T23:32:08.5238842Z [ OK ] NumpyTests.TestTrivialFancyOutOfBounds (27 ms) 2023-01-11T23:32:08.5239637Z [ RUN ] NumpyTests.TestIndexIsLarger 2023-01-11T23:32:08.5240416Z [ OK ] NumpyTests.TestIndexIsLarger (0 ms) 2023-01-11T23:32:08.5241221Z [ RUN ] NumpyTests.TestBroadcastSubspace 2023-01-11T23:32:08.5242087Z [ OK ] NumpyTests.TestBroadcastSubspace (0 ms) 2023-01-11T23:32:08.5242827Z [----------] 18 tests from NumpyTests (116 ms total) 2023-01-11T23:32:08.5243159Z 2023-01-11T23:32:08.5243431Z [----------] 6 tests from TensorOptionsTest 2023-01-11T23:32:08.5243851Z [ RUN ] TensorOptionsTest.ConstructsWellFromCUDATypes_CUDA 2023-01-11T23:32:08.5244358Z [ OK ] TensorOptionsTest.ConstructsWellFromCUDATypes_CUDA (0 ms) 2023-01-11T23:32:08.5244832Z [ RUN ] TensorOptionsTest.DefaultsToTheRightValues 2023-01-11T23:32:08.5245298Z [ OK ] TensorOptionsTest.DefaultsToTheRightValues (0 ms) 2023-01-11T23:32:08.5245819Z [ RUN ] TensorOptionsTest.UtilityFunctionsReturnTheRightTensorOptions 2023-01-11T23:32:08.5246408Z [ OK ] TensorOptionsTest.UtilityFunctionsReturnTheRightTensorOptions (0 ms) 2023-01-11T23:32:08.5246930Z [ RUN ] TensorOptionsTest.ConstructsWellFromCPUTypes 2023-01-11T23:32:08.5247398Z [ OK ] TensorOptionsTest.ConstructsWellFromCPUTypes (0 ms) 2023-01-11T23:32:08.5247876Z [ RUN ] TensorOptionsTest.ConstructsWellFromCPUTensors 2023-01-11T23:32:08.5248360Z [ OK ] TensorOptionsTest.ConstructsWellFromCPUTensors (0 ms) 2023-01-11T23:32:08.5248825Z [ RUN ] 
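The NumpyTests above port NumPy's advanced-indexing test cases to the C++ API; boolean masks are the recurring theme. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Tensor x = torch::tensor({1.0, -2.0, 3.0, -4.0});
      torch::Tensor mask = x < 0;
      torch::Tensor negatives = x.index({mask});  // gathers -2, -4
      x.index_put_({mask}, 0.0);                  // zeroes them in place
    }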
TensorOptionsTest.ConstructsWellFromVariables 2023-01-11T23:32:08.5249407Z [ OK ] TensorOptionsTest.ConstructsWellFromVariables (0 ms) 2023-01-11T23:32:08.5249861Z [----------] 6 tests from TensorOptionsTest (0 ms total) 2023-01-11T23:32:08.5250051Z 2023-01-11T23:32:08.5250222Z [----------] 1 test from DeviceTest 2023-01-11T23:32:08.5250588Z [ RUN ] DeviceTest.ParsesCorrectlyFromString 2023-01-11T23:32:08.5326407Z [ OK ] DeviceTest.ParsesCorrectlyFromString (8 ms) 2023-01-11T23:32:08.5326870Z [----------] 1 test from DeviceTest (8 ms total) 2023-01-11T23:32:08.5327050Z 2023-01-11T23:32:08.5327244Z [----------] 3 tests from DefaultDtypeTest 2023-01-11T23:32:08.5327641Z [ RUN ] DefaultDtypeTest.CanSetAndGetDefaultDtype 2023-01-11T23:32:08.5328111Z [ OK ] DefaultDtypeTest.CanSetAndGetDefaultDtype (0 ms) 2023-01-11T23:32:08.5328588Z [ RUN ] DefaultDtypeTest.NewTensorOptionsHasCorrectDefault 2023-01-11T23:32:08.5329092Z [ OK ] DefaultDtypeTest.NewTensorOptionsHasCorrectDefault (0 ms) 2023-01-11T23:32:08.5329595Z [ RUN ] DefaultDtypeTest.NewTensorsHaveCorrectDefaultDtype 2023-01-11T23:32:08.5330101Z [ OK ] DefaultDtypeTest.NewTensorsHaveCorrectDefaultDtype (0 ms) 2023-01-11T23:32:08.5330553Z [----------] 3 tests from DefaultDtypeTest (0 ms total) 2023-01-11T23:32:08.5330733Z 2023-01-11T23:32:08.5330924Z [----------] 1 test from TorchIncludeTest 2023-01-11T23:32:08.5331292Z [ RUN ] TorchIncludeTest.GetSetNumThreads 2023-01-11T23:32:08.5539938Z [ OK ] TorchIncludeTest.GetSetNumThreads (21 ms) 2023-01-11T23:32:08.5540424Z [----------] 1 test from TorchIncludeTest (21 ms total) 2023-01-11T23:32:08.5540587Z 2023-01-11T23:32:08.5540758Z [----------] 28 tests from InferenceModeTest 2023-01-11T23:32:08.5541085Z [ RUN ] InferenceModeTest.TestTLSState 2023-01-11T23:32:08.5541436Z [ OK ] InferenceModeTest.TestTLSState (0 ms) 2023-01-11T23:32:08.5541819Z [ RUN ] InferenceModeTest.TestInferenceTensorCreation 2023-01-11T23:32:08.5542247Z [ OK ] InferenceModeTest.TestInferenceTensorCreation (0 ms) 2023-01-11T23:32:08.5542676Z [ RUN ] InferenceModeTest.TestExistingAutogradSession 2023-01-11T23:32:08.5612080Z [ OK ] InferenceModeTest.TestExistingAutogradSession (7 ms) 2023-01-11T23:32:08.5612579Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeFunctionalOp 2023-01-11T23:32:08.5613141Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeFunctionalOp (0 ms) 2023-01-11T23:32:08.5613674Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeInplaceOp 2023-01-11T23:32:08.5614198Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeInplaceOp (0 ms) 2023-01-11T23:32:08.5614966Z [ RUN ] InferenceModeTest.TestInferenceTensorInInferenceModeViewOp 2023-01-11T23:32:08.5615489Z [ OK ] InferenceModeTest.TestInferenceTensorInInferenceModeViewOp (0 ms) 2023-01-11T23:32:08.5616006Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeFunctionalOp 2023-01-11T23:32:08.5616526Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeFunctionalOp (0 ms) 2023-01-11T23:32:08.5617043Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeInplaceOp 2023-01-11T23:32:08.5667552Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeInplaceOp (5 ms) 2023-01-11T23:32:08.5668058Z [ RUN ] InferenceModeTest.TestInferenceTensorInNormalModeViewOp 2023-01-11T23:32:08.5668544Z [ OK ] InferenceModeTest.TestInferenceTensorInNormalModeViewOp (0 ms) 2023-01-11T23:32:08.5669051Z [ RUN ] InferenceModeTest.TestNormalTensorInplaceOutputInInferenceMode 2023-01-11T23:32:08.5669663Z [ OK ] 
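The TensorOptionsTest and DeviceTest suites above cover option bundling and device-string parsing. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Device device("cuda:0");   // cf. DeviceTest.ParsesCorrectlyFromString
      auto opts = torch::TensorOptions()
                      .dtype(torch::kFloat64)
                      .device(device)
                      .requires_grad(true);
      torch::Tensor t = torch::zeros({2, 3}, opts);
    }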
InferenceModeTest.TestNormalTensorInplaceOutputInInferenceMode (0 ms) 2023-01-11T23:32:08.5670254Z [ RUN ] InferenceModeTest.TestNormalTensorInplaceOutputInNormalMode 2023-01-11T23:32:08.5670756Z [ OK ] InferenceModeTest.TestNormalTensorInplaceOutputInNormalMode (0 ms) 2023-01-11T23:32:08.5671257Z [ RUN ] InferenceModeTest.TestNormalTensorViewOutputInInferenceMode 2023-01-11T23:32:08.5671768Z [ OK ] InferenceModeTest.TestNormalTensorViewOutputInInferenceMode (0 ms) 2023-01-11T23:32:08.5672255Z [ RUN ] InferenceModeTest.TestNormalTensorViewOutputInNormalMode 2023-01-11T23:32:08.5702183Z [ OK ] InferenceModeTest.TestNormalTensorViewOutputInNormalMode (3 ms) 2023-01-11T23:32:08.5702731Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorFunctionalOp 2023-01-11T23:32:08.5733874Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorFunctionalOp (3 ms) 2023-01-11T23:32:08.5734392Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorInplaceOp 2023-01-11T23:32:08.5817646Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorInplaceOp (8 ms) 2023-01-11T23:32:08.5818155Z [ RUN ] InferenceModeTest.TestMixInferenceAndNormalTensorViewOp 2023-01-11T23:32:08.5818637Z [ OK ] InferenceModeTest.TestMixInferenceAndNormalTensorViewOp (0 ms) 2023-01-11T23:32:08.5819078Z [ RUN ] InferenceModeTest.TestHandleDirectViewOnRebase 2023-01-11T23:32:08.5852067Z [ OK ] InferenceModeTest.TestHandleDirectViewOnRebase (3 ms) 2023-01-11T23:32:08.5852507Z [ RUN ] InferenceModeTest.TestHandleInDirectViewOnRebase 2023-01-11T23:32:08.5874364Z [ OK ] InferenceModeTest.TestHandleInDirectViewOnRebase (2 ms) 2023-01-11T23:32:08.5874800Z [ RUN ] InferenceModeTest.TestCreationMetaPropagation 2023-01-11T23:32:08.5936936Z [ OK ] InferenceModeTest.TestCreationMetaPropagation (6 ms) 2023-01-11T23:32:08.5937392Z [ RUN ] InferenceModeTest.TestCreationMetaPropagationInput 2023-01-11T23:32:08.6061709Z [ OK ] InferenceModeTest.TestCreationMetaPropagationInput (12 ms) 2023-01-11T23:32:08.6062172Z [ RUN ] InferenceModeTest.TestInplaceCopyOnInferenceTensor 2023-01-11T23:32:08.6134869Z [ OK ] InferenceModeTest.TestInplaceCopyOnInferenceTensor (7 ms) 2023-01-11T23:32:08.6135325Z [ RUN ] InferenceModeTest.TestSetRequiresGradInNormalMode 2023-01-11T23:32:08.6145524Z [ OK ] InferenceModeTest.TestSetRequiresGradInNormalMode (1 ms) 2023-01-11T23:32:08.6146364Z [ RUN ] InferenceModeTest.TestAccessVersionCounter 2023-01-11T23:32:08.6179828Z [ OK ] InferenceModeTest.TestAccessVersionCounter (3 ms) 2023-01-11T23:32:08.6180350Z [ RUN ] InferenceModeTest.TestInplaceUpdateInferenceTensorWithNormalTensor 2023-01-11T23:32:08.6252291Z [ OK ] InferenceModeTest.TestInplaceUpdateInferenceTensorWithNormalTensor (7 ms) 2023-01-11T23:32:08.6252975Z [ RUN ] InferenceModeTest.TestComplexViewInInferenceMode 2023-01-11T23:32:08.6253548Z [ OK ] InferenceModeTest.TestComplexViewInInferenceMode (0 ms) 2023-01-11T23:32:08.6254003Z [ RUN ] InferenceModeTest.TestComplexViewInNormalMode 2023-01-11T23:32:08.6254685Z [ OK ] InferenceModeTest.TestComplexViewInNormalMode (0 ms) 2023-01-11T23:32:08.6255095Z [ RUN ] InferenceModeTest.TestCustomFunction 2023-01-11T23:32:08.6255464Z [ OK ] InferenceModeTest.TestCustomFunction (0 ms) 2023-01-11T23:32:08.6255897Z [ RUN ] InferenceModeTest.TestLegacyAutoNonVariableTypeModeWarning 2023-01-11T23:32:08.6256401Z [ OK ] InferenceModeTest.TestLegacyAutoNonVariableTypeModeWarning (0 ms) 2023-01-11T23:32:08.6256846Z [----------] 28 tests from InferenceModeTest (71 ms total) 2023-01-11T23:32:08.6257020Z 2023-01-11T23:32:08.6257396Z 
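InferenceModeTest above exercises c10::InferenceMode, which disables autograd bookkeeping more aggressively than NoGradGuard (no version counters, no view tracking), hence the suite's many inplace/view edge cases. A sketch:

    #include <torch/torch.h>

    int main() {
      torch::Tensor x = torch::ones({2, 2}, torch::requires_grad());
      {
        c10::InferenceMode guard;
        torch::Tensor y = x * 2;  // y is an "inference tensor"
        TORCH_CHECK(!y.requires_grad());
      }
    }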
[----------] 4 tests from GradModeTest 2023-01-11T23:32:08.6257739Z [ RUN ] GradModeTest.TestRequiresGradFunctionalOp 2023-01-11T23:32:08.6258138Z [ OK ] GradModeTest.TestRequiresGradFunctionalOp (0 ms) 2023-01-11T23:32:08.6258527Z [ RUN ] GradModeTest.TestRequiresGradInplaceOp 2023-01-11T23:32:08.6258901Z [ OK ] GradModeTest.TestRequiresGradInplaceOp (0 ms) 2023-01-11T23:32:08.6259271Z [ RUN ] GradModeTest.TestRequiresGradViewOp 2023-01-11T23:32:08.6259637Z [ OK ] GradModeTest.TestRequiresGradViewOp (0 ms) 2023-01-11T23:32:08.6260017Z [ RUN ] GradModeTest.TestRequiresGradViewOpExiting 2023-01-11T23:32:08.6286930Z [ OK ] GradModeTest.TestRequiresGradViewOpExiting (3 ms) 2023-01-11T23:32:08.6287342Z [----------] 4 tests from GradModeTest (3 ms total) 2023-01-11T23:32:08.6287510Z 2023-01-11T23:32:08.6287671Z [----------] 3 tests from OperationTest 2023-01-11T23:32:08.6287964Z [ RUN ] OperationTest.Lerp 2023-01-11T23:32:08.6295953Z [ OK ] OperationTest.Lerp (0 ms) 2023-01-11T23:32:08.6296321Z [ RUN ] OperationTest.Cross 2023-01-11T23:32:08.6335885Z [ OK ] OperationTest.Cross (3 ms) 2023-01-11T23:32:08.6336190Z [ RUN ] OperationTest.Linear_out 2023-01-11T23:32:08.6339557Z [ OK ] OperationTest.Linear_out (0 ms) 2023-01-11T23:32:08.6339930Z [----------] 3 tests from OperationTest (5 ms total) 2023-01-11T23:32:08.6340114Z 2023-01-11T23:32:08.6340334Z [----------] 1 test from ParallelTest 2023-01-11T23:32:08.6340877Z [ RUN ] ParallelTest.DataParallelUsesAllAvailableCUDADevices_CUDA 2023-01-11T23:32:08.6341369Z [ OK ] ParallelTest.DataParallelUsesAllAvailableCUDADevices_CUDA (0 ms) 2023-01-11T23:32:08.6341783Z [----------] 1 test from ParallelTest (0 ms total) 2023-01-11T23:32:08.6341948Z 2023-01-11T23:32:08.6342118Z [----------] Global test environment tear-down 2023-01-11T23:32:08.6472821Z [==========] 1035 tests from 49 test suites ran. (66641 ms total) 2023-01-11T23:32:08.6473122Z [ PASSED ] 1035 tests. 2023-01-11T23:32:09.3135735Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_tensorexpr --gtest_output=xml:test/test-reports/cpp-unittest/test_libtorch/test_tensorexpr.xml 2023-01-11T23:32:09.7148816Z Only one CUDA device detected. Disabling MultiCUDA tests 2023-01-11T23:32:09.7152503Z Note: Google Test filter = *-*_MultiCUDA 2023-01-11T23:32:09.7152892Z [==========] Running 829 tests from 26 test suites. 2023-01-11T23:32:09.7153255Z [----------] Global test environment set-up. 
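The GradModeTest suite above (closing out the first binary's 1035 tests) checks thread-local grad-mode toggling across functional, inplace, and view ops. A sketch of the guard being flipped:

    #include <torch/torch.h>

    int main() {
      torch::Tensor w = torch::ones({2}, torch::requires_grad());
      {
        torch::NoGradGuard no_grad;   // grad mode off in this scope
        torch::Tensor y = w * 2;
        TORCH_CHECK(!y.requires_grad());
      }
      TORCH_CHECK(at::GradMode::is_enabled());  // restored on scope exit
    }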
2023-01-11T23:32:09.7153580Z [----------] 1 test from Approx 2023-01-11T23:32:09.7153849Z [ RUN ] Approx.log_vml 2023-01-11T23:32:10.5153636Z [ OK ] Approx.log_vml (799 ms) 2023-01-11T23:32:10.5154085Z [----------] 1 test from Approx (799 ms total) 2023-01-11T23:32:10.5154310Z 2023-01-11T23:32:10.5154479Z [----------] 34 tests from ATen 2023-01-11T23:32:10.5154785Z [ RUN ] ATen._cast_Float 2023-01-11T23:32:10.5155088Z [ OK ] ATen._cast_Float (0 ms) 2023-01-11T23:32:10.5155385Z [ RUN ] ATen.negInt 2023-01-11T23:32:10.5158140Z [ OK ] ATen.negInt (0 ms) 2023-01-11T23:32:10.5158436Z [ RUN ] ATen.negFloat 2023-01-11T23:32:10.5162217Z [ OK ] ATen.negFloat (0 ms) 2023-01-11T23:32:10.5162521Z [ RUN ] ATen.addInt 2023-01-11T23:32:10.5168655Z [ OK ] ATen.addInt (0 ms) 2023-01-11T23:32:10.5168953Z [ RUN ] ATen.addFloat 2023-01-11T23:32:10.5176037Z [ OK ] ATen.addFloat (0 ms) 2023-01-11T23:32:10.5176343Z [ RUN ] ATen.subInt 2023-01-11T23:32:10.5182891Z [ OK ] ATen.subInt (0 ms) 2023-01-11T23:32:10.5183522Z [ RUN ] ATen.subFloat 2023-01-11T23:32:10.5190920Z [ OK ] ATen.subFloat (0 ms) 2023-01-11T23:32:10.5191345Z [ RUN ] ATen.lerp 2023-01-11T23:32:10.5199349Z [ OK ] ATen.lerp (0 ms) 2023-01-11T23:32:10.5199766Z [ RUN ] ATen.addcmulInt 2023-01-11T23:32:10.5208129Z [ OK ] ATen.addcmulInt (0 ms) 2023-01-11T23:32:10.5208577Z [ RUN ] ATen.addcmulFloat 2023-01-11T23:32:10.5217291Z [ OK ] ATen.addcmulFloat (0 ms) 2023-01-11T23:32:10.5217713Z [ RUN ] ATen.mulInt 2023-01-11T23:32:10.5222577Z [ OK ] ATen.mulInt (0 ms) 2023-01-11T23:32:10.5222991Z [ RUN ] ATen.mulFloat 2023-01-11T23:32:10.5227954Z [ OK ] ATen.mulFloat (0 ms) 2023-01-11T23:32:10.5228350Z [ RUN ] ATen.divInt 2023-01-11T23:32:10.5233994Z [ OK ] ATen.divInt (0 ms) 2023-01-11T23:32:10.5234408Z [ RUN ] ATen.divFloat 2023-01-11T23:32:10.5238811Z [ OK ] ATen.divFloat (0 ms) 2023-01-11T23:32:10.5239215Z [ RUN ] ATen.maxInt 2023-01-11T23:32:10.5244295Z [ OK ] ATen.maxInt (0 ms) 2023-01-11T23:32:10.5244702Z [ RUN ] ATen.maxFloat 2023-01-11T23:32:10.5249565Z [ OK ] ATen.maxFloat (0 ms) 2023-01-11T23:32:10.5249977Z [ RUN ] ATen.minInt 2023-01-11T23:32:10.5255325Z [ OK ] ATen.minInt (0 ms) 2023-01-11T23:32:10.5255732Z [ RUN ] ATen.minFloat 2023-01-11T23:32:10.5260548Z [ OK ] ATen.minFloat (0 ms) 2023-01-11T23:32:10.5260968Z [ RUN ] ATen.reluInt 2023-01-11T23:32:10.5264835Z [ OK ] ATen.reluInt (0 ms) 2023-01-11T23:32:10.5265245Z [ RUN ] ATen.reluFloat 2023-01-11T23:32:10.5269183Z [ OK ] ATen.reluFloat (0 ms) 2023-01-11T23:32:10.5269600Z [ RUN ] ATen.logFloat 2023-01-11T23:32:10.5274241Z [ OK ] ATen.logFloat (0 ms) 2023-01-11T23:32:10.5274651Z [ RUN ] ATen.fastLogFloat 2023-01-11T23:32:10.5437138Z [ OK ] ATen.fastLogFloat (16 ms) 2023-01-11T23:32:10.5437588Z [ RUN ] ATen.fastTanhFloat 2023-01-11T23:32:10.5501541Z [ OK ] ATen.fastTanhFloat (6 ms) 2023-01-11T23:32:10.5501999Z [ RUN ] ATen.fastSigmoidFloat 2023-01-11T23:32:10.5583257Z [ OK ] ATen.fastSigmoidFloat (8 ms) 2023-01-11T23:32:10.5583705Z [ RUN ] ATen.log10Float 2023-01-11T23:32:10.5587450Z [ OK ] ATen.log10Float (0 ms) 2023-01-11T23:32:10.5587881Z [ RUN ] ATen.log2Float 2023-01-11T23:32:10.5592731Z [ OK ] ATen.log2Float (0 ms) 2023-01-11T23:32:10.5593156Z [ RUN ] ATen.expFloat 2023-01-11T23:32:10.5595279Z [ OK ] ATen.expFloat (0 ms) 2023-01-11T23:32:10.5595693Z [ RUN ] ATen.erfFloat 2023-01-11T23:32:10.5599731Z [ OK ] ATen.erfFloat (0 ms) 2023-01-11T23:32:10.5600137Z [ RUN ] ATen.cosFloat 2023-01-11T23:32:10.5604055Z [ OK ] ATen.cosFloat (0 ms) 2023-01-11T23:32:10.5604455Z [ RUN ] ATen.eqInt 
2023-01-11T23:32:10.5609821Z [ OK ] ATen.eqInt (0 ms) 2023-01-11T23:32:10.5610208Z [ RUN ] ATen.geInt 2023-01-11T23:32:10.5615796Z [ OK ] ATen.geInt (0 ms) 2023-01-11T23:32:10.5616194Z [ RUN ] ATen.gtInt 2023-01-11T23:32:10.5621478Z [ OK ] ATen.gtInt (0 ms) 2023-01-11T23:32:10.5621863Z [ RUN ] ATen.leInt 2023-01-11T23:32:10.5627204Z [ OK ] ATen.leInt (0 ms) 2023-01-11T23:32:10.5627603Z [ RUN ] ATen.ltInt 2023-01-11T23:32:10.5633794Z [ OK ] ATen.ltInt (0 ms) 2023-01-11T23:32:10.5635033Z [----------] 34 tests from ATen (48 ms total) 2023-01-11T23:32:10.5635236Z 2023-01-11T23:32:10.5635415Z [----------] 26 tests from BoundsInference 2023-01-11T23:32:10.5635727Z [ RUN ] BoundsInference._1 2023-01-11T23:32:10.5636318Z [ OK ] BoundsInference._1 (0 ms) 2023-01-11T23:32:10.5636615Z [ RUN ] BoundsInference._2 2023-01-11T23:32:10.5638555Z [ OK ] BoundsInference._2 (0 ms) 2023-01-11T23:32:10.5638913Z [ RUN ] BoundsInference._3 2023-01-11T23:32:10.5641006Z [ OK ] BoundsInference._3 (0 ms) 2023-01-11T23:32:10.5641362Z [ RUN ] BoundsInference._4 2023-01-11T23:32:10.5646601Z [ OK ] BoundsInference._4 (0 ms) 2023-01-11T23:32:10.5646925Z [ RUN ] BoundsInference._5 2023-01-11T23:32:10.5659767Z [ OK ] BoundsInference._5 (1 ms) 2023-01-11T23:32:10.5660123Z [ RUN ] BoundsInference._6 2023-01-11T23:32:10.5668892Z [ OK ] BoundsInference._6 (0 ms) 2023-01-11T23:32:10.5669225Z [ RUN ] BoundsInference.Adjacent 2023-01-11T23:32:10.5674816Z [ OK ] BoundsInference.Adjacent (0 ms) 2023-01-11T23:32:10.5675231Z [ RUN ] BoundsInference.MultipleTopLoopLoad 2023-01-11T23:32:10.5679913Z [ OK ] BoundsInference.MultipleTopLoopLoad (0 ms) 2023-01-11T23:32:10.5680389Z [ RUN ] BoundsInference.MultipleTopLoopStore 2023-01-11T23:32:10.5684930Z [ OK ] BoundsInference.MultipleTopLoopStore (0 ms) 2023-01-11T23:32:10.5685308Z [ RUN ] BoundsInference.CacheReads 2023-01-11T23:32:10.5712873Z [ OK ] BoundsInference.CacheReads (2 ms) 2023-01-11T23:32:10.5713317Z [ RUN ] BoundsInference.Flattened 2023-01-11T23:32:10.5725096Z [ OK ] BoundsInference.Flattened (1 ms) 2023-01-11T23:32:10.5725688Z [ RUN ] BoundsInference.GetPotentialHazards 2023-01-11T23:32:10.5726106Z [ OK ] BoundsInference.GetPotentialHazards (0 ms) 2023-01-11T23:32:10.5726508Z [ RUN ] BoundsInference.GetPotentialHazardsLoopNoHazard 2023-01-11T23:32:10.5730304Z [ OK ] BoundsInference.GetPotentialHazardsLoopNoHazard (0 ms) 2023-01-11T23:32:10.5730773Z [ RUN ] BoundsInference.GetPotentialHazardsLoopCall 2023-01-11T23:32:10.5735139Z [ OK ] BoundsInference.GetPotentialHazardsLoopCall (0 ms) 2023-01-11T23:32:10.5735603Z [ RUN ] BoundsInference.GetPotentialHazardsLoopSplit 2023-01-11T23:32:10.5747083Z [ OK ] BoundsInference.GetPotentialHazardsLoopSplit (1 ms) 2023-01-11T23:32:10.5747624Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithPartialOverlap 2023-01-11T23:32:10.5752938Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithPartialOverlap (0 ms) 2023-01-11T23:32:10.5753504Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlap 2023-01-11T23:32:10.5754575Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlap (0 ms) 2023-01-11T23:32:10.5755429Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlapRAW 2023-01-11T23:32:10.5757333Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferWithFullOverlapRAW (0 ms) 2023-01-11T23:32:10.5758157Z [ RUN ] BoundsInference.HasConflictingOverlapSameBufferNotOverlapping 2023-01-11T23:32:10.5761434Z [ OK ] BoundsInference.HasConflictingOverlapSameBufferNotOverlapping (0 ms) 
2023-01-11T23:32:10.5762219Z [ RUN ] BoundsInference.HasConflictingOverlap2DBufferWithOverlap 2023-01-11T23:32:10.5776580Z [ OK ] BoundsInference.HasConflictingOverlap2DBufferWithOverlap (1 ms) 2023-01-11T23:32:10.5777342Z [ RUN ] BoundsInference.HasConflictingOverlap2DBufferWithNoOverlap 2023-01-11T23:32:10.5787366Z [ OK ] BoundsInference.HasConflictingOverlap2DBufferWithNoOverlap (1 ms) 2023-01-11T23:32:10.5788102Z [ RUN ] BoundsInference.HasConflictingOverlapDifferentBuffers 2023-01-11T23:32:10.5790386Z [ OK ] BoundsInference.HasConflictingOverlapDifferentBuffers (0 ms) 2023-01-11T23:32:10.5791250Z [ RUN ] BoundsInference.HasConflictingOverlapDueToRAWDependence 2023-01-11T23:32:10.5794398Z [ OK ] BoundsInference.HasConflictingOverlapDueToRAWDependence (0 ms) 2023-01-11T23:32:10.5795131Z [ RUN ] BoundsInference.HasConflictingOverlapDueToWARDependence 2023-01-11T23:32:10.5797715Z [ OK ] BoundsInference.HasConflictingOverlapDueToWARDependence (0 ms) 2023-01-11T23:32:10.5798391Z [ RUN ] BoundsInference.HasConflictingOverlapWithLoads 2023-01-11T23:32:10.5801029Z [ OK ] BoundsInference.HasConflictingOverlapWithLoads (0 ms) 2023-01-11T23:32:10.5801631Z [ RUN ] BoundsInference.IsOverlapping 2023-01-11T23:32:10.5824408Z [ OK ] BoundsInference.IsOverlapping (2 ms) 2023-01-11T23:32:10.5824958Z [----------] 26 tests from BoundsInference (19 ms total) 2023-01-11T23:32:10.5825223Z 2023-01-11T23:32:10.5825447Z [----------] 4 tests from Conv 2023-01-11T23:32:10.5825887Z [ RUN ] Conv.DepthwiseConv2D 2023-01-11T23:32:10.9417688Z [ OK ] Conv.DepthwiseConv2D (359 ms) 2023-01-11T23:32:10.9418250Z [ RUN ] Conv.DepthwiseConv2DNoBias 2023-01-11T23:32:11.2952381Z [ OK ] Conv.DepthwiseConv2DNoBias (353 ms) 2023-01-11T23:32:11.2952953Z [ RUN ] Conv.DepthwiseConv2DDynamicShapes 2023-01-11T23:32:11.4631809Z [ OK ] Conv.DepthwiseConv2DDynamicShapes (167 ms) 2023-01-11T23:32:11.4632219Z [ RUN ] Conv.Conv2D 2023-01-11T23:32:12.4843025Z [ OK ] Conv.Conv2D (1020 ms) 2023-01-11T23:32:12.4843754Z [----------] 4 tests from Conv (1901 ms total) 2023-01-11T23:32:12.4843917Z 2023-01-11T23:32:12.4844071Z [----------] 28 tests from CppPrinter 2023-01-11T23:32:12.4844360Z [ RUN ] CppPrinter.IntImm 2023-01-11T23:32:12.4844690Z [ OK ] CppPrinter.IntImm (0 ms) 2023-01-11T23:32:12.4845096Z [ RUN ] CppPrinter.FloatImm 2023-01-11T23:32:12.4845523Z [ OK ] CppPrinter.FloatImm (0 ms) 2023-01-11T23:32:12.4845904Z [ RUN ] CppPrinter.FloatImm1 2023-01-11T23:32:12.4846206Z [ OK ] CppPrinter.FloatImm1 (0 ms) 2023-01-11T23:32:12.4846492Z [ RUN ] CppPrinter.DoubleImm 2023-01-11T23:32:12.4846788Z [ OK ] CppPrinter.DoubleImm (0 ms) 2023-01-11T23:32:12.4847110Z [ RUN ] CppPrinter.DoubleImm1 2023-01-11T23:32:12.4847470Z [ OK ] CppPrinter.DoubleImm1 (0 ms) 2023-01-11T23:32:12.4847815Z [ RUN ] CppPrinter.HalfImm 2023-01-11T23:32:12.4848177Z [ OK ] CppPrinter.HalfImm (0 ms) 2023-01-11T23:32:12.4848456Z [ RUN ] CppPrinter.Add 2023-01-11T23:32:12.4848730Z [ OK ] CppPrinter.Add (0 ms) 2023-01-11T23:32:12.4849018Z [ RUN ] CppPrinter.AddExpr1 2023-01-11T23:32:12.4849317Z [ OK ] CppPrinter.AddExpr1 (0 ms) 2023-01-11T23:32:12.4849604Z [ RUN ] CppPrinter.AddExpr2 2023-01-11T23:32:12.4849901Z [ OK ] CppPrinter.AddExpr2 (0 ms) 2023-01-11T23:32:12.4850193Z [ RUN ] CppPrinter.AddExpr3 2023-01-11T23:32:12.4850481Z [ OK ] CppPrinter.AddExpr3 (0 ms) 2023-01-11T23:32:12.4850755Z [ RUN ] CppPrinter.Mod 2023-01-11T23:32:12.4851033Z [ OK ] CppPrinter.Mod (0 ms) 2023-01-11T23:32:12.4851310Z [ RUN ] CppPrinter.ModFloat 2023-01-11T23:32:12.4851608Z [ OK ] 
CppPrinter.ModFloat (0 ms) 2023-01-11T23:32:12.4851886Z [ RUN ] CppPrinter.Max 2023-01-11T23:32:12.4852167Z [ OK ] CppPrinter.Max (0 ms) 2023-01-11T23:32:12.4852442Z [ RUN ] CppPrinter.MaxFloat 2023-01-11T23:32:12.4852741Z [ OK ] CppPrinter.MaxFloat (0 ms) 2023-01-11T23:32:12.4853025Z [ RUN ] CppPrinter.MaxHalf 2023-01-11T23:32:12.4853309Z [ OK ] CppPrinter.MaxHalf (0 ms) 2023-01-11T23:32:12.4853682Z [ RUN ] CppPrinter.And 2023-01-11T23:32:12.4854018Z [ OK ] CppPrinter.And (0 ms) 2023-01-11T23:32:12.4854306Z [ RUN ] CppPrinter.CompareSelect 2023-01-11T23:32:12.4854871Z [ OK ] CppPrinter.CompareSelect (0 ms) 2023-01-11T23:32:12.4855179Z [ RUN ] CppPrinter.IfThenElse 2023-01-11T23:32:12.4855473Z [ OK ] CppPrinter.IfThenElse (0 ms) 2023-01-11T23:32:12.4855780Z [ RUN ] CppPrinter.AllocateFree 2023-01-11T23:32:12.4856093Z [ OK ] CppPrinter.AllocateFree (0 ms) 2023-01-11T23:32:12.4856389Z [ RUN ] CppPrinter.LoadStore 2023-01-11T23:32:12.4856691Z [ OK ] CppPrinter.LoadStore (0 ms) 2023-01-11T23:32:12.4856974Z [ RUN ] CppPrinter.Var 2023-01-11T23:32:12.4857248Z [ OK ] CppPrinter.Var (0 ms) 2023-01-11T23:32:12.4857511Z [ RUN ] CppPrinter.Cast 2023-01-11T23:32:12.4857794Z [ OK ] CppPrinter.Cast (0 ms) 2023-01-11T23:32:12.4858080Z [ RUN ] CppPrinter.BitCast 2023-01-11T23:32:12.4858363Z [ OK ] CppPrinter.BitCast (0 ms) 2023-01-11T23:32:12.4858636Z [ RUN ] CppPrinter.Let 2023-01-11T23:32:12.4858910Z [ OK ] CppPrinter.Let (0 ms) 2023-01-11T23:32:12.4859172Z [ RUN ] CppPrinter.For 2023-01-11T23:32:12.4859447Z [ OK ] CppPrinter.For (0 ms) 2023-01-11T23:32:12.4859717Z [ RUN ] CppPrinter.Cond 2023-01-11T23:32:12.4859988Z [ OK ] CppPrinter.Cond (0 ms) 2023-01-11T23:32:12.4860271Z [ RUN ] CppPrinter.Intrinsics 2023-01-11T23:32:12.4860627Z [ OK ] CppPrinter.Intrinsics (0 ms) 2023-01-11T23:32:12.4860938Z [ RUN ] CppPrinter.ExternalCall 2023-01-11T23:32:12.4861244Z [ OK ] CppPrinter.ExternalCall (0 ms) 2023-01-11T23:32:12.4861577Z [----------] 28 tests from CppPrinter (0 ms total) 2023-01-11T23:32:12.4861738Z 2023-01-11T23:32:12.4861907Z [----------] 8 tests from DynamicShapes 2023-01-11T23:32:12.4862229Z [ RUN ] DynamicShapes.SimpleGraph 2023-01-11T23:32:12.5877756Z [ OK ] DynamicShapes.SimpleGraph (103 ms) 2023-01-11T23:32:12.5878986Z [ RUN ] DynamicShapes.GraphWith2InputsSameDims 2023-01-11T23:32:12.6797638Z [ OK ] DynamicShapes.GraphWith2InputsSameDims (91 ms) 2023-01-11T23:32:12.6798138Z [ RUN ] DynamicShapes.GraphWith2InputsAndBroadcast 2023-01-11T23:32:12.7672539Z [ OK ] DynamicShapes.GraphWith2InputsAndBroadcast (87 ms) 2023-01-11T23:32:12.7673060Z [ RUN ] DynamicShapes.GraphWithPartiallySymbolicOutput 2023-01-11T23:32:12.8124408Z [ OK ] DynamicShapes.GraphWithPartiallySymbolicOutput (45 ms) 2023-01-11T23:32:12.8124829Z [ RUN ] DynamicShapes.GraphWithSymbolicStrides 2023-01-11T23:32:13.0074808Z [ OK ] DynamicShapes.GraphWithSymbolicStrides (194 ms) 2023-01-11T23:32:13.0075417Z [ RUN ] DynamicShapes.GraphWithCatAndBroadcast 2023-01-11T23:32:13.4194432Z [ OK ] DynamicShapes.GraphWithCatAndBroadcast (411 ms) 2023-01-11T23:32:13.4194983Z [ RUN ] DynamicShapes.GraphFromModel 2023-01-11T23:32:13.7936132Z [ OK ] DynamicShapes.GraphFromModel (374 ms) 2023-01-11T23:32:13.7937095Z [ RUN ] DynamicShapes.MultiThreadedExecution 2023-01-11T23:32:14.9972127Z [ OK ] DynamicShapes.MultiThreadedExecution (1203 ms) 2023-01-11T23:32:14.9972706Z [----------] 8 tests from DynamicShapes (2512 ms total) 2023-01-11T23:32:14.9972911Z 2023-01-11T23:32:14.9973075Z [----------] 30 tests from Expr 2023-01-11T23:32:14.9973415Z [ RUN ] 
Expr.BasicValueTest 2023-01-11T23:32:14.9973703Z [ OK ] Expr.BasicValueTest (0 ms) 2023-01-11T23:32:14.9974064Z [ RUN ] Expr.BasicValueTest02 2023-01-11T23:32:14.9974965Z [ OK ] Expr.BasicValueTest02 (0 ms) 2023-01-11T23:32:14.9975338Z [ RUN ] Expr.IsChannelsLastContiguous 2023-01-11T23:32:14.9975687Z [ OK ] Expr.IsChannelsLastContiguous (0 ms) 2023-01-11T23:32:14.9976074Z [ RUN ] Expr.LetTest01 2023-01-11T23:32:14.9976345Z [ OK ] Expr.LetTest01 (0 ms) 2023-01-11T23:32:14.9976624Z [ RUN ] Expr.LetTest02 2023-01-11T23:32:14.9976954Z [ OK ] Expr.LetTest02 (0 ms) 2023-01-11T23:32:14.9977222Z [ RUN ] Expr.LetStmtTest01 2023-01-11T23:32:14.9977538Z [ OK ] Expr.LetStmtTest01 (0 ms) 2023-01-11T23:32:14.9977869Z [ RUN ] Expr.IntTest 2023-01-11T23:32:14.9978141Z [ OK ] Expr.IntTest (0 ms) 2023-01-11T23:32:14.9978425Z [ RUN ] Expr.FloatTest 2023-01-11T23:32:14.9978746Z [ OK ] Expr.FloatTest (0 ms) 2023-01-11T23:32:14.9979013Z [ RUN ] Expr.ByteTest 2023-01-11T23:32:14.9979319Z [ OK ] Expr.ByteTest (0 ms) 2023-01-11T23:32:14.9979636Z [ RUN ] Expr.CharTest 2023-01-11T23:32:14.9979912Z [ OK ] Expr.CharTest (0 ms) 2023-01-11T23:32:14.9980199Z [ RUN ] Expr.ShortTest 2023-01-11T23:32:14.9980518Z [ OK ] Expr.ShortTest (0 ms) 2023-01-11T23:32:14.9980785Z [ RUN ] Expr.LongTest 2023-01-11T23:32:14.9981078Z [ OK ] Expr.LongTest (0 ms) 2023-01-11T23:32:14.9981371Z [ RUN ] Expr.HalfTest 2023-01-11T23:32:14.9981637Z [ OK ] Expr.HalfTest (0 ms) 2023-01-11T23:32:14.9981918Z [ RUN ] Expr.DoubleTest 2023-01-11T23:32:14.9982313Z [ OK ] Expr.DoubleTest (0 ms) 2023-01-11T23:32:14.9982584Z [ RUN ] Expr.VectorAdd01 2023-01-11T23:32:14.9989587Z [ OK ] Expr.VectorAdd01 (1 ms) 2023-01-11T23:32:14.9990727Z [ RUN ] Expr.CompareSelectEQ 2023-01-11T23:32:15.0027937Z [ OK ] Expr.CompareSelectEQ (3 ms) 2023-01-11T23:32:15.0028799Z [ RUN ] Expr.CompareSelectDtypes 2023-01-11T23:32:15.0065958Z [ OK ] Expr.CompareSelectDtypes (3 ms) 2023-01-11T23:32:15.0066812Z [ RUN ] Expr.IntrinsicsDtypes 2023-01-11T23:32:15.0072472Z [ OK ] Expr.IntrinsicsDtypes (0 ms) 2023-01-11T23:32:15.0073335Z [ RUN ] Expr.Substitute01 2023-01-11T23:32:15.0074116Z [ OK ] Expr.Substitute01 (0 ms) 2023-01-11T23:32:15.0074801Z [ RUN ] Expr.Math01 2023-01-11T23:32:15.0075124Z [ OK ] Expr.Math01 (0 ms) 2023-01-11T23:32:15.0075530Z [ RUN ] Expr.UnaryMath01 2023-01-11T23:32:15.0075947Z [ OK ] Expr.UnaryMath01 (0 ms) 2023-01-11T23:32:15.0076228Z [ RUN ] Expr.BinaryMath01 2023-01-11T23:32:15.0076511Z [ OK ] Expr.BinaryMath01 (0 ms) 2023-01-11T23:32:15.0076783Z [ RUN ] Expr.LogicalOps01 2023-01-11T23:32:15.0077073Z [ OK ] Expr.LogicalOps01 (0 ms) 2023-01-11T23:32:15.0077356Z [ RUN ] Expr.LogicalOps02 2023-01-11T23:32:15.0077631Z [ OK ] Expr.LogicalOps02 (0 ms) 2023-01-11T23:32:15.0077907Z [ RUN ] Expr.LogicalOps03 2023-01-11T23:32:15.0078206Z [ OK ] Expr.LogicalOps03 (0 ms) 2023-01-11T23:32:15.0078585Z [ RUN ] Expr.BitwiseOps 2023-01-11T23:32:15.0079000Z [ OK ] Expr.BitwiseOps (0 ms) 2023-01-11T23:32:15.0079418Z [ RUN ] Expr.DynamicShapeAdd 2023-01-11T23:32:15.0079865Z [ OK ] Expr.DynamicShapeAdd (0 ms) 2023-01-11T23:32:15.0080283Z [ RUN ] Expr.OutOfBounds 2023-01-11T23:32:15.0080623Z [ OK ] Expr.OutOfBounds (0 ms) 2023-01-11T23:32:15.0080905Z [ RUN ] Expr.OutOfBounds2d 2023-01-11T23:32:15.0088117Z [ OK ] Expr.OutOfBounds2d (0 ms) 2023-01-11T23:32:15.0088576Z [ RUN ] Expr.OutOfBounds2dFlattenedIndex 2023-01-11T23:32:15.0092744Z [ OK ] Expr.OutOfBounds2dFlattenedIndex (0 ms) 2023-01-11T23:32:15.0093222Z [----------] 30 tests from Expr (12 ms total) 
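The Expr, BoundsInference, and Kernel suites in this binary test NNC, the tensor-expression compiler behind the TensorExpr fuser. These are internal APIs (headers under torch/csrc/jit/tensorexpr, not the public libtorch surface) and signatures have shifted between releases, so the following is only a rough sketch of the Compute/LoopNest/SimpleIREvaluator pattern the tests use, not a supported example:

    #include <torch/csrc/jit/tensorexpr/eval.h>
    #include <torch/csrc/jit/tensorexpr/ir_simplifier.h>
    #include <torch/csrc/jit/tensorexpr/loopnest.h>
    #include <torch/csrc/jit/tensorexpr/tensor.h>
    #include <vector>

    using namespace torch::jit::tensorexpr;

    int main() {
      // B[i] = A[i] * 2 as a tensor expression, lowered and interpreted.
      BufHandle a("A", {64}, kFloat);
      Tensor b = Compute("B", {64},
                         [&](const VarHandle& i) { return a.load(i) * 2.0f; });
      LoopNest nest({b});
      nest.prepareForCodegen();
      StmtPtr stmt = IRSimplifier::simplify(nest.root_stmt());
      SimpleIREvaluator eval(stmt, {a, b});
      std::vector<float> in(64, 3.0f), out(64, 0.0f);
      eval(in, out);  // expect out[i] == 6.0f
    }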
2023-01-11T23:32:15.0093400Z
2023-01-11T23:32:15.0093560Z [----------] 16 tests from ExternalCall
2023-01-11T23:32:15.0093859Z [ RUN ] ExternalCall.Conv1d_float
2023-01-11T23:32:15.0411982Z [ OK ] ExternalCall.Conv1d_float (31 ms)
2023-01-11T23:32:15.0412429Z [ RUN ] ExternalCall.Conv1d_int
2023-01-11T23:32:15.0749393Z [ OK ] ExternalCall.Conv1d_int (33 ms)
2023-01-11T23:32:15.0750446Z [ RUN ] ExternalCall.Conv1d_nobias_noargs
2023-01-11T23:32:15.1023747Z [ OK ] ExternalCall.Conv1d_nobias_noargs (27 ms)
2023-01-11T23:32:15.1024705Z [ RUN ] ExternalCall.Conv2d_float
2023-01-11T23:32:15.1373649Z [ OK ] ExternalCall.Conv2d_float (35 ms)
2023-01-11T23:32:15.1374145Z [ RUN ] ExternalCall.Conv2d_int
2023-01-11T23:32:15.1795597Z [ OK ] ExternalCall.Conv2d_int (42 ms)
2023-01-11T23:32:15.1796088Z [ RUN ] ExternalCall.Conv2d_nobias_noargs
2023-01-11T23:32:15.2102947Z [ OK ] ExternalCall.Conv2d_nobias_noargs (30 ms)
2023-01-11T23:32:15.2103439Z [ RUN ] ExternalCall.Addmm_float
2023-01-11T23:32:15.2396843Z [ OK ] ExternalCall.Addmm_float (29 ms)
2023-01-11T23:32:15.2397299Z [ RUN ] ExternalCall.Embedding
2023-01-11T23:32:15.2674027Z [ OK ] ExternalCall.Embedding (27 ms)
2023-01-11T23:32:15.2674522Z [ RUN ] ExternalCall.MaxReduction
2023-01-11T23:32:15.2922604Z [ OK ] ExternalCall.MaxReduction (24 ms)
2023-01-11T23:32:15.2923159Z [ RUN ] ExternalCall.Prepacked_Linear_float
2023-01-11T23:32:15.3704351Z [ OK ] ExternalCall.Prepacked_Linear_float (78 ms)
2023-01-11T23:32:15.3704756Z [ RUN ] ExternalCall.Prepacked_Conv2d_float
2023-01-11T23:32:15.4426745Z [ OK ] ExternalCall.Prepacked_Conv2d_float (72 ms)
2023-01-11T23:32:15.4428053Z [ RUN ] ExternalCall.BinaryFloat
2023-01-11T23:32:15.5228783Z [ OK ] ExternalCall.BinaryFloat (80 ms)
2023-01-11T23:32:15.5229231Z [ RUN ] ExternalCall.UnaryFloat
2023-01-11T23:32:15.5729734Z [ OK ] ExternalCall.UnaryFloat (50 ms)
2023-01-11T23:32:15.5730225Z [ RUN ] ExternalCall.ComputeInterop
2023-01-11T23:32:16.7200607Z [ OK ] ExternalCall.ComputeInterop (1146 ms)
2023-01-11T23:32:16.7201001Z [ RUN ] ExternalCall.Inlining
2023-01-11T23:32:16.8142112Z [ OK ] ExternalCall.Inlining (93 ms)
2023-01-11T23:32:16.8142844Z [ RUN ] ExternalCall.JitCustomFusionOp
2023-01-11T23:32:16.9347823Z [ OK ] ExternalCall.JitCustomFusionOp (120 ms)
2023-01-11T23:32:16.9348578Z [----------] 16 tests from ExternalCall (1925 ms total)
2023-01-11T23:32:16.9348913Z
2023-01-11T23:32:16.9349212Z [----------] 8 tests from GraphOpt
2023-01-11T23:32:16.9349773Z [ RUN ] GraphOpt.OptimizeCat
2023-01-11T23:32:16.9713726Z [ OK ] GraphOpt.OptimizeCat (36 ms)
2023-01-11T23:32:16.9714388Z [ RUN ] GraphOpt.OptimizeCat2
2023-01-11T23:32:17.0117083Z [ OK ] GraphOpt.OptimizeCat2 (40 ms)
2023-01-11T23:32:17.0117438Z [ RUN ] GraphOpt.OptimizeCat3
2023-01-11T23:32:17.0595876Z [ OK ] GraphOpt.OptimizeCat3 (47 ms)
2023-01-11T23:32:17.0596325Z [ RUN ] GraphOpt.OptimizeCatWithTypePromotionInUser
2023-01-11T23:32:17.0957109Z [ OK ] GraphOpt.OptimizeCatWithTypePromotionInUser (36 ms)
2023-01-11T23:32:17.0957573Z [ RUN ] GraphOpt.OptimizeCatWithTypePromotionInCat
2023-01-11T23:32:17.1739150Z [ OK ] GraphOpt.OptimizeCatWithTypePromotionInCat (78 ms)
2023-01-11T23:32:17.1740053Z [ RUN ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp
2023-01-11T23:32:17.2172332Z [ OK ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp (43 ms)
2023-01-11T23:32:17.2173279Z [ RUN ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp2
2023-01-11T23:32:17.2638214Z [ OK ] GraphOpt.OptimizeCatNoSingleTensorElementwiseOp2 (46 ms)
2023-01-11T23:32:17.2638688Z [ RUN ] GraphOpt.AOTGraphPrepPasses
2023-01-11T23:32:17.2639182Z [ OK ] GraphOpt.AOTGraphPrepPasses (0 ms)
2023-01-11T23:32:17.2639684Z [----------] 8 tests from GraphOpt (329 ms total)
2023-01-11T23:32:17.2639847Z
2023-01-11T23:32:17.2639999Z [----------] 4 tests from IRPrinter
2023-01-11T23:32:17.2640350Z [ RUN ] IRPrinter.BasicValueTest
2023-01-11T23:32:17.2640828Z [ OK ] IRPrinter.BasicValueTest (0 ms)
2023-01-11T23:32:17.2641318Z [ RUN ] IRPrinter.BasicValueTest02
2023-01-11T23:32:17.2641769Z [ OK ] IRPrinter.BasicValueTest02 (0 ms)
2023-01-11T23:32:17.2642229Z [ RUN ] IRPrinter.CastTest
2023-01-11T23:32:17.2642564Z [ OK ] IRPrinter.CastTest (0 ms)
2023-01-11T23:32:17.2642921Z [ RUN ] IRPrinter.FunctionName
2023-01-11T23:32:17.2643317Z [ OK ] IRPrinter.FunctionName (0 ms)
2023-01-11T23:32:17.2643647Z [----------] 4 tests from IRPrinter (0 ms total)
2023-01-11T23:32:17.2643806Z
2023-01-11T23:32:17.2643954Z [----------] 8 tests from IRVerifier
2023-01-11T23:32:17.2644235Z [ RUN ] IRVerifier.BitwiseOps
2023-01-11T23:32:17.2644538Z [ OK ] IRVerifier.BitwiseOps (0 ms)
2023-01-11T23:32:17.2645040Z [ RUN ] IRVerifier.CompareSelect
2023-01-11T23:32:17.2645347Z [ OK ] IRVerifier.CompareSelect (0 ms)
2023-01-11T23:32:17.2645637Z [ RUN ] IRVerifier.Ramp
2023-01-11T23:32:17.2645919Z [ OK ] IRVerifier.Ramp (0 ms)
2023-01-11T23:32:17.2646188Z [ RUN ] IRVerifier.Load
2023-01-11T23:32:17.2646462Z [ OK ] IRVerifier.Load (0 ms)
2023-01-11T23:32:17.2646752Z [ RUN ] IRVerifier.IfThenElse
2023-01-11T23:32:17.2647054Z [ OK ] IRVerifier.IfThenElse (0 ms)
2023-01-11T23:32:17.2647321Z [ RUN ] IRVerifier.For
2023-01-11T23:32:17.2647593Z [ OK ] IRVerifier.For (0 ms)
2023-01-11T23:32:17.2647862Z [ RUN ] IRVerifier.Block
2023-01-11T23:32:17.2648136Z [ OK ] IRVerifier.Block (0 ms)
2023-01-11T23:32:17.2648411Z [ RUN ] IRVerifier.Store
2023-01-11T23:32:17.2648690Z [ OK ] IRVerifier.Store (0 ms)
2023-01-11T23:32:17.2648997Z [----------] 8 tests from IRVerifier (0 ms total)
2023-01-11T23:32:17.2649157Z
2023-01-11T23:32:17.2649300Z [----------] 39 tests from Kernel
2023-01-11T23:32:17.2649609Z [ RUN ] Kernel.ParallelExternalCallBuf
2023-01-11T23:32:17.3177736Z [ OK ] Kernel.ParallelExternalCallBuf (53 ms)
2023-01-11T23:32:17.3178655Z [ RUN ] Kernel.InliningIntermediates
2023-01-11T23:32:17.6404407Z [ OK ] Kernel.InliningIntermediates (322 ms)
2023-01-11T23:32:17.6404896Z [ RUN ] Kernel.PreAllocIntermediateBufs
2023-01-11T23:32:17.8270282Z [ OK ] Kernel.PreAllocIntermediateBufs (186 ms)
2023-01-11T23:32:17.8271039Z [ RUN ] Kernel._1
2023-01-11T23:32:17.8598481Z [ OK ] Kernel._1 (32 ms)
2023-01-11T23:32:17.8598864Z [ RUN ] Kernel._2
2023-01-11T23:32:17.8950985Z [ OK ] Kernel._2 (35 ms)
2023-01-11T23:32:17.8951870Z [ RUN ] Kernel._3
2023-01-11T23:32:17.9309167Z [ OK ] Kernel._3 (35 ms)
2023-01-11T23:32:17.9309713Z [ RUN ] Kernel.Huge
2023-01-11T23:32:17.9569482Z [ OK ] Kernel.Huge (26 ms)
2023-01-11T23:32:17.9570150Z [ RUN ] Kernel.ParallelStrided
2023-01-11T23:32:18.0462879Z [ OK ] Kernel.ParallelStrided (89 ms)
2023-01-11T23:32:18.0463557Z [ RUN ] Kernel.CatInputTypesPromotion
2023-01-11T23:32:18.1690513Z [ OK ] Kernel.CatInputTypesPromotion (122 ms)
2023-01-11T23:32:18.1691137Z [ RUN ] Kernel.ToDType
2023-01-11T23:32:18.1984237Z [ OK ] Kernel.ToDType (29 ms)
2023-01-11T23:32:18.1984949Z [ RUN ] Kernel.CatAndInlineWithAConstantDim
2023-01-11T23:32:18.2286505Z [ OK ] Kernel.CatAndInlineWithAConstantDim (30 ms)
2023-01-11T23:32:18.2287176Z [ RUN ] Kernel.CatWithEmptyInputs
2023-01-11T23:32:18.3088681Z [ OK ] Kernel.CatWithEmptyInputs (80 ms)
2023-01-11T23:32:18.3089403Z [ RUN ] Kernel.CatWoConditionals
2023-01-11T23:32:18.4017842Z [ OK ] Kernel.CatWoConditionals (92 ms)
2023-01-11T23:32:18.4018559Z [ RUN ] Kernel.OptimizeConditionals
2023-01-11T23:32:18.5149674Z [ OK ] Kernel.OptimizeConditionals (113 ms)
2023-01-11T23:32:18.5150650Z [ RUN ] Kernel.SumAllAxes
2023-01-11T23:32:18.5685393Z [ OK ] Kernel.SumAllAxes (53 ms)
2023-01-11T23:32:18.5685723Z [ RUN ] Kernel.SumOneAxis
2023-01-11T23:32:19.0117392Z [ OK ] Kernel.SumOneAxis (442 ms)
2023-01-11T23:32:19.0117721Z [ RUN ] Kernel.SumMultipleAxes
2023-01-11T23:32:19.5131765Z [ OK ] Kernel.SumMultipleAxes (501 ms)
2023-01-11T23:32:19.5132588Z [ RUN ] Kernel.Softmax2D
2023-01-11T23:32:20.0512323Z [ OK ] Kernel.Softmax2D (537 ms)
2023-01-11T23:32:20.0512878Z [ RUN ] Kernel.Softmax3D
2023-01-11T23:32:21.1867283Z [ OK ] Kernel.Softmax3D (1135 ms)
2023-01-11T23:32:21.1867905Z [ RUN ] Kernel.Softmax4D
2023-01-11T23:32:22.5412910Z [ OK ] Kernel.Softmax4D (1354 ms)
2023-01-11T23:32:22.5413503Z [ RUN ] Kernel.SignTest
2023-01-11T23:32:22.6390544Z [ OK ] Kernel.SignTest (97 ms)
2023-01-11T23:32:22.6391310Z [ RUN ] Kernel.InlineProducerIntoReduction
2023-01-11T23:32:22.6707598Z [ OK ] Kernel.InlineProducerIntoReduction (31 ms)
2023-01-11T23:32:22.6708350Z [ RUN ] Kernel.InlineReductionIntoConsumer
2023-01-11T23:32:22.7113341Z [ OK ] Kernel.InlineReductionIntoConsumer (40 ms)
2023-01-11T23:32:22.7114034Z [ RUN ] Kernel.SanitizeNames_CUDA
2023-01-11T23:32:22.9483238Z [ OK ] Kernel.SanitizeNames_CUDA (237 ms)
2023-01-11T23:32:22.9483584Z [ RUN ] Kernel.SanitizeConstants_CUDA
2023-01-11T23:32:23.1844808Z [ OK ] Kernel.SanitizeConstants_CUDA (236 ms)
2023-01-11T23:32:23.1845156Z [ RUN ] Kernel.ConstantTensors
2023-01-11T23:32:23.2341748Z [ OK ] Kernel.ConstantTensors (49 ms)
2023-01-11T23:32:23.2342460Z [ RUN ] Kernel.ConstantTensorsNonContiguous
2023-01-11T23:32:23.2825797Z [ OK ] Kernel.ConstantTensorsNonContiguous (48 ms)
2023-01-11T23:32:23.2826477Z [ RUN ] Kernel.RunFast
2023-01-11T23:32:23.3179796Z [ OK ] Kernel.RunFast (35 ms)
2023-01-11T23:32:23.3180448Z [ RUN ] Kernel.RunWithAllocatedOutputs
2023-01-11T23:32:23.3529808Z [ OK ] Kernel.RunWithAllocatedOutputs (35 ms)
2023-01-11T23:32:23.3530197Z [ RUN ] Kernel.CodegenInspection
2023-01-11T23:32:23.4015453Z [ OK ] Kernel.CodegenInspection (48 ms)
2023-01-11T23:32:23.4016138Z [ RUN ] Kernel.CustomLowering
2023-01-11T23:32:23.4267129Z [ OK ] Kernel.CustomLowering (25 ms)
2023-01-11T23:32:23.4267763Z [ RUN ] Kernel.Vectorize
2023-01-11T23:32:23.4572623Z [ OK ] Kernel.Vectorize (30 ms)
2023-01-11T23:32:23.4573257Z [ RUN ] Kernel.Strided1dWithinBounds
2023-01-11T23:32:23.4807352Z [ OK ] Kernel.Strided1dWithinBounds (23 ms)
2023-01-11T23:32:23.4808002Z [ RUN ] Kernel.InputAsOutput
2023-01-11T23:32:23.5188083Z [ OK ] Kernel.InputAsOutput (37 ms)
2023-01-11T23:32:23.5189004Z [ RUN ] Kernel.ScalarOut
2023-01-11T23:32:23.5397841Z [ OK ] Kernel.ScalarOut (21 ms)
2023-01-11T23:32:23.5398682Z [ RUN ] Kernel.ScalarTensorOut
2023-01-11T23:32:23.5676069Z [ OK ] Kernel.ScalarTensorOut (27 ms)
2023-01-11T23:32:23.5677155Z [ RUN ] Kernel.FuseLoopsWithVariableBounds
2023-01-11T23:32:24.1421650Z [ OK ] Kernel.FuseLoopsWithVariableBounds (574 ms)
2023-01-11T23:32:24.1422441Z [ RUN ] Kernel.FuseLoopsWithVariableConcatDim
2023-01-11T23:32:24.4207439Z [ OK ] Kernel.FuseLoopsWithVariableConcatDim (278 ms)
2023-01-11T23:32:24.4208332Z [ RUN ] Kernel.DoNotFuseLoopsWithMismatchingVariableDims
2023-01-11T23:32:24.6066791Z [ OK ] Kernel.DoNotFuseLoopsWithMismatchingVariableDims (185 ms)
2023-01-11T23:32:24.6067476Z [----------] 39 tests from Kernel (7341 ms total)
2023-01-11T23:32:24.6067641Z
2023-01-11T23:32:24.6067786Z [----------] 174 tests from LoopNest
2023-01-11T23:32:24.6068156Z [ RUN ] LoopNest.ExprSimple01
2023-01-11T23:32:24.6068470Z [ OK ] LoopNest.ExprSimple01 (0 ms)
2023-01-11T23:32:24.6068760Z [ RUN ] LoopNest.ExprLower01
2023-01-11T23:32:24.6069129Z [ OK ] LoopNest.ExprLower01 (0 ms)
2023-01-11T23:32:24.6069426Z [ RUN ] LoopNest.ExprSimple02
2023-01-11T23:32:24.6089154Z [ OK ] LoopNest.ExprSimple02 (2 ms)
2023-01-11T23:32:24.6090098Z [ RUN ] LoopNest.ExprSliceHeadWithLoopOptions
2023-01-11T23:32:24.6091561Z [ OK ] LoopNest.ExprSliceHeadWithLoopOptions (0 ms)
2023-01-11T23:32:24.6092529Z [ RUN ] LoopNest.ExprSliceTailWithLoopOptions
2023-01-11T23:32:24.6093498Z [ OK ] LoopNest.ExprSliceTailWithLoopOptions (0 ms)
2023-01-11T23:32:24.6095134Z [ RUN ] LoopNest.ExprSliceHeadWhenFactorEqualsSize
2023-01-11T23:32:24.6096290Z [ OK ] LoopNest.ExprSliceHeadWhenFactorEqualsSize (0 ms)
2023-01-11T23:32:24.6097088Z [ RUN ] LoopNest.ExprSliceHeadWhenFactorLargerThanSize
2023-01-11T23:32:24.6097649Z [ OK ] LoopNest.ExprSliceHeadWhenFactorLargerThanSize (0 ms)
2023-01-11T23:32:24.6098030Z [ RUN ] LoopNest.ExprSliceHead
2023-01-11T23:32:24.6098452Z [ OK ] LoopNest.ExprSliceHead (0 ms)
2023-01-11T23:32:24.6098816Z [ RUN ] LoopNest.ExprSliceHeadWithNonZeroStart
2023-01-11T23:32:24.6099209Z [ OK ] LoopNest.ExprSliceHeadWithNonZeroStart (0 ms)
2023-01-11T23:32:24.6099606Z [ RUN ] LoopNest.ExprSliceTailWhenFactorEqualsSize
2023-01-11T23:32:24.6100004Z [ OK ] LoopNest.ExprSliceTailWhenFactorEqualsSize (0 ms)
2023-01-11T23:32:24.6100409Z [ RUN ] LoopNest.ExprSliceTailWhenFactorLargerThanSize
2023-01-11T23:32:24.6100835Z [ OK ] LoopNest.ExprSliceTailWhenFactorLargerThanSize (0 ms)
2023-01-11T23:32:24.6101187Z [ RUN ] LoopNest.ExprSliceTail
2023-01-11T23:32:24.6101481Z [ OK ] LoopNest.ExprSliceTail (0 ms)
2023-01-11T23:32:24.6101788Z [ RUN ] LoopNest.ExprSplitAndSlice
2023-01-11T23:32:24.6105518Z [ OK ] LoopNest.ExprSplitAndSlice (0 ms)
2023-01-11T23:32:24.6105933Z [ RUN ] LoopNest.ExprSliceAndNormalize
2023-01-11T23:32:24.6106572Z [ OK ] LoopNest.ExprSliceAndNormalize (0 ms)
2023-01-11T23:32:24.6106946Z [ RUN ] LoopNest.ExprSliceWithVariableDimension
2023-01-11T23:32:24.6123265Z [ OK ] LoopNest.ExprSliceWithVariableDimension (1 ms)
2023-01-11T23:32:24.6123648Z [ RUN ] LoopNest.ExprSplitWithTail
2023-01-11T23:32:24.6129794Z [ OK ] LoopNest.ExprSplitWithTail (0 ms)
2023-01-11T23:32:24.6130173Z [ RUN ] LoopNest.ExprSplitWithTailNone
2023-01-11T23:32:24.6143557Z [ OK ] LoopNest.ExprSplitWithTailNone (1 ms)
2023-01-11T23:32:24.6143964Z [ RUN ] LoopNest.ExprSplitWithMask01
2023-01-11T23:32:24.6172268Z [ OK ] LoopNest.ExprSplitWithMask01 (2 ms)
2023-01-11T23:32:24.6172712Z [ RUN ] LoopNest.ExprSplitWithMaskRepeatedNoMask
2023-01-11T23:32:24.6177245Z [ OK ] LoopNest.ExprSplitWithMaskRepeatedNoMask (0 ms)
2023-01-11T23:32:24.6177684Z [ RUN ] LoopNest.getLoopAt
2023-01-11T23:32:24.6178005Z [ OK ] LoopNest.getLoopAt (0 ms)
2023-01-11T23:32:24.6178317Z [ RUN ] LoopNest.TileSimple
2023-01-11T23:32:24.7113624Z [ OK ] LoopNest.TileSimple (93 ms)
2023-01-11T23:32:24.7114326Z [ RUN ] LoopNest.TileWithTails
2023-01-11T23:32:24.8047256Z [ OK ] LoopNest.TileWithTails (93 ms)
2023-01-11T23:32:24.8047587Z [ RUN ] LoopNest.TileInMiddle
2023-01-11T23:32:24.9545698Z [ OK ] LoopNest.TileInMiddle (149 ms)
2023-01-11T23:32:24.9546384Z [ RUN ] LoopNest.SplitWithTailWithLoopOptions
2023-01-11T23:32:24.9547081Z [ OK ] LoopNest.SplitWithTailWithLoopOptions (0 ms)
2023-01-11T23:32:24.9547691Z [ RUN ] LoopNest.SplitWithMaskWithLoopOptions
2023-01-11T23:32:24.9548068Z [ OK ] LoopNest.SplitWithMaskWithLoopOptions (0 ms)
2023-01-11T23:32:24.9548433Z [ RUN ] LoopNest.ScheduleBroadcastAddBuffer
2023-01-11T23:32:24.9563578Z [ OK ] LoopNest.ScheduleBroadcastAddBuffer (1 ms)
2023-01-11T23:32:24.9563971Z [ RUN ] LoopNest.ScheduleFunctionCall01
2023-01-11T23:32:24.9642705Z [ OK ] LoopNest.ScheduleFunctionCall01 (7 ms)
2023-01-11T23:32:24.9643084Z [ RUN ] LoopNest.ScheduleInlineSimple
2023-01-11T23:32:24.9768857Z [ OK ] LoopNest.ScheduleInlineSimple (12 ms)
2023-01-11T23:32:24.9769591Z [ RUN ] LoopNest.ScheduleInlineFunc01
2023-01-11T23:32:25.0353322Z [ OK ] LoopNest.ScheduleInlineFunc01 (58 ms)
2023-01-11T23:32:25.0353818Z [ RUN ] LoopNest.ScheduleInlineRandom
2023-01-11T23:32:25.0354643Z [ OK ] LoopNest.ScheduleInlineRandom (0 ms)
2023-01-11T23:32:25.0355017Z [ RUN ] LoopNest.ScheduleInlineRandomUnrelated
2023-01-11T23:32:25.0359767Z [ OK ] LoopNest.ScheduleInlineRandomUnrelated (0 ms)
2023-01-11T23:32:25.0360184Z [ RUN ] LoopNest.ScheduleInlineRandomLowerDimensions
2023-01-11T23:32:25.0362126Z [ OK ] LoopNest.ScheduleInlineRandomLowerDimensions (0 ms)
2023-01-11T23:32:25.0362530Z [ RUN ] LoopNest.ScheduleInlineIntrinsics
2023-01-11T23:32:25.0455189Z [ OK ] LoopNest.ScheduleInlineIntrinsics (9 ms)
2023-01-11T23:32:25.0455658Z [ RUN ] LoopNest.ScheduleInlineRandWithIntrinsics
2023-01-11T23:32:25.0457984Z [ OK ] LoopNest.ScheduleInlineRandWithIntrinsics (0 ms)
2023-01-11T23:32:25.0458381Z [ RUN ] LoopNest.ScheduleSplitAThenInline
2023-01-11T23:32:25.0458745Z [ OK ] LoopNest.ScheduleSplitAThenInline (0 ms)
2023-01-11T23:32:25.0459090Z [ RUN ] LoopNest.ScheduleSplitBThenInline
2023-01-11T23:32:25.0462959Z [ OK ] LoopNest.ScheduleSplitBThenInline (0 ms)
2023-01-11T23:32:25.0464572Z [ RUN ] LoopNest.ScheduleSplitTwiceThenInline
2023-01-11T23:32:25.0465400Z [ OK ] LoopNest.ScheduleSplitTwiceThenInline (0 ms)
2023-01-11T23:32:25.0466123Z [ RUN ] LoopNest.ScheduleInlineThenSplit
2023-01-11T23:32:25.0467838Z [ OK ] LoopNest.ScheduleInlineThenSplit (0 ms)
2023-01-11T23:32:25.0468213Z [ RUN ] LoopNest.ScheduleSplitInlineThenSplit
2023-01-11T23:32:25.0476031Z [ OK ] LoopNest.ScheduleSplitInlineThenSplit (0 ms)
2023-01-11T23:32:25.0476439Z [ RUN ] LoopNest.ScheduleSplitInlineSimplify
2023-01-11T23:32:25.0477222Z [ OK ] LoopNest.ScheduleSplitInlineSimplify (0 ms)
2023-01-11T23:32:25.0477616Z [ RUN ] LoopNest.ScheduleInlineThreeMixedOnce
2023-01-11T23:32:25.0481970Z [ OK ] LoopNest.ScheduleInlineThreeMixedOnce (0 ms)
2023-01-11T23:32:25.0482354Z [ RUN ] LoopNest.ScheduleInlineThreeMixedTwice
2023-01-11T23:32:25.0485990Z [ OK ] LoopNest.ScheduleInlineThreeMixedTwice (0 ms)
2023-01-11T23:32:25.0486379Z [ RUN ] LoopNest.ScheduleInlineThreeMixedInner
2023-01-11T23:32:25.0490805Z [ OK ] LoopNest.ScheduleInlineThreeMixedInner (0 ms)
2023-01-11T23:32:25.0491199Z [ RUN ] LoopNest.ScheduleInlineThreeMixedSplit
2023-01-11T23:32:25.0492789Z [ OK ] LoopNest.ScheduleInlineThreeMixedSplit (0 ms)
2023-01-11T23:32:25.0493251Z [ RUN ] LoopNest.ScheduleInlineOutputTensors
2023-01-11T23:32:25.0497176Z [ OK ] LoopNest.ScheduleInlineOutputTensors (0 ms)
2023-01-11T23:32:25.0497745Z [ RUN ] LoopNest.ScheduleInlineWithCompoundIndices
2023-01-11T23:32:25.0498287Z [ OK ] LoopNest.ScheduleInlineWithCompoundIndices (0 ms)
2023-01-11T23:32:25.0498712Z [ RUN ] LoopNest.ScheduleInlineConsumerIndicesWithCast
2023-01-11T23:32:25.0499245Z [ OK ] LoopNest.ScheduleInlineConsumerIndicesWithCast (0 ms)
2023-01-11T23:32:25.0499823Z [ RUN ] LoopNest.ScheduleInlineProducerIndicesWithCast
2023-01-11T23:32:25.0500261Z [ OK ] LoopNest.ScheduleInlineProducerIndicesWithCast (0 ms)
2023-01-11T23:32:25.0500634Z [ RUN ] LoopNest.ScheduleFuserStyle
2023-01-11T23:32:25.0552504Z [ OK ] LoopNest.ScheduleFuserStyle (5 ms)
2023-01-11T23:32:25.0553472Z [ RUN ] LoopNest.ScheduleFuserThreeArg
2023-01-11T23:32:25.0612540Z [ OK ] LoopNest.ScheduleFuserThreeArg (6 ms)
2023-01-11T23:32:25.0613267Z [ RUN ] LoopNest.ScheduleDynamicShape2D
2023-01-11T23:32:25.0718766Z [ OK ] LoopNest.ScheduleDynamicShape2D (10 ms)
2023-01-11T23:32:25.0719131Z [ RUN ] LoopNest.LoopNestComputeAt_1
2023-01-11T23:32:25.0726459Z [ OK ] LoopNest.LoopNestComputeAt_1 (0 ms)
2023-01-11T23:32:25.0726823Z [ RUN ] LoopNest.LoopNestComputeAt_2
2023-01-11T23:32:25.1245549Z [ OK ] LoopNest.LoopNestComputeAt_2 (51 ms)
2023-01-11T23:32:25.1245919Z [ RUN ] LoopNest.LoopNestComputeAt_3
2023-01-11T23:32:25.1759490Z [ OK ] LoopNest.LoopNestComputeAt_3 (51 ms)
2023-01-11T23:32:25.1759843Z [ RUN ] LoopNest.Reduce2dComputeAt
2023-01-11T23:32:25.2620343Z [ OK ] LoopNest.Reduce2dComputeAt (85 ms)
2023-01-11T23:32:25.2621144Z [ RUN ] LoopNest.LoopNestReorderAxis1
2023-01-11T23:32:25.2622013Z [ OK ] LoopNest.LoopNestReorderAxis1 (0 ms)
2023-01-11T23:32:25.2622709Z [ RUN ] LoopNest.LoopNestReorderPartialAxes
2023-01-11T23:32:25.2630358Z [ OK ] LoopNest.LoopNestReorderPartialAxes (0 ms)
2023-01-11T23:32:25.2631680Z [ RUN ] LoopNest.LoopNestReorderInternalAxis
2023-01-11T23:32:25.2638172Z [ OK ] LoopNest.LoopNestReorderInternalAxis (0 ms)
2023-01-11T23:32:25.2638560Z [ RUN ] LoopNest.LoopNestReorderEnclosingAxis
2023-01-11T23:32:25.2647257Z [ OK ] LoopNest.LoopNestReorderEnclosingAxis (0 ms)
2023-01-11T23:32:25.2647771Z [ RUN ] LoopNest.LoopNestReorderSameAxis
2023-01-11T23:32:25.2648142Z [ OK ] LoopNest.LoopNestReorderSameAxis (0 ms)
2023-01-11T23:32:25.2648507Z [ RUN ] LoopNest.LoopNestReorderExtraStatements
2023-01-11T23:32:25.2662594Z [ OK ] LoopNest.LoopNestReorderExtraStatements (1 ms)
2023-01-11T23:32:25.2663032Z [ RUN ] LoopNest.LoopNestReorderLongStringOfPreOrphans
2023-01-11T23:32:25.3227922Z [ OK ] LoopNest.LoopNestReorderLongStringOfPreOrphans (56 ms)
2023-01-11T23:32:25.3228544Z [ RUN ] LoopNest.LoopNestReorderLongStringOfPostOrphans
2023-01-11T23:32:25.3788065Z [ OK ] LoopNest.LoopNestReorderLongStringOfPostOrphans (56 ms)
2023-01-11T23:32:25.3789279Z [ RUN ] LoopNest.LoopNestReorderLongStringFull
2023-01-11T23:32:25.4496353Z [ OK ] LoopNest.LoopNestReorderLongStringFull (70 ms)
2023-01-11T23:32:25.4497163Z [ RUN ] LoopNest.LoopNestReorderInternalLoopNest
2023-01-11T23:32:25.4617441Z [ OK ] LoopNest.LoopNestReorderInternalLoopNest (12 ms)
2023-01-11T23:32:25.4618204Z [ RUN ] LoopNest.OuterLoopVectorization
2023-01-11T23:32:25.4619096Z [ OK ] LoopNest.OuterLoopVectorization (0 ms)
2023-01-11T23:32:25.4619812Z [ RUN ] LoopNest.VectorizeLoopNotNormalized
2023-01-11T23:32:25.4620587Z [ OK ] LoopNest.VectorizeLoopNotNormalized (0 ms)
2023-01-11T23:32:25.4621368Z [ RUN ] LoopNest.Unroll
2023-01-11T23:32:25.4621932Z [ OK ] LoopNest.Unroll (0 ms)
2023-01-11T23:32:25.4622490Z [ RUN ] LoopNest.UnrollOuter
2023-01-11T23:32:25.4623066Z [ OK ] LoopNest.UnrollOuter (0 ms)
2023-01-11T23:32:25.4623641Z [ RUN ] LoopNest.UnrollInner
2023-01-11T23:32:25.4624222Z [ OK ] LoopNest.UnrollInner (0 ms)
2023-01-11T23:32:25.4624855Z [ RUN ] LoopNest.UnrollMultipleStatements
2023-01-11T23:32:25.4625577Z [ OK ] LoopNest.UnrollMultipleStatements (0 ms)
2023-01-11T23:32:25.4626309Z [ RUN ] LoopNest.UnrollNonLiteralConstantBounds
2023-01-11T23:32:25.4627322Z [ OK ] LoopNest.UnrollNonLiteralConstantBounds (0 ms)
2023-01-11T23:32:25.4627767Z [ RUN ] LoopNest.UnrollNonConstantBounds
2023-01-11T23:32:25.4638896Z [ OK ] LoopNest.UnrollNonConstantBounds (1 ms)
2023-01-11T23:32:25.4639276Z [ RUN ] LoopNest.UnrollByFactorsLessThan2
2023-01-11T23:32:25.4639647Z [ OK ] LoopNest.UnrollByFactorsLessThan2 (0 ms)
2023-01-11T23:32:25.4639997Z [ RUN ] LoopNest.UnrollByFactorEqualToIters
2023-01-11T23:32:25.4640768Z [ OK ] LoopNest.UnrollByFactorEqualToIters (0 ms)
2023-01-11T23:32:25.4641096Z [ RUN ] LoopNest.UnrollEmpty
2023-01-11T23:32:25.4641392Z [ OK ] LoopNest.UnrollEmpty (0 ms)
2023-01-11T23:32:25.4641731Z [ RUN ] LoopNest.NoUnroll
2023-01-11T23:32:25.4642077Z [ OK ] LoopNest.NoUnroll (0 ms)
2023-01-11T23:32:25.4642369Z [ RUN ] LoopNest.UnrollWithLet
2023-01-11T23:32:25.4643550Z [ OK ] LoopNest.UnrollWithLet (0 ms)
2023-01-11T23:32:25.4643891Z [ RUN ] LoopNest.IsNormalized
2023-01-11T23:32:25.4644268Z [ OK ] LoopNest.IsNormalized (0 ms)
2023-01-11T23:32:25.4644589Z [ RUN ] LoopNest.NormalizeStartPositive
2023-01-11T23:32:25.4645451Z [ OK ] LoopNest.NormalizeStartPositive (0 ms)
2023-01-11T23:32:25.4645797Z [ RUN ] LoopNest.NormalizeStartNegative
2023-01-11T23:32:25.4648152Z [ OK ] LoopNest.NormalizeStartNegative (0 ms)
2023-01-11T23:32:25.4648807Z [ RUN ] LoopNest.NormalizeStartZero
2023-01-11T23:32:25.4649217Z [ OK ] LoopNest.NormalizeStartZero (0 ms)
2023-01-11T23:32:25.4649546Z [ RUN ] LoopNest.NormalizeStartVariable
2023-01-11T23:32:25.4651957Z [ OK ] LoopNest.NormalizeStartVariable (0 ms)
2023-01-11T23:32:25.4652342Z [ RUN ] LoopNest.NormalizeOnNestedOuterLoop
2023-01-11T23:32:25.4653131Z [ OK ] LoopNest.NormalizeOnNestedOuterLoop (0 ms)
2023-01-11T23:32:25.4653572Z [ RUN ] LoopNest.NormalizeOnNestedInnerLoop
2023-01-11T23:32:25.4656100Z [ OK ] LoopNest.NormalizeOnNestedInnerLoop (0 ms)
2023-01-11T23:32:25.4656491Z [ RUN ] LoopNest.NormalizeAndSplitWithTail
2023-01-11T23:32:25.4660096Z [ OK ] LoopNest.NormalizeAndSplitWithTail (0 ms)
2023-01-11T23:32:25.4660496Z [ RUN ] LoopNest.NotNormalizeAndSplitWithTail
2023-01-11T23:32:25.4664508Z [ OK ] LoopNest.NotNormalizeAndSplitWithTail (0 ms)
2023-01-11T23:32:25.4664976Z [ RUN ] LoopNest.FlattenSimpleLoopNest2D
2023-01-11T23:32:25.4674233Z [ OK ] LoopNest.FlattenSimpleLoopNest2D (0 ms)
2023-01-11T23:32:25.4674611Z [ RUN ] LoopNest.FlattenSimpleLoopNest3D
2023-01-11T23:32:25.4764054Z [ OK ] LoopNest.FlattenSimpleLoopNest3D (8 ms)
2023-01-11T23:32:25.4764525Z [ RUN ] LoopNest.FlattenLoopNestAfterNormalize
2023-01-11T23:32:25.4794026Z [ OK ] LoopNest.FlattenLoopNestAfterNormalize (2 ms)
2023-01-11T23:32:25.4795120Z [ RUN ] LoopNest.FlattenLoopNestWithNonLiteralConstantBounds
2023-01-11T23:32:25.4801594Z [ OK ] LoopNest.FlattenLoopNestWithNonLiteralConstantBounds (1 ms)
2023-01-11T23:32:25.4802207Z [ RUN ] LoopNest.FlattenImperfectLoopNest
2023-01-11T23:32:25.4802688Z [ OK ] LoopNest.FlattenImperfectLoopNest (0 ms)
2023-01-11T23:32:25.4803138Z [ RUN ] LoopNest.FlattenReductionLoopNest
2023-01-11T23:32:25.4803497Z [ OK ] LoopNest.FlattenReductionLoopNest (0 ms)
2023-01-11T23:32:25.4803881Z [ RUN ] LoopNest.FlattenReductionLoopNestFromTensor
2023-01-11T23:32:25.4804294Z [ OK ] LoopNest.FlattenReductionLoopNestFromTensor (0 ms)
2023-01-11T23:32:25.4804685Z [ RUN ] LoopNest.FlattenIncorrectLoopsAsInput
2023-01-11T23:32:25.4805167Z [ OK ] LoopNest.FlattenIncorrectLoopsAsInput (0 ms)
2023-01-11T23:32:25.4805523Z [ RUN ] LoopNest.DetectInlineRankMismatch
2023-01-11T23:32:25.4805884Z [ OK ] LoopNest.DetectInlineRankMismatch (0 ms)
2023-01-11T23:32:25.4806217Z [ RUN ] LoopNest.CacheReadsSimple
2023-01-11T23:32:25.5184137Z [ OK ] LoopNest.CacheReadsSimple (37 ms)
2023-01-11T23:32:25.5185051Z [ RUN ] LoopNest.CacheReadsOuter
2023-01-11T23:32:25.5589365Z [ OK ] LoopNest.CacheReadsOuter (40 ms)
2023-01-11T23:32:25.5590412Z [ RUN ] LoopNest.CacheReadsInternal
2023-01-11T23:32:25.6025976Z [ OK ] LoopNest.CacheReadsInternal (43 ms)
2023-01-11T23:32:25.6026909Z [ RUN ] LoopNest.CacheReadsInner
2023-01-11T23:32:25.6711426Z [ OK ] LoopNest.CacheReadsInner (68 ms)
2023-01-11T23:32:25.6712715Z [ RUN ] LoopNest.CacheWritesSimple
2023-01-11T23:32:25.7463915Z [ OK ] LoopNest.CacheWritesSimple (75 ms)
2023-01-11T23:32:25.7464855Z [ RUN ] LoopNest.DeadStoreElimination
2023-01-11T23:32:25.7474994Z [ OK ] LoopNest.DeadStoreElimination (1 ms)
2023-01-11T23:32:25.7475549Z [ RUN ] LoopNest.DeadStoreEliminationWithIntermediates
2023-01-11T23:32:25.7485878Z [ OK ] LoopNest.DeadStoreEliminationWithIntermediates (1 ms)
2023-01-11T23:32:25.7486415Z [ RUN ] LoopNest.CompoundTensorSimple
2023-01-11T23:32:25.7499377Z [ OK ] LoopNest.CompoundTensorSimple (1 ms)
2023-01-11T23:32:25.7499835Z [ RUN ] LoopNest.InlineConstantIndex
2023-01-11T23:32:25.7502106Z [ OK ] LoopNest.InlineConstantIndex (0 ms)
2023-01-11T23:32:25.7502587Z [ RUN ] LoopNest.CompoundTensorUsed
2023-01-11T23:32:25.7524406Z [ OK ] LoopNest.CompoundTensorUsed (2 ms)
2023-01-11T23:32:25.7524873Z [ RUN ] LoopNest.InlineFromLoad
2023-01-11T23:32:25.7525307Z [ OK ] LoopNest.InlineFromLoad (0 ms)
2023-01-11T23:32:25.7525692Z [ RUN ] LoopNest.OptimizeConditionalsSimple
2023-01-11T23:32:25.7526445Z [ OK ] LoopNest.OptimizeConditionalsSimple (0 ms)
2023-01-11T23:32:25.7526974Z [ RUN ] LoopNest.OptimizeConditionalsNestedConditions
2023-01-11T23:32:25.7528045Z [ OK ] LoopNest.OptimizeConditionalsNestedConditions (0 ms)
2023-01-11T23:32:25.7528540Z [ RUN ] LoopNest.OptimizeConditionalsMultipleStores
2023-01-11T23:32:25.7530874Z [ OK ] LoopNest.OptimizeConditionalsMultipleStores (0 ms)
2023-01-11T23:32:25.7531481Z [ RUN ] LoopNest.OptimizeConditionalsMultipleStoresInOneLoop
2023-01-11T23:32:25.7535569Z [ OK ] LoopNest.OptimizeConditionalsMultipleStoresInOneLoop (0 ms)
2023-01-11T23:32:25.7536164Z [ RUN ] LoopNest.OptimizeConditionalsOuterLoopVar
2023-01-11T23:32:25.7538492Z [ OK ] LoopNest.OptimizeConditionalsOuterLoopVar (0 ms)
2023-01-11T23:32:25.7539101Z [ RUN ] LoopNest.OptimizeConditionalsCompValuesNotOrdered
2023-01-11T23:32:25.7541707Z [ OK ] LoopNest.OptimizeConditionalsCompValuesNotOrdered (0 ms)
2023-01-11T23:32:25.7542327Z [ RUN ] LoopNest.OptimizeConditionalsCompValuesNotConstants
2023-01-11T23:32:25.7544723Z [ OK ] LoopNest.OptimizeConditionalsCompValuesNotConstants (0 ms)
2023-01-11T23:32:25.7545336Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition
2023-01-11T23:32:25.7547820Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition (0 ms)
2023-01-11T23:32:25.7548415Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition2
2023-01-11T23:32:25.7551656Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition2 (0 ms)
2023-01-11T23:32:25.7552303Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition3
2023-01-11T23:32:25.7552877Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition3 (0 ms)
2023-01-11T23:32:25.7553437Z [ RUN ] LoopNest.OptimizeConditionalsInvalidCondition4
2023-01-11T23:32:25.7555995Z [ OK ] LoopNest.OptimizeConditionalsInvalidCondition4 (0 ms)
2023-01-11T23:32:25.7556587Z [ RUN ] LoopNest.OptimizeConditionalsNotNormalized
2023-01-11T23:32:25.7557099Z [ OK ] LoopNest.OptimizeConditionalsNotNormalized (0 ms)
2023-01-11T23:32:25.7557496Z [ RUN ] LoopNest.ColReduceSplitTailEvenReorder
2023-01-11T23:32:25.9127696Z [ OK ] LoopNest.ColReduceSplitTailEvenReorder (157 ms)
2023-01-11T23:32:25.9128211Z [ RUN ] LoopNest.ColReduceSplitTailUnevenReorder
2023-01-11T23:32:26.0367991Z [ OK ] LoopNest.ColReduceSplitTailUnevenReorder (124 ms)
2023-01-11T23:32:26.0368946Z [ RUN ] LoopNest.ColReduceSplitMaskEvenReorder
2023-01-11T23:32:26.1928787Z [ OK ] LoopNest.ColReduceSplitMaskEvenReorder (155 ms)
2023-01-11T23:32:26.1929575Z [ RUN ] LoopNest.ColReduceSplitMaskUnevenReorder
2023-01-11T23:32:26.3335632Z [ OK ] LoopNest.ColReduceSplitMaskUnevenReorder (140 ms)
2023-01-11T23:32:26.3336589Z [ RUN ] LoopNest.ReorderAxisWithMultipleConds
2023-01-11T23:32:26.3337319Z [ OK ] LoopNest.ReorderAxisWithMultipleConds (0 ms)
2023-01-11T23:32:26.3337935Z [ RUN ] LoopNest.VectorizeUse
2023-01-11T23:32:26.3341019Z [ OK ] LoopNest.VectorizeUse (0 ms)
2023-01-11T23:32:26.3341674Z [ RUN ] LoopNest.Int64Direct
2023-01-11T23:32:26.3342317Z [ OK ] LoopNest.Int64Direct (0 ms)
2023-01-11T23:32:26.3343107Z [ RUN ] LoopNest.Int64Compute
2023-01-11T23:32:26.3343903Z [ OK ] LoopNest.Int64Compute (0 ms)
2023-01-11T23:32:26.3344713Z [ RUN ] LoopNest.DistributeLoopWithAllStmtsAsPivots
2023-01-11T23:32:26.3345536Z [ OK ] LoopNest.DistributeLoopWithAllStmtsAsPivots (0 ms)
2023-01-11T23:32:26.3346349Z [ RUN ] LoopNest.DistributeLoopWithOneStmtAsPivot
2023-01-11T23:32:26.3347137Z [ OK ] LoopNest.DistributeLoopWithOneStmtAsPivot (0 ms)
2023-01-11T23:32:26.3347766Z [ RUN ] LoopNest.DistributeLoopWithoutAnyPivot
2023-01-11T23:32:26.3348437Z [ OK ] LoopNest.DistributeLoopWithoutAnyPivot (0 ms)
2023-01-11T23:32:26.3348960Z [ RUN ] LoopNest.DistributeLoopOverInnerLoops
2023-01-11T23:32:26.3349381Z [ OK ] LoopNest.DistributeLoopOverInnerLoops (0 ms)
2023-01-11T23:32:26.3349790Z [ RUN ] LoopNest.DistributeLoopAndParentsWithoutAnyPivot
2023-01-11T23:32:26.3350320Z [ OK ] LoopNest.DistributeLoopAndParentsWithoutAnyPivot (0 ms)
2023-01-11T23:32:26.3350691Z [ RUN ] LoopNest.fuseLoopsSimple
2023-01-11T23:32:26.3351005Z [ OK ] LoopNest.fuseLoopsSimple (0 ms)
2023-01-11T23:32:26.3351313Z [ RUN ] LoopNest.fuseLoopsMultiple
2023-01-11T23:32:26.3351649Z [ OK ] LoopNest.fuseLoopsMultiple (0 ms)
2023-01-11T23:32:26.3351964Z [ RUN ] LoopNest.fuseLoopsNested
2023-01-11T23:32:26.3354995Z [ OK ] LoopNest.fuseLoopsNested (0 ms)
2023-01-11T23:32:26.3355455Z [ RUN ] LoopNest.fuseLoopsNested2D
2023-01-11T23:32:26.3357164Z [ OK ] LoopNest.fuseLoopsNested2D (0 ms)
2023-01-11T23:32:26.3357646Z [ RUN ] LoopNest.fuseLoopsNested2DInner
2023-01-11T23:32:26.3359877Z [ OK ] LoopNest.fuseLoopsNested2DInner (0 ms)
2023-01-11T23:32:26.3360466Z [ RUN ] LoopNest.fuseLoopsDifferentStopBounds
2023-01-11T23:32:26.3361007Z [ OK ] LoopNest.fuseLoopsDifferentStopBounds (0 ms)
2023-01-11T23:32:26.3361551Z [ RUN ] LoopNest.fuseLoopsDifferentStartBounds
2023-01-11T23:32:26.3362087Z [ OK ] LoopNest.fuseLoopsDifferentStartBounds (0 ms)
2023-01-11T23:32:26.3362687Z [ RUN ] LoopNest.fuseLoopsNotContiguous
2023-01-11T23:32:26.3363034Z [ OK ] LoopNest.fuseLoopsNotContiguous (0 ms)
2023-01-11T23:32:26.3363390Z [ RUN ] LoopNest.fuseLoopsWithDifferentParents
2023-01-11T23:32:26.3363777Z [ OK ] LoopNest.fuseLoopsWithDifferentParents (0 ms)
2023-01-11T23:32:26.3364155Z [ RUN ] LoopNest.fuseLoopsWithVariableBounds
2023-01-11T23:32:26.3365180Z [ OK ] LoopNest.fuseLoopsWithVariableBounds (0 ms)
2023-01-11T23:32:26.3365657Z [ RUN ] LoopNest.fuseLoopsWithExprBounds
2023-01-11T23:32:26.3369763Z [ OK ] LoopNest.fuseLoopsWithExprBounds (0 ms)
2023-01-11T23:32:26.3370292Z [ RUN ] LoopNest.fuseLoopsWithDifferentExprBounds
2023-01-11T23:32:26.3373605Z [ OK ] LoopNest.fuseLoopsWithDifferentExprBounds (0 ms)
2023-01-11T23:32:26.3374196Z [ RUN ] LoopNest.fuseLoopsWithNonOverlappingBufferAccesses
2023-01-11T23:32:26.3377747Z [ OK ] LoopNest.fuseLoopsWithNonOverlappingBufferAccesses (0 ms)
2023-01-11T23:32:26.3378395Z [ RUN ] LoopNest.fuseLoopsWithNonOverlapping2DBufferAccesses
2023-01-11T23:32:26.3385131Z [ OK ] LoopNest.fuseLoopsWithNonOverlapping2DBufferAccesses (0 ms)
2023-01-11T23:32:26.3385678Z [ RUN ] LoopNest.fuseLoopsWithReductions
2023-01-11T23:32:26.3389704Z [ OK ] LoopNest.fuseLoopsWithReductions (0 ms)
2023-01-11T23:32:26.3390265Z [ RUN ] LoopNest.fuseLoopsWith2DReductions
2023-01-11T23:32:26.3398465Z [ OK ] LoopNest.fuseLoopsWith2DReductions (0 ms)
2023-01-11T23:32:26.3398966Z [ RUN ] LoopNest.fuseLoopsWithComplexIndices
2023-01-11T23:32:26.3406039Z [ OK ] LoopNest.fuseLoopsWithComplexIndices (0 ms)
2023-01-11T23:32:26.3406576Z [ RUN ] LoopNest.fuseLoopsWithMixedLoopVarsAsIndices
2023-01-11T23:32:26.3416328Z [ OK ] LoopNest.fuseLoopsWithMixedLoopVarsAsIndices (0 ms)
2023-01-11T23:32:26.3416880Z [ RUN ] LoopNest.fuseLoopsWithTranspose
2023-01-11T23:32:26.3421527Z [ OK ] LoopNest.fuseLoopsWithTranspose (0 ms)
2023-01-11T23:32:26.3422056Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies1
2023-01-11T23:32:26.3425917Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies1 (0 ms)
2023-01-11T23:32:26.3426483Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies2
2023-01-11T23:32:26.3429882Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies2 (0 ms)
2023-01-11T23:32:26.3430555Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies3
2023-01-11T23:32:26.3437364Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies3 (0 ms)
2023-01-11T23:32:26.3437948Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies4
2023-01-11T23:32:26.3446659Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies4 (0 ms)
2023-01-11T23:32:26.3447230Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies5
2023-01-11T23:32:26.3451319Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies5 (0 ms)
2023-01-11T23:32:26.3451878Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies6
2023-01-11T23:32:26.3456097Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies6 (0 ms)
2023-01-11T23:32:26.3456649Z [ RUN ] LoopNest.fuseLoopsThatViolateDependencies7
2023-01-11T23:32:26.3460948Z [ OK ] LoopNest.fuseLoopsThatViolateDependencies7 (0 ms)
2023-01-11T23:32:26.3461477Z [ RUN ] LoopNest.areLoopsPerfectlyNested
2023-01-11T23:32:26.3461965Z [ OK ] LoopNest.areLoopsPerfectlyNested (0 ms)
2023-01-11T23:32:26.3462319Z [ RUN ] LoopNest.reorderNestedLoops2D
2023-01-11T23:32:26.3462784Z [ OK ] LoopNest.reorderNestedLoops2D (0 ms)
2023-01-11T23:32:26.3463232Z [ RUN ] LoopNest.reorderNestedLoops3D
2023-01-11T23:32:26.3463627Z [ OK ] LoopNest.reorderNestedLoops3D (0 ms)
2023-01-11T23:32:26.3464049Z [ RUN ] LoopNest.reorderNestedLoops4D
2023-01-11T23:32:26.3464383Z [ OK ] LoopNest.reorderNestedLoops4D (0 ms)
2023-01-11T23:32:26.3464735Z [ RUN ] LoopNest.reorderTrivialPermutation
2023-01-11T23:32:26.3465137Z [ OK ] LoopNest.reorderTrivialPermutation (0 ms)
2023-01-11T23:32:26.3465522Z [ RUN ] LoopNest.reorderInvalidPermutations
2023-01-11T23:32:26.3465893Z [ OK ] LoopNest.reorderInvalidPermutations (0 ms)
2023-01-11T23:32:26.3466233Z [ RUN ] LoopNest.reorderInvalidLoopNest
2023-01-11T23:32:26.3466627Z [ OK ] LoopNest.reorderInvalidLoopNest (0 ms)
2023-01-11T23:32:26.3467081Z [ RUN ] LoopNest.compressBufferSimple
2023-01-11T23:32:26.3467478Z [ OK ] LoopNest.compressBufferSimple (0 ms)
2023-01-11T23:32:26.3467819Z [ RUN ] LoopNest.compressBufferMultipleDims
2023-01-11T23:32:26.3468190Z [ OK ] LoopNest.compressBufferMultipleDims (0 ms)
2023-01-11T23:32:26.3468556Z [ RUN ] LoopNest.compressBufferMultipleDims2
2023-01-11T23:32:26.3468921Z [ OK ] LoopNest.compressBufferMultipleDims2 (0 ms)
2023-01-11T23:32:26.3469317Z [ RUN ] LoopNest.compressBufferDifferentOrderIndices
2023-01-11T23:32:26.3469740Z [ OK ] LoopNest.compressBufferDifferentOrderIndices (0 ms)
2023-01-11T23:32:26.3470236Z [ RUN ] LoopNest.compressBufferVariableBounds
2023-01-11T23:32:26.3470605Z [ OK ] LoopNest.compressBufferVariableBounds (0 ms)
2023-01-11T23:32:26.3470995Z [ RUN ] LoopNest.compressBufferNoCommonParentLoops
2023-01-11T23:32:26.3471400Z [ OK ] LoopNest.compressBufferNoCommonParentLoops (0 ms)
2023-01-11T23:32:26.3471779Z [ RUN ] LoopNest.compressBufferIndicesMixed
2023-01-11T23:32:26.3472140Z [ OK ] LoopNest.compressBufferIndicesMixed (0 ms)
2023-01-11T23:32:26.3472494Z [ RUN ] LoopNest.compressMultipleBuffers
2023-01-11T23:32:26.3472841Z [ OK ] LoopNest.compressMultipleBuffers (0 ms)
2023-01-11T23:32:26.3473166Z [ RUN ] LoopNest.sanitizeNames
2023-01-11T23:32:26.3483761Z [ OK ] LoopNest.sanitizeNames (1 ms)
2023-01-11T23:32:26.3484167Z [----------] 174 tests from LoopNest (1742 ms total)
2023-01-11T23:32:26.3484348Z
2023-01-11T23:32:26.3484557Z [----------] 31 tests from MemDependency
2023-01-11T23:32:26.3484890Z [ RUN ] MemDependency.BoundOverlap
2023-01-11T23:32:26.3494837Z [ OK ] MemDependency.BoundOverlap (1 ms)
2023-01-11T23:32:26.3495178Z [ RUN ] MemDependency.BoundComparison
2023-01-11T23:32:26.3503488Z [ OK ] MemDependency.BoundComparison (0 ms)
2023-01-11T23:32:26.3504165Z [ RUN ] MemDependency.BoundOverlapSymbolic
2023-01-11T23:32:26.3512466Z [ OK ] MemDependency.BoundOverlapSymbolic (0 ms)
2023-01-11T23:32:26.3512849Z [ RUN ] MemDependency.BoundOverlapMultiDim
2023-01-11T23:32:26.3520000Z [ OK ] MemDependency.BoundOverlapMultiDim (0 ms)
2023-01-11T23:32:26.3520347Z [ RUN ] MemDependency.BoundSubtract
2023-01-11T23:32:26.3526914Z [ OK ] MemDependency.BoundSubtract (0 ms)
2023-01-11T23:32:26.3527265Z [ RUN ] MemDependency.BoundSubtractSymbolic
2023-01-11T23:32:26.3548479Z [ OK ] MemDependency.BoundSubtractSymbolic (2 ms)
2023-01-11T23:32:26.3549256Z [ RUN ] MemDependency.BoundSubtractMultiDim
2023-01-11T23:32:26.3572010Z [ OK ] MemDependency.BoundSubtractMultiDim (2 ms)
2023-01-11T23:32:26.3572817Z [ RUN ] MemDependency.BoundSubtractMultiDimSymbolic
2023-01-11T23:32:26.3603403Z [ OK ] MemDependency.BoundSubtractMultiDimSymbolic (3 ms)
2023-01-11T23:32:26.3603989Z [ RUN ] MemDependency.MemDependencyCheckerSimple
2023-01-11T23:32:26.3604687Z [ OK ] MemDependency.MemDependencyCheckerSimple (0 ms)
2023-01-11T23:32:26.3605083Z [ RUN ] MemDependency.MemDependencyCheckerMultiStmt
2023-01-11T23:32:26.3605497Z [ OK ] MemDependency.MemDependencyCheckerMultiStmt (0 ms)
2023-01-11T23:32:26.3606009Z [ RUN ] MemDependency.MemDependencyCheckerOverlap
2023-01-11T23:32:26.3606587Z [ OK ] MemDependency.MemDependencyCheckerOverlap (0 ms)
2023-01-11T23:32:26.3606997Z [ RUN ] MemDependency.MemDependencyCheckerLoop
2023-01-11T23:32:26.3607385Z [ OK ] MemDependency.MemDependencyCheckerLoop (0 ms)
2023-01-11T23:32:26.3607784Z [ RUN ] MemDependency.MemDependencyCheckerLoopReduce
2023-01-11T23:32:26.3608982Z [ OK ] MemDependency.MemDependencyCheckerLoopReduce (0 ms)
2023-01-11T23:32:26.3609497Z [ RUN ] MemDependency.MemDependencyCheckerLoopReduceExpanded
2023-01-11T23:32:26.3612176Z [ OK ] MemDependency.MemDependencyCheckerLoopReduceExpanded (0 ms)
2023-01-11T23:32:26.3612656Z [ RUN ] MemDependency.MemDependencyCheckerInputsOutputs
2023-01-11T23:32:26.3613256Z [ OK ] MemDependency.MemDependencyCheckerInputsOutputs (0 ms)
2023-01-11T23:32:26.3613791Z [ RUN ] MemDependency.MemDependencyCheckerOutputDoesntDepend
2023-01-11T23:32:26.3614461Z [ OK ] MemDependency.MemDependencyCheckerOutputDoesntDepend (0 ms)
2023-01-11T23:32:26.3615125Z [ RUN ] MemDependency.MemDependencyCheckerLoopBounds
2023-01-11T23:32:26.3626214Z [ OK ] MemDependency.MemDependencyCheckerLoopBounds (1 ms)
2023-01-11T23:32:26.3626692Z [ RUN ] MemDependency.MemDependencyCheckerLoopBoundsIndexShift
2023-01-11T23:32:26.3653127Z [ OK ] MemDependency.MemDependencyCheckerLoopBoundsIndexShift (2 ms)
2023-01-11T23:32:26.3654116Z [ RUN ] MemDependency.MemDependencyCheckerLoopSelfDependency
2023-01-11T23:32:26.3784695Z [ OK ] MemDependency.MemDependencyCheckerLoopSelfDependency (13 ms)
2023-01-11T23:32:26.3785677Z [ RUN ] MemDependency.MemDependencyCheckerLoopDistinctStrides
2023-01-11T23:32:26.3792326Z [ OK ] MemDependency.MemDependencyCheckerLoopDistinctStrides (0 ms)
2023-01-11T23:32:26.3792901Z [ RUN ] MemDependency.MemDependencyCheckerLoopBoundsCond
2023-01-11T23:32:26.3809944Z [ OK ] MemDependency.MemDependencyCheckerLoopBoundsCond (1 ms)
2023-01-11T23:32:26.3824810Z [ RUN ] MemDependency.MemDependencyCheckerIfThenElse
2023-01-11T23:32:26.3825344Z [ OK ] MemDependency.MemDependencyCheckerIfThenElse (1 ms)
2023-01-11T23:32:26.3825770Z [ RUN ] MemDependency.MemDependencyCheckerCutLoop
2023-01-11T23:32:26.3838597Z [ OK ] MemDependency.MemDependencyCheckerCutLoop (1 ms)
2023-01-11T23:32:26.3839053Z [ RUN ] MemDependency.MemDependencyCheckerDynamicShapes
2023-01-11T23:32:26.3864308Z [ OK ] MemDependency.MemDependencyCheckerDynamicShapes (2 ms)
2023-01-11T23:32:26.3864731Z [ RUN ] MemDependency.MemDependencyCheckerMultiDim
2023-01-11T23:32:26.3897134Z [ OK ] MemDependency.MemDependencyCheckerMultiDim (3 ms)
2023-01-11T23:32:26.3897559Z [ RUN ] MemDependency.MemDependencyCheckerComputeAPI
2023-01-11T23:32:26.3907560Z [ OK ] MemDependency.MemDependencyCheckerComputeAPI (1 ms)
2023-01-11T23:32:26.3907985Z [ RUN ] MemDependency.MemDependencyCheckerComputeInline
2023-01-11T23:32:26.3915183Z [ OK ] MemDependency.MemDependencyCheckerComputeInline (0 ms)
2023-01-11T23:32:26.3915624Z [ RUN ] MemDependency.MemDependencyCheckerComputeSplit
2023-01-11T23:32:26.3936137Z [ OK ] MemDependency.MemDependencyCheckerComputeSplit (2 ms)
2023-01-11T23:32:26.3936577Z [ RUN ] MemDependency.MemDependencyCheckerComputeReorder
2023-01-11T23:32:26.3949489Z [ OK ] MemDependency.MemDependencyCheckerComputeReorder (1 ms)
2023-01-11T23:32:26.3949984Z [ RUN ] MemDependency.MemDependencyCheckerComputeReduce
2023-01-11T23:32:26.3960377Z [ OK ] MemDependency.MemDependencyCheckerComputeReduce (1 ms)
2023-01-11T23:32:26.3960808Z [ RUN ] MemDependency.MemDependencyCheckerComputeGEMM
2023-01-11T23:32:26.4116357Z [ OK ] MemDependency.MemDependencyCheckerComputeGEMM (15 ms)
2023-01-11T23:32:26.4116769Z [----------] 31 tests from MemDependency (63 ms total)
2023-01-11T23:32:26.4116982Z
2023-01-11T23:32:26.4117174Z [----------] 2 tests from Ops
2023-01-11T23:32:26.4117441Z [ RUN ] Ops.Sum
2023-01-11T23:32:26.4157960Z [ OK ] Ops.Sum (4 ms)
2023-01-11T23:32:26.4158265Z [ RUN ] Ops.ChannelsLastSum
2023-01-11T23:32:26.5320722Z [ OK ] Ops.ChannelsLastSum (116 ms)
2023-01-11T23:32:26.5321259Z [----------] 2 tests from Ops (120 ms total)
2023-01-11T23:32:26.5321496Z
2023-01-11T23:32:26.5321747Z [----------] 10 tests from Quantization
2023-01-11T23:32:26.5322208Z [ RUN ] Quantization.QuantDequantInt8
2023-01-11T23:32:26.5600267Z [ OK ] Quantization.QuantDequantInt8 (27 ms)
2023-01-11T23:32:26.5600812Z [ RUN ] Quantization.QuantDequantUInt8
2023-01-11T23:32:26.5869295Z [ OK ] Quantization.QuantDequantUInt8 (26 ms)
2023-01-11T23:32:26.5870397Z [ RUN ] Quantization.QuantDequantUInt8_NLC
2023-01-11T23:32:26.6148655Z [ OK ] Quantization.QuantDequantUInt8_NLC (28 ms)
2023-01-11T23:32:26.6149085Z [ RUN ] Quantization.QuantAddDequantInt8
2023-01-11T23:32:26.6475466Z [ OK ] Quantization.QuantAddDequantInt8 (32 ms)
2023-01-11T23:32:26.6475884Z [ RUN ] Quantization.QuantAddDequantUInt8
2023-01-11T23:32:26.6799785Z [ OK ] Quantization.QuantAddDequantUInt8 (32 ms)
2023-01-11T23:32:26.6800312Z [ RUN ] Quantization.QuantSigmoidDequantUInt8
2023-01-11T23:32:26.7182769Z [ OK ] Quantization.QuantSigmoidDequantUInt8 (38 ms)
2023-01-11T23:32:26.7183888Z [ RUN ] Quantization.QuantMulDequantUInt8
2023-01-11T23:32:26.7648364Z [ OK ] Quantization.QuantMulDequantUInt8 (46 ms)
2023-01-11T23:32:26.7648899Z [ RUN ] Quantization.QuantUpsampleNearst2dDequantUInt8
2023-01-11T23:32:26.8212248Z [ OK ] Quantization.QuantUpsampleNearst2dDequantUInt8 (56 ms)
2023-01-11T23:32:26.8213240Z [ RUN ] Quantization.UpsampleNearst2d
2023-01-11T23:32:26.8549136Z [ OK ] Quantization.UpsampleNearst2d (33 ms)
2023-01-11T23:32:26.8550280Z [ RUN ] Quantization.QuantCatDequantUInt8
2023-01-11T23:32:26.9269989Z [ OK ] Quantization.QuantCatDequantUInt8 (72 ms)
2023-01-11T23:32:26.9270690Z [----------] 10 tests from Quantization (395 ms total)
2023-01-11T23:32:26.9270918Z
2023-01-11T23:32:26.9271135Z [----------] 2 tests from BufLiveRange
2023-01-11T23:32:26.9271543Z [ RUN ] BufLiveRange.SingleRangeLine
2023-01-11T23:32:26.9271989Z [ OK ] BufLiveRange.SingleRangeLine (0 ms)
2023-01-11T23:32:26.9272318Z [ RUN ] BufLiveRange.MulRangeLine
2023-01-11T23:32:26.9272646Z [ OK ] BufLiveRange.MulRangeLine (0 ms)
2023-01-11T23:32:26.9272981Z [----------] 2 tests from BufLiveRange (0 ms total)
2023-01-11T23:32:26.9273141Z
2023-01-11T23:32:26.9273292Z [----------] 6 tests from MemPlanning
2023-01-11T23:32:26.9273610Z [ RUN ] MemPlanning.MemReuseWithTypeCast
2023-01-11T23:32:27.0038612Z [ OK ] MemPlanning.MemReuseWithTypeCast (76 ms)
2023-01-11T23:32:27.0039006Z [ RUN ] MemPlanning.NoMemReuseForLargerType
2023-01-11T23:32:27.0982153Z [ OK ] MemPlanning.NoMemReuseForLargerType (94 ms)
2023-01-11T23:32:27.0982707Z [ RUN ] MemPlanning.SameBufSizeMemReuse
2023-01-11T23:32:27.1826535Z [ OK ] MemPlanning.SameBufSizeMemReuse (84 ms)
2023-01-11T23:32:27.1827319Z [ RUN ] MemPlanning.SameBufSizeMultiMemReuses
2023-01-11T23:32:27.2685979Z [ OK ] MemPlanning.SameBufSizeMultiMemReuses (86 ms)
2023-01-11T23:32:27.2686412Z [ RUN ] MemPlanning.SameBufSizeMultiMemReusesOfOneBuf
2023-01-11T23:32:27.3701784Z [ OK ] MemPlanning.SameBufSizeMultiMemReusesOfOneBuf (101 ms)
2023-01-11T23:32:27.3702657Z [ RUN ] MemPlanning.SmallerBufSizeNonMemReuse
2023-01-11T23:32:27.4502178Z [ OK ] MemPlanning.SmallerBufSizeNonMemReuse (80 ms)
2023-01-11T23:32:27.4502958Z [----------] 6 tests from MemPlanning (522 ms total)
2023-01-11T23:32:27.4503299Z
2023-01-11T23:32:27.4503641Z [----------] 45 tests from Reductions
2023-01-11T23:32:27.4504434Z [ RUN ] Reductions.ReduceSum0D_1
2023-01-11T23:32:27.4505057Z [ OK ] Reductions.ReduceSum0D_1 (0 ms)
2023-01-11T23:32:27.4505661Z [ RUN ] Reductions.ReduceSum0D_2
2023-01-11T23:32:27.4506259Z [ OK ] Reductions.ReduceSum0D_2 (0 ms)
2023-01-11T23:32:27.4506848Z [ RUN ] Reductions.ReduceSum1D
2023-01-11T23:32:27.4507450Z [ OK ] Reductions.ReduceSum1D (0 ms)
2023-01-11T23:32:27.4508022Z [ RUN ] Reductions.ReduceSum2D
2023-01-11T23:32:27.4508618Z [ OK ] Reductions.ReduceSum2D (0 ms)
2023-01-11T23:32:27.4509011Z [ RUN ] Reductions.ReduceSum3D
2023-01-11T23:32:27.4536473Z [ OK ] Reductions.ReduceSum3D (3 ms)
2023-01-11T23:32:27.4537016Z [ RUN ] Reductions.ReduceSum10D
2023-01-11T23:32:28.1548566Z [ OK ] Reductions.ReduceSum10D (700 ms)
2023-01-11T23:32:28.1549057Z [ RUN ] Reductions.ReduceProduct
2023-01-11T23:32:28.1549593Z [ OK ] Reductions.ReduceProduct (0 ms)
2023-01-11T23:32:28.1550052Z [ RUN ] Reductions.ReduceMax
2023-01-11T23:32:28.1552910Z [ OK ] Reductions.ReduceMax (0 ms)
2023-01-11T23:32:28.1553296Z [ RUN ] Reductions.ReduceMinCustomInitializer
2023-01-11T23:32:28.1555257Z [ OK ] Reductions.ReduceMinCustomInitializer (0 ms)
2023-01-11T23:32:28.1555706Z [ RUN ] Reductions.ReduceAnyAll
2023-01-11T23:32:28.1576389Z [ OK ] Reductions.ReduceAnyAll (2 ms)
2023-01-11T23:32:28.1576811Z [ RUN ] Reductions.ReduceMatmul2D
2023-01-11T23:32:28.1585464Z [ OK ] Reductions.ReduceMatmul2D (0 ms)
2023-01-11T23:32:28.1585915Z [ RUN ] Reductions.ReduceRfactorLike
2023-01-11T23:32:28.1595684Z [ OK ] Reductions.ReduceRfactorLike (1 ms)
2023-01-11T23:32:28.1596043Z [ RUN ] Reductions.ReduceAsProducer
2023-01-11T23:32:28.1618719Z [ OK ] Reductions.ReduceAsProducer (2 ms)
2023-01-11T23:32:28.1619505Z [ RUN ] Reductions.ReduceAsConsumer
2023-01-11T23:32:28.1656536Z [ OK ] Reductions.ReduceAsConsumer (3 ms)
2023-01-11T23:32:28.1657395Z [ RUN ] Reductions.SplitReduceAxis
2023-01-11T23:32:28.1670548Z [ OK ] Reductions.SplitReduceAxis (1 ms)
2023-01-11T23:32:28.1671542Z [ RUN ] Reductions.SplitNonReduceAxis
2023-01-11T23:32:28.1707068Z [ OK ] Reductions.SplitNonReduceAxis (3 ms)
2023-01-11T23:32:28.1707998Z [ RUN ] Reductions.ReorderedReductionInitializer
2023-01-11T23:32:28.1746187Z [ OK ] Reductions.ReorderedReductionInitializer (3 ms)
2023-01-11T23:32:28.1747191Z [ RUN ] Reductions.ReduceRfactor
2023-01-11T23:32:28.1755141Z [ OK ] Reductions.ReduceRfactor (1 ms)
2023-01-11T23:32:28.1755613Z [ RUN ] Reductions.Reduce3DRfactorInner
2023-01-11T23:32:28.1887323Z [ OK ] Reductions.Reduce3DRfactorInner (13 ms)
2023-01-11T23:32:28.1888019Z [ RUN ] Reductions.Reduce3DRfactorOuter
2023-01-11T23:32:28.2024104Z [ OK ] Reductions.Reduce3DRfactorOuter (13 ms)
2023-01-11T23:32:28.2024656Z [ RUN ] Reductions.ReduceRepeatedInternalRfactor
2023-01-11T23:32:28.3828385Z [ OK ] Reductions.ReduceRepeatedInternalRfactor (180 ms)
2023-01-11T23:32:28.3828905Z [ RUN ] Reductions.ReduceSplitTail
2023-01-11T23:32:28.4239889Z [ OK ] Reductions.ReduceSplitTail (41 ms)
2023-01-11T23:32:28.4240247Z [ RUN ] Reductions.ReduceSplitNoTail
2023-01-11T23:32:28.4714932Z [ OK ] Reductions.ReduceSplitNoTail (47 ms)
2023-01-11T23:32:28.4715830Z [ RUN ] Reductions.ReduceOverSplitTail
2023-01-11T23:32:28.5116111Z [ OK ] Reductions.ReduceOverSplitTail (39 ms)
2023-01-11T23:32:28.5116751Z [ RUN ] Reductions.ReduceSplitMask
2023-01-11T23:32:28.5632957Z [ OK ] Reductions.ReduceSplitMask (51 ms)
2023-01-11T23:32:28.5633690Z [ RUN ] Reductions.ReduceSplitNoMask
2023-01-11T23:32:28.6108813Z [ OK ] Reductions.ReduceSplitNoMask (47 ms)
2023-01-11T23:32:28.6109172Z [ RUN ] Reductions.ReduceOverSplitMask
2023-01-11T23:32:28.6526186Z [ OK ] Reductions.ReduceOverSplitMask (41 ms)
2023-01-11T23:32:28.6526550Z [ RUN ] Reductions.ReduceSplitRfactor
2023-01-11T23:32:28.6568880Z [ OK ] Reductions.ReduceSplitRfactor (4 ms)
2023-01-11T23:32:28.6569625Z [ RUN ] Reductions.ReduceOverSplitRfactor
2023-01-11T23:32:28.6581988Z [ OK ] Reductions.ReduceOverSplitRfactor (1 ms)
2023-01-11T23:32:28.6582370Z [ RUN ] Reductions.ReduceInlineReduction
2023-01-11T23:32:28.6582723Z [ OK ] Reductions.ReduceInlineReduction (0 ms)
2023-01-11T23:32:28.6583069Z [ RUN ] Reductions.ReduceInlineConsumer
2023-01-11T23:32:28.6676783Z [ OK ] Reductions.ReduceInlineConsumer (9 ms)
2023-01-11T23:32:28.6677169Z [ RUN ] Reductions.ReduceInlineReducerInternal
2023-01-11T23:32:28.6768972Z [ OK ] Reductions.ReduceInlineReducerInternal (9 ms)
2023-01-11T23:32:28.6769784Z [ RUN ] Reductions.ReductionCacheAccessesOperatorAxis
2023-01-11T23:32:28.6818150Z [ OK ] Reductions.ReductionCacheAccessesOperatorAxis (4 ms)
2023-01-11T23:32:28.6818951Z [ RUN ] Reductions.ReductionCacheAccessesOuterReduceAxis
2023-01-11T23:32:28.6860120Z [ OK ] Reductions.ReductionCacheAccessesOuterReduceAxis (4 ms)
2023-01-11T23:32:28.6861176Z [ RUN ] Reductions.ReductionCacheAccessesInnerReduceAxis
2023-01-11T23:32:28.6902586Z [ OK ] Reductions.ReductionCacheAccessesInnerReduceAxis (4 ms)
2023-01-11T23:32:28.6903419Z [ RUN ] Reductions.ReductionCacheBodyAccess
2023-01-11T23:32:28.6913971Z [ OK ] Reductions.ReductionCacheBodyAccess (1 ms)
2023-01-11T23:32:28.6914379Z [ RUN ] Reductions.ReductionCacheConsumerAccess
2023-01-11T23:32:28.6930133Z [ OK ] Reductions.ReductionCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6930572Z [ RUN ] Reductions.ReductionSplitCacheConsumerAccess
2023-01-11T23:32:28.6949076Z [ OK ] Reductions.ReductionSplitCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6950098Z [ RUN ] Reductions.ReductionReorderCacheConsumerAccess
2023-01-11T23:32:28.6965351Z [ OK ] Reductions.ReductionReorderCacheConsumerAccess (1 ms)
2023-01-11T23:32:28.6965805Z [ RUN ] Reductions.ReductionRfactorCacheTempOuter
2023-01-11T23:32:28.7153853Z [ OK ] Reductions.ReductionRfactorCacheTempOuter (18 ms)
2023-01-11T23:32:28.7154622Z [ RUN ] Reductions.ReductionRfactorCacheTempInner
2023-01-11T23:32:28.7299601Z [ OK ] Reductions.ReductionRfactorCacheTempInner (14 ms)
2023-01-11T23:32:28.7300114Z [ RUN ] Reductions.ReductionVectorize
2023-01-11T23:32:28.7308770Z [ OK ] Reductions.ReductionVectorize (1 ms)
2023-01-11T23:32:28.7309239Z [ RUN ] Reductions.ReductionVectorizeInner
2023-01-11T23:32:28.7309600Z [ OK ] Reductions.ReductionVectorizeInner (0 ms)
2023-01-11T23:32:28.7310048Z [ RUN ] Reductions.ReductionVectorizeRfactor
2023-01-11T23:32:28.7322376Z [ OK ] Reductions.ReductionVectorizeRfactor (1 ms)
2023-01-11T23:32:28.7322872Z [ RUN ] Reductions.InitFunction
2023-01-11T23:32:28.7323218Z [ OK ] Reductions.InitFunction (0 ms)
2023-01-11T23:32:28.7323559Z [----------] 45 tests from Reductions (1282 ms total)
2023-01-11T23:32:28.7323725Z
2023-01-11T23:32:28.7323881Z [----------] 69 tests from Registerizer
2023-01-11T23:32:28.7324265Z [ RUN ] Registerizer.RegisterizerSimple
2023-01-11T23:32:28.7325059Z [ OK ] Registerizer.RegisterizerSimple (0 ms)
2023-01-11T23:32:28.7325538Z [ RUN ] Registerizer.RegisterizerLoop
2023-01-11T23:32:28.7325931Z [ OK ] Registerizer.RegisterizerLoop (0 ms)
2023-01-11T23:32:28.7326287Z [ RUN ] Registerizer.RegisterizerLoopFixedLoad
2023-01-11T23:32:28.7327826Z [ OK ] Registerizer.RegisterizerLoopFixedLoad (0 ms)
2023-01-11T23:32:28.7328296Z [ RUN ] Registerizer.RegisterizerLoopInternal
2023-01-11T23:32:28.7328947Z [ OK ] Registerizer.RegisterizerLoopInternal (0 ms)
2023-01-11T23:32:28.7329368Z [ RUN ] Registerizer.RegisterizerLoopInternalLoadOverlap
2023-01-11T23:32:28.7331228Z [ OK ] Registerizer.RegisterizerLoopInternalLoadOverlap (0 ms)
2023-01-11T23:32:28.7331687Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeated
2023-01-11T23:32:28.7335373Z [ OK ] Registerizer.RegisterizerLoopInternalRepeated (0 ms)
2023-01-11T23:32:28.7335902Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeatedOverlapLoopVar
2023-01-11T23:32:28.7339035Z [ OK ] Registerizer.RegisterizerLoopInternalRepeatedOverlapLoopVar (0 ms)
2023-01-11T23:32:28.7339576Z [ RUN ] Registerizer.RegisterizerLoopInternalRepeatedOverlapOther
2023-01-11T23:32:28.7342285Z [ OK ] Registerizer.RegisterizerLoopInternalRepeatedOverlapOther (0 ms)
2023-01-11T23:32:28.7342761Z [ RUN ] Registerizer.RegisterizerMultiVar
2023-01-11T23:32:28.7344302Z [ OK ] Registerizer.RegisterizerMultiVar (0 ms)
2023-01-11T23:32:28.7344708Z [ RUN ] Registerizer.RegisterizerVariableLoad
2023-01-11T23:32:28.7346535Z [ OK ] Registerizer.RegisterizerVariableLoad (0 ms)
2023-01-11T23:32:28.7346940Z [ RUN ] Registerizer.RegisterizerSymbolicIndices
2023-01-11T23:32:28.7347779Z [ OK ] Registerizer.RegisterizerSymbolicIndices (0 ms)
2023-01-11T23:32:28.7348257Z [ RUN ] Registerizer.RegisterizerMultiLoop
2023-01-11T23:32:28.7350609Z [ OK ] Registerizer.RegisterizerMultiLoop (0 ms)
2023-01-11T23:32:28.7351785Z [ RUN ] Registerizer.RegisterizerRepeated
2023-01-11T23:32:28.7354382Z [ OK ] Registerizer.RegisterizerRepeated (0 ms)
2023-01-11T23:32:28.7354898Z [ RUN ] Registerizer.RegisterizerNoLoads
2023-01-11T23:32:28.7355312Z [ OK ] Registerizer.RegisterizerNoLoads (0 ms)
2023-01-11T23:32:28.7355730Z [ RUN ] Registerizer.RegisterizerNoRepeatedStores
2023-01-11T23:32:28.7356710Z [ OK ] Registerizer.RegisterizerNoRepeatedStores (0 ms)
2023-01-11T23:32:28.7357107Z [ RUN ] Registerizer.RegisterizerMultiVarOverlap
2023-01-11T23:32:28.7359855Z [ OK ] Registerizer.RegisterizerMultiVarOverlap (0 ms)
2023-01-11T23:32:28.7360243Z [ RUN ] Registerizer.RegisterizerAllocs
2023-01-11T23:32:28.7362656Z [ OK ] Registerizer.RegisterizerAllocs (0 ms)
2023-01-11T23:32:28.7363133Z [ RUN ] Registerizer.RegisterizerNoInitializer
2023-01-11T23:32:28.7363818Z [ OK ] Registerizer.RegisterizerNoInitializer (0 ms)
2023-01-11T23:32:28.7364234Z [ RUN ] Registerizer.RegisterizerNoInitializerLoopVar
2023-01-11T23:32:28.7364944Z [ OK ] Registerizer.RegisterizerNoInitializerLoopVar (0 ms)
2023-01-11T23:32:28.7365416Z [ RUN ] Registerizer.RegisterizerLoadThenStore
2023-01-11T23:32:28.7366891Z [ OK ] Registerizer.RegisterizerLoadThenStore (0 ms)
2023-01-11T23:32:28.7367313Z [ RUN ] Registerizer.RegisterizerParallelized
2023-01-11T23:32:28.7367776Z [ OK ] Registerizer.RegisterizerParallelized (0 ms)
2023-01-11T23:32:28.7368221Z [ RUN ] Registerizer.RegisterizerConditionAfter
2023-01-11T23:32:28.7369364Z [ OK ] Registerizer.RegisterizerConditionAfter (0 ms)
2023-01-11T23:32:28.7369836Z [ RUN ] Registerizer.RegisterizerConditionBefore
2023-01-11T23:32:28.7371280Z [ OK ] Registerizer.RegisterizerConditionBefore (0 ms)
2023-01-11T23:32:28.7371748Z [ RUN ] Registerizer.RegisterizerConditionInside
2023-01-11T23:32:28.7373155Z [ OK ] Registerizer.RegisterizerConditionInside (0 ms)
2023-01-11T23:32:28.7373683Z [ RUN ] Registerizer.RegisterizerConditionInsideOverlap1
2023-01-11T23:32:28.7375629Z [ OK ] Registerizer.RegisterizerConditionInsideOverlap1 (0 ms)
2023-01-11T23:32:28.7376079Z [ RUN ] Registerizer.RegisterizerConditionInsideOverlap2
2023-01-11T23:32:28.7379034Z [ OK ] Registerizer.RegisterizerConditionInsideOverlap2 (0 ms)
2023-01-11T23:32:28.7379454Z [ RUN ] Registerizer.RegisterizerConditionHidden
2023-01-11T23:32:28.7380119Z [ OK ] Registerizer.RegisterizerConditionHidden (0 ms)
2023-01-11T23:32:28.7380548Z [ RUN ] Registerizer.RegisterizerConditionUnhidden
2023-01-11T23:32:28.7382436Z [ OK ] Registerizer.RegisterizerConditionUnhidden (0 ms)
2023-01-11T23:32:28.7382849Z [ RUN ] Registerizer.RegisterizerCondCondition
2023-01-11T23:32:28.7384175Z [ OK ] Registerizer.RegisterizerCondCondition (0 ms)
2023-01-11T23:32:28.7384759Z [ RUN ] Registerizer.RegisterizerCondConditionUnhidden
2023-01-11T23:32:28.7385386Z [ OK ] Registerizer.RegisterizerCondConditionUnhidden (0 ms)
2023-01-11T23:32:28.7385877Z [ RUN ] Registerizer.RegisterizerIfThenElseHidden
2023-01-11T23:32:28.7389168Z [ OK ] Registerizer.RegisterizerIfThenElseHidden (0 ms)
2023-01-11T23:32:28.7389607Z [ RUN ] Registerizer.RegisterizerIfThenElseUnhidden
2023-01-11T23:32:28.7393752Z [ OK ] Registerizer.RegisterizerIfThenElseUnhidden (0 ms)
2023-01-11T23:32:28.7394165Z [ RUN ] Registerizer.RegisterizerIfThenElseNested
2023-01-11T23:32:28.7394677Z [ OK ] Registerizer.RegisterizerIfThenElseNested (0 ms)
2023-01-11T23:32:28.7395167Z [ RUN ] Registerizer.RegisterizerIfThenElseInternal
2023-01-11T23:32:28.7397325Z [ OK ] Registerizer.RegisterizerIfThenElseInternal (0 ms)
2023-01-11T23:32:28.7397739Z [ RUN ] Registerizer.RegisterizerIfThenElseCondition
2023-01-11T23:32:28.7398540Z [ OK ] Registerizer.RegisterizerIfThenElseCondition (0 ms)
2023-01-11T23:32:28.7399049Z [ RUN ] Registerizer.RegisterizerIfThenElseConditionUnhidden
2023-01-11T23:32:28.7399999Z [ OK ] Registerizer.RegisterizerIfThenElseConditionUnhidden (0 ms)
2023-01-11T23:32:28.7400430Z [ RUN ] Registerizer.RegisterizerConditionBranchOnly
2023-01-11T23:32:28.7410766Z [ OK ] Registerizer.RegisterizerConditionBranchOnly (1 ms)
2023-01-11T23:32:28.7411234Z [ RUN ] Registerizer.RegisterizerCondIfThenElse
2023-01-11T23:32:28.7412766Z [ OK ] Registerizer.RegisterizerCondIfThenElse (0 ms)
2023-01-11T23:32:28.7413165Z [ RUN ] Registerizer.RegisterizerIfThenElseLoop
2023-01-11T23:32:28.7414894Z [ OK ] Registerizer.RegisterizerIfThenElseLoop (0 ms)
2023-01-11T23:32:28.7415412Z [ RUN ] Registerizer.RegisterizerIfThenElseLoopCut
2023-01-11T23:32:28.7415960Z [ OK ] Registerizer.RegisterizerIfThenElseLoopCut (0 ms)
2023-01-11T23:32:28.7416378Z [ RUN ] Registerizer.RegisterizerPartialAfter
2023-01-11T23:32:28.7418336Z [ OK ] Registerizer.RegisterizerPartialAfter (0 ms)
2023-01-11T23:32:28.7418786Z [ RUN ] Registerizer.RegisterizerPartialBefore
2023-01-11T23:32:28.7421010Z [ OK ] Registerizer.RegisterizerPartialBefore (0 ms)
2023-01-11T23:32:28.7421407Z [ RUN ] Registerizer.RegisterizerPartialInside
2023-01-11T23:32:28.7424326Z [ OK ] Registerizer.RegisterizerPartialInside (0 ms)
2023-01-11T23:32:28.7424743Z [ RUN ] Registerizer.RegisterizerPartialCondition
2023-01-11T23:32:28.7428159Z [ OK ] Registerizer.RegisterizerPartialCondition (0 ms)
2023-01-11T23:32:28.7428605Z [ RUN ] Registerizer.RegisterizerPartialConditionInternalCut
2023-01-11T23:32:28.7429359Z [ OK ] Registerizer.RegisterizerPartialConditionInternalCut (0 ms) 2023-01-11T23:32:28.7429839Z [ RUN ] Registerizer.RegisterizerPartialConditionInternalStart 2023-01-11T23:32:28.7432479Z [ OK ] Registerizer.RegisterizerPartialConditionInternalStart (0 ms) 2023-01-11T23:32:28.7432963Z [ RUN ] Registerizer.RegisterizerPartialOverlapsTwo 2023-01-11T23:32:28.7433804Z [ OK ] Registerizer.RegisterizerPartialOverlapsTwo (0 ms) 2023-01-11T23:32:28.7434212Z [ RUN ] Registerizer.RegisterizerNestedBlocks 2023-01-11T23:32:28.7435665Z [ OK ] Registerizer.RegisterizerNestedBlocks (0 ms) 2023-01-11T23:32:28.7436079Z [ RUN ] Registerizer.RegisterizerNestedConditions 2023-01-11T23:32:28.7436877Z [ OK ] Registerizer.RegisterizerNestedConditions (0 ms) 2023-01-11T23:32:28.7437310Z [ RUN ] Registerizer.RegisterizerNestedConditionsUnhidden 2023-01-11T23:32:28.7438997Z [ OK ] Registerizer.RegisterizerNestedConditionsUnhidden (0 ms) 2023-01-11T23:32:28.7439479Z [ RUN ] Registerizer.RegisterizerNestedConditionsHiddenFirst 2023-01-11T23:32:28.7441411Z [ OK ] Registerizer.RegisterizerNestedConditionsHiddenFirst (0 ms) 2023-01-11T23:32:28.7441900Z [ RUN ] Registerizer.RegisterizerNestedConditionsHiddenSecond 2023-01-11T23:32:28.7443855Z [ OK ] Registerizer.RegisterizerNestedConditionsHiddenSecond (0 ms) 2023-01-11T23:32:28.7444365Z [ RUN ] Registerizer.RegisterizerNestedConditionsCut 2023-01-11T23:32:28.7444971Z [ OK ] Registerizer.RegisterizerNestedConditionsCut (0 ms) 2023-01-11T23:32:28.7445502Z [ RUN ] Registerizer.RegisterizerNestedConditionLoopHidden 2023-01-11T23:32:28.7447400Z [ OK ] Registerizer.RegisterizerNestedConditionLoopHidden (0 ms) 2023-01-11T23:32:28.7447889Z [ RUN ] Registerizer.RegisterizerNestedConditionThreeDeep 2023-01-11T23:32:28.7452375Z [ OK ] Registerizer.RegisterizerNestedConditionThreeDeep (0 ms) 2023-01-11T23:32:28.7452848Z [ RUN ] Registerizer.RegisterizerNestedLoopSimple 2023-01-11T23:32:28.7453670Z [ OK ] Registerizer.RegisterizerNestedLoopSimple (0 ms) 2023-01-11T23:32:28.7454138Z [ RUN ] Registerizer.RegisterizerHiddenAccessYes 2023-01-11T23:32:28.7456741Z [ OK ] Registerizer.RegisterizerHiddenAccessYes (0 ms) 2023-01-11T23:32:28.7457186Z [ RUN ] Registerizer.RegisterizerHiddenAccessNo 2023-01-11T23:32:28.7459448Z [ OK ] Registerizer.RegisterizerHiddenAccessNo (0 ms) 2023-01-11T23:32:28.7459976Z [ RUN ] Registerizer.RegisterizerHiddenAccessMultiLoop 2023-01-11T23:32:28.7462639Z [ OK ] Registerizer.RegisterizerHiddenAccessMultiLoop (0 ms) 2023-01-11T23:32:28.7463106Z [ RUN ] Registerizer.RegisterizerTwoConditionalLoops 2023-01-11T23:32:28.7465034Z [ OK ] Registerizer.RegisterizerTwoConditionalLoops (0 ms) 2023-01-11T23:32:28.7465486Z [ RUN ] Registerizer.RegisterizerTwoConditionalLoopsCut 2023-01-11T23:32:28.7467705Z [ OK ] Registerizer.RegisterizerTwoConditionalLoopsCut (0 ms) 2023-01-11T23:32:28.7468151Z [ RUN ] Registerizer.RegisterizerLoopLetVar 2023-01-11T23:32:28.7468957Z [ OK ] Registerizer.RegisterizerLoopLetVar (0 ms) 2023-01-11T23:32:28.7469443Z [ RUN ] Registerizer.RegisterizerLoopLetVarOuter 2023-01-11T23:32:28.7470042Z [ OK ] Registerizer.RegisterizerLoopLetVarOuter (0 ms) 2023-01-11T23:32:28.7470452Z [ RUN ] Registerizer.RegisterizerMultiDim 2023-01-11T23:32:28.7473205Z [ OK ] Registerizer.RegisterizerMultiDim (0 ms) 2023-01-11T23:32:28.7473585Z [ RUN ] Registerizer.RegisterizerMultiDimPartial 2023-01-11T23:32:28.7473979Z [ OK ] Registerizer.RegisterizerMultiDimPartial (0 ms) 2023-01-11T23:32:28.7474366Z [ RUN ] 
Registerizer.RegisterizerMultiDimOverlap 2023-01-11T23:32:28.7476921Z [ OK ] Registerizer.RegisterizerMultiDimOverlap (0 ms) 2023-01-11T23:32:28.7477451Z [ RUN ] Registerizer.RegisterizerMultiDimPartialOverlap 2023-01-11T23:32:28.7479416Z [ OK ] Registerizer.RegisterizerMultiDimPartialOverlap (0 ms) 2023-01-11T23:32:28.7479937Z [ RUN ] Registerizer.RegisterizerMultiDim3DReduction1 2023-01-11T23:32:28.7481932Z [ OK ] Registerizer.RegisterizerMultiDim3DReduction1 (0 ms) 2023-01-11T23:32:28.7482365Z [ RUN ] Registerizer.RegisterizerMultiDim3DReduction2 2023-01-11T23:32:28.7484744Z [ OK ] Registerizer.RegisterizerMultiDim3DReduction2 (0 ms) 2023-01-11T23:32:28.7485287Z [----------] 69 tests from Registerizer (16 ms total) 2023-01-11T23:32:28.7485509Z 2023-01-11T23:32:28.7485722Z [----------] 92 tests from Simplify 2023-01-11T23:32:28.7486158Z [ RUN ] Simplify.ConstantFoldSimple 2023-01-11T23:32:28.7486611Z [ OK ] Simplify.ConstantFoldSimple (0 ms) 2023-01-11T23:32:28.7487041Z [ RUN ] Simplify.ConstantFoldTwoLayer 2023-01-11T23:32:28.7487466Z [ OK ] Simplify.ConstantFoldTwoLayer (0 ms) 2023-01-11T23:32:28.7487903Z [ RUN ] Simplify.ConstantFoldShifts 2023-01-11T23:32:28.7488369Z [ OK ] Simplify.ConstantFoldShifts (0 ms) 2023-01-11T23:32:28.7488737Z [ RUN ] Simplify.ConstantFoldBitwise 2023-01-11T23:32:28.7489076Z [ OK ] Simplify.ConstantFoldBitwise (0 ms) 2023-01-11T23:32:28.7489402Z [ RUN ] Simplify.ConstantFoldMultiOp 2023-01-11T23:32:28.7489738Z [ OK ] Simplify.ConstantFoldMultiOp (0 ms) 2023-01-11T23:32:28.7490141Z [ RUN ] Simplify.ConstantFoldMinMax 2023-01-11T23:32:28.7490508Z [ OK ] Simplify.ConstantFoldMinMax (0 ms) 2023-01-11T23:32:28.7490848Z [ RUN ] Simplify.ConstantFoldIntrinsics 2023-01-11T23:32:28.7491219Z [ OK ] Simplify.ConstantFoldIntrinsics (0 ms) 2023-01-11T23:32:28.7491671Z [ RUN ] Simplify.ConstantFoldCastToBool 2023-01-11T23:32:28.7492120Z [ OK ] Simplify.ConstantFoldCastToBool (0 ms) 2023-01-11T23:32:28.7492519Z [ RUN ] Simplify.ConstantFoldWithVar 2023-01-11T23:32:28.7492922Z [ OK ] Simplify.ConstantFoldWithVar (0 ms) 2023-01-11T23:32:28.7493382Z [ RUN ] Simplify.ConditionalSelectFoldSimple 2023-01-11T23:32:28.7493812Z [ OK ] Simplify.ConditionalSelectFoldSimple (0 ms) 2023-01-11T23:32:28.7494274Z [ RUN ] Simplify.ConditionalSelectFoldTwoLayer 2023-01-11T23:32:28.7494925Z [ OK ] Simplify.ConditionalSelectFoldTwoLayer (0 ms) 2023-01-11T23:32:28.7495440Z [ RUN ] Simplify.ConditionalSelectFoldWithVar 2023-01-11T23:32:28.7495867Z [ OK ] Simplify.ConditionalSelectFoldWithVar (0 ms) 2023-01-11T23:32:28.7496270Z [ RUN ] Simplify.UnFoldableExpr 2023-01-11T23:32:28.7496628Z [ OK ] Simplify.UnFoldableExpr (0 ms) 2023-01-11T23:32:28.7497025Z [ RUN ] Simplify.HashSimple 2023-01-11T23:32:28.7497390Z [ OK ] Simplify.HashSimple (0 ms) 2023-01-11T23:32:28.7497790Z [ RUN ] Simplify.HashEquivalence 2023-01-11T23:32:28.7498121Z [ OK ] Simplify.HashEquivalence (0 ms) 2023-01-11T23:32:28.7498443Z [ RUN ] Simplify.HashEquivalenceRand 2023-01-11T23:32:28.7498771Z [ OK ] Simplify.HashEquivalenceRand (0 ms) 2023-01-11T23:32:28.7499130Z [ RUN ] Simplify.HashEquivalenceAfterFolding 2023-01-11T23:32:28.7499506Z [ OK ] Simplify.HashEquivalenceAfterFolding (0 ms) 2023-01-11T23:32:28.7499853Z [ RUN ] Simplify.HashDifferenceTypes 2023-01-11T23:32:28.7500259Z [ OK ] Simplify.HashDifferenceTypes (0 ms) 2023-01-11T23:32:28.7500694Z [ RUN ] Simplify.HashLargeExpression 2023-01-11T23:32:28.7501029Z [ OK ] Simplify.HashLargeExpression (0 ms) 2023-01-11T23:32:28.7501348Z [ RUN ] Simplify.HashForLoopOptions 
2023-01-11T23:32:28.7501680Z [ OK ] Simplify.HashForLoopOptions (0 ms) 2023-01-11T23:32:28.7502009Z [ RUN ] Simplify.SimplifyAdd 2023-01-11T23:32:28.7502308Z [ OK ] Simplify.SimplifyAdd (0 ms) 2023-01-11T23:32:28.7502594Z [ RUN ] Simplify.SimplifySub 2023-01-11T23:32:28.7502891Z [ OK ] Simplify.SimplifySub (0 ms) 2023-01-11T23:32:28.7503207Z [ RUN ] Simplify.SimplifyMultiLayer 2023-01-11T23:32:28.7503534Z [ OK ] Simplify.SimplifyMultiLayer (0 ms) 2023-01-11T23:32:28.7503957Z [ RUN ] Simplify.SimplifyMultiTerm 2023-01-11T23:32:28.7504358Z [ OK ] Simplify.SimplifyMultiTerm (0 ms) 2023-01-11T23:32:28.7504741Z [ RUN ] Simplify.SimplifyCasts 2023-01-11T23:32:28.7505058Z [ OK ] Simplify.SimplifyCasts (0 ms) 2023-01-11T23:32:28.7505389Z [ RUN ] Simplify.SimplifyEliminatesNoOps 2023-01-11T23:32:28.7505742Z [ OK ] Simplify.SimplifyEliminatesNoOps (0 ms) 2023-01-11T23:32:28.7506076Z [ RUN ] Simplify.SimplifyMultiVar 2023-01-11T23:32:28.7506399Z [ OK ] Simplify.SimplifyMultiVar (0 ms) 2023-01-11T23:32:28.7506728Z [ RUN ] Simplify.SimplifyEliminatesVar 2023-01-11T23:32:28.7507068Z [ OK ] Simplify.SimplifyEliminatesVar (0 ms) 2023-01-11T23:32:28.7507389Z [ RUN ] Simplify.SimplifyAdds 2023-01-11T23:32:28.7507696Z [ OK ] Simplify.SimplifyAdds (0 ms) 2023-01-11T23:32:28.7507987Z [ RUN ] Simplify.SimplifyMuls 2023-01-11T23:32:28.7508296Z [ OK ] Simplify.SimplifyMuls (0 ms) 2023-01-11T23:32:28.7508621Z [ RUN ] Simplify.SimplifySubs 2023-01-11T23:32:28.7512408Z [ OK ] Simplify.SimplifySubs (0 ms) 2023-01-11T23:32:28.7512845Z [ RUN ] Simplify.SimplifyDiv 2023-01-11T23:32:28.7513203Z [ OK ] Simplify.SimplifyDiv (0 ms) 2023-01-11T23:32:28.7513551Z [ RUN ] Simplify.SimplifyDivWithLoopContext0 2023-01-11T23:32:28.7513924Z [ OK ] Simplify.SimplifyDivWithLoopContext0 (0 ms) 2023-01-11T23:32:28.7514337Z [ RUN ] Simplify.SimplifyDivWithLoopContext1 2023-01-11T23:32:28.7514837Z [ OK ] Simplify.SimplifyDivWithLoopContext1 (0 ms) 2023-01-11T23:32:28.7515234Z [ RUN ] Simplify.SimplifyDivWithLoopContext2 2023-01-11T23:32:28.7515955Z [ OK ] Simplify.SimplifyDivWithLoopContext2 (0 ms) 2023-01-11T23:32:28.7516330Z [ RUN ] Simplify.SimplifyDivWithLoopContext3 2023-01-11T23:32:28.7516698Z [ OK ] Simplify.SimplifyDivWithLoopContext3 (0 ms) 2023-01-11T23:32:28.7517060Z [ RUN ] Simplify.SimplifyDivWithLoopContext4 2023-01-11T23:32:28.7519259Z [ OK ] Simplify.SimplifyDivWithLoopContext4 (0 ms) 2023-01-11T23:32:28.7519638Z [ RUN ] Simplify.SimplifyDivWithLoopContext5 2023-01-11T23:32:28.7521496Z [ OK ] Simplify.SimplifyDivWithLoopContext5 (0 ms) 2023-01-11T23:32:28.7521875Z [ RUN ] Simplify.SimplifyDivWithLoopContext6 2023-01-11T23:32:28.7524724Z [ OK ] Simplify.SimplifyDivWithLoopContext6 (0 ms) 2023-01-11T23:32:28.7525252Z [ RUN ] Simplify.SimplifyDivWithLoopContext7 2023-01-11T23:32:28.7525736Z [ OK ] Simplify.SimplifyDivWithLoopContext7 (0 ms) 2023-01-11T23:32:28.7526175Z [ RUN ] Simplify.SimplifyModWithLoopContext0 2023-01-11T23:32:28.7526554Z [ OK ] Simplify.SimplifyModWithLoopContext0 (0 ms) 2023-01-11T23:32:28.7526992Z [ RUN ] Simplify.SimplifyModWithLoopContext1 2023-01-11T23:32:28.7527987Z [ OK ] Simplify.SimplifyModWithLoopContext1 (0 ms) 2023-01-11T23:32:28.7528374Z [ RUN ] Simplify.SimplifyModWithLoopContext2 2023-01-11T23:32:28.7529995Z [ OK ] Simplify.SimplifyModWithLoopContext2 (0 ms) 2023-01-11T23:32:28.7530369Z [ RUN ] Simplify.SimplifyModWithLoopContext3 2023-01-11T23:32:28.7530740Z [ OK ] Simplify.SimplifyModWithLoopContext3 (0 ms) 2023-01-11T23:32:28.7531110Z [ RUN ] Simplify.SimplifyModWithLoopContext4 
2023-01-11T23:32:28.7533151Z [ OK ] Simplify.SimplifyModWithLoopContext4 (0 ms) 2023-01-11T23:32:28.7533517Z [ RUN ] Simplify.SimplifyModWithLoopContext5 2023-01-11T23:32:28.7535740Z [ OK ] Simplify.SimplifyModWithLoopContext5 (0 ms) 2023-01-11T23:32:28.7536114Z [ RUN ] Simplify.SimplifyModWithLoopContext6 2023-01-11T23:32:28.7539157Z [ OK ] Simplify.SimplifyModWithLoopContext6 (0 ms) 2023-01-11T23:32:28.7539533Z [ RUN ] Simplify.SimplifyModWithLoopContext7 2023-01-11T23:32:28.7540018Z [ OK ] Simplify.SimplifyModWithLoopContext7 (0 ms) 2023-01-11T23:32:28.7540451Z [ RUN ] Simplify.SimplifyMod 2023-01-11T23:32:28.7542095Z [ OK ] Simplify.SimplifyMod (0 ms) 2023-01-11T23:32:28.7542467Z [ RUN ] Simplify.SimplifyMultiOp 2023-01-11T23:32:28.7544327Z [ OK ] Simplify.SimplifyMultiOp (0 ms) 2023-01-11T23:32:28.7544647Z [ RUN ] Simplify.SimplifyManyOps 2023-01-11T23:32:28.7547719Z [ OK ] Simplify.SimplifyManyOps (0 ms) 2023-01-11T23:32:28.7548059Z [ RUN ] Simplify.SimplifyFactorization 2023-01-11T23:32:28.7554067Z [ OK ] Simplify.SimplifyFactorization (0 ms) 2023-01-11T23:32:28.7554528Z [ RUN ] Simplify.SimplifyFactorizeUneven 2023-01-11T23:32:28.7554921Z [ OK ] Simplify.SimplifyFactorizeUneven (0 ms) 2023-01-11T23:32:28.7555264Z [ RUN ] Simplify.SimplifyDeeperTerms 2023-01-11T23:32:28.7555609Z [ OK ] Simplify.SimplifyDeeperTerms (0 ms) 2023-01-11T23:32:28.7555953Z [ RUN ] Simplify.SimplifyDeeperDifference 2023-01-11T23:32:28.7556383Z [ OK ] Simplify.SimplifyDeeperDifference (0 ms) 2023-01-11T23:32:28.7556912Z [ RUN ] Simplify.SimplifyFoldComplexDifference 2023-01-11T23:32:28.7557483Z [ OK ] Simplify.SimplifyFoldComplexDifference (0 ms) 2023-01-11T23:32:28.7557838Z [ RUN ] Simplify.SimplifyIfComponents 2023-01-11T23:32:28.7558184Z [ OK ] Simplify.SimplifyIfComponents (0 ms) 2023-01-11T23:32:28.7558603Z [ RUN ] Simplify.SimplifyOpaqueTerms 2023-01-11T23:32:28.7559064Z [ OK ] Simplify.SimplifyOpaqueTerms (0 ms) 2023-01-11T23:32:28.7559402Z [ RUN ] Simplify.SimplifySymbolicMinMax 2023-01-11T23:32:28.7560553Z [ OK ] Simplify.SimplifySymbolicMinMax (0 ms) 2023-01-11T23:32:28.7561149Z [ RUN ] Simplify.SimplifyNestedMax 2023-01-11T23:32:28.7573791Z [ OK ] Simplify.SimplifyNestedMax (1 ms) 2023-01-11T23:32:28.7574318Z [ RUN ] Simplify.SimplifyNestedMin 2023-01-11T23:32:28.7587359Z [ OK ] Simplify.SimplifyNestedMin (1 ms) 2023-01-11T23:32:28.7587909Z [ RUN ] Simplify.SimplifyWontReorderFloat 2023-01-11T23:32:28.7588880Z [ OK ] Simplify.SimplifyWontReorderFloat (0 ms) 2023-01-11T23:32:28.7589432Z [ RUN ] Simplify.SimplifyRoundModPattern 2023-01-11T23:32:28.7597832Z [ OK ] Simplify.SimplifyRoundModPattern (0 ms) 2023-01-11T23:32:28.7598474Z [ RUN ] Simplify.SimplifyRoundModPatternFactorization 2023-01-11T23:32:28.7602324Z [ OK ] Simplify.SimplifyRoundModPatternFactorization (0 ms) 2023-01-11T23:32:28.7602960Z [ RUN ] Simplify.SimplifyRoundModPatternMultivar 2023-01-11T23:32:28.7606941Z [ OK ] Simplify.SimplifyRoundModPatternMultivar (0 ms) 2023-01-11T23:32:28.7607558Z [ RUN ] Simplify.SimplifyModRoundModPattern 2023-01-11T23:32:28.7612272Z [ OK ] Simplify.SimplifyModRoundModPattern (0 ms) 2023-01-11T23:32:28.7612907Z [ RUN ] Simplify.SimplifyModRoundModPatternFactorization 2023-01-11T23:32:28.7620564Z [ OK ] Simplify.SimplifyModRoundModPatternFactorization (0 ms) 2023-01-11T23:32:28.7621242Z [ RUN ] Simplify.SimplifyModRoundModPatternMultivar 2023-01-11T23:32:28.7634474Z [ OK ] Simplify.SimplifyModRoundModPatternMultivar (1 ms) 2023-01-11T23:32:28.7635030Z [ RUN ] Simplify.SimplifyDivisionScalarFactorization 
2023-01-11T23:32:28.7636093Z [ OK ] Simplify.SimplifyDivisionScalarFactorization (0 ms) 2023-01-11T23:32:28.7636630Z [ RUN ] Simplify.SimplifyConstantBranches 2023-01-11T23:32:28.7637056Z [ OK ] Simplify.SimplifyConstantBranches (0 ms) 2023-01-11T23:32:28.7637497Z [ RUN ] Simplify.SimplifyConstantCond 2023-01-11T23:32:28.7638015Z [ OK ] Simplify.SimplifyConstantCond (0 ms) 2023-01-11T23:32:28.7638370Z [ RUN ] Simplify.SimplifyEliminateEmptyCond 2023-01-11T23:32:28.7638749Z [ OK ] Simplify.SimplifyEliminateEmptyCond (0 ms) 2023-01-11T23:32:28.7639117Z [ RUN ] Simplify.SimplifyConstantComparisons 2023-01-11T23:32:28.7645412Z [ OK ] Simplify.SimplifyConstantComparisons (0 ms) 2023-01-11T23:32:28.7646008Z [ RUN ] Simplify.SimplifySymbolicComparisons 2023-01-11T23:32:28.7652737Z [ OK ] Simplify.SimplifySymbolicComparisons (0 ms) 2023-01-11T23:32:28.7653342Z [ RUN ] Simplify.SimplifyEliminateZeroLengthFor 2023-01-11T23:32:28.7653943Z [ OK ] Simplify.SimplifyEliminateZeroLengthFor (0 ms) 2023-01-11T23:32:28.7654649Z [ RUN ] Simplify.SimplifyOneLoopFor 2023-01-11T23:32:28.7655184Z [ OK ] Simplify.SimplifyOneLoopFor (0 ms) 2023-01-11T23:32:28.7655728Z [ RUN ] Simplify.SimplifyForWontLoseLoopOptions 2023-01-11T23:32:28.7656307Z [ OK ] Simplify.SimplifyForWontLoseLoopOptions (0 ms) 2023-01-11T23:32:28.7656859Z [ RUN ] Simplify.SimplifyMultilevelFor 2023-01-11T23:32:28.7657379Z [ OK ] Simplify.SimplifyMultilevelFor (0 ms) 2023-01-11T23:32:28.7657873Z [ RUN ] Simplify.SimplifyForCleansUp 2023-01-11T23:32:28.7660234Z [ OK ] Simplify.SimplifyForCleansUp (0 ms) 2023-01-11T23:32:28.7660770Z [ RUN ] Simplify.SimplifyEliminateEmptyFor 2023-01-11T23:32:28.7661708Z [ OK ] Simplify.SimplifyEliminateEmptyFor (0 ms) 2023-01-11T23:32:28.7662095Z [ RUN ] Simplify.SimplifyFlattenBlock 2023-01-11T23:32:28.7662461Z [ OK ] Simplify.SimplifyFlattenBlock (0 ms) 2023-01-11T23:32:28.7662831Z [ RUN ] Simplify.SimplifyEliminateZeroLengthAlloc 2023-01-11T23:32:28.7664279Z [ OK ] Simplify.SimplifyEliminateZeroLengthAlloc (0 ms) 2023-01-11T23:32:28.7664651Z [ RUN ] Simplify.DontSimplifyRand 2023-01-11T23:32:28.7664975Z [ OK ] Simplify.DontSimplifyRand (0 ms) 2023-01-11T23:32:28.7665311Z [ RUN ] Simplify.SimplifyReorderForCond 2023-01-11T23:32:28.7669095Z [ OK ] Simplify.SimplifyReorderForCond (0 ms) 2023-01-11T23:32:28.7669449Z [ RUN ] Simplify.SimplifyFuseConditions 2023-01-11T23:32:28.7674866Z [ OK ] Simplify.SimplifyFuseConditions (0 ms) 2023-01-11T23:32:28.7675217Z [ RUN ] Simplify.SimplifySyncThreads 2023-01-11T23:32:28.7675596Z [ OK ] Simplify.SimplifySyncThreads (0 ms) 2023-01-11T23:32:28.7676009Z [ RUN ] Simplify.SimplifyRampSubBroadcast 2023-01-11T23:32:28.7676428Z [ OK ] Simplify.SimplifyRampSubBroadcast (0 ms) 2023-01-11T23:32:28.7676831Z [ RUN ] Simplify.SimplifyBroadcastTermExpander 2023-01-11T23:32:28.7677223Z [ OK ] Simplify.SimplifyBroadcastTermExpander (0 ms) 2023-01-11T23:32:28.7677584Z [ RUN ] Simplify.CompareSelectLoopBounds 2023-01-11T23:32:28.7760074Z [ OK ] Simplify.CompareSelectLoopBounds (8 ms) 2023-01-11T23:32:28.7760469Z [ RUN ] Simplify.CompareSelectCondAlwaysInLoopBounds 2023-01-11T23:32:28.7760951Z [ OK ] Simplify.CompareSelectCondAlwaysInLoopBounds (0 ms) 2023-01-11T23:32:28.7761383Z [ RUN ] Simplify.IfThenCondAlwaysInLoopBounds 2023-01-11T23:32:28.7761766Z [ OK ] Simplify.IfThenCondAlwaysInLoopBounds (0 ms) 2023-01-11T23:32:28.7762166Z [ RUN ] Simplify.MultiClauseCondAlwaysInLoopBounds 2023-01-11T23:32:28.7763915Z [ OK ] Simplify.MultiClauseCondAlwaysInLoopBounds (0 ms) 
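The Simplify.SimplifyRoundModPattern* cases above exercise the NNC simplifier's round/mod rewrite, which presumably rests on the identity (x / y) * y + x % y == x for integer x and nonzero y. A minimal Python sanity check of that identity follows (loop bounds are arbitrary; this is an illustration, not the NNC code under test):

# Sanity check of the identity behind the round/mod rewrite:
# (x / y) * y + x % y == x for integer x and nonzero y. Python uses
# floor division/modulo while C++ uses truncating division/modulo,
# but the combined identity holds under either convention.
for x in range(-20, 21):
    for y in range(-5, 6):
        if y == 0:
            continue
        assert (x // y) * y + x % y == x
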
2023-01-11T23:32:28.7764309Z [----------] 92 tests from Simplify (27 ms total) 2023-01-11T23:32:28.7764503Z 2023-01-11T23:32:28.7764725Z [----------] 12 tests from TEFuserPass 2023-01-11T23:32:28.7765153Z [ RUN ] TEFuserPass.FuserPass_1 2023-01-11T23:32:28.7770435Z [ OK ] TEFuserPass.FuserPass_1 (0 ms) 2023-01-11T23:32:28.7770751Z [ RUN ] TEFuserPass.FuserPass_2 2023-01-11T23:32:28.7774161Z [ OK ] TEFuserPass.FuserPass_2 (0 ms) 2023-01-11T23:32:28.7774584Z [ RUN ] TEFuserPass.FuserPass_3 2023-01-11T23:32:28.7777112Z [ OK ] TEFuserPass.FuserPass_3 (0 ms) 2023-01-11T23:32:28.7777467Z [ RUN ] TEFuserPass.FuserPass_0DimInput 2023-01-11T23:32:28.7779748Z [ OK ] TEFuserPass.FuserPass_0DimInput (0 ms) 2023-01-11T23:32:28.7780100Z [ RUN ] TEFuserPass.FuserPass_UnfusibleDevice 2023-01-11T23:32:28.7780461Z [ OK ] TEFuserPass.FuserPass_UnfusibleDevice (0 ms) 2023-01-11T23:32:28.7780808Z [ RUN ] TEFuserPass.FuserPass_UnknownShapes 2023-01-11T23:32:28.7781694Z [ OK ] TEFuserPass.FuserPass_UnknownShapes (0 ms) 2023-01-11T23:32:28.7782253Z [ RUN ] TEFuserPass.FuserPass_Multidevice 2023-01-11T23:32:28.7790718Z [ OK ] TEFuserPass.FuserPass_Multidevice (0 ms) 2023-01-11T23:32:28.7791249Z [ RUN ] TEFuserPass.FuserPass_MergeGroups 2023-01-11T23:32:28.7794568Z [ OK ] TEFuserPass.FuserPass_MergeGroups (0 ms) 2023-01-11T23:32:28.7795157Z [ RUN ] TEFuserPass.FuserPass_IgnoreUnknownShapeAtStart 2023-01-11T23:32:28.7795754Z [ OK ] TEFuserPass.FuserPass_IgnoreUnknownShapeAtStart (0 ms) 2023-01-11T23:32:28.7796289Z [ RUN ] TEFuserPass.FuserPass_Where 2023-01-11T23:32:28.7797997Z [ OK ] TEFuserPass.FuserPass_Where (0 ms) 2023-01-11T23:32:28.7798523Z [ RUN ] TEFuserPass.FuserPass_WhereList 2023-01-11T23:32:28.7799486Z [ OK ] TEFuserPass.FuserPass_WhereList (0 ms) 2023-01-11T23:32:28.7799979Z [ RUN ] TEFuserPass.DynamicShapeFusion 2023-01-11T23:32:29.0239213Z [ OK ] TEFuserPass.DynamicShapeFusion (243 ms) 2023-01-11T23:32:29.0239602Z [----------] 12 tests from TEFuserPass (247 ms total) 2023-01-11T23:32:29.0239782Z 2023-01-11T23:32:29.0239926Z [----------] 3 tests from Type 2023-01-11T23:32:29.0240210Z [ RUN ] Type.Test01 2023-01-11T23:32:29.0240481Z [ OK ] Type.Test01 (0 ms) 2023-01-11T23:32:29.0240749Z [ RUN ] Type.BitCasting 2023-01-11T23:32:29.0241025Z [ OK ] Type.BitCasting (0 ms) 2023-01-11T23:32:29.0241306Z [ RUN ] Type.Propagation 2023-01-11T23:32:29.0241599Z [ OK ] Type.Propagation (0 ms) 2023-01-11T23:32:29.0241911Z [----------] 3 tests from Type (0 ms total) 2023-01-11T23:32:29.0242064Z 2023-01-11T23:32:29.0242265Z [----------] 1 test from SpecializationsInCustomPasses 2023-01-11T23:32:29.0242634Z [ RUN ] SpecializationsInCustomPasses.Basic 2023-01-11T23:32:29.0257096Z [ OK ] SpecializationsInCustomPasses.Basic (1 ms) 2023-01-11T23:32:29.0257509Z [----------] 1 test from SpecializationsInCustomPasses (2 ms total) 2023-01-11T23:32:29.0257711Z 2023-01-11T23:32:29.0257854Z [----------] 26 tests from Cuda 2023-01-11T23:32:29.0258124Z [ RUN ] Cuda.Sigmoid_CUDA 2023-01-11T23:32:29.2705668Z [ OK ] Cuda.Sigmoid_CUDA (244 ms) 2023-01-11T23:32:29.2706301Z [ RUN ] Cuda.TestVectorAdd01_CUDA 2023-01-11T23:32:31.1580072Z [ OK ] Cuda.TestVectorAdd01_CUDA (1887 ms) 2023-01-11T23:32:31.1580735Z [ RUN ] Cuda.TestVectorAdd02_CUDA 2023-01-11T23:32:31.6259428Z [ OK ] Cuda.TestVectorAdd02_CUDA (467 ms) 2023-01-11T23:32:31.6259739Z [ RUN ] Cuda.HalfCast_CUDA 2023-01-11T23:32:31.8601078Z [ OK ] Cuda.HalfCast_CUDA (234 ms) 2023-01-11T23:32:31.8601485Z [ RUN ] Cuda.DynamicShape2D_CUDA 2023-01-11T23:32:32.6085869Z [ OK ] 
Cuda.DynamicShape2D_CUDA (748 ms) 2023-01-11T23:32:32.6086223Z [ RUN ] Cuda.TestRand01_CUDA 2023-01-11T23:32:32.8567663Z [ OK ] Cuda.TestRand01_CUDA (248 ms) 2023-01-11T23:32:32.8567996Z [ RUN ] Cuda.DynamicShapeSplit_CUDA 2023-01-11T23:32:33.0940256Z [ OK ] Cuda.DynamicShapeSplit_CUDA (236 ms) 2023-01-11T23:32:33.0940647Z [ RUN ] Cuda.OneBlockOneThreadGlobalReduce1_CUDA 2023-01-11T23:32:33.3361548Z [ OK ] Cuda.OneBlockOneThreadGlobalReduce1_CUDA (242 ms) 2023-01-11T23:32:33.3361983Z [ RUN ] Cuda.OneBlockMultiThreadGlobalReduce1_CUDA 2023-01-11T23:32:33.5695921Z [ OK ] Cuda.OneBlockMultiThreadGlobalReduce1_CUDA (233 ms) 2023-01-11T23:32:33.5696668Z [ RUN ] Cuda.NoThreadIdxWrite_1_CUDA 2023-01-11T23:32:33.8070326Z [ OK ] Cuda.NoThreadIdxWrite_1_CUDA (237 ms) 2023-01-11T23:32:33.8070722Z [ RUN ] Cuda.SharedMemReduce_1_CUDA 2023-01-11T23:32:34.0568643Z [ OK ] Cuda.SharedMemReduce_1_CUDA (249 ms) 2023-01-11T23:32:34.0569535Z [ RUN ] Cuda.LocalMemReduce_1_CUDA 2023-01-11T23:32:34.3070884Z [ OK ] Cuda.LocalMemReduce_1_CUDA (249 ms) 2023-01-11T23:32:34.3071803Z [ RUN ] Cuda.HalfSupport_CUDA 2023-01-11T23:32:34.5446407Z [ OK ] Cuda.HalfSupport_CUDA (237 ms) 2023-01-11T23:32:34.5446881Z [ RUN ] Cuda.HalfPropagation_CUDA 2023-01-11T23:32:34.7825093Z [ OK ] Cuda.HalfPropagation_CUDA (237 ms) 2023-01-11T23:32:34.7825560Z [ RUN ] Cuda.UnusedHalfArgument_CUDA 2023-01-11T23:32:35.0182582Z [ OK ] Cuda.UnusedHalfArgument_CUDA (235 ms) 2023-01-11T23:32:35.0183275Z [ RUN ] Cuda.PrioritizeDependents_CUDA 2023-01-11T23:32:35.2516881Z [ OK ] Cuda.PrioritizeDependents_CUDA (233 ms) 2023-01-11T23:32:35.2517559Z [ RUN ] Cuda.MaskBlockDim_CUDA 2023-01-11T23:32:35.4852878Z [ OK ] Cuda.MaskBlockDim_CUDA (233 ms) 2023-01-11T23:32:35.4853543Z [ RUN ] Cuda.MaskThreadDim_CUDA 2023-01-11T23:32:35.7197326Z [ OK ] Cuda.MaskThreadDim_CUDA (234 ms) 2023-01-11T23:32:35.7198051Z [ RUN ] Cuda.MaskMultiBlockDim_CUDA 2023-01-11T23:32:35.9537216Z [ OK ] Cuda.MaskMultiBlockDim_CUDA (234 ms) 2023-01-11T23:32:35.9538401Z [ RUN ] Cuda.MaskBlockAndThreadDim_CUDA 2023-01-11T23:32:36.1882440Z [ OK ] Cuda.MaskBlockAndThreadDim_CUDA (234 ms) 2023-01-11T23:32:36.1882843Z [ RUN ] Cuda.MaskMultiDim_CUDA 2023-01-11T23:32:36.4248977Z [ OK ] Cuda.MaskMultiDim_CUDA (236 ms) 2023-01-11T23:32:36.4249484Z [ RUN ] Cuda.MaskMultiDimSymbolic_CUDA 2023-01-11T23:32:36.6648696Z [ OK ] Cuda.MaskMultiDimSymbolic_CUDA (240 ms) 2023-01-11T23:32:36.6649065Z [ RUN ] Cuda.MaskCompoundInnerLoop_CUDA 2023-01-11T23:32:36.9017545Z [ OK ] Cuda.MaskCompoundInnerLoop_CUDA (236 ms) 2023-01-11T23:32:36.9018001Z [ RUN ] Cuda.MaskInnerLoopOneBlock_CUDA 2023-01-11T23:32:37.1497561Z [ OK ] Cuda.MaskInnerLoopOneBlock_CUDA (248 ms) 2023-01-11T23:32:37.1498285Z [ RUN ] Cuda.MaskMultiDimMultiAxis_CUDA 2023-01-11T23:32:37.3868214Z [ OK ] Cuda.MaskMultiDimMultiAxis_CUDA (237 ms) 2023-01-11T23:32:37.3868576Z [ RUN ] Cuda.MaskMultiDimMultiLevel_CUDA 2023-01-11T23:32:37.6233448Z [ OK ] Cuda.MaskMultiDimMultiLevel_CUDA (236 ms) 2023-01-11T23:32:37.6234286Z [----------] 26 tests from Cuda (8597 ms total) 2023-01-11T23:32:37.6234620Z 2023-01-11T23:32:37.6234908Z [----------] 150 tests from LLVM 2023-01-11T23:32:37.6235436Z [ RUN ] LLVM.ByteImmTest 2023-01-11T23:32:37.6420277Z [ OK ] LLVM.ByteImmTest (18 ms) 2023-01-11T23:32:37.6420841Z [ RUN ] LLVM.CharImmTest 2023-01-11T23:32:37.6600436Z [ OK ] LLVM.CharImmTest (18 ms) 2023-01-11T23:32:37.6601040Z [ RUN ] LLVM.ShortImmTest 2023-01-11T23:32:37.6781035Z [ OK ] LLVM.ShortImmTest (18 ms) 2023-01-11T23:32:37.6781358Z [ RUN ] LLVM.IntImmTest 
2023-01-11T23:32:37.6960605Z [ OK ] LLVM.IntImmTest (17 ms) 2023-01-11T23:32:37.6960905Z [ RUN ] LLVM.LongImmTest 2023-01-11T23:32:37.7142840Z [ OK ] LLVM.LongImmTest (18 ms) 2023-01-11T23:32:37.7143563Z [ RUN ] LLVM.FloatImmTest 2023-01-11T23:32:37.7321662Z [ OK ] LLVM.FloatImmTest (18 ms) 2023-01-11T23:32:37.7321979Z [ RUN ] LLVM.DoubleImmTest 2023-01-11T23:32:37.7504606Z [ OK ] LLVM.DoubleImmTest (18 ms) 2023-01-11T23:32:37.7505202Z [ RUN ] LLVM.HalfImmTest 2023-01-11T23:32:37.7683969Z [ OK ] LLVM.HalfImmTest (18 ms) 2023-01-11T23:32:37.7684286Z [ RUN ] LLVM.ByteAddTest 2023-01-11T23:32:37.7867883Z [ OK ] LLVM.ByteAddTest (18 ms) 2023-01-11T23:32:37.7868282Z [ RUN ] LLVM.CharAddTest 2023-01-11T23:32:37.8047526Z [ OK ] LLVM.CharAddTest (18 ms) 2023-01-11T23:32:37.8047835Z [ RUN ] LLVM.ShortAddTest 2023-01-11T23:32:37.8230191Z [ OK ] LLVM.ShortAddTest (17 ms) 2023-01-11T23:32:37.8230705Z [ RUN ] LLVM.IntAddTest 2023-01-11T23:32:37.8408246Z [ OK ] LLVM.IntAddTest (17 ms) 2023-01-11T23:32:37.8408565Z [ RUN ] LLVM.LongAddTest 2023-01-11T23:32:37.8593047Z [ OK ] LLVM.LongAddTest (18 ms) 2023-01-11T23:32:37.8593642Z [ RUN ] LLVM.FloatAddTest 2023-01-11T23:32:37.8770045Z [ OK ] LLVM.FloatAddTest (18 ms) 2023-01-11T23:32:37.8770351Z [ RUN ] LLVM.DoubleAddTest 2023-01-11T23:32:37.8955222Z [ OK ] LLVM.DoubleAddTest (18 ms) 2023-01-11T23:32:37.8955822Z [ RUN ] LLVM.HalfAddTest 2023-01-11T23:32:37.9133149Z [ OK ] LLVM.HalfAddTest (18 ms) 2023-01-11T23:32:37.9133763Z [ RUN ] LLVM.ByteSubTest 2023-01-11T23:32:37.9316368Z [ OK ] LLVM.ByteSubTest (18 ms) 2023-01-11T23:32:37.9316956Z [ RUN ] LLVM.CharSubTest 2023-01-11T23:32:37.9492966Z [ OK ] LLVM.CharSubTest (17 ms) 2023-01-11T23:32:37.9493421Z [ RUN ] LLVM.ShortSubTest 2023-01-11T23:32:37.9677141Z [ OK ] LLVM.ShortSubTest (18 ms) 2023-01-11T23:32:37.9677753Z [ RUN ] LLVM.IntSubTest 2023-01-11T23:32:37.9855483Z [ OK ] LLVM.IntSubTest (17 ms) 2023-01-11T23:32:37.9856070Z [ RUN ] LLVM.LongSubTest 2023-01-11T23:32:38.0037289Z [ OK ] LLVM.LongSubTest (18 ms) 2023-01-11T23:32:38.0037882Z [ RUN ] LLVM.FloatSubTest 2023-01-11T23:32:38.0217729Z [ OK ] LLVM.FloatSubTest (18 ms) 2023-01-11T23:32:38.0218373Z [ RUN ] LLVM.DoubleSubTest 2023-01-11T23:32:38.0399469Z [ OK ] LLVM.DoubleSubTest (18 ms) 2023-01-11T23:32:38.0400067Z [ RUN ] LLVM.HalfSubTest 2023-01-11T23:32:38.0578796Z [ OK ] LLVM.HalfSubTest (18 ms) 2023-01-11T23:32:38.0579377Z [ RUN ] LLVM.ByteMulTest 2023-01-11T23:32:38.0759748Z [ OK ] LLVM.ByteMulTest (17 ms) 2023-01-11T23:32:38.0760337Z [ RUN ] LLVM.CharMulTest 2023-01-11T23:32:38.0940292Z [ OK ] LLVM.CharMulTest (17 ms) 2023-01-11T23:32:38.0940882Z [ RUN ] LLVM.ShortMulTest 2023-01-11T23:32:38.1120098Z [ OK ] LLVM.ShortMulTest (18 ms) 2023-01-11T23:32:38.1120682Z [ RUN ] LLVM.IntMulTest 2023-01-11T23:32:38.1302225Z [ OK ] LLVM.IntMulTest (18 ms) 2023-01-11T23:32:38.1302827Z [ RUN ] LLVM.LongMulTest 2023-01-11T23:32:38.1482633Z [ OK ] LLVM.LongMulTest (18 ms) 2023-01-11T23:32:38.1483211Z [ RUN ] LLVM.FloatMulTest 2023-01-11T23:32:38.1664811Z [ OK ] LLVM.FloatMulTest (18 ms) 2023-01-11T23:32:38.1665632Z [ RUN ] LLVM.DoubleMulTest 2023-01-11T23:32:38.1843309Z [ OK ] LLVM.DoubleMulTest (18 ms) 2023-01-11T23:32:38.1843702Z [ RUN ] LLVM.HalfMulTest 2023-01-11T23:32:38.2026170Z [ OK ] LLVM.HalfMulTest (18 ms) 2023-01-11T23:32:38.2026823Z [ RUN ] LLVM.ByteDivTest 2023-01-11T23:32:38.2204997Z [ OK ] LLVM.ByteDivTest (18 ms) 2023-01-11T23:32:38.2205372Z [ RUN ] LLVM.CharDivTest 2023-01-11T23:32:38.2387153Z [ OK ] LLVM.CharDivTest (17 ms) 
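The *ImmTest/*AddTest/*SubTest/*MulTest/*DivTest cases in this block sweep one scalar op at a time across the full dtype ladder -- Byte/Char/Short/Int/Long/Half/Float/Double, which correspond to uint8/int8/int16/int64/float16/float32/float64 and friends in PyTorch. A rough eager-mode analogue of such a sweep, using public torch ops rather than the LLVM codegen actually under test (tensor values are arbitrary; a sketch, not the test's real harness):

import torch

# Dtype ladder matching the Byte/Char/Short/Int/Long/Half/Float/Double
# suffixes in the gtest names above.
dtypes = [torch.uint8, torch.int8, torch.int16, torch.int32,
          torch.int64, torch.float16, torch.float32, torch.float64]

for dt in dtypes:
    a = torch.tensor([2], dtype=dt)
    b = torch.tensor([3], dtype=dt)
    # Elementwise add/mul stay in-dtype and produce the expected values,
    # mirroring what the *AddTest / *MulTest cases assert for the kernels.
    assert (a + b).dtype == dt and (a + b).item() == 5
    assert (a * b).dtype == dt and (a * b).item() == 6
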
2023-01-11T23:32:38.2388530Z [ RUN ] LLVM.ShortDivTest 2023-01-11T23:32:38.2564899Z [ OK ] LLVM.ShortDivTest (17 ms) 2023-01-11T23:32:38.2565200Z [ RUN ] LLVM.IntDivTest 2023-01-11T23:32:38.2749046Z [ OK ] LLVM.IntDivTest (18 ms) 2023-01-11T23:32:38.2749612Z [ RUN ] LLVM.LongDivTest 2023-01-11T23:32:38.2927059Z [ OK ] LLVM.LongDivTest (18 ms) 2023-01-11T23:32:38.2927366Z [ RUN ] LLVM.FloatDivTest 2023-01-11T23:32:38.3110371Z [ OK ] LLVM.FloatDivTest (18 ms) 2023-01-11T23:32:38.3110999Z [ RUN ] LLVM.DoubleDivTest 2023-01-11T23:32:38.3289308Z [ OK ] LLVM.DoubleDivTest (18 ms) 2023-01-11T23:32:38.3289618Z [ RUN ] LLVM.HalfDivTest 2023-01-11T23:32:38.3472834Z [ OK ] LLVM.HalfDivTest (18 ms) 2023-01-11T23:32:38.3473157Z [ RUN ] LLVM.IntToFloatCastTest 2023-01-11T23:32:38.3651721Z [ OK ] LLVM.IntToFloatCastTest (18 ms) 2023-01-11T23:32:38.3652544Z [ RUN ] LLVM.FloatToIntCastTest 2023-01-11T23:32:38.3835029Z [ OK ] LLVM.FloatToIntCastTest (18 ms) 2023-01-11T23:32:38.3835630Z [ RUN ] LLVM.IntToLongCastTest 2023-01-11T23:32:38.4012529Z [ OK ] LLVM.IntToLongCastTest (17 ms) 2023-01-11T23:32:38.4013122Z [ RUN ] LLVM.ByteToCharCastTest 2023-01-11T23:32:38.4193972Z [ OK ] LLVM.ByteToCharCastTest (18 ms) 2023-01-11T23:32:38.4194309Z [ RUN ] LLVM.HalfToLongCastTest 2023-01-11T23:32:38.4376440Z [ OK ] LLVM.HalfToLongCastTest (18 ms) 2023-01-11T23:32:38.4377089Z [ RUN ] LLVM.ByteToDoubleCastTest 2023-01-11T23:32:38.4560416Z [ OK ] LLVM.ByteToDoubleCastTest (18 ms) 2023-01-11T23:32:38.4561030Z [ RUN ] LLVM.FloatToByteCastTest 2023-01-11T23:32:38.4738292Z [ OK ] LLVM.FloatToByteCastTest (18 ms) 2023-01-11T23:32:38.4738980Z [ RUN ] LLVM.FloatToCharCastTest 2023-01-11T23:32:38.4920080Z [ OK ] LLVM.FloatToCharCastTest (17 ms) 2023-01-11T23:32:38.4920692Z [ RUN ] LLVM.ByteToFloatCastTest 2023-01-11T23:32:38.5100640Z [ OK ] LLVM.ByteToFloatCastTest (18 ms) 2023-01-11T23:32:38.5101173Z [ RUN ] LLVM.CharToFloatCastTest 2023-01-11T23:32:38.5278655Z [ OK ] LLVM.CharToFloatCastTest (17 ms) 2023-01-11T23:32:38.5278964Z [ RUN ] LLVM.BitCast 2023-01-11T23:32:38.6001665Z [ OK ] LLVM.BitCast (72 ms) 2023-01-11T23:32:38.6001963Z [ RUN ] LLVM.fastLogFloat 2023-01-11T23:32:38.6710144Z [ OK ] LLVM.fastLogFloat (70 ms) 2023-01-11T23:32:38.6710780Z [ RUN ] LLVM.LetTest01 2023-01-11T23:32:38.6890416Z [ OK ] LLVM.LetTest01 (18 ms) 2023-01-11T23:32:38.6890708Z [ RUN ] LLVM.LetTest02 2023-01-11T23:32:38.7074701Z [ OK ] LLVM.LetTest02 (18 ms) 2023-01-11T23:32:38.7075258Z [ RUN ] LLVM.LetTestMultitype 2023-01-11T23:32:38.7253539Z [ OK ] LLVM.LetTestMultitype (18 ms) 2023-01-11T23:32:38.7254864Z [ RUN ] LLVM.BufferTest 2023-01-11T23:32:38.7438798Z [ OK ] LLVM.BufferTest (18 ms) 2023-01-11T23:32:38.7439354Z [ RUN ] LLVM.BlockTest 2023-01-11T23:32:38.7622227Z [ OK ] LLVM.BlockTest (18 ms) 2023-01-11T23:32:38.7622537Z [ RUN ] LLVM.LoadStoreTest 2023-01-11T23:32:38.7805407Z [ OK ] LLVM.LoadStoreTest (18 ms) 2023-01-11T23:32:38.7805722Z [ RUN ] LLVM.IfThenElseTest 2023-01-11T23:32:38.8010250Z [ OK ] LLVM.IfThenElseTest (20 ms) 2023-01-11T23:32:38.8011316Z [ RUN ] LLVM.CondNoFalseBlockTest 2023-01-11T23:32:38.8594140Z [ OK ] LLVM.CondNoFalseBlockTest (58 ms) 2023-01-11T23:32:38.8594747Z [ RUN ] LLVM.CondTest 2023-01-11T23:32:38.9183775Z [ OK ] LLVM.CondTest (59 ms) 2023-01-11T23:32:38.9184101Z [ RUN ] LLVM.CondNestedTest 2023-01-11T23:32:39.0026531Z [ OK ] LLVM.CondNestedTest (84 ms) 2023-01-11T23:32:39.0027282Z [ RUN ] LLVM.DirectVectorization 2023-01-11T23:32:39.0464591Z [ OK ] LLVM.DirectVectorization (43 ms) 2023-01-11T23:32:39.0465239Z 
[ RUN ] LLVM.VecLoadStoreTest 2023-01-11T23:32:39.0646610Z [ OK ] LLVM.VecLoadStoreTest (18 ms) 2023-01-11T23:32:39.0646932Z [ RUN ] LLVM.VecFloat_erfLane4Test 2023-01-11T23:32:39.0840851Z [ OK ] LLVM.VecFloat_erfLane4Test (19 ms) 2023-01-11T23:32:39.0841505Z [ RUN ] LLVM.VecFloat_erfcLane4Test 2023-01-11T23:32:39.1031028Z [ OK ] LLVM.VecFloat_erfcLane4Test (18 ms) 2023-01-11T23:32:39.1031685Z [ RUN ] LLVM.VecFloat_acosLane4Test 2023-01-11T23:32:39.1220695Z [ OK ] LLVM.VecFloat_acosLane4Test (19 ms) 2023-01-11T23:32:39.1221667Z [ RUN ] LLVM.VecFloat_asinLane4Test 2023-01-11T23:32:39.1409720Z [ OK ] LLVM.VecFloat_asinLane4Test (19 ms) 2023-01-11T23:32:39.1410050Z [ RUN ] LLVM.VecFloat_atanLane4Test 2023-01-11T23:32:39.1602638Z [ OK ] LLVM.VecFloat_atanLane4Test (19 ms) 2023-01-11T23:32:39.1603180Z [ RUN ] LLVM.VecFloat_coshLane4Test 2023-01-11T23:32:39.1794751Z [ OK ] LLVM.VecFloat_coshLane4Test (19 ms) 2023-01-11T23:32:39.1795873Z [ RUN ] LLVM.VecFloat_sinhLane4Test 2023-01-11T23:32:39.1987011Z [ OK ] LLVM.VecFloat_sinhLane4Test (19 ms) 2023-01-11T23:32:39.1987663Z [ RUN ] LLVM.VecFloat_tanhLane4Test 2023-01-11T23:32:39.2177452Z [ OK ] LLVM.VecFloat_tanhLane4Test (19 ms) 2023-01-11T23:32:39.2177811Z [ RUN ] LLVM.VecFloat_expm1Lane4Test 2023-01-11T23:32:39.2366943Z [ OK ] LLVM.VecFloat_expm1Lane4Test (19 ms) 2023-01-11T23:32:39.2367287Z [ RUN ] LLVM.VecFloat_lgammaLane4Test 2023-01-11T23:32:39.2560652Z [ OK ] LLVM.VecFloat_lgammaLane4Test (19 ms) 2023-01-11T23:32:39.2561331Z [ RUN ] LLVM.VecFloat_erfLane8Test 2023-01-11T23:32:39.2754328Z [ OK ] LLVM.VecFloat_erfLane8Test (19 ms) 2023-01-11T23:32:39.2755497Z [ RUN ] LLVM.VecFloat_erfcLane8Test 2023-01-11T23:32:39.2946046Z [ OK ] LLVM.VecFloat_erfcLane8Test (19 ms) 2023-01-11T23:32:39.2946678Z [ RUN ] LLVM.VecFloat_acosLane8Test 2023-01-11T23:32:39.3135803Z [ OK ] LLVM.VecFloat_acosLane8Test (19 ms) 2023-01-11T23:32:39.3136838Z [ RUN ] LLVM.VecFloat_asinLane8Test 2023-01-11T23:32:39.3325069Z [ OK ] LLVM.VecFloat_asinLane8Test (19 ms) 2023-01-11T23:32:39.3325507Z [ RUN ] LLVM.VecFloat_atanLane8Test 2023-01-11T23:32:39.3518807Z [ OK ] LLVM.VecFloat_atanLane8Test (19 ms) 2023-01-11T23:32:39.3519481Z [ RUN ] LLVM.VecFloat_coshLane8Test 2023-01-11T23:32:39.3709340Z [ OK ] LLVM.VecFloat_coshLane8Test (18 ms) 2023-01-11T23:32:39.3710129Z [ RUN ] LLVM.VecFloat_sinhLane8Test 2023-01-11T23:32:39.3900754Z [ OK ] LLVM.VecFloat_sinhLane8Test (19 ms) 2023-01-11T23:32:39.3901401Z [ RUN ] LLVM.VecFloat_tanhLane8Test 2023-01-11T23:32:39.4090120Z [ OK ] LLVM.VecFloat_tanhLane8Test (19 ms) 2023-01-11T23:32:39.4090519Z [ RUN ] LLVM.VecFloat_expm1Lane8Test 2023-01-11T23:32:39.4282669Z [ OK ] LLVM.VecFloat_expm1Lane8Test (19 ms) 2023-01-11T23:32:39.4283035Z [ RUN ] LLVM.VecFloat_lgammaLane8Test 2023-01-11T23:32:39.4474567Z [ OK ] LLVM.VecFloat_lgammaLane8Test (19 ms) 2023-01-11T23:32:39.4475221Z [ RUN ] LLVM.VecDouble_erfLane2Test 2023-01-11T23:32:39.4665878Z [ OK ] LLVM.VecDouble_erfLane2Test (19 ms) 2023-01-11T23:32:39.4666524Z [ RUN ] LLVM.VecDouble_erfcLane2Test 2023-01-11T23:32:39.4855118Z [ OK ] LLVM.VecDouble_erfcLane2Test (19 ms) 2023-01-11T23:32:39.4855778Z [ RUN ] LLVM.VecDouble_acosLane2Test 2023-01-11T23:32:39.5044714Z [ OK ] LLVM.VecDouble_acosLane2Test (19 ms) 2023-01-11T23:32:39.5045050Z [ RUN ] LLVM.VecDouble_asinLane2Test 2023-01-11T23:32:39.5239218Z [ OK ] LLVM.VecDouble_asinLane2Test (19 ms) 2023-01-11T23:32:39.5239863Z [ RUN ] LLVM.VecDouble_atanLane2Test 2023-01-11T23:32:39.5428758Z [ OK ] LLVM.VecDouble_atanLane2Test (19 ms) 
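The VecFloat_*LaneN / VecDouble_*LaneN cases above presumably check that the vectorized codegen of each transcendental (erf, erfc, acos, asin, atan, cosh, sinh, tanh, expm1, lgamma) agrees lane-by-lane with the scalar result. The same comparison can be sketched against Python's math module (input values and tolerance are ours, not the test's):

import math
import torch

# Four of the intrinsics from the lane tests, compared lane-by-lane
# against the scalar math-module reference.
x = torch.tensor([0.1, 0.5, 0.9, 1.3])  # one 4-lane float32 vector
for torch_fn, ref_fn in [(torch.erf, math.erf),
                         (torch.tanh, math.tanh),
                         (torch.expm1, math.expm1),
                         (torch.lgamma, math.lgamma)]:
    out = torch_fn(x)
    for lane in range(4):
        assert abs(out[lane].item() - ref_fn(x[lane].item())) < 1e-5
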
2023-01-11T23:32:39.5429411Z [ RUN ] LLVM.VecDouble_coshLane2Test 2023-01-11T23:32:39.5620725Z [ OK ] LLVM.VecDouble_coshLane2Test (19 ms) 2023-01-11T23:32:39.5621374Z [ RUN ] LLVM.VecDouble_sinhLane2Test 2023-01-11T23:32:39.5809499Z [ OK ] LLVM.VecDouble_sinhLane2Test (19 ms) 2023-01-11T23:32:39.5809836Z [ RUN ] LLVM.VecDouble_tanhLane2Test 2023-01-11T23:32:39.6002486Z [ OK ] LLVM.VecDouble_tanhLane2Test (19 ms) 2023-01-11T23:32:39.6002823Z [ RUN ] LLVM.VecDouble_expm1Lane2Test 2023-01-11T23:32:39.6194453Z [ OK ] LLVM.VecDouble_expm1Lane2Test (19 ms) 2023-01-11T23:32:39.6195118Z [ RUN ] LLVM.VecDouble_lgammaLane2Test 2023-01-11T23:32:39.6384987Z [ OK ] LLVM.VecDouble_lgammaLane2Test (19 ms) 2023-01-11T23:32:39.6385671Z [ RUN ] LLVM.VecDouble_erfLane4Test 2023-01-11T23:32:39.6576177Z [ OK ] LLVM.VecDouble_erfLane4Test (19 ms) 2023-01-11T23:32:39.6576821Z [ RUN ] LLVM.VecDouble_erfcLane4Test 2023-01-11T23:32:39.6765426Z [ OK ] LLVM.VecDouble_erfcLane4Test (19 ms) 2023-01-11T23:32:39.6766015Z [ RUN ] LLVM.VecDouble_acosLane4Test 2023-01-11T23:32:39.6960292Z [ OK ] LLVM.VecDouble_acosLane4Test (19 ms) 2023-01-11T23:32:39.6960950Z [ RUN ] LLVM.VecDouble_asinLane4Test 2023-01-11T23:32:39.7151352Z [ OK ] LLVM.VecDouble_asinLane4Test (19 ms) 2023-01-11T23:32:39.7152706Z [ RUN ] LLVM.VecDouble_atanLane4Test 2023-01-11T23:32:39.7340775Z [ OK ] LLVM.VecDouble_atanLane4Test (19 ms) 2023-01-11T23:32:39.7341429Z [ RUN ] LLVM.VecDouble_coshLane4Test 2023-01-11T23:32:39.7530118Z [ OK ] LLVM.VecDouble_coshLane4Test (19 ms) 2023-01-11T23:32:39.7530456Z [ RUN ] LLVM.VecDouble_sinhLane4Test 2023-01-11T23:32:39.7723020Z [ OK ] LLVM.VecDouble_sinhLane4Test (19 ms) 2023-01-11T23:32:39.7723488Z [ RUN ] LLVM.VecDouble_tanhLane4Test 2023-01-11T23:32:39.7916115Z [ OK ] LLVM.VecDouble_tanhLane4Test (19 ms) 2023-01-11T23:32:39.7916798Z [ RUN ] LLVM.VecDouble_expm1Lane4Test 2023-01-11T23:32:39.8110067Z [ OK ] LLVM.VecDouble_expm1Lane4Test (19 ms) 2023-01-11T23:32:39.8110437Z [ RUN ] LLVM.VecDouble_lgammaLane4Test 2023-01-11T23:32:39.8299117Z [ OK ] LLVM.VecDouble_lgammaLane4Test (19 ms) 2023-01-11T23:32:39.8299805Z [ RUN ] LLVM.VectorizerLoadStoreTest 2023-01-11T23:32:39.8485602Z [ OK ] LLVM.VectorizerLoadStoreTest (18 ms) 2023-01-11T23:32:39.8485937Z [ RUN ] LLVM.VectorizeBitCast 2023-01-11T23:32:39.8724402Z [ OK ] LLVM.VectorizeBitCast (23 ms) 2023-01-11T23:32:39.8725009Z [ RUN ] LLVM.MemcpyTest 2023-01-11T23:32:39.8931526Z [ OK ] LLVM.MemcpyTest (20 ms) 2023-01-11T23:32:39.8932470Z [ RUN ] LLVM.BzeroTest 2023-01-11T23:32:39.9130366Z [ OK ] LLVM.BzeroTest (19 ms) 2023-01-11T23:32:39.9130666Z [ RUN ] LLVM.ElemwiseAdd 2023-01-11T23:32:39.9431209Z [ OK ] LLVM.ElemwiseAdd (29 ms) 2023-01-11T23:32:39.9432394Z [ RUN ] LLVM.ElemwiseAddFloat 2023-01-11T23:32:39.9723687Z [ OK ] LLVM.ElemwiseAddFloat (29 ms) 2023-01-11T23:32:39.9724019Z [ RUN ] LLVM.ElemwiseLog10Float 2023-01-11T23:32:39.9961571Z [ OK ] LLVM.ElemwiseLog10Float (23 ms) 2023-01-11T23:32:39.9962066Z [ RUN ] LLVM.ElemwiseLog1pFloat 2023-01-11T23:32:40.0200107Z [ OK ] LLVM.ElemwiseLog1pFloat (23 ms) 2023-01-11T23:32:40.0200719Z [ RUN ] LLVM.ElemwiseMaxInt 2023-01-11T23:32:40.0494144Z [ OK ] LLVM.ElemwiseMaxInt (29 ms) 2023-01-11T23:32:40.0495093Z [ RUN ] LLVM.ElemwiseMinInt 2023-01-11T23:32:40.0793024Z [ OK ] LLVM.ElemwiseMinInt (29 ms) 2023-01-11T23:32:40.0794303Z [ RUN ] LLVM.ElemwiseMaxFloat 2023-01-11T23:32:40.1113685Z [ OK ] LLVM.ElemwiseMaxFloat (31 ms) 2023-01-11T23:32:40.1114368Z [ RUN ] LLVM.ElemwiseMaxNaNFloat 2023-01-11T23:32:40.1432662Z [ OK ] 
LLVM.ElemwiseMaxNaNFloat (31 ms) 2023-01-11T23:32:40.1433311Z [ RUN ] LLVM.ElemwiseMinFloat 2023-01-11T23:32:40.1753274Z [ OK ] LLVM.ElemwiseMinFloat (32 ms) 2023-01-11T23:32:40.1754307Z [ RUN ] LLVM.ElemwiseMinNaNFloat 2023-01-11T23:32:40.2073271Z [ OK ] LLVM.ElemwiseMinNaNFloat (31 ms) 2023-01-11T23:32:40.2073898Z [ RUN ] LLVM.ElemwiseMod 2023-01-11T23:32:40.2314069Z [ OK ] LLVM.ElemwiseMod (24 ms) 2023-01-11T23:32:40.2314726Z [ RUN ] LLVM.CompareSelectIntEQ 2023-01-11T23:32:40.2618513Z [ OK ] LLVM.CompareSelectIntEQ (30 ms) 2023-01-11T23:32:40.2619176Z [ RUN ] LLVM.CompareSelectFloatEQ 2023-01-11T23:32:40.2923767Z [ OK ] LLVM.CompareSelectFloatEQ (30 ms) 2023-01-11T23:32:40.2924110Z [ RUN ] LLVM.CompareSelectByteGT 2023-01-11T23:32:40.3249677Z [ OK ] LLVM.CompareSelectByteGT (32 ms) 2023-01-11T23:32:40.3250021Z [ RUN ] LLVM.CompareSelectByteGE 2023-01-11T23:32:40.3582626Z [ OK ] LLVM.CompareSelectByteGE (32 ms) 2023-01-11T23:32:40.3583285Z [ RUN ] LLVM.CompareSelectByteLT 2023-01-11T23:32:40.3910333Z [ OK ] LLVM.CompareSelectByteLT (32 ms) 2023-01-11T23:32:40.3911048Z [ RUN ] LLVM.CompareSelectByteLE 2023-01-11T23:32:40.4240362Z [ OK ] LLVM.CompareSelectByteLE (33 ms) 2023-01-11T23:32:40.4240975Z [ RUN ] LLVM.StoreFloat 2023-01-11T23:32:40.4421941Z [ OK ] LLVM.StoreFloat (18 ms) 2023-01-11T23:32:40.4422543Z [ RUN ] LLVM.SimpleMath01 2023-01-11T23:32:40.4740334Z [ OK ] LLVM.SimpleMath01 (31 ms) 2023-01-11T23:32:40.4740917Z [ RUN ] LLVM.ComputeMul 2023-01-11T23:32:40.5035733Z [ OK ] LLVM.ComputeMul (29 ms) 2023-01-11T23:32:40.5036328Z [ RUN ] LLVM.BroadcastAdd 2023-01-11T23:32:40.5395137Z [ OK ] LLVM.BroadcastAdd (35 ms) 2023-01-11T23:32:40.5395757Z [ RUN ] LLVM.BitwiseOps 2023-01-11T23:32:40.5575069Z [ OK ] LLVM.BitwiseOps (18 ms) 2023-01-11T23:32:40.5575748Z [ RUN ] LLVM.ArithmeticRightShift 2023-01-11T23:32:40.5757251Z [ OK ] LLVM.ArithmeticRightShift (18 ms) 2023-01-11T23:32:40.5757928Z [ RUN ] LLVM.LogicalRightShift 2023-01-11T23:32:40.5936216Z [ OK ] LLVM.LogicalRightShift (17 ms) 2023-01-11T23:32:40.5936856Z [ RUN ] LLVM.DynamicShapeAdd 2023-01-11T23:32:40.7020371Z [ OK ] LLVM.DynamicShapeAdd (108 ms) 2023-01-11T23:32:40.7021097Z [ RUN ] LLVM.BindDynamicShapeAdd 2023-01-11T23:32:40.8105046Z [ OK ] LLVM.BindDynamicShapeAdd (108 ms) 2023-01-11T23:32:40.8105510Z [ RUN ] LLVM.TensorDynamicShapeAdd 2023-01-11T23:32:40.9197032Z [ OK ] LLVM.TensorDynamicShapeAdd (108 ms) 2023-01-11T23:32:40.9198674Z [ RUN ] LLVM.DynamicShape2D 2023-01-11T23:32:41.0707106Z [ OK ] LLVM.DynamicShape2D (151 ms) 2023-01-11T23:32:41.0707642Z [ RUN ] LLVM.EmptyStmt 2023-01-11T23:32:41.0877456Z [ OK ] LLVM.EmptyStmt (17 ms) 2023-01-11T23:32:41.0878531Z [ RUN ] LLVM.EliminatedStmt 2023-01-11T23:32:41.1050566Z [ OK ] LLVM.EliminatedStmt (17 ms) 2023-01-11T23:32:41.1050895Z [ RUN ] LLVM.SimpleReduction 2023-01-11T23:32:41.2095438Z [ OK ] LLVM.SimpleReduction (104 ms) 2023-01-11T23:32:41.2096085Z [ RUN ] LLVM.RFactorReduction 2023-01-11T23:32:41.2388531Z [ OK ] LLVM.RFactorReduction (29 ms) 2023-01-11T23:32:41.2389171Z [ RUN ] LLVM.RFactorVectorizedReduction 2023-01-11T23:32:41.2916967Z [ OK ] LLVM.RFactorVectorizedReduction (52 ms) 2023-01-11T23:32:41.2917658Z [ RUN ] LLVM.SimpleParallelSS 2023-01-11T23:32:41.3168306Z [ OK ] LLVM.SimpleParallelSS (25 ms) 2023-01-11T23:32:41.3168630Z [ RUN ] LLVM.SimpleParallelSP 2023-01-11T23:32:41.3445023Z [ OK ] LLVM.SimpleParallelSP (27 ms) 2023-01-11T23:32:41.3445517Z [ RUN ] LLVM.SimpleParallelPS 2023-01-11T23:32:41.3738898Z [ OK ] LLVM.SimpleParallelPS (29 ms) 
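ElemwiseMaxNaNFloat and ElemwiseMinNaNFloat near the top of this block presumably pin down NaN propagation in the generated max/min kernels. Eager PyTorch documents the same behavior for torch.maximum/torch.minimum: if either compared element is NaN, NaN is returned. A quick illustration (values are ours):

import torch

# torch.maximum/torch.minimum return NaN whenever either compared
# element is NaN -- the behavior the NaN test variants presumably
# lock in for the generated kernels.
a = torch.tensor([1.0, float("nan")])
b = torch.tensor([float("nan"), 2.0])
assert torch.maximum(a, b).isnan().all()
assert torch.minimum(a, b).isnan().all()
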
2023-01-11T23:32:41.3739589Z [ RUN ] LLVM.SimpleParallelPP 2023-01-11T23:32:41.4017224Z [ OK ] LLVM.SimpleParallelPP (28 ms) 2023-01-11T23:32:41.4017586Z [ RUN ] LLVM.CompositeParallel 2023-01-11T23:32:44.9376231Z [ OK ] LLVM.CompositeParallel (3535 ms) 2023-01-11T23:32:44.9376616Z [ RUN ] LLVM.VectorizedGEMM 2023-01-11T23:32:45.0583698Z [ OK ] LLVM.VectorizedGEMM (120 ms) 2023-01-11T23:32:45.0583996Z [ RUN ] LLVM.CallRaw 2023-01-11T23:32:45.7203922Z [ OK ] LLVM.CallRaw (662 ms) 2023-01-11T23:32:45.7204222Z [ RUN ] LLVM.CustomTarget 2023-01-11T23:32:45.7465450Z [ OK ] LLVM.CustomTarget (26 ms) 2023-01-11T23:32:45.7465769Z [ RUN ] LLVM.CodeGenKernelFuncName 2023-01-11T23:32:45.7837552Z [ OK ] LLVM.CodeGenKernelFuncName (36 ms) 2023-01-11T23:32:45.7838230Z [----------] 150 tests from LLVM (8160 ms total) 2023-01-11T23:32:45.7838515Z 2023-01-11T23:32:45.7838820Z [----------] Global test environment tear-down 2023-01-11T23:32:45.7937314Z [==========] 829 tests from 26 test suites ran. (36068 ms total) 2023-01-11T23:32:45.7941609Z [ PASSED ] 829 tests. 2023-01-11T23:32:45.7941777Z 2023-01-11T23:32:45.7942503Z  YOU HAVE 5 DISABLED TESTS 2023-01-11T23:32:45.7942825Z 2023-01-11T23:32:45.9664381Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *android* ]] 2023-01-11T23:32:45.9664822Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *cuda* ]] 2023-01-11T23:32:45.9665072Z + assert_git_not_dirty 2023-01-11T23:32:45.9665375Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:45.9665725Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:45.9667198Z ++ git status --porcelain 2023-01-11T23:32:46.0390733Z + git_status= 2023-01-11T23:32:46.0391602Z + [[ -n '' ]] 2023-01-11T23:32:46.0392025Z + test_aot_compilation 2023-01-11T23:32:46.0392310Z + echo 'Testing Ahead of Time compilation' 2023-01-11T23:32:46.0392565Z Testing Ahead of Time compilation 2023-01-11T23:32:46.0393417Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libc10.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libc10d_cuda_test.so /opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.0404870Z + ln -sf /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_cuda_linalg.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so /opt/conda/lib/python3.10/site-packages/torch/lib/libtorchbind_test.so /opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.0414640Z + TEST_REPORTS_DIR=test/test-reports/cpp-unittest/test_aot_compilation 2023-01-11T23:32:46.0415126Z + mkdir -p test/test-reports/cpp-unittest/test_aot_compilation 2023-01-11T23:32:46.0423760Z + '[' -f /opt/conda/lib/python3.10/site-packages/torch/bin/test_mobile_nnc ']' 2023-01-11T23:32:46.0424293Z + /opt/conda/lib/python3.10/site-packages/torch/bin/test_mobile_nnc --gtest_output=xml:test/test-reports/cpp-unittest/test_aot_compilation/test_mobile_nnc.xml 2023-01-11T23:32:46.4149334Z Note: Google Test filter = *-*_CUDA:*_MultiCUDA 2023-01-11T23:32:46.4150339Z [==========] Running 6 tests from 2 test suites. 2023-01-11T23:32:46.4151189Z [----------] Global test environment set-up. 
2023-01-11T23:32:46.4151908Z [----------] 4 tests from Function 2023-01-11T23:32:46.4152481Z [ RUN ] Function.ExecuteSlowMul 2023-01-11T23:32:46.4159060Z [ OK ] Function.ExecuteSlowMul (1 ms) 2023-01-11T23:32:46.4160154Z [ RUN ] Function.Serialization 2023-01-11T23:32:46.4160502Z [ OK ] Function.Serialization (0 ms) 2023-01-11T23:32:46.4160890Z [ RUN ] Function.ValidInput 2023-01-11T23:32:46.4161201Z [ OK ] Function.ValidInput (0 ms) 2023-01-11T23:32:46.4161506Z [ RUN ] Function.InvalidInput 2023-01-11T23:32:46.4167213Z [ OK ] Function.InvalidInput (0 ms) 2023-01-11T23:32:46.4167969Z [----------] 4 tests from Function (2 ms total) 2023-01-11T23:32:46.4168408Z 2023-01-11T23:32:46.4168723Z [----------] 2 tests from MobileNNCRegistryTest 2023-01-11T23:32:46.4169203Z [ RUN ] MobileNNCRegistryTest.FindAndRun 2023-01-11T23:32:46.4169670Z [ OK ] MobileNNCRegistryTest.FindAndRun (0 ms) 2023-01-11T23:32:46.4170100Z [ RUN ] MobileNNCRegistryTest.NoKernel 2023-01-11T23:32:46.4170593Z [ OK ] MobileNNCRegistryTest.NoKernel (0 ms) 2023-01-11T23:32:46.4170998Z [----------] 2 tests from MobileNNCRegistryTest (0 ms total) 2023-01-11T23:32:46.4171178Z 2023-01-11T23:32:46.4171343Z [----------] Global test environment tear-down 2023-01-11T23:32:46.4171668Z [==========] 6 tests from 2 test suites ran. (2 ms total) 2023-01-11T23:32:46.4171946Z [ PASSED ] 6 tests. 2023-01-11T23:32:46.4172076Z 2023-01-11T23:32:46.4172203Z  YOU HAVE 1 DISABLED TEST 2023-01-11T23:32:46.4172325Z 2023-01-11T23:32:46.4906295Z + '[' -f /opt/conda/lib/python3.10/site-packages/torch/bin/aot_model_compiler_test ']' 2023-01-11T23:32:46.4906720Z + source test/mobile/nnc/test_aot_compile.sh 2023-01-11T23:32:46.4907001Z ++ set -e -o pipefail 2023-01-11T23:32:46.4910659Z +++ python -c 'import site; print(site.getsitepackages()[0])' 2023-01-11T23:32:46.5057267Z ++ TORCH_INSTALL_DIR=/opt/conda/lib/python3.10/site-packages/torch 2023-01-11T23:32:46.5058114Z ++ TORCH_BIN_DIR=/opt/conda/lib/python3.10/site-packages/torch/bin 2023-01-11T23:32:46.5059113Z +++ dirname test/mobile/nnc/test_aot_compile.sh 2023-01-11T23:32:46.5068026Z ++ CURRENT_DIR=test/mobile/nnc 2023-01-11T23:32:46.5068579Z ++ MODEL=aot_test_model.pt 2023-01-11T23:32:46.5068894Z ++ COMPILED_MODEL=aot_test_model.compiled.pt 2023-01-11T23:32:46.5069237Z ++ COMPILED_CODE=aot_test_model.compiled.ll 2023-01-11T23:32:46.5071055Z +++ mktemp -d -t build_XXX 2023-01-11T23:32:46.5081771Z ++ TMP_DIR=/tmp/build_TDr 2023-01-11T23:32:46.5082137Z + test_custom_script_ops 2023-01-11T23:32:46.5082606Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *asan* ]] 2023-01-11T23:32:46.5083014Z + echo 'Testing custom script operators' 2023-01-11T23:32:46.5083330Z Testing custom script operators 2023-01-11T23:32:46.5083693Z + CUSTOM_OP_BUILD=/var/lib/jenkins/workspace/build/custom_test_artifacts/custom-op-build 2023-01-11T23:32:46.5083978Z + pushd test/custom_operator 2023-01-11T23:32:46.5084220Z ~/workspace/test/custom_operator ~/workspace 2023-01-11T23:32:46.5084587Z + cp -a /var/lib/jenkins/workspace/build/custom_test_artifacts/custom-op-build build 2023-01-11T23:32:46.5153037Z + python test_custom_ops.py -v 2023-01-11T23:32:47.8307936Z Test results will be stored in test-reports/python-unittest/test_custom_ops 2023-01-11T23:32:47.8315924Z 2023-01-11T23:32:47.8316119Z Running tests... 2023-01-11T23:32:47.8316484Z ---------------------------------------------------------------------- 2023-01-11T23:32:47.8359872Z test_calling_custom_op (__main__.TestCustomOperators) ... 
ok (0.004s) 2023-01-11T23:32:47.8836787Z test_calling_custom_op_inside_script_module (__main__.TestCustomOperators) ... ok (0.047s) 2023-01-11T23:32:47.8840079Z test_calling_custom_op_string (__main__.TestCustomOperators) ... ok (0.001s) 2023-01-11T23:32:47.8858235Z test_calling_custom_op_with_autograd (__main__.TestCustomOperators) ... /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:197: UserWarning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/engine.cpp:1134.) 2023-01-11T23:32:47.8859491Z Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 2023-01-11T23:32:47.8870551Z ok (0.003s) 2023-01-11T23:32:47.8878095Z test_calling_custom_op_with_autograd_in_nograd_mode (__main__.TestCustomOperators) ... ok (0.001s) 2023-01-11T23:32:47.8881337Z test_custom_library_is_loaded (__main__.TestCustomOperators) ... ok (0.000s) 2023-01-11T23:32:47.8967666Z test_saving_and_loading_script_module_with_custom_op (__main__.TestCustomOperators) ... ok (0.008s) 2023-01-11T23:32:47.8967938Z 2023-01-11T23:32:47.8968275Z ---------------------------------------------------------------------- 2023-01-11T23:32:47.8968643Z Ran 7 tests in 0.065s 2023-01-11T23:32:47.8968803Z 2023-01-11T23:32:47.8968898Z OK 2023-01-11T23:32:47.8969029Z 2023-01-11T23:32:47.8969591Z Generating XML reports... 2023-01-11T23:32:47.8996322Z Generated XML report: test-reports/python-unittest/test_custom_ops/TEST-TestCustomOperators-20230111233247.xml 2023-01-11T23:32:48.1999940Z + python model.py --export-script-module=model.pt 2023-01-11T23:32:49.5480986Z + build/test_custom_ops ./model.pt 2023-01-11T23:32:49.9356450Z [W engine.cpp:1134] Warning: Using backward() with create_graph=True will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using autograd.grad when creating the graph to avoid this. If you have to use this function, make sure to reset the .grad fields of your parameters to None after use to break the cycle and avoid the leak. 
(function operator()) 2023-01-11T23:32:50.9038423Z ok 2023-01-11T23:32:51.0714975Z + popd 2023-01-11T23:32:51.0715236Z ~/workspace 2023-01-11T23:32:51.0715429Z + assert_git_not_dirty 2023-01-11T23:32:51.0715890Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:51.0716256Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:51.0716535Z ++ git status --porcelain 2023-01-11T23:32:51.1435016Z + git_status= 2023-01-11T23:32:51.1435432Z + [[ -n '' ]] 2023-01-11T23:32:51.1435848Z + test_custom_backend 2023-01-11T23:32:51.1436205Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *asan* ]] 2023-01-11T23:32:51.1436506Z + echo 'Testing custom backends' 2023-01-11T23:32:51.1436722Z Testing custom backends 2023-01-11T23:32:51.1437088Z + CUSTOM_BACKEND_BUILD=/var/lib/jenkins/workspace/build/custom_test_artifacts/custom-backend-build 2023-01-11T23:32:51.1437399Z + pushd test/custom_backend 2023-01-11T23:32:51.1437639Z ~/workspace/test/custom_backend ~/workspace 2023-01-11T23:32:51.1438010Z + cp -a /var/lib/jenkins/workspace/build/custom_test_artifacts/custom-backend-build build 2023-01-11T23:32:51.1501785Z + python test_custom_backend.py -v 2023-01-11T23:32:52.4760010Z Test results will be stored in test-reports/python-unittest/test_custom_backend 2023-01-11T23:32:52.4771305Z 2023-01-11T23:32:52.4771603Z Running tests... 2023-01-11T23:32:52.4771952Z ---------------------------------------------------------------------- 2023-01-11T23:32:52.4778235Z test_execute (__main__.TestCustomBackend) 2023-01-11T23:32:52.5327813Z Test execution using the custom backend. ... ok (0.055s) 2023-01-11T23:32:52.5333282Z test_save_load (__main__.TestCustomBackend) 2023-01-11T23:32:52.5512280Z Test that a lowered module can be executed correctly ... ok (0.018s) 2023-01-11T23:32:52.5513595Z 2023-01-11T23:32:52.5514540Z ---------------------------------------------------------------------- 2023-01-11T23:32:52.5514842Z Ran 2 tests in 0.074s 2023-01-11T23:32:52.5515000Z 2023-01-11T23:32:52.5515295Z OK 2023-01-11T23:32:52.5515398Z 2023-01-11T23:32:52.5515501Z Generating XML reports... 2023-01-11T23:32:52.5538283Z Generated XML report: test-reports/python-unittest/test_custom_backend/TEST-TestCustomBackend-20230111233252.xml 2023-01-11T23:32:52.8517037Z + python backend.py --export-module-to=model.pt 2023-01-11T23:32:54.3198902Z + build/test_custom_backend ./model.pt 2023-01-11T23:32:54.7374453Z Testing custom_backend 2023-01-11T23:32:54.8858753Z OK 2023-01-11T23:32:54.9797877Z + rm -f ./model.pt 2023-01-11T23:32:54.9807049Z + popd 2023-01-11T23:32:54.9807258Z ~/workspace 2023-01-11T23:32:54.9807511Z + assert_git_not_dirty 2023-01-11T23:32:54.9807874Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:32:54.9808294Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:32:54.9810171Z ++ git status --porcelain 2023-01-11T23:32:55.0541105Z + git_status= 2023-01-11T23:32:55.0541609Z + [[ -n '' ]] 2023-01-11T23:32:55.0541821Z + test_torch_function_benchmark 2023-01-11T23:32:55.0542107Z + echo 'Testing __torch_function__ benchmarks' 2023-01-11T23:32:55.0542356Z Testing __torch_function__ benchmarks 2023-01-11T23:32:55.0542626Z + pushd benchmarks/overrides_benchmark 2023-01-11T23:32:55.0542904Z ~/workspace/benchmarks/overrides_benchmark ~/workspace 2023-01-11T23:32:55.0543190Z + python bench.py -n 1 -m 2 2023-01-11T23:32:56.1616159Z Type tensor had a minimum time of 0.0064373016357421875 us and a standard deviation of 0.5222837207838893 us. 
2023-01-11T23:32:56.1617167Z Type SubTensor had a minimum time of 0.013589859008789062 us and a standard deviation of 0.028491269404185005 us. 2023-01-11T23:32:56.1617738Z Type WithTorchFunction had a minimum time of 0.00667572021484375 us and a standard deviation of 0.01281264212593669 us. 2023-01-11T23:32:56.1618321Z Type SubWithTorchFunction had a minimum time of 0.010967254638671875 us and a standard deviation of 0.006237733487068908 us. 2023-01-11T23:32:56.4001123Z + python pyspybench.py Tensor -n 1 2023-01-11T23:32:57.7378046Z + python pyspybench.py SubTensor -n 1 2023-01-11T23:32:59.0754885Z + python pyspybench.py WithTorchFunction -n 1 2023-01-11T23:33:00.4216272Z + python pyspybench.py SubWithTorchFunction -n 1 2023-01-11T23:33:01.7617636Z + popd 2023-01-11T23:33:01.7618049Z ~/workspace 2023-01-11T23:33:01.7618342Z + assert_git_not_dirty 2023-01-11T23:33:01.7618823Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *rocm* ]] 2023-01-11T23:33:01.7619181Z + [[ linux-bionic-cuda11.6-py3.10-gcc7-sm86 != *xla* ]] 2023-01-11T23:33:01.7621354Z ++ git status --porcelain 2023-01-11T23:33:01.8340162Z + git_status= 2023-01-11T23:33:01.8340543Z + [[ -n '' ]] 2023-01-11T23:33:01.8407470Z Prepare all required actions 2023-01-11T23:33:01.8407780Z Getting action download info 2023-01-11T23:33:02.0479868Z ##[group]Run ./.github/actions/get-workflow-job-id 2023-01-11T23:33:02.0480096Z with: 2023-01-11T23:33:02.0480449Z github-token: *** 2023-01-11T23:33:02.0480633Z env: 2023-01-11T23:33:02.0480819Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:02.0481022Z GPU_FLAG: --gpus all 2023-01-11T23:33:02.0481298Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:02.0481570Z ##[endgroup] 2023-01-11T23:33:02.0505661Z ##[group]Run nick-fields/retry@3e91a01664abd3c5cd539100d10d33b9c5b68482 2023-01-11T23:33:02.0505916Z with: 2023-01-11T23:33:02.0506091Z shell: bash 2023-01-11T23:33:02.0506275Z timeout_minutes: 10 2023-01-11T23:33:02.0506470Z max_attempts: 5 2023-01-11T23:33:02.0506670Z retry_wait_seconds: 30 2023-01-11T23:33:02.0507089Z command: set -eux python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "job-id=${GHA_WORKFLOW_JOB_ID}" >> "${GITHUB_OUTPUT}" 2023-01-11T23:33:02.0507641Z polling_interval_seconds: 1 2023-01-11T23:33:02.0507852Z warning_on_retry: true 2023-01-11T23:33:02.0508063Z continue_on_error: false 2023-01-11T23:33:02.0508270Z env: 2023-01-11T23:33:02.0508479Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:02.0508697Z GPU_FLAG: --gpus all 2023-01-11T23:33:02.0508980Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:02.0509423Z GITHUB_TOKEN: *** 2023-01-11T23:33:02.0509608Z ##[endgroup] 2023-01-11T23:33:02.0984898Z + python3 -m pip install requests==2.26.0 2023-01-11T23:33:02.3057212Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:02.3232542Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2023-01-11T23:33:02.3371012Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.14) 2023-01-11T23:33:02.3541458Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2022.12.7) 2023-01-11T23:33:02.3549201Z Requirement already satisfied: 
charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 2023-01-11T23:33:02.3567353Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.4) 2023-01-11T23:33:02.5478841Z ++ python3 .github/scripts/get_workflow_job_id.py 3896346758 i-016718a172a944ca0 2023-01-11T23:33:05.4456608Z + GHA_WORKFLOW_JOB_ID=10589556206 2023-01-11T23:33:05.4457680Z + echo job-id=10589556206 2023-01-11T23:33:06.0997391Z Command completed after 1 attempt(s). 2023-01-11T23:33:06.1099778Z ##[group]Run kill "$MONITOR_SCRIPT_PID" 2023-01-11T23:33:06.1100035Z kill "$MONITOR_SCRIPT_PID" 2023-01-11T23:33:06.1110805Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.1111045Z env: 2023-01-11T23:33:06.1111234Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.1111439Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.1111720Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.1111995Z MONITOR_SCRIPT_PID: 5027 2023-01-11T23:33:06.1112185Z ##[endgroup] 2023-01-11T23:33:06.1192379Z Prepare all required actions 2023-01-11T23:33:06.1192644Z Getting action download info 2023-01-11T23:33:06.3101065Z Download action repository 'actions/upload-artifact@v3' (SHA:0b7f8abb1508181956e8e162db84b466c27e18ce) 2023-01-11T23:33:06.4596484Z ##[group]Run ./.github/actions/upload-test-artifacts 2023-01-11T23:33:06.4596724Z with: 2023-01-11T23:33:06.4597001Z file-suffix: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4597271Z env: 2023-01-11T23:33:06.4597454Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4597668Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4597951Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4598214Z ##[endgroup] 2023-01-11T23:33:06.4620557Z ##[group]Run # Remove any previous test jsons if they exist 2023-01-11T23:33:06.4620844Z # Remove any previous test jsons if they exist 2023-01-11T23:33:06.4621093Z rm -f test-jsons-*.zip 2023-01-11T23:33:06.4621392Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2023-01-11T23:33:06.4632126Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.4632360Z env: 2023-01-11T23:33:06.4632553Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4632760Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4633041Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4633397Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4633661Z ##[endgroup] 2023-01-11T23:33:06.4717385Z adding: test/allowlist_for_publicAPI.json (deflated 78%) 2023-01-11T23:33:06.4746882Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2023-01-11T23:33:06.4753168Z adding: test/profiler/profiler_utils_mock_events.json (deflated 87%) 2023-01-11T23:33:06.4754857Z adding: test/.pytorch-slow-tests.json (deflated 77%) 2023-01-11T23:33:06.4759584Z adding: test/.pytorch-disabled-tests.json (deflated 84%) 2023-01-11T23:33:06.4782809Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T23:33:06.4783184Z # Remove any previous test reports if they exist 2023-01-11T23:33:06.4783509Z rm -f test-reports-*.zip 2023-01-11T23:33:06.4783858Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv' 2023-01-11T23:33:06.4796261Z shell: /usr/bin/bash --noprofile 
--norc -e -o pipefail {0} 2023-01-11T23:33:06.4796553Z env: 2023-01-11T23:33:06.4796794Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.4797046Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.4797386Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.4797822Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.4798148Z ##[endgroup] 2023-01-11T23:33:06.4873511Z adding: test/custom_backend/test-reports/python-unittest/test_custom_backend/TEST-TestCustomBackend-20230111233252.xml (deflated 56%) 2023-01-11T23:33:06.4874257Z adding: test/custom_operator/test-reports/python-unittest/test_custom_ops/TEST-TestCustomOperators-20230111233247.xml (deflated 65%) 2023-01-11T23:33:06.4874942Z adding: test/test-reports/python-unittest/dynamo.test_optimizations/TEST-NormalizeIRTests-20230111212247.xml (deflated 41%) 2023-01-11T23:33:06.4875604Z adding: test/test-reports/python-unittest/dynamo.test_optimizations/TEST-TestOptimizations-20230111212247.xml (deflated 78%) 2023-01-11T23:33:06.4881934Z adding: test/test-reports/python-unittest/dynamo.test_misc/TEST-MiscTests-20230111212247.xml (deflated 90%) 2023-01-11T23:33:06.4882528Z adding: test/test-reports/python-unittest/dynamo.test_misc/TEST-TestTracer-20230111212247.xml (deflated 39%) 2023-01-11T23:33:06.4883201Z adding: test/test-reports/python-unittest/dynamo.test_torchxla_integration/TEST-TorchXLAReuseGraphTest-20230111212252.xml (deflated 76%) 2023-01-11T23:33:06.4884485Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatch-20230111212256.xml (deflated 87%) 2023-01-11T23:33:06.4885166Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonDispatcher-20230111212256.xml (deflated 55%) 2023-01-11T23:33:06.4886383Z adding: test/test-reports/python-unittest/test_python_dispatch/TEST-TestPythonRegistration-20230111212256.xml (deflated 82%) 2023-01-11T23:33:06.4887697Z adding: test/test-reports/python-unittest/test_scatter_gather_ops/TEST-TestScatterGatherCUDA-20230111212303.xml (deflated 92%) 2023-01-11T23:33:06.4888775Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertClose-20230111212313.xml (deflated 84%) 2023-01-11T23:33:06.4889580Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseContainer-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4890412Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseErrorMessage-20230111212313.xml (deflated 84%) 2023-01-11T23:33:06.4891248Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseMultiDeviceCUDA-20230111212313.xml (deflated 56%) 2023-01-11T23:33:06.4892128Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseQuantized-20230111212313.xml (deflated 69%) 2023-01-11T23:33:06.4892950Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSC-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4893778Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseBSR-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4894713Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCOO-20230111212313.xml (deflated 76%) 2023-01-11T23:33:06.4895519Z adding: test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSC-20230111212313.xml (deflated 70%) 2023-01-11T23:33:06.4896297Z adding: 
test/test-reports/python-unittest/test_testing/TEST-TestAssertCloseSparseCSR-20230111212313.xml (deflated 63%) 2023-01-11T23:33:06.4897255Z adding: test/test-reports/python-unittest/test_testing/TEST-TestFrameworkUtils-20230111212313.xml (deflated 41%) 2023-01-11T23:33:06.4898031Z adding: test/test-reports/python-unittest/test_testing/TEST-TestImports-20230111212313.xml (deflated 59%) 2023-01-11T23:33:06.4910153Z adding: test/test-reports/python-unittest/test_testing/TEST-TestOpInfoSampleFunctionsCUDA-20230111212313.xml (deflated 97%) 2023-01-11T23:33:06.4910951Z adding: test/test-reports/python-unittest/test_testing/TEST-TestOpInfos-20230111212313.xml (deflated 54%) 2023-01-11T23:33:06.4912294Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestParametrization-20230111212313.xml (deflated 88%) 2023-01-11T23:33:06.4913181Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestParametrizationDeviceTypeCUDA-20230111212313.xml (deflated 88%) 2023-01-11T23:33:06.4914257Z adding: test/test-reports/python-unittest/test_testing/TEST-TestTestingCUDA-20230111212313.xml (deflated 91%) 2023-01-11T23:33:06.4922278Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CPUReproTests-20230111212336.xml (deflated 90%) 2023-01-11T23:33:06.5082378Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CpuTests-20230111212336.xml (deflated 94%) 2023-01-11T23:33:06.5090065Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaReproTests-20230111212336.xml (deflated 88%) 2023-01-11T23:33:06.5344870Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-CudaTests-20230111212336.xml (deflated 95%) 2023-01-11T23:33:06.5345649Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-ExprPrinterTests-20230111212336.xml (deflated 41%) 2023-01-11T23:33:06.5357864Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCpuTest-20230111212336.xml (deflated 97%) 2023-01-11T23:33:06.5371341Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-SweepInputsCudaTest-20230111212336.xml (deflated 97%) 2023-01-11T23:33:06.5372216Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-TestIndexingSimplification-20230111212336.xml (deflated 57%) 2023-01-11T23:33:06.5385410Z adding: test/test-reports/python-unittest/inductor.test_torchinductor/TEST-TritonCodeGenTests-20230111212336.xml (deflated 91%) 2023-01-11T23:33:06.5386002Z adding: test/test-reports/python-unittest/benchmark_utils.test_benchmark_utils/TEST-TestBenchmarkUtils-20230111212350.xml (deflated 77%) 2023-01-11T23:33:06.5386686Z adding: test/test-reports/python-unittest/dynamo.test_comptime/TEST-ComptimeTests-20230111212357.xml (deflated 77%) 2023-01-11T23:33:06.5389728Z adding: test/test-reports/python-unittest/dynamo.test_functions/TEST-FunctionTests-20230111212402.xml (deflated 95%) 2023-01-11T23:33:06.5390386Z adding: test/test-reports/python-unittest/dynamo.test_replay_record/TEST-ReplayRecordTests-20230111212408.xml (deflated 81%) 2023-01-11T23:33:06.5390970Z adding: test/test-reports/python-unittest/dynamo.test_verify_correctness/TEST-TestVerifyCorrectness-20230111212411.xml (deflated 69%) 2023-01-11T23:33:06.5391964Z adding: test/test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyDynamicOps-20230111212421.xml (deflated 40%) 2023-01-11T23:33:06.5392506Z adding: 
test/test-reports/python-unittest/lazy.test_ts_opinfo/TEST-TestLazyTensor-20230111212421.xml (deflated 58%) 2023-01-11T23:33:06.5393046Z adding: test/test-reports/python-unittest/nn.test_packed_sequence/TEST-PackedSequenceTest-20230111212429.xml (deflated 81%) 2023-01-11T23:33:06.5393589Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestDataFlow-20230111212434.xml (deflated 73%) 2023-01-11T23:33:06.5394161Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestIdentifyGradients-20230111212434.xml (deflated 71%) 2023-01-11T23:33:06.5394739Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfiler-20230111212434.xml (deflated 42%) 2023-01-11T23:33:06.5395394Z adding: test/test-reports/python-unittest/profiler.test_memory_profiler/TEST-TestMemoryProfilerE2E-20230111212434.xml (deflated 84%) 2023-01-11T23:33:06.5395942Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestAutocastCPU-20230111212442.xml (deflated 77%) 2023-01-11T23:33:06.5396442Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestAutocastGPU-20230111212442.xml (deflated 45%) 2023-01-11T23:33:06.5396950Z adding: test/test-reports/python-unittest/test_autocast/TEST-TestTorchAutocast-20230111212442.xml (deflated 42%) 2023-01-11T23:33:06.5397474Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestConcatDataset-20230111212451.xml (deflated 74%) 2023-01-11T23:33:06.5397998Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestCustomPinFn-20230111212451.xml (deflated 53%) 2023-01-11T23:33:06.5398511Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDataLoader-20230111212451.xml (deflated 82%) 2023-01-11T23:33:06.5399411Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDataLoaderPersistentWorkers-20230111212451.xml (deflated 85%) 2023-01-11T23:33:06.5400003Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDatasetRandomSplit-20230111212451.xml (deflated 76%) 2023-01-11T23:33:06.5400553Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestDictDataLoader-20230111212451.xml (deflated 61%) 2023-01-11T23:33:06.5417683Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestNamedTupleDataLoader-20230111212451.xml (deflated 43%) 2023-01-11T23:33:06.5418378Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestSetAffinity-20230111212451.xml (deflated 40%) 2023-01-11T23:33:06.5419140Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestStringDataLoader-20230111212451.xml (deflated 40%) 2023-01-11T23:33:06.5419864Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestTensorDataset-20230111212451.xml (deflated 72%) 2023-01-11T23:33:06.5420473Z adding: test/test-reports/python-unittest/test_dataloader/TEST-IntegrationTestDataLoaderDataPipe-20230111212451.xml (deflated 43%) 2023-01-11T23:33:06.5421095Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestConvAfterFork-20230111212451.xml (deflated 41%) 2023-01-11T23:33:06.5421658Z adding: test/test-reports/python-unittest/test_dataloader/TEST-TestIndividualWorkerQueue-20230111212451.xml (deflated 42%) 2023-01-11T23:33:06.5422293Z adding: test/test-reports/python-unittest/test_functional_optim/TEST-TestFunctionalOptimParity-20230111212706.xml (deflated 71%) 2023-01-11T23:33:06.5422881Z adding: test/test-reports/python-unittest/test_fx_experimental/TEST-TestFXExperimental-20230111212713.xml (deflated 79%) 
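Each "adding:" line above and below comes from the single zip invocation shown before this list, which recursively archives only the XML and CSV report files under test/ into an archive keyed by FILE_SUFFIX. The step shells out to zip, as the trace shows; purely as a sketch, the same include-pattern logic in Python (the suffix value is copied from this log's env block):

    import zipfile
    from pathlib import Path

    # Counterpart of `rm -f test-reports-*.zip` followed by
    # `zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' -i '*.csv'`
    # (the workflow's rm -f clears every suffix; this clears just one).
    file_suffix = "test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206"
    archive = Path(f"test-reports-{file_suffix}.zip")
    archive.unlink(missing_ok=True)

    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for pattern in ("*.xml", "*.csv"):
            for report in sorted(Path("test").rglob(pattern)):
                zf.write(report)  # kept under its test/... relative path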
2023-01-11T23:33:06.5423498Z adding: test/test-reports/python-unittest/test_fx_experimental/TEST-TestNormalizeOperatorsCUDA-20230111212713.xml (deflated 97%) 2023-01-11T23:33:06.5424084Z adding: test/test-reports/python-unittest/test_import_stats/TEST-TestImportTime-20230111212730.xml (deflated 53%) 2023-01-11T23:33:06.5424609Z adding: test/test-reports/python-unittest/test_jit_autocast/TEST-TestAutocast-20230111212740.xml (deflated 86%) 2023-01-11T23:33:06.5425151Z adding: test/test-reports/python-unittest/test_jit_autocast/TEST-TestJitTraceAutocast-20230111212740.xml (deflated 76%) 2023-01-11T23:33:06.5425713Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestEnableDisableLlgaFuser-20230111212850.xml (deflated 41%) 2023-01-11T23:33:06.5426262Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestDynamoAOT-20230111212850.xml (deflated 41%) 2023-01-11T23:33:06.5426806Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestFusionPatternCUDA-20230111212850.xml (deflated 92%) 2023-01-11T23:33:06.5427334Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestModel-20230111212850.xml (deflated 96%) 2023-01-11T23:33:06.5427895Z adding: test/test-reports/python-unittest/test_jit_llga_fuser/TEST-TestOpCUDA-20230111212850.xml (deflated 93%) 2023-01-11T23:33:06.5428436Z adding: test/test-reports/python-unittest/test_matmul_cuda/TEST-TestMatmulCudaCUDA-20230111212855.xml (deflated 89%) 2023-01-11T23:33:06.5429101Z adding: test/test-reports/python-unittest/test_mkldnn_fusion/TEST-TestMkldnnFusion-20230111213029.xml (deflated 71%) 2023-01-11T23:33:06.5429702Z adding: test/test-reports/python-unittest/test_module_init/TEST-TestModuleInitCUDA-20230111213112.xml (deflated 96%) 2023-01-11T23:33:06.5430483Z adding: test/test-reports/python-unittest/test_native_mha/TEST-TestMHADeviceTypeCUDA-20230111213119.xml (deflated 95%) 2023-01-11T23:33:06.5431725Z adding: test/test-reports/python-unittest/test_numpy_interop/TEST-TestNumPyInteropCUDA-20230111213125.xml (deflated 86%) 2023-01-11T23:33:06.5432695Z adding: test/test-reports/python-unittest/test_optim/TEST-TestDifferentiableOptimizer-20230111213129.xml (deflated 82%) 2023-01-11T23:33:06.5434561Z adding: test/test-reports/python-unittest/test_optim/TEST-TestLRScheduler-20230111213129.xml (deflated 91%) 2023-01-11T23:33:06.5435211Z adding: test/test-reports/python-unittest/test_optim/TEST-TestOptim-20230111213129.xml (deflated 81%) 2023-01-11T23:33:06.5435710Z adding: test/test-reports/python-unittest/test_optim/TEST-TestSWAUtils-20230111213129.xml (deflated 75%) 2023-01-11T23:33:06.5437236Z adding: test/test-reports/python-unittest/test_shape_ops/TEST-TestShapeOpsCUDA-20230111213312.xml (deflated 93%) 2023-01-11T23:33:06.5437890Z adding: test/test-reports/python-unittest/test_type_info/TEST-TestDTypeInfo-20230111213317.xml (deflated 62%) 2023-01-11T23:33:06.5439961Z adding: test/test-reports/python-unittest/test_view_ops/TEST-TestOldViewOpsCUDA-20230111213323.xml (deflated 90%) 2023-01-11T23:33:06.5442699Z adding: test/test-reports/python-unittest/test_view_ops/TEST-TestViewOpsCUDA-20230111213323.xml (deflated 94%) 2023-01-11T23:33:06.5443504Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConstHandling-20230111213807.xml (deflated 76%) 2023-01-11T23:33:06.5444211Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorConverterTest-20230111213807.xml (deflated 73%) 2023-01-11T23:33:06.5444807Z adding: 
test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorOperatorInvariants-20230111213807.xml (deflated 60%) 2023-01-11T23:33:06.5445482Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorPropTest-20230111213807.xml (deflated 40%) 2023-01-11T23:33:06.5446233Z adding: test/test-reports/python-unittest/test_fake_tensor/TEST-FakeTensorTest-20230111213807.xml (deflated 82%) 2023-01-11T23:33:06.5472309Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRCUDA-20230111213819.xml (deflated 96%) 2023-01-11T23:33:06.5472999Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCSRSampler-20230111213819.xml (deflated 40%) 2023-01-11T23:33:06.5506641Z adding: test/test-reports/python-unittest/test_sparse_csr/TEST-TestSparseCompressedCUDA-20230111213819.xml (deflated 97%) 2023-01-11T23:33:06.5507467Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestCppExtensionAOT-20230111214707.xml (deflated 78%) 2023-01-11T23:33:06.5508245Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestORTTensor-20230111214707.xml (deflated 68%) 2023-01-11T23:33:06.5509040Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestPybindTypeCasters-20230111214707.xml (deflated 41%) 2023-01-11T23:33:06.5509836Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestRNGExtension-20230111214707.xml (deflated 40%) 2023-01-11T23:33:06.5510517Z adding: test/test-reports/python-unittest/test_cpp_extensions_aot_no_ninja/TEST-TestTorchLibrary-20230111214707.xml (deflated 40%) 2023-01-11T23:33:06.5511101Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214711.xml (deflated 42%) 2023-01-11T23:33:06.5511900Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214712.xml (deflated 42%) 2023-01-11T23:33:06.5512620Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214714.xml (deflated 41%) 2023-01-11T23:33:06.5513200Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214715.xml (deflated 42%) 2023-01-11T23:33:06.5513770Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214717.xml (deflated 42%) 2023-01-11T23:33:06.5514341Z adding: test/test-reports/python-unittest/test_cuda_nvml_based_avail/TEST-TestExtendedCUDAIsAvail-20230111214719.xml (deflated 41%) 2023-01-11T23:33:06.5514861Z adding: test/test-reports/python-unittest/test_dispatch/TEST-TestDispatch-20230111214721.xml (deflated 81%) 2023-01-11T23:33:06.5515389Z adding: test/test-reports/python-unittest/test_dispatch/TEST-TestPythonDispatcher-20230111214721.xml (deflated 75%) 2023-01-11T23:33:06.5526691Z adding: test/test-reports/python-unittest/test_linalg/TEST-TestLinalgCUDA-20230111214810.xml (deflated 93%) 2023-01-11T23:33:06.5527387Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-ErrorTest-20230111215317.xml (deflated 39%) 2023-01-11T23:33:06.5528052Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-ForkTest-20230111215317.xml (deflated 76%) 2023-01-11T23:33:06.5528632Z adding: test/test-reports/python-unittest/test_multiprocessing_spawn/TEST-SpawnTest-20230111215317.xml (deflated 77%) 2023-01-11T23:33:06.5529153Z adding: 
test/test-reports/python-unittest/test_prims/TEST-TestDecompCUDA-20230111231405.xml (deflated 74%) 2023-01-11T23:33:06.5529786Z adding: test/test-reports/python-unittest/test_prims/TEST-TestPrimsBasic-20230111231405.xml (deflated 52%) 2023-01-11T23:33:06.5530515Z adding: test/test-reports/python-unittest/test_prims/TEST-TestPrimsCUDA-20230111231405.xml (deflated 84%) 2023-01-11T23:33:06.5531021Z adding: test/test-reports/python-unittest/test_prims/TEST-TestRefsCUDA-20230111231405.xml (deflated 38%) 2023-01-11T23:33:06.5532667Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestAsArrayCUDA-20230111231451.xml (deflated 95%) 2023-01-11T23:33:06.5533446Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestLikeTensorCreationCUDA-20230111231451.xml (deflated 76%) 2023-01-11T23:33:06.5534266Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestRandomTensorCreationCUDA-20230111231451.xml (deflated 86%) 2023-01-11T23:33:06.5540555Z adding: test/test-reports/python-unittest/test_tensor_creation_ops/TEST-TestTensorCreationCUDA-20230111231451.xml (deflated 93%) 2023-01-11T23:33:06.5777379Z adding: test/test-reports/python-pytest/test_ops/test_ops-8308d40cbcb1066e.xml (deflated 95%) 2023-01-11T23:33:06.6008621Z adding: test/test-reports/python-pytest/test_ops/test_ops-621a128d9f5db79e.xml (deflated 95%) 2023-01-11T23:33:06.6009337Z adding: test/test-reports/python-pytest/test_ops/test_ops-001e362a3915126e.xml (deflated 86%) 2023-01-11T23:33:06.6037460Z adding: test/test-reports/cpp-unittest/test_libtorch/test_jit.xml (deflated 91%) 2023-01-11T23:33:06.6046306Z adding: test/test-reports/cpp-unittest/test_libtorch/test_lazy.xml (deflated 93%) 2023-01-11T23:33:06.6065225Z adding: test/test-reports/cpp-unittest/test_libtorch/test_api.xml (deflated 92%) 2023-01-11T23:33:06.6080509Z adding: test/test-reports/cpp-unittest/test_libtorch/test_tensorexpr.xml (deflated 92%) 2023-01-11T23:33:06.6081325Z adding: test/test-reports/cpp-unittest/test_aot_compilation/test_mobile_nnc.xml (deflated 75%) 2023-01-11T23:33:06.6102402Z ##[group]Run # Remove any previous test reports if they exist 2023-01-11T23:33:06.6102716Z # Remove any previous test reports if they exist 2023-01-11T23:33:06.6103044Z rm -f usage-log-*.zip 2023-01-11T23:33:06.6103340Z # this workflow is also run in bazel build test, but we dont generate usage reports for it 2023-01-11T23:33:06.6103655Z # so check to see if the file exists first 2023-01-11T23:33:06.6103899Z if [ -f 'usage_log.txt' ]; then 2023-01-11T23:33:06.6104167Z  zip "usage-log-${FILE_SUFFIX}.zip" 'usage_log.txt' 2023-01-11T23:33:06.6104400Z fi 2023-01-11T23:33:06.6114605Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:06.6114838Z env: 2023-01-11T23:33:06.6115027Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.6115229Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.6115508Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.6115862Z FILE_SUFFIX: test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206 2023-01-11T23:33:06.6116126Z ##[endgroup] 2023-01-11T23:33:06.6965601Z adding: usage_log.txt (deflated 98%) 2023-01-11T23:33:06.7001574Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:06.7001808Z with: 2023-01-11T23:33:06.7002023Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:06.7002247Z retention-days: 14 2023-01-11T23:33:06.7002452Z if-no-files-found: warn 2023-01-11T23:33:06.7002667Z path: 
test-jsons-*.zip 2023-01-11T23:33:06.7002862Z name: artifact 2023-01-11T23:33:06.7003060Z s3-bucket: gha-artifacts 2023-01-11T23:33:06.7003266Z region: us-east-1 2023-01-11T23:33:06.7003441Z env: 2023-01-11T23:33:06.7003630Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:06.7003849Z GPU_FLAG: --gpus all 2023-01-11T23:33:06.7004120Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:06.7004383Z ##[endgroup] 2023-01-11T23:33:07.0103152Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.0104179Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.0105127Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.0112439Z Starting upload of test-jsons-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.1415676Z Finished upload of test-jsons-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.1542695Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:07.1542939Z with: 2023-01-11T23:33:07.1543153Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.1543389Z retention-days: 14 2023-01-11T23:33:07.1543605Z if-no-files-found: error 2023-01-11T23:33:07.1543820Z path: test-reports-*.zip 2023-01-11T23:33:07.1544024Z name: artifact 2023-01-11T23:33:07.1544228Z s3-bucket: gha-artifacts 2023-01-11T23:33:07.1544428Z region: us-east-1 2023-01-11T23:33:07.1544611Z env: 2023-01-11T23:33:07.1544801Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:07.1545006Z GPU_FLAG: --gpus all 2023-01-11T23:33:07.1545297Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:07.1545564Z ##[endgroup] 2023-01-11T23:33:07.4619471Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.4620077Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.4620640Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.4627429Z Starting upload of test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.6623228Z Finished upload of test-reports-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:07.6759150Z ##[group]Run seemethere/upload-artifact-s3@v5 2023-01-11T23:33:07.6759383Z with: 2023-01-11T23:33:07.6759593Z s3-prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.6759827Z retention-days: 14 2023-01-11T23:33:07.6760041Z if-no-files-found: ignore 2023-01-11T23:33:07.6760479Z path: usage-log-*.zip 2023-01-11T23:33:07.6760685Z name: artifact 2023-01-11T23:33:07.6760893Z s3-bucket: gha-artifacts 2023-01-11T23:33:07.6761167Z region: us-east-1 2023-01-11T23:33:07.6761348Z env: 2023-01-11T23:33:07.6761535Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:07.6761740Z GPU_FLAG: --gpus all 2023-01-11T23:33:07.6762016Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:07.6762277Z ##[endgroup] 2023-01-11T23:33:07.9844380Z NOTE: s3-prefix specified, ignoring name parameter 2023-01-11T23:33:07.9844698Z With the provided path, there will be 1 file uploaded 2023-01-11T23:33:07.9845047Z Uploading to s3 prefix: pytorch/pytorch/3896346758/1/artifact 2023-01-11T23:33:07.9852332Z Starting upload of usage-log-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:08.1166451Z Finished upload of usage-log-test-default-2-4-linux.g5.4xlarge.nvidia.gpu_10589556206.zip 2023-01-11T23:33:08.1295891Z ##[group]Run 
# shellcheck disable=SC2156 2023-01-11T23:33:08.1296153Z # shellcheck disable=SC2156 2023-01-11T23:33:08.1296485Z find . -iname "core.[1-9]*" -exec docker exec "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \; 2023-01-11T23:33:08.1307020Z shell: /usr/bin/bash -e {0} 2023-01-11T23:33:08.1307211Z env: 2023-01-11T23:33:08.1307399Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:08.1307610Z GPU_FLAG: --gpus all 2023-01-11T23:33:08.1307882Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:08.1308144Z ##[endgroup] 2023-01-11T23:33:08.3410594Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:08.3411051Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:08.3411501Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:08.3411961Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:08.3412447Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:08.3412824Z and "show warranty" for details. 2023-01-11T23:33:08.3413247Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:08.3413631Z Type "show configuration" for configuration details. 2023-01-11T23:33:08.3413944Z For bug reporting instructions, please see: 2023-01-11T23:33:08.3414208Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:08.3414742Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:08.3415042Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:08.3415278Z For help, type "help". 2023-01-11T23:33:08.3415538Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:08.4120446Z Reading symbols from python...done. 2023-01-11T23:33:08.9405386Z 2023-01-11T23:33:08.9405825Z warning: core file may not match specified executable file. 2023-01-11T23:33:08.9432762Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9433757Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9434741Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9435700Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9436615Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9437330Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9437869Z [New LWP 6354] 2023-01-11T23:33:08.9438256Z [New LWP 6362] 2023-01-11T23:33:08.9438611Z [New LWP 6360] 2023-01-11T23:33:08.9438981Z [New LWP 6369] 2023-01-11T23:33:08.9439477Z [New LWP 6373] 2023-01-11T23:33:08.9439920Z [New LWP 6388] 2023-01-11T23:33:08.9440105Z [New LWP 6387] 2023-01-11T23:33:08.9440340Z [New LWP 6371] 2023-01-11T23:33:08.9440519Z [New LWP 6372] 2023-01-11T23:33:08.9440696Z [New LWP 6375] 2023-01-11T23:33:08.9440876Z [New LWP 6374] 2023-01-11T23:33:08.9441401Z [New LWP 6376] 2023-01-11T23:33:08.9441674Z [New LWP 6364] 2023-01-11T23:33:08.9441949Z [New LWP 6370] 2023-01-11T23:33:08.9442124Z [New LWP 6377] 2023-01-11T23:33:08.9442304Z [New LWP 6389] 2023-01-11T23:33:08.9442486Z [New LWP 6392] 2023-01-11T23:33:08.9443075Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9443625Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9444174Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:08.9444824Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:08.9616738Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:08.9617783Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:14.4700612Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:14.4701542Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:14.4702018Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:14.4702419Z [Current thread is 1 (Thread 0x7fea17da6080 (LWP 6354))] 2023-01-11T23:33:14.4724648Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:14.4725929Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:14.4732344Z To enable execution of this file add 2023-01-11T23:33:14.4732890Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:14.4733287Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:14.4733687Z To completely disable this security protection add 2023-01-11T23:33:14.4734076Z set auto-load safe-path / 2023-01-11T23:33:14.4734400Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:14.4734956Z For more information about this security protection see the 2023-01-11T23:33:14.4735337Z "Auto-loading safe path" section in the GDB manual. 
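The gdb session above and the backtrace that follows come from the find/docker step at the top of this group: every core.[1-9]* file left in the workspace is handed to gdb inside the test container for a backtrace. A rough Python rendering of that one-liner, as a sketch only (the helper name and the check=False choice are ours; the container ID is the value from this log's env block):

    import subprocess
    from pathlib import Path

    # Copied from the DOCKER_CONTAINER_ID env entry in this log.
    CONTAINER = "b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6"

    def backtrace_core_dumps(root="."):
        # Counterpart of: find . -iname "core.[1-9]*" -exec docker exec
        #   "${DOCKER_CONTAINER_ID}" sh -c "gdb python {} -ex 'bt' -ex 'q'" \;
        for core in sorted(Path(root).rglob("core.[1-9]*")):
            subprocess.run(
                ["docker", "exec", CONTAINER, "sh", "-c",
                 f"gdb python {core} -ex 'bt' -ex 'q'"],
                check=False,  # keep scanning even if one gdb run fails
            )

    backtrace_core_dumps()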
E.g., run from the shell: 2023-01-11T23:33:14.4735664Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:14.4775719Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:14.4776873Z #1 0x00007fe9dedc265b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:14.4777975Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:14.4785918Z #2 <signal handler called> 2023-01-11T23:33:14.4786273Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:14.4922693Z #4 0x00007fea1686403b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:14.4923122Z #5 0x00007fea17c5b052 in ffi_call_unix64 () 2023-01-11T23:33:14.4923526Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:14.4928371Z #6 0x00007fea17c598cd in ffi_call_int () 2023-01-11T23:33:14.4928880Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:14.4933989Z #7 0x00007fea1686c879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:14.4934443Z resmem=0x7ffca89e55f0, restype=, atypes=, 2023-01-11T23:33:14.4935158Z avalues=, pProc=0x7fea16864002 <string_at>, flags=4357) 2023-01-11T23:33:14.4935684Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:14.4935994Z #8 _ctypes_callproc () 2023-01-11T23:33:14.4936333Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:14.4936644Z #9 0x00007fea1686c3fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:14.4937196Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:14.4937537Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:14.4941531Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:14.4941994Z kwnames=, 2023-01-11T23:33:14.4942527Z nargsf=, args=0x7fe9262b7418, 2023-01-11T23:33:14.4942968Z callable=, 2023-01-11T23:33:14.4943356Z tstate=) 2023-01-11T23:33:14.4944075Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:14.4944517Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4944855Z args=0x7fe9262b7418, callable=0x7fea16b20880, tstate=) 2023-01-11T23:33:14.4945227Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:14.4947377Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4947973Z args=0x7fe9262b7418, callable=0x7fea16b20880) 2023-01-11T23:33:14.4948587Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4949672Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.4950218Z pp_stack=, trace_info=0x7ffca89e5900, 2023-01-11T23:33:14.4950593Z tstate=) 2023-01-11T23:33:14.4951082Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.4951487Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4951986Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.4952936Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4954053Z throwflag=, 2023-01-11T23:33:14.4954476Z f=, 2023-01-11T23:33:14.4954849Z tstate=) 2023-01-11T23:33:14.4955381Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4958621Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.4959022Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.4959491Z args@entry=, locals=0x0, 2023-01-11T23:33:14.4959989Z locals@entry=, con=0x7fea16adf890, 2023-01-11T23:33:14.4960460Z con@entry=, tstate=0x18f3bf0,
2023-01-11T23:33:14.4960864Z tstate@entry=) 2023-01-11T23:33:14.4961295Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.4961556Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:14.4961872Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.4964524Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.4964928Z nargsf=, args=0x7fe9262b7280, callable=0x7fea16adf880, 2023-01-11T23:33:14.4965246Z tstate=0x18f3bf0) 2023-01-11T23:33:14.4965635Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4966267Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4966785Z args=0x7fe9262b7280, callable=0x7fea16adf880) 2023-01-11T23:33:14.4967204Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.4968562Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.4968935Z pp_stack=, trace_info=0x7ffca89e5ac0, 2023-01-11T23:33:14.4969200Z tstate=) 2023-01-11T23:33:14.4969602Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.4969924Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4970245Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:14.4970845Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4971813Z throwflag=, 2023-01-11T23:33:14.4972416Z f=, 2023-01-11T23:33:14.4972988Z tstate=) 2023-01-11T23:33:14.4973628Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4976078Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:14.4976580Z argcount=, 2023-01-11T23:33:14.4977210Z args=, locals=0x0, con=0x7fe9260c1ac0, tstate=0x18f3bf0) 2023-01-11T23:33:14.4977884Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.4978336Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4978747Z stack=, func=0x7fe9260c1ab0) 2023-01-11T23:33:14.4979262Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.4982419Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.4982893Z args=, callable=0x7fe9260c1ab0, tstate=0x18f3bf0) 2023-01-11T23:33:14.4983488Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:14.4983874Z #27 vectorcall_unbound ( 2023-01-11T23:33:14.4984348Z nargs=, args=, 2023-01-11T23:33:14.4984939Z func=, 2023-01-11T23:33:14.4985498Z unbound=, 2023-01-11T23:33:14.4986052Z tstate=) 2023-01-11T23:33:14.4986661Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:14.4987032Z #28 vectorcall_method () 2023-01-11T23:33:14.4987497Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:14.4987935Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:14.4988289Z arg1=) 2023-01-11T23:33:14.4988755Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:14.4989166Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.4989674Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:14.4992379Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.4992881Z throwflag=, 2023-01-11T23:33:14.4993441Z f=, 2023-01-11T23:33:14.4993998Z tstate=) 2023-01-11T23:33:14.4994602Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.4996683Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.4997441Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:14.4998107Z args@entry=, locals=0x0, 2023-01-11T23:33:14.4998728Z locals@entry=, con=0x7fea16add5b0, 2023-01-11T23:33:14.4999346Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.4999942Z tstate@entry=) 2023-01-11T23:33:14.5000670Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5001054Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5001533Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5002866Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5003326Z nargsf=, args=0x7fe9262bebf8, callable=0x7fea16add5a0, 2023-01-11T23:33:14.5003707Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5004179Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5006244Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.5006665Z args=0x7fe9262bebf8, callable=0x7fea16add5a0) 2023-01-11T23:33:14.5007183Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5008012Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5008445Z pp_stack=, trace_info=0x7ffca89e5ec0, 2023-01-11T23:33:14.5008822Z tstate=) 2023-01-11T23:33:14.5009282Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5009679Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5010163Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.5010825Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5011462Z throwflag=, 2023-01-11T23:33:14.5012055Z f=, 2023-01-11T23:33:14.5012602Z tstate=) 2023-01-11T23:33:14.5013213Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5016658Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5017269Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5017926Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5018543Z locals@entry=, con=0x7fe92f0e1370, 2023-01-11T23:33:14.5019180Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5019769Z tstate@entry=) 2023-01-11T23:33:14.5020392Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5020766Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5021235Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5022339Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5022797Z nargsf=, args=0x690b718, callable=0x7fe92f0e1360, 2023-01-11T23:33:14.5023163Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5023633Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5025991Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x690b718, 2023-01-11T23:33:14.5026394Z callable=0x7fe92f0e1360) 2023-01-11T23:33:14.5026885Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5027649Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5028105Z pp_stack=, trace_info=0x7ffca89e6080, 2023-01-11T23:33:14.5028471Z tstate=) 2023-01-11T23:33:14.5028943Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5029320Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5029809Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5030734Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5032826Z throwflag=, 2023-01-11T23:33:14.5033410Z f=, 2023-01-11T23:33:14.5033951Z tstate=) 2023-01-11T23:33:14.5034548Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5035991Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5036569Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5037212Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5037834Z locals@entry=, con=0x7fe92f0e0680, 2023-01-11T23:33:14.5038457Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5039062Z tstate@entry=) 2023-01-11T23:33:14.5039675Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5040056Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5040533Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5042802Z #48 0x00000000004f141c in do_call_core (kwdict=0x7fea16c2a700, 2023-01-11T23:33:14.5043239Z callargs=0x7fe926269e40, func=0x7fe92f0e0670, trace_info=0x7ffca89e6240, 2023-01-11T23:33:14.5043627Z tstate=) 2023-01-11T23:33:14.5044128Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:14.5044524Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5045005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:14.5045426Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5045902Z throwflag=, 2023-01-11T23:33:14.5046436Z f=, 2023-01-11T23:33:14.5046992Z tstate=) 2023-01-11T23:33:14.5047580Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5050690Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5051262Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5051927Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5052556Z locals@entry=, con=0x7fea16c555b0, 2023-01-11T23:33:14.5053270Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5053854Z tstate@entry=) 2023-01-11T23:33:14.5054692Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5055071Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5055500Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5056342Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5057014Z nargsf=, args=0x694cd88, callable=0x7fea16c555a0, 2023-01-11T23:33:14.5057286Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5057994Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5060584Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x694cd88, 2023-01-11T23:33:14.5060868Z callable=0x7fea16c555a0) 2023-01-11T23:33:14.5061221Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5062100Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5062408Z pp_stack=, trace_info=0x7ffca89e6400, 2023-01-11T23:33:14.5062750Z tstate=) 2023-01-11T23:33:14.5063072Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5063338Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5063655Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5064179Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5064951Z throwflag=, 2023-01-11T23:33:14.5065532Z f=, 2023-01-11T23:33:14.5066091Z tstate=) 2023-01-11T23:33:14.5066717Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5069562Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5070142Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5070618Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5071063Z locals@entry=, con=0x7fea16c55f40, 2023-01-11T23:33:14.5071547Z con@entry=, tstate=0x18f3bf0, 
2023-01-11T23:33:14.5072008Z tstate@entry=) 2023-01-11T23:33:14.5072445Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5072712Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5073032Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5075165Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5075467Z nargsf=, args=0x1970510, callable=0x7fea16c55f30, 2023-01-11T23:33:14.5075720Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5076135Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5077745Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1970510, 2023-01-11T23:33:14.5078020Z callable=0x7fea16c55f30) 2023-01-11T23:33:14.5078700Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5079247Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5079552Z pp_stack=, trace_info=0x7ffca89e65c0, 2023-01-11T23:33:14.5079809Z tstate=) 2023-01-11T23:33:14.5080494Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5080789Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5081118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:14.5081391Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5081716Z throwflag=, 2023-01-11T23:33:14.5082093Z f=, 2023-01-11T23:33:14.5082470Z tstate=) 2023-01-11T23:33:14.5082854Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5085414Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5085818Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5086263Z args@entry=, locals=0x0, 2023-01-11T23:33:14.5086673Z locals@entry=, con=0x7fea16ad6de0, 2023-01-11T23:33:14.5087090Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5087489Z tstate@entry=) 2023-01-11T23:33:14.5087892Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5088152Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5088473Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5090271Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:14.5090585Z nargsf=, args=0x7fea16d9c200, callable=0x7fea16ad6dd0, 2023-01-11T23:33:14.5090843Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5091226Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5092716Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:14.5092992Z args=0x7fea16d9c200, callable=0x7fea16ad6dd0) 2023-01-11T23:33:14.5093351Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5094239Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:14.5094696Z pp_stack=, trace_info=0x7ffca89e6780, 2023-01-11T23:33:14.5095033Z tstate=) 2023-01-11T23:33:14.5095388Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5095741Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5096117Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:14.5096391Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5096721Z throwflag=, 2023-01-11T23:33:14.5097092Z f=, 2023-01-11T23:33:14.5097465Z tstate=) 2023-01-11T23:33:14.5097855Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:14.5100305Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:14.5100682Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:14.5101130Z 
args@entry=, locals=0x0, 2023-01-11T23:33:14.5101560Z locals@entry=, con=0x7fea16ad6d50, 2023-01-11T23:33:14.5102067Z con@entry=, tstate=0x18f3bf0, 2023-01-11T23:33:14.5102458Z tstate@entry=) 2023-01-11T23:33:14.5102867Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5103124Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:14.5103455Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:14.5104301Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7fea16d75e00, 2023-01-11T23:33:14.5104629Z nargsf=, args=, callable=0x7fea16ad6d40, 2023-01-11T23:33:14.5104887Z tstate=0x18f3bf0) 2023-01-11T23:33:14.5105319Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5106001Z #75 PyObject_Vectorcall (kwnames=0x7fea16d75e00, nargsf=, 2023-01-11T23:33:14.5106306Z args=, callable=0x7fea16ad6d40) 2023-01-11T23:33:14.5106669Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:14.5108241Z #76 call_function (kwnames=0x7fea16d75e00, oparg=, 2023-01-11T23:33:14.5108550Z pp_stack=, trace_info=0x7ffca89e6940, 2023-01-11T23:33:14.5108800Z tstate=) 2023-01-11T23:33:14.5109118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:14.5109380Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:14.5109787Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:14.5111995Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:14.5112585Z throwflag=, 2023-01-11T23:33:14.5113102Z f=, 2023-01-11T23:33:14.5113596Z tstate=) 2023-01-11T23:33:14.5114091Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:14.5114383Z #79 _PyEval_Vector () 2023-01-11T23:33:14.5114697Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:14.5116214Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7fea16d38920, 2023-01-11T23:33:14.5116634Z globals=globals@entry=0x7fea16d2a600, locals=locals@entry=0x7fea16d2a600) 2023-01-11T23:33:14.5117115Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:14.5117660Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:14.5118142Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:14.5119942Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:14.5120648Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:14.5121940Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:14.5122454Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:14.5124015Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:14.5124534Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:14.5126065Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:14.5126495Z command=) 2023-01-11T23:33:14.5126987Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:14.5127554Z #86 pymain_run_python (exitcode=0x7ffca89e6ba0) 2023-01-11T23:33:14.5128008Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:14.5128350Z #87 Py_RunMain.localalias () 2023-01-11T23:33:14.5128710Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:14.5144161Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:14.5144496Z 
argv=) 2023-01-11T23:33:14.5145251Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:14.5148169Z #89 0x00007fea16e0ac87 in __libc_start_main (main=0x587be0
, argc=5, 2023-01-11T23:33:14.5148578Z argv=0x7ffca89e6da8, init=, fini=, 2023-01-11T23:33:14.5148958Z rtld_fini=, stack_end=0x7ffca89e6d98) 2023-01-11T23:33:14.5149345Z at ../csu/libc-start.c:310 2023-01-11T23:33:14.5149625Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:14.5150175Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:14.9774815Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:14.9775627Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:14.9776378Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:14.9777030Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:14.9777654Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:14.9778182Z and "show warranty" for details. 2023-01-11T23:33:14.9778769Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:14.9779298Z Type "show configuration" for configuration details. 2023-01-11T23:33:14.9779823Z For bug reporting instructions, please see: 2023-01-11T23:33:14.9780331Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:14.9780889Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:14.9781234Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:14.9781499Z For help, type "help". 2023-01-11T23:33:14.9781757Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:15.0488655Z Reading symbols from python...done. 2023-01-11T23:33:15.5762877Z 2023-01-11T23:33:15.5763582Z warning: core file may not match specified executable file. 2023-01-11T23:33:15.5784967Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5785337Z [New LWP 6353] 2023-01-11T23:33:15.5785544Z [New LWP 6381] 2023-01-11T23:33:15.5785789Z [New LWP 6368] 2023-01-11T23:33:15.5786184Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5786664Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5787033Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5787421Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5787787Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5788058Z [New LWP 6383] 2023-01-11T23:33:15.5788236Z [New LWP 6361] 2023-01-11T23:33:15.5788421Z [New LWP 6366] 2023-01-11T23:33:15.5788605Z [New LWP 6382] 2023-01-11T23:33:15.5788782Z [New LWP 6379] 2023-01-11T23:33:15.5788964Z [New LWP 6378] 2023-01-11T23:33:15.5789157Z [New LWP 6365] 2023-01-11T23:33:15.5789332Z [New LWP 6380] 2023-01-11T23:33:15.5789512Z [New LWP 6363] 2023-01-11T23:33:15.5789693Z [New LWP 6367] 2023-01-11T23:33:15.5789945Z [New LWP 6393] 2023-01-11T23:33:15.5790168Z [New LWP 6386] 2023-01-11T23:33:15.5790349Z [New LWP 6385] 2023-01-11T23:33:15.5790524Z [New LWP 6384] 2023-01-11T23:33:15.5796706Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5797350Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5797898Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:15.5798728Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:15.5967327Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:15.5967852Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:21.0667881Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:21.0669222Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:21.0670097Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:21.0671525Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:21.0672336Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:21.0672929Z [Current thread is 1 (Thread 0x7f974070a080 (LWP 6353))] 2023-01-11T23:33:21.0698176Z To enable execution of this file add 2023-01-11T23:33:21.0699202Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:21.0700038Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:21.0700826Z To completely disable this security protection add 2023-01-11T23:33:21.0701577Z set auto-load safe-path / 2023-01-11T23:33:21.0702120Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:21.0702704Z For more information about this security protection see the 2023-01-11T23:33:21.0703337Z "Auto-loading safe path" section in the GDB manual. 
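(Aside: the two configuration lines GDB prints above are complete in themselves. Assembled into the file it names, the stricter fix is a one-line /var/lib/jenkins/.gdbinit on the runner — a sketch built only from the paths printed in the message above, not something this job actually did:

    # /var/lib/jenkins/.gdbinit -- whitelist the workspace .gdbinit so GDB auto-loads it
    add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit

The alternative, set auto-load safe-path /, trusts every directory, which is why GDB describes it as completely disabling the security protection.)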
E.g., run from the shell: 2023-01-11T23:33:21.0703690Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:21.0730024Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:21.0731473Z #1 0x00007f970772665b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:21.0731994Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:21.0741960Z #2 2023-01-11T23:33:21.0742295Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:21.0880137Z #4 0x00007f973f1c803b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:21.0880515Z #5 0x00007f97405bf052 in ffi_call_unix64 () 2023-01-11T23:33:21.0880878Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:21.0884517Z #6 0x00007f97405bd8cd in ffi_call_int () 2023-01-11T23:33:21.0884957Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:21.0890514Z #7 0x00007f973f1d0879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:21.0890974Z resmem=0x7fff19e0f940, restype=, atypes=, 2023-01-11T23:33:21.0891296Z avalues=, pProc=0x7f973f1c8002 , flags=4357) 2023-01-11T23:33:21.0891709Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:21.0891959Z #8 _ctypes_callproc () 2023-01-11T23:33:21.0892344Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:21.0892660Z #9 0x00007f973f1d03fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:21.0893244Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:21.0893716Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:21.0898179Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:21.0898638Z kwnames=, 2023-01-11T23:33:21.0899172Z nargsf=, args=0x7f964ec1b418, 2023-01-11T23:33:21.0899584Z callable=, 2023-01-11T23:33:21.0900261Z tstate=) 2023-01-11T23:33:21.0900717Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:21.0901046Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0901365Z args=0x7f964ec1b418, callable=0x7f973f484880, tstate=) 2023-01-11T23:33:21.0901724Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:21.0903878Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0904200Z args=0x7f964ec1b418, callable=0x7f973f484880) 2023-01-11T23:33:21.0904671Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0906730Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0922187Z pp_stack=, trace_info=0x7fff19e0fc50, 2023-01-11T23:33:21.0922514Z tstate=) 2023-01-11T23:33:21.0922932Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0923677Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0924121Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.0924487Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0924932Z throwflag=, 2023-01-11T23:33:21.0925421Z f=, 2023-01-11T23:33:21.0925857Z tstate=) 2023-01-11T23:33:21.0926270Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0926542Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0926916Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0927370Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0927828Z locals@entry=, con=0x7f973f443890, 2023-01-11T23:33:21.0928257Z con@entry=, tstate=0x1f34bf0, 
2023-01-11T23:33:21.0928663Z tstate@entry=) 2023-01-11T23:33:21.0929062Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0929312Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0929627Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0929921Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0930217Z nargsf=, args=0x7f964ec1b280, callable=0x7f973f443880, 2023-01-11T23:33:21.0930457Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0930776Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0931082Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0931345Z args=0x7f964ec1b280, callable=0x7f973f443880) 2023-01-11T23:33:21.0931704Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0931992Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0932267Z pp_stack=, trace_info=0x7fff19e0fe10, 2023-01-11T23:33:21.0932510Z tstate=) 2023-01-11T23:33:21.0932822Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0933082Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0933390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:21.0933652Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0934097Z throwflag=, 2023-01-11T23:33:21.0935017Z f=, 2023-01-11T23:33:21.0935503Z tstate=) 2023-01-11T23:33:21.0935965Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0936218Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:21.0936521Z argcount=, 2023-01-11T23:33:21.0937049Z args=, locals=0x0, con=0x7f964ea29ac0, tstate=0x1f34bf0) 2023-01-11T23:33:21.0937517Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0937894Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0938249Z stack=, func=0x7f964ea29ab0) 2023-01-11T23:33:21.0938597Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0938910Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0939206Z args=, callable=0x7f964ea29ab0, tstate=0x1f34bf0) 2023-01-11T23:33:21.0939568Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:21.0939815Z #27 vectorcall_unbound ( 2023-01-11T23:33:21.0940122Z nargs=, args=, 2023-01-11T23:33:21.0940511Z func=, 2023-01-11T23:33:21.0940883Z unbound=, 2023-01-11T23:33:21.0941255Z tstate=) 2023-01-11T23:33:21.0941709Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:21.0941961Z #28 vectorcall_method () 2023-01-11T23:33:21.0942279Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:21.0942561Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:21.0942803Z arg1=) 2023-01-11T23:33:21.0943107Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:21.0943373Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0943705Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:21.0943976Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0944299Z throwflag=, 2023-01-11T23:33:21.0944664Z f=, 2023-01-11T23:33:21.0945029Z tstate=) 2023-01-11T23:33:21.0945414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0945684Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0946060Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:21.0946500Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0946912Z locals@entry=, con=0x7f973f4415b0, 2023-01-11T23:33:21.0947318Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0947819Z tstate@entry=) 2023-01-11T23:33:21.0948214Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0948474Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0948779Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0950193Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0950746Z nargsf=, args=0x7f964ec22a48, callable=0x7f973f4415a0, 2023-01-11T23:33:21.0951000Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0952004Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0953518Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.0954044Z args=0x7f964ec22a48, callable=0x7f973f4415a0) 2023-01-11T23:33:21.0954499Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0954794Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0955140Z pp_stack=, trace_info=0x7fff19e10210, 2023-01-11T23:33:21.0955399Z tstate=) 2023-01-11T23:33:21.0955712Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0956044Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0956372Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.0956639Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0956969Z throwflag=, 2023-01-11T23:33:21.0957348Z f=, 2023-01-11T23:33:21.0957730Z tstate=) 2023-01-11T23:33:21.0958111Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0960589Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0960977Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0961429Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0961834Z locals@entry=, con=0x7f9657a45370, 2023-01-11T23:33:21.0962247Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0962644Z tstate@entry=) 2023-01-11T23:33:21.0963065Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0963316Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0963633Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0965547Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0965850Z nargsf=, args=0x20cef18, callable=0x7f9657a45360, 2023-01-11T23:33:21.0966097Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0966503Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0968017Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x20cef18, 2023-01-11T23:33:21.0968330Z callable=0x7f9657a45360) 2023-01-11T23:33:21.0968673Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0969261Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0969634Z pp_stack=, trace_info=0x7fff19e103d0, 2023-01-11T23:33:21.0969946Z tstate=) 2023-01-11T23:33:21.0970334Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0970688Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0971001Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.0971277Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0971634Z throwflag=, 2023-01-11T23:33:21.0972012Z f=, 2023-01-11T23:33:21.0972377Z tstate=) 2023-01-11T23:33:21.0972768Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0975593Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0976456Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0977036Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0977535Z locals@entry=, con=0x7f9657a44680, 2023-01-11T23:33:21.0977956Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0978351Z tstate@entry=) 2023-01-11T23:33:21.0978799Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0979063Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0979382Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0980895Z #48 0x00000000004f141c in do_call_core (kwdict=0x7f973f58e700, 2023-01-11T23:33:21.0981464Z callargs=0x7f964ebcde40, func=0x7f9657a44670, trace_info=0x7fff19e10590, 2023-01-11T23:33:21.0981754Z tstate=) 2023-01-11T23:33:21.0982116Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:21.0982450Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0982823Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:21.0983152Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0983476Z throwflag=, 2023-01-11T23:33:21.0983843Z f=, 2023-01-11T23:33:21.0984214Z tstate=) 2023-01-11T23:33:21.0984603Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.0986639Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.0987281Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.0987804Z args@entry=, locals=0x0, 2023-01-11T23:33:21.0988268Z locals@entry=, con=0x7f973f5b95b0, 2023-01-11T23:33:21.0988680Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.0989079Z tstate@entry=) 2023-01-11T23:33:21.0989517Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.0989781Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:21.0990199Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.0993477Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.0993806Z nargsf=, args=0x6f8f008, callable=0x7f973f5b95a0, 2023-01-11T23:33:21.0994167Z tstate=0x1f34bf0) 2023-01-11T23:33:21.0994529Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0994858Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6f8f008, 2023-01-11T23:33:21.0995124Z callable=0x7f973f5b95a0) 2023-01-11T23:33:21.0995455Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.0996358Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.0996967Z pp_stack=, trace_info=0x7fff19e10750, 2023-01-11T23:33:21.0997384Z tstate=) 2023-01-11T23:33:21.0997901Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.0998291Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.0998896Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.0999324Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.0999802Z throwflag=, 2023-01-11T23:33:21.1000365Z f=, 2023-01-11T23:33:21.1000918Z tstate=) 2023-01-11T23:33:21.1001518Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1003821Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1004385Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1005067Z args@entry=, locals=0x0, 2023-01-11T23:33:21.1005677Z locals@entry=, con=0x7f973f5b9f40, 2023-01-11T23:33:21.1006319Z con@entry=, tstate=0x1f34bf0, 
2023-01-11T23:33:21.1006907Z tstate@entry=) 2023-01-11T23:33:21.1007529Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1007925Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1008386Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1010165Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.1010616Z nargsf=, args=0x1f727b0, callable=0x7f973f5b9f30, 2023-01-11T23:33:21.1010985Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1011491Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1013282Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1f727b0, 2023-01-11T23:33:21.1013679Z callable=0x7f973f5b9f30) 2023-01-11T23:33:21.1014194Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1015074Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.1015496Z pp_stack=, trace_info=0x7fff19e10910, 2023-01-11T23:33:21.1015878Z tstate=) 2023-01-11T23:33:21.1016348Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1016734Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1017229Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:21.1018073Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1018569Z throwflag=, 2023-01-11T23:33:21.1019144Z f=, 2023-01-11T23:33:21.1019689Z tstate=) 2023-01-11T23:33:21.1020443Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1023804Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1024393Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1025044Z args@entry=, locals=0x0, 2023-01-11T23:33:21.1025653Z locals@entry=, con=0x7f973f43ade0, 2023-01-11T23:33:21.1026287Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.1027015Z tstate@entry=) 2023-01-11T23:33:21.1027643Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1028021Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1028509Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1029373Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:21.1029820Z nargsf=, args=0x7f973f700200, callable=0x7f973f43add0, 2023-01-11T23:33:21.1030281Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1030779Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1033758Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:21.1034164Z args=0x7f973f700200, callable=0x7f973f43add0) 2023-01-11T23:33:21.1034707Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1035163Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:21.1035559Z pp_stack=, trace_info=0x7fff19e10ad0, 2023-01-11T23:33:21.1035931Z tstate=) 2023-01-11T23:33:21.1036417Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1036807Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1037293Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:21.1037702Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1038191Z throwflag=, 2023-01-11T23:33:21.1038725Z f=, 2023-01-11T23:33:21.1039282Z tstate=) 2023-01-11T23:33:21.1039873Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:21.1042675Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:21.1043248Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:21.1043922Z 
args@entry=, locals=0x0, 2023-01-11T23:33:21.1044570Z locals@entry=, con=0x7f973f43ad50, 2023-01-11T23:33:21.1045200Z con@entry=, tstate=0x1f34bf0, 2023-01-11T23:33:21.1045775Z tstate@entry=) 2023-01-11T23:33:21.1046408Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1046819Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:21.1047295Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:21.1047818Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f973f6d9e00, 2023-01-11T23:33:21.1048470Z nargsf=, args=, callable=0x7f973f43ad40, 2023-01-11T23:33:21.1048856Z tstate=0x1f34bf0) 2023-01-11T23:33:21.1049339Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1049842Z #75 PyObject_Vectorcall (kwnames=0x7f973f6d9e00, nargsf=, 2023-01-11T23:33:21.1050276Z args=, callable=0x7f973f43ad40) 2023-01-11T23:33:21.1050798Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:21.1052263Z #76 call_function (kwnames=0x7f973f6d9e00, oparg=, 2023-01-11T23:33:21.1052700Z pp_stack=, trace_info=0x7fff19e10c90, 2023-01-11T23:33:21.1053064Z tstate=) 2023-01-11T23:33:21.1053649Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:21.1054035Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:21.1054801Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:21.1055567Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:21.1056042Z throwflag=, 2023-01-11T23:33:21.1056623Z f=, 2023-01-11T23:33:21.1057177Z tstate=) 2023-01-11T23:33:21.1057872Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:21.1058294Z #79 _PyEval_Vector () 2023-01-11T23:33:21.1058754Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:21.1063596Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f973f69c920, 2023-01-11T23:33:21.1064057Z globals=globals@entry=0x7f973f68e600, locals=locals@entry=0x7f973f68e600) 2023-01-11T23:33:21.1064617Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:21.1065016Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:21.1065509Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:21.1068547Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:21.1069057Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:21.1072391Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:21.1072970Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:21.1073602Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:21.1074040Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:21.1076826Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:21.1077492Z command=) 2023-01-11T23:33:21.1078133Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:21.1078568Z #86 pymain_run_python (exitcode=0x7fff19e10ef0) 2023-01-11T23:33:21.1079079Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:21.1079461Z #87 Py_RunMain.localalias () 2023-01-11T23:33:21.1079931Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:21.1096650Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:21.1097045Z 
argv=<optimized out>) 2023-01-11T23:33:21.1097569Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:21.1101609Z #89 0x00007f973f76ec87 in __libc_start_main (main=0x587be0 <main>
, argc=5, 2023-01-11T23:33:21.1101990Z argv=0x7fff19e110f8, init=<optimized out>, fini=<optimized out>, 2023-01-11T23:33:21.1102374Z rtld_fini=<optimized out>, stack_end=0x7fff19e110e8) 2023-01-11T23:33:21.1102793Z at ../csu/libc-start.c:310 2023-01-11T23:33:21.1103273Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:21.1103750Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:21.2770024Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:21.2770459Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:21.2770896Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:21.2771341Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:21.2771732Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:21.2772020Z and "show warranty" for details. 2023-01-11T23:33:21.2772317Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:21.2772593Z Type "show configuration" for configuration details. 2023-01-11T23:33:21.2773128Z For bug reporting instructions, please see: 2023-01-11T23:33:21.2773398Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:21.2773687Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:21.2773989Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:21.2774225Z For help, type "help". 2023-01-11T23:33:21.2774764Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:21.3484884Z Reading symbols from python...done. 2023-01-11T23:33:21.8763435Z 2023-01-11T23:33:21.8764499Z warning: core file may not match specified executable file. 2023-01-11T23:33:21.8783793Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8784250Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8784695Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8785221Z [New LWP 8833] 2023-01-11T23:33:21.8785547Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8785922Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8786303Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8786570Z [New LWP 8850] 2023-01-11T23:33:21.8786753Z [New LWP 8842] 2023-01-11T23:33:21.8786935Z [New LWP 8848] 2023-01-11T23:33:21.8787118Z [New LWP 8860] 2023-01-11T23:33:21.8787293Z [New LWP 8840] 2023-01-11T23:33:21.8787476Z [New LWP 8859] 2023-01-11T23:33:21.8787657Z [New LWP 8858] 2023-01-11T23:33:21.8787830Z [New LWP 8857] 2023-01-11T23:33:21.8788010Z [New LWP 8846] 2023-01-11T23:33:21.8788191Z [New LWP 8864] 2023-01-11T23:33:21.8788363Z [New LWP 8844] 2023-01-11T23:33:21.8788542Z [New LWP 8862] 2023-01-11T23:33:21.8788722Z [New LWP 8863] 2023-01-11T23:33:21.8788895Z [New LWP 8865] 2023-01-11T23:33:21.8789079Z [New LWP 8861] 2023-01-11T23:33:21.8789258Z [New LWP 8872] 2023-01-11T23:33:21.8794638Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8795255Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8795805Z BFD: warning: 
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:21.8796346Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:21.8967287Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:21.8968047Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:27.3697658Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:27.3699085Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:27.3700625Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:27.3701378Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:27.3701986Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:27.3702566Z [Current thread is 1 (Thread 0x7f494eef1080 (LWP 8833))] 2023-01-11T23:33:27.3726613Z To enable execution of this file add 2023-01-11T23:33:27.3727124Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:27.3727534Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:27.3727864Z To completely disable this security protection add 2023-01-11T23:33:27.3728157Z set auto-load safe-path / 2023-01-11T23:33:27.3728524Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:27.3728825Z For more information about this security protection see the 2023-01-11T23:33:27.3729195Z "Auto-loading safe path" section in the GDB manual. 
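(Aside: the backtraces above and below all bottom out the same way: frame #4 is ctypes' string_at (ptr=0x0, size=-1), which runs strlen on a NULL pointer (frame #3, __strlen_sse2, reached through ffi_call_unix64 in frame #5), and PyTorch's handler_SIGSEGV in libtorch_python.so (frame #1) then re-raises the signal so the worker dumps core. A minimal sketch of that failure mode — an illustration only, not necessarily the code the test actually ran:

    # crash_sketch.py -- reproduces frames #3-#5: ffi_call_unix64 -> string_at(0x0, -1) -> strlen(NULL)
    import ctypes
    ctypes.string_at(0)  # ptr=0x0 with the default size=-1 => SIGSEGV inside strlen

Executed inside a worker process, this produces exactly the kind of core file GDB is dissecting in this log.)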
E.g., run from the shell: 2023-01-11T23:33:27.3729517Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:27.3760834Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:27.3762105Z #1 0x00007f4915f0d65b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:27.3762669Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:27.3771599Z #2 2023-01-11T23:33:27.3771964Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:27.3908293Z #4 0x00007f494d9af03b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:27.3908677Z #5 0x00007f494eda6052 in ffi_call_unix64 () 2023-01-11T23:33:27.3909065Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:27.3915308Z #6 0x00007f494eda48cd in ffi_call_int () 2023-01-11T23:33:27.3915799Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:27.3921301Z #7 0x00007f494d9b7879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:27.3921774Z resmem=0x7ffd99b0c510, restype=, atypes=, 2023-01-11T23:33:27.3922200Z avalues=, pProc=0x7f494d9af002 , flags=4357) 2023-01-11T23:33:27.3922634Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:27.3922987Z #8 _ctypes_callproc () 2023-01-11T23:33:27.3923418Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:27.3923722Z #9 0x00007f494d9b73fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:27.3924340Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:27.3925654Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:27.3928775Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:27.3929362Z kwnames=, 2023-01-11T23:33:27.3929934Z nargsf=, args=0x7f485d457418, 2023-01-11T23:33:27.3930349Z callable=, 2023-01-11T23:33:27.3930723Z tstate=) 2023-01-11T23:33:27.3931180Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:27.3931774Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3932175Z args=0x7f485d457418, callable=0x7f494dc68880, tstate=) 2023-01-11T23:33:27.3932609Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:27.3935373Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3935764Z args=0x7f485d457418, callable=0x7f494dc68880) 2023-01-11T23:33:27.3936458Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3938748Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3939161Z pp_stack=, trace_info=0x7ffd99b0c820, 2023-01-11T23:33:27.3939488Z tstate=) 2023-01-11T23:33:27.3939934Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3940295Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3940713Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.3941005Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3941326Z throwflag=, 2023-01-11T23:33:27.3941809Z f=, 2023-01-11T23:33:27.3942178Z tstate=) 2023-01-11T23:33:27.3942573Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3946246Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.3946778Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.3947216Z args@entry=, locals=0x0, 2023-01-11T23:33:27.3947636Z locals@entry=, con=0x7f494dc27890, 2023-01-11T23:33:27.3948054Z con@entry=, tstate=0x135abf0, 
2023-01-11T23:33:27.3948455Z tstate@entry=) 2023-01-11T23:33:27.3948889Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3949153Z #18 _PyFunction_Vectorcall () 2023-01-11T23:33:27.3949469Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3952173Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.3952596Z nargsf=, args=0x7f485d457280, callable=0x7f494dc27880, 2023-01-11T23:33:27.3952861Z tstate=0x135abf0) 2023-01-11T23:33:27.3953289Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3953839Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3954168Z args=0x7f485d457280, callable=0x7f494dc27880) 2023-01-11T23:33:27.3954601Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3956329Z #21 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3956695Z pp_stack=, trace_info=0x7ffd99b0c9e0, 2023-01-11T23:33:27.3956998Z tstate=) 2023-01-11T23:33:27.3957334Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3957628Z #22 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3958042Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:27.3958489Z #23 0x0000000000543a33 in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3958888Z throwflag=, 2023-01-11T23:33:27.3959295Z f=, 2023-01-11T23:33:27.3959695Z tstate=) 2023-01-11T23:33:27.3960240Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3963118Z #24 _PyEval_Vector (kwnames=0x0, 2023-01-11T23:33:27.3963637Z argcount=, 2023-01-11T23:33:27.3964256Z args=, locals=0x0, con=0x7f485d269ac0, tstate=0x135abf0) 2023-01-11T23:33:27.3965002Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3965574Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3965866Z stack=, func=0x7f485d269ab0) 2023-01-11T23:33:27.3966522Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3970146Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3970580Z args=, callable=0x7f485d269ab0, tstate=0x135abf0) 2023-01-11T23:33:27.3971179Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114 2023-01-11T23:33:27.3971442Z #27 vectorcall_unbound ( 2023-01-11T23:33:27.3971770Z nargs=, args=, 2023-01-11T23:33:27.3972168Z func=, 2023-01-11T23:33:27.3972601Z unbound=, 2023-01-11T23:33:27.3973124Z tstate=) 2023-01-11T23:33:27.3973664Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629 2023-01-11T23:33:27.3974010Z #28 vectorcall_method () 2023-01-11T23:33:27.3974354Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661 2023-01-11T23:33:27.3974921Z #29 0x0000000000543898 in slot_mp_subscript (self=, 2023-01-11T23:33:27.3975178Z arg1=) 2023-01-11T23:33:27.3975546Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258 2023-01-11T23:33:27.3975831Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3976280Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109 2023-01-11T23:33:27.3978309Z #31 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3978636Z throwflag=, 2023-01-11T23:33:27.3979014Z f=, 2023-01-11T23:33:27.3979386Z tstate=) 2023-01-11T23:33:27.3979791Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.3982859Z #32 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.3983472Z kwnames@entry=, argcount=, args=, 
2023-01-11T23:33:27.3983918Z args@entry=, locals=0x0, 2023-01-11T23:33:27.3984339Z locals@entry=, con=0x7f494dc255b0, 2023-01-11T23:33:27.3984751Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.3985150Z tstate@entry=) 2023-01-11T23:33:27.3985570Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.3985831Z #33 _PyFunction_Vectorcall () 2023-01-11T23:33:27.3986138Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.3988355Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.3988860Z nargsf=, args=0x7f485d45ea48, callable=0x7f494dc255a0, 2023-01-11T23:33:27.3989128Z tstate=0x135abf0) 2023-01-11T23:33:27.3989474Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3992631Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.3993022Z args=0x7f485d45ea48, callable=0x7f494dc255a0) 2023-01-11T23:33:27.3993486Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.3993790Z #36 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.3994084Z pp_stack=, trace_info=0x7ffd99b0cde0, 2023-01-11T23:33:27.3994336Z tstate=) 2023-01-11T23:33:27.3994649Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.3994915Z #37 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.3995235Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.3996402Z #38 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.3996881Z throwflag=, 2023-01-11T23:33:27.3997292Z f=, 2023-01-11T23:33:27.3997677Z tstate=) 2023-01-11T23:33:27.3998190Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4000106Z #39 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4000726Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4001346Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4001906Z locals@entry=, con=0x7f4866255370, 2023-01-11T23:33:27.4002361Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4002765Z tstate@entry=) 2023-01-11T23:33:27.4003226Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4003527Z #40 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4003977Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4004982Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4005375Z nargsf=, args=0x6372c38, callable=0x7f4866255360, 2023-01-11T23:33:27.4005680Z tstate=0x135abf0) 2023-01-11T23:33:27.4006052Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4007672Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6372c38, 2023-01-11T23:33:27.4008041Z callable=0x7f4866255360) 2023-01-11T23:33:27.4008487Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4009210Z #43 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4009603Z pp_stack=, trace_info=0x7ffd99b0cfa0, 2023-01-11T23:33:27.4009931Z tstate=) 2023-01-11T23:33:27.4010313Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4010575Z #44 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4010893Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4011362Z #45 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4011774Z throwflag=, 2023-01-11T23:33:27.4012267Z f=, 2023-01-11T23:33:27.4012728Z tstate=) 2023-01-11T23:33:27.4013138Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4016701Z #46 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4017202Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4017771Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4018215Z locals@entry=, con=0x7f4866254680, 2023-01-11T23:33:27.4018618Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4019078Z tstate@entry=) 2023-01-11T23:33:27.4019596Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4019852Z #47 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4020175Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4022112Z #48 0x00000000004f141c in do_call_core (kwdict=0x7f494dd72780, 2023-01-11T23:33:27.4022752Z callargs=0x7f485d211e40, func=0x7f4866254670, trace_info=0x7ffd99b0d160, 2023-01-11T23:33:27.4023235Z tstate=) 2023-01-11T23:33:27.4023748Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:27.4024100Z #49 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4024476Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:27.4024772Z #50 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4025095Z throwflag=, 2023-01-11T23:33:27.4025471Z f=, 2023-01-11T23:33:27.4025852Z tstate=) 2023-01-11T23:33:27.4026250Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4029279Z #51 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4030125Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4030830Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4031452Z locals@entry=, con=0x7f494dd9d5b0, 2023-01-11T23:33:27.4032081Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4032687Z tstate@entry=) 2023-01-11T23:33:27.4033349Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4033735Z #52 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4034222Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4034927Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4035374Z nargsf=, args=0x63b42a8, callable=0x7f494dd9d5a0, 2023-01-11T23:33:27.4035634Z tstate=0x135abf0) 2023-01-11T23:33:27.4035987Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4038193Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x63b42a8, 2023-01-11T23:33:27.4038568Z callable=0x7f494dd9d5a0) 2023-01-11T23:33:27.4038995Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4039970Z #55 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4040275Z pp_stack=, trace_info=0x7ffd99b0d320, 2023-01-11T23:33:27.4040597Z tstate=) 2023-01-11T23:33:27.4041178Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4041539Z #56 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4041998Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4042368Z #57 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4042692Z throwflag=, 2023-01-11T23:33:27.4043074Z f=, 2023-01-11T23:33:27.4043447Z tstate=) 2023-01-11T23:33:27.4043842Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4047136Z #58 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4047585Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4048166Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4048602Z locals@entry=, con=0x7f494dd9df40, 2023-01-11T23:33:27.4049127Z con@entry=, tstate=0x135abf0, 
2023-01-11T23:33:27.4049602Z tstate@entry=) 2023-01-11T23:33:27.4050027Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4050276Z #59 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4050596Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4051928Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4052318Z nargsf=, args=0x13d7510, callable=0x7f494dd9df30, 2023-01-11T23:33:27.4052638Z tstate=0x135abf0) 2023-01-11T23:33:27.4053005Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4055047Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x13d7510, 2023-01-11T23:33:27.4055408Z callable=0x7f494dd9df30) 2023-01-11T23:33:27.4055831Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4056920Z #62 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4057312Z pp_stack=, trace_info=0x7ffd99b0d4e0, 2023-01-11T23:33:27.4057627Z tstate=) 2023-01-11T23:33:27.4058046Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4058387Z #63 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4058818Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:27.4059183Z #64 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4059595Z throwflag=, 2023-01-11T23:33:27.4059973Z f=, 2023-01-11T23:33:27.4060334Z tstate=) 2023-01-11T23:33:27.4060734Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4063511Z #65 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4063922Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4064449Z args@entry=, locals=0x0, 2023-01-11T23:33:27.4064875Z locals@entry=, con=0x7f494dc1ede0, 2023-01-11T23:33:27.4065414Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4065814Z tstate@entry=) 2023-01-11T23:33:27.4066245Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4066510Z #66 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4066833Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4069032Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:27.4069362Z nargsf=, args=0x7f494dee8200, callable=0x7f494dc1edd0, 2023-01-11T23:33:27.4069697Z tstate=0x135abf0) 2023-01-11T23:33:27.4070185Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4072716Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:27.4073075Z args=0x7f494dee8200, callable=0x7f494dc1edd0) 2023-01-11T23:33:27.4073495Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4073783Z #69 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:27.4074282Z pp_stack=, trace_info=0x7ffd99b0d6a0, 2023-01-11T23:33:27.4074655Z tstate=) 2023-01-11T23:33:27.4075055Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4075314Z #70 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4075666Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:27.4076043Z #71 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4076482Z throwflag=, 2023-01-11T23:33:27.4076938Z f=, 2023-01-11T23:33:27.4077312Z tstate=) 2023-01-11T23:33:27.4077724Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:27.4080254Z #72 _PyEval_Vector (kwnames=, 2023-01-11T23:33:27.4080806Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:27.4081369Z 
args@entry=, locals=0x0, 2023-01-11T23:33:27.4081791Z locals@entry=, con=0x7f494dc1ed50, 2023-01-11T23:33:27.4082200Z con@entry=, tstate=0x135abf0, 2023-01-11T23:33:27.4082603Z tstate@entry=) 2023-01-11T23:33:27.4083015Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4083279Z #73 _PyFunction_Vectorcall () 2023-01-11T23:33:27.4083589Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:27.4085272Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f494dec1e00, 2023-01-11T23:33:27.4085604Z nargsf=, args=, callable=0x7f494dc1ed40, 2023-01-11T23:33:27.4085924Z tstate=0x135abf0) 2023-01-11T23:33:27.4086246Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4087404Z #75 PyObject_Vectorcall (kwnames=0x7f494dec1e00, nargsf=, 2023-01-11T23:33:27.4087759Z args=, callable=0x7f494dc1ed40) 2023-01-11T23:33:27.4088154Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:27.4089318Z #76 call_function (kwnames=0x7f494dec1e00, oparg=, 2023-01-11T23:33:27.4089696Z pp_stack=, trace_info=0x7ffd99b0d860, 2023-01-11T23:33:27.4090138Z tstate=) 2023-01-11T23:33:27.4090471Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:27.4090739Z #77 _PyEval_EvalFrameDefault () 2023-01-11T23:33:27.4091061Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:27.4091802Z #78 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:27.4092260Z throwflag=, 2023-01-11T23:33:27.4092664Z f=, 2023-01-11T23:33:27.4093044Z tstate=) 2023-01-11T23:33:27.4093623Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:27.4093920Z #79 _PyEval_Vector () 2023-01-11T23:33:27.4094268Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:27.4098499Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f494de84920, 2023-01-11T23:33:27.4098814Z globals=globals@entry=0x7f494de76600, locals=locals@entry=0x7f494de76600) 2023-01-11T23:33:27.4099283Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:27.4099988Z #81 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:27.4100369Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:27.4102675Z #82 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:27.4103027Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:27.4105087Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:27.4105465Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:27.4107620Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:27.4108021Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:27.4110822Z #85 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:27.4112314Z command=) 2023-01-11T23:33:27.4112873Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:27.4113212Z #86 pymain_run_python (exitcode=0x7ffd99b0dac0) 2023-01-11T23:33:27.4113561Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:27.4113820Z #87 Py_RunMain.localalias () 2023-01-11T23:33:27.4114164Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:27.4129408Z #88 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:27.4129685Z 
argv=<optimized out>) 2023-01-11T23:33:27.4130027Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:27.4134205Z #89 0x00007f494df55c87 in __libc_start_main (main=0x587be0 <main>
, argc=5, 2023-01-11T23:33:27.4135449Z argv=0x7ffd99b0dcc8, init=<optimized out>, fini=<optimized out>, 2023-01-11T23:33:27.4136163Z rtld_fini=<optimized out>, stack_end=0x7ffd99b0dcb8) 2023-01-11T23:33:27.4136605Z at ../csu/libc-start.c:310 2023-01-11T23:33:27.4136905Z #90 0x0000000000587ade in _start () 2023-01-11T23:33:27.4137335Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:27.5657320Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:27.5657758Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:27.5658111Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:27.5658429Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:27.5658763Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:27.5659036Z and "show warranty" for details. 2023-01-11T23:33:27.5659563Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:27.5659831Z Type "show configuration" for configuration details. 2023-01-11T23:33:27.5660101Z For bug reporting instructions, please see: 2023-01-11T23:33:27.5660352Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:27.5660634Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:27.5660926Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:27.5661162Z For help, type "help". 2023-01-11T23:33:27.5661417Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:27.6368492Z Reading symbols from python...done. 2023-01-11T23:33:28.1652825Z 2023-01-11T23:33:28.1653312Z warning: core file may not match specified executable file. 2023-01-11T23:33:28.1675785Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1676162Z [New LWP 8834] 2023-01-11T23:33:28.1676430Z [New LWP 8852] 2023-01-11T23:33:28.1676672Z [New LWP 8843] 2023-01-11T23:33:28.1676906Z [New LWP 8853] 2023-01-11T23:33:28.1677205Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1677669Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1678247Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1678706Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1679073Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1679330Z [New LWP 8849] 2023-01-11T23:33:28.1679518Z [New LWP 8855] 2023-01-11T23:33:28.1679706Z [New LWP 8845] 2023-01-11T23:33:28.1679888Z [New LWP 8851] 2023-01-11T23:33:28.1680072Z [New LWP 8856] 2023-01-11T23:33:28.1680255Z [New LWP 8854] 2023-01-11T23:33:28.1680428Z [New LWP 8841] 2023-01-11T23:33:28.1680611Z [New LWP 8847] 2023-01-11T23:33:28.1680795Z [New LWP 8868] 2023-01-11T23:33:28.1680968Z [New LWP 8873] 2023-01-11T23:33:28.1681148Z [New LWP 8866] 2023-01-11T23:33:28.1681331Z [New LWP 8869] 2023-01-11T23:33:28.1681503Z [New LWP 8867] 2023-01-11T23:33:28.1687936Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1688673Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1689233Z BFD: warning: 
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:28.1689774Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:28.1857092Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:28.1858109Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:33.6833990Z 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. 2023-01-11T23:33:33.6835141Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:33.6836264Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:33.6836945Z Program terminated with signal SIGSEGV, Segmentation fault. 2023-01-11T23:33:33.6837274Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:33.6837563Z [Current thread is 1 (Thread 0x7fb3f30a2080 (LWP 8834))] 2023-01-11T23:33:33.6864579Z To enable execution of this file add 2023-01-11T23:33:33.6865563Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:33.6866359Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:33.6867217Z To completely disable this security protection add 2023-01-11T23:33:33.6867506Z set auto-load safe-path / 2023-01-11T23:33:33.6867761Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:33.6868055Z For more information about this security protection see the 2023-01-11T23:33:33.6868423Z "Auto-loading safe path" section in the GDB manual. 
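(Aside: every "Core was generated by" line in this log shows the same truncated command, python -bb -c "from multiprocessing.spawn import spawn_main; spaw…". That is the standard command line CPython's "spawn" start method uses to boot a worker, so each core belongs to a child test worker rather than the main test runner. A sketch of how such workers come to exist — standard-library behaviour, not PyTorch code taken from this log:

    # spawn_sketch.py -- the "spawn" context launches each child as a fresh
    # python -c "from multiprocessing.spawn import spawn_main; spawn_main(...)",
    # matching the truncated command GDB reports for every core here.
    import multiprocessing as mp

    def child():
        pass  # the real workers crashed inside a ctypes call (see the backtraces)

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        p = ctx.Process(target=child)
        p.start()
        p.join()

)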
E.g., run from the shell: 2023-01-11T23:33:33.6868739Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:33.6895504Z #0 raise (sig=) at ../sysdeps/unix/sysv/linux/raise.c:51 2023-01-11T23:33:33.6896310Z #1 0x00007fb3ba0be65b in handler_SIGSEGV(int, siginfo_t*, void*) () 2023-01-11T23:33:33.6897125Z from /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch_python.so 2023-01-11T23:33:33.6906155Z #2 2023-01-11T23:33:33.6906498Z #3 __strlen_sse2 () at ../sysdeps/x86_64/multiarch/../strlen.S:120 2023-01-11T23:33:33.7040943Z #4 0x00007fb3f1b6003b in string_at (ptr=0x0, size=-1) at :5564 2023-01-11T23:33:33.7041252Z #5 0x00007fb3f2f57052 in ffi_call_unix64 () 2023-01-11T23:33:33.7041625Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:33.7047230Z #6 0x00007fb3f2f558cd in ffi_call_int () 2023-01-11T23:33:33.7048332Z from /opt/conda/lib/python3.10/lib-dynload/../../libffi.so.8 2023-01-11T23:33:33.7052052Z #7 0x00007fb3f1b68879 in _call_function_pointer (argtypecount=2, argcount=2, 2023-01-11T23:33:33.7052611Z resmem=0x7ffc09d87bc0, restype=, atypes=, 2023-01-11T23:33:33.7053523Z avalues=, pProc=0x7fb3f1b60002 , flags=4357) 2023-01-11T23:33:33.7054360Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:916 2023-01-11T23:33:33.7055352Z #8 _ctypes_callproc () 2023-01-11T23:33:33.7056013Z at /usr/local/src/conda/python-3.10.8/build-static/stgdict.c:1259 2023-01-11T23:33:33.7056319Z #9 0x00007fb3f1b683fe in PyCFuncPtr_call () at :4201 2023-01-11T23:33:33.7056626Z #10 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:33.7056972Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:33.7060315Z #11 0x00000000004f37ae in _PyObject_VectorcallTstate ( 2023-01-11T23:33:33.7060782Z kwnames=, 2023-01-11T23:33:33.7061308Z nargsf=, args=0x7fb3015af418, 2023-01-11T23:33:33.7061825Z callable=, 2023-01-11T23:33:33.7062319Z tstate=) 2023-01-11T23:33:33.7062795Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:33.7063116Z #12 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=, 2023-01-11T23:33:33.7063430Z args=0x7fb3015af418, callable=0x7fb3f1e1c880, tstate=) 2023-01-11T23:33:33.7063786Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:33.7064787Z #13 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:33.7065157Z args=0x7fb3015af418, callable=0x7fb3f1e1c880) 2023-01-11T23:33:33.7065524Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:33.7068199Z #14 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:33.7068519Z pp_stack=, trace_info=0x7ffc09d87ed0, 2023-01-11T23:33:33.7068830Z tstate=) 2023-01-11T23:33:33.7069300Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:33.7069614Z #15 _PyEval_EvalFrameDefault () 2023-01-11T23:33:33.7070005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:33.7070274Z #16 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:33.7070869Z throwflag=, 2023-01-11T23:33:33.7071249Z f=, 2023-01-11T23:33:33.7071618Z tstate=) 2023-01-11T23:33:33.7072006Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:33.7074812Z #17 _PyEval_Vector (kwnames=, 2023-01-11T23:33:33.7075213Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:33.7075848Z args@entry=, locals=0x0, 2023-01-11T23:33:33.7076418Z locals@entry=, con=0x7fb3f1ddb890, 2023-01-11T23:33:33.7076845Z con@entry=, tstate=0x1fb3bf0, 
2023-01-11T23:33:33.7077249Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7077684Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7077938Z #18 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7078259Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7080328Z #19 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7080726Z nargsf=<optimized out>, args=0x7fb3015af280, callable=0x7fb3f1ddb880,
2023-01-11T23:33:33.7081046Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7081454Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7082061Z #20 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7082445Z args=0x7fb3015af280, callable=0x7fb3f1ddb880)
2023-01-11T23:33:33.7082906Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7083543Z #21 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7083929Z pp_stack=<optimized out>, trace_info=0x7ffc09d88090,
2023-01-11T23:33:33.7084258Z tstate=<optimized out>)
2023-01-11T23:33:33.7084680Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7084945Z #22 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7085264Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:33.7085759Z #23 0x0000000000543a33 in _PyEval_EvalFrame (
2023-01-11T23:33:33.7086172Z throwflag=<optimized out>,
2023-01-11T23:33:33.7086625Z f=<optimized out>,
2023-01-11T23:33:33.7086990Z tstate=<optimized out>)
2023-01-11T23:33:33.7087390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7090041Z #24 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:33.7090482Z argcount=<optimized out>,
2023-01-11T23:33:33.7091030Z args=<optimized out>, locals=0x0, con=0x7fb3013c1ac0, tstate=0x1fb3bf0)
2023-01-11T23:33:33.7091575Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7091880Z #25 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7092155Z stack=<optimized out>, func=0x7fb3013c1ab0)
2023-01-11T23:33:33.7092492Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7095244Z #26 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7095814Z args=<optimized out>, callable=0x7fb3013c1ab0, tstate=0x1fb3bf0)
2023-01-11T23:33:33.7096334Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:33.7096676Z #27 vectorcall_unbound (
2023-01-11T23:33:33.7097093Z nargs=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7097607Z func=<optimized out>,
2023-01-11T23:33:33.7098035Z unbound=<optimized out>,
2023-01-11T23:33:33.7098452Z tstate=<optimized out>)
2023-01-11T23:33:33.7099077Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1629
2023-01-11T23:33:33.7099419Z #28 vectorcall_method ()
2023-01-11T23:33:33.7099811Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:1661
2023-01-11T23:33:33.7100116Z #29 0x0000000000543898 in slot_mp_subscript (self=<optimized out>,
2023-01-11T23:33:33.7100359Z arg1=<optimized out>)
2023-01-11T23:33:33.7100664Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7258
2023-01-11T23:33:33.7100946Z #30 0x00000000004ef56e in _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7101280Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:2109
2023-01-11T23:33:33.7101931Z #31 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7102370Z throwflag=<optimized out>,
2023-01-11T23:33:33.7102866Z f=<optimized out>,
2023-01-11T23:33:33.7103293Z tstate=<optimized out>)
2023-01-11T23:33:33.7103685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7106136Z #32 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7106680Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7107278Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7107736Z locals@entry=<optimized out>, con=0x7fb3f1dd95b0,
2023-01-11T23:33:33.7108149Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7108542Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7108973Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7109223Z #33 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7109535Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7112055Z #34 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7112471Z nargsf=<optimized out>, args=0x7fb3015b6a48, callable=0x7fb3f1dd95a0,
2023-01-11T23:33:33.7112790Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7113231Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7113644Z #35 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7113986Z args=0x7fb3015b6a48, callable=0x7fb3f1dd95a0)
2023-01-11T23:33:33.7114351Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7115067Z #36 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7115456Z pp_stack=<optimized out>, trace_info=0x7ffc09d88490,
2023-01-11T23:33:33.7115773Z tstate=<optimized out>)
2023-01-11T23:33:33.7116178Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7116444Z #37 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7116884Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:33.7117516Z #38 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7117947Z throwflag=<optimized out>,
2023-01-11T23:33:33.7118469Z f=<optimized out>,
2023-01-11T23:33:33.7118922Z tstate=<optimized out>)
2023-01-11T23:33:33.7119354Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7121702Z #39 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7122349Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7122976Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7123499Z locals@entry=<optimized out>, con=0x7fb30a3dd370,
2023-01-11T23:33:33.7123908Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7124307Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7124739Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7124999Z #40 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7125306Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7126601Z #41 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7127020Z nargsf=<optimized out>, args=0x2009fa8, callable=0x7fb30a3dd360,
2023-01-11T23:33:33.7127350Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7127768Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7128559Z #42 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2009fa8,
2023-01-11T23:33:33.7128919Z callable=0x7fb30a3dd360)
2023-01-11T23:33:33.7129337Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7130822Z #43 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7131203Z pp_stack=<optimized out>, trace_info=0x7ffc09d88650,
2023-01-11T23:33:33.7131532Z tstate=<optimized out>)
2023-01-11T23:33:33.7131956Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7132302Z #44 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7132727Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7133008Z #45 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7133334Z throwflag=<optimized out>,
2023-01-11T23:33:33.7133712Z f=<optimized out>,
2023-01-11T23:33:33.7134084Z tstate=<optimized out>)
2023-01-11T23:33:33.7134699Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7137152Z #46 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7137666Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7138254Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7138741Z locals@entry=<optimized out>, con=0x7fb30a3dc680,
2023-01-11T23:33:33.7139161Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7139676Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7140109Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7140362Z #47 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7140674Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7141716Z #48 0x00000000004f141c in do_call_core (kwdict=0x7fb3f1f26780,
2023-01-11T23:33:33.7142017Z callargs=0x7fb301569e40, func=0x7fb30a3dc670, trace_info=0x7ffc09d88810,
2023-01-11T23:33:33.7142330Z tstate=<optimized out>)
2023-01-11T23:33:33.7142667Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:33.7142921Z #49 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7143308Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:33.7143612Z #50 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7144025Z throwflag=<optimized out>,
2023-01-11T23:33:33.7144416Z f=<optimized out>,
2023-01-11T23:33:33.7144783Z tstate=<optimized out>)
2023-01-11T23:33:33.7145175Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7147862Z #51 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7148255Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7148826Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7149329Z locals@entry=<optimized out>, con=0x7fb3f1f515b0,
2023-01-11T23:33:33.7149747Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7150232Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7150659Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7150915Z #52 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7151221Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7153071Z #53 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7153468Z nargsf=<optimized out>, args=0x700c638, callable=0x7fb3f1f515a0,
2023-01-11T23:33:33.7153784Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7154175Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7154943Z #54 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x700c638,
2023-01-11T23:33:33.7155302Z callable=0x7fb3f1f515a0)
2023-01-11T23:33:33.7155749Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7156630Z #55 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7157009Z pp_stack=<optimized out>, trace_info=0x7ffc09d889d0,
2023-01-11T23:33:33.7157350Z tstate=<optimized out>)
2023-01-11T23:33:33.7157748Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7158011Z #56 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7158343Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7158701Z #57 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7159098Z throwflag=<optimized out>,
2023-01-11T23:33:33.7159479Z f=<optimized out>,
2023-01-11T23:33:33.7159848Z tstate=<optimized out>)
2023-01-11T23:33:33.7160311Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7162580Z #58 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7163045Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7163626Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7164087Z locals@entry=<optimized out>, con=0x7fb3f1f51f40,
2023-01-11T23:33:33.7164506Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7164954Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7165385Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7165655Z #59 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7166000Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7167351Z #60 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7167718Z nargsf=<optimized out>, args=0x2030510, callable=0x7fb3f1f51f30,
2023-01-11T23:33:33.7168030Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7168423Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7169650Z #61 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x2030510,
2023-01-11T23:33:33.7169973Z callable=0x7fb3f1f51f30)
2023-01-11T23:33:33.7170309Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7171710Z #62 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7172044Z pp_stack=<optimized out>, trace_info=0x7ffc09d88b90,
2023-01-11T23:33:33.7172328Z tstate=<optimized out>)
2023-01-11T23:33:33.7172665Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7172923Z #63 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7173232Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:33.7173510Z #64 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7173886Z throwflag=<optimized out>,
2023-01-11T23:33:33.7174260Z f=<optimized out>,
2023-01-11T23:33:33.7174902Z tstate=<optimized out>)
2023-01-11T23:33:33.7175310Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7177835Z #65 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7178219Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7178742Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7179248Z locals@entry=<optimized out>, con=0x7fb3f1dd2de0,
2023-01-11T23:33:33.7179663Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7180058Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7180454Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7180709Z #66 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7181029Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7183257Z #67 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:33.7183932Z nargsf=<optimized out>, args=0x7fb3f2098200, callable=0x7fb3f1dd2dd0,
2023-01-11T23:33:33.7184495Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7185040Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7185914Z #68 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:33.7186329Z args=0x7fb3f2098200, callable=0x7fb3f1dd2dd0)
2023-01-11T23:33:33.7186859Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7187990Z #69 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:33.7188434Z pp_stack=<optimized out>, trace_info=0x7ffc09d88d50,
2023-01-11T23:33:33.7188792Z tstate=<optimized out>)
2023-01-11T23:33:33.7189291Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7189803Z #70 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7190369Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:33.7191215Z #71 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:33.7192282Z throwflag=<optimized out>,
2023-01-11T23:33:33.7192859Z f=<optimized out>,
2023-01-11T23:33:33.7193444Z tstate=<optimized out>)
2023-01-11T23:33:33.7194041Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:33.7196634Z #72 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:33.7197224Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:33.7197888Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:33.7198521Z locals@entry=<optimized out>, con=0x7fb3f1dd2d50,
2023-01-11T23:33:33.7199141Z con@entry=<optimized out>, tstate=0x1fb3bf0,
2023-01-11T23:33:33.7199741Z tstate@entry=<optimized out>)
2023-01-11T23:33:33.7200368Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7200748Z #73 _PyFunction_Vectorcall ()
2023-01-11T23:33:33.7201219Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:33.7201901Z #74 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7fb3f2071e00,
2023-01-11T23:33:33.7202240Z nargsf=<optimized out>, args=<optimized out>, callable=0x7fb3f1dd2d40,
2023-01-11T23:33:33.7202501Z tstate=0x1fb3bf0)
2023-01-11T23:33:33.7202862Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7203579Z #75 PyObject_Vectorcall (kwnames=0x7fb3f2071e00, nargsf=<optimized out>,
2023-01-11T23:33:33.7203873Z args=<optimized out>, callable=0x7fb3f1dd2d40)
2023-01-11T23:33:33.7204416Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:33.7206272Z #76 call_function (kwnames=0x7fb3f2071e00, oparg=<optimized out>,
2023-01-11T23:33:33.7206714Z pp_stack=<optimized out>, trace_info=0x7ffc09d88f10,
2023-01-11T23:33:33.7207094Z tstate=<optimized out>)
2023-01-11T23:33:33.7207570Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:33.7207936Z #77 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:33.7208430Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:33.7209335Z #78 0x0000000000594b72 in _PyEval_EvalFrame (
2023-01-11T23:33:33.7209818Z throwflag=<optimized out>,
2023-01-11T23:33:33.7210390Z f=<optimized out>,
2023-01-11T23:33:33.7211040Z tstate=<optimized out>)
2023-01-11T23:33:33.7211736Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46
2023-01-11T23:33:33.7212152Z #79 _PyEval_Vector ()
2023-01-11T23:33:33.7212625Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:33.7217419Z #80 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7fb3f2034920,
2023-01-11T23:33:33.7217894Z globals=globals@entry=0x7fb3f2026600, locals=locals@entry=0x7fb3f2026600)
2023-01-11T23:33:33.7218441Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134
2023-01-11T23:33:33.7218833Z #81 0x00000000005c6e57 in run_eval_code_obj ()
2023-01-11T23:33:33.7219447Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291
2023-01-11T23:33:33.7222360Z #82 0x00000000005c1d40 in run_mod ()
2023-01-11T23:33:33.7222874Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312
2023-01-11T23:33:33.7224833Z #83 0x00000000005b9ebb in PyRun_StringFlags.localalias ()
2023-01-11T23:33:33.7225401Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183
2023-01-11T23:33:33.7227880Z #84 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias ()
2023-01-11T23:33:33.7228466Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503
2023-01-11T23:33:33.7232489Z #85 0x00000000005b8d5c in pymain_run_command (
2023-01-11T23:33:33.7232992Z command=<optimized out>)
2023-01-11T23:33:33.7233601Z at /croot/python-split_1669298683653/work/build-static/python.c:252
2023-01-11T23:33:33.7234008Z #86 pymain_run_python (exitcode=0x7ffc09d89170)
2023-01-11T23:33:33.7234525Z at /croot/python-split_1669298683653/work/build-static/python.c:582
2023-01-11T23:33:33.7234903Z #87 Py_RunMain.localalias ()
2023-01-11T23:33:33.7235404Z at /croot/python-split_1669298683653/work/build-static/python.c:670
2023-01-11T23:33:33.7250882Z #88 0x0000000000587c29 in Py_BytesMain (argc=<optimized out>,
2023-01-11T23:33:33.7251267Z argv=<optimized out>)
2023-01-11T23:33:33.7251779Z at /croot/python-split_1669298683653/work/build-static/python.c:1090
2023-01-11T23:33:33.7255882Z #89 0x00007fb3f2106c87 in __libc_start_main (main=0x587be0 <main>, argc=5,
2023-01-11T23:33:33.7256272Z argv=0x7ffc09d89378, init=<optimized out>, fini=<optimized out>,
2023-01-11T23:33:33.7256630Z rtld_fini=<optimized out>, stack_end=0x7ffc09d89368)
2023-01-11T23:33:33.7257063Z at ../csu/libc-start.c:310
2023-01-11T23:33:33.7257378Z #90 0x0000000000587ade in _start ()
2023-01-11T23:33:33.7257855Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880
2023-01-11T23:33:33.8771122Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
2023-01-11T23:33:33.8771726Z Copyright (C) 2018 Free Software Foundation, Inc.
2023-01-11T23:33:33.8772177Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
2023-01-11T23:33:33.8772662Z This is free software: you are free to change and redistribute it.
2023-01-11T23:33:33.8772990Z There is NO WARRANTY, to the extent permitted by law. Type "show copying"
2023-01-11T23:33:33.8773260Z and "show warranty" for details.
2023-01-11T23:33:33.8773577Z This GDB was configured as "x86_64-linux-gnu".
2023-01-11T23:33:33.8773853Z Type "show configuration" for configuration details.
2023-01-11T23:33:33.8774129Z For bug reporting instructions, please see:
2023-01-11T23:33:33.8774397Z <http://www.gnu.org/software/gdb/bugs/>.
2023-01-11T23:33:33.8774934Z Find the GDB manual and other documentation resources online at:
2023-01-11T23:33:33.8775221Z <http://www.gnu.org/software/gdb/documentation/>.
2023-01-11T23:33:33.8775466Z For help, type "help".
2023-01-11T23:33:33.8775731Z Type "apropos word" to search for commands related to "word"...
2023-01-11T23:33:33.9484277Z Reading symbols from python...done.
2023-01-11T23:33:34.4779141Z
2023-01-11T23:33:34.4779877Z warning: core file may not match specified executable file.
2023-01-11T23:33:34.4798979Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4799532Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4799908Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4800186Z [New LWP 19077]
2023-01-11T23:33:34.4800484Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4800850Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4801502Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4809496Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4810185Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4810751Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
2023-01-11T23:33:34.4811293Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
2023-01-11T23:33:34.4991938Z [Thread debugging using libthread_db enabled]
2023-01-11T23:33:34.4993032Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
2023-01-11T23:33:40.4032594Z 78 ../sysdeps/unix/syscall-template.S: No such file or directory.
2023-01-11T23:33:40.4033133Z Core was generated by `/opt/conda/bin/python -bb test_multiprocessing_spawn.py -v --import-slow-tests'.
2023-01-11T23:33:40.4033716Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
2023-01-11T23:33:40.4034098Z Program terminated with signal SIGABRT, Aborted.
2023-01-11T23:33:40.4034464Z #0 0x00007f1c269ce177 in kill () at ../sysdeps/unix/syscall-template.S:78
2023-01-11T23:33:40.4063665Z To enable execution of this file add
2023-01-11T23:33:40.4064840Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit
2023-01-11T23:33:40.4065630Z line to your configuration file "/var/lib/jenkins/.gdbinit".
2023-01-11T23:33:40.4066227Z To completely disable this security protection add
2023-01-11T23:33:40.4066751Z set auto-load safe-path /
2023-01-11T23:33:40.4067215Z line to your configuration file "/var/lib/jenkins/.gdbinit".
2023-01-11T23:33:40.4067758Z For more information about this security protection see the
2023-01-11T23:33:40.4068419Z "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
2023-01-11T23:33:40.4068779Z info "(gdb)Auto-loading safe path"
2023-01-11T23:33:40.4097364Z #0 0x00007f1c269ce177 in kill () at ../sysdeps/unix/syscall-template.S:78
2023-01-11T23:33:40.4098738Z #1 0x00000000004cb0d3 in os_kill_impl (
2023-01-11T23:33:40.4099559Z module=<optimized out>,
2023-01-11T23:33:40.4100419Z signal=<optimized out>,
2023-01-11T23:33:40.4101156Z pid=<optimized out>)
2023-01-11T23:33:40.4102115Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/sys/_iomodule.c:7929
2023-01-11T23:33:40.4106398Z #2 os_kill (module=<optimized out>, args=args@entry=0x7f1b327c5058,
2023-01-11T23:33:40.4106954Z nargs=<optimized out>)
2023-01-11T23:33:40.4107670Z at /usr/local/src/conda/python-3.10.8/Modules/codecs.c:3581
2023-01-11T23:33:40.4109453Z #3 0x00000000004fe7d4 in cfunction_vectorcall_FASTCALL (func=0x7f1c27911a30,
2023-01-11T23:33:40.4110610Z args=0x7f1b327c5058, nargsf=<optimized out>, kwnames=<optimized out>)
2023-01-11T23:33:40.4111605Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_bitutils.h:430
2023-01-11T23:33:40.4115046Z #4 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4115505Z nargsf=<optimized out>, args=0x7f1b327c5058, callable=0x7f1c27911a30,
2023-01-11T23:33:40.4115856Z tstate=0xb34b80)
2023-01-11T23:33:40.4116345Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4117056Z #5 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4117441Z args=0x7f1b327c5058, callable=0x7f1c27911a30)
2023-01-11T23:33:40.4118239Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4119779Z #6 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4120517Z pp_stack=<optimized out>, trace_info=0x7ffc614eb7c0,
2023-01-11T23:33:40.4120911Z tstate=<optimized out>)
2023-01-11T23:33:40.4121763Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4122360Z #7 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4122882Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:40.4123308Z #8 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4123839Z throwflag=<optimized out>,
2023-01-11T23:33:40.4124437Z f=<optimized out>,
2023-01-11T23:33:40.4124808Z tstate=<optimized out>)
2023-01-11T23:33:40.4125213Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4125586Z #9 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4126064Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4126780Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4127339Z locals@entry=<optimized out>, con=0x7f1b35dacd40,
2023-01-11T23:33:40.4127751Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4128178Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4128596Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4128859Z #10 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4129174Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4131067Z #11 0x00000000004f141c in do_call_core (kwdict=0x0, callargs=0x7f1b35d70c10,
2023-01-11T23:33:40.4131621Z func=0x7f1b35dacd30, trace_info=0x7ffc614eb980, tstate=<optimized out>)
2023-01-11T23:33:40.4132044Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4132372Z #12 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4132786Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4133058Z #13 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4133382Z throwflag=<optimized out>,
2023-01-11T23:33:40.4133748Z f=<optimized out>,
2023-01-11T23:33:40.4134122Z tstate=<optimized out>)
2023-01-11T23:33:40.4134929Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4137034Z #14 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4137585Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4148776Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4149330Z locals@entry=<optimized out>, con=0x7f1b3f41b1d0,
2023-01-11T23:33:40.4149963Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4150427Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4151025Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4151286Z #15 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4151607Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4151972Z #16 0x00000000004f141c in do_call_core (kwdict=0x7f1b3f3de800,
2023-01-11T23:33:40.4152300Z callargs=0x7f1b327ae2a0, func=0x7f1b3f41b1c0, trace_info=0x7ffc614ebb40,
2023-01-11T23:33:40.4152549Z tstate=<optimized out>)
2023-01-11T23:33:40.4152916Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4153206Z #17 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4153519Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4153787Z #18 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4154106Z throwflag=<optimized out>,
2023-01-11T23:33:40.4154577Z f=<optimized out>,
2023-01-11T23:33:40.4154944Z tstate=<optimized out>)
2023-01-11T23:33:40.4155427Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4155757Z #19 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4156245Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4156763Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4157175Z locals@entry=<optimized out>, con=0x7f1b3f3c7650,
2023-01-11T23:33:40.4157667Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4158188Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4158723Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4158983Z #20 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4159290Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4159600Z #21 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4159900Z nargsf=<optimized out>, args=0x39ca14b8, callable=0x7f1b3f3c7640,
2023-01-11T23:33:40.4160187Z tstate=0xb34b80)
2023-01-11T23:33:40.4160585Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4160992Z #22 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4161305Z args=0x39ca14b8, callable=0x7f1b3f3c7640)
2023-01-11T23:33:40.4161643Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4161934Z #23 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4162215Z pp_stack=<optimized out>, trace_info=0x7ffc614ebd00,
2023-01-11T23:33:40.4162462Z tstate=<optimized out>)
2023-01-11T23:33:40.4162764Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4163097Z #24 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4163414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4163685Z #25 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4164003Z throwflag=<optimized out>,
2023-01-11T23:33:40.4164375Z f=<optimized out>,
2023-01-11T23:33:40.4164746Z tstate=<optimized out>)
2023-01-11T23:33:40.4165125Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4165500Z #26 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4165864Z args=0x7f1b3325f9d8, locals=0x0, con=0x7f1b3f3f0050, tstate=0xb34b80)
2023-01-11T23:33:40.4166206Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4166936Z #27 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4167300Z stack=0x7f1b3325f9d8, func=0x7f1b3f3f0040)
2023-01-11T23:33:40.4167636Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4170714Z #28 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4171024Z nargsf=<optimized out>, args=0x7f1b3325f9d8, callable=0x7f1b3f3f0040,
2023-01-11T23:33:40.4171311Z tstate=0xb34b80)
2023-01-11T23:33:40.4171691Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4171957Z #29 method_vectorcall ()
2023-01-11T23:33:40.4172282Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4174956Z #30 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1b328f77f0,
2023-01-11T23:33:40.4175392Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b35a1f400,
2023-01-11T23:33:40.4175686Z tstate=0xb34b80)
2023-01-11T23:33:40.4176019Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4176537Z #31 PyObject_Vectorcall (kwnames=0x7f1b328f77f0, nargsf=<optimized out>,
2023-01-11T23:33:40.4176863Z args=<optimized out>, callable=0x7f1b35a1f400)
2023-01-11T23:33:40.4177408Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4178080Z #32 call_function (kwnames=0x7f1b328f77f0, oparg=<optimized out>,
2023-01-11T23:33:40.4178381Z pp_stack=<optimized out>, trace_info=0x7ffc614ebf10,
2023-01-11T23:33:40.4178655Z tstate=<optimized out>)
2023-01-11T23:33:40.4179043Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4179306Z #33 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4179685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4180111Z #34 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4180483Z throwflag=<optimized out>,
2023-01-11T23:33:40.4180902Z f=<optimized out>,
2023-01-11T23:33:40.4181271Z tstate=<optimized out>)
2023-01-11T23:33:40.4181671Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4184928Z #35 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4185454Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4185911Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4186332Z locals@entry=<optimized out>, con=0x7f1b327b7410,
2023-01-11T23:33:40.4186865Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4187258Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4187669Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4187930Z #36 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4188239Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4189996Z #37 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4190362Z nargsf=<optimized out>, args=0x7f1b328cb278, callable=0x7f1b327b7400,
2023-01-11T23:33:40.4190619Z tstate=0xb34b80)
2023-01-11T23:33:40.4191024Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4193390Z #38 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4193745Z args=0x7f1b328cb278, callable=0x7f1b327b7400)
2023-01-11T23:33:40.4194286Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4194669Z #39 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4194954Z pp_stack=<optimized out>, trace_info=0x7ffc614ec0d0,
2023-01-11T23:33:40.4195208Z tstate=<optimized out>)
2023-01-11T23:33:40.4195528Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4195784Z #40 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4196186Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4196505Z #41 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4196924Z throwflag=<optimized out>,
2023-01-11T23:33:40.4197447Z f=<optimized out>,
2023-01-11T23:33:40.4197849Z tstate=<optimized out>)
2023-01-11T23:33:40.4198253Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4200755Z #42 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4201513Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1b327b7020,
2023-01-11T23:33:40.4202066Z tstate=0xb34b80)
2023-01-11T23:33:40.4202531Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4202969Z #43 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4203274Z stack=<optimized out>, func=0x7f1b327b7010)
2023-01-11T23:33:40.4203607Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4203904Z #44 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4204241Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4208746Z #45 0x0000000000507af8 in _PyObject_Call_Prepend (kwargs=0x0,
2023-01-11T23:33:40.4209424Z kwargs@entry=<optimized out>, args=0x7f1b35d91270,
2023-01-11T23:33:40.4210014Z args@entry=<optimized out>, obj=<optimized out>,
2023-01-11T23:33:40.4210596Z obj@entry=<optimized out>, callable=0x7f1b327b7010,
2023-01-11T23:33:40.4211060Z callable@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4211462Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4211880Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4212119Z #46 slot_tp_init ()
2023-01-11T23:33:40.4212431Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7734
2023-01-11T23:33:40.4212900Z #47 0x00000000004f7bdb in type_call (kwds=0x0, args=0x7f1b35d91270,
2023-01-11T23:33:40.4213208Z type=<optimized out>)
2023-01-11T23:33:40.4213627Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4213901Z #48 _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4214241Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:215
2023-01-11T23:33:40.4215225Z #49 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4215647Z kwnames=<optimized out>,
2023-01-11T23:33:40.4216053Z nargsf=<optimized out>, args=0x7f1b328cb0d8,
2023-01-11T23:33:40.4216549Z callable=<optimized out>,
2023-01-11T23:33:40.4216973Z tstate=<optimized out>)
2023-01-11T23:33:40.4217409Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4218010Z #50 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4218328Z args=0x7f1b328cb0d8, callable=0x39dcd400, tstate=<optimized out>)
2023-01-11T23:33:40.4218696Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4220853Z #51 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4221134Z args=0x7f1b328cb0d8, callable=0x39dcd400)
2023-01-11T23:33:40.4221684Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4223247Z #52 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4223693Z pp_stack=<optimized out>, trace_info=0x7ffc614ec3b0,
2023-01-11T23:33:40.4224205Z tstate=<optimized out>)
2023-01-11T23:33:40.4224713Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4225074Z #53 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4225403Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4225739Z #54 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4226072Z throwflag=<optimized out>,
2023-01-11T23:33:40.4226441Z f=<optimized out>,
2023-01-11T23:33:40.4226809Z tstate=<optimized out>)
2023-01-11T23:33:40.4227199Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4229356Z #55 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4229987Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4230648Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4231159Z locals@entry=<optimized out>, con=0x7f1b3f418200,
2023-01-11T23:33:40.4231575Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4231968Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4232383Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4232639Z #56 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4232948Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4234665Z #57 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4235136Z nargsf=<optimized out>, args=0x7f1b328cad98, callable=0x7f1b3f4181f0,
2023-01-11T23:33:40.4235496Z tstate=0xb34b80)
2023-01-11T23:33:40.4236022Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4236735Z #58 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4237344Z args=0x7f1b328cad98, callable=0x7f1b3f4181f0)
2023-01-11T23:33:40.4237946Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4239092Z #59 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4239522Z pp_stack=<optimized out>, trace_info=0x7ffc614ec570,
2023-01-11T23:33:40.4239875Z tstate=<optimized out>)
2023-01-11T23:33:40.4240339Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4240653Z #60 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4241216Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181
2023-01-11T23:33:40.4241508Z #61 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4241835Z throwflag=<optimized out>,
2023-01-11T23:33:40.4242219Z f=<optimized out>,
2023-01-11T23:33:40.4242580Z tstate=<optimized out>)
2023-01-11T23:33:40.4242972Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4245285Z #62 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4245681Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4246305Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4246799Z locals@entry=<optimized out>, con=0x7f1b3f3c76e0,
2023-01-11T23:33:40.4247214Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4247610Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4248237Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4248496Z #63 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4248811Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4250247Z #64 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4250671Z nargsf=<optimized out>, args=0x398f0b98, callable=0x7f1b3f3c76d0,
2023-01-11T23:33:40.4250929Z tstate=0xb34b80)
2023-01-11T23:33:40.4251578Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4252952Z #65 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4253348Z args=0x398f0b98, callable=0x7f1b3f3c76d0)
2023-01-11T23:33:40.4253871Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4254298Z #66 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4254882Z pp_stack=<optimized out>, trace_info=0x7ffc614ec730,
2023-01-11T23:33:40.4255182Z tstate=<optimized out>)
2023-01-11T23:33:40.4255499Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4255764Z #67 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4256082Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4256357Z #68 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4257092Z throwflag=<optimized out>,
2023-01-11T23:33:40.4257545Z f=<optimized out>,
2023-01-11T23:33:40.4257942Z tstate=<optimized out>)
2023-01-11T23:33:40.4258459Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4260639Z #69 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4261090Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4261603Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4262103Z locals@entry=<optimized out>, con=0x7f1b3f435b50,
2023-01-11T23:33:40.4262592Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4263075Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4263491Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4263744Z #70 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4264066Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4264820Z #71 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1c267c5a40,
2023-01-11T23:33:40.4265246Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b3f435b40,
2023-01-11T23:33:40.4265573Z tstate=0xb34b80)
2023-01-11T23:33:40.4265913Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4266718Z #72 PyObject_Vectorcall (kwnames=0x7f1c267c5a40, nargsf=<optimized out>,
2023-01-11T23:33:40.4267110Z args=<optimized out>, callable=0x7f1b3f435b40)
2023-01-11T23:33:40.4267536Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4269154Z #73 call_function (kwnames=0x7f1c267c5a40, oparg=<optimized out>,
2023-01-11T23:33:40.4269574Z pp_stack=<optimized out>, trace_info=0x7ffc614ec8f0,
2023-01-11T23:33:40.4269964Z tstate=<optimized out>)
2023-01-11T23:33:40.4270390Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4270651Z #74 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4270962Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4272327Z #75 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4272994Z throwflag=<optimized out>,
2023-01-11T23:33:40.4273640Z f=<optimized out>,
2023-01-11T23:33:40.4274315Z tstate=<optimized out>)
2023-01-11T23:33:40.4274970Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4275578Z #76 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4276090Z args=0x7f1b330bddd0, locals=0x0, con=0x7f1b35dad490, tstate=0xb34b80)
2023-01-11T23:33:40.4276472Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4277636Z #77 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4278105Z stack=0x7f1b330bddd0, func=0x7f1b35dad480)
2023-01-11T23:33:40.4278561Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4280263Z #78 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4280668Z nargsf=<optimized out>, args=0x7f1b330bddd0, callable=0x7f1b35dad480,
2023-01-11T23:33:40.4281185Z tstate=0xb34b80)
2023-01-11T23:33:40.4281864Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4282220Z #79 method_vectorcall ()
2023-01-11T23:33:40.4282724Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4285092Z #80 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4285549Z nargsf=<optimized out>, args=0x7f1b330bddd8, callable=0x7f1b35bf7340,
2023-01-11T23:33:40.4285980Z tstate=0xb34b80)
2023-01-11T23:33:40.4286320Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4287641Z #81 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4288121Z args=0x7f1b330bddd8, callable=0x7f1b35bf7340)
2023-01-11T23:33:40.4288548Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4289095Z #82 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4289479Z pp_stack=<optimized out>, trace_info=0x7ffc614ecb00,
2023-01-11T23:33:40.4289794Z tstate=<optimized out>)
2023-01-11T23:33:40.4290131Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4290452Z #83 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4290812Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4291175Z #84 0x00000000004fe5ef in _PyEval_EvalFrame (
2023-01-11T23:33:40.4291899Z throwflag=<optimized out>,
2023-01-11T23:33:40.4292415Z f=<optimized out>,
2023-01-11T23:33:40.4292918Z tstate=<optimized out>)
2023-01-11T23:33:40.4293386Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4295676Z #85 _PyEval_Vector (kwnames=<optimized out>,
2023-01-11T23:33:40.4296304Z kwnames@entry=<optimized out>, argcount=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4296929Z args@entry=<optimized out>, locals=0x0,
2023-01-11T23:33:40.4297442Z locals@entry=<optimized out>, con=0x7f1c26720d40,
2023-01-11T23:33:40.4297896Z con@entry=<optimized out>, tstate=0xb34b80,
2023-01-11T23:33:40.4298316Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4298739Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4298994Z #86 _PyFunction_Vectorcall ()
2023-01-11T23:33:40.4299310Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4301106Z #87 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0,
2023-01-11T23:33:40.4301544Z nargsf=<optimized out>, args=0x5b1cef0, callable=0x7f1c26720d30,
2023-01-11T23:33:40.4301787Z tstate=0xb34b80)
2023-01-11T23:33:40.4302116Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4303288Z #88 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x5b1cef0,
2023-01-11T23:33:40.4303679Z callable=0x7f1c26720d30)
2023-01-11T23:33:40.4304112Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4305323Z #89 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4305726Z pp_stack=<optimized out>, trace_info=0x7ffc614eccc0,
2023-01-11T23:33:40.4305993Z tstate=<optimized out>)
2023-01-11T23:33:40.4306344Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4306620Z #90 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4307070Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198
2023-01-11T23:33:40.4307347Z #91 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4307671Z throwflag=<optimized out>,
2023-01-11T23:33:40.4308092Z f=<optimized out>,
2023-01-11T23:33:40.4308474Z tstate=<optimized out>)
2023-01-11T23:33:40.4309038Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4312551Z #92 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4313316Z args=0x5ccbda8, locals=0x0, con=0x7f1c26720ef0, tstate=0xb34b80)
2023-01-11T23:33:40.4314236Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4314773Z #93 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4315294Z stack=0x5ccbda8, func=0x7f1c26720ee0)
2023-01-11T23:33:40.4316028Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4317588Z #94 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4318265Z nargsf=<optimized out>, args=0x5ccbda8, callable=0x7f1c26720ee0,
2023-01-11T23:33:40.4318513Z tstate=0xb34b80)
2023-01-11T23:33:40.4318944Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4319207Z #95 method_vectorcall ()
2023-01-11T23:33:40.4319532Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4323094Z #96 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f1b3d104a30,
2023-01-11T23:33:40.4323425Z nargsf=<optimized out>, args=<optimized out>, callable=0x7f1b35bf7200,
2023-01-11T23:33:40.4323682Z tstate=0xb34b80)
2023-01-11T23:33:40.4324001Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4324660Z #97 PyObject_Vectorcall (kwnames=0x7f1b3d104a30, nargsf=<optimized out>,
2023-01-11T23:33:40.4324954Z args=<optimized out>, callable=0x7f1b35bf7200)
2023-01-11T23:33:40.4325545Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4327325Z #98 call_function (kwnames=0x7f1b3d104a30, oparg=<optimized out>,
2023-01-11T23:33:40.4327782Z pp_stack=<optimized out>, trace_info=0x7ffc614eced0,
2023-01-11T23:33:40.4328188Z tstate=<optimized out>)
2023-01-11T23:33:40.4328682Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4329075Z #99 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4329548Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4329945Z #100 0x0000000000509dbe in _PyEval_EvalFrame (
2023-01-11T23:33:40.4330438Z throwflag=<optimized out>,
2023-01-11T23:33:40.4330992Z f=<optimized out>,
2023-01-11T23:33:40.4331538Z tstate=<optimized out>)
2023-01-11T23:33:40.4332131Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4335977Z #101 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>,
2023-01-11T23:33:40.4336522Z args=0x7f1b35f46928, locals=0x0, con=0x7f1b35da5910, tstate=0xb34b80)
2023-01-11T23:33:40.4336957Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4337543Z #102 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>,
2023-01-11T23:33:40.4337844Z stack=0x7f1b35f46928, func=0x7f1b35da5900)
2023-01-11T23:33:40.4338168Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4341388Z #103 _PyObject_VectorcallTstate (kwnames=<optimized out>,
2023-01-11T23:33:40.4341954Z nargsf=<optimized out>, args=0x7f1b35f46928, callable=0x7f1b35da5900,
2023-01-11T23:33:40.4342395Z tstate=0xb34b80)
2023-01-11T23:33:40.4342898Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4343295Z #104 method_vectorcall ()
2023-01-11T23:33:40.4343785Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:53
2023-01-11T23:33:40.4346828Z #105 0x00000000004efd83 in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4347469Z kwnames=0x7f1b3d102700, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4348120Z callable=0x7f1b3d4148c0, tstate=0xb34b80)
2023-01-11T23:33:40.4348704Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4349170Z #106 PyObject_Vectorcall (kwnames=0x7f1b3d102700, nargsf=<optimized out>,
2023-01-11T23:33:40.4349590Z args=<optimized out>, callable=0x7f1b3d4148c0)
2023-01-11T23:33:40.4350176Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4353150Z #107 call_function (kwnames=0x7f1b3d102700, oparg=<optimized out>,
2023-01-11T23:33:40.4353939Z pp_stack=<optimized out>, trace_info=0x7ffc614ed0e0,
2023-01-11T23:33:40.4354628Z tstate=<optimized out>)
2023-01-11T23:33:40.4355546Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4355965Z #108 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4356448Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231
2023-01-11T23:33:40.4356852Z #109 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4357304Z throwflag=<optimized out>,
2023-01-11T23:33:40.4357858Z f=<optimized out>,
2023-01-11T23:33:40.4358408Z tstate=<optimized out>)
2023-01-11T23:33:40.4359005Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4359360Z #110 _PyEval_Vector (
2023-01-11T23:33:40.4359787Z kwnames=<optimized out>,
2023-01-11T23:33:40.4360421Z argcount=<optimized out>, args=0x7ffc614ed1d0, locals=0x0, con=0x7f1b35da59a0,
2023-01-11T23:33:40.4360898Z tstate=0xb34b80,
2023-01-11T23:33:40.4361337Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4361944Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4362320Z #111 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4362875Z kwnames=<optimized out>,
2023-01-11T23:33:40.4363314Z nargsf=<optimized out>, stack=0x7ffc614ed1d0, func=0x7f1b35da5990)
2023-01-11T23:33:40.4363827Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4365211Z #112 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4365913Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4366465Z callable=0x7f1b35da5990, tstate=0xb34b80)
2023-01-11T23:33:40.4367011Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4367391Z #113 method_vectorcall ()
2023-01-11T23:33:40.4367876Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4371804Z #114 0x00000000004f141c in do_call_core (kwdict=0x7f1b35dc7080,
2023-01-11T23:33:40.4372304Z callargs=0x7f1b35d91180, func=0x7f1b3eaf3980, trace_info=0x7ffc614ed2f0,
2023-01-11T23:33:40.4372668Z tstate=<optimized out>)
2023-01-11T23:33:40.4373170Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4373449Z #115 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4373775Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4374044Z #116 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4374372Z throwflag=<optimized out>,
2023-01-11T23:33:40.4375072Z f=<optimized out>,
2023-01-11T23:33:40.4375605Z tstate=<optimized out>)
2023-01-11T23:33:40.4375999Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4378196Z #117 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4378846Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c267210a0,
2023-01-11T23:33:40.4379323Z tstate=0xb34b80)
2023-01-11T23:33:40.4379806Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4380464Z #118 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4380856Z stack=<optimized out>, func=0x7f1c26721090)
2023-01-11T23:33:40.4381486Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4381924Z #119 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4382442Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4382825Z #120 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4383305Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4386544Z #121 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4387303Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4387908Z #122 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4388495Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4393253Z #123 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4393891Z kwnames=<optimized out>,
2023-01-11T23:33:40.4394468Z nargsf=<optimized out>, args=0x5bd2650,
2023-01-11T23:33:40.4394991Z callable=<optimized out>,
2023-01-11T23:33:40.4395404Z tstate=<optimized out>)
2023-01-11T23:33:40.4395834Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4396166Z #124 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4396472Z args=0x5bd2650, callable=0x7f1b35d82ad0, tstate=<optimized out>)
2023-01-11T23:33:40.4396827Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4397418Z #125 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4397807Z args=0x5bd2650, callable=0x7f1b35d82ad0)
2023-01-11T23:33:40.4398329Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4400687Z #126 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4401115Z pp_stack=<optimized out>, trace_info=0x7ffc614ed5f0,
2023-01-11T23:33:40.4401490Z tstate=<optimized out>)
2023-01-11T23:33:40.4401966Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4402349Z #127 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4402842Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4403227Z #128 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4403545Z throwflag=<optimized out>,
2023-01-11T23:33:40.4403915Z f=<optimized out>,
2023-01-11T23:33:40.4404289Z tstate=<optimized out>)
2023-01-11T23:33:40.4404675Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4406638Z #129 _PyEval_Vector (
2023-01-11T23:33:40.4407341Z kwnames=<optimized out>,
2023-01-11T23:33:40.4408067Z argcount=<optimized out>, args=0x7ffc614ed6e0, locals=0x0, con=0x7f1c2672c200,
2023-01-11T23:33:40.4408566Z tstate=0xb34b80,
2023-01-11T23:33:40.4408888Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4409328Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4409672Z #130 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4410149Z kwnames=<optimized out>,
2023-01-11T23:33:40.4410741Z nargsf=<optimized out>, stack=0x7ffc614ed6e0, func=0x7f1c2672c1f0)
2023-01-11T23:33:40.4411174Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4411828Z #131 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4412541Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4413014Z callable=0x7f1c2672c1f0, tstate=0xb34b80)
2023-01-11T23:33:40.4413488Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4413791Z #132 method_vectorcall ()
2023-01-11T23:33:40.4414130Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4418624Z #133 0x00000000004f141c in do_call_core (kwdict=0x7f1b3601a300,
2023-01-11T23:33:40.4419061Z callargs=0x7f1b35d67130, func=0x7f1b3592a040, trace_info=0x7ffc614ed800,
2023-01-11T23:33:40.4419387Z tstate=<optimized out>)
2023-01-11T23:33:40.4419732Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4419990Z #134 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4420310Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4420587Z #135 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4421244Z throwflag=<optimized out>,
2023-01-11T23:33:40.4421831Z f=<optimized out>,
2023-01-11T23:33:40.4422405Z tstate=<optimized out>)
2023-01-11T23:33:40.4423019Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4425990Z #136 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4426777Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c2672c0e0,
2023-01-11T23:33:40.4427275Z tstate=0xb34b80)
2023-01-11T23:33:40.4427764Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4428199Z #137 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4428616Z stack=<optimized out>, func=0x7f1c2672c0d0)
2023-01-11T23:33:40.4429108Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4429524Z #138 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4430124Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4430516Z #139 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4430993Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4433784Z #140 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4434653Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4435118Z #141 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4435476Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4438887Z #142 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4439669Z kwnames=<optimized out>,
2023-01-11T23:33:40.4440474Z nargsf=<optimized out>, args=0x5bd1850,
2023-01-11T23:33:40.4441069Z callable=<optimized out>,
2023-01-11T23:33:40.4441632Z tstate=<optimized out>)
2023-01-11T23:33:40.4442297Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4442782Z #143 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4443261Z args=0x5bd1850, callable=0x7f1b35d83880, tstate=<optimized out>)
2023-01-11T23:33:40.4443917Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4445821Z #144 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4446395Z args=0x5bd1850, callable=0x7f1b35d83880)
2023-01-11T23:33:40.4446996Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4449265Z #145 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4449864Z pp_stack=<optimized out>, trace_info=0x7ffc614edb00,
2023-01-11T23:33:40.4450253Z tstate=<optimized out>)
2023-01-11T23:33:40.4450883Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4451277Z #146 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4451781Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213
2023-01-11T23:33:40.4452165Z #147 0x0000000000509f16 in _PyEval_EvalFrame (
2023-01-11T23:33:40.4452648Z throwflag=<optimized out>,
2023-01-11T23:33:40.4453220Z f=<optimized out>,
2023-01-11T23:33:40.4453778Z tstate=<optimized out>)
2023-01-11T23:33:40.4454356Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4456297Z #148 _PyEval_Vector (
2023-01-11T23:33:40.4456765Z kwnames=<optimized out>,
2023-01-11T23:33:40.4457395Z argcount=<optimized out>, args=0x7ffc614edbf0, locals=0x0, con=0x7f1c2672c200,
2023-01-11T23:33:40.4457894Z tstate=0xb34b80,
2023-01-11T23:33:40.4458378Z tstate@entry=<optimized out>)
2023-01-11T23:33:40.4458985Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4460696Z #149 _PyFunction_Vectorcall (
2023-01-11T23:33:40.4461316Z kwnames=<optimized out>,
2023-01-11T23:33:40.4461867Z nargsf=<optimized out>, stack=0x7ffc614edbf0, func=0x7f1c2672c1f0)
2023-01-11T23:33:40.4462416Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4463599Z #150 _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4464309Z kwnames=<optimized out>, nargsf=<optimized out>, args=<optimized out>,
2023-01-11T23:33:40.4464797Z callable=0x7f1c2672c1f0, tstate=0xb34b80)
2023-01-11T23:33:40.4465298Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:114
2023-01-11T23:33:40.4465587Z #151 method_vectorcall ()
2023-01-11T23:33:40.4465910Z at /usr/local/src/conda/python-3.10.8/Programs/_functoolsmodule.c:83
2023-01-11T23:33:40.4469336Z #152 0x00000000004f141c in do_call_core (kwdict=0x7f1b35e39740,
2023-01-11T23:33:40.4469737Z callargs=0x7f1b35d675e0, func=0x7f1b35ef4bc0, trace_info=0x7ffc614edd10,
2023-01-11T23:33:40.4470310Z tstate=<optimized out>)
2023-01-11T23:33:40.4472353Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943
2023-01-11T23:33:40.4472724Z #153 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4473192Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277
2023-01-11T23:33:40.4473557Z #154 0x00000000004f706d in _PyEval_EvalFrame (
2023-01-11T23:33:40.4473900Z throwflag=<optimized out>,
2023-01-11T23:33:40.4474273Z f=<optimized out>,
2023-01-11T23:33:40.4474644Z tstate=<optimized out>)
2023-01-11T23:33:40.4475032Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052
2023-01-11T23:33:40.4475589Z #155 _PyEval_Vector (kwnames=0x0,
2023-01-11T23:33:40.4476099Z argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7f1c2672c0e0,
2023-01-11T23:33:40.4476501Z tstate=0xb34b80)
2023-01-11T23:33:40.4476816Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065
2023-01-11T23:33:40.4478479Z #156 _PyFunction_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4479151Z stack=<optimized out>, func=0x7f1c2672c0d0)
2023-01-11T23:33:40.4479955Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342
2023-01-11T23:33:40.4480671Z #157 _PyObject_FastCallDictTstate.localalias ()
2023-01-11T23:33:40.4481484Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:142
2023-01-11T23:33:40.4482151Z #158 0x00000000005084a6 in _PyObject_Call_Prepend ()
2023-01-11T23:33:40.4482729Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431
2023-01-11T23:33:40.4483080Z #159 0x00000000005d04d3 in slot_tp_call ()
2023-01-11T23:33:40.4483469Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7494
2023-01-11T23:33:40.4485165Z #160 0x00000000004f7b8b in _PyObject_MakeTpCall.localalias ()
2023-01-11T23:33:40.4485919Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224
2023-01-11T23:33:40.4489105Z #161 0x00000000004f37ae in _PyObject_VectorcallTstate (
2023-01-11T23:33:40.4489633Z kwnames=<optimized out>,
2023-01-11T23:33:40.4490201Z nargsf=<optimized out>, args=0x5b54c20,
2023-01-11T23:33:40.4490804Z callable=<optimized out>,
2023-01-11T23:33:40.4491357Z tstate=<optimized out>)
2023-01-11T23:33:40.4491998Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112
2023-01-11T23:33:40.4492481Z #162 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4492938Z args=0x5b54c20, callable=0x7f1b35d72410, tstate=<optimized out>)
2023-01-11T23:33:40.4493483Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99
2023-01-11T23:33:40.4495127Z #163 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>,
2023-01-11T23:33:40.4495534Z args=0x5b54c20, callable=0x7f1b35d72410)
2023-01-11T23:33:40.4496076Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123
2023-01-11T23:33:40.4498811Z #164 call_function (kwnames=0x0, oparg=<optimized out>,
2023-01-11T23:33:40.4499246Z pp_stack=<optimized out>, trace_info=0x7ffc614ee010,
2023-01-11T23:33:40.4499607Z tstate=<optimized out>)
2023-01-11T23:33:40.4500118Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891
2023-01-11T23:33:40.4500497Z #165 _PyEval_EvalFrameDefault ()
2023-01-11T23:33:40.4500987Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:40.4501393Z #166 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4501852Z throwflag=, 2023-01-11T23:33:40.4502563Z f=, 2023-01-11T23:33:40.4503111Z tstate=) 2023-01-11T23:33:40.4503708Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4506123Z #167 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4506703Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4507335Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4507913Z locals@entry=, con=0x7f1b35dc1d90, 2023-01-11T23:33:40.4508338Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4508747Z tstate@entry=) 2023-01-11T23:33:40.4509159Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4509431Z #168 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4509747Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4512772Z #169 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4513200Z nargsf=, args=0x5c960b8, callable=0x7f1b35dc1d80, 2023-01-11T23:33:40.4513546Z tstate=0xb34b80) 2023-01-11T23:33:40.4513981Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4514419Z #170 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4514805Z args=0x5c960b8, callable=0x7f1b35dc1d80) 2023-01-11T23:33:40.4515253Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4516794Z #171 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4517376Z pp_stack=, trace_info=0x7ffc614ee1d0, 2023-01-11T23:33:40.4517741Z tstate=) 2023-01-11T23:33:40.4518211Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4518572Z #172 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4518912Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:40.4519196Z #173 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4519519Z throwflag=, 2023-01-11T23:33:40.4519894Z f=, 2023-01-11T23:33:40.4520262Z tstate=) 2023-01-11T23:33:40.4520655Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4523047Z #174 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4523978Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4524655Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4525293Z locals@entry=, con=0x7f1c2657a3c0, 2023-01-11T23:33:40.4525899Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4526483Z tstate@entry=) 2023-01-11T23:33:40.4527096Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4527588Z #175 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4528063Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4528843Z #176 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4529298Z nargsf=, args=0x5b7f2b8, callable=0x7f1c2657a3b0, 2023-01-11T23:33:40.4529664Z tstate=0xb34b80) 2023-01-11T23:33:40.4530145Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4532503Z #177 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4533046Z args=0x5b7f2b8, callable=0x7f1c2657a3b0) 2023-01-11T23:33:40.4533727Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4534249Z #178 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4535010Z pp_stack=, trace_info=0x7ffc614ee390, 2023-01-11T23:33:40.4535476Z tstate=) 
2023-01-11T23:33:40.4535961Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4536225Z #179 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4536547Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:40.4536950Z #180 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4537395Z throwflag=, 2023-01-11T23:33:40.4537797Z f=, 2023-01-11T23:33:40.4538190Z tstate=) 2023-01-11T23:33:40.4538585Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4542175Z #181 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4542683Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4543310Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4543730Z locals@entry=, con=0x7f1c26579e20, 2023-01-11T23:33:40.4544145Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4544534Z tstate@entry=) 2023-01-11T23:33:40.4544943Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4545198Z #182 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4545506Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4545818Z #183 0x00000000004f711d in _PyObject_FastCallDictTstate.localalias () 2023-01-11T23:33:40.4546180Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:153 2023-01-11T23:33:40.4550125Z #184 0x0000000000507af8 in _PyObject_Call_Prepend (kwargs=0x7f1b35db6a00, 2023-01-11T23:33:40.4550720Z kwargs@entry=, args=0x7f1c27820070, 2023-01-11T23:33:40.4551368Z args@entry=, obj=, 2023-01-11T23:33:40.4551795Z obj@entry=, callable=0x7f1c26579e10, 2023-01-11T23:33:40.4552218Z callable@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4552616Z tstate@entry=) 2023-01-11T23:33:40.4553038Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:431 2023-01-11T23:33:40.4553424Z #185 slot_tp_init () 2023-01-11T23:33:40.4553726Z at /usr/local/src/conda/python-3.10.8/Programs/gcmodule.c:7734 2023-01-11T23:33:40.4554201Z #186 0x00000000004f7bdb in type_call (kwds=0x7f1b35db6a00, 2023-01-11T23:33:40.4554467Z args=0x7f1c27820070, type=) 2023-01-11T23:33:40.4554804Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:224 2023-01-11T23:33:40.4555068Z #187 _PyObject_MakeTpCall.localalias () 2023-01-11T23:33:40.4555393Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:215 2023-01-11T23:33:40.4557563Z #188 0x00000000004f428d in _PyObject_VectorcallTstate ( 2023-01-11T23:33:40.4558033Z kwnames=0x7f1b3d0f5740, 2023-01-11T23:33:40.4558542Z nargsf=, args=, 2023-01-11T23:33:40.4559067Z callable=, 2023-01-11T23:33:40.4559454Z tstate=) 2023-01-11T23:33:40.4559925Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:112 2023-01-11T23:33:40.4560366Z #189 _PyObject_VectorcallTstate (kwnames=0x7f1b3d0f5740, 2023-01-11T23:33:40.4561006Z nargsf=, args=, callable=0xcd1c30, 2023-01-11T23:33:40.4561376Z tstate=0xb34b80) 2023-01-11T23:33:40.4562000Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:99 2023-01-11T23:33:40.4562464Z #190 PyObject_Vectorcall (kwnames=0x7f1b3d0f5740, nargsf=, 2023-01-11T23:33:40.4562870Z args=, callable=0xcd1c30) 2023-01-11T23:33:40.4563380Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4564134Z #191 call_function (kwnames=0x7f1b3d0f5740, oparg=, 2023-01-11T23:33:40.4564805Z pp_stack=, trace_info=0x7ffc614ee6b0, 2023-01-11T23:33:40.4565437Z tstate=) 2023-01-11T23:33:40.4566181Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4566570Z #192 _PyEval_EvalFrameDefault () 
2023-01-11T23:33:40.4567052Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:40.4567437Z #193 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4567812Z throwflag=, 2023-01-11T23:33:40.4568193Z f=, 2023-01-11T23:33:40.4568560Z tstate=) 2023-01-11T23:33:40.4568945Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:40.4569542Z #194 _PyEval_Vector (kwnames=, 2023-01-11T23:33:40.4570050Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:40.4570573Z args@entry=, locals=0x0, 2023-01-11T23:33:40.4570992Z locals@entry=, con=0x7f1b3ced2960, 2023-01-11T23:33:40.4571407Z con@entry=, tstate=0xb34b80, 2023-01-11T23:33:40.4571808Z tstate@entry=) 2023-01-11T23:33:40.4572233Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4572511Z #195 _PyFunction_Vectorcall () 2023-01-11T23:33:40.4572860Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:40.4574877Z #196 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:40.4575416Z nargsf=, args=0x7f1c27845ba8, callable=0x7f1b3ced2950, 2023-01-11T23:33:40.4575710Z tstate=0xb34b80) 2023-01-11T23:33:40.4576042Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4577954Z #197 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:40.4578360Z args=0x7f1c27845ba8, callable=0x7f1b3ced2950) 2023-01-11T23:33:40.4579094Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:40.4579760Z #198 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:40.4580373Z pp_stack=, trace_info=0x7ffc614ee870, 2023-01-11T23:33:40.4580708Z tstate=) 2023-01-11T23:33:40.4581095Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:40.4581453Z #199 _PyEval_EvalFrameDefault () 2023-01-11T23:33:40.4581885Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:40.4582257Z #200 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:40.4582685Z throwflag=, 2023-01-11T23:33:40.4583138Z f=, 2023-01-11T23:33:40.4583509Z tstate=) 2023-01-11T23:33:40.4583972Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:40.4584255Z #201 _PyEval_Vector () 2023-01-11T23:33:40.4584554Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:40.4587693Z #202 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f1c26797730, 2023-01-11T23:33:40.4588017Z globals=globals@entry=0x7f1c268d2480, locals=locals@entry=0x7f1c268d2480) 2023-01-11T23:33:40.4588395Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:40.4588787Z #203 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:40.4589220Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:40.4592299Z #204 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:40.4592682Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:40.4594846Z #205 0x000000000045adf2 in pyrun_file ( 2023-01-11T23:33:40.4595403Z fp=, 2023-01-11T23:33:40.4595914Z filename=, 2023-01-11T23:33:40.4596411Z start=, 2023-01-11T23:33:40.4596937Z globals=, 2023-01-11T23:33:40.4597324Z locals=, 2023-01-11T23:33:40.4597719Z closeit=, flags=0x7ffc614eeb68) 2023-01-11T23:33:40.4598160Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1208 2023-01-11T23:33:40.4598476Z #206 0x00000000005bc25f in 
_PyRun_SimpleFileObject.localalias () 2023-01-11T23:33:40.4598836Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:456 2023-01-11T23:33:40.4599151Z #207 0x00000000005bc063 in _PyRun_AnyFileObject.localalias () 2023-01-11T23:33:40.4599509Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:90 2023-01-11T23:33:40.4603409Z #208 0x00000000005b8e7d in pymain_run_file_obj (skip_source_first_line=0, 2023-01-11T23:33:40.4603705Z filename=0x7f1c2694d840, program_name=0x7f1c268bfa00) 2023-01-11T23:33:40.4604067Z at /croot/python-split_1669298683653/work/build-static/python.c:357 2023-01-11T23:33:40.4604346Z #209 pymain_run_file (config=0xb18e50) 2023-01-11T23:33:40.4604770Z at /croot/python-split_1669298683653/work/build-static/python.c:376 2023-01-11T23:33:40.4605310Z #210 pymain_run_python (exitcode=0x7ffc614eeb60) 2023-01-11T23:33:40.4605822Z at /croot/python-split_1669298683653/work/build-static/python.c:591 2023-01-11T23:33:40.4606097Z #211 Py_RunMain.localalias () 2023-01-11T23:33:40.4606419Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:40.4620764Z #212 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:40.4621318Z argv=) 2023-01-11T23:33:40.4621808Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:40.4625046Z #213 0x00007f1c269b0c87 in __libc_start_main (main=0x587be0
, argc=6, 2023-01-11T23:33:40.4625507Z argv=0x7ffc614eed68, init=, fini=, 2023-01-11T23:33:40.4625812Z rtld_fini=, stack_end=0x7ffc614eed58) 2023-01-11T23:33:40.4626201Z at ../csu/libc-start.c:310 2023-01-11T23:33:40.4626487Z #214 0x0000000000587ade in _start () 2023-01-11T23:33:40.4626880Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:40.6170420Z GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1 2023-01-11T23:33:40.6170848Z Copyright (C) 2018 Free Software Foundation, Inc. 2023-01-11T23:33:40.6171241Z License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> 2023-01-11T23:33:40.6171592Z This is free software: you are free to change and redistribute it. 2023-01-11T23:33:40.6171911Z There is NO WARRANTY, to the extent permitted by law. Type "show copying" 2023-01-11T23:33:40.6172175Z and "show warranty" for details. 2023-01-11T23:33:40.6172474Z This GDB was configured as "x86_64-linux-gnu". 2023-01-11T23:33:40.6172757Z Type "show configuration" for configuration details. 2023-01-11T23:33:40.6173026Z For bug reporting instructions, please see: 2023-01-11T23:33:40.6173277Z <http://www.gnu.org/software/gdb/bugs/>. 2023-01-11T23:33:40.6173566Z Find the GDB manual and other documentation resources online at: 2023-01-11T23:33:40.6173853Z <http://www.gnu.org/software/gdb/documentation/>. 2023-01-11T23:33:40.6174085Z For help, type "help". 2023-01-11T23:33:40.6174336Z Type "apropos word" to search for commands related to "word"... 2023-01-11T23:33:40.6878127Z Reading symbols from python...done. 2023-01-11T23:33:41.2175363Z 2023-01-11T23:33:41.2175859Z warning: core file may not match specified executable file. 2023-01-11T23:33:41.2197453Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2197840Z [New LWP 19370] 2023-01-11T23:33:41.2198115Z [New LWP 19380] 2023-01-11T23:33:41.2198409Z [New LWP 19381] 2023-01-11T23:33:41.2198756Z BFD: warning: /opt/conda/lib/libgomp.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2199124Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2199501Z BFD: warning: /opt/conda/bin/../lib/libstdc++.so.6: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2199862Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2200214Z BFD: warning: /opt/conda/bin/../lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2200483Z [New LWP 19374] 2023-01-11T23:33:41.2200666Z [New LWP 19379] 2023-01-11T23:33:41.2200847Z [New LWP 19373] 2023-01-11T23:33:41.2201026Z [New LWP 19376] 2023-01-11T23:33:41.2201212Z [New LWP 19375] 2023-01-11T23:33:41.2201388Z [New LWP 19378] 2023-01-11T23:33:41.2201565Z [New LWP 19372] 2023-01-11T23:33:41.2201750Z [New LWP 19385] 2023-01-11T23:33:41.2201923Z [New LWP 19382] 2023-01-11T23:33:41.2202099Z [New LWP 19384] 2023-01-11T23:33:41.2202281Z [New LWP 19383] 2023-01-11T23:33:41.2202450Z [New LWP 19377] 2023-01-11T23:33:41.2202628Z [New LWP 19386] 2023-01-11T23:33:41.2202806Z [New LWP 19403] 2023-01-11T23:33:41.2207106Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2207664Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libgfortran.so.5: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2208219Z BFD: warning:
/opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001 2023-01-11T23:33:41.2208756Z BFD: warning: /opt/conda/lib/python3.10/site-packages/numpy/core/../../../.././libquadmath.so.0: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002 2023-01-11T23:33:41.2378669Z [Thread debugging using libthread_db enabled] 2023-01-11T23:33:41.2380027Z Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 2023-01-11T23:33:46.6756923Z 78 ../sysdeps/unix/syscall-template.S: No such file or directory. 2023-01-11T23:33:46.6757865Z Core was generated by `/opt/conda/bin/python -bb -c from multiprocessing.spawn import spawn_main; spaw'. 2023-01-11T23:33:46.6758987Z warning: File "/var/lib/jenkins/workspace/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load". 2023-01-11T23:33:46.6759691Z Program terminated with signal SIGABRT, Aborted. 2023-01-11T23:33:46.6760113Z #0 0x00007f7a9a4be177 in kill () at ../sysdeps/unix/syscall-template.S:78 2023-01-11T23:33:46.6760403Z [Current thread is 1 (Thread 0x7f7a9b43c080 (LWP 19370))] 2023-01-11T23:33:46.6782759Z To enable execution of this file add 2023-01-11T23:33:46.6783114Z add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit 2023-01-11T23:33:46.6783440Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:46.6783717Z To completely disable this security protection add 2023-01-11T23:33:46.6784012Z set auto-load safe-path / 2023-01-11T23:33:46.6784278Z line to your configuration file "/var/lib/jenkins/.gdbinit". 2023-01-11T23:33:46.6784564Z For more information about this security protection see the 2023-01-11T23:33:46.6784937Z "Auto-loading safe path" section in the GDB manual. 
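The auto-load warning above is benign: GDB found the repository's /var/lib/jenkins/workspace/.gdbinit but declined to source it because the workspace is not on its auto-load safe path, so the pretty-printers that file would register are simply skipped. A minimal sketch of how the suggested fix could be applied when re-running this core-dump triage by hand -- the executable and workspace paths are taken from the log above, while the core file name "core" and the -batch invocation are assumptions, not something the job actually ran:

    # Trust the repository-provided .gdbinit (one-time, in the user's global config)
    echo 'add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit' >> ~/.gdbinit

    # Or scope the trust to a single run and dump the same backtrace non-interactively;
    # "core" is a placeholder for the actual core-dump file collected by the job
    gdb -iex 'add-auto-load-safe-path /var/lib/jenkins/workspace/.gdbinit' \
        -batch -ex 'bt' /opt/conda/bin/python core

Either route silences the warning; the backtrace that follows was produced without it and does not depend on the auto-loaded file.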
E.g., run from the shell: 2023-01-11T23:33:46.6785258Z info "(gdb)Auto-loading safe path" 2023-01-11T23:33:46.6813661Z #0 0x00007f7a9a4be177 in kill () at ../sysdeps/unix/syscall-template.S:78 2023-01-11T23:33:46.6815743Z #1 0x00000000004cb0d3 in os_kill_impl ( 2023-01-11T23:33:46.6816075Z module=, 2023-01-11T23:33:46.6816461Z signal=, 2023-01-11T23:33:46.6816837Z pid=) 2023-01-11T23:33:46.6817327Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/sys/_iomodule.c:7929 2023-01-11T23:33:46.6822144Z #2 os_kill (module=, args=args@entry=0x7f79a9a92578, 2023-01-11T23:33:46.6822753Z nargs=) 2023-01-11T23:33:46.6823123Z at /usr/local/src/conda/python-3.10.8/Modules/codecs.c:3581 2023-01-11T23:33:46.6827099Z #3 0x00000000004fe7d4 in cfunction_vectorcall_FASTCALL (func=0x7f7a9b4018f0, 2023-01-11T23:33:46.6827700Z args=0x7f79a9a92578, nargsf=, kwnames=) 2023-01-11T23:33:46.6828131Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_bitutils.h:430 2023-01-11T23:33:46.6834122Z #4 0x00000000004f351e in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6834747Z nargsf=, args=0x7f79a9a92578, callable=0x7f7a9b4018f0, 2023-01-11T23:33:46.6835011Z tstate=0x1894bf0) 2023-01-11T23:33:46.6835500Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6836092Z #5 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:46.6836523Z args=0x7f79a9a92578, callable=0x7f7a9b4018f0) 2023-01-11T23:33:46.6836961Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6838538Z #6 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6838981Z pp_stack=, trace_info=0x7fff600c2e20, 2023-01-11T23:33:46.6839372Z tstate=) 2023-01-11T23:33:46.6839858Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6840254Z #7 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6840794Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4181 2023-01-11T23:33:46.6841155Z #8 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6841475Z throwflag=, 2023-01-11T23:33:46.6841936Z f=, 2023-01-11T23:33:46.6842315Z tstate=) 2023-01-11T23:33:46.6842708Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6845206Z #9 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6845836Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6846307Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6846876Z locals@entry=, con=0x7f79a9850ef0, 2023-01-11T23:33:46.6847559Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6848206Z tstate@entry=) 2023-01-11T23:33:46.6848789Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6849059Z #10 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6849376Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6850932Z #11 0x00000000004f141c in do_call_core (kwdict=0x0, callargs=0x7f7a9a1b3c10, 2023-01-11T23:33:46.6851405Z func=0x7f79a9850ee0, trace_info=0x7fff600c2fe0, tstate=) 2023-01-11T23:33:46.6851963Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:46.6852223Z #12 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6852569Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:46.6852873Z #13 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6853232Z throwflag=, 2023-01-11T23:33:46.6853655Z f=, 2023-01-11T23:33:46.6854076Z tstate=) 2023-01-11T23:33:46.6854760Z at 
/usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6856939Z #14 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6857626Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6858328Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6858923Z locals@entry=, con=0x7f79b2eea9f0, 2023-01-11T23:33:46.6859347Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6859736Z tstate@entry=) 2023-01-11T23:33:46.6860330Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6860588Z #15 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6860910Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6862774Z #16 0x00000000004f141c in do_call_core (kwdict=0x7f7a9a2c2800, 2023-01-11T23:33:46.6863416Z callargs=0x7f79a984f240, func=0x7f79b2eea9e0, trace_info=0x7fff600c31a0, 2023-01-11T23:33:46.6863809Z tstate=) 2023-01-11T23:33:46.6864286Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5943 2023-01-11T23:33:46.6864688Z #17 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6865179Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4277 2023-01-11T23:33:46.6865548Z #18 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6866005Z throwflag=, 2023-01-11T23:33:46.6866398Z f=, 2023-01-11T23:33:46.6866776Z tstate=) 2023-01-11T23:33:46.6867169Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6869776Z #19 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6870443Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6870956Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6871457Z locals@entry=, con=0x7f7a9a2ed5b0, 2023-01-11T23:33:46.6871976Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6872372Z tstate@entry=) 2023-01-11T23:33:46.6872808Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6873067Z #20 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6873375Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6874647Z #21 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6875051Z nargsf=, args=0x6833738, callable=0x7f7a9a2ed5a0, 2023-01-11T23:33:46.6875381Z tstate=0x1894bf0) 2023-01-11T23:33:46.6875807Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6877134Z #22 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x6833738, 2023-01-11T23:33:46.6877511Z callable=0x7f7a9a2ed5a0) 2023-01-11T23:33:46.6877981Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6879335Z #23 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6879764Z pp_stack=, trace_info=0x7fff600c3360, 2023-01-11T23:33:46.6880131Z tstate=) 2023-01-11T23:33:46.6880555Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6880863Z #24 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6881305Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:46.6881663Z #25 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6882098Z throwflag=, 2023-01-11T23:33:46.6882503Z f=, 2023-01-11T23:33:46.6882892Z tstate=) 2023-01-11T23:33:46.6883281Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6886089Z #26 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6886768Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6887350Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6887901Z locals@entry=, con=0x7f7a9a2edf40, 2023-01-11T23:33:46.6888371Z con@entry=, tstate=0x1894bf0, 
2023-01-11T23:33:46.6888774Z tstate@entry=) 2023-01-11T23:33:46.6889261Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6889516Z #27 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6889833Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6891341Z #28 0x00000000004ef101 in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6891858Z nargsf=, args=0x1911510, callable=0x7f7a9a2edf30, 2023-01-11T23:33:46.6892172Z tstate=0x1894bf0) 2023-01-11T23:33:46.6892586Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6893866Z #29 PyObject_Vectorcall (kwnames=0x0, nargsf=, args=0x1911510, 2023-01-11T23:33:46.6894333Z callable=0x7f7a9a2edf30) 2023-01-11T23:33:46.6895074Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6895834Z #30 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6896229Z pp_stack=, trace_info=0x7fff600c3520, 2023-01-11T23:33:46.6896551Z tstate=) 2023-01-11T23:33:46.6896994Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6897323Z #31 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6897746Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4198 2023-01-11T23:33:46.6898672Z #32 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6899252Z throwflag=, 2023-01-11T23:33:46.6899656Z f=, 2023-01-11T23:33:46.6900127Z tstate=) 2023-01-11T23:33:46.6900554Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6903678Z #33 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6904085Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6904649Z args@entry=, locals=0x0, 2023-01-11T23:33:46.6905246Z locals@entry=, con=0x7f7a9a172de0, 2023-01-11T23:33:46.6905797Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6906311Z tstate@entry=) 2023-01-11T23:33:46.6906840Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6907159Z #34 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6907567Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6908205Z #35 0x00000000004eecef in _PyObject_VectorcallTstate (kwnames=0x0, 2023-01-11T23:33:46.6908585Z nargsf=, args=0x7f7a9a438200, callable=0x7f7a9a172dd0, 2023-01-11T23:33:46.6908905Z tstate=0x1894bf0) 2023-01-11T23:33:46.6909511Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6913040Z #36 PyObject_Vectorcall (kwnames=0x0, nargsf=, 2023-01-11T23:33:46.6913341Z args=0x7f7a9a438200, callable=0x7f7a9a172dd0) 2023-01-11T23:33:46.6913844Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6914233Z #37 call_function (kwnames=0x0, oparg=, 2023-01-11T23:33:46.6914581Z pp_stack=, trace_info=0x7fff600c36e0, 2023-01-11T23:33:46.6914895Z tstate=) 2023-01-11T23:33:46.6915298Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6915613Z #38 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6916023Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4213 2023-01-11T23:33:46.6916499Z #39 0x00000000004fe5ef in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6916905Z throwflag=, 2023-01-11T23:33:46.6917373Z f=, 2023-01-11T23:33:46.6917832Z tstate=) 2023-01-11T23:33:46.6918327Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5052 2023-01-11T23:33:46.6920103Z #40 _PyEval_Vector (kwnames=, 2023-01-11T23:33:46.6920730Z kwnames@entry=, argcount=, args=, 2023-01-11T23:33:46.6921285Z 
args@entry=, locals=0x0, 2023-01-11T23:33:46.6921827Z locals@entry=, con=0x7f7a9a172d50, 2023-01-11T23:33:46.6922351Z con@entry=, tstate=0x1894bf0, 2023-01-11T23:33:46.6922848Z tstate@entry=) 2023-01-11T23:33:46.6923360Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6923688Z #41 _PyFunction_Vectorcall () 2023-01-11T23:33:46.6924085Z at /usr/local/src/conda/python-3.10.8/Include/abstract.h:342 2023-01-11T23:33:46.6924478Z #42 0x00000000004efd83 in _PyObject_VectorcallTstate (kwnames=0x7f7a9a40de40, 2023-01-11T23:33:46.6924885Z nargsf=, args=, callable=0x7f7a9a172d40, 2023-01-11T23:33:46.6925205Z tstate=0x1894bf0) 2023-01-11T23:33:46.6925595Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6926170Z #43 PyObject_Vectorcall (kwnames=0x7f7a9a40de40, nargsf=, 2023-01-11T23:33:46.6926537Z args=, callable=0x7f7a9a172d40) 2023-01-11T23:33:46.6926979Z at /usr/local/src/conda/python-3.10.8/Objects/pycore_pyerrors.h:123 2023-01-11T23:33:46.6928587Z #44 call_function (kwnames=0x7f7a9a40de40, oparg=, 2023-01-11T23:33:46.6928968Z pp_stack=, trace_info=0x7fff600c38a0, 2023-01-11T23:33:46.6929290Z tstate=) 2023-01-11T23:33:46.6929685Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5891 2023-01-11T23:33:46.6930011Z #45 _PyEval_EvalFrameDefault () 2023-01-11T23:33:46.6930414Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:4231 2023-01-11T23:33:46.6931099Z #46 0x0000000000594b72 in _PyEval_EvalFrame ( 2023-01-11T23:33:46.6931515Z throwflag=, 2023-01-11T23:33:46.6931991Z f=, 2023-01-11T23:33:46.6932456Z tstate=) 2023-01-11T23:33:46.6933032Z at /croot/python-split_1669298683653/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/call.c:46 2023-01-11T23:33:46.6933454Z #47 _PyEval_Vector () 2023-01-11T23:33:46.6933843Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:5065 2023-01-11T23:33:46.6937727Z #48 0x0000000000594ab7 in PyEval_EvalCode (co=co@entry=0x7f7a9a3d0920, 2023-01-11T23:33:46.6938111Z globals=globals@entry=0x7f7a9a3c2600, locals=locals@entry=0x7f7a9a3c2600) 2023-01-11T23:33:46.6938586Z at /usr/local/src/conda/python-3.10.8/Modules/ceval_gil.h:1134 2023-01-11T23:33:46.6939114Z #49 0x00000000005c6e57 in run_eval_code_obj () 2023-01-11T23:33:46.6939553Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1291 2023-01-11T23:33:46.6942201Z #50 0x00000000005c1d40 in run_mod () 2023-01-11T23:33:46.6942739Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1312 2023-01-11T23:33:46.6944541Z #51 0x00000000005b9ebb in PyRun_StringFlags.localalias () 2023-01-11T23:33:46.6945002Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:1183 2023-01-11T23:33:46.6946963Z #52 0x00000000005b9cfb in PyRun_SimpleStringFlags.localalias () 2023-01-11T23:33:46.6947694Z at /usr/local/src/conda/python-3.10.8/Objects/clinic/marshal.c.h:503 2023-01-11T23:33:46.6949779Z #53 0x00000000005b8d5c in pymain_run_command ( 2023-01-11T23:33:46.6950370Z command=) 2023-01-11T23:33:46.6950892Z at /croot/python-split_1669298683653/work/build-static/python.c:252 2023-01-11T23:33:46.6953197Z #54 pymain_run_python (exitcode=0x7fff600c3b00) 2023-01-11T23:33:46.6953618Z at /croot/python-split_1669298683653/work/build-static/python.c:582 2023-01-11T23:33:46.6953894Z #55 Py_RunMain.localalias () 2023-01-11T23:33:46.6954238Z at /croot/python-split_1669298683653/work/build-static/python.c:670 2023-01-11T23:33:46.6967955Z #56 0x0000000000587c29 in Py_BytesMain (argc=, 2023-01-11T23:33:46.6968202Z 
argv=) 2023-01-11T23:33:46.6968548Z at /croot/python-split_1669298683653/work/build-static/python.c:1090 2023-01-11T23:33:46.6972616Z #57 0x00007f7a9a4a0c87 in __libc_start_main (main=0x587be0
, argc=5, 2023-01-11T23:33:46.6972932Z argv=0x7fff600c3d08, init=, fini=, 2023-01-11T23:33:46.6973226Z rtld_fini=, stack_end=0x7fff600c3cf8) 2023-01-11T23:33:46.6973511Z at ../csu/libc-start.c:310 2023-01-11T23:33:46.6973860Z #58 0x0000000000587ade in _start () 2023-01-11T23:33:46.6974299Z at /usr/local/src/conda/python-3.10.8/Modules/_io/clinic/peg_api.c:880 2023-01-11T23:33:46.8102125Z ##[group]Run set -x 2023-01-11T23:33:46.8102349Z set -x 2023-01-11T23:33:46.8102584Z python3 -m pip install -r requirements.txt 2023-01-11T23:33:46.8102844Z python3 -m pip install boto3==1.19.12 2023-01-11T23:33:46.8103160Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T23:33:46.8113626Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:33:46.8113857Z env: 2023-01-11T23:33:46.8114053Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:33:46.8114275Z GPU_FLAG: --gpus all 2023-01-11T23:33:46.8114548Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:33:46.8114830Z AWS_DEFAULT_REGION: us-east-1 2023-01-11T23:33:46.8115032Z BRANCH: 2023-01-11T23:33:46.8115220Z TEST_CONFIG: default 2023-01-11T23:33:46.8115407Z SHARD_NUMBER: 2 2023-01-11T23:33:46.8115666Z BUILD_ENVIRONMENT: linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T23:33:46.8115939Z PR_NUMBER: 2023-01-11T23:33:46.8116144Z PYTORCH_RETRY_TEST_CASES: 1 2023-01-11T23:33:46.8116368Z PYTORCH_OVERRIDE_FLAKY_SIGNAL: 1 2023-01-11T23:33:46.8116619Z SHA1: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T23:33:46.8116849Z TAG: ciflow/trunk/91627 2023-01-11T23:33:46.8117051Z WORKFLOW_ID: 3896346758 2023-01-11T23:33:46.8117385Z GITHUB_TOKEN: *** 2023-01-11T23:33:46.8117670Z GHA_WORKFLOW_JOB_ID: 10589556206 2023-01-11T23:33:46.8117878Z ##[endgroup] 2023-01-11T23:33:46.8140523Z + python3 -m pip install -r requirements.txt 2023-01-11T23:33:47.0256876Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:47.0533798Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2023-01-11T23:33:47.0562625Z Requirement already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.4) 2023-01-11T23:33:47.0570797Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2023-01-11T23:33:47.0579537Z Requirement already satisfied: hypothesis in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (6.62.0) 2023-01-11T23:33:47.1003775Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (1.21.6) 2023-01-11T23:33:47.1013650Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (5.9.1) 2023-01-11T23:33:47.1101690Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (6.0) 2023-01-11T23:33:47.1109476Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (2.26.0) 2023-01-11T23:33:47.1318745Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (49.1.3) 2023-01-11T23:33:47.1508577Z Requirement already satisfied: six in 
/home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (1.16.0) 2023-01-11T23:33:47.1517855Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (0.6.6) 2023-01-11T23:33:47.1521976Z Requirement already satisfied: typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 13)) (4.4.0) 2023-01-11T23:33:47.1532338Z Requirement already satisfied: sympy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 14)) (1.10.1) 2023-01-11T23:33:47.1552739Z Requirement already satisfied: filelock in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 15)) (3.9.0) 2023-01-11T23:33:47.1631949Z Requirement already satisfied: networkx in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 16)) (2.6.3) 2023-01-11T23:33:47.1808491Z Requirement already satisfied: jinja2 in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 17)) (3.1.2) 2023-01-11T23:33:47.1835422Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.38.4) 2023-01-11T23:33:47.1851892Z Requirement already satisfied: attrs>=19.2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (22.2.0) 2023-01-11T23:33:47.2143450Z Requirement already satisfied: sortedcontainers<3.0.0,>=2.1.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (2.4.0) 2023-01-11T23:33:47.2153385Z Requirement already satisfied: exceptiongroup>=1.0.0; python_version < "3.11" in /home/ec2-user/.local/lib/python3.7/site-packages (from hypothesis->-r requirements.txt (line 5)) (1.1.0) 2023-01-11T23:33:47.2170959Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (1.26.14) 2023-01-11T23:33:47.2347872Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2.0.12) 2023-01-11T23:33:47.2366825Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (3.4) 2023-01-11T23:33:47.2378411Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 9)) (2022.12.7) 2023-01-11T23:33:47.2386929Z Requirement already satisfied: mpmath>=0.19 in /home/ec2-user/.local/lib/python3.7/site-packages (from sympy->-r requirements.txt (line 14)) (1.2.1) 2023-01-11T23:33:47.2448346Z Requirement already satisfied: MarkupSafe>=2.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from jinja2->-r requirements.txt (line 17)) (2.1.1) 2023-01-11T23:33:47.3010740Z + python3 -m pip install boto3==1.19.12 2023-01-11T23:33:47.5142360Z Defaulting to user installation because normal site-packages is not writeable 2023-01-11T23:33:47.5313314Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2023-01-11T23:33:47.5364218Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 
2023-01-11T23:33:47.5409801Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2023-01-11T23:33:47.5429445Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2023-01-11T23:33:47.5457702Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2023-01-11T23:33:47.5477157Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.14) 2023-01-11T23:33:47.5649560Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2023-01-11T23:33:47.7594337Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2023-01-11T23:35:15.6278301Z [scribe] Scribe access token not provided, sending report via boto3... 2023-01-11T23:35:15.6279035Z ERROR ENCOUNTERED WHEN UPLOADING TO SCRIBE: {"errorMessage":"2023-01-11T23:34:56.027Z 0a32609c-a387-4d95-91aa-b330e1781bb5 Task timed out after 60.02 seconds"} 2023-01-11T23:35:15.6279296Z 2023-01-11T23:35:15.6279779Z ----- Historic stats comparison result ------ 2023-01-11T23:35:15.6279945Z 2023-01-11T23:35:15.6282560Z job: linux-bionic-cuda11.6-py3.10-gcc7-sm86 2023-01-11T23:35:15.6282884Z commit: 8419ddda87c8a47eacc63b54bc7ec98c1f27c26e 2023-01-11T23:35:15.6283057Z 2023-01-11T23:35:15.6283212Z Commit graph (base is most recent master ancestor with at least one S3 report): 2023-01-11T23:35:15.6283421Z 2023-01-11T23:35:15.6283496Z : (master) 2023-01-11T23:35:15.6283666Z | 2023-01-11T23:35:15.6283877Z | * 8419ddda87 (HEAD) total time 12646.49s 2023-01-11T23:35:15.6287630Z | | 2023-01-11T23:35:15.6287892Z | : (2 commits) 2023-01-11T23:35:15.6288130Z |/ 2023-01-11T23:35:15.6288651Z * db2a237763 (base) 9 reports, total time 5112.18s ± 3497.27s 2023-01-11T23:35:15.6289113Z * 2b0abd4ce3 9 reports, total time 5114.49s ± 3505.80s 2023-01-11T23:35:15.6289518Z * f7939b21e1 22 reports, total time 4164.47s ± 3805.54s 2023-01-11T23:35:15.6290000Z * cb3204823e 9 reports, total time 5113.06s ± 3503.00s 2023-01-11T23:35:15.6290369Z * 6e236553f5 9 reports, total time 5103.01s ± 3496.18s 2023-01-11T23:35:15.6290846Z * cce577b391 9 reports, total time 5169.75s ± 3599.85s 2023-01-11T23:35:15.6291266Z * fae821c2f1 9 reports, total time 4979.02s ± 3495.76s 2023-01-11T23:35:15.6291590Z * 0c3659586d 9 reports, total time 4993.42s ± 3527.06s 2023-01-11T23:35:15.6292235Z * 122245985a 9 reports, total time 4994.41s ± 3491.69s 2023-01-11T23:35:15.6292555Z * b797a24259 9 reports, total time 4950.22s ± 3490.58s 2023-01-11T23:35:15.6292768Z | 2023-01-11T23:35:15.6292924Z : 2023-01-11T23:35:15.6293025Z 2023-01-11T23:35:15.6293156Z Removed (across 942 suites) 0 tests, totaling 0.00s 2023-01-11T23:35:15.6293426Z Modified (across 0 suites) 0 tests, totaling 0.00s 2023-01-11T23:35:15.6293689Z Added (across 330 suites) 45187 tests, totaling +12646.49s 2023-01-11T23:35:15.7358804Z ##[group]Run pytorch/test-infra/.github/actions/teardown-linux@main 2023-01-11T23:35:15.7359064Z with: 2023-01-11T23:35:15.7359235Z env: 2023-01-11T23:35:15.7359432Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7359642Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7359926Z DOCKER_CONTAINER_ID: 
b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7360227Z ##[endgroup] 2023-01-11T23:35:15.7373153Z ##[group]Run set -eou pipefail 2023-01-11T23:35:15.7373391Z set -eou pipefail 2023-01-11T23:35:15.7373593Z  2023-01-11T23:35:15.7373845Z echo "Holding runner for 2 hours until all ssh sessions have logged out" 2023-01-11T23:35:15.7374107Z for _ in $(seq 1440); do 2023-01-11T23:35:15.7374348Z  # Break if no ssh session exists anymore 2023-01-11T23:35:15.7374816Z  if [ "$(who)" = "" ]; then 2023-01-11T23:35:15.7375023Z  break 2023-01-11T23:35:15.7375231Z  fi 2023-01-11T23:35:15.7375423Z  echo "." 2023-01-11T23:35:15.7375615Z  sleep 5 2023-01-11T23:35:15.7375798Z done 2023-01-11T23:35:15.7386339Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:35:15.7386572Z env: 2023-01-11T23:35:15.7386757Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7386969Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7387249Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7387503Z ##[endgroup] 2023-01-11T23:35:15.7408696Z Holding runner for 2 hours until all ssh sessions have logged out 2023-01-11T23:35:15.7441058Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T23:35:15.7441384Z # ignore expansion of "docker ps -q" since it could be empty 2023-01-11T23:35:15.7441650Z # shellcheck disable=SC2046 2023-01-11T23:35:15.7441898Z docker stop $(docker ps -q) || true 2023-01-11T23:35:15.7442141Z # Prune all of the docker images 2023-01-11T23:35:15.7442376Z docker system prune -af 2023-01-11T23:35:15.7450928Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2023-01-11T23:35:15.7451160Z env: 2023-01-11T23:35:15.7451354Z GIT_DEFAULT_BRANCH: master 2023-01-11T23:35:15.7451572Z GPU_FLAG: --gpus all 2023-01-11T23:35:15.7451855Z DOCKER_CONTAINER_ID: b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:15.7452115Z ##[endgroup] 2023-01-11T23:35:17.5323094Z b465a1e11c77 2023-01-11T23:35:18.8763787Z Deleted Containers: 2023-01-11T23:35:18.8764495Z b465a1e11c77f74805f694f8096e59c3761c5a109e9555d7d0ec24fd6c5fa0c6 2023-01-11T23:35:18.8764833Z 2023-01-11T23:35:22.0474449Z Deleted Images: 2023-01-11T23:35:22.0476189Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7:fd224c2e6c79d7fdec6408da598bf52bc5b201dd 2023-01-11T23:35:22.0477329Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-bionic-cuda11.6-cudnn8-py3-gcc7@sha256:866df6c1171dbe014496717cf2080d6cc72ca611a4e8146525c9ef09640c8ba4 2023-01-11T23:35:22.0477827Z deleted: sha256:09e297797cd8c095524ba49e041c45e57bf05ba16719f01e7240e8549da5beba 2023-01-11T23:35:22.0478162Z deleted: sha256:6d5f0082fbf8c3b01e49961283f44105a5bb12616f0073762021db97f20a16a5 2023-01-11T23:35:22.0478502Z deleted: sha256:19c574c96e47e3d16cc51cc088fed6cef16eaa2174e667ba0b395ac0e3b989bb 2023-01-11T23:35:22.0479202Z deleted: sha256:4fa7af758c23581dbcc2bd92defdc5fea97c8671fa67bbc30888ffbbf96c49a6 2023-01-11T23:35:22.0479629Z deleted: sha256:e68331e1b0f863bbdbd445ed8475d59d2234c9659264a4e49b7b096311445aee 2023-01-11T23:35:22.0480053Z deleted: sha256:69d886418998cd8758c555ed219fca3a457539e9d6f62c41c9664a80c82c4036 2023-01-11T23:35:22.0480489Z deleted: sha256:2368b1fdf0235d155eaa47e75ff379cff06cc82d63e03886a57363b8092d3c83 2023-01-11T23:35:22.0480961Z deleted: sha256:651c1e1b625aeaf8fa65e96ef11487e650a7a6d00ba8ec9fdfe7e89e762dc5c4 
2023-01-11T23:35:22.0481526Z deleted: sha256:7b73f298df08c4b3aa849b55bb5e73ded619cb5b786fddcc21ece3c0b3887038 2023-01-11T23:35:22.0481958Z deleted: sha256:716f28b4b433958e5d1b9839a20dd22f9986f9c1b42fb95552f13d6bfd291efe 2023-01-11T23:35:22.0482263Z deleted: sha256:e4495993276176c504228e66b2fd6f348523c5af66f9292b2f7ea12acefcd606 2023-01-11T23:35:22.0482578Z deleted: sha256:d6f5fbd8782783697c73f4bcbce91a05e126575f72f0f58e1e9465aa57640a92 2023-01-11T23:35:22.0482945Z deleted: sha256:0b7725c897ee2681e3b1ff00aea6c14805c8050758cfa1010f561c9713934014 2023-01-11T23:35:22.0483287Z deleted: sha256:fdd864c6750bdd24f8cfb131673c6f04087e3cbecac2c1a9b3c30fefbd6d3070 2023-01-11T23:35:22.0483646Z deleted: sha256:bfaeaf77f180f62f3994ccdc2be80dd2ef7f4d25ffc8c9497dc51e6cff69711e 2023-01-11T23:35:22.0484012Z deleted: sha256:02f5a9d8be5a1bdd5a350d4c47147fd3dd46bfcedc7637f53a8a692720381fc2 2023-01-11T23:35:22.0484367Z deleted: sha256:e22cc66e4fd2e491fb4ae8194c35d6b1789f9f5d01e1dfbedf1c266c3a1537de 2023-01-11T23:35:22.0484694Z deleted: sha256:1536d02ae84ab410916541408cf2935f122735cc5d128324f6f82fbbeb913e80 2023-01-11T23:35:22.0485023Z deleted: sha256:a8374bc83a4bf3a838aaf8ee71b4a8281ac4eef801473b007f88b2f0efadc6c6 2023-01-11T23:35:22.0485349Z deleted: sha256:9381921b1e612b2d23517d24662a39f00be43efe2412e440ef41b487a48cb389 2023-01-11T23:35:22.0485678Z deleted: sha256:ae063d6cbeae3688a5a7bb8694d431dbb9792bfb7ff2908d3b25842f9586fc8e 2023-01-11T23:35:22.0486017Z deleted: sha256:595397ba048a351c7b09c25b5eba4cbb916c2db40dd80bde4d95c2b51b766045 2023-01-11T23:35:22.0486366Z deleted: sha256:7a2da4ffb8ea2b858fcdfc92f6e640dbb6a083b39dfa5e7aa87cd1296f9314e1 2023-01-11T23:35:22.0486729Z deleted: sha256:eddd66bfdbb5913f133ffda8d967ee0235f9f121434112ea8da4cfdf5f9ebbf4 2023-01-11T23:35:22.0487062Z deleted: sha256:c9392d92ee837d52b35b41e4de67d213886d844cecd9e769d9284dc21070aee8 2023-01-11T23:35:22.0487407Z deleted: sha256:fba16b8beafe9efa854d93f0e92718750ec97ced755ace0f6f51bbd5d1964f91 2023-01-11T23:35:22.0487742Z deleted: sha256:a09592e27d2e6896b9029f31e269787c92761fd19a48528851c48d85221cd4bc 2023-01-11T23:35:22.0488054Z deleted: sha256:8b2e3a8416af60625ccb9a8562891c1b9e85c2ee05f103b190ad5040313bf1f1 2023-01-11T23:35:22.0488379Z deleted: sha256:135caddb443044601f433d421e5c0f5d8ab02dff69cf2df9024a8ecb97c8948c 2023-01-11T23:35:22.0488697Z deleted: sha256:112b60db47585e101175390e102a66908f0f1175510c5e5d5f10b7e4e0c9769b 2023-01-11T23:35:22.0489063Z deleted: sha256:d9e5dd4e760b68190c010a9042c842afb0bcf3d4477a334d0e9c2d9c302ecb3c 2023-01-11T23:35:22.0489439Z deleted: sha256:7d37673dfb91518db6e68e9637f3db142a4eaccb9f548ab99d14ac52b3672325 2023-01-11T23:35:22.0489804Z deleted: sha256:12784d32c23b63941ebc8adf713d7167158c38c771d3b5f94506e036b6273dbd 2023-01-11T23:35:22.0490183Z deleted: sha256:395db4c8bde7e390accfa81ff84376004b78566423bd0a3bd7559e12e66759bc 2023-01-11T23:35:22.0490519Z deleted: sha256:a7aa04ee64333d427277f47b8e07dee6cc566b132a18ee166c305c890d4fd3bb 2023-01-11T23:35:22.0490838Z deleted: sha256:10a0e8d138614f29a13aecd15175039c7f86ca04fc588b439f56284b3c0e292a 2023-01-11T23:35:22.0491173Z deleted: sha256:68d97ecc8d2ec5f756b4a1b7e7451a413ed19f3d7c3ce52c81b30dc00fd96185 2023-01-11T23:35:22.0491506Z deleted: sha256:e833f7b95e0efaaf775293760365243991395e796d587deffe588e50fe7a9f1e 2023-01-11T23:35:22.0491821Z deleted: sha256:cbcd1e7502e949614d05d5d0a34316fd62665c3e030b3656d9c488cfee1eca34 2023-01-11T23:35:22.0492141Z deleted: sha256:a90ce4a75d9d408df6349607788566125c2662c603dc8f84c767b2256273ff12 2023-01-11T23:35:22.0492525Z 
deleted: sha256:bdee06da46fd67ff14bbc7286b40ef15174551b6452ca633be8576376a3dbddb 2023-01-11T23:35:22.0492864Z deleted: sha256:2cafe83ace87d1a83f30f8d458001f3a315a606d5251c20543cbd6604499aa73 2023-01-11T23:35:22.0493196Z deleted: sha256:cc1f7fb1208e7b05b48d3b3b2648946adaa723b10df2574c7db87ed9112b1510 2023-01-11T23:35:22.0493530Z deleted: sha256:9027ec66ecfd6490ecf157cdf665428dad438f54565d9d85263211da30e6684d 2023-01-11T23:35:22.0493873Z deleted: sha256:a27541cda46a9ede5932b1a1807360e2f6ada5bb0c30bfa3aa953d59dcc1bc5c 2023-01-11T23:35:22.0494252Z deleted: sha256:7b927de6f9fdeb74acee54e7654d04e2614a112cafb477b2433db34dcc7ebe28 2023-01-11T23:35:22.0495002Z deleted: sha256:8298c8753925ad5124b0611c4101e92ec1f877252f8320f5503ebf3e4e7e1314 2023-01-11T23:35:22.0495325Z deleted: sha256:0b741a2d83533cb3b47b912506d486fd477d8c8a1c520a0f9d7d62edfc55487d 2023-01-11T23:35:22.0495656Z deleted: sha256:9412a0a6fd6057a9939fed5beadf64568de1d230c0325a628e396f5b76444bbb 2023-01-11T23:35:22.0496047Z deleted: sha256:0a294dad6a5df3fab407cd28b31eadaf9f65a5bdbf8c7a0db349d3401b537cef 2023-01-11T23:35:22.0496426Z deleted: sha256:b42ee7d4f5715886c70d5cfef4724e889f773a20741ebc4ed9c1771eb09ad634 2023-01-11T23:35:22.0496758Z deleted: sha256:a68106c7b0e03f1e0fa9ad405ce403e2931174825bdcd8e259522ef40ec8a617 2023-01-11T23:35:22.0497076Z deleted: sha256:ad403ee05a64f55c765e665bd3f25a71650857123282dc1bad2af81f5665cda5 2023-01-11T23:35:22.0497402Z deleted: sha256:eb731dc1382c6cc80193d0454f740fc55f441c32f08cd0ce1b8784e7840e53df 2023-01-11T23:35:22.0497725Z deleted: sha256:45bbe3d22998589317c7f6c4dd591475423bb37ca9b922529c5878653483b18d 2023-01-11T23:35:22.0497898Z 2023-01-11T23:35:22.0498010Z Total reclaimed space: 22.44GB 2023-01-11T23:35:22.0538912Z Post job cleanup. 2023-01-11T23:35:22.0566355Z Post job cleanup. 
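The post-job cleanup that follows scrubs the credential header the checkout step injected: it walks the superproject and every submodule, listing and then unsetting http.https://github.com/.extraheader so the auth token does not persist on the reused runner. A condensed sketch of the same idea, assuming a standard git checkout -- this is illustrative, not the checkout action's exact code; the trailing "|| :" mirrors the log and keeps the step green when the key is already absent:

    # Remove the injected GitHub auth header from the main checkout...
    git config --local --unset-all 'http.https://github.com/.extraheader' || :

    # ...and recursively from every nested submodule as well
    git submodule foreach --recursive \
        "git config --local --unset-all 'http.https://github.com/.extraheader' || :"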
2023-01-11T23:35:22.1544233Z [command]/usr/bin/git version
2023-01-11T23:35:22.1580646Z git version 2.38.1
2023-01-11T23:35:22.1618445Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/f4d42d21-0616-47c6-8f2e-23c5a01efe0a' before making global git config changes
2023-01-11T23:35:22.1619272Z Adding repository directory to the temporary git global config as a safe directory
2023-01-11T23:35:22.1623372Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2023-01-11T23:35:22.1647221Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2023-01-11T23:35:22.1677288Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2023-01-11T23:35:22.1900338Z Entering 'android/libs/fbjni'
2023-01-11T23:35:22.1929763Z Entering 'third_party/FP16'
2023-01-11T23:35:22.1957689Z Entering 'third_party/FXdiv'
2023-01-11T23:35:22.1989227Z Entering 'third_party/NNPACK'
2023-01-11T23:35:22.2019674Z Entering 'third_party/QNNPACK'
2023-01-11T23:35:22.2047574Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T23:35:22.2075354Z Entering 'third_party/XNNPACK'
2023-01-11T23:35:22.2112155Z Entering 'third_party/benchmark'
2023-01-11T23:35:22.2139977Z Entering 'third_party/cpuinfo'
2023-01-11T23:35:22.2167795Z Entering 'third_party/cub'
2023-01-11T23:35:22.2196150Z Entering 'third_party/cudnn_frontend'
2023-01-11T23:35:22.2232425Z Entering 'third_party/cutlass'
2023-01-11T23:35:22.2264304Z Entering 'third_party/eigen'
2023-01-11T23:35:22.2293874Z Entering 'third_party/fbgemm'
2023-01-11T23:35:22.2324600Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T23:35:22.2352974Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T23:35:22.2382009Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T23:35:22.2410820Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T23:35:22.2440543Z Entering 'third_party/flatbuffers'
2023-01-11T23:35:22.2470017Z Entering 'third_party/fmt'
2023-01-11T23:35:22.2498445Z Entering 'third_party/foxi'
2023-01-11T23:35:22.2525355Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T23:35:22.2554083Z Entering 'third_party/gloo'
2023-01-11T23:35:22.2583712Z Entering 'third_party/googletest'
2023-01-11T23:35:22.2612987Z Entering 'third_party/ideep'
2023-01-11T23:35:22.2643027Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T23:35:22.2672605Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T23:35:22.2705325Z Entering 'third_party/ios-cmake'
2023-01-11T23:35:22.2732596Z Entering 'third_party/ittapi'
2023-01-11T23:35:22.2762781Z Entering 'third_party/kineto'
2023-01-11T23:35:22.2789271Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T23:35:22.2818370Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T23:35:22.2847983Z Entering 'third_party/nccl/nccl'
2023-01-11T23:35:22.2878417Z Entering 'third_party/neon2sse'
2023-01-11T23:35:22.2908057Z Entering 'third_party/nlohmann'
2023-01-11T23:35:22.2935965Z Entering 'third_party/onnx'
2023-01-11T23:35:22.2973879Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.2999574Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.3029247Z Entering 'third_party/onnx-tensorrt'
2023-01-11T23:35:22.3058015Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T23:35:22.3090150Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.3118400Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.3144247Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.3178200Z Entering 'third_party/pocketfft'
2023-01-11T23:35:22.3204082Z Entering 'third_party/protobuf'
2023-01-11T23:35:22.3235300Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T23:35:22.3263297Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T23:35:22.3294404Z Entering 'third_party/psimd'
2023-01-11T23:35:22.3324804Z Entering 'third_party/pthreadpool'
2023-01-11T23:35:22.3352785Z Entering 'third_party/pybind11'
2023-01-11T23:35:22.3379158Z Entering 'third_party/python-enum'
2023-01-11T23:35:22.3408587Z Entering 'third_party/python-peachpy'
2023-01-11T23:35:22.3436689Z Entering 'third_party/python-six'
2023-01-11T23:35:22.3463646Z Entering 'third_party/sleef'
2023-01-11T23:35:22.3492505Z Entering 'third_party/tbb'
2023-01-11T23:35:22.3523496Z Entering 'third_party/tensorpipe'
2023-01-11T23:35:22.3552354Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T23:35:22.3580576Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T23:35:22.3608667Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T23:35:22.3638010Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T23:35:22.3663706Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.3691539Z Entering 'third_party/zstd'
2023-01-11T23:35:22.3731154Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2023-01-11T23:35:22.3756378Z http.https://github.com/.extraheader
2023-01-11T23:35:22.3763626Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2023-01-11T23:35:22.3792637Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2023-01-11T23:35:22.4001500Z Entering 'android/libs/fbjni'
2023-01-11T23:35:22.4018364Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4040121Z Entering 'third_party/FP16'
2023-01-11T23:35:22.4056624Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4078494Z Entering 'third_party/FXdiv'
2023-01-11T23:35:22.4093915Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4115313Z Entering 'third_party/NNPACK'
2023-01-11T23:35:22.4131727Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4155077Z Entering 'third_party/QNNPACK'
2023-01-11T23:35:22.4171485Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4194096Z Entering 'third_party/VulkanMemoryAllocator'
2023-01-11T23:35:22.4209440Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4232321Z Entering 'third_party/XNNPACK'
2023-01-11T23:35:22.4247395Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4277139Z Entering 'third_party/benchmark'
2023-01-11T23:35:22.4292772Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4316503Z Entering 'third_party/cpuinfo'
2023-01-11T23:35:22.4332087Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4353645Z Entering 'third_party/cub'
2023-01-11T23:35:22.4370263Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4393148Z Entering 'third_party/cudnn_frontend'
2023-01-11T23:35:22.4409405Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4436394Z Entering 'third_party/cutlass'
2023-01-11T23:35:22.4452920Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4480023Z Entering 'third_party/eigen'
2023-01-11T23:35:22.4496918Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4520556Z Entering 'third_party/fbgemm'
2023-01-11T23:35:22.4536742Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4558784Z Entering 'third_party/fbgemm/third_party/asmjit'
2023-01-11T23:35:22.4575615Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4597710Z Entering 'third_party/fbgemm/third_party/cpuinfo'
2023-01-11T23:35:22.4613041Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4634049Z Entering 'third_party/fbgemm/third_party/googletest'
2023-01-11T23:35:22.4650505Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4673187Z Entering 'third_party/fbgemm/third_party/hipify_torch'
2023-01-11T23:35:22.4689346Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4711973Z Entering 'third_party/flatbuffers'
2023-01-11T23:35:22.4727961Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4753232Z Entering 'third_party/fmt'
2023-01-11T23:35:22.4768610Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4791181Z Entering 'third_party/foxi'
2023-01-11T23:35:22.4806989Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4829730Z Entering 'third_party/gemmlowp/gemmlowp'
2023-01-11T23:35:22.4846502Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4869488Z Entering 'third_party/gloo'
2023-01-11T23:35:22.4886222Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4909708Z Entering 'third_party/googletest'
2023-01-11T23:35:22.4926020Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4949099Z Entering 'third_party/ideep'
2023-01-11T23:35:22.4965941Z http.https://github.com/.extraheader
2023-01-11T23:35:22.4986233Z Entering 'third_party/ideep/mkl-dnn'
2023-01-11T23:35:22.5000948Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5023246Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN'
2023-01-11T23:35:22.5038817Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5065789Z Entering 'third_party/ios-cmake'
2023-01-11T23:35:22.5082282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5103493Z Entering 'third_party/ittapi'
2023-01-11T23:35:22.5119768Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5142306Z Entering 'third_party/kineto'
2023-01-11T23:35:22.5158614Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5179796Z Entering 'third_party/kineto/libkineto/third_party/fmt'
2023-01-11T23:35:22.5195993Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5216974Z Entering 'third_party/kineto/libkineto/third_party/googletest'
2023-01-11T23:35:22.5233282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5258814Z Entering 'third_party/nccl/nccl'
2023-01-11T23:35:22.5275282Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5296478Z Entering 'third_party/neon2sse'
2023-01-11T23:35:22.5313001Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5335702Z Entering 'third_party/nlohmann'
2023-01-11T23:35:22.5352427Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5374938Z Entering 'third_party/onnx'
2023-01-11T23:35:22.5392247Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5421856Z Entering 'third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.5439193Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5460893Z Entering 'third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.5477336Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5502910Z Entering 'third_party/onnx-tensorrt'
2023-01-11T23:35:22.5518862Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5541323Z Entering 'third_party/onnx-tensorrt/third_party/onnx'
2023-01-11T23:35:22.5557467Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5583944Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'
2023-01-11T23:35:22.5600407Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5622668Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'
2023-01-11T23:35:22.5639806Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5660807Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.5677297Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5703567Z Entering 'third_party/pocketfft'
2023-01-11T23:35:22.5720470Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5743000Z Entering 'third_party/protobuf'
2023-01-11T23:35:22.5759455Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5784689Z Entering 'third_party/protobuf/third_party/benchmark'
2023-01-11T23:35:22.5801221Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5823512Z Entering 'third_party/protobuf/third_party/googletest'
2023-01-11T23:35:22.5840192Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5864534Z Entering 'third_party/psimd'
2023-01-11T23:35:22.5881355Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5901958Z Entering 'third_party/pthreadpool'
2023-01-11T23:35:22.5920187Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5941535Z Entering 'third_party/pybind11'
2023-01-11T23:35:22.5957842Z http.https://github.com/.extraheader
2023-01-11T23:35:22.5980060Z Entering 'third_party/python-enum'
2023-01-11T23:35:22.5996660Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6017291Z Entering 'third_party/python-peachpy'
2023-01-11T23:35:22.6034276Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6056332Z Entering 'third_party/python-six'
2023-01-11T23:35:22.6072853Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6094949Z Entering 'third_party/sleef'
2023-01-11T23:35:22.6112240Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6133582Z Entering 'third_party/tbb'
2023-01-11T23:35:22.6150255Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6174443Z Entering 'third_party/tensorpipe'
2023-01-11T23:35:22.6190347Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6212289Z Entering 'third_party/tensorpipe/third_party/googletest'
2023-01-11T23:35:22.6229450Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6251955Z Entering 'third_party/tensorpipe/third_party/libnop'
2023-01-11T23:35:22.6269251Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6291553Z Entering 'third_party/tensorpipe/third_party/libuv'
2023-01-11T23:35:22.6308500Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6330859Z Entering 'third_party/tensorpipe/third_party/pybind11'
2023-01-11T23:35:22.6347482Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6368701Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang'
2023-01-11T23:35:22.6385944Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6410861Z Entering 'third_party/zstd'
2023-01-11T23:35:22.6427608Z http.https://github.com/.extraheader
2023-01-11T23:35:22.6639952Z Cleaning up orphan processes
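The long run of "Entering …" / "http.https://github.com/.extraheader" pairs above is the checkout step's post-job credential scrub: the temporary auth header injected into the git config at checkout time is unset in the top-level repository and then in every submodule, recursively (the same sweep is applied to core.sshCommand just before it). A minimal sketch of that pattern, reconstructed from the commands echoed in this log:

# List any GitHub auth header configured in the main repo, then unset it.
git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader'
git config --local --unset-all 'http.https://github.com/.extraheader'
# Repeat in every submodule; the trailing `|| :` keeps the foreach loop
# from aborting on submodules where the header was never set.
git submodule foreach --recursive \
  "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"

The foreach body is quoted here so the && / || chain runs inside each submodule rather than in the outer shell; the runner's log echoes the command unquoted, but the effect is the same: no SSH override or HTTP auth token survives the job on a reusable runner.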